How to Use Built Machine Learning Models for Property Prediction and Verification of Unseen Molecules

OpenEye Machine Learning Floes build machine learning models that predict physical properties of small molecules. In this guide, a previously built and trained fully connected neural network model will be used to predict molecular solubility.

First, find the ML Predict: Use Fingerprint based Pretrained Regression Models to Predict Properties of Molecules Floe on the Floe page and select it so that the floe information appears in the right-hand window. Next, click the “Analyze Enable” bar chart icon in the upper right.

Figure 1. How to analyze-enable a floe.

Let’s select a previously built neural network model for property prediction. This model must have been built with the ML Build: Regression Model with Tuner using Fingerprints for Small Molecules Floe. Using the table and model analysis in the Floe Report, choose a well-fitted model (refer to the previous guide on model optimization). For this guide, assume that the second model best suits our needs, and note its record number, which is 29 in this case.

Figure 2. List of generated models shown in the Floe Report.

Go to the Jobs tab on the Floe page and click the job that built the model. This opens the Floe Report, shown in Figure 3. Next, choose “View in Project Data” for any of the results files. This takes you to the Data page, where you can activate the output dataset, as shown in Figure 4.

Figure 3. The Floe Report.

Figure 4. How to activate a dataset on the Data page.

Once the dataset is active, its models can be found on the Analyze page. Right-click the required model ID (29 in this case) and select “Send to Workfloe.”

The selected model is sent to the property prediction floe that was analyze-enabled earlier.

Figure 5. Send the model to a workfloe.

Figure 6. Select an analyze-enabled floe.

Next, add the small-molecule dataset whose property is to be predicted.

Note

Sometimes the model is sent to the wrong input (the small molecule input) instead of the TensorFlow model input. Make sure that model ID 29 from the Analyze page is in the TensorFlow model input and that the small-molecule dataset is in the first input.

Figure 7. Input parameters on the Job Form.

If the molecule dataset is used for validation and already contains precalculated values of the property, select that column in the Validation Field parameter. The floe will then report R² and other measures of agreement between the predictions and these baseline values.

Figure 8. Explanation and validation parameters.
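The floe computes its validation measures internally, and this guide does not give their exact definitions. As a rough sketch of what comparing predictions against a baseline column involves, the Python below computes R² (as the coefficient of determination, 1 − SS_res/SS_tot), RMSE, and MAE. The function `validation_metrics` and the sample values are hypothetical and not part of any OpenEye API; the floe may define R² differently (for example, as the squared Pearson correlation) and may report other measures.

```python
import math
from typing import Dict, Sequence


def validation_metrics(predicted: Sequence[float], measured: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: compare predictions with baseline (measured) values.

    Assumption: R^2 is the coefficient of determination, 1 - SS_res / SS_tot.
    The floe's own definition may differ.
    """
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")
    n = len(measured)
    if n < 2:
        raise ValueError("need at least two molecules to compute R^2")

    mean_measured = sum(measured) / n
    residuals = [p - m for p, m in zip(predicted, measured)]

    ss_res = sum(r * r for r in residuals)                      # residual sum of squares
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)    # total sum of squares
    if ss_tot == 0:
        raise ValueError("baseline values have zero variance; R^2 is undefined")

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


if __name__ == "__main__":
    # Made-up solubility-like values, for illustration only.
    measured = [-1.0, -2.0, -3.0, -4.0]
    predicted = [-1.5, -2.0, -2.5, -4.0]
    metrics = validation_metrics(predicted, measured)
    print({k: round(v, 3) for k, v in metrics.items()})
    # prints {'r2': 0.9, 'rmse': 0.354, 'mae': 0.25}
```

An R² close to 1 together with a small RMSE suggests that the model reproduces the baseline values well on this dataset.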

That’s it! Click “Start Job” to run the floe.

Note

The output Floe Report will look very similar to this Floe Report and Analysis.

Library Details of the Floe