How to Use Built Machine Learning Models for Property Prediction and Verification of Unseen Molecules

OpenEye Machine Learning Floes build machine learning models that predict physical properties of small molecules. In this guide, a previously built and trained fully connected neural network model will be used to predict molecular solubility.

First, find the ML Predict: Use Fingerprint based Pretrained Regression Models to Predict Properties of Molecules Floe on the Floe page and select it so the floe information appears in the right-hand window. Next, click on the “Analyze Enable” bar chart icon on the upper right.

Analyze Enable — **Figure 1.** How to analyze enable a floe.

Let’s select a previously built neural network model for property prediction. The model must have been built with the ML Build: Regression Model with Tuner using Fingerprints for Small Molecules Floe. Use the table and model analysis in that floe's Floe Report to choose a well-fitted model (refer to the previous guide on model optimization). For this guide, assume that the second model is best for our needs and note its record number, 29 in this case.

Find Model — **Figure 2.** List of generated models shown in the Floe Report.

Go to the Jobs tab on the Floe page and click on the job. This will take you to the Floe Report, shown in Figure 3. Next, you can choose “View in Project Data” for any of the results files. This will take you to the Data page, where you can activate the output dataset, as shown in Figure 4.

Once the dataset is active, its models can be found on the Analyze page. Right-click on the required model ID (29 in this guide) and select “Send to Workfloe.”

The selected model will be sent to the property prediction floe, which has already been activated.

Next, add the dataset of small molecules whose property is to be predicted.

Note

Sometimes the model is sent to the wrong input (the small molecule input) instead of the TensorFlow model input. Make sure that model ID 29 from the Analyze page is in the TensorFlow model input and that the dataset of small molecules is in the first input.

Input Data

Pyrrolamides dataset

If the molecule dataset is being used for validation and already contains precalculated values of the property, select the corresponding column in the Validation Field parameter. The floe will then report R² and other measures of agreement between the predictions and these baseline values.
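To make the validation measures concrete, the following is a minimal sketch of how R² and two common companion metrics (mean absolute error and root-mean-square error) can be computed from baseline and predicted values. It is not the floe's own implementation; the function name `regression_metrics` and the example numbers are hypothetical and chosen only for illustration.

```python
import math


def regression_metrics(measured, predicted):
    """Return R^2, MAE, and RMSE for paired baseline and predicted values.

    Hypothetical helper for illustration; not part of any OpenEye API.
    """
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    if len(measured) < 2:
        raise ValueError("at least two data points are required")

    n = len(measured)
    mean_measured = sum(measured) / n
    residuals = [m - p for m, p in zip(measured, predicted)]

    # Residual and total sums of squares used in the definition of R^2.
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(ss_res / n),
    }


if __name__ == "__main__":
    # Made-up values, not taken from any real dataset.
    baseline = [1.0, 2.0, 3.0, 4.0]
    model_output = [1.5, 2.5, 3.5, 4.5]
    print(regression_metrics(baseline, model_output))
```

An R² close to 1 means the predictions explain most of the variance in the baseline column, while MAE and RMSE express the typical error in the units of the property itself.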

That’s it! Click “Start Job” to run the floe.

Note

The output floe report will look very similar to this Floe Report and Analysis.

Library Details of the Floe

F_nn built on the TensorFlow package
Molecule explanation built on LIME
Domain of application built on TensorFlow Probability