Tutorial: Use Pretrained Model to Predict Generic Property of Molecules

OpenEye Model Building is a tool to build machine learning models that predicts molecular properties.

In this tutorial, we will use a trained fully connected neural network model to predict physical property of molecules. The floe predicts the property of each molecule and explans it. It also generate confidence intervals and other info. The floe report provides a summary of the output.

This tutorial uses the following Floe:

  • Physical Property Prediction for Small Molecule using Machine Learning.

It is suggested to read the previous tutorial to learn how to build a machine learning model to use for prediction.

Create a Tutorial Project

Note

If you have already created a Tutorial project you can re-use the existing one.

Log into Orion and click the home button at the top of the blue ribbon on the left of the Orion Interface. Then click on the ‘Create New Project’ button and in the pop up window enter Tutorial for the name of the project and click ‘Save’.

create_project_ui

Orion home page

Floe Input

The inputs required are:

  • Molecule Dataset (P_1)

  • Tensorflow Machine Learning Model Dataset (M_1)

The P_1 dataset contains several OERecord (s). The floe expects an OEMol from each record. This is the molecules the model will predict physical property of. Let this dataset be P_1.

Here is a sample record from the dataset:

OERecord (
    Molecule(Chem.Mol) : c1ccc(c(c1)NC(=O)N)OC[C@H](CN2CCC3(CC2)Cc4cc(ccc4O3)Cl)O
)

There can be another float field containing the physical property values to validate against. The following dataset does not contain said field.

The M_1 dataset contains one or more machine learning models. To learn how to generate these models, read the previous tutorial on building models in Orion. We attach a sample trained Tensorflow Model to predict Solubility. In addition, you can have a Tensorflow probability model for domain of application(DOA) and error bar analysis. Here are the sample trained model.

Run Generic Property Prediction Floe

  • Click on the ‘Floes’ button in the left menu bar

  • Click on the ‘Floes’ tab

  • Under the ‘Categories’ tab select ‘OpenEye Model Building’ package

  • In the search bar enter ML Predict

  • A list of Floes will now be visible to the right

  • Launch the floe ML Predict: Regression using Fingerprints for Small Molecules and a Job Form will pop up. Specify the following parameter settings in the Job Form.

select_db
select_db
  • For the Input Tensorflow Model, choose M_1 from above

  • For the Input Tensorflow Probability Model, choose T_1 from above

  • For the Input Small Molecule Dataset, choose P_1 from above

  • All the molecules predicted will be saved to name in the field Output Property. Change it to something recognizable.

select_db
  • In case your M_1 contains more than one trained model, use the Which Model To Use To Predict field to identify which model you would like to predict the properties of your molecules. More information to select this model is in the Howto-Guide: Use built machine learning model for property prediction/verification of unseen molecules.

  • Additionally if your small molecule dataset has a field containing the predicted value (sourced from elsewhere), you can turn the validation option on to get a statistical comparison between this value and the prediction.

That’s it ! Things should run, generate an output and a floe report

Analyze OEModel Floe Report

Here is a sample image of how the floe report should look (Assuming you ran a fully connected neural network model like M_1):

select_db

The top part contains the hyperparameters on which the model was trained. Then we have histogram of the output prediction, and the confidence with each prediction. We also have a plot for confidence of prediction versus the actual output. These overall statistics help analyze the input molecules predicted.

select_db

We also have outlier prediction using:

Lastly, there is a link to a page under Interesting Molecules and it shows the annotated ../images that explain the outlier and central molecules.

Analyze Output

  • Go to the data section of Orion and Activate the data the floe produced. This should have the same name you chose for the Output Prediction field of your floe. The data can be activated by clicking on the small plus sign in a circle right next to it.

select_db
  • Now going to the analyze page in Orion, you should be able to see the molecules, their predicted pyrrolamide values, and the explanation of the output.

The output columns and their explanations are:

  • Confidence: How confident the Model is with its prediction on a scale of 0-1

  • Contributions: Explanation of prediction based on a local model. If the image has a dark background, it means there is an error or warning issued. Based on the choice of molecule explainer (Fragment by default), different parts will be color annotated with blue denoting more contribution towards the physical property (solubility) while red denotes the opposite.

  • HighestTaniSimilarity: what is the highest 2d Tanimoto similarity with any molecule in the training set,

  • HighestTaniSimilarityProperty: what is the NegIC50 of that training-set molecule on record

    • These two fields basically tell us if there is a similar molecule in the training set and if so, what is its physical property value

  • Scope: if there is an error or warning, what caused the issue

  • Class Predict(Physical Property): Predicts Property as High, Medium or Low. Background color suggests how confident the model is with green (most confidence), yellow(average confidence), and red(less confidence/out of scope)

  • Prediction(Physical Property): Physical property prediction of the molecule

select_db

Note

We assign IDs (#) on each record molecule. This follows a linear ordering over all molecule. So if you activate both the successful and failure predictions, and sort them based on #, the order should be same as input.