ML Predict: Classification using Fingerprints for Small Molecules

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Solution-based/Hit to Lead/Properties

  • Task-based/ADME & Tox Assessment

  • Solution-based/Hit to Lead/Properties/Model Building

Description

This floe predicts properties of small, drug-like molecules using a pretrained machine learning model. Each prediction is one of the discrete string classes that the model was trained on.

It runs a TensorFlow-based fully connected neural network classification model for prediction. This user-provided model can be generated using the ML Build: Classification Model with Tuner using Fingerprints for Small Molecules Floe.

The floe uses convex box and Monte Carlo approaches to assess the domain of application of its predictions. The TensorFlow input dataset also contains a model-agnostic system that explains the prediction for each molecule.

The floe is inexpensive and fast: predicting the property for 50 molecules costs only a few cents.

Outputs:

  • Failure Data: The molecule (a) is too large or too small or (b) has an unknown atom.

  • No Confidence Data: The molecule’s property falls outside the scope of the training set. In this case, the model predicts with no guarantees. The explainer image has a red background.

  • Success Data: The molecule falls (a) within scope, and the explainer has a green background, or (b) at the edge of scope, and the explainer has a yellow background.

Molecules outside the scope of the training set are sent to the “No Confidence” port, because their predictions cannot be considered reliable. Specifically, the scope is defined as a range in molecular weight, atom count, polar surface area, and calculated logP, taken from the training set molecules. These ranges are given in the Floe Report. If the model was trained with the Preprocess Molecule parameter set to On, set it to On here as well. More details on how the floe operates can be found in this tutorial and the How-to Guide to Analyze the Machine Learning Predictions.
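The range-based scope check and the routing to output ports described above can be sketched in plain Python. This is a minimal illustration, not the floe's implementation: the property keys, the numeric ranges, and the edge_fraction margin used to decide what counts as "at the edge of scope" are all assumptions made for the example. The actual ranges for a given model are the ones listed in its Floe Report.

```python
from dataclasses import dataclass

# Hypothetical keys for the four properties that define the scope.
PROPERTIES = ("mol_weight", "atom_count", "polar_surface_area", "logp")


@dataclass(frozen=True)
class Range:
    low: float
    high: float


def scope_status(props, ranges, edge_fraction=0.05):
    """Return 'in', 'edge', or 'out' for one molecule.

    edge_fraction is an assumed margin: a value closer to either bound than
    this fraction of the range width counts as being at the edge of scope.
    """
    status = "in"
    for name in PROPERTIES:
        value, rng = props[name], ranges[name]
        if value < rng.low or value > rng.high:
            return "out"  # any property outside its range puts the molecule out of scope
        margin = edge_fraction * (rng.high - rng.low)
        if value - rng.low < margin or rng.high - value < margin:
            status = "edge"
    return status


# Output port and explainer background for each status, as described above.
ROUTING = {
    "in": ("Success", "green"),
    "edge": ("Success", "yellow"),
    "out": ("No Confidence", "red"),
}

# Made-up ranges for illustration only; real ranges come from the Floe Report.
training_ranges = {
    "mol_weight": Range(150.0, 550.0),
    "atom_count": Range(10, 40),
    "polar_surface_area": Range(20.0, 140.0),
    "logp": Range(-1.0, 5.0),
}

molecule = {"mol_weight": 320.4, "atom_count": 23, "polar_surface_area": 75.2, "logp": 2.8}
port, background = ROUTING[scope_status(molecule, training_ranges)]
print(port, background)
```

Comparing a molecule's own property values against the ranges in the Floe Report in this way can help explain why it landed on the No Confidence port.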

Promoted Parameters

Title in user interface (promoted name)

Inputs

Input Small Molecule(s) Dataset to predict property of (in): The dataset(s) to read records from

  • Required

  • Type: data_source

Input tensorflow Model (tfm): Machine learning model to predict property.

  • Required

  • Type: data_source

Outputs

Output Property Prediction for Fingerprint based Classification (out): Output dataset to which to write.

  • Required

  • Type: dataset_out

  • Default: Successful Fingerprint ML Classification Prediction

Failed Property (failed_out): Output dataset to which to write.

  • Required

  • Type: dataset_out

  • Default: Failed Fingerprint ML Classification Prediction

No-confidence Fingerprint ML Classification Prediction (noconf): Output dataset to which to write.

  • Required

  • Type: dataset_out

  • Default: No-confidence Fingerprint ML Classification Prediction

Machine Learning Model Options

Model ID of which Tensorflow model to use to predict. (tfmid): Which model to select. Make sure this matches the Model ID of the input model.

  • Required

  • Type: integer

Preprocess Molecule (Preprocess Molecule): For every molecule, keeps only the largest component, adjusts ionization to neutral pH, and rejects molecules that fail the type check.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Apply Blockbuster filter (Blockbuster Filter): Accept or reject molecules based on closeness to Blockbuster molecule properties. For details, see the oemolprop toolkit.

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Explanation and Validation

Property Validation Field (val): If the input small molecule(s) dataset has a baseline of the physical property, the Floe Report provides a comparison between the model prediction and the baseline. Make sure the baseline is in exactly the same unit as the prediction.

  • Type: field_parameter::string
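As a rough illustration of what comparing predictions against a baseline involves for a classification model, the sketch below computes an accuracy and the counts of each (baseline, predicted) pair. The exact comparison shown in the Floe Report is not specified here, so this is an assumption about one reasonable way to do it, and the class labels are made up. It also assumes the baseline labels are spelled exactly like the classes the model was trained on, since the comparison is a plain string match.

```python
from collections import Counter


def compare_to_baseline(predicted, baseline):
    """Return accuracy and counts of (baseline, predicted) class label pairs."""
    if len(predicted) != len(baseline):
        raise ValueError("predicted and baseline must have the same length")
    if not predicted:
        raise ValueError("no records to compare")
    pairs = Counter(zip(baseline, predicted))
    correct = sum(n for (truth, guess), n in pairs.items() if truth == guess)
    return correct / len(predicted), pairs


# Made-up class labels for illustration only.
predicted = ["high", "low", "medium", "high", "low"]
baseline = ["high", "low", "high", "high", "medium"]

accuracy, confusion = compare_to_baseline(predicted, baseline)
print(f"accuracy = {accuracy:.2f}")
for (truth, guess), count in sorted(confusion.items()):
    print(f"baseline={truth:<6} predicted={guess:<6} n={count}")
```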

Molecule Explainer Type (molecule_explainer_type): Select the explainer visualisation. Atom: annotate atoms only; Fragment: annotate fragments; Combined: annotate both.

  • Type: string

  • Default: Atom

  • Choices: [‘Combined’, ‘Fragment’, ‘Atom’]
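The promoted parameters above can be summarized as a small settings check. The sketch below is not an Orion API: it is a plain Python dictionary keyed by the promoted names, with a hypothetical resolve helper that fills in the documented defaults and checks required names, types, and choices. The dataset names and the Model ID in the example are placeholders.

```python
# Promoted names, types, defaults, and choices copied from the list above.
SPEC = {
    "in": {"type": str, "required": True},
    "tfm": {"type": str, "required": True},
    "out": {"type": str, "default": "Successful Fingerprint ML Classification Prediction"},
    "failed_out": {"type": str, "default": "Failed Fingerprint ML Classification Prediction"},
    "noconf": {"type": str, "default": "No-confidence Fingerprint ML Classification Prediction"},
    "tfmid": {"type": int, "required": True},
    "Preprocess Molecule": {"type": bool, "default": True},
    "Blockbuster Filter": {"type": bool, "default": False},
    "val": {"type": str},
    "molecule_explainer_type": {
        "type": str,
        "default": "Atom",
        "choices": ("Combined", "Fragment", "Atom"),
    },
}


def resolve(user_params):
    """Fill in defaults and check required names, types, and choices."""
    unknown = set(user_params) - set(SPEC)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    resolved = {}
    for name, rule in SPEC.items():
        if name in user_params:
            value = user_params[name]
        elif "default" in rule:
            value = rule["default"]
        elif rule.get("required"):
            raise ValueError(f"missing required parameter: {name}")
        else:
            continue  # optional parameter without a default, left unset
        expected = rule["type"]
        # bool is a subclass of int in Python, so reject True/False for integers.
        type_ok = isinstance(value, expected) and (
            expected is bool or not isinstance(value, bool)
        )
        if not type_ok:
            raise TypeError(f"{name} must be of type {expected.__name__}")
        if "choices" in rule and value not in rule["choices"]:
            raise ValueError(f"{name} must be one of {rule['choices']}")
        resolved[name] = value
    return resolved


settings = resolve({
    "in": "my_molecules",        # placeholder dataset name
    "tfm": "my_trained_models",  # placeholder dataset name
    "tfmid": 1,                  # placeholder Model ID
})
print(settings["molecule_explainer_type"], settings["Preprocess Molecule"])
```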