ML Predict: Regression using Fingerprints for Small Molecules

This floe predicts properties of small, drug-like molecules using a pretrained ML model.

It runs a TensorFlow-based fully connected neural network regression model for prediction. This user-provided model can be generated using the ML Build: Regression Model with Tuner using Fingerprints for Small Molecules Floe.

The floe uses a convex box approach to predict the domain of applicability. The input TensorFlow dataset also contains a model-agnostic system to explain the predictions for each molecule.

The user can also provide an optional TensorFlow-based probabilistic fully connected neural network for better error bar prediction. All models run on 2D fingerprints.

This floe is inexpensive and fast: predicting a property for 10 molecules costs a few cents.

Outputs:

Failure Data: The molecule (a) is too large or too small, or (b) has an unknown atom.

No Confidence Data: The molecule falls outside the scope of the training set. In this case, the model still predicts, but with no guarantees. The explainer image has a red background.

Success Data: The molecule falls (a) within scope and the explainer has a green background or (b) at the edge of scope and the explainer has a yellow background.

Molecules outside the scope of the training set are sent to the “No Confidence” port because their predictions cannot be considered reliable. Specifically, the scope is defined by the ranges of molecular weight, atom count, polar surface area, and calculated logP observed in the training set molecules. These ranges are reported in the Floe Report.
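The scope check described above can be illustrated with a minimal Python sketch. It is not the floe's implementation: the descriptor names, the `edge_fraction` margin used to decide what counts as "at the edge of scope", and the helper functions are all hypothetical, and the descriptor values are assumed to be computed elsewhere.

```python
from dataclasses import dataclass

# Hypothetical descriptor keys; the floe's own field names are not given in this page.
DESCRIPTORS = ("mol_weight", "atom_count", "psa", "logp")


@dataclass(frozen=True)
class Box:
    low: dict   # per-descriptor minimum over the training set
    high: dict  # per-descriptor maximum over the training set


def fit_box(training_rows):
    """Record the min and max of each descriptor over the training molecules."""
    low = {d: min(row[d] for row in training_rows) for d in DESCRIPTORS}
    high = {d: max(row[d] for row in training_rows) for d in DESCRIPTORS}
    return Box(low, high)


def classify(box, row, edge_fraction=0.05):
    """Return 'green' (in scope), 'yellow' (edge of scope), or 'red' (out of scope).

    edge_fraction is an assumed margin: a value within this fraction of the
    range from either bound is treated as being at the edge.
    """
    near_edge = False
    for d in DESCRIPTORS:
        lo, hi, x = box.low[d], box.high[d], row[d]
        if x < lo or x > hi:
            return "red"
        margin = edge_fraction * (hi - lo)
        if x - lo < margin or hi - x < margin:
            near_edge = True
    return "yellow" if near_edge else "green"


def output_port(color):
    """Red goes to No Confidence; green and yellow go to Success."""
    return "No Confidence" if color == "red" else "Success"
```

For example, with a training set spanning molecular weights 200 to 500, a query molecule at 650 would be classified red and routed to the No Confidence port, while one at 350 (with the other descriptors also well inside their ranges) would be green.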

Inputs
Name	Description	Type
Input Small Molecule(s) Dataset to Predict Property of	The dataset(s) to read records from.	Molecule Dataset
Input TensorFlow Model	Machine Learning model to predict property.	Machine Learning TensorFlow Model Dataset
Input TensorFlow Probability Model	The dataset(s) to read records from.	Machine Learning TensorFlow Probability Model Dataset

Machine Learning Model Options
Name	Description	Type
Model ID of which TensorFlow Model to Use to Predict	Which model to select. Make sure this matches the input model ID.	Int
Model ID of which TensorFlow Probability (TFP) Model to Use to Predict	Which model to select. Make sure this matches the input TFP model ID.	Int
Preprocess Molecule	For every molecule, keeps only the largest component and adjusts ionization to neutral pH.	Bool
Apply Blockbuster Filter	Apply Blockbuster filter.	Bool

Explanation and Validation
Name	Description	Type
Property Validation Field	If the dataset has a baseline value for the property, the floe reports a comparison between the baseline and the predictions in the Floe Report.	Float
Molecule Explainer Type	Select the explainer visualization. Atom: annotate atoms only. Fragment: annotate fragments. Combined: annotate both.	List
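
As an illustration of the kind of comparison the Property Validation Field enables, the sketch below scores predictions against baseline values. This page does not say which statistics the Floe Report uses; mean absolute error, root mean squared error, and R² are assumed here as common choices for regression, and `compare_to_baseline` is a hypothetical helper.

```python
import math


def compare_to_baseline(predicted, baseline):
    """Return MAE, RMSE, and R^2 of predictions against baseline values."""
    if not predicted or len(predicted) != len(baseline):
        raise ValueError("predicted and baseline must be non-empty and equal in length")
    n = len(predicted)
    errors = [p - b for p, b in zip(predicted, baseline)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_b = sum(baseline) / n
    ss_tot = sum((b - mean_b) ** 2 for b in baseline)
    # R^2 is undefined when every baseline value is identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}
```

A call such as `compare_to_baseline([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])` returns a dictionary with the three statistics, which could then be set alongside the ranges given in the Floe Report.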

Outputs
Name	Description	Type
Output Property	Output dataset to write to.	Dataset
Failure Property	Output dataset to write to.	Dataset