ML Predict: Regression using Feature Input
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Solution-based/Hit to Lead/Properties
Task-based/ADME & Tox Assessment
Solution-based/Hit to Lead/Properties/Model Building
Description
This floe predicts the properties of small, drug-like molecules using a pretrained machine learning model.
It runs a TensorFlow-based, fully connected neural network regression model for prediction. This user-provided model can be generated using the ML Build: Regression Model with Tuner using Feature Input Floe. Every molecule needs user-provided features, supplied as a float vector, as input.
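The requirement that every molecule carry a float vector matching the model's training input can be checked before the floe is run. The following is a minimal sketch in plain Python and is not part of the floe. The function name check_feature_vector and the expected_length argument are hypothetical, and the sketch assumes the features have already been read from each record into a list of numbers.

```python
import math
from typing import List, Sequence


def check_feature_vector(features: Sequence[float], expected_length: int) -> List[str]:
    """Return a list of problems; an empty list means the vector looks usable.

    expected_length is an assumption: the number of features the model
    was trained on, supplied by the user.
    """
    problems = []
    if len(features) != expected_length:
        problems.append(
            f"expected {expected_length} features, got {len(features)}"
        )
    for index, value in enumerate(features):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"feature {index} is not a number: {value!r}")
        elif not math.isfinite(value):
            problems.append(f"feature {index} is not finite: {value!r}")
    return problems


if __name__ == "__main__":
    print(check_feature_vector([0.12, 3.4, 5.0], expected_length=3))
    print(check_feature_vector([0.12, float("nan")], expected_length=3))
```

A vector that passes this check can still be wrong in meaning (for example, features in a different order than in training), so the check only catches length and value errors.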
The floe uses a convex box approach to determine the domain of applicability of each prediction. The input TensorFlow model dataset also contains a model-agnostic explainer, which is used to explain the prediction for each molecule.
The floe is inexpensive and fast: predicting the property of 50 molecules costs only a few cents.
Outputs:
Failure Data: The molecule (a) is too large or too small or (b) has an unknown atom.
No Confidence Data: The molecule falls outside the scope of the training set. In this case, the model predicts with no guarantees. The explainer image has a red background.
Success Data: The molecule falls (a) within scope and the explainer has a green background or (b) at the edge of scope and the explainer has a yellow background.
Molecules outside the scope of the training set are sent to the “No Confidence” port because their predictions cannot be considered reliable. Specifically, the scope is defined by the ranges of molecular weight, atom count, polar surface area, and calculated logP observed in the training set molecules. These ranges are reported in the Floe Report. More details on how the floe operates can be found in this tutorial and in the How-to Guide to Analyze the Machine Learning Predictions.
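The range-based scope check described above can be illustrated with a short sketch. It is written in plain Python under stated assumptions and is not the floe's implementation. The descriptor key names, the fit_box and classify functions, and in particular the edge_fraction rule for deciding when a molecule is "at the edge" of scope are all hypothetical; the source does not say how the edge is defined.

```python
from typing import Dict, List, Tuple

# Hypothetical key names for the four descriptors named in the text.
DESCRIPTORS = ("mol_weight", "atom_count", "polar_surface_area", "clogp")

Box = Dict[str, Tuple[float, float]]


def fit_box(training_rows: List[Dict[str, float]]) -> Box:
    """Record the (min, max) of each descriptor over the training set."""
    return {
        name: (
            min(row[name] for row in training_rows),
            max(row[name] for row in training_rows),
        )
        for name in DESCRIPTORS
    }


def classify(row: Dict[str, float], box: Box, edge_fraction: float = 0.05) -> str:
    """Return 'no_confidence', 'edge', or 'in_scope' for one molecule.

    Assumption: a molecule is 'at the edge' when it is inside every range
    but within edge_fraction of the range width from a bound.
    """
    at_edge = False
    for name in DESCRIPTORS:
        low, high = box[name]
        value = row[name]
        if value < low or value > high:
            return "no_confidence"  # outside the box in at least one descriptor
        margin = edge_fraction * (high - low)
        if value - low < margin or high - value < margin:
            at_edge = True
    return "edge" if at_edge else "in_scope"


if __name__ == "__main__":
    training = [
        {"mol_weight": 200.0, "atom_count": 15, "polar_surface_area": 40.0, "clogp": 0.0},
        {"mol_weight": 500.0, "atom_count": 35, "polar_surface_area": 120.0, "clogp": 5.0},
    ]
    box = fit_box(training)
    query = {"mol_weight": 350.0, "atom_count": 25, "polar_surface_area": 80.0, "clogp": 2.5}
    print(classify(query, box))
```

In terms of the floe's outputs, 'in_scope' and 'edge' would correspond to the Success port (green and yellow explainer backgrounds) and 'no_confidence' to the No Confidence port (red background).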
Promoted Parameters
Title in user interface (promoted name)
Inputs
Input Small Molecule(s) Dataset to predict property of. Needs to have a feature vector (floatvec) matching what machine learning model was built on. (in): The dataset(s) to read records from
Required
Type: data_source
Input tensorflow Model (tfm): Machine learning model to predict property.
Required
Type: data_source
Outputs
Output Property for Feature based Regression (out): Output dataset to which to write.
Required
Type: dataset_out
Default: Successful Feature ML Regression Prediction
Failed Property for Feature based Regression (failed_out): Output dataset to which to write.
Required
Type: dataset_out
Default: Failed Feature ML Regression Prediction
No-confidence Property for Feature based Regression (noconf): Output dataset to which to write.
Required
Type: dataset_out
Default: No-confidence Feature ML Regression Prediction
Machine Learning Model Options
Model ID of which Tensorflow model to use to predict. (tfmid): Which model to select. Make sure this matches the input Model ID.
Required
Type: integer
Preprocess Molecule (Preprocess Molecule): For every molecule, stores only largest component, adjusts ionization to neutral pH, rejects molecules that fail typecheck
Type: boolean
Default: True
Choices: [True, False]
Apply Blockbuster filter (Blockbuster Filter): Accept or reject molecules based on closeness to Blockbuster molecule properties. For details check toolkit oemolprop.
Type: boolean
Default: False
Choices: [True, False]
Number of features to explain (Top Feature Count): Number of top features to provide LIME explanations for
Required
Type: integer
Default: 5
Explanation and Validation
Custom Feature (custom_feature): Mandatory: Field containing feature vector. Must match feature input used to train model
Required
Type: field_parameter
Property Validation Field (val_r): If the input small molecule(s) dataset contains baseline values for the physical property, the Floe Report provides a comparison between the model predictions and the baseline. Make sure the baseline is in exactly the same units as the prediction.
Type: field_parameter::float
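As an illustration of the kind of comparison the Property Validation Field enables, the sketch below computes common regression error statistics between predictions and baseline values. It is a plain-Python example under assumptions: the source does not state which statistics the Floe Report shows, and the function name regression_summary is hypothetical. Both lists must be in the same units.

```python
import math
from typing import Dict, Sequence


def regression_summary(predicted: Sequence[float], baseline: Sequence[float]) -> Dict[str, float]:
    """Return MAE, RMSE, and R^2 of predictions against baseline values."""
    if len(predicted) != len(baseline) or len(predicted) == 0:
        raise ValueError("predicted and baseline must be non-empty and equal in length")
    n = len(predicted)
    errors = [p - b for p, b in zip(predicted, baseline)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_baseline = sum(baseline) / n
    ss_tot = sum((b - mean_baseline) ** 2 for b in baseline)
    # R^2 is undefined when all baseline values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}


if __name__ == "__main__":
    print(regression_summary([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

A large, systematic error across all molecules is often a sign of a unit mismatch between the baseline field and the model's output.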