ML Predict: Regression Using Feature Input

This is a floe that predicts the properties of small, drug-like molecules using a pretrained machine learning (ML) model.

It runs a TensorFlow-based fully connected neural network regression model. The model must be supplied by the user and can be generated with the ML Build: Regression Using Feature Input Floe. Every molecule must carry the user-provided features as a float vector.
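
As a rough illustration of what a fully connected regression model computes from a feature vector, the sketch below runs a forward pass in plain Python rather than TensorFlow. The layer sizes, the weights, the ReLU activation, and the names `dense`, `relu`, and `predict` are all assumptions made for the example; the floe's actual architecture comes from the trained model.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]  # (weights[input][output], bias[output])


def dense(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> List[float]:
    """One fully connected layer: out[j] = sum_i x[i] * weights[i][j] + bias[j]."""
    return [
        sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
        for j in range(len(bias))
    ]


def relu(values: Sequence[float]) -> List[float]:
    """Element-wise ReLU (assumed activation for the hidden layers)."""
    return [max(0.0, v) for v in values]


def predict(features: Sequence[float], layers: Sequence[Layer]) -> float:
    """Pass a feature vector through the hidden layers, then a linear output unit."""
    activation = list(features)
    for weights, bias in layers[:-1]:
        activation = relu(dense(activation, weights, bias))
    out_weights, out_bias = layers[-1]
    return dense(activation, out_weights, out_bias)[0]


# Hypothetical toy model: 3 input features -> 2 hidden units -> 1 output.
toy_layers: List[Layer] = [
    ([[0.5, -1.0], [0.25, 0.5], [-0.5, 1.0]], [0.0, 0.5]),
    ([[2.0], [-1.0]], [0.25]),
]
prediction = predict([1.0, 2.0, 3.0], toy_layers)
print(prediction)
```

The output layer is left linear because the model predicts a continuous property rather than a class.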

The floe uses a convex box approach to estimate the domain of applicability of each prediction. The input TensorFlow model dataset also contains a model-agnostic explainer (LIME) that explains the prediction for each molecule.

This floe runs quickly and is very inexpensive, costing about one cent to predict a property for 50 molecules.

Outputs:

Failure Data: (a) The molecule is too large or too small, or (b) the molecule has an unknown atom.

No Confidence Data: The molecule’s property falls outside the scope of the training set. In this case, the model predicts with no guarantees. The explainer image has a red background.

Success Data: (a) The molecule falls within scope; the explainer has a green background, or (b) it falls at the edge of scope; the explainer has a yellow background.

Molecules outside the scope of the training set are sent to the “No Confidence” port because their predictions cannot be considered reliable. Specifically, the scope is defined by the ranges of molecular weight, atom count, polar surface area, and calculated logP observed in the training set molecules. These ranges are given in the Floe Report.
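
The range-based scope described above can be sketched as a per-descriptor bounds check. This is a minimal illustration, not the floe's implementation: the descriptor keys, the example values, and the helper names `fit_ranges` and `out_of_scope` are hypothetical, and the "edge of scope" case is not modeled because its criterion is not specified here.

```python
from typing import Dict, List, Tuple

Ranges = Dict[str, Tuple[float, float]]


def fit_ranges(training_descriptors: List[Dict[str, float]]) -> Ranges:
    """Record the minimum and maximum of each descriptor over the training molecules."""
    keys = training_descriptors[0].keys()
    return {
        k: (
            min(d[k] for d in training_descriptors),
            max(d[k] for d in training_descriptors),
        )
        for k in keys
    }


def out_of_scope(descriptors: Dict[str, float], ranges: Ranges) -> List[str]:
    """Return the names of descriptors that fall outside the training ranges."""
    return [
        name
        for name, (low, high) in ranges.items()
        if not low <= descriptors[name] <= high
    ]


# Hypothetical descriptor values for three training molecules.
training = [
    {"mol_wt": 180.2, "atom_count": 13, "psa": 63.6, "logp": 1.2},
    {"mol_wt": 420.5, "atom_count": 30, "psa": 110.0, "logp": 4.1},
    {"mol_wt": 300.0, "atom_count": 22, "psa": 80.0, "logp": 2.5},
]
ranges = fit_ranges(training)

inside = {"mol_wt": 250.0, "atom_count": 18, "psa": 70.0, "logp": 2.0}
outside = {"mol_wt": 510.0, "atom_count": 36, "psa": 95.0, "logp": 3.0}

print(out_of_scope(inside, ranges))   # empty list: every descriptor is within range
print(out_of_scope(outside, ranges))  # lists the descriptors that exceed the ranges
```

Under this sketch, a molecule with any violated descriptor would be routed to the “No Confidence” port.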

Inputs

| Name | Description | Type |
| --- | --- | --- |
| Input Small Molecule Dataset(s) to Predict Property of | The dataset(s) to read records from. | Molecule Dataset |
| Input TensorFlow Model | Machine learning model to predict property. | Machine Learning TensorFlow Model Dataset |

Machine Learning Model Options

| Name | Description | Type |
| --- | --- | --- |
| Model ID of TensorFlow Model to Use to Predict* | Which model to select. Make sure this matches the ID of the input model. | Int |
| Preprocess Molecule | For every molecule, keeps only the largest component and adjusts ionization to neutral pH. | Bool |
| Apply Blockbuster Filter | Apply the Blockbuster filter. | Bool |
| Number of Features to Explain | Number of top features to report in the LIME explanations. | Int |
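
To illustrate what Number of Features to Explain controls, the sketch below keeps the top features of a local explanation, ranked by absolute weight. It assumes the explainer yields one signed weight per feature, which is how LIME-style local linear explanations are usually summarized; the feature names, the weights, and the `top_features` helper are hypothetical.

```python
from typing import Dict, List, Tuple


def top_features(weights: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k features with the largest absolute explanation weight."""
    ranked = sorted(weights.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# Hypothetical signed weights from a local linear explanation of one prediction.
example_weights = {
    "feature_0": 0.10,
    "feature_1": -0.85,
    "feature_2": 0.40,
    "feature_3": -0.05,
}
top_two = top_features(example_weights, k=2)
print(top_two)
```

Ranking by absolute value keeps features that push the prediction strongly in either direction; the sign is retained so the direction of the effect is still visible.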

Explanation and Validation

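| Name | Description | Type |
| --- | --- | --- |
| Property Validation Field | Field holding baseline property values, if the dataset has them. The Floe Report then compares the predictions against this baseline. | Float |
| Custom Feature | Field containing the feature vector to use as model input (the same feature the model was trained on). | FloatVec |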
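
When a Property Validation Field is present, predictions can be compared against the baseline values. The sketch below computes three common regression metrics; which statistics the Floe Report actually shows is not stated here, so the choice of MAE, RMSE, and R² and the `compare` helper are assumptions.

```python
import math
from typing import Dict, Sequence


def compare(predicted: Sequence[float], baseline: Sequence[float]) -> Dict[str, float]:
    """Compute MAE, RMSE, and R^2 of predictions against baseline values."""
    if len(predicted) != len(baseline) or not predicted:
        raise ValueError("predicted and baseline must be non-empty and equal in length")
    n = len(predicted)
    errors = [p - b for p, b in zip(predicted, baseline)]
    ss_res = sum(e * e for e in errors)
    mean_baseline = sum(baseline) / n
    ss_tot = sum((b - mean_baseline) ** 2 for b in baseline)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        # R^2 is undefined when the baseline values are all identical.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


# Hypothetical predicted and baseline property values for three molecules.
metrics = compare(predicted=[1.0, 2.0, 4.0], baseline=[1.0, 3.0, 3.0])
print(metrics)
```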

Outputs

| Name | Description | Type |
| --- | --- | --- |
| Output Property | Output dataset to write to. | Dataset |
| Failure Property | Output dataset to write to. | Dataset |