Solubility Prediction for Small Molecule using ML and Cheminfo Fingerprints

A Floe that predicts the solubility of small, drug-like molecules in log uM. It runs a TensorFlow-based fully-connected neural network regression model for prediction. It uses a ConvexBox approach and a TensorFlow-based probabilistic fully-connected neural network for domain of application and error bar prediction. Both TensorFlow models have been trained on 2D fingerprints.
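Because predictions are reported in log uM, it can help to convert a value back to a concentration. The following is a minimal sketch, assuming the logarithm is base 10 (the text does not state the base) and that the error bar is symmetric in log units. The function names are illustrative and are not part of the Floe.

```python
from typing import Tuple


def log_um_to_molar(log_um: float) -> float:
    """Convert a solubility in log10(uM) to mol/L (assumes base-10 logs)."""
    return 10.0 ** log_um * 1e-6


def interval_in_um(log_um: float, error_bar: float) -> Tuple[float, float]:
    """Return the (low, high) concentration in uM for prediction +/- error bar."""
    return 10.0 ** (log_um - error_bar), 10.0 ** (log_um + error_bar)


# A predicted value of 2.0 log uM corresponds to 100 uM, i.e. about 1e-4 mol/L.
molar = log_um_to_molar(2.0)
# An error bar of 0.5 log units spans roughly 31.6 uM to 316 uM.
low, high = interval_in_um(2.0, 0.5)
```

Note that a symmetric error bar in log units becomes an asymmetric interval once it is converted to a concentration.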

Finally, it uses LIME, a model-agnostic explanation method, to explain the predicted solubility of the molecule(s). The Floe is cheap and quick, adding about 1.5 seconds for property prediction of 10 molecules.
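The general idea behind LIME can be sketched in a few lines: perturb the input, weight each perturbed sample by its proximity to the original, and fit a local linear surrogate whose coefficients serve as the explanation. The sketch below is an illustration under stated assumptions, not the Floe's implementation: the perturbation scheme (randomly switching off set fingerprint bits), the kernel width, the use of ridge regression in place of a sparse linear model, and the toy model with its invented weights are all hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def lime_explain(predict, fingerprint, n_samples=200, kernel_width=0.75,
                 ridge=1e-6, seed=0):
    """Return {bit index: local weight} for the bits set in `fingerprint`."""
    rng = random.Random(seed)
    on_bits = [i for i, bit in enumerate(fingerprint) if bit]
    if not on_bits:
        return {}
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        mask = [rng.randint(0, 1) for _ in on_bits]  # 1 keeps a bit, 0 switches it off
        perturbed = list(fingerprint)
        for keep, idx in zip(mask, on_bits):
            perturbed[idx] = keep
        distance = 1.0 - sum(mask) / len(on_bits)    # fraction of set bits removed
        rows.append([1.0] + [float(v) for v in mask])  # leading 1.0 is the intercept
        targets.append(predict(perturbed))
        weights.append(math.exp(-(distance ** 2) / kernel_width ** 2))
    # Weighted ridge regression via the normal equations:
    # (X'WX + ridge * I) beta = X'Wy
    p = len(on_bits) + 1
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             + (ridge if i == j else 0.0) for j in range(p)] for i in range(p)]
    xtwy = [sum(w * r[i] * y for r, w, y in zip(rows, weights, targets))
            for i in range(p)]
    beta = solve(xtwx, xtwy)
    return dict(zip(on_bits, beta[1:]))


# Toy stand-in for the trained model: invented per-bit contributions.
TOY_WEIGHTS = {0: 0.8, 3: -1.2, 5: 0.3}


def toy_model(fp):
    return 1.5 + sum(w for i, w in TOY_WEIGHTS.items() if fp[i])


explanation = lime_explain(toy_model, [1, 0, 0, 1, 0, 1, 0, 1])
```

Because the toy model is exactly linear, the surrogate recovers its weights almost exactly, with bit 7 receiving a weight close to zero. For a real neural network the coefficients would only describe the model's behaviour near the molecule being explained.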

Outputs:

Failure Dataset: (a) Molecule is too large or too small, or (b) molecule has an atom not encountered in the training set.

No Confidence Dataset: Molecule is deemed out of scope compared to the training set (details below). In this case, the model predictions are unreliable. The explainer image has a red background.

Success Dataset: (a) Molecule falls within scope; the explainer image has a green background. (b) Molecule falls at the edge of scope; the explainer image has a yellow background.
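The mapping from outcome to output dataset and explainer background can be summarised as a lookup table. This is a minimal sketch; the `Outcome` enum and the `route` function are hypothetical names used only for illustration, and no background is listed for the Failure Dataset because the text does not describe one.

```python
from enum import Enum


class Outcome(Enum):
    FAILED = "failed"        # too large, too small, or contains an unseen atom
    OUT_OF_SCOPE = "out"     # outside the scope of the training set
    EDGE_OF_SCOPE = "edge"   # at the edge of the scope
    IN_SCOPE = "in"          # within the scope


# (output dataset, explainer background); None means no background is described.
ROUTING = {
    Outcome.FAILED: ("Failure", None),
    Outcome.OUT_OF_SCOPE: ("No Confidence", "red"),
    Outcome.EDGE_OF_SCOPE: ("Success", "yellow"),
    Outcome.IN_SCOPE: ("Success", "green"),
}


def route(outcome):
    """Return the (dataset, background) pair for a given outcome."""
    return ROUTING[outcome]


dataset, background = route(Outcome.EDGE_OF_SCOPE)
```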

Molecules outside the scope of the training set are sent to the ‘No Confidence’ port because their predictions are unreliable. Specifically, the scope is defined by the ranges of molecular weight, atom count, polar surface area, and calculated logP spanned by the training set molecules. These ranges are reported in the Floe report.
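A range-based scope check of this kind can be sketched as follows. The descriptor keys and all numeric values are made up for illustration, and the sketch only distinguishes inside from outside; how the Floe decides that a molecule is at the edge of scope is not stated here, so it is left out.

```python
def fit_ranges(training_rows):
    """Compute the (min, max) of each descriptor over the training set."""
    keys = training_rows[0].keys()
    return {k: (min(r[k] for r in training_rows),
                max(r[k] for r in training_rows)) for k in keys}


def out_of_range(row, ranges):
    """List the descriptors of `row` that fall outside the training ranges."""
    return [k for k, (lo, hi) in ranges.items() if not lo <= row[k] <= hi]


# Invented descriptor values standing in for training set molecules.
training = [
    {"mol_weight": 180.2, "atom_count": 13, "psa": 63.6, "logp": 1.2},
    {"mol_weight": 320.4, "atom_count": 23, "psa": 40.5, "logp": 3.1},
    {"mol_weight": 450.6, "atom_count": 32, "psa": 95.0, "logp": 4.4},
]
ranges = fit_ranges(training)

inside = {"mol_weight": 300.0, "atom_count": 21, "psa": 70.0, "logp": 2.5}
outside = {"mol_weight": 612.7, "atom_count": 44, "psa": 70.0, "logp": 2.5}

violations_inside = out_of_range(inside, ranges)    # empty: within every range
violations_outside = out_of_range(outside, ranges)  # mol_weight and atom_count exceed the ranges
```

Returning the list of violated descriptors, rather than a bare flag, makes it easier to see why a molecule was sent to the ‘No Confidence’ port.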

Inputs

Name: Input Small Molecule(s) Dataset to predict property of
Description: The dataset(s) to read records from
Type: Molecule Dataset

Explanation and Validation

Name: Molecule Explainer Type
Description: Select the explainer visualisation. Atom: annotate atoms only; Fragment: annotate fragments; Combined: annotate both.
Type: List

Name: Property Validation Field
Description: If the dataset has a baseline value for the property, the Floe reports a comparison between the prediction and the baseline in the Floe report.
Type: Float

Outputs

Name: Output Solubility
Description: Output dataset to write to
Type: Dataset

Name: No-confidence Solubility
Description: Output dataset to write to
Type: Dataset

Name: Failed Dataset Name
Description: Output dataset to write to
Type: Dataset