Solubility Prediction for Small Molecule using Machine Learning and Cheminfo Fingerprints

Solubility Prediction for Small Molecule using Machine Learning and Cheminfo Fingerprints is a floe that predicts solubility in log uM. It runs a TensorFlow-based fully connected neural network regression model for the prediction, and a TensorFlow-based probabilistic fully connected neural network to predict the Domain of Application and error bars. Finally, it uses LIME, a model-agnostic explanation system, to explain the predictions on each molecule. The floe is cheap and quick to run: property prediction for 50 molecules costs about one cent.

  • The machine learning model that predicts solubility was trained on the ChEMBL30 dataset.

  • The ChEMBL30 data comes from http://www.ebi.ac.uk/chembl (ChEMBL version ChEMBL_030).

  • After post-processing, the training set contains 30,000 molecules. Solubility is classified as Low, Medium, or High according to the following standard from the USP Solubility chart.

Solubility Value

Gradation    Value in log uM
High         >= 3
Med          >= 1.51 and < 3
Low          < 1.51
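The gradation thresholds above can be sketched in a few lines of Python. This is only an illustration, not the floe's own code: the function names `classify_solubility` and `log_um_to_um` are hypothetical, and the conversion assumes that "log uM" means the base-10 logarithm of the concentration in micromolar.

```python
# Thresholds taken from the Solubility Value table (values in log uM).
HIGH_THRESHOLD = 3.0
MED_THRESHOLD = 1.51


def classify_solubility(log_um: float) -> str:
    """Map a predicted solubility in log uM to Low, Med, or High."""
    if log_um >= HIGH_THRESHOLD:
        return "High"
    if log_um >= MED_THRESHOLD:
        return "Med"
    return "Low"


def log_um_to_um(log_um: float) -> float:
    """Convert log uM to uM, assuming a base-10 logarithm (an assumption)."""
    return 10.0 ** log_um


if __name__ == "__main__":
    for value in (3.2, 1.51, 0.4):
        label = classify_solubility(value)
        print(f"{value:5.2f} log uM -> {label} ({log_um_to_um(value):.1f} uM)")
```

Both boundary values belong to the higher class: exactly 3 is High and exactly 1.51 is Med, matching the ">=" entries in the table.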

Inputs

Name                                                      Description                            Type
Input Small Molecule(s) Dataset to predict property of    The dataset(s) to read records from    Molecule Dataset

Explanation and Validation

Name                         Description                                                     Type
Molecule Explainer Type      Select the explainer visualisation.                             List
                             Atom: annotate atoms only.
                             Fragment: annotate fragments.
                             Combined: annotate both.
Property Validation Field    If the dataset has a baseline, the floe reports a comparison    Float
                             between the prediction and the baseline in the Floe Report.
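To make the baseline comparison concrete, here is a minimal sketch of how predicted values might be compared with a validation field. The source does not say which statistics the Floe Report contains, so the mean absolute error and root-mean-square error used here are assumptions, and `compare_to_baseline` is a hypothetical helper rather than part of the floe.

```python
import math
from typing import Dict, Sequence


def compare_to_baseline(predicted: Sequence[float],
                        baseline: Sequence[float]) -> Dict[str, float]:
    """Return MAE and RMSE between predictions and baseline values (log uM)."""
    if len(predicted) != len(baseline) or len(predicted) == 0:
        raise ValueError("predicted and baseline must be non-empty and equal in length")
    # Signed error for each molecule: prediction minus baseline.
    errors = [p - b for p, b in zip(predicted, baseline)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}


if __name__ == "__main__":
    # Made-up illustrative numbers, not results from the floe.
    stats = compare_to_baseline([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
    print(f"MAE = {stats['mae']:.3f}, RMSE = {stats['rmse']:.3f}")
```

Both statistics are in the same log uM units as the prediction, which makes them easy to read against the gradation thresholds.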

Outputs

Name                Description                   Type
Output Property     Output dataset to write to    Dataset
Failure Property    Output dataset to write to    Dataset