Solubility Prediction for Small Molecule using Machine Learning and Cheminfo Fingerprints
Solubility Prediction for Small Molecule using Machine Learning and Cheminfo Fingerprints is a floe that predicts solubility in log uM. It runs a TensorFlow-based fully connected neural network regression model for the prediction, and a TensorFlow-based probabilistic fully connected neural network to estimate the domain of application and the error bars. Finally, it uses LIME, a model-agnostic explanation method, to explain the prediction for each molecule. The floe is cheap and quick to run: property prediction for 50 molecules costs about one cent.
The machine learning model that predicts solubility was trained on the ChEMBL30 dataset.
ChEMBL30 data comes from http://www.ebi.ac.uk/chembl; the ChEMBL version used is ChEMBL_030.
After post-processing, the training set contains 30,000 molecules. Solubility is classified as Low, Medium, or High using the following thresholds, taken from the USP solubility chart.
Gradation | Value in log uM |
---|---|
High | >=3 |
Med | >=1.51 and <3 |
Low | <1.51 |
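
The thresholds in the table above can be applied to predicted values with a few comparisons. The following is a minimal sketch, not the floe's own code: the function name `classify_solubility` and the sample values are hypothetical, and the only thing taken from this page is the set of cut-offs (3 and 1.51 log uM).

```python
import math


def classify_solubility(log_um: float) -> str:
    """Map a solubility value in log uM to the gradation used on this page.

    Hypothetical helper for illustration; the cut-offs come from the table above.
    """
    if math.isnan(log_um):
        # NaN fails every comparison and would otherwise fall through to "Low".
        raise ValueError("solubility value is NaN")
    if log_um >= 3:
        return "High"
    if log_um >= 1.51:
        return "Med"
    return "Low"


# Made-up example values, chosen to include both boundary cases.
predictions = [3.2, 3.0, 2.0, 1.51, 1.5, -0.4]
labels = [classify_solubility(p) for p in predictions]
print(labels)
```

Both boundaries are inclusive on the upper class, so a value of exactly 3 is High and a value of exactly 1.51 is Med.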

Name | Description | Type |
---|---|---|
Input Small Molecule(s) Dataset to predict property of | The dataset(s) to read records from | Molecule Dataset |

Name | Description | Type |
---|---|---|
Molecule Explainer Type | Select the explainer visualisation. Atom: annotate atoms only; Fragment: annotate fragments; Combined: annotate both | List |
Property Validation Field | If the dataset has a baseline value, the floe reports a comparison between the prediction and the baseline in the Floe Report | Float |

Name | Description | Type |
---|---|---|
Output Property | Output dataset to write to | Dataset |
Failure Property | Output dataset to write to | Dataset |