hERG Toxicity Prediction for Small Molecules using ML and Cheminfo Fingerprints

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Solution-based/Hit to Lead/Properties/Solubility

  • Task-based/ADME & Tox Assessment

  • Solution-based/Hit to Lead/Properties/Model Building

Description

A floe that predicts the hERG toxicity of small, drug-like molecules as active (toxic) or inactive (nontoxic). The model is trained on a combination of ChEMBL240 (https://www.ebi.ac.uk/chembl/explore/target/CHEMBL240) and the Riken dataset. A toxicity value of <=10 µM is predicted as active. The floe runs a TensorFlow-based, fully connected neural network regression model for prediction, and uses convex box and Monte Carlo based approaches for the domain of application and error bar predictions. The TensorFlow models have been trained on 2D fingerprints.
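The 10 µM cutoff and the idea of Monte Carlo error bars can be illustrated with a minimal sketch. It is not the floe's implementation: the function names (`label_herg`, `summarize_samples`) are hypothetical, and it assumes that repeated stochastic model predictions for one molecule are already available as a plain list of values in µM.

```python
from statistics import mean, stdev

# From the description: a toxicity value of <= 10 uM counts as active (toxic).
ACTIVE_THRESHOLD_UM = 10.0


def label_herg(value_um: float) -> str:
    """Map a toxicity value in uM to 'active' or 'inactive'."""
    if value_um <= 0:
        raise ValueError("toxicity value must be positive")
    return "active" if value_um <= ACTIVE_THRESHOLD_UM else "inactive"


def summarize_samples(samples_um):
    """Collapse repeated stochastic predictions into a mean, a spread, and a label.

    Assumption: the spread (sample standard deviation) stands in for the
    error bar; the floe may compute its error bars differently.
    """
    if len(samples_um) < 2:
        raise ValueError("need at least two samples to estimate a spread")
    center = mean(samples_um)
    return {
        "mean_um": center,
        "std_um": stdev(samples_um),
        "label": label_herg(center),
    }
```

For example, `summarize_samples([8.0, 12.0, 10.0])` gives a mean exactly at the cutoff, so the molecule is labeled active, with the spread of the samples reported alongside it.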

Finally, it uses LIME, a model-agnostic system, to explain the hERG toxicity of the molecule(s). The floe is cheap and quick, taking only a few seconds to predict the property for 10 molecules.

Outputs:

  • Failure Dataset: The molecule (a) is too large or too small or (b) has an atom not encountered in the training set.

  • No Confidence Dataset: The molecule is deemed out of scope compared to the training set (details below). In this case, the model predictions are unreliable. The explainer image has a red background.

  • Success Dataset: The molecule falls (a) within scope and the explainer has a green background or (b) at the edge of scope and the explainer has a yellow background.

Molecules outside the scope of the training set are sent to the “No Confidence” port, because their predictions cannot be considered reliable. Specifically, the scope is defined by the ranges of molecular weight, atom count, polar surface area, and calculated logP spanned by the training set molecules. These ranges are reported in the Floe Report. More details on how the floe operates can be found in this tutorial and in the How-to Guide to Analyze Machine Learning Predictions.
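The scope check described above can be sketched as a simple per-property range test. This is an illustration under stated assumptions, not the floe's code: the numeric ranges are placeholders (the real ones are reported in the Floe Report), the `Range` and `route` names are hypothetical, and "edge of scope" is assumed here to mean within 5% of a range boundary. The port names (`out`, `noconf`) and background colors follow this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    low: float
    high: float

    def position(self, value: float, edge_fraction: float) -> str:
        """Return 'out', 'edge', or 'in' for a value relative to this range."""
        if value < self.low or value > self.high:
            return "out"
        margin = (self.high - self.low) * edge_fraction
        if value < self.low + margin or value > self.high - margin:
            return "edge"
        return "in"


# Placeholder ranges (assumption); the actual ranges come from the Floe Report.
TRAINING_SCOPE = {
    "mol_weight": Range(150.0, 700.0),
    "atom_count": Range(10, 50),
    "polar_surface_area": Range(0.0, 200.0),
    "clogp": Range(-2.0, 8.0),
}


def route(descriptors, scope=TRAINING_SCOPE, edge_fraction=0.05):
    """Return (port, explainer background color) for one molecule."""
    positions = [
        scope[name].position(descriptors[name], edge_fraction) for name in scope
    ]
    if "out" in positions:
        return ("noconf", "red")   # out of scope: prediction is unreliable
    if "edge" in positions:
        return ("out", "yellow")   # success, but at the edge of scope
    return ("out", "green")        # success, well within scope
```

A molecule is routed to the no-confidence port as soon as any single property falls outside its range, which is what makes the scope a box rather than a distance-based measure.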

Promoted Parameters

Title in user interface (promoted name)

Inputs

Input Small Molecule dataset for Prediction (in): The dataset(s) to read records from

  • Required

  • Type: data_source

Outputs

Output hERG Toxicity (out): Output dataset to which to write.

  • Required

  • Type: dataset_out

  • Default: Predict hERG Toxicity

No-confidence Output hERG Toxicity (noconf): Output dataset to which to write.

  • Required

  • Type: dataset_out

  • Default: No-confidence Predict hERG Toxicity

Failed Dataset Name (failed_out): Output dataset to which to write.

  • Required

  • Type: dataset_out

  • Default: Failure hERG Toxicity

Explanation and Validation

Molecule Explainer Type (molecule_explainer_type): Select the explainer visualisation. Atom: annotate atoms only; Fragment: annotate fragments; Combined: annotate both.

  • Type: string

  • Default: Atom

  • Choices: [‘Combined’, ‘Fragment’, ‘Atom’]

Property Validation Field (val_r): If the input small molecule dataset has a baseline hERG toxicity value, the Floe Report provides a comparison between model prediction and baseline. Make sure the baseline contains active and inactive fields, as the ML model was trained on them.

  • Type: field_parameter::string
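As a rough illustration of the kind of comparison the Property Validation Field enables, the sketch below tallies agreement between predicted and baseline labels. It assumes both are available as lists of "active"/"inactive" strings; the function name `compare_to_baseline` and the choice of metrics are hypothetical and do not describe what the Floe Report actually computes.

```python
from collections import Counter

LABELS = {"active", "inactive"}


def compare_to_baseline(predicted, baseline):
    """Count agreements and disagreements between predictions and a baseline."""
    if len(predicted) != len(baseline):
        raise ValueError("predicted and baseline must have the same length")
    if not predicted:
        raise ValueError("nothing to compare")
    if not (set(predicted) | set(baseline)) <= LABELS:
        raise ValueError("labels must be 'active' or 'inactive'")

    # Keys are (baseline label, predicted label) pairs.
    counts = Counter(zip(baseline, predicted))
    tp = counts[("active", "active")]
    tn = counts[("inactive", "inactive")]
    fp = counts[("inactive", "active")]
    fn = counts[("active", "inactive")]
    return {
        "true_active": tp,
        "true_inactive": tn,
        "false_active": fp,
        "false_inactive": fn,
        "accuracy": (tp + tn) / len(predicted),
    }
```

A baseline that contains only one of the two classes would leave half of these counts at zero, which is one reason to check that both active and inactive molecules are present before validating.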