ML Regression Model Building using Fingerprints for Small Molecules

‘ML Regression Model Building using Fingerprints for Small Molecules’ is a floe that trains multiple neural network (full or probabilistic) regression models on physical properties of small molecules. It builds machine learning models for every combination of the cheminformatics and neural network hyperparameters listed below, and generates a floe report containing details of the best models built. Users can pick any of these models and use it to predict the properties of other molecules in a separate floe (Predict Physical Properties). The floe report presents detailed statistics on the hyperparameters so that they can be tweaked to build better models (see documentation). In addition to predictions, the built models provide explanations of their predictions and confidence intervals. NOTE: By default, this floe builds about 1,000 machine learning models. On a large dataset, this may be expensive. Refer to the documentation on how to build a cheaper version of the same floe.

Inputs

Name

Description

Type

Input Small Molecules to train machine learning models on.

Input dataset file in which each record contains a molecule and a response value (float) to train on.

Molecule Dataset

Options

Name

Description

Type

Response Value Field

Name of the field containing the primary data being trained on and predicted.

Float

Number of Models to show in Floe report

How many of the best models to include in the floe report. By default, keeps the best
20 models (ranked by R² score), subject to memory requirements.

Int
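The ranking by R² can be illustrated with a small, self-contained sketch. The function names `r2_score` and `best_models` and the example data are hypothetical; the sketch assumes each model's predictions on a held-out set are already available, and it does not model the floe's memory-based cutoff.

```python
def r2_score(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


def best_models(predictions: dict[str, list[float]], y_true: list[float],
                n: int = 20) -> list[tuple[str, float]]:
    """Score every model by R² and keep the n highest-scoring ones."""
    scored = [(name, r2_score(y_true, preds)) for name, preds in predictions.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0]
    preds = {
        "model_a": [1.1, 1.9, 3.2, 3.8],  # close fit
        "model_b": [2.0, 2.0, 2.0, 2.0],  # constant prediction, negative R²
        "model_c": [1.0, 2.0, 3.0, 4.0],  # perfect fit
    }
    print([name for name, _ in best_models(preds, y, n=2)])
```

With `n=2`, the perfect and close fits are kept and the constant predictor is dropped.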

Training tensorflow probability models

True: builds a TensorFlow Probability-based neural network model for
estimating the domain of applicability/error bars.
False: builds a TensorFlow neural network model for prediction and explanation.

Bool

Preprocess Molecule

Preprocess molecules: adjust to neutral pH, keep the largest molecule, and apply the Blockbuster filter.

Bool

Apply Blockbuster filter

For every molecule, keeps only the largest component and adjusts ionization to neutral pH.

Bool

Negative Log

Transform the response value to its negative logarithm.
Applies only to regression.

Bool
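A minimal sketch of the transform, assuming base-10 logarithms (as when converting a molar IC50 to a pIC50). This page does not state the base or the expected units, and `negative_log` / `inverse_negative_log` are hypothetical helper names.

```python
import math


def negative_log(values: list[float]) -> list[float]:
    """Map each positive response value y to -log10(y); base 10 is an assumption."""
    transformed = []
    for y in values:
        if y <= 0:
            raise ValueError(f"negative log is undefined for non-positive value {y!r}")
        transformed.append(-math.log10(y))
    return transformed


def inverse_negative_log(values: list[float]) -> list[float]:
    """Undo the transform, mapping p back to 10 ** (-p)."""
    return [10.0 ** (-p) for p in values]


if __name__ == "__main__":
    ic50_molar = [1e-6, 1e-9]
    p_values = negative_log(ic50_molar)
    print([round(v, 6) for v in p_values])  # [6.0, 9.0]
```

Because the logarithm is undefined for zero or negative values, a dataset with such responses cannot use this option without prior cleanup.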

Molecule Explainer Type

Select the explainer visualization. Atom: annotate atoms only;
Fragment: annotate fragments only; Combined: annotate both.

List

Cheminfo Fingerprint Options: Build models for all possible combinations of fingerprints

Name

Description

Type

Min Radius

Minimum radius for cheminfo fingerprint.

IntVec

Max Radius

Maximum radius for cheminfo fingerprint.

IntVec

Bit Length of FP

Bit length of the cheminfo fingerprint.

IntVec

Type of FP

Type of cheminfo fingerprints

IntVec

Neural Network Hyperparameter Options: Build models for all possible combinations of hyperparameters

Name

Description

Type

Dropouts

List of dropout hyperparameters.

FloatVec

Sets of Hidden Layers

List(s) of hidden layer sizes, with sets separated by -1. The input and output layers are determined by the data.
E.g., 150,100,50 creates a neural network with 3 hidden layers of sizes 150, 100, and 50.

IntVec
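A minimal sketch of how a -1-separated vector might be split into individual sets, assuming the separator is a standalone -1 entry and that empty sets (for example, two consecutive -1 values) are ignored. The helper `split_sets` is hypothetical; the same -1 convention is described for Sets of Regularisation Layers, so the helper accepts float values as well.

```python
def split_sets(values: list[float], sep: float = -1) -> list[list[float]]:
    """Split a flat vector into sub-lists at every occurrence of `sep`."""
    sets: list[list[float]] = []
    current: list[float] = []
    for v in values:
        if v == sep:
            if current:  # skip empty sets produced by repeated separators
                sets.append(current)
            current = []
        else:
            current.append(v)
    if current:  # the last set has no trailing separator
        sets.append(current)
    return sets


if __name__ == "__main__":
    # Two hidden-layer configurations: [150, 100, 50] and [200, 100]
    print(split_sets([150, 100, 50, -1, 200, 100]))
```

Each resulting set then describes one candidate network architecture to include in the hyperparameter combinations.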

Sets of Regularisation Layers

List(s) of per-layer regularisation values, with sets separated by -1.
No regularisation is applied to the input and output layers.

FloatVec

Learning Rates

List of learning rates to train models with.

FloatVec

Max Epochs

Maximum number of epochs to train each model.

Int

Activation

Activation functions: ReLU, LeakyReLU, PReLU, tanh, SELU, ELU.

List

Batch Size

Batch size for training the regressor.

Int
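To estimate how many models a given configuration will produce, the fingerprint and hyperparameter options above can be treated as a grid whose size is the product of the number of values given for each option. The sketch below assumes every combination is built (including every min/max radius pairing) and uses made-up option values, not the floe's defaults. `EXAMPLE_OPTIONS`, `grid_size`, and `build_grid` are hypothetical names. Regularisation sets are left out because this page does not say how they are paired with hidden-layer sets.

```python
import itertools
import math
from typing import Any, Iterator

# Hypothetical option values for illustration only; these are NOT the floe's defaults.
EXAMPLE_OPTIONS: dict[str, list[Any]] = {
    "min_radius": [0, 1],
    "max_radius": [3, 5],
    "bit_length": [1024, 2048],
    "fp_type": [1, 2],  # placeholder integer codes for fingerprint types
    "dropout": [0.1, 0.2],
    "hidden_layers": [[150, 100, 50], [200, 100]],
    "learning_rate": [1e-3, 1e-4],
    "activation": ["relu", "selu"],
}


def grid_size(options: dict[str, list[Any]]) -> int:
    """Number of combinations: the product of the number of values per option."""
    return math.prod(len(values) for values in options.values())


def build_grid(options: dict[str, list[Any]]) -> Iterator[dict[str, Any]]:
    """Yield one configuration per combination, assuming a full Cartesian product."""
    keys = list(options)
    for combo in itertools.product(*options.values()):
        yield dict(zip(keys, combo))


if __name__ == "__main__":
    print(grid_size(EXAMPLE_OPTIONS))  # 2 ** 8 = 256
    print(next(build_grid(EXAMPLE_OPTIONS)))
```

Because the total is a product, adding a third value to any option that currently has two multiplies the grid size by 1.5, so trimming the option lists is the most direct way to reduce the number of models built.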

Outputs

Name

Description

Type

Models Built

Output of the generated models.

Dataset

Failure Output

Output of failures.

Dataset