ML ReBuild: Transfer Learn ML Regression Model using Fingerprints for Small Molecules

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Solution-based/Hit to Lead/Properties/Model Building

  • Task-based/ADME & Tox Assessment

  • Task-based/Data Science

  • Task-based/Cheminformatics

Description

This floe performs transfer learning on pre-built neural network regression models that predict properties of small molecules.

The input models are retrained on 2D fingerprints, which are generated within the floe itself. Every molecule in the input dataset must have a value in the property column to train on; molecules without one are ignored.
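The rule that molecules without a property value are ignored can be illustrated with a minimal sketch. The record layout (a SMILES string paired with an optional float) and the helper name `keep_trainable` are hypothetical and are not the floe's actual data model; treating NaN as a missing value is also an assumption.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical record layout: (SMILES, response value or None).
Record = Tuple[str, Optional[float]]

def keep_trainable(records: List[Record]) -> List[Tuple[str, float]]:
    """Keep only records that carry a usable response value."""
    kept = []
    for smiles, value in records:
        # Assumption: a missing or NaN response means the molecule is skipped.
        if value is None or math.isnan(value):
            continue
        kept.append((smiles, float(value)))
    return kept

records = [("CCO", 1.2), ("c1ccccc1", None), ("CC(=O)O", 0.4)]
training_set = keep_trainable(records)
```

Here `training_set` holds two of the three records, because the benzene record has no response value.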

The floe builds a machine learning model for every combination of the cheminformatics (fingerprint) hyperparameters provided in the advanced sections. Read the documentation to learn more about these parameters and how they should be set for a given training data set.
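To see how quickly the number of combinations grows, the following sketch expands a small grid with `itertools.product`. The parameter names and values in `fingerprint_grid` are hypothetical placeholders, not the floe's actual parameter names or defaults.

```python
from itertools import product

# Hypothetical fingerprint hyperparameters; names and values are illustrative only.
fingerprint_grid = {
    "fp_type": ["circular", "tree", "path"],
    "min_radius": [0, 1],
    "max_radius": [3, 5],
    "bit_length": [2048, 4096],
}

def expand_grid(grid):
    """Return one dict per combination of the listed values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

combos = expand_grid(fingerprint_grid)
print(len(combos))  # 3 * 2 * 2 * 2 = 24 combinations
```

Each extra value in any list multiplies the total, so trimming the lists is the most direct way to reduce the number of models built.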

By default, the front layers are withheld from training and only the later layers are retrained on the new data. The Freeze Layer parameter can be used to control how many top layers to keep constant during training.
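The idea of freezing layers can be sketched without any deep learning library. The `Layer` class below is a stand-in that borrows the `trainable` flag name used by Keras layers, and `freeze_front_layers` is a hypothetical helper. It assumes the frozen layers are counted from the input side of the network, which may not match how the floe interprets the Freeze Layer parameter.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    """Stand-in for a network layer; Keras layers expose the same flag name."""
    name: str
    trainable: bool = True

def freeze_front_layers(layers, n_frozen):
    """Mark the first n_frozen layers as non-trainable and the rest as trainable."""
    n_frozen = max(0, min(n_frozen, len(layers)))  # clamp to a valid range
    for index, layer in enumerate(layers):
        layer.trainable = index >= n_frozen
    return layers

network = [Layer("dense_1"), Layer("dense_2"), Layer("dense_3"), Layer("output")]
freeze_front_layers(network, n_frozen=2)
trainable_names = [layer.name for layer in network if layer.trainable]
```

With two layers frozen, only `dense_3` and `output` would be updated when the model is retrained on the new data.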

The floe generates a Floe Report containing details of the best models built. The user can pick any model and use it to predict properties of other molecules in the ML Predict: Regression using Fingerprints for Small Molecules Floe. The Floe Report also presents detailed statistics on the hyperparameters, so the user can adjust them and rerun the floe to build better models.

In addition to prediction, the built models provide an explanation of predictions, a confidence interval, and the domain of application.

Warning: By default, this floe builds about 1,000 machine learning models. On a large dataset, this may be expensive. Since multiple parameters contribute to this cost, refer to the documentation on how to build a cheaper version for practice. Building decent models generally requires a dataset of at least 100 molecules (barring exceptions). The floe has been stress tested on up to 50,000 molecules. To run on larger datasets, it is recommended to increase the memory and disk space requirements of the cubes.

Promoted Parameters

Title in user interface (promoted name)

Inputs

Input Small Molecules to train machine learning models on (in): Input dataset file in which each record contains a molecule and a response value (float) to train on.

  • Required

  • Type: data_source

Input tensorflow Model (tfm): Machine learning model to predict property.

  • Required

  • Type: data_source

Outputs

Transfer Learning Models Built (out): Output of generated models.

  • Required

  • Type: dataset_out

  • Default: Transfer Learning Output Model

Failure Output (failed_out): Output of failure.

  • Required

  • Type: dataset_out

  • Default: Failed Transfer Learning Model

Options

Select models for transfer learning training (tfr_id): The default, -1, trains all models in the input data. Otherwise, enter one or more model IDs to train on.

  • Required

  • Type: integer

  • Default: [-1]
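The selection rule for model IDs can be written out as a short sketch. The helper `select_model_ids` is hypothetical, and raising an error for unknown IDs is an assumption; the floe's actual handling of invalid IDs is not described here.

```python
def select_model_ids(requested, available):
    """Return the model IDs to retrain; -1 selects every available model."""
    if -1 in requested:
        return sorted(available)
    missing = sorted(set(requested) - set(available))
    if missing:
        # Assumption: unknown IDs are reported rather than silently dropped.
        raise ValueError(f"Unknown model IDs: {missing}")
    return sorted(set(requested))

all_models = select_model_ids([-1], available=[3, 1, 2])
some_models = select_model_ids([2, 3], available=[1, 2, 3])
```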

Response Value Field. Must match what the model was trained on. (val_field_r): Name of the field containing the primary data being trained on and predicted. Molecules without this value are ignored.

  • Required

  • Type: field_parameter::float

Number of Models to show in Floe Report (number_of_models_to_show_in_floe_report): Number of best models to include in the Floe Report. By default, the best 5 models (ranked by r2 score) are kept so that the report stays within the memory requirement.

  • Type: integer

  • Default: 5
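Ranking by r2 score can be sketched as follows. The function names and the example scores are hypothetical; `r2_score` uses the standard coefficient of determination, 1 - SS_res / SS_tot, and assumes the true values are not all identical (otherwise SS_tot is zero).

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def best_models(scores, n=5):
    """Return the names of the n models with the highest r2 score."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Hypothetical scores for three models.
scores = {"model_a": 0.81, "model_b": 0.64, "model_c": 0.90}
top_two = best_models(scores, n=2)
```

A larger value of `n` shows more models in the report at the cost of more memory.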

Preprocess Molecule (Preprocess Molecule): For every molecule, keeps only the largest component, adjusts ionization to neutral pH, and rejects molecules that fail the type check.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Apply Blockbuster filter (Blockbuster Filter): Accepts or rejects molecules based on closeness to Blockbuster molecule properties. For details, see the oemolprop toolkit documentation.

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Negative Log (Negative Log): Transforms the Response Value Field to its negative logarithm. Helpful, for instance, to convert IC50 to pIC50.

  • Type: boolean

  • Default: False

  • Choices: [True, False]
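As a worked example of the negative log transform, the sketch below converts an IC50 to a pIC50. It assumes a base-10 logarithm and a concentration expressed in molar units, which is the usual pIC50 convention; whether the floe applies any unit conversion is not stated here. The function name `negative_log` is hypothetical.

```python
import math

def negative_log(value_molar):
    """Return -log10(value); assumes the value is a molar concentration."""
    if value_molar <= 0:
        raise ValueError("negative log is undefined for non-positive values")
    return -math.log10(value_molar)

pic50 = negative_log(1e-6)  # IC50 of 1 micromolar, expressed in molar
```

Under these assumptions, an IC50 of 1 micromolar corresponds to a pIC50 of about 6, and more potent compounds get larger values.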

Molecule Explainer Type (explainer_type): Selects the explainer visualization. Atom: annotate atoms only; Fragment: annotate fragments; Combined: annotate both.

  • Type: string

  • Default: Fragment

  • Choices: ['Combined', 'Fragment', 'Atom']