ML ReBuild: Transfer Learn ML Regression Model using Fingerprints for Small Molecules
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Solution-based/Hit to Lead/Properties/Model Building
Task-based/ADME & Tox Assessment
Task-based/Data Science
Task-based/Cheminformatics
Description
This floe performs transfer learning on pre-built neural network regression models that predict properties of small molecules.
The input models are retrained on 2D fingerprints, which the floe generates itself. Every molecule in the input dataset needs a property column to train on; molecules without one are ignored.
It builds machine learning models for all possible combinations of cheminformatics (fingerprint) hyperparameters provided in the advanced sections. Read the documentation to learn more about these parameters and how they should be set for a given training data set.
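As a rough illustration of why "all possible combinations" grows quickly, the sketch below expands a small grid of fingerprint settings into one configuration per model. The parameter names and values are hypothetical placeholders, not the floe's actual parameters or defaults.

```python
from itertools import product

# Hypothetical fingerprint hyperparameter grid. The names and values are
# illustrative only; they are not the floe's actual parameters or defaults.
fingerprint_grid = {
    "fp_type": ["circular", "path", "tree"],
    "min_radius": [0, 1],
    "max_radius": [2, 3, 4],
    "bit_length": [2048, 4096],
}


def expand_grid(grid):
    """Return one dict per combination of the listed values."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


combinations = expand_grid(fingerprint_grid)
print(len(combinations))  # 3 * 2 * 3 * 2 combinations
print(combinations[0])
```

Each extra value added to any one list multiplies the number of models to build, so trimming a single list is usually the quickest way to reduce the cost of a run.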
By default, the front layers are withheld from training and only the later layers are retrained on the new data. The Freeze Layer parameter can be used to control how many top layers to keep constant during training.
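The idea of freezing layers can be sketched with a toy model: a chain of scalar "layers" in which the first `n_frozen` weights are held constant while the remaining ones are updated by gradient descent on new data. This is a minimal pure-Python illustration, not the floe's implementation; the function names, the `n_frozen` argument, and the assumption that frozen layers are counted from the input side are all hypothetical.

```python
def predict(weights, x):
    """Forward pass through a chain of scalar linear layers."""
    out = x
    for w in weights:
        out *= w
    return out


def mse(weights, data):
    """Mean squared error of the chain on (x, y) pairs."""
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)


def fine_tune(weights, data, n_frozen, lr=0.01, epochs=200):
    """Batch gradient descent that updates only layers with index >= n_frozen."""
    weights = list(weights)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for x, y in data:
            err = predict(weights, x) - y
            for i in range(n_frozen, len(weights)):
                # d(pred)/d(w_i) is the input times the product of all other weights.
                others = x
                for j, w in enumerate(weights):
                    if j != i:
                        others *= w
                grads[i] += 2 * err * others / len(data)
        # Frozen layers (index < n_frozen) are never touched here.
        for i in range(n_frozen, len(weights)):
            weights[i] -= lr * grads[i]
    return weights


pretrained = [1.5, 0.8, 1.2]  # stand-in for a pre-built model
new_data = [(x, 2.4 * x) for x in (0.5, 1.0, 1.5, 2.0)]  # new target: y = 2.4x
tuned = fine_tune(pretrained, new_data, n_frozen=1)

print(tuned[0] == pretrained[0])  # the frozen layer is unchanged
print(mse(pretrained, new_data), mse(tuned, new_data))
```

Raising `n_frozen` keeps more of the pre-built model intact and leaves fewer weights to fit, which tends to help when the new dataset is small.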
The floe generates a Floe Report containing details of the best models built. The user can pick any model and use it to predict properties of other molecules in the ML Predict: Regression using Fingerprints for Small Molecules Floe. The Floe Report also presents detailed statistics on the hyperparameters; the user can analyze these, adjust the hyperparameters, and rerun the floe to build better models.
In addition to prediction, the built models provide an explanation of predictions, a confidence interval, and the domain of application.
Warning: By default, this floe builds about 1,000 machine learning models. On a large dataset, this may be expensive. Since multiple parameters contribute to this cost, refer to the documentation on how to build a cheaper version for practice. Building decent models requires a dataset of at least 100 molecules (barring exceptions). We have stress tested the floe with up to 50,000 molecules. For larger datasets, it is recommended to increase the memory and disk space requirements of the cubes.
Promoted Parameters
Title in user interface (promoted name)
Inputs
Input Small Molecules to train machine learning models on (in): Input dataset file in which each record contains a molecule and a response value (float) to train on.
Required
Type: data_source
Input tensorflow Model (tfm): Machine learning model to predict property.
Required
Type: data_source
Outputs
Transfer Learning Models Built (out): Output of generated models.
Required
Type: dataset_out
Default: Transfer Learning Output Model
Failure Output (failed_out): Output of failure.
Required
Type: dataset_out
Default: Failed Transfer Learning Model
Options
Select models for transfer learning training (tfr_id): The default value of -1 trains all models in the input data. Otherwise, enter one or more model IDs to train on.
Required
Type: integer
Default: [-1]
Response Value Field. Must match what the model was trained on. (val_field_r): Name of the field containing the primary data being trained on and predicted. Molecules that lack this value are ignored.
Required
Type: field_parameter::float
Number of Models to show in Floe Report (number_of_models_to_show_in_floe_report): Number of best models to include in the Floe Report. By default, the best 5 models (ranked by r2 score) are kept so that the memory requirement is met.
Type: integer
Default: 5
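The ranking described for this parameter can be pictured as follows: compute an r2 score for each model and keep the top N. This is a minimal sketch using the standard coefficient of determination; the model names, the sample predictions, and the helper functions are hypothetical and do not reflect how the floe stores or scores its models.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def best_models(scores, n=5):
    """Return the names of the n highest-scoring models, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical held-out responses and per-model predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "model_a": [1.0, 2.0, 3.0, 4.0],  # perfect fit
    "model_b": [2.5, 2.5, 2.5, 2.5],  # always predicts the mean
    "model_c": [1.1, 1.9, 3.2, 3.8],  # small errors
}
scores = {name: r2_score(y_true, pred) for name, pred in predictions.items()}
print(best_models(scores, n=2))
```

A model that always predicts the mean response scores 0, so anything at or below that level adds nothing over a constant guess.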
Preprocess Molecule (Preprocess Molecule): For every molecule, keeps only the largest component, adjusts ionization to neutral pH, and rejects molecules that fail the type check.
Type: boolean
Default: True
Choices: [True, False]
Apply Blockbuster filter (Blockbuster Filter): Accept or reject molecules based on closeness to Blockbuster molecule properties. For details, see the oemolprop toolkit.
Type: boolean
Default: False
Choices: [True, False]
Negative Log (Negative Log): Transform the Response Value Field to its negative log. Helpful, for instance, to convert IC50 to pIC50.
Type: boolean
Default: False
Choices: [True, False]
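For reference, the conventional IC50 to pIC50 conversion is the negative base-10 logarithm of the concentration in molar units. The sketch below shows that convention only; whether the floe uses base 10 or performs any unit conversion is an assumption here, and `negative_log` is a hypothetical helper.

```python
import math


def negative_log(value_molar):
    """Return -log10 of a positive concentration given in mol/L.

    Assumes base 10 and molar units, which is the usual pIC50 convention;
    the floe's exact transform may differ.
    """
    if value_molar <= 0:
        raise ValueError("negative log is undefined for non-positive values")
    return -math.log10(value_molar)


ic50_nanomolar = 250.0
pic50 = negative_log(ic50_nanomolar * 1e-9)  # convert nM to M first
print(round(pic50, 2))
```

Values recorded in nM or µM need converting to molar before the transform, otherwise the result is shifted by a constant (9 or 6 log units, respectively).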
Molecule Explainer Type (explainer_type): Select the explainer visualization. Atom: annotate atoms only. Fragment: annotate fragments. Combined: annotate both.
Type: string
Default: Fragment
Choices: ['Combined', 'Fragment', 'Atom']