ML ReBuild: Transfer Learn ML Regression Model using Fingerprints for Small Molecules
This floe performs transfer learning on prebuilt ML neural network regression models on properties of small molecules.
The input models are retrained on 2D fingerprints, which are generated in the floe itself. Every molecule in the input dataset needs a property column to train on; molecules without one are ignored.
The floe builds machine learning models for all possible combinations of cheminformatics (fingerprint) hyperparameters provided in the advanced sections. Read the documentation to learn more about these parameters and how they should be set for a given training data set.
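To illustrate how the number of models grows with the hyperparameter options, here is a minimal sketch of enumerating every combination of a parameter grid. The parameter names and values (`fp_type`, `min_radius`, `max_radius`, `bit_length`) are hypothetical placeholders, not the floe's actual parameter names or defaults.

```python
from itertools import product

# Hypothetical fingerprint hyperparameter options (illustrative only;
# these are not the floe's real parameter names or default values).
fingerprint_grid = {
    "fp_type": ["circular", "tree", "path"],
    "min_radius": [0, 1],
    "max_radius": [2, 3, 4],
    "bit_length": [2048, 4096],
}


def enumerate_combinations(grid):
    """Return one dict per combination of the option lists in `grid`."""
    names = list(grid)
    option_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*option_lists)]


combinations = enumerate_combinations(fingerprint_grid)

# The count is the product of the option-list lengths: 3 * 2 * 3 * 2 = 36.
print(len(combinations))
print(combinations[0])
```

Because the count is a product of list lengths, removing a single option from any one list shrinks the total number of models multiplicatively, which is the simplest lever for a cheaper run.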
By default, the front layers are frozen (withheld from training) and only the later layers are retrained on the new data. Adjust the Freeze Layer parameter to set how many of these front layers remain constant during training.
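The idea of freezing front layers can be sketched as follows. This is not the floe's implementation: it assumes a Keras-style model that exposes a `layers` list whose items carry a `trainable` flag, and it uses small stand-in classes so the snippet runs without TensorFlow. The function name `freeze_front_layers` and the argument `n_frozen` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Layer:
    """Stand-in for a Keras-style layer (only the attributes used here)."""
    name: str
    trainable: bool = True


@dataclass
class Model:
    """Stand-in for a Keras-style model holding an ordered list of layers."""
    layers: List[Layer] = field(default_factory=list)


def freeze_front_layers(model, n_frozen):
    """Freeze the first `n_frozen` layers and leave the rest trainable.

    Returns the names of the layers that will still be updated.
    """
    if n_frozen < 0:
        raise ValueError("n_frozen must be non-negative")
    for index, layer in enumerate(model.layers):
        layer.trainable = index >= n_frozen
    return [layer.name for layer in model.layers if layer.trainable]


model = Model([Layer("dense_1"), Layer("dense_2"), Layer("dense_3"), Layer("output")])
trainable = freeze_front_layers(model, n_frozen=2)
print(trainable)  # ['dense_3', 'output']
```

Freezing more layers keeps more of the original model's learned representation and leaves fewer weights to fit, which tends to matter most when the new training set is small.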
The floe generates a Floe Report containing details of the best models built. The user can pick any model and use it to predict properties of other molecules in the ML Predict: Regression using Fingerprints for Small Molecules Floe. The Floe Report also presents detailed statistics on the hyperparameters; the user can adjust them and rerun the floe to build better models (see documentation).
In addition to prediction, the built models provide an explanation of predictions, a confidence interval, and the domain of application.
Warning: By default, this floe builds about 1,000 machine learning models. On a large dataset, this may be expensive. Since multiple parameters contribute to this cost, refer to the documentation on how to build a cheaper version for practice. Building decent models needs a dataset of at least 100 molecules (barring exceptions). We have stress tested up to 50,000 molecules. It is recommended to increase the memory and disk space requirements of the cubes to run on larger datasets.

Name | Description | Type |
---|---|---|
Input Small Molecules to Train Machine Learning Models On | Input dataset file with each record containing molecule and response value (float) to train on. | Molecule Dataset |
Input TensorFlow Model | Machine Learning model to predict property. | Machine Learning Tensorflow Model Dataset |

Name | Description | Type |
---|---|---|
Select Models for Transfer Learning Training | Default: -1, trains all models in the input data. Other: Enter multiple model IDs to train on. | Int, Many |
Response Value Field | Name of the field containing the primary data being trained on and predicted. | Float |
Number of Models to Show in Floe Report | How many best models to provide in the Floe Report. By default, keeps best 20 models (based on r2 score) such that it meets memory requirements. | Int |
Preprocess Molecule | For every molecule, stores only largest component, adjusts ionization to neutral pH. | Bool |
Apply Blockbuster Filter | Apply Blockbuster filter. | Bool |
Negative Log | Transform learning values to negative log. Only for regression. Off: Build TensorFlow neural network model for prediction and explanation (Deterministic Model). | Bool |
Molecule Explainer Type | Select explainer visualization. Atom: annotate atoms only. Fragment: annotate fragments. Combined: annotate both. | List |
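
Two of the parameters above, Negative Log and Number of Models to Show in Floe Report, can be illustrated with a short sketch. It is not the floe's code. The logarithm base is not stated in the table; base 10 is assumed here (as in pIC50-style values). The helper names `negative_log10`, `r2_score`, and `best_models` are hypothetical, and `r2_score` uses the standard coefficient of determination.

```python
import math


def negative_log10(value):
    """Negative log transform of a positive response value (base 10 assumed)."""
    if value <= 0:
        raise ValueError("negative log is only defined for positive values")
    return -math.log10(value)


def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def best_models(scores, n_best=20):
    """Return the IDs of the `n_best` models with the highest r2 score."""
    return sorted(scores, key=scores.get, reverse=True)[:n_best]


# Hypothetical model IDs and r2 scores, for illustration only.
scores = {"model_1": 0.62, "model_2": 0.81, "model_3": 0.47}
top_two = best_models(scores, n_best=2)
print(top_two)  # ['model_2', 'model_1']
```

A negative log transform is commonly applied when response values span several orders of magnitude, since it compresses them onto a scale that a regression model can fit more evenly.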

Name | Description | Type |
---|---|---|
Models Built | Output of generated models. | Dataset |
Failure Output | Output of failure. | Dataset |