ML ReBuild: Transfer Learn ML Regression Model Using Fingerprints for Small Molecules

This floe performs transfer learning on prebuilt machine learning (ML) neural network regression models that predict properties of small molecules.

The input models are retrained on 2D fingerprints, which are generated within the floe itself. Every molecule in the input dataset needs a value in the property column to train on; molecules without one are ignored.
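Before launching the floe, it can help to check how many records actually carry a usable training value. The sketch below is a hypothetical pre-flight filter in plain Python. The `smiles` and `response` field names and the values are placeholders, and records are modeled as dictionaries rather than in the floe's own record format.

```python
import math
from typing import Any


def usable_records(records: list[dict[str, Any]], response_field: str) -> list[dict[str, Any]]:
    """Return only the records whose response is a finite int or float."""
    kept = []
    for rec in records:
        value = rec.get(response_field)
        # Missing fields and None fail this check; bool is excluded because it subclasses int
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        # Skip NaN and infinities, which cannot serve as regression targets
        if math.isfinite(value):
            kept.append(rec)
    return kept


# Hypothetical records: field names and values are placeholders
records = [
    {"smiles": "CCO", "response": 1.2},
    {"smiles": "c1ccccc1", "response": None},
    {"smiles": "CC(=O)O", "response": float("nan")},
    {"smiles": "CCN"},
    {"smiles": "CCCC", "response": 0.5},
]
kept = usable_records(records, "response")
print(len(kept), "of", len(records), "records usable")  # 2 of 5 records usable
```

Comparing the usable count against the dataset size recommended in the warning below gives an early signal of whether a run is worthwhile.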

It builds machine learning models for all possible combinations of the cheminformatics (fingerprint) hyperparameters provided in the advanced sections. Read the documentation to learn more about these parameters and how to set them for a given training dataset.
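Building one model per combination means the model count is the product of the number of values supplied for each hyperparameter. The sketch below illustrates this with Python's `itertools.product`; the parameter names and values are hypothetical placeholders, not the floe's actual parameters or defaults.

```python
import math
from itertools import product

# Hypothetical fingerprint grid: names and values are placeholders,
# not the floe's actual parameters or defaults.
fingerprint_grid = {
    "fp_type": ["circular", "path", "tree"],
    "num_bits": [1024, 2048],
    "min_radius": [0, 1],
    "max_radius": [2, 3, 5],
}


def enumerate_configs(grid):
    """Yield one dict per combination of values (the Cartesian product)."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


configs = list(enumerate_configs(fingerprint_grid))
# One model per combination, so the count is the product of the list lengths
print(len(configs), math.prod(len(v) for v in fingerprint_grid.values()))  # 36 36
```

Adding a fourth value to `max_radius` in this toy grid would raise the count from 36 to 48, which is why trimming the value lists is the main lever for a cheaper run.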

By default, the front layers are frozen (withheld from training) and only the later layers are retrained on the new data. Use the Freeze Layer parameter to set how many of the front layers remain unchanged during training.
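The idea behind freezing can be shown with a deliberately tiny toy model: each "layer" is a single scalar weight, and gradient descent simply skips the first `freeze_count` of them. This is a hypothetical illustration, not the floe's implementation; in TensorFlow/Keras the comparable mechanism is setting a layer's `trainable` attribute to `False`, and whether the floe does exactly that internally is an assumption.

```python
def forward(weights, x):
    """Toy 'network': each layer multiplies its input by one scalar weight."""
    out = x
    for w in weights:
        out *= w
    return out


def fine_tune(weights, data, freeze_count, lr=0.01, epochs=200):
    """Gradient descent on mean squared error, skipping the first freeze_count layers."""
    weights = list(weights)  # copy so the pretrained weights are not mutated
    n = len(data)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for x, y in data:
            err = forward(weights, x) - y
            for i in range(freeze_count, len(weights)):
                # d(pred)/d(w_i) is x times the product of every other weight
                others = 1.0
                for j, w in enumerate(weights):
                    if j != i:
                        others *= w
                grads[i] += 2.0 * err * x * others / n
        # Frozen (front) layers receive no update
        for i in range(freeze_count, len(weights)):
            weights[i] -= lr * grads[i]
    return weights


pretrained = [2.0, 1.5, 1.0]  # hypothetical pretrained weights: maps x -> 3x
new_data = [(x, 4.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]  # hypothetical new task: y = 4x
tuned = fine_tune(pretrained, new_data, freeze_count=2)
print(tuned[:2])  # [2.0, 1.5]
print(round(forward(tuned, 1.0), 6))  # 4.0
```

Only the last weight moves, yet the model still fits the new data, which mirrors why retraining just the later layers can adapt a pretrained model cheaply.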

The floe generates a Floe Report containing details of the best models built. The user can pick any of these models and use it in the ML Predict: Regression Using Fingerprints for Small Molecules Floe to predict properties of other molecules. The Floe Report also presents detailed statistics on the hyperparameters, which the user can adjust before rerunning the floe to build better models (see documentation).

In addition to prediction, the built models provide an explanation of predictions, a confidence interval, and the domain of application.

Warning: By default, this floe builds approximately 1,000 machine learning models, which may be expensive on a large dataset. Because several parameters contribute to this cost, refer to this tutorial to learn how to build a cheaper version. To build decent models, the dataset should contain at least 100 molecules (barring exceptions). We have performed stress tests on datasets of up to 50,000 molecules. We recommend increasing the memory and disk space requirements of the cubes when running on larger datasets.

Inputs

Name | Description | Type
Input Small Molecules to Train Machine Learning Models On | Input dataset file with each record containing a molecule and a response value (float) to train on. | Molecule Dataset
Input TensorFlow Model | Prebuilt machine learning model that predicts the property. | Machine Learning TensorFlow Model Dataset

Outputs

Name | Description | Type
Models Built | Output of generated models. | Dataset
Failure Output | Output of failures. | Dataset