ML Build: Regression Model with Tuner using Fingerprints for Small Molecules
This floe trains multiple ML neural network regression models on the physical properties of small molecules.
The models train on 2D fingerprints, which the floe generates itself. Every molecule in the input dataset must have a value in the physical property column to train on; molecules without one are ignored.
The floe builds a machine learning model for every combination of the cheminformatics (fingerprint) and neural network hyperparameters provided in the advanced sections. Read the documentation to learn more about these parameters and how they should be set for a given training dataset.
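
As a rough illustration of why the number of models grows quickly, the sketch below enumerates every combination of a few hyperparameter lists with `itertools.product`. The parameter names and values are hypothetical placeholders, not the floe's defaults, and the floe's internal enumeration may differ.

```python
from itertools import product

# Hypothetical hyperparameter lists; the names and values are illustrative
# assumptions, not the floe's actual defaults.
grid = {
    "max_radius": [2, 3],
    "bit_length": [2048, 4096],
    "dropout": [0.1, 0.2],
    "hidden_layers": [(150, 100, 50), (100, 50)],
    "learning_rate": [0.001, 0.0001],
}


def enumerate_grid(grid):
    """Return one dict per combination of the values in `grid`."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


combos = enumerate_grid(grid)
print(len(combos))  # 2 * 2 * 2 * 2 * 2 = 32 candidate models
```

Each extra value in any list multiplies the total, so trimming one list from two values to one halves the number of models to train.
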
The floe then picks the best of these models and fine-tunes them with Keras Tuner.
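
Tuning strategies of this kind share one idea: sample hyperparameters from a search space, score each trial, and keep the best. The sketch below illustrates that idea with a plain-Python random search over a toy objective. It does not call Keras Tuner, and `toy_validation_loss` is a hypothetical stand-in for training and validating a real network.

```python
import random


def toy_validation_loss(learning_rate, dropout):
    """Hypothetical stand-in for training a model and returning its
    validation loss; the minimum is at learning_rate=0.001, dropout=0.2."""
    return (learning_rate - 0.001) ** 2 * 1e6 + (dropout - 0.2) ** 2


def random_search(n_trials, seed=0):
    """Sample hyperparameters at random and keep the lowest-loss trial."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        trial = {
            # Log-uniform sample between 1e-4 and 1e-2.
            "learning_rate": 10 ** rng.uniform(-4, -2),
            "dropout": rng.uniform(0.0, 0.5),
        }
        loss = toy_validation_loss(**trial)
        if best is None or loss < best[0]:
            best = (loss, trial)
    return best


best_loss, best_params = random_search(n_trials=50)
print(best_loss, best_params)
```

More trials can only improve or match the best loss found for a fixed seed, which is the trade-off between tuning cost and model quality.
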
The floe generates a Floe Report containing details of the best models built. The user can pick any of these models and use it to predict properties of other molecules in the ML Regression using Fingerprints for Small Molecules Floe. The Floe Report also presents detailed statistics on the hyperparameters, which can guide adjusting them and rerunning the floe to build better models.
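
The floe's report-size parameter describes the best models as ranked by r2 score. A minimal sketch of that kind of ranking follows, assuming each candidate model has already produced predictions for a shared validation set; the model names and numbers are made up, and the floe's own selection code is not shown here.

```python
import heapq


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def top_models(predictions, y_true, n):
    """Return the names of the n models with the highest R2, best first."""
    scores = {name: r2_score(y_true, y_pred) for name, y_pred in predictions.items()}
    return heapq.nlargest(n, scores, key=scores.get)


y_true = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "model_a": [1.1, 1.9, 3.2, 3.8],
    "model_b": [2.5, 2.5, 2.5, 2.5],  # always predicts the mean: R2 = 0
    "model_c": [1.0, 2.0, 3.0, 4.0],  # perfect fit: R2 = 1
}
print(top_models(predictions, y_true, n=2))  # ['model_c', 'model_a']
```
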
In addition to prediction, the built models provide an explanation of predictions, a confidence interval, and the domain of application.
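
One common way to turn a probabilistic prediction into an error bar is to treat the model output as a Gaussian with a predicted mean and standard deviation. The sketch below computes a two-sided interval under that assumption. Whether the floe derives its interval this way is not stated here, and the input numbers are made up.

```python
from statistics import NormalDist


def gaussian_interval(mean, stddev, confidence=0.95):
    """Two-sided interval for a prediction assumed to be Gaussian."""
    # z is the standard-normal quantile that leaves (1 - confidence) / 2
    # probability in each tail, about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean - z * stddev, mean + z * stddev


low, high = gaussian_interval(mean=6.2, stddev=0.4)
print(f"{low:.2f} to {high:.2f}")  # 5.42 to 6.98
```

A wide interval relative to the range of the training data is one hint that a molecule may lie outside the model's domain of application.
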
Warning: By default, this floe builds approximately 2,000 machine learning models. On a large dataset, this may be expensive. Because several parameters contribute to this cost, refer to the tutorial to learn how to build models more cheaply and quickly. Training datasets should generally contain at least 100 molecules to build decent models (barring exceptions). The floe has been stress tested with up to 30,000 molecules. For larger datasets, increasing the memory and disk space requirements of the cubes is recommended. Please refer to the documentation on how to build models on a larger dataset.

Name | Description | Type |
---|---|---|
Input Small Molecules to Train Machine Learning Models On | Input dataset file with each record containing a molecule and a response value (float) to train on. | Molecule Dataset |

Name | Description | Type |
---|---|---|
Models Built | Output of generated models. | Dataset |
Failure Output | Output of failure. | Dataset |

Name | Description | Type |
---|---|---|
Response Value Field | Name of the field containing the primary data being trained on and predicted. | Float |
Are We Using the Keras Tuner | If this is On, the floe fine-tunes the models using the Keras Tuner. | Bool |
What Kind of Keras Tuner to Use | Choose among Hyperband, RandomSearch, and Bayesian Optimization. | String |
Number of Models to Show in Floe Report | How many of the best models to provide in the Floe Report. By default, keeps the best 20 models (based on r2 score) so that memory requirements are met. | Int |
Are We Training TensorFlow Probability | True: Builds a TensorFlow Probability-based neural network model for finding the domain of application or error bar. False: Builds a TensorFlow neural network model for prediction and explanation (deterministic model). | Bool |
Preprocess Molecule | For every molecule, stores only the largest component and adjusts ionization to neutral pH. | Bool |
Apply Blockbuster Filter | Apply blockbuster filter. | Bool |
Negative Log | Transform learning value to negative log. Only for regression. | Bool |
Molecule Explainer Type | Select explainer visualization. Atom: annotate atoms only, Fragment: annotate fragments, Combined: annotate both. | List |
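
The Negative Log option transforms the response value before training. A minimal sketch of such a transform and its inverse follows, assuming base 10 (the logarithm base is not stated in the parameter description) and strictly positive inputs; both function names are hypothetical.

```python
import math


def negative_log10(value):
    """Return -log10(value); defined only for strictly positive values."""
    if value <= 0:
        raise ValueError("negative log is undefined for non-positive values")
    return -math.log10(value)


def inverse_negative_log10(transformed):
    """Map a transformed value back to the original scale."""
    return 10.0 ** (-transformed)


# Example: a value of 1e-6 maps to approximately 6 on the transformed scale.
p_value = negative_log10(1e-6)
print(round(p_value, 6), inverse_negative_log10(p_value))
```

Values that span several orders of magnitude become roughly linear after this transform, which is why it is often applied to concentration-like responses.
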

Name | Description | Type |
---|---|---|
Min Radius | Minimum radius for cheminfo fingerprints. | IntVec |
Max Radius | Maximum radius for cheminfo fingerprints. | IntVec |
Bit Length of Fingerprint (FP) | Bit length of cheminfo fingerprints. | IntVec |
Type of Fingerprint (FP) | Type of cheminfo fingerprints. | IntVec |
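
To give a feel for what the bit length controls, the sketch below folds a set of substructure identifiers into a fixed-length bit vector by hashing. This is a generic illustration of hashed fingerprints, not the algorithm the floe or its toolkit uses, and the feature strings are hypothetical.

```python
import zlib


def fold_features(features, n_bits):
    """Hash each feature string to a bit position in a vector of n_bits."""
    bits = [0] * n_bits
    for feature in features:
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        position = zlib.crc32(feature.encode("utf-8")) % n_bits
        bits[position] = 1
    return bits


# Hypothetical atom-environment identifiers standing in for the
# substructures a fingerprint would enumerate up to the maximum radius.
features = ["C", "C-O", "C=O", "c1ccccc1", "N-C=O"]

for n_bits in (8, 2048):
    fp = fold_features(features, n_bits)
    # Fewer set bits than features means some features collided.
    print(n_bits, sum(fp))
```

Shorter bit lengths make collisions more likely, while longer ones give the network more inputs to train on, so the bit length is a cost versus resolution trade-off.
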

Name | Description | Type |
---|---|---|
Dropouts | List of dropout hyperparameters. | FloatVec |
Sets of Hidden Layers | List(s) of hidden layers separated by -1. Input and output layers will be determined by data. Example: 150,100,50 will create NN with 3 hidden layers of size 150, 100, 50. | IntVec |
Sets of Regularization Layers | List(s) of regularization layers separated by -1. No regularization on input and output layers. | FloatVec |
Learning Rates | List of all the learning rate hyperparameters to train model. | FloatVec |
Max Epochs | Maximum number of epochs to train model. | Int |
Activation | Activation functions: ReLU, LeakyReLU, PReLU, tanh, SELU, ELU. | List |
Batch Size | Batch size for training regressor. | Int |
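
The hidden-layer and regularization parameters pack several architectures into one flat list, using -1 as a separator. The sketch below shows one way such a list could be split into separate sets. The function name is hypothetical, and the handling of edge cases such as a trailing -1 (empty sets are dropped here) is an assumption rather than documented floe behavior.

```python
def split_layer_sets(values, separator=-1):
    """Split a flat list into sub-lists at each separator value."""
    sets, current = [], []
    for value in values:
        if value == separator:
            # Close the current set; skip it if it is empty.
            if current:
                sets.append(current)
            current = []
        else:
            current.append(value)
    if current:
        sets.append(current)
    return sets


print(split_layer_sets([150, 100, 50, -1, 100, 50]))
# [[150, 100, 50], [100, 50]]
```

Here the flat list describes two networks: one with hidden layers of size 150, 100, and 50, and one with hidden layers of size 100 and 50.
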