Layout of the Machine Learning Model Building Package
The OpenEye, Cadence Molecular Sciences Machine Learning Model Building Floes Package is a tool to build machine learning (ML) models that predict properties of small molecules.
This page gives an overview of the package structure and explains which floes to use for specific tasks.
Broadly, the floes can be divided into three categories:
Data preprocessing
Machine learning model building
Molecule property prediction using built models
Data Preprocessing Floe
The first step is to run the data preprocessing floe, which cleans the data and prepares it for machine learning operations as well as for general data preparation use cases.
The Data Processing of Small Molecules for ML Model Building Floe performs all data preparation for the ML Floe package.
Machine Learning Model Building
To build ML models, choose from the floes whose names begin with ML Build (or ML ReBuild for transfer learning).
These floes build several ML models in parallel on the training data provided and deliver a detailed statistical report of the best models built. The floes build either regression or classification models based on the inputs shown in Table 1. The architecture also depends upon these inputs.
Input Type | Type of Model Built | Architecture of Model Built |
---|---|---|
Fingerprint | Regression | TensorFlow |
Custom User Input (Regression only) | Classification | TensorFlow Probability (For confidence interval of regression models only) |
The model building floes are:
ML Build: Regression Model with Tuner using Fingerprints for Small Molecules
ML Build: Classification Model with Tuner using Fingerprints for Small Molecules
ML Build: Regression Model using Feature Input
ML ReBuild: Transfer Learn ML Regression Model using Fingerprints for Small Molecules
Molecular Property Prediction using Built Models
To use the ML models for molecular property prediction, use the floes whose names begin with ML Predict. Each built model must be paired with its corresponding prediction floe. For example, if you built your model using the ML Build: Classification Model with Tuner using Fingerprints for Small Molecules Floe, you should then use the ML Predict: Classification using Fingerprints for Small Molecules Floe for prediction.
The prediction floes are:
ML Predict: Regression using Fingerprints for Small Molecules
ML Predict: Classification using Fingerprints for Small Molecules
ML Predict: Regression using Feature Input
hERG Toxicity Prediction for Small Molecules using ML and Cheminfo Fingerprints
Solubility Prediction for Small Molecules using ML and Cheminfo Fingerprints
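The pairing between build floes and prediction floes described above can be sketched as a simple lookup. This is an illustrative sketch only, not part of the package: the dictionary `BUILD_TO_PREDICT` and the helper `prediction_floe_for` are hypothetical names. Only the classification pair is stated explicitly in the text. The two regression pairs are assumptions inferred from the matching floe names, and the ML ReBuild floe is left out because its prediction floe is not stated here.

```python
# Pairing of model-building floes with prediction floes.
# Only the classification pair is stated explicitly in the documentation;
# the two regression pairs are assumptions inferred from the floe names.
BUILD_TO_PREDICT = {
    "ML Build: Classification Model with Tuner using Fingerprints for Small Molecules":
        "ML Predict: Classification using Fingerprints for Small Molecules",
    "ML Build: Regression Model with Tuner using Fingerprints for Small Molecules":
        "ML Predict: Regression using Fingerprints for Small Molecules",
    "ML Build: Regression Model using Feature Input":
        "ML Predict: Regression using Feature Input",
}


def prediction_floe_for(build_floe: str) -> str:
    """Return the prediction floe paired with a build floe (hypothetical helper)."""
    try:
        return BUILD_TO_PREDICT[build_floe.strip()]
    except KeyError:
        # Unknown or unlisted floes are rejected rather than guessed at.
        raise ValueError(f"No known prediction floe for: {build_floe!r}") from None


if __name__ == "__main__":
    print(prediction_floe_for("ML Build: Regression Model using Feature Input"))
```

A lookup that fails loudly on unlisted floes keeps a model from being sent to a prediction floe of the wrong type by accident.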