Layout of the Machine Learning Model Building Package

The OpenEye, Cadence Molecular Sciences Machine Learning Model Building Floes Package is a tool to build machine learning (ML) models that predict properties of small molecules.

Here we present an overview of the package and its structure, and explain which floes to use for specific tasks.

Broadly, the floes can be divided into three categories:

  1. Data preprocessing

  2. Machine learning model building

  3. Molecule property prediction using built models

Data Preprocessing Floe

The first step is to run the data preprocessing floe. It cleans the data and prepares it for machine learning operations, and it can also be used for general data preparation.

The Data Processing of Small Molecules for ML Model Building Floe performs all data preparation for the ML Floe package.

Machine Learning Model Building

To build ML models, choose from the floes whose names are prefixed with ML Build (or ML ReBuild, for transfer learning).

These floes build several ML models in parallel on the training data provided and deliver a detailed statistical report of the best models built. The floes build either regression or classification models based on the inputs shown in Table 1. The architecture also depends upon these inputs.

Table 1. Types of ML Build Floes

Input Type                           Type of Model Built   Architecture of Model Built
Fingerprint                          Regression            TensorFlow
Custom User Input (Regression only)  Classification        TensorFlow Probability (for confidence intervals of regression models only)

The model building floes are:

  • ML Build: Regression Model with Tuner using Fingerprints for Small Molecules

  • ML Build: Classification Model with Tuner using Fingerprints for Small Molecules

  • ML Build: Regression Model using Feature Input

  • ML ReBuild: Transfer Learn ML Regression Model using Fingerprints for Small Molecules
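
The choice among the build floes listed above can be sketched as a small lookup. This is an illustrative Python sketch only, not part of the package: the function choose_build_floe and the short keys "fingerprint" and "feature" are hypothetical names, and the sketch assumes that the Custom User Input row of Table 1 corresponds to the Feature Input floe. The ML ReBuild floe is left out because Table 1 does not cover it.

```python
# Hypothetical lookup from (input type, model type) to a build floe name.
# The floe names are copied from the list above; the keys are assumptions.
BUILD_FLOES = {
    ("fingerprint", "regression"):
        "ML Build: Regression Model with Tuner using Fingerprints for Small Molecules",
    ("fingerprint", "classification"):
        "ML Build: Classification Model with Tuner using Fingerprints for Small Molecules",
    # Assumption: "Custom User Input" in Table 1 maps to the Feature Input floe,
    # which supports regression only.
    ("feature", "regression"):
        "ML Build: Regression Model using Feature Input",
}


def choose_build_floe(input_type: str, model_type: str) -> str:
    """Return the build floe name for an input type and model type."""
    key = (input_type.strip().lower(), model_type.strip().lower())
    if key not in BUILD_FLOES:
        raise ValueError(f"No build floe for input={input_type!r}, model={model_type!r}")
    return BUILD_FLOES[key]


if __name__ == "__main__":
    print(choose_build_floe("Fingerprint", "Classification"))
```

Asking for a classification model on feature input raises ValueError in this sketch, which mirrors the "Regression only" note in Table 1.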

Molecular Property Prediction using Built Models

To use the built ML models for molecular property prediction, use the floes whose names are prefixed with ML Predict. Each built model must be paired with its corresponding prediction floe. For example, if you built your model using the ML Build: Classification Model with Tuner using Fingerprints for Small Molecules Floe, you should then use the ML Predict: Classification using Fingerprints for Small Molecules Floe for prediction.

The prediction floes are:

  • ML Predict: Regression using Fingerprints for Small Molecules

  • ML Predict: Classification using Fingerprints for Small Molecules

  • ML Predict: Regression using Feature Input

  • hERG Toxicity Prediction for Small Molecules using ML and Cheminfo Fingerprints

  • Solubility Prediction for Small Molecules using ML and Cheminfo Fingerprints
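
The pairing of build and prediction floes described in this section can be written down as a simple mapping. The Python sketch below is illustrative and not part of the package. Only the classification pair is stated explicitly in the text; the two regression pairs are inferred from the floe names and should be treated as assumptions. The function prediction_floe_for is a hypothetical helper.

```python
# Hypothetical mapping from a build floe to its matching prediction floe.
PREDICT_FLOE_FOR_BUILD = {
    # Pair stated explicitly in the text.
    "ML Build: Classification Model with Tuner using Fingerprints for Small Molecules":
        "ML Predict: Classification using Fingerprints for Small Molecules",
    # Pairs inferred from the floe names (assumption).
    "ML Build: Regression Model with Tuner using Fingerprints for Small Molecules":
        "ML Predict: Regression using Fingerprints for Small Molecules",
    "ML Build: Regression Model using Feature Input":
        "ML Predict: Regression using Feature Input",
}


def prediction_floe_for(build_floe: str) -> str:
    """Return the prediction floe that matches the given build floe."""
    try:
        return PREDICT_FLOE_FOR_BUILD[build_floe]
    except KeyError:
        raise ValueError(f"No known prediction floe for: {build_floe!r}") from None


if __name__ == "__main__":
    built_with = "ML Build: Classification Model with Tuner using Fingerprints for Small Molecules"
    print(prediction_floe_for(built_with))
```

The hERG and solubility prediction floes are not included in the mapping because the text does not pair them with any build floe.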