Setup Directions for the Machine Learning Model Building Tutorials

This tutorial covers the setup steps common to all of the tutorials in this floe package. Once you have completed it, you can continue to any of the subsequent lessons.

The ML floes fall into three categories:
1. Data preprocessing
2. Machine learning model building
3. Molecule property prediction using built models

Create a Tutorial Project

Note

If you have already created a tutorial project, you can reuse the existing one.

Log in to Orion and click the “Home” button at the top of the blue navigation bar on the left side of the Orion user interface. Then click the “Create New Project” button, enter “Tutorial” as the project name in the pop-up window, and click the “Save” button.

Orion home page

Floe Input

The basic floe inputs are described here. Each individual tutorial will describe any additional information needed for that floe and will provide the necessary dataset(s) to download.

Each floe requires an input dataset in which every record contains an OEMolField. Files uploaded to Orion in .csv, .sdf, or similar formats are automatically converted to datasets. Each record must also include a separate field holding the property to train on: a Float for regression or a String for classification.

One of the required inputs is a molecule dataset, referred to here as P1, which contains one or more OERecords. The floe expects two things from each record:

  • An OEMol, either to train the models on or whose physical properties are to be predicted.

  • A Float or String value containing the regression or classification property to be learned.
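Before uploading a .csv file, it can help to confirm that every row carries both a molecule and a usable response value. The following is a minimal sketch in plain Python; it is not part of Orion or the OpenEye toolkits. The helper `check_training_rows` and the column names `SMILES`, `Solubility`, and `hERG` are hypothetical, and the sketch only checks that the molecule column is non-empty rather than parsing the structure.

```python
import csv
import io
import math


def check_training_rows(csv_text, mol_col, response_col, task="regression"):
    """Return (row_number, problem) tuples for rows unfit for training.

    Hypothetical pre-upload check, not an Orion API. Row 1 is the header,
    so data rows are numbered from 2.
      regression     -> response must parse as a finite float
      classification -> response must be a non-empty string label
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_num, row in enumerate(reader, start=2):
        mol = (row.get(mol_col) or "").strip()
        value = (row.get(response_col) or "").strip()
        if not mol:
            problems.append((row_num, "missing molecule"))
        if task == "regression":
            try:
                if not math.isfinite(float(value)):
                    problems.append((row_num, "response is not finite"))
            except ValueError:
                problems.append((row_num, "response is not a float"))
        elif not value:
            problems.append((row_num, "missing class label"))
    return problems


sample = "SMILES,Solubility\nCCO,0.5\nc1ccccc1,abc\n,1.2\n"
print(check_training_rows(sample, "SMILES", "Solubility"))
# [(3, 'response is not a float'), (4, 'missing molecule')]
```

Rows flagged this way would likely be unusable for training, so fixing or dropping them before upload avoids surprises in the floe.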

The following floes use only the P1 dataset:

Data Preprocessing:

  • Data Processing of Small Molecules for ML Model Building

Machine learning model building:

  • ML Build: Regression Model with Tuner using Fingerprints for Small Molecules

  • ML Build: Classification Model with Tuner using Fingerprints for Small Molecules

  • ML Build: Regression Model using Feature Input

Molecule property prediction using built models:

  • Solubility Prediction for Small Molecule using ML and Cheminfo Fingerprints

  • hERG Toxicity Prediction for Small Molecules using ML and Cheminfo Fingerprints

For the last two floes, the response field is not required. If it is present, however, a Float value (for solubility) or a String value (for hERG) can be used to validate the predictions.
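One way to validate the predictions against an optional response field is to compare observed and predicted values only for records that actually have an observed value. The sketch below assumes you have already extracted the two value lists from the floe output (how to do that is not covered here); the function names and the hERG labels `Blocker` and `Non-blocker` are hypothetical. It uses root-mean-square error for regression and simple accuracy for classification.

```python
import math


def regression_rmse(observed, predicted):
    """Root-mean-square error over records that have an observed value.

    Records whose observed value is None (response field absent) are skipped.
    """
    pairs = [(o, p) for o, p in zip(observed, predicted) if o is not None]
    if not pairs:
        raise ValueError("no records with an observed value")
    return math.sqrt(sum((o - p) ** 2 for o, p in pairs) / len(pairs))


def classification_accuracy(observed, predicted):
    """Fraction of labeled records whose predicted label matches the observed one."""
    pairs = [(o, p) for o, p in zip(observed, predicted) if o is not None]
    if not pairs:
        raise ValueError("no records with an observed label")
    return sum(o == p for o, p in pairs) / len(pairs)


# Hypothetical values: the second record has no measured solubility.
print(regression_rmse([1.0, None, 3.0], [2.0, 5.0, 3.0]))
print(classification_accuracy(["Blocker", "Non-blocker", None],
                              ["Blocker", "Blocker", "Blocker"]))  # 0.5
```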

Some floe tutorials require a second dataset, M1, which contains one or more machine learning models, and sometimes a third, T1, which is used to calculate confidence intervals for regression models. To learn how to generate these models, read the tutorial <link> on building models in Orion.

The following floes use an M1 dataset in addition to the P1 dataset:

Machine learning model building:

  • ML ReBuild: Transfer Learn ML Regression Model using Fingerprints for Small Molecules

Molecule property prediction using built models:

  • ML Predict: Regression using Fingerprints for Small Molecules

  • ML Predict: Classification using Fingerprints for Small Molecules

  • ML Predict: Regression using Feature Input

Run the OEModel Building Floe

For all floes, follow these steps to find and launch the desired floe.

  • Click the “Floe” button on the navigation bar to reach the Floe page.

  • Click on the Floes tab.

  • Under Categories, click on “Packages” and select the OpenEye Model Building package.

  • A list of the ML floes appears on the right. Click the one you would like to use.

  • Alternatively, you can enter the name of the desired floe in the search bar.

  • Click “Launch Floe” for the desired floe to open the Job Form, then specify the parameter settings as described in each tutorial.