Setup Directions for the Machine Learning Model Building Tutorials

This tutorial covers the setup steps shared by all tutorials in this floe package. Once you have completed them, you can continue to any of the subsequent lessons.

Remember, the ML floes can be separated into three categories:

  1. Data preprocessing

  2. Machine learning model building

  3. Molecule property prediction using built models

Create a Tutorial Project

Note

If you have already created a tutorial project, you can reuse the existing one.

Log in to Orion and open the Project page from the blue navigation bar of the Orion user interface. Click the “New Project” button, enter Tutorial as the project name in the pop-up window, and click the “Create” button. Alternatively, you can choose the Project List caret and click “Add a New Project” to reach the same pop-up window.

Floe Input

The basic floe inputs are described here. Each individual tutorial will include any additional information needed for that floe and will provide the necessary dataset(s) to download.

Each floe requires an input dataset in which every record contains an OEPrimaryMolField. Files uploaded to Orion in .csv, .sdf, or a similar format are automatically converted to datasets. Each record must also include a separate field containing the Float property on which the network is trained.

One of the required inputs is a molecule dataset, referred to as P1, which contains several OERecord(s). The floe expects two things from each record:

  • An OEPrimaryMol, used either to train the models or as the molecule whose physical properties are predicted.

  • A Float or String value that contains the regression or classification property to be learned.
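Before uploading a .csv file, it can help to confirm that every row would yield a record meeting both expectations. The following is a minimal sketch using only the Python standard library. It is not part of the floe package, and the function name and the column names SMILES and Solubility are hypothetical placeholders for whatever your file uses. It checks only that a molecule string is present, not that it is chemically valid.

```python
import csv
import io


def check_rows(csv_text, mol_column, response_column, regression=True):
    """Return (line_number, problem) tuples for rows that lack a molecule
    or a usable response value. Column names are supplied by the caller."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        molecule = (row.get(mol_column) or "").strip()
        response = (row.get(response_column) or "").strip()
        if not molecule:
            problems.append((line_number, "missing molecule"))
        if not response:
            problems.append((line_number, "missing response"))
        elif regression:
            # Regression needs a value that parses as a float;
            # classification accepts any non-empty string.
            try:
                float(response)
            except ValueError:
                problems.append((line_number, "response is not a number"))
    return problems


# Hypothetical sample data, for illustration only.
sample = """SMILES,Solubility
CCO,0.5
c1ccccc1,-1.6
,2.0
CCN,high
"""

for line_number, problem in check_rows(sample, "SMILES", "Solubility"):
    print(f"line {line_number}: {problem}")
```

Pass regression=False when the response column holds class labels, so that string values are accepted.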

The following floes use only the P1 dataset.

Data preprocessing:

  • Data Processing of Small Molecules for ML Model Building

Machine learning model building:

  • ML Build: Regression Model with Tuner using Fingerprints for Small Molecules

  • ML Build: Classification Model with Tuner using Fingerprints for Small Molecules

  • ML Build: Regression Model using Feature Input

Molecule property prediction using built models:

  • Solubility Prediction for Small Molecules using ML and Cheminfo Fingerprints

  • hERG Toxicity Prediction for Small Molecules using ML and Cheminfo Fingerprints

For the two prediction floes listed last, the response field is not required. However, you can include a float value (for regression) or a string value (for classification) to validate the results against.

For some floes, a second dataset, M1, may be needed (and sometimes a third, T1, for confidence interval calculations on regression models). Each M1 record contains one or more trained machine learning models. To learn how to generate these trained models, read the tutorials whose names begin with ML Build, which cover building models in Orion.

The following floes use an M1 dataset in addition to the P1 dataset.

Machine learning model building:

  • ML ReBuild: Transfer Learn ML Regression Model using Fingerprints for Small Molecules

Molecule property prediction using built models:

  • ML Predict: Regression using Fingerprints for Small Molecules

  • ML Predict: Classification using Fingerprints for Small Molecules

  • ML Predict: Regression using Feature Input
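The two lists of floes can be condensed into a lookup table. The sketch below is only a summary of this section written as a plain Python dictionary. It is not an Orion API, the helper name is hypothetical, and the optional T1 dataset is left out because the text does not say which floes use it.

```python
# Dataset roles per floe, copied from the lists in this section.
REQUIRED_DATASETS = {
    "Data Processing of Small Molecules for ML Model Building": ("P1",),
    "ML Build: Regression Model with Tuner using Fingerprints for Small Molecules": ("P1",),
    "ML Build: Classification Model with Tuner using Fingerprints for Small Molecules": ("P1",),
    "ML Build: Regression Model using Feature Input": ("P1",),
    "Solubility Prediction for Small Molecules using ML and Cheminfo Fingerprints": ("P1",),
    "hERG Toxicity Prediction for Small Molecules using ML and Cheminfo Fingerprints": ("P1",),
    "ML ReBuild: Transfer Learn ML Regression Model using Fingerprints for Small Molecules": ("P1", "M1"),
    "ML Predict: Regression using Fingerprints for Small Molecules": ("P1", "M1"),
    "ML Predict: Classification using Fingerprints for Small Molecules": ("P1", "M1"),
    "ML Predict: Regression using Feature Input": ("P1", "M1"),
}


def missing_datasets(floe_name, available):
    """Return the dataset roles a floe needs that are not in `available`."""
    return [role for role in REQUIRED_DATASETS[floe_name] if role not in available]


# With only a molecule dataset on hand, a predict floe still needs a model dataset.
print(missing_datasets("ML Predict: Regression using Feature Input", {"P1"}))
```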

Run the ML Model Building Floes

For all floes, follow these steps to find and launch your desired floe.

Figure 1. List of floes for the OpenEye Model Building package.

  • Click the “Floe” button on the navigation bar to reach the Floe page.

  • Click on the Floes tab.

  • From the Categories Floe Filters, click the “Packages” drop-down to expand the list of packages, then select the OpenEye Model Building package.

  • A list of the ML floes will now be visible to the right. Click on the one you would like to use.

  • Alternatively, you can enter the name of the desired floe in the search bar.

  • Click “Launch Floe” for your desired floe, and a Job Form will pop up. Specify the parameter settings as indicated for each tutorial.

  • Click “Start Job” to begin the floe.