Tutorial: Create and Inspect a Reaction and Reagent Database

The reaction and reagent database is an essential part of the workflow. It contains a set of building block structures that have been indexed for particular transformations, using logic found in the reactions and reagents definitions file. For lead optimization applications, users are strongly encouraged to index their own reaction and reagent database, using either their in-house collection of building blocks or vendor catalogs of specific interest to them. Doing so makes the ideas generated at the end of the process actionable, because they are built from building blocks the user can actually obtain.

The default databases included in Organization Data on Orion are “sample” databases. They let users experiment with the generative floes, and they are small and curated while still containing reactants that can be used in supported transformations. If you use these databases, expect only a few ideas, not a dataset that exhaustively enumerates possible analogs.

For this tutorial, we will index a reaction and reagent database using the building blocks catalog from Mcule, which is publicly available on its website. We will use the following floes:

  • Reaction & Reagent Database – Create from BULK SMILES

  • Reaction & Reagent Database – Directory Listing


Download the Building Blocks Catalog

  1. Navigate to https://mcule.com/database/.

  2. Download the 2D .smi file of the Mcule supplier building blocks catalog. As of the date shown in Figure 1, the file contained approximately five million structures.


Figure 1. Table of available databases on the Mcule database website with the chosen database highlighted.

  3. Upload the decompressed SMILES file to an Orion project. By default, the upload ETL floes automatically convert the imported file to a dataset. Datasets are not designed to hold more than 100,000 records, and converting the SMILES file to a dataset is not necessary for this workflow, so disable the automatic conversion in the “Upload to Orion” window: click “Show Advanced Options” and choose None for the Processing Method (Figure 2).


Figure 2. The “Upload to Orion” window with the option None selected for the Processing Method.
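Before uploading, you may want to confirm locally how large the catalog is and whether it exceeds the 100,000-record dataset limit. The Python sketch below assumes the download is gzip-compressed (for example, a hypothetical `mcule_bb.smi.gz`) and that each non-blank line holds one structure; the function names are illustrative helpers, not part of Orion.

```python
import gzip
import shutil
from pathlib import Path

# Orion datasets are not designed for more than this many records.
DATASET_RECORD_LIMIT = 100_000


def decompress_smi(gz_path: Path, out_path: Path) -> Path:
    """Decompress a gzip-compressed SMILES file to plain text."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        # copyfileobj streams in chunks, so the whole file is never held in memory.
        shutil.copyfileobj(src, dst)
    return out_path


def count_smiles_records(smi_path: Path) -> int:
    """Count records, assuming one structure per non-blank line."""
    with open(smi_path, encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.strip())


def exceeds_dataset_limit(n_records: int) -> bool:
    """True if the file is too large to be converted to a dataset."""
    return n_records > DATASET_RECORD_LIMIT
```

For a catalog of roughly five million structures, `exceeds_dataset_limit(count_smiles_records(...))` returns True, which is consistent with choosing None as the Processing Method.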

Create a Reaction and Reagent Database

Choose the Reaction & Reagent Database – Create from BULK SMILES Floe and click “Launch Floe” to bring up the Job Form. Specify the parameters as described below.

Input Parameters

  • Reaction Definition File: This file resource can be found in Organization Data on the Orion Data page, using this path: Organization Data / OpenEye Data / Generative Design Data. Select 2024_2_sample_reaction_classification.txt.

  • SMI File Resource: Choose the Mcule SMILES file as the input. If you did not upload it earlier, you can do so now by following Step 3 above.

  • Output Reaction & Reagent Database Name: Give the output database a recognizable name that reflects the input structure source.

Filtering and Processing Options

  • Functional Group Transformation: Turn this Off. If it is On, the floe may generate new building blocks through simple functional group transformations, creating reagents that must first be prepared in the lab before they can be used. For example, an ester can be converted to a carboxylic acid, resulting in a new molecule that is not listed in the catalog.

Structure Normalization Options

  • Strip Salts: Turn this On to ensure that you will use the largest (and most relevant) fragment in a salt. For example, the floe will consider only the carboxylate in a sodium carboxylate salt.

  • Neutralize Charges: Turn this On. If some building blocks carry charges (e.g., benzoxonium chloride), neutralizing them facilitates functional group recognition.
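The idea behind Strip Salts can be illustrated with a deliberately crude Python sketch that splits a SMILES string on “.” and keeps the fragment with the most atoms. This is an approximation for illustration only, assuming a simple regular-expression atom tokenizer that ignores implicit hydrogens; it is not the floe's normalization code, and it does not attempt charge neutralization.

```python
import re

# Crude SMILES atom tokenizer (illustrative assumption): bracket atoms first,
# then two-letter halogens, then organic-subset atoms (aliphatic and aromatic).
_ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def heavy_atom_count(fragment: str) -> int:
    """Approximate atom count of one SMILES fragment (implicit hydrogens ignored)."""
    return len(_ATOM.findall(fragment))


def strip_salts(smiles: str) -> str:
    """Keep only the largest dot-separated fragment, dropping counterions."""
    fragments = smiles.split(".")
    # max() returns the first fragment on ties, so input order breaks ties.
    return max(fragments, key=heavy_atom_count)
```

For a sodium carboxylate such as `CC(=O)[O-].[Na+]`, the sketch keeps `CC(=O)[O-]`, matching the behavior described for the Strip Salts option.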

Check that the parameters in your Job Form are as shown in Figure 3, then click “Start Job” to begin the floe.


Figure 3. Filled parameters of the Job Form for the Reaction & Reagent Database – Create from SMILES Floe.

Inspect a Reaction and Reagent Database: Generating a Directory

Once generated, a reaction and reagent database can be used as input for any of the reaction-based generative design floes. At this point, the database exists as an Orion file resource. It does not need to be regenerated unless you want to change the indexing logic or the reagents it contains. In this sense, creating the database is “one-stop shopping”: there is no need to return to the chemical structure sources to filter by functional groups, SMARTS patterns, and so on, because the indexing floe has already done this for all transformations. However, the database is a static file that cannot easily be altered or filtered.

The database can be inspected in a couple of different ways. The following steps will help you get a sense of what the database contains and extract the reagents classified for a particular transformation. If desired, you can then apply filters to pull only the reagents of interest into a smaller, more focused database.

  1. Navigate to the Floe page and open the OpenEye Generative Design – Advanced Floe Package. Select the Reaction & Reagent Database – Directory Listing Floe.

  2. Select the Mcule database file resource that you just created. This is the only input the floe requires. Click “Start Job”.

  3. To review the results, navigate to the Jobs tab and click the floe you just ran. Select the Floe Report tab to view the generated directory. The report shows an exemplar of each reaction in the database and the number of reagents of each type classified within the database. The Product Count column shows the number of compounds that would be enumerated if all reagents of each type within the database were allowed to react with one another (Figure 4).


Figure 4. Floe Report of the Reaction and Reagent Database – Directory Listing Floe.
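The Product Count column can be read as a simple combinatorial product. The Python sketch below assumes, as the description of the column suggests, that the count for a reaction equals the product of the number of reagents available for each reactant position. Reactions in which two positions draw on the same reagent pool might be counted differently by the floe, and the reaction names and counts in the example are hypothetical.

```python
from math import prod
from typing import Dict, Sequence


def product_count(reagent_counts: Sequence[int]) -> int:
    """Products from combining every reagent at each reactant position with
    every reagent at every other position (full combinatorial enumeration)."""
    if any(n < 0 for n in reagent_counts):
        raise ValueError("reagent counts must be non-negative")
    return prod(reagent_counts)


def directory_product_counts(reactions: Dict[str, Sequence[int]]) -> Dict[str, int]:
    """Apply product_count to each reaction in a directory-style summary."""
    return {name: product_count(counts) for name, counts in reactions.items()}


# Hypothetical reagent counts, not taken from a real Floe Report.
example = {
    "two-component reaction (hypothetical)": [1_200, 800],
    "three-component reaction (hypothetical)": [50, 40, 30],
}
print(directory_product_counts(example))
```

For example, a two-component reaction with 1,200 reagents of one type and 800 of the other gives 1,200 × 800 = 960,000 products, which shows how quickly the count grows with catalog size.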

In addition to surveying the reagents of the entire database, you can inspect and retrieve the reagents for a particular reaction using the Reaction & Reagent Database – Retrieve Reaction Reagents Floe. If there are fewer than 100,000 reagents, the floe can output them as a dataset. If there are more than 100,000, it can instead generate a SMILES or CSV file that you can download to your local machine.
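If you choose a SMILES file, a short Python sketch such as the following can load it on your local machine for further filtering. It assumes the common .smi layout (a SMILES string followed by an optional title, separated by whitespace). `read_smi` and `choose_output` are hypothetical helpers, and this tutorial does not state how the floe treats a count of exactly 100,000.

```python
from pathlib import Path
from typing import Iterator, Tuple

DATASET_RECORD_LIMIT = 100_000


def choose_output(n_reagents: int) -> str:
    """Suggest 'dataset' or 'file' output (boundary handling is an assumption)."""
    return "dataset" if n_reagents < DATASET_RECORD_LIMIT else "file"


def read_smi(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (SMILES, title) pairs: SMILES first, optional title after whitespace."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.split(maxsplit=1)
            if not parts:
                continue  # skip blank lines
            title = parts[1].strip() if len(parts) > 1 else ""
            yield parts[0], title
```

Because `read_smi` is a generator, it processes one line at a time, which keeps memory use flat even for reagent lists well beyond the dataset limit.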