Reaction & Reagent Database - Create from BULK SMILES

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Solution-based/Hit to Lead/Generative Design/Reaction-based Libraries

  • Task-based/Library Prep & Design/Reaction-based Enumeration

  • Role-based/Cheminformatician/Medicinal Chemistry Support

  • Role-based/Cheminformatician/Corporate Collection Support

Description

This floe populates a Reaction & Reagent Database from a previously uploaded SMI file resource. It is intended for large input SMI file resources, generally 100M records or more.

Launching the Floe

The floe requires a valid reaction definition file that defines the reactions and associated reagent chemistry used to classify input structures. A sample reaction definition file, 2024_2_sample_reaction_classification.txt, is available from the OpenEye Organization resources. At the time of this release, documentation for generating a custom version of this file is not available. If you need to create one, contact OpenEye Support (support@eyesopen.com) for additional details.

The name of the output Reaction & Reagent Database should follow a scheme that serves as a reminder of the SMI source and the filtering used for structure selection. The generated Reaction & Reagent Database is an Orion file resource, created with the floe user’s credentials in the specified output folder.

Promoted Parameters

Title in user interface (promoted name)

Input Parameters

Reaction Definition File (rxndefs): The name of the file resource containing the reaction definitions.

  • Required

  • Type: file_in

SMI File Resource (SMI_file): A previously uploaded SMILES (.smi) or CXSMILES (.cxsmiles) file resource. This should be an Orion file resource that minimally provides a SMILES and a unique ID for each structure. Note that the default Orion ETL (conversion to dataset) activity should be suppressed for such file uploads.

  • Type: file_in
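
The requirement that each record provide a SMILES and a unique ID can be checked locally before upload. The following is a minimal sketch, not part of the floe: it assumes a plain .smi layout in which each line holds a SMILES string followed by whitespace and an ID, and the function name check_smi_lines is hypothetical. It does not parse CXSMILES extensions and does not validate the chemistry of the SMILES itself.

```python
from typing import Iterable, List, Tuple


def check_smi_lines(lines: Iterable[str]) -> Tuple[int, List[str]]:
    """Return (number of valid records, list of problem messages).

    Assumption: each record is "<SMILES><whitespace><ID>" on one line.
    """
    seen_ids = set()
    problems: List[str] = []
    valid = 0
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are skipped, not reported
        parts = line.split(None, 1)  # first token is the SMILES, the rest is the ID
        if len(parts) < 2:
            problems.append(f"line {lineno}: missing ID")
            continue
        mol_id = parts[1].strip()
        if mol_id in seen_ids:
            problems.append(f"line {lineno}: duplicate ID {mol_id!r}")
            continue
        seen_ids.add(mol_id)
        valid += 1
    return valid, problems


if __name__ == "__main__":
    sample = ["CCO ethanol", "c1ccccc1 benzene", "CCN", "CCC ethanol"]
    count, issues = check_smi_lines(sample)
    print(count, issues)
```

For files in the 100M record range, the in-memory set of IDs becomes large; a sorted-file or on-disk approach may be more practical at that scale.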

Output Parameters

Output Reaction & Reagent Database Name (rrdb_output): Name for the Orion file resource being generated.

  • Required

  • Type: file_out

Filtering & Processing Options

Functional Group Transformations (enablefngroups): If ON, allows interconversion of simple functional groups per the reaction definitions during reagent classification.

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Molecule Filtering (filter_mols): If ON, performs filtering of reagents prior to classification.

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Custom Filter File (filter_file): A filter file resource to load (supersedes the default).

  • Type: file_in

Filter Summary Report (filter_summary): If ON, generates a summary report of the rules applied to the filtered molecules.

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Structure Normalization Options

Strip Salts (saltchop): If ON, retains only the largest fragment from each input structure prior to indexing.

  • Type: boolean

  • Default: False

  • Choices: [True, False]
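
To illustrate what retaining the largest fragment means, here is a rough sketch in plain Python. It is not the floe's implementation: it assumes that "largest" means the most atoms, approximates the atom count with a regular expression over the SMILES text (every bracket atom counts once, including explicit hydrogens), and treats every "." as a component separator. The helper names are hypothetical; a cheminformatics toolkit would do this more reliably.

```python
import re

# One atom token: a bracket atom, a two-letter organic-subset element,
# or a one-letter aliphatic or aromatic organic-subset element.
_ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def atom_count(fragment: str) -> int:
    """Approximate number of atoms written in a SMILES fragment."""
    return len(_ATOM.findall(fragment))


def largest_fragment(smiles: str) -> str:
    """Return the dot-separated component with the most atoms.

    On a tie, the first such component is returned.
    """
    fragments = smiles.split(".")
    return max(fragments, key=atom_count)


if __name__ == "__main__":
    # Sodium acetate: the acetate anion is kept, the sodium counterion is dropped.
    print(largest_fragment("CC(=O)[O-].[Na+]"))
```

Note that the retained fragment may still carry a formal charge, which is what the Neutralize Charges option addresses.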

Neutralize Charges (neutralize): If ON, removes all formal charges except those on quaternary amines and adjusts hydrogen counts accordingly.

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Valence Handling (fixvalences): How to handle valence issues for input building blocks. ignore: ignore valence errors; fix: attempt to repair valence issues; reject: reject structures with valence errors. Structures that still have valence issues after an attempted fix are also rejected.

  • Type: string

  • Default: fix

  • Choices: [‘ignore’, ‘fix’, ‘reject’]

Advanced Options

Record Batch Size (batchsize): The number of records to emit in each shard/block.

  • Required

  • Type: integer

  • Default: 10000
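
As an illustration of how a batch size partitions a record stream, the sketch below groups an iterable into blocks of at most batchsize items. This is a generic batching pattern, not the floe's internal code; the function name batched is hypothetical, and only the default of 10000 is taken from the parameter above.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(records: Iterable[T], batchsize: int = 10000) -> Iterator[List[T]]:
    """Yield successive lists of up to `batchsize` records."""
    if batchsize < 1:
        raise ValueError("batchsize must be at least 1")
    it = iter(records)
    while True:
        block = list(islice(it, batchsize))
        if not block:
            return  # input exhausted
        yield block


if __name__ == "__main__":
    # 25 records with a batch size of 10 give blocks of 10, 10, and 5.
    print([len(block) for block in batched(range(25), batchsize=10)])
```

A larger batch size means fewer, larger blocks; a smaller one means more blocks with less data in each.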

SMI Disk Space Limit (smidiskspace): The minimum amount of disk space (in MiB) needed to accommodate the input bulk SMILES file.

  • Required

  • Type: decimal

  • Default: 20000
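
Because this value is given in MiB, it can help to measure the local SMILES file before upload. The following sketch converts a file size to MiB (1 MiB = 1024 × 1024 bytes) and rounds up with some headroom. The function names and the 20% headroom factor are assumptions for illustration only; the floe does not document a recommended margin.

```python
import math
import os


def size_in_mib(path: str) -> float:
    """Size of the file at `path` in MiB (1 MiB = 1024 * 1024 bytes)."""
    return os.path.getsize(path) / (1024 * 1024)


def suggested_smidiskspace(path: str, headroom: float = 1.2) -> int:
    """File size in MiB times a headroom factor, rounded up.

    The headroom factor is a hypothetical safety margin, not a documented value.
    """
    return math.ceil(size_in_mib(path) * headroom)


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(suggested_smidiskspace(sys.argv[1]))
```

If the result exceeds the default of 20000, the parameter would need to be raised accordingly.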

Classifier Memory Limit (classifiermem): The memory limit for the reaction classifier. It may need to be increased for large R&R Databases.

  • Required

  • Type: decimal

  • Default: 10240

Verbosity (verbosity): Sets the output logging verbosity.

  • Type: string

  • Default: warning

  • Choices: [‘info’, ‘warning’, ‘error’, ‘debug’, ‘ddebug’]