Reaction & Reagent Database - Reaction Definition Validator

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Role-based/Cheminformatician/Medicinal Chemistry Support

Description

This floe runs a series of validations on the provided Reaction & Reagent Database to verify and diagnose issues with the reaction definition file used to create the database.

Required Inputs:

The required input is a Reaction & Reagent Database generated by one of the following floes: Reaction & Reagent Database - Create from Dataset, Reaction & Reagent Database - Create from SMILES, or Reaction & Reagent Database - Create from ZINC Download.

The validation output is a floe report with validation status for each reaction from the reaction definition file and the classified reagents in the database.

Optional Activities:

Reagent Sampling: If “On”, reagents will be sampled from the full set of reagents in the Reaction & Reagent Database. If “Off”, a limited number of reagents are extracted, generally in reagent registration order. Sampling adds overhead because the full set of reagents must be extracted, but it can provide a better overview of the variations in the chemistry of the reagents.

Reaction Validations: This parameter can be used to provide a list of reaction IDs (space- or comma-delimited) for the validation activity, or left blank to process all the reactions from the Reaction & Reagent Database.
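As an illustration of the delimiter rule above, the following minimal sketch splits a space- or comma-delimited string into reaction IDs. The function name parse_reaction_ids is hypothetical and is not part of the floe; it only shows one way such a list could be interpreted, with an empty string standing for "process all reactions".

```python
import re


def parse_reaction_ids(raw: str) -> list[str]:
    """Split a space- or comma-delimited string into reaction IDs.

    Hypothetical helper for illustration only; an empty result
    corresponds to leaving the parameter blank (all reactions).
    """
    # Treat any run of commas and/or whitespace as a single delimiter,
    # then drop empty tokens left by leading or trailing delimiters.
    return [token for token in re.split(r"[,\s]+", raw.strip()) if token]


ids = parse_reaction_ids("Negishi, Stille  Wittig")
print(ids)
```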

Reaction & Reagent Directory: If “On”, generates a floe report listing all the reactions in the database.

Output Generated Products: If “On”, all the valid products generated during the validation will be output to the Generated Products Dataset. If “Off”, just a count of the products generated is provided.

Output Unreacted Reagents: If “On”, reagents that fail to transform to products will be output to the Unreacted Reagents Dataset. If “Off”, just a count of the unreacted reagents is provided.

Output Valence Errors: If “On”, all the products that fail valence validations will be output to the Valence Error Dataset. If “Off”, just a count of the valence errors is provided.

Max Reagents: For an N-component reaction, this value raised to the Nth power is the number of reagent combinations used for the validation. Generally, a value less than 1,000 is recommended for 2-component reactions.
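To make the scaling concrete, here is a minimal sketch of the arithmetic described above. The function name validation_count is hypothetical; it only restates the "value to the Nth power" rule and does not reflect how the floe is implemented internally.

```python
def validation_count(max_reagents: int, n_components: int) -> int:
    """Number of reagent combinations tried for one reaction.

    Hypothetical helper: Max Reagents raised to the number of
    reaction components, as stated in the parameter description.
    """
    return max_reagents ** n_components


# Default Max Reagents (10) on a 2-component reaction.
print(validation_count(10, 2))    # 100
# The suggested upper bound for 2-component reactions.
print(validation_count(1000, 2))  # 1000000
```

Because the count grows as a power of the component count, a setting that is comfortable for 2-component reactions can become expensive for reactions with more components.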

Strict Valences: This parameter modifies the behavior of the selected Check Valences option by rejecting valence errors either outright or after a valence repair is attempted.

Max [Pass|Valence|Fail] Results: Caps the number of output structures for each category.

Sample Seed: A specific random seed (generally a 6-digit integer) can be provided to allow reproducibility of the floe when Reagent Sampling is employed.
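The interaction between Reagent Sampling, Max Reagents, and Sample Seed can be sketched as follows. This is an assumption-laden illustration using Python's standard random module, not the floe's actual sampling code; select_reagents and its arguments are hypothetical names.

```python
import random


def select_reagents(reagents, max_reagents, sampling=False, seed=None):
    """Pick the reagents used for validation (illustrative only).

    sampling=False: take the first max_reagents in their stored order.
    sampling=True:  draw max_reagents at random from the full set;
                    a fixed seed makes the draw repeatable.
    """
    if not sampling:
        return list(reagents[:max_reagents])
    rng = random.Random(seed)  # private generator, so global state is untouched
    k = min(max_reagents, len(reagents))
    return rng.sample(list(reagents), k)


pool = [f"R{i}" for i in range(50)]
first = select_reagents(pool, 5, sampling=True, seed=123456)
second = select_reagents(pool, 5, sampling=True, seed=123456)
print(first == second)  # True: same seed, same sample
```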

Logging Verbosity: Generally only warning level verbosity is recommended, but specific problems with the floe may need information level reporting for support tickets.

Promoted Parameters

Title in user interface (promoted name)

Inputs

Reaction & Reagent Database (rxndb): The name of the Reaction & Reagent Database to use.

  • Required

  • Type: file_in

Reaction Selection (reactions): One or more reaction selections from the sample reaction database list or All for every reaction.

  • Type: string

  • Default: [‘All’]

  • Choices: [‘All’, ‘3-nitrile-pyridine’, ‘Buchwald-Hartwig’, ‘Buchwald_cross_coupling1’, ‘Buchwald_cross_coupling2’, ‘Ester_hydrolysis-amide_synthesis1’, ‘Ester_hydrolysis-amide_synthesis2’, ‘Grignard_alcohol’, ‘Grignard_carbonyl’, ‘Heck_non-terminal_vinyl’, ‘Heck_terminal_vinyl’, ‘Huisgen_disubst-alkyne’, ‘Mitsunobu_imide’, ‘Mitsunobu_phenol’, ‘Mitsunobu_sulfonamide’, ‘Mitsunobu_tetrazole_1’, ‘Mitsunobu_tetrazole_2’, ‘N-alkylation1’, ‘N-alkylation2’, ‘N-arylation_heterocycles’, ‘Negishi’, ‘Niementowski_quinazoline’, ‘O-alkylation’, ‘O-biarylation’, ‘Pictet-Spengler’, ‘Reductive_amination1’, ‘Reductive_amination2’, ‘Schotten-Baumann_amide’, ‘SnAr1’, ‘SnAr2’, ‘Sonogashira’, ‘Stille’, ‘Suzuki_cross_coupling’, ‘Wittig’, ‘benzimidazole_derivatives_aldehyde’, ‘benzimidazole_derivatives_carboxylic-acid/ester’, ‘benzofuran’, ‘benzothiazole’, ‘benzothiophene’, ‘benzoxazole_arom-aldehyde’, ‘benzoxazole_carboxylic-acid’, ‘decarboxylative_coupling’, ‘heteroaromatic_nuc_sub’, ‘imidazole’, ‘indole’, ‘nucl_sub_aromatic_ortho_nitro’, ‘nucl_sub_aromatic_para_nitro’, ‘oxadiazole’, ‘phthalazinone’, ‘piperidine_indole’, ‘pyrazole’, ‘spiro-chromanone’, ‘sulfon_amide’, ‘tetrazole_connect_regioisomere_1’, ‘tetrazole_connect_regioisomere_2’, ‘tetrazole_terminal’, ‘thiazole’, ‘triaryl-imidazole’, ‘urea’]

Custom Reaction Names (customreactions): One or more reactions from the Reaction & Reagent Database (blank delimited). Any value for this parameter supersedes a Reaction Selection above.

  • Type: string
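The precedence between Reaction Selection and Custom Reaction Names can be sketched as below. The function resolve_reactions and the available list are hypothetical and only illustrate the stated rule that any Custom Reaction Names value supersedes the Reaction Selection.

```python
def resolve_reactions(selection, custom, available):
    """Decide which reactions to validate (illustrative only).

    selection: list of Reaction Selection choices, e.g. ["All"]
    custom:    blank-delimited Custom Reaction Names string
    available: reaction names present in the database
    """
    custom_names = custom.split()
    if custom_names:
        # Any custom value supersedes the Reaction Selection.
        return custom_names
    if "All" in selection:
        return list(available)
    return list(selection)


available = ["Negishi", "Stille", "Wittig"]
print(resolve_reactions(["All"], "", available))
print(resolve_reactions(["All"], "Wittig Negishi", available))
```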

Max Reagents (maxreagents): Limit the number of validation reagents to this value. For a two-component reaction, the number of reaction validations performed is the square of this value.

  • Type: integer

  • Default: 10

Reaction & Reagent Directory (rxndir): If ON, generates a directory listing for the Reaction & Reagent Database.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Output Product Options

Output Generated Products (pass): If OFF, just counts product records, but does not output them.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Max Pass Results (maxpass): Output limited to this number of passing validation results for each reaction validated, or 0 for all.

  • Type: integer

  • Default: 10
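The "0 for all" convention shared by the Max Pass Results, Max Fail Results, Max Valence Errors, and Max Half Reaction Errors parameters can be sketched as follows. The helper cap_results is hypothetical and assumes results are held in a simple list per reaction.

```python
def cap_results(results, max_results):
    """Apply a per-reaction output cap (illustrative only).

    max_results == 0 means no cap; otherwise keep at most
    max_results entries in their original order.
    """
    if max_results == 0:
        return list(results)
    return list(results[:max_results])


products = [f"product_{i}" for i in range(25)]
print(len(cap_results(products, 10)))  # 10
print(len(cap_results(products, 0)))   # 25
```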

Generated Products Dataset (output): Output dataset containing generated products.

  • Required

  • Type: dataset_out

  • Default: Generated_products

Output Failure Options

Output Unreacted Reagents (fail): If OFF, just counts unreacted reagents, but does not output them.

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Max Fail Results (maxfail): Output limited to this number of failure results for each reaction validated, or 0 for all.

  • Type: integer

  • Default: 10

Unreacted Reagents Dataset (failures): Output dataset containing input failures and reagents that failed to react.

  • Required

  • Type: dataset_out

  • Default: Unreacted_reagents

Output Valence Failure Options

Output Valence Errors (valfail): If OFF, just counts valence errors, but does not output them.

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Max Valence Errors (maxvalerr): Output limited to this number of valence errors for each reaction validated, or 0 for all.

  • Type: integer

  • Default: 10

Valence Error Dataset (valfailures): Output dataset containing products with valence errors.

  • Required

  • Type: dataset_out

  • Default: Valence_errors

Output Half-reaction Failure Options

Output Half Rxn Errors (halfrxnfail): If OFF, just counts half reaction validation errors, but does not output them.

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Max Half Reaction Errors (maxhalfrxnerr): Output limited to this number of half reaction errors for each reaction validated, or 0 for all.

  • Type: integer

  • Default: 10

Half Rxn Error Dataset (halfrxnfailures): Output dataset with half reaction validation errors.

  • Required

  • Type: dataset_out

  • Default: Halfrxn_errors

Advanced Options

Strict Valences (strictval): If Check Valences is active, any valence issues found after the transformation is applied terminate further application.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Check Valences (valence): How to handle valence issues for the generated products.

  • Required

  • Type: string

  • Default: reject

  • Choices: [‘reject’, ‘allow’, ‘fix’]

Sample Seed (seed): Uses the specified seed for any sampling activities.

  • Type: integer

Reagent Sampling (reagsampling): If ON (slower), samples the number of reagents specified by the Max Reagents parameter from the full set of reagents. If OFF (faster), the first Max Reagents reagents are used for the validation.

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Classifier Memory Limit (classifiermem): The memory limit for the reaction classifier. It may need to be increased for large R&R Databases.

  • Required

  • Type: decimal

  • Default: 10240

Logging Verbosity (verbose): How much logging output to generate during validation activities.

  • Type: string

  • Default: warning

  • Choices: [‘info’, ‘warning’, ‘error’, ‘debug’, ‘ddebug’]