Reaction & Reagent Database Validator

This floe performs a series of validations on the provided reaction & reagent database to detect and diagnose issues with the reaction definition file used to create the database.

Required Inputs:

A reaction & reagent database generated by either the Dataset to Reaction & Reagent Database or ZINC Download to Reaction & Reagent Database floes is required for input.

The validation output is a floe report with validation status for each reaction from the reaction definition file and the classified reagents in the database.

Optional Activities:

Reagent Sampling: If ON, reagents will be sampled from the full set of reagents in the reaction & reagent database. If OFF, a limited number of reagents are extracted, generally in reagent registration order. Sampling adds overhead because the full set of reagents must be extracted, but it can provide a better overview of the variations in chemistry of the reagents.

Reaction Validations: This parameter can be used to provide a list of reaction ids (space- or comma-delimited) for the validation activity, or blank to process all the reactions from the reaction & reagent database.
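As an illustration of the accepted input format, the sketch below splits a space- or comma-delimited string into reaction ids. The function name `parse_reaction_ids` is hypothetical and is not part of the floe; it only shows how such a value could be interpreted, with an empty result standing for "process all reactions".

```python
import re


def parse_reaction_ids(raw: str) -> list[str]:
    """Split a space- or comma-delimited string into reaction ids.

    Hypothetical helper, not the floe's own parser. A blank string yields
    an empty list, which a caller could treat as "process all reactions".
    """
    # Split on any run of commas and/or whitespace, then drop empty tokens.
    return [token for token in re.split(r"[,\s]+", raw.strip()) if token]


if __name__ == "__main__":
    print(parse_reaction_ids("101, 102 103,104"))
    print(parse_reaction_ids("   "))
```

Mixed delimiters are tolerated here, so `101, 102 103` and `101,102,103` give the same list.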

Reaction & Reagent Directory: If ON, generates a floe report listing all the reactions in the database.

Output Generated Products: if ON, all the valid products generated during the validation will be output to the Generated Products Dataset. If OFF, just a count of the products generated is provided.

Output Unreacted Reagents: If ON, reagents that fail to transform to products will be output to the Unreacted Reagents Dataset. If OFF, just a count of the unreacted reagents is provided.

Output Valence Errors: If ON, all the products that fail valence validations will be output to the Valence Error Dataset. If OFF, just a count of the valence errors is provided.

Max Reagents: For an N-component reaction, up to this value raised to the Nth power reagent combinations will be used for the validation. A value below 1000 is generally recommended for 2-component reactions.
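To see how quickly the workload grows with this setting, the following sketch estimates an upper bound on the number of reagent combinations. It assumes (this is not stated in the floe description) that each component contributes at most Max Reagents reagents, or fewer if the database holds fewer; `validation_upper_bound` is a hypothetical name.

```python
from math import prod


def validation_upper_bound(max_reagents: int, reagents_per_component: list[int]) -> int:
    """Estimate the largest number of reagent combinations to validate.

    Assumption: each component is capped at `max_reagents`, or at the number
    of reagents available for it if that is smaller. When every component has
    at least `max_reagents` reagents this equals max_reagents ** N.
    """
    return prod(min(max_reagents, available) for available in reagents_per_component)


if __name__ == "__main__":
    # 2-component reaction with plenty of reagents on both sides: 1000 ** 2
    print(validation_upper_bound(1000, [5000, 5000]))
    # 3-component reaction: even 100 reagents per component gives 100 ** 3
    print(validation_upper_bound(100, [500, 500, 500]))
```

Under these assumptions a 2-component reaction with Max Reagents of 1000 can reach one million combinations, which is why smaller values are advisable as N grows.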

Strict Valences: This parameter modifies the behavior of the selected Check Valences option: valence errors are either rejected outright, or rejected only after a valence repair has been attempted.

Max [Pass|Valence|Fail] Results: Caps the number of output structures for each category.

Sample Seed: A specific random seed (generally a 6-digit integer) can be provided to allow reproducibility of the floe when Reagent Sampling is employed.
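The difference between registration-order extraction and seeded sampling can be sketched as follows. This is only an illustration using Python's standard `random` module; the floe's actual sampling method is not documented here, and `select_reagents` and its arguments are hypothetical names.

```python
import random


def select_reagents(reagents, max_reagents, sampling=False, seed=None):
    """Pick up to `max_reagents` reagents from a list.

    Hypothetical sketch: with sampling off, take the first entries in their
    stored (registration) order; with sampling on, draw a random subset.
    Passing the same seed reproduces the same subset.
    """
    if not sampling:
        return list(reagents[:max_reagents])
    rng = random.Random(seed)  # private generator, global state untouched
    count = min(max_reagents, len(reagents))
    return rng.sample(list(reagents), count)


if __name__ == "__main__":
    reagents = [f"reagent_{i}" for i in range(100)]
    print(select_reagents(reagents, 5))
    first = select_reagents(reagents, 5, sampling=True, seed=123456)
    second = select_reagents(reagents, 5, sampling=True, seed=123456)
    print(first == second)
```

Using a dedicated `random.Random` instance keeps the draw reproducible for a given seed without affecting any other random number use in the same process.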

Logging Verbosity: generally only warning level verbosity is recommended, but specific problems with the floe may need info level reporting for support tickets.

Extra Required Parameters

  • Output Half Rxn Errors (boolean) : If OFF, just count half reaction validation errors, but do not output them
    Default: False
  • Unreacted Reagents Dataset (dataset_out) : Output dataset containing input failures and reagents that failed to react
    Default: Unreacted_reagents
  • Output Generated Products (boolean) : If OFF, just counts product records, but does not output them
    Default: True
  • Generated Products Dataset (dataset_out) : Output dataset containing generated products
    Default: Generated_products
  • Output Valence Errors (boolean) : If OFF, just counts valence errors, but does not output them
    Default: False
  • Check Valences (string) : How to handle valence issues for the generated products
    Default: reject
    Choices: reject, allow, fix
  • Reagent Validation (Field Type: String) : Field name containing the reagent validation results
    Default: COUNTS
  • Half Reactions (Field Type: String) : Field name containing the half reaction validation result
    Default: HalfRxns
  • Reaction Name (Field Type: String) : Field name containing the reaction to be validated
  • Reaction & Reagent Database (file_in) : The name of the reaction & reagent database to use
  • SMIRKS Validation (Field Type: String) : Field name containing the SMIRKS validation result
    Default: SMIRKS
  • Reagent Classify Stats (Field Type: String) : Field name containing the reagent classify stats
    Default: STATS
  • Output Unreacted Reagents (boolean) : If OFF, just counts unreacted reagents, but does not output them
    Default: False
  • Half Rxn Error Dataset (dataset_out) : Output dataset with half reaction validation errors
    Default: Halfrxn_errors
  • Valence Error Dataset (dataset_out) : Output dataset containing products with valence errors
    Default: Valence_errors