Focused Library - Molecule Input

This floe will apply reactions to the input lead molecule, generating an output dataset of products.

Required Inputs:

Both the reaction & reagent database and an input lead molecule dataset are required.

Required Outputs:

The name of an output dataset should be specified, as the Output Data parameter is ON by default. See discussion of prospective runs below.

Optional Activities:

The Molecule ID Field should generally match the source of the input lead molecules for the reaction & reagent database file. In the case of ZINC as the source, zinc_id is the standard structure id field.

Enabling the DB Listing option will generate a reaction directory floe report from the input reaction & reagent database. This is the same directory that the Reaction & Reagent Database Directory floe provides.

A boolean (Filter Output) enables or disables the specific type of molecule filter selected by Mol Filter.

Compute Molecule Properties holds a small set of pre-selected properties that can be computed on the generated products. This activity can be disabled by removing all the properties from the list.

For prospective and trial activities, the Output Data, Output Failures, and Output Specific Failures booleans, when set to OFF, will provide counts of the outputs from the floe without creating dataset(s). This is useful for validating the input options against a specific input lead molecule dataset prior to running a capture run to generate output dataset(s).
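The count-only behavior described above can be pictured with a minimal sketch. This is not the floe's implementation; the function name route_records and the toy SMILES strings are hypothetical and serve only to show that a record is counted either way and stored only when output is ON.

```python
from typing import Iterable, List, Optional, Tuple


def route_records(records: Iterable[str],
                  write_output: bool) -> Tuple[int, Optional[List[str]]]:
    """Count every record; keep the records only when output is ON.

    Hypothetical helper: it mimics a boolean such as Output Data, it is
    not part of the floe.
    """
    count = 0
    kept: Optional[List[str]] = [] if write_output else None
    for rec in records:
        count += 1
        if kept is not None:
            kept.append(rec)
    return count, kept


# Toy product SMILES, for illustration only.
products = ["CCO", "CCN", "CCC"]

# Trial run: booleans OFF, so only a count comes back.
trial_count, trial_data = route_records(products, write_output=False)

# Capture run: booleans ON, so the records are kept as well.
full_count, full_data = route_records(products, write_output=True)
```

In the trial case the count is available for checking the input options, and no collection of records is built.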

The Check Valences option controls how valence issues in the generated products are handled (reject, allow, or fix). The Strict Valences option controls whether any illegal valence in the product results in rejection from the output products.

The Strict Classification option controls whether lead molecules are classified according to both the required and disallowed chemical features (defined by the reaction & reagent database reactions), or simply by the required features. Turning OFF the strict option may generate alternate (or even surprising) products due to reactions at undesirable sites.
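The difference between strict and non-strict classification can be sketched with plain sets. This is an illustration under stated assumptions, not the classifier shipped in the reaction & reagent database: the feature labels, reagent ids, and the classify function are all hypothetical.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Set


class ReagentRule(NamedTuple):
    """Hypothetical rule: features a reagent must have and must not have."""
    required: FrozenSet[str]
    disallowed: FrozenSet[str]


def classify(features: Set[str],
             rules: Dict[str, ReagentRule],
             strict: bool) -> List[str]:
    """Return the reagent ids whose rules accept the given features."""
    matches = []
    for reagent_id, rule in rules.items():
        # Required features must all be present in either mode.
        if not rule.required <= features:
            continue
        # Disallowed features are only checked in strict mode.
        if strict and (rule.disallowed & features):
            continue
        matches.append(reagent_id)
    return sorted(matches)


# Toy rules and a toy lead carrying both an amine and an acid group.
RULES = {
    "primary_amine": ReagentRule(frozenset({"NH2"}), frozenset({"COOH"})),
    "carboxylic_acid": ReagentRule(frozenset({"COOH"}), frozenset()),
}
lead_features = {"NH2", "COOH"}

strict_hits = classify(lead_features, RULES, strict=True)
loose_hits = classify(lead_features, RULES, strict=False)
```

With the strict option ON, the toy lead is accepted only as a carboxylic acid. With it OFF, the lead is also accepted as a primary amine even though it carries a disallowed group, which is the situation that can lead to reactions at undesirable sites.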

General Considerations

There are three alternative ways to run this floe.

  1. Allow the lead molecules to be auto-classified by reagent type

  2. Provide a specific reagent id (or ids as a space-delimited list) as the reagent type for the lead molecule with validation

  3. Provide a specific reagent id (or ids) as the reagent type for the lead molecule without validation

Approach #1 - the reagent classifier from the provided reaction & reagent database input is used to identify the reagent types for the lead molecules on the fly.

Approach #2 - the user asserts that the provided reagent id (or ids) matches the chemistry of the lead molecule. The reagent classifier from the reaction & reagent database is used to certify that assertion, and only the lead molecules that match the specified reagent chemistry id(s) are sent downstream for processing as those reagents.

Approach #3 - the user asserts that a provided reagent id (or ids) matches the chemistry of the lead molecule. No validation of the classification is attempted, and the lead molecules are used in the provided context of that specific reaction without restriction. If the provided id(s) are incorrect, or the lead molecule does not correspond to the provided id(s), a large number of reaction failures should generally be expected.
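The three approaches can be summarized in a short sketch. It assumes a classifier that maps a lead molecule to a set of reagent ids. The function resolve_reagent_ids, the toy classifier, and the ids "amine" and "acid" are hypothetical and do not come from the floe or the database.

```python
from typing import Callable, List, Optional, Set


def resolve_reagent_ids(lead: str,
                        classifier: Callable[[str], Set[str]],
                        asserted: Optional[str] = None,
                        validate: bool = True) -> List[str]:
    """Decide which reagent ids a lead molecule is processed as."""
    if asserted is None:
        # Approach 1: auto-classify on the fly.
        return sorted(classifier(lead))
    ids = asserted.split()  # ids arrive as a space-delimited list
    if validate:
        # Approach 2: keep only the asserted ids the classifier confirms.
        confirmed = classifier(lead)
        return [rid for rid in ids if rid in confirmed]
    # Approach 3: trust the assertion without any check.
    return ids


# Toy classifier: a lookup table standing in for the database classifier.
TOY_CLASSES = {"lead_A": {"amine"}}


def toy_classifier(lead: str) -> Set[str]:
    return TOY_CLASSES.get(lead, set())


auto = resolve_reagent_ids("lead_A", toy_classifier)
checked = resolve_reagent_ids("lead_A", toy_classifier, "amine acid")
unchecked = resolve_reagent_ids("lead_A", toy_classifier, "amine acid",
                                validate=False)
```

In this sketch an empty result under approach 2 means the lead is not sent downstream. Under approach 3 the unconfirmed id "acid" is passed through, which is where reaction failures would be expected.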

Extra Required Parameters

  • Output Specific Failures (boolean) : If OFF, just counts records, but does not output them
    Default: False
  • Output Dataset (dataset_out) : Output dataset containing generated products
    Default: Reaction_products
  • SMILES Dedupe (boolean) : If ON, performs a deduplication of the product SMILES
    Default: True
  • Filter Output (boolean) : Enable molecule filtering of the generated products (see type specified by [Mol Filter])
    Default: True
  • Reagent Class (Field Type: String) : Name of the string field used to annotate the reagent class
  • Reaction & Reagent Database (file_in) : The name of the reaction & reagent database to use
  • Output Failures (boolean) : If OFF, just counts records, but does not output them
    Default: False
  • Specific Product Failures (dataset_out) : Output dataset containing specific reagent combinations that failed to react
    Default: Product_failures
  • Check Valences (string) : How to handle valence issues for the generated products
    Default: reject
    Choices: reject, allow, fix
  • Reagent Class (Field Type: String) : Name of the string field containing the reagent class
  • Output Data (boolean) : If OFF, just counts records, but does not output them
    Default: True
  • General Failures (dataset_out) : Output dataset containing input failures and reagents that failed to react
    Default: Input_failures
  • Lead Molecule Dataset (data_source) : A dataset containing the lead molecule(s) to be transformed by reactions from the reaction & reagent database
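
As an illustration of the SMILES Dedupe parameter listed above, the following sketch removes repeated product SMILES while keeping the first occurrence of each. It assumes the strings are already canonical, so that identical molecules have identical strings. The function dedupe_smiles is hypothetical and is not the floe's implementation.

```python
from typing import Iterable, List


def dedupe_smiles(smiles_iter: Iterable[str]) -> List[str]:
    """Drop repeated SMILES strings, preserving first-seen order.

    Assumes canonical SMILES; two different strings for the same
    molecule would NOT be merged by this string comparison.
    """
    seen = set()
    unique = []
    for smi in smiles_iter:
        if smi not in seen:
            seen.add(smi)
            unique.append(smi)
    return unique


# Toy product list with one repeat, for illustration only.
deduped = dedupe_smiles(["CCO", "CCN", "CCO"])
```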