Focused Library - Synthon Analogs

This floe performs a single-step retrosynthetic analysis of the input lead molecule(s) and applies the corresponding reaction transformations to generate analog libraries. All applied transforms are provided in the Reaction & Reagent database.

Required Inputs:

Both a reaction & reagent database and an input lead molecule dataset are required. Sample databases are available as File resources in the ‘Organization Data/OpenEye Data/Generative Design Data’ folder as ZINC_2021_2_lowcomplexity.db and ZINC_2021_2_highinterest.db. The former samples ZINC reagents (for each reagent class in the database) with low molecular complexity values, while the latter contains ZINC reagents with high medchem interest scores.

Required Outputs:

The name of an output dataset should be specified, as the Output Data parameter is ON by default. See discussion of prospective runs below.

Optional Activities:

The Molecule ID Field should generally match the source of the input lead molecules for the reaction & reagent database file. When ZINC is the source, zinc_id is the standard structure ID field.

A small set of pre-selected properties, listed under Compute Molecule Properties, can be computed on the generated products. To disable this activity, remove all the properties from the list.

For prospective and trial runs, setting the Output Data, Output Failures, and Output Specific Failures booleans to OFF reports counts of the floe outputs without creating any dataset(s). This is useful for validating the input options against a specific input lead molecule dataset prior to running a capture run that generates the output dataset(s).
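As a rough illustration of the count-only configuration described above, the following Python sketch models the three booleans as a plain dictionary. The keys mirror the parameter titles in this description. The helper name is_count_only is hypothetical, and the dictionary is not the floe's actual job specification format, so the names used when launching the floe programmatically may differ.

```python
# Hypothetical sketch: the keys copy the parameter titles from this
# description; this is NOT the floe's real job specification format.
TRIAL_RUN = {
    "Output Data": False,
    "Output Failures": False,
    "Output Specific Failures": False,
}

# A capture run switches product output back on (ON is the stated default).
CAPTURE_RUN = {**TRIAL_RUN, "Output Data": True}


def is_count_only(params):
    """Return True when none of the three output booleans is ON."""
    flags = ("Output Data", "Output Failures", "Output Specific Failures")
    return not any(params[name] for name in flags)


print(is_count_only(TRIAL_RUN))
print(is_count_only(CAPTURE_RUN))
```

With all three booleans OFF the helper reports a count-only run. Turning any one of them ON means at least one dataset would be written.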

The Check Valences and Strict Valences options control whether products with valence issues are rejected, allowed, or fixed, and whether any illegal valence in a product results in its rejection from the output products.

The Strict Classification option controls whether lead molecules are classified according to both the required and disallowed chemical features (defined by the reaction & reagent database reactions), or simply by the required features. Turning OFF the strict option may generate alternate (or even surprising) products due to reactions at undesirable sites.

The Fragmentation Size option constrains the size of the reagents generated by applying the retro reaction transformation(s). It is specified as a heavy-atom percentage of the input molecules: a smaller value allows smaller reagents, and a larger value requires larger reagents.
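The effect of a heavy-atom percentage constraint can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, heavy-atom counts are passed in as plain integers, and the threshold is treated as an inclusive minimum for every reagent fragment. The comparison the floe actually uses is not documented here and may differ.

```python
def passes_fragment_size(fragment_heavy_atoms, lead_heavy_atoms, min_percent):
    """Check one fragment against a minimum heavy-atom percentage.

    Assumption: the threshold is an inclusive lower bound, expressed as a
    percentage of the lead molecule's heavy-atom count.
    """
    if lead_heavy_atoms <= 0:
        raise ValueError("lead molecule must have at least one heavy atom")
    percent = 100.0 * fragment_heavy_atoms / lead_heavy_atoms
    return percent >= min_percent


def retro_split_allowed(fragment_sizes, lead_heavy_atoms, min_percent):
    """Accept a retro split only if every fragment is large enough."""
    return all(
        passes_fragment_size(size, lead_heavy_atoms, min_percent)
        for size in fragment_sizes
    )


# A 30-heavy-atom lead split into fragments of 24 and 6 heavy atoms.
# The smaller fragment is 6 / 30 = 20% of the lead.
print(retro_split_allowed([24, 6], 30, 10.0))  # True
print(retro_split_allowed([24, 6], 30, 25.0))  # False
```

In this sketch, raising the threshold from 10 to 25 rejects the split because the smaller fragment is only 20% of the lead, which mirrors the "larger value requires larger reagents" behavior described above.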

General Considerations

If specific reagents or reactions are specified, the analysis of the input lead molecule will be restricted to those reactions.

If one or more reagent classes are specified and the retrosynthetic analysis of the input molecule is productive for that reaction, the unspecified reagent of the reaction is kept fixed, and the specified reagent is varied based on reagents sampled from the Reaction & Reagent database.
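The fixed-versus-varied behavior described above can be pictured with a small Python sketch. It performs no chemistry: reagents are plain SMILES strings, and the class names (amine, carboxylic_acid) and the function analog_reagent_combinations are hypothetical placeholders rather than names taken from the Reaction & Reagent database.

```python
def analog_reagent_combinations(retro_reagents, specified_class, sampled_by_class):
    """Build reagent combinations for one productive retro reaction.

    retro_reagents:   reagent class -> reagent obtained from the lead molecule
    specified_class:  the reagent class the user asked to vary
    sampled_by_class: reagent class -> reagents sampled from the database
    """
    if specified_class not in retro_reagents:
        # The reaction does not involve the specified class: nothing to vary.
        return []
    combinations = []
    for candidate in sampled_by_class.get(specified_class, []):
        combo = dict(retro_reagents)        # unspecified reagent(s) stay fixed
        combo[specified_class] = candidate  # specified reagent is swapped
        combinations.append(combo)
    return combinations


# Hypothetical retro split of a lead amide into an amine and an acid.
retro = {"amine": "NCc1ccccc1", "carboxylic_acid": "OC(=O)C"}
sampled = {"carboxylic_acid": ["OC(=O)CC", "OC(=O)c1ccccc1"]}

for combo in analog_reagent_combinations(retro, "carboxylic_acid", sampled):
    print(combo)
```

Each combination would then be passed through the forward reaction to produce one analog. That step is omitted here because it depends on the transforms stored in the database.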

Extra Required Parameters

  • Output Data (boolean) : If OFF, just counts records, but does not output them
    Default: True
  • General Failures (dataset_out) : Output dataset containing input failures and reagents that failed to react
    Default: Input_failures
  • Reagent Class (Field Type: String) : Name of the string field containing the reagent class
  • Reaction & Reagent Database (file_in) : The name of the reaction & reagent database to use. Sample databases are available as File resources in the ‘Organization Data/OpenEye Data/Generative Design Data’ folder
  • Output Specific Failures (boolean) : If OFF, just counts records, but does not output them
    Default: False
  • Output Dataset (dataset_out) : Output dataset containing generated products
    Default: Reaction_products
  • SMILES Dedupe (boolean) : If ON, performs a deduplication of the product SMILES
    Default: True
  • Filter Output (boolean) : Enable molecule filtering of the generated products (see type specified by [Mol Filter])
    Default: True
  • Output Failures (boolean) : If OFF, just counts records, but does not output them
    Default: False
  • Specific Product Failures (dataset_out) : Output dataset containing specific reagent combinations that failed to react
    Default: Product_failures
  • Lead Molecule Dataset (data_source) : A dataset containing the lead molecule(s) to be transformed by reactions from the reaction & reagent database
  • Check Valences (string) : How to handle valence issues for the generated products
    Default: reject
    Choices: reject, allow, fix
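
The SMILES Dedupe parameter listed above removes repeated products. A minimal Python sketch of order-preserving deduplication follows. It assumes the product SMILES strings are already canonical, so exact string comparison is enough. The function name dedupe_smiles is hypothetical, and how the floe canonicalizes and compares products is not stated in this description.

```python
def dedupe_smiles(product_smiles):
    """Drop repeated SMILES strings, keeping the first occurrence of each.

    Assumption: the inputs are canonical SMILES. Two different spellings of
    the same molecule (for example "OCC" and "CCO") would NOT be merged here.
    """
    # dict preserves insertion order, so the original ordering is kept.
    return list(dict.fromkeys(product_smiles))


products = ["CCO", "CCN", "CCO", "c1ccccc1", "CCN"]
print(dedupe_smiles(products))  # ['CCO', 'CCN', 'c1ccccc1']
```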