Generative Structure Floe - Site selection
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Solution-based/Hit to Lead/Generative Design/Match Molecular Pairs (MMP)
Solution-based/Hit to Lead/Generative Design/Fragment-based
Task-based/Virtual Screening - Structure-Based
Description
This floe provides methods to perform Generative Design alterations on the input lead molecule.
There are currently five methods: Graft (a custom method), Join, Matched Molecular Pairs (MMP), Sprout, and Trim.
Site Selection - This floe provides an option to designate a specific “site” on the single input molecule where the analog transformations occur. The goal is to generate analogs involving changes at the specified site.
Input/Output molecules - Lead molecules are transformed into analogs and saved to the output molecule field of the output dataset. For comparison, a new field containing the original molecule, stored as a SMILES string, is added to each output record. The lead molecule is removed from the output records unless the input field name is different from the output molecule field name.
Methods - You can select any combination of Graft, Join, MMP, Sprout, and Trim to generate output, but at least one method must be selected. Methods that require an external index provide a default version. Additional floes in this package allow users to generate custom Graft or MMP indexes from their own structures. Each method has tunable properties on its respective Cubes for additional control: Graft (Graft Database and Graft Analogs), Join (Join Analogs), MMP (Filter Transforms and MMP Analogs), Sprout (Sprout Analogs), and Trim (Trim Analogs).
Graft - A fragment replacement algorithm that replaces single fragments sequentially from the lead molecule, preserving 80% of the original structure. A default fragment index is provided, or a user-generated index can be selected. The default index is based on tcams (malaria dataset); other indexes may be available in the OpenEye Org stack data, and custom indexes can be generated with the Advanced Floes offerings.
Join - A method that joins provided, pre-prepared reagent fragments (with one or more attachment sites) to the specified site. The joined group is checked to avoid labile or otherwise unrealistic chemistry. If a prepared reagent has more than one attachment point defined, the method uses the lowest-numbered attachment point to make the connection to the input structure. An option is available to convert any other attachment points to implicit hydrogens, which can be suppressed. A default reagent set based on the Topliss fragments is provided.
MMP - A method that applies Matched Molecular Pair transformations from an internal default index or a user-generated index. The default index is a set of matched pairs derived from ChEMBL; other indexes may be available in the OpenEye Org stack data, and custom indexes can be generated with the Advanced Floes offerings.
Sprout - A simple atom sprouting method that sprouts atoms at the site(s) specified by the user.
Trim - A simple side-chain trimming editor that trims atoms from terminal sites on the input structure(s). After each atom is removed, the new terminal atom alpha to the deletion site is validated against the user-selected atom type, and deletion continues until the specified shell depth is reached or no valid new terminal atoms are found.
Properties - Simple properties can be computed on the generated analogs; any, or none, of the properties can be selected for output.
Filtering - Molecule filtering of various types can either be applied to the generated analogs or disabled entirely.
Promoted Parameters
Title in user interface (promoted name)
GD Promoted Parameters
Input Mol Field (gd_in_molfield): Name of the field containing the molecule(s) to be transformed
Type: field_parameter::mol
Output Mol Field (gd_out_molfield): Name of the field to contain the modified molecule(s)
Type: field_parameter::mol
Select Generative Method (gd_method): The generative method(s) to be used
Required
Type: string
Default: [‘Graft’, ‘Join’, ‘Matched Molecular Pairs’, ‘Sprout’, ‘Trim’]
Choices: [‘Graft’, ‘Join’, ‘Matched Molecular Pairs’, ‘Sprout’, ‘Trim’]
Compute Molecule Properties (gd_mol_props): Which molecule properties to calculate
Type: string
Default: [‘HeavyAtoms’, ‘MedChemInterest’, ‘MolComplexity’, ‘MolWeight’, ‘TPSA’, ‘XLogP’]
Choices: [‘HeavyAtoms’, ‘MedChemInterest’, ‘MolComplexity’, ‘MolWeight’, ‘TPSA’, ‘XLogP’]
Filter Output (gd_filtering): Enable molecule filtering of the generated analogs (see type specified by [Mol Filter])
Required
Type: boolean
Default: True
Choices: [True, False]
Mol Filter (gd_filter_type): Default type of molecule filter to apply to the generated analogs
Type: string
Default: BlockBuster
Choices: [‘Lead’, ‘Drug’, ‘BlockBuster’, ‘BlockBuster+PAINS’, ‘PAINS’, ‘Custom’]
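As a rough illustration of how the promoted parameters above fit together, the sketch below collects them in a plain Python dictionary keyed by promoted name and checks the values against the choices listed on this page. The `validate` helper and the dictionary layout are hypothetical; how parameters are actually submitted to Orion is not covered here.

```python
# Allowed choices copied from the promoted parameter listing above.
CHOICES = {
    "gd_method": {"Graft", "Join", "Matched Molecular Pairs", "Sprout", "Trim"},
    "gd_mol_props": {"HeavyAtoms", "MedChemInterest", "MolComplexity",
                     "MolWeight", "TPSA", "XLogP"},
    "gd_filter_type": {"Lead", "Drug", "BlockBuster", "BlockBuster+PAINS",
                       "PAINS", "Custom"},
}


def validate(params):
    """Return a list of problems found in a parameter dict (hypothetical helper)."""
    problems = []
    # The description states that at least one generative method must be selected.
    if not params.get("gd_method"):
        problems.append("gd_method: select at least one method")
    for name, allowed in CHOICES.items():
        value = params.get(name, [])
        values = value if isinstance(value, list) else [value]
        for v in values:
            if v not in allowed:
                problems.append(f"{name}: unknown choice {v!r}")
    return problems


example = {
    "gd_method": ["Sprout", "Trim"],
    "gd_mol_props": ["MolWeight", "XLogP"],  # any, or none, may be selected
    "gd_filtering": True,
    "gd_filter_type": "BlockBuster",
}
```

Calling `validate(example)` returns an empty list; an empty `gd_method` selection or a misspelled choice is reported as a problem string.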
GD Advanced Parameters
Lead Molecule Minimum Records (gd_rec_min): The minimum number of lead molecule records allowed (default: 1). Input lead molecule datasets that do not meet this threshold will terminate the floe
Type: integer
Default: 1
Lead Molecule Maximum Records (gd_rec_max): The maximum number of lead molecule records allowed (default: 1). Input lead molecule datasets that exceed this threshold will terminate the floe
Type: integer
Default: 1
Similarity FP Type (gd_sim_fptype): Select one fingerprint type to use for generating analog similarity values or None to disable. Only the first specified fingerprint type will be used
Type: string
Default: [‘None’]
Choices: [‘None’, ‘Circular’, ‘CircularVS’]
Analog Similarity (gd_sim_range): Targeted similarity range for generated analogs, or use min/max for a custom range
Type: string
Default: Unspecified
Choices: [‘Unspecified’, ‘dissimilar (0-0.3)’, ‘somewhat similar (0.3-0.6)’, ‘moderately similar (0.6-0.8)’, ‘highly similar (0.8-1.0)’]
Minimum FP Similarity (gd_sim_min): Minimum Tanimoto similarity value threshold for analogs
Type: decimal
Default: 0.1
Maximum FP Similarity (gd_sim_max): Maximum Tanimoto similarity value threshold for analogs
Type: decimal
Default: 1.0
Annotate Method (gd_dedupe_annotate): Annotate output records with the method that generated each unique structure (On), or use a faster method that just deduplicates (Off)
Required
Type: boolean
Default: True
Choices: [True, False]
Deduplication Memory Limit (gd_dedupe_memory): Structure deduplication may require significant memory resources; specify the desired memory limit in MB
Type: decimal
Default: 1800
Retain Input Dataset Fields (gd_keepfields): If ON, copies the input data record; if OFF, discards all but the structure (which will change) and sends it downstream for processing
Type: boolean
Default: True
Choices: [True, False]
Output Linked Field Format (gd_linkfields): If ON, a link to the original input record is output rather than a copy of the input fields - preferable for very large input records
Type: boolean
Default: False
Choices: [True, False]
Atom Limit (gd_atom_limit): Only generate analogs for input molecules that have <= this limit on the number of heavy atoms
Type: integer
Default: 500
Check Valences (gd_check_val): How to handle valence issues in generated analog structures
Required
Type: string
Default: reject
Choices: [‘reject’, ‘allow’, ‘fix’]
Eliminate halide (change) products (gd_no_halides): Do not output products related to halide changes
Type: boolean
Default: False
Choices: [True, False]
Verbosity (gd_verbosity): Sets the output logging verbosity
Type: string
Default: warning
Choices: [‘info’, ‘warning’, ‘error’, ‘debug’, ‘ddebug’]
Site Selection Highlight (gd_siteimage): Attach an image identifying the submol site selection to each output record
Type: boolean
Default: False
Choices: [True, False]
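The similarity settings above (gd_sim_range, gd_sim_min, gd_sim_max) can be pictured with a small sketch. It computes a Tanimoto coefficient on toy sets of on-bit indices and tests it against a range. The function names are hypothetical, and two behaviors are assumptions rather than documented facts: that 'Unspecified' falls back to the min/max values, and that both bounds are inclusive.

```python
# Named ranges copied from the Analog Similarity (gd_sim_range) choices.
SIM_RANGES = {
    "dissimilar (0-0.3)": (0.0, 0.3),
    "somewhat similar (0.3-0.6)": (0.3, 0.6),
    "moderately similar (0.6-0.8)": (0.6, 0.8),
    "highly similar (0.8-1.0)": (0.8, 1.0),
}


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def resolve_range(gd_sim_range="Unspecified", gd_sim_min=0.1, gd_sim_max=1.0):
    """Assumption: a named range wins; 'Unspecified' falls back to min/max."""
    return SIM_RANGES.get(gd_sim_range, (gd_sim_min, gd_sim_max))


def in_range(similarity, bounds):
    """Assumption: both bounds are inclusive."""
    low, high = bounds
    return low <= similarity <= high


lead = {1, 4, 7, 9}       # toy on-bit sets, not real fingerprints
analog = {1, 4, 7, 12}
sim = tanimoto(lead, analog)  # 3 shared bits out of 5 distinct bits
```

With these toy sets the analog would pass the default custom range but not the 'highly similar (0.8-1.0)' range.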
GD Matched Molecular Pair Method
Transform Collection (mmp_xformcoll): The name of the collection containing the MMP transformations, or None to use the default: chembl_25_dbprep_SD_70_100_xforms
Type: collection_source
Transformation Context (mmp_bondcontext): Amount of neighboring chemistry context from the substitution site to include with the transformation: bond0 - less precise, bond3 - more precise
Type: string
Default: [‘bond1’]
Choices: [‘any’, ‘bond0’, ‘bond1’, ‘bond2’, ‘bond3’]
Min MMPs (mmp_minmmps): Require >= this limit for the MMPs associated with the transformation (0: no limit)
Type: integer
Max MMPs (mmp_maxmmps): Require <= this limit for the MMPs associated with the transformation (0: no limit)
Type: integer
Maximum Matches (mmp_maxmatches): Limit the number of times the transformation(s) will be applied to the input molecule(s)
Type: integer
Default: 10
Limit Matches (mmp_limitmatches): Require this limit on the number of transformation site(s) on the input molecule(s) for the transformation to be applied (0: unconstrained)
Type: integer
Default: 0
Validate Kekule (mmp_validatekekule): Whether to verify Kekulization during application of the transformation(s)
Type: boolean
Default: True
Choices: [True, False]
Strict Valences (mmp_strictvalences): If Check Valences is active, any valence issue found after the transformation is applied terminates further application
Type: boolean
Default: True
Choices: [True, False]
Strict SMIRKS (mmp_strictsmirks): Whether to require strict SMIRKS parsing of the transformation(s)
Type: boolean
Default: True
Choices: [True, False]
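The Min MMPs and Max MMPs limits above treat 0 as "no limit". A minimal sketch of that convention follows, using a hypothetical `Transform` record with a placeholder name and a supporting pair count; it is not the floe's own filtering code.

```python
from dataclasses import dataclass


@dataclass
class Transform:
    name: str        # placeholder label, not a real transformation
    mmp_count: int   # number of matched pairs supporting the transformation


def keep_transform(xform, mmp_minmmps=0, mmp_maxmmps=0):
    """Apply the Min MMPs / Max MMPs limits; 0 means no limit on that side."""
    if mmp_minmmps and xform.mmp_count < mmp_minmmps:
        return False
    if mmp_maxmmps and xform.mmp_count > mmp_maxmmps:
        return False
    return True


transforms = [Transform("xform_a", 2), Transform("xform_b", 15), Transform("xform_c", 400)]
kept = [t.name for t in transforms if keep_transform(t, mmp_minmmps=5, mmp_maxmmps=100)]
```

Here only `xform_b` survives, since `xform_a` has too few supporting pairs and `xform_c` has too many.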
GD Sprout Method
Atom sites allowed (spr_allowedsites): Choose the allowed sites for substitution
Required
Type: string
Default: [‘neutral_C’]
Choices: [‘neutral_C’, ‘terminal_C’, ‘aromatic_C’, ‘any_C’, ‘any_heavy’]
Atom type(s) to sprout (spr_sprouttypes): Sprout Atoms
Required
Type: string
Default: [‘Carbon’, ‘Fluorine’, ‘Oxygen’]
Choices: [‘Carbon’, ‘Nitrogen’, ‘Fluorine’, ‘Oxygen’, ‘HaloSubset:[F,Cl,Br]’]
Min Site Hydrogens (spr_minhyds): Require substitution sites to have >= this number of hydrogens (default: 1, unconstrained: 0)
Type: integer
Default: 1
Max Site Hydrogens (spr_maxhyds): Require substitution sites to have <= this number of hydrogens (0: no limit)
Type: integer
Default: 0
Strict Stereo (spr_strictstereo): Whether to restrict sprout atom sites to non-stereo atoms only (default: On)
Type: boolean
Default: True
Choices: [True, False]
Prohibit Hetero-Hetero Sprout (spr_heterohetero): Disallow sprouting of heteroatoms at heteroatom sites (default: On)
Type: boolean
Default: True
Choices: [True, False]
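The hydrogen-count and hetero-hetero settings above combine into a simple per-site test. The sketch below is a toy model under stated assumptions: sites are described only by element symbol and hydrogen count, "heteroatom" is taken to mean any element other than carbon, and the `site_allowed` function is hypothetical.

```python
def site_allowed(site_element, site_hydrogens, sprout_element,
                 spr_minhyds=1, spr_maxhyds=0, spr_heterohetero=True):
    """Decide whether one atom type may be sprouted at one site (toy model)."""
    if site_hydrogens < spr_minhyds:
        return False
    # A maximum of 0 means "no limit".
    if spr_maxhyds and site_hydrogens > spr_maxhyds:
        return False
    # Assumption: "heteroatom" means any element other than carbon.
    if spr_heterohetero and site_element != "C" and sprout_element != "C":
        return False
    return True
```

For example, `site_allowed("C", 3, "F")` passes with the defaults, while `site_allowed("N", 1, "O")` is rejected unless `spr_heterohetero` is turned off.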
GD Trim Method
Atom sites allowed (trm_allowedtrimsites): Choose the allowed site(s) for initiating the trimming activity
Required
Type: string
Default: [‘terminal_C’]
Choices: [‘terminal_C’, ‘terminal_N’, ‘terminal_O’, ‘terminal_any’, ‘halogen’]
Trimming Distance (trm_depth): Defines the maximum bond distance for the trimming activity (0:allowed site atoms only)
Required
Type: integer
Default: 99
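The trimming behavior described for this method (remove a terminal atom, validate the newly exposed terminal atom, stop at the trimming distance) can be sketched on a toy heavy-atom graph. The `trim_chain` function and its graph representation are illustrative assumptions, not the floe's implementation, and the sketch follows a single starting site only.

```python
def trim_chain(elements, bonds, start, allowed=("C",), depth=99):
    """Trim inward from one terminal atom; return atom ids in removal order.

    elements: {atom_id: element symbol}; bonds: {atom_id: set of neighbor ids}.
    Toy model of heavy atoms only.
    """
    bonds = {a: set(n) for a, n in bonds.items()}  # work on a copy
    removed = []
    current, distance = start, 0
    while (distance <= depth
           and len(bonds[current]) == 1          # still a terminal atom
           and elements[current] in allowed):    # of an allowed type
        (alpha,) = bonds[current]                # atom alpha to the deletion site
        bonds[alpha].discard(current)
        del bonds[current]
        removed.append(current)
        current, distance = alpha, distance + 1
    return removed


# A propyl chain (atoms 0-2) attached to a nitrogen (3) bearing two methyls (4, 5).
elements = {0: "C", 1: "C", 2: "C", 3: "N", 4: "C", 5: "C"}
bonds = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4, 5}, 4: {3}, 5: {3}}
```

Starting at atom 0 with the defaults removes atoms 0, 1, and 2 and then stops at the nitrogen, which is neither terminal nor an allowed type. With `depth=0` only atom 0 is removed, matching the "allowed site atoms only" reading of a zero trimming distance.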
GD Graft Method
Graft Database (gd_graftdb): The name of the Graft database to use, or None to use the default database
Type: file_in
Hmember selection (gft_Hmembers): For input molecule selections, also process the H-member equivalent input structure
Type: boolean
Default: False
Choices: [True, False]
Unique Analogs (gft_unique): Whether to deduplicate generated analogs
Type: boolean
Default: True
Choices: [True, False]
Retain Percentage (keeppcnt): Percentage of top graft analogs to retain
Type: integer
Default: 50
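The Retain Percentage setting keeps only the top share of graft analogs. How the floe ranks analogs and rounds the cutoff is not stated here, so the sketch below simply assumes a list already sorted best-first and rounds the count up; `retain_top` is a hypothetical helper.

```python
import math


def retain_top(ranked_analogs, keeppcnt=50):
    """Keep the leading keeppcnt percent of a best-first list.

    Assumption: the count is rounded up, so a non-empty list with
    keeppcnt > 0 always keeps at least one entry.
    """
    if not 0 <= keeppcnt <= 100:
        raise ValueError("keeppcnt must be between 0 and 100")
    count = math.ceil(len(ranked_analogs) * keeppcnt / 100)
    return ranked_analogs[:count]
```

With five ranked analogs and the default of 50, this sketch keeps the first three.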
GD Join Method
Maximum Reagents (join_maxreags): Only process this number of reagents from the reagent input
Type: integer
Maximum Reagent Size (join_maxreagsize): Ignore reagents with more than this number of heavy atoms
Type: integer
Default: 100
Attach Sites to H (join_star2hyd): Convert all additional *atom attachment sites on the reagent to hydrogen after the join
Type: boolean
Default: True
Choices: [True, False]
Reagent File (join_reagentfile): The name of the Orion structure file containing prepared reagents
Type: file_in
Reagent Dataset (join_reagentdata): The name of the dataset containing prepared reagents
Type: data_source
Reagent Molecule (join_reagent_mol_field): Reagent Dataset molecule field containing the prepared reagent structures
Type: field_parameter::mol
Reagent SMILES (join_reagent_smi_field): Reagent Dataset string field containing the reagent SMILES
Type: field_parameter::string
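The reagent limits above and the lowest-numbered attachment point rule from the Join description can be sketched as follows. Several details are assumptions: attachment points are written as `[*:n]` in the reagent SMILES, heavy-atom counts are supplied precomputed, the size filter runs before the reagent count limit, and the helper names are hypothetical.

```python
import re

ATTACH = re.compile(r"\[\*:(\d+)\]")  # assumed notation for numbered attachment points


def lowest_attachment(reagent_smiles):
    """Return the lowest attachment-point number in a reagent SMILES, or None."""
    numbers = [int(n) for n in ATTACH.findall(reagent_smiles)]
    return min(numbers) if numbers else None


def select_reagents(reagents, join_maxreags=None, join_maxreagsize=100):
    """reagents: list of (smiles, heavy_atom_count) tuples with counts precomputed.

    Assumption: the size filter runs before the reagent count limit is applied.
    """
    usable = [(smi, n) for smi, n in reagents
              if n <= join_maxreagsize and lowest_attachment(smi) is not None]
    return usable if join_maxreags is None else usable[:join_maxreags]
```

For a reagent such as `[*:2]c1ccc([*:1])cc1`, `lowest_attachment` returns 1, the point that would be used for the connection; the remaining point is the kind of site that Attach Sites to H converts to hydrogen.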
Inputs
Lead Molecule (ui_result): Draw a lead molecule, or select a record from an existing dataset and annotate a single contiguous attachment site for processing
Required
Type: fragment_input
Outputs
Output Dataset (output): Output dataset for Generative Design analogs
Required
Type: dataset_out
Default: GenDesign_site_analogs