Generative Structure Floe

This Floe provides methods to perform Generative Design alterations on the input lead molecule.

There are currently four methods: Graft (a custom method), Matched Molecular Pairs (MMP), Sprout, and Trim.

Each lead molecule can produce a 50- to 500-fold increase in the number of output structures, so relatively small input datasets are recommended. The default input limit is 10 structures; take care when choosing a limit much larger than that.
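As a rough sizing aid, the fold increase quoted above can be turned into an estimate of the output size. This is a minimal sketch, not part of the Floe: the function name `estimate_output_range` is hypothetical, and the 50 and 500 bounds are simply the figures stated in the text.

```python
def estimate_output_range(n_leads, min_fold=50, max_fold=500):
    """Return (low, high) estimated analog counts for n_leads input molecules.

    The fold bounds are the 50-500 range quoted in the text, not measured values.
    """
    if n_leads < 0:
        raise ValueError("n_leads must be non-negative")
    return n_leads * min_fold, n_leads * max_fold


# The default input limit of 10 structures.
low, high = estimate_output_range(10)
print(f"10 leads -> roughly {low} to {high} analogs")
```

At the default limit this gives on the order of 500 to 5000 output structures, which is why limits far beyond 10 should be chosen with care.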

Input/Output molecules - Lead molecules are transformed to analogs and saved to the output dataset in the output molecule field. For comparison, a new field containing the original molecule, stored as SMILES, is added to each output record. The lead molecule is removed from the output records unless the input field name differs from the output molecule field.
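The field handling described above can be illustrated with plain dictionaries standing in for records. This is only a sketch under stated assumptions: the function `build_output_record` and the field names `Molecule`, `Analog`, and `Original Molecule SMILES` are hypothetical, and molecules are represented as SMILES strings rather than molecule objects.

```python
def build_output_record(lead_record, analog, input_field, output_field,
                        original_field="Original Molecule SMILES"):
    """Return a copy of lead_record carrying the analog and the original lead.

    Records are modeled as dicts and molecules as SMILES strings (assumption).
    """
    record = dict(lead_record)
    # Keep the original lead for comparison, stored as SMILES.
    record[original_field] = record[input_field]
    # Writing the analog overwrites the lead when both field names match.
    record[output_field] = analog
    return record


lead = {"Molecule": "CCO", "Name": "lead-1"}

# Same input and output field: the lead is replaced by the analog.
same = build_output_record(lead, "CCCO", "Molecule", "Molecule")

# Different field names: the lead molecule survives next to the analog.
diff = build_output_record(lead, "CCCO", "Molecule", "Analog")
```

In the first case the lead only survives in the SMILES comparison field; in the second it also remains in its original field.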

Methods - Select one or more of Graft, MMP, Sprout, or Trim for generating output; at least one method must be selected. Methods that require an external index provide a default version for use. Additional Floes in this package allow users to generate custom Graft or MMP indexes based on their own structures. Each method has tunable properties on its respective Cubes for additional control of the method's activities: Graft/Graft Database, Graft Analogs, MMP/Filter Transforms, MMP Analogs, Sprout/Sprout Analogs and Trim/Trim Analogs.

Graft - A fragment replacement algorithm that replaces single fragments sequentially from the lead molecule, preserving 80% of the original structure. A default fragment index is provided, or a user-generated index can be selected. The default index is based on tcams (malaria dataset), but other indexes may be available in the OpenEye Org stack data, or custom indexes can be generated from the Advanced floes offerings.

MMP - A method that applies Matched Molecular Pair transformations from an internal default index or a user-generated index. The default index is a set of matched pairs derived from ChEMBL, but other indexes may be available in the OpenEye Org stack data, or custom indexes can be generated from the Advanced floes offerings.

Sprout - A simple atom sprouting method that adds atoms at site(s) specified by the user.
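The idea of sprouting can be sketched on a bare molecular graph. This is an illustrative assumption rather than the Floe's implementation: the function `sprout`, the list-based graph representation, and the choice to emit one analog per (site, element) pair are all hypothetical, and no valence checking is done here.

```python
from itertools import product


def sprout(elements, bonds, sites, sprout_elements):
    """Return one analog per (site, element) pair with a new atom attached.

    elements: list of element symbols, indexed by atom.
    bonds: list of (atom_index, atom_index) pairs.
    Valence is deliberately not checked in this sketch.
    """
    new_index = len(elements)  # index the added atom will receive
    analogs = []
    for site, element in product(sites, sprout_elements):
        analogs.append((elements + [element], bonds + [(site, new_index)]))
    return analogs


# Ethane as a two-atom graph; sprout C, F, and O at atom 0.
ethane_elements = ["C", "C"]
ethane_bonds = [(0, 1)]
analogs = sprout(ethane_elements, ethane_bonds, [0], ["C", "F", "O"])
```

Each analog is built from fresh lists, so the input graph is left unchanged and can be reused for the next site.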

Trim - A simple side-chain trimming editor that trims back atoms from terminal sites on the input structure(s). After each atom is removed, the new terminal atom alpha to the deletion site is validated against the user-selected atom type, and deletion continues until the specified shell depth is reached or until no valid new terminal atom is found.
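The trimming loop described above can be sketched on a plain adjacency map. This is a minimal illustration under stated assumptions, not the Floe's code: the function `trim_from_site` and the graph representation are hypothetical, a terminal atom is taken to be one with exactly one remaining neighbor, and the distance argument follows the convention that 0 removes only the starting site atom.

```python
def trim_from_site(elements, neighbors, site, allowed, max_distance):
    """Trim inward from one terminal atom; return removed atom indices in order.

    elements: dict of atom index -> element symbol.
    neighbors: dict of atom index -> set of bonded atom indices.
    allowed: set of element symbols that may be deleted (assumed atom typing).
    max_distance: 0 removes only the starting site atom.
    """
    adj = {atom: set(nbrs) for atom, nbrs in neighbors.items()}  # work on a copy
    removed = []
    current, distance = site, 0
    while True:
        # Stop when the candidate is not terminal or fails the atom-type check.
        if len(adj[current]) != 1 or elements[current] not in allowed:
            break
        (alpha,) = adj[current]  # the atom alpha to the deletion site
        adj[alpha].discard(current)
        del adj[current]
        removed.append(current)
        if distance >= max_distance:
            break
        distance += 1
        current = alpha
    return removed


# C0-C1-C2 chain with O3 and N4 both attached to C2.
elements = {0: "C", 1: "C", 2: "C", 3: "O", 4: "N"}
neighbors = {0: {1}, 1: {0, 2}, 2: {1, 3, 4}, 3: {2}, 4: {2}}

full_trim = trim_from_site(elements, neighbors, 0, {"C"}, 99)
site_only = trim_from_site(elements, neighbors, 0, {"C"}, 0)
```

Here the full trim stops at atom 2 because it still has two neighbors after atoms 0 and 1 are removed, so it is no longer a terminal atom.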

Properties - Simple properties can be computed on the generated analogs; any or none of the properties can be selected for output.

Filtering - Molecule filtering of various types can be applied to the generated analogs, or disabled entirely.

Extra Required Parameters

  • Filter Output (boolean) : Enable molecule filtering of the generated analogs (see type specified by [Mol Filter])
    Default: True
  • Output Dataset (dataset_out) : Output dataset for Generative Design analogs
    Default: GenDesign_analogs
  • Select Generative Method (string) : The generative method(s) to be used
    Default: [‘Graft’, ‘Matched Molecular Pairs’, ‘Sprout’, ‘Trim’]
    Choices: Graft, Matched Molecular Pairs, Sprout, Trim
  • Check Valences (string) : How to handle valence issues in generated analog structures
    Default: reject
    Choices: reject, allow, fix
  • Input or Append Lead Molecule Dataset (data_source) : The input lead molecule dataset(s) to read records from and/or to append to when Append Records is enabled. If Append Records is Off, the dataset specified as the Output Dataset (see Output parameters) is created
  • Molecule Primary Key Field (Field Type: String) : String fieldname to use for recording collision de-duplications
  • Annotate Method (boolean) : Annotate output records with the method that generated each unique structure (On), or use a faster method that just deduplicates (Off)
    Default: True
  • Append Records (boolean) : Append the records to the specified input dataset (On), or create the named Output Dataset (Off)
    Default: False
  • Input or Append Lead Molecule Dataset (data_source) : The input lead molecule dataset(s) to read records from and to append to when the appending option is enabled
  • Check Valences (string) : How to handle valence issues in generated analog structures
    Default: reject
    Choices: reject, allow, fix
  • Atom sites allowed (string) : Choose the allowed site(s) for initiating the trimming activity
    Default: [‘terminal_C’]
    Choices: terminal_C, terminal_N, terminal_O, terminal_any, halogen
  • Trimming Distance (integer) : Defines the maximum bond distance for the trimming activity (0:allowed site atoms only)
    Default: 99
  • Atom sites allowed (string) : Choose the allowed sites for substitution
    Default: [‘neutral_C’]
    Choices: neutral_C, terminal_C, aromatic_C, any_C, any_heavy
  • Atom type(s) to sprout (string) : Sprout Atoms
    Default: [‘Carbon’, ‘Fluorine’, ‘Oxygen’]
    Choices: Carbon, Nitrogen, Fluorine, Oxygen, HaloSubset:[F,Cl,Br]
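
The difference between the two Annotate Method settings listed above can be sketched as follows. This is an assumption-laden illustration, not the Floe's implementation: the function names, the use of a SMILES string as the primary key, and the choice to collect every method that produced a structure are all hypothetical.

```python
def deduplicate(analogs):
    """Fast path (Annotate Method Off): keep the first occurrence of each key."""
    seen = set()
    unique = []
    for key, _method in analogs:
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def deduplicate_annotated(analogs):
    """Annotating path (On): map each unique key to the methods that produced it."""
    merged = {}
    for key, method in analogs:
        methods = merged.setdefault(key, [])
        if method not in methods:
            methods.append(method)
    return merged


# (primary key, generating method) pairs; the keys here are SMILES strings.
generated = [
    ("CCO", "Trim"),
    ("CCCO", "Sprout"),
    ("CCO", "Matched Molecular Pairs"),
    ("CCCO", "Sprout"),
]

unique_keys = deduplicate(generated)
annotated = deduplicate_annotated(generated)
```

The annotating path does more bookkeeping per collision, which is consistent with the plain deduplication being described as the faster option.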