Large Scale Reaction Enumeration v0.1.9

This Floe launches the enumeration of a single reaction id from the uploaded reagent collection.

General Usage

This Floe is typically not run directly but is instead run via the launching Floe, Launch Reaction Enumerations, as it contains a large number of tunable parameters, many of which directly affect the scaling characteristics of this enumeration Floe.

Advanced Usage

Reaction ID - a single reaction ID from the specified reagent collection (Reagent Collection ID) to be enumerated.

Reagent Limit - limits the number of reagents to this maximum; generally used to run preliminary product enumerations for testing and evaluation.

Append Enumeration Products To Collection - name of the output collection for the product records. If the collection does not exist, it is created; if it exists, the product shards are added to it. Because appending is allowed, the Floe never closes the specified output collection, so that additional product enumerations can continue to populate it. It is recommended that ocli be used to close the collection once all the enumerations have been completed.

Limit Heavyatoms To >= This Value, Limit Heavyatoms To <= This Value - used to filter the products by the specified heavy atom limits.

Limit Molweight To >= This Value, Limit Molweight To <= This Value - used to filter the products by the specified molecular weight limits.

Limit TPSA To >= This Value, Limit TPSA To <= This Value - used to filter the products by the specified total polar surface area limits.

Limit XLogP To >= This Value, Limit XLogP To <= This Value - used to filter the products by the specified XLogP limits.
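The four pairs of limits above act as range filters on each product. The sketch below illustrates that logic in Python, assuming each bound is inclusive (as the ">=" and "<=" names suggest) and that an unset bound leaves that side open. The `Range` and `passes_filters` helpers, the property names, and the values are hypothetical; in practice the properties would be computed by a cheminformatics toolkit rather than supplied by hand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Range:
    """Inclusive [low, high] bounds; None means that side is unbounded (assumption)."""
    low: Optional[float] = None
    high: Optional[float] = None

    def contains(self, value: float) -> bool:
        if self.low is not None and value < self.low:
            return False
        if self.high is not None and value > self.high:
            return False
        return True


def passes_filters(props: dict, limits: dict) -> bool:
    """Return True if every limited property lies within its inclusive range."""
    return all(limits[name].contains(props[name]) for name in limits)


# Hypothetical limits mirroring the four ">=" / "<=" parameter pairs.
limits = {
    "heavy_atoms": Range(10, 35),
    "mol_weight": Range(None, 500.0),
    "tpsa": Range(20.0, 140.0),
    "xlogp": Range(-1.0, 5.0),
}

# Illustrative product records with precomputed properties.
products = [
    {"id": "p1", "heavy_atoms": 24, "mol_weight": 342.4, "tpsa": 75.3, "xlogp": 2.1},
    {"id": "p2", "heavy_atoms": 38, "mol_weight": 530.6, "tpsa": 98.0, "xlogp": 4.2},
    {"id": "p3", "heavy_atoms": 18, "mol_weight": 250.3, "tpsa": 12.0, "xlogp": 3.3},
]

kept = [p["id"] for p in products if passes_filters(p, limits)]
print(kept)  # ['p1']
```

Here p2 fails the heavy atom upper limit and p3 fails the TPSA lower limit, so only p1 is kept.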

Prefetch A Reagent Workers - specifies the number of CPUs to spin up on startup to be ready for reagent A enumerations (caution!)

Prefetch B Reagent Workers - specifies the number of CPUs to spin up on startup to be ready for reagent B enumerations (caution!)

Prefetch Product Workers - the number of CPUs to spin up on startup to be ready for enumeration products (caution!)

Product Molecule Field - Name of the molecule field to contain the enumerated product molecules.

Product Smiles Field - Name of the string field to contain the enumerated product smiles.

Product ID Field - Name of the string field to contain the enumerated product ids.

Product ID Style - the desired style of output product ids.

Product ID Delimiter - the delimiter to use for the Custom style of product ids.

Product Rec/Shard - desired records/shard for the final output collection shards.
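To see how Product Rec/Shard determines the number of output shards, here is a minimal sketch that packs records into shards of at most `rec_per_shard` records. It assumes records are packed sequentially, so only the last shard can be partially filled; the `shard` helper is hypothetical and is not part of the Floe.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def shard(records: Iterable[T], rec_per_shard: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most rec_per_shard records."""
    if rec_per_shard < 1:
        raise ValueError("rec_per_shard must be >= 1")
    batch: List[T] = []
    for rec in records:
        batch.append(rec)
        if len(batch) == rec_per_shard:
            yield batch
            batch = []
    if batch:  # final, partially filled shard
        yield batch


shards = list(shard(range(25_000), 10_000))
print([len(s) for s in shards])  # [10000, 10000, 5000]
```

Under this packing, the shard count is the record count divided by rec/shard, rounded up.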

Smiles Cleanup Level - level of SMILES cleanup to perform on the products.

Output Reagent Properties - Defines which reagent properties should be added to the product output records.

Collection A Rec/Shard, Collection B Rec/Shard, Collection C Rec/Shard, Collection D Rec/Shard - these parameters directly control the scale-up behavior of the Floe. Because shards (each containing multiple records) are passed to each step in the concatenation of reagents, the general rule of thumb is to use a very small rec/shard early in the Floe and a much larger rec/shard later in the Floe. The optimum values are also reaction-dependent, since a reaction might have a very small number of A reagents, a very large number of B reagents, and small numbers of C and/or D reagents. A general rule is to increase the rec/shard by roughly an order of magnitude at each concatenation stage, modulated by the (reaction-specific) number of reagents involved.
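The rule of thumb above can be explored with a small planning sketch. It assumes, hypothetically, that the rec/shard for each role applies to the intermediate records after that role has been concatenated, and that every combination survives (no failed reactions or property filtering), so the shard counts are upper bounds. The helper names and reagent counts are illustrative only.

```python
import math
from typing import Dict, List


def suggest_rec_per_shard(stages: List[str], start: int = 10, factor: int = 10) -> Dict[str, int]:
    """Rule of thumb: grow rec/shard by ~an order of magnitude at each stage."""
    return {stage: start * factor ** i for i, stage in enumerate(stages)}


def shard_counts(reagent_counts: Dict[str, int], rec_per_shard: Dict[str, int]) -> Dict[str, int]:
    """Shards per stage, assuming stage k holds every combination of roles up to k."""
    counts = {}
    combos = 1
    for role, n in reagent_counts.items():
        combos *= n  # upper bound: ignores failed reactions and property filters
        counts[role] = math.ceil(combos / rec_per_shard[role])
    return counts


# Hypothetical three-component reaction: few A, many B, few C reagents.
reagents = {"A": 50, "B": 20_000, "C": 200}
rps = suggest_rec_per_shard(list(reagents))
print(rps)                          # {'A': 10, 'B': 100, 'C': 1000}
print(shard_counts(reagents, rps))  # {'A': 5, 'B': 10000, 'C': 200000}

# Modulate the B stage for its unusually large reagent count.
adjusted = {**rps, "B": 1_000}
print(shard_counts(reagents, adjusted))  # {'A': 5, 'B': 1000, 'C': 200000}
```

With plain tenfold growth, the large B reagent set produces 10,000 shards at the B stage; raising Collection B Rec/Shard to 1,000 reduces that to 1,000 shards, which is the kind of reaction-specific adjustment described above.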

Concat B Group Max Parallel - The maximum allowed process count for concatenating B reagents.

Concat C Group Max Parallel - The maximum allowed process count for concatenating C reagents.

Concat D Group Max Parallel - The maximum allowed process count for concatenating D reagents.

Cleanup Group Max Parallel - The maximum allowed process count for final cleanup activities.

Extra Required Parameters

  • Product rec/shard (integer) : Product Records per shard
  • Reagent Role for Validation (string) : Which reagent role is being validated
    Choices: A, B, C, D
  • Product Smiles Field (Field Type: String) : Name of the field to contain the enumerated product smiles
    Default: product
  • Output Shard Format (string) : The format of the data that shards will contain
    Default: oedb
    Choices: oedb, oeb, oeb.gz, ism.gz
  • Field containing the SMILES string to be processed (Field Type: String) :
    Default: product
  • Enable or disable routing shards to the success port (string) :
    Default: SuccessPort
    Choices: ByPassPort, SuccessPort, ReadSwitchPort
  • Shard Format (string) : The format of the data that shards contain
    Default: oedb
    Choices: ism.gz, oez, oeb, oeb.gz, oedb
  • Product ID Field (Field Type: String) : Name of the field to contain the enumerated product ids
    Default: product_id
  • Collection B rec/shard (integer) : Collection B Records per shard (affects scaling)
  • Product ID Style (string) : Style of output product ids
    Default: EnamineREALSpace
    Choices: None, EnamineREALSpace, Custom
  • Collection D rec/shard (integer) : Collection D Records per shard (affects scaling)
  • Shard Format (string) : The format of the data that shards contain. Used in validation.
    Default: "" (empty)
    Choices: "" (empty), ism.gz, oez, oeb, oeb.gz, oedb
  • Smiles Cleanup Level (string) : Level of smiles cleanup to perform on the products
    Default: Full
    Choices: None, Fast, Full
  • Reaction ID (string) : Reaction ID to enumerate
  • Reagent Collection ID (collection_source) : Reagent shard collection
  • Collection A rec/shard (integer) : Collection A Records per shard (affects scaling)
  • Collection C rec/shard (integer) : Collection C Records per shard (affects scaling)
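Several of the parameters listed accept only a fixed set of choices, and the two shard-format parameters differ (Output Shard Format does not list oez). The following is a hypothetical pre-launch check, not part of the Floe, that resolves the documented defaults and rejects values outside the documented choices; the dictionary layout and the `resolve` function are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# (default, allowed choices) as documented; keys are the displayed parameter names.
CHOICE_PARAMS: Dict[str, Tuple[str, List[str]]] = {
    "Product ID Style": ("EnamineREALSpace", ["None", "EnamineREALSpace", "Custom"]),
    "Smiles Cleanup Level": ("Full", ["None", "Fast", "Full"]),
    "Shard Format": ("oedb", ["ism.gz", "oez", "oeb", "oeb.gz", "oedb"]),
    "Output Shard Format": ("oedb", ["oedb", "oeb", "oeb.gz", "ism.gz"]),
}


def resolve(overrides: Dict[str, str]) -> Dict[str, str]:
    """Merge overrides onto the defaults, rejecting values outside the documented choices."""
    resolved = {name: default for name, (default, _) in CHOICE_PARAMS.items()}
    for name, value in overrides.items():
        if name not in CHOICE_PARAMS:
            raise KeyError(f"unknown parameter: {name}")
        allowed = CHOICE_PARAMS[name][1]
        if value not in allowed:
            raise ValueError(f"{name}={value!r} not in {allowed}")
        resolved[name] = value
    return resolved


print(resolve({"Smiles Cleanup Level": "Fast"})["Smiles Cleanup Level"])  # Fast
try:
    resolve({"Output Shard Format": "oez"})
except ValueError as err:
    print(err)  # Output Shard Format='oez' not in ['oedb', 'oeb', 'oeb.gz', 'ism.gz']
```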