Large Scale Reaction Enumeration v0.1.9
This Floe launches the enumeration of a single reaction id from the uploaded reagent collection.
General Usage
This Floe is typically not run directly. Instead, it is run via the launching Floe, Launch Reaction Enumerations, because this Floe contains a large number of tunable parameters, many of which directly affect the scaling characteristics of the enumeration.
Advanced Usage
Reaction ID - a single reaction id from the specified reagent collection (Reagent Collection ID) to be enumerated.
Reagent Limit - Limit the number of reagents to this maximum, generally used to perform preliminary product enumerations for testing and evaluation.
Append Enumeration Products To Collection - name of the output collection for the product records. If the collection does not exist, it will be created; if it exists, the product shards will be added to it. Because appending is allowed, the specified output collection is never closed, so that additional product enumerations can populate it. It is recommended that ocli be used to close the collection once all the enumerations have been completed.
Limit Heavyatoms To >= This Value, Limit Heavyatoms To <= This Value - used to filter the products by the specified heavy atom limits.
Limit Molweight To >= This Value, Limit Molweight To <= This Value - used to filter the products by the specified molecular weight limits.
Limit TPSA To >= This Value, Limit TPSA To <= This Value - used to filter the products by the specified topological polar surface area (TPSA) limits.
Limit XLogP To >= This Value, Limit XLogP To <= This Value - used to filter the products by the specified XLogP limits.
Prefetch A Reagent Workers - the number of CPUs to spin up on startup to be ready for reagent A enumerations (caution!)
Prefetch B Reagent Workers - the number of CPUs to spin up on startup to be ready for reagent B enumerations (caution!)
Prefetch Product Workers - the number of CPUs to spin up on startup to be ready for enumeration products (caution!)
Product Molecule Field - Name of the molecule field to contain the enumerated product molecules.
Product Smiles Field - Name of the string field to contain the enumerated product smiles.
Product ID Field - Name of the string field to contain the enumerated product ids.
Product ID Style - the desired style of output product ids.
Product ID Delimiter - the delimiter to use for the Custom style of product ids.
Product Rec/Shard - desired records/shard for the final output collection shards.
Smiles Cleanup Level - level of SMILES cleanup to perform on the products.
Output Reagent Properties - Defines which reagent properties should be added to the product output records.
Collection A Rec/Shard, Collection B Rec/Shard, Collection C Rec/Shard, Collection D Rec/Shard - these parameters directly control the scale-up behavior of the Floe. Because shards (each containing multiple records) are passed to each step in the concatenation of reagents, the general rule of thumb is to use a very small rec/shard early in the Floe and a much larger rec/shard later in the Floe. The optimum values are also reaction-dependent: a reaction might have a very small number of A reagents, a very large number of B reagents, and small numbers of C and/or D reagents. As a starting point, increase the rec/shard by roughly an order of magnitude at each concatenation stage, then adjust for the (reaction-specific) number of reagents involved.
Concat B Group Max Parallel - The maximum allowed process count for concatenating B reagents.
Concat C Group Max Parallel - The maximum allowed process count for concatenating C reagents.
Concat D Group Max Parallel - The maximum allowed process count for concatenating D reagents.
Cleanup Group Max Parallel - The maximum allowed process count for final cleanup activities.
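The rule of thumb for the Collection A through D Rec/Shard parameters can be sketched as a small helper. This is an illustration only and not part of the Floe: the function names, the starting value of 10, the factor of 10, and the choice to cap each value at the number of reagents for that role are all assumptions made for the example.

```python
import math


def suggest_rec_per_shard(reagent_counts, base=10, factor=10):
    """Suggest a rec/shard value per reagent role (hypothetical helper).

    reagent_counts maps each role ("A", "B", ...) to its number of
    reagents, in concatenation order. The suggestion starts at `base`
    and grows by `factor` at each stage, but is capped at the number
    of reagents available for that role (an assumed heuristic).
    """
    schedule = {}
    value = base
    for role, count in reagent_counts.items():
        schedule[role] = max(1, min(value, count))
        value *= factor
    return schedule


def shard_count(n_records, rec_per_shard):
    """Number of shards needed to hold n_records at the given rec/shard."""
    return math.ceil(n_records / rec_per_shard)


# Made-up reagent counts for a three-component reaction.
counts = {"A": 50, "B": 200_000, "C": 30}
schedule = suggest_rec_per_shard(counts)
for role, rec in schedule.items():
    print(role, rec, shard_count(counts[role], rec))
```

Treat the resulting numbers as a first guess to refine with a small trial run (for example, with Reagent Limit set), since the best values depend on the reaction.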
Extra Required Parameters
Product rec/shard (integer) : Product Records per shard
Reagent Role for Validation (string) : Which reagent role is being validated. Choices: A, B, C, D
Product Smiles Field (Field Type: String) : Name of the field to contain the enumerated product smiles. Default: product
Product ID Field (Field Type: String) : Name of the field to contain the enumerated product ids. Default: product_id
Product ID Style (string) : Style of output product ids. Default: EnamineREALSpace. Choices: None, EnamineREALSpace, Custom
Smiles Cleanup Level (string) : Level of smiles cleanup to perform on the products. Default: Full. Choices: None, Fast, Full
Field containing the SMILES string to be processed (Field Type: String) : Default: product
Enable or disable routing shards to the success port (string) : Default: SuccessPort. Choices: ByPassPort, SuccessPort, ReadSwitchPort
Output Shard Format (string) : The format of the data that shards will contain. Default: oedb. Choices: oedb, oeb, oeb.gz, ism.gz
Shard Format (string) : The format of the data that shards contain. Default: oedb. Choices: ism.gz, oez, oeb, oeb.gz, oedb
Shard Format (string) : The format of the data that shards contain. Used in validation. Default: ""
Reaction ID (string) : Reaction ID to enumerate
Reagent Collection ID (collection_source) : Reagent shard collection
Collection A rec/shard (integer) : Collection A Records per shard (affects scaling)
Collection B rec/shard (integer) : Collection B Records per shard (affects scaling)
Collection C rec/shard (integer) : Collection C Records per shard (affects scaling)
Collection D rec/shard (integer) : Collection D Records per shard (affects scaling)
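
When preparing parameter values outside the Floe, the defaults and choices listed in this section can be captured in a small lookup table and checked before launch. The sketch below is a hypothetical pre-flight check, not an Orion or Floe API: the names DEFAULTS, CHOICES, and resolve_parameters are invented for the example, and only the parameter titles, defaults, and choices are taken from this page.

```python
# Defaults as listed on this page (parameter title -> default value).
DEFAULTS = {
    "Product Smiles Field": "product",
    "Product ID Field": "product_id",
    "Product ID Style": "EnamineREALSpace",
    "Smiles Cleanup Level": "Full",
    "Output Shard Format": "oedb",
}

# Allowed values as listed on this page.
CHOICES = {
    "Reagent Role for Validation": {"A", "B", "C", "D"},
    "Product ID Style": {"None", "EnamineREALSpace", "Custom"},
    "Smiles Cleanup Level": {"None", "Fast", "Full"},
    "Output Shard Format": {"oedb", "oeb", "oeb.gz", "ism.gz"},
}


def resolve_parameters(overrides):
    """Merge user overrides onto the defaults and validate choice values.

    Hypothetical helper: raises ValueError if a parameter with a fixed
    set of choices is given a value outside that set.
    """
    params = {**DEFAULTS, **overrides}
    for name, allowed in CHOICES.items():
        if name in params and params[name] not in allowed:
            raise ValueError(
                f"{name!r} must be one of {sorted(allowed)}, got {params[name]!r}"
            )
    return params


# Example with a placeholder reaction id.
params = resolve_parameters(
    {"Reaction ID": "example-reaction", "Smiles Cleanup Level": "Fast"}
)
print(params["Output Shard Format"], params["Smiles Cleanup Level"])
```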