Large Scale Reaction Enumeration v0.1.10
Category Paths
Description
This Floe launches the enumeration of a single reaction id from the uploaded reagent collection.
General Usage
This Floe is typically not run directly. It is normally run via the launching Floe, Launch Reaction Enumerations, because it exposes a large number of tunable parameters, many of which directly affect the scaling characteristics of the enumeration.
Advanced Usage
Reaction ID - a single reaction id from the specified reagent collection ( Reagent Collection ID ) to be enumerated.
Reagent Limit - limits the number of reagents to this maximum. Generally used to perform preliminary product enumerations for testing and evaluation.
Append Enumeration Products To Collection - name of the output collection for the product records. If the collection does not exist, it will be created; if it exists, the product shards will be added to it. Because appending is allowed, the specified output collection is never closed automatically, so that additional product enumerations can populate it. It is recommended that ocli be used to close the collection once all the enumerations have been completed.
Limit Heavyatoms To >= This Value, Limit Heavyatoms To <= This Value - used to filter the products by the specified heavy atom limits.
Limit Molweight To >= This Value, Limit Molweight To <= This Value - used to filter the products by the specified molecular weight limits.
Limit TPSA To >= This Value, Limit TPSA To <= This Value - used to filter the products by the specified total polar surface area limits.
Limit XLogP To >= This Value, Limit XLogP To <= This Value - used to filter the products by the specified XLogP limits.
Prefetch A Reagent Workers - specifies the number of CPUs to spin up on startup to be ready for reagent A enumerations (caution!)
Prefetch B Reagent Workers - specifies the number of CPUs to spin up on startup to be ready for reagent B enumerations (caution!)
Prefetch Product Workers - specifies the number of CPUs to spin up on startup to be ready for enumeration products (caution!)
Product Molecule Field - Name of the molecule field to contain the enumerated product molecules.
Product Smiles Field - Name of the string field to contain the enumerated product smiles.
Product ID Field - Name of the string field to contain the enumerated product ids.
Product ID Style - the desired style of output product ids.
Product ID Delimiter - the delimiter to use for the Custom style of product ids.
Product Rec/Shard - desired records/shard for the final output collection shards.
Smiles Cleanup Level - Level of smiles cleanup to perform on the products
Output Reagent Properties - Defines which reagent properties should be added to the product output records.
Collection A Rec/Shard, Collection B Rec/Shard, Collection C Rec/Shard, Collection D Rec/Shard - these parameters directly control the scale-up behavior of the Floe. Because shards (each containing multiple records) are passed to each step in the concatenation of reagents, the general rule of thumb is to use a very small rec/shard early in the Floe and a much larger rec/shard later in the Floe. The optimum values are also reaction-dependent: a reaction might have a very small number of A reagents, a very large number of B reagents, and few C and/or D reagents. As a starting point, increase the rec/shard by roughly an order of magnitude at each concatenation stage, then adjust for the (reaction-specific) number of reagents involved.
Concat B Group Max Parallel - The maximum allowed process count for concatenating B reagents.
Concat C Group Max Parallel - The maximum allowed process count for concatenating C reagents.
Concat D Group Max Parallel - The maximum allowed process count for concatenating D reagents.
Cleanup Group Max Parallel - The maximum allowed process count for final cleanup activities.
Promoted Parameters
Title in user interface (promoted name)
Inputs
Reagent Collection ID (reagent_collection): Reagent shard collection
Required
Type: collection_source
Reaction ID (reaction_id): Reaction ID to enumerate
Required
Type: string
Outputs
Append Enumeration Products to Collection (coll_append_output): The name or id of the collection to append enumeration products to
Type: string
Product Molecule Field (molecule_field): Name of the field to contain the enumerated product molecules
Type: string
Required Enumeration Parameters
Product rec/shard (product_records_per_shard): Product Records per shard
Required
Type: integer
Collection A rec/shard (Coll_A_records_per_shard): Collection A Records per shard (affects scaling)
Required
Type: integer
Collection B rec/shard (Coll_B_records_per_shard): Collection B Records per shard (affects scaling)
Required
Type: integer
Collection C rec/shard (Coll_C_records_per_shard): Collection C Records per shard (affects scaling)
Required
Type: integer
Collection D rec/shard (Coll_D_records_per_shard): Collection D Records per shard (affects scaling)
Required
Type: integer
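The rule of thumb given under Advanced Usage (small rec/shard early, roughly an order of magnitude larger at each later stage, adjusted for the number of reagents) can be illustrated with a short Python sketch. This is only a way to produce starting values to experiment with, not the Floe's own logic. The function name, the starting value of 10, the cap, and the choice to clamp each value to the stage's reagent count are all assumptions made for illustration.

```python
def suggest_records_per_shard(reagent_counts, start=10, factor=10, cap=100_000):
    """Suggest a rec/shard starting value for each reagent stage.

    reagent_counts maps a stage label ("A", "B", ...) to the number of
    reagents in that stage, listed in concatenation order.
    All defaults here are hypothetical, not values taken from the Floe.
    """
    suggestions = {}
    value = start
    for stage, count in reagent_counts.items():
        # Assumption: never suggest more records per shard than the stage
        # has reagents, never exceed the cap, and never go below 1.
        suggestions[stage] = max(1, min(value, count, cap))
        # Grow by roughly an order of magnitude for the next stage.
        value *= factor
    return suggestions


# Illustrative reagent counts for a three-component reaction.
counts = {"A": 50, "B": 20_000, "C": 300}
print(suggest_records_per_shard(counts))
```

The resulting numbers would then be entered as Coll_A_records_per_shard, Coll_B_records_per_shard, and so on, and tuned against observed scaling behavior.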
Enumeration Filtering Options
Reagent Limit (reagent_limit): Limit the number of reagents to this maximum
Type: integer
Limit heavyatoms to >= this value (min_heavyatoms): Filter products to be >= this minimum value
Type: integer
Limit heavyatoms to <= this value (max_heavyatoms): Filter products to be <= this maximum value
Type: integer
Limit molweight to >= this value (min_molweight): Filter products to be >= this minimum value
Type: decimal
Limit molweight to <= this value (max_molweight): Filter products to be <= this maximum value
Type: decimal
Limit TPSA to >= this value (min_TPSA): Filter products to be >= this minimum value
Type: decimal
Limit TPSA to <= this value (max_TPSA): Filter products to be <= this maximum value
Type: decimal
Limit XLogP to >= this value (min_XLogP): Filter products to be >= this minimum value
Type: decimal
Limit XLogP to <= this value (max_XLogP): Filter products to be <= this maximum value
Type: decimal
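The min/max filters above describe inclusive range checks on product properties. A minimal Python sketch of that logic follows. It assumes that a limit left unset imposes no bound, which the listing does not state explicitly. The property keys and the numeric limits are illustrative values, not defaults of the Floe.

```python
def passes_filters(props, limits):
    """Return True if every limited property lies in its inclusive range.

    props  maps a property name to its computed value for one product.
    limits maps a property name to a (minimum, maximum) pair; None means
    that bound is not set (assumed here to mean "no limit").
    """
    for name, (lowest, highest) in limits.items():
        value = props[name]
        if lowest is not None and value < lowest:
            return False
        if highest is not None and value > highest:
            return False
    return True


# Illustrative limits, paired as (min_*, max_*) from the parameters above.
limits = {
    "heavyatoms": (10, 35),      # min_heavyatoms, max_heavyatoms
    "molweight": (None, 500.0),  # only max_molweight set
    "TPSA": (None, 140.0),       # only max_TPSA set
    "XLogP": (-1.0, 5.0),        # min_XLogP, max_XLogP
}

# A made-up product record with precomputed properties.
product = {"heavyatoms": 28, "molweight": 412.5, "TPSA": 96.3, "XLogP": 3.1}
print(passes_filters(product, limits))
```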
Enumeration Property Options
Output Reagent Properties (add_reagent_properties): Choose which reagent properties should be added to the output
Type: string
Default: None
Choices: [‘None’, ‘All’, ‘Filtered’]
Output Reagent Properties (reagent_properties): Choose which computed reagent properties should be added to the output
Type: string
Default: None
Choices: [‘None’, ‘All’, ‘Filtered’]
Enumeration Advanced Options
Product ID Field (product_id): Name of the field to contain the enumerated product ids
Type: field_parameter::string
Default: product_id
Product Smiles Field (product_field): Name of the field to contain the enumerated product smiles
Required
Type: field_parameter::string
Default: product
Output Product Properties (product_properties): Choose which computed product properties should be added to the output
Type: string
Default: All
Choices: [‘None’, ‘All’, ‘Filtered’]
Product ID Style (prod_id_style): Style of output product ids
Required
Type: string
Default: EnamineREALSpace
Choices: [‘None’, ‘EnamineREALSpace’, ‘Custom’]
Product ID Delimiter (prod_id_delim): Delimiter to use for the Custom id style
Type: string
Smiles Cleanup Level (smiles_cleanup): Level of smiles cleanup to perform on the products
Required
Type: string
Default: Full
Choices: [‘None’, ‘Fast’, ‘Full’]
Enumeration Performance Control Options
Prefetch A Reagent Workers (A_prefetch_cpus): The number of cpus to spin up on startup to be ready for reagent A enumerations
Type: integer
Default: 0
Prefetch B Reagent Workers (B_prefetch_cpus): The number of cpus to spin up on startup to be ready for reagent B enumerations
Type: integer
Default: 0
Prefetch Product Workers (Product_prefetch_cpus): The number of cpus to spin up on startup to be ready for enumeration products
Type: integer
Default: 0
Concat B Group Max Parallel (Bgroup_max_parallel): The maximum allowed process count for adding B reagents
Type: integer
Default: 1000
Concat C Group Max Parallel (Cgroup_max_parallel): The maximum allowed process count for adding C reagents
Type: integer
Default: 1000
Concat D Group Max Parallel (Dgroup_max_parallel): The maximum allowed process count for adding D reagents
Type: integer
Default: 1000
Cleanup Group Max Parallel (Cleanup_max_parallel): The maximum allowed process count for final cleanup activities
Type: integer
Default: 1000
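Because the Floe takes many parameters, it can help to sanity-check a parameter set before launching. The Python sketch below collects the promoted names from the listing above into a dictionary and checks required entries and choice values. The helper name and all values are placeholders chosen for illustration, and the sketch does not show how the parameters are actually submitted to the Floe.

```python
# Allowed values copied from the "Choices" entries in the listing above.
CHOICES = {
    "add_reagent_properties": ["None", "All", "Filtered"],
    "reagent_properties": ["None", "All", "Filtered"],
    "product_properties": ["None", "All", "Filtered"],
    "prod_id_style": ["None", "EnamineREALSpace", "Custom"],
    "smiles_cleanup": ["None", "Fast", "Full"],
}

# Parameters marked "Required" that have no listed default.
REQUIRED_NO_DEFAULT = [
    "reagent_collection",
    "reaction_id",
    "product_records_per_shard",
    "Coll_A_records_per_shard",
    "Coll_B_records_per_shard",
    "Coll_C_records_per_shard",
    "Coll_D_records_per_shard",
]


def check_parameters(params):
    """Return a list of problems found; an empty list means none found."""
    problems = []
    for name in REQUIRED_NO_DEFAULT:
        if name not in params:
            problems.append(f"missing required parameter: {name}")
    for name, allowed in CHOICES.items():
        if name in params and params[name] not in allowed:
            problems.append(f"{name}={params[name]!r} is not one of {allowed}")
    return problems


# Placeholder values only; real ids and shard sizes depend on the job.
params = {
    "reagent_collection": "<reagent collection id>",
    "reaction_id": "<reaction id>",
    "product_records_per_shard": 100_000,
    "Coll_A_records_per_shard": 10,
    "Coll_B_records_per_shard": 100,
    "Coll_C_records_per_shard": 1_000,
    "Coll_D_records_per_shard": 10_000,
    "prod_id_style": "Custom",
    "prod_id_delim": "_",
    "smiles_cleanup": "Full",
}
print(check_parameters(params))
```

A check like this only catches missing or misspelled values. It says nothing about whether the chosen shard sizes or parallelism limits scale well for a given reaction.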