Large Scale Reaction Enumeration v0.1.10

Description

This Floe enumerates a single reaction, identified by its reaction ID, from the uploaded reagent collection.

General Usage

This Floe is typically not run directly. It is usually run via the launching Floe, Launch Reaction Enumerations, because it contains a large number of tunable parameters, many of which directly affect the scaling characteristics of the enumeration.

Advanced Usage

Reaction ID - a single reaction ID from the specified reagent collection (Reagent Collection ID) to be enumerated.

Reagent Limit - limits the number of reagents to this maximum; generally used to run preliminary product enumerations for testing and evaluation.

Append Enumeration Products To Collection - name of the output collection for the product records. If the collection does not exist, it is created; if it exists, the product shards are added to it. Because appending is allowed, the Floe never closes the specified output collection, so that additional product enumerations can populate it. It is recommended to close the collection with ocli once all enumerations have completed.

Limit Heavyatoms To >= This Value, Limit Heavyatoms To <= This Value - used to filter the products by the specified heavy atom limits.

Limit Molweight To >= This Value, Limit Molweight To <= This Value - used to filter the products by the specified molecular weight limits.

Limit TPSA To >= This Value, Limit TPSA To <= This Value - used to filter the products by the specified topological polar surface area (TPSA) limits.

Limit XLogP To >= This Value, Limit XLogP To <= This Value - used to filter the products by the specified XLogP limits.
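
The four pairs of limits above each act as an inclusive range check on one product property. The following is a minimal sketch of that logic, not the Floe's actual implementation: the property values are assumed to be precomputed, and the function name, dictionary keys, and example bounds are hypothetical. Only the promoted parameter names in the comments come from this page.

```python
from typing import Dict, Mapping, Optional, Tuple

Bounds = Mapping[str, Tuple[Optional[float], Optional[float]]]

# Hypothetical example limits; None means "no limit on that side".
BOUNDS_EXAMPLE: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "heavyatoms": (10, 35),      # min_heavyatoms, max_heavyatoms
    "molweight": (None, 500.0),  # max_molweight only
    "TPSA": (None, 140.0),       # max_TPSA only
    "XLogP": (-1.0, 5.0),        # min_XLogP, max_XLogP
}


def passes_limits(props: Mapping[str, float], bounds: Bounds) -> bool:
    """Return True if every bounded property lies within its inclusive range."""
    for name, (low, high) in bounds.items():
        value = props[name]
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
    return True


# Usage: a product with precomputed (made-up) property values.
product = {"heavyatoms": 25, "molweight": 350.2, "TPSA": 80.0, "XLogP": 2.1}
print(passes_limits(product, BOUNDS_EXAMPLE))
```

Leaving a limit unset (here, `None`) is assumed to mean that side of the range is unbounded, which matches these parameters being optional.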

Prefetch A Reagent Workers - the number of CPUs to spin up on startup to be ready for reagent A enumerations (caution!)

Prefetch B Reagent Workers - the number of CPUs to spin up on startup to be ready for reagent B enumerations (caution!)

Prefetch Product Workers - the number of CPUs to spin up on startup to be ready for enumeration products (caution!)

Product Molecule Field - Name of the molecule field to contain the enumerated product molecules.

Product Smiles Field - Name of the string field to contain the enumerated product SMILES.

Product ID Field - Name of the string field to contain the enumerated product IDs.

Product ID Style - the desired style of output product IDs.

Product ID Delimiter - the delimiter to use for the Custom style of product IDs.

Product Rec/Shard - desired records/shard for the final output collection shards.

Smiles Cleanup Level - Level of SMILES cleanup to perform on the products.

Output Reagent Properties - Defines which reagent properties should be added to the product output records.

Collection A Rec/Shard, Collection B Rec/Shard, Collection C Rec/Shard, Collection D Rec/Shard - these parameters directly control the scale-up behavior of the Floe. Because shards (each containing multiple records) are passed to each step in the concatenation of reagents, the rule of thumb is to use a very small rec/shard early in the Floe and a much larger rec/shard later in the Floe. The optimum values are also reaction-dependent: a reaction might have a very small number of A reagents, a very large number of B reagents, and small numbers of C and/or D reagents. As a starting point, increase the rec/shard by roughly an order of magnitude at each concatenation stage, then adjust for the (reaction-specific) number of reagents involved.
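
The order-of-magnitude rule of thumb above can be sketched as a small helper that produces starting values for the four rec/shard parameters. This is an illustration only: the function names, the base value of 10, and the factor of 10 are assumptions, not recommendations from this page, and the results still need to be adjusted for the reagent counts of the specific reaction.

```python
from typing import Dict


def suggest_records_per_shard(base: int = 10, factor: int = 10) -> Dict[str, int]:
    """Scale rec/shard by `factor` at each stage, starting from `base` for A."""
    stages = ("A", "B", "C", "D")
    return {
        f"Coll_{stage}_records_per_shard": base * factor ** i
        for i, stage in enumerate(stages)
    }


def shard_count(n_records: int, records_per_shard: int) -> int:
    """Number of shards needed to hold n_records (ceiling division)."""
    if records_per_shard <= 0:
        raise ValueError("records_per_shard must be positive")
    return (n_records + records_per_shard - 1) // records_per_shard


# Usage: starting values, and the shard count for a made-up 2501 B-stage records.
values = suggest_records_per_shard(base=10)
print(values)
print(shard_count(2501, values["Coll_B_records_per_shard"]))
```

A shard count like this gives a rough sense of how many units of work a stage hands to the next one, which is why small early shards and larger later shards change how the Floe scales.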

Concat B Group Max Parallel - The maximum allowed process count for concatenating B reagents.

Concat C Group Max Parallel - The maximum allowed process count for concatenating C reagents.

Concat D Group Max Parallel - The maximum allowed process count for concatenating D reagents.

Cleanup Group Max Parallel - The maximum allowed process count for final cleanup activities.

Promoted Parameters

Title in user interface (promoted name)

Inputs

Reagent Collection ID (reagent_collection): Reagent shard collection

  • Required

  • Type: collection_source

Reaction ID (reaction_id): Reaction ID to enumerate

  • Required

  • Type: string

Outputs

Append Enumeration Products to Collection (coll_append_output): The name or id of the collection to append enumeration products to

  • Type: string

Product Molecule Field (molecule_field): Name of the field to contain the enumerated product molecules

  • Type: string

Required Enumeration Parameters

Product rec/shard (product_records_per_shard): Product Records per shard

  • Required

  • Type: integer

Collection A rec/shard (Coll_A_records_per_shard): Collection A Records per shard (affects scaling)

  • Required

  • Type: integer

Collection B rec/shard (Coll_B_records_per_shard): Collection B Records per shard (affects scaling)

  • Required

  • Type: integer

Collection C rec/shard (Coll_C_records_per_shard): Collection C Records per shard (affects scaling)

  • Required

  • Type: integer

Collection D rec/shard (Coll_D_records_per_shard): Collection D Records per shard (affects scaling)

  • Required

  • Type: integer

Enumeration Filtering Options

Reagent Limit (reagent_limit): Limit the number of reagents to this maximum

  • Type: integer

Limit heavyatoms to >= this value (min_heavyatoms): Filter products to be >= this minimum value

  • Type: integer

Limit heavyatoms to <= this value (max_heavyatoms): Filter products to be <= this maximum value

  • Type: integer

Limit molweight to >= this value (min_molweight): Filter products to be >= this minimum value

  • Type: decimal

Limit molweight to <= this value (max_molweight): Filter products to be <= this maximum value

  • Type: decimal

Limit TPSA to >= this value (min_TPSA): Filter products to be >= this minimum value

  • Type: decimal

Limit TPSA to <= this value (max_TPSA): Filter products to be <= this maximum value

  • Type: decimal

Limit XLogP to >= this value (min_XLogP): Filter products to be >= this minimum value

  • Type: decimal

Limit XLogP to <= this value (max_XLogP): Filter products to be <= this maximum value

  • Type: decimal

Enumeration Property Options

Output Reagent Properties (add_reagent_properties): Choose which reagent properties should be added to the output

  • Type: string

  • Default: None

  • Choices: [‘None’, ‘All’, ‘Filtered’]

Output Reagent Properties (reagent_properties): Choose which computed reagent properties should be added to the output

  • Type: string

  • Default: None

  • Choices: [‘None’, ‘All’, ‘Filtered’]

Enumeration Advanced Options

Product ID Field (product_id): Name of the field to contain the enumerated product ids

  • Type: field_parameter::string

  • Default: product_id

Product Smiles Field (product_field): Name of the field to contain the enumerated product smiles

  • Required

  • Type: field_parameter::string

  • Default: product

Output Product Properties (product_properties): Choose which computed product properties should be added to the output

  • Type: string

  • Default: All

  • Choices: [‘None’, ‘All’, ‘Filtered’]

Product ID Style (prod_id_style): Style of output product ids

  • Required

  • Type: string

  • Default: EnamineREALSpace

  • Choices: [‘None’, ‘EnamineREALSpace’, ‘Custom’]

Product ID Delimiter (prod_id_delim): Delimiter to use for the Custom id style

  • Type: string

Smiles Cleanup Level (smiles_cleanup): Level of smiles cleanup to perform on the products

  • Required

  • Type: string

  • Default: Full

  • Choices: [‘None’, ‘Fast’, ‘Full’]

Enumeration Performance Control Options

Prefetch A Reagent Workers (A_prefetch_cpus): The number of cpus to spin up on startup to be ready for reagent A enumerations

  • Type: integer

  • Default: 0

Prefetch B Reagent Workers (B_prefetch_cpus): The number of cpus to spin up on startup to be ready for reagent B enumerations

  • Type: integer

  • Default: 0

Prefetch Product Workers (Product_prefetch_cpus): The number of cpus to spin up on startup to be ready for enumeration products

  • Type: integer

  • Default: 0

Concat B Group Max Parallel (Bgroup_max_parallel): The maximum allowed process count for adding B reagents

  • Type: integer

  • Default: 1000

Concat C Group Max Parallel (Cgroup_max_parallel): The maximum allowed process count for adding C reagents

  • Type: integer

  • Default: 1000

Concat D Group Max Parallel (Dgroup_max_parallel): The maximum allowed process count for adding D reagents

  • Type: integer

  • Default: 1000

Cleanup Group Max Parallel (Cleanup_max_parallel): The maximum allowed process count for final cleanup activities

  • Type: integer

  • Default: 1000
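
When preparing a run outside the user interface, it can help to collect the promoted parameters in one place and check them before submission. The sketch below is a hypothetical helper, not part of the Floe or of any Orion API: the promoted names, defaults, and choices are copied from this page, while `build_parameters` and all example values are assumptions. How the resulting dictionary is submitted is deliberately left out.

```python
from typing import Any, Dict

# Defaults as listed on this page.
DEFAULTS: Dict[str, Any] = {
    "add_reagent_properties": "None",
    "reagent_properties": "None",
    "product_id": "product_id",
    "product_field": "product",
    "product_properties": "All",
    "prod_id_style": "EnamineREALSpace",
    "smiles_cleanup": "Full",
    "A_prefetch_cpus": 0,
    "B_prefetch_cpus": 0,
    "Product_prefetch_cpus": 0,
    "Bgroup_max_parallel": 1000,
    "Cgroup_max_parallel": 1000,
    "Dgroup_max_parallel": 1000,
    "Cleanup_max_parallel": 1000,
}

# Required parameters that have no default on this page.
REQUIRED = (
    "reagent_collection",
    "reaction_id",
    "product_records_per_shard",
    "Coll_A_records_per_shard",
    "Coll_B_records_per_shard",
    "Coll_C_records_per_shard",
    "Coll_D_records_per_shard",
)

# Allowed values for the choice-type parameters.
CHOICES = {
    "add_reagent_properties": ("None", "All", "Filtered"),
    "reagent_properties": ("None", "All", "Filtered"),
    "product_properties": ("None", "All", "Filtered"),
    "prod_id_style": ("None", "EnamineREALSpace", "Custom"),
    "smiles_cleanup": ("None", "Fast", "Full"),
}


def build_parameters(**overrides: Any) -> Dict[str, Any]:
    """Merge overrides onto the documented defaults and validate the result."""
    params = {**DEFAULTS, **overrides}
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    for name, allowed in CHOICES.items():
        if params[name] not in allowed:
            raise ValueError(f"{name} must be one of {allowed}")
    return params


# Usage with made-up identifiers and rec/shard values.
params = build_parameters(
    reagent_collection="my-reagent-collection",
    reaction_id="rxn-0001",
    product_records_per_shard=10000,
    Coll_A_records_per_shard=10,
    Coll_B_records_per_shard=100,
    Coll_C_records_per_shard=1000,
    Coll_D_records_per_shard=10000,
)
print(params["smiles_cleanup"], params["Bgroup_max_parallel"])
```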