Reaction & Reagent Database - Launch Product Enumerations
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Task-based/Library Prep & Design/Reaction-based Enumeration
Role-based/Medicinal Chemist
Description
PRELIMINARY RELEASE This preliminary floe launches individual reaction enumeration floes for the selected reaction ID(s) from the provided Reaction & Reagent Database.
WARNING For individual reaction libraries of size 500M or larger, contact support@eyesopen.com for assistance.
WARNING Extremely large enumerations may place an undue burden on the entire Orion stack, so take extreme care when running unconstrained enumerations of “All” reactions for very large Reaction & Reagent Database libraries. Set limits on the minimum/maximum number of products to ensure the enumeration can complete and to produce an output collection whose size and shard_size suit follow-on processing needs. Concurrent job limits, cost limits, and time limits can also be used to reduce the impact on the stack and on other Orion users.
General Usage
Careful investigation using the specific Reaction & Reagent Database of interest is highly encouraged. Running the Reaction & Reagent Database - Directory Listing Floe on the desired database is recommended to verify the number of potential products before embarking on a large enumeration activity. Preliminary runs with a subset of the potential reaction(s) should be performed to ensure the product dataset contains all the desired output fields.
Currently, resizing the final collection shards is a particular performance bottleneck; investigation is continuing.
Inputs
Reaction Database: The input Reaction & Reagent Database to enumerate from.
Reaction IDs: These are the reaction ID(s) from the specified database (Reaction Database) that are to be enumerated. ‘All’ is an acceptable selection but is not recommended, as this can result in a large number of launched floes and an expensive overall cost for the enumeration.
Outputs
Append Enumeration Products To Collection: The name of the output collection for output of the product records. If the collection does not exist, it will be created, and if it exists, the product shards will be added to the collection. The append feature means that the specified output collection will never be closed, to allow additional product enumerations to populate the collection. It is recommended that ocli be used to close the collection once all the enumerations have been completed. The collection may display with 0 size, but that is simply because the size is not computed until the collection is closed. Regardless of its open/closed state, the product collection can still be used directly in other floes.
Product Rec/Shard: The desired records/shard for the final output collection shards.
Output Product Dataset: An optional dataset to capture a subset of the enumerated products for use in the Analyze page review.
Output Product Dataset Limit: Limits the number of records saved to the Output Product Dataset: note that the Analyze page has an intrinsic dataset limit.
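As a rough guide to how Product Rec/Shard shapes the output collection, the sketch below estimates the number of shards for a given product count. It assumes every shard is filled to the requested size except possibly the last; the function name shard_count is hypothetical and is not part of the floe.

```python
def shard_count(total_products: int, records_per_shard: int) -> int:
    """Estimate how many shards hold total_products at the given shard size.

    Assumption: shards are filled to records_per_shard, with one partial
    shard at the end if the count does not divide evenly.
    """
    if total_products < 0:
        raise ValueError("total_products must not be negative")
    if records_per_shard <= 0:
        raise ValueError("records_per_shard must be positive")
    # Integer ceiling division avoids floating-point rounding on large counts.
    return (total_products + records_per_shard - 1) // records_per_shard


if __name__ == "__main__":
    for size in (2_500, 50_000, 100_000):
        print(size, shard_count(1_000_000, size))
```

Smaller shard sizes produce many more shards for the same number of products, which is worth weighing against the needs of the follow-on floe.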
Enumeration Options
General enumeration constraints: Reaction Product Min/Max Limits, Product ID style, and Product ID delimiter.
Enumeration Run Constraints
General enumeration run constraints: Cost Limit, InFlight Job Priority Limit, Failure Limit, Time Budget, Concurrent Job Limit, and Skipped Reaction Size limits.
Enumeration Advanced Usage
General advanced options.
Output Finalization Strategy: In Auto mode (default), the output product collection is created or opened (if not in the ready state) and is closed after all child jobs complete. To suppress this behavior, use the None option. Auto mode assumes that only one launching floe is active at a time.
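The finalization choices can be read as a small decision rule. The following is a minimal sketch of that rule based only on the descriptions of the None, Auto, and Force choices in this document; the function and argument names are hypothetical, and the floe's actual implementation may differ.

```python
def should_close_collection(strategy: str, opened_or_created: bool) -> bool:
    """Decide whether the output collection is closed when the run finishes.

    strategy: one of 'None', 'Auto', 'Force' (the documented choices).
    opened_or_created: True if this run opened or created the collection.
    """
    if strategy == "Force":
        # Force: always close the collection on job completion.
        return True
    if strategy == "Auto":
        # Auto: close only if this run opened or created the collection.
        return opened_or_created
    if strategy == "None":
        # None: leave the collection status unchanged.
        return False
    raise ValueError(f"unknown finalization strategy: {strategy!r}")
```

Under this reading, None is the choice to make when several launching floes append to the same collection and the collection is closed manually afterwards.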
Promoted Parameters
Title in user interface (promoted name)
Inputs
Reaction Database (rxndb): The name of the Reaction & Reagent Database to enumerate from.
Required
Type: file_in
Reaction IDs (rxndb_ids): Either a comma-delimited list of reactions, or keyword ‘All’ to run all reactions from the input R&R Database.
Required
Type: string
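To illustrate the two accepted forms of the Reaction IDs value, here is a small sketch that expands such a string into a list of IDs. The function name, the exact-match handling of ‘All’, and the whitespace trimming are assumptions made for illustration; they are not taken from the floe.

```python
from typing import List, Sequence


def parse_reaction_ids(value: str, available: Sequence[str]) -> List[str]:
    """Expand an rxndb_ids-style string into a list of reaction IDs.

    value: a comma-delimited list of reaction IDs, or the keyword 'All'.
    available: every reaction ID in the database, used only for 'All'.
    """
    text = value.strip()
    if text == "All":  # assumption: the keyword is matched exactly
        return list(available)
    # Split on commas and drop surrounding whitespace and empty entries.
    ids = [part.strip() for part in text.split(",") if part.strip()]
    if not ids:
        raise ValueError("no reaction IDs were given")
    return ids
```

For example, parse_reaction_ids("r1, r2", ["r1", "r2", "r3"]) selects two reactions, while "All" selects all three and would launch one enumeration floe per reaction.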
Outputs
Append Enumeration Products to Collection (enum_coll_output): The name or ID of the collection for appending enumeration products.
Required
Type: string
Product rec/shard (enum_coll_records_per_shard): Records per shard for the final emitted product shards. For post-processing with Omega directly, 2,500 is optimal. For use in the Prepare Giga Collections Floe, 50 K is optimal.
Required
Type: integer
Default: 100000
Output Product Dataset (enum_dataset): Prefix for the name of dataset(s) generated to capture a subset of the enumerated products for review on the Analyze page. The reaction identifier will be appended to this name.
Type: dataset_out
Output Product Dataset Limit (enum_dataset_max): The maximum number of records to output to a named dataset. Use 0 for all records (up to the limit) or a negative value for none. Extreme care should be used for ‘0’ (all), as writing large datasets can be very costly in time. No more than 100 K will be exported, as this is the Analyze page limit. This is simply the first N records; no sampling is performed.
Type: integer
Default: -1
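The interaction between the Output Product Dataset Limit value and the 100 K cap can be summarized in a few lines. This is a sketch of the documented rules only (negative for none, 0 for all, otherwise the first N records, never more than 100 K); the function and constant names are hypothetical.

```python
ANALYZE_PAGE_LIMIT = 100_000  # "No more than 100 K will be exported"


def dataset_record_count(enum_dataset_max: int, total_products: int) -> int:
    """Return how many product records would be written to the dataset."""
    if enum_dataset_max < 0:
        # Negative (the default, -1): write no dataset records.
        return 0
    if enum_dataset_max == 0:
        # Zero: request all records, still subject to the cap below.
        requested = total_products
    else:
        # Positive: request the first N records; no sampling is performed.
        requested = enum_dataset_max
    return min(requested, total_products, ANALYZE_PAGE_LIMIT)
```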
Enumeration Options
Reaction Product Minimum (numprod_ge): Only enumerate reactions having >= this number of products, or leave blank for unconstrained.
Type: integer
Reaction Product Maximum (numprod_le): Only enumerate reactions having <= this number of products, or leave blank for unconstrained.
Type: integer
Product ID Style (enum_prodid_style): Which style of product ID to generate.
Required
Type: string
Default: Custom
Choices: [‘None’, ‘EnamineREALSpace’, ‘Custom’]
Product ID Delimiter (enum_prodid_delim): Reagent delimiter to use for a ‘Custom’ Product ID Style.
Type: string
Default: :
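The Reaction Product Minimum and Maximum options act as an inclusive filter on the per-reaction product counts. The sketch below shows that filter on a plain dictionary of counts; the function name and the input dictionary are hypothetical, and a blank option is modeled as None.

```python
from typing import Dict, List, Optional


def select_reactions(
    product_counts: Dict[str, int],
    numprod_ge: Optional[int] = None,
    numprod_le: Optional[int] = None,
) -> List[str]:
    """Return the reaction IDs whose product count passes both bounds.

    A bound of None means unconstrained, matching a blank option.
    """
    selected = []
    for rxn_id, count in product_counts.items():
        if numprod_ge is not None and count < numprod_ge:
            continue  # fewer products than the minimum
        if numprod_le is not None and count > numprod_le:
            continue  # more products than the maximum
        selected.append(rxn_id)
    return selected
```

With counts such as those reported by the Directory Listing Floe, this kind of check can be used to preview which reactions a given pair of limits would keep before launching anything.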
Enumeration Run Constraints
Enumeration Cost Budget (dollars) (enum_cost_limit): Do not launch any more enumeration jobs once the combined cost of all the child jobs (if available) reaches this dollar limit.
Type: decimal
InFlight Job Priority Limit (enum_inflight_limit): Limits in-flight concurrent jobs to <= this threshold
Type: integer
Default: 10000000000
Enumeration Failure Limit (enum_failure_limit): Enumeration jobs are launched until this number of failures is reached (0: no limit)
Type: integer
Default: 1
Concurrent Enumeration Job Limit (enum_job_limit): Limit the number of concurrently running enumeration floes to this limit (0: unlimited).
Type: integer
Default: 1
Enumeration Time Budget (minutes) (enum_min_limit): Do not launch any more enumeration jobs once the elapsed minutes from all the child jobs reach this limit.
Type: integer
Skip Reaction Sizes (enum_skip_size_limit): Ignore reactions that generate more than this number of products
Type: integer
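Taken together, the run constraints amount to a gate that is checked before each new child job is launched. The sketch below is one possible reading of the cost, time, failure, and concurrency limits described above; the RunState class, its fields, and the may_launch function are hypothetical and do not reflect the floe's internal code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunState:
    """Hypothetical running totals collected from the child jobs."""
    total_cost: float      # dollars spent so far
    elapsed_minutes: int   # minutes consumed so far
    failures: int          # number of failed child jobs
    running_jobs: int      # child floes currently running


def may_launch(
    state: RunState,
    cost_limit: Optional[float] = None,  # blank: no cost budget
    min_limit: Optional[int] = None,     # blank: no time budget
    failure_limit: int = 1,              # 0: no limit
    job_limit: int = 1,                  # 0: unlimited
) -> bool:
    """Return True if another enumeration job may be launched."""
    if cost_limit is not None and state.total_cost >= cost_limit:
        return False
    if min_limit is not None and state.elapsed_minutes >= min_limit:
        return False
    if failure_limit > 0 and state.failures >= failure_limit:
        return False
    if job_limit > 0 and state.running_jobs >= job_limit:
        return False
    return True
```

With the documented defaults (failure limit 1, concurrent job limit 1), this gate runs one child floe at a time and stops launching after the first failure.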
Enumeration Advanced Options
Enumeration Product Deduplication (enum_dedupe): Whether to deduplicate enumeration products within individual reaction enumerations. This has cube memory requirements; see the corresponding memory limit parameter (Enumeration Product Deduplication Memory) for tuning. Note that enabling this option adds significant runtime and cost to jobs, as all enumerated records must be inspected.
Type: boolean
Default: False
Choices: [True, False]
Enumeration Product Deduplication Memory (enum_dedupe_memory): The memory limit for the deduplication cube in MB.
Type: integer
Default: 20240
Enumeration Product Classifier Memory (enum_classifiermem): The memory limit for the reaction classifier in MB.
Type: integer
Default: 10240
Enable V2 Collections (enum_enablev2colls): If ON, use the high-performance collection API exclusively
Type: boolean
Default: True
Choices: [True, False]
Finalization Strategy (enum_finalizecoll): Finalization strategy for the output collection. Auto: if the output collection was opened or created, close after final child job completes; None: ensure ‘open’ of the output collection for writing, no change to collection status upon completion; Force: force close of the output collection on job completion.
Type: string
Default: Auto
Choices: [‘None’, ‘Auto’, ‘Force’]
R&R Enumeration Floe (enumeration_floe): Designate the enumeration floe by title or id for product enumerations
Required
Type: string
Default: _R&R Database Single Reaction Enumerator v0.9.2dev100
Output File Resource Name (enum_outfilename): The name of the Orion file resource for text molecule formats.
Type: string
Output File Format (enum_outfiletype): The format of the data for the file resource. Only text molecule formats are supported for file resource creation.
Type: string
Choices: [‘oedb’, ‘smi’, ‘ism’, ‘csv’]
Maximum number of Cubes (enum_max_parallel): The maximum parallel limit for the launched child floes.
Type: integer
Default: 500
Verbosity (verbosity): Sets the output logging verbosity level.
Type: string
Default: warning
Choices: [‘info’, ‘warning’, ‘error’, ‘debug’, ‘ddebug’]