Reaction & Reagent Database - Launch Product Enumerations

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Task-based/Library Prep & Design/Reaction-based Enumeration

  • Role-based/Medicinal Chemist

Description

PRELIMINARY RELEASE This preliminary floe launches individual reaction enumeration floes for the selected reaction ID(s) from the provided Reaction & Reagent Database.

WARNING For individual reaction libraries of size 500M or larger, contact support@eyesopen.com for assistance.

WARNING Extremely large enumerations may place undue burden on the entire Orion stack, so exercise extreme care when enumerating unconstrained runs of “All” reactions for very large Reaction & Reagent Database libraries. Setting limits on the minimum/maximum number of products is suggested to ensure that the enumeration can complete and that the output collection has a size and shard_size amenable to follow-on processing. Concurrent job limits, cost limits, and time limits can also be used to reduce the impact on the stack and on other Orion users.

General Usage

Careful investigation using the specific Reaction & Reagent Database of interest is highly encouraged. Running the Reaction & Reagent Database - Directory Listing Floe on the desired database is recommended to verify the number of potential products before embarking on a large enumeration activity. Preliminary runs with a subset of the potential reaction(s) should be performed to ensure the product dataset contains all the desired output fields.

Currently, the final collection shard resizing is a particular performance bottleneck and investigation is continuing.

Inputs

Reaction Database: The input Reaction & Reagent Database to enumerate from.

Reaction IDs: These are the reaction ID(s) from the specified database (Reaction Database) that are to be enumerated. All is an acceptable selection but is not recommended, as this can result in a large number of launched floes and an expensive overall cost for the enumeration.

Outputs

Append Enumeration Products To Collection: The name of the output collection for the product records. If the collection does not exist, it will be created; if it exists, the product shards will be added to it. The append feature means that the specified output collection is never closed, so that additional product enumerations can populate it. It is recommended that ocli be used to close the collection once all the enumerations have been completed. The collection may display with 0 size, but that is simply because the size is not computed until the collection is closed. Regardless of its open/closed state, the product collection can still be used directly in other floes.

Product Rec/Shard: The desired records/shard for the final output collection shards.

Output Product Dataset: An optional dataset to capture a subset of the enumerated products for use in the Analyze page review.

Output Product Dataset Limit: Limits the number of records saved to the Output Product Dataset. Note that the Analyze page has an intrinsic dataset limit.

Enumeration Options

General enumeration constraints: Reaction Product Min/Max Limits, Product ID style, and Product ID delimiter.

Enumeration Run Constraints

General enumeration run constraints: Cost Limit, InFlight Job Priority Limit, Failure Limit, Time Budget, Concurrent Job Limit, and Skipped Reaction Size limits.

Enumeration Advanced Usage

General advanced options.

Output Finalization Strategy - using mode Auto (default), the output product collection will be created or opened (if not in the ready state), and will be closed after all child jobs complete. To suppress this activity, use the None option. Auto mode implies that only one launching floe is active at a time.

Promoted Parameters

Title in user interface (promoted name)

Inputs

Reaction Database (rxndb): The name of the Reaction & Reagent Database to enumerate from.

  • Required

  • Type: file_in

Reaction IDs (rxndb_ids): Either a comma-delimited list of reactions, or keyword ‘All’ to run all reactions from the input R&R Database.

  • Required

  • Type: string
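
As a rough illustration of the accepted input, the sketch below turns a Reaction IDs string into either a marker for the keyword ‘All’ or a list of individual IDs. The helper name parse_reaction_ids, the case-insensitive match on ‘All’, and the placeholder IDs are assumptions for illustration only; how the floe itself parses this parameter is not described here.

```python
def parse_reaction_ids(value):
    """Return None for the keyword 'All', otherwise a list of reaction ID strings.

    Hypothetical helper, not part of the floe.
    """
    text = value.strip()
    if text.lower() == "all":  # assumption: the keyword is matched case-insensitively
        return None
    # Split on commas and tolerate surrounding whitespace.
    ids = [part.strip() for part in text.split(",")]
    # Drop empty entries, e.g. from a trailing comma.
    ids = [part for part in ids if part]
    if not ids:
        raise ValueError("no reaction IDs given")
    return ids


# Placeholder IDs; real IDs come from the chosen database.
selected = parse_reaction_ids("rxn_a, rxn_b,rxn_c")
```

Returning None for ‘All’ keeps the expensive "everything" case explicit, so calling code has to handle it deliberately.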

Outputs

Append Enumeration Products to Collection (enum_coll_output): The name or ID of the collection for appending enumeration products.

  • Required

  • Type: string

Product rec/shard (enum_coll_records_per_shard): Records per shard for the final emitted product shards. For post-processing with Omega directly, 2,500 is optimal. For use in the Prepare Giga Collections Floe, 50 K is optimal.

  • Required

  • Type: integer

  • Default: 100000
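
To get a feel for how the records-per-shard setting affects the output collection, the following sketch estimates the number of shards for a given product count. It assumes every shard is filled to the requested size except possibly the last one; the function name shard_count is hypothetical, and the floe's actual resizing behavior may differ.

```python
def shard_count(num_products, records_per_shard=100000):
    """Estimate the number of output shards (ceiling division).

    Assumes shards are filled to records_per_shard, except possibly the last.
    """
    if records_per_shard <= 0:
        raise ValueError("records_per_shard must be positive")
    if num_products < 0:
        raise ValueError("num_products must be non-negative")
    # Integer ceiling division avoids floating-point rounding for large counts.
    return (num_products + records_per_shard - 1) // records_per_shard


# One million products at the three sizes mentioned above.
for size in (2500, 50000, 100000):
    print(size, shard_count(1_000_000, size))
```

The product count itself can be taken from the Directory Listing Floe output before committing to a large enumeration.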

Output Product Dataset (enum_dataset): Prefix for the name of dataset(s) generated to capture a subset of the enumerated products for review on the Analyze page. The reaction identifier will be appended to this name.

  • Type: dataset_out

Output Product Dataset Limit (enum_dataset_max): The maximum number of records to output to a named dataset, or 0 for all records (up to the export limit), or negative for none. Use extreme care with ‘0’ (all), as writing large datasets can be very costly in time. No more than 100 K records will be exported, as this is the Analyze page limit. The export is simply the first N records; no sampling is performed.

  • Type: integer

  • Default: -1
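
The three cases for this parameter (negative, zero, positive) can be summarized in a short sketch. This is one reading of the description above, not the floe's implementation; the function name records_to_export and the constant name are hypothetical, and the cap is taken as exactly 100,000 on the assumption that "100 K" means that value.

```python
ANALYZE_PAGE_LIMIT = 100_000  # assumption: "100 K" read as exactly 100,000 records


def records_to_export(enum_dataset_max, num_products):
    """Number of leading records written to the output dataset for one reaction."""
    if enum_dataset_max < 0:
        return 0  # negative: no dataset output (the default is -1)
    if enum_dataset_max == 0:
        cap = ANALYZE_PAGE_LIMIT  # zero: all records, up to the export cap
    else:
        cap = min(enum_dataset_max, ANALYZE_PAGE_LIMIT)
    # Only the first `cap` records are taken; no sampling is performed.
    return min(cap, num_products)
```

For example, under these assumptions a limit of 0 on a reaction with 250,000 products would still export only the first 100,000 records.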

Enumeration Options

Reaction Product Minimum (numprod_ge): Only enumerate reactions having >= this number of products, or 0 for unconstrained.

  • Type: integer

Reaction Product Maximum (numprod_le): Only enumerate reactions having <= this number of products, or 0 for unconstrained.

  • Type: integer
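
The minimum and maximum product limits act together as a filter on which reactions are enumerated. The sketch below shows that filter under the semantics stated above (0 means unconstrained). The function name, the default values of 0, and the example product counts are illustrative assumptions; real counts would come from the database, for instance via the Directory Listing Floe.

```python
def passes_product_limits(num_products, numprod_ge=0, numprod_le=0):
    """True if a reaction's product count satisfies both limits (0 = unconstrained)."""
    if numprod_ge > 0 and num_products < numprod_ge:
        return False
    if numprod_le > 0 and num_products > numprod_le:
        return False
    return True


# Hypothetical reaction IDs and product counts.
counts = {"rxn_a": 1_200, "rxn_b": 45_000_000, "rxn_c": 900_000}
selected = [rid for rid, n in counts.items()
            if passes_product_limits(n, numprod_ge=10_000, numprod_le=1_000_000)]
```

With these made-up counts, only rxn_c falls inside the window, which is the kind of check worth doing before launching ‘All’ reactions.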

Product ID Style (enum_prodid_style): Which style of product ID to generate.

  • Required

  • Type: string

  • Default: Custom

  • Choices: [‘None’, ‘EnamineREALSpace’, ‘Custom’]

Product ID Delimiter (enum_prodid_delim): Reagent delimiter to use for a ‘Custom’ Product ID Style.

  • Type: string

  • Default: :
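
The exact layout of a ‘Custom’ product ID is not specified in this section. Assuming it is simply the reagent identifiers joined by the chosen delimiter, the idea can be sketched as follows. The function name and the reagent IDs are hypothetical, and the real format may include additional fields.

```python
def custom_product_id(reagent_ids, delimiter=":"):
    """Join reagent identifiers with the delimiter.

    Assumption: a 'Custom' ID is just the reagent IDs in order; the actual
    format produced by the floe may differ.
    """
    for rid in reagent_ids:
        if delimiter in rid:
            # A delimiter that occurs inside a reagent ID would make the
            # product ID ambiguous to split later.
            raise ValueError(f"delimiter {delimiter!r} occurs in reagent ID {rid!r}")
    return delimiter.join(reagent_ids)


product_id = custom_product_id(["REAGENT_001", "REAGENT_042"])
```

Whatever the real format, choosing a delimiter that never appears in the reagent identifiers keeps product IDs unambiguous for downstream parsing.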

Enumeration Run Constraints

Enumeration Cost Budget (dollars) (enum_cost_limit): Do not launch any more enumeration jobs once the combined cost of all child jobs (if available) reaches this dollar limit.

  • Type: decimal

InFlight Job Priority Limit (enum_inflight_limit): Limits in-flight concurrent jobs to <= this threshold.

  • Type: integer

  • Default: 10000000000

Enumeration Failure Limit (enum_failure_limit): Enumeration jobs are launched until this number of failures is reached (0: no limit).

  • Type: integer

  • Default: 1

Concurrent Enumeration Job Limit (enum_job_limit): Limit the number of concurrently running enumeration floes to this limit (0: unlimited).

  • Type: integer

  • Default: 1

Enumeration Time Budget (minutes) (enum_min_limit): Do not launch any more enumeration jobs once the elapsed minutes from all child jobs reach this limit.

  • Type: integer

Skip Reaction Sizes (enum_skip_size_limit): Ignore reactions that generate more than this number of products.

  • Type: integer
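
The cost, time, failure, and concurrency constraints all gate whether another child enumeration job is launched. The sketch below models that decision under stated assumptions: budgets are treated as unset when None, the failure and job limits treat 0 as unlimited (as documented), and the RunState fields are hypothetical bookkeeping. It does not model the in-flight priority limit or the skipped reaction size limit, and it is not the launcher's actual logic.

```python
from dataclasses import dataclass


@dataclass
class RunState:
    """Hypothetical running totals over all child jobs launched so far."""
    total_cost: float = 0.0        # dollars
    elapsed_minutes: float = 0.0
    failures: int = 0
    running_jobs: int = 0


def may_launch(state, cost_limit=None, minute_limit=None,
               failure_limit=1, job_limit=1):
    """True if another enumeration job may be launched under the given limits."""
    if cost_limit is not None and state.total_cost >= cost_limit:
        return False  # cost budget met
    if minute_limit is not None and state.elapsed_minutes >= minute_limit:
        return False  # time budget met
    if failure_limit > 0 and state.failures >= failure_limit:
        return False  # too many failed child jobs (0 means no limit)
    if job_limit > 0 and state.running_jobs >= job_limit:
        return False  # concurrency cap reached (0 means unlimited)
    return True
```

Note that with the documented defaults (failure limit 1, concurrent job limit 1), child jobs run one at a time and launching stops after the first failure.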

Enumeration Advanced Options

Enumeration Product Deduplication (enum_dedupe): Whether to deduplicate enumeration products within individual reaction enumerations. This has cube memory requirements; see the corresponding memory limit parameter for tuning. Note that enabling this option adds significant runtime and cost, as all enumerated records must be inspected.

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Enumeration Product Deduplication Memory (enum_dedupe_memory): The memory limit for the deduplication cube in MB.

  • Type: integer

  • Default: 20240

Enumeration Product Classifier Memory (enum_classifiermem): The memory limit for the reaction classifier in MB.

  • Type: integer

  • Default: 10240

Enable V2 Collections (enum_enablev2colls): If ON, use the high-performance collection API exclusively.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Finalization Strategy (enum_finalizecoll): Finalization strategy for the output collection. Auto: if the output collection was opened or created, close after final child job completes; None: ensure ‘open’ of the output collection for writing, no change to collection status upon completion; Force: force close of the output collection on job completion.

  • Type: string

  • Default: Auto

  • Choices: [‘None’, ‘Auto’, ‘Force’]
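
The three finalization strategies reduce to a single yes/no decision made after the final child job completes. A minimal sketch of that decision, following the description above, is shown here; the function and argument names are hypothetical, and the actual open and close operations on the collection are not shown.

```python
def should_close_collection(strategy, opened_or_created_by_launcher):
    """Decide whether the output collection is closed on completion.

    Follows the documented choices: 'None', 'Auto', 'Force'.
    """
    if strategy == "Force":
        return True  # always close on job completion
    if strategy == "Auto":
        # Close only if this launcher opened or created the collection.
        return opened_or_created_by_launcher
    if strategy == "None":
        return False  # leave the collection status unchanged
    raise ValueError(f"unknown finalization strategy: {strategy!r}")
```

This also shows why Auto assumes a single active launching floe: a second launcher appending to the same collection could find it closed by the first.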

R&R Enumeration Floe (enumeration_floe): Designate the enumeration floe, by title or ID, to use for product enumerations.

  • Required

  • Type: string

  • Default: _R&R Database Single Reaction Enumerator v0.7.1

Output File Resource Name (enum_outfilename): The name of the Orion file resource for text molecule formats.

  • Type: string

Output File Format (enum_outfiletype): The format of the data for the file resource. Only text molecule formats are supported for file resource creation.

  • Type: string

  • Choices: [‘oedb’, ‘smi’, ‘ism’, ‘csv’]

Maximum number of Cubes (enum_max_parallel): The maximum parallel limit for the launched child floes.

  • Type: integer

  • Default: 500

Verbosity (verbosity): Sets the output logging verbosity level.

  • Type: string

  • Default: warning

  • Choices: [‘info’, ‘warning’, ‘error’, ‘debug’, ‘ddebug’]
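
As a recap, the promoted names above can be collected into a single parameter set for a small preliminary run. The sketch below only assembles and sanity-checks such a set against the documented choices. All values are placeholders, the helper invalid_choices is hypothetical, and how the parameters are actually passed to Orion when launching the floe is not covered here.

```python
# Documented choices, copied from the parameter list above.
CHOICES = {
    "enum_prodid_style": {"None", "EnamineREALSpace", "Custom"},
    "enum_finalizecoll": {"None", "Auto", "Force"},
    "enum_outfiletype": {"oedb", "smi", "ism", "csv"},
    "verbosity": {"info", "warning", "error", "debug", "ddebug"},
}


def invalid_choices(params):
    """Return the names of parameters whose value is not a documented choice."""
    return sorted(name for name, allowed in CHOICES.items()
                  if name in params and params[name] not in allowed)


# Placeholder values for a small, constrained preliminary run.
params = {
    "rxndb": "my_rxn_database",           # placeholder database name
    "rxndb_ids": "rxn_a,rxn_b",           # placeholder reaction IDs
    "enum_coll_output": "my_products",    # placeholder collection name
    "enum_coll_records_per_shard": 2500,
    "enum_dataset_max": 1000,
    "enum_prodid_style": "Custom",
    "enum_prodid_delim": ":",
    "enum_job_limit": 1,
    "enum_failure_limit": 1,
    "enum_finalizecoll": "Auto",
    "verbosity": "warning",
}

problems = invalid_choices(params)
```

Starting from a small set like this, with a short reaction list and a dataset limit, matches the recommendation to run a preliminary subset before a large enumeration.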