ROCS X - Run 3D Search

Description

This floe runs a 3D similarity search from an initialized ROCS X model. The 3D search is based on reinforcement learning and Thompson sampling in a multi-armed bandit framework. The key element of the search is the sampling trial. In a sampling trial, a product is selected from the bandit arms that form the decision space in the model. Rewards for the bandit arms are adjusted based on how the sampled products score, so bandit arms that tend to yield high-scoring 3D-similar products are sampled more frequently.
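The sampling loop described above can be illustrated with a toy Beta-Bernoulli Thompson sampler. This is a minimal sketch and not the floe's implementation: the function name, the fixed per-arm success probabilities, and the use of a random draw in place of a real 3D overlay score are all assumptions made for illustration.

```python
import random


def thompson_search(arm_success_probs, num_trials, seed=0):
    """Toy Thompson sampling over bandit arms (illustrative only)."""
    rng = random.Random(seed)
    n_arms = len(arm_success_probs)
    successes = [1] * n_arms  # Beta prior alpha = 1 for every arm
    failures = [1] * n_arms   # Beta prior beta = 1 for every arm
    pulls = [0] * n_arms

    for _ in range(num_trials):
        # Draw a plausible success rate for each arm from its Beta posterior
        # and pull the arm with the highest draw.
        draws = [rng.betavariate(successes[i], failures[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=draws.__getitem__)
        pulls[arm] += 1

        # Assumption: a random draw stands in for "sample a product from
        # this arm and score it against the query".
        if rng.random() < arm_success_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls


# Three hypothetical arms; the last one yields successes most often.
pulls = thompson_search([0.05, 0.10, 0.60], num_trials=2000)
print(pulls)
```

In this sketch the arm with the highest success rate ends up with most of the pulls, which mirrors the behavior described above.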

Key Inputs and Outputs

The key input is a ROCS X 3D Search model, which is typically output from the ROCS X - Initialize 3D Search Floe. The query to search is stored in this model.

The key output is a hit list of top-scoring 3D-similar products against the query. The hit list comes with a file containing duplicate information. This file shows the different ways each hit product can be made by combining different components from the library in various reactions. The hit list can also be triaged with the ROCS X - Hit List Clustering and Sampling Floe. It is recommended to set the Hit List Size parameter between 10,000 and 100,000 for best triage results. A secondary output is a collection of all the products that were searched from the sampling trials.

Cost Considerations

The floe cost scales with the Number of Sampling Trials parameter. Running twice as many sampling trials will cost roughly twice as much but may result in finding higher-scoring products in the search results.

Promoted Parameters

Title in user interface (promoted name)

Inputs

ROCS X 3D Search Model (model_state_collection_in): The collection containing an initialized ROCS X model. The model includes the search query.

Required

Type: collection_source

Vendor Selection (vendor_list): One or more specific vendors to search (comma or blank delimited). Vendor keys for a model can be viewed by looking at its Type Hints on Orion. The default option ‘All’ searches all vendors.

Type: string

Default: All
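As a sketch of what "comma or blank delimited" could mean in practice, the helper below splits a Vendor Selection string into keys. The function name and the vendor keys in the usage lines are hypothetical, and the floe's actual parsing may differ.

```python
import re


def parse_vendor_list(vendor_list):
    """Split a comma- or blank-delimited vendor string.

    Returns None when every vendor should be searched (assumed here to be
    an empty string or the single value 'All').
    """
    tokens = [t for t in re.split(r"[,\s]+", vendor_list.strip()) if t]
    if not tokens or tokens == ["All"]:
        return None
    return tokens


# Hypothetical vendor keys, mixed delimiters.
selected = parse_vendor_list("vendor_a, vendor_b vendor_c")
everything = parse_vendor_list("All")
```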

Outputs

ROCS X 3D Search Hit List Dataset (hitlist_out): Hit list dataset of top-scoring ROCS products.

Required

Type: dataset_out

Default: ROCS X 3D Search Hit List

ROCS X 3D Search (All) Collection (products_out): The name of the output collection containing all of the products from the sampling trials that were searched.

Required

Type: collection_sink

Default: ROCS X 3D Search (All)

Failures Collection (failures_out): The name of the output collection for failures.

Required

Type: collection_sink

Default: ROCS X 3D Search Failures

Temporary Collection (temporary_collection): This collection is created by the floe for internal use during the run and is automatically deleted by the floe when it finishes.

Type: collection_sink

Default: Temporary Collection

Hit List Duplicate Info File (file_out): File with duplicate information for products on the hit list.

Required

Type: file_out

Default: Hitlist_Duplicate_Info.txt

Options: Search

Number of Sampling Trials (num_trials): The number of sampling trials to run. This is typically a small fraction of the product space spanned by the combinatorial library. Note: The cost of the floe scales with this parameter.

Required

Type: integer

Default: 1500000

Minimum Products Scale Factor (min_products_scale_factor): Multiply this scale factor by the number of products in the initial model to get the minimum number of trials that can be run.

Type: decimal

Default: 0.1
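The relationship between this scale factor and Number of Sampling Trials can be written as a simple check. This is a sketch under the assumption that the minimum is enforced as a plain comparison; the function name and the product count in the usage lines are hypothetical.

```python
def meets_minimum_trials(num_trials, num_model_products, min_products_scale_factor=0.1):
    """Return True if num_trials reaches the minimum implied by the scale factor."""
    minimum = num_model_products * min_products_scale_factor
    return num_trials >= minimum


# Hypothetical model with 1,000,000 products in the initial model.
default_ok = meets_minimum_trials(1_500_000, 1_000_000)
too_few = meets_minimum_trials(50_000, 1_000_000)
```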

Max Batches (num_batches): The maximum number of batches to run concurrently.

Type: integer

Default: 500

Initial Batches (num_init_batches): The number of initial batches to run.

Type: integer

Default: 20

Batch Scaling Factor (batch_scaling_factor): The number of batches to send out when the current number is less than the maximum (for example, a value of 2 sends out one additional batch).

Type: integer

Default: 4

Products Per Batch (batch_size): The number of sampling trial products to package in each batch.

Type: integer

Default: 100
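The batch parameters above interact roughly as sketched below. The total batch count follows directly from Number of Sampling Trials and Products Per Batch. The dispatch rule is only one possible reading of the Batch Scaling Factor description and is an assumption, as are both function names.

```python
import math


def total_batches(num_trials, batch_size=100):
    """Number of batches needed to cover all sampling trials."""
    return math.ceil(num_trials / batch_size)


def batches_to_send(in_flight, remaining, num_batches=500, batch_scaling_factor=4):
    """Assumed rule: when a batch returns, send up to batch_scaling_factor
    new batches, capped by the concurrency limit and the work remaining."""
    room = num_batches - in_flight
    return max(0, min(batch_scaling_factor, room, remaining))


# With the default 1,500,000 trials and 100 products per batch.
needed = total_batches(1_500_000)
```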

Options: Overlay

Color Force Field (color_force_field): Color force field to be used for ROCS overlays. If a custom color force field was used to initialize the model, the custom color force field will be used unless Override Custom Color Force Field is turned On.

Type: string

Default: ImplicitMillsDean

Choices: [‘ImplicitMillsDean’, ‘ExplicitMillsDean’, ‘ImplicitMillsDeanNoRings’, ‘ExplicitMillsDeanNoRings’]

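Override Custom Color Force Field (override_cff): If On, the Color Force Field parameter is used for ROCS overlays even if a custom color force field was used to initialize the model.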

Type: boolean

Default: False

Choices: [True, False]

ROCS Score Sorter Type (sorter_type): Type of predicate for sorting scores for ROCS search.

Type: string

Default: HighestTanimotoCombo

Choices: [‘HighestTanimotoCombo’, ‘HighestFitTverskyCombo’, ‘HighestRefTverskyCombo’]

Sort Field (sort_field): Scoring function field to sort on.

Required

Type: field_parameter::float

Default: Tanimoto Combo

Choices: [‘Tanimoto Combo’, ‘Fit Tversky Combo’, ‘Ref Tversky Combo’]

ROCS Start Type (start_type): The type of starting orientations for ROCS.

Type: string

Default: Rocs

Choices: [‘Rocs’, ‘Random’]

Number of Random Starts (num_rand_starts): If specified, ROCS scoring will use the specified number of random starting orientations for each conformer being overlaid. If unspecified, the default of 4 inertial starts will be used.

Type: integer

Options: Hit List

Hit List Size (num_hitlist): The number of top-scoring ROCS X products to keep on the hit list.

Type: integer

Default: 10000

Hit List Rewards Size (num_hitlist_rewards): The number of top-scoring ROCS X products to keep on the hit list for calculating rewards. Influences the success/failure rates for products selected by the Thompson sampling model. Since the success/failure cutoff is the lowest score in the hit list, setting this smaller will make it harder for searched products to succeed, while setting this larger will make it easier for searched products to succeed.

Type: integer

Default: 10000
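The effect of the rewards hit list size on the success/failure cutoff can be seen with a bounded min-heap. This is a minimal sketch: the class name is hypothetical, and treating every product as a success while the list is still filling is an assumption.

```python
import heapq


class RewardHitList:
    """Keep the top-N scores; the lowest retained score is the cutoff."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._heap = []  # min-heap, so _heap[0] is the current cutoff

    def add(self, score):
        """Return True (success) if the score makes it onto the hit list."""
        if len(self._heap) < self.max_size:
            # Assumption: any product succeeds while the list is not full.
            heapq.heappush(self._heap, score)
            return True
        if score > self._heap[0]:
            heapq.heapreplace(self._heap, score)
            return True
        return False

    @property
    def cutoff(self):
        return self._heap[0] if self._heap else None


scores = [0.5, 0.9, 0.7, 0.6, 1.2]
small, large = RewardHitList(2), RewardHitList(4)
small_results = [small.add(s) for s in scores]
large_results = [large.add(s) for s in scores]
```

With the same scores, the smaller list rejects a product that the larger list accepts, which is the trade-off described above.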

Sort Cube Memory (sort_memory_mb): Memory (in MB) allocated to the Hit List Cube.

Type: decimal

Default: 30720

Records Per Shard (records_per_shard): The target number of records in a shard for All products.

Type: integer

Default: 50000

Descending (descending): If On, scores will be sorted in descending order (i.e., high scores will appear at the top of the hit list). If Off, scores will be sorted in ascending order (i.e., low scores will appear at the top of the hit list). Hint: Set this to On when processing ROCS/FastROCS results and Off when processing docking results.

Type: boolean

Default: True

Choices: [True, False]

Chunk Count (chunk_count): The target number of records in a shard for the ROCS X 3D Search (All) Collection.

Type: integer

Default: 100000000

Options: Advanced Search

Memory (MB) (memory_mb): Memory (MB) for Bandit Model Hub Cube.

Type: decimal

Default: 7200

DB Memory Usage (%) (duckdb_memory_pct): Percentage of the Bandit Model Hub Cube’s memory to allocate to the DB.

Type: decimal

Default: 0.5

Product Normalization (protomer_prep_mode): Tautomer and ionization state normalization applied to searched products.

Type: string

Default: Get reasonable protomer and set neutral pH

Choices: [‘Get reasonable protomer and set neutral pH’, ‘Set neutral pH’, ‘None’]

Searchlist Format (searchlist_format):

Type: string

Default: DuckDB

Choices: [‘Pandas’, ‘DuckDB’]

Log Info Frequency (log_freq): The number of trials between status messages printed to the log.

Type: integer

Default: 20000

Logging Verbosity (verbosity): The level of logging verbosity to enable.

Type: string

Default: info

Choices: [‘error’, ‘warning’, ‘info’, ‘debug’, ‘ddebug’]

Stop After Number of No Successes (success_tracker_limit): Stops the job if no successes are found after the specified number of trials.

Type: integer

Default: 20000
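One way such a stopping rule could work is a counter of trials since the last success. This sketch assumes the counter resets on every success; the class name is hypothetical and the floe's actual bookkeeping may differ.

```python
class SuccessTracker:
    """Signal a stop after `limit` trials in a row without a success."""

    def __init__(self, limit=20000):
        self.limit = limit
        self.trials_since_success = 0

    def record(self, success):
        """Record one trial; return True when the search should stop."""
        if success:
            self.trials_since_success = 0  # assumption: a success resets the count
        else:
            self.trials_since_success += 1
        return self.trials_since_success >= self.limit


tracker = SuccessTracker(limit=3)
stops = [tracker.record(s) for s in [False, False, True, False, False, False]]
```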

Limit Rewards Memory (memory_flag): If On, bandit arms will remember rewards for only the last X pulls, where X is set by the Memory Length Pull 1 and Memory Length Pull 2 parameters. If Off, bandit arms will remember rewards for every pull.

Type: boolean

Default: True

Choices: [True, False]

Memory Length Pull 1 (memory_length_pull1): If Limit Rewards Memory is On, the number of rewards a Pull 1 bandit arm will remember.

Type: integer

Default: 10000

Memory Length Pull 2 (memory_length_pull2): If Limit Rewards Memory is On, the number of rewards a Pull 2 bandit arm will remember.

Type: integer

Default: 1000
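Limited rewards memory behaves like a fixed-length window over an arm's most recent rewards. The sketch below uses a bounded deque to show the idea; the class name and the 0/1 reward encoding are assumptions for illustration.

```python
from collections import deque


class BanditArm:
    """Bandit arm that optionally forgets all but its most recent rewards."""

    def __init__(self, memory_length=None):
        # maxlen=None keeps every reward (Limit Rewards Memory Off).
        self.rewards = deque(maxlen=memory_length)

    def record(self, reward):
        """Store a reward of 1 (success) or 0 (failure)."""
        self.rewards.append(reward)

    def counts(self):
        """Return (successes, failures) over the remembered rewards."""
        successes = sum(self.rewards)
        return successes, len(self.rewards) - successes


limited, unlimited = BanditArm(memory_length=3), BanditArm()
for reward in [1, 1, 0, 0, 0]:
    limited.record(reward)
    unlimited.record(reward)
```

Here the limited arm only sees its last three rewards, so early successes no longer influence how often it is sampled.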

Hierarchical Pulls Mode (hierarchical_mode): Strategy for pulling two arms. ‘Independent’ pulls arms independently. ‘Hierarchical’ tracks combinations of pulls.

Type: string

Default: hierarchical

Choices: [‘independent’, ‘hierarchical’]
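The difference between the two modes can be pictured as a difference in how second-pull statistics are keyed. This is a sketch of one plausible reading of the description, not the floe's data model; the function names, the arm labels, and the Beta(1, 1) prior are assumptions.

```python
from collections import defaultdict


def make_stats():
    """Per-key [successes, failures], starting from a uniform Beta(1, 1) prior."""
    return defaultdict(lambda: [1, 1])


def update(pull1_stats, pull2_stats, arm1, arm2, success, mode="hierarchical"):
    """Record the outcome of one trial that pulled arm1 and then arm2."""
    idx = 0 if success else 1
    pull1_stats[arm1][idx] += 1
    # Hierarchical: second-pull stats are tracked per (arm1, arm2) combination.
    # Independent: second-pull stats are tracked per arm2 alone.
    key = (arm1, arm2) if mode == "hierarchical" else arm2
    pull2_stats[key][idx] += 1


h1, h2 = make_stats(), make_stats()
update(h1, h2, "arm_a", "arm_x", success=True)

i1, i2 = make_stats(), make_stats()
update(i1, i2, "arm_a", "arm_x", success=False, mode="independent")
```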