ROCS X - Run 3D Search
Description
This floe runs a 3D similarity search from an initialized ROCS X model. The 3D search is based on reinforcement learning and Thompson sampling in a multi-armed bandit framework. The key element of the search is the sampling trial. In a sampling trial, a product is selected from the bandit arms that form the decision space in the model. The rewards for each bandit arm are then updated based on the evaluation of the products sampled from it, so bandit arms that tend to yield high-scoring 3D-similar products are sampled more frequently.
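The sampling loop described above can be illustrated with a minimal Beta-Bernoulli Thompson sampling sketch in Python. This is a generic textbook formulation, not the ROCS X implementation: the arms are plain indices, and a simulated success with a hypothetical per-arm probability stands in for sampling a product and scoring it against the query.

```python
import random

def thompson_select(successes, failures, rng):
    """Pick the arm whose draw from its Beta(successes + 1, failures + 1) posterior is largest."""
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda i: draws[i])

def run_bandit(true_rates, num_trials, seed=0):
    """Simulate sampling trials; true_rates are hypothetical per-arm success probabilities."""
    rng = random.Random(seed)
    n = len(true_rates)
    successes, failures, pulls = [0] * n, [0] * n, [0] * n
    for _ in range(num_trials):
        arm = thompson_select(successes, failures, rng)
        pulls[arm] += 1
        # Stand-in for evaluating the sampled product: success with probability true_rates[arm]
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls

pulls = run_bandit([0.05, 0.20, 0.50], num_trials=2000)
print(pulls)  # expected: most trials go to the arm with the highest success rate
```

Because each arm's posterior sharpens as it is sampled, arms that keep producing successes are drawn more often, which is the behavior described for the bandit arms above.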
Key Inputs and Outputs
The key input is a ROCS X 3D Search model, which is typically output from the ROCS X - Initialize 3D Search Floe. The query to search is stored in this model.
The key output is a hit list of top-scoring 3D-similar products against the query. The hit list comes with a file containing duplicate information. This file shows the different ways each hit product can be made by combining different components from the library in various reactions. The hit list can also be triaged with the ROCS X - Hit List Clustering and Sampling Floe. It is recommended to set the Hit List Size parameter between 10,000 and 100,000 for best triage results. A secondary output is a collection of all the products that were searched from the sampling trials.
Cost Considerations
The floe cost scales with the Number of Sampling Trials parameter. Running twice as many sampling trials will cost roughly twice as much but may result in finding higher-scoring products in the search results.
Promoted Parameters
Title in user interface (promoted name)
Inputs
ROCS X 3D Search Model (model_state_collection_in): The collection containing an initialized ROCS X model. The model includes the search query.
Required
Type: collection_source
Vendor Selection (vendor_list): One or more specific vendors to search (comma- or whitespace-delimited). The vendor keys for a model can be viewed in its Type Hints in Orion. The default option, ‘All’, searches all vendors.
Type: string
Default: All
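As an illustration of the accepted format, the following Python sketch splits a Vendor Selection value on commas and/or whitespace. It is a hypothetical helper (parse_vendor_list is not part of ROCS X), the vendor keys in the example are made up, and treating an ‘All’ token anywhere in the list as "search all vendors" is an assumption.

```python
import re

def parse_vendor_list(value):
    """Hypothetical parser: return a list of vendor keys, or None to mean all vendors."""
    # Split on any run of commas and/or whitespace and drop empty tokens
    tokens = [t for t in re.split(r"[,\s]+", value.strip()) if t]
    # Assumption: an empty value or an 'All' token means every vendor is searched
    if not tokens or any(t.lower() == "all" for t in tokens):
        return None
    return tokens

print(parse_vendor_list("VendorA, VendorB VendorC"))  # ['VendorA', 'VendorB', 'VendorC']
print(parse_vendor_list("All"))  # None
```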
Outputs
ROCS X 3D Search Hit List Dataset (hitlist_out): Hit list dataset of top-scoring ROCS products.
Required
Type: dataset_out
Default: ROCS X 3D Search Hit List
ROCS X 3D Search (All) Collection (products_out): The name of the output collection containing all of the products from the sampling trials that were searched.
Required
Type: collection_sink
Default: ROCS X 3D Search (All)
Failures Collection (failures_out): The name of the output collection for failures.
Required
Type: collection_sink
Default: ROCS X 3D Search Failures
Temporary Collection (temporary_collection): This collection is created by the floe for internal use during the run and is automatically deleted by the floe when it finishes.
Type: collection_sink
Default: Temporary Collection
Hit List Duplicate Info File (file_out): File with duplicate information for products on the hit list.
Required
Type: file_out
Default: Hitlist_Duplicate_Info.txt
Options: Search
Number of Sampling Trials (num_trials): The number of sampling trials to run. This is typically a small fraction of the product space spanned by the combinatorial library. Note: The cost of the floe scales with this parameter.
Required
Type: integer
Default: 1500000
Minimum Products Scale Factor (min_products_scale_factor): Multiply this scale factor by the number of products in the initial model to get the minimum number of trials that can be run.
Type: decimal
Default: 0.1
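The minimum-trials rule is simple arithmetic, sketched below in Python. Rounding a fractional result up to a whole trial is an assumption, and the model size in the example is hypothetical.

```python
from decimal import Decimal, ROUND_CEILING

def minimum_trials(num_products, scale_factor=0.1):
    """Minimum Products Scale Factor multiplied by the number of products in the initial model."""
    # Decimal(str(...)) avoids binary floating-point error such as 3 * 0.1 == 0.30000000000000004
    exact = Decimal(str(scale_factor)) * num_products
    # Assumption: a fractional minimum is rounded up to the next whole trial
    return int(exact.to_integral_value(rounding=ROUND_CEILING))

# Hypothetical model whose initial product count is 20 million
floor = minimum_trials(20_000_000)
print(floor)  # 2000000
print(1_500_000 >= floor)  # False: the default Number of Sampling Trials is below this floor
```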
Max Batches (num_batches): The maximum number of batches to run concurrently.
Type: integer
Default: 500
Initial Batches (num_init_batches): The number of initial batches to run.
Type: integer
Default: 20
Batch Scaling Factor (batch_scaling_factor): The number of batches to send out when the current number of running batches is less than the maximum (for example, a value of two sends out one additional batch).
Type: integer
Default: 4
Products Per Batch (batch_size): The number of sampling trial products to package in each batch.
Type: integer
Default: 100
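With the default settings, the batch parameters combine as shown in this Python sketch. It assumes each batch carries exactly Products Per Batch products (the last batch may be partial) and that Max Batches caps how many batches are in flight at once; the ramp-up governed by Initial Batches and Batch Scaling Factor is not modeled.

```python
def total_batches(num_trials, batch_size):
    """Batches needed to cover num_trials products (integer ceiling division)."""
    return -(-num_trials // batch_size)

def max_products_in_flight(num_batches, batch_size):
    """Upper bound on products being scored at once if num_batches batches run concurrently."""
    return num_batches * batch_size

# Defaults: num_trials=1500000, batch_size=100, num_batches=500
print(total_batches(1_500_000, 100))  # 15000
print(max_products_in_flight(500, 100))  # 50000
```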
Options: Overlay
Color Force Field (color_force_field): Color force field to be used for ROCS overlays. If a custom color force field was used to initialize the model, the custom color force field will be used unless Override Custom Color Force Field is turned On.
Type: string
Default: ImplicitMillsDean
Choices: [‘ImplicitMillsDean’, ‘ExplicitMillsDean’, ‘ImplicitMillsDeanNoRings’, ‘ExplicitMillsDeanNoRings’]
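Override Custom Color Force Field (override_cff): If On, the value of the Color Force Field parameter is used for ROCS overlays even if a custom color force field was used to initialize the model.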
Type: boolean
Default: False
Choices: [True, False]
ROCS Score Sorter Type (sorter_type): Type of predicate for sorting scores for ROCS search.
Type: string
Default: HighestTanimotoCombo
Choices: [‘HighestTanimotoCombo’, ‘HighestFitTverskyCombo’, ‘HighestRefTverskyCombo’]
Sort Field (sort_field): Scoring function field to sort on.
Required
Type: field_parameter::float
Default: Tanimoto Combo
Choices: [‘Tanimoto Combo’, ‘Fit Tversky Combo’, ‘Ref Tversky Combo’]
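The three score choices are built from overlap volumes. The Python sketch below uses the standard Tanimoto and Tversky definitions, where i_ref and i_fit are the self-overlaps of the query (reference) and the product (fit) and o is their overlap; Tanimoto Combo is the sum of the shape and color Tanimoto values, so it ranges from 0 to 2. The overlap numbers are hypothetical, and the Tversky weights (alpha of 0.95 for the reference-weighted score, 0.05 for the fit-weighted one) are illustrative assumptions rather than documented ROCS X settings.

```python
def tanimoto(o, i_ref, i_fit):
    """Tanimoto similarity from overlap o and self-overlaps i_ref, i_fit (range 0 to 1)."""
    return o / (i_ref + i_fit - o)

def tversky(o, i_ref, i_fit, alpha):
    """Tversky similarity; alpha weights the reference, (1 - alpha) weights the fit molecule."""
    return o / (alpha * i_ref + (1 - alpha) * i_fit)

# Hypothetical shape and color overlap volumes for one query/product overlay
shape = dict(o=6.0, i_ref=10.0, i_fit=8.0)
color = dict(o=2.0, i_ref=4.0, i_fit=3.0)

tanimoto_combo = tanimoto(**shape) + tanimoto(**color)  # 0.5 + 0.4
ref_tversky_combo = tversky(**shape, alpha=0.95) + tversky(**color, alpha=0.95)  # query-weighted
fit_tversky_combo = tversky(**shape, alpha=0.05) + tversky(**color, alpha=0.05)  # product-weighted
print(round(tanimoto_combo, 3))  # 0.9
```

Whichever combo is chosen, the Sort Field should name the matching score so that the hit list is ranked on the same quantity the sorter uses.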
ROCS Start Type (start_type): The type of starting orientations for ROCS.
Type: string
Default: Rocs
Choices: [‘Rocs’, ‘Random’]
Number of Random Starts (num_rand_starts): If specified, ROCS scoring will use the specified number of random starting orientations for each conformer being overlaid. If unspecified, the default of 4 inertial starts will be used.
Type: integer
Options: Hit List
Hit List Size (num_hitlist): The number of top-scoring ROCS X products to keep on the hit list.
Type: integer
Default: 10000
Hit List Rewards Size (num_hitlist_rewards): The number of top-scoring ROCS X products to keep on the hit list for calculating rewards. Influences the success/failure rates for products selected by the Thompson sampling model. Since the success/failure cutoff is the lowest score in the hit list, setting this smaller will make it harder for searched products to succeed, while setting this larger will make it easier for searched products to succeed.
Type: integer
Default: 10000
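The success/failure rule described for Hit List Rewards Size can be sketched with a bounded min-heap in Python, assuming higher scores are better (Descending On). The class name is hypothetical, and treating every score as a success until the list first fills is an assumption; the example shows that a smaller list yields fewer successes on the same scores.

```python
import heapq

class RewardsHitList:
    """Hypothetical sketch: keep the top `size` scores and judge each new score against them."""

    def __init__(self, size):
        self.size = size
        self._heap = []  # min-heap, so self._heap[0] is the lowest score on the list

    def cutoff(self):
        """The success/failure cutoff: the lowest kept score, once the list is full."""
        return self._heap[0] if len(self._heap) == self.size else None

    def add(self, score):
        """Insert a score if it qualifies; return True for a success, False for a failure."""
        if len(self._heap) < self.size:
            heapq.heappush(self._heap, score)
            return True  # assumption: every score succeeds until the list first fills
        if score > self._heap[0]:
            heapq.heapreplace(self._heap, score)  # drop the lowest score, keep the new one
            return True
        return False

scores = [1.2, 0.8, 1.5, 0.7, 1.0, 0.9]  # hypothetical Tanimoto Combo scores
large, small = RewardsHitList(3), RewardsHitList(2)
print([large.add(s) for s in scores])  # [True, True, True, False, True, False]
print([small.add(s) for s in scores])  # [True, True, True, False, False, False]
print(large.cutoff(), small.cutoff())  # 1.0 1.2
```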
Sort Cube Memory (sort_memory_mb): Memory (in MB) allocated to the Hit List Cube.
Type: decimal
Default: 30720
Records Per Shard (records_per_shard): The target number of records in a shard for All products.
Type: integer
Default: 50000
Descending (descending): If On, scores are sorted in descending order (i.e., high scores appear at the top of the hit list). If Off, scores are sorted in ascending order (i.e., low scores appear at the top of the hit list). Hint: Set this to On when processing ROCS/FastROCS results and Off when processing docking results.
Type: boolean
Default: True
Choices: [True, False]
Chunk Count (chunk_count): The target number of records in a shard for the ROCS X 3D Search (All) Collection.
Type: integer
Default: 100000000
Options: Advanced Search
Memory (MB) (memory_mb): Memory (in MB) allocated to the Bandit Model Hub Cube.
Type: decimal
Default: 7200
DB Memory Usage (%) (duckdb_memory_pct): Percentage of the Bandit Model Hub Cube’s memory to allocate to the DB.
Type: decimal
Default: 0.5
Product Normalization (protomer_prep_mode): Tautomer and ionization state normalization applied to searched products.
Type: string
Default: Get reasonable protomer and set neutral pH
Choices: [‘Get reasonable protomer and set neutral pH’, ‘Set neutral pH’, ‘None’]
Searchlist Format (searchlist_format):
Type: string
Default: DuckDB
Choices: [‘Pandas’, ‘DuckDB’]
Log Info Frequency (log_freq): The number of trials between status messages printed to the log.
Type: integer
Default: 20000
Logging Verbosity (verbosity): The level of logging verbosity to enable.
Type: string
Default: info
Choices: [‘error’, ‘warning’, ‘info’, ‘debug’, ‘ddebug’]
Stop After Number of No Successes (success_tracker_limit): Stops the job if no successes are found after this number of trials.
Type: integer
Default: 20000
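A minimal Python sketch of the stopping rule for Stop After Number of No Successes follows. The class name is hypothetical, and resetting the count whenever a success occurs (so the limit applies to consecutive unsuccessful trials) is an assumption about how the limit is counted.

```python
class NoSuccessTracker:
    """Hypothetical stopping rule for Stop After Number of No Successes."""

    def __init__(self, limit=20000):
        self.limit = limit
        self.trials_without_success = 0

    def record(self, success):
        """Record one sampling trial; return True once the search should stop."""
        if success:
            self.trials_without_success = 0  # assumption: any success resets the count
        else:
            self.trials_without_success += 1
        return self.trials_without_success >= self.limit

tracker = NoSuccessTracker(limit=3)
print([tracker.record(s) for s in [False, True, False, False, False]])
# [False, False, False, False, True]
```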
Limit Rewards Memory (memory_flag): If On, bandit arms remember rewards for only their last X pulls, where X is set by the Memory Length Pull 1 or Memory Length Pull 2 parameter, depending on the type of arm. If Off, bandit arms remember rewards for every pull.
Type: boolean
Default: True
Choices: [True, False]
Memory Length Pull 1 (memory_length_pull1): If Limit Rewards Memory is On, the number of rewards a Pull 1 bandit arm will remember.
Type: integer
Default: 10000
Memory Length Pull 2 (memory_length_pull2): If Limit Rewards Memory is On, the number of rewards a Pull 2 bandit arm will remember.
Type: integer
Default: 1000
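Limit Rewards Memory can be pictured as a sliding window over each arm's reward history. The Python sketch below assumes a Beta-Bernoulli arm with a Beta(1, 1) prior (the posterior form used by ROCS X is not described here) and uses a bounded deque so that only the last Memory Length rewards contribute; in the floe the window would be Memory Length Pull 1 or Memory Length Pull 2, depending on the arm. The class name is hypothetical, and passing None reproduces the unlimited behavior when the flag is Off.

```python
import random
from collections import deque

class WindowedArm:
    """Hypothetical bandit arm whose Beta posterior uses only its most recent rewards."""

    def __init__(self, memory_length=None):
        # maxlen=None gives an unbounded deque, i.e. Limit Rewards Memory switched Off
        self.rewards = deque(maxlen=memory_length)

    def update(self, success):
        self.rewards.append(1 if success else 0)  # the oldest reward falls off when full

    def posterior_params(self):
        successes = sum(self.rewards)
        failures = len(self.rewards) - successes
        return successes + 1, failures + 1  # Beta(1, 1) prior plus remembered counts

    def sample(self, rng):
        """Thompson draw used to compare this arm against others."""
        return rng.betavariate(*self.posterior_params())

history = [True, True, True, False, False, False]
limited, unlimited = WindowedArm(memory_length=3), WindowedArm()
for outcome in history:
    limited.update(outcome)
    unlimited.update(outcome)
print(limited.posterior_params())    # (1, 4): only the three recent failures are remembered
print(unlimited.posterior_params())  # (4, 4)
```

A shorter window lets an arm's estimate recover quickly when its recent products stop scoring well, at the cost of a noisier posterior.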
Hierarchical Pulls Mode (hierarchical_mode): Strategy for pulling two arms. With ‘independent’, the arms are pulled independently; with ‘hierarchical’, combinations of pulls are tracked.
Type: string
Default: hierarchical
Choices: [‘independent’, ‘hierarchical’]
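To contrast the two Hierarchical Pulls Mode settings, this Python sketch keeps Beta statistics for Pull 1 and Pull 2 arms. In ‘independent’ mode a Pull 2 arm has one set of statistics regardless of which Pull 1 arm was chosen; in ‘hierarchical’ mode statistics are kept per (Pull 1, Pull 2) combination. What the pulls correspond to inside ROCS X, and this exact bookkeeping, are assumptions for illustration; the class name is hypothetical.

```python
import random
from collections import defaultdict

class TwoPullBandit:
    """Hypothetical two-pull Thompson sampler; stats are [alpha, beta] of a Beta posterior."""

    def __init__(self, n_pull1, n_pull2, mode="hierarchical", seed=0):
        self.n_pull1, self.n_pull2, self.mode = n_pull1, n_pull2, mode
        self.rng = random.Random(seed)
        self.stats1 = defaultdict(lambda: [1, 1])  # Pull 1 arm -> [alpha, beta]
        self.stats2 = defaultdict(lambda: [1, 1])  # Pull 2 key -> [alpha, beta]

    def _key2(self, arm1, arm2):
        # 'hierarchical' tracks the combination; 'independent' tracks the Pull 2 arm alone
        return (arm1, arm2) if self.mode == "hierarchical" else arm2

    def _thompson(self, keys, stats):
        draws = {k: self.rng.betavariate(*stats[k]) for k in keys}
        return max(draws, key=draws.get)

    def pull(self):
        arm1 = self._thompson(range(self.n_pull1), self.stats1)
        key_to_arm2 = {self._key2(arm1, a2): a2 for a2 in range(self.n_pull2)}
        return arm1, key_to_arm2[self._thompson(key_to_arm2, self.stats2)]

    def reward(self, arm1, arm2, success):
        idx = 0 if success else 1  # a success bumps alpha, a failure bumps beta
        self.stats1[arm1][idx] += 1
        self.stats2[self._key2(arm1, arm2)][idx] += 1

for mode in ("independent", "hierarchical"):
    bandit = TwoPullBandit(2, 2, mode=mode)
    bandit.reward(0, 1, True)   # Pull 2 arm 1 succeeds after Pull 1 arm 0
    bandit.reward(1, 1, False)  # and fails after Pull 1 arm 1
    print(mode, dict(bandit.stats2))
# independent {1: [2, 2]}
# hierarchical {(0, 1): [2, 1], (1, 1): [1, 2]}
```

In ‘independent’ mode the success and failure cancel out for Pull 2 arm 1, whereas ‘hierarchical’ mode remembers that the arm worked well only after Pull 1 arm 0.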