Multi-Stage ROCS X Search
Description
This floe orchestrates the multi-stage search of a ROCS X 3D library, resulting in a clustered hit list of top-scoring 3D-similar molecules to a query. It launches one “head” job that runs the following three “stage” floes in sequence:
ROCS X - Initialize 3D Search: Initializes a ROCS X model for a 3D similarity search on a query.
ROCS X - Run 3D Search: Runs a 3D similarity search from an initialized ROCS X model.
ROCS X - Hit List Clustering and Sampling: Returns a diverse sample from a hit list by clustering based on 2D fingerprint similarity.
The Multi-Stage Floe handles inputs and outputs for the stage floes. Outputs from completed stages are automatically passed to downstream stages. The Multi-Stage Floe starts where it first receives input, so early stages can be skipped by leaving their inputs blank and providing input to the designated starting stage. All outputs generated by the Multi-Stage Floe can be used in other contexts on Orion.
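The stage-skipping rule above can be pictured with a small sketch. This is not the floe's actual implementation: the helper name `first_stage_with_input` is hypothetical, and the assumption that Stage 1 needs both `query_in` and `collection_in` is an illustrative reading of the parameter list on this page.

```python
from typing import Optional

# Promoted input names that mark the entry point of each stage.
# Which inputs are strictly required per stage is an assumption here.
STAGE_INPUTS = {
    1: ("query_in", "collection_in"),
    2: ("model_in",),
    3: ("hitlist_in",),
}


def first_stage_with_input(params: dict) -> Optional[int]:
    """Return the earliest stage whose inputs are all provided, or None."""
    for stage in sorted(STAGE_INPUTS):
        if all(params.get(name) for name in STAGE_INPUTS[stage]):
            return stage
    return None
```

For example, supplying only `model_in` would start the orchestration at Stage 2 in this sketch, skipping model initialization.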
Promoted Parameters
Title in user interface (promoted name)
Orchestration Settings
Unique Orchestration Job Tag (unique_job_tag): Tag to apply to this job and all launched jobs for easy identification. If the provided tag is not unique, a suffix is added to ensure uniqueness.
Type: string
Default: MS:ROCSX_Srch
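As a rough illustration of how a non-unique tag might be made unique, consider the sketch below. The suffix format actually used by the floe is not documented here; the numeric `_1`, `_2`, ... suffix and the helper name `make_unique_tag` are assumptions chosen only for the example.

```python
def make_unique_tag(tag: str, existing_tags: set) -> str:
    """Return tag unchanged if unused; otherwise append a numeric suffix.

    The "_<n>" suffix style is hypothetical, not the floe's documented format.
    """
    if tag not in existing_tags:
        return tag
    n = 1
    while f"{tag}_{n}" in existing_tags:
        n += 1
    return f"{tag}_{n}"
```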
Exact Search Floes Package Version (floes_package_version): Exact version of the ROCS X Search Floes package to use.
Type: string
Default: 1.0.0
Generate Live Report (live_report_service): Generates an experimental live report for tracking orchestrated jobs, then replaces it with a static report upon job completion. If off, only a static report is generated at the end of the multi-stage orchestration.
Type: boolean
Default: False
Choices: [True, False]
Stage 1: Model Init
Query to Search (query_in): The dataset containing the query. Only single-conformer molecule queries are supported at this time. If a multi-conformer molecule is provided, the floe will immediately fail.
Type: data_source
ROCS X 3D Search Library Collection (collection_in): The collection containing the 3D library to search.
Type: collection_source
External Hit List for Query (opt_init_hitlist_in): (Optional) Dataset containing a hit list of ROCS scores from an external search on the query.
Type: data_source
External Hit List Score Field (opt_init_ext_hitlist_field_name): (Optional) Field for 3D similarity scores on the external hit list.
Type: string
Default: Tanimoto Combo
Stage 2: ROCS X Search
Input ROCS X 3D Search Model (model_in): The collection containing an initialized ROCS X model. The model includes the search query.
Type: collection_source
Vendor Selection (vendor_list): One or more specific vendors to search (comma or blank delimited). Vendor keys for a model can be viewed by looking at its Type Hints on Orion. The default option ‘All’ searches all vendors.
Type: string
Default: All
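A value for vendor_list that mixes commas and blanks can be split as in the sketch below. The helper `parse_vendor_list` and the vendor keys in the usage example are hypothetical; returning `None` to mean "search all vendors" is a convention of this sketch, not of the floe.

```python
import re
from typing import List, Optional


def parse_vendor_list(value: str) -> Optional[List[str]]:
    """Split a comma- or blank-delimited string into vendor keys.

    Returns None for the default 'All' (meaning no vendor filter here).
    """
    tokens = [t for t in re.split(r"[,\s]+", value.strip()) if t]
    if tokens == ["All"]:
        return None
    return tokens
```

With made-up keys, `parse_vendor_list("vendorA, vendorB vendorC")` yields three separate entries regardless of which delimiter separates them.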
Number of Sampling Trials (num_trials): The number of sampling trials to run. This is typically a small fraction of the product space spanned by the combinatorial library. Note: The cost of the floe scales with this parameter.
Type: integer
Default: 1500000
Hit List Size (num_hitlist): The number of top-scoring ROCS X products to keep on the output hit list.
Type: integer
Default: 10000
Stage 3: Hit List Triage
Input Hit List from ROCS X 3D Search (hitlist_in): Dataset containing hit list to cluster and sample.
Type: data_source
Number of Clusters to Sample (num_clusters_to_sample): The number of clusters from which to sample molecules. Set to 0 to sample all clusters.
Type: integer
Default: 25
Number of Samples per Cluster (num_samples_per_cluster): The number of molecules to sample from each cluster.
Type: integer
Default: 5
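The two sampling parameters together bound the size of the sampled hit list. The sketch below computes that upper bound under two assumptions not stated on this page: that a request for more clusters than exist is capped at the number available, and that clusters with fewer members than num_samples_per_cluster simply contribute fewer molecules (so the result is a maximum, not an exact count). The function name is hypothetical.

```python
def max_sampled_hitlist_size(
    total_clusters: int,
    num_clusters_to_sample: int = 25,
    num_samples_per_cluster: int = 5,
) -> int:
    """Upper bound on the number of molecules in the sampled hit list."""
    if num_clusters_to_sample == 0:
        # 0 means "sample all clusters", per the parameter description.
        clusters = total_clusters
    else:
        clusters = min(num_clusters_to_sample, total_clusters)
    return clusters * num_samples_per_cluster
```

With the defaults and a hit list that forms at least 25 clusters, the sampled hit list would hold at most 25 × 5 = 125 molecules.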
Outputs
Stage 1 Output: ROCS X 3D Search Model (output_initialized_model_collection): The name of the output collection containing the initialized model that can be run in a 3D search.
Type: string
Default: ROCS X 3D Search Model
Stage 1 Output: Hit List from Product Sample (output_initial_sample_hitlist_dataset): The name of the output hit list dataset from ROCS rescoring of the top hits in the initial FastROCS search of the product sample against the query.
Type: string
Default: Product Sample Hit List
Stage 2 Output: Hit List from ROCS X 3D Search (hitlist_out): Hit list of top-scoring ROCS X products.
Type: string
Default: ROCS X 3D Search Hit List
Stage 2 Output: All Products Sampled during ROCS X 3D Search (products_out): The name of the output collection containing all of the products from the sampling trials that were searched.
Type: string
Default: ROCS X 3D Search (All)
Stage 2 Output: Hit List Duplicate Info File (dupe_file_out): File with duplicate information for products on the hit list.
Type: string
Default: Hitlist_Duplicate_Info.txt
Stage 3 Output: Full Clustered Hit List (clustered_out): Output dataset for the full clustered hit list.
Type: string
Default: ROCS X Clustered Hit List
Stage 3 Output: Sampled Hit List (sampled_out): Output dataset for the sampled hit list.
Type: string
Default: ROCS X Sampled Hit List