ROCS X - 2D Substructure Search
Description
This floe performs a fast substructure search for query molecules or SMARTS patterns on a ROCS X 2D synthon library. The fast search works by splitting substructure queries and piecing together partial hits in the reagent space to find hits in the product space. The method currently cannot search ring-forming reactions in the reagent space, so ring-forming reactions are searched in the product space. In cases where a substructure hit lies entirely within one reagent, every product that can be formed with that reagent will also be a hit.
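The last point above can be illustrated with a toy model. The sketch below is not the floe's implementation: it assumes a hypothetical two-component reaction, represents reagents as plain strings with made-up names, and uses substring containment as a stand-in for real substructure matching. It only shows why a match confined to a single reagent expands to every product built from that reagent, without enumerating and testing the full product space.

```python
# Hypothetical reagent lists for a two-component reaction (illustrative names only).
acids = ["acid_A", "acid_B_pyridine", "acid_C"]
amines = ["amine_1", "amine_2", "amine_3", "amine_4"]


def matches(reagent: str, query: str) -> bool:
    """Stand-in for a substructure match: plain substring containment."""
    return query in reagent


def expand_single_reagent_hits(query: str) -> list[tuple[str, str]]:
    """Search the reagents, then expand each matching reagent to all its products."""
    hits = []
    # A matching acid makes every (acid, amine) product a hit.
    for acid in acids:
        if matches(acid, query):
            hits.extend((acid, amine) for amine in amines)
    # A matching amine does the same; skip acids already expanded above
    # so that no product is listed twice.
    for amine in amines:
        if matches(amine, query):
            hits.extend((acid, amine) for acid in acids if not matches(acid, query))
    return hits


hits = expand_single_reagent_hits("pyridine")
# One matching acid combined with four amines yields four product hits.
print(len(hits))
```

In this toy model only 7 reagents are tested, yet the hit list grows with the size of the partner reagent list, which is why simple queries on large libraries can return very many hits.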
Key Inputs and Outputs
The key input is a ROCS X 2D synthon library. This is typically output from the Reaction & Reagent Database - Multi-vendor - Parallel Export Synthon Collection Floe in the Generative Design Hit-to-Lead Floes package.
The key output is a collection of hit molecules that matched any of the substructure queries. The collection can be used with floes that read in and process a collection of 2D molecules. A sample of hits from the collection is provided in a dataset for convenient viewing on Orion’s 3D & Analyze Page.
Cost Considerations
The cost of the floe is largely determined by the number of hits found during the search. Typically, more hits are found when searching larger libraries. Including more substructure queries, or simpler queries, also results in more hits. If a search produces a very large number of hits, the floe can slow down considerably. Please be judicious about your search settings, monitor running jobs, and set cost termination thresholds appropriately.
Promoted Parameters
Title in user interface (promoted name)
Outputs
All Hits Collection (out_coll): Output collection containing all substructure search hits.
Required
Type: collection_sink
Default: ROCS X 2D Substructure Search Hits (All)
Sample Records From Collection Into Dataset (switch): If On, records are randomly sampled into the Sample Hits Dataset. This can increase the floe runtime because each shard in the output collection must be read serially.
Required
Type: boolean
Default: True
Choices: [True, False]
Sample Hits Dataset (data_out): Contains a sample of hits, determined by the hitlist (similarity search) or the sample percentage (substructure search).
Required
Type: dataset_out
Default: ROCS X 2D Substructure Search Hits (Sample)
Floe Report Name (floe_report_name):
Required
Type: string
Default: ROCS X 2D Substructure Search Hits Report
Memory in MB for Pair Hits cube group (pair_hits_memory_mb): The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Required
Type: decimal
Default: 1800
Disk Space in MB used for Pair Hits cube group (pair_hits_disk_space): The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Required
Type: decimal
Default: 10000
Advanced Database Ingestion Parameters
Validate Reagent Shards (validate): Perform validation on the reagent shards.
Type: boolean
Default: False
Choices: [True, False]
Logging Verbosity (verbosity): What level of logging verbosity to enable.
Type: string
Default: info
Choices: [‘error’, ‘warning’, ‘info’, ‘debug’, ‘ddebug’]
Advanced Database Parallel Processing Parameters
Parallel Limit (maxparallel): The maximum number of concurrently running copies of this Cube.
Type: integer
Default: 1000
Parallel Failure Limit (maxfailure): The maximum number of times to attempt processing a work item.
Type: integer
Default: 10
Inputs
ROCS X 2D Synthon Library (reagcoll): The input collection for the 2D synthon library to search.
Required
Type: collection_source
Reaction Constraint (rxnlist): Select one or more reactions from the sample reaction database list. Select All to use all reactions and Custom to use the Custom Reaction Constraint below.
Required
Type: string
Default: [‘All’]
Choices: [‘All’, ‘Custom’, ‘3-nitrile-pyridine’, ‘Buchwald-Hartwig’, ‘Buchwald_cross_coupling1’, ‘Buchwald_cross_coupling2’, ‘Ester_hydrolysis-amide_synthesis1’, ‘Ester_hydrolysis-amide_synthesis2’, ‘Grignard_alcohol’, ‘Grignard_carbonyl’, ‘Heck_non-terminal_vinyl’, ‘Heck_terminal_vinyl’, ‘Huisgen_disubst-alkyne’, ‘Mitsunobu_imide’, ‘Mitsunobu_phenol’, ‘Mitsunobu_sulfonamide’, ‘Mitsunobu_tetrazole_1’, ‘Mitsunobu_tetrazole_2’, ‘N-alkylation1’, ‘N-alkylation2’, ‘N-arylation_heterocycles’, ‘Negishi’, ‘Niementowski_quinazoline’, ‘O-alkylation’, ‘O-biarylation’, ‘Pictet-Spengler’, ‘Reductive_amination1’, ‘Reductive_amination2’, ‘Schotten-Baumann_amide’, ‘SnAr1’, ‘SnAr2’, ‘Sonogashira’, ‘Stille’, ‘Suzuki_cross_coupling’, ‘Wittig’, ‘benzimidazole_derivatives_aldehyde’, ‘benzimidazole_derivatives_carboxylic-acid/ester’, ‘benzofuran’, ‘benzothiazole’, ‘benzothiophene’, ‘benzoxazole_arom-aldehyde’, ‘benzoxazole_carboxylic-acid’, ‘decarboxylative_coupling’, ‘heteroaromatic_nuc_sub’, ‘imidazole’, ‘indole’, ‘nucl_sub_aromatic_ortho_nitro’, ‘nucl_sub_aromatic_para_nitro’, ‘oxadiazole’, ‘phthalazinone’, ‘piperidine_indole’, ‘pyrazole’, ‘spiro-chromanone’, ‘sulfon_amide’, ‘tetrazole_connect_regioisomere_1’, ‘tetrazole_connect_regioisomere_2’, ‘tetrazole_terminal’, ‘thiazole’, ‘triaryl-imidazole’, ‘urea’]
Custom Reaction Constraint (customrxnlist): Custom input for a comma delimited list of reaction names to process from the reaction database. If Reaction Constraint is not Custom, this field will be ignored.
Type: string
Filter Ring Forming Reactions (filter_ringforming_flag): Off means all reactions will be included. On means ring-forming reactions will be filtered out.
Type: boolean
Default: False
Choices: [True, False]
Query SMARTS Pattern(s) (in_smarts): To provide multiple patterns, separate them with a blank space, or supply them in a file using the parameter below.
Type: string
Query SMARTS pattern list (in_smarts_list): Text file of input SMARTS patterns, one on each line.
Type: file_in
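As a minimal sketch of how the text inputs above might be prepared, the snippet below builds a blank-separated string for Query SMARTS Pattern(s), writes a one-pattern-per-line file for Query SMARTS pattern list, and joins reaction names into a comma-delimited string for Custom Reaction Constraint. The SMARTS patterns and the file name `queries.smarts` are illustrative assumptions; the reaction names are taken from the Choices list above. Whether the floe tolerates whitespace around the commas is not stated here, so none is added.

```python
import tempfile
from pathlib import Path

# Illustrative query patterns (assumptions, not taken from the floe documentation).
smarts_patterns = ["c1ccccc1C(=O)N", "[#6]S(=O)(=O)N"]

# Value for in_smarts: multiple patterns separated by a blank.
in_smarts = " ".join(smarts_patterns)

# File for in_smarts_list: one SMARTS pattern on each line.
smarts_file = Path(tempfile.mkdtemp()) / "queries.smarts"
smarts_file.write_text("\n".join(smarts_patterns) + "\n")

# Value for customrxnlist: comma-delimited reaction names.
# It is only used when Reaction Constraint is set to Custom.
custom_reactions = ["Suzuki_cross_coupling", "Buchwald-Hartwig", "Sonogashira"]
customrxnlist = ",".join(custom_reactions)

print(in_smarts)
print(customrxnlist)
```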
Filter Options
Minimum Heavy Atom Count (min_hac): Minimum Heavy Atom Count for a product molecule.
Type: integer
Default: 1
Maximum Heavy Atom Count (max_hac): Maximum Heavy Atom Count for a product molecule.
Type: integer
Default: 80
Advanced Enumeration Splitting Options
Max Allowed (max_split_enum): Maximum number of equivalent enumerated records to search at a time, in each parallel instance of enumerated search.
Type: integer
Default: 5000000
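To get a rough feel for how Max Allowed relates to the amount of parallel work, the sketch below estimates how many batches an enumerated search would be divided into. It assumes that records are split into batches no larger than max_split_enum; the floe's actual splitting logic is not described here, and the function name `estimate_batches` is hypothetical.

```python
def estimate_batches(total_records: int, max_split_enum: int = 5_000_000) -> int:
    """Estimate the number of batches, assuming each holds at most max_split_enum records."""
    if total_records < 0:
        raise ValueError("total_records must be non-negative")
    if max_split_enum <= 0:
        raise ValueError("max_split_enum must be positive")
    # Integer ceiling division avoids floating-point rounding on large counts.
    return (total_records + max_split_enum - 1) // max_split_enum


# 12 million enumerated records at the default limit of 5 million per batch.
print(estimate_batches(12_000_000))  # 3
```

Under this assumption, lowering the value produces more, smaller batches, and raising it produces fewer, larger ones.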