Overlays a FastROCS collection onto up to 200 shape or molecule queries and outputs a single FastROCS
hit list using the best result (i.e., highest similarity to any query) to rank molecules. This floe
also optionally re-scores the top FastROCS molecules with ROCS, docking, and a consensus
of both, creating a separate hit list for each.
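The ranking rule described above scores each molecule by its highest similarity to any query, and that best query is the one reported on the hit list record. The sketch below is a minimal illustration in Python. It assumes the per-query similarities are already available in a plain dictionary. The function name and data layout are hypothetical and are not part of the floe or the OpenEye toolkits.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    mol_id: str
    best_query_id: int
    score: float


def rank_by_best_query(scores, hit_list_size):
    """Collapse per-query scores to one hit per molecule, ranked by the best score.

    scores: dict mapping molecule id -> {query id: similarity}
    (hypothetical input layout; higher similarity is better).
    """
    hits = []
    for mol_id, per_query in scores.items():
        # The molecule's score is its highest similarity to any query.
        best_query_id = max(per_query, key=per_query.get)
        hits.append(Hit(mol_id, best_query_id, per_query[best_query_id]))
    # Sort best-first and keep only the top hit_list_size molecules.
    hits.sort(key=lambda h: h.score, reverse=True)
    return hits[:hit_list_size]


scores = {
    "mol_a": {1: 1.10, 2: 1.45},
    "mol_b": {1: 1.62, 2: 0.90},
    "mol_c": {1: 0.75, 2: 0.80},
}
for hit in rank_by_best_query(scores, hit_list_size=2):
    print(hit.mol_id, hit.best_query_id, hit.score)
```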
Input Query Dataset(s) One or more datasets containing molecules and/or shape queries. Queries can come from multiple datasets or from a single dataset with one or more queries. The ‘2D Sketcher’ can also be used to create a query, in which case a reasonable set of conformers of the sketched molecule will be generated and used as queries (see the ‘Query Conformer Generation Mode’ option parameter). The total number of queries is limited to 200. Compute costs scale roughly linearly with the number of queries plus 10, i.e., cost is roughly proportional to (number of queries + 10).
Type : data_source
Required : True
Python Name : input_query_datasets
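The cost rule above can be turned into a quick estimate of how a run scales relative to a single-query run. The helper below is a hypothetical back-of-the-envelope sketch. It uses only the stated proportionality to (number of queries + 10) and does not represent actual pricing.

```python
def relative_cost(num_queries, baseline_queries=1):
    """Estimate cost relative to a baseline run, assuming cost ~ (queries + 10)."""
    if not 1 <= num_queries <= 200:
        raise ValueError("the floe accepts between 1 and 200 queries")
    return (num_queries + 10) / (baseline_queries + 10)


for n in (1, 10, 50, 200):
    print(f"{n:>3} queries: ~{relative_cost(n):.1f}x the cost of 1 query")
```

For example, 10 queries cost roughly 20/11 ≈ 1.8 times as much as a single query, not 10 times. The "+ 10" term behaves like a fixed overhead that dominates when there are only a few queries.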
FastROCS Input Collection FastROCS collection to screen against. OpenEye supplies several pre-generated vendor molecule collections in Organization Data. The ‘Prepare Giga Collections’ or ‘Giga Docking Collection to Hi-res FastROCS Collection’ floes can also be used to create suitable collections for this floe.
Type : collection_source
Required : True
Python Name : fastrocs_input_collection
Design Unit(s) (Optional) If a design unit is supplied here, the top scoring molecules from FastROCS will be docked to the design unit and the results output in a separate hit list. Up to 10 design units can be supplied from one or more datasets. If multiple design units are supplied, a single docking hit list is still created, using the score from the best design unit for each docked molecule (see the ‘Dock Re-scoring Mode’ parameter description).
Type : data_source
Required : False
Python Name : design_units_optional
Shape Query(s) for ROCS Re-scoring (Optional) Optional dataset(s) with one or more shape queries to be used for ROCS re-scoring IN PLACE of the queries passed to ‘Input Query Dataset(s)’. This dataset only accepts shape queries, not molecule queries. This parameter allows the ROCS re-scoring step to use shape queries that are not supported by FastROCS (e.g., shape queries with grids). If one or more datasets are supplied to this parameter, ROCS re-scoring is automatically turned on and the setting of the ‘Options: Re-scoring -> ROCS Re-scoring Mode’ parameter is ignored.
Type : data_source
Required : False
Python Name : shape_querys_for_rocs_re_scoring_optional
FastROCS Hit List Dataset Output dataset that will contain the top hits directly from FastROCS.
Type : dataset_out
Required : True
Default : FastROCS Hit List
Python Name : fastrocs_hit_list_dataset
FastROCS Novelty Hit List Dataset Molecules in this output dataset will be sorted by FastROCS score. This hit list contains molecules that tend to have high 3D similarity and low 2D similarity to the query(s).
Type : dataset_out
Required : True
Default : FastROCS Novelty Hit List
Python Name : fastrocs_novelty_hit_list_dataset
ROCS Hit List Dataset Output hit list dataset from ROCS re-scoring of the top FastROCS hits. This dataset will not be created if the ‘ROCS Re-scoring Mode’ parameter is set to ‘Off’.
Type : dataset_out
Required : True
Default : ROCS Hit List
Python Name : rocs_hit_list_dataset
ROCS Novelty Hit List Dataset Molecules in this output dataset will be sorted by ROCS score. This hit list contains molecules that tend to have high 3D similarity and low 2D similarity to the query(s).
Type : dataset_out
Required : True
Default : ROCS Novelty Hit List
Python Name : rocs_novelty_hit_list_dataset
Dock Hit List Dataset Output hit list dataset from docking the top FastROCS hits. This dataset will only be created if at least one design unit is supplied to the ‘Design Unit(s) (Optional)’ parameter.
Type : dataset_out
Required : True
Default : Dock Hit List
Python Name : dock_hit_list_dataset
Consensus ROCS Hit List Dataset Consensus output hit list ranked by ROCS score. This hit list will only be created if both ROCS and Dock re-scoring are enabled (see the ‘Design Unit(s) (Optional)’ and ‘ROCS Re-scoring Mode’ parameters).
Type : dataset_out
Required : True
Default : Consensus ROCS Hit List
Python Name : consensus_rocs_hit_list_dataset
Consensus Dock Hit List Dataset Consensus output hit list ranked by docking score. This hit list will only be created if both ROCS and Dock re-scoring are enabled (see the ‘Design Unit(s) (Optional)’ and ‘ROCS Re-scoring Mode’ parameters).
Type : dataset_out
Required : True
Default : Consensus Dock Hit List
Python Name : consensus_dock_hit_list_dataset
Output Query Dataset This output dataset will contain a copy of the input queries (see the ‘Input Query Dataset(s)’ parameter). In addition to the query, this dataset will include an integer id field that also appears in the ROCS/FastROCS hit list records, so the query associated with each molecule can be identified (this is primarily useful when multiple queries are used). Note that if ‘Shape Query(s) for ROCS Re-scoring (Optional)’ is specified, this dataset will contain only the queries used for FastROCS, not the shape queries used for ROCS re-scoring (see the ‘Outputs -> Output ROCS Re-score Shape Query Dataset’ parameter in that case).
Type : dataset_out
Required : True
Default : FastROCS Queries
Python Name : output_query_dataset
Output ROCS Re-score Shape Query Dataset This output dataset will contain a copy of the input shape query(s) passed to ‘Shape Query(s) for ROCS Re-scoring (Optional)’. If ‘Shape Query(s) for ROCS Re-scoring (Optional)’ is not specified, this dataset will not be created, because the query(s) for ROCS re-scoring will be the same as those for FastROCS (see the ‘Outputs -> Output Query Dataset’ parameter). In addition to the query, this dataset will include an integer id field that also appears in the ROCS hit list records, so the query associated with each molecule can be identified. (This is primarily useful when multiple shape queries are used.)
Type : dataset_out
Required : True
Default : ROCS Re-score Shape Queries
Python Name : output_rocs_re_score_shape_query_dataset
Design Unit Dataset Output dataset with a copy of the input design unit(s) (see the ‘Design Unit(s) (Optional)’ parameter). In addition to the design unit, this dataset will include an integer id field. This id will also appear in the dock hit list records, so the design unit associated with each molecule can be identified. (This is primarily useful when multiple design units are used.) This dataset will only be created if design units are sent to this floe.
Type : dataset_out
Required : True
Default : Design Units
Python Name : design_unit_dataset
Raw Results Collection (Optional) The name of an output collection that will contain a number of molecules approximately equal to the setting of ‘Options: Advanced -> Number of Molecules to Re-score’. If either ROCS or Docking re-scoring is turned on (ROCS is on by default), this collection will contain the entire set of top scoring FastROCS molecules that were re-scored. If both ROCS and Docking re-scoring are turned off, the collection will contain the top scoring FastROCS molecules. The individual shards of the collection are in .oedb format, which, if downloaded locally, can be read with the OpenEye toolkits' OEReadMolRecords function. If this parameter is not specified, this output collection will not be created.
Type : collection_sink
Required : False
Python Name : raw_results_collection_optional
Temporary Collection This collection is created by the floe for internal use during the run and is automatically deleted by the floe when it finishes.
Hit List Size Size of all output hit lists (minimum 1,000, maximum 100,000).
Type : integer
Required : True
Default : 10000
Range : 1000 to 100000
Python Name : hit_list_size
FastROCS Similarity Type Type of FastROCS similarity used to rank molecules sent to the FastROCS, ROCS and consensus ROCS hit lists. This similarity type will also be used by ROCS re-scoring if it is enabled (ROCS re-scoring is enabled by default) and shape queries specifically for ROCS re-scoring have not been set with the ‘Shape Query(s) for ROCS Re-scoring (Optional)’ parameter.
Type : string
Required : False
Default : Tanimoto Combo
Choices :Tanimoto Combo, Ref Tversky, Fit Tversky, Shape Tanimoto, Shape Ref Tversky, Shape Fit Tversky
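For reference, the similarity types listed above follow the standard shape-overlay definitions. Each is computed from the overlap volume O_AB of the query and the database molecule and from their self-overlap volumes I_A and I_B. The 'Shape ...' choices use only the shape term. Tanimoto Combo is the sum of the shape and color Tanimotos, so it ranges from 0 to 2. The sketch below is illustrative only. The volumes are made-up numbers, and the Tversky weight alpha = 0.95 is an assumed value, not a documented setting of this floe.

```python
def tanimoto(overlap, self_ref, self_fit):
    """Tanimoto from overlap volumes: O_AB / (I_A + I_B - O_AB)."""
    return overlap / (self_ref + self_fit - overlap)


def tversky(overlap, self_ref, self_fit, alpha=0.95):
    """Reference-weighted Tversky: O_AB / (alpha * I_ref + (1 - alpha) * I_fit).

    Swapping self_ref and self_fit gives the fit-weighted ('Fit') variant.
    alpha=0.95 is an illustrative assumption, not necessarily the floe's setting.
    """
    return overlap / (alpha * self_ref + (1.0 - alpha) * self_fit)


def tanimoto_combo(shape, color):
    """Tanimoto Combo is the sum of the shape and color Tanimotos (range 0 to 2)."""
    return tanimoto(*shape) + tanimoto(*color)


# (overlap, query self-overlap, database-molecule self-overlap); made-up volumes
shape = (80.0, 100.0, 120.0)
color = (6.0, 10.0, 8.0)
print("Shape Tanimoto:", round(tanimoto(*shape), 3))
print("Shape Ref Tversky:", round(tversky(*shape), 3))
print("Shape Fit Tversky:", round(tversky(shape[0], shape[2], shape[1]), 3))
print("Tanimoto Combo:", round(tanimoto_combo(shape, color), 3))
```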
Query Conformer Generation Mode Method used to generate conformer(s) of molecule queries (shape queries are always accepted as is).
‘input’: Uses the conformer of the molecule query as is (molecule queries without coordinates will fail in this mode).
‘omega’: Generates query molecule conformations with OMEGA.
‘dock’: Uses the best pose of the query molecule docked to the design unit(s). This mode requires that design units be supplied to the ‘Design Unit(s) (Optional)’ parameter.
‘auto’: Molecule queries with 3D coordinates are used as is. Molecule queries without 3D coordinates use ‘dock’ mode if design unit(s) are supplied to the floe and ‘omega’ mode otherwise.
Type : string
Required : False
Default : auto
Choices :input, omega, dock, auto
Python Name : query_conformer_generation_mode
Multi Conformer Mol Query Mode Controls how query molecules with multiple conformers are handled.
‘fail’: Records with a molecule that has multiple conformers will fail.
‘active’: The active conformer of the molecule (typically the first conformer, and often the lowest energy one) will be used.
‘first10’: The first 10 conformers of the molecule will be used.
‘limit’: All conformers of the molecule will be used, up to the cube's limit on the total number of queries.
‘all’: All conformers of the molecule will be used as queries.
WARNING: using ‘all’ or ‘limit’ can significantly increase the cost of the floe.
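The selection rules for the modes above can be summarized with a small hypothetical helper. It assumes the active conformer is the first one, although the description only says this is typical. It also assumes single-conformer molecules are unaffected by the mode. It does not model what the floe does if ‘all’ pushes the total past the query limit.

```python
def select_query_conformers(conformers, mode, remaining_query_slots):
    """Pick which conformers of a molecule query become FastROCS queries.

    conformers: list of conformers; index 0 is assumed to be the active one.
    remaining_query_slots: how many more queries fit under the total-query limit.
    (Hypothetical helper illustrating the documented modes.)
    """
    if len(conformers) <= 1:
        return list(conformers)  # assumption: the mode only matters for multi-conformer queries
    if mode == "fail":
        raise ValueError("molecule query has multiple conformers")
    if mode == "active":
        return conformers[:1]
    if mode == "first10":
        return conformers[:10]
    if mode == "limit":
        return conformers[:remaining_query_slots]
    if mode == "all":
        return list(conformers)
    raise ValueError(f"unknown mode: {mode!r}")


confs = [f"conf{i}" for i in range(25)]
print(len(select_query_conformers(confs, "first10", remaining_query_slots=200)))
print(len(select_query_conformers(confs, "limit", remaining_query_slots=7)))
```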
Number of FastROCS Random Starts If specified, FastROCS will use the specified number of random starting orientations for each conformer being overlaid with FastROCS. If unspecified, the default 4 inertial starts will be used. Compute time (i.e., cost) scales roughly linearly with the number of starts.
Type : integer
Required : False
Range : 4 to 100
Python Name : number_of_fastrocs_random_starts
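Because compute time scales roughly linearly with the number of starts, the cost of a setting can be estimated relative to the 4-start default. The helper below is a hypothetical back-of-the-envelope sketch. It assumes a random start costs about as much as an inertial start. The same reasoning applies to ‘Number of ROCS Re-scoring Random Starts’.

```python
DEFAULT_INERTIAL_STARTS = 4  # used when the parameter is left unspecified


def relative_overlay_time(num_random_starts=None):
    """Rough overlay time relative to the default, assuming time ~ number of starts."""
    if num_random_starts is None:
        return 1.0
    if not 4 <= num_random_starts <= 100:
        raise ValueError("allowed range is 4 to 100")
    return num_random_starts / DEFAULT_INERTIAL_STARTS


for starts in (None, 4, 20, 100):
    print(starts, relative_overlay_time(starts))
```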
Shape Only FastROCS Overlay If set to ‘On’, FastROCS will overlay molecules using shape only, ignoring color. If set to ‘Off’, FastROCS will overlay molecules using shape and color. Note that this parameter affects the overlay process but not the scoring; e.g., the overlay can be done with shape only while the scoring is done with shape and color.
Number of Molecules to Re-score The number of top scoring molecules from FastROCS that will be sent to any of the enabled post-processing methods (ROCS and/or Docking). Note that the output hit lists will still be of the size specified by the ‘Hit List Size’ parameter, which is generally smaller than this number. Minimum allowed value 100,000; maximum allowed value 100,000,000.
Type : integer
Required : False
Default : 100000
Range : 100000 to 100000000
Python Name : number_of_molecules_to_re_score
ROCS Re-scoring Mode Type of ROCS re-scoring to perform on the top scoring molecules from FastROCS when ‘Shape Query(s) for ROCS Re-scoring (Optional)’ is not specified (if it is specified, this parameter is ignored and ROCS re-scoring is done with those shape query(s)).
‘Off’: Turns off ROCS re-scoring of the top FastROCS molecules.
‘Best FastROCS Query’: Overlays each molecule onto the query that FastROCS selected as the best query for that molecule.
‘All Queries’: Overlays each molecule onto all queries and outputs the best overlay.
WARNING: ‘All Queries’ mode can result in significant compute costs if there are many queries and molecules to re-score.
Type : string
Required : False
Default : Best FastROCS Query
Choices :Off, Best FastROCS Query, All Queries
Python Name : rocs_re_scoring_mode
Number of ROCS Re-scoring Random Starts If specified, ROCS re-scoring will use the specified number of random starting orientations for each conformer being overlaid. If unspecified, the default 4 inertial starts will be used. Compute time scales roughly linearly with the number of starts.
Type : integer
Required : False
Range : 4 to 100
Python Name : number_of_rocs_re_scoring_random_starts
ROCS Re-scoring Shape Query Similarity Type Similarity type to use in the ROCS Re-scoring step when shape queries for ROCS re-scoring have been set with the ‘Inputs -> Shape Query(s) for ROCS Re-scoring (Optional)’ parameter. This parameter is ignored if ‘Inputs -> Shape Query(s) for ROCS Re-scoring (Optional)’ is not set.
Type : string
Required : False
Default : Tanimoto Combo
Choices :Tanimoto Combo, Ref Tversky, Fit Tversky, Shape Tanimoto, Shape Ref Tversky, Shape Fit Tversky
Python Name : rocs_re_scoring_shape_query_similarity_type
Dock Re-scoring Mode Docking method used to dock the top scoring molecules from FastROCS when design units are supplied to the floe (see the ‘Design Unit(s) (Optional)’ parameter). ‘Fred’ is the default structure-based scoring method. ‘Hybrid’ biases the docking towards poses that overlay the crystallographic ligand (the design unit(s) must have a bound ligand). ‘FastFred’ is a faster variant of ‘Fred’ (typically ~2x faster) that samples less and uses a simpler scoring function in the initial stages of docking. This parameter also determines how the ‘best design unit’ is selected when multiple design units are supplied to the floe. In ‘Fred’ and ‘FastFred’ modes the ‘best design unit’ is the design unit with the best docking score, and in ‘Hybrid’ mode it is the design unit with the most similar bound ligand. With multiple design units, ‘Hybrid’ is much more computationally efficient, because each molecule is docked only once, to the design unit with the most similar bound ligand, while ‘Fred’ and ‘FastFred’ modes dock each molecule to all the design units to determine which one gives the lowest score.
Type : string
Required : False
Default : Fred
Choices :Fred, Hybrid, Fast Fred
Python Name : dock_re_scoring_mode
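The efficiency difference between the modes comes down to how many docking runs are needed. The hypothetical helper below counts those runs for a given number of molecules to re-score and a given number of design units. It ignores the per-run speed difference between modes, such as ‘FastFred’ being ~2x faster per run. It also ignores the cost of the ligand similarity step that ‘Hybrid’ uses to pick a design unit.

```python
def docking_runs(num_molecules, num_design_units, mode):
    """Approximate number of docking runs for the 'Dock Re-scoring Mode' choices."""
    if not 1 <= num_design_units <= 10:
        raise ValueError("between 1 and 10 design units are supported")
    if mode in ("Fred", "FastFred", "Fast Fred"):
        # Each molecule is docked to every design unit; the best score wins.
        return num_molecules * num_design_units
    if mode == "Hybrid":
        # Each molecule is docked once, to the design unit whose bound
        # ligand is most similar to it.
        return num_molecules
    raise ValueError(f"unknown mode: {mode!r}")


print(docking_runs(100_000, 5, "Fred"))    # 500000
print(docking_runs(100_000, 5, "Hybrid"))  # 100000
```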
Consensus Max Pareto Rank When re-scoring with both docking and ROCS, each molecule gets a Pareto dominance rank based on its docking score and ROCS similarity. Molecules with a Pareto dominance rank higher than this number will be filtered out of the consensus hit lists. Minimum allowed value 0. Maximum allowed value 10.
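A Pareto dominance rank can be defined in more than one way, and the description does not say which definition the floe uses. The sketch below assumes one common choice. A molecule's rank is the number of other molecules that dominate it, meaning they are at least as good on both objectives and strictly better on at least one. Under this definition, non-dominated molecules have rank 0. The sketch also assumes lower docking scores and higher ROCS similarities are better. The dictionary keys are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on both objectives and better on one.

    Each item is (dock_score, rocs_similarity): lower dock score is better,
    higher ROCS similarity is better.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    better = a[0] < b[0] or a[1] > b[1]
    return no_worse and better


def pareto_dominance_ranks(points):
    """Rank = number of points that dominate this one (0 = non-dominated)."""
    return [sum(dominates(other, p) for other in points) for p in points]


def consensus_filter(mols, max_rank):
    """Keep molecules whose dominance rank is <= max_rank, annotated with the rank."""
    ranks = pareto_dominance_ranks([(m["dock"], m["rocs"]) for m in mols])
    return [dict(m, pareto_rank=r) for m, r in zip(mols, ranks) if r <= max_rank]


mols = [
    {"id": "a", "dock": -12.0, "rocs": 1.20},
    {"id": "b", "dock": -9.0, "rocs": 1.50},
    {"id": "c", "dock": -8.0, "rocs": 1.10},
    {"id": "d", "dock": -7.0, "rocs": 0.90},
]
for m in consensus_filter(mols, max_rank=1):
    print(m["id"], m["pareto_rank"])
```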
Novelty Fingerprint Type Type of fingerprint used to identify molecules that are dissimilar to the query molecule(s) in 2D.
Type : string
Required : False
Default : Circular
Choices :Circular, Path, Tree
Python Name : novelty_fingerprint_type
Novelty Pareto Max Rank In the Pareto consensus for novelty, this is the maximum Pareto dominance rank that will be allowed. Setting this to higher values will cause more molecules to appear in the FastROCS and ROCS novelty hit lists. Maximum allowed value 10.
These parameters control the AWS instance type the FastROCS cube will use. In general there is no reason to adjust them. They are exposed because overall demand for GPU instances on AWS has occasionally been very high, which in some circumstances has led to extremely long run times for this floe while it waits for GPU instances.
FastROCS Instance Type The instances excluded by default are known not to be cost-effective for FastROCS.
Spot instance policy for FastROCS GPU Instance. To run on SPOT instances, use the default setting of ‘preferred’. To run on ON-DEMAND instances, set the value to ‘prohibited’. ON-DEMAND instances typically cost 3-4x as much as SPOT instances, but are more available than SPOT instances when overall demand for GPUs on AWS is high.
These parameters specify the fields on the input datasets and/or collections that this floe reads data from. Note that parameters identifying a molecule field are special: if left empty, the floe will read the molecule from the primary (i.e., default) molecule field on the input record. The primary molecule field of a dataset can be identified in the UI by looking for the star on its field badge.
Input Query Mol Field Field on the input query dataset(s) with the query molecules. If unspecified the primary (i.e., default) molecule field will be used.
Type : field_parameter::mol
Required : False
Python Name : input_query_mol_field
Input Shape Query Field Field on the input query dataset(s) with the shape query to search against. If unspecified each record will be searched for a single shape query.
Type : field_parameter
Required : False
Python Name : input_shape_query_field
Query Design Unit Field Field on the query dataset(s) passed to ‘Input Query Dataset(s)’ holding a design unit with a ligand to use as the query. If unspecified the floe will use whatever design unit it can find on any field of each record, provided the record does not have multiple fields with design units.
Type : field_parameter
Required : False
Python Name : query_design_unit_field
Dock Design Unit Field Field on the datasets passed to ‘Design Unit(s) (Optional)’ that contains the design unit(s) to dock to. If unspecified, the floe will use whatever design unit it can find on any field of each record, provided the record does not have multiple fields with design units.
Type : field_parameter
Required : False
Python Name : dock_design_unit_field
Dock Receptor Field Field on the datasets passed to ‘Design Unit(s) (Optional)’ that contains an old-style receptor molecule to dock to. If unspecified, the primary molecule field will be used.
These parameters allow the user to change the default output fields this floe creates in the output datasets and/or collections. Note that parameters identifying a molecule field are special: if a molecule field is left empty, the floe writes the molecule to the primary (i.e., default) molecule field of the record. The primary molecule field of a dataset can be identified in the UI by looking for the star on its field badge. CAUTION: If these parameters are modified, the modifications must also be applied to the input fields of downstream floes that read fields written by this floe. If a downstream floe does not support specifying the input field, it may not work properly with the output of this floe when these settings are modified.
Overlay Molecule Field Field on the output records that will hold the structure of the molecule overlaid by ROCS or FastROCS.
Type : field_parameter::mol
Required : False
Default : Overlay Molecule
Python Name : overlay_molecule_field
Tanimoto Combo Field Output field with the Tanimoto Combo. This field will only be created if the ‘FastROCS Similarity Type’ is Tanimoto Combo. The value in this field is a duplicate of the value in Combo Similarity.
Type : field_parameter::float
Required : False
Default : Tanimoto Combo
Python Name : tanimoto_combo_field
Tanimoto Color Field Output field with the Color Tanimoto. This field will only be created if the ‘FastROCS Similarity Type’ is Tanimoto Combo. The value in this field is a duplicate of the value in Color Similarity.
Type : field_parameter::float
Required : False
Default : Color Tanimoto
Python Name : tanimoto_color_field
Tanimoto Shape Field Output field with the Shape Tanimoto. This field will only be created if the ‘FastROCS Similarity Type’ is Tanimoto Combo. The value in this field is a duplicate of the value in Shape Similarity.
Type : field_parameter::float
Required : False
Default : Shape Tanimoto
Python Name : tanimoto_shape_field
Tversky Combo Field Output field with the Tversky Combo. This field will only be created if the ‘FastROCS Similarity Type’ is Fit Tversky or Ref Tversky. The value in this field is a duplicate of the value in Combo Similarity.
Type : field_parameter::float
Required : False
Default : Tversky Combo
Python Name : tversky_combo_field
Tversky Color Field Output field with the Color Tversky. This field will only be created if the ‘FastROCS Similarity Type’ is Fit Tversky or Ref Tversky. The value in this field is a duplicate of the value in Color Similarity.
Type : field_parameter::float
Required : False
Default : Color Tversky
Python Name : tversky_color_field
Tversky Shape Field Output field with the Shape Tversky. This field will only be created if the ‘FastROCS Similarity Type’ is Fit Tversky or Ref Tversky. The value in this field is a duplicate of the value in Shape Similarity.
Type : field_parameter::float
Required : False
Default : Shape Tversky
Python Name : tversky_shape_field
Best Query Field Output field for the query with the highest similarity to the fit molecule.
Type : field_parameter::mol
Required : False
Default : Query
Python Name : best_query_field
Best Query ID Field Output field for the ID of the query with the highest similarity to the molecule. This identifier will also appear in the dataset specified by the ‘Output Query Dataset’ parameter.
Type : field_parameter::int
Required : False
Default : Query ID
Python Name : best_query_id_field
Best Query Link Field Output field for a link to the query with the highest similarity to the molecule. The link will point to the query in the dataset specified by the ‘Output Query Dataset’ parameter.
Type : field_parameter::link
Required : False
Default : Query Link
Python Name : best_query_link_field
Bemis Murcko Field Output field for the Bemis Murcko core SMILES.
Type : field_parameter::string
Required : False
Default : Bemis Murcko
Python Name : bemis_murcko_field
Bemis Murcko ID Field Output field with an integer ID of the Bemis Murcko core. All molecules with the same Bemis Murcko core SMILES will have the same ID, and those with different Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Bemis Murcko core is seen. Thus, unlike the Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.
Type : field_parameter::int
Required : False
Default : Bemis Murcko ID
Python Name : bemis_murcko_id_field
Bemis Murcko Rank Field Integer field with the rank of the molecule within its Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Bemis Murcko core SMILES).
Type : field_parameter::int
Required : False
Default : Bemis Murcko Rank
Python Name : bemis_murcko_rank_field
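The ID and rank fields described above can be sketched as a single pass over the hit list. This sketch assumes the hits are already sorted best-first and that the core SMILES have already been computed; it does not derive Bemis Murcko cores itself. It also assumes ranks start at 1. Because IDs are assigned in order of first appearance, reordering the input changes the IDs but not the core SMILES. The Hetero Bemis Murcko ID and rank fields are described the same way, using a different core definition. The function name is hypothetical.

```python
def assign_core_ids_and_ranks(hits):
    """Annotate hits (already sorted best-first) with core IDs and in-family ranks.

    hits: list of (mol_id, core_smiles) pairs in hit-list order.
    Returns (mol_id, core_smiles, core_id, family_rank) tuples.
    """
    core_ids = {}       # core SMILES -> integer ID, in order of first appearance
    family_counts = {}  # core ID -> number of family members seen so far
    annotated = []
    for mol_id, core in hits:
        if core not in core_ids:
            core_ids[core] = len(core_ids) + 1  # IDs start at 1
        core_id = core_ids[core]
        family_counts[core_id] = family_counts.get(core_id, 0) + 1
        annotated.append((mol_id, core, core_id, family_counts[core_id]))
    return annotated


hits = [
    ("m1", "c1ccc(cc1)C2CCNCC2"),
    ("m2", "c1ccncc1"),
    ("m3", "c1ccc(cc1)C2CCNCC2"),
    ("m4", "c1ccncc1"),
    ("m5", "C1CCCCC1"),
]
for row in assign_core_ids_and_ranks(hits):
    print(*row)
```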
Hetero Bemis Murcko Field Output field for the Hetero Bemis Murcko core SMILES.
Type : field_parameter::string
Required : False
Default : Hetero Bemis Murcko
Python Name : hetero_bemis_murcko_field
Hetero Bemis Murcko ID Field Output field with an integer ID of the Hetero Bemis Murcko core. All molecules with the same Hetero Bemis Murcko core SMILES will have the same ID, and those with different Hetero Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Hetero Bemis Murcko core is seen. Thus, unlike the Hetero Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.
Type : field_parameter::int
Required : False
Default : Hetero Bemis Murcko ID
Python Name : hetero_bemis_murcko_id_field
Hetero Bemis Murcko Rank Field Integer field with the rank of the molecule within its Hetero Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Hetero Bemis Murcko core SMILES).
Type : field_parameter::int
Required : False
Default : Hetero Bemis Murcko Rank
Python Name : hetero_bemis_murcko_rank_field
Docked Molecule Field Output field for the docked molecule. This field will only be created on the output records if design units are supplied to this floe.
Type : field_parameter::mol
Required : False
Default : Docked Molecule
Python Name : docked_molecule_field
Docked Score Field Output field for the score of the docked molecule. This field will only be created on the output records if design units are supplied to this floe.
Type : field_parameter::float
Required : True
Default : Chemgauss4
Python Name : docked_score_field
Design Unit ID Field Output field for the ID of the design unit the molecule scores best in. This field will only be created on the output records if design units are supplied to this floe.
Type : field_parameter::int
Required : False
Default : Design Unit ID
Python Name : design_unit_id_field
Design Unit Link Field Output field for a link to the design unit the molecule scores best in. This field will only be created on the output records if design units are supplied to this floe.
Type : field_parameter::link
Required : False
Default : Design Unit Link
Python Name : design_unit_link_field
Steric Score Field Output field for the steric score component of the docked molecule. This field will only be created on the output records if design units are supplied to this floe and this parameter is specified.
Type : field_parameter::float
Required : False
Python Name : steric_score_field
Clash Score Field Output field for the clash score component of the docked molecule. This field will only be created on the output records if design units are supplied to this floe and this parameter is specified.
Type : field_parameter::float
Required : False
Python Name : clash_score_field
Protein Desolv Score Field Output field for the protein desolvation score component of the docked molecule. This field will only be created on the output records if design units are supplied to this floe and this parameter is specified.
Type : field_parameter::float
Required : False
Python Name : protein_desolv_score_field
Ligand Desolv Score Field Output field for the ligand desolvation score component of the docked molecule. This field will only be created on the output records if design units are supplied to this floe and this parameter is specified.
Type : field_parameter::float
Required : False
Python Name : ligand_desolv_score_field
Ligand Desolv HB Score Field Output field for the ligand desolvation hydrogen bond score component of the docked molecule. This field will only be created on the output records if design units are supplied to this floe and this parameter is specified.
Type : field_parameter::float
Required : False
Python Name : ligand_desolv_hb_score_field
Hydrogen Bond Score Field Output field for the hydrogen bond score component of the docked molecule. This field will only be created on the output records if design units are supplied to this floe and this parameter is specified.
Type : field_parameter::float
Required : False
Python Name : hydrogen_bond_score_field
Consensus Pareto Dominance Rank Field Integer field on the output record holding the pareto dominance rank of the record.
Type : field_parameter::int
Required : True
Default : Pareto Rank
Python Name : consensus_pareto_dominance_rank_field
Highest 2D Tanimoto Output field for each molecule's highest 2D Tanimoto to any query. This Tanimoto value is used to generate the FastROCS and ROCS novelty output hit lists.
Type : field_parameter::float
Required : True
Default : 2D Tanimoto
Python Name : highest_2d_tanimoto
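The ‘Highest 2D Tanimoto’ value is the maximum fingerprint Tanimoto between a molecule and any of the queries. The sketch below assumes each fingerprint is represented as a Python set of on-bit indices. These sets are a stand-in for the Circular, Path, or Tree fingerprint selected by ‘Novelty Fingerprint Type’. The returned index identifies the most 2D-similar query, which is the query that the ‘Most 2D Similar Query’ fields describe. The function names are hypothetical.

```python
def fingerprint_tanimoto(fp_a, fp_b):
    """Tanimoto between two fingerprints given as sets of 'on' bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention assumed here for two empty fingerprints
    common = len(fp_a & fp_b)
    return common / (len(fp_a) + len(fp_b) - common)


def highest_2d_tanimoto(mol_fp, query_fps):
    """Return (best Tanimoto, index of the most 2D-similar query)."""
    scores = [fingerprint_tanimoto(mol_fp, q) for q in query_fps]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best


queries = [{1, 4, 9, 12}, {2, 3, 4, 5, 6}]
molecule = {2, 3, 4, 7}
print(highest_2d_tanimoto(molecule, queries))  # (0.5, 1)
```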
Most 2D Similar Query SMILES Field Output field holding the SMILES of the query molecule with the highest Tanimoto to the output molecule.
Type : field_parameter::string
Required : False
Default : Most 2D Similar Query SMILES
Python Name : most_2d_similar_query_smiles_field
Most 2D Similar Query Title Field Output field holding the Title of the query molecule with the highest Tanimoto to the output molecule.
Type : field_parameter::string
Required : False
Default : Most 2D Similar Query Title
Python Name : most_2d_similar_query_title_field
Fingerprint Type Field Output field holding the name of the type of fingerprint used in the 2D calculation. This field will only be added to the output if a value is entered for this parameter.