FastROCS Plus
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Product-based/FastROCS
Role-based/Computational Chemist
Solution-based/Virtual-screening/DB Search/FastROCS
Task-based/Virtual Screening - Ligand-Based
Description
This floe overlays a FastROCS collection onto up to 200 shape or molecule queries and outputs a single FastROCS hit list using the best result (i.e., highest similarity to any query) to rank molecules. This floe also optionally rescores the top FastROCS molecules with ROCS, Docking, and a consensus of both, creating a separate hit list for each.
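The "best result" ranking described above can be pictured with a small sketch. It is not the floe's implementation; the function name, the score dictionary layout, and the example values are all hypothetical. It only shows how each molecule is ranked by its highest similarity to any query.

```python
from typing import Dict, List, Tuple


def rank_by_best_query(
    scores: Dict[str, Dict[int, float]]
) -> List[Tuple[str, int, float]]:
    """Rank molecules by their highest similarity to any query.

    `scores` maps a molecule name to {query_id: similarity}. This layout is
    an assumption made for illustration only.
    """
    best = []
    for mol, per_query in scores.items():
        # Keep only the query that gives this molecule its highest score.
        query_id, score = max(per_query.items(), key=lambda kv: kv[1])
        best.append((mol, query_id, score))
    # Highest similarity first, as in a hit list.
    best.sort(key=lambda row: row[2], reverse=True)
    return best


# Hypothetical Tanimoto Combo values for three molecules and two queries.
example_scores = {
    "mol_a": {1: 1.20, 2: 0.95},
    "mol_b": {1: 0.80, 2: 1.45},
    "mol_c": {1: 1.10, 2: 1.05},
}
hit_list = rank_by_best_query(example_scores)
```

Each row keeps the ID of the winning query, which mirrors the role of the Best Query ID Field in the output records.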
Promoted Parameters
Title in user interface (promoted name)
Inputs
Input Query Dataset(s) (input_query_datasets): One or more datasets containing molecules and/or shape queries. Queries can come from multiple datasets or a single dataset with one or more queries. The 2D Sketcher can also be used to create a query, in which case a reasonable set of conformers of the sketched molecule will be generated and used as queries (see the ‘Query Conformer Generation Mode’ option parameter). The total number of queries is limited to 200. Compute costs scale roughly linearly with the number of queries plus 10; that is, cost is roughly proportional to <number of queries + 10>.
Required
Type: data_source
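As a rough illustration of the cost statement above, the sketch below computes a relative cost of <number of queries + 10>. The function name is hypothetical, the result is a unitless proportionality factor rather than a price, and the lower bound of one query is an assumption.

```python
MAX_QUERIES = 200  # limit stated for the total number of queries


def relative_cost(num_queries: int) -> int:
    """Return a unitless cost factor proportional to (num_queries + 10)."""
    # Assumption: at least one query is needed for a meaningful run.
    if not 1 <= num_queries <= MAX_QUERIES:
        raise ValueError(f"number of queries must be between 1 and {MAX_QUERIES}")
    return num_queries + 10


# Ratio of the cost factors for the largest and smallest query counts.
ratio_200_vs_1 = relative_cost(200) / relative_cost(1)
```

Under this approximation, going from 1 query to 200 queries raises the cost by a factor of 210/11, roughly 19, rather than 200.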
FastROCS Input Collection (fastrocs_input_collection): FastROCS collection to screen against. OpenEye supplies several curated vendor molecule collections in Organization Data. The Prepare Giga Collections or Giga Docking Collection to Hi-res FastROCS Collection Floes can also be used to create suitable collections for this floe.
Required
Type: collection_source
Design Unit(s) (Optional) (design_units_optional): If a design unit is supplied here, the top scoring molecules from FastROCS will be docked to the design unit and the results output in a separate hit list. Up to 10 design units can be supplied from one or more datasets. If multiple design units are supplied, a single docking hit list is still created using the score from the best design unit for each docked molecule (see the Dock Rescoring Mode parameter description).
Type: data_source
Shape Query(s) for ROCS Re-scoring (Optional) (shape_querys_for_rocs_re_scoring_optional): Optional dataset with one or more shape queries to be used for the ROCS rescoring IN PLACE of the queries passed to Input Query Dataset(s). This dataset only accepts shape queries, not molecule queries. This parameter allows the ROCS rescoring step to use shape queries that are not supported by FastROCS (e.g., shape queries with grids). If a dataset is supplied to this parameter, ROCS rescoring will automatically be turned on and the setting of the Options: Rescoring -> ROCS Rescoring Mode parameter will be ignored.
Type: data_source
Outputs
FastROCS Hit List Dataset (fastrocs_hit_list_dataset): Output dataset that will contain the top hits directly from FastROCS.
Required
Type: dataset_out
Default: FastROCS Hit List
FastROCS Novelty Hit List Dataset (fastrocs_novelty_hit_list_dataset): Molecules in this output dataset will be sorted by FastROCS score. The molecules in this hit list tend to have high 3D similarity and low 2D similarity to the query(s).
Required
Type: dataset_out
Default: FastROCS Novelty Hit List
ROCS Hit List Dataset (rocs_hit_list_dataset): Output hit list dataset from ROCS rescoring of the top FastROCS hits. This dataset will not be created if the ROCS Rescoring Mode parameter is set to Off.
Required
Type: dataset_out
Default: ROCS Hit List
ROCS Novelty Hit List Dataset (rocs_novelty_hit_list_dataset): Molecules in this output dataset will be sorted by ROCS score. The molecules in this hit list tend to have high 3D similarity and low 2D similarity to the query(s).
Required
Type: dataset_out
Default: ROCS Novelty Hit List
Dock Hit List Dataset (dock_hit_list_dataset): Output hit list dataset from docking the top FastROCS hits. This dataset will only be created if at least one design unit is supplied to the Design Unit(s) (Optional) parameter.
Required
Type: dataset_out
Default: Dock Hit List
Consensus ROCS Hit List Dataset (consensus_rocs_hit_list_dataset): Consensus output hit list ranked by ROCS Combo Tanimoto. This hit list will only be created if both ROCS and Dock rescoring are enabled (see the Design Unit(s) (Optional) and ROCS Rescoring Mode parameters).
Required
Type: dataset_out
Default: Consensus ROCS Hit List
Consensus Dock Hit List Dataset (consensus_dock_hit_list_dataset): Consensus output hit list ranked by docking score. This hit list will only be created if both ROCS and Dock rescoring are enabled (see the Design Unit(s) (Optional) and ROCS Rescoring Mode parameters).
Required
Type: dataset_out
Default: Consensus Dock Hit List
Output Query Dataset (output_query_dataset): This output dataset will contain a copy of the input queries (see the Input Query Dataset(s) parameter). In addition to the query, this dataset will include an integer ID field that also appears in the ROCS/FastROCS hit list records, so the query associated with the molecule can be identified. This is primarily useful when multiple queries are used. Note that if the Shape Query(s) for ROCS Re-scoring (Optional) parameter is specified, this dataset will contain the queries used for FastROCS but not those used for ROCS rescoring. See the Outputs -> Output ROCS Re-score Shape Query Dataset parameter in that case.
Required
Type: dataset_out
Default: FastROCS Queries
Output ROCS Re-score Shape Query Dataset (output_rocs_re_score_shape_query_dataset): This output dataset will contain a copy of the input shape query(s) passed to Shape Query(s) for ROCS Rescoring (Optional). If Shape Query(s) for ROCS Rescoring (Optional) is not specified, this dataset will not be created because the query(s) for ROCS rescoring will be the same as those for FastROCS (see the Outputs -> Output Query Dataset parameter). In addition to the query, this dataset will include an integer ID field that also appears in the ROCS hit list records, so the query associated with the molecule can be identified. This is primarily useful when multiple shape queries are used.
Required
Type: dataset_out
Default: ROCS Re-score Shape Queries
Design Unit Dataset (design_unit_dataset): Output dataset with a copy of the input design units (see the Design Unit(s) (Optional) parameter). In addition to the design unit, this dataset will include an integer ID field. This ID will also appear in the dock hit list records so the design unit associated with the molecule can be identified. This is primarily useful when multiple design units are used. This dataset will only be created if design units are sent to this floe.
Required
Type: dataset_out
Default: Design Units
Raw Results Collection (Optional) (raw_results_collection_optional): The name of an output collection that will contain a number of molecules approximately equal to the setting of the Options: Advanced -> Number of Molecules to Rescore parameter. If either ROCS or Docking rescoring is turned on (ROCS is on by default), this collection will contain the entire set of top scoring FastROCS molecules that were rescored. If both ROCS and Docking rescoring are turned off, the collection will contain the top scoring FastROCS molecules. The individual shards of the collection are in .oedb format, which can be read with the toolkit's OEReadMolRecords function if downloaded locally. If this parameter is not specified, this output collection will not be created.
Type: collection_sink
Temporary Collection (temporary_collection): This collection is created by the floe for internal use during the run and is automatically deleted by the floe when it finishes.
Required
Type: collection_sink
Default: FastROCS Temporary Collection
Options
Hit List Size (hit_list_size): Size of all output hit lists. Maximum allowed value is 100,000. Minimum allowed value is 1,000.
Required
Type: integer
Default: 10000
FastROCS Similarity Type (fastrocs_similarity_type): Type of FastROCS similarity to use to rank molecules sent to the FastROCS, ROCS, and consensus ROCS hit lists. This method will also be used by ROCS rescoring if it is enabled (ROCS rescoring is enabled by default) and if shape queries explicitly for ROCS rescoring have not been set using the Shape Query(s) for ROCS Rescoring parameter.
Type: string
Default: Tanimoto Combo
Choices: [‘Tanimoto Combo’, ‘Ref Tversky’, ‘Fit Tversky’, ‘Shape Tanimoto’, ‘Shape Ref Tversky’, ‘Shape Fit Tversky’]
Options: Query
Query Conformer Generation Mode (query_conformer_generation_mode): Method used to generate conformer(s) of the molecule queries (shape queries are always accepted as is).
‘input’: Uses the conformer of the molecule query as is (molecule queries without coordinates will fail in this mode).
‘omega’: Generates query molecule conformations with omega.
‘dock’: Uses the best pose of the query molecule docked to the design unit(s). This mode requires that design units be supplied to the ‘Design Unit(s) (Optional)’ parameter.
‘auto’: Molecule queries with 3D coordinates will be used as is. Molecule queries without 3D coordinates will use ‘dock’ mode if design unit(s) are supplied to the floe and ‘omega’ mode otherwise.
Type: string
Default: auto
Choices: [‘input’, ‘omega’, ‘dock’, ‘auto’]
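The mode descriptions above amount to a small decision rule, sketched below. The function and argument names are hypothetical and the floe's internal logic may differ; the sketch only restates which effective method applies to a molecule query under each setting.

```python
def resolve_query_mode(mode: str, has_3d: bool, has_design_units: bool) -> str:
    """Return the effective conformer source for one molecule query.

    Hypothetical helper that mirrors the documented behavior of the
    'input', 'omega', 'dock', and 'auto' settings.
    """
    if mode == "input":
        if not has_3d:
            raise ValueError("'input' mode fails for queries without coordinates")
        return "input"
    if mode == "omega":
        return "omega"
    if mode == "dock":
        if not has_design_units:
            raise ValueError("'dock' mode requires design units")
        return "dock"
    if mode == "auto":
        if has_3d:
            return "input"  # 3D queries are used as is
        return "dock" if has_design_units else "omega"
    raise ValueError(f"unknown mode: {mode!r}")


# A sketched (2D) query with and without design units, in the default mode.
sketched_with_du = resolve_query_mode("auto", has_3d=False, has_design_units=True)
sketched_without_du = resolve_query_mode("auto", has_3d=False, has_design_units=False)
```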
Multi Conformer Mol Query Mode (multi_conformer_mol_query_mode): Controls how query molecules with multiple conformers are handled.
‘fail’: Records containing a molecule with multiple conformers will fail.
‘active’: The active conformer of the molecule (this is typically the first conformer and often the lowest in energy) will be used.
‘first10’: The first 10 conformers of the molecule will be used.
‘limit’: All conformers of the molecule will be used, up to this cube's limit for total queries.
‘all’: All conformers of the molecule will be used as queries.
WARNING: using ‘all’ or ‘limit’ can significantly increase the cost of the floe.
Type: string
Default: first10
Choices: [‘fail’, ‘active’, ‘first10’, ‘limit’, ‘all’]
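The sketch below shows one way to picture how each setting selects conformers from a multi-conformer query. The helper is hypothetical. In particular, the value of the cube's limit for ‘limit’ mode is not stated here, so the default of 200 (the floe's overall query limit) is an assumption, as is treating the first conformer as the active one.

```python
from typing import List, Sequence


def select_query_conformers(
    conformers: Sequence[str],
    mode: str = "first10",
    query_limit: int = 200,  # assumption: the cube's limit for total queries
    active_index: int = 0,   # assumption: the active conformer is the first
) -> List[str]:
    """Return the conformers that would be used as queries for a given mode."""
    if mode == "fail":
        if len(conformers) > 1:
            raise ValueError("record has a molecule with multiple conformers")
        return list(conformers)
    if mode == "active":
        return [conformers[active_index]]
    if mode == "first10":
        return list(conformers[:10])
    if mode == "limit":
        return list(conformers[:query_limit])
    if mode == "all":
        return list(conformers)
    raise ValueError(f"unknown mode: {mode!r}")


# A hypothetical query molecule with 25 conformers.
example_conformers = [f"conf_{i}" for i in range(25)]
default_selection = select_query_conformers(example_conformers)
```

Because each selected conformer counts as a query, the number returned here feeds directly into the <number of queries + 10> cost relationship described for Input Query Dataset(s).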
Options: FastROCS
Number of FastROCS Random Starts (number_of_fastrocs_random_starts): If specified, FastROCS will use the specified number of random starting orientations for each conformer being overlaid with FastROCS. If unspecified, the default 4 inertial starts will be used. Compute time (i.e., cost) scales roughly linearly with the number of starts.
Type: integer
Shape Only FastROCS Overlay (shape_only_fastrocs_overlay): If set to On, FastROCS will overlay molecules using shape only and ignoring color. If set to Off, FastROCS will overlay molecules using shape and color. Note that this parameter affects the overlay process, but not the scoring. For example, the overlay can be done with shape while the scoring is done with shape and color.
Type: boolean
Default: False
Choices: [True, False]
Options: Re-scoring
Number of Molecules to Re-score (number_of_molecules_to_re_score): The number of top scoring molecules from FastROCS that will be sent to any of the enabled postprocessing methods (ROCS and/or Docking). Note that the output hit lists will still be of the size specified by the Hit List Size parameter, which is generally smaller than this number. Maximum allowed value is 100,000,000. Minimum allowed value is 100,000.
Type: integer
Default: 100000
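The documented ranges for Hit List Size and Number of Molecules to Re-score can be checked before a job is launched. The sketch below is a hypothetical pre-flight check, not part of the floe; the function name and the returned dictionary are illustrative, and only the numeric limits and defaults come from the parameter descriptions.

```python
HIT_LIST_SIZE_RANGE = (1_000, 100_000)
NUM_TO_RESCORE_RANGE = (100_000, 100_000_000)


def check_sizes(hit_list_size: int = 10_000, num_to_rescore: int = 100_000) -> dict:
    """Validate both size parameters against their documented ranges."""
    checks = (
        ("hit_list_size", hit_list_size, HIT_LIST_SIZE_RANGE),
        ("number_of_molecules_to_re_score", num_to_rescore, NUM_TO_RESCORE_RANGE),
    )
    for name, value, (low, high) in checks:
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return {
        "hit_list_size": hit_list_size,
        "number_of_molecules_to_re_score": num_to_rescore,
        # How many molecules are rescored for each slot in a hit list.
        "rescored_per_hit": num_to_rescore / hit_list_size,
    }


default_sizes = check_sizes()
```

With the defaults, ten molecules are rescored for every molecule that can appear in a hit list.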
ROCS Re-scoring Mode (rocs_re_scoring_mode): Type of ROCS rescoring to do on the top scoring molecules from FastROCS if ‘Shape Query(s) for ROCS Rescoring (Optional)’ is not specified. If it is specified, this parameter is ignored and ROCS rescoring will be done with the shape query(s).
‘Off’: Turns off ROCS rescoring of the top FastROCS molecules.
‘Best FastROCS Query’: Overlays each molecule onto the query that FastROCS selected as the best query for that molecule.
‘All Queries’: Overlays molecules onto all queries and outputs the best overlay.
WARNING: ‘All Queries’ mode can result in significant compute costs if there are many queries and molecules to rescore.
Type: string
Default: Best FastROCS Query
Choices: [‘Off’, ‘Best FastROCS Query’, ‘All Queries’]
Number of ROCS Re-scoring Random Starts (number_of_rocs_re_scoring_random_starts): If specified, ROCS rescoring will use the specified number of random starting orientations for each conformer being overlaid. If unspecified, the default 4 inertial starts will be used. Compute time scales roughly linearly with the number of starts.
Type: integer
ROCS Re-scoring Shape Query Similarity Type (rocs_re_scoring_shape_query_similarity_type): Similarity type to use in the ROCS Rescoring step when shape queries for ROCS rescoring have been set with the Inputs -> Shape Query(s) for ROCS Rescoring (Optional) parameter. This parameter is ignored if Inputs -> Shape Query(s) for ROCS Rescoring (Optional) is not set.
Type: string
Default: Tanimoto Combo
Choices: [‘Tanimoto Combo’, ‘Ref Tversky’, ‘Fit Tversky’, ‘Shape Tanimoto’, ‘Shape Ref Tversky’, ‘Shape Fit Tversky’]
Dock Re-scoring Mode (dock_re_scoring_mode): Docking method to use to dock the top scoring molecules from FastROCS when design units are supplied to the floe (see the Design Unit(s) (Optional) parameter). ‘Fred’ is the default structure-based scoring method. ‘Hybrid’ biases the docking toward poses that overlay the crystallographic ligand (design units must have a bound ligand). ‘FastFred’ is a faster variant of ‘Fred’ (typically ~2x faster) that samples less and uses a simpler scoring function in the initial stages of docking. This option also determines how the best design unit is selected when multiple design units are supplied to the floe. For ‘Fred’ and ‘FastFred’ modes, the best design unit is the design unit with the best docking score, and for ‘Hybrid’ mode, the best design unit is the design unit with the most similar bound ligand. With multiple design units, ‘Hybrid’ is much more computationally efficient because each molecule is only docked once to the design unit with the most similar bound ligand, while ‘Fred’ and ‘FastFred’ modes dock each molecule to all the design units to determine which one gives the lowest score.
Type: string
Default: Fred
Choices: [‘Fred’, ‘Hybrid’, ‘Fast Fred’]
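The design unit selection rule described above can be summarized in a short sketch. The helper names, dictionary layouts, and example numbers are hypothetical; the sketch assumes that a lower docking score is better (as the description implies) and that a ligand similarity value is available per design unit in ‘Hybrid’ mode.

```python
from typing import Dict


def _normalize(mode: str) -> str:
    # Accept both 'FastFred' and 'Fast Fred' spellings.
    return mode.replace(" ", "").lower()


def pick_best_design_unit(
    mode: str,
    dock_scores: Dict[int, float],
    ligand_similarity: Dict[int, float],
) -> int:
    """Return the ID of the best design unit for one molecule."""
    key = _normalize(mode)
    if key in ("fred", "fastfred"):
        # Best design unit is the one giving the lowest docking score.
        return min(dock_scores, key=dock_scores.get)
    if key == "hybrid":
        # Best design unit is the one with the most similar bound ligand.
        return max(ligand_similarity, key=ligand_similarity.get)
    raise ValueError(f"unknown docking mode: {mode!r}")


def dockings_per_molecule(mode: str, num_design_units: int) -> int:
    """Hybrid docks once; Fred and FastFred dock to every design unit."""
    return 1 if _normalize(mode) == "hybrid" else num_design_units


# Hypothetical values for one molecule and three design units.
example_dock_scores = {1: -9.5, 2: -11.2, 3: -8.1}
example_ligand_similarity = {1: 0.62, 2: 0.40, 3: 0.75}
best_fred = pick_best_design_unit("Fred", example_dock_scores, example_ligand_similarity)
best_hybrid = pick_best_design_unit("Hybrid", example_dock_scores, example_ligand_similarity)
```

The second helper makes the efficiency difference explicit: with 10 design units, ‘Fred’ and ‘FastFred’ perform 10 dockings per molecule while ‘Hybrid’ performs one.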
Consensus Max Pareto Rank (consensus_max_pareto_rank): When rescoring with both docking and ROCS, each molecule will get a pareto dominance rank based on the docking score and ROCS similarity. Molecules with a pareto dominance rank higher than this number will be filtered out of the consensus hit lists. Minimum allowed value is 0. Maximum allowed value is 10.
Type: integer
Default: 4
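To make the idea of a pareto dominance rank concrete, the sketch below ranks molecules by repeatedly peeling off the non-dominated front, with rank 0 for the first front. This is one common definition and is an assumption here; the floe's exact ranking procedure is not described in this section. The function names and example values are hypothetical, and a lower docking score and a higher ROCS similarity are both treated as better.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (docking score, ROCS similarity)


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on both objectives and better on one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    better = a[0] < b[0] or a[1] > b[1]
    return no_worse and better


def pareto_ranks(points: Sequence[Point]) -> List[Optional[int]]:
    """Assign rank 0 to the non-dominated front, rank 1 to the next, and so on."""
    ranks: List[Optional[int]] = [None] * len(points)
    remaining = set(range(len(points)))
    rank = 0
    while remaining:
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# Hypothetical (docking score, similarity) pairs for five molecules.
example_points = [(-12.0, 1.4), (-10.0, 1.6), (-9.0, 1.2), (-11.0, 1.3), (-8.0, 1.0)]
example_ranks = pareto_ranks(example_points)

# Keep only molecules whose rank does not exceed a chosen maximum.
max_rank = 1
kept = [i for i, r in enumerate(example_ranks) if r <= max_rank]
```

In this example the first two molecules are mutually non-dominated, since one has the better docking score and the other the better similarity, so both receive rank 0.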
Options: Novelty
Novelty Fingerprint Type (novelty_fingerprint_type): Type of fingerprint used to identify molecules that are dissimilar to the query molecule in 2D space.
Type: string
Default: Circular
Choices: [‘Circular’, ‘Path’, ‘Tree’]
Novelty Pareto Max Rank (novelty_pareto_max_rank): In the pareto consensus for novelty, this is the maximum pareto dominance rank that will be allowed. Setting this to higher values will cause more molecules to appear in the FastROCS and ROCS novelty hit lists. Max value is 10.
Type: integer
Default: 10
GPU Hardware
FastROCS Instance Type (fastrocs_instance_type): The instances excluded by default are known not to be cost-effective for FastROCS.
Type: string
Default: !cdns,!g4dn.metal,!g5.12xlarge,!g5.24xlarge,!g5.48xlarge,!g4dn.12xlarge,!g3s.,!p3.
Spot Instance Policy for FastROCS GPU Instance (spot_instance_policy_for_fastrocs_gpu_instance): To run on SPOT instances, use the default setting of Preferred. To run on ON-DEMAND instances, set the value to Prohibited. ON-DEMAND instances typically cost 3–4 times more than SPOT instances, but are more available than SPOT instances when overall demand for GPUs on AWS is high.
Type: string
Default: Preferred
Choices: [‘Allowed’, ‘Preferred’, ‘NotPreferred’, ‘Prohibited’, ‘Required’]
Input Fields
Input Query Mol Field (input_query_mol_field): Field on the input query dataset with the query molecules. If unspecified, the primary (default) molecule field will be used.
Type: field_parameter::mol
Input Shape Query Field (input_shape_query_field): Field on the input query dataset with the shape query to search against. If unspecified, each record will be searched for a single shape query.
Type: field_parameter
Query Design Unit Field (query_design_unit_field): Field on the query dataset passed to Input Query Dataset(s) holding a design unit with a ligand to use as the query. If unspecified, the floe will use whatever design unit it can find on any field of each record, provided the record does not have multiple fields with design units.
Type: field_parameter
Dock Design Unit Field (dock_design_unit_field): Field on the datasets passed to Design Unit(s) (Optional) that contains the design unit(s) to dock to. If unspecified, the floe will use whatever design unit it can find on any field of each record, provided the record does not have multiple fields with design units.
Type: field_parameter
Dock Receptor Field (dock_receptor_field): Field on the datasets passed to Design Unit(s) (Optional) that contains an old style receptor molecule to dock to. If unspecified, the primary molecule field will be used.
Type: field_parameter::mol
Output Fields
Overlay Molecule Field (overlay_molecule_field): Field on the output records that will hold the structure of the molecule overlaid by ROCS or FastROCS.
Type: field_parameter::mol
Default: Overlay Molecule
Tanimoto Combo Field (tanimoto_combo_field): Output field with the Tanimoto Combo. This field will only be created if the FastROCS Similarity Type is Tanimoto Combo. The value in this field is a duplicate of the value in Combo Similarity.
Type: field_parameter::float
Default: Tanimoto Combo
Tanimoto Color Field (tanimoto_color_field): Output field with the Color Tanimoto. This field will only be created if the FastROCS Similarity Type is Tanimoto Combo. The value in this field is a duplicate of the value in Color Similarity.
Type: field_parameter::float
Default: Color Tanimoto
Tanimoto Shape Field (tanimoto_shape_field): Output field with the Shape Tanimoto. This field will only be created if the FastROCS Similarity Type is Tanimoto Combo. The value in this field is a duplicate of the value in Shape Similarity.
Type: field_parameter::float
Default: Shape Tanimoto
Tversky Combo Field (tversky_combo_field): Output field with the Tversky Combo. This field will only be created if the FastROCS Similarity Type is Fit Tversky or Ref Tversky. The value in this field is a duplicate of the value in Combo Similarity.
Type: field_parameter::float
Default: Tversky Combo
Tversky Color Field (tversky_color_field): Output field with the Color Tversky. This field will only be created if the FastROCS Similarity Type is Fit Tversky or Ref Tversky. The value in this field is a duplicate of the value in Color Similarity.
Type: field_parameter::float
Default: Color Tversky
Tversky Shape Field (tversky_shape_field): Output field with the Shape Tversky. This field will only be created if the FastROCS Similarity Type is Fit Tversky or Ref Tversky. The value in this field is a duplicate of the value in Shape Similarity.
Type: field_parameter::float
Default: Shape Tversky
Best Query Field (best_query_field): Output field for the query with the highest similarity to the hit molecule.
Type: field_parameter::mol
Default: Query
Best Query ID Field (best_query_id_field): Output field for the ID of the query with the highest similarity to the molecule. This identifier will also appear in the dataset specified by the Output Query Dataset parameter.
Type: field_parameter::int
Default: Query ID
Best Query Link Field (best_query_link_field): Output field for a link to the query with the highest similarity to the molecule. The link will point to the query in the dataset specified by the Output Query Dataset parameter.
Type: field_parameter::link
Default: Query Link
Bemis Murcko Field (bemis_murcko_field): Output field for the Bemis Murcko core SMILES.
Type: field_parameter::string
Default: Bemis Murcko
Bemis Murcko ID Field (bemis_murcko_id_field): Output field with an integer ID of the Bemis Murcko core. All molecules with the same Bemis Murcko core SMILES will have the same ID, and those with different Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Bemis Murcko core is seen. Thus, unlike the Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.
Type: field_parameter::int
Default: Bemis Murcko ID
Bemis Murcko Rank Field (bemis_murcko_rank_field): Integer field with the rank of the molecule within its Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Bemis Murcko core SMILES).
Type: field_parameter::int
Default: Bemis Murcko Rank
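The relationship between the Bemis Murcko core SMILES, ID, and rank fields can be sketched as follows. The function is hypothetical and takes precomputed core SMILES strings in hit list order (computing the cores themselves needs a cheminformatics toolkit and is out of scope here). Starting the within-family rank at 1 is an assumption.

```python
from typing import Dict, List, Sequence, Tuple


def annotate_cores(cores_in_rank_order: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Return (core SMILES, core ID, rank within core family) for each hit."""
    core_ids: Dict[str, int] = {}
    family_counts: Dict[str, int] = {}
    annotated = []
    for core in cores_in_rank_order:
        # IDs start at 1 and grow by 1 each time a new core is first seen,
        # so they depend on the order in which the records arrive.
        core_id = core_ids.setdefault(core, len(core_ids) + 1)
        # Rank among hits sharing this core (assumed to start at 1).
        family_counts[core] = family_counts.get(core, 0) + 1
        annotated.append((core, core_id, family_counts[core]))
    return annotated


# Hypothetical core SMILES for five hits, best hit first.
example_cores = ["c1ccccc1", "c1ccncc1", "c1ccccc1", "C1CCCCC1", "c1ccncc1"]
annotated_hits = annotate_cores(example_cores)
```

Filtering on a within-family rank of 1 keeps the best hit for each core, which is one way to get a scaffold-diverse view of a hit list. The same bookkeeping would apply to the Hetero Bemis Murcko fields.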
Hetero Bemis Murcko Field (hetero_bemis_murcko_field): Output field for the Hetero Bemis Murcko core SMILES.
Type: field_parameter::string
Default: Hetero Bemis Murcko
Hetero Bemis Murcko ID Field (hetero_bemis_murcko_id_field): Output field with an integer ID of the Hetero Bemis Murcko core. All molecules with the same Hetero Bemis Murcko core SMILES will have the same ID, and those with different Hetero Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Hetero Bemis Murcko core is seen. Thus, unlike the Hetero Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.
Type: field_parameter::int
Default: Hetero Bemis Murcko ID
Hetero Bemis Murcko Rank Field (hetero_bemis_murcko_rank_field): Integer field with the rank of the molecule within its Hetero Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Hetero Bemis Murcko core SMILES).
Type: field_parameter::int
Default: Hetero Bemis Murcko Rank
Docked Molecule Field (docked_molecule_field): Output field for the docked molecule. This field will only be created on the output records if design units are supplied to this floe.
Type: field_parameter::mol
Default: Docked Molecule
Docked Score Field (docked_score_field): Output field for the score of the docked molecule. This field will only be created on the output records if design units are supplied to this floe.
Required
Type: field_parameter::float
Default: Chemgauss4
Design Unit ID Field (design_unit_id_field): Output field for the ID of the design unit the molecule scores best in. This field will only be created on the output records if design units are supplied to this floe.
Type: field_parameter::int
Default: Design Unit ID
Design Unit Link Field (design_unit_link_field): Output field for a link to the design unit the molecule scores best in. This field will only be created on the output records if design units are supplied to this floe.
Type: field_parameter::link
Default: Design Unit Link
Steric Score Field (steric_score_field): Output field for the steric score component of the docked molecule. This field will only be created on the output records if design units are supplied to this floe and this parameter is specified.
Type: field_parameter::float
Clash Score Field (clash_score_field): Output field for the clash score component of the docked molecule. This field will only be created on the output records if design units are supplied to this floe and this parameter is specified.
Type: field_parameter::float
Protein Desolv Score Field (protein_desolv_score_field): Output field for the protein desolvation score component of the docked molecule. This field will only be created on the output records if design units are supplied to this floe and this parameter is specified.
Type: field_parameter::float
Ligand Desolv Score Field (ligand_desolv_score_field): Output field for the ligand desolvation score component of the docked molecule. This field will only be created on the output records if design units are supplied to this floe and this parameter is specified.
Type: field_parameter::float
Ligand Desolv HB Score Field (ligand_desolv_hb_score_field): Output field for the ligand desolvation hydrogen bond score component of the docked molecule. This field will only be created on the output records if design units are supplied to this floe and this parameter is specified.
Type: field_parameter::float
Hydrogen Bond Score Field (hydrogen_bond_score_field): Output field for the hydrogen bond score component of the docked molecule. This field will only be created on the output records if design units are supplied to this floe and this parameter is specified.
Type: field_parameter::float
Consensus Pareto Dominance Rank Field (consensus_pareto_dominance_rank_field): Integer field on the output record holding the pareto dominance rank of the record.
Required
Type: field_parameter::int
Default: Pareto Rank
Highest 2D Tanimoto (highest_2d_tanimoto): Output field for the highest 2D Tanimoto of each molecule to any query. This Tanimoto value is used to generate the FastROCS and ROCS novelty output hit lists.
Required
Type: field_parameter::float
Default: 2D Tanimoto
Most 2D Similar Query SMILES Field (most_2d_similar_query_smiles_field): Output field holding the SMILES of the query molecule with the highest Tanimoto to the output molecule.
Type: field_parameter::string
Default: Most 2D Similar Query SMILES
Most 2D Similar Query Title Field (most_2d_similar_query_title_field): Output field holding the title of the query molecule with the highest Tanimoto to the output molecule.
Type: field_parameter::string
Default: Most 2D Similar Query Title
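The three 2D similarity fields above can be illustrated with a toy calculation. The sketch represents each fingerprint as a set of on-bit indices and uses the standard Tanimoto coefficient (shared bits divided by total distinct bits). The bit sets, query titles, and function names are hypothetical; real Circular, Path, or Tree fingerprints would come from a cheminformatics toolkit.

```python
from typing import Dict, FrozenSet, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def most_similar_query(
    mol_fp: FrozenSet[int], query_fps: Dict[str, FrozenSet[int]]
) -> Tuple[str, float]:
    """Return (query title, highest 2D Tanimoto) over all queries."""
    title = max(query_fps, key=lambda t: tanimoto(mol_fp, query_fps[t]))
    return title, tanimoto(mol_fp, query_fps[title])


# Toy fingerprints: on-bit indices only.
example_mol_fp = frozenset({1, 2, 3, 4})
example_query_fps = {
    "query_1": frozenset({1, 2, 5}),        # 2 shared bits out of 5 distinct
    "query_2": frozenset({1, 2, 3, 4, 6}),  # 4 shared bits out of 5 distinct
}
best_title, best_2d_tanimoto = most_similar_query(example_mol_fp, example_query_fps)
```

A molecule with a low value here but a high 3D similarity is the kind of hit the novelty hit lists are meant to surface.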
Fingerprint Type Field (fingerprint_type_field): Output field holding the name of the type of fingerprint used in the 2D calculation. This field will only be added to the output if a value is entered for this parameter.
Type: field_parameter::string