FastROCS Plus

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Product-based/FastROCS

  • Role-based/Computational Chemist

  • Solution-based/Virtual-screening/DB Search/FastROCS

  • Task-based/Virtual Screening - Ligand-Based

Description

This floe overlays a FastROCS collection onto up to 200 shape or molecule queries and outputs a single FastROCS hit list using the best result (i.e., highest similarity to any query) to rank molecules. This floe also optionally rescores the top FastROCS molecules with ROCS, Docking, and a consensus of both, creating a separate hit list for each.
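
The ranking rule described above (each molecule is ranked by its highest similarity to any query) can be illustrated with a minimal sketch. The function name and the input layout (a dictionary mapping each molecule to its per-query scores) are hypothetical and are not part of the floe's interface.

```python
def rank_by_best_query(scores):
    """Rank molecules by their best (highest) similarity to any query.

    scores: hypothetical dict {molecule_name: {query_id: similarity}}.
    Returns a list of (molecule_name, best_query_id, best_similarity),
    sorted from most to least similar.
    """
    best = []
    for mol, per_query in scores.items():
        # Pick the query that gives this molecule its highest similarity.
        query_id, sim = max(per_query.items(), key=lambda kv: kv[1])
        best.append((mol, query_id, sim))
    # Highest similarity first.
    return sorted(best, key=lambda row: row[2], reverse=True)


# Illustrative scores only; these are not real FastROCS results.
example = {
    "mol_a": {1: 1.20, 2: 0.95},
    "mol_b": {1: 0.80, 2: 1.45},
    "mol_c": {1: 1.10, 2: 1.05},
}
hit_list = rank_by_best_query(example)
```

In this sketch, the query ID kept for each molecule plays the role of the integer ID that the floe writes to the hit list records and to the Output Query Dataset.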

Promoted Parameters

Title in user interface (promoted name)

Inputs

Input Query Dataset(s) (input_query_datasets): One or more datasets containing molecules and/or shape queries. Queries can come from multiple datasets or a single dataset with one or more queries. The 2D Sketcher can also be used to create a query, in which case a reasonable set of conformers of the sketched molecule will be generated and used as queries (see the ‘Query Conformer Generation Mode’ option parameter). The total number of queries is limited to 200. Compute costs scale roughly linearly with the number of queries plus 10; that is, cost is roughly proportional to (number of queries + 10).

  • Required

  • Type: data_source
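
The cost estimate given for Input Query Dataset(s) can be turned into a quick comparison between runs. The following is a minimal sketch that only encodes the stated proportionality to (number of queries + 10); the function name and the choice of a one-query run as the baseline are assumptions made for illustration, and actual costs will vary.

```python
def relative_cost(num_queries, baseline_queries=1):
    """Approximate cost of a run relative to a run with baseline_queries.

    Uses the rule of thumb that cost is roughly proportional to
    (number of queries + 10). This is an estimate, not a billing formula.
    """
    for n in (num_queries, baseline_queries):
        if not 1 <= n <= 200:
            raise ValueError("the number of queries must be between 1 and 200")
    return (num_queries + 10) / (baseline_queries + 10)


ratio_10 = relative_cost(10)    # (10 + 10) / (1 + 10), about 1.82
ratio_200 = relative_cost(200)  # (200 + 10) / (1 + 10), about 19.09
```

Under this rule of thumb, going from 1 query to 200 queries raises the estimated cost by a factor of about 19 rather than 200, because of the constant offset of 10.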

FastROCS Input Collection (fastrocs_input_collection): FastROCS collection to screen against. OpenEye supplies several curated vendor molecule collections in Organization Data. The Prepare Giga Collections or Giga Docking Collection to Hi-res FastROCS Collection Floes can also be used to create suitable collections for this floe.

  • Required

  • Type: collection_source

Design Unit(s) (Optional) (design_units_optional): If a design unit is supplied here, the top scoring molecules from FastROCS will be docked to the design unit and the results output in a separate hit list. Up to 10 design units can be supplied from one or more datasets. If multiple design units are supplied, a single docking hit list is still created using the score from the best design unit for each docked molecule (see the Dock Rescoring Mode parameter description).

  • Type: data_source

Shape Query(s) for ROCS Re-scoring (Optional) (shape_querys_for_rocs_re_scoring_optional): Optional dataset with one or more shape queries to be used for the ROCS rescoring IN PLACE of the queries passed to Input Query Dataset(s). This dataset only accepts shape queries, not molecule queries. This parameter allows the ROCS rescoring step to use shape queries that are not supported by FastROCS (e.g., shape queries with grids). If a dataset is supplied to this parameter, ROCS rescoring will automatically be turned on and the setting of the Options: Rescoring -> ROCS Rescoring Mode parameter will be ignored.

  • Type: data_source

Outputs

FastROCS Hit List Dataset (fastrocs_hit_list_dataset): Output dataset that will contain the top hits directly from FastROCS.

  • Required

  • Type: dataset_out

  • Default: FastROCS Hit List

FastROCS Novelty Hit List Dataset (fastrocs_novelty_hit_list_dataset): Molecules in this output dataset will be sorted by FastROCS score. This hit list contains molecules that tend to have high 3D similarity and low 2D similarity to the query(s).

  • Required

  • Type: dataset_out

  • Default: FastROCS Novelty Hit List

ROCS Hit List Dataset (rocs_hit_list_dataset): Output hit list dataset from ROCS rescoring of the top FastROCS hits. This dataset will not be created if the ROCS Rescoring Mode parameter is set to Off.

  • Required

  • Type: dataset_out

  • Default: ROCS Hit List

ROCS Novelty Hit List Dataset (rocs_novelty_hit_list_dataset): Molecules in this output dataset will be sorted by ROCS score. This hit list contains molecules that tend to have high 3D similarity and low 2D similarity to the query(s).

  • Required

  • Type: dataset_out

  • Default: ROCS Novelty Hit List

Dock Hit List Dataset (dock_hit_list_dataset): Output hit list dataset from docking the top FastROCS hits. This dataset will only be created if at least one design unit is supplied to the Design Unit(s) (Optional) parameter.

  • Required

  • Type: dataset_out

  • Default: Dock Hit List

Consensus ROCS Hit List Dataset (consensus_rocs_hit_list_dataset): Consensus output hit list ranked by ROCS Combo Tanimoto. This hit list will only be created if both ROCS and Dock rescoring are enabled (see the Design Unit(s) (Optional) and ROCS Rescoring Mode parameters).

  • Required

  • Type: dataset_out

  • Default: Consensus ROCS Hit List

Consensus Dock Hit List Dataset (consensus_dock_hit_list_dataset): Consensus output hit list ranked by docking score. This hit list will only be created if both ROCS and Dock rescoring are enabled (see the Design Unit(s) (Optional) and ROCS Rescoring Mode parameters).

  • Required

  • Type: dataset_out

  • Default: Consensus Dock Hit List

Output Query Dataset (output_query_dataset): This output dataset will contain a copy of the input queries (see the Input Query Dataset(s) parameter). In addition to the query, this dataset will include an integer ID field that also appears in the ROCS/FastROCS hit list records, so the query associated with the molecule can be identified. This is primarily useful when multiple queries are used. Note that if Shape Query(s) for ROCS Re-scoring (Optional) is specified, this dataset will contain the queries used for FastROCS but not those used for ROCS rescoring. See the Outputs -> Output ROCS Re-score Shape Query Dataset parameter in that case.

  • Required

  • Type: dataset_out

  • Default: FastROCS Queries

Output ROCS Re-score Shape Query Dataset (output_rocs_re_score_shape_query_dataset): This output dataset will contain a copy of the input shape query(s) supplied to Shape Query(s) for ROCS Rescoring (Optional). If Shape Query(s) for ROCS Rescoring (Optional) is not specified, this dataset will not be created because the query(s) for ROCS rescoring will be the same as those for FastROCS (see the Outputs -> Output Query Dataset parameter). In addition to the query, this dataset will include an integer ID field that also appears in the ROCS hit list records, so the query associated with the molecule can be identified. This is primarily useful when multiple shape queries are used.

  • Required

  • Type: dataset_out

  • Default: ROCS Re-score Shape Queries

Design Unit Dataset (design_unit_dataset): Output dataset with a copy of the input design units (see the Design Unit(s) (Optional) parameter). In addition to the design unit, this dataset will include an integer ID field. This ID will also appear in the dock hit list records so the design unit associated with the molecule can be identified. This is primarily useful when multiple design units are used. This dataset will only be created if design units are sent to this floe.

  • Required

  • Type: dataset_out

  • Default: Design Units

Raw Results Collection (Optional) (raw_results_collection_optional): The name of an output collection that will contain a number of molecules approximately equal to the setting of the Options: Advanced -> Number of Molecules to Rescore parameter. If either ROCS or Docking rescoring is turned on (ROCS is on by default), this collection will contain the entire set of top scoring FastROCS molecules that were rescored. If both ROCS and Docking rescoring are turned off, the collection will contain the top scoring FastROCS molecules. The individual shards of the collection are in .oedb format, which can be read with the toolkit's OEReadMolRecords function if downloaded locally. If this parameter is not specified, this output collection will not be created.

  • Type: collection_sink

Temporary Collection (temporary_collection): This collection is created by the floe for internal use during the run and is automatically deleted by the floe when it finishes.

  • Required

  • Type: collection_sink

  • Default: FastROCS Temporary Collection

Options

Hit List Size (hit_list_size): Size of all output hit lists. Maximum allowed value is 100,000. Minimum allowed value is 1,000.

  • Required

  • Type: integer

  • Default: 10000

FastROCS Similarity Type (fastrocs_similarity_type): Type of FastROCS similarity to use to rank molecules sent to the FastROCS, ROCS, and consensus ROCS hit lists. This method will also be used by ROCS rescoring if it is enabled (ROCS rescoring is enabled by default) and if shape queries explicitly for ROCS rescoring have not been set using the Shape Query(s) for ROCS Rescoring parameter.

  • Type: string

  • Default: Tanimoto Combo

  • Choices: [‘Tanimoto Combo’, ‘Ref Tversky’, ‘Fit Tversky’, ‘Shape Tanimoto’, ‘Shape Ref Tversky’, ‘Shape Fit Tversky’]

Options: Query

Query Conformer Generation Mode (query_conformer_generation_mode): Method used to generate conformer(s) of the molecule queries (shape queries are always accepted as is).

‘input’: Uses conformer of the molecule query as is (molecule queries without coordinates will fail in this mode).

‘omega’: Generate query molecule conformations with omega.

‘dock’: Use the best pose of the query molecule docked to the design unit(s). This mode requires that design units be supplied to the ‘Design Unit(s) (Optional)’ parameter.

‘auto’: Molecule queries with 3D coordinates will be used as is. Molecule queries without 3D coordinates will use ‘dock’ mode if design unit(s) are supplied to the floe and ‘omega’ mode otherwise.

  • Type: string

  • Default: auto

  • Choices: [‘input’, ‘omega’, ‘dock’, ‘auto’]
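
The way ‘auto’ resolves to one of the other modes can be summarized with a short sketch. This is only an illustration of the rules listed above, not the floe's implementation; the function name and its boolean arguments are hypothetical.

```python
def resolve_conformer_mode(mode, has_3d_coords, has_design_units):
    """Return the effective mode ('input', 'omega', or 'dock') for one
    molecule query, following the documented rules for each setting."""
    if mode not in ("input", "omega", "dock", "auto"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "input" and not has_3d_coords:
        # Molecule queries without coordinates fail in 'input' mode.
        raise ValueError("'input' mode requires a query with coordinates")
    if mode == "dock" and not has_design_units:
        # 'dock' mode needs design units supplied to the floe.
        raise ValueError("'dock' mode requires design units")
    if mode != "auto":
        return mode
    # 'auto': keep existing 3D coordinates, otherwise prefer docking
    # when design units are available and fall back to omega.
    if has_3d_coords:
        return "input"
    return "dock" if has_design_units else "omega"
```

For example, a sketched 2D query run without design units resolves to ‘omega’, while the same query run with design units resolves to ‘dock’.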

Multi Conformer Mol Query Mode (multi_conformer_mol_query_mode): Controls how query molecules with multiple conformers are handled.

‘fail’: Records containing a molecule with multiple conformers will fail.

‘active’: The active conformer of the molecule (this is typically the first conformer and often the lowest in energy) will be used.

‘first10’: The first 10 conformers of the molecule will be used.

‘limit’: All conformers of the molecule will be used, up to this cube's limit on the total number of queries.

‘all’: All conformers of the molecule will be used as queries.

WARNING: using ‘all’ or ‘limit’ can significantly increase the cost of the floe.

  • Type: string

  • Default: first10

  • Choices: [‘fail’, ‘active’, ‘first10’, ‘limit’, ‘all’]
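
The effect of each setting on a single multi-conformer query can be sketched as follows. This is a simplified illustration under stated assumptions: the function name is hypothetical, the active conformer is assumed to be the first one, and the cap used for ‘limit’ is assumed to be the 200-query limit mentioned for Input Query Dataset(s). In the floe the limit applies to the total number of queries, whereas this sketch looks at one molecule in isolation.

```python
def select_query_conformers(conformers, mode, query_limit=200):
    """Return the conformers of one query molecule that would be used
    as queries under the given Multi Conformer Mol Query Mode."""
    conformers = list(conformers)
    if mode == "fail":
        if len(conformers) > 1:
            raise ValueError("multi-conformer query not allowed in 'fail' mode")
        return conformers
    if mode == "active":
        # Assumption: the active conformer is the first conformer.
        return conformers[:1]
    if mode == "first10":
        return conformers[:10]
    if mode == "limit":
        # Assumption: the cap equals the overall query limit.
        return conformers[:query_limit]
    if mode == "all":
        return conformers
    raise ValueError(f"unknown mode: {mode!r}")


# Placeholder conformer labels standing in for real conformers.
confs = [f"conf_{i}" for i in range(25)]
used_default = select_query_conformers(confs, "first10")
```

Because cost grows with the number of queries, a 25-conformer query contributes 10 queries under the default ‘first10’ setting and 25 under ‘all’.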

Options: FastROCS

Number of FastROCS Random Starts (number_of_fastrocs_random_starts): If specified, FastROCS will use the specified number of random starting orientations for each conformer being overlaid with FastROCS. If unspecified, the default 4 inertial starts will be used. Compute time (i.e., cost) scales roughly linearly with the number of starts.

  • Type: integer

Shape Only FastROCS Overlay (shape_only_fastrocs_overlay): If set to On, FastROCS will overlay molecules using shape only and ignoring color. If set to Off, FastROCS will overlay molecules using shape and color. Note that this parameter affects the overlay process, but not the scoring. For example, the overlay can be done with shape while the scoring is done with shape and color.

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Options: Re-scoring

Number of Molecules to Re-score (number_of_molecules_to_re_score): The number of top scoring molecules from FastROCS that will be sent to any of the enabled postprocessing methods (ROCS and/or Docking). Note that the output hit lists will still be of the size specified by the Hit List Size parameter, which is generally smaller than this number. Maximum allowed value is 100,000,000. Minimum allowed value is 100,000.

  • Type: integer

  • Default: 100000

ROCS Re-scoring Mode (rocs_re_scoring_mode): Type of ROCS rescoring to perform on the top scoring molecules from FastROCS if ‘Shape Query(s) for ROCS Rescoring (Optional)’ is not specified. If it is specified, this parameter is ignored and ROCS rescoring will be done with the shape query(s).

‘Off’: Turns off ROCS rescoring of the top FastROCS molecules.

‘Best FastROCS Query’: Overlays each molecule onto the query that FastROCS selected as the best query for that molecule.

‘All Queries’: Overlays each molecule onto all queries and outputs the best overlay.

WARNING: ‘All Queries’ mode can result in significant compute costs if there are many queries and molecules to rescore.

  • Type: string

  • Default: Best FastROCS Query

  • Choices: [‘Off’, ‘Best FastROCS Query’, ‘All Queries’]

Number of ROCS Re-scoring Random Starts (number_of_rocs_re_scoring_random_starts): If specified, ROCS rescoring will use the specified number of random starting orientations for each conformer being overlaid. If unspecified, the default 4 inertial starts will be used. Compute time scales roughly linearly with the number of starts.

  • Type: integer

ROCS Re-scoring Shape Query Similarity Type (rocs_re_scoring_shape_query_similarity_type): Similarity type to use in the ROCS Rescoring step when shape queries for ROCS rescoring have been set with the Inputs -> Shape Query(s) for ROCS Rescoring (Optional) parameter. This parameter is ignored if Inputs -> Shape Query(s) for ROCS Rescoring (Optional) is not set.

  • Type: string

  • Default: Tanimoto Combo

  • Choices: [‘Tanimoto Combo’, ‘Ref Tversky’, ‘Fit Tversky’, ‘Shape Tanimoto’, ‘Shape Ref Tversky’, ‘Shape Fit Tversky’]

Dock Re-scoring Mode (dock_re_scoring_mode): Docking method used to dock the top scoring molecules from FastROCS when design units are supplied to the floe (see the Design Unit(s) (Optional) parameter).

‘Fred’: The default structure-based scoring method.

‘Hybrid’: Biases the docking toward poses that overlay the crystallographic ligand (design units must have a bound ligand).

‘FastFred’: A faster variant of ‘Fred’ (typically ~2x faster) that samples less and uses a simpler scoring function in the initial stages of docking.

This option also determines how the best design unit is selected when multiple design units are supplied to the floe. For ‘Fred’ and ‘FastFred’ modes, the best design unit is the one with the best docking score. For ‘Hybrid’ mode, the best design unit is the one with the most similar bound ligand. With multiple design units, ‘Hybrid’ is much more computationally efficient because each molecule is docked only once, to the design unit with the most similar bound ligand, while ‘Fred’ and ‘FastFred’ modes dock each molecule to all the design units to determine which one gives the lowest score.

  • Type: string

  • Default: Fred

  • Choices: [‘Fred’, ‘Hybrid’, ‘Fast Fred’]
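
The rule for choosing the best design unit when several are supplied can be sketched as below. The function name and the dictionary inputs (design unit ID mapped to a docking score or to a ligand similarity) are hypothetical; the sketch only encodes the selection rule described for Dock Re-scoring Mode and performs no docking.

```python
def pick_best_design_unit(mode, dock_scores=None, ligand_similarity=None):
    """Return the ID of the best design unit for one molecule.

    dock_scores: hypothetical dict {design_unit_id: docking score},
        where a lower score is better ('Fred' and 'FastFred').
    ligand_similarity: hypothetical dict {design_unit_id: similarity of
        the molecule to the bound ligand}, higher is better ('Hybrid').
    """
    if mode in ("Fred", "FastFred"):
        # Every design unit must be docked to find the lowest score.
        return min(dock_scores, key=dock_scores.get)
    if mode == "Hybrid":
        # Only the design unit with the most similar ligand is docked.
        return max(ligand_similarity, key=ligand_similarity.get)
    raise ValueError(f"unknown mode: {mode!r}")


# Illustrative values only.
best_fred = pick_best_design_unit("Fred", dock_scores={1: -9.5, 2: -11.2, 3: -7.0})
best_hybrid = pick_best_design_unit("Hybrid", ligand_similarity={1: 0.62, 2: 0.41, 3: 0.88})
```

The difference in cost follows from the inputs: the ‘Fred’ branch needs one docking score per design unit, while the ‘Hybrid’ branch needs only ligand similarities before a single docking run.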

Consensus Max Pareto Rank (consensus_max_pareto_rank): When rescoring with both docking and ROCS, each molecule will get a pareto dominance rank based on the docking score and ROCS similarity. Molecules with a pareto dominance rank higher than this number will be filtered out of the consensus hit lists. Minimum allowed value is 0. Maximum allowed value is 10.

  • Type: integer

  • Default: 4
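
To make the consensus filter concrete, the sketch below computes a Pareto dominance rank from a docking score (lower is better) and a ROCS similarity (higher is better). The exact rank definition used by the floe is not given here, so this example assumes the common non-dominated sorting convention: molecules that no other molecule dominates get rank 0, they are set aside, and the process repeats with rank 1, and so on. The function name and data layout are hypothetical.

```python
def pareto_ranks(points):
    """points: list of (dock_score, rocs_similarity) tuples.

    Returns one rank per point, assuming rank 0 is the non-dominated
    front (an assumption; the floe's definition may differ).
    """
    def dominates(a, b):
        # a is at least as good as b on both criteria, and strictly
        # better on at least one (lower dock score, higher similarity).
        return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])

    ranks = [None] * len(points)
    remaining = set(range(len(points)))
    rank = 0
    while remaining:
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i])
                       for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# Illustrative (dock score, ROCS similarity) pairs only.
mols = [(-12.0, 1.4), (-10.0, 1.2), (-11.0, 1.5), (-8.0, 0.9)]
ranks = pareto_ranks(mols)                        # [0, 1, 0, 2]
max_rank = 1
kept = [i for i, r in enumerate(ranks) if r <= max_rank]
```

With a maximum rank of 1 in this example, the last molecule (rank 2) is filtered out of the consensus hit lists, while the first three are kept.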

Options: Novelty

Novelty Fingerprint Type (novelty_fingerprint_type): Type of fingerprint used to identify molecules that are dissimilar to the query molecule in 2D space.

  • Type: string

  • Default: Circular

  • Choices: [‘Circular’, ‘Path’, ‘Tree’]

Novelty Pareto Max Rank (novelty_pareto_max_rank): In the pareto consensus for novelty, this is the maximum pareto dominance rank that will be allowed. Setting this to higher values will cause more molecules to appear in the FastROCS and ROCS novelty hit lists. Max value is 10.

  • Type: integer

  • Default: 10

GPU Hardware

FastROCS Instance Type (fastrocs_instance_type): The instances excluded by default are known not to be cost effective for FastROCS.

  • Type: string

  • Default: !cdns,!g4dn.metal,!g5.12xlarge,!g5.24xlarge,!g5.48xlarge,!g4dn.12xlarge,!g3s.,!p3.

Spot Instance Policy for FastROCS GPU Instance (spot_instance_policy_for_fastrocs_gpu_instance): To run on SPOT instances, use the default setting of Preferred. To run on ON-DEMAND instances, set the value to Prohibited. ON-DEMAND instances typically cost 3–4 times more than SPOT instances, but are more available than SPOT instances when overall demand for GPUs on AWS is high.

  • Type: string

  • Default: Preferred

  • Choices: [‘Allowed’, ‘Preferred’, ‘NotPreferred’, ‘Prohibited’, ‘Required’]

Input Fields

Input Query Mol Field (input_query_mol_field): Field on the input query dataset with the query molecules. If unspecified, the primary (default) molecule field will be used.

  • Type: field_parameter::mol

Input Shape Query Field (input_shape_query_field): Field on the input query dataset with the shape query to search against. If unspecified, each record will be searched for a single shape query.

  • Type: field_parameter

Query Design Unit Field (query_design_unit_field): Field on the query dataset passed to Input Query Dataset(s) holding a design unit with a ligand to use as the query. If unspecified, the floe will use whatever design unit it can find on any field of each record, provided the record does not have multiple fields with design units.

  • Type: field_parameter

Dock Design Unit Field (dock_design_unit_field): Field on the datasets passed to Design Unit(s) (Optional) that contains the design unit(s) to dock to. If unspecified, the floe will use whatever design unit it can find on any field of each record, provided the record does not have multiple fields with design units.

  • Type: field_parameter

Dock Receptor Field (dock_receptor_field): Field on the datasets passed to Design Unit(s) (Optional) that contains an old style receptor molecule to dock to. If unspecified, the primary molecule field will be used.

  • Type: field_parameter::mol

Output Fields

Overlay Molecule Field (overlay_molecule_field): Field on the output records that will hold the structure of the molecule overlaid by ROCS or FastROCS.

  • Type: field_parameter::mol

  • Default: Overlay Molecule

Tanimoto Combo Field (tanimoto_combo_field): Output field with the Tanimoto Combo. This field will only be created if FastROCS Similarity Type is Tanimoto Combo. The value in this field is a duplicate of the value in Combo Similarity.

  • Type: field_parameter::float

  • Default: Tanimoto Combo

Tanimoto Color Field (tanimoto_color_field): Output field with the Color Tanimoto. This field will only be created if FastROCS Similarity Type is Tanimoto Combo. The value in this field is a duplicate of the value in Color Similarity.

  • Type: field_parameter::float

  • Default: Color Tanimoto

Tanimoto Shape Field (tanimoto_shape_field): Output field with the Shape Tanimoto. This field will only be created if FastROCS Similarity Type is Tanimoto Combo. The value in this field is a duplicate of the value in Shape Similarity.

  • Type: field_parameter::float

  • Default: Shape Tanimoto

Tversky Combo Field (tversky_combo_field): Output field with the Tversky Combo. This field will only be created if FastROCS Similarity Type is Fit Tversky or Ref Tversky. The value in this field is a duplicate of the value in Combo Similarity.

  • Type: field_parameter::float

  • Default: Tversky Combo

Tversky Color Field (tversky_color_field): Output field with the Color Tversky. This field will only be created if FastROCS Similarity Type is Fit Tversky or Ref Tversky. The value in this field is a duplicate of the value in Color Similarity.

  • Type: field_parameter::float

  • Default: Color Tversky

Tversky Shape Field (tversky_shape_field): Output field with the Shape Tversky. This field will only be created if FastROCS Similarity Type is Fit Tversky or Ref Tversky. The value in this field is a duplicate of the value in Shape Similarity.

  • Type: field_parameter::float

  • Default: Shape Tversky

Best Query Field (best_query_field): Output field for the query with the highest similarity to the hit molecule.

  • Type: field_parameter::mol

  • Default: Query

Best Query ID Field (best_query_id_field): Output field for the ID of the query with the highest similarity to the molecule. This identifier will also appear in the dataset specified by the Output Query Dataset parameter.

  • Type: field_parameter::int

  • Default: Query ID

Best Query Link Field (best_query_link_field): Output field for a link to the query with the highest similarity to the molecule. The link will point to the query in the dataset specified by the Output Query Dataset parameter.

  • Type: field_parameter::link

  • Default: Query Link

Bemis Murcko Field (bemis_murcko_field): Output field for the Bemis Murcko core SMILES.

  • Type: field_parameter::string

  • Default: Bemis Murcko

Bemis Murcko ID Field (bemis_murcko_id_field): Output field with an integer ID of the Bemis Murcko core. All molecules with the same Bemis Murcko core SMILES will have the same ID, and those with different Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Bemis Murcko core is seen. Thus, unlike the Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.

  • Type: field_parameter::int

  • Default: Bemis Murcko ID

Bemis Murcko Rank Field (bemis_murcko_rank_field): Integer field with the rank of the molecule within its Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Bemis Murcko core SMILES).

  • Type: field_parameter::int

  • Default: Bemis Murcko Rank
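
The Bemis Murcko ID and Bemis Murcko Rank fields can be illustrated together with a small sketch. It assumes the core SMILES have already been computed and that records arrive in hit list order; the function name is hypothetical, and starting the within-family rank at 1 is an assumption (the text above only states that IDs start at 1).

```python
def assign_core_ids_and_ranks(core_smiles_in_hit_order):
    """Return a list of (core_id, rank_within_core), one per record.

    core_id starts at 1 and increments each time a new core SMILES is
    seen, so it depends on record order. rank_within_core counts how
    many records with the same core have been seen so far.
    """
    core_ids = {}
    seen_counts = {}
    result = []
    for core in core_smiles_in_hit_order:
        if core not in core_ids:
            core_ids[core] = len(core_ids) + 1
        seen_counts[core] = seen_counts.get(core, 0) + 1
        result.append((core_ids[core], seen_counts[core]))
    return result


# Placeholder core SMILES in hit list order.
cores = ["c1ccccc1", "c1ccncc1", "c1ccccc1", "C1CCCCC1", "c1ccncc1"]
ids_and_ranks = assign_core_ids_and_ranks(cores)
# [(1, 1), (2, 1), (1, 2), (3, 1), (2, 2)]
```

Reordering the input changes the IDs but not the grouping, which is why the core SMILES, rather than the ID, should be used to compare cores across runs.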

Hetero Bemis Murcko Field (hetero_bemis_murcko_field): Output field for the Hetero Bemis Murcko core SMILES.

  • Type: field_parameter::string

  • Default: Hetero Bemis Murcko

Hetero Bemis Murcko ID Field (hetero_bemis_murcko_id_field): Output field with an integer ID of the Hetero Bemis Murcko core. All molecules with the same Hetero Bemis Murcko core SMILES will have the same ID, and those with different Hetero Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Hetero Bemis Murcko core is seen. Thus, unlike the Hetero Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.

  • Type: field_parameter::int

  • Default: Hetero Bemis Murcko ID

Hetero Bemis Murcko Rank Field (hetero_bemis_murcko_rank_field): Integer field with the rank of the molecule within its Hetero Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Hetero Bemis Murcko core SMILES).

  • Type: field_parameter::int

  • Default: Hetero Bemis Murcko Rank

Docked Molecule Field (docked_molecule_field): Output field for the docked molecule. This field will only be created on the output records if design units are supplied to this floe.

  • Type: field_parameter::mol

  • Default: Docked Molecule

Docked Score Field (docked_score_field): Output field for the score of the docked molecule. This field will only be created on the output records if design units are supplied to this floe.

  • Required

  • Type: field_parameter::float

  • Default: Chemgauss4

Design Unit ID Field (design_unit_id_field): Output field for the ID of the design unit the molecule scores best in. This field will only be created on the output records if design units are supplied to this floe.

  • Type: field_parameter::int

  • Default: Design Unit ID

Design Unit Link Field (design_unit_link_field): Output field for a link to the design unit the molecule scores best in. This field will only be created on the output records if design units are supplied to this floe.

  • Type: field_parameter::link

  • Default: Design Unit Link

Steric Score Field (steric_score_field): Output field for the steric score component of the docked molecule. This field will only be created on the output records if design units are supplied to this floe and this parameter is specified.

  • Type: field_parameter::float

Clash Score Field (clash_score_field): Output field for the clash score component of the docked molecule. This field will only be created on the output records if design units are supplied to this floe and this parameter is specified.

  • Type: field_parameter::float

Protein Desolv Score Field (protein_desolv_score_field): Output field for the protein desolvation score component of the docked molecule. This field will only be created on the output records if design units are supplied to this floe and this parameter is specified.

  • Type: field_parameter::float

Ligand Desolv Score Field (ligand_desolv_score_field): Output field for the ligand desolvation score component of the docked molecule. This field will only be created on the output records if design units are supplied to this floe and this parameter is specified.

  • Type: field_parameter::float

Ligand Desolv HB Score Field (ligand_desolv_hb_score_field): Output field for the ligand desolvation hydrogen bond score component of the docked molecule. This field will only be created on the output records if design units are supplied to this floe and this parameter is specified.

  • Type: field_parameter::float

Hydrogen Bond Score Field (hydrogen_bond_score_field): Output field for the hydrogen bond score component of the docked molecule. This field will only be created on the output records if design units are supplied to this floe and this parameter is specified.

  • Type: field_parameter::float

Consensus Pareto Dominance Rank Field (consensus_pareto_dominance_rank_field): Integer field on the output record holding the pareto dominance rank of the record.

  • Required

  • Type: field_parameter::int

  • Default: Pareto Rank

Highest 2D Tanimoto (highest_2d_tanimoto): Output field for the highest 2D Tanimoto of each molecule to any query. This Tanimoto value is used to generate the FastROCS and ROCS novelty output hit lists.

  • Required

  • Type: field_parameter::float

  • Default: 2D Tanimoto

Most 2D Similar Query SMILES Field (most_2d_similar_query_smiles_field): Output field holding the SMILES of the query molecule with the highest Tanimoto to the output molecule.

  • Type: field_parameter::string

  • Default: Most 2D Similar Query SMILES

Most 2D Similar Query Title Field (most_2d_similar_query_title_field): Output field holding the title of the query molecule with the highest Tanimoto to the output molecule.

  • Type: field_parameter::string

  • Default: Most 2D Similar Query Title

Fingerprint Type Field (fingerprint_type_field): Output field holding the name of the type of fingerprint used in the 2D calculation. This field will only be added to the output if a value is entered for this parameter.

  • Type: field_parameter::string