Gigadock Warp Classic

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Product-based/Gigadock

  • Role-based/Computational Chemist

  • Solution-based/Virtual-screening/DB Search/Gigadock

  • Task-based/Virtual Screening - Structure-Based

Description

Approximates a full GigaDock run with a mixture of FastROCS and docking.

  1. Dock a random subset of the input molecules.

  2. Cluster the top docked molecules and select the top-scoring cluster heads (clustering can be turned off; see ‘Cluster FastROCS Queries’).

  3. Run FastROCS on all input molecules, using the top-scoring poses from the previous step as queries.

  4. Re-dock the top-scoring molecules from FastROCS.

  5. Output a hit list of the top-scoring docked molecules.
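
The five steps above can be sketched as a small Python function. This is a minimal, non-authoritative illustration, not the floe's implementation: the function name warp_screen and the dock and similarity callables are hypothetical stand-ins for FRED docking (assumed lower score is better) and FastROCS (assumed higher Tanimoto is better), and the optional clustering of step 2 is omitted.

```python
import random
from typing import Callable, List


def warp_screen(
    molecules: List[str],
    dock: Callable[[str], float],             # hypothetical docking stand-in; lower score is better
    similarity: Callable[[str, str], float],  # hypothetical FastROCS stand-in; higher is better
    random_fraction: float = 0.02,            # mirrors Random Dock Fraction
    final_fraction: float = 0.08,             # mirrors Final Dock Fraction
    n_queries: int = 50,                      # mirrors Number of FastROCS Queries
    hit_list_size: int = 10000,               # mirrors Hit List Size
    seed: int = 0,
) -> List[str]:
    rng = random.Random(seed)
    # Step 1: dock a random subset of the input molecules
    n_random = max(1, round(len(molecules) * random_fraction))
    subset = rng.sample(molecules, n_random)
    subset_scores = {m: dock(m) for m in subset}
    # Step 2: take the best-scoring docked molecules as queries (clustering omitted)
    queries = sorted(subset, key=subset_scores.get)[:n_queries]
    # Step 3: score every input molecule by its best similarity to any query
    best_sim = {m: max(similarity(m, q) for q in queries) for m in molecules}
    # Step 4: re-dock only the top fraction of molecules ranked by similarity
    n_final = max(1, round(len(molecules) * final_fraction))
    shortlist = sorted(molecules, key=best_sim.get, reverse=True)[:n_final]
    final_scores = {m: dock(m) for m in shortlist}
    # Step 5: return the best-scoring re-docked molecules as the hit list
    return sorted(shortlist, key=final_scores.get)[:hit_list_size]
```

The point of the design is that only the random subset and the FastROCS shortlist are docked, while the much cheaper similarity search covers every input molecule.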

Promoted Parameters

Title in user interface (promoted name)

Inputs

Design Unit or Receptor Dataset(s) (init_input_dataset): Dataset with the design unit (DU) (or old-format receptor) to dock to. Multiple design units are allowed, up to a limit of 10 for the Hybrid docking method (see the ‘Docking Method’ parameter) and 2 otherwise. The behavior with multiple design units depends on the docking method. For ‘Fred’ or ‘FastFred’, each molecule is docked to each design unit and the result from the best-scoring design unit is output, so docking time (and cost) scales roughly linearly with the number of design units. For ‘Hybrid’, each molecule is docked only into the design unit whose crystallographic bound ligand is most similar (by ROCS Combo Tanimoto) to the molecule being docked, and docking time (and cost) increases by roughly 5% per additional design unit.

  • Required

  • Type: data_source
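
The two multi-design-unit strategies described above can be contrasted in a short sketch. It assumes hypothetical dock and similarity callables (the latter standing in for ROCS Combo Tanimoto) and a hypothetical mapping from design unit ID to its bound ligand; the returned call count illustrates why Fred/FastFred cost scales with the number of design units while Hybrid docks each molecule once.

```python
from typing import Callable, Dict, Tuple


def dock_multi_du(
    mol: str,
    design_units: Dict[str, str],             # hypothetical: DU id -> its bound ligand
    dock: Callable[[str, str], float],        # hypothetical: (mol, DU id) -> score, lower is better
    similarity: Callable[[str, str], float],  # hypothetical stand-in for ROCS Combo Tanimoto
    method: str = "Fred",
) -> Tuple[str, float, int]:
    """Return (chosen DU id, docking score, number of docking calls made)."""
    if method == "Hybrid":
        # Dock only into the DU whose bound ligand is most similar to the molecule
        du = max(design_units, key=lambda d: similarity(mol, design_units[d]))
        return du, dock(mol, du), 1
    # Fred / FastFred: dock into every DU and keep the best-scoring result
    scores = {du: dock(mol, du) for du in design_units}
    best = min(scores, key=scores.get)
    return best, scores[best], len(scores)
```

Note that the Hybrid branch may pick a design unit that does not give the best docking score; it trades that for a single docking call per molecule.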

Input Conformer Collection (molecule_input_collection): Input collection containing the molecules to dock. The collection should have been created by the ‘Prepare Giga Collections’ floe. Several large, pre-generated third-party vendor docking collections can be made available to your organization on request, at no charge, by e-mailing support@eyesopen.com (if your organization has already requested them, you already have access to these pre-generated collections). These collections are located in the ‘Organization Data->OpenEye Data->Gigadocking Collections’ folder, which also automatically contains several smaller collections and collections containing random subsets of the larger vendor collections.

  • Required

  • Type: collection_source

Outputs

Hit List Dataset (hit_list_output_dataset): Output dataset with the top scoring docked molecules.

  • Required

  • Type: dataset_out

  • Default: Gigadock Warp Hit List

Queries (queries): Output dataset with the queries used by FastROCS. The queries are the cluster heads of the top-scoring poses from the initial docking of a random subset of molecules from the input collection.

  • Required

  • Type: dataset_out

  • Default: Gigadock Warp Queries

Output Design Unit(s) Dataset (output_design_units_dataset): Output dataset containing a copy of the design unit(s) docked to.

  • Required

  • Type: dataset_out

  • Default: Gigadock Warp Design Unit

Temporary Collections (temporary_collections): Temporary collection used by the floe during the run and automatically deleted at the end of the run.

  • Required

  • Type: collection_sink

  • Default: Gigadock Warp Temporary Collection

Options

Hit List Size (hit_list_size): Size of the final hit list with the top scoring docked molecules.

  • Required

  • Type: integer

  • Default: 10000
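
Keeping only the best N records of a very large stream does not require sorting everything. The sketch below is a hypothetical helper, not the floe's code, and assumes lower docking scores are better (as with Chemgauss4); in this sketch a non-positive size simply returns an empty list.

```python
import heapq
from typing import Iterable, List, Tuple


def top_hits(records: Iterable[Tuple[str, float]], size: int) -> List[Tuple[str, float]]:
    """Return the `size` lowest-scoring (best) records, best first."""
    if size <= 0:
        return []
    # nsmallest keeps a bounded heap, so memory grows with `size`, not with the stream
    return heapq.nsmallest(size, records, key=lambda r: r[1])
```

For example, top_hits([("a", -3.0), ("b", -9.5), ("c", -1.2), ("d", -7.0)], 2) keeps records "b" and "d".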

Docking Method (docking_method): Docking method to use. ‘Fred’ is the default structure-based scoring method. ‘Hybrid’ biases the docking towards poses that overlay the crystallographic ligand (the design unit(s) must have a bound ligand). ‘FastFred’ is a faster variant of ‘Fred’ (typically ~2x faster for a single design unit) that samples less and uses a simpler scoring function in the initial stages of docking.

  • Type: string

  • Default: Fred

  • Choices: [‘Fred’, ‘Hybrid’, ‘Fast Fred’]

Options: Advanced

Random Dock Fraction (random_dock_fraction): Fraction of the molecules in the input collection(s) to select at random and dock. The top-scoring poses from this docking are clustered, and the top cluster heads are used as queries for FastROCS.

  • Required

  • Type: decimal

  • Default: 0.02

Final Dock Fraction (final_dock_fraction): The number of top-scoring molecules from FastROCS passed to the final docking step is this fraction multiplied by the size of the input collection(s).

  • Type: decimal

  • Default: 0.08
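
As a worked example of the two fractions, the sketch below (a hypothetical helper) computes how many molecules each docking step handles for a given collection size. The FastROCS comparison count assumes each query is searched against every input molecule, and the docked share is an upper bound because the random subset and the final shortlist may overlap.

```python
def warp_counts(collection_size: int, random_fraction: float = 0.02,
                final_fraction: float = 0.08, n_queries: int = 50) -> dict:
    random_docked = round(collection_size * random_fraction)
    final_docked = round(collection_size * final_fraction)
    return {
        "random_dock": random_docked,
        "final_dock": final_docked,
        # assumption: every query is compared against every input molecule
        "fastrocs_comparisons": collection_size * n_queries,
        # upper bound on the share of the collection that is docked at all
        "docked_fraction": (random_docked + final_docked) / collection_size,
    }
```

With the defaults and a one-billion-molecule collection, this gives 20 million molecules in the random dock and 80 million in the final dock, so at most 10% of the collection is docked.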

Number of FastROCS Queries (number_of_fastrocs_queries): Number of top-scoring molecules from the docking of the random subset of collection molecules to use as queries for FastROCS.

  • Required

  • Type: integer

  • Default: 50

Cluster FastROCS Queries (cluster_fastrocs_queries): If False, the queries for FastROCS are the top-scoring molecules from docking a random subset of the molecules. If True, the queries for FastROCS are the top-scoring cluster heads from docking a random subset of the molecules.

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]
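
The effect of this switch can be illustrated with a hypothetical helper. It assumes cluster assignments are already computed and that a cluster's head is its best-scoring member, which may differ from how the floe defines cluster heads; lower docking scores are assumed to be better.

```python
from typing import Dict, List


def select_queries(
    scores: Dict[str, float],  # pose id -> docking score (lower is better)
    clusters: Dict[str, int],  # pose id -> cluster id (hypothetical, precomputed)
    n_queries: int = 50,
    cluster: bool = False,
) -> List[str]:
    ranked = sorted(scores, key=scores.get)
    if not cluster:
        # False: simply take the top-scoring poses
        return ranked[:n_queries]
    # True: keep only the first (best-scoring) pose seen from each cluster
    heads, seen = [], set()
    for pose in ranked:
        if clusters[pose] not in seen:
            seen.add(clusters[pose])
            heads.append(pose)
    return heads[:n_queries]
```

Clustering trades a few high-scoring but redundant queries for more chemically diverse ones.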

GPU Hardware

FastROCS Instance Type (fastrocs_instance_type): The instances excluded by default are known not to be cost-effective for FastROCS.

  • Type: string

  • Default: !cdns,!g4dn.metal,!g5.12xlarge,!g5.24xlarge,!g5.48xlarge,!g4dn.12xlarge,!g3s.,!p3.

Spot instance policy for FastROCS GPU Instance. (spot_instance_policy_for_fastrocs_gpu_instance): To run on SPOT instances, use ‘Preferred’ or the default, ‘Required’. To run on ON-DEMAND instances, set the value to ‘Prohibited’. ON-DEMAND instances typically cost 3-4x more than SPOT instances, but are more readily available than SPOT instances when overall demand for GPUs on AWS is high.

  • Type: string

  • Default: Required

  • Choices: [‘Allowed’, ‘Preferred’, ‘NotPreferred’, ‘Prohibited’, ‘Required’]

Output Fields

Docked Pose Field (docked_pose_field): Field on the output hit list containing the pose of the docked molecule. If unspecified, the primary molecule field is used.

  • Type: field_parameter::mol

Docked Score Field (score_field): Field on the output record where the docked score will be placed.

  • Type: field_parameter::float

  • Default: Chemgauss4

Steric Score Field (steric_score_field): Output field with the steric score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Clash Score Field (clash_score_field): Output field with the clash score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Protein Desolv Score Field (protein_desolv_score_field): Output field with the protein desolvation score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Ligand Desolv Score Field (ligand_desolv_score_field): Output field with the ligand desolvation score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Ligand Desolv HB Score Field (ligand_desolv_hb_score_field): Output field with the ligand desolvation hydrogen bond score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Hydrogen Bond Score Field (hydrogen_bond_score_field): Output field with the hydrogen bond score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Design Unit ID Field (design_unit_id_field): Output field with the ID of the design unit the molecule scores best in.

  • Type: field_parameter::int

  • Default: Design Unit ID

Design Unit Link Field (design_unit_link_field): Output field with a link to the design unit the molecule scores best in.

  • Type: field_parameter::link

  • Default: Design Unit Link

FastROCS Overlay Field (fastrocs_overlay_field): Field on the output hit list containing the FastROCS overlay onto whichever query pose gives the highest Tanimoto. The query poses are generated by the floe by docking a random subset of the initial collection(s) and selecting the top-scoring poses as queries for FastROCS.

  • Type: field_parameter::mol

  • Default: FastROCS Overlay

FastROCS Query Field (fastrocs_query_field): Field on the output hit list containing the query pose onto which the molecule was best overlaid by FastROCS. The query poses are generated by the floe by docking a random subset of the initial collection(s) and selecting the top-scoring poses as queries for FastROCS.

  • Type: field_parameter::mol

  • Default: FastROCS Query
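
How the overlay and query fields relate can be shown with a hypothetical helper: every query is tried, and the aligned pose and query with the highest Tanimoto are kept. The overlay callable is an assumed stand-in for a FastROCS alignment that returns an aligned pose and its Tanimoto; the dictionary keys simply reuse the default field names above.

```python
from typing import Callable, Dict, List, Tuple


def best_overlay(
    mol: str,
    queries: List[str],
    overlay: Callable[[str, str], Tuple[str, float]],  # hypothetical: (aligned pose, Tanimoto)
) -> Dict[str, object]:
    best_query, best_pose, best_tani = None, None, float("-inf")
    for q in queries:
        pose, tani = overlay(mol, q)
        # Keep the overlay with the highest Tanimoto over all queries
        if tani > best_tani:
            best_query, best_pose, best_tani = q, pose, tani
    return {"FastROCS Overlay": best_pose, "FastROCS Query": best_query, "Tanimoto": best_tani}
```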

Combo Tanimoto Field (combo_tanimoto_field): Name of the field with the FastROCS Combo Tanimoto Score.

  • Type: field_parameter::float

  • Default: FastROCS Combo Tanimoto

Shape Tanimoto Field (shape_tanimoto_field): Name of the field with the FastROCS Shape Tanimoto Score.

  • Type: field_parameter::float

  • Default: FastROCS Shape

Color Tanimoto Field (color_tanimoto_field): Name of the field with the FastROCS Color Tanimoto Score.

  • Type: field_parameter::float

  • Default: FastROCS Color
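
The three Tanimoto fields are related. Assuming the standard ROCS definitions, each Tanimoto is computed from overlap volumes as O_AB / (O_AA + O_BB - O_AB), and the Combo Tanimoto is the sum of the shape and color Tanimotos, so it ranges from 0 to 2. A minimal sketch with hypothetical function names:

```python
def tanimoto(o_ab: float, o_aa: float, o_bb: float) -> float:
    """Tanimoto from overlap volumes: O_AB / (O_AA + O_BB - O_AB)."""
    return o_ab / (o_aa + o_bb - o_ab)


def combo_tanimoto(shape: float, color: float) -> float:
    """ROCS-style Combo Tanimoto: shape + color, in the range 0 to 2."""
    return shape + color
```

Identical molecules give overlaps O_AB = O_AA = O_BB and therefore a Tanimoto of 1.0.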

Bemis Murcko Field (bemis_murcko_field): Output field for the Bemis Murcko core SMILES.

  • Type: field_parameter::string

  • Default: Bemis Murcko SMILES

Bemis Murcko ID Field (bemis_murcko_id_field): Output field with an integer ID of the Bemis Murcko core. All molecules with the same Bemis Murcko core SMILES have the same ID, and those with different Bemis Murcko core SMILES have different IDs. The IDs start at 1 and increment by 1 each time a new Bemis Murcko core is seen. Thus, unlike the Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.

  • Type: field_parameter::int

  • Default: Bemis Murcko ID
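
The ID rule above can be sketched in a few lines (assign_core_ids is a hypothetical name, and the core SMILES are assumed to be already computed). Running it on the same cores in a different order shows why the ID, unlike the SMILES, depends on record order.

```python
from typing import Dict, Iterable, List


def assign_core_ids(core_smiles: Iterable[str]) -> List[int]:
    """Give each distinct core SMILES an integer ID in order of first appearance, starting at 1."""
    ids: Dict[str, int] = {}
    out = []
    for smi in core_smiles:
        if smi not in ids:
            ids[smi] = len(ids) + 1  # next unused ID
        out.append(ids[smi])
    return out
```

For example, ["c1ccccc1", "C1CCCCC1", "c1ccccc1"] yields [1, 2, 1], while ["C1CCCCC1", "c1ccccc1", "c1ccccc1"] yields [1, 2, 2].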

Bemis Murcko Rank Field (bemis_murcko_rank_field): Integer field with the rank of the molecule within its Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Bemis Murcko core SMILES).

  • Type: field_parameter::int

  • Default: Bemis Murcko Rank
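
The within-family rank can be computed by walking the hit list from best to worst score and counting per core. This hypothetical sketch assumes lower docking scores are better and that the core SMILES are already computed; ranks are returned in the original record order.

```python
from collections import defaultdict
from typing import List


def family_ranks(cores: List[str], scores: List[float]) -> List[int]:
    """Rank of each record among records sharing its core (1 = best in the family)."""
    order = sorted(range(len(scores)), key=scores.__getitem__)  # best (lowest) score first
    seen = defaultdict(int)
    ranks = [0] * len(scores)
    for i in order:
        seen[cores[i]] += 1
        ranks[i] = seen[cores[i]]
    return ranks
```

For cores ["A", "B", "A", "A", "B"] with scores [-8, -9, -10, -7, -6], the ranks are [2, 1, 1, 3, 2].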

Hetero Bemis Murcko Field (hetero_bemis_murcko_field): Output field for the Hetero Bemis Murcko core SMILES.

  • Type: field_parameter::string

  • Default: Hetero Bemis Murcko

Hetero Bemis Murcko ID Field (hetero_bemis_murcko_id_field): Output field with an integer ID of the Hetero Bemis Murcko core. All molecules with the same Hetero Bemis Murcko core SMILES have the same ID, and those with different Hetero Bemis Murcko core SMILES have different IDs. The IDs start at 1 and increment by 1 each time a new Hetero Bemis Murcko core is seen. Thus, unlike the Hetero Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.

  • Type: field_parameter::int

  • Default: Hetero Bemis Murcko ID

Hetero Bemis Murcko Rank Field (hetero_bemis_murcko_rank_field): Integer field with the rank of the molecule within its Hetero Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Hetero Bemis Murcko core SMILES).

  • Type: field_parameter::int

  • Default: Hetero Bemis Murcko Rank

Development

Cutoff Sample Size Limit (cutoff_sample_size_limit): Hit list size threshold for the best FastROCS hits, above which sampling is used to conserve memory.

  • Type: integer

  • Default: 10000000

Dock Random Subset ‘Item Count’ (dock_random_subset_item_count): Target molecule count (best effort) per shard group passed to the random dock.

  • Required

  • Type: integer

  • Default: 1000

Final Dock ‘Item Count’ (final_dock_item_count): Target molecule count (best effort) per shard group passed to the final docking.

  • Required

  • Type: integer

  • Default: 1000

Cluster Poses Charge Model (cluster_poses_charge_model): Charge model to use in the electrostatic similarity part of the 3D similarity calculation.

  • Type: string

  • Default: elf10

  • Choices: [‘elf10’, ‘mmff’, ‘input’]

Cluster Poses Tanimoto Threshold (cluster_poses_tanimoto_threshold): Tanimoto similarity threshold used to determine cluster centers. Larger values result in more clusters, each containing fewer conformers/poses that are more similar to each other.

  • Type: decimal

  • Default: 0.85
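
To see why a larger threshold yields more clusters, consider a common greedy (leader) clustering scheme. This is an assumed, illustrative algorithm, not necessarily the one the floe uses: poses are visited best-scoring first, and each pose joins the first cluster head it is at least `threshold` similar to, otherwise it becomes a new head.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def leader_cluster(poses: Sequence[T], similarity: Callable[[T, T], float],
                   threshold: float = 0.85) -> List[List[T]]:
    """Greedy leader clustering; `poses` must be ordered best-scoring first."""
    clusters: List[List[T]] = []
    for pose in poses:
        for members in clusters:
            # members[0] is the cluster head
            if similarity(pose, members[0]) >= threshold:
                members.append(pose)
                break
        else:
            # No head is similar enough: this pose starts a new cluster
            clusters.append([pose])
    return clusters
```

With toy one-dimensional "poses" [0.0, 0.1, 0.3, 0.35] and similarity 1 - |a - b|, a threshold of 0.85 gives 2 clusters while 0.99 gives 4.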

Cluster Poses Shape Falloff (cluster_poses_shape_falloff): Distance at which the Gaussian atom density is half its maximum value. This can be thought of roughly as the effective radius of the heavy atoms in the similarity model. Higher values mean that two poses whose atoms are not exactly on top of each other can still have a high similarity/Tanimoto.

  • Type: decimal

  • Default: 2.0

Cluster Poses Charge Falloff (cluster_poses_charge_falloff): Distance at which the Gaussian atom charge density is half its maximum value. Higher values mean that atoms with different partial charges are more likely to be considered similar, and that poses with the same shape but differing partial charges can have a high similarity/Tanimoto.

  • Type: decimal

  • Default: 0.25
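
Both falloff parameters follow from the half-maximum definition. Assuming a simple atom-centered Gaussian, f(r) = exp(-ln 2 · (r / falloff)²), which equals 1 at r = 0 and exactly 0.5 at r = falloff. The hypothetical helper below evaluates it.

```python
import math


def gaussian_density(r: float, falloff: float) -> float:
    """Gaussian that is 1 at r = 0 and exactly 0.5 at r = falloff."""
    return math.exp(-math.log(2) * (r / falloff) ** 2)
```

At a distance of 1.0, the default shape falloff of 2.0 still gives about 0.84 (2^-0.25), while the default charge falloff of 0.25 gives about 1.5e-5 (2^-16). This is why small offsets barely affect shape similarity but strongly affect charge similarity.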

Docked Subset Hit List Size (docked_subset_hit_list_size): Desired size of the hit list. If this value is set to 0, no value is emitted from this cube.

  • Required

  • Type: integer

  • Default: 5000

FastROCS Shard Size (fastrocs_shard_size): Total count across the input shards to accumulate before emitting a group of shards.

  • Type: integer

  • Default: 100000
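
The accumulate-then-emit behavior can be sketched as a generator. The helper name and the (shard id, record count) input format are assumptions for illustration; in this sketch any leftover shards are emitted as a final, smaller group.

```python
from typing import Iterable, Iterator, List, Tuple


def group_shards(shards: Iterable[Tuple[str, int]], target: int = 100_000) -> Iterator[List[str]]:
    """Accumulate (shard id, record count) pairs until the running total reaches `target`."""
    group: List[str] = []
    total = 0
    for shard_id, count in shards:
        group.append(shard_id)
        total += count
        if total >= target:
            yield group
            group, total = [], 0
    if group:
        # Emit whatever remains once the input is exhausted
        yield group
```

For example, shards of 60,000, 50,000 and 30,000 records are emitted as two groups: the first two shards together, then the last one on its own.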

Verbose FastROCS Cube (verbose_fastrocs_cube): If on, timing information is sent to the log as the cube runs.

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Hit List Instance Type (hit_list_instance_type): The type of instance that this cube needs to run on.

  • Type: string

  • Default: r5

Enable cube timing report (time_all_cubes): If true, this cube emits timing information to the timing_data port.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Parallel Fast Fail Session Retry Timeout (parallel_fast_fail_session_retry_timeout): Sets the retry timeout (sec) on the cube_session OrionSession for this cube. If unspecified, parallel cubes use a value of 600 and serial cubes use a value of 7200.

  • Type: integer

  • Default: 120
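
The fallback rule reduces to a small function (session_retry_timeout is a hypothetical name). Because this floe sets a default of 120, the 600/7200-second fallbacks only apply if the value is cleared.

```python
from typing import Optional


def session_retry_timeout(value: Optional[int], parallel: bool) -> int:
    """Return the session retry timeout in seconds for a cube."""
    if value is not None:
        return value  # an explicit setting always wins
    return 600 if parallel else 7200
```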

Shard Download Attempts (shard_download_attempts): Number of attempts to make when downloading a shard.

  • Type: integer

  • Default: 1

Parallel Session Retry Dict (parallel_session_retry_dict): Entry must be of the form ‘<status code>:<number of retries>’. Both <status code> and <number of retries> must be integer values.

  • Type: string

  • Default: [‘404:1’]
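
A minimal parser for entries of the form ‘<status code>:<number of retries>’ might look like the following. The function name is hypothetical; it rejects entries without a colon or with non-integer parts, matching the stated requirement that both values be integers.

```python
from typing import Dict, Iterable


def parse_retry_entries(entries: Iterable[str]) -> Dict[int, int]:
    """Parse '<status code>:<number of retries>' strings into {status code: retries}."""
    retries: Dict[int, int] = {}
    for entry in entries:
        code, sep, count = entry.partition(":")
        if not sep:
            raise ValueError(f"expected '<status code>:<number of retries>', got {entry!r}")
        # int() raises ValueError if either part is not an integer
        retries[int(code)] = int(count)
    return retries
```

With the default value, parse_retry_entries(["404:1"]) returns {404: 1}, i.e., one retry on an HTTP 404 response.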