Gigadock Warp Classic
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Product-based/Gigadock
Role-based/Computational Chemist
Solution-based/Virtual-screening/DB Search/Gigadock
Task-based/Virtual Screening - Structure-Based
Description
Approximates a full GigaDock run with a mixture of FastROCS and docking.
Dock a random subset of molecules
Cluster the top docked molecules and select the top-scoring cluster heads
Run FastROCS on all input molecules, using the top-scoring poses from the previous step as queries
Re-dock the top-scoring molecules from FastROCS
Output a hit list of the top-scoring docked molecules
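As a rough illustration of why this approximates full docking at a fraction of the cost, the sketch below estimates how many molecules pass through the two docking steps for a given collection size, using the default Random Dock Fraction (0.02) and Final Dock Fraction (0.08). The helper name `warp_dock_counts` is hypothetical. The sketch assumes the two docked sets do not overlap, and it ignores the FastROCS step, which screens every input molecule.

```python
def warp_dock_counts(collection_size: int,
                     random_dock_fraction: float = 0.02,
                     final_dock_fraction: float = 0.08) -> dict:
    """Estimate molecule counts for the two docking steps (hypothetical helper)."""
    if collection_size <= 0:
        raise ValueError("collection_size must be positive")
    # Random subset docked in the first step.
    random_docked = round(collection_size * random_dock_fraction)
    # Top FastROCS hits re-docked in the final step, sized relative to the input.
    final_docked = round(collection_size * final_dock_fraction)
    total_docked = random_docked + final_docked
    return {
        "random_docked": random_docked,
        "final_docked": final_docked,
        "total_docked": total_docked,
        # Share of the docking work that a full dock of every molecule would need.
        "fraction_of_full_dock": total_docked / collection_size,
    }


counts = warp_dock_counts(1_000_000_000)
print(counts)
```

With the defaults, about 10% of the collection is docked in total, compared with every molecule in a full GigaDock run.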
Promoted Parameters
Title in user interface (promoted name)
Inputs
Design Unit or Receptor Dataset(s) (init_input_dataset): Dataset with the design unit (DU) (or old-format receptor) to dock to. Multiple design units are allowed, up to a limit of 10 for the Hybrid dock method (see ‘Docking Method’ parameter) and 2 otherwise. The behavior with multiple design units depends on the docking method. For ‘Fred’ or ‘FastFred’, each molecule is docked to each design unit and the results from the best-scoring design unit are output, so docking time (and cost) scales roughly linearly with the number of design units. For ‘Hybrid’, each molecule is docked only into the design unit whose crystallographic bound ligand is most similar (by ROCS Combo Tanimoto) to the molecule being docked, and docking time (and cost) increases by roughly 5% per additional design unit.
Required
Type: data_source
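The scaling rules above can be turned into a back-of-envelope estimate. The function below is a hypothetical helper, not part of Orion; it encodes only the rules of thumb stated in this description (roughly linear scaling for ‘Fred’/‘FastFred’, roughly +5% per additional design unit for ‘Hybrid’) and the design unit limits of 10 and 2.

```python
def relative_dock_cost(docking_method: str, n_design_units: int) -> float:
    """Rough docking time/cost relative to docking into a single design unit."""
    # Documented limits: up to 10 design units for Hybrid, 2 otherwise.
    limit = 10 if docking_method == "Hybrid" else 2
    if not 1 <= n_design_units <= limit:
        raise ValueError(f"{docking_method} supports 1 to {limit} design units")
    if docking_method == "Hybrid":
        # Each molecule goes to one design unit; ~5% overhead per extra unit.
        return 1.0 + 0.05 * (n_design_units - 1)
    # Fred/FastFred dock every molecule into every design unit.
    return float(n_design_units)


for method, n in [("Fred", 2), ("Hybrid", 2), ("Hybrid", 10)]:
    print(method, n, round(relative_dock_cost(method, n), 2))
```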
Input Conformer Collection (molecule_input_collection): Input collection containing the molecules to dock. The collection should have been created by the ‘Prepare Giga Collections’ floe. Several large, pre-generated 3rd-party vendor docking collections can be made available to your organization upon request at no charge by e-mailing support@eyesopen.com (if your organization has already requested them, you will already have access to these pre-generated collections). These collections are located in the ‘Organization Data->OpenEye Data->Gigadocking Collections’ folder, which also automatically contains several smaller collections and collections containing random subsets of the larger vendor collections.
Required
Type: collection_source
Outputs
Hit List Dataset (hit_list_output_dataset): Output dataset with the top scoring docked molecules.
Required
Type: dataset_out
Default: Gigadock Warp Hit List
Queries (queries): Output dataset with the queries used by FastROCS. The queries are the cluster heads of the top-scoring poses from the initial docking of a random subset of molecules from the input collection.
Required
Type: dataset_out
Default: Gigadock Warp Queries
Output Design Unit(s) Dataset (output_design_units_dataset): Output dataset containing a copy of the design unit(s) docked to.
Required
Type: dataset_out
Default: Gigadock Warp Design Unit
Temporary Collections (temporary_collections): Temporary collection used by the floe during the run; it is automatically deleted at the end of the run.
Required
Type: collection_sink
Default: Gigadock Warp Temporary Collection
Options
Hit List Size (hit_list_size): Size of the final hit list with the top scoring docked molecules.
Required
Type: integer
Default: 10000
Docking Method (docking_method): Docking method to use. ‘Fred’ is the default structure-based scoring method. ‘Hybrid’ biases the docking towards poses that overlay the crystallographic ligand (the design unit(s) must have a bound ligand). ‘FastFred’ is a faster variant of ‘Fred’ (typically ~2x faster for single design units) that samples less and uses a simpler scoring function in the initial stages of docking.
Type: string
Default: Fred
Choices: [‘Fred’, ‘Hybrid’, ‘Fast Fred’]
Options: Advanced
Random Dock Fraction (random_dock_fraction): Fraction of molecules from the input collection(s) to select at random and dock. The top-scoring poses from this docking will be clustered and the top cluster heads used as queries for FastROCS.
Required
Type: decimal
Default: 0.02
Final Dock Fraction (final_dock_fraction): The number of top-scoring molecules from FastROCS that are passed to the final docking step is equal to this fraction of the size of the input collection(s).
Type: decimal
Default: 0.08
Number of FastROCS Queries (number_of_fastrocs_queries): Number of top-scoring molecules from the docking of the random subset of collection molecules to use as queries for FastROCS.
Required
Type: integer
Default: 50
Cluster FastROCS Queries (cluster_fastrocs_queries): If False, the queries for FastROCS will be the top-scoring molecules from docking a random subset of the molecules. If True, the queries for FastROCS will be the top-scoring cluster heads from docking a random subset of the molecules.
Required
Type: boolean
Default: False
Choices: [True, False]
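The difference between the two settings can be sketched as follows. This is a minimal illustration assuming a greedy, score-ordered (leader-style) clustering in which a pose becomes a new cluster head only if its similarity to every existing head is below the threshold; the floe's actual clustering algorithm is not documented here. The `similarity` callable stands in for the 3D similarity controlled by the Cluster Poses parameters, `select_fastrocs_queries` is a hypothetical name, and lower scores are treated as better, as for Chemgauss4.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Pose = TypeVar("Pose")


def select_fastrocs_queries(
    scored_poses: Sequence[Tuple[float, Pose]],
    number_of_queries: int,
    cluster: bool,
    similarity: Callable[[Pose, Pose], float],
    threshold: float = 0.85,
) -> List[Pose]:
    # Best (lowest) docking score first; the key avoids comparing poses directly.
    ranked = [pose for _, pose in sorted(scored_poses, key=lambda sp: sp[0])]
    if not cluster:
        # Cluster FastROCS Queries = False: simply take the top-scoring poses.
        return ranked[:number_of_queries]
    heads: List[Pose] = []
    for pose in ranked:
        # A pose starts a new cluster only if it is dissimilar to every existing head.
        if all(similarity(pose, head) < threshold for head in heads):
            heads.append(pose)
            if len(heads) == number_of_queries:
                break
    return heads


def toy_similarity(p: str, q: str) -> float:
    # Toy stand-in: poses sharing a first letter count as identical.
    return 1.0 if p[0] == q[0] else 0.0


scored = [(-10.0, "a1"), (-9.5, "a2"), (-9.0, "b1"), (-8.0, "c1"), (-7.0, "b2")]
print(select_fastrocs_queries(scored, 3, cluster=False, similarity=toy_similarity))  # ['a1', 'a2', 'b1']
print(select_fastrocs_queries(scored, 3, cluster=True, similarity=toy_similarity))   # ['a1', 'b1', 'c1']
```

Clustering trades some top-scoring poses for chemical diversity among the queries: the near-duplicate ‘a2’ is dropped in favour of ‘c1’.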
GPU Hardware
FastROCS Instance Type (fastrocs_instance_type): The instances excluded by default are known not to be cost-effective for FastROCS.
Type: string
Default: !cdns,!g4dn.metal,!g5.12xlarge,!g5.24xlarge,!g5.48xlarge,!g4dn.12xlarge,!g3s.,!p3.
Spot instance policy for FastROCS GPU Instance. (spot_instance_policy_for_fastrocs_gpu_instance): To run on SPOT instances, use ‘Preferred’ or ‘Required’ (the default). To run on ON-DEMAND instances, set the value to ‘Prohibited’. ON-DEMAND instances typically cost 3-4x more than SPOT instances, but they are more readily available when overall demand for GPUs on AWS is high.
Type: string
Default: Required
Choices: [‘Allowed’, ‘Preferred’, ‘NotPreferred’, ‘Prohibited’, ‘Required’]
Output Fields
Docked Pose Field (docked_pose_field): Field on the output hit list containing the pose of the docked molecule. If unspecified, the primary molecule field will be used.
Type: field_parameter::mol
Docked Score Field (score_field): Field on the output record where the docked score will be placed
Type: field_parameter::float
Default: Chemgauss4
Steric Score Field (steric_score_field): Output field with the steric score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type: field_parameter::float
Clash Score Field (clash_score_field): Output field with the clash score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type: field_parameter::float
Protein Desolv Score Field (protein_desolv_score_field): Output field with the protein desolvation score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type: field_parameter::float
Ligand Desolv Score Field (ligand_desolv_score_field): Output field with the ligand desolvation score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type: field_parameter::float
Ligand Desolv HB Score Field (ligand_desolv_hb_score_field): Output field with the ligand desolvation hydrogen bond score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type: field_parameter::float
Hydrogen Bond Score Field (hydrogen_bond_score_field): Output field with the hydrogen bond score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type: field_parameter::float
Design Unit ID Field (design_unit_id_field): Output field with the ID of the design unit the molecule scores best in
Type: field_parameter::int
Default: Design Unit ID
Design Unit Link Field (design_unit_link_field): Output field with a link to the design unit the molecule scores best in
Type: field_parameter::link
Default: Design Unit Link
FastROCS Overlay Field (fastrocs_overlay_field): Field on the output hit list containing the best FastROCS overlay, i.e., the overlay onto whichever query pose gives the highest Tanimoto. The query poses are generated by the floe by docking a random subset of the initial collection(s) and selecting the top-scoring poses as queries for FastROCS.
Type: field_parameter::mol
Default: FastROCS Overlay
FastROCS Query Field (fastrocs_query_field): Field on the output hit list containing the query pose that the docked molecule best overlaid onto with FastROCS. The query poses are generated by the floe by docking a random subset of the initial collection(s) and selecting the top-scoring poses as queries for FastROCS.
Type: field_parameter::mol
Default: FastROCS Query
Combo Tanimoto Field (combo_tanimoto_field): Name of the field with the FastROCS Combo Tanimoto Score.
Type: field_parameter::float
Default: FastROCS Combo Tanimoto
Shape Tanimoto Field (shape_tanimoto_field): Name of the field with the FastROCS Shape Tanimoto Score.
Type: field_parameter::float
Default: FastROCS Shape
Color Tanimoto Field (color_tanimoto_field): Name of the field with the FastROCS Color Tanimoto Score.
Type: field_parameter::float
Default: FastROCS Color
Bemis Murcko Field (bemis_murcko_field): Output field for the Bemis Murcko core SMILES.
Type: field_parameter::string
Default: Bemis Murcko SMILES
Bemis Murcko ID Field (bemis_murcko_id_field): Output field with an integer ID of the Bemis Murcko core. All molecules with the same Bemis Murcko core SMILES will have the same ID, and those with different Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Bemis Murcko core is seen. Thus, unlike the Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.
Type: field_parameter::int
Default: Bemis Murcko ID
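The order dependence of these IDs can be seen in a minimal sketch that assigns IDs in order of first appearance. It assumes the core SMILES strings have already been computed (computing Bemis Murcko cores requires a cheminformatics toolkit and is not shown), and `assign_core_ids` is a hypothetical name. The same logic applies to the Hetero Bemis Murcko ID field.

```python
from typing import Dict, Iterable, List


def assign_core_ids(core_smiles: Iterable[str]) -> List[int]:
    """Assign integer IDs to cores in order of first appearance, starting at 1."""
    ids: Dict[str, int] = {}
    result: List[int] = []
    for smi in core_smiles:
        if smi not in ids:
            ids[smi] = len(ids) + 1  # next unused ID
        result.append(ids[smi])
    return result


print(assign_core_ids(["c1ccccc1", "C1CCNCC1", "c1ccccc1"]))  # [1, 2, 1]
print(assign_core_ids(["C1CCNCC1", "c1ccccc1", "c1ccccc1"]))  # [1, 2, 2]
```

The benzene core gets ID 1 in the first ordering and ID 2 in the second, even though its SMILES is unchanged.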
Bemis Murcko Rank Field (bemis_murcko_rank_field): Integer field with the rank of the molecule within its Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Bemis Murcko core SMILES).
Type: field_parameter::int
Default: Bemis Murcko Rank
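A family rank can be computed in a single pass over the hit list. The sketch below assumes the input is already sorted best-first and that ranks start at 1; `family_ranks` is a hypothetical helper, and the same approach would apply to the Hetero Bemis Murcko Rank field.

```python
from collections import defaultdict
from typing import DefaultDict, List, Sequence


def family_ranks(cores_in_hit_list_order: Sequence[str]) -> List[int]:
    """Rank of each molecule within its core family, given a best-first hit list."""
    seen: DefaultDict[str, int] = defaultdict(int)
    ranks: List[int] = []
    for core in cores_in_hit_list_order:
        # Count how many family members have appeared so far, including this one.
        seen[core] += 1
        ranks.append(seen[core])
    return ranks


print(family_ranks(["A", "B", "A", "A", "B"]))  # [1, 1, 2, 3, 2]
```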
Hetero Bemis Murcko Field (hetero_bemis_murcko_field): Output field for the Hetero Bemis Murcko core SMILES.
Type: field_parameter::string
Default: Hetero Bemis Murcko
Hetero Bemis Murcko ID Field (hetero_bemis_murcko_id_field): Output field with an integer ID of the Hetero Bemis Murcko core. All molecules with the same Hetero Bemis Murcko core SMILES will have the same ID, and those with different Hetero Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Hetero Bemis Murcko core is seen. Thus, unlike the Hetero Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.
Type: field_parameter::int
Default: Hetero Bemis Murcko ID
Hetero Bemis Murcko Rank Field (hetero_bemis_murcko_rank_field): Integer field with the rank of the molecule within its Hetero Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Hetero Bemis Murcko core SMILES).
Type: field_parameter::int
Default: Hetero Bemis Murcko Rank
Development
Cutoff Sample Size Limit (cutoff_sample_size_limit): Hit list size threshold for the best FastROCS hits, above which sampling will be used to conserve memory.
Type: integer
Default: 10000000
Dock Random Subset ‘Item Count’ (dock_random_subset_item_count): Target molecule count (best effort) per shard group passed to the random dock.
Required
Type: integer
Default: 1000
Final Dock ‘Item Count’ (final_dock_item_count): Target molecule count (best effort) per shard group passed to the final docking.
Required
Type: integer
Default: 1000
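The ‘best effort’ wording suggests that whole shards are accumulated until the target count is reached, so group sizes can overshoot. The following sketch shows that reading; it is an assumption about the grouping strategy, not the floe's implementation, and the shard IDs and the `group_shards` name are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def group_shards(shards: Iterable[Tuple[str, int]], item_count: int) -> Iterator[List[str]]:
    """Yield groups of shard IDs whose molecule counts add up to at least item_count.

    Best effort: shards are never split, so a group may overshoot the target,
    and the final group may fall short of it.
    """
    group: List[str] = []
    total = 0
    for shard_id, count in shards:
        group.append(shard_id)
        total += count
        if total >= item_count:
            yield group
            group, total = [], 0
    if group:
        # Emit any leftover shards as a smaller final group.
        yield group


shards = [("s1", 400), ("s2", 700), ("s3", 300), ("s4", 200)]
print(list(group_shards(shards, 1000)))  # [['s1', 's2'], ['s3', 's4']]
```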
Cluster Poses Charge Model (cluster_poses_charge_model): Charge model to use in the electrostatic part of the 3D similarity calculation.
Type: string
Default: elf10
Choices: [‘elf10’, ‘mmff’, ‘input’]
Cluster Poses Tanimoto Threshold (cluster_poses_tanimoto_threshold): Tanimoto Similarity threshold used to determine cluster centers. Larger values will result in more clusters with fewer conformers/poses in each cluster that are more similar to each other.
Type: decimal
Default: 0.85
Cluster Poses Shape Falloff (cluster_poses_shape_falloff): Distance at which the Gaussian atom density is half its maximum value. This can be thought of roughly as the effective radius of the heavy atoms in the similarity model. Higher values mean that two poses whose atoms are not exactly on top of each other can still have a high similarity/Tanimoto.
Type: decimal
Default: 2.0
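Assuming the atom density is a Gaussian in the distance from the atom centre, normalized to 1 at the centre, the falloff r fixes its width: g(d) = exp(-ln 2 · d² / r²), which equals 0.5 at d = r. The sketch below evaluates only this single-atom density; the similarity itself involves overlaps between the densities of two poses and is not reproduced here. The distance units are presumably ångströms, and `gaussian_density` is a hypothetical name.

```python
import math


def gaussian_density(distance: float, falloff: float) -> float:
    """Single-atom Gaussian density: 1.0 at the centre, 0.5 at `falloff`."""
    if falloff <= 0:
        raise ValueError("falloff must be positive")
    # Choose the exponent so that exp(-alpha * falloff**2) == 0.5.
    alpha = math.log(2.0) / falloff ** 2
    return math.exp(-alpha * distance ** 2)


for falloff in (1.0, 2.0):
    print(falloff, round(gaussian_density(1.0, falloff), 3))
```

With the default falloff of 2.0, a point 1.0 from the atom centre still sees 2^(-1/4) ≈ 0.84 of the peak density (versus 0.5 with a falloff of 1.0), which is why larger falloffs tolerate poses whose atoms do not coincide exactly.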
Cluster Poses Charge Falloff (cluster_poses_charge_falloff): Distance at which the Gaussian atom charge density is half its maximum value. Higher values mean that atoms with different partial charges are more likely to be considered similar and that poses with the same shape but differing partial charges can have high similarity/Tanimoto.
Type: decimal
Default: 0.25
Docked Subset Hit List Size (docked_subset_hit_list_size): Desired size of the hit list. If this value is set to 0, nothing will be emitted from this cube.
Required
Type: integer
Default: 5000
FastROCS Shard Size (fastrocs_shard_size): Total count on the input shards to accumulate before emitting a group of shards
Type: integer
Default: 100000
Verbose FastROCS Cube (verbose_fastrocs_cube): If True, timing information will be sent to the log as the cube runs.
Type: boolean
Default: False
Choices: [True, False]
Hit List Instance Type (hit_list_instance_type): The type of instance that this cube needs to be run on
Type: string
Default: r5
Enable cube timing report (time_all_cubes): If True, this cube will emit timing information to the timing_data port.
Type: boolean
Default: True
Choices: [True, False]
Parallel Fast Fail Session Retry Timeout (parallel_fast_fail_session_retry_timeout): Sets the retry timeout (sec) on the cube_session OrionSession for this cube. If unspecified, parallel cubes will use a value of 600 and serial cubes will use a value of 7200.
Type: integer
Default: 120
Shard Download Attempts (shard_download_attempts): Number of attempts to make when downloading a shard
Type: integer
Default: 1
Parallel Session Retry Dict (parallel_session_retry_dict): Entry must be of the form ‘<status code>:<number of retries>’. Both <status code> and <number of retries> must be integer values.
Type: string
Default: [‘404:1’]
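A minimal sketch of validating and parsing entries in the documented ‘<status code>:<number of retries>’ form into a lookup table. How the floe consumes the resulting mapping is not documented here, and `parse_retry_entries` is a hypothetical helper.

```python
from typing import Dict, Iterable


def parse_retry_entries(entries: Iterable[str]) -> Dict[int, int]:
    """Convert entries like '404:1' into {status_code: number_of_retries}."""
    retries: Dict[int, int] = {}
    for entry in entries:
        code_text, sep, count_text = entry.partition(":")
        if not sep:
            raise ValueError(f"expected '<status code>:<number of retries>', got {entry!r}")
        # int() raises ValueError for non-integer parts, matching the documented constraint.
        retries[int(code_text)] = int(count_text)
    return retries


print(parse_retry_entries(["404:1"]))  # {404: 1}
```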