Gigadock
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Product-based/Gigadock
Role-based/Computational Chemist
Solution-based/Virtual-screening/DB Search/Gigadock
Task-based/Virtual Screening - Structure-Based
Description
Floe for docking up to billions of molecules into one or more conformations of a target protein structure.
If this Floe shuts down because it exceeded the cost threshold (see the ‘Cost Threshold’ parameter), a restart collection will be present at the end of the run. The Floe can be continued from the point at which it was shut down by re-running the Floe and passing the restart collection to ‘Input Conformer Collection’ in place of the original ‘Input Conformer Collection’ and/or ‘Input Conformer Dataset’, while using the same settings for all other parameters. The resulting output will include the complete results (unless the restart Floe also exceeds its cost threshold), so there is no need to merge results from the first run.
WARNING: Using the ‘Job Properties->Job Cost Limits->Terminate this job if the cost exceeds’ option with this floe will result in abrupt termination of the floe if the specified cost is exceeded. In this circumstance no output will be generated, the compute cost consumed by the floe will be wasted, and the job will not be restartable. Use the ‘Options->Cost Threshold’ parameter instead, which shuts the floe down in a controlled fashion that allows a stopped job to be restarted.
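The restart procedure described above amounts to reusing the original parameter set with one substitution. The following is a minimal sketch in Python that treats the promoted parameters as a plain dictionary. The helper `restart_parameters` and the dataset and collection names are hypothetical, and no Orion API call is shown.

```python
def restart_parameters(original, restart_collection):
    """Return parameters for a restart run of a cost-stopped floe.

    `original` maps promoted parameter names to the values used in the
    first run. The restart collection replaces the molecule inputs; all
    other settings are carried over unchanged.
    """
    params = dict(original)
    # The restart collection is passed where the conformer collection was.
    params["molecule_input_collection"] = restart_collection
    # An alternate dataset input from the first run is dropped, because the
    # restart collection stands in for it as well.
    params.pop("molecule_input_dataset", None)
    return params


first_run = {
    "init_input_dataset": "my_design_units",       # hypothetical name
    "molecule_input_collection": "my_collection",  # hypothetical name
    "docking_method": "Fred",
    "floe_cost_threshold": 1000.0,
}
second_run = restart_parameters(first_run, "Restart")
```

Only the molecule input changes between the two runs; the first run's dictionary is left untouched so it can be kept as a record of the original settings.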
Promoted Parameters
Title in user interface (promoted name)
Inputs
Design Unit or Receptor Dataset(s) (init_input_dataset): Dataset with the design unit (DU) (or old format receptor) to dock to. Multiple design units are allowed, up to a limit of 10 for the Hybrid docking method (see the ‘Docking Method’ parameter) and 2 otherwise. The behavior with multiple design units depends on the docking method. For ‘Fred’ or ‘FastFred’, each molecule will be docked to each design unit and the results from the best scoring design unit will be output; thus docking time (and cost) will scale roughly linearly with the number of design units. For ‘Hybrid’, each molecule will be docked only into the design unit whose crystallographic bound ligand is most similar (by ROCS Combo Tanimoto) to the molecule being docked, and docking time (and cost) will increase by roughly 5% per additional design unit.
Required
Type: data_source
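The scaling rules above can be turned into a rough planning estimate. The sketch below is illustrative only: the function name `relative_docking_cost` is hypothetical, and it assumes the roughly 5% Hybrid overhead is additive per extra design unit rather than compounding, which the description does not specify.

```python
def relative_docking_cost(method, n_design_units):
    """Rough cost multiplier relative to docking to one design unit."""
    if n_design_units < 1:
        raise ValueError("at least one design unit is required")
    if method == "Hybrid":
        # Roughly 5% extra per design unit beyond the first (assumed additive).
        return 1.0 + 0.05 * (n_design_units - 1)
    # 'Fred' and 'FastFred' dock every molecule to every design unit.
    return float(n_design_units)


fred_two_units = relative_docking_cost("Fred", 2)
hybrid_ten_units = relative_docking_cost("Hybrid", 10)
```

Under these assumptions, two design units with ‘Fred’ roughly double the cost, while ten design units with ‘Hybrid’ add a little under half again.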
Input Conformer Collection (molecule_input_collection): Input collection containing molecules to dock. The collection should have been created by the ‘Prepare Giga Collections’ floe. Several large pre-generated 3rd party vendor docking collections can be made available to your organization upon request, at no charge, by e-mailing support@eyesopen.com (if your organization has already requested them, you will already have access to these pre-generated collections). The collection will be located in the ‘Organization Data->OpenEye Data->Gigadocking Collections’ folder, which also automatically contains several smaller collections and collections containing random subsets of the larger vendor collections.
Type: collection_source
Actives Dataset (Optional) (actives_in): Dataset containing conformer-enumerated known active molecules. If specified, these molecules will be docked to the receptor and their scores compared to the other input molecules in a generated floe report. A dataset named ‘AUC’ will also be created with the AUC of the actives compared to the input molecules.
Type: data_source
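To make the AUC output concrete, the sketch below computes a ROC AUC by direct pairwise comparison of active and non-active scores. It assumes that lower (more negative) docking scores are better and that ties count as half; it is not necessarily how the floe computes its reported value, and the pairwise loop is only practical for small illustrative inputs.

```python
def roc_auc(active_scores, other_scores):
    """Probability that a random active outscores a random other molecule.

    Assumes lower (more negative) scores are better. Ties count as half.
    """
    if not active_scores or not other_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for a in active_scores:
        for d in other_scores:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(other_scores))


# Illustrative scores only; not results from any real run.
auc = roc_auc([-12.0, -9.5], [-10.0, -8.0, -7.5])
```

An AUC of 0.5 corresponds to actives scoring no better than the other input molecules, and 1.0 to every active scoring better than every other molecule.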
Inputs: Alternate
Input Conformer Dataset (molecule_input_dataset): Input dataset of molecules to dock that can be used in place of the normal Giga Docking collection input. The molecules must be conformer-enumerated (this can be done with the ‘Classic Omega’ floe).
Type: data_source
Outputs
Hit List Dataset (hit_list_output_dataset): Output dataset with the top 10000 scoring docked molecules.
Required
Type: dataset_out
Default: Hit List
Raw Results Collection (raw_results_collection): Output OERecord collection with all docked input molecules.
Required
Type: collection_sink
Default: Raw Results
Output Design Unit(s) Dataset (output_design_units_dataset): Output dataset containing a copy of the design unit(s) docked to.
Required
Type: dataset_out
Default: Receptor
AUC Dataset (auc_dataset): Output dataset with the AUC. This dataset will only be created if active molecules were passed to the Floe.
Required
Type: dataset_out
Default: AUC
Docked Actives Dataset (docked_actives_dataset): Output dataset with the docked actives. This dataset will only be created if active molecules were passed to the Floe.
Required
Type: dataset_out
Default: Docked Actives
Restart Collection (restart_collection): Name of the restart collection. The floe will create this collection to store the data needed to resume a run that was stopped due to exceeding the cost threshold. This collection will automatically be deleted by the floe if the floe finishes without exceeding the threshold.
Required
Type: collection_sink
Default: Restart
Options
Cost Threshold ($USD) (floe_cost_threshold): If the cost estimate of the floe exceeds this value (in U.S. Dollars) the floe will shut down. All molecules from the input collection or dataset will stop being sent for processing. Molecules already processed, or currently processing on an instance, will be output normally. A restart collection will be created that can be used to continue the calculation in a subsequent floe run. Note that the final cost of the run will exceed this threshold, as molecules currently out for processing when this threshold is crossed will finish processing before the floe is shut down.
Type: decimal
Default: 1000.0
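The controlled shutdown described for ‘Cost Threshold’ explains why the final cost overshoots the threshold. The toy simulation below illustrates this; the function `run_with_cost_threshold`, the notion of fixed-cost batches, and the `in_flight` count are hypothetical simplifications, not the floe's actual scheduling.

```python
def run_with_cost_threshold(batch_costs, threshold, in_flight=2):
    """Simulate a controlled shutdown once the running cost passes a threshold.

    batch_costs: cost of each batch of molecules, in dispatch order.
    in_flight:   number of batches already out for processing at any time
                 (a hypothetical stand-in for work running on instances).
    Returns (final_cost, number_of_batches_left_for_restart).
    """
    spent = 0.0
    for i, cost in enumerate(batch_costs):
        spent += cost
        if spent > threshold:
            # Stop dispatching, but let batches already in flight finish.
            finishing = batch_costs[i + 1 : i + 1 + in_flight]
            spent += sum(finishing)
            remaining = len(batch_costs) - (i + 1 + len(finishing))
            return spent, remaining
    return spent, 0


final_cost, remaining = run_with_cost_threshold([300.0] * 8, threshold=1000.0)
```

In this example the threshold is crossed on the fourth batch, two more batches finish, and the final cost is 1800.0 with two batches left for a restart run.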
Docking Method (docking_method): Docking method to use. ‘Fred’ is the default structure-based scoring method. ‘Hybrid’ biases the docking towards poses that overlay the crystallographic ligand (the design unit(s) must have a bound ligand). ‘FastFred’ is a faster variant of ‘Fred’ (typically ~2x faster for single design units) that samples less and uses a simpler scoring function in the initial stages of docking.
Type: string
Default: Fred
Choices: [‘Fred’, ‘Hybrid’, ‘Fast Fred’]
Output Tag(s) (output_user_tags): If specified, the tag(s) will be applied to all output datasets and collections.
Type: string
Default: []
Input Fields
Design Unit Field (design_unit_field): Design Unit field on the dataset(s) passed to the ‘Design Unit or Receptor Dataset(s)’ parameter. If unspecified, each record will be automatically searched for a single design unit to use, and if a record contains multiple design units an error will be thrown. Note that the dataset(s) passed to ‘Design Unit or Receptor Dataset(s)’ can contain either a receptor or a design unit; they do not need both.
Type: field_parameter
Receptor Field (receptor_field): Molecule field on the dataset(s) passed to the ‘Design Unit or Receptor Dataset(s)’ parameter to look for an old style OE receptor to dock to. If unspecified, the default primary molecule field will be searched for a receptor. Note that the dataset(s) passed to ‘Design Unit or Receptor Dataset(s)’ can contain either a receptor or a design unit; they do not need both.
Type: field_parameter::mol
Input Conformer Field (input_conf_field): Field on the input record containing the input conformers. Default is the primary molecule field. If specified, this applies to all input conformers, i.e., molecules passed to ‘Input Conformer Collection’, ‘Input Conformer Dataset’, and ‘Actives’. This field generally only needs to be specified if you are docking a dataset, rather than a collection, AND the conformers you want to dock are not in the default primary molecule field.
Type: field_parameter::mol
Output Fields
Docked Molecule Field (docked_mol_field): Field on the output records where the docked molecule will be placed. Default is the primary molecule field.
Type: field_parameter::mol
Design Unit ID Field (design_unit_id_field): Output field with the ID of the design unit the molecule scores best in.
Type: field_parameter::int
Default: Design Unit ID
Design Unit Link Field (design_unit_link_field): Output field with a Link to the design unit the molecule scores best in
Type: field_parameter::link
Default: Design Unit Link
Docked Score Field (score_field): Field on the output record where the docked score will be placed
Type: field_parameter::float
Default: Chemgauss4
Bemis Murcko Field (bemis_murcko_field): Output field for the Bemis Murcko core SMILES.
Type: field_parameter::string
Default: Bemis Murcko SMILES
Bemis Murcko ID Field (bemis_murcko_id_field): Output field with an integer ID of the Bemis Murcko core. All molecules with the same Bemis Murcko core SMILES will have the same ID, and those with different Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Bemis Murcko core is seen. Thus, unlike the Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.
Type: field_parameter::int
Default: Bemis Murcko ID
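The order dependence of the ID can be seen in a short sketch. The function `assign_core_ids` is hypothetical and the SMILES strings are illustrative; the same first-seen numbering would apply to the Hetero Bemis Murcko ID.

```python
def assign_core_ids(core_smiles_in_order):
    """Assign integer IDs to core SMILES in first-seen order, starting at 1."""
    ids = {}
    assigned = []
    for smiles in core_smiles_in_order:
        if smiles not in ids:
            # A new core gets the next unused integer.
            ids[smiles] = len(ids) + 1
        assigned.append(ids[smiles])
    return assigned


benzene, pyridine = "c1ccccc1", "c1ccncc1"
first_order = assign_core_ids([benzene, pyridine, benzene])   # [1, 2, 1]
second_order = assign_core_ids([pyridine, benzene, benzene])  # [1, 2, 2]
```

The same three molecules receive different IDs in the two runs because benzene is seen first in one and pyridine first in the other, so IDs should not be compared across runs.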
Bemis Murcko Rank Field (bemis_murcko_rank_field): Integer field with the rank of the molecule within its Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Bemis Murcko core SMILES).
Type: field_parameter::int
Default: Bemis Murcko Rank
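The within-family rank can be sketched as a group-then-sort operation. The function `rank_within_family` is hypothetical, the hits are illustrative, and the sketch assumes lower scores are better, so rank 1 is the lowest score among hits sharing a core.

```python
from collections import defaultdict


def rank_within_family(hits):
    """Return the per-core rank for each (core_smiles, score) hit.

    Assumes lower scores are better. Ranks are returned in the same
    order as `hits`.
    """
    by_core = defaultdict(list)
    for index, (core, score) in enumerate(hits):
        by_core[core].append((score, index))
    ranks = [0] * len(hits)
    for members in by_core.values():
        # Sort each family by score and number its members from 1.
        for rank, (_, index) in enumerate(sorted(members), start=1):
            ranks[index] = rank
    return ranks


hits = [("c1ccccc1", -9.0), ("c1ccncc1", -11.0), ("c1ccccc1", -12.5)]
ranks = rank_within_family(hits)  # [2, 1, 1]
```

Filtering a hit list to rank 1 would then keep a single best-scoring representative per core, which is one way such a field can be used to diversify a selection.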
Hetero Bemis Murcko Field (hetero_bemis_murcko_field): Output field for the Hetero Bemis Murcko core SMILES.
Type: field_parameter::string
Default: Hetero Bemis Murcko
Hetero Bemis Murcko ID Field (hetero_bemis_murcko_id_field): Output field with an integer ID of the Hetero Bemis Murcko core. All molecules with the same Hetero Bemis Murcko core SMILES will have the same ID, and those with different Hetero Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Hetero Bemis Murcko core is seen. Thus, unlike the Hetero Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.
Type: field_parameter::int
Default: Hetero Bemis Murcko ID
Hetero Bemis Murcko Rank Field (hetero_bemis_murcko_rank_field): Integer field with the rank of the molecule within its Hetero Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Hetero Bemis Murcko core SMILES).
Type: field_parameter::int
Default: Hetero Bemis Murcko Rank
Steric Score Field (steric_score_field): Output field with the steric score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type: field_parameter::float
Clash Score Field (clash_score_field): Output field with the clash score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type: field_parameter::float
Protein Desolv Score Field (protein_desolv_score_field): Output field with the protein desolvation score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type: field_parameter::float
Ligand Desolv Score Field (ligand_desolv_score_field): Output field with the ligand desolvation score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type: field_parameter::float
Ligand Desolv HB Score Field (ligand_desolv_hb_score_field): Output field with the ligand desolvation hydrogen bond score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type: field_parameter::float
Hydrogen Bond Score Field (hydrogen_bond_score_field): Output field with the hydrogen bond score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type: field_parameter::float
Development
Catch exceptions (catch_exceptions): If Off, exception handling will be disabled for this cube.
Type: boolean
Default: True
Choices: [True, False]
Catch exceptions (parallel_catch_exception_methods): Specifies the methods of a parallel cube in which an exception will be caught and emitted to the exception port, if the port is connected. If the exception port is connected to an exception handler, this will stop the floe.
Type: string
Default: [‘begin’]
Choices: [‘begin’, ‘process’, ‘end’]
Enable cube timing report (time_all_cubes): If true this cube will emit timing information to the timing_data port.
Type: boolean
Default: True
Choices: [True, False]
Min Shard Download Timeout (min_shard_download_timeout): Minimum timeout for the smart shard to records cubes and cost check cubes (the latter are not ‘smart’)
Required
Type: integer
Default: 30.0
Max Shard Download Timeout (max_shard_download_timeout): Maximum timeout for the smart shard to records cubes
Required
Type: integer
Default: 21600.0
Session Retry Dict for Shard Download (session_retry_dict_for_shard_download_): Session retry dict for the smart shard to records cubes
Type: string
Default: [‘429:1000’, ‘460:1000’, ‘500:1000’, ‘502:1000’, ‘503:1000’, ‘504:1000’]
Shard Download Attempts (shard_download_attempts): Download attempts for the smart shard to records cubes
Type: integer
Default: 1
Min Shard Upload Timeout (min_shard_upload_timeout): Minimum timeout for the smart records to record shard cubes
Required
Type: integer
Default: 2
Max Shard Upload Timeout (max_shard_upload_timeout): Maximum timeout for the smart records to record shard cubes
Required
Type: integer
Default: 21600.0
Session Retry Dict for Shard Upload (session_retry_dict_for_shard_upload_): Session retry dict for the smart records to record shard cubes
Type: string
Default: [‘429:1000’, ‘460:1000’, ‘500:1000’, ‘502:1000’, ‘503:1000’, ‘504:1000’]
Shard Upload Attempts (shard_upload_attempts): Upload attempts for the smart records to record shard cubes
Type: integer
Default: 3
Serial Cube Retry Timeout (serial_cube_retry_timeout): Sets the retry timeout (sec) on the cube_session OrionSession for this cube. If unspecified, parallel cubes will use a value of 600 and serial cubes will use a value of 7200.
Type: integer
Parallel Scaling Group (parallel_instance_type): If specified, parallel cubes will only scale up on the specified spot scaling group. If unspecified, Orion will select the scaling groups to scale up on, possibly on multiple scaling group types but always spot instances.
Type: string
Default: c5,c6,c7
Max CPUs (primary_max_parallel): Maximum number of CPUs to use for the primary calculation. This is an upper bound; the actual number used can be lower, e.g., if a relatively small number of molecules is sent to the floe.
Type: integer
Default: 25000
Docked Record Cache Size (docked_record_cache_size): Number of records to accumulate before docking the accumulated records. If set to 0 records will only be docked when end is called.
Type: integer
Default: 1000
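The caching behavior, including the special meaning of 0, can be sketched with a small accumulator. The class `RecordCache` and its `handler` callback are hypothetical stand-ins for the cube's internals; the method names `process` and `end` mirror the cube methods listed under the ‘Catch exceptions’ choices.

```python
class RecordCache:
    """Accumulate records and hand them off in batches."""

    def __init__(self, cache_size, handler):
        self.cache_size = cache_size
        self.handler = handler  # called with each flushed batch (a list)
        self.records = []

    def process(self, record):
        self.records.append(record)
        # A cache size of 0 disables flushing until end() is called.
        if self.cache_size > 0 and len(self.records) >= self.cache_size:
            self._flush()

    def end(self):
        # Hand off whatever is left, if anything.
        if self.records:
            self._flush()

    def _flush(self):
        self.handler(self.records)
        self.records = []


batches = []
cache = RecordCache(cache_size=3, handler=batches.append)
for record in range(7):
    cache.process(record)
cache.end()
```

With a cache size of 3 and seven records, the handler receives two full batches followed by one final batch of a single record when `end` is called.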
Non Hybrid Receptor Limit (non_hybrid_receptor_limit): Limit on the total number of design units that can be passed to this cube if hybrid design unit selection is not being used. If more than this number are passed in, those past the limit will be sent to the failure port.
Type: integer
Default: 2
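The effect of this limit can be sketched as a simple split of the incoming design units. The function `apply_non_hybrid_limit` is hypothetical, and the second list it returns only stands in for records sent to the failure port.

```python
def apply_non_hybrid_limit(design_units, use_hybrid, limit=2):
    """Split design units into (accepted, sent_to_failure).

    The limit applies only when hybrid design unit selection is not used.
    """
    if use_hybrid:
        return list(design_units), []
    # Design units are accepted in arrival order until the limit is reached.
    return list(design_units[:limit]), list(design_units[limit:])


accepted, failed = apply_non_hybrid_limit(["du1", "du2", "du3"], use_hybrid=False)
```

Here the third design unit is rejected; which design units fall past the limit depends on the order in which they arrive, so it is safer to pass no more than the limit in the first place.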
Dock cube memory in MB (dock_cube_memory_in_mb): The minimum amount of memory in MiBs (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
Type: decimal
Default: 1800