Gigadock

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Product-based/Gigadock

  • Role-based/Computational Chemist

  • Solution-based/Virtual-screening/DB Search/Gigadock

  • Task-based/Virtual Screening - Structure-Based

Description

Floe for docking up to billions of molecules into one or more conformations of a target protein structure.

If this Floe shuts down because it exceeded the cost threshold (see the ‘Cost Threshold’ parameter), a restart collection will be present at the end of the run. The Floe can be continued from the point at which it was shut down by re-running the Floe and passing the restart collection to the ‘Input Conformer Collection’ parameter in place of the original ‘Input Conformer Collection’ and/or ‘Input Conformer Dataset’, while using the same settings for all other parameters. The resulting output will include the complete results (unless the restart Floe also exceeds its cost threshold), so there is no need to merge results from the first run.
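
As a rough illustration of the restart rule above, the sketch below builds the parameter set for a restart run from the parameter set of the first run. It is plain Python operating on a dictionary, not an Orion API call; the keys are the promoted names listed on this page, while the helper name restart_parameters and the dataset and collection values are hypothetical.

```python
def restart_parameters(original: dict, restart_collection: str) -> dict:
    """Return a copy of the first run's parameters, adjusted for a restart.

    Assumption: parameters are held in a plain dict keyed by promoted name.
    """
    params = dict(original)
    # The restart collection takes the place of the original molecule input(s).
    params["molecule_input_collection"] = restart_collection
    params.pop("molecule_input_dataset", None)
    # Every other setting is deliberately left unchanged.
    return params


# Hypothetical values, for illustration only.
first_run = {
    "init_input_dataset": "my-receptor",
    "molecule_input_collection": "vendor-collection",
    "docking_method": "Fred",
    "floe_cost_threshold": 1000.0,
}
second_run = restart_parameters(first_run, "Restart")
```

Only the molecule input changes between the two runs; the receptor, docking method, and cost threshold carry over as they were.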

WARNING: Using the ‘Job Properties->Job Cost Limits->Terminate this job if the cost exceeds’ option with this floe will result in abrupt termination of the floe if the specified cost is exceeded. In that case no output will be generated, the compute cost consumed by the floe will be wasted, and the job will not be restartable. Use the ‘Options->Cost Threshold’ parameter instead, which shuts the floe down in a controlled fashion that allows a stopped job to be restarted.

Promoted Parameters

Title in user interface (promoted name)

Inputs

Design Unit or Receptor Dataset(s) (init_input_dataset): Dataset with the design unit (DU) (or old format receptor) to dock to. Multiple design units are allowed, up to a limit of 10 for the Hybrid docking method (see the ‘Docking Method’ parameter) and 2 otherwise. The behavior with multiple design units depends on the docking method. For ‘Fred’ or ‘FastFred’, each molecule will be docked to each design unit and the results from the best scoring design unit will be output; thus docking time (and cost) will scale roughly linearly with the number of design units. For ‘Hybrid’, each molecule will be docked only into the design unit whose crystallographic bound ligand is most similar (by ROCS Combo Tanimoto) to the molecule being docked, and docking time (and cost) will increase by roughly 5% per additional design unit.

  • Required

  • Type: data_source
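
The scaling described for multiple design units can be turned into a back-of-the-envelope estimate. The sketch below is an assumption-laden illustration, not the floe's own cost model: the function name is hypothetical, the 5% increase for ‘Hybrid’ is treated as additive per extra design unit, and the linear scaling for the other methods is treated as exact even though the text only says "roughly".

```python
def relative_docking_cost(method: str, n_design_units: int) -> float:
    """Estimate cost relative to docking against a single design unit."""
    # Limits from the parameter description: 10 for Hybrid, 2 otherwise.
    limit = 10 if method == "Hybrid" else 2
    if not 1 <= n_design_units <= limit:
        raise ValueError(
            f"{method} accepts between 1 and {limit} design units, "
            f"got {n_design_units}"
        )
    if method == "Hybrid":
        # Assumed additive: roughly +5% for each design unit after the first.
        return 1.0 + 0.05 * (n_design_units - 1)
    # Fred and FastFred dock every molecule into every design unit.
    return float(n_design_units)
```

Under these assumptions, two design units with ‘Fred’ cost about twice as much as one, whereas ten design units with ‘Hybrid’ cost about 1.45 times as much as one.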

Input Conformer Collection (molecule_input_collection): Input collection containing molecules to dock. The collection should have been created by the ‘Prepare Giga Collections’ floe. Several large pre-generated third-party vendor docking collections can be made available to your organization at no charge upon request by e-mailing support@eyesopen.com (if your organization has already requested them, you will already have access to these pre-generated collections). The collections will be located in the ‘Organization Data->OpenEye Data->Gigadocking Collections’ folder, which also automatically contains several smaller collections and collections containing random subsets of the larger vendor collections.

  • Type: collection_source

Actives Dataset (Optional) (actives_in): Dataset containing conformer-enumerated known active molecules. If specified, these molecules will be docked to the receptor and their scores compared to the other input molecules in a generated floe report. A dataset named ‘AUC’ will also be created with the AUC of the actives compared to the input molecules.

  • Type: data_source
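
For readers unfamiliar with the AUC reported for the actives, the sketch below shows one common way to compute a ROC AUC from two lists of docking scores. How the floe itself computes the value is not stated on this page; the function name is hypothetical, and the code assumes that lower (more negative) docking scores are better.

```python
def roc_auc(active_scores, other_scores):
    """Probability that a random active outscores a random other molecule.

    Assumes lower scores are better; ties count as half a win.
    """
    if not active_scores or not other_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for active in active_scores:
        for other in other_scores:
            if active < other:
                wins += 1.0
            elif active == other:
                wins += 0.5
    return wins / (len(active_scores) * len(other_scores))
```

A value of 1.0 means every active scores better than every other input molecule, and 0.5 is what random ranking would give. The pairwise loop is quadratic, so it is only suitable for small illustrative inputs.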

Inputs: Alternate

Input Conformer Dataset (molecule_input_dataset): Input dataset of molecules to dock that can be used in place of the normal Giga Docking collection input. The molecules must be conformer-enumerated (this can be done with the ‘Classic Omega’ floe).

  • Type: data_source

Outputs

Hit List Dataset (hit_list_output_dataset): Output dataset with the top 10000 scoring docked molecules.

  • Required

  • Type: dataset_out

  • Default: Hit List
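
Keeping only the best-scoring molecules from a very large stream does not require sorting everything. The sketch below is one way to do it with the standard library, not a description of the floe's internals; the record layout (a dict with a "score" key) is hypothetical, and lower scores are assumed to be better.

```python
import heapq


def top_hits(records, n=10000):
    """Return the n best-scoring records, best first (lower score is better)."""
    # heapq.nsmallest keeps at most n records in memory at a time.
    return heapq.nsmallest(n, records, key=lambda record: record["score"])
```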

Raw Results Collection (raw_results_collection): Output OERecord collection with all docked input molecules.

  • Required

  • Type: collection_sink

  • Default: Raw Results

Output Design Unit(s) Dataset (output_design_units_dataset): Output dataset containing a copy of the design unit(s) docked to.

  • Required

  • Type: dataset_out

  • Default: Receptor

AUC Dataset (auc_dataset): Output dataset with the AUC. This dataset will only be created if active molecules were passed to the Floe.

  • Required

  • Type: dataset_out

  • Default: AUC

Docked Actives Dataset (docked_actives_dataset): Output dataset with the docked actives. This dataset will only be created if active molecules were passed to the Floe.

  • Required

  • Type: dataset_out

  • Default: Docked Actives

Restart Collection (restart_collection): Name of the restart collection. The floe will create this collection to store the data needed to resume a run that was stopped due to exceeding the cost threshold. This collection will automatically be deleted by the floe if it finishes without exceeding the threshold.

  • Required

  • Type: collection_sink

  • Default: Restart

Options

Cost Threshold ($USD) (floe_cost_threshold): If the cost estimate of the floe exceeds this value (in U.S. Dollars) the floe will shut down. All molecules from the input collection or dataset will stop being sent for processing. Molecules already processed, or currently processing on an instance, will be output normally. A restart collection will be created that can be used to continue the calculation in a subsequent floe run. Note that the final cost of the run will exceed this threshold, as molecules currently out for processing when this threshold is crossed will finish processing before the floe is shut down.

  • Type: decimal

  • Default: 1000.0
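
The reason the final cost exceeds the threshold can be seen in a toy simulation. The sketch below is a simplified model under stated assumptions, not the floe's scheduler: work is split into batches with known costs, batches are billed in order, and a fixed number of batches (the hypothetical in_flight argument) are already out for processing when the threshold is crossed.

```python
def simulate_cost_threshold(batch_costs, threshold, in_flight=2):
    """Return (final_cost, batches_processed, batches_left_for_restart)."""
    spent = 0.0
    processed = 0
    for index, cost in enumerate(batch_costs):
        spent += cost
        processed = index + 1
        if spent > threshold:
            # Stop sending new work, but batches already out still finish
            # and are still billed.
            tail = batch_costs[processed:processed + in_flight]
            spent += sum(tail)
            processed += len(tail)
            break
    # Whatever was never sent would go into the restart collection.
    remaining = len(batch_costs) - processed
    return spent, processed, remaining
```

For example, with eight batches costing 300 each, a threshold of 1000, and two batches in flight, this model stops after six batches at a final cost of 1800 and leaves two batches for the restart run.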

Docking Method (docking_method): Docking method to use. ‘Fred’ is the default structure-based scoring method. ‘Hybrid’ biases the docking towards poses that overlay the crystallographic ligand (the design unit(s) must have a bound ligand). ‘FastFred’ is a faster variant of ‘Fred’ (typically ~2x faster for single design units) that samples less and uses a simpler scoring function in the initial stages of docking.

  • Type: string

  • Default: Fred

  • Choices: [‘Fred’, ‘Hybrid’, ‘Fast Fred’]

Output Tag(s) (output_user_tags): If specified, the tag(s) will be applied to all output datasets and collections.

  • Type: string

  • Default: []

Input Fields

Design Unit Field (design_unit_field): Design Unit field on the dataset(s) passed to the ‘Design Unit or Receptor Dataset(s)’ parameter. If unspecified, each record will be automatically searched for a single design unit to use, and if a record contains multiple design units an error will be thrown. Note that the dataset(s) passed to ‘Design Unit or Receptor Dataset(s)’ can contain either a receptor or a design unit; they do not need both.

  • Type: field_parameter

Receptor Field (receptor_field): Molecule field on the dataset(s) passed to the ‘Design Unit or Receptor Dataset(s)’ parameter to look for an old style OE receptor to dock to. If unspecified, the default primary molecule field will be searched for a receptor. Note that the dataset(s) passed to ‘Design Unit or Receptor Dataset(s)’ can contain either a receptor or a design unit; they do not need both.

  • Type: field_parameter::mol

Input Conformer Field (input_conf_field): Field on the input record containing the input conformers. Default is the primary molecule field. If specified, this applies to all input conformers, i.e., molecules passed to ‘Input Conformer Collection’, ‘Input Conformer Dataset’, and ‘Actives Dataset (Optional)’. This field generally only needs to be specified if you are docking a dataset, rather than a collection, AND the conformers you want to dock are not in the default primary molecule field.

  • Type: field_parameter::mol

Output Fields

Docked Molecule Field (docked_mol_field): Field on the output records where the docked molecule will be placed. Default is the primary molecule field.

  • Type: field_parameter::mol

Design Unit ID Field (design_unit_id_field): Output field with the ID of the design unit the molecule scores best in.

  • Type: field_parameter::int

  • Default: Design Unit ID

Design Unit Link Field (design_unit_link_field): Output field with a Link to the design unit the molecule scores best in

  • Type: field_parameter::link

  • Default: Design Unit Link

Docked Score Field (score_field): Field on the output record where the docked score will be placed

  • Type: field_parameter::float

  • Default: Chemgauss4

Bemis Murcko Field (bemis_murcko_field): Output field for the Bemis Murcko core SMILES.

  • Type: field_parameter::string

  • Default: Bemis Murcko SMILES

Bemis Murcko ID Field (bemis_murcko_id_field): Output field with an integer ID of the Bemis Murcko core. All molecules with the same Bemis Murcko core SMILES will have the same ID, and those with different Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Bemis Murcko core is seen. Thus, unlike the Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.

  • Type: field_parameter::int

  • Default: Bemis Murcko ID
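
The ID numbering described above, and its dependence on record order, can be sketched in a few lines. This is an illustration of the stated rule rather than the floe's code; the function name is hypothetical and the core SMILES strings are assumed to have been computed already.

```python
def assign_core_ids(core_smiles_in_order):
    """Give each distinct core SMILES the next integer ID, starting at 1."""
    ids = {}
    assigned = []
    for smiles in core_smiles_in_order:
        if smiles not in ids:
            # First time this core is seen: it gets the next unused ID.
            ids[smiles] = len(ids) + 1
        assigned.append(ids[smiles])
    return assigned
```

Passing the cores c1ccccc1, C1CCCCC1, c1ccccc1 in that order gives the IDs 1, 2, 1. If the records arrived with C1CCCCC1 first, it would receive ID 1 instead, which is why the ID is not a stable identifier across runs in the way the SMILES is.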

Bemis Murcko Rank Field (bemis_murcko_rank_field): Integer field with the rank of the molecule within its Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Bemis Murcko core SMILES).

  • Type: field_parameter::int

  • Default: Bemis Murcko Rank
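
The within-family rank can be illustrated as follows. This is a minimal sketch, assuming that rank 1 is the best-scoring molecule in its family and that lower docking scores are better; the function name and the (core SMILES, score) tuple layout are hypothetical.

```python
from collections import defaultdict


def family_ranks(hits):
    """Return, for each (core_smiles, score) hit, its rank within its core family."""
    seen_per_core = defaultdict(int)
    ranks = [0] * len(hits)
    # Walk the hits from best to worst score, counting per core as we go.
    best_first = sorted(range(len(hits)), key=lambda i: hits[i][1])
    for i in best_first:
        core = hits[i][0]
        seen_per_core[core] += 1
        ranks[i] = seen_per_core[core]
    return ranks
```

Filtering a hit list to rank 1 keeps the single best molecule of each core, which is one way such a field can be used to pick a structurally diverse subset.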

Hetero Bemis Murcko Field (hetero_bemis_murcko_field): Output field for the Hetero Bemis Murcko core SMILES.

  • Type: field_parameter::string

  • Default: Hetero Bemis Murcko

Hetero Bemis Murcko ID Field (hetero_bemis_murcko_id_field): Output field with an integer ID of the Hetero Bemis Murcko core. All molecules with the same Hetero Bemis Murcko core SMILES will have the same ID, and those with different Hetero Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Hetero Bemis Murcko core is seen. Thus, unlike the Hetero Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.

  • Type: field_parameter::int

  • Default: Hetero Bemis Murcko ID

Hetero Bemis Murcko Rank Field (hetero_bemis_murcko_rank_field): Integer field with the rank of the molecule within its Hetero Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Hetero Bemis Murcko core SMILES).

  • Type: field_parameter::int

  • Default: Hetero Bemis Murcko Rank

Steric Score Field (steric_score_field): Output field with the steric score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Clash Score Field (clash_score_field): Output field with the clash score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Protein Desolv Score Field (protein_desolv_score_field): Output field with the protein desolvation score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Ligand Desolv Score Field (ligand_desolv_score_field): Output field with the ligand desolvation score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Ligand Desolv HB Score Field (ligand_desolv_hb_score_field): Output field with the ligand desolvation hydrogen bond score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Hydrogen Bond Score Field (hydrogen_bond_score_field): Output field with the hydrogen bond score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Development

Catch exceptions (catch_exceptions): If Off, exception handling will be disabled for this cube.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Catch exceptions (parallel_catch_exception_methods): Specifies the methods of a parallel cube in which an exception will be caught and emitted to the exception port, if the port is connected. If the exception port is connected to an exception handler, this will stop the floe.

  • Type: string

  • Default: [‘begin’]

  • Choices: [‘begin’, ‘process’, ‘end’]

Enable cube timing report (time_all_cubes): If true this cube will emit timing information to the timing_data port.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Min Shard Download Timeout (min_shard_download_timeout): Minimum timeout for the smart shard to records cubes and cost check cubes (the latter are not ‘smart’).

  • Required

  • Type: integer

  • Default: 30.0

Max Shard Download Timeout (max_shard_download_timeout): Maximum timeout for the smart shard to records cubes

  • Required

  • Type: integer

  • Default: 21600.0

Session Retry Dict for Shard Download (session_retry_dict_for_shard_download_): Session retry dict for the smart shard to records cubes

  • Type: string

  • Default: [‘429:1000’, ‘460:1000’, ‘500:1000’, ‘502:1000’, ‘503:1000’, ‘504:1000’]

Shard Download Attempts (shard_download_attempts): Download attempts for the smart shard to records cubes

  • Type: integer

  • Default: 1

Min Shard Upload Timeout (min_shard_upload_timeout): Minimum timeout for the smart records to record shard cubes

  • Required

  • Type: integer

  • Default: 2

Max Shard Upload Timeout (max_shard_upload_timeout): Maximum timeout for the smart records to record shard cubes

  • Required

  • Type: integer

  • Default: 21600.0

Session Retry Dict for Shard Upload (session_retry_dict_for_shard_upload_): Session retry dict for the smart records to record shard cubes

  • Type: string

  • Default: [‘429:1000’, ‘460:1000’, ‘500:1000’, ‘502:1000’, ‘503:1000’, ‘504:1000’]

Shard Upload Attempts (shard_upload_attempts): Upload attempts for the smart records to record shard cubes

  • Type: integer

  • Default: 3

Serial Cube Retry Timeout (serial_cube_retry_timeout): Sets the retry timeout (sec) on the cube_session OrionSession for this cube. If unspecified, parallel cubes will use a value of 600 and serial cubes will use a value of 7200.

  • Type: integer

Parallel Scaling Group (parallel_instance_type): If specified, parallel cubes will only scale up on the specified spot scaling group. If unspecified, Orion will select the scaling groups to scale up on, possibly on multiple scaling group types but always spot instances.

  • Type: string

  • Default: c5,c6,c7

Max CPUs (primary_max_parallel): Maximum number of CPUs to use for the primary calculation. This is an upper bound; the actual number used can be lower, e.g., if a relatively small number of molecules is sent to the floe.

  • Type: integer

  • Default: 25000

Docked Record Cache Size (docked_record_cache_size): Number of records to accumulate before docking the accumulated records. If set to 0, records will only be docked when end is called.

  • Type: integer

  • Default: 1000
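
The accumulate-then-dock behavior controlled by the cache size can be pictured with a small class. This is a sketch of the described behavior only; the class, its method names, and the dock_batch callback are hypothetical stand-ins and not the cube's actual implementation.

```python
class RecordCache:
    """Accumulate records and hand them to dock_batch in groups."""

    def __init__(self, cache_size, dock_batch):
        self.cache_size = cache_size
        self.dock_batch = dock_batch  # callable that receives a list of records
        self._records = []

    def process(self, record):
        self._records.append(record)
        # A cache size of 0 disables intermediate flushes entirely.
        if self.cache_size > 0 and len(self._records) >= self.cache_size:
            self._flush()

    def end(self):
        # Dock whatever is left when the input is exhausted.
        if self._records:
            self._flush()

    def _flush(self):
        batch, self._records = self._records, []
        self.dock_batch(batch)
```

With a cache size of 3 and seven incoming records, this sketch docks two batches of three during processing and a final batch of one when end is called; with a cache size of 0 all seven are docked together at the end.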

Non Hybrid Receptor Limit (non_hybrid_receptor_limit): Limit on the total number of design units that can be passed to this cube if hybrid design unit selection is not being used. If more than this number are passed in, those past the limit will be sent to the failure port.

  • Type: integer

  • Default: 2

Dock cube memory in MB (dock_cube_memory_in_mb): The minimum amount of memory in MiBs (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

  • Type: decimal

  • Default: 1800