Gigadock

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Product-based/Gigadock

  • Role-based/Computational Chemist

  • Solution-based/Virtual-screening/DB Search/Gigadock

  • Task-based/Virtual Screening - Structure-Based

Description

This floe docks up to billions of molecules into one or more conformations of a target protein structure.

If this floe shuts down because it exceeded the cost threshold (see the ‘Cost Threshold’ parameter), a restart collection will be present at the end of the run. To continue from the point where the floe shut down, rerun the floe, passing the restart collection to the ‘Input Conformer Collection’ parameter in place of the original ‘Input Conformer Collection’ and/or ‘Input Conformer Dataset’, and keep the same settings for all other parameters. The resulting output will include the complete results (unless the restart run also exceeds its cost threshold), so there is no need to merge results from the first run.

WARNING: Using the ‘Job Properties->Job Cost Limits->Terminate this job if the cost exceeds’ option with this floe will result in abrupt termination of the floe if the specified cost is exceeded. No output will be generated and the compute cost consumed by the floe will be wasted in this circumstance, and the job will not be restartable. Use the ‘Options->Cost Threshold’ parameter instead, which will shut the floe down in a controlled fashion that allows for a stopped job to be restarted.
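The restart procedure described above amounts to changing only the molecule input and leaving every other setting alone. The following is a minimal sketch of that bookkeeping, assuming the parameters of the first run are kept in a plain Python dictionary keyed by promoted name. The dictionary and the function restart_parameters are illustrative only and are not part of any Orion API.

```python
def restart_parameters(original: dict, restart_collection: str) -> dict:
    """Return the parameters for a restart run (hypothetical helper).

    `original` maps promoted parameter names to the values used in the
    first run. Only the molecule input changes; everything else is kept.
    """
    params = dict(original)  # copy so the first run's settings are untouched
    # The restart collection replaces any original molecule input.
    params.pop("molecule_input_dataset", None)
    params["molecule_input_collection"] = restart_collection
    return params


first_run = {
    "init_input_dataset": "My Receptor",        # example value
    "molecule_input_dataset": "My Conformers",  # example value
    "docking_method": "Fred",
    "floe_cost_threshold": 1000.0,
}
second_run = restart_parameters(first_run, "Restart")
```

Here "Restart" is the default name of the restart collection; use whatever name was given to the ‘Restart Collection’ parameter in the first run.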

Promoted Parameters

Title in user interface (promoted name)

Inputs

Design Unit or Receptor Dataset(s) (init_input_dataset): Dataset with the design unit (DU) (or old format receptor) to dock to. Multiple design units are allowed, up to a limit of 10 for the Hybrid docking method (see the ‘Docking Method’ parameter) and two otherwise. The behavior with multiple design units depends on the docking method. For Fred or FastFred, each molecule will be docked to each design unit and the results from the best-scoring design unit will be output, so docking time (and cost) will scale roughly linearly with the number of design units. For Hybrid, each molecule will be docked only into the design unit whose crystallographically bound ligand is most similar (by ROCS Tanimoto Combo) to the molecule being docked, and docking time (and cost) will increase by roughly 5% per additional design unit.

  • Required

  • Type: data_source
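The scaling rules for multiple design units can be turned into a rough back-of-the-envelope estimate. The sketch below encodes only the figures stated above (linear scaling for Fred and FastFred, about 5% per additional design unit for Hybrid, and the limits of two and 10 design units). The function name relative_docking_cost is hypothetical, and the result is an approximation relative to a single design unit, not a price.

```python
def relative_docking_cost(method: str, n_design_units: int) -> float:
    """Approximate cost relative to docking into one design unit."""
    name = method.replace(" ", "")  # accept both 'FastFred' and 'Fast Fred'
    limits = {"Fred": 2, "FastFred": 2, "Hybrid": 10}
    if name not in limits:
        raise ValueError(f"unknown docking method: {method!r}")
    if not 1 <= n_design_units <= limits[name]:
        raise ValueError(
            f"{name} accepts between 1 and {limits[name]} design units"
        )
    if name == "Hybrid":
        # Each molecule is docked into one design unit only, so the
        # overhead is roughly 5% per additional design unit.
        return 1.0 + 0.05 * (n_design_units - 1)
    # Fred and FastFred dock every molecule into every design unit.
    return float(n_design_units)
```

For example, two design units with Fred roughly double the cost, while ten design units with Hybrid add roughly 45%.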

Input Conformer Collection (molecule_input_collection): Input collection containing molecules to dock. The collection should have been created by the Prepare Giga Collections Floe. OpenEye has also prepared several large third-party vendor databases and collections. The Organization Data/OpenEye Data/Gigadocking Collections folder contains data curated and provided by OpenEye that is freely available to Orion customers.

  • Type: collection_source

Actives Dataset (Optional) (actives_in): Dataset containing conformer-enumerated known active molecules. If specified, these molecules will be docked to the receptor and their scores compared to the other input molecules in a generated floe report. A dataset named ‘AUC’ will also be created with the AUC of the actives compared to the input molecules.

  • Type: data_source
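The text above does not say how the floe computes the AUC. As an illustration only, the sketch below computes a standard ROC AUC from two lists of docking scores: the fraction of (active, input molecule) pairs in which the active has the better score, with ties counted as half. It assumes that lower scores are better, and the function name roc_auc is hypothetical.

```python
def roc_auc(active_scores, background_scores):
    """ROC AUC of actives against background molecules.

    Assumes lower (more negative) scores are better. The pairwise loop is
    quadratic, which is fine for an illustration but not for large inputs.
    """
    if not active_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for active in active_scores:
        for background in background_scores:
            if active < background:
                wins += 1.0   # active ranked ahead of this molecule
            elif active == background:
                wins += 0.5   # ties count as half
    return wins / (len(active_scores) * len(background_scores))
```

An AUC of 1.0 means every active outscores every input molecule, and 0.5 corresponds to random ranking.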

Inputs: Alternate

Input Conformer Dataset (molecule_input_dataset): Input dataset of molecules to dock that can be used in place of the normal Giga Docking collection input. The molecules must be conformer-enumerated: if necessary, this can be done with the OMEGA - 3D Conformer Ensemble Generation Floe.

  • Type: data_source

Outputs

Hit List Dataset (hit_list_output_dataset): Output dataset with the top 10000 scoring docked molecules.

  • Required

  • Type: dataset_out

  • Default: Hit List
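Conceptually, the hit list is a top-N selection over all docked molecules. The sketch below shows one common way to do such a selection in Python without sorting the full input. It assumes that lower scores are better and is not a description of how the floe is implemented; top_hits and the (name, score) pairs are hypothetical.

```python
import heapq


def top_hits(scored_molecules, n=10000):
    """Return the n best (name, score) pairs, best first.

    `scored_molecules` may be any iterable, including a generator, so the
    full set of results never has to be held in memory at once.
    """
    return heapq.nsmallest(n, scored_molecules, key=lambda pair: pair[1])
```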

Raw Results Collection (raw_results_collection): Output OERecord collection with all docked input molecules.

  • Required

  • Type: collection_sink

  • Default: Raw Results

Output Design Unit(s) Dataset (output_design_units_dataset): Output dataset containing a copy of the design unit(s) docked to.

  • Required

  • Type: dataset_out

  • Default: Receptor

AUC Dataset (auc_dataset): Output dataset with the AUC. This dataset will only be created if active molecules were passed to the floe.

  • Required

  • Type: dataset_out

  • Default: AUC

Docked Actives Dataset (docked_actives_dataset): Output dataset with the docked actives. This dataset will only be created if active molecules were passed to the floe.

  • Required

  • Type: dataset_out

  • Default: Docked Actives

Restart Collection (restart_collection): Name of the restart collection. The floe will create this collection to store the data needed to resume a run that was stopped because it exceeded the cost threshold. The floe will automatically delete this collection if it finishes without exceeding the threshold.

  • Required

  • Type: collection_sink

  • Default: Restart

Options

Cost Threshold (USD) (floe_cost_threshold): If the cost estimate of the floe exceeds this value (in US dollars), the floe will shut down. No further molecules from the input collection or dataset will be sent for processing. Molecules already processed, or currently processing on an instance, will be output normally. A restart collection will be created that can be used to continue the calculation in a subsequent floe run. Note that the final cost of the run will exceed this threshold, because molecules that are out for processing when the threshold is crossed will finish processing before the floe is shut down.

  • Type: decimal

  • Default: 1000.0
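The reason the final cost exceeds the threshold can be seen in a toy model of the controlled shutdown. The sketch below is not the floe's implementation: the batch size, the fixed cost per molecule, and the function run_with_cost_threshold are all assumptions made for illustration. The threshold is checked only before new work is sent out, so work that has already been dispatched always completes.

```python
def run_with_cost_threshold(molecules, cost_per_molecule, threshold,
                            batch_size=2):
    """Toy model of a controlled shutdown.

    Returns (processed, not_sent, final_cost). Batches already sent out
    always finish, so final_cost can end up above the threshold.
    """
    processed = []
    cost = 0.0
    position = 0
    # Only dispatch a new batch while the running cost is within the threshold.
    while position < len(molecules) and cost <= threshold:
        batch = molecules[position:position + batch_size]
        position += len(batch)
        processed.extend(batch)            # dispatched work runs to completion
        cost += cost_per_molecule * len(batch)
    not_sent = molecules[position:]        # left for a later restart run
    return processed, not_sent, cost


done, remaining, final_cost = run_with_cost_threshold(
    list("abcdefg"), cost_per_molecule=1.0, threshold=2.5
)
```

In this example the threshold of 2.5 is crossed while the second batch is being processed, so four molecules are completed at a final cost of 4.0 and three are left unsent.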

Docking Method (docking_method): Docking method to use. Fred is the default structure-based scoring method. Hybrid biases the docking towards poses that overlay the crystallographic ligand (the design units must have a bound ligand). FastFred is a faster variant of Fred (typically ~2x faster for single design units) that samples less and uses a simpler scoring function in the initial stages of docking.

  • Type: string

  • Default: Fred

  • Choices: [‘Fred’, ‘Hybrid’, ‘Fast Fred’]

Output Tag(s) (output_user_tags): If specified, this tag will be applied to all output datasets and collections.

  • Type: string

  • Default: []

Input Fields

Design Unit Field (design_unit_field): Design unit field on the dataset passed to the Design Unit or Receptor Dataset(s) parameter. If unspecified, each record will be automatically searched for a single design unit to use, and if a record contains multiple design units, an error will be thrown. Note that datasets passed to Design Unit or Receptor Dataset(s) can contain either a receptor or design unit; they do not need both.

  • Type: field_parameter

Receptor Field (receptor_field): Molecule field on the datasets passed to the Design Unit or Receptor Dataset(s) parameter to look for an old-style OE receptor to dock to. If unspecified, the default primary molecule field will be searched for a receptor. Note that the datasets passed to Design Unit or Receptor Dataset(s) can contain either a receptor or design unit; they do not need both.

  • Type: field_parameter::mol

Input Conformer Field (input_conf_field): Field on the input record containing the input conformers. The default is the primary molecule field. If specified, this applies to all input conformers, that is, molecules passed to Input Conformer Collection, Input Conformer Dataset, and Actives Dataset. This field generally only needs to be specified if you are docking a dataset, rather than a collection, and the conformers you want to dock are not in the default primary molecule field.

  • Type: field_parameter::mol

Output Fields

Docked Molecule Field (docked_mol_field): Field on the output records where the docked molecule will be placed. Default is the primary molecule field.

  • Type: field_parameter::mol

Design Unit ID Field (design_unit_id_field): Output field with the ID of the design unit the molecule scores best in.

  • Type: field_parameter::int

  • Default: Design Unit ID

Design Unit Link Field (design_unit_link_field): Output field with a link to the design unit the molecule scores best in.

  • Type: field_parameter::link

  • Default: Design Unit Link

Docked Score Field (score_field): Field on the output record where the docked score will be placed.

  • Type: field_parameter::float

  • Default: Chemgauss4

Bemis Murcko Field (bemis_murcko_field): Output field for the Bemis Murcko core SMILES.

  • Type: field_parameter::string

  • Default: Bemis Murcko SMILES

Bemis Murcko ID Field (bemis_murcko_id_field): Output field with an integer ID of the Bemis Murcko core. All molecules with the same Bemis Murcko core SMILES will have the same ID, and those with different Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Bemis Murcko core is seen. Thus, unlike the Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.

  • Type: field_parameter::int

  • Default: Bemis Murcko ID
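The ID scheme described for this field can be sketched in a few lines. The function below is an illustration of the stated rule (IDs start at 1 and increase by 1 for each newly seen core), not the floe's code, and the name assign_core_ids is hypothetical. It takes core SMILES strings in the order the records arrive.

```python
def assign_core_ids(core_smiles_in_order):
    """Give each distinct core SMILES an integer ID in first-seen order."""
    ids = {}
    assigned = []
    for smiles in core_smiles_in_order:
        if smiles not in ids:
            ids[smiles] = len(ids) + 1  # the first new core gets ID 1
        assigned.append(ids[smiles])
    return assigned


# The same three records in two different orders receive different IDs.
first = assign_core_ids(["c1ccccc1", "c1ccncc1", "c1ccccc1"])
second = assign_core_ids(["c1ccncc1", "c1ccccc1", "c1ccccc1"])
```

In the first ordering benzene (c1ccccc1) gets ID 1, while in the second it gets ID 2. This is why the ID is only meaningful within a single run, whereas the core SMILES can be compared across runs.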

Bemis Murcko Rank Field (bemis_murcko_rank_field): Integer field with the rank of the molecule within its Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Bemis Murcko core SMILES).

  • Type: field_parameter::int

  • Default: Bemis Murcko Rank
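The within-family rank can be illustrated as follows. This sketch assumes that lower scores are better and that ranks start at 1. Both are assumptions, as is the function name family_ranks. Each hit is a (core SMILES, score) pair.

```python
from collections import defaultdict


def family_ranks(hits):
    """Rank each hit among the hits that share its core.

    `hits` is a list of (core_smiles, score) pairs. The returned list has
    one rank per hit, in the same order as the input.
    """
    # Visit hits from best to worst score (lower is assumed to be better).
    best_first = sorted(range(len(hits)), key=lambda i: hits[i][1])
    seen_per_core = defaultdict(int)
    ranks = [0] * len(hits)
    for index in best_first:
        core = hits[index][0]
        seen_per_core[core] += 1
        ranks[index] = seen_per_core[core]
    return ranks


ranks = family_ranks([("A", -9.0), ("B", -12.0), ("A", -11.0), ("A", -7.5)])
```

Filtering a hit list to rank 1 keeps only the best-scoring member of each family, which is a quick way to view one representative per scaffold.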

Hetero Bemis Murcko Field (hetero_bemis_murcko_field): Output field for the Hetero Bemis Murcko core SMILES.

  • Type: field_parameter::string

  • Default: Hetero Bemis Murcko

Hetero Bemis Murcko ID Field (hetero_bemis_murcko_id_field): Output field with an integer ID of the Hetero Bemis Murcko core. All molecules with the same Hetero Bemis Murcko core SMILES will have the same ID, and those with different Hetero Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Hetero Bemis Murcko core is seen. Thus, unlike the Hetero Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.

  • Type: field_parameter::int

  • Default: Hetero Bemis Murcko ID

Hetero Bemis Murcko Rank Field (hetero_bemis_murcko_rank_field): Integer field with the rank of the molecule within its Hetero Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Hetero Bemis Murcko core SMILES).

  • Type: field_parameter::int

  • Default: Hetero Bemis Murcko Rank

Steric Score Field (steric_score_field): Output field with the steric score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Clash Score Field (clash_score_field): Output field with the clash score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Protein Desolv Score Field (protein_desolv_score_field): Output field with the protein desolvation score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Ligand Desolv Score Field (ligand_desolv_score_field): Output field with the ligand desolvation score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Ligand Desolv HB Score Field (ligand_desolv_hb_score_field): Output field with the ligand desolvation hydrogen bond score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Hydrogen Bond Score Field (hydrogen_bond_score_field): Output field with the hydrogen bond score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float