Gigadock

Description

Floe for docking up to billions of molecules into one or more conformations of a target protein structure.

If this Floe shuts down due to exceeding the cost threshold (see the ‘Cost Threshold’ parameter), a restart collection will be present at the end of the run. The Floe can be continued from the point at which it was shut down by re-running the Floe and passing the restart collection to the ‘Input Conformer Collection’ parameter in place of the original ‘Input Conformer Collection’ and/or ‘Input Conformer Dataset’, while using the same settings for all other parameters. The resulting output will include the complete results (unless the restart Floe also exceeds its cost threshold), so there is no need to merge results from the first run.

WARNING: Using the ‘Job Properties->Job Cost Limits->Terminate this job if the cost exceeds’ option with this floe will result in abrupt termination of the floe if the specified cost is exceeded. In this circumstance no output will be generated, the compute cost consumed by the floe will be wasted, and the job will not be restartable. Use the ‘Options->Cost Threshold’ parameter instead, which will shut the floe down in a controlled fashion that allows a stopped job to be restarted.
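
The restart procedure described above amounts to re-submitting the same parameter set with only the molecule input swapped. The following is a minimal sketch using plain Python dictionaries keyed by the Python names listed under Parameters. The helper `make_restart_params` and the dataset and collection names are hypothetical, and no Orion client call is shown or implied.

```python
def make_restart_params(original_params, restart_collection_name):
    """Return the parameter set for a restart run.

    All settings are copied unchanged; only the molecule input is replaced
    by the restart collection.
    """
    params = dict(original_params)
    # The restart collection stands in for the original collection
    # and/or dataset input, so drop any alternate dataset input.
    params.pop("molecule_input_dataset", None)
    params["molecule_input_collection"] = restart_collection_name
    return params


first_run = {
    "init_input_dataset": "my-design-units",    # hypothetical dataset name
    "molecule_input_collection": "my-library",  # hypothetical collection name
    "docking_method": "Fred",
    "floe_cost_threshold": 1000.0,
}

# "Restart" is the documented default name of the restart collection.
restart_run = make_restart_params(first_run, "Restart")
```

Keeping every other key identical mirrors the requirement that the restart use the same settings as the first run.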

See also

This floe is used in the Dock One Million Molecules with Gigadock Floe tutorial.

Details

Title : Gigadock
Tags : Large Scale Floes Giga-Docking Gigadock FRED HYBRID Docking Chemgauss4 Virtual Screening
Python Name : #01_large_scale_docking_floe

Parameters

Inputs

  • Design Unit or Receptor Dataset(s) Dataset with the design unit (DU) (or old format receptor) to dock to. Multiple design units are allowed, up to a limit of 10 for the Hybrid docking method (see the ‘Docking Method’ parameter) and 2 otherwise. The behavior with multiple design units depends on the docking method. For ‘Fred’ or ‘FastFred’ each molecule will be docked to each design unit and the results from the best scoring design unit will be output; thus docking time (and cost) will scale roughly linearly with the number of design units. For ‘Hybrid’ each molecule will be docked only into the design unit with the crystallographic bound ligand most similar (by ROCS Combo Tanimoto) to the molecule being docked, and docking time (and cost) will increase by roughly 5% per additional design unit.
    Type : data_source
    Required : True
    Python Name : init_input_dataset
  • Input Conformer Collection Input collection containing molecules to dock. The collection should have been created by the ‘Prepare Giga Collections’ floe. Several large pre-generated 3rd party vendor docking collections can be made available to your organization upon request at no charge by e-mailing support@eyesopen.com (if your organization has already requested them you will already have access to these pre-generated collections). The collections will be located in the ‘Organization Data->OpenEye Data->Gigadocking Collections’ folder, which also automatically contains several smaller collections and collections containing random subsets of the larger vendor collections.
    Type : collection_source
    Required : False
    Python Name : molecule_input_collection
  • Actives Dataset (Optional) Dataset containing conformer-enumerated known active molecules. If specified, these molecules will be docked to the receptor and their scores compared to the other input molecules in a generated floe report. A dataset named ‘AUC’ will also be created with the AUC of the actives compared to the input molecules.
    Type : data_source
    Required : False
    Python Name : actives_in
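
The scaling rules given for multiple design units can be turned into a rough cost multiplier. The sketch below is an illustration only: the function name is hypothetical, the 5% increase is assumed to be additive per extra design unit (the text does not say whether it compounds), and the multiplier is relative to the same docking method run against a single design unit.

```python
def relative_docking_cost(method, n_design_units):
    """Approximate cost multiplier versus docking to one design unit."""
    # Documented limits: 10 design units for Hybrid, 2 for other methods.
    max_units = 10 if method == "Hybrid" else 2
    if not 1 <= n_design_units <= max_units:
        raise ValueError(
            f"{method} accepts between 1 and {max_units} design units"
        )
    if method == "Hybrid":
        # Assumption: roughly 5% extra per additional design unit, additive.
        return 1.0 + 0.05 * (n_design_units - 1)
    # Fred and FastFred dock every molecule to every design unit.
    return float(n_design_units)


fred_two = relative_docking_cost("Fred", 2)
hybrid_ten = relative_docking_cost("Hybrid", 10)
```

Under these assumptions, docking to two design units with Fred costs about twice as much as docking to one, while ten design units with Hybrid costs about 1.45 times as much.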

Inputs: Alternate

  • Input Conformer Dataset Input dataset of molecules to dock that can be used in place of the normal Giga Docking collection input. The molecules must be conformer-enumerated (this can be done with the ‘Classic Omega’ floe).
    Type : data_source
    Required : False
    Python Name : molecule_input_dataset

Outputs

  • Hit List Dataset Output dataset with the top 10000 scoring docked molecules.
    Type : dataset_out
    Required : True
    Default : Hit List
    Python Name : hit_list_output_dataset
  • Raw Results Collection Output OERecord collection with all docked input molecules.
    Type : collection_sink
    Required : True
    Default : Raw Results
    Python Name : raw_results_collection
  • Output Design Unit(s) Dataset Output dataset containing a copy of the design unit(s) docked to.
    Type : dataset_out
    Required : True
    Default : Receptor
    Python Name : output_design_units_dataset
  • AUC Dataset Output dataset with the AUC. This dataset will only be created if active molecules were passed to the Floe.
    Type : dataset_out
    Required : True
    Default : AUC
    Python Name : auc_dataset
  • Docked Actives Dataset Output dataset with the docked actives. This dataset will only be created if active molecules were passed to the Floe.
    Type : dataset_out
    Required : True
    Default : Docked Actives
    Python Name : docked_actives_dataset
  • Restart Collection Name of the restart collection. The floe will create this collection to store the data needed to resume a run that was stopped due to exceeding the cost threshold. This collection will automatically be deleted by the floe if it finishes without exceeding the threshold.
    Type : collection_sink
    Required : True
    Default : Restart
    Python Name : restart_collection
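
For readers unfamiliar with the AUC value written to the ‘AUC’ dataset, the sketch below computes a standard ROC AUC from two lists of docking scores. It is not taken from the floe: the function name is hypothetical, the exact calculation the floe performs is not described here, and lower scores are assumed to be better.

```python
def roc_auc(active_scores, other_scores, lower_is_better=True):
    """Probability that a random active outscores a random other molecule.

    Ties count as half a win. This pairwise form is quadratic in the input
    size, which is fine for illustration but not for very large score lists.
    """
    if not active_scores or not other_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for a in active_scores:
        for o in other_scores:
            if a == o:
                wins += 0.5
            elif (a < o) == lower_is_better:
                wins += 1.0
    return wins / (len(active_scores) * len(other_scores))


# Made-up scores for illustration only.
auc = roc_auc([-12.0, -10.0], [-9.0, -8.0, -11.0])
```

An AUC of 0.5 means the actives are indistinguishable from the other input molecules by score, and values approaching 1.0 mean the actives are consistently scored better.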

Options

  • Cost Threshold ($USD) If the cost estimate of the floe exceeds this value (in U.S. Dollars) the floe will shut down. All molecules from the input collection or dataset will stop being sent for processing. Molecules already processed, or currently processing on an instance, will be output normally. A restart collection will be created that can be used to continue the calculation in a subsequent floe run. Note that the final cost of the run will exceed this threshold, as molecules currently out for processing when this threshold is crossed will finish processing before the floe is shut down.
    Type : decimal
    Required : False
    Default : 1000.0
    Python Name : floe_cost_threshold
  • Docking Method Docking method to use. ‘Fred’ is the default structure-based scoring method. ‘Hybrid’ biases the docking towards poses that overlay the crystallographic ligand (the design unit(s) must have a bound ligand). ‘FastFred’ is a faster variant of ‘Fred’ (typically ~2x faster for single design units) that samples less and uses a simpler scoring function in the initial stages of docking.
    Type : string
    Required : False
    Default : Fred
    Choices : Fred, Hybrid, Fast Fred
    Python Name : docking_method
  • Output Tag(s) If specified, the tag(s) will be applied to all output datasets and collections.
    Type : string
    Required : False
    Default : []
    Accepts Multiple Values
    Python Name : output_user_tags
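
The ‘Cost Threshold’ behavior explains why the final cost of a stopped run exceeds the threshold. The toy simulation below illustrates that effect. It does not describe the floe's actual scheduler: the function name, the batch costs, and the `in_flight` count are all hypothetical.

```python
def run_with_cost_threshold(batch_costs, threshold, in_flight=2):
    """Simulate a controlled shutdown once the cost exceeds the threshold.

    batch_costs: cost of each work batch, in dispatch order.
    in_flight:   batches already out for processing when the threshold
                 is crossed (assumed constant here).
    Returns (final_cost, batches_processed, batches_left_for_restart).
    """
    spent = 0.0
    processed = 0
    for cost in batch_costs:
        spent += cost
        processed += 1
        if spent > threshold:
            break  # stop sending new work
    # Work already out for processing still finishes and is output normally.
    tail = batch_costs[processed:processed + in_flight]
    spent += sum(tail)
    processed += len(tail)
    # Everything not yet dispatched would go to the restart collection.
    return spent, processed, batch_costs[processed:]


final_cost, done, remaining = run_with_cost_threshold([300.0] * 8, 1000.0)
```

In this made-up run the threshold is crossed after the fourth batch, two more batches finish, and the remaining two are left for a restart, so the final cost ends above the 1000.0 threshold.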

Input Fields

These parameters specify the fields on the input datasets and/or collections that this floe reads data from. Note that parameters identifying a molecule field are special. If left empty, the floe will read the molecule from the primary (i.e., default) molecule field on the input record. The primary molecule of a dataset can be identified in the UI by looking for the star on its field badge.

  • Design Unit Field Design Unit field on the dataset(s) passed to the ‘Design Unit or Receptor Dataset(s)’ parameter. If unspecified, each record will be automatically searched for a single design unit to use, and if a record contains multiple design units an error will be thrown. Note that the dataset(s) passed to ‘Design Unit or Receptor Dataset(s)’ can contain either a receptor or a design unit; they do not need both.
    Type : field_parameter
    Required : False
    Python Name : design_unit_field
  • Receptor Field Molecule field on the dataset(s) passed to the ‘Design Unit or Receptor Dataset(s)’ parameter to look for an old style OE receptor to dock to. If unspecified, the default primary molecule field will be searched for a receptor. Note that the dataset(s) passed to ‘Design Unit or Receptor Dataset(s)’ can contain either a receptor or a design unit; they do not need both.
    Type : field_parameter::mol
    Required : False
    Python Name : receptor_field
  • Input Conformer Field Field on the input record containing the input conformers. Default is the primary molecule field. If specified, this applies to all input conformers, i.e., molecules passed to ‘Input Conformer Collection’, ‘Input Conformer Dataset’, and ‘Actives Dataset (Optional)’. This field generally only needs to be specified if you are docking a dataset, rather than a collection, AND the conformers you want to dock are not in the default primary molecule field.
    Type : field_parameter::mol
    Required : False
    Python Name : input_conf_field
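
The lookup rule for the ‘Design Unit Field’ parameter (use the named field if given, otherwise search the record for exactly one design unit) can be sketched as follows. Records are modeled as plain dictionaries and `DesignUnit` is a hypothetical stand-in class. Neither reflects the real record or design unit types.

```python
class DesignUnit:
    """Hypothetical stand-in for a real design unit object."""

    def __init__(self, name):
        self.name = name


def find_design_unit(record, design_unit_field=None):
    """Return the design unit on a record, or None if there is none."""
    if design_unit_field:
        # An explicitly named field takes precedence.
        return record[design_unit_field]
    found = [v for v in record.values() if isinstance(v, DesignUnit)]
    if len(found) > 1:
        raise ValueError(
            "record contains multiple design units; "
            "set 'Design Unit Field' to choose one"
        )
    # None means the record may hold an old style receptor instead.
    return found[0] if found else None


record = {"Title": "target", "DU": DesignUnit("apo form")}
du = find_design_unit(record)
```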

Output Fields

These parameters allow the user to change the default output fields this floe creates in the output datasets and/or collections. Note that parameters identifying a molecule field are special. If a molecule field is left empty, the floe writes the molecule to the primary (i.e., default) molecule field of the record. The primary molecule of a dataset can be identified in the UI by looking for the star on its field badge. CAUTION: If these parameters are modified, the modifications must also be applied to the input fields of downstream floes that read fields written by this floe. If a downstream floe does not support specifying the input field, it may not work properly with the output of this floe when these settings are modified.

  • Docked Molecule Field Field on the output records where the docked molecule will be placed. Default is the primary molecule field.
    Type : field_parameter::mol
    Required : False
    Python Name : docked_mol_field
  • Design Unit ID Field Output field with the ID of the design unit the molecule scores best in.
    Type : field_parameter::int
    Required : False
    Default : Design Unit ID
    Python Name : design_unit_id_field
  • Design Unit Link Field Output field with a link to the design unit the molecule scores best in.
    Type : field_parameter::link
    Required : False
    Default : Design Unit Link
    Python Name : design_unit_link_field
  • Docked Score Field Field on the output record where the docked score will be placed.
    Type : field_parameter::float
    Required : False
    Default : Chemgauss4
    Python Name : score_field
  • Bemis Murcko Field Output field for the Bemis Murcko core SMILES.
    Type : field_parameter::string
    Required : False
    Default : Bemis Murcko SMILES
    Python Name : bemis_murcko_field
  • Bemis Murcko ID Field Output field with an integer ID of the Bemis Murcko core. All molecules with the same Bemis Murcko core SMILES will have the same ID, and those with different Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Bemis Murcko core is seen. Thus, unlike the Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.
    Type : field_parameter::int
    Required : False
    Default : Bemis Murcko ID
    Python Name : bemis_murcko_id_field
  • Bemis Murcko Rank Field Integer field with the rank of the molecule within its Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Bemis Murcko core SMILES).
    Type : field_parameter::int
    Required : False
    Default : Bemis Murcko Rank
    Python Name : bemis_murcko_rank_field
  • Hetero Bemis Murcko Field Output field for the Hetero Bemis Murcko core SMILES.
    Type : field_parameter::string
    Required : False
    Default : Hetero Bemis Murcko
    Python Name : hetero_bemis_murcko_field
  • Hetero Bemis Murcko ID Field Output field with an integer ID of the Hetero Bemis Murcko core. All molecules with the same Hetero Bemis Murcko core SMILES will have the same ID, and those with different Hetero Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Hetero Bemis Murcko core is seen. Thus, unlike the Hetero Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.
    Type : field_parameter::int
    Required : False
    Default : Hetero Bemis Murcko ID
    Python Name : hetero_bemis_murcko_id_field
  • Hetero Bemis Murcko Rank Field Integer field with the rank of the molecule within its Hetero Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Hetero Bemis Murcko core SMILES).
    Type : field_parameter::int
    Required : False
    Default : Hetero Bemis Murcko Rank
    Python Name : hetero_bemis_murcko_rank_field
  • Steric Score Field Output field with the steric score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
    Type : field_parameter::float
    Required : False
    Python Name : steric_score_field
  • Clash Score Field Output field with the clash score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
    Type : field_parameter::float
    Required : False
    Python Name : clash_score_field
  • Protein Desolv Score Field Output field with the protein desolvation score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
    Type : field_parameter::float
    Required : False
    Python Name : protein_desolv_score_field
  • Ligand Desolv Score Field Output field with the ligand desolvation score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
    Type : field_parameter::float
    Required : False
    Python Name : ligand_desolv_score_field
  • Ligand Desolv HB Score Field Output field with the ligand desolvation hydrogen bond score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
    Type : field_parameter::float
    Required : False
    Python Name : ligand_desolv_hb_score_field
  • Hydrogen Bond Score Field Output field with the hydrogen bond score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
    Type : field_parameter::float
    Required : False
    Python Name : hydrogen_bond_score_field
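
The ID and rank rules described for the Bemis Murcko fields can be illustrated with a short sketch. It assumes the core SMILES have already been computed and are supplied as strings, that lower scores are better, and that ranks start at 1. The function name and the dictionary keys are hypothetical and do not correspond to the floe's output field names.

```python
def annotate_cores(hits):
    """Add 'core_id' and 'core_rank' to each hit dictionary in place."""
    ids = {}
    for hit in hits:
        # A core seen for the first time gets the next integer, starting
        # at 1, so the IDs depend on the order of the records.
        hit["core_id"] = ids.setdefault(hit["core"], len(ids) + 1)
    counts = {}
    for hit in sorted(hits, key=lambda h: h["score"]):
        # Rank within the family: position among hits sharing the core.
        counts[hit["core"]] = counts.get(hit["core"], 0) + 1
        hit["core_rank"] = counts[hit["core"]]
    return hits


# Made-up hits for illustration only.
hits = annotate_cores([
    {"name": "mol_a", "core": "c1ccccc1", "score": -9.5},
    {"name": "mol_b", "core": "c1ccncc1", "score": -11.0},
    {"name": "mol_c", "core": "c1ccccc1", "score": -10.2},
])
```

Here mol_a and mol_c share core ID 1 because their core appears first in the input, and mol_c is ranked ahead of mol_a within that family because it has the better score. Reordering the input records would change the IDs but not the ranks.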