Gigadock
Description
Floe for docking up to billions of molecules into one or more conformations of a target protein structure.
If this Floe shuts down because it exceeded the cost threshold (see the ‘Cost Threshold’ parameter), a restart collection will be present at the end of the run. The Floe can be continued from the point at which it was shut down by re-running the Floe and passing the restart collection to the ‘Input Conformer Collection’ parameter in place of the original ‘Input Conformer Collection’ and/or ‘Input Conformer Dataset’, while using the same settings for all other parameters. The resulting output will include the complete results (unless the restart Floe also exceeds its cost threshold), so there is no need to merge results from the first run.
WARNING: Using the ‘Job Properties->Job Cost Limits->Terminate this job if the cost exceeds’ option with this floe will result in abrupt termination of the floe if the specified cost is exceeded. No output will be generated, the compute cost consumed by the floe will be wasted, and the job will not be restartable. Use the ‘Options->Cost Threshold’ parameter instead, which shuts the floe down in a controlled fashion that allows a stopped job to be restarted.
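The restart procedure can be sketched as a small helper that derives the parameters of the continuation run from those of the first run. The dictionary keys are the Python Names listed on this page. The helper name `restart_parameters`, the placeholder collection names, and the use of a plain dictionary to hold the settings are assumptions made for illustration and are not part of any Orion API.

```python
def restart_parameters(original: dict, restart_collection: str) -> dict:
    """Return the settings for a continuation run (hypothetical helper).

    The restart collection replaces the original molecule inputs; every
    other setting is carried over unchanged.
    """
    params = dict(original)  # copy so the first run's settings stay untouched
    # The restart collection is passed to 'Input Conformer Collection' ...
    params["molecule_input_collection"] = restart_collection
    # ... in place of any dataset input used by the first run.
    params.pop("molecule_input_dataset", None)
    return params


first_run = {
    "molecule_input_collection": "My Giga Collection",  # placeholder name
    "docking_method": "Hybrid",
    "floe_cost_threshold": 1000.0,
}
second_run = restart_parameters(first_run, "Restart")
```

Only the molecule input changes between the two runs, which matches the requirement that all other parameters keep the same settings.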
See also
This floe is used in the Dock One Million Molecules with Gigadock Floe tutorial.
Details
Title : Gigadock
Tags : Large Scale Floes Giga-Docking Gigadock FRED HYBRID Docking Chemgauss4 Virtual Screening
Python Name : #01_large_scale_docking_floe
Parameters
Inputs
Design Unit or Receptor Dataset(s)
Dataset with the design unit (DU) (or old format receptor) to dock to. Multiple design units are allowed, up to a limit of 10 for the Hybrid docking method (see the ‘Docking Method’ parameter) and 2 otherwise. The behavior with multiple design units depends on the docking method. For ‘Fred’ or ‘FastFred’, each molecule will be docked to each design unit and the results from the best scoring design unit will be output, so docking time (and cost) will scale roughly linearly with the number of design units. For ‘Hybrid’, each molecule will be docked only into the design unit whose crystallographic bound ligand is most similar (by ROCS Combo Tanimoto) to the molecule being docked, and docking time (and cost) will increase by roughly 5% per additional design unit.
Type : data_source
Required : True
Python Name : init_input_dataset

Input Conformer Collection
Input collection containing molecules to dock. The collection should have been created by the ‘Prepare Giga Collections’ floe. Several large pre-generated third-party vendor docking collections can be made available to your organization upon request, at no charge, by e-mailing support@eyesopen.com (if your organization has already requested them, you will already have access to these pre-generated collections). The collections will be located in the ‘Organization Data->OpenEye Data->Gigadocking Collections’ folder, which also automatically contains several smaller collections and collections containing random subsets of the larger vendor collections.
Type : collection_source
Required : False
Python Name : molecule_input_collection

Actives Dataset (Optional)
Dataset containing conformer-enumerated known active molecules. If specified, these molecules will be docked to the receptor and their scores compared to the other input molecules in a generated floe report. A dataset named ‘AUC’ will also be created with the AUC of the actives compared to the input molecules.
Type : data_source
Required : False
Python Name : actives_in
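The scaling behavior described for multiple design units can be turned into a rough cost multiplier. The following is a minimal sketch, assuming the roughly 5% Hybrid overhead is additive per extra design unit (the text does not say whether it compounds). The function name `relative_docking_cost` is hypothetical, and the result is only a coarse estimate relative to a single design unit run.

```python
def relative_docking_cost(n_design_units: int, method: str = "Fred") -> float:
    """Rough cost multiplier relative to docking into one design unit."""
    # Design unit limits stated on this page: 10 for Hybrid, 2 otherwise.
    limits = {"Fred": 2, "FastFred": 2, "Hybrid": 10}
    if method not in limits:
        raise ValueError(f"unknown docking method: {method!r}")
    if not 1 <= n_design_units <= limits[method]:
        raise ValueError(
            f"{method} accepts between 1 and {limits[method]} design units"
        )
    if method == "Hybrid":
        # Assumption: about 5% extra per additional design unit, additive.
        return 1.0 + 0.05 * (n_design_units - 1)
    # Fred and FastFred dock every molecule into every design unit.
    return float(n_design_units)
```

For example, `relative_docking_cost(2, "Fred")` gives 2.0, while ten design units with Hybrid come out at about 1.45 under the additive assumption.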
Inputs: Alternate
Input Conformer Dataset
Input dataset of molecules to dock that can be used in place of the normal Giga Docking collection input. The molecules must be conformer-enumerated (this can be done with the ‘Classic Omega’ floe).
Type : data_source
Required : False
Python Name : molecule_input_dataset
Outputs
Hit List Dataset
Output dataset with the top 10000 scoring docked molecules.
Type : dataset_out
Required : True
Default : Hit List
Python Name : hit_list_output_dataset

Raw Results Collection
Output OERecord collection with all docked input molecules.
Type : collection_sink
Required : True
Default : Raw Results
Python Name : raw_results_collection

Output Design Unit(s) Dataset
Output dataset containing a copy of the design unit(s) docked to.
Type : dataset_out
Required : True
Default : Receptor
Python Name : output_design_units_dataset

AUC Dataset
Output dataset with the AUC. This dataset will only be created if active molecules were passed to the Floe.
Type : dataset_out
Required : True
Default : AUC
Python Name : auc_dataset

Docked Actives Dataset
Output dataset with the docked actives. This dataset will only be created if active molecules were passed to the Floe.
Type : dataset_out
Required : True
Default : Docked Actives
Python Name : docked_actives_dataset

Restart Collection
Name of the restart collection. The floe will create this collection to store the data needed to resume a run that was stopped due to exceeding the cost threshold. This collection will automatically be deleted by the floe if it finishes without exceeding the threshold.
Type : collection_sink
Required : True
Default : Restart
Python Name : restart_collection
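This page does not state how the value in the AUC dataset is computed. As an illustration of what an AUC of actives versus input molecules means, the sketch below computes a ROC AUC as the fraction of (active, input molecule) pairs in which the active has the better score, counting ties as half. It assumes that a lower (more negative) docking score is better; the function name `auc_from_scores` is hypothetical.

```python
def auc_from_scores(active_scores, input_scores):
    """ROC AUC by pairwise comparison; a lower score is assumed to be better."""
    if not active_scores or not input_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for active in active_scores:
        for other in input_scores:
            if active < other:
                wins += 1.0   # the active outranks this input molecule
            elif active == other:
                wins += 0.5   # ties count as half a win
    return wins / (len(active_scores) * len(input_scores))


example_auc = auc_from_scores([-12.0, -9.0], [-10.0, -8.0, -7.0])
```

In this example five of the six pairs favor the active, so `example_auc` is 5/6. A value near 0.5 would indicate that the actives score no better than the rest of the input. The pairwise loop is quadratic and is only meant for small illustrative inputs.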
Options
Cost Threshold ($USD)
If the cost estimate of the floe exceeds this value (in U.S. Dollars), the floe will shut down. Molecules from the input collection or dataset will no longer be sent for processing. Molecules already processed, or currently processing on an instance, will be output normally. A restart collection will be created that can be used to continue the calculation in a subsequent floe run. Note that the final cost of the run will exceed this threshold, because molecules that are out for processing when the threshold is crossed will finish processing before the floe is shut down.
Type : decimal
Required : False
Default : 1000.0
Python Name : floe_cost_threshold

Docking Method
Docking method to use. ‘Fred’ is the default structure-based scoring method. ‘Hybrid’ biases the docking towards poses that overlay the crystallographic ligand (the design unit(s) must have a bound ligand). ‘FastFred’ is a faster variant of ‘Fred’ (typically ~2x faster for single design units) that samples less and uses a simpler scoring function in the initial stages of docking.
Type : string
Required : False
Default : Fred
Choices : Fred, Hybrid, Fast Fred
Python Name : docking_method

Output Tag(s)
If specified, the tag(s) will be applied to all output datasets and collections.
Type : string
Required : False
Default : []
Accepts Multiple Values
Python Name : output_user_tags
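The reason the final cost ends up above the cost threshold can be seen in a toy model of the shutdown. This sketch is not the floe's scheduler: the function name `simulate_controlled_shutdown`, the fixed per-batch costs, and the `max_in_flight` limit are all assumptions chosen to keep the example small.

```python
from collections import deque


def simulate_controlled_shutdown(batch_costs, threshold, max_in_flight=2):
    """Toy dispatcher: returns (final_cost, batches_never_sent)."""
    pending = deque(batch_costs)   # work not yet sent for processing
    in_flight = deque()            # work currently out for processing
    spent = 0.0
    while pending or in_flight:
        # New work is sent only while the running cost is within the threshold.
        while pending and len(in_flight) < max_in_flight and spent <= threshold:
            in_flight.append(pending.popleft())
        if not in_flight:
            break                  # threshold crossed and nothing left running
        # Work already sent always finishes and is paid for.
        spent += in_flight.popleft()
    return spent, list(pending)


final_cost, unsent = simulate_controlled_shutdown([4, 4, 4, 4, 4], threshold=10)
```

Here the running cost crosses 10 once the third batch finishes (12), but a fourth batch is already out for processing and completes, so the run ends at 16. The one batch that was never sent corresponds to the work that would be stored in the restart collection.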
Input Fields
These parameters specify the fields on the input datasets and/or collections that this floe reads data from. Note that parameters identifying a molecule field are special. If left empty, the floe will read the molecule from the primary (i.e., default) molecule field on the input record. The primary molecule of a dataset can be identified in the UI by looking for the star on its field badge.
Design Unit Field
Design Unit field on the dataset(s) passed to the ‘Design Unit or Receptor Dataset(s)’ parameter. If unspecified, each record will be automatically searched for a single design unit to use, and if a record contains multiple design units an error will be thrown. Note that the dataset(s) passed to ‘Design Unit or Receptor Dataset(s)’ can contain either a receptor or a design unit; they do not need both.
Type : field_parameter
Required : False
Python Name : design_unit_field

Receptor Field
Molecule field on the dataset(s) passed to the ‘Design Unit or Receptor Dataset(s)’ parameter in which to look for an old style OE receptor to dock to. If unspecified, the default primary molecule field will be searched for a receptor. Note that the dataset(s) passed to ‘Design Unit or Receptor Dataset(s)’ can contain either a receptor or a design unit; they do not need both.
Type : field_parameter::mol
Required : False
Python Name : receptor_field

Input Conformer Field
Field on the input record containing the input conformers. Default is the primary molecule field. If specified, this applies to all input conformers, i.e., molecules passed to ‘Input Conformer Collection’, ‘Input Conformer Dataset’, and ‘Actives Dataset (Optional)’. This field generally only needs to be specified if you are docking a dataset, rather than a collection, AND the conformers you want to dock are not in the default primary molecule field.
Type : field_parameter::mol
Required : False
Python Name : input_conf_field
Output Fields
These parameters allow the user to change the default output fields this floe creates in the output datasets and/or collections. Note that parameters identifying a molecule field are special. If a molecule field is left empty, the floe writes the molecule to the primary (i.e., default) molecule field of the record. The primary molecule of a dataset can be identified in the UI by looking for the star on its field badge. CAUTION: If these parameters are modified, the modifications must also be applied to the input fields of downstream floes that read fields written by this floe. If a downstream floe does not support specifying the input field, it may not work properly with the output of this floe when these settings are modified.
Docked Molecule Field
Field on the output records where the docked molecule will be placed. Default is the primary molecule field.
Type : field_parameter::mol
Required : False
Python Name : docked_mol_field

Design Unit ID Field
Output field with the ID of the design unit the molecule scores best in.
Type : field_parameter::int
Required : False
Default : Design Unit ID
Python Name : design_unit_id_field

Design Unit Link Field
Output field with a Link to the design unit the molecule scores best in.
Type : field_parameter::link
Required : False
Default : Design Unit Link
Python Name : design_unit_link_field

Docked Score Field
Field on the output record where the docked score will be placed.
Type : field_parameter::float
Required : False
Default : Chemgauss4
Python Name : score_field

Bemis Murcko Field
Output field for the Bemis Murcko core SMILES.
Type : field_parameter::string
Required : False
Default : Bemis Murcko SMILES
Python Name : bemis_murcko_field

Bemis Murcko ID Field
Output field with an integer ID of the Bemis Murcko core. All molecules with the same Bemis Murcko core SMILES will have the same ID, and those with different Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Bemis Murcko core is seen. Thus, unlike the Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.
Type : field_parameter::int
Required : False
Default : Bemis Murcko ID
Python Name : bemis_murcko_id_field

Bemis Murcko Rank Field
Integer field with the rank of the molecule within its Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Bemis Murcko core SMILES).
Type : field_parameter::int
Required : False
Default : Bemis Murcko Rank
Python Name : bemis_murcko_rank_field

Hetero Bemis Murcko Field
Output field for the Hetero Bemis Murcko core SMILES.
Type : field_parameter::string
Required : False
Default : Hetero Bemis Murcko
Python Name : hetero_bemis_murcko_field

Hetero Bemis Murcko ID Field
Output field with an integer ID of the Hetero Bemis Murcko core. All molecules with the same Hetero Bemis Murcko core SMILES will have the same ID, and those with different Hetero Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Hetero Bemis Murcko core is seen. Thus, unlike the Hetero Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.
Type : field_parameter::int
Required : False
Default : Hetero Bemis Murcko ID
Python Name : hetero_bemis_murcko_id_field

Hetero Bemis Murcko Rank Field
Integer field with the rank of the molecule within its Hetero Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Hetero Bemis Murcko core SMILES).
Type : field_parameter::int
Required : False
Default : Hetero Bemis Murcko Rank
Python Name : hetero_bemis_murcko_rank_field

Steric Score Field
Output field with the steric score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type : field_parameter::float
Required : False
Python Name : steric_score_field

Clash Score Field
Output field with the clash score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type : field_parameter::float
Required : False
Python Name : clash_score_field

Protein Desolv Score Field
Output field with the protein desolvation score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type : field_parameter::float
Required : False
Python Name : protein_desolv_score_field

Ligand Desolv Score Field
Output field with the ligand desolvation score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type : field_parameter::float
Required : False
Python Name : ligand_desolv_score_field

Ligand Desolv HB Score Field
Output field with the ligand desolvation hydrogen bond score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type : field_parameter::float
Required : False
Python Name : ligand_desolv_hb_score_field

Hydrogen Bond Score Field
Output field with the hydrogen bond score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
Type : field_parameter::float
Required : False
Python Name : hydrogen_bond_score_field
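The ID and rank fields for Bemis Murcko cores follow a simple rule that can be sketched in a few lines. The example assumes the hits arrive as (core SMILES, score) pairs already sorted best first, so that the rank within a family is a running count. The function name `annotate_core_ids_and_ranks` and the tuple layout are hypothetical and do not reflect the floe's internal record format.

```python
def annotate_core_ids_and_ranks(hits):
    """Add (core_id, rank_in_family) to hits given in hit-list order.

    hits: iterable of (core_smiles, score) tuples, best scoring first.
    """
    core_ids = {}       # core SMILES -> integer ID, assigned on first sight
    family_sizes = {}   # core SMILES -> number of members seen so far
    annotated = []
    for core, score in hits:
        if core not in core_ids:
            core_ids[core] = len(core_ids) + 1   # IDs start at 1
        family_sizes[core] = family_sizes.get(core, 0) + 1
        annotated.append((core, score, core_ids[core], family_sizes[core]))
    return annotated


hit_list = [("c1ccccc1", -14.2), ("c1ccncc1", -13.0), ("c1ccccc1", -12.5)]
annotated_hits = annotate_core_ids_and_ranks(hit_list)
```

In `annotated_hits` the two benzene-core molecules share ID 1 with ranks 1 and 2, and the pyridine-core molecule gets ID 2 with rank 1. Passing the same records in a different order would assign different IDs, which is why the ID is order dependent while the core SMILES is not.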