Gigadock Warp

Description

Approximates a full Gigadock run with a mixture of AI, FastROCS and docking.

Details

Title : Gigadock Warp
Tags : Large Scale Floes Giga-Docking FRED HYBRID Docking Chemgauss4 Virtual Screening Model Pytorch Floes Prediction
Python Name : gigadock_warp

Parameters

Inputs

  • Design Unit or Receptor Dataset(s) Dataset with the design unit (DU) (or old-format receptor) to dock to. Multiple design units are allowed, up to a limit of 10 for the Hybrid docking method (see the ‘Docking Method’ parameter) and 2 otherwise. The behavior with multiple design units depends on the docking method. For ‘Fred’ or ‘FastFred’, each molecule is docked to each design unit and the result from the best-scoring design unit is output, so docking time (and cost) scales roughly linearly with the number of design units. For ‘Hybrid’, each molecule is docked only into the design unit whose crystallographic bound ligand is most similar (by ROCS Combo Tanimoto) to the molecule being docked, and docking time (and cost) increases by roughly 5% per additional design unit.
    Type : data_source
    Required : False
    Python Name : init_input_dataset
  • Input Conformer Collection Input collection containing the molecules to dock. The collection should have been created by the ‘Prepare Giga Collections’ floe. Several large, pre-generated third-party vendor docking collections can be made available to your organization at no charge on request by e-mailing support@eyesopen.com (if your organization has already requested them, you already have access to these pre-generated collections). These collections are located in the ‘Organization Data->OpenEye Data->Gigadocking Collections’ folder, which also automatically contains several smaller collections and collections containing random subsets of the larger vendor collections.
    Type : collection_source
    Required : True
    Python Name : input_conformer_collection
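
The per-method handling of multiple design units described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the floe's implementation: `dock` and `similarity` are hypothetical callables standing in for a docking run (returning a Chemgauss4-style score, where lower is better) and a ROCS Combo Tanimoto calculation, and all function names are invented for this example.

```python
from typing import Callable, Hashable, Sequence, Tuple

# Design-unit limits quoted in the parameter description.
HYBRID_MAX_DESIGN_UNITS = 10
OTHER_MAX_DESIGN_UNITS = 2

DockFn = Callable[[Hashable, Hashable], float]        # hypothetical: (mol, du) -> score
SimilarityFn = Callable[[Hashable, Hashable], float]  # hypothetical: (mol, du) -> Tanimoto


def check_design_unit_count(method: str, n_design_units: int) -> None:
    """Raise ValueError if more design units are supplied than the method allows."""
    limit = HYBRID_MAX_DESIGN_UNITS if method == "Hybrid" else OTHER_MAX_DESIGN_UNITS
    if n_design_units > limit:
        raise ValueError(f"{method} allows at most {limit} design units, got {n_design_units}")


def dock_fred_style(mol: Hashable, design_units: Sequence[Hashable],
                    dock: DockFn) -> Tuple[Hashable, float]:
    """Fred/FastFred: dock into every design unit, keep the best (lowest) score."""
    results = [(du, dock(mol, du)) for du in design_units]  # one docking run per DU
    return min(results, key=lambda du_score: du_score[1])


def dock_hybrid_style(mol: Hashable, design_units: Sequence[Hashable],
                      similarity: SimilarityFn, dock: DockFn) -> Tuple[Hashable, float]:
    """Hybrid: dock only into the DU whose bound ligand is most similar to mol."""
    best_du = max(design_units, key=lambda du: similarity(mol, du))
    return best_du, dock(mol, best_du)  # a single docking run for this molecule


def relative_docking_cost(method: str, n_design_units: int) -> float:
    """Rough cost relative to a single design unit, per the scaling described above."""
    if method == "Hybrid":
        return 1.0 + 0.05 * (n_design_units - 1)  # ~5% per additional design unit
    return float(n_design_units)  # roughly linear in the number of design units
```

For example, with toy scores of -8.0 for DU_A and -10.5 for DU_B, the Fred-style path returns DU_B, while the Hybrid-style path returns whichever design unit has the more similar bound ligand, even if another design unit would have scored better.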

Outputs

  • Hit List Dataset Output dataset with the top scoring docked molecules.
    Type : dataset_out
    Required : True
    Default : Gigadock Warp Hit List
    Python Name : hit_list_dataset
  • FastROCS Query Poses Dataset Output dataset with the queries used by FastROCS. The queries are the cluster heads of the top-scoring poses from the initial docking of a random subset of molecules from the input collection.
    Type : dataset_out
    Required : True
    Default : Gigadock Warp FastROCS Queries
    Python Name : fastrocs_query_poses_dataset
  • Graphsim Queries Dataset Output dataset with the queries used by GraphSim. The queries are the cluster heads of the top-scoring poses from the initial docking of a random subset of molecules from the input collection.
    Type : dataset_out
    Required : True
    Default : Gigadock Warp Graphsim Queries
    Python Name : graphsim_queries_dataset
  • Output Design Unit(s) Dataset Output dataset containing a copy of the design unit(s) docked to.
    Type : dataset_out
    Required : True
    Default : Gigadock Warp Design Unit
    Python Name : output_design_units_dataset
  • Gigadock Warp Temporary Collection Name of the temporary collection the floe creates.
    Type : collection_sink
    Required : True
    Default : Temp Collection
    Python Name : gigadock_warp_temporary_collection
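
The FastROCS and GraphSim query datasets are described as cluster heads of the top-scoring poses from docking a random subset of the input. The parameter reference does not say which clustering method is used, so the sketch below substitutes simple leader clustering: poses are visited best score first, and each pose becomes a new head unless it is at least `threshold` similar to an existing head. The `similarity` callable, `threshold`, and `top_n` are hypothetical; in practice the similarity might be a shape or fingerprint Tanimoto.

```python
from typing import Callable, List, Sequence, TypeVar

Pose = TypeVar("Pose")


def leader_cluster_heads(poses: Sequence[Pose], scores: Sequence[float],
                         similarity: Callable[[Pose, Pose], float],
                         threshold: float, top_n: int) -> List[Pose]:
    """Return cluster heads among the top_n best-scoring (lowest-score) poses."""
    # Rank by score only (stable sort), so poses never need to be comparable.
    ranked = [pose for _, pose in sorted(zip(scores, poses), key=lambda sp: sp[0])][:top_n]
    heads: List[Pose] = []
    for pose in ranked:
        # Start a new cluster unless this pose is close to an existing head.
        if all(similarity(pose, head) < threshold for head in heads):
            heads.append(pose)
    return heads
```

Leader clustering is order dependent, which is why the poses are sorted by score first: the best-scoring member of each cluster becomes its head.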

Options

  • Hit List Size Size of the final hit list with the top scoring docked molecules.
    Type : integer
    Required : True
    Default : 10000
    Range : 0 to 100000
    Python Name : hit_list_size
  • Docking Method Docking method to use. ‘Fred’ is the default structure-based scoring method. ‘Hybrid’ biases the docking towards poses that overlay the crystallographic ligand (the design unit(s) must have a bound ligand). ‘FastFred’ is a faster variant of ‘Fred’ (typically ~2x faster for single design units) that samples less and uses a simpler scoring function in the initial stages of docking.
    Type : string
    Required : False
    Default : Fred
    Choices : Fred, Hybrid, Fast Fred
    Python Name : docking_method
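
The ‘Hit List Size’ and ‘Docking Method’ options above can be illustrated with a small sketch that validates them against the documented range and choices and then keeps the best-scoring molecules. This assumes that a lower Chemgauss4 score is better (as for FRED scores) and that records arrive as hypothetical `(name, score)` pairs; the function names are invented for illustration.

```python
import heapq
from typing import Iterable, List, Tuple

# Option limits and choices as listed in the parameter reference above.
HIT_LIST_MIN, HIT_LIST_MAX = 0, 100000
DOCKING_METHODS = ("Fred", "Hybrid", "Fast Fred")


def validate_options(hit_list_size: int, docking_method: str = "Fred") -> None:
    """Reject option values outside the documented range or choices."""
    if not HIT_LIST_MIN <= hit_list_size <= HIT_LIST_MAX:
        raise ValueError(f"Hit List Size must be in [{HIT_LIST_MIN}, {HIT_LIST_MAX}]")
    if docking_method not in DOCKING_METHODS:
        raise ValueError(f"Docking Method must be one of {DOCKING_METHODS}")


def top_hits(scored: Iterable[Tuple[str, float]],
             hit_list_size: int) -> List[Tuple[str, float]]:
    """Return the hit_list_size best (lowest-score) records, best first."""
    # For a streamed (unsized) iterable, nsmallest holds only about
    # hit_list_size items at a time instead of sorting everything.
    return heapq.nsmallest(hit_list_size, scored, key=lambda rec: rec[1])
```

For example, `top_hits([("a", -5.0), ("b", -9.0), ("c", -7.0)], 2)` keeps `b` and `c`, in that order.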

Options: Hardware

These parameters control the specifications of the AWS GPU instance(s) used to train the score model. In general there is no need to change them unless the floe fails with an error indicating that one of these parameters must be set to a particular value, or OpenEye support instructs you to change them.

  • Training Instance Disk Space Required disk space on the machine(s) that will do model training. If this value is set too low for the number of input molecules, the floe will fail quickly with an error indicating the required setting. Higher values may result in longer run times because fewer AWS GPU instance types have the required amount of disk space, and these may be in short supply on AWS.
    Type : decimal
    Required : True
    Default : 500000
    Python Name : training_instance_disk_space
  • Training Instance Types Instance type(s) used for model training. Note that the instance type must have a local SSD drive.
    Type : string
    Required : False
    Python Name : training_instance_types

Input Fields

These parameters specify the fields on the input datasets and/or collections from which this floe reads data. Note that parameters identifying a molecule field are special: if left empty, the floe reads the molecule from the primary (i.e., default) molecule field on the input record. The primary molecule field of a dataset can be identified in the UI by the star on its field badge.

  • Input Conformers Field Field on the input collection that holds the conformers of the molecules to be docked. If unspecified the default primary molecule field will be used.
    Type : field_parameter::mol
    Required : False
    Python Name : input_conformers_field
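
The fallback rule for molecule-field parameters (an empty parameter means the primary molecule field) can be sketched as below. The `Record` class is a hypothetical stand-in for an Orion record, not the Orion API; it only models named fields plus a designated primary molecule field, and the field names in it are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Record:
    """Hypothetical stand-in for a record: named fields plus a primary molecule field."""
    fields: Dict[str, Any] = field(default_factory=dict)
    primary_mol_field: str = "Molecule"  # assumed name of the starred field


def read_mol(record: Record, field_name: Optional[str] = None) -> Any:
    # An empty or unset molecule-field parameter falls back to the primary field.
    return record.fields[field_name or record.primary_mol_field]


def write_mol(record: Record, mol: Any, field_name: Optional[str] = None) -> None:
    # The same fallback applies when the floe writes a molecule to its output.
    record.fields[field_name or record.primary_mol_field] = mol
```

Because reading and writing share the same fallback, leaving molecule fields empty on both ends keeps floes compatible without extra configuration.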

Output Fields

These parameters allow the user to change the default output fields this floe creates in the output datasets and/or collections. Note that parameters identifying a molecule field are special: if a molecule field is left empty, the floe writes the molecule to the primary (i.e., default) molecule field of the record. The primary molecule field of a dataset can be identified in the UI by the star on its field badge. CAUTION: If these parameters are modified, the same modifications must also be applied to the input fields of downstream floes that read fields written by this floe. If a downstream floe does not support specifying its input fields, it may not work properly with the output of this floe when these settings are modified.

  • Docked Score Field Field on the output hit list and raw results collection that will contain the docked score
    Type : field_parameter::float
    Required : True
    Default : Chemgauss4
    Python Name : docked_score_field
  • Docked Pose Field Field on the output hit list and raw results collection that will hold the docked pose. If unspecified, the default primary molecule field will be used.
    Type : field_parameter::mol
    Required : False
    Python Name : docked_pose_field
  • Steric Score Field Output field with the steric score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
    Type : field_parameter::float
    Required : False
    Python Name : steric_score_field
  • Clash Score Field Output field with the clash score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
    Type : field_parameter::float
    Required : False
    Python Name : clash_score_field
  • Protein Desolv Score Field Output field with the protein desolvation score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
    Type : field_parameter::float
    Required : False
    Python Name : protein_desolv_score_field
  • Ligand Desolv Score Field Output field with the ligand desolvation score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
    Type : field_parameter::float
    Required : False
    Python Name : ligand_desolv_score_field
  • Ligand Desolv HB Score Field Output field with the ligand desolvation hydrogen bond score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
    Type : field_parameter::float
    Required : False
    Python Name : ligand_desolv_hb_score_field
  • Hydrogen Bond Score Field Output field with the hydrogen bond score component of the docked molecule. This field will only be created on the output records if this parameter is specified.
    Type : field_parameter::float
    Required : False
    Python Name : hydrogen_bond_score_field
  • Design Unit Field Field on the ‘Output Design Unit(s) Dataset’ that will contain a copy of the design unit(s).
    Type : field_parameter
    Required : False
    Default : Design Unit
    Python Name : design_unit_field
  • Design Unit ID Field Field on the ‘Output Design Unit(s) Dataset’ with a unique (for this run) identifier of the design unit
    Type : field_parameter::int
    Required : True
    Default : Design Unit ID
    Python Name : design_unit_id_field
  • Design Unit Link Field Field on the ‘Output Design Unit(s) Dataset’ containing a link to the design unit
    Type : field_parameter::link
    Required : True
    Default : Design Unit Link
    Python Name : design_unit_link_field
  • Bemis Murcko Field Output field for the Bemis Murcko core SMILES.
    Type : field_parameter::string
    Required : False
    Default : Bemis Murcko SMILES
    Python Name : bemis_murcko_field
  • Bemis Murcko ID Field Output field with an integer ID of the Bemis Murcko core. All molecules with the same Bemis Murcko core SMILES have the same ID, and those with different Bemis Murcko core SMILES have different IDs. The IDs start at 1 and increment by 1 each time a new Bemis Murcko core is seen. Thus, unlike the Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.
    Type : field_parameter::int
    Required : False
    Default : Bemis Murcko ID
    Python Name : bemis_murcko_id_field
  • Bemis Murcko Rank Field Integer field with the rank of the molecule within its Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Bemis Murcko core SMILES).
    Type : field_parameter::int
    Required : False
    Default : Bemis Murcko Rank
    Python Name : bemis_murcko_rank_field
  • Hetero Bemis Murcko Field Output field for the Hetero Bemis Murcko core SMILES.
    Type : field_parameter::string
    Required : False
    Default : Hetero Bemis Murcko
    Python Name : hetero_bemis_murcko_field
  • Hetero Bemis Murcko ID Field Output field with an integer ID of the Hetero Bemis Murcko core. All molecules with the same Hetero Bemis Murcko core SMILES have the same ID, and those with different Hetero Bemis Murcko core SMILES have different IDs. The IDs start at 1 and increment by 1 each time a new Hetero Bemis Murcko core is seen. Thus, unlike the Hetero Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.
    Type : field_parameter::int
    Required : False
    Default : Hetero Bemis Murcko ID
    Python Name : hetero_bemis_murcko_id_field
  • Hetero Bemis Murcko Rank Field Integer field with the rank of the molecule within its Hetero Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Hetero Bemis Murcko core SMILES).
    Type : field_parameter::int
    Required : False
    Default : Hetero Bemis Murcko Rank
    Python Name : hetero_bemis_murcko_rank_field
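
The ID and rank fields described above (for both the Bemis Murcko and Hetero Bemis Murcko cores) can be illustrated with a short sketch. It assumes the core SMILES strings have already been computed and that the hit list is ordered by score with lower being better; the function names are hypothetical, and ties are broken by input order.

```python
from typing import Dict, List, Sequence


def assign_core_ids(cores: Sequence[str]) -> List[int]:
    """Order-dependent IDs: 1 for the first core seen, +1 for each new core."""
    ids: Dict[str, int] = {}
    out: List[int] = []
    for core in cores:
        if core not in ids:
            ids[core] = len(ids) + 1  # next unused ID
        out.append(ids[core])
    return out


def rank_within_core(cores: Sequence[str], scores: Sequence[float]) -> List[int]:
    """Rank of each molecule among molecules sharing its core (1 = lowest score)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # stable: ties keep input order
    seen: Dict[str, int] = {}
    ranks = [0] * len(scores)
    for i in order:
        seen[cores[i]] = seen.get(cores[i], 0) + 1
        ranks[i] = seen[cores[i]]
    return ranks
```

With cores `["A", "B", "A", "C", "B"]`, `assign_core_ids` gives `[1, 2, 1, 3, 2]`, but passing the same records in reverse order gives core A the ID 3, which is exactly the order dependence the field descriptions warn about. The per-core ranks do not depend on record order except for tied scores.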