Gigadock Warp

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Product-based/Gigadock

  • Role-based/Computational Chemist

  • Solution-based/Virtual-screening/DB Search/Gigadock

  • Task-based/Virtual Screening - Structure-Based

Description

Approximates a full Gigadock run with a mixture of FastROCS and docking.

  1. Docks a random subset of molecules

  2. Runs FastROCS on all input molecules using top scoring poses from the previous step as queries

  3. Creates a feature vector for each molecule with the FastROCS Shape and Color tanimotos from the prior step, the bits of a 4K Tree fingerprint and several basic 2D properties.

  4. Creates a regression model that predicts the docking score from the feature vector, trained on the molecules docked in the first step. The model is a neural net if more than 100M molecules are being docked, and a linear model if more than 1M and fewer than 100M molecules are being docked (the floe cannot dock fewer than 1M molecules; use the Gigadock floe in these cases).

  5. Predicts the scores of the un-docked molecules with the regression model.

  6. Docks the molecules that the regression model predicts to have the best scores.

  7. Outputs a hit list of the top scoring docked molecules.
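The model-selection rule in step 4 and the selection in step 6 can be sketched in Python as below. This is a minimal illustration, not the floe's implementation: the function names are hypothetical, the handling of exactly 1M or exactly 100M molecules is an assumption (the text only gives strict inequalities), the feature vector is shown as a plain concatenation of the step-3 inputs, and "best" is taken to mean the lowest predicted score, as with Chemgauss4 where more negative is better.

```python
from __future__ import annotations

import heapq

# Thresholds from the description above (molecule counts).
NEURAL_NET_THRESHOLD = 100_000_000
MINIMUM_MOLECULES = 1_000_000


def choose_model_type(n_molecules: int) -> str:
    """Hypothetical helper: pick the regression model family for a run (step 4)."""
    if n_molecules > NEURAL_NET_THRESHOLD:
        return "neural_net"
    # Assumption: exactly 1M or exactly 100M molecules is treated as the linear case.
    if n_molecules >= MINIMUM_MOLECULES:
        return "linear"
    raise ValueError("Fewer than 1M molecules: use the Gigadock floe instead")


def build_feature_vector(
    shape_tanimotos: list[float],
    color_tanimotos: list[float],
    fingerprint_bits: list[int],
    properties_2d: list[float],
) -> list[float]:
    """Hypothetical helper: concatenate the step-3 features into one vector."""
    return [
        *shape_tanimotos,
        *color_tanimotos,
        *(float(bit) for bit in fingerprint_bits),
        *properties_2d,
    ]


def select_for_final_dock(predicted: dict[str, float], k: int) -> list[str]:
    """Return the k molecule IDs with the lowest (best) predicted scores (step 6)."""
    return heapq.nsmallest(k, predicted, key=predicted.__getitem__)


print(choose_model_type(250_000_000))  # neural_net
print(select_for_final_dock({"mol1": -10.2, "mol2": -5.1, "mol3": -12.7}, 2))  # ['mol3', 'mol1']
```

Using `heapq.nsmallest` avoids fully sorting the predictions, which matters when the un-docked set runs to hundreds of millions of molecules.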

Promoted Parameters

Title in user interface (promoted name)

Inputs

Design Unit or Receptor Dataset(s) (init_input_dataset): Dataset with the design unit (DU) (or old-format receptor) to dock to. Multiple design units are allowed, up to a limit of 10 for the Hybrid docking method (see the ‘Docking Method’ parameter) and 2 otherwise. The behavior with multiple design units depends on the docking method. For ‘Fred’ or ‘FastFred’, each molecule is docked to each design unit and the results from the best scoring design unit are output, so docking time (and cost) scales roughly linearly with the number of design units. For ‘Hybrid’, each molecule is docked only into the design unit whose crystallographic bound ligand is most similar (by ROCS Combo Tanimoto) to the molecule being docked, and docking time (and cost) increases by roughly 5% per additional design unit.

  • Required

  • Type: data_source
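As a rough illustration of how the number of design units affects a run, the sketch below encodes the limits and cost scaling stated above. The function names are hypothetical, the cost is relative to docking against a single design unit, and the Hybrid selection assumes the ROCS Combo Tanimoto values between the molecule and each design unit's bound ligand have already been computed elsewhere.

```python
from __future__ import annotations


def max_design_units(method: str) -> int:
    """Limit stated above: 10 design units for Hybrid, 2 for the other methods."""
    return 10 if method == "Hybrid" else 2


def check_design_unit_count(method: str, n_units: int) -> None:
    limit = max_design_units(method)
    if not 1 <= n_units <= limit:
        raise ValueError(f"{method} accepts 1 to {limit} design units, got {n_units}")


def relative_docking_cost(method: str, n_units: int) -> float:
    """Approximate docking time/cost relative to a single design unit."""
    check_design_unit_count(method, n_units)
    if method == "Hybrid":
        # Each molecule is docked into one design unit: roughly +5% per additional unit.
        return 1.0 + 0.05 * (n_units - 1)
    # Fred / FastFred dock every molecule to every design unit: roughly linear.
    return float(n_units)


def pick_hybrid_design_unit(combo_by_unit: dict[str, float]) -> str:
    """Hybrid: choose the design unit whose bound ligand is most similar to the molecule."""
    return max(combo_by_unit, key=combo_by_unit.__getitem__)


print(relative_docking_cost("Fred", 2))
print(pick_hybrid_design_unit({"du_a": 0.82, "du_b": 1.31}))
```

The contrast is the point: two design units roughly double the cost for ‘Fred’, while even ten design units add only about 45% for ‘Hybrid’.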

Input Conformer Collection (input_conformer_collection): Input collection containing the molecules to dock. The collection should have been created by the ‘Prepare Giga Collections’ floe. Several large pre-generated third-party vendor docking collections can be made available to your organization upon request at no charge by e-mailing support@eyesopen.com (if your organization has already requested them, you already have access to these pre-generated collections). These collections are located in the ‘Organization Data->OpenEye Data->Gigadocking Collections’ folder, which also automatically contains several smaller collections and collections containing random subsets of the larger vendor collections.

  • Required

  • Type: collection_source

Outputs

Hit List Dataset (hit_list_dataset): Output dataset with the top scoring docked molecules.

  • Required

  • Type: dataset_out

  • Default: Gigadock Warp Hit List

FastROCS Query Poses Dataset (fastrocs_query_poses_dataset): Output dataset with the queries used by FastROCS. The queries are the cluster heads of the top scoring poses from the initial docking of a random subset of molecules from the input collection.

  • Required

  • Type: dataset_out

  • Default: Gigadock Warp FastROCS Queries

Output Design Unit(s) Dataset (output_design_units_dataset): Output dataset containing a copy of the design unit(s) docked to.

  • Required

  • Type: dataset_out

  • Default: Gigadock Warp Design Unit

Gigadock Warp Temporary Collection (gigadock_warp_temporary_collection): Name of the temporary collection to create.

  • Required

  • Type: collection_sink

  • Default: Temp Collection

Options

Hit List Size (hit_list_size): Size of the final hit list with the top scoring docked molecules.

  • Required

  • Type: integer

  • Default: 10000

Docking Method (docking_method): Docking method to use. ‘Fred’ is the default structure-based scoring method. ‘Hybrid’ biases the docking towards poses that overlay the crystallographic ligand (the design unit(s) must have a bound ligand). ‘FastFred’ is a faster variant of ‘Fred’ (typically ~2x faster for single design units) that samples less and uses a simpler scoring function in the initial stages of docking.

  • Type: string

  • Default: Fred

  • Choices: [‘Fred’, ‘Hybrid’, ‘Fast Fred’]

Options: Model Training

Fraction Train (fraction_train): The fraction of the input molecules that will be docked to create the training data for the score regression model. Increasing this value will increase the cost of the floe and decrease the minimum number of input molecules the floe requires to run (at the default value of 0.01 the minimum is 934600). Legal values for this parameter are between 0.001 and 0.1.

  • Required

  • Type: decimal

  • Default: 0.01
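The only data point given above is that the minimum input size is 934600 at the default Fraction Train of 0.01, which corresponds to 9346 docked training molecules. The sketch below assumes, as an extrapolation not confirmed by this documentation, that the floe needs a fixed minimum number of training molecules, so the minimum input size scales inversely with Fraction Train. The floe may enforce other limits as well, so treat the results as rough estimates. The helper names are hypothetical; exact `Fraction` arithmetic avoids floating-point rounding in the division.

```python
from __future__ import annotations

import math
from fractions import Fraction

# Documented: at Fraction Train = 0.01 the floe needs at least 934600 input molecules.
DOCUMENTED_MINIMUM_INPUT = 934_600
DOCUMENTED_FRACTION = Fraction(1, 100)

# Assumption: the minimum comes from a fixed minimum training-set size (9346 molecules).
MIN_TRAINING_MOLECULES = DOCUMENTED_MINIMUM_INPUT * DOCUMENTED_FRACTION


def parse_fraction_train(value: float) -> Fraction:
    """Convert to an exact fraction and check the documented legal range [0.001, 0.1]."""
    fraction = Fraction(str(value))
    if not Fraction(1, 1000) <= fraction <= Fraction(1, 10):
        raise ValueError(f"Fraction Train must be between 0.001 and 0.1, got {value}")
    return fraction


def estimated_minimum_input(fraction_train: float) -> int:
    """Estimated minimum collection size under the inverse-scaling assumption."""
    return math.ceil(MIN_TRAINING_MOLECULES / parse_fraction_train(fraction_train))


def training_set_size(n_input: int, fraction_train: float) -> int:
    """Approximate number of molecules docked to build the training data."""
    return math.floor(n_input * parse_fraction_train(fraction_train))


for value in (0.001, 0.01, 0.1):
    print(value, estimated_minimum_input(value))
```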

Final Dock Fraction (final_dock_fraction): The number of top scoring molecules from FastROCS that are passed to the final docking step is equal to this fraction of the size of the input collection(s). Increasing this value will increase the cost of the floe. When docking fewer than ~100M molecules, it is recommended that this value be set to 0.08. The legal values for this parameter are between 0.01 and 0.1.

  • Required

  • Type: decimal

  • Default: 0.04
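To see how Final Dock Fraction interacts with the input size, the sketch below computes the number of molecules passed to the final docking step and a rough total docked count. It assumes, based on steps 5 and 6 of the Description, that the final docking covers only molecules not already docked for training, so the two counts add; rounding down to whole molecules is also an assumption, and the helper names are hypothetical.

```python
from __future__ import annotations

import math
from fractions import Fraction

SMALL_RUN_THRESHOLD = 100_000_000  # "fewer than ~100M molecules"


def final_dock_count(n_input: int, final_dock_fraction: float) -> int:
    """Number of molecules passed to the final docking step."""
    fraction = Fraction(str(final_dock_fraction))
    if not Fraction(1, 100) <= fraction <= Fraction(1, 10):
        raise ValueError("Final Dock Fraction must be between 0.01 and 0.1")
    return math.floor(n_input * fraction)


def suggested_final_dock_fraction(n_input: int) -> float:
    """0.08 is recommended below ~100M molecules; otherwise keep the 0.04 default."""
    return 0.08 if n_input < SMALL_RUN_THRESHOLD else 0.04


def estimated_total_docked(
    n_input: int, fraction_train: float = 0.01, final_dock_fraction: float = 0.04
) -> int:
    """Training docking plus final docking, assuming the two sets do not overlap."""
    training = math.floor(n_input * Fraction(str(fraction_train)))
    return training + final_dock_count(n_input, final_dock_fraction)


n = 50_000_000
fraction = suggested_final_dock_fraction(n)
print(fraction, final_dock_count(n, fraction), estimated_total_docked(n, 0.01, fraction))
# 0.08 4000000 4500000
```

With the defaults, a run therefore docks roughly 5% of the input collection instead of all of it, which is the source of the savings relative to a full Gigadock run.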

Options: Hardware

Training Instance Disk Space (training_instance_disk_space): Required disk space on the machine(s) that will do model training. If this value is set too low for the number of input molecules, the floe will fail quickly with an error indicating the required setting. Higher values may result in longer run times, because fewer AWS GPU instances will have the required amount of disk space and these may be in short supply on AWS. The total required disk space can be reduced by reducing the fraction of molecules that will be used as training data (see the ‘Options: Model Training -> Fraction Train’ parameter).

  • Type: decimal

  • Default: 3355443.2

FastROCS Instance Types (fastrocs_instance_types): Instance type used by FastROCS. If unspecified, an instance type will be chosen automatically.

  • Type: string

  • Default: !cdns,!g4dn.metal,!g5.12xlarge,!g5.24xlarge,!g5.48xlarge,!g4dn.12xlarge,!g3s.,!p3.

FastROCS Spot Policy (fastrocs_spot_policy): Controls whether spot or non-spot instances will be used for FastROCS cubes. In general, spot instances are cheaper than non-spot instances and using them will reduce the cost of the floe; however, spot instances can be in short supply, so using them may increase the run time of the floe. The settings of this parameter have the following meanings. Allowed: Use both spot and non-spot instances. Required: Only spot instances will be used. Preferred: The floe will preferentially use spot instances, but non-spot instances will be used if spot instances are in short supply. NotPreferred: The floe will preferentially use non-spot instances, but spot instances will be used if non-spot instances are in short supply. Prohibited: Only non-spot instances will be used.

  • Type: string

  • Default: Preferred

  • Choices: [‘Allowed’, ‘Preferred’, ‘NotPreferred’, ‘Prohibited’, ‘Required’]
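The five spot policies can be summarized as a table of permitted pools plus an optional preference. The sketch below is a hypothetical model of the behavior described above, not Orion's scheduler: `choose_pool` returns the pool a new FastROCS instance would come from given current availability, or `None` when no permitted pool is available.

```python
from __future__ import annotations

from typing import Optional

SPOT = "spot"
NON_SPOT = "non-spot"

# policy -> (pools that may be used, pool to favor when both are available)
SPOT_POLICIES: dict[str, tuple[frozenset[str], Optional[str]]] = {
    "Allowed": (frozenset({SPOT, NON_SPOT}), None),
    "Preferred": (frozenset({SPOT, NON_SPOT}), SPOT),
    "NotPreferred": (frozenset({SPOT, NON_SPOT}), NON_SPOT),
    "Prohibited": (frozenset({NON_SPOT}), None),
    "Required": (frozenset({SPOT}), None),
}


def choose_pool(policy: str, spot_available: bool, non_spot_available: bool) -> Optional[str]:
    """Pick the pool for the next instance, or None if no permitted pool is available."""
    permitted, preferred = SPOT_POLICIES[policy]
    available = [
        pool
        for pool, ok in ((SPOT, spot_available), (NON_SPOT, non_spot_available))
        if ok and pool in permitted
    ]
    if not available:
        return None
    if preferred in available:
        return preferred
    # No preference, or the preferred pool is in short supply: any permitted pool will do.
    return available[0]


print(choose_pool("Preferred", spot_available=False, non_spot_available=True))
```

‘Preferred’ and ‘NotPreferred’ only differ when both pools have capacity; ‘Required’ and ‘Prohibited’ can leave the cubes waiting when their single permitted pool is short.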

Input Fields

Input Conformers Field (input_conformers_field): Field on the input collection that holds the conformers of the molecules to be docked. If unspecified the default primary molecule field will be used.

  • Type: field_parameter::mol

Output Fields

Docked Score Field (docked_score_field): Field on the output hit list and raw results collection that will contain the docked score.

  • Type: field_parameter::float

  • Default: Chemgauss4

Docked Pose Field (docked_pose_field): Field on the output hit list and raw results collection that will hold the docked pose. If unspecified the default primary mol field will be used.

  • Type: field_parameter::mol

Steric Score Field (steric_score_field): Output field with the steric score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Clash Score Field (clash_score_field): Output field with the clash score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Protein Desolv Score Field (protein_desolv_score_field): Output field with the protein desolvation score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Ligand Desolv Score Field (ligand_desolv_score_field): Output field with the ligand desolvation score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Ligand Desolv HB Score Field (ligand_desolv_hb_score_field): Output field with the ligand desolvation hydrogen bond score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Hydrogen Bond Score Field (hydrogen_bond_score_field): Output field with the hydrogen bond score component of the docked molecule. This field will only be created on the output records if this parameter is specified.

  • Type: field_parameter::float

Design Unit Field (design_unit_field): Field on the ‘Output Design Unit(s) Dataset’ that will contain a copy of the design unit(s).

  • Type: field_parameter

  • Default: Design Unit

Design Unit ID Field (design_unit_id_field): Field on the ‘Output Design Unit(s) Dataset’ with a unique (for this run) identifier of the design unit.

  • Type: field_parameter::int

  • Default: Design Unit ID

Design Unit Link Field (design_unit_link_field): Field on the ‘Output Design Unit(s) Dataset’ containing a link to the design unit.

  • Type: field_parameter::link

  • Default: Design Unit Link

Bemis Murcko Field (bemis_murcko_field): Output field for the Bemis Murcko core SMILES.

  • Type: field_parameter::string

  • Default: Bemis Murcko SMILES

Bemis Murcko ID Field (bemis_murcko_id_field): Output field with an integer ID of the Bemis Murcko core. All molecules with the same Bemis Murcko core SMILES will have the same ID, and those with different Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Bemis Murcko core is seen. Thus, unlike the Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.

  • Type: field_parameter::int

  • Default: Bemis Murcko ID
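The ID scheme above amounts to numbering cores in order of first appearance. A minimal Python sketch, assuming the core SMILES strings have already been computed (the toolkit call that derives them is not shown, and `assign_core_ids` is a hypothetical name); the same logic applies to the Hetero Bemis Murcko ID.

```python
from __future__ import annotations

from collections.abc import Iterable


def assign_core_ids(core_smiles: Iterable[str]) -> list[int]:
    """Give each distinct core an ID starting at 1, in order of first appearance."""
    ids: dict[str, int] = {}
    result: list[int] = []
    for smiles in core_smiles:
        if smiles not in ids:
            ids[smiles] = len(ids) + 1  # next unused ID
        result.append(ids[smiles])
    return result


cores = ["c1ccccc1", "c1ccncc1", "c1ccncc1"]
print(assign_core_ids(cores))            # [1, 2, 2]
print(assign_core_ids(reversed(cores)))  # [1, 1, 2]
```

Reversing the record order swaps which core receives ID 1, which is why the ID, unlike the core SMILES, is only meaningful within a single run.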

Bemis Murcko Rank Field (bemis_murcko_rank_field): Integer field with the rank of the molecule within its Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Bemis Murcko core SMILES).

  • Type: field_parameter::int

  • Default: Bemis Murcko Rank
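The rank within a Bemis Murcko family can be computed by sorting the hit list by score and counting, per core, how many molecules have been seen so far. This sketch assumes lower scores are better (as with Chemgauss4), ranks start at 1, and ties keep their input order; these are assumptions, and `rank_within_family` is a hypothetical name. The same approach applies to the Hetero Bemis Murcko rank.

```python
from __future__ import annotations

from collections import defaultdict


def rank_within_family(hits: list[tuple[str, float]]) -> list[int]:
    """For (core_smiles, score) pairs, return each molecule's rank within its core family.

    Ranks are aligned with the input order; rank 1 is the best (lowest) score in a family.
    """
    order = sorted(range(len(hits)), key=lambda i: hits[i][1])  # stable sort, best first
    seen: defaultdict[str, int] = defaultdict(int)
    ranks = [0] * len(hits)
    for i in order:
        core = hits[i][0]
        seen[core] += 1
        ranks[i] = seen[core]
    return ranks


hits = [("A", -10.0), ("B", -8.0), ("A", -12.0), ("A", -9.0)]
print(rank_within_family(hits))  # [2, 1, 1, 3]
```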

Hetero Bemis Murcko Field (hetero_bemis_murcko_field): Output field for the Hetero Bemis Murcko core SMILES.

  • Type: field_parameter::string

  • Default: Hetero Bemis Murcko

Hetero Bemis Murcko ID Field (hetero_bemis_murcko_id_field): Output field with an integer ID of the Hetero Bemis Murcko core. All molecules with the same Hetero Bemis Murcko core SMILES will have the same ID, and those with different Hetero Bemis Murcko core SMILES will have different IDs. The IDs start at 1 and increment by 1 each time a new Hetero Bemis Murcko core is seen. Thus, unlike the Hetero Bemis Murcko core SMILES itself, this integer ID depends on the order in which the records are passed.

  • Type: field_parameter::int

  • Default: Hetero Bemis Murcko ID

Hetero Bemis Murcko Rank Field (hetero_bemis_murcko_rank_field): Integer field with the rank of the molecule within its Hetero Bemis Murcko family (i.e., the rank the molecule would have if the hit list contained only the molecules with the same Hetero Bemis Murcko core SMILES).

  • Type: field_parameter::int

  • Default: Hetero Bemis Murcko Rank