Run Gigadock on Billions of Molecules

Context

A Gigadock run on billions of molecules typically costs tens of thousands of dollars. This How-to Guide describes best practices for running the Gigadock Floe at that scale.

Note

We recommend completing the Dock One Million Molecules with the Gigadock Floe tutorial before attempting a full Gigadock run.

Procedure

  1. Choose molecules for docking by one of these two methods:

  2. Prepare receptor(s):

    • Prepare the protein structure for docking using the SPRUCE - Protein Preparation Floe. If there are multiple protein structures, prepare them all.

  3. Do a cost estimate for each receptor (see the Estimate the Cost of a Gigadock Run How-to Guide).

  4. Select a single best receptor.

    • If there are molecules known to be active against the target, use the Determine if Gigadock Will Give Good Results with a Given Receptor How-to Guide to assess the performance of each prepared receptor. Choose the receptor with the highest AUC as the single best receptor.

    • If there are no known actives for the target, choose the receptor with the lowest cost estimate.

  5. Launch the Gigadock Floe with the following parameters:

    • Job Properties

      • Output Folder: Creating a dedicated folder for the output is recommended.

      • Job Cost Limits

        • Email me if this job cost exceeds: Set to the cost estimate from Step 3.

        • Terminate this job if the cost exceeds: Set to the cost estimate from Step 3 plus 75%.

      Warning

      It is important to set this value well above the Cost Threshold ($USD) parameter. A floe stopped by the Terminate this job if the cost exceeds limit halts almost immediately, but it cannot be restarted and all computation done before termination is lost. A floe that stops because it passed the Cost Threshold ($USD) shuts down cleanly and can be restarted, but it can overrun the threshold by 10–25%.

    • Inputs

      • Receptor Dataset: Select the receptor chosen in the previous step.

      • Input Conformer Collection: This parameter is semi-optional. Select the collection chosen or prepared in Step 1.

    • Options

      • Docking Methods: Set to the same setting used in the cost estimate from Step 3.

      • Cost Threshold ($USD): Set to the cost estimate from Step 3 plus 15%.

  6. The Gigadock Floe will automatically output a 10K hit list. If a larger hit list is desired, follow the Generate a Gigadock Hit List Dataset of More Than 10K Molecules How-to Guide to create one.

  7. Delete the Raw Results collection created by the Gigadock Floe once you are finished with it.

    Note

    The Raw Results collection contains the scores and structures of every docked molecule. Its primary purpose is to allow the creation of hit lists larger than 10K from the Gigadock run. Retaining it incurs a storage cost (typically ~$100/month), so we strongly recommend deleting it once the desired hit list has been created.
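
The receptor-selection rule in Step 4 can be expressed as a short decision function. This is a minimal Python sketch, not part of the Gigadock Floe or any Orion API: the ReceptorResult record and select_best_receptor function are hypothetical names, and the sketch assumes you have already collected each receptor's cost estimate (Step 3) and, where actives are known, its AUC.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ReceptorResult:
    """Hypothetical record for one prepared receptor."""
    name: str
    cost_estimate_usd: float     # from the Step 3 cost estimate
    auc: Optional[float] = None  # from the known-actives check, if one was run


def select_best_receptor(results: Sequence[ReceptorResult]) -> ReceptorResult:
    """Apply the Step 4 rule (hypothetical helper, not an Orion API)."""
    if not results:
        raise ValueError("no prepared receptors to choose from")
    assessed = [r for r in results if r.auc is not None]
    if assessed:
        # Known actives: the receptor with the highest AUC wins.
        return max(assessed, key=lambda r: r.auc)
    # No known actives: the receptor with the lowest cost estimate wins.
    return min(results, key=lambda r: r.cost_estimate_usd)
```

Pass one record per prepared receptor. If any receptor has an AUC, only the assessed receptors are compared; the cost estimate decides only when no known actives are available.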

The Gigadock Floe typically completes in 18–36 hours.
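
The cost limits in Step 5 are all derived from the Step 3 estimate, and the 10–25% overrun described in the Warning is why the termination limit sits well above the Cost Threshold ($USD). The Python sketch below works through that arithmetic. The gigadock_cost_limits function and CostLimits record are hypothetical helpers for illustration, not Orion settings or APIs, and the check assumes the worst case of a 25% overrun.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostLimits:
    email_at: float               # "Email me if this job cost exceeds"
    cost_threshold: float         # "Cost Threshold ($USD)" floe option
    terminate_at: float           # "Terminate this job if the cost exceeds"
    worst_case_clean_stop: float  # threshold plus the assumed maximum overrun


def gigadock_cost_limits(estimate_usd: float, max_overrun: float = 0.25) -> CostLimits:
    """Derive the Step 5 cost settings from a Step 3 estimate (hypothetical helper)."""
    if estimate_usd <= 0:
        raise ValueError("cost estimate must be positive")
    email_at = round(float(estimate_usd), 2)        # the estimate itself
    cost_threshold = round(estimate_usd * 1.15, 2)  # estimate plus 15%
    terminate_at = round(estimate_usd * 1.75, 2)    # estimate plus 75%
    # A clean shutdown can overrun the Cost Threshold; assume up to max_overrun.
    worst_case = round(cost_threshold * (1 + max_overrun), 2)
    # The hard limit must sit above the worst clean stop, or an overrunning
    # floe would be killed and could not be restarted.
    if terminate_at <= worst_case:
        raise ValueError("termination limit would cut off a clean shutdown")
    return CostLimits(email_at, cost_threshold, terminate_at, worst_case)


if __name__ == "__main__":
    limits = gigadock_cost_limits(20_000)
    print(f"Email me if this job cost exceeds:      ${limits.email_at:,.2f}")
    print(f"Cost Threshold ($USD):                  ${limits.cost_threshold:,.2f}")
    print(f"Terminate this job if the cost exceeds: ${limits.terminate_at:,.2f}")
    print(f"Worst-case clean stop:                  ${limits.worst_case_clean_stop:,.2f}")
```

For example, a $20,000 estimate gives an email alert at $20,000, a Cost Threshold of $23,000, and a termination limit of $35,000. Even a 25% overrun of the threshold ($28,750) stays below the hard limit, so the floe can still shut down cleanly and remain restartable.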