Run Giga Docking on Billions of Molecules

Context

A Giga Docking run on billions of molecules typically costs tens of thousands of dollars. This howto describes best practices for carrying out a Giga Docking run.

Note

We recommend you complete the Dock One Million Molecules with Gigadock Floe tutorial before doing a full Giga Docking run.

Procedure

  1. Choose the molecules for docking. Either

  2. Prepare receptor(s)

    Prepare the protein structure for docking using Spruce (see the first part of the Dock One Million Molecules with Gigadock Floe tutorial). If there are multiple protein structures, prepare them all.

  3. Estimate the cost of a run against each receptor (see the Estimate the cost of a Giga Docking Run howto).

  4. Select a single best receptor.

    • If there are molecules known to be active against the target, use the Determine if Giga Docking Will Give Good Results with a Given Receptor howto to assess the performance of each prepared receptor. Choose the receptor with the highest AUC as the single best receptor.

    • If there are no known actives for the target, choose the receptor with the lowest cost estimate.

  5. Launch the Gigadock floe with the following parameters:

    • Job Properties

      • Output Folder : Select a dedicated folder for the output (recommended).

      • Job Cost Limits

        • Email me if this job cost exceeds : Set to the cost estimate from step #3.

        • Terminate this job if the cost exceeds : Set to the cost estimate from step #3 plus 75%.

        Warning

        It is important to set this value well above the Cost Threshold ($USD) parameter. A floe terminated by the Terminate this job if the cost exceeds limit stops almost immediately, but it cannot be restarted and all computation done before termination is lost. A floe that shuts down because it passed the Cost Threshold ($USD) shuts down cleanly and can be restarted, but it can overrun the threshold by 10-25%.

    • Promoted Parameters

      • Inputs

        • Receptor Dataset

          Select the receptor chosen in the previous step.

        • Input Conformer Collection (Semi-Optional)

          Select the collection chosen or prepared in step #1.

      • Options

        • Docking Methods : Set to the same setting used in the cost estimate from step #3.

        • Cost Threshold ($USD) : Set to the cost estimate from step #3 plus 15%.

  6. The floe automatically outputs a 10K-molecule hit list. If you need a larger hit list, follow the Generate a Giga Docking Hit List Dataset of More Than 10K Molecules howto once the Gigadock job completes.

  7. Delete the ‘Raw Results’ collection created by the Gigadock floe once you are done with it.

    Note

    The Raw Results collection contains the scores and structures of every docked molecule. Its primary use is to allow the creation of hit lists larger than 10K from the Giga Docking run. Retaining it incurs a storage cost (typically ~$100/month), which is why we recommend deleting it once the desired hit list has been created.
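Steps 4 and 5 reduce to a simple selection rule and a few percentages applied to the cost estimate. The Python sketch below illustrates them under stated assumptions: the `Receptor` and `CostSettings` classes, the `select_receptor` and `cost_settings` functions, and the receptor names and dollar figures are all hypothetical, and nothing here calls Orion or the Gigadock floe. The values it prints still have to be entered in the floe's job form.

```python
"""Hypothetical planning helper for steps 4 and 5 of this howto.

Receptor names, AUC values, and cost estimates are illustrative inputs,
not values produced by Orion or the Gigadock floe.
"""
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Receptor:
    name: str
    cost_estimate_usd: float     # from the cost-estimate howto (step 3)
    auc: Optional[float] = None  # None when no known actives exist


@dataclass
class CostSettings:
    email_usd: float           # "Email me if this job cost exceeds"
    cost_threshold_usd: float  # "Cost Threshold ($USD)" promoted parameter
    terminate_usd: float       # "Terminate this job if the cost exceeds"


def select_receptor(receptors: List[Receptor]) -> Receptor:
    """Step 4: highest AUC if actives are known, else lowest cost estimate."""
    if not receptors:
        raise ValueError("at least one prepared receptor is required")
    # Assumption: AUC is used only when every receptor has been assessed.
    if all(r.auc is not None for r in receptors):
        return max(receptors, key=lambda r: r.auc)
    return min(receptors, key=lambda r: r.cost_estimate_usd)


def cost_settings(estimate_usd: float) -> CostSettings:
    """Step 5: derive the three cost limits from the step #3 estimate."""
    if estimate_usd <= 0:
        raise ValueError("cost estimate must be positive")
    settings = CostSettings(
        email_usd=estimate_usd,                # estimate as-is
        cost_threshold_usd=estimate_usd * 1.15,  # estimate plus 15%
        terminate_usd=estimate_usd * 1.75,     # estimate plus 75%
    )
    # A clean shutdown can overrun the Cost Threshold by up to ~25%;
    # make sure that worst case stays below the hard termination limit.
    worst_case = settings.cost_threshold_usd * 1.25
    if worst_case >= settings.terminate_usd:
        raise RuntimeError("termination limit leaves no room for overrun")
    return settings


# Illustrative inputs only (not real estimates).
receptors = [
    Receptor("receptor_A", cost_estimate_usd=30000.0, auc=0.71),
    Receptor("receptor_B", cost_estimate_usd=24000.0, auc=0.78),
]
best = select_receptor(receptors)
s = cost_settings(best.cost_estimate_usd)
print(best.name, round(s.email_usd), round(s.cost_threshold_usd), round(s.terminate_usd))
# prints: receptor_B 24000 27600 42000
```

The +15% threshold and +75% hard limit leave headroom: even a 25% overrun of the threshold (1.15 × 1.25 ≈ 1.44 times the estimate) stays below the termination limit, so the floe should shut down cleanly and remain restartable.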

Gigadock runs typically complete in 18-36 hours.