Run Giga Docking on Billions of Molecules
Context
A Giga Docking run on billions of molecules typically costs tens of thousands of dollars. This howto describes best practices for performing such a run.
Note
We recommend you complete the Dock One Million Molecules with Gigadock Floe tutorial before doing a full Giga Docking run.
Procedure
1. Choose the molecules for docking. Either:
Prepare your own collection of molecules to be screened (see the Prepare Vendor Database for Giga Docking and FastROCS tutorial), or
Choose one of the vendor molecule collections OpenEye has already prepared (see the Get Access To Full Vendor Giga Docking Collections Prepared by OpenEye howto).
2. Prepare the receptor(s).
Prepare the protein structure for docking using spruce (see the first part of the Dock One Million Molecules with Gigadock Floe tutorial). If there are multiple protein structures, prepare each of them.
3. Estimate the cost of the run for each receptor (see the Estimate the cost of a Giga Docking Run howto).
4. Select a single best receptor.
If there are molecules known to be active against the target, use the Determine if Giga Docking Will Give Good Results with a Given Receptor howto to assess the performance of each prepared receptor. Choose the receptor with the highest AUC as the single best receptor.
If there are no known actives for the target, choose the receptor with the lowest cost estimate.
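The selection rule in step 4 can be sketched in Python as below. The roc_auc helper is a generic rank-based ROC AUC (the fraction of active/decoy pairs in which the active scores better), not necessarily the exact calculation performed by the referenced howto, and it assumes lower docking scores are better. The function names, receptor names, scores, and costs are all hypothetical.

```python
from typing import Dict, Optional, Sequence


def roc_auc(active_scores: Sequence[float], decoy_scores: Sequence[float]) -> float:
    """Rank-based ROC AUC: fraction of (active, decoy) pairs where the active
    scores better. Assumes lower docking scores are better; ties count 0.5."""
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy score")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


def select_receptor(
    cost_estimates: Dict[str, float],
    aucs: Optional[Dict[str, float]] = None,
) -> str:
    """Pick the highest-AUC receptor when actives are known, else the cheapest."""
    if aucs:
        return max(aucs, key=lambda name: aucs[name])
    return min(cost_estimates, key=lambda name: cost_estimates[name])


# Hypothetical example data for two prepared receptors.
decoys = [-9.0, -11.0, -8.0]
aucs = {
    "receptor_a": roc_auc([-12.0, -10.5], decoys),  # 5/6, about 0.83
    "receptor_b": roc_auc([-9.5, -8.5], decoys),    # 3/6 = 0.5
}
costs = {"receptor_a": 30000.0, "receptor_b": 22000.0}

print(select_receptor(costs, aucs))  # receptor_a (highest AUC)
print(select_receptor(costs))        # receptor_b (lowest cost, no actives)
```

The pairwise loop is quadratic, which is fine for the small active/decoy benchmark sets used to compare receptors.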
5. Launch the Gigadock floe with the following parameters:
Job Properties
Output Folder: <creating a dedicated folder for the output is recommended>
Job Cost Limits
Email me if this job cost exceeds: Set to the cost estimate from step #3.
Terminate this job if the cost exceeds: Set to the cost estimate from step #3 plus 75%.
Warning
It is important to set this value well above the Cost Threshold ($USD) parameter. A floe terminated by the Terminate this job if the cost exceeds limit stops almost immediately, but it cannot be restarted and all computation done prior to termination is lost. A floe that shuts down because it passed the Cost Threshold ($USD) shuts down cleanly and can be restarted, but it can overrun the threshold by 10-25%.
Promoted Parameters
Inputs
Receptor Dataset
Select the receptor chosen in the previous step.
Input Conformer Collection (Semi-Optional)
Select the collection chosen or prepared in step #1.
Options
Docking Methods: Set to the same setting used for the cost estimate in step #3.
Cost Threshold ($USD): Set to the cost estimate from step #3 plus 15%.
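The arithmetic behind the three cost settings can be checked with a short Python sketch. The margins come from this howto; the $20,000 estimate is a hypothetical figure, and the 25% overrun is the upper end of the range quoted in the warning. It shows why the hard termination limit sits at +75%: the worst-case clean shutdown costs estimate × 1.15 × 1.25 = estimate × 1.4375, which stays below estimate × 1.75.

```python
# Margins taken from the recommendations in this howto.
THRESHOLD_MARGIN = 0.15   # Cost Threshold ($USD) = estimate + 15%
TERMINATE_MARGIN = 0.75   # "Terminate this job if the cost exceeds" = estimate + 75%
MAX_CLEAN_OVERRUN = 0.25  # a clean shutdown can overrun the threshold by up to 25%


def cost_settings(estimate_usd: float) -> dict:
    """Return the recommended Gigadock cost settings, rounded to cents."""
    threshold = estimate_usd * (1 + THRESHOLD_MARGIN)
    terminate = estimate_usd * (1 + TERMINATE_MARGIN)
    worst_case_clean_stop = threshold * (1 + MAX_CLEAN_OVERRUN)
    return {
        "email_me_usd": round(estimate_usd, 2),
        "cost_threshold_usd": round(threshold, 2),
        "terminate_usd": round(terminate, 2),
        "worst_case_clean_stop_usd": round(worst_case_clean_stop, 2),
        # True means a clean, restartable shutdown finishes before the hard kill.
        "safe_margin": worst_case_clean_stop < terminate,
    }


settings = cost_settings(20000.0)  # hypothetical estimate from step #3
for name, value in settings.items():
    print(f"{name}: {value}")
# email_me_usd: 20000.0
# cost_threshold_usd: 23000.0
# terminate_usd: 35000.0
# worst_case_clean_stop_usd: 28750.0
# safe_margin: True
```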
6. Once the Gigadock job completes, if you need a hit list larger than the 10K-molecule hit list the floe outputs automatically, follow the Generate a Giga Docking Hit List Dataset of More Than 10K Molecules howto to create it.
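Conceptually, a larger hit list is the N best-scoring molecules drawn from the Raw Results collection. The Python sketch below illustrates that idea only; it is not how the referenced howto or the Gigadock floe implements it. It assumes results can be read as hypothetical (molecule_id, score) pairs and that lower docking scores are better.

```python
import heapq
from typing import Iterable, List, Tuple


def top_n_hits(records: Iterable[Tuple[str, float]], n: int) -> List[Tuple[str, float]]:
    """Return the n best (molecule_id, score) pairs, best first.

    Assumes lower docking scores are better. heapq.nsmallest consumes the
    iterable one record at a time and keeps only n candidates in memory.
    """
    return heapq.nsmallest(n, records, key=lambda rec: rec[1])


# Hypothetical raw results: (molecule_id, docking_score) pairs.
raw_results = [("mol1", -8.0), ("mol2", -12.5), ("mol3", -10.1), ("mol4", -6.3)]
print(top_n_hits(raw_results, 2))  # [('mol2', -12.5), ('mol3', -10.1)]
```

A bounded heap is a natural fit here because the input can hold billions of records while the hit list is only N entries long.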
7. Delete the ‘Raw Results’ collection created by the Gigadock floe once you no longer need it.
Note
The Raw Results collection contains the scores and structures of every docked molecule. Its primary use is to allow the creation of hit lists larger than 10K molecules from the Giga Docking run. Retaining it incurs a storage cost (typically ~$100/month), which is why we recommend deleting it once the desired hit list has been created.
Gigadock runs typically complete in 18-36 hours.