During a drug discovery campaign, thousands of small-molecule inhibitors are made in the course of optimizing molecular properties. For projects that have X-Ray crystallographic (XRC) coordinates, structure-based designs help guide the medicinal chemistry efforts. In many cases XRC provides a detailed picture of how a small-molecule inhibitor binds in the binding site.
Many techniques exist for pose prediction and are well documented [Vieth-2004]. However, very few provide a probability that the generated pose is correct, where correct is typically taken to mean less than 2.0 Ångströms RMSD (root-mean-square deviation) from the experimental crystal structure. In fact, many docking scores, such as Chemscore, Chemgauss3, and PLP [Martinelli-2010], correlate poorly with the correct ligand pose and, worse, are not transferable between systems: the best docking score in one system may not even be close to the best docking score in another.
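The 2.0 Ångström criterion can be illustrated with a minimal sketch. It assumes both poses are already in the same protein frame, list the same heavy atoms in the same order, and need no symmetry correction; the function names `rmsd` and `is_correct_pose` are illustrative and not part of any OpenEye toolkit.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


def is_correct_pose(predicted, experimental, cutoff=2.0):
    """Apply the conventional 'less than 2.0 Angstroms RMSD' test."""
    return rmsd(predicted, experimental) < cutoff


# Toy two-atom example: every atom is displaced by 1.0 Angstrom along z.
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(pose, crystal), is_correct_pose(pose, crystal))
```

A production comparison would also account for symmetry-equivalent atom orderings, which this sketch deliberately omits.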
Two definitions will be used during the remainder of this discussion:
- Bound-Ligand: This is a known, experimentally derived bound ligand from the same protein context in which to find poses for ligands.
- Fit-Ligand: This is the unknown ligand that is being pose-predicted.
OEDocking’s POSIT methodology overcomes these issues by comparing predicted poses to observed bound ligands in related co-crystals. As the observed ligand becomes more similar to the predicted pose, both the binding mode and, indeed, the shape of the receptor pocket itself tend to become more similar. This is shown in figure: Active Sites.
The similarity measures being used are 2D path-based fingerprints and the 3D TanimotoCombo [Hawkins-2010], which compares shape and the Mills-Dean approximation of electrostatics [MillsDean-1996]. These similarity measures choose the most appropriate system to dock against and provide a prediction of the quality of the result. A full description of the Tanimoto measures is given in the Theory section.
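As a rough illustration of these measures, the sketch below computes a Tanimoto coefficient over fingerprint bits and over volume overlaps. It assumes the commonly used definition in which TanimotoCombo is the sum of a shape Tanimoto and a color (chemical feature) Tanimoto, giving a range of 0 to 2; the Theory section remains the authoritative description. All function names are illustrative, and the fingerprints are toy sets of bit indices rather than real path-based fingerprints.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two collections of 'on' fingerprint bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def overlap_tanimoto(overlap_ab, self_overlap_a, self_overlap_b):
    """Tanimoto from overlap volumes: O_AB / (O_AA + O_BB - O_AB)."""
    return overlap_ab / (self_overlap_a + self_overlap_b - overlap_ab)


def tanimoto_combo(shape_tanimoto, color_tanimoto):
    """Assumed definition: shape Tanimoto plus color Tanimoto, range 0 to 2."""
    return shape_tanimoto + color_tanimoto


# Two toy fingerprints sharing two of four distinct bits.
print(tanimoto({1, 2, 3}, {2, 3, 4}))
print(tanimoto_combo(overlap_tanimoto(5.0, 10.0, 10.0), 0.5))
```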
The TanimotoCombo measure is agnostic of how the poses in question are generated; it can be used to validate and provide a pose prediction probability regardless of how the pose was generated. In practice, POSIT exploits the predictive capabilities of the TanimotoCombo measure by using it in an optimization function that drives a flexible fitting routine.
The POSIT method decides, for a given target protein, whether to optimize a structure into the bound ligand or whether to use the bound ligand as a template for rigid searching during a standard docking run. In general, similar ligands are served by optimizing into the known bound ligand, but dissimilar ligands are not.
For similar ligands, during pose optimization, POSIT attempts to force a predicted pose into the binding mode of a known ligand. If the induced strain becomes too large, the optimization stops. This final pose is used to predict the overall quality of fit. In this fashion, POSIT is able to rescue 10-20% of the original rigidly overlaid poses and place them within 2.0 Ångströms RMSD of the experimental crystal structure.
For dissimilar ligands, POSIT employs a standard rigid docking approach using the ChemGauss4 scoring function, but constrains the search space to the known bound ligand and its interactions.
Typically during docking, only the protein structure is used to model unknown structures. Given a molecule that is known to bind, POSIT searches through XRC coordinates of known ligand-protein complexes, determines the complex best able to predict the pose of the molecule and then generates both a pose and the probability that the pose is correct, usually in well under a minute per ligand.
POSIT’s basic algorithm:
Given a set of potential complexes, POSIT chooses the appropriate complex based on the 2D or 3D similarity to the bound ligand. The best complex, in general, has the highest 2D or 3D similarity of the input molecule to the chosen complex’s bound ligand.
POSIT can employ multiple methods to generate the initial overlays, including TanimotoCombo and MCS (Maximum Common Substructure) overlays. Both are done by default.
After the complex is chosen, the TanimotoCombo similarity to the known bound ligand is analyzed, and either a flexible fit is performed (for similar molecules) or a standard rigid docking is performed (for dissimilar molecules).
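The selection and branching logic described so far can be sketched as follows. This is only an illustration under stated assumptions: the `Complex` record, the way 2D and 3D similarity are combined (the larger of the 2D similarity and the TanimotoCombo rescaled to 0 to 1), and the `FLEXIBLE_FIT_CUTOFF` value are all hypothetical, since the text does not give POSIT's actual rule or threshold.

```python
from dataclasses import dataclass


@dataclass
class Complex:
    """Hypothetical record of a candidate complex and its bound-ligand similarity."""
    name: str
    similarity_2d: float   # 2D fingerprint Tanimoto, 0 to 1
    tanimoto_combo: float  # 3D TanimotoCombo, assumed range 0 to 2


# Hypothetical cutoff; the real threshold is not stated in the text.
FLEXIBLE_FIT_CUTOFF = 1.0


def choose_complex(complexes):
    """Pick the complex whose bound ligand is most similar to the fit-ligand.

    Assumption: '2D or 3D similarity' is taken as the larger of the 2D value
    and the TanimotoCombo rescaled to the 0 to 1 range.
    """
    return max(complexes, key=lambda c: max(c.similarity_2d, c.tanimoto_combo / 2.0))


def choose_method(best):
    """Flexible fit for similar ligands, rigid docking for dissimilar ones."""
    if best.tanimoto_combo >= FLEXIBLE_FIT_CUTOFF:
        return "flexible_fit"
    return "rigid_docking"


candidates = [
    Complex("complex_a", similarity_2d=0.35, tanimoto_combo=0.7),
    Complex("complex_b", similarity_2d=0.60, tanimoto_combo=1.4),
]
best = choose_complex(candidates)
print(best.name, choose_method(best))
```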
The flexible optimization (ShapeFit) attempts to match the binding mode of the bound ligand using an adiabatic optimization method [Wlodek-2006]. This optimization method is known as the ShapeFit potential.
The term adiabatic comes from the Greek for “impassable”, and in this case ShapeFit sets up a chemical strain boundary that the optimization cannot breach.
ShapeFit seeds the flexible fit by expanding the poses generated by the original 3D similarity overlay, as described in step 1, and then applying the shape constraint of the bound ligand.
As shown in figure: ShapeFit Optimization, ShapeFit works by first using the known bound ligand to position the input molecule and follows up by using the bound ligand as a shape constraint during MMFF optimization [Halgren-I-1996] [Halgren-II-1996] [Halgren-III-1996] [Halgren-IV-1996] [Halgren-V-1996] [Halgren-VI-1999] [Halgren-VII-1999]. While the input molecule is being forced into the shape constraint, MMFF strain is monitored to form the adiabatic boundary. When the strain becomes too large, the optimization is reversed or stops altogether.
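The idea of a strain boundary can be shown with a deliberately simplified, one-dimensional toy. It is not the ShapeFit potential or MMFF: a single coordinate stands in for the ligand conformation, `target` stands in for the geometry the shape constraint is pulling toward, and `strain` is any user-supplied function. The function name and all parameters are hypothetical.

```python
def strain_bounded_fit(start, target, strain, max_strain, step=0.1, max_steps=1000):
    """Move a toy 1D coordinate toward `target` without exceeding `max_strain`.

    Each proposed step is evaluated before it is accepted. A step that would
    push the strain past the boundary is rejected (reversed) and the
    optimization stops at the last acceptable position.
    """
    current = start
    for _ in range(max_steps):
        gap = target - current
        if abs(gap) < 1e-9:
            break  # the constraint is fully satisfied
        move = max(-step, min(step, gap))
        candidate = current + move
        if strain(candidate) > max_strain:
            break  # strain boundary reached: discard the step and stop
        current = candidate
    return current


def quadratic_strain(x):
    """Toy strain: the relaxed conformation is x = 0 and strain grows as x**2."""
    return x ** 2


# The constraint asks for x = 2.0, but the strain limit halts the fit earlier.
print(strain_bounded_fit(0.0, 2.0, quadratic_strain, max_strain=1.0, step=0.25))
```

With a generous strain limit the same call reaches the target, which mirrors the distinction between ligands that can adopt the known binding mode and those that cannot.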
The interactions from the bound ligand to the protein are then used as a further constraint during ligand-protein optimization. This helps to remove clashes with the protein and provide better interactions between un-constrained ligand atoms.
For dissimilar ligands, a rigid docking is performed using the HYBRID methodology (if a bound ligand exists) or the FRED methodology (if there is no bound ligand).
Finally, POSIT supplies a robust probability that the given pose is reasonable. It is generally recognized that docking and scoring methods have inaccuracies and do not provide a measure that can be compared between different complexes. For example, a docking score from one complex cannot be directly compared to a docking score from another.
POSIT probabilities were generated using a large test set containing over 25,000 pose predictions and verified through a smaller number (around 100) of predictions that were then validated with X-Ray crystallography. It is important to note that POSIT does not give a probability of binding. Rather, it gives the probability that, if the ligand does actually bind, the POSIT pose is the actual pose.
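One plausible way to turn a validation set into pose probabilities is simple binning: group past predictions by similarity score and record the fraction in each bin that landed within 2.0 Ångströms of the crystal structure. The text does not say how POSIT's probabilities were actually fitted, so the sketch below is an assumption for illustration only, and the records in it are made-up toy values rather than real validation data.

```python
def calibrate(records, bin_edges, cutoff=2.0):
    """Estimate P(pose correct) per score bin from (score, rmsd) records.

    A pose counts as correct when its RMSD is below `cutoff`. Bins are
    half-open [low, high), except the last, which also includes its upper
    edge. Empty bins yield None.
    """
    n_bins = len(bin_edges) - 1
    counts = [0] * n_bins
    hits = [0] * n_bins
    for score, pose_rmsd in records:
        for i in range(n_bins):
            is_last = i == n_bins - 1
            in_bin = bin_edges[i] <= score < bin_edges[i + 1]
            if in_bin or (is_last and score == bin_edges[-1]):
                counts[i] += 1
                if pose_rmsd < cutoff:
                    hits[i] += 1
                break
    return [h / c if c else None for h, c in zip(hits, counts)]


# Toy records of (TanimotoCombo, RMSD to crystal pose); values are invented.
toy_records = [(0.5, 3.0), (0.7, 1.5), (1.5, 0.8), (1.8, 1.2), (2.0, 2.5)]
print(calibrate(toy_records, bin_edges=[0.0, 1.0, 2.0]))
```

The estimate is conditional on binding, matching the distinction drawn above: it says nothing about whether the ligand binds at all.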
This is a long-winded way of saying that when POSIT optimizes a structure, it attempts to force the molecule into the known binding mode without creating undue strain on the molecule being placed into the protein. For more detail, please see the Theory section.