# Pose Prediction¶

Docking is the process of determining the structure of a ligand bound in the active site of a target protein. OpenEye‘s Posit method of pose prediction consists of choosing the best method to use when docking a particular ligand to a receptor and then returning the probability that the docked pose is within 2.0A of the actual pose. In the Docking Toolkit this is done with the OEPosit class that takes either a single or multi conformer representation of a ligand and returns the pose with the highest probability of being correctly placed in the active site.

Note

Note that this probability is not the probability of binding, rather, the probability that the pose is correct given the ligand actually binds to the receptor.

The methods Posit uses to dock, in order or reliability, are:

1. MCS Overlay [Bemis-2004]
2. ShapeFit - Shape-guided ligand minimization into the receptor site
3. Hybrid - hybrid method that uses ligand and protein information
4. Fred - Standard docking method that uses no ligand information

Posit automatically chooses the best method based on the 2D (graph) and 3D (structure) similarity of the docked ligand to the bound ligand. As a contrived example, if there is no bound ligand in the receptor, the Fred method is used by default.

By default, in addition to the input conformations, Posit generates new conformations during its search and, in some cases, minimizes these structures with the internal force-field.

Note

The selection of methods can be overridden by constructing an OEPosit with an OEPositOptions object. See the OEPositMethod namespace for a list of available methods.

## A note on stereo - isomerisms and chirality¶

Stereo, and notably nitrogen aniline stereo centers, are currently somewhat problematic for POSIT. Many crystal structures have flat geometries for some stereo centers due to time-averaging during data collection. This makes stereo centers appear to be have flat geometries in the 3D coordinates.

Because the POSIT algorithm internally expands conformations during the flexible fitting procedure the full molecule must be labeled with stereo, either in the 3D coordinate sense or the 2D coordinate sense. This means that some pdb ligand structures will unfortunately be failed by the POSIT algorithm.

To get around this, posit can be told to ignore nitrogen stereo when creating constructing an OEPosit with an OEPositOptions object that had the OEPositOptions::SetIgnoreNitrogenStereo method set to true.

The following are the recommended ways to supply or prepare molecules for POSIT.

1. Use OMEGA or the OMEGA toolkit to prepare molecules or expand the stereo chemistry using OEFlipper. POSIT is guaranteed to work correctly if the input molecules have been generated by OMEGA.
2. Set the OEPositOptions fullConformationalSearch option to be false. See the section OEPositOptions for more details. This method of running POSIT is not recommended as a large portion of the flexible methodology will be deactivated.

Note

Output structures of Posit are not guaranteed to have the same conformations of the input molecule. This is occasionally problematic if the force-field minimizations alter the desired input conformations. This issue will be addressed in a future release of the Docking Toolkit.

## Example¶

Listing 3 is an example program that uses the OEPosit class to dock, score and annotate multiconformer molecules into a single receptor.

Required Parameters

-receptor

Filenames of the receptors to use (can be multiple or specified with a wildcard).

-in

Input file of multiconformer molecules to dock.

-out

Output file for docked molecules.

Listing 3: Example program for posing molecules

/*

TERMS FOR USE OF SAMPLE CODE The software below ("Sample Code") is
provided to current licensees or subscribers of OpenEye products or
SaaS offerings (each a "Customer").
Customer is hereby permitted to use, copy, and modify the Sample Code,
subject to these terms. OpenEye claims no rights to Customer's
modifications. Modification of Sample Code is at Customer's sole and
exclusive risk. Sample Code may require Customer to have a then
current license or subscription to the applicable OpenEye offering.
THE SAMPLE CODE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED.  OPENEYE DISCLAIMS ALL WARRANTIES, INCLUDING, BUT
NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE AND NONINFRINGEMENT. In no event shall OpenEye be
liable for any damages or liability in connection with the Sample Code
or its use.
*/
#include "openeye.h"
#include "oesystem.h"
#include "oechem.h"
#include "oedocking.h"

using namespace OESystem;
using namespace OEChem;
using namespace OEDocking;
using namespace std;

#include "PoseMolecules.itf"

int main(int argc, char** argv)
{
OEInterface itf(InterfaceData);

if (!OEParseCommandLine(itf,argc,argv))
return 1;

oemolistream imstr(itf.Get<string>("-in"));
oemolostream omstr(itf.Get<string>("-out"));

OEGraphMol receptor;

OEPosit poser;
poser.Initialize(receptor);

OEMol mcmol;
{
OEGraphMol posedMol;
poser.DockMultiConformerMolecule(posedMol, mcmol);

OESetSDScore(posedMol, poser, poser.GetName());
OEWriteMolecule(omstr, posedMol);
}
return 0;
}


## POSIT Algorithm¶

POSIT supplies a robust probability that the given pose is reasonable. It is generally recognized that docking and scoring methods have inaccuracies and do not provide a measure that can be compared between different complexes. For example, a docking score from one complex cannot be directly compared to a docking score form another.

POSIT overcomes this by using 2D, 3D and protein-ligand information to generate a probability that the pose has been correctly placed. This probability is independent of how the ligand was actually placed and hence, can be used to score any prospective pose. Indeed, the ShapeFit method was explicitly designed to maximize this probability.

POSIT probabilities were generated using a large testset containing over 25,000 pose predictions and verified through a smaller number (around 100) of predictions that were then validated with X-Ray crystallography. It is important to note that POSIT does not give a probability of binding, rather it gives a probability that if the ligand does actually bind, what is the likelihood of the POSIT pose being the actual pose.

## ShapeFit Method¶

During a drug discovery campaign, thousands of small molecule inhibitors are made in the course of optimizing molecular properties. For projects that have X-Ray crystallographic (XRC) coordinates, structure-based designs help guide the medicinal chemistry efforts. In many cases XRC provides a detailed picture of the binding of a small-molecule inhibitor into the binding site.

Many techniques exist for pose-prediction and are well documented [Vieth-2004] . However, very few provide a probability that the generated pose is correct, where correct is typically considered to be less than 2.0 Ångströms RMSD (root mean square distance) from experimental crystal structure. In fact, many docking scores such as Chemscore, Chemgauss3, PLP [Martinelli-2010] are not very correlated with correct ligand pose, and worse, are not transferable between systems: the best docking score in one system may not even be close to the best docking score in another.

Two definitions will be used during the remainder of this discussion:

• Bound-Ligand This is a known, experimentally derived bound ligand from the same protein context in which SHAPEFIT is attempting to find poses of ligands.
• Fit-Ligand This is the unknown ligand that is being pose-predicted.

### Using the Bound Ligand¶

SHAPEFIT overcomes these issues by comparing predicted poses to observed bound ligands in related co-crystals. As the observed ligand becomes more similar to the the predicted pose, both the binding mode and, indeed, the shape of the receptor pocket itself tends to become more similar.

The similarity measures being used are 2D path-based fingerprints and the 3D TanimotoCombo [Hawkins-2010] that compares shape and the Mills-Dean approximation of electrostatics [MillsDean-1996] . These similarity measures choose the most appropriate system to dock against and provide a prediction of the quality of the result.

The TanimotoCombo measure is agnostic of how the poses in question are generated; it can be used to validate and provide a pose prediction probability regardless of how the pose was generated. In practice, SHAPEFIT exploits the predictive capabilities of the TanimotoCombo measure by using it in an optimization function that drives a flexible fitting routine.

Essentially, during pose optimization, SHAPEFIT attempts to force a predicted pose into the binding mode of a known ligand. If the induced strain becomes too large, the optimization stops. This final pose is used to predict the overall quality of fit. In this fashion, SHAPEFIT is able to rescue 10-20% of the original rigidly overlaid poses and place them within 2.0 Ångströms RMSD of the experimental crystal structure.

Typically during docking, only the protein structure is used to model unknown structures. Given a molecule that is known to bind, SHAPEFIT searches through XRC coordinates of known ligand-protein complexes, determines the complex best able to predict the pose of the molecule and then generates both a pose and the probability that the pose is correct, usually in well under a minute per ligand.

SHAPEFIT‘s basic algorithm:

1. Given a set of potential complexes, SHAPEFIT chooses the appropriate complex based on the 2D or 3D similarity to the bound ligand. The best complex, in general, has the highest 2D or 3D similarity of the input molecule to the chosen complex’s bound ligand.

2. After the complex is chosen, a flexible fit is performed that attempts to match the binding mode of the bound ligand using an adiabatic optimization method [Wlodek-2006] . This optimization method is known as the SHAPEFIT potential.

The term adiabatic comes from the Greek “impassable”, and in this case SHAPEFIT sets up a chemical strain boundary that the optimization cannot broach.

SHAPEFIT seeds the flexible fit by expanding the poses generated by the original 3D similarity as described in (1) and then applying the shape constraint of the bound ligand.

As shown in figure *SHAPEFIT* Optimization, SHAPEFIT works by first using the known bound ligand to position the input molecule and follows up by using the bound ligand as a shape constraint during MMFF optimization [Halgren-I-1996] [Halgren-II-1996] [Halgren-III-1996] [Halgren-IV-1996] [Halgren-V-1996] [Halgren-VI-1999] [Halgren-VII-1999] . While the input molecule is being forced into the shape constraint, MMFF strain is monitored to form the adiabatic boundary. When the strain becomes too large, the optimization is reversed or stops altogether.

3. The interactions from the bound ligand are then used as a further constraint during ligand-protein optimization. This helps to remove clashes with the protein and provide better interactions between un-constrained ligand atoms.

This is a long winded way of saying that SHAPEFIT‘s optimization attempts to force the molecule into the known binding mode without creating undue strain on the molecule being placed into the protein.

SHAPEFIT Optimization: Starting from an initial alignment, use the shape constraint of the bound-ligand to drive a flexible fit while simultaneously limiting strain

As shown in figure SHAPEFIT Cross Docking Results, analyzing the Kinase data set used in Tuccinardi et al, [Martinelli-2010] pose-prediction using SHAPEFIT is seen to perform remarkably better for similar ligands than standard docking techniques at higher TanimotoCombo values:

SHAPEFIT Cross Docking Results: Probability of finding a good pose based on bound-ligand fit-ligand TanimotoCombo similarity. Standard docking results are essentially the same and follow the same trajectories flattening out as they hit their limit of accuracy. While SHAPEFIT performs worse at low similarities it continually increases as similarity increases.

While SHAPEFIT is not a good technique for determining the pose between known bound-ligands and fit ligands with low similarity, as the similarity increases, the probability of determining the correct pose increases rapidly. The similarity where this crosses over is remarkably small, only around 0.9 TanimotoCombo. This is most likely because as the similarity increases, the active site similarity also increases. Below 0.9 TanimotoCombo POSIT reverts to the Hybrid method.

### On Clashes¶

Unlike the FRED and HYBRID methods, SHAPEFIT is heavily biased towards the known bound ligand. In some cases this causes the pose to clash with the protein. This is especially true if the original bound ligand already clashes with the protein.

Currently, we do not filter out clashing poses although options and functionality for controlling this with SHAPEFIT will be investigated in future releases.

#### Setup¶

Typically, POSIT doesn’t require explicit setup. Simply create a new OEPosit object. For details about the customization available to OEPosit see the OEPositOptions documentation.

OEPosit poser;


#### Initialization¶

An OEPosit object must be initialized using a receptor prior to posing or scoring molecules. This is done by passing a receptor to the OEPosit::Initialize method, as shown in the following code snippet from Listing 3.

poser.Initialize(receptor);


#### Posing Molecules¶

Once the OEPosit object has been initialized, molecules are docked using the OEPosit::DockMultiConformerMolecule method, as shown in this code snippet from Listing 3.

poser.DockMultiConformerMolecule(posedMol, mcmol);


mcmol is a multiconformer representation of the molecule being docked, and posedMol is an OEMolBase the resulting top scoring docked pose is returned in. The probability of the docked molecule can be obtained by calling the OEMolBase::GetEnergy method of pose.

In this example the best scoring pose as returned into posedMol.

Note

DockMultiConformerMolecule may return fewer than the requested number of poses.

Listing 3 uses the convenience function OESetSDScore to assign the scores to SD data on the molecules. This function simply assigns the result of ‘OEPosit::ScoreLigand’ to the SD data with tag sdtag.

OESetSDScore(posedMol, poser, poser.GetName());


# Using Multiple Receptors¶

Using multiple receptors to dock or pose molecules has been shown to greatly increase the reliability of docking.

The Docking Toolkit provides a convenience function that helps identify the receptor most likely to accept the proposed ligand. The method used is similar to the POSIT Algorithm in that it assigns a probability that, if the ligand binds to the protein, that the ligand adopts the correct binding mode.

Note

If the receptor in question does not have a bound ligand, the probability currently returned is zero.

Note

The following examples show OEPosit but can trivially be modified to use OEHybrid

## Posing To Multiple Receptors¶

The first examples, Listing 4, shows how to pose a ligand against all receptors.

Listing 4: Example program for posing molecules with multiple receptors

/*

TERMS FOR USE OF SAMPLE CODE The software below ("Sample Code") is
provided to current licensees or subscribers of OpenEye products or
SaaS offerings (each a "Customer").
Customer is hereby permitted to use, copy, and modify the Sample Code,
subject to these terms. OpenEye claims no rights to Customer's
modifications. Modification of Sample Code is at Customer's sole and
exclusive risk. Sample Code may require Customer to have a then
current license or subscription to the applicable OpenEye offering.
THE SAMPLE CODE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED.  OPENEYE DISCLAIMS ALL WARRANTIES, INCLUDING, BUT
NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE AND NONINFRINGEMENT. In no event shall OpenEye be
liable for any damages or liability in connection with the Sample Code
or its use.
*/
#include "openeye.h"
#include "oesystem.h"
#include "oechem.h"
#include "oedocking.h"

using namespace OESystem;
using namespace OEChem;
using namespace OEDocking;
using namespace std;

#include "PoseMolsWithMultiReceptors.itf"

int main(int argc, char** argv)
{
OEInterface itf(InterfaceData);

if (!OEParseCommandLine(itf,argc,argv))
return 1;

vector<OEGraphMol> receptors;
for(OEIter<const string> receptor_filename = itf.GetList<string>("-receptors");
receptor_filename;
++receptor_filename)
{
OEGraphMol receptor;
receptors.push_back(receptor);
}

OEPosit poser;

// Loop through all the receptors and pose molecules
for (size_t receptor_idx = 0;
receptor_idx < receptors.size();
++receptor_idx)
{
// write out a new output file for each receptor
//  <prefix>_#.oeb.gz
oemolostream omstr(itf.Get<string>("-prefix") +
"_" + OENumberToString(receptor_idx) + ".oeb");

poser.Initialize(receptors[receptor_idx]);

OEMol mcmol;
oemolistream imstr(itf.Get<string>("-in"));
{
OEGraphMol posedMol;
poser.DockMultiConformerMolecule(posedMol, mcmol);

OESetSDScore(posedMol, poser, poser.GetName());
OEWriteMolecule(omstr, posedMol);
}
}
return 0;
}


One of the difficulties of docking to multiple receptors, is that they can consume quite a large amount of memory so holding multiple docking objects at the same time can be problematic.

The solution shown in the examples starts with the first receptor, poses all input ligands to it, outputs them and then moves on to the second receptor. Each receptor outputs to a different file in this example.

First, it is necessary to read the receptors into a list or a vector.

vector<OEGraphMol> receptors;
for(OEIter<const string> receptor_filename = itf.GetList<string>("-receptors");
receptor_filename;
++receptor_filename)
{
OEGraphMol receptor;
receptors.push_back(receptor);
}


Then it is a simple matter of looping through the receptors and posing all ligands.

// Loop through all the receptors and pose molecules
for (size_t receptor_idx = 0;
receptor_idx < receptors.size();
++receptor_idx)
{
// write out a new output file for each receptor
//  <prefix>_#.oeb.gz
oemolostream omstr(itf.Get<string>("-prefix") +
"_" + OENumberToString(receptor_idx) + ".oeb");

poser.Initialize(receptors[receptor_idx]);

OEMol mcmol;
oemolistream imstr(itf.Get<string>("-in"));
{
OEGraphMol posedMol;
poser.DockMultiConformerMolecule(posedMol, mcmol);

OESetSDScore(posedMol, poser, poser.GetName());
OEWriteMolecule(omstr, posedMol);
}
}


## Posing To The Best Receptor¶

In the second example, Listing 5, a helper function that takes a collection of receptors and returns the best one for the current ligand is used to pose to a single receptor.

Listing 5: Example program for posing molecules with best receptor

/*

TERMS FOR USE OF SAMPLE CODE The software below ("Sample Code") is
provided to current licensees or subscribers of OpenEye products or
SaaS offerings (each a "Customer").
Customer is hereby permitted to use, copy, and modify the Sample Code,
subject to these terms. OpenEye claims no rights to Customer's
modifications. Modification of Sample Code is at Customer's sole and
exclusive risk. Sample Code may require Customer to have a then
current license or subscription to the applicable OpenEye offering.
THE SAMPLE CODE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED.  OPENEYE DISCLAIMS ALL WARRANTIES, INCLUDING, BUT
NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE AND NONINFRINGEMENT. In no event shall OpenEye be
liable for any damages or liability in connection with the Sample Code
or its use.
*/
#include "openeye.h"
#include "oesystem.h"
#include "oechem.h"
#include "oedocking.h"

using namespace OESystem;
using namespace OEChem;
using namespace OEDocking;
using namespace std;

#include "PoseMolsWithBestReceptor.itf"

// Return the index of the best receptor for posing ligand
unsigned int BestReceptorIndex(const OEMCMolBase &ligand,
const std::vector<OEGraphMol> &receptors)
{
size_t best_idx = receptors.max_size();
double best_probability = 0.0;
for (size_t receptor_idx = 0; receptor_idx < receptors.size(); ++receptor_idx)
{
double probability = OEGetEstimatedPositProbability(
ligand,
receptors[receptor_idx]);
if (probability > best_probability)
{
best_probability = probability;
best_idx = receptor_idx;
}
}

return (unsigned int)best_idx;
}

int main(int argc, char** argv)
{
OEInterface itf(InterfaceData);

if (!OEParseCommandLine(itf,argc,argv))
return 1;

vector<OEGraphMol> receptors;
for(OEIter<const string> receptor_filename = itf.GetList<string>("-receptors");
receptor_filename;
++receptor_filename)
{
OEGraphMol receptor;
receptors.push_back(receptor);
}

OEPosit poser;
const std::string tag = poser.GetName();

// identify the best receptor for each molecule
OEMol mcmol;
vector<size_t> bestReceptorIdx;
oemolistream imstr(itf.Get<string>("-in"));
bestReceptorIdx.push_back(BestReceptorIndex(mcmol, receptors));
imstr.close();

// Loop through all the receptors and dock molecules
for (size_t receptor_idx = 0;
receptor_idx < receptors.size();
++receptor_idx)
{
// write out a new output file for each receptor
//  <prefix>_#.oeb.gz
oemolostream omstr(itf.Get<string>("-prefix") +
"_" + OENumberToString(receptor_idx) + ".oeb");

poser.Initialize(receptors[receptor_idx]);

size_t ligno = 0u;
oemolistream ifs(itf.Get<string>("-in"));

{
// Only pose if we are the best receptor
if (receptor_idx == bestReceptorIdx[ligno])
{
OEThrow.Info("Posing ligand %d to receptor %d", ligno, receptor_idx);
OEGraphMol posedMol;
poser.DockMultiConformerMolecule(posedMol, mcmol);

OESetSDScore(posedMol, poser, poser.GetName());
OEWriteMolecule(omstr, posedMol);
}
ligno++;
}
}
return 0;
}


The helper function is a simple loop through a list or vector:

// Return the index of the best receptor for posing ligand
unsigned int BestReceptorIndex(const OEMCMolBase &ligand,
const std::vector<OEGraphMol> &receptors)
{
size_t best_idx = receptors.max_size();
double best_probability = 0.0;
for (size_t receptor_idx = 0; receptor_idx < receptors.size(); ++receptor_idx)
{
double probability = OEGetEstimatedPositProbability(
ligand,
receptors[receptor_idx]);
if (probability > best_probability)
{
best_probability = probability;
best_idx = receptor_idx;
}
}

return (unsigned int)best_idx;
}


Once this function exists, we can use it to check if the current receptor in the receptor loop is the appropriate one to pose against:

while (OEReadMolecule(ifs, mcmol))
{
// Only pose if we are the best receptor
if (receptor_idx == bestReceptorIdx[ligno])
{
OEThrow.Info("Posing ligand %d to receptor %d", ligno, receptor_idx);
OEGraphMol posedMol;
poser.DockMultiConformerMolecule(posedMol, mcmol);

OESetSDScore(posedMol, poser, poser.GetName());
OEWriteMolecule(omstr, posedMol);
}
ligno++;
}


Remember that these examples can also be used with OEHybrid or OEDock.