Pose Prediction

Docking is the process of determining the structure of a ligand bound in the active site of a target protein. OpenEye‘s Posit method of pose prediction consists of choosing the best method to use when docking a particular ligand to a receptor and then returning the probability that the docked pose is within 2.0A of the actual pose. In the Docking Toolkit this is done with the OEPosit class that takes either a single or multi conformer representation of a ligand and returns the pose with the highest probability of being correctly placed in the active site.

Note

Note that this probability is not the probability of binding, rather, the probability that the pose is correct given the ligand actually binds to the receptor.

The methods Posit uses to dock, in order or reliability, are:

  1. MCS Overlay [Bemis-2004]
  2. ShapeFit - Shape-guided ligand minimization into the receptor site
  3. Hybrid - hybrid method that uses ligand and protein information
  4. Fred - Standard docking method that uses no ligand information

Posit automatically chooses the best method based on the 2D (graph) and 3D (structure) similarity of the docked ligand to the bound ligand. As a contrived example, if there is no bound ligand in the receptor, the Fred method is used by default.

By default, in addition to the input conformations, Posit generates new conformations during its search and, in some cases, minimizes these structures with the internal force-field.

Note

The selection of methods can be overridden by constructing an OEPosit with an OEPositOptions object. See the OEPositMethod namespace for a list of available methods.

A note on stereo - isomerisms and chirality

Stereo, and notably nitrogen aniline stereo centers, are currently somewhat problematic for POSIT. Many crystal structures have flat geometries for some stereo centers due to time-averaging during data collection. This makes stereo centers appear to be have flat geometries in the 3D coordinates.

Because the POSIT algorithm internally expands conformations during the flexible fitting procedure the full molecule must be labeled with stereo, either in the 3D coordinate sense or the 2D coordinate sense. This means that some pdb ligand structures will unfortunately be failed by the POSIT algorithm.

To get around this, posit can be told to ignore nitrogen stereo when creating constructing an OEPosit with an OEPositOptions object that had the OEPositOptions.SetIgnoreNitrogenStereo method set to true.

The following are the recommended ways to supply or prepare molecules for POSIT.

  1. Use OMEGA or the OMEGA toolkit to prepare molecules or expand the stereo chemistry using OEFlipper. POSIT is guaranteed to work correctly if the input molecules have been generated by OMEGA.
  2. Set the OEPositOptions fullConformationalSearch option to be false. See the section OEPositOptions for more details. This method of running POSIT is not recommended as a large portion of the flexible methodology will be deactivated.

Note

Output structures of Posit are not guaranteed to have the same conformations of the input molecule. This is occasionally problematic if the force-field minimizations alter the desired input conformations. This issue will be addressed in a future release of the Docking Toolkit.

Example

Listing 3 is an example program that uses the OEPosit class to dock, score and annotate multiconformer molecules into a single receptor.

Required Parameters

-receptor

Filenames of the receptors to use (can be multiple or specified with a wildcard).

-in

Input file of multiconformer molecules to dock.

-out

Output file for docked molecules.

Listing 3: Example program for posing molecules

/*******************************************************************************
 * Copyright 2004-2013 OpenEye Scientific Software, Inc.
 ******************************************************************************/
package openeye.examples.oedocking;

import java.net.URL;
import openeye.oechem.*;
import openeye.oedocking.*;

public class PoseMolecules {

    private void pose(OEInterface itf) {
        oemolistream imstr = new oemolistream(itf.GetString("-in"));
        oemolostream omstr = new oemolostream(itf.GetString("-out"));

        OEGraphMol receptor = new OEGraphMol();
        if (!oedocking.OEReadReceptorFile(receptor, itf.GetString("-receptor"))) {
            oechem.OEThrow.Fatal("Unable to read receptor");
        }

        OEPosit poser = new OEPosit();
        poser.Initialize(receptor);

        OEMol mcmol = new OEMol();
        while (oechem.OEReadMolecule(imstr, mcmol)) {
            System.err.format("posing %s\n", mcmol.GetTitle());
            OEGraphMol posedMol = new OEGraphMol();
            poser.DockMultiConformerMolecule(posedMol, mcmol);

            oedocking.OESetSDScore(posedMol, poser, poser.GetName());
            oechem.OEWriteMolecule(omstr, posedMol);
        }
        omstr.close();
        imstr.close();
    }

    public static void main(String argv[]) {
        PoseMolecules app = new PoseMolecules();
        URL fileURL = app.getClass().getResource("PoseMolecules.txt");

        try {
            OEInterface itf = new OEInterface();
            oechem.OEConfigureFromURL(itf, fileURL);
            if (oechem.OEParseCommandLine(itf, argv)) {
                app.pose(itf);
            }
        } catch (java.io.IOException e) {
            System.err.println("Unable to open interface file");
        }
    }
}

POSIT Algorithm

POSIT supplies a robust probability that the given pose is reasonable. It is generally recognized that docking and scoring methods have inaccuracies and do not provide a measure that can be compared between different complexes. For example, a docking score from one complex cannot be directly compared to a docking score form another.

POSIT overcomes this by using 2D, 3D and protein-ligand information to generate a probability that the pose has been correctly placed. This probability is independent of how the ligand was actually placed and hence, can be used to score any prospective pose. Indeed, the ShapeFit method was explicitly designed to maximize this probability.

POSIT probabilities were generated using a large testset containing over 25,000 pose predictions and verified through a smaller number (around 100) of predictions that were then validated with X-Ray crystallography. It is important to note that POSIT does not give a probability of binding, rather it gives a probability that if the ligand does actually bind, what is the likelihood of the POSIT pose being the actual pose.

ShapeFit Method

During a drug discovery campaign, thousands of small molecule inhibitors are made in the course of optimizing molecular properties. For projects that have X-Ray crystallographic (XRC) coordinates, structure-based designs help guide the medicinal chemistry efforts. In many cases XRC provides a detailed picture of the binding of a small-molecule inhibitor into the binding site.

Many techniques exist for pose-prediction and are well documented [Vieth-2004] . However, very few provide a probability that the generated pose is correct, where correct is typically considered to be less than 2.0 Ångströms RMSD (root mean square distance) from experimental crystal structure. In fact, many docking scores such as Chemscore, Chemgauss3, PLP [Martinelli-2010] are not very correlated with correct ligand pose, and worse, are not transferable between systems: the best docking score in one system may not even be close to the best docking score in another.

Two definitions will be used during the remainder of this discussion:

  • Bound-Ligand This is a known, experimentally derived bound ligand from the same protein context in which SHAPEFIT is attempting to find poses of ligands.
  • Fit-Ligand This is the unknown ligand that is being pose-predicted.

Using the Bound Ligand

SHAPEFIT overcomes these issues by comparing predicted poses to observed bound ligands in related co-crystals. As the observed ligand becomes more similar to the the predicted pose, both the binding mode and, indeed, the shape of the receptor pocket itself tends to become more similar.

The similarity measures being used are 2D path-based fingerprints and the 3D TanimotoCombo [Hawkins-2010] that compares shape and the Mills-Dean approximation of electrostatics [MillsDean-1996] . These similarity measures choose the most appropriate system to dock against and provide a prediction of the quality of the result.

The TanimotoCombo measure is agnostic of how the poses in question are generated; it can be used to validate and provide a pose prediction probability regardless of how the pose was generated. In practice, SHAPEFIT exploits the predictive capabilities of the TanimotoCombo measure by using it in an optimization function that drives a flexible fitting routine.

Essentially, during pose optimization, SHAPEFIT attempts to force a predicted pose into the binding mode of a known ligand. If the induced strain becomes too large, the optimization stops. This final pose is used to predict the overall quality of fit. In this fashion, SHAPEFIT is able to rescue 10-20% of the original rigidly overlaid poses and place them within 2.0 Ångströms RMSD of the experimental crystal structure.

Typically during docking, only the protein structure is used to model unknown structures. Given a molecule that is known to bind, SHAPEFIT searches through XRC coordinates of known ligand-protein complexes, determines the complex best able to predict the pose of the molecule and then generates both a pose and the probability that the pose is correct, usually in well under a minute per ligand.

SHAPEFIT‘s basic algorithm:

  1. Given a set of potential complexes, SHAPEFIT chooses the appropriate complex based on the 2D or 3D similarity to the bound ligand. The best complex, in general, has the highest 2D or 3D similarity of the input molecule to the chosen complex’s bound ligand.

  2. After the complex is chosen, a flexible fit is performed that attempts to match the binding mode of the bound ligand using an adiabatic optimization method [Wlodek-2006] . This optimization method is known as the SHAPEFIT potential.

    The term adiabatic comes from the Greek “impassable”, and in this case SHAPEFIT sets up a chemical strain boundary that the optimization cannot broach.

    SHAPEFIT seeds the flexible fit by expanding the poses generated by the original 3D similarity as described in (1) and then applying the shape constraint of the bound ligand.

    As shown in figure *SHAPEFIT* Optimization, SHAPEFIT works by first using the known bound ligand to position the input molecule and follows up by using the bound ligand as a shape constraint during MMFF optimization [Halgren-I-1996] [Halgren-II-1996] [Halgren-III-1996] [Halgren-IV-1996] [Halgren-V-1996] [Halgren-VI-1999] [Halgren-VII-1999] . While the input molecule is being forced into the shape constraint, MMFF strain is monitored to form the adiabatic boundary. When the strain becomes too large, the optimization is reversed or stops altogether.

  3. The interactions from the bound ligand are then used as a further constraint during ligand-protein optimization. This helps to remove clashes with the protein and provide better interactions between un-constrained ligand atoms.

This is a long winded way of saying that SHAPEFIT‘s optimization attempts to force the molecule into the known binding mode without creating undue strain on the molecule being placed into the protein.

*SHAPEFIT* Optimization Technique

SHAPEFIT Optimization: Starting from an initial alignment, use the shape constraint of the bound-ligand to drive a flexible fit while simultaneously limiting strain

As shown in figure SHAPEFIT Cross Docking Results, analyzing the Kinase data set used in Tuccinardi et al, [Martinelli-2010] pose-prediction using SHAPEFIT is seen to perform remarkably better for similar ligands than standard docking techniques at higher TanimotoCombo values:

Cross Docking probability of finding poses within 2.0 |Angstroms|

SHAPEFIT Cross Docking Results: Probability of finding a good pose based on bound-ligand fit-ligand TanimotoCombo similarity. Standard docking results are essentially the same and follow the same trajectories flattening out as they hit their limit of accuracy. While SHAPEFIT performs worse at low similarities it continually increases as similarity increases.

While SHAPEFIT is not a good technique for determining the pose between known bound-ligands and fit ligands with low similarity, as the similarity increases, the probability of determining the correct pose increases rapidly. The similarity where this crosses over is remarkably small, only around 0.9 TanimotoCombo. This is most likely because as the similarity increases, the active site similarity also increases. Below 0.9 TanimotoCombo POSIT reverts to the Hybrid method.

On Clashes

Unlike the FRED and HYBRID methods, SHAPEFIT is heavily biased towards the known bound ligand. In some cases this causes the pose to clash with the protein. This is especially true if the original bound ligand already clashes with the protein.

Currently, we do not filter out clashing poses although options and functionality for controlling this with SHAPEFIT will be investigated in future releases.

Setup

Typically, POSIT doesn’t require explicit setup. Simply create a new OEPosit object. For details about the customization available to OEPosit see the OEPositOptions documentation.

OEPosit poser = new OEPosit();

Initialization

An OEPosit object must be initialized using a receptor prior to posing or scoring molecules. This is done by passing a receptor to the OEPosit.Initialize method, as shown in the following code snippet from Listing 3.

poser.Initialize(receptor);

Posing Molecules

Once the OEPosit object has been initialized, molecules are docked using the OEPosit.DockMultiConformerMolecule method, as shown in this code snippet from Listing 3.

poser.DockMultiConformerMolecule(posedMol, mcmol);

mcmol is a multiconformer representation of the molecule being docked, and posedMol is an OEMolBase the resulting top scoring docked pose is returned in. The probability of the docked molecule can be obtained by calling the OEMolBase.GetEnergy method of pose.

In this example the best scoring pose as returned into posedMol.

Note

DockMultiConformerMolecule may return fewer than the requested number of poses.

Listing 3 uses the convenience function OESetSDScore to assign the scores to SD data on the molecules. This function simply assigns the result of ‘OEPosit::ScoreLigand’ to the SD data with tag sdtag.

oedocking.OESetSDScore(posedMol, poser, poser.GetName());

Using Multiple Receptors

Using multiple receptors to dock or pose molecules has been shown to greatly increase the reliability of docking.

The Docking Toolkit provides a convenience function that helps identify the receptor most likely to accept the proposed ligand. The method used is similar to the POSIT Algorithm in that it assigns a probability that, if the ligand binds to the protein, that the ligand adopts the correct binding mode.

Note

If the receptor in question does not have a bound ligand, the probability currently returned is zero.

Note

The following examples show OEPosit but can trivially be modified to use OEHybrid

Posing To Multiple Receptors

The first examples, Listing 4, shows how to pose a ligand against all receptors.

Listing 4: Example program for posing molecules with multiple receptors

/*******************************************************************************
 * Copyright 2004-2016 OpenEye Scientific Software, Inc.
 ******************************************************************************/
package openeye.examples.oedocking;

import java.net.URL;

import java.util.Iterator;
import java.util.List;
import java.util.ArrayList;

import openeye.oechem.*;
import openeye.oedocking.*;

public class PoseMolsWithMultiReceptors {

    private void pose(OEInterface itf) {
        // Load the receptors
        List<OEGraphMol> receptors = new ArrayList<OEGraphMol>();

        for (String receptor_filename : itf.GetStringList("-receptors")) {

          OEGraphMol receptor = new OEGraphMol();
          if (!oedocking.OEReadReceptorFile(receptor, receptor_filename)) {
            oechem.OEThrow.Fatal("Unable to read receptor");
          }
          receptors.add(receptor);
        }

        OEPosit poser = new OEPosit();

        // Loop through all the receptors and pose molecules
        Iterator receptor = receptors.iterator();
        int receptor_idx = 0;
        while(receptor.hasNext())
        {
          oemolistream imstr = new oemolistream(itf.GetString("-in"));
          oemolostream omstr = new oemolostream(
            itf.GetString("-prefix") + "_" + Integer.toString(receptor_idx) + ".oeb");

          poser.Initialize((OEGraphMol)receptor.next());
          
          OEMol mcmol = new OEMol();
          while (oechem.OEReadMolecule(imstr, mcmol)) {
            System.err.format("posing %s\n", mcmol.GetTitle());
            OEGraphMol posedMol = new OEGraphMol();
            poser.DockMultiConformerMolecule(posedMol, mcmol);

            oedocking.OESetSDScore(posedMol, poser, poser.GetName());
            oechem.OEWriteMolecule(omstr, posedMol);
          }
          omstr.close();
          imstr.close();
          receptor_idx = receptor_idx + 1;
        }

    }

    public static void main(String argv[]) {
        PoseMolsWithMultiReceptors app = new PoseMolsWithMultiReceptors();
        URL fileURL = app.getClass().getResource("PoseMolsWithMultiReceptors.txt");

        try {
            OEInterface itf = new OEInterface();
            oechem.OEConfigureFromURL(itf, fileURL);
            if (oechem.OEParseCommandLine(itf, argv)) {
                app.pose(itf);
            }
        } catch (java.io.IOException e) {
            System.err.println("Unable to open interface file");
        }
    }
}

One of the difficulties of docking to multiple receptors, is that they can consume quite a large amount of memory so holding multiple docking objects at the same time can be problematic.

The solution shown in the examples starts with the first receptor, poses all input ligands to it, outputs them and then moves on to the second receptor. Each receptor outputs to a different file in this example.

First, it is necessary to read the receptors into a list or a vector.

for (String receptor_filename : itf.GetStringList("-receptors")) {

  OEGraphMol receptor = new OEGraphMol();
  if (!oedocking.OEReadReceptorFile(receptor, receptor_filename)) {
    oechem.OEThrow.Fatal("Unable to read receptor");
  }
  receptors.add(receptor);
}

Then it is a simple matter of looping through the receptors and posing all ligands.

// Loop through all the receptors and pose molecules
Iterator receptor = receptors.iterator();
int receptor_idx = 0;
while(receptor.hasNext())
{
  oemolistream imstr = new oemolistream(itf.GetString("-in"));
  oemolostream omstr = new oemolostream(
    itf.GetString("-prefix") + "_" + Integer.toString(receptor_idx) + ".oeb");

  poser.Initialize((OEGraphMol)receptor.next());
  
  OEMol mcmol = new OEMol();
  while (oechem.OEReadMolecule(imstr, mcmol)) {
    System.err.format("posing %s\n", mcmol.GetTitle());
    OEGraphMol posedMol = new OEGraphMol();
    poser.DockMultiConformerMolecule(posedMol, mcmol);

    oedocking.OESetSDScore(posedMol, poser, poser.GetName());
    oechem.OEWriteMolecule(omstr, posedMol);
  }
  omstr.close();
  imstr.close();
  receptor_idx = receptor_idx + 1;
}

Posing To The Best Receptor

In the second example, Listing 5, a helper function that takes a collection of receptors and returns the best one for the current ligand is used to pose to a single receptor.

Listing 5: Example program for posing molecules with best receptor

/*******************************************************************************
 * Copyright 2004-2016 OpenEye Scientific Software, Inc.
 ******************************************************************************/
package openeye.examples.oedocking;

import java.net.URL;

import java.util.Iterator;
import java.util.List;
import java.util.ArrayList;

import openeye.oechem.*;
import openeye.oedocking.*;

public class PoseMolsWithBestReceptor {

    private int BestReceptorIndex(OEMCMolBase ligand,
                                  List<OEGraphMol> receptors)
    {
      int best_idx = Integer.MAX_VALUE;
      double best_probability = 0.0;
      
      Iterator receptor = receptors.iterator();
      int receptor_idx = 0;
      while(receptor.hasNext())
      {
        double probability = oedocking.OEGetEstimatedPositProbability(
          ligand,
          (OEGraphMol)receptor.next());
        if (probability > best_probability)
        {
          best_probability = probability;
          best_idx = receptor_idx;
        }
        receptor_idx = receptor_idx + 1;
      }
      
      return best_idx;
    }


    private void pose(OEInterface itf) {
        // Load the receptors
        List<OEGraphMol> receptors = new ArrayList<OEGraphMol>();

        for (String receptor_filename : itf.GetStringList("-receptors")) {

          OEGraphMol receptor = new OEGraphMol();
          if (!oedocking.OEReadReceptorFile(receptor, receptor_filename)) {
            oechem.OEThrow.Fatal("Unable to read receptor");
          }
          receptors.add(receptor);
        }

        OEPosit poser = new OEPosit();

        // Loop through all the receptors and pose molecules to
        //  the best receptor, note - some output files may be
        //  empty.
        Iterator receptor = receptors.iterator();
        int receptor_idx = 0;
        while(receptor.hasNext())
        {
          oemolistream imstr = new oemolistream(itf.GetString("-in"));
          oemolostream omstr = new oemolostream(
            itf.GetString("-prefix") + "_" + Integer.toString(receptor_idx) + ".oeb");

          poser.Initialize((OEGraphMol)receptor.next());
          
          OEMol mcmol = new OEMol();

          while (oechem.OEReadMolecule(imstr, mcmol)) {
            // Only pose if we are the best receptor
            if ( receptor_idx == BestReceptorIndex(mcmol, receptors) )
            {
              System.err.format("Posing ligand %s to receptor %d", 
                                mcmol.GetTitle(), receptor_idx);

              OEGraphMol posedMol = new OEGraphMol();
              poser.DockMultiConformerMolecule(posedMol, mcmol);
              
              oedocking.OESetSDScore(posedMol, poser, poser.GetName());
              oechem.OEWriteMolecule(omstr, posedMol);
            }
          }
          omstr.close();
          imstr.close();
          receptor_idx = receptor_idx + 1;
        }

    }

    public static void main(String argv[]) {
        PoseMolsWithBestReceptor app = new PoseMolsWithBestReceptor();
        URL fileURL = app.getClass().getResource("PoseMolsWithBestReceptor.txt");

        try {
            OEInterface itf = new OEInterface();
            oechem.OEConfigureFromURL(itf, fileURL);
            if (oechem.OEParseCommandLine(itf, argv)) {
                app.pose(itf);
            }
        } catch (java.io.IOException e) {
            System.err.println("Unable to open interface file");
        }
    }
}

The helper function is a simple loop through a list or vector:

private int BestReceptorIndex(OEMCMolBase ligand,
                              List<OEGraphMol> receptors)
{
  int best_idx = Integer.MAX_VALUE;
  double best_probability = 0.0;
  
  Iterator receptor = receptors.iterator();
  int receptor_idx = 0;
  while(receptor.hasNext())
  {
    double probability = oedocking.OEGetEstimatedPositProbability(
      ligand,
      (OEGraphMol)receptor.next());
    if (probability > best_probability)
    {
      best_probability = probability;
      best_idx = receptor_idx;
    }
    receptor_idx = receptor_idx + 1;
  }
  
  return best_idx;
}

Once this function exists, we can use it to check if the current receptor in the receptor loop is the appropriate one to pose against:

while (oechem.OEReadMolecule(imstr, mcmol)) {
  // Only pose if we are the best receptor
  if ( receptor_idx == BestReceptorIndex(mcmol, receptors) )
  {
    System.err.format("Posing ligand %s to receptor %d", 
                      mcmol.GetTitle(), receptor_idx);

    OEGraphMol posedMol = new OEGraphMol();
    poser.DockMultiConformerMolecule(posedMol, mcmol);
    
    oedocking.OESetSDScore(posedMol, poser, poser.GetName());
    oechem.OEWriteMolecule(omstr, posedMol);
  }
}
omstr.close();
imstr.close();
receptor_idx = receptor_idx + 1;

Remember that these examples can also be used with OEHybrid or OEDock.