Scoring

Scoring functions in the Docking Toolkit measure the fitness of ligands posed within the active site of a target protein and assign them a numerical score. Poses with better scores are more likely to be correctly docked compared to other poses of the same ligand. The score of a ligand is the best score of any pose of that ligand, and ligands with better scores are more likely to be active against the target protein compared to other ligands docked.

The following scoring functions are implemented in Docking TK

  1. Shapegauss (OEScoreType::Shapegauss)
  2. PLP (OEScoreType::PLP)
  3. Chemgauss3 (OEScoreType::Chemgauss3)
  4. Chemgauss4 (OEScoreType::Chemgauss4)
  5. Chemscore (OEScoreType::Chemscore)

In the Docking Toolkit scoring, optimization and score annotation with any of these scoring functions is done using the OEScore class.

Listing 1 is an example program that uses the OEScore class to score, optimize and annotate poses.

Required Parameters
-receptor

Filename of receptor.

-in

Input file of poses to rescore.

-out

Output file for rescored molecules

Optional Parameters
-score

Scoring function to use (defaults to OEScoreType::Default)

-optimize

Flag to optimize molecules before rescoring

Listing 1: Example program for rescoring, optimizing and annotating molecules

#include "openeye.h"
#include "oesystem.h"
#include "oechem.h"
#include "oedocking.h"
#include <string>

using namespace OESystem;
using namespace OEChem;
using namespace OEDocking;
using namespace std;

#include "RescorePoses.itf"

int main(int argc, char** argv)
{
  OEInterface itf(InterfaceData);
  OEScoreTypeConfigure(itf, "-score");
  if (!OEParseCommandLine(itf,argc,argv)) 
    return 1;

  oemolistream imstr(itf.Get<string>("-in"));
  oemolostream omstr(itf.Get<string>("-out"));

  OEGraphMol receptor;
  if (!OEReadReceptorFile(receptor, itf.Get<string>("-receptor")))
    OEThrow.Fatal("Unable to read receptor");

  unsigned int scoreType = OEScoreTypeGetValue(itf, "-score");
  OEScore score(scoreType);
  score.Initialize(receptor);

  OEMol ligand;
  bool optimize = itf.Get<bool>("-optimize");
  while (OEReadMolecule(imstr, ligand))
  {
    if (optimize)
      score.SystematicSolidBodyOptimize(ligand);
    score.AnnotatePose(ligand);
    string sdtag = score.GetName();
    OESetSDScore(ligand, score, sdtag);
    OESortConfsBySDTag(ligand, sdtag, score.GetHighScoresAreBetter()); 
    OEWriteMolecule(omstr, ligand);
  }
  return 0;
}

Choosing a scoring function

The scoring function is chosen when the OEScore object is constructed, as shown in this snippet of code from Listing 1.

OEScore score(scoreType);

scoreType is an unsigned int constant from the OEScoreType namespace identifying a scoring function.

Supported scoring functions are

Note

OEScore may also be constructed without the scoreType parameter, in which case the default scoring function (Chemgauss4) is used.

Initialization

An OEScore object must be initialized with the structure of the target protein and the location of the active site. This is done by passing either a receptor object (see Receptors chapter) or protein and box enclosing the active site to the OEScore::Initialize method.

Initializing with a receptor is the simplest method as shown in this snippet of code from Listing 1.

score.Initialize(receptor);

OEScore objects can also be initialized using a protein and a box enclosing the active site. The following example initializes an OEScore object using the structure of a target protein and a box that is setup by first enclosing a docked ligand and then adding a 4 Angstrom padding to all sides of the box.

OEGraphMol protein;
oemolistream imstr(argv[1]);
OEReadMolecule(imstr, protein);

OEGraphMol pose;
imstr.open(argv[2]);
OEReadMolecule(imstr, pose);

float addbox = 4.0f;
OEBox box;
OESetupBox(box, pose, addbox);

OEScore oescore;
oescore.Initialize(protein, box);

Initialization can take 1-2 minutes for large active sites. This time is invariant to the initialization method (i.e. receptor vs. protein-box).

Scoring Poses

The OEScore class can calculate the score of a pose as well as break down that score into contributions from individual components of the scoring function as shown in the following example.

void PrintScore(OEScore& score, const OEMolBase& pose)
{
  cout << "Total ligand score = " << score.ScoreLigand(pose) << endl;

  cout << "Score components contributions to score:" << endl; 
  for (OEIter<const string> comp = score.GetComponentNames() ; comp ; ++comp)
    cout << *comp << ": " << score.ScoreLigandComponent(pose, *comp) << endl;
}

Scores can also be calculated on a per-atom basis, and broken down into contributions from individual components of the scoring function.

void PrintAtomScore(OEScore& score, 
                    const OEMolBase& pose, 
                    const OEAtomBase* atom)
{
  cout << endl << "Atom: " << atom->GetIdx() << " score: " << score.ScoreAtom(atom, pose) << endl;

  cout << "Score components contribution to atom scores:" << endl; 
  for (OEIter<const string> comp = score.GetComponentNames() ; comp ; ++comp)
    cout << *comp << ": " << score.ScoreAtomComponent(atom, pose, *comp) << endl;
}

Listing 1 uses the convenience function OESetSDScore to assign the scores to SD data on the molecules. This function simply assigns the result of OEScore::ScoreLigand to SD data with tag sdtag.

OESetSDScore(ligand, score, sdtag);

Annotating

Annotating is a method of attaching score information to a molecule that can then be viewed in VIDA. When an annotated molecule is loaded into VIDA the score information will appear in the list window and the scores of individual atoms shown in the 3D window. The following snippet of code from Listing 1 annotates a pose.

score.AnnotatePose(ligand);

Warning

Annotated poses must be saved in .oeb or .oeb.gz format. Saving to another format will result in the loss of the annotation information.

Optimizing

The OEScore class can optimize poses using a systematic solid body optimization, as shown in this snippet from Listing 1

score.SystematicSolidBodyOptimize(ligand);

This optimization is rigid with respect to the ligand and the protein. Each pose is optimized by taking one positive and one negative step for each of the 6 degrees of freedom (3 rotational and 3 translational) for a total of 729 positions tested (3 to the 6th).

Sorting Poses

Multiple poses of a docked molecule are generally stored as conformers of an OEMol object. The original ordering of poses in the OEMol is not altered when OEScore::SystematicSolidBodyOptimize is called (or any of the score assignment free functions). Listing 1 uses OEChem free function OESortConfsBySDTag to insure that the rescored poses of each molecule are ordered from best to worst.

OESortConfsBySDTag(ligand, sdtag, score.GetHighScoresAreBetter()); 

The method OEScore::GetHighScoresAreBetter will return false if lower value scores are better scores and true if higher value scores are better scores.

Command Line Interface

Listing 1 uses the OEInterface class to provide its command line interface. The primary documentation for the OEInterface class is in the OEChem documentation. Listing 1, however, makes uses of 2 convenience functions unique to the Docking Toolkit to define and use the -score command line parameter.

The following snippet from the example program shown in Listing 1 adds the -score flag to the command line, which allows the user to set the scoring function (see Scoring Function Implementations section).

OEScoreTypeConfigure(itf, "-score");

The user specified value, or the default value OEScoreType::Default, is obtained in the following snippet from Listing 1.

unsigned int scoreType = OEScoreTypeGetValue(itf, "-score");

Scoring Function Implementations

Shapegauss

Shapegauss [McGann-2003] is a shape based scoring function that favors poses that complement the active site well, regardless of any chemical interactions (e.g. hydrogen bonds). The Shapegauss scoring function represents the shape of each atom as a smooth Gaussian function.

The Shapegauss score is calculated by summing a pairwise potential between all protein atoms and all ligand heavy atoms. This potential is most favorable when the two atoms touch but do not overlap. A correction term is then applied to further penalize atoms which significantly overlap the protein.

PLP

PLP [Verkhivker-2000] or Piecewise Linear Potential scoring function calculates both the shape and hydrogen bond complementarity of poses to the active site.

The PLP score is a pairwise additive scoring function. All pairs of ligand-protein heavy atoms are assigned either a hydrogen bonding potential, if the pair is an acceptor-donor pair, or otherwise a lipophilic potential. These pairwise potentials are summed to obtain the final pose score.

The PLP implementation in the Docking Toolkit has also been extended to include interaction between metals on the target protein and acceptors on the ligand.

Chemscore

Chemscore [Eldridge-1997] is a sum of the following interaction terms

lipophilic

Favorable interactions that occur when two non-polar heavy atoms (one ligand atom and one protein atom) are placed near each other.

hydrogen bonding

Favorable interactions that occur when acceptor-donor interactions are formed between the ligand and protein.

metal chelator

Favorable interactions that occur when acceptor atoms on the ligand are placed near metal atoms on the protein.

clash

Penalty term that occurs when heavy atoms on the ligand and protein clash.

rotatable bond

This penalty term is proportional to the number of rotatable bonds on the ligand that are no longer free to rotate when the ligand is docked.

Chemgauss3

The Chemgauss3 scoring function uses Gaussian smoothed potentials to measure the complementarity of ligand poses within the active site. Chemgauss3 recognizes the following types of interactions.

  • Shape
  • Hydrogen bonding between ligand and protein
  • Hydrogen bonding interactions with implicit solvent
  • Metal-chelator interactions.

All interaction potentials in Chemgauss are initially constructed using step functions to describe the interaction of atom pairs (or other chemical points) as a function of distance. These interactions are mapped onto a grid that is then convoluted with a spherical Gaussian function, which smoothes the potential making it less sensitive to small changes in the ligand position. Smoothing the score in this way serves two purposes, first docking can be run at lower resolution than would be required if the score were not smooth since small changes in position to do not cause large changes in score. Second it reduces the error associated with the rigid protein approximation by effectively accounting for the ability of the protein to make small structural re-arrangements to accommodate the ligand.

Shape interactions in Chemgauss are based on a united atom model (i.e. only heavy atoms are relevant to the shape calculation). Each ligand heavy atom is assigned a fixed clash penalty score if the distance between it and a protein heavy atom is less than the sum of the VdW radii, otherwise it is assigned a score proportional to the count of the number of protein heavy atoms within 1.25 and 2.5 times the sum of the VdW radii (atoms within 2.5 count one tenth as much as those within 1.25). From this score a penalty equal to two close protein atom contacts is subtracted to represent the VdW interactions with solvent water that are lost when the ligand docks. This score is pre-computed at grid points throughout the active site and the resulting grid is then smoothed.

Hydrogen bonding groups are modeled with one or more lone-pair or polar-hydrogen positions that describe the directionality of potential hydrogen bonds (with respect to the hydrogen bonding group’s heavy atom). Donor groups have lone pair positions representing the possible location of the donor hydrogen atoms relative to the donating molecule, while acceptors have lone-pair positions representing the possible locations of the donated hydrogen relative to the acceptor. A hydrogen bond is detected and assigned a constant score when a hydrogen bonding position on the ligand is within 1.0 Angstroms of a complementary hydrogen bonding position on the protein (i.e. when the polar-hydrogen position of a donor overlaps the lone-pair position of an acceptor). If the ligand hydrogen bonding group has multiple polar-hydrogens and/or lone-pair positions (groups can be both donors and acceptors) then this calculation is performed for each position and the result is summed. As with all Chemgauss terms the hydrogen bond potential is pre-computed at grid points throughout the site and then smoothed.

Hydrogen bonds with solvent that break when the ligand docks into the active site are penalized by the Chemgauss scoring function. Broken protein-solvent hydrogen bonds are accounted for by calculating how many hydrogen bonds water can make with the protein at the position of each heavy atom of the docked ligand, and a penalty score is assigned which is proportional to the number of hydrogen bonds. Broken ligand-solvent hydrogen bonds are accounted for by calculating desolvation positions around each hydrogen-bonding group on the ligand that represent the positions water could occupy when making a hydrogen bonding interaction with the protein. A penalty is then assessed that is proportional to the number of desolvation positions that can no longer be occupied by water because the water in these positions would clash with the protein. As before, this potential is placed on a grid and smoothed.

Chelating interactions between protein metals and ligand chelating groups are accounted for by Chemgauss (protein-chelator and ligand-metal chelating interactions are not). For each chelator on the ligand one or more chelating-positions are calculated. If a protein metal is within 1.0 Angstroms of any chelating-position of a chelating group then a fixed score is assigned, otherwise a zero score is assigned. As before this potential is placed on a grid and smoothed.

Chemgauss4

The Chemgauss4 is a modification of the Chemgauss3 scoring function that has improved hydrogen bonding and metal chelator (The shape and implicit solvent interaction terms are identical to those in Chemgauss3). The new hydrogen bonding and metal chelator terms have better perception of the directionality of these interactions and also account for hydrogen bond networking effects.

To calculate the hydrogen bonding score for a ligand-protein hydrogen bond two distances are measured.

  1. How far the donor heavy atom is from the position the acceptor atom would consider to be an ideal for a hydrogen bonding to form.
  2. How far the acceptor heavy atom is from the position the donor atom would consider to be ideal for a hydrogen bonding interaction to occur.

The score for the hydrogen bond interaction is a product of two Gaussian functions of these distances scaled by the strength of the hydrogen bonding groups involved in the interaction.

HBscore = strength*g(distance1)*g(distance2)

To compute the total hydrogen bonding score for the ligand-protein complex the individual pairwise scores are calculated for all protein-ligand donor-acceptor. Individual HB interaction are then eliminated if either the donor or acceptor exceeds the maximum number of interactions allowed (e.g., a hydroxyl with one hydrogen is not allowed to make more than one donor interaction), with the lowest scoring interactions eliminated first. The final hydrogen bond score is then calculated by summing the scores of the remained individual acceptor-donor interactions.

Chemical Gaussian Overlay

The Chemical Gaussian Overlay function (or CGO) is a hybrid scoring function that scores poses based on their similarity to a known bound ligand and the interactions both the docked and bound ligand make with the protein active site. This scoring function is not implemented by the OEScore class. This scoring function is used by the OEDock class during the exhaustive search (see Docking Algorithm section) when using the hybrid docking method (see Hybrid Method section).

CGO is primarily a ligand-based scoring functions although some information from the protein structure is used as well. The similarities computed are based on the overall shape of the molecules as well as the position of hydrogen bonding and metal chelating groups. This scoring function requires a bound ligand pose along with the structure of the target protein. Typically the ligand structure is obtained from X-ray crystallography along with the structure of the target protein, although a docked ligand could also, in principal, be used.

CGO represents molecules as a set of spherical Gaussian functions describing the shape and chemistry (acceptors, donors and chelators) of the molecule. The Gaussians representing the shape of the molecule are centered at the heavy atom positions, those for donors are centered on polar-hydrogen positions (i.e. positions where the donating hydrogen could be when it is involved in a hydrogen bond), those for acceptors are centered on lone-pair positions (i.e. positions where a donating hydrogen could be when a hydrogen bond is formed) and those for chelators are centered at chelating positions (i.e. locations where a metal could have a chelating interaction). The overlap of the Gaussians on the docked ligand to those on the bound ligand are computed for each type of Gaussian (e.g. shape, donor, acceptor and chelator) by summing the overlap of individual pairs of Gaussian. The overlap of each individual pair is calculated by integrating the product of the two. To prevent chemistry not relevant to binding from contributing to the overall score, when calculating the chemistry overlaps (i.e. acceptor, donor and chelator) only groups that make the interaction with the protein are accounted for (e.g. a chelator that does not interact with a metal on the protein is ignored in the overlap calculation). The sum of all four types of overlaps is the CGO score.