Scoring functions in the Docking Toolkit measure the fitness of ligands posed within the active site of a target protein and assign them a numerical score. Poses with better scores are more likely to be correctly docked compared to other poses of the same ligand. The score of a ligand is the best score of any pose of that ligand, and ligands with better scores are more likely to be active against the target protein compared to other ligands docked.
The following scoring functions are implemented in Docking TK
- Shapegauss (OEScoreType.Shapegauss)
- PLP (OEScoreType.PLP)
- Chemgauss3 (OEScoreType.Chemgauss3)
- Chemgauss4 (OEScoreType.Chemgauss4)
- Chemscore (OEScoreType.Chemscore)
In the Docking Toolkit scoring, optimization and score annotation with any of these scoring functions is done using the OEScore class.
Scoring Function Implementations¶
Shapegauss [McGann-2003] is a shape based scoring function that favors poses that complement the active site well, regardless of any chemical interactions (e.g. hydrogen bonds). The Shapegauss scoring function represents the shape of each atom as a smooth Gaussian function.
The Shapegauss score is calculated by summing a pairwise potential between all protein atoms and all ligand heavy atoms. This potential is most favorable when the two atoms touch but do not overlap. A correction term is then applied to further penalize atoms which significantly overlap the protein.
PLP [Verkhivker-2000] or Piecewise Linear Potential scoring function calculates both the shape and hydrogen bond complementarity of poses to the active site.
The PLP score is a pairwise additive scoring function. All pairs of ligand-protein heavy atoms are assigned either a hydrogen bonding potential, if the pair is an acceptor-donor pair, or otherwise a lipophilic potential. These pairwise potentials are summed to obtain the final pose score.
The PLP implementation in the Docking Toolkit has also been extended to include interaction between metals on the target protein and acceptors on the ligand.
Chemscore [Eldridge-1997] is a sum of the following interaction terms
lipophilicFavorable interactions that occur when two non-polar heavy atoms (one ligand atom and one protein atom) are placed near each other.
hydrogen bondingFavorable interactions that occur when acceptor-donor interactions are formed between the ligand and protein.
metal chelatorFavorable interactions that occur when acceptor atoms on the ligand are placed near metal atoms on the protein.
clashPenalty term that occurs when heavy atoms on the ligand and protein clash.
rotatable bondThis penalty term is proportional to the number of rotatable bonds on the ligand that are no longer free to rotate when the ligand is docked.
The Chemgauss3 scoring function uses Gaussian smoothed potentials to measure the complementarity of ligand poses within the active site. Chemgauss3 recognizes the following types of interactions.
- Hydrogen bonding between ligand and protein
- Hydrogen bonding interactions with implicit solvent
- Metal-chelator interactions.
All interaction potentials in Chemgauss are initially constructed using step functions to describe the interaction of atom pairs (or other chemical points) as a function of distance. These interactions are mapped onto a grid that is then convoluted with a spherical Gaussian function, which smoothes the potential making it less sensitive to small changes in the ligand position. Smoothing the score in this way serves two purposes, first docking can be run at lower resolution than would be required if the score were not smooth since small changes in position to do not cause large changes in score. Second it reduces the error associated with the rigid protein approximation by effectively accounting for the ability of the protein to make small structural re-arrangements to accommodate the ligand.
Shape interactions in Chemgauss are based on a united atom model (i.e. only heavy atoms are relevant to the shape calculation). Each ligand heavy atom is assigned a fixed clash penalty score if the distance between it and a protein heavy atom is less than the sum of the VdW radii, otherwise it is assigned a score proportional to the count of the number of protein heavy atoms within 1.25 and 2.5 times the sum of the VdW radii (atoms within 2.5 count one tenth as much as those within 1.25). From this score a penalty equal to two close protein atom contacts is subtracted to represent the VdW interactions with solvent water that are lost when the ligand docks. This score is pre-computed at grid points throughout the active site and the resulting grid is then smoothed.
Hydrogen bonding groups are modeled with one or more lone-pair or polar-hydrogen positions that describe the directionality of potential hydrogen bonds (with respect to the hydrogen bonding group’s heavy atom). Donor groups have lone pair positions representing the possible location of the donor hydrogen atoms relative to the donating molecule, while acceptors have lone-pair positions representing the possible locations of the donated hydrogen relative to the acceptor. A hydrogen bond is detected and assigned a constant score when a hydrogen bonding position on the ligand is within 1.0 Angstroms of a complementary hydrogen bonding position on the protein (i.e. when the polar-hydrogen position of a donor overlaps the lone-pair position of an acceptor). If the ligand hydrogen bonding group has multiple polar-hydrogens and/or lone-pair positions (groups can be both donors and acceptors) then this calculation is performed for each position and the result is summed. As with all Chemgauss terms the hydrogen bond potential is pre-computed at grid points throughout the site and then smoothed.
Hydrogen bonds with solvent that break when the ligand docks into the active site are penalized by the Chemgauss scoring function. Broken protein-solvent hydrogen bonds are accounted for by calculating how many hydrogen bonds water can make with the protein at the position of each heavy atom of the docked ligand, and a penalty score is assigned which is proportional to the number of hydrogen bonds. Broken ligand-solvent hydrogen bonds are accounted for by calculating desolvation positions around each hydrogen-bonding group on the ligand that represent the positions water could occupy when making a hydrogen bonding interaction with the protein. A penalty is then assessed that is proportional to the number of desolvation positions that can no longer be occupied by water because the water in these positions would clash with the protein. As before, this potential is placed on a grid and smoothed.
Chelating interactions between protein metals and ligand chelating groups are accounted for by Chemgauss (protein-chelator and ligand-metal chelating interactions are not). For each chelator on the ligand one or more chelating-positions are calculated. If a protein metal is within 1.0 Angstroms of any chelating-position of a chelating group then a fixed score is assigned, otherwise a zero score is assigned. As before this potential is placed on a grid and smoothed.
The Chemgauss4 is a modification of the Chemgauss3 scoring function that has improved hydrogen bonding and metal chelator (The shape and implicit solvent interaction terms are identical to those in Chemgauss3). The new hydrogen bonding and metal chelator terms have better perception of the directionality of these interactions and also account for hydrogen bond networking effects.
To calculate the hydrogen bonding score for a ligand-protein hydrogen bond two distances are measured.
- How far the donor heavy atom is from the position the acceptor atom would consider to be an ideal for a hydrogen bonding to form.
- How far the acceptor heavy atom is from the position the donor atom would consider to be ideal for a hydrogen bonding interaction to occur.
The score for the hydrogen bond interaction is a product of two Gaussian functions of these distances scaled by the strength of the hydrogen bonding groups involved in the interaction.
HBscore = strength*g(distance1)*g(distance2)
To compute the total hydrogen bonding score for the ligand-protein complex the individual pairwise scores are calculated for all protein-ligand donor-acceptor. Individual HB interaction are then eliminated if either the donor or acceptor exceeds the maximum number of interactions allowed (e.g., a hydroxyl with one hydrogen is not allowed to make more than one donor interaction), with the lowest scoring interactions eliminated first. The final hydrogen bond score is then calculated by summing the scores of the remained individual acceptor-donor interactions.
Chemical Gaussian Overlay¶
The Chemical Gaussian Overlay function (or CGO) is a hybrid scoring function that scores poses based on their similarity to a known bound ligand and the interactions both the docked and bound ligand make with the protein active site. This scoring function is not implemented by the OEScore class. This scoring function is used by the OEDock class during the exhaustive search (see Docking Algorithm section) when using the hybrid docking method (see Hybrid Method section).
CGO is primarily a ligand-based scoring functions although some information from the protein structure is used as well. The similarities computed are based on the overall shape of the molecules as well as the position of hydrogen bonding and metal chelating groups. This scoring function requires a bound ligand pose along with the structure of the target protein. Typically the ligand structure is obtained from X-ray crystallography along with the structure of the target protein, although a docked ligand could also, in principal, be used.
CGO represents molecules as a set of spherical Gaussian functions describing the shape and chemistry (acceptors, donors and chelators) of the molecule. The Gaussians representing the shape of the molecule are centered at the heavy atom positions, those for donors are centered on polar-hydrogen positions (i.e. positions where the donating hydrogen could be when it is involved in a hydrogen bond), those for acceptors are centered on lone-pair positions (i.e. positions where a donating hydrogen could be when a hydrogen bond is formed) and those for chelators are centered at chelating positions (i.e. locations where a metal could have a chelating interaction). The overlap of the Gaussians on the docked ligand to those on the bound ligand are computed for each type of Gaussian (e.g. shape, donor, acceptor and chelator) by summing the overlap of individual pairs of Gaussian. The overlap of each individual pair is calculated by integrating the product of the two. To prevent chemistry not relevant to binding from contributing to the overall score, when calculating the chemistry overlaps (i.e. acceptor, donor and chelator) only groups that make the interaction with the protein are accounted for (e.g. a chelator that does not interact with a metal on the protein is ignored in the overlap calculation). The sum of all four types of overlaps is the CGO score.