Calculating simple overlap

The simplest object in the Shape toolkit, OEOverlap, is used to calculate simple, static, overlap between two objects (molecules or grids). Note that static means that the two input species (ref and fit) are not moved at all. This object simply calculate the overlap given the input positions. Performing calculations that actually optimize the alignment/overlap are done with the OEBestOverlay object (see Theory chapter).

This first example reads in a reference molecule and a few fit molecules and prints out the overlap calculated. Note that this example uses all default settings for OEOverlap (discussed in the following sections).

Listing 1: Simple overlap using Exact Overlap

#!/usr/bin/env python
#############################################################################
#  Copyright (C) 2005-2014 OpenEye Scientific Software, Inc.
#############################################################################
from __future__ import print_function

from openeye.oechem import *
from openeye.oeshape import *


def main(argv=[__name__]):
    if len(argv) != 3:
        OEThrow.Usage("%s <reffile> <fitfile>" % argv[0])

    reffs = oemolistream(argv[1])
    fitfs = oemolistream(argv[2])

    refmol = OEGraphMol()
    OEReadMolecule(reffs, refmol)

    ov = OEOverlap()
    ov.SetRefMol(refmol)

    res = OEOverlapResults()
    for fitmol in fitfs.GetOEGraphMols():
        ov.Overlap(fitmol, res)
        print("title: %s  exact tanimoto = %.2f" %
              (fitmol.GetTitle(), res.tanimoto))

if __name__ == "__main__":
    import sys
    sys.exit(main(sys.argv))

Overlap Methods

This nature of the algorithm used to calculate overlap is determined by the OEOverlap.SetMethod member of OEOverlap. At present there are four types of algorithms implemented: Exact, Analytic, Analytic2, and Grid. These are defined in the OEOverlapMethod namespace.

The Exact option is as detailed above: all pairs are considered and the Gaussian overlaps are calculated exactly. The two Analytic options use an approximation to the overlap between two Gaussians that is accurate to about one part in a thousand. In addition, Analytic2 uses a proximity grid to only calculate those atoms pairs that are within a certain threshold distance (by default 4.5 Å). This approximation adds another one part in a thousand average error but is faster for larger molecules. The final option, Grid, uses a grid representation of the volume of the target molecule. It requires significant set-up time relative to the cost of a single overlap calculation (~0.01s compared to ~0.0001s) but is significantly faster than other methods for the evaluation of each overlap once set. Grid suffers a few caveats and drawbacks. First is that, currently, although reference radii are all treated as given, fit atoms are all treated as if they have one radius (that assigned to carbon, and settable via OEOverlap.SetCarbonRadius). The second is that the approximation is slightly worse, typically a few parts in a thousand, at typical grid resolutions. Both Analytic2 and Grid improve performance when the fit molecule is large (>20 atoms) because, if there are n atoms in the fit and m in the reference, the work per atom in the fit is proportional to a constant not n.

By default, OEOverlap performs an Exact overlap calculation. The OEBestOverlay object (used in ROCS) uses Grid for speed.

Radii approximations

Since we are considering molecular volume overlap as a measure, the radii used for each atom is important. There are essentially two settings. A radii approximation of OEOverlapRadii_All means each atom will be treated with the radius as passed in. Alternatively, one can treat all heavy atoms as similar and apply the OEOverlapRadii_Carbon radius approximation. With this, all heavy atoms will be assigned the same radius, regardless of the value attached to each atom. As noted above, if the OEOverlapMethod_Grid method is chosen, the OEOverlapRadii_Carbon radius approximation is also turned on.

Using hydrogens

Shape is a heavy atom property and most OpenEye uses (as well as the defaults for OEOverlap and OEBestOverlay) ignore hydrogens. Hydrogens can be present or not, if OEOverlap is told to not use them, they will not affect the overlap scores.

Adding scores to molecules using SD data

This next example program switches on the OEOverlapMethod_Analytic overlap method and also uses a little extra OEChem to attach the overlap scores to each molecule as SD data. This can be used to rescore a set of ROCS hits or to measure the overlap of ROCS alignments against an exclusion region in the binding site.

Listing 2: Rescoring pre-aligned structures

#!/usr/bin/env python
#############################################################################
#  Copyright (C) 2005-2014 OpenEye Scientific Software, Inc.
#############################################################################
from __future__ import print_function

from openeye.oechem import *
from openeye.oeshape import *


def main(argv=[__name__]):
    if len(argv) != 4:
        OEThrow.Usage("%s <reffile> <rocs_hits_file> <output.sdf>" % argv[0])

    reffs = oemolistream(argv[1])
    fitfs = oemolistream(argv[2])
    outfs = oemolostream(argv[3])

    refmol = OEGraphMol()
    OEReadMolecule(reffs, refmol)

    ov = OEOverlap()
    ov.SetMethod(OEOverlapMethod_Analytic)
    ov.SetRefMol(refmol)

    res = OEOverlapResults()
    for fitmol in fitfs.GetOEGraphMols():
        ov.Overlap(fitmol, res)
        OESetSDData(fitmol, "AnalyticTanimoto", "%.2f" % res.tanimoto)
        OEWriteMolecule(outfs, fitmol)

if __name__ == "__main__":
    import sys
    sys.exit(main(sys.argv))

Shape from Hermite expansion

Using Hermite expansion we can obtain two functionalities: expand a molecule into Hermite representation and compare two molecules that are in the Hermite representation by computing their shape-Tanimoto coefficient. We illustrate both functionalities below with two examples.

Hermite expansion

In this example we input a molecule, set the level of Hermite expansion using NPolyMax variable, determine the optimal parameters for \(\lambda_x, \lambda_y, \lambda_z\), and perform the computation of the Hermite expansion. At the end we output the resulting Hermite representation of the molecule as a grid, using CreateGrid method of the OEHermite class. The user can vary NPolyMax from low numbers, like 4 or 5, to higher numbers like 10-30 and observe convergence of the shape. Which value to use in practice depends on the size of the input molecule. Input molecule can be drug-like as well as a protein.

Listing 14: Expanding a molecular shape into Hermite polynomials

#!/usr/bin/env python
#############################################################################
#  Copyright (C) 2017 OpenEye Scientific Software, Inc.
##############################################################################
import os, sys

from openeye import oechem
from openeye import oeshape
from openeye import oegrid
from openeye.oechem import OEThrow


def main(argv=[__name__]):
  itf = oechem.OEInterface(InterfaceData, argv)

  NPolyMax = itf.GetInt("-NPolyMax")
  gridspacing   = itf.GetFloat("-gridspacing")

  ifname = itf.GetString("-inputfile")
  ofname = itf.GetString("-outputgrid")


  ifs = oechem.oemolistream()
  if (not ifs.open(ifname)):
    OEThrow.Fatal("Unable to open %s for reading" % ifname)

  
  if (not ofname.endswith(".grd")):
    OEThrow.Fatal("Output grid file extension hast to be '.grd' ")

  mol = oechem.OEMol()

  if (not oechem.OEReadMolecule(ifs, mol)):
    OEThrow.Fatal("Unable to read molecule in %s" % ifname)
  
  oechem.OESuppressHydrogens(mol)
  
  hermiteoptions = oeshape.OEHermiteOptions()
  hermiteoptions.SetNPolyMax(NPolyMax)
  hermiteoptions.SetUseOptimalLambdas(True)
 
  hermite = oeshape.OEHermite(hermiteoptions)

  if (not hermite.Setup(mol)):
    OEThrow.Fatal("Was not able to Setup the molecule for the OEHermite class.")

  hopts = hermite.GetOptions()
  print("Best lambdas found X=" + str(hopts.GetLambdaX()) + 
        "  Y=" + str(hopts.GetLambdaY())
       +"  Z=" + str(hopts.GetLambdaZ())
       )
    
  print("Hermite self-overlap=", hermite.GetSelfOverlap())

  coeffs = hermite.GetCoefficients()
  NPolyMaxstring = str(NPolyMax)
  print("Hermite coefficients f_{l,m,n} in the following order l = 0..." 
         +NPolyMaxstring+", m = 0..."+NPolyMaxstring+"-l, n = "
         +NPolyMaxstring+"-l-m :"
        )
  for x in coeffs:
    print(str(x) + " "),

  grid = oegrid.OEScalarGrid()

  hermite.CreateGrid(grid, gridspacing)

  if (not oegrid.OEWriteGrid(ofname, grid)):
    OEThrow.Fatal("Unable to write grid file")

  return 0


#############################################################################
InterfaceData = '''
!BRIEF [-inputfile] <InputFileName> [-outputgrid] <OutputGridFileName> [-NPolyMax] <NPolyMax> [-numgridpoints] <numgridpoints> [-gridspacing] <gridspacing>

!CATEGORY "input/output options :" 1

  !PARAMETER -inputfile 1
    !ALIAS -in
    !TYPE string
    !REQUIRED true
    !BRIEF Filename of the input molecule
    !KEYLESS 1
  !END

  !PARAMETER -outputgrid 2
  !ALIAS -out
    !TYPE string
    !REQUIRED true
    !BRIEF Filename of the output Hermite grid (needs to have .grd file extension)
    !KEYLESS 2
  !END

!END

!CATEGORY "Hermite options :" 2

  !PARAMETER -NPolyMax 3
    !ALIAS -NP
    !TYPE int
    !REQUIRED false
    !DEFAULT 5
    !LEGAL_RANGE 0 30
    !BRIEF Resolution parameter of Hermite Prep
    !KEYLESS 3
  !END


  !PARAMETER -gridspacing 4
    !ALIAS -gs
    !TYPE float
    !REQUIRED false
    !DEFAULT 1.0
    !LEGAL_RANGE 0.01 10.0
    !BRIEF Grid spacing of the output Hermite grid
    !KEYLESS 4
  !END

!END
'''

#############################################################################
if __name__ == "__main__":
    sys.exit(main(sys.argv))

Hermite Tanimoto comparison of shape

In the second example below we setup two molecules and compute the Tanimoto-shape similarity coefficient, using Hermite expansion. The Hermite prep parameter NPolyMax_MAX is used to vary NPolyMax parameter and compute the shape-Tanimoto each time for the first molecule in the reference input file versus all molecules in the fit input file. All results are attached as SD data in the output file. One can observe convergence of the Hermite Tanimoto coefficients as the level of expansion becomes more accurate.

Listing 15: Calculating shape Tanimoto using Hermite expansion

#!/usr/bin/env python
#############################################################################
#  Copyright (C) 2017 OpenEye Scientific Software, Inc.
##############################################################################
import os, sys


from openeye import oechem
from openeye import oeshape
from openeye import oeff
from openeye.oechem import OEThrow

def main(argv=[__name__]):
  itf = oechem.OEInterface(InterfaceData, argv)

  NPolyMax_MAX = itf.GetInt("-NPolyMax_MAX")

  ifrefname = itf.GetString("-inputreffile")
  iffitname = itf.GetString("-inputfitfile")
  ofname = itf.GetString("-outputfile")


  ifsref = oechem.oemolistream()
  ifsfit = oechem.oemolistream()
  ofs    = oechem.oemolostream()

  if (not ifsref.open(ifrefname)):
    OEThrow.Fatal("Unable to open %s for reading" % ifrefname)

  if (not ifsfit.open(iffitname)):
    OEThrow.Fatal("Unable to open %s for reading" % iffitname)


  refmol = oechem.OEMol()
  fitmol = oechem.OEMol()

  newtonopt = oeff.OENewtonOpt()
  newtonopt.SetTolerance(0.0001)
  newtonopt.SetIterLimit(200)

  if (not oechem.OEReadMolecule(ifsref, refmol)):
    OEThrow.Fatal("Unable to read molecule in %s" % ifrefname)
    
  oechem.OESuppressHydrogens(refmol)

  if (not ofs.open(ofname)):
    OEThrow.Fatal("Unable to open %s for writing" % ofname)
  oechem.OEWriteMolecule(ofs, refmol)
  
  idx = 0
  while (oechem.OEReadMolecule(ifsfit, fitmol)):
    oechem.OESuppressHydrogens(fitmol)
    print("Working on the fit molecule with idx = "+str(idx), "...")

    best = oeshape.OEBestOverlay()
    best.SetRefMol(refmol)
      
    print("ROCS Taniomoto list= { "),
    for res in best.Overlay(fitmol):
      for score in res.GetScores():
        print(score.tanimoto),
    print("}")

    for i in range(0, NPolyMax_MAX+1):
      print("Working on NPolyMax = " + str(i))

      hermiteoptionsref = oeshape.OEHermiteOptions()
      hermiteoptionsref.SetNPolyMax(i)
      hermiteoptionsref.SetUseOptimalLambdas(True)

      hermiteref = oeshape.OEHermite(hermiteoptionsref)
      if (not hermiteref.Setup(refmol)):
        OEThrow.Fatal("Was not able to Setup the reference molecule for the OEHermite class.")
      
      hermiteoptionsfit = oeshape.OEHermiteOptions()
      hermiteoptionsfit.SetNPolyMax(i)
      hermiteoptionsfit.SetUseOptimalLambdas(True)

      hermitefit = oeshape.OEHermite(hermiteoptionsfit)
      if (not hermitefit.Setup(fitmol)):
        OEThrow.Fatal("Was not able to Setup the fit molecule for the OEHermite class.")
      
      refvol = hermiteref.GetSelfOverlap() 
      fitvol = hermitefit.GetSelfOverlap()

      hrefopts = hermiteref.GetOptions()
      hfitopts = hermitefit.GetOptions()

      hermiteshapeoverlap = oeshape.OEHermiteShapeFunc(hrefopts, hfitopts)

      if (not hermiteshapeoverlap.SetupRef(refmol)):
        OEThrow.Fatal("Was not able to Setup the reference molecule for the OEHermiteShapeFunc class.")

      if (not hermiteshapeoverlap.Setup(fitmol)):
        OEThrow.Fatal("Was not able to Setup the fit molecule for the OEHermiteShapeFunc class.")

      x = oechem.OEDoubleArray(6)
      for j in range(0, 6):
        x[j] = 0.0
      xOpt = oechem.OEDoubleArray(6)

      bestoverlap = - newtonopt(hermiteshapeoverlap, x, xOpt)
      hermitetanimoto = bestoverlap/(refvol+fitvol-bestoverlap)

      print("Hermite Tanimoto = " + str(hermitetanimoto))
      oechem.OESetSDData(fitmol, "HermiteTanimoto_"+str(i), str(hermitetanimoto))

    idx += 1
    oechem.OEWriteMolecule(ofs, fitmol)

  if (idx==0):
    OEThrow.Warning("Fit molecules file does not contain valid molecules. The output file will be empty")


  ofs.flush()
  ofs.close()

#############################################################################
InterfaceData = '''
!BRIEF [-inputreffile] <InputReferenceFileName> [-inputfitfile] <InputFitFileName> [-outputfile] <OutputFileName> [-NPolyMax_MAX] <NPolyMax_MAX>

!CATEGORY "input/output options :" 1

  !PARAMETER -inputreffile 1
    !ALIAS -inref
    !TYPE string
    !REQUIRED true
    !BRIEF Input file name with reference molecule (will only read the first molecule).
    !KEYLESS 1
  !END
  !PARAMETER -inputfitfile 2
    !ALIAS -infit
    !TYPE string
    !REQUIRED true
    !BRIEF Input file name with fit molecules
    !KEYLESS 2
  !END
  !PARAMETER -outputfile 3
    !ALIAS -out
    !TYPE string
    !REQUIRED true
    !BRIEF Output file name
    !KEYLESS 3
  !END

!END

!CATEGORY "Hermite options :" 2

  !PARAMETER -NPolyMax_MAX 4
    !ALIAS -NP_MAX
    !TYPE int
    !REQUIRED false
    !DEFAULT 5
    !LEGAL_RANGE 0 30
    !BRIEF Maximum value of the parameter of the NPolyMax parameter of the Hermite prep (NPolyMax will be varied from 0 to this value)
    !KEYLESS 4
  !END

!END
'''

#############################################################################
if __name__ == "__main__":
    sys.exit(main(sys.argv))

Color Overlap

Overlap of color atoms

As a way of comparing chemical functionality, the Shape toolkit also provides a color overlap function, OEColorOverlap. The underlying algorithm is similar to pure shape, but instead of overlapping all heavy atoms, in this case we overlap similarly tagged color atoms. The decision of what becomes a color atom is done using SMARTS pattern matching. And the radius and weight of color interactions is completely user-definable.

Using built-in color force fields

Two color force fields, ImplicitMillsDean and ExplicitMillsDean, are built into the Shape toolkit. (The OEColorForceField class is defined in the next section.) Both of these force fields define similar 6 TYPE color force-fields. The types are hydrogen-bond donors, hydrogen-bond acceptors, hydrophobes, anions, cations, and rings. The ImplicitMillsDean force field is recommended.

ImplicitMillsDean includes a simple pKa model that assumes pH=7. It defines cations, anions, donors and acceptors in such a way that they will be assigned the appropriate value independent of the protonation state in the reference or fit molecule. For example, if a molecule contains a carboxylate, ImplicitMillsDean will consider it an anionic center independent of whether it is protonated or deprotonated. This is convenient for searching databases which have not had careful curation of their protonation states. The ExplicitMillsDean file has a similar overall interaction model, however, it does not include a pKa model. It interprets the protonation and charge state of each molecule exactly. Thus, if a sulfate is protonated and neutral, it will not be considered an anion.

The hydrogen-bond models in both ImplicitMillsDean and ExplicitMillsDean are extensions of the original model presented by Mills and Dean [MillsDean-1996]. They both have donors and acceptors segregated into strong, moderate and weak categories.

Listing 3: Calculating color score

#!/usr/bin/env python
####################################################################
# Copyright (C) 2005-2014 OpenEye Scientific Software, Inc.
####################################################################
from __future__ import print_function

from openeye.oechem import *
from openeye.oeshape import *


def main(argv=[__name__]):
    if len(argv) != 3:
        OEThrow.Usage("%s <reffile> <fitfile>" % argv[0])

    reffs = oemolistream(argv[1])
    fitfs = oemolistream(argv[2])

    refmol = OEGraphMol()
    OEReadMolecule(reffs, refmol)

    ov = OEColorOverlap()
    ov.SetColorForceField(OEColorFFType_ImplicitMillsDean)
    ov.SetRefMol(refmol)

    res = OEColorResults()
    fitmol = OEGraphMol()
    while OEReadMolecule(fitfs, fitmol):
        ov.ColorScore(fitmol, res)
        print("title: %s  color score = %.2f" %
              (fitmol.GetTitle(), res.colorscore))

if __name__ == "__main__":
    import sys
    sys.exit(main(sys.argv))

This next example uses the ImplicitMillsDean force field to rescore a set of ROCS hits and add the color scores to SD tags.

Listing 4: Using color to add scores to pre-aligned molecules.

#!/usr/bin/env python
#############################################################################
#  Copyright (C) 2005-2015 OpenEye Scientific Software, Inc.
#############################################################################
from __future__ import print_function

from openeye.oechem import *
from openeye.oeshape import *


def main(argv=[__name__]):
    if len(argv) != 3:
        OEThrow.Usage("%s <reffile> <overlayfile>" % argv[0])

    reffs = oemolistream(argv[1])
    fitfs = oemolistream(argv[2])

    refmol = OEGraphMol()
    OEReadMolecule(reffs, refmol)

    ov = OEColorOverlap()
    ov.SetColorForceField(OEColorFFType_ImplicitMillsDean)
    ov.SetRefMol(refmol)

    print("Ref. Title: %s Self color: %.3f" %
          (refmol.GetTitle(), ov.GetSelfColor()))

    res = OEColorResults()
    for fitmol in fitfs.GetOEGraphMols():
        ov.ColorScore(fitmol, res)
        print("Fit Title: %s  Color Tanimoto: %.2f" %
              (fitmol.GetTitle(), res.GetTanimoto()))

if __name__ == "__main__":
    import sys
    sys.exit(main(sys.argv))

OEColorForceField

The color force field (CFF) can be used to measure chemical complementarity, and to refine shape-based superpositions based on chemical similarity. The CFF is composed of SMARTS rules that determine chemical centers plus rules to determine how such centers interact. In addition to the built-in color force fields, there are several methods for user-defined force fields.

User-defined interactions

As a step toward writing a complete color force field, it is possible to combine built-in rules for color atom assignment with user defined interactions. A new OEColorForceField object can be created using one of the built-in types, then the interactions can be cleared using OEColorForceField.ClearInteractions and subsequent user interactions added with OEColorForceField.AddInteraction.

For example, to use the ImplicitMillsDean atom typing rules, but to only consider donor-donor and acceptor-acceptor interactions, one can use the following:

Listing 5: Using ImplicitMillsDean with user interactions.

#!/usr/bin/env python
#############################################################################
#  Copyright (C) 2005-2014 OpenEye Scientific Software, Inc.
#############################################################################
from __future__ import print_function

from openeye.oechem import *
from openeye.oeshape import *


def main(argv=[__name__]):
    if len(argv) != 3:
        OEThrow.Usage("%s <reffile> <overlayfile>" % argv[0])

    reffs = oemolistream(argv[1])
    fitfs = oemolistream(argv[2])

    refmol = OEGraphMol()
    OEReadMolecule(reffs, refmol)

    cff = OEColorForceField()
    cff.Init(OEColorFFType_ImplicitMillsDean)
    cff.ClearInteractions()
    donorType = cff.GetType("donor")
    accepType = cff.GetType("acceptor")
    cff.AddInteraction(donorType, donorType, "gaussian", -1.0, 1.0)
    cff.AddInteraction(accepType, accepType, "gaussian", -1.0, 1.0)

    ov = OEColorOverlap()
    ov.SetColorForceField(cff)
    ov.SetRefMol(refmol)

    print("Ref. Title: %s Self color: %.3f" %
          (refmol.GetTitle(), ov.GetSelfColor()))

    res = OEColorResults()
    for fitmol in fitfs.GetOEGraphMols():
        ov.ColorScore(fitmol, res)
        print("Fit Title: %s  Color Tanimoto: %.2f" %
              (fitmol.GetTitle(), res.GetTanimoto()))

if __name__ == "__main__":
    import sys
    sys.exit(main(sys.argv))

Writing color force field files (CFF)

As an alternative to the built-in force fields, the user can define a new color force field using the format described in this section.

The following is a simplified example of a color force field specification.

DEFINE hetero [#7,#8,#15,#16]
DEFINE notNearHetero [!#1;!$($hetero);!$(*[$hetero])]
#
#
TYPE donor
TYPE acceptor
TYPE rings
TYPE positive
TYPE negative
TYPE structural
#
#
PATTERN donor [$hetero;H]
PATTERN acceptor [#8&!$(*~N~[OD1]),#7&H0;!$([D4]);!$([D3]-*=,:[$hetero])]
PATTERN rings [R]~1~[R]~[R]~[R]1
PATTERN rings [R]~1~[R]~[R]~[R]~[R]1
PATTERN rings [R]~1~[R]~[R]~[R]~[R]~[R]1
PATTERN rings [R]~1~[R]~[R]~[R]~[R]~[R]~[R]1
PATTERN positive [+,$([N;!$(*-*=O)])]
PATTERN negative [-]
PATTERN negative [OD1+0]-[!#7D3]~[OD1+0]
PATTERN negative [OD1+0]-[!#7D4](~[OD1+0])~[OD1+0]
PATTERN structural [$notNearHetero]
#
#
INTERACTION donor donor attractive gaussian weight=1.0 radius=1.0
INTERACTION acceptor acceptor attractive gaussian weight=1.0 radius=1.0
INTERACTION rings rings attractive gaussian weight=1.0 radius=1.0
INTERACTION positive positive attractive gaussian weight=1.0 radius=1.0
INTERACTION negative negative attractive gaussian weight=1.0 radius=1.0
INTERACTION structural structural attractive gaussian weight=1.0 radius=1.0

There are four basic keywords in a cff file: DEFINE, TYPE, PATTERN, and INTERACTION. The TYPE field can be any user-defined term such as “donor”, “acceptor”, “lipophilic anion” etc. The PATTERN keyword is used to associate SMARTS patterns with these types. There is no restriction on the number of patterns that can be associated with a user defined type. The position in Cartesian space of the PATTERN is taken as the average of the coordinates of the atoms that match the SMARTS pattern. If the desired location of the PATTERN is a single atom of a larger SMARTS pattern, recursive SMARTS (written as [$(SMARTS)]) can be used. Only the first atom in a recursive SMARTS pattern ‘matches’ the molecule, and the rest of the SMARTS pattern defines an environment. By writing a SMARTS pattern in recursive notation the location of the PATTERN will be taken as the atomic position of the first matching atom in the pattern. In order to simplify both reading and writing SMARTS, intermediate SMARTS can be associated with words using the DEFINE keyword. Once defined, these words can then be used as atom primitives in subsequent SMARTS patterns with the $ prefix (see “DEFINE hetero” and “PATTERN donor” above).

Interactions between types are defined with the INTERACTION keyword. Two user-defined TYPE values must be listed, and whether their interaction is attractive or repulsive. By default the interaction decays as a Gaussian of weight 1.0 Å and radius (decay to 1/e) 1.0 Å. The weight and radius can be modified by keywords WEIGHT and RADIUS. At present, the only alternative to a Gaussian decay is invoked by the DISCRETE keyword. A discrete interaction contributes all of WEIGHT if the inter-type distance is less than RADIUS, or zero. Since it is not differentiable it makes no contribution to optimization (i.e. because the gradient of a DISCRETE function is 0 or infinite).

Best Overlay

OEBestOverlay is the object used to maximize shape (and color) overlap between two objects. OEBestOverlay uses the same overlap function and color function as described earlier, but it optimizes the overlap between the reference and fit object. For speed, the default overlap settings for OEBestOverlay are to use the OEOverlapMethod_Grid overlap method (which implies the OEOverlapRadii_Carbon radius approximation) and hydrogens are ignored.

Starting positions for optimization

Since OEBestOverlay performs an optimization, the starting point of the optimization is important. By default, OEBestOverlay uses an inertial frame alignment method to decide on starting positions (OEBOOrientation_Inertial). The reference structure is aligned by its principal moments of inertia, then the fit object is aligned in 4 positions with the primary and secondary moments of inertia in both possible directions.

Thus, for any given optimization, there are 4 results returned, usually only one of which is useful. In order to deal with structures with symmetrical moments of inertia, OEBestOverlay may perform additional starting points. For a reference or fit where the 2 major moments of inertia are equal (to a user-defined percent, nominally 15%), 4 extra starting positions are generated. In the rare case of a molecule with all 3 moments of inertia essentially equal, 20 random starting translations and rotations are chosen as starting positions.

For virtual screening uses, where the reference and fit molecule are similar in size, OEBOOrientation_Inertial provides an excellent choice for starting positions that balancing quality results with speed. There are times when there is a large difference in size and a more elaborate search is needed. For these, there are a couple of built-in searches as well a user-defined search. To increase searching, instead of doing the inertial frame rotations with to 2 molecule centers-of-mass (COM) aligned, we can do a set of inertial rotations at many more locations. Using OEBOOrientation_InertialAtHeavyAtoms, OEBestOverlay will translate the COM of the fit molecule to each heavy atom and the COM of the reference molecule. At each of these translations, it will perform the 4 basic inertial transforms. Note that this means that instead of 4 (or 8) starting positions, there will be 4 x number of reference molecule heavy atoms starts, resulting in 20-30 more starting positions compared to the default. Obviously this will slow the overall calculation so this is not recommended for high-volume virtual screening, but is very useful for low-volume fragment searching. Using OEBOOrientation_Subrocs will automatically do OEBOOrientation_InertialAtHeavyAtoms on each heavy atom and the COM of the larger molecule, regardless of which molecule is set as the reference.

A slightly less aggressive set of starting positions (OEBOOrientation_InertialAtColorAtoms) translates the COM of the fit molecule to each color atom position and does the 4 inertial rotations. For an average reference molecule with ~10 color atoms, this will be faster than using all heavy atom positions, but not as elaborate of a search.

Finally, they user can set OEBOOrientation_UserInertialStarts and then provide a set of coordinates to OEBestOverlay.SetUserStarts. OEBestOverlay will then use each of these absolute coordinates as a place to translate the COM of the fit molecule and do the inertial transforms.

Alternatively, the user may desire to start the optimization from a single, pre-aligned position. For this case, the user can specify OEBOOrientation_AsIs} as a starting position.

And finally, there is a method to generate N random starting positions where N is user-defined as well as the maximum translation allowed between random starts.

In most cases, the default OEBOOrientation_Inertial frame starting positions are completely sufficient to find optimal overlap.

Note that the fit object is not moved during the optimization. Part of the results returned from the calculation are the transforms necessary to move the fit object into the final orientation. This allows the user to skip this step if only scores are desired. It also allows application of the same transform to other objects.

Ways to deal with OEBestOverlay results

Unlike OEOverlap and OEColorOverlap, OEBestOverlay uses multi-conformer molecules as the reference and fit (mostly for efficiency). As such, a single OEBestOverlay calculation can return numerous results. Basically, each calculation will return N results where N is the number of ref conformers times the number of fit conformers times the number of starting positions for each pair. So comparing 2 molecules with 10 conformers each could return 400 or more results.

There are two helper classes designed solely to contain these results and to make it easy to extract all or just the desired subset. OEBestOverlayResults holds the results of a single pair of conformers. It contains a set of OEBestOverlayScore objects, one for each starting position.

The first example, uses 2 iterators to show all the results.

Listing 6: Getting all the scores from OEBestOverlay.

#!/usr/bin/env python
#############################################################################
#  Copyright (C) 2005-2014 OpenEye Scientific Software, Inc.
##############################################################################
from __future__ import print_function

from openeye.oechem import *
from openeye.oeshape import *


def main(argv=[__name__]):
    if len(argv) != 3:
        OEThrow.Usage("%s <reffile> <fitfile>" % argv[0])

    reffs = oemolistream(argv[1])
    fitfs = oemolistream(argv[2])

    refmol = OEMol()
    OEReadMolecule(reffs, refmol)

    best = OEBestOverlay()
    best.SetRefMol(refmol)

    print("Ref. Title:", refmol.GetTitle(), "Num Confs:", refmol.NumConfs())

    for fitmol in fitfs.GetOEMols():
        print("Fit Title:", fitmol.GetTitle(), "Num Confs:", fitmol.NumConfs())

        resCount = 0
        for res in best.Overlay(fitmol):
            for score in res.GetScores():
                print("FitConfIdx: %-4d RefConfIdx: %-4d Tanimoto: %.2f" %
                      (score.fitconfidx, score.refconfidx, score.tanimoto))
                resCount += 1

        print(resCount, "results returned")

if __name__ == "__main__":
    import sys
    sys.exit(main(sys.argv))

But in most cases, one does not want or need all the results. Most times, the single best overlap for each conformer-conformer pair is desired. The next example shows the OESortOverlayScores function used to turn the double iterator as shown in the example above into a single iterator of OEBestOverlayScore. Note that the third argument is a functor used to sort the list, such that in this next example, we get one OEBestOverlayScore for each pair of conformers and they are returned in Tanimoto order.

Listing 7: Getting all the best scores from OEBestOverlay.

#!/usr/bin/env python
#############################################################################
#  Copyright (C) 2005-2014 OpenEye Scientific Software, Inc.
##############################################################################
from __future__ import print_function

from openeye.oechem import *
from openeye.oeshape import *


def main(argv=[__name__]):
    if len(argv) != 3:
        OEThrow.Usage("%s <reffile> <fitfile>" % argv[0])

    reffs = oemolistream(argv[1])
    fitfs = oemolistream(argv[2])

    refmol = OEMol()
    OEReadMolecule(reffs, refmol)

    best = OEBestOverlay()
    best.SetRefMol(refmol)

    print("Ref. Title:", refmol.GetTitle(), "Num Confs:", refmol.NumConfs())

    for fitmol in fitfs.GetOEMols():
        print("Fit Title:", fitmol.GetTitle(), "Num Confs:", fitmol.NumConfs())

        resCount = 0
        scoreiter = OEBestOverlayScoreIter()
        OESortOverlayScores(scoreiter, best.Overlay(fitmol),
                            OEHighestTanimoto())
        for score in scoreiter:
            print("FitConfIdx: %-4d RefConfIdx: %-4d Tanimoto: %.2f" %
                  (score.fitconfidx, score.refconfidx, score.tanimoto))
            resCount += 1

        print(resCount, "results returned")

if __name__ == "__main__":
    import sys
    sys.exit(main(sys.argv))

The next example is a slight variation on the previous. It adds a user parameter for the number of results to keep. Since the results are sorted by Tanimoto, keeping the first 5, keeps the best 5.

Listing 8: Keeping a few top scores from OEBestOverlay.

#!/usr/bin/env python
#############################################################################
#  Copyright (C) 2005-2014 OpenEye Scientific Software, Inc.
##############################################################################
from __future__ import print_function

from openeye.oechem import *
from openeye.oeshape import *


def main(argv=[__name__]):
    if len(argv) != 4:
        OEThrow.Usage("%s <reffile> <fitfile> <keepsize>" % argv[0])

    reffs = oemolistream(sys.argv[1])
    fitfs = oemolistream(sys.argv[2])
    keepsize = int(sys.argv[3])

    refmol = OEMol()
    OEReadMolecule(reffs, refmol)

    best = OEBestOverlay()
    best.SetRefMol(refmol)

    print("Ref. Title:", refmol.GetTitle(), "Num Confs:", refmol.NumConfs())

    for fitmol in fitfs.GetOEMols():
        print("Fit Title:", fitmol.GetTitle(), "Num Confs:", fitmol.NumConfs())

        resCount = 0
        scoreiter = OEBestOverlayScoreIter()
        OESortOverlayScores(scoreiter, best.Overlay(fitmol),
                            OEHighestTanimoto())
        for score in scoreiter:
            print("FitConfIdx: %-4d RefConfIdx: %-4d Tanimoto: %.2f" %
                  (score.fitconfidx, score.refconfidx, score.tanimoto))
            resCount += 1
            if resCount == keepsize:
                break

        print(resCount, "results returned")

if __name__ == "__main__":
    import sys
    sys.exit(main(sys.argv))

Saving aligned structures

OEBestOverlay does not actually move the fit molecule. Part of OEBestOverlayScore is the rotation matrix and translation matrix necessary to move the fit molecule into the final overlap position. (N.B. the rotation matrix is left-multiplied so can be regarded as left-handed.) It is up to the user to apply these transformations. The OpenEye standard is rotation then translation, but as a convenience, OEBestOverlayScore has OEBestOverlayScore.Transform method that will apply the transforms in the proper order. This next example expands from the previous. Now, not only can we choose the number of overlays to keep, we are also going to align each fit conformer to the ref conformer it was overlaid on, and then write the pair to an output file. If SD or OEB is used as the output file type, then scores will also be stored in SD tags.

Listing 9: Writing aligned structures from OEBestOverlay.

#!/usr/bin/env python
#############################################################################
#  Copyright (C) 2005-2014 OpenEye Scientific Software, Inc.
##############################################################################
from __future__ import print_function

from openeye.oechem import *
from openeye.oeshape import *


def main(argv=[__name__]):
    if len(argv) != 5:
        OEThrow.Usage("%s <reffile> <fitfile> <out.sdf> <keepsize>" % argv[0])

    reffs = oemolistream(argv[1])
    fitfs = oemolistream(argv[2])
    outfs = oemolostream(argv[3])
    keepsize = int(argv[4])

    refmol = OEMol()
    OEReadMolecule(reffs, refmol)

    best = OEBestOverlay()
    best.SetRefMol(refmol)

    print("Ref. Title:", refmol.GetTitle(), "Num Confs:", refmol.NumConfs())

    for fitmol in fitfs.GetOEMols():
        print("Fit Title:", fitmol.GetTitle(), "Num Confs:", fitmol.NumConfs())

        resCount = 0
        scoreiter = OEBestOverlayScoreIter()
        OESortOverlayScores(scoreiter, best.Overlay(fitmol),
                            OEHighestTanimoto())
        for score in scoreiter:
            outmol = OEGraphMol(fitmol.GetConf(OEHasConfIdx(score.fitconfidx)))
            score.Transform(outmol)

            OESetSDData(outmol, "RefConfIdx", "%-d" % score.refconfidx)
            OESetSDData(outmol, "ShapeTanimoto", "%-.3f" % score.tanimoto)

            OEWriteMolecule(
                outfs, refmol.GetConf(OEHasConfIdx(score.refconfidx)))
            OEWriteMolecule(outfs, outmol)

            resCount += 1
            if resCount == keepsize:
                break

        print(resCount, "results returned")

if __name__ == "__main__":
    import sys
    sys.exit(main(sys.argv))

Using color along with shape

In order to maximize chemical similarity along with shape, an OEColorForceField can be added to OEBestOverlay. Once added, color scores will be calculated along with shape overlap scores. By default, the color force field only appears as a post-optimization scoring function. That is, after the shape overlap is maximized, color scores are calculated.

Color force can, however, also be used as part of the optimization. Color optimization is turned on by the OEBestOverlay.SetColorOptimize member function.

Listing 10: Using color with OEBestOverlay.

#!/usr/bin/env python
#############################################################################
#  Copyright (C) 2005-2014 OpenEye Scientific Software, Inc.
#############################################################################
from __future__ import print_function

from openeye.oechem import *
from openeye.oeshape import *


def main(argv=[__name__]):
    if len(argv) != 5:
        OEThrow.Usage("%s <reffile> <fitfile> <out.sdf> <keepsize>" % argv[0])

    reffs = oemolistream(argv[1])
    fitfs = oemolistream(argv[2])
    outfs = oemolostream(argv[3])
    keepsize = int(argv[4])

    refmol = OEMol()
    OEReadMolecule(reffs, refmol)

    best = OEBestOverlay()
    best.SetRefMol(refmol)
    best.SetColorForceField(OEColorFFType_ImplicitMillsDean)
    best.SetColorOptimize(True)

    print("Ref. Title: %s Num Confs: %d Self color: %-.3f" %
          (refmol.GetTitle(), refmol.NumConfs(), best.GetRefSelfColor()))

    for fitmol in fitfs.GetOEMols():
        print("Fit Title:", fitmol.GetTitle(), "Num Confs:", fitmol.NumConfs())

        resCount = 0
        scoreiter = OEBestOverlayScoreIter()
        OESortOverlayScores(scoreiter, best.Overlay(fitmol),
                            OEHighestTanimotoCombo())
        for score in scoreiter:
            outmol = OEGraphMol(fitmol.GetConf(OEHasConfIdx(score.fitconfidx)))
            score.Transform(outmol)

            OESetSDData(outmol, "RefConfIdx", "%-d" % score.refconfidx)
            OESetSDData(outmol, "TanimotoCombo",
                        "%-.3f" % score.GetTanimotoCombo())
            OESetSDData(outmol, "ShapeTanimoto",
                        "%-.3f" % score.GetShapeTanimoto())
            OESetSDData(outmol, "ColorTanimoto",
                        "%-.3f" % score.GetColorTanimoto())

            OEWriteMolecule(outfs,
                            refmol.GetConf(OEHasConfIdx(score.refconfidx)))
            OEWriteMolecule(outfs, outmol)

            resCount += 1
            if resCount == keepsize:
                break

        print(resCount, "results returned")


if __name__ == "__main__":
    import sys
    sys.exit(main(sys.argv))

Using a shape query

An OEShapeQueryPublic can be used as the reference for OEBestOverlay. Shape queries written from OpenEye’s vROCS application can be read in using OEReadShapeQuery.

Listing 11: Using a shape query

#!/usr/bin/env python
#############################################################################
#  Copyright (C) 2005-2014 OpenEye Scientific Software, Inc.
#############################################################################
from __future__ import print_function

from openeye.oechem import *
from openeye.oeshape import *


def main(argv=[__name__]):
    if len(argv) != 5:
        OEThrow.Usage("%s <refquery> <fitfile> <out.sdf> <keepsize>" % argv[0])

    refquery = OEShapeQueryPublic()
    OEReadShapeQuery(argv[1], refquery)
    fitfs = oemolistream(argv[2])
    outfs = oemolostream(argv[3])
    keepsize = int(argv[4])

    best = OEBestOverlay()
    best.SetRef(refquery)
    best.SetColorForceField(OEColorFFType_ImplicitMillsDean)
    best.SetColorOptimize(True)

    print("Ref. Title: %s Self color: %-.3f" %
          (refquery.GetTitle(), best.GetRefSelfColor()))

    for fitmol in fitfs.GetOEMols():
        print("Fit Title:", fitmol.GetTitle(), "Num Confs:", fitmol.NumConfs())

        resCount = 0
        scoreiter = OEBestOverlayScoreIter()
        OESortOverlayScores(scoreiter, best.Overlay(fitmol),
                            OEHighestTanimotoCombo())
        for score in scoreiter:
            outmol = OEGraphMol(fitmol.GetConf(OEHasConfIdx(score.fitconfidx)))
            score.Transform(outmol)

            OESetSDData(outmol, "RefConfIdx", "%-d" % score.refconfidx)
            OESetSDData(outmol, "TanimotoCombo",
                        "%-.3f" % score.GetTanimotoCombo())
            OESetSDData(outmol, "ShapeTanimoto",
                        "%-.3f" % score.GetShapeTanimoto())
            OESetSDData(outmol, "ColorTanimoto",
                        "%-.3f" % score.GetColorTanimoto())

            OEWriteMolecule(outfs, outmol)

            resCount += 1
            if resCount == keepsize:
                break

        print(resCount, "results returned")

if __name__ == "__main__":
    import sys
    sys.exit(main(sys.argv))

Calculating NxN scores

There are times when you want to have all the pair-wise scores among a set of molecules. Maintaining this 2D matrix of scores is more complicated than the single set of scores from the ref vs. set of fit molecules examples shown above.

This example takes a set of molecules and performs all the pair-wise shape optimizations with OEBestOverlay. It outputs a spreadsheet of the scores.

Listing 12: Calculating NxN scores.

#!/usr/bin/env python
#############################################################################
#  Copyright (C) 2005-2014 OpenEye Scientific Software, Inc.
#############################################################################
from __future__ import print_function

from openeye.oechem import *
from openeye.oeshape import *


def oneConf(conf, filename, csvfile):
    csvfile.write("%s_%d" % (conf.GetTitle(), conf.GetIdx()))
    refmol = OEMol(conf)
    best = OEBestOverlay()
    best.SetRefMol(refmol)

    bfs = oemolistream(filename)
    fitmol = OEMol()
    while OEReadMolecule(bfs, fitmol):
        scoreiter = OEBestOverlayScoreIter()
        OESortOverlayScores(scoreiter, best.Overlay(fitmol),
                            OEHighestTanimoto(), 1, True)
        for score in scoreiter:
            csvfile.write(",%.2f" % score.tanimoto)
    csvfile.write("\n")


def genHeader(filename, csvfile):
    csvfile.write("name")
    ifs = oemolistream(filename)
    mol = OEMol()
    while OEReadMolecule(ifs, mol):
        for conf in mol.GetConfs():
            csvfile.write(",%s_%d" % (conf.GetTitle(), conf.GetIdx()))
    csvfile.write("\n")


def main(argv=[__name__]):
    if len(argv) != 3:
        OEThrow.Usage("%s <infile> <csvfile>" % argv[0])

    csvfile = open(argv[2], "w")
    genHeader(sys.argv[1], csvfile)

    afs = oemolistream(argv[1])
    for molA in afs.GetOEMols():
        print(molA.GetTitle())
        for conf in molA.GetConfs():
            oneConf(conf, sys.argv[1], csvfile)


if __name__ == "__main__":
    import sys
    sys.exit(main(sys.argv))

Calculating shape characteristics

While most functionality in the Shape Toolkit involves comparison of pairs of molecules, there are a few fundamental properties that can be calculated for a single molecule. All of these calculations are done using the same basic Gaussian description of a molecule as described above.

The simplest property to calculate is volume, using OECalcVolume.

In addition to simple volume calculations, the steric multipoles of a molecule can also be calculated. See section OECalcShapeMultipoles.

This next example demonstrates the calculation of volume and shape multipoles.

Listing 13: Calculating shape properties.

#!/usr/bin/env python
#############################################################################
#  Copyright (C) 2005-2014 OpenEye Scientific Software, Inc.
#############################################################################
from openeye.oechem import *
from openeye.oeshape import *


def main(argv=[__name__]):
    if len(argv) != 2:
        OEThrow.Usage("%s <infile>" % argv[0])

    ifs = oemolistream(argv[1])

    for mol in ifs.GetOEGraphMols():
        OEThrow.Info("              Title: %s" % mol.GetTitle())
        OEThrow.Info("             Volume: %8.2f" % OECalcVolume(mol))
        OEThrow.Info("Volume: (without H): %8.2f\n" % OECalcVolume(mol, False))

        smult = OECalcShapeMultipoles(mol)

        OEThrow.Info("  Steric multipoles:")
        OEThrow.Info("           monopole: %8.2f" % smult[0])
        OEThrow.Info("        quadrupoles: %8.2f %8.2f %8.2f" %
                     (smult[1], smult[2], smult[3]))
        OEThrow.Info("          octopoles:")
        OEThrow.Info("                xxx: %8.2f  yyy: %8.2f  zzz: %8.2f" %
                     (smult[4], smult[5], smult[6]))
        OEThrow.Info("                xxy: %8.2f  xxz: %8.2f  yyx: %8.2f" %
                     (smult[7], smult[8], smult[9]))
        OEThrow.Info("                yyz: %8.2f  zzx: %8.2f  zzy: %8.2f" %
                     (smult[10], smult[11], smult[12]))
        OEThrow.Info("                xyz: %8.2f\n" % smult[13])

if __name__ == "__main__":
    import sys
    sys.exit(main(sys.argv))

Calculating excluded volume

This code demonstrates how to use the shape toolkit to do an overlay to a crystallographic ligand, then calculate the overlap of the database molecule to the cognate protein. The output includes a new score Rescore which is the ShapeTanimotoCombo penalised by the fraction overlap.

Command Line Interface

A description of the command line interface can be obtained by executing the program with the –help argument.

prompt> python excludevolume.py --help

will generate the following output:

-q : Query file name
-e : Protein to use as exclusion volume
-d : Database file name
-o : Output file name

Code

Download code

excludevolume.py