Calculating simple overlap

The simplest object in the Shape toolkit, OEOverlap, is used to calculate simple, static, overlap between two objects (molecules or grids). Note that static means that the two input species (ref and fit) are not moved at all. This object simply calculate the overlap given the input positions. Performing calculations that actually optimize the alignment/overlap are done with the OEBestOverlay object (see Theory chapter).

This first example reads in a reference molecule and a few fit molecules and prints out the overlap calculated. Note that this example uses all default settings for OEOverlap (discussed in the following sections).

Listing 1: Simple overlap using Exact Overlap

/*******************************************************************************
 * Copyright 2004-2013 OpenEye Scientific Software, Inc.
 ******************************************************************************/
package openeye.examples.oeshape;

import openeye.oechem.*;
import openeye.oeshape.*;

public class SimpleOverlap {

    public static void main(String[] args) {
        oemolistream reffs = new oemolistream("a.mol2");
        oemolistream fitfs = new oemolistream("rocs_hits_1.sdf");

        OEGraphMol refmol = new OEGraphMol();
        oechem.OEReadMolecule(reffs, refmol);
        reffs.close();

        OEOverlap ov = new OEOverlap();
        ov.SetRefMol(refmol);

        OEOverlapResults res = new OEOverlapResults();
        OEGraphMol fitmol = new OEGraphMol();
        while (oechem.OEReadMolecule(fitfs, fitmol)) {
            System.out.print(fitmol.GetTitle());
            ov.Overlap(fitmol, res);
            System.out.println(" exact tanimoto = "+res.getTanimoto());
        }
        res.delete();
        ov.delete();
        fitfs.close();
    }
}

Overlap Methods

This nature of the algorithm used to calculate overlap is determined by the OEOverlap.SetMethod member of OEOverlap. At present there are four types of algorithms implemented: Exact, Analytic, Analytic2, and Grid. These are defined in the OEOverlapMethod namespace.

The Exact option is as detailed above: all pairs are considered and the Gaussian overlaps are calculated exactly. The two Analytic options use an approximation to the overlap between two Gaussians that is accurate to about one part in a thousand. In addition, Analytic2 uses a proximity grid to only calculate those atoms pairs that are within a certain threshold distance (by default 4.5 Å). This approximation adds another one part in a thousand average error but is faster for larger molecules. The final option, Grid, uses a grid representation of the volume of the target molecule. It requires significant set-up time relative to the cost of a single overlap calculation (~0.01s compared to ~0.0001s) but is significantly faster than other methods for the evaluation of each overlap once set. Grid suffers a few caveats and drawbacks. First is that, currently, although reference radii are all treated as given, fit atoms are all treated as if they have one radius (that assigned to carbon, and settable via OEOverlap.SetCarbonRadius). The second is that the approximation is slightly worse, typically a few parts in a thousand, at typical grid resolutions. Both Analytic2 and Grid improve performance when the fit molecule is large (>20 atoms) because, if there are n atoms in the fit and m in the reference, the work per atom in the fit is proportional to a constant not n.

By default, OEOverlap performs an Exact overlap calculation. The OEBestOverlay object (used in ROCS) uses Grid for speed.

Radii approximations

Since we are considering molecular volume overlap as a measure, the radii used for each atom is important. There are essentially two settings. A radii approximation of OEOverlapRadii.All means each atom will be treated with the radius as passed in. Alternatively, one can treat all heavy atoms as similar and apply the OEOverlapRadii.Carbon radius approximation. With this, all heavy atoms will be assigned the same radius, regardless of the value attached to each atom. As noted above, if the OEOverlapMethod.Grid method is chosen, the OEOverlapRadii.Carbon radius approximation is also turned on.

Using hydrogens

Shape is a heavy atom property and most OpenEye uses (as well as the defaults for OEOverlap and OEBestOverlay) ignore hydrogens. Hydrogens can be present or not, if OEOverlap is told to not use them, they will not affect the overlap scores.

Adding scores to molecules using SD data

This next example program switches on the OEOverlapMethod.Analytic overlap method and also uses a little extra OEChem to attach the overlap scores to each molecule as SD data. This can be used to rescore a set of ROCS hits or to measure the overlap of ROCS alignments against an exclusion region in the binding site.

Listing 2: Rescoring pre-aligned structures

/*******************************************************************************
 * Copyright 2004-2013 OpenEye Scientific Software, Inc.
 ******************************************************************************/
package openeye.examples.oeshape;

import openeye.oechem.*;
import openeye.oeshape.*;

public class Rescore {

    public static void main(String[] args) {
        if (args.length!=3) {
            System.out.println("Rescore <reffile> <rocs_hits> <output.sdf>");
            System.exit(0);
        }

        oemolistream reffs = new oemolistream(args[0]);
        oemolistream fitfs = new oemolistream(args[1]);
        oemolostream outfs = new oemolostream(args[2]);

        OEGraphMol refmol = new OEGraphMol();
        oechem.OEReadMolecule(reffs, refmol);
        reffs.close();

        OEOverlap ov = new OEOverlap();
        ov.SetMethod(OEOverlapMethod.Analytic);
        ov.SetRefMol(refmol);

        OEOverlapResults res = new OEOverlapResults();
        OEGraphMol fitmol = new OEGraphMol();
        while (oechem.OEReadMolecule(fitfs, fitmol)) {
            ov.Overlap(fitmol, res);
            oechem.OESetSDData(fitmol, "AnalyticTanimoto", 
                    String.valueOf(res.getTanimoto()));
            oechem.OEWriteMolecule(outfs, fitmol);
        }
        fitfs.close();
        outfs.close();
        ov.delete();
        res.delete();
    }
}

Shape from Hermite expansion

Using Hermite expansion we can obtain two functionalities: expand a molecule into Hermite representation and compare two molecules that are in the Hermite representation by computing their shape-Tanimoto coefficient. We illustrate both functionalities below with two examples.

Hermite expansion

In this example we input a molecule, set the level of Hermite expansion using NPolyMax variable, determine the optimal parameters for \(\lambda_x, \lambda_y, \lambda_z\), and perform the computation of the Hermite expansion. At the end we output the resulting Hermite representation of the molecule as a grid, using CreateGrid method of the OEHermite class. The user can vary NPolyMax from low numbers, like 4 or 5, to higher numbers like 10-30 and observe convergence of the shape. Which value to use in practice depends on the size of the input molecule. Input molecule can be drug-like as well as a protein.

Hermite Tanimoto comparison of shape

In the second example below we setup two molecules and compute the Tanimoto-shape similarity coefficient, using Hermite expansion. The Hermite prep parameter NPolyMax_MAX is used to vary NPolyMax parameter and compute the shape-Tanimoto each time for the first molecule in the reference input file versus all molecules in the fit input file. All results are attached as SD data in the output file. One can observe convergence of the Hermite Tanimoto coefficients as the level of expansion becomes more accurate.

Color Overlap

Overlap of color atoms

As a way of comparing chemical functionality, the Shape toolkit also provides a color overlap function, OEColorOverlap. The underlying algorithm is similar to pure shape, but instead of overlapping all heavy atoms, in this case we overlap similarly tagged color atoms. The decision of what becomes a color atom is done using SMARTS pattern matching. And the radius and weight of color interactions is completely user-definable.

Using built-in color force fields

Two color force fields, ImplicitMillsDean and ExplicitMillsDean, are built into the Shape toolkit. (The OEColorForceField class is defined in the next section.) Both of these force fields define similar 6 TYPE color force-fields. The types are hydrogen-bond donors, hydrogen-bond acceptors, hydrophobes, anions, cations, and rings. The ImplicitMillsDean force field is recommended.

ImplicitMillsDean includes a simple pKa model that assumes pH=7. It defines cations, anions, donors and acceptors in such a way that they will be assigned the appropriate value independent of the protonation state in the reference or fit molecule. For example, if a molecule contains a carboxylate, ImplicitMillsDean will consider it an anionic center independent of whether it is protonated or deprotonated. This is convenient for searching databases which have not had careful curation of their protonation states. The ExplicitMillsDean file has a similar overall interaction model, however, it does not include a pKa model. It interprets the protonation and charge state of each molecule exactly. Thus, if a sulfate is protonated and neutral, it will not be considered an anion.

The hydrogen-bond models in both ImplicitMillsDean and ExplicitMillsDean are extensions of the original model presented by Mills and Dean [MillsDean-1996]. They both have donors and acceptors segregated into strong, moderate and weak categories.

Listing 3: Calculating color score

/*******************************************************************************
 * Copyright 2004-2013 OpenEye Scientific Software, Inc.
 ******************************************************************************/
package openeye.examples.oeshape;

import openeye.oechem.*;
import openeye.oeshape.*;

public class ColorOverlap {

    public static void main(String[] args) {
        oemolistream reffs = new oemolistream("a.mol2");
        oemolistream fitfs = new oemolistream("rocs_hits_1.sdf");

        OEGraphMol refmol = new OEGraphMol();
        oechem.OEReadMolecule(reffs, refmol);
        reffs.close();

        OEColorOverlap ov = new OEColorOverlap();
        ov.SetColorForceField(OEColorFFType.ImplicitMillsDean);
        ov.SetRefMol(refmol);

        OEColorResults res = new OEColorResults();
        OEGraphMol fitmol = new OEGraphMol();
        while (oechem.OEReadMolecule(fitfs, fitmol)) {
            System.out.print(fitmol.GetTitle());
            ov.ColorScore(fitmol, res);
            System.out.println(" color score = "+res.getColorscore());
        }
        fitfs.close();
        res.delete();
        ov.delete();
    }
}

This next example uses the ImplicitMillsDean force field to rescore a set of ROCS hits and add the color scores to SD tags.

Listing 4: Using color to add scores to pre-aligned molecules.

/*******************************************************************************
 * Copyright 2004-2013 OpenEye Scientific Software, Inc.
 ******************************************************************************/
package openeye.examples.oeshape;

import openeye.oechem.*;
import openeye.oeshape.*;

public class ColorScore {

    public static void main(String[] args) {
        if (args.length!=3) {
            System.out.println("ColorScore <reffile> <rocs_hits> <output.sdf>");
            System.exit(0);
        }

        oemolistream reffs = new oemolistream(args[0]);
        oemolistream fitfs = new oemolistream(args[1]);
        oemolostream outfs = new oemolostream(args[2]);

        OEGraphMol refmol = new OEGraphMol();
        oechem.OEReadMolecule(reffs, refmol);
        reffs.close();

        OEColorOverlap ov = new OEColorOverlap();
        ov.SetColorForceField(OEColorFFType.ImplicitMillsDean);
        ov.SetRefMol(refmol);

        System.out.print("Ref. Title: "+refmol.GetTitle()+" ");
        System.out.println("Self color: "+ov.GetSelfColor());

        OEColorResults res = new OEColorResults();
        OEGraphMol fitmol = new OEGraphMol();
        while (oechem.OEReadMolecule(fitfs, fitmol)) {
            ov.ColorScore(fitmol, res);
            oechem.OESetSDData(fitmol, "MyColorScore", 
                    String.valueOf(res.getColorscore()));
            oechem.OESetSDData(fitmol, "MyScaledColor", 
                    String.valueOf(res.getScaledcolor()));
            oechem.OEWriteMolecule(outfs, fitmol);

            System.out.print("Fit. Title: "+fitmol.GetTitle()+" ");
            System.out.print("Color Score: "+res.getColorscore()+" ");
            System.out.println("Scaled color: "+res.getScaledcolor());
        } 
        ov.delete();
        res.delete();
        fitfs.close();
        outfs.close();
    }
}

OEColorForceField

The color force field (CFF) can be used to measure chemical complementarity, and to refine shape-based superpositions based on chemical similarity. The CFF is composed of SMARTS rules that determine chemical centers plus rules to determine how such centers interact. In addition to the built-in color force fields, there are several methods for user-defined force fields.

User-defined interactions

As a step toward writing a complete color force field, it is possible to combine built-in rules for color atom assignment with user defined interactions. A new OEColorForceField object can be created using one of the built-in types, then the interactions can be cleared using OEColorForceField.ClearInteractions and subsequent user interactions added with OEColorForceField.AddInteraction.

For example, to use the ImplicitMillsDean atom typing rules, but to only consider donor-donor and acceptor-acceptor interactions, one can use the following:

Listing 5: Using ImplicitMillsDean with user interactions.

/*******************************************************************************
 * Copyright 2004-2013 OpenEye Scientific Software, Inc.
 ******************************************************************************/
package openeye.examples.oeshape;

import openeye.oechem.*;
import openeye.oeshape.*;

public class UserColor {

    public static void main(String[] args) {
        if (args.length!=3) {
            System.out.println("UserColor <reffile> <rocs_hits> <output.sdf>");
            System.exit(0);
        }

        oemolistream reffs = new oemolistream(args[0]);
        oemolistream fitfs = new oemolistream(args[1]);
        oemolostream outfs = new oemolostream(args[2]);

        OEGraphMol refmol = new OEGraphMol();
        oechem.OEReadMolecule(reffs, refmol);
        reffs.close();

        OEColorForceField cff = new OEColorForceField();
        cff.Init(OEColorFFType.ImplicitMillsDean);
        cff.ClearInteractions();
        int donorType = cff.GetType("donor");
        int accepType = cff.GetType("acceptor");
        cff.AddInteraction(donorType, donorType, "gaussian", -1.0f, 1.0f);
        cff.AddInteraction(accepType, accepType, "gaussian", -1.0f, 1.0f);

        OEColorOverlap ov = new OEColorOverlap();
        ov.SetColorForceField(cff);
        ov.SetRefMol(refmol);

        System.out.print("Ref. Title: "+refmol.GetTitle()+" ");
        System.out.println("Self color: "+ov.GetSelfColor());

        OEColorResults res = new OEColorResults();
        OEGraphMol fitmol = new OEGraphMol();
        while (oechem.OEReadMolecule(fitfs, fitmol)) {
            ov.ColorScore(fitmol, res);
            oechem.OESetSDData(fitmol, "MyColorScore", 
                    String.valueOf(res.getColorscore()));
            oechem.OESetSDData(fitmol, "MyScaledColor", 
                    String.valueOf(res.getScaledcolor()));
            oechem.OEWriteMolecule(outfs, fitmol);

            System.out.print("Fit. Title: "+fitmol.GetTitle()+" ");
            System.out.print("Color Score: "+res.getColorscore()+" ");
            System.out.println("Scaled color: "+res.getScaledcolor());
        } 
        fitfs.delete();
        outfs.delete();
        ov.delete();
        res.delete();
    }
}

Writing color force field files (CFF)

As an alternative to the built-in force fields, the user can define a new color force field using the format described in this section.

The following is a simplified example of a color force field specification.

DEFINE hetero [#7,#8,#15,#16]
DEFINE notNearHetero [!#1;!$($hetero);!$(*[$hetero])]
#
#
TYPE donor
TYPE acceptor
TYPE rings
TYPE positive
TYPE negative
TYPE structural
#
#
PATTERN donor [$hetero;H]
PATTERN acceptor [#8&!$(*~N~[OD1]),#7&H0;!$([D4]);!$([D3]-*=,:[$hetero])]
PATTERN rings [R]~1~[R]~[R]~[R]1
PATTERN rings [R]~1~[R]~[R]~[R]~[R]1
PATTERN rings [R]~1~[R]~[R]~[R]~[R]~[R]1
PATTERN rings [R]~1~[R]~[R]~[R]~[R]~[R]~[R]1
PATTERN positive [+,$([N;!$(*-*=O)])]
PATTERN negative [-]
PATTERN negative [OD1+0]-[!#7D3]~[OD1+0]
PATTERN negative [OD1+0]-[!#7D4](~[OD1+0])~[OD1+0]
PATTERN structural [$notNearHetero]
#
#
INTERACTION donor donor attractive gaussian weight=1.0 radius=1.0
INTERACTION acceptor acceptor attractive gaussian weight=1.0 radius=1.0
INTERACTION rings rings attractive gaussian weight=1.0 radius=1.0
INTERACTION positive positive attractive gaussian weight=1.0 radius=1.0
INTERACTION negative negative attractive gaussian weight=1.0 radius=1.0
INTERACTION structural structural attractive gaussian weight=1.0 radius=1.0

There are four basic keywords in a cff file: DEFINE, TYPE, PATTERN, and INTERACTION. The TYPE field can be any user-defined term such as “donor”, “acceptor”, “lipophilic anion” etc. The PATTERN keyword is used to associate SMARTS patterns with these types. There is no restriction on the number of patterns that can be associated with a user defined type. The position in Cartesian space of the PATTERN is taken as the average of the coordinates of the atoms that match the SMARTS pattern. If the desired location of the PATTERN is a single atom of a larger SMARTS pattern, recursive SMARTS (written as [$(SMARTS)]) can be used. Only the first atom in a recursive SMARTS pattern ‘matches’ the molecule, and the rest of the SMARTS pattern defines an environment. By writing a SMARTS pattern in recursive notation the location of the PATTERN will be taken as the atomic position of the first matching atom in the pattern. In order to simplify both reading and writing SMARTS, intermediate SMARTS can be associated with words using the DEFINE keyword. Once defined, these words can then be used as atom primitives in subsequent SMARTS patterns with the $ prefix (see “DEFINE hetero” and “PATTERN donor” above).

Interactions between types are defined with the INTERACTION keyword. Two user-defined TYPE values must be listed, and whether their interaction is attractive or repulsive. By default the interaction decays as a Gaussian of weight 1.0 Å and radius (decay to 1/e) 1.0 Å. The weight and radius can be modified by keywords WEIGHT and RADIUS. At present, the only alternative to a Gaussian decay is invoked by the DISCRETE keyword. A discrete interaction contributes all of WEIGHT if the inter-type distance is less than RADIUS, or zero. Since it is not differentiable it makes no contribution to optimization (i.e. because the gradient of a DISCRETE function is 0 or infinite).

Best Overlay

OEBestOverlay is the object used to maximize shape (and color) overlap between two objects. OEBestOverlay uses the same overlap function and color function as described earlier, but it optimizes the overlap between the reference and fit object. For speed, the default overlap settings for OEBestOverlay are to use the OEOverlapMethod.Grid overlap method (which implies the OEOverlapRadii.Carbon radius approximation) and hydrogens are ignored.

Starting positions for optimization

Since OEBestOverlay performs an optimization, the starting point of the optimization is important. By default, OEBestOverlay uses an inertial frame alignment method to decide on starting positions (OEBOOrientation.Inertial). The reference structure is aligned by its principal moments of inertia, then the fit object is aligned in 4 positions with the primary and secondary moments of inertia in both possible directions.

Thus, for any given optimization, there are 4 results returned, usually only one of which is useful. In order to deal with structures with symmetrical moments of inertia, OEBestOverlay may perform additional starting points. For a reference or fit where the 2 major moments of inertia are equal (to a user-defined percent, nominally 15%), 4 extra starting positions are generated. In the rare case of a molecule with all 3 moments of inertia essentially equal, 20 random starting translations and rotations are chosen as starting positions.

For virtual screening uses, where the reference and fit molecule are similar in size, OEBOOrientation.Inertial provides an excellent choice for starting positions that balancing quality results with speed. There are times when there is a large difference in size and a more elaborate search is needed. For these, there are a couple of built-in searches as well a user-defined search. To increase searching, instead of doing the inertial frame rotations with to 2 molecule centers-of-mass (COM) aligned, we can do a set of inertial rotations at many more locations. Using OEBOOrientation.InertialAtHeavyAtoms, OEBestOverlay will translate the COM of the fit molecule to each heavy atom and the COM of the reference molecule. At each of these translations, it will perform the 4 basic inertial transforms. Note that this means that instead of 4 (or 8) starting positions, there will be 4 x number of reference molecule heavy atoms starts, resulting in 20-30 more starting positions compared to the default. Obviously this will slow the overall calculation so this is not recommended for high-volume virtual screening, but is very useful for low-volume fragment searching. Using OEBOOrientation.Subrocs will automatically do OEBOOrientation.InertialAtHeavyAtoms on each heavy atom and the COM of the larger molecule, regardless of which molecule is set as the reference.

A slightly less aggressive set of starting positions (OEBOOrientation.InertialAtColorAtoms) translates the COM of the fit molecule to each color atom position and does the 4 inertial rotations. For an average reference molecule with ~10 color atoms, this will be faster than using all heavy atom positions, but not as elaborate of a search.

Finally, they user can set OEBOOrientation.UserInertialStarts and then provide a set of coordinates to OEBestOverlay.SetUserStarts. OEBestOverlay will then use each of these absolute coordinates as a place to translate the COM of the fit molecule and do the inertial transforms.

Alternatively, the user may desire to start the optimization from a single, pre-aligned position. For this case, the user can specify OEBOOrientation.AsIs} as a starting position.

And finally, there is a method to generate N random starting positions where N is user-defined as well as the maximum translation allowed between random starts.

In most cases, the default OEBOOrientation.Inertial frame starting positions are completely sufficient to find optimal overlap.

Note that the fit object is not moved during the optimization. Part of the results returned from the calculation are the transforms necessary to move the fit object into the final orientation. This allows the user to skip this step if only scores are desired. It also allows application of the same transform to other objects.

Ways to deal with OEBestOverlay results

Unlike OEOverlap and OEColorOverlap, OEBestOverlay uses multi-conformer molecules as the reference and fit (mostly for efficiency). As such, a single OEBestOverlay calculation can return numerous results. Basically, each calculation will return N results where N is the number of ref conformers times the number of fit conformers times the number of starting positions for each pair. So comparing 2 molecules with 10 conformers each could return 400 or more results.

There are two helper classes designed solely to contain these results and to make it easy to extract all or just the desired subset. OEBestOverlayResults holds the results of a single pair of conformers. It contains a set of OEBestOverlayScore objects, one for each starting position.

The first example, uses 2 iterators to show all the results.

Listing 6: Getting all the scores from OEBestOverlay.

/******************************************************************************
 * Copyright 2004-2013 OpenEye Scientific Software, Inc.
 *****************************************************************************/
package openeye.examples.oeshape;

import openeye.oechem.*;
import openeye.oeshape.*;

public class BestOverlay1 {

    public static void main(String[] args) {
        if (args.length!=2) {
            System.out.println("BestOverlay1 <reffile> <fitfile>");
            System.exit(0);
        }

        oemolistream reffs = new oemolistream(args[0]);
        oemolistream fitfs = new oemolistream(args[1]);

        OEMol refmol = new OEMol();
        oechem.OEReadMolecule(reffs, refmol);
        reffs.close();

        OEBestOverlay best = new OEBestOverlay();
        best.SetRefMol(refmol);

        System.out.print("Ref. title: "+refmol.GetTitle()+" ");
        System.out.println("Num Confs: "+refmol.NumConfs());

        int resCount=0;

        OEMol fitmol = new OEMol();
        while (oechem.OEReadMolecule(fitfs, fitmol)) {
            System.out.print("Fit. title: "+fitmol.GetTitle()+" ");
            System.out.println("Num Confs: "+fitmol.NumConfs());
            OEBestOverlayResultsIter resiter = best.Overlay(fitmol);
            for (OEBestOverlayResults res : best.Overlay(fitmol)) {
                for (OEBestOverlayScore score : res.GetScores()) {
                    System.out.print("FitConfIdx: "+score.getFitconfidx()+" ");
                    System.out.print("RefConfIdx: "+score.getRefconfidx()+" ");
                    System.out.println("Tanimoto: "+score.getTanimoto());
                    ++resCount;
                }
            }
        }
        fitfs.close();
        best.delete();
        System.out.println(resCount+" results returned.");  
    }
}

But in most cases, one does not want or need all the results. Most times, the single best overlap for each conformer-conformer pair is desired. The next example shows the OESortOverlayScores function used to turn the double iterator as shown in the example above into a single iterator of OEBestOverlayScore. Note that the third argument is a functor used to sort the list, such that in this next example, we get one OEBestOverlayScore for each pair of conformers and they are returned in Tanimoto order.

Listing 7: Getting all the best scores from OEBestOverlay.

/******************************************************************************
 * Copyright 2004-2013 OpenEye Scientific Software, Inc.
 *****************************************************************************/
package openeye.examples.oeshape;

import openeye.oechem.*;
import openeye.oeshape.*;

public class BestOverlay2 {

    public static void main(String[] args) {
        if (args.length!=2) {
            System.out.println("BestOverlay2 <reffile> <fitfile>");
            System.exit(0);
        }

        oemolistream reffs = new oemolistream(args[0]);
        oemolistream fitfs = new oemolistream(args[1]);

        OEMol refmol = new OEMol();
        oechem.OEReadMolecule(reffs, refmol);
        reffs.delete();
      
        OEBestOverlay best = new OEBestOverlay();
        best.SetRefMol(refmol);

        System.out.print("Ref. title: "+refmol.GetTitle()+" ");
        System.out.println("Num Confs: "+refmol.NumConfs());

        int resCount=0;

        OEMol fitmol = new OEMol();
        while (oechem.OEReadMolecule(fitfs, fitmol)) {
            System.out.print("Fit. title: "+fitmol.GetTitle()+" ");
            System.out.println("Num Confs: "+fitmol.NumConfs());
            OEBestOverlayResultsIter resiter = best.Overlay(fitmol);
            OEBestOverlayScoreIter scoreiter = new OEBestOverlayScoreIter();
            oeshape.OESortOverlayScores(scoreiter, resiter, new OEHighestTanimoto());
            for (OEBestOverlayScore score : scoreiter) {
                System.out.print("FitConfIdx: "+score.getFitconfidx()+" ");
                System.out.print("RefConfIdx: "+score.getRefconfidx()+" ");
                System.out.println("Tanimoto: "+score.getTanimoto());
                ++resCount;
            }
        }
        fitfs.close();
        best.delete();
        System.out.println(resCount+" results returned.");  
    }
}

The next example is a slight variation on the previous. It adds a user parameter for the number of results to keep. Since the results are sorted by Tanimoto, keeping the first 5, keeps the best 5.

Listing 8: Keeping a few top scores from OEBestOverlay.

/*****************************************************************************
 * Copyright 2004-2013 OpenEye Scientific Software, Inc.
 ****************************************************************************/
package openeye.examples.oeshape;

import openeye.oechem.*;
import openeye.oeshape.*;

public class BestOverlay3 {

    public static void main(String[] args) {
        if (args.length!=3) {
            System.out.println("BestOverlay3 <reffile> <fitfile> <keepsize>");
            System.exit(0);
        }

        oemolistream reffs = new oemolistream(args[0]);
        oemolistream fitfs = new oemolistream(args[1]);
        int keepsize = Integer.parseInt(args[2]);

        OEMol refmol = new OEMol();
        oechem.OEReadMolecule(reffs, refmol);
        reffs.close();

        OEBestOverlay best = new OEBestOverlay();
        best.SetRefMol(refmol);

        System.out.print("Ref. title: "+refmol.GetTitle()+" ");
        System.out.println("Num Confs: "+refmol.NumConfs());

        int resCount=0;

        OEMol fitmol = new OEMol();
        while (oechem.OEReadMolecule(fitfs, fitmol)) {
            System.out.print("Fit. title: "+fitmol.GetTitle()+" ");
            System.out.println("Num Confs: "+fitmol.NumConfs());
            OEBestOverlayResultsIter resiter = best.Overlay(fitmol);
            OEBestOverlayScoreIter scoreiter = new OEBestOverlayScoreIter();
            oeshape.OESortOverlayScores(scoreiter, resiter, new OEHighestTanimoto());
            for (OEBestOverlayScore score : scoreiter) {
                System.out.print("FitConfIdx: "+score.getFitconfidx()+" ");
                System.out.print("RefConfIdx: "+score.getRefconfidx()+" ");
                System.out.println("Tanimoto: "+score.getTanimoto());
                ++resCount;
                if (resCount==keepsize)
                    break;
            }
        }
        fitfs.close();
        best.delete(); 
        System.out.println(resCount+" results returned.");  
    }
}

Saving aligned structures

OEBestOverlay does not actually move the fit molecule. Part of OEBestOverlayScore is the rotation matrix and translation matrix necessary to move the fit molecule into the final overlap position. (N.B. the rotation matrix is left-multiplied so can be regarded as left-handed.) It is up to the user to apply these transformations. The OpenEye standard is rotation then translation, but as a convenience, OEBestOverlayScore has OEBestOverlayScore.Transform method that will apply the transforms in the proper order. This next example expands from the previous. Now, not only can we choose the number of overlays to keep, we are also going to align each fit conformer to the ref conformer it was overlaid on, and then write the pair to an output file. If SD or OEB is used as the output file type, then scores will also be stored in SD tags.

Listing 9: Writing aligned structures from OEBestOverlay.

/*****************************************************************************
 * Copyright 2004-2013 OpenEye Scientific Software, Inc.
 ****************************************************************************/
package openeye.examples.oeshape;

import openeye.oechem.*;
import openeye.oeshape.*;

public class BestOverlay4 {

    public static void main(String[] args) {
        if (args.length!=4) {
            System.out.println("BestOverlay4 <reffile> <fitfile> <out.sdf> <keepsize>");
            System.exit(0);
        }

        oemolistream reffs = new oemolistream(args[0]);
        oemolistream fitfs = new oemolistream(args[1]);
        oemolostream outfs = new oemolostream(args[2]);
        int keepsize = Integer.parseInt(args[3]);

        OEMol refmol = new OEMol();
        oechem.OEReadMolecule(reffs, refmol);
        reffs.close();

        OEBestOverlay best = new OEBestOverlay();
        best.SetRefMol(refmol);

        System.out.print("Ref. title: "+refmol.GetTitle()+" ");
        System.out.println("Num Confs: "+refmol.NumConfs());

        int resCount=0;

        OEMol fitmol = new OEMol();
        while (oechem.OEReadMolecule(fitfs, fitmol)) {
            System.out.print("Fit. title: "+fitmol.GetTitle()+" ");
            System.out.println("Num Confs: "+fitmol.NumConfs());
            OEBestOverlayResultsIter resiter = best.Overlay(fitmol);
            OEBestOverlayScoreIter scoreiter = new OEBestOverlayScoreIter();
            oeshape.OESortOverlayScores(scoreiter, resiter, new OEHighestTanimoto());
            for (OEBestOverlayScore score : scoreiter) {
                OEGraphMol outmol = 
                        new OEGraphMol(fitmol.GetConf(new OEHasConfIdx(score.getFitconfidx())));
                score.Transform(outmol);

                oechem.OESetSDData(outmol, "RefConfIdx", String.valueOf(score.getRefconfidx()));
                oechem.OESetSDData(outmol, "Tanimoto", String.valueOf(score.getTanimoto()));

                oechem.OEWriteConstMolecule(outfs,
                        refmol.GetConf(new OEHasConfIdx(score.getRefconfidx())));
                oechem.OEWriteConstMolecule(outfs, outmol);

                ++resCount;
                if (resCount==keepsize)
                    break;
            }
        }
        fitfs.close();
        outfs.close();
        best.delete();
        System.out.println(resCount+" results returned.");  
    }
}

Using color along with shape

In order to maximize chemical similarity along with shape, an OEColorForceField can be added to OEBestOverlay. Once added, color scores will be calculated along with shape overlap scores. By default, the color force field only appears as a post-optimization scoring function. That is, after the shape overlap is maximized, color scores are calculated.

Color force can, however, also be used as part of the optimization. Color optimization is turned on by the OEBestOverlay.SetColorOptimize member function.

Listing 10: Using color with OEBestOverlay.

/*****************************************************************************
 * Copyright 2004-2013 OpenEye Scientific Software, Inc.
 ****************************************************************************/
package openeye.examples.oeshape;

import openeye.oechem.*;
import openeye.oeshape.*;

public class BestOverlayColor {

    public static void main(String[] args) {
        if (args.length!=4) {
            System.out.println("BestOverlayColor <reffile> <fitfile> <out.sdf> <keepsize>");
            System.exit(0);
        }

        oemolistream reffs = new oemolistream(args[0]);
        oemolistream fitfs = new oemolistream(args[1]);
        oemolostream outfs = new oemolostream(args[2]);
        int keepsize = Integer.parseInt(args[3]);

        OEMol refmol = new OEMol();
        oechem.OEReadMolecule(reffs, refmol);
        reffs.close();

        OEBestOverlay best = new OEBestOverlay();
        best.SetRefMol(refmol);
        best.SetColorForceField(OEColorFFType.ImplicitMillsDean);
        best.SetColorOptimize(true);

        System.out.print("Ref. title: "+refmol.GetTitle()+" ");
        System.out.print("Num Confs: "+refmol.NumConfs()+" ");
        System.out.println("Self color: "+best.GetRefSelfColor());

        OEMol fitmol = new OEMol();
        while (oechem.OEReadMolecule(fitfs, fitmol)) {
            System.out.print("Fit. title: "+fitmol.GetTitle()+" ");
            System.out.println("Num Confs: "+fitmol.NumConfs());

            int resCount=0;
            OEBestOverlayResultsIter resiter = best.Overlay(fitmol);
            OEBestOverlayScoreIter scoreiter = new OEBestOverlayScoreIter();
            oeshape.OESortOverlayScores(scoreiter, resiter, new OEHighestTanimotoCombo());
            for (OEBestOverlayScore score : scoreiter) {
                OEGraphMol outmol = 
                        new OEGraphMol(fitmol.GetConf(new OEHasConfIdx(score.getFitconfidx())));
                score.Transform(outmol);

                oechem.OESetSDData(outmol, "RefConfIdx", 
                        String.valueOf(score.getRefconfidx()));
                oechem.OESetSDData(outmol, "TanimotoCombo", 
                        String.valueOf(score.GetTanimotoCombo()));
                oechem.OESetSDData(outmol, "Tanimoto", 
                        String.valueOf(score.getTanimoto()));
                oechem.OESetSDData(outmol, "ColorTanimoto", 
                        String.valueOf(score.GetColorTanimoto()));

                oechem.OEWriteConstMolecule(outfs,
                        refmol.GetConf(new OEHasConfIdx(score.getRefconfidx())));
                oechem.OEWriteConstMolecule(outfs, outmol);

                ++resCount;
                if (resCount==keepsize)
                    break;
            }
            System.out.println(resCount+" results returned."); 
        } 
        best.delete();
        fitfs.close();
        outfs.close();
    }
}

Using a shape query

An OEShapeQueryPublic can be used as the reference for OEBestOverlay. Shape queries written from OpenEye’s vROCS application can be read in using OEReadShapeQuery.

Listing 11: Using a shape query

/*****************************************************************************
 * Copyright 2004-2013 OpenEye Scientific Software, Inc.
 ****************************************************************************/
package openeye.examples.oeshape;

import openeye.oechem.*;
import openeye.oeshape.*;

public class BestOverlayShapeQuery {

    public static void main(String[] args) {
        if (args.length!=4) {
            System.out.println("BestOverlayColor <refquery> <fitfile> <out.sdf> <keepsize>");
            System.exit(0);
        }

        OEShapeQueryPublic refquery = new OEShapeQueryPublic();
        oeshape.OEReadShapeQuery(args[0], refquery);
        oemolistream fitfs = new oemolistream(args[1]);
        oemolostream outfs = new oemolostream(args[2]);
        int keepsize = Integer.parseInt(args[3]);

        OEBestOverlay best = new OEBestOverlay();
        best.SetRef(refquery);
        best.SetColorForceField(OEColorFFType.ImplicitMillsDean);
        best.SetColorOptimize(true);

        System.out.print("Ref. title: "+refquery.GetTitle()+" ");
        System.out.println("Self color: "+best.GetRefSelfColor());
        refquery.delete();

        OEMol fitmol = new OEMol();
        while (oechem.OEReadMolecule(fitfs, fitmol)) {
            System.out.print("Fit. title: "+fitmol.GetTitle()+" ");
            System.out.println("Num Confs: "+fitmol.NumConfs());

            int resCount=0;
            OEBestOverlayResultsIter resiter = best.Overlay(fitmol);
            OEBestOverlayScoreIter scoreiter = new OEBestOverlayScoreIter();
            oeshape.OESortOverlayScores(scoreiter, resiter, new OEHighestTanimotoCombo());
            for (OEBestOverlayScore score : scoreiter) {
                OEGraphMol outmol = 
                        new OEGraphMol(fitmol.GetConf(new OEHasConfIdx(score.getFitconfidx())));
                score.Transform(outmol);

                oechem.OESetSDData(outmol, "RefConfIdx", 
                        String.valueOf(score.getRefconfidx()));
                oechem.OESetSDData(outmol, "TanimotoCombo", 
                        String.valueOf(score.GetTanimotoCombo()));
                oechem.OESetSDData(outmol, "Tanimoto", 
                        String.valueOf(score.getTanimoto()));
                oechem.OESetSDData(outmol, "ColorTanimoto", 
                        String.valueOf(score.GetColorTanimoto()));

                oechem.OEWriteConstMolecule(outfs, outmol);

                ++resCount;
                if (resCount==keepsize)
                    break;
            }
            System.out.println(resCount+" results returned.");
        }  
        fitfs.close();
        outfs.close();
        best.delete();
    }
}

Calculating NxN scores

There are times when you want to have all the pair-wise scores among a set of molecules. Maintaining this 2D matrix of scores is more complicated than the single set of scores from the ref vs. set of fit molecules examples shown above.

This example takes a set of molecules and performs all the pair-wise shape optimizations with OEBestOverlay. It outputs a spreadsheet of the scores.

Listing 12: Calculating NxN scores.

/*****************************************************************************
 * Copyright 2004-2013 OpenEye Scientific Software, Inc.
 ****************************************************************************/
package openeye.examples.oeshape;

import openeye.oechem.*;
import openeye.oeshape.*;
import java.io.*;
import java.text.DecimalFormat;

public class NxNShape {
    static { 
        oechem.OEThrow.SetStrict(true);
    }

    private static FileOutputStream csv;
    private static PrintWriter csvfile;
    private static DecimalFormat df = new DecimalFormat("#.##");

    private String roundOff(float score) {
        return df.format(score);
    }

    private void oneConf(OEConfBase conf, String filename) {
        csvfile.print(conf.GetTitle() + "_" + conf.GetIdx());
        OEMol refmol = new OEMol(conf);
        OEBestOverlay best = new OEBestOverlay();
        best.SetRefMol(refmol);
        oemolistream bfs = new oemolistream(filename);

        OEMol fitmol = new OEMol();
        while (oechem.OEReadMolecule(bfs, fitmol)) {
            OEBestOverlayResultsIter resiter = best.Overlay(fitmol);
            OEBestOverlayScoreIter scoreiter = new OEBestOverlayScoreIter();
            oeshape.OESortOverlayScores(scoreiter, resiter, new OEHighestTanimoto(), 1, true);
            for (OEBestOverlayScore score : scoreiter) {
                csvfile.print("," + roundOff(score.getTanimoto()));
            }
        }
        best.delete();
        bfs.close();
        csvfile.println();
    }

    private void genHeader(String filename) {
        csvfile.print("name");
        oemolistream ifs = new oemolistream(filename);
        OEMol mol = new OEMol();
        while (oechem.OEReadMolecule(ifs, mol)) {
            System.out.println(mol.GetTitle());
            for (OEConfBase conf : mol.GetConfs()) {
                csvfile.print("," + conf.GetTitle() + "_" + conf.GetIdx());
            }
        }
        ifs.close();
        csvfile.println();
    }

    private void openCSV(String csvfilename) {
        try {
            csv = new FileOutputStream(csvfilename);
            csvfile = new PrintWriter(csv);
        } catch (FileNotFoundException e) {
            System.err.println("Unable to open output file " + csvfilename);
        }
    }

    public void nxn(String filename, String csvfilename) {
        openCSV(csvfilename);
        genHeader(filename);
        oemolistream afs = new oemolistream(filename);
        OEMol molA = new OEMol();
        while (oechem.OEReadMolecule(afs, molA)) {
            for (OEConfBase conf : molA.GetConfs()) {
                oneConf(conf, filename);
            }
        }
        csvfile.close();
        afs.close();
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("NxNShape <infile> <csvfile>");
            System.exit(0);
        }
        NxNShape app = new NxNShape();
        app.nxn(args[0], args[1]);
    }
}

Calculating shape characteristics

While most functionality in the Shape Toolkit involves comparison of pairs of molecules, there are a few fundamental properties that can be calculated for a single molecule. All of these calculations are done using the same basic Gaussian description of a molecule as described above.

The simplest property to calculate is volume, using OECalcVolume.

In addition to simple volume calculations, the steric multipoles of a molecule can also be calculated. See section OECalcShapeMultipoles.

This next example demonstrates the calculation of volume and shape multipoles.

Listing 13: Calculating shape properties.

/*******************************************************************************
 * Copyright 2004-2013 OpenEye Scientific Software, Inc.
 ******************************************************************************/
package openeye.examples.oeshape;

import openeye.oechem.*;
import openeye.oeshape.*;

public class ShapeProps {

    public static void main(String[] args) {
        if (args.length!=1) {
            System.out.println("ShapeProps <infile>");
            System.exit(0);
        }

        oemolistream ifs = new oemolistream(args[0]);

        OEGraphMol mol = new OEGraphMol();

        while (oechem.OEReadMolecule(ifs,mol)) {
            System.out.println("              Title: "+mol.GetTitle());
            System.out.println("             Volume: "+oeshape.OECalcVolume(mol));
            System.out.println("Volume: (without H): "+
                    oeshape.OECalcVolume(mol, false));

            float [] smult = new float[14];
            oeshape.OECalcShapeMultipoles(mol, smult);

            System.out.println("  Steric multipoles:");
            System.out.println("           monopole: "+smult[0]);
            System.out.println("        quadrupoles: "+ 
                    smult[1]+" "+smult[2]+" "+smult[3]);
            System.out.println("          octopoles:");
            System.out.println("                xxx: "+smult[4]+
                    " yyy: "+smult[5]+" zzz: "+smult[6]);
            System.out.println("                xxy: "+smult[7]+
                    " xxz: "+smult[8]+" yyx: "+smult[9]);
            System.out.println("                yyz: "+smult[10]+
                    " zzx: "+smult[11]+" zzy: "+smult[12]);
            System.out.println("                xyz: "+smult[13]);

            System.out.println("");
        }
        ifs.close();
    }
}

Calculating excluded volume

This code demonstrates how to use the shape toolkit to do an overlay to a crystallographic ligand, then calculate the overlap of the database molecule to the cognate protein. The output includes a new score Rescore which is the ShapeTanimotoCombo penalised by the fraction overlap.

Command Line Interface

A description of the command line interface can be obtained by executing the program with the –help argument.

prompt> java ExcludeVolume --help

will generate the following output:

-q : Query file name
-e : Protein to use as exclusion volume
-d : Database file name
-o : Output file name

Code

Download code

ExcludeVolume.java and ExcludeVolume.txt interface file