Multi-conformer Molecules

OEMols

Up to this point in the manual, all of the examples have used concrete OEGraphMol molecules through the API of the OEMolBase abstract base-class. At this point we introduce another layer of abstraction in OEChem TK’s representation of molecules. OEChem TK draws a distinction between molecules that are limited to a single conformer and those that may have any number of conformers. While this may seem an arbitrary decision, it is a pragmatic one that allows a more efficient implementation of both classes. The single-conformer molecule’s API is defined by the already familiar OEMolBase abstract base-class. The multi-conformer molecule’s API is defined by another abstract base-class, OEMCMolBase (MC stands for Multi-Conformer). OEMCMolBase inherits publicly from OEMolBase, so a multi-conformer molecule supports the single-conformer API and adds functions to manage conformers. Both the single-conformer and the multi-conformer molecules contain atoms and bonds, but only the multi-conformer molecule contains conformers as first-class objects.

Note

OEMCMolBase conformer coordinates can be stored in OEHalfFloat (16-bit), float (32-bit), double (64-bit), or long double (>= 64-bit). The precision is determined by the constructor argument given to OEMol, taken from the OEMCMolType namespace. However, float is the default implementation as a space optimization and for historical purposes.

The OEGraphMol is a concrete class that supports the OEMolBase API and can be passed to functions which take an OEMolBase as an argument. The OEMol is a concrete class that supports the OEMCMolBase API in addition to the OEMolBase API. Therefore, an OEMol can be passed to any function which takes either an OEMolBase or an OEMCMolBase as an argument.

Since an OEMCMolBase is-a OEMolBase, all the same graph functions can be applied to an OEMCMolBase. This abstraction allows pure graph algorithms to be applied to multi-conformer molecules without the algorithms needing to know that they are operating on a different molecule representation. When an algorithm must behave differently for multiple conformers, the function can be overloaded to provide that behavior for OEMCMolBase.

For example, the behavior of the following is slightly different depending on whether the molecule being passed is an OEMolBase or an OEMCMolBase.

There are also functions that only make sense for multi-conformer molecules. In this case an overload for OEMolBase is simply not provided. This prevents users from inadvertently passing a single-conformer molecule when a multi-conformer molecule is required. One example is the OEOmega toolkit, which is designed to generate multi-conformer molecules and therefore requires an OEMCMolBase to receive the conformers it outputs.
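The dispatch rules described above are ordinary C++ overload resolution. The following is a minimal sketch using two hypothetical stand-in classes, MolBase and MCMolBase, which are not the OEChem TK classes. It shows one function written against the base class, one overloaded pair, and one function that accepts only the multi-conformer type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for OEMolBase and OEMCMolBase; not the OEChem TK classes.
class MolBase
{
public:
  explicit MolBase(std::size_t natoms) : natoms_(natoms) {}
  virtual ~MolBase() {}
  std::size_t NumAtoms() const { return natoms_; }
private:
  std::size_t natoms_;
};

class MCMolBase : public MolBase
{
public:
  MCMolBase(std::size_t natoms, std::size_t nconfs)
    : MolBase(natoms), nconfs_(nconfs) {}
  std::size_t NumConfs() const { return nconfs_; }
private:
  std::size_t nconfs_;
};

// Pure graph algorithm: written once against the base class, usable on both.
inline std::size_t CountAtoms(const MolBase &mol) { return mol.NumAtoms(); }

// Overloaded pair: the compiler picks the version matching the argument type.
inline std::string Describe(const MolBase &mol)
{
  return std::to_string(mol.NumAtoms()) + " atoms";
}
inline std::string Describe(const MCMolBase &mol)
{
  return std::to_string(mol.NumAtoms()) + " atoms, " +
         std::to_string(mol.NumConfs()) + " conformers";
}

// Multi-conformer only: there is no MolBase overload, so passing a plain
// MolBase is rejected at compile time.
inline std::size_t TotalCoordinates(const MCMolBase &mol)
{
  return 3 * mol.NumAtoms() * mol.NumConfs();
}
```

With a MolBase of three atoms and an MCMolBase of three atoms and two conformers, CountAtoms accepts either object, Describe selects a different overload for each, and TotalCoordinates compiles only for the multi-conformer object.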

Conformers

An OEMCMolBase contains one or more conformers. These conformers are managed in a manner very similar to atoms and bonds. Conformers can only be created or destroyed in the context of an OEMCMolBase, and must be accessed via member functions laid out in that API. Conformers are represented by the abstract base-class OEConfBase. The GetActive and SetActive methods are often sufficient for accessing conformations in multi-conformer molecules, but alternate access methods are also provided, e.g., GetConf and GetConfs.

It is sometimes convenient to be able to treat a conformer object as its own single-conformer molecule. For this reason, OEConfBase inherits from OEMolBase. Therefore, although a conformer is contained within a multi-conformer molecule, it can act as a single-conformer molecule, and can be passed to functions that have an OEMolBase argument.

One must be cautious when utilizing this OEMolBase inheritance functionality. Each multi-conformer molecule has only a single heavy-atom graph. For functions which query the graph portion of a molecule, a conformer will reflect the graph properties of its parent multi-conformer molecule. Graph properties include the connection table of atoms and bonds, as well as any properties stored by the atoms and bonds. A conformer is only independent of its parent for non-graph (e.g., conformational) properties. The logical extension of this principle is that changes made to the graph properties of one conformer will affect its parent multi-conformer molecule and thus all the other conformers in that molecule as well. The sharing of a common connection table prevents tautomers from being modeled together within a single OEMCMolBase.

See also

See Design Decisions for a complete inheritance graph explaining the relationship of OEChem TK molecules.
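The consequence of a shared graph can be illustrated with a toy model. The sketch below is not how OEChem TK is implemented. ToyMCMol, ToyConf, and MakeToyMol are hypothetical names chosen for illustration. Graph properties live once in the parent, and each conformer owns only its coordinates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model, not the OEChem TK implementation: one atom list
// shared by all conformers, plus one coordinate array per conformer.
struct ToyMCMol
{
  std::vector<unsigned int> atomicNums;         // graph property, shared
  std::vector<std::vector<float> > confCoords;  // 3 floats per atom, per conformer
};

// A conformer is a view onto its parent: graph access goes to the shared
// atoms, coordinate access goes to the conformer's own array.
class ToyConf
{
public:
  ToyConf(ToyMCMol &parent, std::size_t idx) : parent_(parent), idx_(idx)
  {
    assert(idx < parent.confCoords.size());
  }
  unsigned int GetAtomicNum(std::size_t atom) const { return parent_.atomicNums[atom]; }
  void SetAtomicNum(std::size_t atom, unsigned int z) { parent_.atomicNums[atom] = z; }
  float GetX(std::size_t atom) const { return parent_.confCoords[idx_][3 * atom]; }
  void SetX(std::size_t atom, float x) { parent_.confCoords[idx_][3 * atom] = x; }
private:
  ToyMCMol &parent_;
  std::size_t idx_;
};

// Build a one-atom molecule (carbon) with two conformers at the origin.
inline ToyMCMol MakeToyMol()
{
  ToyMCMol mol;
  mol.atomicNums.push_back(6u);
  mol.confCoords.assign(2, std::vector<float>(3, 0.0f));
  return mol;
}
```

In this model, moving the atom through one ToyConf leaves the other conformer's coordinates untouched, while changing the atomic number through one ToyConf is visible through every other conformer of the same parent.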

Coordinate information is stored by the conformer, not the shared atoms and bonds, which allows the conformers to share the same heavy-atom graph but have different spatial configurations. In OEChem TK, these conformers are represented by OEConfBases, which are first-class objects. The conformer is the only additional property presented by the OEMCMolBase. Access to conformers is similar to atoms and bonds, as shown by the following table.

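========================  ============  =============  =================================
Description               Return Type   Method         See Also
========================  ============  =============  =================================
Number of Conformers      unsigned int  NumConfs
Maximum Conformer Index   unsigned int  GetMaxConfIdx  Atom, Bond, and Conformer Indices
Access to a Conformer     OEConfBase    GetConf        Predicate Functors
Access to all Conformers  OEIterBase    GetConfs       Atom and Bond Traversal
Create a new Conformer    OEConfBase    NewConf        Atom and Bond Creation
Remove a Conformer        bool          DeleteConf
Rearrange Conformers      bool          OrderConfs
========================  ============  =============  =================================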

Conformer Iteration

Conformers of an OEMCMolBase can be iterated over the same way atoms and bonds of an OEMolBase are iterated over. GetConfs returns an iterator over the conformers of the molecule. This allows one to have multiple conformation objects at once and to treat the OEMCMolBase as a container of single-conformer molecules.

Listing 1 demonstrates the use of conformers as first-class objects. Each conformer is represented by an OEConfBase, which inherits from OEMolBase. Thus, each conformer can be treated as an independent molecule with respect to its coordinates, as shown in the example code below.

Listing 1: Retrieving the bounding box for a set of conformers

#include <openeye.h>
#include <algorithm>
#include <cfloat>
#include <iostream>
#include <vector>
#include <oechem.h>

using namespace std;
using namespace OESystem;
using namespace OEChem;

int main(int, char *argv[])
{
  oemolistream ims(argv[1]);

  OEMol mol;
  while (OEReadMolecule(ims, mol))
  {
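    // Lower corner starts at +FLT_MAX and upper corner at -FLT_MAX so that the
    // first conformer always replaces both. FLT_MIN is the smallest positive
    // float, not the most negative one, so it must not be used here.
    float box[6] = {FLT_MAX, FLT_MAX, FLT_MAX, -FLT_MAX, -FLT_MAX, -FLT_MAX};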
    for (OEIter<const OEConfBase> conf = mol.GetConfs(); conf; ++conf)
    {
      vector<float> ctr(3);
      vector<float> ext(3);
      OEGetCenterAndExtents(conf, &ctr[0], &ext[0]);
      box[0] = min(box[0], ctr[0] - ext[0]);
      box[1] = min(box[1], ctr[1] - ext[1]);
      box[2] = min(box[2], ctr[2] - ext[2]);
      box[3] = max(box[3], ctr[0] + ext[0]);
      box[4] = max(box[4], ctr[1] + ext[1]);
      box[5] = max(box[5], ctr[2] + ext[2]);
    }

    cout.precision(3);
    cout << "Bounding box for the conformers of " << mol.GetTitle() << endl;
    cout << fixed << "Lower Extent: " << box[0] << " " << box[1] << " " << box[2];
    cout << fixed << " Upper Extent: " << box[3] << " " << box[4] << " " << box[5];
    cout << endl;
  }

  return 0;
}

OEGetCenterAndExtents takes an OEMolBase as an argument, not an OEConfBase. However, it is still usable on each conformer since OEConfBase inherits from OEMolBase. Therefore, all the functions that were written for OEMolBase are automatically usable on conformers.
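The accumulation step of Listing 1 can be isolated from OEChem TK and checked on its own. The sketch below assumes, as Listing 1 does, that the extents are half-widths about the center. Box, EmptyBox, and Extend are hypothetical helper names, not part of OEChem TK. The point of interest is the starting value of the upper corner: FLT_MIN is the smallest positive float, so -FLT_MAX is used instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>

// Hypothetical helper type for illustration; not part of OEChem TK.
struct Box
{
  float lo[3];
  float hi[3];
};

// Start "inside out" so the first conformer always replaces both corners.
// The upper corner starts at -FLT_MAX: FLT_MIN is the smallest positive
// float and would give a wrong answer for all-negative coordinates.
inline Box EmptyBox()
{
  Box b = {{FLT_MAX, FLT_MAX, FLT_MAX}, {-FLT_MAX, -FLT_MAX, -FLT_MAX}};
  return b;
}

// Grow the box to include one conformer, given its center and half-extents.
inline void Extend(Box &b, const float ctr[3], const float ext[3])
{
  for (int i = 0; i < 3; ++i)
  {
    assert(ext[i] >= 0.0f);
    b.lo[i] = std::min(b.lo[i], ctr[i] - ext[i]);
    b.hi[i] = std::max(b.hi[i], ctr[i] + ext[i]);
  }
}
```

Calling Extend once per conformer on a box created by EmptyBox yields the union of the per-conformer boxes, including for molecules that lie entirely in negative coordinate space.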

Note

OEChem TK makes the same guarantees on conformers of a multi-conformer molecule as it does for atoms and bonds of a molecule. That is, conformer objects and their indices are stable across any other method call, with the exception of SweepConfs.

Conformer Creation

The most common method to create conformers in a molecule is by reading a molecule from a file (see section Input and Output). However, when manipulating molecules it is often necessary to create conformers on-the-fly. In OEChem TK, this is done with the NewConf method. There are numerous overloads of NewConf. All of the overloads create conformers with the capacity to store coordinates for the current number of atoms in the molecule. NewAtom adjusts this capacity as necessary. The default OEMCMolBase constructor puts the molecule in a state with a single empty conformer (as does the Clear method).

See also

Parallel Data Structures for a discussion of how indices are used to index into coordinate arrays.

Listing 2 demonstrates how to generate a multi-conformer water molecule from scratch. It then measures the hydrogen-oxygen-hydrogen angle of each of the two conformers. Remember, the OEChem TK definition of a conformer is very loose: any set of Cartesian coordinates constitutes another conformer. In this case the arrangement of the atoms relative to each other is the same; only the placement of the molecule in space has changed. No effort is made to prevent the user from creating duplicate conformers.

Listing 2: Creating conformers from explicit sets of coordinates

#include <openeye.h>
#include <iostream>
#include <oechem.h>

using namespace std;
using namespace OESystem;
using namespace OEChem;
using namespace OEMath;

int main()
{
  OEMol mol;
  OEAtomBase *o  = mol.NewAtom(OEElemNo::O);
  OEAtomBase *h1 = mol.NewAtom(OEElemNo::H);
  OEAtomBase *h2 = mol.NewAtom(OEElemNo::H);

  const float Acrds[] = { 0.0f   , 0.0f   , 0.0f,
                    0.9584f, 0.0f   , 0.0f,
                   -0.2392f, 0.9281f, 0.0f};
  // Grab the default conformer
  OEConfBase *Acnf = mol.GetConf(OEIsTrue<OEConfBase>());
  Acnf->SetCoords(Acrds);

  const float Bcrds[] = { 0.0f   , 0.0f   , 0.0f,
                    0.9584f, 0.0f   , 0.0f,
                   -0.2392f,-0.9281f, 0.0f};
  const OEConfBase *Bcnf = mol.NewConf(Bcrds);

  cout << "1st Water Angle: " << OEGetAngle(*Acnf, h1, o, h2) * Rad2Deg << endl;
  cout << "2nd Water Angle: " << OEGetAngle(*Bcnf, h1, o, h2) * Rad2Deg << endl;

  return 0;
}

Warning

An OEMCMolBase is constructed with one conformer already present. Special care must be taken in Listing 2 to insert the coordinates of the first conformer into this conformer.
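The two coordinate sets in Listing 2 differ only in the sign of one y value, so they should describe the same internal geometry. This can be checked with plain vector math, independent of OEChem TK. The sketch below does not use OEGetAngle. AngleDegrees and WaterAngleDegrees are hypothetical helpers written for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Angle a-b-c in degrees from three Cartesian points (b is the vertex).
// Plain vector math for illustration; OEChem TK's OEGetAngle is not used here.
inline double AngleDegrees(const float a[3], const float b[3], const float c[3])
{
  double dot = 0.0, na = 0.0, nc = 0.0;
  for (int i = 0; i < 3; ++i)
  {
    const double u = a[i] - b[i];
    const double v = c[i] - b[i];
    dot += u * v;
    na += u * u;
    nc += v * v;
  }
  assert(na > 0.0 && nc > 0.0);  // the vertex must not coincide with a or c
  // Clamp to guard acos against rounding slightly outside [-1, 1].
  const double cosine = std::max(-1.0, std::min(1.0, dot / std::sqrt(na * nc)));
  const double radToDeg = 180.0 / std::acos(-1.0);
  return std::acos(cosine) * radToDeg;
}

// H1-O-H2 angle for a water stored as O, H1, H2 in a flat array of 9 floats,
// the same layout as the coordinate arrays in Listing 2.
inline double WaterAngleDegrees(const float crds[9])
{
  return AngleDegrees(crds + 3, crds, crds + 6);
}
```

Applied to the two coordinate arrays of Listing 2, both calls return the same value, roughly 104.5 degrees, which is consistent with the two conformers being mirror images of each other.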

There is also a version of NewConf which takes an OEMolBase and copies the coordinates of the passed molecule into the new conformer. NewConf expects the molecule passed to have the same graph as the OEMCMolBase which is the parent of the new conformer. It is important to note that this version of NewConf can take any instance of an OEMolBase, such as an OEGraphMol or an OEMol. When an OEMol is passed to NewConf, the coordinates of the newly created conformer will come from the first conformation of the molecule passed.

Listing 3 demonstrates how to use NewConf with another conformer as the argument. The purpose of this code is to copy only the conformers below an arbitrary energy cutoff from the src molecule to the dst molecule.

Listing 3: Filtering conformers based upon energy cutoff

#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>

using namespace OESystem;
using namespace OEChem;

int main(int, char *argv[])
{
  oemolistream ifs(argv[1]);
  oemolostream ofs(argv[2]);

  OEMol src;
  while (OEReadMolecule(ifs, src))
  {
    OEMol dst(src.SCMol());
    dst.DeleteConfs();

    for (OEIter<const OEConfBase> conf = src.GetConfs(); conf; ++conf)
    {
      if (conf->GetEnergy() < 25.0f)
        dst.NewConf(conf);
    }

    if (dst.NumConfs() > 0u)
      OEWriteMolecule(ofs, dst);
  }

  return 0;
}

The dst molecule is copy constructed from the single-conformer representation (the first conformer) of the src molecule. This is to grab the molecule graph information and any other non-conformational information. Then DeleteConfs is called to get rid of the first conformer. It may be copied back in the conformer loop later in the code if it meets the energy criteria.

Warning

An OEMCMolBase with no conformers can be dangerous to pass to other OEChem TK functions, so the number of conformers is checked before calling OEWriteMolecule.

Input and Output

Molecule streams can read both single and multi-conformer molecules from any file format. Many of the file formats supported by OEChem TK are inherently single-conformer formats (SDF and MOL2, for example). However, a common practice is to store multiple conformers in these files. OEChem TK supports a rather advanced mechanism for recovering these separate conformers into a single, multi-conformer OEMCMolBase. Note that this does not apply to file formats where conformers are stored together, OEBinary (.oeb), for example. OEBinary files store either single-conformer or multi-conformer molecules explicitly, so the file itself determines how to deal with conformers. Additionally, file formats that have no notion of conformers (e.g., SMILES) are unaffected by this feature.

The oemolistream::SetConfTest method sets a functor that is used to compare the graphs of incoming molecules in order to determine whether to combine them. These functors are subclasses of OEConfTestBase. Several predefined versions include:

OEDefaultConfTest

The OEDefaultConfTest never combines connection tables into multi-conformer molecules.

OEIsomericConfTest

The OEIsomericConfTest combines subsequent connection tables into a multi-conformer molecule if they:

  1. Have the same title (optional)

  2. Have the same numbers of atoms and bonds in the same order

  3. Each atom and bond must have identical properties with its order correspondent in the subsequent connection table

  4. Have the same atom and bond stereochemistry

No changes are made to the connection table.

The constructor for OEIsomericConfTest has a default argument for whether or not to compare titles. If the constructor is called with no arguments or with the argument true, the titles will be required to be the same. Otherwise, the titles will not be compared. In the latter instance, each conformer will have the individual title of its original connection table and the multi-conformer molecule will reflect the title of the active conformer.

OEOmegaConfTest

The OEOmegaConfTest is almost exactly the same as OEIsomericConfTest with the exception that differences in invertible nitrogens are disregarded. Therefore, invertible nitrogens with different stereochemistry are considered conformers of the same molecule.

This definition is meant to be exactly in line with what Omega will generate. Therefore, OEOmegaConfTest can be used on non-OEB files generated with Omega to recover the exact set of conformers Omega generated for a particular molecule.

OEAbsoluteConfTest

The OEAbsoluteConfTest combines subsequent connection tables into a multi-conformer molecule if they:

  1. Have the same title (optional)

  2. Have the same number of atoms and bonds in the same order

  3. Each atom and bond must have identical properties with its order correspondent in the subsequent connection table

This conformer test sets all fully specified isomeric values to UNDEFINED.

The constructor for OEAbsoluteConfTest has a default argument for whether or not to compare titles. If the constructor is called with no arguments or with the argument true, the titles will be required to be the same. Otherwise, the titles will not be compared. In the latter instance, each conformer will have the individual title of its original connection table and the multi-conformer molecule will reflect the title of the active conformer.

OEAbsCanonicalConfTest

The OEAbsCanonicalConfTest combines subsequent connection tables into a multi-conformer molecule if they:

  1. Have the same absolute (non-isomeric) graph

This conformer test puts all of the molecules in their canonical atom order. In addition, all fully specified isomeric values are set to UNDEFINED.

Listing 4: Reading in multi-conformer molecules from single-conformer files

#include <openeye.h>
#include <iostream>
#include <oechem.h>

using namespace OEChem;

int main(int, char *argv[])
{
  OEMol mol;
  oemolistream ifs(argv[1]);
  ifs.SetConfTest(OEIsomericConfTest());

  while (OEReadMolecule(ifs, mol))
  {
    std::cout << mol.GetTitle() << " has "
              << mol.NumConfs() << " conformers" << std::endl;
  }

  return 0;
}

Listing 4 will read multi-conformer molecules from an input file based on OEIsomericConfTest. This assumes that the conformers are ordered next to each other in the input file.

Note

The OEIsomericConfTest constructor can be passed false to allow conformers to be combined when they have different titles. This is very useful when dealing with files created by programs that modify molecule titles to indicate conformer number (e.g., acetsali_1, acetsali_2, acetsali_3). The modified titles are accessible through the GetTitle method on each individual OEConfBase.
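The requirement that conformers sit next to each other in the input file, and the effect of the title option, can be illustrated with a toy grouping routine. This is a sketch of the idea only and does not reproduce how OEChem TK conformer tests are implemented. Record, Groups, and GroupConsecutive are hypothetical names, and graphKey is an assumed stand-in for a full comparison of atoms, bonds, and their properties.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one connection table read from a file. graphKey
// stands in for "same atoms and bonds, in the same order, with the same
// properties"; a real conformer test compares the graphs themselves.
struct Record
{
  std::string title;
  std::string graphKey;
};

// Each inner vector holds the indices of the records merged into one
// multi-conformer molecule.
typedef std::vector<std::vector<std::size_t> > Groups;

// Merge a record into the current group only if it matches the record
// immediately before it. Matching records that are not adjacent in the
// input are never merged.
inline Groups GroupConsecutive(const std::vector<Record> &recs, bool compareTitles)
{
  Groups groups;
  for (std::size_t i = 0; i < recs.size(); ++i)
  {
    bool same = i > 0 && recs[i].graphKey == recs[i - 1].graphKey;
    if (same && compareTitles)
      same = recs[i].title == recs[i - 1].title;
    if (!same)
      groups.push_back(std::vector<std::size_t>());
    groups.back().push_back(i);
  }
  return groups;
}
```

For records titled acetsali_1, acetsali_2, other, acetsali_3, where the three acetsali records share a graph, comparing titles gives four separate groups. Ignoring titles merges the first two records, but the last one still starts a new group because a different molecule sits between it and its siblings.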

Dude, where’s my SD data?

SD tag data can be added to anything that derives from OEMolBase, including OEMCMolBase or OEConfBase. Generally, OEChem TK will never lose any data when reading or writing. However, there are constraints placed on OEChem TK as to where the SD data must go based upon the file format being used.

An ambiguity occurs when adding SD tag data to an OEMCMolBase and then writing it to SDF. SDF files do not support multiple conformers. However, OEChem TK can automatically read consecutive conformers out of an SDF file into an OEMCMolBase. To preserve the SD data, OEChem TK has no choice but to push the data onto the conformers.

OEB files do not have this restriction upon them because they do support multi-conformer molecules. The following table shows how to round-trip SD tag data through the SDF and OEB formats.

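==========================  ==========  ===========  ========================
Attached To (before write)  Written To  Read Into    Attached To (after read)
==========================  ==========  ===========  ========================
OEMCMolBase                 SDF         OEMCMolBase  OEConfBase
OEMCMolBase                 SDF         OEMolBase    OEMolBase
OEMCMolBase                 OEB         OEMCMolBase  OEMCMolBase
OEMCMolBase                 OEB         OEMolBase    OEMolBase
OEConfBase                  SDF         OEMCMolBase  OEConfBase
OEConfBase                  SDF         OEMolBase    OEMolBase
OEConfBase                  OEB         OEMCMolBase  OEConfBase
OEConfBase                  OEB         OEMolBase    OEMolBase
OEMolBase                   SDF         OEMCMolBase  OEConfBase
OEMolBase                   SDF         OEMolBase    OEMolBase
OEMolBase                   OEB         OEMCMolBase  OEConfBase
OEMolBase                   OEB         OEMolBase    OEMolBase
==========================  ==========  ===========  ========================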

Note

The OEFormat::CSV format behaves identically to the OEFormat::SDF format in how SD data is handled.

Practically, it is best to never attach SD tag data to an OEMCMolBase. This should only be done as a space optimization when it is assured that the multi-conformer molecule will only be written to OEB.

To this end, when an OEMol copy constructs from an OEGraphMol the SD tag data is attached to the first conformer.
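The round-trip behavior for SDF and OEB can be condensed into a three-case rule. The function below is one reading of this section expressed as code, not an OEChem TK API. The enums Holder, Format, and ReadInto and the function AfterRoundTrip are hypothetical names introduced for illustration, and the first case assumes that data read into a single-conformer molecule is held by that molecule.

```cpp
#include <cassert>

// Hypothetical enums for illustration; these are not OEChem TK types.
enum Holder { OnMolBase, OnMCMolBase, OnConfBase };
enum Format { SDF, OEB };
enum ReadInto { SingleConformer, MultiConformer };

// Where SD data ends up after one write/read round trip.
inline Holder AfterRoundTrip(Holder before, Format fmt, ReadInto target)
{
  // Assumed: a single-conformer molecule has only one place to keep the data.
  if (target == SingleConformer)
    return OnMolBase;
  // Only OEB can keep data on the multi-conformer molecule itself.
  if (before == OnMCMolBase && fmt == OEB)
    return OnMCMolBase;
  // Everything else read into a multi-conformer molecule lands on a conformer.
  return OnConfBase;
}
```

Read this way, the only route that keeps data on the multi-conformer molecule is writing it to OEB and reading it back into a multi-conformer molecule, which matches the advice to attach SD data to an OEMCMolBase only when the output is known to be OEB.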