Multi-conformer Molecules¶
OEMols¶
Up to this point in the manual, all of the examples have involved using concrete OEGraphMol molecules. These molecules have been utilizing the functionality defined in the API of the OEMolBase abstract base-class. At this point we will introduce another layer of abstraction in OEChem TK’s representation of molecules. In OEChem TK, we draw a distinction between molecules that are limited to a single conformer and those that may have any number of conformers. While this may be an arbitrary decision, it is a pragmatic one which allows more efficient implementation of both classes. The single-conformer molecule’s API is defined by the already familiar OEMolBase abstract base-class. The multi-conformer molecule’s API is defined by another abstract base-class, the OEMCMolBase (here the MC stands for Multi-Conformer). The OEMCMolBase class inherits publicly from OEMolBase, thus the multi-conformer molecule supports the single-conformer API but adds additional functions to manage conformers. Both the single-conformer and the multi-conformer molecules contain atoms and bonds, but only the multi-conformer molecule contains conformers as first-class objects.
Note
OEMCMolBase conformer coordinates can be stored in OEHalfFloat (16-bit), float (32-bit), double (64-bit), or long double (>= 64-bit). The precision is determined by the constructor argument given to OEMol, taken from the OEMCMolType namespace. However, float is the default implementation as a space optimization and for historical purposes.
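The space trade-off behind the float default can be illustrated without OEChem TK. The sketch below uses only Python's standard struct module. It assumes, purely for illustration, that the 16-bit type behaves like an IEEE 754 half-precision value, which the note above does not state. The helper names coord_bytes and round_trip are hypothetical and are not part of any OpenEye API.

```python
import struct

# struct format codes for 16-, 32- and 64-bit IEEE 754 floats.
# Mapping "<e" to OEHalfFloat is an assumption made for illustration only.
FORMATS = {"half": "<e", "float": "<f", "double": "<d"}


def coord_bytes(num_atoms, num_confs, precision):
    """Bytes needed for x, y, z of every atom in every conformer."""
    return num_atoms * num_confs * 3 * struct.calcsize(FORMATS[precision])


def round_trip(value, precision):
    """Store a coordinate at the given precision and read it back."""
    fmt = FORMATS[precision]
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# 50 atoms and 200 conformers are arbitrary example sizes.
for name in FORMATS:
    err = abs(round_trip(12.3456, name) - 12.3456)
    print(name, coord_bytes(50, 200, name), "bytes, round-off", err)
```

Halving the storage per coordinate doubles the number of conformers that fit in the same memory, at the cost of a larger round-off error on each coordinate.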
The OEGraphMol is a concrete class that supports the OEMolBase API and can be passed to functions which take an OEMolBase as an argument. The OEMol is a concrete class that supports the OEMCMolBase API in addition to the OEMolBase API. Therefore, an OEMol can be passed to any function which takes either an OEMolBase or an OEMCMolBase as an argument.
Since an OEMCMolBase is-a OEMolBase, all the same powerful graph functions can be applied to the OEMCMolBase. This type of abstraction allows pure graph algorithms to be applied to multi-conformer molecules without requiring the algorithms to know that they are operating on a different molecule representation. When an algorithm should behave differently for multiple conformers, the function can be overloaded to provide that functionality for OEMCMolBase.
For example, the behavior of some functions differs slightly depending on whether the molecule being passed is an OEMolBase or an OEMCMolBase.
There are also functions that only make sense for multi-conformer molecules. In this case an overload for OEMolBase is simply not provided. This prevents users from inadvertently passing a single-conformer molecule when a multi-conformer molecule is required. An example is the OEOmega toolkit, which is designed to generate multi-conformer molecules and hence requires an OEMCMolBase to receive the conformers it outputs.
Conformers¶
An OEMCMolBase contains one or more conformers. These conformers are managed in a manner very similar to atoms and bonds. Conformers can only be created or destroyed in the context of an OEMCMolBase, and must be accessed via member functions laid out in that API. Conformers are represented by the abstract base-class OEConfBase.
The GetActive and SetActive methods are often sufficient for accessing conformations in multi-conformer molecules, but alternate access methods are also provided, e.g., GetConf and GetConfs.
It is sometimes convenient to be able to treat a conformer object as its own single-conformer molecule. For this reason, OEConfBase inherits from OEMolBase. Therefore, although a conformer is contained within a multi-conformer molecule, it can act as a single-conformer molecule, and can be passed to functions that have an OEMolBase argument.
One must be cautious when utilizing this OEMolBase inheritance functionality. Each multi-conformer molecule has only a single heavy-atom graph. For functions which query the graph portion of a molecule, a conformer will reflect the graph properties of its parent multi-conformer molecule. Graph properties include the connection table of atoms and bonds, as well as any properties stored by the atoms and bonds. A conformer is only independent of its parent for non-graph (e.g., conformational) properties. The logical extension of this principle is that changes made to the graph properties of one conformer will affect its parent multi-conformer molecule and thus all the other conformers in that molecule as well. The sharing of a common connection table prevents tautomers from being modeled together with an OEMCMolBase.
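The shared-graph behavior described above can be pictured with a small toy model. The classes ToyMol and ToyConf below are hypothetical stand-ins written in plain Python. They are not OEChem TK classes and say nothing about how the toolkit is implemented. They only mimic the rule that graph data lives on the parent while coordinates live on each conformer.

```python
class ToyMol:
    """Toy multi-conformer molecule: one shared graph, many coordinate sets."""

    def __init__(self, elements):
        self.elements = list(elements)  # graph property, stored once
        self.confs = []

    def new_conf(self, coords):
        conf = ToyConf(self, coords)
        self.confs.append(conf)
        return conf


class ToyConf:
    """Toy conformer: owns its coordinates, borrows the graph from its parent."""

    def __init__(self, parent, coords):
        self.parent = parent
        self.coords = [tuple(xyz) for xyz in coords]  # per-conformer data

    @property
    def elements(self):
        # Graph queries are answered by the parent molecule.
        return self.parent.elements


mol = ToyMol(["O", "H", "H"])
conf_a = mol.new_conf([(0.0, 0.0, 0.0), (0.9584, 0.0, 0.0), (-0.2392, 0.9281, 0.0)])
conf_b = mol.new_conf([(0.0, 0.0, 0.0), (0.9584, 0.0, 0.0), (-0.2392, -0.9281, 0.0)])

# Editing a graph property through one conformer is visible everywhere.
conf_a.elements[1] = "D"
print(conf_b.elements, mol.elements)

# Coordinates stay independent.
print(conf_a.coords[2], conf_b.coords[2])
```

In this model a tautomer cannot be added as another conformer, because it would need a different elements list and bond table than the one the parent holds.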
See also
See Design Decisions for a complete inheritance graph explaining the relationship of OEChem TK molecules.
Coordinate information is stored by the conformer, not the shared atoms and bonds, which allows the conformers to share the same heavy-atom graph but have different spatial configurations. In OEChem TK, these conformers are represented by OEConfBase objects, which are first-class objects. The conformer is the only additional property presented by the OEMCMolBase. Access to conformers is similar to atoms and bonds, as shown by the following table.
Description | Return Type | Method | See Also |
---|---|---|---|
Number of Conformers | | NumConfs | |
Maximum Conformer Index | | | |
Access to a Conformer | | GetConf | |
Access to all Conformers | | GetConfs | |
Create a new Conformer | | NewConf | |
Remove a Conformer | | | |
Rearrange Conformers | | | |
Conformer Iteration¶
Conformers of an OEMCMolBase can be iterated over the same way atoms and bonds of an OEMolBase are iterated over. GetConfs returns an iterator over the conformers of the molecule. This allows one to have multiple conformation objects at once and to treat the OEMCMolBase as a container of single-conformer molecules.
Listing 1 demonstrates the use of conformers as first-class objects. Each conformer is represented by an OEConfBase which inherits from OEMolBase. Thus, each conformer can be treated as an independent molecule with respect to its coordinates as shown in the example code below.
Listing 1: Retrieving the bounding box for a set of conformers
from openeye import oechem
import sys

ifs = oechem.oemolistream(sys.argv[1])

# Scratch arrays that receive the center and extents of each conformer
ctr = oechem.OEFloatArray(3)
ext = oechem.OEFloatArray(3)
for mol in ifs.GetOEMols():
    # box[0:3] holds the lower corner, box[3:6] the upper corner
    box = [float("inf")] * 3 + [float("-inf")] * 3
    for conf in mol.GetConfs():
        oechem.OEGetCenterAndExtents(conf, ctr, ext)
        box[0] = min(box[0], ctr[0] - ext[0])
        box[1] = min(box[1], ctr[1] - ext[1])
        box[2] = min(box[2], ctr[2] - ext[2])
        box[3] = max(box[3], ctr[0] + ext[0])
        box[4] = max(box[4], ctr[1] + ext[1])
        box[5] = max(box[5], ctr[2] + ext[2])

    print("Bounding box for the conformers of " + mol.GetTitle())
    print("Lower Extent: %.3f %.3f %.3f" % (box[0], box[1], box[2]), end=" ")
    print("Upper Extent: %.3f %.3f %.3f" % (box[3], box[4], box[5]))
OEGetCenterAndExtents takes an OEMolBase as an argument, not an OEConfBase. However, it is still usable on each conformer since OEConfBase inherits from OEMolBase. Therefore, all the functions that were written for OEMolBase are automatically usable on conformers.
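The accumulation step in Listing 1 is plain arithmetic and can be tried without the toolkit. The sketch below treats each extent as a half-width around the center, which is how Listing 1 uses the values (center minus extent, center plus extent). The function name merge_boxes and the sample numbers are hypothetical.

```python
def merge_boxes(center_extent_pairs):
    """Union of axis-aligned boxes given as (center, half-extent) pairs."""
    lower = [float("inf")] * 3
    upper = [float("-inf")] * 3
    for ctr, ext in center_extent_pairs:
        for axis in range(3):
            lower[axis] = min(lower[axis], ctr[axis] - ext[axis])
            upper[axis] = max(upper[axis], ctr[axis] + ext[axis])
    return lower, upper


# Two made-up conformer boxes: (center, half-extents)
pairs = [
    ((0.0, 0.0, 0.0), (1.0, 2.0, 3.0)),
    ((1.0, 0.0, -1.0), (1.0, 1.0, 1.0)),
]
lower, upper = merge_boxes(pairs)
print("Lower Extent: %.3f %.3f %.3f" % tuple(lower))
print("Upper Extent: %.3f %.3f %.3f" % tuple(upper))
```

Starting from positive and negative infinity means the first conformer always initializes the box, so no special case is needed for it.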
Note
OEChem TK makes the same guarantees on conformers of a multi-conformer molecule as it does for atoms and bonds of a molecule. That is, conformer objects and their indices are stable across any other method call, with the exception of SweepConfs.
Conformer Creation¶
The most common method to create conformers in a molecule is by reading a molecule from a file (see section Input and Output). However, when manipulating molecules it is often necessary to create conformers on-the-fly. In OEChem TK, this is done with the NewConf method. There are numerous overloads of NewConf. All of the overloads create conformers with the capacity to store coordinates for the current number of atoms in the molecule. NewAtom adjusts this capacity as necessary. The default OEMCMolBase constructor puts the molecule in a state with a single empty conformer (as does the Clear method).
See also
Parallel Data Structures for a discussion of how indices are used to index into coordinate arrays.
Listing 2 demonstrates how to generate a multi-conformer water molecule from scratch. It then measures the hydrogen-oxygen-hydrogen angle of the two conformers. Remember, the OEChem TK definition of conformer is very loose: any set of Cartesian coordinates constitutes another conformer. In this case the arrangement of the atoms relative to each other is the same; only the placement of the molecule in space has changed. No effort is made to prevent the user from creating duplicate conformers.
Listing 2: Creating conformers from explicit sets of coordinates
from openeye import oechem
from math import degrees

mol = oechem.OEMol()
o = mol.NewAtom(oechem.OEElemNo_O)
h1 = mol.NewAtom(oechem.OEElemNo_H)
h2 = mol.NewAtom(oechem.OEElemNo_H)

Acrds = [0.0, 0.0, 0.0,
         0.9584, 0.0, 0.0,
         -0.2392, 0.9281, 0.0]
# Grab the default conformer
Aconf = mol.GetConfs().next()
Aconf.SetCoords(oechem.OEFloatArray(Acrds))

Bcrds = [0.0, 0.0, 0.0,
         0.9584, 0.0, 0.0,
         -0.2392, -0.9281, 0.0]
Bconf = mol.NewConf(oechem.OEFloatArray(Bcrds))

print("1st Water Angle:", degrees(oechem.OEGetAngle(Aconf, h1, o, h2)))
print("2nd Water Angle:", degrees(oechem.OEGetAngle(Bconf, h1, o, h2)))
Warning
An OEMCMolBase is constructed with one conformer already present. Special care must be taken in Listing 2 to insert the coordinates of the first conformer into this conformer.
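As a cross-check that does not depend on OEChem TK, the same angle can be computed directly from the two coordinate arrays of Listing 2 using only the standard math module. The helper names angle_deg and split_xyz are hypothetical. The atom order (oxygen, hydrogen, hydrogen) follows the order in which Listing 2 creates the atoms.

```python
from math import acos, degrees, sqrt


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(x * x for x in v))
    return degrees(acos(dot / norm))


def split_xyz(flat):
    """Turn a flat [x0, y0, z0, x1, ...] list into a list of (x, y, z) tuples."""
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


Acrds = [0.0, 0.0, 0.0,
         0.9584, 0.0, 0.0,
         -0.2392, 0.9281, 0.0]
Bcrds = [0.0, 0.0, 0.0,
         0.9584, 0.0, 0.0,
         -0.2392, -0.9281, 0.0]

o, h1, h2 = split_xyz(Acrds)
angle_a = angle_deg(h1, o, h2)
o, h1, h2 = split_xyz(Bcrds)
angle_b = angle_deg(h1, o, h2)
print("1st Water Angle: %.2f" % angle_a)
print("2nd Water Angle: %.2f" % angle_b)
```

Both coordinate sets give the same angle because the second set is the first one mirrored through the xz-plane, which leaves every interatomic distance and angle unchanged.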
There is also a version of NewConf which takes an OEMolBase and copies the coordinates of the passed molecule into the new conformer. NewConf expects the molecule passed to have the same graph as the OEMCMolBase which is the parent of the new conformer. It is important to note that this version of NewConf can take any instance of an OEMolBase, such as an OEGraphMol or an OEMol. When an OEMol is passed to NewConf, the coordinates of the newly created conformer will come from the first conformation of the molecule passed.
Listing 3 demonstrates how to use NewConf with another conformer as the argument. The purpose of this code is to copy only the conformers within an arbitrary energy cutoff from the src to the dst molecule.
Listing 3: Filtering conformers based upon energy cutoff
from openeye import oechem
import sys

ifs = oechem.oemolistream(sys.argv[1])
ofs = oechem.oemolostream(sys.argv[2])

for src in ifs.GetOEMols():
    # Copy the graph and other non-conformational data, then drop the conformer
    dst = oechem.OEMol(src.SCMol())
    dst.DeleteConfs()

    for conf in src.GetConfs():
        if conf.GetEnergy() < 25.0:
            dst.NewConf(conf)

    # Only write molecules that kept at least one conformer
    if dst.NumConfs() > 0:
        oechem.OEWriteMolecule(ofs, dst)
The dst molecule is copy constructed from the single-conformer representation (the first conformer) of the src molecule. This is to grab the molecule graph information and any other non-conformational information. Then DeleteConfs is called to get rid of the first conformer. It may be copied back in the conformer loop later in the code if it meets the energy criteria.
Warning
An OEMCMolBase with no conformers can be dangerous to pass to other OEChem TK functions, so the number of conformers is checked before calling OEWriteMolecule.
Input and Output¶
Molecule streams can read both single and multi-conformer molecules from any file format. Many of the file formats supported by OEChem TK are inherently single-conformer formats (SDF and MOL2, for example). However, a common practice is to store multiple conformers in these files. OEChem TK supports a rather advanced mechanism for recovering these separate conformers into a single, multi-conformer OEMCMolBase. Note that this does not apply to file formats where conformers are stored together, OEBinary (.oeb), for example. OEBinary files store either single-conformer or multi-conformer molecules explicitly, so the file itself determines how to deal with conformers. Additionally, file formats that have no notion of conformers (e.g., SMILES) are unaffected by this feature.
The oemolistream.SetConfTest method sets a functor that is used to compare the graphs of incoming molecules in order to determine whether to combine them. These functors are subclasses of OEConfTestBase. Several predefined versions include:
- OEDefaultConfTest
  The OEDefaultConfTest never combines connection tables into multi-conformer molecules.
- OEIsomericConfTest
  The OEIsomericConfTest combines subsequent connection tables into a multi-conformer molecule if they:
  - Have the same title (optional)
  - Have the same numbers of atoms and bonds in the same order
  - Have atoms and bonds whose properties are identical to those of their order correspondents in the subsequent connection table
  - Have the same atom and bond stereochemistry
  No changes are made to the connection table.
  The constructor for OEIsomericConfTest has a default argument for whether or not to compare titles. If the constructor is called with no arguments or with the argument true, the titles will be required to be the same. Otherwise, the titles will not be compared. In the latter instance, each conformer will have the individual title of its original connection table and the multi-conformer molecule will reflect the title of the active conformer.
- OEOmegaConfTest
  The OEOmegaConfTest is almost exactly the same as OEIsomericConfTest with the exception that differences in invertible nitrogens are disregarded. Therefore, invertible nitrogens with different stereochemistry are considered conformers of the same molecule.
  This definition is meant to be exactly in line with what Omega will generate. Therefore, OEOmegaConfTest can be used on non-OEB files generated with Omega to recover the exact set of conformers Omega generated for a particular molecule.
- OEAbsoluteConfTest
  The OEAbsoluteConfTest combines subsequent connection tables into a multi-conformer molecule if they:
  - Have the same title (optional)
  - Have the same number of atoms and bonds in the same order
  - Have atoms and bonds whose properties are identical to those of their order correspondents in the subsequent connection table
  This conformer test sets all fully specified isomeric values to UNDEFINED.
  The constructor for OEAbsoluteConfTest has a default argument for whether or not to compare titles. If the constructor is called with no arguments or with the argument true, the titles will be required to be the same. Otherwise, the titles will not be compared. In the latter instance, each conformer will have the individual title of its original connection table and the multi-conformer molecule will reflect the title of the active conformer.
- OEAbsCanonicalConfTest
  The OEAbsCanonicalConfTest combines subsequent connection tables into a multi-conformer molecule if they:
  - Have the same absolute (non-isomeric) graph
  This conformer test puts all of the molecules in their canonical atom order. In addition, all fully specified isomeric values are set to UNDEFINED.
Listing 4: Reading in multi-conformer molecules from single-conformer files
from openeye import oechem
import sys

mol = oechem.OEMol()
ifs = oechem.oemolistream(sys.argv[1])
ifs.SetConfTest(oechem.OEIsomericConfTest())

for mol in ifs.GetOEMols():
    print(mol.GetTitle(), "has", mol.NumConfs(), "conformers")
Listing 4 will read multi-conformer molecules from an input file based on OEIsomericConfTest. This assumes that the conformers are ordered next to each other in the input file.
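The adjacency requirement can be illustrated with a toy model in plain Python. The sketch below is not how OEChem TK implements conformer tests. It only mimics the idea of merging consecutive records that pass a comparison, using itertools.groupby. The function combine_consecutive, the record layout, and the use of a SMILES-like string as a stand-in for the connection table are all hypothetical.

```python
from itertools import groupby


def combine_consecutive(records, compare_titles=True):
    """Group adjacent (title, graph, coords) records into toy multi-conformer entries.

    'graph' is any hashable stand-in for a connection table, e.g. a SMILES string.
    """
    def key(record):
        title, graph, _ = record
        return (title, graph) if compare_titles else graph

    molecules = []
    # groupby only merges neighbors, so separated duplicates stay separate.
    for _, run in groupby(records, key=key):
        run = list(run)
        molecules.append({
            "title": run[0][0],
            "graph": run[0][1],
            "conf_titles": [title for title, _, _ in run],
            "coords": [coords for _, _, coords in run],
        })
    return molecules


records = [
    ("acetsali_1", "CC(=O)Oc1ccccc1C(=O)O", "xyz-1"),
    ("acetsali_2", "CC(=O)Oc1ccccc1C(=O)O", "xyz-2"),
    ("ethanol", "CCO", "xyz-3"),
    ("acetsali_3", "CC(=O)Oc1ccccc1C(=O)O", "xyz-4"),
]
strict = combine_consecutive(records)
loose = combine_consecutive(records, compare_titles=False)
print([len(m["coords"]) for m in strict])
print([len(m["coords"]) for m in loose])
```

With titles compared, every record in this sample stays separate because each title is different. With titles ignored, the first two records merge into one entry with two coordinate sets, while the last record stays on its own because an unrelated record sits between it and the others.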
Note
The OEIsomericConfTest constructor can be passed false to allow conformers to be combined when they have different titles. This is very useful when dealing with files created by programs that modify molecule titles to indicate conformer number (e.g., acetsali_1, acetsali_2, acetsali_3). The modified titles are accessible through the GetTitle method on each individual OEConfBase.
Dude, where’s my SD data?¶
SD tag data can be added to anything that derives from OEMolBase, including OEMCMolBase or OEConfBase. Generally, OEChem TK will never lose any data when reading or writing. However, there are constraints placed on OEChem TK as to where the SD data must go based upon the file format being used.
An ambiguity occurs when adding SD tag data to an OEMCMolBase and then writing it to SDF. SDF files do not support multiple conformers. However, OEChem TK can automatically read consecutive conformers out of an SDF file into an OEMCMolBase. To preserve the SD data, OEChem TK has no choice but to push the data onto the conformers.
OEB files do not have this restriction upon them because they do support multi-conformer molecules. The following table shows how to round-trip SD tag data through the SDF and OEB formats.
Attached To | Written To | Read Into | Attached To |
---|---|---|---|
Note
The OEFormat_CSV format behaves identically to the OEFormat_SDF format in how SD data is handled.
Practically, it is best to never attach SD tag data to an OEMCMolBase. This should only be done as a space optimization when it is assured that the multi-conformer molecule will only be written to OEB.
To this end, when an OEMol is copy constructed from an OEGraphMol, the SD tag data is attached to the first conformer.
See also
Generic Data chapter
SD Tagged Data Manipulation section
PDB Tagged Data Manipulation section