Atom, Bond, and Conformer Indices
The GetIdx methods of atoms, bonds, and conformers return the unique index assigned to the object upon creation.
There is a parallel method SetIdx as well. This is an advanced API that should never be used.
Atoms, bonds, and conformers are stored inside the molecule. If the molecule is deallocated, any cached atoms, bonds, and conformers can no longer be accessed.
This index can be used to distinguish one object from another, as it
is unique amongst all objects of the same type of the same molecule.
Indices are also stable, meaning a given object will have the same
index throughout its lifetime, independent of any other molecule
manipulations, e.g., reordering the molecule
(OEMolBase.OrderAtoms), or the creation or deletion
of other objects (OEMolBase.DeleteAtom). The exceptions are the Compress methods.
These methods were designed to change the underlying structure of the molecule, and are thus allowed to invalidate indices if they deem it necessary.
Indices can be assumed to be small, densely packed integers greater than or equal to zero and less than the value returned by the molecule's maximum-index method for that object type (for atoms, OEMolBase.GetMaxAtomIdx).
The index assigned to a new atom, bond, or conformer created with
OEMolBase.NewAtom, OEMolBase.NewBond, or OEMCMolBase.NewConf, respectively, is guaranteed to
be greater than or equal to the value that method returned before the object was created.
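The guarantees above can be illustrated with a toy model. The sketch below is pure Python and does not use OEChem TK: `ToyMol`, `new_atom`, `delete_atom`, `num_atoms`, `max_atom_idx`, and `atom_indices` are hypothetical names invented for illustration, and the allocation policy (a counter that never reuses an index) is an assumption, not a description of how OEChem TK is implemented.

```python
class ToyMol:
    """Hypothetical stand-in for a molecule; NOT the OEChem TK implementation."""

    def __init__(self):
        self._atoms = {}      # index -> element symbol
        self._next_idx = 0    # assumed policy: indices are never reused

    def new_atom(self, symbol):
        idx = self._next_idx
        self._atoms[idx] = symbol
        self._next_idx += 1
        return idx

    def delete_atom(self, idx):
        del self._atoms[idx]

    def num_atoms(self):
        return len(self._atoms)

    def max_atom_idx(self):
        # Exclusive upper bound on every index handed out so far.
        return self._next_idx

    def atom_indices(self):
        return sorted(self._atoms)


mol = ToyMol()
c, n, o = (mol.new_atom(sym) for sym in ("C", "N", "O"))

bound_before = mol.max_atom_idx()
mol.delete_atom(n)              # the remaining atoms keep their indices
sulfur = mol.new_atom("S")      # the new index is >= the previous bound

print(mol.atom_indices())
print(mol.num_atoms(), mol.max_atom_idx())
```

In this toy model the atom count ends up smaller than the index bound, which is why any array indexed by atom index should be sized by the maximum index rather than by the number of atoms.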
Parallel Data Structures
Indices are ideal for indexing into densely packed arrays of
information about the molecule. Many OEChem TK functions use them to
this end, e.g.,
OEDetermineComponents. The coordinates of a molecule
are retrieved as an array of floating point values indexed by the atom
indices. The following example demonstrates how to use the atom
indices for a rudimentary XYZ file format writer.
Listing 1: Rudimentary XYZ writer
    from openeye import oechem
    import sys

    if len(sys.argv) != 2:
        oechem.OEThrow.Usage("%s <input>" % sys.argv[0])

    ifs = oechem.oemolistream()
    if not ifs.open(sys.argv[1]):
        oechem.OEThrow.Fatal("Unable to open %s" % sys.argv[1])

    for mol in ifs.GetOEGraphMols():
        print(mol.NumAtoms())
        print(mol.GetTitle())
        # Size the array by the maximum atom index, not the atom count
        coords = oechem.OEFloatArray(mol.GetMaxAtomIdx() * 3)
        mol.GetCoords(coords)
        for atom in mol.GetAtoms():
            idx = atom.GetIdx()
            syb = oechem.OEGetAtomicSymbol(atom.GetAtomicNum())
            print("%-3s%11.5f%11.5f%11.5f" % (syb,
                                              coords[idx * 3],
                                              coords[idx * 3 + 1],
                                              coords[idx * 3 + 2]))
Computers are very efficient at doing this sort of sequential lookup. Whenever an efficient temporary data structure is needed to track information about a molecule, indices should be used.
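As a minimal sketch of the same pattern without OEChem TK, the pure-Python example below keeps a per-atom property in a flat list addressed by atom index. The atom records, the `max_atom_idx` value, and the `values` list are hypothetical. The only point being made is that the list is sized by the index bound, not the atom count, so gaps in the index sequence are harmless.

```python
# Hypothetical atoms as (index, symbol) pairs; index 1 is absent,
# as it might be after an atom has been deleted.
atoms = [(0, "C"), (2, "O"), (3, "H")]
max_atom_idx = 4  # assumed exclusive upper bound on atom indices

# Pauling electronegativities, used here only as sample per-atom data.
electronegativity = {"C": 2.55, "O": 3.44, "H": 2.20}

# Size the parallel array by the index bound, not by len(atoms).
values = [None] * max_atom_idx
for idx, symbol in atoms:
    values[idx] = electronegativity[symbol]   # constant-time write by index

# Later: constant-time lookup, driven by iterating the atoms themselves.
for idx, symbol in atoms:
    print(symbol, values[idx])
```

Note that the loop iterates over the atoms and uses each atom's index to reach into the array; it never iterates over a range of integers to find atoms.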
Indices for Molecule Lookup Considered Harmful
Note that atom, bond, and conformer indices are not guaranteed to be sequential, or even created sequentially, and hence atom indices can not and should not be used to retrieve all of the atoms of a molecule. Even typing the following idiom may invalidate any chance of support and eliminate any glimmer of respect from OpenEye Scientific Software or the computational chemistry/cheminformatics community.
    # Never ever, ever do this!!!
    for i in range(mol.NumAtoms()):
        atom = mol.GetAtom(oechem.OEHasAtomIdx(i))
        # pretend atom is valid
There are far more efficient methods of crashing computer software that should be used instead.
A common misconception is that OEChem TK indices should be stored in
order to refer back to an atom in the molecule. This leads to the
idiom used in
Listing 2, which is technically
legal OEChem TK but just as insidious as the previous code
snippet. The code mimics a common technique: caching
particular atoms based on some expensive-to-calculate property. The
property in this instance is whether the atom is alpha-beta unsaturated.
Listing 2: Evil atom cache
    from openeye import oechem

    mol = oechem.OEGraphMol()
    # initialized somehow

    acache = []  # cache of atoms
    for atom in mol.GetAtoms(oechem.OEHasAlphaBetaUnsat()):
        acache.append(atom.GetIdx())  # evil!

    # pretend this code is deep in some inner loop that needs to go fast
    for aidx in acache:
        catom = mol.GetAtom(oechem.OEHasAtomIdx(aidx))  # O(n) lookup!
        # do something with the cached atom "catom"
        catom.SetName("Hello World")
The OEMolBase.GetAtom method performs a linear,
\(O(mol.NumAtoms())\), search over the molecule looking for the
first atom that matches the predicate. This has a multiplicative
effect when looping over multiple atoms and calling
GetAtom for each one. The
resulting algorithm is quadratic, \(O(mol.NumAtoms()^2)\), possibly
destroying any benefit of caching the expensive per-atom property.
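To make the cost difference concrete, the following pure-Python sketch counts how many atoms are examined when cached indices are resolved through a linear search, compared with caching the objects themselves. `Atom` and `find_by_idx` are hypothetical stand-ins written for this illustration. They are not OEChem TK code and only model the linear scan described above.

```python
class Atom:
    """Hypothetical minimal atom record for illustration."""

    def __init__(self, idx):
        self.idx = idx
        self.name = ""


def find_by_idx(atoms, idx, counter):
    """Linear scan modelling a predicate search; counts atoms examined."""
    for atom in atoms:
        counter[0] += 1
        if atom.idx == idx:
            return atom
    return None


n = 1000
atoms = [Atom(i) for i in range(n)]

# Strategy 1: cache indices, then search the molecule again for each one.
examined = [0]
index_cache = [a.idx for a in atoms]
for idx in index_cache:
    find_by_idx(atoms, idx, examined).name = "visited"

# Strategy 2: cache the objects; each access is direct.
direct = 0
object_cache = list(atoms)
for atom in object_cache:
    atom.name = "visited"
    direct += 1

print(examined[0], direct)
```

With every atom cached, the index-based strategy examines n(n+1)/2 atoms in total, while the object-based strategy touches each atom exactly once.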
The proper way to store references to atoms for later use is to
store the OEAtomBase itself. It is guaranteed that
these atoms will be valid for the lifetime of the molecule.
This holds regardless of molecule manipulations, e.g., reordering the
molecule (OEMolBase.OrderAtoms), or the creation or
deletion of other atoms (OEMolBase.DeleteAtom).
The Compress methods are again exceptions to this rule.
Listing 3 demonstrates the proper way to
create a cache of atoms. It is important to remember that the
OEAtomBases in the container are only
valid while the molecule exists. Using the atoms after the molecule is
destroyed is undefined behavior (usually a segmentation fault).
Listing 3: Atom caching
    from openeye import oechem

    mol = oechem.OEGraphMol()
    # initialized somehow

    acache = []  # cache of atoms
    for atom in mol.GetAtoms(oechem.OEHasAlphaBetaUnsat()):
        acache.append(atom)

    # pretend this code is deep in some inner loop that needs to go fast
    for catom in acache:
        # do something with the cached atom "catom"
        catom.SetName("Hello World")