Atom, Bond, and Conformer Indices
The following methods return the unique index assigned to their associated object upon creation.
Note
There is a parallel method SetIdx as well. This is an advanced API that should never be used.
Warning
Atoms, bonds, and conformers are stored inside the molecule. If the molecule is deallocated, any cached atoms, bonds, and conformers can no longer be accessed.
Unique Identifiers
This index can be used to distinguish one object from another, as it is unique among all objects of the same type in the same molecule. Indices are also stable, meaning a given object will have the same index throughout its lifetime, independent of any other molecule manipulations, e.g., reordering the molecule (OEMolBase.OrderAtoms), or the creation or deletion of other objects (OEMolBase.NewAtom or OEMolBase.DeleteAtom). The exceptions are the Sweep and Compress methods.
These methods were designed with the intent of changing something about the underlying structure of the molecule, and thus are allowed to invalidate indices if they deem it necessary.
Indices can be assumed to be dense small integers greater than or equal to zero and less than the values returned by the following methods:
OEMolBase.GetMaxAtomIdx for atoms
OEMolBase.GetMaxBondIdx for bonds
OEMCMolBase.GetMaxConfIdx for conformers
The index assigned to a new atom, bond, or conformer created with OEMolBase.NewAtom, OEMolBase.NewBond, or OEMCMolBase.NewConf, respectively, is guaranteed to be greater than or equal to the value returned by the corresponding method above before the object was created.
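The allocation rules above can be illustrated with a toy model. The `ToyMol` class below is purely hypothetical and is not part of OEChem TK; it only mimics the documented behavior that existing indices are stable, deletions leave gaps, and a new index is never smaller than the current maximum. Handing out exactly the current maximum is an assumption made for simplicity.

```python
class ToyMol:
    """Hypothetical stand-in for a molecule; NOT the OEChem TK API."""

    def __init__(self):
        self._atoms = {}    # index -> element symbol
        self._max_idx = 0   # exclusive upper bound on every index handed out

    def new_atom(self, symbol):
        idx = self._max_idx  # assumption: the new index equals the current bound
        self._atoms[idx] = symbol
        self._max_idx += 1
        return idx

    def delete_atom(self, idx):
        del self._atoms[idx]  # leaves a gap; surviving indices are unchanged

    def num_atoms(self):
        return len(self._atoms)

    def max_atom_idx(self):
        return self._max_idx

    def indices(self):
        return sorted(self._atoms)


mol = ToyMol()
c, n, o = mol.new_atom("C"), mol.new_atom("N"), mol.new_atom("O")
mol.delete_atom(n)        # index 1 disappears and is not reused
s = mol.new_atom("S")

print(mol.indices())       # [0, 2, 3]
print(mol.num_atoms())     # 3
print(mol.max_atom_idx())  # 4
```

In this model the number of atoms (3) is smaller than the maximum index (4), which is why arrays indexed by atom index should be sized by the maximum index rather than by the atom count.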
Parallel Data Structures
Indices are ideal for indexing into densely packed arrays of information about the molecule. Many OEChem TK functions use them to this end, e.g., OEAddMols, OESubsetMol, or OEDetermineComponents. The coordinates of a molecule are retrieved as an array of floating point values indexed by the atom indices. The following example demonstrates how to use the atom indices for a rudimentary XYZ file format writer.
Listing 1: Rudimentary XYZ writer
from openeye import oechem
import sys

if len(sys.argv) != 2:
    oechem.OEThrow.Usage("%s <input>" % sys.argv[0])

ifs = oechem.oemolistream()
if not ifs.open(sys.argv[1]):
    oechem.OEThrow.Fatal("Unable to open %s" % sys.argv[1])

for mol in ifs.GetOEGraphMols():
    print(mol.NumAtoms())
    print(mol.GetTitle())
    # size the array by the maximum atom index, not by the number of atoms
    coords = oechem.OEFloatArray(mol.GetMaxAtomIdx() * 3)
    mol.GetCoords(coords)
    for atom in mol.GetAtoms():
        idx = atom.GetIdx()
        syb = oechem.OEGetAtomicSymbol(atom.GetAtomicNum())
        print("%-3s%11.5f%11.5f%11.5f" % (syb,
                                          coords[idx * 3],
                                          coords[idx * 3 + 1],
                                          coords[idx * 3 + 2]))
Computers are very efficient at this sort of array lookup. Whenever an efficient temporary data structure is needed to track information about a molecule, indices should be used.
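The same pattern can be sketched for an arbitrary per-atom property. The helper `make_atom_array` is hypothetical; it relies only on `GetMaxAtomIdx`, `GetAtoms`, and `GetIdx`, which all appear in Listing 1. So that the sketch runs without a toolkit installation, the demonstration uses the tiny stand-in classes `FakeAtom` and `FakeMol` (also hypothetical) in place of a real molecule.

```python
def make_atom_array(mol, value_of, default=None):
    """Build a list indexed by atom index (hypothetical helper)."""
    # Size by the maximum index, not by NumAtoms(): indices may have gaps.
    values = [default] * mol.GetMaxAtomIdx()
    for atom in mol.GetAtoms():
        values[atom.GetIdx()] = value_of(atom)
    return values


# Stand-ins so the sketch runs anywhere; a real molecule would be passed instead.
class FakeAtom:
    def __init__(self, idx, atomic_num):
        self._idx, self._num = idx, atomic_num

    def GetIdx(self):
        return self._idx

    def GetAtomicNum(self):
        return self._num


class FakeMol:
    def __init__(self, atoms, max_idx):
        self._atoms, self._max = atoms, max_idx

    def GetAtoms(self):
        return iter(self._atoms)

    def GetMaxAtomIdx(self):
        return self._max


# Atom index 1 is missing, as if that atom had been deleted earlier.
mol = FakeMol([FakeAtom(0, 6), FakeAtom(2, 8), FakeAtom(3, 7)], max_idx=4)
atomic_nums = make_atom_array(mol, lambda a: a.GetAtomicNum(), default=0)
print(atomic_nums)  # [6, 0, 8, 7]
```

The unused slot at position 1 keeps its default value, so every later lookup by atom index remains a plain array access.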
Indices for Molecule Lookup Considered Harmful
Note that atom, bond, and conformer indices are not guaranteed to be sequential, or even created sequentially, and hence atom indices cannot and should not be used to retrieve all of the atoms of a molecule. Even typing the following idiom may invalidate any chance of support and eliminate any glimmer of respect from OpenEye, Cadence Molecular Sciences, or the computational chemistry/cheminformatics community.
Warning
# Never ever, ever do this!!!
for i in range(mol.NumAtoms()):
    atom = mol.GetAtom(oechem.OEHasAtomIdx(i))
    # pretend atom is valid
There are far more efficient methods of crashing computer software that should be used instead.
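To see why the loop fails, consider a small worked example using plain Python lists rather than OEChem TK objects. The index values are made up for illustration: a molecule that once had four atoms and lost the atom with index 1.

```python
live_indices = [0, 2, 3]        # atom 1 was deleted; the others kept their indices
num_atoms = len(live_indices)   # 3, which is what a NumAtoms-style count reports

# The "never do this" loop tries indices 0 .. num_atoms - 1.
tried = list(range(num_atoms))

found = [i for i in tried if i in live_indices]        # lookups that succeed
failed = [i for i in tried if i not in live_indices]   # lookups that find no atom
skipped = [i for i in live_indices if i not in tried]  # atoms never visited

print(found, failed, skipped)   # [0, 2] [1] [3]
```

One lookup finds no atom at all, and the atom with index 3 is never visited, even though the loop ran exactly as many times as there are atoms.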
A common misconception is that OEChem TK indices should be stored in order to refer back to an atom in the molecule. This leads to the idiom used in Listing 2, which is technically legal OEChem TK but just as insidious as the previous code snippet. The code mimics a common technique: caching particular atoms based on some property that is expensive to calculate. The property in this instance is whether the atom is alpha-beta unsaturated.
Warning
Listing 2: Evil atom cache
from openeye import oechem

mol = oechem.OEGraphMol()  # initialized somehow

acache = []  # cache of atoms
for atom in mol.GetAtoms(oechem.OEHasAlphaBetaUnsat()):
    acache.append(atom.GetIdx())  # evil!

# pretend this code is deep in some inner loop that needs to go fast
for aidx in acache:
    catom = mol.GetAtom(oechem.OEHasAtomIdx(aidx))  # O(n) lookup!
    # do something with the cached atom "catom"
    catom.SetName("Hello World")
The OEMolBase.GetAtom method performs a linear, \(O(mol.NumAtoms())\), search over the molecule looking for the first atom that matches the predicate. This has a multiplicative effect when looping over multiple atoms and calling GetAtom for each one. The resulting algorithm is quadratic, \(O(mol.NumAtoms()^2)\), possibly negating any benefit of caching the user’s expensive per-atom calculation.
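The cost difference can be made concrete by counting steps in a toy model. The function `linear_get` below is a hypothetical stand-in for a predicate-based search, not the OEChem TK implementation; it scans a plain list of integers and counts how many candidates it inspects.

```python
def linear_get(indices, wanted, counter):
    """Toy predicate search: scan until the first match, counting each test."""
    for idx in indices:
        counter[0] += 1   # one predicate evaluation
        if idx == wanted:
            return idx
    return None


n = 1000
indices = list(range(n))  # worst case: every atom ends up in the cache

# Index cache: one linear search per cached index.
tests = [0]
for wanted in indices:
    linear_get(indices, wanted, tests)

# Object cache: one direct step per cached atom, no search at all.
direct = 0
for _ in indices:
    direct += 1

print(tests[0], direct)   # 500500 1000
```

For 1000 cached atoms the index cache costs n(n + 1)/2 = 500500 predicate tests, compared with 1000 direct accesses when the objects themselves are cached.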
The proper way to store references to atoms for later use is to store the OEAtomBase itself. It is guaranteed that these atoms will be valid for the lifetime of the molecule.
This holds regardless of molecule manipulations, e.g., reordering the molecule (OEMolBase.OrderAtoms), or the creation or deletion of other atoms (OEMolBase.NewAtom or OEMolBase.DeleteAtom). The Sweep and Compress methods are again exceptions to this rule.
Listing 3 demonstrates the proper way to create a cache of atoms. It is important to remember that the OEAtomBase objects in the container are only valid while the molecule exists. Using the atoms after the molecule is destroyed is undefined behavior (usually a segmentation fault).
Listing 3: Atom caching
from openeye import oechem

mol = oechem.OEGraphMol()  # initialized somehow

acache = []  # cache of atoms
for atom in mol.GetAtoms(oechem.OEHasAlphaBetaUnsat()):
    acache.append(atom)  # store the atom itself, not its index

# pretend this code is deep in some inner loop that needs to go fast
for catom in acache:
    # do something with the cached atom "catom"
    catom.SetName("Hello World")