Atom, Bond, and Conformer Indices

The GetIdx methods of atoms, bonds, and conformers return the unique index assigned to the associated object upon creation.

Note

There is a parallel method SetIdx as well. This is an advanced API that should never be used.

Warning

Atoms, bonds, and conformers are stored inside the molecule. Once the molecule is deallocated, any cached atoms, bonds, or conformers can no longer be accessed.

Unique Identifiers

This index can be used to distinguish one object from another, because it is unique among all objects of the same type in the same molecule. Indices are also stable: a given object keeps the same index throughout its lifetime, independent of any other molecule manipulations, e.g., reordering the molecule (OEMolBase.OrderAtoms), or creating or deleting other objects (OEMolBase.NewAtom or OEMolBase.DeleteAtom). The exceptions are the Sweep and Compress methods.

These methods were designed to change something about the underlying structure of the molecule, and are therefore allowed to invalidate indices if they deem it necessary.

Indices can be assumed to be dense, small integers greater than or equal to zero and less than the values returned by OEMolBase.GetMaxAtomIdx, OEMolBase.GetMaxBondIdx, and OEMCMolBase.GetMaxConfIdx for atoms, bonds, and conformers respectively.

The index assigned to a new atom, bond, or conformer created with OEMolBase.NewAtom, OEMolBase.NewBond, or OEMCMolBase.NewConf respectively is guaranteed to be greater than or equal to the value that the corresponding method above returned before the object was created.
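The index rules above can be illustrated with a small pure-Python model. This is only a sketch under stated assumptions: ToyMol and all of its methods are hypothetical stand-ins invented for illustration, not part of OEChem TK, and the model simply never reuses an index, which is one way to satisfy the guarantees described here.

```python
class ToyMol:
    """Hypothetical stand-in for a molecule; not the OEChem TK API."""

    def __init__(self):
        self._atoms = {}      # index -> element symbol
        self._next_idx = 0    # strict upper bound on every index handed out

    def new_atom(self, symbol):
        idx = self._next_idx  # the index of a deleted atom is never reused
        self._next_idx += 1
        self._atoms[idx] = symbol
        return idx

    def delete_atom(self, idx):
        del self._atoms[idx]  # the remaining atoms keep their indices

    def num_atoms(self):
        return len(self._atoms)

    def max_atom_idx(self):
        return self._next_idx

    def indices(self):
        return sorted(self._atoms)


mol = ToyMol()
c, o, n = mol.new_atom("C"), mol.new_atom("O"), mol.new_atom("N")
bound_before = mol.max_atom_idx()            # 3
mol.delete_atom(o)                           # leaves a gap at index 1
s = mol.new_atom("S")                        # gets index 3, not 1

print(mol.indices())                         # [0, 2, 3]
print(mol.num_atoms(), mol.max_atom_idx())   # 3 4
print(s >= bound_before)                     # True
```

After the deletion, the atom count (3) is smaller than the index bound (4), which is why the bound, and not the count, is the quantity to use when sizing per-atom storage.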

Parallel Data Structures

Indices are ideal for indexing into densely packed arrays of information about the molecule. Many OEChem TK functions use them to this end, e.g., OEAddMols, OESubsetMol, or OEDetermineComponents. The coordinates of a molecule are retrieved as an array of floating point values indexed by the atom indices. The following example demonstrates how to use the atom indices for a rudimentary XYZ file format writer.

Listing 1: Rudimentary XYZ writer

from openeye import oechem
import sys

if len(sys.argv) != 2:
    oechem.OEThrow.Usage("%s <input>" % sys.argv[0])

ifs = oechem.oemolistream()
if not ifs.open(sys.argv[1]):
    oechem.OEThrow.Fatal("Unable to open %s" % sys.argv[1])

for mol in ifs.GetOEGraphMols():
    print(mol.NumAtoms())
    print(mol.GetTitle())

    coords = oechem.OEFloatArray(mol.GetMaxAtomIdx() * 3)
    mol.GetCoords(coords)

    for atom in mol.GetAtoms():
        idx = atom.GetIdx()
        syb = oechem.OEGetAtomicSymbol(atom.GetAtomicNum())

        print("%-3s%11.5f%11.5f%11.5f" % (syb,
                                          coords[idx * 3],
                                          coords[idx * 3 + 1],
                                          coords[idx * 3 + 2]))

Computers are very efficient at doing this sort of sequential lookup. Whenever an efficient temporary data structure is needed to track information about a molecule, indices should be used.
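As a further illustration of a temporary per-atom data structure, the sketch below labels connected components using a scratch array addressed by atom index. It runs on hypothetical toy data (plain lists and tuples), not on OEChem TK objects, and the variable max_atom_idx only plays the role that GetMaxAtomIdx plays in Listing 1.

```python
# Hypothetical toy data, not OEChem TK objects: atom indices with a gap
# (index 2 was deleted) and bonds given as pairs of atom indices.
atom_indices = [0, 1, 3, 4, 5]
bonds = [(0, 1), (3, 4)]
max_atom_idx = 6  # every index is >= 0 and < 6

# Adjacency lists stored in an array addressed directly by atom index.
neighbors = [[] for _ in range(max_atom_idx)]
for a, b in bonds:
    neighbors[a].append(b)
    neighbors[b].append(a)

# Scratch array sized by the maximum index, not by the atom count.
# 0 means "not visited yet"; the slot for the deleted index 2 stays 0.
part = [0] * max_atom_idx
count = 0
for start in atom_indices:
    if part[start]:
        continue  # already assigned to a component
    count += 1
    part[start] = count
    stack = [start]
    while stack:  # depth-first walk over the bonded neighbors
        cur = stack.pop()
        for nbr in neighbors[cur]:
            if not part[nbr]:
                part[nbr] = count
                stack.append(nbr)

print(count)  # 3
print(part)   # [1, 1, 0, 2, 2, 3]
```

Sizing the array by the atom count (5) instead of the index bound (6) would make the write to slot 5 fail, which is the reason the XYZ writer above allocates its coordinate array from the maximum atom index.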

Indices for Molecule Lookup Considered Harmful

Note that atom, bond, and conformer indices are not guaranteed to be sequential, or even created sequentially, and hence atom indices cannot and should not be used to retrieve all of the atoms of a molecule. Even typing the following idiom may invalidate any chance of support and eliminate any glimmer of respect from OpenEye, Cadence Molecular Sciences, or the computational chemistry/cheminformatics community.

Warning

# Never ever, ever do this!!!
for i in range(mol.NumAtoms()):
    atom = mol.GetAtom(oechem.OEHasAtomIdx(i))
    # pretend atom is valid

There are far more efficient methods of crashing computer software that should be used instead.
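To make the failure mode concrete, here is a minimal sketch on hypothetical toy data (a plain dictionary keyed by index, not an OEChem TK molecule). It assumes one atom has been deleted, so the surviving indices have a gap.

```python
# Hypothetical toy data, not OEChem TK objects: the atom with index 1 was
# deleted, so the surviving indices are 0, 2 and 3.
atoms_by_idx = {0: "C", 2: "N", 3: "S"}
num_atoms = len(atoms_by_idx)

# The harmful idiom: assume the indices are exactly 0 .. num_atoms - 1.
by_counting = [atoms_by_idx.get(i) for i in range(num_atoms)]

# The safe idiom: iterate over the atoms the container actually holds.
by_iterating = list(atoms_by_idx.values())

print(by_counting)   # ['C', None, 'N']
print(by_iterating)  # ['C', 'N', 'S']
```

Counting from zero both hits a hole (index 1) and never reaches the atom with index 3. In OEChem TK the safe counterpart is the atom iterator already used in Listing 1, `for atom in mol.GetAtoms():`.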

A common misconception is that OEChem TK indices should be stored in order to refer back to an atom in the molecule. This leads to the idiom used in Listing 2, which is technically legal OEChem TK but just as insidious as the previous code snippet. The code mimics a common technique: caching particular atoms based on some expensive-to-calculate property. The property in this instance is whether the atom is alpha-beta unsaturated.

Warning

Listing 2: Evil atom cache

from openeye import oechem

mol = oechem.OEGraphMol()  # initialized somehow

acache = []  # cache of atoms
for atom in mol.GetAtoms(oechem.OEHasAlphaBetaUnsat()):
    acache.append(atom.GetIdx())  # evil!

# pretend this code is deep in some inner loop that needs to go fast
for aidx in acache:
    catom = mol.GetAtom(oechem.OEHasAtomIdx(aidx))  # O(n) lookup!

    # do something with the cached atom "catom"
    catom.SetName("Hello World")

The OEMolBase.GetAtom method performs a linear, \(O(mol.NumAtoms())\), search over the molecule looking for the first atom that matches the predicate. This has a multiplicative effect when looping over multiple atoms and calling GetAtom for each one. The resulting algorithm is quadratic, \(O(mol.NumAtoms()^2)\), possibly destroying any benefit of caching the user’s expensive per-atom calculation.
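The quadratic cost can be made visible by counting comparisons in a toy model. The sketch below is an illustration only: ToyAtom and get_atom are hypothetical stand-ins written for this example, with get_atom assumed to scan the atoms in order, as the linear search described above does.

```python
class ToyAtom:
    """Hypothetical stand-in for an atom; not the OEChem TK API."""

    def __init__(self, idx):
        self.idx = idx
        self.name = ""


atoms = [ToyAtom(i) for i in range(200)]
stats = {"comparisons": 0}


def get_atom(wanted_idx):
    """Linear search: return the first atom whose index matches."""
    for atom in atoms:
        stats["comparisons"] += 1
        if atom.idx == wanted_idx:
            return atom
    return None


# Index cache: every use pays for a fresh scan of the atom list.
idx_cache = [atom.idx for atom in atoms]
for aidx in idx_cache:
    get_atom(aidx).name = "found by index"
print(stats["comparisons"])  # 20100, i.e. 200 * 201 / 2

# Object cache: every use is a direct reference, with no search at all.
atom_cache = list(atoms)
for catom in atom_cache:
    catom.name = "found by reference"
```

For 200 cached atoms the index-based loop performs 20100 comparisons, while the reference-based loop performs none, so the gap grows with the square of the molecule size.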

The proper way to store references to atoms for later use is to store the OEAtomBase objects themselves. These atoms are guaranteed to remain valid for the lifetime of the molecule, regardless of molecule manipulations, e.g., reordering the molecule (OEMolBase.OrderAtoms), or the creation or deletion of other atoms (OEMolBase.NewAtom or OEMolBase.DeleteAtom). The Sweep and Compress methods are again exceptions to this rule.

Listing 3 demonstrates the proper way to create a cache of atoms. It is important to remember that the OEAtomBase objects in the container are only valid while the molecule exists. Using the atoms after the molecule is destroyed is undefined behavior (usually a segmentation fault).

Listing 3: Atom caching

from openeye import oechem

mol = oechem.OEGraphMol()  # initialized somehow

acache = []  # cache of atoms
for atom in mol.GetAtoms(oechem.OEHasAlphaBetaUnsat()):
    acache.append(atom)

# pretend this code is deep in some inner loop that needs to go fast
for catom in acache:
    # do something with the cached atom "catom"
    catom.SetName("Hello World")