Atom, Bond, and Conformer Indices

The following methods return the unique index assigned to each atom, bond, or conformer when it is created.

OEAtomBase.GetIdx

OEBondBase.GetIdx

OEConfBase.GetIdx

Note

There is a parallel method SetIdx as well. This is an advanced API that should never be used.

Warning

Atoms, bonds, and conformers are stored inside the molecule. If the molecule is deallocated, any cached atoms, bonds, and conformers can no longer be accessed.

Unique Identifiers

This index can be used to distinguish one object from another, as it is unique among all objects of the same type in the same molecule. Indices are also stable, meaning a given object will have the same index throughout its lifetime, independent of any other molecule manipulations, e.g., reordering the molecule (OEMolBase.OrderAtoms), or the creation or deletion of other objects (OEMolBase.NewAtom or OEMolBase.DeleteAtom). The exceptions are the following methods.

OEMolBase.Sweep

OEMolBase.Compress

OEMolBase.UnCompress

OEMCMolBase.SweepConfs

These methods were designed with the intent of changing something about the underlying structure of the molecule, and thus are allowed to invalidate indices if they deem it necessary.

Indices can be assumed to be dense small integers greater than or equal to zero and less than the values returned by the following methods.

OEMolBase.GetMaxAtomIdx for atoms

OEMolBase.GetMaxBondIdx for bonds

OEMCMolBase.GetMaxConfIdx for conformers

The index assigned to a new atom, bond, or conformer by OEMolBase.NewAtom, OEMolBase.NewBond, or OEMCMolBase.NewConf, respectively, is guaranteed to be greater than or equal to the value that the corresponding method above returned before the object was created.
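The bounds described above can be observed with a short sketch. It is illustrative only: the class name IndexBounds and the SMILES string are arbitrary choices made here, and no particular printed values are claimed. It deletes one atom, prints the live atom count next to the index bound, and then compares the index of a newly created atom with the bound read just before its creation.

```csharp
using System;
using OpenEye.OEChem;

// Hypothetical example class; not part of OEChem TK.
public class IndexBounds
{
    public static void Main(string[] argv)
    {
        OEGraphMol mol = new OEGraphMol();
        OEChem.OESmilesToMol(mol, "CCO");

        // Keep a reference to one atom so it can be deleted after the loop.
        OEAtomBase doomed = null;
        foreach (OEAtomBase atom in mol.GetAtoms())
        {
            doomed = atom;
            break;
        }
        mol.DeleteAtom(doomed);

        // NumAtoms() counts the remaining atoms, while GetMaxAtomIdx()
        // is an upper bound on every remaining atom index.
        Console.WriteLine("NumAtoms      = {0}", mol.NumAtoms());
        Console.WriteLine("GetMaxAtomIdx = {0}", mol.GetMaxAtomIdx());

        // Read the bound first, then create a new carbon atom.
        uint boundBefore = mol.GetMaxAtomIdx();
        OEAtomBase added = mol.NewAtom(6);
        Console.WriteLine("new index {0}, bound before creation {1}",
                          added.GetIdx(), boundBefore);
    }
}
```

Because the atom count and the index bound can differ once an atom has been deleted, arrays indexed by atom index should be sized with GetMaxAtomIdx rather than NumAtoms.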

Parallel Data Structures

Indices are ideal for indexing into densely packed arrays of information about the molecule. Many OEChem TK functions use them to this end, e.g., OEAddMols, OESubsetMol, or OEDetermineComponents. The coordinates of a molecule are retrieved as an array of floating point values indexed by the atom indices. The following example demonstrates how to use the atom indices for a rudimentary XYZ file format writer.

Listing 1: Rudimentary XYZ writer

using System;
using OpenEye.OEChem;

public class XYZWriter
{
    public static int Main(string[] args)
    {
        if (args.Length != 1)
        {
            OEChem.OEThrow.Usage("XYZWriter <input>");
        }

        oemolistream ifs = new oemolistream();

        if (!ifs.open(args[0]))
        {
            OEChem.OEThrow.Fatal("Unable to open " + args[0]);
        }

        OEGraphMol mol = new OEGraphMol();
        while (OEChem.OEReadMolecule(ifs, mol))
        {
            Console.WriteLine(mol.NumAtoms());
            Console.WriteLine(mol.GetTitle());

            float[] coords = new float[mol.GetMaxAtomIdx() * 3];
            mol.GetCoords(coords);

            foreach (OEAtomBase atom in mol.GetAtoms())
            {
                uint idx = atom.GetIdx();

                Console.WriteLine("{0,-3}{1,11:0.00000}{2,11:0.00000}{3,11:0.00000}",
                                  OEChem.OEGetAtomicSymbol(atom.GetAtomicNum()),
                                  coords[idx * 3],
                                  coords[idx * 3 + 1],
                                  coords[idx * 3 + 2]);
            }
        }
        return 0;
    }
}

Computers are very efficient at this sort of direct array lookup. Whenever an efficient temporary data structure is needed to track information about a molecule, indices should be used.
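As a further illustration of a temporary per-atom data structure, the following sketch tallies how many bonds touch each atom in an array indexed by atom index. The class name DegreeTable and the array name degree are made up for this example; the approach is one possible use of indices, not a prescribed OEChem TK idiom.

```csharp
using System;
using OpenEye.OEChem;

// Hypothetical example class; not part of OEChem TK.
public class DegreeTable
{
    public static void Main(string[] argv)
    {
        OEGraphMol mol = new OEGraphMol();
        OEChem.OESmilesToMol(mol, "CCCC=O");

        // Size the array by GetMaxAtomIdx(), not NumAtoms(),
        // because every atom index is less than GetMaxAtomIdx().
        int[] degree = new int[mol.GetMaxAtomIdx()];

        // Each bond contributes one to the tally of both of its end atoms.
        foreach (OEBondBase bond in mol.GetBonds())
        {
            degree[bond.GetBgnIdx()]++;
            degree[bond.GetEndIdx()]++;
        }

        // Read the tallies back by iterating over the atoms themselves,
        // using the index only to address the parallel array.
        foreach (OEAtomBase atom in mol.GetAtoms())
        {
            Console.WriteLine("{0} {1}", atom.GetIdx(), degree[atom.GetIdx()]);
        }
    }
}
```

Note the direction of the lookup: the index addresses the array, and the atoms are always obtained by iteration, never by searching the molecule for an index.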

Indices for Molecule Lookup Considered Harmful

Note that atom, bond, and conformer indices are not guaranteed to be sequential, or even created sequentially, and hence atom indices cannot and should not be used to retrieve all of the atoms of a molecule. Even typing the following idiom may invalidate any chance of support and eliminate any glimmer of respect from OpenEye, Cadence Molecular Sciences, or the computational chemistry/cheminformatics community.

Warning

// Never ever, ever do this!!!
for (uint i=0; i<=mol.NumAtoms(); ++i)
{
   OEAtomBase atom = mol.GetAtom(new OEHasAtomIdx(i));
   // pretend atom is valid
}

There are far more efficient methods of crashing computer software that should be used instead.

A common misconception is that OEChem TK indices should be stored in order to refer back to an atom in the molecule. This leads to the idiom used in Listing 2, which is technically legal OEChem TK but just as insidious as the previous code snippet. The code mimics a common technique: caching particular atoms based on some property that is expensive to calculate. The property in this instance is whether the atom is alpha-beta unsaturated.

Warning

Listing 2: Evil atom cache

using System;
using System.Collections.Generic;
using OpenEye.OEChem;

public class AtomSubsetEvil
{
    public static void Main(string[] argv) 
    {
        OEGraphMol mol = new OEGraphMol();
        OEChem.OESmilesToMol(mol, "CCCC=O");

        List<uint> acache = new List<uint>(); // cache of atoms
        foreach (OEAtomBase atom in mol.GetAtoms(new OEHasAlphaBetaUnsat()))
        {
            acache.Add(atom.GetIdx()); // evil!
        }

        // pretend this code is deep in some inner loop that needs to go fast
        foreach (uint aidx in acache)
        {
            OEAtomBase catom = mol.GetAtom(new OEHasAtomIdx(aidx)); // O(n) lookup!

            // do something with the cached atom "catom"
            catom.SetName("Hello World");
        }
    }
}

The OEMolBase.GetAtom method performs a linear, \(O(mol.NumAtoms())\), search over the molecule looking for the first atom that matches the predicate. This has a multiplicative effect when GetAtom is called once for each atom in a loop. The resulting algorithm is quadratic, \(O(mol.NumAtoms()^2)\), possibly destroying any benefit of caching the user’s expensive per-atom calculation.
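The cost can be written out explicitly. The symbols \(n\) and \(k\) are introduced here for illustration only: \(n\) stands for mol.NumAtoms() and \(k\) for the number of cached indices. Each of the \(k\) lookups is a linear search, so

\[
T(n, k) = \sum_{j=1}^{k} O(n) = O(kn), \qquad T(n, n) = O(n^2).
\]

The quadratic bound is therefore the worst case, reached when the number of cached indices grows in proportion to the size of the molecule.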

The proper way to store references to atoms for later use is to store the OEAtomBase itself. It is guaranteed that these atoms will be valid for the lifetime of the molecule.

This is regardless of molecule manipulations, e.g., reordering the molecule (OEMolBase.OrderAtoms), or the creation or deletion of other atoms (OEMolBase.NewAtom or OEMolBase.DeleteAtom). The Sweep and Compress methods are again exceptions to this rule.

Listing 3 demonstrates the proper way to create a cache of atoms. It is important to remember that the OEAtomBases in the container are only valid while the molecule exists. Using the atoms after the molecule is destroyed is undefined behavior (usually a segmentation fault).

Listing 3: Atom caching

using System;
using System.Collections.Generic;
using OpenEye.OEChem;

public class AtomSubsetGood
{
    public static void Main(string[] argv) 
    {
        OEGraphMol mol = new OEGraphMol();
        OEChem.OESmilesToMol(mol, "CCCC=O");

        List<OEAtomBase> acache = new List<OEAtomBase>(); // cache of atoms
        foreach (OEAtomBase atom in mol.GetAtoms(new OEHasAlphaBetaUnsat()))
        {
            acache.Add(atom);
        }

        // pretend this code is deep in some inner loop that needs to go fast
        foreach (OEAtomBase catom in acache)
        {
            // do something with the cached atom "catom"
            catom.SetName("Hello World");
        }
    }
}