Atom and Bond Traversal

OEChem molecules contain atoms and bonds, whose APIs are described by the OEAtomBase and OEBondBase abstract base classes respectively. Atoms and bonds in OEChem can only be created and destroyed in the context of an OEChem molecule. While they can be accessed as pointers through various member functions of molecules, their memory is owned by the molecules and they are deallocated during the molecules’ destruction. Attempting to use pointers or references to atoms or bonds of a molecule after the molecule has gone out of scope results in undefined behavior.
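The ownership rule can be illustrated without OEChem itself. The following is a minimal sketch using hypothetical stand-in types (SketchAtom and SketchMol are not OEChem classes). It shows one safe pattern: copy out the plain values you need while the owning molecule is still alive, rather than keeping atom pointers beyond its lifetime.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not OEChem types.
struct SketchAtom {
  unsigned int atomicNum;
};

// The molecule owns its atoms; they are destroyed together with it.
class SketchMol {
public:
  SketchAtom *NewAtom(unsigned int atomicNum) {
    atoms_.push_back(std::unique_ptr<SketchAtom>(new SketchAtom{atomicNum}));
    return atoms_.back().get();  // non-owning pointer, valid only while *this lives
  }
  const std::vector<std::unique_ptr<SketchAtom>> &Atoms() const { return atoms_; }

private:
  std::vector<std::unique_ptr<SketchAtom>> atoms_;
};

// Safe: returns copies of the values, not pointers into the molecule.
std::vector<unsigned int> CollectAtomicNums() {
  std::vector<unsigned int> nums;
  {
    SketchMol mol;  // heavy atoms of furan: C C O C C
    for (unsigned int n : {6u, 6u, 8u, 6u, 6u})
      mol.NewAtom(n);
    for (const auto &atom : mol.Atoms())
      nums.push_back(atom->atomicNum);
  }             // mol and all of its atoms are destroyed here
  return nums;  // still valid: plain values were copied out
}
```

Returning a SketchAtom pointer from CollectAtomicNums instead would leave the caller with a dangling pointer, which is the situation the paragraph above warns about.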

Iterators

The standard way of processing each item or member of a set or collection in OEChem is by the use of an iterator. Iterators are a common abstraction (or design pattern) in object-oriented programming because they hide the way the collection/container is implemented from the user. Hence a set of atoms could be implemented internally as an array, a linked list, a hash table, or any similar data structure, but its behavior as seen by the programmer is independent of the actual implementation. An iterator can be thought of as a current position indicator.

OEChem iterators make use of C++’s template mechanism. The use of templates allows the functionality of an iterator to be specified (implemented) independently of the type of the collection being iterated over. An iterator over a type T has the type OEIter<T>. Hence, an iterator over the atoms of a molecule (represented by OEAtomBase) has the type OEIter<OEAtomBase>, and an iterator over the bonds of a molecule has the type OEIter<OEBondBase>.

The three most common operations of an OEIter are assignment, testing, and increment. These three operations allow loops over OEChem iterators to resemble conventional for loops in high-level programming languages. Assignment specifies which collection/container the iterator is intended to loop over, testing determines whether the iterator has seen all of the items, and increment advances the iterator to the next position.
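To make the three operations concrete, here is a minimal sketch of a "current position indicator" over a std::vector. SketchIter and SketchAtom are hypothetical names invented for this illustration. This is not how OEIter is implemented; it only reproduces the assign, test, increment shape of the loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch, not OEChem's OEIter.
template <typename T>
class SketchIter {
public:
  // Assignment/initialization: choose the collection to loop over.
  SketchIter(std::vector<T> &items) : items_(&items), pos_(0) {}
  // Testing: true while there are items left to visit.
  explicit operator bool() const { return pos_ < items_->size(); }
  // Increment (prefix form only): advance to the next position.
  SketchIter &operator++() { ++pos_; return *this; }
  // Access to the item at the current position.
  T *operator->() const { return &(*items_)[pos_]; }

private:
  std::vector<T> *items_;
  std::size_t pos_;
};

// Hypothetical stand-in for an atom type.
struct SketchAtom {
  unsigned int atomicNum;
};

// The loop header reads: assign; test; increment.
unsigned int SumAtomicNums(std::vector<SketchAtom> &atoms) {
  unsigned int total = 0;
  for (SketchIter<SketchAtom> atom = atoms; atom; ++atom)
    total += atom->atomicNum;
  return total;
}
```

The body of SumAtomicNums never mentions std::vector operations directly, so the storage behind SketchIter could change without touching the loop.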

One possible source of confusion is that most functions and methods that return an iterator actually return a result of type OEIterBase<T> rather than OEIter<T>. The template class OEIterBase<T> is an internal abstraction used by OEChem, and should be treated as an opaque type by the user. Suffice it to say that values of type OEIterBase<T> can be assigned to variables of type OEIter<T> as created by the user.
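The relationship between an opaque base type and a user-created iterator can be modelled as follows. All names here (SketchIterBase, SketchVectorIter, SketchIter, GetItems) are hypothetical. The sketch also assumes that the hand-off happens through a heap-allocated base pointer that the user-side handle takes ownership of; the text above does not state how OEChem does this internally.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Abstract traversal interface, playing the role of the opaque base type.
template <typename T>
class SketchIterBase {
public:
  virtual ~SketchIterBase() {}
  virtual bool Valid() const = 0;
  virtual void Next() = 0;
  virtual T *Current() const = 0;
};

// One concrete traversal; callers never need to name this type.
template <typename T>
class SketchVectorIter : public SketchIterBase<T> {
public:
  explicit SketchVectorIter(std::vector<T> &v) : v_(v), pos_(0) {}
  bool Valid() const override { return pos_ < v_.size(); }
  void Next() override { ++pos_; }
  T *Current() const override { return &v_[pos_]; }

private:
  std::vector<T> &v_;
  std::size_t pos_;
};

// User-facing handle: takes ownership of the base pointer it is given.
template <typename T>
class SketchIter {
public:
  SketchIter(SketchIterBase<T> *impl) : impl_(impl) {}
  explicit operator bool() const { return impl_ && impl_->Valid(); }
  SketchIter &operator++() { impl_->Next(); return *this; }
  T &operator*() const { return *impl_->Current(); }

private:
  std::unique_ptr<SketchIterBase<T>> impl_;  // freed when the handle goes out of scope
};

// Factory in the style of a "Get..." method: returns the opaque base type.
SketchIterBase<int> *GetItems(std::vector<int> &v) {
  return new SketchVectorIter<int>(v);
}

int SumItems(std::vector<int> &v) {
  int total = 0;
  for (SketchIter<int> i = GetItems(v); i; ++i)
    total += *i;
  return total;
}
```

The user only ever writes SketchIter<int>; the concrete traversal class stays hidden behind the base interface.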

Another technical point is that OEChem iterators support only the prefix ++ operator, not the postfix ++ operator. To advance an iterator, users must therefore write ++i and not i++. The reason is performance: in C and C++, i++ must make a copy of its operand in order to support syntax such as j = i++, where j is assigned the value of i before the increment. This copy may be expensive and must be made even if the value is never used. For primitive types such as integers, most C/C++ compilers can determine that the value is unused and optimize i++ to ++i. For C++ classes, however, most compilers are unable to perform this optimization, hence ++i is the preferred idiom. Even if OEChem changed the semantics of i++ to do the same thing as ++i and return the value after the increment, the i++ form would still be marginally less efficient, because an “invisible” integer argument has to be passed to the operator. Hence OpenEye’s policy is to implement only the “correct” behavior, in the hope that users of OEChem will adopt ++i as good coding style even for integer loops.
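The cost difference between the two forms can be observed with a small class that counts its own copies. CountingPos is a hypothetical type written for this illustration. Unlike OEIter, it implements both operators so that they can be compared.

```cpp
#include <cstddef>

// Hypothetical position type that counts how often it is copy-constructed.
struct CountingPos {
  static int copies;  // number of copy-constructions so far
  std::size_t index;

  CountingPos() : index(0) {}
  CountingPos(const CountingPos &other) : index(other.index) { ++copies; }
  CountingPos(CountingPos &&other) noexcept : index(other.index) {}  // moves are not counted

  // Prefix form: advance in place and return a reference; no copy is needed.
  CountingPos &operator++() {
    ++index;
    return *this;
  }

  // Postfix form: the unused int parameter only selects this overload.
  // The old value has to be saved so that it can be returned.
  CountingPos operator++(int) {
    CountingPos old(*this);  // the copy that the prefix form avoids
    ++index;
    return old;
  }
};

int CountingPos::copies = 0;

// Number of copies made when advancing n times with prefix ++.
int CopiesWithPrefix(int n) {
  CountingPos::copies = 0;
  CountingPos p;
  for (int k = 0; k < n; ++k)
    ++p;
  return CountingPos::copies;
}

// Number of copies made when advancing n times with postfix ++.
int CopiesWithPostfix(int n) {
  CountingPos::copies = 0;
  CountingPos p;
  for (int k = 0; k < n; ++k)
    p++;  // result discarded, but the copy inside operator++(int) still happens
  return CountingPos::copies;
}
```

Because the copy constructor here has a visible side effect, the compiler cannot silently drop it; for a real iterator class the copy may involve considerably more work than copying one integer.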

Finally, the template OEIter is defined in the OESystem namespace rather than the OEChem namespace. This is because iterators (like random number generators) are not chemistry specific, and the use of two namespaces makes this explicit. It does, however, mean that OEIter must either be written with its namespace as OESystem::OEIter or be brought into scope with using namespace OESystem;, as shown in our examples.

Atom and Bond Iteration

Listing 1 shows the minimal use of OEChem’s iterators. The example uses the OEMolBase methods GetAtoms and GetBonds, which return iterators over the atoms and bonds of a molecule, respectively.

Listing 1: Using iterators to loop over atoms and bonds

#include "openeye.h"
#include "oechem.h"
#include "oesystem.h"

using namespace OEChem;
using namespace OESystem;
using namespace std;

int main() 
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1cocc1");

  cout << "atoms" << endl;
  for (OEIter<OEAtomBase> atom = mol.GetAtoms(); atom; ++atom)
    cout << atom->GetAtomicNum() << endl;

  cout << "bonds" << endl;
  for (OEIter<OEBondBase> bond = mol.GetBonds(); bond; ++bond)
    cout << bond->GetOrder() << endl;

  return 0;
}

One point to notice is that once again C++’s destructors mean that it is not necessary to explicitly deallocate or destroy the iterator after use. Once the variable goes out of scope, it is cleaned up automatically.

Note

Listing 1 introduced the GetAtomicNum and GetOrder methods. These and other OEAtomBase and OEBondBase methods will be covered in more detail in chapters Atom Properties and Bond Properties, respectively.

Bonds of an Atom Iteration

The exact same idiom is used for iterating over the bonds attached to an atom. The GetBonds method of an OEAtomBase returns an iterator over the bonds connected to that atom. Listing 2 shows how to use this iterator to determine the explicit degree of an atom, i.e. the number of bonds to it, not including bonds to implicit hydrogen atoms.

Listing 2: Looping over the bonds of an atom

#include "openeye.h"
#include "oechem.h"
#include "oesystem.h"

using namespace OEChem;
using namespace OESystem;

unsigned int MyGetExplicitDegree(const OEAtomBase *atm)
{
  unsigned int result = 0;
  for (OEIter<OEBondBase> bond = atm->GetBonds(); bond; ++bond)
    ++result;
  return result;
}

int main() 
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1cocc1Br");
  
  for (OEIter<OEAtomBase> atom=mol.GetAtoms(); atom; ++atom)
    std::cout << "Atom "        << atom->GetIdx() << 
                 " has degree " << MyGetExplicitDegree(atom) << std::endl;

  return 0;
}

Atom Neighbor Iteration

Often it is not the bonds around the atoms that you wish to loop over, but the neighboring atoms. One way to do this would be to use the GetBonds method described in the previous section and use the GetNbr method on each OEBondBase to get the atom across the bond from the input atom.

Listing 3: Finding the neighbors of an atom (version 1)

#include "openeye.h"
#include "oechem.h"
#include "oesystem.h"

using namespace OEChem;
using namespace OESystem;
using namespace std;

int main() 
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1cocc1Br");
  
  for (OEIter<OEAtomBase> atom=mol.GetAtoms(); atom; ++atom)
  {
    cout << "Atom: " << atom->GetIdx() << " Neighbors:";
    for (OEIter<OEBondBase> bond = atom->GetBonds(); bond; ++bond)
      cout << " " << bond->GetNbr(atom)->GetIdx();
    cout << endl;
  }
  return 0;
}

However, this can be done even more conveniently by calling the GetAtoms method of an OEAtomBase directly, which returns an iterator over the neighboring atoms.

Listing 4: Finding the neighbors of an atom (version 2)

#include "openeye.h"
#include "oechem.h"
#include "oesystem.h"

using namespace OEChem;
using namespace OESystem;
using namespace std;

int main() 
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1cocc1Br");
  
  for (OEIter<OEAtomBase> atom=mol.GetAtoms(); atom; ++atom)
  {
    cout << "Atom: " << atom->GetIdx() << " Neighbors:";
    for (OEIter<OEAtomBase> nbor = atom->GetAtoms(); nbor; ++nbor)
      cout << " " << nbor->GetIdx();
    cout << endl;
  }
  return 0;
}

Atom or Bond Subset Iteration

It can sometimes be useful to loop over a subset of the atoms or bonds of a molecule. Traditionally, this is done with if statements inside a loop, but it can sometimes be cleaner and more convenient to subset the members being looped over inside the iterator itself. To do this, many of OEChem’s iterator generation functions (such as GetAtoms) can take an argument which determines which subset of the objects to loop over. These arguments are called functors and are detailed in the chapter Predicates Functors. The details of functors are not important here. Instead, a programmer can simply use the predefined functors to control their loops. Listing 5 shows the use of the predicate OEHasAtomicNum to loop over only carbon atoms in a molecule.

Listing 5: Looping over carbon atoms only

#include "openeye.h"
#include "oechem.h"

using namespace OESystem;
using namespace OEChem;
using namespace std;

int main() 
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1c(Br)occ1CCC");
  
  cout << "Carbon atoms:";
  OEIter<OEAtomBase> atom;
  for (atom=mol.GetAtoms(OEHasAtomicNum(OEElemNo::C)); atom; ++atom)
    cout << ' ' << atom->GetIdx();
  cout << endl;

  return 0;
}
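To give a rough idea of what a predicate functor does, the following self-contained sketch filters a list of atoms with a callable object. SketchAtom, SketchHasAtomicNum and MatchingIdx are hypothetical names. This is only an analogy for OEHasAtomicNum and does not reflect how OEChem applies predicates inside its iterators.

```cpp
#include <vector>

// Hypothetical stand-in for an atom; not OEChem's OEAtomBase.
struct SketchAtom {
  unsigned int idx;
  unsigned int atomicNum;
};

// A predicate functor: an object that can be called like a function
// and answers "does this atom belong to the subset?".
class SketchHasAtomicNum {
public:
  explicit SketchHasAtomicNum(unsigned int num) : num_(num) {}
  bool operator()(const SketchAtom &atom) const { return atom.atomicNum == num_; }

private:
  unsigned int num_;
};

// Returns the indices of the atoms accepted by the predicate.
// Any callable taking a const SketchAtom& and returning bool will do.
template <typename Pred>
std::vector<unsigned int> MatchingIdx(const std::vector<SketchAtom> &atoms, Pred pred) {
  std::vector<unsigned int> result;
  for (const SketchAtom &atom : atoms)
    if (pred(atom))
      result.push_back(atom.idx);
  return result;
}
```

Calling MatchingIdx(atoms, SketchHasAtomicNum(6)) selects the carbon atoms; swapping in a different functor changes the subset without changing the loop.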

See also

For a complete list of built-in predicates, see the Built-in Functors section.

Iterator Methods

The preceding examples show how to use an OEChem iterator to loop over objects. OEChem iterators also provide operators that give the user access to the object at the current iterator position.

The implicit cast operator T * or operator-> may be used to get a pointer to the current object. Also, operator* may be used to get a reference to the current object.

For example, if variable iter has type OEIter<T>, then (T*)iter is a pointer to the current object of type T *, and *iter is a reference to the current object of type T&. These operators mean that in most cases an OEChem iterator OEIter<T> behaves identically to a T *.

The following two examples demonstrate how iterators and pointers behave similarly. They are functionally equivalent. The only difference is that the second example assigns the iterator to an OEAtomBase pointer before calling the GetAtomicNum method.

Dereferencing iterators

#include "openeye.h"
#include "oechem.h"
#include "oesystem.h"

using namespace OEChem;
using namespace OESystem;
using namespace std;

int main() 
{
  OEGraphMol mol;
  OESmilesToMol(mol,"c1ccccc1");

  for (OEIter<OEAtomBase> atom = mol.GetAtoms(); atom; ++atom)
  {
    cout << atom->GetAtomicNum() << endl;
  }
  return 0;
}

Assigning iterators to pointers

#include "openeye.h"
#include "oechem.h"
#include "oesystem.h"

using namespace OEChem;
using namespace OESystem;
using namespace std;

int main() 
{
  OEGraphMol mol;
  OESmilesToMol(mol,"c1ccccc1");

  for (OEIter<OEAtomBase> atom = mol.GetAtoms(); atom; ++atom)
  {
    const OEAtomBase *aptr = atom;
    cout << aptr->GetAtomicNum() << endl;
  }
  return 0;
}

The implicit cast of OEIter<T> to T* is most useful when passing the object to a function which takes T by pointer.

Passing iterators to functions

#include "openeye.h"
#include "oechem.h"
#include "oesystem.h"

using namespace OEChem;
using namespace OESystem;
using namespace std;

void PrintAtom(const OEAtomBase *atom)
{
  cout << atom->GetAtomicNum();

  if(atom->IsAromatic())
    cout << " Is Aromatic";
  else
    cout << " Isn't Aromatic";

  cout << endl;
}

int main() 
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1ccccc1");

  for(OEIter<OEAtomBase> atom = mol.GetAtoms(); atom; ++atom)
    PrintAtom(atom);

  return 0;
}
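The three access forms described in this section (implicit cast to T *, operator->, and operator*) can be modelled in a few lines. The SketchIter class below is a hypothetical stand-in that merely wraps a pointer. It is not OEChem's OEIter, but it shows why such an object can be used almost anywhere a T * is expected.

```cpp
// Hypothetical stand-in for an atom type.
struct SketchAtom {
  unsigned int atomicNum;
};

// Hypothetical handle offering the three access forms.
template <typename T>
class SketchIter {
public:
  explicit SketchIter(T *current) : current_(current) {}
  operator T *() const { return current_; }   // implicit cast to a pointer
  T *operator->() const { return current_; }  // member access on the current object
  T &operator*() const { return *current_; }  // reference to the current object

private:
  T *current_;
};

unsigned int ViaPointer(const SketchAtom *atom) { return atom->atomicNum; }
unsigned int ViaReference(const SketchAtom &atom) { return atom.atomicNum; }

// Reads the same atom through every access form and checks that they agree.
bool AllFormsAgree(SketchAtom &a) {
  SketchIter<SketchAtom> it(&a);
  const SketchAtom *p = it;  // implicit cast, as when assigning to a pointer
  return ViaPointer(it) == it->atomicNum &&  // implicit cast at a call site
         ViaReference(*it) == p->atomicNum;  // dereference to a reference
}
```

The conversion operator is what lets the handle be passed directly to ViaPointer, mirroring the call PrintAtom(atom) in the listing above.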

Iterators offer more than simple forward traversal. For example, an iterator can be reused by calling the ToFirst method, and the order of iteration can be rearranged with the Sort method.

The following table describes the full set of features offered by iterators.

Description              C++ Code
Increment                ++i
Increment by n           i += n
Decrement                --i
Decrement by n           i -= n
Go to first              i.ToFirst()
Go to last               i.ToLast()
Access current object    i->MethodName()
Validity                 if ((bool)i)
Sorting                  i.Sort(predicate)

Listing 6 shows how to use an OEAtomBase iterator to loop over the atoms in a molecule in reverse order and print their atomic numbers.

Note

The order of the atoms returned by OEMolBase::GetAtoms can be controlled by OEMolBase::OrderAtoms.

Listing 6: Looping over atoms in reverse order

#include "openeye.h"
#include "oechem.h"
#include "oesystem.h"

using namespace OEChem;
using namespace OESystem;

int main() 
{
  OEGraphMol mol;
  OESmilesToMol(mol, "n1ccccc1");
  
  OEIter<OEAtomBase> atom=mol.GetAtoms();
  for (atom.ToLast(); atom; --atom)
    std::cout << atom->GetAtomicNum() << std::endl;

  return 0;
}