Atom and Bond Traversal
OEChem TK molecules contain atoms and bonds, whose APIs are described by the OEAtomBase and OEBondBase abstract base classes, respectively. Atoms and bonds in OEChem TK can only be created and destroyed in the context of an OEChem TK molecule. While they can be accessed as pointers through various member functions of molecules, their memory is owned by the molecules and they are deallocated during the molecules’ destruction. Attempting to use pointers or references to atoms or bonds of a molecule after the molecule has gone out of scope results in undefined behavior.
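The ownership rule above can be modeled with a small standard-library sketch. ToyAtom, ToyMol, and CopyAtomicNums are hypothetical names invented for this illustration and are not OEChem TK classes; the sketch only assumes that the container owns its atoms and frees them in its destructor, which is the behavior the paragraph describes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration only; not OEChem TK classes.
struct ToyAtom
{
  unsigned int atomicNum;
};

class ToyMol
{
public:
  // Atoms are created through the molecule, which keeps ownership of them.
  ToyAtom *NewAtom(unsigned int atomicNum)
  {
    atoms_.push_back(std::unique_ptr<ToyAtom>(new ToyAtom{atomicNum}));
    return atoms_.back().get();
  }

  std::size_t NumAtoms() const { return atoms_.size(); }

  // Returns a non-owning pointer that is valid only while this molecule lives.
  const ToyAtom *GetAtom(std::size_t idx) const
  {
    assert(idx < atoms_.size());
    return atoms_[idx].get();
  }

private:
  std::vector<std::unique_ptr<ToyAtom>> atoms_; // freed together with the molecule
};

// Safe pattern: copy the needed values out while the molecule is still alive.
std::vector<unsigned int> CopyAtomicNums()
{
  std::vector<unsigned int> result;
  {
    ToyMol mol;
    mol.NewAtom(6);
    mol.NewAtom(8);
    for (std::size_t i = 0; i < mol.NumAtoms(); ++i)
      result.push_back(mol.GetAtom(i)->atomicNum);
  } // mol is destroyed here; any ToyAtom pointer taken from it would now dangle
  return result;
}
```

The point of the sketch is that only plain values leave the inner scope. Keeping a ToyAtom pointer past the closing brace would be the toy equivalent of the undefined behavior described above.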
Iterators
The standard way of processing each item or member of a set or collection in OEChem TK is by the use of an iterator. The use of iterators is a common abstraction (or design pattern) in object oriented programming because it hides the way the collection/container is implemented from the user. Hence a set of atoms could be implemented internally as an array, a linked list, a hash table, or any similar data structure, but its behavior to the programmer is independent of the actual implementation. An iterator can be thought of as a current position indicator.
OEChem TK iterators make use of C++’s template mechanism. The use of templates allows the functionality of an iterator to be specified (implemented) independently of the type of the collection being iterated over. An iterator over a type T has the type OEIter<T>. Hence, an iterator over the atoms of a molecule (represented by OEAtomBase) has the type OEIter<OEAtomBase> and an iterator over the bonds of a molecule has the type OEIter<OEBondBase>.
The three most common operations of an OEIter are assignment, testing, and increment. These three operations allow OEChem TK iterators to resemble conventional for loops in high-level programming languages. Assignment specifies which collection/container the iterator is intended to loop over, testing determines whether the iterator has seen all of the items, and increment advances the iterator to the next position.
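As a rough illustration of these three operations, the following sketch defines a toy iterator over a std::vector. MiniIter and SumAll are hypothetical names invented for this example. This is not how OEIter<T> is implemented; it is only a minimal model of assignment, testing, and prefix increment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy iterator for illustration; OEIter<T> is implemented differently.
template <class T>
class MiniIter
{
public:
  MiniIter() : items_(nullptr), pos_(0) {}

  // Assignment: choose the collection to loop over and start at its first item.
  MiniIter &operator=(const std::vector<T> &items)
  {
    items_ = &items;
    pos_ = 0;
    return *this;
  }

  // Testing: true while the current position still refers to an item.
  explicit operator bool() const
  {
    return items_ != nullptr && pos_ < items_->size();
  }

  // Increment: advance to the next position (prefix form only).
  MiniIter &operator++()
  {
    ++pos_;
    return *this;
  }

  // Access the item at the current position.
  const T &operator*() const
  {
    assert(*this);
    return (*items_)[pos_];
  }

private:
  const std::vector<T> *items_; // non-owning; the vector must outlive the iterator
  std::size_t pos_;
};

// The loop header mirrors the three operations: assign; test; increment.
unsigned int SumAll(const std::vector<unsigned int> &nums)
{
  unsigned int total = 0;
  MiniIter<unsigned int> it;
  for (it = nums; it; ++it)
    total += *it;
  return total;
}
```

The loop in SumAll has the same shape as the for loops used with OEIter in the listings of this chapter, even though the collection behind it is different.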
One possible source of confusion is that most functions and methods that return an iterator actually return a result of type OEIterBase<T> rather than OEIter<T>. The template class OEIterBase<T> is an internal abstraction used by OEChem TK, and should be treated as an opaque type by the user. Suffice it to say that values of type OEIterBase<T> can be assigned to variables of type OEIter<T> as created by the user.
Another technical point is that OEChem TK iterators only support the prefix ++ operator, and not the suffix ++ operator. This means that in order to advance the iterator, users must write ++i and not i++. This is actually a performance issue, since in C and C++ the operator i++ must make a copy of its argument. This is to support the syntax j = i++ where j is assigned the value of i before the increment. This copying may potentially be expensive and must be performed even if the value is not assigned. For primitive types such as integers, most C/C++ compilers can determine the value is not used and optimize i++ to ++i. However, for C++ classes, most compilers are unable to perform this optimization, hence ++i is the preferred idiom. Even if OEChem TK changed the semantics of i++ to perform the same thing as ++i and return the value after the increment, the i++ form would still be marginally less efficient (requiring an “invisible” integer argument to be passed to the operator). Hence OpenEye’s policy is to only implement the “correct” behavior and hope that users of OEChem TK will adopt ++i even for integer loops as good coding style.
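The cost difference between the two forms can be made visible with a small class that counts its own copies. CountingCursor, CopiesFromPrefix, and CopiesFromPostfix are hypothetical names used only for this sketch; the class defines a postfix operator purely for comparison, which OEChem TK iterators deliberately do not.

```cpp
#include <cassert>

// Hypothetical cursor type that records how often it is copied.
struct CountingCursor
{
  int pos;
  static int copies;

  CountingCursor() : pos(0) {}
  CountingCursor(const CountingCursor &other) : pos(other.pos) { ++copies; }

  // Prefix form: advance in place and return a reference; no copy is made.
  CountingCursor &operator++()
  {
    ++pos;
    return *this;
  }

  // Postfix form: the unused int parameter only distinguishes it from the
  // prefix form. The old value must be copied so that it can be returned.
  CountingCursor operator++(int)
  {
    CountingCursor old(*this);
    ++pos;
    return old;
  }
};

int CountingCursor::copies = 0;

// Number of copies made while advancing with ++c.
int CopiesFromPrefix(int steps)
{
  CountingCursor::copies = 0;
  CountingCursor c;
  for (int i = 0; i < steps; ++i)
    ++c;
  assert(c.pos == steps);
  return CountingCursor::copies;
}

// Number of copies made while advancing with c++, even though the
// returned value is discarded every time.
int CopiesFromPostfix(int steps)
{
  CountingCursor::copies = 0;
  CountingCursor c;
  for (int i = 0; i < steps; ++i)
    c++;
  assert(c.pos == steps);
  return CountingCursor::copies;
}
```

With this class, the prefix loop makes no copies, while the postfix loop makes at least one copy per step. The exact postfix count can depend on whether the compiler elides the copy of the returned value.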
Finally, the template OEIter is defined in the OESystem namespace rather than the OEChem namespace. This is because iterators (like random number generators) are not chemistry specific, and the use of two namespaces makes this explicit. It does, however, mean that using namespace OESystem; (or the fully qualified name OESystem::OEIter) is required, as shown in our examples.
Atom and Bond Iteration
Listing 1 shows the minimal use of OEChem TK’s iterators. These examples use the OEMolBase methods GetAtoms and GetBonds, which return iterators over the atoms and bonds of a molecule, respectively.
Listing 1: Using iterators to loop over atoms and bonds
#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>

using namespace std;
using namespace OESystem;
using namespace OEChem;

int main()
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1cocc1");

  cout << "atoms" << endl;
  for (OEIter<const OEAtomBase> atom = mol.GetAtoms(); atom; ++atom)
    cout << atom->GetAtomicNum() << endl;

  cout << "bonds" << endl;
  for (OEIter<const OEBondBase> bond = mol.GetBonds(); bond; ++bond)
    cout << bond->GetOrder() << endl;

  return 0;
}
One point to notice is that once again C++’s destructors mean that it is not necessary to explicitly deallocate or destroy the iterator after use. Once the variable goes out of scope, it is cleaned up automatically.
Note
Listing 1 introduced the GetAtomicNum and GetOrder methods. These and other OEAtomBase and OEBondBase methods will be covered in more detail in chapters Atom Properties and Bond Properties, respectively.
Bonds of an Atom Iteration
The exact same idiom is used for iterating over the bonds attached to an atom. The GetBonds method of an OEAtomBase returns an iterator over the bonds connected to that atom. Listing 2 shows how to use this iterator to determine the explicit degree of an atom, i.e. the number of bonds to it, not including bonds to implicit hydrogen atoms.
Listing 2: Looping over the bonds of an atom
#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>

using namespace std;
using namespace OESystem;
using namespace OEChem;

unsigned int MyGetExplicitDegree(const OEAtomBase *atm)
{
  unsigned int result = 0;
  for (OEIter<const OEBondBase> bond = atm->GetBonds(); bond; ++bond)
    ++result;
  return result;
}

int main()
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1cocc1Br");

  for (OEIter<const OEAtomBase> atom = mol.GetAtoms(); atom; ++atom)
    cout << "Atom " << atom->GetIdx() <<
            " has degree " << MyGetExplicitDegree(atom) << endl;

  return 0;
}
Atom Neighbor Iteration
Often it is not the bonds around the atoms that you wish to loop over, but the neighboring atoms. One way to do this would be to use the GetBonds method described in the previous section and use the GetNbr method on each OEBondBase to get the atom across the bond from the input atom.
Listing 3: Finding the neighbors of an atom (version 1)
#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>

using namespace std;
using namespace OESystem;
using namespace OEChem;

int main()
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1cocc1Br");

  for (OEIter<const OEAtomBase> atom = mol.GetAtoms(); atom; ++atom)
  {
    cout << "Atom: " << atom->GetIdx() << " Neighbors:";
    for (OEIter<const OEBondBase> bond = atom->GetBonds(); bond; ++bond)
      cout << " " << bond->GetNbr(atom)->GetIdx();
    cout << endl;
  }

  return 0;
}
However, this can be done even more conveniently using the GetAtoms method of an OEAtomBase directly, which returns an iterator over the neighboring atoms.
Listing 4: Finding the neighbors of an atom (version 2)
#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>

using namespace std;
using namespace OESystem;
using namespace OEChem;

int main()
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1cocc1Br");

  for (OEIter<const OEAtomBase> atom = mol.GetAtoms(); atom; ++atom)
  {
    cout << "Atom: " << atom->GetIdx() << " Neighbors:";
    for (OEIter<const OEAtomBase> nbor = atom->GetAtoms(); nbor; ++nbor)
      cout << " " << nbor->GetIdx();
    cout << endl;
  }

  return 0;
}
Atom or Bond Subset Iteration
It can sometimes be useful to loop over a subset of the atoms or bonds of a molecule. Traditionally, this is done with if statements inside a loop, but it can sometimes be cleaner and more convenient to subset the members being looped over inside the iterator itself. To do this, many of OEChem TK’s iterator generation functions (such as GetAtoms) can take an argument that determines which subset of the objects to loop over. These arguments are called functors and are detailed in the chapter Predicate Functors. The details of how functors work are not important here. Instead, a programmer can simply use the predefined functors to control their loops.
Listing 5 shows the use of the predicate OEHasAtomicNum to loop over only carbon atoms in a molecule.
Listing 5: Looping over carbon atoms only
#include <openeye.h>
#include <oechem.h>

using namespace std;
using namespace OESystem;
using namespace OEChem;

int main()
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1c(Br)occ1CCC");

  cout << "Carbon atoms:";
  for (OEIter<const OEAtomBase> atom = mol.GetAtoms(OEHasAtomicNum(OEElemNo::C)); atom; ++atom)
    cout << ' ' << atom->GetIdx();
  cout << endl;

  return 0;
}
See also
For a complete list of built-in predicates, see the Built-in Functors section.
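The general idea of a predicate functor can be sketched without OEChem TK: a small callable object stores the value to match, and the traversal keeps only the items for which it returns true. ToyAtom, HasAtomicNum, and MatchingIndices below are hypothetical names for this illustration; they are not the OEChem TK types, and the real OEHasAtomicNum may be implemented quite differently.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical atom record for illustration only.
struct ToyAtom
{
  unsigned int idx;
  unsigned int atomicNum;
};

// Hypothetical predicate functor: a callable object that remembers
// which atomic number it should match.
class HasAtomicNum
{
public:
  explicit HasAtomicNum(unsigned int num) : num_(num) {}
  bool operator()(const ToyAtom &atom) const { return atom.atomicNum == num_; }

private:
  unsigned int num_;
};

// Visits only the atoms accepted by the predicate and collects their indices.
std::vector<unsigned int>
MatchingIndices(const std::vector<ToyAtom> &atoms,
                const std::function<bool(const ToyAtom &)> &pred)
{
  assert(pred); // an empty std::function cannot be called
  std::vector<unsigned int> result;
  for (const ToyAtom &atom : atoms)
    if (pred(atom))
      result.push_back(atom.idx);
  return result;
}
```

Because the filter is passed in as an argument, the same traversal can select carbons, halogens, or any other subset without changing the loop itself, which is the convenience the section describes.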
Iterator Methods
The preceding examples show how to use an OEChem TK iterator to loop over objects. OEChem TK iterators provide four operators to allow the user to access the object at the current iterator position.
The implicit cast operator T * or operator-> may be used to get a pointer to the current object. Also, operator* may be used to get a reference to the current object.
For example, if variable iter has type OEIter<T>, then (T*)iter is a pointer to the current object of type T *, and *iter is a reference to the current object of type T&.
These operators mean that in most cases an OEChem TK iterator OEIter<T> behaves identically to a T *.
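How a class can behave like a T * through these operators is shown in the following sketch. Cursor, ToyAtom, ViaPointer, and AllAccessPathsAgree are hypothetical names made up for this example; the sketch models only the access operators and says nothing about how OEIter<T> stores or advances its position.

```cpp
#include <cassert>

// Hypothetical atom record for illustration only.
struct ToyAtom
{
  unsigned int atomicNum;
};

// Hypothetical wrapper that exposes its current object the way a pointer would.
template <class T>
class Cursor
{
public:
  explicit Cursor(T *current) : current_(current) {}

  // Implicit cast to T*: lets the cursor be passed where a pointer is expected.
  operator T *() const { return current_; }

  // Member access on the current object.
  T *operator->() const
  {
    assert(current_ != nullptr);
    return current_;
  }

  // Reference to the current object.
  T &operator*() const
  {
    assert(current_ != nullptr);
    return *current_;
  }

private:
  T *current_;
};

// A function that takes its argument by pointer.
unsigned int ViaPointer(const ToyAtom *atom)
{
  return atom->atomicNum;
}

// Checks that every access path reaches the same object.
bool AllAccessPathsAgree(const ToyAtom &atom)
{
  Cursor<const ToyAtom> cur(&atom);
  const ToyAtom *aptr = cur; // uses the implicit cast
  return cur->atomicNum == atom.atomicNum
      && (*cur).atomicNum == atom.atomicNum
      && aptr == &atom
      && ViaPointer(cur) == atom.atomicNum;
}
```

The call ViaPointer(cur) compiles only because of the implicit conversion to a pointer.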
The following two examples demonstrate how iterators and pointers behave similarly. They are functionally equivalent. The only difference is that the second example assigns the iterator to an OEAtomBase pointer before calling the GetAtomicNum method.
Dereferencing iterators
#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>

using namespace std;
using namespace OESystem;
using namespace OEChem;

int main()
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1ccccc1");

  for (OEIter<const OEAtomBase> atom = mol.GetAtoms(); atom; ++atom)
    cout << atom->GetAtomicNum() << endl;

  return 0;
}

Assigning iterators to pointers
#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>

using namespace std;
using namespace OESystem;
using namespace OEChem;

int main()
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1ccccc1");

  for (OEIter<const OEAtomBase> atom = mol.GetAtoms(); atom; ++atom)
  {
    const OEAtomBase *aptr = atom;
    cout << aptr->GetAtomicNum() << endl;
  }

  return 0;
}
The implicit cast of OEIter<T> to T* is most useful when passing the object to a function which takes T by pointer.
Passing iterators to functions
#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>

using namespace std;
using namespace OESystem;
using namespace OEChem;

void PrintAtom(const OEAtomBase *atom)
{
  cout << atom->GetAtomicNum();
  if (atom->IsAromatic())
    cout << " Is Aromatic";
  else
    cout << " Isn't Aromatic";
  cout << endl;
}

int main()
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1ccccc1");

  for (OEIter<const OEAtomBase> atom = mol.GetAtoms(); atom; ++atom)
    PrintAtom(atom);

  return 0;
}
Iterators offer a much wider range of iteration possibilities. For example, the iterator can be reused by using the ToFirst method. Or, the order of iteration can be rearranged with the Sort method.
The following table describes the full set of features offered by iterators.
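| Description | C++ Code |
|---|---|
| Increment | ++iter |
| Increment by n | iter += n |
| Decrement | --iter |
| Decrement by n | iter -= n |
| Go to first | iter.ToFirst() |
| Go to last | iter.ToLast() |
| Access current object | iter->, *iter, (T*)iter |
| Validity | if (iter) |
| Sorting | iter.Sort(...) |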
Listing 6 shows how to use an OEAtomBase iterator to loop over the atoms in a molecule in reverse order and print their atomic numbers.
Note
The order of the atoms returned by OEMolBase::GetAtoms can be controlled by OEMolBase::OrderAtoms.
Listing 6: Looping over atoms in reverse order
#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>

using namespace std;
using namespace OESystem;
using namespace OEChem;

int main()
{
  OEGraphMol mol;
  OESmilesToMol(mol, "n1ccccc1");

  OEIter<const OEAtomBase> atom = mol.GetAtoms();
  for (atom.ToLast(); atom; --atom)
    cout << atom->GetAtomicNum() << endl;

  return 0;
}