Predicate Functors

A functor (function object) is any object that can be called as if it were a function, i.e. an object of a class that overloads operator(), the function call operator. A functor can be considered the C++ equivalent of a C function pointer. Unlike function pointers, however, functors can maintain state and can be copied, created, and destroyed.

Functors that return bool are an important special case. A unary functor (one that takes a single argument) whose return type is bool is called a predicate.
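To make these definitions concrete, here is a minimal sketch in standard C++ only. It does not use OEChem TK, and the names IsMultipleOf and CountMatching are hypothetical, chosen purely for illustration. The class overloads operator(), keeps state that is set at construction, takes one argument, and returns bool, so it is a predicate in the sense described above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical predicate functor (not part of OEChem TK): returns true
// when its argument is a multiple of the divisor given at construction.
class IsMultipleOf
{
public:
  explicit IsMultipleOf(int divisor) : divisor_(divisor)
  {
    assert(divisor != 0);  // a zero divisor would make operator() undefined
  }

  // Overloading the function call operator makes instances callable.
  // One argument and a bool return type make this a predicate.
  bool operator()(int value) const { return value % divisor_ == 0; }

private:
  int divisor_;  // state stored in the functor instance
};

// Counts the elements of 'values' for which pred returns true.
// std::count_if calls the predicate back once per element.
inline unsigned int CountMatching(const std::vector<int>& values,
                                  const IsMultipleOf& pred)
{
  return static_cast<unsigned int>(
      std::count_if(values.begin(), values.end(), pred));
}
```

For example, applied to the integers 1 through 10, IsMultipleOf(3) matches 3, 6, and 9, so CountMatching returns 3.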

In OEChem TK, these functors are often passed into another function, which then calls them. This is the concept of a callback: the second function provides the argument and "calls back" to the functor that was passed in. Generator methods such as OEMolBase::GetAtoms can take a functor as an argument and use the callback mechanism to iterate over only those atoms that satisfy the functor. See the example in the Atom or Bond Subset Iteration section.

In the example below, the function CountAtoms loops over the atoms and performs a callback to the predicate functor pred for each atom. If the predicate returns true, a counter is incremented. The main function passes the predefined atom predicate OEIsOxygen to CountAtoms, which therefore counts the number of oxygen atoms in the molecule. (Please note that this functionality is already implemented in OEChem TK as OECount.)

Listing 1: Using functor callbacks

#include "openeye.h"
#include "oechem.h"
#include "oesystem.h"

using namespace OESystem;
using namespace OEChem;
using namespace std;

unsigned int CountAtoms(const OEMolBase& mol,const OEUnaryPredicate<OEAtomBase> &pred)
{
  unsigned int counts = 0;
  for (OEIter<OEAtomBase> atom = mol.GetAtoms(); atom; ++atom)
     if (pred(atom))
        counts += 1;
  return counts;	
}

int main() 
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1cc[nH]c1CC2COCNC2");

  cout << "Number of oxygen atoms = " << CountAtoms(mol,OEIsOxygen()) << endl;
}

Built-in Functors

There are many useful functors already defined in OEChem TK. These can be used by programmers with little or no understanding of the details of how functors work. A programmer can simply pass them to one of the many OEChem TK functions and methods which take predicates as arguments.

Atom Functors

Access                                               Functor Name
ring atoms                                           OEAtomIsInRing
chain atoms                                          OEAtomIsInChain
atom with specified atom index                       OEHasAtomIdx
atom with selected atom index                        OEAtomIdxSelected
atom with specified atom name                        OEHasAtomName
atoms with specified atom stereo                     OEHasAtomStereoSpecified
atoms with specified formal charge                   OEHasFormalCharge
atoms with specified number of heavy atom neighbors  OEHasHvyDegree
aromatic atoms                                       OEIsAromaticAtom
atoms with specific hybridization                    OEIsAtomHybridization
chiral atoms                                         OEIsChiralAtom
atoms with anisotropic B-factor parameters           OEHasAnisou
atoms with specified map index                       OEHasMapIdx
atoms representing R-Groups                          OEIsRGroup
\(n^{th}\) atom                                      OENthAtom

Listing 2: Using predefined atom functors

#include "openeye.h"
#include "oechem.h"
#include "oesystem.h"

using namespace OESystem;
using namespace OEChem;
using namespace std;

int main() 
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1cc[nH]c1CC2COCNC2");

  cout << "Number of heavy atoms = " << OECount(mol,OEIsHeavy()) << endl;
  cout << "Number of ring atoms  = " << OECount(mol,OEAtomIsInRing()) << endl;  
  return 0;
}

The output of the preceding program is the following:

Number of heavy atoms = 12
Number of ring atoms  = 11

Atomic Number Functors

Access                              Functor Name
atoms with specified atomic number  OEHasAtomicNum
carbon atoms                        OEIsCarbon
halogen atoms                       OEIsHalogen
heavy atoms                         OEIsHeavy
hetero atoms                        OEIsHetero
explicit hydrogen atoms             OEIsHydrogen
metal atoms                         OEIsMetal
nitrogen atoms                      OEIsNitrogen
oxygen atoms                        OEIsOxygen
sulfur atoms                        OEIsSulfur
phosphorus atoms                    OEIsPhosphorus
non-carbon atoms                    OEIsPolar
polar hydrogen atoms                OEIsPolarHydrogen

Please note that the following two lines produce the same result.

  cout << "Number of oxygen atoms = " << OECount(mol,OEHasAtomicNum(OEElemNo::O)) << endl;
  cout << "Number of oxygen atoms = " << OECount(mol,OEIsOxygen()) << endl;

Bond Functors

Access                           Functor Name
ring bonds                       OEBondIsInRing
chain bonds                      OEBondIsInChain
bond with specified bond index   OEHasBondIdx
bond with selected bond index    OEBondIdxSelected
bonds with specified bond order  OEHasOrder
rotatable bonds                  OEIsRotor
chiral bonds                     OEIsChiralBond
bonds with specific bond stereo  OEHasBondStereoSpecified
aromatic bonds                   OEIsAromaticBond

Listing 3: Using predefined bond functors

#include "openeye.h"
#include "oechem.h"
#include "oesystem.h"

using namespace OESystem;
using namespace OEChem;
using namespace std;

int main() 
{
  OEGraphMol mol;
  OESmilesToMol(mol,"CC(=O)Nc1c[nH]cc1");
  cout << "Number of ring bonds  = " << OECount(mol,OEBondIsInRing()) << endl;
  cout << "Number of rotor bonds = " << OECount(mol,OEIsRotor()) << endl;
  return 0;
}

The output of the preceding program is the following:

Number of ring bonds  = 5
Number of rotor bonds = 2

Group Functors

Access                                    Functor Name
groups with a specific atom               OEHasAtomInGroup
groups with a specific bond               OEHasBondInGroup
groups with a specific type               OEHasGroupType
groups that store MDL stereo information  OEIsMDLStereoGroup

Reaction Component Functors

Access                                            Functor Name
atoms of the catalysts or solvents of a reaction  OEAtomIsInAgent
atoms of the product molecule(s)                  OEAtomIsInProduct
atoms of the reactant molecule(s)                 OEAtomIsInReactant

Conformer Functors

Access                          Functor Name
conformer with specified index  OEHasConfIdx
conformer with selected index   OEConfIdxSelected

Residue Data Functors

Access                                Functor Name
atoms with specified chain id         OEHasChainID
atoms with specified residue number   OEHasResidueNumber
atoms with an alternate location      OEHasAlternateLocation
atoms with specified fragment number  OEHasFragmentNumber
alpha carbon atoms of peptides        OEIsCAlpha

Composition Functors

Occasionally, one may want to use a logical operator to join two or more functors. The following table shows the composition functors defined in OEChem TK.

Composition Functors
Composition Functor  Description  Example of atom composition functors
OENot                logical not  OENot<OEAtomBase>
OEOr                 logical or   OEOr<OEAtomBase>
OEAnd                logical and  OEAnd<OEAtomBase>

Each composition functor takes the appropriate number of predicates as arguments and generates a single unary predicate. The following example demonstrates how to use composition functors to build expressions from OEChem TK's predefined atom predicates.

Listing 4: Combining predefined atom predicates

#include "openeye.h"
#include "oechem.h"
#include "oesystem.h"

using namespace OESystem;
using namespace OEChem;
using namespace std;

int main() 
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1cnc(O)cc1CCCBr");
  
  cout << "Number of chain atoms = " << 
          OECount(mol,OENot<OEAtomBase>(OEAtomIsInRing())) << endl;

  cout << "Number of aromatic nitrogens = " << 
          OECount(mol,OEAnd<OEAtomBase>(OEIsNitrogen(),OEIsAromaticAtom())) << endl;
 
  cout << "Number of non-carbons = " << 
          OECount(mol,OENot<OEAtomBase>(OEHasAtomicNum(OEElemNo::C))) << endl;

  cout << "Number of nitrogen and oxygen atoms = " << 
          OECount(mol,OEOr<OEAtomBase>(OEHasAtomicNum(OEElemNo::N),
                                       OEHasAtomicNum(OEElemNo::O))) << endl;

  return 0;
}

The OECount function returns the number of objects (in this case atoms) matching the given predicate argument.

The output of the preceding program is the following:

Number of chain atoms = 5
Number of aromatic nitrogens = 1
Number of non-carbons = 3
Number of nitrogen and oxygen atoms = 2

Though the explicit template type instantiation isn't strictly necessary, in practice it is required to help several C++ parsers make it through the expression. As a convenience to programmers, three related template free functions have been defined: operator &&, operator ||, and operator !. Each takes one or two OEUnaryPredicates as arguments and returns the appropriate composition predicate. Not only do these make code much easier to read, but in our experience they also make the code easier for C++ parsers to parse. The following example is identical to the previous composition listing (Listing 4), except that the composition predicates have been replaced by the operator free functions.

  cout << "Number of chain atoms = " << 
          OECount(mol, !OEAtomIsInRing()) << endl;

  cout << "Number of aromatic nitrogens = " << 
          OECount(mol,OEIsNitrogen() && OEIsAromaticAtom()) << endl;
 
  cout << "Number of non-carbons = " << 
          OECount(mol, !OEHasAtomicNum(OEElemNo::C)) << endl;

  cout << "Number of nitrogen and oxygen atoms = " << 
          OECount(mol,OEHasAtomicNum(OEElemNo::N) || OEHasAtomicNum(OEElemNo::O)) << endl;
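As an illustration of how operator free functions of this kind might be put together, the following is a minimal sketch in standard C++ that does not use OEChem TK. UnaryPred, NotPred, AndPred, IsEven, and IsPositive are hypothetical names invented for this sketch, and the actual OEChem TK implementation may differ. Each composition predicate stores its own copies of its operands (obtained through CreateCopy), and the overloaded operators simply construct the matching composition predicate. A logical-or version would follow the same pattern as AndPred.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a unary predicate base class. It mirrors the
// operator()/CreateCopy pair used in the listings, but is not OEChem TK code.
template <class T>
class UnaryPred
{
public:
  virtual ~UnaryPred() {}
  virtual bool operator()(const T& arg) const = 0;
  virtual UnaryPred<T>* CreateCopy() const = 0;
};

// Logical NOT of one predicate; holds its own copy of the operand.
template <class T>
class NotPred : public UnaryPred<T>
{
public:
  explicit NotPred(const UnaryPred<T>& p) : p_(p.CreateCopy())
  {
    assert(p_ != nullptr);  // CreateCopy is expected to return an object
  }
  bool operator()(const T& arg) const { return !(*p_)(arg); }
  UnaryPred<T>* CreateCopy() const { return new NotPred<T>(*p_); }
private:
  std::shared_ptr<const UnaryPred<T> > p_;
};

// Logical AND of two predicates; holds copies of both operands.
template <class T>
class AndPred : public UnaryPred<T>
{
public:
  AndPred(const UnaryPred<T>& a, const UnaryPred<T>& b)
    : a_(a.CreateCopy()), b_(b.CreateCopy())
  {
    assert(a_ != nullptr && b_ != nullptr);
  }
  bool operator()(const T& arg) const { return (*a_)(arg) && (*b_)(arg); }
  UnaryPred<T>* CreateCopy() const { return new AndPred<T>(*a_, *b_); }
private:
  std::shared_ptr<const UnaryPred<T> > a_, b_;
};

// Operator free functions that build the composition predicates.
// T is deduced from the base class of the arguments.
template <class T>
NotPred<T> operator!(const UnaryPred<T>& p) { return NotPred<T>(p); }

template <class T>
AndPred<T> operator&&(const UnaryPred<T>& a, const UnaryPred<T>& b)
{
  return AndPred<T>(a, b);
}

// Two small example predicates on int.
class IsEven : public UnaryPred<int>
{
public:
  bool operator()(const int& v) const { return v % 2 == 0; }
  UnaryPred<int>* CreateCopy() const { return new IsEven(*this); }
};

class IsPositive : public UnaryPred<int>
{
public:
  bool operator()(const int& v) const { return v > 0; }
  UnaryPred<int>* CreateCopy() const { return new IsPositive(*this); }
};
```

With these definitions, the expression IsEven() && IsPositive() yields a predicate that accepts 4 but rejects -4 and 3, and !IsEven() yields one that accepts 3. Because the operators take base-class references, the template argument never has to be spelled out at the call site, which is the readability gain described above.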

Composition functors can be used similarly to combine predefined bond predicates.

Listing 5: Combining predefined bond predicates

#include "openeye.h"
#include "oechem.h"
#include "oesystem.h"

using namespace OESystem;
using namespace OEChem;
using namespace std;

int main() 
{
  OEGraphMol mol;
  OESmilesToMol(mol, "N#CCC1CCNC=C1");

  cout << "Number of non-rotatable bonds = "
       << OECount(mol,OENot<OEBondBase>(OEIsRotor())) << endl;

  cout << "Number of ring double bonds = "
       << OECount(mol,OEAnd<OEBondBase>(OEBondIsInRing(),OEHasOrder(2))) << endl;

  cout << "Number of double or triple bonds = "
       << OECount(mol,OEOr<OEBondBase>(OEHasOrder(2),OEHasOrder(3))) << endl;

  return 0;
}

The output of the preceding program is the following:

Number of non-rotatable bonds = 8
Number of ring double bonds = 1
Number of double or triple bonds = 2

User Defined Functors

While many predefined functors exist in OEChem TK, it is not difficult to find a situation which calls for a new user-defined functor.

A user-defined functor can be written by deriving from the OEUnaryPredicate base template class.

The following example shows a user-defined atom functor that returns true for aliphatic nitrogens.

Listing 6: User defined atom predicate

#include "openeye.h"
#include "oechem.h"
#include "oesystem.h"

using namespace OESystem;
using namespace OEChem;
using namespace std;

class PredAliphaticNitrogen : public OEUnaryPredicate<OEAtomBase>
{
   public:
      bool operator()(const OEAtomBase &atom) const
      {     
         return atom.IsNitrogen() && ! atom.IsAromatic();
      }
      OEUnaryFunction<OEAtomBase,bool> *CreateCopy() const
      {
         return new PredAliphaticNitrogen;
      }
};

int main() 
{
   OEGraphMol mol;
   OESmilesToMol(mol,"c1cc[nH]c1CC2COCNC2");
   cout << "Number of aliphatic N atoms = "
	<< OECount(mol,PredAliphaticNitrogen()) << endl;
   return 0;
}

The output of the preceding program is the following:

Number of aliphatic N atoms = 1
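Listing 6 implements CreateCopy in addition to operator(). This section does not say why the method is needed. A common reason for such a method, and the one assumed in the sketch below, is that code receiving a predicate through a base-class reference can then keep its own copy without knowing the derived type. The sketch uses standard C++ only; IntPred, GreaterThan, and StoredPred are hypothetical names and are not OEChem TK classes.

```cpp
#include <cassert>
#include <memory>

// Hypothetical base class, mirroring the operator()/CreateCopy pair above.
class IntPred
{
public:
  virtual ~IntPred() {}
  virtual bool operator()(int value) const = 0;
  virtual IntPred* CreateCopy() const = 0;
};

// A derived predicate with state.
class GreaterThan : public IntPred
{
public:
  explicit GreaterThan(int limit) : limit_(limit) {}
  bool operator()(int value) const { return value > limit_; }
  // Copies the dynamic type together with its state, so nothing is sliced.
  IntPred* CreateCopy() const { return new GreaterThan(*this); }
private:
  int limit_;
};

// Keeps a private copy of whatever predicate it is given. Only the
// base-class interface is known here, so CreateCopy does the copying.
class StoredPred
{
public:
  explicit StoredPred(const IntPred& pred) : copy_(pred.CreateCopy())
  {
    assert(copy_ != nullptr);  // CreateCopy is expected to return an object
  }
  bool Test(int value) const { return (*copy_)(value); }
private:
  std::unique_ptr<IntPred> copy_;  // owns the copy and deletes it
};
```

A StoredPred built from GreaterThan(10) keeps working after the original predicate has been destroyed: Test(11) returns true and Test(10) returns false.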

A bond predicate can be defined similarly by deriving from OEUnaryPredicate<OEBondBase>, as the PredAmideBond class in the following listing does.

Listing 7: User defined bond predicate

#include "openeye.h"
#include "oechem.h"
#include "oesystem.h"

using namespace OESystem;
using namespace OEChem;
using namespace std;

class PredHasDoubleBondO : public OEUnaryPredicate<OEAtomBase>
{
   public:
      bool operator()(const OEAtomBase &atom) const
      {
         // True if any double bond on this atom leads to an oxygen.
         for (OEIter<OEBondBase> bond = atom.GetBonds(); bond; ++bond)
            if (bond->GetOrder() == 2 && bond->GetNbr(&atom)->IsOxygen())
               return true;
         return false;
      }
      OEUnaryFunction<OEAtomBase,bool> *CreateCopy() const
      {
	 return new PredHasDoubleBondO;
      }
};

class PredAmideBond : public OEUnaryPredicate<OEBondBase>
{
   public:
      bool operator()(const OEBondBase &bond) const
      {
         // An amide bond is a single bond between a carbon and a nitrogen
         // in which the carbon also carries a double-bonded oxygen.
         if (bond.GetOrder() != 1)
            return false;
         const OEAtomBase* atomB = bond.GetBgn();
         const OEAtomBase* atomE = bond.GetEnd();
         PredHasDoubleBondO pred;
         if (atomB->IsCarbon() && atomE->IsNitrogen() && pred(*atomB))
            return true;
         if (atomB->IsNitrogen() && atomE->IsCarbon() && pred(*atomE))
            return true;
         return false;
      }
      OEUnaryFunction<OEBondBase,bool> *CreateCopy() const
      {
	 return new PredAmideBond;
      }
};

int main() 
{
  OEGraphMol mol;
  OESmilesToMol(mol,"CC(=O)Nc1c[nH]cc1");
  cout << "Number of amide bonds = " 
       << OECount(mol,PredAmideBond()) << endl;
  return 0;
}

The output of the preceding program is the following:

Number of amide bonds = 1

One advantage of functors over function pointers is that they can hold state. Because this state is held by the object instance, it can be thread safe (unlike static variables inside functions used with function pointers). The state of a functor can be initialized at construction. For instance, the OEHasAtomicNum functor takes a constructor argument that specifies which atomic number an atom must have for the functor to return true. The following listing defines a user-defined predicate whose state is a list of atomic numbers.

Listing 8: User defined atom predicate with state

#include <algorithm>
#include <vector>

#include "openeye.h"
#include "oechem.h"
#include "oesystem.h"

using namespace OESystem;
using namespace OEChem;
using namespace std;

class PredAtomicNumList : public OEUnaryPredicate<OEAtomBase>
{
   public:
      PredAtomicNumList(const vector<unsigned int>& a) : alist(a)
      {}
      bool operator()(const OEAtomBase &atom) const
      {
         return find(alist.begin(),alist.end(),atom.GetAtomicNum()) != alist.end();
      }
      OEUnaryFunction<OEAtomBase,bool> *CreateCopy() const
      {
         return new PredAtomicNumList(alist);
      }
   private:
      vector<unsigned int> alist;
};

int main() 
{
   OEGraphMol mol;
   OESmilesToMol(mol,"c1cc[nH]c1CC2COCNC2");
   vector<unsigned int> alist;
   alist.push_back(OEElemNo::O);
   alist.push_back(OEElemNo::N);
   cout << "Number of oxygen or nitrogen atoms = " 
        << OECount(mol,PredAtomicNumList(alist)) << endl;
   return 0;
}

Functor substructure-based matching

Listing 6 shows how to create a user-defined atom predicate. OEChem TK also provides a functor template, called OEMatchFunc, that allows convenient substructure-based atom matching.

In the following example, functors are initialized with a SMARTS string. Each functor returns true only if the atom matches the substructure pattern specified at construction.

Listing 9: Functor substructure-based matching

#include "openeye.h"
#include "oechem.h"
#include "oesystem.h"

using namespace OESystem;
using namespace OEChem;
using namespace std;

int main() 
{
  OEGraphMol mol;
  OESmilesToMol(mol,"C1(Cl)C(N)C(F)OC1C(=O)NCCCN");

  OEMatchFunc<OEAtomBase> NonAmideNitrogenPred("[N;!$(NC=O)]");
  cout << "Number of non-amide nitrogen = " << OECount(mol, NonAmideNitrogenPred) << endl;

  OEMatchFunc<OEAtomBase> FiveMemberedRingOxygenPred("[O;r5]");
  cout << "Number of 5-membered ring oxygen = " << OECount(mol, FiveMemberedRingOxygenPred) << endl;

  OEMatchFunc<OEAtomBase> CarbonAttachedToHalogenPred("[#6][Cl,Br,F]");
  cout << "Number of carbon attached to halogen = " << OECount(mol, CarbonAttachedToHalogenPred) << endl;
  return 0;
}

The output of Listing 9 is the following:

Number of non-amide nitrogen = 2
Number of 5-membered ring oxygen = 1
Number of carbon attached to halogen = 2

Molecule Partitioning

The OESubsetMol function can take any atom predicate as an argument and generate a subset molecule from only atoms for which the specified predicate returns true. In the following example, ring atoms are extracted from a molecule by using the OEAtomIsInRing atom functor.

Listing 10: Ring atom extraction

#include <openeye.h>
#include <oechem.h>

using namespace OEChem;
using namespace std;

int main() 
{
   OEGraphMol mol;
   OESmilesToMol(mol,"c1cc[nH]c1CC2COCNC2");
   OEGraphMol submol;
   OESubsetMol(submol,mol,OEAtomIsInRing(),true);
   cout << OEMolToSmiles(submol) << endl;
   return 0;
}

The output of Listing 10 is the following:

c1cc[nH]c1.C1CNCOC1

In the following example, individual ring systems are extracted from a molecule by using the OEPartPred functor.

Listing 11: Ring system extraction

#include <openeye.h>
#include <oechem.h>
#include <oesystem.h>

using namespace OESystem;
using namespace OEChem;
using namespace std;

int main() 
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1cc[nH]c1CC2COCNC2");
  unsigned int* rings = new unsigned int[mol.GetMaxAtomIdx()];
  unsigned int nrrings = OEDetermineRingSystems(mol,rings);
  OEPartPred pred(rings,mol.GetMaxAtomIdx());
  cout << "Number of rings = " << nrrings << endl;

  for (unsigned int r = 1; r < nrrings+1; ++r)
  {
    pred.SelectPart(r);
    OEGraphMol ringmol;
    OESubsetMol(ringmol,mol,pred,true);
    cout << r << " -> " << OEMolToSmiles(ringmol) << endl;
  }  
  delete[] rings;
}

The output of Listing 11 is the following:

Number of rings = 2
1 -> c1cc[nH]c1
2 -> C1CNCOC1
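The idea behind a partition predicate of this kind can be sketched in standard C++ without OEChem TK. PartPredSketch and SelectIndices are hypothetical names, and the sketch makes no claim about how OEPartPred is actually implemented. It assumes that each item index maps to a part number, that part numbers start at 1, and that 0 marks an item belonging to no part, which is consistent with the loop bounds used in Listing 11.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical partition predicate (not OEChem TK code). 'parts' maps an
// item index to a part number; 0 is assumed to mean "in no part".
class PartPredSketch
{
public:
  explicit PartPredSketch(const std::vector<unsigned int>& parts)
    : parts_(parts), selected_(0) {}

  // Chooses which part the predicate accepts from now on.
  void SelectPart(unsigned int part)
  {
    assert(part != 0);  // 0 is reserved for "not in any part"
    selected_ = part;
  }

  // True when a part is selected and the item with this index lies in it.
  bool operator()(std::size_t idx) const
  {
    return selected_ != 0 && idx < parts_.size() && parts_[idx] == selected_;
  }

private:
  std::vector<unsigned int> parts_;  // part number per item index
  unsigned int selected_;            // currently selected part, 0 = none
};

// Collects the indices accepted by the predicate, a stand-in for
// building a subset from the items that satisfy it.
inline std::vector<std::size_t> SelectIndices(std::size_t count,
                                              const PartPredSketch& pred)
{
  std::vector<std::size_t> out;
  for (std::size_t i = 0; i < count; ++i)
    if (pred(i))
      out.push_back(i);
  return out;
}
```

Reusing one predicate object and calling SelectPart inside a loop, as Listing 11 does, avoids rebuilding the part table for every subset: only the selected part number changes between iterations.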