Generic Data

The previous chapters (Molecule Properties, Atom Properties, and Bond Properties) described how common global properties of molecules, atoms, and bonds can be accessed and modified. Some applications, however, need to associate arbitrary data with objects such as molecules, atoms, and bonds. OEChem TK provides a framework for this: generic data can be attached to an object by associating it with either an integer or a character string, called a tag identifier.

The following two snippets demonstrate how generic data (for example, molecular weight) can be attached to a molecule:

// Snippet 1: integer tag allocated by OEGetTag
const auto tag = OEGetTag("MolWeight");
mol.SetData(tag, OECalculateMolecularWeight(mol));

// Snippet 2: character string identifier
mol.SetData("MolWeight", OECalculateMolecularWeight(mol));

After annotation, the data can be accessed with the same integer or character string identifier:

cout << mol.GetData<double>(tag) << endl;          // by integer tag
cout << mol.GetData<double>("MolWeight") << endl;  // by character string

Warning

An integer tag for generic data should always be allocated with the OEGetTag function.
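A plausible reading of this warning is that a single allocator keeps the string form and the integer form of a tag in step, whereas a hand-picked integer could collide with a tag allocated elsewhere. The sketch below is a toy model of that idea in standard C++ only. `TagRegistry`, `GetTag`, and `GetName` are hypothetical names introduced for illustration; they are not OEChem TK API and do not describe how OEGetTag is implemented.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a tag registry; not part of OEChem TK.
class TagRegistry
{
public:
  // Returns the integer tag for a name, allocating a new one on first use.
  // Asking again for the same name always yields the same integer.
  unsigned int GetTag(const std::string& name)
  {
    const auto it = ids_.find(name);
    if (it != ids_.end())
      return it->second;
    const unsigned int id = static_cast<unsigned int>(names_.size()) + 1;
    ids_.emplace(name, id);
    names_.push_back(name);
    return id;
  }

  // Reverse lookup: returns the name registered for an integer tag,
  // or an empty string if that integer was never allocated.
  std::string GetName(unsigned int id) const
  {
    if (id == 0 || id > names_.size())
      return std::string();
    return names_[id - 1];
  }

private:
  std::unordered_map<std::string, unsigned int> ids_;
  std::vector<std::string> names_;
};
```

In this model, an integer that did not come from `GetTag` has no registered name, which is the kind of inconsistency the warning helps avoid.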

The following table shows the basic methods of the OEBase class that allow the manipulation of generic data.

Methods to manipulate generic data

Method               Description
OEBase::SetData      sets generic data, associating it with the given tag
OEBase::AddData      adds generic data, associating it with the given tag
OEBase::HasData      determines whether an object has any generic data with the given tag
OEBase::GetData      returns the generic data associated with the given tag
OEBase::DeleteData   deletes all generic data with the given tag
OEBase::Clear        clears all stored generic data

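The main difference between the OEBase::SetData and OEBase::AddData methods shows when data with the same identifier is already attached to an object: OEBase::SetData replaces the existing data, whereas OEBase::AddData keeps it and attaches the new data alongside it.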
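The contrast between setting and adding can be pictured with a small stand-alone container. This is a minimal sketch in standard C++, not OEChem TK code: `ToyDataStore` and its methods are hypothetical, values are limited to strings for brevity, and the choice that `Set` discards every earlier entry for the tag is an assumption of this toy rather than documented library behavior.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy container; it only mimics replace-versus-append semantics.
class ToyDataStore
{
  using Entry = std::pair<std::string, std::string>;

public:
  // Set: remove existing entries for the tag, then store the new value.
  // (Assumption of this toy: all earlier entries for the tag are dropped.)
  void Set(const std::string& tag, const std::string& value)
  {
    Delete(tag);
    entries_.emplace_back(tag, value);
  }

  // Add: keep existing entries and append one more under the same tag.
  void Add(const std::string& tag, const std::string& value)
  {
    entries_.emplace_back(tag, value);
  }

  // Delete: remove every entry stored under the tag.
  void Delete(const std::string& tag)
  {
    entries_.erase(
      std::remove_if(entries_.begin(), entries_.end(),
                     [&tag](const Entry& e) { return e.first == tag; }),
      entries_.end());
  }

  // Collects all values stored under the tag, in insertion order.
  std::vector<std::string> GetAll(const std::string& tag) const
  {
    std::vector<std::string> values;
    for (const Entry& e : entries_)
      if (e.first == tag)
        values.push_back(e.second);
    return values;
  }

private:
  std::vector<Entry> entries_;
};
```

Calling `Add` twice with the same tag leaves two values behind, while calling `Set` twice leaves only the most recent one.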

Furthermore, OEBase::SetData does not allow the data stored under an existing tag to be replaced with data of a different type:

const auto tag = OEGetTag("MolWeight");
double weight = OECalculateMolecularWeight(mol);
mol.SetData(tag, weight);       // stores a double
mol.SetData(tag, int(weight));  // same tag, different type (int)

The above code emits the following warning:

Warning: data type mismatch found when using generic data
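One way such a check could work is to remember the type of the first value stored under a tag and to refuse later values of any other type. The following is a minimal sketch under that assumption, written in standard C++ only. `TypedSlotStore`, `Set`, and `Get` are hypothetical names, and nothing here is taken from the OEChem TK implementation.

```cpp
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <typeindex>
#include <typeinfo>

// Hypothetical toy store that pins each tag to the type first stored under it.
class TypedSlotStore
{
  struct Slot
  {
    std::type_index type;         // type recorded when the tag was first set
    std::shared_ptr<void> value;  // type-erased copy of the stored value
  };

public:
  // Stores a copy of value under tag. Returns false, prints a warning, and
  // leaves the stored value untouched if the tag already holds another type.
  template <typename T>
  bool Set(const std::string& tag, const T& value)
  {
    const auto it = slots_.find(tag);
    if (it != slots_.end())
    {
      if (it->second.type != std::type_index(typeid(T)))
      {
        std::cerr << "Warning: data type mismatch for tag '" << tag << "'\n";
        return false;
      }
      it->second.value = std::make_shared<T>(value);
      return true;
    }
    slots_.emplace(tag, Slot{std::type_index(typeid(T)), std::make_shared<T>(value)});
    return true;
  }

  // Returns a pointer to the stored value, or nullptr if the tag is missing
  // or holds a different type than the one requested.
  template <typename T>
  const T* Get(const std::string& tag) const
  {
    const auto it = slots_.find(tag);
    if (it == slots_.end() || it->second.type != std::type_index(typeid(T)))
      return nullptr;
    return static_cast<const T*>(it->second.value.get());
  }

private:
  std::map<std::string, Slot> slots_;
};
```

In this toy, `Set("MolWeight", 128.17)` followed by `Set("MolWeight", 128)` keeps the double and rejects the int, which parallels the mismatch shown above.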

Attaching plain old data

The following simple program demonstrates how data calculated and attached to a molecule in one function can be accessed later through the tag identifier.

Listing 1: Example of using generic data

#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>

using namespace std;
using namespace OESystem;
using namespace OEChem;

void CalculateMoleculeWeight(OEMolBase& mol)
{
  mol.SetData("MolWeight", OECalculateMolecularWeight(mol));
}

void PrintMoleculeWeight(const OEMolBase& mol)
{
  const auto tag = OEGetTag("MolWeight");
  if (mol.HasData(tag))
    cout << "molecule weight = " << mol.GetData<double>(tag) << endl;
  else
    cout << "molecule weight is not calculated!" << endl;
}

int main()
{
  OEGraphMol mol;
  OESmilesToMol(mol, "C1CCCC(C(=O)O)C1");

  CalculateMoleculeWeight(mol);
  PrintMoleculeWeight(mol);

  return 0;
}

Note

It is good programming practice to call OEBase::HasData to check whether data exists before trying to access it with the OEBase::GetData method.

The OEBase::GetData method returns only the first piece of data stored under the given tag. All stored data can be accessed with the OEBase::GetDataIter method, which returns an iterator over every piece of data attached to the object.

OEGraphMol mol;
OEParseSmiles(mol, "C1CCCC(C(=O)O)C1");

const auto activitytag = OEGetTag("activity");
mol.AddData(activitytag, string("antiarthritic"));
mol.AddData(activitytag, string("antiinflammatory"));
mol.SetData("weight", OECalculateMolecularWeight(mol));

for (OEIter<OEBaseData> gdata = mol.GetDataIter(); gdata; ++gdata)
{
  cout << OEGetTag(gdata->GetTag()) << ' ';
  if (gdata->GetDataType() == OEGetDataType<string>())
  {
    cout << OECastData<string>(gdata);
  }
  if (gdata->GetDataType() == OEGetDataType<double>())
  {
    cout << OECastData<double>(gdata);
  }
  cout << endl;
}

The output of the code snippet above is the following:

activity antiarthritic
activity antiinflammatory
weight 128.16898

The OEBase::GetDataIter method can also take a tag identifier, in which case it iterates only over the data associated with that tag. For example, the following code prints only the two pieces of ‘activity’ data.

const auto tag = OEGetTag("activity");
for (OEIter<OEBaseData> gdata = mol.GetDataIter(tag); gdata; ++gdata)
{
  cout << OEGetTag(gdata->GetTag()) << ' '
       << OECastData<string>(gdata) << endl;
}
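The two loops above share a pattern: walk the stored entries in order, optionally skip those whose tag does not match, and check the stored type before casting. A minimal stand-alone sketch of that pattern follows, in standard C++ only. `ToyEntry`, `MakeEntry`, and `Describe` are hypothetical names for illustration and are not OEChem TK API.

```cpp
#include <memory>
#include <sstream>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <vector>

// Hypothetical record for one piece of attached data.
struct ToyEntry
{
  std::string tag;
  std::type_index type;         // runtime type of the stored value
  std::shared_ptr<void> value;  // type-erased copy of the value
};

// Builds an entry and records the static type of the value.
template <typename T>
ToyEntry MakeEntry(const std::string& tag, const T& value)
{
  return ToyEntry{tag, std::type_index(typeid(T)), std::make_shared<T>(value)};
}

// Formats every entry as "tag value", or only the entries matching filterTag
// when it is non-empty. The stored type is checked before each cast; entries
// of any other type are listed with the tag only.
std::vector<std::string> Describe(const std::vector<ToyEntry>& entries,
                                  const std::string& filterTag = "")
{
  std::vector<std::string> lines;
  for (const ToyEntry& e : entries)
  {
    if (!filterTag.empty() && e.tag != filterTag)
      continue;
    std::ostringstream out;
    out << e.tag << ' ';
    if (e.type == std::type_index(typeid(std::string)))
      out << *static_cast<const std::string*>(e.value.get());
    else if (e.type == std::type_index(typeid(double)))
      out << *static_cast<const double*>(e.value.get());
    lines.push_back(out.str());
  }
  return lines;
}
```

Calling `Describe(entries)` corresponds to iterating over all data, and `Describe(entries, "activity")` corresponds to iterating with a tag identifier.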

Attaching data to atoms

Generic data can be attached to any object that derives from the OEBase class. The following program shows an example where a hydrogen-bond donor property is attached as a bool value to the corresponding OEAtomBase object.

Listing 2: Example of attaching generic data to atoms

#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>

using namespace std;
using namespace OESystem;
using namespace OEChem;

class IsDonorAtomPred : public OEUnaryPredicate<OEAtomBase>
{
  public:
    IsDonorAtomPred() = default;
    IsDonorAtomPred(const IsDonorAtomPred &) = default;
    IsDonorAtomPred& operator=(const IsDonorAtomPred &) = default;
    ~IsDonorAtomPred() = default;

    bool operator()(const OEAtomBase &atom) const
    {
      return atom.GetData<bool>("isdonor");
    }
    OEUnaryFunction<OEAtomBase, bool> *CreateCopy() const
    {
      return new IsDonorAtomPred;
    }

};

int main()
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1c(Cl)cncc1C(=O)O");

  OEMatchFunc<OEAtomBase> IsDonorAtom("[!H0;#7,#8]");
  for (OEIter<OEAtomBase> atom = mol.GetAtoms(); atom; ++atom)
    atom->SetData("isdonor", IsDonorAtom(atom));

  cout << "Donor atoms: ";
  for (OEIter<const OEAtomBase> atom = mol.GetAtoms(IsDonorAtomPred()); atom; ++atom)
    cout << atom->GetIdx() << ' ' << OEGetAtomicSymbol(atom->GetAtomicNum());
  cout << endl;

  return 0;
}
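The idea behind Listing 2, a predicate that decides membership by reading data attached to each object, can be shown without the toolkit. The sketch below uses standard C++ only. `ToyAtom`, `HasFlagSet`, and `SelectIndices` are hypothetical stand-ins, not OEChem TK types. Unlike the listing, this predicate also checks that the flag exists before reading it, in the spirit of the OEBase::HasData recommendation.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an atom that carries boolean generic data.
struct ToyAtom
{
  unsigned int idx;
  std::map<std::string, bool> flags;
};

// Predicate functor: true only when the named flag exists and is set.
class HasFlagSet
{
public:
  explicit HasFlagSet(std::string name) : name_(std::move(name)) {}

  bool operator()(const ToyAtom& atom) const
  {
    const auto it = atom.flags.find(name_);
    return it != atom.flags.end() && it->second;
  }

private:
  std::string name_;
};

// Returns the indices of all atoms accepted by the predicate.
std::vector<unsigned int> SelectIndices(const std::vector<ToyAtom>& atoms,
                                        const HasFlagSet& pred)
{
  std::vector<unsigned int> out;
  for (const ToyAtom& atom : atoms)
    if (pred(atom))
      out.push_back(atom.idx);
  return out;
}
```

An atom with no "isdonor" entry is treated as a non-donor here; that default is a design choice of this sketch.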

Attaching other objects

The type of the generic data is not restricted to the fundamental data types of the programming language. High-level OEChem TK objects such as OEMolBase, OEAtomBase, OEBondBase, OEScalarGrid, OESkewGrid, and OESurface can also be stored through this mechanism. The following program demonstrates how to attach a subset of a molecule to the original molecule as generic data.

Listing 3: Example of attaching a molecule as generic data

#include <openeye.h>
#include <oechem.h>

using namespace OEChem;

int main()
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1ccccc1O");

  OEGraphMol frag;
  OESubsetMol(frag, mol, OEIsCarbon());

  mol.SetData("just_carbon", frag);
  OEGraphMol justCarbon = mol.GetData<OEGraphMol>("just_carbon");

  return 0;
}

Sequences of objects can be stored as well. The following example shows how to attach a sequence of atoms to an OEAtomBase object.

Listing 4: Example of attaching a vector of atoms as generic data

#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>

using namespace std;
using namespace OESystem;
using namespace OEChem;

void CollectIncorrectStereo(OEMolBase& mol)
{
  const auto tag = OEGetTag("incorrect_stereo_neighs");
  for (OEIter<OEAtomBase> atom = mol.GetAtoms(OEHasAtomStereoSpecified()); atom; ++atom)
  {
    if (atom->IsAromatic())
    {
      vector<OEAtomBase*> neighs;
      for (OEIter<OEAtomBase> n = atom->GetAtoms(); n; ++n)
        neighs.push_back(n);
      atom->SetData(tag, neighs);
    }
  }
}

void RemoveIncorrectStereo(OEMolBase& mol)
{
  const auto tag = OEGetTag("incorrect_stereo_neighs");
  for (OEIter<OEAtomBase> atom = mol.GetAtoms(); atom; ++atom)
  {
    if (atom->HasData(tag))
    {
      const vector<OEAtomBase*>& neighs =
        atom->GetData<vector<OEAtomBase*> >(tag);
      atom->SetStereo(neighs, OEAtomStereo::Tetrahedral, OEAtomStereo::Undefined);
    }
  }
}

int main()
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1c[n@@H]cc1");
  cout << OEMolToSmiles(mol) << endl;

  CollectIncorrectStereo(mol);
  RemoveIncorrectStereo(mol);
  cout << OEMolToSmiles(mol) << endl;

  return 0;
}

Note

Generic data attached to a molecule, or to any of its atoms or bonds, is automatically saved when the molecule is written to an .oeb file.