Molecules

The OEGraphMol is the molecule object used in most example programs in OEChem's example directories and in the code examples of this manual. It is a concrete class that can be declared and used with most molecular functions in OEChem. Much of the OEGraphMol API is defined by the OEMolBase abstract base class, so an OEGraphMol can be passed to any function that takes an OEMolBase argument.

See also

An OEGraphMol contains atoms and bonds. Their access is discussed in chapter Atom and Bond Traversal.

Construction and Destruction

OEChem molecules use C++ constructors and destructors, allowing them to be defined and used much like normal variables. The following example is the smallest possible OEChem program. It creates an OEGraphMol called mol when the program is run, and destroys it automatically when the program finishes.

Create a molecule

#include "openeye.h"
#include "oechem.h"

using namespace OEChem;

int main()
{
  OEGraphMol mol;
  return 0;
}

By using C++ constructors and destructors there is no need to explicitly call a function to allocate and initialize the molecule. There is also no need to explicitly de-initialize and destroy it when we’re done.

Of course, there may be times when it is necessary to create and destroy molecules dynamically. This is possible using C++’s new and delete operators to allocate a molecule.

Allocate memory for a molecule

#include "openeye.h"
#include "oechem.h"

using namespace OEChem;

int main()
{
  OEGraphMol *ptr;

  ptr = new OEGraphMol;
  delete ptr;

  return 0;
}

Note

An OEGraphMol is essentially a smart pointer around any arbitrary OEMolBase implementation. If dynamic allocation is truly needed it is often more beneficial to use the factory function OENewMolBase.
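
When dynamic allocation cannot be avoided, standard C++ smart pointers remove the need for an explicit delete. The following is a minimal sketch of that idea, not an OEChem recipe: ToyMol is a hypothetical stand-in class (it is not part of OEChem) that counts its live instances so the automatic cleanup can be observed. The assumption, not confirmed by this manual, is that the same pattern applies to a dynamically allocated OEGraphMol.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a molecule class; it is NOT part of OEChem.
// It counts live instances so that the cleanup can be observed.
struct ToyMol
{
  static int live;
  ToyMol()  { ++live; }
  ~ToyMol() { --live; }
};
int ToyMol::live = 0;

// Allocates a ToyMol dynamically and lets std::unique_ptr delete it.
// Returns the number of live instances seen while the pointer is in scope.
int LiveCountInsideScope()
{
  std::unique_ptr<ToyMol> ptr(new ToyMol);
  assert(ptr != nullptr);
  return ToyMol::live;  // the object is destroyed when ptr goes out of scope
}
```

After LiveCountInsideScope returns, ToyMol::live is back to zero even though delete was never written out.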

Construction from SMILES

A common method of creating a molecule in OEChem is via the SMILES representation. SMILES notation is commonly used in chemical information systems, as it provides a convenient string representation of a molecule. An introduction to SMILES syntax is provided in chapter SMILES Line Notation. The following examples will use the SMILES c1ccccc1 which describes the molecule benzene. A molecule can be created from a SMILES string using the OESmilesToMol function.

Creating a molecule from a SMILES string (version 1)

#include <openeye.h>
#include <oechem.h>

using namespace OEChem;

int main()
{
  // create a new molecule
  OEGraphMol mol;

  // convert the SMILES string into a molecule
  OESmilesToMol(mol,"c1ccccc1");

  return 0;
}

The OESmilesToMol function returns a boolean value indicating whether the input string was a valid SMILES representation of a molecule. It is good programming practice to check the return value and report a message if anything went wrong. The following example checks the return status of OESmilesToMol and prints a warning if the string was not a valid SMILES representation of a molecule.

Creating a molecule from a SMILES string (version 2)

#include <openeye.h>
#include <oechem.h>
#include <oesystem.h>

using namespace OEChem;
using namespace OESystem;

int main()
{
  OEGraphMol mol;

  if (OESmilesToMol(mol,"c1ccccc1"))
  {
    // do something with the molecule
  }
  else 
    OEThrow.Warning("SMILES string was invalid!");

  return 0;
}

OESmilesToMol is considered a high-level function. In addition to parsing the given SMILES string, it also perceives properties of the molecule, including its aromaticity.

When the aromaticity of the SMILES string (or the lack of it) needs to be preserved, the low-level OEParseSmiles function can be used instead. For example, if benzene is expressed as c1ccccc1, all atoms and bonds are marked as aromatic. But if it is expressed in a Kekulé form, C1=CC=CC=C1, all atoms and bonds are kept aliphatic. The aromaticity of the molecule can then be perceived by calling the OEAssignAromaticFlags function.

Creating molecules from a SMILES string (version 3)

#include <openeye.h>
#include <oechem.h>
#include <oesystem.h>

using namespace OEChem;
using namespace OESystem;

int main()
{
  OEGraphMol mol;

  if (!OEParseSmiles(mol,"C1=CC=CC=C1"))
    OEThrow.Warning("SMILES string was invalid!");

  std::cout << "Number of aromatic atoms: " << OECount(mol, OEIsAromaticAtom()) << std::endl;
  OEAssignAromaticFlags(mol);
  std::cout << "Number of aromatic atoms: " << OECount(mol, OEIsAromaticAtom()) << std::endl;

  return 0;
}

The output of the preceding program is the following:

Number of aromatic atoms: 0
Number of aromatic atoms: 6

Hint

We highly recommend using the OESmilesToMol function when creating a molecule from a SMILES string.

Reuse

Consider the following two code examples. Each parses two separate SMILES strings, benzene and phenol, and prints the number of heavy atoms in each.

Reusing a molecule (OESmilesToMol)

#include <openeye.h>
#include <oechem.h>

using namespace OEChem;

int main()
{
  OEGraphMol mol;

  OESmilesToMol(mol,"c1ccccc1");
  std::cout << "Number of benzene atoms: " << mol.NumAtoms() << std::endl;

  OESmilesToMol(mol,"c1ccccc1O");
  std::cout << "Number of phenol atoms: " << mol.NumAtoms() << std::endl;

  return 0;
}

The high-level OESmilesToMol function automatically clears the molecule before parsing the SMILES string. The output of the preceding program is the following:

Number of benzene atoms: 6
Number of phenol atoms: 7

Reusing a molecule (OEParseSmiles)

#include "openeye.h"
#include "oechem.h"

using namespace OEChem;

int main()
{
  OEGraphMol mol;

  OEParseSmiles(mol,"c1ccccc1");
  std::cout << "Number of benzene atoms: " << mol.NumAtoms() << std::endl;

  OEParseSmiles(mol,"c1ccccc1O");
  std::cout << "Number of phenol atoms: " << mol.NumAtoms() << std::endl;

  return 0;
}

In the second example the low-level OEParseSmiles function is called. The output of the preceding program is the following:

Number of benzene atoms: 6
Number of phenol atoms: 13

The second line, Number of phenol atoms: 13, may be surprising. OEParseSmiles adds the given SMILES to the current molecule rather than replacing it, so the 7 heavy atoms of phenol are added to the 6 benzene atoms already present. OEChem provides a mechanism for reusing a molecule by calling the Clear method. Clear deletes all atoms and bonds of a molecule, thereby resetting it to its original “empty” state.

Clearing and reusing a molecule (OEParseSmiles)

#include "openeye.h"
#include "oechem.h"

using namespace OEChem;

int main()
{
  OEGraphMol mol;

  OEParseSmiles(mol,"c1ccccc1");
  std::cout << "Number of benzene atoms: " << mol.NumAtoms() << std::endl;

  mol.Clear();

  OEParseSmiles(mol,"c1ccccc1O");
  std::cout << "Number of phenol atoms: " << mol.NumAtoms() << std::endl;

  return 0;
}

The output of the preceding program is the following:

Number of benzene atoms: 6
Number of phenol atoms: 7

Using the Clear method is recommended, for example, when processing multiple molecules sequentially in a database. Instead of requiring a new molecule to be allocated and destroyed for each entry, the Clear method can be used to reset a molecule to its initial “empty” state.

Unique Representation

It is sometimes useful to generate a unique representation of a molecule for use as an identifier, for example as a database key. The compact nature of SMILES strings makes them an ideal candidate for the task. However, the same molecule can be represented by many different SMILES strings. OEChem features an advanced algorithm for generating a (unique) canonical isomeric SMILES string. A canonical isomeric SMILES string can be generated from a molecule by calling the OEMolToSmiles function.

Creating a canonical isomeric SMILES string from a molecule

#include <openeye.h>
#include <oechem.h>

using namespace OEChem;
using namespace std;

int main()
{
  OEGraphMol mol;
  OESmilesToMol(mol, "C1=CC=CC=C1");

  cout << "Canonical isomeric SMILES is " << OEMolToSmiles(mol) << endl;
  return 0;
}

The output of the preceding program is the following:

Canonical isomeric SMILES is c1ccccc1

The following slightly more complicated example reads SMILES from standard input and writes the corresponding canonical isomeric SMILES to standard output.

Creating canonical isomeric SMILES strings

#include <openeye.h>
#include <oechem.h>
#include <oesystem.h>

using namespace OEChem;
using namespace OESystem;
using namespace std;

int main()
{
  string buffer;
  while (getline(cin, buffer))
  {
    OEGraphMol mol;
    if (OESmilesToMol(mol,buffer))
    {
      cout << OEMolToSmiles(mol) << endl;
    }
    else 
      OEThrow.Warning("%s is an invalid SMILES!", buffer.c_str());
  }
  return 0;
}

input                       output (canonical isomeric SMILES)
C1CCCN[C@@H]1(O)            C1CCN[C@@H](C1)O
C1CN[C@H](O)CC1             C1CCN[C@@H](C1)O
C1CC[C@H](O)CC1             C1CC[C@@H](CC1)O
C1CCC(O)CC1                 C1CCC(CC1)O
C1=NC=CN1C[C@H](N)C(=O)O    c1cn(cn1)C[C@@H](C(=O)O)N
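
The point of a canonical string as a database key can be sketched without OEChem. The following illustration hard-codes the input and output pairs from the table above, so that it compiles on its own. In a real program the canonical string would come from OEMolToSmiles. The function names TablePairs and CountUniqueKeys are hypothetical helpers introduced only for this sketch.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::string, std::string> > SmilesPairs;

// (input SMILES, canonical isomeric SMILES) pairs copied from the table above.
// In a real program the second member would be produced by OEMolToSmiles.
SmilesPairs TablePairs()
{
  return {
    {"C1CCCN[C@@H]1(O)",         "C1CCN[C@@H](C1)O"},
    {"C1CN[C@H](O)CC1",          "C1CCN[C@@H](C1)O"},
    {"C1CC[C@H](O)CC1",          "C1CC[C@@H](CC1)O"},
    {"C1CCC(O)CC1",              "C1CCC(CC1)O"},
    {"C1=NC=CN1C[C@H](N)C(=O)O", "c1cn(cn1)C[C@@H](C(=O)O)N"},
  };
}

// Counts distinct molecules by using the canonical string as the key.
std::size_t CountUniqueKeys(const SmilesPairs &pairs)
{
  std::set<std::string> keys;
  for (const auto &p : pairs)
    keys.insert(p.second);
  return keys.size();
}
```

The five input strings are all different, but the first two rows share one canonical form, so keying on the canonical string yields four distinct entries.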

OEMolToSmiles is also considered a high-level function. Before creating the canonical isomeric SMILES, it perceives the molecule properties it requires, if necessary.

It is also possible to generate canonical SMILES without isomeric information by using the low-level OECreateCanSmiString function. As was shown in the Construction from SMILES section, OEParseSmiles preserves the aromaticity present in the input SMILES string. The function OEAssignAromaticFlags has to be used to perceive aromaticity in a molecule.

Creating canonical SMILES strings

#include <openeye.h>
#include <oechem.h>
#include <oesystem.h>

using namespace OEChem;
using namespace OESystem;
using namespace std;

int main()
{
  OEGraphMol mol;
  string buffer;
  while (getline(cin,buffer))
  {
    mol.Clear();
    if (OEParseSmiles(mol,buffer))
    {
      OEAssignAromaticFlags(mol);
      OECreateCanSmiString(buffer,mol);
      cout << buffer << endl;
    }
    else 
      OEThrow.Warning("%s is an invalid SMILES!", buffer.c_str());
  }
  return 0;
}

Notice that the preceding program does not construct and destruct a molecule each time through the loop, but rather uses the Clear function to reuse the molecule. If the line mol.Clear() were removed from the program, the output would contain longer and longer SMILES containing disconnected fragments. See section Reuse for more details.

input                output (canonical SMILES)
c1cccnc1(O)          c1ccnc(c1)O
C1=CC=CC=C1          c1ccccc1
C1=CN=CC=C1          c1ccncc1
C1=CC=CC=N1          c1ccncc1
C1=NC=CN1CCC(=O)O    c1cn(cn1)CCC(=O)O

Hint

We highly recommend the usage of the OEMolToSmiles function when creating a SMILES string.

InChI

Canonical SMILES are not the only unique representation available. The IUPAC International Chemical Identifier (InChI) and its corresponding hashkey representation (InChIKey) are also unique to the compound they describe [InChI-2013]. InChIs can be created from molecules using the OECreateInChI function.

Creating standard InChI

#include <openeye.h>
#include <oechem.h>

using namespace OEChem;

int main()
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1ccnc(c1)O");
  
  std::string inchi;
  OECreateInChI(inchi, mol);
  std::cout << inchi << std::endl;
  return 0;
}

input (SMILES)    output (standard InChI)
c1ccnc(c1)O       InChI=1S/C5H5NO/c7-5-3-1-2-4-6-5/h1-4H,(H,6,7)

Note

The ‘S’ in ‘InChI=1S’ denotes standard InChI.

Non-standard InChI can be generated by passing in an OEInChIOptions object to the OECreateInChI function. Options available are documented in the OEInChIOptions class.

Creating non-standard InChI

#include <openeye.h>
#include <oechem.h>

using namespace OEChem;

int main()
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1ccnc(c1)O");
  
  OEInChIOptions opts;
  opts.SetFixedHLayer(true);

  std::string inchi;
  OECreateInChI(inchi, mol, opts);
  std::cout << inchi << std::endl;
  return 0;
}

input (SMILES)    output (non-standard InChI)
c1ccnc(c1)O       InChI=1/C5H5NO/c7-5-3-1-2-4-6-5/h1-4H,(H,6,7)/f/h7H
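
Since the two outputs above differ in their prefix (InChI=1S/ versus InChI=1/), a program can tell standard from non-standard InChI with a plain string test. The following is a small sketch using only the C++ standard library. IsStandardInChI is a hypothetical helper name, not an OEChem function.

```cpp
#include <string>

// Returns true if the string starts with the standard InChI prefix "InChI=1S/".
// This only inspects the prefix; it does not validate the rest of the string.
bool IsStandardInChI(const std::string &inchi)
{
  const std::string prefix = "InChI=1S/";
  return inchi.compare(0, prefix.size(), prefix) == 0;
}
```

Applied to the two strings shown in this section, the helper returns true for the standard InChI and false for the one generated with the fixed hydrogen layer.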

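A more compact form of InChI is the InChIKey, a 27-character representation. The first 14 characters are the result of hashing the InChI’s connectivity information. They are followed by a hyphen and a 10-character block: 8 characters hashing the remaining layers of the InChI, one character flagging standard or non-standard InChI, and one character for the InChI version. A second hyphen and a final character indicating protonation complete the key.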

Creating InChIKey

#include <openeye.h>
#include <oechem.h>

using namespace OEChem;

int main()
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1ccnc(c1)O");
  
  std::string inchikey;
  OECreateInChIKey(inchikey, mol);
  std::cout << inchikey << std::endl;
  return 0;
}

input (SMILES)    output (InChIKey)
c1ccnc(c1)O       UBQKCCHYAOITMY-UHFFFAOYSA-N
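
The fixed layout of an InChIKey (a 14-character block, a hyphen, a 10-character block, a hyphen, and one final character, 27 characters in total) makes it easy to take apart with standard string operations. The following sketch uses only the C++ standard library. InChIKeyBlocks and SplitInChIKey are hypothetical names introduced for illustration, not OEChem API.

```cpp
#include <string>

// The three hyphen-separated blocks of an InChIKey.
struct InChIKeyBlocks
{
  std::string connectivity;  // 14 characters
  std::string remaining;     // 10 characters
  std::string protonation;   // 1 character
};

// Splits a 27-character InChIKey into its blocks.
// Returns false if the length or the hyphen positions do not match the layout.
// The characters inside each block are not validated.
bool SplitInChIKey(const std::string &key, InChIKeyBlocks &out)
{
  if (key.size() != 27 || key[14] != '-' || key[25] != '-')
    return false;
  out.connectivity = key.substr(0, 14);
  out.remaining    = key.substr(15, 10);
  out.protonation  = key.substr(26, 1);
  return true;
}
```

For the key in the table above, the blocks are UBQKCCHYAOITMY, UHFFFAOYSA and N. Two molecules whose keys share the first block have the same hashed connectivity, which can be useful when searching for stereoisomers or other closely related forms.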