Molecules

The OEGraphMol is the object representing a molecule used in most example programs you will find in OEChem TK’s example directories, or in the code examples of this manual. An OEGraphMol is a concrete class which can be declared and used for most molecular functions in OEChem TK. Much of an OEGraphMol’s API is defined by the OEMolBase abstract base-class. An OEGraphMol can be passed to any function which takes an OEMolBase argument.

See also

An OEGraphMol contains atoms and bonds. Their access is discussed in chapter Atom and Bond Traversal.

Construction and Destruction

OEChem TK molecules use C++ constructors and destructors, allowing them to be defined and used much like normal variables. The following example represents the smallest possible OEChem TK program. It creates a molecule of type OEGraphMol, called mol, when the program is run, and destroys it automatically when the program finishes.

Create a molecule

#include <openeye.h>
#include <oechem.h>

using namespace OEChem;

int main()
{
  OEGraphMol mol;
  return 0;
}

By using C++ constructors and destructors there is no need to explicitly call a function to allocate and initialize the molecule. There is also no need to explicitly de-initialize and destroy it when we’re done.

Of course, there may be times when it is necessary to create and destroy molecules dynamically. This is possible using C++’s new and delete operators to allocate a molecule.

Allocate memory for a molecule

#include <openeye.h>
#include <oechem.h>

using namespace OEChem;

int main()
{
  OEGraphMol *ptr;

  ptr = new OEGraphMol;
  delete ptr;

  return 0;
}

Note

An OEGraphMol is essentially a smart pointer around any arbitrary OEMolBase implementation. If dynamic allocation is truly needed it is often more beneficial to use the factory function OENewMolBase.
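As a rough illustration of the ownership point in the note above, the following sketch wraps a factory-returned pointer in a std::unique_ptr so that delete never has to be written by hand. ToyMol and NewToyMol are hypothetical stand-ins, not OEChem TK types; with OEChem TK the factory would presumably be OENewMolBase, whose exact arguments should be taken from the API reference.

```cpp
#include <memory>

// Hypothetical stand-in for a molecule implementation; it only tracks
// how many instances are currently alive.
struct ToyMol
{
  static int &Live() { static int n = 0; return n; }
  ToyMol()  { ++Live(); }
  ~ToyMol() { --Live(); }
  ToyMol(const ToyMol &) = delete;
  ToyMol &operator=(const ToyMol &) = delete;
};

// Hypothetical factory returning an owning raw pointer, in the spirit of
// a factory function such as OENewMolBase (assumption, not the real API).
inline ToyMol *NewToyMol() { return new ToyMol; }

// Takes ownership immediately, so the object is destroyed when the
// unique_ptr goes out of scope, even if an exception is thrown.
inline std::unique_ptr<ToyMol> MakeOwnedToyMol()
{
  return std::unique_ptr<ToyMol>(NewToyMol());
}

// Returns the live-instance count observed while one molecule is owned.
inline int LiveCountInsideScope()
{
  std::unique_ptr<ToyMol> mol = MakeOwnedToyMol();
  return ToyMol::Live();
}
```

After LiveCountInsideScope returns, the live count is back to zero without any explicit delete, which is the same guarantee a stack-allocated OEGraphMol gives.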

Construction from SMILES

A common method of creating a molecule in OEChem TK is via the SMILES representation. SMILES notation is commonly used in chemical information systems, as it provides a convenient string representation of a molecule. An introduction to SMILES syntax is provided in chapter SMILES Line Notation. The following examples will use the SMILES c1ccccc1, which describes the molecule benzene. A molecule can be created from a SMILES string using the OESmilesToMol function. Similarly, a molecule can be created from a CXSMILES string using the OECXSMILESToMol function.

Creating a molecule from a SMILES string (version 1)

#include <openeye.h>
#include <oechem.h>

using namespace OEChem;

int main()
{
  // create a new molecule
  OEGraphMol mol;

  // convert the SMILES string into a molecule
  OESmilesToMol(mol,"c1ccccc1");

  return 0;
}

The OESmilesToMol function returns a boolean value indicating whether the input string was a valid SMILES representation of a molecule. It is good programming practice to check the return value and report an error message if anything went wrong. The following example shows adding a check on the return status of OESmilesToMol and printing an error message if the string was not a valid SMILES representation of a molecule.

Creating a molecule from a SMILES string (version 2)

#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>

using namespace OESystem;
using namespace OEChem;

int main()
{
  OEGraphMol mol;

  if (OESmilesToMol(mol, "c1ccccc1"))
  {
    // do something with the molecule
  }
  else
    OEThrow.Warning("SMILES string was invalid!");

  return 0;
}

OESmilesToMol is considered a high-level function. In addition to parsing the given SMILES string, it also perceives properties of the resulting molecule, such as its aromaticity.

In cases where you want to preserve the aromaticity of the SMILES string (or the lack of it), the low-level OEParseSmiles function can be used. For example, if benzene is expressed as c1ccccc1, all atoms and bonds are marked as aromatic. But if it is expressed in a Kekulé form, C1=CC=CC=C1, all atoms and bonds are kept aliphatic. The aromaticity of the molecule can then be perceived by calling the OEAssignAromaticFlags function.

Creating molecules from a SMILES string (version 3)

#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>

using namespace OESystem;
using namespace OEChem;

int main()
{
  OEGraphMol mol;

  if (!OEParseSmiles(mol, "C1=CC=CC=C1"))
    OEThrow.Warning("SMILES string was invalid!");

  std::cout << "Number of aromatic atoms: " << OECount(mol, OEIsAromaticAtom()) << std::endl;
  OEAssignAromaticFlags(mol);
  std::cout << "Number of aromatic atoms: " << OECount(mol, OEIsAromaticAtom()) << std::endl;

  return 0;
}

The output of the preceding program is the following:

Number of aromatic atoms: 0
Number of aromatic atoms: 6

Hint

We highly recommend the use of the OESmilesToMol function when creating a molecule from a SMILES string.

We highly recommend the use of the OECXSMILESToMol function when creating a molecule from either a SMILES or CXSMILES string and the specific format is not known.

Reuse

Consider the following code examples to parse two separate SMILES strings, benzene and phenol, and print the number of heavy atoms in each.

Reusing a molecule (OESmilesToMol)

#include <openeye.h>
#include <oechem.h>

using namespace OEChem;

int main()
{
  OEGraphMol mol;

  OESmilesToMol(mol, "c1ccccc1");
  std::cout << "Number of benzene atoms: " << mol.NumAtoms() << std::endl;

  OESmilesToMol(mol, "c1ccccc1O");
  std::cout << "Number of phenol atoms: " << mol.NumAtoms() << std::endl;

  return 0;
}

The high-level OESmilesToMol function automatically clears the molecule before parsing the SMILES string. The output of the preceding program is the following:

Number of benzene atoms: 6
Number of phenol atoms: 7

Reusing a molecule (OEParseSmiles)

#include <openeye.h>
#include <oechem.h>

using namespace OEChem;

int main()
{
  OEGraphMol mol;

  OEParseSmiles(mol, "c1ccccc1");
  std::cout << "Number of benzene atoms: " << mol.NumAtoms() << std::endl;

  OEParseSmiles(mol, "c1ccccc1O");
  std::cout << "Number of phenol atoms: " << mol.NumAtoms() << std::endl;

  return 0;
}

In the second example the low-level OEParseSmiles function is called. The output of the preceding program is the following:

Number of benzene atoms: 6
Number of phenol atoms: 13

The second line, Number of phenol atoms: 13, will be surprising to some. The behavior of the OEParseSmiles function is to add the given SMILES to the current molecule. OEChem TK provides a mechanism for reusing a molecule by calling the Clear method. Clear deletes all atoms and bonds of a molecule, thereby resetting a molecule into its original “empty” state.

Clearing and reusing a molecule (OEParseSmiles)

#include <openeye.h>
#include <oechem.h>

using namespace OEChem;

int main()
{
  OEGraphMol mol;

  OEParseSmiles(mol, "c1ccccc1");
  std::cout << "Number of benzene atoms: " << mol.NumAtoms() << std::endl;

  mol.Clear();

  OEParseSmiles(mol, "c1ccccc1O");
  std::cout << "Number of phenol atoms: " << mol.NumAtoms() << std::endl;

  return 0;
}

The output of the preceding program is the following:

Number of benzene atoms: 6
Number of phenol atoms: 7

Using the Clear method is recommended, for example, when processing multiple molecules sequentially in a database. Instead of requiring a new molecule to be allocated and destroyed for each entry, the Clear method can be used to reset a molecule to its initial “empty” state.
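The clear-then-parse pattern can be captured in a small helper, sketched below under stated assumptions. ParseFresh is a hypothetical name, not part of OEChem TK; it works with any molecule type that has a Clear method and any parser callable. CountingMol and ToyAppendParse are toy stand-ins that only mimic the append behavior described above by counting one atom per letter.

```cpp
#include <cctype>
#include <string>

// Clears the molecule, then hands it to an append-style parser.
// Mol must provide Clear(); parse is any callable taking
// (Mol &, const std::string &) and returning bool.
template <typename Mol, typename Parser>
bool ParseFresh(Mol &mol, const std::string &smiles, Parser parse)
{
  mol.Clear();
  return parse(mol, smiles);
}

// Hypothetical stand-in molecule: it stores only an atom count.
struct CountingMol
{
  unsigned int natoms = 0;
  void Clear() { natoms = 0; }
  unsigned int NumAtoms() const { return natoms; }
};

// Hypothetical append-style parser: adds one atom per letter, which is
// enough to mimic the benzene/phenol counts for these simple strings.
inline bool ToyAppendParse(CountingMol &mol, const std::string &smiles)
{
  for (char c : smiles)
    if (std::isalpha(static_cast<unsigned char>(c)))
      ++mol.natoms;
  return !smiles.empty();
}
```

With OEChem TK, the same helper could presumably be called with an OEGraphMol and a lambda that forwards to OEParseSmiles, so the call to Clear cannot be forgotten inside a processing loop.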

Unique Representation

It is sometimes useful to generate a unique representation of a molecule for use as an identifier for a database key. The compact nature of SMILES strings makes them ideal candidates for the task. However, the same molecule can be represented by many different SMILES strings. OEChem TK features an advanced algorithm for generating a (unique) canonical isomeric SMILES string. A canonical isomeric SMILES string can be generated from a molecule by calling the OEMolToSmiles or OEMolToCXSMILES functions.

Creating a canonical isomeric SMILES string from a molecule

#include <openeye.h>
#include <oechem.h>

using namespace std;
using namespace OEChem;

int main()
{
  OEGraphMol mol;
  OESmilesToMol(mol, "C1=CC=CC=C1");
  cout << "Canonical isomeric SMILES is " << OEMolToSmiles(mol) << endl;

  return 0;
}

The output of the preceding program is the following:

Canonical isomeric SMILES is c1ccccc1

The following slightly more complicated example reads SMILES from standard input and writes the corresponding canonical isomeric SMILES to standard output.

Creating canonical isomeric SMILES strings

#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>

using namespace std;
using namespace OESystem;
using namespace OEChem;

int main()
{
  string buffer;
  while (getline(cin, buffer))
  {
    OEGraphMol mol;
    if (OESmilesToMol(mol, buffer))
      cout << OEMolToSmiles(mol) << endl;
    else
      OEThrow.Warning("%s is an invalid SMILES!", buffer.c_str());
  }

  return 0;
}

input                       output (canonical isomeric SMILES)
C1CCCN[C@@H]1(O)            C1CCN[C@@H](C1)O
C1CN[C@H](O)CC1             C1CCN[C@@H](C1)O
C1CC[C@H](O)CC1             C1CC[C@@H](CC1)O
C1CCC(O)CC1                 C1CCC(CC1)O
C1=NC=CN1C[C@H](N)C(=O)O    c1cn(cn1)C[C@@H](C(=O)O)N
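To illustrate the database-key idea, the following sketch groups input SMILES by their canonical form. GroupByCanonicalKey and the Canonicalizer type are hypothetical names introduced here. TableCanonicalizer is a stand-in hard-coded from three of the input/output pairs listed above; in a real program it would be replaced by a function that calls OESmilesToMol followed by OEMolToSmiles.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Maps an input SMILES to its canonical form; an empty result means failure.
using Canonicalizer = std::function<std::string(const std::string &)>;

// Groups the input SMILES by canonical key, so that different spellings
// of the same molecule end up in the same bucket.
inline std::unordered_map<std::string, std::vector<std::string>>
GroupByCanonicalKey(const std::vector<std::string> &inputs,
                    const Canonicalizer &canon)
{
  std::unordered_map<std::string, std::vector<std::string>> groups;
  for (const std::string &smi : inputs)
  {
    const std::string key = canon(smi);
    if (key.empty())
      continue;  // skip inputs that could not be canonicalized
    groups[key].push_back(smi);
  }
  return groups;
}

// Stand-in canonicalizer (assumption): a lookup table copied from the
// documented examples, used here instead of the real toolkit calls.
inline std::string TableCanonicalizer(const std::string &smi)
{
  static const std::unordered_map<std::string, std::string> table = {
    {"C1CCCN[C@@H]1(O)", "C1CCN[C@@H](C1)O"},
    {"C1CN[C@H](O)CC1", "C1CCN[C@@H](C1)O"},
    {"C1CCC(O)CC1", "C1CCC(CC1)O"},
  };
  const auto it = table.find(smi);
  return it == table.end() ? std::string() : it->second;
}
```

Passing the canonicalizer as a parameter keeps the grouping logic independent of the toolkit, which also makes it easy to exercise without a license or library installed.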

The OEMolToSmiles and OEMolToCXSMILES functions are considered high-level functions. Prior to creating the canonical isomeric SMILES, the OEMolToSmiles function perceives any properties it requires that have not already been perceived.

It is also possible to generate canonical SMILES without isomeric information by using the OECreateCanSmiString low-level function. As was shown in the Construction from SMILES section, OEParseSmiles preserves the aromaticity present in the input SMILES string. The function OEAssignAromaticFlags has to be used to perceive aromaticity in a molecule.

Creating canonical SMILES strings

#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>

using namespace std;
using namespace OESystem;
using namespace OEChem;

int main()
{
  OEGraphMol mol;
  string buffer;
  while (getline(cin, buffer))
  {
    mol.Clear();
    if (OEParseSmiles(mol, buffer))
    {
      OEAssignAromaticFlags(mol);
      OECreateCanSmiString(buffer, mol);
      cout << buffer << endl;
    }
    else
      OEThrow.Warning("%s is an invalid SMILES!", buffer.c_str());
  }

  return 0;
}

Notice that the preceding program does not construct and destruct a molecule each time through the loop, but rather uses the Clear method to reuse the molecule. If the line mol.Clear() were removed from the program, the output would contain longer and longer SMILES containing disconnected fragments. See section Reuse for more details.

input                output (canonical SMILES)
c1cccnc1(O)          c1ccnc(c1)O
C1=CC=CC=C1          c1ccccc1
C1=CN=CC=C1          c1ccncc1
C1=CC=CC=N1          c1ccncc1
C1=NC=CN1CCC(=O)O    c1cn(cn1)CCC(=O)O

Hint

We highly recommend the usage of the OEMolToSmiles function when creating a SMILES string.

We highly recommend the usage of the OEMolToCXSMILES function when structures may contain enhanced stereogroup information.

InChI

Canonical SMILES is not the only unique representation available. The IUPAC International Chemical Identifier (InChI), and its corresponding hashkey representation (InChIKey) are also unique to the compound they describe [InChI-2013]. InChIs can be created from molecules using the OECreateInChI, OEMolToInChI, or OEMolToSTDInChI functions.

Creating standard InChI

#include <openeye.h>
#include <oechem.h>

using namespace std;
using namespace OEChem;

int main()
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1ccnc(c1)O");

  const string inchi = OEMolToSTDInChI(mol);
  cout << inchi << endl;
  return 0;
}

input (SMILES)    output (Standard InChI)
c1ccnc(c1)O       InChI=1S/C5H5NO/c7-5-3-1-2-4-6-5/h1-4H,(H,6,7)

Note

The ‘S’ in ‘InChI=1S’ denotes standard InChI.

The following slightly more complicated example reads InChI strings from standard input and writes InChI strings to standard output.

Reading and writing InChI strings

#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>

using namespace std;
using namespace OESystem;
using namespace OEChem;

int main()
{
  string buffer;
  while (getline(cin, buffer))
  {
    OEGraphMol mol;
    if (OEInChIToMol(mol, buffer))
      cout << OEMolToInChI(mol) << endl;
    else
      OEThrow.Warning("%s is an invalid InChI!", buffer.c_str());
  }

  return 0;
}

A nonstandard InChI can be generated by passing in an OEInChIOptions object to the OECreateInChI function. The options available are documented in the OEInChIOptions class.

Creating nonstandard InChI strings

#include <openeye.h>
#include <oechem.h>

using namespace std;
using namespace OEChem;

int main()
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1ccnc(c1)O");

  OEInChIOptions opts;
  const bool fixedH = true;
  opts.SetFixedHLayer(fixedH);

  string inchi;
  OECreateInChI(inchi, mol, opts);
  cout << inchi << endl;

  return 0;
}

input (SMILES)    output (nonstandard InChI)
c1ccnc(c1)O       InChI=1/C5H5NO/c7-5-3-1-2-4-6-5/h1-4H,(H,6,7)/f/h7H
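Since the 'S' in 'InChI=1S' denotes a standard InChI, a program can tell the two kinds of string apart by inspecting the prefix. The helper below is a minimal sketch with a hypothetical name, IsStandardInChI. It assumes InChI version 1 and checks only the prefix, so it does not validate the rest of the string.

```cpp
#include <string>

// Returns true when the string starts with the standard InChI prefix
// "InChI=1S/". Only the prefix is inspected; the layers are not validated.
inline bool IsStandardInChI(const std::string &inchi)
{
  static const std::string prefix = "InChI=1S/";
  return inchi.compare(0, prefix.size(), prefix) == 0;
}
```

Applied to the two strings generated for c1ccnc(c1)O above, the check accepts the output of OEMolToSTDInChI and rejects the fixed-hydrogen output of OECreateInChI.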

The 27-character-long InChIKey is made of three parts connected by hyphens. The first part is 14 characters long and is based on the connectivity and proton layers of an InChI string. The second part contains 10 characters: the first eight are related to all other InChI layers (isotopes, stereochemistry, etc.), and the last two encode the standard/nonstandard property and the version of InChI. The third part is one letter, describing the (de)protonation layer of the original InChI.

Creating a standard InChIKey

#include <openeye.h>
#include <oechem.h>

using namespace std;
using namespace OEChem;

int main()
{
  OEGraphMol mol;
  OESmilesToMol(mol, "c1ccnc(c1)O");

  const string inchikey = OEMolToSTDInChIKey(mol);
  cout << inchikey << endl;
  return 0;
}

input (SMILES)    output (InChIKey)
c1ccnc(c1)O       UBQKCCHYAOITMY-UHFFFAOYSA-N
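The three-part layout of an InChIKey can be checked with plain string handling. The sketch below splits a key into blocks of 14, 10, and 1 characters (27 in total with the two hyphens) and reads the standard/nonstandard flag from the ninth character of the second block. InChIKeyParts, SplitInChIKey, and IsStandardInChIKey are hypothetical names, and the code checks only the layout, not the character set or the hash itself.

```cpp
#include <string>

// The three hyphen-separated blocks of an InChIKey.
struct InChIKeyParts
{
  std::string connectivity;  // first block, 14 characters
  std::string layers;        // second block, 10 characters
  char protonation = '\0';   // third block, 1 character
  bool valid = false;        // true only if the layout matched
};

// Splits a key of the form XXXXXXXXXXXXXX-XXXXXXXXXX-X.
// Returns parts.valid == false when the layout does not match.
inline InChIKeyParts SplitInChIKey(const std::string &key)
{
  InChIKeyParts parts;
  if (key.size() != 27 || key[14] != '-' || key[25] != '-')
    return parts;
  parts.connectivity = key.substr(0, 14);
  parts.layers = key.substr(15, 10);
  parts.protonation = key[26];
  parts.valid = true;
  return parts;
}

// 'S' in the ninth character of the second block marks a key derived
// from a standard InChI.
inline bool IsStandardInChIKey(const InChIKeyParts &parts)
{
  return parts.valid && parts.layers[8] == 'S';
}
```

For the key shown above, the blocks are UBQKCCHYAOITMY, UHFFFAOYSA, and N, and the flag character is 'S', consistent with the key having been produced by OEMolToSTDInChIKey.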