Molecules¶
The OEGraphMol is the object representing a molecule used in most example programs you will find in OEChem TK’s example directories, or in the code examples of this manual. An OEGraphMol is a concrete class which can be declared and used for most molecular functions in OEChem TK. Much of an OEGraphMol’s API is defined by the OEMolBase abstract base-class. An OEGraphMol can be passed to any function which takes an OEMolBase argument.
See also
An OEGraphMol contains atoms and bonds. Their access is discussed in the chapter Atom and Bond Traversal.
Construction and Destruction¶
OEChem TK molecules use C++ constructors and destructors, allowing
them to be defined and used much like normal variables. The
following example represents the smallest possible OEChem TK
program. It creates an OEGraphMol named mol when the program is run,
and destroys it automatically when the program finishes.
Create a molecule
#include <openeye.h>
#include <oechem.h>
using namespace OEChem;
int main()
{
OEGraphMol mol;
return 0;
}
By using C++ constructors and destructors there is no need to explicitly call a function to allocate and initialize the molecule. There is also no need to explicitly de-initialize and destroy it when we’re done.
Of course, there may be times when it is necessary to create and
destroy molecules dynamically. This is possible using C++’s
new and delete operators to allocate and free a molecule.
Allocate memory for a molecule
#include <openeye.h>
#include <oechem.h>
using namespace OEChem;
int main()
{
OEGraphMol *ptr;
ptr = new OEGraphMol;
delete ptr;
return 0;
}
Note
An OEGraphMol is essentially a smart pointer
around any arbitrary OEMolBase
implementation. If dynamic allocation is truly needed, it is
often more beneficial to use the factory function OENewMolBase.
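The difference between automatic and dynamic lifetime can be sketched without OEChem TK at all. The following is a minimal illustration, assuming a hypothetical stand-in type ToyMol (it is not an OEChem TK class) that only counts how many instances are alive. It shows that an object with automatic storage is destroyed when its scope ends, and that std::unique_ptr gives a heap-allocated object the same automatic cleanup that a raw new/delete pair leaves to the programmer.

```cpp
#include <memory>

// Hypothetical stand-in for a molecule class; NOT part of OEChem TK.
// It only tracks how many instances are currently alive.
struct ToyMol
{
  static int live;
  ToyMol() { ++live; }
  ToyMol(const ToyMol &) { ++live; }
  ~ToyMol() { --live; }
};
int ToyMol::live = 0;

// Returns the number of live objects observed inside the scope when one
// object has automatic storage and one is heap-allocated.
int LiveInsideScope()
{
  ToyMol onStack;                              // constructed here
  std::unique_ptr<ToyMol> onHeap(new ToyMol);  // owned heap object
  return ToyMol::live;                         // both are alive at this point
}  // onHeap deletes its object, then onStack is destroyed
```

After LiveInsideScope returns, ToyMol::live is back to zero without any explicit delete. Whether this pattern or the OENewMolBase factory is the better fit depends on the application.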
Construction from SMILES¶
A common method of creating a molecule in OEChem TK is via the SMILES
representation. SMILES notation is commonly used in chemical
information systems, as it provides a convenient string representation
of a molecule. An introduction to SMILES syntax is provided in the chapter
SMILES Line Notation. The following examples use the SMILES
c1ccccc1, which describes the molecule benzene. A molecule can be
created from a SMILES string using the OESmilesToMol
function. Similarly, a molecule can be created from a CXSMILES string
using the OECXSMILESToMol function.
Creating a molecule from a SMILES string (version 1)
#include <openeye.h>
#include <oechem.h>
using namespace OEChem;
int main()
{
// create a new molecule
OEGraphMol mol;
// convert the SMILES string into a molecule
OESmilesToMol(mol,"c1ccccc1");
return 0;
}
The OESmilesToMol
function returns a boolean value
indicating whether the input string was a valid SMILES representation
of a molecule. It is good programming practice to check the return
value and report an error message if anything went wrong.
The following example shows adding a check on the return status of
OESmilesToMol
and printing an error message if the
string was not a valid SMILES representation of a molecule.
Creating a molecule from a SMILES string (version 2)
#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>
using namespace OESystem;
using namespace OEChem;
int main()
{
OEGraphMol mol;
if (OESmilesToMol(mol, "c1ccccc1"))
{
// do something with the molecule
}
else
OEThrow.Warning("SMILES string was invalid!");
return 0;
}
The OESmilesToMol function is considered a high-level function.
In addition to parsing the given SMILES string, the
OESmilesToMol function also perceives:
- the rings of the molecule, by invoking the OEFindRingAtomsAndBonds function
- the aromaticity of the molecule, by calling the OEAssignAromaticFlags function using the OEChem::OEAroModelOpenEye aromaticity model
- the chirality of the molecule, by calling the OEPerceiveChiral function
In cases where you want to preserve the aromaticity of the SMILES string (or the lack of it),
the low-level OEParseSmiles function can be used.
For example, if benzene is expressed as c1ccccc1, all atoms and bonds
are marked as aromatic. But if it is expressed in a Kekulé form,
C1=CC=CC=C1, all atoms and bonds are kept aliphatic.
The aromaticity of the molecule can then be perceived by calling the
OEAssignAromaticFlags function.
Creating molecules from a SMILES string (version 3)
#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>
using namespace OESystem;
using namespace OEChem;
int main()
{
OEGraphMol mol;
if (!OEParseSmiles(mol, "C1=CC=CC=C1"))
OEThrow.Warning("SMILES string was invalid!");
std::cout << "Number of aromatic atoms: " << OECount(mol, OEIsAromaticAtom()) << std::endl;
OEAssignAromaticFlags(mol);
std::cout << "Number of aromatic atoms: " << OECount(mol, OEIsAromaticAtom()) << std::endl;
return 0;
}
The output of the preceding program is the following:
Number of aromatic atoms: 0
Number of aromatic atoms: 6
Hint
We highly recommend the use of the OESmilesToMol
function when creating a molecule from a SMILES string.
We highly recommend the use of the OECXSMILESToMol
function when creating a molecule from either a SMILES or CXSMILES string
and the specific format is not known.
See also
Aromaticity Perception chapter for further information about aromaticity models.
Reuse¶
Consider the following code examples to parse two separate SMILES strings, benzene and phenol, and print the number of heavy atoms in each.
Reusing a molecule (OESmilesToMol)
#include <openeye.h>
#include <oechem.h>
using namespace OEChem;
int main()
{
OEGraphMol mol;
OESmilesToMol(mol, "c1ccccc1");
std::cout << "Number of benzene atoms: " << mol.NumAtoms() << std::endl;
OESmilesToMol(mol, "c1ccccc1O");
std::cout << "Number of phenol atoms: " << mol.NumAtoms() << std::endl;
return 0;
}
The high-level OESmilesToMol
function automatically
clears the molecule before parsing the SMILES string.
The output of the preceding program is the following:
Number of benzene atoms: 6
Number of phenol atoms: 7
Reusing a molecule (OEParseSmiles)
#include <openeye.h>
#include <oechem.h>
using namespace OEChem;
int main()
{
OEGraphMol mol;
OEParseSmiles(mol, "c1ccccc1");
std::cout << "Number of benzene atoms: " << mol.NumAtoms() << std::endl;
OEParseSmiles(mol, "c1ccccc1O");
std::cout << "Number of phenol atoms: " << mol.NumAtoms() << std::endl;
return 0;
}
In the second example the low-level OEParseSmiles
function is called.
The output of the preceding program is the following:
Number of benzene atoms: 6
Number of phenol atoms: 13
The second line, Number of phenol atoms: 13, will be surprising to
some. The behavior of the OEParseSmiles function is to
add the given SMILES to the current molecule. OEChem TK provides a
mechanism for reusing a molecule by calling the Clear method.
Clear deletes all atoms and bonds
of a molecule, thereby resetting a molecule into its original “empty”
state.
Clearing and reusing a molecule (OEParseSmiles)
#include <openeye.h>
#include <oechem.h>
using namespace OEChem;
int main()
{
OEGraphMol mol;
OEParseSmiles(mol, "c1ccccc1");
std::cout << "Number of benzene atoms: " << mol.NumAtoms() << std::endl;
mol.Clear();
OEParseSmiles(mol, "c1ccccc1O");
std::cout << "Number of phenol atoms: " << mol.NumAtoms() << std::endl;
return 0;
}
The output of the preceding program is the following:
Number of benzene atoms: 6
Number of phenol atoms: 7
Using the Clear
method is
recommended, for example, when processing multiple molecules
sequentially in a database. Instead of requiring a new molecule to be
allocated and destroyed for each entry, the
Clear
method can be used to reset
a molecule to its initial “empty” state.
Unique Representation¶
It is sometimes useful to generate a unique representation of a
molecule for use as an identifier for a database key. The compact
nature of SMILES strings makes them ideal candidates for the
task. However, the same molecule can be represented by many different
SMILES strings. OEChem TK features an advanced algorithm for
generating a (unique) canonical isomeric SMILES string.
A canonical isomeric SMILES string can be generated from a molecule
by calling the OEMolToSmiles
or OEMolToCXSMILES
functions.
Creating a canonical isomeric SMILES string from a molecule
#include <openeye.h>
#include <oechem.h>
using namespace std;
using namespace OEChem;
int main()
{
OEGraphMol mol;
OESmilesToMol(mol, "C1=CC=CC=C1");
cout << "Canonical isomeric SMILES is " << OEMolToSmiles(mol) << endl;
return 0;
}
The output of the preceding program is the following:
Canonical isomeric SMILES is c1ccccc1
The following slightly more complicated example reads SMILES from standard input and writes the corresponding canonical isomeric SMILES to standard output.
Creating canonical isomeric SMILES strings
#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>
using namespace std;
using namespace OESystem;
using namespace OEChem;
int main()
{
string buffer;
while (getline(cin, buffer))
{
OEGraphMol mol;
if (OESmilesToMol(mol, buffer))
cout << OEMolToSmiles(mol) << endl;
else
OEThrow.Warning("%s is an invalid SMILES!", buffer.c_str());
}
return 0;
}
input | output (canonical isomeric SMILES)
---|---
C1CCCN[C@@H]1(O) | C1CCN[C@@H](C1)O
C1CN[C@H](O)CC1 | C1CCN[C@@H](C1)O
C1CC[C@H](O)CC1 | C1CCC(CC1)O
C1CCC(O)CC1 | C1CCC(CC1)O
C1=NC=CN1C[C@H](N)C(=O)O | c1cn(cn1)C[C@@H](C(=O)O)N
The OEMolToSmiles and OEMolToCXSMILES functions
are considered high-level functions.
Prior to creating the canonical isomeric SMILES, the
OEMolToSmiles function perceives the following properties
if necessary:
- the rings of the molecule, by using OEFindRingAtomsAndBonds
- the aromaticity of the molecule, by calling the OEAssignAromaticFlags function using the OEChem::OEAroModelOpenEye aromaticity model
- the atom and bond stereochemistry
It is also possible to generate canonical SMILES without
isomeric information by using the OECreateCanSmiString
low-level function.
As was shown in the Construction from SMILES section,
OEParseSmiles
preserves the aromaticity present in
the input SMILES string. The function
OEAssignAromaticFlags
has to be used to perceive
aromaticity in a molecule.
Creating canonical SMILES strings
#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>
using namespace std;
using namespace OESystem;
using namespace OEChem;
int main()
{
OEGraphMol mol;
string buffer;
while (getline(cin, buffer))
{
mol.Clear();
if (OEParseSmiles(mol, buffer))
{
OEAssignAromaticFlags(mol);
OECreateCanSmiString(buffer, mol);
cout << buffer << endl;
}
else
OEThrow.Warning("%s is an invalid SMILES!", buffer.c_str());
}
return 0;
}
Notice that the preceding program does not construct and destruct
molecules each time through the loop, but rather uses the
Clear
function to reuse the
molecule. If the line mol.Clear()
were removed from the program,
the output would contain longer and longer SMILES containing
disconnected fragments. See section
Reuse for more details.
input | output (canonical SMILES)
---|---
c1cccnc1(O) | c1ccnc(c1)O
C1=CC=CC=C1 | c1ccccc1
C1=CN=CC=C1 | c1ccncc1
C1=CC=CC=N1 | c1ccncc1
C1=NC=CN1CCC(=O)O | c1cn(cn1)CCC(=O)O
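The table above lists two different Kekulé inputs for pyridine that collapse to the same canonical string, which is what makes canonical SMILES usable as a database key. The following is a minimal sketch of that idea in standard C++ only. GroupByCanonicalKey and TableCanonicalizer are hypothetical names introduced for illustration. The canonicalizer is passed in as a callable, and here it is just a lookup of the pairs from the table. In a real program it would presumably wrap the toolkit's SMILES parsing and canonical SMILES generation.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Groups input strings by a canonical key supplied by the caller.
// Returns a map from canonical key to the inputs that produced that key.
std::map<std::string, std::vector<std::string>>
GroupByCanonicalKey(
    const std::vector<std::string> &inputs,
    const std::function<std::string(const std::string &)> &canonicalize)
{
  std::map<std::string, std::vector<std::string>> groups;
  for (const std::string &smi : inputs)
    groups[canonicalize(smi)].push_back(smi);
  return groups;
}

// Illustration-only canonicalizer (an assumption, not a toolkit call):
// a lookup of the input/output pairs listed in the table.
// Inputs that are not in the table are returned unchanged.
std::string TableCanonicalizer(const std::string &smi)
{
  static const std::map<std::string, std::string> table = {
      {"c1cccnc1(O)", "c1ccnc(c1)O"},
      {"C1=CC=CC=C1", "c1ccccc1"},
      {"C1=CN=CC=C1", "c1ccncc1"},
      {"C1=CC=CC=N1", "c1ccncc1"},
      {"C1=NC=CN1CCC(=O)O", "c1cn(cn1)CCC(=O)O"}};
  const auto it = table.find(smi);
  return it == table.end() ? smi : it->second;
}
```

Calling GroupByCanonicalKey with the five inputs from the table yields four groups, with both pyridine spellings stored under the single key c1ccncc1.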
Hint
We highly recommend the usage of the OEMolToSmiles
function when creating a SMILES string.
We highly recommend the usage of the OEMolToCXSMILES
function when structures may contain enhanced stereogroup information.
InChI¶
Canonical SMILES is not the only unique representation available.
The IUPAC International Chemical Identifier (InChI), and its corresponding
hashkey representation (InChIKey) are also unique to the compound
they describe [InChI-2013].
InChIs can be created from molecules using the OECreateInChI,
OEMolToInChI, or OEMolToSTDInChI functions.
Creating standard InChI
#include <openeye.h>
#include <oechem.h>
using namespace std;
using namespace OEChem;
int main()
{
OEGraphMol mol;
OESmilesToMol(mol, "c1ccnc(c1)O");
const string inchi = OEMolToSTDInChI(mol);
cout << inchi << endl;
return 0;
}
input (SMILES) | output (Standard InChI)
---|---
c1ccnc(c1)O | InChI=1S/C5H5NO/c7-5-3-1-2-4-6-5/h1-4H,(H,6,7)
Note
The ‘S’ in ‘InChI=1S’ denotes standard InChI.
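As a small illustration of this note, the prefix can be inspected with standard C++ string operations. The helper below, HasStandardInChIPrefix, is a hypothetical name and not an OEChem TK function. It looks only at the leading "InChI=1S/" and makes no attempt to validate the remaining layers.

```cpp
#include <string>

// Returns true if the string starts with the standard-InChI prefix "InChI=1S/".
// Only the prefix is checked; the rest of the string is not validated.
bool HasStandardInChIPrefix(const std::string &inchi)
{
  static const std::string prefix = "InChI=1S/";
  return inchi.compare(0, prefix.size(), prefix) == 0;
}
```

For the standard InChI shown in the table above the helper returns true. For a string that begins with "InChI=1/" (no 'S'), it returns false.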
The following slightly more complicated example reads InChI strings from standard input and writes InChI strings to standard output.
Reading and writing InChI strings
#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>
using namespace std;
using namespace OESystem;
using namespace OEChem;
int main()
{
string buffer;
while (getline(cin, buffer))
{
OEGraphMol mol;
if (OEInChIToMol(mol, buffer))
cout << OEMolToInChI(mol) << endl;
else
OEThrow.Warning("%s is an invalid InChI!", buffer.c_str());
}
return 0;
}
A nonstandard InChI can be generated by passing in an OEInChIOptions
object to the OECreateInChI
function.
The options available are documented in the OEInChIOptions class.
Creating nonstandard InChI strings
#include <openeye.h>
#include <oechem.h>
using namespace std;
using namespace OEChem;
int main()
{
OEGraphMol mol;
OESmilesToMol(mol, "c1ccnc(c1)O");
OEInChIOptions opts;
const bool fixedH = true;
opts.SetFixedHLayer(fixedH);
string inchi;
OECreateInChI(inchi, mol, opts);
cout << inchi << endl;
return 0;
}
input (SMILES) | output (nonstandard InChI)
---|---
c1ccnc(c1)O | InChI=1/C5H5NO/c7-5-3-1-2-4-6-5/h1-4H,(H,6,7)/f/h7H
The 27-character InChIKey consists of three parts connected by hyphens. The first part is 14 characters long and is based on the connectivity and proton layers of an InChI string. The second part is 10 characters long: the first 8 characters are related to all other InChI layers (isotopes, stereochemistry, etc.), and the last two characters indicate whether the InChI is standard or nonstandard and the version of InChI. The third part is a single letter describing the (de)protonation layer of the original InChI.
Creating a standard InChIKey
#include <openeye.h>
#include <oechem.h>
using namespace std;
using namespace OEChem;
int main()
{
OEGraphMol mol;
OESmilesToMol(mol, "c1ccnc(c1)O");
const string inchikey = OEMolToSTDInChIKey(mol);
cout << inchikey << endl;
return 0;
}
input (SMILES) | output (InChIKey)
---|---
c1ccnc(c1)O | UBQKCCHYAOITMY-UHFFFAOYSA-N
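The hyphen-separated layout of an InChIKey can be checked with plain string handling. The sketch below assumes the layout visible in the example key above: a 14-character block, a 10-character block (UHFFFAOYSA), and a single final character, 27 characters in total. InChIKeyParts, SplitInChIKey, and IsStandardKey are hypothetical names, not OEChem TK functions. The check of 'S' as the standard flag in the ninth character of the second block follows the InChIKey format as generally documented and should be treated as an assumption here.

```cpp
#include <string>

// The three hyphen-separated blocks of an InChIKey.
struct InChIKeyParts
{
  std::string skeleton;     // 14 characters
  std::string remainder;    // 10 characters: 8 hash characters + flag + version
  std::string protonation;  // 1 character
  bool valid = false;       // false if the overall layout did not match
};

// Splits a key laid out as 14 chars, '-', 10 chars, '-', 1 char.
InChIKeyParts SplitInChIKey(const std::string &key)
{
  InChIKeyParts parts;
  if (key.size() != 27 || key[14] != '-' || key[25] != '-')
    return parts;  // layout mismatch: parts.valid stays false
  parts.skeleton = key.substr(0, 14);
  parts.remainder = key.substr(15, 10);
  parts.protonation = key.substr(26, 1);
  parts.valid = true;
  return parts;
}

// True when the ninth character of the second block is 'S' (standard flag).
bool IsStandardKey(const InChIKeyParts &parts)
{
  return parts.valid && parts.remainder[8] == 'S';
}
```

Applied to UBQKCCHYAOITMY-UHFFFAOYSA-N, this yields the blocks UBQKCCHYAOITMY, UHFFFAOYSA, and N, and IsStandardKey reports true. Only the layout is checked, so a well-formed but chemically meaningless key would also pass.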