Molecules

The OEGraphMol is the object representing a molecule used in most example programs you will find in OEChem TK’s example directories, or in the code examples of this manual. An OEGraphMol is a concrete class which can be declared and used for most molecular functions in OEChem TK. Much of an OEGraphMol’s API is defined by the OEMolBase abstract base-class. An OEGraphMol can be passed to any function which takes an OEMolBase argument.

See also

An OEGraphMol contains atoms and bonds. Their access is discussed in chapter Atom and Bond Traversal.

Construction and Destruction

The example below is the smallest possible Python OEChem TK program. When run, it creates an OEGraphMol called mol. Python automatically cleans up the molecule once there are no more references to it, which happens at the latest when the program ends.

Create a molecule

from openeye import oechem

mol = oechem.OEGraphMol()

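There may be times when you want to delete (destroy) a molecule before the end of the script. This can be done with Python's built-in del statement, which removes the name mol; the molecule itself is destroyed once no other references to it remain.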

Destroy a molecule

from openeye import oechem

mol = oechem.OEGraphMol()
del mol
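The interaction between del and object lifetime can be sketched without OEChem TK. The class FakeMol below is a hypothetical stand-in for a molecule, not part of the toolkit, and the behavior shown is that of CPython's reference counting; it is only meant to illustrate that del removes a name, while the object lives on as long as another reference exists.

```python
import gc
import weakref


class FakeMol:
    """Hypothetical stand-in for a molecule object; not part of OEChem TK."""


mol = FakeMol()
alias = mol                # a second reference to the same object
probe = weakref.ref(mol)   # lets us observe the object without keeping it alive

del mol                    # removes the name 'mol' only
gc.collect()
alive_with_alias = probe() is not None

del alias                  # the last reference is gone now
gc.collect()
alive_without_refs = probe() is not None

print(alive_with_alias, alive_without_refs)
```

If a molecule has been stored in a list, a dictionary, or another variable, del on one name alone will not free it.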

Construction from SMILES

A common method of creating a molecule in OEChem TK is via the SMILES representation. SMILES notation is commonly used in chemical information systems, as it provides a convenient string representation of a molecule. An introduction to SMILES syntax is provided in chapter SMILES Line Notation. The following examples use the SMILES c1ccccc1, which describes the molecule benzene. A molecule can be created from a SMILES string using the OESmilesToMol function. Similarly, a molecule can be created from a CXSMILES string using the OECXSMILESToMol function.

Creating a molecule from a SMILES string (version 1)

from openeye import oechem

# create a new molecule
mol = oechem.OEGraphMol()

# convert the SMILES string into a molecule
oechem.OESmilesToMol(mol, "c1ccccc1")

The OESmilesToMol function returns a boolean value indicating whether the input string was a valid SMILES representation of a molecule. It is good programming practice to check the return value and report an error message if anything went wrong. The following example shows adding a check on the return status of OESmilesToMol and printing an error message if the string was not a valid SMILES representation of a molecule.

Creating a molecule from a SMILES string (version 2)

from openeye import oechem

# create a new molecule
mol = oechem.OEGraphMol()
# convert the SMILES string into a molecule
if oechem.OESmilesToMol(mol, "c1ccccc1"):
    # do something interesting with mol
    pass
else:
    print("SMILES string was invalid!")

OESmilesToMol is considered a high-level function. In addition to parsing the given SMILES string, it also perceives properties of the molecule, such as aromaticity.

In cases where you want to preserve the aromaticity of the SMILES string (or the lack of it), the low-level OEParseSmiles function can be used instead. For example, if benzene is expressed as c1ccccc1, all atoms and bonds are marked as aromatic. But if it is expressed in Kekulé form, C1=CC=CC=C1, all atoms and bonds are kept aliphatic. The aromaticity of the molecule can then be perceived by calling the OEAssignAromaticFlags function.

Creating molecules from a SMILES string (version 3)

from openeye import oechem

mol = oechem.OEGraphMol()
if not oechem.OEParseSmiles(mol, "C1=CC=CC=C1"):
    print("SMILES string was invalid!")

print("Number of aromatic atoms =", oechem.OECount(mol, oechem.OEIsAromaticAtom()))
oechem.OEAssignAromaticFlags(mol)
print("Number of aromatic atoms =", oechem.OECount(mol, oechem.OEIsAromaticAtom()))

The output of the preceding program is the following:

Number of aromatic atoms = 0
Number of aromatic atoms = 6

Hint

We highly recommend the use of the OESmilesToMol function when creating a molecule from a SMILES string.

We highly recommend the use of the OECXSMILESToMol function when creating a molecule from either a SMILES or CXSMILES string and the specific format is not known.

Reuse

Consider the following code examples to parse two separate SMILES strings, benzene and phenol, and print the number of heavy atoms in each.

Reusing a molecule (OESmilesToMol)

from openeye import oechem

mol = oechem.OEGraphMol()

oechem.OESmilesToMol(mol, "c1ccccc1")
print("Number of benzene atoms:", mol.NumAtoms())

oechem.OESmilesToMol(mol, "c1ccccc1O")
print("Number of phenol atoms:", mol.NumAtoms())

The high-level OESmilesToMol function automatically clears the molecule before parsing the SMILES string. The output of the preceding program is the following:

Number of benzene atoms: 6
Number of phenol atoms: 7

Reusing a molecule (OEParseSmiles)

from openeye import oechem

mol = oechem.OEGraphMol()

oechem.OEParseSmiles(mol, "c1ccccc1")
print("Number of benzene atoms:", mol.NumAtoms())

oechem.OEParseSmiles(mol, "c1ccccc1O")
print("Number of phenol atoms:", mol.NumAtoms())

In the second example the low-level OEParseSmiles function is called. The output of the preceding program is the following:

Number of benzene atoms: 6
Number of phenol atoms: 13

The second line, Number of phenol atoms: 13, will be surprising to some. The behavior of the OEParseSmiles function is to add the given SMILES to the current molecule, so after the second call the molecule holds the 6 atoms of benzene plus the 7 atoms of phenol. OEChem TK provides a mechanism for reusing a molecule by calling the Clear method. Clear deletes all atoms and bonds of a molecule, thereby resetting a molecule into its original “empty” state.

Clearing and reusing a molecule (OEParseSmiles)

from openeye import oechem

mol = oechem.OEGraphMol()

oechem.OEParseSmiles(mol, "c1ccccc1")
print("Number of benzene atoms:", mol.NumAtoms())

mol.Clear()

oechem.OEParseSmiles(mol, "c1ccccc1O")
print("Number of phenol atoms:", mol.NumAtoms())

The output of the preceding program is the following:

Number of benzene atoms: 6
Number of phenol atoms: 7

Using the Clear method is recommended, for example, when processing multiple molecules sequentially in a database. Instead of requiring a new molecule to be allocated and destroyed for each entry, the Clear method can be used to reset a molecule to its initial “empty” state.

Unique Representation

It is sometimes useful to generate a unique representation of a molecule for use as an identifier, for example as a database key. The compact nature of SMILES strings makes them ideal candidates for the task. However, the same molecule can be represented by many different SMILES strings. OEChem TK features an advanced algorithm for generating a (unique) canonical isomeric SMILES string. A canonical isomeric SMILES string can be generated from a molecule by calling the OEMolToSmiles or OEMolToCXSMILES functions.

Creating a canonical isomeric SMILES string from a molecule

from openeye import oechem

mol = oechem.OEGraphMol()
oechem.OESmilesToMol(mol, "C1=CC=CC=C1")

print("Canonical isomeric SMILES is", oechem.OEMolToSmiles(mol))

The output of the preceding program is the following:

Canonical isomeric SMILES is c1ccccc1
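As an illustration of the database-key use mentioned above, the following sketch groups input SMILES by a canonical key. The helper names canonical_smiles and group_by_key are hypothetical and not part of OEChem TK; only OEGraphMol, OESmilesToMol, and OEMolToSmiles are toolkit calls, used the same way as in the examples of this chapter. The import is kept inside the function so that the grouping logic can also be exercised with a different key function.

```python
from collections import defaultdict


def canonical_smiles(smiles):
    """Return the canonical isomeric SMILES for `smiles`, or None if invalid.

    Hypothetical helper: wraps OESmilesToMol and OEMolToSmiles.
    """
    from openeye import oechem  # local import: only needed when this is called

    mol = oechem.OEGraphMol()
    if not oechem.OESmilesToMol(mol, smiles):
        return None
    return oechem.OEMolToSmiles(mol)


def group_by_key(smiles_list, make_key=canonical_smiles):
    """Group input strings by the key that `make_key` returns for each.

    Inputs for which `make_key` returns None end up together under None.
    """
    groups = defaultdict(list)
    for smi in smiles_list:
        groups[make_key(smi)].append(smi)
    return dict(groups)
```

Given the output shown above, a call such as group_by_key(["C1=CC=CC=C1", "c1ccccc1"]) would be expected to place both inputs in a single group keyed by c1ccccc1.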

The following slightly more complicated example reads SMILES from standard input and writes the corresponding canonical isomeric SMILES to standard output.

Creating canonical isomeric SMILES strings

from openeye import oechem
import sys

for smi in sys.stdin:
    mol = oechem.OEGraphMol()
    smi = smi.strip()
    if oechem.OESmilesToMol(mol, smi):
        print(oechem.OEMolToSmiles(mol))
    else:
        oechem.OEThrow.Warning("%s is an invalid SMILES!" % smi)

input                       output (canonical isomeric SMILES)
C1CCCN[C@@H]1(O)            C1CCN[C@@H](C1)O
C1CN[C@H](O)CC1             C1CCN[C@@H](C1)O
C1CC[C@H](O)CC1             C1CC[C@@H](CC1)O
C1CCC(O)CC1                 C1CCC(CC1)O
C1=NC=CN1C[C@H](N)C(=O)O    c1cn(cn1)C[C@@H](C(=O)O)N

The OEMolToSmiles and OEMolToCXSMILES functions are considered high-level functions. Before creating the canonical isomeric SMILES, the OEMolToSmiles function perceives the molecular properties it needs, if necessary.

It is also possible to generate canonical SMILES without isomeric information by using the OECreateCanSmiString low-level function. As was shown in the Construction from SMILES section, OEParseSmiles preserves the aromaticity present in the input SMILES string. The function OEAssignAromaticFlags has to be used to perceive aromaticity in a molecule.

Creating canonical SMILES strings

from openeye import oechem
import sys

mol = oechem.OEGraphMol()
for smi in sys.stdin:
    mol.Clear()
    smi = smi.strip()
    if oechem.OEParseSmiles(mol, smi):
        oechem.OEAssignAromaticFlags(mol)
        print(oechem.OECreateCanSmiString(mol))
    else:
        oechem.OEThrow.Warning("%s is an invalid SMILES!" % smi)

Notice that the preceding program does not construct and destroy a molecule each time through the loop, but rather uses the Clear method to reuse the molecule. If the line mol.Clear() were removed from the program, the output would contain longer and longer SMILES containing disconnected fragments. See section Reuse for more details.

input                output (canonical SMILES)
c1cccnc1(O)          c1ccnc(c1)O
C1=CC=CC=C1          c1ccccc1
C1=CN=CC=C1          c1ccncc1
C1=CC=CC=N1          c1ccncc1
C1=NC=CN1CCC(=O)O    c1cn(cn1)CCC(=O)O

Hint

We highly recommend the usage of the OEMolToSmiles function when creating a SMILES string.

We highly recommend the usage of the OEMolToCXSMILES function when structures may contain enhanced stereogroup information.

InChI

Canonical SMILES is not the only unique representation available. The IUPAC International Chemical Identifier (InChI), and its corresponding hashkey representation (InChIKey) are also unique to the compound they describe [InChI-2013]. InChIs can be created from molecules using the OECreateInChI, OEMolToInChI, or OEMolToSTDInChI functions.

Creating standard InChI

from openeye import oechem

mol = oechem.OEGraphMol()
oechem.OESmilesToMol(mol, "c1ccnc(c1)O")
print(oechem.OEMolToSTDInChI(mol))

input (SMILES)    output (Standard InChI)
c1ccnc(c1)O       InChI=1S/C5H5NO/c7-5-3-1-2-4-6-5/h1-4H,(H,6,7)

Note

The ‘S’ in ‘InChI=1S’ denotes standard InChI.

The following slightly more complicated example reads InChI strings from standard input and writes InChI strings to standard output.

Reading and writing InChI strings

from openeye import oechem
import sys

for inchi in sys.stdin:
    mol = oechem.OEGraphMol()
    inchi = inchi.strip()
    if oechem.OEInChIToMol(mol, inchi):
        print(oechem.OEMolToInChI(mol))
    else:
        oechem.OEThrow.Warning("%s is an invalid InChI!" % inchi)

A nonstandard InChI can be generated by passing in an OEInChIOptions object to the OECreateInChI function. The options available are documented in the OEInChIOptions class.

Creating nonstandard InChI strings

from openeye import oechem

mol = oechem.OEGraphMol()
oechem.OESmilesToMol(mol, "c1ccnc(c1)O")

opts = oechem.OEInChIOptions()
opts.SetFixedHLayer(True)
print(oechem.OECreateInChI(mol, opts))

input (SMILES)    output (nonstandard InChI)
c1ccnc(c1)O       InChI=1/C5H5NO/c7-5-3-1-2-4-6-5/h1-4H,(H,6,7)/f/h7H
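The difference between the standard and nonstandard strings shown in this section is visible in the text itself: the version token after "InChI=" is 1S in one case and 1 in the other. The following is a minimal pure-Python sketch of that check. The functions split_inchi and is_standard_inchi are hypothetical helpers, not OEChem TK functions, and the naive split on "/" is only meant for inspecting strings like the ones above; it is not a full InChI parser.

```python
def split_inchi(inchi):
    """Split an InChI string into its version token and the remaining layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string: %r" % inchi)
    version, *layers = inchi[len(prefix):].split("/")
    return version, layers


def is_standard_inchi(inchi):
    """Return True when the version token ends in 'S', as in 'InChI=1S'."""
    version, _ = split_inchi(inchi)
    return version.endswith("S")


std = "InChI=1S/C5H5NO/c7-5-3-1-2-4-6-5/h1-4H,(H,6,7)"
nonstd = "InChI=1/C5H5NO/c7-5-3-1-2-4-6-5/h1-4H,(H,6,7)/f/h7H"

print(is_standard_inchi(std))        # True
print(is_standard_inchi(nonstd))     # False
print(split_inchi(nonstd)[1][-2:])   # ['f', 'h7H'], the extra fixed-H part
```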

The 27-character InChIKey is made of three parts connected by hyphens. The first part is 14 characters long and is based on the connectivity and proton layers of an InChI string. The second part contains 10 characters: the first eight are related to all other InChI layers (isotopes, stereochemistry, etc.), and the last two indicate the standard/nonstandard property and the version of InChI. The third part is one letter describing the (de)protonation layer of the original InChI.

Creating an InChIKey

from openeye import oechem

mol = oechem.OEGraphMol()
oechem.OESmilesToMol(mol, "c1ccnc(c1)O")
print(oechem.OEMolToSTDInChIKey(mol))

input (SMILES)    output (InChI Key)
c1ccnc(c1)O       UBQKCCHYAOITMY-UHFFFAOYSA-N
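The three-part layout of an InChIKey can be checked with plain string handling. The sketch below is not an OEChem TK function; split_inchikey and its dictionary keys are hypothetical names chosen for illustration. It assumes the layout of the key shown above: 14 characters, a hyphen, 10 characters (eight hash characters followed by the standard/nonstandard flag and the version character), a hyphen, and one final character.

```python
def split_inchikey(key):
    """Split a 27-character InChIKey into its labelled parts."""
    blocks = key.split("-")
    if len(key) != 27 or [len(b) for b in blocks] != [14, 10, 1]:
        raise ValueError("unexpected InChIKey layout: %r" % key)
    first, second, third = blocks
    return {
        "connectivity_hash": first,        # first block, 14 characters
        "other_layers_hash": second[:8],   # isotopes, stereochemistry, etc.
        "standard_flag": second[8],        # 'S' standard, 'N' nonstandard
        "version_flag": second[9],         # 'A' for InChI version 1
        "protonation_flag": third,         # (de)protonation indicator
    }


parts = split_inchikey("UBQKCCHYAOITMY-UHFFFAOYSA-N")
for name, value in parts.items():
    print(name, "=", value)
```

For the key above, the flag characters come out as S and A, which matches the standard InChI it was generated from.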