Molecules¶
OEGraphMol is the molecule object used in most of the example programs in OEChem TK’s example directories and in the code examples of this manual. It is a concrete class that can be declared and used for most molecular functions in OEChem TK. Much of its API is defined by the OEMolBase abstract base class, so an OEGraphMol can be passed to any function that takes an OEMolBase argument.
See also
An OEGraphMol contains atoms and bonds. Their access is discussed in chapter Atom and Bond Traversal.
Construction and Destruction¶
The example below is the smallest possible Python OEChem TK
program. When run, it creates an OEGraphMol called mol. Python
automatically cleans up the molecule when there are no more
references to it, for example when the program ends.
Create a molecule
from openeye import oechem
mol = oechem.OEGraphMol()
There may be times when you want to delete (destroy) a molecule
before the end of the script. This can be done with Python's
built-in del statement, which removes the reference to the molecule.
Destroy a molecule
from openeye import oechem
mol = oechem.OEGraphMol()
del mol
Construction from SMILES¶
A common method of creating a molecule in OEChem TK is via the SMILES
representation. SMILES notation is commonly used in chemical
information systems, as it provides a convenient string representation
of a molecule. An introduction to SMILES syntax is provided in chapter
SMILES Line Notation. The following examples use the SMILES c1ccccc1,
which describes the molecule benzene. A molecule can be created from a
SMILES string using the OESmilesToMol function. Similarly, a molecule
can be created from a CXSMILES string using the OECXSMILESToMol function.
Creating a molecule from a SMILES string (version 1)
from openeye import oechem
# create a new molecule
mol = oechem.OEGraphMol()
# convert the SMILES string into a molecule
oechem.OESmilesToMol(mol, "c1ccccc1")
The OESmilesToMol function returns a boolean value indicating whether
the input string was a valid SMILES representation of a molecule. It
is good programming practice to check the return value and report an
error message if anything went wrong.
The following example adds a check on the return status of
OESmilesToMol and prints an error message if the string was not a
valid SMILES representation of a molecule.
Creating a molecule from a SMILES string (version 2)
from openeye import oechem
# create a new molecule
mol = oechem.OEGraphMol()
# convert the SMILES string into a molecule
if oechem.OESmilesToMol(mol, "c1ccccc1"):
    # do something interesting with mol
    pass
else:
    print("SMILES string was invalid!")
OESmilesToMol is considered a high-level function.
In addition to parsing the given SMILES string, it also perceives:
- the rings of the molecule, by invoking the OEFindRingAtomsAndBonds function
- the aromaticity of the molecule, by calling the OEAssignAromaticFlags function using the OEChem_OEAroModelOpenEye aromaticity model
- the chirality of the molecule, by calling the OEPerceiveChiral function
In cases where you want to preserve the aromaticity of the SMILES string (or the lack of it),
the low-level OEParseSmiles function can be used.
For example, if benzene is expressed as c1ccccc1, all atoms and bonds
are marked as aromatic. But if it is expressed in a Kekulé form,
C1=CC=CC=C1, all atoms and bonds are kept aliphatic.
The aromaticity of the molecule can then be perceived by calling the
OEAssignAromaticFlags function.
Creating molecules from a SMILES string (version 3)
from openeye import oechem
mol = oechem.OEGraphMol()
if not oechem.OEParseSmiles(mol, "C1=CC=CC=C1"):
    print("SMILES string was invalid!")
print("Number of aromatic atoms =", oechem.OECount(mol, oechem.OEIsAromaticAtom()))
oechem.OEAssignAromaticFlags(mol)
print("Number of aromatic atoms =", oechem.OECount(mol, oechem.OEIsAromaticAtom()))
The output of the preceding program is the following:
Number of aromatic atoms = 0
Number of aromatic atoms = 6
Hint
We highly recommend the use of the OESmilesToMol function when creating a molecule from a SMILES string.
We highly recommend the use of the OECXSMILESToMol function when creating a molecule from either a SMILES or CXSMILES string
and the specific format is not known.
See also
Aromaticity Perception chapter for further information about aromaticity models.
Reuse¶
Consider the following code examples to parse two separate SMILES strings, benzene and phenol, and print the number of heavy atoms in each.
Reusing a molecule (OESmilesToMol)
from openeye import oechem
mol = oechem.OEGraphMol()
oechem.OESmilesToMol(mol, "c1ccccc1")
print("Number of benzene atoms:", mol.NumAtoms())
oechem.OESmilesToMol(mol, "c1ccccc1O")
print("Number of phenol atoms:", mol.NumAtoms())
The high-level OESmilesToMol function automatically
clears the molecule before parsing the SMILES string.
The output of the preceding program is the following:
Number of benzene atoms: 6
Number of phenol atoms: 7
Reusing a molecule (OEParseSmiles)
from openeye import oechem
mol = oechem.OEGraphMol()
oechem.OEParseSmiles(mol, "c1ccccc1")
print("Number of benzene atoms:", mol.NumAtoms())
oechem.OEParseSmiles(mol, "c1ccccc1O")
print("Number of phenol atoms:", mol.NumAtoms())
In the second example the low-level OEParseSmiles function is called.
The output of the preceding program is the following:
Number of benzene atoms: 6
Number of phenol atoms: 13
The second line, Number of phenol atoms: 13, will be surprising to
some. The OEParseSmiles function adds the given SMILES to the current
molecule, so the 7 phenol atoms are appended to the 6 benzene atoms
already present. OEChem TK provides a mechanism for reusing a molecule
by calling the Clear method. Clear deletes all atoms and bonds of a
molecule, thereby resetting the molecule to its original “empty” state.
Clearing and reusing a molecule (OEParseSmiles)
from openeye import oechem
mol = oechem.OEGraphMol()
oechem.OEParseSmiles(mol, "c1ccccc1")
print("Number of benzene atoms:", mol.NumAtoms())
mol.Clear()
oechem.OEParseSmiles(mol, "c1ccccc1O")
print("Number of phenol atoms:", mol.NumAtoms())
The output of the preceding program is the following:
Number of benzene atoms: 6
Number of phenol atoms: 7
Using the Clear method is recommended, for example, when processing
multiple molecules sequentially in a database. Instead of requiring a
new molecule to be allocated and destroyed for each entry, the Clear
method can be used to reset a molecule to its initial “empty” state.
Unique Representation¶
It is sometimes useful to generate a unique representation of a
molecule for use as an identifier for a database key. The compact
nature of SMILES strings makes them ideal candidates for the
task. However, the same molecule can be represented by many different
SMILES strings. OEChem TK features an advanced algorithm for
generating a (unique) canonical isomeric SMILES string.
A canonical isomeric SMILES string can be generated from a molecule
by calling the OEMolToSmiles or OEMolToCXSMILES functions.
Creating a canonical isomeric SMILES string from a molecule
from openeye import oechem
mol = oechem.OEGraphMol()
oechem.OESmilesToMol(mol, "C1=CC=CC=C1")
print("Canonical isomeric SMILES is", oechem.OEMolToSmiles(mol))
The output of the preceding program is the following:
Canonical isomeric SMILES is c1ccccc1
The following slightly more complicated example reads SMILES from standard input and writes the corresponding canonical isomeric SMILES to standard output.
Creating canonical isomeric SMILES strings
from openeye import oechem
import sys
for smi in sys.stdin:
    mol = oechem.OEGraphMol()
    smi = smi.strip()
    if oechem.OESmilesToMol(mol, smi):
        print(oechem.OEMolToSmiles(mol))
    else:
        oechem.OEThrow.Warning("%s is an invalid SMILES!" % smi)
| input | output (canonical isomeric SMILES) |
|---|---|
| C1CCCN[C@@H]1(O) | C1CCN[C@@H](C1)O |
| C1CN[C@H](O)CC1 | C1CCN[C@@H](C1)O |
| C1CC[C@H](O)CC1 | C1CC[C@@H](CC1)O |
| C1CCC(O)CC1 | C1CCC(CC1)O |
| C1=NC=CN1C[C@H](N)C(=O)O | c1cn(cn1)C[C@@H](C(=O)O)N |
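As a sketch of the database-key idea, the helper below groups input SMILES strings that share the same key. The names `canonical_isomeric_smiles` and `group_by_key` are hypothetical and not part of OEChem TK; only `OEGraphMol`, `OESmilesToMol`, and `OEMolToSmiles` come from the toolkit. The toolkit import is deferred into the function as an assumption of this sketch, so that the grouping logic can also be exercised with any other key function.

```python
from collections import defaultdict


def canonical_isomeric_smiles(smi):
    """Return the canonical isomeric SMILES for smi, or None if smi is invalid."""
    # Deferred import (a choice made for this sketch): the grouping helper
    # below stays usable with a different key function.
    from openeye import oechem

    mol = oechem.OEGraphMol()
    if not oechem.OESmilesToMol(mol, smi):
        return None
    return oechem.OEMolToSmiles(mol)


def group_by_key(smiles_strings, make_key=canonical_isomeric_smiles):
    """Group input strings by key; inputs whose key is None are skipped."""
    groups = defaultdict(list)
    for smi in smiles_strings:
        key = make_key(smi)
        if key is not None:
            groups[key].append(smi)
    return dict(groups)
```

Assuming the toolkit is installed and licensed, calling `group_by_key` on the first two inputs of the table above should return a single group, since both rows show the same canonical isomeric SMILES.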
The OEMolToSmiles and OEMolToCXSMILES functions are considered high-level functions.
Prior to creating the canonical isomeric SMILES, the OEMolToSmiles
function perceives the following properties if necessary:
- the rings of the molecule, by using OEFindRingAtomsAndBonds
- the aromaticity of the molecule, by calling the OEAssignAromaticFlags function using the OEChem_OEAroModelOpenEye aromaticity model
- the atom and bond stereochemistry
It is also possible to generate canonical SMILES without
isomeric information by using the low-level OECreateCanSmiString function.
As was shown in the Construction from SMILES section, OEParseSmiles
preserves the aromaticity present in the input SMILES string, so
OEAssignAromaticFlags has to be called to perceive aromaticity in
the molecule.
Creating canonical SMILES strings
from openeye import oechem
import sys
mol = oechem.OEGraphMol()
for smi in sys.stdin:
    mol.Clear()
    smi = smi.strip()
    if oechem.OEParseSmiles(mol, smi):
        oechem.OEAssignAromaticFlags(mol)
        print(oechem.OECreateCanSmiString(mol))
    else:
        oechem.OEThrow.Warning("%s is an invalid SMILES!" % smi)
Notice that the preceding program does not construct and destruct
a molecule each time through the loop, but rather uses the Clear
method to reuse the molecule. If the line mol.Clear() were removed
from the program, the output would contain longer and longer SMILES
containing disconnected fragments. See section Reuse for more details.
| input | output (canonical SMILES) |
|---|---|
| c1cccnc1(O) | c1ccnc(c1)O |
| C1=CC=CC=C1 | c1ccccc1 |
| C1=CN=CC=C1 | c1ccncc1 |
| C1=CC=CC=N1 | c1ccncc1 |
| C1=NC=CN1CCC(=O)O | c1cn(cn1)CCC(=O)O |
Hint
We highly recommend the use of the OEMolToSmiles function when creating a SMILES string.
We highly recommend the use of the OEMolToCXSMILES function when structures may contain enhanced stereogroup information.
InChI¶
Canonical SMILES is not the only unique representation available.
The IUPAC International Chemical Identifier (InChI), and its corresponding
hashkey representation (InChIKey) are also unique to the compound
they describe [InChI-2013].
InChIs can be created from molecules using the OECreateInChI,
OEMolToInChI, or OEMolToSTDInChI functions.
Creating standard InChI
from openeye import oechem
mol = oechem.OEGraphMol()
oechem.OESmilesToMol(mol, "c1ccnc(c1)O")
print(oechem.OEMolToSTDInChI(mol))
| input (SMILES) | output (Standard InChI) |
|---|---|
| c1ccnc(c1)O | InChI=1S/C5H5NO/c7-5-3-1-2-4-6-5/h1-4H,(H,6,7) |
Note
The ‘S’ in ‘InChI=1S’ denotes standard InChI.
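The prefix rule in the note can be checked with plain string handling. The following minimal sketch uses a hypothetical helper, `is_standard_inchi` (not an OEChem TK function), applied to the standard and nonstandard InChI strings that this chapter shows for c1ccnc(c1)O.

```python
def is_standard_inchi(inchi):
    """Return True if the string starts with the standard InChI version-1 prefix."""
    return inchi.startswith("InChI=1S/")


# The two InChI strings shown in this chapter for c1ccnc(c1)O.
standard = "InChI=1S/C5H5NO/c7-5-3-1-2-4-6-5/h1-4H,(H,6,7)"
nonstandard = "InChI=1/C5H5NO/c7-5-3-1-2-4-6-5/h1-4H,(H,6,7)/f/h7H"

print(is_standard_inchi(standard))     # True
print(is_standard_inchi(nonstandard))  # False
```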
The following slightly more complicated example reads InChI strings from standard input and writes InChI strings to standard output.
Reading and writing InChI strings
from openeye import oechem
import sys
for inchi in sys.stdin:
    mol = oechem.OEGraphMol()
    inchi = inchi.strip()
    if oechem.OEInChIToMol(mol, inchi):
        print(oechem.OEMolToInChI(mol))
    else:
        oechem.OEThrow.Warning("%s is an invalid INCHI!" % inchi)
A nonstandard InChI can be generated by passing an OEInChIOptions
object to the OECreateInChI function.
The options available are documented in the OEInChIOptions class.
Creating nonstandard InChI strings
from openeye import oechem
mol = oechem.OEGraphMol()
oechem.OESmilesToMol(mol, "c1ccnc(c1)O")
opts = oechem.OEInChIOptions()
opts.SetFixedHLayer(True)
print(oechem.OECreateInChI(mol, opts))
| input (SMILES) | output (nonstandard InChI) |
|---|---|
| c1ccnc(c1)O | InChI=1/C5H5NO/c7-5-3-1-2-4-6-5/h1-4H,(H,6,7)/f/h7H |
The 27-character InChIKey is made of three parts connected by hyphens. The first part is 14 characters long and is based on the connectivity and proton layers of an InChI string. The second part contains 10 characters: the first eight are related to all other InChI layers (isotopes, stereochemistry, etc.), and the last two encode the standard/nonstandard property and the version of InChI. The third part is one letter describing the (de)protonation layer of the original InChI.
Creating an InChIKey
from openeye import oechem
mol = oechem.OEGraphMol()
oechem.OESmilesToMol(mol, "c1ccnc(c1)O")
print(oechem.OEMolToSTDInChIKey(mol))
| input (SMILES) | output (InChIKey) |
|---|---|
| c1ccnc(c1)O | UBQKCCHYAOITMY-UHFFFAOYSA-N |
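The three-part layout of an InChIKey can be illustrated with plain string handling. The sketch below splits the example key above into blocks of 14, 10, and 1 characters. `InChIKeyParts` and `split_inchikey` are hypothetical names, not OEChem TK functions, and the check covers only block lengths, not whether the key is chemically meaningful.

```python
from typing import NamedTuple


class InChIKeyParts(NamedTuple):
    connectivity_hash: str  # first block, 14 characters
    layers_hash: str        # first 8 characters of the second block
    standard_flag: str      # "S" for standard, "N" for nonstandard InChI
    version: str            # InChI version character
    protonation: str        # third block, 1 character


def split_inchikey(key):
    """Split a 27-character InChIKey into its parts; raise ValueError on a bad layout."""
    blocks = key.split("-")
    if [len(block) for block in blocks] != [14, 10, 1]:
        raise ValueError("unexpected InChIKey layout: %r" % key)
    first, second, third = blocks
    return InChIKeyParts(first, second[:8], second[8], second[9], third)


parts = split_inchikey("UBQKCCHYAOITMY-UHFFFAOYSA-N")
print(parts.connectivity_hash)  # UBQKCCHYAOITMY
print(parts.standard_flag)      # S
```

The "S" flag here matches the fact that the key was produced by OEMolToSTDInChIKey, which generates the key of a standard InChI.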