Molecules
The OEGraphMol is the object representing a molecule used in most example programs you will find in OEChem TK's example directories, or in the code examples of this manual. An OEGraphMol is a concrete class which can be declared and used for most molecular functions in OEChem TK. Much of an OEGraphMol's API is defined by the OEMolBase abstract base class. An OEGraphMol can be passed to any function which takes an OEMolBase argument.
See also
An OEGraphMol contains atoms and bonds. Their access is discussed in chapter Atom and Bond Traversal.
Construction and Destruction
The example below is the smallest possible Python OEChem TK program. When run, it creates an OEGraphMol called mol. Python automatically cleans up the molecule once there are no more references to it, which happens at the latest when the program ends.
Create a molecule
from openeye import oechem
mol = oechem.OEGraphMol()
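There may be times when you want to delete (destroy) a molecule before the end of the script. This can be done with Python's built-in del statement, which removes the name mol; the molecule itself is destroyed once no other references to it remain.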
Destroy a molecule
from openeye import oechem
mol = oechem.OEGraphMol()
del mol
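The effect of del on object lifetime can be sketched in plain Python. The class StandInMol below is a hypothetical stand-in used only for illustration and is not part of OEChem TK; the sketch assumes CPython, where an object is freed as soon as its last reference disappears.

```python
import weakref


class StandInMol:
    """Hypothetical stand-in for a molecule object; not part of OEChem TK."""


mol = StandInMol()
alias = mol                # a second reference to the same object
probe = weakref.ref(mol)   # weak reference: does not keep the object alive

del mol                    # removes the name 'mol' only
still_alive = probe() is not None   # the object survives through 'alias'

del alias                  # the last strong reference is gone
destroyed = probe() is None         # on CPython the object is freed right away
```

If a molecule must be released early, every name or container that still refers to it has to be deleted or reassigned, not just one of them.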
Construction from SMILES
A common method of creating a molecule in OEChem TK is via the SMILES representation. SMILES notation is commonly used in chemical information systems, as it provides a convenient string representation of a molecule. An introduction to SMILES syntax is provided in chapter SMILES Line Notation. The following examples will use the SMILES c1ccccc1 which describes the molecule benzene. A molecule can be created from a SMILES string using the OESmilesToMol function.
Creating a molecule from a SMILES string (version 1)
from openeye import oechem
# create a new molecule
mol = oechem.OEGraphMol()
# convert the SMILES string into a molecule
oechem.OESmilesToMol(mol, "c1ccccc1")
The OESmilesToMol function returns a boolean value indicating whether the input string was a valid SMILES representation of a molecule. It is good programming practice to check the return value and report an error message if anything went wrong. The following example checks the return status of OESmilesToMol and prints an error message if the string was not a valid SMILES representation of a molecule.
Creating a molecule from a SMILES string (version 2)
from openeye import oechem
# create a new molecule
mol = oechem.OEGraphMol()
# convert the SMILES string into a molecule
if oechem.OESmilesToMol(mol, "c1ccccc1"):
    # do something interesting with mol
    pass
else:
    print("SMILES string was invalid!")
OESmilesToMol is considered a high-level function. In addition to parsing the given SMILES string, it also perceives:
- the rings of the molecule by invoking the OEFindRingAtomsAndBonds function
- the aromaticity of the molecule by calling the OEAssignAromaticFlags function using the OEChem_OEAroModelOpenEye aromaticity model
- the chirality of the molecule by calling the OEPerceiveChiral function
When the aromaticity expressed in the SMILES string (or the lack of it) needs to be preserved, the low-level OEParseSmiles function can be used instead. For example, if benzene is written as c1ccccc1, all atoms and bonds are marked as aromatic, but if it is written in the Kekulé form C1=CC=CC=C1, all atoms and bonds are kept aliphatic. Aromaticity can then be perceived explicitly by calling the OEAssignAromaticFlags function.
Creating molecules from a SMILES string (version 3)
from openeye import oechem
mol = oechem.OEGraphMol()
if not oechem.OEParseSmiles(mol, "C1=CC=CC=C1"):
    print("SMILES string was invalid!")
print("Number of aromatic atoms =", oechem.OECount(mol, oechem.OEIsAromaticAtom()))
oechem.OEAssignAromaticFlags(mol)
print("Number of aromatic atoms =", oechem.OECount(mol, oechem.OEIsAromaticAtom()))
The output of the preceding program is the following:
Number of aromatic atoms = 0
Number of aromatic atoms = 6
Hint
We highly recommend the use of the OESmilesToMol function when creating a molecule from a SMILES string.
See also
- Aromaticity Perception chapter for further information about aromaticity models
Reuse
Consider the following code examples, which parse two separate SMILES strings, benzene and phenol, and print the number of heavy atoms in each.
Reusing a molecule (OESmilesToMol)
from openeye import oechem
mol = oechem.OEGraphMol()
oechem.OESmilesToMol(mol, "c1ccccc1")
print("Number of benzene atoms:", mol.NumAtoms())
oechem.OESmilesToMol(mol, "c1ccccc1O")
print("Number of phenol atoms:", mol.NumAtoms())
The high-level OESmilesToMol function automatically clears the molecule before parsing the SMILES string. The output of the preceding program is the following:
Number of benzene atoms: 6
Number of phenol atoms: 7
Reusing a molecule (OEParseSmiles)
from openeye import oechem
mol = oechem.OEGraphMol()
oechem.OEParseSmiles(mol, "c1ccccc1")
print("Number of benzene atoms:", mol.NumAtoms())
oechem.OEParseSmiles(mol, "c1ccccc1O")
print("Number of phenol atoms:", mol.NumAtoms())
In the second example the low-level OEParseSmiles function is called. The output of the preceding program is the following:
Number of benzene atoms: 6
Number of phenol atoms: 13
The second line, Number of phenol atoms: 13, will be surprising to some. OEParseSmiles adds the given SMILES to the current molecule rather than replacing it, so the 7 heavy atoms of phenol are added to the 6 atoms of benzene already present, giving 13. OEChem TK provides a mechanism for reusing a molecule by calling the Clear method. Clear deletes all atoms and bonds of a molecule, thereby resetting a molecule into its original “empty” state.
Clearing and reusing a molecule (OEParseSmiles)
from openeye import oechem
mol = oechem.OEGraphMol()
oechem.OEParseSmiles(mol, "c1ccccc1")
print("Number of benzene atoms:", mol.NumAtoms())
mol.Clear()
oechem.OEParseSmiles(mol, "c1ccccc1O")
print("Number of phenol atoms:", mol.NumAtoms())
The output of the preceding program is the following:
Number of benzene atoms: 6
Number of phenol atoms: 7
Using the Clear method is recommended, for example, when processing multiple molecules sequentially in a database. Instead of requiring a new molecule to be allocated and destroyed for each entry, the Clear method can be used to reset a molecule to its initial “empty” state.
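The difference between appending to a molecule and clearing it first can be mimicked with a toy model in plain Python. Everything in this sketch (ToyMol, toy_parse_append, toy_parse_fresh) is a hypothetical stand-in that only counts atom symbols; it does not call OEChem TK and is meant solely to make the 6, 13 and 7 atom counts above easy to reproduce.

```python
class ToyMol:
    """Toy stand-in for a molecule: just a list of atom symbols (not OEChem TK)."""

    def __init__(self):
        self.atoms = []

    def Clear(self):
        # Reset to the initial "empty" state.
        self.atoms = []

    def NumAtoms(self):
        return len(self.atoms)


def toy_parse_append(mol, atoms):
    """Mimics the append behavior described for OEParseSmiles."""
    mol.atoms.extend(atoms)
    return True


def toy_parse_fresh(mol, atoms):
    """Mimics the clear-then-parse behavior described for OESmilesToMol."""
    mol.Clear()
    return toy_parse_append(mol, atoms)


benzene = ["C"] * 6
phenol = ["C"] * 6 + ["O"]

mol = ToyMol()
toy_parse_append(mol, benzene)
toy_parse_append(mol, phenol)
appended = mol.NumAtoms()   # benzene atoms plus phenol atoms

toy_parse_fresh(mol, phenol)
fresh = mol.NumAtoms()      # phenol atoms only
```

Here appended is 13 and fresh is 7, matching the outputs of the OEParseSmiles examples without and with a call to Clear.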
Unique Representation
It is sometimes useful to generate a unique representation of a molecule for use as an identifier, such as a database key. The compact nature of SMILES strings makes them an ideal candidate for the task. However, the same molecule can be represented by many different SMILES strings. OEChem TK features an advanced algorithm for generating a (unique) canonical isomeric SMILES string. A canonical isomeric SMILES string can be generated from a molecule by calling the OEMolToSmiles function.
Creating a canonical isomeric SMILES string from a molecule
from openeye import oechem
mol = oechem.OEGraphMol()
oechem.OESmilesToMol(mol, "C1=CC=CC=C1")
print("Canonical isomeric SMILES is", oechem.OEMolToSmiles(mol))
The output of the preceding program is the following:
Canonical isomeric SMILES is c1ccccc1
The following slightly more complicated example reads SMILES from standard input and writes the corresponding canonical isomeric SMILES to standard output.
Creating canonical isomeric SMILES strings
from openeye import oechem
import sys
for smi in sys.stdin:
    mol = oechem.OEGraphMol()
    smi = smi.strip()
    if oechem.OESmilesToMol(mol, smi):
        print(oechem.OEMolToSmiles(mol))
    else:
        oechem.OEThrow.Warning("%s is an invalid SMILES!" % smi)
input | output (canonical isomeric SMILES) |
---|---|
C1CCCN[C@@H]1(O) | C1CCN[C@@H](C1)O |
C1CN[C@H](O)CC1 | C1CCN[C@@H](C1)O |
C1CC[C@H](O)CC1 | C1CC[C@@H](CC1)O |
C1CCC(O)CC1 | C1CCC(CC1)O |
C1=NC=CN1C[C@H](N)C(=O)O | c1cn(cn1)C[C@@H](C(=O)O)N |
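Because different input strings for the same molecule collapse to one canonical string, the canonical SMILES can serve as a dictionary key for detecting duplicates. The sketch below is an illustration under stated assumptions: group_by_canonical and its canonical parameter are hypothetical names, and a small lookup built from rows of the table above stands in for the toolkit call so that the example runs without OEChem TK.

```python
from collections import defaultdict


def group_by_canonical(smiles_list, canonical):
    """Group input SMILES by the key returned from `canonical`.

    `canonical` is a caller-supplied function (hypothetical here) that returns
    a canonical SMILES string, or None when the input cannot be parsed.
    """
    groups = defaultdict(list)
    for smi in smiles_list:
        key = canonical(smi)
        if key is not None:
            groups[key].append(smi)
    return dict(groups)


# Stand-in for the toolkit call, filled with rows from the table above.
TABLE = {
    "C1CCCN[C@@H]1(O)": "C1CCN[C@@H](C1)O",
    "C1CN[C@H](O)CC1": "C1CCN[C@@H](C1)O",
    "C1CCC(O)CC1": "C1CCC(CC1)O",
}

groups = group_by_canonical(list(TABLE), TABLE.get)
```

The first two inputs end up under the same key, which is how a database would recognize them as the same compound. In a real program, canonical would wrap the OESmilesToMol and OEMolToSmiles calls used in the preceding example.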
OEMolToSmiles is also considered a high-level function. Prior to creating the canonical isomeric SMILES, the OEMolToSmiles function perceives the following properties if necessary:
- the rings of the molecule by using OEFindRingAtomsAndBonds
- the aromaticity of the molecule by calling the OEAssignAromaticFlags function using the OEChem_OEAroModelOpenEye aromaticity model
- the atom and bond stereochemistry
It is also possible to generate canonical SMILES without isomeric information by using the OECreateCanSmiString low-level function. As was shown in the Construction from SMILES section, OEParseSmiles preserves the aromaticity present in the input SMILES string. The function OEAssignAromaticFlags has to be used to perceive aromaticity in a molecule.
Creating canonical SMILES strings
from openeye import oechem
import sys
mol = oechem.OEGraphMol()
for smi in sys.stdin:
    mol.Clear()
    smi = smi.strip()
    if oechem.OEParseSmiles(mol, smi):
        oechem.OEAssignAromaticFlags(mol)
        print(oechem.OECreateCanSmiString(mol))
    else:
        oechem.OEThrow.Warning("%s is an invalid SMILES!" % smi)
Notice that the preceding program does not construct and destruct molecules each time through the loop, but rather uses the Clear method to reuse the molecule. If the line mol.Clear() were removed from the program, the output would contain longer and longer SMILES containing disconnected fragments. See section Reuse for more details.
input | output (canonical SMILES) |
---|---|
c1cccnc1(O) | c1ccnc(c1)O |
C1=CC=CC=C1 | c1ccccc1 |
C1=CN=CC=C1 | c1ccncc1 |
C1=CC=CC=N1 | c1ccncc1 |
C1=NC=CN1CCC(=O)O | c1cn(cn1)CCC(=O)O |
Hint
We highly recommend the usage of the OEMolToSmiles function when creating a SMILES string.
InChI
Canonical SMILES are not the only unique representation available. The IUPAC International Chemical Identifier (InChI) and its corresponding hashkey representation (InChIKey) are also unique to the compound they describe [InChI-2013]. InChIs can be created from molecules using the OECreateInChI, OEMolToInChI, or OEMolToSTDInChI functions.
Creating standard InChI
from openeye import oechem
mol = oechem.OEGraphMol()
oechem.OESmilesToMol(mol, "c1ccnc(c1)O")
print(oechem.OEMolToSTDInChI(mol))
input (SMILES) | output (Standard InChI) |
---|---|
c1ccnc(c1)O | InChI=1S/C5H5NO/c7-5-3-1-2-4-6-5/h1-4H,(H,6,7) |
Note
The ‘S’ in ‘InChI=1S’ denotes standard InChI.
The following slightly more complicated example reads InChI strings from standard input and writes InChI strings to standard output.
Reading and writing InChI strings
from openeye import oechem
import sys
for inchi in sys.stdin:
    mol = oechem.OEGraphMol()
    inchi = inchi.strip()
    if oechem.OEInChIToMol(mol, inchi):
        print(oechem.OEMolToInChI(mol))
    else:
        oechem.OEThrow.Warning("%s is an invalid INCHI!" % inchi)
A non-standard InChI can be generated by passing in an OEInChIOptions object to the OECreateInChI function. The options available are documented in the OEInChIOptions class.
Creating non-standard InChI strings
from openeye import oechem
mol = oechem.OEGraphMol()
oechem.OESmilesToMol(mol, "c1ccnc(c1)O")
opts = oechem.OEInChIOptions()
opts.SetFixedHLayer(True)
print(oechem.OECreateInChI(mol, opts))
input (SMILES) | output (non-standard InChI) |
---|---|
c1ccnc(c1)O | InChI=1/C5H5NO/c7-5-3-1-2-4-6-5/h1-4H,(H,6,7)/f/h7H |
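The standard and non-standard strings shown in this section differ in their version prefix and in the trailing segments. A minimal plain-Python sketch can make that visible by splitting each string on "/". The helper split_inchi is a hypothetical name, and treating "/" as the only separator is a simplifying assumption for illustration, not a full InChI parser.

```python
def split_inchi(inchi):
    """Return (is_standard, segments) for an InChI string.

    Simplified: assumes '/' only ever separates segments.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string: %r" % inchi)
    segments = inchi[len(prefix):].split("/")
    version = segments[0]          # "1S" for standard InChI, "1" otherwise
    return version.endswith("S"), segments


std = "InChI=1S/C5H5NO/c7-5-3-1-2-4-6-5/h1-4H,(H,6,7)"
nonstd = "InChI=1/C5H5NO/c7-5-3-1-2-4-6-5/h1-4H,(H,6,7)/f/h7H"

std_flag, std_segments = split_inchi(std)
nonstd_flag, nonstd_segments = split_inchi(nonstd)

# Segments present only in the non-standard string (added by the fixed-H option).
extra = nonstd_segments[len(std_segments):]
```

For these two inputs, the formula, connectivity and hydrogen segments are identical, and extra holds only the segments appended when the fixed-H layer was requested.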
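A more compact version of InChI is the InChIKey, a 27-character representation. The first 14 characters are a hash of the InChI's connectivity information. They are followed by a hyphen and a 10-character block: 8 characters that hash the remaining layers of the InChI, one flag character marking a standard (S) or non-standard (N) InChI, and one version character. A second hyphen and a single protonation character complete the key.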
Creating an InChIKey
from openeye import oechem
mol = oechem.OEGraphMol()
oechem.OESmilesToMol(mol, "c1ccnc(c1)O")
print(oechem.OEMolToSTDInChIKey(mol))
input (SMILES) | output (InChIKey) |
---|---|
c1ccnc(c1)O | UBQKCCHYAOITMY-UHFFFAOYSA-N |
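The block structure of an InChIKey can be checked with a few lines of plain Python. This sketch assumes the published layout of 14, 10 and 1 characters separated by hyphens, with the second block made up of an 8-character hash, a standard/non-standard flag and a version character. The function split_inchikey and its dictionary keys are hypothetical names chosen for illustration.

```python
def split_inchikey(key):
    """Split a 27-character InChIKey into its blocks (layout assumed as described)."""
    blocks = key.split("-")
    if len(key) != 27 or [len(b) for b in blocks] != [14, 10, 1]:
        raise ValueError("unexpected InChIKey layout: %r" % key)
    first, second, protonation = blocks
    return {
        "connectivity_hash": first,      # hash of the connectivity information
        "remaining_hash": second[:8],    # hash of the remaining layers
        "standard": second[8] == "S",    # 'S' standard, 'N' non-standard
        "version": second[9],
        "protonation": protonation,
    }


parts = split_inchikey("UBQKCCHYAOITMY-UHFFFAOYSA-N")
```

For the key in the table above, the flag character is S, consistent with the key having been generated by OEMolToSTDInChIKey.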