Examples

All filtering operations are controlled via the OEFilter object. The OEFilter object is typically configured with a specific filter and then applied to each molecule in a molecule file. Listing 1 configures the OEFilter object with the default lead-like filter and writes out the molecules that pass it. The OEFilter object’s OEFilter.operator() method, invoked in Python by calling the object as filt(mol), tests whether a molecule passes the filter.

Note

Besides being tested, the molecule is also altered by all of the specified Filter Preprocessing steps.

Listing 1: Basic filtering for lead-like molecules

#!/usr/bin/env python
#############################################################################
#  Copyright (C) 2009 OpenEye Scientific Software, Inc.
#############################################################################
### Filter a molecule file for "Lead-like" molecules
#############################################################################
import sys
from openeye.oechem import *
from openeye.oemolprop import *

def main(argv = [__name__]):
    if len(argv) != 3:
        OEThrow.Usage("%s <input> <output>" % argv[0])

    ifs = oemolistream()
    if not ifs.open(argv[1]):
        OEThrow.Fatal("Unable to open %s" % argv[1])

    ofs = oemolostream()
    if not ofs.open(argv[2]):
        OEThrow.Fatal("Unable to create %s" % argv[2])

    filt = OEFilter(OEFilterType_Lead)

    for mol in ifs.GetOEGraphMols():
        if filt(mol):
            OEWriteMolecule(ofs, mol)

if __name__ == "__main__":
    sys.exit(main(sys.argv))

By default, the OEFilter.operator() method prints a line of information to OEThrow for every molecule passed to it, in the following format:

[Isomeric SMILES]\t[Title],[Pass|Reason for failure]

The following is an example of this output:

CC1=CC(=O)C=CC1=O       NSC 1,Minimum atom count(10) not reached: 9
c1ccc2c(c1)nc(s2)SSc3nc4ccccc4s3        NSC 2,Maximum disulfide(0) exceeded: 1
c1c(cc(c(c1[N+](=O)[O-])[O-])Cl)[N+](=O)[O-]    NSC 3,Maximum heteroatom to carbon ratio(1.10) exceeded: 1.33
c1c(sc(n1)N)[N+](=O)[O-]        NSC 4,Minimum atom count(10) not reached: 9
c1ccc2c(c1)C(=O)c3ccc(cc3C2=O)N NSC 5,Maximum dye(0) exceeded: 2
c1ccc(c(c1)c2c3ccc(c(c3oc-4c(c(=O)ccc24)Br)Br)O)C(=O)[O-]       NSC 6,Maximum atom count(25) exceeded: 27
C[NH+](C)C1=C(C(=O)c2ccccc2C1=O)Cl      NSC 7,Maximum alkyl_halide(0) exceeded: 1
Cc1ccc2c(c1[N+](=O)[O-])C(=O)c3ccccc3C2=O       NSC 8,Maximum nitro(0) exceeded: 1
CC(C)(C)c1cc(c(cc1O)C(C)(C)C)O  NSC 11,Pass
CC1=NN(C(=O)C1)c2ccccc2 NSC 12,Pass
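If this log is captured to a file (for example by redirecting the program's error stream, assuming OEThrow writes to standard error as it does by default), each line can be split back into its parts. The following is a minimal pure-Python sketch, not part of the OpenEye toolkits; the names FilterLogRecord and parse_filter_log_line are hypothetical, and it assumes that titles contain no commas and that the SMILES and title are separated by a single tab, as in the format shown above.

```python
from typing import NamedTuple, Optional


class FilterLogRecord(NamedTuple):
    """One OEFilter log line: SMILES, title, and pass/fail outcome."""
    smiles: str
    title: str
    passed: bool
    reason: Optional[str]  # failure reason, or None if the molecule passed


def parse_filter_log_line(line: str) -> FilterLogRecord:
    """Parse '[Isomeric SMILES]<TAB>[Title],[Pass|Reason for failure]'.

    Assumes the title contains no comma: everything after the first comma
    that follows the tab is treated as the outcome.
    """
    smiles, rest = line.rstrip("\r\n").split("\t", 1)
    title, outcome = rest.split(",", 1)
    if outcome == "Pass":
        return FilterLogRecord(smiles, title, True, None)
    return FilterLogRecord(smiles, title, False, outcome)
```

Lines that are not filter results (other OEThrow messages) would need to be skipped before parsing, since the sketch raises ValueError on a line without a tab or comma.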

Quiet Filtering

The above log output is usually unnecessary when the OEFilter object is used inside a larger program. Because all of the log output is written to the OEThrow object, it can be suppressed by changing the OEThrow reporting level with the OEThrow.SetLevel method. Listing 2 sets the OEThrow error level to OEErrorLevel_Warning, which lets only messages at the Warning level or above through, thereby silencing the OEFilter object’s logging output.

Listing 2: Silencing the OEFilter logging messages

#!/usr/bin/env python
#############################################################################
#  Copyright (C) 2009 OpenEye Scientific Software, Inc.
#############################################################################
###  Quietly filter a molecule file for "Lead-like" molecules
#############################################################################
import sys
from openeye.oechem import *
from openeye.oemolprop import *

def main(argv = [__name__]):
    if len(argv) != 3:
        OEThrow.Usage("%s <input> <output>" % argv[0])

    ifs = oemolistream()
    if not ifs.open(argv[1]):
        OEThrow.Fatal("Unable to open %s" % argv[1])

    ofs = oemolostream()
    if not ofs.open(argv[2]):
        OEThrow.Fatal("Unable to create %s" % argv[2])

    filt = OEFilter(OEFilterType_Lead)

    OEThrow.SetLevel(OEErrorLevel_Warning)

    for mol in ifs.GetOEGraphMols():
        if filt(mol):
            OEWriteMolecule(ofs, mol)

if __name__ == "__main__":
    sys.exit(main(sys.argv))

Molecular Property Table

The OEFilter object can calculate all of the molecular properties it uses during the filtering process without actually applying the filter. This is useful, for example, for caching the calculated properties in a database. The OEFilter.SetTable method specifies the output stream to which a tab-delimited table of every property in the associated filter file is written. Listing 3 writes this table to standard output.

Listing 3: Generating tabular output of all molecular properties

#!/usr/bin/env python
#############################################################################
#  Copyright (C) 2009 OpenEye Scientific Software, Inc.
#############################################################################
###  Generate a tabular output of molecular properties
#############################################################################
import sys
from openeye.oechem import *
from openeye.oemolprop import *

def main(argv = [__name__]):
    if len(argv) != 2:
        OEThrow.Usage("%s <input>" % argv[0])

    ifs = oemolistream()
    if not ifs.open(argv[1]):
        OEThrow.Fatal("Unable to open %s" % argv[1])

    filt = OEFilter(OEFilterType_Lead)
    OEThrow.SetLevel(OEErrorLevel_Warning)
    
    pwnd = False
    filt.SetTable(oeout, pwnd)

    for mol in ifs.GetOEGraphMols():
        filt(mol)

if __name__ == "__main__":
    sys.exit(main(sys.argv))

Note

A tab-delimited file format was chosen because it integrates well with Unix command-line utilities. For example, it allows quick and easy filter experimentation with the awk command-line utility. The following snippet, in which molproptable is the program from Listing 3 and column 12 holds the molecular weight, prints the number of molecules with a molecular weight greater than 200, plus one for the header line.

> molproptable drugs.sdf > drugs.txt
> awk -F'\t' '{ if ($12 > 200) print $12 }' drugs.txt | wc -l
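Because the set of columns depends on which filter rules are enabled, a hard-coded column number such as $12 can silently point at a different property when the filter file changes. A minimal Python sketch of the same count that locates the column by its header name instead is shown below. The function name count_above is hypothetical, and the exact header text of the molecular weight column is not given here, so check the header line of your own table before using it.

```python
import csv
from typing import Iterable


def count_above(lines: Iterable[str], column: str, threshold: float) -> int:
    """Count data rows whose value in `column` exceeds `threshold`.

    `lines` is a tab-delimited table whose first line is the header.
    The column is found by name, not position. Unlike the awk one-liner,
    the header row is not counted, and non-numeric cells are skipped.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if not reader.fieldnames or column not in reader.fieldnames:
        raise KeyError(column)
    count = 0
    for row in reader:
        try:
            value = float(row[column])
        except (TypeError, ValueError):
            continue  # missing or non-numeric cell
        if value > threshold:
            count += 1
    return count
```

For example, `count_above(open("drugs.txt", newline=""), "molecular weight", 200)` would perform the awk count, assuming the column header is literally "molecular weight".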

Furthermore, most database programs have utilities for importing tab-delimited files. Loading the filter results into a database makes filter experimentation fast, since the properties only have to be calculated once and can then be queried from the cache.
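As one possible illustration of this caching approach, the sketch below loads the tab-delimited table into SQLite using only the Python standard library. SQLite is a stand-in for whatever database you use; the names load_property_table and quote_ident and the table name props are hypothetical. It assumes every row has the same number of fields as the header and that header names are unique. Columns are declared NUMERIC so numeric-looking cells compare as numbers while SMILES strings and titles stay text (a purely numeric title would also be stored as a number).

```python
import csv
import sqlite3
from typing import Iterable, List


def quote_ident(name: str) -> str:
    """Quote an SQL identifier so header names containing spaces work."""
    return '"' + name.replace('"', '""') + '"'


def load_property_table(conn: sqlite3.Connection, lines: Iterable[str],
                        table: str = "props") -> List[str]:
    """Create `table` from a tab-delimited property table; return the header."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    # NUMERIC affinity stores "250.0" as a number but leaves "CCO" as text.
    cols = ", ".join(f"{quote_ident(h)} NUMERIC" for h in header)
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({cols})")
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", reader)
    conn.commit()
    return header
```

Once loaded with `load_property_table(sqlite3.connect("props.db"), open("drugs.txt", newline=""))`, a new cutoff can be tried with a query such as `SELECT COUNT(*) FROM props WHERE "molecular weight" > 200` without recalculating any properties (again assuming that header name).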

Specific Molecular Properties

The OEFilter property table can also be used to retrieve specific molecular properties or filtering terms for which no free function exists. The number of molecular properties calculated can be very large, and some properties are more experimental than others; for these reasons, free functions are not provided for everything that can be calculated. Listing 4 extracts the number of Lipinski violations from the OEFilter property table by writing the table to an oeosstream.

Warning

The exact set of fields the OEFilter object outputs depends on the filter file being used: a field is output only if the corresponding filter rule is turned on. For that reason, the example in Listing 4 searches the header line for the position of the “Lipinski violations” field rather than assuming a fixed column. This key-value lookup is the recommended way to extract specific fields from the filter table.

Listing 4: Extract the number of Lipinski violations from the table output

#!/usr/bin/env python
#############################################################################
#  Copyright (C) 2009, 2014 OpenEye Scientific Software, Inc.
#############################################################################
###  Extract the number of Lipinski violations from the table output
#############################################################################
from __future__ import print_function
import sys
from openeye.oechem import *
from openeye.oemolprop import *

def main(argv = [__name__]):
    if len(argv) != 2:
        OEThrow.Usage("%s <input>" % argv[0])

    ifs = oemolistream()
    if not ifs.open(argv[1]):
        OEThrow.Fatal("Unable to open %s" % argv[1])

    filt = OEFilter(OEFilterType_Lead)
    OEThrow.SetLevel(OEErrorLevel_Warning)
    
    ostr = oeosstream()
    pwnd = False
    filt.SetTable(ostr, pwnd)

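    # decode the header row and strip its newline so every key is a clean str
    headers = ostr.str().decode("UTF-8").rstrip("\r\n").split('\t')
    ostr.clear() # remove the header row from the stream

    for mol in ifs.GetOEGraphMols():
        filt(mol)

        # decode this molecule's row the same way as the header
        fields = ostr.str().decode("UTF-8").rstrip("\r\n").split('\t')
        ostr.clear() # remove this row from the stream

        tmpdct = dict(zip(headers, fields))
        print(mol.GetTitle(), tmpdct["Lipinski violations"])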

if __name__ == "__main__":
    sys.exit(main(sys.argv))

SD Data Molecular Properties

The molecular properties OEFilter calculates can be attached as SD data to the molecule. Listing 5 demonstrates how to use the OEFilter.SetSDTag method to attach the calculated molecular properties to molecules passed to the filter object.

It is important to remember that the SD tag data mirrors the tabular output. The SD data tag names are exactly the names found in the table header, and the SD data values are exactly what is written to the data table. The only exceptions are the “SMILES”, “Name”, and “Filter” columns, which are not attached because this information can be obtained in other ways.
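The correspondence between table columns and SD tags can be modeled in a few lines of plain Python. This is a sketch for illustration only: OEFilter.SetSDTag performs the attachment itself, the helpers row_to_sd_tags and format_sd_data are hypothetical, and the rendering follows the common SD data-item layout (a `>  <name>` line, the value, then a blank line) rather than any output captured from OEFilter.

```python
from typing import Dict, List

# Columns that, per the text above, are not attached as SD data.
EXCLUDED = {"SMILES", "Name", "Filter"}


def row_to_sd_tags(header: List[str], row: List[str]) -> Dict[str, str]:
    """Map one property-table row to SD tag names and values."""
    return {name: value for name, value in zip(header, row)
            if name not in EXCLUDED}


def format_sd_data(tags: Dict[str, str]) -> str:
    """Render tags as SD data items: '>  <name>', value, blank line."""
    return "".join(f">  <{name}>\n{value}\n\n" for name, value in tags.items())
```

Every tag name produced this way is a table header name, and every value is the corresponding table cell, which is exactly the relationship described above.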

Warning

By default, only the properties up to the failing property are attached. Use OEFilter.SetTable to force every property to be attached as SD data. The same rule applies to SD data as to tabular output: only the properties specified in the filter file are attached as SD data.

Listing 5 sets the table output to the oenul output stream so that all properties are attached as SD data without the tabular data actually being written anywhere.

Listing 5: Attach molecular properties as SD data

#!/usr/bin/env python
#############################################################################
#  Copyright (C) 2009-2014 OpenEye Scientific Software, Inc.
#############################################################################
###  Attach molecular properties as SD data
#############################################################################
import sys
from openeye.oechem import *
from openeye.oemolprop import *

def main(argv = [__name__]):
    if len(argv) != 3:
        OEThrow.Usage("%s <input> <output>" % argv[0])

    ifs = oemolistream()
    if not ifs.open(argv[1]):
        OEThrow.Fatal("Unable to open %s" % argv[1])

    ofs = oemolostream()
    if not ofs.open(argv[2]):
        OEThrow.Fatal("Unable to create %s" % argv[2])

    fmt = ofs.GetFormat()
    if fmt not in [OEFormat_SDF, OEFormat_OEB, OEFormat_CSV]:
        OEThrow.Fatal("Only SD, OEB, and CSV formats preserve SD data")

    filt = OEFilter(OEFilterType_Lead)
    OEThrow.SetLevel(OEErrorLevel_Warning)
    
    pwnd = False
    filt.SetTable(oenul, pwnd)
    filt.SetSDTag(True)

    for mol in ifs.GetOEGraphMols():
        filt(mol)
        OEWriteMolecule(ofs, mol)

if __name__ == "__main__":
    sys.exit(main(sys.argv))