Examples

All filtering operations are controlled through the OEFilter object, which is typically configured with a filter and then applied to each molecule read from a file. Listing 1 configures an OEFilter object with the default lead-like filter and writes out the molecules that pass it. The OEFilter.operator() method, invoked as call in the Java listings, tests whether a molecule passes the filter.

Note

Applying the filter also alters the molecule, because all of the specified Filter Preprocessing steps are applied to it.

Listing 1: Basic filtering for lead-like molecules

/**************************************************************
 * Copyright 2004-2013 OpenEye Scientific Software, Inc.
 *************************************************************/
package openeye.examples.oemolprop;

import openeye.oechem.*;
import openeye.oemolprop.*;

public class LeadFilter {

    public static void main(String[] argv) {
        if (argv.length != 2)
            oechem.OEThrow.Usage("LeadFilter <input> <output>");

        oemolistream ifs = new oemolistream();
        if (!ifs.open(argv[0]))
            oechem.OEThrow.Fatal("Unable to open " + argv[0]);

        oemolostream ofs = new oemolostream();
        if (!ofs.open(argv[1]))
            oechem.OEThrow.Fatal("Unable to create " + argv[1]);

        OEFilter filter = new OEFilter(OEFilterType.Lead);

        OEGraphMol mol = new OEGraphMol();
        while (oechem.OEReadMolecule(ifs, mol))
            if (filter.call(mol))
                oechem.OEWriteMolecule(ofs, mol);
        ifs.close();
        ofs.close();
    }
}

By default, the OEFilter.operator() method reports every molecule passed to it through OEThrow, one line per molecule, in the following format:

[Isomeric SMILES]\t[Title],[Pass|Reason for failure]

The following is an example of this output:

CC1=CC(=O)C=CC1=O       NSC 1,Minimum atom count(10) not reached: 9
c1ccc2c(c1)nc(s2)SSc3nc4ccccc4s3        NSC 2,Maximum disulfide(0) exceeded: 1
c1c(cc(c(c1[N+](=O)[O-])[O-])Cl)[N+](=O)[O-]    NSC 3,Maximum heteroatom to carbon ratio(1.10) exceeded: 1.33
c1c(sc(n1)N)[N+](=O)[O-]        NSC 4,Minimum atom count(10) not reached: 9
c1ccc2c(c1)C(=O)c3ccc(cc3C2=O)N NSC 5,Maximum dye(0) exceeded: 2
c1ccc(c(c1)c2c3ccc(c(c3oc-4c(c(=O)ccc24)Br)Br)O)C(=O)[O-]       NSC 6,Maximum atom count(25) exceeded: 27
C[NH+](C)C1=C(C(=O)c2ccccc2C1=O)Cl      NSC 7,Maximum alkyl_halide(0) exceeded: 1
Cc1ccc2c(c1[N+](=O)[O-])C(=O)c3ccccc3C2=O       NSC 8,Maximum nitro(0) exceeded: 1
CC(C)(C)c1cc(c(cc1O)C(C)(C)C)O  NSC 11,Pass
CC1=NN(C(=O)C1)c2ccccc2 NSC 12,Pass
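
If this log is captured to a file, each line can be split back into its three parts. The following is a minimal sketch, not part of the toolkit: the class name FilterLogLine is hypothetical, and it assumes that the SMILES and title are separated by a single tab and that the reason text contains no comma (true for the sample output above, but not guaranteed for every rule).

```java
import java.util.Optional;

/** One parsed line of the default OEFilter log output (hypothetical helper). */
final class FilterLogLine {
    final String smiles;
    final String title;
    final String verdict; // "Pass" or the reason for failure

    private FilterLogLine(String smiles, String title, String verdict) {
        this.smiles = smiles;
        this.title = title;
        this.verdict = verdict;
    }

    boolean passed() {
        return verdict.equals("Pass");
    }

    /**
     * Parses "[SMILES]\t[Title],[Pass|Reason]".
     * Assumption: the reason contains no comma, so the last comma on the
     * line separates the title from the verdict.
     */
    static Optional<FilterLogLine> parse(String line) {
        int tab = line.indexOf('\t');
        int comma = line.lastIndexOf(',');
        if (tab < 0 || comma < tab)
            return Optional.empty(); // not a filter log line

        return Optional.of(new FilterLogLine(
                line.substring(0, tab),
                line.substring(tab + 1, comma),
                line.substring(comma + 1).trim()));
    }

    public static void main(String[] args) {
        String line = "CC1=NN(C(=O)C1)c2ccccc2\tNSC 12,Pass";
        parse(line).ifPresent(r ->
                System.out.println(r.title + " passed=" + r.passed()));
        // prints "NSC 12 passed=true"
    }
}
```

Splitting on the last comma rather than the first keeps titles that themselves contain commas intact, at the cost of the stated assumption about the reason text.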

Quiet Filtering

This log output can be superfluous when the OEFilter object is used inside a larger program. Because all of it is written to the OEThrow object, the verbosity can be lowered with the OEThrow.SetLevel method. Listing 2 sets the OEThrow error level to OEErrorLevel.Warning, which lets through only messages at the level of Warning or above and thereby silences the OEFilter object’s logging output.

Listing 2: Silencing the OEFilter logging messages

/**************************************************************
 * Copyright 2004-2013 OpenEye Scientific Software, Inc.
 *************************************************************/
package openeye.examples.oemolprop;

import openeye.oechem.*;
import openeye.oemolprop.*;

public class QuietFilter {

    public static void main(String[] argv) {
        if (argv.length != 2)
            oechem.OEThrow.Usage("QuietFilter <input> <output>");

        oemolistream ifs = new oemolistream();
        if (!ifs.open(argv[0]))
            oechem.OEThrow.Fatal("Unable to open " + argv[0]);

        oemolostream ofs = new oemolostream();
        if (!ofs.open(argv[1]))
            oechem.OEThrow.Fatal("Unable to create " + argv[1]);

        OEFilter filter = new OEFilter(OEFilterType.Lead);

        oechem.OEThrow.SetLevel(OEErrorLevel.Warning);

        OEGraphMol mol = new OEGraphMol();
        while (oechem.OEReadMolecule(ifs, mol))
            if (filter.call(mol))
                oechem.OEWriteMolecule(ofs, mol);
        ofs.close();
        ifs.close();
    }
}

Molecular Property Table

The OEFilter object can calculate all the molecular properties it uses during filtering without actually applying the filter. This is useful for caching OEFilter results in a database. The OEFilter.SetTable method specifies where to write a tab-delimited table of every property in the associated filter file. Listing 3 writes this table to standard output.

Listing 3: Generating tabular output of all molecular properties

/**************************************************************
 * Copyright 2004-2013 OpenEye Scientific Software, Inc.
 *************************************************************/
package openeye.examples.oemolprop;

import openeye.oechem.*;
import openeye.oemolprop.*;

public class MolPropTable {

    public static void main(String[] argv) {
        if (argv.length != 1)
            oechem.OEThrow.Usage("MolPropTable <input>");

        oemolistream ifs = new oemolistream();
        if (!ifs.open(argv[0]))
            oechem.OEThrow.Fatal("Unable to open " + argv[0]);

        OEFilter filter = new OEFilter(OEFilterType.Lead);
        oechem.OEThrow.SetLevel(OEErrorLevel.Warning);

        filter.SetTable(oechem.getOeout());

        OEGraphMol mol = new OEGraphMol();
        while (oechem.OEReadMolecule(ifs, mol))
            filter.call(mol);
        ifs.close();
    }
}

Note

A tab-delimited file format was chosen to integrate better with Unix command-line utilities. For example, it allows quick and easy filter experimentation with the awk utility. The following snippet prints the number of molecules (plus one for the header) with a molecular weight greater than 200.

> molproptable drugs.sdf > drugs.txt
> awk -F'\t' '{if ($12 > 200) { print $12 }}' drugs.txt | wc -l

Furthermore, most database programs provide utilities for importing tab-delimited files. Loading the filter results into a third-party database makes filter experimentation fast, because the properties are calculated only once and then cached in the database.
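
The same kind of experiment can be run in Java on a saved table. The following is a minimal sketch under stated assumptions: the class name TableColumnCount is hypothetical, the first line of the file is taken to be the header row, and the column is located by a name supplied by the caller, since the exact header names depend on the filter file. Unlike the awk snippet, it does not count the header row.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Counts rows of a tab-delimited property table (hypothetical helper). */
final class TableColumnCount {

    /** Returns the number of data rows whose value in the named column exceeds the threshold. */
    static int countAbove(List<String> lines, String column, double threshold) {
        if (lines.isEmpty())
            throw new IllegalArgumentException("table has no header row");

        // Locate the column by header name instead of a fixed position.
        List<String> header = Arrays.asList(lines.get(0).split("\t", -1));
        int idx = header.indexOf(column);
        if (idx < 0)
            throw new IllegalArgumentException("no column named " + column);

        int count = 0;
        for (String row : lines.subList(1, lines.size())) {
            String[] fields = row.split("\t", -1);
            if (idx >= fields.length)
                continue; // short or blank row
            try {
                if (Double.parseDouble(fields[idx].trim()) > threshold)
                    count++;
            } catch (NumberFormatException e) {
                // non-numeric cell: skip this row
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Usage: TableColumnCount <table file> <column name> <threshold>
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(countAbove(lines, args[1], Double.parseDouble(args[2])));
    }
}
```

Looking the column up by name keeps the program valid when rules are switched on or off in the filter file, which changes the number and position of the columns.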

Specific Molecular Properties

The OEFilter object table can also be used to retrieve specific molecular properties or filtering terms for which no free function exists. The number of molecular properties calculated is large, and some properties are more experimental than others, so free functions are not provided for everything that can be calculated. Listing 4 extracts the number of Lipinski violations from the OEFilter molecular property table by writing the table to an oeosstream.

Warning

The exact set of fields the OEFilter object outputs depends on the filter file being used: a field is output only if its filter rule is turned on. That is why Listing 4 searches the header line for the position of the “Lipinski violations” field. Looking fields up by header name in this way is recommended when extracting specific fields from the filter table.

Listing 4: Extract the number of Lipinski violations from the table output

/**************************************************************
 * Copyright 2004-2013 OpenEye Scientific Software, Inc.
 *************************************************************/
package openeye.examples.oemolprop;

import java.util.*;
import openeye.oechem.*;
import openeye.oemolprop.*;

public class SpecificMolProp {

    public static void main(String[] argv) {
        if (argv.length != 1)
            oechem.OEThrow.Usage("SpecificMolProp <input>");

        oemolistream ifs = new oemolistream();
        if (!ifs.open(argv[0]))
            oechem.OEThrow.Fatal("Unable to open " + argv[0]);

        OEFilter filter = new OEFilter(OEFilterType.Lead);
        oechem.OEThrow.SetLevel(OEErrorLevel.Warning);

        oeosstream ostr = new oeosstream();
        filter.SetTable(ostr);

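        // Trim the trailing line terminator so the last column name matches too.
        List<String> fields = Arrays.asList(ostr.str().trim().split("\t"));
        ostr.clear(); // remove the header row from the stream

        int fieldidx = fields.indexOf("Lipinski violations");
        if (fieldidx < 0)
            oechem.OEThrow.Fatal("The filter table has no \"Lipinski violations\" column");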

        OEGraphMol mol = new OEGraphMol();
        while (oechem.OEReadMolecule(ifs, mol)) {
            filter.call(mol);

            fields = Arrays.asList(ostr.str().split("\t"));
            ostr.clear(); // remove this row from the stream

            // trim() drops the line terminator if this is the last column
            System.out.println(mol.GetTitle() + " " + fields.get(fieldidx).trim());
        }
        ostr.close();
        ifs.close();
    }
}

SD Data Molecular Properties

The molecular properties OEFilter calculates can be attached as SD data to the molecule. Listing 5 demonstrates how to use the OEFilter.SetSDTag method to attach the calculated molecular properties to molecules passed to the filter object.

Keep in mind that the SD tag data mirrors the tabular output. The SD data tag names are exactly the names found in the table header, and the SD data values are exactly the values written to the data table. The only exceptions are the “SMILES”, “Name”, and “Filter” columns, since this information can be obtained in other ways.

Warning

By default, only the properties up to the failing property are attached. Use OEFilter.SetTable to force every property to be attached as SD data. The same rule applies to SD data as to tabular output: only the properties specified in the filter file are attached.

Listing 5 demonstrates how to set the table output to the oenul output stream to allow all properties to be attached as SD data, but not actually write the tabular data anywhere.

Listing 5: Attach molecular properties as SD data

/**************************************************************
 * Copyright 2004-2014 OpenEye Scientific Software, Inc.
 *************************************************************/
package openeye.examples.oemolprop;

import openeye.oechem.*;
import openeye.oemolprop.*;

public class MolPropSDData {

    public static void main(String[] argv) {
        if (argv.length != 2)
            oechem.OEThrow.Usage("MolPropSDData <input> <output>");

        oemolistream ifs = new oemolistream();
        if (!ifs.open(argv[0]))
            oechem.OEThrow.Fatal("Unable to open " + argv[0]);

        oemolostream ofs = new oemolostream();
        if (!ofs.open(argv[1]))
            oechem.OEThrow.Fatal("Unable to create " + argv[1]);

        int fmt = ofs.GetFormat();
        if (fmt != OEFormat.SDF && fmt != OEFormat.OEB && fmt != OEFormat.CSV)
            oechem.OEThrow.Fatal("Only SD, OEB, and CSV formats preserve SD data");

        OEFilter filter = new OEFilter(OEFilterType.Lead);
        oechem.OEThrow.SetLevel(OEErrorLevel.Warning);

        filter.SetTable(oechem.getOenul());
        filter.SetSDTag(true);

        OEGraphMol mol = new OEGraphMol();
        while (oechem.OEReadMolecule(ifs, mol)) {
            filter.call(mol);
            oechem.OEWriteMolecule(ofs, mol);
        }
        ifs.close();
        ofs.close();
    }
}
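
When the output is written in SD format, the attached properties appear as ordinary SD data items and can be read back without the toolkit. The following is a minimal sketch, not an OpenEye API: the class name SDDataFields is hypothetical, and it assumes the standard SD layout, in which a header line starting with ">" carries the tag in angle brackets and the value lines run until the next blank line. It does not guard against a molecule title line that itself starts with ">".

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Reads the data items of one SD record from plain text (hypothetical helper). */
final class SDDataFields {

    /** Returns tag/value pairs in file order for the text of a single SD record. */
    static Map<String, String> parse(String record) {
        Map<String, String> data = new LinkedHashMap<>();
        String[] lines = record.split("\r?\n");

        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int open = line.indexOf('<');
            int close = line.lastIndexOf('>');
            // A data header looks like ">  <tag name>".
            if (!line.startsWith(">") || open < 0 || close < open)
                continue;

            String tag = line.substring(open + 1, close);
            StringBuilder value = new StringBuilder();
            // Value lines run until a blank line or the record terminator.
            while (i + 1 < lines.length
                    && !lines[i + 1].isEmpty()
                    && !lines[i + 1].equals("$$$$")) {
                if (value.length() > 0)
                    value.append('\n');
                value.append(lines[++i]);
            }
            data.put(tag, value.toString());
        }
        return data;
    }
}
```

Because the tag names match the table header names, a key such as "Lipinski violations" can be looked up in the returned map in the same way as a column in the tabular output.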