Basic Filtering for a Molecule File

All filtering operations are controlled through the OEFilter object. An OEFilter is typically configured with a specific filter and then applied in turn to each molecule in a file. This example configures an OEFilter with the lead-like filter and writes out the molecules that pass it. The OEFilter.operator() method tests whether a molecule passes the filter.

Note

Testing a molecule also modifies it: every specified Filter Preprocessing step is applied to the molecule.
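The configure-then-iterate pattern described above can be sketched roughly as follows. This is a minimal illustration rather than the contents of molfilter.py: the helper names write_passing and run_with_openeye are hypothetical, and the OpenEye part assumes the toolkits and a valid license are installed. The loop itself accepts any callable as the filter, so it can be tried without OpenEye.

```python
def write_passing(mols, passes_filter, write):
    """Test each molecule and write only the ones that pass.

    mols          -- any iterable of molecules
    passes_filter -- callable returning True when a molecule passes
                     (with OpenEye, an OEFilter instance called as filt(mol))
    write         -- callable that writes a single molecule
    Returns a (written, seen) pair of counts.
    """
    written = seen = 0
    for mol in mols:
        seen += 1
        if passes_filter(mol):
            write(mol)
            written += 1
    return written, seen


def run_with_openeye(in_path, out_path):
    """Hypothetical driver: lead-like filtering with the OpenEye toolkits."""
    # Imported here so the generic loop above works without OpenEye installed.
    from openeye import oechem, oemolprop

    ifs = oechem.oemolistream()
    if not ifs.open(in_path):
        oechem.OEThrow.Fatal("Unable to open %s for reading" % in_path)
    ofs = oechem.oemolostream()
    if not ofs.open(out_path):
        oechem.OEThrow.Fatal("Unable to open %s for writing" % out_path)

    # Configure the filter once, then reuse it for every molecule.
    filt = oemolprop.OEFilter(oemolprop.OEFilterType_Lead)

    counts = write_passing(
        ifs.GetOEGraphMols(),
        filt,
        lambda mol: oechem.OEWriteMolecule(ofs, mol),
    )
    ofs.close()
    ifs.close()
    return counts
```

Because the filter preprocesses each molecule before testing it, the molecules that reach the output are the preprocessed forms, not necessarily the forms read from the input file.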

Command Line Interface

A description of the command line interface can be obtained by executing the program with the --help argument.

prompt> python molfilter.py --help

will generate the following output:

Simple parameter list
 filter options :
   -filtertype : filter type

 input/output options :
   -in : Input filename
   -out : Output filename
Usage: ./molfilter <input> <output>

Code

molfilter.py

Examples

prompt> python molfilter.py -in mcss.smi.gz -out .smi -filtertype Lead

The following is an example of the output:

CC1=CC(=O)C=CC1=O    NSC 1,Minimum atom count(10) not reached: 9
c1ccc2c(c1)nc(s2)SSc3nc4ccccc4s3    NSC 2,Maximum disulfide(0) exceeded: 1
c1c(cc(c(c1[N+](=O)[O-])[O-])Cl)[N+](=O)[O-]    NSC 3,Maximum heteroatom to carbon ratio(1.10) exceeded: 1.33
c1c(sc(n1)N)[N+](=O)[O-]    NSC 4,Minimum atom count(10) not reached: 9
c1ccc2c(c1)C(=O)c3ccc(cc3C2=O)N    NSC 5,Maximum dye(0) exceeded: 2
c1ccc(c(c1)c2c3ccc(c(c3oc-4c(c(=O)ccc24)Br)Br)O)C(=O)[O-]    NSC 6,Maximum atom count(25) exceeded: 27
C[NH+](C)C1=C(C(=O)c2ccccc2C1=O)Cl    NSC 7,Maximum alkyl_halide(0) exceeded: 1
Cc1ccc2c(c1[N+](=O)[O-])C(=O)c3ccccc3C2=O    NSC 8,Maximum nitro(0) exceeded: 1
CC(C)(C)c1cc(c(cc1O)C(C)(C)C)O    NSC 11,Pass
CC1=NN(C(=O)C1)c2ccccc2    NSC 12,Pass

By default, the OEFilter.operator() method reports to OEThrow on every molecule passed to it, and this example prints those messages to the screen. The next example, quietfilter, shows how to suppress that output. Only the molecules that pass the filter are written to the output file. Each emitted message has the following format:

[Isomeric SMILES]\t[Title],[Pass|Reason for failure]
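When the emitted messages are captured to a file, the format above is simple enough to parse for later analysis. The sketch below is one possible approach and is not part of the toolkit: FilterResult and parse_filter_line are hypothetical names, and the parser assumes that the title contains no comma, so the first comma after the tab separates the title from the verdict.

```python
from typing import NamedTuple


class FilterResult(NamedTuple):
    smiles: str   # isomeric SMILES, the text before the tab
    title: str    # molecule title
    passed: bool  # True when the verdict is "Pass"
    reason: str   # reason for failure, empty when the molecule passed


def parse_filter_line(line):
    """Parse one '[Isomeric SMILES]\\t[Title],[Pass|Reason for failure]' line."""
    smiles, rest = line.rstrip("\n").split("\t", 1)
    # Assumption: the title itself contains no comma.
    title, _, verdict = rest.partition(",")
    passed = verdict == "Pass"
    return FilterResult(smiles, title, passed, "" if passed else verdict)


if __name__ == "__main__":
    sample = "CC1=CC(=O)C=CC1=O\tNSC 1,Minimum atom count(10) not reached: 9"
    print(parse_filter_line(sample))
```

Collecting the reason fields over a whole run gives a quick view of which rules reject the most molecules.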