Basic Filtering for a Molecule File

All filtering operations are controlled through the OEFilter object. An OEFilter is typically configured with a specific filter and then applied to each molecule in a file in turn. This example configures an OEFilter with the filter selected by the -filtertype option (the lead-like filter in the example run below) and writes out the molecules that pass it. The OEFilter.operator() method tests whether a molecule passes the filter.

Note

The molecule will also be altered by all the specified Filter Preprocessing steps.

Command Line Interface

A description of the command line interface can be obtained by executing the program with the --help argument.

prompt> python molfilter.py --help

will generate the following output:

Simple parameter list
 filter options :
   -filtertype : filter type

 input/output options :
   -in : Input filename
   -out : Output filename
Usage: ./molfilter <input> <output>

Code

molfilter.py

#!/usr/bin/env python
# (C) 2022 Cadence Design Systems, Inc. (Cadence) 
# All rights reserved.
# TERMS FOR USE OF SAMPLE CODE The software below ("Sample Code") is
# provided to current licensees or subscribers of Cadence products or
# SaaS offerings (each a "Customer").
# Customer is hereby permitted to use, copy, and modify the Sample Code,
# subject to these terms. Cadence claims no rights to Customer's
# modifications. Modification of Sample Code is at Customer's sole and
# exclusive risk. Sample Code may require Customer to have a then
# current license or subscription to the applicable Cadence offering.
# THE SAMPLE CODE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED.  OPENEYE DISCLAIMS ALL WARRANTIES, INCLUDING, BUT
# NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
# PARTICULAR PURPOSE AND NONINFRINGEMENT. In no event shall Cadence be
# liable for any damages or liability in connection with the Sample Code
# or its use.

#############################################################################
# Filter a molecule file for "Lead-like" molecules
#############################################################################
import sys
from openeye import oechem
from openeye import oemolprop


def main(argv=[__name__]):
    itf = oechem.OEInterface(InterfaceData)
    oemolprop.OEConfigureFilterParams(itf)

    if not oechem.OEParseCommandLine(itf, argv):
        oechem.OEThrow.Fatal("Unable to interpret command line!")

    iname = itf.GetString("-in")
    oname = itf.GetString("-out")

    ifs = oechem.oemolistream()
    if not ifs.open(iname):
        oechem.OEThrow.Fatal("Cannot open input file!")

    ofs = oechem.oemolostream()
    if not ofs.open(oname):
        oechem.OEThrow.Fatal("Cannot create output file!")

    ftype = oemolprop.OEGetFilterType(itf)
    filt = oemolprop.OEFilter(ftype)
    filt.SetErrorLevel(oechem.OEErrorLevel_Info)

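    # Write out only the molecules that pass the filter. Note that
    # filt(mol) also applies the filter's preprocessing steps to mol.
    for mol in ifs.GetOEGraphMols():
        if filt(mol):
            oechem.OEWriteMolecule(ofs, mol)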


#############################################################################
# INTERFACE
#############################################################################

InterfaceData = '''
!BRIEF [-in] <input> [-out] <output>

!CATEGORY "input/output options :"

    !PARAMETER -in
      !ALIAS -i
      !TYPE string
      !REQUIRED true
      !KEYLESS 1
      !VISIBILITY simple
      !BRIEF Input filename
    !END

    !PARAMETER -out
      !ALIAS -o
      !TYPE string
      !REQUIRED true
      !KEYLESS 2
      !VISIBILITY simple
      !BRIEF Output filename
    !END

!END
'''

if __name__ == "__main__":
    sys.exit(main(sys.argv))

Examples

prompt> python molfilter.py -in mcss.smi.gz -out .smi -filtertype Lead

The following is an example of the output:

CC1=CC(=O)C=CC1=O    NSC 1,Minimum atom count(10) not reached: 9
c1ccc2c(c1)nc(s2)SSc3nc4ccccc4s3    NSC 2,Maximum disulfide(0) exceeded: 1
c1c(cc(c(c1[N+](=O)[O-])[O-])Cl)[N+](=O)[O-]    NSC 3,Maximum heteroatom to carbon ratio(1.10) exceeded: 1.33
c1c(sc(n1)N)[N+](=O)[O-]    NSC 4,Minimum atom count(10) not reached: 9
c1ccc2c(c1)C(=O)c3ccc(cc3C2=O)N    NSC 5,Maximum dye(0) exceeded: 2
c1ccc(c(c1)c2c3ccc(c(c3oc-4c(c(=O)ccc24)Br)Br)O)C(=O)[O-]    NSC 6,Maximum atom count(25) exceeded: 27
C[NH+](C)C1=C(C(=O)c2ccccc2C1=O)Cl    NSC 7,Maximum alkyl_halide(0) exceeded: 1
Cc1ccc2c(c1[N+](=O)[O-])C(=O)c3ccccc3C2=O    NSC 8,Maximum nitro(0) exceeded: 1
CC(C)(C)c1cc(c(cc1O)C(C)(C)C)O    NSC 11,Pass
CC1=NN(C(=O)C1)c2ccccc2    NSC 12,Pass

By default, the OEFilter.operator() method emits information to OEThrow about every molecule passed to it, and this example prints that information to the screen. The next example, quietfilter, shows how to suppress this output. Only the molecules that pass the filter are written to the output file. The emitted information has the following format:

[Isomeric SMILES]\t[Title],[Pass|Reason for failure]
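
If the emitted report is captured to a file, each line can be split back into its fields for post-processing. The following is a minimal sketch in plain Python, not part of the toolkit. FilterRecord and parse_filter_line are hypothetical names, and the parser assumes that the SMILES and title are separated by a tab and that the pass/fail text contains no comma, which holds for the sample output above but is not guaranteed here.

```python
from collections import Counter
from typing import NamedTuple


class FilterRecord(NamedTuple):
    """One line of filter output (hypothetical helper type)."""
    smiles: str
    title: str
    verdict: str  # "Pass" or the reason for failure

    @property
    def passed(self) -> bool:
        return self.verdict == "Pass"


def parse_filter_line(line: str) -> FilterRecord:
    """Split '[SMILES]\\t[Title],[Pass|Reason]' into its three fields."""
    smiles, _, rest = line.rstrip("\n").partition("\t")
    # Assumption: the verdict holds no comma, so the last comma is the
    # boundary between title and verdict (titles may contain commas).
    title, _, verdict = rest.rpartition(",")
    return FilterRecord(smiles, title, verdict)


# Two lines taken from the sample output, with the tab restored.
sample = [
    "CC1=CC(=O)C=CC1=O\tNSC 1,Minimum atom count(10) not reached: 9",
    "CC(C)(C)c1cc(c(cc1O)C(C)(C)C)O\tNSC 11,Pass",
]

records = [parse_filter_line(line) for line in sample]

# Titles of the molecules that passed the filter.
passed_titles = [r.title for r in records if r.passed]

# Tally failures by rule name (the text before the first parenthesis).
failure_rules = Counter(
    r.verdict.split("(")[0] for r in records if not r.passed
)
```

Tallying the rule names this way gives a quick view of which rules reject the most molecules in a given input file.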