FILTER Application Usage

Command Line Interface

A description of the command line interface can be obtained by executing FILTER with the --help option.

prompt> filter --help

will generate the following output:

Help functions:
  filter --help simple      : Get a list of simple parameters
  filter --help all         : Get a complete list of parameters
  filter --help <parameter> : Get detailed help on a parameter
  filter --help html        : Create an html help file for this program

Required Parameters

-in
File containing one or more molecular connection tables from which you would like to remove the non-medicinal compounds.
-out
File to fill with the molecules that pass all of the specified filters.

Execute Options

-param
The argument for this flag is the name of a file containing control parameters. The control parameter file acts to either replace or augment the command line interface. All parameters necessary for program execution may be provided in the control parameter file, although any command given explicitly on the command line will supersede options found in the parameter file. FILTER generates a new parameter file containing the full set of execution parameters upon every execution. The name of the parameter file written by FILTER is created by combining the prefix base name with the ‘.param’ extension.
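
The precedence rule described above can be sketched in Python. This is only an illustration of the merge order, not FILTER's implementation: the function names and the dictionary representation of the parameter file are assumptions made for the example, and the actual syntax of the parameter file is not modeled.

```python
def effective_parameters(param_file_values, command_line_values):
    """Merge settings so that explicit command-line values win.

    Both arguments are plain dicts mapping a flag name (without the
    leading dash) to its value. This mirrors only the precedence rule;
    it does not read or write real FILTER parameter files.
    """
    merged = dict(param_file_values)      # start from the parameter file
    merged.update(command_line_values)    # command line supersedes it
    return merged


def written_param_file(prefix="filter"):
    """Name of the parameter file written for a run: prefix + '.param'."""
    return prefix + ".param"


# Hypothetical values standing in for a parameter file and a command line.
from_file = {"in": "drugs.smi", "out": "drugs.oeb.gz", "interval": "5000"}
from_cli = {"interval": "50", "prefix": "run1"}

settings = effective_parameters(from_file, from_cli)
print(settings["interval"])                    # the command-line value is kept
print(written_param_file(settings["prefix"]))  # parameter file name for this run
```

Settings that appear only in the parameter file (here, the input and output names) are kept unchanged; only the values repeated on the command line are replaced.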

Optional Parameters

-filter
This optional parameter specifies a filter file to be used in place of the default filter. If only simple additions to the default filters are desired, please see the -newrule parameter. The file format for this file is described in the Filter Files chapter. There are two special reserved strings for this parameter. If the -filter parameter is “lead”, then the default lead-based filter will be used. If the -filter parameter is “drug”, then the default drug-based filter will be used. [default = blockbuster]
-fail
This specifies an optional molecular output file where the molecules that fail to pass the filter will be written. If this parameter is specified, then every molecule from -in will either be written to -out or to -fail. [default = null]
-prefix
For an execution of the FILTER program, three general-purpose files are written in addition to the output file specified on the command line. These files are the “info”, “log” and “param” files. Normally, they all begin with the prefix “filter”. However, this can be overridden with the -prefix parameter. This is particularly useful if you want to run multiple FILTER jobs in the same directory without overwriting files.
-info
Normally, FILTER writes an “info” file during the progress of any execution. At regular intervals during the execution, the info file is updated to reflect the most recent progress. If you are interested in seeing the progress of a run, it is best to either use the -dots parameter or look at the info file. Normally, the info file is saved as filename.info where “filename” is the prefix specified by the -prefix flag. However, one may use the -info flag to specify an info file with a separate name. [default = null]
-newrule
This optional parameter can specify a file that contains filter rules to supplement the default filter or the filter specified with the -filter parameter. This parameter can be used to extend the functional group list used to filter. New filter rules can also be added directly to the filter file specified with -filter. [default = null]
-typecheck
This boolean flag controls whether the valence states of atoms will be checked. This check identifies molecules that are poorly specified, or represent nonsensical chemical states. For example, an oxygen with eight hydrogens attached or a carbon with a +9 formal charge would be rejected. [default = true]
-select
This parameter is a SMARTS string that allows a user to require a specific functional group or substructure be present in all molecules that pass the filter. This feature is particularly useful for identification of reagents for library design. Selection items can also be added directly to the filter file specified with -filter. The command-line argument only allows specification of the SMARTS pattern, and exactly one copy of that functional group is required. If a user wants to specify a selection SMARTS with minimum and maximum number of occurrences other than 1, then they can use a SELECT statement inside the filter file. [default = null]
-log
This flag determines where the logging information is written. The logging information includes a listing of the filter used, followed by a one line comment about why each molecule failed, or if it passed, an assessment of the probability that the compound lies in drug-like space.
-table
This flag specifies a file for a tab-separated format table that includes all of the filter data. These data files are ready for import into a spreadsheet program for easy examination. Each column of the table includes one of the filter categories (such as “Molecular Weight”) and each row of the table corresponds to a single molecule. The table contains complete entries for all of the molecules in the input file regardless of whether they pass or fail. NOTE: Setting this flag will cause the program to slow down. [default = null]
-tableFlag
This flag specifies that, if a table is being written, any values in the table that would cause a molecule to fail a filter will be flagged with an asterisk. This provides a means of seeing all the filters a molecule might fail, as the log file typically only provides the first failure. [default = false]
-interval
This is the interval at which data is written to the filter.info file. The filter.info file contains running totals that are relevant to a FILTER run. Examining the filter.info file is the best means of checking on the progress of a FILTER execution. If this flag is 50, then the filter.info file is re-written every 50 molecules. [default = 5000]
-pkanorm
This boolean flag determines whether compounds will be modified to reflect a pH=7.4 model. Note that this modifies the molecules permanently. [default = true]
-normalize
This flag indicates an optional SMIRKS file. This file should contain the set of reactions you wish to use to normalize the connection table of your molecules. Please note: These reactions are applied before the filtering process and can significantly slow the filtering process. [default = null]
-salt
This flag specifies a molecule file that you consider to be salts. If any molecular entries contain multiple disconnected fragments, then any fragment contained in the “salt” file will be removed. If no file is specified, or if there are multiple disconnected fragments in a molecule record that are not in the salt file, then the first largest remaining fragment will be retained and all others discarded. [default = null]
-sdtag
This boolean flag indicates whether you want the molecular properties used for the filtering run (see -filter) to be attached to output molecules as SD tag data. This parameter will only work for .sdf or .oeb formats. [default = false]
-dots
Boolean flag that determines whether FILTER writes a single dot (.) to the terminal (stdout) for every 500 compounds that are processed.
-unique
This flag accepts the name of a file that contains molecules for FILTER to skip. Only unique molecules that do not appear in this file will be sent to the output. Molecules are checked for uniqueness after FILTER has processed them, therefore a parameter such as -pkanorm could change an input structure from a duplicate to a unique molecule.
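
As an illustration of how a table written with -table and -tableFlag might be examined outside a spreadsheet, the Python sketch below lists, for each molecule, the columns whose values carry an asterisk. The column names, the sample rows, the presence of a header row, and the position of the asterisk within a cell are all assumptions made for the example; check them against a table produced by your own run.

```python
import csv
import io

# Hypothetical excerpt of a table written with -table and -tableFlag.
# Real column names and the exact asterisk placement may differ.
SAMPLE_TABLE = (
    "Name\tMolecular Weight\tRotatable Bonds\n"
    "mol1\t312.4\t5\n"
    "mol2\t*845.1\t*21\n"
    "mol3\t298.3\t*17\n"
)


def flagged_columns(table_file):
    """Map each row's first-column value to its asterisk-flagged columns."""
    reader = csv.reader(table_file, delimiter="\t")
    header = next(reader)
    failures = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        flagged = [col for col, cell in zip(header[1:], row[1:]) if "*" in cell]
        failures[row[0]] = flagged
    return failures


# For a real run, replace io.StringIO(...) with open("drugs.table", newline="").
result = flagged_columns(io.StringIO(SAMPLE_TABLE))
for name, columns in result.items():
    print(name, ", ".join(columns) or "no flagged values")
```

Because the table holds an entry for every input molecule, this kind of scan can show every flagged property for a molecule, whereas the log file typically reports only the first failure.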

Example Executions

This section has a series of example FILTER command-line executions. Each example is followed by a brief description of its behavior.

prompt> filter drugs.smi drugs.oeb.gz
prompt> filter -in drugs.smi -out drugs.oeb.gz

These two commands yield identical results. Both execute FILTER with the default parameters. The file drugs.smi is opened in SMILES format for input, and the output is written to the file drugs.oeb.gz in gzipped OEBinary version 2 format.

prompt> filter -in drugs.smi -out drugs.sdf -filter myfilter

This command differs from the ones above in the -filter flag and in the output file. It executes FILTER with the filter found in the myfilter file instead of the default filter. The file drugs.smi is opened in SMILES format for input, and the output is written to the file drugs.sdf in SD format.

prompt> filter -param myparameters drugs.smi drugs.oeb.gz
prompt> filter drugs.smi drugs.oeb.gz -param myparameters

The first of these two commands behaves as if -in drugs.smi and -out drugs.oeb.gz had been given explicitly: drugs.smi is mapped to the -in flag and drugs.oeb.gz to the -out flag because they are the second-to-last and last command-line arguments, respectively. All other settings are read from the myparameters file. The second of these two commands fails to parse because the implicit input and output arguments are not the final two arguments in the list.
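
The rule for implicit input and output arguments can be sketched in Python as follows. This is a simplified model, not FILTER's actual parser: the function name is hypothetical, and the sketch assumes that every flag is followed by exactly one value.

```python
def map_implicit_arguments(args):
    """Split an argument list into flag values and implicit -in/-out.

    Assumption for this sketch: every flag (a token starting with '-')
    is followed by exactly one value. Bare tokens are accepted only as
    the final two arguments, mirroring the rule described above.
    """
    flags = {}
    bare = []  # (position, token) for tokens not attached to a flag
    i = 0
    while i < len(args):
        token = args[i]
        if token.startswith("-"):
            if i + 1 >= len(args):
                raise ValueError("flag %s has no value" % token)
            flags[token.lstrip("-")] = args[i + 1]
            i += 2
        else:
            bare.append((i, token))
            i += 1

    if bare:
        positions = [pos for pos, _ in bare]
        if positions != [len(args) - 2, len(args) - 1]:
            raise ValueError(
                "implicit input and output must be the final two arguments"
            )
        flags["in"], flags["out"] = bare[0][1], bare[1][1]
    return flags


# Accepted: the bare file names are the last two arguments.
ok = map_implicit_arguments(["-param", "myparameters", "drugs.smi", "drugs.oeb.gz"])
print(ok)

# Rejected: the bare file names come before the -param flag.
try:
    map_implicit_arguments(["drugs.smi", "drugs.oeb.gz", "-param", "myparameters"])
    rejected = False
except ValueError as err:
    rejected = True
    print("parse error:", err)
```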

prompt> filter -in drugs.smi -out drugs.oeb.gz -table drugs.table

This executes FILTER on the file drugs.smi and writes molecules that pass the filter to the file drugs.oeb.gz. It also writes all of the filter data to the tab-separated value file drugs.table.

prompt> cat maybridge.05-1.sdf | filter -in .sdf -out .ism | omega .ism m.oeb.gz

This command presumes that you have an SD format file called maybridge.05-1.sdf. That file is piped to the FILTER program. The -in .sdf flag indicates that FILTER should read .sdf format from standard input (stdin). Since no -filter flag is specified, the default filter will be used. The -out .ism flag indicates that FILTER will write isomeric SMILES format to standard output (stdout). The output is then piped into OMEGA.

prompt> filter -in drugs.smi -out drugs.oeb.gz -select "[N;$(*-a)]"

This command will filter the compounds in drugs.smi with the default filter and write the output to drugs.oeb.gz. It also requires that molecules contain exactly one instance of the aniline substructure defined by the SMARTS pattern “[N;$(*-a)]”.
