# Filter Files

There are two parameter files a user can provide if they would like to override or augment the default parameter sets. The first is the “filter file”. It provides acceptable limits for all of the physical properties and functional groups in the default filter. The second is the “newrule file”. If you have a filter you like, but would like to augment it with a set of additional rules, these can be added with a newrule file.

There are four types of statements that can occur in a filter file:

• physical property limits
• rules
• new rules
• selections

The statements should occur one-per-line in the filter file.

Note

If the appropriate line is not in the filter file, or the value is false, the respective measure will not be used in filtering and its value will not be included in any table-based output.
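
As a rough illustration of the four statement types, the sketch below classifies a line by its first field. It assumes, based on the examples later in this section, that `RULE`, `NEWRULE`, and `SELECT` are the leading keywords of the last three types and that every other non-blank line is a physical property limit. The function name `classify_statement` is hypothetical and is not part of the filter program.

```python
def classify_statement(line):
    """Return the statement type of one filter-file line (illustrative only)."""
    fields = line.split()
    if not fields:
        return "blank"
    keyword = fields[0]
    if keyword == "RULE":
        return "rule"
    if keyword == "NEWRULE":
        return "new rule"
    if keyword == "SELECT":
        return "selection"
    # Assumption: anything else is a physical property keyword.
    return "physical property limit"


example_lines = [
    'MIN_MOLWT 130 "Minimum molecular weight"',
    "RULE 0 acid_halide",
    "NEWRULE norbornane 1 C1CC2CCC1C2",
    "SELECT amine 1 1 [N;!$(*-*[!#6;!#1]);!$(*-a);!$(*=,#*)]",
]
for line in example_lines:
    print(classify_statement(line), "<-", line)
```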

## Physical Property Limits

There are a large number of physical property limits. They occur as three fields on a line. For example:

MIN_HETEROATOMS 2 "Minimum number of heteroatoms"


The first field is the property keyword, the second field is the value assigned to that keyword, and the third field is a brief informational message. There are a fixed number of physical property keywords. No additional physical property keywords can be added by the user. The current keywords and brief definitions of each are listed below.
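
The three-field layout can be read with a few lines of Python. This is a minimal sketch, not the filter program's own parser: it assumes the message is always enclosed in double quotes, and the helper names `convert_value` and `parse_property_line` are hypothetical.

```python
import shlex


def convert_value(text):
    """Best-effort conversion of the value field (an assumption of this sketch)."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text  # for example a solubility category such as "insoluble"


def parse_property_line(line):
    """Split one property line into (keyword, value, message)."""
    # shlex.split keeps the quoted message together as a single field.
    keyword, value, message = shlex.split(line)
    return keyword, convert_value(value), message


print(parse_property_line('MIN_HETEROATOMS 2 "Minimum number of heteroatoms"'))
```

Lines with a different number of fields, such as the elemental filters described later, are not handled by this sketch and raise a `ValueError`.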

Hint

The values listed below are those found in the default BlockBuster filter.

### Basic Properties

#### Molecular Weight

Isotopic molecular weight

MIN_MOLWT 130 "Minimum molecular weight"
MAX_MOLWT 781 "Maximum molecular weight"


#### Heavy Atom Count

Number of non-hydrogen atoms

MIN_NUM_HVY 9 "Minimum number of heavy atoms"
MAX_NUM_HVY 55 "Maximum number of heavy atoms"


#### Carbon Count

Number of carbons

MIN_CARBONS 3 "Minimum number of carbons"
MAX_CARBONS 41 "Maximum number of carbons"


#### Hetero-Count

Number of non-carbon and non-hydrogen atoms

MIN_HETEROATOMS 1 "Minimum number of heteroatoms"
MAX_HETEROATOMS 14 "Maximum number of heteroatoms"


#### Hetero-Atom to Carbon Ratio

Hetero-count/carbon-count

MIN_Het_C_Ratio 0.04 "Minimum heteroatom to carbon ratio"
MAX_Het_C_Ratio 4.0 "Maximum heteroatom to carbon ratio"
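
The heavy atom, carbon, and heteroatom counts and their ratio follow directly from the element composition. The sketch below computes them from a dictionary of element counts. The function name `basic_counts` is hypothetical, and treating a molecule with no carbons as having an infinite ratio is an assumption of this sketch, not documented behavior.

```python
def basic_counts(element_counts):
    """Return (heavy atoms, carbons, heteroatoms, hetero/carbon ratio)."""
    heavy = sum(n for el, n in element_counts.items() if el != "H")
    carbons = element_counts.get("C", 0)
    hetero = sum(n for el, n in element_counts.items() if el not in ("C", "H"))
    # Assumption: a molecule without carbons gets an infinite ratio.
    ratio = hetero / carbons if carbons else float("inf")
    return heavy, carbons, hetero, ratio


# Caffeine, C8H10N4O2
heavy, carbons, hetero, ratio = basic_counts({"C": 8, "H": 10, "N": 4, "O": 2})
print(heavy, carbons, hetero, ratio)  # 14 8 6 0.75
print(0.04 <= ratio <= 4.0)           # within the limits listed above
```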


#### Chiral Count

Number of chiral atoms

MIN_CHIRAL_CENTERS 0 "Minimum chiral centers"
MAX_CHIRAL_CENTERS 21 "Maximum chiral centers"


#### Hydrogen-bond Acceptors

Number of atoms which match any of the following:

• degree 2, aromatic, non-positive nitrogens
• electron rich or negative, valence less than 4, non-aromatic nitrogens
• negatively charged or not electron withdrawn, neutral, non-aromatic oxygens
• degree 1, double bonded, electron rich, non-aromatic sulfur

This definition is from the work of Mills and Dean ([MillsDean-1996]) and also the book by Jeffrey ([Jeffrey-1997]).

MIN_HBOND_ACCEPTORS 0 "Minimum number of hydrogen-bond acceptors"
MAX_HBOND_ACCEPTORS 13 "Maximum number of hydrogen-bond acceptors"


#### Hydrogen-bond Donors

Number of hydrogen atoms on nitrogen, oxygen, or sulfur atoms. This definition is from the work of Mills and Dean ([MillsDean-1996]) and also the book by Jeffrey ([Jeffrey-1997]).

MIN_HBOND_DONORS 0 "Minimum number of hydrogen-bond donors"
MAX_HBOND_DONORS 9 "Maximum number of hydrogen-bond donors"


#### Lipinski Acceptors

Number of nitrogens or oxygens. This definition is from the work of Lipinski ([Lipinski-1997]).

MIN_LIPINSKI_ACCEPTORS 1 "Minimum number of oxygen & nitrogen atoms"
MAX_LIPINSKI_ACCEPTORS 14 "Maximum number of oxygen & nitrogen atoms"


#### Lipinski Donors

Number of nitrogens and oxygens with at least one hydrogen attached. This definition is from the work of Lipinski ([Lipinski-1997]).

MIN_LIPINSKI_DONORS 0 "Minimum number O & N atoms with hydrogens"
MAX_LIPINSKI_DONORS 6 "Maximum number O & N atoms with hydrogens"


#### Halide Fraction

Fraction of molecular weight contributed by halides

MIN_HALIDE_FRACTION 0.0 "Minimum Halide Fraction"
MAX_HALIDE_FRACTION 0.66 "Maximum Halide Fraction"
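
To make the halide fraction concrete, the sketch below computes it from element counts. It uses standard average atomic weights as a simplification; the filter program may compute the weight differently, so the result should be read as an approximation. The names `ATOMIC_WEIGHT`, `HALIDES`, and `halide_fraction` are hypothetical.

```python
# Standard average atomic weights (an assumption of this sketch).
ATOMIC_WEIGHT = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
    "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904, "I": 126.904,
}
HALIDES = {"F", "Cl", "Br", "I"}


def halide_fraction(element_counts):
    """Fraction of the molecular weight contributed by F, Cl, Br, and I."""
    total = sum(ATOMIC_WEIGHT[el] * n for el, n in element_counts.items())
    halide = sum(ATOMIC_WEIGHT[el] * n
                 for el, n in element_counts.items() if el in HALIDES)
    return halide / total


chloroform = {"C": 1, "H": 1, "Cl": 3}
fraction = halide_fraction(chloroform)
print(round(fraction, 3))  # 0.891
print(fraction <= 0.66)    # False: exceeds the MAX_HALIDE_FRACTION shown above
```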


#### Formal Count

Number of atoms with a formal charge (excludes dative)

MIN_COUNT_FORMAL_CRG 0 "Minimum number formal charges"
MAX_COUNT_FORMAL_CRG 4 "Maximum number of formal charges"


#### Formal Sum

Total formal charge

MIN_SUM_FORMAL_CRG -2 "Minimum sum of formal charges"
MAX_SUM_FORMAL_CRG 2 "Maximum sum of formal charges"
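
The formal count and the formal sum measure different things, which a zwitterion shows clearly. The sketch below takes a list of per-atom formal charges. It does not implement the exclusion of dative bonds mentioned above, and the function name `formal_charge_summary` is hypothetical.

```python
def formal_charge_summary(formal_charges):
    """Return (number of charged atoms, total formal charge)."""
    count = sum(1 for q in formal_charges if q != 0)
    total = sum(formal_charges)
    return count, total


# Zwitterionic glycine: one +1 nitrogen, one -1 oxygen, other heavy atoms neutral.
count, total = formal_charge_summary([+1, 0, 0, 0, -1])
print(count, total)  # 2 0
```

A zwitterion therefore counts twice toward the formal count limits while leaving the formal sum at zero.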


#### Connected Non-Ring

Considers sets of contiguous (bonded) non-ring atoms

MIN_CON_NON_RING 0 "Minimum number of connected non-ring atoms"
MAX_CON_NON_RING 19 "Maximum number of connected non-ring atoms"


#### Unbranched Chains

The size of the longest chain of either heavy atoms or of all carbons. An unbranched atom is an atom with at most two connections to other heavy atoms and is not in a ring. A set of unbranched atoms which are connected together form a chain. A molecule may contain multiple chains which are isolated from each other by non-chain atoms (e.g. ring or branched atoms). The longest chain of heavy atoms and the longest chain of carbons are identified, and the MIN and MAX parameter filters are applied.

MIN_UNBRANCHED 1 "Minimum number of connected unbranched non-ring atoms"
MAX_UNBRANCHED 13 "Maximum number of connected unbranched non-ring atoms"
MIN_UNBRANCHED_C 0 "Minimum number of connected unbranched non-ring carbon"
MAX_UNBRANCHED_C 6 "Maximum number of connected unbranched non-ring carbon"
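
The chain definition above can be sketched as a connected-component search over unbranched atoms. This is an illustration under stated assumptions, not the program's implementation: the molecule is given as a heavy-atom adjacency dictionary, ring membership is supplied by the caller, and the carbon chain is taken to be a chain made only of unbranched carbons. The function name `longest_unbranched_chain` is hypothetical.

```python
from collections import deque


def longest_unbranched_chain(neighbors, ring_atoms, elements, carbon_only=False):
    """Size of the largest connected set of unbranched, non-ring atoms."""
    def is_chain_atom(atom):
        # Unbranched: not in a ring and at most two heavy-atom neighbors.
        if atom in ring_atoms or len(neighbors[atom]) > 2:
            return False
        return elements[atom] == "C" if carbon_only else True

    chain_atoms = {a for a in neighbors if is_chain_atom(a)}
    seen, best = set(), 0
    for start in chain_atoms:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first walk along one chain
            atom = queue.popleft()
            size += 1
            for nbr in neighbors[atom]:
                if nbr in chain_atoms and nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best


# Propoxybenzene: ring atoms 0-5, then O(6)-C(7)-C(8)-C(9).
neighbors = {0: [1, 5, 6], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5],
             5: [4, 0], 6: [0, 7], 7: [6, 8], 8: [7, 9], 9: [8]}
elements = {i: "C" for i in range(10)}
elements[6] = "O"
ring_atoms = {0, 1, 2, 3, 4, 5}

print(longest_unbranched_chain(neighbors, ring_atoms, elements))        # 4
print(longest_unbranched_chain(neighbors, ring_atoms, elements, True))  # 3
```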


#### Total Functional Group Count

Total number of functional groups. Ring systems are not counted as functional groups. Degree 1 heteroatoms, particularly those with double bonds or dative bonds, are considered part of ring systems and do not count as functional groups.

MIN_FCNGRP 0 "Minimum number of functional groups"
MAX_FCNGRP 7 "Maximum number of functional groups"


Note

This is different from the functional group rules.

#### Ring Systems

Number of ring systems (contiguous systems of ring atoms and bonds)

MIN_RING_SYS 0 "Minimum number of ring systems"
MAX_RING_SYS 5 "Maximum number of ring systems"


#### Ring Size

Maximum size of any single ring system

MIN_RING_SIZE 0 "Minimum atoms in any ring system"
MAX_RING_SIZE 20 "Maximum atoms in any ring system"


#### Rotor Count

Number of rotatable bonds. Allows optional adjustment for aliphatic rings following the method of [Oprea-2000].

MIN_ROT_BONDS 0 "Minimum number of rotatable bonds"
MAX_ROT_BONDS 16 "Maximum number of rotatable bonds"
ADJUST_ROT_FOR_RING true "BOOLEAN for whether to estimate degrees of freedom in rings"


#### Rigid Count

Number of rigid bonds (non-rotatable bonds)

MIN_RIGID_BONDS 4 "Minimum number of rigid bonds"
MAX_RIGID_BONDS 55 "Maximum number of rigid bonds"


#### Unspecified Atom Stereo

Number of unspecified atom stereos

MIN_UNSPECIFIED_ATOM_STEREOS 0 "Minimum number of unspecified atom stereos"
MAX_UNSPECIFIED_ATOM_STEREOS 2 "Maximum number of unspecified atom stereos"


Note

MIN_UNSPECIFIED_ATOM_STEREOS and MAX_UNSPECIFIED_ATOM_STEREOS are not used for filtering in the default filters.

#### Unspecified Bond Stereo

Number of unspecified bond stereos

MIN_UNSPECIFIED_BOND_STEREOS 0 "Minimum number of unspecified bond stereos"
MAX_UNSPECIFIED_BOND_STEREOS 2 "Maximum number of unspecified bond stereos"


Note

MIN_UNSPECIFIED_BOND_STEREOS and MAX_UNSPECIFIED_BOND_STEREOS are not used for filtering in the default filters.

#### Atom Type Checks

Check the validity of atom charges, valences or MMFF atom types for the entire molecule.

TYPECHECK     false "Screen for unusual valences or charges"
MMFFTYPECHECK false "Screen for atoms with unknown MMFF atom types"


### LogP

The logP calculation is a derivative of the published XLogP algorithm [Wang-1997-2], but reparameterized without the dependence on 3D coordinates or the SYBYL/Mol2 aromaticity model.

#### XLogP

Calculated LogP

MIN_XLOGP -3.0 "Minimum XLogP"
MAX_XLOGP 6.85 "Maximum XLogP"


### Solubility

The solubility predictions are based on using the atom-types from the XLogP algorithm, [Wang-1997-2], and reparameterizing them based on available solubility data. Rather than a quantitative cutoff, solubility uses categories. The 6 allowable categories are:

1. insoluble
2. poorly
3. moderately
4. soluble
5. very
6. highly

These categories are keywords used in the filter files as follows.

#### Solubility

Calculated solubility class

MIN_SOLUBILITY insoluble "Minimum solubility"
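
Because solubility is categorical, a minimum can be checked by comparing positions in the category list. The sketch below assumes the six categories are ordered from least to most soluble in the order listed above. The names `SOLUBILITY_CATEGORIES` and `passes_min_solubility` are hypothetical.

```python
# Assumption: the list order runs from least to most soluble.
SOLUBILITY_CATEGORIES = [
    "insoluble", "poorly", "moderately", "soluble", "very", "highly",
]


def passes_min_solubility(predicted, minimum):
    """True if the predicted category is at least the minimum category."""
    rank = SOLUBILITY_CATEGORIES.index
    return rank(predicted) >= rank(minimum)


print(passes_min_solubility("poorly", "insoluble"))   # True
print(passes_min_solubility("poorly", "moderately"))  # False
```

Under this reading, a minimum of `insoluble` accepts every category, so the line shown above places no effective restriction on solubility.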


### Pharmacokinetic Predictors

Several secondary filters that are built upon published combinations of simpler properties are available.

Note

Not all of these properties are used for filtering in the default filters; for example, GSK_VEBER and PHARMACOPIA are set to false in the values listed below.

#### Anionic Carbon Count

Number of anionic carbons

MIN_ANION_C 0 "Minimum number of anionic carbons"
MAX_ANION_C 2 "Maximum number of anionic carbons"


Note

MIN_ANION_C and MAX_ANION_C are not used for filtering in the default filters.

#### Lipinski Violations

Number of allowable Lipinski violations. A single Lipinski violation is considered acceptable. The published work, [Lipinski-1997], allows compounds to pass with a single violation but not multiple violations.

MAX_LIPINSKI 3 "Maximum number of Lipinski violations"


See also: the Lipinski theory section in the Molecular Properties and Predictors chapter.
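
For orientation, the sketch below counts violations using the four thresholds from the published rule of five: molecular weight above 500, logP above 5, more than 5 donors, and more than 10 acceptors. Whether the filter program feeds these checks with XLogP and the Lipinski donor and acceptor counts defined earlier is an assumption of this sketch, and the function name `lipinski_violations` is hypothetical.

```python
def lipinski_violations(molwt, logp, donors, acceptors):
    """Count rule-of-five violations (published thresholds)."""
    checks = [
        molwt > 500,     # molecular weight
        logp > 5,        # lipophilicity
        donors > 5,      # N and O atoms bearing hydrogens
        acceptors > 10,  # N and O atoms
    ]
    return sum(checks)


def passes_max_lipinski(violations, max_lipinski):
    """Apply a MAX_LIPINSKI-style upper limit to a violation count."""
    return violations <= max_lipinski


n = lipinski_violations(molwt=620.0, logp=5.6, donors=2, acceptors=5)
print(n)                          # 2
print(passes_max_lipinski(n, 1))  # False with a limit of one violation
```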

#### PSA

Topological polar surface area as defined by Peter Ertl [Ertl-2000]. Including the surface area of phosphorus and sulfur atoms is optional.

PSA_USE_SandP false "Count S and P as polar atoms"
MIN_2D_PSA 0.0 "Minimum 2-Dimensional (SMILES) Polar Surface Area"
MAX_2D_PSA 205.0 "Maximum 2-Dimensional (SMILES) Polar Surface Area"


See also: the PSA theory section in the Molecular Properties and Predictors chapter.

#### GSK/Veber

Veber’s measure of bioavailability (PSA > 140 or rotatable bonds > 10) [Veber-2002].

GSK_VEBER false "PSA>140 or >10 rot bonds"


#### Abbott/Martin

Yvonne Martin’s Abbott Bioavailability Score [Martin-2005]. This is reported as the probability that bioavailability (F) exceeds 10% in rats.

MIN_ABS 0.11 "Minimum probability F>10% in rats"


#### Pharmacopia/Egan

Egan egg measure of bioavailability (LogP > 5.88 or PSA > 131.6) [Egan-2000].

PHARMACOPIA false "LogP > 5.88 or PSA > 131.6"
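
The two threshold-based predictors in this section, GSK_VEBER and PHARMACOPIA, reduce to simple comparisons once PSA, logP, and the rotatable bond count are known. The sketch below encodes the conditions exactly as written in the keyword descriptions and treats a molecule that meets a condition as flagged. The function names `flagged_by_veber` and `flagged_by_egan` are hypothetical.

```python
def flagged_by_veber(psa, rot_bonds):
    """GSK/Veber condition: PSA > 140 or more than 10 rotatable bonds."""
    return psa > 140 or rot_bonds > 10


def flagged_by_egan(logp, psa):
    """Pharmacopia/Egan condition: LogP > 5.88 or PSA > 131.6."""
    return logp > 5.88 or psa > 131.6


# A molecule with PSA 135.0, logP 3.2, and 8 rotatable bonds:
print(flagged_by_veber(psa=135.0, rot_bonds=8))  # False
print(flagged_by_egan(logp=3.2, psa=135.0))      # True
```

The example shows that the two measures can disagree for the same molecule because their PSA thresholds differ.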


### Aggregators

Aggregators are small molecules that can interfere with assay results by sequestering protein in an aggregation of small molecules in solution. They appear to have activity in many assays, but in fact are usually not specific inhibitors of the protein in question. The filter includes two measures of whether a molecule is one of the aggregators defined by Shoichet et al. [McGovern-2003] [Seidler-2003]. The first measure, AGGREGATORS, is whether the molecule is an exact match to one of the approximately 400 published aggregators. The second measure, PRED_AGG, is whether the molecule hits in Shoichet’s QSAR model for predicting aggregators.

#### Aggregators

Whether a compound is known or predicted to aggregate in concentrations common in virtual screening.

AGGREGATORS true "Eliminate known aggregators"
PRED_AGG false "Eliminate predicted aggregators"


### Elemental Filters

The elemental filters are applied in this order:

1. Test for the existence of any of the metals in the ELIMINATE_METALS filter in the molecule.
2. Remove salts by stripping away all the disconnected components except for the largest.
3. Test to make sure only atoms specified in ALLOWED_ELEMENTS filter are in the molecule.

The format of the two elemental filter statements is the keyword followed by a comma-delimited list of atomic symbols.

#### Eliminate Metals

Any molecule containing one of the atoms listed in ELIMINATE_METALS fails to pass the filter.

ELIMINATE_METALS Sc,Ti,V,Cr,Mn,Fe,Co,Ni,Cu,Zn,Y,Zr,Nb,Mo,Tc,Ru,Rh,Pd,Ag,Cd


#### Allowed Elements

Molecules with atoms other than those specified by ALLOWED_ELEMENTS fail to pass the filter.

ALLOWED_ELEMENTS H,C,N,O,F,P,S,Cl,Br,I
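
The order of the three elemental steps matters, as the sketch below illustrates. A molecule is represented as a list of disconnected components, each a list of element symbols. Taking "largest" to mean the component with the most atoms is an assumption of this sketch, and the function name `elemental_filter` is hypothetical.

```python
ELIMINATE_METALS = set(
    "Sc,Ti,V,Cr,Mn,Fe,Co,Ni,Cu,Zn,Y,Zr,Nb,Mo,Tc,Ru,Rh,Pd,Ag,Cd".split(","))
ALLOWED_ELEMENTS = set("H,C,N,O,F,P,S,Cl,Br,I".split(","))


def elemental_filter(components):
    """Return the surviving component, or None if the molecule fails."""
    # Step 1: reject if any listed metal occurs anywhere in the molecule.
    if any(el in ELIMINATE_METALS for comp in components for el in comp):
        return None
    # Step 2: strip salts by keeping only the largest component
    # (assumption: largest by atom count).
    largest = max(components, key=len)
    # Step 3: every remaining atom must be an allowed element.
    if any(el not in ALLOWED_ELEMENTS for el in largest):
        return None
    return largest


acetate = ["C", "C", "O", "O", "H", "H", "H"]
print(elemental_filter([acetate, ["Na"]]) is not None)  # True: Na is stripped as a salt
print(elemental_filter([acetate, ["Zn"]]) is not None)  # False: Zn is caught in step 1
```

Sodium is neither in the metals list nor in the allowed list, yet sodium acetate passes because the counterion is removed before the allowed-element test. Zinc acetate fails because the metal test runs before salt stripping.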


## Functional Group Rules

Rule statements set the maximum number of occurrences of the specified functional group that is allowed in a molecule.

The first field of a rule statement is the word RULE in all capital letters. The second field is a number indicating the maximum number of the group allowed in a molecule. The third field is the functional group keyword. Functional-group keywords are case sensitive.

RULE 0 acid_halide
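
A rule statement can be applied once the number of matches for each functional group is known. The sketch below takes those match counts as input, since substructure matching itself is outside its scope. The function names `parse_rule` and `failed_rules` are hypothetical.

```python
def parse_rule(line):
    """Split 'RULE <max> <group>' into (group name, maximum allowed)."""
    keyword, limit, name = line.split()
    if keyword != "RULE":
        raise ValueError("not a RULE statement: " + line)
    return name, int(limit)


def failed_rules(rule_lines, match_counts):
    """Names of groups whose match count exceeds the rule's maximum."""
    failures = []
    for line in rule_lines:
        name, limit = parse_rule(line)
        # Groups absent from match_counts are treated as zero matches.
        if match_counts.get(name, 0) > limit:
            failures.append(name)
    return failures


rules = ["RULE 0 acid_halide"]
print(failed_rules(rules, {"acid_halide": 1}))  # ['acid_halide']
print(failed_rules(rules, {}))                  # []
```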


The following is a list of functional groups which filter recognizes by default. Three example matches are provided with the atoms that correspond to each other highlighted.

Note

Due to the highly complex nature of the patterns, in particular recursive SMARTS, it is not possible to fully highlight every atom that was included as part of the match.

## New Rules

New rules specify additional functional groups or substructures that may be used. They must specify a substructure definition in the form of a SMARTS in addition to the substructure name and maximum limit. For example:

NEWRULE norbornane 1 C1CC2CCC1C2


The first field is the NEWRULE keyword. The second field defines the name associated with the substructure (primarily for logging purposes). The third field indicates the maximum number of the substructure that can be allowed. The fourth field is the SMARTS string for the substructure, norbornane in this case. This example rule would indicate that molecules with a single norbornane substructure would be allowable, but that those with 2 or more norbornanes would be eliminated.

A new rule whose name is identical to that of an original rule takes precedence over the original rule.
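
The field order of a new rule and the precedence of new rules over original rules can be sketched as follows. Note that the name comes before the limit here, unlike in a RULE statement. The function names `parse_newrule` and `merge_rules` are hypothetical, and the `acid_halide` SMARTS in the example is an illustrative pattern, not the program's built-in definition.

```python
def parse_newrule(line):
    """Split 'NEWRULE <name> <max> <SMARTS>' into (name, max, smarts)."""
    keyword, name, limit, smarts = line.split(None, 3)
    if keyword != "NEWRULE":
        raise ValueError("not a NEWRULE statement: " + line)
    return name, int(limit), smarts.strip()


def merge_rules(original_limits, newrule_lines):
    """Combine original limits with new rules; new rules win on a name clash."""
    # Original rules carry no SMARTS here because their patterns are built in.
    merged = {name: (limit, None) for name, limit in original_limits.items()}
    for line in newrule_lines:
        name, limit, smarts = parse_newrule(line)
        merged[name] = (limit, smarts)
    return merged


merged = merge_rules(
    {"acid_halide": 0},
    [
        "NEWRULE norbornane 1 C1CC2CCC1C2",
        "NEWRULE acid_halide 1 C(=O)[Cl,Br,I]",  # hypothetical override
    ],
)
print(merged["norbornane"])   # (1, 'C1CC2CCC1C2')
print(merged["acid_halide"])  # (1, 'C(=O)[Cl,Br,I]')
```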

## Selection Statements

The select statement allows a filter file to specify the required number of substructures in order to be able to pass the filter. These statements are similar to new rules except that they list a required range for passing the filter rather than the range for failing to pass the filter. For example:

SELECT amine 1 1 [N;!$(*-*[!#6;!#1]);!$(*-a);!$(*=,#*)]


The first field is the SELECT keyword. The second field indicates the name for the selection (again for logging purposes). The third field is the minimum number of substructures required to be in the molecule. The fourth field is the maximum number of substructures allowed in the molecule. The fifth field is the substructure defined by a SMARTS pattern. The example requires that molecules contain exactly one amine. Currently, only a single SELECT statement is allowed in the filter file. Any complex boolean substructure statements can be incorporated directly into the SMARTS. If multiple SELECT statements occur in a filter file, only the final one will be applied.
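
The selection semantics described above can be sketched in a few lines: parse the five fields, keep only the final SELECT statement, and require the match count to fall within the inclusive range. Substructure matching is not performed here; the match count is supplied by the caller. The function names `parse_select`, `effective_select`, and `passes_select` are hypothetical, and the `halide` selection in the example is an illustrative line.

```python
def parse_select(line):
    """Split 'SELECT <name> <min> <max> <SMARTS>' into its four values."""
    keyword, name, minimum, maximum, smarts = line.split(None, 4)
    if keyword != "SELECT":
        raise ValueError("not a SELECT statement: " + line)
    return name, int(minimum), int(maximum), smarts.strip()


def effective_select(lines):
    """Return the final SELECT statement in the file, or None if absent."""
    selects = [parse_select(l) for l in lines if l.split()[:1] == ["SELECT"]]
    return selects[-1] if selects else None


def passes_select(match_count, minimum, maximum):
    """True if the number of matches lies in the required inclusive range."""
    return minimum <= match_count <= maximum


filter_lines = [
    "SELECT halide 0 2 [F,Cl,Br,I]",  # hypothetical; superseded by the next line
    "SELECT amine 1 1 [N;!$(*-*[!#6;!#1]);!$(*-a);!$(*=,#*)]",
]
name, minimum, maximum, smarts = effective_select(filter_lines)
print(name)                                 # amine
print(passes_select(1, minimum, maximum))   # True: exactly one amine
print(passes_select(2, minimum, maximum))   # False: two amines
```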