Filter Files¶
There are two parameter files a user can provide if they would like to override or augment the default parameter sets. The first is the “filter file”. It provides acceptable limits for all of the physical properties and functional groups in the default filter. The second is the “newrule file”. If you have a filter you like, but would like to augment it with a set of additional rules, these can be added with a newrule file.
There are four types of statements that can occur in a filter file:
physical property limits
rules
new rules
selections
The statements should occur one-per-line in the filter file.
Note
If the appropriate line is not in the filter file, or the value is false, the respective measure will not be used in filtering and its value will not be included in any table-based output.
Physical Property Limits¶
There are a large number of physical property limits. They occur as three fields on a line. For example:
MIN_HETEROATOMS 2 "Minimum number of heteroatoms"
The first field is the property keyword, the second field is the value assigned to that keyword, and the third field is a brief informational message. There are a fixed number of physical property keywords. No additional physical property keywords can be added by the user. The current keywords and brief definitions of each are listed below.
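The three-field layout can be read with a few lines of Python. This is a minimal sketch, not the parser that filter itself uses: the function names `parse_property_line` and `convert_value` are hypothetical, and the value conversion (boolean, integer, float, otherwise a plain string) is an assumption based on the kinds of values shown in this section.

```python
import shlex


def convert_value(raw):
    """Best-effort conversion of the value field (assumed value kinds)."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw  # e.g. a category keyword


def parse_property_line(line):
    """Split one physical-property line into (keyword, value, message)."""
    fields = shlex.split(line)  # keeps the double-quoted message as one field
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    keyword, raw_value, message = fields
    return keyword, convert_value(raw_value), message


print(parse_property_line('MIN_HETEROATOMS 2 "Minimum number of heteroatoms"'))
```

The sketch only covers physical property lines; other statement types, such as the elemental filters, use a different number of fields.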
Hint
The values listed below are those found in the default BlockBuster filter.
Basic Properties¶
Molecular Weight¶
Isotopic molecular weight
MIN_MOLWT 130 "Minimum molecular weight"
MAX_MOLWT 781 "Maximum molecular weight"
Heavy Atom Count¶
Number of non-hydrogen atoms
MIN_NUM_HVY 9 "Minimum number of heavy atoms"
MAX_NUM_HVY 55 "Maximum number of heavy atoms"
Carbon Count¶
Number of carbons
MIN_CARBONS 3 "Minimum number of carbons"
MAX_CARBONS 41 "Maximum number of carbons"
Hetero-Count¶
Number of non-carbon and non-hydrogen atoms
MIN_HETEROATOMS 1 "Minimum number of heteroatoms"
MAX_HETEROATOMS 14 "Maximum number of heteroatoms"
Hetero-Atom to Carbon Ratio¶
Hetero-count/carbon-count
MIN_Het_C_Ratio 0.04 "Minimum heteroatom to carbon ratio"
MAX_Het_C_Ratio 4.0 "Maximum heteroatom to carbon ratio"
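The count-based properties above follow directly from a molecular formula. The sketch below derives them from a dictionary of element counts and compares them with the default limits listed in this section. The helper names are hypothetical, treating both limits as inclusive is an assumption, and so is returning an infinite ratio for a molecule with no carbons.

```python
def basic_counts(element_counts):
    """Derive heavy-atom, carbon and heteroatom counts from element counts."""
    heavy = sum(n for symbol, n in element_counts.items() if symbol != "H")
    carbons = element_counts.get("C", 0)
    hetero = heavy - carbons  # non-carbon, non-hydrogen atoms
    # Assumption: a carbon-free molecule gets an infinite ratio.
    ratio = hetero / carbons if carbons else float("inf")
    return {
        "NUM_HVY": heavy,
        "CARBONS": carbons,
        "HETEROATOMS": hetero,
        "Het_C_Ratio": ratio,
    }


def within_limits(values, limits):
    """True if every value lies inside its MIN_/MAX_ pair (assumed inclusive).

    A missing limit is simply not applied.
    """
    for name, value in values.items():
        low = limits.get("MIN_" + name)
        high = limits.get("MAX_" + name)
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
    return True


# Default values copied from the keyword listings above.
LIMITS = {
    "MIN_NUM_HVY": 9, "MAX_NUM_HVY": 55,
    "MIN_CARBONS": 3, "MAX_CARBONS": 41,
    "MIN_HETEROATOMS": 1, "MAX_HETEROATOMS": 14,
    "MIN_Het_C_Ratio": 0.04, "MAX_Het_C_Ratio": 4.0,
}

aspirin = {"C": 9, "H": 8, "O": 4}
print(basic_counts(aspirin), within_limits(basic_counts(aspirin), LIMITS))
```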
Chiral Count¶
Number of chiral atoms
MIN_CHIRAL_CENTERS 0 "Minimum chiral centers"
MAX_CHIRAL_CENTERS 21 "Maximum chiral centers"
Hydrogen-bond Acceptors¶
Number of atoms which match any of the following:
degree 2, aromatic, non-positive nitrogens
electron rich or negative, valence less than 4, non-aromatic nitrogens
negatively charged or not electron withdrawn and neutral oxygens
degree 1, double bonded, electron rich sulfur
MIN_HBOND_ACCEPTORS 0 "Minimum number of hydrogen-bond acceptors"
MAX_HBOND_ACCEPTORS 13 "Maximum number of hydrogen-bond acceptors"
Hydrogen-bond Donors¶
Number of hydrogen atoms on nitrogen, oxygen, or sulfur atoms
MIN_HBOND_DONORS 0 "Minimum number of hydrogen-bond donors"
MAX_HBOND_DONORS 9 "Maximum number of hydrogen-bond donors"
Lipinski Acceptors¶
Number of nitrogens or oxygens
MIN_LIPINSKI_ACCEPTORS 1 "Minimum number of oxygen & nitrogen atoms"
MAX_LIPINSKI_ACCEPTORS 14 "Maximum number of oxygen & nitrogen atoms"
Lipinski Donors¶
Number of nitrogens and oxygens with at least one hydrogen attached
MIN_LIPINSKI_DONORS 0 "Minimum number O & N atoms with hydrogens"
MAX_LIPINSKI_DONORS 6 "Maximum number O & N atoms with hydrogens"
Halide Fraction¶
Fraction of molecular weight contributed by halides
MIN_HALIDE_FRACTION 0.0 "Minimum Halide Fraction"
MAX_HALIDE_FRACTION 0.66 "Maximum Halide Fraction"
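As an illustration of the quantity being limited, the sketch below computes a halide fraction from element counts. It assumes standard average atomic masses for a handful of elements and assumes that "halides" means F, Cl, Br and I; the masses filter actually uses are not stated here, so treat the numbers as approximate.

```python
# Assumption: standard average atomic masses, rounded.
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
    "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904, "I": 126.904,
}
HALOGENS = {"F", "Cl", "Br", "I"}  # assumed meaning of "halides"


def halide_fraction(element_counts):
    """Fraction of the total mass that comes from halogen atoms."""
    total = sum(ATOMIC_MASS[s] * n for s, n in element_counts.items())
    halide = sum(
        ATOMIC_MASS[s] * n for s, n in element_counts.items() if s in HALOGENS
    )
    return halide / total if total else 0.0


chlorobenzene = {"C": 6, "H": 5, "Cl": 1}
carbon_tetrachloride = {"C": 1, "Cl": 4}
print(round(halide_fraction(chlorobenzene), 3))
print(halide_fraction(carbon_tetrachloride) > 0.66)  # above the default maximum
```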
Formal Count¶
Number of atoms with a formal charge (excludes dative)
MIN_COUNT_FORMAL_CRG 0 "Minimum number formal charges"
MAX_COUNT_FORMAL_CRG 4 "Maximum number of formal charges"
Formal Sum¶
Total formal charge
MIN_SUM_FORMAL_CRG -2 "Minimum sum of formal charges"
MAX_SUM_FORMAL_CRG 2 "Maximum sum of formal charges"
Connected Non-Ring¶
Considers sets of contiguous (bonded) non-ring atoms
MIN_CON_NON_RING 0 "Minimum number of connected non-ring atoms"
MAX_CON_NON_RING 19 "Maximum number of connected non-ring atoms"
Unbranched Chains¶
The size of unbranched non-ring chains
MIN_UNBRANCHED 1 "Minimum number of connected unbranched non-ring atoms"
MAX_UNBRANCHED 13 "Maximum number of connected unbranched non-ring atoms"
Total Functional Group Count¶
Total number of functional groups. Ring systems are not counted as functional groups. Degree 1 heteroatoms, particularly those with double bonds or dative bonds, are considered part of ring systems and do not count as a functional group.
MIN_FCNGRP 0 "Minimum number of functional groups"
MAX_FCNGRP 7 "Maximum number of functional groups"
Note
This is different than the functional group rules.
Ring Systems¶
Number of ring systems (contiguous systems of ring atoms and bonds)
MIN_RING_SYS 0 "Minimum number of ring systems"
MAX_RING_SYS 5 "Maximum number of ring systems"
Ring Size¶
Maximum size of any single ring system
MIN_RING_SIZE 0 "Minimum atoms in any ring system"
MAX_RING_SIZE 20 "Maximum atoms in any ring system"
Rotor Count¶
Number of rotatable bonds. Allows optional adjustment for aliphatic rings following the method of [Oprea-2000].
MIN_ROT_BONDS 0 "Minimum number of rotatable bonds"
MAX_ROT_BONDS 16 "Maximum number of rotatable bonds"
ADJUST_ROT_FOR_RING true "BOOLEAN for whether to estimate degrees of freedom in rings"
Rigid Count¶
Number of rigid bonds (non-rotatable bonds)
MIN_RIGID_BONDS 4 "Minimum number of rigid bonds"
MAX_RIGID_BONDS 55 "Maximum number of rigid bonds"
LogP¶
The logP calculation is a derivative of the published XLOGP algorithm [Wang-R-1997] but is reparameterized without the dependence on 3D coordinates or the SYBYL/Mol2 aromaticity model.
Solubility¶
The solubility predictions use the atom types from the XLOGP algorithm [Wang-R-1997], reparameterized against available solubility data. Rather than a quantitative cutoff, solubility uses categories. The six allowable categories are:
insoluble
poorly
moderately
soluble
very
highly
These categories are keywords used in the filter files as follows.
Pharmacokinetic Predictors¶
Several secondary filters that are built upon published combinations of simpler properties are available.
Note
All of these properties are used for filtering in the default filters.
Lipinski Violations¶
Number of allowable Lipinski violations. A single Lipinski violation is considered acceptable. The published work, [Lipinski-1997], allows compounds to pass with a single violation but not multiple violations.
MAX_LIPINSKI 3 "Maximum number of Lipinski violations"
See also
The Lipinski theory section in the Molecular Properties and Predictors chapter.
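A violation count of this kind can be sketched as follows. The thresholds are the commonly stated rule-of-five cutoffs (molecular weight above 500, logP above 5, more than 5 donors, more than 10 acceptors). Which donor, acceptor and logP definitions filter feeds into the count is not specified in this section, so the inputs are treated as given, and the function names are hypothetical.

```python
def lipinski_violations(mol_wt, logp, donors, acceptors):
    """Count rule-of-five violations for precomputed property values."""
    checks = (
        mol_wt > 500,    # molecular weight
        logp > 5,        # lipophilicity
        donors > 5,      # hydrogen-bond donors
        acceptors > 10,  # hydrogen-bond acceptors
    )
    return sum(checks)


def passes_lipinski(violations, max_lipinski):
    """Apply a MAX_LIPINSKI-style limit (assumed to be inclusive)."""
    return violations <= max_lipinski


count = lipinski_violations(mol_wt=650.0, logp=6.1, donors=2, acceptors=8)
print(count, passes_lipinski(count, max_lipinski=1))
```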
PSA¶
Peter Ertl’s topological polar surface area [Ertl-2000]. Including the phosphorus and sulfur area is optional.
PSA_USE_SandP false "Count S and P as polar atoms"
MIN_2D_PSA 0.0 "Minimum 2-Dimensional (SMILES) Polar Surface Area"
MAX_2D_PSA 205.0 "Maximum 2-Dimensional (SMILES) Polar Surface Area"
See also
The PSA theory section in the Molecular Properties and Predictors chapter.
GSK/Veber¶
Veber’s measure of bioavailability (PSA > 140 or rotatable bonds > 10) [Veber-2002].
GSK_VEBER false "PSA>140 or >10 rot bonds"
Abbott/Martin¶
Yvonne Martin’s Abbott Bioavailability Score. This is reported as a probability that F>10% in rats [Martin-2005].
MIN_ABS 0.11 "Minimum probability F>10% in rats"
Pharmacopia/Egan¶
Egan egg measure of bioavailability (LogP > 5.88 or PSA > 131.6) [Egan-2000].
PHARMACOPIA false "LogP > 5.88 or PSA > 131.6"
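Both the GSK/Veber and Pharmacopia/Egan measures are simple threshold tests on properties computed elsewhere. A minimal sketch, assuming the PSA, logP and rotatable-bond values are already available and using hypothetical function names:

```python
def flagged_by_veber(psa, rot_bonds):
    """True when PSA > 140 or there are more than 10 rotatable bonds."""
    return psa > 140 or rot_bonds > 10


def flagged_by_egan(logp, psa):
    """True when LogP > 5.88 or PSA > 131.6."""
    return logp > 5.88 or psa > 131.6


print(flagged_by_veber(psa=95.0, rot_bonds=12))
print(flagged_by_egan(logp=3.2, psa=120.0))
```

Each predicate returns True for a molecule outside the published thresholds. Whether such a molecule is then removed depends on the corresponding keyword being set to true in the filter file.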
Aggregators¶
Aggregators are small molecules that can interfere with assay results
by sequestering protein in an aggregation of small molecules in
solution. They appear to have activity in many assays, but in fact are
usually not specific inhibitors of the protein in question. Includes
two measures of whether a molecule is one of the aggregators defined
by Shoichet et. al. [McGovern-2003] [Seidler-2003] The first
measure, AGGREGATORS
, is whether the molecule is an exact match to
one of the approximately 400 published aggregators. The second
measure, PRED_AGG
, is whether the molecule hits in Shoichet’s QSAR
model for predicting aggregators.
Aggregators¶
Whether a compound is known or predicted to aggregate in concentrations common in virtual screening.
AGGREGATORS true "Eliminate known aggregators"
PRED_AGG false "Eliminate predicted aggregators"
Elemental Filters¶
The elemental filters are applied in this order:
Test for the existence of any of the metals in the ELIMINATE_METALS filter in the molecule.
Remove salts by stripping away all the disconnected components except for the largest.
Test to make sure only atoms specified in the ALLOWED_ELEMENTS filter are in the molecule.
See also
The format of the two elemental filter fields is the keyword followed by a comma delimited list of atomic symbols.
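The ordering matters: the metal test sees every component, including counter-ions, while the allowed-element test only sees what survives salt stripping. The sketch below models a molecule as a list of disconnected components, each a list of atomic symbols. That representation, the function names, and measuring "largest" by atom count are all assumptions made for illustration.

```python
def parse_element_list(line):
    """Split 'KEYWORD A,B,C' into the keyword and a set of atomic symbols."""
    keyword, symbols = line.split(None, 1)
    return keyword, {s.strip() for s in symbols.split(",") if s.strip()}


def elemental_filter(components, metals, allowed):
    """Return the surviving component, or None if the molecule fails.

    components: non-empty list of components, each a list of atomic symbols.
    """
    # Step 1: reject if any component contains a listed metal.
    if any(symbol in metals for comp in components for symbol in comp):
        return None
    # Step 2: strip salts by keeping only the largest component
    # (assumption: "largest" means most atoms).
    largest = max(components, key=len)
    # Step 3: reject if the remaining component has a disallowed element.
    if any(symbol not in allowed for symbol in largest):
        return None
    return largest


_, metals = parse_element_list("ELIMINATE_METALS Sc,Ti,V,Cr,Mn,Fe,Co,Ni,Cu,Zn")
_, allowed = parse_element_list("ALLOWED_ELEMENTS H,C,N,O,F,P,S,Cl,Br,I")

acetate = ["C", "C", "O", "O", "H", "H", "H"]
print(elemental_filter([acetate, ["Na"]], metals, allowed))  # Na is stripped
print(elemental_filter([acetate, ["Zn"]], metals, allowed))  # fails at step 1
```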
Eliminate Metals¶
Any molecule containing one or more of the atoms listed in ELIMINATE_METALS fails to pass the filter.
ELIMINATE_METALS Sc,Ti,V,Cr,Mn,Fe,Co,Ni,Cu,Zn,Y,Zr,Nb,Mo,Tc,Ru,Rh,Pd,Ag,Cd
Allowed Elements¶
Molecules with atoms other than those specified by ALLOWED_ELEMENTS fail to pass the filter.
ALLOWED_ELEMENTS H,C,N,O,F,P,S,Cl,Br,I
Aromatic Ring Count¶
Uses the result of the function OEGetAromaticRingCount (see toolkit docs) to filter molecules based on their number of aromatic rings.
MIN_AROMATIC_RING_COUNT 1 "Minimum aromatic ring count"
MAX_AROMATIC_RING_COUNT 6 "Maximum aromatic ring count"
CSP3 Carbon Fraction¶
Uses the result of the function OEGetFractionCsp3 (see toolkit docs) to filter molecules based on their CSP3 carbon fraction.
MIN_FRACTION_CSP3 0.1 "Minimum fraction CSP3 carbons"
MAX_FRACTION_CSP3 0.5 "Maximum fraction CSP3 carbons"
Functional Group Rules¶
Rule statements set the maximum number of a specified functional group that is allowed in a molecule.
The first field of a rule statement is the word RULE in all capital letters. The second field is a number indicating the maximum number of the group allowed in a molecule. The third field is the functional group keyword. Functional-group keywords are case sensitive.
RULE 0 acid_halide
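A rule statement can be read and applied as sketched below. The functional-group counts are assumed to come from a substructure matcher that is not shown, and the function names are hypothetical.

```python
def parse_rule(line):
    """Parse 'RULE <max> <group>' into (group, max_allowed)."""
    fields = line.split()
    if len(fields) != 3 or fields[0] != "RULE":
        raise ValueError(f"not a RULE statement: {line!r}")
    return fields[2], int(fields[1])  # the group keyword is case sensitive


def violated_rules(group_counts, rules):
    """List groups that occur more often than their rule allows."""
    return [
        name
        for name, max_allowed in rules.items()
        if group_counts.get(name, 0) > max_allowed
    ]


rules = dict([parse_rule("RULE 0 acid_halide"), parse_rule("RULE 3 amide")])
print(violated_rules({"acid_halide": 1, "amide": 2}, rules))
```

With a maximum of 0, a single occurrence of the group is enough to fail the molecule.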
The following is a list of functional groups that filter recognizes by default. Three example matches are provided with the atoms that correspond to each other highlighted.
Note
Due to the highly complex nature of the patterns, in particular recursive SMARTS, it is not possible to fully highlight every atom that was included as part of the match.
acetal¶
acid¶
acid_chloride¶
acid_halide¶
acyclic_NCN¶
acyclic_NS¶
acyl_cyanides¶
acylhydrazide¶
alcohol¶
aldehyde¶
alkene¶
alkyl¶
alkyl_halide¶
alkyl_phosphate¶
alkylaniline¶
alkylating_agent¶
alkyne¶
alphahalo_amine¶
alphahalo_ketone¶
amide¶
aminal¶
amine¶
amino_acid¶
anhydride¶
aniline¶
aniline_unsubstituted¶
arene¶
arenesulfonyl¶
aryl¶
aryl_halide¶
aryl_mono_BrI¶
azide¶
aziridine¶
azo¶
azocyanamides¶
base¶
benzyl_ether¶
benzyloxycarbonyl_CBZ¶
beta_azo_carbonyl¶
beta_carbonyl_quat_nitrogen¶
beta_halo_carbonyl¶
carbamate¶
carbamic_acid¶
carbodiimide¶
carbonate¶
carbonyl¶
carboxylic_acid¶
cation_C_Cl_I_P_or_S¶
charge¶
cyanohydrins¶
cycloheximide_derivatives¶
cyclopropyl¶
cytochalasin_derivatives¶
di_peptide¶
dioxane_6MR¶
dioxolane_5MR¶
disulfide¶
dithioacetal¶
dye¶
enamine¶
enol_ether¶
epoxide¶
ester¶
ether¶
fluorenylmethoxycarbonyl_Fmoc¶
guanidine¶
halide¶
halo_alkene¶
halo_amine¶
halopyrimidine¶
hemiacetal¶
hemiaminal¶
hemiketal¶
hetatm¶
hetero_hetero¶
HOBT_esters¶
hydrazine¶
hydrazone¶
hydroxamic_acid¶
hydroxyl¶
hydroxylamine¶
imidoyl_chlorides¶
imine¶
imino¶
iodine¶
iodoso¶
iodoxy¶
isocyanate¶
isonitrile¶
isothiocyanate¶
ketal¶
ketone¶
lactam¶
lactone¶
lawesson_s_reagent¶
long_aliphatic_chain¶
malonic¶
mercapto¶
methoxyethoxymethyl_MEM¶
methyl_ketone¶
michael_acceptor¶
monensin_derivatives¶
mono_alkene¶
mono_alkyne¶
nitrile¶
nitro¶
nitroso¶
N_methoyl¶
nonacylhydrazone¶
noxide¶
N_P_S_Halides¶
NS_beta_halothyl¶
nucleophile¶
organometallic¶
oxalyl¶
oxaziridine¶
oxime¶
oxygen_cation¶
paranitrophenyl_esters¶
pentafluorophenyl_esters¶
perhalo_ketone¶
peroxide¶
phenol¶
phosphanes¶
phosphinic_acid¶
phosphonamide¶
phosphonic_acid¶
phosphonic_ester¶
phosphonylnitrile¶
phosphoramides¶
phosphoranes¶
phosphoric_acid¶
phosphoric_ester¶
phosphoryl¶
phthalimides_PHT¶
polyenes¶
primary_amine¶
propiolactones¶
pseudo_amine¶
quinone¶
ring¶
saponin_derivatives¶
SCN2¶
secondary_amine¶
squalestatin_derivatives¶
sulfide¶
sulfinimine¶
sulfinylthio¶
sulfonamide¶
sulfone¶
sulfonic_acid¶
sulfonic_ester¶
sulfonimine¶
sulfonyl_halide¶
sulfonylnitrile¶
sulfonylurea¶
sulfoxide¶
t_butyldimethylsilyl_TBDMS¶
t_butyldiphenylsilyl_TBDPS¶
t_butyl_ether¶
t_butoxycarbonyl_tBOC¶
terminal_vinyl¶
tertiary_amine¶
tetrahydropyran_THP¶
thioamide¶
thiocarbamate¶
thiocarbonyl¶
thioester¶
thiol¶
thiourea¶
triacyloxime¶
triazine¶
tricarbo_phosphene¶
triflates¶
triisopropylsilyl_TIPS¶
trimethylsilyl_TMS¶
unbranched_chain¶
urea¶
New Rules¶
New rules specify additional functional groups or substructures that may be used. They must specify a substructure definition in the form of a SMARTS in addition to the substructure name and maximum limit. For example:
NEWRULE norbornane 1 C1CC2CCC1C2
The first field is the NEWRULE keyword. The second field defines the name associated with the substructure (primarily for logging purposes). The third field indicates the maximum number of the substructure that can be allowed. The fourth field is the SMARTS string for the substructure, norbornane in this case. This example rule would indicate that molecules with a single norbornane substructure would be allowable, but that those with 2 or more norbornanes would be eliminated.
New rules that have a name that is identical with one of the original rules take precedence over the original rule.
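The field layout and the precedence behaviour can be sketched as follows. The rule table is modelled as a dictionary from name to a `(max_allowed, smarts)` pair, with `None` standing in for the built-in pattern of an original rule. That representation and the function names are assumptions for illustration only.

```python
def parse_newrule(line):
    """Parse 'NEWRULE <name> <max> <SMARTS>' into (name, max_allowed, smarts)."""
    fields = line.split(None, 3)
    if len(fields) != 4 or fields[0] != "NEWRULE":
        raise ValueError(f"not a NEWRULE statement: {line!r}")
    _, name, max_allowed, smarts = fields
    return name, int(max_allowed), smarts.strip()


def merge_rules(original, new_rules):
    """Combine rule tables; a new rule replaces an original of the same name."""
    merged = dict(original)
    merged.update(new_rules)
    return merged


original = {"amine": (4, None), "acid_halide": (0, None)}
name, max_allowed, smarts = parse_newrule("NEWRULE norbornane 1 C1CC2CCC1C2")
new_rules = {name: (max_allowed, smarts), "amine": (2, "[NX3;H2,H1;!$(NC=O)]")}
print(merge_rules(original, new_rules))
```

The `amine` SMARTS in the example is an arbitrary placeholder pattern, not the definition used by the default filter.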
Selection Statements¶
The select statement allows a filter file to specify the required number of substructures in order to be able to pass the filter. These statements are similar to new rules except that they list a required range for passing the filter rather than the range for failing to pass the filter. For example:
SELECT amine 1 1 [N;!$(*-*[!#6;!#1]);!$(*-a);!$(*=,#*)]
The first field is the SELECT keyword. The second field indicates the name for the selection (again for logging purposes). The third field is the minimum number of substructures required to be in the molecule. The fourth field is the maximum number of substructures allowed in the molecule. The fifth field is the substructure defined by a SMARTS pattern. The example requires that molecules contain exactly one amine. Currently, only a single SELECT statement is allowed in the filter file. Any complex boolean substructure statements can be incorporated directly into the SMARTS. If multiple SELECT statements occur in a filter file, only the final one will be applied.
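The five-field layout and the last-one-wins behaviour can be sketched as follows. The match count is assumed to come from a SMARTS matcher that is not shown, both bounds are assumed to be inclusive, and the function names are hypothetical.

```python
def parse_select(line):
    """Parse 'SELECT <name> <min> <max> <SMARTS>' into a dictionary."""
    fields = line.split(None, 4)
    if len(fields) != 5 or fields[0] != "SELECT":
        raise ValueError(f"not a SELECT statement: {line!r}")
    _, name, low, high, smarts = fields
    return {"name": name, "min": int(low), "max": int(high),
            "smarts": smarts.strip()}


def effective_select(lines):
    """Return the final SELECT statement in the file, or None if there is none."""
    selected = None
    for line in lines:
        if line.split(None, 1)[:1] == ["SELECT"]:
            selected = parse_select(line)  # later statements replace earlier ones
    return selected


def passes_select(match_count, select):
    """True if the count is inside the required range (assumed inclusive)."""
    return select is None or select["min"] <= match_count <= select["max"]


filter_lines = [
    'MIN_MOLWT 130 "Minimum molecular weight"',
    "SELECT ring 1 3 [R]",
    "SELECT amine 1 1 [N;!$(*-*[!#6;!#1]);!$(*-a);!$(*=,#*)]",
]
select = effective_select(filter_lines)
print(select["name"], passes_select(1, select), passes_select(2, select))
```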