Filter Preprocessing

Before any of the molecular property filters are applied, a preprocessing step occurs that can significantly alter the molecule to fit the criteria needed for most modeling applications. This preprocessing step is a precisely defined series of stages applied to the molecule in the following order:

Metal Removal
Salt Removal
Canonicalization
pKa Normalization
Normalization
Reagent Selection
Type Checking
MMFF94 Atom Type Checking
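
The fixed ordering above can be pictured as a simple pipeline. The following is a minimal Python sketch, not MolProp TK code: the dict-based molecule and the make_stage helper are hypothetical, and each placeholder stage only records its own name so that the order of execution is visible.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a real molecule object.
Molecule = Dict[str, List[str]]

# Stage order taken from the list above.
STAGE_NAMES = [
    "Metal Removal",
    "Salt Removal",
    "Canonicalization",
    "pKa Normalization",
    "Normalization",
    "Reagent Selection",
    "Type Checking",
    "MMFF94 Atom Type Checking",
]


def make_stage(name: str) -> Callable[[Molecule], Molecule]:
    """Build a placeholder stage that only logs its own name."""
    def stage(mol: Molecule) -> Molecule:
        mol.setdefault("history", []).append(name)
        return mol
    return stage


PIPELINE = [make_stage(name) for name in STAGE_NAMES]


def preprocess(mol: Molecule) -> Molecule:
    """Apply every stage in the documented order."""
    for stage in PIPELINE:
        mol = stage(mol)
    return mol
```

Keeping the stages in a single ordered list makes the dependency explicit: canonicalization, for example, always sees the molecule after the two removal stages.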

Metal Removal

Metal removal is the first stage of element-based filtering. This stage removes the specified metal complexes from the molecule; it does not reject a molecule for having a metal complex. This allows the filter to treat atoms in the counter-ion portion of a molecule separately from the atoms in the primary portion of the molecular record.

For instance, an organic molecule that is complexed with silver can be eliminated because of its metal chelate, even though the organic portion is itself acceptable, while a sulfate counter-ion can be removed from another molecule before it causes the acceptable cationic molecule to be eliminated.

Salt Removal

This step deletes all atoms that are not part of the largest connected component of a compound. This effectively eliminates all non-covalently bound portions of the compound.
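
The idea of keeping only the largest connected component can be sketched with a plain graph search. This is an illustrative Python example, not the toolkit's implementation: the adjacency-dict representation is hypothetical, and "largest" is assumed here to mean the component with the most atoms.

```python
from collections import deque
from typing import Dict, Set


def largest_component(bonds: Dict[int, Set[int]]) -> Set[int]:
    """Return the atom indices of the largest connected component.

    `bonds` maps each atom index to the set of atoms it is covalently
    bonded to (a hypothetical adjacency representation).
    """
    seen: Set[int] = set()
    best: Set[int] = set()
    for start in bonds:
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component = {start}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbor in bonds[atom]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        # Assumption: size is measured by atom count.
        if len(component) > len(best):
            best = component
    return best


# Toy record: a four-atom chain (0-1-2-3) plus a lone counter-ion (atom 4).
bonds = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}, 4: set()}
kept = largest_component(bonds)
```

In this toy record the lone atom 4 has no covalent bonds to the chain, so it falls outside the kept set and would be deleted.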

Canonicalization

This step canonicalizes the atom and bond order of the parts of the molecule that are left after the previous removal steps. This is necessary to avoid different atom orderings producing slightly different normalizations in the following normalization steps.

pKa Normalization

pKa normalization uses a rule-based system to set the ionization state of input molecules. If pKa normalization is turned on, the molecule is set to its most energetically favorable ionization state at pH 7.4. The rule-based nature of this calculation allows it to be very fast. Further, despite being rule-based, this approach takes into account many secondary charge interactions.

While more advanced levels of theory are available for predicting ionization states, this method is very well suited to virtual-screening database preparation. However, it may not be appropriate for hit-to-lead or lead optimization work.
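
To show why a rule-based approach is fast, here is a minimal Python sketch of a lookup-style ionization rule. It is not the MolProp TK rule set: the group labels are hypothetical, the pKa values are approximate textbook figures, and the sketch ignores the secondary charge interactions that the real system takes into account.

```python
from dataclasses import dataclass

SCREENING_PH = 7.4  # pH stated in the text


@dataclass(frozen=True)
class IonizationRule:
    pka: float     # approximate textbook pKa, not a MolProp TK value
    is_acid: bool  # True: neutral acid loses H+; False: base gains H+


# Illustrative rules keyed by hypothetical functional-group labels.
RULES = {
    "carboxylic_acid": IonizationRule(pka=4.8, is_acid=True),
    "aliphatic_amine": IonizationRule(pka=10.6, is_acid=False),
    "phenol": IonizationRule(pka=10.0, is_acid=True),
}


def formal_charge_at_ph(group: str, ph: float = SCREENING_PH) -> int:
    """Return the dominant formal charge of a group at the given pH."""
    rule = RULES[group]
    if rule.is_acid:
        # An acid is mostly deprotonated (anionic) once pH exceeds its pKa.
        return -1 if ph > rule.pka else 0
    # A base is mostly protonated (cationic) while pH is below the pKa
    # of its conjugate acid.
    return 1 if ph < rule.pka else 0
```

Each decision is a single table lookup and comparison, which is why this style of calculation scales well to large screening databases.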

Normalization

In addition to pKa normalization, MolProp TK allows any number of additional molecular normalizations. Since normalizations are usually specific to a particular company or site, MolProp TK provides the ability for users to input normalizations, such as the nitro tautomer state, but does not provide default implementations.

Reagent Selection

Reagent selection for small linear library synthesis or large combinatorial library synthesis is still a necessary task at many pharmaceutical companies. A user hoping to identify a set of acyl-halide reagents can specify a selection parameter that requires each compound to have exactly one acyl halide. In addition, the user might want to modify the filter to exclude functional groups (such as primary amines) that may be acceptable for typical lead-like molecules but are not acceptable for the specific reagent the user has in mind.

A selection parameter is therefore the reverse of a filtering parameter: the molecule must include the given substructure in order to pass the filter.
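
The contrast between selection and filtering can be sketched as follows. This Python example is illustrative only: it assumes functional-group counts have already been computed for each molecule (a real toolkit would obtain them by substructure matching), and the group labels and the passes function are hypothetical names.

```python
from typing import Dict, Iterable


def passes(
    group_counts: Dict[str, int],
    required: Dict[str, int],
    excluded: Iterable[str],
) -> bool:
    """Apply selection rules (must match) and filter rules (must not match)."""
    # Selection: the molecule must contain exactly the required count.
    for group, count in required.items():
        if group_counts.get(group, 0) != count:
            return False
    # Filtering: the molecule must not contain any excluded group.
    for group in excluded:
        if group_counts.get(group, 0) > 0:
            return False
    return True


# Reagent query from the text: exactly one acyl halide, no primary amines.
required = {"acyl_halide": 1}
excluded = ["primary_amine"]

good_reagent = {"acyl_halide": 1}
two_halides = {"acyl_halide": 2}
with_amine = {"acyl_halide": 1, "primary_amine": 1}
```

With these toy inputs, only good_reagent passes: two_halides fails the selection rule and with_amine fails the exclusion rule.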

Type Checking

This checks the valence state and formal charge of the entire molecule. The check identifies molecules that are poorly specified, or represent nonsensical chemical states, often from corrupt input data. For example, an oxygen with eight hydrogens attached or a carbon with a +9 formal charge would be rejected.
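
A check of this kind can be sketched as a table lookup per atom. The Python example below is a simplified illustration, not the toolkit's valence model: the bond limits and the charge threshold are arbitrary assumptions chosen for the sketch, and a real valence model would also tie the allowed bond count to the formal charge.

```python
from typing import Dict

# Toy upper limits on the number of bonds per element (assumed values).
MAX_BONDS: Dict[str, int] = {"H": 1, "C": 4, "N": 4, "O": 3}

# Arbitrary threshold for this sketch; not a MolProp TK setting.
MAX_ABS_CHARGE = 2


def atom_is_sensible(element: str, n_bonds: int, formal_charge: int) -> bool:
    """Return False for atoms in chemically nonsensical states."""
    limit = MAX_BONDS.get(element)
    if limit is None:
        # Elements outside the toy table are treated as unrecognized.
        return False
    if n_bonds > limit:
        return False
    if abs(formal_charge) > MAX_ABS_CHARGE:
        return False
    return True
```

Under these assumptions the two examples from the text are both rejected: an oxygen with eight attached hydrogens exceeds the bond limit, and a carbon with a +9 formal charge exceeds the charge threshold.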

MMFF94 Atom Type Checking

This checks that all the atoms of the molecule have valid MMFF94 atom type assignments. The check identifies molecules that will fail downstream processing that depends on MMFF94 atom types (e.g. Omega).