Valence Models

This section describes the three valence models currently implemented by OEChem.

  1. For molecules that have fully specified formal charges, the MDL valence model may be used to assign hydrogen counts. (See section MDL Valence Model.)
  2. For molecules that have fully specified hydrogen counts, the OpenEye “charge” model may be used to assign formal charges. (See section OpenEye Charge Model.)
  3. Finally, for molecules with neither formal charges nor hydrogen counts, OEChem uses the OpenEye hydrogen count model to assign both hydrogen counts and formal charges. (See section OpenEye Hydrogen Count Model.)

MDL Valence Model

The MDL valence model was developed to allow hydrogen counts to be implicit in the MDL SD and MOL file formats. It assumes that the bond orders to an atom are specified (the explicit valence), and that the atomic number and formal charge are correctly set. Given these, the model prescribes the number of implicit hydrogens on each atom. The following table shows the MDL valence model as implemented in OEChem, with one column per formal charge from -3 to +5.

MDL valence model
At# [1] Symbol -3 -2 -1 0 +1 +2 +3 +4 +5
1 H 0 0 0 1 0 0 0 0 0
3 Li 0 0 0 1 0 0 0 0 0
4 Be 0 0 0 2 1 0 0 0 0
5 B 2 3,5 4 3 2 1 0 0 0
6 C 1 2 3,5 4 3 2 1 0 0
7 N 0 1 2 3,5 4 3 2 1 0
8 O 0 0 1 2 3,5 4 3 2 1
9 F 0 0 0 1 2 3,5 4 3 2
11 Na 0 0 0 1 0 0 0 0 0
12 Mg 0 0 0 2 1 0 0 0 0
13 Al 2,4,6 3,5 4 3 2 1 0 0 0
14 Si 1,3,5,7 2,4,6 3,5 4 3 2 1 0 0
15 P 0 1,3,5,7 2,4,6 3,5 4 3 2 1 0
16 S 0 0 1,3,5,7 2,4,6 3,5 4 3 2 1
17 Cl 0 0 0 1,3,5,7 2,4,6 3,5 4 3 2
19 K 0 0 0 1 0 0 0 0 0
20 Ca 0 0 0 2 1 0 0 0 0
31 Ga 2,4,6 3,5 4 3 0 1 0 0 0
32 Ge 1,3,5,7 2,4,6 3,5 4 3 0 1 0 0
33 As 0 1,3,5,7 2,4,6 3,5 4 3 0 1 0
34 Se 0 0 1,3,5,7 2,4,6 3,5 4 3 0 1
35 Br 0 0 0 1,3,5,7 2,4,6 3,5 4 3 0
37 Rb 0 0 0 1 0 0 0 0 0
38 Sr 0 0 0 2 1 0 0 0 0
49 In 2,4,6 3,5 2,4 3 0 1 0 0 0
50 Sn 1,3,5,7 2,4,6 3,5 2,4 3 0 1 0 0
51 Sb 0 1,3,5,7 2,4,6 3,5 2,4 3 0 1 0
52 Te 0 0 1,3,5,7 2,4,6 3,5 2,4 3 0 1
53 I 0 0 0 1,3,5,7 2,4,6 3,5 2,4 3 0
55 Cs 0 0 0 1 0 0 0 0 0
56 Ba 0 0 0 2 1 0 0 0 0
81 Tl 2,4,6 3,5 2,4 1,3 0 0 0 0 0
82 Pb 1,3,5,7 2,4,6 3,5 2,4 3 0 1 0 0
83 Bi 0 1,3,5,7 2,4,6 3,5 2,4 3 0 1 0
84 Po 0 0 1,3,5,7 2,4,6 3,5 2,4 3 0 1
85 At 0 0 0 1,3,5,7 2,4,6 3,5 2,4 3 0
87 Fr 0 0 0 1 0 0 0 0 0
88 Ra 0 0 0 2 1 0 0 0 0

Table footnote:

[1] All the remaining elements not listed are assumed to have no implicit hydrogens.
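
The lookup described by the table can be sketched in a few lines of Python. This is an illustrative sketch, not OEChem code: the dictionary MDL_VALENCES and the function mdl_implicit_hydrogens are hypothetical names, only a handful of table rows are copied, and the selection rule (take the smallest listed valence that is at least the explicit valence) is an assumption about how the table entries are applied.

```python
# Excerpt of the table above, keyed by (atomic number, formal charge).
# Each value lists the allowed total valences in ascending order.
MDL_VALENCES = {
    (6, 0): (4,),            # C, neutral
    (6, 1): (3,),            # C, +1
    (7, 0): (3, 5),          # N, neutral
    (7, 1): (4,),            # N, +1
    (8, -1): (1,),           # O, -1
    (8, 0): (2,),            # O, neutral
    (11, 0): (1,),           # Na, neutral
    (11, 1): (0,),           # Na, +1
    (16, 0): (2, 4, 6),      # S, neutral
    (17, 0): (1, 3, 5, 7),   # Cl, neutral
}


def mdl_implicit_hydrogens(atomic_num, charge, explicit_valence):
    """Return the implicit hydrogen count for one atom.

    Assumption: the smallest allowed valence that is not below the
    explicit valence is chosen; atoms missing from the table, or atoms
    whose explicit valence exceeds every listed value, get no hydrogens.
    """
    allowed = MDL_VALENCES.get((atomic_num, charge), (0,))
    for valence in allowed:
        if valence >= explicit_valence:
            return valence - explicit_valence
    return 0


print(mdl_implicit_hydrogens(6, 0, 1))   # methyl carbon: 3
print(mdl_implicit_hydrogens(16, 0, 3))  # sulfur with explicit valence 3: 1
print(mdl_implicit_hydrogens(11, 0, 0))  # neutral sodium: 1 (sodium hydride)
```

Under this reading, a neutral disconnected sodium atom receives one implicit hydrogen, while a sodium atom with a +1 formal charge receives none.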

See also

The following functions use the MDL valence model:

OpenEye Valence Models

OpenEye Charge Model

The OpenEye formal charge model assigns a formal charge to each atom based on its element and total valence. In OEChem, this functionality is invoked by the OEAssignFormalCharges function. An atom whose formal charge is already non-zero is left unchanged. The rules for each element are listed below.

Hydrogen
If the valence isn’t one, the formal charge is +1.
Boron
If the valence is four, the formal charge is -1.
Carbon
If the valence is three, the formal charge is +1 if the atom has a polar neighbor, i.e. N, O or S, and formal charge -1 otherwise.
Nitrogen
If the valence is two, the formal charge is -1, and if the valence is four the formal charge is +1.
Oxygen
If the valence is one, the formal charge is -1, and if the valence is three the formal charge is +1.
Phosphorus
If the valence is four, the formal charge is +1.
Sulfur
If the valence is one, the formal charge is -1; if the valence is three, the formal charge is +1; if the valence is five, the formal charge is -1; and if the valence is four and the degree is four, the formal charge is +2.
Chlorine
If the valence is zero, the formal charge is -1, and if the valence is four the formal charge is +3.
Fluorine, Bromine, Iodine
If the valence is zero, the formal charge is -1.
Magnesium, Calcium, Zinc
If the valence is zero, the formal charge is +2.
Lithium, Sodium, Potassium
If the valence is zero, the formal charge is +1.
Iron
If the valence is zero, the formal charge is +3 if the partial charge is 3.0, and +2 otherwise.
Copper
If the valence is zero, the formal charge is +2 if the partial charge is 2.0, and +1 otherwise.

For the remaining elements, if the valence of an atom is zero, its formal charge is set from its partial charge.
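
To make the rule list concrete, here is a minimal Python sketch of the per-element lookup. It is not the OEAssignFormalCharges implementation: the function name assign_formal_charge and its parameters are hypothetical, it handles only a subset of the elements listed above (boron, iron, copper and the partial-charge fallback are left out), and any case not covered by a rule is assumed to stay neutral.

```python
HALIDES = {"F", "Br", "I"}
DICATIONS = {"Mg", "Ca", "Zn"}
MONOCATIONS = {"Li", "Na", "K"}


def assign_formal_charge(symbol, valence, degree=0,
                         polar_neighbor=False, current_charge=0):
    """Return a formal charge for one atom (hypothetical helper).

    valence is the total valence, degree the number of neighbors, and
    polar_neighbor says whether the atom has an N, O or S neighbor.
    """
    # An existing non-zero formal charge is never overwritten.
    if current_charge != 0:
        return current_charge
    if symbol == "H":
        return 1 if valence != 1 else 0
    if symbol == "C":
        if valence == 3:
            return 1 if polar_neighbor else -1
        return 0
    if symbol == "N":
        return {2: -1, 4: 1}.get(valence, 0)
    if symbol == "O":
        return {1: -1, 3: 1}.get(valence, 0)
    if symbol == "P":
        return 1 if valence == 4 else 0
    if symbol == "S":
        if valence == 4:
            return 2 if degree == 4 else 0
        return {1: -1, 3: 1, 5: -1}.get(valence, 0)
    if symbol == "Cl":
        return {0: -1, 4: 3}.get(valence, 0)
    if valence == 0:
        # Disconnected atoms: halide anions and metal cations.
        if symbol in HALIDES:
            return -1
        if symbol in DICATIONS:
            return 2
        if symbol in MONOCATIONS:
            return 1
    return 0


print(assign_formal_charge("N", 4))                       # 1
print(assign_formal_charge("C", 3, polar_neighbor=True))  # 1
print(assign_formal_charge("Na", 0))                      # 1
```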

OpenEye Hydrogen Count Model

OpenEye’s hydrogen count valence model is used by OEChem when neither hydrogen counts nor valence are specified. Typical uses are reading molecules from PDB or XYZ format files without explicit hydrogens. This functionality is invoked by OEAssignImplicitHydrogens, which must always be followed by a call to OEAssignFormalCharges. This valence model is unique in that it only partially updates hydrogen counts, assuming that the unfilled valences will be corrected by the OpenEye charge model described above. In the MDL model, for example, a neutral sodium atom is assumed to have one implicit hydrogen, i.e. sodium hydride instead of sodium metal. In OpenEye’s hydrogen count valence model, a disconnected sodium atom is assumed to be a sodium cation, [Na+]. When reading from PDB files, this is a very reasonable assumption.

Note that although the OpenEye hydrogen count valence model often sets charge and protonation states to those found at physiological conditions, it is not intended to be a pKa or ionization state predictor. Instead, it is a normalization. Many registry systems, and the MDL valence model, convert C(=O)[O-] to C(=O)O for registration purposes; this valence model converts in the opposite direction, from C(=O)O to C(=O)[O-].

  • Carbon is always assumed to be at least four valent, and therefore neutral.
  • Nitrogens that are conjugated (have double bonds, or have neighbors that have double bonds, in their Kekulé representations) are assumed at least three valent and neutral, whilst all other nitrogens are assumed to be (minimum) four valent, with a +1 formal charge.
  • Oxygens are assumed to be at least two valent and neutral, unless they have a single bond to an atom that is doubly bonded to oxygen, in which case they are assumed to be one valent, with a -1 formal charge.
  • Sulfur is assumed to always be at least two valent.

All other elements are assumed to have no implicit hydrogens, and take the formal charge specified by the OpenEye charge model. As a result, all disconnected halogens are modeled as halide anions, and the metals listed above are modeled as cations when disconnected.
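
The minimum-valence rules above can be sketched as follows. This is an illustration under stated assumptions, not the OEAssignImplicitHydrogens implementation: the function names and the boolean flags conjugated and acid_oxygen are hypothetical, and the caller is assumed to have already worked out whether a nitrogen is conjugated and whether an oxygen is singly bonded to an atom that is doubly bonded to oxygen.

```python
def minimum_valence(symbol, conjugated=False, acid_oxygen=False):
    """Return the minimum valence for an element, or None when the
    hydrogen count model adds no implicit hydrogens to it."""
    if symbol == "C":
        return 4
    if symbol == "N":
        # Conjugated nitrogens stay neutral; all others become N+.
        return 3 if conjugated else 4
    if symbol == "O":
        # Acid-like oxygens are left one valent (charged -1 later).
        return 1 if acid_oxygen else 2
    if symbol == "S":
        return 2
    return None


def implicit_hydrogens(symbol, explicit_valence,
                       conjugated=False, acid_oxygen=False):
    """Fill the atom up to its minimum valence with hydrogens."""
    target = minimum_valence(symbol, conjugated, acid_oxygen)
    if target is None:
        return 0
    return max(0, target - explicit_valence)


print(implicit_hydrogens("N", 1))                    # amine nitrogen: 3
print(implicit_hydrogens("N", 1, conjugated=True))   # amide nitrogen: 2
print(implicit_hydrogens("O", 1, acid_oxygen=True))  # carboxylate oxygen: 0
print(implicit_hydrogens("Na", 0))                   # disconnected sodium: 0
```

In this sketch a non-conjugated nitrogen with one heavy-atom bond ends up four valent, which the charge model then turns into a +1 formal charge, and a carboxylate oxygen left one valent receives a -1 formal charge.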

These rules are sufficient to reasonably protonate proteins read from PDB files. However, as described above, they are not intended to be a comprehensive rule-based pKa predictor. Users interested in predicting physiological ionization, or in enumerating protonation/dissociation states, should contact OpenEye Scientific Software (http://www.eyesopen.com/) about our tools for exactly this task.