MolCharge Theory¶
Introduction¶
The assignment of appropriate atomic partial charges, both to small molecule ligands and to biopolymers (such as proteins and nucleic acids) is essential to getting meaningful results from any electrostatics calculation.
A molecule may be considered a collection of atomic nuclei and the electrons that surround them. The number of protons in each nucleus defines its atomic number/element. If the number of electrons exactly matches the number of protons in these nuclei, the molecule is neutral and has no net charge. If there are more electrons than protons, the molecule has a net negative charge, and if there are less, the molecule has a net positive charge.
It is both the atomic nuclei and the net charge that define the identity of a molecule. Indeed, this is a representation common to quantum chemistry. Adding or removing electrons (or atoms) from a molecule produces a different molecule.
In the discrete world of cheminformatics, valence bond theory allows the electrons present in a system to be represented in terms of bonds with formal bond orders, and formal charges assigned to particular atoms. The sum of the formal charges is equal to the net charge on the molecule, but which atoms are assigned which formal charges can be to some extent arbitrary due to resonance delocalization. In such cases the same molecule may be represented by similar connection tables, but with formal charges assigned to different sets of atoms.
For example, guanidinium may be expressed as either N[C+](N)N with the formal charge assigned to the carbon, or as [NH2+]=C(N)N with the formal charge assigned arbitrarily to one of the otherwise equivalent nitrogens. A similar example is a thiocarboxylate group, where either C(=O)[S-] or C(=S)[O-] are both equally appropriate representations of the same chemical functionality.
A zwitterion is an electrically neutral molecule that is represented as containing atoms with positive formal charge as well as atoms with negative formal charge.
Perhaps the most important fact to appreciate when considering formal charges on atoms is that they are all artificial constructs by chemists to accommodate a particular chemical model. A figment of a chemist’s fevered imagination. Like valence bond theory, they are an exceptionally useful and powerful discretized model of the universe. But as with any model of reality, it has its limitations. Formal charges, for all their numerous benefits to mankind, unfortunately, are not localized on an atom.
The limitations of describing formal charges with valence bond theory is apparent even within cheminformatics. Sydnones, for example, are a class of heterocyclic compound that cannot be written using normal covalent bonds without introducing and arbitrarily assigning both positive and negative charges. Similarly, in inorganic chemistry, the ditechnetium cation, \(\mbox{Te}_{2}^{+5}\), causes similar problems where the +5 formal charge cannot be assigned to both technetium atoms without breaking symmetry.
A better model, or approximation, of the wave function describing the distribution of electron density around a molecule is the use of atomic partial charges. A partial charge is a floating-point value assigned to each atomic center intended to model the distribution of electrons over a molecule.
Atomic partial charges are yet another approximation, much like the formal charges described above. However, partial charges provide a much better model to describe the electric field, dipole moment and other observable properties of a molecule.
A common limitation of the use of partial charges is the assumption that they are conformationally invariant. Unfortunately, the distribution of electrons around a molecule depends upon the spatial configuration of its nuclei. Some partial charge assignment algorithms, such as the method of Goddard and Rappe, consider these conformational effects, whilst others that are based on quantum mechanics, such as the RESP and AM1BCC methods of Bayly et al., go to great lengths to eliminate conformational effects, for example, by restraining and symmetrizing symmetric atom positions. This is necessary in order to be able to properly handle multiple conformations and changes in geometry (e.g. geometry optimization) with a single set of atomic charges.
Marsili-Gasteiger Partial Charges¶
Marsili-Gasteiger partial charges are assigned using a two stage algorithm. In the first stage, seed charges are assigned to each atom in the molecule. For example, carboxylate oxygens are each assigned the value -0.5. During the second stage, these initial charges are then shared across bonds, moving a certain amount of charge from one atom to another. The partial charge moved and its direction is determined by difference in electronegativities of the atoms on each end of the bond. The relaxation algorithm is then iterated several times (by default eight passes), attenuating the charge moved with each iteration. OpenEye does not recommend use of this charge model for intermolecular interactions; it was never intended for this purpose. The author of the method (Johann Gasteiger) developed it to compare relative reactivity of related organic chemical functional groups within different molecular contexts. Here it is included for comparison purposes.
MMFF94 Partial Charges¶
The partial charges used by the MMFF94 and MMFF94s force fields are assigned using a four stage algorithm. In the first stage, each atom of the molecule is assigned an MMFF94 atom type. In the second stage, an initial seed partial charge is assigned to each atom based upon its atom type. For a few atom types, the initial partial charge also depends upon the local environment. In the third stage, the initial charges assigned to aromatic rings are shared between all atoms of the aromatic ring. Finally, in the fourth stage, a table of bond charge increments (BCI) is used to move charges across bonds based upon the bond type of the bond (single, double, triple) and the atom types of the atoms at each end. Developed for the electrostatic interactions within the above-mentioned force fields, they are the appropriate charges to use with these force fields most notably for intramolecular interactions of pharmaceutical and bio-organic small molecules. They are less well-suited (but still passable) for intermolecular interactions using the common two-body additive Coulomb interactions as used in Amber, Charmm, Gromacs. For these better choices would be amber99sb charges on proteins and peptides, and am1bccsym charges on the ligand.
AM1 Charges¶
AM1 charges are a set of Mulliken-type charges derived from a semi-empirical quantum-mechanical calculation. For further discussion of this method, please see Dewar et. al. These should not be used for intermolecular interactions of force fields.
AM1BCC Charges¶
AM1BCC charges start with Mulliken-type partial charges derived from the AM1 semi-empirical quantum mechanical (QM) wave-function. In a second stage, bond-charge corrections (BCCs) are applied to the partial charges on each atom to generate new partial charges. Many different variants of AM1BCC charges are offered within our API because of the significant influences of several different factors on these QM-derived charges. Specifically, these factors are
Optimization: whether or not the input geometry is optimized. QM wavefunctions in general are quite sensitive to geometry, especially bond lengths and bond angles, so this can markedly affect the partial charges. In general, optimizing the geometry is recommended. To avoid a collapse of the conformation due to strong intramolecular electrostatic interactions, light restraints to the starting geometry are applied.
Symmetrization: whether or not topologically similar atoms (for example the two oxygens on a carboxylate) are constrained to have identical values. The true QM wavefunction is usually asymmetric around topologically similar atoms, leading to asymmetric partial charges. However, if the same partial charges are to be used on different conformers (as with general fixed-charge force fields) it is important that these charges be symmetrized or else interconverting between formally degenerate conformers (e.g. 180 degree rotation of the carboxylate) will have non-degenerate electrostatic energies. In general, if the partial charges are to be applied to more than the single conformer used to generate them, symmetrization is strongly recommended.
Another important issue with AM1BCC charges is if highly conformer-specific charges are generated which are unsuitable for other conformers, leading to undesirable perturbed electrostatic energies for those other conformers. To address this problem, we strongly recommend the ELF conformer selection protocol described below.
“Standard” AM1BCC includes both optimization and symmetrization.
OpenEye considers AM1BCC charges to be the best partial charge model currently available. For further discussion, please see the work of Christopher I. Bayly.
ELF Conformer Selection¶
ELF conformer selection is a method to select one or more conformers having Electrostatically Least-interacting Functional groups (ELF) from a large conformer database. The purpose of this method is to resolve important issues with QM-derived charges in general, including AM1BCC charges. The issue is that strong short-range intramolecular polarizations specific to a certain conformation, as with an intramolecular hydrogen bond or salt bridge, usually leads to strongly perturbed partial charges for the atoms involved. These charges can be very different from those found for other conformers which do not have that intramolecular polar interaction. If such partial charges are applied to all conformers, some of those other conformers are very likely to have wrongly over-stabilized solvation energies.
A second problem arising from this issue is the precision or sensitivity of the electrostatic energies resulting from QM-derived charges. By this we mean the variance in electrostatic relative energies between different conformers depending on what conformer is used for the QM-derived charges. Imagine two different sets of QM-derived partial charges for a molecule, each coming from a conformer having different strong intramolecular hydrogen bonds. Each set of partial charges will have strongly perturbed partial charges for the internally hydrogen bonded atoms, but they will be different. The relative energies between conformers will be different depending which partial charge set is used.
For these reasons it is important to avoid generating QM-derived partial charges from a conformer having electrostatically strongly interacting functional groups. This is what ELF conformer selection does.
ELF needs to start with enough conformers so that it can find a population of conformers that do not have strongly-interacting functional groups. In the first stage, the Coulomb electrostatic energy is calculated for every conformer using the absolute value of the MMFF94 partial charges (original negative charges are replaced with their absolute values). The electrostatic energies with such charges destabilize all strong polar interactions, and thus the lowest electrostatic energies correspond to the Electrostatically Least-interacting Functional groups. The lowest-energy 2% of conformers is selected as the 2% ELF population. We find that averaging 10 diverse conformers from the 2% ELF population is sufficient to provide a well-behaved set of QM-derived partial charges even for highly polar and charged molecules.
Amber ff94, ff96, ff99, ff99sb, and ff99sbc0 Partial Charges¶
The partial charges used by the AmberFF94 force field are based on fitting quantum mechanical electrostatic potentials (esp). They were developed to address two key issues with earlier esp-fit charge sets: unrealistically high charges on charge centers and the variation of atomic charges with conformation. While the latter should have some basis in electronic structure, numerical instability in the charge fitting process was the source of both these pathologies. AmberFF94 charge sets use restrained esp-fitting (RESP) to control the numerical instabilities and simultaneous multi-conformer fitting to lead to conformation-independent charges that are restricted to individual residues. Particular attention was given to ensure that backbone amides have consistent charges. The Amber force fields ff94, ff96, ff99, ff99sb, and ff99sbc0 all use the same set of RESP charges, they differ in other terms (mostly torsional).