Macromolecule Conformations

Alternate Locations

Macromolecular structures are usually represented as static shapes, which gives the mistaken impression that proteins, nucleic acids, and similar molecules are rigid. In reality, these molecules move quite a lot, and in a crystal, loops and other flexible regions are often disordered. Crystallographers model multiple conformations in the parts of a structure where disorder is observed. Where they can, they include a separate copy of each moving atom for each conformation, marking each copy with an ‘alternate location code’, a fractional ‘occupancy’, and a ‘temperature factor’ quantifying the atom’s thermal motion. The section Biopolymer Residues discusses how these properties are stored in an OEResidue.

Because a structure with alternate locations describes an ensemble of molecules rather than a single molecule, it is unsuitable as-is for calculating molecular properties. This is often dealt with by dropping all but the first alternate location from the molecule, which is what OEReadMolecule does by default for Protein Data Bank (PDB) files. The first step in dealing with alternate locations is therefore to retain all the alternate location atoms by setting the input flavor before reading the molecule, as shown below.

ims.SetFlavor(OEFormat::PDB, OEIFlavor::PDB::ALTLOC);

With all the alternate atoms retained, you can use the predicate OEHasAlternateLocation to identify these atoms. Although alternate locations are atom properties, they usually describe the coordinated motion of groups of atoms. Each connected set of atoms with alternate location codes that move in a coordinated fashion is called an alternate location group (represented by an OEAltGroup) and each conformation of a group’s atoms is called an alternate location (represented by an OEAltLocation). The second step in dealing with alternate locations is to use the OEAltLocationFactory, a class that will manage these groups and locations for you.

Listing 1: Alternate location factory groups

#include <iostream>

#include <openeye.h>
#include <oesystem.h>
#include <oechem.h>
#include <oebio.h>

using namespace OESystem;
using namespace OEChem;
using namespace OEBio;

void PrintAltGroupInfo(OEMolBase &mol)
{
  if (!OEHasResidues(mol))
    OEPerceiveResidues(mol, OEPreserveResInfo::All);

  OEAltLocationFactory alf(mol); // create factory for mol

  std::cout << mol.GetTitle() << "\t"
            << "(" << alf.GetGroupCount() << " groups)" << std::endl;

  for (OEIter<const OEAltGroup> grp = alf.GetGroups(); grp; ++grp)
  {
    std::cout << "\t" << grp->GetLocationCount()   << " locs"
              <<  ":" << alf.GetLocationCodes(grp) << std::endl;
  }
}

int main(int argc, char *argv[])
{
  if (argc != 2)
    OEThrow.Usage("%s <mol-infile>", argv[0]);

  oemolistream ims;
  if(! ims.open(argv[1]))
    OEThrow.Fatal("Unable to open %s for reading", argv[1]);

  // need this flavor to read alt loc atoms
  ims.SetFlavor(OEFormat::PDB, OEIFlavor::PDB::ALTLOC);

  OEGraphMol mol;
  while(OEReadMolecule(ims, mol))
  {
    PrintAltGroupInfo(mol);
  }
  return 0;
}

In addition to providing methods to work with alternate locations and groups, the OEAltLocationFactory corrects its copy of the input source molecule for bond and formal charge problems caused by atoms having multiple locations (something the standard molecule perception routines are not set up to handle).

The OEAltLocationFactory also provides methods for manufacturing subset molecules that represent specific selections among the different groups of alternate locations. The initial (primary) selection is the alternate location in each group with the largest average occupancy. In the example below, the subset combines the previous set of location selections with the location that has code 'B' and includes the residue of the specified atom.

Listing 2: Making an alternate location factory subset mol

// given OEAltLocationFactory alf and OEIter<OEAtomBase> atom ...
OEAltLocation loc = alf.GetLocation(atom, 'B');
OEGraphMol ssmol;
if (alf.MakeAltMol(ssmol, loc))
{
  // use the subset mol...
}
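
The primary-selection rule described above (the location with the largest average occupancy in each group) can be illustrated without the toolkit. The following is a minimal stand-alone sketch, not the factory's implementation: the AltAtom type and the PrimaryLocationCode function are hypothetical, and the tie-breaking rule (lowest code wins) is an assumption made only for this example.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical input type: one atom copy within a single alternate location group.
struct AltAtom
{
  char   code;       // alternate location code, e.g. 'A' or 'B'
  double occupancy;  // fractional occupancy of this copy of the atom
};

// Return the location code with the largest average occupancy,
// or '\0' if the group is empty. Ties go to the lowest code (an assumption).
char PrimaryLocationCode(const std::vector<AltAtom> &group)
{
  std::map<char, std::pair<double, int>> totals;  // code -> (occupancy sum, atom count)
  for (const AltAtom &atom : group)
  {
    totals[atom.code].first  += atom.occupancy;
    totals[atom.code].second += 1;
  }

  char   best    = '\0';
  double bestAvg = -1.0;
  for (const auto &entry : totals)  // std::map iterates in ascending code order
  {
    const double avg = entry.second.first / entry.second.second;
    if (avg > bestAvg)
    {
      bestAvg = avg;
      best    = entry.first;
    }
  }
  return best;
}
```

For a group whose 'A' atoms have occupancies 0.4 and 0.3 and whose 'B' atoms have 0.6 and 0.7, the averages are 0.35 and 0.65, so the sketch selects 'B'.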

Dihedrals and Sidechain Rotamers

The function OEGetRotamers returns an iterator of ‘rotameric’ sidechain conformations for a given amino-acid type, while OESetRotamer will set the sidechain chi angles of a specified OEHierResidue or OEAtomBase to a given rotamer. The Dunbrack [Dunbrack-1997], Richardson [Lovell-2000], and newer Richardson_2016 [Hintze-2016] rotamer libraries are supported.

There are also functions that return backbone and sidechain dihedral angles (OEGetPhi, OEGetPsi, OEGetChis, OEGetTorsion) and modify dihedrals (OESetTorsion).
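
For readers unfamiliar with what a dihedral angle measures, the sketch below computes the signed torsion defined by four points using the common atan2 formulation. It is a stand-alone illustration, not the implementation behind OEGetTorsion; the Vec3 type and the DihedralDegrees function are hypothetical names. It returns degrees for readability, which is a choice made for this example; consult the API reference for the units the toolkit functions use.

```cpp
#include <cmath>

// Hypothetical 3D point type for this sketch.
struct Vec3 { double x, y, z; };

static Vec3 Sub(const Vec3 &a, const Vec3 &b)
{
  return Vec3{a.x - b.x, a.y - b.y, a.z - b.z};
}

static double Dot(const Vec3 &a, const Vec3 &b)
{
  return a.x * b.x + a.y * b.y + a.z * b.z;
}

static Vec3 Cross(const Vec3 &a, const Vec3 &b)
{
  return Vec3{a.y * b.z - a.z * b.y,
              a.z * b.x - a.x * b.z,
              a.x * b.y - a.y * b.x};
}

// Signed dihedral angle in degrees (between -180 and 180) for the
// chain p1-p2-p3-p4, measured about the central p2-p3 bond.
double DihedralDegrees(const Vec3 &p1, const Vec3 &p2,
                       const Vec3 &p3, const Vec3 &p4)
{
  const Vec3 b1 = Sub(p2, p1);
  const Vec3 b2 = Sub(p3, p2);
  const Vec3 b3 = Sub(p4, p3);

  const Vec3 n1 = Cross(b1, b2);  // normal of the plane p1-p2-p3
  const Vec3 n2 = Cross(b2, b3);  // normal of the plane p2-p3-p4

  const double y = std::sqrt(Dot(b2, b2)) * Dot(b1, n2);
  const double x = Dot(n1, n2);
  return std::atan2(y, x) * 180.0 / std::acos(-1.0);
}
```

With this formulation, a cis arrangement gives 0 degrees and a trans arrangement gives a magnitude of 180 degrees. For a backbone phi angle, the four points would be the C of the previous residue followed by N, CA, and C of the current residue.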

Swapping Ambiguous Isoelectronic Residue Atoms

The function OESwapAIEResidueAtoms exchanges the coordinates of nitrogen and oxygen atoms in aspartic acid, asparagine, glutamic acid and glutamine sidechains and the ND1/CD2 and CE1/NE2 atoms in histidine rings. These atoms may be confused with one another in an electron density map because they have the same or very similar electron density, so swapping their coordinates is occasionally required to correct an error in a structure.
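
The idea of a coordinate swap can be shown with a small stand-alone sketch. This is not how OESwapAIEResidueAtoms is implemented and it does not use OEChem types: SimpleResidue, SwappablePairs, and SwapAmbiguousAtoms are hypothetical names, and the sidechain atom pairs for ASP, ASN, GLU, and GLN are assumed from standard PDB atom nomenclature (only the histidine pairs are named in the text above).

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Coord { double x, y, z; };

// Hypothetical residue representation for this sketch.
struct SimpleResidue
{
  std::string name;                    // three-letter code, e.g. "ASN"
  std::map<std::string, Coord> atoms;  // atom name -> coordinates
};

using NamePair = std::pair<std::string, std::string>;

// Atom pairs that are hard to tell apart in electron density.
// Pairs other than the histidine ones are assumed from PDB atom naming.
static std::vector<NamePair> SwappablePairs(const std::string &resName)
{
  if (resName == "ASP") return {NamePair("OD1", "OD2")};
  if (resName == "ASN") return {NamePair("OD1", "ND2")};
  if (resName == "GLU") return {NamePair("OE1", "OE2")};
  if (resName == "GLN") return {NamePair("OE1", "NE2")};
  if (resName == "HIS") return {NamePair("ND1", "CD2"), NamePair("CE1", "NE2")};
  return {};
}

// Exchange coordinates for every ambiguous pair present in the residue.
// Atom names stay in place; only positions move. Returns the number of pairs swapped.
int SwapAmbiguousAtoms(SimpleResidue &res)
{
  int swapped = 0;
  for (const NamePair &names : SwappablePairs(res.name))
  {
    auto first  = res.atoms.find(names.first);
    auto second = res.atoms.find(names.second);
    if (first != res.atoms.end() && second != res.atoms.end())
    {
      std::swap(first->second, second->second);
      ++swapped;
    }
  }
  return swapped;
}
```

For histidine, both pairs are exchanged together, which corresponds to flipping the ring by 180 degrees about its attachment bond.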