Tutorial

Introduction

The SZMAP product contains eight applications: szmap itself; gameplan, a program that runs szmap to explore a binding site; three protein preparation utilities (fixdupatomnames, mkhetdict and pch); and three results processing utilities (szmap_report, szmap_grid and grid_comp).

  1. The szmap program is an application for analyzing sites on the surface of a protein and/or ligand using an explicit probe molecule, usually water, and producing 3D grids of estimated thermodynamic properties. szmap can also evaluate solvent thermodynamics rapidly at specified coordinates.
  2. The gameplan tool identifies interesting coordinates near a ligand or in a protein binding site, runs szmap, and analyzes the results. It compares ligand properties to the binding-site environment, generates a set of hypotheses for ligand modification, and identifies sites where water either stabilizes or destabilizes the complex.
  3. The fixdupatomnames utility program can be used during protein and ligand preparation to identify and rename any duplicate atom names which can confuse reduce, an open-source program used to add and optimize hydrogens.
  4. The mkhetdict utility program creates a PDB heterogen dictionary enabling reduce to add hydrogens to ligand atoms.
  5. The pch utility program assists in preparing structure files for use with szmap by adding partial charges and radii to atoms and partitioning the molecules into separate protein and ligand files.
  6. The szmap_grid utility program displays information about grids within an OE binary file (.oeb) and can split out a particular grid into a separate file. It also displays property tags in -at_coords results.
  7. The grid_comp utility program performs various mathematical operations on a grid or pair of grids to facilitate grid comparisons.
  8. Finally, the szmap_report utility program produces .pdf files containing 2D renderings of szmap results using the OEDepict and Grapheme technology.

In addition, there are three VIDA extensions for working with szmap output:

  1. The Water Orientation VIDA Extension identifies key waters and analyzes their energies and orientational constraints.
  2. The WaterColor VIDA Extension changes the display styles of szmap grids and gameplan annotation to facilitate analysis.
  3. The Color By Atom Properties VIDA Extension colors molecules based on properties calculated at atomic coordinates.

In the Preparation section below, reduce (non-OpenEye software available at kinemage.biochem.duke.edu) is used to add and optimize explicit hydrogens. You are free to use alternative methods to perform this and other “prep” tasks.

A Word About Pronunciation

Like a number of OpenEye product names, SZMAP (solvent-zap-map) contains the letter pair sz, which sounds similar to sh. It is pronounced ˈshmăp.

SZMAP Workflow

The sections below represent a tutorial, going through a typical workflow and analysis of the PDB entry 4STD. Example input and output files and a list of commands for the following workflow are found in {SZMAP Version Examples}/tutorial_files (see Installation and Platform Notes for the location of this directory/folder on your computer). Images are provided to guide the user when visualizing SZMAP results.

The workflow consists of three basic parts:

  • Preparing molecules so that they can be used as input to szmap or gameplan.
  • Running szmap or gameplan.
  • Analyzing the results in VIDA with the help of the SZMAP extensions.

For a complete description of the OpenEye programs referred to below, see the corresponding chapter later in this manual.

Protein and Ligand Preparation

Good protein and ligand preparation is vital before running szmap or gameplan. This consists of trimming molecules to the relevant parts; adding hydrogens, partial charges, and atomic radii; and organizing them into separate protein and ligand files.

The commands below are shown in a form appropriate for Linux, Unix or Mac OS X. On Windows some of the auxiliary commands will not be available and a substitution will be necessary. For example, you may need to use winzip rather than gzip. When you install SZMAP on Windows, a pre-configured version of the DOS command prompt is constructed and can be used to run any SZMAP program or utility without any extra set-up. This window can be found under the Start menu in All Programs >> OpenEye >> SZMAP {version} >> Command Prompt.

First, examine your structure in VIDA to determine the number of subunits and where the ligand is with respect to the subunit interface. VIDA can read gzipped structure files as-is and has the File >> Open Special >> From PDB menu command to fetch structures from the Protein Data Bank directly.

Figure: File >> New Molecule >> From Split >> Selected

The structure for 4STD is a trimer with the ligand some distance from the protein/protein interface, so chains B and C (proteins and small molecules) can be deleted. To do this in VIDA, open the protein in the List window and drill down until you see the three chains. Click on Chain A, then right-click (control-click on the Mac) to bring up the pop-up menu and select chain A. Selecting the menu item File >> New Molecule >> From Split >> Selected generates a new list item with two entries: one for chain A and one for the other chains. Select the split-out A chain and right-click to bring up a menu that allows you to save this item to a file. Be sure to change the format to PDB, as reduce expects a .pdb file.

If you wish to delete detergents or other extraneous molecules, make sure they are not selected when you do the split operation. Alternatively, you may prefer to edit your molecule by hand as follows. If your structure file is a gzipped PDB file, unzip it (gzip is a widely available open-source command-line tool for doing this; on Windows a program like winzip may be more convenient). Edit it to delete extra subunits, detergent, or other extraneous molecules using whichever text editor you prefer. It is not necessary to remove the waters at this stage; they will be culled by pch in a later step. Nor is it necessary to prune the connection table: any references to deleted atoms will be ignored.

> gzip -dc pdb4std.ent.gz > 4std.pdb
> edit 4std.pdb
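If you prefer to script the trimming rather than edit by hand, the sketch below keeps the coordinate records of a single chain. It assumes standard fixed-column PDB records (chain ID in column 22); keep_chain is a hypothetical helper, not an SZMAP utility. Other records, such as CONECT, are passed through unchanged, which is harmless because references to deleted atoms are ignored.

```shell
#!/bin/sh
# keep_chain (hypothetical helper): keep only the ATOM/HETATM/ANISOU/TER
# records of one chain; every other record is passed through unchanged.
# Assumes fixed-column PDB format, with the chain ID in column 22.
keep_chain() {
    awk -v chain="$1" '
        /^(ATOM  |HETATM|ANISOU|TER   )/ {
            if (substr($0, 22, 1) == chain) print
            next
        }
        { print }
    '
}
```

For example: gzip -dc pdb4std.ent.gz | keep_chain A > 4std.pdb. Small molecules assigned to chains B and C are dropped along with those chains; check the result in VIDA before continuing.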

Since szmap and gameplan require explicit hydrogen atoms on the molecules and most PDB structures do not include the hydrogen atoms, the next steps produce a protein structure with all the hydrogen atoms explicitly represented. There are many ways to do this. Here, we will use reduce, a free program to add and optimize hydrogens that is available from the Richardson laboratory at Duke University.

If your atom names contain duplicates (for example, if all hydrogens are named " H "), you need to convert them to unique atom names; see chapter fixdupatomnames for instructions. In our example this is not required.
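fixdupatomnames is the tool that does the renaming. If you only want a quick check of whether it is needed, a sketch like the one below reports any atom name that occurs more than once in the same residue. find_dup_atom_names is a hypothetical helper that assumes fixed-column PDB records; the alternate-location indicator (column 17) is part of the key so that legitimate alternate conformations are not reported.

```shell
#!/bin/sh
# find_dup_atom_names (hypothetical helper): report atom names that occur
# more than once within one residue. Reads the named files, or stdin.
# Key = resName (cols 18-20) | chain + resSeq + iCode (cols 22-27)
#       | atom name + altLoc (cols 13-17)
find_dup_atom_names() {
    awk '
        /^(ATOM  |HETATM)/ {
            key = substr($0, 18, 3) "|" substr($0, 22, 6) "|" substr($0, 13, 5)
            if (++seen[key] == 2) print "duplicate: " key
        }
    ' "$@"
}
```

For example, find_dup_atom_names 4std.pdb prints nothing when every atom name is unique within its residue.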

Next, make a Protein Data Bank heterogen dictionary, a format reduce can use to work out how to protonate ligands.

> mkhetdict 4std.pdb 4std_hets.txt

If you need to use an ionization or tautomerization state other than the one mkhetdict assigns, you can edit the heterogen dictionary to add or delete hydrogens as required.

The next step is to add hydrogens and optimize OH, SH, His, Asn/Gln, etc. in the context of the complex. This is currently done using non-OpenEye software such as reduce [Word-1999] (free to license and available for download at kinemage.biochem.duke.edu; Windows users should use the most recent installer and, if using it with cygwin, see this discussion of auto restart).

Note

Reduce is not produced or supported by OpenEye. Information is provided here for your convenience. You are free to use programs other than those described here as long as they produce similar results. Future versions of SZMAP will not require third-party software for this function.

Reduce requires both the input and the output structure files to be in .pdb format.

> reduce -db 4std_hets.txt -rotexist -build 4std.pdb >4stdH.pdb 2>4stdH_reduce.log
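Before moving on, it can be worth confirming that reduce actually added hydrogens, since szmap and gameplan halt when hydrogens are missing. The sketch below counts hydrogen atoms in a PDB file. count_h is a hypothetical helper, and it assumes the element symbol is present in columns 77-78; files that leave that field blank will report 0.

```shell
#!/bin/sh
# count_h (hypothetical helper): count ATOM/HETATM records whose element
# symbol (columns 77-78) is H. Reads the named files, or stdin.
count_h() {
    awk '
        /^(ATOM  |HETATM)/ {
            e = substr($0, 77, 2)
            gsub(/ /, "", e)
            if (e == "H") n++
        }
        END { print n + 0 }
    ' "$@"
}
```

For example, compare count_h 4std.pdb with count_h 4stdH.pdb; the second number should be much larger.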

Next, split the structure into protein and ions in one file and the ligand in another and add partial charges and radii to all the atoms. If the structure contains multiple small molecules (ligand + cofactor, salt, etc.), pch -ligand_res LIG will ensure that only the residue LIG is put in the ligand file and the cofactors, etc. are added to the protein file (see chapter pch for a complete list of options for distinguishing the ligand from other molecules).

Tip

The pch utility is supplied for your convenience. If you have another mechanism for assigning partial charges to your molecules, szmap will accept the results.

pch will assign partial charges to amino acids that contain covalent modifications, using AmberFF94 charges for any standard amino acid and AM1BCC for any other group. It will also eliminate alternate conformations from a structure, leaving only the conformation with the highest occupancy. Occasionally, X-ray structures contain mistakes where alternate location codes are scrambled, leading to incorrect bonds being assigned. These bonds are often much longer or much shorter than they should be. Errors such as these in your input may have to be resolved before the structure is suitable for use with pch.

> pch -ligand_res BFS 4stdH.pdb 4std_prot.oeb.gz 4std_lig.oeb.gz
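The preparation commands above can also be collected into a small script. The sketch below is one way to do it, assuming mkhetdict, reduce and pch are on your PATH and that 4std.pdb has already been trimmed to chain A; missing_tools and prep_4std are hypothetical names. Rather than relying on reduce's exit status, it checks that reduce wrote a non-empty output file.

```shell
#!/bin/sh
# missing_tools (hypothetical): print each named command that is not on PATH.
missing_tools() {
    for t in "$@"; do
        command -v "$t" >/dev/null 2>&1 || echo "$t"
    done
}

# prep_4std (hypothetical): the tutorial's preparation steps for 4STD.
# Assumes 4std.pdb has already been trimmed to chain A.
prep_4std() {
    missing=$(missing_tools mkhetdict reduce pch)
    if [ -n "$missing" ]; then
        echo "not found on PATH:" $missing >&2
        return 1
    fi
    mkhetdict 4std.pdb 4std_hets.txt || return 1
    reduce -db 4std_hets.txt -rotexist -build 4std.pdb > 4stdH.pdb 2> 4stdH_reduce.log
    # Check reduce's output file rather than its exit status.
    if [ ! -s 4stdH.pdb ]; then
        echo "reduce wrote no output; see 4stdH_reduce.log" >&2
        return 1
    fi
    pch -ligand_res BFS 4stdH.pdb 4std_prot.oeb.gz 4std_lig.oeb.gz
}
```

Run prep_4std from the directory containing 4std.pdb, then read through 4stdH_reduce.log and the pch warnings as described below.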

Warnings of No Amber charges and Formal charge(#) is not equal to sum of partial charges(#) indicate missing atoms and should be accompanied by Missing atom warnings listing the missing protein (non-hydrogen) atoms. Similarly, warnings of bad MMFF types in szmap can usually be traced to missing atoms or very poor geometry that leads to inappropriate bonds being assigned. Missing protein atoms or bad bonds may not actually be a problem if they are sufficiently distant from the binding site.

The charged molecules are usually written to OE binary files but if you need to modify the charges, for example to change iron II to iron III, the molecules can be written to editable DelPhi format .pdb files (with radii and partial charges in the occupancy and B-factor fields) which szmap can read.

Warning

If szmap is given an input .pdb file which still contains the usual occupancy and B-factor, rather than radii and partial charges, any energies it generates will be meaningless.
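As a rough guard against the problem described in this warning, the sketch below checks whether any value in a .pdb file's occupancy column (columns 55-60) exceeds 1.0. Crystallographic occupancies never exceed 1.0, whereas radii for heavy atoms are normally larger than 1 Å, so a file in which every value is at most 1.0 probably still contains occupancies. This is only a heuristic, not a check that szmap performs; looks_like_radii is a hypothetical name.

```shell
#!/bin/sh
# looks_like_radii (hypothetical heuristic): succeed if any ATOM/HETATM
# record has a value above 1.0 in the occupancy columns (55-60).
# Occupancies never exceed 1.0; heavy-atom radii are normally larger.
looks_like_radii() {
    awk '
        /^(ATOM  |HETATM)/ {
            v = substr($0, 55, 6) + 0
            if (v > max) max = v
        }
        END { exit (max > 1.0 ? 0 : 1) }
    ' "$1"
}
```

For example: looks_like_radii 4std_prot.pdb || echo "occupancy column does not look like radii" >&2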

Running SZMAP Stabilization Calculations

Assuming the input structures have been prepared as described above, szmap grids can be calculated for the region around the ligand as follows:

> szmap -mpi_np 4 -prefix 4std_stbl -stbl -p 4std_prot.oeb.gz -l 4std_lig.oeb.gz

Calculations for complex, apo, and ligand grids will be done (requiring about 40 minutes for this example on Linux using four 2.8GHz processors) and then further processed to yield (complex - apo - ligand) stabilization grids, describing where water is stabilizing or destabilizing the binding reaction. The output consists of the calculated grids in an .oeb.gz file, along with a parameter file and a log file, all with names prefixed with “4std_stbl”.

To run on only a single processor, drop the -mpi_np # option. More information on options can be found in chapter szmap.
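If you switch between machines, a small wrapper can decide whether to pass -mpi_np. The sketch below uses only the options shown in this tutorial; run_szmap_stbl is a hypothetical name, and setting DRY_RUN prints the command instead of running it.

```shell
#!/bin/sh
# run_szmap_stbl (hypothetical wrapper): NP PREFIX PROTEIN LIGAND
# Adds -mpi_np only when more than one processor is requested.
# If DRY_RUN is set, the command is printed instead of executed.
run_szmap_stbl() {
    np=$1; prefix=$2; prot=$3; lig=$4
    set -- -prefix "$prefix" -stbl -p "$prot" -l "$lig"
    if [ "$np" -gt 1 ]; then
        set -- -mpi_np "$np" "$@"
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        echo szmap "$@"
    else
        szmap "$@"
    fi
}
```

For example: run_szmap_stbl "$(getconf _NPROCESSORS_ONLN)" 4std_stbl 4std_prot.oeb.gz 4std_lig.oeb.gz. Here getconf _NPROCESSORS_ONLN reports the number of online processors on Linux and Mac OS X; whether using all of them is appropriate depends on your MPI setup.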

Note

szmap (and gameplan) will check for missing hydrogens and missing partial charges and halt if any are missing. Sometimes OEChem will assign a charge or bond order different from what you expect. For example, if your structure has significantly non-planar geometry around an aromatic or double bond, OEChem may treat it as a single bond. The option -warn_if_missing_hydrogens allows szmap and gameplan to run even when OEChem determines that hydrogens are missing from your structure. You may, however, also need to change the formal charge of a polar group before running pch, either in VIDA’s builder or by modifying the structure file using a text editor.

Determining the Ligand Displacement Region

The results can be processed with grid_comp to break out the region of the apo grid that has been displaced by the ligand.

> grid_comp -op lig_disp -i 4std_stbl.oeb.gz -o 4std_stbl_disp.oeb.gz

While not required, these displacement grids can be useful. For example, they can be used to identify precisely which waters are displaced by each of a series of substituents. They also make it easy to determine the volume of water displaced by the ligand.
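When comparing a series of substituents, the same lig_disp operation can be applied to several result files in one pass. The loop below is a sketch that assumes each stabilization run wrote its own .oeb.gz results file; lig_disp_all is a hypothetical name, and the _disp suffix simply follows the naming used above. Setting DRY_RUN prints the commands instead of running them.

```shell
#!/bin/sh
# lig_disp_all (hypothetical): run "grid_comp -op lig_disp" on each
# stabilization results file given, writing <name>_disp.oeb.gz for each.
# If DRY_RUN is set, the commands are printed instead of executed.
lig_disp_all() {
    for f in "$@"; do
        out="${f%.oeb.gz}_disp.oeb.gz"
        if [ -n "${DRY_RUN:-}" ]; then
            echo grid_comp -op lig_disp -i "$f" -o "$out"
        else
            grid_comp -op lig_disp -i "$f" -o "$out" || return 1
        fi
    done
}
```

For example, lig_disp_all 4std_stbl.oeb.gz variant_stbl.oeb.gz, where variant_stbl.oeb.gz stands in for the results of another ligand in the series.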

Analyzing Grid Results

szmap grid results can be visualized in VIDA using the WaterColor VIDA Extension (which sets grid display styles) and the Water Orientation VIDA Extension (which shows water orientational preferences). If VIDA is in your command search path, the following command will open VIDA and load the file 4std_stbl_disp.oeb.gz. VIDA can also be launched by clicking on the VIDA icon, and the file can then be loaded using the Open... command in the File menu. You can also drag and drop molecule files onto a VIDA instance.

> vida 4std_stbl_disp.oeb.gz

Note

VIDA extensions need to be installed before they can be used. For instructions, see Installing VIDA Extensions.

The neutral difference (water - uncharged) grids map the polarity throughout the site and the stabilization grids describe the binding reaction (complex - apo - ligand): they have negative energies where water enhances binding affinity and positive where it decreases binding affinity.
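Written out per grid point r, the two definitions above read as follows. The symbols are introduced here only for illustration and are not SZMAP's own notation; X stands for any grid property, such as the neutral difference free energy.

```latex
\begin{aligned}
\Delta G_{\mathrm{neut\,diff}}(\mathbf{r}) &= \Delta G_{\mathrm{water}}(\mathbf{r}) - \Delta G_{\mathrm{uncharged}}(\mathbf{r}) \\
X_{\mathrm{stbl}}(\mathbf{r}) &= X_{\mathrm{complex}}(\mathbf{r}) - X_{\mathrm{apo}}(\mathbf{r}) - X_{\mathrm{ligand}}(\mathbf{r})
\end{aligned}
```

With this sign convention, a negative stabilization value marks a point where water enhances binding affinity and a positive value a point where it decreases binding affinity.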

Figure: Displaced Neutral Difference Free Energy

The figure Displaced Neutral Difference Free Energy was generated by selecting 4std_stbl_disp.oeb.gz in VIDA’s list window and running the WaterColor VIDA Extension to set up the contour levels and display styles, then opening the lig_disp:apo:PROTEIN... item in the list window and displaying the neut_diff_apo_free_energy_grid. You must run WaterColor on each new szmap results file opened in VIDA. See chapter WaterColor VIDA Extension for more detailed instructions.

The figure Displaced Neutral Difference Free Energy shows that the ligand displaces regions with both negative (yellow) and positive (purple) neutral difference free energies. These energies represent a polarity scale: negative regions favor polar groups while positive regions favor non-polar groups. Note that the hydroxyl oxygen nicely substitutes for solvent that acts as an H-bond acceptor. The region in the upper center, on the other hand, contains solvent that interacts with a pair of tyrosine hydroxyls and is displaced by the ligand without being replaced by a similar group. This suggests placing a polar group into this site.

Figure: Orientations and Displaced Neutral Difference Free Energy

The Water Orientation VIDA Extension will display the most probable water orientations for each point in the calculation, filtered to show only the dominant points (local minima and maxima). It makes it easy to determine where waters donate or accept H-bonds and where waters do not make any significant interactions. Open the control panel by selecting the Extensions >> Water Orientation menu in VIDA (it will remain open until you close it).

The figure Orientations and Displaced Neutral Difference Free Energy adds information indicating both the magnitude of the free energies and the geometry that must be taken into account in positioning a substituent. To generate the figure, set the Apply to: popup in the Water Orientation VIDA Extension to 4std_stbl_disp.oeb.gz/apo(lig_disp)—the ligand displacement apo results—and press the Update button.

Figure: Complex Neutral Difference Free Energy

The next stage is to examine the protein-ligand complex grid, illustrated in Complex Neutral Difference Free Energy. First, undisplay all grids from the lig_disp:apo:PROTEIN... entry in VIDA’s list window, along with the 4std_stbl_disp.oeb.gz/apo(lig_disp) entry from the previous orientation display. Next, open the complex:PROTEIN... section of VIDA’s list window to reveal the set of grids calculated for the complex. Select and display the neut_diff_free_energy_grid. Finally, set the Apply to: popup in the Water Orientation VIDA Extension to 4std_stbl_disp.oeb.gz/complex and press Update. The display shows two regions of negative free energy (a region at the bottom where solvent interacts with two histidines and the amide NH of the ligand, and a smaller region at the top, just beyond the displaced region, that interacts with the previously mentioned tyrosines) along with one region of positive free energy.

Figure: Stabilization Neutral Difference Free Energy

Looking at the Stabilization Neutral Difference Free Energy (by undisplaying any other grids or orientations and then opening the stabilization:PROTEIN... item in the VIDA list window and displaying the neut_diff_stbl_free_energy_grid), it is clear that only the lower region is stabilizing the bound ligand, adding weight to the idea of displacing water in the upper region.

Running SZMAP at Coordinates

szmap can be run at arbitrary coordinates, for example the atom positions in the ligand. Because this calculation is performed only over the atom coordinates specified in -at_coords <mol_file> it will run very quickly, around 1 minute for this example. While not as informative as a full grid calculation, “at coords” calculations are a rapid way to survey a binding site (about 20 seconds on four 2.8GHz processors).

> szmap -mpi_np 4 -prefix 4std_lig_coords -at_coords 4std_lig.oeb.gz -p 4std_prot.oeb.gz

The output for -at_coords calculations includes a text file containing tab-delimited tables of results and an .oeb file with szmap results as properties attached to the atoms used to define the coordinates.

Analyzing Coordinate Results

If the calculations were done at specified coordinates (using the -at_coords parameter) the results can be visualized with the Water Orientation VIDA Extension or the Color By Atom Properties VIDA Extension (which colors coordinates by calculated values). Open the file in VIDA using the command below or, if VIDA is already open, by using the File >> Open menu.

> vida 4std_lig_coords.oeb.gz

Figure: Apo Neutral Difference Free Energy at Ligand Coordinates

The figure Apo Neutral Difference Free Energy at Ligand Coordinates shows results similar to those in Orientations and Displaced Neutral Difference Free Energy. Calculating at the ligand coordinates is very quick but produces a coarse sampling and, in regions where the gradient is large, can be somewhat misleading.

To generate this figure, the Apply to: popup in the Water Orientation VIDA Extension panel was set to 4std_lig_coords.oeb.gz/apo and the Update button was pressed. Next, the apo: at_coords entry in the list window was selected and the “H” button in the style panel was used to display all the ligand hydrogens. With the apo: at_coords entry still highlighted in the list window, the Color By Atom Properties VIDA Extension was opened and the property szmap_neut_diff_free_energy was selected. If the panel has not been used before, select the color tiles in the center-left one by one and change them so that the left one is yellow, the center one is white, and the right one is purple (the standard SZMAP color scheme), and set the missing-values color tile to a dark gray that is easily distinguished from the others. Finally, the options “Lock middle color to zero” and “Missing Values” were checked. The dark gray atoms along the outer edges of the ligand are at positions where a water probe is slightly too large to fit without clashing, but it is worth noting that this can occasionally be alleviated by a shift of just a fraction of an Ångström.

Two-dimensional Coordinate Reports

In addition to the 3D graphics, you can also produce a printable .pdf report of -at_coords results with szmap_report. Each molecule is shown in 2D, with its atoms colored by the numerical value of the selected szmap result. If multiple ligands were used for the coordinate calculations, szmap_report will depict all of them for easy comparison.

> szmap_report 4std_lig_coords.oeb.gz 4std_lig_coords.pdf szmap_neut_diff_free_energy

Figure: Neutral Difference Free Energy in 2D

Figure Neutral Difference Free Energy in 2D shows a SZMAP Report for the neutral difference free energy with a surface representing how tightly or loosely the ligand fits into the binding site. See chapter szmap_report for more information on this surface representation. In this case, the phenolic ring is held tightly while the methyl group is adjacent to a cavity. The pocket around the fluorine is so tight that a probe water cannot fit there, indicated by the gray color of the fluorine atom.

Stabilization at Coordinates

While calculations at ligand atom coordinates cannot produce stabilization energies, calculations at coordinates other than the ligand can. They can also be very quick ways to explore properties of the complex, with the same caveat that they represent a coarse sampling.

In our example, the two crystallographic waters closest to the ligand are split from the original PDB file and used to calculate stabilization values at these coordinates. Although the splitting can also be done with VIDA or with a text editor, here we use the egrep command generally available on Linux, Mac OS X or Windows cygwin.

> egrep 'HOH A 187' 4std.pdb >  4std_selected_waters.pdb
> egrep 'HOH A 190' 4std.pdb >> 4std_selected_waters.pdb

> szmap -prefix 4std_wat_coords -at_coords 4std_selected_waters.pdb \
>       -mpi_np 4 -stbl -p 4std_prot.oeb.gz -l 4std_lig.oeb.gz

Tip

The line above with a ‘\’ as the very last character indicates that the command was too long to fit on one line and continues on the following line.
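The two egrep patterns above match text anywhere in a line, so if the PDB entry also contains ANISOU records for those waters, those lines are copied as well. The sketch below is a slightly stricter alternative: it matches only HETATM records whose residue name (columns 18-20) is HOH in the given chain, and it selects several residue numbers in one pass. select_waters is a hypothetical helper that assumes fixed-column PDB records.

```shell
#!/bin/sh
# select_waters (hypothetical helper): FILE CHAIN RESNUM...
# Prints HETATM records for water (HOH) residues with the given numbers
# in one chain. Assumes fixed-column PDB records: record name in columns
# 1-6, residue name in 18-20, chain ID in 22, residue number in 23-26.
select_waters() {
    file=$1; chain=$2; shift 2
    nums=$(printf '%s|' "$@")
    nums=${nums%|}
    grep -E "^HETATM.{11}HOH $chain *($nums) " "$file"
}
```

For example, select_waters 4std.pdb A 187 190 > 4std_selected_waters.pdb produces the same selection as the two egrep commands, minus any ANISOU lines.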

Analyzing Coordinate Stabilization Results

> vida 4std_wat_coords.oeb.gz

Figure: Complex Neutral Difference Free Energy at Water Coordinates

The figure Complex Neutral Difference Free Energy at Water Coordinates (with orientations and energies from the Water Orientation VIDA Extension) shows results similar to those in Complex Neutral Difference Free Energy, but they were produced much more quickly.

Figure: Stabilization Neutral Difference Free Energy at Water Coordinates

In Stabilization Neutral Difference Free Energy at Water Coordinates, the stabilization: at_coords waters were highlighted in the list window and colored based on the property szmap_neut_diff_stbl_free_energy using the Color By Atom Properties VIDA Extension. No orientation data is shown because orientation information is produced only by complex, ligand, and apo calculations, not by stabilization calculations. However, using VIDA’s generic data labeling capabilities, the coordinates can be labeled by their neutral difference stabilization free energy values. This is a quick way to visualize energies from a stabilization grid calculation. Rather than using crystallographic waters in the calculation, the “at coords” positions can be waters from a previous grid calculation, obtained with the “Split” tool in the Water Orientation VIDA Extension, or positions generated by some other procedure.

Running Gameplan

The application gameplan uses szmap to analyze ways to modify ligand chemistry based on water structure and energetics in the immediate environment of the ligand. By testing a limited number of coordinates based on the structure of the ligand and the binding site, gameplan quickly generates a description of how compatible the ligand is with the binding site, a series of hypotheses for ligand variants, and a set of stabilizing or destabilizing waters.

Again, assuming the input protein and ligand structures have been prepared as described above, a gameplan calculation can be performed as follows:

> gameplan -szmap_mpi_np 4 -prefix 4std_gameplan -p 4std_prot.oeb.gz -l 4std_lig.oeb.gz

The protein and ligand are analyzed and the most informative sample points are identified. szmap is then run on those points, and the results are transformed into a set of potential ligand modifications. All of this takes about 4 minutes on four 2.8GHz processors and the output is placed in the file “4std_gameplan.oeb.gz”, which can be analyzed in VIDA using the WaterColor VIDA Extension. The file “4std_gameplan.log” documents the results, including the energy values for each attachment hypothesis.

The command line option -szmap_mpi_np # controls how gameplan runs szmap to compute the results. More information on options can be found in chapter gameplan.

Analyzing Gameplan Results

Results from gameplan contain ideas for ligand modifications and points where water is stabilizing or destabilizing the protein/ligand complex.

> vida 4std_gameplan.oeb.gz

Figure: Gameplan Hypothesis for a Polar Attachment

In Gameplan Hypothesis for a Polar Attachment, the display of the gameplan results was configured by highlighting any part of the results in the list window and running the WaterColor VIDA Extension. Using the up-arrow and down-arrow keys, you can browse through the different components of the results in the list window. Shown in pink is a suggestion for a site where a hydrogen-bond acceptor might interact favorably with nearby tyrosine residues. The yellow ball indicates that water makes a strong polar interaction at this site in the apo-protein. For a complete description, see chapter gameplan.