pch

The pch utility helps prepare input files for use with szmap by adding partial charges (thus the name) and radii to atoms, and by separating protein chains from any ligand(s) and any waters. It reads a structure file, preferably one where the hydrogens have already been added and oriented in optimal positions, and writes out two charged molecule files: one with the protein and any metals, and one with other (non-water) molecules such as ligands. Waters can be written to a separate file if needed. The molecules in these output files have AmberFF94 partial charges assigned to protein atoms, formal charges on the ions, and AM1BCC partial charges on heterogen atoms. Modifications to residues, such as sugars, covalently bonded small molecules, or non-standard residues, are charged separately from the rest of the protein using AM1BCC, and the charges are then transferred back to the modified protein.

Tip

The OpenEye Python Cookbook contains a recipe for assigning canonical AM1BCC partial charges to ligands which are much less dependent on ligand conformation, something pch cannot currently do.

By default, pch will eliminate all alternate conformations from the input except the one with the highest occupancy. This behavior can be overridden with the option -keep_alts, but szmap will not process more than one conformation, so keeping multiple alternate conformations is not appropriate for input to szmap.
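The default selection rule can be illustrated with a minimal Python sketch. This is not pch's implementation: it assumes standard fixed-column PDB records, chooses the highest-occupancy record atom by atom (which may differ from how pch groups alternate locations), and uses a hypothetical helper, atom_line, only to build sample records.

```python
def keep_highest_occupancy(pdb_lines):
    """Keep one record per atom: the alternate location with the highest occupancy."""
    best = {}  # atom key -> (occupancy, index of the line to keep)
    for idx, line in enumerate(pdb_lines):
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Assumed PDB columns: atom name 13-16, residue name 18-20,
        # chain 22, residue number plus insertion code 23-27.
        key = (line[12:16], line[17:20], line[21], line[22:27])
        occupancy = float(line[54:60])  # occupancy, columns 55-60
        if key not in best or occupancy > best[key][0]:
            best[key] = (occupancy, idx)
    keep = {idx for _, idx in best.values()}
    return [line for idx, line in enumerate(pdb_lines)
            if not line.startswith(("ATOM", "HETATM")) or idx in keep]


def atom_line(name, alt, occ):
    """Hypothetical helper: format a minimal fixed-column ATOM record."""
    return "ATOM  %5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        1, name, alt, "SER", "A", 10, " ", 0.0, 0.0, 0.0, occ, 20.0)


sample = [
    atom_line(" CA ", " ", 1.00),
    atom_line(" CB ", "A", 0.40),
    atom_line(" CB ", "B", 0.60),
]
kept = keep_highest_occupancy(sample)
```

Here kept holds the CA record and the B alternate of CB, since B has the higher occupancy.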

Because many proteins require co-factors to function, pch provides a rich set of options to define precisely which residue(s) will go into the ligand file, leaving the rest to be incorporated into the protein file.

Examples

A useful protein preparation procedure before running szmap or gameplan starts with deleting any unwanted subunits, detergent, and other non-essential molecules. Next, hydrogens are added and their orientations optimized; see the Tutorial and the chapter mkhetdict for more information. Then the protein and any ions are separated from any small molecules, and partial charges and radii are added using pch:

> pch structure.pdb prot+ions.oeb.gz small-mols.oeb.gz

Because the output structures were saved in OEBinary (.oeb) format, the partial charges and radii can be inspected in VIDA and used by szmap and gameplan.

Note

A warning that the formal charge is not equal to the sum of the partial charges usually indicates one or more atoms were missing from the protein structure. The degree to which this affects calculation results depends on the distance of the group with missing atoms from the region where the szmap calculations are performed. Missing atoms more than 10 Å from the binding site usually do not alter the results significantly. pch will display a list of any non-hydrogen atoms missing from standard protein residues, which can help you to check the location of missing atoms with respect to the binding site.
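The two checks described in this note can be sketched in Python as follows. This is only an illustration: the function names and the 0.01 charge tolerance are assumptions, not values used by pch, and the 10 Å cutoff is the rule of thumb quoted above.

```python
import math


def charge_mismatch(partial_charges, formal_charge, tolerance=0.01):
    """True if the summed partial charges differ from the formal charge.

    The tolerance is an assumed value, chosen only for this sketch.
    """
    return abs(sum(partial_charges) - formal_charge) > tolerance


def near_binding_site(residue_xyz, site_xyz, cutoff=10.0):
    """True if any atom of the incomplete residue lies within cutoff (in Å)
    of any atom that defines the binding site."""
    return any(math.dist(a, s) <= cutoff
               for a in residue_xyz for s in site_xyz)
```

For example, a residue whose partial charges sum to 0.8 against a formal charge of +1 is flagged by charge_mismatch, and near_binding_site then indicates whether its remaining atoms are close enough to the site to be a concern.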

If the structure contains small-molecules other than the ligand, such as ions, detergent, or co-factors, use -lig_res or one of the other selection options to ensure that only the specified residue will be placed in the second (ligand) output file and all the other non-water molecules are placed in the first (protein) file.

> pch -lig_res cam 2cppH.pdb 2cpp_prot.oeb.gz 2cpp_lig.oeb.gz

Tip

To identify a peptide or nucleic acid as the ligand, select it by chain or by a range of residue numbers.

If you need to modify the charge pch assigns (for example, to change iron II to iron III), either modify the charge in VIDA using the builder, or save the output in .mol2 or (DelPhi) .pdb format rather than .oeb format so that it can easily be edited as text. DelPhi format is a non-standard version of the PDB format where the radii and partial charges are stored in the occupancy and B-factor fields, respectively.

> pch -lig_res cam 2cppH.pdb 2cpp_prot.pdb 2cpp_lig.oeb.gz
> edit 2cpp_prot.pdb
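Instead of editing by hand, the charge field can be rewritten with a short script. The Python sketch below assumes the standard PDB column positions (occupancy in columns 55-60, B-factor in columns 61-66) and a three-decimal charge; the precision pch actually writes may differ, so inspect a real output line first. The iron record, including its radius of 1.30, is made up for illustration.

```python
def read_radius_charge(line):
    """Return (radius, charge) from the occupancy and B-factor columns."""
    return float(line[54:60]), float(line[60:66])


def set_charge(line, charge):
    """Return the record with a new partial charge in columns 61-66."""
    field = f"{charge:6.3f}"
    if len(field) > 6:
        raise ValueError("charge does not fit in a 6-character field")
    padded = line.rstrip("\n").ljust(66)
    return padded[:60] + field + padded[66:]


# Hypothetical iron record, built with standard PDB column widths.
iron = "HETATM%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f" % (
    3200, "FE", " ", "HEM", "A", 417, " ", 12.345, 23.456, 34.567, 1.30, 2.00)
iron_iii = set_charge(iron, 3.0)  # change the charge from +2 to +3
```

Only the charge field changes; the coordinates and the radius are copied through untouched.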

If you store the protein and ligand you want to use as input to pch in separate files, run pch twice and discard the extra (empty) files. Just make sure the protein and the ligand both have hydrogens and that these hydrogens are in positions that make all the appropriate interactions with the corresponding protein or ligand.

> pch proteinH.pdb prot.oeb.gz /tmp/junk.oeb.gz
> pch ligand.sdf /tmp/junk.oeb.gz lig.oeb.gz

Elements

Because small molecules and modifications to amino acids are charged with AM1BCC and szmap uses MMFF van der Waals terms, there are restrictions on the elements that may be used with szmap and gameplan.

AM1BCC supports the following elements:

H, C, N, O, F, P, S, Cl, Br, I, and Si

MMFF supports the following elements:

H, B, C, N, O, F, P, S, Cl, Br, I, Si, Se,
Li, Na, K, Ca, Fe, Zn, Cu, and Mg

pch will generate a warning if other elements are found in the input. If the offending atom happens to be a metal used to determine the crystallographic phase and it is not related to the binding site, you can usually edit it out of the input and rerun pch. If, on the other hand, you need to replace this atom with a “reasonable facsimile”, you can specify -fix_elements to request that pch attempt such a replacement, for example replacing Hg²⁺ with Zn²⁺. Note that the oxidation state may be a problem regardless of the substitution: if molybdenum IV is converted to iron II, you can’t just change this to iron IV because MMFF only contains van der Waals parameters for iron II and iron III. Future versions of SZMAP will provide van der Waals parameters across a wider range of elements and oxidation states.
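To check a structure before running pch, the two element lists above can be encoded directly. The Python sketch below is a stand-alone pre-check, not part of pch; the function name and its charged_with_am1bcc argument are illustrative assumptions.

```python
# Element lists copied from the tables above.
AM1BCC_ELEMENTS = {"H", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I", "Si"}
MMFF_ELEMENTS = {"H", "B", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I",
                 "Si", "Se", "Li", "Na", "K", "Ca", "Fe", "Zn", "Cu", "Mg"}


def unsupported_elements(elements, charged_with_am1bcc):
    """Return the sorted elements that would trigger a warning.

    Atoms charged with AM1BCC (small molecules, modified residues) must be
    in both lists; all other atoms only need MMFF parameters.
    """
    if charged_with_am1bcc:
        allowed = AM1BCC_ELEMENTS & MMFF_ELEMENTS
    else:
        allowed = MMFF_ELEMENTS
    return sorted(set(elements) - allowed)
```

For instance, unsupported_elements(["C", "N", "Hg"], False) reports mercury, while boron and selenium are reported only for atoms that would be charged with AM1BCC.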

Command Line Interface

A description of the basic command line interface can be obtained by executing pch with no arguments.

prompt> pch

will generate output similar to the following:

          :jGf:
        :jGDDDDf:              PPPPP     CCC   H    H
      ,fDDDGjLDDDf,            P    P   C   C  H    H
    ,fDDLt:   :iLDDL;          P    P  C       H    H
  ;fDLt:         :tfDG;        PPPPP   C       HHHHHH
,jft:   ,ijfffji,   :iff       P       C       H    H
     .jGDDDDDDDDDGt.           P        C   C  H    H
    ;GDDGt:''':tDDDG,          P         CCC   H    H
   .DDDG:       :GDDG.
   ;DDDj         tDDDi         Copyright (c) 2010-2015
   ,DDDf         fDDD,         OpenEye Scientific Software, Inc.
    LDDDt.     .fDDDj          Version: 1.2.1
    .tDDDDfjtjfDDDGt           Release: 20150305
      :ifGDDDDDGfi.            OEChem version: 1.9.2 20150305
          .:::.                Platform: redhat-RHEL5-g++4.1-x64
  ......................
  DDDDDDDDDDDDDDDDDDDDDD
  DDDDDDDDDDDDDDDDDDDDDD

  Licensed for the exclusive use of Company Name.
  Licensed for use only in Site.
  License expires on August 15, 2015.

No arguments specified on the command line
pch : add charges and radii and split into protein+ion and ligand files
Required parameters:
    -input_mol : Input molecule file. Should have coordinates for hydrogens
                 as well as heavy atoms.
For more help type:
  pch --help

Required Parameters

-input_mol <filename>
-i <filename>

[keyless parameter 1]

Input structure file. Hydrogens should already be added and their positions optimized.

File type     Extension
OEBinary      .oeb .oeb.gz
PDB           .pdb .ent .pdb.gz .ent.gz
SDF           .sdf .mol .sdf.gz .mol.gz
MOL2          .mol2 .mol2.gz
MacroModel    .mmod .mmod.gz

Command Line Options

Output Files

-output_prot <filename>
-p <filename>

[keyless parameter 2; default pch_prot.oeb.gz]

Name for output protein file. This file is either a DelPhi format PDB file (radii and charges stored in the occupancy and B-factor fields) or another format that retains the partial charges: OEBinary or MOL2. Saving in OEBinary format allows the charges and radii to be easily inspected in VIDA. The .gz extension indicates a gzipped file, which is usually more compact than the uncompressed format but otherwise identical.

File type      Extension
OEBinary       .oeb .oeb.gz
PDB(DelPhi)    .pdb .ent .pdb.gz .ent.gz
MOL2           .mol2 .mol2.gz

-output_lig <filename>
-l <filename>

[keyless parameter 3; default pch_lig.oeb.gz]

Name for output ligand file. This file is either a DelPhi format PDB file (radii and charges stored in the occupancy and B-factor fields) or another format that retains the partial charges: OEBinary or MOL2. Saving in OEBinary format allows the charges and radii to be easily inspected in VIDA. The .gz extension indicates a gzipped file, which is usually more compact than the uncompressed format but otherwise identical.

File type      Extension
OEBinary       .oeb .oeb.gz
PDB(DelPhi)    .pdb .ent .pdb.gz .ent.gz
MOL2           .mol2 .mol2.gz

-output_waters <filename>
-waters <filename>

Name for output file containing waters. The following 3D file formats are supported:

File type     Extension
OEBinary      .oeb .oeb.gz
PDB           .pdb .ent .pdb.gz .ent.gz
SDF           .sdf .mol .sdf.gz .mol.gz
MOL2          .mol2 .mol2.gz
MacroModel    .mmod .mmod.gz

Parameters

-nonsymmetrized_charges
-nonsym

Generate AM1BCC partial charges where topologically equivalent atoms are not forced to have the same charge. For example, if this option is false, the hydrogens on a methyl group will all have the same partial charge. If it is true, the charges will generally differ for each of these “equivalent” hydrogens.

Since szmap does not vary the conformation of the ligand, non-symmetrized charges can be used to describe a specific conformation of each atom. But for any workflow down the road where the conformation may change, using symmetrized charges is probably more appropriate.

[default: true]
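The difference between the two settings can be illustrated with a small Python sketch that averages charges over groups of equivalent atoms. This is a conceptual illustration only, not the AM1BCC procedure itself: the function name, the symmetry-class labels, and the charge values are all hypothetical.

```python
from collections import defaultdict


def symmetrize(charges, symmetry_classes):
    """Give every atom the mean charge of its symmetry class.

    Atoms sharing a class label are treated as topologically equivalent.
    The total charge is unchanged by the averaging.
    """
    groups = defaultdict(list)
    for index, label in enumerate(symmetry_classes):
        groups[label].append(index)
    result = list(charges)
    for members in groups.values():
        mean = sum(charges[i] for i in members) / len(members)
        for i in members:
            result[i] = mean
    return result


# Hypothetical methyl group: one carbon (class 1) and three hydrogens (class 2).
raw = [-0.18, 0.05, 0.07, 0.06]
sym = symmetrize(raw, [1, 2, 2, 2])
```

In this made-up example the three hydrogens start with slightly different conformation-dependent charges and end up sharing their average, while the carbon is unchanged.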

-keep_alts
Retain alternate locations in input. Use this option with caution: molecules with alternate locations cannot be processed sensibly by most programs, and szmap will discard all but the highest-occupancy location.
-no_fixup
Don’t attempt to fix problems with protein charges (inconsistent HIS and N-term charges).
-fix_elements
Attempt to substitute elements that will work with szmap for those that won’t work (due to lack of MMFF parameters) and adjust metal charges.
-keep_numbers
Do not renumber atoms in output.

Ligand Selection

-ligand_res <RES>
-lig_res <RES>
If specified, only ligands with this (case insensitive) residue name will be written to -output_lig, the rest (e.g. cofactors) will be added to -output_prot. Results will be produced even if a residue named RES is not found.
-ligand_resnum <int>
-lig_num <int>
If specified, only ligands with this residue number will be written to -output_lig, the rest (e.g. cofactors) will be added to -output_prot.
-ligand_resnum_end <int>
-lig_end <int>
If specified, this is the last residue number (in a range of residues beginning with -ligand_resnum) that will be written to -output_lig. Although -ligand_inscode will be ignored if this option is used, the other selection criteria (residue name, chain ID, model number) will operate normally.
-ligand_inscode <char>
-lig_ins <char>
If specified, only ligands with this (case insensitive) residue number insertion code will be written to -output_lig, the rest (e.g. cofactors) will be added to -output_prot.
-ligand_chain <char>
-lig_ch <char>
If specified, only ligands with this (case insensitive) chain ID will be written to -output_lig, the rest (e.g. cofactors) will be added to -output_prot.
-ligand_model <int>
-lig_model <int>
If specified, only ligands from this model will be written to -output_lig, the rest (e.g. cofactors) will be added to -output_prot.
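When pch is driven from a script, the selection options can be assembled programmatically. The Python sketch below only builds the argument list, using the option names documented above; the function and its keyword names are illustrative assumptions. It leaves out -ligand_inscode whenever a residue range is given, since that option is ignored in that case.

```python
def pch_command(input_mol, output_prot, output_lig, res=None, resnum=None,
                resnum_end=None, inscode=None, chain=None, model=None):
    """Build a pch argument list from optional ligand selection criteria."""
    cmd = ["pch"]
    if res is not None:
        cmd += ["-ligand_res", res]
    if resnum is not None:
        cmd += ["-ligand_resnum", str(resnum)]
    if resnum_end is not None:
        if resnum is None:
            raise ValueError("a residue range needs a starting residue number")
        cmd += ["-ligand_resnum_end", str(resnum_end)]
    elif inscode is not None:
        # Only meaningful without a range; pch ignores it otherwise.
        cmd += ["-ligand_inscode", inscode]
    if chain is not None:
        cmd += ["-ligand_chain", chain]
    if model is not None:
        cmd += ["-ligand_model", str(model)]
    cmd += ["-input_mol", input_mol,
            "-output_prot", output_prot,
            "-output_lig", output_lig]
    return cmd


# Select a peptide ligand by chain and residue range (file names are examples).
peptide = pch_command("complexH.pdb", "prot.oeb.gz", "pep.oeb.gz",
                      chain="B", resnum=1, resnum_end=12, inscode="A")
```

The resulting list can be passed to subprocess.run(peptide, check=True) on a machine where pch is installed.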

Other Options

-verbose
-v
Print additional information.
