Creating OEDesignUnits from a PDB file

Preparing a biological structure file (PDB or mmCIF) into a fully charged, hydrogenated, and componentized molecular object, the design unit (DU; OEDesignUnit), is one of the more advanced capabilities offered by Spruce TK. This example shows how to construct DUs from an input PDB file using the OEMakeDesignUnits function.

Command Line Interface

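This example takes an input PDB file, optionally accompanied by an electron density MTZ file and a loop template database, and writes the resulting DUs as .oedu files named after the input file, using paths relative to the current working directory (see OEMakeDesignUnits for details on the API).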

make_design_units.py <input biomolecular PDB> [<electron density mtz>] [<LoopModelingTemplateDB>]
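
The way the script tells the two optional arguments apart can be sketched in plain Python. This is a minimal illustration of that rule only and is not part of the toolkit; `Inputs` and `classify_args` are hypothetical names introduced here, and the list passed in excludes the program name.

```python
from typing import NamedTuple, Optional, Sequence


class Inputs(NamedTuple):
    """Hypothetical container for the three command-line inputs."""
    structure: str
    mtz: Optional[str]
    loop_db: Optional[str]


def classify_args(args: Sequence[str]) -> Inputs:
    """Assign roles to the arguments the same way the example script does."""
    if not 1 <= len(args) <= 3:
        raise ValueError("expected <infile> [<mtzfile>] [<loopdbfile>]")
    structure, mtz, loop_db = args[0], None, None
    if len(args) == 3:
        # Both optional arguments given: the order is fixed, MTZ then loop DB.
        mtz, loop_db = args[1], args[2]
    elif len(args) == 2:
        # One optional argument: the substring "mtz" decides its role.
        if "mtz" in args[1]:
            mtz = args[1]
        else:
            loop_db = args[1]
    return Inputs(structure, mtz, loop_db)


if __name__ == "__main__":
    print(classify_args(["3tpp.pdb", "spruce_bace.loop_db"]))
```

Because the single-optional-argument case relies on the substring "mtz", a loop database whose path happens to contain "mtz" would be treated as an electron density file; passing all three arguments avoids the ambiguity.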

Code

Download code

make_design_units.py, together with the input files 3tpp.pdb (the input PDB file), 3tpp.mtz (the input MTZ file), and spruce_bace.loop_db (the input loopDB file)

#!/usr/bin/env python
# (C) 2022 Cadence Design Systems, Inc. (Cadence) 
# All rights reserved.
# TERMS FOR USE OF SAMPLE CODE The software below ("Sample Code") is
# provided to current licensees or subscribers of Cadence products or
# SaaS offerings (each a "Customer").
# Customer is hereby permitted to use, copy, and modify the Sample Code,
# subject to these terms. Cadence claims no rights to Customer's
# modifications. Modification of Sample Code is at Customer's sole and
# exclusive risk. Sample Code may require Customer to have a then
# current license or subscription to the applicable Cadence offering.
# THE SAMPLE CODE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED.  OPENEYE DISCLAIMS ALL WARRANTIES, INCLUDING, BUT
# NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
# PARTICULAR PURPOSE AND NONINFRINGEMENT. In no event shall Cadence be
# liable for any damages or liability in connection with the Sample Code
# or its use.

#############################################################################
# Script to prepare proteins into design units
#############################################################################
import sys
import os
from openeye import oechem
from openeye import oegrid
from openeye import oespruce


def main(argv=sys.argv):

    if len(argv) < 2 or len(argv) > 4:
        oechem.OEThrow.Usage("%s <infile> [<mtzfile>] [<loopdbfile>]" % argv[0])

    ifs = oechem.oemolistream()
    ifile = argv[1]
    if not ifs.open(ifile):
        oechem.OEThrow.Fatal("Unable to open %s for reading" % ifile)

    include_loop = False
    include_ed = False
    ed = oegrid.OESkewGrid()

    if len(argv) > 2:
        if len(argv) == 4 or (len(argv) == 3 and "mtz" in argv[2]):
            edfile = argv[2]
            if not oegrid.OEReadMTZ(edfile, ed, oegrid.OEMTZMapType_Fwt):
                oechem.OEThrow.Fatal(
                    "Unable to read electron density file %s" % edfile
                )  # noqa
            include_ed = True
        if len(argv) == 4:
            loopfile = argv[3]
            include_loop = True
        elif len(argv) == 3 and "mtz" not in argv[2]:
            loopfile = argv[2]
            include_loop = True

    if ifs.GetFormat() not in [oechem.OEFormat_PDB, oechem.OEFormat_CIF]:
        oechem.OEThrow.Fatal("Only works for .pdb or .cif input files")

    ifs.SetFlavor(oechem.OEFormat_PDB, oechem.OEIFlavor_PDB_SpruceDefault)
    ifs.SetFlavor(oechem.OEFormat_MMCIF, oechem.OEIFlavor_MMCIF_SpruceDefault)

    mol = oechem.OEGraphMol()
    if not oechem.OEReadMolecule(ifs, mol):
        oechem.OEThrow.Fatal("Unable to read molecule from %s" % ifile)

    allow_filter_errors = False
    metadata = oespruce.OEStructureMetadata()
    filter_opts = oespruce.OESpruceFilterOptions()
    makedu_opts = oespruce.OEMakeDesignUnitOptions()
    makedu_opts.GetPrepOptions().GetBuildOptions().GetLoopBuilderOptions().SetBuildTails(False)
    if include_loop:
        makedu_opts.GetPrepOptions().GetBuildOptions().GetLoopBuilderOptions().SetLoopDBFilename(
            loopfile
        )
    
    # Standardize the structure and run the Spruce filter before building DUs.
    # The variable is named spruce_filter to avoid shadowing Python's built-in filter().
    spruce_filter = oespruce.OESpruceFilter(filter_opts, makedu_opts)
    ret_filter = spruce_filter.StandardizeAndFilter(mol, ed, metadata)
    if ret_filter != oespruce.OESpruceFilterIssueCodes_Success:
        oechem.OEThrow.Warning("This structure fails spruce filter due to: ")
        oechem.OEThrow.Warning(spruce_filter.GetMessages())
        if not allow_filter_errors:
            oechem.OEThrow.Fatal("This structure fails spruce filter")

    if include_ed:
        design_units = oespruce.OEMakeDesignUnits(mol, ed, metadata, makedu_opts)
    else:
        design_units = oespruce.OEMakeDesignUnits(mol, metadata, makedu_opts)

    validator = oespruce.OEValidateDesignUnit()

    base_name = os.path.basename(ifile)[:-4] + "_DU_{}.oedu"
    for i, design_unit in enumerate(design_units):
        ret_validator = validator.Validate(design_unit, metadata)

        if ret_validator != oespruce.OEDesignUnitIssueCodes_Success:
            oechem.OEThrow.Warning("This generated DU did not pass DU validator.")
            oechem.OEThrow.Warning(validator.GetMessages())
        oechem.OEWriteDesignUnit(base_name.format(i), design_unit)


if __name__ == "__main__":
    sys.exit(main(sys.argv))
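
The script builds output names by slicing the last four characters off the input file name, which fits `.pdb` and `.cif` but not extensions of other lengths (for example a hypothetical `3tpp.mmcif`). As a sketch of one possible alternative, assuming the same `<name>_DU_<index>.oedu` pattern is wanted, `os.path.splitext` removes the extension regardless of its length. The helper name `du_output_names` is hypothetical and not part of the toolkit.

```python
import os
from typing import List


def du_output_names(input_path: str, count: int) -> List[str]:
    """Return the .oedu file names for `count` design units.

    The stem is taken with os.path.splitext instead of a fixed [:-4] slice,
    so the result does not depend on the extension being four characters.
    """
    stem = os.path.splitext(os.path.basename(input_path))[0]
    return ["{}_DU_{}.oedu".format(stem, i) for i in range(count)]


if __name__ == "__main__":
    for name in du_output_names("3tpp.pdb", 2):
        print(name)
```

The names contain no directory component, so files written with them land in the current working directory, as in the original script.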

Example

make_design_units.py 3tpp.pdb 3tpp.mtz spruce_bace.loop_db

will generate the following output:

DPI: 0.06, RFree: 0.18, Resolution: 1.60
Processing BU # 0 with title: BETA-SECRETASE 1, chains A, alt: A
Warning: For residue ARG -4   A 1   removing clashing solvent molecule HOH 597   A 2
Warning: For residue ARG -4   A 1   removing clashing solvent molecule HOH 487 A A 1
Warning: For residue ARG 7   A 1   removing clashing solvent molecule HOH 498   A 2
Warning: For residue ARG 7   A 1   removing clashing solvent molecule HOH 730   A 2
Warning: For residue ARG 7   A 1   removing clashing solvent molecule HOH 653   A 2
Warning: For residue ARG 128   A 1   removing clashing solvent molecule HOH 523   A 2
Warning: For residue ARG 128   A 1   removing clashing solvent molecule HOH 654   A 2
Warning: For residue LYS 142   A 1   removing clashing solvent molecule HOH 550   A 2
Warning: For residue LYS 142   A 1   removing clashing solvent molecule HOH 691   A 2
Warning: For residue ARG 205   A 1   removing clashing solvent molecule HOH 423   A 2
Warning: For residue ARG 205   A 1   removing clashing solvent molecule HOH 703   A 2
Warning: For residue LYS 256   A 1   removing clashing solvent molecule HOH 604   A 2
Found gap between ALA 157   A 1   and VAL 170   A 1  , with sequence GFPLNQSEVLAS
Found gap between GLU 310   A 1   and THR 314   A 1  , with sequence DVA
Opened database spruce_bace.loop_db
LoopDatabase Info:
    276 loops from RSCB last synced on 03-19-2020, were added to LoopTemplateDatabase on 03-19-2020 using Spruce Toolkit 1.0.0.a
    The loop database was built with a max loop length of 22, a termini crop length of 2, and excluding regular secondary structures
Opened database spruce_bace.loop_db
LoopDatabase Info:
    276 loops from RSCB last synced on 03-19-2020, were added to LoopTemplateDatabase on 03-19-2020 using Spruce Toolkit 1.0.0.a
    The loop database was built with a max loop length of 22, a termini crop length of 2, and excluding regular secondary structures
Processing BU # 1 with title: BETA-SECRETASE 1, chains A, alt: B
Warning: For residue ARG -4   A 1   removing clashing solvent molecule HOH 597   A 2
Warning: For residue ARG -4   A 1   removing clashing solvent molecule HOH 487 A A 1
Warning: For residue LYS 9   A 1   removing clashing solvent molecule HOH 675   A 2
Warning: For residue ARG 128   A 1   removing clashing solvent molecule HOH 523   A 2
Warning: For residue ARG 128   A 1   removing clashing solvent molecule HOH 654   A 2
Warning: For residue LYS 142   A 1   removing clashing solvent molecule HOH 448   A 2
Warning: For residue LYS 142   A 1   removing clashing solvent molecule HOH 735   A 2
Warning: For residue ARG 205   A 1   removing clashing solvent molecule HOH 423   A 2
Warning: For residue ARG 205   A 1   removing clashing solvent molecule HOH 703   A 2
Warning: For residue LYS 256   A 1   removing clashing solvent molecule HOH 604   A 2
Found gap between ALA 157   A 1   and VAL 170   A 1  , with sequence GFPLNQSEVLAS
Found gap between GLU 310   A 1   and THR 314   A 1  , with sequence DVA
Opened database spruce_bace.loop_db
LoopDatabase Info:
    276 loops from RSCB last synced on 03-19-2020, were added to LoopTemplateDatabase on 03-19-2020 using Spruce Toolkit 1.0.0.a
    The loop database was built with a max loop length of 22, a termini crop length of 2, and excluding regular secondary structures
Opened database spruce_bace.loop_db
LoopDatabase Info:
    276 loops from RSCB last synced on 03-19-2020, were added to LoopTemplateDatabase on 03-19-2020 using Spruce Toolkit 1.0.0.a
    The loop database was built with a max loop length of 22, a termini crop length of 2, and excluding regular secondary structures
DU: BETA-SECRETASE 1(A)altA > 5HA(A-999), Iridium Category: HT, LaD: 1.00, ASaD: 1.00, DPI: 0.06, POL: false, POAS: false, AltConfs: false, PackRes: false, Excp: false, IrrRFree: false, PossCov: false
DU: BETA-SECRETASE 1(A)altB > 5HA(A-999), Iridium Category: HT, LaD: 1.00, ASaD: 1.00, DPI: 0.06, POL: false, POAS: false, AltConfs: false, PackRes: false, Excp: false, IrrRFree: false, PossCov: false
Skipping redundant DU with alts outside the site of interest, renaming existing to collapse alts
Discarding redundant alt DU with title BETA-SECRETASE 1(A)altB > 5HA(A-999)
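
The `DU:` summary lines in the output above follow a regular `key: value` layout, so they can be collected into records for downstream bookkeeping. The sketch below is plain Python and rests on assumptions that are not part of Spruce TK: the DU title contains no `, ` separator, and the log format matches the lines shown here. `parse_du_summary` and `_convert` are hypothetical helper names.

```python
SAMPLE = (
    "DU: BETA-SECRETASE 1(A)altA > 5HA(A-999), Iridium Category: HT, "
    "LaD: 1.00, ASaD: 1.00, DPI: 0.06, POL: false, POAS: false, "
    "AltConfs: false, PackRes: false, Excp: false, IrrRFree: false, "
    "PossCov: false"
)


def _convert(value):
    """Turn 'true'/'false' into bool, numbers into float, leave the rest as str."""
    if value in ("true", "false"):
        return value == "true"
    try:
        return float(value)
    except ValueError:
        return value


def parse_du_summary(line):
    """Split one 'DU:' summary line into a dict keyed by the printed labels."""
    prefix = "DU: "
    if not line.startswith(prefix):
        raise ValueError("not a DU summary line")
    # Assumption: the title is the first comma-separated item and has no ', ' inside.
    title, *pairs = line[len(prefix):].strip().split(", ")
    record = {"title": title}
    for pair in pairs:
        key, value = pair.split(": ", 1)
        record[key] = _convert(value)
    return record


if __name__ == "__main__":
    for key, value in parse_du_summary(SAMPLE).items():
        print(key, "=", value)
```

Keeping the printed labels as dictionary keys avoids guessing at the meaning of the abbreviations; consult the Spruce TK documentation before interpreting individual fields.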

See also