Spruce is used to process PDB files containing the results of X-ray crystallography into molecule files usable for molecular modeling. Since these files are actually experimental output, some processing is required before use.
The design unit is an object that contains the extracted and prepared parts of a single biological unit (BU), ready for modeling. The parts include:
A protein can be structurally superimposed onto a reference protein structure using the OESpruce TK. Proteins can be superimposed either on atomic coordinates, using the OEStructuralSuperposition class, or on secondary structure elements, using the OESecondaryStructureSuperposition class.
The OEStructuralSuperposition class can superimpose proteins using any of the following four methods:
The OESecondaryStructureSuperposition class can superimpose proteins using the following method:
All structural superposition methods in the OEStructuralSuperposition class have a corresponding score from the sequence alignment that was used to find the matching atoms of both proteins. This score comes from the output of the OESequenceAlignment class. A larger score indicates a better sequence alignment, and a score below a small threshold (around 200) should be treated as a sign of a bad alignment.
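As a rough illustration of how this threshold might be applied, the following sketch filters candidate reference structures by their alignment score. The function name, the candidate names, and the scores are all hypothetical, and the cutoff of 200 is only the approximate value quoted above, not a constant exposed by the toolkit.

```python
# Approximate cutoff quoted in the text; an assumption, not a toolkit constant.
POOR_ALIGNMENT_SCORE = 200


def usable_alignments(scored_refs, cutoff=POOR_ALIGNMENT_SCORE):
    """Return (name, score) pairs at or above the cutoff, best score first."""
    kept = [(name, score) for name, score in scored_refs if score >= cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical sequence alignment scores for three reference structures.
candidates = [("ref_a", 1450), ("ref_b", 120), ("ref_c", 610)]
print(usable_alignments(candidates))
```

In practice the scores would come from the sequence alignment behind each superposition; only the filtering logic is shown here.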
All structural superposition methods in the OEStructuralSuperposition class have a corresponding RMSD value for superposition that can be loosely associated with the quality of the superposition. The OESecondaryStructureSuperposition class does not have an RMSD value, but instead uses the Tanimoto score from the underlying shape overlap calculation.
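To make these two quality measures concrete, here is a minimal, toolkit-independent sketch. It assumes the atoms have already been matched and superimposed, so the RMSD is computed directly over paired coordinates, and it assumes the usual shape Tanimoto definition (overlap divided by the sum of the self-overlaps minus the overlap). Neither function is part of OESpruce TK, and the toolkit's own calculation may differ in detail.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def shape_tanimoto(overlap_ab, self_overlap_a, self_overlap_b):
    """Tanimoto coefficient from a shape overlap and the two self-overlaps."""
    return overlap_ab / (self_overlap_a + self_overlap_b - overlap_ab)


# Hypothetical matched atoms: the second set is the first shifted 2 units along z.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
shifted = [(x, y, z + 2.0) for x, y, z in reference]
print(rmsd(reference, shifted))
print(shape_tanimoto(2.0, 4.0, 6.0))
```

A lower RMSD means the matched atoms sit closer together, while a Tanimoto closer to 1 means the shapes overlap more completely, so the two values run in opposite directions.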
Reading a PDB file correctly for use in subsequent modeling tasks can be challenging. PDB header information and alternate location codes within the PDB file will be lost unless a specific combination of PDB-centric OEIFlavor flags is used. Furthermore, the protein itself must be processed by OEAltLocationFactory in order to create a molecule with all alternate location atoms retained. With that in mind, we recommend reading PDB files for use in OESpruce TK using the pattern shown in ReadProteinFromPDB below:
from openeye import oechem


def ReadProteinFromPDB(pdb_file, mol):
    # Request PDB header data and alternate location codes in addition to
    # the default PDB flavor; both are lost without these flags.
    ifs = oechem.oemolistream()
    ifs.SetFlavor(
        oechem.OEFormat_PDB,
        oechem.OEIFlavor_PDB_Default
        | oechem.OEIFlavor_PDB_DATA
        | oechem.OEIFlavor_PDB_ALTLOC,
    )

    if not ifs.open(pdb_file):
        oechem.OEThrow.Fatal("Unable to open %s for reading." % pdb_file)

    temp_mol = oechem.OEGraphMol()
    if not oechem.OEReadMolecule(ifs, temp_mol):
        oechem.OEThrow.Fatal("Unable to read molecule from %s." % pdb_file)
    ifs.close()

    # Process the alternate locations and write the result into the caller's molecule.
    fact = oechem.OEAltLocationFactory(temp_mol)
    mol.Clear()
    fact.MakePrimaryAltMol(mol)
    return mol
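To show where the alternate location codes discussed above actually live in a PDB file, the following toolkit-independent sketch scans ATOM and HETATM records and reports which residues carry them. It relies only on the fixed-column PDB layout, in which the alternate location indicator is column 17. The helper names and sample atoms are hypothetical, and the sketch says nothing about how OEAltLocationFactory handles these atoms internally.

```python
from collections import defaultdict


def make_atom_record(serial, name, altloc, resname, chain, resseq, xyz, occupancy):
    """Build a fixed-column ATOM record (hypothetical helper for sample data)."""
    x, y, z = xyz
    return "ATOM  %5d %-4s%1s%3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, name, altloc, resname, chain, resseq, x, y, z, occupancy, 0.0)


def altloc_codes_by_residue(lines):
    """Map (chain, residue number, residue name) to its sorted alt loc codes."""
    found = defaultdict(set)
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        altloc = line[16]  # column 17 of the record; blank when there is no alt loc
        if altloc != " ":
            key = (line[21], int(line[22:26]), line[17:20].strip())
            found[key].add(altloc)
    return {key: sorted(codes) for key, codes in found.items()}


# Hypothetical records: SER 10 has its CB atom modeled in two conformations.
sample = [
    make_atom_record(1, " N  ", " ", "SER", "A", 10, (11.104, 6.134, -6.504), 1.00),
    make_atom_record(2, " CB ", "A", "SER", "A", 10, (12.560, 7.001, -5.210), 0.60),
    make_atom_record(3, " CB ", "B", "SER", "A", 10, (12.710, 6.850, -5.480), 0.40),
    make_atom_record(4, " N  ", " ", "ALA", "A", 11, (13.902, 8.415, -7.033), 1.00),
]
print(altloc_codes_by_residue(sample))
```

A quick scan like this can confirm whether a given file contains alternate locations at all before it is read with the ALTLOC flavor.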