SPRUCE - Protein Preparation from PDB Files

This floe uses Spruce to prepare biomolecules for downstream modeling applications in Orion, such as docking, posit, gameplan, or short-trajectory MD.

The required inputs for this floe are PDB or MMCIF files, accompanied by MTZ files with electron density maps if available.
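
When many structures are prepared at once, it can help to check which structure files have a matching MTZ file before uploading. The following is a minimal sketch, assuming that a structure file and its MTZ file share the same file stem (for example `1abc.pdb` and `1abc.mtz`). The function name `pair_structures_with_mtz` and the list of accepted extensions are hypothetical and are not part of Spruce or Orion.

```python
from pathlib import Path

# Assumed structure-file extensions; adjust to match your own data.
STRUCTURE_SUFFIXES = {".pdb", ".cif", ".mmcif"}


def pair_structures_with_mtz(paths):
    """Map each structure file to the MTZ file with the same stem, or None."""
    paths = [Path(p) for p in paths]
    # Index MTZ files by lower-cased stem so matching is case-insensitive.
    mtz_by_stem = {
        p.stem.lower(): p for p in paths if p.suffix.lower() == ".mtz"
    }
    pairs = {}
    for p in paths:
        if p.suffix.lower() in STRUCTURE_SUFFIXES:
            # A structure without density data is still valid input.
            pairs[p] = mtz_by_stem.get(p.stem.lower())
    return pairs
```

Structures mapped to `None` have no density file; they can still be used as input, since the MTZ files are optional.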

An additional input dataset from a previous Classic Spruce: Prep run may be used as a reference for biological unit extraction and superposition to a common reference frame.

If a ligand cannot be detected during the run, consider specifying the ligand residue name or increasing the optional input “max_residues”. If this is a known apo structure, you can instead provide the definition of a residue in the binding site.
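
To see why a molecule might fall outside the ligand detection limits, the size parameters listed under Extra Required Parameters can be treated as a simple window. The sketch below is illustrative only and is not how Spruce implements ligand detection. It assumes the atom and residue limits are inclusive bounds, and the names `LigandSizeLimits` and `fits_ligand_limits` are hypothetical. The default values and the peptide and fragment recommendations are taken from the parameter list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LigandSizeLimits:
    """Size window for ligand detection (defaults from the parameter list)."""
    min_atoms: int = 8
    max_atoms: int = 100
    max_residues: int = 5


def fits_ligand_limits(n_atoms, n_residues, limits=LigandSizeLimits()):
    """Return True if the molecule size falls inside the window.

    Inclusive bounds are an assumption made for this sketch.
    """
    return (
        limits.min_atoms <= n_atoms <= limits.max_atoms
        and n_residues <= limits.max_residues
    )


# Recommended adjustments from the parameter descriptions.
PEPTIDE_LIMITS = LigandSizeLimits(max_atoms=200, max_residues=20)
FRAGMENT_LIMITS = LigandSizeLimits(min_atoms=5)
```

For example, an 8-residue peptide with 120 atoms fails the default window but passes with `PEPTIDE_LIMITS`, which is the situation where increasing “max_residues” helps.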

You can read more about Spruce in the toolkit documentation.

Extra Required Parameters

  • Add interaction hints (boolean) : Option to add interaction hints to the design units.
    Default: True
  • Add style (boolean) : Option to add style to the design units.
    Default: True
  • Allow cap residue truncation (boolean) : Option to allow a terminal residue to be converted to a cap if the cap would otherwise clash.
    Default: True
  • Alternate location handling method (string) : Option to pick method of handling alternate locations.
    Default: Default
    Choices: Primary, Enumerate, Default
  • Loop backbone clash threshold (decimal) : Loops from the database are rejected when more than this fraction of their backbone atoms clash.
    Default: 0.25
  • Build C-terminal caps (boolean) : Option to cap broken C-termini in protein chains.
    Default: True
  • Option to build disulfide bridges (boolean) : Allow the loop builder to build disulfide bridges during loop modeling (if possible).
    Default: True
  • Build missing loops (boolean) : Option to build missing loops (if information is available to do so).
    Default: True
  • Build N-terminal caps (boolean) : Option to cap broken N-termini in protein chains.
    Default: True
  • Build partial sidechains (boolean) : Option to build missing or partial protein sidechains.
    Default: True
  • Build missing tails (boolean) : Option to build missing tails (if information is available to do so).
    Default: False
  • Loop builder include crystal packing (boolean) : Include packing residues when building loops.
    Default: False
  • Assign charges and radii (boolean) : Option to assign partial charges and radii.
    Default: True
  • Collapse non-site alts (boolean) : Option to deduplicate structures with different alts, if the alt locations are not near the binding site.
    Default: True
  • Loop crop length (integer) : Number of anchor residues on the protein to crop back for a better fit. Cropping results in longer loops being built.
    Default: 1
  • Delete clashing solvent (boolean) : Option to allow build steps to remove clashing solvent.
    Default: True
  • Duplicate removal (boolean) : Option to deduplicate identical structures resulting from symmetry operations.
    Default: True
  • Enumerate co-factor sites (boolean) : Option to generate individual design units based on the recognized co-factors.
    Default: False
  • Enumerate pockets (boolean) : Option to enumerate pockets when no ligand is found.
    Default: False
  • Fix backbone atom issues (boolean) : Option to fix backbone atom issues in protein chains.
    Default: True
  • Generate Tautomers (boolean) : Option to generate and use tautomers in the hydrogen network optimization.
    Default: True
  • Hetgroup cluster distance (decimal) : Distance between heterogens used to determine optimization clusters for protonation.
    Default: 3.5
  • Include SA term (boolean) : Include solvent accessible surface area term when ranking the loops.
    Default: True
  • Include solvation (boolean) : Include simple solvation model when building loops.
    Default: True
  • Include Binding Site Grids (boolean) : Include electron density and difference density maps around the binding site.
    Default: True
  • Log Field (Field Type: String) : The field to store messages to floe report
    Default: Log Field
  • Loop clash threshold (decimal) : Loops from the database are rejected when more than this fraction of the loop's atoms clash, in addition to the clashing backbone atoms.
    Default: 0.2
  • Loop anchor atom distance buffer (decimal) : Fuzzy matches in the loop database must have the correct distance between anchor atoms, within +/- this buffer distance.
    Default: 1.0
  • Make packing residues (boolean) : Generate packing residues from an asymmetric unit.
    Default: True
  • Maximum atoms in biological unit (integer) : Option to limit the size of BUs processed based on number of atoms.
    Default: 50000
  • Maximum parts in biological unit (integer) : Option to limit the size of BUs processed based on number of parts (chains).
    Default: 24
  • Number of loops to minimize and evaluate (integer) : Maximum number of loops to connect and minimize.
    Default: 5
  • Max atoms for a ligand (integer) : Maximum number of atoms in a molecule to be detected as a ligand. For peptides, we recommend 200.
    Default: 100
  • Max residues for a ligand (integer) : Maximum number of residues in a molecule to be detected as a ligand. For peptides, we recommend 20.
    Default: 5
  • Max system atoms (integer) : Maximum number of atoms in the system.
    Default: 50000
  • Minimum alignment score for BU extraction (integer) : Option to specify minimum sequence alignment score for biounit extraction.
    Default: 200
  • Min atoms for a ligand (integer) : Minimum number of atoms in a molecule to be detected as a ligand. For fragments, we recommend setting this to 5.
    Default: 8
  • Optimize Experimental Protons (boolean) : Option to optimize hydrogens assigned in the experiment.
    Default: False
  • Loop optimization shell (decimal) : Include atoms within this distance in the loop optimization, larger distance results in slower optimizations.
    Default: 15.0
  • Opt stage 1 step/residue multiplier (integer) : Number of steps per number of residues in the loop for the first stage optimizer.
    Default: 5
  • Opt stage 2 step/residue multiplier (integer) : Number of steps per number of residues in the loop for the second stage optimizer.
    Default: 10
  • Loop optimization tolerance (decimal) : Tolerance for the loop optimization, smaller numbers result in slower optimizations.
    Default: 0.001
  • Output BioDesignUnits (boolean) : Option to write the intermediate work product (bio design units).
    Default: False
  • Prefer author BIOMT records (boolean) : Option to prefer the author BIOMT record over the software-generated one.
    Default: True
  • Protonate (boolean) : Option to add and optimize protons in the system.
    Default: True
  • Restrict DUs to ref site removal (boolean) : Option to not generate design units with sites not matching the reference (if one is provided).
    Default: True
  • Rotamer Coverage % (decimal) : Coverage of the rotamers returned from the library in percent.
    Default: 100.0
  • Rotamer Library (string) : Rotamer library to use for side-chain building.
    Default: Richardson2016
    Choices: Dunbrack, Richardson, Richardson2016
  • Size used to define binding site (decimal) : Distance used to determine the size of the site.
    Default: 5.0
  • Strict Ligand (boolean) : Option to only emit design units with ligands that match the ligand names (if any are provided).
    Default: True
  • Enforce proline positions in loop templates (boolean) : Fuzzy matches in the loop database have to have proline in exact locations of sequence.
    Default: True
  • Strict protonation mode (boolean) : Option to fail prep if protons could not be added.
    Default: False
  • Superpose design units (boolean) : Option to superpose DUs (if multiple), first onto the reference structure (if provided).
    Default: True
  • Superposition method (string) : Superposition method.
    Default: SiteSequence
    Choices: GlobalSequence, SiteSequence, DDMatrix, SSE, SiteHopper
  • Target classification (string) : Option to pick whether the target is a protein or nucleic acid component.
    Default: Protein
    Choices: Protein, Nucleic
  • Number to transform (integer) : Number of loops to allow through the sidechain clash checker. Regardless of this number, all loops with a sequence identical to the target are processed.
    Default: 25
  • Input structure (PDB/MMCIF) files (file_in) :
  • Components to be part of the molecule (string) : Components to make part of the molecule. If set to ‘undefined’, will not be included in the output.
    Default: [‘protein’]
    Choices: protein, nucleic, ligand, solvent, metals, counter_ions, lipids, packing_residues, sugars, undefined, cofactors, excipients, polymers, post_translational, other_proteins, other_nucleics, other_ligands, other_cofactors
  • Discard liganded design units (boolean) : Option to discard liganded design units.
    Default: True
  • Generate surface (boolean) : Option to generate surface for pockets.
    Default: True
  • Local burial factor (decimal) : Option to set local burial factor.
    Default: 1.4
  • Max surface area (decimal) : Option to set maximum surface area for pocket finding.
    Default: 3000.0
  • Min surface area (decimal) : Option to set minimum surface area for pocket finding.
    Default: 150.0
  • Include density based depictions (boolean) : Include density based depictions.
    Default: True
  • Output Dataset (dataset_out) : Output dataset to write to
    Default: Spruce_prep_dataset
  • Output Dataset (dataset_out) : Output dataset to write to
    Default: Failed_Spruce_prep_dataset
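
Several loop-building parameters above are expressed as per-residue multipliers or clash fractions, which can be hard to picture in the abstract. The sketch below works through the arithmetic under two assumptions that the descriptions suggest but do not confirm: that the optimizer step count is the multiplier times the number of loop residues, and that a loop is rejected when the clashing fraction is strictly greater than the threshold. The function names are hypothetical and this is not the Spruce implementation.

```python
def optimizer_steps(n_loop_residues, stage1_multiplier=5, stage2_multiplier=10):
    """Return (stage 1 steps, stage 2 steps) for a loop of the given length.

    Assumes steps = multiplier * residues; defaults come from the
    "Opt stage 1/2 step/residue multiplier" parameters.
    """
    return (
        stage1_multiplier * n_loop_residues,
        stage2_multiplier * n_loop_residues,
    )


def exceeds_clash_threshold(n_clashing, n_total, threshold):
    """Return True if the clashing fraction is strictly above the threshold.

    Strict comparison is an assumption made for this sketch.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_clashing / n_total > threshold
```

Under these assumptions, a 12-residue loop gets 60 first-stage and 120 second-stage steps with the defaults, and a candidate with 3 of 10 backbone atoms clashing would exceed the default backbone clash threshold of 0.25.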