SPRUCE - Protein Preparation from PDB Codes

This floe uses Spruce to prepare biomolecules for downstream modeling applications in Orion, such as docking, posit, gameplan, or short-trajectory MD.

The required input for this floe is one or more PDB codes. For each code, the PDB file (or the mmCIF file if the PDB file is not available) is downloaded from the RCSB, along with the MTZ file containing the electron density map data, if available.
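As an illustration of the input format, the following minimal sketch splits a delimited string of PDB codes and builds candidate download locations. The function names, the whitespace trimming, and the upper-casing are assumptions made for this example, not the floe's documented behavior. The URL pattern is the public RCSB file-download endpoint; how the floe itself retrieves files, including the MTZ file, is not described here.

```python
from typing import Dict, List


def parse_pdb_codes(raw: str, delimiter: str = ",") -> List[str]:
    """Split a delimited string of PDB codes into a clean list.

    Hypothetical helper: trims whitespace, drops empty tokens, and
    upper-cases each code (a normalization assumed for this sketch).
    """
    codes = []
    for token in raw.split(delimiter):
        code = token.strip().upper()
        if code:
            codes.append(code)
    return codes


def candidate_urls(code: str) -> Dict[str, str]:
    """Return candidate RCSB download URLs for one code.

    Covers only the PDB and mmCIF coordinate files; the MTZ location
    is intentionally left out of this sketch.
    """
    base = "https://files.rcsb.org/download"
    return {
        "pdb": f"{base}/{code}.pdb",
        "mmcif": f"{base}/{code}.cif",
    }


if __name__ == "__main__":
    for pdb_code in parse_pdb_codes("1ABC, DEF2, G3HI"):
        print(pdb_code, candidate_urls(pdb_code))
```

A different separator can be passed as the second argument, mirroring the "Codes delimiter" parameter.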

An additional input dataset from a previous Classic Spruce: Prep run may be used as a reference for biological unit extraction and superposition to a common reference frame.

If a ligand cannot be detected during the run, consider specifying the ligand residue name or increasing the value of “max_residues”, an optional input to this floe. If this is a known apo structure, you can instead provide the definition of a residue in the binding site.
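To see why a ligand might be missed, the size limits can be pictured as a simple window on atom and residue counts. The sketch below is a simplified illustration, not Spruce's actual detection logic: the class and function names are hypothetical, the bounds are assumed to be inclusive, and the real detection also depends on other criteria. The default and recommended values are taken from the parameter list in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LigandSizeLimits:
    """Size window for ligand detection (defaults from this floe's parameters)."""
    min_atoms: int = 8
    max_atoms: int = 100
    max_residues: int = 5


# Values recommended in the parameter descriptions for special cases.
PEPTIDE_LIMITS = LigandSizeLimits(max_atoms=200, max_residues=20)
FRAGMENT_LIMITS = LigandSizeLimits(min_atoms=5)


def passes_size_limits(n_atoms: int, n_residues: int,
                       limits: LigandSizeLimits = LigandSizeLimits()) -> bool:
    """Return True if a candidate molecule falls inside the size window.

    Assumption for this sketch: all bounds are inclusive.
    """
    within_atoms = limits.min_atoms <= n_atoms <= limits.max_atoms
    within_residues = n_residues <= limits.max_residues
    return within_atoms and within_residues


if __name__ == "__main__":
    # A 12-residue, 150-atom peptide is outside the default window
    # but inside the peptide window.
    print(passes_size_limits(150, 12))
    print(passes_size_limits(150, 12, PEPTIDE_LIMITS))
```

Under these assumptions, a peptide ligand is skipped with the defaults and accepted once the maximum atom and residue counts are raised, which matches the advice above.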

You can read more about Spruce in the toolkit documentation.

Extra Required Parameters

  • Log Field (Field Type: String) : The field to store messages to floe report
    Default: Log Field
  • Include density based depictions (boolean) : Include density based depictions.
    Default: True
  • Log Field (Field Type: String) : The field to store messages to floe report
    Default: Log Field
  • Output Dataset (dataset_out) : Output dataset to write to
    Default: Failed_Spruce_prep_dataset
  • Output Dataset (dataset_out) : Output dataset to write to
    Default: Spruce_prep_dataset
  • Log Field (Field Type: String) : The field to store messages to floe report
    Default: Log Field
  • Components to be part of the molecule (string) : Components to make part of the molecule. If set to ‘undefined’, will not be included in output.
    Default: [‘protein’]
    Choices: protein, nucleic, ligand, solvent, metals, counter_ions, lipids, packing_residues, sugars, undefined, cofactors, excipients, polymers, post_translational, other_proteins, other_nucleics, other_ligands, other_cofactors
  • Discard liganded design units (boolean) : Option to discard liganded design units.
    Default: True
  • Generate surface (boolean) : Option to generate surface for pockets.
    Default: True
  • Local burial factor (decimal) : Option to set local burial factor.
    Default: 1.4
  • Log Field (Field Type: String) : The field to store messages to floe report
    Default: Log Field
  • Max surface area (decimal) : Option to set maximum surface area for pocket finding.
    Default: 3000.0
  • Min surface area (decimal) : Option to set minimum surface area for pocket finding.
    Default: 150.0
  • Add interaction hints (boolean) : Option to add interaction hints to the design units.
    Default: True
  • Add style (boolean) : Option to add style to the design units.
    Default: True
  • Allow cap residue truncation (boolean) : Option to allow a terminal residue to be converted to a cap if the cap would otherwise clash.
    Default: True
  • Alternate location handling method (string) : Option to pick method of handling alternate locations.
    Default: Default
    Choices: Primary, Enumerate, Default
  • Loop backbone clash threshold (decimal) : Loops from the database are rejected if more than the threshold fraction of their backbone atoms clash.
    Default: 0.25
  • Build C-terminal caps (boolean) : Option to cap broken C-termini in protein chains.
    Default: True
  • Option to build disulfide bridges (boolean) : Allow the loop builder to build disulfide bridges during loop modeling (if possible).
    Default: True
  • Build missing loops (boolean) : Option to build missing loops (if information is available to do so)
    Default: True
  • Build N-terminal caps (boolean) : Option to cap broken N-termini in protein chains.
    Default: True
  • Build partial sidechains (boolean) : Option to build missing or partial protein sidechains.
    Default: True
  • Build missing tails (boolean) : Option to build missing tails (if information is available to do so)
    Default: False
  • Loop builder include crystal packing (boolean) : Include packing residues when building loops.
    Default: False
  • Assign charges and radii (boolean) : Option to assign partial charge and radii.
    Default: True
  • Collapse non-site alts (boolean) : Option to deduplicate structures with different alts, if the alt locations are not near the binding site.
    Default: True
  • Loop crop length (integer) : Number of anchor residues on the protein to crop back for a better fit; this results in longer loops being built.
    Default: 1
  • Delete clashing solvent (boolean) : Option to allow build steps to remove clashing solvent.
    Default: True
  • Duplicate removal (boolean) : Option to deduplicate identical structures resulting from symmetry operations.
    Default: True
  • Enumerate co-factor sites (boolean) : Option to generate individual design units based on the recognized co-factors.
    Default: False
  • Enumerate pockets (boolean) : Option to enumerate pockets when no ligand is found
    Default: False
  • Fix backbone atom issues (boolean) : Option to fix backbone atom issues in protein chains.
    Default: True
  • Generate Tautomers (boolean) : Option to generate and use tautomers in the hydrogen network optimization.
    Default: True
  • Hetgroup cluster distance (decimal) : Distance between heterogens used to determine optimization clusters for protonation.
    Default: 3.5
  • Include SA term (boolean) : Include solvent accessible surface area term when ranking the loops.
    Default: True
  • Include solvation (boolean) : Include simple solvation model when building loops.
    Default: True
  • Include Binding Site Grids (boolean) : Include electron density and difference density maps around the binding site
    Default: True
  • Log Field (Field Type: String) : The field to store messages to floe report
    Default: Log Field
  • Loop clash threshold (decimal) : Loops from the database are rejected if more than the threshold fraction of the loop's atoms clash, in addition to the clashing backbone atoms.
    Default: 0.2
  • Loop anchor atom distance buffer (decimal) : Fuzzy matches in the loop database must have the correct distance between anchor atoms, within +/- the buffer distance.
    Default: 1.0
  • Make packing residues (boolean) : Generate packing residues from an asymmetric unit.
    Default: True
  • Maximum atoms in biological unit (integer) : Option to limit the size of BUs processed based on number of atoms.
    Default: 50000
  • Maximum parts in biological unit (integer) : Option to limit the size of BUs processed based on number of parts (chains).
    Default: 24
  • Number of loops to minimize and evaluate (integer) : Maximum number of loops to connect and minimize.
    Default: 5
  • Max atoms for a ligand (integer) : Maximum number of atoms in a molecule to be detected as a ligand. For peptides we recommend 200
    Default: 100
  • Max residues for a ligand (integer) : Maximum number of residues in a molecule to be detected as a ligand. For peptides we recommend 20
    Default: 5
  • Max system atoms (integer) : Maximum number of atoms in the system.
    Default: 50000
  • Minimum alignment score for BU extraction (integer) : Option to specify minimum sequence alignment score for biounit extraction.
    Default: 200
  • Min atoms for a ligand (integer) : Minimum number of atoms in a molecule to be detected as a ligand. For fragments we recommend setting to 5
    Default: 8
  • Optimize Experimental Protons (boolean) : Option to optimize hydrogens assigned in the experiment.
    Default: False
  • Loop optimization shell (decimal) : Include atoms within this distance in the loop optimization. A larger distance results in slower optimizations.
    Default: 15.0
  • Opt stage 1 step/residue multiplier (integer) : Number of steps per number of residues in the loop for the first stage optimizer.
    Default: 5
  • Opt stage 2 step/residue multiplier (integer) : Number of steps per number of residues in the loop for the second stage optimizer.
    Default: 10
  • Loop optimization tolerance (decimal) : Tolerance for the loop optimization. Smaller numbers result in slower optimizations.
    Default: 0.001
  • Output BioDesignUnits (boolean) : Option to write the intermediate work product bio design units.
    Default: False
  • Prefer author BIOMT records (boolean) : Option where the author BIOMT record is preferred over the software-generated one.
    Default: True
  • Protonate (boolean) : Option to add and optimize protons in the system.
    Default: True
  • Restrict DUs to ref site removal (boolean) : Option to not generate design units with sites not matching the reference (if one is provided).
    Default: True
  • Rotamer Coverage % (decimal) : Coverage of the rotamers returned from the library in percent.
    Default: 100.0
  • Rotamer Library (string) : Rotamer library to use for side-chain building.
    Default: Richardson2016
    Choices: Dunbrack, Richardson, Richardson2016
  • Size used to define binding site (decimal) : Distance used to determine the size of the site.
    Default: 5.0
  • Strict Ligand (boolean) : Option to only emit design units with ligands that match the ligand names (if any are provided)
    Default: True
  • Enforce proline positions in loop templates (boolean) : Fuzzy matches in the loop database have to have proline in exact locations of sequence.
    Default: True
  • Strict protonation mode (boolean) : Option to fail prep if protons could not be added.
    Default: False
  • Superpose design units (boolean) : Option to superpose DUs (if multiple), first onto the reference structure (if provided).
    Default: True
  • Superposition method (string) : Superposition method.
    Default: SiteSequence
    Choices: GlobalSequence, SiteSequence, DDMatrix, SSE, SiteHopper
  • Target classification (string) : Option to pick whether the target is a protein or nucleic acid component.
    Default: Protein
    Choices: Protein, Nucleic
  • Number to transform (integer) : Number of loops to allow through the sidechain clash checker. Regardless of this number, all loops with a sequence identical to the target are processed.
    Default: 25
  • Codes delimiter (string) : Delimiter to separate multiple PDB codes.
    Default: ,
  • PDB code(s) to download (string) : Separate multiple codes with a (default) comma delimiter, e.g. ‘1ABC, DEF2, G3HI’.
  • Log Field (Field Type: String) : The field to store messages to floe report
    Default: Log Field
  • Download timeout (integer) : Timeout when attempting to download files for each PDB code.
    Default: 600