Spruce Prep

Spruce-prepped OEDesignUnits are generated from the input PDB/MTZ files on the input oechem.OERecord.

Main Parameters

Parameter Name

Add interaction hints

Add style

Allow cap residue truncation

Alternate location handling method

Loop backbone clash threshold

Build C-terminal caps

Option to build disulfide bridges

Build missing loops

Build N-terminal caps

Build partial sidechains

Build missing tails

Loop builder include crystal packing

Assign charges and radii

Collapse non-site alts

Loop crop length

Delete clashing solvent

Duplicate removal

Enumerate co-factor sites

Enumerate pockets

Extended Log Field

Fix backbone atom issues

Generate Tautomers

Hetgroup cluster distance

Include SA term

Include solvation

Include Binding Site Grids

Ligand Type

Log Field

Loop clash threshold

Loop anchor atom distance buffer

Make packing residues

Maximum atoms in biological unit

Maximum parts in biological unit

Number of loops to minimize and evaluate

Max system atoms

Minimum alignment score for BU extraction

Optimize Experimental Protons

Loop optimization shell

Opt stage 1 step/residue multiplier

Opt stage 2 step/residue multiplier

Loop optimization tolerance

Output biological unit

Prefer author BIOMT records

Protonate

Restrict DUs to ref site removal

Rotamer Coverage %

Rotamer Library

Size used to define binding site

Strict Ligand

Enforce proline positions in loop templates

Strict protonation mode

Superpose design units

Superposition method

Target classification

Number to transform


Calculation Parameters

  • Add interaction hints (add_interactions) type: boolean: Option to add interactions to the design units.
    Default: True
  • Add style (add_style) type: boolean: Option to add style to the design units.
    Default: True
  • Allow cap residue truncation (allow_truncate) type: boolean: Option to allow a terminal residue to be converted to a cap if the cap would otherwise clash.
    Default: True
  • Alternate location handling method (altloc) type: string: Option to pick method of handling alternate locations.
    Default: Default
    Choices: Primary, Enumerate, Default
  • Loop backbone clash threshold (bb_clash_threshold) type: decimal: Loops from the database are rejected when more than the threshold fraction of their backbone atoms clash.
    Default: 0.25
  • Build C-terminal caps (build_cterm_caps) type: boolean: Option to cap broken C-termini in protein chains.
    Default: True
  • Option to build disulfide bridges (build_disulfidebridges) type: boolean: Allow the loop builder to build disulfide bridges during loop modeling (if possible).
    Default: True
  • Build missing loops (build_loops) type: boolean: Option to build missing loops (if information is available to do so)
    Default: True
  • Build N-terminal caps (build_nterm_caps) type: boolean: Option to cap broken N-termini in protein chains.
    Default: True
  • Build partial sidechains (build_sidechains) type: boolean: Option to build missing or partial protein sidechains.
    Default: True
  • Build missing tails (build_tails) type: boolean: Option to build missing tails (if information is available to do so)
    Default: False
  • Loop builder include crystal packing (build_with_crystalpacking) type: boolean: Include packing residues when building loops.
    Default: False
  • Assign charges and radii (charge_radii) type: boolean: Option to assign partial charge and radii.
    Default: True
  • Add Cofactor code(s) (cofactor_codes) type: string: Add uncommon, or custom, cofactor 3-letter codes.
  • Collapse non-site alts (collapse_nonsite_alts) type: boolean: Option to deduplicate structures with different alts, if the alt locations are not near the binding site.
    Default: True
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1 , Min: 1, Max: 128
  • Loop crop length (crop_length) type: integer: Number of anchor residues on the protein to crop back for a better fit; this results in longer loops being built.
    Default: 1
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected

    Choices: cpu, disk, memory, network
  • Delete clashing solvent (delete_clashing_solvent) type: boolean: Option to allow build steps to remove clashing solvent.
    Default: True
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0 , Min: 128.0, Max: 8589934592
  • Duplicate removal (duplicate_removal) type: boolean: Option to deduplicate identical structures resulting from symmetry operation.
    Default: True
  • Enumerate co-factor sites (enum_cofactors_sites) type: boolean: Option to generate individual design units based on the recognized co-factors.
    Default: False
  • Enumerate pockets (enum_pocket) type: boolean: Option to enumerate pockets when no ligand is found
    Default: False
  • Add Excipient code(s) (excipient_codes) type: string: Add uncommon, or custom, excipient 3-letter codes.
  • Fix backbone atom issues (fix_backbone) type: boolean: Option to fix backbone atom issues in protein chains.
    Default: True
  • Generate Tautomers (generate_tautomers) type: boolean: Option to generate and use tautomers in the hydrogen network optimization.
    Default: True
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0 , Max: 16
  • Hetgroup cluster distance (het_group_nbr_dist) type: decimal: Distance between heterogens used to determine optimization clusters for protonation.
    Default: 3.5
  • Include SA term (incl_SA_term) type: boolean: Include solvent accessible surface area term when ranking the loops.
    Default: True
  • Include solvation (incl_solvation) type: boolean: Include simple solvation model when building loops.
    Default: True
  • Include Binding Site Grids (include_bsite_edens_grids) type: boolean: Include electron density and difference density maps around the binding site
    Default: True
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Ligand Type (lig_type) type: string: The type of ligand that is expected for the system. Affects the max/min atom counts and the max residue count (if applicable) for the ligand in the system. Overrides can be individually input. Defaults are as follows: Small Molecule: min_atoms=8, max_atoms=100, max_residues=5; Peptide: min_atoms=8, max_atoms=200, max_residues=2; Macrocycle: min_atoms=8, max_atoms=250, max_residues=20; Fragment: min_atoms=2, max_atoms=35, max_residues=5
    Default: Small Molecule
    Choices: Small Molecule, Peptide, Macrocycle, Fragment
  • Add Ligand Smiles (ligand_metadata) type: string: Add ligand SMILES and 3-letter codes, e.g. ‘c1ccccc1 BNZ’.
  • Ligand name(s) (ligand_names) type: string: Format as 3-letter codes, e.g. ‘LIG’; for peptides, separate codes with dashes (e.g. ‘SER-VAL-TPO-ALA’).
  • Add Lipid code(s) (lipid_codes) type: string: Add uncommon, or custom, lipid 3-letter codes.
  • Loop clash threshold (loop_clash_threshold) type: decimal: Loops from the database are rejected when more than the threshold fraction of the loop's atoms clash, in addition to the clashing backbone atoms.
    Default: 0.2
  • Loop anchor atom distance buffer (loop_distance_buffer) type: decimal: Fuzzy matches in the loop database must have the correct distance between anchor atoms, +/- the buffer distance.
    Default: 1.0
  • A template loop database file (loop_input_file) type: file_in: (Optional) A template loop database file. If not specified, the built-in database is used.
  • Make packing residues (make_pack_res) type: boolean: Generate packing residues from an asymmetric unit.
    Default: True
  • Maximum atoms in biological unit (max_bu_atoms) type: integer: Option to limit the size of BUs processed based on number of atoms.
    Default: 50000
  • Maximum parts in biological unit (max_bu_parts) type: integer: Option to limit the size of BUs processed based on number of parts (chains).
    Default: 24
  • Number of loops to minimize and evaluate (max_eval_loops) type: integer: Maximum number of loops to connect and minimize.
    Default: 5
  • Max atoms for a ligand (max_lig_atoms) type: integer: Override for the maximum number of heavy atoms in a molecule to be detected as a ligand.
  • Max residues for a ligand (max_lig_residues) type: integer: Override for the maximum number of residues in a molecule to be detected as a ligand.
  • Max system atoms (max_system_atoms) type: integer: Maximum number of atoms in the system.
    Default: 50000
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiBs (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800 , Min: 256.0, Max: 8589934592
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Minimum alignment score for BU extraction (min_align_score) type: integer: Option to specify minimum sequence alignment score for biounit extraction.
    Default: 200
  • Min atoms for a ligand (min_lig_atoms) type: integer: Override for the minimum number of heavy atoms in a molecule to be detected as a ligand.
  • Optimize Experimental Protons (opt_expt_protons) type: boolean: Option to optimize hydrogens assigned in the experiment.
    Default: False
  • Loop optimization shell (opt_shell) type: decimal: Include atoms within this distance in the loop optimization; larger distances result in slower optimizations.
    Default: 15.0
  • Opt stage 1 step/residue multiplier (opt_stage1_iter_multiplier) type: integer: Number of steps per number of residues in the loop for the first stage optimizer.
    Default: 5
  • Opt stage 2 step/residue multiplier (opt_stage2_iter_multiplier) type: integer: Number of steps per number of residues in the loop for the second stage optimizer.
    Default: 10
  • Loop optimization tolerance (opt_tolerance) type: decimal: Tolerance for the loop optimization; smaller numbers result in slower optimizations.
    Default: 0.001
  • Output biological unit (output_bio_designunits) type: boolean: Option to write biological design units. These are intermediaries and should not be used for other applications.
    Default: False
  • Prefer author BIOMT records (pref_author_record) type: boolean: Option to prefer the author BIOMT record over the software-generated one.
    Default: True
  • Protonate (protonate) type: boolean: Option to add and optimize protons in the system.
    Default: True
  • Restrict DUs to ref site removal (restrict_to_refsite) type: boolean: Option to not generate design units with sites not matching the reference (if one is provided).
    Default: True
  • Rotamer Coverage % (rot_coverage) type: decimal: Coverage of the rotamers returned from the library in percent.
    Default: 100.0
  • Rotamer Library (rot_lib) type: string: Rotamer library to use for side-chain building.
    Default: Richardson2016
    Choices: Dunbrack, Richardson, Richardson2016
  • Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address
    Default: 64
  • Site residue entry (site_residue) type: string: Single site residue specification for APO structures. Format ‘name:num:insert:chain[:fragno:altloc]’, e.g. ‘ALA:325: :A’ (note the blank/whitespace insert code). The regex ‘.*’ notation can be used as a wildcard.
  • Size used to define binding site (site_size) type: decimal: Distance used to determine the size of the site.
    Default: 5.0
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Strict Ligand (strict_ligand) type: boolean: Option to only emit design units with ligands that match the ligand names (if any are provided)
    Default: True
  • Enforce proline positions in loop templates (strict_proline_match) type: boolean: Fuzzy matches in the loop database must have proline at the exact same positions in the sequence.
    Default: True
  • Strict protonation mode (strict_protonate) type: boolean: Option to fail prep if protons could not be added.
    Default: True
  • Superpose design units (superpose) type: boolean: Option to superpose DUs (if multiple), first onto the reference structure (if provided).
    Default: True
  • Superposition method (superpose_method) type: string: Superposition method.
    Default: SiteSequence
    Choices: GlobalSequence, SiteSequence, DDMatrix, SSE, SiteHopper
  • Target classification (target) type: string: Option to pick whether target is protein or nucleic acid component.
    Default: Protein
    Choices: Protein, Nucleic
  • Number to transform (transform_threshold) type: integer: Number of loops to allow through the sidechain clash checker. Regardless of this number, all loops with a sequence identical to the target are processed.
    Default: 25
  • output verbosity (verbosity) type: string: verbose level
    Default: warning
    Choices: info, warning, error, debug, ddebug
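
Two entries in the list above describe conventions that may be easier to follow in code: the per-type ligand limits with their individual overrides (lig_type, min_lig_atoms, max_lig_atoms, max_lig_residues), and the site_residue format. The sketch below is illustrative only. The helper names resolve_ligand_limits and parse_site_residue are hypothetical and are not part of Spruce or Orion; the default values are copied from the Ligand Type description, and treating fragno and altloc as optional trailing fields is an assumption based on the bracketed format string.

```python
from typing import NamedTuple, Optional

# Defaults copied from the Ligand Type (lig_type) description above.
LIGAND_TYPE_DEFAULTS = {
    "Small Molecule": {"min_atoms": 8, "max_atoms": 100, "max_residues": 5},
    "Peptide": {"min_atoms": 8, "max_atoms": 200, "max_residues": 2},
    "Macrocycle": {"min_atoms": 8, "max_atoms": 250, "max_residues": 20},
    "Fragment": {"min_atoms": 2, "max_atoms": 35, "max_residues": 5},
}


def resolve_ligand_limits(lig_type="Small Molecule", min_lig_atoms=None,
                          max_lig_atoms=None, max_lig_residues=None):
    """Hypothetical helper: start from the type defaults, then apply overrides."""
    limits = dict(LIGAND_TYPE_DEFAULTS[lig_type])
    overrides = {
        "min_atoms": min_lig_atoms,
        "max_atoms": max_lig_atoms,
        "max_residues": max_lig_residues,
    }
    for key, value in overrides.items():
        if value is not None:  # only explicitly set overrides replace a default
            limits[key] = value
    return limits


class SiteResidue(NamedTuple):
    name: str
    num: str      # kept as text so the '.*' wildcard remains valid
    insert: str   # a single blank means "no insertion code"
    chain: str
    fragno: Optional[str] = None
    altloc: Optional[str] = None


def parse_site_residue(spec):
    """Hypothetical helper: split 'name:num:insert:chain[:fragno:altloc]'."""
    parts = spec.split(":")
    if not 4 <= len(parts) <= 6:
        raise ValueError(f"expected 4 to 6 ':'-separated fields, got {len(parts)}")
    return SiteResidue(*parts)


print(resolve_ligand_limits("Peptide", max_lig_residues=10))
print(parse_site_residue("ALA:325: :A"))
```

Fields are deliberately not stripped, so the blank insert code in ‘ALA:325: :A’ survives parsing as a single space.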

Field parameters

  • Extended Log Field (ext_log_field) type: Field Type: StringVec: Message extended log field
    Default: Extended Log Field
  • Log Field (log_field) type: Field Type: String: The field to store messages to floe report
    Default: Log Field

Hardware Parameters

Machine hardware requirements
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiBs (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800 , Min: 256.0, Max: 8589934592
  • Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address
    Default: 64
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0 , Min: 128.0, Max: 8589934592
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0 , Max: 16
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1 , Min: 1, Max: 128
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”

Metrics Parameters

Cube Metric Parameters
  • Metric Period (None) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Cube Metrics (None) type: string: Set of metrics to be collected

    Choices: cpu, disk, memory, network

Parallel Spruce Prep

The parallel version adds these extra parameters.

  • Number of messages to distribute at a time (item_count) type: integer: The maximum number of messages to bundle together for a parallel cube.
    Default: 1 , Min: 1, Max: 65535
  • Maximum Failures (max_failures) type: integer: The maximum number of times to attempt processing a work item
    Default: 10 , Min: 1, Max: 100
  • Autoscale this Cube (autoscale) type: boolean: If True, let Orion manage the parallelism of this Cube
    Default: True
  • Maximum number of Cubes (max_parallel) type: integer: The maximum number of concurrently running copies of this Cube
    Default: 1000 , Min: 1
  • Minimum number of Cubes (min_parallel) type: integer: The minimum number of concurrently running copies of this Cube
    Default: 0
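
As a rough illustration of the documented ranges, the sketch below checks a set of integer parallel parameters against the defaults and Min/Max values listed above before they are passed to a job. It is not Orion API code: the function name check_parallel_params and the table layout are hypothetical, and the boolean autoscale parameter is left out for brevity.

```python
# Defaults and bounds copied from the Parallel Spruce Prep list above.
# None means that no bound is documented for that side of the range.
PARALLEL_PARAM_SPECS = {
    "item_count": {"default": 1, "min": 1, "max": 65535},
    "max_failures": {"default": 10, "min": 1, "max": 100},
    "max_parallel": {"default": 1000, "min": 1, "max": None},
    "min_parallel": {"default": 0, "min": None, "max": None},
}


def check_parallel_params(**overrides):
    """Hypothetical helper: merge overrides into defaults and range-check them."""
    unknown = set(overrides) - set(PARALLEL_PARAM_SPECS)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")

    values = {name: spec["default"] for name, spec in PARALLEL_PARAM_SPECS.items()}
    values.update(overrides)

    for name, value in values.items():
        spec = PARALLEL_PARAM_SPECS[name]
        if spec["min"] is not None and value < spec["min"]:
            raise ValueError(f"{name}={value} is below the minimum {spec['min']}")
        if spec["max"] is not None and value > spec["max"]:
            raise ValueError(f"{name}={value} is above the maximum {spec['max']}")
    return values


print(check_parallel_params(item_count=10, max_parallel=200))
```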