Spruce Prep

Spruce-prepped OEDesignUnits are generated from the PDB/MTZ files on the input oechem.OERecord.

Calculation Parameters

  • Add Interaction Hints (add_interactions) type: boolean: Option to add interactions to the design units.
    Default: True
  • Add Style (add_style) type: boolean: Option to add style to the design units.
    Default: True
  • Allow Cap Residue Truncation (allow_truncate) type: boolean: Option to allow a terminal residue to be converted to a cap if the cap would otherwise clash.
    Default: True
  • Alternate Location Handling Method (altloc) type: string: Option to pick method of handling alternate locations.
    Default: Default
    Choices: Primary, Enumerate, Default
  • Loop Backbone Clash Threshold (bb_clash_threshold) type: decimal: Loops from the database are rejected when more than the threshold fraction of their backbone atoms clash.
    Default: 0.25
  • Build C-Terminal Caps (build_cterm_caps) type: boolean: Option to cap broken C-termini in protein chains.
    Default: True
  • Option to Build Disulfide Bridges (build_disulfidebridges) type: boolean: Allow the loop builder to build disulfide bridges during loop modeling (if possible).
    Default: True
  • Build Missing Loops (build_loops) type: boolean: Option to build missing loops (if information is available to do so)
    Default: True
  • Build N-Terminal Caps (build_nterm_caps) type: boolean: Option to cap broken N-termini in protein chains.
    Default: True
  • Build Partial Sidechains (build_sidechains) type: boolean: Option to build missing or partial protein sidechains.
    Default: True
  • Build Missing Tails (build_tails) type: boolean: Option to build missing tails (if information is available to do so)
    Default: False
  • Loop Builder Include Crystal Packing (build_with_crystalpacking) type: boolean: Include packing residues when building loops.
    Default: False
  • Assign Charges and Radii (charge_radii) type: boolean: Option to assign partial charge and radii.
    Default: True
  • Add Cofactor Code(s) (cofactor_codes) type: string: Add uncommon, or custom, cofactor 3-letter codes.
  • Collapse Non-Site Alternates (collapse_nonsite_alts) type: boolean: Option to deduplicate structures with different alts, if the alt locations are not near the binding site.
    Default: True
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1, Min: 1, Max: 128
  • Loop Crop Length (crop_length) type: integer: Number of anchor residues on the protein to crop back for a better fit; cropping results in longer loops being built.
    Default: 1
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected
    Choices: cpu, disk, memory, network
  • Delete Clashing Solvent (delete_clashing_solvent) type: boolean: Option to allow build steps to remove clashing solvent.
    Default: True
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0, Min: 128.0, Max: 8589934592
  • Duplicate Removal (duplicate_removal) type: boolean: Option to deduplicate identical structures resulting from symmetry operation.
    Default: True
  • Enumerate Cofactor Sites (enum_cofactors_sites) type: boolean: Option to generate individual design units based on the recognized cofactors.
    Default: False
  • Enumerate pockets (enum_pocket) type: boolean: Option to enumerate pockets when no ligand is found
    Default: True
  • Add Excipient Code(s) (excipient_codes) type: string: Add uncommon, or custom, excipient 3-letter codes.
  • Fix Backbone Atom Issues (fix_backbone) type: boolean: Option to fix backbone atom issues in protein chains.
    Default: True
  • Generate Tautomers (generate_tautomers) type: boolean: Option to generate and use tautomers in the hydrogen network optimization.
    Default: True
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0, Max: 16
  • Hetgroup Cluster Distance (het_group_nbr_dist) type: decimal: Distance between heterogens used to determine optimization clusters for protonation.
    Default: 3.5
  • Include Solvent Accessible Surface Area Term (incl_SA_term) type: boolean: Include solvent accessible surface area term when ranking the loops.
    Default: True
  • Include Solvation (incl_solvation) type: boolean: Include simple solvation model when building loops.
    Default: True
  • Include Binding Site Grids (include_bsite_edens_grids) type: boolean: Include electron density and difference density maps around the binding site
    Default: True
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Ligand Type (lig_type) type: string: The type of ligand that is expected for the system. Affects the max/min atom counts and the max residue count (if applicable) for the ligand in the system. Overrides can be entered individually. Defaults are as follows: Small Molecule: min_atoms=8, max_atoms=100, max_residues=5; Peptide: min_atoms=8, max_atoms=200, max_residues=2; Macrocycle: min_atoms=8, max_atoms=250, max_residues=20; Fragment: min_atoms=2, max_atoms=35, max_residues=5.
    Default: Small Molecule
    Choices: Small Molecule, Peptide, Macrocycle, Fragment
  • Add Ligand Smiles (ligand_metadata) type: string: Add ligand SMILES and 3-letter codes, e.g. ‘c1ccccc1 BNZ’.
  • Ligand Name(s) (ligand_names) type: string: Ligand names given as 3-letter codes, e.g. ‘LIG’; for peptides, separate the codes with dashes (e.g. ‘SER-VAL-TPO-ALA’).
  • Add Lipid Code(s) (lipid_codes) type: string: Add uncommon, or custom, lipid 3-letter codes.
  • Loop Clash Threshold (loop_clash_threshold) type: decimal: Loops from the database are rejected when more than the threshold fraction of the loop atoms clash, counted in addition to the clashing backbone atoms.
    Default: 0.2
  • Loop Anchor Atom Distance Buffer (loop_distance_buffer) type: decimal: Fuzzy matches in the loop database must have the correct distance between anchor atoms, +/- the buffer distance.
    Default: 1.0
  • Loop Database File (loop_input_file) type: file_in: (Optional) A template loop database file; if not specified, the built-in database is used.
  • Make Packing Residues (make_pack_res) type: boolean: Generate packing residues from an asymmetric unit.
    Default: True
  • Max Backlog Wait (max_backlog_wait) type: integer: The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
    Default: 600, Min: 300
  • Maximum Atoms in Biological Unit (max_bu_atoms) type: integer: Option to limit the size of BUs processed based on number of atoms.
    Default: 50000
  • Maximum Parts in Biological Unit (max_bu_parts) type: integer: Option to limit the size of BUs processed based on number of parts (chains).
    Default: 24
  • Number of Loops to Minimize and Evaluate (max_eval_loops) type: integer: Maximum number of loops to connect and minimize.
    Default: 5
  • Max Atoms for a Ligand (max_lig_atoms) type: integer: Override for the maximum number of heavy atoms in a molecule to be detected as a ligand.
  • Max Residues for a Ligand (max_lig_residues) type: integer: Override for the maximum number of residues in a molecule to be detected as a ligand.
  • Max System Atoms (max_system_atoms) type: integer: Maximum number of atoms in the system.
    Default: 50000
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800, Min: 256.0, Max: 8589934592
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Minimum Alignment Score for Biological Unit Extraction (min_align_score) type: integer: Option to specify minimum sequence alignment score for biological unit extraction.
    Default: 200
  • Min Atoms for a Ligand (min_lig_atoms) type: integer: Override for the minimum number of heavy atoms in a molecule to be detected as a ligand.
  • Optimize Experimental Protons (opt_expt_protons) type: boolean: Option to optimize hydrogens assigned in the experiment.
    Default: False
  • Loop Optimization Shell (opt_shell) type: decimal: Include atoms within this distance in the loop optimization; a larger distance results in slower optimizations.
    Default: 15.0
  • Optimize Stage 1 Step/Residue Multiplier (opt_stage1_iter_multiplier) type: integer: Number of steps per number of residues in the loop for the first stage optimizer.
    Default: 5
  • Optimize Stage 2 Step/Residue Multiplier (opt_stage2_iter_multiplier) type: integer: Number of steps per number of residues in the loop for the second stage optimizer.
    Default: 10
  • Loop Optimization Tolerance (opt_tolerance) type: decimal: Tolerance for the loop optimization; smaller numbers result in slower optimizations.
    Default: 0.001
  • Output biological unit (output_bio_designunits) type: boolean: Option to write all biological design units. These are intermediates and should not be used for other applications.
    Default: False
  • Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU
    Default: 32
  • Prefer Author BIOMT Records (pref_author_record) type: boolean: Option to prefer the author BIOMT record over the software-generated one.
    Default: True
  • Protonate (protonate) type: boolean: Option to add and optimize protons in the system.
    Default: True
  • Restrict DUs to Reference Site Removal (restrict_to_refsite) type: boolean: Option to not generate design units with sites not matching the reference (if one is provided).
    Default: True
  • Rotamer Coverage % (rot_coverage) type: decimal: Coverage of the rotamers returned from the library in percent.
    Default: 100.0
  • Rotamer Library (rot_lib) type: string: Rotamer library to use for side-chain building.
    Default: Richardson2016
    Choices: Dunbrack, Richardson, Richardson2016
  • Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address
    Default: 64
  • Site Residue Entry (site_residue) type: string: Single site residue specification for APO structures. Format ‘name:num:insert:chain[:fragno:altloc]’, e.g. ‘ALA:325: :A’ (note the blank/whitespace insert code). The regex ‘.*’ notation can be used as a wildcard.
  • Size Used to Define Binding Site (site_size) type: decimal: Distance used to determine the size of the site.
    Default: 5.0
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Strict Ligand (strict_ligand) type: boolean: Option to only emit design units with ligands that match the ligand names (if any are provided)
    Default: True
  • Enforce Proline Positions in Loop Templates (strict_proline_match) type: boolean: Fuzzy matches in the loop database have to have proline in exact locations of sequence.
    Default: True
  • Strict Protonation Mode (strict_protonate) type: boolean: Option to fail prep if protons could not be added.
    Default: True
  • Superpose Design Units (superpose) type: boolean: Option to superpose DUs (if multiple), first onto the reference structure (if provided).
    Default: True
  • Superposition Method (superpose_method) type: string: Superposition method.
    Default: SiteSequence
    Choices: GlobalSequence, SiteSequence, DDMatrix, SSE, SiteHopper
  • Target Classification (target) type: string: Option to pick whether target is protein or nucleic acid component.
    Default: Protein
    Choices: Protein, Nucleic
  • Number to Transform (transform_threshold) type: integer: Number of loops to allow through the sidechain clash checker. Regardless of this number, all loops with a sequence identical to the target are processed.
    Default: 25
  • Output verbosity (verbosity) type: string: Verbosity level of the output.
    Default: warning
    Choices: info, warning, error, debug, ddebug
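
The Ligand Type defaults and the three override parameters (min_lig_atoms, max_lig_atoms, max_lig_residues) interact in a simple way: the ligand type selects a set of limits, and any override that is supplied replaces the corresponding value. The sketch below only illustrates that rule in plain Python. The names LigandLimits, LIG_TYPE_DEFAULTS, and resolve_ligand_limits are hypothetical and are not part of Spruce or Orion; the numbers are copied from the lig_type description above, and the final min/max sanity check is an added assumption.

```python
from typing import NamedTuple, Optional


class LigandLimits(NamedTuple):
    min_atoms: int
    max_atoms: int
    max_residues: int


# Defaults per lig_type, copied from the parameter description above.
LIG_TYPE_DEFAULTS = {
    "Small Molecule": LigandLimits(8, 100, 5),
    "Peptide": LigandLimits(8, 200, 2),
    "Macrocycle": LigandLimits(8, 250, 20),
    "Fragment": LigandLimits(2, 35, 5),
}


def resolve_ligand_limits(
    lig_type: str = "Small Molecule",
    min_lig_atoms: Optional[int] = None,
    max_lig_atoms: Optional[int] = None,
    max_lig_residues: Optional[int] = None,
) -> LigandLimits:
    """Hypothetical helper: start from the lig_type defaults, then apply overrides."""
    if lig_type not in LIG_TYPE_DEFAULTS:
        raise ValueError(f"unknown lig_type: {lig_type!r}")
    base = LIG_TYPE_DEFAULTS[lig_type]
    limits = LigandLimits(
        base.min_atoms if min_lig_atoms is None else min_lig_atoms,
        base.max_atoms if max_lig_atoms is None else max_lig_atoms,
        base.max_residues if max_lig_residues is None else max_lig_residues,
    )
    # Assumed sanity check; the documentation does not say how this case is handled.
    if limits.min_atoms > limits.max_atoms:
        raise ValueError("min_lig_atoms exceeds max_lig_atoms")
    return limits
```

For example, resolve_ligand_limits("Macrocycle", max_lig_atoms=300) keeps the Macrocycle values for min_atoms and max_residues and replaces only max_atoms.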

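The site_residue format ‘name:num:insert:chain[:fragno:altloc]’ can be split into its fields before it is passed to the cube, which helps catch a missing insert code early. The following is a minimal sketch in plain Python, not the parser Spruce uses. SiteResidue, parse_site_residue, and matches are hypothetical names, and two points are assumptions: that the optional fragno and altloc fields appear together (four or six fields), and that each field is treated as a regular expression matched against the whole value.

```python
import re
from typing import NamedTuple, Optional


class SiteResidue(NamedTuple):
    name: str
    num: str  # kept as text so that the '.*' wildcard can be used
    insert: str
    chain: str
    fragno: Optional[str] = None
    altloc: Optional[str] = None


def parse_site_residue(spec: str) -> SiteResidue:
    """Split 'name:num:insert:chain[:fragno:altloc]' into named fields."""
    fields = spec.split(":")
    # Assumption: fragno and altloc are given together or not at all.
    if len(fields) not in (4, 6):
        raise ValueError(f"expected 4 or 6 colon-separated fields, got {len(fields)}")
    return SiteResidue(*fields)


def matches(spec: SiteResidue, name: str, num: int, insert: str, chain: str) -> bool:
    """Assumed matching rule: every required field must fully match as a regex."""
    pairs = (
        (spec.name, name),
        (spec.num, str(num)),
        (spec.insert, insert),
        (spec.chain, chain),
    )
    return all(re.fullmatch(pattern, value) is not None for pattern, value in pairs)
```

With this sketch, parse_site_residue("ALA:325: :A") keeps the single space as the insert code, and a specification such as ".*:325: :.*" matches residue 325 of any name in any chain.
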
Field parameters

  • Extended Log Field (ext_log_field) type: Field Type: StringVec: Message extended log field
    Default: Extended Log Field
  • Log Field (log_field) type: Field Type: String: The field in which to store messages for the floe report
    Default: Log Field

Hardware Parameters

Machine hardware requirements
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800, Min: 256.0, Max: 8589934592
  • Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address
    Default: 64
  • Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU
    Default: 32
  • Max Backlog Wait (max_backlog_wait) type: integer: The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
    Default: 600, Min: 300
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0, Min: 128.0, Max: 8589934592
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0, Max: 16
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1, Min: 1, Max: 128
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”
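
The Min, Max, and Choices values listed above can be checked before a job is submitted. The sketch below is a plain-Python illustration under that assumption. It does not call any Orion API, and the names HARDWARE_BOUNDS, SPOT_POLICY_CHOICES, and check_hardware are hypothetical. The bounds are copied from this section; a missing bound (None) means the documentation states none.

```python
from typing import Any, Dict, List, Optional, Tuple

Bound = Tuple[Optional[float], Optional[float]]

# (min, max) per parameter, copied from the list above; None means "not documented".
HARDWARE_BOUNDS: Dict[str, Bound] = {
    "memory_mb": (256.0, 8589934592),
    "disk_space": (128.0, 8589934592),
    "cpu_count": (1, 128),
    "gpu_count": (None, 16),
    "max_backlog_wait": (300, None),
}

SPOT_POLICY_CHOICES = ("Allowed", "Preferred", "NotPreferred", "Prohibited", "Required")


def check_hardware(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: List[str] = []
    for name, value in params.items():
        if name == "spot_policy":
            if value not in SPOT_POLICY_CHOICES:
                problems.append(f"spot_policy: {value!r} is not a documented choice")
        elif name in HARDWARE_BOUNDS:
            low, high = HARDWARE_BOUNDS[name]
            if low is not None and value < low:
                problems.append(f"{name}: {value} is below the minimum {low}")
            if high is not None and value > high:
                problems.append(f"{name}: {value} is above the maximum {high}")
        # Parameters without documented bounds are left unchecked.
    return problems
```

For example, check_hardware({"cpu_count": 0, "gpu_count": 32}) reports two problems, while a dictionary that stays within the documented ranges returns an empty list.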

Metrics Parameters

Cube Metric Parameters
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected
    Choices: cpu, disk, memory, network

Parallel Spruce Prep

The parallel version adds these extra parameters.

  • Number of messages to distribute at a time (item_count) type: integer: The maximum number of messages to bundle together for a parallel cube.
    Default: 1, Min: 1, Max: 65535
  • Maximum Failures (max_failures) type: integer: The maximum number of times to attempt processing a work item
    Default: 10, Min: 1, Max: 100
  • Autoscale this Cube (autoscale) type: boolean: If True, let Orion manage the parallelism of this Cube
    Default: True
  • Maximum number of Cubes (max_parallel) type: integer: The maximum number of concurrently running copies of this Cube
    Default: 1000, Min: 1
  • Minimum number of Cubes (min_parallel) type: integer: The minimum number of concurrently running copies of this Cube
    Default: 0
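
To make the meaning of item_count and max_failures concrete, the sketch below mimics both in plain Python: records are grouped into bundles of at most item_count, and a work item is attempted at most max_failures times. This only illustrates the semantics described above and is not how Orion schedules or retries work. The function names bundle and process_with_retries are hypothetical; the range checks use the Min and Max values listed for each parameter.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def bundle(items: Iterable[T], item_count: int = 1) -> Iterator[List[T]]:
    """Yield lists of at most item_count items, preserving input order."""
    if not 1 <= item_count <= 65535:
        raise ValueError("item_count must be between 1 and 65535")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == item_count:
            yield batch
            batch = []
    if batch:
        # The final bundle may hold fewer than item_count items.
        yield batch


def process_with_retries(work: Callable[[T], R], item: T, max_failures: int = 10) -> R:
    """Call work(item) up to max_failures times; re-raise the last error."""
    if not 1 <= max_failures <= 100:
        raise ValueError("max_failures must be between 1 and 100")
    for attempt in range(1, max_failures + 1):
        try:
            return work(item)
        except Exception:
            if attempt == max_failures:
                raise
    raise AssertionError("unreachable")
```

With item_count set to 2, five records are split into bundles of sizes 2, 2, and 1. A larger item_count reduces per-message overhead at the cost of coarser work distribution, which is the usual trade-off when bundling work items.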