Prepare Biological Units
Generates Spruce-prepared biological units from the input structure stored on the input oechem.OERecord.
Calculation Parameters
- Add Interaction Hints (add_interactions) type: boolean: Option to add interactions to the design units. Default: True
- Add Style (add_style) type: boolean: Option to add style to the design units. Default: True
- Allow Cap Residue Truncation (allow_truncate) type: boolean: Option to allow a terminal residue to be converted to a cap, if the cap would otherwise clash. Default: True
- Alternate Location Handling Method (altloc) type: string: Option to pick the method of handling alternate locations. Default: Default, Choices: Primary, Enumerate, Default
- Loop Backbone Clash Threshold (bb_clash_threshold) type: decimal: Loops from the database where more than the threshold fraction of the backbone atoms clash are rejected. Default: 0.25
- Build C-Terminal Caps (build_cterm_caps) type: boolean: Option to cap broken C-termini in protein chains. Default: True
- Option to Build Disulfide Bridges (build_disulfidebridges) type: boolean: Allow the loop builder to build disulfide bridges during loop modeling (if possible). Default: True
- Build Missing Loops (build_loops) type: boolean: Option to build missing loops (if information is available to do so). Default: True
- Build N-Terminal Caps (build_nterm_caps) type: boolean: Option to cap broken N-termini in protein chains. Default: True
- Build Partial Sidechains (build_sidechains) type: boolean: Option to build missing or partial protein sidechains. Default: True
- Build Missing Tails (build_tails) type: boolean: Option to build missing tails (if information is available to do so). Default: False
- Loop Builder Include Crystal Packing (build_with_crystalpacking) type: boolean: Include packing residues when building loops. Default: False
- Assign Charges and Radii (charge_radii) type: boolean: Option to assign partial charges and radii. Default: True
- Add Cofactor Code(s) (cofactor_codes) type: string: Add uncommon or custom cofactor 3-letter codes.
- Collapse Non-Site Alternates (collapse_nonsite_alts) type: boolean: Option to deduplicate structures with different alts, if the alt locations are not near the binding site. Default: True
- CPUs (cpu_count) type: integer: The number of CPUs to run this cube with. Default: 1, Min: 1, Max: 128
- Loop Crop Length (crop_length) type: integer: Anchor residues on the protein to crop back for a better fit; results in longer loops being built. Default: 1
- Cube Metrics (cube_metrics) type: string: Set of metrics to be collected. Choices: cpu, disk, memory, network
- Delete Clashing Solvent (delete_clashing_solvent) type: boolean: Option to allow build steps to remove clashing solvent. Default: True
- Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0, Min: 128.0, Max: 8589934592
- Duplicate Removal (duplicate_removal) type: boolean: Option to deduplicate identical structures resulting from symmetry operations. Default: True
- Enumerate Cofactor Sites (enum_cofactors_sites) type: boolean: Option to generate individual design units based on the recognized cofactors. Default: False
- Add Excipient Code(s) (excipient_codes) type: string: Add uncommon or custom excipient 3-letter codes.
- Fix Backbone Atom Issues (fix_backbone) type: boolean: Option to fix backbone atom issues in protein chains. Default: True
- Generate Tautomers (generate_tautomers) type: boolean: Option to generate and use tautomers in the hydrogen network optimization. Default: True
- GPUs (gpu_count) type: integer: The number of GPUs to run this cube with. Default: 0, Max: 16
- Hetgroup Cluster Distance (het_group_nbr_dist) type: decimal: Distance between heterogens used to determine optimization clusters for protonation. Default: 3.5
- Include Solvent Accessible Surface Area Term (incl_SA_term) type: boolean: Include the solvent accessible surface area term when ranking the loops. Default: True
- Include Solvation (incl_solvation) type: boolean: Include a simple solvation model when building loops. Default: True
- Include Binding Site Grids (include_bsite_edens_grids) type: boolean: Include electron density and difference density maps around the binding site. Default: True
- Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated). Default: “”
- Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on.
- Ligand Type (lig_type) type: string: The type of ligand that is expected for the system. Affects the max/min atom counts and the max residue count (if applicable) for the ligand in the system. Overrides can be individually input. Defaults are as follows: Small Molecule: min_atoms=8, max_atoms=100, max_residues=5; Peptide: min_atoms=8, max_atoms=200, max_residues=20; Macrocycle: min_atoms=8, max_atoms=250, max_residues=20; Fragment: min_atoms=2, max_atoms=35, max_residues=5. Default: Small Molecule, Choices: Small Molecule, Peptide, Macrocycle, Fragment
- Add Ligand Smiles (ligand_metadata) type: string: Provide the ligand code and SMILES using a comma-separated CSV-style format (e.g. ‘BNZ, c1ccc(cc1)O’). The ligand code is used to identify the entry in the molecule. The SMILES is used to verify or remediate the connectivity and valence state of the ligand read from the PDB or MMCIF file format. Optional remaining CSV value(s) are ligand SMILES tautomers that SPRUCE will check during hydrogen network optimization (e.g. ‘BNZ, c1ccc(cc1)O, c1ccc(cc1)[O-]’). By default, SPRUCE generates tautomers from the provided SMILES; to test only the provided tautomers, turn off the Generate Tautomers parameter. This parameter is applied globally to all input structures for the floe.
- Ligand Name(s) (ligand_names) type: string: Format: 3-letter codes, e.g. ‘LIG’; for peptides, separate codes with dashes (e.g. ‘SER-VAL-TPO-ALA’).
- Add Lipid Code(s) (lipid_codes) type: string: Add uncommon or custom lipid 3-letter codes.
- Loop Clash Threshold (loop_clash_threshold) type: decimal: Loops from the database where more than the threshold fraction of the loop atoms clash, in addition to the clashing backbone atoms, are rejected. Default: 0.2
- Loop Anchor Atom Distance Buffer (loop_distance_buffer) type: decimal: Fuzzy matches in the loop database must have the correct distance between anchor atoms, +/- the buffer distance. Default: 1.0
- Loop Database File (loop_input_file) type: file_in: (Optional) A template loop database file; if not specified, the built-in database will be used.
- Make Packing Residues (make_pack_res) type: boolean: Generate packing residues from an asymmetric unit. Default: True
- Max Backlog Wait (max_backlog_wait) type: integer: The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated. Default: 600, Min: 300
- Maximum Atoms in Biological Unit (max_bu_atoms) type: integer: Option to limit the size of BUs processed based on number of atoms. Default: 50000
- Maximum Parts in Biological Unit (max_bu_parts) type: integer: Option to limit the size of BUs processed based on number of parts (chains). Default: 24
- Number of Loops to Minimize and Evaluate (max_eval_loops) type: integer: Maximum number of loops to connect and minimize. Default: 5
- Max Atoms for a Ligand (max_lig_atoms) type: integer: Override for the maximum number of heavy atoms in a molecule to be detected as a ligand.
- Max Residues for a Ligand (max_lig_residues) type: integer: Override for the maximum number of residues in a molecule to be detected as a ligand.
- Max System Atoms (max_system_atoms) type: integer: Maximum number of atoms in the system. Default: 50000
- Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800, Min: 256.0, Max: 8589934592
- Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds. Default: 60, Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
- Minimum Alignment Score for Biological Unit Extraction (min_align_score) type: integer: Option to specify the minimum sequence alignment score for biological unit extraction. Default: 200
- Min Atoms for a Ligand (min_lig_atoms) type: integer: Override for the minimum number of heavy atoms in a molecule to be detected as a ligand.
- Optimize Experimental Protons (opt_expt_protons) type: boolean: Option to optimize hydrogens assigned in the experiment. Default: False
- Loop Optimization Shell (opt_shell) type: decimal: Include atoms within this distance in the loop optimization; a larger distance results in slower optimizations. Default: 15.0
- Optimize Stage 1 Step/Residue Multiplier (opt_stage1_iter_multiplier) type: integer: Number of steps per number of residues in the loop for the first stage optimizer. Default: 5
- Optimize Stage 2 Step/Residue Multiplier (opt_stage2_iter_multiplier) type: integer: Number of steps per number of residues in the loop for the second stage optimizer. Default: 10
- Loop Optimization Tolerance (opt_tolerance) type: decimal: Tolerance for the loop optimization; smaller numbers result in slower optimizations. Default: 0.001
- Output All Biological Units (output_bio_designunits) type: boolean: Option to write all prepared biological units. These are intermediaries in SPRUCE’s processes to prepare and generate design units. Their use in other applications is limited, but they are valid inputs for e.g. plain MD simulations and cryptic pocket detection Floes. Default: False
- Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU. Default: 32
- Prefer Author BIOMT Records (pref_author_record) type: boolean: Option where the author BIOMT record is preferred over the software-generated one. Default: True
- Protonate (protonate) type: boolean: Option to add and optimize protons in the system. Default: True
- Restrict DUs to Reference Site Removal (restrict_to_refsite) type: boolean: Option to not generate design units with sites not matching the reference (if one is provided). Default: True
- Rotamer Coverage % (rot_coverage) type: decimal: Coverage of the rotamers returned from the library, in percent. Default: 100.0
- Rotamer Library (rot_lib) type: string: Rotamer library to use for side-chain building. Default: Richardson2016, Choices: Dunbrack, Richardson, Richardson2016
- Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address. Default: 64
- Site Residue Entry (site_residue) type: string: Single site residue specification for APO structures. Format: ‘RESNAME:RESNUM:ICODE:CHAINID[:FRAGNO:ALTLOC]’, e.g. ‘ALA:325: :A’ (note the blank/whitespace insert code). The regex ‘.*’ notation can be used as a wildcard.
- Size Used to Define Binding Site (site_size) type: decimal: Distance used to determine the size of the site. Default: 5.0
- Spot policy (spot_policy) type: string: Control cube placement on spot market instances. Default: Prohibited, Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
- Strict Ligand (strict_ligand) type: boolean: Option to only emit design units with ligands that match the ligand names (if any are provided). Default: True
- Enforce Proline Positions in Loop Templates (strict_proline_match) type: boolean: Fuzzy matches in the loop database must have proline in the exact locations of the sequence. Default: True
- Strict Protonation Mode (strict_protonate) type: boolean: Option to fail prep if protons could not be added. Default: True
- Superpose Design Units (superpose) type: boolean: Option to superpose DUs (if multiple), first onto the reference structure (if provided). Default: True
- Superposition Method (superpose_method) type: string: Superposition method. Default: SiteSequence, Choices: GlobalSequence, SiteSequence, DDMatrix, SSE, SiteHopper
- Target Classification (target) type: string: Option to pick whether the target is a protein or nucleic acid component. Default: Protein, Choices: Protein, Nucleic
- Number to Transform (transform_threshold) type: integer: Number of loops to allow through the sidechain clash checker. Regardless of this number, all loops with a sequence identical to the target are processed. Default: 25
- Output Verbosity (verbosity) type: string: Verbosity level. Default: warning, Choices: info, warning, error, debug, ddebug
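The string formats described for ligand_metadata and site_residue, and the lig_type defaults with their individual overrides, can be illustrated with a small sketch. The helpers below (parse_ligand_metadata, parse_site_residue, ligand_limits) and their return types are hypothetical and are not part of SPRUCE or Orion; they only show how the documented formats decompose, and assume the SMILES strings themselves contain no commas.

```python
from typing import Dict, List, NamedTuple, Optional


class LigandMetadata(NamedTuple):
    code: str             # ligand code, e.g. "BNZ"
    smiles: str           # SMILES used to verify the ligand
    tautomers: List[str]  # optional extra tautomer SMILES


class SiteResidue(NamedTuple):
    resname: str
    resnum: str           # kept as text so the '.*' wildcard survives
    icode: str            # may be a single blank character
    chainid: str
    fragno: Optional[str]
    altloc: Optional[str]


# Defaults per ligand type, copied from the lig_type description.
LIG_TYPE_DEFAULTS: Dict[str, Dict[str, int]] = {
    "Small Molecule": {"min_atoms": 8, "max_atoms": 100, "max_residues": 5},
    "Peptide": {"min_atoms": 8, "max_atoms": 200, "max_residues": 20},
    "Macrocycle": {"min_atoms": 8, "max_atoms": 250, "max_residues": 20},
    "Fragment": {"min_atoms": 2, "max_atoms": 35, "max_residues": 5},
}


def parse_ligand_metadata(value: str) -> LigandMetadata:
    """Split 'CODE, SMILES[, TAUTOMER, ...]' into its parts (hypothetical helper)."""
    parts = [p.strip() for p in value.split(",")]
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("expected at least 'CODE, SMILES'")
    return LigandMetadata(parts[0], parts[1], [p for p in parts[2:] if p])


def parse_site_residue(value: str) -> SiteResidue:
    """Split 'RESNAME:RESNUM:ICODE:CHAINID[:FRAGNO:ALTLOC]' (hypothetical helper)."""
    fields = value.split(":")  # no stripping: a blank insert code is meaningful
    if len(fields) not in (4, 6):
        raise ValueError("expected 4 fields, or 6 with FRAGNO and ALTLOC")
    fragno, altloc = (fields[4], fields[5]) if len(fields) == 6 else (None, None)
    return SiteResidue(fields[0], fields[1], fields[2], fields[3], fragno, altloc)


def ligand_limits(lig_type: str = "Small Molecule", **overrides: Optional[int]) -> Dict[str, int]:
    """Start from the lig_type defaults and apply any individual overrides."""
    limits = dict(LIG_TYPE_DEFAULTS[lig_type])
    for name, value in overrides.items():
        if name not in limits:
            raise KeyError(f"unknown limit: {name}")
        if value is not None:
            limits[name] = value
    return limits
```

For example, parse_site_residue("ALA:325: :A") keeps the single-space insert code intact, and ligand_limits("Fragment", max_atoms=40) raises only the atom ceiling while leaving the other Fragment defaults unchanged.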
Field parameters
- Extended Log Field (ext_log_field) type: Field Type: StringVec: Message extended log field. Default: Extended Log Field
- Log Field (log_field) type: Field Type: String: The field to store messages to the floe report. Default: Log Field
Hardware Parameters
- Machine hardware requirements
- Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800, Min: 256.0, Max: 8589934592
- Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address. Default: 64
- Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU. Default: 32
- Max Backlog Wait (max_backlog_wait) type: integer: The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated. Default: 600, Min: 300
- Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0, Min: 128.0, Max: 8589934592
- GPUs (gpu_count) type: integer: The number of GPUs to run this cube with. Default: 0, Max: 16
- CPUs (cpu_count) type: integer: The number of CPUs to run this cube with. Default: 1, Min: 1, Max: 128
- Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on.
- Spot policy (spot_policy) type: string: Control cube placement on spot market instances. Default: Prohibited, Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
- Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated). Default: “”
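Several hardware parameters state Min and Max bounds. As a rough illustration, the sketch below checks a set of proposed values against those documented bounds before a job is submitted. The HARDWARE_BOUNDS table and the check_hardware function are hypothetical conveniences, not an Orion API; only the numeric limits come from the list above.

```python
from typing import Dict, List, Optional, Tuple

Number = float

# (min, max) per parameter, taken from the hardware list; None means no stated bound.
HARDWARE_BOUNDS: Dict[str, Tuple[Optional[Number], Optional[Number]]] = {
    "memory_mb": (256.0, 8589934592),
    "disk_space": (128.0, 8589934592),
    "cpu_count": (1, 128),
    "gpu_count": (None, 16),
    "max_backlog_wait": (300, None),
}


def check_hardware(values: Dict[str, Number]) -> List[str]:
    """Return one message per value that falls outside its documented bounds."""
    problems: List[str] = []
    for name, value in values.items():
        if name not in HARDWARE_BOUNDS:
            continue  # parameters without documented bounds are not checked
        low, high = HARDWARE_BOUNDS[name]
        if low is not None and value < low:
            problems.append(f"{name}={value} is below the minimum {low}")
        if high is not None and value > high:
            problems.append(f"{name}={value} is above the maximum {high}")
    return problems


if __name__ == "__main__":
    for message in check_hardware({"cpu_count": 0, "gpu_count": 32, "memory_mb": 4096}):
        print(message)
```

Returning a list of messages rather than raising on the first failure lets all out-of-range values be reported at once.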
Metrics Parameters
- Cube Metric Parameters
- Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds. Default: 60, Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
- Cube Metrics (cube_metrics) type: string: Set of metrics to be collected. Choices: cpu, disk, memory, network
Parallel Prepare Biological Units
The parallel version adds these extra parameters.
- Number of messages to distribute at a time (item_count) type: integer: The maximum number of messages to bundle together for a parallel cube. Default: 1, Min: 1, Max: 65535
- Maximum Failures (max_failures) type: integer: The maximum number of times to attempt processing a work item. Default: 10, Min: 1, Max: 100
- Autoscale this Cube (autoscale) type: boolean: If True, let Orion manage the parallelism of this Cube. Default: True
- Maximum number of Cubes (max_parallel) type: integer: The maximum number of concurrently running copies of this Cube. Default: 1000, Min: 1
- Minimum number of Cubes (min_parallel) type: integer: The minimum number of concurrently running copies of this Cube. Default: 0
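To make the effect of item_count concrete, the sketch below groups a stream of messages into bundles of at most item_count items. It only illustrates the idea of bundling; how Orion actually distributes work to parallel cubes is not described here, and the bundle function is a hypothetical name.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def bundle(messages: Iterable[T], item_count: int = 1) -> Iterator[List[T]]:
    """Yield lists of at most item_count messages, preserving order."""
    if not 1 <= item_count <= 65535:  # documented Min and Max for item_count
        raise ValueError("item_count must be between 1 and 65535")
    iterator = iter(messages)
    while True:
        chunk = list(islice(iterator, item_count))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # Five records with item_count=2 give three bundles; the last is partial.
    for group in bundle(["rec1", "rec2", "rec3", "rec4", "rec5"], item_count=2):
        print(group)
```

With the default item_count of 1, each bundle holds a single message, so raising the value trades finer-grained distribution for fewer, larger units of work.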