SPRUCE - Protein Preparation

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Role-based/Computational Chemist

  • Product-based/SPRUCE

  • Solution-based/Virtual-screening/Target Preparation

  • Solution-based/Hit to Lead/Target Preparation

  • Task-based/Target Prep & Analysis/Protein Preparation

Description

This floe uses Spruce to prepare biomolecules for downstream modeling applications in Orion, such as docking, POSIT, GamePlan, or short-trajectory MD, by generating design units.

At least one input is required to run this floe. Inputs can come from one or both of the following types: (1) PDB or mmCIF files, accompanied by MTZ files containing electron density maps if available; (2) valid PDB codes. Multiple PDB codes may be submitted as a comma-separated string.

An optional reference for biological unit extraction may be provided. This reference can be a dataset from a previous Classic Spruce: Prep run. Alternatively, the reference can be a PDB code, or a PDB file with an MTZ map, from which a new reference structure will be prepared. A reference structure prepared in this way uses different default Spruce preparation parameters (see Reference Structure Prep Parameters), so it may not match the settings used for the input structures. It is therefore recommended that the reference structure also be included in the floe input.

If a ligand cannot be detected during the run, consider specifying the ligand residue name (ligand_names) or increasing the optional “Max Residues for a Ligand” parameter (max_lig_residues). If this is a known apo structure, you can instead define a residue in the binding site with the Site Residue Entry parameter (site_residue).

You can read more about Spruce in the toolkit documentation.

Promoted Parameters

Title in user interface (promoted name)

Reference Structure Inputs

Optional Reference DU Dataset (ref_dataset_in): If the reference dataset contains multiple design units, only the first is read.

  • Type: data_source

Optional PDB Code for reference DU (ref_code_cube_in): PDB code to generate a reference design unit from

  • Type: string

Optional PDB File for reference DU (ref_pdb_file_cube_in): PDB file to generate a reference design unit from

  • Type: file_in

Optional MTZ File for reference DU (ref_mtz_file_cube_in): MTZ file to generate a reference design unit from

  • Type: file_in

Reference Dataset (ref_data_out): Dataset for the reference design unit, if one is generated as part of the floe.

  • Type: dataset_out

  • Default: Reference structure dataset

Reference Structure Prep Parameters

Build missing loops (ref_build_loops): Option to build missing loops (if information is available to do so)

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Build Missing Tails (ref_build_tails): Option to build missing tails (if information is available to do so)

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Ligand Name(s) (ref_ligand_names): Three-letter residue codes, e.g. ‘LIG’; for peptides, separate the codes with dashes (e.g. ‘SER-VAL-TPO-ALA’).

  • Type: string
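The ligand name format can be illustrated with a small Python sketch. The helper `split_ligand_name` is hypothetical (it is not an Orion or OpenEye toolkit function); it only splits a peptide ligand name on dashes and checks each part against the documented code format. Accepting 1 to 3 uppercase letters or digits per code is an assumption made for the sketch.

```python
import re

# Hypothetical helper for illustration only; not an Orion or OpenEye API.
# Assumption: each residue code is 1-3 uppercase letters or digits.
RESIDUE_CODE = re.compile(r"[A-Z0-9]{1,3}")


def split_ligand_name(name: str) -> list:
    """Split 'LIG' or a peptide name such as 'SER-VAL-TPO-ALA' into codes."""
    codes = [part.strip().upper() for part in name.split("-")]
    for code in codes:
        if RESIDUE_CODE.fullmatch(code) is None:
            raise ValueError(f"invalid residue code {code!r} in {name!r}")
    return codes


print(split_ligand_name("LIG"))              # ['LIG']
print(split_ligand_name("SER-VAL-TPO-ALA"))  # ['SER', 'VAL', 'TPO', 'ALA']
```

An empty part, as in ‘SER--VAL’, is rejected rather than silently dropped.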

Strict Ligand (ref_strict_ligand): Option to only emit design units with ligands that match the ligand names (if any are provided)

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Ligand Type (ref_lig_type): The type of ligand that is expected for the system. Affects the max/min atom counts and the max residue count (if applicable) for the ligand in the system. Overrides can be input individually. Defaults are as follows: Small Molecule: min_atoms=8, max_atoms=100, max_residues=5; Peptide: min_atoms=8, max_atoms=200, max_residues=2; Macrocycle: min_atoms=8, max_atoms=250, max_residues=20; Fragment: min_atoms=2, max_atoms=35, max_residues=5

  • Required

  • Type: string

  • Default: Small Molecule

  • Choices: [‘Small Molecule’, ‘Peptide’, ‘Macrocycle’, ‘Fragment’]

Loop Builder Parameters

Build missing loops (build_loops): Option to build missing loops (if information is available to do so)

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Build Missing Tails (build_tails): Option to build missing tails (if information is available to do so)

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Inputs

Input structure (PDB/MMCIF) files (structure_inp_files):

  • Type: file_in

Input electron density map (MTZ) files (mtz_files): Each map file name must have the same root as its associated structure (PDB/MMCIF) file. If maps are provided, the floe ensures that each density map is paired with a coordinate file.

  • Type: file_in
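The root-name pairing rule can be sketched in Python with `pathlib`. The function `pair_maps` is hypothetical and is not how the floe is implemented; treating a map with no matching coordinate file as an error is an assumption about what "paired" means here.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical helper for illustration only; not the floe's implementation.


def pair_maps(structure_files: List[str], mtz_files: List[str]) -> Dict[str, Optional[str]]:
    """Pair each structure file with the MTZ file sharing its root name."""
    maps_by_root = {}
    for mtz in mtz_files:
        root = Path(mtz).stem  # '1abc.mtz' -> '1abc'
        if root in maps_by_root:
            raise ValueError(f"more than one map with root {root!r}")
        maps_by_root[root] = mtz

    structure_roots = {Path(s).stem for s in structure_files}
    # Assumption: a map whose root matches no structure file is an error.
    orphans = sorted(m for root, m in maps_by_root.items() if root not in structure_roots)
    if orphans:
        raise ValueError(f"no structure file for map(s): {orphans}")

    # Structures without a map are allowed, since maps are optional.
    return {s: maps_by_root.get(Path(s).stem) for s in structure_files}


print(pair_maps(["1abc.pdb", "2xyz.cif"], ["1abc.mtz"]))
# {'1abc.pdb': '1abc.mtz', '2xyz.cif': None}
```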

PDB code(s) to download (input_codes): Separate multiple codes with a (default) comma delimiter, e.g. ‘1ABC, DEF2, G3HI’.

  • Type: string
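A minimal Python sketch of how a comma-delimited code string might be split into individual codes. `parse_pdb_codes` is a hypothetical helper; stripping whitespace, upper-casing, and dropping duplicates are assumptions, not documented floe behavior.

```python
from typing import List

# Hypothetical helper for illustration only; not an Orion API.


def parse_pdb_codes(text: str, delimiter: str = ",") -> List[str]:
    """Split a delimited string of PDB codes into clean, unique codes."""
    codes = []
    for raw in text.split(delimiter):
        code = raw.strip().upper()  # assumption: codes are case-insensitive
        if code and code not in codes:  # skip blanks and repeats, keep order
            codes.append(code)
    return codes


print(parse_pdb_codes("1ABC, DEF2, G3HI"))  # ['1ABC', 'DEF2', 'G3HI']
```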

Outputs

Output Dataset (dataset_data_out): Output dataset to which the prepared design units are written.

  • Required

  • Type: dataset_out

  • Default: Spruce_prep_dataset

Output Dataset (biodu_data_cube): Output dataset to which the biological-unit design units are written (see output_bio_designunits).

  • Required

  • Type: dataset_out

  • Default: Spruce_biodu_dataset

Output Dataset (failed_data_out): Output dataset to which failed preparations are written.

  • Required

  • Type: dataset_out

  • Default: Failed_Spruce_prep_dataset

Ligand Parameters

Ligand Name(s) (ligand_names): Three-letter residue codes, e.g. ‘LIG’; for peptides, separate the codes with dashes (e.g. ‘SER-VAL-TPO-ALA’).

  • Type: string

Strict Ligand (strict_ligand): Option to only emit design units with ligands that match the ligand names (if any are provided)

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Ligand Type (lig_type): The type of ligand that is expected for the system. Affects the max/min atom counts and the max residue count (if applicable) for the ligand in the system. Each value can be overridden individually (min_lig_atoms, max_lig_atoms, max_lig_residues). Defaults are as follows: Small Molecule: min_atoms=8, max_atoms=100, max_residues=5; Peptide: min_atoms=8, max_atoms=200, max_residues=2; Macrocycle: min_atoms=8, max_atoms=250, max_residues=20; Fragment: min_atoms=2, max_atoms=35, max_residues=5

  • Required

  • Type: string

  • Default: Small Molecule

  • Choices: [‘Small Molecule’, ‘Peptide’, ‘Macrocycle’, ‘Fragment’]
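The per-type defaults and the individual overrides (min_lig_atoms, max_lig_atoms, and max_lig_residues under General Spruce Parameters) combine roughly as in this Python sketch. The names `resolve_ligand_limits` and `is_ligand_candidate` are hypothetical, and treating the limits as inclusive bounds on heavy-atom and residue counts is an assumption.

```python
from typing import Dict, Optional

# Per-type defaults as listed in the lig_type description.
LIGAND_TYPE_DEFAULTS = {
    "Small Molecule": {"min_atoms": 8, "max_atoms": 100, "max_residues": 5},
    "Peptide": {"min_atoms": 8, "max_atoms": 200, "max_residues": 2},
    "Macrocycle": {"min_atoms": 8, "max_atoms": 250, "max_residues": 20},
    "Fragment": {"min_atoms": 2, "max_atoms": 35, "max_residues": 5},
}


def resolve_ligand_limits(
    lig_type: str = "Small Molecule",
    min_lig_atoms: Optional[int] = None,
    max_lig_atoms: Optional[int] = None,
    max_lig_residues: Optional[int] = None,
) -> Dict[str, int]:
    """Start from the per-type defaults, then apply any individual overrides."""
    limits = dict(LIGAND_TYPE_DEFAULTS[lig_type])  # copy; keep defaults intact
    overrides = {
        "min_atoms": min_lig_atoms,
        "max_atoms": max_lig_atoms,
        "max_residues": max_lig_residues,
    }
    for key, value in overrides.items():
        if value is not None:
            limits[key] = value
    return limits


def is_ligand_candidate(heavy_atoms: int, residues: int, limits: Dict[str, int]) -> bool:
    """Assumption: limits are inclusive bounds on heavy atoms and residue count."""
    return (
        limits["min_atoms"] <= heavy_atoms <= limits["max_atoms"]
        and residues <= limits["max_residues"]
    )


limits = resolve_ligand_limits("Small Molecule", max_lig_residues=8)
print(limits)  # {'min_atoms': 8, 'max_atoms': 100, 'max_residues': 8}
print(is_ligand_candidate(heavy_atoms=30, residues=6, limits=limits))  # True
```

With the Small Molecule default of max_residues=5, the same six-residue molecule would not qualify, which is the situation the max_lig_residues override addresses.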

Un-liganded Structure Parameters

Enumerate Pockets (enum_pocket): Option to enumerate pockets when no ligand is found

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Site Residue Entry (site_residue): Single site residue specification for apo structures. Format: ‘name:num:insert:chain[:fragno:altloc]’, e.g. ‘ALA:325: :A’ (note the blank, whitespace-only insert code). The regex notation ‘.*’ can be used as a wildcard.

  • Type: string
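The site residue format can be sketched in Python as follows. `parse_site_residue` and `matches_site_residue` are hypothetical helpers; comparing every field with `re.fullmatch`, so that a plain value must match exactly and ‘.*’ matches anything, is an assumption about how Spruce applies the wildcard.

```python
import re
from typing import Dict

# Hypothetical helpers for illustration only; not the Spruce API.
FIELDS = ("name", "num", "insert", "chain", "fragno", "altloc")


def parse_site_residue(spec: str) -> Dict[str, str]:
    """Parse 'name:num:insert:chain[:fragno:altloc]' into named fields."""
    parts = spec.split(":")
    if len(parts) not in (4, 6):  # fragno and altloc are optional together
        raise ValueError(f"expected 4 or 6 ':'-separated fields, got {spec!r}")
    return dict(zip(FIELDS, parts))


def matches_site_residue(spec: str, residue: Dict[str, object]) -> bool:
    """True if every field in spec fully matches the residue's value as a regex."""
    for field, pattern in parse_site_residue(spec).items():
        value = str(residue.get(field, ""))
        if re.fullmatch(pattern, value) is None:
            return False
    return True


ala = {"name": "ALA", "num": 325, "insert": " ", "chain": "A"}
print(parse_site_residue("ALA:325: :A"))
# {'name': 'ALA', 'num': '325', 'insert': ' ', 'chain': 'A'}
print(matches_site_residue("ALA:325: :A", ala))  # True
print(matches_site_residue(".*:325: :.*", ala))  # True
print(matches_site_residue("ALA:326: :A", ala))  # False
```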

Output All Biological Units (output_bio_designunits): Option to write all biological design units. These are intermediates and should not be used for other applications.

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

General Spruce Parameters

Add Interaction Hints (add_interactions): Option to add interaction hints to the design units.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Add Style (add_style): Option to add style to the design units.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Allow Cap Residue Truncation (allow_truncate): Option to allow a terminal residue to be converted to a cap if the cap would otherwise clash.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Alternate Location Handling Method (altloc): Option to pick method of handling alternate locations.

  • Required

  • Type: string

  • Default: Default

  • Choices: [‘Primary’, ‘Enumerate’, ‘Default’]

Build C-Terminal Caps (build_cterm_caps): Option to cap broken C-termini in protein chains.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Build N-Terminal Caps (build_nterm_caps): Option to cap broken N-termini in protein chains.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Build Partial Sidechains (build_sidechains): Option to build missing or partial protein sidechains.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Assign Charges and Radii (charge_radii): Option to assign partial charges and radii.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Add Cofactor Code(s) (cofactor_codes): Add uncommon, or custom, cofactor 3-letter codes.

  • Type: string

Collapse Non-Site Alternates (collapse_nonsite_alts): Option to deduplicate structures that differ only in alternate locations, if those alternate locations are not near the binding site.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Delete Clashing Solvent (delete_clashing_solvent): Option to allow build steps to remove clashing solvent.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Duplicate Removal (duplicate_removal): Option to deduplicate identical structures resulting from symmetry operations.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Enumerate Cofactor Sites (enum_cofactors_sites): Option to generate individual design units based on the recognized cofactors.

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Add Excipient Code(s) (excipient_codes): Add uncommon, or custom, excipient 3-letter codes.

  • Type: string

Fix Backbone Atom Issues (fix_backbone): Option to fix backbone atom issues in protein chains.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Generate Tautomers (generate_tautomers): Option to generate and use tautomers in the hydrogen network optimization.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Hetgroup Cluster Distance (het_group_nbr_dist): Distance between heterogens used to determine optimization clusters for protonation.

  • Required

  • Type: decimal

  • Default: 3.5
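Distance-based clustering of heterogens can be illustrated with single-linkage clustering, sketched below in Python. This is a stand-in chosen for illustration, not Spruce's documented algorithm: it assumes each heterogen is a list of atom coordinates in angstroms and that two heterogens are neighbors when their closest atoms are within het_group_nbr_dist. `cluster_heterogens` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

# Illustrative stand-in, not Spruce's algorithm. Assumptions: coordinates in
# angstroms; two heterogens are neighbors if their closest atoms are within cutoff.


def min_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Smallest atom-atom distance between two heterogens."""
    return min(math.dist(p, q) for p in a for q in b)


def cluster_heterogens(groups: List[Sequence[Point]], cutoff: float = 3.5) -> List[List[int]]:
    """Single-linkage clustering of heterogens using union-find."""
    parent = list(range(len(groups)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            if min_distance(groups[i], groups[j]) <= cutoff:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(groups)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


hets = [[(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)], [(6.0, 0.0, 0.0)], [(20.0, 0.0, 0.0)]]
print(cluster_heterogens(hets))  # [[0, 1, 2], [3]]
```

Single linkage chains neighbors together, so heterogens 0 and 2 share a cluster even though they are 6 apart.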

Include Binding Site Grids (include_bsite_edens_grids): Include electron density and difference density maps around the binding site

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Add Ligand Smiles (ligand_metadata): Add ligand SMILES and 3-letter codes, e.g. ‘c1ccccc1 BNZ’.

  • Type: string
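A small Python sketch of the ‘SMILES CODE’ entry format. `parse_ligand_metadata` is hypothetical, and it handles a single entry only, because the documentation does not say how multiple entries are separated.

```python
from typing import Tuple

# Hypothetical helper for illustration only; handles one entry.


def parse_ligand_metadata(entry: str) -> Tuple[str, str]:
    """Split a 'SMILES CODE' entry such as 'c1ccccc1 BNZ' into its two parts."""
    parts = entry.split()  # a SMILES string itself contains no whitespace
    if len(parts) != 2:
        raise ValueError(f"expected 'SMILES CODE', got {entry!r}")
    smiles, code = parts
    return smiles, code.upper()


print(parse_ligand_metadata("c1ccccc1 BNZ"))  # ('c1ccccc1', 'BNZ')
```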

Add Lipid Code(s) (lipid_codes): Add uncommon, or custom, lipid 3-letter codes.

  • Type: string

Make Packing Residues (make_pack_res): Generate packing residues from an asymmetric unit.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Maximum Atoms in Biological Unit (max_bu_atoms): Option to limit the size of BUs processed based on number of atoms.

  • Required

  • Type: integer

  • Default: 50000

Maximum Parts in Biological Unit (max_bu_parts): Option to limit the size of BUs processed based on number of parts (chains).

  • Required

  • Type: integer

  • Default: 24

Max Atoms for a Ligand (max_lig_atoms): Override for the maximum number of heavy atoms in a molecule to be detected as a ligand.

  • Type: integer

Max Residues for a Ligand (max_lig_residues): Override for the maximum number of residues in a molecule to be detected as a ligand.

  • Type: integer

Max System Atoms (max_system_atoms): Maximum number of atoms in the system.

  • Required

  • Type: integer

  • Default: 50000

Minimum Alignment Score for Biological Unit Extraction (min_align_score): Option to specify minimum sequence alignment score for biological unit extraction.

  • Required

  • Type: integer

  • Default: 200

Min Atoms for a Ligand (min_lig_atoms): Override for the minimum number of heavy atoms in a molecule to be detected as a ligand.

  • Type: integer

Optimize Experimental Protons (opt_expt_protons): Option to optimize hydrogens that were assigned experimentally.

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Prefer Author BIOMT Records (pref_author_record): Option to prefer the author BIOMT record over the software-generated one.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Protonate (protonate): Option to add and optimize protons in the system.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Restrict DUs to Reference Site Removal (restrict_to_refsite): Option to skip generating design units whose sites do not match the reference site (if a reference is provided).

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Rotamer Coverage % (rot_coverage): Coverage of the rotamers returned from the library in percent.

  • Required

  • Type: decimal

  • Default: 100.0

Rotamer Library (rot_lib): Rotamer library to use for side-chain building.

  • Required

  • Type: string

  • Default: Richardson2016

  • Choices: [‘Dunbrack’, ‘Richardson’, ‘Richardson2016’]

Size Used to Define Binding Site (site_size): Distance used to determine the size of the site.

  • Required

  • Type: decimal

  • Default: 5.0
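How a distance cutoff defines a binding site can be sketched in Python: a residue is kept if any of its atoms lies within site_size of any ligand atom. Both this any-atom rule and the use of angstroms are assumptions, and `site_residues` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]

# Hypothetical helper; the any-atom rule and angstrom units are assumptions.


def site_residues(
    ligand: Sequence[Point],
    residues: Dict[str, Sequence[Point]],
    site_size: float = 5.0,
) -> List[str]:
    """Return residues with any atom within site_size of any ligand atom."""
    return [
        name
        for name, atoms in residues.items()
        if any(math.dist(p, q) <= site_size for p in atoms for q in ligand)
    ]


ligand = [(0.0, 0.0, 0.0)]
residues = {
    "ALA:1: :A": [(4.0, 0.0, 0.0)],                   # 4.0 from the ligand
    "GLY:2: :A": [(6.0, 0.0, 0.0)],                   # 6.0, outside the site
    "SER:3: :A": [(9.0, 0.0, 0.0), (1.0, 2.0, 2.0)],  # second atom is 3.0 away
}
print(site_residues(ligand, residues))  # ['ALA:1: :A', 'SER:3: :A']
```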

Strict Protonation Mode (strict_protonate): Option to fail prep if protons could not be added.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Superpose Design Units (superpose): Option to superpose the design units (if there are multiple); if a reference structure is provided, they are superposed onto it first.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Superposition Method (superpose_method): Superposition method.

  • Required

  • Type: string

  • Default: SiteSequence

  • Choices: [‘GlobalSequence’, ‘SiteSequence’, ‘DDMatrix’, ‘SSE’, ‘SiteHopper’]

Target Classification (target): Option to specify whether the target is a protein or a nucleic acid component.

  • Required

  • Type: string

  • Default: Protein

  • Choices: [‘Protein’, ‘Nucleic’]

General Loop Builder Parameters

Loop Backbone Clash Threshold (bb_clash_threshold): Loops from the database are rejected if more than this fraction of their backbone atoms clash.

  • Required

  • Type: decimal

  • Default: 0.25

Option to Build Disulfide Bridges (build_disulfidebridges): Allow the loop builder to build disulfide bridges during loop modeling (if possible).

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Loop Builder Include Crystal Packing (build_with_crystalpacking): Include packing residues when building loops.

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Loop Crop Length (crop_length): Number of anchor residues on the protein to crop back for a better fit; larger values result in longer loops being built.

  • Required

  • Type: integer

  • Default: 1

Include Solvent Accessible Surface Area Term (incl_SA_term): Include solvent accessible surface area term when ranking the loops.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Include Solvation (incl_solvation): Include simple solvation model when building loops.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Loop Clash Threshold (loop_clash_threshold): In addition to the backbone clash check, loops from the database are rejected if more than this fraction of the loop's atoms clash.

  • Required

  • Type: decimal

  • Default: 0.2
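The two clash thresholds act as successive filters on database loops; a Python sketch follows. `accept_loop` is hypothetical, and computing the second fraction over all loop atoms is an assumption, since the loop_clash_threshold description does not define the denominator precisely. Following the wording "more than the threshold fraction", a fraction exactly equal to a threshold is accepted.

```python
# Hypothetical helper for illustration only; not the loop builder's API.


def accept_loop(
    bb_clashing: int,
    bb_total: int,
    loop_clashing: int,
    loop_total: int,
    bb_clash_threshold: float = 0.25,
    loop_clash_threshold: float = 0.2,
) -> bool:
    """Apply the backbone clash filter, then the whole-loop clash filter."""
    if bb_clashing / bb_total > bb_clash_threshold:
        return False  # too large a fraction of backbone atoms clash
    # Assumption: the second fraction is taken over all loop atoms.
    return loop_clashing / loop_total <= loop_clash_threshold


print(accept_loop(2, 20, 3, 40))   # True: 0.10 and 0.075 pass
print(accept_loop(6, 20, 3, 40))   # False: 0.30 of backbone atoms clash
print(accept_loop(2, 20, 10, 40))  # False: 0.25 of loop atoms clash
```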

Loop Anchor Atom Distance Buffer (loop_distance_buffer): Fuzzy matches in the loop database must reproduce the distance between anchor atoms to within ± this buffer distance.

  • Required

  • Type: decimal

  • Default: 1.0

Loop Database File (loop_input_file): (Optional) A template loop database file; if not specified, the built-in database is used.

  • Type: file_in

Number of Loops to Minimize and Evaluate (max_eval_loops): Maximum number of loops to connect and minimize.

  • Required

  • Type: integer

  • Default: 5

Loop Optimization Shell (opt_shell): Atoms within this distance are included in the loop optimization; larger distances result in slower optimizations.

  • Required

  • Type: decimal

  • Default: 15.0

Optimize Stage 1 Step/Residue Multiplier (opt_stage1_iter_multiplier): Number of first-stage optimizer steps per residue in the loop.

  • Required

  • Type: integer

  • Default: 5

Optimize Stage 2 Step/Residue Multiplier (opt_stage2_iter_multiplier): Number of second-stage optimizer steps per residue in the loop.

  • Required

  • Type: integer

  • Default: 10
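The two multipliers scale the optimizer effort with loop length. A Python sketch of the arithmetic, assuming each stage's step count is simply the multiplier times the number of loop residues (`optimizer_steps` is a hypothetical helper):

```python
from typing import Tuple

# Hypothetical helper; assumes steps = multiplier x number of loop residues.


def optimizer_steps(
    n_loop_residues: int,
    opt_stage1_iter_multiplier: int = 5,
    opt_stage2_iter_multiplier: int = 10,
) -> Tuple[int, int]:
    """Return (stage 1 steps, stage 2 steps) for a loop of the given length."""
    if n_loop_residues < 1:
        raise ValueError("a loop needs at least one residue")
    return (
        opt_stage1_iter_multiplier * n_loop_residues,
        opt_stage2_iter_multiplier * n_loop_residues,
    )


print(optimizer_steps(8))  # (40, 80)
```

With the defaults, an 8-residue loop gets 40 first-stage and 80 second-stage steps, so doubling the loop length doubles both.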

Loop Optimization Tolerance (opt_tolerance): Tolerance for the loop optimization; smaller values result in slower optimizations.

  • Required

  • Type: decimal

  • Default: 0.001

Enforce Proline Positions in Loop Templates (strict_proline_match): Fuzzy matches in the loop database must have prolines at exactly the same sequence positions as the target loop.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]
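The anchor distance buffer (loop_distance_buffer) and the proline rule (strict_proline_match) can be read as two filters on candidate loop templates. In this Python sketch, `template_matches` is hypothetical, sequences are one-letter strings, and requiring the template and target loops to have the same length is an assumption.

```python
# Hypothetical helper for illustration only; sequences are one-letter strings.


def template_matches(
    target_seq: str,
    template_seq: str,
    target_anchor_dist: float,
    template_anchor_dist: float,
    loop_distance_buffer: float = 1.0,
    strict_proline_match: bool = True,
) -> bool:
    """Check a candidate loop template against the target gap."""
    if len(template_seq) != len(target_seq):
        return False  # assumption: template must span the same number of residues
    if abs(template_anchor_dist - target_anchor_dist) > loop_distance_buffer:
        return False  # anchor geometry differs by more than the buffer
    if strict_proline_match:
        target_pro = [i for i, aa in enumerate(target_seq) if aa == "P"]
        template_pro = [i for i, aa in enumerate(template_seq) if aa == "P"]
        return target_pro == template_pro
    return True


print(template_matches("GAPSG", "GKPTG", 6.0, 6.5))  # True
print(template_matches("GAPSG", "GKATG", 6.0, 6.5))  # False: proline missing
print(template_matches("GAPSG", "GKATG", 6.0, 6.5, strict_proline_match=False))  # True
```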

Number to Transform (transform_threshold): Number of loops allowed through the side-chain clash checker. Regardless of this number, all loops whose sequence is identical to the target are processed.

  • Required

  • Type: integer

  • Default: 25
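A Python sketch of the selection rule described for transform_threshold, assuming candidate loops arrive already ranked best first; `loops_to_transform` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

# Hypothetical helper; assumes ranked_loops is ordered best first.


def loops_to_transform(
    ranked_loops: Sequence[Tuple[str, str]],
    target_seq: str,
    transform_threshold: int = 25,
) -> List[str]:
    """Keep the top-ranked loops plus every loop whose sequence equals the target."""
    return [
        loop_id
        for rank, (loop_id, seq) in enumerate(ranked_loops)
        if rank < transform_threshold or seq == target_seq
    ]


ranked = [("a", "GKTG"), ("b", "GAPS"), ("c", "GKKG"), ("d", "GAPS")]
print(loops_to_transform(ranked, "GAPS", transform_threshold=1))  # ['a', 'b', 'd']
```

Loop ‘d’ is ranked below the threshold but still passes because its sequence is identical to the target.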