SPRUCE - Protein Preparation

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

Role-based/Computational Chemist

Product-based/SPRUCE

Solution-based/Virtual-screening/Target Preparation

Solution-based/Hit to Lead/Target Preparation

Task-based/Target Prep & Analysis/Protein Preparation

Description

This floe uses Spruce to generate design units, preparing biomolecules for downstream modeling applications in Orion such as docking, POSIT, GamePlan, or short-trajectory MD.

At least one input is required to run this floe. Inputs can be of either or both of the following types: (1) PDB or MMCIF files, accompanied by MTZ electron density map files if available; (2) valid PDB codes, with multiple codes submitted as a comma-separated string.

An optional reference for biological unit extraction may be provided. This reference can be a dataset from a previous Classic Spruce: Prep run, a PDB code, or a PDB file with an MTZ map.

If a ligand cannot be detected during the run, consider specifying the ligand residue name or increasing the value of the optional input “max_residues”. If the structure is known to be apo, you can instead define a residue in the binding site (see Site residue entry).

You can read more about Spruce in the toolkit documentation.

Promoted Parameters

Title in user interface (promoted name)

Reference Structure Inputs

Optional Reference DU Dataset (ref_dataset_in): If the reference dataset contains multiple design units, only the first is read

Type: data_source

Optional PDB Code for reference DU (ref_code_cube_in): PDB code to generate a reference design unit from

Type: string

Optional PDB File for reference DU (ref_pdb_file_cube_in): PDB file to generate a reference design unit from

Type: file_in

Optional MTZ File for reference DU (ref_mtz_file_cube_in): MTZ file to generate a reference design unit from

Type: file_in

Reference Dataset (ref_data_out): Reference dataset, if one is generated as part of the floe

Type: dataset_out

Default: Reference structure dataset

Reference Structure Prep Parameters

Build missing loops (ref_build_loops): Option to build missing loops (if information is available to do so)

Required

Type: boolean

Default: True

Choices: [True, False]

Build missing tails (ref_build_tails): Option to build missing tails (if information is available to do so)

Required

Type: boolean

Default: False

Choices: [True, False]

Ligand name(s) (ref_ligand_names): Three-letter residue codes, e.g. ‘LIG’. For peptides, separate codes with dashes (e.g. ‘SER-VAL-TPO-ALA’).

Type: string

Strict Ligand (ref_strict_ligand): Option to only emit design units with ligands that match the ligand names (if any are provided)

Required

Type: boolean

Default: True

Choices: [True, False]
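The ligand-name format and the Strict Ligand option can be illustrated with a minimal Python sketch. The helpers `parse_ligand_names` and `strict_filter` are hypothetical and are not part of Spruce or Orion; design units are stood in for by `(title, ligand_residue_codes)` tuples, and the 1 to 3 character code check is an assumption based on the "3-letter code" wording. Matching a ligand by its exact residue sequence is likewise an assumption about how strict matching might behave.

```python
import re

# Assumed format check: each residue code is 1-3 uppercase letters/digits.
CODE_RE = re.compile(r"^[A-Z0-9]{1,3}$")


def parse_ligand_names(text: str) -> list[str]:
    """Split 'LIG' or 'SER-VAL-TPO-ALA' into residue codes (empty if blank)."""
    text = text.strip()
    if not text:
        return []
    codes = [part.strip().upper() for part in text.split("-")]
    for code in codes:
        if not CODE_RE.match(code):
            raise ValueError(f"invalid residue code: {code!r}")
    return codes


def strict_filter(design_units, ligand_names: str, strict: bool = True):
    """Keep design units whose ligand residues match the requested names.

    design_units: list of (title, ligand_residue_codes) tuples, a stand-in
    for real design units. If strict is False or no names are given,
    every unit is kept, mirroring "if any are provided".
    """
    wanted = parse_ligand_names(ligand_names)
    if not strict or not wanted:
        return list(design_units)
    return [du for du in design_units if list(du[1]) == wanted]
```

For example, `strict_filter([("a", ["LIG"]), ("b", ["HEM"])], "LIG")` keeps only the first unit, while an empty name string keeps both.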

Ligand Type (ref_lig_type): The type of ligand expected for the system. This sets the minimum and maximum atom counts and the maximum residue count (if applicable) for the ligand. Each value can be overridden individually. The defaults are as follows:
Small Molecule: min_atoms=8, max_atoms=100, max_residues=5
Peptide: min_atoms=8, max_atoms=200, max_residues=20
Macrocycle: min_atoms=8, max_atoms=250, max_residues=20
Fragment: min_atoms=2, max_atoms=35, max_residues=5

Required

Type: string

Default: Small Molecule

Choices: [‘Small Molecule’, ‘Peptide’, ‘Macrocycle’, ‘Fragment’]
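The default limits per ligand type, and the way an individual override such as `max_residues` replaces just one of them, can be sketched in Python. The table values come from the parameter description above; `LigandLimits`, `resolve_limits`, and `within_limits` are hypothetical names, and treating the bounds as inclusive is an assumption rather than documented floe behavior.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LigandLimits:
    min_atoms: int
    max_atoms: int
    max_residues: int


# Defaults as listed in the Ligand Type parameter description.
LIGAND_TYPE_DEFAULTS = {
    "Small Molecule": LigandLimits(min_atoms=8, max_atoms=100, max_residues=5),
    "Peptide": LigandLimits(min_atoms=8, max_atoms=200, max_residues=20),
    "Macrocycle": LigandLimits(min_atoms=8, max_atoms=250, max_residues=20),
    "Fragment": LigandLimits(min_atoms=2, max_atoms=35, max_residues=5),
}


def resolve_limits(lig_type: str, **overrides) -> LigandLimits:
    """Start from the defaults for lig_type, then apply supplied overrides."""
    limits = LIGAND_TYPE_DEFAULTS[lig_type]
    # Overrides left as None are treated as "not supplied".
    supplied = {k: v for k, v in overrides.items() if v is not None}
    return replace(limits, **supplied)


def within_limits(limits: LigandLimits, n_atoms: int, n_residues: int) -> bool:
    """Hypothetical inclusive check of a candidate ligand against limits."""
    return (limits.min_atoms <= n_atoms <= limits.max_atoms
            and n_residues <= limits.max_residues)
```

Under this sketch, `resolve_limits("Small Molecule", max_residues=8)` keeps the atom bounds at 8 and 100 and raises only the residue limit, which is the kind of adjustment suggested when a ligand is not detected.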

Loop Builder Parameters

Build missing loops (build_loops): Option to build missing loops (if information is available to do so)

Required

Type: boolean

Default: True

Choices: [True, False]

Build missing tails (build_tails): Option to build missing tails (if information is available to do so)

Required

Type: boolean

Default: False

Choices: [True, False]

Inputs

Input structure (PDB/MMCIF) files (structure_inp_files):

Type: file_in

Input electron density map (MTZ) files (mtz_files): Each map file name must have the same root as its associated structure (PDB/MMCIF) file. If maps are provided, the floe ensures that each density map is paired with a coordinate file.

Type: file_in
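The "same root" pairing rule for map files can be sketched in Python. `pair_maps` is a hypothetical helper, not the floe's implementation, and it assumes "root" means the file name without its final extension (`pathlib.Path.stem`); note that a compressed name such as `1abc.cif.gz` would have the stem `1abc.cif` under this assumption.

```python
from pathlib import Path


def pair_maps(structure_files, mtz_files):
    """Pair each MTZ map with the structure file sharing its root name.

    Returns (pairs, unpaired): pairs maps each structure path to its map
    path, or None if it has no map; unpaired lists maps with no structure.
    """
    by_root = {Path(s).stem: s for s in structure_files}
    pairs = {s: None for s in structure_files}
    unpaired = []
    for m in mtz_files:
        root = Path(m).stem
        if root in by_root:
            pairs[by_root[root]] = m
        else:
            unpaired.append(m)  # no coordinate file with this root
    return pairs, unpaired
```

With `["1abc.pdb", "2xyz.cif"]` and `["1abc.mtz", "3def.mtz"]`, the sketch pairs `1abc.pdb` with `1abc.mtz`, leaves `2xyz.cif` without a map, and reports `3def.mtz` as unpaired.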

PDB code(s) to download (input_codes): Separate multiple codes with commas (the default delimiter), e.g. ‘1ABC, DEF2, G3HI’.

Type: string
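Splitting a comma-separated code string into individual codes can be sketched in Python. `parse_pdb_codes` and its `delimiter` argument are hypothetical; the check only requires four alphanumeric characters so that the placeholder codes in the example above are accepted, although real PDB IDs conventionally begin with a digit. The floe's own validation rules are not documented here.

```python
import re

# Loose check: four letters/digits (real PDB IDs start with a digit 1-9).
PDB_ID_RE = re.compile(r"[A-Za-z0-9]{4}")


def parse_pdb_codes(text: str, delimiter: str = ",") -> list[str]:
    """Split a delimited string of PDB codes into a de-duplicated list."""
    codes = []
    for raw in text.split(delimiter):
        code = raw.strip().upper()
        if not code:
            continue  # tolerate trailing or doubled delimiters
        if not PDB_ID_RE.fullmatch(code):
            raise ValueError(f"not a 4-character PDB code: {raw!r}")
        if code not in codes:
            codes.append(code)  # keep first-seen order, drop repeats
    return codes
```

For instance, `parse_pdb_codes("1ABC, DEF2, G3HI")` yields `['1ABC', 'DEF2', 'G3HI']`, and whitespace around each code is ignored.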

Outputs

Output Dataset (dataset_data_out): Dataset to which the prepared design units are written

Required

Type: dataset_out

Default: Spruce_prep_dataset

Output Dataset (failed_data_out): Dataset to which records that failed preparation are written

Required

Type: dataset_out

Default: Failed_Spruce_prep_dataset

Ligand Parameters

Ligand name(s) (ligand_names): Three-letter residue codes, e.g. ‘LIG’. For peptides, separate codes with dashes (e.g. ‘SER-VAL-TPO-ALA’).

Type: string

Strict Ligand (strict_ligand): Option to only emit design units with ligands that match the ligand names (if any are provided)

Required

Type: boolean

Default: True

Choices: [True, False]

Ligand Type (lig_type): The type of ligand expected for the system. This sets the minimum and maximum atom counts and the maximum residue count (if applicable) for the ligand. Each value can be overridden individually. The defaults are as follows:
Small Molecule: min_atoms=8, max_atoms=100, max_residues=5
Peptide: min_atoms=8, max_atoms=200, max_residues=20
Macrocycle: min_atoms=8, max_atoms=250, max_residues=20
Fragment: min_atoms=2, max_atoms=35, max_residues=5

Required

Type: string

Default: Small Molecule

Choices: [‘Small Molecule’, ‘Peptide’, ‘Macrocycle’, ‘Fragment’]

Un-liganded Structure Parameters

Enumerate pockets (enum_pocket): Option to enumerate pockets when no ligand is found

Required

Type: boolean

Default: False

Choices: [True, False]

Site residue entry (site_residue): Single site residue specification for apo structures. Format ‘name:num:insert:chain[:fragno:altloc]’, e.g. ‘ALA:325: :A’ (note the blank/whitespace insert code). The regex notation ‘.*’ can be used as a wildcard.

Type: string
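How a colon-separated site residue specification with ‘.*’ wildcards might be matched against a residue can be sketched in Python. `Residue`, `parse_site_residue`, and `matches` are hypothetical; the sketch assumes each field is treated as a full-match regular expression, which is an interpretation of the wildcard note above rather than documented Spruce behavior.

```python
import re
from dataclasses import dataclass


@dataclass
class Residue:
    """Illustrative residue record; not an OpenEye toolkit type."""
    name: str
    num: str
    insert: str
    chain: str
    fragno: str = ""
    altloc: str = ""


def parse_site_residue(spec: str) -> list[str]:
    """Split 'name:num:insert:chain[:fragno:altloc]' without stripping."""
    fields = spec.split(":")
    if len(fields) not in (4, 6):
        raise ValueError("expected name:num:insert:chain[:fragno:altloc]")
    return fields


def matches(spec: str, res: Residue) -> bool:
    """True if every field of spec fully matches the residue (as a regex)."""
    fields = parse_site_residue(spec)
    values = [res.name, res.num, res.insert, res.chain, res.fragno, res.altloc]
    # zip stops after 4 pairs when fragno/altloc are omitted from spec.
    return all(re.fullmatch(p, v) is not None for p, v in zip(fields, values))
```

In this sketch, ‘ALA:325: :A’ and ‘.*:325: :A’ both match residue ALA 325 on chain A with a blank insert code, whereas an empty insert field (‘ALA:325::A’) would not, which is consistent with the note about using a whitespace insert code.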