SPRUCE - Protein Preparation from PDB Codes

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Role-based/Computational Chemist

  • Product-based/SPRUCE

  • Solution-based/Virtual-screening/Target Preparation

  • Solution-based/Hit to Lead/Target Preparation

  • Task-based/Target Prep & Analysis/Protein Preparation

Description

This floe uses Spruce to prepare biomolecules for downstream modeling applications in Orion, such as docking, posit, gameplan, or short-trajectory MD, by generating a design unit.

The required input for this floe is one or more PDB codes. For each code, the PDB file (or the MMCIF file if the PDB file is not available) will be downloaded from the RCSB, along with the MTZ file containing the electron density maps, if available.

An optional reference for biological unit extraction may be provided. This reference can be a dataset from a previous Classic Spruce: Prep run, a PDB code, or a PDB file and MTZ map.

If a ligand cannot be detected during the run, consider specifying the ligand residue name or increasing the value of the input variable “max_residues”, which is an optional input to this floe. If this is a known apo structure, you can instead provide the definition of a residue in the binding site.
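As a rough illustration, the options described above can be written as sets of overrides keyed by the promoted parameter names listed on this page. This is only a sketch: the dictionary variable names are hypothetical, and how such values are actually submitted to Orion is not covered here.

```python
# Hypothetical override sets keyed by promoted parameter names from this page.
# How they are passed to the floe is deliberately not shown.

# Peptide ligand: name the residues and raise the size limits to the
# values this page recommends for peptides.
peptide_overrides = {
    "ligand_names": "SER-VAL-TPO-ALA",
    "max_lig_atoms": 200,
    "max_lig_residues": 20,
}

# Known apo structure: point at a single binding-site residue instead.
apo_overrides = {
    "site_residue": "ALA:325: :A",
}
```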

You can read more about Spruce in the toolkit documentation.

Promoted Parameters

Title in user interface (promoted name)

Reference Structure Inputs

Optional Reference DU Dataset (ref_dataset_in): If the reference dataset contains multiple design units, only the first one will be read.

  • Type: data_source

Optional PDB Code for reference DU (ref_code_cube_in): PDB code to generate a reference design unit from

  • Type: string

Optional PDB File for reference DU (ref_pdb_file_cube_in): PDB file to generate a reference design unit from

  • Type: file_in

Optional MTZ File for reference DU (ref_mtz_file_cube_in): MTZ file to generate a reference design unit from

  • Type: file_in

Reference Dataset (ref_data_out): Reference dataset, if one is generated as part of the floe

  • Type: dataset_out

  • Default: Reference structure dataset

Reference Structure Prep Parameters

Build missing loops (ref_build_loops): Option to build missing loops (if information is available to do so)

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Build missing tails (ref_build_tails): Option to build missing tails (if information is available to do so)

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Ligand name(s) (ref_ligand_names): Use 3-letter codes, e.g. ‘LIG’. For peptides, separate the codes with dashes (e.g. ‘SER-VAL-TPO-ALA’).

  • Type: string

Strict Ligand (ref_strict_ligand): Option to only emit design units with ligands that match the ligand names (if any are provided)

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Max atoms for a ligand (ref_max_lig_atoms): Maximum number of atoms in a molecule to be detected as a ligand. For peptides, we recommend 200.

  • Required

  • Type: integer

  • Default: 100

Max residues for a ligand (ref_max_lig_residues): Maximum number of residues in a molecule to be detected as a ligand. For peptides, we recommend 20.

  • Required

  • Type: integer

  • Default: 5

Loop Builder Parameters

Build missing loops (build_loops): Option to build missing loops (if information is available to do so)

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Build missing tails (build_tails): Option to build missing tails (if information is available to do so)

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Inputs

PDB code(s) to download (input_codes): Separate multiple codes with a comma (the default delimiter), e.g. ‘1ABC, DEF2, G3HI’.

  • Required

  • Type: string
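A minimal sketch of how such a comma-delimited string could be split into individual codes. The function name `parse_pdb_codes` is hypothetical, and trimming whitespace and dropping empty entries are assumptions; the floe's own parsing may differ.

```python
from typing import List


def parse_pdb_codes(raw: str, delimiter: str = ",") -> List[str]:
    """Split a delimited string of PDB codes into a list.

    Assumption: surrounding whitespace is ignored and empty entries
    (for example from a trailing delimiter) are dropped.
    """
    codes = [code.strip() for code in raw.split(delimiter)]
    return [code for code in codes if code]


codes = parse_pdb_codes("1ABC, DEF2, G3HI")
```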

Outputs

Output Dataset (dataset_data_out): Output dataset to write to

  • Required

  • Type: dataset_out

  • Default: Spruce_prep_dataset

Output Dataset (failed_data_out): Output dataset to write failed records to

  • Required

  • Type: dataset_out

  • Default: Failed_Spruce_prep_dataset

Ligand Parameters

Ligand name(s) (ligand_names): Use 3-letter codes, e.g. ‘LIG’. For peptides, separate the codes with dashes (e.g. ‘SER-VAL-TPO-ALA’).

  • Type: string
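To make the ligand name format concrete, the sketch below splits one entry into its residue codes. The helper `split_ligand_name` is hypothetical, and the check that each code is one to three alphanumeric characters is an assumption rather than a documented rule. The same format applies to ref_ligand_names.

```python
from typing import List


def split_ligand_name(name: str) -> List[str]:
    """Split a ligand name such as 'LIG' or 'SER-VAL-TPO-ALA' into residue codes.

    Assumption: each code is 1 to 3 alphanumeric characters.
    """
    codes = [code.strip() for code in name.split("-")]
    for code in codes:
        if not (1 <= len(code) <= 3 and code.isalnum()):
            raise ValueError(f"Unexpected residue code: {code!r}")
    return codes


small_molecule = split_ligand_name("LIG")
peptide = split_ligand_name("SER-VAL-TPO-ALA")
```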

Strict Ligand (strict_ligand): Option to only emit design units with ligands that match the ligand names (if any are provided)

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Max atoms for a ligand (max_lig_atoms): Maximum number of atoms in a molecule to be detected as a ligand. For peptides, we recommend 200.

  • Required

  • Type: integer

  • Default: 100

Max residues for a ligand (max_lig_residues): Maximum number of residues in a molecule to be detected as a ligand. For peptides, we recommend 20.

  • Required

  • Type: integer

  • Default: 5
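The two size limits can be thought of as a simple filter on candidate molecules. The sketch below is an illustration only: `LigandLimits` and `within_limits` are hypothetical names, and treating both limits as inclusive upper bounds that must hold together is an assumption about how Spruce applies them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LigandLimits:
    # Defaults taken from this page.
    max_lig_atoms: int = 100
    max_lig_residues: int = 5


# Values this page recommends for peptide ligands.
PEPTIDE_LIMITS = LigandLimits(max_lig_atoms=200, max_lig_residues=20)


def within_limits(n_atoms: int, n_residues: int,
                  limits: LigandLimits = LigandLimits()) -> bool:
    """Return True if a molecule is small enough to be considered a ligand.

    Assumption: both limits are inclusive and must both be satisfied.
    """
    return (n_atoms <= limits.max_lig_atoms
            and n_residues <= limits.max_lig_residues)
```

Under these assumptions, a 10-residue peptide with 150 atoms would be rejected with the default limits and accepted with the peptide limits.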

Un-liganded Structure Parameters

Enumerate pockets (enum_pocket): Option to enumerate pockets when no ligand is found

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Site residue entry (site_residue): Single site residue specification for APO structures. Format ‘name:num:insert:chain[:fragno:altloc]’, e.g. ‘ALA:325: :A’ (note the blank/whitespace insert code). The regex ‘.*’ notation can be used as a wildcard.

  • Type: string
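The site residue format can be illustrated with a small parser. This is a sketch, not Spruce's own parser: `SiteResidue` and `parse_site_residue` are hypothetical names, and every field is kept as a string so that a ‘.*’ wildcard can appear in any position.

```python
from typing import NamedTuple, Optional


class SiteResidue(NamedTuple):
    name: str
    num: str      # kept as text so a '.*' wildcard is allowed
    insert: str   # a single space means "no insert code"
    chain: str
    fragno: Optional[str] = None
    altloc: Optional[str] = None


def parse_site_residue(spec: str) -> SiteResidue:
    """Parse 'name:num:insert:chain[:fragno:altloc]' into its fields.

    Assumption: the optional fragno and altloc fields are either both
    present or both absent, and no field contains a colon.
    """
    fields = spec.split(":")
    if len(fields) not in (4, 6):
        raise ValueError(f"Expected 4 or 6 colon-separated fields: {spec!r}")
    return SiteResidue(*fields)


residue = parse_site_residue("ALA:325: :A")
any_number = parse_site_residue("ALA:.*: :A")
```

Note that the insert code field is not stripped, so the blank in ‘ALA:325: :A’ is preserved as a single space.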