Frequently Asked Questions for Spruce Floes

What input formats does Spruce accept, and what are the requirements?

The Spruce floes can read any molecule format that is supported by OEChem and can be imported into a molecule field in Orion. Typical input formats are PDB and MMCIF. The specifications are described in the linked pages: PDB and MMCIF. The most common problem with PDB files, particularly those from non-experimental data sources, is that heterogen or non-polymer atoms (e.g. ligand atoms) are written as ATOM records instead of HETATM records.
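One way to catch this problem before upload is to scan the file for ATOM records whose residue name is not a standard polymer residue. The following is a minimal sketch in plain Python, not part of Spruce; the function name `find_suspect_atom_records` and the `POLYMER_RESIDUES` set are assumptions made for illustration, and the set would need extending for modified residues.

```python
# Residue names treated as ordinary polymer components in this sketch.
# This set is an assumption: extend it for modified residues you expect.
POLYMER_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
    "A", "C", "G", "U", "DA", "DC", "DG", "DT",
}


def find_suspect_atom_records(pdb_text):
    """Return sorted (residue name, chain ID, residue number) tuples for
    ATOM records whose residue name is not in POLYMER_RESIDUES."""
    suspects = set()
    for line in pdb_text.splitlines():
        # Fixed-column PDB layout: record name in columns 1-6,
        # residue name in 18-20, chain ID in 22, residue number in 23-26.
        if not line.startswith("ATOM  ") or len(line) < 26:
            continue
        resname = line[17:20].strip()
        if resname not in POLYMER_RESIDUES:
            suspects.add((resname, line[21], line[22:26].strip()))
    return sorted(suspects)
```

Any residue this reports is a candidate for conversion to HETATM records before the file is passed to a Spruce floe. A hit is not always an error, since a modified amino acid may legitimately appear within a polymer chain.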

What is a reference design unit (DU) used for, and when should I provide one?

A reference DU can serve three different purposes during the preparation process.

  1. A reference DU may be used as a template to guide the biological unit extraction process. As an example, if the asymmetric unit is a tetramer, but the biological unit is a dimer, the reference DU guides the extraction of dimers from the asymmetric unit. This is of particular importance if the REMARK 300 and REMARK 350 records in the PDB header are incorrect.

  2. A reference DU may be used to identify the binding site. This is relevant when the binding site of the structure being prepared is known but empty (e.g. for apo structures), or when multiple ligands and binding sites are available on a structure. In this case, the reference DU can be used to generate the desired binding site.

  3. A reference DU may be used as a superposition reference: each design unit generated during preparation is superposed onto the reference structure. This is helpful when preparing multiple structures of a target with different ligands bound in the active site, as it ensures all prepared DUs can be overlaid in the viewer.

Spruce generated multiple design units from a structure. Which one should I pick?

The preparation process can generate multiple design units for a few different reasons.

  1. Multiple unique biological units are available in the asymmetric unit (e.g. dimers of chains A+B, and C+D).

  2. The experimental data contains alternate locations (altlocs), e.g. different conformations of residues in the structure. These are typically enumerated (A-form, B-form etc.), but can be collapsed if desired. To avoid redundant design units, Spruce can optionally remove duplicates that arise from alternate locations far from the binding site.
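As a rough illustration of how these two sources multiply, the sketch below pairs every biological unit with every altloc form. It is plain Python and does not call Spruce; the function name `enumerate_design_units` and the string labels are hypothetical.

```python
from itertools import product


def enumerate_design_units(biological_units, altloc_forms):
    """Pair every biological unit with every altloc form.

    Both arguments are lists of illustrative labels; the result holds one
    (biological unit, altloc form) tuple per candidate design unit.
    """
    return list(product(biological_units, altloc_forms))


# Two dimers (chains A+B and C+D), each with an A-form and a B-form.
candidates = enumerate_design_units(["AB", "CD"], ["A", "B"])
```

Here `candidates` holds four entries, one per dimer and altloc form. The count grows as the product of the two list lengths, which is why a single experimental structure can yield many design units.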

Note that the two cases above result in combinatorial enumeration, which can cause a large number of design units to be generated from a single experimental structure. The decision of which design unit to use is not necessarily trivial, and a number of factors need to be considered.

Firstly, consider the intended downstream application(s). In the case of docking, it may be necessary to use both forms of alternate locations, particularly if the alternate locations have the potential to affect the ligand binding pose. For applications involving MD simulations this may be less relevant, as the simulation will likely explore the conformational space, unless the barriers between the two states are high.

Secondly, we recommend using the Iridium classification to judge between two design units that are otherwise similar. The Iridium classification judges the ligand and the binding site based on the quality of the experimental data, the diffraction precision index (DPI), R-factors, and the electron density coverage of the ligand atoms and the binding site atoms. It additionally flags excipients or packing residues that potentially affect the ligand pose. The classification ranges, in decreasing trustworthiness, from High (HT) through Medium (MT) to Not (NT), with NA assigned to structures where the Iridium classification cannot be determined (e.g. apo binding sites, or lack of electron density maps). The Iridium classification is published here.
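The Iridium-based tie-break can be sketched as a simple sort over category labels. The example below is plain Python over (name, category) pairs and does not call Spruce; the names `IRIDIUM_RANK` and `rank_by_iridium` are hypothetical, and ranking NA last is an assumption, since NA means the category could not be determined rather than that the structure is poor.

```python
# Lower rank means more trustworthy. Placing "NA" last is an assumption
# of this sketch; NA only means the category could not be determined.
IRIDIUM_RANK = {"HT": 0, "MT": 1, "NT": 2, "NA": 3}


def rank_by_iridium(design_units):
    """Sort (name, category) pairs from most to least trustworthy.

    The sort is stable, so design units that share a category keep
    their original relative order.
    """
    return sorted(design_units, key=lambda du: IRIDIUM_RANK[du[1]])
```

In practice the category label would be read from each prepared design unit, and the ranking would be applied only after the candidates have been narrowed down by the needs of the downstream application.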