Preparing Input

Download Initial Files

The WEMD simulation floes in the OpenEye Structural Biology Floes package require one initial all-atom model and the related cryo-EM maps. The initial all-atom model can be downloaded from the RCSB PDB or built with any structure-building or structure-prediction software, such as AlphaFold. To explore the tutorials, first try the two small systems listed below (1 and 2), then move on to the real-world HER2 example (3) in the Soup to Nuts tutorial.

The prepared files can be downloaded here and expanded locally. They will be uploaded to Orion under the individual folders in each tutorial.

Download Files

all_init_files.tgz
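As a minimal sketch of expanding the downloaded archive locally, the snippet below uses only the Python standard library. It assumes the archive has been saved as all_init_files.tgz in the current directory; the target folder name init_files and the helper expand_archive are hypothetical and chosen for illustration only.

```python
import tarfile
from pathlib import Path


def expand_archive(archive_path, dest_dir):
    """Extract a gzip-compressed tar archive; return the member names it contains."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        # Use the safer "data" extraction filter where this Python version offers it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(dest, **kwargs)
    return names


if __name__ == "__main__":
    archive = Path("all_init_files.tgz")  # assumed download location
    if archive.exists():
        for name in expand_archive(archive, "init_files"):  # hypothetical folder
            print(name)
```

Listing the member names after extraction is a quick way to confirm which files belong to each tutorial before uploading them to Orion.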

  1. Adenylate Kinase (ADK): ADK catalyzes the conversion of ATP and AMP into two ADP molecules. 4AKE is the open state of the protein in the absence of a ligand, while 1AKE is the closed state with the bound inhibitor Ap5A. Because of its small size (around 210 residues), ADK is a convenient test system for conformational transitions in the development of simulation algorithms. Please note that no well-defined closed state without a bound ligand has been reported experimentally.

    1. We removed the Ap5A inhibitor from 1AKE and generated synthetic cryo-EM maps with a resolution of 4 Å to use as reference “experimental” maps to illustrate the Automated WEMD Simulation and Best Structure Search Guided by Target CryoEM Map Floe.

    2. We will start the simulation from the open state (4AKE) to explore how it transitions to the closed state to form a cryptic pocket. The files provided above will be uploaded to Orion under the adk folder for the Basic Tutorial 1.

  2. Human CDK-activating kinase (CAK): CAK is composed of CDK7, cyclin H, and MAT1. It is involved in the control of transcription initiation and the cell cycle and has been identified as a promising target for cancer chemotherapy. 7B5Q is an all-atom model recently derived from cryo-EM data; however, conformational variability analysis is missing due to its small size (~600 residues).

    1. From a previous WEMD simulation not using cryo-EM data, we generated synthetic mean maps and eigenmaps with a resolution of 2.5 Å as reference “experimental” maps to illustrate the Automated WEMD Simulation and Best Structure Search Guided By Eigen CryoEM Maps Floe.

    2. The files provided above will be uploaded to Orion under the cak folder for the Basic Tutorial 2.

  3. Human epidermal growth factor receptor 2 (HER2): HER2 belongs to the ErbB family of tyrosine kinase receptors. It is involved in cell signaling and its deregulation plays a critical role in many cancers. RCSB PDB ID 8PWH was derived from work by Bressanelli and coworkers [Bressanelli-2024].

    1. Based on the related experimental data, we analyzed some cryo-EM maps and prepared heterogeneous maps as well as eigenmaps for practice.

    2. The files provided above will be uploaded to Orion under the her2 folder for the Soup to Nuts Tutorial.

Note

The most popular way to generate simulated or synthetic maps is to approximate atomic densities with Gaussians on a 3D grid. For the structural biology floes, we use OEShape- and OEGrid-related functions from the OpenEye toolkits. Please refer to the Shape TK theory for more details.
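To make the Gaussian approximation concrete, here is a minimal NumPy sketch that sums one isotropic Gaussian per atom on a regular grid. It is not the OEShape/OEGrid implementation used by the floes; the function name, the per-atom weights, and the use of a single width sigma for all atoms are assumptions made for illustration.

```python
import numpy as np


def gaussian_density_map(coords, weights, grid_shape, spacing, origin, sigma):
    """Sum one isotropic Gaussian per atom on a regular 3D grid.

    coords:  (N, 3) atom positions in angstroms
    weights: (N,) per-atom amplitudes (assumed here, e.g. atomic numbers)
    sigma:   Gaussian width in angstroms (assumed identical for all atoms)
    """
    coords = np.asarray(coords, dtype=float)
    weights = np.asarray(weights, dtype=float)
    origin = np.asarray(origin, dtype=float)
    # Grid-point coordinates along each axis.
    axes = [origin[d] + spacing * np.arange(grid_shape[d]) for d in range(3)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    density = np.zeros(grid_shape, dtype=float)
    for (x, y, z), w in zip(coords, weights):
        r2 = (gx - x) ** 2 + (gy - y) ** 2 + (gz - z) ** 2
        density += w * np.exp(-r2 / (2.0 * sigma ** 2))
    return density


# Two hypothetical atoms on a 9 x 9 x 9 grid with 1 angstrom spacing.
demo_map = gaussian_density_map(
    coords=[[4.0, 4.0, 4.0], [6.0, 4.0, 4.0]],
    weights=[6.0, 8.0],
    grid_shape=(9, 9, 9),
    spacing=1.0,
    origin=(0.0, 0.0, 0.0),
    sigma=1.0,
)
print(demo_map.shape)
```

A larger sigma blurs the atoms into a lower-resolution map. The loop over atoms keeps the sketch readable, whereas production code would typically restrict each Gaussian to nearby grid points.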

Create a Project for Tutorials

As a first step, create a new project, such as StructBio Tutorials, to hold all simulations and analyses for this package. To create the project, log into Orion and click the “Project” button on the blue navigation bar. Then click the “Create Project” button in the upper right; in the pop-up window, enter StructBio Tutorials as the name of the project and click “Create Project”.

Tip

If you have already created a tutorial project, you can reuse the existing one.

  1. Although you are free to set your own file names and parameters, we suggest following the tutorial examples to reproduce the workflow as closely as possible.

  2. Analysis floes that take the saved dataset or collection as input should be run after the simulation floes have completed. This avoids opening and analyzing a collection while it is still updating shards for new MD trajectories. After a WEMD simulation, most analysis floes can be launched independently, as they do not require each other's output.

  3. Please also follow the tutorials in order: preparation, simulation, then analysis. For a quick first pass, you can skip those marked as optional or advanced, then return to them later for a more detailed investigation.

Design Unit Preparation from Initial Structure

An initial 3D structure from third-party software might not include all the information needed to prepare a protein for modeling. As in other OpenEye floe packages, we use the SPRUCE - Protein Preparation Floe to generate a prepared design unit with associated depictions for use in downstream applications. This step is especially important for building missing pieces, such as partial side chains, loops, and tails, and for capping termini or chain breaks. For more details, please follow the directions in the Soup to Nuts tutorial.

Note

  1. By default, the SPRUCE - Protein Preparation Floe will try to find possible pockets when the input is an apo structure without a ligand. The resulting dataset might contain several records, each with one design unit and one pocket. This dataset can be passed to downstream preparation and simulation floes directly without specifying which record or design unit to use. The design unit in the first record is always used, since all design units share the same all-atom structure.

  2. We can also obtain a biological unit to start a simulation from the SPRUCE - Protein Preparation Floe without enumerating pockets if Output all Biological Units is toggled On and Enumerate Pockets is Off, as illustrated in the HER2 example. We don’t recommend this method unless the preparation using default settings fails or pocket enumeration takes too long. It is not currently possible to subset the target system to remove unnecessary parts, such as sugars, from a complicated biological unit.

Preparation of Reference Structure

During the simulation, a protein complex might drift or rotate in solution, making it difficult to calculate the real-space correlation coefficient (RSCC) directly between the simulated and experimental maps without alignment. However, aligning 3D cryo-EM maps is very expensive: the computing time and required memory scale with the cube of the map box size. To accelerate this process at the simulation stage, the simulated conformations should be aligned with a reference structure that has been fitted into or refined against the experimental map before simulated maps are generated and correlation coefficients are calculated. We prepare the design unit for this reference structure in the same way as for the initial starting structure above.
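As an illustration of the quantity involved, the sketch below computes a correlation coefficient between two maps sampled on the same grid, using the Pearson correlation over all voxels. This is one common definition of RSCC and is assumed here; the floes may apply masking or a different normalization. The function name rscc is hypothetical.

```python
import numpy as np


def rscc(map_a, map_b):
    """Pearson correlation over all voxels of two maps on the same grid."""
    a = np.asarray(map_a, dtype=float)
    b = np.asarray(map_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("maps must be sampled on the same grid")
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0.0:
        raise ValueError("a constant map has no defined correlation")
    return float((a * b).sum() / denom)


# Synthetic demonstration: a random "map" and a copy shifted along one axis.
rng = np.random.default_rng(0)
reference_map = rng.random((8, 8, 8))
shifted_map = np.roll(reference_map, 3, axis=0)
print(rscc(reference_map, reference_map))  # identical maps correlate perfectly
print(rscc(reference_map, shifted_map))    # a misaligned copy correlates poorly
```

The shifted copy shows why alignment matters: the same density scores poorly once it is displaced on the grid. The cost argument also follows from the voxel count, since doubling the box edge length yields eight times as many voxels to store and compare.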

Tip

  1. If the initial structure was derived from the related experimental map, it should already be aligned to the map and can be used directly as the reference structure.

  2. To fit a structure into a cryo-EM map, you can use Chimera, Phenix, or other third-party software to perform rigid-body fitting of the all-atom model.
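The alignment of simulated conformations to the reference structure described in this section is a coordinate-to-coordinate superposition, which is much cheaper than map-to-map alignment. A minimal sketch of such a superposition using the Kabsch algorithm is shown below. This is a generic technique chosen for illustration, not necessarily the method used inside the floes, and the function name kabsch_align is hypothetical. It assumes both coordinate sets list the same atoms in the same order.

```python
import numpy as np


def kabsch_align(mobile, reference):
    """Rigidly superpose mobile onto reference (least-squares); return new coordinates."""
    p = np.asarray(mobile, dtype=float)
    q = np.asarray(reference, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays with matching atoms")
    p_center = p.mean(axis=0)
    q_center = q.mean(axis=0)
    p0 = p - p_center
    q0 = q - q_center
    # Covariance matrix between the centered coordinate sets.
    h = p0.T @ q0
    u, _, vt = np.linalg.svd(h)
    # Correct for a possible reflection so the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0.0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p0 @ rotation.T + q_center


# Synthetic check: rotate and translate a point set, then recover the original.
rng = np.random.default_rng(1)
reference_xyz = rng.random((5, 3)) * 10.0
angle = np.deg2rad(30.0)
rot_z = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
mobile_xyz = reference_xyz @ rot_z.T + np.array([5.0, -3.0, 2.0])
aligned_xyz = kabsch_align(mobile_xyz, reference_xyz)
rmsd = np.sqrt(((aligned_xyz - reference_xyz) ** 2).sum(axis=1).mean())
print(rmsd < 1e-8)
```

Because the cost depends on the number of atoms rather than the number of voxels, this kind of superposition can be applied to every simulated conformation before a simulated map is generated.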

WEMD Simulation Preparation from a Design Unit

After the initial and reference structures are prepared, additional steps are necessary to run WEMD simulations successfully. These steps include adding explicit water or other mixed solvent species; setting the padding distance for the solvent; selecting force fields for proteins, ligands, and other components; and setting parameters related to WEMD simulations.

  • Use the output dataset from the SPRUCE - Protein Preparation Floe as the input to the Solvate and Equilibrate Target Protein Floe. More information about this floe can be found in the related tutorial in the Cryptic Pocket Detection Floes package.

  • Use the output dataset from the Solvate and Equilibrate Target Protein Floe as the input dataset for all related structural biology WEMD simulations.