Frequently Asked Questions
Questions Related to WEMD Simulation
How do you select the starting structure for WEMD simulations?
Starting structures can be obtained from several resources:
They can be downloaded from public resources such as the RCSB PDB website. Generally, those structures are derived from experimental data such as X-ray crystallography, NMR, and cryo-EM. Recently, computational models have also been included.
They can be built from ab initio structure builders such as AlphaFold(2), OmegaFold, and RoseTTAFold. Some preprocessing may be needed to fit the related cryo-EM maps.
They can be obtained from previous simulations or rebuilt from internal experimental data.
How do you select the iteration number for WEMD simulations?
By default, the iteration number is set to 100. However, protein dynamics are complicated in real applications, and different systems exhibit conformational changes on different timescales. Generally, slower dynamics require more WEMD iterations to be observed. Because biomolecules are so diverse, we leave the final decision to the user, based on the suggestions below:
Theoretically, a simulation cannot reach global equilibrium in a finite time, so all simulations of large and complicated systems are conditional on the initial structures and the simulation time. In general, we observe more diverse ensemble structures and higher energy minima as the iteration number increases. However, a local equilibrium around important minima may be reached within a finite timescale, and we can check for it by examining the evolution of the progress coordinates or the KL divergence of the associated probability density distributions.
In practice, it is better to run a short simulation first (10 or 20 iterations) using the two Automated WEMD Simulation and Best Structure Search floes to estimate the simulation time and cost, then check the simulation report and analysis results.
A longer simulation can be run iteratively by launching the Continue WEMD Simulation Guided by CryoEM Maps Floe repeatedly until the desired results are obtained, such as a good KL divergence or no obvious changes in the pocket prediction.
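The KL divergence check mentioned above can be sketched as follows. This is a minimal illustration, not the floe's implementation: the function name kl_divergence, the histogram binning, and the small eps regularizer are all assumptions made for this example. It compares the progress-coordinate distribution from one window of iterations against another; a value that stays near zero between successive windows suggests the sampled distribution has stopped changing.

```python
import numpy as np

def kl_divergence(p_samples, q_samples, p_weights=None, q_weights=None,
                  bins=50, eps=1e-12):
    """Estimate KL(P || Q) from two sets of progress-coordinate values.

    Hypothetical helper: the samples are histogrammed on a shared grid,
    optionally using walker weights, and eps avoids log(0) in empty bins.
    """
    p_samples = np.asarray(p_samples, dtype=float)
    q_samples = np.asarray(q_samples, dtype=float)
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    p, _ = np.histogram(p_samples, bins=bins, range=(lo, hi), weights=p_weights)
    q, _ = np.histogram(q_samples, bins=bins, range=(lo, hi), weights=q_weights)
    # Normalize to probabilities, regularize, then renormalize.
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# Synthetic stand-ins for progress-coordinate values from two iteration windows.
rng = np.random.default_rng(0)
early = rng.normal(0.0, 1.0, 5000)
late = rng.normal(2.0, 1.0, 5000)
print("same window  :", kl_divergence(early, early))
print("early vs late:", kl_divergence(early, late))
```

In practice the two sample sets would come from the simulation report or analysis output, for example the progress-coordinate values of the most recent iterations versus those of the preceding block.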
How do you select the iteration interval for WEMD simulations?
The iteration interval (the length of each MD segment in one WEMD iteration) is set to 10 ps by default. This value is preferred because most simulated systems from structural biology are quite large (more than 1000 amino acid residues, not including other components such as solvent). For small systems containing fewer than a few hundred residues, we suggest increasing the iteration interval to 100 ps. Due to the nature of cloud computing, each short MD simulation incurs an overhead cost; in general, the overhead makes up a smaller percentage of the total for larger systems. If time and cost matter, it is better to first launch a few short simulations with different settings to estimate them.
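The trade-off between segment length and overhead can be made concrete with a back-of-the-envelope estimate. The sketch below is illustrative only: the function overhead_fraction, the fixed per-segment overhead of 20 s, and the throughput figures are hypothetical numbers chosen for the example, not measurements from any floe. Real values should come from short trial runs.

```python
def overhead_fraction(interval_ps, throughput_ns_per_day, overhead_s):
    """Fraction of wall time spent on per-segment overhead.

    interval_ps: length of one MD segment (the iteration interval).
    throughput_ns_per_day: MD speed for this system (hypothetical input).
    overhead_s: fixed cost per segment, e.g. startup and I/O (hypothetical).
    """
    md_seconds = interval_ps / 1000.0 / throughput_ns_per_day * 86400.0
    return overhead_s / (overhead_s + md_seconds)

# Hypothetical scenarios: a small, fast system and a large, slow one.
scenarios = [
    ("small system, 10 ps", 10.0, 500.0),
    ("small system, 100 ps", 100.0, 500.0),
    ("large system, 10 ps", 10.0, 20.0),
]
for label, interval_ps, ns_per_day in scenarios:
    frac = overhead_fraction(interval_ps, ns_per_day, overhead_s=20.0)
    print(f"{label}: {frac:.0%} of wall time is overhead")
```

With these assumed numbers, the small system spends most of its wall time on overhead at 10 ps per segment, which is why a longer interval is suggested for small systems, while the large system is already dominated by MD time.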
Questions Related to Cryo-EM Maps
How do you get cryo-EM consensus maps, reference maps, mean maps, and eigenmaps?
Cryogenic Electron Microscopy (cryo-EM) is not a new technique, but recently the “resolution revolution” in cryo-EM [Kühlbrandt-2014] has enabled researchers to obtain near-atomic resolution 3D density maps for many large protein complexes, in comparison with low-resolution maps having only global shapes or secondary structures of proteins. Due to the rapid freezing, cryo-EM raw 2D images obtained from an experiment might contain biomolecular information in different compositional or conformational structures. Postprocessing techniques can reconstruct and refine 3D density maps from thousands or millions of these noisy 2D images using different software packages such as RELION, EMAN, cryoSPARC, cryoDRGN, and RECOVAR. These density maps can then be used to derive all-atom structural models of the proteins themselves, most of which have been uploaded to the RCSB website, often with the derived structures.
Routine reconstruction of 2D particle images produces a single consensus map whose resolution depends on the quality of the raw images. We can run the Automated WEMD Simulation and Best Structure Search Guided by Target CryoEM Map Floe with the real-space correlation coefficient (RSCC) between the simulation map and the consensus map as the progress coordinate and obtain a structure ensemble more consistent with experiments, even when the consensus map has a low resolution. This consensus map is called a target map when it is used to create the RSCC progress coordinate.
Cryo-EM packages such as RELION, EMAN, cryoDRGN, and cryoSPARC can perform 3D classification analysis and obtain a few distinct 3D maps corresponding to different conformational states. We can build the transition paths among them from a series of WEMD simulations run with the Automated WEMD Simulation and Best Structure Search Guided by Target CryoEM Map Floe, with RSCC as the progress coordinate. Generally, these heterogeneous maps are not called consensus maps but target maps when they are used to set up the RSCC progress coordinates.
Recently, some cryo-EM packages such as cryoDRGN, cryoSPARC, and RECOVAR have been able to perform variability analyses focusing on the representation of continuous states for cryo-EM maps in terms of the mean maps and eigenmaps. Using these output mean maps and eigenmaps, we can launch the Automated WEMD Simulation and Best Structure Search Guided By Eigen CryoEM Maps Floe to explore the conformational changes corresponding to the dimensions of the eigenmaps.
When searching for the best structures, the related input maps are referred to as reference or experimental maps. They can either be different from or include the consensus map or target map for setting up the progress coordinates.
How do you choose a contour value of cryo-EM maps for masking the noise region?
Experimental cryo-EM maps contain a lot of noise density. When a map is visualized without a contour threshold (a value below which density grids are masked out), the density of the biomolecule is typically obscured by noise grids. Different software packages may also apply different strategies to normalize the whole cryo-EM map. We have implemented a few methods to separate out the noise region:
For a quick extraction of the density region corresponding to a biomolecule, software such as Chimera selects density grids of the top 99th percentile by default. We use a similar method to rescale the simulation map to the experimental mean map for the Automated WEMD Simulation and Best Structure Search Floes using mean maps and eigenmaps.
For calculating RSCC, we don’t need to normalize the maps to each other, but we still need to mask out the noise region to obtain a better RSCC value. There are two ways to do this: threshold, where density grids with values below a given threshold are masked out; and std, where density grids with values less than a few standard deviations above the mean are masked out.
Note that these settings do not need to be optimized for exploring conformational space unless we need to achieve high RSCC values and compare them between different systems. The easiest approach is to use the default setting, applying a small nonzero threshold to the simulation map and, for the experimental map, the suggested threshold from the RCSB website.
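The masking strategies described above can be sketched with NumPy on a synthetic density grid. This is an illustration under stated assumptions rather than the floe's code: the function names are hypothetical, "top 99th percentile" is interpreted here as keeping grids at or above the 99th percentile of density, and the synthetic map (Gaussian noise plus a dense cube) stands in for a real cryo-EM map. In each mask, True marks a grid that is kept.

```python
import numpy as np

def mask_by_threshold(density, threshold):
    """Keep grids whose density is at or above a fixed threshold."""
    return density >= threshold

def mask_by_std(density, n_sigma=3.0):
    """Keep grids at least n_sigma standard deviations above the mean."""
    return density >= density.mean() + n_sigma * density.std()

def mask_by_percentile(density, percentile=99.0):
    """Keep grids at or above the given percentile of all density values."""
    return density >= np.percentile(density, percentile)

# Synthetic map: background noise plus a dense cube standing in for a biomolecule.
rng = np.random.default_rng(0)
density = rng.normal(0.0, 0.1, size=(32, 32, 32))
density[12:20, 12:20, 12:20] += 1.0

masks = {
    "threshold": mask_by_threshold(density, 0.5),
    "std": mask_by_std(density, n_sigma=3.0),
    "percentile": mask_by_percentile(density, 99.0),
}
for name, mask in masks.items():
    print(f"{name}: {mask.sum()} of {mask.size} grids kept")
```

The threshold and std masks adapt to how much of the map is signal, whereas a percentile cut always keeps a fixed fraction of grids, which is convenient for a quick extraction but less tied to the actual noise level.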
How do you select a mask type for RSCC calculations?
We can apply different masking strategies to both simulation and experimental maps. When calculating RSCC, we need a combined selection so that the same grids are used from both maps. There are several options for combining the masks of the simulation trajectory and the reference cryo-EM map: union uses the union of the masks from the two maps; overlap uses the overlap between the masks of the two maps; reference uses the mask from the reference experimental map; and trajectory uses the mask from the simulation trajectory. We suggest trying the default (trajectory) first, although other options might give higher RSCC values for some systems.
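The four mask types can be sketched as boolean operations on two masks, followed by a correlation over the selected grids. This is a minimal sketch, not the floe's implementation: the function names combine_masks and rscc are hypothetical, and the RSCC here is assumed to be the mean-subtracted (Pearson) correlation over the masked grids, which is one common definition.

```python
import numpy as np

def combine_masks(traj_mask, ref_mask, mask_type="trajectory"):
    """Combine the simulation and reference masks (True = grid is used)."""
    if mask_type == "union":
        return traj_mask | ref_mask
    if mask_type == "overlap":
        return traj_mask & ref_mask
    if mask_type == "reference":
        return ref_mask
    if mask_type == "trajectory":
        return traj_mask
    raise ValueError(f"unknown mask type: {mask_type!r}")

def rscc(sim_map, ref_map, mask):
    """Pearson correlation of two maps over the masked grids (assumed definition)."""
    a = sim_map[mask] - sim_map[mask].mean()
    b = ref_map[mask] - ref_map[mask].mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

# Synthetic maps: the reference is a noisy copy of the simulation map.
rng = np.random.default_rng(1)
sim_map = rng.random((16, 16, 16))
ref_map = sim_map + rng.normal(0.0, 0.2, size=sim_map.shape)
traj_mask = sim_map > 0.5   # hypothetical threshold for the simulation map
ref_mask = ref_map > 0.5    # hypothetical threshold for the reference map

for mask_type in ("union", "overlap", "reference", "trajectory"):
    mask = combine_masks(traj_mask, ref_mask, mask_type)
    print(f"{mask_type}: {mask.sum()} grids, RSCC = {rscc(sim_map, ref_map, mask):.3f}")
```

Because each mask type selects a different set of grids, RSCC values computed with different mask types are not directly comparable with each other.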
Can I select some parts of the experimental map to calculate the RSCC?
For complicated systems, the experimental map might contain components, such as density from membranes, DNA, or nanobodies, that do not need to be included in the all-atom simulation model. Currently, there is no option to mask out those densities based solely on the experimental map. However, a similar effect can be achieved by using the trajectory mask type and building the starting structure for the corresponding WEMD simulation from only the necessary parts, assuming that a good reference structure fitted to the relevant parts of the map is available.