Introduction

Proteins, and enzymes in particular, are dynamic 3D structures that must move to perform their biological functions [Kay-2019]. These dynamical properties are exhibited on multiple time scales, varying from picoseconds to seconds or more ([Palmer-2015]; [Boehr-2006]). For example, mutations in proteins can induce changes in protein structure and dynamics that significantly alter the protein’s free energy landscape and change the kinetics of functional motions. These changes can result in cancer or other diseases. Understanding the effect of mutation on the structure and dynamics of proteins and enzymes can be a crucial part of the drug discovery process. In addition, many protein targets related to diseases can be considered “undruggable” if drug-related binding pockets are cryptic: that is, they appear transiently or are short-lived [Vajda-2018]. In short, better understanding of the structural diversity and dynamical properties of biomolecules is a crucial part of predicting and validating possible targets and characterizing binding pockets for drug discovery [Kuzmanic-2020].

Several experimental techniques can be used to derive all-atom 3D models of biomolecules [Kay-2019]. However, each method has its limitations and most of them work better for a single model of a rigid structure. Cryogenic electron microscopy (cryo-EM) is capable of capturing (often quite noisy) 2D projections of randomly oriented 3D protein structures which are frozen rapidly in solution. Postprocessing techniques can reconstruct and refine 3D density maps from thousands or millions of these noisy 2D images. These density maps can then be used to derive all-atom structural models of the proteins themselves. Additionally, due to both the rapid freezing and to the fact that data is recorded from hundreds of thousands to millions of individual particles, cryo-EM data naturally contains information about the biomolecules’ native conformational ensemble. Recently, the “resolution revolution” in cryo-EM has enabled researchers to obtain near-atomic resolution 3D density maps for many large protein complexes [Kühlbrandt-2014], and it has become an important tool in molecular biology and drug discovery [Renaud-2018]. Machine learning algorithms have also been applied to study conformational variability and heterogeneity of biomolecules, but it is sill a challenge to obtain (and validate) population and kinetic information [Tang-2023].

Weighted ensemble molecular dynamics (WEMD) methods ([Huber-1996]; [Zuckerman-2017]) are an effective way to sample rare events while maintaining an MD trajectory with continuous and unperturbed kinetics. This method can reveal transient or metastable states which may contain druggable pockets. WEMD simulation can generate thousands of unperturbed transitions through parallel simulations of many short trajectories (called “walkers”), which are merged, split, and probabilistically reweighted during successive iterations in such a way that exploration of a chosen degree of freedom (called the “progress coordinate”) is encouraged, and an extracted MD trajectory is continuous without applying jumps in temperature or external forces. For example, the Cryptic Pocket Detection Floes use normal modes as general progress coordinates to accelerate conformational sampling and predict cryptic pockets.

For the Structural Biology Floes package, we combine WEMD simulation with cryo-EM experimental data. Cryo-EM data are used to guide the WEMD simulations in areas of conformational space that are more relevant to experimental data, such as sampled structures that agree with target cryo-EM maps or transition paths between distinct maps. For this purpose, we designed progress coordinates to use cryo-EM maps available from experiments as discussed below.

  1. When only one consensus map is available, the correlation coefficient between the simulated map (calculated from a conformational snapshot of simulated protein) and the consensus map is used as the progress coordinate.

  2. When the mean maps and eigenmaps from 3D Variability Analysis or RECOVAR analyses are available, the projection coefficients of the simulated maps onto the eigenmaps are used as the progress coordinates.

For both cases listed above, we start a WEMD simulation using an initial structure which can be built from various ab initio structure builders (e.g., AlphaFold(2), OmegaFold, RoseTTAFold) or an all-atom model refined against cryo-EM or other experimental data. The WEMD simulation can drive the starting conformer toward the experimental maps. In this sense, it performs a kind of structural refinement. However, it should be emphasized that no additional forces are added which bias the simulation toward agreement with provided cryo-EM maps. The WESTPA simulation method simply encourages exploration along the progress coordinate specified and accelerates the transitions to reach a better equilibrium distribution than normal MD simulations on a short time scale. In addition to obtaining a conformation with a better fit to a cryo-EM map, we can also create an ensemble of structures that include clusters of conformations around certain regions on the explored conformational landscape, as well as trajectories representing transition paths between different regions.

For traditional structural refinement, a high resolution map is crucial to obtain a well-defined all-atom model. As the resolution decreases, the quality of single structure all-atom refinement decreases, as there is insufficient evidence to support one conformation over another similar one. For WEMD simulations, a low resolution map is still useful in that it provides information to interpret the conformational space explored using global shape or secondary structure properties. A WEMD simulation produces a diverse ensemble of structures nonetheless consistent with the provided maps. In short, the ensemble structures obtained from a WEMD simulation reflect the equilibrium distribution of a protein in explicit solvents with many diverse conformers that are consistent with experimental cryo-EM maps; these maps might not be available for normal MD or WEMD simulations without the guidance of experimental maps. These more diverse ensemble structures provide more opportunities to find druggable cryptic pockets from metastable transient states or paths between states on the energy landscape.

Our WEMD simulations can be performed in explicit water or mixed solvents to allow for the detection of cryptic pockets using the Cryptic Pocket Detection Floes package. When successful, the cryptic pockets can be used as input for other OpenEye drug discovery tools such as Gigadock and related floes in the Large-Scale Floes package and SiteHopper in the Protein Modeling package.