Run a Weighted Ensemble MD Simulation

Quick floe search term: CPD A3a

For a protein with fewer than 200 residues, run the simulation for approximately 50 iterations to get sufficient sampling. For larger proteins with fewer than 600 residues, we recommend approximately 100 iterations. For this tutorial, we recommend a total of 50 iterations, split between the Run a Weighted Ensemble MD Simulation and Continue a Weighted Ensemble MD Simulation floes.
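The sizing guidance above amounts to a simple lookup by residue count. The sketch below is illustrative only: the function name is hypothetical, the thresholds are the approximate values quoted in this section, and because the tutorial gives no recommendation for proteins of 600 residues or more, the sketch returns None in that case.

```python
from typing import Optional


def recommended_iterations(n_residues: int) -> Optional[int]:
    """Approximate WEMD iteration count suggested by this tutorial.

    Hypothetical helper: fewer than 200 residues -> ~50 iterations,
    fewer than 600 residues -> ~100 iterations.
    """
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    if n_residues < 200:
        return 50
    if n_residues < 600:
        return 100
    # The tutorial gives no guidance for larger proteins.
    return None


print(recommended_iterations(150))  # 50
print(recommended_iterations(450))  # 100
```

These are starting points for adequate sampling, not hard limits; checking results with the analysis floe is still the way to decide whether to extend a run.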

For production runs, we recommend first running a short simulation with the Run a Weighted Ensemble MD Simulation floe and then using the Continue a Weighted Ensemble MD Simulation floe to extend it. This lets you check the simulation with the Perform Weighted Ensemble MD Analysis floe before committing more time and money to extended sampling.

Tip

Running 45 iterations of weighted ensemble MD (WEMD) for the tutorial typically takes 15-20 hours and costs approximately $300. After submitting this floe, take a break and check back the next day to make sure the job has completed before proceeding to the next step.

If you are the second person on your stack to run this tutorial, you can ask the first person to share their output data with you (e.g., via Team Data) and proceed with the other tutorials using the shared data. Then, to save cost, follow the rest of this tutorial with a short simulation of 10 iterations in total (5 iterations in each of the Run a Weighted Ensemble MD Simulation and Continue a Weighted Ensemble MD Simulation floes).

Tip

If your simulation is interrupted or cancelled, the collection is left open and may appear to have a size of 0 MB. To close it, run the Continue a Weighted Ensemble MD Simulation floe on the open collection for n additional iterations, where n = Maximum Iteration Number - Iteration Number, with both values taken from the last output record generated before the interruption. This runs the remaining iterations and closes the collection correctly.
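The value of n is plain arithmetic on two fields of the last output record. As a minimal sketch (the helper name is hypothetical, and the two values are assumed to be copied by hand from that record):

```python
def iterations_to_close(maximum_iteration_number: int, last_iteration_number: int) -> int:
    """Additional iterations for the Continue floe to finish an interrupted run.

    Both arguments come from the last output record written before the
    interruption: n = Maximum Iteration Number - Iteration Number.
    """
    if last_iteration_number > maximum_iteration_number:
        raise ValueError("Iteration Number cannot exceed Maximum Iteration Number")
    return maximum_iteration_number - last_iteration_number


# Example: a 45-iteration run that stopped after iteration 30.
print(iterations_to_close(45, 30))  # 15
```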

Caution

We recommend waiting for jobs from the Run a Weighted Ensemble MD Simulation or Continue a Weighted Ensemble MD Simulation floes to complete before running any analysis on the output collection.

Prerequisite: Selection of Protein Normal Modes (Progress Coordinates)

Before we start running the Run a Weighted Ensemble MD Simulation floe, we need to select the modes that the weighted ensemble simulation will use as progress coordinates.

The cryptic pocket openings you are likely to observe depend on the conformational landscape you explore. Global modes often correspond to low-energy collective motions of the protein. These motions involve coordinated movements of multiple residues or domains, allowing the protein to explore its conformational space efficiently. In choosing your normal mode progress coordinates, you determine which large-scale protein motions will be enhanced and therefore which part of the conformational landscape is sampled. If you run your simulations and do not see a desirable pocket, a different selection of normal modes would explore a different conformational ensemble and could reveal a different subset of pockets. The normal mode selection is a stage of the workflow that you can revisit.

The normal modes calculated by the Calculate Normal Modes floe describe only the motion of the protein backbone; side-chain motions are governed by the MD force field. This is by design: it enables comprehensive sampling of large-scale motions such as domain shifts, hinge movements, and overall structural alterations, while the force field ensures realistic sampling of smaller-scale motions.

While some global modes may not appear to directly correlate with the opening and closing of the pocket, it is important to note that global modes correspond to low-energy slopes. Enhancing sampling along these slopes could facilitate transitions between various conformational states, such as open and closed states of the pocket, by circumventing large energy barriers in between.
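To make "a normal mode as a progress coordinate" concrete, one common definition is the projection of the backbone displacement from a reference structure onto the mode vector. This is an illustrative sketch only: the tutorial does not document exactly how the floe computes its progress coordinates, and the function name and plain-list coordinate format are assumptions.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def mode_projection(
    reference: Sequence[Vec3],
    current: Sequence[Vec3],
    mode: Sequence[Vec3],
) -> float:
    """Project the displacement (current - reference) onto a normal mode.

    Each sequence holds one (x, y, z) entry per backbone atom, in the same
    atom order. The mode is normalized here, so the result is the length of
    the displacement along the mode direction, in coordinate units.
    """
    if not (len(reference) == len(current) == len(mode)):
        raise ValueError("reference, current, and mode must have the same length")
    dot = 0.0
    norm_sq = 0.0
    for r0, r, v in zip(reference, current, mode):
        for axis in range(3):
            dot += (r[axis] - r0[axis]) * v[axis]
            norm_sq += v[axis] ** 2
    if norm_sq == 0.0:
        raise ValueError("mode vector must be non-zero")
    return dot / math.sqrt(norm_sq)


# Two atoms each moving 1 unit along x, with a mode that moves both along x.
ref = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
cur = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mode = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(round(mode_projection(ref, cur, mode), 3))  # 1.414
```

A real implementation would first superimpose the current structure onto the reference to remove overall rotation and translation; that step is omitted here for brevity.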

There are two ways to select the modes to use as progress coordinates in the weighted ensemble MD simulation. If you have already inspected the modes visually in the Analyze page and know which modes you will pick, use Option 1 to create a dataset containing just those two modes. Otherwise, use Option 2.

Tip

The Run a Weighted Ensemble MD Simulation and Continue a Weighted Ensemble MD Simulation floes support up to two input modes, and we recommend selecting two modes to ensure more extensive sampling.

Selecting Input Normal Modes (Option 1)

Navigate to the output dataset from the Calculate Normal Modes job and open Tile View for the dataset. Tile View cannot display the protein with the orange arrows that depict the mode vectors. However, you can see the Mode Number and Mode Collectivity by clicking the Columns icon and toggling those record fields to visible.
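Mode Collectivity measures how many atoms take part in a mode. A widely used definition is Brüschweiler's collectivity index, sketched below; the tutorial does not state which formula Orion uses, so treat this as an assumption. The index ranges from 1/N (only one of N atoms moves) to 1 (all N atoms move equally), which is why high-collectivity modes describe global motions.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def collectivity(mode: Sequence[Vec3]) -> float:
    """Bruschweiler-style collectivity of a mode from per-atom displacements.

    kappa = (1/N) * exp(-sum_i u_i * ln(u_i)), where u_i is atom i's squared
    displacement normalized so that the u_i sum to 1 (0 * ln 0 is taken as 0).
    """
    n_atoms = len(mode)
    if n_atoms == 0:
        raise ValueError("mode must contain at least one atom")
    sq = [x * x + y * y + z * z for x, y, z in mode]
    total = sum(sq)
    if total == 0.0:
        raise ValueError("mode vector must be non-zero")
    entropy = 0.0
    for s in sq:
        u = s / total
        if u > 0.0:
            entropy -= u * math.log(u)
    return math.exp(entropy) / n_atoms


print(round(collectivity([(1.0, 0.0, 0.0)] * 4), 3))  # 1.0
print(round(collectivity([(1.0, 0.0, 0.0)] + [(0.0, 0.0, 0.0)] * 3), 3))  # 0.25
```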

Figure: Selecting which record fields are visible in Tile View.

Now sort the records by setting the sort criterion to Mode Collectivity and the order to Descending in the top right corner. In this tutorial, we select the two records with the highest Mode Collectivity.
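If you export the mode records (for example, as rows from the spreadsheet view), the same sort-and-pick step takes only a few lines. This is a hypothetical sketch: the dictionary keys mirror the record field names shown in Tile View, but the export format and the values are made up for illustration.

```python
from typing import Dict, List

ModeRecord = Dict[str, float]


def top_modes_by_collectivity(records: List[ModeRecord], k: int = 2) -> List[ModeRecord]:
    """Return the k records with the highest Mode Collectivity, highest first."""
    return sorted(records, key=lambda rec: rec["Mode Collectivity"], reverse=True)[:k]


# Illustrative values only; your collectivities will differ.
records = [
    {"Mode Number": 1, "Mode Collectivity": 0.41},
    {"Mode Number": 2, "Mode Collectivity": 0.63},
    {"Mode Number": 3, "Mode Collectivity": 0.27},
    {"Mode Number": 4, "Mode Collectivity": 0.55},
]
selected = top_modes_by_collectivity(records)
print([int(rec["Mode Number"]) for rec in selected])  # [2, 4]
```

Because the floe uses only the first two modes in dataset order when given more than two, saving just the top two to their own dataset keeps the choice explicit.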

Figure: Sorting the modes by collectivity, selecting the top two modes, and saving them to a dataset.

Caution

Your calculated values (e.g., Mode Collectivity), and therefore the ranking of the modes, may not match those shown in the screenshots because of structural differences in the conformations generated by the Solvate and Equilibrate Target Protein floe.

Click the blue Save Selected Records button at the bottom of the page and then New Dataset in the pop-up window. Give the dataset a name that you will remember. This dataset will contain only the two modes that are going to be used as input for the weighted ensemble simulation.

Search and Run the Floe in Orion

Start by using the left-hand vertical navigation tabs on your Orion home page to go to the Floe page.

On the Floe page, click on the Floes tab, where you will find the list of the available floes and packages.

Click the small caret next to Packages (under the Filter Floes By section on the left) to expand the list of packages, and then click the OpenEye Cryptic Pocket Detection Floes package. This ensures that the floes listed in the middle of the page are from the Cryptic Pocket Detection package.

From this list, click on the Run a Weighted Ensemble MD Simulation floe, and then click on the blue LAUNCH FLOE button in the bottom right corner of the page to launch the job submission form.

Proceed to the Provide Input Files and Parameters to Run the Floe section for instructions on running the floe.

Selecting Input Normal Modes (Option 2)

Click the Analyze tab in the blue navigation sidebar. Make sure that the normal mode output dataset is selected in Active Datasets. Set up your Analyze page to view the representative protein structure with the mode vectors depicted as orange arrows: click the Layout icon in the top right corner of the page and select Analyze with 3D (see the previous tutorial for details). Make sure that the x-axis of the scatter plot shows the Mode Number and the y-axis shows the Mode Collectivity. Then follow the steps below to set up the normal mode input for the Run a Weighted Ensemble MD Simulation floe:

  • Clicking on a single data point on the scatter plot will display the representative structure and the normal mode vectors for that dataset record.

  • You can select multiple data points with the click-and-drag selection box or by holding the Command key and clicking data points. Select the two modes with the highest collectivity.

  • Right-click one of the selected modes to open a drop-down menu, and select the Send to Workfloe option.

  • A pop-up window lets you select an existing workfloe or search for a floe. Search for and click the Run a Weighted Ensemble MD Simulation floe.

Figure: Sending normal mode records directly to the floe from the Analyze page.

This redirects you to the floe options page, where the selected records auto-fill the first input dataset field. Make sure they have been assigned to the Input Data - Protein Normal Mode Records field. If not, change the assigned field by clicking the inverted triangle on the Choose Input button for the correct field.

The following section gives instructions on providing the remaining inputs to run the floe.

Provide Input Files and Parameters to Run the Floe

Caution

If more than two modes are provided to Protein Normal Mode Records, the floe automatically selects the first two modes in the order in which they appear in the dataset.

You can customize the file names under the Outputs options; this tutorial uses the default values. Set the Weighted Ensemble Parameters - Iterations field to 45 (or 5 for the short run).

Figure: Advanced Weighted Ensemble Parameters on the floe options page.

Once your floe parameters have been set, click on the Start Job button at the bottom of the window. The floe will generate two outputs: a dataset (default name: Protein Sampling Summary Table) and a collection (default name: Protein Sampling Data).

Tip

This part of the tutorial takes the longest to finish, so we recommend running it overnight. You can check on your simulation's progress using the instructions in the next section.

Monitoring a Weighted Ensemble MD Simulation (Optional)

You can track the progress of your simulation job by checking the number of records in the output dataset, which equals the number of weighted ensemble iterations completed so far. Alternatively, you can inspect the dataset as a SPREADSHEET in the Analyze page:

Figure: Output dataset visualized as a spreadsheet in the Analyze page.

In the snapshot above, the Iteration Number field (highlighted by a red box) is currently 4, the number of iterations completed so far. The Maximum Iteration Number field (highlighted by a blue box) records the total number of iterations the simulation will run. The Mode Numbers field indicates the modes chosen as the input progress coordinates.
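These two iteration fields are enough to estimate how far along a running job is. A minimal sketch, assuming you read both values from the latest record by hand (the helper name is hypothetical):

```python
def progress_percent(iteration_number: int, maximum_iteration_number: int) -> float:
    """Percentage of weighted ensemble iterations completed so far."""
    if maximum_iteration_number <= 0:
        raise ValueError("Maximum Iteration Number must be positive")
    return 100.0 * iteration_number / maximum_iteration_number


# Example: iteration 4 of a 45-iteration run.
print(f"{progress_percent(4, 45):.1f}% complete")  # 8.9% complete
```

This measures iterations, not time: the Segment Count can change from one iteration to the next, so iterations may not take equal wall-clock time.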

You can also inspect a few summary statistics during the course of the simulation using the SCATTER plot in the Analyze page. This can be achieved by setting the x-axis to Iteration Number and y-axis to another column, such as Segment Count, Wall Clock Time, or CPU Time.

Figure: Segment Count versus Iteration Number. Each data point shows how many parallel simulations ran in that iteration.

Continue a Weighted Ensemble MD Simulation (Optional)

To continue the weighted ensemble simulation, click the Floe tab in the blue navigation sidebar, and then click the Floes tab at the top of the page. Filter by Packages - OpenEye Cryptic Pocket Detection Floes, select the Continue a Weighted Ensemble MD Simulation floe, and click the blue LAUNCH FLOE button.

This opens the floe options page. Select the output collection from the previous floe as the Input Data - Collection for this floe. Note that this floe appends data to the input collection. Enter a value of 5 for Weighted Ensemble Parameters - Additional Iterations.

Click on the green Start Job button at the bottom of the page.