Calculate Normal Modes

Quick floe search term: CPD A2

Normal modes are orthogonal degrees of motion that the protein can access. Typically, the modes with lower frequencies tend to describe collective motions that engage large parts of the protein; therefore, they are referred to as the global modes [Bahar2010]. The modes are modeled based on the input PDB structure that has been relaxed using the Solvate and Equilibrate Target Protein Floe. As such, your global modes may not look exactly like the ones displayed in this tutorial, since your protein relaxation could have resulted in a slightly different structure.

Global modes are used as progress coordinates to drive sampling of a chosen protein in the Run a Weighted Ensemble MD Simulation Floe. Larger changes in protein backbone motions occur on timescales that are expensive to sample via brute force simulation. The hope is that by using global modes to drive the larger protein fluctuations, we can sample a broad range of conformations including those that reveal pocket opening. This tutorial demonstrates how to select global modes from a set of normal modes calculated using the input structure.

Tip

This floe typically takes less than five minutes to run, just enough time to get a cup of tea or coffee. It costs less than $2.

Search and Run the Floe in Orion

Locate the floe in Orion

Start by using the left-hand vertical navigation tabs on your Orion home page to go to the Floe page.

On the Floe page, click on the Floes tab, where you will find the list of the available floes and packages.

Click on the small caret next to Packages (under the Filter Floes By section on the left) to expand the list of packages, and then click on the OpenEye Cryptic Pocket Detection Floes package. This ensures that the floes listed in the middle of the page are from the Cryptic Pocket Detection package.

From this list, click on the Calculate Normal Modes Floe, and then click on the blue LAUNCH FLOE button in the bottom right corner of the page to launch the Job Submission Form.

Provide Input Files and Parameters to Run the Floe

  • Output path:

    Select the destination for your output data by specifying the Output path.

  • Input Data:

    You will need to provide the Solvated and Equilibrated Protein dataset generated by the Solvate and Equilibrate Target Protein Floe as an input.

  • Output Data:

    You can customize the output dataset and collection names under the Output Data options.

  • System Parameters:

    This option should be used when you wish to perform a normal mode calculation on a selected region of the protein instead of the entire protein. This becomes useful when you want to focus the enhanced sampling and cryptic pocket search on a region of the target protein. In practice, the input for the System Selection String would look like A:1~150 when selecting residues 1 to 150 on chain A of the input protein. Multiple residue ranges are supported by inputting multiple entries using the Add more button provided below the input box.

  • Advanced System Parameters:

    The Environment Selection String option should be used only when an input is given for System Selection String. Leave Environment Selection String empty to include the effect of the entire protein as the environment for the normal mode calculation. Alternatively, you can choose a sub-region of the protein as the environment. This selection may or may not include the region selected for System Selection String, as the floe will automatically detect and remove overlapping regions. The figure titled “The System and the Environment for Normal Mode Calculation” later in this tutorial depicts how the environment is selected depending on the input provided. System-Environment Framework specifies the method for performing the system-environment calculation. The default is “Reduce.” See below for a detailed discussion of the system-environment calculation.

  • ANM Parameters:

    The normal modes are calculated using the Anisotropic Network Model (ANM) [Atilgan2001]. Cutoff Distance and Spring Constant have been set to optimal values by default. The Run a Weighted Ensemble MD Simulation Floe takes at most two modes as progress coordinates, so set Maximum Number of Selected Modes to 2 if you would like the floe to automatically rank and select two global modes based on the criterion specified by Mode Filtering Property. By default, this criterion is set to Collectivity (see below for details about Mode Collectivity). Leaving Maximum Number of Selected Modes empty, which is the default setting, disables the automatic mode filtering.
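To make the roles of Cutoff Distance and Spring Constant concrete, the following is a minimal sketch of a textbook ANM calculation with NumPy. It is not the floe's implementation: the function names, the toy coordinates, and the values `cutoff=15.0` and `gamma=1.0` are illustrative assumptions, and the floe's actual defaults are not stated in this tutorial.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Build the 3N x 3N ANM Hessian for N nodes (e.g. C-alpha atoms).

    cutoff (angstroms) and gamma (spring constant) are illustrative values,
    not the floe's defaults.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = d @ d
            if r2 > cutoff ** 2:
                continue  # nodes farther apart than the cutoff are not connected
            # Off-diagonal 3x3 block for the spring between nodes i and j.
            block = -gamma * np.outer(d, d) / r2
            hessian[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hessian[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # Diagonal blocks are minus the sum of the off-diagonal blocks.
            hessian[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hessian[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hessian


def anm_modes(coords, cutoff=15.0, gamma=1.0):
    """Return eigenvalues and mode vectors, sorted from lowest frequency up.

    The first six eigenpairs (rigid-body translation and rotation) are dropped,
    which assumes the network is connected and not degenerate.
    """
    evals, evecs = np.linalg.eigh(anm_hessian(coords, cutoff, gamma))
    return evals[6:], evecs[:, 6:]


# Toy input: 20 random nodes in an 8 angstrom box (hypothetical, not a protein).
rng = np.random.default_rng(0)
toy_coords = rng.uniform(0.0, 8.0, size=(20, 3))
evals, modes = anm_modes(toy_coords)
print("lowest nonzero eigenvalues:", evals[:3])
```

Each column of `modes` is one normal mode, and the columns with the smallest eigenvalues correspond to the lowest-frequency motions discussed in this tutorial.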

Note

Use of System Selection String is recommended when you are dealing with a multi-chain system and wish to neglect the effect of the entire multimeric protein on the dynamics of the selected region. Keep in mind that by selecting a sub-region for the normal mode calculation, you might not be able to effectively sample slow motions associated with the excluded regions. You may also use the environment selection option to ignore the effect of floppy regions during the normal mode calculation of the selected region. Excluding floppy regions is usually a reasonable choice, since such regions tend to have relatively fast structural fluctuations and their dynamics can often be sampled within short time scales.

Figure: Job Submission Form (a2_nma_calculation.png)

Click on the green Start Job button at the bottom right corner of the page.

Visualize and Analyze Normal Modes

When the floe job is complete, inspect the output normal mode analysis dataset on the Analyze page to identify the best normal modes to use in the next floe. To get to the dataset, click on the Floe tab on the blue navigation side bar, click on the Jobs tab at the top of the page, and then click on the normal mode analysis job that you want to inspect. Under Results, click on Show in Project Data to see the dataset associated with the job.

This will redirect you to the Data navigation side bar tab and show only the dataset associated with the job. Click on the icon of a blue circle with a + symbol that is next to the dataset name. It will change to a green circle with a white checkmark and will allow you to view the dataset in the 3D Modeling page and the Analyze page.

Click on the blue navigation side bar Analyze tab. Make sure that the Active Dataset is set to the normal mode analysis dataset. On the scatter plot on the Analyze page, use the Mode Number for the x-axis and use the Mode Collectivity for the y-axis. Click on the Layouts button in the top-right corner and select the Analyze with 3D option. This shows a visual representation of the mode vectors as orange arrows on a representative structure of the protein taken from the solvated and relaxed protein structure supplied for the normal mode analysis.

Figure: Normal Mode Analysis Dataset (a2_nma_with_3d.png)

Select Normal Modes for Weighted Ensemble MD Simulation

Mode numbers are assigned by ranking the modes from the lowest frequency to the highest. By default, if you have not set a value for Maximum Number of Selected Modes above, the top 20 modes (out of thousands) are calculated. As mentioned above, modes with the lowest frequency, or equivalently, the largest amplitude, tend to describe global motion. However, low frequency modes may also correspond to motions dominated by free loops or trailing protein tails. This is called the “tip effect.” To avoid issues caused by the tip effect, we recommend picking the modes with high Mode Collectivity among the top 20 modes, using the Mode Number versus Mode Collectivity scatter plot on the Analyze page.
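The following sketch illustrates why collectivity separates global modes from tip-effect modes. It assumes the commonly used entropy-based definition of collectivity (the exponential of the Shannon entropy of the normalized per-node squared displacements, divided by the number of nodes) with uniform masses. This tutorial does not state the exact formula the floe uses, so treat `mode_collectivity` as a hypothetical helper rather than the floe's calculation.

```python
import numpy as np


def mode_collectivity(mode):
    """Collectivity of one mode vector of length 3N (x, y, z per node).

    Assumed definition: exp(-sum(p * ln p)) / N, where p is the fraction of
    the squared displacement carried by each node. Values range from 1/N
    (one node moves) to 1 (all nodes move equally).
    """
    disp = np.asarray(mode, dtype=float).reshape(-1, 3)
    sq = np.sum(disp ** 2, axis=1)
    p = sq / sq.sum()
    nonzero = p[p > 0]  # 0 * ln(0) is treated as 0
    entropy = -np.sum(nonzero * np.log(nonzero))
    return float(np.exp(entropy) / len(p))


# Two toy modes on 10 nodes: one moves every node, one moves only the last node.
n_nodes = 10
collective = np.ones(3 * n_nodes)
tail_only = np.zeros(3 * n_nodes)
tail_only[-3:] = 1.0

scores = np.array([mode_collectivity(tail_only), mode_collectivity(collective)])
ranked = np.argsort(scores)[::-1]  # mode indices, most collective first
print("collectivity per mode:", scores)
print("modes ranked by collectivity:", ranked)
```

A mode dominated by a single tail scores near 1/N, while a mode that engages the whole structure scores near 1, which is why ranking the low-frequency modes by collectivity filters out tip-effect modes.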

The tip effect can also be avoided by selecting a subset of residues that excludes loops and/or portions of the protein that you know will dominate the normal modes. In the floe, this can be done by specifying the range(s) of residues that are considered part of the system, excluding the tail region, using the System Parameters - System Selection String parameter. The general syntax is:

[chain_id]:[from_res_num]~[to_res_num]

In practice, this would look like A:1~150 when selecting residue numbers 1 to 150 on chain A of the input protein. Omitting either from_res_num or to_res_num allows for open-ended selections. For example, A:10~ selects any residue on chain A with a residue number greater than or equal to 10, and A:~150 selects residues on chain A up to residue 150. chain_id can also be omitted, in which case the selection is applied to every chain. For example, 1~150 selects residues 1 to 150 on all chains.
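The selection syntax above can be checked with a small parser before submitting a job. This is a sketch only: `ResidueRange`, `parse_selection`, and `selects` are hypothetical helpers, not part of the floe, and the sketch assumes that both ends of a range are inclusive and that residue numbers are non-negative integers.

```python
import re
from typing import NamedTuple, Optional


class ResidueRange(NamedTuple):
    chain_id: Optional[str]  # None means "every chain"
    start: Optional[int]     # None means "no lower bound"
    end: Optional[int]       # None means "no upper bound"


# [chain_id]:[from_res_num]~[to_res_num], where every bracketed part is optional.
_SELECTION = re.compile(r"^(?:(?P<chain>[^:~]+):)?(?P<start>\d+)?~(?P<end>\d+)?$")


def parse_selection(text):
    """Parse one selection string such as 'A:1~150', 'A:10~', or '1~150'."""
    match = _SELECTION.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid selection string: {text!r}")
    start = match.group("start")
    end = match.group("end")
    return ResidueRange(
        chain_id=match.group("chain"),
        start=int(start) if start is not None else None,
        end=int(end) if end is not None else None,
    )


def selects(rng, chain_id, res_num):
    """Return True if the residue falls inside the range (inclusive bounds assumed)."""
    if rng.chain_id is not None and rng.chain_id != chain_id:
        return False
    if rng.start is not None and res_num < rng.start:
        return False
    if rng.end is not None and res_num > rng.end:
        return False
    return True


for example in ("A:1~150", "A:10~", "A:~150", "1~150"):
    print(example, "->", parse_selection(example))
```

Multiple ranges, entered with the Add more button, could then be handled by parsing each entry separately and accepting a residue if any parsed range selects it.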

Note

The selection is made based on the input structure, so please make sure to get the residue numbers from the design unit you input to the Calculate Normal Modes Floe.

The Calculate Normal Modes Floe also provides more advanced methods for dealing with the tip effect using the system-environment methods. In short, with this approach, the protein is divided into two parts, the system and the environment. Then, normal modes for the system can be calculated with the effects from the environment being considered. This way, the tail regions can be labeled as the environment, so that their motion is excluded from the modes of the rest of the protein (system), while their effects on the rest are still incorporated into the modes. This method can be turned on by setting the System Parameters - System Selection String to the main part of the protein (i.e., non-tail region). The rest of the system will be considered the environment unless you use Advanced System Parameters - Environment Selection String to specify the range of the environment using the same syntax as described above.

The floe provides two system-environment methods. The “Slice” method performs the original normal mode analysis on the whole system and simply extracts the subsystem portion of the motion from each mode. The “Reduce” method calculates the modes of the subsystem with the vibrational energy terms from the environment minimized and integrated out [Woodcock2008]. Choose one of these two methods under Advanced System Parameters - System-Environment Framework. The default option is “Reduce.”
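The difference between the two methods can be sketched in a few lines of NumPy. The sketch assumes the standard formulation in which integrating out the environment yields an effective system Hessian, H_ss - H_se H_ee^-1 H_es, and that the environment block H_ee is invertible. The function names and the toy matrix are illustrative assumptions, and the indices refer to coordinates (degrees of freedom), not residues. It is not the floe's implementation.

```python
import numpy as np


def reduce_modes(hessian, system_idx):
    """'Reduce'-style sketch: diagonalize the effective system Hessian.

    The environment coordinates are integrated out, giving
    H_eff = H_ss - H_se @ inv(H_ee) @ H_es (H_ee assumed invertible).
    """
    sys_idx = np.asarray(system_idx)
    env_idx = np.setdiff1d(np.arange(hessian.shape[0]), sys_idx)
    h_ss = hessian[np.ix_(sys_idx, sys_idx)]
    h_se = hessian[np.ix_(sys_idx, env_idx)]
    h_ee = hessian[np.ix_(env_idx, env_idx)]
    h_eff = h_ss - h_se @ np.linalg.solve(h_ee, h_se.T)
    evals, evecs = np.linalg.eigh(h_eff)
    return h_eff, evals, evecs


def slice_modes(hessian, system_idx):
    """'Slice'-style sketch: diagonalize the full Hessian, keep the system rows.

    The sliced vectors are not renormalized and are no longer orthogonal.
    """
    evals, evecs = np.linalg.eigh(hessian)
    return evals, evecs[np.asarray(system_idx), :]


# Toy symmetric positive-definite "Hessian" with 6 coordinates (hypothetical data).
rng = np.random.default_rng(0)
a = rng.normal(size=(6, 6))
toy_hessian = a @ a.T + 6.0 * np.eye(6)
system = [0, 1, 2]  # the first three coordinates form the system

h_eff, reduced_evals, reduced_vecs = reduce_modes(toy_hessian, system)
sliced_evals, sliced_vecs = slice_modes(toy_hessian, system)
print("reduced modes:", reduced_vecs.shape, "sliced modes:", sliced_vecs.shape)
```

In this sketch, the reduced calculation returns as many modes as the system has coordinates, each an orthonormal motion of the system alone, whereas slicing returns one truncated vector for every mode of the whole structure.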

Figure: The System and the Environment for Normal Mode Calculation (a2_nma_system_env_selection.png)

In this example, we have not specified any residues in the System Parameters - System Selection String or Advanced System Parameters - Environment Selection String fields. As such, the calculation was performed on the whole protein. Inspecting the results of the analysis on the Analyze page allows you to identify desirable modes. If you do not know of specific modes that will open and close a desired portion of the protein, we recommend choosing modes with high Mode Collectivity, after inspecting them to ensure that the mode vectors (shown in orange) are sensible.