Nonequilibrium Switching

The Nonequilibrium Switching (NES) method is a relatively novel method in the Binding Free context to calculate Relative Binding Affinities (RBFE) of a given target and its ligands. The theory was developed during the 1990s 1 2. However, due to its high computational demand, the approach has not been fully explored, and few pioneering works have been published so far 3.

In general, the relative binding affinity \(\Delta\Delta G\) is defined as the free energy difference between the binding affinities of a ligand A \(\Delta G_{A}\) and B \(\Delta G_{B}\) related to their target: \(\Delta\Delta G=\Delta G_{B}-\Delta G_{A}\). Nevertheless, the direct computation of these affinities could be quite challenging because a direct binding process has to be simulated in-silico and often, other thermodynamics paths are used. During the last 30 years, Alchemical methods have been proved to be very effective to calculate the RBFE between two ligands and, in this context the ligand pair involved in the RBFE calculations are named as an Egde. In this approach, a starting ligand A is “mutated” into a final one B in different environments. For example, \(\Delta\Delta G\) can also be computed following the alternative thermodynamics paths shown in the figure below where \(\Delta G_{Bound}\) and \(\Delta G_{Unbound}\) are estimated by using alchemical methods: \(\Delta\Delta G=\Delta G_{Bound}-\Delta G_{Unbound}\).

RBFE and alternative thermodynamics paths

RBFE and alternative thermodynamics paths

In the Unbound path the starting ligand is mutated into the final one just in solution while, in the Bound path, the mutation happens in the complex binding site. In the NES approach, these mutations are done in a non-equilibrium regime many times starting from equilibrium snapshots and building the forward and reverse work probability distribution functions that can be used to estimate \(\Delta G_{Bound}\) and \(\Delta G_{Unbound}\). For this reason, the NES methodology requires prior to run to have the equilibrium ensembles for the Bound and the Unbound systems.

In addition, the implemented NES floe-protocol tries to estimate the binding affinities from the computed RBFEs by using the maximum likelihood estimator method 4 and, making possible to compare the predicted affinities values with experimental results derived, for example, from activities measurements, IC50 etc.

Warning

RBFE calculations via Alchemical methods must be carefully used. In particular, mutations should be carried out between similar ligands i.e. where a common ligand scaffold can be identified and small different functional groups are mutated.

The Equilibration and Non-Equilibrium Switching floe

The Equilibration and Non-Equilibrium Switching floe can be divided into four main sections shown here.

The Equilibration and Non-Equilibrium Switching floe

The four main sections of the Equilibration and Non-Equilibrium Switching floe

The first and second floe sections are designed to assemble and run the Bound and Unbound simulations. The protocol for the Bound simulations is similar to the tutorial on the Short Trajectory MD with Analysis and it is not repeated here. In the Unbound simulations, each ligand is charged, solvated in a box of solvent and parametrized accordingly to the selected force field and then is equilibrated in three different steps performing: restrained minimization, NVT and NPT MD runs. At the end of the equilibration stages, both the Bound and Unbound prepared flasks are set into an equilibrium production stage where they run for a total of a default 6ns.

The third section of the floe carries out the NES calculations and is shown here

The Non-Equilibrium Switching floe section

The Non-Equilibrium Switching floe section

In this section, important cubes are the Gathering cube which selects the equilibrium runs involved in a RBFE calculation and handing them to the Chimera cube. In this cube, a chimeric molecule between the ligands participating in an edge is created (the ligands are topologically merged together with their force field parameters) and injected into selected equilibrium trajectory frames collected during the equilibrium runs. In the NES method, for each chimeric molecule (or edge) the following switching simulations are performed per each selected equilibrium frame:

  • A Bound forward (ligA is mutate to ligB in the complex binding site)

  • A Bound reverse (ligB is mutate to ligA in the complex binding site)

  • An Unbound forward (ligA is mutate to ligB just in solution)

  • An Unbound reverse (ligB is mutate to ligA just in solution)

By default, 80 equilibrium frames are selected for the Bound and Unbound runs therefore, for each edge a total of:

80 (Bound forward) + 80 (Bound reverse) + 80 (Unbound forward) + 80 (Unbound reverse) = 320

switching mutations are completed. Each run is effectively made with a short NPT equilibration of 5ps to adapt the equilibrium frame system to the new chimeric molecule followed by a default 50ps NPT ensemble simulation where the chimeric molecule is switched between the starting and final ligand thermodynamic states. All the runs are done by using Gromacs as MD engine at this stage.

The final NES-floe section analyzes the NES data to produce binding affinities and relative binding affinities results.

The NES Analysis floe section

The NES Analysis floe section

For each edge, the forward and reverse works for the Bound and Unbound switching are evaluated by using the Bennet Acceptance ratio for Non-Equilibrium and these values are used to compute the \(\Delta G_{Bound}\) and \(\Delta G_{Unbound}\) free energies that are related to \(\Delta\Delta G\). The RBFE values for an edge are used to attempt estimates of the affinity values by using the maximum likelihood estimator. However, this approach succeeds if the provided ligand edge map is enough connected, otherwise no estimates will be done.

Warning

\(\Delta Gs\) are estimated from \(\Delta \Delta Gs\) by using a maximum likelihood estimator however, the estimate will fail if the edge mapper provided does have disconnected parts. This circumstance could also happen due to edge failures along the Chimera molecule assembly or along the NES switching

Protein, Ligand and Edge mapping file inputs

As for the Short Trajectory MD floe, the NES floe requires ligands to have reasonable 3D coordinates, all atoms, and correct chemistry. In particular, bond orders and formal charges should be correctly assigned. The floe can be directly input from docking programs like Posit but, bad clashes should have been relaxed prior to input the floe to resolve high gradients with the protein or other components like cofactors. In general, bad poses will evaluated and eventually rejected at floe running-time.

The NES floe requires correctly prepared proteins up to “MD ready” standards which requires chain capping, all atoms in protein residues (including hydrogens) and missing protein loops resolved or capped. Protein side chain formal charges and protonation at this point determine their tautomeric state. Additionally, cofactors and structured internal waters are also important to be included. We strongly recommend using Spruce for protein preparation.

In addition to the protein target and its ligands the NES floe requires the ligand mapper dataset produced by the Edge Mapper for RBFE calculations floe introduced in the MD Affinity pkg starting from the v5.0.3. This dataset contains the edges information for the relative binding affinity calculations to be carried out.

For this tutorial the Tyk2 receptor and few ligands have been selected. The files can be download below with the ligand dataset file as well.

How to use the floe

After selecting the Equilibration and Non-Equilibrium Switching floe in the Orion UI, you will be presented with a job form with inputs, outputs and parameters to select. In the next Figures you can see how we filled out the key fields for the Tyk2 receptor case.

The NES floe inputs

The NES floe inputs

As first step the user should select the folder where results will be saved. In this case the Tyk2_NES_Tutorial folder was used. The Equilibration and Non-Equilibrium Switching floe requires two mandatory inputs and two optional inputs. The mandatory inputs are the ligand and the ligand edge mapping datasets. The two optional inputs are the protein input file and the experimental binding affinity text file. However, If the optional protein is not provided, the floe will check if the protein is present on the ligand datasets in form of Design Unit produced by Spruce. If this data cannot be found an error will be raised and the floe will fail. The OpenEye Posit floe is able to produce datasets in this form otherwise the user must provides a protein as input as well. In the case that the protein is provided as input and, also the protein is present on the ligand input datasets the protein input will supersede. The other optional input is the experimental affinity file which will be used to generate comparison plots and tables between experimental and predicted results for \(\Delta\Delta Gs\) and \(\Delta Gs\) in the floe reports.

Warning

The protein input is optional but, if it is not provided the protein must be present on the ligand input datasets as Spruce Design Unit.

In order to submit the floe the user has to fill out the Orion output datasets section with meaningful names.

The NES floe outputs

The NES floe outputs

The Equilibrium Bound and Unbound dataset outputs are the datasets produced at the end of the Equilibrium or Analysis runs. If the provided edge mapping file is well connected, the NES floe can predict Affinities, and these results are saved in the Affinity dataset output. The NES and Failure datasets are also produced as outputs. The first contains all the information produced along the NES runs at edge and ligand level while, the latter gathers all the failures produced along the whole floe for debugging purposes. Finally, the user must provide a dataset name for the Recovery dataset. This dataset can be used to try to recover and generate partial results from the NES runs by using the designed recovery Non-Equilibrium Switching Recovery floe.

The final NES floe selection is related to the promoted parameters

The NES floe parameters

The NES floe parameters

Their meanings are explained below:

  • NES Switching Time. This parameter (Default 50ps) controls the time length of the NES switching. For difficult and large mutations, this parameter could be used to try to have better bound/unbound work convergence.

  • Total Number of NES Trajectory Frames. This parameters (Default 80) controls how many snapshots are taken from the Bound and Unbound Equilibrium trajectories to run the NES switching. For example, suppose that the Bound equilibrium runs for 6ns production and it is collected a total of 1000 frames. From these frames, 80 equally distanced frames are selected (~each 12 frames). The chimeric molecule is injected into these frames to run the forward and reverse Bound and Unbound runs.

  • Max Number Of Mapper Edges Allowed. This parameters (Default 100) controls how many edges are allowed to run. It is a safeguard parameters to avoid unexpected large costs.

  • Ligand Affinity Experimental File. The experimental text file (Default None) with the binding affinity in kcal/mol or kJ/mol. The syntax of this text file is strict. Each line entry must be in the syntax form:

    • ligA \(\Delta G\) \(\Delta G_{error}\) units

    where ligA is the molecule title name, \(\Delta G\) the binding affinity value, \(\Delta G_{error}\) the binding affinity valuer error and units in the syntax form of kcal/mol or kJ/mol. The \(\Delta G_{error}\) is optional and if not provided will not be used. An example of this file for the Tyk2 receptor can be downloaded here:

  • Protein Name. The protein name (Default blanc) used to identify your flask. This name will be used for the produced output file names and other information.

  • Assign Ligand Partial Charges. If True (Default True) the ligand will be charged by using the ELF10 methods otherwise the ligand partial charged will be used (if any).

  • Custom Ligand Force Field File: One or more SMIRNOFF XML files defining the force field to be applied to the ligand. This input is required when Ligand Force Field is set to Custom.

  • Ligand Force Field. The ligand force field to be used (Default OpenFF 2.0.0).

  • Protein Force Field. The protein force field to be used (Default Amber14SB).

  • Hydrogen Mass Repartitioning. If true the MD time step used along the equilibrium runs will be set to 4fs otherwise to 2 fs (Default True).

  • Equilibrium Running Time. The total equilibrium time (Default 6ns) for the Bound and Unbound simulations.

Understanding the Results from NES

The results from the floe are accessed via three main floe reports at the end of the running NES job and selectable in the Jobs tab in the Orion Floe page. For this tutorial we will focus on the results produced by running the Tyk2 receptor and ten selected relative binding affinity calculations.

The NES floe finished Job

The NES floe finished job

From the NES run job page, the NES Report is accessible by clicking on the Floe Report Tabs. Here, many pieces of information are shown, but the most relevant are the edges related to the submitted RBFE calculations in form of tiles. Each tile shows the edge and the predicted RBFE by using the Bennet Acceptance Ratio method for Non-Equilibrium. In addition, other relevant pieces of information are shown, like the total floe running time and its cost. Clicking on each tile will show important information related to the RBFE calculation. For example, in the below figure, the detailed calculation information are shown for the edge involving the ligand ejm_46 to ejm_54

A good RBFE calculation

A good RBFE calculation

We have two main plots and tables. The upper and lower plots are respectively related to the Bound and Unbound NES simulations. We are going to focus on the Bound NES simulations but, the same considerations are true for the Unbound one. The blue and red graphs are plotting the calculated Forward and Reverse NES switching works for each selected starting equilibrium frames. The important message to take is that if the the graphs overlap well then this is a good indicator that the computed free energy change is trustworthy and accurate compared to experimantal results. The probability distribution plots are made by binning the work values for the forward and reverse works and again if the two probability distributions overlap it is a good sign that the calculation was successfully. The work distribution overlap is also estimated by using BAR and it is reported in the work plot distribution and in a table. Usually, an overlap value greater than the 0.2 threshold is considered to be a good overlap. Analog considerations are valid for the Unbound graph. By using the work values recorded along the switching it is possible to estimate the free energy changes \(\Delta G_{Bound}\) and \(\Delta G_{Unbound}\) shown in figure_RBFE and indirectly estimates \(\Delta\Delta G = \Delta G_{Bound} - \Delta G_{UnBound}\). These computed values are reported in a table as well.

The figure below reports the RBFE calculation involving the ligand ejm_43 to ejm_54

A bad RBFE calculation

A bad RBFE calculation

In this case, both the Bound and Unbound graphs and work probability distributions do not overlap well and the results could be unreliable. In addition, in this case, the overlap estimate does not help. Indeed it seems to be good (>0.2) however, the distributions for the bound and unbound are bimodal suggesting that something wrong is happening along the switching.

From the NES job page, another report, the Edge Floe Report is accessible from the tabs. This report shows the edge graphs for all the compounds involved in the edges.

NES edge floe report

NES edge floe report

Foe each edge the predicted (P) and if provided the experimental (E) \(\Delta \Delta Gs\) are shown. In addition for each compound the predicted (P) and if provided the experimental (E) \(\Delta Gs\) are also reported. Users should also be able to click on the graph edges and visualize the work distributions as well. Edges are also colored based on the BAR work overlap estimate. In particular if both bound and unbound work overlap scores are greater than 0.2 then the edges are displayed in blue; if at least one of the bound or unbound work overlap score is less than 0.2 and the other is greater than 0.2 the edges are displays in yellow; finally if both work overlap scores are less than 0.2 then the edges are colored in red. An example of edge coloring is shown below for the CDK2 dataset edges:

NES edge coloring

NES edge coloring

Finally, from the NES job page, the Affinity Report, is selectable from the tabs. The affinity report shows two main sections. In the first section, experimental and predicted affinities are compared. Here, a graph between \(\Delta G_{Experimental}\) vs \(\Delta G_{Predicted}\) is shown. This graph is available if the experimental affinity file has been provided and the ligand edge map is connected enough to be able to make affinity predictions.

Experimental vs predicted affinities

Experimental vs predicted affinities

In addition, the graph data are tabulated, and different statistical metrics are shown, such as correlation metrics and linear models. The second section of the Affinity Report shows a comparison between the experimental and predicted relative binding affinities \(\Delta\Delta G_{Experimental}\) vs \(\Delta\Delta G_{Predicted}\) in a graph and tables with the statistical metrics as well.

Experimental vs predicted relative binding affinities

Experimental vs predicted relative binding affinities

In addition to the NES and the Affinity floe reports, the results themselves are also uploaded to Orion S3 in .tar.gz file format ready to be locally downloaded. These files, contain all the nes and affinity data provided in .csv file formats that can be used for a more close analysis and the .html reports. Below are attached the result files generated along the Tyk2 receptor NES run:

References

1

Jarzynski, C. (1997), “Non-Equilibrium equality for free energy differences”, Phys. Rev. Lett., 78 (14): 2690

2

Jarzynski, C. (1997), “Equilibrium free-energy differences from non-equilibrium measurements: A master-equation approach”, Phys. Rev. E, 56 (5): 5018

3

Vytautas Gapsys, et al., “Large scale relative protein ligand binding affinities using non-equilibrium alchemy”, Chem. Sci. (2020) 11, 1140-1152

4

Huafeng Xu “Optimal Measurement Network of Pairwise Differences”, J Chem Inf Model. 2019 Nov 25;59(11):4720-4728