A Small Molecule Membrane Permeability Calculation¶
The OpenEye Permeability Floes are designed to be a user-friendly way to understand the mechanism of passive membrane permeability. The starting input is a small molecule (2D or 3D) on a record in a dataset, which is turned into a set of input systems (basis states) containing the molecule, a pre-equilibrated POPC membrane, and a layer of water to solvate the entire system. These basis states are used in weighted ensemble (WE) molecular dynamics (MD) simulations to predict the mechanism of passive permeation, as well as provide an estimate for the permeability coefficient.
A permeability estimate using this Floe package can be useful for the following reasons:
Permeability liabilities can hinder progress of a promising lead candidate, and this tool will provide insight into the mechanistic process that prevents or permits acceptable permeation.
Detailed knowledge and analysis of the permeation pathways provide insight into the bottleneck regions for a common lead series, which could help guide a medicinal chemist toward rational drug design for permeability.
The WE-MD method applies MD simulations and WE algorithm in an iterative fashion. In this tutorial, a small molecule (tacrine) will be prepared and run for a short simulation (10 iterations of 100 ps each in terms of molecular time). This short simulation will be analyzed using the analysis Floe, and the resulting data will be compared to a larger, pre-generated output dataset of the same molecule.
This tutorial uses the following Floes from the OpenEye Permeability Floes package:
Create a Tutorial Project¶
Note
If you have already created a tutorial project you can re-use the existing one.
Log into Orion and click the home button at the top of the blue ribbon on the left of the Orion Interface. Then click on the “Create New Project” button and in the pop up window enter Tutorial for the name of the project and click “Save”.
Launch a Permeability Simulation Job¶
Note
If you have already created a dataset for use in the Permeability - Run Permeability Simulation Floe, you can re-use it.
In the Permeability - Run Permeability Simulation Floe, any 2D or 3D molecule can be used. However, for sampling concerns, users are encouraged to use molecules that fall within Lipinski’s Rule of Five. In this tutorial, a rigid small molecule acetylcholinesterase inhibitor, tacrine, will be used. The user will be instructed on how to prepare a molecule using the 2D sketcher, and will run the simulation for a short period (10 WE iterations).
The steps to prepare the dataset and run the permeability are detailed below.
Create Initial Tacrine Structure¶
An input dataset must either be select or generated on-the-fly. Here, we will generate tacrine on-the-fly using the 2D sketcher.
Click on the Input Dataset button (Choose Input…), and click on the 2D Sketcher tab.
Copy and paste the SMILES for tacrine into the 2D Sketcher:
Nc1c2c(nc3ccccc13)CCCC2
.
Set the molecule name to “Tacrine” in the “Name this Molecule” field, and then click the “Use dataset as input” button.
Warning
All the Permeability Floes can only run one molecule at a time, so if your dataset contains multiple records, only the last one will be taken as the input by the Floe.
Locate the Simulation Floe¶
Click on the “Floes” button in the left menu bar.
Click on the “Floes” tab.
Click “All Floes” under “FLOE FILTERS” on the left hand side to remove previous filters if there is any.
From the “CATEGORIES” on the left hand side, select Solution-based > Small Molecule Lead-opt > ADME & Tox > Permeability.
Alternatively, you can locate the permeability Floes by typing “permeability” in the search bar, i.e., “Search floes and categories”, and click on any category that the permeability Floes belong to. This is also a convenient way to find floes that you don’t know their categories.
Find the Permeability - Run Permeability Simulation Floe from the results shown on the right hand side.
Click on the Floe and a Job Form will pop up. Specify the following parameter settings in the Job Form.
Modify Floe Parameters¶
Back on the job form, add the name “tacrine_output” to the “Output Dataset” field.
Under the “Weighted Ensemble Parameters” section, change the number of iterations from 500 to 10.
Warning
If you leave the value at the default setting (500 iterations), this run may cost several thousand dollars.
Note
For a real permeability job run, running 500 iterations is recommended for the simulation to converge.
Execute the Floe¶
Click the “Start Job” button to launch the floe. Wait for the Floe status to be complete before moving on to the next step in the tutorial (include system preparation, a set of 10 iterations will take approximately 1 hour).
Monitor the Floe Run¶
While the permeability simulation is running (as indicated by the “Status” of the job in the “Jobs” tab of the “Floe” page), the progress of the simulation can be monitored by setting the output dataset as an “Active Dataset” and inspecting it in the “Analyze” page.
Open the running job page and locate the output dataset by clicking “Show in Project Data”;
In the “Data” page, add the output dataset to active data by clicking the “+” symbol before the dataset name;
Go to the “Analyze” page and inspect the output record.
Find the columns named “Iteration Number” and “Maximum Iteration Number”.
The iteration number will be updated as the simulation progresses. The simulation finishes when the iteration number reaches the maximum iteration number that was set up when launching the simulation job (see Modify Floe Parameters). So, an iteration number that is smaller than the maximum iteration number indicates that the job is not yet finished.
However, the simulation can stop unexpectedly due to various reasons, including but not limited to connection issues or system setup errors. In this case, the job will be marked as “Failed” in the “Jobs” tab and the output collection may have a size of zero. Nonetheless, all the data that has been generated by the previous iterations will be preserved. When such an early termination occurs, it is recommended to resume the simulation using the provided procedure to ensure safe analysis of the simulation at a later stage.
Resume/Restart a Previous Job (Optional)¶
The same Permeability - Run Permeability Simulation Floe can be used to resume a simulation (either finished or unfinished). If a dataset generated by a previous job is provided as the input to the Floe, the Floe will automatically continue the simulation from the last iteration.
When resuming a job, most of the parameters are immutatable and their values will be kept the same as what was used in the initial simulation, except for the following:
Iterations. The simulation will be resumed to run up to the specified total number of iterations. If the specified number is smaller than the current number of iterations, then the Floe will still run one iteration.
Reweighting, Reweighting Period, and Reweighting Window Size (WESS). Since the WESS reweighting procedure can be applied at any point of the simulation, the Floe allows the user to turn on/off the reweighting feature when resuming a job.
Restart Simulation. The switch for restarting a simulation. If this is turned on, all the previous iterations will be discarded, and the simulation will run from iteration 1. Note that the simulation will still use the same prepared system, i.e., the system will not be prepared again.
Warning
If a simulation needs to be resumed with a different reweighting setup, please set the reweighting parameters to desired values. Otherwise, the simulation will be resumed to run using the default values.
For either resuming or restarting, a new output dataset will be created by the new job, but the trajectory data will still be added to the original collection used by the previous job.
In this example, simply find the output dataset from the job above and follow the steps below:
Click on the Input Dataset button (Choose Input…), and click on the output dataset from the previous job.
Change the value for “Iterations” in “Weighted Ensemble Parameters” to 20.
Click the “Start Job” button to launch the Floe.
Then the simulation will be extended to a total of 20 iterations.
Perform a calculation using CPUs instead of GPUs (Optional)¶
In the case of a severe GPU shortage on an Orion stack, one may want to run the permeability calculation using CPUs only. While it is possible, it is not advised to run in this manner due to the quite steep increase in cost per compound. Despite this warning, below we will demonstrate how to run the permeability simulation using CPUs.
At the bottom of the Job Form of the Permeability - Run Permeability Simulation Floe, slide the bar across so that “Show cube parameters” is in the “Yes” position.
Under the Cube Parameters section, scroll down to the “Run WESTPA Segments (MD)” cube and click on the name to expand the cube parameters. Click on the “System” tab, and change the number of CPUs to 48, and change the number of GPUs to 0.
With these changes, the Permeability - Run Permeability Simulation Floe will now run with CPU instances.
Analyze the Permeability Simulation¶
This tutorial demonstrates how to perform basic analysis on the data generated by permeability simulation job. The set of steps needed to analyze any general output from a previously run Permeability Simulation will be presented here.
Note
A sample 500 iteration dataset of tacrine has been provided by OpenEye for use in this
tutorial. This sample dataset can be found at tacrine_mab_wess_iter500.oedb
.
Or, you can choose to analyze your own dataset generated from a Permeability - Run Permeability Simulation
job.
Set Up the Analysis Job¶
Locate the Permeability - Analyze Permeability Simulation Floe the same way as shown in the Locate the Simulation Floe section and click on the Floe to open the Job Form.
In the Job Form, find and set the dataset generated by the simulation job as the input dataset, and fill in entries for the “Failed Output” and “Analysis Output”, which provide output storage for the analyzed simulation.
Execute the Floe¶
Since no remaining parameters need to be set for analysis, simply click the “Start Job” button to launch the floe. Once the floe is complete, a floe report will be available.
Inspect the Results¶
Once the Permeability - Analyze Permeability Simulation Floe completes, it is best to visually inspect the results to assess the quality of the calculation. Here, we will discuss two important plots that are automatically generated to help the user inspect the quality of the results.
In the Permeability - Analyze Permeability Simulation Floe Report, a plot of the time evolution of the permeability estimate will be presented. An example from tacrine is presented below:
Here in this plot, one can see the permeability evolution as a function of the WE iteration number. Initially, the permeability estimate is 0 since there is no crossing events. This is due to the fact that without crossing events, no weight can make it to the target state in the acceptor water compartment on the opposite side of the membrane, which means the rate constant is 0 therefore the permeability is also 0.
Each time a crossing event occurs, more instantaneous weight is averaged into the rate constant and the permeability estiamte “jumps” a small amount. A slow decrease in the permeability estimate could indicate that no new crossing events have been simulated, and the system lacks strong convergence.
Note
To go from iteration number to molecular time, simply multiply the amount of time simulated per-segment times the number of iterations. For instance, if each MD segment requires 100 ps, and there are 500 iterations total, then the total “molecular time” simulated would be 50 ns.
To gain a better sense of what exactly is happening inside the permeability simulations, one should also visually inspect the plot of the successful crossing events. An example from the same tacrine dataset is given below:
In this plot, the z-position of the successful crossing events are shown as a function of time. Here, roughly at iteration number 20, a single walker enters the membrane. At about iteration 50, the single walker splits into 3 separate walkers, where each separate walker attempt to exit the membrane through the opposing membrane leaflet. After about 100 iterations, all walkers are on the other side of the membrane attempting to escape to the acceptor aqueous phase.
Note
A small number of non-independent crossing events may lead to a permeability prediction that lacks statistical convergence. In such a case, multiple permeability simulations may be required.
What Does Bad Permeability Data Look like?¶
With any algorithm, it is important to know when you can, and more importantly when you cannot, trust your data. Here, we will show an example of a bad permeability result, and potential steps to correct the issue. For this section, it is assumed that you have already run the Permeability - Analyze Permeability Simulation Floe.
Below is an example of “bad data” that was generated with the Permeability Floe, and analyzed with the Permeability - Analyze Permeability Simulation Floe. First, take note of the permeability evolution plot:
Here, one should observe the very low permeability value (~10 -30 cm/s) at the end of 500 WE iterations compared to the much more reasonable MDCK-LE experiment (10 -4.6 cm/s).
In order to understand the source of the erroneous prediction, it can be useful to look at the plot of the “sucessful” crossings. Below we provide the corresponding plot of “successful” crossing events for the unsuccesful simulation.
In this plot, one may notice only a single crossing event takes place during 50ns of molecular time. This crossing event occurs at roughly WE iteration 50, even before the first splitting can occur inside the membrane. This is a clear indicator that only one independent trajectory crossed the membrane, which is far too few for an accurate estimate of permeation kinetics.
Here are the following steps one might take to mitigate such an issue:
Continue the simulation for more WE iterations. It might be possible that an additional 10% of molecular time could provide enough sampling to produce another membrane crossing event.
Restart the permeability simulation from scratch. It may be that your particular system became kinetically or thermodynamically trapped in a state that it cannot escape, so a fresh restart could help this situation.
Is your molecule charged? If it is possible your molecule can have multiple charged states, check that a charged molecule was not inadvertently simulated. A charged molecule will likely have a low permeability value.
If you have large, highly flexible molecule, it is possible that a separate progress coordinate may be needed to improve conformational sampling. Please email support@eyesopen.com to discuss methods to facilitate this type of samping within your permeability simulation.
Calculate the Auxiliary Coordinates¶
The main progress coordinate, that is the position of the molecule relative to the center of the membrane along the lipid bilayer normal (z), is calculated during the simulation. Additional coordinates of the system can be calculated using the Permeability - Calculate Auxiliary Coordinates (Optional) Floe. These additional coordinates are called auxiliary coordinates. Currently, 11 auxiliary coordinates can be calculated using the Floe:
Cosine of the angle to \(\hat{z}\): the cosine of the angle of the molecule relative to the lipid bilayer normal, which is defined through the vector product of the unit electric dipole moment of the drug molecule and bilayer normal.
No. of hydrophobic contacts: the number of hydrophobic contacts, which is defined to be the number of aliphatic atoms of the lipid tails within the 10 Å distance of any hydrophobic atoms of the drug molecule.
No. of H-bonds to membrane: the number of hydrogen bonds between the drug molecule and the membrane, which is identified using the Baker-Hubbard definition.
End-to-end distance (Å): the end-to-end distance of the drug molecule in Angstrom. The distance is defined as the maximum distance among all pairs of atoms in three dimensions.
Hydration number: the number of water molecules within the first solvation shell (3Å) of the drug molecule.
No. of H-bonds to water: the number of hydrogen bonds formed between the drug molecule and the water molecules.
No. of H-bonds to water (acceptor): the number of hydrogen bonds formed between the drug molecule and the waters when the drug molecule is the acceptor.
No. of H-bonds to water (donor): the number of hydrogen bonds formed between the drug molecule and the waters when the drug molecule is the donor.
Desolvation penalty \(\Delta G\) (kcal/mol): the desolvation penalty of the drug molecule calculated by OEZAP Toolkit. The desolvation penalty is calculated as the transfer energy of the molecule from a high dielectric (\(\epsilon=80\)) to a low dielectric (\(\epsilon=2\)) environment while the molecule maintains the same conformation. The penalty is composed of two main terms: solvation energy and surface area. For more information, see The Way of Zap.
Solvation Energy (kcal/mol): the solvation energy calculated using Poisson Boltzmann.
Surface area (\(Å^2\)): the surface area of the drug molecule in square Angstrom.
Execute the Floe¶
Locate the Permeability - Calculate Auxiliary Coordinates (Optional) Floe the same way as shown in the Locate the Simulation Floe section and click on the Floe to open the Job Form.
Set the dataset generated by the Permeability - Analyze Permeability Simulation Floe as the input.
Click the “Start Job” button to launch the Floe.
Once the job is complete, the results will be presented in the Floe Report in the form of 2D probability distributions, where the main progress coordinate is the y-axis and each of the 11 auxiliary coordinates is the x-axis, respectively. A black line will overlay each 2D distribution, indicating the top weighted trajectory. These plots allow you to analyze the movement of the drug molecule through the membrane and observe specific chemical and physical changes it undergoes.
By default, these probability distributions are symmetrized along z to account for the symmetry of the membrane system. However, the original, unsymmetrized versions are also provided by unchecking the “Symmetrize” box at the bottom of the plots (it may take a while for the plots to update).