Short Trajectory MD with Analysis [MDPrep] [MDRun] [MDAnalysis]

Purpose:
- This Floe performs short MD simulations given a prepared protein and a set of posed and prepared ligands, then analyzes the trajectory for pose stability.
Method Recommendations/Requirements:
- To avoid excessively large output, the floe report is limited to the top 100 ligands by ensemble MMPBSA score.
- The ligands need to have reasonable 3D coordinates, all atoms, and correct chemistry (in particular bond orders and formal charges).
- Each ligand can have multiple conformers, but each conformer is run separately as a different ligand.
- The starting poses should not have very high gradients, in particular no bad clashes with the protein.
- The protein needs to be prepared to MD standards: protein chains must be capped, all atoms in protein residues (including hydrogens) must be present, and missing protein loops resolved or capped.
- Crystallographic internal waters should be retained where possible.
Limitations:
- Currently this floe cannot handle covalent bonds between different components such as ligand, protein, and cofactors.
- Glycosylation on proteins is truncated and the amino acid is capped with H.
Expertise Level:
- Regular/Intermediate/Advanced
Compute Resource:
- Minimal
Keywords:
- MD, MDPrep, MDAnalysis
Related Floes:
- Bound Protein-Ligand MD [MDPrep] [MD]
  - First half of this floe
- Analyze Protein-Ligand MD [MDAnalysis]
  - Last half of this floe
- Convert MD Analysis results to Cluster-Centric Dataset [Utility]
  - Convert ligand-centric output from this floe into cluster-centric output to select clusters for further work
- Extract Short Trajectory MD Results for Download [Utility]
  - Extract and save in a .tar.gz file:
    - The protein, ligand, and binding site water trajectories as multi-conformer OEMols.
    - The average and median protein-ligand complex for each cluster.

Given the protein and posed ligands as inputs, a complex is formed with each ligand/conformer separately, then solvated and parametrized according to the selected force fields. The system is minimized, warmed up (NVT ensemble), and equilibrated in three stages (NPT ensemble). During the minimization, warm up, and equilibration stages, positional harmonic restraints are applied to the ligand and protein. After equilibration, a short production run (default 2 ns) is performed on the unrestrained system.

The production run is then analyzed. Trajectories from different starting poses of the same ligand are combined and analyzed collectively. One analysis characterizes the interactions between the ligand and the active site. Another clusters the ligand positions in the protein active site after fitting the trajectory on the active site C-alpha atoms. Ensemble MMPBSA and ensemble BintScore calculations are carried out on the trajectory and are localized to the ligand clusters.

An HTML floe report is generated for the top 100 ligands by ensemble MMPBSA score. Once the analysis is done, the floe generates a ready-to-download tarball in Amazon S3 containing the analysis results as CSV files, the HTML floe report, the ligand trajectories, and molecular structure files of the cluster medians and averages.
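The simulation stages described above can be summarized as a small data structure. This is a minimal sketch for orientation only: the `Stage` class and its field names are hypothetical and are not part of the floe's API. The stage order, ensembles, and restraint usage are taken from the description; the ensemble is left unset where the text does not name one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    """One step of the protocol (hypothetical structure, for illustration)."""
    name: str
    ensemble: Optional[str]  # None where the description names no ensemble
    restrained: bool         # positional harmonic restraints on ligand and protein


# Order, ensembles, and restraint usage follow the floe description.
PROTOCOL = [
    Stage("minimization", None, True),
    Stage("warm up", "NVT", True),
    Stage("equilibration 1", "NPT", True),
    Stage("equilibration 2", "NPT", True),
    Stage("equilibration 3", "NPT", True),
    Stage("production", None, False),
]

for stage in PROTOCOL:
    restraints = "restrained" if stage.restrained else "unrestrained"
    print(f"{stage.name:<16} {stage.ensemble or '-':<4} {restraints}")
```

Only the production stage runs without restraints, which is why it is the stage whose trajectory is analyzed for pose stability.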

Promoted Parameters

HMR (boolean) : When on, enables Hydrogen Mass Repartitioning. Not currently implemented in Gromacs.

Default: True
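For readers unfamiliar with HMR, the general idea is to shift mass from each heavy atom onto its bonded hydrogens while keeping the total mass fixed, which slows the fastest bond vibrations and typically permits a longer integration time step. The sketch below illustrates that idea on a methane-like toy system. The function name, the data layout, and the target hydrogen mass of 3.024 Da are assumptions chosen for illustration; this is not the floe's or any MD engine's implementation.

```python
def repartition_hydrogen_mass(masses, bonds, is_hydrogen, h_mass=3.024):
    """Return new masses with each bonded hydrogen raised to h_mass.

    The added mass is taken from the heavy atom the hydrogen is bonded to,
    so the total mass of the system is unchanged. (Illustrative only.)
    """
    new_masses = list(masses)
    for i, j in bonds:
        if is_hydrogen[i] == is_hydrogen[j]:
            continue  # skip heavy-heavy (and H-H) bonds
        h, heavy = (i, j) if is_hydrogen[i] else (j, i)
        delta = h_mass - masses[h]
        new_masses[h] += delta
        new_masses[heavy] -= delta
    return new_masses


# Toy methane: atom 0 is carbon, atoms 1-4 are hydrogens.
masses = [12.011, 1.008, 1.008, 1.008, 1.008]
is_hydrogen = [False, True, True, True, True]
bonds = [(0, 1), (0, 2), (0, 3), (0, 4)]

new_masses = repartition_hydrogen_mass(masses, bonds, is_hydrogen)
print("before:", masses, "total", round(sum(masses), 3))
print("after: ", [round(m, 3) for m in new_masses], "total", round(sum(new_masses), 3))
```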

md_engine (string) : Select the available MD engine

Default: OpenMM

Choices: OpenMM, Gromacs

cpu_count_md (integer) : The number of CPUs to run this cube with

Default: 16 Min: 1 Max: 128

gpu_count_md (integer) : The number of GPUs to run this cube with

Default: 1 Max: 16

spot_policy_md (string) : Control cube placement on spot market instances

Default: Allowed

Choices: Allowed, Preferred, NotPreferred, Prohibited, Required

ligands (data_source) : Ligand Dataset

out (dataset_out) : Output dataset to write to

charge_ligands (boolean) : Whether to assign ligand partial charges

Default: True

protein_ff (string) : Force field parameters to be applied to the protein

Default: Amber14SB

Choices: Amber14SB, Amber99SB, Amber99SBildn, AmberFB15

ligand_ff (string) : Force field to be applied to the ligand

Default: OpenFF_2.0.0

Choices: Gaff_1.81, Gaff_2.11, OpenFF_1.1.1, OpenFF_1.2.1, OpenFF_1.3.1, OpenFF_2.0.0, Smirnoff99Frosst

prod_ns (decimal) : Length of MD run in nanoseconds

Default: 2.0

prod_trajectory_interval (decimal) : Trajectory saving interval in nanoseconds

Default: 0.004
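As a quick check of how `prod_ns` and `prod_trajectory_interval` interact, the number of saved frames is the run length divided by the saving interval. With the defaults this works out to 500 frames, one every 4 ps. The sketch below is plain arithmetic on the documented defaults; whether the floe also stores the starting frame is not stated here, so treat the count as approximate.

```python
prod_ns = 2.0                     # default production length, in ns
prod_trajectory_interval = 0.004  # default saving interval, in ns

# round() guards against floating-point error in the division.
n_frames = round(prod_ns / prod_trajectory_interval)
interval_ps = prod_trajectory_interval * 1000.0

print(f"{n_frames} frames, one every {interval_ps:g} ps")
```

Doubling `prod_ns` or halving `prod_trajectory_interval` doubles the frame count, and with it the trajectory size.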

fail (dataset_out) : MD Dataset Failures out

max_md_runs (integer) : The maximum allowed number of MD runs

Default: 500

n_md_starts (integer) : The number of MD starts for each ligand/conformer

Default: 1
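Because each conformer is run as a separate ligand and `n_md_starts` applies per ligand/conformer, the total number of MD runs can be estimated before launching and compared with `max_md_runs`. The sketch below assumes the total is simply the conformer count multiplied by the number of starts; the helper name is hypothetical, and how the floe reacts when the limit is exceeded is not described here.

```python
def count_md_runs(conformers_per_ligand, n_md_starts=1):
    """Estimate total MD runs: one per conformer per start (assumed)."""
    return sum(conformers_per_ligand) * n_md_starts


max_md_runs = 500  # documented default

# Example: three ligands with 1, 4, and 2 conformers, two starts each.
runs = count_md_runs([1, 4, 2], n_md_starts=2)
status = "within limit" if runs <= max_md_runs else "exceeds max_md_runs"
print(f"{runs} runs: {status}")
```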

protein (data_source) : Protein Dataset

flask_title (string) : Prefix name used to identify the Protein. If not specified, it will use the title of the input protein.

Default: “”
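The choices and ranges listed for the promoted parameters can be checked locally before a job is submitted. The following is a minimal sketch under stated assumptions: the `check_parameters` helper and the dictionary layout are hypothetical and do not call any Orion API; only the parameter names, choices, limits, and defaults are taken from the list above.

```python
CHOICES = {
    "md_engine": {"OpenMM", "Gromacs"},
    "spot_policy_md": {"Allowed", "Preferred", "NotPreferred", "Prohibited", "Required"},
    "protein_ff": {"Amber14SB", "Amber99SB", "Amber99SBildn", "AmberFB15"},
    "ligand_ff": {"Gaff_1.81", "Gaff_2.11", "OpenFF_1.1.1", "OpenFF_1.2.1",
                  "OpenFF_1.3.1", "OpenFF_2.0.0", "Smirnoff99Frosst"},
}
# (min, max); None means no limit is documented.
RANGES = {"cpu_count_md": (1, 128), "gpu_count_md": (None, 16)}
DEFAULTS = {"HMR": True, "md_engine": "OpenMM"}


def check_parameters(overrides):
    """Return a list of problems found in a dict of parameter overrides."""
    problems = []
    for name, value in overrides.items():
        if name in CHOICES and value not in CHOICES[name]:
            problems.append(f"{name}: {value!r} is not one of {sorted(CHOICES[name])}")
        if name in RANGES:
            low, high = RANGES[name]
            if (low is not None and value < low) or (high is not None and value > high):
                problems.append(f"{name}: {value} is outside the documented range")
    # HMR is documented as not currently implemented in Gromacs.
    merged = {**DEFAULTS, **overrides}
    if merged["HMR"] and merged["md_engine"] == "Gromacs":
        problems.append("HMR is on but is not currently implemented in Gromacs")
    return problems


for problem in check_parameters({"md_engine": "Gromacs", "cpu_count_md": 256}):
    print(problem)
```

The Gromacs check uses the documented defaults, so selecting Gromacs without switching `HMR` off is flagged.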

Extra Required Parameters

Log Field (Field Type: String) : The field in which to store messages for the floe report

Default: Log Field
