Short Trajectory MD with Analysis [MDPrep] [MDRun] [MDAnalysis]

  • Purpose:

    • This Floe performs short MD simulations given a prepared protein and a set of posed and prepared ligands, then analyzes the trajectory for pose stability.

  • Method Recommendations/Requirements:

    • To keep the output floe report from becoming excessively large, the report is truncated to the top 100 ligands ranked by ensemble MMPBSA score.

    • The ligands need to have reasonable 3D coordinates, all atoms, and correct chemistry (in particular bond orders and formal charges).

    • Each ligand can have multiple conformers, but each conformer is run separately, as if it were a different ligand.

    • The starting poses should not have very high gradients; in particular, there should be no bad clashes with the protein.

    • The protein needs to be prepared to MD standards: protein chains must be capped, all atoms in protein residues (including hydrogens) must be present, and missing protein loops must be resolved or capped.

    • Crystallographic internal waters should be retained where possible.

  • Limitations:

    • Currently this floe cannot handle covalent bonds between different components such as ligand, protein, and cofactors.

    • Glycosylation on proteins is truncated and the amino acid is capped with H.

  • Expertise Level:

    • Regular/Intermediate/Advanced

  • Compute Resource:

    • Minimal

  • Keywords:

    • MD, MDPrep, MDAnalysis

  • Related Floes:

    • Bound Protein-Ligand MD [MDPrep] [MD]

      • First half of this floe

    • Analyze Protein-Ligand MD [MDAnalysis]

      • Last half of this floe

    • Convert MD Analysis results to Cluster-Centric Dataset [Utility]

      • Convert ligand-centric output from this floe into cluster-centric output to select clusters for further work

    • Extract Short Trajectory MD Results for Download [Utility]

      • Extract and save in a .tar.gz file:

        • The protein, ligand and binding site water trajectories as multi-conformer OEMols.

        • The Average and Median protein-ligand complex for each cluster.

Given the protein and posed ligands as inputs, a complex is formed with each ligand/conformer separately, then solvated and parametrized according to the selected force fields. The system is minimized, then warmed up (NVT ensemble) and run through three equilibration stages (NPT ensemble). During the minimization, warm-up, and equilibration stages, harmonic positional restraints are applied to the ligand and protein. After equilibration, a short production run (default 2 ns) is performed on the unrestrained system.

The production run is then analyzed. Trajectories from different starting poses of the same ligand are combined and analyzed collectively. One analysis characterizes the interactions between the ligand and the active site. Another clusters the ligand positions in the protein active site after fitting the trajectory on the active-site C-alpha atoms. Ensemble MMPBSA and ensemble BintScore calculations are carried out on the trajectory and are localized to the ligand clusters.

An HTML floe report is generated for the top 100 ligands by ensemble MMPBSA score. Once the analysis is done, the floe generates a ready-to-download tarball in Amazon S3 that includes the analysis results as CSV files, the HTML floe report, the ligand trajectories, and molecular structure files of the cluster medians and averages.
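
The stage sequence described above can be summarized as a small data structure. The following Python sketch is illustrative only: the `Stage` class, the `build_protocol` function, and the stage names are hypothetical and are not part of the floe or of any MD engine API. It records only what the text states (which stages are restrained and which ensemble is named); the ensemble of the production run is not stated, so it is left as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    name: str
    ensemble: Optional[str]  # None where the text does not name an ensemble
    restrained: bool         # harmonic positional restraints on ligand and protein


def build_protocol(n_equilibration: int = 3) -> List[Stage]:
    """Return the ordered stages as described in the text (illustrative only)."""
    stages = [
        Stage("minimization", None, True),
        Stage("warm_up", "NVT", True),
    ]
    stages += [
        Stage(f"equilibration_{i}", "NPT", True)
        for i in range(1, n_equilibration + 1)
    ]
    # The production run is unrestrained; its ensemble is not stated in the text.
    stages.append(Stage("production", None, False))
    return stages


protocol = build_protocol()
for stage in protocol:
    label = "restrained" if stage.restrained else "unrestrained"
    print(stage.name, stage.ensemble, label)
```

With the default of three equilibration stages, the sketch yields six stages in total, of which only the last (production) is unrestrained.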

Promoted Parameters

  • HMR (boolean) : When on, enables Hydrogen Mass Repartitioning. Not currently implemented for Gromacs
    Default: True
  • md_engine (string) : Select the MD engine from those available
    Default: OpenMM
    Choices: OpenMM, Gromacs
  • cpu_count_md (integer) : The number of CPUs to run this cube with
    Default: 16 Min: 1 Max: 128
  • gpu_count_md (integer) : The number of GPUs to run this cube with
    Default: 1 Max: 16
  • spot_policy_md (string) : Control cube placement on spot market instances
    Default: Allowed
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • ligands (data_source) : Ligand Dataset
  • out (dataset_out) : Output dataset to write to
  • charge_ligands (boolean) : Whether to assign ligand partial charges
    Default: True
  • protein_ff (string) : Force field parameters to be applied to the protein
    Default: Amber14SB
    Choices: Amber14SB, Amber99SB, Amber99SBildn, AmberFB15
  • ligand_ff (string) : Force field to be applied to the ligand
    Default: OpenFF_2.0.0
    Choices: Gaff_1.81, Gaff_2.11, OpenFF_1.1.1, OpenFF_1.2.1, OpenFF_1.3.1, OpenFF_2.0.0, Smirnoff99Frosst
  • prod_ns (decimal) : Length of MD run in nanoseconds
    Default: 2.0
  • prod_trajectory_interval (decimal) : Trajectory saving interval in nanoseconds
    Default: 0.004
  • fail (dataset_out) : Output dataset for MD failures
  • max_md_runs (integer) : The maximum allowed number of MD runs
    Default: 500
  • n_md_starts (integer) : The number of MD starts for each ligand/conformer
    Default: 1
  • protein (data_source) : Protein Dataset
  • flask_title (string) : Prefix name used to identify the Protein. If not specified, it will use the title of the input protein.
    Default: “”
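
As a rough guide to how some of these parameters interact, the Python sketch below estimates the number of saved production frames from prod_ns and prod_trajectory_interval, and the number of MD runs implied by the input conformers and n_md_starts. The function names are hypothetical and this is not the floe's own code. It assumes that one run is launched per conformer per start, that max_md_runs is compared against that total, and that the frame count is simply the run length divided by the saving interval (whether the floe also stores the initial frame is not stated). What the floe does when the limit is exceeded is not described here, so the sketch only reports whether the total fits.

```python
from typing import Sequence


def production_frames(prod_ns: float = 2.0, interval_ns: float = 0.004) -> int:
    """Approximate number of saved frames: run length / saving interval."""
    if prod_ns <= 0 or interval_ns <= 0:
        raise ValueError("prod_ns and interval_ns must be positive")
    return round(prod_ns / interval_ns)


def total_md_runs(conformers_per_ligand: Sequence[int], n_md_starts: int = 1) -> int:
    """Assumed run count: every conformer is run separately, n_md_starts times."""
    return sum(conformers_per_ligand) * n_md_starts


def within_run_limit(
    conformers_per_ligand: Sequence[int],
    n_md_starts: int = 1,
    max_md_runs: int = 500,
) -> bool:
    """True if the assumed run count does not exceed max_md_runs."""
    return total_md_runs(conformers_per_ligand, n_md_starts) <= max_md_runs


# Defaults from the parameter list: 2.0 ns saved every 0.004 ns.
frames = production_frames()

# Hypothetical input: 40 ligands with 3 conformers each.
conformers = [3] * 40
runs_single = total_md_runs(conformers, n_md_starts=1)
runs_multi = total_md_runs(conformers, n_md_starts=5)

print(frames, runs_single, within_run_limit(conformers, 1))
print(runs_multi, within_run_limit(conformers, 5))
```

With the default values, this gives about 500 frames per production run. Under the stated assumptions, the hypothetical 40-ligand input stays within the default max_md_runs of 500 at one start per conformer (120 runs) but exceeds it at five starts (600 runs).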

Extra Required Parameters

  • Log Field (Field Type: String) : The field in which to store messages for the floe report
    Default: Log Field