Automated Cryptic Pocket Detection with Probe Occupancy Analysis

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Product-based/Molecular Dynamics

  • Solution-based/Virtual-screening/Target Preparation

  • Solution-based/Hit to Lead/Target Preparation/Enhanced Sampling

  • Solution-based/Target Identification/Target Preparation/Pocket Detection

  • Solution-based/Hit to Lead/Target Preparation/Cryptic Pocket Detection

  • Role-based/Computational Chemist

  • Task-based/Target Prep & Analysis/Pocket Detection

Description

Caution: This floe can be expensive. It uses weighted ensemble molecular dynamics (WEMD) to sample the conformations of a target protein, then searches for cryptic pockets using probe occupancy analysis. The cost scales with the size of the target protein and the number of WEMD iterations performed. A job can cost a few thousand dollars if the number of iterations is too high (>500). For large systems (>400 residues), a job can cost a few thousand dollars with more than 50 iterations. In previous versions, this floe was run as a series of A1-C2 floes.

Promoted Parameters

Title in user interface (promoted name)

Input Data

Target Protein (data_in): A protein structure prepared by SPRUCE (preferred method). If more than one record is provided, only the first will be processed by this floe.

  • Required

  • Type: data_source

Output Data

Solvated and Equilibrated Protein (a1_data_out): Dataset to which to write the solvated and equilibrated protein.

  • Type: dataset_out

  • Default: Solvated and Equilibrated Protein

Output Dataset (a2_data_out): Dataset containing the protein normal modes and their collectivity and variance values.

  • Required

  • Type: dataset_out

  • Default: Protein Normal Modes

Output Dataset (a3_data_out): Output dataset containing the current iteration number, simulation settings, and a reference design unit. This dataset contains only 1 record, and can be viewed on the Orion Analyze page to track the total number of iterations that have been completed.

  • Required

  • Type: dataset_out

  • Default: Protein Sampling Summary Table

Collection Name (a3a_collection_name): Name of the collection to create

  • Required

  • Type: collection_sink

  • Default: Protein Sampling Data

Failure Report (failure_report): Output report to generate upon failure.

  • Type: string

  • Default: Failure Report

MD Output Collection (collection_output_name): Name of the MD output collection.

  • Type: string

  • Default: Solvated and Equilibrated MD Collection

Floe Report Output Collection (floe_report_out_a4):

  • Required

  • Type: string

  • Default: Weighted Ensemble MD Analysis Floe Report

Pocket Receptors (pocket_receptors_dataset): A dataset containing 1 or 2 receptors for each pocket identified using probe occupancy analysis. The dataset can be visualized on the Analyze page with the 3D layout.

  • Required

  • Type: dataset_out

  • Default: Pocket Receptors (Probe Occupancy Analysis)

Cryptic Pocket Analysis Floe Report (floe_report_out): Floe report containing an interactive network plot of the cryptic pockets, with a structural visualization of each pocket.

  • Required

  • Type: string

  • Default: Cryptic Pockets Floe Report (Probe Occupancy Analysis)

Protein Solvation and Equilibration Advanced Settings

Protein Parameters (protein_forcefield): Force field parameters for the protein.

  • Required

  • Type: string

  • Default: Amber14SB

  • Choices: [‘Amber14SB’, ‘Amber99SB’, ‘Amber99SBildn’, ‘AmberFB15’]

Ligand Parameters (ligand_forcefield): Force field parameters for the ligand (if present).

  • Required

  • Type: string

  • Default: OpenFF_2.0.0

  • Choices: [‘Gaff_1.81’, ‘Gaff_2.11’, ‘OpenFF_1.1.1’, ‘OpenFF_1.2.1’, ‘OpenFF_1.3.1’, ‘OpenFF_2.0.0’, ‘OpenFF_2.2.0’, ‘Smirnoff99Frosst’, ‘Custom’]

Padding Distance (padding_distance): The padding distance between the solute and the box edge (in Å).

  • Type: decimal

  • Default: 10

Salt Concentration (salt_concentration): The salt concentration (in millimolar). This does not include the ions required to neutralize the system.

  • Type: decimal

  • Default: 150

Normal Mode Calculation Advanced Settings

System Selection String (selstr): Selection strings that define the subset of the structure (i.e., the system) used for the ANM calculations. Use the design unit in the output dataset from the Solvate and Equilibrate Target Protein floe as a reference when choosing the residues and chains for the selection string(s). The general syntax is “[chain_id]:[from_res_num]~[to_res_num]”. For example, “A:1~150” selects residues 1 to 150 on chain A of the input protein. Multiple residue ranges are supported by inputting multiple entries. See the tutorial for more examples.

  • Type: string
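
The selection syntax above can be checked before launching a job. The following is a minimal sketch, not part of the floe: the helper name parse_selection and the ResidueRange type are hypothetical, and the pattern assumes alphanumeric chain IDs and integer residue numbers.

```python
import re
from typing import NamedTuple


class ResidueRange(NamedTuple):
    chain_id: str
    first: int
    last: int


# Assumed grammar: [chain_id]:[from_res_num]~[to_res_num], e.g. "A:1~150".
_SELECTION = re.compile(r"^([A-Za-z0-9]+):(-?\d+)~(-?\d+)$")


def parse_selection(entry: str) -> ResidueRange:
    """Parse one selection entry (hypothetical helper for pre-checking input)."""
    match = _SELECTION.match(entry.strip())
    if match is None:
        raise ValueError(
            f"expected '[chain_id]:[from_res_num]~[to_res_num]', got {entry!r}"
        )
    chain_id = match.group(1)
    first, last = int(match.group(2)), int(match.group(3))
    if first > last:
        raise ValueError(f"range start {first} is after range end {last}")
    return ResidueRange(chain_id, first, last)


# Multiple ranges are given as multiple entries, one range per entry.
ranges = [parse_selection(e) for e in ["A:1~150", "B:20~85"]]
```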

System-Environment Framework (sys_env_method): Method for performing the system-environment calculations. Slice: extract the specified portion (i.e., the system) of the motion of the entire structure. The resulting modes are normalized, but their orthogonality may be lost. The eigenvalues/variances of the modes are unaltered, so they still correspond to the energy of the entire structure, i.e., system and environment. Reduce: calculate the system’s motion while taking into account the effects of the environment. The resulting modes are orthonormal, and their eigenvalues/variances correspond to the energy of the system part only.

  • Type: string

  • Default: Reduce

  • Choices: [‘Disabled’, ‘Slice’, ‘Reduce’]

Environment Selection String (substr): Selection strings that define the environment for the system-environment analysis. The environment must include the system specified by “System Selection String”. Leave empty to select everything as the environment (default). Use the design unit in the output dataset from the Solvate and Equilibrate Target Protein floe as a reference when choosing the residues and chains for the selection string(s). The general syntax is “[chain_id]:[from_res_num]~[to_res_num]”. For example, “A:1~150” selects residues 1 to 150 on chain A of the input protein. Multiple residue ranges are supported by inputting multiple entries.

  • Type: string

Spring Constant (gamma): Spring constant of the elastic network.

  • Type: decimal

  • Default: 1.0

Cutoff Distance (Å) (cutoff): Cutoff distance for pairwise interactions.

  • Type: decimal

  • Default: 15.0
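
To illustrate how the spring constant and cutoff distance enter an elastic network, here is a minimal sketch of the textbook anisotropic network model (ANM) Hessian. It is an assumption that the floe follows this standard formulation; the function name anm_hessian is hypothetical, and the sketch uses plain Python lists for clarity rather than speed.

```python
def anm_hessian(coords, gamma=1.0, cutoff=15.0):
    """Build the 3N x 3N ANM Hessian for N nodes (e.g., C-alpha positions in Å).

    Every pair of nodes closer than `cutoff` is joined by a spring of
    strength `gamma`; pairs farther apart do not interact.
    """
    n = len(coords)
    hessian = [[0.0] * (3 * n) for _ in range(3 * n)]
    cutoff_sq = cutoff * cutoff
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[j][k] - coords[i][k] for k in range(3)]
            r_sq = sum(c * c for c in d)
            if r_sq == 0.0 or r_sq > cutoff_sq:
                continue  # coincident nodes or beyond the cutoff: no spring
            for a in range(3):
                for b in range(3):
                    block = -gamma * d[a] * d[b] / r_sq
                    # Off-diagonal super-elements (symmetric in a and b).
                    hessian[3 * i + a][3 * j + b] += block
                    hessian[3 * j + b][3 * i + a] += block
                    # Diagonal super-elements are minus the sum of the off-diagonal ones.
                    hessian[3 * i + a][3 * i + b] -= block
                    hessian[3 * j + a][3 * j + b] -= block
    return hessian


# Three nodes on a line, 10 Å apart: with a 15 Å cutoff only neighbours interact.
nodes = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
h = anm_hessian(nodes, gamma=1.0, cutoff=15.0)
```

In this formulation a larger cutoff adds more springs and stiffens the network, while the spring constant only rescales the eigenvalues uniformly and leaves the mode shapes unchanged.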

Mode Filtering Property (property): The property on which mode filtering is based.

  • Required

  • Type: string

  • Default: Collectivity

  • Choices: [‘Variance’, ‘Collectivity’]
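
For orientation, mode collectivity is widely defined as the exponential of the Shannon entropy of the normalized per-atom squared displacements, divided by the number of atoms. The sketch below implements that common definition; whether the floe uses exactly this formula (for example, with or without mass weighting) is an assumption, and the function name collectivity is hypothetical.

```python
import math


def collectivity(mode):
    """Collectivity of one mode given as a flat list [x1, y1, z1, x2, ...].

    Returns a value between 1/N (one atom moves) and 1 (all N atoms move equally).
    """
    n_atoms = len(mode) // 3
    # Squared displacement magnitude of each atom.
    sq = [sum(c * c for c in mode[3 * i:3 * i + 3]) for i in range(n_atoms)]
    total = sum(sq)
    entropy = 0.0
    for s in sq:
        p = s / total  # normalized so the values sum to 1
        if p > 0.0:
            entropy -= p * math.log(p)
    return math.exp(entropy) / n_atoms


# Four atoms all moving equally versus a single atom moving alone.
uniform = collectivity([1.0, 0.0, 0.0] * 4)
localized = collectivity([1.0, 0.0, 0.0] + [0.0] * 9)
```

Filtering by collectivity favors modes in which many residues move together, whereas filtering by variance favors modes with the largest amplitude.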

Weighted Ensemble MD Advanced Settings

Iterations (a3a_iterations): Number of iterations for the WE simulation. Suggested values: 50 iterations (total) for proteins with <200 residues; 100 iterations (total) for proteins with <600 residues. Note that the job can cost a few thousand dollars if this parameter is too high (>500).

  • Required

  • Type: integer

  • Default: 50

Number of Frames Per Iteration (a3a_frames_per_iteration): Number of frames saved in each WE iteration.

  • Type: integer

  • Default: 5

Iteration Interval (Tau) (a3a_iteration_interval): Length of each WE iteration in picoseconds.

  • Type: decimal

  • Default: 100.0

Number of Bins per Dimension (a3a_nbins): The number of bins along each normal mode. Either a single value or a sequence of values (one for each mode) may be provided. Bin placement is controlled by the Minimal Adaptive Binning (MAB) scheme (P.A. Torrillo, A.T. Bogetti, L.T. Chong, J. Phys. Chem. A 2021, 125, 7, 1642–1649).

  • Required

  • Type: integer

  • Default: [10]

Trajectories Per Bin (a3a_walkers_per_bin): Number of trajectories per bin for the WEMD simulation.

  • Required

  • Type: integer

  • Default: 10
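
Because cost grows with the amount of simulation performed, a rough upper bound on the aggregate simulation time can be estimated from the settings above. This is a back-of-the-envelope sketch under stated assumptions: every bin is fully populated in every iteration, the total bin count is the product of the per-dimension counts, and any extra bins added by the MAB scheme are ignored. The function name is hypothetical, and the result is a simulation-time estimate, not a price.

```python
def max_aggregate_time_ns(iterations, tau_ps, bins_per_dim, walkers_per_bin):
    """Upper bound on total simulated time in nanoseconds (assumption-laden estimate)."""
    n_bins = 1
    for bins in bins_per_dim:
        n_bins *= bins  # assumed: bins multiply across progress coordinates
    max_walkers = n_bins * walkers_per_bin
    return iterations * max_walkers * tau_ps / 1000.0


# Floe defaults with one progress coordinate: 50 iterations, 100 ps, 10 bins, 10 walkers.
one_mode = max_aggregate_time_ns(50, 100.0, [10], 10)      # 500.0 ns
# The same settings with two progress coordinates of 10 bins each.
two_modes = max_aggregate_time_ns(50, 100.0, [10, 10], 10)  # 5000.0 ns
```

Under these assumptions the bound grows linearly with the number of iterations but multiplicatively with each additional binned dimension, so the bin settings deserve as much attention as the iteration count.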

Smallest Allowed Weight (Log) (a3a_smallest_allowed_log_weight): The smallest allowed weight for splitting in logarithmic scale.

  • Required

  • Type: decimal

  • Default: -310.0

Log Verbosity (verbosity): Verbosity level of the log output.

  • Type: string

  • Default: debug

  • Choices: [‘info’, ‘warning’, ‘error’, ‘debug’, ‘ddebug’]

Weighted Ensemble MD Analysis Advanced Settings

Selection String (selstr_a4): Selection strings for defining a subset of the structure to be used to perform the trajectory analysis (NC fraction and RMSD analysis). The general syntax follows “[chain_id:][from_res_num][~to_res_num]”. Multiple residue ranges are supported by inputting multiple entries.

  • Type: string

Atoms selection modifier (select_modifier_a4): Additional selection criteria applied to all chain segments chosen in the chain selector.

  • Type: string

  • Default: protein

  • Choices: [‘protein’, ‘backbone’, ‘name == CA’, ‘sidechain’]

Apply Weight (weight_option): Apply weights from WE-MD simulations when generating free energy maps.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Number of Histogram Bins (number_bins): Number of bins for plotting density distribution.

  • Type: integer

  • Default: 100

Maximum Free Energy Value (max_free_energy): Maximum value (in kcal/mol) for showing the 2D free energy map.

  • Type: decimal

  • Default: 50.0
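
The way weights, histogram bins, and the free energy cap interact can be sketched with the standard relation F = -kT ln(P / P_max) applied to a weighted histogram. This is a one-dimensional illustration only: the function name, the 300 K temperature, and the choice to assign the cap to empty bins are assumptions, not documented behavior of the floe.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def free_energy_profile(values, weights, n_bins, lo, hi,
                        temperature=300.0, max_free_energy=50.0):
    """Free energy (kcal/mol) per histogram bin, relative to the most populated bin."""
    width = (hi - lo) / n_bins
    hist = [0.0] * n_bins
    for value, weight in zip(values, weights):
        idx = int((value - lo) / width)
        idx = max(0, min(idx, n_bins - 1))  # clamp values on or outside the edges
        hist[idx] += weight
    p_max = max(hist)
    kt = K_B * temperature
    profile = []
    for p in hist:
        if p > 0.0:
            profile.append(min(-kt * math.log(p / p_max), max_free_energy))
        else:
            profile.append(max_free_energy)  # assumed: unvisited bins shown at the cap
    return profile


coords = [0.1, 0.9]
# Weighted: uses the WE weights of the two conformations.
weighted = free_energy_profile(coords, [0.9, 0.1], n_bins=2, lo=0.0, hi=1.0)
# Unweighted: every conformation counts equally.
unweighted = free_energy_profile(coords, [1.0, 1.0], n_bins=2, lo=0.0, hi=1.0)
```

With weights applied, a rarely visited state that carries little probability appears at a higher free energy; without weights, the map reflects only how often conformations were sampled.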

Native Contact Map (is_ncfractions_on): Defines whether to plot the projection of the fraction of native contacts compared to the reference structure. The projection shows the average fraction of native contacts for all selected conformations for a given progress coordinate (pair).

  • Type: boolean

  • Default: True

  • Choices: [True, False]

RMSD Map (is_rmsd_on): Defines whether to plot the projection of the backbone root mean square deviation (RMSD) compared to the reference structure. The projection shows the average RMSD for all selected conformations for a given progress coordinate (pair).

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Cryptic Pocket Analysis Advanced Settings

Important Residues (c1_select_string_key_resids): String for selecting functionally important residues, e.g., active site residues or a known disease mutation. Residues should be specified in <residue number><chain id> format. For example, an active site consisting of residues 11 and 12 (chain A) and residue 23 (chain B) should be specified as 11A, 12A, 23B (without any trailing punctuation). These residues are displayed along with the pocket residues in the cryptic pocket floe report to show the location of each cryptic pocket relative to the functionally important residues.

  • Type: string
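
The residue format above can be validated before submission. The following is a minimal sketch with a hypothetical helper name; it assumes comma-separated entries, integer residue numbers, and single-letter chain IDs, and it rejects trailing punctuation as the description requires.

```python
import re

# Assumed token format: <residue number><chain id>, e.g. "11A".
_RESIDUE = re.compile(r"^(-?\d+)([A-Za-z])$")


def parse_important_residues(text):
    """Return a list of (residue_number, chain_id) tuples (hypothetical pre-check)."""
    if not text.strip():
        return []  # the parameter is optional
    residues = []
    for token in text.split(","):
        token = token.strip()
        match = _RESIDUE.match(token)
        if match is None:
            raise ValueError(f"expected '<residue number><chain id>', got {token!r}")
        residues.append((int(match.group(1)), match.group(2)))
    return residues


active_site = parse_important_residues("11A, 12A, 23B")
```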

Selection Range for Trajectories

Stride (a4_stride): Integer factor by which to subsample frames. (Only every stride-th frame will be read.)

  • Type: integer

  • Default: 1

A4 RMSF Analysis Memory (MB) (a4_rmsf_memory): The minimum amount of memory, in MiB (1 MiB = 1,048,576 bytes), that this cube requires. To allow for overhead, request a couple hundred MiB more than required.

  • Type: decimal

  • Default: 1800