Automated Cryptic Pocket Detection with Probe Occupancy Analysis

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Product-based/Molecular Dynamics

  • Solution-based/Virtual-screening/Target Preparation

  • Solution-based/Hit to Lead/Target Preparation/Enhanced Sampling

  • Solution-based/Target Identification/Target Preparation/Pocket Detection

  • Solution-based/Hit to Lead/Target Preparation/Cryptic Pocket Detection

  • Role-based/Computational Chemist

  • Task-based/Target Prep & Analysis/Pocket Detection

Description

Caution: This floe can be expensive. This floe uses weighted ensemble molecular dynamics (WEMD) to sample the conformations of a target protein, then searches for cryptic pockets using probe occupancy analysis. The cost scales with the size of the target protein and the number of WEMD iterations performed. A job can cost a few thousand dollars if the number of iterations is too high (>500). For large systems (>400 residues), a job can cost a few thousand dollars at >50 iterations. In previous versions, this floe was run as a series of A1-C2 floes.

Promoted Parameters

Title in user interface (promoted name)

Input Data

Target Protein (data_in): A protein structure prepared by SPRUCE (preferred method). If more than one record is provided, only the first will be processed by this floe.

  • Required

  • Type: data_source

Output Data

Solvated and Equilibrated Protein (a1_data_out): Dataset to which to write the solvated and equilibrated protein.

  • Type: dataset_out

  • Default: Solvated and Equilibrated Protein

Output Dataset (a2_data_out): Dataset containing protein normal modes, their mode collectivity, and variance values.

  • Required

  • Type: dataset_out

  • Default: Protein Normal Modes

Output Dataset (a3_data_out): Output dataset containing the current iteration number, simulation settings, and a reference design unit. This dataset contains only one record and can be viewed on the Orion 3D & Analyze page to track the total number of iterations that have been completed.

  • Required

  • Type: dataset_out

  • Default: Protein Sampling Summary Table

Collection Name (a3a_collection_name): Name of the collection to create.

  • Required

  • Type: collection_sink

  • Default: Protein Sampling Data

Failure Report (failure_report): Output report to generate upon failure.

  • Type: string

  • Default: Failure Report

Floe Report Output Collection (floe_report_out_a4):

  • Required

  • Type: string

  • Default: Weighted Ensemble MD Analysis Floe Report

Pocket Receptors (pocket_receptors_dataset): A dataset containing one or two receptors for each pocket identified using probe occupancy analysis. The dataset can be visualized on the 3D & Analyze page with 3D layout.

  • Required

  • Type: dataset_out

  • Default: Pocket Receptors (Probe Occupancy Analysis)

Cryptic Pocket Analysis Floe Report (floe_report_out): Floe report containing an interactive network plot of the cryptic pockets, with structural visualization of each pocket.

  • Required

  • Type: string

  • Default: Cryptic Pockets Floe Report (Probe Occupancy Analysis)

Medoids Structures (medoids_dus): A dataset containing the medoid structures.

  • Required

  • Type: dataset_out

  • Default: Medoids Structures (Probe Occupancy Analysis)

Superposed Holo Design Units (holo_data_out): Superposed holo design units dataset to be used for validation. The superposed design units of holo structures can be compared with the pocket receptor design units to check the overlap between the ligand in the holo structure and the pocket receptors.

  • Type: dataset_out

  • Default: Superposed holo design units

Protein Solvation and Equilibration Advanced Settings

Protein Parameters (protein_forcefield): Force field parameters for the protein.

  • Required

  • Type: string

  • Default: Amber14SB

  • Choices: [‘Amber14SB’, ‘Amber99SB’, ‘Amber99SBildn’, ‘AmberFB15’]

Ligand Parameters (ligand_forcefield): Force field parameters for the ligand (if present).

  • Required

  • Type: string

  • Default: OpenFF_2.0.0

  • Choices: [‘Gaff_1.81’, ‘Gaff_2.11’, ‘OpenFF_1.1.1’, ‘OpenFF_1.2.1’, ‘OpenFF_1.3.1’, ‘OpenFF_2.0.0’, ‘OpenFF_2.2.0’, ‘Smirnoff99Frosst’, ‘Custom’]

Padding Distance (padding_distance): The padding distance between the solute and the box edge (in Å).

  • Type: decimal

  • Default: 10

Salt Concentration (salt_concentration): The salt concentration (in millimolar). This does not include the ions required to neutralize the system.

  • Type: decimal

  • Default: 150
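
As a rough illustration of how the padding distance and salt concentration interact, the sketch below estimates the number of salt ion pairs for a rectangular box. It assumes the padding is added on both sides of the solute along each axis and that the full box volume is used; the floe's actual solvation procedure may count ions differently. The function names are hypothetical and not part of the floe.

```python
AVOGADRO = 6.02214076e23            # particles per mole
LITERS_PER_CUBIC_ANGSTROM = 1e-27   # 1 A^3 = 1e-30 m^3 = 1e-27 L


def padded_box(solute_extent_angstrom, padding=10.0):
    """Box edge lengths, assuming padding on both sides of the solute."""
    return tuple(extent + 2.0 * padding for extent in solute_extent_angstrom)


def estimate_ion_pairs(box_lengths_angstrom, salt_mM=150.0):
    """Estimate salt ion pairs in a rectangular box (neutralizing ions excluded)."""
    lx, ly, lz = box_lengths_angstrom
    volume_liters = lx * ly * lz * LITERS_PER_CUBIC_ANGSTROM
    moles = (salt_mM / 1000.0) * volume_liters
    return round(moles * AVOGADRO)


# A solute spanning 60 x 60 x 60 A with the default 10 A padding and 150 mM salt.
box = padded_box((60.0, 60.0, 60.0))
pairs = estimate_ion_pairs(box)
```

Because the estimate ignores the volume excluded by the protein, it is best read as an upper-end guide to how ion count grows with box size.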

Normal Mode Calculation Advanced Settings

System Selection String (selstr): Selection strings that define the subset of the structure (i.e., the system) used for the ANM calculations. Use the design unit in the output dataset from the Solvate and Equilibrate Target Protein floe as a reference when choosing the residues and chains for the selection string(s). The general syntax is “[chain_id]:[from_res_num]~[to_res_num]”. For example, “A:1~150” selects residues 1 to 150 on chain A of the input protein. Multiple residue ranges are supported by inputting multiple entries. See the tutorial for more examples.

  • Type: string
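
The selection syntax described above can be checked before a job is submitted. The following is a minimal sketch of a validator for a single entry such as “A:1~150”; it is not the floe's own parser. It assumes alphanumeric chain IDs and non-negative residue numbers, and the names `ResidueRange` and `parse_selection` are hypothetical.

```python
import re
from typing import NamedTuple


class ResidueRange(NamedTuple):
    chain_id: str
    first: int
    last: int


# Assumed form: <chain_id>:<from_res_num>~<to_res_num>
_SELECTION = re.compile(r"^\s*([A-Za-z0-9]+):(\d+)~(\d+)\s*$")


def parse_selection(entry: str) -> ResidueRange:
    """Parse one selection entry and check that the range is ordered."""
    match = _SELECTION.match(entry)
    if match is None:
        raise ValueError(f"Not in chain:from~to form: {entry!r}")
    chain_id = match.group(1)
    first, last = int(match.group(2)), int(match.group(3))
    if first > last:
        raise ValueError(f"Range start exceeds range end: {entry!r}")
    return ResidueRange(chain_id, first, last)


# Multiple ranges are given as multiple entries, so each is parsed separately.
ranges = [parse_selection(e) for e in ["A:1~150", "B:20~85"]]
```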

System-Environment Framework (sys_env_method):

Method for performing the system-environment calculations. Slice: simply extract the specified portion (i.e., the system) of the entire motion. The resulting modes are normalized, but their orthogonality may be lost. The eigenvalues/variances of the modes are unaltered, so they still correspond to the energy of the entirety, i.e., system and environment. Reduce: calculate the system’s motion while taking into account the effects of the environment. The resulting modes are orthonormal, and their eigenvalues/variances correspond to the energy of only the system part.

  • Type: string

  • Default: Reduce

  • Choices: [‘Disabled’, ‘Slice’, ‘Reduce’]

Environment Selection String (substr): Selection strings that define the environment for the system-environment analysis. Leave empty to use the entire protein as the environment (default). The environment must include the system specified by “System Selection String”. The general syntax is the same as for the “System Selection String”.

  • Type: string

Spring Constant (gamma): Spring constant of the elastic network.

  • Type: decimal

  • Default: 1.0

Cutoff Distance (Å) (cutoff): Cutoff distance for pairwise interactions.

  • Type: decimal

  • Default: 15.0

Mode Filtering Property (property): Property on which mode filtering is based.

  • Required

  • Type: string

  • Default: Collectivity

  • Choices: [‘Variance’, ‘Collectivity’]
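
For readers unfamiliar with collectivity, the sketch below computes one commonly used definition: the exponential of the information entropy of the normalized per-atom squared displacements, divided by the number of atoms. This page does not state which definition the floe uses, so treat the formula and the function name `mode_collectivity` as assumptions for illustration only.

```python
import math


def mode_collectivity(mode):
    """Collectivity of one mode, given as a list of (dx, dy, dz) per atom.

    Returns a value between 1/N (one atom moves) and 1 (all atoms move equally).
    """
    squared = [dx * dx + dy * dy + dz * dz for dx, dy, dz in mode]
    total = sum(squared)
    if total == 0.0:
        raise ValueError("Mode has zero displacement on every atom")
    entropy = 0.0
    for value in squared:
        fraction = value / total          # normalized squared displacement
        if fraction > 0.0:                # 0 * log(0) is taken as 0
            entropy -= fraction * math.log(fraction)
    return math.exp(entropy) / len(squared)


# Every atom moves by the same amount: maximally collective.
uniform = mode_collectivity([(1.0, 0.0, 0.0)] * 4)
# Only one of four atoms moves: maximally localized.
localized = mode_collectivity([(1.0, 0.0, 0.0)] + [(0.0, 0.0, 0.0)] * 3)
```

Under this definition, filtering by collectivity favors modes in which many residues move together, whereas filtering by variance favors modes with the largest amplitude.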

Weighted Ensemble MD Advanced Settings

Iterations (a3a_iterations): Number of iterations for the WE simulation. Suggested values: 50 iterations (total) for proteins with <200 residues; 100 iterations (total) for proteins with <600 residues. Note that the job can cost a few thousand dollars if this parameter is too high (>500).

  • Required

  • Type: integer

  • Default: 50
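
The guidance on iteration counts can be expressed as a small helper. This sketch only encodes the thresholds stated on this page; it is not part of the floe, it does not estimate actual cost, and the function names are hypothetical. No suggestion is given on this page for proteins with 600 or more residues, so the helper returns `None` in that case.

```python
def suggest_iterations(n_residues):
    """Suggested total WE iterations from the documented residue thresholds."""
    if n_residues < 200:
        return 50
    if n_residues < 600:
        return 100
    return None  # no documented suggestion for larger proteins


def is_flagged_expensive(n_residues, iterations):
    """True if the settings fall in a range the Description flags as costly."""
    too_many_iterations = iterations > 500
    large_system = n_residues > 400 and iterations > 50
    return too_many_iterations or large_system


suggested = suggest_iterations(450)
flagged = is_flagged_expensive(450, suggested)
```

For a 450-residue protein, the suggested 100 iterations already falls in the range that the Description flags as potentially costing a few thousand dollars, so it is worth checking both conditions before launching.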

Number of Frames Per Iteration (a3a_frames_per_iteration): Number of frames saved in each WE iteration.

  • Type: integer

  • Default: 5

Iteration Interval (Tau) (a3a_iteration_interval): Length of each WE iteration in picoseconds.

  • Type: decimal

  • Default: 100.0

Number of Bins per Dimension (a3a_nbins): The number of bins along each normal mode. Either a single value or a sequence of values (one for each mode) may be provided. Bin placement is controlled by the Minimal Adaptive Binning (MAB) scheme (P.A. Torrillo, A.T. Bogetti, L.T. Chong, J. Phys. Chem. A 2021, 125, 7, 1642–1649).

  • Required

  • Type: integer

  • Default: [10]

Trajectories Per Bin (a3a_walkers_per_bin): Number of trajectories per bin for the WEMD simulation.

  • Required

  • Type: integer

  • Default: 10
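
The sampling parameters above combine by simple arithmetic, which the sketch below makes explicit. It assumes frames are saved at even intervals within an iteration, and it computes only a nominal walker count in which every bin is occupied; adaptive binning schemes such as MAB may add extra bins, and many bins may be empty in practice. The `n_modes` argument (the number of normal modes used as progress coordinates) and both function names are hypothetical.

```python
import math


def per_trajectory_summary(iterations=50, tau_ps=100.0, frames_per_iteration=5):
    """Length, frame spacing, and frame count for one continuous trajectory."""
    return {
        "trajectory_length_ns": iterations * tau_ps / 1000.0,
        "frame_spacing_ps": tau_ps / frames_per_iteration,
        "frames_per_trajectory": iterations * frames_per_iteration,
    }


def nominal_walkers(nbins, walkers_per_bin=10, n_modes=1):
    """Walker count if every bin of a regular grid were occupied."""
    bins = list(nbins) if isinstance(nbins, (list, tuple)) else [nbins]
    if len(bins) == 1:
        bins = bins * n_modes            # one value applies to every mode
    if len(bins) != n_modes:
        raise ValueError("Provide one bin count, or one per mode")
    return math.prod(bins) * walkers_per_bin


summary = per_trajectory_summary()       # defaults from this page
walkers = nominal_walkers([10], walkers_per_bin=10, n_modes=2)
```

With the defaults, each trajectory spans 5 ns and stores 250 frames. The nominal walker count grows multiplicatively with the number of modes, which is one reason cost rises quickly when bins or iterations are increased.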

Smallest Allowed Weight (Log) (a3a_smallest_allowed_log_weight): The smallest allowed weight for splitting, on a logarithmic scale.

  • Required

  • Type: decimal

  • Default: -310.0

Log Verbosity (verbosity): Verbosity level of the log output.

  • Type: string

  • Default: debug

  • Choices: [‘info’, ‘warning’, ‘error’, ‘debug’, ‘ddebug’]

Parameters for Free Energy Maps/Surfaces (optional)

Selection String (selstr_a4): Selection strings for defining a subset of the structure to be used to perform the trajectory analysis (NC fraction and RMSD analysis). The general syntax follows “[chain_id:][from_res_num][~to_res_num]”. Multiple residue ranges are supported by inputting multiple entries.

  • Type: string

Atoms Selection Modifier (select_modifier_a4): Additional selection modifier that specifies the group of atoms used for the PMF, RMSD, and NC-fraction analyses.

  • Type: string

  • Default: protein

  • Choices: [‘protein’, ‘backbone’, ‘alpha-carbons’, ‘sidechain’]

Apply Weight (is_weight_on): Apply weights from WEMD simulations when generating free energy maps.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Number of Histogram Bins (number_bins): Number of bins for plotting density distribution.

  • Type: integer

  • Default: 100

Maximum Potential of Mean Force Value (max_free_energy): Maximum of the color scale (in kcal/mol) for the 2D colormap of the potential of mean force.

  • Type: decimal

  • Default: 50.0
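
To show how the weighting, bin count, and maximum value relate, here is a minimal one-dimensional sketch that converts a (optionally weighted) histogram into a free energy profile via F = -kT ln(P/P_max). It is not the floe's implementation: the temperature of 300 K, the clipping of values at `max_free_energy`, and the use of `None` for empty bins are all assumptions, and the function name is hypothetical.

```python
import math

# Gas constant in kcal/(mol K) times an ASSUMED temperature of 300 K.
KT_KCAL_PER_MOL = 0.0019872041 * 300.0


def free_energy_profile(values, weights=None, number_bins=100, max_free_energy=50.0):
    """Free energy per histogram bin, relative to the most populated bin."""
    if weights is None:                   # corresponds to "Apply Weight" off
        weights = [1.0] * len(values)
    low, high = min(values), max(values)
    width = (high - low) / number_bins or 1.0   # avoid zero width
    histogram = [0.0] * number_bins
    for value, weight in zip(values, weights):
        index = min(int((value - low) / width), number_bins - 1)
        histogram[index] += weight
    peak = max(histogram)
    profile = []
    for population in histogram:
        if population <= 0.0:
            profile.append(None)          # unsampled bin, free energy undefined
        else:
            energy = -KT_KCAL_PER_MOL * math.log(population / peak)
            profile.append(min(energy, max_free_energy))
    return profile


# Three conformations on a progress coordinate, carrying WE weights.
profile = free_energy_profile([0.0, 0.0, 1.0], weights=[0.5, 0.25, 0.25], number_bins=2)
```

Turning weights off treats every saved conformation equally, which describes what was sampled rather than the underlying probability distribution.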

Native Contact Map (is_ncfractions_on): Defines whether to plot the projection of the fraction of native contacts compared to the reference structure. The projection shows the average fraction of native contacts for all selected conformations for a given progress coordinate (pair).

  • Type: boolean

  • Default: True

  • Choices: [True, False]

RMSD Map (is_rmsd_on): Defines whether to plot the projection of the backbone root mean square deviation (RMSD) compared to the reference structure. The projection shows the average RMSD for all selected conformations for a given progress coordinate (pair).

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Cryptic Pocket Analysis Advanced Settings

Important Residues (c2_select_string_key_resids): String for selecting functionally important residues, such as active site residues or a known disease mutation. Residues should be specified in <residue number><chain id> format. For example, an active site consisting of residues 11 and 12 (chain A) and residue 23 (chain B) should be specified as 11A, 12A, 23B (without any trailing punctuation). These residues are displayed along with the pocket residues in the cryptic pocket floe report to show the location of each cryptic pocket relative to the functionally important residues.

  • Type: string
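
The residue format above can be validated with a few lines of code before it is entered in the user interface. This is a sketch under stated assumptions, not the floe's parser: it assumes single-letter chain IDs and non-negative residue numbers, and the function name is hypothetical. An empty entry, such as the one left by a trailing comma, is rejected to match the "no trailing punctuation" requirement.

```python
import re

# Assumed form of each entry: <residue number><chain id>, e.g. "11A".
_RESIDUE = re.compile(r"^(\d+)([A-Za-z])$")


def parse_important_residues(text):
    """Parse a string such as '11A, 12A, 23B' into (number, chain) pairs."""
    residues = []
    for token in text.split(","):
        token = token.strip()
        match = _RESIDUE.match(token)
        if match is None:
            raise ValueError(f"Expected <residue number><chain id>, got {token!r}")
        residues.append((int(match.group(1)), match.group(2)))
    return residues


residues = parse_important_residues("11A, 12A, 23B")
```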

Minimum Ligandability Score (c2_lig_score_cutoff): Lower bound on the normalized ligandability score of pockets that are sent to the output dataset.

  • Type: decimal

  • Default: 0.05

Holo Design Units (Optional) (holo_data_in): Holo design units to be used for validation. These design units will be superposed with the pocket receptor design units.

  • Type: data_source