Automated Cryptic Pocket Detection with Probe Occupancy Analysis

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

Product-based/Molecular Dynamics

Solution-based/Virtual-screening/Target Preparation

Solution-based/Hit to Lead/Target Preparation/Enhanced Sampling

Solution-based/Target Identification/Target Preparation/Pocket Detection

Solution-based/Hit to Lead/Target Preparation/Cryptic Pocket Detection

Role-based/Computational Chemist

Task-based/Target Prep & Analysis/Pocket Detection

Description

Caution: This floe can be expensive. It uses weighted ensemble molecular dynamics (WEMD) to sample the conformations of a target protein and then searches for cryptic pockets using probe occupancy analysis. The cost scales with the size of the target protein and the number of WEMD iterations performed. The job can cost a few thousand dollars if the number of iterations is too high (>500). For large systems (>400 residues), the job can cost a few thousand dollars with more than 50 iterations. In previous versions, this floe was run as a series of A1-C2 floes.

Promoted Parameters

Title in user interface (promoted name)

Input Data

Target Protein (data_in): A protein structure, preferably prepared with SPRUCE. If more than one record is provided, only the first is processed by this floe.

Required

Type: data_source

Output Data

Solvated and Equilibrated Protein (a1_data_out): Dataset to which to write the solvated and equilibrated protein.

Type: dataset_out

Default: Solvated and Equilibrated Protein

Output Dataset (a2_data_out): Dataset containing the protein normal modes and their mode collectivity and variance values.

Required

Type: dataset_out

Default: Protein Normal Modes

Output Dataset (a3_data_out): Output dataset containing the current iteration number, simulation settings, and a reference design unit. This dataset contains a single record and can be viewed on the Orion Analyze page to track the total number of completed iterations.

Required

Type: dataset_out

Default: Protein Sampling Summary Table

Collection Name (a3a_collection_name): Name of the collection to create.

Required

Type: collection_sink

Default: Protein Sampling Data

Failure Report (failure_report): Output report to generate upon failure.

Type: string

Default: Failure Report

MD Output Collection (collection_output_name): Name of the MD output collection.

Type: string

Default: Solvated and Equilibrated MD Collection

Floe Report Output Collection (floe_report_out_a4):

Required

Type: string

Default: Weighted Ensemble MD Analysis Floe Report

Pocket Receptors (pocket_receptors_dataset): A dataset containing one or two receptors for each pocket identified by probe occupancy analysis. The dataset can be visualized on the Analyze page with the 3D layout.

Required

Type: dataset_out

Default: Pocket Receptors (Probe Occupancy Analysis)

Cryptic Pocket Analysis Floe Report (floe_report_out): Floe report containing an interactive network plot of the cryptic pockets, together with structural visualizations of each pocket.

Required

Type: string

Default: Cryptic Pockets Floe Report (Probe Occupancy Analysis)

Protein Solvation and Equilibration Advanced Settings

Protein Parameters (protein_forcefield): Force field parameters for the protein.

Required

Type: string

Default: Amber14SB

Choices: [‘Amber14SB’, ‘Amber99SB’, ‘Amber99SBildn’, ‘AmberFB15’]

Ligand Parameters (ligand_forcefield): Force field parameters for the ligand (if present).

Required

Type: string

Default: OpenFF_2.0.0

Choices: [‘Gaff_1.81’, ‘Gaff_2.11’, ‘OpenFF_1.1.1’, ‘OpenFF_1.2.1’, ‘OpenFF_1.3.1’, ‘OpenFF_2.0.0’, ‘OpenFF_2.2.0’, ‘Smirnoff99Frosst’, ‘Custom’]

Padding Distance (padding_distance): The padding distance between the solute and the box edge (in Å).

Type: decimal

Default: 10

Salt Concentration (salt_concentration): The salt concentration (in millimolar). This does not include the ions required to neutralize the system.

Type: decimal

Default: 150

Normal Mode Calculation Advanced Settings

System Selection String (selstr): Selection strings for selecting the subset of the structure (the system) used in the anisotropic network model (ANM) calculations. Use the design unit in the output dataset from the Solvate and Equilibrate Target Protein floe as a reference when choosing the residues and chains for the selection strings. The general syntax is “[chain_id]:[from_res_num]~[to_res_num]”; for example, “A:1~150” selects residues 1 to 150 on chain A of the input protein. Multiple residue ranges are supported by entering multiple entries. See the tutorial for more examples.

Type: string
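To make the selection syntax concrete, the following is a minimal sketch of how an entry such as “A:1~150” could be parsed and expanded into residues. The helpers `parse_selection` and `expand` are hypothetical and not part of any Orion API; the sketch also assumes single-character chain IDs and inclusive residue ranges.

```python
import re

# Hypothetical parser for the "[chain_id]:[from_res_num]~[to_res_num]" syntax.
# Assumes single-character chain IDs and inclusive ranges; not an Orion API.
_RANGE = re.compile(r"^\s*([A-Za-z0-9])\s*:\s*(-?\d+)\s*~\s*(-?\d+)\s*$")


def parse_selection(entry: str) -> tuple[str, int, int]:
    """Return (chain_id, first_residue, last_residue) for one selection entry."""
    match = _RANGE.match(entry)
    if match is None:
        raise ValueError(f"not in '[chain]:[from]~[to]' form: {entry!r}")
    chain, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"range start exceeds range end: {entry!r}")
    return chain, start, end


def expand(entries: list[str]) -> set[tuple[str, int]]:
    """Expand several entries into the set of (chain_id, residue_number) pairs."""
    residues = set()
    for entry in entries:
        chain, start, end = parse_selection(entry)
        residues.update((chain, n) for n in range(start, end + 1))
    return residues


print(parse_selection("A:1~150"))           # ('A', 1, 150)
print(len(expand(["A:1~150", "B:10~20"])))  # 161
```

Multiple entries are handled as a union of ranges, which is one plausible reading of “multiple residue ranges are supported by entering multiple entries”.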

System-Environment Framework (sys_env_method): Method for performing the system-environment calculations. Slice: extracts the specified portion (the system) of the full motion. The resulting modes are normalized, but their orthogonality may be lost. The eigenvalues/variances of the modes are unaltered, so they still correspond to the energy of the whole, i.e., system and environment. Reduce: calculates the system's motion while accounting for the effects of the environment. The resulting modes are orthonormal, and their eigenvalues/variances correspond to the energy of the system part only.

Type: string

Default: Reduce

Choices: [‘Disabled’, ‘Slice’, ‘Reduce’]
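The difference between Slice and Reduce can be illustrated on a toy matrix. This sketch rests on two assumptions: it treats Reduce as the standard effective-Hessian (Schur complement) reduction, H_eff = H_ss - H_se H_ee^-1 H_es, which matches the description above and the approach used in packages such as ProDy but is not confirmed as this floe's implementation; and it uses a random positive-definite matrix instead of a real ANM Hessian.

```python
import numpy as np

# Toy symmetric positive-definite matrix standing in for an elastic network Hessian.
# Assumption: a real ANM Hessian is 3N x 3N and has six zero (rigid-body) modes.
rng = np.random.default_rng(0)
m = rng.normal(size=(8, 8))
hessian = m @ m.T + 8 * np.eye(8)
sys_idx = np.arange(3)     # hypothetical "system" coordinates
env_idx = np.arange(3, 8)  # remaining "environment" coordinates

# Slice: diagonalize the full matrix, keep only the system rows, renormalize.
# The eigenvalues stay those of the full system (full_vals); the sliced vectors
# are unit length but generally no longer mutually orthogonal.
full_vals, full_vecs = np.linalg.eigh(hessian)
sliced = full_vecs[sys_idx, :]
sliced = sliced / np.linalg.norm(sliced, axis=0)

# Reduce: fold the environment into an effective system matrix via the
# Schur complement H_ss - H_se H_ee^-1 H_es, then diagonalize that instead.
h_ss = hessian[np.ix_(sys_idx, sys_idx)]
h_se = hessian[np.ix_(sys_idx, env_idx)]
h_ee = hessian[np.ix_(env_idx, env_idx)]
h_eff = h_ss - h_se @ np.linalg.solve(h_ee, h_se.T)
red_vals, red_vecs = np.linalg.eigh(h_eff)

print(np.allclose(red_vecs.T @ red_vecs, np.eye(3)))  # True: reduced modes are orthonormal
```

Reduce yields as many modes as the system has coordinates, each orthonormal, whereas Slice keeps every full-system mode but projects it onto the system.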

Environment Selection String (substr): Selection strings for selecting the environment for the system-environment analysis. The environment must include the system specified by “System Selection String”. Leave empty to use the entire structure as the environment (default). Use the design unit in the output dataset from the Solvate and Equilibrate Target Protein floe as a reference when choosing the residues and chains for the selection strings. The general syntax is “[chain_id]:[from_res_num]~[to_res_num]”; for example, “A:1~150” selects residues 1 to 150 on chain A of the input protein. Multiple residue ranges are supported by entering multiple entries.

Type: string
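The requirement that the environment include the system amounts to a subset check on residues. The function below is a hypothetical pre-flight check, not something the floe exposes; it takes already-parsed (chain_id, first, last) ranges with inclusive residue numbers and treats an empty environment as “everything”, per the default described above.

```python
# Hypothetical check (not an Orion API) that the environment includes the system.
# Ranges are (chain_id, first, last) tuples, inclusive, e.g. ("A", 1, 150) for "A:1~150".
Range = tuple[str, int, int]


def residues(ranges: list[Range]) -> set[tuple[str, int]]:
    """Expand inclusive residue ranges into (chain_id, residue_number) pairs."""
    return {(chain, n) for chain, first, last in ranges for n in range(first, last + 1)}


def environment_contains_system(system: list[Range], environment: list[Range]) -> bool:
    """An empty environment means the whole structure, which contains any system."""
    if not environment:
        return True
    return residues(system) <= residues(environment)


print(environment_contains_system([("A", 20, 40)], [("A", 1, 150)]))   # True
print(environment_contains_system([("A", 20, 40)], [("A", 30, 150)]))  # False
print(environment_contains_system([("A", 20, 40)], []))                # True
```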

Spring Constant (gamma): Spring constant of the elastic network.

Type: decimal

Default: 1.0

Cutoff Distance (Å) (cutoff): Cutoff distance for pairwise interactions in the elastic network; only node pairs closer than this distance are connected by springs.

Type: decimal

Default: 15.0
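The spring constant and cutoff enter the standard ANM Hessian as shown in the sketch below. It follows the textbook ANM construction; whether the floe uses Cα atoms as nodes, or any other detail of its implementation, is an assumption here. Each pair within the cutoff contributes an off-diagonal 3x3 block -gamma (d dᵀ)/|d|², and each diagonal block is minus the sum of its row's off-diagonal blocks.

```python
import numpy as np


def anm_hessian(coords: np.ndarray, cutoff: float = 15.0, gamma: float = 1.0) -> np.ndarray:
    """Textbook anisotropic network model (ANM) Hessian for N nodes (3N x 3N).

    Every node pair closer than `cutoff` (Å) is joined by a spring of constant `gamma`.
    """
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist2 = float(d @ d)
            if dist2 > cutoff**2:
                continue  # no spring beyond the cutoff
            # Off-diagonal super-element: -gamma * (d d^T) / |d|^2
            block = -gamma * np.outer(d, d) / dist2
            hessian[3*i:3*i+3, 3*j:3*j+3] = block
            hessian[3*j:3*j+3, 3*i:3*i+3] = block
            # Diagonal super-elements: minus the sum of the row's off-diagonal blocks
            hessian[3*i:3*i+3, 3*i:3*i+3] -= block
            hessian[3*j:3*j+3, 3*j:3*j+3] -= block
    return hessian


rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 8.0, size=(10, 3))  # toy node positions, all pairs within 15 Å
eigvals = np.linalg.eigvalsh(anm_hessian(coords))
print(int(np.sum(np.abs(eigvals) < 1e-6)))    # 6 rigid-body (zero) modes
```

Because gamma multiplies every block, it rescales all eigenvalues uniformly without changing the mode shapes; the cutoff, in contrast, changes which springs exist and therefore the modes themselves.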

Mode Filtering Property (property): Property on which mode filtering is based.

Required

Type: string

Default: Variance

Choices: [‘Variance’, ‘Collectivity’]
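The two filtering properties can be computed per mode as follows. This sketch uses the common definitions: variance proportional to 1/eigenvalue, and Brüschweiler's collectivity, exp(-Σ u_i² ln u_i²)/N over normalized squared atomic displacements. It assumes the six zero-frequency rigid-body modes have already been removed. How the floe actually thresholds or counts the modes it keeps is not stated here, so `rank_modes` (a hypothetical helper) simply orders modes by the chosen property.

```python
import numpy as np


def collectivity(mode: np.ndarray) -> float:
    """Collectivity of one mode given as a 3N-vector (Brüschweiler's definition)."""
    sq = (mode.reshape(-1, 3) ** 2).sum(axis=1)  # squared displacement per atom
    u2 = sq / sq.sum()
    nz = u2[u2 > 0]                              # treat 0 * log(0) as 0
    return float(np.exp(-(nz * np.log(nz)).sum()) / len(u2))


def rank_modes(eigvals, eigvecs, prop="Variance"):
    """Hypothetical helper: mode indices ordered from highest to lowest property value."""
    if prop == "Variance":
        scores = 1.0 / np.asarray(eigvals, dtype=float)  # variance ∝ 1 / eigenvalue
    elif prop == "Collectivity":
        scores = np.array([collectivity(eigvecs[:, k]) for k in range(eigvecs.shape[1])])
    else:
        raise ValueError(f"unknown property: {prop!r}")
    return np.argsort(scores)[::-1]


n_atoms = 4
uniform = np.tile([1.0, 0.0, 0.0], n_atoms)  # every atom moves equally
local = np.zeros(3 * n_atoms)
local[0] = 1.0                               # only the first atom moves
print(round(collectivity(uniform), 6))       # 1.0
print(round(collectivity(local), 6))         # 0.25 (= 1/N)
```

Collectivity ranges from 1/N for a motion confined to one atom up to 1 for a motion shared equally by all atoms, so it favors global motions regardless of their stiffness, while variance favors the softest modes.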

Weighted Ensemble MD Advanced Settings

Iterations (a3a_iterations): Number of iterations for the WE simulation. Suggested values: 50 iterations (total) for proteins with <200 residues; 100 iterations (total) for proteins with <600 residues. Note that the job can cost a few thousand dollars if this parameter is too high (>500).

Required

Type: integer

Default: 50

Number of Frames Per Iteration (a3a_frames_per_iteration): Number of frames saved in each WE iteration.

Type: integer

Default: 5

Iteration Interval (Tau) (a3a_iteration_interval): Length of each WE iteration in picoseconds.

Type: decimal

Default: 100.0

Number of Bins per Dimension (a3a_nbins): The number of bins along each normal mode. Either a single value or a sequence of values (one for each mode) may be provided. Bin placement is controlled by the Minimal Adaptive Binning (MAB) scheme (P.A. Torrillo, A.T. Bogetti, L.T. Chong, J. Phys. Chem. A 2021, 125, 7, 1642–1649).

Required

Type: integer

Default: [10]

Trajectories Per Bin (a3a_walkers_per_bin): Number of trajectories per bin for the WEMD simulation.

Required

Type: integer

Default: 10
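One way to see why the cost grows with these settings is to bound the aggregate simulated time. The sketch below is a rough upper bound under stated assumptions: every bin is occupied by the full number of trajectories, the total bin count is the product of the per-mode bin counts, and any extra bins the MAB scheme may add are ignored. `we_budget` is a hypothetical helper, not part of the floe.

```python
from math import prod


def we_budget(iterations=50, nbins=(10,), walkers_per_bin=10, tau_ps=100.0,
              frames_per_iteration=5):
    """Rough upper bound on WE sampling, assuming every bin is fully occupied.

    `nbins` holds one entry per normal-mode progress coordinate; the total bin
    count is their product. Extra bins added by the MAB scheme are ignored.
    """
    trajectories = prod(nbins) * walkers_per_bin
    ns_per_iteration = trajectories * tau_ps / 1000.0
    return {
        "max_trajectories_per_iteration": trajectories,
        "max_ns_per_iteration": ns_per_iteration,
        "max_total_ns": ns_per_iteration * iterations,
        "ps_between_saved_frames": tau_ps / frames_per_iteration,
    }


print(we_budget())
# {'max_trajectories_per_iteration': 100, 'max_ns_per_iteration': 10.0,
#  'max_total_ns': 500.0, 'ps_between_saved_frames': 20.0}
```

With the defaults, the bound is 100 trajectories of 100 ps each per iteration (10 ns), or 500 ns over 50 iterations, with a frame saved every 20 ps. Binning along two modes with 10 bins each (`nbins=(10, 10)`) raises the bound tenfold, to 1000 trajectories per iteration.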

Smallest Allowed Weight (Log) (a3a_smallest_allowed_log_weight): The smallest weight allowed for splitting, expressed on a logarithmic scale.

Required

Type: decimal

Default: -310.0
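Keeping the threshold in log space avoids floating-point underflow for very small weights; for reference, exp(-310) is about 2.3e-135. The sketch below shows one plausible splitting rule, in which a walker of weight w is split into m children of weight w/m only if log(w/m) stays at or above the threshold. Both the rule and the use of the natural logarithm are assumptions, not the floe's documented logic, and `can_split` is a hypothetical helper.

```python
import math

# Assumption: natural-log weights, and the threshold is applied to the child weight.
SMALLEST_ALLOWED_LOG_WEIGHT = -310.0


def can_split(log_weight: float, n_children: int = 2,
              threshold: float = SMALLEST_ALLOWED_LOG_WEIGHT) -> bool:
    """True if splitting into n_children keeps each child's log weight >= threshold."""
    child_log_weight = log_weight - math.log(n_children)
    return child_log_weight >= threshold


print(can_split(-300.0))  # True
print(can_split(-309.5))  # False: -309.5 - ln 2 is about -310.19
```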

Log Verbosity (verbosity): Verbosity level of the log output.

Type: string

Default: debug

Choices: [‘info’, ‘warning’, ‘error’, ‘debug’, ‘ddebug’]

Weighted Ensemble MD Analysis Advanced Settings

Selection String (selstr_a4): Selection strings for defining the subset of the structure used in the trajectory analysis (native contact (NC) fraction and RMSD analysis). The general syntax is “[chain_id:][from_res_num][~to_res_num]”. Multiple residue ranges are supported by entering multiple entries.

Type: string
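Unlike the normal-mode selection strings, this syntax makes each part optional. The following hypothetical interpreter (`make_matcher`, not an Orion API) sketches one reading of it under loudly stated assumptions: an omitted chain matches any chain, an omitted start or end leaves that side of the range open, a lone number selects just that residue, and chain IDs are single characters.

```python
import re

# Hypothetical interpreter for "[chain_id:][from_res_num][~to_res_num]".
_ENTRY = re.compile(r"^\s*(?:([A-Za-z0-9])\s*:)?\s*(\d+)?\s*(?:~\s*(\d+))?\s*$")


def make_matcher(entry: str):
    """Return a predicate matches(chain, resnum) for one selection entry.

    Assumptions: missing chain = any chain; missing start/end = open range;
    a lone number (e.g. "42" or "A:42") selects a single residue.
    """
    m = _ENTRY.match(entry)
    if m is None or not entry.strip():
        raise ValueError(f"unrecognized selection entry: {entry!r}")
    chain, start, end = m.group(1), m.group(2), m.group(3)
    lo = int(start) if start else None
    hi = int(end) if end else lo  # no "~to" part: single residue, or open if no start

    def matches(res_chain: str, res_num: int) -> bool:
        if chain is not None and res_chain != chain:
            return False
        if lo is not None and res_num < lo:
            return False
        if hi is not None and res_num > hi:
            return False
        return True

    return matches


selection = [make_matcher(e) for e in ["A:10~20", "B:"]]
print(any(f("A", 15) for f in selection))  # True
print(any(f("C", 15) for f in selection))  # False
```

Multiple entries are combined with `any`, i.e., as a union, which is again an assumption about how the floe treats several entries.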

Atoms selection modifier (select_modifier_a4): Additional selection criterion applied to all chain segments chosen in the chain selector.

Type: string

Default: protein

Choices: [‘protein’, ‘backbone’, ‘name == CA’, ‘sidechain’]

Apply Weight (weight_option): Apply weights from the WEMD simulations when generating free energy maps.

Type: boolean

Default: True

Choices: [True, False]

Number of Histogram Bins (number_bins): Number of bins for plotting the density distribution.

Type: integer

Default: 100

Maximum Free Energy Value (max_free_energy): Maximum value (in kcal/mol) for showing the 2D free energy map.

Type: decimal

Default: 50.0
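The weighting, bin count, and display maximum combine in the usual construction of a 2D free energy map, F = -kT ln P, sketched below. Assumptions: a temperature of 300 K, the minimum shifted to zero, and empty bins capped at the display maximum rather than left blank; the floe's exact plotting conventions may differ. Passing `weights=None` corresponds to turning Apply Weight off.

```python
import numpy as np

KT_KCAL_PER_MOL = 0.0019872 * 300.0  # assumption: k_B T at 300 K, in kcal/mol


def free_energy_map(x, y, weights=None, number_bins=100, max_free_energy=50.0):
    """2D free energy F = -kT ln P from (optionally weighted) samples, minimum at 0."""
    hist, xedges, yedges = np.histogram2d(x, y, bins=number_bins, weights=weights)
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):
        free = -KT_KCAL_PER_MOL * np.log(prob)  # empty bins become +inf
    free -= free[np.isfinite(free)].min()
    # Cap at the display maximum; empty (infinite) bins end up at the cap as well.
    return np.minimum(free, max_free_energy), xedges, yedges


rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 5000))  # toy progress-coordinate values
w = rng.uniform(size=5000)         # stand-in for WE trajectory weights
F, _, _ = free_energy_map(x, y, weights=w)
print(F.shape, F.min())            # (100, 100) 0.0
```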

Native Contact Map (is_ncfractions_on): Defines whether to plot the projection of the fraction of native contacts compared to the reference structure. The projection shows the average fraction of native contacts for all selected conformations for a given progress coordinate (pair).

Type: boolean

Default: True

Choices: [True, False]
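The quantity behind the native contact map can be sketched as follows. Definitions of native contacts vary, so everything here is an assumption rather than the floe's definition: one coordinate per residue, a contact whenever two residues at least `min_separation` apart in sequence lie within 8 Å in the reference, and a frame's fraction equal to the share of those reference contacts that are still within the cutoff.

```python
import numpy as np


def native_contact_fraction(reference, frame, cutoff=8.0, min_separation=3):
    """Fraction of reference (native) contacts still present in `frame`.

    Both inputs are N x 3 arrays (Å), one point per residue. Assumed definition:
    pairs with |i - j| >= min_separation closer than `cutoff` in the reference.
    """
    def pairwise(coords):
        return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

    i, j = np.triu_indices(len(reference), k=min_separation)
    native = pairwise(reference)[i, j] < cutoff
    if not native.any():
        raise ValueError("reference has no native contacts under this definition")
    still_formed = pairwise(frame)[i, j][native] < cutoff
    return float(still_formed.mean())


theta = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
ring = np.stack([np.cos(theta), np.sin(theta), np.zeros(8)], axis=1)
reference = 3.0 * ring                                           # compact toy structure
print(native_contact_fraction(reference, reference))             # 1.0
print(native_contact_fraction(reference, 10.0 * reference))      # 0.0 (fully expanded)
```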

RMSD Map (is_rmsd_on): Defines whether to plot the projection of the backbone root mean square deviation (RMSD) compared to the reference structure. The projection shows the average RMSD for all selected conformations for a given progress coordinate (pair).

Type: boolean

Default: True

Choices: [True, False]
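Both maps report, for each progress-coordinate pair, the average of a per-frame value (RMSD or native contact fraction) over the conformations falling in that bin. The sketch below computes such a binned average with NumPy. Whether the floe weights this average by the WE trajectory weights is not stated, so the `weights` argument is optional and its use here is an assumption; empty bins are returned as NaN.

```python
import numpy as np


def project_average(pc1, pc2, observable, weights=None, number_bins=100):
    """(Weighted) mean of a per-frame observable in each 2D progress-coordinate bin.

    Bins containing no frames are NaN.
    """
    obs = np.asarray(observable, dtype=float)
    w = np.ones_like(obs) if weights is None else np.asarray(weights, dtype=float)
    num, xedges, yedges = np.histogram2d(pc1, pc2, bins=number_bins, weights=w * obs)
    den, _, _ = np.histogram2d(pc1, pc2, bins=[xedges, yedges], weights=w)
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den, xedges, yedges


pc1 = np.array([0.1, 0.2, 0.9])
pc2 = np.array([0.1, 0.2, 0.9])
rmsd = np.array([1.0, 3.0, 5.0])  # toy per-frame RMSD values (Å)
avg, _, _ = project_average(pc1, pc2, rmsd, number_bins=2)
print(avg)  # [[ 2. nan]
            #  [nan  5.]]
```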

Cryptic Pocket Analysis Advanced Settings

Important Residues (c1_select_string_key_resids): String for selecting functionally important residues, e.g., active site residues or a known disease mutation. Residues should be specified in <residue number><chain id> format. For example, an active site consisting of residues 11 and 12 (chain A) and residue 23 (chain B) should be specified as 11A, 12A, 23B (without any trailing punctuation). These residues are displayed along with the pocket residues in the cryptic pocket floe report to show the location of each cryptic pocket relative to the functionally important residues.

Type: string
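A quick way to check that an Important Residues string follows the documented format before launching a long job is sketched below. `parse_key_residues` is a hypothetical helper; it assumes comma-separated tokens, as in the example above, and single-letter chain IDs.

```python
import re


def parse_key_residues(text: str) -> list[tuple[int, str]]:
    """Parse '<residue number><chain id>' tokens such as '11A, 12A, 23B'.

    Assumes comma-separated tokens and single-letter chain IDs.
    """
    residues = []
    for token in filter(None, (t.strip() for t in text.split(","))):
        m = re.fullmatch(r"(-?\d+)([A-Za-z])", token)
        if m is None:
            raise ValueError(f"expected <residue number><chain id>, got {token!r}")
        residues.append((int(m.group(1)), m.group(2)))
    return residues


print(parse_key_residues("11A, 12A, 23B"))  # [(11, 'A'), (12, 'A'), (23, 'B')]
```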

Selection Range for Trajectories

Stride (a4_stride): Integer factor by which to subsample frames. (Only every stride-th frame will be read.)

Type: integer

Default: 1

A4 RMSF Analysis Memory (MB) (a4_rmsf_memory): The minimum amount of memory, in MiB (1 MiB = 1,048,576 B), that this cube requires. Because of overhead, request a few hundred MiB more than required.

Type: decimal

Default: 1800