Automated Cryptic Pocket Detection with Probe Occupancy Analysis
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Product-based/Molecular Dynamics
Solution-based/Virtual-screening/Target Preparation
Solution-based/Hit to Lead/Target Preparation/Enhanced Sampling
Solution-based/Target Identification/Target Preparation/Pocket Detection
Solution-based/Hit to Lead/Target Preparation/Cryptic Pocket Detection
Role-based/Computational Chemist
Task-based/Target Prep & Analysis/Pocket Detection
Description
Caution: This floe can be expensive. It uses weighted ensemble molecular dynamics (WEMD) to sample the conformations of a target protein and then performs a cryptic pocket search using probe occupancy analysis. The cost scales with the size of the target protein and the number of WEMD iterations performed. The job can cost a few thousand dollars if the number of iterations is too high (>500). For large systems (>400 residues), the job can cost a few thousand dollars for more than 50 iterations. In previous versions, this floe was run as a series of separate floes (A1 to C2).
Promoted Parameters
Title in user interface (promoted name)
Input Data
Target Protein (data_in): A protein structure prepared by SPRUCE (preferred method). If more than one record is provided, only the first will be processed by this floe.
Required
Type: data_source
Output Data
Solvated and Equilibrated Protein (a1_data_out): Dataset to which the solvated and equilibrated protein is written.
Type: dataset_out
Default: Solvated and Equilibrated Protein
Output Dataset (a2_data_out): Dataset containing the protein normal modes and their collectivity and variance values.
Required
Type: dataset_out
Default: Protein Normal Modes
Output Dataset (a3_data_out): Output dataset containing the current iteration number, the simulation settings, and a reference design unit. This dataset contains a single record and can be viewed on the Orion Analyze page to track the total number of iterations completed.
Required
Type: dataset_out
Default: Protein Sampling Summary Table
Collection Name (a3a_collection_name): Name of the collection to create.
Required
Type: collection_sink
Default: Protein Sampling Data
Failure Report (failure_report): Output report to generate upon failure.
Type: string
Default: Failure Report
MD Output Collection (collection_output_name): Name of the MD output collection.
Type: string
Default: Solvated and Equilibrated MD Collection
Floe Report Output Collection (floe_report_out_a4):
Required
Type: string
Default: Weighted Ensemble MD Analysis Floe Report
Pocket Receptors (pocket_receptors_dataset): A dataset containing 1 or 2 receptors for each pocket identified by probe occupancy analysis. The dataset can be visualized on the Analyze page using the 3D layout.
Required
Type: dataset_out
Default: Pocket Receptors (Probe Occupancy Analysis)
Cryptic Pocket Analysis Floe Report (floe_report_out): Floe report containing an interactive network plot of the cryptic pockets, with a structural visualization of each pocket.
Required
Type: string
Default: Cryptic Pockets Floe Report (Probe Occupancy Analysis)
Protein Solvation and Equilibration Advanced Settings
Protein Parameters (protein_forcefield): Force field parameters for the protein.
Required
Type: string
Default: Amber14SB
Choices: [‘Amber14SB’, ‘Amber99SB’, ‘Amber99SBildn’, ‘AmberFB15’]
Ligand Parameters (ligand_forcefield): Force field parameters for the ligand (if present).
Required
Type: string
Default: OpenFF_2.0.0
Choices: [‘Gaff_1.81’, ‘Gaff_2.11’, ‘OpenFF_1.1.1’, ‘OpenFF_1.2.1’, ‘OpenFF_1.3.1’, ‘OpenFF_2.0.0’, ‘OpenFF_2.2.0’, ‘Smirnoff99Frosst’, ‘Custom’]
Padding Distance (padding_distance): The padding distance between the solute and the box edge (in Å).
Type: decimal
Default: 10
Salt Concentration (salt_concentration): The salt concentration (in millimolar). This does not include the ions required to neutralize the system.
Type: decimal
Default: 150
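The padding distance and salt concentration together determine the size of the water box and the number of added salt ions. The sketch below shows the arithmetic under stated assumptions. It assumes a cubic box whose edge is the largest solute extent plus the padding on both sides, and it estimates ion pairs from the full box volume rather than from the solvent volume. Both rules are illustrative assumptions, not a description of how the floe builds the box, and the function names are hypothetical.

```python
import numpy as np

AVOGADRO = 6.02214076e23           # particles per mole
LITERS_PER_CUBIC_ANGSTROM = 1e-27  # 1 Å^3 = 1e-30 m^3 = 1e-27 L


def cubic_box_edge(coords_angstrom, padding_angstrom=10.0):
    """Edge length (Å) of a cubic box: largest solute extent plus padding on both sides.

    This rule is an assumption made for illustration only.
    """
    coords = np.asarray(coords_angstrom, dtype=float)
    extent = (coords.max(axis=0) - coords.min(axis=0)).max()
    return float(extent + 2.0 * padding_angstrom)


def salt_ion_pairs(box_edge_angstrom, salt_mM=150.0):
    """Approximate number of salt ion pairs, using the full box volume (assumption).

    Neutralizing counter-ions are not included, matching the parameter description.
    """
    volume_liters = box_edge_angstrom ** 3 * LITERS_PER_CUBIC_ANGSTROM
    moles = (salt_mM / 1000.0) * volume_liters
    return int(round(moles * AVOGADRO))


edge = cubic_box_edge([[0.0, 0.0, 0.0], [50.0, 20.0, 10.0]])
pairs = salt_ion_pairs(edge)
```

For example, a solute 50 Å across with the default 10 Å padding gives a 70 Å box edge, and 150 mM in that volume corresponds to roughly 31 ion pairs on top of any neutralizing ions.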
Normal Mode Calculation Advanced Settings
System Selection String (selstr): Selection strings that define the subset of the structure (i.e., the system) used for the anisotropic network model (ANM) calculations. Use the design unit in the output dataset from the Solvate and Equilibrate Target Protein floe as a reference when choosing the residues and chains for the selection string(s). The general syntax is “[chain_id]:[from_res_num]~[to_res_num]”; for example, “A:1~150” selects residues 1 to 150 on chain A of the input protein. Multiple residue ranges are supported by entering multiple entries. See the tutorial for more examples.
Type: string
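A minimal sketch of how the “[chain_id]:[from_res_num]~[to_res_num]” syntax can be parsed and applied to residues. The helper names (parse_selection, is_selected, ResidueRange) are hypothetical; this is not the floe's parser, and it only covers the single form described above.

```python
import re
from typing import List, NamedTuple


class ResidueRange(NamedTuple):
    chain: str
    start: int
    end: int


# Hypothetical pattern for "[chain_id]:[from_res_num]~[to_res_num]", e.g. "A:1~150".
_SELECTION = re.compile(r"^\s*([A-Za-z0-9]+):(-?\d+)~(-?\d+)\s*$")


def parse_selection(entries: List[str]) -> List[ResidueRange]:
    """Parse one selection string per entry into inclusive residue ranges."""
    ranges = []
    for entry in entries:
        match = _SELECTION.match(entry)
        if match is None:
            raise ValueError(f"Malformed selection string: {entry!r}")
        chain, start, end = match.group(1), int(match.group(2)), int(match.group(3))
        if start > end:
            raise ValueError(f"Range start exceeds range end in {entry!r}")
        ranges.append(ResidueRange(chain, start, end))
    return ranges


def is_selected(chain: str, resnum: int, ranges: List[ResidueRange]) -> bool:
    """True if the residue falls inside any of the parsed ranges (bounds inclusive)."""
    return any(r.chain == chain and r.start <= resnum <= r.end for r in ranges)
```

Entering several strings, such as “A:1~150” and “B:20~40”, simply yields several ranges, and a residue is part of the system if any range contains it.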
System-Environment Framework (sys_env_method):
Method for performing the system-environment calculations.
Slice: simply extracts the specified portion (i.e., the system) from the modes of the entire structure. The resulting modes are normalized, but their orthogonality may be lost. The eigenvalues and variances of the modes are unaltered, so they still correspond to the energy of the whole (system plus environment).
Reduce: calculates the system's motion while taking into account the effects of the environment. The resulting modes are orthonormal, and their eigenvalues and variances correspond to the energy of the system part only.
Type: string
Default: Reduce
Choices: [‘Disabled’, ‘Slice’, ‘Reduce’]
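To make the difference between Slice and Reduce concrete, the sketch below applies both to a generic Hessian with NumPy. Slice takes the eigenvectors of the full Hessian, keeps only the system coordinates and renormalizes each vector, leaving the eigenvalues untouched. Reduce eigen-decomposes the Schur complement H_ss − H_se H_ee⁻¹ H_es, which folds the environment's response into an effective system Hessian. The helper names are hypothetical, every non-system atom is treated as environment (the default when the environment selection is empty), and the use of a pseudo-inverse is a robustness choice of this sketch rather than a documented detail of the floe.

```python
import numpy as np


def dof_indices(atom_indices):
    """Expand atom indices into their x/y/z degree-of-freedom indices."""
    atoms = np.asarray(atom_indices)
    return (3 * atoms[:, None] + np.arange(3)).ravel()


def slice_modes(hessian, system_atoms):
    """Slice: full-structure modes restricted to system DOFs and renormalized.

    Eigenvalues are returned unchanged; the sliced vectors need not be orthogonal.
    """
    evals, evecs = np.linalg.eigh(hessian)
    sliced = evecs[dof_indices(system_atoms), :]
    sliced = sliced / np.linalg.norm(sliced, axis=0)
    return evals, sliced


def reduce_modes(hessian, system_atoms):
    """Reduce: modes of the effective system Hessian (Schur complement over the environment)."""
    sys_dofs = dof_indices(system_atoms)
    env_dofs = np.setdiff1d(np.arange(hessian.shape[0]), sys_dofs)
    h_ss = hessian[np.ix_(sys_dofs, sys_dofs)]
    h_se = hessian[np.ix_(sys_dofs, env_dofs)]
    h_ee = hessian[np.ix_(env_dofs, env_dofs)]
    # Pseudo-inverse guards against a singular environment block (sketch choice).
    h_eff = h_ss - h_se @ np.linalg.pinv(h_ee) @ h_se.T
    evals, evecs = np.linalg.eigh(h_eff)  # orthonormal eigenvectors
    return evals, evecs
```

One way to see what Reduce preserves: the inverse of the effective Hessian equals the system block of the full inverse Hessian, so the system's fluctuations match those of the full model, while its eigenvalues describe the system part only.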
Environment Selection String (substr): Selection strings for selecting the environment used in the system-environment analysis. The environment must include the system specified by “System Selection String”. Leave empty to select everything as the environment (default). Use the design unit in the output dataset from the Solvate and Equilibrate Target Protein floe as a reference when choosing the residues and chains for the selection string(s). The general syntax is “[chain_id]:[from_res_num]~[to_res_num]”; for example, “A:1~150” selects residues 1 to 150 on chain A of the input protein. Multiple residue ranges are supported by entering multiple entries.
Type: string
Spring Constant (gamma):
Spring constant of the elastic network.
Type: decimal
Default: 1.0
Cutoff Distance (Å) (cutoff):
Cutoff distance (in Å) for pairwise interactions in the elastic network.
Type: decimal
Default: 15.0
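The spring constant and cutoff distance enter the ANM Hessian. Below is a minimal sketch of the standard ANM construction: every pair of nodes closer than the cutoff is joined by a spring of stiffness gamma, giving the 3×3 off-diagonal block −gamma·(r_ij r_ijᵀ)/|r_ij|², and the diagonal blocks are chosen so that each block row sums to zero. It assumes one node per residue (for example, Cα coordinates); whether the floe uses exactly this node selection is an assumption, and anm_hessian is a hypothetical name.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Build a 3N x 3N anisotropic network model Hessian from node coordinates (Å)."""
    coords = np.asarray(coords, dtype=float)
    n_nodes = len(coords)
    hessian = np.zeros((3 * n_nodes, 3 * n_nodes))
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            diff = coords[j] - coords[i]
            dist2 = diff @ diff
            if dist2 > cutoff ** 2:
                continue  # no spring beyond the cutoff
            block = -gamma * np.outer(diff, diff) / dist2
            hessian[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hessian[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # Diagonal super-elements balance the off-diagonal ones.
            hessian[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hessian[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hessian
```

For a connected network, the six lowest eigenvalues are zero (rigid-body motions). The remaining eigenvectors are the normal modes, and the variance of each mode is proportional to the inverse of its eigenvalue, so a larger gamma stiffens every mode uniformly while a larger cutoff adds springs.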
Mode Filtering Property (property): The property on which mode filtering is based.
Required
Type: string
Default: Collectivity
Choices: [‘Variance’, ‘Collectivity’]
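Collectivity measures how many atoms take part in a mode. A common definition, usually attributed to Brüschweiler, is κ = (1/N)·exp(−Σ uᵢ² ln uᵢ²), where uᵢ² is the (mass-weighted) squared displacement of atom i normalized so that Σ uᵢ² = 1. The sketch below computes it, assuming the floe uses this standard definition; the function name is hypothetical. Variance, the alternative property, is the inverse of a mode's eigenvalue in ANM.

```python
import numpy as np


def collectivity(mode, masses=None):
    """Degree of collectivity of a 3N-component mode vector (1/N <= kappa <= 1)."""
    vec = np.asarray(mode, dtype=float).reshape(-1, 3)
    n_atoms = len(vec)
    if masses is None:
        masses = np.ones(n_atoms)  # uniform masses, e.g. one node per residue
    u2 = (vec ** 2).sum(axis=1) / np.asarray(masses, dtype=float)
    u2 = u2 / u2.sum()  # normalize so the squared displacements sum to 1
    nonzero = u2[u2 > 0]  # 0 * log(0) is taken as 0
    entropy = -(nonzero * np.log(nonzero)).sum()
    return float(np.exp(entropy) / n_atoms)
```

A mode that moves every atom equally has κ = 1, while a mode confined to a single atom has κ = 1/N, so filtering on collectivity favours large-scale motions.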
Weighted Ensemble MD Advanced Settings
Iterations (a3a_iterations): Number of iterations for the WE simulation. Suggested values: 50 iterations (total) for proteins with <200 residues; 100 iterations (total) for proteins with <600 residues. Note that the job can cost a few thousand dollars if this parameter is too high (>500).
Required
Type: integer
Default: 50
Number of Frames Per Iteration (a3a_frames_per_iteration): Number of frames saved in each WE iteration.
Type: integer
Default: 5
Iteration Interval (Tau) (a3a_iteration_interval): Length of each WE iteration in picoseconds.
Type: decimal
Default: 100.0
Number of Bins per Dimension (a3a_nbins): The number of bins along each normal mode. Either a single value or a sequence of values (one for each mode) may be provided. Bin placement is controlled by the Minimal Adaptive Binning (MAB) scheme (P.A. Torrillo, A.T. Bogetti, L.T. Chong, J. Phys. Chem. A 2021, 125, 7, 1642–1649).
Required
Type: integer
Default: [10]
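As we understand the cited MAB scheme, its core idea is to space bins evenly between the lagging and leading walkers along each progress coordinate, so the bins follow the simulation as it explores new regions. The published scheme also assigns dedicated bins to the leading, lagging and bottleneck walkers, which this simplified sketch omits. The function names are hypothetical and do not reproduce the floe's or any library's implementation.

```python
import numpy as np


def adaptive_bin_edges(pcoords, nbins=10):
    """Evenly spaced bin edges from the lowest to the highest current progress coordinate."""
    pcoords = np.asarray(pcoords, dtype=float)
    return np.linspace(pcoords.min(), pcoords.max(), nbins + 1)


def assign_bins(pcoords, edges):
    """Bin index for each walker; bins are left-closed and the top edge falls in the last bin."""
    idx = np.searchsorted(edges, np.asarray(pcoords, dtype=float), side="right") - 1
    return np.clip(idx, 0, len(edges) - 2)
```

With several normal modes as progress coordinates, the same procedure would be applied along each mode, using either one shared bin count or one value per mode as the parameter allows.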
Trajectories Per Bin (a3a_walkers_per_bin): Number of trajectories per bin for the WEMD simulation.
Required
Type: integer
Default: 10
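Because each WE iteration propagates roughly (occupied bins × trajectories per bin) walkers for one iteration interval, a rough upper bound on the total simulated time helps explain the cost warning in the description. The sketch below assumes a rectilinear grid whose bin count is the product of the bins per mode, every bin fully occupied, and evenly spaced saved frames; these are simplifying assumptions, and the actual MAB bin count and occupancy may differ. The function name is hypothetical.

```python
from math import prod


def we_budget(iterations=50, tau_ps=100.0, bins_per_mode=(10,),
              walkers_per_bin=10, frames_per_iteration=5):
    """Upper-bound estimate of WE sampling, assuming every bin holds a full set of walkers."""
    max_walkers = prod(bins_per_mode) * walkers_per_bin
    return {
        "frame_interval_ps": tau_ps / frames_per_iteration,
        "max_walkers_per_iteration": max_walkers,
        "max_aggregate_ns": iterations * max_walkers * tau_ps / 1000.0,
        "max_frames_saved": iterations * max_walkers * frames_per_iteration,
    }


budget = we_budget()
```

With the defaults (50 iterations, 100 ps, 10 bins, 10 trajectories per bin, 5 frames per iteration), this bound is 100 walkers per iteration, a frame every 20 ps, at most 500 ns of aggregate simulation and 25,000 saved frames. The bound grows linearly with iterations and trajectories per bin and multiplicatively with the bins per mode.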
Smallest Allowed Weight (Log) (a3a_smallest_allowed_log_weight): The smallest allowed weight for splitting, on a logarithmic scale.
Required
Type: decimal
Default: -310.0
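In WE, splitting a walker of weight w into n children gives each child weight w/n, which on a log scale is simply log w − log n. Working with logarithms lets very small weights be compared and split without exponentiating them. The sketch below assumes natural logarithms and a rule in which only walkers whose log weight is at or above the threshold may be split; the base of the logarithm and the exact eligibility rule used by the floe are assumptions, and both function names are hypothetical.

```python
import math


def split_children_log_weights(log_weight, n_children):
    """Log weights of n equal-weight children: log(w / n) = log(w) - log(n)."""
    return [log_weight - math.log(n_children)] * n_children


def can_split(log_weight, smallest_allowed_log_weight=-310.0):
    """Hypothetical eligibility rule: walkers below the threshold are never split."""
    return log_weight >= smallest_allowed_log_weight
```

Splitting conserves total probability: the children's weights, once exponentiated, sum to the parent's weight.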
Log Verbosity (verbosity): Verbosity level of the log output.
Type: string
Default: debug
Choices: [‘info’, ‘warning’, ‘error’, ‘debug’, ‘ddebug’]
Weighted Ensemble MD Analysis Advanced Settings
Selection String (selstr_a4): Selection strings for defining a subset of the structure to be used to perform the trajectory analysis (NC fraction and RMSD analysis). The general syntax follows “[chain_id:][from_res_num][~to_res_num]”. Multiple residue ranges are supported by inputting multiple entries.
Type: string
Atoms selection modifier (select_modifier_a4): Additional selection criterion applied to all chain segments chosen by the selection string.
Type: string
Default: protein
Choices: [‘protein’, ‘backbone’, ‘name == CA’, ‘sidechain’]
Apply Weight (weight_option): Apply weights from WE-MD simulations when generating free energy maps.
Type: boolean
Default: True
Choices: [True, False]
Number of Histogram Bins (number_bins): Number of bins for plotting density distribution.
Type: integer
Default: 100
Maximum Free Energy Value (max_free_energy): Maximum value (in kcal/mol) for showing the 2D free energy map.
Type: decimal
Default: 50.0
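The free energy map is conventionally derived from a (weighted) 2D histogram as F = −k_B·T·ln P, shifted so that the lowest value is zero. The sketch below applies the WE weights when they are given, uses the histogram bin count, and hides empty bins and bins above the maximum free energy. It assumes a temperature of 300 K because this document does not state one; the function name is hypothetical.

```python
import numpy as np

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol·K)


def free_energy_map(x, y, weights=None, bins=100, temperature_k=300.0,
                    max_free_energy=50.0):
    """2D free energy (kcal/mol) from progress-coordinate samples and optional WE weights."""
    hist, xedges, yedges = np.histogram2d(x, y, bins=bins, weights=weights)
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):
        free_energy = -KB_KCAL_PER_MOL_K * temperature_k * np.log(prob)  # empty bins -> inf
    free_energy -= free_energy[np.isfinite(free_energy)].min()  # lowest bin at 0
    free_energy[free_energy > max_free_energy] = np.nan  # hide empty and high-energy bins
    return free_energy, xedges, yedges
```

Without weights, every saved frame counts equally; with WE weights, each frame contributes its trajectory weight, which is what makes the map reflect the equilibrium probability rather than how often a region was sampled.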
Native Contact Map (is_ncfractions_on): Defines whether to plot the projection of the fraction of native contacts compared to the reference structure. The projection shows the average fraction of native contacts for all selected conformations for a given progress coordinate (pair).
Type: boolean
Default: True
Choices: [True, False]
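The fraction of native contacts, Q, is the share of contacts present in the reference structure that are still formed in a given frame. The sketch below defines a contact as a pair of atoms closer than a cutoff and skips pairs that are near neighbours in sequence. The cutoff and sequence-separation defaults are hypothetical placeholders, since this document does not state the floe's contact definition, and the function names are illustrative only.

```python
import numpy as np


def contact_pairs(coords, cutoff=4.5, min_separation=3):
    """Index pairs (i, j), j - i >= min_separation, closer than cutoff (Å)."""
    coords = np.asarray(coords, dtype=float)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    i, j = np.triu_indices(len(coords), k=min_separation)
    mask = dists[i, j] < cutoff
    return i[mask], j[mask]


def native_contact_fraction(reference, frame, cutoff=4.5, min_separation=3):
    """Fraction of the reference (native) contacts that are also formed in the frame."""
    i, j = contact_pairs(reference, cutoff, min_separation)
    if len(i) == 0:
        return float("nan")  # no native contacts to compare against
    frame = np.asarray(frame, dtype=float)
    dists = np.linalg.norm(frame[i] - frame[j], axis=1)
    return float(np.mean(dists < cutoff))
```

A value of 1 means every native contact is intact; values well below 1 flag conformations that have opened up relative to the reference, which is where cryptic pockets tend to appear.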
RMSD Map (is_rmsd_on): Defines whether to plot the projection of the backbone root mean square deviation (RMSD) compared to the reference structure. The projection shows the average RMSD for all selected conformations for a given progress coordinate (pair).
Type: boolean
Default: True
Choices: [True, False]
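Both the native contact map and the RMSD map are described as projections: for each bin of a progress-coordinate pair, they show the average value over the conformations that fall in that bin. The sketch below computes such a binned average for any per-frame observable. Whether the floe applies WE weights to these averages is not stated, so the weights argument is optional; the function name is hypothetical.

```python
import numpy as np


def project_average(pc1, pc2, values, bins=100, weights=None):
    """Average of a per-frame observable in each 2D progress-coordinate bin (NaN if empty)."""
    pc1, pc2, values = (np.asarray(a, dtype=float) for a in (pc1, pc2, values))
    w = np.ones_like(values) if weights is None else np.asarray(weights, dtype=float)
    total, xedges, yedges = np.histogram2d(pc1, pc2, bins=bins, weights=w * values)
    norm, _, _ = np.histogram2d(pc1, pc2, bins=[xedges, yedges], weights=w)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = total / norm  # 0 / 0 in empty bins gives NaN
    return mean, xedges, yedges
```

Passing per-frame native-contact fractions or RMSD values as values gives the corresponding map on the same grid as the free energy map.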
Cryptic Pocket Analysis Advanced Settings
Important Residues (c1_select_string_key_resids): String for selecting functionally important residues, e.g., active-site residues or a known disease mutation. Residues should be specified in <residue number><chain id> format. For example, an active site consisting of residues 11 and 12 (chain A) and residue 23 (chain B) should be specified as 11A, 12A, 23B (without any trailing punctuation). These residues are displayed along with the pocket residues in the cryptic pocket floe report, to show the location of each cryptic pocket relative to the functionally important residues.
Type: string
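A minimal sketch of how a string in the <residue number><chain id> format can be checked before submitting a job. It assumes single-letter chain identifiers and rejects trailing punctuation, as the description asks; the function name is hypothetical and this is not the floe's own validation.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: residue number followed by a single-letter chain ID, e.g. "11A".
_RESIDUE = re.compile(r"^(\d+)([A-Za-z])$")


def parse_important_residues(text: str) -> List[Tuple[int, str]]:
    """Parse '11A, 12A, 23B' into [(11, 'A'), (12, 'A'), (23, 'B')]."""
    residues = []
    for token in text.split(","):
        token = token.strip()
        match = _RESIDUE.match(token)
        if match is None:
            # Empty tokens (e.g. from a trailing comma) are rejected, not skipped.
            raise ValueError(f"Invalid residue token: {token!r}")
        residues.append((int(match.group(1)), match.group(2)))
    return residues
```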
Selection Range for Trajectories
Stride (a4_stride): Integer factor by which to subsample frames. (Only every stride-th frame will be read.)
Type: integer
Default: 1
A4 RMSF Analysis Memory (MB) (a4_rmsf_memory): The minimum amount of memory, in MiB (1 MiB = 1,048,576 bytes), that this cube requires. Because of overhead, request a couple of hundred MiB more than required.
Type: decimal
Default: 1800