Automated Cryptic Pocket Detection with Probe Occupancy Analysis
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Product-based/Molecular Dynamics
Solution-based/Virtual-screening/Target Preparation
Solution-based/Hit to Lead/Target Preparation/Enhanced Sampling
Solution-based/Target Identification/Target Preparation/Pocket Detection
Solution-based/Hit to Lead/Target Preparation/Cryptic Pocket Detection
Role-based/Computational Chemist
Task-based/Target Prep & Analysis/Pocket Detection
Description
Caution: This floe can be expensive. This floe uses weighted ensemble molecular dynamics (WEMD) to sample the conformations of a target protein and performs a cryptic pocket search using probe occupancy analysis. The expense scales with the size of the target protein and the number of WEMD iterations performed. A job can cost a few thousand dollars if the number of iterations is too high (>500); for large systems (>400 residues), a job can reach that cost with more than 50 iterations. In previous versions, this floe was run as a series of A1-C2 floes.
Promoted Parameters
Title in user interface (promoted name)
Input Data
Target Protein (data_in): A protein structure prepared by SPRUCE (preferred method). If more than one record is provided, only the first will be processed by this floe.
Required
Type: data_source
Output Data
Solvated and Equilibrated Protein (a1_data_out): Dataset to which to write the solvated and equilibrated protein.
Type: dataset_out
Default: Solvated and Equilibrated Protein
Output Dataset (a2_data_out): Dataset containing protein normal modes, their mode collectivity, and variance values.
Required
Type: dataset_out
Default: Protein Normal Modes
Output Dataset (a3_data_out): Output dataset containing the current iteration number, simulation settings, and a reference design unit. This dataset contains only one record and can be viewed on the Orion 3D & Analyze page to track the total number of iterations that have been completed.
Required
Type: dataset_out
Default: Protein Sampling Summary Table
Collection Name (a3a_collection_name): Name of the collection to create.
Required
Type: collection_sink
Default: Protein Sampling Data
Failure Report (failure_report): Output report to generate upon failure.
Type: string
Default: Failure Report
Floe Report Output Collection (floe_report_out_a4):
Required
Type: string
Default: Weighted Ensemble MD Analysis Floe Report
Pocket Receptors (pocket_receptors_dataset): A dataset containing one or two receptors for each pocket identified using probe occupancy analysis. The dataset can be visualized on the 3D & Analyze page with 3D layout.
Required
Type: dataset_out
Default: Pocket Receptors (Probe Occupancy Analysis)
Cryptic Pocket Analysis Floe Report (floe_report_out): Floe report containing an interactive network plot of the cryptic pockets together with a structural visualization of each pocket.
Required
Type: string
Default: Cryptic Pockets Floe Report (Probe Occupancy Analysis)
Medoids Structures (medoids_dus): A dataset containing the medoid structures.
Required
Type: dataset_out
Default: Medoids Structures (Probe Occupancy Analysis)
Superposed Holo Design Units (holo_data_out): Superposed holo design units dataset to be used for validation. The superposed design units of holo structures can be compared with the pocket receptor design units to check the overlap between the ligand in the holo structure and the pocket receptors.
Type: dataset_out
Default: Superposed holo design units
Protein Solvation and Equilibration Advanced Settings
Protein Parameters (protein_forcefield): Force field parameters for the protein.
Required
Type: string
Default: Amber14SB
Choices: [‘Amber14SB’, ‘Amber99SB’, ‘Amber99SBildn’, ‘AmberFB15’]
Ligand Parameters (ligand_forcefield): Force field parameters for the ligand (if present).
Required
Type: string
Default: OpenFF_2.0.0
Choices: [‘Gaff_1.81’, ‘Gaff_2.11’, ‘OpenFF_1.1.1’, ‘OpenFF_1.2.1’, ‘OpenFF_1.3.1’, ‘OpenFF_2.0.0’, ‘OpenFF_2.2.0’, ‘Smirnoff99Frosst’, ‘Custom’]
Padding Distance (padding_distance): The padding distance between the solute and the box edge (in Å).
Type: decimal
Default: 10
Salt Concentration (salt_concentration): The salt concentration (in millimolar). This does not include the ions required to neutralize the system.
Type: decimal
Default: 150
Normal Mode Calculation Advanced Settings
System Selection String (selstr): Selection strings that define the subset of the structure (i.e., the system) used for the ANM calculations. Use the design unit in the output dataset from the Solvate and Equilibrate Target Protein floe as a reference when choosing the residues and chains for the selection string(s). The general syntax follows “[chain_id]:[from_res_num]~[to_res_num]”. In practice, this would look like “A:1~150” when selecting residues 1 to 150 on chain A of the input protein. Multiple residue ranges are supported by inputting multiple entries. See the tutorial for more examples.
Type: string
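The selection syntax above can be checked before launching a job with a small helper. The following is a minimal sketch, not the floe's own parser: the names ResidueRange and parse_selection are hypothetical, and it assumes alphanumeric chain IDs and integer residue numbers.

```python
import re
from typing import NamedTuple


class ResidueRange(NamedTuple):
    """One parsed entry of the form chain_id:from_res_num~to_res_num."""
    chain_id: str
    first: int
    last: int


# Assumption: chain IDs are alphanumeric and residue numbers are integers.
_SELECTION = re.compile(r"^\s*([A-Za-z0-9]+):(-?\d+)~(-?\d+)\s*$")


def parse_selection(entry: str) -> ResidueRange:
    """Parse one selection entry such as 'A:1~150' (hypothetical helper)."""
    match = _SELECTION.match(entry)
    if match is None:
        raise ValueError(f"not of the form chain:from~to: {entry!r}")
    first, last = int(match.group(2)), int(match.group(3))
    if first > last:
        raise ValueError(f"range start exceeds range end: {entry!r}")
    return ResidueRange(match.group(1), first, last)


# Multiple residue ranges are given as multiple entries.
ranges = [parse_selection(s) for s in ["A:1~150", "B:20~95"]]
```

Validating the entries locally is cheap compared with discovering a malformed selection after the floe has started.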
System-Environment Framework (sys_env_method): Method for performing the system-environment calculations. Slice: simply extract the specified portion (i.e., the system) from the motion of the entire structure. The resulting modes are normalized, but their orthogonality may be lost. The eigenvalues/variances of the modes are unaltered, so they still correspond to the energy of the entire structure, i.e., system and environment. Reduce: calculate the system’s motion while taking into account the effects of the environment. The resulting modes are orthonormal, and their eigenvalues/variances correspond to the energy of only the system part.
Type: string
Default: Reduce
Choices: [‘Disabled’, ‘Slice’, ‘Reduce’]
Environment Selection String (substr): Selection strings that define the environment for the system-environment analysis. Leave empty to use the entire protein as the environment (default). The environment must include the system specified by “System Selection String”. The general syntax is the same as for the “System Selection String”.
Type: string
Spring Constant (gamma): Spring constant of the elastic network.
Type: decimal
Default: 1.0
Cutoff Distance (Å) (cutoff): Cutoff distance for pairwise interactions.
Type: decimal
Default: 15.0
Mode Filtering Property (property): The property on which mode filtering is based.
Required
Type: string
Default: Collectivity
Choices: [‘Variance’, ‘Collectivity’]
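To give a sense of what filtering by collectivity measures, the sketch below uses a common definition of mode collectivity: the exponential of the information entropy of the normalized squared atomic displacements, divided by the number of atoms. Whether the floe uses exactly this definition is an assumption, and the function name collectivity is hypothetical. Mass weighting is omitted, which is reasonable only when all nodes are equivalent (e.g., alpha-carbons).

```python
import math


def collectivity(mode):
    """Collectivity of one mode given as a list of (dx, dy, dz) per atom.

    Returns a value in (0, 1]: close to 1 when all atoms move by similar
    amounts, close to 1/N when the motion is localized on a single atom.
    """
    squared = [dx * dx + dy * dy + dz * dz for dx, dy, dz in mode]
    total = sum(squared)
    if total == 0.0:
        raise ValueError("mode has zero amplitude")
    entropy = 0.0
    for value in squared:
        fraction = value / total  # normalized squared displacement
        if fraction > 0.0:
            entropy -= fraction * math.log(fraction)
    return math.exp(entropy) / len(mode)


# Four atoms moving equally versus one atom moving alone.
uniform = collectivity([(1.0, 0.0, 0.0)] * 4)
localized = collectivity([(1.0, 0.0, 0.0)] + [(0.0, 0.0, 0.0)] * 3)
```

Under this definition, the uniform mode scores 1 and the localized mode scores 1/4, so a collectivity filter favors modes that involve large parts of the structure.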
Weighted Ensemble MD Advanced Settings
Iterations (a3a_iterations): Number of iterations for the WE simulation. Suggested values: 50 iterations (total) for proteins with <200 residues; 100 iterations (total) for proteins with <600 residues. Note that the job can cost a few thousand dollars if this parameter is too high (>500).
Required
Type: integer
Default: 50
Number of Frames Per Iteration (a3a_frames_per_iteration): Number of frames saved in each WE iteration.
Type: integer
Default: 5
Iteration Interval (Tau) (a3a_iteration_interval): Length of each WE iteration in picoseconds.
Type: decimal
Default: 100.0
Number of Bins per Dimension (a3a_nbins): The number of bins along each normal mode. Either a single value or a sequence of values (one for each mode) may be provided. Bin placement is controlled by the Minimal Adaptive Binning (MAB) scheme (P.A. Torrillo, A.T. Bogetti, L.T. Chong, J. Phys. Chem. A 2021, 125, 7, 1642–1649).
Required
Type: integer
Default: [10]
Trajectories Per Bin (a3a_walkers_per_bin): Number of trajectories per bin for the WEMD simulation.
Required
Type: integer
Default: 10
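Because the cost depends on the amount of dynamics simulated, it can help to estimate an upper bound on the aggregate simulation time from the settings above. The sketch below assumes every bin is fully populated in every iteration and multiplies iterations, walkers, and the iteration interval. The function name is hypothetical, the number of normal modes used as progress coordinates is an assumption, and the MAB scheme may add a few extra bins, so treat the result as a rough guide rather than a cost prediction.

```python
from math import prod


def max_aggregate_time_ns(iterations, tau_ps, bins_per_dim, walkers_per_bin):
    """Upper bound on total simulated time, in nanoseconds.

    bins_per_dim holds one entry per progress coordinate (normal mode).
    Assumes every bin holds walkers_per_bin trajectories in every iteration.
    """
    n_bins = prod(bins_per_dim)
    max_walkers = n_bins * walkers_per_bin
    return iterations * max_walkers * tau_ps / 1000.0


# Floe defaults with one progress coordinate (assumed).
one_mode = max_aggregate_time_ns(50, 100.0, [10], 10)

# Same settings with two progress coordinates of 10 bins each (assumed).
two_modes = max_aggregate_time_ns(50, 100.0, [10, 10], 10)
```

With these inputs the bound is 500 ns for one progress coordinate and 5000 ns for two, which illustrates how quickly the sampling volume grows with each added binning dimension.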
Smallest Allowed Weight (Log) (a3a_smallest_allowed_log_weight): The smallest allowed weight for splitting, on a logarithmic scale.
Required
Type: decimal
Default: -310.0
Log Verbosity (verbosity): Verbosity level of the log output.
Type: string
Default: debug
Choices: [‘info’, ‘warning’, ‘error’, ‘debug’, ‘ddebug’]
Parameters for Free Energy Maps/Surfaces (optional)
Selection String (selstr_a4): Selection strings for defining a subset of the structure to be used to perform the trajectory analysis (NC fraction and RMSD analysis). The general syntax follows “[chain_id:][from_res_num][~to_res_num]”. Multiple residue ranges are supported by inputting multiple entries.
Type: string
Atoms Selection Modifier (select_modifier_a4): Additional selection modifier that specifies the group of atoms used for the PMF, RMSD, and NC-fraction analyses.
Type: string
Default: protein
Choices: [‘protein’, ‘backbone’, ‘alpha-carbons’, ‘sidechain’]
Apply Weight (is_weight_on): Apply weights from WEMD simulations when generating free energy maps.
Type: boolean
Default: True
Choices: [True, False]
Number of Histogram Bins (number_bins): Number of bins for plotting density distribution.
Type: integer
Default: 100
Maximum Potential of Mean Force Value (max_free_energy): Maximum of the color scale (in kcal/mol) for the 2D colormap of the potential of mean force.
Type: decimal
Default: 50.0
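The relationship between the weights, the histogram bins, and the free energy cap can be illustrated with the standard relation F = -kT ln P, shifted so that the most populated bin sits at zero. This is a one-dimensional sketch under stated assumptions, not the floe's implementation: the function name pmf_1d, the temperature of 300 K, and the clipping at the maximum value are all assumptions made for illustration.

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def pmf_1d(values, weights, n_bins, lo, hi, use_weights=True,
           temperature_k=300.0, max_free_energy=50.0):
    """Histogram a progress coordinate and convert it to a PMF in kcal/mol.

    Empty bins are returned as None; populated bins are capped at
    max_free_energy, mimicking an upper limit on a color scale.
    """
    if hi <= lo or n_bins < 1:
        raise ValueError("need hi > lo and at least one bin")
    width = (hi - lo) / n_bins
    hist = [0.0] * n_bins
    for value, weight in zip(values, weights):
        if lo <= value <= hi:
            index = min(int((value - lo) / width), n_bins - 1)
            # With use_weights=False every frame counts equally.
            hist[index] += weight if use_weights else 1.0
    peak = max(hist)
    if peak <= 0.0:
        raise ValueError("no samples fall inside [lo, hi]")
    kt = KB_KCAL_PER_MOL_K * temperature_k
    return [min(-kt * math.log(h / peak), max_free_energy) if h > 0.0 else None
            for h in hist]


# Three frames, two bins: the weighted and unweighted maps differ.
coords = [0.1, 0.1, 0.9]
we_weights = [0.5, 0.25, 0.25]
weighted = pmf_1d(coords, we_weights, n_bins=2, lo=0.0, hi=1.0)
unweighted = pmf_1d(coords, we_weights, n_bins=2, lo=0.0, hi=1.0, use_weights=False)
```

In this toy case the second bin lies kT ln 3 above the first when weights are applied and kT ln 2 above it when they are not, which is why the Apply Weight setting changes the resulting map.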
Native Contact Map (is_ncfractions_on): Defines whether to plot the projection of the fraction of native contacts compared to the reference structure. The projection shows the average fraction of native contacts for all selected conformations for a given progress coordinate (pair).
Type: boolean
Default: True
Choices: [True, False]
RMSD Map (is_rmsd_on): Defines whether to plot the projection of the backbone root mean square deviation (RMSD) compared to the reference structure. The projection shows the average RMSD for all selected conformations for a given progress coordinate (pair).
Type: boolean
Default: True
Choices: [True, False]
Cryptic Pocket Analysis Advanced Settings
Important Residues (c2_select_string_key_resids): String for selecting functionally important residues, such as active site residues or a known disease mutation. Residues should be specified in <residue number><chain id> format. For example, an active site consisting of residues 11 and 12 (chain A) and residue 23 (chain B) should be specified as 11A, 12A, 23B (without any trailing punctuation). These residues will be displayed along with the pocket residues in the cryptic pocket floe report to visualize the location of each cryptic pocket with respect to functionally important residues.
Type: string
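The residue format described above can be validated with a short helper before submitting the job. This is a minimal sketch rather than the floe's own parser: the name parse_important_residues is hypothetical, and it assumes single-letter chain IDs and integer residue numbers.

```python
import re

# Assumption: one integer residue number followed by a single-letter chain ID.
_RESIDUE = re.compile(r"^(-?\d+)([A-Za-z])$")


def parse_important_residues(text):
    """Parse a string such as '11A, 12A, 23B' into (number, chain) pairs."""
    residues = []
    for token in text.split(","):
        token = token.strip()
        match = _RESIDUE.match(token)
        if match is None:
            # Also catches the empty token left by a trailing comma.
            raise ValueError(f"expected <residue number><chain id>, got {token!r}")
        residues.append((int(match.group(1)), match.group(2)))
    return residues


key_residues = parse_important_residues("11A, 12A, 23B")
```

The helper rejects a trailing comma on purpose, matching the note that the string should carry no trailing punctuation.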
Minimum Ligandability Score (c2_lig_score_cutoff): Lower bound on the normalized ligandability score of pockets that are sent to the output dataset.
Type: decimal
Default: 0.05
Holo Design Units (Optional) (holo_data_in): Holo design units to be used for validation. These design units will be superposed with the pocket receptor design units.
Type: data_source