B2. Trajectory Analysis (for Cryptic Pockets): Cluster Conformations
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Product-based/Molecular Dynamics
Solution-based/Virtual-screening/Target Preparation
Solution-based/Hit to Lead/Target Preparation/Enhanced Sampling
Solution-based/Target Identification/Target Preparation/Pocket Detection
Solution-based/Hit to Lead/Target Preparation/Cryptic Pocket Detection
Role-based/Computational Chemist
Task-based/Target Prep & Analysis/Pocket Detection
Description
This floe clusters the trajectory analysis output (‘Residue-Cosolvent Distances’ or ‘Per-Residue SASA’) computed by the corresponding Trajectory Analysis Floe.
Promoted Parameters
Title in user interface (promoted name)
Inputs from Protein Sampling
Protein Sampling (Weighted Ensemble MD Simulation) Dataset (westdata_in): This is a ‘Protein Sampling Dataset’ output generated by ‘A3a. Protein Sampling (for Cryptic Pockets): Run a Weighted Ensemble MD Simulation’ or ‘A3b. Protein Sampling (for Cryptic Pockets): Continue a Weighted Ensemble MD Simulation’. The dataset should come from the most recent Protein Sampling job run for a given protein.
Required
Type: data_source
Topology File (top_file): PDB file specifying the system topology. This file is generated by the ‘A1. Protein Sampling (for Cryptic Pockets): Solvate and Equilibrate Target Protein’ Floe.
Required
Type: file_in
Inputs from Trajectory Analysis
Trajectory Analysis Dataset (data_in): This dataset contains the trajectory analysis output generated by one of the ‘B1. Trajectory Analysis’ Floes: ‘Residue-Cosolvent Distances’ or ‘Per-Residue SASA’.
Required
Type: data_source
Variance and Mean Dataset (variance_data_in): This dataset contains the variance of the ‘Residue-Cosolvent Distances’ or ‘Per-Residue SASA’ features and is generated by one of the ‘B1. Trajectory Analysis’ Floes: ‘Residue-Cosolvent Distances’ or ‘Per-Residue SASA’. Visualize this dataset on the Analyze page to choose the variance cutoff used to filter the feature matrix.
Required
Type: data_source
Outputs
Cluster Medoids Dataset (cluster_medoids_data_out): This dataset stores atomic coordinates and features of the cluster medoids. One record is created for each cluster medoid.
Required
Type: dataset_out
Default: Cluster Medoids
Cluster Members Dataset (cluster_members_data_out): This dataset stores the cluster label assigned to each trajectory frame. One record is created for each MD frame.
Required
Type: dataset_out
Default: Cluster Members
Cluster Report Title (analysis_report_title): Title of the analysis report for clustering.
Type: string
Default: Cluster Analysis Report
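The relationship between the Cluster Members and Cluster Medoids datasets can be illustrated with a small sketch. It assumes, for illustration only, that a medoid is the member frame whose feature vector lies closest (Euclidean distance) to the mean feature vector of its cluster; the floe's actual medoid definition is not stated here. The function name `pick_medoids` and the toy data are hypothetical.

```python
from math import dist

def pick_medoids(features, labels):
    """Return {cluster_label: frame_index} of the frame nearest each cluster mean.

    Assumption: a medoid is the member frame closest to the cluster centroid.
    """
    # Group frame indices by cluster label (the "cluster members" view).
    members = {}
    for frame_index, label in enumerate(labels):
        members.setdefault(label, []).append(frame_index)

    medoids = {}
    for label, frame_indices in members.items():
        n_dims = len(features[frame_indices[0]])
        # Mean feature vector of all frames in this cluster.
        centroid = [
            sum(features[i][d] for i in frame_indices) / len(frame_indices)
            for d in range(n_dims)
        ]
        # The medoid is an actual frame, so it has real atomic coordinates.
        medoids[label] = min(
            frame_indices, key=lambda i: dist(features[i], centroid)
        )
    return medoids

# Toy feature vectors for six frames and their cluster labels.
features = [
    [0.0, 0.0], [1.0, 0.0], [5.0, 0.0],
    [10.0, 10.0], [11.0, 10.0], [12.0, 10.0],
]
labels = [0, 0, 0, 1, 1, 1]
medoids = pick_medoids(features, labels)
```

In this picture, the members dataset corresponds to `labels` (one entry per frame) and the medoids dataset to `medoids` (one entry per cluster).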
Output Trajectory Options
Trajectory Create Control (switch): Controls whether an .xtc trajectory file is generated containing the representative conformations of all cluster medoids.
Required
Type: boolean
Default: True
Choices: [True, False]
Cluster Medoids Trajectory (traj_out): Trajectory file to save coordinates of cluster medoids.
Required
Type: file_out
Default: Cluster_medoids_traj.xtc
(Optional) Advanced Clustering Options
Number of Clusters (n_clusters): Number of clusters to be generated by the K-Means clustering method. By default, the number of clusters is determined from the total number of conformations in the input dataset. Users do not need to set this value unless the default is incompatible with the cryptic pocket detection analysis. We recommend checking the Clustering Report generated by this Floe to assess compatibility.
Type: integer
Clustering Method (cluster_method): Scikit-learn method for performing clustering analysis.
Type: string
Choices: [‘Agglomerative’, ‘DBSCAN’, ‘K-Means’, ‘Spectral’, ‘Weighted DBSCAN’, ‘Weighted K-Means’]
Clustering Parameters (JSON) (cluster_params_json_file): The JSON file should contain clustering parameters to be used for clustering conformations. These parameters are specific to the Clustering Method selected by the user. An example JSON file for each Clustering Method is provided in the tutorial.
Type: file_in
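As a rough illustration of preparing such a file, the sketch below writes and re-reads a small JSON document. The accepted keys are defined by the tutorial examples, not by this page; the sketch assumes, purely for illustration, that the file holds keyword arguments of the corresponding scikit-learn estimator (here `n_init`, `max_iter`, and `random_state`, which are arguments of scikit-learn's `KMeans`). The file name `kmeans_params.json` is hypothetical.

```python
import json
import os
import tempfile

# Hypothetical parameters for the 'K-Means' method.
# Assumption: the floe reads estimator keyword arguments from this file;
# consult the tutorial's example files for the actual schema.
kmeans_params = {"n_init": 10, "max_iter": 300, "random_state": 0}

params_path = os.path.join(tempfile.gettempdir(), "kmeans_params.json")
with open(params_path, "w") as handle:
    json.dump(kmeans_params, handle, indent=2)

# Read the file back to confirm it is valid JSON before uploading it.
with open(params_path) as handle:
    loaded = json.load(handle)
```

Round-tripping the file locally catches syntax problems such as trailing commas or single quotes before the floe is launched.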
Variance Cutoff (variance_cutoff): Variance cutoff used as a threshold to filter out low-variance elements of the feature vectors before clustering. The default value of 0.0 means the feature matrix is not filtered by variance. To choose a cutoff, view the feature_variance_out dataset generated by the MD Feature analysis Floe on the Analyze page.
Type: decimal
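The effect of the Variance Cutoff parameter can be sketched as follows. This is a minimal illustration, not the floe's implementation: the function name `filter_low_variance`, the use of population variance, and the strict `>` comparison are assumptions made for the example.

```python
from statistics import pvariance

def filter_low_variance(feature_matrix, variance_cutoff=0.0):
    """Drop feature columns whose variance is not above the cutoff.

    feature_matrix: list of rows (one per frame), each a list of floats.
    Returns (filtered_matrix, kept_column_indices).
    """
    n_features = len(feature_matrix[0])
    columns = [[row[j] for row in feature_matrix] for j in range(n_features)]
    # Assumption: population variance; the floe may use a different estimator.
    variances = [pvariance(col) for col in columns]
    if variance_cutoff == 0.0:
        kept = list(range(n_features))  # 0.0 means no filtering
    else:
        kept = [j for j, v in enumerate(variances) if v > variance_cutoff]
    filtered = [[row[j] for j in kept] for row in feature_matrix]
    return filtered, kept

# Three frames, three features: constant, high-variance, low-variance.
frames = [
    [1.0, 5.0, 0.10],
    [1.0, 7.0, 0.20],
    [1.0, 9.0, 0.10],
]
filtered, kept = filter_low_variance(frames, variance_cutoff=0.5)
```

With a cutoff of 0.5, only the middle column survives in this toy matrix; with the default of 0.0, all three columns are kept.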
Scaling Method (feature_scaler): Scikit-learn scaling method for preprocessing data before clustering analysis. By default, no preprocessing is done.
Type: string
Choices: [‘MinMaxScaler’, ‘StandardScaler’]
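To show what the two scaling choices do to a single feature column, the sketch below reimplements their formulas in plain Python rather than calling scikit-learn. With default settings, `MinMaxScaler` rescales each feature to the range [0, 1], and `StandardScaler` subtracts the mean and divides by the population standard deviation. The helper names and the toy values are hypothetical, and the handling of constant columns (mapped to 0.0) is a choice made for this example.

```python
from statistics import mean, pstdev

def min_max_scale(column):
    """Rescale values to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    span = hi - lo
    # A constant column has zero span; map it to 0.0 to avoid division by zero.
    return [(x - lo) / span if span else 0.0 for x in column]

def standard_scale(column):
    """Center to zero mean and unit variance: (x - mean) / std."""
    mu, sigma = mean(column), pstdev(column)
    # A constant column has zero spread; map it to 0.0.
    return [(x - mu) / sigma if sigma else 0.0 for x in column]

# Toy per-residue SASA values for one residue across three frames.
sasa = [10.0, 20.0, 40.0]
scaled_minmax = min_max_scale(sasa)
scaled_standard = standard_scale(sasa)
```

Scaling matters when features have different magnitudes, because distance-based clustering methods otherwise weight large-valued features more heavily.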