B2. Trajectory Analysis (for Cryptic Pockets): Cluster Conformations

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Product-based/Molecular Dynamics

  • Solution-based/Virtual-screening/Target Preparation

  • Solution-based/Hit to Lead/Target Preparation/Enhanced Sampling

  • Solution-based/Target Identification/Target Preparation/Pocket Detection

  • Solution-based/Hit to Lead/Target Preparation/Cryptic Pocket Detection

  • Role-based/Computational Chemist

  • Task-based/Target Prep & Analysis/Pocket Detection

Description

This floe clusters conformations using trajectory analysis output (‘Residue-Cosolvent Distances’ or ‘Per-Residue SASA’) computed by the corresponding Trajectory Analysis Floes.

Promoted Parameters

Title in user interface (promoted name)

Inputs from Protein Sampling

Protein Sampling (Weighted Ensemble MD Simulation) Dataset (westdata_in): This is a ‘Protein Sampling Dataset’ output generated by ‘A3a. Protein Sampling (for Cryptic Pockets): Run a Weighted Ensemble MD Simulation’ or ‘A3b. Protein Sampling (for Cryptic Pockets): Continue a Weighted Ensemble MD Simulation’. The dataset should come from the most recent Protein Sampling job run for a given protein.

  • Required

  • Type: data_source

Topology File (top_file): PDB file specifying the system topology. This file is generated by the ‘A1. Protein Sampling (for Cryptic Pockets): Solvate and Equilibrate Target Protein’ Floe.

  • Required

  • Type: file_in

Inputs from Trajectory Analysis

Trajectory Analysis Dataset (data_in): This dataset contains trajectory analysis output and is generated by one of the ‘B1. Trajectory Analysis’ Floes: ‘Residue-Cosolvent Distances’ or ‘Per-Residue SASA’.

  • Required

  • Type: data_source

Variance and Mean Dataset (variance_data_in): This dataset contains the variance of ‘Residue-Cosolvent Distances’ or ‘Per-Residue SASA’ and is generated by one of the ‘B1. Trajectory Analysis’ Floes: ‘Residue-Cosolvent Distances’ or ‘Per-Residue SASA’. Visualize this dataset on the Analyze page to choose the variance cutoff value used to filter the feature matrix.

  • Required

  • Type: data_source

Outputs

Cluster Medoids Dataset (cluster_medoids_data_out): This dataset stores atomic coordinates and features of the cluster medoids. One record is created for each cluster medoid.

  • Required

  • Type: dataset_out

  • Default: Cluster Medoids

Cluster Members Dataset (cluster_members_data_out): This dataset stores the cluster label assigned to each trajectory frame. One record is created for each MD frame.

  • Required

  • Type: dataset_out

  • Default: Cluster Members
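
The relationship between the two output datasets (one label per frame, one medoid per cluster) can be illustrated with a minimal sketch. It assumes the usual definition of a medoid, the cluster member with the smallest total distance to the other members. How the floe itself selects medoids is not stated here, and the function name and toy data are hypothetical.

```python
from math import dist

def cluster_medoids(features, labels):
    """Map each cluster label to the index of its medoid frame."""
    # Group frame indices by their cluster label.
    members = {}
    for frame_idx, label in enumerate(labels):
        members.setdefault(label, []).append(frame_idx)

    medoids = {}
    for label, idxs in members.items():
        # Medoid: the member with the smallest total distance to its cluster mates.
        medoids[label] = min(
            idxs,
            key=lambda i: sum(dist(features[i], features[j]) for j in idxs),
        )
    return medoids

# Toy data (illustrative only): six frames, two features each, one label per frame.
features = [[0, 0], [1, 0], [2, 0], [10, 10], [10, 11], [10, 12]]
labels = [0, 0, 0, 1, 1, 1]
medoids = cluster_medoids(features, labels)
print(medoids)  # {0: 1, 1: 4}
```

Unlike a centroid, a medoid is always an actual trajectory frame, so its atomic coordinates can be stored directly.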

Cluster Report Title (analysis_report_title): Title of the analysis report for clustering.

  • Type: string

  • Default: Cluster Analysis Report

Output Trajectory Options

Trajectory Create Control (switch): Controls whether an .xtc trajectory file is generated containing the representative conformation (medoid) of every cluster.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Cluster Medoids Trajectory (traj_out): Trajectory file to save coordinates of cluster medoids.

  • Required

  • Type: file_out

  • Default: Cluster_medoids_traj.xtc

(Optional) Advanced Clustering Options

Number of Clusters (n_clusters): Number of clusters to be generated by the K-Means clustering method. By default, the number of clusters is determined from the total number of conformations in the input dataset. Users do not need to set the number of clusters unless the default value is incompatible with the cryptic pocket detection analysis. We recommend checking the Clustering Report generated by this Floe to determine compatibility.

  • Type: integer

Clustering Method (cluster_method): Scikit-learn method for performing clustering analysis.

  • Type: string

  • Choices: [‘Agglomerative’, ‘DBSCAN’, ‘K-Means’, ‘Spectral’, ‘Weighted DBSCAN’, ‘Weighted K-Means’]

Clustering Parameters (JSON) (cluster_params_json_file): JSON file containing the parameters used for clustering conformations. These parameters are specific to the Clustering Method selected by the user. An example JSON file for each Clustering Method is provided in the tutorial.

  • Type: file_in
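
As a rough sketch of how such a parameter file might be prepared, the snippet below writes a small JSON file and reads it back to confirm that it parses. The keys `eps` and `min_samples` are scikit-learn DBSCAN constructor arguments. Whether the floe expects exactly these keys is an assumption, and the file name is hypothetical, so the example files in the tutorial remain the authoritative reference.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical parameter set: `eps` and `min_samples` are scikit-learn DBSCAN
# constructor arguments; that the floe accepts exactly these keys is an assumption.
dbscan_params = {"eps": 0.5, "min_samples": 5}

with tempfile.TemporaryDirectory() as tmp_dir:
    params_path = Path(tmp_dir) / "dbscan_params.json"  # hypothetical file name
    params_path.write_text(json.dumps(dbscan_params, indent=2))
    # Read the file back to confirm it is valid JSON before uploading it.
    loaded_params = json.loads(params_path.read_text())

print(loaded_params)
```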

Variance Cutoff (variance_cutoff): Threshold used to filter out low-variance elements of the feature vectors before clustering. The default value of 0.0 means that the feature matrix is not filtered by variance. Users can view the feature_variance_out dataset generated by the MD Feature analysis Floe on the Analyze page to decide on this cutoff value.

  • Type: decimal
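
The effect of the variance cutoff can be sketched as follows. This is an illustration and not the floe's implementation: it assumes population variance per feature column and keeps columns whose variance is at least the cutoff, so that a cutoff of 0.0 keeps every column as described above. The function name and toy data are hypothetical.

```python
from statistics import pvariance

def filter_low_variance(features, variance_cutoff=0.0):
    """Keep feature columns whose variance is at least `variance_cutoff`.

    `features` is a list of frames, each a list of per-residue feature values.
    Returns the filtered matrix and the indices of the columns that were kept.
    """
    columns = list(zip(*features))
    # Assumption: '>=' so that a cutoff of 0.0 removes nothing.
    keep = [i for i, col in enumerate(columns) if pvariance(col) >= variance_cutoff]
    filtered = [[row[i] for i in keep] for row in features]
    return filtered, keep

# Toy feature matrix (illustrative only): four frames, three features.
frames = [
    [1.0, 5.0, 0.10],
    [1.0, 7.0, 0.12],
    [1.0, 9.0, 0.11],
    [1.0, 3.0, 0.13],
]

_, kept_all = filter_low_variance(frames, 0.0)
filtered, kept = filter_low_variance(frames, 0.001)
print(kept_all)  # [0, 1, 2]
print(kept)      # [1]
```

In this toy case only the second column varies enough to pass a cutoff of 0.001; the constant column and the nearly constant column are dropped.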

Scaling Method (feature_scaler): Scikit-learn scaling method for preprocessing data before clustering analysis. By default, no preprocessing is done.

  • Type: string

  • Choices: [‘MinMaxScaler’, ‘StandardScaler’]
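
To illustrate what the two scaling choices compute, here is a minimal pure-Python sketch of per-column min-max scaling and standardization. It mirrors the default behavior of scikit-learn's MinMaxScaler (range 0 to 1) and StandardScaler (zero mean, unit population standard deviation). The function names and toy data are hypothetical, and the floe presumably calls scikit-learn rather than code like this.

```python
from statistics import mean, pstdev

def min_max_scale(features):
    """Rescale each feature column to the range [0, 1]."""
    scaled_cols = []
    for col in zip(*features):
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0  # constant column: avoid division by zero
        scaled_cols.append([(v - lo) / span for v in col])
    return [list(row) for row in zip(*scaled_cols)]

def standard_scale(features):
    """Shift each feature column to zero mean and scale to unit variance."""
    scaled_cols = []
    for col in zip(*features):
        mu = mean(col)
        sigma = pstdev(col) or 1.0  # constant column: avoid division by zero
        scaled_cols.append([(v - mu) / sigma for v in col])
    return [list(row) for row in zip(*scaled_cols)]

# Toy feature matrix (illustrative only): three frames, two features.
frames = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]

print(min_max_scale(frames))  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
print(standard_scale(frames))
```

Scaling matters when features have very different magnitudes, because distance-based clustering would otherwise be dominated by the largest-valued features.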