B2. Trajectory Analysis (for Cryptic Pockets): Cluster Conformations

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

Product-based/Molecular Dynamics

Solution-based/Virtual-screening/Target Preparation

Solution-based/Hit to Lead/Target Preparation/Enhanced Sampling

Solution-based/Target Identification/Target Preparation/Pocket Detection

Solution-based/Hit to Lead/Target Preparation/Cryptic Pocket Detection

Role-based/Computational Chemist

Task-based/Target Prep & Analysis/Pocket Detection

Description

This floe clusters the trajectory analysis output (‘Residue-Cosolvent Distances’ or ‘Per-Residue SASA’) computed by the corresponding Trajectory Analysis Floes.

Promoted Parameters

Title in user interface (promoted name)

Inputs from Protein Sampling

Protein Sampling (Weighted Ensemble MD Simulation) Dataset (westdata_in): This is a ‘Protein Sampling Dataset’ output generated by ‘A3a. Protein Sampling (for Cryptic Pockets): Run a Weighted Ensemble MD Simulation’ or ‘A3b. Protein Sampling (for Cryptic Pockets): Continue a Weighted Ensemble MD Simulation’. The dataset should come from the most recent Protein Sampling job run for a given protein.

Required

Type: data_source

Topology File (top_file): PDB file specifying the system topology. This file is generated by the ‘A1. Protein Sampling (for Cryptic Pockets): Solvate and Equilibrate Target Protein’ Floe.

Required

Type: file_in

Inputs from Trajectory Analysis

Trajectory Analysis Dataset (data_in): Dataset containing the trajectory analysis output generated by one of the ‘B1. Trajectory Analysis’ Floes: ‘Residue-Cosolvent Distances’ or ‘Per-Residue SASA’.

Required

Type: data_source

Variance and Mean Dataset (variance_data_in): Dataset containing the variance of the ‘Residue-Cosolvent Distances’ or ‘Per-Residue SASA’ features, generated by one of the ‘B1. Trajectory Analysis’ Floes: ‘Residue-Cosolvent Distances’ or ‘Per-Residue SASA’. Visualize this dataset on the Analyze page to choose the variance cutoff used to filter the feature matrix.

Required

Type: data_source

Outputs

Cluster Medoids Dataset (cluster_medoids_data_out): This dataset stores atomic coordinates and features of the cluster medoids. One record is created for each cluster medoid.

Required

Type: dataset_out

Default: Cluster Medoids
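The relationship between cluster labels and medoids can be illustrated with a minimal sketch. It assumes a medoid is the cluster member with the smallest summed Euclidean distance to the other members of its cluster; the Floe's exact medoid definition is not stated here, and the function name `cluster_medoids` and the toy data are hypothetical.

```python
from math import dist

def cluster_medoids(features, labels):
    """Return {label: frame_index}, one medoid frame per cluster.

    Assumption: the medoid is the member that minimizes the summed
    Euclidean distance to all members of the same cluster.
    """
    # Group frame indices by their cluster label.
    members = {}
    for idx, label in enumerate(labels):
        members.setdefault(label, []).append(idx)

    # Pick the member with the smallest total distance within each cluster.
    medoids = {}
    for label, idxs in members.items():
        medoids[label] = min(
            idxs,
            key=lambda i: sum(dist(features[i], features[j]) for j in idxs),
        )
    return medoids

# Toy feature vectors (one per MD frame) and their cluster labels.
features = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0],
            [10.0, 10.0], [10.0, 11.0], [10.0, 12.0]]
labels = [0, 0, 0, 1, 1, 1]
medoids = cluster_medoids(features, labels)
```

In this toy case each cluster has three frames, so the sketch yields one medoid frame index per cluster label, mirroring the one-record-per-medoid layout described above.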

Cluster Members Dataset (cluster_members_data_out): This dataset stores the cluster label assigned to each trajectory frame. One record is created for each MD frame.

Required

Type: dataset_out

Default: Cluster Members

Cluster Report Title (analysis_report_title): Title of the analysis report for clustering.

Type: string

Default: Cluster Analysis Report

Output Trajectory Options

Trajectory Create Control (switch): Controls whether an .xtc trajectory file containing the representative conformations (the cluster medoids) is generated.

Required

Type: boolean

Default: True

Choices: [True, False]

Cluster Medoids Trajectory (traj_out): Trajectory file to save coordinates of cluster medoids.

Required

Type: file_out

Default: Cluster_medoids_traj.xtc

(Optional) Advanced Clustering Options

Number of Clusters (n_clusters): Number of clusters to generate with the K-Means clustering method. By default, the number of clusters is determined from the total number of conformations in the input dataset. Users do not need to set this value unless the default is incompatible with the cryptic pocket detection analysis. We recommend checking the Clustering Report generated by this Floe to assess compatibility.

Type: integer

Clustering Method (cluster_method): Scikit-learn method for performing clustering analysis.

Type: string

Choices: [‘Agglomerative’, ‘DBSCAN’, ‘K-Means’, ‘Spectral’, ‘Weighted DBSCAN’, ‘Weighted K-Means’]

Clustering Parameters (JSON) (cluster_params_json_file): The JSON file should contain clustering parameters to be used for clustering conformations. These parameters are specific to the Clustering Method selected by the user. An example JSON file for each Clustering Method is provided in the tutorial.

Type: file_in
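As a rough illustration only, the sketch below builds a JSON parameter set for the ‘DBSCAN’ method. It assumes the file holds keyword arguments of the corresponding scikit-learn estimator; `eps` and `min_samples` are real `sklearn.cluster.DBSCAN` arguments, but whether this Floe accepts exactly these keys is an assumption, and the values are arbitrary. The tutorial's example files remain the authoritative reference.

```python
import json

# Hypothetical parameters for the 'DBSCAN' clustering method.
# Assumption: keys map to scikit-learn DBSCAN keyword arguments.
dbscan_params = {
    "eps": 0.5,         # neighborhood radius in feature space
    "min_samples": 5,   # minimum neighbors for a core point
}

# Serialize to the JSON text that would be saved to a file and uploaded.
json_text = json.dumps(dbscan_params, indent=2)

# Round-trip check: the text parses back to the same parameters.
loaded = json.loads(json_text)
```

Saving `json_text` to a file (for example with `open(..., "w")`) would give a file that could be supplied as the Clustering Parameters input, subject to the assumption above.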

Variance Cutoff (variance_cutoff): Variance cutoff used as a threshold to filter out low-variance elements of the feature vectors before clustering. The default value of 0.0 means that the feature matrix is not filtered by variance. To choose a cutoff, view the feature_variance_out dataset generated by the MD Feature analysis Floe on the Analyze page.

Type: decimal
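The effect of a variance cutoff can be sketched as follows. This is not the Floe's implementation: it assumes features are stored as one row per frame, that population variance is used, and that the comparison is inclusive so that a cutoff of 0.0 keeps every feature, consistent with the description above. The function name `filter_low_variance` and the toy values are hypothetical.

```python
from statistics import pvariance

def filter_low_variance(features, variance_cutoff=0.0):
    """Keep feature columns whose variance is at least variance_cutoff.

    features: list of rows (one per frame), each a list of feature values.
    Returns (filtered_rows, kept_column_indices).
    """
    columns = list(zip(*features))
    variances = [pvariance(col) for col in columns]
    # Assumption: inclusive comparison, so a cutoff of 0.0 filters nothing.
    kept = [i for i, var in enumerate(variances) if var >= variance_cutoff]
    filtered = [[row[i] for i in kept] for row in features]
    return filtered, kept

# Three frames, three features; the first feature never changes.
frames = [
    [1.0, 5.0, 0.2],
    [1.0, 7.0, 0.4],
    [1.0, 9.0, 0.6],
]
all_rows, all_cols = filter_low_variance(frames)                        # default 0.0
some_rows, some_cols = filter_low_variance(frames, variance_cutoff=0.01)
```

With the default cutoff all three columns are retained, while a cutoff of 0.01 drops the constant first column and keeps the other two.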

Scaling Method (feature_scaler): Scikit-learn scaling method for preprocessing data before clustering analysis. By default, no preprocessing is done.

Type: string

Choices: [‘MinMaxScaler’, ‘StandardScaler’]
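For orientation, the two scaling choices correspond to simple per-feature transformations. The sketch below reproduces, in plain Python, the formulas that scikit-learn's MinMaxScaler (default range 0 to 1) and StandardScaler (population standard deviation) apply to a single feature column; the helper names and sample values are hypothetical, and how the Floe invokes the scalers is not shown here.

```python
from statistics import mean, pstdev

def min_max_scale(column):
    """Rescale values to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    span = hi - lo
    # A constant column maps to zeros instead of dividing by zero.
    return [(x - lo) / span if span else 0.0 for x in column]

def standard_scale(column):
    """Center to zero mean and unit variance: (x - mean) / std."""
    mu, sigma = mean(column), pstdev(column)
    # A constant column maps to zeros instead of dividing by zero.
    return [(x - mu) / sigma if sigma else 0.0 for x in column]

# Hypothetical values of one feature across three frames.
sasa = [10.0, 20.0, 30.0]
scaled_min_max = min_max_scale(sasa)
scaled_standard = standard_scale(sasa)
```

Min-max scaling preserves the relative spacing of values within a fixed range, whereas standard scaling puts features with different units or magnitudes on a comparable footing before distance-based clustering.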