Antibody SiteHopper-based Clustering

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Product-based/SPRUCE

  • Product-based/SiteHopper

  • Role-based/Computational Chemist

  • Solution-based/Virtual-screening/Analysis

  • Solution-based/Hit to Lead/Target Preparation/Structural Data Preparation

  • Solution-based/Biologics/Antibody Design/Target Preparation/CDR Analysis

  • Solution-based/Biologics/Antibody Design/Target Preparation/Surface Patch Analysis

  • Task-based/Target Prep & Analysis/Protein Similarity Search

Description

This floe takes all antibody structures from the input dataset(s) and clusters them based on their CDR surface patches. The patches are produced by SiteHopper patch generation. The input can be a single antibody in multiple configurations, different antibodies, or a combination of both.

Limitations: Due to the limits of clustering, this floe is not suitable for systems with only a handful of structures. The greater the number of input structures, the better the clusters can be defined.

Potential Input Sources: Antibody Sequences to 3D Models Floe, Antibody Experimental Structure Prep Floe
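
The clustering algorithm used by the floe is not specified on this page. As a rough illustration of how a distance cutoff turns a pairwise distance matrix into clusters, the sketch below assumes single-linkage grouping, which may differ from the floe's actual method: two structures end up in the same cluster when a chain of pairwise distances at or below the cutoff connects them. The function name and the toy matrix are hypothetical.

```python
def cluster_by_cutoff(dist, cutoff=2.0):
    """Group indices linked by pairwise distances <= cutoff (single linkage).

    `dist` is a symmetric NxN matrix given as a list of lists.
    Illustrative only; not the floe's documented algorithm.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair whose distance is within the cutoff.
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= cutoff:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    # Collect members by root.
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy patch-distance matrix for four structures (made-up values).
toy = [
    [0.0, 1.0, 5.0, 6.0],
    [1.0, 0.0, 5.5, 6.5],
    [5.0, 5.5, 0.0, 1.5],
    [6.0, 6.5, 1.5, 0.0],
]
print(cluster_by_cutoff(toy, cutoff=2.0))
```

In this sketch, lowering the cutoff splits the structures into more, tighter clusters, while raising it merges them into fewer, broader ones.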

Promoted Parameters

Title in user interface (promoted name)

  • Distance Cutoff (Distance_cutoff) type: decimal: Sets the distance cutoff when running clustering on the distance matrix.
    Default: 2.0
  • Cube memory for NxN cube (aggregator_memory) type: decimal: Controls the memory available to process the NxN matrix. The memory requirement depends on the input size N: roughly 0.1 GB for N = 10k and roughly 10 GB for N = 100k.
    Default: 1800, Min: 256.0, Max: 8589934592
  • Output dataset of centroid records (centroids) type: dataset_out: Output dataset to write to
  • Chunk Size (chunk_size) type: integer: Controls the chunk size for patch overlays.
    Default: 50
  • Failure output dataset of records (fail_out) type: dataset_out: Output dataset to write to
  • Input dataset of 3D Antibodies (in) type: data_source: The dataset(s) to read records from
  • Sequence Numbering Scheme (numbering_scheme) type: string: This parameter sets the numbering scheme applied to antibodies.
    Default: IMGT
    Choices: IMGT, Chothia, Martin, Kabat
  • Output dataset of cluster records (out) type: dataset_out: Output dataset to write to
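
The two memory figures quoted for the NxN cube (about 0.1 GB at N = 10k and about 10 GB at N = 100k) are consistent with quadratic growth at roughly one byte per matrix entry. That is an inference from those two data points, not a documented formula. The sketch below uses it to estimate a memory setting before launching a job; the function names, the one-byte-per-entry figure, and the reading of the memory value as megabytes (1 MB = 10^6 bytes) are all assumptions.

```python
import math


def estimate_matrix_memory_gb(n_structures, bytes_per_entry=1.0):
    """Estimate NxN matrix memory in GB, assuming ~1 byte per entry."""
    return n_structures ** 2 * bytes_per_entry / 1e9


def max_structures_for_memory_mb(memory_mb, bytes_per_entry=1.0):
    """Largest N whose NxN matrix fits in memory_mb (assumed megabytes)."""
    total_entries = int(memory_mb * 1e6 / bytes_per_entry)
    return math.isqrt(total_entries)


# Reproduce the two figures quoted in the parameter description.
for n in (10_000, 100_000):
    print(f"N = {n}: ~{estimate_matrix_memory_gb(n):.1f} GB")

# Rough input size that a given memory setting could accommodate.
print(max_structures_for_memory_mb(1800))
```

Treat the result as a starting point only, since the cube also needs memory beyond the matrix itself.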

Titles of required parameters (promoted names)

  • Collection Name (collection_name) type: collection_sink: Name of the collection to create
    Default: temp_ab_sh_patch_collection
  • Collection Name (collection_name) type: collection_sink: Name of the collection to create
    Default: temp_ab_sh_input_collection

Optional parameters (promoted names)

  • Input dataset of 3D Antibodies (data_in) type: data_source: The dataset(s) to read records from
  • Sequence Numbering Scheme (numbering_scheme) type: string: This parameter sets the numbering scheme applied to antibodies.
    Default: IMGT
    Choices: IMGT, Chothia, Martin, Kabat
  • Output dataset of centroid records (data_out) type: dataset_out: Output dataset to write to
  • Distance Cutoff (distance_cutoff) type: decimal: Sets the distance cutoff when running clustering on the distance matrix.
    Default: 2.0
  • Cube memory for NxN cube (memory_mb) type: decimal: Controls the memory available to process the NxN matrix. The memory requirement depends on the input size N: roughly 0.1 GB for N = 10k and roughly 10 GB for N = 100k.
    Default: 1800, Min: 256.0, Max: 8589934592
  • Output dataset of cluster records (data_out) type: dataset_out: Output dataset to write to
  • Failure output dataset of records (data_out) type: dataset_out: Output dataset to write to
  • Chunk Size (chunk_size) type: integer: Controls the chunk size for patch overlays.
    Default: 50