Cluster (AbScan) Antibody Binding Regions - AbXtract

Clusters sequences using AbScan (default), 100% homology, or an edit-distance criterion (Levenshtein or Hamming). When AbScan is used, you can optionally condense the dataset with a biophysical conversion that reduces the alphabet from 20 amino acids to 11 classes based on underlying physicochemical properties (e.g. aromatic, small aliphatic). This conversion is ON by default.
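
The biophysical conversion described above can be illustrated with a reduced-alphabet mapping. The sketch below is only an illustration: the actual 11 classes used by AbScan are not listed in this document, so the grouping in `REDUCED_CLASSES`, the class symbols, and the function name `biophysical_convert` are all hypothetical.

```python
# Hypothetical 20 -> 11 reduced alphabet. The real AbScan grouping is not
# documented here; these classes are illustrative assumptions only.
REDUCED_CLASSES = {
    "n": "DE",   # negative charge
    "p": "KR",   # positive charge
    "a": "FWY",  # aromatic
    "l": "ILM",  # large aliphatic
    "s": "AV",   # small aliphatic
    "h": "ST",   # hydroxyl
    "m": "NQ",   # amide
    "G": "G",    # glycine kept on its own
    "P": "P",    # proline kept on its own
    "C": "C",    # cysteine kept on its own
    "H": "H",    # histidine kept on its own
}

# Invert the table: one class symbol per amino acid.
AA_TO_CLASS = {
    aa: symbol for symbol, members in REDUCED_CLASSES.items() for aa in members
}


def biophysical_convert(sequence: str) -> str:
    """Rewrite a sequence in the reduced alphabet; unknown residues become 'X'."""
    return "".join(AA_TO_CLASS.get(aa, "X") for aa in sequence.upper())
```

Under this assumed mapping, `ARDY` and `VKEF` convert to the same string, so sequences that differ only by physicochemically similar substitutions collapse together before clustering.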

Promoted Parameters

  • Datasets to Cluster (data_source) : Input all the datasets (usually multiple) to cluster by ROI. Typically these should contain more than 500 records. If the number of unique ROIs exceeds 500k, expect long run times (several hours to days); system memory requirements may also need to be adjusted.
  • Biophysical Conversion (boolean) : Convert sequences to their physicochemical equivalents, e.g. E and D both become negative charge. This applies only to AbScan and not to Unique Only, Levenshtein Distance, or Hamming Distance.
    Default: True
  • Clustering Type (string) : Clustering type to apply to the sequencing dataset.
    Default: AbScan
    Choices: AbScan, Unique Only, Levenshtein Distance, Hamming Distance
  • Keep Only Functional Sequences (boolean) : Removes non-functional sequences, such as those with truncations, stop codons, or frameshifts.
    Default: True
  • Region of Interest For Clustering (string) : Indicate the region of interest (ROI) for clustering.
    Default: CDR3 Chain_2 (Downstream Chain)
    Choices: Merged CDRs, CDR3 Chain_1 (Upstream Chain), CDR3 Chain_2 (Downstream Chain), HCDR3 and LCDR3, Full-Length, Including Framework
  • Output Name of Dataset with Clustered Outputs (dataset_out) : Dataset of sequences, each classified by cluster.
    Default: cluster
  • Failed Dataset Output Name (dataset_out) : Contains failed records from both upstream and downstream Processes.
    Default: problematic
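
To make the Levenshtein Distance and Hamming Distance choices more concrete, here is a minimal sketch of edit-distance clustering on ROI strings. It assumes single-linkage grouping (two sequences share a cluster if a chain of neighbours within the maximum distance connects them); this document does not say how the tool groups sequences internally, so that rule and the function names are assumptions, not the tool's implementation.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions; None when the lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def cluster_by_distance(rois, max_distance=2, metric=levenshtein):
    """Group unique ROIs whose pairwise distance is <= max_distance.

    Assumption: single-linkage grouping via union-find.
    """
    unique = sorted(set(rois))
    parent = {s: s for s in unique}

    def find(s):
        # Follow parent links to the cluster root, compressing the path.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(unique, 2):
        d = metric(a, b)
        if d is not None and d <= max_distance:
            parent[find(a)] = find(b)

    clusters = {}
    for s in unique:
        clusters.setdefault(find(s), []).append(s)
    return sorted(clusters.values())
```

For example, `cluster_by_distance(["CARDYW", "CARDFW", "CARDYWG"], max_distance=2)` places all three in one cluster with Levenshtein, whereas `metric=hamming` leaves `CARDYWG` on its own because Hamming distance is only defined for equal-length sequences. The all-pairs comparison is quadratic in the number of unique ROIs, which is consistent with the long run times noted for very large inputs.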

Hidden Parameters

  • Indicate whether the ABSCAN should utilize OPTICS (preferred) or DBSCAN (string) : Base algorithm used to identify clusters in an unsupervised manner. Both methods apply the Elbow Estimation method automatically, but OPTICS treats the resulting value as a maximum rather than a preset value, which makes it better suited to automation.
    Default: OPTICS
    Choices: OPTICS, DBSCAN
  • Max Distance for Levenshtein or Hamming, If Selected (integer) : Select the maximum edit distance for two sequences to belong to the same cluster group. Only used if Levenshtein Distance or Hamming Distance is selected for clustering.
    Default: 2 Max: 50
  • Minimum Number of Points to Define a Cluster (integer) : The minimum number of points required for a group to be considered a cluster.
    Default: 2
  • Minimum Count for the Region of Interest (ROI) (integer) : Sets the minimum count for a given region of interest; ROIs with a count below this value are removed.
    Default: 1 Min: 1 Max: 10000000000
  • Minimum Percent for the Region of Interest (ROI) (decimal) : Sets the minimum percent for a given region of interest; ROIs with a percent below this value are removed.
    Default: 1e-12 Min: 1e-12 Max: 100
  • Xi Minimum Steepness of Reachability Plot (decimal) : Float value between 0 and 1. Sets the minimum steepness on the reachability plot that defines a cluster boundary. For example, an upward point in the reachability plot is one where the ratio from one point to its successor is at most 1-xi.
    Default: 0.000 Max: 1.0
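
The Xi description above can be read as a simple inequality on consecutive reachability values: point i is a steep upward point when r[i] <= r[i+1] * (1 - xi). The sketch below applies that rule to a plain list of numbers. It follows the standard OPTICS definition of a xi-steep upward point; the function name and the sample values are hypothetical, and whether AbScan applies the rule in exactly this form is an assumption.

```python
def steep_upward_points(reachability, xi):
    """Return indices i where reachability[i] <= reachability[i + 1] * (1 - xi).

    These are the 'upward points' of the reachability plot: the ratio of a
    point's reachability to its successor's is at most 1 - xi.
    """
    if not 0.0 <= xi <= 1.0:
        raise ValueError("xi must be between 0 and 1")
    return [
        i
        for i in range(len(reachability) - 1)
        if reachability[i] <= reachability[i + 1] * (1.0 - xi)
    ]


# Hypothetical reachability values in cluster order.
reach = [1.0, 1.05, 2.0, 2.1]
strict = steep_upward_points(reach, xi=0.1)   # only the large jump qualifies
loose = steep_upward_points(reach, xi=0.0)    # every non-decreasing step qualifies
```

With xi at 0, any step that does not decrease counts as an upward point, so larger xi values demand a sharper rise before a cluster boundary is recognised.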