Clustering by AbScan, Edit Distance, or 100% Homology
Takes in a set of records and clusters the region of interest (ROI) using either unsupervised clustering (AbScan) or a standard approach: condensing to unique sequences only, Levenshtein distance, or Hamming distance. The output is the set of sequences, each assigned a cluster based on its ROI.
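The edit-distance options can be pictured with a minimal sketch. It assumes single-linkage grouping, where two sequences share a cluster when a chain of neighbors within the maximum distance connects them; the cube's actual grouping rule is not documented here, and the function names `hamming`, `levenshtein`, and `cluster_by_distance` are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions; only defined for equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def cluster_by_distance(seqs, max_dist, dist=levenshtein):
    """Assumed single-linkage grouping: map each unique sequence to a cluster id."""
    unique = sorted(set(seqs))  # condensing to unique sequences happens here
    parent = list(range(len(unique)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(unique)):
        for j in range(i + 1, len(unique)):
            if dist(unique[i], unique[j]) <= max_dist:
                parent[find(j)] = find(i)  # merge the two groups

    roots = {}
    return {s: roots.setdefault(find(i), len(roots)) for i, s in enumerate(unique)}


clusters = cluster_by_distance(["CARDYW", "CARDFW", "CARDYW", "CTTTTTTTW"], max_dist=2)
```

In this example the two six-residue sequences differ by one substitution and land in the same cluster, while the longer sequence stays on its own. Passing `dist=hamming` only makes sense when every sequence has the same length.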
Main Parameters
Parameter Name |
---|
Biophysical Conversion |
Clustering Type |
Keep Only Functional Sequences |
Max Distance for Levenshtein or Hamming, If Selected |
Minimum Number of Points to Define a Cluster |
Indicate whether the ABSCAN should utilize OPTICS (preferred) or DBSCAN |
Region of Interest For Clustering |
Minimum Count for the Region of Interest (ROI) |
Minimum Percent for the Region of Interest (ROI) |
Write the AbScan Output to CSV File |
Xi Minimum Steepness of Reachability Plot |
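As an illustration of what the Biophysical Conversion option might do, the sketch below maps each residue to a physicochemical class before clustering. Only the pairing of E and D with negative charge comes from this page; every other class, the single-letter class codes, and the names `PHYSCHEM_CLASSES` and `to_physchem` are assumptions made for the example.

```python
# Hypothetical residue classes. Only "E, D -> negative charge" is stated in
# the parameter description; the remaining groupings are illustrative.
PHYSCHEM_CLASSES = {
    "n": "DE",     # negative charge (from the parameter description)
    "p": "KRH",    # positive charge (assumed)
    "a": "FWY",    # aromatic (assumed)
    "h": "AVLIM",  # aliphatic / hydrophobic (assumed)
    "o": "STNQ",   # polar, uncharged (assumed)
    "s": "GPC",    # special cases (assumed)
}

# Invert the table so each residue letter points at its class code.
RESIDUE_TO_CLASS = {
    aa: code for code, residues in PHYSCHEM_CLASSES.items() for aa in residues
}


def to_physchem(seq: str) -> str:
    """Replace each residue with its class code; unknown letters become 'x'."""
    return "".join(RESIDUE_TO_CLASS.get(aa, "x") for aa in seq.upper())


converted = [to_physchem(s) for s in ("CARDYW", "CAREYW")]
```

Under this mapping, sequences that differ only by a conservative substitution (D versus E here) convert to the same string, so a downstream clustering step would treat them as identical.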
Parameter Details
Calculation Parameters
- Biophysical Conversion (biophysical_conversion) type: boolean: Convert sequences to their physicochemical equivalent, e.g. E, D - negative charge. This applies only to AbScan, not to Unique Only, Levenshtein distance, or Hamming distance. Default: True
- Clustering Type (cluster_type) type: string: Cluster type to apply to the sequencing dataset. Default: AbScan. Choices: AbScan, Unique Only, Levenshtein Distance, Hamming Distance
- CPUs (cpu_count) type: integer: The number of CPUs to run this cube with. Default: 1, Min: 1, Max: 128
- Cube Metrics (cube_metrics) type: string: Set of metrics to be collected. Choices: cpu, disk, memory, network
- Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0, Min: 128.0, Max: 8589934592
- Keep Only Functional Sequences (filter_functional) type: boolean: Eliminates non-functional sequences, truncations, stop codons, and frame shifts. Default: True
- GPUs (gpu_count) type: integer: The number of GPUs to run this cube with. Default: 0, Max: 16
- Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated). Default: “”
- Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
- Max Distance for Levenshtein or Hamming, If Selected (max_dist_ld_hm) type: integer: The maximum edit distance at which two sequences still belong to the same cluster group. Default: 2, Max: 50
- Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800, Min: 256.0, Max: 8589934592
- Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds. Default: 60. Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300. Min: 1, Max: 300
- Minimum Number of Points to Define a Cluster (min_pts) type: integer: The minimum number of points that will be considered a cluster. Default: 2
- Indicate whether the ABSCAN should utilize OPTICS (preferred) or DBSCAN (optics_or_dbscan) type: string: Base algorithm used to identify clusters in an unsupervised manner. Both methods apply the Elbow Estimation method automatically, but OPTICS treats the estimate as a maximum rather than a preset value, which makes it better suited to automation. Default: OPTICS. Choices: OPTICS, DBSCAN
- Region of Interest For Clustering (roi) type: string: Indicate the region of interest (ROI) for processing. For Illumina data, AbScan, Levenshtein, and Hamming clustering use only the upstream ‘cdr3_aa_1’. With the ‘Unique Only’ option on Illumina data, sequences are condensed by the specified ROI according to chain_1. Default: CDR3 Chain_2 (Downstream Chain). Choices: Merged CDRs, CDR3 Chain_1 (Upstream Chain), CDR3 Chain_2 (Downstream Chain), HCDR3 and LCDR3, Full-Length
- Minimum Count for the Region of Interest (ROI) (roi_count) type: integer: Sets the minimum count for a given region of interest; anything below it is removed. Default: 1, Min: 1, Max: 10000000000
- Minimum Percent for the Region of Interest (ROI) (roi_percent) type: decimal: Sets the minimum percent for a given region of interest; anything below it is removed. Default: 1e-12, Min: 1e-12, Max: 100
- Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address. Default: 64
- Spot policy (spot_policy) type: string: Control cube placement on spot market instances. Default: Prohibited. Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
- Write the AbScan Output to CSV File (write_to_csv_file) type: boolean: Writes the output to CSV after the AbXtract processing, at the cost of additional time. If disabled, the CSV can be written in a separate step. Default: True
- Xi Minimum Steepness of Reachability Plot (xi) type: decimal: Float value between 0 and 1 that sets the minimum steepness on the reachability plot that defines a cluster boundary. For example, a point on the reachability plot counts as an upward point when the ratio of its reachability to that of its successor is at most 1-xi. Default: 0.0, Max: 1.0
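The xi criterion above can be made concrete with a small sketch that scans a reachability plot for steep points. The upward rule follows the ratio stated for the xi parameter; the mirrored downward rule is the usual OPTICS counterpart and is an assumption here, as are the function name `steep_points` and the sample values. It is not the cube's implementation.

```python
def steep_points(reachability, xi):
    """Return (upward, downward) indices of xi-steep points.

    Upward:   r[i] / r[i + 1] <= 1 - xi   (as described for the xi parameter)
    Downward: r[i + 1] / r[i] <= 1 - xi   (mirrored rule, assumed)
    """
    if not 0.0 <= xi <= 1.0:
        raise ValueError("xi must lie between 0 and 1")
    up, down = [], []
    for i in range(len(reachability) - 1):
        r, nxt = reachability[i], reachability[i + 1]
        if r <= nxt * (1 - xi):
            up.append(i)      # reachability rises steeply after point i
        elif r * (1 - xi) >= nxt:
            down.append(i)    # reachability drops steeply after point i
    return up, down


# Illustrative reachability values: a valley (dense region) between two peaks.
plot = [1.0, 0.5, 0.1, 0.1, 0.11, 0.5, 1.0]
up, down = steep_points(plot, xi=0.3)
```

With `xi=0.3`, the first two points are steep downward and the last two transitions are steep upward, which brackets the low-reachability valley as a candidate cluster. Larger xi values demand sharper boundaries and therefore mark fewer points.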
Hardware Parameters
- Machine hardware requirements
- Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800, Min: 256.0, Max: 8589934592
- Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address. Default: 64
- Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0, Min: 128.0, Max: 8589934592
- GPUs (gpu_count) type: integer: The number of GPUs to run this cube with. Default: 0, Max: 16
- CPUs (cpu_count) type: integer: The number of CPUs to run this cube with. Default: 1, Min: 1, Max: 128
- Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
- Spot policy (spot_policy) type: string: Control cube placement on spot market instances. Default: Prohibited. Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
- Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated). Default: “”
Metrics Parameters
- Cube Metric Parameters
- Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds. Default: 60. Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300. Min: 1, Max: 300
- Cube Metrics (cube_metrics) type: string: Set of metrics to be collected. Choices: cpu, disk, memory, network