AbXtract Processing, NGS Only

Processes NGS FASTQ files and Sanger sequences, clusters them by region of interest (ROI), calculates fold enrichment and/or relative abundance by ROI, and quantifies liabilities across all the CDRs.

Main Parameters

Parameter Name
Biophysical Conversion
Clustering Type
Early Round Absence Penalty
Late Round Absence Penalty
Edit Distance Method For NGS Overlay
Exclude Values That Did Not Match In-Line Barcode
Keep Only Functional Sequences
Max Distance for Levenshtein or Hamming, If Selected
Minimum Count for the Full-Length Sequence by Sample
Minimum Number of Points to Consider a Cluster
Indicate whether the ABSCAN should utilize OPTICS (preferred) or DBSCAN
Edit Distance for Overlay of NGS Barcode Groups
Region of Interest For Enrichment and Clustering
Minimum Count for the Region of Interest (ROI)
Minimum Percent for the Region of Interest (ROI)
Tabulate Ratio to Top Most Abundant Clone in Sequence or By Selected ROI
Write the Downstream Output to CSV File
Xi Minimum Steepness of Reachability Plot

Calculation Parameters

Biophysical Conversion (biophysical_conversion) type: boolean: Whether to convert each amino acid (AA) sequence into its physicochemical equivalent, e.g. E and D become negative charge.

This is only applicable to AbScan.

Default: True
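
As an illustration of what a physicochemical conversion can look like, the sketch below maps each residue to a class symbol. The class table and the function name `to_physicochemical` are assumptions made for this example; the document only states that E and D map to negative charge, and the grouping AbScan actually uses is not given here.

```python
# Hypothetical residue classes; only "D and E are negatively charged"
# comes from the parameter description. Everything else is illustrative.
CLASSES = {
    "n": "DE",       # negatively charged
    "p": "KR",       # positively charged
    "a": "FWY",      # aromatic
    "h": "AVLIMP",   # aliphatic / hydrophobic
    "o": "STNQHCG",  # polar or other
}

# Invert the table into a residue -> class symbol lookup.
RESIDUE_TO_CLASS = {aa: symbol for symbol, aas in CLASSES.items() for aa in aas}


def to_physicochemical(sequence: str) -> str:
    """Replace each residue with its class symbol; unknown residues become 'x'."""
    return "".join(RESIDUE_TO_CLASS.get(aa, "x") for aa in sequence.upper())
```

With such a mapping, two CDRs that differ only by a conservative substitution (for example E versus D) collapse to the same string before clustering.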

Clustering Type (cluster_type) type: string: Cluster type to apply to sequencing dataset

Default: AbScan

Choices: AbScan, Unique Only, Levenshtein Distance, Hamming Distance
Early Round Absence Penalty (corr_factor_1) type: integer: The divisor applied when a sequence does not appear in an early-round population: min(round 2) / (correction factor 1). A greater value gives greater weight to enriched populations that do not appear in earlier rounds.

Default: 2
Late Round Absence Penalty (corr_factor_2) type: integer: The divisor applied when a sequence does not appear in a late-round population: min(round 3) / (correction factor 2). A greater value gives greater weight to enriched populations that do not appear in earlier rounds.

Default: 10
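
One possible reading of the absence penalty is sketched below: when a sequence is missing from a round, its frequency in that round is imputed as the smallest observed frequency divided by the correction factor. The function names, the use of frequencies rather than raw counts, and the fold-enrichment formula (later frequency divided by earlier frequency) are assumptions for illustration, not the cube's confirmed implementation.

```python
def imputed_frequency(round_freqs: dict, corr_factor: int) -> float:
    """Assumed stand-in value for an absent sequence: min(round) / corr_factor."""
    return min(round_freqs.values()) / corr_factor


def fold_enrichment(seq: str, earlier: dict, later: dict, corr_factor: int = 2) -> float:
    """Later-round frequency over earlier-round frequency (assumed definition).

    If the sequence is absent from the earlier round, the imputed value is used,
    so a larger corr_factor yields a larger fold enrichment.
    """
    earlier_freq = earlier.get(seq)
    if earlier_freq is None:
        earlier_freq = imputed_frequency(earlier, corr_factor)
    return later[seq] / earlier_freq


# Toy frequencies, invented for the example.
earlier_round = {"CARDY": 0.04, "CARGY": 0.02}
later_round = {"CARDY": 0.08, "CARWY": 0.05}
```

Under these assumptions, raising the factor from 2 to 10 raises the fold enrichment of the absent clone `CARWY` fivefold, which matches the statement that a greater value adds weight to populations missing from earlier rounds.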
CPUs (cpu_count) type: integer: The number of CPUs to run this cube with

Default: 1 , Min: 1, Max: 128
Cube Metrics (cube_metrics) type: string: Set of metrics to be collected

Choices: cpu, disk, memory, network
Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

Default: 5120.0 , Min: 128.0, Max: 8589934592
Edit Distance Method For NGS Overlay (edit_distance_method_overlap) type: string: Indicates the edit distance method to apply for the overlap with the complete population. NOTE: Only in effect if the edit distance criterion for ‘Overlay of NGS Barcode Groups’ = 0.

Default: Levenshtein Distance

Choices: Hamming Distance, Levenshtein Distance
Exclude Values That Did Not Match In-Line Barcode (exclude_unknown) type: boolean: If True, will exclude unknown values that did not have a barcode match, unless there is only one barcode for the entire NGS population.

Default: True
Keep Only Functional Sequences (filter_functional) type: boolean: Eliminates non-functional sequences (truncations, stop codons, frameshifts).

Default: True
GPUs (gpu_count) type: integer: The number of GPUs to run this cube with

Default: 0 , Max: 16
Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)

Default: “”
Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
liability database file (liabilities_db) type: file_in: Provide liabilities file (xls, csv, tsv) with 2 columns (regex pattern, name of liability)
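
A minimal sketch of how a two-column liabilities file could be applied to a sequence is shown below. It assumes a headerless CSV with the regex pattern in the first column and the liability name in the second; the sample rows reuse patterns that appear in the liability choices on this page, and the function names are hypothetical.

```python
import csv
import io
import re

# Example file contents: regex pattern, name of liability (no header assumed).
LIABILITIES_CSV = """N[GSTN],Deamidation
D[GSD],Isomerization
DP,Hydrolysis
"""


def load_liabilities(handle):
    """Read (compiled pattern, name) pairs from a two-column CSV handle."""
    return [(re.compile(row[0]), row[1]) for row in csv.reader(handle) if row]


def count_liabilities(sequence, liabilities):
    """Count non-overlapping matches of each liability pattern in a sequence."""
    return {name: len(pattern.findall(sequence)) for pattern, name in liabilities}


liabilities = load_liabilities(io.StringIO(LIABILITIES_CSV))
counts = count_liabilities("ARDGNSDPY", liabilities)
```

Note that `findall` counts non-overlapping matches only; whether the cube counts overlapping motifs is not stated in this document.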
Biophysical Liabilities (liability_choices_charge) type: string: Net charge or hydropathy liabilities to quantify

Default: [‘Charge (>1)’]

Choices: Charge (>-1), Charge (>0), Charge (>1), Charge (>2), Charge (>3), Charge (>4), Parker Hydropathy (<0.0), Parker Hydropathy (<-0.1), Parker Hydropathy (<-0.2), Parker Hydropathy (<-0.3), Parker Hydropathy (<-0.4), Parker Hydropathy (<-0.5), Parker Hydropathy (<-0.6), Parker Hydropathy (<-0.7), Parker Hydropathy (<-0.8), Parker Hydropathy (<-0.9), Parker Hydropathy (<-1.0), Parker Hydropathy (<-2.0), Parker Hydropathy (<-3.0), Parker Hydropathy (<-4.0), Parker Hydropathy (<-5.0)
Cysteine Liabilities (liability_choices_cysteine) type: string: cysteine-based liabilities to quantify

Default: [‘Unpaired Cysteine’]

Choices: Unpaired Cysteine, Any Cysteine
Deamidation Liabilities (liability_choices_deam) type: string: deamidation liabilities to quantify

Default: [‘NG - Deamidation’, ‘NS - Deamidation’, ‘NT - Deamidation’, ‘NN - Deamidation’, ‘GNF - Deamidation’, ‘GNY - Deamidation’, ‘GNT - Deamidation’, ‘GNG - Deamidation’, ‘QG - Glutamine Deamidation’]

Choices: N[GSTN] - Deamidation, NG - Deamidation, NS - Deamidation, NT - Deamidation, NN - Deamidation, GN[FYTG] - Deamidation, GNF - Deamidation, GNY - Deamidation, GNT - Deamidation, GNG - Deamidation, QG - Glutamine Deamidation
Glycosylation Liabilities (liability_choices_glyc) type: string: glycosylation liabilities to quantify

Default: [‘NXT/S - Glycosylation’]

Choices: NXT/S - Glycosylation, NXT - Glycosylation, NXS - Glycosylation
Hydrolysis Liabilities (liability_choices_hydrolysis) type: string: hydrolysis liabilities to quantify

Default: [‘DP - Hydrolysis’]

Choices: DP - Hydrolysis
Isomerization Liabilities (liability_choices_iso) type: string: isomerization liabilities to quantify

Default: [‘DG - Isomerization’, ‘DS - Isomerization’, ‘DD - Isomerization’]

Choices: D[GSD] - Isomerization, DG - Isomerization, DS - Isomerization, DD - Isomerization
Polyspecificity Liabilities (liability_choices_poly) type: string: polyspecificity liabilities to quantify

Default: [‘Three Consecutive Aromatics - Polyspecificity’, ‘RR - Polyspecificity’, ‘VG - Polyspecificity’, ‘VV - Polyspecificity’, ‘WW - Polyspecificity’, ‘GGG - Polyspecificity’, ‘WXW - Polyspecificity’, ‘YY - Polyspecificity’]

Choices: Three Consecutive Aromatics - Polyspecificity, RR - Polyspecificity, VG - Polyspecificity, VV - Polyspecificity, YY - Polyspecificity, WW - Polyspecificity, GGG - Polyspecificity, WXW - Polyspecificity
Max Distance for Levenshtein or Hamming, If Selected (max_dist_ld_hm) type: integer: Select the maximum edit distance for two sequences to belong to the same cluster group.

Default: 2 , Max: 50
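
The two distance choices differ in what they count: Hamming distance counts substitutions between equal-length sequences, while Levenshtein distance also allows insertions and deletions. The sketch below implements both in plain Python and applies a maximum-distance test; the helper `within_max_distance` and its handling of unequal lengths under Hamming are assumptions for this example.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions (row-wise DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def within_max_distance(a: str, b: str, method: str = "levenshtein", max_dist: int = 2) -> bool:
    """Assumed grouping test: distance <= max_dist; unequal lengths never match under Hamming."""
    if method == "hamming":
        return len(a) == len(b) and hamming(a, b) <= max_dist
    return levenshtein(a, b) <= max_dist
```

For CDR3 sequences of varying length, Levenshtein distance can still group a clone with its one-residue insertion variant, whereas Hamming distance cannot compare them at all.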
Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiBs (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

Default: 1800 , Min: 256.0, Max: 8589934592
Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds

Default: 60

Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
Minimum Count for the Full-Length Sequence by Sample (min_count_sample) type: integer: Minimum count for the full-length sequence by the sample_name.

Default: 1 , Min: 1, Max: 10000000000
Minimum Number of Points to Consider a Cluster (min_pts) type: integer: This is the minimum number of points that will be considered a cluster.

Default: 2
Indicate whether the ABSCAN should utilize OPTICS (preferred) or DBSCAN (optics_or_dbscan) type: string: Default base algorithm to identify clusters in an unsupervised manner. Both methods use an automated application of the Elbow Estimation method, but OPTICS uses this as a maximum rather than a preset value, which makes it better suited to automation.

Default: OPTICS

Choices: OPTICS, DBSCAN

Edit Distance for Overlay of NGS Barcode Groups (overlap_edit_distance_overlap) type: integer: If there are multiple downstream groups, these will be compared to one another.

Default: 0 , Max: 100
Region of Interest For Enrichment and Clustering (roi) type: string: Indicate the region of interest for processing; only the top representative full-length sequence will be kept.

IF INPUT IS ILLUMINA WILL ONLY USE CDR3 (CHAIN_1/UPSTREAM CHAIN) FOR ENRICHMENT, RELATIVE ABUNDANCE, AND CLUSTERING.

Default: HCDR3 and LCDR3

Choices: Merged CDRs, CDR3 Chain_1 (Upstream Chain), CDR3 Chain_2 (Downstream Chain), HCDR3 and LCDR3, Full-Length

Minimum Count for the Region of Interest (ROI) (roi_count) type: integer: Sets the minimum count for a given region of interest; anything below it is removed.

Default: 1 , Min: 1, Max: 10000000000
Minimum Percent for the Region of Interest (ROI) (roi_percent) type: decimal: Sets the minimum percent for a given region of interest; anything below it is removed.

Default: 1e-12 , Min: 1e-12, Max: 100
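
The two ROI thresholds can be pictured as a pair of filters applied to per-ROI counts. The sketch below is an assumption about how they combine (both must pass, and the percent is taken relative to the total count in the sample); the function name and the toy counts are invented for the example.

```python
def filter_rois(roi_counts: dict, roi_count: int = 1, roi_percent: float = 1e-12) -> dict:
    """Keep ROIs whose count and percent of total both meet the minimums."""
    total = sum(roi_counts.values())
    kept = {}
    for roi, count in roi_counts.items():
        percent = 100.0 * count / total
        if count >= roi_count and percent >= roi_percent:
            kept[roi] = count
    return kept


# Toy counts for three CDR3 sequences in one sample.
sample_counts = {"CARDY": 90, "CARGY": 9, "CARWY": 1}
```

With the defaults, nothing is removed; raising `roi_count` to 2 drops the singleton, and raising `roi_percent` to 10 leaves only the dominant ROI.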
Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address

Default: 64
Spot policy (spot_policy) type: string: Control cube placement on spot market instances

Default: Prohibited

Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
Tabulate Ratio to Top Most Abundant Clone in Sequence or By Selected ROI (tabulate_ratio_based_selected_roi) type: string: Select the cluster or sequence region used to compare the full-length frequency of the 2nd, 3rd, 4th, etc. clone against the frequency of the most abundant clone in that cluster or ROI.

Default: cluster

Choices: cluster, cluster_cdr3_1, cluster_cdr3_2, hcdr3+lcdr3 cluster, cdr3_aa_1, cdr3_aa_2, hcdr3+lcdr3 sequence, merged CDRs, full-length
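
The ratio described above can be sketched as each clone's count divided by the count of the most abundant clone in the same group. The grouping key (a cluster label here), the function name, and the toy data are assumptions for illustration only.

```python
from collections import defaultdict


def ratio_to_top(clones):
    """Map each clone to count / (highest count in its group).

    `clones` is an iterable of (group, clone_id, count) tuples, where the
    group stands in for the selected cluster or ROI.
    """
    top = defaultdict(int)
    for group, _, count in clones:
        top[group] = max(top[group], count)
    return {clone_id: count / top[group] for group, clone_id, count in clones}


# Toy data: two clusters with full-length clone counts.
clones = [
    ("cluster_1", "seq_a", 50),
    ("cluster_1", "seq_b", 25),
    ("cluster_1", "seq_c", 5),
    ("cluster_2", "seq_d", 8),
]
ratios = ratio_to_top(clones)
```

The top clone in every group has a ratio of 1.0, so values close to 1.0 for the 2nd or 3rd clone indicate a cluster without a single dominant sequence.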
Write the Downstream Output to CSV File (write_to_csv_file) type: boolean: Writes the output to a CSV file after AbXtract Processing, NGS Only completes, at the cost of additional time. If turned off, this can be done in a separate step, and an empty file is written instead.

Default: True
Xi Minimum Steepness of Reachability Plot (xi) type: decimal: Float value between 0 and 1. Sets the minimum steepness on the reachability plot that defines a cluster boundary. An upward point in the reachability plot is defined by the ratio from one point to its successor being at most 1-xi.

Default: 0.0 , Max: 1.0
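
The "ratio at most 1-xi" condition can be checked directly on a list of reachability values. The sketch below only illustrates that inequality; it is not the clustering algorithm itself, and the function name and sample values are invented for the example.

```python
def steep_upward_points(reachability, xi):
    """Indices i where reachability[i] <= reachability[i + 1] * (1 - xi).

    This is the 'ratio from one point to its successor is at most 1 - xi'
    condition, written without division so zero values are handled safely.
    """
    if not 0.0 <= xi <= 1.0:
        raise ValueError("xi must be between 0 and 1")
    return [
        i
        for i in range(len(reachability) - 1)
        if reachability[i] <= reachability[i + 1] * (1.0 - xi)
    ]


# Toy reachability values in cluster order.
reach = [1.0, 1.0, 2.0, 2.1, 4.0]
```

With xi = 0.0 every non-decreasing step qualifies, while a larger xi keeps only the sharp jumps that are more likely to mark a cluster boundary.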

Hardware Parameters

Machine hardware requirements

Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiBs (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

Default: 1800 , Min: 256.0, Max: 8589934592
Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address

Default: 64
Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

Default: 5120.0 , Min: 128.0, Max: 8589934592
GPUs (gpu_count) type: integer: The number of GPUs to run this cube with

Default: 0 , Max: 16
CPUs (cpu_count) type: integer: The number of CPUs to run this cube with

Default: 1 , Min: 1, Max: 128
Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
Spot policy (spot_policy) type: string: Control cube placement on spot market instances

Default: Prohibited

Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)

Default: “”

Metrics Parameters

Cube Metric Parameters

Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds

Default: 60

Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
Cube Metrics (cube_metrics) type: string: Set of metrics to be collected

Choices: cpu, disk, memory, network