Write Consolidated AbXtract Dataset

A cube that takes processed Sanger sequences and quantifies liabilities.

Main Parameters

Parameter Name
Biophysical Conversion
Clustering Type
Keep Only Functional Sequences, Sanger
Max Distance for Levenshtein or Hamming, If Selected
Minimum Number of Points to Consider a Cluster
Indicate whether the ABSCAN should utilize OPTICS (preferred) or DBSCAN
Region of Interest For Clustering Sanger Sequences (Uses Clustering Type Parameter)
Write the Quick Sanger Output to CSV File

Calculation Parameters

Biophysical Conversion (biophysical_conversion) type: boolean: Whether to convert each amino acid (AA) sequence into its physicochemical equivalent, e.g. E and D both map to "negative charge". Only applies when Clustering Type is AbScan.

Default: False
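As a rough illustration of what a physicochemical conversion can look like, the sketch below maps each residue to a class symbol. Only the E/D grouping as negatively charged comes from the description above; the other classes, the symbols, and the names `PHYSCHEM_CLASS` and `to_physchem` are assumptions made for this example and may not match what the cube does internally.

```python
# Hypothetical residue-to-class alphabet. Only the E/D -> negative grouping
# comes from the parameter description; every other class is assumed.
PHYSCHEM_CLASS = {
    "D": "-", "E": "-",                                # negatively charged (from the description)
    "K": "+", "R": "+", "H": "+",                      # positively charged (assumed)
    "F": "a", "W": "a", "Y": "a",                      # aromatic (assumed)
    "S": "p", "T": "p", "N": "p", "Q": "p", "C": "p",  # polar (assumed)
    "A": "h", "V": "h", "L": "h", "I": "h", "M": "h",  # hydrophobic (assumed)
    "G": "s", "P": "s",                                # special backbone (assumed)
}


def to_physchem(seq: str) -> str:
    """Replace each residue with its class symbol; unknown residues become 'x'."""
    return "".join(PHYSCHEM_CLASS.get(aa, "x") for aa in seq.upper())


# Two sequences that differ only by an E -> D swap collapse to the same string.
same = to_physchem("ARDEYW") == to_physchem("ARDDYW")
print(same)
```

Under a mapping like this, sequences that differ only by conservative substitutions become identical, which is presumably why the option is tied to clustering.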

Clustering Type (cluster_type) type: string: Cluster type to apply to sequencing dataset

Default: Unique Only

Choices: AbScan, Unique Only, Levenshtein Distance, Hamming Distance

CPUs (cpu_count) type: integer: The number of CPUs to run this cube with

Default: 1 , Min: 1, Max: 128

Cube Metrics (cube_metrics) type: string: Set of metrics to be collected

Choices: cpu, disk, memory, network

Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

Default: 5120.0 , Min: 128.0, Max: 8589934592

Keep Only Functional Sequences, Sanger (filter_functional) type: boolean: Removes non-functional sequences (truncations, stop codons, frame shifts).

Default: False

GPUs (gpu_count) type: integer: The number of GPUs to run this cube with

Default: 0 , Max: 16

Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)

Default: “”

Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on

liability database file (liabilities_db) type: file_in: Provide liabilities file (xls, csv, tsv) with 2 columns (regex pattern, name of liability)
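A two-column liabilities file of the kind described above could be read and applied roughly as follows. This is a minimal sketch, not the cube's implementation: the file contents, the absence of a header row, and the helper names `load_liabilities` and `count_liabilities` are all assumptions.

```python
import csv
import io
import re

# Hypothetical file contents: column 1 is a regex pattern, column 2 the liability name.
EXAMPLE_CSV = """N[GSTN],Deamidation
D[GSD],Isomerization
DP,Hydrolysis
"""


def load_liabilities(handle):
    """Read (compiled regex, name) pairs from a two-column CSV handle."""
    return [(re.compile(row[0]), row[1]) for row in csv.reader(handle) if len(row) >= 2]


def count_liabilities(seq, liabilities):
    """Count non-overlapping matches of each pattern in the sequence."""
    return {name: len(pattern.findall(seq)) for pattern, name in liabilities}


liabilities = load_liabilities(io.StringIO(EXAMPLE_CSV))
counts = count_liabilities("ARDGNSDPW", liabilities)
print(counts)
```

A pattern that itself contains a comma would need to be quoted in a CSV file; a TSV file avoids that problem.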

Biophysical Liabilities (liability_choices_charge) type: string: Net charge or hydropathy liabilities to quantify

Default: [‘Charge (>1)’]

Choices: Charge (>-1), Charge (>0), Charge (>1), Charge (>2), Charge (>3), Charge (>4), Parker Hydropathy (<0.0), Parker Hydropathy (<-0.1), Parker Hydropathy (<-0.2), Parker Hydropathy (<-0.3), Parker Hydropathy (<-0.4), Parker Hydropathy (<-0.5), Parker Hydropathy (<-0.6), Parker Hydropathy (<-0.7), Parker Hydropathy (<-0.8), Parker Hydropathy (<-0.9), Parker Hydropathy (<-1.0), Parker Hydropathy (<-2.0), Parker Hydropathy (<-3.0), Parker Hydropathy (<-4.0), Parker Hydropathy (<-5.0)

Cysteine Liabilities (liability_choices_cysteine) type: string: cysteine-based liabilities to quantify

Default: [‘Unpaired Cysteine’]

Choices: Unpaired Cysteine, Any Cysteine

Deamidation Liabilities (liability_choices_deam) type: string: deamidation liabilities to quantify

Default: [‘NG - Deamidation’, ‘NS - Deamidation’, ‘NT - Deamidation’, ‘NN - Deamidation’, ‘GNF - Deamidation’, ‘GNY - Deamidation’, ‘GNT - Deamidation’, ‘GNG - Deamidation’, ‘QG - Glutamine Deamidation’]

Choices: N[GSTN] - Deamidation, NG - Deamidation, NS - Deamidation, NT - Deamidation, NN - Deamidation, GN[FYTG] - Deamidation, GNF - Deamidation, GNY - Deamidation, GNT - Deamidation, GNG - Deamidation, QG - Glutamine Deamidation
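The grouped choice `N[GSTN]` covers the same motifs as the four individual choices NG, NS, NT and NN. The sketch below checks this on a toy sequence, assuming the leading token of each choice label can be used directly as a regular expression. It finds overlapping hits with a lookahead; whether the cube counts overlapping motifs is not stated on this page.

```python
import re

GROUPED = re.compile(r"N[GSTN]")
INDIVIDUAL = {motif: re.compile(motif) for motif in ("NG", "NS", "NT", "NN")}


def motif_positions(seq, pattern):
    """Return 0-based start positions of all matches, including overlapping ones."""
    lookahead = re.compile(f"(?={pattern.pattern})")
    return [m.start() for m in lookahead.finditer(seq)]


seq = "QNGSNNTA"
grouped_hits = motif_positions(seq, GROUPED)
individual_hits = sorted(
    pos for pat in INDIVIDUAL.values() for pos in motif_positions(seq, pat)
)
print(grouped_hits, individual_hits)
```

The same relationship holds for `GN[FYTG]` versus GNF, GNY, GNT and GNG.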

Glycosylation Liabilities (liability_choices_glyc) type: string: glycosylation liabilities to quantify

Default: [‘NXT/S - Glycosylation’]

Choices: NXT/S - Glycosylation, NXT - Glycosylation, NXS - Glycosylation
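In the NXT/S notation, X stands for any residue between the asparagine and the serine or threonine. A minimal sketch of locating such sites is shown below. The `exclude_proline` switch reflects the common convention that proline at X does not form an N-linked glycosylation sequon; whether the cube applies that exclusion is not stated here, so it defaults to off, and the function name is hypothetical.

```python
import re


def glyc_sites(seq: str, exclude_proline: bool = False):
    """Return 0-based start positions of N-X-S/T motifs, overlaps included."""
    x = "[^P]" if exclude_proline else "."
    pattern = re.compile(f"(?=N{x}[ST])")
    return [m.start() for m in pattern.finditer(seq.upper())]


print(glyc_sites("ANPSGNGTW"))
print(glyc_sites("ANPSGNGTW", exclude_proline=True))
```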

Hydrolysis Liabilities (liability_choices_hydrolysis) type: string: hydrolysis liabilities to quantify

Default: [‘DP - Hydrolysis’]

Choices: DP - Hydrolysis

Isomerization Liabilities (liability_choices_iso) type: string: isomerization liabilities to quantify

Default: [‘DG - Isomerization’, ‘DS - Isomerization’, ‘DD - Isomerization’]

Choices: D[GSD] - Isomerization, DG - Isomerization, DS - Isomerization, DD - Isomerization

Polyspecificity Liabilities (liability_choices_poly) type: string: polyspecificity liabilities to quantify

Default: [‘Three Consecutive Aromatics - Polyspecificity’, ‘RR - Polyspecificity’, ‘VG - Polyspecificity’, ‘VV - Polyspecificity’, ‘WW - Polyspecificity’, ‘GGG - Polyspecificity’, ‘WXW - Polyspecificity’, ‘YY - Polyspecificity’]

Choices: Three Consecutive Aromatics - Polyspecificity, RR - Polyspecificity, VG - Polyspecificity, VV - Polyspecificity, YY - Polyspecificity, WW - Polyspecificity, GGG - Polyspecificity, WXW - Polyspecificity

Max Distance for Levenshtein or Hamming, If Selected (max_dist_ld_hm) type: integer: Maximum edit distance for two sequences to belong to the same cluster (must be >= 1 to take effect). Applies only when Levenshtein Distance or Hamming Distance is selected for Clustering Type. For AbScan, see Hidden Parameters (AbScan is not recommended for N <= 200).

Default: 0 , Max: 50
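To make the two distance options concrete, the sketch below implements both metrics and a simple greedy grouping in which a sequence joins the first cluster whose representative is within the maximum distance. The greedy scheme and the function names are assumptions for illustration; the page does not describe how the cube forms clusters from the distances.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def greedy_cluster(seqs, max_dist, dist=levenshtein):
    """Assign each sequence to the first cluster whose representative
    (its first member) lies within max_dist; otherwise start a new cluster."""
    clusters = []
    for s in seqs:
        for cluster in clusters:
            if dist(cluster[0], s) <= max_dist:
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters


clusters = greedy_cluster(["ARDYW", "ARDFW", "GGGSS"], max_dist=1)
print(clusters)
```

Hamming distance only compares sequences of equal length, so Levenshtein distance is the natural choice when the region of interest varies in length.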

Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiBs (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

Default: 1800 , Min: 256.0, Max: 8589934592

Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds

Default: 60

Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300

Minimum Number of Points to Consider a Cluster (min_pts) type: integer: This is the minimum number of points that will be considered a cluster.

Default: 2

Indicate whether the ABSCAN should utilize OPTICS (preferred) or DBSCAN (optics_or_dbscan) type: string: Base algorithm AbScan uses to identify clusters in an unsupervised manner. Both methods apply an automated Elbow Estimation, but OPTICS treats the estimate as a maximum rather than a preset value, which makes it better suited to automation.

Default: OPTICS

Choices: OPTICS, DBSCAN

Region of Interest For Clustering Sanger Sequences (Uses Clustering Type Parameter) (roi) type: string: Region of interest for processing; only the top representative full-length sequence is kept. If the input is Illumina, only CDR3 (Chain_1/Upstream Chain) clustering is used.

Default: CDR3 Chain_2 (Downstream Chain)

Choices: Merged CDRs, CDR3 Chain_1 (Upstream Chain), CDR3 Chain_2 (Downstream Chain), HCDR3 and LCDR3, Full-Length

Spot policy (spot_policy) type: string: Control cube placement on spot market instances

Default: Prohibited

Choices: Allowed, Preferred, NotPreferred, Prohibited, Required

Write the Quick Sanger Output to CSV File (write_to_csv_file) type: boolean: Writes the output to a CSV file after the AbXtract Processing (NGS Only file), at the cost of additional time. If turned off, the CSV can be written in a separate step and an empty file is written instead.

Default: True
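For scripting, the main parameters and defaults listed above can be collected in a plain dictionary and checked before a run. This is only a sketch: the keys, defaults, choices and the maximum of 50 come from this page, while `check_params` and the idea of validating locally are assumptions, and how the values are passed to the cube is not covered here.

```python
DEFAULTS = {
    "biophysical_conversion": False,
    "cluster_type": "Unique Only",
    "filter_functional": False,
    "max_dist_ld_hm": 0,
    "min_pts": 2,
    "optics_or_dbscan": "OPTICS",
    "roi": "CDR3 Chain_2 (Downstream Chain)",
    "write_to_csv_file": True,
}

CHOICES = {
    "cluster_type": {"AbScan", "Unique Only", "Levenshtein Distance", "Hamming Distance"},
    "optics_or_dbscan": {"OPTICS", "DBSCAN"},
    "roi": {
        "Merged CDRs",
        "CDR3 Chain_1 (Upstream Chain)",
        "CDR3 Chain_2 (Downstream Chain)",
        "HCDR3 and LCDR3",
        "Full-Length",
    },
}


def check_params(overrides):
    """Merge overrides into the defaults and return (params, list of problems)."""
    problems = [f"unknown parameter: {key}" for key in overrides if key not in DEFAULTS]
    params = {**DEFAULTS, **{k: v for k, v in overrides.items() if k in DEFAULTS}}
    for key, allowed in CHOICES.items():
        if params[key] not in allowed:
            problems.append(f"{key}: {params[key]!r} is not a listed choice")
    if params["max_dist_ld_hm"] > 50:
        problems.append("max_dist_ld_hm: maximum is 50")
    distance_types = {"Levenshtein Distance", "Hamming Distance"}
    if params["cluster_type"] in distance_types and params["max_dist_ld_hm"] < 1:
        problems.append("max_dist_ld_hm: must be >= 1 to take effect")
    return params, problems


params, problems = check_params({"cluster_type": "Levenshtein Distance"})
print(problems)
```

The example call reports a problem because the default maximum distance of 0 has no effect with a distance-based clustering type.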

Hardware Parameters

Machine hardware requirements

Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiBs (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

Default: 1800 , Min: 256.0, Max: 8589934592

Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

Default: 5120.0 , Min: 128.0, Max: 8589934592

GPUs (gpu_count) type: integer: The number of GPUs to run this cube with

Default: 0 , Max: 16

CPUs (cpu_count) type: integer: The number of CPUs to run this cube with

Default: 1 , Min: 1, Max: 128

Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on

Spot policy (spot_policy) type: string: Control cube placement on spot market instances

Default: Prohibited

Choices: Allowed, Preferred, NotPreferred, Prohibited, Required

Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)

Default: “”

Metrics Parameters

Cube Metric Parameters

Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds

Default: 60

Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300

Cube Metrics (cube_metrics) type: string: Set of metrics to be collected

Choices: cpu, disk, memory, network