Quick Sanger from DNA or Amino Acid Sequence Files - AbXtract

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Solution-based/Biologics/Antibody Design

  • Role-based/Bioinformatician

  • Role-based/Biologist

  • Product-based/AbXtract

Description

This floe processes a single file in FASTA, FASTQ, TSV, CSV, or Excel format. Each sequence ID typically represents a unique well ID. If you use a TSV, CSV, or Excel file as input, format each row as follows:

  1. column A = id

  2. column B = sequence (DNA or amino acid); the file must NOT include a header row.
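
As an illustration of the two-column layout above, the following minimal sketch writes a headerless CSV using Python's standard csv module. The well IDs, the sequences, and the helper name write_headerless_csv are hypothetical and are not part of the floe.

```python
import csv
import io

# Hypothetical well IDs and DNA sequences, for illustration only.
records = [
    ("A01", "CAGGTGCAGCTGGTGCAG"),
    ("A02", "GAGGTGCAGCTGGTGGAG"),
]


def write_headerless_csv(rows, handle):
    """Write (id, sequence) pairs as columns A and B with no header row."""
    writer = csv.writer(handle, lineterminator="\n")
    for seq_id, sequence in rows:
        writer.writerow([seq_id, sequence])


# An in-memory buffer stands in for a real file opened with open(path, "w", newline="").
buffer = io.StringIO()
write_headerless_csv(records, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```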

The floe condenses amino acid (AA) sequences based on a user-defined option (default = Full-Length, framework included). Redundant sequences are condensed, and the ‘id’ field then contains a ‘:’-separated list of IDs (typically well IDs). The floe also calculates liabilities and biophysical properties by CDR (length, net charge, Parker hydropathy).
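
The condensing step can be pictured with a small sketch. It assumes that "redundant" means an exact match of the amino acid string; the function name condense and the example wells are hypothetical, and the floe's actual implementation may differ.

```python
def condense(records):
    """Collapse identical sequences and join their IDs with ':'."""
    grouped = {}
    for seq_id, sequence in records:
        # dict keeps insertion order, so the first-seen sequence stays first.
        grouped.setdefault(sequence, []).append(seq_id)
    return [(":".join(ids), sequence) for sequence, ids in grouped.items()]


# Hypothetical wells: A01 and B07 carry the same sequence.
wells = [("A01", "QVQLVQ"), ("A02", "EVQLVE"), ("B07", "QVQLVQ")]
condensed = condense(wells)
print(condensed)
```

Here wells A01 and B07 end up in a single record whose id is "A01:B07".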

Promoted Parameters

Title in user interface (promoted name)

Optional DATASET Inputs from Sanger

Optional Sanger Dataset for Input (typically upstream processed datasets) (optional_input): The optional dataset(s) to read records from. To consolidate further on a region of interest, run the CONDENSE floe afterward.

  • Type: data_source

Regions of Interest (ROI) for consolidation and overlap

Region of Interest (ROI) For Condensing Sequences (roi): Condenses the Sanger sequences on the selected ROI, rank-ordered by abundance. IMPORTANT: for each ROI, only the full-length sequence with the highest count is kept; the other full-length sequences are removed. If two sequences have the same full-length count, only one of them is kept.

  • Required

  • Type: string

  • Default: Full-Length

  • Choices: [‘Merged CDRs’, ‘CDR3 Chain_1 (Upstream Chain)’, ‘CDR3 Chain_2 (Downstream Chain)’, ‘HCDR3 and LCDR3’, ‘Full-Length’]

Shared Region of Interest (ROI) Sequences (shared_roi): This will provide an overlap_roi output that shows all the individual wells that share the same id.

  • Required

  • Type: string

  • Default: CDR3 Chain_2 (Downstream Chain)

  • Choices: [‘Merged CDRs’, ‘CDR3 Chain_1 (Upstream Chain)’, ‘CDR3 Chain_2 (Downstream Chain)’, ‘HCDR3 and LCDR3’, ‘Full-Length’]

Key Liability Parameters

Polyspecificity Liabilities (liability_choices_poly): polyspecificity liabilities to quantify

  • Type: string

  • Default: [‘Three Consecutive Aromatics - Polyspecificity’, ‘RR - Polyspecificity’, ‘VG - Polyspecificity’, ‘VV - Polyspecificity’, ‘WW - Polyspecificity’, ‘GGG - Polyspecificity’, ‘WXW - Polyspecificity’, ‘YY - Polyspecificity’]

  • Choices: [‘Three Consecutive Aromatics - Polyspecificity’, ‘RR - Polyspecificity’, ‘VG - Polyspecificity’, ‘VV - Polyspecificity’, ‘YY - Polyspecificity’, ‘WW - Polyspecificity’, ‘GGG - Polyspecificity’, ‘WXW - Polyspecificity’]

Deamidation Liabilities (liability_choices_deam): deamidation liabilities to quantify

  • Type: string

  • Default: [‘NG - Deamidation’, ‘NS - Deamidation’, ‘NT - Deamidation’, ‘NN - Deamidation’, ‘GNF - Deamidation’, ‘GNY - Deamidation’, ‘GNT - Deamidation’, ‘GNG - Deamidation’, ‘QG - Glutamine Deamidation’]

  • Choices: [‘N[GSTN] - Deamidation’, ‘NG - Deamidation’, ‘NS - Deamidation’, ‘NT - Deamidation’, ‘NN - Deamidation’, ‘GN[FYTG] - Deamidation’, ‘GNF - Deamidation’, ‘GNY - Deamidation’, ‘GNT - Deamidation’, ‘GNG - Deamidation’, ‘QG - Glutamine Deamidation’]
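
Several of the choice labels above, such as N[GSTN] and GN[FYTG], read like regular-expression character classes. The sketch below takes that reading as an assumption: it treats the text before " - " as a regular expression and counts overlapping hits separately. How the floe actually matches and counts motifs is not stated here, and the function name count_motif and the example sequence are hypothetical.

```python
import re


def count_motif(label, sequence):
    """Count occurrences of the motif named in a choice label."""
    motif = label.split(" - ")[0]  # 'N[GSTN] - Deamidation' -> 'N[GSTN]'
    # A zero-width lookahead lets overlapping hits (e.g. 'NN' in 'NNN') each count.
    return len(re.findall(f"(?={motif})", sequence))


cdr = "ARNGNNSGNY"  # hypothetical CDR sequence

for label in ("N[GSTN] - Deamidation", "NG - Deamidation", "GN[FYTG] - Deamidation"):
    print(label, count_motif(label, cdr))
```

The grouped choice N[GSTN] covers NG, NS, NT, and NN in one pattern, which is presumably why it is offered alongside the four individual motifs.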

Glycosylation Liabilities (liability_choices_glyc): glycosylation liabilities to quantify

  • Type: string

  • Default: [‘NXT/S - Glycosylation’]

  • Choices: [‘NXT/S - Glycosylation’, ‘NXT - Glycosylation’, ‘NXS - Glycosylation’]

Hydrolysis Liabilities (liability_choices_hydrolysis): hydrolysis liabilities to quantify

  • Type: string

  • Default: [‘DP - Hydrolysis’]

  • Choices: [‘DP - Hydrolysis’]

Isomerization Liabilities (liability_choices_iso): isomerization liabilities to quantify

  • Type: string

  • Default: [‘DG - Isomerization’, ‘DS - Isomerization’, ‘DD - Isomerization’]

  • Choices: [‘D[GSD] - Isomerization’, ‘DG - Isomerization’, ‘DS - Isomerization’, ‘DD - Isomerization’]

Biophysical Liabilities (liability_choices_charge): Net charge or hydropathy liabilities to quantify

  • Type: string

  • Default: [‘Charge (>1)’]

  • Choices: [‘Charge (>-1)’, ‘Charge (>0)’, ‘Charge (>1)’, ‘Charge (>2)’, ‘Charge (>3)’, ‘Charge (>4)’, ‘Parker Hydropathy (<0.0)’, ‘Parker Hydropathy (<-0.1)’, ‘Parker Hydropathy (<-0.2)’, ‘Parker Hydropathy (<-0.3)’, ‘Parker Hydropathy (<-0.4)’, ‘Parker Hydropathy (<-0.5)’, ‘Parker Hydropathy (<-0.6)’, ‘Parker Hydropathy (<-0.7)’, ‘Parker Hydropathy (<-0.8)’, ‘Parker Hydropathy (<-0.9)’, ‘Parker Hydropathy (<-1.0)’, ‘Parker Hydropathy (<-2.0)’, ‘Parker Hydropathy (<-3.0)’, ‘Parker Hydropathy (<-4.0)’, ‘Parker Hydropathy (<-5.0)’]
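
To show how a threshold label such as Charge (>1) could be applied, here is a minimal sketch. It assumes a simple net charge of (K + R) minus (D + E), ignoring histidine and the termini; the floe's actual charge model is not described here, and the function names are hypothetical.

```python
import re

POSITIVE = set("KR")  # assumed positively charged residues
NEGATIVE = set("DE")  # assumed negatively charged residues


def net_charge(sequence):
    """Simplified net charge: count of K/R minus count of D/E."""
    plus = sum(aa in POSITIVE for aa in sequence)
    minus = sum(aa in NEGATIVE for aa in sequence)
    return plus - minus


def exceeds_charge_label(label, sequence):
    """Parse the threshold from a label like 'Charge (>1)' and test the sequence."""
    match = re.fullmatch(r"Charge \(>(-?\d+)\)", label)
    if match is None:
        raise ValueError(f"not a charge label: {label!r}")
    return net_charge(sequence) > int(match.group(1))


cdr = "ARRKDY"  # hypothetical CDR: three positive residues, one negative
print(net_charge(cdr), exceeds_charge_label("Charge (>1)", cdr))
```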

Cysteine Liabilities (liability_choices_cysteine): cysteine-based liabilities to quantify

  • Type: string

  • Default: [‘Unpaired Cysteine’]

  • Choices: [‘Unpaired Cysteine’, ‘Any Cysteine’]

Process by Population Parameters (OPTIONAL)

Process Populations Separately (write_group): Set this ON to prevent consolidation of shared sequences between different populations when analyzing.

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Clone name delimiter (delimiter): Use this delimiter to identify population from clone name

  • Type: string

  • Default: _

First part of clone name defining population (population_start): Use a 1-indexed integer to indicate start of population name after splitting on delimiter

  • Type: integer

  • Default: 1

Last part of clone name defining population (population_end): Use a 1-indexed integer to indicate end of population name after splitting on delimiter

  • Type: integer

  • Default: -1
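
The three parameters above can be read as "split the clone name on the delimiter, then keep parts population_start through population_end". The sketch below follows that reading with 1-indexed, inclusive positions. How the floe interprets the default of -1 is not stated here, so the sketch requires an explicit positive end; the function name and the clone names are hypothetical.

```python
def population_from_clone(name, delimiter="_", start=1, end=1):
    """Return the population label taken from parts start..end (1-indexed, inclusive)."""
    parts = name.split(delimiter)
    if start < 1 or end < start or end > len(parts):
        raise ValueError(f"invalid range {start}..{end} for {len(parts)} parts")
    return delimiter.join(parts[start - 1:end])


# Hypothetical clone names of the form <population>_<round>_<well>.
print(population_from_clone("PopA_R3_A01", start=1, end=1))
print(population_from_clone("PopA_R3_A01", start=1, end=2))
```

With start=1 and end=2, the population for "PopA_R3_A01" is "PopA_R3", so clones from different rounds would be treated as separate populations.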

Key Clustering Parameters

Clustering Type (cluster_type): Cluster type to apply to sequencing dataset

  • Required

  • Type: string

  • Default: Unique Only

  • Choices: [‘AbScan’, ‘Unique Only’, ‘Levenshtein Distance’, ‘Hamming Distance’]

Max Distance for Levenshtein or Hamming, If Selected (max_dist_ld_hm): Select the maximum edit distance for two sequences to belong to the same cluster group (must be >= 1 to take effect). Applies only if Levenshtein Distance or Hamming Distance is selected for Clustering Type. For AbScan, see Hidden Parameters (AbScan is not recommended for N<=200).

  • Required

  • Type: integer

  • Default: 0
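
To make the role of the maximum distance concrete, the sketch below implements Hamming and Levenshtein distances and a simple greedy grouping in which a sequence joins the first cluster whose representative is within max_dist. The greedy strategy is an assumption chosen for brevity; the floe's actual clustering algorithm is not described here, and all names are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions; None when the lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def greedy_cluster(sequences, max_dist, distance=levenshtein):
    """Assign each sequence to the first cluster whose first member is within max_dist."""
    clusters = []
    for seq in sequences:
        for members in clusters:
            d = distance(seq, members[0])
            if d is not None and d <= max_dist:
                members.append(seq)
                break
        else:
            clusters.append([seq])  # no cluster close enough: start a new one
    return clusters


cdr3s = ["ARDYW", "ARDFW", "ARDYYW", "GGTSW"]  # hypothetical CDR3 sequences
print(greedy_cluster(cdr3s, max_dist=1))
print(greedy_cluster(cdr3s, max_dist=1, distance=hamming))
```

In this sketch Hamming distance never groups sequences of different lengths, whereas Levenshtein distance also tolerates insertions and deletions, so "ARDYYW" joins the first cluster only in the Levenshtein case.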

Region of Interest For Clustering Sanger Sequences (Uses Clustering Type Parameter) (roi_cluster): Indicate the region of interest for clustering; only the top representative full-length sequence will be kept. IF THE INPUT IS ILLUMINA, ONLY CDR3 (CHAIN_1/UPSTREAM CHAIN) CLUSTERING WILL BE USED.

  • Required

  • Type: string

  • Default: CDR3 Chain_2 (Downstream Chain)

  • Choices: [‘Merged CDRs’, ‘CDR3 Chain_1 (Upstream Chain)’, ‘CDR3 Chain_2 (Downstream Chain)’, ‘HCDR3 and LCDR3’, ‘Full-Length’]