NGS Pipeline - AbXtract

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Solution-based/Biologics/Antibody Design

  • Role-based/Bioinformatician

  • Role-based/Biologist

  • Product-based/AbXtract

Description

Process NGS data, supplied as FASTQ files or datasets, for annotation, demultiplexing, relative abundance, enrichment, clustering, and overlap assessment.

Promoted Parameters

Title in user interface (promoted name)

Long-Read (PacBio) FILE Inputs

NGS Input FASTQ (Long-Read/PacBio) (pacbio_input_file): Input FASTQ File

  • Type: file_in

Barcode Table (barcode_table_ngs): XLS/CSV/TSV file containing barcodes in the format Name,5’barcode,3’barcode,barcode_round (e.g., early/late),barcode_group. Do not include a header. If you have only a 5’ barcode, write: name,5’barcode,,, If you have only a 3’ barcode, write: name,,3’barcode,,,

  • Type: file_in
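As an illustration of the headerless layout described above, the following minimal sketch builds a barcode table as CSV text. The sample names, barcode sequences, and the helper `barcode_table_csv` are hypothetical, and the sketch assumes one row per sample with exactly five fields, matching the five listed columns.

```python
import csv
import io

# Hypothetical samples: (name, 5' barcode, 3' barcode, round, group).
# Empty strings mark fields that are not used for a sample.
ROWS = [
    ("sample_A", "ACGTACGT", "TTGCAAGC", "early", "group1"),
    ("sample_B", "GGATCCAA", "", "", ""),  # 5' barcode only
]


def barcode_table_csv(rows):
    """Return headerless CSV text with five fields per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        if len(row) != 5:
            raise ValueError(f"expected 5 fields, got {len(row)}: {row!r}")
        writer.writerow(row)
    return buffer.getvalue()


TABLE_TEXT = barcode_table_csv(ROWS)
```

Saving `TABLE_TEXT` to a file with a .csv extension would give a table that could then be uploaded as the Barcode Table input.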

Species Database to Select From (species_ngs): Species reference database used to generate the database for IgMatcher

  • Required

  • Type: string

  • Default: [‘Human’]

  • Choices: [‘Alpaca’, ‘Human’, ‘Mouse’, ‘Rabbit’]

Provide a Customized DNA Annotation File with Alignment Scheme of Interest (for NGS). (custom_annotation_dna_igmatcher): ONLY REQUIRED for custom scaffolds, such as the Specifica Gen3 Library, or codon-optimized sequences; OPTIONAL for natural antibodies. If provided, this file overrides the annotation and species/database selection settings.

  • Type: file_in

Short-Read (Illumina) FILE Inputs

NGS Input FASTQ Forward (Short-Read/Illumina) (illumina_input_file1): Input FASTQ File

  • Type: file_in

NGS Input FASTQ Reverse (Short-Read/Illumina) (illumina_input_file2): Input FASTQ File

  • Type: file_in

Barcode Table (barcode_table_ngs_ill): XLS/CSV/TSV file containing barcodes in the format Name,5’barcode,3’barcode,barcode_round (e.g., early/late),barcode_group. Do not include a header. If you have only a 5’ barcode, write: name,5’barcode,,, If you have only a 3’ barcode, write: name,,3’barcode,,,

  • Type: file_in

Species Database to Select From (species_ill): Species reference database used to generate the database for IgMatcher

  • Required

  • Type: string

  • Default: [‘Human’]

  • Choices: [‘Alpaca’, ‘Human’, ‘Mouse’, ‘Rabbit’]

Provide a Customized DNA Annotation File with Alignment Scheme of Interest (for NGS). (custom_annotation_dna_igmatcher_ill): ONLY REQUIRED for custom scaffolds, such as the Specifica Gen3 Library, or codon-optimized sequences; OPTIONAL for natural antibodies. If provided, this file overrides the annotation and species/database selection settings.

  • Type: file_in

Optional DATASET Inputs from NGS

Optional NGS Dataset for Input (typically upstream processed datasets) (optional_input): The optional dataset(s) to read records from

  • Type: data_source

Key Downstream Parameters

Region of Interest For Enrichment and Clustering (roi_cluster): Indicate the region of interest for processing; only the top representative full-length sequence will be kept. IF THE INPUT IS ILLUMINA, ONLY CDR3 (CHAIN_1/UPSTREAM CHAIN) WILL BE USED FOR ENRICHMENT, RELATIVE ABUNDANCE, AND CLUSTERING.

  • Required

  • Type: string

  • Default: HCDR3 and LCDR3

  • Choices: [‘Merged CDRs’, ‘CDR3 Chain_1 (Upstream Chain)’, ‘CDR3 Chain_2 (Downstream Chain)’, ‘HCDR3 and LCDR3’, ‘Full-Length’]
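To make the terms relative abundance and enrichment concrete, here is a minimal sketch that computes both over a region of interest, such as a CDR3 string. The floe's actual formulas are not stated in this page, so the definitions below (frequency within a round, and the late/early frequency ratio) and the function names are assumptions for illustration only.

```python
from collections import Counter


def relative_abundance(sequences):
    """Fraction of reads carrying each region-of-interest sequence."""
    counts = Counter(sequences)
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}


def fold_enrichment(early, late):
    """Late/early frequency ratio for sequences seen in both rounds (assumed definition)."""
    early_freq = relative_abundance(early)
    late_freq = relative_abundance(late)
    return {
        seq: late_freq[seq] / early_freq[seq]
        for seq in late_freq
        if seq in early_freq
    }


# Hypothetical CDR3 sequences observed in an early and a late selection round.
EARLY = ["CARDYW", "CARDYW", "CAKGFW", "CAKGFW"]
LATE = ["CARDYW", "CARDYW", "CARDYW", "CAKGFW"]
ENRICHMENT = fold_enrichment(EARLY, LATE)
```

Under these assumed definitions, a ratio above 1 means the sequence became more frequent in the later round.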

Clustering Type (cluster_type_ngs): Cluster type to apply to sequencing dataset

  • Required

  • Type: string

  • Default: AbScan

  • Choices: [‘AbScan’, ‘Unique Only’, ‘Levenshtein Distance’, ‘Hamming Distance’]
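The two distance-based choices differ in what they count. The sketch below shows the standard definitions of Hamming distance (substitutions only, equal-length sequences) and Levenshtein distance (substitutions, insertions, and deletions). It does not reproduce how the floe clusters sequences or which threshold it applies; the example sequences are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions; defined only for equal lengths."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x from a
                current[j - 1] + 1,          # insert y into a
                previous[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]
```

For example, `hamming("CARDYW", "CARDFW")` counts one substitution, while `levenshtein("CARDYW", "CARDW")` counts one deletion, a comparison that Hamming distance cannot make because the lengths differ.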

Keep Only Functional Sequences (filter_functional): Eliminates non-functional sequences, such as those with truncations, stop codons, or frameshifts

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]
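As a rough illustration of what a functionality filter checks, the sketch below rejects a coding sequence whose length is not a multiple of three or that contains an in-frame stop codon. This is a simplified assumption, not the criteria the floe applies; the function name `looks_functional` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def looks_functional(dna: str) -> bool:
    """Crude check: complete codons only and no in-frame stop codon."""
    dna = dna.upper()
    if not dna or len(dna) % 3 != 0:
        # An incomplete final codon hints at a truncation or frameshift.
        return False
    codons = (dna[i:i + 3] for i in range(0, len(dna), 3))
    return not any(codon in STOP_CODONS for codon in codons)
```

A sequence such as `"GCTAGAGAT"` passes this check, because its codons are GCT, AGA, and GAT and the TAG it contains is out of frame.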

Exclude Values That Did Not Match In-Line Barcode (exclude_unknown): If True, excludes unknown values that did not have a barcode match, unless there is only one barcode for the entire NGS population.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]
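The effect of this setting can be sketched with a toy demultiplexer. It uses exact prefix and suffix matching, which is an assumption; the floe's matching rules are not described here. The sketch also assumes that, with a single barcode, unmatched reads are kept. The function names are hypothetical.

```python
def assign_sample(read, barcodes):
    """Return the first sample whose barcodes flank the read, else 'unknown'."""
    for name, five_prime, three_prime in barcodes:
        if five_prime and not read.startswith(five_prime):
            continue
        if three_prime and not read.endswith(three_prime):
            continue
        return name
    return "unknown"


def demultiplex(reads, barcodes, exclude_unknown=True):
    """Pair each read with a sample name, optionally dropping unmatched reads."""
    assigned = [(assign_sample(read, barcodes), read) for read in reads]
    # Assumption: unmatched reads are dropped only when several barcodes exist.
    if exclude_unknown and len(barcodes) > 1:
        assigned = [(name, read) for name, read in assigned if name != "unknown"]
    return assigned


# Hypothetical barcodes: (sample name, 5' barcode, 3' barcode).
BARCODES = [("s1", "AAAA", "CCCC"), ("s2", "GGGG", "")]
READS = ["AAAATTTTCCCC", "GGGGTTTT", "TTTTTTTT"]
```

With these inputs, `demultiplex(READS, BARCODES)` keeps the two matched reads and drops the third, while passing `exclude_unknown=False` keeps it under the name "unknown".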

Key Liability Parameters

Polyspecificity Liabilities (liability_choices_poly): Polyspecificity liabilities to quantify

  • Type: string

  • Default: [‘Three Consecutive Aromatics - Polyspecificity’, ‘RR - Polyspecificity’, ‘VG - Polyspecificity’, ‘VV - Polyspecificity’, ‘WW - Polyspecificity’, ‘GGG - Polyspecificity’, ‘WXW - Polyspecificity’, ‘YY - Polyspecificity’]

  • Choices: [‘Three Consecutive Aromatics - Polyspecificity’, ‘RR - Polyspecificity’, ‘VG - Polyspecificity’, ‘VV - Polyspecificity’, ‘YY - Polyspecificity’, ‘WW - Polyspecificity’, ‘GGG - Polyspecificity’, ‘WXW - Polyspecificity’]

Deamidation Liabilities (liability_choices_deam): Deamidation liabilities to quantify

  • Type: string

  • Default: [‘NG - Deamidation’, ‘NS - Deamidation’, ‘NT - Deamidation’, ‘NN - Deamidation’, ‘GNF - Deamidation’, ‘GNY - Deamidation’, ‘GNT - Deamidation’, ‘GNG - Deamidation’, ‘QG - Glutamine Deamidation’]

  • Choices: [‘N[GSTN] - Deamidation’, ‘NG - Deamidation’, ‘NS - Deamidation’, ‘NT - Deamidation’, ‘NN - Deamidation’, ‘GN[FYTG] - Deamidation’, ‘GNF - Deamidation’, ‘GNY - Deamidation’, ‘GNT - Deamidation’, ‘GNG - Deamidation’, ‘QG - Glutamine Deamidation’]

Glycosylation Liabilities (liability_choices_glyc): Glycosylation liabilities to quantify

  • Type: string

  • Default: [‘NXT/S - Glycosylation’]

  • Choices: [‘NXT/S - Glycosylation’, ‘NXT - Glycosylation’, ‘NXS - Glycosylation’]

Hydrolysis Liabilities (liability_choices_hydrolysis): Hydrolysis liabilities to quantify

  • Type: string

  • Default: [‘DP - Hydrolysis’]

  • Choices: [‘DP - Hydrolysis’]

Isomerization Liabilities (liability_choices_iso): Isomerization liabilities to quantify

  • Type: string

  • Default: [‘DG - Isomerization’, ‘DS - Isomerization’, ‘DD - Isomerization’]

  • Choices: [‘D[GSD] - Isomerization’, ‘DG - Isomerization’, ‘DS - Isomerization’, ‘DD - Isomerization’]

Biophysical Liabilities (liability_choices_charge): Net charge or hydropathy liabilities to quantify

  • Type: string

  • Default: [‘Charge (>1)’]

  • Choices: [‘Charge (>-1)’, ‘Charge (>0)’, ‘Charge (>1)’, ‘Charge (>2)’, ‘Charge (>3)’, ‘Charge (>4)’, ‘Parker Hydropathy (<0.0)’, ‘Parker Hydropathy (<-0.1)’, ‘Parker Hydropathy (<-0.2)’, ‘Parker Hydropathy (<-0.3)’, ‘Parker Hydropathy (<-0.4)’, ‘Parker Hydropathy (<-0.5)’, ‘Parker Hydropathy (<-0.6)’, ‘Parker Hydropathy (<-0.7)’, ‘Parker Hydropathy (<-0.8)’, ‘Parker Hydropathy (<-0.9)’, ‘Parker Hydropathy (<-1.0)’, ‘Parker Hydropathy (<-2.0)’, ‘Parker Hydropathy (<-3.0)’, ‘Parker Hydropathy (<-4.0)’, ‘Parker Hydropathy (<-5.0)’]
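A threshold such as "Charge (>1)" can be read as flagging sequences whose net charge exceeds 1. The sketch below uses a simple residue count (K and R as +1, D and E as -1, histidine ignored). This charge model is an assumption for illustration, not the floe's documented calculation; Parker hydropathy is not covered.

```python
POSITIVE = set("KR")
NEGATIVE = set("DE")


def net_charge(sequence: str) -> int:
    """Count of K/R residues minus count of D/E residues (assumed model)."""
    return sum((aa in POSITIVE) - (aa in NEGATIVE) for aa in sequence)


def exceeds_charge(sequence: str, threshold: int = 1) -> bool:
    """True when the net charge is strictly above the threshold."""
    return net_charge(sequence) > threshold
```

For the hypothetical sequence `"ARKRDW"`, three basic residues and one acidic residue give a net charge of 2, so it would be flagged at the default threshold.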

Cysteine Liabilities (liability_choices_cysteine): Cysteine-based liabilities to quantify

  • Type: string

  • Default: [‘Unpaired Cysteine’]

  • Choices: [‘Unpaired Cysteine’, ‘Any Cysteine’]
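Several of the sequence-motif choices above read naturally as patterns over amino acid sequences. The following sketch maps a few labels to regular expressions and reports where each occurs. The mapping is an interpretation, not the floe's implementation: X is treated as any residue, and aromatic residues are taken to be F, W, and Y.

```python
import re

# Hypothetical translation of selected choice labels into regular expressions.
LIABILITY_PATTERNS = {
    "N[GSTN] - Deamidation": r"N[GSTN]",
    "GN[FYTG] - Deamidation": r"GN[FYTG]",
    "NXT/S - Glycosylation": r"N.[ST]",
    "DP - Hydrolysis": r"DP",
    "D[GSD] - Isomerization": r"D[GSD]",
    "WXW - Polyspecificity": r"W.W",
    "Three Consecutive Aromatics - Polyspecificity": r"[FWY]{3}",
}


def find_liabilities(sequence: str) -> dict:
    """Map each matching label to the 0-based start positions of its motif."""
    hits = {}
    for label, pattern in LIABILITY_PATTERNS.items():
        # A lookahead reports overlapping motifs, e.g. both NN and NG in "NNG".
        positions = [m.start() for m in re.finditer(f"(?=({pattern}))", sequence)]
        if positions:
            hits[label] = positions
    return hits
```

Calling `find_liabilities` on a CDR sequence returns only the labels that match, so the length of each position list gives a per-liability count.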

Upstream Long-Read (PacBio) OR Short-Read (Illumina)

Output Basename of the Long-Read/PacBio Upstream Datasets (consolidate_out): This dataset contains all NGS files processed immediately after IgMatcher and before downstream processing (if barcodes are used, the sample name is used)

  • Required

  • Type: dataset_out

  • Default: up.long_read

Output Basename of the Short-Read/Illumina Upstream Datasets (consolidate_out_ill): This dataset contains all NGS files processed immediately after IgMatcher and before downstream processing (if barcodes are used, the sample name is used)

  • Required

  • Type: dataset_out

  • Default: up.short_read

Downstream Output Names

Output Name of the Consolidated Dataset (output_out): A consolidated dataset in which all sample names and barcode groups are combined into the same dataset

  • Required

  • Type: dataset_out

  • Default: down.consolidated

Output Basename of the Downstream Long-Read/PacBio Datasets (process_out): All records are written to downstream datasets according to group name, with this base name appended to output

  • Required

  • Type: dataset_out

  • Default: down.long_read

Output Basename of the Downstream Short-Read/Illumina Datasets (process_out_ill): All records are written to downstream datasets according to group name, with this base name appended to output

  • Required

  • Type: dataset_out

  • Default: down.short_read

Output CSV Filename (file_out_csv): All records are written to this downstream CSV file; the filename must end with the .csv extension

  • Required

  • Type: file_out

  • Default: down.consolidated.csv

Failure Output

Failed Dataset Output Name (fout): Contains failed records from both upstream and downstream processes

  • Required

  • Type: dataset_out

  • Default: problematic.ngs_abxtract_process