NGS Pipeline - AbXtract
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Solution-based/Biologics/Antibody Design
Role-based/Bioinformatician
Role-based/Biologist
Product-based/AbXtract
Description
Process NGS data (FASTQ files or datasets) for annotation, demultiplexing, relative abundance, enrichment, clustering, and overlap assessment.
Promoted Parameters
Title in user interface (promoted name)
Long-Read (PacBio) FILE Inputs
NGS Input FASTQ (Long-Read/PacBio) (pacbio_input_file): Input FASTQ File
Type: file_in
Barcode Table (barcode_table_ngs): XLS/CSV/TSV file containing barcodes in the format Name,5’barcode,3’barcode,barcode_round (e.g., early/late),barcode_group. Do not include a header. If you have only a 5’ barcode, write: name,5’barcode,,, If you have only a 3’ barcode, write: name,,3’barcode,,,
Type: file_in
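The barcode table layout described above can be produced with a short script. The following is a minimal sketch, assuming a comma-separated file; the sample names, barcode sequences, and the helper name write_barcode_table are hypothetical and not part of the floe.

```python
import csv
import io

# Hypothetical sample names and barcode sequences, for illustration only.
# Field order: name, 5' barcode, 3' barcode, barcode_round, barcode_group
rows = [
    ("sample_A", "ACGTACGT", "TTGCAAGC", "early", "group1"),
    ("sample_B", "GATCGATC", "", "", ""),  # 5' barcode only -> name,5'barcode,,,
]


def write_barcode_table(rows, handle):
    """Write barcode rows as comma-separated text with no header line."""
    writer = csv.writer(handle, lineterminator="\n")
    for row in rows:
        if len(row) != 5:
            raise ValueError(f"expected 5 fields, got {len(row)}: {row!r}")
        writer.writerow(row)


# Write to an in-memory buffer; pass an open file handle to write to disk.
buffer = io.StringIO()
write_barcode_table(rows, buffer)
barcode_csv = buffer.getvalue()
print(barcode_csv)
```

Empty fields are written as empty strings so that every row keeps the same number of columns, and no header row is emitted.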
Species Database to Select From (species_ngs): Species reference database used to generate the IgMatcher database
Required
Type: string
Default: [‘Human’]
Choices: [‘Alpaca’, ‘Human’, ‘Mouse’, ‘Rabbit’]
Provide a Customized DNA Annotation File with Alignment Scheme of Interest (for NGS). (custom_annotation_dna_igmatcher): Required only for custom scaffolds, such as the Specifica Gen3 Library, or codon-optimized sequences; optional for natural antibodies. If provided, it overrides the annotation and species/database selection settings.
Type: file_in
Short-Read (Illumina) FILE Inputs
NGS Input FASTQ Forward (Short-Read/Illumina) (illumina_input_file1): Input FASTQ File
Type: file_in
NGS Input FASTQ Reverse (Short-Read/Illumina) (illumina_input_file2): Input FASTQ File
Type: file_in
Barcode Table (barcode_table_ngs_ill): XLS/CSV/TSV file containing barcodes in the format Name,5’barcode,3’barcode,barcode_round (e.g., early/late),barcode_group. Do not include a header. If you have only a 5’ barcode, write: name,5’barcode,,, If you have only a 3’ barcode, write: name,,3’barcode,,,
Type: file_in
Species Database to Select From (species_ill): Species reference database used to generate the IgMatcher database
Required
Type: string
Default: [‘Human’]
Choices: [‘Alpaca’, ‘Human’, ‘Mouse’, ‘Rabbit’]
Provide a Customized DNA Annotation File with Alignment Scheme of Interest (for NGS). (custom_annotation_dna_igmatcher_ill): Required only for custom scaffolds, such as the Specifica Gen3 Library, or codon-optimized sequences; optional for natural antibodies. If provided, it overrides the annotation and species/database selection settings.
Type: file_in
Optional DATASET Inputs from NGS
Optional NGS Dataset for Input (typically upstream processed datasets) (optional_input): The optional dataset(s) to read records from
Type: data_source
Key Downstream Parameters
Region of Interest For Enrichment and Clustering (roi_cluster): Region of interest for processing; only the top representative full-length sequence is kept. If the input is Illumina, only CDR3 (Chain_1/upstream chain) is used for enrichment, relative abundance, and clustering.
Required
Type: string
Default: HCDR3 and LCDR3
Choices: [‘Merged CDRs’, ‘CDR3 Chain_1 (Upstream Chain)’, ‘CDR3 Chain_2 (Downstream Chain)’, ‘HCDR3 and LCDR3’, ‘Full-Length’]
Clustering Type (cluster_type_ngs): Cluster type to apply to the sequencing dataset
Required
Type: string
Default: AbScan
Choices: [‘AbScan’, ‘Unique Only’, ‘Levenshtein Distance’, ‘Hamming Distance’]
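Two of the clustering choices above name standard string distances. The sketch below shows the textbook definitions of Hamming and Levenshtein distance for reference; it is not the floe's implementation, and the function names and example sequences are hypothetical.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count mismatched positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x from a
                current[j - 1] + 1,          # insert y into a
                previous[j - 1] + (x != y),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


# Hypothetical CDR3 sequences, for illustration only.
print(hamming_distance("ARDYYGSSYFDY", "ARDYYGSSWFDY"))     # 1 (one substitution)
print(levenshtein_distance("ARDYGSSYFDY", "ARDYYGSSYFDY"))  # 1 (one insertion)
```

Hamming distance compares only sequences of equal length, whereas Levenshtein distance also accounts for insertions and deletions, so it can compare regions of different lengths.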
Keep Only Functional Sequences (filter_functional): Eliminates non-functional sequences, including truncations, stop codons, and frameshifts
Required
Type: boolean
Default: True
Choices: [True, False]
Exclude Values That Did Not Match In-Line Barcode (exclude_unknown): If True, excludes unknown values (those without a barcode match), unless there is only one barcode for the entire NGS population.
Required
Type: boolean
Default: True
Choices: [True, False]
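The behavior described for exclude_unknown can be read as the following rule. This is a sketch of one interpretation of the description, not the floe's code; the record layout (read ID paired with a matched barcode name or None) and the function name are assumptions made for illustration.

```python
def filter_unmatched(records, barcode_names, exclude_unknown=True):
    """Drop records without a barcode match, following the described rule.

    records: list of (read_id, matched_barcode_name_or_None) tuples (assumed layout).
    barcode_names: names of all barcodes supplied for the NGS population.
    """
    # Assumption: with a single barcode (or none), unmatched records are kept,
    # because the whole population is treated as one sample.
    if not exclude_unknown or len(barcode_names) <= 1:
        return list(records)
    return [record for record in records if record[1] is not None]


# Hypothetical reads: r2 did not match any barcode.
reads = [("r1", "bc1"), ("r2", None), ("r3", "bc2")]
print(filter_unmatched(reads, ["bc1", "bc2"]))  # r2 is excluded
print(filter_unmatched(reads, ["bc1"]))         # single barcode: all reads kept
```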
Key Liability Parameters
Polyspecificity Liabilities (liability_choices_poly): Polyspecificity liabilities to quantify
Type: string
Default: [‘Three Consecutive Aromatics - Polyspecificity’, ‘RR - Polyspecificity’, ‘VG - Polyspecificity’, ‘VV - Polyspecificity’, ‘WW - Polyspecificity’, ‘GGG - Polyspecificity’, ‘WXW - Polyspecificity’, ‘YY - Polyspecificity’]
Choices: [‘Three Consecutive Aromatics - Polyspecificity’, ‘RR - Polyspecificity’, ‘VG - Polyspecificity’, ‘VV - Polyspecificity’, ‘YY - Polyspecificity’, ‘WW - Polyspecificity’, ‘GGG - Polyspecificity’, ‘WXW - Polyspecificity’]
Deamidation Liabilities (liability_choices_deam): Deamidation liabilities to quantify
Type: string
Default: [‘NG - Deamidation’, ‘NS - Deamidation’, ‘NT - Deamidation’, ‘NN - Deamidation’, ‘GNF - Deamidation’, ‘GNY - Deamidation’, ‘GNT - Deamidation’, ‘GNG - Deamidation’, ‘QG - Glutamine Deamidation’]
Choices: [‘N[GSTN] - Deamidation’, ‘NG - Deamidation’, ‘NS - Deamidation’, ‘NT - Deamidation’, ‘NN - Deamidation’, ‘GN[FYTG] - Deamidation’, ‘GNF - Deamidation’, ‘GNY - Deamidation’, ‘GNT - Deamidation’, ‘GNG - Deamidation’, ‘QG - Glutamine Deamidation’]
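The bracketed choices above, such as N[GSTN] and GN[FYTG], read like regular-expression character classes that group the individual motifs. The sketch below assumes that reading and checks that a grouped pattern finds the same positions as the individual motifs combined; the helper name and the example sequence are hypothetical, and the floe's own matching logic is not documented here.

```python
import re


def find_motif(pattern: str, sequence: str) -> list[int]:
    """Return 0-based start positions of every match, including overlapping ones."""
    # A zero-width lookahead lets overlapping occurrences each be reported.
    return [match.start() for match in re.finditer(f"(?={pattern})", sequence)]


# Hypothetical amino acid sequence, for illustration only.
cdr = "ARNGNSYNTW"

grouped = find_motif("N[GSTN]", cdr)
individual = sorted(
    {pos for motif in ("NG", "NS", "NT", "NN") for pos in find_motif(motif, cdr)}
)

print(grouped)     # [2, 4, 7]
print(individual)  # [2, 4, 7]
```

Under this reading, selecting a grouped choice together with its individual motifs would count the same sites more than once, which may be why the defaults list only the individual motifs.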
Glycosylation Liabilities (liability_choices_glyc): Glycosylation liabilities to quantify
Type: string
Default: [‘NXT/S - Glycosylation’]
Choices: [‘NXT/S - Glycosylation’, ‘NXT - Glycosylation’, ‘NXS - Glycosylation’]
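As an illustration of how the three glycosylation choices relate, the sketch below maps each label to a regular expression and counts sites. It assumes X stands for any residue; whether the floe excludes particular residues at that position (for example proline) is not stated in this page. The mapping, function name, and example sequence are hypothetical.

```python
import re

# Assumed translation of the choice labels into regular expressions,
# with X treated as "any single residue".
GLYCOSYLATION_PATTERNS = {
    "NXT/S - Glycosylation": "N.[TS]",
    "NXT - Glycosylation": "N.T",
    "NXS - Glycosylation": "N.S",
}


def count_sites(choice: str, sequence: str) -> int:
    """Count motif occurrences, including overlapping ones."""
    pattern = GLYCOSYLATION_PATTERNS[choice]
    return len(re.findall(f"(?={pattern})", sequence))


# Hypothetical sequence, for illustration only.
seq = "QNNTSAINGS"
for choice in GLYCOSYLATION_PATTERNS:
    print(choice, count_sites(choice, seq))
```

For this sequence the combined NXT/S count equals the sum of the NXT and NXS counts, since each site ends in either T or S.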
Hydrolysis Liabilities (liability_choices_hydrolysis): Hydrolysis liabilities to quantify
Type: string
Default: [‘DP - Hydrolysis’]
Choices: [‘DP - Hydrolysis’]
Isomerization Liabilities (liability_choices_iso): Isomerization liabilities to quantify
Type: string
Default: [‘DG - Isomerization’, ‘DS - Isomerization’, ‘DD - Isomerization’]
Choices: [‘D[GSD] - Isomerization’, ‘DG - Isomerization’, ‘DS - Isomerization’, ‘DD - Isomerization’]
Biophysical Liabilities (liability_choices_charge): Net charge or hydropathy liabilities to quantify
Type: string
Default: [‘Charge (>1)’]
Choices: [‘Charge (>-1)’, ‘Charge (>0)’, ‘Charge (>1)’, ‘Charge (>2)’, ‘Charge (>3)’, ‘Charge (>4)’, ‘Parker Hydropathy (<0.0)’, ‘Parker Hydropathy (<-0.1)’, ‘Parker Hydropathy (<-0.2)’, ‘Parker Hydropathy (<-0.3)’, ‘Parker Hydropathy (<-0.4)’, ‘Parker Hydropathy (<-0.5)’, ‘Parker Hydropathy (<-0.6)’, ‘Parker Hydropathy (<-0.7)’, ‘Parker Hydropathy (<-0.8)’, ‘Parker Hydropathy (<-0.9)’, ‘Parker Hydropathy (<-1.0)’, ‘Parker Hydropathy (<-2.0)’, ‘Parker Hydropathy (<-3.0)’, ‘Parker Hydropathy (<-4.0)’, ‘Parker Hydropathy (<-5.0)’]
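The charge choices above are thresholds on a net charge value. This page does not say how the floe computes that value, so the sketch below uses a deliberately simple assumption: lysine and arginine count as +1, aspartate and glutamate as -1, and all other residues (including histidine) as 0. The function names and example sequences are hypothetical.

```python
POSITIVE = set("KR")  # assumed +1 residues
NEGATIVE = set("DE")  # assumed -1 residues


def net_charge(sequence: str) -> int:
    """Simple residue-count net charge (an assumption, not the floe's formula)."""
    positives = sum(residue in POSITIVE for residue in sequence)
    negatives = sum(residue in NEGATIVE for residue in sequence)
    return positives - negatives


def exceeds_charge(sequence: str, threshold: int = 1) -> bool:
    """True when the net charge is strictly greater than the threshold."""
    return net_charge(sequence) > threshold


# Hypothetical sequences, for illustration only.
print(net_charge("ARRKDYW"), exceeds_charge("ARRKDYW"))  # 2 True
print(net_charge("ARDYW"), exceeds_charge("ARDYW"))      # 0 False
```

With the default choice, Charge (>1), a sequence is flagged only when its net charge is strictly above 1 under whatever charge definition the floe applies.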
Cysteine Liabilities (liability_choices_cysteine): Cysteine-based liabilities to quantify
Type: string
Default: [‘Unpaired Cysteine’]
Choices: [‘Unpaired Cysteine’, ‘Any Cysteine’]
Upstream Long-Read (PacBio) OR Short-Read (Illumina)
Output Basename of the Long-Read/PacBio Upstream Datasets (consolidate_out): This dataset contains all NGS files as processed immediately after IgMatcher and before downstream processing (if a barcode is used, the sample name is used)
Required
Type: dataset_out
Default: up.long_read
Output Basename of the Short-Read/Illumina Upstream Datasets (consolidate_out_ill): This dataset contains all NGS files as processed immediately after IgMatcher and before downstream processing (if a barcode is used, the sample name is used)
Required
Type: dataset_out
Default: up.short_read
Downstream Output Names
Output Name of the Consolidated Dataset (output_out): A consolidated dataset in which all sample names and barcode groups belong to the same dataset
Required
Type: dataset_out
Default: down.consolidated
Output Basename of the Downstream Long-Read/PacBio Datasets (process_out): All records are written to downstream datasets according to group name, with this base name appended to output
Required
Type: dataset_out
Default: down.long_read
Output Basename of the Downstream Short-Read/Illumina Datasets (process_out_ill): All records are written to downstream datasets according to group name, with this base name appended to output
Required
Type: dataset_out
Default: down.short_read
Output CSV Filename (file_out_csv): All records are written to a downstream CSV file; the filename must have the .csv extension
Required
Type: file_out
Default: down.consolidated.csv
Failure Output
Failed Dataset Output Name (fout): Contains failed records from both upstream and downstream processes
Required
Type: dataset_out
Default: problematic.ngs_abxtract_process