NGS UMIs Extract and Annotation - AbXtract

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Solution-based/Biologics/Antibody Design

  • Role-based/Bioinformatician

  • Role-based/Biologist

  • Product-based/AbXtract

Description

This floe performs UMI-guided consensus building and annotation of NGS reads. By default, sequences represented by at least two different UMIs are retained, and the reported count is the number of UMIs supporting each sequence. Use either an Illumina or a PacBio input, not both. If you would like to demultiplex samples using a barcode table, DO NOT mark the sample barcode in the UMI extraction pattern as part of the UMI.

Promoted Parameters

Title in user interface (promoted name)

Long-Read (PacBio) FILE Inputs

Long-Read (PacBio) NGS Input FASTQ (pacbio_input_file): Input FASTQ File

  • Type: file_in

Barcode Table (barcode_table_ngs): File without a header, formatted as Name,5’barcode,3’barcode,barcode_round,barcode_group. If you have only a 5’ barcode, write name,5’barcode,,, If you have only a 3’ barcode, write name,,3’barcode,,,

  • Type: file_in
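The barcode table layout described above can be sketched in Python. This is only an illustration, assuming every row carries exactly five comma-separated fields and that absent values are left as empty fields; the sample names, barcode sequences, and round/group values are hypothetical.

```python
import csv
import io

# Column order taken from the Barcode Table description:
# Name, 5' barcode, 3' barcode, barcode_round, barcode_group.
N_COLUMNS = 5


def barcode_row(name, barcode_5p="", barcode_3p="", barcode_round="", barcode_group=""):
    """Return one table row; values that are not supplied stay as empty fields."""
    return [name, barcode_5p, barcode_3p, barcode_round, barcode_group]


def write_barcode_table(rows, handle):
    """Write rows as comma-separated text with no header line."""
    writer = csv.writer(handle, lineterminator="\n")
    for row in rows:
        if len(row) != N_COLUMNS:
            raise ValueError(f"expected {N_COLUMNS} fields, got {len(row)}")
        writer.writerow(row)


# Hypothetical samples, for illustration only.
rows = [
    barcode_row("sample_A", "ACGTACGT", "TTGGCCAA", "1", "g1"),
    barcode_row("sample_B", barcode_5p="GGATCCAA"),  # 5' barcode only
    barcode_row("sample_C", barcode_3p="CCTTAAGG"),  # 3' barcode only
]

buffer = io.StringIO()
write_barcode_table(rows, buffer)
table_text = buffer.getvalue()
print(table_text)
```

Writing the rows through the csv module keeps the number of fields constant, so a row with a single barcode still lines up with the five-column layout.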

Species Database to Select From (species_ngs): Species reference database used to generate the database for IgMatcher

  • Required

  • Type: string

  • Default: [‘Human’]

  • Choices: [‘Alpaca’, ‘Human’, ‘Mouse’, ‘Rabbit’]

Provide a Customized DNA Annotation File with Alignment Scheme of Interest (for NGS). (custom_annotation_dna_igmatcher): ONLY REQUIRED for custom scaffolds such as the Specifica Gen3 Library or codon-optimized sequences; OPTIONAL for natural antibodies. If provided, this file overrides the annotation and species/database selection settings.

  • Type: file_in

Short-Read (Illumina) FILE Inputs

NGS Input FASTQ Forward (Short-Read/Illumina) (illumina_input_file1): Input FASTQ File

  • Type: file_in

NGS Input FASTQ Reverse (Short-Read/Illumina) (illumina_input_file2): Input FASTQ File

  • Type: file_in

Barcode Table (barcode_table_ngs_ill): File without a header, formatted as Name,5’barcode,3’barcode,barcode_round,barcode_group. If you have only a 5’ barcode, write name,5’barcode,,, If you have only a 3’ barcode, write name,,3’barcode,,,

  • Type: file_in

Species Database to Select From (species_illumina): Species reference database used to generate the database for IgMatcher

  • Required

  • Type: string

  • Default: [‘Human’]

  • Choices: [‘Alpaca’, ‘Human’, ‘Mouse’, ‘Rabbit’]

Provide a Customized DNA Annotation File with Alignment Scheme of Interest (for NGS). (custom_annotation_dna_igmatcher_illumina): ONLY REQUIRED for custom scaffolds such as the Specifica Gen3 Library or codon-optimized sequences; OPTIONAL for natural antibodies. If provided, this file overrides the annotation and species/database selection settings.

  • Type: file_in

Upstream Long-Read (PacBio) AND/OR Short-Read (Illumina)

Output Basename of the Upstream Long-Read/PacBio Datasets (consolidate_out): This dataset contains all Long-Read/PacBio files processed immediately after IgMatcher

  • Required

  • Type: dataset_out

  • Default: up.long_read

Output Basename of the Upstream Short-Read/Illumina Datasets (consolidate_out_ill): This dataset contains all Short-Read/Illumina files processed immediately after IgMatcher

  • Required

  • Type: dataset_out

  • Default: up.short_read

Failure Output

Failed Dataset Output Name (fout): Contains failed records from both upstream and downstream processes

  • Required

  • Type: dataset_out

  • Default: problematic_ngs_annotation_only

UMI Processing

Unique molecular identifier extraction pattern (umi_regex): An extraction pattern for the unique molecular identifier (UMI), which may be a regular expression or a string using {N, C, X}. Be sure to include both the 5’ and 3’ unique molecular identifiers. If you would like to demultiplex samples using a barcode table, DO NOT mark the sample barcode in the UMI extraction pattern as a region to be extracted.

  • Required

  • Type: string

  • Default:
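As a rough illustration of a regular-expression extraction pattern that covers both a 5’ and a 3’ UMI, the sketch below uses Python's re module. The read layout (an 8-base UMI at each end), the group names umi_5, insert, and umi_3, and the helper extract_umi are all assumptions made for this example; the exact syntax and group names the floe expects are not stated here.

```python
import re

# Hypothetical read layout: 8-base 5' UMI, variable-length insert, 8-base 3' UMI.
# The group names are illustrative and are not the floe's required names.
UMI_PATTERN = re.compile(
    r"^(?P<umi_5>[ACGT]{8})(?P<insert>[ACGT]+)(?P<umi_3>[ACGT]{8})$"
)


def extract_umi(read):
    """Return (combined_umi, insert), or None when the read does not match."""
    match = UMI_PATTERN.match(read)
    if match is None:
        return None
    combined = match.group("umi_5") + match.group("umi_3")
    return combined, match.group("insert")


# Toy read assembled from its three parts so the expected split is obvious.
read = "AAAACCCC" + "GATTACAGATTACA" + "GGGGTTTT"
result = extract_umi(read)
print(result)
```

Concatenating the two captured ends into one identifier is one simple way to treat a dual UMI as a single key; reads that are too short or contain other characters return None.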

Minimum number of unique UMIs per consensus sequence (min_umi_count): Sequences represented by at least this many UMIs are retained.

  • Required

  • Type: integer

  • Default: 2

Read group size threshold (min_group_size): Minimum number of sequencing reads per UMI

  • Required

  • Type: integer

  • Default: 5
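The interplay of the two thresholds above can be sketched as follows. This is a simplified model, not the floe's implementation: it assumes reads are first grouped by UMI, that groups smaller than min_group_size are discarded, and that a consensus is then called per group. The function umi_filter is hypothetical, and the consensus helper is a stand-in that simply picks the most frequent read.

```python
from collections import Counter, defaultdict


def consensus(reads):
    """Most frequent read in the group; a stand-in for a real consensus caller."""
    return Counter(reads).most_common(1)[0][0]


def umi_filter(umi_reads, min_group_size=5, min_umi_count=2):
    """Take (umi, sequence) pairs and return {consensus_sequence: umi_count}."""
    groups = defaultdict(list)
    for umi, seq in umi_reads:
        groups[umi].append(seq)

    umis_per_sequence = Counter()
    for umi, reads in groups.items():
        if len(reads) < min_group_size:
            continue  # too few reads behind this UMI
        umis_per_sequence[consensus(reads)] += 1

    # Keep only sequences supported by enough distinct UMIs.
    return {seq: n for seq, n in umis_per_sequence.items() if n >= min_umi_count}


# Toy data: U4 has too few reads, and TTTT is backed by only one UMI.
toy_reads = (
    [("U1", "ACGT")] * 5
    + [("U2", "ACGT")] * 6
    + [("U3", "TTTT")] * 5
    + [("U4", "ACGT")] * 2
)
kept = umi_filter(toy_reads)
print(kept)
```

In this model the value reported for each retained sequence is the number of distinct UMIs behind it, matching the count described for this floe.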

Directional reads (directional): If True, reads are oriented 5’ to 3’ with respect to the UMI extraction pattern. If False, they are non-directional (the UMI could be at either end).

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]

Reverse unique molecular identifier extraction pattern (umi_regex_rev): For use with non-directional reads only. Ignored if directional is set to True.

  • Type: string

  • Default:
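One way to picture how the directional setting and the reverse pattern relate is the sketch below. It assumes that directional reads are matched against the forward pattern only, while non-directional reads fall back to a reverse pattern when the forward one fails; the floe's actual matching order is not documented here. The patterns, the fixed anchor sequence TACG, and the function find_umi are hypothetical.

```python
import re

# Hypothetical patterns: a 6-base UMI next to a fixed anchor (TACG), or next to
# the anchor's reverse complement (CGTA) when the UMI sits at the 3' end.
FORWARD = re.compile(r"^(?P<umi>[ACGT]{6})TACG(?P<insert>[ACGT]+)$")
REVERSE = re.compile(r"^(?P<insert>[ACGT]+)CGTA(?P<umi>[ACGT]{6})$")


def find_umi(read, directional=False):
    """Return (umi, insert, orientation), or None if no pattern matches."""
    match = FORWARD.match(read)
    if match:
        return match.group("umi"), match.group("insert"), "forward"
    if not directional:
        # The reverse pattern is consulted only for non-directional reads.
        match = REVERSE.match(read)
        if match:
            return match.group("umi"), match.group("insert"), "reverse"
    return None


forward_read = "AACCGG" + "TACG" + "GATTACA"
reverse_read = "TGTAATC" + "CGTA" + "CCGGTT"

print(find_umi(forward_read))
print(find_umi(reverse_read))
print(find_umi(reverse_read, directional=True))
```

With directional set to True the reverse pattern is never consulted, mirroring the note that umi_regex_rev is ignored in that case.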