NGS UMIs Extract and Annotation - AbXtract

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • AbXtract


This floe performs UMI-guided consensus building and annotation of NGS data. By default, sequences represented by at least two different UMIs are retained, and the reported count is the number of UMIs. Provide either an Illumina or a PacBio input, not both. If you would like to demultiplex samples using a barcode table, DO NOT mark the sample barcode as part of the UMI in the UMI extraction pattern.

Promoted Parameters

Title in user interface (promoted name)

Long-Read (PacBio) FILE Inputs

Long-Read (PacBio) NGS Input FASTQ (pacbio_input_file): Input FASTQ File

  • Type: file_in

Barcode Table (barcode_table_ngs): XLS/CSV/TSV file containing barcodes in the format Name,5’barcode,3’barcode,barcode_round (e.g., early/late),barcode_group. Do not include a header. If you have only a 5’ barcode, write: name,5’barcode,,, If you have only a 3’ barcode, write: name,,3’barcode,,,

  • Type: file_in
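The barcode table layout described above can be sketched in Python as follows. This is only an illustration of the header-less, five-column format; the sample names and barcode sequences are hypothetical, and the helper barcode_table_csv is not part of the floe.

```python
import csv
import io

# Hypothetical sample names and barcode sequences, for illustration only.
# Column order: name, 5' barcode, 3' barcode, barcode_round, barcode_group
rows = [
    ["sample_A", "ACGTACGT", "TTGGCCAA", "early", "group1"],
    ["sample_B", "GGCATCGA", "", "", ""],  # 5' barcode only; other fields left empty
]


def barcode_table_csv(rows):
    """Serialize rows as header-less CSV text, checking for five columns per row."""
    for row in rows:
        if len(row) != 5:
            raise ValueError(f"expected 5 columns, got {len(row)}: {row}")
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


table_text = barcode_table_csv(rows)
print(table_text)
```

The second row comes out as sample_B,GGCATCGA,,, which matches the 5’-only form given in the parameter description.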

Species Database to Select From (species_ngs): Species reference database used to generate the database for IgMatcher

  • Required

  • Type: string

  • Default: [‘Human’]

  • Choices: [‘Alpaca’, ‘Human’, ‘Mouse’, ‘Rabbit’]

Provide a Customized DNA Annotation File with Alignment Scheme of Interest (for NGS). (custom_annotation_dna_igmatcher): ONLY REQUIRED for custom scaffolds, such as the Specifica Gen3 Library, or for codon-optimized sequences; OPTIONAL for natural antibodies. If provided, this file overrides the annotation and species/database selection settings.

  • Type: file_in

Short-Read (Illumina) FILE Inputs

NGS Input FASTQ Forward (Short-Read/Illumina) (illumina_input_file1): Input FASTQ File

  • Type: file_in

NGS Input FASTQ Reverse (Short-Read/Illumina) (illumina_input_file2): Input FASTQ File

  • Type: file_in

Barcode Table (barcode_table_ngs_ill): XLS/CSV/TSV file containing barcodes in the format Name,5’barcode,3’barcode,barcode_round (e.g., early/late),barcode_group. Do not include a header. If you have only a 5’ barcode, write: name,5’barcode,,, If you have only a 3’ barcode, write: name,,3’barcode,,,

  • Type: file_in

Species Database to Select From (species_illumina): Species reference database used to generate the database for IgMatcher

  • Required

  • Type: string

  • Default: [‘Human’]

  • Choices: [‘Alpaca’, ‘Human’, ‘Mouse’, ‘Rabbit’]

Provide a Customized DNA Annotation File with Alignment Scheme of Interest (for NGS). (custom_annotation_dna_igmatcher_illumina): ONLY REQUIRED for custom scaffolds, such as the Specifica Gen3 Library, or for codon-optimized sequences; OPTIONAL for natural antibodies. If provided, this file overrides the annotation and species/database selection settings.

  • Type: file_in

Upstream Long-Read (PacBio) AND/OR Short-Read (Illumina)

Output Basename of the Upstream Long-Read/PacBio Datasets (consolidate_out): This dataset contains all Long-Read/PacBio files processed immediately after IgMatcher

  • Required

  • Type: dataset_out

  • Default: up.long_read

Output Basename of the Upstream Short-Read/Illumina Datasets (consolidate_out_ill): This dataset contains all Short-Read/Illumina files processed immediately after IgMatcher

  • Required

  • Type: dataset_out

  • Default: up.short_read

Failure Output

Failed Dataset Output Name (fout): Contains failed records from both upstream and downstream processes

  • Required

  • Type: dataset_out

  • Default: problematic_ngs_annotation_only

UMI Processing

Unique molecular identifier extraction pattern (umi_regex): A regular expression pattern for extracting the unique molecular identifier (UMI). Be sure to include both the 5’ and 3’ UMIs if both exist. See the docs for more information on specifying the regex. For non-directional reads, provide the regex for one orientation here and the regex for the reverse complement in the reverse extraction pattern (umi_regex_rev) below. If you would like to demultiplex samples using a barcode table, DO NOT mark the sample barcode as part of the UMI in the UMI extraction pattern.

  • Required

  • Type: string

  • Default:
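As a rough illustration of an extraction pattern that captures both a 5’ and a 3’ UMI while leaving the sample barcode outside the UMI, here is a minimal Python sketch. It assumes Python re syntax with named groups; the group names (umi_5, umi_3, rest), the read layout, and the lengths are all hypothetical, and the syntax the floe actually expects is described in its own docs.

```python
import re

# Assumed (hypothetical) read layout: 8-nt 5' UMI, 6-nt sample barcode,
# variable-length insert, 8-nt 3' UMI.
UMI_PATTERN = re.compile(
    r"^(?P<umi_5>[ACGT]{8})"    # 5' UMI
    r"(?P<rest>[ACGT]{6}.+)"    # sample barcode + insert, NOT marked as UMI
    r"(?P<umi_3>[ACGT]{8})$"    # 3' UMI
)


def extract_umi(read):
    """Return (combined UMI, remaining sequence), or None if the pattern fails."""
    m = UMI_PATTERN.match(read)
    if m is None:
        return None
    return m.group("umi_5") + m.group("umi_3"), m.group("rest")


read = "AAAACCCC" + "GATTAC" + "TTGACCGGTA" + "GGGGTTTT"
umi, rest = extract_umi(read)
print(umi, rest)
```

Because the sample barcode stays in the non-UMI portion of the read, it remains available for demultiplexing against the barcode table.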

Minimum number of unique UMIs per consensus sequence (min_umi_count): Sequences represented by at least this many UMIs are retained.

  • Required

  • Type: integer

  • Default: 2

Read group size threshold (min_group_size): Minimum number of sequencing reads per UMI

  • Required

  • Type: integer

  • Default: 5
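To show how the two thresholds above might interact, here is a minimal Python sketch. It assumes that reads are first grouped by UMI, that groups smaller than min_group_size are discarded, and that a per-position majority vote over equal-length reads stands in for consensus building. The floe's actual consensus method is not described here, so the functions consensus and filter_by_umi are hypothetical.

```python
from collections import Counter, defaultdict


def consensus(reads):
    """Per-position majority vote; assumes equal-length reads (a simplification)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))


def filter_by_umi(umi_reads, min_group_size=5, min_umi_count=2):
    """umi_reads: iterable of (umi, read) pairs. Returns {consensus: UMI count}."""
    groups = defaultdict(list)
    for umi, read in umi_reads:
        groups[umi].append(read)

    umis_per_sequence = Counter()
    for umi, reads in groups.items():
        if len(reads) < min_group_size:  # too few reads support this UMI
            continue
        umis_per_sequence[consensus(reads)] += 1

    # Keep only sequences supported by enough distinct UMIs.
    return {seq: n for seq, n in umis_per_sequence.items() if n >= min_umi_count}


data = (
    [("U1", "ACGT")] * 5
    + [("U2", "ACGT")] * 4 + [("U2", "ACGA")]  # one read with a sequencing error
    + [("U3", "TTTT")] * 5                     # only one UMI supports TTTT
    + [("U4", "ACGT")] * 2                     # group too small, dropped
)
result = filter_by_umi(data)
print(result)
```

In this toy input, ACGT is retained with a count of 2 (UMIs U1 and U2), while TTTT is dropped because only one UMI supports it.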

Directional reads (directional): If True, reads are oriented 5’ to 3’ with respect to the UMI extraction pattern. If False, they are non-directional (UMI could be at either end).

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Reverse unique molecular identifier extraction pattern (umi_regex_rev): For use with non-directional reads only. Ignored if directional is set to True.

  • Type: string

  • Default:
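The relationship between the directional setting and the two extraction patterns can be sketched as follows. This is an assumption-laden illustration, not the floe's implementation: the group name umi, the GATTACA anchor, and the choice to report reverse-strand UMIs in forward orientation are all hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_umi(read, umi_regex, umi_regex_rev=None, directional=True):
    """Return (umi, orientation), or None if no pattern matches."""
    m = re.search(umi_regex, read)
    if m:
        return m.group("umi"), "forward"
    # The reverse pattern is consulted only for non-directional reads.
    if not directional and umi_regex_rev:
        m = re.search(umi_regex_rev, read)
        if m:
            # Assumption: report the UMI in forward orientation.
            return reverse_complement(m.group("umi")), "reverse"
    return None


# Hypothetical patterns: a 6-nt UMI followed by a fixed anchor, and the
# reverse-complement form of that same layout.
FWD = r"^(?P<umi>[ACGT]{6})GATTACA"
REV = r"TGTAATC(?P<umi>[ACGT]{6})$"

fwd_read = "AACCGG" + "GATTACA" + "TTTTCCCCAAAA"
rev_read = reverse_complement(fwd_read)

print(extract_umi(fwd_read, FWD, REV, directional=False))
print(extract_umi(rev_read, FWD, REV, directional=False))
print(extract_umi(rev_read, FWD, REV, directional=True))
```

With directional set to True, the reverse pattern is ignored, so the reverse-strand read yields no UMI in this sketch.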