NGS UMIs Extract and Annotation - AbXtract

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

Solution-based/Biologics/Antibody Design

Role-based/Bioinformatician

Role-based/Biologist

Product-based/AbXtract

Description

This floe performs UMI-guided consensus building and annotation of NGS data. By default, only sequences represented by at least two distinct UMIs are retained (see min_umi_count), and each sequence's count is the number of UMIs that support it. Provide either an Illumina or a PacBio input, not both. If you want to demultiplex samples using a barcode table, DO NOT mark the sample barcode in the UMI extraction pattern as part of the UMI.

Promoted Parameters

Title in user interface (promoted name)

Long-Read (PacBio) FILE Inputs

Long-Read (PacBio) NGS Input FASTQ (pacbio_input_file): Input FASTQ File

Type: file_in

Barcode Table (barcode_table_ngs): File without a header, formatted as Name,5’barcode,3’barcode,barcode_round,barcode_group (five comma-separated fields per row). Leave unused fields empty: if you only have a 5’ barcode, write name,5’barcode,,, and if you only have a 3’ barcode, write name,,3’barcode,,

Type: file_in
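As a minimal sketch of the barcode table layout, the Python snippet below reads a headerless table with the five columns listed above and turns empty columns into missing values. The record type, its field names, the check that at least one barcode is present, and the tolerance for extra trailing empty fields are illustrative assumptions, not the floe's own parser.

```python
import csv
import io
from typing import List, NamedTuple, Optional, TextIO


class BarcodeRow(NamedTuple):
    """One row of the barcode table (field names are illustrative only)."""
    name: str
    barcode_5p: Optional[str]
    barcode_3p: Optional[str]
    barcode_round: Optional[str]
    barcode_group: Optional[str]


def parse_barcode_table(handle: TextIO) -> List[BarcodeRow]:
    """Parse a headerless, comma-separated barcode table (sketch only)."""
    rows: List[BarcodeRow] = []
    for lineno, fields in enumerate(csv.reader(handle), start=1):
        if not fields:
            continue  # skip blank lines
        # Assumption: stray trailing commas are tolerated by dropping
        # empty fields beyond the fifth column.
        while len(fields) > 5 and fields[-1].strip() == "":
            fields.pop()
        if len(fields) != 5:
            raise ValueError(f"line {lineno}: expected 5 fields, got {len(fields)}")
        name, *rest = (f.strip() for f in fields)
        values = [v or None for v in rest]  # empty column -> None
        if values[0] is None and values[1] is None:
            raise ValueError(f"line {lineno}: need a 5' or a 3' barcode")
        rows.append(BarcodeRow(name, *values))
    return rows


# Example table: 5'-only, 3'-only, and both barcodes with round/group set
table = io.StringIO("S1,ACGTAC,,,\nS2,,TTGGCA,,\nS3,ACGTAC,TTGGCA,1,A\n")
for row in parse_barcode_table(table):
    print(row)
```

Here S1 has only a 5’ barcode, S2 has only a 3’ barcode, and S3 fills all five columns.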

Species Database to Select From (species_ngs): Species reference database used to generate the IgMatcher database. A value must be selected even if a custom annotation file is provided.

Required

Type: string

Default: [‘Human’]

Choices: [‘Alpaca’, ‘Human’, ‘Mouse’, ‘Rabbit’]

Provide a Customized DNA Annotation File with Alignment Scheme of Interest (for NGS). (custom_annotation_dna_igmatcher): ONLY REQUIRED for custom scaffolds, such as the Specifica Gen3 Library, or for codon-optimized sequences; OPTIONAL for natural antibodies. If provided, it overrides the annotation and species/database selection settings.

Type: file_in

Short-Read (Illumina) FILE Inputs

NGS Input FASTQ Forward (Short-Read/Illumina) (illumina_input_file1): Input FASTQ File

Type: file_in

NGS Input FASTQ Reverse (Short-Read/Illumina) (illumina_input_file2): Input FASTQ File

Type: file_in

Barcode Table (barcode_table_ngs_ill): File without a header, formatted as Name,5’barcode,3’barcode,barcode_round,barcode_group (five comma-separated fields per row). Leave unused fields empty: if you only have a 5’ barcode, write name,5’barcode,,, and if you only have a 3’ barcode, write name,,3’barcode,,

Type: file_in

Species Database to Select From (species_illumina): Species reference database used to generate the IgMatcher database. A value must be selected even if a custom annotation file is provided.

Required

Type: string

Default: [‘Human’]

Choices: [‘Alpaca’, ‘Human’, ‘Mouse’, ‘Rabbit’]

Provide a Customized DNA Annotation File with Alignment Scheme of Interest (for NGS). (custom_annotation_dna_igmatcher_illumina): ONLY REQUIRED for custom scaffolds, such as the Specifica Gen3 Library, or for codon-optimized sequences; OPTIONAL for natural antibodies. If provided, it overrides the annotation and species/database selection settings.

Type: file_in

Upstream Long-Read (PacBio) AND/OR Short-Read (Illumina)

Output Basename of the Upstream Long-Read/PacBio Datasets (consolidate_out): This dataset contains all Long-Read/PacBio records, captured immediately after IgMatcher processing.

Required

Type: dataset_out

Default: up.long_read

Output Basename of the Upstream Short-Read/Illumina Datasets (consolidate_out_ill): This dataset contains all Short-Read/Illumina records, captured immediately after IgMatcher processing.

Required

Type: dataset_out

Default: up.short_read

Failure Output

Failed Dataset Output Name (fout): Contains failed records from both upstream and downstream processes

Required

Type: dataset_out

Default: problematic_ngs_annotation_only

UMI Processing

Unique molecular identifier extraction pattern (umi_regex): An extraction pattern for the unique molecular identifier (UMI); this may be a regular expression or a string using {N, C, X}. Be sure to include both the 5’ and 3’ UMIs. If you want to demultiplex samples using a barcode table, DO NOT mark the sample barcode in the UMI extraction pattern as a region to be extracted.

Required

Type: string

Default:
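The sketch below illustrates the regular-expression form of umi_regex with a hypothetical pattern that captures an 8-nt UMI at each end of the read and joins the 5’ and 3’ parts into one identifier. The group names (umi5, insert, umi3), the UMI length, and the joining step are assumptions for illustration; the floe's exact regex conventions and the {N, C, X} string syntax are not shown here.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: an 8-nt UMI at the 5' end and another at the 3' end,
# with the insert in between. Group names are illustrative, not floe syntax.
UMI_PATTERN = re.compile(
    r"^(?P<umi5>[ACGTN]{8})(?P<insert>[ACGTN]+?)(?P<umi3>[ACGTN]{8})$"
)


def extract_umi(read: str, pattern: re.Pattern = UMI_PATTERN) -> Optional[Tuple[str, str]]:
    """Return (combined 5'+3' UMI, insert), or None if the pattern does not match."""
    m = pattern.match(read)
    if m is None:
        return None
    # Both ends are kept, so the UMI identifies the molecule by 5' and 3' tags
    return m.group("umi5") + m.group("umi3"), m.group("insert")


read = "AAAACCCC" + "GATTACAGATTACA" + "TTTTGGGG"
print(extract_umi(read))
```

Capturing both ends mirrors the instruction above to include both the 5’ and 3’ UMIs; a sample barcode, if present, would be matched outside the UMI groups.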

Minimum number of unique UMIs per consensus sequence (min_umi_count): Only sequences represented by at least this many distinct UMIs are retained.

Required

Type: integer

Default: 2
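A minimal sketch of the min_umi_count filter, assuming each consensus sequence has already been paired with the UMI it came from: a sequence is kept only when at least min_umi_count distinct UMIs support it, and its reported count is the number of distinct UMIs. The function name and input shape are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def count_umis_per_sequence(
    pairs: Iterable[Tuple[str, str]], min_umi_count: int = 2
) -> Dict[str, int]:
    """Map each consensus sequence to its number of distinct UMIs,
    keeping only sequences with at least min_umi_count UMIs (sketch)."""
    umis_by_seq: Dict[str, Set[str]] = defaultdict(set)
    for umi, consensus in pairs:
        umis_by_seq[consensus].add(umi)  # sets ignore repeated UMIs
    return {
        seq: len(umis)
        for seq, umis in umis_by_seq.items()
        if len(umis) >= min_umi_count
    }


pairs = [("AAA", "SEQ1"), ("CCC", "SEQ1"), ("AAA", "SEQ1"), ("GGG", "SEQ2")]
print(count_umis_per_sequence(pairs))
```

With the default of 2, SEQ1 (two distinct UMIs) is kept and SEQ2 (one UMI) is dropped.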

Read group size threshold (min_group_size): Minimum number of sequencing reads per UMI

Required

Type: integer

Default: 5
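The read group size threshold can be pictured as follows: reads are grouped by UMI, groups with fewer than min_group_size reads are discarded, and each remaining group is collapsed to a consensus. This is a simplified sketch that assumes reads in a group are already aligned to equal length and uses a per-position majority vote; the function name is hypothetical and the floe's actual consensus method is not described here.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def consensus_by_umi(
    reads: Iterable[Tuple[str, str]], min_group_size: int = 5
) -> Dict[str, str]:
    """Group (umi, sequence) reads by UMI and return a majority-vote
    consensus for every UMI with at least min_group_size reads (sketch)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    consensus: Dict[str, str] = {}
    for umi, seqs in groups.items():
        if len(seqs) < min_group_size:
            continue  # too few reads to trust this UMI group
        # Assumes equal-length, pre-aligned reads; zip stops at the shortest.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


reads = [("U1", "ACGT")] * 4 + [("U1", "ACGA"), ("U2", "TTTT"), ("U2", "TTTT")]
print(consensus_by_umi(reads))
```

Here U1 has five reads, so its single-base disagreement is outvoted, while U2 has only two reads and is discarded at the default threshold.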

Directional reads (directional): If True, reads are oriented 5’ to 3’ with respect to the UMI extraction pattern. If False, reads are non-directional (the UMI could be at either end).

Required

Type: boolean

Default: False

Choices: [True, False]

Reverse unique molecular identifier extraction pattern (umi_regex_rev): For use with non-directional reads only. Ignored if directional is set to True.

Type: string

Default:
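A sketch of how directional and umi_regex_rev might interact, under the assumption that the forward pattern is tried first and the reverse pattern is consulted only for non-directional reads. The function name, the orientation labels, and the joining of captured groups into a single UMI are hypothetical.

```python
import re
from typing import Optional, Tuple


def match_umi(
    read: str,
    umi_regex: str,
    directional: bool,
    umi_regex_rev: Optional[str] = None,
) -> Optional[Tuple[str, str]]:
    """Return (orientation, UMI) or None if no pattern matches (sketch)."""
    m = re.search(umi_regex, read)
    if m:
        return "forward", "".join(g for g in m.groups() if g)
    if directional or not umi_regex_rev:
        return None  # directional reads: the reverse pattern is ignored
    m = re.search(umi_regex_rev, read)
    if m:
        return "reverse", "".join(g for g in m.groups() if g)
    return None


# Hypothetical patterns: 4-nt UMIs at each end plus a fixed anchor sequence
FWD = r"^([ACGT]{4})GATC.*([ACGT]{4})$"
REV = r"^([ACGT]{4})CTAG.*([ACGT]{4})$"
print(match_umi("GGGGCTAGTTTTTAAAA", FWD, directional=False, umi_regex_rev=REV))
```

The same read returns None when directional is True, because the reverse pattern is then ignored.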