NGS UMIs Extract and Annotation - AbXtract
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Solution-based/Biologics/Antibody Design
Role-based/Bioinformatician
Role-based/Biologist
Product-based/AbXtract
Description
Builds UMI-guided consensus sequences from NGS reads and annotates them. Sequences represented by at least two different UMIs are retained, and the reported count is the number of UMIs. Use either an Illumina or a PacBio input, not both. If you would like to demultiplex samples using a barcode table, DO NOT mark the sample barcode in the UMI extraction pattern as part of the UMI.
Promoted Parameters
Title in user interface (promoted name)
Long-Read (PacBio) FILE Inputs
Long-Read (PacBio) NGS Input FASTQ (pacbio_input_file): Input FASTQ File
Type: file_in
Barcode Table (barcode_table_ngs): File without a header, with each row formatted as Name,5’barcode,3’barcode,barcode_round,barcode_group
If you have only a 5’ barcode, write name,5’barcode,,,
If you have only a 3’ barcode, write name,,3’barcode,,,
Type: file_in
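As an illustration of the layout described above, the following sketch writes a header-less barcode table with Python's csv module. The helper name barcode_rows_to_csv, the sample names, the barcode sequences, and the round and group values are all made up for the example. It assumes exactly five comma-separated fields per row, matching the column list, with empty fields where a value is absent.

```python
import csv
import io

def barcode_rows_to_csv(rows):
    """Serialize (name, 5' barcode, 3' barcode, round, group) tuples without a header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for name, five_prime, three_prime, barcode_round, barcode_group in rows:
        writer.writerow([name, five_prime, three_prime, barcode_round, barcode_group])
    return buffer.getvalue()

# Hypothetical samples: the sequences are placeholders, not real barcodes.
rows = [
    ("sample_both", "ACGTAC", "TTGCAA", "1", "A"),
    ("sample_5p_only", "GGATCC", "", "", ""),
    ("sample_3p_only", "", "CCTAGG", "", ""),
]
table_text = barcode_rows_to_csv(rows)
print(table_text)
```

Saving table_text to a file gives a table that could be supplied as the barcode table input, provided the floe accepts five fields per row as assumed here.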
Species Database to Select From (species_ngs): Species reference database used to generate the database for IgMatcher. A value must be selected even if a custom annotation file is provided.
Required
Type: string
Default: [‘Human’]
Choices: [‘Alpaca’, ‘Human’, ‘Mouse’, ‘Rabbit’]
Provide a Customized DNA Annotation File with Alignment Scheme of Interest (for NGS). (custom_annotation_dna_igmatcher): ONLY REQUIRED for custom scaffolds, such as the Specifica Gen3 Library, or for codon-optimized sequences; OPTIONAL for natural antibodies. If provided, it overrides the annotation and species/database selection settings.
Type: file_in
Short-Read (Illumina) FILE Inputs
NGS Input FASTQ Forward (Short-Read/Illumina) (illumina_input_file1): Input FASTQ File
Type: file_in
NGS Input FASTQ Reverse (Short-Read/Illumina) (illumina_input_file2): Input FASTQ File
Type: file_in
Barcode Table (barcode_table_ngs_ill): File without a header, with each row formatted as Name,5’barcode,3’barcode,barcode_round,barcode_group
If you have only a 5’ barcode, write name,5’barcode,,,
If you have only a 3’ barcode, write name,,3’barcode,,,
Type: file_in
Species Database to Select From (species_illumina): Species reference database used to generate the database for IgMatcher. A value must be selected even if a custom annotation file is provided.
Required
Type: string
Default: [‘Human’]
Choices: [‘Alpaca’, ‘Human’, ‘Mouse’, ‘Rabbit’]
Provide a Customized DNA Annotation File with Alignment Scheme of Interest (for NGS). (custom_annotation_dna_igmatcher_illumina): ONLY REQUIRED for custom scaffolds, such as the Specifica Gen3 Library, or for codon-optimized sequences; OPTIONAL for natural antibodies. If provided, it overrides the annotation and species/database selection settings.
Type: file_in
Upstream Long-Read (PacBio) AND/OR Short-Read (Illumina)
Output Basename of the Upstream Long-Read/PacBio Datasets (consolidate_out): This dataset contains all Long-Read/PacBio files processed immediately after IgMatcher
Required
Type: dataset_out
Default: up.long_read
Output Basename of the Upstream Short-Read/Illumina Datasets (consolidate_out_ill): This dataset contains all Short-Read/Illumina files processed immediately after IgMatcher
Required
Type: dataset_out
Default: up.short_read
Failure Output
Failed Dataset Output Name (fout): Contains failed records from both upstream and downstream processes
Required
Type: dataset_out
Default: problematic_ngs_annotation_only
UMI Processing
Unique molecular identifier extraction pattern (umi_regex): An extraction pattern for the unique molecular identifier (UMI), which may be a regular expression or a string using {N, C, X}. Be sure to include both the 5’ and 3’ UMIs. If you would like to demultiplex samples using a barcode table, DO NOT mark the sample barcode in the UMI extraction pattern as a region to be extracted.
Required
Type: string
Default:
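The exact pattern syntax the floe expects is not spelled out in this section, so the sketch below only illustrates the general idea of a regular-expression extraction pattern with a UMI at both ends of the read, using Python's re module. The group names umi_5p, insert, and umi_3p, the 8-base UMI length, and the example read are assumptions chosen for illustration, not the floe's required syntax.

```python
import re

# Hypothetical pattern: an 8-base UMI at the 5' end, an 8-base UMI at the
# 3' end, and everything in between kept as the insert.
UMI_PATTERN = re.compile(
    r"^(?P<umi_5p>[ACGT]{8})(?P<insert>[ACGT]+)(?P<umi_3p>[ACGT]{8})$"
)

def extract_umi(read):
    """Return (combined UMI, insert), or None if the read does not match."""
    match = UMI_PATTERN.match(read)
    if match is None:
        return None
    combined_umi = match.group("umi_5p") + match.group("umi_3p")
    return combined_umi, match.group("insert")

# Toy read: 5' UMI + insert + 3' UMI.
read = "AAAACCCC" + "GATTACAGATTACA" + "GGGGTTTT"
result = extract_umi(read)
print(result)  # ('AAAACCCCGGGGTTTT', 'GATTACAGATTACA')
```

In a pattern of this kind, a sample barcode intended for demultiplexing would be left outside the captured UMI groups, in line with the warning above.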
Minimum number of unique UMIs per consensus sequence (min_umi_count): Sequences represented by at least this many UMIs are retained.
Required
Type: integer
Default: 2
Read group size threshold (min_group_size): Minimum number of sequencing reads per UMI
Required
Type: integer
Default: 5
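To make the interplay of min_group_size and min_umi_count concrete, here is a minimal sketch of the two filters applied in sequence. It assumes that reads sharing a UMI are collapsed by a simple per-position majority vote over equal-length reads, which may differ from the consensus method the floe actually uses. The function names and toy reads are hypothetical.

```python
from collections import Counter, defaultdict

def majority_consensus(reads):
    """Per-position majority vote; assumes all reads have equal length."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))

def umi_filtered_counts(umi_reads, min_group_size=5, min_umi_count=2):
    """Map each retained consensus sequence to the number of UMIs supporting it."""
    groups = defaultdict(list)
    for umi, read in umi_reads:
        groups[umi].append(read)

    umis_per_sequence = Counter()
    for umi, reads in groups.items():
        if len(reads) < min_group_size:
            continue  # this UMI has too few reads to build a consensus
        umis_per_sequence[majority_consensus(reads)] += 1

    # Keep only sequences supported by enough distinct UMIs.
    return {seq: n for seq, n in umis_per_sequence.items() if n >= min_umi_count}

# Toy input as (UMI, read) pairs.
umi_reads = (
    [("AAAA", "ACGT")] * 4 + [("AAAA", "ACGA")]  # 5 reads, one with an error
    + [("CCCC", "ACGT")] * 5                     # second UMI, same sequence
    + [("GGGG", "TTTT")] * 5                     # sequence seen under one UMI only
    + [("TTTT", "ACGT")] * 2                     # group below min_group_size
)
counts = umi_filtered_counts(umi_reads)
print(counts)  # {'ACGT': 2}
```

In this toy input, the TTTT sequence is dropped because only one UMI supports it, and the two-read UMI group is dropped before consensus building, so ACGT is reported with a count of 2 UMIs.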
Directional reads (directional): If True, reads are oriented 5’ to 3’ with respect to the UMI extraction pattern. If False, they are non-directional (the UMI could be at either end).
Required
Type: boolean
Default: False
Choices: [True, False]
Reverse unique molecular identifier extraction pattern (umi_regex_rev): For use with non-directional reads only. Ignored if directional is set to True.
Type: string
Default: