Sequence Annotation with IgMatcher for PacBio
IgMatcher cube for PacBio sequencing data. The cube:
- Takes records (DNA sequence and count) from an input cube.
- Annotates each antibody chain in the reads.
- Emits the annotated chains to a consolidation cube.
Main Parameters
Parameter Name | Key
---|---
Barcode Cutoff | barcode_cutoff
Heavy Chain CDR1 Annotation Scheme | heavy_cdr1
Heavy Chain CDR2 Annotation Scheme | heavy_cdr2
Heavy Chain CDR3 Annotation Scheme | heavy_cdr3
Length of K-mers for germline identification | k
Length of query chain | len_chain_query
Light Chain CDR1 Annotation Scheme | light_cdr1
Light Chain CDR2 Annotation Scheme | light_cdr2
Light Chain CDR3 Annotation Scheme | light_cdr3
Minimum chain length | min_len_chain
Minimum votes for germline assignment | min_votes
Species Database to Select From | species
Parameter Details
Calculation Parameters
Barcode Cutoff (barcode_cutoff) type: decimal: Minimum fraction of barcode nucleotides a read must match to be assigned to a sample. Default: 0.7, Min: 0.5, Max: 1.0
Barcode Table (barcode_table) type: file_in: XLS/CSV/TSV file with one barcode per row in the format Name,5'barcode,3'barcode,barcode_round,barcode_group, where barcode_round is, for example, early or late. Do not include a header row. Leave unused columns empty: if you only have a 5' barcode, write name,5'barcode,,, and if you only have a 3' barcode, write name,,3'barcode,,
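The table format and the cutoff can be illustrated with a small standalone Python sketch. It assumes a plain comma-separated table with the five columns described above and treats barcode_cutoff as the fraction of positions at which the start of a read matches a 5' barcode (ungapped, position by position). The helper names (`parse_barcode_table`, `match_fraction`, `assign_sample`) are hypothetical, only the 5' barcode is checked, and the cube's actual matching may differ, for example by allowing indels or using the 3' barcode.

```python
import csv
import io
from typing import NamedTuple, Optional


class Barcode(NamedTuple):
    name: str
    five_prime: str      # empty string when the row has no 5' barcode
    three_prime: str     # empty string when the row has no 3' barcode
    barcode_round: str   # e.g. "early" or "late"
    barcode_group: str


def parse_barcode_table(text: str) -> list[Barcode]:
    """Parse a header-less CSV with the columns name,5'barcode,3'barcode,round,group."""
    barcodes = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        # Pad or trim to exactly five columns so extra or missing empty fields are tolerated.
        fields = (fields + [""] * 5)[:5]
        barcodes.append(Barcode(*(f.strip() for f in fields)))
    return barcodes


def match_fraction(read: str, barcode: str) -> float:
    """Fraction of barcode positions matched, position by position, at the start of the read."""
    if not barcode:
        return 0.0
    window = read[: len(barcode)].upper()
    matches = sum(1 for a, b in zip(window, barcode.upper()) if a == b)
    return matches / len(barcode)


def assign_sample(read: str, barcodes: list[Barcode], cutoff: float = 0.7) -> Optional[str]:
    """Return the best-matching sample name, or None if no 5' barcode reaches the cutoff."""
    best_name, best_score = None, -1.0
    for bc in barcodes:
        score = match_fraction(read, bc.five_prime)
        if score >= cutoff and score > best_score:
            best_name, best_score = bc.name, score
    return best_name


if __name__ == "__main__":
    table = "S1,ACGTACGT,,early,A\nS2,TTGGCCAA,,late,B\n"
    barcodes = parse_barcode_table(table)
    # "ACGTACGA..." matches 7 of 8 positions of S1's barcode (0.875 >= 0.7).
    print(assign_sample("ACGTACGAGGGG", barcodes))  # S1
```

Raising the cutoff toward 1.0 makes assignment stricter: reads with sequencing errors in the barcode are left unassigned rather than risk being attributed to the wrong sample.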
Full/Partial alignment to annotate CDRs (cdr_method) type: string: Align the query sequence to the entire germline (full) or only to partial germline regions (partial) to annotate the CDRs. Full alignment may work better for natural antibodies, and partial alignment for synthetic or degenerate antibodies. Default: partial, Choices: partial, full
Provide a Customized Amino Acid Annotation File with Alignment Scheme of Interest (for NGS) (custom_annotation_aa) type: file_in: Required only for custom scaffolds such as the Specifica Gen3 Library; optional for natural antibodies. Not typically used for NGS (DNA-based) data, but if provided, it overrides the annotation scheme and species/database selection settings.
Provide a Customized DNA Annotation File with Alignment Scheme of Interest (for NGS) (custom_annotation_dna) type: file_in: Required only for custom scaffolds such as the Specifica Gen3 Library or for codon-optimized sequences; optional for natural antibodies. If provided, it overrides the annotation scheme and species/database selection settings.
Heavy Chain CDR1 Annotation Scheme (heavy_cdr1) type: string: Annotation scheme for heavy-chain CDR1. Default: IMGT, Choices: IMGT, KABAT, CHOTHIA
Heavy Chain CDR2 Annotation Scheme (heavy_cdr2) type: string: Annotation scheme for heavy-chain CDR2. Default: IMGT, Choices: IMGT, KABAT, CHOTHIA
Heavy Chain CDR3 Annotation Scheme (heavy_cdr3) type: string: Annotation scheme for heavy-chain CDR3. Default: IMGT, Choices: IMGT, KABAT, CHOTHIA
Length of K-mers for germline identification (k) type: integer: Length of the k-mers used for germline identification. If sequences differ significantly from the germline, a smaller value (5 to 7) may help find the correct germline. Default: 9, Min: 5, Max: 12
Length of query chain (len_chain_query) type: integer: Number of nucleotides at each end of the read used to query for an antibody chain. Default: 500, Min: 400, Max: 600
Light Chain CDR1 Annotation Scheme (light_cdr1) type: string: Annotation scheme for light-chain CDR1. Default: IMGT, Choices: IMGT, KABAT, CHOTHIA
Light Chain CDR2 Annotation Scheme (light_cdr2) type: string: Annotation scheme for light-chain CDR2. Default: KABAT, Choices: IMGT, KABAT, CHOTHIA
Light Chain CDR3 Annotation Scheme (light_cdr3) type: string: Annotation scheme for light-chain CDR3. Default: IMGT, Choices: IMGT, KABAT, CHOTHIA
Minimum chain length (min_len_chain) type: integer: Minimum length a chain must have to be considered functional rather than truncated. Default: 273
Minimum votes for germline assignment (min_votes) type: integer: Minimum number of matching k-mers required for germline assignment. Higher values make the algorithm more stringent, at the cost of leaving some sequences unannotated. Default: 150
Species Database to Select From (species) type: string: Species reference database used to build the IgMatcher database. Default: ['Human'], Choices: Alpaca, Human, Mouse, Rabbit
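The interplay of k, min_votes and len_chain_query can be pictured with a simplified k-mer voting scheme. This Python sketch assumes that germline assignment works by counting query k-mers shared with each germline sequence, which is how the min_votes description reads; it is not IgMatcher's actual implementation, and the function names and toy sequences are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Iterator, Optional


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i : i + k]


def build_index(germlines: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the names of the germline sequences that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in germlines.items():
        for km in kmers(seq.upper(), k):
            index[km].add(name)
    return index


def query_ends(read: str, len_chain_query: int = 500) -> list[str]:
    """Take up to len_chain_query nucleotides from each end of the read."""
    return [read[:len_chain_query], read[-len_chain_query:]]


def assign_germline(query: str, index: dict[str, set[str]], k: int,
                    min_votes: int) -> Optional[str]:
    """Each query k-mer votes for every germline containing it; require min_votes."""
    votes: Counter = Counter()
    for km in kmers(query.upper(), k):
        for name in index.get(km, ()):
            votes[name] += 1
    if not votes:
        return None
    best, count = votes.most_common(1)[0]
    return best if count >= min_votes else None


if __name__ == "__main__":
    # Toy values only; the cube's defaults are k=9 and min_votes=150 on real reads.
    index = build_index({"V1": "ACGTACGTTTGCA", "V2": "GGGCCCAAATTT"}, k=5)
    print(assign_germline("ACGTACGTTTG", index, k=5, min_votes=5))  # V1 (7 votes)
    print(assign_germline("ACGTACGTTTG", index, k=5, min_votes=8))  # None
```

A single substituted base changes up to k overlapping k-mers, so a smaller k leaves more k-mers intact in a divergent sequence. That is consistent with the advice to try k between 5 and 7 for sequences far from the germline, while a higher min_votes trades coverage for confidence.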
Field Parameters
log_field type: Field Type: String: Message log field. Default: Log Field
read_fail_field type: Field Type: String: Failed read. Default: Read Fail
Hardware Parameters
Machine hardware requirements:
- Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory, in MiB (1 MiB = 1,048,576 B), this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800, Min: 256.0, Max: 8589934592
- Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of temporary disk space, in MiB (1 MiB = 1,048,576 B), this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0, Min: 128.0, Max: 8589934592
- GPUs (gpu_count) type: integer: The number of GPUs to run this cube with. Default: 0, Max: 16
- CPUs (cpu_count) type: integer: The number of CPUs to run this cube with. Default: 1, Min: 1, Max: 128
- Instance Type (instance_type) type: string: The type of instance this cube must run on.
- Spot policy (spot_policy) type: string: Controls cube placement on spot-market instances. Default: Prohibited, Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
- Instance Tags (instance_tags) type: string: Run only on machines with matching tags (comma-separated). Default: "" (empty)
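As a worked example of the sizing advice for memory_mb and disk_space, the Python sketch below converts a measured peak requirement in bytes to whole MiB, adds a safety margin, and clamps the result to the documented bounds. The 200 MiB margin is an assumed reading of "a couple hundred MiB", and `request_mib` is a hypothetical helper, not part of Orion.

```python
import math

MIB = 1_048_576  # bytes per MiB, as stated in the parameter descriptions


def request_mib(peak_bytes: int, margin_mib: float = 200.0,
                minimum: float = 256.0, maximum: float = 8_589_934_592) -> float:
    """Convert peak usage to whole MiB, add an overhead margin, and clamp to the allowed range."""
    needed = math.ceil(peak_bytes / MIB) + margin_mib
    return min(max(needed, minimum), maximum)


if __name__ == "__main__":
    # 1.5 GiB peak memory -> 1536 MiB + 200 MiB margin
    print(request_mib(1_610_612_736))           # 1736.0
    # With no margin, tiny usage is raised to memory_mb's minimum of 256 MiB
    print(request_mib(10, margin_mib=0))        # 256.0
```

For disk_space, pass `minimum=128.0` to match that parameter's lower bound.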
Metrics Parameters
Cube metric parameters:
- Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds. Default: 60, Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
- Cube Metrics (cube_metrics) type: string: Set of metrics to be collected. Choices: cpu, disk, memory, network
Parallel Sequence Annotation with IgMatcher for PacBio
The parallel version adds these extra parameters.
Number of messages to distribute at a time (item_count) type: integer: The maximum number of messages to bundle together for a parallel cube. Default: 1, Min: 1, Max: 65535
Maximum Failures (max_failures) type: integer: The maximum number of times to attempt processing a work item. Default: 10, Min: 1, Max: 100
Autoscale this Cube (autoscale) type: boolean: If True, let Orion manage the parallelism of this cube. Default: True
Maximum number of Cubes (max_parallel) type: integer: The maximum number of concurrently running copies of this cube. Default: 1000, Min: 1
Minimum number of Cubes (min_parallel) type: integer: The minimum number of concurrently running copies of this cube. Default: 0
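To make item_count and max_failures concrete, here is a conceptual Python sketch: messages are grouped into work items of at most item_count messages, and each work item is attempted up to max_failures times. It only illustrates the semantics described above; it does not reproduce Orion's scheduler, and `bundle`, `run_with_retries` and the `process` callback are hypothetical.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def bundle(messages: Iterable[T], item_count: int) -> Iterator[list[T]]:
    """Group messages into work items of at most item_count messages."""
    batch: list[T] = []
    for msg in messages:
        batch.append(msg)
        if len(batch) == item_count:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, work item


def run_with_retries(work_item: list[T], process: Callable[[list[T]], None],
                     max_failures: int = 10) -> bool:
    """Attempt a work item up to max_failures times; return True on the first success."""
    for _ in range(max_failures):
        try:
            process(work_item)
            return True
        except Exception:
            continue  # treat any exception as a failed attempt and retry
    return False


if __name__ == "__main__":
    print(list(bundle(range(5), item_count=2)))  # [[0, 1], [2, 3], [4]]
```

With the default item_count of 1, every message becomes its own work item, which maximizes the work available for parallel copies of the cube at the cost of more per-item overhead.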