Sequence Annotation with IgMatcher for PacBio

The IgMatcher cube takes records (DNA sequence and count) from an input cube, annotates each chain (PacBio sequencing), and emits the annotated chain to a consolidation cube.

Main Parameters

Parameter Name
Barcode Cutoff
Heavy Chain CDR1 Annotation Scheme
Heavy Chain CDR2 Annotation Scheme
Heavy Chain CDR3 Annotation Scheme
Length of K-mers for germline identification
Length of query chain
Light Chain CDR1 Annotation Scheme
Light Chain CDR2 Annotation Scheme
Light Chain CDR3 Annotation Scheme
Minimum chain length
Minimum votes for germline assignment
Species Database to Select From

Related Cubes

Parallel Sequence Annotation with IgMatcher for PacBio – parallel version of the cube

Calculation Parameters

Barcode Cutoff (barcode_cutoff) type: decimal: Minimum fraction of nucleotides that must match the barcode for a read to be assigned to a sample

Default: 0.7 , Min: 0.5, Max: 1.0
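As a rough illustration of how a cutoff like this behaves, the sketch below computes the fraction of barcode positions matched by a read segment and compares it with `barcode_cutoff`. The position-by-position comparison and the helper names `barcode_match_fraction` and `passes_cutoff` are assumptions made for illustration; the cube's actual matching procedure is not described here.

```python
def barcode_match_fraction(read_segment: str, barcode: str) -> float:
    """Fraction of barcode positions matched by the read segment (hypothetical helper)."""
    if not barcode:
        raise ValueError("barcode must not be empty")
    matches = sum(
        1 for r, b in zip(read_segment.upper(), barcode.upper()) if r == b
    )
    # Barcode positions not covered by a short read segment count as mismatches.
    return matches / len(barcode)


def passes_cutoff(read_segment: str, barcode: str, barcode_cutoff: float = 0.7) -> bool:
    """True if the match fraction reaches the cutoff (documented range 0.5 to 1.0)."""
    if not 0.5 <= barcode_cutoff <= 1.0:
        raise ValueError("barcode_cutoff must be between 0.5 and 1.0")
    return barcode_match_fraction(read_segment, barcode) >= barcode_cutoff
```

With a 10-nucleotide barcode, the default of 0.7 would tolerate up to three mismatched positions under this simplified model.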

Barcode Table (barcode_table) type: file_in: File without header, formatted as Name,5’barcode,3’barcode,barcode_round,barcode_group

If you have only a 5’ barcode, write name,5’barcode,,,

If you have only a 3’ barcode, write name,,3’barcode,,
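A barcode table in this layout can be written with Python's standard `csv` module. The sketch below assumes every row carries exactly five comma-separated fields, as the column list suggests; the sample names, barcode sequences, and the round and group values are placeholders, not values taken from a real run.

```python
import csv
import io

# Column order: Name, 5' barcode, 3' barcode, barcode_round, barcode_group.
# All values below are placeholders for illustration.
rows = [
    ("sample_A", "ACGTACGTAC", "TTGGCCAATT", "1", "g1"),  # both barcodes
    ("sample_B", "GGATCCGGAT", "", "", ""),               # 5' barcode only
    ("sample_C", "", "CCTTAAGGCC", "", ""),               # 3' barcode only
]


def write_barcode_table(rows, handle):
    """Write rows without a header line, one sample per line."""
    writer = csv.writer(handle, lineterminator="\n")
    for row in rows:
        if len(row) != 5:
            raise ValueError(f"expected 5 fields, got {len(row)}: {row!r}")
        writer.writerow(row)


buffer = io.StringIO()
write_barcode_table(rows, buffer)
table_text = buffer.getvalue()
print(table_text, end="")
```

To produce a file for the `barcode_table` parameter, pass a handle opened with `open(path, "w", newline="")` instead of the in-memory buffer.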

Full/Partial alignment to annotate CDRs (cdr_method) type: string: Align the query sequence either to the entire germline or to partial regions in order to annotate CDRs. Full alignment may work better for natural antibodies, and partial alignment for synthetic/degenerate antibodies.

Default: partial

Choices: partial, full

CPUs (cpu_count) type: integer: The number of CPUs to run this cube with

Default: 1 , Min: 1, Max: 128

Cube Metrics (cube_metrics) type: string: Set of metrics to be collected

Choices: cpu, disk, memory, network

Provide a Customized Amino Acid Annotation File with Alignment Scheme of Interest (for NGS) (custom_annotation_aa) type: file_in: ONLY REQUIRED for custom scaffolds such as the Specifica Gen3 Library; OPTIONAL for natural antibodies.

NOT typically used for NGS (DNA-based) data, but if provided, it overrides the annotation and species/database selection settings.

Provide a Customized DNA Annotation File with Alignment Scheme of Interest (for NGS) (custom_annotation_dna) type: file_in: ONLY REQUIRED for custom scaffolds such as the Specifica Gen3 Library or for codon-optimized sequences; OPTIONAL for natural antibodies.

If provided, it overrides the annotation and species/database selection settings.

Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

Default: 5120.0 , Min: 128.0, Max: 8589934592

GPUs (gpu_count) type: integer: The number of GPUs to run this cube with

Default: 0 , Max: 16

Heavy Chain CDR1 Annotation Scheme (heavy_cdr1) type: string: Indicate the annotation scheme for Heavy CDR1

Default: IMGT

Choices: IMGT, KABAT, CHOTHIA

Heavy Chain CDR2 Annotation Scheme (heavy_cdr2) type: string: Indicate the annotation scheme for Heavy CDR2

Default: IMGT

Choices: IMGT, KABAT, CHOTHIA

Heavy Chain CDR3 Annotation Scheme (heavy_cdr3) type: string: Indicate the annotation scheme for Heavy CDR3

Default: IMGT

Choices: IMGT, KABAT, CHOTHIA

Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)

Default: “”

Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on

Length of K-mers for germline identification (k) type: integer: Length of K-mers for germline identification (default 9 for DNA). If sequences differ significantly from the germline, a smaller value (5-7) may help find the correct result.

Default: 9 , Min: 5, Max: 12
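To see why a smaller k can help with divergent sequences, the sketch below counts the k-mers shared between a query and a reference at two values of k. It is a generic k-mer comparison, not IgMatcher's implementation; the function names and both sequences are made up for illustration.

```python
from typing import Set


def kmers(sequence: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of the sequence."""
    if not 5 <= k <= 12:  # documented range for the k parameter
        raise ValueError("k must be between 5 and 12")
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_kmer_count(query: str, germline: str, k: int) -> int:
    """Number of distinct k-mers present in both sequences."""
    return len(kmers(query, k) & kmers(germline, k))


# Made-up sequences: the query differs from the "germline" at three positions,
# spaced so that no window of 9 nucleotides is free of a mismatch.
germline = "ACGTTGCAAGGCTTACCGATGCATCGGATA"
query = "ACGTTGCTAGGCTTAGCGATGCAACGGATA"

for k in (9, 5):
    print(k, shared_kmer_count(query, germline, k))
```

In this toy case the mismatches break every 9-mer, while several 5-mers between the mismatches still match, which is the effect the parameter description refers to.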

Length of query chain (len_chain_query) type: integer: How many nucleotides at each end of the read to use when querying for an antibody chain

Default: 500 , Min: 400, Max: 600
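The idea of querying only the ends of a long read can be sketched as two slices. This is an illustration under the assumption that the windows are taken directly from the read as given; whether the cube reverse-complements a window or treats overlapping windows specially is not stated here, and `chain_query_windows` is a hypothetical name.

```python
from typing import Tuple


def chain_query_windows(read: str, len_chain_query: int = 500) -> Tuple[str, str]:
    """Return the first and last len_chain_query nucleotides of a read."""
    if not 400 <= len_chain_query <= 600:  # documented range
        raise ValueError("len_chain_query must be between 400 and 600")
    head = read[:len_chain_query]
    tail = read[-len_chain_query:]
    # For reads shorter than len_chain_query, both windows are the whole read.
    return head, tail
```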

Light Chain CDR1 Annotation Scheme (light_cdr1) type: string: Indicate the annotation scheme for Light CDR1

Default: IMGT

Choices: IMGT, KABAT, CHOTHIA

Light Chain CDR2 Annotation Scheme (light_cdr2) type: string: Indicate the annotation scheme for Light CDR2

Default: KABAT

Choices: IMGT, KABAT, CHOTHIA

Light Chain CDR3 Annotation Scheme (light_cdr3) type: string: Indicate the annotation scheme for Light CDR3

Default: IMGT

Choices: IMGT, KABAT, CHOTHIA

Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

Default: 1800 , Min: 256.0, Max: 8589934592

Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds

Default: 60

Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300

Minimum chain length (min_len_chain) type: integer: Minimum length a chain must have to be considered functional rather than truncated

Default: 273

Minimum votes for germline assignment (min_votes) type: integer: Minimum number of matching K-mers for germline assignment. Higher numbers make the algorithm more stringent at the expense of not annotating some sequences (default 100 votes for DNA)

Default: 150
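The trade-off between stringency and coverage can be shown with a small voting sketch. It assumes that each candidate germline has already collected a vote count (one vote per matching K-mer) and that the best-scoring candidate is accepted only if it reaches `min_votes`; how the cube actually tallies votes or breaks ties is not described here. The function name and the vote counts are illustrative.

```python
from typing import Dict, Optional


def assign_germline(votes: Dict[str, int], min_votes: int = 150) -> Optional[str]:
    """Return the top-voted germline, or None if it falls below min_votes."""
    if not votes:
        return None
    best_name, best_votes = max(votes.items(), key=lambda item: item[1])
    return best_name if best_votes >= min_votes else None


# Illustrative vote counts for two candidate germlines.
example_votes = {"IGHV1-69": 212, "IGHV3-23": 97}
print(assign_germline(example_votes, min_votes=150))
print(assign_germline(example_votes, min_votes=250))
```

Raising `min_votes` to 250 in this example leaves the sequence unannotated, which is the cost of the stricter setting.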

Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address

Default: 64

Species Database to Select From (species) type: string: Species reference database used to generate the IgMatcher database. A value must be selected even if a custom annotation file is provided.

Default: [‘Human’]

Choices: Alpaca, Human, Mouse, Rabbit

Spot policy (spot_policy) type: string: Control cube placement on spot market instances

Default: Prohibited

Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
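The ranges and choices listed above can be collected into a quick pre-flight check before a job is submitted. The sketch below covers only a subset of the parameters and is an assumption about how one might organize such a check; `validate_parameters` is a hypothetical helper, and how the values are actually passed to the cube is not shown.

```python
# Documented ranges and choices for a subset of the parameters above.
NUMERIC_RANGES = {
    "barcode_cutoff": (0.5, 1.0),
    "k": (5, 12),
    "len_chain_query": (400, 600),
    "cpu_count": (1, 128),
}
SCHEMES = {"IMGT", "KABAT", "CHOTHIA"}
CHOICES = {
    "cdr_method": {"partial", "full"},
    "heavy_cdr1": SCHEMES,
    "heavy_cdr2": SCHEMES,
    "heavy_cdr3": SCHEMES,
    "light_cdr1": SCHEMES,
    "light_cdr2": SCHEMES,
    "light_cdr3": SCHEMES,
    "species": {"Alpaca", "Human", "Mouse", "Rabbit"},
}


def validate_parameters(params: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for name, value in params.items():
        if name in NUMERIC_RANGES:
            low, high = NUMERIC_RANGES[name]
            if not low <= value <= high:
                problems.append(f"{name}={value} is outside [{low}, {high}]")
        elif name in CHOICES:
            if value not in CHOICES[name]:
                allowed = ", ".join(sorted(CHOICES[name]))
                problems.append(f"{name}={value!r} is not one of: {allowed}")
    return problems
```

Parameters not listed in either table are passed through unchecked, so the helper only flags values it knows to be out of range.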

Field parameters

None (log_field) type: Field Type: String: Message log field.

Default: Log Field

None (read_fail_field) type: Field Type: String: Failed Read.

Default: Read Fail

Hardware Parameters

Machine hardware requirements

Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

Default: 1800 , Min: 256.0, Max: 8589934592

Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address

Default: 64

Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

Default: 5120.0 , Min: 128.0, Max: 8589934592

GPUs (gpu_count) type: integer: The number of GPUs to run this cube with

Default: 0 , Max: 16

CPUs (cpu_count) type: integer: The number of CPUs to run this cube with

Default: 1 , Min: 1, Max: 128

Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on

Spot policy (spot_policy) type: string: Control cube placement on spot market instances

Default: Prohibited

Choices: Allowed, Preferred, NotPreferred, Prohibited, Required

Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)

Default: “”

Metrics Parameters

Cube Metric Parameters

Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds

Default: 60

Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300

Cube Metrics (cube_metrics) type: string: Set of metrics to be collected

Choices: cpu, disk, memory, network

Parallel Sequence Annotation with IgMatcher for PacBio

The parallel version adds these extra parameters.

Number of messages to distribute at a time (item_count) type: integer: The maximum number of messages to bundle together for a parallel cube.

Default: 1 , Min: 1, Max: 65535
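Bundling can be pictured as grouping a stream of messages into batches of at most `item_count`. The sketch below is a generic batching helper written for illustration; it is not Orion's scheduler, and `bundle` is a hypothetical name.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def bundle(messages: Iterable, item_count: int = 1) -> Iterator[List]:
    """Yield lists of at most item_count messages, preserving order."""
    if not 1 <= item_count <= 65535:  # documented range
        raise ValueError("item_count must be between 1 and 65535")
    iterator = iter(messages)
    while True:
        batch = list(islice(iterator, item_count))
        if not batch:
            return
        yield batch
```

With the default of 1, every message is distributed on its own; larger values reduce per-message overhead at the cost of coarser work distribution.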

Maximum Failures (max_failures) type: integer: The maximum number of times to attempt processing a work item

Default: 10 , Min: 1, Max: 100
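The effect of an attempt limit can be sketched as a bounded retry loop. This is a simplified model that assumes a work item is retried immediately after any exception; the platform's real failure handling (rescheduling, back-off, which errors count) is not described here, and `process_with_retries` is a hypothetical helper.

```python
def process_with_retries(work_item, handler, max_failures: int = 10):
    """Call handler(work_item) up to max_failures times before giving up."""
    if not 1 <= max_failures <= 100:  # documented range
        raise ValueError("max_failures must be between 1 and 100")
    last_error = None
    for _attempt in range(max_failures):
        try:
            return handler(work_item)
        except Exception as exc:  # any failure counts as one attempt
            last_error = exc
    raise RuntimeError(
        f"work item failed after {max_failures} attempts"
    ) from last_error
```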

Autoscale this Cube (autoscale) type: boolean: If True, let Orion manage the parallelism of this Cube

Default: True

Maximum number of Cubes (max_parallel) type: integer: The maximum number of concurrently running copies of this Cube

Default: 1000 , Min: 1

Minimum number of Cubes (min_parallel) type: integer: The minimum number of concurrently running copies of this Cube

Default: 0