Sequence Annotation with IgMatcher for PacBio

The IgMatcher cube:
  • takes records (DNA sequence and count) from an input cube
  • annotates each antibody chain in the PacBio reads
  • emits the annotated chains to a consolidation cube

Main Parameters

  • Barcode Cutoff
  • Heavy Chain CDR1 Annotation Scheme
  • Heavy Chain CDR2 Annotation Scheme
  • Heavy Chain CDR3 Annotation Scheme
  • Length of K-mers for germline identification
  • Length of query chain
  • Light Chain CDR1 Annotation Scheme
  • Light Chain CDR2 Annotation Scheme
  • Light Chain CDR3 Annotation Scheme
  • Minimum chain length
  • Minimum votes for germline assignment
  • Species Database to Select From


Calculation Parameters

  • Barcode Cutoff (barcode_cutoff) type: decimal: Minimum fraction of barcode nucleotides that must match for a read to be assigned to a sample
    Default: 0.7, Min: 0.5, Max: 1.0
  • Barcode Table (barcode_table) type: file_in: XLS/CSV/TSV file containing barcodes in the format Name,5’barcode,3’barcode,barcode_round (e.g., early/late),barcode_group

Do not include a header row. If you have only a 5’ barcode, write name,5’barcode,,, and if you have only a 3’ barcode, write name,,3’barcode,, so that every row keeps the same five columns.

  • Full/Partial alignment to annotate CDRs (cdr_method) type: string: Whether to align the query sequence to the entire germline or only to partial regions when annotating CDRs. Full alignment may work better for natural antibodies; partial alignment may work better for synthetic or degenerate antibodies.
    Default: partial
    Choices: partial, full
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1, Min: 1, Max: 128
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected
    Choices: cpu, disk, memory, network
  • Provide a Customized Amino Acid Annotation File with Alignment Scheme of Interest (for NGS) (custom_annotation_aa) type: file_in: Required only for custom scaffolds such as the Specifica Gen3 Library; optional for natural antibodies.

Not typically used for NGS (DNA-based) data, but if provided, it overrides the annotation scheme and species/database settings.

  • Provide a Customized DNA Annotation File with Alignment Scheme of Interest (for NGS) (custom_annotation_dna) type: file_in: Required only for custom scaffolds such as the Specifica Gen3 Library or for codon-optimized sequences; optional for natural antibodies.

If provided, it overrides the annotation scheme and species/database settings.

  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0, Min: 128.0, Max: 8589934592
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0, Max: 16
  • Heavy Chain CDR1 Annotation Scheme (heavy_cdr1) type: string: Indicate the annotation scheme for Heavy CDR1
    Default: IMGT
    Choices: IMGT, KABAT, CHOTHIA
  • Heavy Chain CDR2 Annotation Scheme (heavy_cdr2) type: string: Indicate the annotation scheme for Heavy CDR2
    Default: IMGT
    Choices: IMGT, KABAT, CHOTHIA
  • Heavy Chain CDR3 Annotation Scheme (heavy_cdr3) type: string: Indicate the annotation scheme for Heavy CDR3
    Default: IMGT
    Choices: IMGT, KABAT, CHOTHIA
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Length of K-mers for germline identification (k) type: integer: Length of the k-mers used for germline identification (9 by default for DNA). If sequences differ substantially from the germline, a smaller value (5-7) may help find the correct germline.
    Default: 9, Min: 5, Max: 12
  • Length of query chain (len_chain_query) type: integer: Number of nucleotides at each end of the read used to search for an antibody chain
    Default: 500, Min: 400, Max: 600
  • Light Chain CDR1 Annotation Scheme (light_cdr1) type: string: Indicate the annotation scheme for Light CDR1
    Default: IMGT
    Choices: IMGT, KABAT, CHOTHIA
  • Light Chain CDR2 Annotation Scheme (light_cdr2) type: string: Indicate the annotation scheme for Light CDR2
    Default: KABAT
    Choices: IMGT, KABAT, CHOTHIA
  • Light Chain CDR3 Annotation Scheme (light_cdr3) type: string: Indicate the annotation scheme for Light CDR3
    Default: IMGT
    Choices: IMGT, KABAT, CHOTHIA
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800, Min: 256.0, Max: 8589934592
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Minimum chain length (min_len_chain) type: integer: Minimum length a chain must have to be considered functional rather than truncated
    Default: 273
  • Minimum votes for germline assignment (min_votes) type: integer: Minimum number of matching k-mers required to assign a germline. Higher values make the assignment more stringent, at the cost of leaving some sequences unannotated.
    Default: 150
  • Species Database to Select From (species) type: string: Species reference database used to build the IgMatcher database
    Default: [‘Human’]
    Choices: Alpaca, Human, Mouse, Rabbit
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
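As an illustration of the Barcode Table format and the Barcode Cutoff, the following Python sketch parses a headerless five-column table and assigns a read to the sample whose barcodes best match the read ends. It assumes an ungapped comparison of each barcode against the first or last bases of the read, in the read's given orientation, with the cutoff applied to the fraction of matching positions. The helpers parse_barcode_table, match_fraction, and assign_sample are hypothetical; the cube's actual barcode search is not described here.

```python
import csv
import io
from typing import NamedTuple, Optional


class Barcode(NamedTuple):
    name: str
    five_prime: str      # empty if the sample has only a 3' barcode
    three_prime: str     # empty if the sample has only a 5' barcode
    barcode_round: str
    barcode_group: str


def parse_barcode_table(text: str) -> list[Barcode]:
    """Parse headerless rows: name,5'barcode,3'barcode,barcode_round,barcode_group."""
    table = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        fields = (fields + [""] * 5)[:5]  # pad or trim to exactly five columns
        name, five, three, rnd, group = (f.strip() for f in fields)
        table.append(Barcode(name, five.upper(), three.upper(), rnd, group))
    return table


def match_fraction(barcode: str, segment: str) -> float:
    """Fraction of barcode positions equal to the aligned read segment (no gaps)."""
    if not barcode or len(segment) < len(barcode):
        return 0.0
    hits = sum(b == s for b, s in zip(barcode, segment))
    return hits / len(barcode)


def assign_sample(read: str, table: list[Barcode], cutoff: float = 0.7) -> Optional[str]:
    """Return the best-matching sample name, or None if no sample passes the cutoff."""
    read = read.upper()
    best_name, best_score = None, -1.0
    for bc in table:
        scores = []
        if bc.five_prime:
            scores.append(match_fraction(bc.five_prime, read[:len(bc.five_prime)]))
        if bc.three_prime:
            scores.append(match_fraction(bc.three_prime, read[-len(bc.three_prime):]))
        if not scores:
            continue  # row without any barcode
        score = min(scores)  # every barcode the sample defines must pass
        if score >= cutoff and score > best_score:
            best_name, best_score = bc.name, score
    return best_name
```

With the default cutoff of 0.7, an 8-nt barcode tolerates two mismatches (6/8 = 0.75) but not three (5/8 = 0.625).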
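The two length parameters can be read together: len_chain_query sets how much of each read end is searched for a chain, and min_len_chain discards chains too short to be full length. The sketch below assumes the windows are plain prefix and suffix slices and that chain length is measured in the same units as the parameter; how IgMatcher orients reads and measures chain length is not specified here, and both helper names are hypothetical.

```python
def query_windows(read: str, len_chain_query: int = 500) -> tuple[str, str]:
    """Return the 5' and 3' end windows of a read used to search for a chain.

    If the read is shorter than 2 * len_chain_query, the two windows overlap.
    """
    if len_chain_query <= 0:
        raise ValueError("len_chain_query must be positive")
    return read[:len_chain_query], read[-len_chain_query:]


def passes_length_filter(chain: str, min_len_chain: int = 273) -> bool:
    """Treat chains shorter than min_len_chain as truncated (non-functional)."""
    return len(chain) >= min_len_chain
```

For a 400-nt read and the default len_chain_query of 500, both windows cover the whole read.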
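The k and min_votes parameters both belong to a k-mer voting step. A minimal sketch of generic k-mer voting follows, assuming each query k-mer casts one vote for every germline that contains it and the top-voted germline is accepted only if it reaches min_votes. It is not IgMatcher's implementation: build_index, assign_germline, and the toy sequences in the test are hypothetical, and reverse complements are ignored.

```python
from collections import defaultdict
from typing import Iterator, Optional


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(germlines: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the names of the germlines that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in germlines.items():
        for km in kmers(seq.upper(), k):
            index[km].add(name)
    return dict(index)


def assign_germline(query: str, index: dict[str, set[str]], k: int,
                    min_votes: int = 150) -> tuple[Optional[str], int]:
    """Return (germline, votes); germline is None if the best count is below min_votes."""
    votes: dict[str, int] = defaultdict(int)
    for km in kmers(query.upper(), k):
        for name in index.get(km, ()):
            votes[name] += 1  # one vote per shared query k-mer
    if not votes:
        return None, 0
    best = max(votes, key=votes.get)
    if votes[best] < min_votes:
        return None, votes[best]
    return best, votes[best]
```

A single substitution in the query can remove up to k overlapping k-mers from the vote, so divergent sequences lose votes faster with a larger k; this is consistent with the advice to try 5-7 for sequences far from the germline.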

Field Parameters

  • log_field type: Field Type: String: Message log field.
    Default: Log Field
  • read_fail_field type: Field Type: String: Field for failed reads.
    Default: Read Fail

Hardware Parameters

Machine hardware requirements
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800, Min: 256.0, Max: 8589934592
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0, Min: 128.0, Max: 8589934592
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0, Max: 16
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1, Min: 1, Max: 128
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”
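The sizing advice for Memory and Temporary Disk Space amounts to a small calculation: convert the estimated requirement to MiB (1 MiB = 1048576 B), round up, add headroom, and keep the result within the parameter's Min and Max. The sketch below assumes 200 MiB of headroom as one reading of "a couple hundred MiB"; request_mib is a hypothetical helper, not an Orion function.

```python
import math

MIB = 1024 * 1024  # 1 MiB = 1048576 B, as stated in the parameter descriptions


def request_mib(required_bytes: int, headroom_mib: int = 200,
                minimum: float = 256.0, maximum: float = 8589934592) -> int:
    """Round the requirement up to whole MiB, add headroom, and clamp to [minimum, maximum]."""
    needed = math.ceil(required_bytes / MIB) + headroom_mib
    return int(min(max(needed, minimum), maximum))
```

For example, an estimated 1600 MiB requirement becomes an 1800 MiB request, and very small estimates are raised to the 256 MiB memory minimum (pass minimum=128.0 for disk space).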

Metrics Parameters

Cube Metric Parameters
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected
    Choices: cpu, disk, memory, network

Parallel Sequence Annotation with IgMatcher for PacBio

The parallel version of the cube adds the following parameters.

  • Number of messages to distribute at a time (item_count) type: integer: The maximum number of messages to bundle together for a parallel cube.
    Default: 1, Min: 1, Max: 65535
  • Maximum Failures (max_failures) type: integer: The maximum number of times to attempt processing a work item
    Default: 10, Min: 1, Max: 100
  • Autoscale this Cube (autoscale) type: boolean: If True, let Orion manage the parallelism of this Cube
    Default: True
  • Maximum number of Cubes (max_parallel) type: integer: The maximum number of concurrently running copies of this Cube
    Default: 1000, Min: 1
  • Minimum number of Cubes (min_parallel) type: integer: The minimum number of concurrently running copies of this Cube
    Default: 0
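To make item_count and max_failures concrete, the sketch below models them in plain Python: work items are grouped into bundles of at most item_count messages, and each bundle is attempted up to max_failures times. This is a conceptual model under those assumptions, not how Orion schedules parallel cubes; bundle and process_with_retries are hypothetical names.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def bundle(items: Iterable[T], item_count: int = 1) -> Iterator[list[T]]:
    """Group messages into bundles of at most item_count."""
    if item_count < 1:
        raise ValueError("item_count must be >= 1")
    batch: list[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == item_count:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, bundle


def process_with_retries(work: list[T], fn: Callable[[list[T]], object],
                         max_failures: int = 10) -> bool:
    """Attempt fn(work) up to max_failures times; return True on the first success."""
    for _ in range(max_failures):
        try:
            fn(work)
            return True
        except Exception:
            continue  # a real system would log the failure before retrying
    return False
```

In this model, the default item_count of 1 makes every message its own bundle, so retrying a failed message never re-runs the others.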