Overlap Among Different NGS Barcode Populations

A cube that takes in two or more separate datasets of records and identifies the overlapping region of interest (ROI). The datasets MUST have different barcode groups (use the Modify Sample Name / Barcode FLOE to add or modify them). NOTE: This cube can compute overlap with SANGER populations, but it only indicates SANGER, not the well_id information. If the Well ID is desired, use the NGS (PacBio/Illumina) and Sanger Pipeline FLOE.
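As a rough illustration of the overlap described above, the following minimal sketch groups records by their ROI sequence and reports the ROIs seen in two or more barcode groups. It assumes exact matching (the equivalent of an edit distance of 0). The record layout, the field names `barcode_group` and `cdr3_chain_2`, and the function `overlap_by_roi` are hypothetical and are not taken from the cube's implementation.

```python
from collections import defaultdict


def overlap_by_roi(records, roi_field="cdr3_chain_2"):
    """Map each ROI found in two or more barcode groups to those groups.

    `records` is an iterable of dicts. The keys used here are
    illustrative assumptions, not the cube's actual record fields.
    """
    groups_by_roi = defaultdict(set)
    for record in records:
        roi = record.get(roi_field)
        if roi:  # skip records without an ROI sequence
            groups_by_roi[roi].add(record["barcode_group"])
    # Keep only ROIs shared by at least two different barcode groups.
    return {
        roi: sorted(groups)
        for roi, groups in groups_by_roi.items()
        if len(groups) >= 2
    }


# Made-up example records from two barcode groups.
records = [
    {"barcode_group": "round_1", "cdr3_chain_2": "ARDYYGSSYFDY"},
    {"barcode_group": "round_2", "cdr3_chain_2": "ARDYYGSSYFDY"},
    {"barcode_group": "round_2", "cdr3_chain_2": "ARGGWLLDY"},
]
shared = overlap_by_roi(records)
```

In this sketch a sequence that appears only within a single barcode group is not reported, which mirrors the requirement that the input datasets carry different barcode groups.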

Main Parameters

Parameter Name
  • Edit Distance Method For Overlap Among Different Barcode Groups
  • Edit Distance for Overlap by ROI of Different Barcode Groups
  • Keep Only Functional Sequences
  • Region of Interest For the Overlap
  • Write the Overlap Output to CSV File


Calculation Parameters

  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1 , Min: 1, Max: 128
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected
    Choices: cpu, disk, memory, network
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0 , Min: 128.0, Max: 8589934592
  • Edit Distance Method For Overlap Among Different Barcode Groups (edit_distance_method_overlap) type: string: The edit distance method to apply when overlapping against the complete population. NOTE: Only takes effect if the edit distance is not 0.
    Default: Levenshstein Distance
    Choices: Hamming Distance, Levenshstein Distance
  • Edit Distance for Overlap by ROI of Different Barcode Groups (edit_distance_overlap) type: integer: The edit distance to allow when overlapping by ROI. If there are multiple downstream barcode groups, these will be compared to one another.
    Default: 0 , Max: 100
  • Keep Only Functional Sequences (filter_functional) type: boolean: Eliminates non-functional sequences, including those with truncations, stop codons, or frameshifts
    Default: False
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0 , Max: 16
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800 , Min: 256.0, Max: 8589934592
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Region of Interest For the Overlap (roi) type: string: Indicate the region of interest (ROI) for identifying regions of overlap among different barcode groups.
    Default: CDR3 Chain_2 (Downstream Chain)
    Choices: Merged CDRs, CDR3 Chain_1 (Upstream Chain), CDR3 Chain_2 (Downstream Chain), HCDR3 and LCDR3, Full-Length
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Write the Overlap Output to CSV File (write_to_csv_file) type: boolean: Writes the overlap output to a CSV file after the AbXtract processing, at the cost of additional run time. If turned off, the CSV can be written in a separate step, and an empty file is written here instead.
    Default: True
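
To make the two edit distance choices listed above concrete, here is a minimal sketch of Hamming and Levenshtein distance and of a threshold check between two ROI sequences. The function names and the `method` strings are hypothetical, and the handling of unequal-length sequences under Hamming distance (treated here as "no match") is an assumption, not documented behavior of the cube.

```python
def hamming(a, b):
    """Count mismatched positions; undefined (None) for unequal lengths."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion from a
                curr[j - 1] + 1,           # insertion into a
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def within_distance(a, b, max_distance, method="levenshtein"):
    """Return True if a and b match within max_distance edits."""
    if max_distance == 0:
        # With a distance of 0 the method is irrelevant: exact match only.
        return a == b
    if method == "hamming":
        distance = hamming(a, b)
        # Assumption: unequal lengths never match under Hamming distance.
        return distance is not None and distance <= max_distance
    return levenshtein(a, b) <= max_distance
```

Hamming distance only counts substitutions between sequences of equal length, while Levenshtein distance also allows insertions and deletions, so it can match ROIs of different lengths.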

Hardware Parameters

Machine hardware requirements
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800 , Min: 256.0, Max: 8589934592
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0 , Min: 128.0, Max: 8589934592
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0 , Max: 16
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1 , Min: 1, Max: 128
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”

Metrics Parameters

Cube Metric Parameters
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected
    Choices: cpu, disk, memory, network