FASTQ Parser for Reads With UMIs

A Cube that takes in FASTQ reads tagged with unique molecular identifiers (UMIs), corrects sequence errors, and stores each sequence with its count for downstream use.
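The description above can be made concrete with a small sketch. The code below is not the cube's implementation; it is one plausible reading of "correct sequence errors and store sequence and count", assuming that reads arrive as (UMI, sequence) pairs, that reads sharing a UMI have equal length, and that error correction is a per-position majority vote. The function names `consensus` and `count_sequences` are hypothetical; the two thresholds borrow the names and defaults of the `min_seq_group_size` and `min_umi_count` parameters documented below.

```python
from collections import Counter, defaultdict


def consensus(seqs):
    """Majority-vote consensus over equal-length sequences.

    Assumption: ties are broken in favour of the base seen first.
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def count_sequences(reads, min_seq_group_size=5, min_umi_count=2):
    """Collapse reads by UMI, then count UMIs per consensus sequence.

    reads: iterable of (umi, sequence) pairs already extracted from FASTQ.
    Returns {consensus_sequence: number_of_supporting_UMIs}.
    """
    # Group the reads that carry the same UMI.
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    umis_per_sequence = Counter()
    for seqs in groups.values():
        if len(seqs) < min_seq_group_size:
            continue  # too few reads to trust this UMI group
        umis_per_sequence[consensus(seqs)] += 1

    # Keep only sequences supported by enough distinct UMIs.
    return {s: n for s, n in umis_per_sequence.items() if n >= min_umi_count}
```

With this sketch, a sequence is counted once per UMI group rather than once per read, so PCR duplicates of the same molecule do not inflate the count.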

Main Parameters

Parameter Name
  • Directional reads
  • Hamming distance threshold for clustering UMIs
  • Read group size threshold
  • Minimum number of unique UMIs per consensus sequence
  • UMI extraction method
  • Unique molecular identifier extraction pattern


Calculation Parameters

  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1 , Min: 1, Max: 128
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected

    Choices: cpu, disk, memory, network
  • Directional reads (directional) type: boolean: If True, reads are oriented 5’ to 3’ with respect to the UMI extraction pattern. If False, they are non-directional (the UMI could be at either end).
    Default: False
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0 , Min: 128.0, Max: 8589934592
  • Hamming distance threshold for clustering UMIs (ed) type: integer:
    Default: 2 , Max: 100
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0 , Max: 16
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800 , Min: 256.0, Max: 8589934592
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Read group size threshold (min_seq_group_size) type: integer: Minimum number of sequencing reads per UMI
    Default: 5 , Min: 1
  • Minimum number of unique UMIs per consensus sequence (min_umi_count) type: integer: Only sequences represented by at least this many UMIs are retained.
    Default: 2 , Min: 1
  • Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address
    Default: 64
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • UMI extraction method (umi_extract_method) type: string: How the extraction pattern is interpreted when extracting the UMI: as a regular expression (regex) or as a pattern string (string)
    Default: string
    Choices: regex, string
  • Unique molecular identifier extraction pattern (umi_regex) type: string: An extraction pattern for the unique molecular identifier (UMI), which may be a regular expression or a string using {N, C, X}. Be sure to include both the 5’ and 3’ unique molecular identifiers. If you would like to demultiplex samples using a barcode table, DO NOT mark the sample barcode in the UMI extraction pattern as a region to be extracted.
    Default: “”
  • Reverse unique molecular identifier extraction pattern (umi_regex_rev) type: string: For use with non-directional reads only. Ignored if directional is set to True.
    Default: “”
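To illustrate how the UMI-related parameters above could interact, here is a minimal sketch of regex-based UMI extraction followed by Hamming-distance clustering. It is not the cube's own code. It assumes that the regular expression marks the UMI with a named group called `umi` (a convention chosen here for illustration, not one documented for `umi_regex`), and it uses a simple greedy clustering in which each UMI is folded into the most abundant cluster centre within the threshold. The helper names `extract_umi`, `hamming`, and `cluster_umis` are hypothetical; the default threshold of 2 mirrors the default of `ed`.

```python
import re
from collections import Counter


def extract_umi(read, pattern):
    """Return (umi, remainder), or None if the pattern does not match.

    Assumption: `pattern` is matched at the 5' end of the read and marks
    the UMI with a named group called "umi".
    """
    m = re.match(pattern, read)
    if m is None:
        return None
    return m.group("umi"), read[m.end():]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def cluster_umis(umi_counts, threshold=2):
    """Greedy clustering of UMIs by Hamming distance.

    UMIs are visited from most to least abundant; each one is merged into
    the first existing cluster centre within `threshold` mismatches, or
    becomes a new centre. Returns {centre_umi: total_read_count}.
    """
    clusters = Counter()
    ordered = sorted(umi_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for umi, n in ordered:
        for centre in clusters:
            if len(centre) == len(umi) and hamming(centre, umi) <= threshold:
                clusters[centre] += n
                break
        else:
            clusters[umi] = n  # no centre close enough: start a new cluster
    return dict(clusters)
```

For example, `extract_umi("ACGTACGTTTGGCC", r"(?P<umi>[ACGT]{8})TT")` splits the read into the 8-base UMI `ACGTACGT` and the remainder `GGCC`, and `cluster_umis({"AAAA": 10, "AAAT": 2, "CCCC": 5}, threshold=1)` merges `AAAT` into `AAAA`. A larger threshold merges more aggressively, which corrects more UMI sequencing errors at the risk of collapsing genuinely distinct molecules.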

Hardware Parameters

Machine hardware requirements
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800 , Min: 256.0, Max: 8589934592
  • Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address
    Default: 64
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0 , Min: 128.0, Max: 8589934592
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0 , Max: 16
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1 , Min: 1, Max: 128
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”
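The defaults and limits listed above can be checked before a job is submitted. The following sketch is a local helper only, not part of any platform API: the names `HARDWARE_SPEC`, `SPOT_POLICIES`, and `resolve_hardware` are hypothetical, and the values are copied from the hardware parameter list above (a limit of `None` means the list does not state one).

```python
# name: (default, minimum, maximum), copied from the hardware parameter list.
HARDWARE_SPEC = {
    "memory_mb": (1800, 256.0, 8589934592),
    "shared_memory_mb": (64, None, None),
    "disk_space": (5120.0, 128.0, 8589934592),
    "gpu_count": (0, None, 16),
    "cpu_count": (1, 1, 128),
}
SPOT_POLICIES = ("Allowed", "Preferred", "NotPreferred", "Prohibited", "Required")


def resolve_hardware(overrides=None):
    """Merge overrides into the documented defaults and range-check them."""
    overrides = dict(overrides or {})
    spot = overrides.pop("spot_policy", "Prohibited")
    if spot not in SPOT_POLICIES:
        raise ValueError(f"spot_policy must be one of {SPOT_POLICIES}")

    unknown = set(overrides) - set(HARDWARE_SPEC)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")

    resolved = {"spot_policy": spot}
    for name, (default, lo, hi) in HARDWARE_SPEC.items():
        value = overrides.get(name, default)
        if lo is not None and value < lo:
            raise ValueError(f"{name}={value} is below the minimum {lo}")
        if hi is not None and value > hi:
            raise ValueError(f"{name}={value} is above the maximum {hi}")
        resolved[name] = value
    return resolved
```

Calling `resolve_hardware({"cpu_count": 4, "memory_mb": 4096})` returns the full set of settings with the remaining values at their documented defaults, while an out-of-range value such as `cpu_count=0` raises a `ValueError`.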

Metrics Parameters

Cube Metric Parameters
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected

    Choices: cpu, disk, memory, network