FASTQ Parser for Reads With UMIs

A Cube that takes in FASTQ reads carrying unique molecular identifiers (UMIs), corrects sequencing errors, and stores each sequence with its count for downstream use.

Main Parameters

Parameter Name
  • Directional reads
  • Hamming distance threshold for clustering UMIs
  • Read group size threshold
  • Minimum number of unique UMIs per consensus sequence
  • Unique molecular identifier extraction pattern

Calculation Parameters

  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1, Min: 1, Max: 128
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected
    Choices: cpu, disk, memory, network
  • Directional reads (directional) type: boolean: If True, reads are oriented 5’ to 3’ with respect to the UMI extraction pattern. If False, they are non-directional (UMI could be at either end).
    Default: True
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0, Min: 128.0, Max: 8589934592
  • Hamming distance threshold for clustering UMIs (ed) type: integer: The Hamming distance threshold used when clustering UMIs.
    Default: 2, Max: 100
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0, Max: 16
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800, Min: 256.0, Max: 8589934592
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Read group size threshold (min_seq_group_size) type: integer: Minimum number of sequencing reads per UMI
    Default: 5, Min: 1
  • Minimum number of unique UMIs per consensus sequence (min_umi_count) type: integer: Sequences represented by at least this many UMIs are retained.
    Default: 2, Min: 1
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Unique molecular identifier extraction pattern (umi_regex) type: string: A regular expression pattern for extracting the unique molecular identifier (UMI). Be sure to include both the 5’ and 3’ UMIs if they exist. See docs for more information on specifying regex. For non-directional reads, provide the regex for one orientation here and the regex for the reverse complement in umi_regex_rev. If you would like to demultiplex samples using a barcode table, DO NOT mark the sample barcode as part of the UMI in the extraction pattern.
    Default: “”
  • Reverse unique molecular identifier extraction pattern (umi_regex_rev) type: string: For use with non-directional reads only. Ignored if directional is set to True.
    Default: “”
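
How the umi_regex, ed, min_seq_group_size, and min_umi_count parameters could interact is sketched below in Python. This is an illustration under stated assumptions, not the cube's implementation: the named groups "umi" and "seq", the greedy clustering strategy, the inclusive use of ed, and the majority-vote consensus are all hypothetical choices made for the example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pattern: an 8-nt 5' UMI followed by the insert sequence.
# The group names "umi" and "seq" are illustrative; the cube's own regex
# syntax is described in its docs and may differ.
UMI_REGEX = re.compile(r"^(?P<umi>[ACGT]{8})(?P<seq>[ACGTN]+)$")


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def cluster_umis(umi_counts, ed):
    """Greedy clustering (an assumption): the most abundant UMIs become
    cluster centres and each later UMI joins the first centre that is
    within `ed` mismatches (inclusive)."""
    centres = {}
    ordered = sorted(umi_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for umi, _ in ordered:
        for centre in centres:
            if hamming(umi, centre) <= ed:
                centres[centre].append(umi)
                break
        else:
            centres[umi] = [umi]
    return centres


def count_sequences(reads, ed=2, min_seq_group_size=5, min_umi_count=2):
    """Return {sequence: number of supporting UMI clusters} after filtering."""
    reads_by_umi = defaultdict(list)
    for read in reads:
        match = UMI_REGEX.match(read)
        if match:  # reads that do not match the pattern are skipped
            reads_by_umi[match.group("umi")].append(match.group("seq"))

    umi_counts = {umi: len(seqs) for umi, seqs in reads_by_umi.items()}
    support = Counter()
    for members in cluster_umis(umi_counts, ed).values():
        group = [seq for umi in members for seq in reads_by_umi[umi]]
        if len(group) < min_seq_group_size:
            continue  # too few reads for this UMI cluster
        # Majority sequence stands in for a real consensus call.
        consensus = Counter(group).most_common(1)[0][0]
        support[consensus] += 1
    return {seq: n for seq, n in support.items() if n >= min_umi_count}
```

With the defaults, a sequence is reported only if at least two UMI clusters, each with at least five reads, support it.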

Hardware Parameters

Machine hardware requirements
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800, Min: 256.0, Max: 8589934592
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0, Min: 128.0, Max: 8589934592
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0, Max: 16
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1, Min: 1, Max: 128
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”
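
As a worked example of the sizing advice for memory_mb, the sketch below converts an estimated requirement in bytes to MiB, adds headroom, and clamps the result to the documented bounds. The function name and the 200 MiB headroom are assumptions chosen for illustration; "a couple hundred MiB" is the only guidance the parameter description gives.

```python
MIB = 1048576  # bytes per MiB, as stated in the parameter descriptions


def memory_request_mib(required_bytes, headroom_mib=200.0,
                       lower=256.0, upper=8589934592.0):
    """Convert bytes to MiB, add headroom, and clamp to the memory_mb bounds."""
    requested = required_bytes / MIB + headroom_mib
    return min(max(requested, lower), upper)
```

For a job estimated at 1600 MiB, this yields a request of 1800 MiB, which matches the default.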

Metrics Parameters

Cube Metric Parameters
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected
    Choices: cpu, disk, memory, network