FASTQ Parser for Reads With UMIs
A Cube that takes in FASTQ reads with UMIs, corrects sequence errors, and stores the sequence and count for downstream use.
Main Parameters
Parameter Name |
---|
Directional reads |
Hamming distance threshold for clustering UMIs |
Read group size threshold |
Minimum number of unique UMIs per consensus sequence |
Unique molecular identifier extraction pattern |
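To make the Hamming distance and read group size thresholds in the table more concrete, here is a minimal Python sketch of one way such thresholds could be applied. It is an illustration under stated assumptions, not the cube's implementation: the greedy, abundance-ordered clustering strategy and the helper names `hamming`, `cluster_umis`, and `filter_groups` are hypothetical. Only the parameter names `ed` and `min_seq_group_size` come from this page.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def cluster_umis(umi_counts: Counter, ed: int) -> dict:
    """Greedy clustering (an assumed strategy): visit UMIs from most to least
    abundant and merge each into the first centre within `ed` mismatches."""
    clusters = {}  # centre UMI -> total read count
    for umi, count in umi_counts.most_common():
        for centre in clusters:
            if hamming(umi, centre) <= ed:
                clusters[centre] += count
                break
        else:
            # No existing centre is close enough, so start a new cluster.
            clusters[umi] = count
    return clusters


def filter_groups(clusters: dict, min_seq_group_size: int) -> dict:
    """Keep only UMI groups supported by at least `min_seq_group_size` reads."""
    return {u: n for u, n in clusters.items() if n >= min_seq_group_size}


# Toy read counts per observed UMI (made-up data for illustration).
observed = Counter({"AAAAAA": 10, "AAAAAT": 2, "CCCCCC": 3, "GGGGGG": 7})
clusters = cluster_umis(observed, ed=2)
kept = filter_groups(clusters, min_seq_group_size=5)
```

In this toy input, `AAAAAT` differs from `AAAAAA` at one position, so with `ed=2` its reads are merged into the `AAAAAA` group, and the `CCCCCC` group (3 reads) falls below the group size threshold of 5.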
Parameter Details
Calculation Parameters
- CPUs (cpu_count) type: integer: The number of CPUs to run this cube with. Default: 1, Min: 1, Max: 128
- Cube Metrics (cube_metrics) type: string: Set of metrics to be collected. Choices: cpu, disk, memory, network
- Directional reads (directional) type: boolean: If True, reads are oriented 5’ to 3’ with respect to the UMI extraction pattern. If False, they are non-directional (the UMI could be at either end). Default: True
- Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0, Min: 128.0, Max: 8589934592
- Hamming distance threshold for clustering UMIs (ed) type: integer: Default: 2, Max: 100
- GPUs (gpu_count) type: integer: The number of GPUs to run this cube with. Default: 0, Max: 16
- Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated). Default: “”
- Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on.
- Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800, Min: 256.0, Max: 8589934592
- Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds. Default: 60, Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
- Read group size threshold (min_seq_group_size) type: integer: Minimum number of sequencing reads per UMI. Default: 5, Min: 1
- Minimum number of unique UMIs per consensus sequence (min_umi_count) type: integer: Sequences are retained only if they are represented by at least this many UMIs. Default: 2, Min: 1
- Spot policy (spot_policy) type: string: Control cube placement on spot market instances. Default: Prohibited, Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
- Unique molecular identifier extraction pattern (umi_regex) type: string: A regular expression extraction pattern for the unique molecular identifier (UMI). Be sure to include both 5’ and 3’ unique molecular identifiers if they exist. See docs for more information on specifying regex. For non-directional reads, provide the regex for one orientation here and the regex for the reverse complement in umi_regex_rev below. If you would like to demultiplex samples using a barcode table, DO NOT mark the sample barcode in the UMI extraction pattern as part of the UMI. Default: “”
- Reverse unique molecular identifier extraction pattern (umi_regex_rev) type: string: For use with non-directional reads only. Ignored if directional is set to True. Default: “”
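The interplay between the directional flag, the forward extraction pattern, and the reverse pattern can be sketched in Python as follows. This is a hedged illustration only: the exact regex syntax the cube expects is described in the docs referenced above, and the named groups `umi` and `insert`, the example patterns, the `GTAC` linker, and the helper functions are all hypothetical. Reverse-complementing the reverse-strand match back to forward orientation is likewise an assumption about how such reads might be normalized.

```python
import re

# Hypothetical forward pattern: an 8-base UMI, a fixed GTAC linker, then the insert.
UMI_REGEX = r"^(?P<umi>[ACGT]{8})GTAC(?P<insert>[ACGTN]+)$"
# Hypothetical pattern for the same construct read in reverse-complement orientation.
UMI_REGEX_REV = r"^(?P<insert>[ACGTN]+)GTAC(?P<umi>[ACGT]{8})$"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_umi(read, umi_regex, umi_regex_rev="", directional=True):
    """Return (umi, insert) in forward orientation, or None if nothing matches."""
    match = re.match(umi_regex, read)
    if match:
        return match.group("umi"), match.group("insert")
    # The reverse pattern is only consulted for non-directional reads.
    if not directional and umi_regex_rev:
        match = re.match(umi_regex_rev, read)
        if match:
            return (reverse_complement(match.group("umi")),
                    reverse_complement(match.group("insert")))
    return None


forward_read = "AAAACCCC" + "GTAC" + "TTTGGGCCC"
reverse_read = reverse_complement(forward_read)

fwd = extract_umi(forward_read, UMI_REGEX)
rev_directional = extract_umi(reverse_read, UMI_REGEX, UMI_REGEX_REV, directional=True)
rev_nondirectional = extract_umi(reverse_read, UMI_REGEX, UMI_REGEX_REV, directional=False)
```

With these toy patterns, the reverse-orientation read is rejected when `directional` is True and recovered with the same UMI and insert as the forward read when it is False.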
Hardware Parameters
- Machine hardware requirements
- Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800, Min: 256.0, Max: 8589934592
- Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0, Min: 128.0, Max: 8589934592
- GPUs (gpu_count) type: integer: The number of GPUs to run this cube with. Default: 0, Max: 16
- CPUs (cpu_count) type: integer: The number of CPUs to run this cube with. Default: 1, Min: 1, Max: 128
- Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on.
- Spot policy (spot_policy) type: string: Control cube placement on spot market instances. Default: Prohibited, Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
- Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated). Default: “”
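The advice to request a couple hundred MiB more than required can be turned into a small calculation. The sketch below is illustrative only: the helper `request_mib` and the 256 MiB overhead figure are assumptions chosen for the example, while the minimum and maximum values are copied from the memory_mb and disk_space entries above.

```python
# Limits copied from the memory_mb and disk_space entries above.
MEMORY_MB_MIN, MEMORY_MB_MAX = 256.0, 8589934592.0
DISK_SPACE_MIN, DISK_SPACE_MAX = 128.0, 8589934592.0

# Assumed stand-in for "a couple hundred MiB" of overhead; adjust as needed.
OVERHEAD_MIB = 256.0


def request_mib(required_mib, lower, upper, overhead=OVERHEAD_MIB):
    """Add headroom to an estimated requirement, then clamp to the allowed range."""
    return min(max(required_mib + overhead, lower), upper)


# Estimated needs of 1800 MiB of memory and 5120 MiB of scratch disk.
memory_request = request_mib(1800.0, MEMORY_MB_MIN, MEMORY_MB_MAX)
disk_request = request_mib(5120.0, DISK_SPACE_MIN, DISK_SPACE_MAX)
```

The resulting values would then be supplied as memory_mb and disk_space when configuring the cube.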
Metrics Parameters
- Cube Metric Parameters
- Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds. Default: 60, Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
- Cube Metrics (cube_metrics) type: string: Set of metrics to be collected. Choices: cpu, disk, memory, network