Inputting FASTQ for Illumina
Takes FASTQ files and quality parameters. Returns a filtered FASTA file.
Main Parameters
Parameter Name |
---|
Assemble FASTQ together |
Output Floe Report Name |
Maximum number of nucleotides for a read to be kept for assembled read (NOTE: defaulted to PacBio, change for illumina (typically 600)) |
Maximum Length |
Minimum length of the assembled read |
Minimum length |
Minimum Quality |
Minimum Quality Pre-Assembled |
Minimum Quality Fraction |
Minimum Quality Fraction, Pre-Assembled |
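Several of these parameters default to PacBio values and need changing for Illumina data. As a minimal sketch, the Illumina values quoted in the parameter notes on this page can be collected in a plain dictionary keyed by the documented parameter names. The dictionary, the name `ILLUMINA_OVERRIDES`, and the helper `check_overrides` are hypothetical illustrations, not part of the cube or of any platform API.

```python
# Hypothetical override set; keys are parameter names documented on this page,
# values are the "typically ..." figures quoted in the parameter notes.
ILLUMINA_OVERRIDES = {
    "assembled_fastq": True,  # the notes suggest False for NovaSeq 2x150 VH/VL
    "max_length": 600,
    "min_length": 273,
    "min_q": 25,
    "min_q_share": 0.7,
}


def check_overrides(params):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    if params["min_length"] > params["max_length"]:
        problems.append("min_length is greater than max_length")
    if not 0.0 <= params["min_q_share"] <= 1.0:
        problems.append("min_q_share must be a fraction between 0 and 1")
    if params["min_q"] < 0:
        problems.append("min_q must not be negative")
    return problems


print(check_overrides(ILLUMINA_OVERRIDES))
```

A check like this only catches inconsistent combinations before a run is submitted; it says nothing about whether the thresholds suit a particular library.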
Parameter Details
Calculation Parameters
- Assemble FASTQ together (assembled_fastq) type: boolean: Assemble the forward and reverse FASTQ files together. The reads must overlap enough for assembly to work properly. NOTE: this is not applied to PacBio sequences, which take just a single FASTQ file. Important: NovaSeq 2x150 reads for VH or VL typically do not assemble properly, so set this to False for such data. Default: True
- Output Floe Report Name (data_out) type: dataset_out: Name of the Floe Report for FASTQ quality statistics, if desired. Default: NGS Floe Report
- Input FASTQ 1 (input_ill_1) type: file_in: Input FASTQ File 1
- Input FASTQ 2 (input_ill_2) type: file_in: Input FASTQ File 2
- Maximum number of nucleotides for a read to be kept for assembled read (NOTE: defaulted to PacBio, change for illumina (typically 600)) (max_length) type: integer
- Maximum Length (max_length_pre_assembled) type: integer: Maximum number of nucleotides for a read to be kept. Only applies if filtering is performed on the forward and reverse reads in Illumina. Default: 0
- Minimum length of the assembled read (min_length) type: integer: Minimum number of nucleotides for a read to be kept for the assembled sequence (NOTE: defaulted to PacBio, change for Illumina (typically 273))
- Minimum length (min_length_pre_assembled) type: integer: Minimum number of nucleotides for a read to be kept. Only applies if filtering is performed on the forward and reverse reads in Illumina. Default: 10
- Minimum Quality (min_q) type: integer: Minimum quality score for a base to be considered high quality (NOTE: defaulted to PacBio, change for Illumina (typically 25))
- Minimum Quality Pre-Assembled (min_q_pre_assembled) type: integer: Minimum quality score of pre-assembled reads to be accepted. Default: 5
- Minimum Quality Fraction (min_q_share) type: decimal: Fraction of the total read that has to be high quality for it to be kept (NOTE: defaulted to PacBio, change for Illumina (typically 0.7))
- Minimum Quality Fraction, Pre-Assembled (min_q_share_pre_assembled) type: decimal: Fraction of the total read that has to be high quality in pre-assembled reads. Default: 0.0
- Number of Files to Split Into (output_split) type: integer: Number of files the NovaSeq data will be split into. Default: 999
- trim front of read 1 (trim_front1) type: integer: How many bases to trim from the front of read 1. Default: 0
- trim front of read 2 (trim_front2) type: integer: How many bases to trim from the front of read 2. Default: 0
- trim tail of read 1 (trim_tail1) type: integer: How many bases to trim from the tail of read 1. Default: 0
- trim tail of read 2 (trim_tail2) type: integer: How many bases to trim from the tail of read 2. Default: 0
- Write Floe Report for FASTQ Statistics (write_floe_report) type: boolean: If turned on, outputs a report that summarizes the quality statistics. Default: False
- Write Floe Report for Both Forward & Reverse Reads (Illumina Only) (write_floe_report_ill) type: boolean: Produces statistics that are filtered on the individual forward (R1) and reverse (R2) reads. Default: False
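The trimming, length, and quality-fraction parameters above can be read as a per-read filter. The sketch below is one plausible interpretation, not the cube's actual implementation, which this page does not show. It assumes Phred+33 quality encoding, inclusive length bounds, that a base counts as high quality when its score is at least `min_q`, and that trimming happens before filtering. The names `FilterSettings`, `phred_scores`, `trim`, and `keep_read` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FilterSettings:
    # Illustrative Illumina values taken from the "typically ..." notes above.
    min_q: int = 25
    min_q_share: float = 0.7
    min_length: int = 273
    max_length: int = 600


def phred_scores(quality_line: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into integer scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def trim(seq: str, qual: str, front: int = 0, tail: int = 0) -> tuple[str, str]:
    """Drop `front` bases from the start and `tail` bases from the end."""
    end = max(front, len(seq) - tail)  # never let the end cross the start
    return seq[front:end], qual[front:end]


def keep_read(seq: str, qual: str, settings: FilterSettings) -> bool:
    """Apply the length window, then the high-quality fraction threshold."""
    if not (settings.min_length <= len(seq) <= settings.max_length):
        return False
    scores = phred_scores(qual)
    if not scores:
        return False
    high_quality = sum(1 for q in scores if q >= settings.min_q)
    return high_quality / len(scores) >= settings.min_q_share


settings = FilterSettings()
seq = "ACGT" * 70                    # 280 bases
good_qual = "I" * 280                # every base at Q40
mixed_qual = "I" * 180 + "#" * 100   # 180 bases at Q40, 100 at Q2

print(keep_read(seq, good_qual, settings))
print(keep_read(seq, mixed_qual, settings))
```

In the second call only 180 of 280 bases reach the threshold, a fraction of about 0.64, so the read falls below a `min_q_share` of 0.7 and is rejected.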
Hardware Parameters
- Machine hardware requirements
- Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800, Min: 256.0, Max: 8589934592
- Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address. Default: 64
- Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0, Min: 128.0, Max: 8589934592
- GPUs (gpu_count) type: integer: The number of GPUs to run this cube with. Default: 0, Max: 16
- CPUs (cpu_count) type: integer: The number of CPUs to run this cube with. Default: 1, Min: 1, Max: 128
- Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
- Spot policy (spot_policy) type: string: Control cube placement on spot market instances. Default: Prohibited, Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
- Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated). Default: “”
Metrics Parameters
- Cube Metric Parameters
- Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds. Default: 60, Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
- Cube Metrics (cube_metrics) type: string: Set of metrics to be collected. Choices: cpu, disk, memory, network