Inputting FASTQ for PacBio with Compressed Files
Takes a FASTQ file and quality parameters and returns a filtered FASTA file.
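At a high level, the floe reads FASTQ (the title indicates that compressed input is accepted) and writes the surviving reads as FASTA. The minimal Python sketch below shows only the format conversion, not the floe's own code. It assumes gzip compression, detected from the file's magic bytes, and four-line FASTQ records. The helper names `open_maybe_gzip`, `read_fastq`, and `fastq_to_fasta` are hypothetical, and the filtering step is omitted.

```python
import gzip
from typing import Iterable, Iterator, TextIO, Tuple


def open_maybe_gzip(path: str) -> TextIO:
    """Open a FASTQ file as text, transparently decompressing gzip input.

    Assumption: gzip is the compression format; other formats are not handled.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # Keep only the first whitespace-delimited token as the read ID.
        yield header[1:].split()[0], seq, qual


def fastq_to_fasta(records: Iterable[Tuple[str, str, str]], out: TextIO) -> int:
    """Write records as FASTA (quality is dropped); return the number written."""
    n = 0
    for read_id, seq, _qual in records:
        out.write(f">{read_id}\n{seq}\n")
        n += 1
    return n


# Example usage (hypothetical file names):
# with open_maybe_gzip("reads.fastq.gz") as fq, open("reads.fasta", "w") as fa:
#     fastq_to_fasta(read_fastq(fq), fa)
```

Sniffing the magic bytes rather than trusting a `.gz` extension lets the same code path handle both compressed and uncompressed uploads.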
Main Parameters
Parameter Name |
---|
Assemble FASTQ together |
Output Floe Report Name |
Input FASTQ 1 |
Maximum number of nucleotides for a read to be kept for assembled read (NOTE: defaulted to PacBio, change for illumina (typically 600)) |
Maximum Length |
Minimum length of the assembled read |
Minimum length |
Minimum Quality |
Minimum Quality Pre-Assembled |
Minimum Quality Fraction |
Minimum Quality Fraction, Pre-Assembled |
Parameter Details
Calculation Parameters
- Assemble FASTQ together (assembled_fastq) type: boolean: Assemble the forward and reverse FASTQ reads together. The reads must overlap sufficiently for this to work properly. NOTE: this is not applied to PacBio sequences, which use a single FASTQ file. Important: NovaSeq 2x150 reads for VH or VL typically do not assemble properly, so set this to False for such data. Default: True
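Assembling a pair means finding where the forward read and the reverse complement of the reverse read overlap, then joining them into one sequence. The sketch below is a simple overlap search with a mismatch tolerance. It is not necessarily the algorithm this floe uses, and `merge_pair`, `min_overlap`, and `max_mismatch_frac` are hypothetical names and thresholds. It also illustrates why the overlap requirement matters: if the two reads together are shorter than the insert, there is no true overlap to find.

```python
from typing import Optional

# Complement table for uppercase DNA; other characters raise KeyError.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (A/C/G/T/N only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def merge_pair(r1: str, r2: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1) -> Optional[str]:
    """Merge a forward read with its mate, or return None if no overlap is found.

    r2 is reverse-complemented so both reads are on the same strand. The longest
    suffix of r1 that matches a prefix of rc(r2), within the mismatch tolerance,
    is treated as the overlap. Hypothetical sketch: it ignores quality scores and
    does not handle inserts shorter than a single read.
    """
    rc2 = reverse_complement(r2)
    lowest = max(min_overlap, 1)  # an overlap of 0 would make r1[-0:] the whole read
    # Try the longest possible overlap first, shrinking down to the minimum.
    for olen in range(min(len(r1), len(rc2)), lowest - 1, -1):
        tail_of_r1 = r1[-olen:]
        head_of_rc2 = rc2[:olen]
        mismatches = sum(a != b for a, b in zip(tail_of_r1, head_of_rc2))
        if mismatches <= max_mismatch_frac * olen:
            # Keep r1's bases in the overlap, then append the rest of rc(r2).
            return r1 + rc2[olen:]
    return None
```

For example, two 150-base reads taken from opposite ends of an insert longer than 300 bases share no sequence, so any "overlap" such a search reports would be spurious. That is the situation the NOTE above warns about.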
- Output Floe Report Name (data_out) type: dataset_out: Name of the Floe Report for FASTQ quality statistics, if desired. Default: NGS Floe Report
- Input FASTQ 1 (input_pb1) type: file_in: Input FASTQ file.
- Maximum number of nucleotides for a read to be kept for assembled read (NOTE: defaulted to PacBio, change for illumina (typically 600)) (max_length) type: integer: Maximum number of nucleotides for an assembled read to be kept. NOTE: the default is set for PacBio; change it for Illumina (typically 600).
- Maximum Length (max_length_pre_assembled) type: integer: Maximum number of nucleotides for a read to be kept. Only applies if filtering is performed on the forward and reverse reads in Illumina data. Default: 0
- Minimum length of the assembled read (min_length) type: integer: Minimum number of nucleotides for an assembled read to be kept. NOTE: the default is set for PacBio; change it for Illumina (typically 273).
- Minimum length (min_length_pre_assembled) type: integer: Minimum number of nucleotides for a read to be kept. Only applies if filtering is performed on the forward and reverse reads in Illumina data. Default: 10
- Minimum Quality (min_q) type: integer: Minimum quality score for a base to be considered high quality. NOTE: the default is set for PacBio; change it for Illumina (typically 25).
- Minimum Quality Pre-Assembled (min_q_pre_assembled) type: integer: Minimum quality score for pre-assembled reads to be accepted. Default: 5
- Minimum Quality Fraction (min_q_share) type: decimal: Fraction of the total read that must be high quality for the read to be kept. NOTE: the default is set for PacBio; change it for Illumina (typically 0.7).
- Minimum Quality Fraction, Pre-Assembled (min_q_share_pre_assembled) type: decimal: Fraction of the total read that must be high quality in pre-assembled reads. Default: 0.0
- Number of Files to Split Into (output_split) type: integer: Number of files that NovaSeq data will be split into. If set to 0 or 1, no splitting occurs. Default: 999
- trim front of read 1 (trim_front1) type: integer: Number of bases to trim from the front of read 1. Default: 0
- trim tail of read 1 (trim_tail1) type: integer: Number of bases to trim from the tail of read 1. Default: 0
- Write Floe Report for FASTQ Statistics (write_floe_report) type: boolean: If turned on, outputs a report that summarizes the quality statistics. Default: False
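The trimming, length, and quality parameters above combine into a per-read keep/discard decision. A minimal sketch, assuming Phred+33 quality encoding, that trimming (trim_front1/trim_tail1) happens before filtering, and that "high quality" means a Phred score of at least min_q. Passing `None` as `max_length` to mean "no upper limit" is a convention of this sketch, not documented floe behavior. `phred33`, `trim`, and `keep_read` are hypothetical helpers, not the floe's code.

```python
from typing import List, Optional, Tuple


def phred33(qual: str) -> List[int]:
    """Decode an ASCII quality string assuming the Phred+33 (Sanger) offset."""
    return [ord(ch) - 33 for ch in qual]


def trim(seq: str, qual: str, front: int = 0, tail: int = 0) -> Tuple[str, str]:
    """Remove `front` bases from the start and `tail` bases from the end."""
    end = max(len(seq) - tail, 0)  # clamp so an oversized tail trim yields ""
    return seq[front:end], qual[front:end]


def keep_read(seq: str, qual: str, min_length: int, max_length: Optional[int],
              min_q: int, min_q_share: float) -> bool:
    """Apply the length window, then require a minimum fraction of high-quality bases."""
    n = len(seq)
    if n == 0 or n < min_length:
        return False
    if max_length is not None and n > max_length:
        return False
    high_quality = sum(1 for q in phred33(qual) if q >= min_q)
    return high_quality / n >= min_q_share


# Illumina-style settings quoted in the parameter notes above:
# keep_read(seq, qual, min_length=273, max_length=600, min_q=25, min_q_share=0.7)
```

With these settings, a 300-base read is kept only if at least 210 of its bases (0.7 × 300) have a Phred score of 25 or higher.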
Hardware Parameters
- Machine hardware requirements
- Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) that this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800, Min: 256.0, Max: 8589934592
- Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory a container is allowed to address. Default: 64
- Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) that this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0, Min: 128.0, Max: 8589934592
- GPUs (gpu_count) type: integer: The number of GPUs to run this cube with. Default: 0, Max: 16
- CPUs (cpu_count) type: integer: The number of CPUs to run this cube with. Default: 1, Min: 1, Max: 128
- Instance Type (instance_type) type: string: The type of instance that this cube needs to run on.
- Spot policy (spot_policy) type: string: Controls cube placement on spot market instances. Default: Prohibited. Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
- Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated). Default: ""
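Both memory_mb and disk_space are requested in MiB (1 MiB = 1,048,576 B) with a couple hundred MiB of headroom. The hypothetical helper below turns a byte estimate into such a request and clamps it to a parameter's documented Min/Max. The 200 MiB default headroom is an assumed reading of "a couple hundred", and `request_mib` is not part of the floe.

```python
import math

MIB = 1_048_576  # bytes per MiB, as stated for memory_mb and disk_space


def request_mib(required_bytes: int, headroom_mib: int = 200,
                minimum: float = 256.0, maximum: float = 8_589_934_592) -> int:
    """Round a byte estimate up to whole MiB, add headroom, clamp to [minimum, maximum].

    Defaults follow memory_mb (Min 256.0); pass minimum=128.0 for disk_space.
    """
    mib = math.ceil(required_bytes / MIB) + headroom_mib
    return int(min(max(mib, minimum), maximum))
```

For instance, a job estimated to need 1,500 MiB of memory would request 1,700 MiB, while a tiny estimate is raised to the 256 MiB floor.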
Metrics Parameters
- Cube Metric Parameters
- Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds. Default: 60. Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300. Min: 1, Max: 300
- Cube Metrics (cube_metrics) type: string: Set of metrics to be collected. Choices: cpu, disk, memory, network