Inputting FASTQ for PacBio with Compressed Files

Takes a FASTQ file and quality parameters. Returns a filtered FASTA file.

Main Parameters

Parameter Name

Assemble FASTQ together

Output Floe Report Name

Input FASTQ 1

Maximum number of nucleotides for a read to be kept for assembled read (NOTE: defaulted to PacBio, change for illumina (typically 600))

Maximum Length

Minimum length of the assembled read

Minimum length

Minimum Quality

Minimum Quality Pre-Assembled

Minimum Quality Fraction

Minimum Quality Fraction, Pre-Assembled
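
The notes under Parameter Details suggest typical Illumina values for several parameters whose defaults target PacBio. As a convenience, those values can be collected in one mapping keyed by the parameter names shown in parentheses in that section. The sketch below is only an illustration: the names `ILLUMINA_OVERRIDES`, `NOVASEQ_2X150_OVERRIDES`, and `check_overrides` are hypothetical, and how the values are actually supplied to the floe is not covered here.

```python
# Typical Illumina values quoted in the parameter descriptions.
# The mapping itself is a hypothetical convenience, not part of the floe.
ILLUMINA_OVERRIDES = {
    "max_length": 600,    # maximum length of the assembled read
    "min_length": 273,    # minimum length of the assembled read
    "min_q": 25,          # minimum per-base quality score
    "min_q_share": 0.7,   # minimum fraction of high-quality bases
}

# NovaSeq 2x150 VH/VL reads typically do not assemble, so assembly is off.
NOVASEQ_2X150_OVERRIDES = dict(ILLUMINA_OVERRIDES, assembled_fastq=False)


def check_overrides(params):
    """Return a list of problems found in a parameter mapping (empty if none)."""
    problems = []
    if params["min_length"] > params["max_length"]:
        problems.append("min_length exceeds max_length")
    if not 0.0 <= params["min_q_share"] <= 1.0:
        problems.append("min_q_share must be between 0 and 1")
    if params["min_q"] < 0:
        problems.append("min_q must not be negative")
    return problems
```

Running `check_overrides` before launching a job is a cheap way to catch swapped length bounds or a quality fraction entered as a percentage.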


Parameter Details

Calculation Parameters

  • Assemble FASTQ together (assembled_fastq) type: boolean: Assemble the forward and reverse FASTQ files together. The reads must overlap enough for assembly to work properly.
    NOTE: This is not applied to PacBio sequences, which use a single FASTQ file. Important: NovaSeq 2x150 reads for VH or VL typically do not assemble properly, so set this to False for such data.
    Default: True
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1 , Min: 1, Max: 128
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected

    Choices: cpu, disk, memory, network
  • Output Floe Report Name (data_out) type: dataset_out: Name of the Floe Report for FASTQ Quality statistics, if desired
    Default: NGS Floe Report
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0 , Min: 128.0, Max: 8589934592
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0 , Max: 16
  • Input FASTQ 1 (input_pb1) type: file_in: Input FASTQ File
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Maximum number of nucleotides for a read to be kept for assembled read (NOTE: defaulted to PacBio, change for illumina (typically 600)) (max_length) type: integer:
  • Maximum Length (max_length_pre_assembled) type: integer: Maximum number of nucleotides for a read to be kept. Only applies if filtering is performed on the forward and reverse reads in Illumina data.
    Default: 0
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiBs (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800 , Min: 256.0, Max: 8589934592
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Minimum length of the assembled read (min_length) type: integer: Minimum number of nucleotides for a read to be kept for the assembled sequence (NOTE: defaults to a PacBio value; change for Illumina, typically 273)
  • Minimum length (min_length_pre_assembled) type: integer: Minimum number of nucleotides for a read to be kept. Only applies if filtering is performed on the forward and reverse reads in Illumina data.
    Default: 10
  • Minimum Quality (min_q) type: integer: Minimum quality score for a base to be considered high quality (NOTE: defaults to a PacBio value; change for Illumina, typically 25)
  • Minimum Quality Pre-Assembled (min_q_pre_assembled) type: integer: Minimum quality score of pre-assembled reads to be accepted.
    Default: 5
  • Minimum Quality Fraction (min_q_share) type: decimal: Fraction of the bases in a read that must be high quality for the read to be kept (NOTE: defaults to a PacBio value; change for Illumina, typically 0.7)
  • Minimum Quality Fraction, Pre-Assembled (min_q_share_pre_assembled) type: decimal: Fraction of the total read that has to be high-quality in pre-assembled reads.
    Default: 0.0
  • Number of Files to Split Into (output_split) type: integer: Number of files the NovaSeq data will be split into. If set to 0 or 1, then splitting will not occur.
    Default: 999
  • Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address
    Default: 64
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • trim front of read 1 (trim_front1) type: integer: How many bases to trim from front for read 1, default = 0
    Default: 0
  • trim tail of read 1 (trim_tail1) type: integer: How many bases to trim from the tail of read 1, default = 0
    Default: 0
  • Write Floe Report for FASTQ Statistics (write_floe_report) type: boolean: If turned on, outputs a report that summarizes the FASTQ quality statistics
    Default: False
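
The filtering parameters above (min_q, min_q_share, min_length, max_length, trim_front1, trim_tail1) can be read as a simple per-read test. The following is a minimal sketch of that reading, not the floe's actual implementation. It assumes Phred+33 quality encoding, gzip compression for `.gz` inputs, four-line FASTQ records, and that trimming happens before filtering; all function names are hypothetical.

```python
import gzip


def open_fastq(path):
    """Open a FASTQ file as text; gzip compression is assumed for '.gz' paths."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        qual = handle.readline().strip()
        yield header[1:], seq, qual


def trim_read(seq, qual, trim_front=0, trim_tail=0):
    """Drop trim_front bases from the start and trim_tail bases from the end."""
    end = max(len(seq) - trim_tail, 0)
    return seq[trim_front:end], qual[trim_front:end]


def passes_filter(seq, qual, min_q, min_q_share, min_length, max_length):
    """Keep a read if its length is in range and enough bases reach min_q."""
    if not seq or not (min_length <= len(seq) <= max_length):
        return False
    # Assumption: Phred+33 encoding, so score = ASCII code - 33.
    high_quality = sum(1 for ch in qual if ord(ch) - 33 >= min_q)
    return high_quality / len(seq) >= min_q_share


def filter_to_fasta(handle, min_q, min_q_share, min_length, max_length,
                    trim_front1=0, trim_tail1=0):
    """Yield FASTA-formatted strings for the reads that pass the filter."""
    for name, seq, qual in read_fastq(handle):
        seq, qual = trim_read(seq, qual, trim_front1, trim_tail1)
        if passes_filter(seq, qual, min_q, min_q_share, min_length, max_length):
            yield f">{name}\n{seq}\n"
```

A caller would pass the handle returned by `open_fastq` to `filter_to_fasta` and write the yielded strings to an output file. Keeping the per-read test in `passes_filter` makes it easy to try different thresholds on a small sample before running a full dataset.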

Hardware Parameters

Machine hardware requirements
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiBs (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800 , Min: 256.0, Max: 8589934592
  • Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address
    Default: 64
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0 , Min: 128.0, Max: 8589934592
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0 , Max: 16
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1 , Min: 1, Max: 128
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”

Metrics Parameters

Cube Metric Parameters
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected

    Choices: cpu, disk, memory, network