Consolidating and Writing Datasets, PacBio

This cube consolidates and writes results. It receives records from any upstream or downstream cube and consolidates all input by sample_name (upstream) or barcode_group (downstream), and by chain amino acid sequence. It generates a floe report at the end.
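
The grouping described above can be sketched in plain Python. This is a minimal illustration and not the cube's actual implementation: records are modeled as dictionaries, the field names `sample_name`, `barcode_group`, and `aa_sequence` and the `consolidate` helper are assumptions made for this example, and counting identical sequences is only one possible reading of "consolidate".

```python
from collections import defaultdict


def consolidate(records, is_downstream=False):
    """Group records by sample (upstream) or barcode group (downstream),
    then by chain amino acid sequence, counting identical sequences.

    Hypothetical helper: the record layout is assumed for illustration.
    """
    group_key = "barcode_group" if is_downstream else "sample_name"
    groups = defaultdict(lambda: defaultdict(int))
    for record in records:
        groups[record[group_key]][record["aa_sequence"]] += 1
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {name: dict(seqs) for name, seqs in groups.items()}


# Made-up example records, not real data.
records = [
    {"sample_name": "S1", "barcode_group": "B1", "aa_sequence": "CARDYW"},
    {"sample_name": "S1", "barcode_group": "B2", "aa_sequence": "CARDYW"},
    {"sample_name": "S2", "barcode_group": "B1", "aa_sequence": "CAKGGW"},
]
upstream = consolidate(records)
downstream = consolidate(records, is_downstream=True)
```

With the upstream setting, the two `S1` records collapse into one entry with a count of 2. With the downstream setting, the same records are split across barcode groups `B1` and `B2` instead.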

Main Parameters

Parameter Name
  • Output Dataset Name
  • Provides Report of the Selected Antibody Leads
  • Metrics to Assess Sanger in Presence of NGS
  • Are these already processed records?
  • Is This A Downstream Processed File?
  • Is This A Sanger Processed File?
  • Split by cluster? Only applies to downstream records.
  • ROI for sequence logo (choose Chain1 CDR3 if short-read/single-chain data)
  • Write Records to Dataset
  • Write Barcode Group to Their Own Dataset After Processing
  • Write Report


Calculation Parameters

  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1, Min: 1, Max: 128
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected
    Choices: cpu, disk, memory, network
  • Output Dataset Name (data_out) type: dataset_out: Output dataset to write to
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0, Min: 128.0, Max: 8589934592
  • Provides Report of the Selected Antibody Leads (downstream_ngs_selection) type: boolean: Provides detailed information on the biophysical characteristics of the selected antibodies.
    Default: False
  • Metrics to Assess Sanger in Presence of NGS (downstream_sanger) type: boolean: Indicates whether additional metrics are to be included to identify Sanger sequences in NGS data and vice versa.
    Default: False
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0, Max: 16
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • interfix (interfix) type: string: Name to insert in the middle of the file name for identification (e.g. ‘cdr3’)
    Default: “”
  • Are these already processed records? (is_analyzed) type: boolean: Indicates whether the input records are to be analyzed after processing to generate specific plots.
    Default: False
  • Is This A Downstream Processed File? (is_downstream) type: boolean: Indicates whether the input contains data for downstream processing.
    Default: False
  • Is This A Sanger Processed File? (is_sanger) type: boolean: Indicates whether the input contains Sanger (low-throughput) sequencing data.
    Default: False
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800, Min: 256.0, Max: 8589934592
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Split by cluster? Only applies to downstream records. (sequence_logo_by_cluster) type: boolean: Indicates whether to split sequences by cluster before creating sequence logos. Cluster logos are output only if there are no more than 500 records.
    Default: False
  • ROI for sequence logo (choose Chain1 CDR3 if short-read/single-chain data) (sequence_logo_roi) type: string: Name of the region(s) to be aligned for the sequence logo. The logo is output only if there are no more than 500 records.
    Default: CDR3 Chain_2 (Downstream Chain)
    Choices: CDR3 Chain_1 (Upstream Chain), CDR3 Chain_2 (Downstream Chain), HCDR3 and LCDR3
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Write Records to Dataset (write_dataset) type: boolean: Write out the records to a dataset
    Default: True
  • Write Barcode Group to Their Own Dataset After Processing (write_group) type: boolean: Write each barcode group (if provided) to its own dataset after processing. Note: if there is only a single barcode group, no separate dataset is written.
    Default: False
  • Write Report (write_report) type: boolean: Write out a floe report after consolidation
    Default: False
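
The 500-record limit on sequence logos and the cluster split can be illustrated with a short sketch. It is not the cube's code: the `logo_groups` helper and the `cluster` record field are hypothetical, and applying the limit separately to each cluster is an assumption, since the parameter descriptions do not say whether the count is per cluster or overall.

```python
from collections import defaultdict

MAX_LOGO_RECORDS = 500  # limit stated in the parameter descriptions


def logo_groups(records, by_cluster=False, is_downstream=False):
    """Return the record groups for which a sequence logo would be drawn.

    Hypothetical helper. The cluster split is honored only for downstream
    records, mirroring the note on the sequence_logo_by_cluster parameter.
    """
    if by_cluster and is_downstream:
        clusters = defaultdict(list)
        for record in records:
            clusters[record["cluster"]].append(record)
        groups = dict(clusters)
    else:
        groups = {"all": list(records)}
    # Assumption: the limit is checked per group, not across all records.
    return {
        name: recs for name, recs in groups.items()
        if len(recs) <= MAX_LOGO_RECORDS
    }


# 600 made-up records: 450 in cluster 0 and 150 in cluster 1.
records = [{"cluster": 0}] * 450 + [{"cluster": 1}] * 150
unsplit = logo_groups(records)
split = logo_groups(records, by_cluster=True, is_downstream=True)
```

Under these assumptions, the unsplit set of 600 records exceeds the limit and yields no logo, while splitting by cluster yields two groups that each fall under the limit.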

Hardware Parameters

Machine hardware requirements
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800, Min: 256.0, Max: 8589934592
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0, Min: 128.0, Max: 8589934592
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0, Max: 16
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1, Min: 1, Max: 128
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”
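
As a rough aid for checking hardware overrides before launching a job, the documented defaults and ranges can be captured in a small table. The sketch below is plain Python and does not use any platform API: the `HARDWARE_LIMITS` table and the `resolve_hardware` helper are hypothetical names, and only the numeric values are taken from the list above.

```python
# Defaults and ranges copied from the hardware parameter list.
# None means the documentation states no bound.
HARDWARE_LIMITS = {
    "cpu_count": {"default": 1, "min": 1, "max": 128},
    "gpu_count": {"default": 0, "min": None, "max": 16},
    "memory_mb": {"default": 1800, "min": 256.0, "max": 8589934592},
    "disk_space": {"default": 5120.0, "min": 128.0, "max": 8589934592},
}


def resolve_hardware(overrides=None):
    """Merge user overrides into the defaults and check documented ranges.

    Hypothetical helper for illustration only.
    """
    settings = {name: spec["default"] for name, spec in HARDWARE_LIMITS.items()}
    for name, value in (overrides or {}).items():
        if name not in HARDWARE_LIMITS:
            raise KeyError(f"unknown hardware parameter: {name}")
        spec = HARDWARE_LIMITS[name]
        if spec["min"] is not None and value < spec["min"]:
            raise ValueError(f"{name}={value} is below the minimum {spec['min']}")
        if spec["max"] is not None and value > spec["max"]:
            raise ValueError(f"{name}={value} is above the maximum {spec['max']}")
        settings[name] = value
    return settings


settings = resolve_hardware({"cpu_count": 4, "memory_mb": 4096})
```

Here `settings` keeps the default disk space and GPU count while taking the overridden CPU count and memory. A value such as `cpu_count=200` would raise a `ValueError` because it exceeds the documented maximum of 128.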

Metrics Parameters

Cube Metric Parameters
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected
    Choices: cpu, disk, memory, network