Consolidating and Writing Datasets, PacBio

Cube to consolidate and write results received from any upstream or downstream cube. It consolidates all input by sample_name (upstream) or barcode_group (downstream) and by chain amino acid sequence, and generates a Floe Report at the end when Write Report is enabled.
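
The consolidation step can be pictured as a two-level grouping: first by sample (upstream) or barcode group (downstream), then by chain amino acid sequence. The Python below is a minimal sketch for illustration only, not the cube's implementation. The `Record` class and its fields, the `consolidate` function, and the choice to sum per-record counts within each group are all hypothetical assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Record:
    # Hypothetical, simplified stand-in for an input record; the real
    # record fields used by the cube are not documented here.
    clone_name: str
    chain_aa: str                        # chain amino acid sequence
    sample_name: Optional[str] = None    # assumed to be set on upstream records
    barcode_group: Optional[str] = None  # assumed to be set on downstream records
    count: int = 1


def consolidate(records: List[Record], downstream: bool) -> Dict[Tuple[str, str], int]:
    """Group records by sample (upstream) or barcode group (downstream),
    then by chain amino acid sequence, summing counts (an assumption)."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for rec in records:
        # Pick the first-level grouping key according to the record type.
        group = rec.barcode_group if downstream else rec.sample_name
        # Records without a group label fall into a placeholder bucket.
        totals[(group or "unassigned", rec.chain_aa)] += rec.count
    return dict(totals)
```

For example, two upstream records from a sample S1 that share the chain sequence CARDY collapse into a single (S1, CARDY) entry whose count is the sum of theirs.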

Main Parameters

  • Output Dataset Name
  • Provides Report of the Selected Antibody Leads
  • Metrics to Assess Sanger in Presence of NGS
  • Are these already processed records?
  • Is This A Downstream Processed File?
  • Is This A Sanger Processed File?
  • Split by cluster? Only applies to downstream records.
  • ROI for sequence logo (choose Chain1 CDR3 if short-read/single-chain data)
  • Emit to CSV Port
  • Write Records to Dataset
  • Write Barcode Group to Their Own Dataset After Processing
  • Write Report


Parameter Details

Calculation Parameters

  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected
    Choices: cpu, disk, memory, network
  • Output Dataset Name (data_out) type: dataset_out: Output dataset to write to
  • Clone name delimiter (delimiter) type: string: Use this delimiter to identify the population from the clone name
    Default: _
  • Provides Report of the Selected Antibody Leads (downstream_ngs_selection) type: boolean: Provides detailed information on the biophysical characteristics of the selected antibodies.
    Default: False
  • Metrics to Assess Sanger in Presence of NGS (downstream_sanger) type: boolean: Indicates whether to include additional metrics that identify Sanger sequences in NGS data and vice versa.
    Default: False
  • interfix (interfix) type: string: Name to insert in the middle of the output file name for identification (e.g. ‘cdr3’)
    Default: “”
  • Are these already processed records? (is_analyzed) type: boolean: Indicates whether the input is to be analyzed after processing to generate specific plots.
    Default: False
  • Is This A Downstream Processed File? (is_downstream) type: boolean: Indicates whether the input contains data for downstream processing.
    Default: False
  • Is This A Sanger Processed File? (is_sanger) type: boolean: Indicates whether the input contains Sanger (low-throughput) sequencing data.
    Default: False
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Last part of clone name defining population (population_end) type: integer: Use a 1-indexed integer to indicate the end of the population name after splitting on the delimiter
    Default: -1
  • First part of clone name defining population (population_start) type: integer: Use a 1-indexed integer to indicate the start of the population name after splitting on the delimiter
    Default: 1
  • Region of Interest (ROI) For Condensing Sequences (roi) type: string: Condenses the Sanger sequences by the selected ROI, ranked by abundance.
    IMPORTANT: for each ROI, only the full-length sequence with the highest full-length count is kept and the other full-length sequences are removed. If two sequences have the same full-length count, either one may be kept.
    Default: Full-Length
    Choices: Merged CDRs, CDR3 Chain_1 (Upstream Chain), CDR3 Chain_2 (Downstream Chain), HCDR3 and LCDR3, Full-Length
  • Split by cluster? Only applies to downstream records. (sequence_logo_by_cluster) type: boolean: Indicates whether to split sequences by cluster before creating sequence logos. Cluster logos are output only when there are no more than 500 records.
    Default: False
  • ROI for sequence logo (choose Chain1 CDR3 if short-read/single-chain data) (sequence_logo_roi) type: string: Name of the regions to align for the sequence logo. The logo is output only when there are no more than 500 records.
    Default: CDR3 Chain_2 (Downstream Chain)
    Choices: CDR3 Chain_1 (Upstream Chain), CDR3 Chain_2 (Downstream Chain), HCDR3 and LCDR3
  • Shared Region of Interest (ROI) Sequences (shared_roi) type: string: Produces an overlap_roi output listing all the individual wells that share the same ROI id.
    Default: CDR3 Chain_2 (Downstream Chain)
    Choices: Merged CDRs, CDR3 Chain_1 (Upstream Chain), CDR3 Chain_2 (Downstream Chain), HCDR3 and LCDR3, Full-Length
  • Skip consolidation based on ‘Region of Interest (ROI) for Condensing Sequences’ (skip_clone_consolidation) type: boolean: Returns every clone in a separate row of the resulting CSV file instead of condensing by ROI.
    Default: False
  • Emit to CSV Port (write_csv) type: boolean: Emit output to a CSV port that converts records to CSV
    Default: False
  • Write Records to Dataset (write_dataset) type: boolean: Write the records out to a dataset
    Default: True
  • Write Barcode Group to Their Own Dataset After Processing (write_group) type: boolean: Write each barcode group (if provided) to its own dataset after processing. Note: if there is only a single barcode group, no separate dataset is written.
    Default: False
  • Write Report (write_report) type: boolean: Write out a Floe Report after consolidation
    Default: False
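
The three clone-name parameters (delimiter, population_start, population_end) can be read as a split-then-slice operation on the clone name. The sketch below is one hypothetical interpretation, not documented behavior: the function name `population_name` is illustrative, and treating population_end as a Python-style slice bound (so the default of -1 drops the final field, such as a trailing clone index) is an assumption.

```python
def population_name(clone_name: str, delimiter: str = "_",
                    start: int = 1, end: int = -1) -> str:
    """Hypothetical reading of delimiter / population_start / population_end.

    Assumes `start` is a 1-indexed, inclusive field position and `end` is
    passed through as a Python slice bound: positive values then act as a
    1-indexed inclusive end, negative values count back from the end.
    """
    if start < 1:
        raise ValueError("population_start is 1-indexed and must be >= 1")
    # Split the clone name into fields on the delimiter.
    parts = clone_name.split(delimiter)
    # Convert the 1-indexed start to a 0-indexed slice start and rejoin.
    return delimiter.join(parts[start - 1:end])
```

With the defaults, a hypothetical clone name `PopA_day7_clone17` maps to `PopA_day7`; with population_start=2 and population_end=2 it maps to `day7`.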

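The ROI condensing rule (roi) and the shared ROI overlap output (shared_roi) can both be sketched as grouping clones by their ROI sequence. The Python below is a hypothetical illustration only: the `Clone` type and the function names are invented, condensing is assumed to apply across all input rather than per well, and ties are resolved by keeping the first clone seen, which is just one way of honoring "either one may be kept".

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class Clone(NamedTuple):
    # Hypothetical minimal clone representation.
    well: str          # well / sample identifier
    roi: str           # sequence of the selected region of interest
    full_length: str   # full-length sequence
    count: int         # full-length count


def condense_by_roi(clones: List[Clone]) -> Dict[str, Clone]:
    """Keep one clone per ROI: the one with the highest full-length count.
    A strict '>' comparison means the first clone seen wins on ties."""
    best: Dict[str, Clone] = {}
    for clone in clones:
        current = best.get(clone.roi)
        if current is None or clone.count > current.count:
            best[clone.roi] = clone
    return best


def shared_roi_wells(clones: List[Clone]) -> Dict[str, List[str]]:
    """Map each ROI sequence to the sorted wells it occurs in, keeping
    only ROIs that are shared by more than one well."""
    wells: Dict[str, Set[str]] = defaultdict(set)
    for clone in clones:
        wells[clone.roi].add(clone.well)
    return {roi: sorted(ws) for roi, ws in wells.items() if len(ws) > 1}
```

Setting skip_clone_consolidation to True corresponds to skipping the condensing step entirely, so every clone keeps its own row.
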
Hardware Parameters

Machine hardware requirements
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800 , Min: 256.0, Max: 8589934592
  • Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address
    Default: 64
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0 , Min: 128.0, Max: 8589934592
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0 , Max: 16
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1 , Min: 1, Max: 128
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”

Metrics Parameters

Cube Metric Parameters
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected
    Choices: cpu, disk, memory, network