Consolidating and Writing Datasets, PacBio

Cube that consolidates and writes results. It receives records from any upstream or downstream cube and consolidates all input by sample_names (upstream) or barcode_group (downstream), and by chain amino acid sequence. It generates a FLOE report at the end.
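As a rough illustration of the consolidation described above, the sketch below groups records by sample name or barcode group and then counts identical chain amino acid sequences within each group. The record layout (plain dicts with `sample_name`, `barcode_group`, and `chain_aa` keys) and the function name `consolidate` are assumptions made for illustration; they are not the cube's actual record type or API.

```python
from collections import Counter, defaultdict


def consolidate(records, is_downstream=False):
    """Group records by sample name (upstream) or barcode group (downstream),
    then count identical chain amino acid sequences within each group.

    Field names are hypothetical; the real cube works on its own record type.
    """
    group_key = "barcode_group" if is_downstream else "sample_name"
    groups = defaultdict(Counter)
    for rec in records:
        groups[rec[group_key]][rec["chain_aa"]] += 1
    # Convert to plain dicts so the result is easy to inspect or serialize.
    return {group: dict(counts) for group, counts in groups.items()}


records = [
    {"sample_name": "S1", "barcode_group": "BG1", "chain_aa": "CARDYW"},
    {"sample_name": "S1", "barcode_group": "BG2", "chain_aa": "CARDYW"},
    {"sample_name": "S2", "barcode_group": "BG1", "chain_aa": "CAKGGW"},
]
upstream = consolidate(records)
downstream = consolidate(records, is_downstream=True)
```

In this toy input the two S1 records collapse into a single sequence with a count of 2 when grouped upstream, while grouping downstream keeps them apart because they carry different barcode groups.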

Main Parameters

Parameter Name
Output Dataset Name
Provides Report of the Selected Antibody Leads
Metrics to Assess Sanger in Presence of NGS
Are these already processed records?
Is This A Downstream Processed File?
Is This A Sanger Processed File?
Split by cluster? Only applies to downstream records.
ROI for sequence logo (choose Chain1 CDR3 if short-read/single-chain data)
Write IgMatcher to File after Processing
Write Records to Dataset
Write Barcode Group to Their Own Dataset After Processing
Write Report

Calculation Parameters

CPUs (cpu_count) type: integer: The number of CPUs to run this cube with

Default: 1, Min: 1, Max: 128

Cube Metrics (cube_metrics) type: string: Set of metrics to be collected

Choices: cpu, disk, memory, network

Output Dataset Name (data_out) type: dataset_out: Output dataset to write to

Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

Default: 5120.0, Min: 128.0, Max: 8589934592

Provides Report of the Selected Antibody Leads (downstream_ngs_selection) type: boolean: Provides detailed information on the biophysical characteristics of the selected antibodies.

Default: False

Metrics to Assess Sanger in Presence of NGS (downstream_sanger) type: boolean: Indicates whether additional metrics are included to identify Sanger sequences in NGS data and vice versa.

Default: False

GPUs (gpu_count) type: integer: The number of GPUs to run this cube with

Default: 0, Max: 16

Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)

Default: “”

Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on

Interfix (interfix) type: string: Name to add in the middle of the file name for identification (e.g. ‘cdr3’)

Default: “”

Are these already processed records? (is_analyzed) type: boolean: Indicates whether the input records are to be analyzed post-processing to generate specific plots.

Default: False

Is This A Downstream Processed File? (is_downstream) type: boolean: Indicates whether the input contains data for downstream processing.

Default: False

Is This A Sanger Processed File? (is_sanger) type: boolean: Indicates whether the input contains Sanger (low-throughput) sequencing data.

Default: False

Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

Default: 1800, Min: 256.0, Max: 8589934592
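Because memory and disk requests are given in MiB (1 MiB = 1048576 B) and should be padded for overhead, a small conversion helper can make the arithmetic explicit. This is a minimal sketch: the helper name `mib_request` is hypothetical, and the 200 MiB padding is only one assumed reading of "a couple hundred MiB".

```python
import math

MIB = 1048576  # bytes per MiB, as stated in the parameter description


def mib_request(required_bytes, overhead_mib=200, minimum_mib=256.0):
    """Return a MiB request covering required_bytes plus an overhead margin.

    overhead_mib is an assumed padding; minimum_mib mirrors the documented
    lower bound for memory_mb.
    """
    required_mib = math.ceil(required_bytes / MIB)
    return max(float(required_mib + overhead_mib), minimum_mib)


# The default memory_mb of 1800 MiB expressed in bytes.
default_bytes = 1800 * MIB
# A job needing 1.5 GiB (1536 MiB) of working memory, padded by 200 MiB.
request = mib_request(1536 * MIB)
```

For disk_space the same helper could be called with `minimum_mib=128.0`, which is the documented lower bound for that parameter.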

Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds

Default: 60

Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
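Since metric_period only accepts a fixed set of values, a desired sampling interval has to be mapped onto one of the listed choices. The sketch below picks the closest allowed value; the helper name `nearest_metric_period` and the tie-breaking rule (prefer the shorter period) are assumptions for illustration, not behavior of the platform.

```python
# Allowed sampling periods in seconds, copied from the Choices line above.
METRIC_PERIOD_CHOICES = (1, 5, 10, 30, 60, 120, 180, 240, 300)


def nearest_metric_period(seconds):
    """Return the allowed period closest to the requested number of seconds.

    Ties are resolved in favor of the shorter period (an arbitrary choice).
    """
    return min(METRIC_PERIOD_CHOICES, key=lambda c: (abs(c - seconds), c))


period_a = nearest_metric_period(100)   # closer to 120 than to 60
period_b = nearest_metric_period(45)    # equidistant from 30 and 60
period_c = nearest_metric_period(1000)  # above the maximum
```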

Split by cluster? Only applies to downstream records. (sequence_logo_by_cluster) type: boolean: Indicates whether to split sequences by cluster before creating sequence logos. Cluster logos are output only if there are no more than 500 records.

Default: False

ROI for sequence logo (choose Chain1 CDR3 if short-read/single-chain data) (sequence_logo_roi) type: string: Name of the regions to be aligned for the sequence logo. The logo is output only if there are no more than 500 records.

Default: CDR3 Chain_2 (Downstream Chain)

Choices: CDR3 Chain_1 (Upstream Chain), CDR3 Chain_2 (Downstream Chain), HCDR3 and LCDR3
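The 500-record limit on sequence logos, together with the optional split by cluster, can be sketched as a simple filter. The record keys (`cluster`, `roi_seq`) and the function `logo_inputs` are hypothetical, and applying the limit to each cluster separately is an assumption: the descriptions above do not say whether the count is per cluster or over all records.

```python
from collections import defaultdict

MAX_LOGO_RECORDS = 500  # limit stated in the parameter descriptions


def logo_inputs(records, by_cluster=False):
    """Return {label: [ROI sequences]} for each logo that would be produced.

    Groups with more than MAX_LOGO_RECORDS sequences are dropped.
    """
    if by_cluster:
        groups = defaultdict(list)
        for rec in records:
            groups[rec["cluster"]].append(rec["roi_seq"])
    else:
        groups = {"all": [rec["roi_seq"] for rec in records]}
    return {
        label: seqs
        for label, seqs in groups.items()
        if len(seqs) <= MAX_LOGO_RECORDS
    }


records = (
    [{"cluster": "A", "roi_seq": "CARDYW"}] * 501
    + [{"cluster": "B", "roi_seq": "CAKGGW"}] * 3
)
per_cluster = logo_inputs(records, by_cluster=True)  # only cluster "B" qualifies
pooled = logo_inputs(records)                        # 504 records, so no logo
```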

Spot policy (spot_policy) type: string: Control cube placement on spot market instances

Default: Prohibited

Choices: Allowed, Preferred, NotPreferred, Prohibited, Required

Write IgMatcher to File after Processing (write_csv) type: boolean: Write barcode group (if provided) to their own dataset after processing. Note: if there is only a single barcode group, no separate dataset will be written.

Default: True

Write Records to Dataset (write_dataset) type: boolean: Write out records to a dataset

Default: True

Write Barcode Group to Their Own Dataset After Processing (write_group) type: boolean: Write barcode group (if provided) to their own dataset after processing. Note: if there is only a single barcode group, no separate dataset will be written.

Default: False
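The rule for write_group (separate datasets only when more than one barcode group is present) can be expressed as a short decision function. This is a sketch under stated assumptions: the function `datasets_to_write`, the `barcode_group` record key, and the `<data_out>_<group>` naming scheme are all hypothetical and are not taken from the cube.

```python
def datasets_to_write(records, data_out="output",
                      write_dataset=True, write_group=False):
    """List the dataset names that would be written for the given records."""
    names = []
    if write_dataset:
        names.append(data_out)
    # Collect the distinct barcode groups that are actually present.
    groups = sorted({rec["barcode_group"] for rec in records
                     if rec.get("barcode_group")})
    # A single barcode group never gets its own dataset.
    if write_group and len(groups) > 1:
        names.extend(f"{data_out}_{group}" for group in groups)
    return names


two_groups = [{"barcode_group": "BG1"}, {"barcode_group": "BG2"}]
one_group = [{"barcode_group": "BG1"}, {"barcode_group": "BG1"}]

split = datasets_to_write(two_groups, write_group=True)
unsplit = datasets_to_write(one_group, write_group=True)
```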

Write Report (write_report) type: boolean: Write out a floe report after consolidation

Default: False

Hardware Parameters

Machine hardware requirements

Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

Default: 1800, Min: 256.0, Max: 8589934592

Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.

Default: 5120.0, Min: 128.0, Max: 8589934592

GPUs (gpu_count) type: integer: The number of GPUs to run this cube with

Default: 0, Max: 16

CPUs (cpu_count) type: integer: The number of CPUs to run this cube with

Default: 1, Min: 1, Max: 128

Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on

Spot policy (spot_policy) type: string: Control cube placement on spot market instances

Default: Prohibited

Choices: Allowed, Preferred, NotPreferred, Prohibited, Required

Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)

Default: “”

Metrics Parameters

Cube Metric Parameters

Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds

Default: 60

Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300

Cube Metrics (cube_metrics) type: string: Set of metrics to be collected

Choices: cpu, disk, memory, network