Consolidating and Writing Datasets, PacBio
Cube to consolidate and write results. It receives records from any upstream or downstream cube and consolidates all input into separate sample_names (upstream) or barcode_group (downstream) groups, and into chain amino acid sequences. A Floe Report is generated at the end.
Main Parameters
Parameter Name |
---|
Output Dataset Name |
Provides Report of the Selected Antibody Leads |
Metrics to Assess Sanger in Presence of NGS |
Are these already processed records? |
Is This A Downstream Processed File? |
Is This A Sanger Processed File? |
Split by cluster? Only applies to downstream records. |
ROI for sequence logo (choose Chain1 CDR3 if short-read/single-chain data) |
Emit to CSV Port |
Write Records to Dataset |
Write Barcode Group to Their Own Dataset After Processing |
Write Report |
Parameter Details
Calculation Parameters
- CPUs (cpu_count) type: integer: The number of CPUs to run this cube with. Default: 1, Min: 1, Max: 128
- Cube Metrics (cube_metrics) type: string: Set of metrics to be collected. Choices: cpu, disk, memory, network
- Output Dataset Name (data_out) type: dataset_out: Output dataset to write to
- Clone name delimiter (delimiter) type: string: Use this delimiter to identify the population from the clone name. Default: _
- Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0, Min: 128.0, Max: 8589934592
- Provides Report of the Selected Antibody Leads (downstream_ngs_selection) type: boolean: Provides detailed information on the biophysical characteristics of the selected antibodies. Default: False
- Metrics to Assess Sanger in Presence of NGS (downstream_sanger) type: boolean: Indicates whether additional metrics are to be included to identify Sanger sequences in NGS and vice versa. Default: False
- GPUs (gpu_count) type: integer: The number of GPUs to run this cube with. Default: 0, Max: 16
- Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated). Default: “”
- Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
- interfix (interfix) type: string: Name to add in the middle of the file name for identification (e.g. ‘cdr3’). Default: “”
- Are these already processed records? (is_analyzed) type: boolean: Indicates whether the input is to be analyzed post-processing for generating specific plots. Default: False
- Is This A Downstream Processed File? (is_downstream) type: boolean: Indicates whether the input contains data for downstream processing. Default: False
- Is This A Sanger Processed File? (is_sanger) type: boolean: Indicates whether the input contains Sanger (low-throughput) sequencing data. Default: False
- Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800, Min: 256.0, Max: 8589934592
- Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds. Default: 60. Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300. Min: 1, Max: 300
- Last part of clone name defining population (population_end) type: integer: Use a 1-indexed integer to indicate the end of the population name after splitting on the delimiter. Default: -1
- First part of clone name defining population (population_start) type: integer: Use a 1-indexed integer to indicate the start of the population name after splitting on the delimiter. Default: 1
- Region of Interest (ROI) For Condensing Sequences (roi) type: string: This will condense the Sanger sequences based on the ROI, rank-ordered by abundance.
  IMPORTANT: this will remove full-length sequences and keep only the most abundant one (by full-length count). If two sequences have the same full-length count, it will pick one or the other.
  Default: Full-Length. Choices: Merged CDRs, CDR3 Chain_1 (Upstream Chain), CDR3 Chain_2 (Downstream Chain), HCDR3 and LCDR3, Full-Length
- Split by cluster? Only applies to downstream records. (sequence_logo_by_cluster) type: boolean: Indicates whether to split sequences by cluster before creating sequence logos. Cluster logos are output only if there are no more than 500 records. Default: False
- ROI for sequence logo (choose Chain1 CDR3 if short-read/single-chain data) (sequence_logo_roi) type: string: Name of the regions to be aligned for the sequence logo. The logo is output only if there are no more than 500 records. Default: CDR3 Chain_2 (Downstream Chain). Choices: CDR3 Chain_1 (Upstream Chain), CDR3 Chain_2 (Downstream Chain), HCDR3 and LCDR3
- Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address. Default: 64
- Shared Region of Interest (ROI) Sequences (shared_roi) type: string: This will provide an overlap_roi output that shows all the individual wells that share the same id. Default: CDR3 Chain_2 (Downstream Chain). Choices: Merged CDRs, CDR3 Chain_1 (Upstream Chain), CDR3 Chain_2 (Downstream Chain), HCDR3 and LCDR3, Full-Length
- Skip consolidation based on ‘Region of Interest (ROI) for Condensing Sequences’ (skip_clone_consolidation) type: boolean: This will return every clone in a separate row of the resulting CSV file. Default: False
- Spot policy (spot_policy) type: string: Control cube placement on spot market instances. Default: Prohibited. Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
- Emit to CSV Port (write_csv) type: boolean: Emit output to a CSV Port that will convert Records to CSV. Default: False
- Write Records to Dataset (write_dataset) type: boolean: Write out records to a dataset. Default: True
- Write Barcode Group to Their Own Dataset After Processing (write_group) type: boolean: Write each barcode group (if provided) to its own dataset after processing. Note: if there is only a single barcode group, no separate dataset will be written. Default: False
- Write Report (write_report) type: boolean: Write out a Floe Report after consolidation. Default: False
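The clone-name and ROI parameters above are easier to follow with a small illustration. The following is a minimal sketch, not the cube's actual implementation. It assumes that population_start and population_end are inclusive 1-indexed positions, that a negative population_end counts from the end (so -1 is the last part), that ROI groups are ranked by their summed count, and that ties keep the first sequence seen. The function names and the record keys (`cdr3_chain_2`, `full_length`, `count`) are hypothetical.

```python
from collections import defaultdict


def population_from_clone_name(clone_name, delimiter="_",
                               population_start=1, population_end=-1):
    """Return the population part of a clone name.

    Assumption: positions are 1-indexed and inclusive; a negative
    population_end counts from the end (-1 means the last part).
    """
    parts = clone_name.split(delimiter)
    # Turn the inclusive 1-indexed end into a Python slice stop.
    if population_end < 0:
        stop = len(parts) + population_end + 1
    else:
        stop = population_end
    return delimiter.join(parts[population_start - 1:stop])


def condense_by_roi(records, roi_key="cdr3_chain_2"):
    """Keep one full-length record per ROI value.

    Assumption: within each ROI group the record with the highest
    count is kept (first one seen wins a tie), and groups are
    returned in descending order of their summed count.
    """
    best = {}
    totals = defaultdict(int)
    for rec in records:
        roi = rec[roi_key]
        totals[roi] += rec["count"]
        if roi not in best or rec["count"] > best[roi]["count"]:
            best[roi] = rec
    return sorted(best.values(),
                  key=lambda r: totals[r[roi_key]], reverse=True)


if __name__ == "__main__":
    print(population_from_clone_name("popA_r2_A01", population_end=2))
    demo = [
        {"cdr3_chain_2": "ARDY", "full_length": "AAA", "count": 3},
        {"cdr3_chain_2": "ARDY", "full_length": "AAB", "count": 5},
        {"cdr3_chain_2": "GGW", "full_length": "CCC", "count": 6},
    ]
    print([r["full_length"] for r in condense_by_roi(demo)])
```

Under these assumptions, setting skip_clone_consolidation would correspond to bypassing `condense_by_roi` and returning every record unchanged.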
Hardware Parameters
- Machine hardware requirements
- Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800, Min: 256.0, Max: 8589934592
- Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address. Default: 64
- Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0, Min: 128.0, Max: 8589934592
- GPUs (gpu_count) type: integer: The number of GPUs to run this cube with. Default: 0, Max: 16
- CPUs (cpu_count) type: integer: The number of CPUs to run this cube with. Default: 1, Min: 1, Max: 128
- Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
- Spot policy (spot_policy) type: string: Control cube placement on spot market instances. Default: Prohibited. Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
- Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated). Default: “”
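The advice to request "a couple hundred MiB more than required" for memory_mb and disk_space can be written as a small calculation. This is only a sketch: the helper name `padded_request_mib` and the 200 MiB headroom are assumptions chosen for illustration, while the minimum and maximum values are the documented limits.

```python
def padded_request_mib(required_mib, headroom_mib=200.0,
                       minimum=256.0, maximum=8589934592.0):
    """Add headroom to an estimated requirement and clamp it.

    headroom_mib=200.0 is an assumed reading of "a couple hundred MiB";
    minimum and maximum default to the documented memory_mb limits.
    """
    if required_mib < 0:
        raise ValueError("required_mib must be non-negative")
    return min(max(required_mib + headroom_mib, minimum), maximum)


if __name__ == "__main__":
    # Estimated needs (hypothetical numbers) padded for overhead.
    print(padded_request_mib(1600.0))                 # value for memory_mb
    print(padded_request_mib(4900.0, minimum=128.0))  # value for disk_space
```

Passing `minimum=128.0` applies the documented lower limit for disk_space instead of the one for memory_mb.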
Metrics Parameters
- Cube Metric Parameters
- Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds. Default: 60. Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300. Min: 1, Max: 300
- Cube Metrics (cube_metrics) type: string: Set of metrics to be collected. Choices: cpu, disk, memory, network