Condense Total Number of Sequences by Region of Interest (ROI)

This cube removes redundancy by rank-ordering sequences by full-length count within their respective populations (sample_name + barcode_group), grouped by either cluster or ROI. The user selects how many representatives to keep per ROI or cluster, and the cube removes the rest. NOTE: If the cluster option is selected but no clustering has been performed, the cube defaults to the unique amino acid sequence for the given ROI.
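The rank-and-keep behavior described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the cube's actual implementation: the record layout (dictionaries with `sample_name`, `barcode_group`, `roi`, `count`, and an optional `cluster` key) and the function name `condense` are assumptions made for the example.

```python
from collections import defaultdict


def condense(records, number_of_top_reps=3, condense_by="Unique Only"):
    """Keep the top-N records per (population, ROI or cluster) by full-length count.

    Hypothetical helper: the field names below are assumed for illustration.
    """
    groups = defaultdict(list)
    for rec in records:
        # Population is sample_name + barcode_group, as in the default setting.
        population = (rec["sample_name"], rec["barcode_group"])
        cluster = rec.get("cluster")
        if condense_by == "Cluster" and cluster is not None:
            group = ("cluster", cluster)
        else:
            # No cluster available: fall back to the unique ROI sequence.
            group = ("roi", rec["roi"])
        groups[(population, group)].append(rec)

    kept = []
    for members in groups.values():
        # Rank by full-length count, highest first.
        ranked = sorted(members, key=lambda r: r["count"], reverse=True)
        # A value of 0 keeps every sequence in the group.
        kept.extend(ranked if number_of_top_reps == 0 else ranked[:number_of_top_reps])
    return kept
```

With `number_of_top_reps=2`, a population holding three sequences that share one ROI would retain only the two with the highest counts, while sequences from other populations or ROIs are ranked separately.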

Main Parameters

Parameter Name

Condense by Cluster or 100% Homology (Unique Only)

Keep Only Functional Sequences

Minimum Count for the Full-Length Sequence

Keep All Representatives by Group

Number of representatives to keep from given roi/cluster.

Population to consider for condensation

Region of Interest For Condensing

Minimum Count for the Region of Interest (ROI)

Minimum Percent for the Region of Interest (ROI)

Write the Condense Total Number of Sequence Output to CSV File


Calculation Parameters

  • Condense by Cluster or 100% Homology (Unique Only) (condense_by) type: string: Indicate whether top representative should be kept by defined cluster, if provided, or by 100% homology (unique only) by given roi.
    Default: Unique Only
    Choices: Unique Only, Cluster
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1 , Min: 1, Max: 128
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected
    Choices: cpu, disk, memory, network
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0 , Min: 128.0, Max: 8589934592
  • Keep Only Functional Sequences (filter_functional) type: boolean: Eliminates non-functional sequences (truncations, stop codons, frame-shifts).
    Default: True
  • Minimum Count for the Full-Length Sequence (fl_count) type: integer: Sets the minimum count for full-length sequences (uses the ‘count’ value already contained in the dataset).
    Default: 1 , Min: 1, Max: 10000000000
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0 , Max: 16
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Keep All Representatives by Group (keep_all_reps) type: boolean: If turned on, filters only by count, percent_roi_final, or count_roi_final, using the specified count or percent abundance cutoffs. Other parameters, such as ‘Number of representatives to keep from given roi/cluster’, are not considered.
    Default: False
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiBs (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800 , Min: 256.0, Max: 8589934592
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Number of representatives to keep from given roi/cluster. (number_of_top_reps) type: integer: Select the number of representatives to keep after rank-ordering by count within their respective populations (sample_name+barcode_group). If the value is 0, all sequences are kept.
    Default: 3 , Max: 100000000
  • Population to consider for condensation (population_to_condense) type: string: The fields that define a population for condensing; full-length sequences that differ in these fields are kept separate.
    Default: sample_name+barcode_group
    Choices: sample_name+barcode_group, sample_name, barcode_group, none, condense entire population
  • Region of Interest For Condensing (roi) type: string: Indicate the region of interest (ROI) for processing; any full-length sequences below the number specified are eliminated. If ‘Cluster’ is selected for the ‘Condense By’ option, the cluster is used (if clustering was performed) instead of this ROI.
    Default: CDR3 Chain_2 (Downstream Chain)
    Choices: Merged CDRs, CDR3 Chain_1 (Upstream Chain), CDR3 Chain_2 (Downstream Chain), HCDR3 and LCDR3, Full-Length
  • Minimum Count for the Region of Interest (ROI) (roi_count) type: integer: Sets the minimum count for a given region of interest; anything below it is removed.
    Default: 1 , Min: 1, Max: 10000000000
  • Minimum Percent for the Region of Interest (ROI) (roi_percent) type: decimal: Sets the minimum percent for a given region of interest; anything below it is removed.
    Default: 1e-12 , Min: 1e-12, Max: 100
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Write the Condense Total Number of Sequence Output to CSV File (write_to_csv_file) type: boolean: Writes the output to a CSV file after AbXtract processing, at the cost of additional time. If turned off, an empty file is written and the CSV export can be done in a separate step.
    Default: True
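As a rough illustration of how the count and percent cutoffs above could combine, the following Python sketch applies them to a single record. It is a sketch under stated assumptions, not the cube's code: the function name `passes_thresholds` and the boolean `functional` field are hypothetical, while `count`, `count_roi_final`, and `percent_roi_final` are the field names mentioned in the parameter descriptions. Values equal to a minimum are treated as passing, since only values below the cutoff are described as removed.

```python
def passes_thresholds(rec, fl_count=1, roi_count=1, roi_percent=1e-12,
                      filter_functional=True):
    """Return True if a record survives the abundance and functionality filters.

    Hypothetical helper; 'functional' is an assumed boolean field.
    """
    if filter_functional and not rec.get("functional", True):
        return False
    return (
        rec["count"] >= fl_count                      # full-length count cutoff
        and rec["count_roi_final"] >= roi_count       # ROI count cutoff
        and rec["percent_roi_final"] >= roi_percent   # ROI percent cutoff
    )


def filter_records(records, **cutoffs):
    """Apply the cutoffs to every record, preserving input order."""
    return [rec for rec in records if passes_thresholds(rec, **cutoffs)]
```

When ‘Keep All Representatives by Group’ is turned on, a filter of this kind would be the only step applied, with no further trimming to a fixed number of representatives.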

Hardware Parameters

Machine hardware requirements
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiBs (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800 , Min: 256.0, Max: 8589934592
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0 , Min: 128.0, Max: 8589934592
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0 , Max: 16
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1 , Min: 1, Max: 128
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”

Metrics Parameters

Cube Metric Parameters
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected
    Choices: cpu, disk, memory, network