Condense Dataset by Region of Interest by Most Abundant - AbXtract

This cube removes redundancy by rank ordering sequences by full-length count within their respective populations (by default, sample_name + barcode_group), grouped either by cluster or by ROI. The user selects how many representatives to keep per ROI or cluster, and the cube removes the rest. NOTE: if the cluster option is selected and no clustering was performed (e.g., Upstream annotation only), grouping defaults to the user-specified region of interest (ROI).
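The rank-and-keep step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the cube's implementation: records are plain dicts, and the field names 'roi' and 'cluster' are hypothetical (only 'count', 'sample_name' and 'barcode_group' are named in this page). The fallback to ROI grouping when no cluster labels exist mirrors the NOTE above.

```python
from collections import defaultdict


def condense(records, n_keep=3, by="Unique Only"):
    """Keep the n_keep most abundant full-length records per population and group.

    Hypothetical record fields: 'sample_name', 'barcode_group', 'count'
    (full-length count), 'roi' (ROI sequence) and an optional 'cluster' label.
    """
    # Assumption: fall back to ROI grouping unless every record has a cluster label,
    # mirroring the documented default when no clustering was performed.
    use_cluster = by == "Cluster" and all(r.get("cluster") is not None for r in records)

    groups = defaultdict(list)
    for r in records:
        population = (r["sample_name"], r["barcode_group"])
        key = r["cluster"] if use_cluster else r["roi"]
        groups[(population, key)].append(r)

    kept = []
    for members in groups.values():
        # Rank order by full-length count, most abundant first (ties keep input order).
        ranked = sorted(members, key=lambda r: r["count"], reverse=True)
        # n_keep == 0 keeps every sequence in the group.
        kept.extend(ranked if n_keep == 0 else ranked[:n_keep])
    return kept
```

For example, with n_keep=1, two barcode groups that share the same ROI each retain their own most abundant record, because populations are kept separate.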

Promoted Parameters

  • Select Dataset (Can Be Multiple) To Be Filtered (data_source) : The dataset(s) to read records from.
  • Keep All Representatives by Group (boolean) : If turned on, no representatives are removed by rank; records are filtered only by percent_roi_final or count_roi_final against the desired percent or count abundance cutoffs.
    Default: False
  • Condense By Cluster Or 100% Homology (Unique Only) (string) : Indicates whether the top representatives are kept per defined cluster (if clustering was performed) or per unique sequence of the given ROI, i.e., 100% homology (unique only).
    Default: Unique Only
    Choices: Unique Only, Cluster
  • Keep Only Functional Sequences (boolean) : Eliminates non-functional sequences such as truncations, stop codons, and frameshifts.
    Default: True
  • Number of Representatives to Keep from Given ROI/cluster (integer) : Select the number of representatives to keep after rank ordering by count within their respective populations (sample_name+barcode_group). A value of 0 keeps all sequences.
    Default: 3 Max: 100000000
  • Minimum Count for the Full-Length Sequence (integer) : Sets the minimum count for full-length sequences, using the value ‘count’ already contained in the dataset.
    Default: 1 Max: 10000000000
  • Minimum Count for the Region of Interest (ROI) (integer) : Sets the minimum count for a given region of interest; all sequences below it are removed.
    Default: 1 Max: 10000000000
  • Minimum Percent for the Region of Interest (ROI) (decimal) : Sets the minimum percent for a given region of interest; all sequences below it are removed.
    Default: 1e-12 Max: 100
  • Population to Consider for Condensation (string) : The fields used to define populations for condensation; full-length sequences that belong to different populations are kept separate.
    Default: sample_name+barcode_group
    Choices: sample_name+barcode_group, sample_name, barcode_group, none, condense entire population
  • Region of Interest For Filtering (string) : Indicates the region of interest used for processing; full-length sequences below the specified cutoffs are eliminated.
    Default: HCDR3 and LCDR3
    Choices: Merged CDRs, CDR3 Chain_1 (Upstream Chain), CDR3 Chain_2 (Downstream Chain), HCDR3 and LCDR3, Full-Length, Including Framework
  • Output name of dataset to be condensed (dataset_out) : Name of the single output dataset containing the sequences filtered by most abundant ROI.
    Default: filtered.
  • Failed Dataset Output Name (dataset_out) : Contains failed records from both upstream and downstream processes.
    Default: problematic.
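The filtering parameters above (functional-only, minimum full-length count, minimum ROI count and percent, and the population choice) can be illustrated with the following Python sketch. It is a hypothetical reading, not the cube's code: the 'functional' and 'roi' fields are assumed, and the way count_roi_final and percent_roi_final are computed here (summed counts per ROI within a population, and that sum as a percentage of the population total) is an assumption, since this page does not define them.

```python
from collections import defaultdict

# Hypothetical mapping of the "Population to Consider for Condensation" choices to
# record fields. The "none" choice is not modeled because its behavior is not described.
POPULATION_FIELDS = {
    "sample_name+barcode_group": ("sample_name", "barcode_group"),
    "sample_name": ("sample_name",),
    "barcode_group": ("barcode_group",),
    "condense entire population": (),  # assumption: all records form one population
}


def prefilter(records, population="sample_name+barcode_group", functional_only=True,
              min_count=1, min_roi_count=1, min_roi_percent=1e-12):
    """Apply the functional, full-length count and ROI count/percent cutoffs."""
    fields = POPULATION_FIELDS[population]
    if functional_only:
        records = [r for r in records if r["functional"]]  # hypothetical boolean field
    records = [r for r in records if r["count"] >= min_count]

    # Assumption: ROI count is the summed full-length count of records sharing an ROI
    # within a population; ROI percent is that sum relative to the population total.
    roi_totals = defaultdict(int)
    pop_totals = defaultdict(int)
    for r in records:
        pop = tuple(r[f] for f in fields)
        roi_totals[(pop, r["roi"])] += r["count"]
        pop_totals[pop] += r["count"]

    kept = []
    for r in records:
        pop = tuple(r[f] for f in fields)
        roi_count = roi_totals[(pop, r["roi"])]
        total = pop_totals[pop]
        roi_percent = 100.0 * roi_count / total if total else 0.0
        if roi_count >= min_roi_count and roi_percent >= min_roi_percent:
            kept.append({**r, "count_roi_final": roi_count, "percent_roi_final": roi_percent})
    return kept
```

Under this reading, turning on Keep All Representatives by Group would correspond to stopping after this step instead of also keeping only the top-ranked representatives.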

Hidden Parameters
  • Minimum Count for the Region of Interest (ROI) (integer) : Sets the minimum count for a given region of interest; all sequences below it are removed.
    Default: 1 Min: 1 Max: 10000000000
  • Minimum Percent for the Region of Interest (ROI) (decimal) : Sets the minimum percent for a given region of interest; all sequences below it are removed.
    Default: 1e-12 Min: 1e-12 Max: 100
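The documented bounds of the two hidden ROI cutoffs can be checked with a small Python helper like the one below. The function name roi_cutoff_errors is hypothetical and only encodes the Min/Max values listed above; it returns a list of problems rather than raising, so a caller can report every violation at once. Because the percent floor is 1e-12 rather than 0, the default effectively removes only ROIs with a zero share at realistic read counts.

```python
def roi_cutoff_errors(min_roi_count=1, min_roi_percent=1e-12):
    """Return messages for hidden ROI cutoffs that fall outside the documented bounds."""
    errors = []
    # Minimum ROI count: integer, Min 1, Max 10000000000 (bool is rejected explicitly).
    if isinstance(min_roi_count, bool) or not isinstance(min_roi_count, int):
        errors.append("minimum ROI count must be an integer")
    elif not 1 <= min_roi_count <= 10_000_000_000:
        errors.append("minimum ROI count must be between 1 and 10000000000")
    # Minimum ROI percent: decimal, Min 1e-12, Max 100.
    if not 1e-12 <= min_roi_percent <= 100:
        errors.append("minimum ROI percent must be between 1e-12 and 100")
    return errors
```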