Condense Dataset by Region of Interest by Most Abundant - AbXtract
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Solution-based/Biologics/Antibody Design
Role-based/Bioinformatician
Role-based/Biologist
Product-based/AbXtract
Description
This floe removes redundancy by rank-ordering the sequences by full-length count within their respective populations (sample_name + barcode_group), grouped either by cluster or by ROI. The user selects how many representatives to keep per ROI or cluster, and the cube removes the rest. NOTE: If the cluster option is selected but no clustering was performed (e.g., upstream annotation only), the floe defaults to the user-specified region of interest (ROI).
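The rank-and-keep logic described above can be sketched in Python as follows. This is a minimal illustration, not the cube's actual implementation: it assumes each full-length sequence is a dict with `sample_name`, `barcode_group`, `count`, and a hypothetical ROI column (here `cdr3_chain_2`), and the function name `condense_top_reps` is also hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def condense_top_reps(records: List[dict], roi_field: str, top_n: int) -> List[dict]:
    """Keep the top_n most abundant full-length sequences per (population, ROI) group.

    Hypothetical sketch: the population is sample_name + barcode_group, and
    top_n == 0 keeps every sequence, mirroring the number_of_top_reps description.
    """
    groups: Dict[Tuple[str, str, str], List[dict]] = defaultdict(list)
    for rec in records:
        key = (rec["sample_name"], rec["barcode_group"], rec[roi_field])
        groups[key].append(rec)

    kept: List[dict] = []
    for members in groups.values():
        # Rank-order by full-length count, most abundant first
        ranked = sorted(members, key=lambda r: r["count"], reverse=True)
        kept.extend(ranked if top_n == 0 else ranked[:top_n])
    return kept


records = [
    {"sample_name": "S1", "barcode_group": "B1", "cdr3_chain_2": "QQYNSY", "count": 5},
    {"sample_name": "S1", "barcode_group": "B1", "cdr3_chain_2": "QQYNSY", "count": 3},
    {"sample_name": "S1", "barcode_group": "B1", "cdr3_chain_2": "QQYNSY", "count": 1},
    {"sample_name": "S2", "barcode_group": "B1", "cdr3_chain_2": "QQYNSY", "count": 2},
]
print([r["count"] for r in condense_top_reps(records, "cdr3_chain_2", top_n=2)])  # → [5, 3, 2]
```

The S2 sequence survives even though it is the least abundant overall, because it belongs to a different population and is ranked separately.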
Promoted Parameters
Title in user interface (promoted name)
Key Condensing Parameters
Keep All Representatives by Group (keep_all_reps): If turned on, filters only by the specified count or percent abundance cutoffs (count, percent_roi_final, or count_roi_final). Other parameters, such as ‘Number of representatives to keep from given roi/cluster’, are not considered.
Required
Type: boolean
Default: False
Choices: [True, False]
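The effect of keep_all_reps can be pictured as a branch that skips the top-N ranking and applies only the abundance cutoffs. In this hypothetical sketch the record keys `count`, `count_roi_final`, and `percent_roi_final` follow the field names above, while the functions `passes_cutoffs` and `condense` and their arguments are illustrative assumptions; grouping by population and ROI is omitted for brevity.

```python
from typing import List, Optional


def passes_cutoffs(rec: dict, min_count: int, min_roi_count: Optional[int],
                   min_roi_percent: Optional[float]) -> bool:
    """True if the record meets every cutoff that was specified (None = not specified)."""
    if rec["count"] < min_count:
        return False
    if min_roi_count is not None and rec["count_roi_final"] < min_roi_count:
        return False
    if min_roi_percent is not None and rec["percent_roi_final"] < min_roi_percent:
        return False
    return True


def condense(records: List[dict], keep_all_reps: bool, top_n: int, min_count: int = 1,
             min_roi_count: Optional[int] = None,
             min_roi_percent: Optional[float] = None) -> List[dict]:
    filtered = [r for r in records
                if passes_cutoffs(r, min_count, min_roi_count, min_roi_percent)]
    if keep_all_reps:
        # Only the abundance cutoffs apply; top_n is ignored
        return filtered
    # Otherwise keep the top_n most abundant (single-group simplification)
    ranked = sorted(filtered, key=lambda r: r["count"], reverse=True)
    return ranked if top_n == 0 else ranked[:top_n]


records = [
    {"count": 10, "count_roi_final": 10, "percent_roi_final": 62.5},
    {"count": 4, "count_roi_final": 4, "percent_roi_final": 25.0},
    {"count": 1, "count_roi_final": 1, "percent_roi_final": 6.25},
]
print([r["count"] for r in condense(records, keep_all_reps=True, top_n=1, min_count=2)])  # → [10, 4]
print([r["count"] for r in condense(records, keep_all_reps=False, top_n=1, min_count=2)])  # → [10]
```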
Population to consider for condensation (population_to_condense): The fields that define a population for condensing. Full-length sequences from different populations are kept separate.
Required
Type: string
Default: sample_name+barcode_group
Choices: [‘sample_name+barcode_group’, ‘sample_name’, ‘barcode_group’, ‘none, condense entire population’]
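One way to picture population_to_condense is as a choice of grouping key. This hypothetical sketch assumes records are dicts with `sample_name` and `barcode_group` fields and uses the choice strings listed above; the helper name `population_key` is illustrative only.

```python
from typing import Tuple

# Map each population_to_condense choice to the fields that define a population
POPULATION_FIELDS = {
    "sample_name+barcode_group": ("sample_name", "barcode_group"),
    "sample_name": ("sample_name",),
    "barcode_group": ("barcode_group",),
    "none, condense entire population": (),
}


def population_key(rec: dict, population_to_condense: str) -> Tuple[str, ...]:
    """Return the tuple that identifies the record's population.

    An empty tuple means every record falls into one shared population.
    """
    return tuple(rec[field] for field in POPULATION_FIELDS[population_to_condense])


rec = {"sample_name": "S1", "barcode_group": "B2"}
print(population_key(rec, "sample_name+barcode_group"))  # → ('S1', 'B2')
print(population_key(rec, "none, condense entire population"))  # → ()
```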
Condense by Cluster or 100% Homology (Unique Only) (condense_by): Indicates whether the top representatives are kept per defined cluster (if clustering was performed) or per unique sequence (100% homology) of the given ROI.
Required
Type: string
Default: Unique Only
Choices: [‘Unique Only’, ‘Cluster’]
Region of Interest For Condensing (roi): The region of interest (ROI) used for condensing; full-length sequences ranked below the specified number of representatives for each ROI are eliminated. If ‘Cluster’ is selected for the ‘Condense By’ option, the cluster (if clustering was performed) is used instead of this ROI.
Required
Type: string
Default: CDR3 Chain_2 (Downstream Chain)
Choices: [‘Merged CDRs’, ‘CDR3 Chain_1 (Upstream Chain)’, ‘CDR3 Chain_2 (Downstream Chain)’, ‘HCDR3 and LCDR3’, ‘Full-Length’]
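The interaction between condense_by and roi, including the fallback to the ROI when no clustering was performed, can be sketched as a key-selection function. The column names in `ROI_FIELDS` (for example `cdr3_chain_2`, or `hcdr3` plus `lcdr3` for the combined option) and the `cluster` field are hypothetical assumptions; the dataset's real columns may differ.

```python
from typing import Optional

# Hypothetical mapping from ROI choice to record field(s); real column names may differ
ROI_FIELDS = {
    "Merged CDRs": ("merged_cdrs",),
    "CDR3 Chain_1 (Upstream Chain)": ("cdr3_chain_1",),
    "CDR3 Chain_2 (Downstream Chain)": ("cdr3_chain_2",),
    "HCDR3 and LCDR3": ("hcdr3", "lcdr3"),
    "Full-Length": ("full_length",),
}


def condense_key(rec: dict, condense_by: str, roi: str) -> tuple:
    """Key identifying sequences that compete for the same representative slots."""
    cluster: Optional[str] = rec.get("cluster")
    if condense_by == "Cluster" and cluster is not None:
        return ("cluster", cluster)
    # 'Unique Only', or 'Cluster' without clustering: fall back to the ROI sequence(s)
    return ("roi",) + tuple(rec[field] for field in ROI_FIELDS[roi])


roi = "CDR3 Chain_2 (Downstream Chain)"
clustered = {"cluster": "c7", "cdr3_chain_2": "QQYNSY"}
unclustered = {"cdr3_chain_2": "QQYNSY"}
print(condense_key(clustered, "Cluster", roi))  # → ('cluster', 'c7')
print(condense_key(unclustered, "Cluster", roi))  # → ('roi', 'QQYNSY')
```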
Number of representatives to keep from given roi/cluster. (number_of_top_reps): The number of representatives to keep after rank-ordering by count within their respective populations (e.g., sample_name+barcode_group). If the value is 0, all sequences are kept.
Required
Type: integer
Default: 3
Minimum Count for the Full-Length Sequence (fl_count): Sets the minimum count for full-length sequences, using the ‘count’ value already contained in the dataset.
Required
Type: integer
Default: 1
Minimum Count for the Region of Interest (ROI) (roi_count): Sets the minimum count for a given region of interest; all sequences below it are removed.
Required
Type: integer
Default: 1
Minimum Percent for the Region of Interest (ROI) (roi_percent): Sets the minimum percent for a given region of interest; all sequences below it are removed.
Required
Type: decimal
Default: 1e-12
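The three abundance cutoffs (fl_count, roi_count, roi_percent) can be sketched together. This is an assumption-heavy illustration: it assumes the ROI count is the summed full-length count of all records sharing an ROI within a sample_name + barcode_group population, and that the ROI percent is that sum expressed on a 0 to 100 scale relative to the population total. Neither the aggregation nor the percent scale is confirmed by this page, and `apply_abundance_cutoffs` is a hypothetical name.

```python
from collections import Counter
from typing import List


def apply_abundance_cutoffs(records: List[dict], roi_field: str, fl_count: int = 1,
                            roi_count: int = 1, roi_percent: float = 1e-12) -> List[dict]:
    """Drop records below the full-length count, ROI count, or ROI percent cutoffs.

    Assumption: ROI count = summed full-length count of records sharing the ROI
    within a population; ROI percent = that sum as a percentage of the population total.
    """
    roi_totals: Counter = Counter()
    pop_totals: Counter = Counter()
    for rec in records:
        pop = (rec["sample_name"], rec["barcode_group"])
        roi_totals[(pop, rec[roi_field])] += rec["count"]
        pop_totals[pop] += rec["count"]

    kept = []
    for rec in records:
        pop = (rec["sample_name"], rec["barcode_group"])
        r_count = roi_totals[(pop, rec[roi_field])]
        # Guard against an all-zero population to avoid dividing by zero
        r_percent = 100.0 * r_count / pop_totals[pop] if pop_totals[pop] else 0.0
        if rec["count"] >= fl_count and r_count >= roi_count and r_percent >= roi_percent:
            kept.append(rec)
    return kept


records = [
    {"sample_name": "S1", "barcode_group": "B1", "cdr3_chain_2": "AAA", "count": 8},
    {"sample_name": "S1", "barcode_group": "B1", "cdr3_chain_2": "AAA", "count": 1},
    {"sample_name": "S1", "barcode_group": "B1", "cdr3_chain_2": "CCC", "count": 1},
]
# AAA makes up 90% of the population and CCC 10%
print([r["count"] for r in apply_abundance_cutoffs(records, "cdr3_chain_2", roi_percent=20.0)])  # → [8, 1]
```

Note that the low-count AAA sequence survives the percent cutoff because the cutoff is applied to its ROI as a whole; adding `fl_count=2` would remove it.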
Keep Only Functional Sequences (filter_functional): Eliminates non-functional sequences, such as those with truncations, stop codons, or frame shifts.
Required
Type: boolean
Default: True
Choices: [True, False]
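The functional filter amounts to dropping any sequence flagged with a defect. In this minimal sketch the boolean flags `truncated`, `has_stop_codon`, and `frame_shift`, and the function `keep_functional`, are hypothetical; the dataset's real annotation columns may differ.

```python
from typing import List

# Hypothetical per-record defect flags; real annotation columns may differ
DEFECT_FLAGS = ("truncated", "has_stop_codon", "frame_shift")


def keep_functional(records: List[dict], filter_functional: bool = True) -> List[dict]:
    """Drop records flagged with any defect when filter_functional is True."""
    if not filter_functional:
        return list(records)
    return [r for r in records if not any(r.get(flag, False) for flag in DEFECT_FLAGS)]


records = [
    {"id": "a"},
    {"id": "b", "has_stop_codon": True},
    {"id": "c", "frame_shift": True},
]
print([r["id"] for r in keep_functional(records)])  # → ['a']
```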