Condense Dataset by Region of Interest by Most Abundant - AbXtract

Category Paths

  • Solution-based/Biologics/Antibody Design

  • Role-based/Bioinformatician

  • Role-based/Biologist

  • Product-based/AbXtract

Description

This cube removes redundancy by rank-ordering sequences by full-length count within their respective populations (sample_name + barcode_group), grouped either by cluster or by region of interest (ROI). The user selects how many representatives to keep per ROI or cluster, and the cube removes the rest. NOTE: If the cluster option is selected but no clustering was performed (e.g., Upstream annotation only), the cube falls back to the user-specified region of interest (ROI).
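A minimal sketch of the condensing step described above, using plain Python dictionaries. The record fields (`sample_name`, `barcode_group`, `roi`, `count`) and the `condense` function are assumptions for illustration only; the cube's internal implementation and its tie-breaking rule are not documented here.

```python
from collections import defaultdict

# Hypothetical records, one per full-length sequence. The field names
# ("roi", "count", ...) are illustrative assumptions, not the cube's schema.
records = [
    {"sample_name": "S1", "barcode_group": "A", "roi": "CARDYW", "count": 50},
    {"sample_name": "S1", "barcode_group": "A", "roi": "CARDYW", "count": 30},
    {"sample_name": "S1", "barcode_group": "A", "roi": "CARDYW", "count": 5},
    {"sample_name": "S1", "barcode_group": "A", "roi": "CTTGFW", "count": 12},
    {"sample_name": "S2", "barcode_group": "A", "roi": "CARDYW", "count": 7},
]


def condense(records, top_n):
    """Keep the top_n most abundant full-length sequences per (population, ROI)."""
    groups = defaultdict(list)
    for rec in records:
        # Population = sample_name + barcode_group; within it, group by ROI.
        key = (rec["sample_name"], rec["barcode_group"], rec["roi"])
        groups[key].append(rec)
    kept = []
    for members in groups.values():
        # Rank by full-length count, most abundant first, and keep the top_n.
        ranked = sorted(members, key=lambda r: r["count"], reverse=True)
        kept.extend(ranked[:top_n])
    return kept


condensed = condense(records, top_n=2)
print(sorted(r["count"] for r in condensed))  # [7, 12, 30, 50]
```

Note that the same ROI (`CARDYW`) in sample S2 is ranked independently of sample S1, so its single sequence survives even though it is less abundant than the S1 sequences that were dropped.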

Promoted Parameters

Title in user interface (promoted name)

Key Condensing Parameters

Keep All Representatives by Group (keep_all_reps): If turned on, filters only by the count, percent_roi_final, or count_roi_final cutoffs (where specified), using the desired count or percent abundance thresholds. Other parameters, such as ‘Number of representatives to keep from given roi/cluster’, are not considered.

  • Required

  • Type: boolean

  • Default: False

  • Choices: [True, False]
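The difference between the two keep_all_reps modes can be sketched for a single (population, ROI/cluster) group as follows. The `select_members` helper, the `min_count` argument standing in for the abundance cutoffs, and the record layout are hypothetical; the sketch only assumes the behavior stated above (cutoffs always apply, the representative count is ignored when keep_all_reps is on, and a representative count of 0 keeps everything).

```python
def select_members(members, keep_all_reps, number_of_top_reps, min_count=1):
    """Filter one (population, ROI/cluster) group of hypothetical records."""
    # The abundance cutoff applies whether or not keep_all_reps is on.
    passing = [r for r in members if r["count"] >= min_count]
    if keep_all_reps:
        # Cutoffs only: the number-of-representatives setting is ignored.
        return passing
    ranked = sorted(passing, key=lambda r: r["count"], reverse=True)
    # A value of 0 means "keep all sequences".
    if number_of_top_reps == 0:
        return ranked
    return ranked[:number_of_top_reps]


group = [{"count": c} for c in (50, 30, 5, 1)]
print([r["count"] for r in select_members(group, True, 2, min_count=5)])   # [50, 30, 5]
print([r["count"] for r in select_members(group, False, 2, min_count=5)])  # [50, 30]
```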

Population to consider for condensation (population_to_condense): The fields that define the populations within which sequences are condensed. Full-length sequences from different populations are kept separate.

  • Required

  • Type: string

  • Default: sample_name+barcode_group

  • Choices: [‘sample_name+barcode_group’, ‘sample_name’, ‘barcode_group’, ‘none, condense entire population’]
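One way to picture this option is as the set of fields that make up a population key. In the sketch below the choice strings are the documented options, while `POPULATION_FIELDS`, `population_key`, and the dictionary record layout are hypothetical.

```python
# Choice strings are the documented options; the record layout is assumed.
POPULATION_FIELDS = {
    "sample_name+barcode_group": ("sample_name", "barcode_group"),
    "sample_name": ("sample_name",),
    "barcode_group": ("barcode_group",),
    "none, condense entire population": (),
}


def population_key(record, population_to_condense):
    """Return the tuple that identifies a record's population."""
    return tuple(record[f] for f in POPULATION_FIELDS[population_to_condense])


a = {"sample_name": "S1", "barcode_group": "A"}
b = {"sample_name": "S1", "barcode_group": "B"}
default = "sample_name+barcode_group"
# Kept separate under the default, pooled when only sample_name matters.
print(population_key(a, default) == population_key(b, default))    # False
print(population_key(a, "sample_name") == population_key(b, "sample_name"))  # True
```

With ‘none, condense entire population’ every record gets the empty key, so the whole dataset is ranked as one population.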

Condense by Cluster or 100% Homology (Unique Only) (condense_by): Whether to keep the top representatives per defined cluster (if clustering was performed) or per unique sequence of the given ROI (100% homology).

  • Required

  • Type: string

  • Default: Unique Only

  • Choices: [‘Unique Only’, ‘Cluster’]
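The choice between cluster and unique-only grouping, including the fallback to the ROI when no clustering was performed, might look like the sketch below. The `grouping_field` helper, the `cluster` and `cdr3_chain_2` field names, and the rule that clustering counts as performed when any record carries a cluster id are all assumptions for illustration.

```python
def grouping_field(records, condense_by, roi_field):
    """Pick the field that defines the groups ranked within each population."""
    # Assumption: clustering "was performed" if any record carries a cluster id.
    has_clusters = any(r.get("cluster") is not None for r in records)
    if condense_by == "Cluster" and has_clusters:
        return "cluster"
    # 'Unique Only', or 'Cluster' without clustering: group by identical
    # (100% homology) ROI sequences instead.
    return roi_field


unclustered = [{"cdr3_chain_2": "CQQYW"}]
clustered = [{"cdr3_chain_2": "CQQYW", "cluster": 4}]
print(grouping_field(unclustered, "Cluster", "cdr3_chain_2"))    # cdr3_chain_2
print(grouping_field(clustered, "Cluster", "cdr3_chain_2"))      # cluster
print(grouping_field(clustered, "Unique Only", "cdr3_chain_2"))  # cdr3_chain_2
```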

Region of Interest For Condensing (roi): The region of interest (ROI) used to group sequences for processing; full-length sequences that rank below the specified number of representatives for their ROI are eliminated. If ‘Cluster’ is selected for condense_by and clustering was performed, the cluster is used instead of this ROI.

  • Required

  • Type: string

  • Default: CDR3 Chain_2 (Downstream Chain)

  • Choices: [‘Merged CDRs’, ‘CDR3 Chain_1 (Upstream Chain)’, ‘CDR3 Chain_2 (Downstream Chain)’, ‘HCDR3 and LCDR3’, ‘Full-Length’]
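Under ‘Unique Only’, two sequences share a group only when their ROI matches exactly. For the compound ‘HCDR3 and LCDR3’ choice this plausibly means both CDR3s must match, as sketched below. The choice strings are the documented options; the field names in `ROI_FIELDS` and the `roi_key` helper are hypothetical.

```python
# Hypothetical field names for each documented ROI choice.
ROI_FIELDS = {
    "Merged CDRs": ("merged_cdrs",),
    "CDR3 Chain_1 (Upstream Chain)": ("cdr3_chain_1",),
    "CDR3 Chain_2 (Downstream Chain)": ("cdr3_chain_2",),
    "HCDR3 and LCDR3": ("hcdr3", "lcdr3"),
    "Full-Length": ("full_length",),
}


def roi_key(record, roi):
    """Return the value(s) that must match exactly for two records to share an ROI."""
    return tuple(record[f] for f in ROI_FIELDS[roi])


x = {"hcdr3": "CARDYW", "lcdr3": "CQQYNSW"}
y = {"hcdr3": "CARDYW", "lcdr3": "CQQSYSW"}
# Same HCDR3 but different LCDR3: separate groups under the compound ROI.
print(roi_key(x, "HCDR3 and LCDR3") == roi_key(y, "HCDR3 and LCDR3"))  # False
```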

Number of representatives to keep from given roi/cluster (number_of_top_reps): The number of representatives to keep per ROI or cluster after rank-ordering by count within each population (sample_name+barcode_group by default; see population_to_condense). A value of 0 keeps all sequences.

  • Required

  • Type: integer

  • Default: 3

Minimum Count for the Full-Length Sequence (fl_count): Sets the minimum count for full-length sequences, using the ‘count’ value already contained in the dataset.

  • Required

  • Type: integer

  • Default: 1

Minimum Count for the Region of Interest (ROI) (roi_count): Sets the minimum count for a given region of interest; all ROIs below this count are removed.

  • Required

  • Type: integer

  • Default: 1

Minimum Percent for the Region of Interest (ROI) (roi_percent): Sets the minimum percent abundance for a given region of interest; all ROIs below this percent are removed.

  • Required

  • Type: decimal

  • Default: 1e-12
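How the three abundance cutoffs (fl_count, roi_count, roi_percent) could interact within one population is sketched below. The page does not say how ROI counts and percents are computed, so this assumes an ROI's count is the sum of the full-length counts that share it and its percent is that sum relative to the population total on a 0 to 100 scale; `apply_abundance_cutoffs` and the record layout are hypothetical.

```python
from collections import Counter


def apply_abundance_cutoffs(records, fl_count=1, roi_count=1, roi_percent=1e-12):
    """Drop records failing the full-length or ROI cutoffs (one population assumed)."""
    # Assumption: an ROI's count is the sum of the full-length counts sharing it,
    # and its percent is that sum relative to the population total (0-100 scale).
    roi_totals = Counter()
    for r in records:
        roi_totals[r["roi"]] += r["count"]
    total = sum(roi_totals.values())
    kept = []
    for r in records:
        roi_c = roi_totals[r["roi"]]
        roi_pct = 100.0 * roi_c / total if total else 0.0
        if r["count"] >= fl_count and roi_c >= roi_count and roi_pct >= roi_percent:
            kept.append(r)
    return kept


population = [
    {"roi": "A", "count": 90},
    {"roi": "A", "count": 5},
    {"roi": "B", "count": 3},
    {"roi": "B", "count": 2},
]
print([r["count"] for r in apply_abundance_cutoffs(population)])                  # [90, 5, 3, 2]
print([r["count"] for r in apply_abundance_cutoffs(population, fl_count=3)])      # [90, 5, 3]
print([r["count"] for r in apply_abundance_cutoffs(population, roi_count=6)])     # [90, 5]
print([r["count"] for r in apply_abundance_cutoffs(population, roi_percent=10)])  # [90, 5]
```

With the defaults (1, 1, and 1e-12), any sequence with a nonzero count passes, which is consistent with these cutoffs being off unless the user raises them.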

Keep Only Functional Sequences (filter_functional): Eliminates non-functional sequences, such as those with truncations, stop codons, or frame shifts.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]
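A functionality filter of this kind can be sketched with boolean flags. The flag names below are borrowed from the AIRR Rearrangement schema (`stop_codon`, `vj_in_frame`, `complete_vdj`) purely as an assumption; this page does not state which fields AbXtract uses, and `is_functional` is a hypothetical helper.

```python
def is_functional(record):
    """Heuristic functionality check using AIRR-style boolean flags (assumed present)."""
    return (
        not record.get("stop_codon", False)     # no stop codon in the aligned sequence
        and record.get("vj_in_frame", False)    # V and J in frame (no frame shift)
        and record.get("complete_vdj", False)   # alignment spans full V(D)J (not truncated)
    )


reads = [
    {"id": 1, "stop_codon": False, "vj_in_frame": True, "complete_vdj": True},
    {"id": 2, "stop_codon": True, "vj_in_frame": True, "complete_vdj": True},
    {"id": 3, "stop_codon": False, "vj_in_frame": False, "complete_vdj": True},
    {"id": 4, "stop_codon": False, "vj_in_frame": True},
]
print([r["id"] for r in reads if is_functional(r)])  # [1]
```

Missing flags are treated as failing here, a conservative choice; a real pipeline might instead treat unknown values differently.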