Overlap Among Different Datasets - AbXtract

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Solution-based/Biologics/Antibody Design

  • Role-based/Bioinformatician

  • Role-based/Biologist

  • Product-based/AbXtract

Description

Insert all the datasets from different source populations (e.g., barcode groups) and select the region of interest (ROI). The floe creates an overlap_population field that indicates all of the populations in which a given ROI is found. One can use the Modify the Sample Name/Barcode Group Floe to adjust the sample name or barcode group of a dataset. One may also relax the stringency of the overlap among populations by increasing the edit distance for the chosen method (Levenshtein distance or Hamming distance).
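The overlap described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the floe's implementation: the function names, the group names, and the sequences are hypothetical, and treating unequal-length sequences as non-matching under Hamming distance is an assumption.

```python
from typing import Callable, Dict, List


def hamming(a: str, b: str) -> float:
    """Count substitutions; assume unequal-length sequences never match."""
    if len(a) != len(b):
        return float("inf")
    return sum(x != y for x, y in zip(a, b))


def overlap_populations(
    populations: Dict[str, List[str]],
    max_distance: int = 0,
    distance: Callable[[str, str], float] = hamming,
) -> Dict[str, List[str]]:
    """Map each ROI sequence to the populations that contain a match."""
    result: Dict[str, List[str]] = {}
    for seqs in populations.values():
        for roi in seqs:
            if roi in result:
                continue
            # A population counts as overlapping if any of its ROIs lies
            # within max_distance of this ROI (its own population included).
            result[roi] = sorted(
                name
                for name, candidates in populations.items()
                if any(distance(roi, c) <= max_distance for c in candidates)
            )
    return result


# Hypothetical barcode groups and CDR3 sequences, for illustration only.
groups = {
    "round_1": ["CARDYW", "CAKGGW"],
    "round_2": ["CARDYW", "CARDFW"],
}
exact = overlap_populations(groups, max_distance=0)
relaxed = overlap_populations(groups, max_distance=1)
print(exact["CARDFW"])    # exact matching: found in round_2 only
print(relaxed["CARDFW"])  # one substitution from CARDYW, so round_1 also matches
```

With an edit distance of 0 only identical ROIs overlap; raising it to 1 lets near-identical sequences from another population count as the same ROI.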

Parameter title in user interface (promoted name)

  • Output CSV Filename (file_name) type: file_out: All records are written to this CSV file. The filename must include the .csv extension.
    Default: ngs_overlap.csv

  • Edit Distance Method For Overlap Among Different Barcode Groups (edit_distance_method_overlap) type: string: Indicates the edit distance method to apply when computing the overlap among populations. NOTE: Only in effect if the edit distance is not 0.
    Default: Levenshstein Distance
    Choices: Hamming Distance, Levenshstein Distance

  • Edit Distance for Overlap by ROI of Different Barcode Groups (edit_distance_overlap) type: integer: The edit distance allowed when comparing ROIs. If there are multiple downstream barcode groups, these will be compared to one another.
    Default: 0, Max: 100

  • Region of Interest For the Overlap (roi) type: string: Indicate the region of interest (ROI) for identifying regions of overlap among different barcode groups.
    Default: CDR3 Chain_2 (Downstream Chain)
    Choices: Merged CDRs, CDR3 Chain_1 (Upstream Chain), CDR3 Chain_2 (Downstream Chain), HCDR3 and LCDR3, Full-Length

  • Failed Dataset Output Name (data_out) type: dataset_out: Contains failed records from both upstream and downstream processes.
    Default: problematic.ngs_overlap

  • Output Name of the Overlapped Dataset (data_out) type: dataset_out: This consolidated dataset contains all records, with the overlapping populations, identified by their id, written to a field called overlap_population. NOTE: populations are also compared to themselves, so overlap contains values N>=2
    Default: ngs_overlap
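
The two choices for edit_distance_method_overlap differ in which edits they count. The following is a minimal sketch of the textbook definitions, not the floe's own code; the function names are hypothetical, and how the floe treats sequences of unequal length under Hamming distance is not stated here, so the error raised below is an assumption.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions that differ; defined for equal lengths only."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,             # delete x from a
                    current[j - 1] + 1,          # insert y into a
                    previous[j - 1] + (x != y),  # substitute, or match at no cost
                )
            )
        previous = current
    return previous[-1]


# Hypothetical CDR3 sequences, for illustration only.
same_length = ("CARDYW", "CARDFW")
one_shorter = ("CARDYW", "CARDW")

print(hamming_distance(*same_length))      # substitutions only
print(levenshtein_distance(*same_length))  # same edit counted as a substitution
print(levenshtein_distance(*one_shorter))  # a deletion, which Hamming cannot express
```

Hamming distance counts substitutions only, so it suits ROIs of fixed length, whereas Levenshtein distance also tolerates insertions and deletions between ROIs of different lengths.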