Custom SANGER Select of Additional NGS Representatives by Group - AbXtract

This floe identifies all non-redundant antibodies that share the same user-specified cluster or region of interest (ROI) as a SANGER sequence. If an OPTIONAL Custom SEQ ID file is provided (seq_id in column A, maximum number of clones in column B), it sets the number of non-redundant clones desired for each listed SANGER seq_id. If no custom file is provided, the floe automatically selects the desired number of leads for ALL non-redundant SANGER sequences.
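The selection described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the floe's actual implementation. The record keys (`id`, `cluster`, `sequence`), the function names, the CSV layout of the custom file, and the rule that a record counts as SANGER when its `id` contains the text "SANGER" are all hypothetical. Candidates are taken in input order, and metric-based ranking is not modelled.

```python
import csv
import io
from collections import defaultdict


def load_custom_limits(text):
    """Parse custom-file text: column A = seq_id, column B = max clones.

    Assumption: the file is comma-separated; rows whose second column is
    not a whole number (blank lines, a header row) are skipped.
    """
    limits = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2 or not row[1].strip().isdigit():
            continue
        limits[row[0].strip()] = int(row[1])
    return limits


def select_representatives(records, roi_field="cluster", default_max=10,
                           custom_max=None):
    """Return {sanger_id: [ngs_ids]} for NGS records sharing the SANGER ROI."""
    custom_max = custom_max or {}
    # Assumption: SANGER records are those whose 'id' contains "SANGER".
    sanger = [r for r in records if "SANGER" in r["id"]]
    ngs_by_roi = defaultdict(list)
    for r in records:
        if "SANGER" not in r["id"]:
            ngs_by_roi[r[roi_field]].append(r)

    selected = {}
    for s in sanger:
        # Assumption: when a custom file is given, only listed seq_ids are used.
        if custom_max and s["id"] not in custom_max:
            continue
        limit = custom_max.get(s["id"], default_max)
        seen, picks = set(), []
        for r in ngs_by_roi.get(s[roi_field], []):
            if r["sequence"] in seen:
                continue  # redundant full-length sequence, already picked
            seen.add(r["sequence"])
            picks.append(r["id"])
            if len(picks) == limit:
                break
        selected[s["id"]] = picks
    return selected


# Toy data, invented for illustration only.
records = [
    {"id": "SANGER_01", "cluster": 1, "sequence": "AAA"},
    {"id": "SANGER_02", "cluster": 2, "sequence": "CCC"},
    {"id": "ngs_1", "cluster": 1, "sequence": "AAT"},
    {"id": "ngs_2", "cluster": 1, "sequence": "AAT"},  # duplicate of ngs_1
    {"id": "ngs_3", "cluster": 1, "sequence": "AAG"},
    {"id": "ngs_4", "cluster": 2, "sequence": "CCT"},
]
all_picks = select_representatives(records)
custom_picks = select_representatives(
    records, custom_max=load_custom_limits("seq_id,max\nSANGER_01,1\n"))
```

Without a custom file every SANGER record receives up to `default_max` representatives; with one, the per-seq_id limit from column B takes precedence.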

Promoted Parameters

  • Select the Consolidated Dataset Containing SANGER and NGS Records (data_source) : This consolidated downstream dataset typically contains clustered or annotated records; populations are selected from it based on the region of interest (which may be a cluster). It MUST contain SANGER records, identified in the ‘id’ field.
  • Custom Input File with SEQ ID, OPTIONAL (file_in) : Input a file (column A = seq_id, column B = number of sequences desired) to set the number of representatives to select for each given cluster or unique region of interest.
  • Max Number Of Unique NGS Desired? (integer) : Please choose the maximum number of unique full-length sequences desired per clone. If a custom file is provided, its values override this parameter.
    Default: 10 Min: 1 Max: 1000000
  • Metrics for Ranking (string) : List the metrics in order of ranking priority (if none are given, records are ranked by full-length count).
    Default: [‘ROI Percent, Final Round Only’, ‘Full Length (Corrects for Illumina or PacBio), Percent’, ‘Liabilities Both Chains’, ‘Liabilities CDR3_2’]
    Choices: Full Length (Corrects for Illumina or PacBio), Count; Full Length (Corrects for Illumina or PacBio), Percent; ROI Count, Final Round Only; ROI Percent, Final Round Only; ROI Fold Enrichment, Final Round Only; ROI Log2 Enrichment, Final Round Only; Liabilities Both Chains; Liabilities Chain_2; Liabilities Chain_1; Liabilities CDR1_1; Liabilities CDR2_1; Liabilities CDR3_1; Liabilities CDR1_2; Liabilities CDR2_2; Liabilities CDR3_2; ROI Count, Early Round Only; ROI Percent, Early Round Only; Cluster Count (e.g. unique sequences per cluster); Cluster Percent (e.g. unique rep per cluster)
  • Region Of Interest (ROI) To Select Top Representatives (string) : Select the region of interest (ROI) or cluster used to match the desired sequence ID. IMPORTANT: if a cluster option is selected, all sequences must come from a dataset that was clustered in a single run.
    Default: Cluster
    Choices: Cluster; Cluster_CDR3_1; Cluster_CDR3_2; Merged CDRs; CDR3 Chain_1; CDR3 Chain_2; HCDR3 and LCDR3; Full-Length, Including Framework
  • Remove Non-Functional or Aberrant Sequences (boolean) :
    Default: True
  • Output Name of the Sanger Selected Population (dataset_out) : Output dataset to write to.
    Default: sanger_selected
  • Failed Dataset Output Name (dataset_out) : Contains failed records from both upstream and downstream processes.
    Default: problematic
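The ordered "Metrics for Ranking" parameter can be read as a multi-key sort: the first metric decides, and later metrics only break ties. The sketch below illustrates that idea and is not the floe's code. The function name `rank_records` is hypothetical, and the sort direction is an assumption: metrics whose name starts with "Liabilities" are treated as lower-is-better, all others as higher-is-better.

```python
FULL_LENGTH_COUNT = "Full Length (Corrects for Illumina or PacBio), Count"


def rank_records(records, metrics=None):
    """Sort records by the given metrics in priority order.

    Assumption: 'Liabilities ...' metrics rank ascending (fewer is better);
    every other metric ranks descending (larger is better).
    """
    # With no metrics given, fall back to full-length count.
    metrics = metrics or [FULL_LENGTH_COUNT]

    def sort_key(rec):
        key = []
        for name in metrics:
            value = rec[name]
            # Negate higher-is-better values so an ascending sort works.
            key.append(value if name.startswith("Liabilities") else -value)
        return tuple(key)

    return sorted(records, key=sort_key)


# Toy values, invented for illustration only.
candidates = [
    {"id": "a", "ROI Percent, Final Round Only": 5.0, "Liabilities Both Chains": 2},
    {"id": "b", "ROI Percent, Final Round Only": 5.0, "Liabilities Both Chains": 0},
    {"id": "c", "ROI Percent, Final Round Only": 9.0, "Liabilities Both Chains": 4},
]
ranked_ids = [
    r["id"]
    for r in rank_records(
        candidates,
        ["ROI Percent, Final Round Only", "Liabilities Both Chains"],
    )
]
```

Here `c` ranks first on ROI percent, and the tie between `a` and `b` is broken by the lower liability count of `b`.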
Hidden Parameters
  • Edit Distance (Option Not Available At Moment) (integer) : Only 100% identity by ROI or cluster is currently available.
    Default: 0 Max: 100
  • Edit Distance Method For Overlap (string) : Only 100% identity is available for this option.
    Default: 100% Identity
    Choices: 100% Identity
  • Eliminate Region Of Interest (string) : If an ROI is selected here, sequences whose ROI exactly matches that of the sequence of interest (e.g., SANGER ID or Sequence ID) are removed from the final population, so the user can explore different ROIs not initially identified within the sequence ID population.
    Default: Full-Length, Including Framework
    Choices: KEEP ALL MATCHING ROIs; Merged CDRs; CDR3 Chain_1; CDR3 Chain_2; HCDR3 and LCDR3; Full-Length, Including Framework
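
One possible reading of the "Eliminate Region Of Interest" option is a simple exclusion filter, sketched below. The function name, the record keys (`hcdr3`, `full_length`), and the use of `None` to stand in for "KEEP ALL MATCHING ROIs" are assumptions made for illustration; the floe's real behavior may differ.

```python
def eliminate_roi(candidates, reference, eliminate_field=None):
    """Drop candidates whose eliminate_field equals the reference's value.

    eliminate_field=None stands in for "KEEP ALL MATCHING ROIs".
    """
    if eliminate_field is None:
        return list(candidates)
    target = reference[eliminate_field]
    return [c for c in candidates if c[eliminate_field] != target]


# Toy sequences, invented for illustration only.
reference = {"id": "SANGER_01", "hcdr3": "ARDYW", "full_length": "EVQLARDYWGQ"}
candidates = [
    {"id": "ngs_1", "hcdr3": "ARDYW", "full_length": "EVQLARDYWGQ"},  # same as SANGER
    {"id": "ngs_2", "hcdr3": "ARDYW", "full_length": "QVQLARDYWGQ"},
    {"id": "ngs_3", "hcdr3": "ARGYW", "full_length": "EVQLARGYWGQ"},
]
kept_full = [c["id"] for c in eliminate_roi(candidates, reference, "full_length")]
kept_cdr3 = [c["id"] for c in eliminate_roi(candidates, reference, "hcdr3")]
kept_all = [c["id"] for c in eliminate_roi(candidates, reference)]
```

With the full-length field, only the exact copy of the SANGER sequence is dropped; with a narrower field such as the CDR3, every candidate carrying the same CDR3 is dropped, leaving only variants in that region.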