Key Fields¶
Liability Metric Fields¶
Field Name |
Type |
Description |
---|---|---|
liability_string_cdr1_aa_1 |
string |
‘+’-concatenated string of identified liabilities by type (e.g., ‘YYY - Polyspecificity + DG - Isomerization’) within chain_1 CDR1 (e.g., LCDR1 if orientation is 5’ VL and 3’ VH). |
liability_string_cdr2_aa_1 |
string |
‘+’-concatenated string of identified liabilities by type (e.g., ‘YYY - Polyspecificity + DG - Isomerization’) within chain_1 CDR2 (e.g., LCDR2 if orientation is 5’ VL and 3’ VH). |
liability_string_cdr3_aa_1 |
string |
‘+’-concatenated string of identified liabilities by type (e.g., ‘YYY - Polyspecificity + DG - Isomerization’) within chain_1 CDR3 (e.g., LCDR3 if orientation is 5’ VL and 3’ VH). |
liability_string_cdr1_aa_2 |
string |
‘+’-concatenated string of identified liabilities by type (e.g., ‘YYY - Polyspecificity + DG - Isomerization’) within chain_2 CDR1 (e.g., HCDR1 if orientation is 5’ VL and 3’ VH). |
liability_string_cdr2_aa_2 |
string |
‘+’-concatenated string of identified liabilities by type (e.g., ‘YYY - Polyspecificity + DG - Isomerization’) within chain_2 CDR2 (e.g., HCDR2 if orientation is 5’ VL and 3’ VH). |
liability_string_cdr3_aa_2 |
string |
‘+’-concatenated string of identified liabilities by type (e.g., ‘YYY - Polyspecificity + DG - Isomerization’) within chain_2 CDR3 (e.g., HCDR3 if orientation is 5’ VL and 3’ VH). |
liability_quant_cdr1_aa_1 |
integer |
Total count of liabilities identified within chain_1 CDR1 (e.g., LCDR1 if orientation is 5’ VL and 3’ VH). Each liability type is counted only once per CDR, even if it occurs more than once in that CDR. |
liability_quant_cdr2_aa_1 |
integer |
Total count of liabilities identified within chain_1 CDR2 (e.g., LCDR2 if orientation is 5’ VL and 3’ VH). Each liability type is counted only once per CDR, even if it occurs more than once in that CDR. |
liability_quant_cdr3_aa_1 |
integer |
Total count of liabilities identified within chain_1 CDR3 (e.g., LCDR3 if orientation is 5’ VL and 3’ VH). Each liability type is counted only once per CDR, even if it occurs more than once in that CDR. |
liability_quant_cdr1_aa_2 |
integer |
Total count of liabilities identified within chain_2 CDR1 (e.g., HCDR1 if orientation is 5’ VL and 3’ VH). Each liability type is counted only once per CDR, even if it occurs more than once in that CDR. |
liability_quant_cdr2_aa_2 |
integer |
Total count of liabilities identified within chain_2 CDR2 (e.g., HCDR2 if orientation is 5’ VL and 3’ VH). Each liability type is counted only once per CDR, even if it occurs more than once in that CDR. |
liability_quant_cdr3_aa_2 |
integer |
Total count of liabilities identified within chain_2 CDR3 (e.g., HCDR3 if orientation is 5’ VL and 3’ VH). Each liability type is counted only once per CDR, even if it occurs more than once in that CDR. |
liability_quant_chain_1 |
integer |
Total count of liabilities identified across all chain_1 CDRs (e.g., LCDR1-3 if orientation is 5’ VL and 3’ VH). |
liability_quant_chain_2 |
integer |
Total count of liabilities identified across all chain_2 CDRs (e.g., HCDR1-3 if orientation is 5’ VL and 3’ VH). |
liability_quant_lcdr1_3_hcdr1_3 |
integer |
Total count of liabilities identified across all VH and VL CDRs, only in SANGER/PacBio. |
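
The relationship between a liability_string_* field and its liability_quant_* counterpart can be sketched as follows. This is a minimal illustration, assuming the string is split on ‘ + ’ and that duplicates may appear in the raw string; the function names are hypothetical and are not part of AbXtract.

```python
def parse_liability_string(value):
    """Split a ' + '-concatenated liability string into its entries."""
    if not value:
        return []
    return [part.strip() for part in value.split(" + ") if part.strip()]


def liability_quant(value):
    """Count each distinct liability once per CDR (assumed rule from the table)."""
    return len(set(parse_liability_string(value)))


# Example values for two CDRs (illustrative data only).
cdr1_string = "YYY - Polyspecificity + DG - Isomerization"
cdr3_string = "DG - Isomerization + DG - Isomerization"

cdr1_entries = parse_liability_string(cdr1_string)
cdr1_count = liability_quant(cdr1_string)
cdr3_count = liability_quant(cdr3_string)
```

Here the repeated ‘DG - Isomerization’ entry contributes only once to the CDR3 count.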
Biophysical Metric Fields¶
Field Name |
Type |
Description |
---|---|---|
cdr3_aa_1_charge |
float |
Net charge of chain_1 CDR3 at pH 7 (e.g., LCDR3 if orientation is 5’ VL and 3’ VH). |
cdr3_aa_2_charge |
float |
Net charge of chain_2 CDR3 at pH 7 (e.g., HCDR3 if orientation is 5’ VL and 3’ VH). |
merged_cdrs_1_charge |
float |
Net charge of chain_1 CDR1-3 (e.g., LCDR1-3 if orientation is 5’ VL and 3’ VH). |
merged_cdrs_2_charge |
float |
Net charge of chain_2 CDR1-3 (e.g., HCDR1-3 if orientation is 5’ VL and 3’ VH). |
merged_cdrs_1_2_charge |
float |
Net charge of LCDR1-3 + HCDR1-3, only relevant to PacBio/SANGER sequencing. |
cdr3_aa_1_hydropathy |
float |
Parker hydropathy of chain_1 CDR3 (e.g., LCDR3 if orientation is 5’ VL and 3’ VH). |
cdr3_aa_2_hydropathy |
float |
Parker hydropathy of chain_2 CDR3 (e.g., HCDR3 if orientation is 5’ VL and 3’ VH). |
merged_cdrs_1_hydropathy |
float |
Parker hydropathy of chain_1 CDR1-3 (e.g., LCDR1-3 if orientation is 5’ VL and 3’ VH). |
merged_cdrs_2_hydropathy |
float |
Parker hydropathy of chain_2 CDR1-3 (e.g., HCDR1-3 if orientation is 5’ VL and 3’ VH). |
merged_cdrs_1_2_hydropathy |
float |
Parker hydropathy of LCDR1-3 + HCDR1-3, only relevant to PacBio/SANGER sequencing. |
cdr3_aa_1_length |
integer |
Length of chain_1 CDR3 (e.g., LCDR3 if orientation is 5’ VL and 3’ VH). |
cdr3_aa_2_length |
integer |
Length of chain_2 CDR3 (e.g., HCDR3 if orientation is 5’ VL and 3’ VH). |
merged_cdrs_1_length |
integer |
Length of chain_1 CDR1-3 (e.g., LCDR1-3 if orientation is 5’ VL and 3’ VH). |
merged_cdrs_2_length |
integer |
Length of chain_2 CDR1-3 (e.g., HCDR1-3 if orientation is 5’ VL and 3’ VH). |
merged_cdrs_1_2_length |
integer |
Length of LCDR1-3 + HCDR1-3, only relevant to PacBio/SANGER sequencing. |
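
As a rough illustration of the merged-CDR metrics, the sketch below concatenates three CDRs and derives a length and an approximate net charge. The charge model is an assumption made for this example (Lys/Arg counted as +1, Asp/Glu as -1, His treated as neutral near pH 7) and is not necessarily the calculation AbXtract performs; the function names and sequences are hypothetical.

```python
POSITIVE = {"K", "R"}  # assumed fully protonated near pH 7
NEGATIVE = {"D", "E"}  # assumed fully deprotonated near pH 7


def approx_net_charge(seq):
    """Approximate net charge by counting charged side chains."""
    seq = seq.upper()
    positives = sum(1 for aa in seq if aa in POSITIVE)
    negatives = sum(1 for aa in seq if aa in NEGATIVE)
    return float(positives - negatives)


def merge_cdrs(cdr1, cdr2, cdr3):
    """Concatenate CDR1-3 of one chain into a single string."""
    return cdr1 + cdr2 + cdr3


# Illustrative heavy-chain CDRs (not taken from any dataset).
hcdrs = ("GFTFSSYA", "ISGSGGST", "AKDRGYSSGWYFDY")
merged = merge_cdrs(*hcdrs)
merged_length = len(merged)
merged_charge = approx_net_charge(merged)
cdr3_charge = approx_net_charge(hcdrs[2])
```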
Identifier Fields¶
Field Name |
Type |
Description |
---|---|---|
id |
string |
If SANGER, a ‘:’-separated concatenated list of all sequences that match on the given region of interest (default = ‘Full-Length, Including Framework’). If NGS, the ‘id’ contains the string ‘NGS’ + ‘sample_name’ + ‘barcode_group’ (e.g., ‘NGS-tri3-tri’). |
sample_name |
string |
If NGS, this is derived from the 1st column of the ‘barcode file’, if supplied. It is used to identify the unique barcode population. If the dataset passes through any downstream processing that calculates enrichment from two distinct populations, sample_name takes a single value from either the early (less_stringent) or late (more_stringent) round population. If SANGER, sample_name is set to ‘Sanger’, which should not be modified. |
barcode_round |
string |
If NGS, this is derived from the 4th column of the ‘barcode file’, if supplied. Takes on values of either ‘early’, ‘late’, or ‘’. This field is used to assess enrichment from ‘early’ (less stringent) to ‘late’ (more stringent) rounds of selection by the ‘barcode_group’. |
barcode_group |
string |
If NGS, this is derived from the ‘barcode file’, if supplied, under the 5th column. This is how individual populations are grouped together for enrichment or relative abundance calculations. If SANGER, the barcode_group will always be assigned the name “Sanger”, which should not be modified. |
well_id |
string |
‘:’-separated concatenated string of all SANGER ‘id’ field values that overlap with a given NGS clone on the region of interest (ROI). Only relevant in the context of SANGER. |
seq_id |
string |
‘_’-separated string combining a unique integer assigned to each sequence with the barcode_group. If the number of unique barcode_groups > 1, it takes the value unique integer + barcode_group (e.g., ‘33_tri’). If the number of barcode_groups <= 1, it is assigned unique integer + ‘empty’ (e.g., ‘21_nan’ or ‘21_id’). If the barcode_group is updated by the Modify Sample Name or Barcode Group FLOE, the seq_id is updated as well. |
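
The naming patterns for the NGS ‘id’ and for seq_id can be sketched as below. The helper names are hypothetical, and the placeholder suffix used when there is at most one barcode_group is an assumption: the table gives both ‘21_nan’ and ‘21_id’ as examples, so the suffix is left as a parameter.

```python
def make_ngs_id(sample_name, barcode_group):
    """Build an NGS id of the form 'NGS-<sample_name>-<barcode_group>'."""
    return "-".join(["NGS", sample_name, barcode_group])


def make_seq_id(index, barcode_group, n_barcode_groups, placeholder="nan"):
    """Build '<integer>_<barcode_group>' or '<integer>_<placeholder>'."""
    suffix = barcode_group if n_barcode_groups > 1 else placeholder
    return f"{index}_{suffix}"


ngs_id = make_ngs_id("tri3", "tri")
multi_group_seq_id = make_seq_id(33, "tri", n_barcode_groups=2)
single_group_seq_id = make_seq_id(21, "tri", n_barcode_groups=1)
```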
Overlap Fields of NGS to SANGER or NGS¶
Field Name |
Type |
Description |
---|---|---|
overlap_to_sanger |
boolean |
True/False indicates whether a given NGS sequence overlaps to SANGER. |
overlap_to_ngs |
boolean |
True/False indicates whether a given SANGER (and NGS, but less relevant) sequence overlaps to NGS. |
overlay_roi |
string |
If used in context of SANGER population this ROI reflects the overlap region of interest (ROI) used to map to SANGER populations (e.g., ‘CDR3 Chain_2 (Downstream Chain)’). |
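
One way to picture how the overlap flags and well_id relate is an exact-match lookup on the chosen ROI. The sketch below assumes rows are plain dictionaries and that overlap means identical ROI strings; both are assumptions for illustration, and `flag_overlap` is a hypothetical helper rather than an AbXtract function.

```python
def flag_overlap(ngs_rows, sanger_rows, roi_field="cdr3_aa_2"):
    """Annotate NGS and SANGER rows with overlap flags based on one ROI field."""
    # Index SANGER ids by their ROI sequence.
    sanger_by_roi = {}
    for row in sanger_rows:
        sanger_by_roi.setdefault(row[roi_field], []).append(row["id"])
    ngs_rois = {row[roi_field] for row in ngs_rows}

    for row in ngs_rows:
        matches = sanger_by_roi.get(row[roi_field], [])
        row["overlap_to_sanger"] = bool(matches)
        row["well_id"] = ":".join(matches)  # ':'-separated SANGER ids
    for row in sanger_rows:
        row["overlap_to_ngs"] = row[roi_field] in ngs_rois
    return ngs_rows, sanger_rows


# Illustrative rows only.
ngs = [
    {"id": "NGS-tri3-tri", "cdr3_aa_2": "ARDYW"},
    {"id": "NGS-tri3-tri", "cdr3_aa_2": "ARGGW"},
]
sanger = [
    {"id": "A01", "cdr3_aa_2": "ARDYW"},
    {"id": "B02", "cdr3_aa_2": "ARDYW"},
    {"id": "C03", "cdr3_aa_2": "AKNNW"},
]
ngs, sanger = flag_overlap(ngs, sanger)
```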
Enrichment, Abundance, and Relative Abundance Fields¶
Field Name |
Type |
Description |
---|---|---|
count |
integer |
Non-redundant VH+VL (PacBio) [VL or VH if Illumina] count of aa sequences by sample_name. If fold enrichment is calculated using ‘NGS Pipeline’ or ‘Enrichment and Relative Abundance Calculation’ then count takes on sum of the barcode. |
processed_roi |
string |
Indicates the region of interest (ROI) that was processed for enrichment and clustering. This is distinct from overlay_roi. |
count_roi_early |
float |
Early (less stringent) round region of interest (ROI) count or pseudo count, computed within the ‘barcode_group’ and/or ‘barcode_round’ if specified, or across the entire population otherwise. If a given ROI is found in ‘late’ but not in ‘early’, it is assigned a pseudo count calculated as min(‘late’ round ROI count) / ‘Early Round Absence Penalty’ (a correction factor). |
count_roi_final |
float |
Late/Final (more stringent) round region of interest (ROI) count or pseudo count, computed within the ‘barcode_group’ and/or ‘barcode_round’ if specified, or across the entire population otherwise. If a given ROI is found in ‘early’ but not in ‘late’, it is assigned a pseudo count calculated as min(‘early’ round ROI count) / ‘Late Round Absence Penalty’ (a correction factor). |
percent_roi_early |
float |
Early (less stringent) round region of interest (ROI) relative abundance calculated by count_roi_early * 100 / sum(count_roi_early) using the barcode_group and/or barcode_round, if specified, or across entire population otherwise. Distinct full-length sequences sharing the same ROI will have the same value. |
percent_roi_final |
float |
Late/Final (more stringent) round region of interest (ROI) relative abundance calculated by count_roi_final * 100 / sum(count_roi_final) using the barcode_group and/or barcode_round, if specified, or across entire population otherwise. Distinct full-length sequences sharing the same ROI will have the same value. |
fold_enrichment_roi |
float |
Relative fold enrichment of the region of interest (ROI) calculated by percent_roi_final / percent_roi_early. Distinct full-length sequences sharing the same ROI will have the same value. Only a single copy of each full-length sequence, from either early or late, is kept. The ROI-level values (e.g., percent_roi_final and percent_roi_early) are retained for each full-length sequence, but the resulting dataset is smaller than the combined input. |
log2_enrichment |
float |
log2(fold_enrichment_roi). |
round_enrich |
string |
Takes on the value ‘early’, ‘late’, or ‘both’. If ‘early’, the given region of interest (ROI) is found only in early but not late, and is assigned a pseudo count for count_roi_final and percent_roi_final based on the correction factor ‘Late Round Absence Penalty’. If ‘late’, the given ROI is found only in late but not early, and is assigned a pseudo count for count_roi_early and percent_roi_early based on the correction factor ‘Early Round Absence Penalty’. If ‘both’, the given ROI is found in both rounds and no pseudo values are assigned to early or late. |
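
The enrichment fields can be tied together in a short worked sketch that follows the count_roi_early and count_roi_final descriptions. Several points are assumptions: the penalty defaults of 10 are hypothetical, pseudo counts are included in the sums used for percentages, and both rounds are non-empty. `enrichment_by_roi` is an illustrative helper, not the AbXtract implementation.

```python
import math


def enrichment_by_roi(early, late, early_absence_penalty=10.0,
                      late_absence_penalty=10.0):
    """Compute ROI-level enrichment fields from raw early/late ROI counts."""
    min_early = min(early.values())
    min_late = min(late.values())
    rows = {}
    for roi in set(early) | set(late):
        in_early, in_late = roi in early, roi in late
        rows[roi] = {
            # Pseudo counts fill in the round where the ROI is absent.
            "count_roi_early": float(early[roi]) if in_early
            else min_late / early_absence_penalty,
            "count_roi_final": float(late[roi]) if in_late
            else min_early / late_absence_penalty,
            "round_enrich": "both" if in_early and in_late
            else ("early" if in_early else "late"),
        }
    total_early = sum(r["count_roi_early"] for r in rows.values())
    total_final = sum(r["count_roi_final"] for r in rows.values())
    for r in rows.values():
        r["percent_roi_early"] = r["count_roi_early"] * 100 / total_early
        r["percent_roi_final"] = r["count_roi_final"] * 100 / total_final
        r["fold_enrichment_roi"] = r["percent_roi_final"] / r["percent_roi_early"]
        r["log2_enrichment"] = math.log2(r["fold_enrichment_roi"])
    return rows


# Illustrative counts: ROI "C" appears only in the late round.
result = enrichment_by_roi({"A": 90, "B": 10}, {"A": 50, "B": 40, "C": 10})
```

In this example ROI ‘C’ receives an early pseudo count of min(late counts) / penalty = 10 / 10 = 1.0, so its fold enrichment stays finite.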
Scaffold / Germline Call Fields¶
Field Name |
Type |
Description |
---|---|---|
match_name_1 |
string |
The scaffold of chain_1 (upstream/5’ chain) receiving the highest number of votes in either the specified species database (e.g., human, mouse, alpaca or rabbit) or, if provided, closest match to user-provided custom database file. |
match_name_2 |
string |
The scaffold of chain_2 (downstream/3’ chain) receiving the highest number of votes in either the specified species database (e.g., human, mouse, alpaca or rabbit) or, if provided, closest match to user-provided custom database file. |
match_name_1_2 |
string |
The scaffold of chain_1 and chain_2. |
Clustering Fields¶
Annotation Fields¶
Field Name |
Type |
Description |
---|---|---|
read |
string |
DNA of the read from the NGS or SANGER source. If only AA sequence processed (SANGER), this field will contain AA not DNA. |
fr1_1 |
string |
DNA Framework 1 of the 5’ (upstream) chain (e.g., Light Chain Framework 1 if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA. |
fr1_aa_1 |
string |
Amino Acid Framework 1 of the 5’ (upstream) chain (e.g., Light Chain Framework 1 if orientation is 5’ VL and 3’ VH). |
cdr1_1 |
string |
DNA CDR1 of the 5’ (upstream) chain (e.g., LCDR1 if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA. |
cdr1_aa_1 |
string |
Amino Acid CDR1 of the 5’ (upstream) chain (e.g., LCDR1 if orientation is 5’ VL and 3’ VH). |
fr2_1 |
string |
DNA Framework 2 of the 5’ (upstream) chain (e.g., Light Chain Framework 2 if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA. |
fr2_aa_1 |
string |
Amino Acid Framework 2 of the 5’ (upstream) chain (e.g., Light Chain Framework 2 if orientation is 5’ VL and 3’ VH). |
cdr2_1 |
string |
DNA CDR2 of the 5’ (upstream) chain (e.g., LCDR2 if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA. |
cdr2_aa_1 |
string |
Amino Acid CDR2 of the 5’ (upstream) chain (e.g., LCDR2 if orientation is 5’ VL and 3’ VH). |
fr3_1 |
string |
DNA Framework 3 of the 5’ (upstream) chain (e.g., Light Chain Framework 3 if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA. |
fr3_aa_1 |
string |
Amino Acid Framework 3 of the 5’ (upstream) chain (e.g., Light Chain Framework 3 if orientation is 5’ VL and 3’ VH). |
cdr3_1 |
string |
DNA CDR3 of the 5’ (upstream) chain (e.g., LCDR3 if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA. |
cdr3_aa_1 |
string |
Amino Acid CDR3 of the 5’ (upstream) chain (e.g., LCDR3 if orientation is 5’ VL and 3’ VH). |
fr4_1 |
string |
DNA Framework 4 of the 5’ (upstream) chain (e.g., Light Chain Framework 4 if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA. |
fr4_aa_1 |
string |
Amino Acid Framework 4 of the 5’ (upstream) chain (e.g., Light Chain Framework 4 if orientation is 5’ VL and 3’ VH). |
cdr1_2 |
string |
DNA CDR1 of the 3’ (downstream) chain (e.g., HCDR1 if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA. |
cdr1_aa_2 |
string |
Amino Acid CDR1 of the 3’ (downstream) chain (e.g., HCDR1 if orientation is 5’ VL and 3’ VH). |
cdr2_2 |
string |
DNA CDR2 of the 3’ (downstream) chain (e.g., HCDR2 if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA. |
cdr2_aa_2 |
string |
Amino Acid CDR2 of the 3’ (downstream) chain (e.g., HCDR2 if orientation is 5’ VL and 3’ VH). |
cdr3_2 |
string |
DNA CDR3 of the 3’ (downstream) chain (e.g., HCDR3 if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA. |
cdr3_aa_2 |
string |
Amino Acid CDR3 of the 3’ (downstream) chain (e.g., HCDR3 if orientation is 5’ VL and 3’ VH). |
sequence_1 |
string |
DNA of the 5’ (upstream) chain (e.g., Light Chain if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA. |
sequence_aa_1 |
string |
Amino Acid of the 5’ (upstream) chain (e.g., Light Chain if orientation is 5’ VL and 3’ VH). |
sequence_2 |
string |
DNA of the 3’ (downstream) chain (e.g., Heavy Chain if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA. |
sequence_aa_2 |
string |
Amino Acid of the 3’ (downstream) chain (e.g., Heavy Chain if orientation is 5’ VL and 3’ VH). |
merged_cdrs_1 |
string |
DNA of the 5’ (upstream) concatenated chain 1 CDRs (e.g., LCDR1-3 if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA. |
merged_cdrs_aa_1 |
string |
Amino Acid of the 5’ (upstream) concatenated chain 1 CDRs (e.g., LCDR1-3 if orientation is 5’ VL and 3’ VH). |
merged_cdrs_2 |
string |
DNA of the 3’ (downstream) concatenated chain 2 CDRs (e.g., HCDR1-3 if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA. |
merged_cdrs_aa_2 |
string |
Amino Acid of the 3’ (downstream) concatenated chain 2 CDRs (e.g., HCDR1-3 if orientation is 5’ VL and 3’ VH). |
merged_cdrs_1_2 |
string |
DNA of the 5’ and 3’ (upstream and downstream) concatenated chain 1+2 CDRs. If only AA sequence processed (SANGER), this field will contain AA not DNA. |
merged_cdrs_aa_1_2 |
string |
Amino Acid of the 5’ and 3’ (upstream and downstream) concatenated chain 1+2 CDRs. |
Sequence Quality Fields¶
Field Name |
Type |
Description |
---|---|---|
votes_1 |
integer |
IgMatcher annotation score for chain 1. Minimum number of matching K-mers for germline assignment. Increased numbers make the algorithm more stringent at the expense of not annotating some sequences (default is votes for DNA). |
votes_2 |
integer |
IgMatcher annotation score for chain 2. Minimum number of matching K-mers for germline assignment. Increased numbers make the algorithm more stringent at the expense of not annotating some sequences (default is votes for DNA). |
functional_1 |
string |
IgMatcher-based functionality assessment of the 5’ (upstream) chain, chain 1. Takes on values of ‘functional’, ‘truncation’, ‘frame-shift’, or ‘stop-codon’. Looks for truncations (values below the ‘Minimum chain length’ threshold), frame-shifts (non-zero % modulus value across the VH and VL specified DNA sequence), and stop codons (translated DNA resulting in a stop codon). |
functional_2 |
string |
IgMatcher-based functionality assessment of the 3’ (downstream) chain, chain 2. Takes on values of ‘functional’, ‘truncation’, ‘frame-shift’, or ‘stop-codon’. Looks for truncations (values below the ‘Minimum chain length’ threshold), frame-shifts (non-zero % modulus value across the VH and VL specified DNA sequence), and stop codons (translated DNA resulting in a stop codon). |
sequence_issue |
string |
Downstream (post-IgMatcher) functionality assessment for missing regions (e.g., no cdr1_aa_1 present) or aberrant letters (e.g., ‘X’). |
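
The three checks named for functional_1 and functional_2 can be sketched on a single DNA string as below. The order of the checks, the use of nucleotides as the length unit, the default threshold, and in-frame codon scanning from position 0 are all assumptions for this example; `assess_functionality` is a hypothetical helper and does not reproduce the actual annotation logic.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def assess_functionality(dna, min_chain_length=300):
    """Classify a chain as functional, truncation, frame-shift, or stop-codon."""
    dna = dna.upper()
    if len(dna) < min_chain_length:
        return "truncation"   # shorter than the minimum chain length
    if len(dna) % 3 != 0:
        return "frame-shift"  # length is not a multiple of three
    codons = (dna[i:i + 3] for i in range(0, len(dna), 3))
    if any(codon in STOP_CODONS for codon in codons):
        return "stop-codon"   # in-frame stop codon found
    return "functional"


# Short illustrative sequences with a lowered threshold.
ok = assess_functionality("ATGGCTGCA", min_chain_length=9)
short = assess_functionality("ATGGCT", min_chain_length=9)
shifted = assess_functionality("ATGGCTGCAG", min_chain_length=9)
stopped = assess_functionality("ATGTAAGCA", min_chain_length=9)
```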
Special Fields to Add to Upload (Use in Analyze Tool Only)¶
Field Name |
Type |
Description |
---|---|---|
on_rate |
float |
Any on-rate (ka) values can be included in a downloaded AbXtract Excel, CSV, or TSV if the provided column name matches ‘on_rate’ (case-sensitive). This value is typically derived from a kinetic binding experiment (e.g., SPR). Association is the first phase of a molecular interaction, in which binding occurs when analyte and ligand collide through diffusion with appropriate orientation and sufficient energy to form the interaction. The rate ka describes the rate of complex formation (number of complexes formed per second in a one molar solution of ligand and analyte) in units of M^-1s^-1. This modified AbXtract file may be uploaded using the ‘Upload AbXtract Compatible File’ FLOE. |
off_rate |
float |
Any off-rate (kd) values can be included in a downloaded AbXtract Excel, CSV, or TSV if the provided column name matches ‘off_rate’ (case-sensitive). This value is typically derived from a kinetic binding experiment (e.g., SPR). After binding, the ligand and analyte remain bound; when the flow over the chip surface is replaced by buffer only, the free concentration of analyte drops to zero and the complex starts to dissociate at a given rate. This rate describes the stability of the complex (fraction that decays per second) in units of s^-1. This modified AbXtract file may be uploaded using the ‘Upload AbXtract Compatible File’ FLOE. |
KD |
float |
Any KD values can be included in a downloaded AbXtract Excel, CSV, or TSV if the provided column name matches ‘KD’ (case-sensitive). This value is typically derived from a kinetic binding experiment (e.g., SPR). After the analyte has been binding to the ligand for long enough, a steady state is attained in which the net rate of binding is zero. KD is the equilibrium dissociation constant, KD = kd/ka, in units of molar concentration (M). This modified AbXtract file may be uploaded using the ‘Upload AbXtract Compatible File’ FLOE. |
integer_field |
integer |
Any integer values can be included in a downloaded AbXtract Excel, CSV, or TSV if provided column name matches ‘integer_field’ (case-sensitive). This modified AbXtract file may be uploaded using the ‘Upload AbXtract Compatible File’ FLOE. |
float_field |
float |
Any float values can be included in a downloaded AbXtract Excel, CSV, or TSV if provided column name matches ‘float_field’ (case-sensitive). This modified AbXtract file may be uploaded using the ‘Upload AbXtract Compatible File’ FLOE. |
string_field |
string |
Any string values can be included in a downloaded AbXtract Excel, CSV, or TSV if provided column name matches ‘string_field’ (case-sensitive). This modified AbXtract file may be uploaded using the ‘Upload AbXtract Compatible File’ FLOE. |
bool_field |
boolean |
Any bool (True/False) values can be included in a downloaded AbXtract Excel, CSV, or TSV if provided column name matches ‘bool_field’ (case-sensitive). This modified AbXtract file may be uploaded using the ‘Upload AbXtract Compatible File’ FLOE. |
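
As a small illustration of preparing the kinetic columns before upload, the sketch below derives KD from on_rate and off_rate using KD = kd/ka and writes a CSV with the case-sensitive column names from the table. The seq_id value and the rates are made-up example data, and the standard-library csv module stands in for whatever tool is used to edit the downloaded file.

```python
import csv
import io


def dissociation_constant(on_rate, off_rate):
    """KD (M) = off_rate kd (s^-1) / on_rate ka (M^-1 s^-1)."""
    return off_rate / on_rate


# Illustrative row; a real file would keep all existing AbXtract columns.
rows = [{"seq_id": "33_tri", "on_rate": 1.0e5, "off_rate": 1.0e-3}]
for row in rows:
    row["KD"] = dissociation_constant(row["on_rate"], row["off_rate"])

# Column names must match exactly, including case.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["seq_id", "on_rate", "off_rate", "KD"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

With these example rates the resulting KD is about 1e-8 M (10 nM).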