Key Fields

Liability Metric Fields

Field Name	Type	Description
liability_string_cdr1_aa_1	string	‘+’-concatenated string of identified liabilities by type (e.g., ‘YYY - Polyspecificity’ + ‘DG - Isomerization’) within chain_1 CDR1 (e.g., LCDR1 if orientation is 5’ VL and 3’ VH).
liability_string_cdr2_aa_1	string	‘+’-concatenated string of identified liabilities by type (e.g., ‘YYY - Polyspecificity’ + ‘DG - Isomerization’) within chain_1 CDR2 (e.g., LCDR2 if orientation is 5’ VL and 3’ VH).
liability_string_cdr3_aa_1	string	‘+’-concatenated string of identified liabilities by type (e.g., ‘YYY - Polyspecificity’ + ‘DG - Isomerization’) within chain_1 CDR3 (e.g., LCDR3 if orientation is 5’ VL and 3’ VH).
liability_string_cdr1_aa_2	string	‘+’-concatenated string of identified liabilities by type (e.g., ‘YYY - Polyspecificity’ + ‘DG - Isomerization’) within chain_2 CDR1 (e.g., HCDR1 if orientation is 5’ VL and 3’ VH).
liability_string_cdr2_aa_2	string	‘+’-concatenated string of identified liabilities by type (e.g., ‘YYY - Polyspecificity’ + ‘DG - Isomerization’) within chain_2 CDR2 (e.g., HCDR2 if orientation is 5’ VL and 3’ VH).
liability_string_cdr3_aa_2	string	‘+’-concatenated string of identified liabilities by type (e.g., ‘YYY - Polyspecificity’ + ‘DG - Isomerization’) within chain_2 CDR3 (e.g., HCDR3 if orientation is 5’ VL and 3’ VH).
liability_quant_cdr1_aa_1	integer	Total count of liabilities identified within chain_1 CDR1 (e.g., LCDR1 if orientation is 5’ VL and 3’ VH). Each liability type is counted only once per CDR, even if it occurs multiple times within that CDR.
liability_quant_cdr2_aa_1	integer	Total count of liabilities identified within chain_1 CDR2 (e.g., LCDR2 if orientation is 5’ VL and 3’ VH). Each liability type is counted only once per CDR, even if it occurs multiple times within that CDR.
liability_quant_cdr3_aa_1	integer	Total count of liabilities identified within chain_1 CDR3 (e.g., LCDR3 if orientation is 5’ VL and 3’ VH). Each liability type is counted only once per CDR, even if it occurs multiple times within that CDR.
liability_quant_cdr1_aa_2	integer	Total count of liabilities identified within chain_2 CDR1 (e.g., HCDR1 if orientation is 5’ VL and 3’ VH). Each liability type is counted only once per CDR, even if it occurs multiple times within that CDR.
liability_quant_cdr2_aa_2	integer	Total count of liabilities identified within chain_2 CDR2 (e.g., HCDR2 if orientation is 5’ VL and 3’ VH). Each liability type is counted only once per CDR, even if it occurs multiple times within that CDR.
liability_quant_cdr3_aa_2	integer	Total count of liabilities identified within chain_2 CDR3 (e.g., HCDR3 if orientation is 5’ VL and 3’ VH). Each liability type is counted only once per CDR, even if it occurs multiple times within that CDR.
liability_quant_chain_1	integer	Total count of liabilities identified across all chain_1 CDRs (e.g., LCDR1-3 if orientation is 5’ VL and 3’ VH).
liability_quant_chain_2	integer	Total count of liabilities identified across all chain_2 CDRs (e.g., HCDR1-3 if orientation is 5’ VL and 3’ VH).
liability_quant_lcdr1_3_hcdr1_3	integer	Total count of liabilities identified across all VH and VL CDRs, only in SANGER/PacBio.
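
To illustrate how a liability_string field relates to its liability_quant counterpart, here is a minimal Python sketch that splits a ‘+’-concatenated string and counts each distinct liability once. The function name count_liabilities and the exact spacing around the separator are assumptions made for illustration; the exported strings may be formatted differently.

```python
def count_liabilities(liability_string: str, sep: str = "+") -> int:
    """Count distinct liabilities in a '+'-concatenated liability string."""
    if not liability_string:
        return 0
    # Split on the separator and trim surrounding whitespace from each piece.
    parts = [part.strip() for part in liability_string.split(sep)]
    # A set keeps each liability type once, mirroring "counted only once per CDR".
    return len({part for part in parts if part})


# Hypothetical value for one CDR, with one liability repeated.
example = "YYY - Polyspecificity + DG - Isomerization + DG - Isomerization"
print(count_liabilities(example))
```

The repeated ‘DG - Isomerization’ entry contributes only once to the count, which is the behavior the liability_quant descriptions state.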

Biophysical Metric Fields

Field Name	Type	Description
cdr3_aa_1_charge	float	Net charge of chain_1 CDR3 at pH 7 (e.g., LCDR3 if orientation is 5’ VL and 3’ VH).
cdr3_aa_2_charge	float	Net charge of chain_2 CDR3 at pH 7 (e.g., HCDR3 if orientation is 5’ VL and 3’ VH).
merged_cdrs_1_charge	float	Net charge of chain_1 CDR1-3 (e.g., LCDR1-3 if orientation is 5’ VL and 3’ VH).
merged_cdrs_2_charge	float	Net charge of chain_2 CDR1-3 (e.g., HCDR1-3 if orientation is 5’ VL and 3’ VH).
merged_cdrs_1_2_charge	float	Net charge of LCDR1-3 + HCDR1-3, only relevant to PacBio/SANGER sequencing.
cdr3_aa_1_hydropathy	float	Parker hydropathy of chain_1 CDR3 (e.g., LCDR3 if orientation is 5’ VL and 3’ VH).
cdr3_aa_2_hydropathy	float	Parker hydropathy of chain_2 CDR3 (e.g., HCDR3 if orientation is 5’ VL and 3’ VH).
merged_cdrs_1_hydropathy	float	Parker hydropathy of chain_1 CDR1-3 (e.g., LCDR1-3 if orientation is 5’ VL and 3’ VH).
merged_cdrs_2_hydropathy	float	Parker hydropathy of chain_2 CDR1-3 (e.g., HCDR1-3 if orientation is 5’ VL and 3’ VH).
merged_cdrs_1_2_hydropathy	float	Parker hydropathy of LCDR1-3 + HCDR1-3, only relevant to PacBio/SANGER sequencing.
cdr3_aa_1_length	integer	Length of chain_1 CDR3 (e.g., LCDR3 if orientation is 5’ VL and 3’ VH).
cdr3_aa_2_length	integer	Length of chain_2 CDR3 (e.g., HCDR3 if orientation is 5’ VL and 3’ VH).
merged_cdrs_1_length	integer	Length of chain_1 CDR1-3 (e.g., LCDR1-3 if orientation is 5’ VL and 3’ VH).
merged_cdrs_2_length	integer	Length of chain_2 CDR1-3 (e.g., HCDR1-3 if orientation is 5’ VL and 3’ VH).
merged_cdrs_1_2_length	integer	Length of LCDR1-3 + HCDR1-3, only relevant to PacBio/SANGER sequencing.

Identifier Fields

Field Name	Type	Description
id	string	If SANGER, will be a ‘:’ separated concatenated list of all sequences that match by given region of interest (default = ‘Full-Length, Including Framework’). If NGS, the ‘id’ contains the string ‘NGS’ + ‘sample_name’ + ‘barcode_group’ (e.g., ‘NGS-tri3-tri’).
clone_id	string	If SANGER, will be the top full-length sequence id in population (‘Full-Length, Including Framework’).
sample_name	string	If NGS, this is derived from the 1st column of the ‘barcode file’, if supplied. It is used to identify the unique barcode population. If the dataset passes through any downstream processing that calculates enrichment from two distinct populations, sample_name takes a single value from either the early (less stringent) or late (more stringent) round population. If SANGER, the sample_name is set to ‘Sanger’, which should not be modified.
barcode_round	string	If NGS, this is derived from the 4th column of the ‘barcode file’, if supplied. Takes on values of ‘early’, ‘late’, or ‘’. This field is used to assess enrichment from ‘early’ (less stringent) to ‘late’ (more stringent) rounds of selection by the ‘barcode_group’.
barcode_group	string	If NGS, this is derived from the ‘barcode file’, if supplied, under the 5th column. This is how individual populations are grouped together for enrichment or relative abundance calculations. If SANGER, the barcode_group will always be assigned the name “Sanger”, which should not be modified.
well_id	string	‘:’ separated concatenated string of all ‘id’ field values from SANGER that overlap to given NGS clone by the region of interest (ROI). Only relevant in context of SANGER.
seq_id	string	‘_’-separated string combining a unique integer assigned to each sequence with the barcode_group. If the number of unique barcode_groups is > 1, it takes the form unique integer + barcode_group (e.g., ‘33_tri’). If the number of barcode_groups is <= 1, it is assigned unique integer + ‘empty’ (e.g., ‘21_nan’ or ‘21_id’). If barcode_group is updated by the Modify Sample Name/Barcode Group for Downstream Processing - AbXtract Floe, the seq_id is updated as well.
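
The delimiter conventions in the identifier fields can be handled with plain string operations. The sketch below is an illustration only: the helper names (make_ngs_id, parse_seq_id, split_well_id) are hypothetical, the ‘-’ joiner is inferred from the ‘NGS-tri3-tri’ example, and the well_id value is made up.

```python
def make_ngs_id(sample_name: str, barcode_group: str) -> str:
    """Build an NGS-style id of the form 'NGS-<sample_name>-<barcode_group>'."""
    return "-".join(["NGS", sample_name, barcode_group])


def parse_seq_id(seq_id: str) -> tuple:
    """Split a seq_id such as '33_tri' into (unique integer, suffix)."""
    # Split on the first underscore only, in case the suffix contains one.
    number, _, suffix = seq_id.partition("_")
    return int(number), suffix


def split_well_id(well_id: str) -> list:
    """Split a ':'-separated well_id into its individual SANGER ids."""
    return [part for part in well_id.split(":") if part]


print(make_ngs_id("tri3", "tri"))
print(parse_seq_id("33_tri"))
print(split_well_id("A01:B07:H12"))  # made-up SANGER ids
```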

Overlap Fields of NGS to SANGER or NGS

Field Name	Type	Description
overlap_to_sanger	boolean	True/False indicates whether a given NGS sequence overlaps to SANGER.
overlap_to_ngs	boolean	True/False indicates whether a given SANGER (and NGS, but less relevant) sequence overlaps to NGS.
overlay_roi	string	If used in context of SANGER population this ROI reflects the overlap region of interest (ROI) used to map to SANGER populations (e.g., ‘CDR3 Chain_2 (Downstream Chain)’).

Enrichment, Abundance and Relative Abundance Fields

Field Name	Type	Description
count	integer	Non-redundant VH+VL (PacBio) [VL or VH if Illumina] count of aa sequences by sample_name. If fold enrichment is calculated using ‘NGS Pipeline’ or ‘Enrichment and Relative Abundance Calculation’ then count takes on sum of the barcode.
processed_roi	string	Indicates the region of interest (ROI) that was processed for enrichment and clustering. This is distinct from overlay_roi.
count_roi_early	float	Early (less stringent) round region of interest (ROI) count or pseudo count, computed within the ‘barcode_group’ and/or ‘barcode_round’ if specified, or across the entire population otherwise. If a given ROI is found in ‘late’ but not in ‘early’, it is assigned a pseudo count calculated as min(‘late’ round ROI count) / correction factor, where the correction factor is the ‘Early Round Absence Penalty’.
count_roi_final	float	Late/Final (more stringent) round region of interest (ROI) count or pseudo count, computed within the ‘barcode_group’ and/or ‘barcode_round’ if specified, or across the entire population otherwise. If a given ROI is found in ‘early’ but not in ‘late’, it is assigned a pseudo count calculated as min(‘early’ round ROI count) / correction factor, where the correction factor is the ‘Late Round Absence Penalty’.
percent_roi_early	float	Early (less stringent) round region of interest (ROI) relative abundance calculated by count_roi_early * 100 / sum(count_roi_early) using the barcode_group and/or barcode_round, if specified, or across entire population otherwise. Distinct full-length sequences sharing the same ROI will have the same value.
percent_roi_final	float	Late/Final (more stringent) round region of interest (ROI) relative abundance calculated by count_roi_final * 100 / sum(count_roi_final) using the barcode_group and/or barcode_round, if specified, or across entire population otherwise. Distinct full-length sequences sharing the same ROI will have the same value.
fold_enrichment_roi	float	Relative fold enrichment of the region of interest (ROI) calculated by percent_roi_final / percent_roi_early. Distinct full-length sequences sharing the same ROI will have the same value. Only a single copy of each full-length sequence, from either early or late, is kept. The ROI-level values (e.g., percent_roi_final and percent_roi_early) are retained for each full-length sequence, but the resulting dataset is reduced relative to the combined input.
log2_enrichment	float	log2(fold_enrichment_roi).
round_enrich	string	Takes on a value of ‘early’, ‘late’, or ‘both’. If assigned ‘early’, the given region of interest (ROI) is found only in early but not late, and is assigned a pseudo count for count_roi_final and percent_roi_final based on the correction factor ‘Late Round Absence Penalty’. If assigned ‘late’, the given ROI is found only in late but not early, and is assigned a pseudo count for count_roi_early and percent_roi_early based on the correction factor ‘Early Round Absence Penalty’. If assigned ‘both’, the given ROI is found in both rounds and no pseudo values are assigned to early or late.
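
The formulas in this table can be traced end to end with a small worked example. The following Python sketch is not the AbXtract implementation. The function names, the penalty defaults of 10.0, the choice to include pseudo counts in the denominators, and the ROI strings are all assumptions made for illustration.

```python
import math


def relative_abundance(counts):
    """percent = count * 100 / sum(counts), per ROI."""
    total = sum(counts.values())
    return {roi: count * 100 / total for roi, count in counts.items()}


def enrichment_table(early, late, early_penalty=10.0, late_penalty=10.0):
    """Compute ROI-level enrichment fields from early and late ROI counts."""
    # Pseudo counts for ROIs missing from one round (penalties are hypothetical).
    early_pseudo = min(late.values()) / early_penalty
    late_pseudo = min(early.values()) / late_penalty

    rois = sorted(set(early) | set(late))
    count_early = {roi: float(early.get(roi, early_pseudo)) for roi in rois}
    count_final = {roi: float(late.get(roi, late_pseudo)) for roi in rois}
    percent_early = relative_abundance(count_early)
    percent_final = relative_abundance(count_final)

    table = {}
    for roi in rois:
        fold = percent_final[roi] / percent_early[roi]
        if roi in early and roi in late:
            found_in = "both"
        elif roi in early:
            found_in = "early"
        else:
            found_in = "late"
        table[roi] = {
            "count_roi_early": count_early[roi],
            "count_roi_final": count_final[roi],
            "percent_roi_early": percent_early[roi],
            "percent_roi_final": percent_final[roi],
            "fold_enrichment_roi": fold,
            "log2_enrichment": math.log2(fold),
            "round_enrich": found_in,
        }
    return table


# Made-up ROI counts for two rounds of selection.
early_counts = {"ARDYW": 10, "ARGYW": 30}
late_counts = {"ARDYW": 40, "ARSYW": 20}
table = enrichment_table(early_counts, late_counts)
for roi, row in table.items():
    print(roi, row["round_enrich"], round(row["log2_enrichment"], 3))
```

An ROI seen only in the late round gets a small early pseudo count, so its fold enrichment is large but finite rather than a division by zero.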

Scaffold / Germline Call Fields

Field Name	Type	Description
match_name_1	string	The scaffold of chain_1 (upstream/5’ chain) receiving the highest number of votes in either the specified species database (e.g., human, mouse, alpaca or rabbit) or, if provided, closest match to user-provided custom database file.
match_name_2	string	The scaffold of chain_2 (downstream/3’ chain) receiving the highest number of votes in either the specified species database (e.g., human, mouse, alpaca or rabbit) or, if provided, closest match to user-provided custom database file.
match_name_1_2	string	The scaffold of chain_1 and chain_2.

Clustering Fields

Field Name	Type	Description
cluster	string	This indicates the cluster assignment by user-defined region of interest (ROI). The letters ‘AB’ indicate CDR1 and CDR2 chain_1. ‘C’ indicates CDR3 chain_1. ‘DE’ indicates CDR1 and CDR2 chain_2. ‘F’ indicates CDR3 chain_2.
cluster_numeric	integer	Numeric representation of the ‘cluster’ assignment. Used for plotting in the interactive tool when the number of clusters is large (N > 300).
cluster_cdr3_1	string	This indicates the cluster assignment of CDR3 chain 1 if it is included within the user-defined region of interest (ROI). The letter ‘C’ indicates CDR3 chain_1 (e.g., Light Chain CDR3 if orientation is 5’ VL and 3’ VH). This value will be -1C if CDR3 chain 1 is not specified (e.g., if ‘CDR3 Chain_2 (Downstream Chain)’ is selected).
cluster_cdr3_2	string	This indicates the cluster assignment of CDR3 chain 2 if it is included within the user-defined region of interest (ROI). The letter ‘F’ indicates CDR3 chain_2 (e.g., Heavy Chain CDR3 if orientation is 5’ VL and 3’ VH). This value will be -1F if CDR3 chain 2 is not specified (e.g., if ‘CDR3 Chain_1 (Upstream Chain)’ is selected).
noise_cluster_1	bool	If AbScan is being used, some of the population may not be assigned to a cluster (e.g., noise or outliers). Nonetheless, these noise points are still assigned a cluster_cdr3_1 (if the region of interest included that region). This field indicates the CDRs assigned a noise value.
noise_cluster_2	bool	If AbScan is being used, some of the population may not be assigned to a cluster (e.g., noise or outliers). Nonetheless, these noise points are still assigned a cluster_cdr3_2 (if the region of interest included that region). This field indicates the CDRs assigned a noise value.

Annotation Fields

Field Name	Type	Description
read	string	DNA of the read from the NGS or SANGER source. If only AA sequence processed (SANGER), this field will contain AA not DNA.
fr1_1	string	DNA Framework 1 of the 5’ (upstream) chain (e.g., Light Chain Framework 1 if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA.
fr1_aa_1	string	Amino Acid Framework 1 of the 5’ (upstream) chain (e.g., Light Chain Framework 1 if orientation is 5’ VL and 3’ VH).
cdr1_1	string	DNA CDR1 of the 5’ (upstream) chain (e.g., LCDR1 if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA.
cdr1_aa_1	string	Amino Acid CDR1 of the 5’ (upstream) chain (e.g., LCDR1 if orientation is 5’ VL and 3’ VH).
fr2_1	string	DNA Framework 2 of the 5’ (upstream) chain (e.g., Light Chain Framework 2 if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA.
fr2_aa_1	string	Amino Acid Framework 2 of the 5’ (upstream) chain (e.g., Light Chain Framework 2 if orientation is 5’ VL and 3’ VH).
cdr2_1	string	DNA CDR2 of the 5’ (upstream) chain (e.g., LCDR2 if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA.
cdr2_aa_1	string	Amino Acid CDR2 of the 5’ (upstream) chain (e.g., LCDR2 if orientation is 5’ VL and 3’ VH).
fr3_1	string	DNA Framework 3 of the 5’ (upstream) chain (e.g., Light Chain Framework 3 if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA.
fr3_aa_1	string	Amino Acid Framework 3 of the 5’ (upstream) chain (e.g., Light Chain Framework 3 if orientation is 5’ VL and 3’ VH).
cdr3_1	string	DNA CDR3 of the 5’ (upstream) chain (e.g., LCDR3 if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA.
cdr3_aa_1	string	Amino Acid CDR3 of the 5’ (upstream) chain (e.g., LCDR3 if orientation is 5’ VL and 3’ VH).
fr4_1	string	DNA Framework 4 of the 5’ (upstream) chain (e.g., Light Chain Framework 4 if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA.
fr4_aa_1	string	Amino Acid Framework 4 of the 5’ (upstream) chain (e.g., Light Chain Framework 4 if orientation is 5’ VL and 3’ VH).
cdr1_2	string	DNA CDR1 of the 3’ (downstream) chain (e.g., HCDR1 if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA.
cdr1_aa_2	string	Amino Acid CDR1 of the 3’ (downstream) chain (e.g., HCDR1 if orientation is 5’ VL and 3’ VH).
cdr2_2	string	DNA CDR2 of the 3’ (downstream) chain (e.g., HCDR2 if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA.
cdr2_aa_2	string	Amino Acid CDR2 of the 3’ (downstream) chain (e.g., HCDR2 if orientation is 5’ VL and 3’ VH).
cdr3_2	string	DNA CDR3 of the 3’ (downstream) chain (e.g., HCDR3 if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA.
cdr3_aa_2	string	Amino Acid CDR3 of the 3’ (downstream) chain (e.g., HCDR3 if orientation is 5’ VL and 3’ VH).
sequence_1	string	DNA of the 5’ (upstream) chain (e.g., Light Chain if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA.
sequence_aa_1	string	Amino Acid of the 5’ (upstream) chain (e.g., Light Chain if orientation is 5’ VL and 3’ VH).
sequence_2	string	DNA of the 3’ (downstream) chain (e.g., Heavy Chain if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA.
sequence_aa_2	string	Amino Acid of the 3’ (downstream) chain (e.g., Heavy Chain if orientation is 5’ VL and 3’ VH).
merged_cdrs_1	string	DNA of the 5’ (upstream) concatenated chain 1 CDRs (e.g., Light Chain if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA.
merged_cdrs_aa_1	string	Amino Acid of the 5’ (upstream) concatenated chain 1 CDRs (e.g., Light Chain if orientation is 5’ VL and 3’ VH).
merged_cdrs_2	string	DNA of the 3’ (downstream) concatenated chain 2 CDRs (e.g., Heavy Chain if orientation is 5’ VL and 3’ VH). If only AA sequence processed (SANGER), this field will contain AA not DNA.
merged_cdrs_aa_2	string	Amino Acid of the 3’ (downstream) concatenated chain 2 CDRs (e.g., Heavy Chain if orientation is 5’ VL and 3’ VH).
merged_cdrs_1_2	string	DNA of the 5’ and 3’ (upstream and downstream) concatenated chain 1+2 CDRs. If only AA sequence processed (SANGER), this field will contain AA not DNA.
merged_cdrs_aa_1_2	string	Amino Acid of the 5’ and 3’ (upstream and downstream) concatenated chain 1+2 CDRs.

Sequence Quality Fields

Field Name	Type	Description
votes_1	integer	IgMatcher annotation score for chain 1. Minimum number of matching K-mers for germline assignment. Increased numbers make the algorithm more stringent at the expense of not annotating some sequences (default is votes for DNA).
votes_2	integer	IgMatcher annotation score for chain 2. Minimum number of matching K-mers for germline assignment. Increased numbers make the algorithm more stringent at the expense of not annotating some sequences (default is votes for DNA).
functional_1	string	IgMatcher-based functionality assessment of the 5’ upstream chain (chain 1). Takes on values of ‘functional’, ‘truncation’, ‘frame-shift’, or ‘stop-codon’. Looks for truncations (values below the ‘Minimum chain length’ threshold), frame-shifts (non-zero modulus (%) value across the specified VH and VL DNA sequence), and stop codons (translated DNA resulting in a stop codon).
functional_2	string	IgMatcher-based functionality assessment of the 3’ downstream chain (chain 2). Takes on values of ‘functional’, ‘truncation’, ‘frame-shift’, or ‘stop-codon’. Looks for truncations (values below the ‘Minimum chain length’ threshold), frame-shifts (non-zero modulus (%) value across the specified VH and VL DNA sequence), and stop codons (translated DNA resulting in a stop codon).
sequence_issue	string	Downstream (post-IgMatcher) functionality assessment for missing regions (e.g., no cdr1_aa_1 present) or aberrant letters (e.g., ‘X’).
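
The three checks behind functional_1 and functional_2 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual annotation logic: the function name classify_chain, the order of the checks, measuring the threshold in nucleotides, reading codons from the first base, and the use of modulo 3 for the frame-shift test are all assumed here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_chain(dna: str, min_chain_length: int) -> str:
    """Return 'functional', 'truncation', 'frame-shift', or 'stop-codon'."""
    dna = dna.upper()
    # Truncation: shorter than the (assumed nucleotide) length threshold.
    if len(dna) < min_chain_length:
        return "truncation"
    # Frame-shift: length is not a whole number of codons.
    if len(dna) % 3 != 0:
        return "frame-shift"
    # Stop codon: any in-frame codon is a stop, reading from the first base.
    codons = (dna[i:i + 3] for i in range(0, len(dna), 3))
    if any(codon in STOP_CODONS for codon in codons):
        return "stop-codon"
    return "functional"


# Toy sequences, far shorter than real variable domains.
print(classify_chain("ATGGCC" * 5, min_chain_length=30))
print(classify_chain("ATGGCC" * 2, min_chain_length=30))
print(classify_chain("ATGGCC" * 5 + "A", min_chain_length=30))
print(classify_chain("ATGGCC" * 4 + "TAAGCC", min_chain_length=30))
```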

Special Fields to Add to Upload (Use in Analyze Tool Only)

Field Name	Type	Description
on_rate	float	Any on-rate (ka) values can be included in a downloaded AbXtract Excel, CSV, or TSV if the provided column name matches ‘on_rate’ (case-sensitive). This value is typically derived from a kinetics binding experiment (e.g., SPR). Association is the first phase of a molecular interaction, in which binding occurs when analyte and ligand collide through diffusion with appropriate orientation and sufficient energy to form the interaction. The rate constant ka describes the rate of complex formation (number of complexes formed per second in a one molar solution of ligand and analyte) in units of M^-1s^-1. This modified AbXtract file may be uploaded using the Upload AbXtract Compatible File Floe.
off_rate	float	Any off-rate (kd) values can be included in a downloaded AbXtract Excel, CSV, or TSV if the provided column name matches ‘off_rate’ (case-sensitive). This value is typically derived from a kinetics binding experiment (e.g., SPR). After binding, the ligand and analyte remain bound; when the flow over the surface of the chip is replaced by buffer only, the free concentration of analyte drops to zero and the complex starts to dissociate at a given rate. The rate constant kd describes the stability of the complex (fraction that decays per second) in units of s^-1. This modified AbXtract file may be uploaded using the Upload AbXtract Compatible File Floe.
KD	float	Any KD values can be included in a downloaded AbXtract Excel, CSV, or TSV if the provided column name matches ‘KD’ (case-sensitive). This value is typically derived from a kinetics binding experiment (e.g., SPR). After a long enough period of analyte binding to ligand, a steady state is attained in which the net rate of binding is zero. KD is the equilibrium dissociation constant, KD = kd/ka, in units of molar concentration (M). This modified AbXtract file may be uploaded using the Upload AbXtract Compatible File Floe.
integer_field	int	Any integer values can be included in a downloaded AbXtract Excel, CSV, or TSV if provided column name matches ‘integer_field’ (case-sensitive). This modified AbXtract file may be uploaded using the Upload AbXtract Compatible File Floe.
float_field	float	Any float values can be included in a downloaded AbXtract Excel, CSV, or TSV if provided column name matches ‘float_field’ (case-sensitive). This modified AbXtract file may be uploaded using the Upload AbXtract Compatible File Floe.
string_field	string	Any string values can be included in a downloaded AbXtract Excel, CSV, or TSV if provided column name matches ‘string_field’ (case-sensitive). This modified AbXtract file may be uploaded using the Upload AbXtract Compatible File Floe.
bool_field	bool	Any bool (True/False) values can be included in a downloaded AbXtract Excel, CSV, or TSV if provided column name matches ‘bool_field’ (case-sensitive). This modified AbXtract file may be uploaded using the Upload AbXtract Compatible File Floe.
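
As an illustration of preparing such a file, the sketch below adds ‘on_rate’, ‘off_rate’, and ‘KD’ columns to rows read from a downloaded CSV and derives KD as kd/ka. It uses only the Python standard library. The helper names, the clone ids, the sequences, and the kinetic values are made-up placeholders, and matching rows on the ‘id’ column is an assumption.

```python
import csv
import io


def add_kinetics(rows, kinetics):
    """Return copies of rows with on_rate, off_rate, and KD columns added.

    kinetics maps an id to a (ka, kd) pair: ka in M^-1s^-1, kd in s^-1.
    """
    annotated = []
    for row in rows:
        new_row = dict(row)
        ka, kd = kinetics.get(row["id"], (None, None))
        measured = ka is not None and kd is not None
        new_row["on_rate"] = ka if measured else ""
        new_row["off_rate"] = kd if measured else ""
        # KD = kd / ka, in molar units; left blank when nothing was measured.
        new_row["KD"] = kd / ka if measured else ""
        annotated.append(new_row)
    return annotated


def to_csv(rows):
    """Serialize rows to CSV text, keeping the exact column names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder rows standing in for a downloaded AbXtract CSV.
rows = [
    {"id": "clone_a", "cdr3_aa_2": "ARDYW"},
    {"id": "clone_b", "cdr3_aa_2": "ARGYW"},
]
measured = {"clone_a": (1.0e5, 1.0e-3)}  # hypothetical SPR results
annotated = add_kinetics(rows, measured)
print(to_csv(annotated))
```

The column names are written exactly as ‘on_rate’, ‘off_rate’, and ‘KD’ because the match is case-sensitive.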

AIRR Fields

Field Name	Type	Attributes	Description
sequence	string	required, nullable	The query nucleotide sequence. Usually, this is the unmodified input sequence, which may be reverse complemented if necessary. In some cases, this field may contain consensus sequences or other types of collapsed input sequences if these steps are performed prior to alignment.
rev_comp	boolean	“T” or “F”, required, nullable	True if the alignment is on the opposite strand (reverse complemented) with respect to the query sequence. If True then all output data, such as alignment coordinates and sequences, are based on the reverse complement of ‘sequence’.
productive	boolean	“T” or “F”, required, nullable	True if the V(D)J sequence is predicted to be productive.
v_call	string	required, nullable	V gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHV4-59*01 if using IMGT/GENE-DB).
d_call	string	required, nullable	First or only D gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHD3-10*01 if using IMGT/GENE-DB).
j_call	string	required, nullable	J gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHJ4*02 if using IMGT/GENE-DB).
sequence_alignment	string	required, nullable	Aligned portion of query sequence, including any indel corrections or numbering spacers, such as IMGT-gaps. Typically, this will include only the V(D)J region, but that is not a requirement.
germline_alignment	string	required, nullable	Assembled, aligned, full-length inferred germline sequence spanning the same region as the sequence_alignment field (typically the V(D)J region) and including the same set of corrections and spacers (if any).
junction	string	required, nullable	Junction region nucleotide sequence, where the junction is defined as the CDR3 plus the two flanking conserved codons.
junction_aa	string	required, nullable	Amino acid translation of the junction.
v_cigar	string	required, nullable	CIGAR string for the V gene alignment.
d_cigar	string	required, nullable	CIGAR string for the first or only D gene alignment.
j_cigar	string	required, nullable	CIGAR string for the J gene alignment.
sequence_id	string	required, identifier, nullable	Unique query sequence identifier for the Rearrangement. Most often this will be the input sequence header or a substring thereof, but may also be a custom identifier defined by the tool in cases where query sequences have been combined in some fashion prior to alignment. When downloaded from an AIRR Data Commons repository, this will usually be a universally unique record locator for linking with other objects in the AIRR Data Model.
sequence_aa	string	optional, nullable	Amino acid translation of the query nucleotide sequence.
vj_in_frame	boolean	“T” or “F”, optional, nullable	True if the V and J gene alignments are in-frame.
stop_codon	boolean	“T” or “F”, optional, nullable	True if the aligned sequence contains a stop codon.
complete_vdj	boolean	“T” or “F”, optional, nullable	True if the sequence alignment spans the entire V(D)J region. Meaning, sequence_alignment includes both the first V gene codon that encodes the mature polypeptide chain (i.e., after the leader sequence) and the last complete codon of the J gene (i.e., before the J-C splice site). This does not require an absence of deletions within the internal FWR and CDR regions of the alignment.
locus	string	optional, nullable	Gene locus (chain type). Note that this field uses a controlled vocabulary that is meant to provide a generic classification of the locus, not necessarily the correct designation according to a specific nomenclature.
d2_call	string	optional, nullable	Second D gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHD3-10*01 if using IMGT/GENE-DB).
c_call	string	optional, nullable	Constant region gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHG1*01 if using IMGT/GENE-DB).
sequence_alignment_aa	string	optional, nullable	Amino acid translation of the aligned query sequence.
germline_alignment_aa	string	optional, nullable	Amino acid translation of the assembled germline sequence.
np1	string	optional, nullable	Nucleotide sequence of the combined N/P region between the V gene and first D gene alignment or between the V gene and J gene alignments.
np1_aa	string	optional, nullable	Amino acid translation of the np1 field.
np2	string	optional, nullable	Nucleotide sequence of the combined N/P region between either the first D gene and J gene alignments or the first D gene and second D gene alignments.
np2_aa	string	optional, nullable	Amino acid translation of the np2 field.
np3	string	optional, nullable	Nucleotide sequence of the combined N/P region between the second D gene and J gene alignments.
np3_aa	string	optional, nullable	Amino acid translation of the np3 field.
cdr1	string	optional, nullable	Nucleotide sequence of the aligned CDR1 region.
cdr1_aa	string	optional, nullable	Amino acid translation of the cdr1 field.
cdr2	string	optional, nullable	Nucleotide sequence of the aligned CDR2 region.
cdr2_aa	string	optional, nullable	Amino acid translation of the cdr2 field.
cdr3	string	optional, nullable	Nucleotide sequence of the aligned CDR3 region.
cdr3_aa	string	optional, nullable	Amino acid translation of the cdr3 field.
fwr1	string	optional, nullable	Nucleotide sequence of the aligned FWR1 region.
fwr1_aa	string	optional, nullable	Amino acid translation of the fwr1 field.
fwr2	string	optional, nullable	Nucleotide sequence of the aligned FWR2 region.
fwr2_aa	string	optional, nullable	Amino acid translation of the fwr2 field.
fwr3	string	optional, nullable	Nucleotide sequence of the aligned FWR3 region.
fwr3_aa	string	optional, nullable	Amino acid translation of the fwr3 field.
fwr4	string	optional, nullable	Nucleotide sequence of the aligned FWR4 region.
fwr4_aa	string	optional, nullable	Amino acid translation of the fwr4 field.
v_score	number	optional, nullable	Alignment score for the V gene.
v_identity	number	optional, nullable	Fractional identity for the V gene alignment.
v_support	number	optional, nullable	V gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the V gene assignment as defined by the alignment tool.
d_score	number	optional, nullable	Alignment score for the first or only D gene alignment.
d_identity	number	optional, nullable	Fractional identity for the first or only D gene alignment.
d_support	number	optional, nullable	D gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the first or only D gene as defined by the alignment tool.
d2_score	number	optional, nullable	Alignment score for the second D gene alignment.
d2_identity	number	optional, nullable	Fractional identity for the second D gene alignment.
d2_support	number	optional, nullable	D gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the second D gene as defined by the alignment tool.
d2_cigar	string	optional, nullable	CIGAR string for the second D gene alignment.
j_score	number	optional, nullable	Alignment score for the J gene alignment.
j_identity	number	optional, nullable	Fractional identity for the J gene alignment.
j_support	number	optional, nullable	J gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the J gene assignment as defined by the alignment tool.
c_score	number	optional, nullable	Alignment score for the C gene alignment.
c_identity	number	optional, nullable	Fractional identity for the C gene alignment.
c_support	number	optional, nullable	C gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the C gene assignment as defined by the alignment tool.
c_cigar	string	optional, nullable	CIGAR string for the C gene alignment.
v_sequence_start	integer	optional, nullable	Start position of the V gene in the query sequence (1-based closed interval).
v_sequence_end	integer	optional, nullable	End position of the V gene in the query sequence (1-based closed interval).
v_germline_start	integer	optional, nullable	Alignment start position in the V gene reference sequence (1-based closed interval).
v_germline_end	integer	optional, nullable	Alignment end position in the V gene reference sequence (1-based closed interval).
v_alignment_start	integer	optional, nullable	Start position of the V gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
v_alignment_end	integer	optional, nullable	End position of the V gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
d_sequence_start	integer	optional, nullable	Start position of the first or only D gene in the query sequence (1-based closed interval).
d_sequence_end	integer	optional, nullable	End position of the first or only D gene in the query sequence (1-based closed interval).
d_germline_start	integer	optional, nullable	Alignment start position in the D gene reference sequence for the first or only D gene (1-based closed interval).
d_germline_end	integer	optional, nullable	Alignment end position in the D gene reference sequence for the first or only D gene (1-based closed interval).
d_alignment_start	integer	optional, nullable	Start position of the first or only D gene in both the sequence_alignment and germline_alignment fields (1-based closed interval).
d_alignment_end	integer	optional, nullable	End position of the first or only D gene in both the sequence_alignment and germline_alignment fields (1-based closed interval).
d2_sequence_start	integer	optional, nullable	Start position of the second D gene in the query sequence (1-based closed interval).
d2_sequence_end	integer	optional, nullable	End position of the second D gene in the query sequence (1-based closed interval).
d2_germline_start	integer	optional, nullable	Alignment start position in the second D gene reference sequence (1-based closed interval).
d2_germline_end	integer	optional, nullable	Alignment end position in the second D gene reference sequence (1-based closed interval).
d2_alignment_start	integer	optional, nullable	Start position of the second D gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
d2_alignment_end	integer	optional, nullable	End position of the second D gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
j_sequence_start	integer	optional, nullable	Start position of the J gene in the query sequence (1-based closed interval).
j_sequence_end	integer	optional, nullable	End position of the J gene in the query sequence (1-based closed interval).
j_germline_start	integer	optional, nullable	Alignment start position in the J gene reference sequence (1-based closed interval).
j_germline_end	integer	optional, nullable	Alignment end position in the J gene reference sequence (1-based closed interval).
j_alignment_start	integer	optional, nullable	Start position of the J gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
j_alignment_end	integer	optional, nullable	End position of the J gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
cdr1_start	integer	optional, nullable	CDR1 start position in the query sequence (1-based closed interval).
cdr1_end	integer	optional, nullable	CDR1 end position in the query sequence (1-based closed interval).
cdr2_start	integer	optional, nullable	CDR2 start position in the query sequence (1-based closed interval).
cdr2_end	integer	optional, nullable	CDR2 end position in the query sequence (1-based closed interval).
cdr3_start	integer	optional, nullable	CDR3 start position in the query sequence (1-based closed interval).
cdr3_end	integer	optional, nullable	CDR3 end position in the query sequence (1-based closed interval).
fwr1_start	integer	optional, nullable	FWR1 start position in the query sequence (1-based closed interval).
fwr1_end	integer	optional, nullable	FWR1 end position in the query sequence (1-based closed interval).
fwr2_start	integer	optional, nullable	FWR2 start position in the query sequence (1-based closed interval).
fwr2_end	integer	optional, nullable	FWR2 end position in the query sequence (1-based closed interval).
fwr3_start	integer	optional, nullable	FWR3 start position in the query sequence (1-based closed interval).
fwr3_end	integer	optional, nullable	FWR3 end position in the query sequence (1-based closed interval).
fwr4_start	integer	optional, nullable	FWR4 start position in the query sequence (1-based closed interval).
fwr4_end	integer	optional, nullable	FWR4 end position in the query sequence (1-based closed interval).
v_sequence_alignment	string	optional, nullable	Aligned portion of query sequence assigned to the V gene, including any indel corrections or numbering spacers.
v_sequence_alignment_aa	string	optional, nullable	Amino acid translation of the v_sequence_alignment field.
d_sequence_alignment	string	optional, nullable	Aligned portion of query sequence assigned to the first or only D gene, including any indel corrections or numbering spacers.
d_sequence_alignment_aa	string	optional, nullable	Amino acid translation of the d_sequence_alignment field.
d2_sequence_alignment	string	optional, nullable	Aligned portion of query sequence assigned to the second D gene, including any indel corrections or numbering spacers.
d2_sequence_alignment_aa	string	optional, nullable	Amino acid translation of the d2_sequence_alignment field.
j_sequence_alignment	string	optional, nullable	Aligned portion of query sequence assigned to the J gene, including any indel corrections or numbering spacers.
j_sequence_alignment_aa	string	optional, nullable	Amino acid translation of the j_sequence_alignment field.
c_sequence_alignment	string	optional, nullable	Aligned portion of query sequence assigned to the constant region, including any indel corrections or numbering spacers.
c_sequence_alignment_aa	string	optional, nullable	Amino acid translation of the c_sequence_alignment field.
v_germline_alignment	string	optional, nullable	Aligned V gene germline sequence spanning the same region as the v_sequence_alignment field and including the same set of corrections and spacers (if any).
v_germline_alignment_aa	string	optional, nullable	Amino acid translation of the v_germline_alignment field.
d_germline_alignment	string	optional, nullable	Aligned D gene germline sequence spanning the same region as the d_sequence_alignment field and including the same set of corrections and spacers (if any).
d_germline_alignment_aa	string	optional, nullable	Amino acid translation of the d_germline_alignment field.
d2_germline_alignment	string	optional, nullable	Aligned second D gene germline sequence spanning the same region as the d2_sequence_alignment field and including the same set of corrections and spacers (if any).
d2_germline_alignment_aa	string	optional, nullable	Amino acid translation of the d2_germline_alignment field.
j_germline_alignment	string	optional, nullable	Aligned J gene germline sequence spanning the same region as the j_sequence_alignment field and including the same set of corrections and spacers (if any).
j_germline_alignment_aa	string	optional, nullable	Amino acid translation of the j_germline_alignment field.
c_germline_alignment	string	optional, nullable	Aligned constant region germline sequence spanning the same region as the c_sequence_alignment field and including the same set of corrections and spacers (if any).
c_germline_alignment_aa	string	optional, nullable	Amino acid translation of the c_germline_alignment field.
junction_length	integer	optional, nullable	Number of nucleotides in the junction sequence.
junction_aa_length	integer	optional, nullable	Number of amino acids in the junction sequence.
np1_length	integer	optional, nullable	Number of nucleotides between the V gene and first D gene alignments or between the V gene and J gene alignments.
np2_length	integer	optional, nullable	Number of nucleotides between either the first D gene and J gene alignments or the first D gene and second D gene alignments.
np3_length	integer	optional, nullable	Number of nucleotides between the second D gene and J gene alignments.
n1_length	integer	optional, nullable	Number of untemplated nucleotides 5’ of the first or only D gene alignment.
n2_length	integer	optional, nullable	Number of untemplated nucleotides 3’ of the first or only D gene alignment.
n3_length	integer	optional, nullable	Number of untemplated nucleotides 3’ of the second D gene alignment.
p3v_length	integer	optional, nullable	Number of palindromic nucleotides 3’ of the V gene alignment.
p5d_length	integer	optional, nullable	Number of palindromic nucleotides 5’ of the first or only D gene alignment.
p3d_length	integer	optional, nullable	Number of palindromic nucleotides 3’ of the first or only D gene alignment.
p5d2_length	integer	optional, nullable	Number of palindromic nucleotides 5’ of the second D gene alignment.
p3d2_length	integer	optional, nullable	Number of palindromic nucleotides 3’ of the second D gene alignment.
p5j_length	integer	optional, nullable	Number of palindromic nucleotides 5’ of the J gene alignment.
consensus_count	integer	optional, nullable	Number of reads contributing to the (UMI) consensus for this sequence. For example, the sum of the number of reads for all UMIs that contribute to the query sequence.
duplicate_count	integer	optional, nullable	Copy number or number of duplicate observations for the query sequence. For example, the number of UMIs sharing an identical sequence or the number of identical observations of this sequence absent UMIs.
cell_id	string	optional, identifier, nullable	Identifier defining the cell of origin for the query sequence.
clone_id	string	optional, identifier, nullable	Clonal cluster assignment for the query sequence.
repertoire_id	string	optional, identifier, nullable	Identifier to the associated repertoire in study metadata.
sample_processing_id	string	optional, identifier, nullable	Identifier to the sample processing object in the repertoire metadata for this rearrangement. If the repertoire has a single sample then this field may be empty or missing. If the repertoire has multiple samples then this field may be empty or missing if the sample cannot be differentiated or the relationship is not maintained by the data processing.
data_processing_id	string	optional, identifier, nullable	Identifier to the data processing object in the repertoire metadata for this rearrangement. If this field is empty then the primary data processing object is assumed.
rearrangement_id	string	DEPRECATED	Identifier for the Rearrangement object. May be identical to sequence_id, but will usually be a universally unique record locator for database applications.
rearrangement_set_id	string	DEPRECATED	Identifier for grouping Rearrangement objects.
germline_database	string	DEPRECATED	Source of germline V(D)J genes with version number or date accessed.
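
The coordinate fields above are all given as 1-based closed intervals, which differs from the 0-based half-open slicing used by most programming languages. The following is a minimal Python sketch of the conversion, not a definitive implementation. It assumes the query sequence is available as a plain string under a key named `sequence` (an assumption; that field is not described in this section), and the helper name `extract_region` and the example record are hypothetical.

```python
from typing import Optional


def extract_region(sequence: str, start: Optional[int], end: Optional[int]) -> Optional[str]:
    """Return the substring covered by a 1-based closed interval.

    Returns None when either bound is missing, since the coordinate
    fields are optional and nullable.
    """
    if start is None or end is None:
        return None
    if start < 1 or end < start or end > len(sequence):
        raise ValueError(f"invalid 1-based closed interval: [{start}, {end}]")
    # 1-based closed [start, end] maps to 0-based half-open [start - 1, end).
    return sequence[start - 1:end]


# Hypothetical record; the values are illustrative only.
record = {
    "sequence": "ACGTACGTACGT",
    "cdr1_start": 3,
    "cdr1_end": 6,
    "cdr2_start": None,
    "cdr2_end": None,
}

cdr1 = extract_region(record["sequence"], record["cdr1_start"], record["cdr1_end"])
cdr2 = extract_region(record["sequence"], record["cdr2_start"], record["cdr2_end"])
print(cdr1)  # GTAC
print(cdr2)  # None
```

Under this convention the length of a region is `end - start + 1`, so the example CDR1 above spans 4 nucleotides. The same conversion would apply to the `*_sequence_start`/`*_sequence_end` and `fwr*`/`cdr*` pairs, whereas the `*_alignment_start`/`*_alignment_end` pairs index into the sequence_alignment and germline_alignment fields rather than the query sequence.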