Automated Top Lead Selection - AbXtract

Automatically select the top X sequences across a defined cluster space using user-specified metrics applied in rank order.

Promoted Parameters

Illumina/PacBio AbXtract Datasets (data_source) : The DOWNSTREAM dataset(s) to read records from; these must have passed through one of the NGS (PacBio/Illumina) pipelines or the AbScan FLOE.

Maximum Number of Full-Length Sequences (integer) : Indicate the maximum number of full-length, non-redundant sequences to select. The number actually selected also depends on the total number of non-redundant sequences, the maximum number of clusters, and the maximum number of sequences per cluster. If the total falls below the desired number, try adjusting the maximum number of clusters or the maximum number of sequences per cluster. Alternatively, to fill the quota with additional sequences per cluster, turn on the ‘Attempt to Fulfill the Desired Number of Sequences Quota’ parameter.

Default: 100 Min: 1 Max: 1000000
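The interplay between the three limits can be illustrated with a small arithmetic sketch. This is not the floe's implementation; the function name and arguments are hypothetical, and the result is only an upper bound, since real clusters may hold fewer sequences than the per-cluster maximum.

```python
def max_selectable(total_unique, clusters_available,
                   max_sequences=100, max_per_cluster=10, max_clusters=40):
    """Upper bound on the number of picks before any quota fill (illustrative only)."""
    # Only as many clusters as exist, and no more than the cluster limit, can be used.
    clusters_used = min(clusters_available, max_clusters)
    # The pick count is capped by the quota, the pool size, and the cluster limits.
    return min(max_sequences, total_unique, clusters_used * max_per_cluster)

# With the default limits and only 8 clusters, at most 8 * 10 = 80 sequences can be
# picked, which is below the quota of 100.
print(max_selectable(total_unique=5000, clusters_available=8))  # 80
```

In a case like this, raising the per-cluster maximum or enabling the quota-fill parameter would be the ways to reach the desired total.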

Maximum Number Sequences Per Cluster (integer) : Indicate the maximum number of unique full-length sequences to select per cluster (NOTE: if clustering was not performed, selects full-length sequences that share a common region of interest (ROI)).

Default: 10 Min: 1 Max: 10000

Maximum Number of Clusters Preferred (integer) : Indicate the preferred maximum number of clusters to select sequences from (NOTE: if clustering was not performed, full-length sequences that share a common region of interest (ROI) are grouped instead).

Default: 40 Min: 1 Max: 1000000

Metrics for Ranking (string) : List the metrics in ranking order (if none are given, sequences are ranked by full-length count).

Default: [‘ROI Percent, Final Round Only’, ‘Full Length (Corrects for Illumina or PacBio), Percent’, ‘Liabilities Both Chains’, ‘Liabilities CDR3_2’]

Choices: ‘Full Length (Corrects for Illumina or PacBio), Count’, ‘Full Length (Corrects for Illumina or PacBio), Percent’, ‘ROI Count, Final Round Only’, ‘ROI Percent, Final Round Only’, ‘ROI Fold Enrichment, Final Round Only’, ‘ROI Log2 Enrichment, Final Round Only’, ‘Liabilities Both Chains’, ‘Liabilities Chain_2’, ‘Liabilities Chain_1’, ‘Liabilities CDR1_1’, ‘Liabilities CDR2_1’, ‘Liabilities CDR3_1’, ‘Liabilities CDR1_2’, ‘Liabilities CDR2_2’, ‘Liabilities CDR3_2’, ‘ROI Count, Early Round Only’, ‘ROI Percent, Early Round Only’, ‘Cluster Count (e.g. unique sequences per cluster)’, ‘Cluster Percent (e.g. unique rep per cluster)’
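Ranking by an ordered list of metrics can be thought of as a multi-key sort, where later metrics only break ties left by earlier ones. The sketch below uses the default metric order. The record layout, the function name, and especially the sort direction of each metric (larger percentages first, fewer liabilities first) are assumptions made for illustration, not details confirmed by the floe.

```python
ROI_PCT = "ROI Percent, Final Round Only"
FL_PCT = "Full Length (Corrects for Illumina or PacBio), Percent"
LIAB_BOTH = "Liabilities Both Chains"
LIAB_CDR3_2 = "Liabilities CDR3_2"

# Assumed direction per metric: True means larger values rank first.
HIGHER_IS_BETTER = {ROI_PCT: True, FL_PCT: True, LIAB_BOTH: False, LIAB_CDR3_2: False}

def rank_records(records, metrics):
    """Sort records by the metrics in the order given; ties fall through to the next metric."""
    def sort_key(rec):
        # Negate "higher is better" metrics so an ascending sort puts them first.
        return tuple(-rec[m] if HIGHER_IS_BETTER[m] else rec[m] for m in metrics)
    return sorted(records, key=sort_key)

# Hypothetical records for illustration.
records = [
    {"id": "a", ROI_PCT: 2.0, FL_PCT: 0.5, LIAB_BOTH: 3, LIAB_CDR3_2: 1},
    {"id": "b", ROI_PCT: 2.0, FL_PCT: 0.5, LIAB_BOTH: 1, LIAB_CDR3_2: 0},
    {"id": "c", ROI_PCT: 5.0, FL_PCT: 0.1, LIAB_BOTH: 4, LIAB_CDR3_2: 2},
]
ranked = rank_records(records, [ROI_PCT, FL_PCT, LIAB_BOTH, LIAB_CDR3_2])
print([r["id"] for r in ranked])  # ['c', 'b', 'a']
```

Here "c" wins on the first metric alone, while "a" and "b" tie on the first two metrics and are separated by the liability count.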

Rank Sanger Clones First in Population (boolean) : Sanger clones, if present, will be ranked first.

Default: True

Attempt to Fulfill the Desired Number of Sequences Quota (boolean) : If the sequence quota is not reached, attempt to fill it by selecting additional sequences across clusters, prioritizing top clones. NOTE: this is likely to result in a greater number of sequences per cluster.

Default: False
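How the sequence quota, the per-cluster limit, the cluster limit, and the quota-fill option might interact can be sketched as a greedy pass over ranked sequences. This is a simplified illustration under stated assumptions, not the floe's actual picking strategy: the input is assumed to be (sequence id, cluster id) pairs already sorted best first, and the fill step is assumed to relax only the per-cluster limit.

```python
def pick_top(ranked, max_sequences, max_per_cluster, max_clusters, fill_quota=False):
    """Greedy pick over (seq_id, cluster_id) pairs that are already ranked best first."""
    picked, passed_over, counts = [], [], {}
    for seq_id, cluster in ranked:
        if len(picked) >= max_sequences:
            break
        if cluster not in counts and len(counts) >= max_clusters:
            continue  # cluster limit reached: ignore sequences from new clusters
        if counts.get(cluster, 0) >= max_per_cluster:
            passed_over.append(seq_id)  # cluster is full; remember for a possible fill
            continue
        counts[cluster] = counts.get(cluster, 0) + 1
        picked.append(seq_id)
    if fill_quota:
        # Top up with the best passed-over sequences, exceeding the per-cluster limit.
        picked.extend(passed_over[: max_sequences - len(picked)])
    return picked

# Hypothetical ranked input for illustration.
ranked = [("s1", "A"), ("s2", "A"), ("s3", "A"), ("s4", "B"), ("s5", "C")]
print(pick_top(ranked, max_sequences=4, max_per_cluster=2, max_clusters=2))
# ['s1', 's2', 's4']
print(pick_top(ranked, max_sequences=4, max_per_cluster=2, max_clusters=2, fill_quota=True))
# ['s1', 's2', 's4', 's3']
```

Without the fill, the quota of 4 is missed because cluster A is capped at 2; with the fill, the next-best sequence from cluster A is added, so cluster A ends up with 3 sequences.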

Output Name of the Picked Consolidated Dataset (dataset_out) : This consolidated dataset contains all sample names and barcode groups in a single dataset.

Default: picked.consolidated

Output Basename of the PacBio or Illumina Selected Population (dataset_out) : This dataset contains picked clones, separated by group.

Default: picked.long/short_read

Hidden Parameters

NGS Picking Strategy (string) : Indicate the type of picking strategy.

Default: Uniform

Choices: Uniform, Most Abundant

Eliminate Sequences That Fall Below Thresholds (boolean) : If True, removes sequences that fall below or above the given thresholds; otherwise, flags those sequences in the ‘warnings’ field.

Default: False

Maximum Number of CDR1_1 Liabilities (integer) : Sets the maximum number of liabilities allowed in the downstream CDR1_1.

Default: 10 Max: 100

Maximum Number of CDR1_2 Liabilities (integer) : Sets the maximum number of liabilities allowed in the downstream CDR1_2; ignored if using Illumina.

Default: 10 Max: 100

Maximum Number of CDR2_1 Liabilities (integer) : Sets the maximum number of liabilities allowed in the downstream CDR2_1.

Default: 10 Max: 100

Maximum Number of CDR2_2 Liabilities (integer) : Sets the maximum number of liabilities allowed in the downstream CDR2_2; ignored if using Illumina.

Default: 10 Max: 100

Maximum Number of CDR3_1 Liabilities (integer) : Sets the maximum number of liabilities allowed in the downstream CDR3_1.

Default: 10 Max: 100

Maximum Number of CDR3_2 Liabilities (integer) : Sets the maximum number of liabilities allowed in the downstream CDR3_2; ignored if using Illumina.

Default: 10 Max: 100

Maximum Number of Chain_1 Liabilities Across All CDRs (integer) : Sets the maximum number of liabilities allowed across the downstream CDRs in Chain_1.

Default: 10 Max: 100

Maximum Number of Chain_2 Liabilities Across All CDRs (integer) : Sets the maximum number of liabilities allowed across the downstream CDRs in Chain_2; ignored if using Illumina.

Default: 10 Max: 100

Maximum Number of Full-Length HCDR1-3 and LCDR1-3 Liabilities (PacBio Only). (integer) : Sets the maximum number of liabilities allowed across HCDR1-3 and LCDR1-3; ignored if using Illumina.

Default: 10 Max: 100

Minimum Count Required for Full-Length Sequence (integer) : Sets the minimum count allowed for a full-length sequence (Illumina: VH or VL; PacBio: VL+VH).

Default: 1 Min: 1

Minimum Percent Required for Full-Length Sequence (decimal) : Sets the minimum percent allowed for a full-length sequence (Illumina: VH or VL; PacBio: VL+VH).

Default: 0.0001 Min: 1e-18 Max: 100.0

Minimum Log2 Enrichment (decimal) : Sets the minimum allowed log2 enrichment value; ignored if early vs. late enrichment was not performed.

Default: -10 Min: -1000000 Max: 1000000

Minimum Count Required for Indicated Region of Interest (ROI) (integer) : Sets the minimum count allowed for the region of interest (e.g. HCDR3+LCDR3), based on how the dataset was processed upstream.

Default: 2 Min: 1

Minimum Percent Required for Indicated Region of Interest (ROI) (decimal) : Sets the minimum percent allowed for the region of interest (e.g. HCDR3+LCDR3), based on how the dataset was processed upstream.

Default: 0.001 Min: 1e-10 Max: 100.0
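The difference between eliminating and flagging sequences that miss the thresholds above can be sketched as follows. The record fields (`full_length_count`, `roi_count`, `liabilities`), the function name, and the warning messages are hypothetical; only three of the thresholds are shown, and the floe's real record schema may differ.

```python
def apply_thresholds(records, min_count=1, min_roi_count=2,
                     max_liabilities=10, eliminate=False):
    """Drop or flag records that violate thresholds (illustrative field names)."""
    kept = []
    for rec in records:
        warnings = []
        if rec["full_length_count"] < min_count:
            warnings.append(f"full-length count below {min_count}")
        if rec["roi_count"] < min_roi_count:
            warnings.append(f"ROI count below {min_roi_count}")
        if rec["liabilities"] > max_liabilities:
            warnings.append(f"liabilities above {max_liabilities}")
        if warnings and eliminate:
            continue  # eliminate=True: the offending record is removed
        # eliminate=False: the record is kept and its violations are recorded
        kept.append({**rec, "warnings": warnings})
    return kept

# Hypothetical records for illustration.
records = [
    {"id": "a", "full_length_count": 5, "roi_count": 3, "liabilities": 2},
    {"id": "b", "full_length_count": 1, "roi_count": 1, "liabilities": 12},
]
flagged = apply_thresholds(records)
strict = apply_thresholds(records, eliminate=True)
print([(r["id"], len(r["warnings"])) for r in flagged])  # [('a', 0), ('b', 2)]
print([r["id"] for r in strict])  # ['a']
```

With the default of False, record "b" stays in the output but carries two warnings; with True, it is removed.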
Write Records to Dataset (boolean) : Write out records to a dataset.

Default: True

Write Group to Their Own Dataset After Processing (boolean) : Write each group (if provided) to its own dataset after processing. NOTE: if there is only a single group, no separate dataset is written.

Default: False

Write Report (boolean) : Write out a floe report after consolidation.

Default: False

ROI for sequence logo (choose Chain1 CDR3 if short-read/single-chain data) (string) : Name of the region(s) to align for the sequence logo. The logo is output only if there are no more than 500 records.

Default: CDR3 Chain_2 (Downstream Chain)

Choices: CDR3 Chain_1 (Upstream Chain), CDR3 Chain_2 (Downstream Chain), HCDR3 and LCDR3

Split by cluster? Only applies to downstream records for Weblogo (boolean) : Indicates whether to split sequences by cluster before creating sequence logos. Cluster logos are output only if there are no more than 500 records.

Default: False