Tutorial 2: NGS Pipeline with Automated Top Lead Selection (PacBio), In-Vitro Library


See the Background section from Tutorial 1.

The goal of this tutorial is to use the automated selection pipeline to select leads directly from the FASTQ input.

STEP 1 - Log in to Orion, Set Up a Directory, and Locate Tutorial Files

  1. Follow Step 1 from Tutorial 1, #1 and #2.

  2. Create a general tutorial directory and a Tutorial 2 subdirectory at Project Directory / Tutorials / TUTORIAL_2. (This is your base directory. Use it for all outputs from this tutorial.)

STEP 2 - Select the NGS Pipeline with Automated Lead Selection Floe

Migrate to the Appropriate Floe

Figure 1. Select the appropriate floe.

  1. Select the Floe page from the blue navigation bar.

  2. Click the Floes Tab.

  3. Under Packages, choose the OpenEye Specifica AbXtract Module.

  4. Select the NGS Pipeline with Automated Top Lead Selection Floe.

STEP 3 - Prepare PacBio Run and Start Job

PacBio Input Files

Figure 2. Job Form showing the PacBio inputs.

Input the following files as shown in Figure 2.

  1. Load the FASTQ file: pacbio_small_codon_optimized.fastq.

  2. Load the barcode XLSX file: barcode_file_abxtract.xlsx.

  3. Load the alignment TXT file: scaffold_ref_db_codon_optimized_dna.txt.
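
Before uploading, it can help to confirm that a local copy of the FASTQ file is well formed. The following is a minimal sketch, not part of the AbXtract floe: it assumes an uncompressed FASTQ with the standard four-line record layout, and the function name count_fastq_records is hypothetical.

```python
import io


def count_fastq_records(handle):
    """Count records in a FASTQ stream, checking the 4-line record layout."""
    count = 0
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near read {count + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in read {count + 1}")
        count += 1
    return count


# Tiny in-memory example (made-up reads, not the tutorial data).
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nIIII\n")
print(count_fastq_records(example))  # prints 2
```

For the tutorial file, the same function could be called on a handle opened with open("pacbio_small_codon_optimized.fastq"), assuming the file has been saved in the working directory.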

Important Note

All remaining parameters can be kept at their default values. Scroll through the remaining Promoted Parameters to review the options.

Automated Lead Selection Parameters

Figure 3. NGS Key Selection Parameters.

Description of the NGS Key Selection Parameters

Maximum Number Of Full-Length Sequences: The maximum number of full-length, nonredundant sequences to output.
Maximum Number Sequences per Cluster: The maximum number of unique full-length sequences per given cluster.
Maximum Number Of Clusters Preferred: The maximum number of clusters that we want to select from.
Metrics For Ranking: Metrics that determine how the sequences will be sorted in output.
Attempt To Fill The Desired Number Of Full-Length Sequences Quota: Attempts to fulfill the total number of sequences (from Maximum Number Of Full-Length Sequences) by selecting additional full-length sequences from the same clusters followed by selecting the remaining top-ranked clones from different clusters.
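
One way to picture how these parameters could interact is sketched below. This is an illustration under stated assumptions, not the floe's actual implementation: the function pick_leads, the record layout, and the single 'score' field (a stand-in for Metrics For Ranking, higher is better) are all hypothetical.

```python
def pick_leads(records, max_sequences, max_per_cluster, max_clusters, fill_quota=True):
    """Pick top-ranked records subject to cluster caps (illustrative only).

    records: list of dicts with 'seq_id', 'cluster', and 'score' keys.
    """
    ranked = sorted(records, key=lambda r: r["score"], reverse=True)
    picked = []
    per_cluster = {}  # cluster id -> number of sequences picked so far

    # Pass 1: honor the per-cluster cap and the cap on the number of clusters.
    for rec in ranked:
        if len(picked) >= max_sequences:
            break
        cluster = rec["cluster"]
        if cluster not in per_cluster and len(per_cluster) >= max_clusters:
            continue  # would open one cluster too many
        if per_cluster.get(cluster, 0) >= max_per_cluster:
            continue  # this cluster is already full
        picked.append(rec)
        per_cluster[cluster] = per_cluster.get(cluster, 0) + 1

    # Pass 2 (optional): fill the quota, first from clusters already chosen,
    # then from the remaining top-ranked records in other clusters.
    if fill_quota and len(picked) < max_sequences:
        chosen = {r["seq_id"] for r in picked}
        leftovers = [r for r in ranked if r["seq_id"] not in chosen]
        same = [r for r in leftovers if r["cluster"] in per_cluster]
        other = [r for r in leftovers if r["cluster"] not in per_cluster]
        for rec in same + other:
            if len(picked) >= max_sequences:
                break
            picked.append(rec)
    return picked


# Made-up records for illustration.
demo = [
    {"seq_id": "a", "cluster": 1, "score": 9},
    {"seq_id": "b", "cluster": 1, "score": 8},
    {"seq_id": "c", "cluster": 1, "score": 7},
    {"seq_id": "d", "cluster": 2, "score": 6},
    {"seq_id": "e", "cluster": 3, "score": 5},
]
result = pick_leads(demo, max_sequences=4, max_per_cluster=1, max_clusters=2)
print([r["seq_id"] for r in result])  # prints ['a', 'd', 'b', 'c']
```

With fill_quota=False, the same call returns only 'a' and 'd', because the per-cluster and cluster-count caps are reached before the sequence quota is met.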

  1. Click the “Start Job” button.

STEP 4 - Open the Floe Report to Get a Detailed Understanding of the Selected Population

  1. Under the Jobs Tab, find the NGS Pipeline with Automated Top Lead Selection.

  2. In the Job Panel, you can find job details, a results drop-down (connecting to datasets and collections), and a report drop-down (Floe Report). These are shown in Figure 4.

List of Datasets, Floe Reports and Files

Figure 4. Job Panel.

  1. Click on the NGS Downstream-picked.long_read Report file under the Reports section to see the Floe Report. Alternatively, you can click on the Floe Panel to the right.

  2. The General Stats table provides a snapshot of the number of nonredundant full-length, LCDR3, HCDR3, and combined LCDR3 and HCDR3 sequences. Chain_1 represents the variable light chain (VL) and chain_2 represents the variable heavy chain (VH). The overlap statistic shows how much of the region of interest (HCDR3) is shared across the different populations.

General Stats in Floe Report

Figure 5. General stats in the Floe Report.
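
To make "nonredundant" and "overlap" concrete, here is a small sketch that treats nonredundant sequences as a set of unique HCDR3 strings per population and overlap as the size of the intersection between two populations. How the Floe Report computes its statistics is not stated in this tutorial, so this is an assumption; the function hcdr3_overlap and the example sequences are hypothetical.

```python
from itertools import combinations


def hcdr3_overlap(records):
    """Summarize unique HCDR3 counts per group and pairwise shared counts.

    records: iterable of (barcode_group, hcdr3_aa) pairs.
    """
    by_group = {}
    for group, cdr3 in records:
        by_group.setdefault(group, set()).add(cdr3)  # sets drop duplicates

    unique_counts = {g: len(seqs) for g, seqs in by_group.items()}
    shared = {
        (a, b): len(by_group[a] & by_group[b])
        for a, b in combinations(sorted(by_group), 2)
    }
    return unique_counts, shared


# Made-up HCDR3 sequences for three populations.
demo = [
    ("trimer", "ARDYW"), ("trimer", "ARGGW"), ("trimer", "ARDYW"),
    ("S1", "ARDYW"), ("S1", "AKLLW"),
    ("RBD", "ARGGW"),
]
unique_counts, shared = hcdr3_overlap(demo)
print(unique_counts)
print(shared)
```

In this example the trimer population has two unique HCDR3 sequences (the repeated one is counted once), and it shares one sequence each with S1 and RBD.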

  1. Each subpanel separates out the ‘barcode_group’ and shows the population-specific (e.g., trimer, S1, RBD) statistics.

  2. Return to the Job Panel (see Figure 4).

  3. Under the Results drop-down, find the dataset called ‘picked.consolidated’ and click “Show in Project Data” to go directly to the Data page.

  4. Make the dataset active and open it in the Analyze page (see Figure 5 in Tutorial 1).

STEP 5 - Select Populations with Net Negative Charge of the CDRs

  1. Select ‘cluster’ on the x-axis (there should be 40 clusters, because the Maximum Number Of Clusters Preferred parameter was set to 40).

  2. On the y-axis, select ‘merged_cdrs_1_2_charge,’ which looks at the net charge (pH 7.0) of all the CDRs.

  3. Select only records below zero net charge across all cluster groups. On the Active Data Bar at the top of the page, choose Send to Workfloe from the ‘Selected’ drop-down.
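
The same selection can be expressed outside the Analyze page. The sketch below assumes the records have been exported to a CSV file whose columns include seq_id, cluster, and merged_cdrs_1_2_charge; the export itself and the function name negative_charge_rows are assumptions, not steps in this tutorial.

```python
import csv
import io


def negative_charge_rows(handle, field="merged_cdrs_1_2_charge"):
    """Return the rows whose net CDR charge is strictly below zero."""
    reader = csv.DictReader(handle)
    return [row for row in reader if float(row[field]) < 0]


# Made-up rows for illustration.
demo = io.StringIO(
    "seq_id,cluster,merged_cdrs_1_2_charge\n"
    "s1,1,-2.0\n"
    "s2,1,0.5\n"
    "s3,2,-0.1\n"
    "s4,3,0\n"
)
selected = negative_charge_rows(demo)
print([row["seq_id"] for row in selected])  # prints ['s1', 's3']
```

Records with a charge of exactly zero are excluded, matching the "below zero" criterion in the step above.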

Select Cluster Populations of Net Negative Charge

Figure 6. How to select populations with a net negative charge.

Important Note

It is often easier to rank numeric parameters in the spreadsheet by clicking the “Menu” icon in a column and choosing the Rank Descending option.

STEP 6 - Subset the Output Fields

  1. On the Floe page, find the Subset the Number of Fields for Export Floe under the AbXtract package. Click the “Launch Floe” button.

  2. Set the following parameters to keep these fields (for more details, see the key fields reference).

    A) Identifier Fields to Keep: seq_id
    B) Sequence Fields to Keep: sequence_aa_1, sequence_aa_2, match_name_1, match_name_2, cdr3_aa_2
    C) Cluster Fields to Keep: cluster

  3. Click the “Start Job” button.
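
For readers who want to see what subsetting the fields amounts to, the following sketch applies the same field list to a CSV file. It is an illustration only and does not reproduce the floe; the function subset_fields and the example row are hypothetical, and the input is assumed to be a CSV with a header row.

```python
import csv
import io

# Field names taken from the parameters listed above.
KEEP = [
    "seq_id",
    "sequence_aa_1", "sequence_aa_2",
    "match_name_1", "match_name_2",
    "cdr3_aa_2",
    "cluster",
]


def subset_fields(in_handle, out_handle, keep=KEEP):
    """Copy a CSV, writing only the columns named in keep."""
    reader = csv.DictReader(in_handle)
    missing = [f for f in keep if f not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"input is missing fields: {missing}")
    writer = csv.DictWriter(
        out_handle, fieldnames=keep, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns not in keep are silently dropped


# Made-up record with one extra column that should be removed.
src = io.StringIO(
    "seq_id,sequence_aa_1,sequence_aa_2,match_name_1,match_name_2,"
    "cdr3_aa_2,cluster,merged_cdrs_1_2_charge\n"
    "s1,DIQ,EVQ,lib_a,lib_b,ARDYW,1,-2.0\n"
)
dst = io.StringIO()
subset_fields(src, dst)
print(dst.getvalue())
```

Raising an error on missing fields, instead of writing empty columns, makes a misspelled field name visible immediately.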

STEP 7 - Download the CSV

  1. On the Floe page, choose the Jobs Tab and locate the Job Name for the Subset the Number of Fields for Export Floe.

  2. Select the checkbox titled ‘Show non-dataset files’ and download the file.

Figure 7. Job Panel.