Tutorial 2: NGS Pipeline with Automated Top Lead Selection (PacBio), In-Vitro Library


See Background from Tutorial 1

The goal of this tutorial is to use the automated selection pipeline to select leads directly from the FASTQ input.

STEP 1 - Login to Orion, Set-Up Directory, Locate Tutorial Files

  1. Follow Step 1 from Tutorial 1, #1-2.

  2. Create a general tutorial directory and a Tutorial 2 subdirectory: PROJECT DIRECTORY / TUTORIALS / TUTORIAL_2. This is your BASE DIRECTORY and should be used for all outputs in Tutorial 2 below.

STEP 2 - Select the ‘NGS Pipeline with Automated Top Lead Selection’ Floe

Migrate to the Appropriate Floe
  1. Select the ‘Floe’ tab along the left side

  2. Click the ‘Floes’ tab

  3. Choose the ‘OpenEye Specifica AbXtract Module’

  4. Select the Floe ‘NGS Pipeline with Automated Top Lead Selection’

STEP 3 - Prepare PacBio Run and Start Job

PacBio Input Files
  1. Load FASTQ file - ‘pacbio_small_codon_optimized.fastq’

  2. Load barcode XLSX file - ‘barcode_file_abxtract.xlsx’

  3. Load alignment TXT file - ‘scaffold_ref_db_codon_optimized_dna.txt’

Important Note

All the remaining parameters can be kept at their default values. Scroll through the remaining “Promoted” parameters to understand them in greater detail.

Automated Lead Selection Parameters


Maximum Number Of Full-Length Sequences - The maximum number of full-length, non-redundant sequences to output.
Maximum Number Sequences per Cluster - The maximum number of unique full-length sequences per given cluster.
Maximum Number Of Clusters Preferred - The maximum number of clusters that we want to select from.
Metrics For Ranking - Metrics that determine how the sequences will be sorted in the output.
Attempt To Fill The Desired Number Of Full-Length Sequences Quota - Attempts to reach the total set by ‘Maximum Number Of Full-Length Sequences’ by first selecting additional full-length sequences from the same clusters, then selecting the remaining top-ranked clones from different clusters.
  1. Click ‘Start Job’
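
The lead selection parameters above interact with each other, so a small sketch may help build intuition. The Python below is only an illustration of the behavior described in the parameter list and is not the Floe's actual implementation. The function name `select_leads`, the record layout, the `score` field, and the assumption that a higher metric value ranks first are all hypothetical.

```python
def select_leads(records, max_total, max_per_cluster, max_clusters,
                 rank_key, fill_quota=False):
    """Pick top-ranked records under per-cluster and cluster-count caps.

    Each record is a dict with 'seq_id', 'cluster', and the ranking metric.
    Assumption: a higher metric value means a better rank.
    """
    ranked = sorted(records, key=lambda r: r[rank_key], reverse=True)
    picked = []
    picked_ids = set()
    per_cluster = {}  # cluster -> number of picked sequences

    def take(rec):
        picked.append(rec)
        picked_ids.add(rec["seq_id"])
        per_cluster[rec["cluster"]] = per_cluster.get(rec["cluster"], 0) + 1

    # First pass: respect all three caps.
    for rec in ranked:
        if len(picked) >= max_total:
            break
        count = per_cluster.get(rec["cluster"], 0)
        if count == 0 and len(per_cluster) >= max_clusters:
            continue  # would open one cluster too many
        if count >= max_per_cluster:
            continue  # this cluster is already full
        take(rec)

    if fill_quota:
        # Second pass: more sequences from clusters that are already selected.
        for rec in ranked:
            if len(picked) >= max_total:
                break
            if rec["seq_id"] not in picked_ids and rec["cluster"] in per_cluster:
                take(rec)
        # Third pass: remaining top-ranked sequences from any other cluster.
        for rec in ranked:
            if len(picked) >= max_total:
                break
            if rec["seq_id"] not in picked_ids:
                take(rec)
    return picked


# Made-up records for illustration only.
demo_records = [
    {"seq_id": "a", "cluster": 1, "score": 10},
    {"seq_id": "b", "cluster": 1, "score": 9},
    {"seq_id": "c", "cluster": 1, "score": 8},
    {"seq_id": "d", "cluster": 2, "score": 7},
    {"seq_id": "e", "cluster": 3, "score": 6},
    {"seq_id": "f", "cluster": 2, "score": 5},
]
strict = select_leads(demo_records, 4, 1, 2, "score")
filled = select_leads(demo_records, 4, 1, 2, "score", fill_quota=True)
```

With the quota option off, the caps leave only two of the four requested sequences in this toy data; with it on, the shortfall is made up from the clusters that were already selected.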

STEP 4 - Open the Floe Report to Get a Detailed Understanding of the Selected or ‘Picked’ Population

  1. Under the ‘Jobs’ tab, find the ‘NGS Pipeline with Automated Top Lead Selection’ job.

  2. Click the ‘Show non-dataset files’ checkbox. The breakdown of datasets, collections (Floe reports), and files (CSV) is shown here:

List of Datasets, Floe Reports and Files
  1. Click on the ‘NGS Downstream-picked.long_read Report’ under the ‘Reports’ section to open it in the browser. The report may take some time to load.

  2. See the ‘General stats’, which provides a snapshot of the number of non-redundant full-length, LCDR3, HCDR3, and LCDR3 + HCDR3 sequences. Chain_1 represents the variable light chain (VL) and chain_2 represents the variable heavy chain (VH). The overlap plot shows how the region of interest (HCDR3) overlaps across the different populations.

General Stats in Floe Report
  1. Each subpanel separates out the ‘barcode_group’ and shows the population-specific (e.g., trimer, S1, RBD) statistics.

  2. Return to the Job overview listing all the datasets, Floe reports, and files (see above).

  3. Find the file called ‘picked.consolidated’ and click ‘Show in Project Data’.

  4. Make the dataset active and open it in the Analyze tool (see making dataset active).

STEP 5 - Select Populations with Net Negative Charge of the CDRs

  1. Plot ‘cluster’ on the x-axis (there should be 40 clusters, since this is the value set for the ‘Maximum Number Of Clusters Preferred’ option).

  2. On the y-axis, select the ‘merged_cdrs_1_2_charge’ option, which gives the net charge (pH 7.0) of all the CDRs.

  3. Select only the records with a net charge below 0 across all cluster groups, then open the ‘Selected’ tab near the top and choose the ‘Send to Workfloe’ option, like this:

Select Cluster Populations of Net Negative Charge

Important Note

It is often easier to rank numeric parameters in the table by clicking the down arrow and then clicking ‘Rank Descending’.
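
For readers who prefer to check the Step 5 selection outside the Analyze tool, the same filter can be sketched in Python. This assumes a hypothetical CSV export that contains the ‘cluster’ and ‘merged_cdrs_1_2_charge’ columns; the sample rows and the function name `negative_charge_records` are invented for illustration and are not real tutorial data.

```python
import csv
import io

# Invented sample rows standing in for an exported table.
SAMPLE_CSV = """seq_id,cluster,merged_cdrs_1_2_charge
seq_a,1,-1.9
seq_b,1,0.1
seq_c,2,-0.2
seq_d,3,2.0
"""


def negative_charge_records(csv_text, charge_field="merged_cdrs_1_2_charge"):
    """Return rows with net CDR charge below 0, ranked in descending order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [row for row in reader if float(row[charge_field]) < 0]
    # Mirrors 'Rank Descending' on the charge column.
    return sorted(rows, key=lambda r: float(r[charge_field]), reverse=True)


selected = negative_charge_records(SAMPLE_CSV)
selected_clusters = sorted({row["cluster"] for row in selected})
```

Here `selected` holds only the records that would fall below the zero line in the plot, and `selected_clusters` lists the clusters they come from.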

STEP 6 - Subset the Output Fields

  1. In the search bar, search for ‘Subset the Number of Fields for Export’ and click ‘View all Workfloe options’

  2. Keep the following fields (for more details, see key fields reference):

    A) seq_id
    B) match_name_1
    C) match_name_2
    D) cdr3_aa_2
    E) sequence_aa_1
    F) sequence_aa_2
    G) cluster
  3. Click ‘Start Job’
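
The effect of this subsetting step on a single record can be sketched as follows. This is a minimal illustration, not the Floe's code: the helper `subset_fields` and the example record (including its extra `merged_cdrs_1_2_charge` entry and placeholder values) are hypothetical, while the seven field names are the ones listed above.

```python
# The seven fields kept in Step 6, in the order listed above.
KEEP_FIELDS = [
    "seq_id",
    "match_name_1",
    "match_name_2",
    "cdr3_aa_2",
    "sequence_aa_1",
    "sequence_aa_2",
    "cluster",
]


def subset_fields(record, keep=KEEP_FIELDS):
    """Return a copy of the record containing only the kept fields."""
    missing = [name for name in keep if name not in record]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    return {name: record[name] for name in keep}


# Placeholder record; the values are not real sequences.
example = {name: f"<{name}>" for name in KEEP_FIELDS}
example["merged_cdrs_1_2_charge"] = -1.9  # extra field that gets dropped
trimmed = subset_fields(example)
```

Raising an error on a missing field, rather than silently skipping it, makes a misspelled field name visible before anything is exported.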

STEP 7 - Download the CSV

  1. Click ‘Floe’ > ‘Jobs’ and identify the job by name (e.g., ‘Subset the Number of Fields for Export’).

  2. Select the checkbox titled ‘Show non-dataset files’ and download the file.
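
Once the file is downloaded, a quick sanity check is to count how many sequences each cluster contributed. The sketch below assumes the CSV has a header row with a ‘cluster’ column, as selected in Step 6; the function name `sequences_per_cluster` and the in-memory demo data are hypothetical, and a real run would pass an opened file in place of `demo_file`.

```python
import csv
import io
from collections import Counter


def sequences_per_cluster(csv_file):
    """Count rows per cluster in an open CSV file with a 'cluster' column."""
    reader = csv.DictReader(csv_file)
    return Counter(row["cluster"] for row in reader)


# In-memory stand-in for the downloaded file (invented rows).
demo_file = io.StringIO("seq_id,cluster\nseq_a,1\nseq_b,1\nseq_c,2\n")
counts = sequences_per_cluster(demo_file)
```

Cluster values are read as strings, so convert them with `int` first if numeric sorting is needed.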
