Select X Number of Non-Redundant Sequences by Cluster or ROI
Takes in an AbXtract dataset and writes a dataset containing a user-defined number of leads per cluster or ROI, rank-ordered by the user's chosen ranking metrics.
Main Parameters
Parameter Name |
---|
Lead Selection Report Name |
Eliminate Sequences That Fall Below Thresholds |
Maximum Number of CDR1_1 Liabilities |
Maximum Number of CDR1_2 Liabilities |
Maximum Number of CDR2_1 Liabilities |
Maximum Number of CDR2_2 Liabilities |
Maximum Number of CDR3_1 Liabilities |
Maximum Number of CDR3_2 Liabilities |
Maximum Number of Chain_1 Liabilities Across All CDRs |
Maximum Number of Chain_2 Liabilities Across All CDRs |
Maximum Number of Full-Length HCDR1-3 and LCDR1-3 Liabilities (PacBio Only). |
Maximum Number Sequences Per Cluster |
Minimum Count Required for Full-Length Sequence |
Minimum Percent Required for Full-Length Sequence |
Minimum Log2 Enrichment |
Minimum Count Required for Indicated Region of Interest (ROI) |
Minimum Percent Required for Indicated Region of Interest (ROI) |
Maximum Number of Full-Length Sequences |
NGS Picking Strategy |
Metrics for Ranking |
Attempt to Fill the Desired Number of Full-Length Sequences Quota |
Rank Sanger Clones First in Population |
Write Report Summarizing Ranking Stats of Top Clones |
Write the Automated Top Clones Output to CSV File |
Parameter Details
Calculation Parameters
- Lead Selection Report Name (data_out) type: dataset_out
- Eliminate Sequences That Fall Below Thresholds (eliminate_thresholds) type: boolean: If turned ON, removes sequences that fall below a minimum threshold or above a maximum threshold. NOTE: must be turned ON for any threshold to remove sequences. Thresholds that are too strict may leave fewer sequences than the quota. Default: False
- Maximum Number of CDR1_1 Liabilities (max_cdr1_1_liabilities) type: integer: Sets the maximum number of liabilities allowed in downstream CDR1_1. IMPORTANT: ‘Eliminate Sequences That Fall Below Thresholds’ must also be turned ON to remove sequences from the population. Default: 10, Max: 100
- Maximum Number of CDR1_2 Liabilities (max_cdr1_2_liabilities) type: integer: Sets the maximum number of liabilities allowed in downstream CDR1_2; ignored if using Illumina. IMPORTANT: ‘Eliminate Sequences That Fall Below Thresholds’ must also be turned ON to remove sequences from the population. Default: 10, Max: 100
- Maximum Number of CDR2_1 Liabilities (max_cdr2_1_liabilities) type: integer: Sets the maximum number of liabilities allowed in downstream CDR2_1. IMPORTANT: ‘Eliminate Sequences That Fall Below Thresholds’ must also be turned ON to remove sequences from the population. Default: 10, Max: 100
- Maximum Number of CDR2_2 Liabilities (max_cdr2_2_liabilities) type: integer: Sets the maximum number of liabilities allowed in downstream CDR2_2; ignored if using Illumina. IMPORTANT: ‘Eliminate Sequences That Fall Below Thresholds’ must also be turned ON to remove sequences from the population. Default: 10, Max: 100
- Maximum Number of CDR3_1 Liabilities (max_cdr3_1_liabilities) type: integer: Sets the maximum number of liabilities allowed in downstream CDR3_1. IMPORTANT: ‘Eliminate Sequences That Fall Below Thresholds’ must also be turned ON to remove sequences from the population. Default: 10, Max: 100
- Maximum Number of CDR3_2 Liabilities (max_cdr3_2_liabilities) type: integer: Sets the maximum number of liabilities allowed in downstream CDR3_2; ignored if using Illumina. IMPORTANT: ‘Eliminate Sequences That Fall Below Thresholds’ must also be turned ON to remove sequences from the population. Default: 10, Max: 100
- Maximum Number of Chain_1 Liabilities Across All CDRs (max_chain_1_liabilities) type: integer: Sets the maximum number of liabilities allowed across all downstream CDRs in Chain_1. IMPORTANT: ‘Eliminate Sequences That Fall Below Thresholds’ must also be turned ON to remove sequences from the population. Default: 10, Max: 100
- Maximum Number of Chain_2 Liabilities Across All CDRs (max_chain_2_liabilities) type: integer: Sets the maximum number of liabilities allowed across all downstream CDRs in Chain_2; ignored if using Illumina. IMPORTANT: ‘Eliminate Sequences That Fall Below Thresholds’ must also be turned ON to remove sequences from the population. Default: 10, Max: 100
- Maximum Number of Full-Length HCDR1-3 and LCDR1-3 Liabilities (PacBio Only). (max_fl_liabilities) type: integer: Sets the maximum number of liabilities allowed across HCDR1-3 and LCDR1-3; ignored if using Illumina. IMPORTANT: ‘Eliminate Sequences That Fall Below Thresholds’ must also be turned ON to remove sequences from the population. Default: 10, Max: 100
- Maximum Number Sequences Per Cluster (max_seq_per_cluster) type: integer: The maximum number of unique full-length sequences per cluster. This value may be exceeded if ‘Attempt to Fill the Desired Number of Full-Length Sequences Quota’ is turned ON. Default: 10, Min: 1, Max: 10000
- Minimum Count Required for Full-Length Sequence (minimum_fl_count) type: integer: Sets the minimum count allowed for the full-length sequence (Illumina: VH or VL; PacBio: VL+VH). IMPORTANT: ‘Eliminate Sequences That Fall Below Thresholds’ must also be turned ON to remove sequences from the population. Default: 1, Min: 1
- Minimum Percent Required for Full-Length Sequence (minimum_fl_percent) type: decimal: Sets the minimum percent allowed for the full-length sequence (Illumina: VH or VL; PacBio: VL+VH). IMPORTANT: ‘Eliminate Sequences That Fall Below Thresholds’ must also be turned ON to remove sequences from the population. Default: 0.0001, Min: 1e-18, Max: 100.0
- Minimum Log2 Enrichment (minimum_log2_enrichment) type: decimal: Sets the minimum allowed log2 enrichment value. Use only if log2 enrichment was computed by comparing distinct ‘early’ and ‘late’ round populations. IMPORTANT: ‘Eliminate Sequences That Fall Below Thresholds’ must also be turned ON to remove sequences from the population. Default: -10, Min: -1000000, Max: 1000000
- Minimum Count Required for Indicated Region of Interest (ROI) (minimum_roi_count) type: integer: Sets the minimum count allowed for the region of interest (e.g., HCDR3+LCDR3), as defined when the dataset was processed. IMPORTANT: ‘Eliminate Sequences That Fall Below Thresholds’ must also be turned ON to remove sequences from the population. Default: 2, Min: 1
- Minimum Percent Required for Indicated Region of Interest (ROI) (minimum_roi_percent) type: decimal: Sets the minimum percent allowed for the region of interest (e.g., HCDR3+LCDR3), as defined when the dataset was processed. IMPORTANT: ‘Eliminate Sequences That Fall Below Thresholds’ must also be turned ON to remove sequences from the population. Default: 0.001, Min: 1e-10, Max: 100.0
- Maximum Number of Clusters Preferred? (number_of_clusters_to_select_from) type: integer: The maximum number of clusters to choose from. This value may be exceeded if ‘Attempt to Fill the Desired Number of Full-Length Sequences Quota’ is turned ON. Default: 40, Min: 1, Max: 1000000
- Maximum Number of Full-Length Sequences (number_of_sequences_total) type: integer: The maximum number of full-length, non-redundant sequences. The number actually selected depends on the total number of non-redundant sequences, the maximum number of clusters, and the maximum number of sequences per cluster. If the total is below the desired number, try raising the maximum number of clusters or the maximum number of sequences per cluster. Alternatively, to fill the quota with additional sequences per cluster, turn ON ‘Attempt to Fill the Desired Number of Full-Length Sequences Quota’. Default: 100, Min: 1, Max: 1000000
- NGS Picking Strategy (picking_strategy) type: string: The picking strategy to use. Default: Uniform. Choices: Uniform, Most Abundant
- Metrics for Ranking (predict_choices) type: string: Metrics to rank by, listed in order of priority (if none are given, sequences are ranked by full-length count). Default: [‘ROI Percent, Final Round Only’, ‘Full Length (Corrects for Illumina or PacBio), Percent’, ‘Liabilities Both Chains’, ‘Liabilities CDR3_2’]. Choices: ‘Full Length (Corrects for Illumina or PacBio), Count’, ‘Full Length (Corrects for Illumina or PacBio), Percent’, ‘ROI Count, Final Round Only’, ‘ROI Percent, Final Round Only’, ‘ROI Fold Enrichment, Final Round Only’, ‘ROI Log2 Enrichment, Final Round Only’, ‘Liabilities Both Chains’, ‘Liabilities Chain_2’, ‘Liabilities Chain_1’, ‘Liabilities CDR1_1’, ‘Liabilities CDR2_1’, ‘Liabilities CDR3_1’, ‘Liabilities CDR1_2’, ‘Liabilities CDR2_2’, ‘Liabilities CDR3_2’, ‘ROI Count, Early Round Only’, ‘ROI Percent, Early Round Only’, ‘Cluster Count (e.g. unique sequences per cluster)’, ‘Cluster Percent (e.g. unique rep per cluster)’
- Attempt to Fill the Desired Number of Full-Length Sequences Quota (quota_attempt) type: boolean: If the desired number of full-length sequences is not reached, attempts to fill the quota by (1) selecting additional, lower-ranked clones from the designated clusters, then (2) selecting the remaining top-ranked clones from other clusters, prioritizing the top clones. NOTE: if turned ON, this is likely to produce more sequences per cluster than designated and, if the quota is still not met, more clusters than designated in order to reach the desired number of full-length sequences. Default: True
- Rank Sanger Clones First in Population (rank_sanger) type: boolean: Sanger clones, if present, are ranked first. Default: True
- Write Report Summarizing Ranking Stats of Top Clones (write_report) type: boolean: Optionally writes a report with stats for every selected clone on the selected ranking criteria, relative to its cluster and to the population as a whole. Default: True
- Write the Automated Top Clones Output to CSV File (write_to_csv_file) type: boolean: Optionally writes the output to a CSV file after the ‘AbXtract Processing, NGS Only’ file, at the cost of additional time. If turned off, an empty file is written and the CSV can be produced in a separate step. Default: True
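The way these parameters interact (optional threshold elimination, ordered ranking metrics, the per-cluster and cluster-count caps, and the two-step quota fill) can be sketched in Python. This is a minimal, hypothetical sketch, not the cube's implementation: the `Clone` record and its fields, the fixed three-metric ranking, and the round-robin pick across clusters are all assumptions, and the round-robin is not necessarily what the ‘Uniform’ picking strategy does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    """Hypothetical per-sequence record; field names are illustrative only."""
    name: str
    cluster: int
    fl_count: int        # full-length count
    roi_percent: float   # ROI percent, final round only
    liabilities: int     # liabilities across both chains


def passes_thresholds(c, min_fl_count=1, max_liabilities=10):
    """Keep a clone only if it meets every minimum and stays under every maximum."""
    return c.fl_count >= min_fl_count and c.liabilities <= max_liabilities


def rank_key(c):
    """Assumed metric priority: ROI percent (desc), count (desc), liabilities (asc)."""
    return (-c.roi_percent, -c.fl_count, c.liabilities)


def round_robin(groups, cap, limit):
    """In round r, take the r-th ranked clone of every group, at most `cap` per group."""
    picked, r = [], 0
    while len(picked) < limit:
        took_any = False
        for group in groups:
            if r < cap and r < len(group) and len(picked) < limit:
                picked.append(group[r])
                took_any = True
        if not took_any:
            break
        r += 1
    return picked


def select_leads(clones, max_seq_per_cluster=10, max_clusters=40, total=100,
                 eliminate=False, quota_attempt=True, **thresholds):
    # Thresholds only remove clones when elimination is turned on.
    pool = [c for c in clones if not eliminate or passes_thresholds(c, **thresholds)]
    pool.sort(key=rank_key)

    # Group by cluster; dict insertion order ranks clusters by their best clone.
    groups = {}
    for c in pool:
        groups.setdefault(c.cluster, []).append(c)
    chosen = list(groups)[:max_clusters]
    chosen_groups = [groups[k] for k in chosen]

    picked = round_robin(chosen_groups, max_seq_per_cluster, total)
    if quota_attempt and len(picked) < total:
        # Step 1: lower-ranked clones from the chosen clusters (per-cluster cap lifted).
        picked = round_robin(chosen_groups, float("inf"), total)
        # Step 2: top-ranked clones from clusters outside the chosen set.
        chosen_set = set(chosen)
        for c in pool:
            if len(picked) >= total:
                break
            if c.cluster not in chosen_set:
                picked.append(c)
    return picked


clones = [
    Clone("A1", 1, 50, 5.0, 0), Clone("A2", 1, 40, 4.0, 1), Clone("A3", 1, 30, 3.0, 2),
    Clone("B1", 2, 45, 4.5, 0), Clone("B2", 2, 10, 1.0, 12), Clone("C1", 3, 20, 2.0, 0),
]
opts = dict(max_seq_per_cluster=1, max_clusters=2, total=5, eliminate=True, max_liabilities=10)
print([c.name for c in select_leads(clones, quota_attempt=False, **opts)])  # ['A1', 'B1']
print([c.name for c in select_leads(clones, quota_attempt=True, **opts)])   # ['A1', 'B1', 'A2', 'A3', 'C1']
```

In this toy run, B2 is eliminated for exceeding the liability threshold. Without the quota fill, the caps stop selection at two clones; with it, lower-ranked clones from the two chosen clusters are added first, and only then is C1 taken from an unchosen cluster, which is why both caps can be exceeded.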
Hardware Parameters
- Machine hardware requirements
- Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800, Min: 256.0, Max: 8589934592
- Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory a container is allowed to address. Default: 64
- Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0, Min: 128.0, Max: 8589934592
- GPUs (gpu_count) type: integer: The number of GPUs to run this cube with. Default: 0, Max: 16
- CPUs (cpu_count) type: integer: The number of CPUs to run this cube with. Default: 1, Min: 1, Max: 128
- Instance Type (instance_type) type: string: The type of instance this cube needs to run on.
- Spot policy (spot_policy) type: string: Controls cube placement on spot market instances. Default: Prohibited. Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
- Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma-separated). Default: “”
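Memory and disk requests are given in MiB (1 MiB = 1048576 B), with a recommendation to request a couple hundred MiB more than required. A small helper for turning a measured byte requirement into a request value might look like the sketch below; the helper name `request_mib` and the 200 MiB headroom are assumptions, while the minimums mirror the stated Min values (256.0 for memory_mb, 128.0 for disk_space).

```python
import math

MIB = 1048576  # bytes per MiB, as stated for memory_mb and disk_space


def request_mib(required_bytes, headroom_mib=200, minimum_mib=256.0):
    """Round a byte requirement up to whole MiB, add headroom, and apply the minimum.

    headroom_mib=200 is an assumed reading of "a couple hundred MiB".
    """
    needed = math.ceil(required_bytes / MIB) + headroom_mib
    return float(max(needed, minimum_mib))


print(request_mib(1.5 * 1024**3))                   # memory_mb: 1736.0
print(request_mib(10 * MIB))                        # clamped to the minimum: 256.0
print(request_mib(4 * 1024**3, minimum_mib=128.0))  # disk_space: 4296.0
```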
Metrics Parameters
- Cube Metric Parameters
- Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds. Default: 60. Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
- Cube Metrics (cube_metrics) type: string: Set of metrics to be collected. Choices: cpu, disk, memory, network