MSA Align and Search

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

Product-based/SiteHopper

Role-based/Computational Chemist

Role-based/Structural Biologist

Task-based/Target Prep & Analysis/Protein Similarity Search

Solution-based/Small Molecule Lead-opt/3D Similarity

Description

This floe runs a distributed multiple sequence alignment (MSA) search using MMseqs2 and accepts one or more input sequences. For each input sequence, it generates an MSA .a3m file and saves it on Orion.

Potential Input Sources: MSA Collection Setup from FASTA

Computation Scaling: This floe is optimized for search. The size or number of MSA FASTA collections will not significantly impact run time.

Promoted Parameters

Title in user interface (promoted name)

Inputs

System Name (system_name): Name to be used to identify the input sequence. Examples include PDB codes or UniRef sequence IDs. If multiple systems are detected, this value will be used to prefix those systems.

Required

Type: string

Default: MSA_Sequences

Input Sequence (query_sequence): Sequence title and primary sequence for structure prediction, separated by a colon. Add multiple sequences with the ‘Add more’ input option. The sequence title should be a unique identifier, because other parts of the job form use it to refer to the sequence. Example: ‘2MG4_1:MEKRPRTEFSEEQ’

Type: string
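The ‘title:sequence’ format described above can be checked before a job is submitted. The following is a minimal sketch, not part of the floe: the function names parse_query_sequence and collect_sequences are hypothetical, and splitting at the first colon is an assumption based on the example value.

```python
from __future__ import annotations


def parse_query_sequence(entry: str) -> tuple[str, str]:
    """Split a 'title:sequence' entry at the first colon (assumed convention)."""
    title, sep, sequence = entry.partition(":")
    title, sequence = title.strip(), sequence.strip()
    if not sep or not title or not sequence:
        raise ValueError(f"expected 'title:sequence', got {entry!r}")
    return title, sequence


def collect_sequences(entries: list[str]) -> dict[str, str]:
    """Parse several entries and reject duplicate sequence titles."""
    parsed: dict[str, str] = {}
    for entry in entries:
        title, sequence = parse_query_sequence(entry)
        if title in parsed:
            raise ValueError(f"duplicate sequence title: {title!r}")
        parsed[title] = sequence
    return parsed


if __name__ == "__main__":
    print(parse_query_sequence("2MG4_1:MEKRPRTEFSEEQ"))
```

Rejecting duplicate titles up front reflects the requirement that each title uniquely identify its sequence elsewhere on the job form.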

Input Sequence FASTA File (query_fasta_file): Input FASTA file containing the system for folding. Multiple sequences in this file indicate a multimeric prediction. Each sequence title is a unique identifier taken from the FASTA file: it is the text between the FASTA title marker (‘>’) and the first pipe delimiter (‘|’). For example, the FASTA title ‘>2MG4_1|Drosophila melanogaster’ is automatically assigned the sequence title 2MG4_1. Any field on this floe job form that requires a sequence title will match the 2MG4_1 value.

Type: file_in
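The title rule described above (text between ‘>’ and the first ‘|’) can be sketched as follows. This is an illustration only, not the floe's implementation: sequence_title and titles_from_fasta are hypothetical names, and returning the whole header when no pipe is present is an assumption, since the documentation does not cover that case.

```python
from __future__ import annotations


def sequence_title(header: str) -> str:
    """Return the text between '>' and the first '|' of a FASTA title line."""
    if not header.startswith(">"):
        raise ValueError(f"not a FASTA title line: {header!r}")
    # Assumption: with no '|' present, the full header text is the title.
    return header[1:].split("|", 1)[0].strip()


def titles_from_fasta(text: str) -> list[str]:
    """Collect the sequence title of every record in FASTA-formatted text."""
    return [sequence_title(line) for line in text.splitlines() if line.startswith(">")]


if __name__ == "__main__":
    example = ">2MG4_1|Drosophila melanogaster\nMEKRPRTEFSEEQ\n"
    print(titles_from_fasta(example))
```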

Input MSA Collection (msa_shards): Collection with FASTA files used for MSA search.

Required

Type: collection_source

Outputs

MSA Search Failures (fout): Output dataset to which failed MSA searches are written.

Required

Type: dataset_out

Default: MSA_failures

Output MSA Name (file_out): Name for the MSA file that will be exported.

Required

Type: string

Default: MSA_Result_File

MSA Search Options

Max Sequences Cutoff (seq_num_cutoff): Maximum number of results saved in an MSA search result per query sequence. Increasing this value increases sensitivity. A value of 0 saves all sequences from the prefilter steps.

Required

Type: integer

Default: 500
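The meaning of the cutoff, including the special value 0, can be illustrated with a short sketch. This is not the floe's code: limit_hits is a hypothetical helper, and it assumes the hits arrive already ranked from best to worst.

```python
from __future__ import annotations


def limit_hits(hits: list, seq_num_cutoff: int = 500) -> list:
    """Keep at most seq_num_cutoff hits; 0 keeps every hit."""
    if seq_num_cutoff < 0:
        raise ValueError("seq_num_cutoff must be 0 or a positive integer")
    if seq_num_cutoff == 0:
        return list(hits)
    # Assumption: hits are pre-sorted, so slicing keeps the best ones.
    return list(hits[:seq_num_cutoff])


if __name__ == "__main__":
    ranked = [f"hit_{i}" for i in range(10)]
    print(len(limit_hits(ranked, 3)), len(limit_hits(ranked, 0)))
```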

MSA Search Sensitivity (msa_search_sensitivity): Sensitivity for the MMseqs2 sequence search. The default is the MMseqs2 default.

Required

Type: decimal

Default: 5.7

Sequence Identity Cutoff (seq_id_cutoff): Reject any search hit with a sequence identity score lower than this value. Setting the value to 0 accepts all search hits. Allowed range: [0, 0.93].

Type: decimal

Default: 0.0
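The effect of the identity cutoff can be sketched as a simple filter. This is a hedged illustration, not the floe's implementation: filter_hits is a hypothetical helper, and hits are assumed to be (name, identity) pairs with identity expressed as a fraction between 0 and 1.

```python
from __future__ import annotations


def filter_hits(hits: list[tuple[str, float]], seq_id_cutoff: float = 0.0) -> list[tuple[str, float]]:
    """Drop hits whose sequence identity is lower than seq_id_cutoff."""
    # The documented range for this parameter is [0, 0.93].
    if not 0.0 <= seq_id_cutoff <= 0.93:
        raise ValueError("seq_id_cutoff must be within [0, 0.93]")
    return [(name, identity) for name, identity in hits if identity >= seq_id_cutoff]


if __name__ == "__main__":
    example = [("hit_a", 0.20), ("hit_b", 0.50), ("hit_c", 0.90)]
    print(filter_hits(example, 0.3))
```

With the default of 0.0, every hit passes, which matches the documented behavior of accepting all search hits.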