MSA Align and Search
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Product-based/SiteHopper
Role-based/Computational Chemist
Role-based/Structural Biologist
Task-based/Target Prep & Analysis/Protein Similarity Search
Solution-based/Small Molecule Lead-opt/3D Similarity
Description
This floe runs a distributed multiple sequence alignment (MSA) search using MMseqs2 and can take one or many sequences as input. For each input sequence, an MSA .a3m file is generated and saved on Orion.
Potential Input Sources: MSA Collection Setup from FASTA
Computation Scaling: This floe is optimized for search. The size or number of MSA FASTA collections will not significantly impact run time.
Promoted Parameters
Title in user interface (promoted name)
Inputs
System Name (system_name): Name to be used to identify the input sequence. Examples include PDB codes or UniRef sequence IDs. If multiple systems are detected, this value will be used to prefix those systems.
Required
Type: string
Default: MSA_Sequences
Input Sequence (query_sequence): Sequence title and primary sequence for structure prediction, separated by a colon. Multiple sequences can be added with the ‘Add more’ input option. The sequence title should be a unique identifier, because other parts of the job form use it as a reference. Example: ‘2MG4_1:MEKRPRTEFSEEQ’
Type: string
Input Sequence FASTA File (query_fasta_file): Input FASTA file containing the system for folding. Multiple sequences in this file indicate a multimeric prediction. The sequence title is the unique identifier for each input sequence and is read from the FASTA file: it is the text between the FASTA title marker (‘>’) and the first pipe delimiter (‘|’). For example, the FASTA title ‘>2MG4_1|Drosophila melanogaster’ is automatically assigned the sequence title 2MG4_1. Any field on this floe job form that requires a sequence title will match the 2MG4_1 value.
Type: file_in
Input MSA Collection (msa_shards): Collection with FASTA files used for MSA search.
Required
Type: collection_source
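The sequence-title rules described for query_sequence and query_fasta_file can be illustrated with a minimal Python sketch. The function names below are hypothetical and are not part of the floe; the floe's actual parsing code is not shown in this documentation, so treat this only as an illustration of the stated rules (title before the first colon, or between ‘>’ and the first ‘|’).

```python
from typing import Tuple


def parse_inline_query(entry: str) -> Tuple[str, str]:
    """Split a 'title:sequence' entry at the first colon (hypothetical helper)."""
    title, sep, sequence = entry.partition(":")
    title, sequence = title.strip(), sequence.strip()
    if not sep or not title or not sequence:
        raise ValueError(f"expected 'title:sequence', got {entry!r}")
    return title, sequence


def title_from_fasta_header(header: str) -> str:
    """Return the text between '>' and the first '|' of a FASTA title line."""
    if not header.startswith(">"):
        raise ValueError(f"not a FASTA title line: {header!r}")
    # If the line has no pipe, the whole title is used (an assumption).
    return header[1:].split("|", 1)[0].strip()


inline_title, inline_seq = parse_inline_query("2MG4_1:MEKRPRTEFSEEQ")
fasta_title = title_from_fasta_header(">2MG4_1|Drosophila melanogaster")
```

With the examples from this page, both inputs resolve to the same sequence title, 2MG4_1, which is the value other fields on the job form would match.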
Outputs
MSA Search Failures (fout): Output dataset to which MSA search failures are written.
Required
Type: dataset_out
Default: MSA_failures
Output MSA Name (file_out): Name for the MSA file that will be exported.
Required
Type: string
Default: MSA_Result_File
MSA Search Options
Max Sequences Cutoff (seq_num_cutoff): Maximum number of results saved in an MSA search result per query sequence. Increasing this value increases sensitivity. A value of 0 saves all sequences from the prefilter steps.
Required
Type: integer
Default: 500
MSA Search Sensitivity (msa_search_sensitivity): Sensitivity for the MMseqs2 sequence search. The default value is the MMseqs2 default.
Required
Type: decimal
Default: 5.7
Sequence Identity Cutoff (seq_id_cutoff): Reject any search hit with a sequence identity score lower than this value. Setting the value to 0 accepts all search hits. Valid range: [0, 0.93].
Type: decimal
Default: 0.0
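Before submitting a job, the search options above can be sanity-checked against the documented defaults and range. The following is a minimal sketch: check_search_options is a hypothetical helper, not an Orion or MMseqs2 API, and the non-negative check on seq_num_cutoff is an assumption (the page only states that 0 saves all sequences).

```python
def check_search_options(seq_num_cutoff=500,
                         msa_search_sensitivity=5.7,
                         seq_id_cutoff=0.0):
    """Check values against the documented rules and return them by promoted name."""
    # seq_num_cutoff is documented as an integer; bool is excluded explicitly.
    if isinstance(seq_num_cutoff, bool) or not isinstance(seq_num_cutoff, int):
        raise TypeError("seq_num_cutoff must be an integer")
    # Assumption: negative values are not meaningful; 0 means "save all".
    if seq_num_cutoff < 0:
        raise ValueError("seq_num_cutoff must be 0 or greater")
    # Documented valid range for the identity cutoff is [0, 0.93].
    if not 0.0 <= seq_id_cutoff <= 0.93:
        raise ValueError("seq_id_cutoff must be within [0, 0.93]")
    return {
        "seq_num_cutoff": seq_num_cutoff,
        "msa_search_sensitivity": float(msa_search_sensitivity),
        "seq_id_cutoff": float(seq_id_cutoff),
    }


default_options = check_search_options()
```

No range is checked for msa_search_sensitivity because this page does not state one.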