How-To Guides for MSA Search Floes
What is the best input MSA collection to use?
Choosing the right MSA collection can affect the accuracy of downstream applications. The collections are built from publicly available sequence datasets. Their details are not covered here, but they differ mainly in how the sequences were sourced and how they were curated. In general, a larger collection tends to improve prediction accuracy, but at a higher computational cost.
The AI structure prediction floes can make a prediction without an MSA search, although running one is recommended. UniRef90 offers a good balance between sequence diversity and search cost, whereas BFD offers greater evolutionary diversity, which can matter for some applications. For example, the Protein Sequence to AI Folded Structure Ligand Affinities Floe benefits from a more diverse MSA, which helps it rank ligand affinities more accurately.
Does Orion MSA search use any public or external MSA servers?
No. MSA search and Boltz predictions run entirely within Orion. Input sequences never leave Orion, and the sequence databases used for searches are stored inside Orion as collections. MSA generation is implemented with MMseqs2 function calls optimized for Orion’s distributed architecture.
How can I use a custom sequence database for a search?
Orion currently provides a handful of sequence databases that have been converted into collections for use in the MSA Align and Search Floe. This conversion is necessary to optimize searches for a cloud environment and to improve search speed. To search against your own sequence database, convert it into a collection with the MSA Collection Setup from FASTA Floe:
1. If the custom database is not already in FASTA format, convert it (for an MMseqs2 database, e.g., mmseqs convert2fasta <input_db> <output_fasta>).
2. Upload the FASTA file to Orion using orionclient: ocli files upload <file_name>.
3. Run the MSA Collection Setup from FASTA Floe, following the prompts to select the uploaded file and give the new collection a name.
4. Once the collection setup floe has completed, use its output collection as the Input MSA Collection parameter of the MSA Align and Search Floe.
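The conversion and upload steps above can be scripted. The Python sketch below is a minimal illustration, not an Orion tool. It assumes the `mmseqs` and `ocli` executables are installed and on PATH and that ocli is already configured for your Orion account. The helper names (`count_fasta_records`, `build_commands`, `prepare_custom_database`) and the file names are hypothetical. It uses only the two commands listed above, adds a quick check that the converted file looks like FASTA before uploading, and defaults to a dry run that only prints the commands. If your database is already in FASTA format, skip the conversion command. Creating the collection and selecting it as the Input MSA Collection still happen in the floes themselves.

```python
# Hypothetical helper for the convert and upload steps. Assumes `mmseqs` and
# `ocli` are installed, on PATH, and that ocli is configured for your Orion stack.
import shlex
import subprocess
from typing import Iterable


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count '>' header lines, failing fast on obviously malformed FASTA."""
    records = 0
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are tolerated
        if line.startswith(">"):
            records += 1
        elif records == 0:
            raise ValueError(f"line {line_no}: sequence data before the first header")
    if records == 0:
        raise ValueError("no FASTA records found")
    return records


def build_commands(mmseqs_db: str, fasta_out: str) -> list[list[str]]:
    """Return the convert and upload commands as argv lists."""
    return [
        ["mmseqs", "convert2fasta", mmseqs_db, fasta_out],  # MMseqs2 DB -> FASTA
        ["ocli", "files", "upload", fasta_out],  # upload the FASTA file to Orion
    ]


def prepare_custom_database(mmseqs_db: str, fasta_out: str, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    convert_cmd, upload_cmd = build_commands(mmseqs_db, fasta_out)

    print(shlex.join(convert_cmd))
    if not dry_run:
        subprocess.run(convert_cmd, check=True)
        # Sanity-check the converted file before spending time on the upload.
        with open(fasta_out) as handle:
            print(f"{count_fasta_records(handle)} records in {fasta_out}")

    print(shlex.join(upload_cmd))
    if not dry_run:
        subprocess.run(upload_cmd, check=True)


if __name__ == "__main__":
    # Dry run with placeholder names: prints the commands without executing them.
    prepare_custom_database("custom_db", "custom_db.fasta")
```

In a dry run, `prepare_custom_database("custom_db", "custom_db.fasta")` prints `mmseqs convert2fasta custom_db custom_db.fasta` followed by `ocli files upload custom_db.fasta`. Passing `dry_run=False` executes both commands, and `check=True` stops the script if either command exits with an error.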