How-To Guides for MSA Search Floes

What is the best input MSA collection to use?

The choice of MSA collection can affect the accuracy of downstream applications. The collections are built from publicly available sequence datasets. The specifics of each are not covered here, but the databases differ mainly in how their sequences were sourced and curated. In general, a larger collection improves prediction accuracy but comes at a higher computational cost.

The AI structure prediction floes do not require an MSA search to make a prediction, though running one is recommended. UniRef90 balances sequence diversity against search cost, whereas BFD provides greater evolutionary diversity, which can matter for some downstream applications. For example, the Protein Sequence to AI Folded Structure Ligand Affinities Floe benefits from a more diverse MSA search, which helps it rank ligand affinities more accurately.
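
The guidance above can be summarized as a simple decision rule. The following is a minimal sketch only: the function name, its parameters, and the string labels are hypothetical and are not part of any Orion API, and the actual collection names in your Orion environment may differ.

```python
from typing import Optional

# Hypothetical labels for illustration; real collection names in Orion may differ.
UNIREF90 = "UniRef90"
BFD = "BFD"


def suggest_msa_collection(
    needs_high_diversity: bool,
    run_msa_search: bool = True,
) -> Optional[str]:
    """Return a suggested collection label, or None if the MSA search is skipped.

    This only encodes the rule of thumb from the text; it does not call Orion.
    """
    if not run_msa_search:
        # An MSA search is recommended but not required for a prediction.
        return None
    # Prefer the more evolutionarily diverse collection when the downstream
    # task benefits from it (for example, ranking ligand affinities);
    # otherwise use the collection that balances diversity and search cost.
    return BFD if needs_high_diversity else UNIREF90
```

Under these assumptions, a ligand affinity ranking workflow would call `suggest_msa_collection(needs_high_diversity=True)`, while a routine structure prediction would pass `False`.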

Does Orion MSA search use any public or external MSA servers?

No. MSA search and Boltz predictions run entirely within Orion. Input sequences do not leave Orion, and the databases used for searches are stored inside Orion as collections. MSA generation is implemented with MMseqs2 function calls optimized for Orion’s distributed architecture.