Protein Sequence to AI Folded Structure Prediction
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Product-based/SPRUCE
Role-based/Computational Chemist
Role-based/Structural Biologist
Role-based/Bioinformatician
Solution-based/Target Identification
Solution-based/Target Identification/Target Preparation
Solution-based/Hit to Lead/Target Preparation
Task-based/Target Prep & Analysis/Protein Preparation
Description
This floe takes one or more protein sequences as input and predicts their structures using AI folding models. It currently supports the OmegaFold model.
OmegaFold is a third-party sequence-to-structure protein folding method that uses a Large Language Model (LLM) to predict protein structure without Multiple Sequence Alignments (MSA). This floe and its defaults follow the standard folding practices outlined by the OmegaFold developers.
Limitations: OmegaFold does not currently support predictions of complexes made of multiple chains, also known as multimers. If a multimeric sequence is identified in the input, that sequence is skipped.
Longer sequences can be computationally demanding. To fold a long sequence, common practice is to split it into segments that overlap by around 200 residues and run each segment as a separate sequence.
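The splitting practice above can be sketched as a small helper that cuts a sequence into windows overlapping by 200 residues and joins the pieces into ‘^’-delimited strings for the Inputs parameters. This is a minimal sketch, not part of the floe: `split_with_overlap`, the 600-residue window, and the `myprotein_partN` titles are hypothetical choices for illustration, since the documentation does not specify a segment length.

```python
from typing import List


def split_with_overlap(sequence: str, window: int, overlap: int = 200) -> List[str]:
    """Split a protein sequence into segments of at most `window` residues,
    where each segment shares `overlap` residues with the next one.
    (Hypothetical helper; the window size is an assumption, not a floe default.)"""
    if overlap < 0 or window <= overlap:
        raise ValueError("need 0 <= overlap < window")
    step = window - overlap
    segments = []
    start = 0
    while True:
        segments.append(sequence[start:start + window])
        # Stop once this segment reaches the end of the sequence.
        if start + window >= len(sequence):
            break
        start += step
    return segments


# Usage: a toy 1,000-residue sequence split with a hypothetical 600-residue window.
toy = "".join("ACDEFGHIKLMNPQRSTVWY"[i % 20] for i in range(1000))
parts = split_with_overlap(toy, window=600, overlap=200)
titles = [f"myprotein_part{i}" for i in range(1, len(parts) + 1)]
# '^'-delimited strings for the sequence and titles parameters.
sequence_param = "^".join(parts)
titles_param = "^".join(titles)
```

Each segment is then folded independently, and the overlapping regions can be used to compare or stitch the predicted pieces.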
You can read more background information about OmegaFold.
Related Floes: SPRUCE - Protein Preparation, DU to PDB
Computational Cost Scaling
For the best performance, batch all sequences into a single job rather than running many small jobs.
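One way to batch many sequences into a single job is to combine them into the ‘^’-delimited sequence and title strings described under Inputs. The sketch below assumes the sequences start out in a FASTA file. `parse_fasta` and `build_batch` are hypothetical helpers, not part of Orion, and the sequences are toy placeholders.

```python
from typing import Iterable, List, Optional, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (title, sequence) pairs; multi-line sequences are joined.
    Sequence lines that appear before the first '>' header are ignored."""
    records: List[Tuple[str, str]] = []
    title: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if title is not None:
                records.append((title, "".join(chunks)))
            title, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


def build_batch(records: Iterable[Tuple[str, str]]) -> Tuple[str, str]:
    """Join (title, sequence) pairs into one '^'-delimited sequence string and
    one '^'-delimited title string, kept in the same order so they match."""
    titles: List[str] = []
    sequences: List[str] = []
    for title, seq in records:
        # '^' separates runs, so it must not appear inside a title or sequence.
        if "^" in title or "^" in seq:
            raise ValueError(f"'^' is reserved as the run delimiter: {title!r}")
        titles.append(title)
        sequences.append(seq)
    return "^".join(sequences), "^".join(titles)


fasta = """>protA
MKTAYIAKQR
QISFVKSHFS
>protB
GSHMLEDPVD
"""
sequence_param, titles_param = build_batch(parse_fasta(fasta))
```

Because the titles are built in the same loop as the sequences, the two strings always have the same number of entries in the same order.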
Promoted Parameters
Title in user interface (promoted name)
Inputs
Sequence to run AI Folding (sequence): Sequence(s) for structure prediction. If a sequence contains multiple chains (a multimer), separate the chains with a ‘:’. To run multiple sequences independently, separate them with a ‘^’ and make sure each sequence matches its counterpart, in order, in the ‘Sequence Title’ parameter. OmegaFold does not support multimers; any sequence detected as a multimer is skipped.
Required
Type: string
Sequence Title (titles): Title(s) for the independent runs in the ‘Sequence to run AI Folding’ parameter. Separate titles with a ‘^’ and make sure each title matches its counterpart in the ‘Sequence to run AI Folding’ parameter. If a mismatch is detected, distinct titles are generated in the format ‘Sequence_1^Sequence_2^…’.
Type: string
Parameter File(s) on Orion (params): The model weight file(s) to use for AI folding. Selecting more weight files increases the disk space required by the cube. Selections are limited to the default set of parameter files. Note that OmegaFold_model2 is much more heavyweight, so you will likely need to increase the GPU memory requirement.
Type: string
Default: [‘OmegaFold_model1.pt’]
Choices: [‘OmegaFold_model1.pt’, ‘OmegaFold_model2.pt’]
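The input rules above can be checked before a job is launched. The sketch below mirrors the documented behavior. It splits runs on ‘^’ and treats any sequence containing ‘:’ as a multimer that will be skipped. If the titles do not line up with the sequences, it falls back to ‘Sequence_1’, ‘Sequence_2’, and so on. It is a hypothetical pre-flight check, not the floe's own parsing code. `plan_runs` is an illustrative name, and treating a count mismatch as the trigger for generated titles is an assumption.

```python
from typing import List, Tuple


def plan_runs(sequence_param: str, titles_param: str = "") -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (runs, skipped): runs are (title, sequence) pairs expected to be
    folded; skipped lists titles of multimeric sequences (contain ':').
    Hypothetical pre-flight check; mirrors the documented rules, not floe code."""
    sequences = sequence_param.split("^")
    titles = titles_param.split("^") if titles_param else []
    if len(titles) != len(sequences):
        # Assumed mismatch rule: generate Sequence_1, Sequence_2, ... instead.
        titles = [f"Sequence_{i}" for i in range(1, len(sequences) + 1)]
    runs: List[Tuple[str, str]] = []
    skipped: List[str] = []
    for title, seq in zip(titles, sequences):
        if ":" in seq:
            # Chains separated by ':' form a multimer, which OmegaFold skips.
            skipped.append(title)
        else:
            runs.append((title, seq))
    return runs, skipped


runs, skipped = plan_runs("MKTAYIAKQR^GSHMLE:DPVDK^QISFVKSHFS", "protA^complexB^protC")
```

Running a check like this locally makes it easy to spot skipped multimers or auto-generated titles before spending GPU time.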
Outputs
OmegaFold Results (out): Output dataset for successful structure predictions.
Required
Type: dataset_out
Default: OmegaFold_Predictions
OmegaFold Failures (fout): Output dataset for failed predictions.
Required
Type: dataset_out
Default: OmegaFold_Failures
Pocket Finding Parameters
Use OEPocket Finding (run_oepocket): Option to use OEPocket to generate design units.
Required
Type: boolean
Default: True
Choices: [True, False]
Use F-Pocket Finding (run_fpocket): Option to use fpocket to generate design units.
Required
Type: boolean
Default: True
Choices: [True, False]
Save Biological Units (output_bio_designunits): Option to save the structure as a biological unit when no pocket-finding method is used or no valid pockets are found.
Required
Type: boolean
Default: True
Choices: [True, False]
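The three pocket-finding options combine into a simple decision, sketched below. `expected_output` and its string labels are hypothetical and describe only which kind of result to expect. The `pockets_found` argument stands for the number of valid pockets found by the enabled methods. The sketch assumes that a biological unit is saved only in the fallback case described for ‘Save Biological Units’.

```python
def expected_output(run_oepocket: bool, run_fpocket: bool,
                    pockets_found: int, output_bio_designunits: bool) -> str:
    """Describe which kind of output to expect for one predicted structure.
    Hypothetical summary of the documented options, not floe code."""
    any_method = run_oepocket or run_fpocket
    if any_method and pockets_found > 0:
        # At least one enabled method found a valid pocket.
        return "pocket design units"
    if output_bio_designunits:
        # Fallback: no method enabled, or no valid pockets found.
        return "biological unit"
    return "no design unit"
```

With the defaults (both methods on, Save Biological Units on), a structure with no valid pockets still produces a biological unit rather than nothing.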