Protein Sequence to AI Folded Structure Prediction

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

Product-based/SPRUCE

Role-based/Computational Chemist

Role-based/Structural Biologist

Role-based/Bioinformatician

Solution-based/Target Identification

Solution-based/Target Identification/Target Preparation

Solution-based/Hit to Lead/Target Preparation

Task-based/Target Prep & Analysis/Protein Preparation

Description

This floe predicts protein structures from one or more protein sequences using AI folding models. It currently supports the OmegaFold model.

OmegaFold is a third-party sequence-to-structure protein folding method that uses a Large Language Model (LLM) to predict protein structure without Multiple Sequence Alignments (MSAs). This floe and its defaults follow the standard folding practices outlined by the OmegaFold developers.

Limitations: OmegaFold does not currently support prediction of multimers (complexes of multiple chains). If a multimeric sequence is identified in the input, that sequence is skipped.

Longer sequences can be computationally demanding. To fold a long sequence, a common practice is to split it into fragments that overlap by around 200 residues and run each fragment separately.
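The splitting strategy can be sketched in Python. This is a minimal illustration, not part of the floe: the function name split_with_overlap and the window (fragment length) parameter are hypothetical choices; only the roughly 200-residue overlap comes from the guidance in this section.

```python
from typing import List


def split_with_overlap(sequence: str, window: int, overlap: int = 200) -> List[str]:
    """Split a sequence into fragments of at most `window` residues.

    Consecutive fragments share `overlap` residues; the last fragment may be
    shorter. `window` is a hypothetical, user-chosen fragment length.
    """
    if window <= overlap:
        raise ValueError("window must be larger than overlap")
    step = window - overlap
    fragments = []
    start = 0
    while True:
        end = start + window
        fragments.append(sequence[start:end])
        if end >= len(sequence):
            break  # this fragment already reaches the C-terminus
        start += step
    return fragments


fragments = split_with_overlap("M" * 1000, window=500)
print([len(f) for f in fragments])  # [500, 500, 400]
```

Each fragment can then be submitted as an independent run, for example alongside other sequences in a single batched job.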

You can read more background information about OmegaFold.

Related Floes: SPRUCE - Protein Preparation, DU to PDB

Computational Cost Scaling

For the best performance, batch all sequences into a single run rather than running many small jobs.

Promoted Parameters

Title in user interface (promoted name)

Inputs

Sequence to run AI Folding (sequence): Sequence(s) on which to run structure prediction. To perform independent runs on multiple sequences, separate the sequences with a ‘^’ and make sure each sequence matches its counterpart in the ‘Sequence Title’ parameter. The chains of a multimer are delimited with ‘:’, but because OmegaFold does not support multimers, any sequence detected as a multimer is skipped.
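The delimiter conventions for this parameter can be illustrated with a small Python sketch. The helper names build_sequence_param and split_runs are hypothetical, and the sketch assumes the input is split on ‘^’ and ‘:’ exactly as described above; it is not the floe's actual parser.

```python
from typing import List, Tuple


def build_sequence_param(runs: List[List[str]]) -> str:
    """Encode runs for the 'sequence' parameter (hypothetical helper).

    Each inner list holds the chains of one run: chains are joined with ':'
    and independent runs are joined with '^'.
    """
    return "^".join(":".join(chains) for chains in runs)


def split_runs(value: str) -> Tuple[List[str], List[str]]:
    """Return (monomer runs to fold, multimer runs that would be skipped)."""
    monomers: List[str] = []
    multimers: List[str] = []
    for run in value.split("^"):
        run = run.strip()
        if not run:
            continue  # assumption: empty entries, e.g. from a trailing '^', are ignored
        if ":" in run:
            multimers.append(run)  # multimer: OmegaFold would skip it
        else:
            monomers.append(run)
    return monomers, multimers


value = build_sequence_param([["MKTAYIAK"], ["GSHMLE", "PQRSTV"], ["MVLSEG"]])
print(value)              # MKTAYIAK^GSHMLE:PQRSTV^MVLSEG
print(split_runs(value))  # (['MKTAYIAK', 'MVLSEG'], ['GSHMLE:PQRSTV'])
```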

Required

Type: string

Sequence Title (titles): Title(s) for the independent runs in the ‘Sequence to run AI Folding’ parameter. Separate titles with a ‘^’ and make sure each title matches its counterpart in the ‘Sequence to run AI Folding’ parameter. If the titles do not match the sequences, distinct titles are generated in the format ‘Sequence_1^Sequence_2^…’.
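The title-matching rule can be sketched as follows. This is a hypothetical helper, not the floe's code, and it rests on an assumption: a "mismatch" is taken to mean that the number of titles differs from the number of sequences, in which case every title is replaced by a generated Sequence_N name.

```python
from typing import List, Tuple


def pair_titles(sequences: str, titles: str) -> List[Tuple[str, str]]:
    """Pair each '^'-separated sequence with its '^'-separated title.

    Assumption: on a count mismatch, all titles are replaced by generated
    names in the Sequence_1, Sequence_2, ... format.
    """
    seqs = [s.strip() for s in sequences.split("^")]
    names = [t.strip() for t in titles.split("^")] if titles else []
    if len(names) != len(seqs):
        # Fall back to generated titles in the documented Sequence_N format.
        names = [f"Sequence_{i}" for i in range(1, len(seqs) + 1)]
    return list(zip(names, seqs))


print(pair_titles("MKTAYIAK^MVLSEG", "lysozyme_frag^globin_frag"))
# [('lysozyme_frag', 'MKTAYIAK'), ('globin_frag', 'MVLSEG')]
print(pair_titles("MKTAYIAK^MVLSEG", "lysozyme_frag"))
# [('Sequence_1', 'MKTAYIAK'), ('Sequence_2', 'MVLSEG')]
```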

Type: string

Parameter File(s) on Orion (params): Select the parameter weights to use for AI folding. Selecting multiple parameter weight files increases the disk space required by the cube. Selections are limited to the default set of parameter files. Note that OmegaFold_model2 is much heavier, so you will likely need to increase the GPU memory requirement.

Type: string

Default: [‘OmegaFold_model1.pt’]

Choices: [‘OmegaFold_model1.pt’, ‘OmegaFold_model2.pt’]

Outputs

OmegaFold Results (out): Output dataset to which the predicted structures are written.

Required

Type: dataset_out

Default: OmegaFold_Predictions

OmegaFold Failures (fout): Output dataset to which failed records are written.

Required

Type: dataset_out

Default: OmegaFold_Failures

Pocket Finding Parameters

Use OEPocket Finding (run_oepocket): Option to use OEPocket to generate design units.

Required

Type: boolean

Default: True

Choices: [True, False]

Use F-Pocket Finding (run_fpocket): Option to use f-pocket to generate design units.

Required

Type: boolean

Default: True

Choices: [True, False]

Save Biological Units (output_bio_designunits): Option to save the structure as a biological unit if no pocket finding method is used or no valid pockets are found.

Required

Type: boolean

Default: True

Choices: [True, False]
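The interaction of the three pocket finding options can be modeled as a small decision function. This is a hypothetical sketch based only on the parameter descriptions in this section: the function name plan_design_units, the pocket lists, and the string labels are illustrative placeholders, not Orion, OEPocket, or fpocket APIs.

```python
from typing import List


def plan_design_units(
    run_oepocket: bool,
    run_fpocket: bool,
    oepocket_pockets: List[str],
    fpocket_pockets: List[str],
    output_bio_designunits: bool,
) -> List[str]:
    """Hypothetical model of which design units one predicted structure yields.

    The pocket lists stand in for whatever each method would find.
    """
    units: List[str] = []
    if run_oepocket:
        units += [f"oepocket:{p}" for p in oepocket_pockets]
    if run_fpocket:
        units += [f"fpocket:{p}" for p in fpocket_pockets]
    # Fall back to a biological unit only when no pocket-based unit exists.
    if not units and output_bio_designunits:
        units.append("biological_unit")
    return units


print(plan_design_units(True, True, ["site_A"], [], True))    # ['oepocket:site_A']
print(plan_design_units(False, False, ["site_A"], [], True))  # ['biological_unit']
```

Under this reading, disabling both pocket finding methods with Save Biological Units left at its default of True still produces one biological unit per predicted structure.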