Protein Sequence to AI Folded Structure Prediction

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Product-based/SPRUCE

  • Role-based/Computational Chemist

  • Role-based/Structural Biologist

  • Role-based/Bioinformatician

  • Solution-based/Target Identification

  • Solution-based/Target Identification/Target Preparation

  • Solution-based/Hit to Lead/Target Preparation

  • Task-based/Target Prep & Analysis/Protein Preparation

Description

This floe takes one or more protein sequences as input and predicts their structures with an AI folding model. The floe supports the OmegaFold model for structure prediction.

OmegaFold is a third-party sequence-to-structure protein folding method that uses a Large Language Model (LLM) to predict protein structure without using Multiple Sequence Alignments (MSAs). This floe and its defaults follow the standard folding practices outlined by OmegaFold.

Limitations: OmegaFold does not currently support predictions of multi-chain sequences, also known as multimers. If a multimeric sequence is identified in the input, that sequence is skipped.

Longer sequences can be computationally demanding. To run a long sequence, a common practice is to split it into fragments that overlap by around 200 residues and run each fragment separately.
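The splitting practice described above can be sketched in Python as follows. This is a minimal illustration, not part of the floe: the function name `split_with_overlap` and the default `window` of 1000 residues are assumptions chosen for the example, and only the overlap of roughly 200 residues comes from the text above.

```python
def split_with_overlap(sequence: str, window: int = 1000, overlap: int = 200) -> list[tuple[int, str]]:
    """Return (start_index, fragment) pairs that together cover `sequence`.

    Consecutive fragments share `overlap` residues. `window` is a
    hypothetical maximum fragment length, not an OmegaFold limit.
    """
    if overlap < 0 or window <= overlap:
        raise ValueError("window must be larger than a non-negative overlap")
    if len(sequence) <= window:
        return [(0, sequence)]

    step = window - overlap  # how far each new fragment advances
    fragments = []
    start = 0
    while True:
        end = min(start + window, len(sequence))
        fragments.append((start, sequence[start:end]))
        if end == len(sequence):
            break
        start += step
    return fragments
```

Each fragment can then be submitted as an independent sequence. The start indices are returned so that the predicted fragments can be mapped back to positions in the full-length sequence; note that the final fragment may be shorter than `window`.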

You can read more background information about OmegaFold.

Related Floes: SPRUCE - Protein Preparation, DU to PDB

Computational Cost Scaling: For best performance, batch all sequences into a single job rather than running many small jobs.

Promoted Parameters

Title in user interface (promoted name)

Inputs

Sequence to run AI Folding (sequence): Sequence(s) for structure prediction. To perform independent runs of multiple sequences, separate the sequences with ‘^’ and ensure that each sequence matches its counterpart in the ‘Sequence Title’ parameter. Chains of a multimer are delimited with ‘:’; however, OmegaFold does not support multimers, so any sequence detected as a multimer is skipped.

  • Required

  • Type: string
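As an illustration of the delimiters described for this parameter, the sketch below builds a ‘^’-separated value from several sequences and identifies which runs contain ‘:’ and would therefore be skipped as multimers. The helper names `join_runs` and `partition_runs` and the example sequences are hypothetical; the floe's own parsing is not shown in this document.

```python
RUN_SEP = "^"    # separates independent runs (from the parameter description)
CHAIN_SEP = ":"  # separates chains of a multimer (from the parameter description)


def join_runs(sequences: list[str]) -> str:
    """Build a single parameter value from several independent sequences."""
    for seq in sequences:
        if RUN_SEP in seq:
            raise ValueError(f"sequence must not contain {RUN_SEP!r}: {seq}")
    return RUN_SEP.join(sequences)


def partition_runs(param_value: str) -> tuple[list[str], list[str]]:
    """Split a parameter value into (monomer runs, multimer runs)."""
    runs = [run.strip() for run in param_value.split(RUN_SEP)]
    monomers = [run for run in runs if CHAIN_SEP not in run]
    multimers = [run for run in runs if CHAIN_SEP in run]
    return monomers, multimers


# Made-up example sequences; the second run is a two-chain multimer.
value = join_runs(["MKTAYIAKQR", "GSHMLEDPVA:GSHMLEDPVA", "MVLSPADKTN"])
monomers, multimers = partition_runs(value)
```

Checking the input this way before submitting a job shows in advance which sequences OmegaFold would skip.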

Sequence Title (titles): Title(s) for the independent runs given in the ‘Sequence to run AI Folding’ parameter. Separate the titles with ‘^’ and ensure that each title matches its counterpart in the ‘Sequence to run AI Folding’ parameter. If a mismatch is observed, distinct titles are generated in the format: ‘Sequence_1^Sequence_2^…’

  • Type: string
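The title fallback described for this parameter can be pictured with the following sketch. It assumes that a "mismatch" means the number of titles differs from the number of sequences, or that a title is blank; the document does not define the exact rule, and `resolve_titles` is a hypothetical helper rather than the floe's code.

```python
def resolve_titles(sequence_param: str, titles_param: str = "") -> list[str]:
    """Return one title per '^'-separated run in `sequence_param`.

    Assumption: if the title count does not match the run count, or any
    title is blank, fall back to Sequence_1, Sequence_2, and so on.
    """
    n_runs = len(sequence_param.split("^"))
    titles = [t.strip() for t in titles_param.split("^")] if titles_param else []
    if len(titles) == n_runs and all(titles):
        return titles
    return [f"Sequence_{i}" for i in range(1, n_runs + 1)]
```

For example, under this assumption two sequences supplied with a single title would both receive generated titles, while two sequences with two titles would keep the titles as given.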

Parameter File(s) on Orion (params): Select the parameter weights file(s) to use for AI folding. Selecting a large number of weights files increases the disk space required by the cube. Selections are made from the default set of parameters. Note that OmegaFold_model2 is considerably heavier, and you will likely need to increase the GPU memory requirement.

  • Type: string

  • Default: [‘OmegaFold_model1.pt’]

  • Choices: [‘OmegaFold_model1.pt’, ‘OmegaFold_model2.pt’]

Outputs

OmegaFold Results (out): Output dataset to write results to

  • Required

  • Type: dataset_out

  • Default: OmegaFold_Predictions

OmegaFold Failures (fout): Output dataset to write failures to

  • Required

  • Type: dataset_out

  • Default: OmegaFold_Failures

Pocket Finding Parameters

Use OEPocket Finding (run_oepocket): Option to use OEPocket to generate design units.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Use F-Pocket Finding (run_fpocket): Option to use f-pocket to generate design units.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Save Biological Units (output_bio_designunits): Option to save the structure as a biological unit when no pocket finding method is used or no valid pockets are found.

  • Required

  • Type: boolean

  • Default: True

  • Choices: [True, False]