Import Antibody FASTA Files

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Product-based/AbXtract

  • Role-based/Computational Chemist

  • Role-based/Bioinformatician

  • Solution-based/Virtual-screening/DB Preparation

  • Solution-based/Biologics/Antibody Design

Description

This floe imports FASTA files of antibody sequences into a dataset whose records contain the antibody H and L sequences and the antibody name/identifier. Because a single FASTA file may contain multiple antibody systems, the sequence titles are used to link the Fv chains. The identifying H and L chain IDs must also be present in the sequence title. See the Input FASTA File parameter for more detail on proper formatting.

Related Floes: Antibody Sequences to 3D Models Floe

Promoted Parameters

Title in user interface (promoted name)

Inputs

Input FASTA File (fasta_files): Input FASTA files containing sequence information. Multiple protein systems are allowed, and all input sequences must follow the appropriate formatting. Sequence titles are used to match multiple sequences to the same protein system, so they must share a conserved, case-sensitive antibody name. This name is the string found after the greater-than sign ‘>’ in the FASTA file (e.g. >Gag-Pol Polyprotein). Titles that do not match are assumed to belong to separate systems. To include comments or other annotations in the FASTA title, consider using the ‘Separator’ parameter to identify the conserved antibody name in the title. This floe only looks for Fv antibodies, so each protein system must have both H and L chains. Any chain IDs other than heavy or light chains will be ignored, and duplicate entries will fail all sequences for that antibody system.

  • Required

  • Type: file_in

Separator (separator): Character used to delimit the antibody name. Anything before this separator is the conserved antibody name; anything after it is ignored. For example, if the separator is an underscore, ‘>1A14_2|Chain B[auth H]’ will match with ‘>1A14_3|Chain C[auth L]’ because the code ‘1A14’ before the underscore is conserved for both sequences. The protein system will use this conserved value as its title.

  • Type: string
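The title matching described for the Input FASTA File and Separator parameters can be illustrated with a minimal Python sketch. This is not the floe's implementation: the helper names conserved_name and group_titles are hypothetical, and splitting at the first occurrence of the separator is an assumption made for the example.

```python
from collections import defaultdict


def conserved_name(title, separator=None):
    """Return the part of a FASTA title that names the antibody system.

    Hypothetical helper: assumes the name is everything before the FIRST
    occurrence of the separator, or the whole title if no separator is set.
    """
    title = title.lstrip(">").strip()
    if separator:
        return title.split(separator, 1)[0]
    return title


def group_titles(titles, separator=None):
    """Group FASTA titles by their case-sensitive conserved name."""
    systems = defaultdict(list)
    for title in titles:
        systems[conserved_name(title, separator)].append(title)
    return dict(systems)


titles = [
    ">1A14_2|Chain B[auth H]",
    ">1A14_3|Chain C[auth L]",
    ">1a14_1|Chain A",  # differs only in case, so it forms its own system
]
groups = group_titles(titles, separator="_")
```

With an underscore separator, the first two titles fall under the name 1A14, while the lowercase 1a14 title is kept apart because matching is case-sensitive. Without a separator, all three titles differ and would be treated as three separate systems.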

VH Sequence Field (vh_seq): Field in which the heavy chain sequences from the FASTA file are saved.

  • Required

  • Type: field_parameter::string

  • Default: VH

VL Sequence Field (vl_seq): Field in which the light chain sequences from the FASTA file are saved.

  • Required

  • Type: field_parameter::string

  • Default: VL

Outputs

Output dataset of imported sequences (out): Imported sequences in a dataset ready for the Antibody Sequences to 3D Models Floe.

  • Required

  • Type: dataset_out

  • Default: Antibody_Sequences

Failed Sequence Output (failed_out): Any sequences that cannot adhere to the selected sequence numbering scheme will fail.

  • Required

  • Type: dataset_out

  • Default: failed_Antibody_Sequences