MMDS 01. Make/Update RCSB PDB Collection

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Product-based/SPRUCE

  • Product-based/MMDS

  • Role-based/MMDS Staff User/MMDS Data Prep

  • Solution-based/Virtual-screening/Target Preparation

  • Solution-based/Hit to Lead/Target Preparation

  • Solution-based/Hit to Lead/Target Preparation/Structural Data Preparation

  • Task-based/Target Prep & Analysis/Protein Preparation

Description

MMDS requires an up-to-date collection of protein structure files. The .pdb format is preferred, but the floe will also read .mmcif files when they are available. This floe generates a new collection of protein structure files, or appends to the collection if it already exists.
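The format preference above can be illustrated with a short Python sketch. It assumes structure files are identified only by their extension and treats both `.cif` and `.mmcif` as mmCIF; `pick_structure_file` is a hypothetical helper for illustration, not part of the floe or of any Orion API.

```python
from pathlib import Path
from typing import List, Optional

# Assumed preference order: PDB format first, then mmCIF (".cif" or ".mmcif").
PREFERRED_SUFFIXES = (".pdb", ".cif", ".mmcif")


def pick_structure_file(candidates: List[Path]) -> Optional[Path]:
    """Return the most preferred structure file among candidates, or None (hypothetical helper)."""
    by_suffix = {p.suffix.lower(): p for p in candidates}
    for suffix in PREFERRED_SUFFIXES:
        if suffix in by_suffix:
            return by_suffix[suffix]
    return None  # no recognized structure format found
```

For example, given both `1abc.cif` and `1abc.pdb`, the sketch returns the `.pdb` file and only falls back to mmCIF when no `.pdb` file is present.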

Structures that are missing from the collection are downloaded from the RCSB database and added to it. If a structure is a newer version of an existing entry, or is a new entry, it is added to the collection.

AlphaFold structures are also saved in PDB format and updated as new versions are released, but these structures can optionally be excluded from the collection.
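A minimal sketch of the add-or-update decision described above, assuming each entry carries an integer version number and a flag marking AlphaFold models. `StructureEntry`, `plan_updates`, and `include_alphafold` are hypothetical names chosen for illustration; the floe's actual parameters and version handling may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StructureEntry:
    """Hypothetical record for one structure in the collection."""
    entry_id: str             # e.g. a PDB ID or an AlphaFold model ID
    version: int              # assumed revision number; higher means newer
    is_alphafold: bool = False


def plan_updates(existing: Dict[str, StructureEntry],
                 available: List[StructureEntry],
                 include_alphafold: bool = True) -> List[StructureEntry]:
    """Return the entries that would be added to (or refreshed in) the collection."""
    to_add = []
    for entry in available:
        if entry.is_alphafold and not include_alphafold:
            continue  # AlphaFold models are optionally excluded
        current = existing.get(entry.entry_id)
        if current is None or entry.version > current.version:
            to_add.append(entry)  # a new entry, or a newer version of an existing one
    return to_add
```

Entries already present at the same or a higher version are skipped, which is one reason updating an existing collection is cheaper than building a new one.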

Limitations: Because the floe makes a large number of requests to the RCSB, it is throttled so that it does not overwhelm the Protein Data Bank servers.
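Throttling of this kind is commonly implemented as a minimum interval between consecutive requests. The sketch below assumes that approach; `Throttle` and `min_interval_s` are hypothetical, and the floe's actual rate limits are not documented here. The clock and sleep functions are injectable so the behavior can be checked without real delays.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum delay between consecutive requests (hypothetical helper)."""

    def __init__(self, min_interval_s: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request, if any

    def wait(self) -> None:
        """Block until at least min_interval_s has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A caller would invoke `throttle.wait()` immediately before each download request to the RCSB.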

Related Floes: MMDS 02. Generate Target and Family Dataset, MMDS 03. Structure Prep, MMDS 06. Add structures to MMDS

Computational Cost Scaling: Creating a new PDB collection requires significantly more compute resources than using this floe to update an existing collection.

Parameter title in user interface (promoted name)

  • Output Dataset (data_out) type: dataset_out: Output dataset to write to
    Default: retrieve_failures


  • Collection Name (coll_name) type: string: Name of a new or existing collection for biomolecular source data. For existing collections, an ID can also be used. If multiple collections share the supplied name, the most recent one with that name is updated.
    Default: RCSB PDB Collection
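The lookup described for `coll_name` might be resolved roughly as follows. This sketch assumes that a purely numeric value is tried as an ID first and that "most recent" means the latest creation time; `CollectionInfo` and `resolve_collection` are hypothetical and do not correspond to the Orion API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CollectionInfo:
    """Hypothetical summary of an existing collection."""
    coll_id: int
    name: str
    created: float  # assumed creation time, e.g. a Unix timestamp


def resolve_collection(coll_name: str,
                       collections: List[CollectionInfo]) -> Optional[CollectionInfo]:
    """Resolve coll_name as an ID first, then as the most recent collection with that name.

    Returns None when nothing matches, i.e. a new collection would be created.
    """
    if coll_name.isdigit():  # assumption: numeric values are tried as IDs first
        for c in collections:
            if c.coll_id == int(coll_name):
                return c
    matches = [c for c in collections if c.name == coll_name]
    if not matches:
        return None
    return max(matches, key=lambda c: c.created)  # latest collection with this name
```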