MMDS 02. Generate Target and Family Dataset

Category Paths

  • Product-based/SPRUCE

  • Product-based/MMDS

  • Role-based/MMDS Staff User/MMDS Data Prep

  • Solution-based/Virtual-screening/Target Preparation

  • Solution-based/Hit to Lead/Target Preparation

  • Solution-based/Hit to Lead/Target Preparation/Structural Data Preparation

  • Task-based/Data Science/Clustering

  • Task-based/Target Prep & Analysis/Protein Preparation

  • Task-based/Target Prep & Analysis/Protein Similarity Search

Description

MMDS 02. Generate Target and Family Dataset builds a Target dataset and a Family dataset of proteins. It uses UniProt to group PDB structures with similar protein sequences according to their UniProtKB ID.

Target categorization is based on ‘Guide to Pharmacology’ families. Any remaining UniProt targets are placed in an ‘Uncategorized’ category. Targets with multiple UniProt features are further subdivided by feature, and the associated PDB structures are sorted by sequence similarity.
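The grouping and categorization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used here: the PDB codes, UniProt IDs, family names, and the helper functions group_by_uniprot and categorize are all hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical input: (PDB code, UniProt ID) pairs. All values are placeholders.
pdb_to_uniprot = [
    ("1ABC", "P00001"),
    ("2DEF", "P00001"),
    ("3GHI", "P00002"),
    ("4JKL", "P00003"),
]

# Hypothetical lookup from UniProt ID to a family name. Family names are
# illustrative and not taken from any real family listing.
family_by_uniprot = {
    "P00001": "Family A",
    "P00002": "Family B",
}


def group_by_uniprot(pairs):
    """Collect PDB codes under the UniProt ID they map to."""
    groups = defaultdict(list)
    for pdb_code, uniprot_id in pairs:
        groups[uniprot_id].append(pdb_code)
    return dict(groups)


def categorize(uniprot_ids, family_lookup):
    """Assign each UniProt ID to its family, or to 'Uncategorized' if none is known."""
    families = defaultdict(list)
    for uniprot_id in uniprot_ids:
        family = family_lookup.get(uniprot_id, "Uncategorized")
        families[family].append(uniprot_id)
    return dict(families)


targets = group_by_uniprot(pdb_to_uniprot)
families = categorize(targets, family_by_uniprot)
```

In this sketch, a UniProt ID with no family entry falls through to ‘Uncategorized’, mirroring the fallback category described in the text.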

A reference structure is sought for each target using Spruce. A failure to find any viable structures, or to find a viable reference structure, is recorded. Successful targets contain a list of related PDB structures and a reference structure, and are saved in the Target dataset.
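One way to picture how the outcomes above relate to the output datasets is the sketch below. It assumes, based only on the parameter descriptions on this page, that targets with no viable structures go to ‘Target Dataset No Structures’, targets with structures but no viable reference go to ‘Target Dataset No Ref Structure’, and successful targets go to ‘Target Dataset’. The TargetResult class and route function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TargetResult:
    """Hypothetical record of the outcome for one target."""
    uniprot_id: str
    viable_structures: List[str] = field(default_factory=list)
    reference: Optional[str] = None


def route(result: TargetResult) -> str:
    """Return the default name of the dataset this result would be written to.

    The mapping is an assumption inferred from the parameter descriptions.
    """
    if not result.viable_structures:
        return "Target Dataset No Structures"
    if result.reference is None:
        return "Target Dataset No Ref Structure"
    return "Target Dataset"


# Placeholder results covering the three outcomes.
results = [
    TargetResult("P00001", ["1ABC", "2DEF"], reference="1ABC"),
    TargetResult("P00002", ["3GHI"]),
    TargetResult("P00003"),
]
routed = {r.uniprot_id: route(r) for r in results}
```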

Where possible, a common reference structure is also sought across the list of related targets. These are saved in the Family dataset along with all the categorization information.

Parameter title in user interface (promoted name)

  • Existing dataset failures (data_out) type: dataset_out: Output existing dataset failures
    Default: Existing Dataset Failures

  • Target dataset no structure (data_out) type: dataset_out: Output dataset of targets that have no structures from which a reference structure could be generated.
    Default: Target Dataset No Structures

  • Target dataset no ref structure (data_out) type: dataset_out: Output dataset of targets for which no reference structure could be generated.
    Default: Target Dataset No Ref Structure

  • UniProt dataset map (data_out) type: dataset_out: Output UniProt dataset map
    Default: UniProt Dataset Map

  • Output Dataset (data_out) type: dataset_out: Output dataset to write to
    Default: retrieve_failures

  • Target dataset (data_out) type: dataset_out: Output Target dataset
    Default: Target Dataset

  • Failed Family dataset (data_out) type: dataset_out: Output failed family dataset
    Default: Failed Family Dataset

  • Failed target dataset (data_out) type: dataset_out: Output failed target dataset
    Default: Failed Target Dataset

  • PDB Structure Collection (collection) type: collection_source: Collection containing input PDB structures from source

  • Family dataset (data_out) type: dataset_out: Output Family dataset
    Default: Family Dataset