MMDS 02. Generate Target and Family Dataset
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Product-based/SPRUCE
Product-based/MMDS
Role-based/MMDS Staff User/MMDS Data Prep
Solution-based/Virtual-screening/Target Preparation
Solution-based/Hit to Lead/Target Preparation
Solution-based/Hit to Lead/Target Preparation/Structural Data Preparation
Task-based/Data Science/Clustering
Task-based/Target Prep & Analysis/Protein Preparation
Task-based/Target Prep & Analysis/Protein Similarity Search
Description
MMDS 02. Generate Target and Family Dataset builds a Target dataset and a Family dataset of proteins. It uses UniProt to group PDB structures with similar protein sequences according to their UniProtKB ID.
Target categorization is based on ‘Guide to Pharmacology’ families. Any remaining UniProt targets are placed in an ‘Uncategorized’ category. Targets with multiple UniProt features are further subdivided by feature trait, and their associated PDB structures are sorted by sequence similarity.
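The family assignment with an ‘Uncategorized’ fallback can be illustrated with a minimal Python sketch. This is not the floe's implementation: the lookup table, the accession strings, the family labels, and the function name `categorize_targets` are all hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict

# Hypothetical lookup table: UniProt accession -> family label.
# The accessions and labels are placeholders, not real database entries.
FAMILY_BY_ACCESSION = {
    "ACC001": "Kinases",
    "ACC002": "GPCRs",
    "ACC003": "Kinases",
}

UNCATEGORIZED = "Uncategorized"


def categorize_targets(accessions, family_lookup):
    """Group accessions by family; anything without a family entry
    falls into the 'Uncategorized' bucket."""
    families = defaultdict(list)
    for accession in accessions:
        family = family_lookup.get(accession, UNCATEGORIZED)
        families[family].append(accession)
    return dict(families)


grouped = categorize_targets(
    ["ACC001", "ACC002", "ACC003", "ACC999"], FAMILY_BY_ACCESSION
)
print(grouped)
```

Here `ACC999` has no entry in the lookup table, so it is grouped under ‘Uncategorized’, mirroring how remaining UniProt targets are handled in the description above.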
For each target, the floe attempts to find a reference structure using Spruce. A failure to find any viable structures, or to find a viable reference structure, is recorded. Successful targets contain a list of related PDB structures and a reference structure, and are saved in the Target dataset.
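The three outcomes described above can be sketched as a simple routing step. This is one plausible reading of how outcomes map to the output datasets, assumed from the default dataset names listed in the parameters; the `Target` class and `route_target` function are hypothetical and do not reflect the floe's internal data model or the Spruce API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Target:
    """Hypothetical stand-in for a target record."""
    accession: str
    pdb_ids: List[str] = field(default_factory=list)  # viable related structures
    reference: Optional[str] = None  # chosen reference structure, if any


def route_target(target: Target) -> str:
    """Return the name of the output dataset a target would be written to
    (assumed mapping, based on the default dataset names)."""
    if not target.pdb_ids:
        # No viable structures were found for this target.
        return "Target Dataset No Structures"
    if target.reference is None:
        # Structures exist, but none yielded a viable reference structure.
        return "Target Dataset No Ref Structure"
    # Success: related structures plus a reference structure.
    return "Target Dataset"


targets = [
    Target("ACC001", ["1ABC", "2DEF"], reference="1ABC"),
    Target("ACC002", ["3GHI"]),
    Target("ACC003"),
]
routes = {t.accession: route_target(t) for t in targets}
print(routes)
```

Keeping the failure cases in separate datasets, as the floe's outputs do, makes it possible to inspect why a target was excluded without rerunning the whole preparation.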
Where possible, the floe also attempts to find a common reference structure from the list of related targets. These results are saved in the Family dataset along with all the categorization information.
Parameter title in user interface (promoted name)
Target dataset no ref structure (data_out) type: dataset_out: Output dataset of targets that are not able to generate reference structures. Default: Target Dataset No Ref Structure
Failed target dataset (data_out) type: dataset_out: Output failed target dataset. Default: Failed Target Dataset
Target dataset no structure (data_out) type: dataset_out: Output dataset of targets without any structures able to generate reference structures. Default: Target Dataset No Structures
PDB Structure Collection (collection) type: collection_source: Collection containing input PDB structures from source
UniProt dataset map (data_out) type: dataset_out: Output UniProt dataset map. Default: UniProt Dataset Map
Existing dataset failures (data_out) type: dataset_out: Output existing dataset failures. Default: Existing Dataset Failures
Family dataset (data_out) type: dataset_out: Output Family dataset. Default: Family Dataset
Output Dataset (data_out) type: dataset_out: Output dataset to write to. Default: retrieve_failures
Target dataset (data_out) type: dataset_out: Output Target dataset. Default: Target Dataset
Failed Family dataset (data_out) type: dataset_out: Output failed family dataset. Default: Failed Family Dataset