Generate and Deduplicate SMILES for One or More Datasets

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Task-based/Cheminformatics/SMILES Gen & Deduplication

  • Role-based/Medicinal Chemist

Description

Add a string data field to each record of the combined dataset that stores the SMILES representation of the record's primary molecule, then deduplicate the dataset based on that SMILES string.
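
The split described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the floe's actual implementation: the function name split_by_smiles, the to_smiles callable, and the dictionary-based records are all hypothetical, and the rule that the first record seen for a given SMILES is kept as unique while later ones are treated as duplicates is an assumption. In practice a cheminformatics toolkit would generate a canonical SMILES from the primary molecule; here the toy records already carry a SMILES-like string.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Record = Dict[str, object]


def split_by_smiles(
    records: Iterable[Record],
    to_smiles: Callable[[Record], Optional[str]],
    field: str = "SMILES",
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Return (unique, duplicate, missing) lists of records.

    Hypothetical helper: `to_smiles` stands in for whatever generates
    the SMILES of a record's primary molecule.
    """
    unique: List[Record] = []
    duplicate: List[Record] = []
    missing: List[Record] = []
    seen = set()

    for record in records:
        smiles = to_smiles(record)
        if not smiles:
            # No SMILES could be produced for this record.
            missing.append(record)
            continue

        # Store the SMILES in a string field on a copy of the record.
        tagged = dict(record)
        tagged[field] = smiles

        # Assumption: the first record with a given SMILES is "unique",
        # every later record with the same SMILES is a "duplicate".
        if smiles in seen:
            duplicate.append(tagged)
        else:
            seen.add(smiles)
            unique.append(tagged)

    return unique, duplicate, missing


# Toy records; the "mol" value is a stand-in for a real molecule object.
records = [
    {"name": "ethanol", "mol": "CCO"},
    {"name": "ethanol copy", "mol": "CCO"},
    {"name": "benzene", "mol": "c1ccccc1"},
    {"name": "empty", "mol": None},
]

unique, duplicate, missing = split_by_smiles(records, lambda r: r["mol"])
```

The three returned lists correspond loosely to the three output datasets listed under Promoted Parameters below.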

Promoted Parameters

Title in user interface (promoted name)

Outputs

Output Dataset for Unique Records (unique): Output dataset to which unique records are written

  • Required

  • Type: dataset_out

  • Default: unique_SMILES

Output Dataset for Duplicate Records (duplicate): Output dataset to which duplicate records are written

  • Required

  • Type: dataset_out

  • Default: duplicate_SMILES

Output Dataset for Records Missing SMILES (missing): Output dataset to which records missing a SMILES are written

  • Required

  • Type: dataset_out

  • Default: missing_SMILES

Inputs

Input Dataset (in): Dataset to deduplicate

  • Required

  • Type: data_source