Generate and Deduplicate SMILES for a Dataset

This Floe adds a new string field to each record in the dataset. The field stores the SMILES representation of the record's primary molecule. The Floe then deduplicates the dataset based on canonical SMILES.
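The deduplication step can be sketched in plain Python. The sketch makes three assumptions. First, each record is represented as a dictionary. Second, the record already carries a canonical SMILES string under a hypothetical `"SMILES"` key, since generating canonical SMILES requires a cheminformatics toolkit and is not shown. Third, the first record with a given SMILES is kept. The Floe's own rule for which duplicate survives is not documented here.

```python
from typing import Any


def dedupe_on_smiles(
    records: list[dict[str, Any]], smiles_field: str = "SMILES"
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split records into first occurrences and later repeats of the same SMILES.

    Hypothetical helper: assumes every record already holds a canonical SMILES
    string in `smiles_field` and keeps the first record seen for each SMILES.
    """
    seen: set[str] = set()
    unique: list[dict[str, Any]] = []
    duplicates: list[dict[str, Any]] = []
    for record in records:
        smiles = record[smiles_field]
        if smiles in seen:
            duplicates.append(record)  # same canonical SMILES was seen earlier
        else:
            seen.add(smiles)
            unique.append(record)  # first record with this SMILES is kept
    return unique, duplicates


records = [
    {"id": 1, "SMILES": "CCO"},
    {"id": 2, "SMILES": "c1ccccc1"},
    {"id": 3, "SMILES": "CCO"},
]
unique, duplicates = dedupe_on_smiles(records)
print([r["id"] for r in unique])      # [1, 2]
print([r["id"] for r in duplicates])  # [3]
```

Comparing canonical SMILES works because a canonical form gives the same string for the same molecule regardless of atom input order. Comparing the raw, non-canonical strings would miss such duplicates.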

Extra Required Parameters

  • Input Dataset (data_source) : The dataset to which the deduplicated SMILES are appended.
  • SMILES Field (Field Type: String) : The name for the SMILES field.
    Default: SMILES
  • SMILES Type (string) : The type of SMILES to generate.
    Default: isomeric-canonical
    Choices: isomeric-canonical, non-isomeric-canonical, non-canonical
  • Deduplication Type (string) : The type of field on which to deduplicate.
    Default: molecule
    Choices: string, molecule, integer, float
  • Use Pka Normalization for Mol Deduplication (boolean) : If set to True, molecules are pKa-normalized before deduplication.
    Default: False
  • Write missing dataset (boolean) : If off, then the ‘missing’ dataset is not generated.
    Default: False
  • Output unique dataset (dataset_out) : The output dataset to which the unique records are written.
    Default: unique
  • Write duplicate dataset (boolean) : If off, then the ‘duplicate’ dataset is not generated.
    Default: False
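The following sketch shows one way the parameters above could work together. The function name, the per-type key conversions, and the output names are hypothetical. The sketch also treats the ‘missing’ output as records with no value in the deduplication field, which is an assumption. The "molecule" choice is approximated by comparing canonical SMILES strings, and pKa normalization is not modeled.

```python
from typing import Any, Callable, Optional

# Hypothetical mapping from the "Deduplication Type" choices to key converters.
# "molecule" is approximated by comparing canonical SMILES strings; the Floe
# itself may compare molecules differently (e.g. after pKa normalization).
KEY_FUNCS: dict[str, Callable[[Any], Any]] = {
    "string": str,
    "molecule": str,
    "integer": int,
    "float": float,
}


def partition_records(
    records: list[dict[str, Any]],
    field: str,
    dedup_type: str = "molecule",
    write_missing: bool = False,
    write_duplicate: bool = False,
) -> dict[str, Optional[list[dict[str, Any]]]]:
    """Route records to unique/duplicate/missing outputs (illustrative only)."""
    key_func = KEY_FUNCS[dedup_type]
    seen: set[Any] = set()
    unique: list[dict[str, Any]] = []
    duplicate: list[dict[str, Any]] = []
    missing: list[dict[str, Any]] = []
    for record in records:
        value = record.get(field)
        if value is None:
            missing.append(record)  # assumed: no value to deduplicate on
            continue
        key = key_func(value)  # e.g. "7" and 7 both become 7 for "integer"
        if key in seen:
            duplicate.append(record)
        else:
            seen.add(key)
            unique.append(record)
    # Optional outputs are only produced when their flag is on.
    return {
        "unique": unique,
        "duplicate": duplicate if write_duplicate else None,
        "missing": missing if write_missing else None,
    }


records = [{"id": 1, "n": "7"}, {"id": 2, "n": 7}, {"id": 3}, {"id": 4, "n": 8}]
out = partition_records(records, "n", dedup_type="integer", write_duplicate=True)
print([r["id"] for r in out["unique"]])     # [1, 4]
print([r["id"] for r in out["duplicate"]])  # [2]
print(out["missing"])                       # None
```

Converting values to the chosen type before comparing means "7" and 7 count as duplicates under "integer" but stay distinct under "string" only if they were stored with different text.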