Dataset Deduplication – Based on Molecule, String, Integer, or Float Field

This Floe deduplicates a dataset on a user-specified molecule, string, integer, or float field. Each cube instance performs only one type of deduplication, which must be selected in the Deduplication Type parameter.
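As a rough illustration of what "one deduplication type per run" means, the sketch below builds a comparison key from the selected type and keeps one record per key. It is not the Floe's implementation. The function names (make_key, deduplicate), the choice to keep the first record seen, and the exact-value comparison for string, integer, and float fields are all assumptions. For molecules, the caller must pass a hypothetical canonicalize function (for example, one that returns canonical SMILES from a cheminformatics toolkit), because molecule identity cannot be judged from raw text alone.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

# The four choices accepted by the Deduplication Type parameter.
DEDUP_TYPES = ("string", "molecule", "integer", "float")


def make_key(dedup_type: str,
             canonicalize: Optional[Callable[[Any], Any]] = None) -> Callable[[Any], Any]:
    """Return a function that maps a field value to its comparison key."""
    if dedup_type not in DEDUP_TYPES:
        raise ValueError(f"unknown deduplication type: {dedup_type!r}")
    if dedup_type == "molecule":
        # Hypothetical: molecules are compared through a canonical form
        # (e.g. canonical SMILES) produced by an external toolkit.
        if canonicalize is None:
            raise ValueError("molecule deduplication needs a canonicalize function")
        return canonicalize
    # Assumption: string, integer and float fields compare by exact typed value.
    return {"string": str, "integer": int, "float": float}[dedup_type]


def deduplicate(records: Iterable[Dict[str, Any]], field: str, dedup_type: str,
                canonicalize: Optional[Callable[[Any], Any]] = None) -> List[Dict[str, Any]]:
    """Keep the first record seen for each distinct key of `field`."""
    key_of = make_key(dedup_type, canonicalize)
    seen = set()
    unique = []
    for rec in records:
        if field not in rec:
            continue  # records lacking the field are skipped in this sketch
        key = key_of(rec[field])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

For example, deduplicating `[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "a"}]` on `"name"` with type `"string"` keeps the records with ids 1 and 2. Passing an unsupported type such as `"date"` raises a `ValueError`, mirroring the fixed set of choices.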

Extra Required Parameters

  • Input Dataset (data_source) : Dataset to deduplicate.
  • Deduplication Type (string) : The type of field on which to deduplicate.
    Default: molecule
    Choices: string, molecule, integer, float
  • Use Pka Normalization for Mol Deduplication (boolean) : If set to True, molecules are pKa-normalized before deduplication.
    Default: False
  • Output unique dataset (dataset_out) : Output dataset to which the unique records are written.
    Default: unique
  • Write missing dataset (boolean) : If off, the ‘missing’ dataset is not generated.
    Default: False
  • Write duplicate dataset (boolean) : If off, the ‘duplicate’ dataset is not generated.
    Default: False
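The three output datasets can be pictured as a single pass that routes each record to ‘unique’, ‘duplicate’, or ‘missing’. The sketch below is a hypothetical illustration, not the Floe's code. The function name partition_records and its arguments are invented, and two behaviors are assumptions: that the first record seen for a value is the one kept as unique, and that a record lacking the field (or holding None) belongs in ‘missing’. As in the parameter list, both optional outputs are off by default, so only ‘unique’ is produced unless they are enabled.

```python
from typing import Any, Callable, Dict, Iterable, List


def partition_records(records: Iterable[Dict[str, Any]], field: str,
                      key_of: Callable[[Any], Any] = lambda v: v,
                      write_missing: bool = False,
                      write_duplicate: bool = False) -> Dict[str, List[Dict[str, Any]]]:
    """Route records to the 'unique', 'duplicate' and 'missing' outputs."""
    outputs: Dict[str, List[Dict[str, Any]]] = {"unique": []}
    if write_missing:
        outputs["missing"] = []
    if write_duplicate:
        outputs["duplicate"] = []

    seen = set()
    for rec in records:
        value = rec.get(field)
        if value is None:
            # Assumption: records without a value for the field go to 'missing'.
            if write_missing:
                outputs["missing"].append(rec)
            continue
        key = key_of(value)
        if key in seen:
            # Later records with an already-seen key are duplicates.
            if write_duplicate:
                outputs["duplicate"].append(rec)
        else:
            seen.add(key)
            outputs["unique"].append(rec)
    return outputs
```

With the defaults, only the ‘unique’ list is returned. Calling it with `write_missing=True, write_duplicate=True` also returns the records that lacked the field and the records whose key had already been seen.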