Generate and Deduplicate SMILES for One or More Datasets
This Floe concatenates the input datasets into one combined dataset. It then adds a new string field to each record that stores the SMILES representation of the record's primary molecule. Finally, it deduplicates the combined dataset based on canonical SMILES.
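The three steps above can be sketched in plain Python. This is only an illustration of the logic, not the Floe's implementation. Records are modeled as dictionaries, and the names `concat_and_dedupe`, `toy_canonical`, and the `mol` key are hypothetical. The sketch assumes that the first occurrence of each SMILES is kept as unique and that records without a molecule or a SMILES go to the 'missing' output. In practice `to_smiles` would be a cheminformatics toolkit's canonical SMILES function; here a small lookup table stands in for it.

```python
from itertools import chain
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Record = Dict[str, object]


def concat_and_dedupe(
    datasets: Iterable[Iterable[Record]],
    to_smiles: Callable[[object], Optional[str]],
    smiles_field: str = "SMILES",
    mol_field: str = "mol",  # hypothetical key holding the primary molecule
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Concatenate datasets, add a SMILES field, and split by uniqueness."""
    unique: List[Record] = []
    duplicate: List[Record] = []
    missing: List[Record] = []
    seen = set()

    # Step 1: treat all input datasets as one stream of records.
    for record in chain.from_iterable(datasets):
        mol = record.get(mol_field)
        smiles = to_smiles(mol) if mol is not None else None

        # Assumption: records with no molecule or no SMILES count as 'missing'.
        if not smiles:
            missing.append(record)
            continue

        # Step 2: store the SMILES string in a new field (copying the record).
        record = {**record, smiles_field: smiles}

        # Step 3: keep the first record per SMILES; later ones are duplicates.
        if smiles in seen:
            duplicate.append(record)
        else:
            seen.add(smiles)
            unique.append(record)

    return unique, duplicate, missing


def toy_canonical(mol: object) -> Optional[str]:
    """Stand-in canonicalizer: maps a few ethanol spellings to 'CCO'."""
    aliases = {"OCC": "CCO", "C(C)O": "CCO"}
    return aliases.get(str(mol), str(mol))


if __name__ == "__main__":
    first = [{"name": "a", "mol": "CCO"}, {"name": "b", "mol": "OCC"}]
    second = [{"name": "c", "mol": "c1ccccc1"}, {"name": "d"}]
    unique, duplicate, missing = concat_and_dedupe([first, second], toy_canonical)
    print(len(unique), len(duplicate), len(missing))
```

Passing the canonicalizer in as a function keeps the deduplication logic independent of how the SMILES is generated, which mirrors the way the SMILES Type parameter is chosen separately from the deduplication itself.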
Extra Required Parameters
Input Dataset (data_source) : Dataset to deduplicate
SMILES Field (Field Type: String) : The name for the SMILES field. Default: SMILES
SMILES Type (string) : The type of the SMILES generated. Default: isomeric-canonical. Choices: isomeric-canonical, non-isomeric-canonical, non-canonical
Deduplication Type (string) : The type of field on which to deduplicate. Default: molecule. Choices: string, molecule, integer, float
Use Pka Normalization for Mol Deduplication (boolean) : If set to True, molecules will be pka normalized before deduplication. Default: False
Output unique dataset (dataset_out) : Output dataset to write to. Default: unique
Write missing dataset (boolean) : If off, then the ‘missing’ dataset is not generated. Default: False
Write duplicate dataset (boolean) : If off, then the ‘duplicate’ dataset is not generated. Default: False