Dataset Similarity – Fingerprint Generation (User Defined)

This Floe generates user-customizable fingerprints. The generated fingerprints can be customized with the following parameters:

  • The Fingerprint Type parameter determines the type of fingerprint: circular (ECFP-like), path, or tree.

  • The Fingerprint Size parameter determines the size of the generated fingerprint (in bits).

  • The Minimum Fragment Size and Maximum Fragment Size parameters determine the minimum and maximum size of the fragments that are exhaustively enumerated during fingerprint generation.

  • The Fingerprint Atom Typing and Fingerprint Bond Typing parameters determine which atom and bond properties are encoded into the fingerprints.
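How these parameters interact can be illustrated with a minimal, self-contained Python sketch of a path fingerprint. It is not the Floe's implementation: the toy molecule format (a dict of atom labels plus a bond list), the function name `path_fingerprint`, and the hashing scheme (SHA-1 of a canonical path string, reduced modulo the fingerprint size) are all assumptions made for illustration. The atom and bond labels stand in for whatever properties Fingerprint Atom Typing and Fingerprint Bond Typing select. Tree fingerprints would additionally enumerate branched fragments, which this sketch does not do.

```python
import hashlib
from typing import Dict, List, Set, Tuple

# Hypothetical toy representation: atom index -> typed atom label (standing in
# for the selected Atom Typing properties), and undirected bonds as
# (atom_a, atom_b, typed bond label).
Atoms = Dict[int, str]
Bonds = List[Tuple[int, int, str]]


def _stable_hash(text: str) -> int:
    """Deterministic 64-bit hash; Python's hash() for str varies between runs."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest()[:8], "big")


def path_fingerprint(atoms: Atoms, bonds: Bonds, num_bits: int = 4096,
                     min_bonds: int = 0, max_bonds: int = 4) -> Set[int]:
    """Return the set of 'on' bits for a simple linear-path fingerprint."""
    neighbors: Dict[int, List[Tuple[int, str]]] = {a: [] for a in atoms}
    for a, b, label in bonds:
        neighbors[a].append((b, label))
        neighbors[b].append((a, label))

    bits: Set[int] = set()

    def extend(path: List[int], bond_labels: List[str]) -> None:
        if len(bond_labels) >= min_bonds:
            # Spell the path as alternating atom and bond labels, and keep the
            # smaller of its two reading directions so A-B and B-A share a bit.
            tokens = [atoms[path[0]]]
            for atom, bond in zip(path[1:], bond_labels):
                tokens += [bond, atoms[atom]]
            key = min(" ".join(tokens), " ".join(reversed(tokens)))
            bits.add(_stable_hash(key) % num_bits)
        if len(bond_labels) == max_bonds:
            return
        for nbr, label in neighbors[path[-1]]:
            if nbr not in path:  # simple paths only: no atom visited twice
                extend(path + [nbr], bond_labels + [label])

    # Exhaustively enumerate paths starting from every atom.
    for start in atoms:
        extend([start], [])
    return bits
```

For example, `path_fingerprint({0: "C", 1: "C", 2: "O"}, [(0, 1, "-"), (1, 2, "-")])` enumerates the fragments C, O, C-C, C-O and C-C-O, so at most five bits are set (fewer only if two fragments hash to the same bit).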

Extra Required Parameters

  • Fingerprint Atom Typing (string) : The atom properties encoded into the fingerprints.
    Default: [‘Atomic number’]
    Choices: Atomic number, Aromaticity, Chiral, Formal charge, Heavy degree, Hybridization, In ring, Hydrogen count, Halogen equivalent, Aromatic equivalent, HBond acceptor equivalent, HBond donor equivalent
  • Fingerprint Bond Typing (string) : The bond properties encoded into the fingerprints.
    Default: [‘Bond order’]
    Choices: Bond order, Chiral, In ring
  • Fingerprint Field (Field Type: Chem.FingerPrint) : Name of the field that stores the generated fingerprints.
    Default: Fingerprint
  • Maximum Fragment Size (integer) : The largest fragments that are enumerated during fingerprint generation. For the path and tree fingerprint types, this is the maximum number of bonds in a fragment. For the circular fingerprint type, this is the maximum bond distance from the central atom.
    Default: 4 Min: 1 Max: 8
  • Minimum Fragment Size (integer) : The smallest fragments that are enumerated during fingerprint generation. For the path and tree fingerprint types, this is the minimum number of bonds in a fragment. For the circular fingerprint type, this is the minimum bond distance from the central atom.
    Default: 0 Max: 5
  • Fingerprint Size (integer) : The size of the fingerprint (in bits) generated for similarity calculation. It is recommended that the fingerprint size be a multiple of 256.
    Default: 4096 Min: 256 Max: 16384
  • Fingerprint Type (string) : The fingerprint type generated for similarity calculation.
    Default: Tree
    Choices: Circular, Path, Tree
  • Input Dataset (data_source) : Dataset for which fingerprints are generated
  • Output Dataset (dataset_out) : Output dataset of successful calculations
    Default: fingerprints
  • Failed Dataset (dataset_out) : Output dataset of failed calculations
    Default: Failed Output for Dataset Similarity – Fingerprint Generation (User Defined)
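The bond-distance interpretation of Minimum Fragment Size and Maximum Fragment Size for the circular type can be sketched in the same hedged way. The sketch below assumes an ECFP-like scheme in which every atom starts from an identifier derived from its typed label and, at each further radius, absorbs its neighbors' identifiers, so the identifier at radius r describes the environment within r bonds of the central atom. Only radii between the minimum and maximum contribute bits. The molecule format, the SHA-1 hashing, and the name `circular_fingerprint` are hypothetical and are not taken from the Floe.

```python
import hashlib
from typing import Dict, List, Set, Tuple

# Hypothetical toy representation: atom index -> typed atom label, and
# undirected bonds as (atom_a, atom_b, typed bond label).
Atoms = Dict[int, str]
Bonds = List[Tuple[int, int, str]]


def _stable_hash(text: str) -> int:
    """Deterministic 64-bit hash; Python's hash() for str varies between runs."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest()[:8], "big")


def circular_fingerprint(atoms: Atoms, bonds: Bonds, num_bits: int = 4096,
                         min_radius: int = 0, max_radius: int = 4) -> Set[int]:
    """Return the set of 'on' bits for a simplified ECFP-like fingerprint."""
    neighbors: Dict[int, List[Tuple[int, str]]] = {a: [] for a in atoms}
    for a, b, label in bonds:
        neighbors[a].append((b, label))
        neighbors[b].append((a, label))

    # Radius 0: each atom's identifier depends only on its own typed label.
    ids = {a: _stable_hash(atoms[a]) for a in atoms}
    bits: Set[int] = set()
    for radius in range(max_radius + 1):
        if radius > 0:
            # Grow each environment by one bond. The comprehension reads the
            # previous radius's identifiers; neighbor entries are sorted so the
            # result does not depend on atom numbering.
            ids = {
                a: _stable_hash(
                    f"{ids[a]}|" + ",".join(sorted(f"{lbl}:{ids[n]}" for n, lbl in neighbors[a]))
                )
                for a in atoms
            }
        if radius >= min_radius:
            # Fold each environment identifier into the fixed-size bit vector.
            bits.update(i % num_bits for i in ids.values())
    return bits
```

With `min_radius=0` and `max_radius=0`, only the individual atom labels are encoded, so the result reduces to one bit per distinct atom type (up to hash collisions); raising the maximum adds bits for progressively larger atom environments.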