Dataset Classification – Bemis-Murcko

This Floe classifies molecules based on their Bemis-Murcko frameworks. The Floe can generate the following datasets:

  • Members dataset that contains each molecule from the input dataset, annotated with the class ID, class size, and class member ID data fields.

  • Cores dataset that contains one representative from each class, annotated with the class ID and class size data fields.

  • Singletons dataset that contains the molecules whose class has only one member.

When the singletons dataset is written, neither the members nor the cores output dataset contains the singleton records.
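The routing described above can be sketched in plain Python. This is an illustration only, not the Floe's implementation: the function name route_records, the dictionary-based records, the "framework" key, class IDs numbered from 1, and the use of the first member as the class core are all assumptions made for the example. The field names are the defaults from the parameter list.

```python
from collections import defaultdict


def route_records(records, key_field, output_members=True,
                  output_cores=False, output_singletons=False):
    """Group records by key_field and split them across the three outputs."""
    classes = defaultdict(list)
    for rec in records:
        classes[rec[key_field]].append(rec)

    members, cores, singletons = [], [], []
    # Assumption: class IDs and member IDs are numbered from 1 in input order.
    for class_id, group in enumerate(classes.values(), start=1):
        annotated = [
            {**rec, "Class ID": class_id, "Class Size": len(group), "Class Member": i}
            for i, rec in enumerate(group, start=1)
        ]
        if output_singletons and len(group) == 1:
            # Singletons go only to the singletons output.
            singletons.extend(annotated)
            continue
        if output_members:
            members.extend(annotated)
        if output_cores:
            # Placeholder choice: the first member stands in for the class core.
            core = dict(annotated[0])
            core.pop("Class Member")
            cores.append(core)
    return members, cores, singletons


records = [
    {"name": "a", "framework": "c1ccccc1"},
    {"name": "b", "framework": "c1ccccc1"},
    {"name": "c", "framework": "C1CCCCC1"},
]
members, cores, singletons = route_records(
    records, "framework", output_cores=True, output_singletons=True
)
all_members, all_cores, no_singletons = route_records(
    records, "framework", output_cores=True, output_singletons=False
)
```

With Output Singletons on, record "c" appears only in the singletons list. With it off, "c" is emitted to both the members and cores lists alongside the other records.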

The representative of each class is selected by scoring its molecules by the number of atoms and bonds in their Bemis-Murcko frameworks.
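A minimal sketch of score-based selection follows. The description does not say how the atom and bond counts are combined or whether the highest or lowest score is preferred, so this example assumes the score is the plain sum of the two counts and that the highest score wins. The names framework_score, pick_representative, fw_atoms, and fw_bonds are hypothetical. The score is stored under "Molecule Score", the default name of the Molecular Score Field.

```python
def framework_score(n_atoms, n_bonds):
    """Assumed scoring: the sum of framework atom and bond counts."""
    return float(n_atoms + n_bonds)


def pick_representative(group):
    """Return the highest-scoring record; ties keep the earliest record."""
    scored = [
        {**rec, "Molecule Score": framework_score(rec["fw_atoms"], rec["fw_bonds"])}
        for rec in group
    ]
    # max() returns the first maximal element, so earlier records win ties.
    return max(scored, key=lambda rec: rec["Molecule Score"])


group = [
    {"name": "a", "fw_atoms": 6, "fw_bonds": 6},
    {"name": "b", "fw_atoms": 10, "fw_bonds": 11},
    {"name": "c", "fw_atoms": 10, "fw_bonds": 11},
]
representative = pick_representative(group)
```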

Extra Required Parameters

  • Class ID Field (Field Type: Int) : The name for the field that will contain the unique class ID.
    Default: Class ID
  • Class Member ID Field (Field Type: Int) : The name for the field that will contain the unique numeric ID of the molecule within its class.
    Default: Class Member
  • Class Size Field (Field Type: Int) : The name for the field that will contain the size of the class the molecule belongs to.
    Default: Class Size
  • Input Classification Data Field (Field Type: String) : The name for the input string data field.
  • Output Class Cores (boolean) : If on, then one representative from each class will be sent to the ‘cores’ output dataset.
    Default: False
  • Output Class Members (boolean) : If on, then each record, annotated with its class ID, will be sent to the ‘members’ output dataset.
    Default: True
  • Output Singletons (boolean) : If on, then singletons will be sent only to the ‘singletons’ output dataset. Otherwise they will be emitted to both the ‘members’ and ‘cores’ output datasets with the other records.
    Default: False
  • Molecular Score Field (Field Type: Float) : The tag name of the score field.
    Default: Molecule Score
  • Uncolor Strategy Options (string) : Option that controls how to uncolor a molecular graph.
    Default: [‘RemoveDimension’]
    Choices: RemoveDimension, ConvertAtomTypeToC, ConvertBondTypeToSingle, RemoveAtomStereo, RemoveBondStereo, RemoveAtomProperties, RemoveGroupStereo
  • Output Members Dataset (dataset_out) : Output dataset to write to
    Default: members
  • Input Dataset (data_source) : Dataset to classify
  • Uncolor Switch (boolean) : If on, then all colors (atom and bond) are removed from the molecule graphs prior to clustering. If off, then only stereo information is removed.
    Default: True
  • SMILES Field (Field Type: String) : The name for the SMILES field.
    Default: SMILES
  • SMILES Type (string) : The type of the SMILES generated.
    Default: isomeric-canonical
    Choices: isomeric-canonical, non-isomeric-canonical, non-canonical
  • Region Tag (string) : The tag that is used to mark atoms/bonds that are in any Bemis-Murcko region.
    Default: BM Region
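To illustrate what uncoloring means for classification, the toy example below applies two of the listed strategy options to a hand-built graph. It is a sketch under stated assumptions, not the Floe's code: the graph representation (a tuple of element symbols plus a tuple of (i, j, order) bonds), the uncolor function, and the interpretation of ConvertAtomTypeToC and ConvertBondTypeToSingle from their names alone are all assumptions. The other options are not modelled.

```python
def uncolor(graph, strategy):
    """Apply the assumed meaning of two strategy options to a toy graph."""
    atoms, bonds = graph
    if "ConvertAtomTypeToC" in strategy:
        # Assumed meaning: every atom is treated as carbon.
        atoms = tuple("C" for _ in atoms)
    if "ConvertBondTypeToSingle" in strategy:
        # Assumed meaning: every bond is treated as a single bond.
        bonds = tuple((i, j, 1) for i, j, _ in bonds)
    return atoms, bonds


# Six-membered rings with identical atom ordering (no canonicalization here).
kekule = ((0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1))
single = tuple((i, j, 1) for i, j, _ in kekule)

benzene = (("C",) * 6, kekule)
pyridine = (("N", "C", "C", "C", "C", "C"), kekule)
piperidine = (("N", "C", "C", "C", "C", "C"), single)

atoms_only = ["ConvertAtomTypeToC"]
both = ["ConvertAtomTypeToC", "ConvertBondTypeToSingle"]

same_after_atoms = uncolor(benzene, atoms_only) == uncolor(pyridine, atoms_only)
differs_by_bonds = uncolor(benzene, atoms_only) != uncolor(piperidine, atoms_only)
same_after_both = uncolor(benzene, both) == uncolor(piperidine, both)
```

In this toy model, removing atom types alone places the benzene and pyridine rings in the same class, while the piperidine ring joins them only once bond orders are also removed.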