Release Notes

v0.2.3 June 2021

General Notice

  • This package is built using OpenEye-Snowball==0.21.0 and the associated OpenEye-orionplatform.

New Floes

  • The four previously separate subsetting floes were combined into a single floe, Dataset Subsetting. It can subset based on a string field, a numerical field, a reference dataset, or a regular expression.

  • The Dataset Subsetting Based on String Keys Floe takes a dataset and two input parameters: a string field from that dataset and a string parameter. It splits the string parameter by line to create keys. It then emits the records from the input dataset whose value in the specified string field matches any of these keys.

  • Two new floes were created that combine the functionality of existing floes:

      • The Generate and Deduplicate SMILES for a Dataset Floe adds a new string data field to a dataset that stores the SMILES representation of the primary molecule of each record. It then deduplicates the dataset based on canonical SMILES.

      • The Generate and Deduplicate SMILES for One or More Datasets Floe does the same as the floe above but also concatenates the input datasets.
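A minimal sketch of the key-matching behavior of the String Keys floe. It assumes that records are plain Python dicts rather than Orion records, that keys must match the field value exactly, and that blank lines and surrounding whitespace in the string parameter are ignored. `subset_by_string_keys` is a hypothetical helper and is not part of Snowball or the Orion API.

```python
# Hypothetical sketch: records are plain dicts, not Orion dataset records.
def subset_by_string_keys(records, field, keys_text):
    """Yield records whose value in `field` matches any line of `keys_text`."""
    # One key per line; stripping whitespace and skipping blank lines is an assumption.
    keys = {line.strip() for line in keys_text.splitlines() if line.strip()}
    for record in records:
        # Exact, case-sensitive match against the set of keys.
        if record.get(field) in keys:
            yield record


records = [
    {"Name": "aspirin", "SMILES": "CC(=O)Oc1ccccc1C(=O)O"},
    {"Name": "caffeine", "SMILES": "Cn1cnc2c1c(=O)n(C)c(=O)n2C"},
    {"Name": "ethanol", "SMILES": "CCO"},
]
selected = list(subset_by_string_keys(records, "Name", "aspirin\nethanol\n"))
print([r["Name"] for r in selected])  # ['aspirin', 'ethanol']
```

Storing the keys in a set keeps each record lookup constant-time, even when the string parameter lists many keys.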

Floe Updates

  • Floes that perform deduplication can now deduplicate based on a string field, a float or int field (with a numerical tolerance), or a molecule field.

  • Fixed a bug in the Dataset Classification -- Bemis-Murcko Floe that restricted the name of the molecule input column. The column can now have any name.

  • Floes were standardized to use the floe_endgame function from Snowball, which encapsulates a floe's success and failure output behavior.

  • Floe descriptions were improved and extended.
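These notes do not specify how deduplication with a numerical tolerance treats near-equal values. The following is a minimal sketch of one possible policy. It keeps the first record seen and drops any later record whose value lies within `tolerance` of a value that has already been kept. The helper `deduplicate_numeric`, the dict-based records, and the `pIC50` field name are all hypothetical.

```python
# Hypothetical sketch of tolerance-based deduplication on a numerical field.
def deduplicate_numeric(records, field, tolerance=0.0):
    """Keep the first record for each value; later records whose value is
    within `tolerance` of an already-kept value are treated as duplicates."""
    kept, kept_values = [], []
    for record in records:
        value = record[field]
        if any(abs(value - seen) <= tolerance for seen in kept_values):
            continue  # duplicate within tolerance: drop it
        kept.append(record)
        kept_values.append(value)
    return kept


records = [
    {"ID": "a", "pIC50": 6.01},
    {"ID": "b", "pIC50": 6.02},
    {"ID": "c", "pIC50": 7.5},
    {"ID": "d", "pIC50": 6.0},
]
unique = deduplicate_numeric(records, "pIC50", tolerance=0.05)
print([r["ID"] for r in unique])  # ['a', 'c']
```

Under this first-seen policy, the result depends on input order, and the pairwise comparison is quadratic in the number of kept records. A production implementation might sort the values first. String and molecule deduplication need no tolerance and would instead compare exact keys, such as the field value or a canonical SMILES.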

v0.2.2 December 2020

General Notice

  • This package is built using OpenEye-Snowball==0.20.1 and the associated OpenEye-orionplatform.

  • This version fixes some dependency resolution issues.

v0.2.1 November 2020

General Notice

  • This package is built using OpenEye-Snowball==0.20.0 and the associated OpenEye-orionplatform.

v0.2.0 August 2020

General Notice

  • Upgraded to use OpenEye-Snowball==0.19.0 and the associated OpenEye-orionplatform.

  • Minor bug fixes and improvements to default output dataset names for Floes.

  • The DBSCAN Cube has been refactored to expose the DBSCAN algorithm parameters epsilon and minimum samples. In addition, the automatic estimation of epsilon, which is used when the user does not supply a value, has been improved.

v0.1.1 April 2020

General Notice

  • This package is built using OpenEye-Snowball==0.18.0 and the associated OpenEye-orionplatform==2.4.4.

New Floes

  • New Dataset Append -- Generating SMILES Field Floe has been added that adds a SMILES field to records.

  • New Dataset Classification -- Bemis-Murcko Floe has been added that classifies molecules based on their Bemis-Murcko frameworks.

  • New Dataset Manipulation -- Concatenation Floe has been added that concatenates datasets.

  • New Dataset Filtering -- Built-in Filter Types Floe has been added that filters a dataset using built-in filter types.

  • New Dataset Subsetting -- Random Selection Floe has been added that randomly selects N records.

  • New Dataset Subsetting -- Random Splitting Floe has been added that randomly splits datasets.

  • New Dataset Manipulation -- Field Rename Floe has been added that renames a record field.

  • New Dataset Subsetting -- Based on Reference Dataset Floe has been added that subsets a dataset based on whether its molecules exist in a reference dataset.

  • New Dataset Subsetting -- Based on Numerical Field Floe has been added that subsets a dataset based on a numerical (float/int) data field.

  • New Dataset Subsetting -- Based on String Field Floe has been added that subsets a dataset based on a string data field.

  • New Dataset Subsetting -- Based on String Field (Regex) Floe has been added that subsets a dataset based on whether a string data field matches a given regular expression.

  • New Dataset Deduplication -- Based on String Field Floe has been added that deduplicates a dataset based on a user-defined string field.

  • New Dataset Deduplication -- Based on SMILES Floe has been added that deduplicates a dataset based on canonical SMILES.
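The Random Selection and Random Splitting floes can be pictured with Python's standard `random` module. This is a sketch under stated assumptions. Records are plain list items, and a fixed `seed` makes the results reproducible. The split size is controlled by a hypothetical `fraction` parameter. The floes' actual parameters are not described in these notes, and `random_select` and `random_split` are illustrative helpers only.

```python
import random


def random_select(records, n, seed=None):
    """Return n records chosen uniformly at random without replacement."""
    rng = random.Random(seed)  # private generator so the seed is reproducible
    return rng.sample(records, n)


def random_split(records, fraction, seed=None):
    """Shuffle a copy of the records and split it into two lists; the first
    holds round(fraction * len(records)) records (hypothetical parameter)."""
    rng = random.Random(seed)
    shuffled = list(records)  # copy so the input order is left untouched
    rng.shuffle(shuffled)
    cut = round(fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


records = list(range(10))
first, second = random_split(records, 0.8, seed=42)
picked = random_select(records, 3, seed=42)
print(len(first), len(second))  # 8 2
```

The example uses a dedicated `random.Random` instance rather than the module-level functions, so seeding one floe run does not affect other code that shares the global generator.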