Release Notes

v1.3.0 February 2024

General Notice

  • This package is built using OpenEye-orionplatform==6.1.0, OpenEye-toolkits==2023.2.3, and OpenEye-Snowball==0.27.0.

Floe Updates

  • The large scale clustering, hitlist clustering, and diverse subset floes have been further optimized to run significantly faster, use less memory, and cost less.

v1.2.0 July 2023

General Notice

  • This package is built using OpenEye-orionplatform==5.1.0, OpenEye-toolkits==2023.1.0, and OpenEye-Snowball==0.26.0.

Floe Updates

  • The 3D clustering floes now align molecules by default before shape similarity calculations. Alignment can be disabled in the floe parameters.

  • The 3D large scale clustering, 3D hitlist clustering, and 3D diverse subset floes have been optimized to run significantly faster, use less memory, and cost less.

v1.1.0 December 2022

General Notice

  • This package is built using OpenEye-orionplatform==4.5.4, OpenEye-toolkits==2022.2.1, and OpenEye-Snowball==0.24.2.

Floe Updates

  • The 2D clustering floes now accept pregenerated fingerprints or can generate fingerprints within the floe.

  • The 2D and 3D DBSCAN, Hierarchical, and K-Medoids clustering floes can optionally output the distance matrix that was calculated for clustering.

  • The previously existing 2D Hierarchical and DBSCAN clustering floes have been optimized to run much faster and produce more accurate results.

  • The DBSCAN 2D Similarity Clustering and DBSCAN 3D Similarity Clustering floes were tuned to calculate a reasonable EPS automatically and more accurately, using constraints on the largest cluster percentage, when EPS is not provided. These floes also now output outliers as singleton clusters instead of omitting them from the output dataset.

  • The 2D DBSCAN, Hierarchical, and K-Medoids clustering floes can now take a NumPy binary file containing a similarity or distance matrix as input, for custom clustering applications.

  • Output for any of the clustering floes can now optionally sort clusters based on a selected score field for each molecule.
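
The automatic EPS selection and the singleton output described in the list above can be illustrated with a small, self-contained sketch. This is not the floes' implementation: the pure-Python DBSCAN, the linear scan over candidate EPS values, and the names `dbscan`, `choose_eps`, `min_fraction`, and `max_fraction` are all assumptions made for illustration. The constraints are expressed here as fractions (0 to 1) rather than percentages.

```python
from collections import Counter

NOISE = -1


def dbscan(dist, eps, min_samples):
    """Textbook DBSCAN over a precomputed, symmetric distance matrix."""
    n = len(dist)
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        labels[i] = cluster
        stack = neighbors
        while stack:
            j = stack.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = [k for k in range(n) if dist[j][k] <= eps]
            if len(reachable) >= min_samples:
                stack.extend(reachable)  # j is a core point, keep expanding
        cluster += 1
    return labels


def largest_cluster_fraction(labels):
    """Fraction of all points that fall in the largest non-noise cluster."""
    sizes = Counter(label for label in labels if label != NOISE)
    return max(sizes.values()) / len(labels) if sizes else 0.0


def choose_eps(dist, min_samples, min_fraction, max_fraction):
    """Return the smallest observed distance usable as EPS under the constraints."""
    candidates = sorted({d for row in dist for d in row if d > 0})
    for eps in candidates:
        labels = dbscan(dist, eps, min_samples)
        if min_fraction <= largest_cluster_fraction(labels) <= max_fraction:
            return eps, labels
    raise ValueError("no candidate EPS satisfies the constraints")


def outliers_to_singletons(labels):
    """Give every noise point its own cluster id instead of dropping it."""
    next_id = max(labels, default=NOISE) + 1
    result = []
    for label in labels:
        if label == NOISE:
            result.append(next_id)
            next_id += 1
        else:
            result.append(label)
    return result


# Toy one-dimensional data: two tight groups and one outlier.
points = [0, 1, 2, 50, 51, 52, 90]
dist = [[abs(a - b) for b in points] for a in points]
eps, labels = choose_eps(dist, min_samples=2, min_fraction=0.3, max_fraction=0.5)
clusters = outliers_to_singletons(labels)
print(eps, clusters)
```

In this toy run the outlier at 90 ends up in its own cluster rather than being discarded, mirroring the singleton behavior noted above.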

v1.0.0 July 2022

General Notice

  • This package is built using OpenEye-orionplatform==4.4.0, OpenEye-toolkits==2022.1.1, and OpenEye-Snowball==0.24.0.

New Floes

  • The MultistatePKaModel based Ionization states enumeration floe was added, which enumerates the reasonable ionization state(s) of input molecules at neutral/physiological pH (7.4) based on the pKa assessed using a multistate pKa model.

  • The Hierarchical 2D Similarity Clustering floe was added. This floe clusters datasets based on pre-generated fingerprints using OEGraphSim similarity calculation and hierarchical clustering. Unlike the existing DBSCAN and sphere exclusion floes, this floe allows the user to specify the number of clusters they would like.

  • The Dataset Manipulation – Add Molecule Title Field floe was added. This floe updates a dataset with a title field for the primary molecule of that dataset.

  • The Dataset Manipulation – Add Title to Molecule Field floe was added. This floe updates the primary molecule field of a dataset with a title taken from a string field in that dataset.

  • The Dataset Filtering – Create Custom Filter floe was added. This floe creates a custom molecule filter file compatible with the OEMolProp toolkit.
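
To show what clustering to a user-specified number of clusters means for the Hierarchical 2D Similarity Clustering floe in the list above, here is a minimal agglomerative sketch. It makes several assumptions that the notes do not confirm: fingerprints are modeled as Python sets of on-bit indices rather than OEGraphSim fingerprints, similarity is Tanimoto, and clusters are merged by average linkage. The function names are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def hierarchical_clusters(fingerprints, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    if not 1 <= n_clusters <= len(fingerprints):
        raise ValueError("n_clusters must be between 1 and the number of fingerprints")

    clusters = [[i] for i in range(len(fingerprints))]

    def average_distance(c1, c2):
        # Average linkage: mean of (1 - Tanimoto) over all cross-cluster pairs.
        total = sum(
            1.0 - tanimoto(fingerprints[i], fingerprints[j]) for i in c1 for j in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return [sorted(cluster) for cluster in clusters]


fingerprints = [{1, 2, 3}, {1, 2, 3, 4}, {10, 11}, {10, 11, 12}, {20}]
result = hierarchical_clusters(fingerprints, n_clusters=3)
print(result)
```

The contrast with DBSCAN is that the caller fixes the cluster count up front, and the algorithm decides where to cut, rather than the cluster count emerging from a distance threshold.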

Floe Updates

  • All floes have a new brief description and are placed in the new Orion floe classification system.

  • A clustering tutorial was added that briefly describes how to run the clustering floes and analyze their data.

  • The Dataset Subsetting – Random Selection floe was combined with the Dataset Subsetting – Random Splitting floe into the Dataset Subsetting – Random Splitting or Selection floe. The combined floe has been redesigned so that more of the cubes can run in parallel.

  • The DBSCAN 2D Similarity Clustering floe was modified to give the user more control over the size of the clusters in the floe. It now has two optional parameters, minimum and maximum largest cluster percentage, which can be used in place of eps.

  • Error handling was improved in the Dataset Manipulation – Field Type Conversion floe.

v0.2.4 December 2021

General Notice

  • This package is built using OpenEye-orionplatform==4.2.5, OpenEye-toolkits==2021.2.0, and OpenEye-Snowball==0.23.0.

New Floes

  • The Dataset Manipulation -- Field Type Conversion floe was added, which converts fields on records of basic types (boolean, integer, float, and string) to fields of another basic type.

Floe Updates

  • The two dataset clustering floes were updated to include singleton counts in floe reports in all cases, even if writing to a singleton dataset.

  • A bug in the Dataset Similarity - Fingerprint Generation floe was fixed. The floe now writes molecules that failed fingerprinting to a failure dataset.

  • The minimum Tanimoto cutoff for clustering floes has been removed, so any cutoff as low as zero can be used.

v0.2.3 June 2021

General Notice

  • This package is built using OpenEye-Snowball==0.21.0 and the associated OpenEye-orionplatform.

New Floes

  • The four previously separate subset floes were combined into a single floe, Dataset Subsetting, which can subset based on a string field, numerical field, dataset, or regex.

  • The Dataset Subsetting Based on String Keys Floe takes a dataset and two input parameters: a string field from that dataset and a string of keys. It splits the key string by line to create keys, and then emits the records from the input dataset whose value for the specified string field matches any of these keys.

  • Two new floes were added that combine functionality from existing floes:

      • The Generate and Deduplicate SMILES for a Dataset Floe adds a new string data field to a dataset that stores the SMILES representation of the primary molecule of each record. It then deduplicates the dataset based on canonical SMILES.

      • The Generate and Deduplicate SMILES for One or More Datasets Floe does the same as the floe above, but also concatenates input datasets.
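
The key matching described for the Dataset Subsetting Based on String Keys Floe can be sketched in a few lines. This is an illustration under stated assumptions, not the floe's code: records are modeled as plain dictionaries, the keys are assumed to come from the newline-separated string parameter, and stripping whitespace and skipping blank lines are choices made here. The name `subset_by_string_keys` is hypothetical.

```python
def subset_by_string_keys(records, field, keys_text):
    """Keep records whose `field` value equals one of the newline-separated keys."""
    # One key per line; blank lines are ignored (an assumption of this sketch).
    keys = {line.strip() for line in keys_text.splitlines() if line.strip()}
    return [record for record in records if record.get(field) in keys]


records = [
    {"id": "CMPD-1", "series": "A"},
    {"id": "CMPD-2", "series": "B"},
    {"id": "CMPD-3", "series": "A"},
    {"id": "CMPD-4"},  # no "series" field, so it can never match
]
selected = subset_by_string_keys(records, "series", "A\n\nC")
selected_ids = [record["id"] for record in selected]
print(selected_ids)
```

Building a set of keys first keeps each record check constant-time, which matters when the key list is long.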

Floe Updates

  • Floes that perform deduplication can now deduplicate based on a string field; a float or int field, with numerical tolerance; or a molecule field.

  • Fixed a bug in the Dataset Classification -- Bemis-Murcko floe which was restricting the name of the molecule input column. Now the column does not need to have a specific name.

  • Floes were standardized with the floe_endgame function from snowball, which abstracts the success and failure output behavior of a floe.

  • Floe descriptions were improved and extended.
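
Deduplication on a numerical field with a tolerance, as mentioned in the list above, can be pictured as follows. The sketch assumes an absolute tolerance, keeps the first record seen for each group of near-equal values, and models records as dictionaries; none of these details are stated in the notes, and `deduplicate_by_number` is a hypothetical name.

```python
import math


def deduplicate_by_number(records, field, tolerance):
    """Drop records whose `field` is within `tolerance` of an already kept record."""
    kept = []
    for record in records:
        value = record[field]
        is_duplicate = any(
            math.isclose(value, other[field], rel_tol=0.0, abs_tol=tolerance)
            for other in kept
        )
        if not is_duplicate:
            kept.append(record)  # first occurrence wins (an assumption)
    return kept


records = [
    {"name": "a", "score": 1.000},
    {"name": "b", "score": 1.004},
    {"name": "c", "score": 1.500},
    {"name": "d", "score": 1.496},
]
unique = deduplicate_by_number(records, "score", tolerance=0.01)
unique_names = [record["name"] for record in unique]
print(unique_names)
```

This comparison is quadratic in the number of kept records, which is fine for a sketch but would likely need sorting or binning for large datasets.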

v0.2.2 December 2020

General Notice

  • This package is built using OpenEye-Snowball==0.20.1 and the associated OpenEye-orionplatform.

  • This version resolves some dependency resolution issues.

v0.2.1 November 2020

General Notice

  • This package is built using OpenEye-Snowball==0.20.0 and the associated OpenEye-orionplatform.

v0.2.0 August 2020

General Notice

  • Upgraded to use OpenEye-Snowball==0.19.0 and the associated OpenEye-orionplatform.

  • Minor bug fixes and improvements to default output dataset names for Floes.

  • The DBSCAN Cube has been refactored to expose the DBSCAN algorithm parameters epsilon and minimum samples. The automatic estimation of epsilon has also been improved for cases where the user does not supply a value.

v0.1.1 April 2020

General Notice

  • This package is built using OpenEye-Snowball==0.18.0 and the associated OpenEye-orionplatform==2.4.4.

New Floes

  • New Dataset Append -- Generating SMILES Field Floe has been added that adds a SMILES field to records

  • New Dataset Classification -- Bemis-Murcko Floe has been added that classifies molecules based on their Bemis-Murcko frameworks

  • New Dataset Manipulation -- Concatenation Floe has been added to concatenate datasets

  • New Dataset Filtering -- Built-in Filter Types Floe has been added that filters a dataset based on built-in filtering types

  • New Dataset Subsetting -- Random Selection Floe has been added that randomly selects N records

  • New Dataset Subsetting -- Random Splitting Floe has been added that randomly splits datasets

  • New Dataset Manipulation -- Field Rename Floe has been added that renames a record field

  • New Dataset Subsetting -- Based on Reference Dataset Floe has been added that subsets a dataset based on whether its molecules exist in a reference dataset.

  • New Dataset Subsetting -- Based on Numerical Field Floe has been added that subsets a dataset based on a numerical (float/int) data field

  • New Dataset Subsetting -- Based on String Field Floe has been added that subsets a dataset based on a string data field

  • New Dataset Subsetting -- Based on String Field (Regex) Floe has been added that subsets a dataset based on whether a string data field matches a given regular expression

  • New Dataset Deduplication -- Based on String Field Floe has been added that deduplicates a dataset based on a user-defined string field.

  • New Dataset Deduplication -- Based on SMILES Floe has been added that deduplicates a dataset based on canonical SMILES.