Cheminformatics Floes Historical Release Notes

v1.3.1 July 2024

General Notice

This package is built using OpenEye-orionplatform==6.2.0, OpenEye-toolkits==2024.1.1, and OpenEye-Snowball==0.28.0.

Floe Updates

The Prepare Collection for Fast Similarity or Substructure Search from Dataset and Prepare Collection for Fast Similarity or Substructure Search from File Floes now have additional parameters to build only one of the two collection types (similarity or substructure) and to choose the substructure type.
The Fast Fingerprint Similarity Search Floe now supports enhanced stereochemistry.

v1.3.0 January 2024

General Notice

This package is built using OpenEye-orionplatform==6.1.0, OpenEye-toolkits==2023.2.3, and OpenEye-Snowball==0.27.0.

Floe Updates

The large scale clustering, hitlist clustering, and diverse subset floes have been further optimized to run significantly faster, require less memory, and cost less.

v1.2.0 July 2023

General Notice

This package is built using OpenEye-orionplatform==5.1.0, OpenEye-toolkits==2023.1.0, and OpenEye-Snowball==0.26.0.

New Floes

The Fast Fingerprint Similarity Search floe was added, which quickly searches a large prepared collection of fingerprints and finds the top similarity hits using any type of fingerprint or OEGraphSim score function.
The Fast Substructure Search with an MDL Query and Fast Substructure Search with SMARTS floes were added, which search large prepared collections of molecules for substructure hits.
The Prepare Collection for Fast Similarity or Substructure Search from File and Prepare Collection for Fast Similarity or Substructure Search from Dataset floes were added, which prepare files or datasets for fast similarity and/or substructure search floes.
The Count Molecules in Fast Substructure Search Collection floe was added, which counts the number of molecules in a prepared collection for fast search.

Floe Updates

The 3D clustering floes now align molecules by default, before shape similarity calculations. Alignment can also be disabled in the floe parameters.
The 3D large scale clustering, 3D hitlist clustering, and 3D diverse subset floes have been optimized to run significantly faster, require less memory, and cost less.

v1.1.0 December 2022

General Notice

This package is built using OpenEye-orionplatform==4.5.4, OpenEye-toolkits==2022.2.1, and OpenEye-Snowball==0.24.2.

New Floes

The 2D Diverse Subset and 3D Diverse Subset floes were added, which find diverse subsets of the requested size from input molecule datasets using 2D and 3D clustering, respectively.
The 2D Hitlist Clustering and 3D Hitlist Clustering floes were added, which cluster large 2D and 3D hitlists using a provided score field to direct sphere exclusion clustering. They also provide output sorted by the clusters with the best scores.
The Large Scale 2D Similarity Clustering and Large Scale 3D Similarity Clustering floes were added, which can cluster large datasets of over 100,000 input molecules, using directed sphere exclusion clustering.
The K-Medoids 2D Similarity Clustering and K-Medoids 3D Similarity Clustering floes were added, which cluster input datasets using OEGraphSim 2D similarity and OEShape 3D similarity scores, and scikit-learn k-medoids clustering.
The DBSCAN 2D Similarity Clustering and DBSCAN 3D Similarity Clustering floes were added, which use OEShape 3D similarity scores to cluster input datasets using scikit-learn DBSCAN or hierarchical clustering.
The Generate 2D Similarity Matrix and Generate 3D Similarity Matrix floes were added, which calculate similarity scores using OEGraphSim and OEShape, gather summary statistics on these scores, and optionally generate 2D or 3D similarity or distance matrices as files that can be input to other clustering floes.
The Calculate Average Precision floe was added, which calculates average precision on an input dataset using a binary classifier.
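
As a rough illustration of the metric reported by the Calculate Average Precision floe, the sketch below computes average precision for a ranked list of binary labels. It is a generic, standard-library version of the metric, not the floe's implementation; the function name and the assumption that a higher score ranks first are illustrative only.

```python
from typing import Sequence


def average_precision(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """Mean of precision@k over the ranks k at which a positive item appears.

    Items are ranked by descending score (assumption: higher is better).
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    # Stable sort: items with equal scores keep their input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precision_sum += hits / rank
    # With no positives the metric is undefined; this sketch returns 0.0.
    return precision_sum / hits if hits else 0.0
```

For example, with labels `[True, False, True, False]` already ranked by score, the positives sit at ranks 1 and 3, so the result is (1/1 + 2/3) / 2.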

Floe Updates

2D clustering floes now either accept pregenerated fingerprints or generate fingerprints within the floe.
The 2D and 3D DBSCAN, Hierarchical, and K-Medoids clustering floes can optionally output the distance matrix that was calculated for clustering.
The previously existing 2D Hierarchical and DBSCAN clustering floes have been optimized to run much faster and produce more accurate results.
The DBSCAN 2D Similarity Clustering and DBSCAN 3D Similarity Clustering floes were tuned to more accurately calculate a reasonable EPS automatically using constraints on the largest cluster percentage, if EPS is not provided. These floes also now output outliers as singleton clusters, instead of ignoring them in the output dataset.
The 2D DBSCAN, Hierarchical, and K-Medoids clustering floes can now take a similarity or distance matrix NumPy binary file as input, for custom clustering applications.
Output for any of the clustering floes can now optionally sort clusters based on a selected score field for each molecule.
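
The score-directed sphere exclusion clustering and score-sorted cluster output mentioned in this release can be pictured with the following minimal sketch. It is a textbook version written for illustration, not the floes' code: fingerprints are modeled as sets of "on" bits, a higher score is assumed to be better, and all function and parameter names are hypothetical.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of 'on' bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def directed_sphere_exclusion(
    fingerprints: Dict[str, FrozenSet[int]],
    scores: Dict[str, float],
    cutoff: float,
) -> List[List[str]]:
    """Cluster molecule IDs; each cluster lists its center first."""
    # Visit molecules from best score to worst, so every new cluster center
    # is the best-scoring molecule that has not been claimed yet.
    order = sorted(fingerprints, key=lambda mol_id: scores[mol_id], reverse=True)
    unassigned = set(order)
    clusters: List[List[str]] = []
    for center in order:
        if center not in unassigned:
            continue
        # The center always joins its own cluster; other unassigned molecules
        # join if they fall inside the similarity "sphere" around it.
        members = [center] + [
            m for m in order
            if m != center
            and m in unassigned
            and tanimoto(fingerprints[center], fingerprints[m]) >= cutoff
        ]
        unassigned.difference_update(members)
        clusters.append(members)
    return clusters
```

Because centers are picked in score order, the clusters come out already sorted by the score of their best member.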

v1.0.0 July 2022

General Notice

This package is built using OpenEye-orionplatform==4.4.0, OpenEye-toolkits==2022.1.1, and OpenEye-Snowball==0.24.0.

New Floes

The MultistatePKaModel based Ionization states enumeration floe was added, which enumerates the reasonable ionization state(s) of input molecules at neutral/physiological pH (7.4) based on the pKa assessed using a multistate pKa model.
The Hierarchical 2D Similarity Clustering floe was added. This floe clusters datasets based on pre-generated fingerprints using OEGraphSim similarity calculation and hierarchical clustering. Unlike the existing DBSCAN and sphere exclusion floes, this floe allows the user to specify the number of clusters they would like.
The Dataset Manipulation – Add Molecule Title Field floe was added. This floe updates a dataset with a title field for the primary molecule of that dataset.
The Dataset Manipulation – Add Title to Molecule Field floe was added. This floe updates the primary molecule field of a dataset with a title taken from a string field in that dataset.
The Dataset Filtering – Create Custom Filter floe was added. This floe creates a custom molecule filter file compatible with the OEMolProp toolkit.

Floe Updates

All floes have a new brief description and are placed in the new Orion floe classification system.
A clustering tutorial was added that briefly describes how to run the clustering floes and analyze their data.
The Dataset Subsetting – Random Selection floe was combined with the Dataset Subsetting – Random Splitting floe into the Dataset Subsetting – Random Splitting or Selection floe. The combined floe has been redesigned so that more of the cubes can run in parallel.
The DBSCAN 2D Similarity Clustering floe was modified to give the user more control over the size of the clusters in the floe. It now has two optional parameters, minimum and maximum largest cluster percentage, which can be used in place of eps.
Error handling was improved in the Dataset Manipulation – Field Type Conversion floe.

v0.2.4 December 2021

General Notice

This package is built using OpenEye-orionplatform==4.2.5, OpenEye-toolkits==2021.2.0, and OpenEye-Snowball==0.23.0.

New Floes

The Dataset Manipulation -- Field Type Conversion floe was added, which converts fields on records of basic types (boolean, integer, float, and string) to fields of another basic type.
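
To make the idea of converting between basic field types concrete, here is a small standard-library sketch that treats a record as a plain dictionary. The accepted boolean spellings, the rule that integers must be whole numbers, and returning `None` on failure are assumptions of this sketch, not documented floe behavior.

```python
from typing import Any, Callable, Dict, Optional


def _to_bool(value: Any) -> bool:
    """Parse common string spellings; fall back to Python truthiness."""
    if isinstance(value, str):
        text = value.strip().lower()
        if text in {"true", "t", "yes", "y", "1"}:
            return True
        if text in {"false", "f", "no", "n", "0"}:
            return False
        raise ValueError(f"cannot interpret {value!r} as a boolean")
    return bool(value)


def _to_int(value: Any) -> int:
    """Accept ints and whole-number floats or strings such as '3.0'."""
    if isinstance(value, int):
        return int(value)
    number = float(value)
    if not number.is_integer():
        raise ValueError(f"{value!r} is not a whole number")
    return int(number)


CONVERTERS: Dict[str, Callable[[Any], Any]] = {
    "boolean": _to_bool,
    "integer": _to_int,
    "float": float,
    "string": str,
}


def convert_field(record: Dict[str, Any], field: str, target: str) -> Optional[Dict[str, Any]]:
    """Return a copy of `record` with `field` converted, or None on failure."""
    converter = CONVERTERS[target]  # KeyError for an unsupported target type
    try:
        converted = converter(record[field])
    except (KeyError, ValueError, TypeError, OverflowError):
        return None  # caller can route the original record to a failure output
    return {**record, field: converted}
```

A call such as `convert_field({"id": "m1", "mw": "180.16"}, "mw", "float")` returns a new record whose `mw` value is a float, leaving the input record unchanged.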

Floe Updates

The two dataset clustering floes were updated to include singleton counts in floe reports in all cases, even if writing to a singleton dataset.
A bug in the Dataset Similarity - Fingerprint Generation floe was fixed. The floe now writes molecules that failed fingerprinting to a failure dataset.
The minimum Tanimoto cutoff for clustering floes has been removed, so any cutoff as low as zero can be used.

v0.2.3 June 2021

General Notice

This package is built using OpenEye-Snowball==0.21.0 and the associated OpenEye-orionplatform.

New Floes

The previously existing four separate subset floes were combined into a single floe, Dataset Subsetting, which can subset based on a string field, numerical field, dataset, or regex.
The Dataset Subsetting Based on String Keys Floe takes a dataset and two input parameters: a string field from that dataset, and a string parameter as input. It splits the string field by line to create keys, and then emits records from the input dataset that have values of the specified string field which match any of these keys.
Created two new floes that combine functionality in existing floes:
The Generate and Deduplicate SMILES for a Dataset Floe adds a new string data field to a dataset that stores the SMILES representation of the primary molecule of each record. It then deduplicates the dataset based on canonical SMILES.
The Generate and Deduplicate SMILES for One or More Datasets Floe does the same as the floe above, but also concatenates input datasets.
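
The key-matching behavior described for the Dataset Subsetting Based on String Keys Floe can be sketched in a few lines. Records are modeled here as dictionaries, and the function name, whitespace stripping, and skipping of blank lines are assumptions of this sketch rather than documented behavior.

```python
from typing import Dict, Iterable, Iterator


def subset_by_string_keys(
    records: Iterable[Dict[str, str]], field: str, keys_text: str
) -> Iterator[Dict[str, str]]:
    """Yield records whose `field` value matches any line of `keys_text`."""
    # One key per line; surrounding whitespace and blank lines are ignored.
    keys = {line.strip() for line in keys_text.splitlines() if line.strip()}
    for record in records:
        # Records that lack the field never match.
        if record.get(field) in keys:
            yield record
```

Building a set of keys once keeps each record lookup constant-time, which matters when the key list is long.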

Floe Updates

Floes that perform deduplication can now deduplicate based on a string field; a float or int field, with numerical tolerance; or a molecule field.
Fixed a bug in the Dataset Classification -- Bemis-Murcko floe which was restricting the name of the molecule input column. Now the column does not need to have a specific name.
Floes were standardized with the floe_endgame function from snowball, which abstracts the success and failure output behavior of a floe.
Floe descriptions were improved and extended.
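
Deduplication on a string field versus a numerical field with tolerance can be illustrated as follows. This is a simplified sketch under stated assumptions: records are dictionaries, the first occurrence is the one kept, and a numeric value counts as a duplicate when it lies within `tolerance` of an already kept value. Molecule-field comparison is omitted, and none of these names come from the floes themselves.

```python
from typing import Any, Dict, Iterable, List, Optional


def deduplicate(
    records: Iterable[Dict[str, Any]], field: str, tolerance: Optional[float] = None
) -> List[Dict[str, Any]]:
    """Keep the first record for each distinct value of `field`.

    With tolerance=None, values are compared exactly (e.g. string fields).
    Otherwise values are compared numerically within `tolerance`.
    """
    kept: List[Dict[str, Any]] = []
    if tolerance is None:
        seen = set()
        for record in records:
            value = record[field]
            if value not in seen:
                seen.add(value)
                kept.append(record)
        return kept
    kept_values: List[float] = []
    for record in records:
        value = float(record[field])
        # Linear scan keeps the sketch short; worst case is quadratic.
        if all(abs(value - other) > tolerance for other in kept_values):
            kept_values.append(value)
            kept.append(record)
    return kept
```

Note that tolerance-based matching depends on record order, since each value is compared only against values kept so far.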

v0.2.2 December 2020

General Notice

This package is built using OpenEye-Snowball==0.20.1 and the associated OpenEye-orionplatform.
This version resolves some dependency resolution issues.

v0.2.1 November 2020

General Notice

This package is built using OpenEye-Snowball==0.20.0 and the associated OpenEye-orionplatform.

v0.2.0 August 2020

General Notice

Upgraded to use OpenEye-Snowball==0.19.0 and the associated OpenEye-orionplatform.
Minor bug fixes and improvements to default output dataset names for Floes.
The DBSCAN Cube has been refactored to expose the DBSCAN algorithm parameters epsilon and minimum samples. The automatic estimation of epsilon, used when the user does not supply a value, has also been improved.

v0.1.1 April 2020

General Notice

The package is built using OpenEye-Snowball==0.18.0 and the associated OpenEye-orionplatform==2.4.4.

New Floes

New Dataset Append -- Generating SMILES Field Floe has been added that adds a SMILES field to records.
New Dataset Classification -- Bemis-Murcko Floe has been added that classifies molecules based on their Bemis-Murcko frameworks.
New Dataset Manipulation -- Concatenation Floe has been added to concatenate datasets.
New Dataset Filtering -- Built-in Filter Types Floe has been added that filters a dataset based on built-in filtering types.
New Dataset Subsetting -- Random Selection Floe has been added that randomly selects N records.
New Dataset Subsetting -- Random Splitting Floe has been added that randomly splits datasets.
New Dataset Manipulation -- Field Rename Floe has been added that renames a record field.
New Dataset Subsetting -- Based on Reference Dataset Floe has been added that subsets a dataset based on whether its molecules exist in a reference dataset.
New Dataset Subsetting -- Based on Numerical Field Floe has been added that subsets a dataset based on a numerical (float/int) data field.
New Dataset Subsetting -- Based on String Field Floe has been added that subsets a dataset based on a string data field.
New Dataset Subsetting -- Based on String Field (Regex) Floe has been added that subsets a dataset based on whether a string data field matches a given regular expression.
New Dataset Deduplication -- Based on String Field Floe has been added that deduplicates a dataset based on a user-defined string field.
New Dataset Deduplication -- Based on SMILES Floe has been added that deduplicates a dataset based on canonical SMILES.