v0.2.3 June 2021
This package is built using OpenEye-Snowball==0.21.0 and the associated OpenEye-orionplatform.
The four previously separate subsetting floes were combined into a single floe,
Dataset Subsetting, that can subset based on a string field, numerical field, dataset, or regex.
Dataset Subsetting Based on String Keys Floe takes a dataset and two input parameters: a string field from that dataset and a string parameter. It splits the string parameter by line to create keys, and then emits the records from the input dataset whose value in the specified string field matches any of these keys.
Created two new floes that combine functionality in existing floes:
Generate and Deduplicate SMILES for a Dataset Floe adds a new string data field to a dataset that stores the SMILES representation of the primary molecule of each record. It then deduplicates the dataset based on canonical SMILES.
Generate and Deduplicate SMILES for One or More Datasets Floe does the same as the floe above, but also concatenates the input datasets.
Extended the floes that use deduplication so that they can deduplicate based on a string field; a float or int field, with numerical tolerance; or a molecule field.
Fixed a bug in the Dataset Classification -- Bemis-Murcko floe that restricted the name of the molecule input column. The column no longer needs to have a specific name.
Floes were standardized with the floe_endgame function from snowball, which abstracts the success and failure output behavior of a floe.
Floe descriptions were improved and extended.
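The key-based subsetting described above for Dataset Subsetting Based on String Keys can be sketched in plain Python. This is only an illustration of the matching logic, not the floe's implementation: records are modelled as dictionaries, and the function name `subset_by_string_keys`, the stripping of whitespace around keys, and the skipping of blank lines are all assumptions.

```python
def subset_by_string_keys(records, field, keys_text):
    """Return the records whose `field` value matches any key in `keys_text`."""
    # One key per line; ignoring blank lines and surrounding whitespace
    # is an assumption, not documented floe behavior.
    keys = {line.strip() for line in keys_text.splitlines() if line.strip()}
    # Records that lack the field never match.
    return [rec for rec in records if rec.get(field) in keys]


# Hypothetical records standing in for dataset records.
records = [
    {"name": "aspirin", "series": "A"},
    {"name": "caffeine", "series": "B"},
    {"name": "ibuprofen", "series": "A"},
    {"name": "unlabelled"},
]
subset = subset_by_string_keys(records, "series", "A\nC\n")
print([rec["name"] for rec in subset])  # ['aspirin', 'ibuprofen']
```

Storing the keys in a set keeps each membership check constant-time, which matters when the key list is long.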
v0.2.2 December 2020
This package is built using OpenEye-Snowball==0.20.1 and the associated OpenEye-orionplatform.
This version resolves some dependency resolution issues.
v0.2.1 November 2020
This package is built using OpenEye-Snowball==0.20.0 and the associated OpenEye-orionplatform.
v0.2.0 August 2020
Upgraded to use OpenEye-Snowball==0.19.0 and the associated
Minor bug fixes and improvements to the default output dataset names for Floes.
The DBSCAN Cube has been refactored to expose the DBSCAN algorithm parameters epsilon and minimum samples. The automatic estimation of epsilon, used when the user does not supply a value, has also been improved.
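The notes above do not say how the DBSCAN Cube estimates epsilon, so the following is only a sketch of the general pattern: use the caller's epsilon when one is given, otherwise estimate it from the data. The estimate here is a common k-distance heuristic (the median distance from each point to its `min_samples`-th nearest neighbor), swapped in purely for illustration. The function names and the choice of the median are assumptions and are not taken from the package.

```python
import math
import statistics


def estimate_epsilon(points, min_samples):
    """Median distance from each point to its `min_samples`-th nearest neighbor.

    A k-distance heuristic chosen for illustration; not the Cube's method.
    """
    if len(points) <= min_samples:
        raise ValueError("need more points than min_samples")
    kth_distances = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        kth_distances.append(dists[min_samples - 1])
    return statistics.median(kth_distances)


def resolve_epsilon(points, epsilon=None, min_samples=5):
    """Use the caller's epsilon if given, otherwise fall back to the estimate."""
    return epsilon if epsilon is not None else estimate_epsilon(points, min_samples)


# Four points on a unit square plus one distant outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
print(resolve_epsilon(points, epsilon=0.5, min_samples=2))  # 0.5, user value wins
print(resolve_epsilon(points, min_samples=2))               # 1.0, estimated
```

Taking the median rather than the mean keeps the single outlier from inflating the estimate in this example.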
v0.1.1 April 2020
The package is built using OpenEye-Snowball==0.18.0 and the associated
Dataset Append – Generating SMILES Field Floe has been added that adds a SMILES field to records.
Dataset Classification -- Bemis-Murcko Floe has been added that classifies molecules based on their Bemis-Murcko frameworks.
Dataset Manipulation -- Concatenation Floe has been added that concatenates datasets.
Dataset Filtering -- Built-in Filter Types Floe has been added that filters a dataset based on built-in filter types.
Dataset Subsetting -- Random Selection Floe has been added that randomly selects N records.
Dataset Subsetting -- Random Splitting Floe has been added that randomly splits datasets.
Dataset Manipulation -- Field Rename Floe has been added that renames a record field.
Dataset Subsetting -- Based on Reference Dataset Floe has been added that subsets a dataset based on whether its molecules exist in a reference dataset.
Dataset Subsetting -- Based on Numerical Field Floe has been added that subsets a dataset based on a numerical (float/int) data field.
Dataset Subsetting -- Based on String Field Floe has been added that subsets a dataset based on a string data field.
Dataset Subsetting -- Based on String Field (Regex) Floe has been added that subsets a dataset based on whether a string data field matches a given regular expression.
Dataset Deduplication -- Based on String Field Floe has been added that deduplicates a dataset based on a user-defined string field.
Dataset Deduplication -- Based on SMILES Floe has been added that deduplicates a dataset based on canonical SMILES.