Diverse Subset

The 2D and 3D diverse subset floes take an input dataset and find a representative subset of it using 2D or 3D similarity clustering. This tutorial describes how to run these specific floes. For general guidance on these and the other floes that use large scale clustering, please refer to the large scale clustering tutorial.

Floes used in this Tutorial

The floes used in this tutorial are:

Required Inputs

These floes accept an input dataset of up to several hundred thousand molecules. For datasets that contain multiconformer molecules, the 3D Floe considers only the active conformer. The 3D Floe ignores molecules without 3D coordinate information.

If the input dataset contains more than 20,000 molecules, turning on the Is Large Scale parameter is recommended. In this case, the sphere exclusion radius should be chosen carefully. See the relevant section in the large scale clustering tutorial for more details.

If the Is Large Scale parameter is turned off, the sphere exclusion radius will not be used, and k-medoids will be used for clustering. Advanced K-Medoids settings can be chosen in the Advanced: K-Medoids parameter section.
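To give a feel for what k-medoids clustering does, here is a minimal sketch in Python. It is not the floe's implementation: the function name k_medoids, the naive "first k points" initialization, and the use of Euclidean distance between 2D points in place of molecular similarity are all assumptions made for illustration. The key idea it shows is that each cluster is represented by one of its own members (the medoid), which is why the medoids form a natural representative subset.

```python
from math import dist


def k_medoids(points, k, max_iter=100):
    """Toy k-medoids (alternating assignment/update); returns sorted medoid indices.

    Hypothetical helper for illustration only, not the floe's algorithm.
    """
    medoids = list(range(k))  # assumption: naive initialization with the first k points
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: dist(p, points[m]))
            clusters[nearest].append(i)

        # Update step: within each cluster, the new medoid is the member
        # with the smallest total distance to the other members.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # can only happen with duplicate points; keep old medoid
                new_medoids.append(m)
                continue
            best = min(
                members,
                key=lambda c: sum(dist(points[c], points[o]) for o in members),
            )
            new_medoids.append(best)

        if sorted(new_medoids) == sorted(medoids):
            break  # converged: medoids no longer change
        medoids = new_medoids
    return sorted(medoids)


# Two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 0.0), (5.1, 0.0), (5.2, 0.0)]
representatives = k_medoids(points, k=2)
print(representatives)
```

In this toy run the two medoids settle on the central point of each group, so the number of representatives always equals the requested k.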

In either case, small or large scale, the floe requires the user to specify the number of molecules to be output in the subset. In the large scale case, the number actually output in the subset may be less than the number requested.
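The following sketch illustrates why a sphere exclusion approach can return fewer molecules than requested. It is a generic greedy sphere exclusion on 2D points, not the floe's implementation: the function name sphere_exclusion, the max_picks argument, and Euclidean distance as a stand-in for molecular dissimilarity are assumptions for illustration.

```python
from math import dist


def sphere_exclusion(points, radius, max_picks):
    """Greedy sphere exclusion; returns indices of the picked points.

    Hypothetical helper for illustration only, not the floe's algorithm.
    """
    picked = []
    for i, p in enumerate(points):
        if len(picked) >= max_picks:
            break  # requested subset size reached
        # Keep the candidate only if it lies outside every sphere picked so far.
        if all(dist(p, points[j]) > radius for j in picked):
            picked.append(i)
    return picked


points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0), (2.0, 0.0)]

# Small radius: few points are excluded, so all 4 requested picks are found.
small = sphere_exclusion(points, radius=0.05, max_picks=4)

# Large radius: close neighbors are excluded and the data runs out
# after 3 picks, even though 4 were requested.
large = sphere_exclusion(points, radius=0.5, max_picks=4)

print(small, large)
```

With the larger radius, every remaining point falls inside an already chosen sphere before the requested count is reached, which is the situation in which the output subset is smaller than requested.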

Outputs

Unlike other floes involving clustering, this floe outputs only a dataset containing the chosen subset. The subset name can be chosen in the Output Parameter Group.

Troubleshooting

Refer to the troubleshooting section in the large scale clustering floe tutorial for advice on fixing any problems encountered while running this floe.