Cluster Poses

Category Paths

Follow one of these paths in the Orion user interface to find the floe.

  • Product-based/FastROCS

  • Product-based/Gigadock

  • Role-based/Computational Chemist

  • Solution-based/Virtual-screening/Analysis/Clustering

  • Task-based/Data Science/Clustering

Description

This floe clusters poses based on 3D similarity.

The output cluster information includes: (1) an integer cluster ID that identifies the cluster each pose belongs to; (2) an integer cluster rank that gives the position of each pose within its cluster (rank follows the order of poses in the dataset); and (3) the Tanimoto similarity of each pose to its cluster center. Cluster centers have a Tanimoto of 1.0.

The 3D similarity is calculated in place; that is, the poses are not moved or overlaid before the similarity is calculated.
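The three pieces of cluster information described above can be illustrated with a small sketch. The floe's actual clustering algorithm is not stated in this document, so the code below is an assumption: it uses a Butina-style scheme in which the pose with the most neighbors at or above the threshold becomes a cluster center. The function name `cluster_poses`, the precomputed similarity matrix, and cluster IDs starting at 1 are all hypothetical choices made for illustration.

```python
def cluster_poses(sim, threshold=0.9):
    """Cluster poses from a square, symmetric similarity matrix.

    ASSUMPTION: Butina-style clustering stands in for the floe's
    unspecified algorithm. `sim[i][j]` is the Tanimoto between poses
    i and j, with 1.0 on the diagonal.
    """
    n = len(sim)
    unassigned = set(range(n))
    cluster_id = [None] * n
    tanimoto = [None] * n
    next_id = 1  # starting IDs at 1 is an arbitrary choice

    while unassigned:
        # The pose with the most unassigned neighbors becomes the center.
        # Ties go to the pose that appears first in the dataset.
        center = max(
            sorted(unassigned),
            key=lambda i: sum(1 for j in unassigned if sim[i][j] >= threshold),
        )
        members = [j for j in sorted(unassigned) if sim[center][j] >= threshold]
        for j in members:
            cluster_id[j] = next_id
            tanimoto[j] = sim[center][j]  # the center itself gets 1.0
        unassigned -= set(members)
        next_id += 1

    # Rank follows dataset order within each cluster, not similarity.
    rank = [None] * n
    seen = {}
    for i in range(n):
        seen[cluster_id[i]] = seen.get(cluster_id[i], 0) + 1
        rank[i] = seen[cluster_id[i]]
    return cluster_id, rank, tanimoto


# Toy similarity matrix for four poses (values invented for illustration).
sim = [
    [1.00, 0.95, 0.80, 0.30],
    [0.95, 1.00, 0.92, 0.30],
    [0.80, 0.92, 1.00, 0.30],
    [0.30, 0.30, 0.30, 1.00],
]
ids, ranks, tanimotos = cluster_poses(sim, threshold=0.9)
print(ids)        # [1, 1, 1, 2]
print(ranks)      # [1, 2, 3, 1]
print(tanimotos)  # [0.95, 1.0, 0.92, 1.0]
```

In this toy run the second pose is the center of cluster 1 (Tanimoto 1.0) even though the first pose holds rank 1, because rank only reflects dataset order.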

Promoted Parameters

Title in user interface (promoted name)

Inputs

Input Dataset (input_dataset): The dataset(s) to read records from.

  • Required

  • Type: data_source

Outputs

Output Dataset (output_dataset): Dataset for the clustered output.

  • Required

  • Type: dataset_out

  • Default: Clustered

Options

Cluster Tanimoto Threshold (cluster_tanimoto_threshold): Tanimoto similarity threshold used to determine cluster centers. Larger values produce more clusters, each containing fewer conformers or poses that are more similar to one another.

  • Type: decimal

  • Default: 0.9

Single Conformer/Pose Input (single_conformerpose_input): If On, the floe will assume that the input molecules are single conformers and place the clustering information on the output records directly. If Off, the floe will cluster all conformers of each molecule and place the clustering information on the child conformer records of the output records. If multiconformer molecules are passed to this floe with this option On, the floe will use the active conformer of the molecule.

  • Type: boolean

  • Default: True

  • Choices: [True, False]

Options: Advanced

Charge Model (charge_model): Charge model to use in the electrostatic part of the 3D similarity calculation.

  • Type: string

  • Default: elf10

  • Choices: ['elf10', 'mmff', 'input']

Shape Falloff (shape_falloff): Distance at which the Gaussian atom density is half its maximum value. This can be thought of roughly as the effective radius of the heavy atoms in the similarity model. Higher values mean that two poses whose atoms do not exactly overlap can still have a high similarity (Tanimoto).

  • Type: decimal

  • Default: 2.0
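The effect of the shape falloff can be sketched with a simplified, shape-only Tanimoto computed in place, that is, directly from the input coordinates without any alignment. This is an illustration under stated assumptions, not the floe's implementation: each atom is modeled as a spherical Gaussian `exp(-alpha * r**2)` with `alpha = ln(2) / falloff**2` so that the density is 0.5 at `r = falloff`, only first-order pairwise overlaps are summed, and charges are ignored. The function names and coordinates are hypothetical.

```python
import math


def gaussian_overlap(coords_a, coords_b, falloff=2.0):
    """Sum of pairwise Gaussian overlap integrals between two atom sets.

    ASSUMPTION: every atom is exp(-alpha * r**2) with the half-maximum
    distance equal to `falloff`. For two equal-width Gaussians separated
    by d, the overlap integral is (pi / (2*alpha))**1.5 * exp(-alpha*d**2/2).
    """
    alpha = math.log(2.0) / falloff ** 2
    prefactor = (math.pi / (2.0 * alpha)) ** 1.5
    total = 0.0
    for a in coords_a:
        for b in coords_b:
            d2 = sum((x - y) ** 2 for x, y in zip(a, b))
            total += prefactor * math.exp(-alpha * d2 / 2.0)
    return total


def shape_tanimoto(coords_a, coords_b, falloff=2.0):
    """Tanimoto = O_AB / (O_AA + O_BB - O_AB), with no alignment step."""
    o_ab = gaussian_overlap(coords_a, coords_b, falloff)
    o_aa = gaussian_overlap(coords_a, coords_a, falloff)
    o_bb = gaussian_overlap(coords_b, coords_b, falloff)
    return o_ab / (o_aa + o_bb - o_ab)


# Two-atom toy poses; pose_b is pose_a shifted by 1.0 along y.
pose_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
pose_b = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0)]

t_default = shape_tanimoto(pose_a, pose_b, falloff=2.0)
t_wider = shape_tanimoto(pose_a, pose_b, falloff=3.0)
print(round(t_default, 2), round(t_wider, 2))
```

With these toy coordinates the wider falloff gives the larger Tanimoto, which matches the description above: broader Gaussians tolerate atoms that do not sit exactly on top of each other.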

Charge Falloff (charge_falloff): Distance at which the Gaussian atom charge density is half its maximum value. Higher values mean that atoms with different partial charges are more likely to be considered similar, so poses with the same shape but differing partial charges can have a high similarity (Tanimoto).

  • Type: decimal

  • Default: 0.25

Output Fields

Pose Cluster ID (pose_cluster_id): Integer field with the identifier of the cluster the pose/conformer is associated with.

  • Type: field_parameter::int

  • Default: Pose Cluster ID

Pose Cluster Rank (pose_cluster_rank): Integer field with the rank of the pose/conformer within its cluster. This rank is based on the order in which the pose/conformer appears in the original dataset. Rank=1 is not necessarily a cluster center. Cluster centers have a 3D cluster Tanimoto of 1.0 (see the field parameter of that name).

  • Type: field_parameter::int

  • Default: Pose Cluster Rank

Pose Cluster Tanimoto (pose_cluster_tanimoto): Tanimoto similarity between the pose/conformer and its cluster center pose/conformer. A Tanimoto of 1.0 indicates the pose/conformer is the cluster center.

  • Type: field_parameter::float

  • Default: Pose Cluster Tanimoto
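As a rough illustration of how the three output fields relate, the sketch below picks out cluster centers and rank-1 poses from a handful of records. The records are plain Python dictionaries with invented values, not Orion record objects, and the `name` key is hypothetical; only the three field names come from the defaults listed above.

```python
import math

# Invented example records keyed by the default output field names.
records = [
    {"name": "pose_a", "Pose Cluster ID": 1, "Pose Cluster Rank": 1, "Pose Cluster Tanimoto": 0.95},
    {"name": "pose_b", "Pose Cluster ID": 1, "Pose Cluster Rank": 2, "Pose Cluster Tanimoto": 1.0},
    {"name": "pose_c", "Pose Cluster ID": 1, "Pose Cluster Rank": 3, "Pose Cluster Tanimoto": 0.92},
    {"name": "pose_d", "Pose Cluster ID": 2, "Pose Cluster Rank": 1, "Pose Cluster Tanimoto": 1.0},
]

# Cluster centers are the poses whose Tanimoto to the center is 1.0.
centers = [
    r["name"] for r in records
    if math.isclose(r["Pose Cluster Tanimoto"], 1.0)
]

# Rank-1 poses are simply the first member of each cluster in dataset order.
first_ranked = [r["name"] for r in records if r["Pose Cluster Rank"] == 1]

print(centers)       # ['pose_b', 'pose_d']
print(first_ranked)  # ['pose_a', 'pose_d']
```

Here pose_a has rank 1 in cluster 1 but is not its center, so selecting one representative per cluster should filter on the Tanimoto field rather than on the rank.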

Input Fields

Molecule Field (Input Molecule Field): Field on the input records containing the molecules to cluster. If this field is left blank, the primary (default) molecule field will be used.

  • Type: field_parameter::mol