Cluster Poses
Category Paths
Follow one of these paths in the Orion user interface to find the floe.
Product-based/FastROCS
Product-based/Gigadock
Role-based/Computational Chemist
Solution-based/Virtual-screening/Analysis/Clustering
Task-based/Data Science/Clustering
Description
This floe clusters poses based on 3D similarity.
The output cluster information includes: (1) an integer cluster ID for each pose that identifies which cluster the pose belongs to; (2) an integer cluster rank for each pose that indicates the rank of the pose within its cluster (rank is based on the order of poses in the dataset); and (3) the Tanimoto of each pose to its cluster center. Cluster centers will have a Tanimoto of 1.0.
The 3D similarity is calculated in place; that is, the poses are not moved or overlaid before the 3D similarity is calculated.
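To illustrate what "in place" means, here is a minimal sketch of a shape-only Tanimoto computed from atom coordinates exactly as given, with no alignment step. It assumes each heavy atom is a spherical Gaussian and uses only pairwise (first-order) overlaps. The function names, the overlap formula, and the omission of the electrostatic term are assumptions made for illustration, not the floe's actual implementation.

```python
import math

def gaussian_overlap(coords_a, coords_b, falloff=2.0):
    """Sum of pairwise overlaps between atom-centered Gaussians (assumed model)."""
    # Each atom is exp(-alpha * r^2), which equals 0.5 at r = falloff.
    alpha = math.log(2.0) / falloff ** 2
    prefactor = (math.pi / (2.0 * alpha)) ** 1.5
    total = 0.0
    for a in coords_a:
        for b in coords_b:
            d2 = sum((x - y) ** 2 for x, y in zip(a, b))
            total += prefactor * math.exp(-0.5 * alpha * d2)
    return total

def in_place_tanimoto(coords_a, coords_b, falloff=2.0):
    """Tanimoto of two poses using their coordinates as-is (no overlay)."""
    o_ab = gaussian_overlap(coords_a, coords_b, falloff)
    o_aa = gaussian_overlap(coords_a, coords_a, falloff)
    o_bb = gaussian_overlap(coords_b, coords_b, falloff)
    return o_ab / (o_aa + o_bb - o_ab)

# Hypothetical two-atom pose and a copy translated along x.
pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
shifted = [(x + 3.0, y, z) for x, y, z in pose]

same = in_place_tanimoto(pose, pose)        # identical coordinates
moved = in_place_tanimoto(pose, shifted)    # same shape, different position
```

Because nothing is aligned, the translated copy scores lower than the original even though the two have identical shapes. This is the behavior that lets the similarity distinguish poses of the same molecule.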
Promoted Parameters
Title in user interface (promoted name)
Inputs
Input Dataset (input_dataset): The dataset(s) to read records from.
Required
Type: data_source
Outputs
Output Dataset (output_dataset): Dataset for the clustered output.
Required
Type: dataset_out
Default: Clustered
Options
Cluster Tanimoto Threshold (cluster_tanimoto_threshold): Tanimoto similarity threshold used to determine cluster centers. Larger values produce more clusters, each containing fewer conformers or poses that are more similar to one another.
Type: decimal
Default: 0.9
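The effect of the threshold can be seen in a small sketch. The floe's clustering algorithm is not specified here, so the example assumes a simple greedy procedure in which the pose with the most neighbors at or above the threshold becomes a center. The function `cluster_sizes` and the similarity values are hypothetical.

```python
def cluster_sizes(sim, threshold):
    """Greedy threshold clustering (assumed procedure); returns cluster sizes."""
    unassigned = set(range(len(sim)))
    sizes = []
    while unassigned:
        # Center: the unassigned pose with the most unassigned neighbors at or
        # above the threshold. Ties go to the earliest pose.
        center = max(
            sorted(unassigned),
            key=lambda i: sum(sim[i][j] >= threshold for j in unassigned),
        )
        members = {j for j in unassigned if sim[center][j] >= threshold}
        members.add(center)
        sizes.append(len(members))
        unassigned -= members
    return sizes

# Hypothetical pairwise Tanimoto values for five poses.
sim = [
    [1.00, 0.97, 0.92, 0.30, 0.30],
    [0.97, 1.00, 0.88, 0.30, 0.30],
    [0.92, 0.88, 1.00, 0.30, 0.30],
    [0.30, 0.30, 0.30, 1.00, 0.85],
    [0.30, 0.30, 0.30, 0.85, 1.00],
]

print(cluster_sizes(sim, 0.80))  # [3, 2]
print(cluster_sizes(sim, 0.90))  # [3, 1, 1]
print(cluster_sizes(sim, 0.95))  # [2, 1, 1, 1]
```

Raising the threshold from 0.80 to 0.95 takes the same five poses from two clusters to four, with fewer members in each.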
Single Conformer/Pose Input (single_conformerpose_input): If On, the floe will assume that the input molecules are single conformers and place the clustering information on the output records directly. If Off, the floe will cluster all conformers of each molecule and place the clustering information on the child conformer records of the output records. If multiconformer molecules are passed to this floe with this option On, the floe will use the active conformer of the molecule.
Type: boolean
Default: True
Choices: [True, False]
Options: Advanced
Charge Model (charge_model): Charge model to use in the electrostatic part of the 3D similarity calculation.
Type: string
Default: elf10
Choices: ['elf10', 'mmff', 'input']
Shape Falloff (shape_falloff): Distance at which the Gaussian atom density is half its max value. This can be thought of roughly as the effective radius of the heavy atoms in the similarity model. Higher values mean that two poses with atoms that are not exactly overlapping can still have a high similarity or Tanimoto.
Type: decimal
Default: 2.0
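The half-maximum definition can be made concrete with a short sketch. It assumes a Gaussian normalized to 1.0 at the atom center, with the exponent chosen so that the value is 0.5 at a distance equal to the falloff. The function name and the normalization are assumptions for illustration.

```python
import math

def gaussian_density(distance, falloff):
    """Gaussian density that is 1.0 at distance 0 and 0.5 at distance == falloff."""
    return math.exp(-math.log(2.0) * (distance / falloff) ** 2)

at_center = gaussian_density(0.0, 2.0)
at_falloff = gaussian_density(2.0, 2.0)   # half of the maximum by construction

# The same 1.0 offset between atoms, seen with a narrow and a wide falloff.
narrow = gaussian_density(1.0, 1.0)
wide = gaussian_density(1.0, 2.0)
```

With the wider falloff, an atom displaced by the same distance retains more of its density, which is why higher values let poses with imperfectly overlapping atoms still score a high Tanimoto.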
Charge Falloff (charge_falloff): Distance at which the Gaussian atom charge density is half its max value. Higher values mean that atoms with different partial charges are more likely to be considered similar and that poses with the same shape but differing partial charges can have high similarity or Tanimoto.
Type: decimal
Default: 0.25
Output Fields
Pose Cluster ID (pose_cluster_id): Integer field with the identifier of the cluster the pose/conformer is associated with.
Type: field_parameter::int
Default: Pose Cluster ID
Pose Cluster Rank (pose_cluster_rank): Integer field with the rank of the pose/conformer within its cluster. This rank is based on the order the pose/conformer appears in the original dataset. Rank=1 is not necessarily a cluster center. Cluster centers have a 3D cluster Tanimoto of 1.0 (see field parameter of that name).
Type: field_parameter::int
Default: Pose Cluster Rank
Pose Cluster Tanimoto (pose_cluster_tanimoto): Tanimoto similarity between the pose/conformer and its cluster center pose/conformer. A Tanimoto of 1.0 indicates the pose/conformer is the cluster center.
Type: field_parameter::float
Default: Pose Cluster Tanimoto
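The relationship between the three output fields can be sketched in a few lines. The floe's actual algorithm is not documented here. The example assumes a greedy, Butina-style procedure on a precomputed similarity matrix, and the function name, dictionary keys, and zero-based cluster IDs are all hypothetical.

```python
def cluster_poses(sim, threshold):
    """Assign cluster ID, rank, and Tanimoto-to-center to each pose (assumed procedure)."""
    n = len(sim)
    unassigned = set(range(n))
    results = [None] * n
    cluster_id = 0  # numbering from 0 is an assumption
    while unassigned:
        def neighbor_count(i):
            return sum(1 for j in unassigned if j != i and sim[i][j] >= threshold)

        # Center: the unassigned pose with the most neighbors. Ties go to the earliest pose.
        center = max(sorted(unassigned), key=neighbor_count)
        # Members are listed in dataset order, which defines the rank.
        members = sorted(
            j for j in unassigned if j == center or sim[center][j] >= threshold
        )
        for rank, j in enumerate(members, start=1):
            results[j] = {
                "cluster_id": cluster_id,
                "cluster_rank": rank,
                "tanimoto": 1.0 if j == center else sim[center][j],
            }
        unassigned -= set(members)
        cluster_id += 1
    return results

# Hypothetical pairwise Tanimoto values for four poses, in dataset order.
sim = [
    [1.00, 0.95, 0.50, 0.20],
    [0.95, 1.00, 0.92, 0.20],
    [0.50, 0.92, 1.00, 0.20],
    [0.20, 0.20, 0.20, 1.00],
]
results = cluster_poses(sim, threshold=0.9)
for index, row in enumerate(results):
    print(index, row)
```

In this example the second pose becomes the center of the first cluster (Tanimoto 1.0) but has rank 2, because the first pose in the dataset belongs to the same cluster and comes earlier. This shows how rank 1 need not be a cluster center.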
Input Fields
Molecule Field (Input Molecule Field): Field on the input records containing the molecules to cluster. If this field is left blank, the primary (default) molecule field will be used.
Type: field_parameter::mol