Cluster Poses
Description
Clusters poses based on 3D similarity.
The floe outputs the following cluster information:
1) An integer cluster ID for each pose that identifies which cluster the pose belongs to.
2) An integer cluster rank for each pose that indicates the rank of the pose within its cluster (rank is based on the order of the poses in the dataset).
3) The Tanimoto similarity of each pose to its cluster center; cluster centers have a Tanimoto of 1.0.
The 3D similarity is calculated in place, i.e., the poses are not moved or overlaid before the 3D similarity is calculated.
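The description above does not say which clustering algorithm the floe uses, so the following is only a minimal sketch of how the three outputs (cluster ID, rank, Tanimoto to center) can be produced from a precomputed pose-by-pose similarity matrix. It assumes a Butina-style scheme (the pose with the most neighbours at or above the threshold becomes a center) and zero-based cluster IDs; the function name `cluster_poses` and the toy matrix are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def cluster_poses(
    sim: Sequence[Sequence[float]], threshold: float = 0.9
) -> List[Tuple[int, int, float]]:
    """Return (cluster_id, cluster_rank, tanimoto_to_center) for each pose.

    sim[i][j] is a precomputed, symmetric in-place 3D Tanimoto between poses
    i and j (sim[i][i] == 1.0); poses are indexed in dataset order.
    """
    n = len(sim)
    unassigned = set(range(n))
    cluster_of = [-1] * n
    tanimoto = [0.0] * n
    next_id = 0
    while unassigned:
        # Butina-style choice (assumption): the unassigned pose with the most
        # unassigned neighbours at or above the threshold becomes the center;
        # max() keeps the earliest pose on ties because candidates are sorted.
        center = max(
            sorted(unassigned),
            key=lambda i: sum(1 for j in unassigned if sim[i][j] >= threshold),
        )
        members = [j for j in unassigned if j == center or sim[center][j] >= threshold]
        for j in members:
            cluster_of[j] = next_id
            tanimoto[j] = 1.0 if j == center else sim[center][j]
        unassigned.difference_update(members)
        next_id += 1

    # Rank follows dataset order within each cluster (1 = first to appear),
    # so rank 1 is not necessarily the cluster center.
    counts: Dict[int, int] = {}
    result = []
    for i in range(n):
        counts[cluster_of[i]] = counts.get(cluster_of[i], 0) + 1
        result.append((cluster_of[i], counts[cluster_of[i]], tanimoto[i]))
    return result


# Toy in-place Tanimoto matrix for four poses (hypothetical values).
sim = [
    [1.0, 0.91, 0.5, 0.6],
    [0.91, 1.0, 0.5, 0.93],
    [0.5, 0.5, 1.0, 0.4],
    [0.6, 0.93, 0.4, 1.0],
]
labels = cluster_poses(sim, threshold=0.9)
print(labels)  # [(0, 1, 0.91), (0, 2, 1.0), (1, 1, 1.0), (0, 3, 0.93)]
```

Pose 0 appears first in cluster 0 and therefore has rank 1, but pose 1 is the center (Tanimoto 1.0). Raising the threshold to 0.95 on this toy matrix splits the poses into four singleton clusters, in line with the Cluster Tanimoto Threshold description.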
See also
This floe is used in the Dock One Million Molecules with Gigadock Floe tutorial.
Details
Title : Cluster Poses
Tags : Munging Clustering Analysis
Python Name : #08_cluster_poses
Parameters
Inputs
Input Dataset
The dataset(s) to read records from.
Type : data_source
Required : True
Python Name : input_dataset
Outputs
Output Dataset
Dataset for the clustered output.
Type : dataset_out
Required : True
Default : Clustered
Python Name : output_dataset
Options
Cluster Tanimoto Threshold
Tanimoto similarity threshold used to determine cluster centers. Larger values result in more clusters, each containing fewer conformers/poses that are more similar to each other.
Type : decimal
Required : False
Default : 0.9
Range : 0.0 to 1.0
Python Name : cluster_tanimoto_threshold

Single Conformer/Pose Input
If ‘On’, the floe assumes that the input molecules are single conformer and places the clustering information directly on the output records. If ‘Off’, the floe clusters all conformers of each molecule and places the clustering information on the child conformer records of the output records. If multi-conformer molecules are passed to this floe with this option ‘On’, the floe uses the active conformer of each molecule.
Type : boolean
Required : False
Default : True
Choices : True, False
Python Name : single_conformerpose_input
Options: Advanced
These parameters control how the 3D similarity is calculated.
Charge Model
Charge model to use in the electrostatic part of the 3D similarity calculation.
Type : string
Required : False
Default : elf10
Choices : elf10, mmff, input
Python Name : charge_model

Shape Falloff
Distance at which the Gaussian atom density is half its maximum value. This can be thought of roughly as the effective radius of the heavy atoms in the similarity model. Higher values mean that two poses whose atoms are not exactly on top of each other can still have a high similarity/Tanimoto.
Type : decimal
Required : False
Default : 2.0
Min Value : 0.0
Python Name : shape_falloff

Charge Falloff
Distance at which the Gaussian atom charge density is half its maximum value. Higher values mean that atoms with different partial charges are more likely to be considered similar, so poses with the same shape but differing partial charges can still have a high similarity/Tanimoto.
Type : decimal
Required : False
Default : 0.25
Min Value : 0.0
Python Name : charge_falloff
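To make the Shape Falloff parameter concrete, here is a minimal sketch of an in-place Gaussian shape Tanimoto. It assumes each heavy atom is a spherical Gaussian exp(-αr²) whose value halves at r = falloff (so α = ln 2 / falloff²) and that pose overlap is the sum of pairwise atom overlaps. This is a simplified illustrative model, not the floe's implementation: the electrostatic term (Charge Model, Charge Falloff) is omitted, and the function names and coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def gaussian_overlap(a: Sequence[Coord], b: Sequence[Coord], falloff: float) -> float:
    """Sum of pairwise overlaps of equal-width atomic Gaussians (no alignment)."""
    if falloff <= 0.0:
        raise ValueError("falloff must be positive")
    # exp(-alpha * r**2) equals 0.5 at r == falloff.
    alpha = math.log(2.0) / falloff ** 2
    # Closed-form overlap of two such Gaussians whose centers are d apart:
    # (pi / (2 * alpha)) ** 1.5 * exp(-alpha * d**2 / 2)
    prefactor = (math.pi / (2.0 * alpha)) ** 1.5
    total = 0.0
    for p in a:
        for q in b:
            d2 = sum((u - v) ** 2 for u, v in zip(p, q))
            total += prefactor * math.exp(-alpha * d2 / 2.0)
    return total


def shape_tanimoto(a: Sequence[Coord], b: Sequence[Coord], falloff: float = 2.0) -> float:
    """Tanimoto = O_ab / (O_aa + O_bb - O_ab), computed on coordinates as given."""
    o_ab = gaussian_overlap(a, b, falloff)
    return o_ab / (gaussian_overlap(a, a, falloff) + gaussian_overlap(b, b, falloff) - o_ab)


# Two-atom toy pose and a copy translated by 1.0 along x (hypothetical coordinates).
pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
shifted = [(x + 1.0, y, z) for x, y, z in pose]
print(round(shape_tanimoto(pose, shifted, falloff=1.0), 3))  # 0.722
print(round(shape_tanimoto(pose, shifted, falloff=2.0), 3))  # 0.871
```

Because nothing is aligned, the translated copy scores below 1.0; doubling the falloff raises its Tanimoto, which is the behaviour the Shape Falloff description predicts for atoms that are near, but not exactly on top of, each other.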
Output Fields
These parameters allow the user to change the default output fields this floe creates in the output datasets and/or collections. Note that parameters identifying a molecule field are special: if a molecule field is left empty, the floe writes the molecule to the primary (i.e., default) molecule field of the record. The primary molecule of a dataset can be identified in the UI by looking for the star on its field badge. CAUTION: If these parameters are modified, the modifications must also be applied to the input fields of downstream floes that read fields written by this floe. If a downstream floe does not support specifying the input field, it may not work properly with the output of this floe when these settings are modified.
Pose Cluster ID
Integer field with the identifier of the cluster the pose/conformer is associated with.
Type : field_parameter::int
Required : False
Default : Pose Cluster ID
Python Name : pose_cluster_id

Pose Cluster Rank
Integer field with the rank of the pose/conformer within its cluster. This rank is based on the order in which the pose/conformer appears in the original dataset. Rank=1 is not necessarily a cluster center; cluster centers have a ‘Pose Cluster Tanimoto’ of 1.0 (see the field parameter of that name).
Type : field_parameter::int
Required : False
Default : Pose Cluster Rank
Python Name : pose_cluster_rank

Pose Cluster Tanimoto
Tanimoto similarity between the pose/conformer and its cluster center pose/conformer. A Tanimoto of 1.0 indicates the pose/conformer is the cluster center.
Type : field_parameter::float
Required : False
Default : Pose Cluster Tanimoto
Python Name : pose_cluster_tanimoto
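Downstream, these fields can be used to pick one representative per cluster without recomputing any similarities. The sketch below assumes the output records have already been exported to plain Python dicts keyed by the default field names (the export step, the function name `summarize_clusters`, and the toy rows are hypothetical; this is not an Orion API). If the output-field parameters were changed, the field-name constants must be changed to match.

```python
from collections import Counter
from typing import Dict, List, Mapping, Tuple

# Default output field names of this floe; update if they were overridden.
ID_FIELD = "Pose Cluster ID"
RANK_FIELD = "Pose Cluster Rank"
TANIMOTO_FIELD = "Pose Cluster Tanimoto"


def summarize_clusters(
    records: List[Mapping], tol: float = 1e-6
) -> Dict[int, Tuple[int, int]]:
    """Map each cluster ID to (index of its center record, cluster size)."""
    sizes = Counter(rec[ID_FIELD] for rec in records)
    centers: Dict[int, int] = {}
    for idx, rec in enumerate(records):
        # A center has a cluster Tanimoto of 1.0; the tolerance absorbs
        # rounding in exported values, and the first match per cluster wins.
        if abs(rec[TANIMOTO_FIELD] - 1.0) <= tol and rec[ID_FIELD] not in centers:
            centers[rec[ID_FIELD]] = idx
    # -1 marks a cluster whose center record is missing from `records`.
    return {cid: (centers.get(cid, -1), size) for cid, size in sizes.items()}


rows = [
    {ID_FIELD: 0, RANK_FIELD: 1, TANIMOTO_FIELD: 0.91},
    {ID_FIELD: 0, RANK_FIELD: 2, TANIMOTO_FIELD: 1.0},
    {ID_FIELD: 1, RANK_FIELD: 1, TANIMOTO_FIELD: 1.0},
    {ID_FIELD: 0, RANK_FIELD: 3, TANIMOTO_FIELD: 0.93},
]
print(summarize_clusters(rows))  # {0: (1, 3), 1: (2, 1)}
```

Selecting centers by Tanimoto rather than by rank matters here: the rank-1 record of cluster 0 (index 0) is not its center.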
Input Fields
These parameters specify the fields on the input datasets and/or collections that this floe reads data from. Note that parameters identifying a molecule field are special: if left empty, the floe reads the molecule from the primary (i.e., default) molecule field on the input record. The primary molecule of a dataset can be identified in the UI by looking for the star on its field badge.
Molecule Field
Field on the input records containing the molecules to cluster. If this field is left blank the primary (i.e., default) molecule field will be used.
Type : field_parameter::mol
Required : False
Python Name : Input Molecule Field