Cluster Poses

Description

Clusters poses based on 3D similarity.

The output cluster information consists of: 1) An integer cluster ID for each pose that identifies which cluster the pose belongs to. 2) An integer cluster rank for each pose that indicates the rank of the pose within its cluster (rank is based on the order of the poses in the dataset). 3) The Tanimoto of each pose to its cluster center; cluster centers will have a Tanimoto of 1.0.

The 3D similarity is calculated in place, i.e., the poses are not moved or overlaid before calculating the 3D similarity.
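
The clustering output described above can be illustrated with a minimal sketch. This page does not state how the floe selects cluster centers, so the greedy, neighbor-count-based selection below is an assumption chosen only because it is consistent with the documented outputs; the function name `cluster_poses`, the `similarity` callback, and the zero-based cluster IDs are all hypothetical.

```python
from typing import Callable, List, Tuple


def cluster_poses(
    n_poses: int,
    similarity: Callable[[int, int], float],
    threshold: float = 0.9,
) -> List[Tuple[int, int, float]]:
    """Return (cluster_id, cluster_rank, tanimoto_to_center) for each pose.

    ASSUMPTION: centers are picked greedily by neighbor count. The floe's
    actual center-selection rule is not documented on this page.
    """
    # Neighbors of a pose: itself plus every pose at or above the threshold.
    neighbors = [
        {j for j in range(n_poses) if j == i or similarity(i, j) >= threshold}
        for i in range(n_poses)
    ]
    unassigned = set(range(n_poses))
    result: List[Tuple[int, int, float]] = [(-1, 0, 0.0)] * n_poses
    cluster_id = 0  # hypothetical numbering; the floe's IDs may start elsewhere
    while unassigned:
        # Center: the unassigned pose with the most unassigned neighbors.
        # Ties go to the pose that appears first in the dataset.
        center = max(
            sorted(unassigned),
            key=lambda i: len(neighbors[i] & unassigned),
        )
        # Members are ranked by their order in the dataset, starting at 1.
        members = sorted(neighbors[center] & unassigned)
        for rank, pose in enumerate(members, start=1):
            tanimoto = 1.0 if pose == center else similarity(pose, center)
            result[pose] = (cluster_id, rank, tanimoto)
        unassigned -= set(members)
        cluster_id += 1
    return result
```

Under this assumed scheme, a pose that sits between two others in similarity space can become the center even though it is not first in the dataset, which is why the pose with rank 1 need not be the one with a Tanimoto of 1.0. Raising `threshold` shrinks every neighbor set, so the same poses split into more, tighter clusters.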

See also

This floe is used in the Dock One Million Molecules with Gigadock Floe tutorial.

Details

Title : Cluster Poses
Tags : Munging Clustering Analysis
Python Name : #08_cluster_poses

Parameters

Inputs

  • Input Dataset The dataset(s) to read records from
    Type : data_source
    Required : True
    Python Name : input_dataset

Outputs

  • Output Dataset Dataset for the clustered output
    Type : dataset_out
    Required : True
    Default : Clustered
    Python Name : output_dataset

Options

  • Cluster Tanimoto Threshold Tanimoto Similarity threshold used to determine cluster centers. Larger values will result in more clusters with fewer conformers/poses in each cluster that are more similar to each other.
    Type : decimal
    Required : False
    Default : 0.9
    Range : 0.0 to 1.0
    Python Name : cluster_tanimoto_threshold
  • Single Conformer/Pose Input If ‘On’ the floe will assume that the input molecules are single conformer and place the clustering information on the output records directly. If ‘Off’ the floe will cluster all conformers of each molecule and place the clustering information on the child conformer records of the output records. If multi-conformer molecules are passed to this floe with this option ‘On’ the floe will use the active conformer of the molecule.
    Type : boolean
    Required : False
    Default : True
    Choices : True, False
    Python Name : single_conformerpose_input

Options: Advanced

These parameters control how the 3D Similarity is calculated.

  • Charge Model Charge model to use in the electrostatic similarity part of the 3D similarity calculation.
    Type : string
    Required : False
    Default : elf10
    Choices : elf10, mmff, input
    Python Name : charge_model
  • Shape Falloff Distance at which the Gaussian atom density is half its maximum value. This can be thought of roughly as the effective radius of the heavy atoms in the similarity model. Higher values mean that two poses with atoms that are not exactly on top of each other can still have a high similarity/Tanimoto.
    Type : decimal
    Required : False
    Default : 2.0
    Min Value : 0.0
    Python Name : shape_falloff
  • Charge Falloff Distance at which the Gaussian atom charge density is half its maximum value. Higher values mean that atoms with different partial charges are more likely to be considered similar and that poses with the same shape but differing partial charges can have a high similarity/Tanimoto.
    Type : decimal
    Required : False
    Default : 0.25
    Min Value : 0.0
    Python Name : charge_falloff
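
To make the falloff definition concrete, here is a minimal sketch of a Gaussian that drops to half its maximum at the falloff distance, together with an in-place shape Tanimoto built from pairwise Gaussian overlaps. It is an illustration under stated assumptions, not the floe's implementation: the first-order sum-of-Gaussians overlap, the omission of the charge term, and the names `gaussian_density`, `overlap`, and `in_place_tanimoto` are all hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def gaussian_density(distance: float, falloff: float) -> float:
    """Gaussian scaled to 1.0 at the atom center and 0.5 at `falloff`."""
    return math.exp(-math.log(2.0) * (distance / falloff) ** 2)


def overlap(a: Sequence[Point], b: Sequence[Point], falloff: float) -> float:
    """Sum of pairwise Gaussian overlap integrals, up to a constant factor.

    For two Gaussians exp(-alpha * r^2) whose centers are d apart, the
    overlap integral is proportional to exp(-alpha * d^2 / 2). The constant
    prefactor cancels in the Tanimoto, so it is dropped here.
    """
    alpha = math.log(2.0) / falloff ** 2
    return sum(
        math.exp(-0.5 * alpha * math.dist(p, q) ** 2) for p in a for q in b
    )


def in_place_tanimoto(
    a: Sequence[Point], b: Sequence[Point], falloff: float = 2.0
) -> float:
    """Tanimoto of two poses using their coordinates as given (no overlay)."""
    o_ab = overlap(a, b, falloff)
    return o_ab / (overlap(a, a, falloff) + overlap(b, b, falloff) - o_ab)
```

In this simplified model, comparing a pose with a copy of itself shifted by a small distance gives a Tanimoto below 1.0 that moves closer to 1.0 as `falloff` grows, which matches the behavior described for Shape Falloff.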

Output Fields

These parameters allow the user to change the default output fields this floe creates in the output datasets and/or collections. Note that parameters identifying a molecule field are special. If a molecule field is left empty the floe writes the molecule to the primary (i.e., default) molecule field of the record. The primary molecule of a dataset can be identified in the UI by looking for the star on its field badge. CAUTION: If these parameters are modified, the modifications must also be applied to the input fields of downstream floes that read fields written by this floe. If a downstream floe does not support specifying the input field, it may not work properly with the output of this floe when these settings are modified.

  • Pose Cluster ID Integer field with the identifier of the cluster the pose/conformer is associated with.
    Type : field_parameter::int
    Required : False
    Default : Pose Cluster ID
    Python Name : pose_cluster_id
  • Pose Cluster Rank Integer field with the rank of the pose/conformer within its cluster. This rank is based on the order in which the pose/conformer appears in the original dataset. Rank=1 is not necessarily a cluster center. Cluster centers have a ‘Pose Cluster Tanimoto’ of 1.0 (see the field parameter of that name).
    Type : field_parameter::int
    Required : False
    Default : Pose Cluster Rank
    Python Name : pose_cluster_rank
  • Pose Cluster Tanimoto Tanimoto similarity between the pose/conformer and its cluster center pose/conformer. A Tanimoto of 1.0 indicates the pose/conformer is the cluster center.
    Type : field_parameter::float
    Required : False
    Default : Pose Cluster Tanimoto
    Python Name : pose_cluster_tanimoto
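
As an illustration of how the three output fields fit together, the sketch below groups records by cluster, orders each cluster by rank, and picks out the center as the record with a Tanimoto of 1.0. It assumes the records have already been exported to plain Python dictionaries keyed by the default field names; that representation and the function name `summarize_clusters` are hypothetical, and how records are read from a dataset is not covered here.

```python
import math
from collections import defaultdict
from typing import Any, Dict, List


def summarize_clusters(
    records: List[Dict[str, Any]],
    id_field: str = "Pose Cluster ID",
    rank_field: str = "Pose Cluster Rank",
    tanimoto_field: str = "Pose Cluster Tanimoto",
) -> Dict[int, Dict[str, Any]]:
    """Group records by cluster and locate each cluster's center."""
    clusters: Dict[int, List[Dict[str, Any]]] = defaultdict(list)
    for record in records:
        clusters[record[id_field]].append(record)

    summary: Dict[int, Dict[str, Any]] = {}
    for cluster_id, members in clusters.items():
        # Rank reflects dataset order, so rank 1 may not be the center.
        ordered = sorted(members, key=lambda r: r[rank_field])
        # The center is the member whose Tanimoto to the center is 1.0.
        center = next(
            (r for r in ordered if math.isclose(r[tanimoto_field], 1.0)),
            None,
        )
        summary[cluster_id] = {
            "size": len(ordered),
            "center": center,
            "members": ordered,
        }
    return summary
```

If the output field names were changed through the parameters above, the same names would need to be passed as `id_field`, `rank_field`, and `tanimoto_field`.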

Input Fields

These parameters specify the fields on the input datasets and/or collections that this floe reads data from. Note that parameters identifying a molecule field are special. If left empty the floe will read the molecule from the primary (i.e., default) molecule field on the input record. The primary molecule of a dataset can be identified in the UI by looking for the star on its field badge.

  • Molecule Field Field on the input records containing the molecules to cluster. If this field is left blank the primary (i.e., default) molecule field will be used.
    Type : field_parameter::mol
    Required : False
    Python Name : Input Molecule Field