Sphere Exclusion Clustering

Calculation Parameters

  • Metric Used To Compute Linkage (affinity) type: string: If linkage is “ward”, only “euclidean” is accepted. If “precomputed”, a distance matrix (instead of a similarity matrix) is needed as input for the fit method.
    Default: precomputed
    Choices: euclidean, l1, l2, manhattan, cosine, precomputed
  • Compute Full Tree (compute_full_tree) type: string: Stop the construction of the tree early, at n_clusters. This decreases computation time when the number of clusters is not small compared to the number of samples. This option is useful only when specifying a connectivity matrix. Note also that when varying the number of clusters and using caching, it may be advantageous to compute the full tree. It must be True if distance_threshold is not None. The value “auto” is equivalent to True when distance_threshold is not None or when n_clusters is less than the maximum of 100 and 0.02 * n_samples. Otherwise, “auto” is equivalent to False.
    Default: True
    Choices: auto, True, False
  • Connectivity Matrix (connectivity) type: string: Defines for each sample the neighboring samples following a given structure of the data. This can be a connectivity matrix itself or a callable that transforms the data into a connectivity matrix, such as one derived from kneighbors_graph. Default is None, i.e., the hierarchical clustering algorithm is unstructured.
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1, Min: 1, Max: 128
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected

    Choices: cpu, disk, memory, network
  • Use Diagnostics (debug_mode) type: boolean:
    Default: False
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0, Min: 128.0, Max: 8589934592
  • Distance Threshold (distance_threshold) type: decimal: The linkage distance threshold above which clusters will not be merged. If not None, n_clusters must be None and compute_full_tree must be True.
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0, Max: 16
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Linkage (linkage) type: string: Which linkage criterion to use. The linkage criterion determines which distance to use between sets of observations. The algorithm will merge the pairs of clusters that minimize this criterion. ‘average’ uses the average of the distances of each observation of the two sets. ‘complete’ or ‘maximum’ linkage uses the maximum distances between all observations of the two sets. ‘single’ uses the minimum of the distances between all observations of the two sets.
    Default: average
    Choices: complete, average, single
  • None (matrix_input_file) type: file_in:
  • Calculation Cache Output (memory) type: string: Used to cache the output of the computation of the tree. By default, no caching is done. If a string is given, it is the path to the caching directory.
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800, Min: 256.0, Max: 8589934592
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Number of Clusters (n_clusters) type: integer: The number of clusters to find.
    Default: 10
  • Output Similarity Matrix (output_similarity_matrix) type: boolean:
    Default: False
  • Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU
    Default: 32
  • None (row_label_input_file) type: file_in:
  • Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address
    Default: 64
  • Similarity Matrix Filename (similarity_matrix_filename) type: string:
    Default: clustering_similarity_matrix.txt
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • None (use_matrix_input_file) type: boolean:
    Default: False
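
The constraints stated in the list above (compute_full_tree, distance_threshold, n_clusters, affinity, and linkage) can be sketched as plain Python checks. This is only an illustration of the rules as they are written in the parameter descriptions, not the cube's actual implementation; the function names resolve_compute_full_tree and check_parameters are hypothetical, and the rule that at least one of n_clusters or distance_threshold must be set is an assumption that the text does not state.

```python
def resolve_compute_full_tree(compute_full_tree, n_clusters, n_samples,
                              distance_threshold=None):
    """Resolve the "auto" choice to True or False using the rule quoted above.

    Hypothetical helper: the parameter is a string with the choices
    "auto", "True" and "False", so both strings and booleans are accepted.
    """
    if compute_full_tree in (True, "True"):
        return True
    if compute_full_tree in (False, "False"):
        return False
    if compute_full_tree != "auto":
        raise ValueError(f"unknown compute_full_tree value: {compute_full_tree!r}")
    # "auto" is True when a distance threshold is set ...
    if distance_threshold is not None:
        return True
    # ... or when n_clusters is less than the maximum of 100 and 0.02 * n_samples.
    return n_clusters < max(100, 0.02 * n_samples)


def check_parameters(affinity, linkage, n_clusters, distance_threshold,
                     compute_full_tree):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    # If linkage is "ward", only "euclidean" is accepted.
    if linkage == "ward" and affinity != "euclidean":
        problems.append('linkage "ward" requires affinity "euclidean"')
    if distance_threshold is not None:
        # If not None, n_clusters must be None and compute_full_tree must be True.
        if n_clusters is not None:
            problems.append("n_clusters must be None when distance_threshold is set")
        if compute_full_tree in (False, "False"):
            problems.append("compute_full_tree must be True when distance_threshold is set")
    elif n_clusters is None:
        # Assumption (not stated in the text): some stopping criterion is needed.
        problems.append("set either n_clusters or distance_threshold")
    return problems


if __name__ == "__main__":
    # The documented defaults: precomputed affinity, average linkage, 10 clusters.
    print(check_parameters("precomputed", "average", 10, None, "True"))
    print(resolve_compute_full_tree("auto", n_clusters=150, n_samples=10000))
```

Checking the combination before launching a job makes conflicts such as setting both n_clusters and distance_threshold visible early, rather than partway through a run.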

Field Parameters

  • Cluster ID Field (cluster_id_field) type: Field Type: String: The name for the field that will contain the unique cluster ID.
    Default: Cluster ID
  • Cluster Method Field (cluster_method_field) type: Field Type: String: Field name for passing the clustering method to the floe report.
    Default: Cluster Method
  • Cluster Parameters Field (cluster_parameters_field) type: Field Type: String: Field name for passing the cluster parameters to the Floe report.
    Default: Parameters
  • None (coord_list_field) type: Field Type: IntVec:
    Default: coord_list_field
  • Extended Log Field (ext_log_field) type: Field Type: StringVec: Message extended log field
    Default: Extended Log Field
  • None (is_core) type: Field Type: Bool:
    Default: is_core
  • Log Field (log_field) type: Field Type: String: The field that stores messages for the Floe report.
    Default: Log Field
  • None (matrix_size_field) type: Field Type: Int:
    Default: matrix_size
  • None (mol_field) type: Field Type: Chem.Mol:
  • Similarity Score Field (score_field) type: Field Type: Float: Name for the field that stores fingerprint similarity scores.
    Default: similarity_score
  • None (score_list_field) type: Field Type: FloatVec:
    Default: score_list_field
  • None (x_field) type: Field Type: Int:
    Default: x
  • None (y_field) type: Field Type: Int:
    Default: y

Hardware Parameters

Machine hardware requirements

  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800, Min: 256.0, Max: 8589934592
  • Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address
    Default: 64
  • Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU
    Default: 32
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0, Min: 128.0, Max: 8589934592
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0, Max: 16
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1, Min: 1, Max: 128
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”
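
As a rough aid for choosing memory_mb, the arithmetic behind "request a couple hundred MiB more than required" can be sketched as follows. The sketch assumes that the dominant memory cost is one dense n × n matrix of 8-byte values and that 200 MiB is a reasonable reading of "a couple hundred MiB"; both are assumptions, not documented behavior of this cube, and estimate_memory_mb is a hypothetical helper.

```python
import math

MIB = 1048576               # bytes per MiB, as given in the parameter text
MEMORY_MB_MIN = 256.0       # documented minimum for memory_mb
MEMORY_MB_MAX = 8589934592  # documented maximum for memory_mb


def estimate_memory_mb(n_samples, bytes_per_value=8, overhead_mib=200):
    """Estimate a memory_mb request for one dense n_samples x n_samples matrix.

    Assumptions: a single dense matrix dominates memory use, and
    overhead_mib stands in for "a couple hundred MiB" of overhead.
    """
    matrix_bytes = n_samples * n_samples * bytes_per_value
    matrix_mib = math.ceil(matrix_bytes / MIB)
    requested = matrix_mib + overhead_mib
    # Clamp the request to the documented Min/Max range.
    return min(max(requested, MEMORY_MB_MIN), MEMORY_MB_MAX)


if __name__ == "__main__":
    print(estimate_memory_mb(10000))
```

The real footprint depends on how the cube stores its data, so treat the result as a starting point and adjust it using the collected memory metrics.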

Metrics Parameters

Cube Metric Parameters

  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected

    Choices: cpu, disk, memory, network