Sphere Exclusion Clustering
Calculation Parameters
Metric Used To Compute Linkage (affinity) type: string: If linkage is “ward”, only “euclidean” is accepted. If “precomputed”, a distance matrix (instead of a similarity matrix) is needed as input for the fit method. Default: precomputed. Choices: euclidean, l1, l2, manhattan, cosine, precomputed
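With the “precomputed” metric the input must hold distances, not similarities. A minimal sketch of one common conversion, assuming similarity scores lie in [0, 1] (as fingerprint similarity scores typically do) and that distance is taken as 1 - similarity. The helper name `similarity_to_distance` is hypothetical and is not part of the cube:

```python
def similarity_to_distance(sim):
    """Convert a square similarity matrix (values in [0, 1]) to distances.

    Assumption: distance = 1 - similarity. Hypothetical helper, not cube API.
    """
    n = len(sim)
    for row in sim:
        if len(row) != n:
            raise ValueError("similarity matrix must be square")
    dist = [[1.0 - sim[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        dist[i][i] = 0.0  # an item is at distance zero from itself
    return dist


# Example: three items, the first two being highly similar.
sim = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
dist = similarity_to_distance(sim)
```

Other conversions may suit other similarity measures; the choice here is only illustrative.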
Compute Full Tree (compute_full_tree) type: string: Stop construction of the tree early at n_clusters. This is useful to decrease computation time if the number of clusters is not small compared to the number of samples. This option is useful only when specifying a connectivity matrix. Note also that when varying the number of clusters and using caching, it may be advantageous to compute the full tree. It must be True if distance_threshold is not None. The value “auto” is equivalent to True when distance_threshold is not None or when n_clusters is less than the larger of 100 and 0.02 * n_samples. Otherwise, “auto” is equivalent to False. Default: True. Choices: auto, True, False
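The “auto” rule described for compute_full_tree can be sketched as follows. This is an illustration of the documented rule only; the function name `resolve_compute_full_tree` is hypothetical, and the handling of the string values “True” and “False” is an assumption based on the parameter being typed as a string:

```python
def resolve_compute_full_tree(compute_full_tree, n_clusters, n_samples,
                              distance_threshold=None):
    """Return the effective boolean for compute_full_tree (hypothetical helper)."""
    # Assumption: the parameter may arrive as the strings "True" / "False".
    if isinstance(compute_full_tree, str) and compute_full_tree != "auto":
        compute_full_tree = compute_full_tree == "True"

    if distance_threshold is not None:
        # Documented constraints when a distance threshold is set.
        if n_clusters is not None:
            raise ValueError("n_clusters must be None when distance_threshold is set")
        if compute_full_tree is False:
            raise ValueError("compute_full_tree must be True when distance_threshold is set")
        return True

    if compute_full_tree == "auto":
        # True when n_clusters is below the larger of 100 and 2% of the samples.
        return n_clusters < max(100, 0.02 * n_samples)

    return bool(compute_full_tree)


small = resolve_compute_full_tree("auto", n_clusters=10, n_samples=1000)
large = resolve_compute_full_tree("auto", n_clusters=150, n_samples=1000)
```

Here `small` resolves to True (10 is below 100) and `large` resolves to False (150 is not below max(100, 20)).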
Connectivity Matrix (connectivity) type: string: Defines for each sample the neighboring samples following a given structure of the data. This can be a connectivity matrix itself or a callable that transforms the data into a connectivity matrix, such as one derived from kneighbors_graph. Default is None, i.e., the hierarchical clustering algorithm is unstructured.
CPUs (cpu_count) type: integer: The number of CPUs to run this cube with. Default: 1, Min: 1, Max: 128
Cube Metrics (cube_metrics) type: string: Set of metrics to be collected. Choices: cpu, disk, memory, network
Use Diagnostics (debug_mode) type: boolean: Default: False
Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0, Min: 128.0, Max: 8589934592
Distance Threshold (distance_threshold) type: decimal: The linkage distance threshold above which clusters will not be merged. If not None, n_clusters must be None and compute_full_tree must be True.
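To illustrate how a distance threshold determines the number of clusters, the sketch below counts the clusters left once merging stops. It assumes the linkage distances of the successive merges of a full tree are available as a list, and it treats a merge exactly at the threshold as blocked, which is an assumption about tie handling. The function name `clusters_after_threshold` is hypothetical:

```python
def clusters_after_threshold(merge_distances, distance_threshold):
    """Count clusters remaining when merges at or above the threshold are blocked.

    merge_distances: linkage distance of each merge in a full tree, one entry
    per merge, so a tree over n samples has n - 1 entries.
    Assumption: a merge whose distance equals the threshold is blocked.
    """
    blocked = sum(1 for d in merge_distances if d >= distance_threshold)
    # Each blocked merge leaves one extra cluster beyond the single root.
    return blocked + 1


# Five samples yield four merges in a full tree.
merge_distances = [0.1, 0.2, 0.5, 0.9]
n_found = clusters_after_threshold(merge_distances, 0.4)
```

With a threshold of 0.4, the two merges at 0.1 and 0.2 are performed and the two at 0.5 and 0.9 are blocked, leaving three clusters.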
GPUs (gpu_count) type: integer: The number of GPUs to run this cube with. Default: 0, Max: 16
Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma-separated). Default: “”
Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
Linkage (linkage) type: string: Which linkage criterion to use. The linkage criterion determines which distance to use between sets of observations. The algorithm merges the pair of clusters that minimizes this criterion. ‘average’ uses the average of the distances between each observation of the two sets. ‘complete’ or ‘maximum’ linkage uses the maximum distance between all observations of the two sets. ‘single’ uses the minimum of the distances between all observations of the two sets. Default: average. Choices: complete, average, single
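The three linkage criteria can be compared on a small example. The sketch below assumes a precomputed square distance matrix and clusters given as lists of row indices; the function name `linkage_distance` is hypothetical and only illustrates the definitions above:

```python
def linkage_distance(dist, cluster_a, cluster_b, linkage="average"):
    """Distance between two clusters under the given linkage criterion."""
    # All pairwise distances between members of the two clusters.
    pairs = [dist[i][j] for i in cluster_a for j in cluster_b]
    if not pairs:
        raise ValueError("both clusters must be non-empty")
    if linkage == "average":
        return sum(pairs) / len(pairs)
    if linkage == "complete":
        return max(pairs)
    if linkage == "single":
        return min(pairs)
    raise ValueError(f"unknown linkage: {linkage!r}")


dist = [
    [0.0, 0.2, 0.9, 0.7],
    [0.2, 0.0, 0.6, 0.8],
    [0.9, 0.6, 0.0, 0.1],
    [0.7, 0.8, 0.1, 0.0],
]
a, b = [0, 1], [2, 3]
average = linkage_distance(dist, a, b, "average")
complete = linkage_distance(dist, a, b, "complete")
single = linkage_distance(dist, a, b, "single")
```

For these two clusters the pairwise distances are 0.9, 0.7, 0.6 and 0.8, so single linkage gives the smallest value, complete linkage the largest, and average linkage falls between them.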
None (matrix_input_file) type: file_in:
Calculation Cache Output (memory) type: string: Used to cache the output of the computation of the tree. By default, no caching is done. If a string is given, it is the path to the caching directory.
Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800, Min: 256.0, Max: 8589934592
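As a rough guide to sizing memory_mb, the sketch below estimates the footprint of a dense square matrix and adds headroom. It assumes the matrix is held in memory as 8-byte floats and takes 200 MiB as the “couple hundred MiB” of overhead; both are assumptions, the cube's actual memory use may differ, and the helper name `estimate_memory_mb` is hypothetical:

```python
BYTES_PER_MIB = 1048576  # 1 MiB, as stated in the parameter description


def estimate_memory_mb(n_items, bytes_per_value=8, overhead_mb=200.0):
    """Estimate MiB for a dense n_items x n_items matrix plus headroom.

    Assumptions: 8-byte floating-point values and 200 MiB of overhead.
    """
    matrix_bytes = n_items * n_items * bytes_per_value
    return matrix_bytes / BYTES_PER_MIB + overhead_mb


# A 10,000 x 10,000 matrix of 8-byte values is about 763 MiB before headroom.
needed = estimate_memory_mb(10_000)
```

Comparing the estimate against the default of 1800 MiB indicates whether the parameter needs raising for a given input size.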
Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds. Default: 60. Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300. Min: 1, Max: 300
Number of Clusters (n_clusters) type: integer: The number of clusters to find. Default: 10
Output Similarity Matrix (output_similarity_matrix) type: boolean: Default: False
Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU. Default: 32
None (row_label_input_file) type: file_in:
Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address. Default: 64
Similarity Matrix Filename (similarity_matrix_filename) type: string: Default: clustering_similarity_matrix.txt
Spot policy (spot_policy) type: string: Control cube placement on spot market instances. Default: Prohibited. Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
None (use_matrix_input_file) type: boolean: Default: False
Field Parameters
Cluster ID Field (cluster_id_field) type: Field Type: String: The name for the field that will contain the unique cluster ID. Default: Cluster ID
Cluster Method Field (cluster_method_field) type: Field Type: String: Field name for passing the clustering method to the Floe report. Default: Cluster Method
Cluster Parameters Field (cluster_parameters_field) type: Field Type: String: Field name for passing the cluster parameters to the Floe report. Default: Parameters
None (coord_list_field) type: Field Type: IntVec: Default: coord_list_field
Extended Log Field (ext_log_field) type: Field Type: StringVec: Field for extended log messages. Default: Extended Log Field
None (is_core) type: Field Type: Bool: Default: is_core
Log Field (log_field) type: Field Type: String: The field in which to store messages for the Floe report. Default: Log Field
None (matrix_size_field) type: Field Type: Int: Default: matrix_size
None (mol_field) type: Field Type: Chem.Mol:
Similarity Score Field (score_field) type: Field Type: Float: Name for the field that stores fingerprint similarity scores. Default: similarity_score
None (score_list_field) type: Field Type: FloatVec: Default: score_list_field
None (x_field) type: Field Type: Int: Default: x
None (y_field) type: Field Type: Int: Default: y
Hardware Parameters
Machine hardware requirements
Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800, Min: 256.0, Max: 8589934592
Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address. Default: 64
Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU. Default: 32
Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0, Min: 128.0, Max: 8589934592
GPUs (gpu_count) type: integer: The number of GPUs to run this cube with. Default: 0, Max: 16
CPUs (cpu_count) type: integer: The number of CPUs to run this cube with. Default: 1, Min: 1, Max: 128
Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
Spot policy (spot_policy) type: string: Control cube placement on spot market instances. Default: Prohibited. Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma-separated). Default: “”
Metrics Parameters
Cube Metric Parameters
Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds. Default: 60. Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300. Min: 1, Max: 300
Cube Metrics (cube_metrics) type: string: Set of metrics to be collected. Choices: cpu, disk, memory, network