Sphere Exclusion Clustering
Parameter Details
Calculation Parameters
- Metric Used To Compute Linkage (affinity) type: string: If linkage is “ward”, only “euclidean” is accepted. If “precomputed”, the fit method needs a distance matrix (instead of a similarity matrix) as input. Default: precomputed. Choices: euclidean, l1, l2, manhattan, cosine, precomputed
- Compute Full Tree (compute_full_tree) type: string: Controls whether the full tree is computed or construction stops early at n_clusters. Stopping early decreases computation time if the number of clusters is not small compared to the number of samples, and is useful only when a connectivity matrix is specified. When varying the number of clusters and using caching, it may be advantageous to compute the full tree. It must be True if distance_threshold is not None. “auto” is equivalent to True when distance_threshold is not None or n_clusters is less than the maximum of 100 and 0.02 * n_samples; otherwise, “auto” is equivalent to False. Default: True. Choices: auto, True, False
- Connectivity Matrix (connectivity) type: string: Defines, for each sample, the neighboring samples following a given structure of the data. This can be a connectivity matrix itself or a callable that transforms the data into a connectivity matrix, such as one derived from kneighbors_graph. Default is None, i.e., the hierarchical clustering algorithm is unstructured.
- CPUs (cpu_count) type: integer: The number of CPUs to run this cube with. Default: 1, Min: 1, Max: 128
- Cube Metrics (cube_metrics) type: string: Set of metrics to be collected. Choices: cpu, disk, memory, network
- Use Diagnostics (debug_mode) type: boolean. Default: False
- Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0, Min: 128.0, Max: 8589934592
- Distance Threshold (distance_threshold) type: decimal: The linkage distance threshold above which clusters will not be merged. If not None, n_clusters must be None and compute_full_tree must be True.
- GPUs (gpu_count) type: integer: The number of GPUs to run this cube with. Default: 0, Max: 16
- Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated). Default: “”
- Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
- Linkage (linkage) type: string: Which linkage criterion to use. The linkage criterion determines which distance to use between sets of observations. The algorithm merges the pair of clusters that minimizes this criterion. ‘average’ uses the average of the distances between each observation of the two sets. ‘complete’ or ‘maximum’ linkage uses the maximum distance between all observations of the two sets. ‘single’ uses the minimum distance between all observations of the two sets. Default: average. Choices: complete, average, single
- None (matrix_input_file) type: file_in
- Calculation Cache Output (memory) type: string: Used to cache the output of the computation of the tree. By default, no caching is done. If a string is given, it is the path to the caching directory.
- Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800, Min: 256.0, Max: 8589934592
- Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds. Default: 60. Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300. Min: 1, Max: 300
- Number of Clusters (n_clusters) type: integer: The number of clusters to find. Default: 10
- Output Similarity Matrix (output_similarity_matrix) type: boolean. Default: False
- None (row_label_input_file) type: file_in
- Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address. Default: 64
- Similarity Matrix Filename (similarity_matrix_filename) type: string. Default: clustering_similarity_matrix.txt
- Spot policy (spot_policy) type: string: Control cube placement on spot market instances. Default: Prohibited. Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
- None (use_matrix_input_file) type: boolean. Default: False
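The linkage criteria and the interaction between n_clusters, distance_threshold, and compute_full_tree described above can be illustrated with a minimal pure-Python sketch. It is not the cube's implementation: the function names linkage_distance and resolve_compute_full_tree are hypothetical, and the rule that rejects a call with both n_clusters and distance_threshold unset is an assumption that the text above does not state.

```python
from statistics import mean


def linkage_distance(dist, cluster_a, cluster_b, linkage="average"):
    """Distance between two clusters under one of the documented criteria.

    dist is a symmetric list-of-lists distance matrix; cluster_a and
    cluster_b are lists of row indices into that matrix.
    """
    pairwise = [dist[i][j] for i in cluster_a for j in cluster_b]
    if linkage == "average":
        return mean(pairwise)
    if linkage == "complete":
        return max(pairwise)
    if linkage == "single":
        return min(pairwise)
    raise ValueError(f"unsupported linkage: {linkage!r}")


def resolve_compute_full_tree(compute_full_tree, n_clusters,
                              distance_threshold, n_samples):
    """Resolve the string choices "auto", "True", "False" to a bool."""
    if distance_threshold is not None and n_clusters is not None:
        raise ValueError("n_clusters must be None when distance_threshold is set")
    if distance_threshold is None and n_clusters is None:
        # Assumption: one stopping rule is required (not stated in the text).
        raise ValueError("set either n_clusters or distance_threshold")
    if compute_full_tree == "auto":
        if distance_threshold is not None:
            return True
        return n_clusters < max(100, 0.02 * n_samples)
    full = {"True": True, "False": False}[compute_full_tree]
    if distance_threshold is not None and not full:
        raise ValueError("compute_full_tree must be True with distance_threshold")
    return full


# Toy distance matrix for four samples (illustrative values only).
dist = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.6],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.6, 0.2, 0.0],
]
for criterion in ("average", "complete", "single"):
    print(criterion, linkage_distance(dist, [0, 1], [2, 3], criterion))
print(resolve_compute_full_tree("auto", 10, None, 1000))
```

For clusters {0, 1} and {2, 3}, the three criteria differ only in how they reduce the same four pairwise distances, which is why the choice of linkage changes the merge order but not the input the cube needs.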
Field parameters
- Cluster ID Field (cluster_id_field) type: Field Type: String: The name for the field that will contain the unique cluster ID. Default: Cluster ID
- Cluster Method Field (cluster_method_field) type: Field Type: String: Field name for passing the clustering method to the floe report. Default: Cluster Method
- Cluster Parameters Field (cluster_parameters_field) type: Field Type: String: Field name for passing the cluster parameters to the Floe report. Default: Parameters
- None (coord_list_field) type: Field Type: IntVec. Default: coord_list_field
- Extended Log Field (ext_log_field) type: Field Type: StringVec: Message extended log field. Default: Extended Log Field
- None (is_core) type: Field Type: Bool. Default: is_core
- Log Field (log_field) type: Field Type: String: The field to store messages to floe report. Default: Log Field
- None (matrix_size_field) type: Field Type: Int. Default: matrix_size
- None (mol_field) type: Field Type: Chem.Mol
- Similarity Score Field (score_field) type: Field Type: Float: Name for the field that stores fingerprint similarity scores. Default: similarity_score
- None (score_list_field) type: Field Type: FloatVec. Default: score_list_field
- None (x_field) type: Field Type: Int. Default: x
- None (y_field) type: Field Type: Int. Default: y
Hardware Parameters
- Machine hardware requirements
- Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800, Min: 256.0, Max: 8589934592
- Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address. Default: 64
- Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0, Min: 128.0, Max: 8589934592
- GPUs (gpu_count) type: integer: The number of GPUs to run this cube with. Default: 0, Max: 16
- CPUs (cpu_count) type: integer: The number of CPUs to run this cube with. Default: 1, Min: 1, Max: 128
- Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
- Spot policy (spot_policy) type: string: Control cube placement on spot market instances. Default: Prohibited. Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
- Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated). Default: “”
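As a rough guide to sizing memory_mb, the sketch below estimates the footprint of a dense square distance matrix and adds headroom, following the advice to request a couple hundred MiB more than required. The helper estimate_memory_mb is hypothetical, and the 8 bytes per value (64-bit floats) and the 200 MiB overhead are assumptions; the cube's real memory use may differ.

```python
import math

MIB = 1048576  # bytes per MiB, as given in the parameter descriptions

# Limits copied from the memory_mb parameter above.
MEMORY_MB_MIN = 256
MEMORY_MB_MAX = 8589934592


def estimate_memory_mb(n_rows, bytes_per_value=8, overhead_mib=200):
    """Estimate a memory_mb request for a dense n_rows x n_rows matrix.

    Assumes one value of bytes_per_value bytes per matrix cell plus a
    fixed overhead, then clamps the result to the documented limits.
    """
    matrix_bytes = n_rows * n_rows * bytes_per_value
    request = math.ceil(matrix_bytes / MIB) + overhead_mib
    return max(MEMORY_MB_MIN, min(request, MEMORY_MB_MAX))


print(estimate_memory_mb(10_000))  # 763 MiB matrix + 200 MiB overhead = 963
```

Because the matrix grows quadratically with the number of rows, doubling the input size roughly quadruples the request, so the default may be too small for large precomputed matrices under these assumptions.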
Metrics Parameters
- Cube Metric Parameters
- Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds. Default: 60. Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300. Min: 1, Max: 300
- Cube Metrics (cube_metrics) type: string: Set of metrics to be collected. Choices: cpu, disk, memory, network