DBScan Clustering
Calculation Parameters
CPUs (cpu_count) type: integer: The number of CPUs to run this cube with. Default: 1, Min: 1, Max: 128
Cube Metrics (cube_metrics) type: string: Set of metrics to be collected. Choices: cpu, disk, memory, network
Use Diagnostics (debug_mode) type: boolean: Default: False
Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0, Min: 128.0, Max: 8589934592
Epsilon (eps) [OPTIONAL] (eps) type: decimal: If not provided, an eps value will be estimated based on the input data. The epsilon value controls DBSCAN clustering: it is the maximum distance between core cluster members and the maximum single-linkage distance for non-core cluster members. Increase eps to cluster more molecules together in fewer clusters, and decrease eps to cluster fewer molecules together in more clusters. Scores are normalized from 0 to 1, so, for example, a TanimotoCombo similarity score of 1.5 is normalized to a score of 0.75 and an eps/distance of 0.25. Min: 0.01
GPUs (gpu_count) type: integer: The number of GPUs to run this cube with. Default: 0, Max: 16
Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated). Default: “”
Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
None (matrix_input_file) type: file_in:
Maximum Largest Cluster Percentage [OPTIONAL] (max_largest_cluster_percentage) type: decimal: If an eps value is not provided, this will control DBSCAN clustering by setting the maximum percentage of molecules in the largest allowed cluster. Default: 90.0, Min: 1.0, Max: 100.0
Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800, Min: 256.0, Max: 8589934592
Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds. Default: 60; Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300; Min: 1, Max: 300
Minimum Largest Cluster Percentage [OPTIONAL] (min_largest_cluster_percentage) type: decimal: If an eps value is not provided, this will control DBSCAN clustering by setting the minimum percentage of molecules in the largest allowed cluster. Default: 1.0, Min: 1.0, Max: 99.0
Minimum Samples (min_samples) type: integer: This is a control parameter for DBSCAN that has little effect on the clustering in most cases. Across a wide range of literature datasets, a value of 5 (or possibly a range of 5-10) is quite effective. The default value is 5. Default: 10, Min: 1
Output Similarity Matrix (output_similarity_matrix) type: boolean: Default: False
Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU. Default: 32
None (row_label_input_file) type: file_in:
Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address. Default: 64
Similarity Matrix Filename (similarity_matrix_filename) type: string: Default: clustering_similarity_matrix.txt
Spot policy (spot_policy) type: string: Control cube placement on spot market instances. Default: Prohibited; Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
None (use_matrix_input_file) type: boolean: Default: False
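The interplay of eps, min_samples, and the largest-cluster percentage bounds can be illustrated with a minimal, standard-library-only sketch. It is not the cube's implementation: the function names (`similarity_to_distance`, `dbscan`, `largest_cluster_percentage`, `estimate_eps`) are hypothetical, the maximum TanimotoCombo score of 2.0 is an assumption inferred from the 1.5 to 0.75 example, and the linear scan over candidate eps values is only one plausible way an eps could be chosen; how the cube actually estimates eps is not documented here.

```python
from collections import deque


def similarity_to_distance(score, max_score=1.0):
    """Normalize a similarity score to [0, 1], then convert it to a distance."""
    return 1.0 - score / max_score


def dbscan(dist, eps, min_samples):
    """Minimal DBSCAN on a precomputed distance matrix; label -1 means noise."""
    n = len(dist)
    # A point counts itself as a neighbor because dist[i][i] == 0.
    neighbors = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    is_core = [len(nb) >= min_samples for nb in neighbors]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster
        queue = deque([i])
        while queue:
            p = queue.popleft()
            if not is_core[p]:
                continue  # non-core members join a cluster but do not extend it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    queue.append(q)
        cluster += 1
    return labels


def largest_cluster_percentage(labels):
    """Percentage of all items that fall in the largest cluster (noise excluded)."""
    counts = {}
    for lab in labels:
        if lab != -1:
            counts[lab] = counts.get(lab, 0) + 1
    return 100.0 * max(counts.values(), default=0) / len(labels)


def estimate_eps(dist, min_samples, min_pct=1.0, max_pct=90.0, candidates=None):
    """Hypothetical estimate: smallest candidate eps whose largest cluster
    percentage lies within [min_pct, max_pct]; None if no candidate fits."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 101)]  # respects Min: 0.01
    for eps in candidates:
        pct = largest_cluster_percentage(dbscan(dist, eps, min_samples))
        if min_pct <= pct <= max_pct:
            return eps
    return None


# Toy data: seven items on a line, with distances already in [0, 1].
positions = [0.0, 0.05, 0.1, 0.5, 0.55, 0.6, 0.95]
dist = [[abs(a - b) for b in positions] for a in positions]

print(similarity_to_distance(1.5, max_score=2.0))  # 0.25
print(dbscan(dist, eps=0.15, min_samples=3))       # [0, 0, 0, 1, 1, 1, -1]
print(estimate_eps(dist, min_samples=3, min_pct=50.0, max_pct=90.0))
```

In the toy data, eps=0.15 yields two clusters of three items and one noise point. Requiring the largest cluster to hold at least 50% of the items forces a larger eps, at which the isolated item is absorbed as a non-core member of the second cluster.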
Field parameters
Cluster ID Field (cluster_id_field) type: Field Type: String: The name for the field that will contain the unique cluster ID. Default: Cluster ID
Cluster Method Field (cluster_method_field) type: Field Type: String: Field name for passing the clustering method to the Floe report. Default: Cluster Method
Cluster Parameters Field (cluster_parameters_field) type: Field Type: String: Field name for passing the cluster parameters to the Floe report. Default: Parameters
None (coord_list_field) type: Field Type: IntVec: Default: coord_list_field
Extended Log Field (ext_log_field) type: Field Type: StringVec: Message extended log field. Default: Extended Log Field
None (is_core) type: Field Type: Bool: Default: is_core
Log Field (log_field) type: Field Type: String: The field to store messages to the Floe report. Default: Log Field
None (matrix_size_field) type: Field Type: Int: Default: matrix_size
None (mol_field) type: Field Type: Chem.Mol:
Similarity Score Field (score_field) type: Field Type: Float: Name for the field that stores fingerprint similarity scores. Default: similarity_score
None (score_list_field) type: Field Type: FloatVec: Default: score_list_field
None (x_field) type: Field Type: Int: Default: x
None (y_field) type: Field Type: Int: Default: y
Hardware Parameters
Machine hardware requirements
Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800, Min: 256.0, Max: 8589934592
Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address. Default: 64
Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU. Default: 32
Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0, Min: 128.0, Max: 8589934592
GPUs (gpu_count) type: integer: The number of GPUs to run this cube with. Default: 0, Max: 16
CPUs (cpu_count) type: integer: The number of CPUs to run this cube with. Default: 1, Min: 1, Max: 128
Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
Spot policy (spot_policy) type: string: Control cube placement on spot market instances. Default: Prohibited; Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated). Default: “”
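The documented ranges for the hardware parameters can be checked locally before a job is submitted. The following is a minimal sketch under stated assumptions: the `BOUNDS` table and the `check_hardware` helper are hypothetical and are not part of any platform API; only the numeric limits and the spot policy choices are taken from the parameter list.

```python
# Limits copied from the hardware parameter list; None means no documented bound.
BOUNDS = {
    "cpu_count": (1, 128),
    "gpu_count": (None, 16),
    "memory_mb": (256.0, 8589934592),
    "disk_space": (128.0, 8589934592),
}
SPOT_POLICIES = ("Allowed", "Preferred", "NotPreferred", "Prohibited", "Required")


def check_hardware(params):
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    for name, value in params.items():
        if name == "spot_policy":
            if value not in SPOT_POLICIES:
                problems.append(f"spot_policy: {value!r} is not a documented choice")
            continue
        if name not in BOUNDS:
            continue  # no documented range for this parameter
        low, high = BOUNDS[name]
        if low is not None and value < low:
            problems.append(f"{name}: {value} is below the minimum {low}")
        if high is not None and value > high:
            problems.append(f"{name}: {value} is above the maximum {high}")
    return problems


# The documented defaults pass the check.
defaults = {
    "cpu_count": 1,
    "gpu_count": 0,
    "memory_mb": 1800,
    "disk_space": 5120.0,
    "spot_policy": "Prohibited",
}
print(check_hardware(defaults))            # []
print(check_hardware({"cpu_count": 256}))  # one problem reported
```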
Metrics Parameters
Cube Metric Parameters
Metric Period (None) type: decimal: How often to sample metrics, in seconds. Default: 60; Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300; Min: 1, Max: 300
Cube Metrics (None) type: string: Set of metrics to be collected. Choices: cpu, disk, memory, network