DBScan Clustering

Calculation Parameters

  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1, Min: 1, Max: 128
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected

    Choices: cpu, disk, memory, network
  • Use Diagnostics (debug_mode) type: boolean:
    Default: False
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0, Min: 128.0, Max: 8589934592
  • Epsilon (eps) [OPTIONAL] (eps) type: decimal:

    If not provided, an eps value will be estimated based on the input data. The epsilon value controls DBSCAN clustering. This is the maximum distance between core cluster members and the maximum single-linkage distance for non-core cluster members. Increase eps to cluster more molecules together in fewer clusters; decrease eps to cluster fewer molecules together in more clusters. Scores are normalized from 0 to 1, so, for example, a TanimotoCombo similarity score of 1.5 (on its scale of 0 to 2) is normalized to a score of 0.75 and an eps/distance of 0.25.

    Min: 0.01
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0, Max: 16
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: ""
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • None (matrix_input_file) type: file_in:
  • Maximum Largest Cluster Percentage [OPTIONAL] (max_largest_cluster_percentage) type: decimal: If an eps value is not provided, this will control DBSCAN clustering by setting the maximum percentage of molecules in the largest allowed cluster.
    Default: 90.0, Min: 1.0, Max: 100.0
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800, Min: 256.0, Max: 8589934592
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Minimum Largest Cluster Percentage [OPTIONAL] (min_largest_cluster_percentage) type: decimal: If an eps value is not provided, this will control DBSCAN clustering by setting the minimum percentage of molecules in the largest allowed cluster.
    Default: 1.0, Min: 1.0, Max: 99.0
  • Minimum Samples (min_samples) type: integer:

    This is a control parameter for DBSCAN that has little effect on the clustering in most cases. Across a wide range of literature datasets, a value of 5 (or possibly a range of 5-10) is quite effective. The default value is 5.

    Default: 10, Min: 1
  • Output Similarity Matrix (output_similarity_matrix) type: boolean:
    Default: False
  • Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU
    Default: 32
  • None (row_label_input_file) type: file_in:
  • Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address
    Default: 64
  • Similarity Matrix Filename (similarity_matrix_filename) type: string:
    Default: clustering_similarity_matrix.txt
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • None (use_matrix_input_file) type: boolean:
    Default: False
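
The interplay of eps, min_samples, and the largest-cluster percentage described above can be sketched in a few lines of Python. This is a minimal textbook DBSCAN on a precomputed distance matrix, not the cube's actual implementation; the function names (score_to_distance, dbscan, largest_cluster_percentage), the five-molecule toy scores, and the assumption that the percentage is taken over all input molecules are illustrative only.

```python
from collections import deque


def score_to_distance(score, max_score=1.0):
    """Normalize a similarity score to [0, 1] and return 1 - normalized."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    normalized = score / max_score
    if not 0.0 <= normalized <= 1.0:
        raise ValueError("score lies outside [0, max_score]")
    return 1.0 - normalized


def dbscan(dist, eps, min_samples):
    """Textbook DBSCAN on a square distance matrix (list of lists).

    Returns (labels, is_core); a label of -1 marks an unclustered point.
    """
    n = len(dist)
    # A point's neighborhood includes the point itself (distance 0).
    neighbors = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    is_core = [len(nb) >= min_samples for nb in neighbors]
    labels = [-1] * n
    cluster_id = 0
    for seed in range(n):
        if labels[seed] != -1 or not is_core[seed]:
            continue
        labels[seed] = cluster_id
        queue = deque([seed])
        while queue:
            point = queue.popleft()
            for other in neighbors[point]:
                if labels[other] == -1:
                    labels[other] = cluster_id
                    # Only core points keep expanding the cluster.
                    if is_core[other]:
                        queue.append(other)
        cluster_id += 1
    return labels, is_core


def largest_cluster_percentage(labels):
    """Percentage of all points that fall in the largest cluster."""
    clustered = [lab for lab in labels if lab != -1]
    if not clustered:
        return 0.0
    biggest = max(clustered.count(lab) for lab in set(clustered))
    return 100.0 * biggest / len(labels)


def pair_score(i, j):
    """Hypothetical TanimotoCombo scores (0 to 2) for five toy molecules."""
    if i == j:
        return 2.0
    if i < 3 and j < 3:
        return 1.8
    if i >= 3 and j >= 3:
        return 1.6
    return 0.4


dist = [[score_to_distance(pair_score(i, j), max_score=2.0) for j in range(5)]
        for i in range(5)]
labels, is_core = dbscan(dist, eps=0.25, min_samples=2)
print(labels, largest_cluster_percentage(labels))
```

Raising eps to 0.9 in this toy example merges everything into one cluster, while raising min_samples to 3 leaves the last two molecules unclustered, which mirrors the parameter descriptions above.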

Field parameters

  • Cluster ID Field (cluster_id_field) type: Field Type: String: The name for the field that will contain the unique cluster ID.
    Default: Cluster ID
  • Cluster Method Field (cluster_method_field) type: Field Type: String: Field name for passing the clustering method to the Floe report.
    Default: Cluster Method
  • Cluster Parameters Field (cluster_parameters_field) type: Field Type: String: Field name for passing the cluster parameters to the Floe report.
    Default: Parameters
  • None (coord_list_field) type: Field Type: IntVec:
    Default: coord_list_field
  • Extended Log Field (ext_log_field) type: Field Type: StringVec: Message extended log field
    Default: Extended Log Field
  • None (is_core) type: Field Type: Bool:
    Default: is_core
  • Log Field (log_field) type: Field Type: String: The field that stores messages for the Floe report
    Default: Log Field
  • None (matrix_size_field) type: Field Type: Int:
    Default: matrix_size
  • None (mol_field) type: Field Type: Chem.Mol:
  • Similarity Score Field (score_field) type: Field Type: Float: Name for the field that stores fingerprint similarity scores.
    Default: similarity_score
  • None (score_list_field) type: Field Type: FloatVec:
    Default: score_list_field
  • None (x_field) type: Field Type: Int:
    Default: x
  • None (y_field) type: Field Type: Int:
    Default: y

Hardware Parameters

Machine hardware requirements

  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800, Min: 256.0, Max: 8589934592
  • Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address
    Default: 64
  • Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU
    Default: 32
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0, Min: 128.0, Max: 8589934592
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0, Max: 16
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1, Min: 1, Max: 128
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: ""

Metrics Parameters

Cube Metric Parameters

  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected

    Choices: cpu, disk, memory, network