Deduplicate Records Based on Best Similarity Score
This cube deduplicate records based on their primary mol field and a similarity score.
The similarity score field is specified by the Score Field parameter. If the molecule field of two separate records is the same, then the record with the best score is emitted from the best port. Other records are emitted through the other port.
The output ports are:
best : records with the highest score for their input molecule.
other : records without the highest score for their input molecule.
missing : any record with missing molecule field or score
Calculation Parameters
CPUs (cpu_count) type: integer: The number of CPUs to run this cube withDefault: 1 , Min: 1, Max: 128 Cube Metrics (cube_metrics) type: string: Set of metrics to be collectedChoices: cpu, disk, memory, network Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.Default: 5120.0 , Min: 128.0, Max: 8589934592 GPUs (gpu_count) type: integer: The number of GPUs to run this cube withDefault: 0 , Max: 16 Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)Default: “” Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on Max Backlog Wait (max_backlog_wait) type: integer: The max time (in seconds) that a cube will be backlogged on a group before being re-evaluatedDefault: 600 , Min: 300 Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiBs (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.Default: 1800 , Min: 256.0, Max: 8589934592 Metric Period (metric_period) type: decimal: How often to sample metrics, in secondsDefault: 60Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300 Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPUDefault: 32 Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to addressDefault: 64 Spot policy (spot_policy) type: string: Control cube placement on spot market instancesDefault: ProhibitedChoices: Allowed, Preferred, NotPreferred, Prohibited, Required
Field parameters
Extended Log Field (ext_log_field) type: Field Type: StringVec: Message extended log fieldDefault: Extended Log Field None (in_mol_field) type: Field Type: Chem.Mol: Log Field (log_field) type: Field Type: String: The field to store messages to floe reportDefault: Log Field Similarity Score Field (similarity_score_field) type: Field Type: Float: Name for the field that stores fingerprint similarity scores.
Hardware Parameters
- Machine hardware requirements
- Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiBs (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.Default: 1800 , Min: 256.0, Max: 8589934592
- Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to addressDefault: 64
- Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPUDefault: 32
- Max Backlog Wait (max_backlog_wait) type: integer: The max time (in seconds) that a cube will be backlogged on a group before being re-evaluatedDefault: 600 , Min: 300
- Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.Default: 5120.0 , Min: 128.0, Max: 8589934592
- GPUs (gpu_count) type: integer: The number of GPUs to run this cube withDefault: 0 , Max: 16
- CPUs (cpu_count) type: integer: The number of CPUs to run this cube withDefault: 1 , Min: 1, Max: 128
- Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
- Spot policy (spot_policy) type: string: Control cube placement on spot market instancesDefault: ProhibitedChoices: Allowed, Preferred, NotPreferred, Prohibited, Required
- Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)Default: “”
Metrics Parameters
- Cube Metric Parameters
- Metric Period (None) type: decimal: How often to sample metrics, in secondsDefault: 60Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
- Cube Metrics (None) type: string: Set of metrics to be collectedChoices: cpu, disk, memory, network