String Hit List
This cube builds a hit list of records.
It accumulates a hit list of a given size by sorting input records on the string value of the field specified by the Sort Field parameter. The final hit list is sent to the hit_list port. The sort ordering of the records on the hit list is stable.
Records which have a Sort Field but are not included in the hit list are written to the discard output port. The order of records emitted to the discard port is not stable. Any records which do not contain a Sort Field are written to the missing output port. These records are written as encountered so they are in relative input order.
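The routing described above can be sketched in a few lines of Python. This is only an illustration, not the cube's implementation: the `route_records` function, the use of plain dictionaries as records, and the batch (non-streaming) processing are all assumptions made for the example, and string keys are compared with Python's default ordering, which may differ from the cube's. The discard list here happens to come out sorted, which is one of the orders the cube is allowed to produce.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]  # hypothetical stand-in for a real record type


def route_records(
    records: List[Record],
    sort_field: str,
    hit_list_size: int,
    descending: bool = False,
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Return (hit_list, discard, missing) for a batch of records."""
    # Records without the sort field keep their relative input order.
    missing = [r for r in records if sort_field not in r]
    present = [r for r in records if sort_field in r]
    # sorted() is stable, also with reverse=True, so records with equal
    # keys stay in the order in which they were encountered.
    ranked = sorted(present, key=lambda r: str(r[sort_field]), reverse=descending)
    return ranked[:hit_list_size], ranked[hit_list_size:], missing


records = [
    {"id": 1, "name": "pear"},
    {"id": 2, "name": "apple"},
    {"id": 3},
    {"id": 4, "name": "apple"},
    {"id": 5, "name": "fig"},
]
hits, discard, missing = route_records(records, "name", hit_list_size=2)
print([r["id"] for r in hits], [r["id"] for r in discard], [r["id"] for r in missing])
```

The two "apple" records tie on the key, and the stable sort keeps record 2 ahead of record 4 because it arrived first.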
If the Keep Ties parameter is true, ties in the last position of the hit list are kept, so the output hit list may contain more than Hit List Size records. If Keep Ties is false, the hit list is truncated to exactly Hit List Size, irrespective of ties in the last position.
If ties are allowed, the Hit List Truncate parameter guards against pathological behavior for sort fields of low cardinality. If there are a large number of ties, a hit list can grow much larger than the desired hit list size. The Hit List Truncate parameter sets the absolute maximum hit list size, expressed as a multiple of Hit List Size. If the hit list grows to that size, it is truncated without regard to ties.
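The interaction between Keep Ties and Hit List Truncate can be illustrated with a small Python sketch. It is an assumption-laden reading of the text above rather than the cube's actual code: the `select_hits` function is hypothetical, it works on a plain list of string keys, and it assumes that "truncated without regard to ties" means the list is cut at Hit List Size multiplied by Hit List Truncate.

```python
from typing import List


def select_hits(
    keys: List[str],
    hit_list_size: int,
    keep_ties: bool = False,
    hit_list_truncate: int = 4,
    descending: bool = False,
) -> List[str]:
    """Pick the hit list keys, optionally extending it to include ties."""
    ranked = sorted(keys, reverse=descending)
    if not keep_ties or len(ranked) <= hit_list_size:
        return ranked[:hit_list_size]
    # Extend past the nominal size while keys tie with the last kept key.
    last_key = ranked[hit_list_size - 1]
    end = hit_list_size
    while end < len(ranked) and ranked[end] == last_key:
        end += 1
    # Assumed cap: absolute maximum is a multiple of the desired size.
    cap = hit_list_size * hit_list_truncate
    return ranked[: min(end, cap)]


keys = ["b", "a", "a", "a", "a", "a", "c"]
print(select_hits(keys, 2))                                       # ties dropped
print(select_hits(keys, 2, keep_ties=True))                       # ties kept
print(select_hits(keys, 2, keep_ties=True, hit_list_truncate=2))  # capped at 4
```

With five tied "a" keys and a desired size of 2, keeping ties returns all five, while a truncate multiple of 2 caps the list at four entries.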
Calculation Parameters
- CPUs (cpu_count) type: integer: The number of CPUs to run this cube with. Default: 1, Min: 1, Max: 128
- Cube Metrics (cube_metrics) type: string: Set of metrics to be collected. Choices: cpu, disk, memory, network
- Descending (descending) type: boolean: This parameter determines whether the list will be sorted in descending or ascending order. Default: False
- Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0, Min: 128.0, Max: 8589934592
- GPUs (gpu_count) type: integer: The number of GPUs to run this cube with. Default: 0, Max: 16
- Hit List Size (hit_list_size) type: integer: The desired size of the hit list. Default: 1, Min: 1, Max: 100000
- Hit List Truncate (hit_list_truncate) type: integer: The maximum size of hit list, as a multiple of desired size, if ties are allowed. Default: 4, Min: 1
- Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated). Default: “”
- Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
- Keep Ties (keep_ties) type: boolean: This parameter indicates whether to keep identical values even when exceeding the desired hit list size. Default: False
- Max Backlog Wait (max_backlog_wait) type: integer: The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated. Default: 600, Min: 300
- Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800, Min: 256.0, Max: 8589934592
- Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds. Default: 60. Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300. Min: 1, Max: 300
- Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU. Default: 32
- Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address. Default: 64
- Spot policy (spot_policy) type: string: Control cube placement on spot market instances. Default: Prohibited. Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
Field parameters
- String Sort Field (sort_field) type: Field Type: String: Record field containing the key value to sort by
Hit List Parameters
- Hit List Size (hit_list_size) type: integer: The desired size of the hit list. Default: 1, Min: 1, Max: 100000
- Descending (descending) type: boolean: This parameter determines whether the list will be sorted in descending or ascending order. Default: False
- Keep Ties (keep_ties) type: boolean: This parameter indicates whether to keep identical values even when exceeding the desired hit list size. Default: False
- Hit List Truncate (hit_list_truncate) type: integer: The maximum size of hit list, as a multiple of desired size, if ties are allowed. Default: 4, Min: 1
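As a quick way to sanity-check a set of hit list parameter values before launching a job, the documented defaults and bounds can be mirrored in a small Python class. The `HitListParams` class and its `max_hits` property are hypothetical helpers written for this example; they are not part of the cube or of any platform API, and they only encode the defaults and limits listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HitListParams:
    """Hypothetical container mirroring the documented defaults and bounds."""

    hit_list_size: int = 1
    descending: bool = False
    keep_ties: bool = False
    hit_list_truncate: int = 4

    def __post_init__(self) -> None:
        # Bounds taken from the parameter list: Min 1, Max 100000.
        if not 1 <= self.hit_list_size <= 100000:
            raise ValueError("hit_list_size must be between 1 and 100000")
        # Only a minimum is documented for the truncate multiple.
        if self.hit_list_truncate < 1:
            raise ValueError("hit_list_truncate must be at least 1")

    @property
    def max_hits(self) -> int:
        """Upper bound on hit list length implied by the settings."""
        if self.keep_ties:
            return self.hit_list_size * self.hit_list_truncate
        return self.hit_list_size


params = HitListParams(hit_list_size=10, keep_ties=True)
print(params.max_hits)
```

With a desired size of 10, ties kept, and the default truncate multiple of 4, the hit list can hold at most 40 records.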
Hardware Parameters
- Machine hardware requirements
- Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 1800, Min: 256.0, Max: 8589934592
- Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address. Default: 64
- Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU. Default: 32
- Max Backlog Wait (max_backlog_wait) type: integer: The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated. Default: 600, Min: 300
- Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required. Default: 5120.0, Min: 128.0, Max: 8589934592
- GPUs (gpu_count) type: integer: The number of GPUs to run this cube with. Default: 0, Max: 16
- CPUs (cpu_count) type: integer: The number of CPUs to run this cube with. Default: 1, Min: 1, Max: 128
- Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
- Spot policy (spot_policy) type: string: Control cube placement on spot market instances. Default: Prohibited. Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
- Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated). Default: “”
Metrics Parameters
- Cube Metric Parameters
- Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds. Default: 60. Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300. Min: 1, Max: 300
- Cube Metrics (cube_metrics) type: string: Set of metrics to be collected. Choices: cpu, disk, memory, network