Integer Hit List

This cube builds a hit list of records.

It accumulates a hit list of a given size by sorting input records on the integer value of the field specified by the Sort Field parameter. The final hit list is sent to the hit_list port. The sort order of the records on the hit list is stable.

Records that have a Sort Field but are not included in the hit list are written to the discard output port. The order of records emitted to the discard port is not stable. Any records that do not contain a Sort Field are written to the missing output port. These records are written as they are encountered, so they keep their relative input order.
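
The routing across the three ports can be sketched in plain Python. This is only an illustration under stated assumptions, not the cube's implementation: route_records is a hypothetical helper, records are modeled as dictionaries, and the Keep Ties and Hit List Truncate parameters are ignored.

```python
def route_records(records, sort_field, hit_list_size, descending=False):
    """Split records into (hit_list, discard, missing) lists.

    Hypothetical helper: records are plain dicts, not real cube records.
    """
    # Records without the sort field go to "missing", in input order.
    missing = [r for r in records if sort_field not in r]
    keyed = [r for r in records if sort_field in r]
    # sorted() is stable, so records with equal keys keep their input order,
    # including when reverse=True.
    ranked = sorted(keyed, key=lambda r: r[sort_field], reverse=descending)
    return ranked[:hit_list_size], ranked[hit_list_size:], missing


records = [
    {"id": "a", "score": 3},
    {"id": "b"},
    {"id": "c", "score": 1},
    {"id": "d", "score": 3},
    {"id": "e", "score": 2},
]
hits, discard, missing = route_records(records, "score", hit_list_size=2)
print([r["id"] for r in hits])     # ['c', 'e']
print([r["id"] for r in missing])  # ['b']
```

The sketch returns the discarded records in a deterministic order, but the cube makes no such promise for its discard port, so downstream code should not rely on it.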

If the Keep Ties parameter is true, records tied with the last position of the hit list are kept, so the output hit list may contain more than Hit List Size records. If Keep Ties is false, the hit list is truncated to exactly Hit List Size records, regardless of ties in the last position.

If ties are allowed, the Hit List Truncate parameter guards against pathological behavior for lists of low cardinality, that is, lists with few distinct sort values. With a large number of ties, a hit list can grow much larger than the desired hit list size. The Hit List Truncate parameter sets the absolute maximum hit list size, expressed as a multiple of Hit List Size. If the hit list grows to that size, it is truncated without regard to ties.
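
The interaction between Keep Ties and Hit List Truncate can be illustrated with a small calculation of the resulting hit list length. This is a sketch, not the cube's code: hit_list_length is a hypothetical function, it works on already sorted key values, and it assumes that an oversized list is cut back to the cap itself (Hit List Size times Hit List Truncate), which the text above does not state explicitly.

```python
def hit_list_length(sorted_keys, hit_list_size, keep_ties=False, hit_list_truncate=4):
    """Return how many of the sorted keys end up on the hit list."""
    n = min(hit_list_size, len(sorted_keys))
    if not keep_ties or n == 0:
        return n
    last_key = sorted_keys[n - 1]
    # Extend the list over every key tied with the last position.
    while n < len(sorted_keys) and sorted_keys[n] == last_key:
        n += 1
    # Assumption: the absolute maximum is hit_list_size * hit_list_truncate,
    # and a list that exceeds it is cut back to exactly that size.
    cap = hit_list_size * hit_list_truncate
    return min(n, cap)


keys = [1, 2, 2, 2, 2, 2, 2, 5]
print(hit_list_length(keys, 2))                                       # 2
print(hit_list_length(keys, 2, keep_ties=True))                       # 7
print(hit_list_length(keys, 2, keep_ties=True, hit_list_truncate=2))  # 4
```

With the default multiple of 4, the six tied values all fit under the cap of 8; lowering the multiple to 2 caps the list at 4 records even though more ties exist.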

Calculation Parameters

  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1, Min: 1, Max: 128
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected
    Choices: cpu, disk, memory, network
  • Descending (descending) type: boolean: This parameter determines whether the list will be sorted in descending or ascending order.
    Default: False
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0, Min: 128.0, Max: 8589934592
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0, Max: 16
  • Hit List Size (hit_list_size) type: integer: The desired size of the hit list.
    Default: 1, Min: 1, Max: 100000
  • Hit List Truncate (hit_list_truncate) type: integer: The maximum size of hit list, as a multiple of desired size, if ties are allowed.
    Default: 4, Min: 1
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Keep Ties (keep_ties) type: boolean: This parameter indicates whether to keep identical values even when exceeding the desired hit list size.
    Default: False
  • Max Backlog Wait (max_backlog_wait) type: integer: The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
    Default: 600, Min: 300
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800, Min: 256.0, Max: 8589934592
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU
    Default: 32
  • Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address
    Default: 64
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required

Field Parameters

  • Integer Sort Field (sort_field) type: Field Type: Int: Record field containing the key value to sort by

Hit List Parameters

  • Hit List Size (hit_list_size) type: integer: The desired size of the hit list.
    Default: 1, Min: 1, Max: 100000
  • Descending (descending) type: boolean: This parameter determines whether the list will be sorted in descending or ascending order.
    Default: False
  • Keep Ties (keep_ties) type: boolean: This parameter indicates whether to keep identical values even when exceeding the desired hit list size.
    Default: False
  • Hit List Truncate (hit_list_truncate) type: integer: The maximum size of hit list, as a multiple of desired size, if ties are allowed.
    Default: 4, Min: 1
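
The documented ranges for the hit list parameters can be checked before a job is submitted. The following is a minimal sketch under stated assumptions: check_hit_list_params and the BOUNDS table are hypothetical and not part of any cube API. Only the parameter names (hit_list_size, hit_list_truncate, descending, keep_ties) and their limits are taken from the lists on this page.

```python
# Documented (min, max) bounds; None means no upper bound is listed.
BOUNDS = {
    "hit_list_size": (1, 100000),
    "hit_list_truncate": (1, None),
}


def check_hit_list_params(params):
    """Return a list of problems found in a dict of parameter overrides.

    Hypothetical validator for illustration only.
    """
    problems = []
    for name, (low, high) in BOUNDS.items():
        if name not in params:
            continue
        value = params[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{name} must be an integer")
        elif value < low or (high is not None and value > high):
            problems.append(f"{name}={value} is outside the documented range")
    for name in ("descending", "keep_ties"):
        if name in params and not isinstance(params[name], bool):
            problems.append(f"{name} must be a boolean")
    return problems


overrides = {"hit_list_size": 50, "keep_ties": True, "hit_list_truncate": 2}
print(check_hit_list_params(overrides))             # []
print(check_hit_list_params({"hit_list_size": 0}))  # ['hit_list_size=0 is outside the documented range']
```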

Hardware Parameters

Machine hardware requirements
  • Memory (MiB) (memory_mb) type: decimal: The minimum amount of memory in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 1800, Min: 256.0, Max: 8589934592
  • Shared Memory (MiB) (shared_memory_mb) type: decimal: The amount of shared memory to allow a container to address
    Default: 64
  • Thread limit per CPU (pids_per_cpu_limit) type: integer: The number of threads per CPU
    Default: 32
  • Max Backlog Wait (max_backlog_wait) type: integer: The max time (in seconds) that a cube will be backlogged on a group before being re-evaluated
    Default: 600, Min: 300
  • Temporary Disk Space (MiB) (disk_space) type: decimal: The minimum amount of disk space in MiB (1048576 B) this cube requires. Due to overhead, request a couple hundred MiB more than required.
    Default: 5120.0, Min: 128.0, Max: 8589934592
  • GPUs (gpu_count) type: integer: The number of GPUs to run this cube with
    Default: 0, Max: 16
  • CPUs (cpu_count) type: integer: The number of CPUs to run this cube with
    Default: 1, Min: 1, Max: 128
  • Instance Type (instance_type) type: string: The type of instance that this cube needs to be run on
  • Spot policy (spot_policy) type: string: Control cube placement on spot market instances
    Default: Prohibited
    Choices: Allowed, Preferred, NotPreferred, Prohibited, Required
  • Instance Tags (instance_tags) type: string: Only run on machines with matching tags (comma separated)
    Default: “”

Metrics Parameters

Cube Metric Parameters
  • Metric Period (metric_period) type: decimal: How often to sample metrics, in seconds
    Default: 60
    Choices: 1, 5, 10, 30, 60, 120, 180, 240, 300, Min: 1, Max: 300
  • Cube Metrics (cube_metrics) type: string: Set of metrics to be collected
    Choices: cpu, disk, memory, network