Dominant Resource Factor

Given a set of hardware requirements, one of them will be the most constrained. The Dominant Resource Factor (DRF) is a single number in the interval (0, 1] that represents the largest proportion of an instance taken up by any one requested resource. Orion’s DRF is similar to, but not the same as, Dominant Resource Fairness. Orion uses DRF to divide the resources of an instance, instead of using it to schedule cluster-wide as described in the paper.

The effects of DRF may be surprising at first, because a cube can be given more resources than it asked for. However, DRF allows Orion to ensure that no islands of resources are left unusable and that jobs are billed accordingly. Suppose an instance has 8 CPUs and 16GiB of memory and a cube requests 1 CPU and 16GiB. If the request is not modified, the remaining 7 CPUs could not reasonably be allocated to another cube, because no memory would be left for it.
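The factor for the request above can be worked out with a few lines of Python. This is a minimal sketch of the definition only, not Orion's implementation; the function name `dominant_resource_factor` and the dictionary keys are hypothetical names chosen for illustration.

```python
def dominant_resource_factor(request, instance):
    """Return (name, share) for the resource with the largest requested share."""
    # Share of the instance that each requested resource would occupy.
    shares = {name: request[name] / instance[name] for name in request}
    dominant = max(shares, key=shares.get)
    return dominant, shares[dominant]


# The instance and request from the paragraph above.
instance = {"cpu": 8, "memory_gib": 16}
request = {"cpu": 1, "memory_gib": 16}

name, drf = dominant_resource_factor(request, instance)
print(name, drf)  # memory_gib 1.0
```

Memory is requested in full, so the factor is 1 and the cube would be given the whole instance, including the 7 CPUs it did not ask for.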

Before computing DRF, Orion determines the smallest unit of memory that can be allocated on an instance: the total amount of memory divided by the total number of CPUs, divided by eight, then scaled up to a power of 2 that is at least 128MiB. The worker’s resized memory is set to a multiple of this unit that is at least as large as the requested memory. DRF is then calculated, which may increase one or more of the other hardware resources.
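The description of the memory unit can be read in more than one way. The sketch below takes one reading: start at 128MiB and double until the unit is at least total memory / CPUs / 8, then round the request up to a multiple of that unit. The function names, the MiB units, and the "greater than or equal" rounding are assumptions made for illustration, not confirmed Orion behavior.

```python
import math


def memory_unit_mib(total_memory_mib, total_cpus, floor_mib=128):
    """Smallest allocatable memory unit, under the reading described above."""
    per_slice = total_memory_mib / total_cpus / 8
    unit = floor_mib  # assumption: 128MiB is the smallest unit ever used
    while unit < per_slice:
        unit *= 2  # stay on powers of 2
    return unit


def resize_memory_mib(requested_mib, unit_mib):
    """Round the request up to a whole number of units (assumed >=, not >)."""
    return math.ceil(requested_mib / unit_mib) * unit_mib


unit = memory_unit_mib(16384, 8)
print(unit, resize_memory_mib(1000, unit))  # 256 1024
```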

Note

The number of CPUs is rounded up to the next power of 2; a count that is already a power of 2 is left unchanged. If possible, cubes should be written to take advantage of a variable number of processors. Check the ORION_CPU environment variable.
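A cube can adapt to the number of processors it ends up with by reading the environment at run time. The following is a minimal sketch that assumes ORION_CPU holds a whole number as text; the exact format of the variable is not stated here, and the helper names `next_power_of_two` and `allotted_cpus` are hypothetical.

```python
import os


def next_power_of_two(n):
    """Smallest power of 2 that is greater than or equal to n (n >= 1)."""
    power = 1
    while power < n:
        power *= 2
    return power


def allotted_cpus(default=1):
    """Read ORION_CPU, falling back to `default` if it is unset or not an integer."""
    raw = os.environ.get("ORION_CPU", "")
    try:
        return max(int(raw), 1)  # assumption: the value is an integer string
    except ValueError:
        return default


print(next_power_of_two(6))  # 8
workers = allotted_cpus()  # e.g. use as the size of a process pool
```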

Dominant GPU Example:

A Cube requires 1 GPU, leaving other resource requirements to their defaults

Orion selects an instance with 1 GPU, 8 CPUs, 8000MiB memory, and 100GiB storage

GPU has a DRF of 1, so the Cube must be given all resources on that instance

Fractional GPU Examples:

A Molsearch database requires 1 CPU, 0.1 GPU, and 1000 MB of memory on a cdns-g1 instance with 96 CPUs, 8 GPUs, and 1000000 MB of memory. The GPU request is the dominant share at 1/80th of the instance (0.1 of 8 GPUs), which would correspond to 1.2 CPUs. Since the CPUs cannot be divided that way, the CPU count is rounded up to 2, and the actual resource allotment is 2 CPUs, 1/6th of a GPU, and 20833 MB of memory (1/48th of the instance).

A Cube requires 1 CPU and 1/12th of a GPU (0.08333) on a g6e.16xlarge instance with 64 CPUs, 1 GPU, and 497 GB of memory. The 64 CPUs cannot be divided evenly into 12 pieces (64/12 is about 5.33), so the CPU count is rounded up to 8, the next power of 2. The actual resources given are therefore 1/8th of the instance: 8 CPUs, 1/8th of a GPU, and 62125 MB of memory.
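The two fractional GPU results can be reproduced with exact arithmetic. This sketch takes one reading of the rules in this section: find the dominant share, round that share of the CPUs up to a whole number, round again to a power of 2, and grant that fraction of the instance. It matches the numbers above but is not confirmed as Orion's implementation; `instance_fraction`, the dictionary keys, and the cap at the instance size are assumptions.

```python
from fractions import Fraction
import math


def instance_fraction(request, instance):
    """Fraction of the instance granted, under the reading described above."""
    # Dominant share: the largest requested/available ratio, kept exact.
    drf = max(Fraction(request[name]) / instance[name] for name in request)
    # CPUs are whole, so round the dominant share of the CPUs up ...
    cpus = math.ceil(drf * instance["cpu"])
    # ... then up again to a power of 2, capped at the instance size (assumption).
    power = 1
    while power < cpus:
        power *= 2
    return Fraction(min(power, instance["cpu"]), instance["cpu"])


# Values are ints, Fractions, or decimal strings to avoid float rounding.
cdns_g1 = {"cpu": 96, "gpu": 8, "memory_mb": 1_000_000}
molsearch = instance_fraction({"cpu": 1, "gpu": "0.1", "memory_mb": 1000}, cdns_g1)
print(molsearch, molsearch * cdns_g1["gpu"])  # 1/48 1/6

g6e = {"cpu": 64, "gpu": 1, "memory_mb": 497_000}
cube = instance_fraction({"cpu": 1, "gpu": Fraction(1, 12)}, g6e)
print(cube, cube * g6e["memory_mb"])  # 1/8 62125
```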

Dominant Memory Example:

A Cube requires 1 CPU, 2000MiB memory, and 10GiB storage

Orion selects an instance with 8 CPUs, 8000MiB memory, and 100GiB storage

The proportions of requested/available are 0.125 CPU, 0.25 memory, and 0.1 storage

Memory is the dominant resource with a DRF of 0.25, resizing the Cube’s resources to 2 CPUs, 2000MiB memory, and 25GiB storage

Dominant Storage Example:

A Cube requires 1 CPU, 2000MiB memory, and 75GiB storage

Orion selects an instance with 8 CPUs, 8000MiB memory, and 100GiB storage

The proportions of requested/available are 0.125 CPU, 0.25 memory, and 0.75 storage

Storage is the dominant resource with a DRF of 0.75, resizing the Cube’s resources to 8 CPUs, 8000MiB memory, and 100GiB storage

This result may be surprising, since the cube could have been given 75% of each resource. However, 75% of 8 CPUs is 6 CPUs, and Orion scales CPUs up to powers of 2, so the cube is given 8 CPUs and all other resources must be scaled up to 100% to match.
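The memory and storage examples can be checked with a short script. It is a sketch under the same reading as the examples (dominant share, then CPUs rounded up to a power of 2, then every resource scaled to the resulting fraction) and is not Orion's actual code; the function `resize`, the key names, and the cap at the instance's CPU count are assumptions for illustration.

```python
from fractions import Fraction
import math


def resize(request, instance):
    """Return (dominant resource, resized resources) for a request on an instance."""
    shares = {name: Fraction(request[name], instance[name]) for name in request}
    dominant = max(shares, key=shares.get)
    # Dominant share of the CPUs, rounded up to a whole number of CPUs.
    cpus = math.ceil(shares[dominant] * instance["cpu"])
    # Round up to a power of 2, never exceeding the instance (assumption).
    power = 1
    while power < cpus:
        power *= 2
    fraction = Fraction(min(power, instance["cpu"]), instance["cpu"])
    # Every resource is scaled to the same fraction of the instance.
    resized = {name: float(fraction * total) for name, total in instance.items()}
    return dominant, resized


instance = {"cpu": 8, "memory_mib": 8000, "storage_gib": 100}

print(resize({"cpu": 1, "memory_mib": 2000, "storage_gib": 10}, instance))
# ('memory_mib', {'cpu': 2.0, 'memory_mib': 2000.0, 'storage_gib': 25.0})

print(resize({"cpu": 1, "memory_mib": 2000, "storage_gib": 75}, instance))
# ('storage_gib', {'cpu': 8.0, 'memory_mib': 8000.0, 'storage_gib': 100.0})
```

In the storage case the dominant share of 0.75 corresponds to 6 CPUs, which rounds up to 8, so the granted fraction becomes 8/8 and every resource is scaled to the full instance.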