Scaling Groups

This page displays Orion scaling group information.


The Scaling Groups subpage details the compute scaling groups available to Orion. The two tables below list spot and on-demand (non-spot) EC2 instances.

Scaling Groups
Name Description

EC2 instance type and specification.

  • GPU types are highlighted in green.
  • Hovering over the Information icon shows further details about the Auto Scaling Group (ASG).
  • Group Name is either an internal group name or an AWS group name.
  • Affinity is a scheduler feature that increases the scheduler's preference for a group (the default is 0).

Indicates the state of an ASG.

  • Healthy: New instances can be added.
  • Drain: Due to administrative action, existing instances will not be given more work and will be terminated as they finish their current work. New instances will not be added.
  • Not Configured: A new group that will attempt to start a new instance to finish configuration.
  • Scaling Suspended: Existing instances accept work, but new instances cannot be added due to AWS capacity or spot market issues.
  • Failed: Due to persistent errors, existing instances will not be given more work and will be terminated. New instances will not be added.
  • Suspect: Non-scaling AWS errors are occurring. Existing instances will accept work, but new instances cannot be added. If errors continue, the group state will move to “Failed”.
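The state descriptions can be summarized as a small lookup table. The sketch below is illustrative only: the names `AsgState`, `StateBehavior`, `BEHAVIOR`, and `next_state` are hypothetical and not part of Orion. "Accepts work" for Healthy is inferred rather than stated, and Not Configured is left as unknown because its description does not say.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AsgState(Enum):
    HEALTHY = "Healthy"
    DRAIN = "Drain"
    NOT_CONFIGURED = "Not Configured"
    SCALING_SUSPENDED = "Scaling Suspended"
    FAILED = "Failed"
    SUSPECT = "Suspect"


@dataclass(frozen=True)
class StateBehavior:
    accepts_work: Optional[bool]   # None: the state description does not say
    can_add_instances: bool
    terminates_existing: bool      # existing instances are terminated rather than reused


BEHAVIOR = {
    # accepts_work=True for Healthy is inferred; the description only mentions adding instances.
    AsgState.HEALTHY: StateBehavior(True, True, False),
    AsgState.DRAIN: StateBehavior(False, False, True),
    # Tries to start a new instance to finish configuration.
    AsgState.NOT_CONFIGURED: StateBehavior(None, True, False),
    AsgState.SCALING_SUSPENDED: StateBehavior(True, False, False),
    AsgState.FAILED: StateBehavior(False, False, True),
    AsgState.SUSPECT: StateBehavior(True, False, False),
}


def next_state(state: AsgState, errors_continue: bool) -> AsgState:
    """Model the one documented transition: Suspect escalates to Failed if errors continue."""
    if state is AsgState.SUSPECT and errors_continue:
        return AsgState.FAILED
    return state
```

For example, `BEHAVIOR[AsgState.SCALING_SUSPENDED]` reports that existing instances keep accepting work while no new instances can be added.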

The ASG policy:

  • adjusts the desired capacity of the group, keeping it between the minimum and maximum capacity values.
  • launches or terminates instances as needed. The number of instances increases (Up Arrow) or decreases (Down Arrow) dynamically to meet changing conditions.
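A minimal sketch of the capacity rule, assuming the policy simply clamps a requested capacity into the group's [Min. Size, Max. Size] range and compares it with the current instance count. The function name `adjust_capacity` and its parameters are hypothetical, not an Orion or AWS API.

```python
def adjust_capacity(requested: int, current: int, min_size: int, max_size: int):
    """Return (desired, direction) with desired clamped to [min_size, max_size]."""
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    desired = max(min_size, min(requested, max_size))
    if desired > current:
        direction = "up"      # launch instances (Up Arrow)
    elif desired < current:
        direction = "down"    # terminate instances (Down Arrow)
    else:
        direction = "steady"
    return desired, direction
```

For example, `adjust_capacity(12, 4, 0, 10)` caps the request at the maximum and returns `(10, "up")`.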

Shows the state of the ASG and whether it is Active or Deactivated.

Healthy: Number of instances currently available to process work.
Desired Instances: Number of instances the scheduler would like to have as healthy.
Min. Size: Minimum size of the group (the default is 0).
Max. Size: Maximum size of the group (useful for capping spend in Orion).
Usage: Amount of resources currently provided by the ASG.
Cost/Hour: Hourly instance cost. For spot instances, this value updates regularly.
Pool: Scheduler feature used to segregate tasks into separate scaling groups.
Edit: Allows an Orion Stack admin to manage an ASG. Available options are Min. Size, Max. Size, Min Reserve, Affinity, and State.
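Because Cost/Hour is a per-instance figure, multiplying it by Healthy gives a rough current hourly spend, and multiplying it by Max. Size gives a ceiling. The sketch below assumes exactly that; `hourly_spend` is a hypothetical helper, and for spot groups the result is only a snapshot, since the price updates regularly.

```python
from typing import Iterable, Tuple


def hourly_spend(groups: Iterable[Tuple[int, int, float]]) -> Tuple[float, float]:
    """Return (current, ceiling) hourly spend.

    Each group is (healthy, max_size, cost_per_hour), where cost_per_hour is
    the per-instance Cost/Hour value shown for the group.
    """
    groups = list(groups)  # allow any iterable; it is traversed twice
    current = sum(healthy * cost for healthy, _, cost in groups)
    ceiling = sum(max_size * cost for _, max_size, cost in groups)
    return current, ceiling
```

For two groups `(2, 10, 0.5)` and `(0, 4, 3.0)`, the current spend is 1.0 per hour and the ceiling is 17.0 per hour, which shows how lowering Max. Size bounds the worst case.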

As tasks are submitted to Orion, the scheduler decides where to place the work based on the hardware and spot requirements of the Cubes, as well as other factors such as pool and affinity. As the workload grows, more instances are launched. This first appears as an increase in the Desired Instances count; shortly afterward, the Healthy count should catch up. Neither count can exceed the group's Max. Size.
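The placement described in the previous paragraph can be pictured as "filter eligible groups, then prefer higher affinity." The sketch below is a simplified, hypothetical model and not Orion's actual scheduler: it assumes one task needs one new instance, that pool must match exactly, and that ties in affinity go to the first group listed. The names `Group`, `Task`, and `place` are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Group:
    name: str
    pool: str
    spot: bool
    gpu: bool
    affinity: int = 0      # the default affinity is 0
    desired: int = 0
    max_size: int = 0


@dataclass
class Task:
    pool: str = "default"
    needs_gpu: bool = False
    spot_ok: bool = True   # hypothetical flag: can this task run on spot instances?


def place(task: Task, groups: List[Group]) -> Optional[Group]:
    """Pick an eligible group and request one more instance in it."""
    eligible = [
        g for g in groups
        if g.pool == task.pool
        and (g.gpu or not task.needs_gpu)     # GPU tasks need a GPU group
        and (task.spot_ok or not g.spot)      # spot-intolerant tasks avoid spot groups
        and g.desired < g.max_size            # desired may never exceed Max. Size
    ]
    if not eligible:
        return None
    best = max(eligible, key=lambda g: g.affinity)  # higher affinity is preferred
    best.desired += 1
    return best
```

Returning `None` when every eligible group is already at Max. Size mirrors the rule that desired instances cannot exceed the maximum.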

Once work is complete and Orion starts to scale down, the desired count drops much faster than the healthy count. There are two reasons: (1) those instances are likely still finishing their current task, and (2) Orion does not terminate instances as soon as they complete work, because startup takes several minutes (depending on instance type and pricing model); instead, idle instances remain available as hot instances for new work.
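The lag between the desired and healthy counts follows from those two rules. A minimal sketch, assuming a fixed idle grace period before a hot instance is released: the 600-second value, `Instance`, and `instances_to_terminate` are hypothetical, since the actual timeout is not documented here.

```python
from dataclasses import dataclass
from typing import List, Optional

IDLE_GRACE_SECONDS = 600  # hypothetical grace period, not a documented Orion value


@dataclass
class Instance:
    name: str
    busy: bool
    idle_since: Optional[float] = None  # time the instance finished its last task


def instances_to_terminate(instances: List[Instance], desired: int, now: float,
                           grace: float = IDLE_GRACE_SECONDS) -> List[Instance]:
    """Choose which surplus instances may be terminated right now."""
    surplus = len(instances) - desired
    if surplus <= 0:
        return []
    # Busy instances keep running; idle ones stay hot until the grace period passes.
    expired = [
        i for i in instances
        if not i.busy and i.idle_since is not None and now - i.idle_since >= grace
    ]
    expired.sort(key=lambda i: i.idle_since)  # longest-idle first
    return expired[:surplus]
```

Even with the desired count at 0, a busy instance and a recently idled one both survive, so the healthy count stays above the desired count for a while.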

If the desired count stays higher than the healthy count for a long period, the likely cause is either that the spot price has risen above the bid or that spot capacity is limited or unavailable (typically the latter).
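To check for this condition from periodic readings of the two counts, a sketch like the following could flag a sustained shortfall. It assumes time-ordered samples of `(timestamp, desired, healthy)` and a caller-chosen `min_duration`; both the function `sustained_shortfall` and the threshold are hypothetical, since "a long period" is not quantified in this page.

```python
from typing import Iterable, Optional, Tuple


def sustained_shortfall(samples: Iterable[Tuple[float, int, int]],
                        min_duration: float) -> bool:
    """Return True if desired > healthy held continuously for at least min_duration.

    samples: (timestamp, desired, healthy) tuples sorted by timestamp.
    """
    start: Optional[float] = None
    for t, desired, healthy in samples:
        if desired > healthy:
            if start is None:
                start = t              # shortfall begins
            if t - start >= min_duration:
                return True
        else:
            start = None               # counts caught up; reset the window
    return False
```

A `True` result does not identify the cause by itself; per the paragraph above, limited spot availability is the more common explanation.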

Data conversion Floes and special tasks such as Iterative Design can be performed in the system pool, whereas regular jobs use the default pool.