Elasticity - Dynamically Shrink & Expand Workloads

Run:AI’s elasticity feature speeds up training by letting a workload use more resources when they are available, and lets a job run even when there appear to be no free resources.

This capability, which we call Elasticity, works by Expanding and Shrinking workloads:

Expanding Workloads

Convert spare capacity to speed

Run:AI can add available resources to a running workload at runtime to accelerate it. This works because of Run:AI’s automated distributed computing capabilities, which apply data parallelism to running workloads: each set of input samples is divided between the GPUs, each GPU independently calculates an update based on its subset of the samples, and a global update is then calculated from all of the partial updates. This, in turn, translates into automated speedups and increased cluster utilization.
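The data-parallel update described above can be sketched in a few lines. This is a minimal illustration, not Run:AI’s implementation: it uses a hypothetical 1-D linear model (y = w·x with squared-error loss) and plain Python in place of real GPUs, but the structure is the same — split the batch into shards, compute a partial gradient per shard, then average the partial gradients into one global update.

```python
# Illustrative sketch of data parallelism (assumed toy model, not Run:AI's API).
# Model: y = w * x, loss: mean of 0.5 * (w*x - y)^2 over a batch.

def grad(w, batch):
    # Gradient of the mean squared error with respect to w over one batch.
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def data_parallel_step(w, samples, n_workers, lr=0.1):
    # Split the global batch across workers (GPUs, in practice).
    shards = [samples[i::n_workers] for i in range(n_workers)]
    # Each worker computes a partial gradient on its own shard.
    partials = [grad(w, shard) for shard in shards if shard]
    # "All-reduce": average the partial gradients into one global update.
    # (Exactly equal to the full-batch gradient when shards are equal-sized.)
    global_grad = sum(partials) / len(partials)
    return w - lr * global_grad
```

With more workers, each shard is smaller, so each per-worker step is cheaper — which is where the speedup comes from when extra GPUs are added at runtime.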

Shrinking Workloads

Gradient Accumulation helps when resources are limited

Often, when a job’s request for GPUs cannot be satisfied by the system, the job falls into a pending state. With Run:AI’s elasticity feature, the system instead automatically shrinks the workload and runs it on fewer resources than requested.

This works because of a Run:AI feature known as “gradient accumulation”, which lets users control and accelerate training times even on a single GPU. The training process handles a large batch as a series of smaller micro-batches: the model processes one micro-batch, computes its gradients (the updates to the model), accumulates them, and then processes the next micro-batch. Only after all micro-batches are processed is the average of the accumulated gradients applied as a single model update — so the job behaves as if it had run the full batch, even with limited resources.
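The accumulation loop above can be sketched as follows. This is a hedged, toy illustration rather than Run:AI’s actual tool: it reuses a hypothetical 1-D linear model (y = w·x with squared-error loss) and shows how gradients from several micro-batches are accumulated on one device before a single update is applied.

```python
# Illustrative sketch of gradient accumulation (assumed toy model,
# not the API of Run:AI's open-sourced tool).
# Model: y = w * x, loss: mean of 0.5 * (w*x - y)^2 over a batch.

def grad(w, batch):
    # Gradient of the mean squared error with respect to w over one batch.
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def accumulated_step(w, samples, micro_batch_size, lr=0.1):
    # Process the large batch as several micro-batches on one GPU,
    # accumulating gradients instead of updating after each one.
    acc, n_micro = 0.0, 0
    for i in range(0, len(samples), micro_batch_size):
        micro = samples[i:i + micro_batch_size]
        acc += grad(w, micro)   # accumulate; do not update the model yet
        n_micro += 1
    # One update from the averaged gradient, as if the full batch had fit.
    return w - lr * (acc / n_micro)
```

When the micro-batches are equal-sized, the averaged accumulated gradient equals the full-batch gradient, so shrinking the workload changes memory usage, not the resulting update.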

We’ve open-sourced a gradient accumulation tool that can help when resources are limited.

See how you can move AI models into production faster – simply by optimizing GPU resources with Run:AI.