Why Run:AI - Elastic GPU Clusters and Gradient Accumulation

January 13, 2022


Elasticity - Dynamically Shrink & Expand Workloads

Run:AI’s elasticity feature lets a training job use more resources when they are available, which shortens training time, and lets a job start running even when the resources it requested are not all free.

We refer to this as Elasticity, which works by Expanding and Shrinking workloads:

Expanding Workloads - Convert spare capacity to speed

Run:AI can add available resources to running workloads at runtime to accelerate them. This relies on Run:AI’s automated distributed computing capabilities, which apply data parallelism to running workloads. With data parallelism, each batch of input samples is divided among the GPUs. Each GPU computes its update independently from its own subset of the samples, and a global update is then computed from all of the partial updates. The result is automatic speedups and higher cluster utilization.
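The data-parallel idea can be illustrated with a toy example. This is only a minimal sketch in plain Python, not Run:AI's implementation: the "GPUs" are simulated by list shards, the model is a single weight `w` with a squared-error loss, and the helper names (`grad_sum`, `split`, `data_parallel_grad`) are made up for illustration.

```python
# Toy data parallelism: each simulated "GPU" computes a gradient on its shard,
# and the partial results are combined into one global gradient.

def grad_sum(w, shard):
    """Sum of per-sample gradients of 0.5 * (w*x - y)**2 with respect to w."""
    return sum((w * x - y) * x for x, y in shard)

def split(batch, num_workers):
    """Divide a batch into num_workers nearly equal contiguous shards."""
    size, extra = divmod(len(batch), num_workers)
    shards, start = [], 0
    for i in range(num_workers):
        end = start + size + (1 if i < extra else 0)
        shards.append(batch[start:end])
        start = end
    return shards

def data_parallel_grad(w, batch, num_workers):
    """Combine per-shard partial gradients into the mean gradient of the batch."""
    partial = [grad_sum(w, shard) for shard in split(batch, num_workers)]
    return sum(partial) / len(batch)

batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = 0.5
one_gpu = data_parallel_grad(w, batch, num_workers=1)
two_gpus = data_parallel_grad(w, batch, num_workers=2)
print(one_gpu, two_gpus)  # both are the same mean gradient
```

In this sketch the global gradient is identical whether the batch is handled by one worker or split across two; only the amount of work per worker changes, which is where the speedup comes from.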

What is Gradient Accumulation?

Overcoming the problem of batch size and available GPU memory in training neural networks

Run:AI has developed and open sourced an implementation of a technique known as “gradient accumulation”, which lets users run training jobs even when not enough resources are available. The model processes one set of data samples and computes gradients, then processes another set of samples and computes additional gradients. These gradients, the updates to the model, are accumulated rather than applied immediately. Their average is then taken and applied to the model as a single update, so a large batch can be trained even with limited resources for the job.
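The accumulate-then-average step can be sketched in a few lines. This is an illustrative toy in plain Python under stated assumptions, not Run:AI's open-source code: the model is a single weight `w` with a squared-error loss, and `accumulated_step`, `full_batch_step`, and `micro_batch_size` are hypothetical names.

```python
# Toy gradient accumulation: process a large batch as several small
# micro-batches, accumulate their gradients, then apply one averaged update.

def grad_sum(w, samples):
    """Sum of per-sample gradients of 0.5 * (w*x - y)**2 with respect to w."""
    return sum((w * x - y) * x for x, y in samples)

def accumulated_step(w, batch, micro_batch_size, lr):
    """One weight update computed from micro-batches that fit in memory."""
    accumulated = 0.0
    for start in range(0, len(batch), micro_batch_size):
        micro_batch = batch[start:start + micro_batch_size]
        accumulated += grad_sum(w, micro_batch)  # no update yet, only accumulate
    mean_grad = accumulated / len(batch)         # average over the whole batch
    return w - lr * mean_grad                    # single update to the model

def full_batch_step(w, batch, lr):
    """Reference: the same update computed with the whole batch at once."""
    return w - lr * grad_sum(w, batch) / len(batch)

batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w_accumulated = accumulated_step(0.5, batch, micro_batch_size=2, lr=0.125)
w_full = full_batch_step(0.5, batch, lr=0.125)
print(w_accumulated, w_full)
```

Under these assumptions the accumulated update matches the full-batch update, while only `micro_batch_size` samples need to be held in memory at a time.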

Consider the following example. A single machine has two GPUs, and one of them is already in use by a job. When a new job arrives that requires the full machine (two GPUs), it would typically fall into a pending state. That is inefficient, because one GPU is idle and could, in theory, be used. With Run:AI, the job does not have to wait in a pending state. Instead, Run:AI’s elasticity feature shrinks the workload and runs it on a single GPU. It runs more slowly, but the data scientist can start the job right away. When the second GPU becomes available, the job dynamically expands back to use two GPUs.
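One way to picture the shrink and expand behavior in this example is as a choice of how many accumulation steps to run per update. The helper below is a hypothetical sketch, not part of Run:AI's scheduler or API: it assumes the job keeps a fixed per-GPU batch and makes up for missing GPUs with extra accumulation steps, rounding up when the division is not exact.

```python
import math

def accumulation_steps(requested_gpus, available_gpus):
    """Hypothetical helper: accumulation steps needed per update so that the
    job covers the work of `requested_gpus` using only the GPUs it can get."""
    if requested_gpus < 1 or available_gpus < 1:
        raise ValueError("need at least one requested and one available GPU")
    usable = min(requested_gpus, available_gpus)  # never use more than requested
    return math.ceil(requested_gpus / usable)     # assumption: round up

# The scenario above: the job asks for two GPUs.
shrunk = accumulation_steps(requested_gpus=2, available_gpus=1)    # one GPU free
expanded = accumulation_steps(requested_gpus=2, available_gpus=2)  # both free
print(shrunk, expanded)
```

With one GPU free, the job would run two accumulation steps per update; once the second GPU is available, it would return to a single step per update across both GPUs.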
