Run:AI’s elasticity feature lets a training job use more resources when they are available, which shortens training time, and lets a job start even when the cluster does not have all the resources it requested.
Elasticity works by expanding and shrinking workloads:
To expand a workload, Run:AI adds available resources at runtime to accelerate jobs that are already running. This relies on Run:AI’s automated distributed computing capabilities, which apply data parallelism to running workloads. With data parallelism, each batch of input samples is divided among the GPUs. Each GPU computes updates independently from its subset of the samples, and a global update is then computed from all of the partial updates. The result is automatic speedups and higher cluster utilization.
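The data-parallel update described above can be illustrated with a minimal sketch. This is not Run:AI's implementation: the one-parameter model, the function names `grad_mse` and `data_parallel_grad`, and the simulated "GPUs" (plain Python lists) are all assumptions made for illustration. It shows that combining per-GPU partial gradients, weighted by the number of samples each GPU saw, reproduces the gradient of the whole batch.

```python
def grad_mse(w, samples):
    """Gradient of mean squared error for the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in samples) / len(samples)


def data_parallel_grad(w, batch, num_gpus):
    """Split a batch across simulated GPUs and combine the partial gradients.

    Hypothetical helper: each "GPU" is just a slice of the batch.
    """
    # Round-robin split of the batch; drop shards that received no samples.
    shards = [batch[i::num_gpus] for i in range(num_gpus)]
    shards = [s for s in shards if s]
    # Each simulated GPU computes its gradient independently.
    partials = [(grad_mse(w, s), len(s)) for s in shards]
    # Global update: average of the partial gradients, weighted by shard size.
    total = sum(n for _, n in partials)
    return sum(g * n for g, n in partials) / total


batch = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
full_batch_grad = grad_mse(0.5, batch)
two_gpu_grad = data_parallel_grad(0.5, batch, num_gpus=2)
three_gpu_grad = data_parallel_grad(0.5, batch, num_gpus=3)
```

Because the combined gradient matches the full-batch gradient up to floating-point rounding, adding GPUs changes how fast a batch is processed, not the update the model receives.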
To shrink a workload, Run:AI uses a technique known as “gradient accumulation”, for which it has developed and open sourced an implementation, so users can run training jobs even when there are not enough available resources. The model processes one set of data samples and computes gradients (updates to the model), then processes another set of samples and computes additional gradients. The gradients are accumulated across these passes, averaged, and applied to the model in a single update, so training proceeds even with limited resources for the job.
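Gradient accumulation as described above can be sketched in a few lines. This is an illustrative example, not Run:AI's open-source code: the toy model and the names `grad_mse`, `accumulated_step`, `micro_batch_size`, and `lr` are assumptions. The sketch processes a batch in smaller pieces, accumulates the gradients, averages them, and applies one update.

```python
def grad_mse(w, samples):
    """Gradient of mean squared error for the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in samples) / len(samples)


def accumulated_step(w, batch, micro_batch_size, lr):
    """One optimizer step built from several small passes over the batch.

    Hypothetical helper: only `micro_batch_size` samples are processed
    at a time, as if the job had fewer GPUs than it asked for.
    """
    accumulated = 0.0
    seen = 0
    for start in range(0, len(batch), micro_batch_size):
        micro = batch[start:start + micro_batch_size]
        # Accumulate the gradient, weighted by the number of samples in the pass.
        accumulated += grad_mse(w, micro) * len(micro)
        seen += len(micro)
    # Average the accumulated gradients and apply a single update.
    return w - lr * accumulated / seen


batch = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w_full = 0.5 - 0.01 * grad_mse(0.5, batch)
w_accumulated = accumulated_step(0.5, batch, micro_batch_size=2, lr=0.01)
```

In this sketch the accumulated step lands on the same weight as a single full-batch step (up to floating-point rounding), which is why a job can run on fewer resources at the cost of more passes per update.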
Consider the following example. A single machine has two GPUs, and one of them is already in use by a job. When a new job arrives that requires the full machine (two GPUs), it would typically fall into a pending state, which is inefficient because one GPU sits idle and could, in principle, be used. With Run:AI, the new job does not have to wait. Instead, Run:AI’s elasticity feature shrinks the workload and runs it on the single free GPU. It runs more slowly, but the data scientist can start the job right away. When the second GPU becomes available, the job dynamically expands back to two GPUs.
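The placement decision in this example can be sketched as a small function. It is a hypothetical illustration, not Run:AI's scheduler or API: the name `plan_elastic_job`, the returned fields, and the rule that a shrunk job makes up the difference with `ceil(requested / allocated)` accumulation passes are all assumptions.

```python
import math


def plan_elastic_job(requested_gpus, free_gpus):
    """Decide how a job runs given the GPUs that are currently free.

    Hypothetical policy: run on whatever is free (up to the request) and
    compensate for missing GPUs with extra gradient accumulation passes.
    """
    if free_gpus < 1:
        # Nothing is free at all, so the job has to wait.
        return {"state": "pending", "gpus": 0, "accumulation_steps": 0}
    gpus = min(requested_gpus, free_gpus)
    # Assumed rule: one pass per "missing" share of the requested GPUs.
    steps = math.ceil(requested_gpus / gpus)
    return {"state": "running", "gpus": gpus, "accumulation_steps": steps}


# One GPU is busy: the two-GPU job is shrunk onto the free GPU.
shrunk = plan_elastic_job(requested_gpus=2, free_gpus=1)
# The second GPU frees up: the job expands back to its full request.
expanded = plan_elastic_job(requested_gpus=2, free_gpus=2)
```

Under these assumptions the shrunk job runs on one GPU with two accumulation passes per update, and the expanded job runs on two GPUs with one pass per update.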