
Run:AI Eliminates GPU Resource Allocation Issues For AI Workloads

by Fara Hain
October 6, 2021

New Features, ‘Thin GPU Provisioning’ and ‘Job Swapping’, Together Ensure Automated Resource Allocation and Near-100% Utilization

October 6, 2021 – Tel Aviv, Israel. Run:AI, a leader in compute orchestration for AI workloads, today announced two new AI technologies, ‘Thin GPU Provisioning’ and ‘Job Swapping’, which together completely automate the allocation and utilization of GPUs. Combined with the rest of Run:AI’s solution, Thin GPU Provisioning and Job Swapping can bring AI cluster utilization to almost 100%, ensuring that no resources sit idle.

Data scientists often receive an allocation of Graphics Processing Units (GPUs), which reserves those chips so they cannot be used by others, even while they sit idle. With Thin GPU Provisioning and Job Swapping, whenever a running workload is not using its allocated GPUs, those GPUs can be provisioned to and used by a different workload. The approach is similar to classic ‘Thin Provisioning’, first introduced by VMware for Storage Area Networks, in which storage disk space is allocated but not actually provisioned until it is needed.

Thin GPU Provisioning effectively over-provisions GPUs, so that more than one workload can be allocated the same GPU resources. Run:AI’s new Job Swapping feature, developed in parallel, lets the platform seamlessly swap between workloads that share the same GPU allocation, according to pre-set priorities. Together, the two features automatically ensure that enough GPU resources are available for all researchers.

Combined with Run:AI’s other hardware abstraction features, such as over-quota management and splitting GPUs into fractions, Thin GPU Provisioning with Job Swapping gives researchers as much GPU compute power as they need, when they need it. Data scientists no longer have to deal with scheduling and provisioning, because the Run:AI platform abstracts that management away.

Thin GPU Provisioning and Job Swapping are currently being tested in Run:AI customer labs and are expected to be generally available in Q4 2021.

About Run:AI
Run:AI is a cloud-native compute management platform for the AI era. Run:AI gives data scientists access to all of the pooled compute power they need to accelerate AI development and deployment – whether on-premises or in the cloud. The platform provides IT and MLOps with real-time visibility and control over scheduling and dynamic provisioning of GPUs to deliver more than 2X gains in utilization of existing infrastructure. Built on Kubernetes, Run:AI enables seamless integration with existing IT and data science workflows. Learn more at www.run.ai.