Virtualization Software for AI Infrastructure

What Powerful AI Could You Create With Unlimited Compute?

Run:AI's orchestration and virtualization platform optimizes compute for AI infrastructure and speeds up data science initiatives.

AI and Deep Learning

Get More Out of Existing GPU Infrastructure

From Fractional GPU to Multi-Node and Multi-Cluster Scale

Run:AI’s Deep Learning (DL) acceleration platform helps organizations manage GPU resource allocation and increase cluster utilization. Run:AI pools compute resources, then applies advanced scheduling to enforce policies and orchestrate jobs dynamically. IT gains full control over Graphics Processing Unit (GPU) utilization across nodes, clusters, and sites, while data scientists gain easy access to compute when and how they need it.

The Run:AI Platform

The diagram below shows where Run:AI software integrates with your AI software and hardware stack. Integration at each level is seamless and supports the build, train and inference stages of data science workflows.

Virtualization and orchestration layer for containerized AI workloads

Run:AI abstracts data science workflows from the underlying AI infrastructure, pooling all GPU resources, both on-premises and in the cloud. IT teams retain control and gain real-time visibility, while data science teams get automatic access to as many resources as they need.

Guaranteed compute power when you need it 

The platform is built on a virtualization layer designed for containers, in which multiple containers can share a single GPU using memory isolation and shared processing cores. Users can also consume more GPUs than they have been allocated, as long as idle GPUs are available.
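
To make this guaranteed-quota-with-borrowing model concrete, here is a minimal Python sketch. It is a toy model of the idea, not Run:AI's implementation: teams may exceed their guaranteed GPU share while GPUs sit idle, and borrowed GPUs are reclaimed when a within-quota request arrives.

```python
# Toy model of quota-with-borrowing (not Run:AI's implementation).
from dataclasses import dataclass

@dataclass
class Team:
    name: str
    guaranteed: int   # GPUs this team is guaranteed
    in_use: int = 0   # GPUs currently allocated to it

class GpuPool:
    def __init__(self, total: int, teams: list[Team]):
        self.total = total
        self.teams = {t.name: t for t in teams}

    def idle(self) -> int:
        return self.total - sum(t.in_use for t in self.teams.values())

    def request(self, name: str, gpus: int) -> bool:
        team = self.teams[name]
        within_quota = team.in_use + gpus <= team.guaranteed
        if gpus > self.idle() and within_quota:
            self._reclaim(gpus - self.idle())   # evict over-quota borrowers
        if gpus <= self.idle():
            team.in_use += gpus
            return True
        return False                            # pool exhausted: job queues

    def _reclaim(self, needed: int) -> None:
        for t in self.teams.values():
            surplus = min(needed, max(0, t.in_use - t.guaranteed))
            t.in_use -= surplus                 # preempted jobs get requeued
            needed -= surplus

pool = GpuPool(8, [Team("vision", guaranteed=4), Team("nlp", guaranteed=4)])
pool.request("vision", 6)       # borrows 2 idle GPUs over quota
print(pool.request("nlp", 4))   # True: the 2 borrowed GPUs are reclaimed
```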

Automated policy management with the Kubernetes-based Run:AI Scheduler

The scheduler manages requests for compute using smart queuing based on preset priorities. Using gang scheduling, batch scheduling, and topology awareness, the platform can dynamically allocate resources while maximizing GPU utilization.
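
The gang-scheduling idea, that a distributed job receives all of its GPUs at once or none at all, can be sketched as follows. This is an illustrative heuristic, not the Run:AI Scheduler's actual algorithm.

```python
# Illustrative gang-scheduling heuristic (not the Run:AI Scheduler itself):
# a distributed job is placed only if ALL of its GPUs fit somewhere right
# now, so its workers never deadlock waiting for missing peers.
def gang_schedule(job_gpus: int, free_per_node: dict[str, int]) -> dict[str, int] | None:
    """Return a {node: gpus} placement covering the whole job, or None."""
    placement, remaining = {}, job_gpus
    # Fill the emptiest nodes first -- a crude stand-in for topology awareness.
    for node, free in sorted(free_per_node.items(), key=lambda kv: -kv[1]):
        take = min(free, remaining)
        if take:
            placement[node] = take
            remaining -= take
        if remaining == 0:
            return placement    # every worker can start together
    return None                 # not enough GPUs: the whole gang stays queued

cluster = {"node-a": 3, "node-b": 2, "node-c": 1}
print(gang_schedule(4, cluster))   # {'node-a': 3, 'node-b': 1}
print(gang_schedule(8, cluster))   # None -- nothing partial is ever placed
```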

Build, train and inference jobs get the resources they need

Run:AI pools GPUs into logical environments based on data science workflows, so that GPU utilization is maximized for the specific needs of each workload. Data scientists can provision resources automatically, without depending on IT admins.
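
One way to picture these logical environments is as named node pools with per-pool defaults, so build, train, and inference jobs land on hardware suited to them. The pool names, GPU models, and fields below are assumptions for the sketch, not Run:AI configuration.

```python
# Illustration only: "logical environments" expressed as named node pools.
from dataclasses import dataclass

@dataclass(frozen=True)
class NodePool:
    name: str
    gpu_model: str
    default_gpus_per_job: float   # fractions allowed for light workloads
    preemptible: bool             # may the scheduler reclaim these jobs?

POOLS = {
    "build":     NodePool("build",     "T4",   0.25, preemptible=True),
    "train":     NodePool("train",     "A100", 4.0,  preemptible=False),
    "inference": NodePool("inference", "T4",   0.5,  preemptible=False),
}

print(POOLS["train"].gpu_model)   # A100: training lands on the big GPUs
```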

Simple management of complex workloads

Easily onboard new users, maintain existing hardware, add new hardware to the infrastructure pool, and gain visibility, including a holistic view of GPU usage and utilization.

Accelerate Time-to-Market for AI

Kubernetes-based Batch System Dynamically Allocates Resources

By moving from static to dynamic resource allocation, researchers can use as much compute as their models need and see results from DL modeling two to three times faster. Run:AI’s unique approach adds GPU virtualization and advanced scheduling capabilities to Kubernetes, allowing researchers to work in a familiar environment and launch jobs easily, without waiting for IT to allocate resources.
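
Because the platform builds on Kubernetes, a researcher can submit a job straight through the Kubernetes API. The sketch below uses the official Kubernetes Python client; the schedulerName value and the project label are assumptions for illustration, while nvidia.com/gpu is the standard device-plugin resource name. Consult the Run:AI documentation for the actual fields.

```python
# Minimal sketch: launching a 2-GPU training job with the Kubernetes client.
from kubernetes import client, config

config.load_kube_config()   # or load_incluster_config() inside the cluster

job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "train-resnet", "labels": {"project": "vision"}},
    "spec": {
        "template": {
            "spec": {
                "schedulerName": "runai-scheduler",   # assumed name
                "restartPolicy": "Never",
                "containers": [{
                    "name": "trainer",
                    "image": "my-registry/train:latest",
                    "resources": {"limits": {"nvidia.com/gpu": "2"}},
                }],
            }
        }
    },
}

client.BatchV1Api().create_namespaced_job(namespace="team-vision", body=job)
```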

Fractional GPU is a GPU sharing system for containers. Especially suited for AI workloads such as inference and lightweight training tasks, the fractional GPU system transparently gives data science and AI engineering teams the ability to run multiple workloads simultaneously on a single GPU.
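
As a sketch of what a fractional request could look like: the gpu-fraction annotation below is a hypothetical stand-in, not a documented Run:AI API, and the fraction format is an assumption; check the product documentation for the actual mechanism.

```python
# Sketch only: asking for half a GPU for an inference pod.
from kubernetes import client, config

config.load_kube_config()

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "infer-bert",
        "annotations": {"gpu-fraction": "0.5"},   # assumed key and format
    },
    "spec": {
        "schedulerName": "runai-scheduler",       # assumed, as above
        "containers": [{
            "name": "server",
            "image": "my-registry/infer:latest",
        }],
    },
}

client.CoreV1Api().create_namespaced_pod(namespace="team-nlp", body=pod)
```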

Proud to Partner with

"With Run:AI we’ve seen great improvements in speed of experimentation and GPU hardware utilization. Reducing time to results ensures we can ask and answer more critical questions about people’s health and lives."

– M. Jorge Cardoso
Associate Professor & Senior Lecturer in AI, London Medical Imaging & AI Centre for Value-Based Healthcare

Integrations