The Run:AI deep learning virtualization platform, now supporting Kubernetes, brings control and visibility to IT teams supporting data science initiatives
Tel Aviv — 17 March, 2020 — Run:AI, a company virtualizing AI infrastructure, today announced the general availability of its deep learning virtualization platform. Now supporting Kubernetes-based infrastructures, Run:AI’s solution enables IT departments to set up and manage the critical AI infrastructure that data science teams need, providing control and visibility while maximizing hardware utilization and development velocity.
Data science workloads often need ‘greedy’ access to multiple computing resources such as GPUs for hours on end, but instead face bottlenecks and long experimentation times. Typically, data scientists are statically allocated a few GPUs each, with those expensive hardware resources sitting idle when not used. IT departments struggle to allocate the right amount of resources to data science teams, suffering from poor visibility and a lack of control. Data scientists, meanwhile, either have more GPU capacity than they can currently use, or are limited when they try to run large experiments.
Instead of statically assigning GPUs to data scientists, Run:AI creates a pool of GPU resources and automatically and elastically "stretches" a workload to run over multiple GPUs when they are available. Important jobs can be given guaranteed quotas, which they can also exceed; Run:AI's software scales workloads to the available hardware based on defined priorities. To simplify workflows, Run:AI's virtualization platform plugs into Kubernetes with a single line of code. The platform's visibility tools enable companies to understand how their GPU resources are being used by their data science teams, helping with infrastructure scaling and identifying bottlenecks.
“About six months ago, we decided to make our scheduler available as a plug-in to Kubernetes,” said Dr. Ronen Dar, CTO and co-founder of Run:AI. “This approach was based on the widespread adoption of containers and Kubernetes as the de facto platform for AI workloads. Containers are portable, lightweight, and a good fit for experiments that need to run for days or weeks on end. Building our platform to simply plug in to Kubernetes makes it seamless to install and requires no additional training or change to a data scientist’s workflows.”
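The release doesn’t spell out the integration mechanics, but Kubernetes natively supports out-of-tree schedulers: a pod opts in by setting the `schedulerName` field in its spec, and the default scheduler then leaves that pod alone. A minimal sketch of how such a plug-in scheduler is typically selected — the scheduler name, image, and job details below are illustrative assumptions, not Run:AI’s documented configuration:

```yaml
# Illustrative only: the scheduler name, image, and command are assumptions,
# not Run:AI's actual configuration.
apiVersion: batch/v1
kind: Job
metadata:
  name: train-model
spec:
  template:
    spec:
      schedulerName: runai-scheduler   # hand this pod to the plug-in scheduler
      containers:
      - name: trainer
        image: tensorflow/tensorflow:latest-gpu
        command: ["python", "train.py"]
        resources:
          limits:
            nvidia.com/gpu: 1          # request one GPU from the shared pool
      restartPolicy: Never
```

The `schedulerName` field is standard Kubernetes: the default scheduler ignores pods that name another scheduler, which is how a custom scheduler can take over placement without changing a data scientist’s existing workflow.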
Since leaving stealth in April 2019, Run:AI has expanded its team and is working with dozens of customers while refining the platform. Run:AI’s elastic AI infrastructure management system will develop into a full virtualization layer for deep neural networks.
“Deep learning is creating whole new industries and transforming old ones,” said Omri Geller, co-founder and CEO of Run:AI. “Now it’s time for computing to adapt to deep learning. Run:AI gives both IT and data scientists what they need to get the most out of their GPUs, so they can innovate and iterate their models faster to produce the advanced AI of the future.”
Run:AI has built the world’s first virtualization layer for AI workloads. By abstracting workloads from the underlying infrastructure, Run:AI creates a shared pool of resources that can be dynamically provisioned, enabling full utilization of expensive GPU compute. IT teams retain control and gain real-time visibility – including run-time, queueing and GPU utilization – from a single web-based UI. This virtual pool of resources enables IT leaders to view and allocate compute resources across multiple sites, whether on premises or in the cloud. The Run:AI platform is built on top of Kubernetes, enabling simple integration with existing IT and data science workflows.