Why Should AI Infrastructure Be Virtualized?

Pool resources to gain control and visibility

AI workloads often run on bare metal, with compute resources allocated statically to individual data scientists. Static allocation limits experiment size and speed, leaves GPUs underutilized, and deprives IT of control.

Today, enterprises expect the simplicity and resource availability of the virtualized data center. But as they adopt deep learning (DL) initiatives, they find that manageability at scale, simplified maintenance, and visibility, all hallmarks of virtualized infrastructure, are not necessarily available in the AI stack. Virtualization for AI needs to support the unique nature of data science workloads while remaining easy for IT to manage and maintain.

From Static to Dynamic Allocation of Resources

At each stage of the deep learning process, data scientists have specific needs for compute resources. Build stages require CPU or GPU resources in interactive sessions. Training is highly compute-intensive and requires considerable GPU power. Performance and speed are critical, but training demand is erratic: sometimes many workloads run concurrently, and at other times none run at all while data scientists optimize their models. The inference stage typically requires lower GPU utilization. Dynamic resource allocation that takes the stages of the deep learning process into account is critical for AI development.
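The difference between static and dynamic allocation can be made concrete with a small sketch. The Python below is illustrative only, not Run:AI's implementation: the GpuPool class, the workload names, and the GPU counts are all hypothetical, chosen to show how GPUs released by one stage become available to others.

```python
from dataclasses import dataclass, field


@dataclass
class GpuPool:
    """A hypothetical shared pool of identical GPUs (illustration only)."""
    total: int
    in_use: dict = field(default_factory=dict)  # workload name -> GPUs held

    @property
    def free(self) -> int:
        # GPUs not currently held by any workload.
        return self.total - sum(self.in_use.values())

    def request(self, workload: str, gpus: int) -> bool:
        """Grant the request only if enough GPUs are idle right now."""
        if gpus <= 0 or gpus > self.free:
            return False
        self.in_use[workload] = self.in_use.get(workload, 0) + gpus
        return True

    def release(self, workload: str) -> int:
        """Return every GPU held by a workload to the pool."""
        return self.in_use.pop(workload, 0)


# Assumed cluster size and per-stage demands, for illustration.
pool = GpuPool(total=8)
pool.request("build-notebook", 1)   # interactive build session
pool.request("training-run", 6)     # compute-intensive training
pool.request("inference", 1)        # lighter inference job
free_during_training = pool.free

pool.release("training-run")        # training ends; its GPUs return to the pool
free_after_training = pool.free
```

With static allocation, the six GPUs used for training would stay reserved for their owner even while idle. In this sketch they return to the pool as soon as the training run releases them.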

AI Workloads are Containerized

Run:AI Kubernetes plugin simplifies workflows

Data scientists use containers for the agility and portability their work demands. Kubernetes is currently the de facto tool for orchestrating containerized applications in enterprise IT environments. For this reason, Run:AI was built as a Kubernetes plugin that enhances Kubernetes' scheduling capabilities to support the existing workflows of data scientists.

Deep Learning Requires a Different Paradigm

Traditional computing uses virtualization to share a single physical resource among multiple workloads. Deep learning, however, requires a paradigm shift in virtualization, one that incorporates elements of distributed computing. For AI, virtualization should let a single workload accelerate by taking as many resources as it needs. Run:AI’s “greedy” approach to virtualization better suits DL workloads, which are highly compute-intensive and often fully utilize hardware accelerators in parallel for days or even weeks.
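As a rough illustration of the "greedy" idea, the sketch below contrasts a fixed per-user quota with an allocation that lets one workload take every idle GPU it can use. It is a simplified model under stated assumptions, not Run:AI's scheduling algorithm: the function names, the cluster size, and the quota are hypothetical.

```python
def static_grant(wanted: int, quota: int) -> int:
    """Static allocation: a workload never exceeds its fixed quota."""
    return min(wanted, quota)


def greedy_grant(wanted: int, idle: int) -> int:
    """Greedy allocation: a workload takes every idle GPU it can use."""
    return min(wanted, idle)


# Assumed figures, for illustration only.
cluster_gpus = 16   # total GPUs, assumed otherwise idle
quota = 4           # fixed share per data scientist under static allocation
wanted = 12         # GPUs a distributed training job could use in parallel

static_gpus = static_grant(wanted, quota)
greedy_gpus = greedy_grant(wanted, cluster_gpus)
```

Under these assumptions, the static quota caps the job at 4 GPUs, while the greedy grant gives it all 12 it can use. When fewer GPUs are idle than the job wants, the greedy grant falls back to whatever is available.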

See how you can move AI models into production faster – simply by optimizing GPU resources with Run:AI.