At the AI Summit in NYC, Omri Geller, CEO of Run:AI, spoke on the subject of GPU management. This was his session topic:
Can New Approaches to GPU Machine Management Speed Delivery of Deep Learning Projects?
In many organizations that have taken on deep learning (DL) initiatives, GPU resources are allocated statically or even managed manually in spreadsheets. Companies buy expensive GPU servers but lack the control, visibility, and tooling to use those GPUs efficiently for their users. This leads to training bottlenecks and resource allocation issues, hurting the productivity of data scientists, lengthening the time to production for DL projects, and driving up infrastructure costs.
In this session, led by Omri Geller of Run:AI, we will examine new approaches to the problem of GPU machine management that can maximize resource utilization for deep learning:
- What’s been done in the past?
- Could Kubernetes, which is rapidly growing in adoption, solve the infrastructure management problem on its own?
- Are advanced scheduling concepts applicable?
- What can we learn or adapt from HPC for DL projects?
- Can virtualization concepts be applied to better utilize GPUs?
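To make the Kubernetes question above concrete: out of the box, Kubernetes exposes GPUs as the extended resource `nvidia.com/gpu` (via NVIDIA's device plugin), and a pod can only request them in whole units, which is one reason static allocation and underutilization come up. A minimal sketch of such a request, with an assumed job name and image:

```python
# Minimal sketch: how a Kubernetes pod requests GPUs today.
# GPUs appear as the extended resource "nvidia.com/gpu" (provided by
# NVIDIA's device plugin) and can only be requested in whole units --
# no fractional sharing -- which contributes to the static-allocation
# problem described above. Names and images here are illustrative.

def gpu_pod_manifest(name: str, image: str, gpus: int) -> dict:
    """Build a pod manifest dict that requests `gpus` whole GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                "resources": {
                    # Integer units only: a value like 0.5 is rejected.
                    "limits": {"nvidia.com/gpu": gpus},
                },
            }],
        },
    }

manifest = gpu_pod_manifest("train-job", "tensorflow/tensorflow:latest-gpu", 2)
```

Because the scheduler treats each GPU as an indivisible unit pinned to one pod, a short or idle job still holds the whole device, motivating the virtualization-style approaches the session explores.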
Feel free to watch the talk here.