At the AI Summit in NYC, Omri Geller, CEO of Run:AI, spoke on the subject of GPU management. This was his session topic:
In many organizations that have taken on deep learning (DL) initiatives, GPU resources are allocated statically or even managed manually in spreadsheets. Companies buy expensive GPU servers but lack the control, visibility, or tooling to maximize GPU utilization for their users. This leads to training bottlenecks and resource-allocation issues, hurting the productivity of data scientists, slowing time to production for DL projects, and rapidly escalating infrastructure costs.
In this session, led by Omri Geller of Run:AI, we will examine new approaches to GPU machine management that can maximize resource utilization for deep learning.
Feel free to watch the full talk here.