Announcing Run:ai v2.17

April 15, 2024

Today, we are excited to announce Run:ai v2.17. With this release, we run toward our goal of optimizing AI resources while easing the daily routines of platform engineers and data scientists to increase your organization's ROI.

Accelerating Experimentation: Internal LLM Catalog for Enterprises

Introducing LLM Catalog by Run:ai, a solution for organizations to run, maintain, and secure their internal LLM catalog. Recognizing the security concerns of sending prompts outside the organization and the challenges of complex deployment processes, the catalog simplifies management and deployment, empowering organizations to integrate the latest models into their environment with just a few clicks. By packaging models into deployable images, the LLM Catalog removes the burden of manual server and engine setup for ML engineers, allowing data scientists to dive into state-of-the-art models without delay. Furthermore, Run:ai's auto-scaling feature optimizes resource utilization, dynamically adjusting to demand and scaling to zero during idle periods to minimize costs. For more information, check out our blog post.

LLM Catalog
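
To give a concrete feel for what this looks like from a data scientist's seat, here is a minimal Python sketch of querying a catalog model once it has been deployed as an auto-scaling inference workload. The endpoint URL, request payload, and response fields are illustrative assumptions, not the catalog's documented API:

```python
import requests

# Illustrative only: the endpoint URL, request schema, and response fields
# below are assumptions, not the documented API of an LLM Catalog deployment.
ENDPOINT = "https://cluster.example.com/models/my-llm/generate"  # hypothetical URL
API_TOKEN = "<token issued by your platform administrator>"

def generate(prompt: str, max_tokens: int = 256) -> str:
    """Send a prompt to the deployed model and return the generated text."""
    response = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"prompt": prompt, "max_tokens": max_tokens},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["text"]  # hypothetical response field

if __name__ == "__main__":
    print(generate("Summarize the benefits of fractional GPU allocation."))
```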

Core Enhancements: Dynamic GPU Fractions and Node Level Scheduler

Version 2.17 brings major core enhancements for users: Dynamic GPU Fractions and the Node Level Scheduler. These two cutting-edge features optimize GPU resource management for AI workloads.

Dynamic GPU Fractions enable dynamic allocation of your GPU resources based on real-time workload demands, maximizing resource utilization and minimizing waste. This feature allows users to specify the fraction of GPU memory or compute resources required for their workloads along with an upper limit. These workloads are guaranteed to get the required fraction while retaining the flexibility to allocate resources up to the specified upper limit. Complementing this is the Node Level Scheduler, which enhances GPU utilization and pod performance by making localized decisions on GPU allocation based on the node's internal GPU state. For a more detailed overview of both, check out our blog post.
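
As a rough illustration of the request/limit idea, the sketch below builds a pod manifest that asks for a guaranteed fraction of a GPU with a higher burst limit. The annotation keys and image name are illustrative assumptions; consult the Run:ai documentation for the exact scheme supported by your cluster version:

```python
import json

# Illustrative sketch of requesting a fractional GPU with an upper limit.
# The annotation keys below are assumptions for illustration, not the
# documented configuration; refer to the Run:ai docs for your version.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "train-fractional",
        "annotations": {
            # Guaranteed share of the GPU (hypothetical key)
            "gpu-fraction": "0.4",
            # Upper bound the workload may burst to when the GPU is idle (hypothetical key)
            "gpu-fraction-limit": "0.8",
        },
    },
    "spec": {
        "schedulerName": "runai-scheduler",
        "containers": [
            {
                "name": "trainer",
                "image": "my-registry/trainer:latest",  # placeholder image
            }
        ],
    },
}

print(json.dumps(pod_manifest, indent=2))
```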

API-First Approach: Enhanced Telemetry and Metrics of your Workloads

2.17 delivers enhanced telemetry and metrics capabilities with our latest API and metrics service. Workload telemetry data is now easy to access and analyze through a RESTful API and pre-built dashboards, empowering infrastructure teams to consume it programmatically and build customized reporting and data analysis. Tailored for various roles, from data analysts to executives, the Run:ai platform offers granular data access, custom filters, and advanced queries for in-depth analysis. For more information, check out our blog post.
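
As an illustrative sketch of consuming telemetry programmatically, the Python snippet below pulls a workload list and prints a simple GPU-utilization summary. The base URL, endpoint path, and response fields are assumptions for illustration; refer to your control plane's API reference for the exact routes:

```python
import requests

# Illustrative sketch of pulling workload telemetry over the REST API.
# The base URL, endpoint path, and field names are assumptions for
# illustration; check your Run:ai API documentation for the exact routes.
BASE_URL = "https://myorg.example.run.ai"  # hypothetical control-plane URL
TOKEN = "<API token obtained from the platform>"

def list_workload_metrics():
    """Fetch workloads and print a simple GPU-utilization summary."""
    resp = requests.get(
        f"{BASE_URL}/api/v1/workloads",  # assumed endpoint path
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    for workload in resp.json().get("workloads", []):
        name = workload.get("name")
        gpu_util = workload.get("gpuUtilization")  # assumed field name
        print(f"{name}: GPU utilization {gpu_util}%")

if __name__ == "__main__":
    list_workload_metrics()
```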

Accelerate your distributed training jobs with PyTorch Elastic Workloads

In addition to the native support for submitting distributed PyTorch workloads, 2.17 includes integration of PyTorch elastic workloads. Elastic workloads enable data scientists to submit distributed training jobs that can dynamically scale up or down based on available resources in the cluster while maintaining their state.

The key advantage of elastic workloads is that they can run minimally or grow to a very large number of pods. They retain their state even when scaled down and seamlessly expand again when resources become available. With this update, data scientists can submit elastic workloads using the Run:ai CLI, maximizing resource utilization in the cluster and accelerating training job times.
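
For illustration, an elastic-friendly training script typically reads its rank from the environment set by the launcher and checkpoints state to shared storage so it can survive scale-down and resume when pods are added back. The following is a minimal PyTorch sketch under those assumptions (the model, data, and checkpoint path are placeholders), not a prescribed Run:ai template:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

CHECKPOINT = "/shared/checkpoints/model.pt"  # hypothetical shared storage path

def main():
    # The elastic launcher injects RANK/WORLD_SIZE/LOCAL_RANK; they can change
    # between restarts as the job scales up or down.
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    local_rank = int(os.environ.get("LOCAL_RANK", 0))

    model = torch.nn.Linear(128, 10).to(local_rank)  # toy model for illustration
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    start_epoch = 0
    # Resume from the last checkpoint so state survives a scale event.
    if os.path.exists(CHECKPOINT):
        state = torch.load(CHECKPOINT, map_location=f"cuda:{local_rank}")
        model.module.load_state_dict(state["model"])
        optimizer.load_state_dict(state["optimizer"])
        start_epoch = state["epoch"] + 1

    for epoch in range(start_epoch, 100):
        data = torch.randn(32, 128, device=local_rank)        # placeholder batch
        target = torch.randint(0, 10, (32,), device=local_rank)
        loss = torch.nn.functional.cross_entropy(model(data), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if rank == 0:  # one worker persists the shared checkpoint
            torch.save(
                {"model": model.module.state_dict(),
                 "optimizer": optimizer.state_dict(),
                 "epoch": epoch},
                CHECKPOINT,
            )

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```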

Improved Data Scientist Experience and Improved Monitoring/Metrics

Building upon the new workloads experience introduced in 2.16, workloads get even better in 2.17. Monitoring workload performance is now easier with per-pod, per-GPU utilization metrics. Users can follow the utilization of each individual GPU in the desired pod, zoom in and out, and analyze workload performance. Additionally, CPU limits are now visible on the metrics graph, providing a better understanding of the workload state.

2.17 also includes user experience improvements: users can now view a workload's external connections, such as Node Ports and Ingress, and directly connect to supported tools or copy the connection details to the clipboard. Researchers can now see in the UI which jobs are preemptible, providing clarity on workload status. Furthermore, users can view and copy to the clipboard the CLI command syntax for workloads submitted via the CLI, enhancing ease of use. For more information, check out our blog post.

As Run:ai continues to innovate, stay tuned for more features to meet workload management needs.

Ready to get started? Book your demo today and see how Run:ai can help you accelerate AI development and increase efficiency.