LLM Catalog by Run:ai

Enterprises today face a pressing need to leverage cutting-edge large language models (LLMs) while managing the risks around data security and deployment complexity. New open-source models appear every day, opening up possibilities across many domains. Keeping up with this pace while deploying models securely and efficiently is a significant challenge, especially on your own on-prem or cloud infrastructure. To address these challenges head-on, we are introducing the LLM Catalog in our 2.17 release.

The Need: An Internal Model Hub with a Ready-to-Use Interface

Enterprises increasingly aim to create internal model hubs to enable experimentation and integration of open-source models into their projects. However, the essential requirement is that these models operate within the enterprise's environment to address concerns regarding security, privacy, and data leakage risks.

The Problem: Keeping Pace and Ensuring Secure, Efficient Deployment on AI Infrastructure

Enterprises face an ongoing challenge in keeping pace with the continuous release of new open-source models on platforms like HuggingFace. Each model performs differently on specific tasks, and there is no definitive "right" or "wrong" choice. Although organizations often publish results on standard benchmarks such as HellaSwag, TruthfulQA, and MMLU, nothing beats hands-on experimentation for choosing the best model for a given use case. With so many models available, practitioners need to try out several candidates quickly, so they can move smoothly into fine-tuning or into deploying solutions built on techniques such as Retrieval-Augmented Generation (RAG), whether for internal use or for customer-facing applications.

Deploying and running these models securely and efficiently poses significant hurdles of its own. ML engineers have a tough time setting everything up for each model, from servers to engines, because each one comes with many aspects to consider.

The Solution: Run:ai's LLM Catalog - A Comprehensive Approach to Model Management

Recognizing the needs and challenges of deploying large language models (LLMs), we are introducing the LLM Catalog, a solution that helps organizations manage, maintain, and secure their internal model catalog. By simplifying the deployment process and ensuring secure access, Run:ai empowers enterprises to use state-of-the-art LLMs within their own environment, on their own AI infrastructure.

***Figure 1:*** *Overview of the LLM Catalog*

The LLM Catalog consists of multiple models, each wrapped into an image. Preset configurations eliminate the hassle of choosing, integrating, and setting up the right environment assets when deploying models.

Auto-Scaling for Optimal Resource Utilization & Scaling to Zero

Resource costs associated with LLMs can be significant, making cost optimization crucial for large organizations. To address this challenge, the Run:ai LLM Catalog incorporates auto-scaling capabilities. When demand for a deployed model is high, the system dynamically scales up to meet users' needs. During idle periods with no incoming requests, the system can automatically terminate replicas, reducing resource consumption to zero and releasing those resources for other ML projects and tasks. Because allocation follows demand, organizations integrating LLMs into their workflows get better resource utilization and lower costs.
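
To make the scale-up and scale-to-zero behavior more concrete, here is a minimal sketch of the kind of decision an autoscaler makes. It is not Run:ai's implementation: the function name, its parameters, and the idea of keeping one warm replica until an idle window elapses are all assumptions chosen for illustration.

```python
import math


def desired_replicas(in_flight_requests: int,
                     target_per_replica: int,
                     max_replicas: int,
                     idle_seconds: float,
                     scale_to_zero_after: float) -> int:
    """Return how many replicas a deployment should run (hypothetical policy).

    All names and thresholds here are illustrative assumptions, not a
    Run:ai API.
    """
    if target_per_replica <= 0 or max_replicas < 1:
        raise ValueError("target_per_replica must be > 0 and max_replicas >= 1")

    if in_flight_requests <= 0:
        # No traffic: keep a single warm replica until the idle window
        # elapses, then release everything (scale to zero).
        return 0 if idle_seconds >= scale_to_zero_after else 1

    # Traffic present: run enough replicas to stay at or below the
    # per-replica target, capped at the configured maximum.
    needed = math.ceil(in_flight_requests / target_per_replica)
    return min(needed, max_replicas)
```

With a target of 10 concurrent requests per replica and a maximum of 4 replicas, 25 in-flight requests would call for 3 replicas, while a deployment that has been idle longer than the configured window would drop to 0 and free its resources for other workloads.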

***Figure 2:*** *Compute Resources and autoscaling can easily be set up by ML engineers depending on the organization’s needs*

Envisioning the Future

Looking ahead, we have a full agenda. We are committed to enhancing the capabilities of the LLM Catalog to meet the evolving needs of enterprises. Here is a high-level overview of what you can expect from the next releases:

Simplified Model Addition: In the future, adding models that you fine-tuned in-house to the catalog will be a straightforward process, making it easy to expand the model repository.

Out-of-the-box Integration to External Model Registries: We will offer an easier integration with external model registries such as HuggingFace, simplifying the process of incorporating new models into the catalog.

Access Control: We will introduce access control mechanisms, making sure that only authorized users can interact with specific models within the catalog.

Optimized Model Loading: We will continue to optimize the catalog for rapid model loading, making the loading as fast as possible for end users.

Democratized Usage of LLMs: Starting with LLMs, we will implement features that democratize the usage, allowing managers to set token quotas and rate limits to ensure fair usage across different consumers while making these valuable resources more shareable within the organization.

Easy and Secure API Access: We will introduce easy-to-use and secure APIs, enabling interaction with the LLM Catalog.
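
As an illustration of the token quotas mentioned in the roadmap above, the sketch below tracks per-consumer token usage in fixed time windows. This is only one possible design under stated assumptions: the `TokenQuota` class, its fields, and the fixed-window policy are hypothetical and do not describe how Run:ai will implement the feature.

```python
from dataclasses import dataclass, field


@dataclass
class TokenQuota:
    """Fixed-window token quota per consumer (hypothetical, for illustration)."""

    tokens_per_window: int
    window_seconds: float
    # Maps consumer name -> (window index, tokens used in that window).
    _usage: dict = field(default_factory=dict, init=False, repr=False)

    def try_consume(self, consumer: str, tokens: int, now: float) -> bool:
        """Record `tokens` for `consumer` if the quota allows it.

        `now` is a timestamp in seconds, passed in explicitly so the
        behavior is deterministic and easy to test.
        """
        window = int(now // self.window_seconds)
        seen_window, used = self._usage.get(consumer, (window, 0))
        if seen_window != window:
            # A new window has started, so the consumer's usage resets.
            used = 0
        if used + tokens > self.tokens_per_window:
            self._usage[consumer] = (window, used)
            return False
        self._usage[consumer] = (window, used + tokens)
        return True
```

Each consumer is counted separately, so one team exhausting its quota does not block another team's requests to the same model.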

Final Words

The Run:ai LLM Catalog enables organizations to run, maintain, and secure their internal model catalog. By providing secure access to models and eliminating the hassle of choosing, integrating, and setting up environment assets during deployment, the catalog lets practitioners start experimenting quickly. With open-source models like Llama and Falcon at their fingertips, MLOps and ML engineers can select and deploy models that fit their teams' needs. Auto-scaling addresses the resource cost challenge by matching resources to demand. With the LLM Catalog, enterprises can explore the potential of language models without worrying about data security and deployment complexity.

Curious to learn more? Explore our announcement here or book a demo today and see how Run:ai can help you accelerate AI initiatives and increase efficiency.