NVIDIA Container Toolkit

Benefits, Architecture, and Quick Tutorial

What Is the NVIDIA Container Toolkit?

The NVIDIA Container Toolkit is an open source toolkit designed to simplify the deployment of GPU-accelerated applications in Docker containers. It provides a user-friendly interface for interacting with NVIDIA GPUs, making it easier for developers and system administrators to leverage GPUs for data science and machine learning tasks.

The Container Toolkit enables containers to leverage the underlying NVIDIA GPUs, abstracting the complex details of GPU communication and providing a high-level interface that integrates directly with Docker.

The NVIDIA Container Toolkit is part of the larger NVIDIA GPU Cloud (NGC) ecosystem. NGC provides a comprehensive platform for GPU-accelerated software development, offering a range of tools, libraries, and frameworks to assist with the development, deployment, and execution of GPU-accelerated applications.

This is part of a series of articles about AI open source projects.

Why Use NVIDIA Container Toolkit for Machine Learning Applications?

Here are some of the key benefits of using the Container Toolkit for machine learning tasks:

  • Easier deployment: With Docker, you can package your application and all its dependencies into a Docker image that can be deployed on any system with Docker installed. This simplifies deployment, eliminates complex setup procedures, and ensures that your application runs consistently across environments.
  • Easier parallelization: By running multiple Docker containers simultaneously, you can experiment with different versions of your application, test different configurations concurrently, or separate a task into smaller components and run them on different GPUs. This can greatly speed up the development and experimentation process.
  • Optimized resource allocation: With the NVIDIA Container Toolkit, Docker can leverage NVIDIA GPUs, enabling you to run GPU-accelerated applications in Docker containers. This provides an efficient way to allocate GPU resources, allowing you to maximize hardware utilization.
  • GPU isolation: You can assign specific GPUs to specific Docker containers, as shown in the example after this list. This allows you to control how your GPU resources are allocated, ensuring that each container gets the resources it needs to run optimally.
  • Dynamic scalability: Docker's built-in orchestration capabilities, such as Docker Compose and Swarm, let you scale your application to handle increased workloads. You can also use Docker in combination with orchestration tools like Kubernetes to achieve more complex orchestration.
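
For example, once the toolkit is installed, Docker's --gpus flag can expose all GPUs to a container or pin it to specific devices (the device indices below are just examples):

sudo docker run --rm --gpus '"device=0"' ubuntu nvidia-smi   # container sees only GPU 0
sudo docker run --rm --gpus '"device=1"' ubuntu nvidia-smi   # container sees only GPU 1
sudo docker run --rm --gpus all ubuntu nvidia-smi            # container sees every GPU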

NVIDIA Container Toolkit Architecture

The NVIDIA Container Toolkit has a modular architecture, consisting of several components that work together to provide a seamless interface for GPU-accelerated Docker containers.

The NVIDIA Container Runtime

The NVIDIA Container Runtime is a key component of the Toolkit. It extends the Docker runtime, allowing it to understand and handle NVIDIA GPUs. This makes it possible for Docker containers to directly interface with NVIDIA GPUs.

The Runtime is built on top of the CUDA platform, NVIDIA's parallel computing platform and application programming interface. This enables it to interact with NVIDIA GPUs at a low level, providing the necessary infrastructure for running optimized GPU-accelerated applications in Docker containers.
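
As a quick illustration of how the runtime plugs into Docker, the toolkit's nvidia-ctk utility can register it in Docker's configuration (the containerd equivalent appears later in the tutorial):

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

This updates /etc/docker/daemon.json so that Docker knows about a runtime named nvidia, which containers can then select with --runtime=nvidia.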

The NVIDIA Container Runtime Hook

The NVIDIA Container Runtime Hook is a piece of software that gets executed during the Docker container lifecycle, specifically at container start time. The Hook is responsible for setting up the GPU environment inside the Docker container, ensuring that the container can interact with the underlying NVIDIA GPUs.
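
One visible effect of the Hook is its handling of environment variables such as NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES, which control which GPUs and driver features are exposed inside the container. A minimal sketch, assuming the NVIDIA runtime has been registered with Docker as shown above:

sudo docker run --rm --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=0 \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
  ubuntu nvidia-smi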

The NVIDIA Container Library and CLI

The NVIDIA Container Library and CLI (Command-Line Interface) are the final pieces of the Toolkit architecture. The Library provides the high-level interface for interacting with NVIDIA GPUs, while the CLI provides a user-friendly command-line interface for managing GPU-accelerated Docker containers.

The Library abstracts the complex details of GPU communication, providing a high-level API that integrates seamlessly with Docker. The CLI provides a set of commands for managing GPU-accelerated Docker containers.
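
For example, the low-level CLI that ships with the library can be used to inspect what it sees on the host (subcommands and output vary by driver and toolkit version):

nvidia-container-cli info    # driver and device information visible to the library
nvidia-container-cli list    # driver files that would be injected into a container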

Key RAPIDS Libraries

RAPIDS is NVIDIA's suite of open source, GPU-accelerated data science libraries, and it is commonly run inside GPU-enabled containers like those the Container Toolkit makes possible. As a collection of libraries tailored for specific tasks, RAPIDS offers a modular approach that greatly enhances its usability and flexibility. Let's look deeper into these key libraries.

Data Preprocessing: cuDF

The cuDF library is a GPU-accelerated dataframe library, similar to Pandas. It provides a set of dataframe manipulation methods that are optimized for GPUs. cuDF is fast, often outperforming traditional CPU-based methods by orders of magnitude.

With cuDF, you can perform a wide range of operations on your dataframes, including filtering, sorting, joining, and aggregating. This can reduce the time spent on preprocessing, allowing you to focus more on the actual analysis and model building.

This library also integrates seamlessly with Pandas. You can easily convert your Pandas dataframes to cuDF dataframes and vice versa. This means you can continue using the familiar Pandas syntax while benefiting from the speed of GPU acceleration.
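
A minimal sketch of that round trip, assuming cuDF is already installed (for example inside a RAPIDS container or conda environment):

python - <<'EOF'
import pandas as pd
import cudf

pdf = pd.DataFrame({"key": ["a", "a", "b"], "value": [1, 2, 3]})

gdf = cudf.from_pandas(pdf)                 # copy the dataframe to the GPU
result = gdf.groupby("key")["value"].sum()  # same dataframe-style API, GPU-accelerated
print(result.to_pandas())                   # copy the result back to a Pandas object
EOF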

Big Data Processing: RAPIDS Accelerator for Apache Spark

When dealing with big data, Apache Spark is a popular solution for many data professionals. The RAPIDS Accelerator for Apache Spark is a plugin that enables the use of RAPIDS libraries in Spark applications. It takes advantage of the GPU acceleration capabilities of RAPIDS to make Spark tasks run much faster, significantly reducing the time needed for big data processing.

The RAPIDS Accelerator for Apache Spark supports a wide range of Spark tasks, including SQL, DataFrame, Dataset, and MLlib operations. This means that you can continue using your existing Spark workflows while adding GPU acceleration.
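
As an illustration, the accelerator is typically enabled through Spark configuration at job submission time; the jar name, version, and script below are placeholders to adapt to your Spark and RAPIDS releases:

spark-submit \
  --jars rapids-4-spark_2.12-24.08.1.jar \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.executor.resource.gpu.amount=1 \
  your_spark_job.py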

Machine Learning: cuML

cuML is a suite of machine learning algorithms that are optimized for GPUs. It includes a wide range of algorithms for classification, regression, clustering, and dimensionality reduction. The algorithms in cuML are compatible with scikit-learn, one of the most popular machine learning libraries in Python. This makes it relatively easy to integrate cuML into existing machine learning workflows.

You can use cuML to improve the speed and efficiency of machine learning tasks. Training models and making predictions are faster, allowing you to iterate through different models and parameters more quickly.
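
A minimal sketch of that scikit-learn-style workflow, assuming cuML and cuDF are installed:

python - <<'EOF'
import cudf
from cuml.cluster import KMeans

# A tiny example dataset held on the GPU
X = cudf.DataFrame({"x": [1.0, 1.1, 8.0, 8.1],
                    "y": [1.0, 0.9, 8.0, 8.2]})

model = KMeans(n_clusters=2)  # familiar scikit-learn-style estimator
model.fit(X)
print(model.labels_)          # cluster assignment for each row
EOF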

Graph Analytics: cuGraph

cuGraph is a collection of graph algorithms that are optimized for GPUs. It includes algorithms for graph traversal, community detection, and centrality measures. The performance of these algorithms on GPUs far exceeds that of traditional CPU-based methods.

cuGraph also integrates with NetworkX, a popular graph analysis library in Python. This means that you can easily convert NetworkX graphs to cuGraph and utilize GPU acceleration for graph analytics tasks.
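
A minimal sketch, assuming cuGraph and cuDF are installed; it builds a small graph from an edge list and runs PageRank on the GPU:

python - <<'EOF'
import cudf
import cugraph

# Edge list for a tiny example graph
edges = cudf.DataFrame({"src": [0, 1, 2, 2],
                        "dst": [1, 2, 0, 3]})

G = cugraph.Graph()
G.from_cudf_edgelist(edges, source="src", destination="dst")

print(cugraph.pagerank(G))  # PageRank scores computed on the GPU
EOF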

Vector Search: RAFT

RAFT is a library for vector search on GPUs. Vector search is a crucial task in many applications, including recommendation systems and information retrieval.

RAFT provides a set of vector search methods that are optimized for GPUs. It supports both exact and approximate search methods, catering to a wide range of use cases. It helps improve the speed and accuracy of vector search tasks.

Quick Tutorial: Installing and Configuring the NVIDIA Container Toolkit

Installing Container Toolkit with Apt

Step 1: Set up the production repository

Execute the following command to import the GPG key from NVIDIA's production repository:


curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

Then, add the repository to your Apt sources, rewriting its entries so that they reference the imported key:


curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

If desired, modify the repository settings to use experimental packages. Do this by uncommenting any lines in your sources file that contain 'experimental':


sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list

Step 2: Refresh your system's package index

Run the following command:


sudo apt-get update

Step 3: Install the NVIDIA Container Toolkit packages

Run this command:


sudo apt-get install -y nvidia-container-toolkit

That’s it! You’ve set up your system with the NVIDIA Container Toolkit via the Apt package manager.
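
As a quick sanity check, you can confirm that the toolkit's CLI is on your path (the exact output varies by version):

nvidia-ctk --version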

Configuring containerd for Kubernetes

Step 1: Set up container runtime

Use this command to configure containerd, the container runtime commonly used with Kubernetes, to use the NVIDIA Container Runtime:


sudo nvidia-ctk runtime configure --runtime=containerd

This command modifies the existing /etc/containerd/config.toml file on the host system, enabling containerd to use the NVIDIA Container Runtime.

Step 2: Restart containerd


sudo systemctl restart containerd

Running a Sample Workload with Docker

After the toolkit has been installed and an NVIDIA GPU driver has been set up, you can run a sample workload in a GPU-enabled Docker container.

To run a sample CUDA container:


sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

The output should show the familiar nvidia-smi table, listing the driver version, the CUDA version, and the GPUs visible inside the container.


Managing AI Infrastructure with Run:ai

As an AI developer, you will need to manage large-scale computing infrastructure to train and deploy AI models. Run:ai automates resource management and orchestration for AI infrastructure. With Run:ai, you can automatically run as many compute-intensive experiments as needed.

Here are some of the capabilities you gain when using Run:ai:

  • Advanced visibility—create an efficient pipeline of resource sharing by pooling GPU compute resources.
  • No more bottlenecks—you can set up guaranteed quotas of GPU resources, to avoid bottlenecks and optimize billing.
  • A higher level of control—Run:ai enables you to dynamically change resource allocation, ensuring each job gets the resources it needs at any given time.

Run:ai simplifies machine learning infrastructure pipelines, helping data scientists accelerate their productivity and improve the quality of their models.

Learn more about the Run:ai GPU virtualization platform.