Running ML and Big Data Faster with GPUs


NVIDIA RAPIDS is an open-source GPU-acceleration platform for large-scale data analytics and machine learning. It includes a variety of libraries that integrate with popular data science software, such as Apache Arrow, Pandas, and scikit-learn, and accelerate them using GPUs.

RAPIDS can significantly speed up data preprocessing and machine learning training tasks. It achieves this by providing a familiar API that mirrors the one used by Pandas and scikit-learn, minimizing the learning curve.

NVIDIA RAPIDS is an ecosystem of libraries rather than a single tool. These libraries cover a broad range of needs, from data ingestion and data manipulation to model training and graph analytics.

You can access all NVIDIA RAPIDS libraries from the GitHub overview page.

This is part of a series of articles about AI open source projects.


What Is the Connection Between NVIDIA and RAPIDS?

NVIDIA developed the RAPIDS platform, and designed it to make optimal use of its popular GPU hardware. NVIDIA’s GPUs are designed to handle parallel processing tasks efficiently, and RAPIDS helps machine learning applications make full use of their capabilities.

One of the ways RAPIDS harnesses NVIDIA's GPU power is through CUDA, a parallel computing platform and application programming interface (API) model created by NVIDIA. CUDA gives developers direct access to the virtual instruction set and memory of the parallel computational elements in GPUs. Because CUDA can be complex to learn and use, RAPIDS provides convenient APIs that allow developers to leverage CUDA through familiar interfaces, as part of their existing machine learning toolset.

Benefits of RAPIDS for Data Science and Analytics Pipelines

Let’s look at some of the ways that NVIDIA RAPIDS helps streamline and strengthen data-intensive tasks.


By utilizing the parallel processing capabilities of GPUs, RAPIDS can distribute data and computations across multiple GPU cores, significantly reducing the time required for data preprocessing and analysis tasks. This means that data scientists can handle datasets of sizes that were previously unmanageable or would take prohibitively long to process on CPUs alone.

Moreover, RAPIDS facilitates the use of distributed computing frameworks like Dask, which further enhances its scalability. Dask integrates seamlessly with RAPIDS libraries, enabling distributed dataframes and machine learning models that can scale across multiple GPUs and nodes in a cluster.


The RAPIDS platform provides high-precision computations that enable accurate results in data analytics and machine learning tasks.

RAPIDS libraries, like cuDF and cuML, mimic the functionality of Pandas and scikit-learn but are designed for GPU-accelerated computations. This means you can apply the same familiar data manipulation and machine learning operations while benefiting from the high precision offered by NVIDIA GPUs.


By leveraging NVIDIA GPUs and the CUDA platform, RAPIDS can accelerate data processing and machine learning tasks. Speed-ups can reach up to 50x over CPU-only solutions, depending on the specific tasks and the size of the datasets.

In addition to faster results, this increase in speed also means that data scientists and machine learning engineers can iterate quickly, experiment with different models, and fine-tune their algorithms more efficiently. This can lead to better results, more innovative solutions, and a significant reduction in time-to-insight.

Open Source

NVIDIA RAPIDS is an open-source platform. This means that it's freely available to anyone who wants to use it, and its source code is open for anyone to inspect, modify, and enhance.

As an open-source platform, RAPIDS also benefits from the contributions of a community of developers and data scientists. The contributors aid in the platform's development, share knowledge, and help troubleshoot issues.

Key RAPIDS Libraries

As a collection of libraries tailored for specific tasks, RAPIDS offers a modular approach that greatly enhances its usability and flexibility. Let's look deeper into these key libraries.

Data Preprocessing: cuDF

The cuDF library is a GPU-accelerated dataframe library, similar to Pandas. It provides a set of dataframe manipulation methods that are optimized for GPUs. cuDF is fast, often outperforming traditional CPU-based methods by orders of magnitude.

With cuDF, you can perform a wide range of operations on your dataframes, including filtering, sorting, joining, and aggregating. This can reduce the time spent on preprocessing, allowing you to focus more on the actual analysis and model building.

This library also integrates with Pandas. You can convert your Pandas dataframes to cuDF dataframes and vice versa, so you can continue using the familiar Pandas syntax while benefiting from the speed of GPU acceleration.
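
As a rough sketch of that round trip, the example below moves a small Pandas dataframe to the GPU, filters and aggregates it, and brings the result back. The table, its column names, and its values are made up for illustration. The fallback to plain Pandas is an assumption added so the same code also runs on a machine without cuDF installed.

```python
import pandas as pd

try:
    import cudf  # GPU dataframe library from RAPIDS
except ImportError:
    cudf = None  # assumption: fall back to Pandas when cuDF is unavailable

# Hypothetical sales table used only for illustration.
pdf = pd.DataFrame({
    "store": ["a", "b", "a", "c"],
    "sales": [10, 20, 30, 40],
})

# Move the data to the GPU if possible; otherwise keep the Pandas dataframe.
df = cudf.from_pandas(pdf) if cudf is not None else pdf

# The same Pandas-style calls work on either dataframe type.
filtered = df[df["sales"] > 15]
totals = filtered.groupby("store")["sales"].sum().sort_index()

# Convert back to Pandas for code that expects CPU objects.
result = totals.to_pandas() if cudf is not None else totals
print(result)
```

Only the two conversion calls differ between the GPU and CPU paths; the filtering and aggregation lines are identical in both.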

Big Data Processing: RAPIDS Accelerator for Apache Spark

When dealing with big data, Apache Spark is a popular solution for many data professionals. The RAPIDS Accelerator for Apache Spark is a plugin that enables the use of RAPIDS libraries in Spark applications. It takes advantage of the GPU acceleration capabilities of RAPIDS to make Spark tasks run much faster. It can significantly reduce the time needed for big data processing.

The RAPIDS Accelerator for Apache Spark supports a wide range of Spark tasks, including SQL, DataFrame, Dataset, and MLlib operations. This means that you can continue using your existing Spark workflows while adding GPU acceleration.
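
To give a feel for how the plugin is switched on, here is a minimal sketch that assembles a spark-submit command line. The settings spark.plugins and spark.rapids.sql.enabled are the plugin's configuration keys; the jar path, the script name, and the helper function itself are hypothetical placeholders, and a real deployment will typically need further GPU resource settings that are not shown here.

```python
def rapids_spark_args(plugin_jar, app_script):
    """Build spark-submit arguments that enable the RAPIDS Accelerator.

    plugin_jar and app_script are placeholders supplied by the caller.
    """
    settings = {
        # Load the RAPIDS SQL plugin into the Spark session.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Allow SQL and DataFrame operations to run on the GPU.
        "spark.rapids.sql.enabled": "true",
    }
    args = ["spark-submit", "--jars", plugin_jar]
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app_script)
    return args


# Hypothetical paths, for illustration only.
command = rapids_spark_args("/opt/rapids/rapids-4-spark.jar", "etl_job.py")
print(" ".join(command))
```

The application script itself stays unchanged, which matches the point above about keeping existing Spark workflows.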

Machine Learning: cuML

cuML is a suite of machine learning algorithms that are optimized for GPUs. It includes a wide range of algorithms for classification, regression, clustering, and dimensionality reduction. The algorithms in cuML are compatible with scikit-learn, one of the most popular machine learning libraries in Python. This makes it relatively easy to integrate cuML into existing machine learning workflows.

You can use cuML to improve the speed and efficiency of machine learning tasks. Training models and making predictions are faster, allowing you to iterate through different models and parameters more quickly.
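
The scikit-learn compatibility can be illustrated with a small clustering sketch. It imports KMeans from cuML when available and otherwise from scikit-learn; the fallback and the toy data points are assumptions made for this example. It also assumes NumPy input, in which case cuML is expected to return NumPy output as well.

```python
import numpy as np

try:
    from cuml.cluster import KMeans  # GPU implementation from RAPIDS
except ImportError:
    from sklearn.cluster import KMeans  # CPU fallback with the same interface

# Two well-separated groups of made-up 2D points.
points = np.array(
    [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
     [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]],
    dtype=np.float32,
)

# Same constructor arguments and fit call for both libraries.
model = KMeans(n_clusters=2, random_state=0)
model.fit(points)

# Cluster label assigned to each input row.
labels = np.asarray(model.labels_)
print(labels)
```

Apart from the import line, nothing in the training code needs to change when switching between the CPU and GPU versions.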

Graph Analytics: cuGraph

cuGraph is a collection of graph algorithms that are optimized for GPUs. It includes algorithms for graph traversal, community detection, and centrality measures. The performance of these algorithms on GPUs far exceeds that of traditional CPU-based methods.

cuGraph also integrates with NetworkX, a popular graph analysis library in Python. This means that you can easily convert NetworkX graphs to cuGraph and utilize GPU acceleration for graph analytics tasks.

Vector Search: RAFT

RAFT is a library for vector search on GPUs. Vector search is a crucial task in many applications, including recommendation systems and information retrieval.

RAFT provides a set of vector search methods that are optimized for GPUs. It supports both exact and approximate search methods, catering to a wide range of use cases. It helps improve the speed and accuracy of vector search tasks.
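
For readers new to vector search, the sketch below shows what an exact (brute-force) nearest-neighbor query computes. It is a plain-Python CPU reference and does not use RAFT's API; the function name and the sample vectors are hypothetical. A GPU library evaluates the same distances in parallel, and approximate methods trade some accuracy to avoid comparing the query against every vector.

```python
import heapq
import math


def exact_knn(vectors, query, k):
    """Return the indices of the k vectors closest to query (Euclidean)."""
    # Exact search scores every vector against the query.
    scored = ((math.dist(vec, query), idx) for idx, vec in enumerate(vectors))
    return [idx for _, idx in heapq.nsmallest(k, scored)]


# Made-up 2D embeddings, e.g. items in a recommendation catalog.
catalog = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
neighbors = exact_knn(catalog, (1.0, 1.0), k=2)
print(neighbors)  # [1, 3]
```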

Tutorial: Getting Started with RAPIDS


Before installing, make sure your system is compatible with RAPIDS. Here's what you need:

Graphics Processing Unit (GPU): Your machine needs an NVIDIA Volta™ or newer GPU with a compute capability of at least 7.0. As of RAPIDS 24.02, only GPUs with a compute capability of 7.0 or higher are supported.

Operating System (OS): Your system must run on one of the following operating systems:

  • Ubuntu (Version 20.04 or 22.04) or CentOS 7 / Rocky Linux 8, with gcc/g++ 9.0 or higher
  • Windows 11 with a Windows Subsystem for Linux (WSL2) specific installation
  • Red Hat Enterprise Linux (RHEL) 7/8 compatibility can be achieved via CentOS 7 / Rocky Linux 8 builds and installations

CUDA and NVIDIA Drivers: The following CUDA and driver combinations are supported:

  • CUDA 11.2, accompanied by NVIDIA Driver 470.42.01 or a newer release
  • CUDA 11.4 or higher, coupled with Driver 470.42.01 or above
  • CUDA 11.5, bundled with Driver 495.29.05 or later
  • CUDA 11.8 with Driver 520.61.05 or later
  • CUDA 12.0 and beyond, with Driver 525.60.13 or later
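
The GPU and driver requirements above can be turned into a quick compatibility check. The sketch below is one possible way to encode them: the version table is copied from the list above, while the function names and the rule of matching an installed CUDA version to the closest listed release at or below it are assumptions made for this example.

```python
# Minimum NVIDIA driver for each CUDA release, taken from the list above.
MIN_DRIVER = {
    (11, 2): (470, 42, 1),
    (11, 4): (470, 42, 1),
    (11, 5): (495, 29, 5),
    (11, 8): (520, 61, 5),
    (12, 0): (525, 60, 13),
}
MIN_COMPUTE_CAPABILITY = (7, 0)


def parse_version(text):
    """Turn a dotted version string such as '470.42.01' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def gpu_ok(compute_capability):
    """True if the GPU meets the compute capability 7.0 minimum."""
    return parse_version(compute_capability) >= MIN_COMPUTE_CAPABILITY


def driver_ok(cuda_version, driver_version):
    """True if the driver is new enough for the given CUDA version."""
    cuda = parse_version(cuda_version)[:2]
    # Assumption: use the closest listed CUDA release at or below this one.
    eligible = [listed for listed in MIN_DRIVER if listed <= cuda]
    if not eligible:
        return False  # older than any CUDA release in the list
    return parse_version(driver_version) >= MIN_DRIVER[max(eligible)]


print(gpu_ok("8.6"), driver_ok("12.0", "525.60.13"))
```

The version strings would normally come from tools such as nvidia-smi; they are passed in by hand here to keep the sketch self-contained.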

For Docker and Conda:

  • CUDA 12 Conda packages and Docker images are currently built for CUDA 12.0
  • CUDA 11 Conda packages and Docker images can function on a system integrated with a CUDA 12 driver, as they come with their own CUDA toolkit

For pip:

  • pip installations require the use of a wheel file in line with the system's available CUDA toolkit
  • For systems equipped with CUDA 11 toolkits, install the -cu11 wheels, and for CUDA 12 toolkits, opt for the -cu12 wheels. If you have a CUDA 12 driver but a CUDA 11 toolkit, use the -cu11 wheels.
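
The wheel selection rule above depends only on the installed CUDA toolkit, not on the driver. A minimal sketch of that rule follows; the helper name is hypothetical, and cudf is used as the example package.

```python
def rapids_wheel(package, toolkit_major):
    """Return the pip package name matching the installed CUDA toolkit.

    The driver version is deliberately not an input: a CUDA 12 driver
    with a CUDA 11 toolkit still uses the -cu11 wheels.
    """
    if toolkit_major not in (11, 12):
        raise ValueError(f"no wheels listed for CUDA {toolkit_major}")
    return f"{package}-cu{toolkit_major}"


print(rapids_wheel("cudf", 11))  # cudf-cu11
print(rapids_wheel("cudf", 12))  # cudf-cu12
```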

Recommended System Configuration

Beyond the above prerequisites, NVIDIA recommends having the following for an optimal setup:

  • An SSD drive, preferably NVMe
  • System Memory to GPU Memory ratio of approximately 2:1, particularly beneficial if you are using Dask
  • NVLink enabled for two or more GPUs

Installing RAPIDS

The preferred method of installation is via Conda. See the official documentation for other options, including Docker and pip.

RAPIDS works with several Conda distributions and tools: Anaconda, Miniconda, and Mamba.

To install RAPIDS with Miniconda on Windows using WSL2:

  1. Install WSL2 (Windows Subsystem for Linux) and the Ubuntu 22.04 package using Microsoft’s instructions.
  2. Install the latest NVIDIA Drivers on the Windows host.
  3. Log in to the WSL2 Linux instance.
  4. Download and execute the installation script using the command below:
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh
  5. Finalize the installation process in your terminal window. NVIDIA strongly recommends running Conda using conda-init.
  6. Launch Conda in a new terminal window.
  7. Install RAPIDS via Conda. To get the installation command for your exact environment, use the release selector tool (click the buttons matching your setup and the tool will provide the installation command).
  8. Run this code to check that the RAPIDS installation is working:
    import cudf
    print(cudf.Series([1, 2, 3]))

Multi GPU Processing With Run:ai

Run:ai automates resource management and workload orchestration for machine learning infrastructure. With Run:ai, you can automatically run as many deep learning experiments as needed on multi-GPU infrastructure.

Here are some of the capabilities you gain when using Run:ai:

  • Advanced visibility—create an efficient pipeline of resource sharing by pooling GPU compute resources.
  • No more bottlenecks—you can set up guaranteed quotas of GPU resources, to avoid bottlenecks and optimize billing.
  • A higher level of control—Run:ai enables you to dynamically change resource allocation, ensuring each job gets the resources it needs at any given time.

Run:ai simplifies machine learning infrastructure pipelines, helping data scientists accelerate their productivity and the quality of their models.

Learn more about the Run:ai GPU virtualization platform.