NVIDIA CUDA Deep Neural Network (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. It provides highly tuned implementations of standard routines, including forward and backward convolution, pooling, normalization, and activation layers.
The cuDNN library allows deep learning framework developers and researchers everywhere to leverage GPU acceleration for high performance. It reduces the need for low-level GPU performance tuning, saving time so you can concentrate on developing your software and training your neural networks. cuDNN acceleration supports popular deep learning frameworks such as Keras, Caffe2, Chainer, MXNet, MATLAB, TensorFlow, and PyTorch.
Key features of NVIDIA cuDNN are described below.

cuDNN is supported on Linux and Windows and runs on a variety of mobile and data center GPU architectures, including Ampere, Volta, Turing, Pascal, Kepler, and Maxwell. The latest version of cuDNN is 8.3, which provides improved performance on A100 GPUs (up to five times higher than out-of-the-box V100 GPUs). It also offers new APIs and optimizations for computer vision and conversational AI applications.
The version 8.3 redesign is user-friendly and offers improved flexibility and easy application integration. It includes optimizations to accelerate transformer-based deep learning models, runtime fusion for compiling kernels with new operators, and a smaller download package (reduced by 30%).
NVIDIA cuDNN offers highly tuned, optimized implementations of routines commonly used in DNN applications, including forward and backward convolution, pooling, normalization, and activation. These routines deliver performance competitive with fast matrix multiplication (GEMM)-based implementations while using less memory.
The flexibility of cuDNN means you can integrate it into any neural network implementation while avoiding the input/output transposition steps often required by GEMM-based convolutions. The cuDNN library exposes a host API but assumes that all data required for GPU operations is directly accessible from the device.
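To illustrate, the sketch below runs a single forward convolution using the descriptor-based cuDNN API (available in cuDNN 8). It is a minimal example, not production code: the tensor sizes are arbitrary, the input buffers are left uninitialized, the algorithm choice is hardcoded rather than auto-tuned, and error handling is reduced to a single macro. The handle management it uses is explained in the paragraphs that follow.

#include <cudnn.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Abort on any cuDNN error (minimal error handling for this sketch).
#define CHECK_CUDNN(call)                                           \
  do {                                                              \
    cudnnStatus_t s = (call);                                       \
    if (s != CUDNN_STATUS_SUCCESS) {                                \
      fprintf(stderr, "cuDNN error: %s\n", cudnnGetErrorString(s)); \
      exit(1);                                                      \
    }                                                               \
  } while (0)

int main() {
  cudnnHandle_t handle;
  CHECK_CUDNN(cudnnCreate(&handle));

  // Input: 1 image, 3 channels, 224x224 (arbitrary sizes for illustration).
  cudnnTensorDescriptor_t xDesc, yDesc;
  CHECK_CUDNN(cudnnCreateTensorDescriptor(&xDesc));
  CHECK_CUDNN(cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW,
                                         CUDNN_DATA_FLOAT, 1, 3, 224, 224));

  // Filters: 16 output channels, 3-channel 3x3 kernels.
  cudnnFilterDescriptor_t wDesc;
  CHECK_CUDNN(cudnnCreateFilterDescriptor(&wDesc));
  CHECK_CUDNN(cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT,
                                         CUDNN_TENSOR_NCHW, 16, 3, 3, 3));

  // 3x3 convolution, padding 1, stride 1, dilation 1.
  cudnnConvolutionDescriptor_t convDesc;
  CHECK_CUDNN(cudnnCreateConvolutionDescriptor(&convDesc));
  CHECK_CUDNN(cudnnSetConvolution2dDescriptor(convDesc, 1, 1, 1, 1, 1, 1,
                                              CUDNN_CROSS_CORRELATION,
                                              CUDNN_DATA_FLOAT));

  // Let cuDNN compute the output shape, then describe the output tensor.
  int n, c, h, w;
  CHECK_CUDNN(cudnnGetConvolution2dForwardOutputDim(convDesc, xDesc, wDesc,
                                                    &n, &c, &h, &w));
  CHECK_CUDNN(cudnnCreateTensorDescriptor(&yDesc));
  CHECK_CUDNN(cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW,
                                         CUDNN_DATA_FLOAT, n, c, h, w));

  // Pick an algorithm and query its workspace requirement.
  cudnnConvolutionFwdAlgo_t algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM;
  size_t wsSize = 0;
  CHECK_CUDNN(cudnnGetConvolutionForwardWorkspaceSize(
      handle, xDesc, wDesc, convDesc, yDesc, algo, &wsSize));

  // All tensors must live in device memory; cuDNN exposes only a host API.
  float *d_x, *d_w, *d_y;
  void *d_ws = nullptr;
  cudaMalloc(&d_x, sizeof(float) * 1 * 3 * 224 * 224);
  cudaMalloc(&d_w, sizeof(float) * 16 * 3 * 3 * 3);
  cudaMalloc(&d_y, sizeof(float) * n * c * h * w);
  if (wsSize > 0) cudaMalloc(&d_ws, wsSize);

  const float alpha = 1.0f, beta = 0.0f;  // y = alpha * conv(x, w) + beta * y
  CHECK_CUDNN(cudnnConvolutionForward(handle, &alpha, xDesc, d_x, wDesc, d_w,
                                      convDesc, algo, d_ws, wsSize, &beta,
                                      yDesc, d_y));

  // Release device buffers and descriptors, then the library context.
  cudaFree(d_x); cudaFree(d_w); cudaFree(d_y); cudaFree(d_ws);
  cudnnDestroyTensorDescriptor(xDesc);
  cudnnDestroyTensorDescriptor(yDesc);
  cudnnDestroyFilterDescriptor(wDesc);
  cudnnDestroyConvolutionDescriptor(convDesc);
  cudnnDestroy(handle);
  return 0;
}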
Applications using the cuDNN library must call cudnnCreate() to initialize a handle to the library context, then explicitly pass that handle to each library function that operates on GPU data. When an application has finished using cuDNN, it can call cudnnDestroy() to release the resources associated with the handle. This approach lets users control the library's behavior across multiple GPUs, host threads, and CUDA streams.
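A minimal sketch of this lifecycle, including directing the handle's work to a CUDA stream via cudnnSetStream() (error checking omitted for brevity):

#include <cudnn.h>
#include <cuda_runtime.h>

int main() {
  // Initialize a handle to the library context.
  cudnnHandle_t handle;
  cudnnCreate(&handle);

  // Optionally route all work issued through this handle to a CUDA stream.
  cudaStream_t stream;
  cudaStreamCreate(&stream);
  cudnnSetStream(handle, stream);

  // ... pass `handle` explicitly to each cuDNN call that touches GPU data ...

  // Release the resources associated with the context.
  cudnnDestroy(handle);
  cudaStreamDestroy(stream);
  return 0;
}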
For instance, an application can associate specific devices with specific host threads using cudaSetDevice(), and create a unique cuDNN handle in each host thread that directs library calls to the device associated with that thread. cuDNN library calls made with different handles then automatically run on the different devices you specified.
cuDNN assumes that the device associated with a given context remains unchanged between the corresponding cudnnCreate() and cudnnDestroy() calls. For the cuDNN library to use a different device within the same host thread, the application must call cudnnDestroy(), set the other device with cudaSetDevice(), and then call cudnnCreate() to create a new cuDNN context associated with that device.
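The sketch below illustrates both patterns, assuming a machine with at least two GPUs; the worker() function is hypothetical, introduced only for illustration:

#include <cudnn.h>
#include <cuda_runtime.h>
#include <thread>

// Hypothetical worker: each host thread binds one device and owns one handle.
void worker(int device) {
  cudaSetDevice(device);   // associate this host thread with `device`
  cudnnHandle_t handle;
  cudnnCreate(&handle);    // the new context is tied to `device`
  // ... all cuDNN calls made with `handle` execute on `device` ...
  cudnnDestroy(handle);
}

int main() {
  // Pattern 1: one handle per host thread, each bound to its own device.
  std::thread t0(worker, 0);
  std::thread t1(worker, 1);
  t0.join();
  t1.join();

  // Pattern 2: switching devices within a single host thread requires
  // destroying the old context, selecting the new device, and creating
  // a fresh context bound to it.
  cudaSetDevice(0);
  cudnnHandle_t h;
  cudnnCreate(&h);
  // ... work on device 0 ...
  cudnnDestroy(h);
  cudaSetDevice(1);
  cudnnCreate(&h);         // new context associated with device 1
  // ... work on device 1 ...
  cudnnDestroy(h);
  return 0;
}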
Related content: Read our guide to CUDA programming
Before you download cuDNN, make sure you have the required prerequisites installed on your Windows computer, including a supported NVIDIA GPU driver and a compatible version of the CUDA toolkit.
To download cuDNN, you must first register for the NVIDIA Developer Program; you can then download the cuDNN package for Windows from the NVIDIA Developer website.
Before you issue any commands, substitute your chosen versions of CUDA and cuDNN (and the package date) for the x.x and 8.x.x.x fields below. C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vx.x refers to the CUDA directory path, while <installpath> refers to the cuDNN directory path.
Use the following steps:

1. Unzip the cuDNN package you downloaded.
2. Copy <installpath>\bin\cudnn*.dll to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vx.x\bin.
3. Copy <installpath>\include\cudnn*.h to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vx.x\include.
4. Copy <installpath>\lib\x64\cudnn*.lib to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vx.x\lib\x64.
Before you download cuDNN, make sure you have the required prerequisites installed on your Linux machine, including a supported NVIDIA GPU driver and a compatible version of the CUDA toolkit.
Before downloading cuDNN, register for the NVIDIA Developer Program. Then follow the steps below.
Note that in the instructions below we refer to your local CUDA path as <local-cuda-path> and your cuDNN download path as <download-path>.
Note that the installation packages for cuDNN are hosted online. The file you download configures a package repository, from which your system's package manager automatically downloads and installs the cuDNN packages.
To install cuDNN on Ubuntu 18.04 and 20.04:
wget https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/cuda-${OS}.pin
sudo mv cuda-${OS}.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/ /"
sudo apt-get update
sudo apt-get install libcudnn8=${cudnn_version}-1+${cuda_version}
sudo apt-get install libcudnn8-dev=${cudnn_version}-1+${cuda_version}
Note that ${OS} is your distribution version (for example, ubuntu1804 or ubuntu2004; for the RHEL instructions below, rhel7 or rhel8), ${cudnn_version} is the cuDNN version to install (for example, 8.3.0.*), and ${cuda_version} is the matching CUDA version (for example, cuda11.5).
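Once the packages are installed (this applies equally to the RHEL instructions below), you can sanity-check the installation with a small program that compiles against the installed header and links against the library, for example with g++ check_cudnn.cpp -o check_cudnn -I/usr/local/cuda/include -lcudnn (include and library paths vary by system):

// check_cudnn.cpp : verify that the installed cuDNN library is usable.
#include <cudnn.h>
#include <cstdio>

int main() {
  // CUDNN_MAJOR/MINOR/PATCHLEVEL come from the installed header;
  // cudnnGetVersion() reports the version of the loaded shared library.
  printf("Header version:  %d.%d.%d\n",
         CUDNN_MAJOR, CUDNN_MINOR, CUDNN_PATCHLEVEL);
  printf("Library version: %zu\n", cudnnGetVersion());
  return 0;
}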
To install cuDNN on RHEL7 and RHEL8:
sudo yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/cuda-${OS}.repo
sudo yum clean all
sudo yum install libcudnn8-${cudnn_version}-1.${cuda_version}
sudo yum install libcudnn8-devel-${cudnn_version}-1.${cuda_version}
Run:AI automates resource management and workload orchestration for machine learning infrastructure. With Run:AI, you can automatically run as many compute-intensive experiments as needed on NVIDIA infrastructure.
When using Run:AI, you gain capabilities for resource management, scheduling, and workload orchestration across your NVIDIA infrastructure.
Run:AI simplifies machine learning infrastructure pipelines, helping data scientists accelerate their productivity and the quality of their models.
Learn more about the Run:AI GPU virtualization platform.