Fine-Tune GPU Performance for Neural Nets


NVIDIA CUDA Deep Neural Network (cuDNN) is a GPU-accelerated library of primitives for deep neural networks, providing highly tuned implementations of standard routines such as forward and backward convolution, pooling, normalization, and activation layers.

The cuDNN library allows deep learning framework developers and researchers everywhere to leverage GPU acceleration for high performance. It reduces the need to fine-tune GPU performance at a low level, saving time so you can concentrate on developing your software and training your neural networks. cuDNN accelerates popular deep learning frameworks such as Keras, Caffe2, Chainer, MXNet, MATLAB, TensorFlow, and PyTorch.


NVIDIA cuDNN Features

Key features of NVIDIA cuDNN include:

  • Acceleration of fused operations for common convolutional neural network (CNN) architectures
  • Support for UINT8 and INT8 integer formats and BF16, FP16, FP32, and TF32 floating-point formats 
  • Tensor core performance acceleration for widely used convolutions such as 2D, 3D, grouped, depth-wise separable, and dilated (with NCHW and NHWC inputs and outputs)
  • Kernel optimization for speech and computer vision models such as ResNet, ResNext, EfficientDet, EfficientNet, SSD, MaskRCNN, Tacotron2, Unet, and VNet
  • Integration with all neural network implementations using arbitrary dimension ordering, 4D tensor sub-regions, and striding 

cuDNN is supported on Linux and Windows across a variety of mobile and data center GPU architectures, including Ampere, Volta, Turing, Pascal, Kepler, and Maxwell. The latest version of cuDNN is 8.3, which provides improved performance on A100 GPUs (up to five times higher than on out-of-the-box V100 GPUs). It also offers new APIs and optimizations for computer vision and conversational AI applications. 

The redesigned version 8.3 API is more user-friendly, offering improved flexibility and easier application integration. It includes optimizations to accelerate transformer-based deep learning models, runtime fusion for compiling kernels with new operators, and a smaller download package (reduced by 30%).

cuDNN Programming Model

NVIDIA cuDNN offers highly tuned, optimized implementations of common routines for DNN applications. These routines include:

  • Forward and backward convolution, including cross-correlation
  • Forward and backward pooling
  • Forward and backward softmax 
  • Forward and backward neuron activations, including sigmoid, rectified linear (ReLU), and hyperbolic tangent (TANH)
  • Forward and backward LCN, LRN, and batch normalization 
  • Tensor transformations

The cuDNN routines offer performance competitive with fast matrix-multiply (GEMM)-based implementations while using less memory. Features of cuDNN include:

  • Flexible dimension ordering support
  • Customizable data layouts
  • Striding
  • 4D tensor subregions (which serve as inputs/outputs for all routines)

The flexibility of cuDNN means you can integrate it into all neural network implementations while avoiding the steps for input/output transposition often required for GEMM-based convolutions. The cuDNN library assumes that the required data for GPU-based operations is directly accessible to the device while also exposing a host API.
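The layout flexibility described above can be sketched with cuDNN's tensor descriptor API. The snippet below is a minimal illustration only (error handling elided), assuming cuDNN 8.x headers are available; the batch dimensions and strides are hypothetical values chosen for a 32x3x224x224 NHWC tensor.

```c
/* Sketch: describing a 4D tensor's layout explicitly with cuDNN.
 * Not a complete program; error checking is elided. */
#include <cudnn.h>

void describe_tensor(void) {
    cudnnTensorDescriptor_t desc;
    cudnnCreateTensorDescriptor(&desc);

    /* NHWC layout: batch 32, 3 channels, 224x224 images. The layout is
     * a parameter, so no input/output transposition step is needed. */
    cudnnSetTensor4dDescriptor(desc, CUDNN_TENSOR_NHWC, CUDNN_DATA_FLOAT,
                               32, 3, 224, 224);

    /* Strides can also be given explicitly, which is how sub-regions
     * of a larger tensor are addressed without copying: */
    cudnnSetTensor4dDescriptorEx(desc, CUDNN_DATA_FLOAT,
                                 32, 3, 224, 224, /* n, c, h, w       */
                                 3 * 224 * 224,   /* n stride         */
                                 1,               /* c stride (NHWC)  */
                                 224 * 3,         /* h stride         */
                                 3);              /* w stride         */

    cudnnDestroyTensorDescriptor(desc);
}
```

The same descriptor is then passed as the input or output description to any cuDNN routine, which is what lets the library plug into arbitrary dimension orderings.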

Applications using the cuDNN library must call cudnnCreate() to initialize a library context handle, which they then explicitly pass to each library function operating on GPU data. When an application has finished using cuDNN, it calls cudnnDestroy() to release the resources associated with the handle. This design lets users control the library's behavior across multiple GPUs, host threads, and CUDA streams.

For instance, applications can associate specific devices with specific host threads using the cudaSetDevice command. They can use unique cuDNN handles for each host thread, which direct library calls to the associated device. If you make your cuDNN library calls using different handles, they automatically run on the different devices you specified.

The system assumes that the device associated with a given cuDNN context remains unchanged between its creation and destruction (the corresponding cudnnCreate() and cudnnDestroy() calls). If you want the cuDNN library to use a different device within the same host thread, the application must call cudnnDestroy() to release the current context, select the other device, and then call cudnnCreate() to create a new context associated with it.
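The handle lifecycle described above can be sketched in C. This is a minimal illustration, not a complete application: error checking is elided, and it assumes a machine with at least two GPUs and cuDNN 8.x installed.

```c
/* Sketch of the cuDNN context lifecycle: create a handle bound to a
 * device, optionally attach a CUDA stream, and destroy/recreate the
 * handle to switch devices within the same host thread. */
#include <cuda_runtime.h>
#include <cudnn.h>

int main(void) {
    /* Bind this host thread to device 0, then create a cuDNN context;
     * the handle stays associated with device 0. */
    cudaSetDevice(0);
    cudnnHandle_t handle0;
    cudnnCreate(&handle0);

    /* Optionally direct this handle's work onto a specific stream: */
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudnnSetStream(handle0, stream);

    /* ... library calls made with handle0 run on device 0 ... */

    /* To target a different device from the same thread, destroy the
     * context, switch devices, and create a new context. */
    cudnnDestroy(handle0);
    cudaStreamDestroy(stream);

    cudaSetDevice(1);
    cudnnHandle_t handle1;
    cudnnCreate(&handle1);
    /* ... library calls made with handle1 run on device 1 ... */
    cudnnDestroy(handle1);
    return 0;
}
```

Using one handle per host thread, as suggested in the text, avoids this destroy/recreate dance entirely: each thread's handle simply directs calls to its own device.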

Related content: Read our guide to CUDA programming

Installing cuDNN on Windows


Before you download cuDNN, make sure you have the following installed on your Windows computer: a CUDA-capable NVIDIA GPU, a supported NVIDIA graphics driver, and the CUDA toolkit.

Download cuDNN for Windows

To download cuDNN, you must register for the NVIDIA Developer Program:

  1. Go to the NVIDIA cuDNN home page and select Download.
  2. Fill in the survey and select Submit.
  3. Accept the terms and conditions to view the list of available cuDNN version downloads.
  4. Choose the cuDNN version you plan to install to view the available resources.
  5. Download the cuDNN archive to your chosen directory.

Install cuDNN 

Before you issue any commands, you must specify your chosen versions of CUDA and cuDNN (and the package date) in the x.x and 8.x.x.x fields. C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vx.x is the CUDA directory path, while <installpath> is the cuDNN directory path.

Use the following steps:

  1. Go to the <installpath> directory with cuDNN.
  2. Unzip the cudnn-windows-x86_64-*-archive.zip package.
  3. Copy these files into your CUDA toolkit directory:
  • <installpath>\cuda\bin\cudnn*.dll to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vx.x\bin
  • <installpath>\cuda\include\cudnn*.h to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vx.x\include
  • <installpath>\cuda\lib\x64\cudnn*.lib to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vx.x\lib\x64
  4. Modify the environment variables to specify the location of cuDNN. Use these steps to check the $(CUDA_PATH) environment variable value:
  • Open a command prompt from the Start menu.
  • Type Run, press Enter, and issue the control sysdm.cpl command.
  • Go to the Advanced tab.
  • Select Environment Variables at the bottom of the window and ensure the values are set as follows:
  • Variable Name: CUDA_PATH 
  • Variable Value: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vx.x
  5. Ensure your Visual Studio project includes cudnn.lib:
  • Go to Visual Studio and right-click on your project name.
  • Select Linker, then Input, and then Additional Dependencies.
  • Enter cudnn.lib and select OK.

Installing cuDNN On Linux


Before you download cuDNN, make sure you have the following installed on your Linux machine: a CUDA-capable NVIDIA GPU, the NVIDIA driver, and the CUDA toolkit.

Downloading cuDNN For Linux

Before downloading cuDNN, register for the NVIDIA Developer Program. Then do the following:

  1. Visit the NVIDIA cuDNN page and click Download.
  2. Complete the survey and accept the terms and conditions.
  3. Select the cuDNN version you want to install and download the binary.

Installing On Linux

Note that in the instructions below we refer to your local CUDA path as <local-cuda-path> and your cuDNN download path as <download-path>.

Note that the cuDNN installation packages are hosted in NVIDIA's online repositories; the steps below configure your package manager to download and install them automatically.

To install cuDNN on Ubuntu 18.04 and 20.04:

  1. Enable the repository containing the relevant cuDNN libraries by running the following commands, where ${OS} is ubuntu1804 or ubuntu2004:

wget https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/cuda-${OS}.pin 

sudo mv cuda-${OS}.pin /etc/apt/preferences.d/cuda-repository-pin-600

sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/7fa2af80.pub

sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/ /"

sudo apt-get update

  2. Install cuDNN by running:

sudo apt-get install libcudnn8=${cudnn_version}-1+${cuda_version}

sudo apt-get install libcudnn8-dev=${cudnn_version}-1+${cuda_version}

Note that:

  • ${cudnn_version} is the cuDNN version you want to use - currently 8.3.2.*
  • ${cuda_version} is either cuda10.2 or cuda11.5

To install cuDNN on RHEL7 and RHEL8:

  1. Enable the repository by running these commands, where ${OS} is rhel7 or rhel8:

sudo yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/cuda-${OS}.repo

sudo yum clean all

  2. Install the cuDNN library by running:

sudo yum install libcudnn8-${cudnn_version}-1.${cuda_version}

sudo yum install libcudnn8-devel-${cudnn_version}-1.${cuda_version}


GPU Virtualization with Run:AI

Run:AI automates resource management and workload orchestration for machine learning infrastructure. With Run:AI, you can automatically run as many compute-intensive experiments as needed on NVIDIA infrastructure.

Here are some of the capabilities you gain when using Run:AI: 

  • Advanced visibility—create an efficient pipeline of resource sharing by pooling GPU compute resources.
  • No more bottlenecks—you can set up guaranteed quotas of GPU resources, to avoid bottlenecks and optimize billing.
  • A higher level of control—Run:AI enables you to dynamically change resource allocation, ensuring each job gets the resources it needs at any given time.

Run:AI simplifies machine learning infrastructure pipelines, helping data scientists accelerate their productivity and the quality of their models. 
Learn more about the Run:AI GPU virtualization platform.