
TensorFlow GPU

Setup, Basic Operations, and Multi-GPU


TensorFlow is Google’s popular, open source machine learning framework. It can be used to run mathematical operations on CPUs, GPUs, and Google’s proprietary Tensor Processing Units (TPUs). GPUs are commonly used for deep learning model training and inference.

To set up TensorFlow to work with GPUs, you need to install the relevant GPU device drivers and libraries and configure TensorFlow to use them (the process differs slightly between Windows and Linux machines). Once set up, TensorFlow runs operations on your GPUs by default. You can control how TensorFlow uses CPUs and GPUs by:

  • Logging operation placement on specific CPUs or GPUs
  • Instructing TensorFlow to run certain operations in a specific “device context”—a CPU or a specific GPU, if there are multiple GPUs on the machine
  • Limiting TensorFlow to use only certain GPUs, freeing up memory for other programs

Related content: If you are using the Keras front-end API, read our guide to Keras GPU

In this article:

  • Setting up TensorFlow to Use GPUs
  • TensorFlow GPU Operations
  • TensorFlow Multi-GPU
  • TensorFlow GPU Virtualization with Run:AI

Setting up TensorFlow to Use GPUs

Here is an outline of how to configure TensorFlow to use GPUs on a machine.

Hardware Prerequisites

TensorFlow supports GPU-enabled devices such as NVIDIA GPU cards with a CUDA architecture of 8.0 or higher (as well as 3.5, 5.0, 6.0, 7.0 and 7.5). See the guide to building TensorFlow from source on Linux if you need to use a GPU with an unsupported CUDA architecture, use different versions of the NVIDIA libraries, or avoid JIT compilation from PTX.

TensorFlow packages include PTX code only for the most recent supported CUDA architectures. TensorFlow will therefore fail to load on an older GPU when CUDA_FORCE_PTX_JIT=1 is set.

Software Prerequisites

For NVIDIA GPUs, you must have the following software installed on your system:

  • CUDA Toolkit—CUDA 11.2 is currently supported
  • NVIDIA GPU drivers—you need 450.80.02 or above for CUDA 11.2 
  • CUPTI—this is available with the CUDA Toolkit.
  • cuDNN SDK 8.1.0
  • TensorRT 6.0 (optional)—improves latency and throughput for inference on certain models

Setting Up GPUs on Windows

The NVIDIA software packages you install must match the above-listed versions.

The CUDA, cuDNN and CUPTI installation directories must be added to the %PATH% environment variable. If, for example, you’ve installed the CUDA Toolkit to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0 and cuDNN is installed to C:\tools\cuda, you should update %PATH% to look like this:

SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin;%PATH%
SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\extras\CUPTI\lib64;%PATH%
SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include;%PATH%
SET PATH=C:\tools\cuda\bin;%PATH%

Setting Up GPUs on Linux 

You can easily install the required NVIDIA software on Ubuntu. If you build TensorFlow from source, you have to manually install the above-listed software requirements. Consider using a TensorFlow Docker -devel image as the base, because this makes it easier to consistently deploy Ubuntu with all the required software dependencies.

In particular, the CUPTI installation directory must be added to the $LD_LIBRARY_PATH environment variable. It should look like this:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64
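
Once the environment variables are set, you can quickly confirm that TensorFlow detects the GPU with a check like the following (a minimal sketch, assuming a TensorFlow 2.x install):

import tensorflow as tf

# Should print at least one PhysicalDevice entry if the GPU setup is correct
print(tf.config.list_physical_devices('GPU'))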

TensorFlow GPU Operations

TensorFlow refers to the CPU on your local machine as /device:CPU:0 and to the first GPU as /GPU:0—additional GPUs will have sequential numbering. By default, if a GPU is available, TensorFlow will use it for all operations. You can control which GPU TensorFlow will use for a given operation, or instruct TensorFlow to use a CPU, even if a GPU is available.

Logging Which Device Operations Run On

It is useful to log which CPU or GPU each TensorFlow operation runs on, since TensorFlow often selects the device without user intervention. Place this statement at the start of your program to log device placement:

tf.debugging.set_log_device_placement(True)

You can then print device placement as follows (where “a” is a tensor or similar object):

print(a)
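
Putting these together, here is a minimal sketch (the tensors a, b, and c are illustrative) that logs the device each operation runs on:

import tensorflow as tf

# Enable logging of device placement for every operation
tf.debugging.set_log_device_placement(True)

# These ops are placed on GPU:0 automatically if a GPU is available
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 1.0], [0.0, 1.0]])
c = tf.matmul(a, b)

print(c)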

Choosing Which Device to Place an Operation On

TensorFlow provides the with tf.device() context manager, which lets you place one or more operations on a specific CPU or GPU.

You must first use the following statement:

tf.debugging.set_log_device_placement(True)

Then, place a tensor on a specific device as follows (see the sketch after this list):

  • To place a tensor on the CPU, use with tf.device('/CPU:0'):
  • To place a tensor on GPU #3, use with tf.device('/GPU:3'):
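
For example, here is a minimal sketch (the tensor names are illustrative) that pins tensor creation to the CPU while letting TensorFlow place the remaining operation automatically:

import tensorflow as tf

tf.debugging.set_log_device_placement(True)

# Pin these tensors to the CPU even if a GPU is available
with tf.device('/CPU:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 1.0], [0.0, 1.0]])

# This op is placed automatically, typically on GPU:0 if one is present
c = tf.matmul(a, b)
print(c)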

Restricting GPU Use

By default, TensorFlow allocates nearly all available GPU memory on all visible GPUs. However, you can limit it to using only a specific set of GPUs with the following method:

tf.config.set_visible_devices()

For example, the following code restricts TensorFlow to using only the first GPU:

gpus = tf.config.list_physical_devices('GPU')
tf.config.set_visible_devices(gpus[0], 'GPU')
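
A slightly fuller sketch follows the same pattern, wrapping the call in try/except because visible devices must be set before TensorFlow initializes the GPUs:

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # Restrict TensorFlow to only use the first GPU
        tf.config.set_visible_devices(gpus[0], 'GPU')
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "physical GPUs,", len(logical_gpus), "logical GPU")
    except RuntimeError as e:
        # Visible devices must be set before GPUs have been initialized
        print(e)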

TensorFlow Multi-GPU

TensorFlow supports the distribution of deep learning workloads across multiple GPUs. 

The main way to implement distributed training in TensorFlow is the tf.distribute.Strategy API. It lets you distribute the training of your model across multiple GPUs, TPUs, or machines. It is designed to be easy to use, to provide strong performance out of the box, and to make switching between strategies straightforward.

A variety of additional strategies, including several experimental ones, build on tf.distribute.Strategy as their base.

Related content: To better understand distributed training, read our guide to multi GPU

TPU Strategy

You can distribute training across multiple TPUs with tf.distribute.experimental.TPUStrategy. This method uses a special all-reduce implementation customized for TPUs, but is otherwise similar to the mirrored strategy. 
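
For illustration, here is a minimal sketch of creating a TPU strategy; it only runs in an environment with TPUs attached (for example, Cloud TPU), and the empty tpu='' argument assumes the TPU address is discoverable automatically:

import tensorflow as tf

# Connect to the TPU cluster and initialize the TPU system
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

strategy = tf.distribute.experimental.TPUStrategy(resolver)
print("Number of replicas:", strategy.num_replicas_in_sync)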

Mirrored Strategy

You can implement synchronous distributed training across GPUs with tf.distribute.MirroredStrategy. This strategy creates a replica of each model variable, and the replicas are mirrored across the GPUs.

During training, the mirrored variables are kept synchronized with each other using all-reduce algorithms. NVIDIA NCCL is the default all-reduce implementation, but you can specify other pre-built options or provide a custom algorithm.
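
Here is a minimal sketch of synchronous training with MirroredStrategy; the Keras model and the random data are placeholders for illustration only:

import tensorflow as tf

# Synchronous data-parallel training across all local GPUs
strategy = tf.distribute.MirroredStrategy()
print("Number of replicas:", strategy.num_replicas_in_sync)

# Variables created inside the strategy scope are mirrored on each GPU
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer='adam', loss='mse')

# Placeholder data for illustration only
x = tf.random.normal((256, 10))
y = tf.random.normal((256, 1))
model.fit(x, y, batch_size=32, epochs=2)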

Multi-Worker Mirrored Strategy

Similar to the mirrored strategy, tf.distribute.experimental.MultiWorkerMirroredStrategy syncs variables across multiple workers (each potentially with multiple GPUs) using CollectiveOps, allowing you to distribute training across multiple machines. A collective op is a single op in the TensorFlow graph that automatically chooses an appropriate all-reduce algorithm at runtime.
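
Here is a minimal sketch of how one worker might be configured; the host names and ports are hypothetical, each worker uses a different index, and in practice TF_CONFIG is usually set by the cluster scheduler rather than in code:

import json
import os
import tensorflow as tf

# Describe the cluster: two workers; this process is worker 0
os.environ['TF_CONFIG'] = json.dumps({
    'cluster': {'worker': ['worker0.example.com:12345', 'worker1.example.com:12345']},
    'task': {'type': 'worker', 'index': 0}
})

# TF_CONFIG must be set before the strategy is created
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

# As with MirroredStrategy, build and compile the model inside strategy.scope()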

Parameter Server Strategy

You can run parameter server training across multiple machines using tf.distribute.experimental.ParameterServerStrategy. This strategy divides the machines into workers and parameter servers. Variables are distributed across the parameter servers, while computations are replicated across the worker GPUs.

Central Storage Strategy

You can perform synchronous training with centrally stored variables using tf.distribute.experimental.CentralStorageStrategy. With this strategy, variables are not mirrored; they are placed on the CPU, and operations are replicated across all local GPUs, so each GPU performs the same operations on a different subset of the data.
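
As a minimal sketch, creating the strategy and a centrally stored variable might look like this:

import tensorflow as tf

# Variables are kept on the CPU; compute is replicated across local GPUs
strategy = tf.distribute.experimental.CentralStorageStrategy()

with strategy.scope():
    v = tf.Variable(1.0)  # stored centrally rather than mirrored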

Learn more in our detailed guide to TensorFlow multiple GPU

TensorFlow GPU Virtualization with Run:AI

Run:AI automates resource management and workload orchestration for machine learning infrastructure. With Run:AI, you can automatically run as many compute intensive experiments as needed in TensorFlow and other deep learning frameworks. 

Here are some of the capabilities you gain when using Run:AI: 

  • Advanced visibility—create an efficient pipeline of resource sharing by pooling GPU compute resources.
  • No more bottlenecks—you can set up guaranteed quotas of GPU resources, to avoid bottlenecks and optimize billing.
  • A higher level of control—Run:AI enables you to dynamically change resource allocation, ensuring each job gets the resources it needs at any given time.

Run:AI simplifies machine learning infrastructure pipelines, helping data scientists accelerate their productivity and the quality of their models. 

Learn more about the Run.ai GPU virtualization platform.
