TensorFlow GPU

Setup, Basic Operations, and Multi-GPU

How Can You Use GPUs with TensorFlow?

TensorFlow is Google’s popular, open source machine learning framework. It can be used to run mathematical operations on CPUs, GPUs, and Google’s proprietary Tensorflow Processing Units (TPUs). GPUs are commonly used for deep learning model training and inference.

To set up TensorFlow to work with GPUs, you need to have the relevant GPU device drivers and configure it to use GPUs (which is slightly different for Windows and Linux machines). Then, TensorFlow runs operations on your GPUs by default. You can control how TensorFlow uses CPUs and GPUs:

  • Logging operations placement on specific CPUs or GPUs
  • Instructing TensorFlow to run certain operations in a specific “device context”—a CPU or a specific GPU, if there are multiple GPUs on the machine
  • Limiting TensorFlow to use only certain GPUs, and free up memory for other programs

Related content: If you are using the Keras front-end API, read our guide to Keras GPU


Setting Up GPUs on Windows

The NVIDIA software packages you install (GPU driver, CUDA Toolkit, cuDNN, and CUPTI) must match the versions required by your TensorFlow release.

The CUDA, cuDNN and CUPTI installation directories must be added to the %PATH% environment variable. If, for example, you’ve installed the CUDA Toolkit to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0 and cuDNN is installed to C:\tools\cuda, you should update %PATH% to look like this:

SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin;%PATH%
SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\extras\CUPTI\lib64;%PATH%
SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include;%PATH%
SET PATH=C:\tools\cuda\bin;%PATH%

Setting Up GPUs on Linux

On Ubuntu, you can install the required NVIDIA software (GPU driver, CUDA Toolkit, cuDNN, and CUPTI) through the package manager. If you build TensorFlow from source, you have to install these software requirements manually; consider using a TensorFlow -devel Docker image as the base, because this makes it easier to consistently deploy Ubuntu with all the required dependencies.

On Linux, you must also add the CUPTI library directory to the LD_LIBRARY_PATH environment variable. It should look like this:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64
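Once the drivers and libraries are installed, on either Windows or Linux, you can verify that TensorFlow sees your GPUs. The following is a minimal check, assuming TensorFlow 2.x is already installed in your Python environment:

import tensorflow as tf

# Print the TensorFlow build details and the GPUs visible to this process.
print("TensorFlow version:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("Visible GPUs:", tf.config.list_physical_devices('GPU'))

If the last line prints an empty list, TensorFlow cannot see a GPU and will silently fall back to the CPU.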

TensorFlow GPU Operations

TensorFlow refers to the CPU on your local machine as /device:CPU:0 and to the first GPU as /GPU:0—additional GPUs will have sequential numbering. By default, if a GPU is available, TensorFlow will use it for all operations. You can control which GPU TensorFlow will use for a given operation, or instruct TensorFlow to use a CPU, even if a GPU is available.

Logging Which Device Operations Run On

It is useful to log which CPU or GPU each TensorFlow operation runs on, since TensorFlow often selects the device without user intervention. Place this statement at the start of your program to log device placement:

tf.debugging.set_log_device_placement(True)

Operations you run afterward log the device they were placed on. For example, creating and then printing a tensor "a" produces a placement log line:

print(a)
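Putting these pieces together, a minimal logging example might look like the following (the tensor values are just for illustration):

import tensorflow as tf

# Log the device each operation is placed on.
tf.debugging.set_log_device_placement(True)

# Creating and multiplying tensors triggers log lines such as
# "Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0".
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 1.0], [0.0, 1.0]])
c = tf.matmul(a, b)
print(c)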

Choosing Which Device to Place an Operation On

TensorFlow provides the tf.device context manager, used in a with statement, to let you place one or more operations on a specific CPU or GPU.

To confirm where the operations actually run, first enable device placement logging:

tf.debugging.set_log_device_placement(True)

Then, place the tensor on a specific device, as shown in the sketch after this list:

  • To place a tensor on the CPU, use with tf.device('/CPU:0'):
  • To place a tensor on GPU #3, use with tf.device('/GPU:3'):
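For example, the following sketch pins tensor creation to the CPU while letting the multiplication fall back to the default GPU (the shapes and values are arbitrary):

import tensorflow as tf

tf.debugging.set_log_device_placement(True)

# Explicitly place the input tensors on the CPU.
with tf.device('/CPU:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 1.0], [0.0, 1.0]])

# Outside the device context, TensorFlow places the MatMul on GPU:0 if one is available.
c = tf.matmul(a, b)
print(c)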

Restricting GPU Use

By default, TensorFlow maps its operations onto all available GPUs and allocates nearly all of their memory. However, you can limit it to a specific set of GPUs using the following function:

tf.config.set_visible_devices(devices, device_type)

For example, the following code restricts TensorFlow to using only the first GPU:

gpus = tf.config.list_physical_devices('GPU')
tf.config.set_visible_devices(gpus[0], 'GPU')
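Note that visible devices must be set before the GPUs are initialized. A more defensive sketch therefore wraps the call in a try/except block and verifies the resulting logical devices:

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # Restrict TensorFlow to the first physical GPU only.
        tf.config.set_visible_devices(gpus[0], 'GPU')
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "physical GPUs,", len(logical_gpus), "logical GPU")
    except RuntimeError as e:
        # Visible devices cannot be changed after the GPUs have been initialized.
        print(e)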

TensorFlow Multi-GPU

TensorFlow supports the distribution of deep learning workloads across multiple GPUs.

The main way to implement distributed training in TensorFlow is with the tf.distribute.Strategy API. It lets you distribute the training of your model across multiple GPUs, TPUs, or machines. It is designed to be easy to use, to provide strong out-of-the-box performance, and to let you switch between strategies with minimal code changes.

A variety of concrete strategies, including several experimental ones, build on tf.distribute.Strategy as their base.

Related content: To better understand distributed training, read our guide to multi GPU

TPU Strategy

You can distribute training across multiple TPUs with tf.distribute.experimental.TPUStrategy. This method uses a special all-reduce implementation customized for TPUs, but is otherwise similar to the mirrored strategy.
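A minimal sketch, assuming a Cloud TPU or Colab TPU runtime is reachable from the process, might look like this:

import tensorflow as tf

# Resolve and initialize the TPU system; the empty tpu='' argument works in
# environments such as Colab where the TPU address is provided automatically.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

strategy = tf.distribute.experimental.TPUStrategy(resolver)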

Mirrored Strategy

You can implement synchronous distributed training across multiple GPUs on a single machine with tf.distribute.MirroredStrategy. This strategy creates one replica of the model's variables per GPU and keeps these copies mirrored.

During training, the mirrored variables are kept in sync across devices using all-reduce algorithms. NVIDIA NCCL is the default all-reduce implementation, but you can supply a custom implementation or choose another pre-built option.
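As a sketch, assuming a simple Keras model: variables created inside the strategy's scope are automatically mirrored across all local GPUs, and the cross_device_ops argument shows where an alternative pre-built all-reduce implementation could be plugged in:

import tensorflow as tf

# NCCL all-reduce is the default; HierarchicalCopyAllReduce is one pre-built alternative.
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())

with strategy.scope():
    # Variables created here are mirrored on every GPU visible to the strategy.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
    model.compile(optimizer='sgd', loss='mse')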

Multi-Worker Mirrored Strategy

Similar to the mirrored strategy, tf.distribute.experimental.MultiWorkerMirroredStrategy synchronizes variables across multiple worker machines, letting you distribute training across machines as well as GPUs. It relies on collective ops: single operations in the TensorFlow graph that automatically choose a suitable all-reduce algorithm at runtime.
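A common way to describe the cluster is the TF_CONFIG environment variable. In the sketch below, the hostnames are placeholders; each worker runs the same script with its own task index, and TF_CONFIG must be set before the strategy is created:

import json
import os

import tensorflow as tf

# Hypothetical two-worker cluster; replace the addresses with your own machines.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["worker0.example.com:12345", "worker1.example.com:12345"]},
    "task": {"type": "worker", "index": 0},  # use index 1 on the second machine
})

strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()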

Parameter Server Strategy

You can run parameter server training across multiple machines using tf.distribute.experimental.ParameterServerStrategy. This strategy splits the machines into workers and parameter servers: variables are distributed across the parameter servers, while computation is replicated across the worker GPUs.
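As a rough sketch (TensorFlow 2.4 and later), the strategy is typically constructed from a cluster resolver that reads the same TF_CONFIG variable, with worker and ps tasks defined in the cluster spec:

import tensorflow as tf

# Reads the cluster layout (workers, parameter servers, chief) from TF_CONFIG.
cluster_resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()
strategy = tf.distribute.experimental.ParameterServerStrategy(cluster_resolver)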

Central Storage Strategy

You can perform synchronous training on a single machine using tf.distribute.experimental.CentralStorageStrategy. This strategy keeps variables in one central place on the CPU rather than mirroring them, while operations are replicated across all local GPUs, so each GPU performs the same operations on a different subset of the data.
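A minimal sketch: variables created inside the strategy's scope live on the CPU, while compute is replicated across the local GPUs:

import tensorflow as tf

strategy = tf.distribute.experimental.CentralStorageStrategy()

with strategy.scope():
    # This variable is stored once on the CPU rather than mirrored per GPU.
    v = tf.Variable(1.0)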

Learn more in our detailed guide to Tensorflow multiple GPU

TensorFlow GPU Virtualization with Run:AI

Run:AI automates resource management and workload orchestration for machine learning infrastructure. With Run:AI, you can automatically run as many compute intensive experiments as needed in TensorFlow and other deep learning frameworks.

Here are some of the capabilities you gain when using Run:AI:

  • Advanced visibility—create an efficient pipeline of resource sharing by pooling GPU compute resources.
  • No more bottlenecks—you can set up guaranteed quotas of GPU resources, to avoid bottlenecks and optimize billing.
  • A higher level of control—Run:AI enables you to dynamically change resource allocation, ensuring each job gets the resources it needs at any given time.

Run:AI simplifies machine learning infrastructure pipelines, helping data scientists accelerate their productivity and the quality of their models.

Learn more about the Run:AI GPU virtualization platform.