GPUs are commonly used in deep learning to accelerate training and inference for computationally intensive models. Keras is a Python-based deep learning API that runs on top of the TensorFlow machine learning platform and fully supports GPUs.
Keras was historically a high-level API sitting on top of a lower-level neural network API. It served as a wrapper for lower-level TensorFlow libraries. Keras has since been integrated with TensorFlow and is now entirely packaged with the TensorFlow library. This means that you automatically get the Keras API when you install TensorFlow.
TensorFlow has advanced support for Graphics Processing Units (GPUs) and Google’s proprietary Tensor Processing Units (TPUs). Keras, as a front-end for TensorFlow, can help you build and train TensorFlow models on GPU and TPU infrastructure.
TensorFlow code, with Keras included, can run transparently on a single GPU without requiring explicit code configuration. Currently, both Ubuntu and Windows offer TensorFlow GPU support with CUDA-enabled cards.
TensorFlow places any operation that can run on a GPU on the GPU by default. Thus, if both a CPU and a GPU are available, TensorFlow will run GPU-capable operations on the GPU unless otherwise specified.
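As a minimal sketch of this default placement, assuming TensorFlow 2.x is installed, the snippet below runs one matrix multiplication wherever TensorFlow chooses and pins a second one to the CPU with tf.device. The matrix values are arbitrary and only serve as an illustration.

```python
import tensorflow as tf

# Illustrative 2x2 matrix; the values are arbitrary.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])

# No device specified: TensorFlow places the op on a GPU if one is visible,
# otherwise on the CPU.
default_result = tf.matmul(a, a)

# Explicitly pin the same op to the CPU.
with tf.device("/CPU:0"):
    cpu_result = tf.matmul(a, a)

print("Default placement:", default_result.device)
print("Pinned placement: ", cpu_result.device)
```

On a machine with a working GPU setup, the first line of output should name a GPU device, while the second always names the CPU.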
To use Keras with GPU, follow these steps:
You can use the Python pip package manager to install TensorFlow.
TensorFlow requires Python 3.6-3.9 and is supported on several 64-bit systems, including Ubuntu (16.04 and later), macOS (10.12.6 Sierra and later, without GPU support) and Windows (7 and later, with the C++ redistributable).
To install the NVIDIA GPU drivers, download them from the NVIDIA website and run the installation wizard.
To install the CUDA Toolkit, select the version you want to download on the NVIDIA website. Make sure that it is the version currently supported by TensorFlow—you can check this on the TensorFlow website.
To install cuDNN, go back to the NVIDIA website and create a free account to access the download. Select the cuDNN version that corresponds to the supported CUDA Toolkit you have downloaded.
To check whether TensorFlow can detect a GPU on your machine, open a Python environment (such as a Jupyter notebook) and check the length of the list returned by tf.config.experimental.list_physical_devices('GPU').
If the list length is greater than 0, TensorFlow has detected one or more GPUs on the machine, and you can safely run GPU operations.
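The check described above can be sketched as follows. This example assumes TensorFlow 2.1 or later, where the same function is also available under the shorter name tf.config.list_physical_devices.

```python
import tensorflow as tf

# Returns a list of PhysicalDevice objects; empty if no GPU is visible.
gpus = tf.config.list_physical_devices("GPU")

if len(gpus) > 0:
    print(f"TensorFlow detected {len(gpus)} GPU(s):")
    for gpu in gpus:
        print(" ", gpu.name)
else:
    print("No GPU detected; TensorFlow will run on the CPU.")
```

An empty list usually points to a missing or mismatched driver, CUDA Toolkit or cuDNN installation.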
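You can run a model on multiple GPUs in one of two ways: data parallelism or model parallelism. In most cases, you will opt for data parallelism, in which the model is replicated on each device and every replica processes a different slice of each batch.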
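One common way to get data parallelism on a single machine is tf.distribute.MirroredStrategy. The sketch below is an illustration under assumptions, not the only approach: the tiny model, the random training data and the batch size are all made up, and on a machine with no GPU or a single GPU the strategy simply runs with one replica.

```python
import numpy as np
import tensorflow as tf

# MirroredStrategy replicates the model on every visible GPU
# (or falls back to a single device if there is at most one).
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

# Variables must be created inside the strategy scope so they are mirrored.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="sgd", loss="mse")

# Hypothetical random data, used only to make the example runnable.
x = np.random.rand(64, 4).astype("float32")
y = np.random.rand(64, 1).astype("float32")

# Each global batch of 16 samples is split across the replicas.
history = model.fit(x, y, batch_size=16, epochs=1, verbose=0)
print("Final loss:", history.history["loss"][-1])
```

The batch size passed to fit is the global batch size, so it is often scaled up with the number of replicas.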
A Tensor Processing Unit (TPU) is a deep learning accelerator available publicly on Google Cloud. TPUs can be used with Deep Learning VMs, AI Platform (ML Engine) and Colab.
To use a TPU, select a TPU runtime (for example, in Colab). Once you’ve connected to the runtime, you need to use a TPU Cluster Resolver to automatically detect the TPU on any supported platform.
Here is sample code that illustrates how to detect a TPU on the current machine.
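A minimal sketch of TPU detection, assuming TensorFlow 2.3 or later and assuming that the resolver raises ValueError when no TPU is found (as it does on an ordinary CPU or GPU machine). The helper name detect_tpu is hypothetical and introduced only for this example.

```python
import tensorflow as tf


def detect_tpu():
    """Return a TPUClusterResolver if a TPU is found, otherwise None."""
    try:
        # With no arguments, the resolver looks for a TPU in the environment.
        return tf.distribute.cluster_resolver.TPUClusterResolver()
    except ValueError:
        # Assumption: this is the error raised when no TPU is available.
        return None


resolver = detect_tpu()

if resolver is not None:
    # Connect to the TPU and initialize it before building any model.
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
    print("Running on TPU:", resolver.master())
else:
    # Fall back to the default (single-device) strategy.
    strategy = tf.distribute.get_strategy()
    print("No TPU detected; using the default strategy.")

print("Replicas in sync:", strategy.num_replicas_in_sync)
```

Models created inside strategy.scope() will then use whichever strategy was selected.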
Once you’ve set it up, the TPU workflow will be similar to implementing multi-GPU training on a single machine. The main difference is that the distribution strategy used is TPUStrategy.
Mixed precision involves combining 32-bit and 16-bit floating-point types to make model training faster and less memory-intensive. Most hardware can support mixed precision, but this strategy is only effective at speeding up models on the latest NVIDIA GPUs and TPUs. With NVIDIA GPUs, you combine float16 with float32. With TPUs, you combine bfloat16 with float32.
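To make the memory and precision trade-off concrete, here is a small illustration that uses only the Python standard library, not Keras. It compares IEEE float16 and float32 storage sizes and rounding error. The parameter count and the test value 0.1 are arbitrary, and bfloat16 is not covered because the struct module has no format code for it.

```python
import struct


def round_trip(value, fmt):
    """Store a Python float in the given binary format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


half_bytes = struct.calcsize("<e")    # IEEE 754 half precision (float16)
single_bytes = struct.calcsize("<f")  # IEEE 754 single precision (float32)

# Hypothetical model size, chosen only for illustration.
num_params = 1_000_000
print("float16 storage:", num_params * half_bytes, "bytes")
print("float32 storage:", num_params * single_bytes, "bytes")

# float16 halves the storage but rounds values more coarsely.
value = 0.1
half_error = abs(round_trip(value, "<e") - value)
single_error = abs(round_trip(value, "<f") - value)
print("float16 rounding error:", half_error)
print("float32 rounding error:", single_error)
```

The larger rounding error of the 16-bit type is why mixed precision keeps some values in float32 rather than switching everything to 16 bits.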
To implement mixed precision in Keras, you must create a mixed precision policy (also known as a dtype policy). This policy specifies the dtypes in which the layers will run.
Create a tf.keras.mixed_precision.Policy object and set it as the global policy. It should look like this:
from tensorflow.keras import mixed_precision
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_global_policy(policy)
The mixed_float16 policy works best with NVIDIA GPUs that have a compute capability of 7.0 or higher. The policy can run on CPUs or other GPUs but it may be less effective at improving performance.
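As a rough sketch of what the policy controls, the example below inspects the two dtypes behind mixed_float16 and applies the policy to a single layer instead of setting it globally. The layer size and input shape are arbitrary, and the code assumes TensorFlow 2.4 or later.

```python
import tensorflow as tf

policy = tf.keras.mixed_precision.Policy("mixed_float16")

# Computations run in float16, while variables are kept in float32.
print("Compute dtype: ", policy.compute_dtype)
print("Variable dtype:", policy.variable_dtype)

# Apply the policy to one layer only (no global state is changed).
layer = tf.keras.layers.Dense(4, dtype=policy)
outputs = layer(tf.ones((2, 3)))

print("Output dtype:", outputs.dtype)
print("Kernel dtype:", layer.kernel.dtype)
```

Passing the policy per layer is handy for experiments; setting it globally is the usual route when the whole model should use mixed precision.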
When you use mixed precision, consider implementing these tips to enhance performance:
Run:AI automates resource management and workload orchestration for machine learning infrastructure. With Run:AI, you can automatically run as many compute intensive experiments as needed in Keras and other deep learning frameworks.
Here are some of the capabilities you gain when using Run:AI:
Run:AI simplifies machine learning infrastructure pipelines, helping data scientists accelerate their productivity and the quality of their models.