Can You Run Keras Models on GPU?
GPUs are commonly used for deep learning, to accelerate training and inference for computationally intensive models. Keras is a Python-based, deep learning API that runs on top of the TensorFlow machine learning platform, and fully supports GPUs.
Keras was historically a high-level API sitting on top of a lower-level neural network API. It served as a wrapper for lower-level TensorFlow libraries. Keras has since been integrated with TensorFlow and is now entirely packaged with the TensorFlow library. This means that you automatically get the Keras API when you install TensorFlow.
TensorFlow has advanced support for Graphics Processing Units (GPUs) and Google’s proprietary Tensor Processing Units (TPUs). Keras, as a front-end for TensorFlow, can help you build and train TensorFlow models on GPU and TPU infrastructure.
In this article:
- Using Keras on a Single GPU
- Keras on Multiple GPUs
- Keras on TPU
- Keras Mixed Precision
- Keras GPU Virtualization With Run:AI
Using Keras on a Single GPU
TensorFlow code, with Keras included, can run transparently on a single GPU without requiring explicit code configuration. Currently, both Ubuntu and Windows offer TensorFlow GPU support with CUDA-enabled cards.
TensorFlow places any operation that has a GPU implementation on the GPU by default. Thus, if both a CPU and a GPU are available, GPU-capable operations run on the GPU unless you specify otherwise.
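As a rough illustration of default placement and of overriding it, the sketch below runs the same matrix multiplication twice, assuming TensorFlow 2.x in eager mode. The tensor values are arbitrary, and tf.device is shown as one way to specify placement explicitly:

```python
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])

# No device specified: TensorFlow picks a GPU if one is visible, else the CPU.
default_result = tf.matmul(a, a)

# Explicit placement: force this operation onto the CPU.
with tf.device("/CPU:0"):
    cpu_result = tf.matmul(a, a)

print("Default placement:", default_result.device)
print("Forced placement: ", cpu_result.device)
```

On a machine without a GPU, both lines report the CPU device.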
To use Keras with GPU, follow these steps:
- Install TensorFlow
You can use the Python pip package manager to install TensorFlow.
TensorFlow requires Python 3.6-3.9 and is supported on the following 64-bit systems: Ubuntu (16.04 and later), macOS (10.12.6 Sierra and later, without GPU support) and Windows (7 and later, with the C++ redistributable).
- Install the NVIDIA Drivers
To install the drivers, download them from the NVIDIA website and run the installation wizard.
- Install CUDA
To install the CUDA Toolkit, select the version you want to download on the NVIDIA website. Make sure that it is the version currently supported by TensorFlow—you can check this on the TensorFlow website.
- Install cuDNN
Go back to the NVIDIA website and create a free account to access the download. Select the cuDNN version that corresponds to the CUDA Toolkit version you installed.
- Verify GPU detection
To check whether TensorFlow can detect a GPU, open a Python environment (such as a Jupyter notebook) and check the length of the list returned by tf.config.experimental.list_physical_devices('GPU').
If the list length is greater than 0, TensorFlow has detected one or more GPUs on the machine, and you can safely run GPU operations.
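The installation steps above can be checked with a short script. This is a minimal sketch, assuming TensorFlow 2.3 or later; it uses tf.config.list_physical_devices (the non-experimental name for the same call) and tf.sysconfig.get_build_info to report the CUDA and cuDNN versions the installed build expects. The printed messages are illustrative only:

```python
import tensorflow as tf

# CUDA and cuDNN versions this TensorFlow build was compiled against.
# CPU-only builds may not report these keys, hence the .get() defaults.
build = tf.sysconfig.get_build_info()
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("CUDA version:   ", build.get("cuda_version", "n/a"))
print("cuDNN version:  ", build.get("cudnn_version", "n/a"))

# An empty list means TensorFlow did not detect any GPU.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    for gpu in gpus:
        print("Detected GPU:", gpu.name)
else:
    print("No GPU detected; operations will run on the CPU.")
```

If no GPU is listed on a machine that has one, a mismatch between the installed CUDA or cuDNN version and the versions printed here is a likely place to start looking.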
Keras on Multiple GPUs
You can run a model on multiple GPUs in one of two ways: data parallelism and model parallelism. In most cases, you will opt for data parallelism:
- Data parallelism: splits the training data into subsets that are processed in parallel. This is the most popular form of distributed training. Each resource runs a copy of the model on its own subset. Because every copy must apply the same parameter updates, the model parameters have to be synchronized during training, which means that data parallelism implementations require communication between workers.
- Model parallelism: splits the model itself into components that can run in parallel. Each component is trained individually, and the results are then aggregated. You can run the components on different resources with the same data, which reduces the communication required between workers, because you only need the communication necessary for synchronizing shared parameters. You can apply this method using multiple GPUs in one server.
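For data parallelism on a single machine, TensorFlow provides tf.distribute.MirroredStrategy, which keeps one model copy per GPU and synchronizes their updates. The following is a minimal sketch, assuming TensorFlow 2.x; the layer sizes, the random data and the per-replica batch size of 32 are placeholders chosen for illustration. Model parallelism is not shown here.

```python
import numpy as np
import tensorflow as tf

# Uses all visible GPUs; falls back to a single CPU replica if there are none.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

# Variables created inside the scope are mirrored across replicas.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(32,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# Illustrative random data: 256 samples, 32 features, 10 classes.
x = np.random.rand(256, 32).astype("float32")
y = np.random.randint(0, 10, size=(256,))

# Each global batch is split evenly across the replicas.
global_batch_size = 32 * strategy.num_replicas_in_sync
history = model.fit(x, y, batch_size=global_batch_size, epochs=1, verbose=0)
print("Loss after one epoch:", history.history["loss"][0])
```

Scaling the global batch size with the number of replicas keeps the per-GPU batch constant as GPUs are added.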
Keras on TPU
A Tensor Processing Unit (TPU) is a deep learning accelerator available publicly on Google Cloud. TPUs can be used with Deep Learning VMs, AI Platform (ML Engine) and Colab.
To use a TPU, select a TPU runtime (for example, in Colab). Once you’ve connected to the runtime, you need to use a TPU Cluster Resolver to automatically detect the TPU on any supported platform.
Here is sample code that illustrates how to detect a TPU on the current machine.
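A minimal sketch of the detection step, assuming TensorFlow 2.x. The helper name detect_tpu and the fallback that returns None when no TPU is found are additions for illustration, so the snippet also runs on machines without a TPU:

```python
import tensorflow as tf


def detect_tpu():
    """Return a connected TPUClusterResolver, or None if no TPU is found.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    try:
        # connect() locates the TPU and initializes the TPU system.
        return tf.distribute.cluster_resolver.TPUClusterResolver.connect()
    except ValueError:
        # Raised when no TPU is configured for this runtime.
        return None


resolver = detect_tpu()
if resolver is None:
    print("No TPU detected.")
else:
    print("TPU found at:", resolver.master())
```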
Once you’ve set it up, the TPU workflow will be similar to implementing multi-GPU training on a single machine. The main difference is that the distribution strategy used is TPUStrategy.
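The sketch below shows how the strategy fits into the workflow, assuming TensorFlow 2.x. The get_strategy helper, its fallback to the default strategy and the layer sizes are assumptions made for illustration; only the strategy object changes between a TPU runtime and other hardware.

```python
import tensorflow as tf


def get_strategy():
    """Return a TPUStrategy if a TPU is reachable, else the default strategy.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    try:
        resolver = tf.distribute.cluster_resolver.TPUClusterResolver.connect()
    except ValueError:
        # No TPU configured: use the default (no-op) strategy instead.
        return tf.distribute.get_strategy()
    return tf.distribute.TPUStrategy(resolver)


strategy = get_strategy()

# As with multi-GPU training, build and compile the model inside the scope.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(32,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(optimizer="adam", loss="mse")

print("Strategy:", type(strategy).__name__)
print("Trainable parameters:", model.count_params())
```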
Keras Mixed Precision
Mixed precision combines 16-bit and 32-bit floating-point types during training to make a model run faster and use less memory. Most hardware supports mixed precision, but it only speeds up models on recent NVIDIA GPUs and on TPUs. With NVIDIA GPUs, you combine float16 with float32. With TPUs, you combine bfloat16 with float32.
To implement mixed precision in Keras, you must create a mixed precision policy (also known as a dtype policy). This policy specifies the dtypes in which the layers will run.
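Create the policy by passing its name to the mixed_precision.Policy constructor. For NVIDIA GPUs, it should look like this:
from tensorflow.keras import mixed_precision
policy = mixed_precision.Policy('mixed_float16')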
The mixed_float16 policy works best with NVIDIA GPUs that have a compute capability of 7.0 or higher. The policy can run on CPUs or other GPUs but it may be less effective at improving performance.
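To show what the policy does to a layer, here is a minimal sketch assuming TensorFlow 2.4 or later. The layer sizes are arbitrary, and keeping the final softmax in float32 is a common precaution for numerical stability rather than something the policy requires:

```python
import tensorflow as tf
from tensorflow.keras import layers, mixed_precision

# Apply the policy to every layer created from here on.
mixed_precision.set_global_policy("mixed_float16")

inputs = tf.keras.Input(shape=(784,))
hidden = layers.Dense(64, activation="relu")
logits = layers.Dense(10)
# Override the policy for the output layer so it computes in float32.
softmax = layers.Activation("softmax", dtype="float32")

outputs = softmax(logits(hidden(inputs)))
model = tf.keras.Model(inputs, outputs)

# Computations run in float16, while the weights are stored in float32.
print("Hidden compute dtype: ", hidden.compute_dtype)
print("Hidden variable dtype:", hidden.variable_dtype)
print("Output compute dtype: ", softmax.compute_dtype)

# Restore the default so later code in the same process is unaffected.
mixed_precision.set_global_policy("float32")
```

On a CPU or an older GPU this still runs, though TensorFlow may warn that the policy is unlikely to improve performance there.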
When you use mixed precision, consider implementing these tips to enhance performance:
- Increase the batch size: when using GPUs, try doubling the batch size (provided it doesn’t affect model quality). You can usually double the batch size without running out of memory, because float16 tensors use half the memory of float32 tensors. Larger batch sizes usually mean greater training throughput, so your model can process more training elements per second.
- Ensure the use of GPU Tensor Cores: recent NVIDIA GPUs have special hardware units called Tensor Cores, which can multiply float16 matrices very quickly. Use Tensor Cores wherever possible. Note that Tensor Cores require certain tensor dimensions to be multiples of 8. For example, the units argument of tf.keras.layers.Dense and tf.keras.layers.LSTM should be a multiple of 8, such as 64.
- Improve performance with Cloud TPUs: when using Cloud TPUs, try doubling the batch size to take advantage of bfloat16 tensors, which use half the memory. As with GPUs, larger batch sizes can mean greater training throughput. However, unlike GPUs, you do not need to tune TPUs specifically for mixed precision to achieve optimal performance.
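The arithmetic behind the first two tips can be sketched in plain Python. Both helper functions and the example sizes are hypothetical, chosen only to illustrate the reasoning; real memory use also depends on weights, optimizer state and framework overhead.

```python
def round_up_to_multiple(value, multiple=8):
    """Round a layer dimension up to the next multiple (8 for Tensor Cores)."""
    return ((value + multiple - 1) // multiple) * multiple


def activation_bytes(batch_size, features, bytes_per_element):
    """Memory needed for one activation tensor of shape (batch_size, features)."""
    return batch_size * features * bytes_per_element


# A layer planned with 60 units would be padded to 64 to suit Tensor Cores.
print(round_up_to_multiple(60))

# float32 uses 4 bytes per element; float16 uses 2 bytes per element.
float32_bytes = activation_bytes(batch_size=256, features=4096, bytes_per_element=4)
float16_bytes = activation_bytes(batch_size=512, features=4096, bytes_per_element=2)

# Doubling the batch size in float16 needs the same activation memory.
print(float32_bytes, float16_bytes)
```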
Keras GPU Virtualization With Run:AI
Run:AI automates resource management and workload orchestration for machine learning infrastructure. With Run:AI, you can automatically run as many compute-intensive experiments as needed in Keras and other deep learning frameworks.
Here are some of the capabilities you gain when using Run:AI:
- Advanced visibility—create an efficient pipeline of resource sharing by pooling GPU compute resources.
- No more bottlenecks—you can set up guaranteed quotas of GPU resources, to avoid bottlenecks and optimize billing.
- A higher level of control—Run:AI enables you to dynamically change resource allocation, ensuring each job gets the resources it needs at any given time.
Run:AI simplifies machine learning infrastructure pipelines, helping data scientists increase their productivity and improve the quality of their models.
Learn more about the Run:AI GPU virtualization platform.