Question 1

What Is Multi GPU in Deep Learning?

Accepted Answer

Deep learning is a subset of machine learning that can learn from unstructured data, such as images, audio, and text, without relying on manually engineered features. It uses artificial neural networks, layered models loosely inspired by neurons in the brain, to extract and correlate patterns from large amounts of data. In general, more training data tends to improve model accuracy, provided the data is representative and of good quality.

You can train deep learning models sequentially. However, the volume of data and the amount of computation involved make this impractical, if not impossible, for all but small models. Parallel processing lets many data items be processed at the same time, which drastically reduces training time. This is typically accomplished with graphics processing units (GPUs).

GPUs are specialized processors designed to work in parallel. For highly parallel workloads, such as the matrix operations used in neural network training, they can be significantly faster than traditional CPUs. A figure of up to 10x is often cited, though the actual speedup depends on the model, the hardware, and the workload. Typically, a system combines one or more CPUs with multiple GPUs. The CPUs handle complex or general-purpose tasks, while the GPUs handle specific, highly repetitive computations.

Question 2

Multi GPU Deep Learning Strategies

Accepted Answer

Once your system has multiple GPUs, you need to build parallelism into your deep learning processes. There are two main approaches: model parallelism and data parallelism.

Model parallelism
Model parallelism is useful when a model's parameters are too large to fit in the memory of a single GPU. With this method, you split the model itself across multiple GPUs, with each GPU holding and computing one part. The parts can run in parallel or in series, depending on how the model is divided. Every part works on the same data, and intermediate results (activations and gradients) must be passed between the parts.
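
As a rough illustration of splitting a model by memory, the sketch below assigns consecutive layers to GPUs until each GPU's budget is used up. The layer names, parameter counts, the 4 bytes per parameter (float32), the 800 MB budget, and the partition_layers helper are all hypothetical and chosen only for this example. Real training also needs memory for activations, gradients, and optimizer state, which this sketch ignores.

```python
BYTES_PER_PARAM = 4  # assumption: float32 weights


def partition_layers(layers, gpu_budget_bytes):
    """Assign consecutive layers to GPUs without exceeding the per-GPU budget.

    layers: list of (name, parameter_count) tuples in forward order.
    Returns a list of lists of layer names, one inner list per GPU.
    """
    placements = [[]]
    used = 0
    for name, n_params in layers:
        size = n_params * BYTES_PER_PARAM
        if size > gpu_budget_bytes:
            raise ValueError(f"layer {name!r} does not fit on a single GPU")
        if used + size > gpu_budget_bytes:
            # Current GPU is full: start placing layers on the next one.
            placements.append([])
            used = 0
        placements[-1].append(name)
        used += size
    return placements


# Hypothetical model: four layers with made-up parameter counts.
layers = [
    ("embed", 50_000_000),
    ("block1", 120_000_000),
    ("block2", 120_000_000),
    ("head", 50_000_000),
]
placements = partition_layers(layers, gpu_budget_bytes=800_000_000)
for gpu_index, names in enumerate(placements):
    print(f"GPU {gpu_index}: {names}")
```

Keeping layers in forward order means each GPU only has to pass its output to the next one, which is the simplest (serial) form of model parallelism.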

Data parallelism
Data parallelism places a full copy of your model on each GPU. It is useful when a batch is too large to process on a single device, or when you want to speed up training. Each copy processes a different subset of the data at the same time. The resulting gradients are then combined, typically by averaging, so that every copy applies the same update, and training continues with the next batch.
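
The combining step can be sketched in plain Python without any GPU. The example below is a minimal sketch that assumes a one-parameter model y_hat = w * x, a toy dataset, and two simulated replicas; none of these come from a real framework. It shows that averaging the gradients of equally sized shards gives the same update as computing the gradient over the full batch.

```python
def gradient(w, batch):
    """Gradient of mean squared error for the model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


# Toy dataset generated by y = 3x; values chosen only for illustration.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]

# Split the data into equally sized shards, one per simulated replica.
num_replicas = 2
shard_size = len(data) // num_replicas
shards = [data[i * shard_size:(i + 1) * shard_size] for i in range(num_replicas)]

w = 0.0    # every replica starts from the same weight
lr = 0.05  # learning rate, an arbitrary choice for this toy problem
for step in range(50):
    # Each replica computes a gradient on its own shard (in parallel on real hardware).
    local_grads = [gradient(w, shard) for shard in shards]
    # The gradients are averaged and the same update is applied everywhere.
    avg_grad = sum(local_grads) / num_replicas
    w -= lr * avg_grad

print(f"learned w = {w:.4f}")
```

The equivalence with full-batch training holds here because the shards have equal size; with uneven shards, a weighted average would be needed.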

Question 3

How Does Multi GPU Work in Common Deep Learning Frameworks?

Accepted Answer

TensorFlow Multiple GPU
TensorFlow is an open source framework, created by Google, that you can use to perform machine learning operations. The library includes a variety of machine learning and deep learning algorithms and models that you can use as a base for your training. It also includes built-in methods for distributed training using GPUs.

Through the API, you can use tf.distribute.Strategy to distribute your operations across GPUs, TPUs, or machines. This API is designed to serve several kinds of users, such as researchers and machine learning engineers, and to make it easy to switch between distribution strategies.

Two strategies that implement the tf.distribute.Strategy API are MirroredStrategy and TPUStrategy. Both distribute your workloads: the former across multiple GPUs on one machine, and the latter across multiple Tensor Processing Units (TPUs). TPUs are accelerators designed by Google for machine learning workloads and available through Google Cloud Platform.

Both of these strategies use roughly the same data-parallel process, summarized as follows:

1. Your dataset is split so that data is distributed as evenly as possible.
2. A replica of your model is created on each GPU, and each replica is assigned a subset of the dataset.
3. Each GPU processes its subset and produces gradients.
4. The gradients from all replicas are averaged, and the result is used to update every replica so that they stay in sync.
5. The process repeats until your model is fully trained.
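
The steps above can be sketched with MirroredStrategy and Keras as follows. This is a minimal sketch, not a tuned setup: the synthetic data, the layer sizes, the per-replica batch size of 16, and the two epochs are assumptions made for illustration. On a machine without multiple GPUs, the strategy is expected to fall back to a single replica, so the same code should still run.

```python
import numpy as np
import tensorflow as tf

# MirroredStrategy replicates the model on every GPU visible to TensorFlow.
strategy = tf.distribute.MirroredStrategy()
print("replicas in sync:", strategy.num_replicas_in_sync)

# Variables created inside the scope are mirrored across the replicas.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="sgd", loss="mse")

# Synthetic regression data, for illustration only.
x = np.random.rand(64, 4).astype("float32")
y = x.sum(axis=1, keepdims=True)

# The global batch is divided among the replicas, so scale it with their number.
global_batch_size = 16 * strategy.num_replicas_in_sync
history = model.fit(x, y, batch_size=global_batch_size, epochs=2, verbose=0)
print("final loss:", history.history["loss"][-1])
```

Only model creation and compilation need to happen inside strategy.scope(); the call to fit handles splitting each batch and aggregating the gradients.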

PyTorch Multi GPU
PyTorch is an open source machine learning framework based on Python. You can use it to train models using tensor computations on GPUs. It supports distributed training through the torch.distributed package.

With PyTorch, there are three main approaches to parallelism (or distribution) across GPUs:

DataParallel: distributes model replicas across multiple GPUs in a single machine, within a single process. Each replica processes a different subset of each input batch.
DistributedDataParallel: provides data parallelism using one process per GPU, which lets you distribute model replicas across machines as well as across GPUs. It is a separate class rather than an extension of DataParallel, and it can be combined with model parallelism to perform both model and data parallelism.
Model parallelism: lets you split a large model across multiple GPUs, with each GPU holding part of the model. In its basic form this is done manually rather than through a dedicated class: you place submodules on different devices and move intermediate outputs between them. Because the parts typically run one after another, activations must be transferred between GPUs during each forward and backward pass.
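
Two of these approaches can be sketched in a few lines. The example below is a minimal sketch under stated assumptions: the layer sizes, the random data, and the TwoStageNet class are hypothetical, and the code falls back to the CPU when fewer than two GPUs are available so that it still runs. DistributedDataParallel is not shown because it also needs a process group and one launched process per GPU.

```python
import torch
import torch.nn as nn

n_gpus = torch.cuda.device_count()

# Data parallelism on one machine: DataParallel splits each batch across GPUs.
dp_model = nn.Linear(8, 2)
if n_gpus > 1:
    dp_model = nn.DataParallel(dp_model)
dp_device = torch.device("cuda:0" if n_gpus > 0 else "cpu")
dp_model.to(dp_device)
dp_out = dp_model(torch.randn(4, 8, device=dp_device))

# Model parallelism: put each stage on its own device (CPU fallback if < 2 GPUs).
dev0 = torch.device("cuda:0" if n_gpus >= 2 else "cpu")
dev1 = torch.device("cuda:1" if n_gpus >= 2 else "cpu")


class TwoStageNet(nn.Module):
    """Hypothetical two-stage model whose halves live on different devices."""

    def __init__(self, dev0, dev1):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        self.stage1 = nn.Sequential(nn.Linear(8, 16), nn.ReLU()).to(dev0)
        self.stage2 = nn.Linear(16, 2).to(dev1)

    def forward(self, x):
        h = self.stage1(x.to(self.dev0))
        # Move the intermediate activations to the second stage's device.
        return self.stage2(h.to(self.dev1))


model = TwoStageNet(dev0, dev1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(4, 8)
targets = torch.randn(4, 2, device=dev1)  # targets live where the output is produced

# One training step; autograd routes gradients back across the device boundary.
loss = nn.functional.mse_loss(model(inputs), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print("loss:", loss.item())
```

In the model-parallel part, the second stage waits for the first, so only one device is busy at a time; pipelining several micro-batches is a common way to reduce that idle time.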

Deep Learning with Multiple GPUs

An In-Depth Guide
