Azure GPU

Best GPU-Optimized VMs and Optimization Best Practices

How Can You Access GPUs in the Azure Cloud?

The Microsoft Azure cloud offers specialized virtual machines (VMs) equipped with graphics processing units (GPUs), which can handle intensive graphics processing, parallel computation, and machine learning tasks. These VMs use NVIDIA and AMD GPUs to accelerate a variety of workloads, from data analysis to complex simulations.

Azure’s GPU offering allows anyone, from individual developers to large organizations, to access GPU resources on demand without an upfront hardware investment, subject to subscription quotas and regional availability. Azure GPU-powered VMs support a range of frameworks and tools, making them versatile for various computing needs.

This is part of our series of articles about cloud deep learning.


Use Cases for Azure GPUs

Here are some common use cases for GPUs on Azure:

Training and Inference of AI Models

Azure GPUs provide the necessary computational power to process large datasets quickly. They reduce the time it takes to train complex models, making iterative development and testing more feasible.

For model inference, Azure GPUs offer low-latency responses crucial for real-time applications. Their ability to handle multiple requests simultaneously ensures AI-driven applications are responsive and reliable.

High-Performance Computing (HPC)

Azure GPUs deliver high computational speeds for HPC use cases, enabling researchers and engineers to solve complex problems faster. Examples include weather simulation, genomic analysis, or quantum mechanics calculations.

Azure's scalable architecture allows users to deploy HPC workloads on demand and scale out across many compute VMs, within subscription quotas, while managing computing resources efficiently.

Graphics and Visualization

Graphics and visualization tasks, such as 3D rendering, virtual reality (VR), and visual effects (VFX) production, benefit significantly from Azure GPUs. These GPUs deliver the graphical processing power needed for high-quality, real-time rendering, enhancing creativity and productivity.

By utilizing Azure GPUs, professionals can access high-performance rendering capabilities without investing in expensive local hardware, enabling more flexible and scalable production workflows.

Real-Time Data Analysis

Azure GPUs are capable of handling real-time data analysis, enabling businesses to derive insights from large volumes of data quickly. High-speed processing allows for the immediate identification of trends, anomalies, and patterns, supporting informed decision-making.

Applications like financial modeling, risk analysis, and fraud detection leverage Azure GPUs for their ability to process complex computations and large datasets efficiently, leading to more accurate and timely results.

Related content: Read our guide to Azure deep learning

Best GPU-Optimized Azure VMs

Here are some examples of Azure virtual machines equipped with GPUs.

NCv3-series and NC T4_v3-series

The NCv3-series and NC T4_v3-series VMs are optimized for computational tasks like AI and deep learning. Equipped with NVIDIA Tesla V100 and NVIDIA T4 GPUs respectively, these VMs offer a balance of performance and cost, suitable for a variety of GPU-accelerated applications.

NC A100 v4-series

NC A100 v4-series VMs are designed for large-scale AI and machine learning workloads. Powered by NVIDIA A100 Tensor Core GPUs, these VMs provide significant computational power, accelerating model training and inference.

These VMs support GPU partitioning, allowing multiple users to share GPU resources, maximizing utilization and reducing costs for AI development projects.

Image credit for this and the following images: Azure

ND A100 v4-series

The ND A100 v4-series VMs are tailored for demanding deep learning and HPC applications. With NVIDIA A100 Tensor Core GPUs, they offer exceptional computational power, necessary for training complex models and performing sophisticated simulations.

NDm A100 v4-series

Designed for the most demanding AI and HPC tasks, NDm A100 v4-series VMs feature NVIDIA A100 Tensor Core GPUs with 80 GB of memory each, double the 40 GB per GPU of the ND A100 v4-series. They support massive workloads, offering scalable performance that caters to needs ranging from deep learning model training to complex scientific simulations.

These VMs provide high bandwidth and low-latency networking, essential for workloads that require extensive communication between nodes.

NGads V620-series

NGads V620-series VMs are specialized for graphics-intensive applications, including game streaming, professional visualization, and virtual workstations. Powered by AMD Radeon PRO V620 GPUs, they deliver high-performance graphics capabilities, ensuring smooth and detailed visualizations.

NV-series and NVv3-series

NV-series and NVv3-series VMs cater to applications requiring powerful graphics processing, such as video editing, design, and visualization. Equipped with NVIDIA Tesla M60 GPUs, they deliver robust performance for demanding graphics workloads.

These VMs offer a cost-effective solution for graphics-intensive applications, balancing performance and affordability.

NVv4-series

NVv4-series VMs focus on delivering efficient, scalable graphics performance for applications like remote visualization, CAD, and gaming. With AMD Radeon Instinct MI25 GPUs, they provide a flexible and cost-effective option for users needing moderate graphics processing capabilities.

Partitionable GPUs in NVv4-series VMs allow for more granular resource allocation, optimizing utilization for various use cases.
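
To make the idea of granular allocation concrete, here is a minimal Python sketch that picks the smallest NVv4 size covering a given GPU memory requirement. The size names, GPU fractions, and memory figures are assumptions based on Azure's published NVv4 size table and should be verified against current documentation. The function name is hypothetical.

```python
from typing import Optional

# (size name, fraction of one MI25 GPU, GPU memory in GiB)
# Assumed values from Azure's NVv4 size table; verify before relying on them.
NVV4_SIZES = [
    ("Standard_NV4as_v4", 1 / 8, 2),
    ("Standard_NV8as_v4", 1 / 4, 4),
    ("Standard_NV16as_v4", 1 / 2, 8),
    ("Standard_NV32as_v4", 1.0, 16),
]


def smallest_nvv4_size(required_gpu_gib: float) -> Optional[str]:
    """Return the smallest NVv4 size whose GPU memory covers the requirement."""
    # The table is ordered from smallest to largest, so the first match wins.
    for name, _fraction, gpu_gib in NVV4_SIZES:
        if gpu_gib >= required_gpu_gib:
            return name
    return None  # The requirement exceeds one full MI25 GPU.
```

A workload needing about 3 GiB of GPU memory would land on a quarter-GPU size rather than paying for a full GPU.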

Best Practices for Azure GPU Optimization

Here are some best practices that can help you make effective use of Azure GPUs.

Choose an Azure GPU Series and Instance Size That Best Fits Your Computational Needs

Selecting the right Azure GPU series and instance size is crucial for optimizing performance and minimizing costs. Users should evaluate their computational needs, considering factors like processing power, memory requirements, and network bandwidth. The appropriate GPU instance ensures efficient resource utilization, delivering optimal performance for specific workloads.
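
As a starting point for this evaluation, the series descriptions in this article can be captured in a simple lookup. The following Python sketch is illustrative only: the workload category names and the function name are hypothetical, and the mapping reflects the general positioning of each series rather than a sizing recommendation.

```python
from typing import Dict, List

# Workload category -> candidate VM series, following the descriptions above.
# Category names are hypothetical labels chosen for this sketch.
SERIES_BY_WORKLOAD: Dict[str, List[str]] = {
    "general_ai": ["NCv3", "NC T4_v3"],
    "large_scale_training": ["NC A100 v4", "ND A100 v4", "NDm A100 v4"],
    "multi_node_hpc": ["ND A100 v4", "NDm A100 v4"],
    "graphics_heavy": ["NGads V620", "NV", "NVv3"],
    "moderate_graphics": ["NVv4"],
}


def candidate_series(workload: str) -> List[str]:
    """Return candidate VM series for a workload category."""
    if workload not in SERIES_BY_WORKLOAD:
        known = ", ".join(sorted(SERIES_BY_WORKLOAD))
        raise ValueError(f"unknown workload {workload!r}; expected one of: {known}")
    # Return a copy so callers cannot mutate the shared table.
    return list(SERIES_BY_WORKLOAD[workload])
```

From the shortlist, the final choice of instance size still depends on memory, network bandwidth, and price in your region.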

Use Azure Managed Disks for High I/O Performance

Azure Managed Disks, particularly the Premium SSD and Ultra Disk tiers, offer high I/O performance, which is crucial for GPU-intensive workloads that require rapid data access and storage. Using these disks helps keep GPUs supplied with data instead of waiting on storage. Managed Disks also provide reliability and scalability, supporting the dynamic needs of GPU-accelerated applications.

Leverage Azure Batch for Large-Scale GPU Workloads

Azure Batch simplifies processing of large-scale GPU workloads, automating the allocation and management of computational resources. This service enables efficient execution of parallel tasks, reducing the time and effort needed to process large datasets or perform complex simulations.
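
Batch runs work as independent tasks on a pool of nodes, so one early design decision is how to divide the input into tasks. The sketch below shows one simple way to do that in plain Python. It does not call the Azure Batch SDK, and the function name and file names are hypothetical.

```python
from typing import List, Sequence


def split_into_tasks(items: Sequence[str], items_per_task: int) -> List[List[str]]:
    """Group input items into fixed-size chunks, one chunk per task."""
    if items_per_task < 1:
        raise ValueError("items_per_task must be at least 1")
    return [
        list(items[start:start + items_per_task])
        for start in range(0, len(items), items_per_task)
    ]


# Hypothetical input files; each inner list would become one task's input.
frames = ["frame_001.exr", "frame_002.exr", "frame_003.exr",
          "frame_004.exr", "frame_005.exr"]
tasks = split_into_tasks(frames, items_per_task=2)
```

Smaller chunks spread work more evenly across GPU nodes, while larger chunks reduce per-task startup overhead.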

Use Azure Proximity Placement Groups to Colocate Your GPU Instances

Azure Proximity Placement Groups (PPGs) ensure low-latency communication between GPU instances by colocating them in the same datacenter. This is particularly useful for applications requiring rapid interaction between nodes, such as HPC and multiplayer gaming. Colocation reduces network latency, enhancing performance and responsiveness of GPU-accelerated applications.

Use Azure Reservations and Spot VMs for Relevant GPU Workloads

Azure reservations and Spot VMs offer cost-saving opportunities for GPU workloads. Reservations provide discounted rates in exchange for committed usage, which suits predictable workloads. Spot VMs run on unused Azure capacity at discounts of up to 90%, with an optional maximum price you are willing to pay, but Azure can evict them at short notice when it needs the capacity back. This makes them useful for flexible, interruptible workloads that do not require 24/7 availability.
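
The effect of these discounts is simple arithmetic. The Python sketch below compares pay-as-you-go cost with the best-case Spot cost. The hourly rate and usage hours are hypothetical placeholders, not Azure prices, and the 90% figure is the upper bound quoted above rather than a typical discount.

```python
def vm_cost(hourly_rate: float, hours: float, discount: float = 0.0) -> float:
    """Cost of running one VM for `hours`, with a fractional discount applied."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return hourly_rate * hours * (1.0 - discount)


# Hypothetical inputs for illustration only.
PAY_AS_YOU_GO_RATE = 3.00  # currency units per hour, not a real Azure price
HOURS_PER_MONTH = 200

pay_as_you_go = vm_cost(PAY_AS_YOU_GO_RATE, HOURS_PER_MONTH)
spot_best_case = vm_cost(PAY_AS_YOU_GO_RATE, HOURS_PER_MONTH, discount=0.90)
```

A realistic comparison would also account for work lost or repeated after a Spot eviction, which this sketch leaves out.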

Automated Deep Learning GPU Management With Run:ai

Run:ai automates resource management and workload orchestration for machine learning infrastructure. With Run:ai, you can automatically run as many compute-intensive experiments as needed.

Here are some of the capabilities you gain when using Run:ai:

  • Advanced visibility—create an efficient pipeline of resource sharing by pooling GPU compute resources.
  • No more bottlenecks—you can set up guaranteed quotas of GPU resources, to avoid bottlenecks and optimize billing.
  • A higher level of control—Run:ai enables you to dynamically change resource allocation, ensuring each job gets the resources it needs at any given time.

Run:ai accelerates deep learning on GPUs by helping data scientists optimize expensive compute resources and improve the quality of their models.

Learn more about the Run:ai GPU virtualization platform.