Taking HPC Clusters to the Next Level with GPUs

How Does HPC Leverage GPUs?

High-performance computing (HPC) is an umbrella term that refers to the aggregation of computing resources at large scale to deliver higher performance. This aggregation helps solve large, complex problems in many fields, including science, engineering, and business.

Graphics processing units (GPUs) offer a parallel architecture and high performance that speed up certain computing processes, especially those related to artificial intelligence (AI) and machine learning (ML) models. Combining GPUs with HPC clusters can increase the processing power of data centers. It enables data scientists and researchers to process massive amounts of data faster, more efficiently, and at lower cost, compared to computing nodes relying only on central processing units (CPUs).

In this article:

How HPC Can Help Build Better AI Applications

Typically, there are 16-64 nodes in an HPC system. Each node runs multiple CPUs, providing greater processing power than a traditional computing system. HPC nodes also have fast storage and memory resources, providing higher speed and greater capacity than a traditional system.

HPC systems often incorporate GPS to provide even higher processing power. They combine CPUs and GPUs in a hybrid computing system.

A hybrid computing HPC system is useful for AI projects in the following ways:

  • HPC technology is more cost-effective than traditional technologies for supercomputing use cases. HPC is available as a service in the cloud, allowing you to pay according to your actual usage and avoid upfront costs.
  • HPC provides parallelism and co-processing to accelerate computations and enable the fast processing of massive data sets for large-scale experiments.
  • GPUs enable more effective processing of AI algorithms (e.g., neural networks).
  • Greater memory and storage capacity enables the processing of large volumes of data, which increases the accuracy of machine learning models.
  • You can distribute workloads across multiple available resources to ensure optimal utilization.

Related content: Read our guide to HPC AI

How HPC Systems Use CPUs and GPUs

Here are the two main methods of processing information in HPC:

  • Serial processing—this method is used by CPUs. Typically, each CPU core handles one task at a time. CPUs are required to run operating systems and basic applications like office productivity and word processing.
  • Parallel processing—this method is performed by multiple CPUs or GPUs. GPUs can perform multiple arithmetic operations across a matrix of data simultaneously. This design enables GPUs to work on many data planes concurrently.

The majority of HPC systems enable parallel processing by aggregating multiple processors and memory modules across ultra-high-bandwidth interconnections. The use of CPUs and GPUs together is called heterogeneous computing, offered by some HPC systems.

HPC Applications: GPU vs CPU

HPC encompasses various technologies enabling large-scale parallel computing. HPC systems traditionally relied on CPUs, though today, HPC systems are increasingly incorporating GPUs. HPC servers often combine multiple GPUs and CPUs in a single system.

Hybrid HPC systems use a dual root configuration design that allows numerous processors to access memory efficiently. An optimized Peripheral Component Interconnect Express (PCIe) bus combines CPUs and GPUs. Dual root servers contain two main (root) processors, each with its own memory zone—these two processors share the PCIe bus, each receiving about half of the PCIe slots.

The PCIe architecture includes three types of high-speed data links:

  • Network link—a fast network interface, usually using InfiniBand.
  • Inter-root link—a fast connection allowing processors associated with different root processors to communicate with each other via the PCIe board. For example, this would be Ultra Path Interconnect (UPI) in an Intel system.
  • Inter-GPU link—an NVlink connection enabling rapid communication between GPUs at a rate of up to 300 GB per second. This connection lets programmers treat multiple GPUs as a single large GPU.

Dual root PCIe design patterns help optimize the use of CPU and GPU memory to support applications requiring a high degree of parallelism and sequentiality.

Best GPUs for HPC Workloads


The A100 is NVIDIA’s industrial-strength, data center GPU. It has 54 billion transistors and is considered the world’s largest processor in the 7nm range. NVIDIA’s Multi-Instance GPU (MIG) technology allows each A100 to be partitioned into up to seven virtual GPUs.

Basic specs:

  • GPU memory—40-80 GB
  • Memory bandwidth—up to 2,039 GB/s
  • Processing capacity—up to 624 teraFLOPS


The V100 is powered by the NVIDIA Volta architecture. It has 640 Tensor Cores, and was the first GPU to break the 100 teraFLOPS barrier.

Basic specs:

  • GPU memory—16/32 GB
  • Memory bandwidth—up to 1,134 GB/s
  • Processing capacity—up to 130 teraFLOPS

Tesla T4

The Tesla T4 was designed especially for HPC, deep learning, and data analytics workloads. It is a single-card slot with relatively low power consumption—only 70W. It has 320 Tensor Cores and 2560 shading units for graphical operations.

Basic specs:

  • GPU memory—16 GB
  • Memory bandwidth—300 GB/s
  • Processing capacity—8.1 teraFLOPS

Quadro RTX 8000

The RTX 8000 was optimized for operations like CAD and 3D modeling, but can also be useful in HPC deployments. It has 4,608 CUDA Cores, 576 Tensor Cores, and reaches 130.5 teraFLOPS for Tensor-based workloads.

Basic specs:

  • GPU memory—48 GB
  • Memory bandwidth—672 GB/s
  • Processing capacity—16.3 teraFLOPS

HPC Cluster Management with Run:AI

Run:AI automates resource management and orchestration for HPC clusters utilizing GPU hardware. With Run:AI, you can automatically run as many compute intensive workloads as needed.

Here are some of the capabilities you gain when using Run:AI:

  • Advanced visibility—create an efficient pipeline of resource sharing by pooling GPU compute resources.
  • No more bottlenecks—you can set up guaranteed quotas of GPU resources, to avoid bottlenecks and optimize billing.
  • A higher level of control—Run:AI enables you to dynamically change resource allocation, ensuring each job gets the resources it needs at any given time.

Run:AI simplifies HPC infrastructure, helping teams accelerate their productivity and conserve costs by running more jobs on fewer resources.

Learn more about the Run:AI GPU virtualization platform.