High-performance computing (HPC) is an umbrella term for the aggregation of computing resources at large scale to deliver higher performance than a single machine can provide. This aggregation helps solve large, complex problems in many fields, including science, engineering, and business.
Graphics processing units (GPUs) offer a parallel architecture and high performance that speed up certain computing processes, especially those related to artificial intelligence (AI) and machine learning (ML) models. Combining GPUs with HPC clusters increases the processing power of data centers, enabling data scientists and researchers to process massive amounts of data faster, more efficiently, and at lower cost than computing nodes that rely only on central processing units (CPUs).
An HPC system typically contains 16-64 nodes, each running multiple CPUs, which provides far greater processing power than a traditional computing system. HPC nodes also have fast storage and memory resources, offering higher speed and greater capacity than a conventional system.
HPC systems often incorporate GPUs to provide even higher processing power, combining CPUs and GPUs in a hybrid computing system.
A hybrid HPC system is useful for AI projects in the following ways:
Related content: Read our guide to HPC AI
Here are the two main methods of processing information in HPC:
The majority of HPC systems enable parallel processing by aggregating multiple processors and memory modules over ultra-high-bandwidth interconnects (see the sketch below). Some HPC systems also use CPUs and GPUs together, an approach known as heterogeneous computing.
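As a minimal illustration of the parallel model, the Python sketch below fans a numerical workload out across CPU cores using the standard-library multiprocessing module. On a real HPC cluster the same decomposition is scaled across many nodes with frameworks such as MPI; the simulate_chunk function and chunk sizes here are hypothetical stand-ins.

```python
from multiprocessing import Pool

def simulate_chunk(chunk_id: int) -> float:
    """Hypothetical stand-in for one independent slice of a large computation."""
    return float(sum(i * i for i in range(chunk_id * 100_000, (chunk_id + 1) * 100_000)))

if __name__ == "__main__":
    # Split the problem into independent chunks and process them in parallel,
    # one worker per CPU core: the essence of parallel processing in HPC.
    with Pool() as pool:
        partial_results = pool.map(simulate_chunk, range(8))
    print(sum(partial_results))
```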
HPC encompasses various technologies that enable large-scale parallel computing. HPC systems traditionally relied on CPUs, but today they increasingly incorporate GPUs, and HPC servers often combine multiple GPUs and CPUs in a single system.
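As a sketch of this heterogeneous model, the example below keeps data preparation on the CPU with NumPy and offloads a dense matrix multiply to the GPU. It assumes a CUDA-capable GPU and the CuPy library, which is one of several GPU array libraries that could play this role.

```python
import numpy as np
import cupy as cp  # GPU array library; assumes a CUDA-capable GPU with CuPy installed

# CPU side: prepare the input data in host memory with NumPy.
a = np.random.rand(4096, 4096).astype(np.float32)
b = np.random.rand(4096, 4096).astype(np.float32)

# GPU side: offload the massively parallel part, a dense matrix multiply.
a_gpu, b_gpu = cp.asarray(a), cp.asarray(b)
c_gpu = a_gpu @ b_gpu

# Copy the result back to host memory for the CPU-side remainder of the pipeline.
c = cp.asnumpy(c_gpu)
print(c.shape)
```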
Hybrid HPC systems use a dual root configuration that allows numerous processors to access memory efficiently. An optimized Peripheral Component Interconnect Express (PCIe) bus connects the CPUs and GPUs. Dual root servers contain two main (root) processors, each with its own memory zone; the two processors share the PCIe bus, each receiving about half of the PCIe slots.
The PCIe architecture includes three types of high-speed data links:
Dual root PCIe design patterns help optimize the use of CPU and GPU memory, supporting applications that require a high degree of both parallel and sequential processing.
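To see how a particular server's GPUs map onto its root processors, you can inspect the PCIe/NUMA topology reported by the NVIDIA driver. A minimal Python sketch, assuming nvidia-smi is on the PATH:

```python
import subprocess

# Print the PCIe/NUMA topology matrix reported by the NVIDIA driver.
# On a dual root server, the CPU Affinity / NUMA Affinity columns show
# which root processor each GPU is attached to.
result = subprocess.run(
    ["nvidia-smi", "topo", "-m"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```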
The A100 is NVIDIA's industrial-strength data center GPU. It has 54 billion transistors and is considered the world's largest 7nm processor. NVIDIA's Multi-Instance GPU (MIG) technology allows each A100 to be partitioned into up to seven independent GPU instances.
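MIG partitioning is typically driven through nvidia-smi. The sketch below (wrapped in Python for consistency with the other examples) enables MIG mode on GPU 0 and creates seven small GPU instances. Profile IDs vary by A100 model, so the 1g.5gb profile ID used here is an assumption to verify with nvidia-smi mig -lgip; the commands also require administrator privileges.

```python
import subprocess

def run(cmd):
    """Run an nvidia-smi command and echo its output (requires admin privileges)."""
    print(subprocess.run(cmd, capture_output=True, text=True).stdout)

# Enable MIG mode on GPU 0 (a GPU reset may be needed for it to take effect).
run(["nvidia-smi", "-i", "0", "-mig", "1"])

# List the GPU instance profiles this A100 supports, to confirm profile IDs.
run(["nvidia-smi", "mig", "-lgip"])

# Create seven GPU instances plus their compute instances.
# Profile ID 19 corresponds to 1g.5gb on an A100-40GB; verify on your hardware.
run(["nvidia-smi", "mig", "-cgi", "19,19,19,19,19,19,19", "-C"])
```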
Basic specs:
The V100 is powered by the NVIDIA Volta architecture. It has 640 Tensor Cores and was the first GPU to break the 100 teraFLOPS barrier for deep learning performance.
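Tensor Cores reach their headline throughput on reduced-precision matrix math. A minimal sketch, assuming PyTorch with CUDA support is installed, that lets eligible operations run in FP16 via automatic mixed precision:

```python
import torch

# Two large matrices resident on the GPU.
a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

# autocast runs eligible ops in FP16, which Tensor Core GPUs such as the
# V100 execute on their Tensor Cores for much higher throughput.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    c = a @ b

print(c.dtype)  # torch.float16
```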
Basic specs:
The Tesla T4 was designed especially for HPC, deep learning, and data analytics workloads. It is a single-slot card with relatively low power consumption of only 70W. It has 320 Tensor Cores and 2,560 shading units (CUDA cores) for graphical operations.
Basic specs:
The RTX 8000 was optimized for operations like CAD and 3D modeling, but can also be useful in HPC deployments. It has 4,608 CUDA Cores, 576 Tensor Cores, and reaches 130.5 teraFLOPS for Tensor-based workloads.
Basic specs:
Run:AI automates resource management and orchestration for HPC clusters that utilize GPU hardware. With Run:AI, you can automatically run as many compute-intensive workloads as needed.
Here are some of the capabilities you gain when using Run:AI:
Run:AI simplifies HPC infrastructure, helping teams accelerate productivity and reduce costs by running more jobs on fewer resources.
Learn more about the Run:AI GPU virtualization platform.