Question 1

What Is a High Performance Computing (HPC) Cluster?

Accepted Answer

A High Performance Computing (HPC) cluster is a collection of multiple servers, known as nodes, which enable the execution of applications. There are usually several various types of nodes in an HPC cluster. HPC software typically runs across hundreds or thousands of nodes to read and write data. Seamless communication is essential to enable communication between the storage system and large numbers of nodes.

HPC is rapidly evolving. We’ll discuss how HPC clusters are changing in response to cloud computing, the growing use of AI applications and models, the availability of powerful data center Graphical Processing Units (GPUs), and the overwhelming adoption of container infrastructure.

Question 2

HPC Cluster Architecture

Accepted Answer

An HPC cluster architecture typically includes the following components:

Underlying software—an HPC cluster requires an underlying infrastructure to execute applications, as well as software to control this infrastructure. The software used to run an HPC cluster must be able to handle massive volumes of input/output traffic and multiple, simultaneous tasks such as reading and writing data to the storage system from a large collection of CPUs.
Network switch—an HPC cluster requires high bandwidth and low latency. It is therefore common to use a network switch like Infiniband or a high-performing Ethernet switch.
Head/login node—this node validates users and allows them to set up software on compute nodes.
Compute nodes—these nodes perform numerical computations and typically have the highest possible clock rates for the number of cores. These nodes may have minimal persistent storage but high dynamic random-access memory (DRAM).
Accelerator nodes—these nodes might contain an accelerator (or multiple accelerators), although some applications cannot take advantage of them. In some cases, every node contains an accelerator (for instance, a smaller HPC cluster with a specific purpose in mind).
Storage system—HPC clusters can use storage nodes or a storage system such as a parallel file system (PFS), which enables simultaneous communication between multiple nodes and storage drives. Powerful storage is important for allowing the compute nodes to operate smoothly and with low latency.

Question 3

HPC in the Cloud

Accepted Answer

HPC is usually enabled by a purpose-built cluster for batch processing scientific workloads. You can combine on-premise clusters with cloud computing to handle usage spikes. The cloud is also suited to parallel jobs.

You can implement HPC using a public or private cloud. The major cloud providers allow you to build your own HPC services or leverage bundled services. Certain providers support ready-to-use HPC implementations in a hybrid environment.

Question 4

How HPC Clusters Drive AI Applications

Accepted Answer

Typically, an HPC system contains between 16 and 64 nodes, with at least two CPUs per node. The multiple CPUs ensure increased processing power compared to traditional, single-device systems with only one CPU. The nodes in an HPC system also provide additional storage and memory resources, increasing both speed and storage capacity.

Some HPC systems use graphics processing units (GPUs) to boost processing power even further. You can use GPUs as co-processors, in parallel with CPUs, in a model known as hybrid computing.

HPC can provide the following benefits for developing artificial intelligence (AI) applications:

Purpose-built processors—GPUs are specialized for efficient AI processing, including algorithms like neural networks.
Processing speed—co-processing and parallel processing enable fast computation, allowing you to process data and run AI experiments within shorter timeframes.
Volume of data—the large number of storage and memory resources let you process large volumes of data and run longer analyses, improving the accuracy of your AI models.
Efficient use of resources—by distributing workloads across the resources available, you can maximize the efficiency of your resource usage.
Cost savings—an HPC system is a cost-effective way to access supercomputing capabilities. Cloud-based HPC lets you reduce upfront costs with a pay-per-use pricing model for resources.

HPC Clusters

What’s Next For HPC Clusters?

What Is a High Performance Computing (HPC) Cluster?

HPC Cluster Architecture

HPC in the Cloud

How HPC Clusters Drive AI Applications

Best Practices for Modern HPC Workloads

Design Your Cluster for the Workload

Containers and HPC

Leverage GPU Virtualization

HPC GPU Virtualization with Run:ai

Learn More About HPC Clusters