Best GPU Instances and How to Optimize Your Costs

How Can You Get Access to GPUs on AWS?

Amazon Web Services (AWS) provides on-demand access to graphics processing units (GPUs) as part of its Elastic Compute Cloud (EC2) service. GPUs are specialized hardware accelerators designed to speed up graphics and compute-intensive applications, including AI, machine learning, and high-performance computing (HPC).

Amazon EC2 GPU instances are equipped with NVIDIA and AMD graphics processing units, which deliver significant performance improvements over CPU-only instances. By offloading tasks to GPUs, users can achieve faster results and more efficient computation. However, GPU instances come at a premium cost compared to regular Amazon EC2 instances.
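Once a GPU instance is running, a quick sanity check is to confirm that the operating system can actually see the GPU. The following is a minimal sketch, assuming the instance has the NVIDIA driver and its nvidia-smi tool installed. The function name list_nvidia_gpus is our own, and the check does not detect AMD GPUs.

```python
import shutil
import subprocess


def list_nvidia_gpus():
    """Return one 'name, memory' string per visible NVIDIA GPU, or [] if none."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver tools not installed, or not an NVIDIA GPU instance
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        )
    except (subprocess.SubprocessError, OSError):
        return []  # tool present but failed, e.g. driver not loaded
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_nvidia_gpus()
    print(gpus if gpus else "No NVIDIA GPU detected")
```

On a CPU-only instance the function simply returns an empty list, so it can double as a guard before starting GPU work.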

This is part of our series of articles about cloud deep learning.


Best AWS GPU Instances for Deep Learning

Deep learning architectures are computationally intensive in both the training and inference stages. Here are some examples of AWS GPU instances that Amazon recommends for deep learning projects and that are also suitable for other heavy-duty workloads.

For each type of instance, we provide Amazon’s table of instance sizes and specs. Pricing shown is for the US East region, and correct as of the time of this writing (consult the relevant Amazon product pages for latest pricing).

Amazon EC2 P3 Instances

Amazon EC2 P3 instances are optimized for machine learning and high-performance computing workloads. Equipped with NVIDIA Tesla V100 GPUs, they deliver powerful performance for deep learning training and inference tasks.

P3 instances are suitable for applications requiring massive parallel compute power. They reduce the time required for training complex models, enabling faster iterations and quicker deployment of machine learning applications.

Credit for this and the following images: AWS

Amazon EC2 P4 Instances

Amazon EC2 P4 instances feature NVIDIA A100 Tensor Core GPUs, providing acceleration for machine learning and HPC applications. They offer superior performance for the most demanding computational tasks, including deep learning model training at scale. Their high performance and scalability make them suitable for research and complex model training.

Amazon EC2 G3 Instances

Amazon EC2 G3 instances are designed for graphics-intensive applications, including 3D visualization and video encoding. Powered by NVIDIA Tesla M60 GPUs, they deliver high performance for graphics workloads, making them suitable for remote graphics workstations and rendering applications.

Amazon EC2 G4 Instances

Amazon EC2 G4 instances are optimized for machine learning inference and graphics-intensive applications. Equipped with NVIDIA T4 Tensor Core GPUs, they provide balanced performance for both compute and graphics workloads. G4 instances support real-time ray tracing and AI-driven applications, which makes them suitable for developers and businesses that require high-performance graphics and machine learning capabilities.

Note: The description above covers G4 instances with NVIDIA GPUs (G4dn). Amazon also offers similar instances with AMD GPUs, called G4ad.

Amazon EC2 G5 Instances

Amazon EC2 G5 instances are the newest GPU instance type, featuring NVIDIA A10G Tensor Core GPUs. They deliver high performance for machine learning inference, graphics rendering, and real-time gaming applications. G5 instances provide a versatile platform for a broad range of compute and graphics-intensive workloads.

G5 instances offer significant performance improvements over previous generation instances, enabling faster processing and more efficient resource utilization. They support modern applications demanding high throughput and responsiveness.

Amazon EC2 G5g Instances

Amazon EC2 G5g instances are powered by AWS Graviton2 processors and NVIDIA T4G GPUs, optimized for cost-effective machine learning inference and graphics workloads. They offer a unique combination of processing power and energy efficiency, suitable for a wide range of applications.
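The instance families described above can be summarized as a small lookup table. This is only a sketch for illustration: the GPU models come from the descriptions in this article, the workload tags are our own paraphrase of those descriptions, and the helper name families_for is hypothetical.

```python
# GPU model per EC2 instance family, as listed in this article.
GPU_BY_FAMILY = {
    "P3": "NVIDIA Tesla V100",
    "P4": "NVIDIA A100 Tensor Core",
    "G3": "NVIDIA Tesla M60",
    "G4": "NVIDIA T4 Tensor Core",
    "G5": "NVIDIA A10G Tensor Core",
    "G5g": "NVIDIA T4G",
}

# Primary workloads per family, paraphrased from the descriptions above.
WORKLOADS_BY_FAMILY = {
    "P3": {"training", "inference", "hpc"},
    "P4": {"training", "hpc"},
    "G3": {"graphics"},
    "G4": {"inference", "graphics"},
    "G5": {"inference", "graphics"},
    "G5g": {"inference", "graphics"},
}


def families_for(workload):
    """Return the instance families whose description mentions `workload`."""
    return sorted(
        family
        for family, workloads in WORKLOADS_BY_FAMILY.items()
        if workload in workloads
    )


if __name__ == "__main__":
    for family in families_for("training"):
        print(family, "->", GPU_BY_FAMILY[family])
```

A table like this is a starting point for shortlisting only. Instance sizes, GPU counts, and prices still need to be checked on the Amazon product pages.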

Learn more in our detailed guide to AWS deep learning

Cost Optimization Tips When Using AWS GPU Instances

When operating at large scale or over prolonged periods, AWS GPUs can represent a significant expense. Here are several measures you can take to optimize the cost of GPUs on AWS.

Utilizing Spot Instances

AWS Spot Instances let users run workloads on unused EC2 capacity at a discount of up to 90% compared to On-Demand prices, which can significantly reduce costs for GPU-intensive tasks. They offer the same performance as On-Demand instances, but AWS can reclaim them with only 2 minutes’ notice, so they are best suited to flexible workloads that can tolerate interruptions.
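As a rough illustration, the sketch below builds the arguments for a Spot request and estimates the cost of a job. It assumes you launch instances with the boto3 EC2 client's run_instances call. The helper names, the hourly rate, the discount, and the AMI ID are hypothetical placeholders, not real prices or images.

```python
def spot_launch_params(instance_type, ami_id, max_price=None):
    """Build keyword arguments for the boto3 EC2 client's run_instances call."""
    spot_options = {
        "SpotInstanceType": "one-time",
        "InstanceInterruptionBehavior": "terminate",
    }
    if max_price is not None:
        # Optional price cap in USD per hour; the API expects a string.
        spot_options["MaxPrice"] = str(max_price)
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "InstanceMarketOptions": {
            "MarketType": "spot",
            "SpotOptions": spot_options,
        },
    }


def estimated_cost(hours, on_demand_rate, discount):
    """Cost in USD of `hours` of runtime at a fractional discount (0.0 to 1.0)."""
    if not 0 <= discount <= 1:
        raise ValueError("discount must be between 0 and 1")
    return hours * on_demand_rate * (1 - discount)


if __name__ == "__main__":
    # Hypothetical numbers: 100 hours at $3.00/hour with a 50% Spot discount.
    print(estimated_cost(100, 3.0, 0.5))  # 150.0
    params = spot_launch_params("g4dn.xlarge", "ami-0123456789abcdef0")
    print(params["InstanceMarketOptions"])
    # To actually launch (requires AWS credentials and a real AMI ID):
    #   import boto3
    #   boto3.client("ec2", region_name="us-east-1").run_instances(**params)
```

The actual Spot discount varies by instance type, region, and time, so treat any estimate as a planning figure and not a quote.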

These instances can be used for non-time-sensitive tasks such as batch processing, training machine learning models, or rendering. This can maximize cost efficiency while maintaining high performance.
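Workloads like these tolerate interruptions best when they save progress regularly and resume from the last checkpoint. Below is a minimal sketch of that pattern in plain Python. The function names are our own, the "work" is a placeholder counter, and the stop_after parameter exists only to simulate an interruption. In a real job the checkpoint would hold model state and would need to live on storage that outlives the instance.

```python
import json
import os
import tempfile


def load_step(path):
    """Return the last completed step stored at `path`, or 0 if none exists."""
    try:
        with open(path) as f:
            return json.load(f)["step"]
    except FileNotFoundError:
        return 0


def save_step(path, step):
    """Write the checkpoint atomically so an interruption cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"step": step}, f)
    os.replace(tmp_path, path)  # atomic rename over the old checkpoint


def run(path, total_steps, checkpoint_every=10, stop_after=None):
    """Run or resume a job; `stop_after` simulates a Spot interruption."""
    step = load_step(path)
    while step < total_steps:
        if stop_after is not None and step >= stop_after:
            return step  # simulated interruption: exit without saving
        step += 1  # placeholder for one unit of real work
        if step % checkpoint_every == 0 or step == total_steps:
            save_step(path, step)
    return step


if __name__ == "__main__":
    checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    run(checkpoint, total_steps=100, stop_after=35)  # interrupted mid-run
    print("resuming from step", load_step(checkpoint))
    print("finished at step", run(checkpoint, total_steps=100))
```

In the simulated run, the work done since the last checkpoint is lost and repeated after the restart, so the checkpoint interval is a trade-off between save overhead and repeated work.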

Separate Your Development from GPU-Intensive Tasks

Separating development environments from GPU-intensive tasks helps optimize costs by ensuring that high-cost GPU resources are used only when necessary. Development and testing can be done on cost-effective instances, reserving GPU instances for final testing, training, or production runs.

This approach minimizes costs by aligning resource usage with the actual computing needs of different stages in the development lifecycle, avoiding unnecessary expenditure on high-performance resources during early stages.
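A short worked example shows the size of the effect. The hourly rates and hour counts below are hypothetical and chosen only to keep the arithmetic simple. They are not AWS prices.

```python
def project_cost(dev_hours, gpu_hours, cpu_rate, gpu_rate, separate=True):
    """Total cost in USD for a project with development and GPU phases.

    If `separate` is False, development also runs on the GPU instance,
    so every hour is billed at the GPU rate.
    """
    if separate:
        return dev_hours * cpu_rate + gpu_hours * gpu_rate
    return (dev_hours + gpu_hours) * gpu_rate


if __name__ == "__main__":
    # Hypothetical: 160 hours of development, 40 hours of training,
    # $0.25/hour for a CPU instance and $3.00/hour for a GPU instance.
    split = project_cost(160, 40, cpu_rate=0.25, gpu_rate=3.0, separate=True)
    combined = project_cost(160, 40, cpu_rate=0.25, gpu_rate=3.0, separate=False)
    print(split, combined)  # 160.0 600.0
```

Under these assumed numbers, moving development off the GPU instance cuts the bill from $600 to $160. The real saving depends on how much of the total time is actually GPU-bound.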

Add GPU Acceleration to CPU Instances for Inference Tasks

For applications that primarily require CPU resources but occasionally need GPU acceleration, dynamically attaching GPUs to existing CPU instances for short periods can provide a cost-effective solution. This strategy allows for efficient resource utilization, employing GPUs only when their computational power is needed.

By only using GPUs for inference tasks or when specific computational needs arise, users can reduce costs compared to maintaining dedicated GPU instances.

Utilize Tensor Cores

Tensor Cores in NVIDIA GPUs provide accelerated performance for deep learning applications. By optimizing workloads to use Tensor Cores, users can achieve faster computation times and reduced costs. Tensor Cores offer efficient processing for specific types of calculations common in machine learning, enabling more efficient use of GPU resources.
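One common way to help workloads map onto Tensor Cores is to keep matrix dimensions (batch size, layer widths, vocabulary size) at multiples of a fixed value. The sketch below pads dimensions up to the next multiple of 8. That value is an assumption based on NVIDIA's general performance guidance for 16-bit floating point and is not something this article specifies. The right value depends on the GPU, data type, and library version.

```python
def round_up(value, multiple=8):
    """Round a positive dimension up to the nearest multiple of `multiple`."""
    if value <= 0 or multiple <= 0:
        raise ValueError("value and multiple must be positive")
    return ((value + multiple - 1) // multiple) * multiple


def padded_shape(rows, cols, multiple=8):
    """Return a (rows, cols) matrix shape padded to Tensor Core friendly sizes."""
    return round_up(rows, multiple), round_up(cols, multiple)


if __name__ == "__main__":
    # A hypothetical embedding table with 50257 rows and 1000 columns.
    print(padded_shape(50257, 1000))  # (50264, 1000)
```

Padding adds a small amount of unused memory and compute, so it is worth benchmarking whether it actually improves throughput for a given model.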

For G4 and G5 Instances, Consider Using Reduced Precision Inference to Boost Throughput

Reduced precision inference involves using lower precision data types for computations, which can significantly increase throughput and reduce costs on G4 and G5 instances. This technique takes advantage of the hardware acceleration available in these instances for specific precision levels, optimizing resource utilization.

By adapting models to utilize reduced precision inference, users can achieve faster processing and lower costs, without substantially impacting the accuracy of the results. This approach is especially beneficial for large-scale machine learning inference tasks where cost and speed are critical considerations.
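To make the idea concrete, here is a minimal sketch of one reduced-precision technique, symmetric 8-bit integer quantization, in plain Python. It is an illustration of the principle only. Production inference would normally rely on the quantization or mixed-precision support in a deep learning framework, and the function names here are our own.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: floats -> (int8 codes, scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.03, 1.0]  # hypothetical model weights
    codes, scale = quantize_int8(weights)
    restored = dequantize(codes, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(codes, scale, worst)
```

Each value is stored in 8 bits in place of 32, and the rounding error per value is at most half of the scale. Whether that error is acceptable has to be verified by measuring model accuracy after quantization.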

Automated Deep Learning GPU Management With Run:ai

Run:ai automates resource management and workload orchestration for machine learning infrastructure. With Run:ai, you can automatically run as many compute-intensive experiments as needed.

Here are some of the capabilities you gain when using Run:ai:

  • Advanced visibility—create an efficient pipeline of resource sharing by pooling GPU compute resources.
  • No more bottlenecks—you can set up guaranteed quotas of GPU resources, to avoid bottlenecks and optimize billing.
  • A higher level of control—Run:ai enables you to dynamically change resource allocation, ensuring each job gets the resources it needs at any given time.

Run:ai accelerates deep learning on GPUs by helping data scientists optimize expensive compute resources and improve the quality of their models.

Learn more about the Run:ai GPU virtualization platform.