Question 1

Understanding Kubernetes Architecture for Data Science Workloads

Accepted Answer

This article explains how Kubernetes came to be used inside many companies as a platform for containerized AI workloads. It also covers some of the things to consider when implementing a Kubernetes architecture to orchestrate AI workloads.

Question 2

Kubernetes Overview

Accepted Answer

Originally developed inside Google, Kubernetes has been an open-source project since June 2014 and has been managed by the Cloud Native Computing Foundation (CNCF) since Google and the Linux Foundation partnered to found the CNCF in July 2015. Kubernetes is an orchestration system that automates the processes involved in running thousands of containers in production. It reduces the infrastructure complexity associated with deploying, scaling, and managing containerized applications.

There is a strong correlation between the growth in containers and microservice architectures and the adoption of Kubernetes. According to a recent Gartner report, “By 2023, more than 70% of global organizations will be running more than two containerized applications in production, up from less than 20% in 2019.” And Kubernetes usage will continue to grow as companies deepen their commitment to containerization. According to a recent survey of 250 IT professionals conducted by Dimensional Insight, “Well over half (59%) are running Kubernetes in a production environment, with one-third (33%) operating 26 clusters or more and one-fifth (20%) running more than 50 clusters.”

The Kubernetes website is full of case studies of companies from a wide range of verticals that have embraced Kubernetes to address business-critical use cases. Examples include Booking.com, which leveraged Kubernetes to dramatically accelerate the development and deployment of new services; Capital One, which uses Kubernetes as an “operating system” to multiply productivity while reducing costs; and the New York Times, which maximizes its cloud-native capabilities with Kubernetes-as-a-service on the Google Cloud Platform.

This guide looks specifically at how Kubernetes can be used to support data science workloads in general and machine/deep learning in particular. Because data science workloads require specialized tooling, using Kubernetes for deep learning poses some challenges, which this guide identifies.

Question 3

How Does Kubernetes Address Data Science Challenges?

Accepted Answer

Containers and the Kubernetes ecosystem have been embraced by developers for their ability to abstract modern distributed applications from the infrastructure layer. Declarative deployments, real-time continuous monitoring, and dynamic service routing deliver repeatability, reproducibility, portability, and flexibility across diverse environments and libraries.
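To make the idea of a declarative deployment concrete, here is a toy sketch in Python. It is not Kubernetes code, and the function name `reconcile` and the workload names are hypothetical. It only illustrates the general pattern: the user states the desired state, and a control loop compares it with the observed state and works out the actions that close the gap.

```python
def reconcile(desired: dict[str, int], observed: dict[str, int]) -> list[str]:
    """Return the actions needed to move observed replica counts to the desired ones.

    Both arguments map a workload name to a replica count. A workload missing
    from a mapping is treated as having zero replicas.
    """
    actions = []
    # Visit every workload mentioned on either side, in a stable order.
    for name in sorted(desired.keys() | observed.keys()):
        want = desired.get(name, 0)
        have = observed.get(name, 0)
        if want > have:
            actions.append(f"start {want - have} x {name}")
        elif want < have:
            actions.append(f"stop {have - want} x {name}")
    return actions


if __name__ == "__main__":
    desired = {"trainer": 3, "api": 2}
    observed = {"trainer": 1, "api": 2, "old-job": 1}
    for action in reconcile(desired, observed):
        print(action)
```

Because the same desired state always yields the same target, running the loop again after the actions complete produces no further actions, which is one reason declarative descriptions tend to be repeatable.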

These same Kubernetes features address many of the most fundamental requirements of data science workloads:

Reproducibility across a complex pipeline: Machine/deep learning pipelines consist of multiple stages, from data processing through feature extraction to training, testing, and deploying models. With Kubernetes, research and operations teams can confidently share a combined infrastructure-agnostic pipeline.
Repeatability: Machine/deep learning is a highly iterative process. With Kubernetes, data scientists can repeat experiments with full control over all environmental variables, including data sets, ML libraries, and infrastructure resources.
Portability across development, staging, and production environments: When run with Kubernetes, ML-based containerized applications can be seamlessly and dynamically ported across diverse environments.
Flexibility: Kubernetes provides the messaging, deployment, and orchestration fabric that is essential for packaging ML-based applications as highly modular microservices capable of mixing and matching different languages, libraries, databases, and infrastructures.
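
As one possible illustration of the repeatability and portability points above, the following Python sketch builds a Kubernetes Job manifest as a plain dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The helper names, the image reference, the dataset URI, and the environment variable names are all hypothetical. The `nvidia.com/gpu` resource assumes a cluster running the NVIDIA device plugin. This is a sketch under those assumptions, not a recommended production template.

```python
import json


def is_pinned(image: str) -> bool:
    """Return True if the image reference has a digest or a tag other than 'latest'."""
    last = image.rsplit("/", 1)[-1]  # drop registry host (and its port) and path
    if "@sha256:" in last:
        return True
    _, sep, tag = last.partition(":")
    return bool(sep) and tag not in ("", "latest")


def training_job_manifest(name: str, image: str, dataset_uri: str,
                          gpus: int = 1, seed: int = 42) -> dict:
    """Build a batch/v1 Job that fixes the image, data set, seed, and GPU count."""
    if not is_pinned(image):
        raise ValueError(f"image {image!r} must use a fixed tag or digest")
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "labels": {"app": "ml-training"}},
        "spec": {
            "backoffLimit": 0,  # do not retry silently; a rerun should be explicit
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "trainer",
                        "image": image,
                        # Hypothetical variables read by the training code.
                        "env": [
                            {"name": "DATASET_URI", "value": dataset_uri},
                            {"name": "RANDOM_SEED", "value": str(seed)},
                        ],
                        "resources": {"limits": {"nvidia.com/gpu": gpus}},
                    }],
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = training_job_manifest(
        name="resnet-exp-001",
        image="registry.example.com/team/trainer:1.4.2",
        dataset_uri="s3://example-bucket/datasets/v7",
    )
    print(json.dumps(manifest, indent=2))
```

Keeping the image tag, data set location, seed, and resource request in one versionable description means the same experiment can be resubmitted unchanged to a development, staging, or production cluster.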

Question 4

Considerations for Successful Kubernetes Architecture for AI Workloads

Accepted Answer

With all of the advantages described above, it is not surprising that Kubernetes has become the de facto container orchestration standard for data science teams. This section provides best practices for optimizing how data science workloads are run on Kubernetes.
