Kubeflow Pipelines is a platform designed to help you build and deploy container-based machine learning (ML) workflows that are portable and scalable. Each pipeline represents an ML workflow and includes the specifications of all inputs needed to run the pipeline, as well as the outputs of all components.
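To make this concrete, here is a minimal sketch of a pipeline built with the KFP v1 Python SDK. The component and pipeline names are illustrative, not part of any Kubeflow sample:

```python
# A minimal sketch of a Kubeflow pipeline using the KFP v1 SDK.
# Names ('add', 'demo-pipeline') are illustrative, not from the article.
import kfp
from kfp import dsl
from kfp.components import create_component_from_func

def add(a: float, b: float) -> float:
    """A trivial component: its parameters are the inputs, its return value the output."""
    return a + b

# Package the Python function as a containerized pipeline component.
add_op = create_component_from_func(add)

@dsl.pipeline(name='demo-pipeline', description='Adds two numbers.')
def demo_pipeline(a: float = 1.0, b: float = 2.0):
    # Each task runs in its own container; outputs flow to downstream inputs.
    first = add_op(a, b)
    add_op(first.output, b)
```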
This is part of our series of articles about Kubernetes architecture.
In this article, you will learn:
- Common use cases for Kubeflow Pipelines
- How the Kubeflow Pipelines platform is structured
- How to get started and run sample pipelines, including a basic parallel-execution sample and an XGBoost sample
Here are common use cases for implementing Kubeflow Pipelines.
Trained models are usually serialized into a single file that sits on a server or laptop. Next, you copy the file to a machine hosting the application and load the model into a server process that accepts network requests for model inference.
This process becomes complex when multiple applications require inference output from a single model, especially when you need to deploy updates and initiate rollbacks.
Kubeflow lets you run updates and rollbacks across multiple applications or servers. You can update your model in one place and ensure that all client applications quickly receive the update once the update transaction is complete.
ML algorithms need substantial compute power to run through linear algebra operations quickly. Graphics processing units (GPUs) meet this demand, but are not usually found in regular laptops and desktops.
To gain access to GPUs, data scientists often combine Jupyter Notebooks and Python code with dependency management through container platforms like Docker. However, this process often creates security issues, because data ends up distributed across platforms and services that the security team has not authorized.
Kubeflow Pipelines, by contrast, lets data scientists build their workflow into a container and execute it in an environment authorized by the security team.
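As a sketch of what this looks like in practice, the KFP v1 SDK can run a workflow step as a container pulled from a registry the security team controls. The image name, command, and arguments below are hypothetical:

```python
# Sketch: running a workflow step as a container from a registry your
# security team has vetted. Image, command, and arguments are hypothetical.
from kfp import dsl

@dsl.pipeline(name='containerized-training')
def containerized_training():
    dsl.ContainerOp(
        name='train',
        image='gcr.io/my-approved-project/trainer:1.0',  # vetted image (placeholder)
        command=['python', '/app/train.py'],
        arguments=['--epochs', '10'],
    )
```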
The following diagram illustrates how the Kubeflow Pipelines platform is structured.
Image Source: Kubeflow
The Kubeflow architecture is composed of the following main components and elements:
- Python SDK: lets you create components and specify pipelines using the Kubeflow Pipelines domain-specific language (DSL).
- DSL compiler: transforms your pipeline's Python code into a static configuration (YAML).
- Pipeline Service: creates a pipeline run from the static configuration.
- Kubernetes resources: the Pipeline Service calls the Kubernetes API server to create the custom resources needed to run the pipeline.
- Orchestration controllers: execute the containers needed to complete the pipeline, for example the Argo Workflow controller.
- Artifact storage: pods store metadata (experiments, jobs, runs, and metrics) in a MySQL database, and artifacts (pipeline packages, views, and large-scale metrics) in an artifact store such as MinIO or Cloud Storage.
- Persistence agent and ML metadata service: record the containers that executed along with their inputs and outputs.
- Pipeline web server: serves the UI, gathering data from the other services to display pipelines, runs, and artifacts.
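The following sketch traces that flow with the SDK, assuming the demo_pipeline function defined earlier: the DSL compiler produces the static configuration, and the client submits work to the Pipeline Service:

```python
import kfp

# The DSL compiler turns the Python pipeline into a static workflow spec (YAML).
kfp.compiler.Compiler().compile(demo_pipeline, 'demo_pipeline.yaml')

# The client talks to the Pipeline Service, which asks the Kubernetes API
# server to create the resources that the orchestration controller executes.
client = kfp.Client()
client.create_run_from_pipeline_func(demo_pipeline,
                                     arguments={'a': 3.0, 'b': 4.0})
```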
This quick walkthrough can help you get started with Kubeflow Pipelines, using a sample that ships with the platform.
Step 1:
Deploy Kubeflow on GCP.
Step 2:
Once Kubeflow is running, you need to access the Kubeflow UI.
To access the UI, use this URL:
https://<KF_NAME>.endpoints.<project-id>.cloud.goog/
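You can also reach the same deployment from the KFP SDK. The sketch below assumes an IAP-secured GCP endpoint; the host placeholders and OAuth client ID must be replaced with your own values:

```python
# Sketch: connecting the KFP SDK to a Kubeflow deployment on GCP.
# Placeholders (<KF_NAME>, <project-id>, <oauth-client-id>) are assumptions
# you must fill in from your own deployment.
import kfp

client = kfp.Client(
    host='https://<KF_NAME>.endpoints.<project-id>.cloud.goog/pipeline',
    client_id='<oauth-client-id>',  # OAuth client used for IAP-secured endpoints
)
print(client.list_experiments())  # quick connectivity check
```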
Once you access the UI, you should see this dashboard:
Image Source: Kubeflow
Step 3:
Choose Pipelines.
Image Source: Kubeflow
In the Pipelines UI, you can find several samples to use as a baseline for quickly launching pipelines. The walkthrough below explains how to run a basic sample with Python operations, but without an ML workload.
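For orientation, here is a minimal sketch of what a parallel-execution pipeline looks like in the KFP v1 DSL; the task names and images are illustrative, not taken from the sample itself:

```python
# Sketch of parallel execution: two tasks with no dependency between them
# run in parallel; a third waits on both. Names and images are illustrative.
from kfp import dsl

def echo_op(text: str):
    return dsl.ContainerOp(
        name='echo',
        image='alpine:3.9',
        command=['sh', '-c'],
        arguments=['echo "%s"' % text],
    )

@dsl.pipeline(name='parallel-demo')
def parallel_demo():
    task_a = echo_op('task A')
    task_b = echo_op('task B')   # no dependency on task_a, so it runs in parallel
    done = echo_op('both finished')
    done.after(task_a, task_b)   # runs only after both parallel tasks complete
```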
Step 1:
In the pipeline UI, locate a sample and choose its name. For example, [Sample] Basic - Parallel Execution.
Step 2:
Choose the Create experiment option.
Step 3:
You will be shown a series of prompts. Follow the instructions to create an experiment. When you complete the process, you can create a run.
Note that each sample provides default values for all required parameters. The following screenshots assume you have already created an experiment named “My experiment”. From this step forward, you are working on creating a run named “My first run”.
Image Source: Kubeflow
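If you prefer to script these steps, the UI actions above have SDK equivalents. This sketch mirrors the walkthrough's names; the compiled pipeline package path is hypothetical:

```python
# Sketch: creating an experiment and a run from the SDK instead of the UI.
# 'parallel_demo.yaml' is a hypothetical compiled pipeline package.
import kfp

client = kfp.Client()
experiment = client.create_experiment('My experiment')
run = client.run_pipeline(
    experiment_id=experiment.id,
    job_name='My first run',
    pipeline_package_path='parallel_demo.yaml',
)
```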
Step 4:
To create your run, choose the Start option.
Step 5:
Choose the name of the run under Experiments.
Step 6:
You can now view information about the run and drill down into elements of the compute graph.
Image Source: Kubeflow
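The same run details shown in the UI can also be fetched programmatically. This sketch assumes the run object returned by run_pipeline in the previous snippet:

```python
# Sketch: block until the run finishes, then inspect its final status.
result = client.wait_for_run_completion(run.id, timeout=600)
print(result.run.status)  # e.g. 'Succeeded'
```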
The walkthrough below explains how to run the XGBoost sample; the source code is in the Kubeflow Pipelines repo.
Prerequisites:
Create GCP services for the sample.
Step 1:
Enable the standard GCP APIs for Kubeflow, as well as the APIs for Cloud Storage and Dataproc.
Step 2:
To store pipeline results, create a bucket in Google Cloud Storage.
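You can create the bucket with the gsutil CLI or, as sketched below, with the google-cloud-storage Python client. The project ID and bucket name are placeholders, and the bucket name must be globally unique:

```python
# Sketch: creating a results bucket with the google-cloud-storage client.
# '<project-id>' and '<your-results-bucket>' are placeholders.
from google.cloud import storage

client = storage.Client(project='<project-id>')
bucket = client.create_bucket('<your-results-bucket>')
print('Created bucket:', bucket.name)
```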
Step 3:
In the pipeline UI, choose the name of the sample: [Sample] ML - XGBoost - Training with Confusion Matrix.
Image Source: Kubeflow
Step 4:
Choose the Create experiment option.
Step 5:
You will be shown a series of prompts. Follow the instructions to create an experiment, and include the following run parameters:
- output: the Google Cloud Storage bucket you created to store the pipeline results, for example gs://<your-results-bucket>
- project: your GCP project ID
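Equivalently, you can start the run from the SDK. In this sketch, the pipeline ID is a placeholder you would copy from the samples list in the UI:

```python
# Sketch: launching the XGBoost sample run with its two required parameters.
# '<sample-pipeline-id>' is a placeholder for the preloaded sample's ID.
import kfp

client = kfp.Client()
experiment = client.create_experiment('My experiment')
client.run_pipeline(
    experiment_id=experiment.id,
    job_name='xgboost-sample-run',
    pipeline_id='<sample-pipeline-id>',
    params={
        'output': 'gs://<your-results-bucket>',  # bucket created in step 2
        'project': '<project-id>',
    },
)
```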
Step 6:
Choose the Start option to create your run.
Step 7:
In the experiments dashboard, choose the name of your run.
Step 8:
You can now explore the graph and other aspects of the run by clicking different components of the graph and UI. Here is how your pipeline should look once the run is complete:
Image Source: Kubeflow
Run:AI’s Scheduler is a simple plug-in for Kubernetes clusters that enables optimized, high-performance orchestration of containerized AI workloads.
Run:AI simplifies Kubernetes scheduling for AI and HPC workloads, helping researchers accelerate their productivity and the quality of their work.
Learn more about the Run:AI Kubernetes Scheduler, or explore Kubernetes vs Slurm schedulers.