The Ultimate Guide to Machine Learning Operations

What is MLOps?

Machine learning operations (MLOps) is the practice of creating new machine learning (ML) and deep learning (DL) models and running them through a repeatable, automated workflow that deploys them to production.

MLOps was inspired by DevOps. The DevOps movement defined a new, agile software development lifecycle (SDLC), which encouraged frequent innovation. Developers work on small, frequent releases, each of which undergoes automated testing and is automatically deployed to production.

Similarly, MLOps defines a new lifecycle for AI technology that allows rapid experimentation, in response to business need or live model performance, and seamless deployment of new models as a predictive service.

An MLOps pipeline provides a variety of services to data science teams, including model version control, continuous integration and continuous delivery (CI/CD), model service catalogs for models in production, infrastructure management, monitoring of live model performance, security, and governance.

This is part of an extensive series of guides about AI Technology.

A view of the MLOps tools landscape today, by Neptune.AI

The Need for MLOps

MLOps started as a set of best practices to improve communication between data scientists and DevOps teams, promoting workflows and processes that could accelerate the time to market for ML applications. Soon, open source MLOps frameworks such as MLflow and Kubeflow began to emerge.

Today, MLOps capabilities are considered a key requirement for Data Science and Machine Learning (DSML) platforms. Gartner’s “2020 Magic Quadrant for Data Science and Machine Learning Platforms” cites MLOps as a key inclusion criterion, noting that “…[a]s DSML moves out of the lab and into the mainstream, it must be operationalized with seamless integration and carefully designed architecture and processes. Machine learning operations capabilities should also include explainability, versioning of models and business impact analysis, among others.” (Source: A report reprint, available to Gartner subscribers only.)

As shown in the diagram below, the next-generation data science lifecycle breaks down the silos among all the different stakeholders that need to be involved for ML projects to capture business value. This involves:

  • Data science teams performing modeling and data acquisition activities with a clear understanding of the business objectives.
  • ML development that is closely aligned with governance and compliance requirements.
  • Data science, production, and operations teams working seamlessly together, with thorough testing of model iterations before deployment to production.
  • Automating ML workflows to ensure smooth deployments.
  • Monitoring ML deployments on an ongoing basis, reflecting performance back to the data science team so that they can tune and improve the model.
Figure 1: MLOps Drives Data Science Success and Value. (Source: Azure)

Benefits of MLOps

MLOps is the critical missing link that allows IT to support the highly specialized infrastructure requirements of ML. The cyclical, highly automated MLOps approach:

  • Reduces the time and complexity of moving models into production.
  • Enhances communication and collaboration across teams that are often siloed: data science, development, and operations.
  • Streamlines the interface between R&D processes and infrastructure, in general, and operationalizes the use of specialized hardware accelerators (such as GPUs), in particular.
  • Operationalizes model issues critical to long-term application health, such as versioning, tracking, and monitoring.
  • Makes it easier to monitor and understand ML infrastructure and compute costs at all stages, from development to production.
  • Standardizes the ML process and makes it more auditable for regulation and governance purposes.

DevOps vs MLOps

MLOps was inspired by DevOps, and the two approaches are inherently similar. However, there are a few ways in which MLOps differs significantly from DevOps:

  • MLOps is experimental in nature - most of the activity of data science teams relates to experimentation. Teams constantly change features of their models to achieve better performance, while also managing an evolving codebase.
  • Hybrid teams - data science teams include both developers (machine learning engineers) and data scientists or researchers who analyze data and develop models and algorithms. The latter might not be experienced at software development.
  • Continuous testing (CT) - in addition to the regular testing stages of a DevOps pipeline, such as unit tests, functional tests and integration tests, an MLOps pipeline must also continually test the model itself - training it and validating its performance against a known dataset.
  • Automatic retraining - in most cases, a pre-trained model cannot be used as-is in production. The model needs to be retrained and deployed on an ongoing basis. This requires automating the process data scientists go through to train and validate their models.
  • Performance degradation - unlike regular software systems, even if a model is working perfectly, performance can degrade over time. This can happen due to unexpected characteristics of data consumed by the model, differences between training and inference pipelines, and unknown biases which can grow with each feedback loop.
  • Data monitoring - it is not sufficient only to monitor a model as a software system. MLOps teams also need to monitor the data and predictions, to see when the model needs to be refreshed or rolled back.
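
The continuous testing point above can be illustrated with a test that trains a model and checks it against a known dataset. This is a minimal sketch, not a prescribed method: the toy one-feature threshold model, the `TRAIN` and `HOLDOUT` data, and the 0.9 accuracy floor are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical fixed datasets of (feature, label) pairs.
TRAIN = [(0.5, 0), (1.0, 0), (1.5, 0), (3.0, 1), (3.5, 1), (4.0, 1)]
HOLDOUT = [(0.8, 0), (1.2, 0), (3.2, 1), (3.8, 1)]


def train_threshold_model(rows):
    """Fit a toy classifier: a threshold halfway between the two class means."""
    mean_neg = mean(x for x, y in rows if y == 0)
    mean_pos = mean(x for x, y in rows if y == 1)
    return (mean_neg + mean_pos) / 2


def accuracy(threshold, rows):
    """Fraction of rows where 'feature above threshold' matches the label."""
    correct = sum((x > threshold) == bool(y) for x, y in rows)
    return correct / len(rows)


def test_model_meets_accuracy_floor():
    """Model-level test: retrain, then validate against the known holdout set."""
    threshold = train_threshold_model(TRAIN)
    assert accuracy(threshold, HOLDOUT) >= 0.9  # assumed quality floor
```

A test like this can run alongside ordinary unit and integration tests, so a change that hurts model quality fails the build in the same way a code bug would.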

MLOps Maturity: Three Levels of MLOps

This discussion of MLOps maturity is based on a framework by Google Cloud.

MLOps Level 0: Manual Process

At this level of maturity, a team is able to build useful ML/DL models, but has a completely manual process for deploying them to production. The ML pipeline looks like this:

  • All steps in the pipeline are manual or based on experimental code executed in Jupyter Notebooks - including data analysis, preparation, training, and validation.
  • Data scientists work separately from engineers who deploy the final prediction service. The data science team provides a trained model to ML engineers, who are responsible for making it available as an API at low latency. The differences between experimental environments and production environments can lead to training-serving skew.
  • Models are not frequently released. The assumption is that the data science team has finished working on the model and it can now be deployed to production.
  • There is no CI/CD because the model is not planned to change on a regular basis. So at this level of maturity, there is no consideration of automated building of model code (CI) or automated deployment of a prediction service to production (CD).
  • There is no regular monitoring of model performance - under the assumption that the model will deliver consistent performance with new data.

MLOps Level 1: ML Pipeline Automation

At this level of maturity, there is an understanding that the model needs to be managed in a CI/CD pipeline, and training/validation needs to be performed continuously on new data. The ML Pipeline now evolves to look like this:

  • Experiments can happen much faster, due to orchestration of the entire ML process. Data scientists can think of a hypothesis and rapidly deploy it to production.
  • The model is continuously tested and re-trained with fresh data, based on feedback from live model performance.
  • The same setup is used in the experimental environment as in the production environment, to eliminate training-serving skew.
  • All components used to build and train the model are reusable and shareable across multiple pipelines.
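
One way to picture retraining that is driven by fresh data and live performance is a small trigger function. The sketch below is an assumption-laden illustration: the function name `should_retrain`, the accuracy floor of 0.85, and the 10,000-row limit are hypothetical and would be chosen per project.

```python
def should_retrain(live_accuracy, new_rows_since_training,
                   *, min_accuracy=0.85, max_new_rows=10_000):
    """Return the list of reasons to retrain; an empty list means no retraining.

    Both thresholds are hypothetical defaults, not recommended values.
    """
    reasons = []
    if live_accuracy < min_accuracy:
        reasons.append("live accuracy below floor")
    if new_rows_since_training >= max_new_rows:
        reasons.append("enough new data")
    return reasons
```

An orchestrator could call a check like this on a schedule and start the training pipeline whenever the returned list is not empty.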

MLOps Level 2: Full CI/CD Pipeline Automation

At this highest level of MLOps maturity, new experiments are seamlessly deployed to production with minimal involvement of engineers. A data scientist can easily create a new ML pipeline and automatically build, test, and deploy it to a target environment. This type of setup is illustrated in the following diagram.

Image Source: Google Cloud

A fully automated CI/CD pipeline works like this:

  1. Teams come up with new models and experiments, and generate source code that describes their efforts.
  2. Source code is automatically built by the CI engine, which runs automated tests. It generates artifacts that can be deployed at later stages.
  3. The pipeline deploys the artifacts to the target environment, which now has a fully functional new version of the model.
  4. The pipeline executes automatically based on a trigger, and the result is pushed to a model registry.
  5. The trained model is deployed and enables live predictions with low latency.
  6. The pipeline collects statistics on live model performance. Data scientists can evaluate this data and, based on model performance, start a new experiment cycle (back to step 1).
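
Steps 2 to 5 above can be sketched in a few lines of Python. This is an in-memory toy, not how any particular CI/CD product works: `ModelRegistry`, `run_ci`, `run_training_pipeline`, `predict`, and the mean-value "model" are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRegistry:
    """In-memory stand-in for a model registry (step 4)."""
    versions: list = field(default_factory=list)

    def push(self, model, metrics):
        version = len(self.versions) + 1
        self.versions.append({"version": version, "model": model, "metrics": metrics})
        return version

    def latest(self):
        return self.versions[-1]


def run_ci(train_fn, tests):
    """Step 2: run automated tests and return a deployable artifact if all pass."""
    if not all(test(train_fn) for test in tests):
        raise RuntimeError("CI failed: pipeline code did not pass its tests")
    return {"train_fn": train_fn}


def run_training_pipeline(artifact, data, registry):
    """Steps 3-4: execute the deployed pipeline and register the trained model."""
    model = artifact["train_fn"](data)
    return registry.push(model, {"n_rows": len(data)})


def predict(registry, x):
    """Step 5: serve a prediction from the latest registered model."""
    return registry.latest()["model"](x)


def train_mean_model(data):
    """Toy training code (step 1): the 'model' always predicts the training mean."""
    avg = sum(data) / len(data)
    return lambda x: avg


def smoke_test(train_fn):
    """Toy automated test: training on [1.0, 3.0] must yield a model predicting 2.0."""
    return train_fn([1.0, 3.0])(0) == 2.0
```

In use, `run_ci(train_mean_model, [smoke_test])` produces the artifact, `run_training_pipeline` registers version 1 of the model, and `predict` then answers requests from that version.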

Implementing MLOps in Your Organization

Here are a few steps to implementing MLOps in your organization.

Establish Hybrid Teams

To succeed in MLOps, establish a hybrid team including some or all of the following roles. All these roles should work together, assuming shared ownership for ML models working effectively in production:

  • Data scientists
  • Machine learning engineers
  • DevOps engineers
  • Data engineers

To form a true cross-functional team, each of these roles should have at least some of the skills of the other roles. Data scientists should be able to code and know the basics of DevOps; machine learning engineers should understand the experimentation process; and DevOps or data engineers should be familiar with machine learning concepts and should not treat models as a black box.

Build ML Pipelines

ML pipelines are the “factory floor” of a data science team. Ensure your ML pipeline includes:

  • Full machine learning data pipeline - takes raw training datasets and performs the transformations necessary to use them as model inputs. This substitutes for ad-hoc data transformations that were traditionally performed manually, via scripts, or in Jupyter notebooks.
  • Two versions of the pipeline - a training pipeline and a serving pipeline. This is because training data has different characteristics from real-time requests. For example, the training pipeline might process all data features, while a serving pipeline extracts some features from a user request and retrieves the rest from a database.
  • MLOps pipeline packaged as code artifact - this allows the MLOps team to iterate over multiple versions of the pipeline, improving it to fix bugs and adapt it to changing requirements.
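
The split between a training pipeline and a serving pipeline can be sketched as two entry points that share one transformation function. The feature names (`amount`, `day_of_week`, `account_age_days`) and the dictionary used as a feature store are assumptions made for this example.

```python
import math


def transform(raw):
    """Shared feature logic, used by both pipelines so they cannot drift apart."""
    return {
        "log_amount": math.log1p(raw["amount"]),
        "is_weekend": 1 if raw["day_of_week"] in (5, 6) else 0,
        "account_age_days": raw["account_age_days"],
    }


def training_pipeline(rows):
    """Batch path: every raw field is already present in the training dataset."""
    return [transform(row) for row in rows]


def serving_pipeline(request, feature_store):
    """Online path: some fields come from the request, the rest from a lookup."""
    stored = feature_store[request["user_id"]]  # hypothetical database lookup
    raw = {
        "amount": request["amount"],
        "day_of_week": request["day_of_week"],
        "account_age_days": stored["account_age_days"],
    }
    return transform(raw)
```

Because both paths end in the same `transform` call, a record seen at training time and the equivalent live request produce identical model inputs.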

Model and Data Versioning

Ensure you track everything in the pipeline using version control. An MLOps pipeline has two parallel versioning systems:

  • Versions of model code - this reflects how the model is implemented, trained, and how it performs inference in production.
  • Versions of model data and parameters - including datasets used, model hyperparameters, and running parameters.

Each version of the model should be tied to a version of model code, giving the MLOps team a clear audit trail showing what ran where. This way, if a specific version of the model resulted in great performance, or conversely performed poorly, it can be tied back to specific data, parameters, and implementation code.
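
As a rough illustration of tying a model run to its code, data, and parameters, the sketch below builds a run record from content hashes. The record layout and the function names `fingerprint` and `build_run_record` are assumptions; dedicated tools handle this in practice, and the code version would typically be a commit hash from source control.

```python
import hashlib
import json


def fingerprint(payload: bytes) -> str:
    """Short, stable identifier derived from the content itself."""
    return hashlib.sha256(payload).hexdigest()[:12]


def build_run_record(code_version, dataset_bytes, params):
    """Link one training run to the exact code, data, and parameters it used."""
    # Sorting keys makes the hash independent of dictionary ordering.
    params_bytes = json.dumps(params, sort_keys=True).encode("utf-8")
    return {
        "code_version": code_version,
        "data_version": fingerprint(dataset_bytes),
        "params_version": fingerprint(params_bytes),
        "params": params,
    }
```

Storing a record like this next to each trained model gives the audit trail described above: any change to the dataset or the hyperparameters produces a different version string.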

Model Validation

Ensure the MLOps pipeline automatically validates model performance. In a DevOps environment, software undergoes automated testing to see if it is good enough to run in production. This testing is usually of a “pass/fail” nature. An MLOps pipeline, by contrast, needs to test a model’s performance and determine if it is “good enough” to run in production.

Model validation typically involves:

  • Evaluating a model against labelled datasets
  • Testing it against different segments of the data to see whether it performs consistently
  • Comparing its performance to previous versions or known benchmarks

All this can be done automatically, and if the model passes a certain threshold, it is deployed. In other cases, data scientists can review model results and make a qualitative decision whether to push it to production or not.
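
The three checks above can be combined into a single automated gate. The sketch below is one possible shape for it, under stated assumptions: accuracy as the metric, a `baseline_accuracy` taken from the previous model version, and a hypothetical per-segment floor of 0.8.

```python
def accuracy(predict, rows):
    """Fraction of labelled (input, label) rows the model predicts correctly."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


def validate_model(predict, segments, baseline_accuracy, min_segment_accuracy=0.8):
    """Return (approved, report) for a candidate model.

    segments maps a segment name to its labelled rows. The model is approved
    only if it matches or beats the baseline overall and no single segment
    falls below the (assumed) per-segment floor.
    """
    all_rows = [row for rows in segments.values() for row in rows]
    report = {
        "overall": accuracy(predict, all_rows),
        "segments": {name: accuracy(predict, rows) for name, rows in segments.items()},
    }
    approved = (
        report["overall"] >= baseline_accuracy
        and min(report["segments"].values()) >= min_segment_accuracy
    )
    return approved, report
```

The returned report can also be shown to data scientists when the decision is left to human review rather than to the threshold.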

Data Validation

It is not enough to validate the model - you must also automatically validate your datasets. An MLOps pipeline must validate that the data used to train the model has the required characteristics. This is similar to unit testing in a traditional DevOps pipeline. Use automated checks to verify that the data is in the correct format, that there are no missing values (if none are expected), and that it passes standardized tests of data quality.

Another important check is to compare the data to previous training runs. If the statistical properties of the data changed (for example, the mean or distribution is significantly different), this can affect the model’s predictions. This might mean the data is skewed, or that model inputs are really changing, and the model needs to change to adapt.
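
Both kinds of check can be sketched with the Python standard library. The schema format, the function names, and the drift threshold of three reference standard deviations are assumptions made for illustration; the mean-shift comparison is a crude heuristic, not a formal statistical test.

```python
from statistics import mean, stdev


def check_schema(rows, required):
    """Return a list of problems: missing values or values of the wrong type.

    required maps each field name to the Python type it is expected to have.
    """
    problems = []
    for i, row in enumerate(rows):
        for name, expected_type in required.items():
            value = row.get(name)
            if value is None:
                problems.append(f"row {i}: '{name}' is missing")
            elif not isinstance(value, expected_type):
                problems.append(f"row {i}: '{name}' is not {expected_type.__name__}")
    return problems


def mean_shift(reference, current):
    """Shift of the current mean, in units of the reference standard deviation."""
    return abs(mean(current) - mean(reference)) / stdev(reference)


def has_drifted(reference, current, max_shift=3.0):
    """Flag a feature whose mean moved more than max_shift reference deviations."""
    return mean_shift(reference, current) > max_shift
```

A pipeline could run `check_schema` on every new batch and compare each numeric feature against the values from the previous training run with `has_drifted`, stopping or alerting when either check fails.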


Monitoring

Ensure you monitor regular operational properties such as latency, system load, and errors. In addition, monitor the performance of your production ML models:

  • It is best to evaluate the model's predictions on production data against labelled data.
  • If this is not possible, you can check model performance indirectly by measuring user interaction. For example, in an ML-based spelling checker, if a user accepts a spelling correction, this suggests the model was correct.
  • Whatever the method, find a consistent method you can track over time, and set alerts if your chosen metric goes above or below a reasonable threshold.
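
To make the last two points concrete, the sketch below tracks an indirect metric (the share of suggestions users accept) over a rolling window and raises an alert when it leaves an expected band. The class name `RollingMetricMonitor`, the window size, and the thresholds are all hypothetical.

```python
from collections import deque


class RollingMetricMonitor:
    """Track the acceptance rate over the most recent `window` predictions."""

    def __init__(self, window, lower, upper=1.0):
        self.outcomes = deque(maxlen=window)  # old outcomes drop off automatically
        self.lower = lower
        self.upper = upper

    def record(self, accepted):
        """Record whether the user accepted the model's suggestion."""
        self.outcomes.append(1 if accepted else 0)

    def value(self):
        """Current acceptance rate, or None if nothing has been recorded yet."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self):
        """Alert only once the window is full and the rate is outside the band."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return not (self.lower <= self.value() <= self.upper)
```

Waiting for a full window before alerting avoids false alarms right after a deployment, when only a handful of outcomes have been seen.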

Stay Ahead of the ML Curve with Run:ai

In today’s highly competitive economy, enterprises are looking to artificial intelligence in general, and machine learning and deep learning in particular, to transform big data into actionable insights. These insights can help them better address their target audiences, improve their decision-making processes, and streamline their supply chains and production processes, to mention just a few of the many use cases. To stay ahead of the curve and capture the full value of ML, however, companies must strategically embrace MLOps.

Run:ai’s AI/ML virtualization platform is an important enabler for Machine Learning Operations teams. Focusing on deep learning neural network models that are particularly compute-intensive, Run:ai creates a pool of shared GPU and other compute resources that are provisioned dynamically to meet the needs of jobs in process. By abstracting workloads from the underlying infrastructure, organizations can embrace MLOps and allow data scientists to focus on models, while letting IT teams gain control and real-time visibility of compute resources across multiple sites, both on-premises and in the cloud.

See for yourself how Run:ai can operationalize your data science projects, accelerating their journey from research to production.

Learn More About MLOps

Apache Airflow: Use Cases, Architecture, and Best Practices

Apache Airflow is an open-source platform for authoring, scheduling and monitoring data and computing workflows.

Understand how Apache Airflow can help you automate workflows for ETL, DevOps and machine learning tasks.

Edge AI: Benefits, Use Cases, and Deployment Models

Edge computing helps make data storage and computation more accessible to users. This is achieved by running operations on local devices like laptops, Internet of Things (IoT) devices, or dedicated edge servers. Edge processes are not affected by the latency and bandwidth issues that often hamper the performance of cloud-based operations. 

Learn how edge AI is making real-time AI inference a reality for mobile devices, IoT, video analytics, and more. 

JupyterHub: A Practical Guide

Jupyter Notebook is an open source application, used by data scientists and machine learning professionals to author and present code, explanatory text, and visualizations. JupyterHub is an open source tool that lets you host a distributed Jupyter Notebook environment. 

Learn how JupyterHub works in depth, see two quick deployment tutorials, and learn to configure the user environment.

MLflow: The Basics and a Quick Tutorial

MLflow is an open source platform for managing machine learning workflows. It is used by machine learning engineering teams and data scientists. MLflow has four main components: Tracking, Projects, Models, and Model Registry.

Understand MLflow tracking, projects, and models, and see a quick tutorial showing how to train a machine learning model and deploy it to production.
