Streamlining Your ML Pipeline
Machine learning workflows define which phases are implemented during a machine learning project. The typical phases include data collection, data pre-processing, building datasets, model training and refinement, evaluation, and deployment to production. You can automate some aspects of the machine learning operations workflow, such as model and feature selection phases, but not all.
While these steps are generally accepted as a standard, there is also room for change. When creating a machine learning workflow, you first need to define the project, and then find an approach that works. Don’t try to fit the model into a rigid workflow. Rather, build a flexible workflow that allows you to start small and scale up to a production-grade solution.
This is part of our series of articles about machine learning engineering.
In this article, you will learn:
- Core phases of machine learning workflows
- Machine learning workflow best practices
- How to automate machine learning workflows
Core Phases of Machine Learning Workflows
Machine learning workflows define the steps initiated during a particular machine learning implementation. Workflows vary by project, but four basic phases are typically included.
Gathering machine learning data
Gathering data is one of the most important stages of a machine learning workflow. During data collection, the quality of the data you gather determines the potential usefulness and accuracy of your project.
To collect data, you need to identify your sources and aggregate data from those sources into a single dataset. This could mean streaming data from Internet of Things sensors, downloading open source data sets, or constructing a data lake from assorted files, logs, or media.
Pre-processing data
Once your data is collected, you need to pre-process it. Pre-processing involves cleaning, verifying, and formatting data into a usable dataset. If you are collecting data from a single source, this may be a relatively straightforward process. However, if you are aggregating several sources, you need to make sure that data formats match, that the data is equally reliable, and that any potential duplicates are removed.
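As a minimal sketch of this step, the function below normalizes formats and drops duplicates from records aggregated out of several sources. The field names (`id`, `name`, `email`) are hypothetical, chosen only for illustration:

```python
# Minimal pre-processing sketch: normalize formats and de-duplicate
# records aggregated from several sources. Field names are hypothetical.

def preprocess(records):
    """Normalize each record and drop duplicates by 'id'."""
    seen = set()
    cleaned = []
    for rec in records:
        # Normalize formats so sources agree (e.g. lowercase emails,
        # strip stray whitespace from names).
        rec = {
            "id": rec["id"],
            "name": rec["name"].strip(),
            "email": rec["email"].strip().lower(),
        }
        # Skip records already seen in another source.
        if rec["id"] in seen:
            continue
        seen.add(rec["id"])
        cleaned.append(rec)
    return cleaned

raw = [
    {"id": 1, "name": " Ada ", "email": "ADA@example.com"},
    {"id": 2, "name": "Grace", "email": "grace@example.com"},
    {"id": 1, "name": "Ada", "email": "ada@example.com"},  # duplicate
]
clean = preprocess(raw)
```

Real pipelines use far richer cleaning rules, but the shape is the same: normalize each record, then enforce uniqueness.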
Building datasets
This phase involves breaking processed data into three datasets—training, validation, and test:
- Training set—used to initially train the algorithm and teach it how to process information. This set defines model classifications through parameters.
- Validation set—used to estimate the accuracy of the model. This dataset is used to fine-tune model hyperparameters.
- Test set—used to assess the accuracy and performance of the model. This set is meant to expose any issues or flaws in how the model was trained.
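The three-way split above can be sketched in plain Python. The 70/15/15 proportions below are an illustrative choice, not a fixed rule:

```python
import random

def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle samples and split them into train/validation/test sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder becomes the test set
    return train, validation, test

samples = list(range(100))
train, validation, test = split_dataset(samples)
```

In practice you would typically reach for a library helper such as scikit-learn's `train_test_split`, but the underlying idea is exactly this shuffle-then-slice.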
Training and refinement
Once you have datasets, you are ready to train your model. This involves feeding your training set to your algorithm so that it can learn appropriate parameters and features used in classification.
Once training is complete, you can then refine the model using your validation dataset. This may involve modifying or discarding variables and includes a process of tweaking model-specific settings (hyperparameters) until an acceptable accuracy level is reached.
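As a minimal, hypothetical illustration of refinement, the loop below tunes a single hyperparameter (a decision threshold for a toy classifier) against a validation set. Both the model and the data are stand-ins for a real training setup:

```python
# Toy "model": label 1 if the feature exceeds a threshold hyperparameter.
def predict(x, threshold):
    return 1 if x > threshold else 0

def accuracy(data, threshold):
    """Fraction of (feature, label) pairs the threshold classifies correctly."""
    return sum(predict(x, threshold) == y for x, y in data) / len(data)

# Hypothetical validation set of (feature, label) pairs; labels flip near x = 5.
validation = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]

# Refinement: try candidate hyperparameter values, keep the best on validation.
candidates = [2.5, 5.0, 7.5]
best_threshold = max(candidates, key=lambda t: accuracy(validation, t))
```

Real refinement tweaks many hyperparameters at once (learning rate, depth, regularization, and so on), but the pattern is the same: evaluate each setting on validation data, never on the test set.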
Machine learning evaluation
Finally, after an acceptable set of hyperparameters is found and your model accuracy is optimized, you can test your model. Testing uses your test dataset and is meant to verify that your model generalizes beyond the data it was trained on. Based on the feedback you receive, you may return to training the model to improve accuracy, adjust output settings, or deploy the model as needed.
Machine Learning Workflow Best Practices
When defining the workflow for your machine learning project, there are several best practices you can apply. Below are a few to start with.
Define the project
Carefully define your project goals before starting to ensure your models add value to a process rather than redundancy. When defining your project, consider the following aspects:
- What is your current process—typically models are designed to replace an existing process. Understanding how the existing process works, what its goals are, who performs it, and what counts as success are all important. Understanding these aspects lets you know what roles your model needs to fill, what restrictions might exist in implementation, and what criteria the model needs to meet or exceed.
- What do you want to predict—carefully defining what you want to predict is key to understanding what data you need to collect and how models should be trained. You want to be as detailed as possible with this step and make sure to quantify results. If your goals aren’t measurable you’ll have a hard time ensuring that each is met.
- What are your data sources—evaluate what data your current process relies on, how it’s collected and in what volume. From those sources, you should determine what specific data types and points you need to form predictions.
Find an approach that works
The goal of implementing machine learning workflows is to improve the efficiency and/or accuracy of your current process. To find an approach that achieves this goal you need to:
- Research—before implementing an approach, you should spend time researching how other teams have implemented similar projects. You may be able to borrow methods they used or learn from their mistakes, saving yourself time and money.
- Experiment—whether you have found an existing approach to start from or created your own, you need to experiment with it. This is essentially the training and testing phases of your model training.
Build a full-scale solution
When developing your approach, your end result is typically a proof-of-concept. However, you need to be able to translate this proof into a functional product to meet your end goal. To transition from proof to deployable solution, you need the following:
- A/B testing—enables you to compare your current model with the existing process. This can confirm or deny whether your model is effective and able to add value to your teams and users.
- Machine learning API—creating an API for your model implementation is what enables it to communicate with data sources and services. This accessibility is especially important if you plan to offer your model as a machine learning service.
- User-friendly documentation—includes documentation of code, methods, and how to use the model. If you want to create a marketable product it needs to be clear to users how they can leverage the model, how to access its results, and what kind of results they can expect.
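The A/B comparison described above can be sketched, under illustrative assumptions, as a two-proportion z-test on success rates. The counts below are hypothetical:

```python
import math

def z_test(success_a, n_a, success_b, n_b):
    """Return the z statistic for the difference in two success rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p = (success_a + success_b) / (n_a + n_b)  # pooled success rate
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))  # standard error
    return (p_b - p_a) / se

# Hypothetical counts: 120/1000 successes for the existing process (A)
# versus 150/1000 for the new model (B).
z = z_test(120, 1000, 150, 1000)
```

With these numbers z comes out around 1.96, which is right at the conventional 5% significance threshold; a real A/B test would also consider sample size, test duration, and the business cost of each error type.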
How to Automate Machine Learning Workflows
Automating machine learning workflows enables teams to more efficiently perform some of the repetitive tasks involved in model development. There are many modules, and an increasing number of platforms, for this purpose, sometimes referred to as AutoML.
What is Automated Machine Learning?
AutoML essentially applies existing machine learning algorithms to the development of new models. Its purpose is not to automate the entire process of model development. Instead, it is to reduce the number of interventions that humans must make to ensure successful development.
AutoML helps developers get started with and complete projects significantly faster. It also has potential to improve deep learning and unsupervised machine learning training processes, potentially enabling self correction in developed models.
What Can You Automate?
While it would be great to be able to automate all aspects of machine learning operations, this currently isn’t possible. What can be reliably automated includes:
- Hyperparameter optimization—uses algorithms like grid search, random search, and Bayesian methods to test combinations of pre-defined parameters and find the optimal combination.
- Model selection—the same dataset is run through multiple models with default hyperparameters to determine which is best suited to learn from your data.
- Feature selection—tools select the most relevant features from pre-determined sets of features.
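Of the techniques above, grid search is the simplest to sketch: score every combination of pre-defined hyperparameter values and keep the best. The parameter names and the scoring function below are hypothetical stand-ins for training and validating a real model:

```python
import itertools

# Pre-defined hyperparameter values to search over (illustrative names).
param_grid = {
    "learning_rate": [0.01, 0.1],
    "max_depth": [2, 4, 8],
}

def score(params):
    # Hypothetical scoring function; in practice this would train a model
    # with the given hyperparameters and evaluate it on validation data.
    return -abs(params["learning_rate"] - 0.1) - abs(params["max_depth"] - 4)

# Exhaustively evaluate every combination and keep the best one.
best_params, best_score = None, float("-inf")
for values in itertools.product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), values))
    s = score(params)
    if s > best_score:
        best_params, best_score = params, s
```

Random search samples from the same grid instead of enumerating it, and Bayesian methods use earlier scores to decide which combination to try next; libraries such as scikit-learn's `GridSearchCV` wrap this loop with cross-validation.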
3 Frameworks You Can Use to Automate Machine Learning Workflows
Below are three frameworks you can use to get started with machine learning automation.
Featuretools is an open source framework that you can use to automate feature engineering. You can use it to transform structured temporal and relational datasets using a Deep Feature Synthesis algorithm. This algorithm uses primitives (operations such as sum, mean, or count) to aggregate or transform data into usable features. The framework is based on Data Science Machine, a project created by Max Kanter and Kalyan Veeramachaneni at MIT.
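To illustrate the idea of primitives conceptually (this is not the Featuretools API), the sketch below applies sum and mean aggregations across a one-to-many relationship, turning hypothetical raw transactions into per-customer features:

```python
# Conceptual sketch of aggregation primitives, not the Featuretools API.
transactions = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
]

def aggregate(rows, key, column, primitive):
    """Group rows by key and apply a primitive to one column per group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row[column])
    return {k: primitive(v) for k, v in groups.items()}

mean = lambda xs: sum(xs) / len(xs)

# Each primitive applied across the relationship yields one new feature.
features = {
    "sum_amount": aggregate(transactions, "customer_id", "amount", sum),
    "mean_amount": aggregate(transactions, "customer_id", "amount", mean),
}
```

Deep Feature Synthesis automates exactly this kind of stacking, composing primitives across related tables to generate many candidate features at once.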
DataRobot is a proprietary platform you can use to perform automated data preparation, feature engineering, model selection, training, testing, and deployment. You can use it to find new data sources, apply business rules, or regroup and reshape data.
The DataRobot platform includes a library of open source and proprietary models you can use to base your own model implementation on. It also includes a dashboard with visualizations that you can use to evaluate your model and understand predictions.
tsfresh is an open source Python module you can use to calculate and extract characteristics from time series data. It enables you to extract features that can then be used with scikit-learn or pandas for model training.
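As a hedged sketch of the kind of characteristics such a tool computes, the function below derives a few simple statistics from a time series using only the standard library (tsfresh itself extracts hundreds of features):

```python
import statistics

def extract_features(series):
    """Compute a few basic characteristics of a time series."""
    m = statistics.mean(series)
    return {
        "mean": m,
        "stdev": statistics.pstdev(series),   # population standard deviation
        "maximum": max(series),
        "n_above_mean": sum(x > m for x in series),
    }

series = [1.0, 2.0, 3.0, 2.0, 1.0, 4.0]
features = extract_features(series)
```

Each extracted characteristic becomes one column in a feature table, which is what makes the output directly usable with scikit-learn or pandas.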
Machine Learning Workflow Automation With Run:AI
Machine learning workflows define the entire machine learning cycle. While the tools mentioned above help with automating some parts of the ML lifecycle, such as data preparation, they are not built to automate resource allocation and job scheduling. If resource allocation is not properly configured and optimized, you can quickly hit compute or memory bottlenecks.
You can avoid these issues by replacing static allocation and provisioning with automated and dynamic resource management. This capability is enabled by virtualization software like Run:AI, which automates resource management for machine learning infrastructure. With Run:AI, you can automatically run as many compute intensive experiments as needed.
Here are some of the capabilities you gain when using Run:AI:
- To ensure visibility and efficient resource sharing, you can pool GPU compute resources.
- To avoid bottlenecks, you can set up GPU guaranteed quotas.
- To gain better control, you can dynamically change resource allocation, ensuring each job gets the resources it needs at any given time.
Run:AI simplifies machine learning workflows, helping data scientists increase their productivity and improve the quality of their models. Learn more about the Run:AI platform.
See Our Additional Guides on Key Artificial Intelligence Infrastructure Topics
We have authored in-depth guides on several other artificial intelligence infrastructure topics that can also be useful as you explore the world of deep learning GPUs.
GPUs for Deep Learning
Learn how to assess GPUs to determine which is the best GPU for your deep learning model. Discover types of consumer and data center deep learning GPUs. Get started with PyTorch for GPUs – learn how PyTorch supports NVIDIA’s CUDA standard, and get quick technical instructions for using PyTorch with CUDA. Finally, learn about the NVIDIA deep learning SDK, what are the top NVIDIA GPUs for deep learning, and what best practices you should adopt when using NVIDIA GPUs.
See top articles in our GPU for Deep Learning guide:
- Best GPU for Deep Learning: Critical Considerations for Large-Scale AI
- PyTorch GPU: Working with CUDA in PyTorch
- NVIDIA Deep Learning GPU: Choosing the Right GPU for Your Project
Kubernetes and AI
This guide explains the Kubernetes architecture for AI workloads and how K8s came to be used inside many companies. It covers the specific considerations involved in implementing Kubernetes to orchestrate AI workloads. Finally, the guide addresses the shortcomings of Kubernetes when it comes to scheduling and orchestration of deep learning workloads, and how you can address those shortfalls.
See top articles in our Kubernetes for AI guide: