Amazon SageMaker is an end-to-end, cloud-based machine learning service. It allows you to prepare data, build and train models, and deploy them to production.
SageMaker integrates with Jupyter Notebook, letting you set up managed notebook instances and easily connect them to data sources for exploration. It provides many common machine learning algorithms optimized to run efficiently in a distributed environment, and also lets you bring your own algorithms built with the frameworks you already use.
SageMaker provides a web-based UI called SageMaker Studio that lets you manage the entire machine learning workflow for your models, as well as the SageMaker console, which lets you create new notebook instances and operate the SageMaker service.
We’ll provide a tutorial showing how to bring your own algorithm and deploy it via the SageMaker console, without using SageMaker Studio.
Related content: Read our guide to AWS SageMaker
Before we dive into the tutorial, it can be useful to understand the main building blocks of AWS SageMaker: preparation, training and deployment.
Amazon SageMaker creates managed compute instances in the Elastic Compute Cloud (EC2), pre-configured for machine learning projects. These instances support the open source Jupyter Notebook application, which is commonly used by data scientists to author and share code for their models.
SageMaker notebook instances come with everything you need to connect to your machine learning toolset, including drivers, packages, and libraries for deep learning and machine learning frameworks. The default version of the notebook instance comes with many common algorithms, including statistical models and algorithms for natural language processing (NLP) and computer vision. You can customize the instance’s configuration to suit your specific needs.
A useful capability of SageMaker instances is that they can easily accept code previously developed in a supported machine learning framework. Developers can package the code in a Docker container and add it seamlessly to the instance. SageMaker can also pull data from Amazon S3, making it possible to work with datasets of any size.
Related content: Read our guide to SageMaker Notebooks
SageMaker lets you specify the location of training data in an Amazon S3 bucket and choose a preferred instance type; it then starts the training process automatically, transforming data to facilitate feature engineering. SageMaker’s automatic model tuning searches for the hyperparameter values that maximize the model’s performance.
When a model is trained and ready for inference, SageMaker automatically deploys it on Amazon infrastructure and scales it as needed. Amazon provides several SageMaker instance types with graphics processing units (GPUs) optimized for machine learning inference.
SageMaker takes care of several operational concerns for ML models in production, including deployment, scaling, and monitoring.
After a model is running in production, SageMaker allows you to fine-tune and improve the model in a continuous cycle of monitoring and retraining.
Related content: Read our guide to SageMaker Pipelines
In this tutorial, we’ll show how to use Amazon SageMaker to build, train, and deploy a machine learning (ML) model using the XGBoost ML algorithm. XGBoost is an ensemble algorithm based on decision trees and a gradient boosting framework. It is the evolution of traditional decision trees and random forest models.
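To make the idea behind XGBoost concrete, here is a minimal, self-contained sketch of gradient boosting over decision stumps (depth-1 trees) in plain NumPy. This is a toy illustration of the technique, not the XGBoost implementation itself, and the data is made up:

```python
import numpy as np

# Toy 1-D regression data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.0, 2.8, 3.1, 5.0, 5.2])

def fit_stump(x, residual):
    """Fit a depth-1 decision tree (stump) to the residuals by squared error."""
    best = None
    for t in x[:-1]:  # candidate split thresholds
        left, right = residual[x <= t], residual[x > t]
        pred = np.where(x <= t, left.mean(), right.mean())
        err = ((residual - pred) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, left.mean(), right.mean())
    _, t, left_val, right_val = best
    return lambda q: np.where(q <= t, left_val, right_val)

# Gradient boosting: each new stump fits the residual of the current ensemble,
# and its contribution is shrunk by a learning rate before being added.
learning_rate, pred = 0.5, np.zeros_like(y)
for _ in range(50):
    stump = fit_stump(x, y - pred)
    pred = pred + learning_rate * stump(x)

print(np.round(pred, 2))  # the ensemble's predictions end up close to y
```

Each stump alone is a weak model, but the boosted sum of many stumps fits the data closely; XGBoost applies the same principle with deeper trees, regularization, and an optimized distributed implementation.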
This tutorial is abbreviated from the official SageMaker Hands-on Tutorial.
In the Amazon SageMaker console, select a region and click Create notebook instance.
Select your instance size. Under the Permissions and encryption section, open the IAM role dropdown, click Create a new role, and select Any S3 bucket.
Once your new notebook instance starts, click Open Jupyter and select the conda_python3 kernel.
In your Jupyter notebook, add code cells that perform the following preparation steps:
Import the required libraries
Get the code for these preparation steps in the full tutorial, step 2.
We’ll create a new code cell in the Jupyter notebook and copy in the code from the full tutorial that converts the CSV into a format suitable for training the model.
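The exact cell is in the linked tutorial; the essence of the transformation is that SageMaker’s built-in XGBoost algorithm expects CSV input with the target in the first column and no header row. A minimal sketch with made-up data (column names and values are illustrative, not the tutorial’s bank-marketing dataset):

```python
import pandas as pd

# Illustrative DataFrame; the tutorial works with a bank-marketing CSV instead.
df = pd.DataFrame({
    "age": [25, 47, 38, 52, 29, 41],
    "balance": [1200, 300, 950, 80, 640, 2100],
    "y_yes": [0, 1, 0, 1, 0, 1],   # target: did the customer buy?
})

# Shuffle, then split roughly 70/30 into train and test sets.
shuffled = df.sample(frac=1, random_state=1729)
cut = int(0.7 * len(shuffled))
train, test = shuffled.iloc[:cut], shuffled.iloc[cut:]

# Built-in XGBoost expects the target in the FIRST column and no header.
cols = ["y_yes"] + [c for c in df.columns if c != "y_yes"]
train[cols].to_csv("train.csv", index=False, header=False)
```

The resulting train.csv (and a matching validation file) is what gets uploaded to S3 for training.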
We’ll create another code cell and copy in the code that sets up an XGBoost estimator and defines its hyperparameters. In a real project you can tweak the hyperparameters to see which values give you the best results.
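The hyperparameters themselves are plain keyword settings on the estimator. The values below are illustrative defaults for a binary classification task, not necessarily the tutorial’s exact choices:

```python
# Illustrative hyperparameters for SageMaker's built-in XGBoost on a
# binary classification task -- example values, not tuned choices.
# In the notebook they are applied with: xgb.set_hyperparameters(**hyperparams)
hyperparams = {
    "max_depth": 5,                  # maximum depth of each tree
    "eta": 0.2,                      # learning rate (shrinkage per round)
    "gamma": 4,                      # minimum loss reduction needed to split
    "min_child_weight": 6,           # minimum instance weight in a leaf
    "subsample": 0.8,                # fraction of rows sampled per tree
    "objective": "binary:logistic",  # output probabilities for a binary target
    "num_round": 100,                # number of boosting rounds
}
print(sorted(hyperparams))
```

Lower `eta` with more rounds generally trades training time for accuracy; `gamma`, `min_child_weight`, and `subsample` all act as regularizers against overfitting.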
We’ll create another code cell and copy in the code that starts the training.
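The training call takes a mapping of channel names to the S3 locations holding the data. A hypothetical sketch of how those channels are typically constructed (the helper, bucket, and prefix names here are made up for illustration):

```python
# Hypothetical helper: map SageMaker channel names to the S3 folders
# that hold the uploaded CSV files (bucket and prefix are made up).
def build_channels(bucket: str, prefix: str) -> dict:
    return {
        "train": f"s3://{bucket}/{prefix}/train",
        "validation": f"s3://{bucket}/{prefix}/validation",
    }

channels = build_channels("my-sagemaker-bucket", "xgboost-demo")
print(channels["train"])
# In the notebook, training then starts with a call like: xgb.fit(channels)
```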
We now have a trained XGBoost model, ready to deploy to production. We’ll create another code cell and use this code to create a SageMaker endpoint running our newly trained model:
xgb_predictor = xgb.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
Finally, we’ll create one more code cell that uses the XGBoost model to predict whether the customers listed in the CSV file will buy a bank product.
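With the `binary:logistic` objective, the endpoint returns a comma-separated string of probabilities. A local sketch of the post-processing, using a hypothetical response body (in the notebook, the string comes back from `xgb_predictor.predict()` on the test data):

```python
import numpy as np

# Hypothetical response body; in the notebook this string comes back
# from the SageMaker endpoint.
response = "0.12,0.91,0.34,0.78"

# Parse the comma-separated probabilities and threshold at 0.5:
# 1 means the model predicts the customer will buy the product.
probs = np.array(response.split(","), dtype=float)
will_buy = (probs > 0.5).astype(int)
print(will_buy.tolist())  # [0, 1, 0, 1]
```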
That’s it! We launched a SageMaker instance, loaded training data, trained a model and deployed it to production.
When using AWS SageMaker, your organization might run a large number of machine learning experiments requiring massive amounts of computing resources. Run:AI automates resource management and orchestration for machine learning infrastructure. With Run:AI, you can automatically run as many compute-intensive experiments and inference workloads as needed.
When using Run:AI, you gain capabilities such as automated resource management, workload orchestration, and GPU virtualization.
Run:AI simplifies machine learning infrastructure pipelines, helping data scientists accelerate their productivity and the quality of their models.
Learn more about the Run.ai GPU virtualization platform.