Amazon SageMaker and Kubernetes - AWS
Post on 12-Dec-2021
© 2020 Amazon Web Services, Inc. or its affiliates. All rights reserved.
SageMaker Operators and Components Overview
Alex Chung, Senior Product Manager
Hallie Crosby, Service Solutions Architect
Amazon SageMaker and Kubernetes
AMAZON SAGEMAKER
KUBERNETES ECOSYSTEM
SCALING
Kubernetes Amazon SageMaker
Agenda
Overview of Amazon SageMaker
Adopting SageMaker
Overview of open source routes to SageMaker
Scaling ML with SageMaker
Resources to get started
The AWS ML stack
Broadest and most complete set of Machine Learning capabilities
AI SERVICES
VISION, SPEECH, TEXT, SEARCH, CHATBOTS, PERSONALIZATION, FORECASTING, FRAUD, DEVELOPMENT, CONTACT CENTERS
Amazon Rekognition, Amazon Polly, Amazon Transcribe (+Medical), Amazon Comprehend (+Medical), Amazon Translate, Amazon Lex, Amazon Personalize, Amazon Forecast, Amazon Fraud Detector, Amazon CodeGuru, Amazon Textract, Amazon Kendra, Contact Lens for Amazon Connect
ML SERVICES
Amazon SageMaker: SageMaker Studio IDE, Ground Truth, AWS Marketplace for ML, Neo, Augmented AI, Built-in algorithms, Notebooks, Experiments, Processing, Model training & tuning, Debugger, Autopilot, Model hosting, Model Monitor
ML FRAMEWORKS & INFRASTRUCTURE
Deep Learning AMIs & Containers, GPUs & CPUs, Elastic Inference, Inferentia, FPGA, Deep Graph Library, scikit
The machine learning workflow is iterative and complex
Collect and prepare training data
Choose or bring your own ML algorithm
Set up and manage environments for training
Train, debug, and tune models
Manage training runs
Deploy model in production
Monitor models
Validate predictions
Scale and manage the production environment
Use Amazon SageMaker to train and deploy models into production
Collect and prepare training data: Fully managed data processing jobs/data labeling workflows
Choose or bring your own ML algorithm: Collaborative notebooks, built-in algorithms/models
Set up and manage environments for training: One-click training
Train, debug, and tune models: Debugging and optimization
Manage training runs: Visually track and compare experiments
Deploy model in production: One-click deployment and auto-scaling
Monitor models: Automatically spot concept drift
Validate predictions: Add human review of predictions
Scale and manage the production environment: Fully managed with auto-scaling for 75% less
WEB-BASED IDE FOR ML
ML OPS LIFECYCLE
Experimentation, model development, and BYOS (bring your own script)
Use script mode to quickly build containers that can be deployed in production (or bring your own container)
Use SageMaker Hosting to deploy models
Experiment using SageMaker Studio and Experiments Manager
Iterate on new models when business use case changes
Amazon SageMaker Studio
Fully integrated development environment (IDE) for Machine Learning
Collaboration at scale: Without tracking code dependencies
Easy experiment management: Organize, track, and compare thousands of experiments
Automatic model generation: Full visibility and control without writing code
Higher quality ML models: Automatically debug errors, monitor models, and maintain high quality
Increased productivity: Code, build, train, deploy, and monitor in a unified visual interface
Amazon SageMaker Experiments
Organize, track, and compare training experiments
Tracking at scale: Track parameters and metrics across experiments and users
Custom organization: Organize experiments by teams, goals, and hypotheses
Visualization: Easily visualize experiments and compare
Metrics and logging: Log custom metrics using the Python SDK and APIs
Fast iteration: Quickly go back and forth, and maintain high-quality
Use Amazon SageMaker Experiments to track and manage thousands of experiments
Pain points of self-managed ML Platforms
The following are all obstacles to the core goal of building best-in-class models that solve business problems
Configuring proper scaling of compute has a learning curve
Right sizing instances for cost-efficiency is hard
Kubeflow needs additional configuration to use GPU or CPU nodes optimally
Libraries and toolkits need to be regularly updated, which increases technical debt that later needs to be paid off
Setting up k8s without prior experience is challenging
Additional management burden for the ops team
EXAMPLE CHALLENGE: Scaling Machine Learning
Single-instance compute (CPU or GPU)
Scaling to multiple instances to maximize performance
[Diagram: a CLI driving a single EC2 instance, contrasted with a CLI driving a cluster of instances]
SageMaker provides the building blocks for scalable machine learning
Includes over a dozen first-party algorithms, such as XGBoost
Convert existing containers with minimal changes to run in SageMaker
Deep Learning Containers provide the base layer of Apache MXNet, PyTorch, and TensorFlow frameworks
Integrated Debugger
Ground Truth data labelling
Compute is entirely managed by SageMaker. You specify parameters, instances, etc., and SageMaker makes it happen
One field change for Spot Instances
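As a rough illustration of what "you specify parameters, instances, etc." can look like, the sketch below assembles a request in the shape of the SageMaker CreateTrainingJob API. The function name build_training_request, the default instance type, the volume size, and the "training" channel name are assumptions made for this example and do not come from the slides. Note that the sketch sets a wait budget alongside the Spot flag, so in practice the "one field" may come with a second one.

```python
from typing import Any, Dict


def build_training_request(
    job_name: str,
    image_uri: str,
    role_arn: str,
    train_s3_uri: str,
    output_s3_uri: str,
    instance_type: str = "ml.m5.xlarge",  # assumed default, pick what fits your job
    instance_count: int = 1,
    max_runtime_s: int = 3600,
    use_spot: bool = False,
) -> Dict[str, Any]:
    """Build a dict shaped like a CreateTrainingJob request (hypothetical helper)."""
    request: Dict[str, Any] = {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "training",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 30,  # assumed size for illustration
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_s},
    }
    if use_spot:
        # The Spot switch itself is a single boolean field.
        request["EnableManagedSpotTraining"] = True
        # The wait budget must be at least the runtime limit; doubling is an arbitrary choice.
        request["StoppingCondition"]["MaxWaitTimeInSeconds"] = 2 * max_runtime_s
    return request


# Placeholder values; none of these resources exist.
spot_request = build_training_request(
    job_name="example-job",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/example:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
    use_spot=True,
)
```

Passing the result to boto3.client("sagemaker").create_training_job(**spot_request) would submit the job; that call is left out so the sketch runs without AWS credentials.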
Common IT constraints where SageMaker can still provide managed ML services
Hybrid cloud mandates
Portability requirements of application stack
Prior technology investments such as DIY ML platforms
On-premise data restrictions
Scaling ML with Amazon SageMaker from Kubernetes
1. Amazon SageMaker Components for Kubeflow Pipelines
2. Amazon SageMaker Operators for Kubernetes
ARCHITECTURE: Kubernetes
[Diagram: a developer submits YAML with kubectl to the master (API Server, Scheduler, Controller Manager, etcd); each of three worker nodes runs a kubelet, kube-proxy, and container runtime, and hosts pods]
Kubeflow Pipelines
End-to-end ML workflow orchestration
Experimentation and managing various trials/experiments
Reusable components and pipelines to create end-to-end solutions without having to rebuild each time
SageMaker Components for Kubeflow Pipelines help modularize your code
For each step, you can develop code, package it into a container (or let SageMaker package it for you), and make it a default run that anyone within your company can mix and match
Batch transform
Training
Model deployment and updates
Ground Truth data labelling
Processing
SageMaker + Kubeflow for Machine Learning
Amazon SageMaker
Model development
Model training
Model deployment
Data preparation
Portability story and SageMaker (BYOC/BYOS)
Code and containers that run in SageMaker can run anywhere
Models developed in Kubeflow can be submitted to SageMaker for managed execution
Using open-source KFP components and Kubernetes operators, you can swap back to Kubernetes at any time
BYOC can run in a standard K8s environment
[Diagram: model code goes to Amazon S3 and then to Amazon SageMaker; a BYOC container runs on Amazon Elastic Kubernetes Service]
Code can run in any generic container that you build yourself
BYOS is model code uploaded to S3 that then gets ingested by SageMaker
BYOC container built for SageMaker can be run in Kubernetes without SageMaker
Amazon SageMaker accessible from Kubernetes
Kubeflow: Hybrid infrastructures, Portability, Composability
Amazon SageMaker: Fully-managed infrastructure
Ground Truth labeling
Automatic model tuning
Built-in optimized algorithms
Managed Spot Training
Scalable inference endpoints
Model monitoring
Easy scalability
SageMaker Components vs. SageMaker Operators
Aspect | SageMaker Components | SageMaker Operators
ARCHITECTURE | Kubeflow Pipelines components | Kubernetes Operator custom resources
KUBERNETES | Yes | Yes
ORCHESTRATION | Self-hosted Kubeflow Pipelines | Kubernetes tools (e.g., Flyte, Argo)
DEV INTERFACE | Python | YAML/custom extension by customer
GUI | KFP dashboard | None/custom
EASE OF USE | Medium | Advanced
RESOURCES
Getting started
CloudFormation Quick Start
SageMaker Components for Kubeflow Pipelines
SageMaker Operators for Kubernetes
Multiple ways to get started
Open-source standard APIs: K8s operators, KFP components
Quickstart template
Examples in GitHub repo
Amazon SageMaker Components for Kubeflow Pipelines
[Diagram: a Kubeflow pipeline whose steps mix Amazon SageMaker components with other components. Each component consists of metadata, input/output, and an implementation (container) pulled from Amazon EC2 Container registry.]
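The three parts of a component named on this slide (metadata, input/output, and a container implementation) can be sketched as a small data structure. The field layout below follows the Kubeflow Pipelines v1 component specification; the ComponentSpec class, the image URI, the command, and the input and output names are all hypothetical and chosen only for illustration. This is not the definition of any actual SageMaker component.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ComponentSpec:
    """Hypothetical helper mirroring the three parts of a pipeline component."""

    name: str                                          # metadata
    description: str                                   # metadata
    image: str                                         # implementation (container)
    inputs: List[str] = field(default_factory=list)    # input/output
    outputs: List[str] = field(default_factory=list)   # input/output

    def to_dict(self) -> Dict[str, Any]:
        """Render the component in a component.yaml-like layout."""
        args: List[Any] = []
        for input_name in self.inputs:
            # Each input is passed to the container as a command-line value.
            args += [f"--{input_name}", {"inputValue": input_name}]
        for output_name in self.outputs:
            # Each output is written by the container to a path chosen by the pipeline.
            args += [f"--{output_name}_output_path", {"outputPath": output_name}]
        return {
            "name": self.name,
            "description": self.description,
            "inputs": [{"name": n, "type": "String"} for n in self.inputs],
            "outputs": [{"name": n} for n in self.outputs],
            "implementation": {
                "container": {
                    "image": self.image,
                    "command": ["python3", "component.py"],  # hypothetical entry point
                    "args": args,
                }
            },
        }


# Placeholder values; the image does not exist.
spec = ComponentSpec(
    name="Example training step",
    description="Submits a training job (illustrative only)",
    image="123456789012.dkr.ecr.us-east-1.amazonaws.com/example-component:latest",
    inputs=["region", "image", "instance_type"],
    outputs=["model_artifact_url"],
).to_dict()
```

Keeping the interface (inputs and outputs) separate from the implementation (container image) is what lets a pipeline author reuse a step without knowing how it is built.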
Adapt your Container for SageMaker training
1. Switch to a SageMaker-maintained Deep Learning Container as a base, or pip install sagemaker-training
    FROM tensorflow/tensorflow:2.2.0rc2-gpu-py3-jupyter
2. Place training code in the /opt/ml directory
    COPY train.py /opt/ml/code/train.py
3. Define train.py as the script entrypoint
    ENV SAGEMAKER_PROGRAM train.py
https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-training-container.html
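For context, here is a minimal sketch of what the train.py referenced in the steps above might contain. It assumes the sagemaker-training toolkit passes hyperparameters as command-line arguments and exposes locations through the SM_MODEL_DIR and SM_CHANNEL_TRAINING environment variables (the latter for a channel named "training"). The hyperparameter names and the placeholder "training" step are invented for illustration; a real script would load data and fit a model.

```python
import argparse
import json
import os
from pathlib import Path


def parse_args(argv=None):
    """Read hyperparameters and SageMaker-provided paths, with local fallbacks."""
    parser = argparse.ArgumentParser()
    # Hypothetical hyperparameters for this sketch.
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--learning-rate", type=float, default=0.01)
    # Paths come from environment variables inside the training container.
    parser.add_argument(
        "--model-dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    )
    parser.add_argument(
        "--train-dir",
        default=os.environ.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training"),
    )
    return parser.parse_args(argv)


def train(args):
    """Placeholder training step: records the settings as the model artifact."""
    model_dir = Path(args.model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    artifact = model_dir / "model.json"
    artifact.write_text(
        json.dumps({"epochs": args.epochs, "learning_rate": args.learning_rate})
    )
    return artifact


# As a container entry point, the script would end with: train(parse_args())
```

Files written under the model directory are what the service collects as the job's output, which is why the sketch writes its artifact there and nowhere else.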
ARCHITECTURE: Operator
[Diagram: a developer submits YAML with kubectl to the master (API Server, Scheduler, etcd); the Operator runs on a worker node alongside the kubelet, kube-proxy, container runtime, and pods]
SageMaker Operators
KEY FEATURES
Amazon SageMaker Operators for training, tuning, inference
Natively interact with Amazon SageMaker jobs using Kubernetes tools (e.g., get pods, describe)
Stream and view logs from Amazon SageMaker in Kubernetes
Helm Charts to assist with setup and spec creation
[Diagram: kubectl apply sends YAML to the Kubernetes API Server; the Amazon SageMaker Operator relays the request to Amazon SageMaker]
Amazon SageMaker Operators for Kubernetes
Train, tune, and deploy models in Amazon SageMaker without leaving your Kubernetes environment
Use the Kubernetes kubectl CLI to submit Amazon SageMaker jobs:
• Training jobs
• Hyperparameter tuning jobs
• Hosting deployments
• Batch transform jobs
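To make the kubectl workflow more concrete, the sketch below generates a training job manifest as JSON, which kubectl accepts in place of YAML. The apiVersion, kind, and spec field names reflect the operator's v1 TrainingJob resource as best recalled here and should be checked against the custom resource definition installed in your cluster. The helper name, account ID, bucket, role, and image are placeholders.

```python
import json
from typing import Any, Dict


def training_job_manifest(
    name: str,
    image_uri: str,
    role_arn: str,
    region: str,
    train_s3_uri: str,
    output_s3_uri: str,
) -> Dict[str, Any]:
    """Build a TrainingJob custom resource as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "sagemaker.aws.amazon.com/v1",
        "kind": "TrainingJob",
        "metadata": {"name": name},
        "spec": {
            "roleArn": role_arn,
            "region": region,
            "algorithmSpecification": {
                "trainingImage": image_uri,
                "trainingInputMode": "File",
            },
            "inputDataConfig": [
                {
                    "channelName": "train",
                    "dataSource": {
                        "s3DataSource": {
                            "s3DataType": "S3Prefix",
                            "s3Uri": train_s3_uri,
                            "s3DataDistributionType": "FullyReplicated",
                        }
                    },
                }
            ],
            "outputDataConfig": {"s3OutputPath": output_s3_uri},
            # Instance settings below are assumed values for illustration.
            "resourceConfig": {
                "instanceCount": 1,
                "instanceType": "ml.m5.xlarge",
                "volumeSizeInGB": 30,
            },
            "stoppingCondition": {"maxRuntimeInSeconds": 3600},
        },
    }


# Placeholder values; none of these resources exist.
manifest = training_job_manifest(
    name="example-training-job",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/example:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    region="us-east-1",
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
)
print(json.dumps(manifest, indent=2))
```

Saved as a script, the output could be piped to kubectl apply -f - to create the resource, after which the usual kubectl get and describe commands apply to it.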
Summary
SageMaker provides a fully managed service for authoring ML (Studio, Notebooks) and compute infrastructure
Companies with hybrid (on-premises and cloud) use cases can use SageMaker with existing Kubernetes tools
Customers may have existing infrastructure on Kubernetes
Enterprises may want to adopt an open-source ML platform or be looking for a multi-cloud strategy
ADDITIONAL RESOURCES
ONLINE WORKSHOP
https://eksworkshop.com/advanced/420_kubeflow/pipelines/
DOCUMENTATION
https://www.kubeflow.org/docs/aws/
BLOGS
https://towardsdatascience.com/kubernetes-and-amazon-sagemaker-for-machine-learning-distributed-training-hyperparameter-tuning-187c821e25b4
https://towardsdatascience.com/kubernetes-and-amazon-sagemaker-for-machine-learning-best-of-both-worlds-part-1-37580689a92f
Train and deploy a Detectron2 object detection model using Amazon SageMaker Components
• Used a Mask R-CNN model from the Detectron2 model zoo, trained on the COCO 2017 dataset.
• Further fine-tuned this model on a custom dataset with aerial imagery.
• Drone images from TU Graz.
• Goal: Detect people from a high vantage point.
• Code to Reproduce: https://github.com/HallieCrosby/detectron2/
Amazon SageMaker Components for Kubeflow Pipelines
[Diagram: a Kubeflow pipeline with three steps: Training Job, Create Model, and Deploy Model. Each step is a component made up of metadata, input/output, and an implementation (container) pulled from a container registry, and each calls SageMaker.]
Thank you!
What is Kubeflow?
Kubeflow is a machine learning toolkit for Kubernetes
[Diagram: layered stack of INFRASTRUCTURE (cloud/on-prem), KUBERNETES, KUBEFLOW, and ML WORKLOADS (modelling, training, tuning, serving…)]
AWS ML infrastructure and services
ML services: Fully-managed service that covers the entire machine learning workflow
Amazon SageMaker: Jupyter notebook instances, High performance algorithms, Large-scale training, Optimization, One-click deployment, Fully managed with auto-scaling
Image registry: Container image repository
Amazon Elastic Container Registry (Amazon ECR)
Management: Deployment, scheduling, scaling, and management of containerized applications
Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS)
Compute: Where the containers run
Amazon EC2