michelle casbon yow! brisbane december 4, 2018 kubeflow ... · run wherever k8s runs use k8s to...

40
Kubeflow Explained: NLP Architectures on Kubernetes Michelle Casbon YOW! Brisbane December 4, 2018

Upload: others

Post on 09-Jul-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

Kubeflow Explained: NLP Architectureson Kubernetes

Michelle CasbonYOW!BrisbaneDecember 4, 2018

Page 2: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

whoami

Page 3: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

Agenda

Problems Goals What's inside Demo Future Direction

1 2 3 4 5

Page 4: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

Move along

Is this a clearly defined

problem?

Can it be solved in a

deterministic way?

Do that

Dive in

No

No

Yes

YesML decision tree

Credit: David Andrzejewski

Page 5: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

Counting things is still really hard.

MACHINELEARNING

Page 6: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

https://github.com/kubeflow/examples/demos

Page 7: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

A curated set of compatible tools and artifacts that lays a foundation for

running production ML appsEnables consistency across deployments by providing Kubernetes object

templates that bring together disparate components

Page 8: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

Infrastructure

Application

Platform

GCP

Yelp Sentiment

Kubeflow

GCP

Sentiment

Kubeflow

Page 9: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

2

Agenda

Problems Goals What's inside Demo Future Direction

1 3 4 5

Page 10: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

Production code

Page 11: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

Moving from local to production

Credit: Jörg Wagner and Stefan Prehn

PortabilityPackage infrastructure components together

GCP

Sentiment

Kubeflow

Page 12: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in
Page 13: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in
Page 14: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

ComplexityGCP

Sentiment

Kubeflow

Page 15: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

Perception

Credit: Hidden Technical Debt of Machine Learning Systems, D. Sculley, et al.

GCP

Sentiment

Kubeflow

Page 16: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

Reality

Credit: Hidden Technical Debt of Machine Learning Systems, D. Sculley, et al.

GCP

Sentiment

Kubeflow

Page 17: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

GCP

Sentiment

Kubeflow

Feature ExtractionData Ingestion

Data Exploration

Data Transformation

Data Validation

Data Analysis

Training Data Segmentation

Model Building

Model Validation

Model Versioning

Model Auditing

Distributed Training

Continuous Training

Process Management

Configuration

Resource Management

Monitoring

Logging

Continuous Delivery

Authentication/ Authorization

Serving Infrastructure

UI

Business Logic

Load Balancing

Data Featurization Training Application Platform

Page 18: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

Complexity

ComposabilityLogical groupings

Reusable components

GCP

Sentiment

Kubeflow

Page 19: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

Maintainability

● Error resolution, recovery, & prevention● Speed of iteration● Versioning Composability

Shorten the development lifecycle

Automation

GCP

Sentiment

Kubeflow

Page 20: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

Capacity Planning

ScalabilityKubernetes

Autoprovisioning

● Usage patterns● Demand spikes● Efficient resource usage

GCP

Sentiment

Kubeflow

Page 21: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

1 2

Agenda

3 4 5

Problems Goals What's inside Demo Future Direction

Page 22: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

Make it easy for everyone to develop, deploy, and manage portable,

scalable ML everywhere

Page 23: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

Kubeflow

Composability

Single, unified tool for common processes

Portability

Entire stack

Scalability

Native to k8s

Reduce variability between services & environments

Full product lifecycle

Support specialized hardware, like GPUs & TPUs

Reduce costs

Improve model performance

GCP

Sentiment

Kubeflow

Page 24: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

KubeflowWho

Data scientists

ML researchers

Software engineers

Product managers

Why

Because building a platform is too big of a problem to tackle alone

What

Portable ML products on k8s

v0.3.4 release

https://github.com/kubeflow/kubeflow

GCP

Sentiment

Kubeflow

Page 25: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

KubeflowKubernetes-native platform for ML

Run wherever k8s runs

Use k8s to manage ML tasks

CRDs for distributed training

Adopt k8s patterns

Microservices

Manage infra declaratively

Package infrastructure components together

Ksonnet

Move between local -> dev -> test -> prod -> onprem

Support multiple ML frameworks

Tensorflow

Pytorch

Scikit

Xgboost

Et al.

GCP

Sentiment

Kubeflow

Page 26: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

2

Agenda

1 3 4 5

Problems Goals What's inside Demo Future Direction

Page 27: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

But what is it?

Page 28: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

Page 29: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

A curated set of compatible tools and artifacts that lays a foundation for

running production ML appsEnables consistency across deployments by providing Kubernetes object

templates that bring together disparate components

Page 30: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

GKEIngress

(e.g. Ambassador)

Pipelines Controllers

ArgoControllers

Katib HP Tuning Controllers

IAP

Central Dashboard

JupyterHub TF Job Dashboard

TF JobOperator

PipelinesDashboard

ArgoDashboard

Katib HP TuningDashboard

Pytorch Operator

What's Inside v0.3?GCP

Sentiment

Kubeflow

Page 31: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

Click-to-deploy

GCP

Sentiment

Kubeflow

Page 32: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

What's new in v0.3?● Deploy

○ Click-to-deploy

○ CLI (kfctl)

● Develop

○ Argo

○ Pytorch operator

○ Hyperparameter tuning StudyJob CRD

○ Kubeflow Pipelines

● Autoprovisioning

GCP

Sentiment

Kubeflow

Page 33: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

Pipelines● End-to-end ML workflows● Orchestration● Service integration● Components & sharing● Job tracking, experimentation,

monitoring● Notebook integration

GCP

Sentiment

Kubeflow

Page 34: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

2

Agenda

1 3 4 5

Problems Goals What's inside Demo Future Direction

Page 35: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

https://github.com/kubeflow/examples/demos

GKEIngress

(e.g. Ambassador)

Pipelines Controllers

ArgoControllers

Katib HP Tuning Controllers

IAP

Central Dashboard

JupyterHub TF Job Dashboard

TF JobOperator

PipelinesDashboard

ArgoDashboard

Katib HP TuningDashboard

Pytorch Operator

TF Serving

Application UI

Notebook

TF Job

Parameter Server1

TensorFlow Master

TensorFlow

Workers1 2 3Pipeline

StudyJob

GCP

Sentiment

Kubeflow

Page 36: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

Try it Yourself● deploy.kubeflow.cloud (Click-to-deploy)● codelabs.developers.google.com

○ Intro to Kubeflow on Google Kubernetes Engine: https://goo.gl/192bs7○ Kubeflow End-to-End: GitHub Issue Summarization: https://goo.gl/qLXUTG

● Qwiklabs: https://qwiklabs.com● Katacoda: https://www.katacoda.com/kubeflow● GitHub: https://github.com/kubeflow/examples/tree/master/github_issue_summarization● http://gh-demo.kubeflow.org

GCP

Sentiment

Kubeflow

Page 37: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

2

Agenda

1 3 4 5

Problems Goals What's inside Demo Future Direction

Page 38: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

Roadmap● 0.4 in December

● Enterprise readiness

○ 1.0 with hardened APIs

○ IAM/RBAC

○ Clean deployments, upgrades

● Better Jupyter Notebook integration

● Pipeline experiment comparison

● Model management

● Test release infrastructure

● You tell us! (Or better yet, help!)

GCP

Sentiment

Kubeflow

Page 39: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

Just the Beginning● Kubeflow is open

○ Open community○ Open design○ Open source○ Open to ideas

● You tell us! Get involved○ github.com/kubeflow○ kubeflow.slack.com○ @kubeflow○ [email protected]○ Community call Tuesdays alternating

8:30am and 5:30pm Pacific○ Kubeflow Contributor Summit

■ Q1 2019

GCP

Sentiment

Kubeflow

Page 40: Michelle Casbon YOW! Brisbane December 4, 2018 Kubeflow ... · Run wherever k8s runs Use k8s to manage ML tasks CRDs for distributed training Adopt k8s patterns ... What's new in

@texasmichelle

Questions?