managing ml models @ scale - usenix · 2020. 7. 18. · •any customer data that is used in the ml...

28
Srivathsan Canchi Tobias Wenzel Managing ML Models @ Scale Intuit’s ML Platform

Upload: others

Post on 14-May-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Srivathsan CanchiTobias Wenzel

Managing ML Models @ ScaleIntuit’s ML Platform

Page 2: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

ConsumersSmall businesses

Self-employed

Who we serve

Page 3: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

ML in Action

Page 4: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Forecasting cash flow

in QuickBooks

ML driven experiences

Automatic categorizationin Mint

Self-helpin TurboTax

Page 5: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Managing Models in Production is hard

Page 6: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 6

We spend more time bringing the model to production than developing and training it

— Data Scientists, 2018

Page 7: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 7

Data Science

ML Operationalization

Infrastructure Management

Model Development

Feature Engineering

Model Metadata

Orchestration

Deployment

Cloud Infrastructure

Data Processing

Security

Feature Store

“Hidden Debt” of ML

https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf

Page 8: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 8

Data Science

ML Operationalization

Infrastructure Management

Model Development

Feature Engineering

Model Metadata

Orchestration

Deployment

Cloud Infrastructure

Data Processing

Security

ML Practitioners

ML Platform

Feature Store

How can ML Practitioners focus on their craft?

Page 9: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 9

Collaboration SecuritySelf-Service

Principles of Intuit ML Platform

Page 10: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 10

Data Scientist

“I can quickly train and deploy models that improve the customer experience”

Product Manager

“I can easily run experiments that rely on models to get a customer benefit”

Marketer

“I can easily target particular users

that have a certain characteristics”

ML Engineer

“I can quickly iterate and make models performant”

Data Analyst

“I can quickly train and deploy models

that inform stakeholders how

the business is doing”

Powering insights and experiences with ML

Page 11: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 11

Explore Train Infer

(Online and Batch)

Model Efficacy

Feedback

Inge

st a

nd C

urat

e D

ata

Model Lifecycle Management

Data Processing Infrastructure

Orchestration Tools

Data and Model Metadata

Feature Processing

Feature Store

Online &Offline

Evaluate

Generic Model Lifecycle

Page 12: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 12

Explore Train Infer

(Online and Batch)

Model Efficacy

Feedback

Ing

est a

nd C

urat

e D

ata

Model Lifecycle Management

Data Processing Infrastructure

Orchestration Tools

Data and Model Metadata

Feature Processing

Feature Store

Online &Offline

Evaluate

Intuit’s ML Platform Today

Page 13: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 13

Technologies in use

AWS SageMaker Kubernetes Argo

Page 14: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 14

Feature Store

Feature Store

Online(DynamoDB)

Offline(S3)

Ingestion

● Store Features in offline (durable) and online (transactional) stores● Metadata about Features● Find and re-use Features● Feature access during model inference and model training

Feature Metadata

Feature Processors

Model Training

Model Execution (Online, Batch)

APIs (Feature Management)

Feature Exploration(via Exploration tools)

SDKs(Ingestion, Consumption)

Page 15: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 15

Batch

Streaming

Feature Processing

Beam Feature Processor

Online(DynamoDB)

Offline(S3)

Kafka

Kafka

Data LakeSparkFeature Processor

Page 16: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 16

Model Training

Page 17: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 17

Model Training

Page 18: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 18

Self Service Interfaces

non-prod environments for

customersStatistics about

productivity increases

Page 19: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 19

Automate the simple things

Single-click Deploy

Model’s Runtime Status

Live Metrics, Alerts

Versioning

Running Performance Tests

Page 20: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 20

Platform success = Lots of experiments

Cost Transparency• Upfront pricing

• Cost dashboards

Cost Assignment• Tagging

• Business Unit assignments

– Platform ensures correct BU chargebacks

With great speed, comes costVisibility and Self-service

Page 21: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 21

Cost transparency

Page 22: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 22

Cost transparency

Page 23: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 23https://www.bleepingcomputer.com/news/security/malicious-python-package-available-in-pypi-repo-for-a-year/

What can go wrong?

Page 24: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 24

Security

ML Model Vulnerabilities• ML models vulnerable for attacks

• Need security gates at every stage of development

Quality Gates• Source image validations

• Standardized framework for container images

• Image hardening, certification and signing

ML Models are as vulnerable as any other piece of software

Security Gates in our AWS setup• Secure VPC endpoints

• VPC flow logs - monitor traffic

• Security groups - restrict outbound access

• KMS encryption - secure data at rest

• Least privilege IAM roles - secure management of the service

Security check and Certification

Page 25: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 25

Compliance

Complex regulations

• The well known

– CCPA, GDPR

• More common

– PCI

• The arcane (very specific to tax compliance)

– INDOR, NIST

Platform enforces compliance

• Any customer data that is used in the ML model lifecycle is managed centrally

• Achieving compliance for all models reduces complexity in the process for applications using them

• Automating the necessary controlling actions for the platform ensures that future models remain compliant

Page 26: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 26

Models as Software- Accelerated deployment

through self service- Using a GitOps approach

allows for declarative development of models

- Production monitoring and alerting

- Security, compliance built-in

Learnings

Automated Workflows- Workflows for standard ML

operations help with complexity

- Hosting a workflow engine allows for easy extensibility

- Small threshold for customers starting out with the platform through templates with short configuration

Curated Feature Store

- Model quality ∝ Feature quality

- Curated feature store offers library of meaningful features

- Search, explore, share features across models enables acceleration of model development

Page 27: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 27

In Future...

● Custom ML Operator to orchestrate and manage ML Resources (Spark, SageMaker Training/Deploy, Feature Store)

● Declarative Management of MDLC such as re-training of models, monitoring of features etc.

● Managed Notebook Service

Page 28: Managing ML Models @ Scale - USENIX · 2020. 7. 18. · •Any customer data that is used in the ML model lifecycle is managed centrally •Achieving compliance for all models reduces

Intuit Confidential and Proprietary 28

Contact Us

[email protected][email protected]

Tobias Wenzel Srivathsan Canchi