Machine Learning Success: The Key to Easier Model Management

© 2017 MapR Technologies

Upload: mapr-data-technologies

Post on 22-Jan-2018

Contact Information

Ellen Friedman, PhD

Principal Technologist, MapR Technologies

Committer, Apache Drill & Apache Mahout projects

O’Reilly author

Twitter @Ellen_Friedman

Machine Learning Everywhere

Image courtesy Mtell, used with permission. Images © Ellen Friedman.

Traditional View

Traditional View: This isn’t the whole story

90% of the effort in successful machine learning isn’t the algorithm or the model…

It’s the logistics

Why?

• Just getting the training data is hard

– Which data? How to make it accessible? Multiple sources!

– New kinds of observations force restarts

– Requires a ton of domain knowledge

• The myth of the unitary model

– You can’t train just one

– You will have dozens of models, likely hundreds or more

– Handoff to new versions is tricky

What Machine Learning Tool is Best?

• Most successful groups keep several “favorite” machine learning tools at hand

– No single tool is best in every situation

• The most important tool is a platform that supports logistics well

– You don’t have to do everything at the application level

– Lots of what matters can be handled at the platform level

• A good design can make a big difference

Rendezvous Architecture

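[Figure: rendezvous architecture. A request enters the Input stream and is scored by Model 1, Model 2, and Model 3, which write to the Scores stream; the Rendezvous component returns the Results as the response.]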
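A minimal sketch of the rendezvous idea, written as plain in-process Python rather than as stream consumers: every model scores every request, and the rendezvous step returns the score of the most preferred model that answered within a deadline. The class names, the preference list, and the latency-based deadline are assumptions for illustration, not MapR's or the authors' implementation.

```python
# Minimal in-process sketch of the rendezvous idea (all names hypothetical).
# Every model scores every request; the rendezvous step returns the score of
# the most preferred model that answered within the deadline.
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Score:
    model: str          # which model produced the score
    value: float        # the model's output
    latency_ms: float   # how long the model took to answer


@dataclass
class Rendezvous:
    preference: List[str]   # models whose results may be returned, best first
    deadline_ms: float      # answers slower than this are not used
    scores: Dict[str, Dict[str, Score]] = field(default_factory=dict)

    def record(self, request_id: str, score: Score) -> None:
        """Collect one score, as if read from the scores stream."""
        self.scores.setdefault(request_id, {})[score.model] = score

    def decide(self, request_id: str) -> Optional[Score]:
        """Return the best-ranked score that met the deadline, or None."""
        arrived = self.scores.get(request_id, {})
        for model in self.preference:
            score = arrived.get(model)
            if score is not None and score.latency_ms <= self.deadline_ms:
                return score
        return None


r = Rendezvous(preference=["model-2", "model-1"], deadline_ms=50)
r.record("req-1", Score("model-1", 0.91, 12))
r.record("req-1", Score("model-2", 0.87, 80))  # preferred, but misses the deadline
r.record("req-1", Score("model-3", 0.40, 5))   # scored, but currently ignored
chosen = r.decide("req-1")                     # falls back to model-1
```

Model 3 still scores every request, so its output can be evaluated while it is being ignored; in this sketch, handing off to it amounts to adding it to `preference`.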

Rendezvous to the Rescue: Better ML Logistics

• Stream-1st architecture is a powerful approach with surprisingly widespread advantages

– Innovative technologies are emerging for streaming data

• Microservices approach provides flexibility

– Streaming supports microservices (if done right)

• Containers remove surprises

– Predictable environment for running models

Rendezvous: Mainly for Decisioning-Type Systems

• Decisioning-style machine learning

– Looking for a “right answer”

– Simpler than interactive machine learning (such as in a self-driving car)

• Examples include:

– Fraud detection

– Predictive analytics / market prediction

– Churn prediction (as in telecommunications)

– Yield optimization

– Deep learning in the form of speech or image recognition, in some cases

Why Stream?

Munich surfing wave. Image © 2017 Ellen Friedman

Streaming data has value beyond real-time insights

Heart of Stream-1st Architecture: Message Transport

[Figure: a message transport connecting components labeled EMR, Patient, Facilities management, Medical tests, Medical test results, Real-time analytics, and Insurance audit, grouped into use-case classes A, B, and C.]

The right messaging tool supports multiple classes of use cases (A, B, C in figure)

Image © 2016 Ted Dunning & Ellen Friedman from Chap 1 of O’Reilly book Streaming Architecture, used with permission

Stream Transport that Decouples Producers & Consumers

[Figure: producers (P) send messages through a transport layer, Kafka / MapR Streams, to consumers (C) that do the processing.]
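To make the decoupling concrete, here is a toy in-memory append-only log. It is not the Kafka or MapR Streams client API; the `Stream` class and its methods are hypothetical. Producers only append, and each consumer tracks its own read position, so a slow or late consumer neither blocks the producers nor affects other consumers.

```python
# Toy stand-in for a Kafka / MapR Streams topic (hypothetical class, not a real client API).
from typing import Dict, List, Tuple


class Stream:
    def __init__(self) -> None:
        self._log: List[Tuple[str, str]] = []   # (producer, message), in arrival order
        self._offsets: Dict[str, int] = {}      # consumer name -> next position to read

    def send(self, producer: str, message: str) -> int:
        """Append a message; producers never wait for, or know about, consumers."""
        self._log.append((producer, message))
        return len(self._log) - 1

    def poll(self, consumer: str, max_messages: int = 10) -> List[Tuple[str, str]]:
        """Read from this consumer's own position; other consumers are unaffected."""
        start = self._offsets.get(consumer, 0)
        batch = self._log[start:start + max_messages]
        self._offsets[consumer] = start + len(batch)
        return batch


stream = Stream()
for producer in ("P1", "P2", "P3"):
    stream.send(producer, f"event from {producer}")

fast = stream.poll("C1")                  # reads all three messages
slow = stream.poll("C2", max_messages=1)  # has read only the first so far
late = stream.poll("C3")                  # joined late, still sees everything
```

Real streaming systems persist the log and the consumer positions durably; the point here is only that reading never removes anything from the log.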

MapR Streams in the MapR Converged Data Platform

[Figure: MapR-FS (enterprise storage), MapR-DB (database), and MapR Streams (event streaming) built into one platform, with a global namespace, high availability, data protection, self-healing, unified security, real-time operation, and multi-tenancy.]

• Helps build a global data fabric

• Multiple types of storage engineered into one technology

• Under the same security & administration

With MapR, Geo-Distributed Data Appears Local

[Figure, built up over three slides: a data source writes to a stream that a consumer reads; the stream then appears a second time; finally the diagram is labeled with a Regional Data Center and a Global Data Center.]

Stream transport supports microservices

Stream-1st Architecture: Basis for Microservices

Stream instead of database as the shared “truth”

[Figure: POS 1..n and other card activity write to a shared stream, which is read by a fraud detector, an updater that maintains the last card use, and card analytics.]

Image © 2016 Ted Dunning & Ellen Friedman from Chap 6 of O’Reilly book Streaming Architecture used with permission
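A small sketch of the same idea, under assumed event fields (`card`, `time_s`, `location`) and a purely illustrative fraud rule: the ordered stream of card activity is the shared truth, and each service (updater, card analytics, fraud detector) derives its own view from it. Because the stream is kept, the last-card-use table can always be rebuilt by replaying it.

```python
# Toy sketch: a stream of card activity as the shared "truth", with several
# services each deriving what they need from it. Event fields and the fraud
# rule are hypothetical, chosen only for illustration.
from collections import Counter
from typing import Dict, List, NamedTuple


class CardEvent(NamedTuple):
    card: str
    time_s: int      # seconds since an arbitrary start
    location: str


# Ordered card activity, as written by POS 1..n and other sources.
card_activity: List[CardEvent] = [
    CardEvent("card-A", 0, "Munich"),
    CardEvent("card-B", 30, "Paris"),
    CardEvent("card-A", 60, "Tokyo"),
]


def updater(events: List[CardEvent]) -> Dict[str, CardEvent]:
    """Rebuild the 'last card use' table by replaying the stream."""
    last_use: Dict[str, CardEvent] = {}
    for event in events:
        last_use[event.card] = event
    return last_use


def card_analytics(events: List[CardEvent]) -> Counter:
    """An independent consumer of the same stream: number of uses per card."""
    return Counter(event.card for event in events)


def fraud_detector(event: CardEvent, last_use: Dict[str, CardEvent],
                   window_s: int = 3600) -> bool:
    """Illustrative rule: flag a card seen in a new place soon after its last use."""
    prev = last_use.get(event.card)
    return (prev is not None and prev.location != event.location
            and event.time_s - prev.time_s < window_s)


# Check each event against the table as it stood before that event,
# then apply the update, as the updater service would.
last_use: Dict[str, CardEvent] = {}
flags: List[bool] = []
for event in card_activity:
    flags.append(fraud_detector(event, last_use))
    last_use[event.card] = event
```

No service writes to another service's state; each reads the stream and keeps whatever view it needs.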

Features of Good Streaming

• It is persistent

– Messages stick around for other consumers

– Consumers don’t affect producers

– A consumer doesn’t have to be online when a message arrives

• It is performant

– You don’t have to worry whether a stream can keep up

• It is pervasive

– It is there whenever you need it; no need to deploy anything

– How much work is it to create a new file? Why should a stream be harder?

Raw data is gold!

Raw Data & Training Data Are Key to Success

[Figure: data from the world is recorded Raw, external data is added from a Database, and the resulting Input feeds requests to Model 1, Model 2, and Model 3.]

Raw data may contain features you’ll want in the future

Quality & Reproducibility of Input Data Are Important!

• Recording raw-ish data is really a big deal

– Data as seen by a model is worth gold

– Data reconstructed later often has time-machine leaks

– Databases were made for updates; streams are safer

• Raw data is useful for non-ML cases as well (think flexibility)

• A decoy model records training data as seen by models under development & evaluation
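The “time-machine leak” can be shown with a few lines of hypothetical data: one feature's history, a model that scored a request at time 120, and a later attempt to reconstruct that input from a database that keeps only current values. The function names and the data are made up for illustration.

```python
# Sketch of a "time-machine leak" (hypothetical data): a model scored a request
# at time 120, but the feature it used was later overwritten in a database.
import bisect
from typing import List, Optional, Tuple

# Full history of one feature as (time, value), sorted by time.
history: List[Tuple[int, str]] = [(0, "low"), (100, "medium"), (250, "high")]


def value_as_of(history: List[Tuple[int, str]], t: int) -> Optional[str]:
    """The value that was actually visible at time t (point-in-time lookup)."""
    times = [ts for ts, _ in history]
    i = bisect.bisect_right(times, t)
    return history[i - 1][1] if i > 0 else None


def value_now(history: List[Tuple[int, str]]) -> str:
    """What a database that keeps only the current value returns later on."""
    return history[-1][1]


request_time = 120
seen_by_model = value_as_of(history, request_time)  # what the model really saw
reconstructed = value_now(history)                  # leaks a value from time 250
```

Recording the input as the model saw it, for example in a stream, removes the need for point-in-time reconstruction like `value_as_of` in the first place.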

Decoy Model in the Rendezvous Architecture

[Figure: the Input stream feeds the Decoy, Model 2, and Model 3; the models write to Scores, and the Decoy writes to the Archive.]

• Looks like a server, but it just archives inputs

• Safe in a good streaming environment, less safe without good isolation
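A decoy can be very small. This sketch assumes one JSON record per request and uses a plain list as a stand-in for the archive stream or file; the `DecoyModel` class and its `handle` method are hypothetical names. It consumes the same input as the real models but returns no score.

```python
# Sketch of a decoy: it reads the same input as the real models but only
# archives what it sees. The list `archive` stands in for an archive stream
# or file; record fields are hypothetical.
import json
import time
from typing import Any, Dict, List, Optional


class DecoyModel:
    def __init__(self, archive: List[str]) -> None:
        self.archive = archive

    def handle(self, request_id: str, features: Dict[str, Any]) -> Optional[float]:
        """Archive the input exactly as received; return no score."""
        record = {
            "request_id": request_id,
            "received_at": time.time(),
            "input": features,
        }
        self.archive.append(json.dumps(record, sort_keys=True))
        return None


archive: List[str] = []
decoy = DecoyModel(archive)
result = decoy.handle("req-1", {"amount": 42.0, "merchant": "m-17"})
restored = json.loads(archive[0])
```

Because the decoy never answers, it cannot affect results; with good isolation it also cannot slow down the real models.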

[Figure, built up over three slides: Raw data becomes the Input, enriched with Features / profiles, and is read by the Decoy and by models m1, m2, and m3; the Decoy writes to the Archive and the models write Scores. The second build adds a Rendezvous step that turns Scores into Results; the third adds Metrics.]

Models in production live in the real world: conditions may (will) change

How to Do Better – Deployment in Production

• Keep models running “in the wings”

– Don’t wait until conditions change to start building the next model

– Keep new models ready

• Hot hand-off

– With rendezvous, just stop ignoring the model of interest

• Deploy a canary server

– Keep an old model active as a reference

– If it was 90% correct, the difference from any better model should be small

– Score distribution should be roughly constant
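One way to act on the canary advice, sketched with made-up scores and an assumed decision threshold of 0.5: measure how often the current model and the canary make the same decision, and compare their score distributions with the two-sample Kolmogorov–Smirnov statistic (the largest gap between the two empirical CDFs). The function names are hypothetical.

```python
# Sketch of canary checks (scores and threshold are made up): compare the model
# in use with an older canary model on the same requests.
from typing import Sequence


def agreement(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Fraction of requests on which two models make the same decision."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:   # step both CDFs past value x
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


current_scores = [0.10, 0.40, 0.35, 0.80, 0.90]
canary_scores = [0.15, 0.38, 0.30, 0.75, 0.95]
threshold = 0.5
same_decisions = agreement([s > threshold for s in current_scores],
                           [s > threshold for s in canary_scores])
distribution_gap = ks_statistic(current_scores, canary_scores)
```

SciPy's `scipy.stats.ks_2samp` computes the same statistic along with a p-value. How large a gap, or how low an agreement rate, should trigger an alert is a judgment call for each system.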

Advantages of Rendezvous Architecture

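[Figure: Input feeds the Real model, a Canary, and a Decoy that writes to the Archive; the real model produces the Result, and ∆ marks the comparison with the canary.]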

DataOps: Brings Flexibility & Focus

• You don’t have to be a data scientist to contribute to machine learning

• Software engineers and developers play a role, but they need good data skills

Example: Tensor Chicken

[Figure: workflow cycle with steps Gather training data → Label training data (labeled image files) → Train model → Deploy model → Run the model → Update model.]

Deep learning project by software engineer Ian Downard (see blog + @tensorchicken)

How to Do Better

• Data + the right question + domain knowledge matter!

• Prioritize – put serious effort into infrastructure

– DataOps requires more than just data science

• Persist – use streams to keep data around

• Measure – everything, and record it

• Meta-analyze – understand and see what is happening

• Containerize – make deployment repeatable and easy

• Oh… don’t forget to do some machine learning, too

Sign Up for ML Logistics Workshop Series

Three deep-dive machine learning workshops by Ted Dunning, Chief Applications Architect at MapR:

1. A New Architecture for Machine Learning Logistics: How to use streaming, containers & a microservices design

2. Machine Learning Evaluation: How to do model-to-model comparisons

3. Machine Learning in the Enterprise: How to do model management in production

http://bit.ly/mapr-machine-learning-logistics-series

Additional Resources

O’Reilly report by Ted Dunning & Ellen Friedman © March 2017

Read free courtesy of MapR:

https://mapr.com/geo-distribution-big-data-and-analytics/

O’Reilly book by Ted Dunning & Ellen Friedman © March 2016

Read free courtesy of MapR:

https://mapr.com/streaming-architecture-using-apache-kafka-mapr-streams/

Additional Resources

O’Reilly book by Ted Dunning & Ellen Friedman © June 2014

Read free courtesy of MapR:

https://mapr.com/practical-machine-learning-new-look-anomaly-detection/

O’Reilly book by Ellen Friedman & Ted Dunning © February 2014

Read free courtesy of MapR:

https://mapr.com/practical-machine-learning/

Additional Resources

by Ellen Friedman 8 Aug 2017 on MapR blog:

https://mapr.com/blog/tensorflow-mxnet-caffe-h2o-which-ml-best/

by Ted Dunning 13 Sept 2017 in InfoWorld:

https://www.infoworld.com/article/3223688/machine-learning/machine-learning-skills-for-software-engineers.html

New book:

O’Reilly book by Ellen Friedman & Ted Dunning © Sept 2017

Pre-register for a free PDF copy of the book when it becomes available on 25 September, courtesy of MapR:

http://info.mapr.com/2017_Content_Machine-Learning-Logistics_eBook_Prereg_RegistrationPage.html

Please support women in tech – help build girls’ dreams of what they can accomplish

© Ellen Friedman 2015 #womenintech #datawomen

Thank you!

Q&A

@mapr

Maprtechnologies

ENGAGE WITH US

@Ellen_Friedman