Introduction to Machine Learning with H2O - Jo-fai (Joe) Chow, H2O

Introduction to Machine Learning with H2O Jo-fai (Joe) Chow Data Scientist [email protected] @matlabulous Data Science Milan Politecnico di Milano 10th October, 2016

Upload: data-science-milan

Post on 09-Jan-2017

48 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Introduction to Machine Learning with H2O - Jo-Fai (Joe) Chow, H2O

Introduction to Machine Learning with H2O

Jo-fai (Joe) Chow

Data Scientist

[email protected]

@matlabulous

Data Science Milan

Politecnico di Milano

10th October, 2016

Page 2

About Me: Civil Engineer → Data Scientist

• 2005 - 2015
  • Water Engineer
    o Consultant for Utilities
    o Industrial PhD
      • Water Engineering + Machine Learning
      • Discovered H2O in 2014!

• 2015 - Present
  • Data Scientist
    o Virgin Media (UK)
    o Domino Data Lab (US)
    o H2O.ai (US)

Why? Long story – see bit.ly/joe_h2o_talk2

Page 3

Agenda

• First Talk (25 mins)
  o About H2O.ai
  o Demo
    • A Simple Classification Task
    • H2O’s Web Interface
  o Why H2O?
    • Our Community
    • Our Customers
  o What’s Next?
    • New H2O Features

• Second Talk (25 mins)
  o H2O for IoT
    • Predictive Maintenance
    • Anomaly Detection
    • H2O’s R Interface

• Third Talk (25 mins)
  o Deep Water
  o Demo
    • H2O + mxnet on GPU
    • H2O’s Python Interface

Page 4

About H2O.ai

Page 5

About H2O.ai

• H2O.ai, the Company
  o Team: 80 (70 shown)
  o Founded in 2012
  o HQ: Mountain View, California

• H2O, the Platform
  o Open Source (Apache 2.0)
  o Algorithms written in Java
    • Fast, distributed and scalable
  o Multiple interfaces to suit different users
    • Web, R, Python, Java, Scala, REST/JSON
  o Works with desktop/laptop, cloud, Spark and Hadoop

Joe

Page 6

Scientific Advisory Council

Page 7

Current Algorithm Overview

Joe’s Strata Hadoop London Talk: bit.ly/joe_h2o_talk4

Today’s Demos

Joe’s LondonR Talk: bit.ly/joe_h2o_talk3

Page 8

H2O Overview

Page 9

H2O’s Mission


Making Machine Learning Accessible to Everyone

Photo credit: Virgin Media

Page 10

H2O Web Interface Demo

Page 11

A Typical Machine Learning Task

• Demo
  o Dataset – MNIST
    • LeCun et al. (1999)
    • Hand-written Digits
  o Import & Explore Data
  o Build & Evaluate Models
  o Make Predictions

Photo credit: http://www.opendeep.org/v0.0.5/docs/tutorial-classifying-handwritten-mnist-images

Page 12

MNIST Hand-Written Digits

• 784 Inputs
  o 28 x 28 = 784 pixels
• 1 Output
  o 0, 1, 2, 3, 4, 5, 6, 7, 8 or 9
  o Classification
• Files
  o Train (60k Records)
  o Test (10k)
• Links
  o https://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/mnist/train.csv.gz
  o https://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/mnist/test.csv.gz
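The 784 inputs are the 28 x 28 pixel grid of each image flattened into a single row. The Python sketch below (a minimal illustration, not H2O code) shows that mapping, assuming row-major order, i.e. pixel index = row × 28 + column; `pixel_index` and `to_grid` are hypothetical helper names.

```python
from typing import List, Sequence

SIDE = 28                # MNIST images are 28 x 28 pixels
N_PIXELS = SIDE * SIDE   # = 784 input columns

def pixel_index(row: int, col: int) -> int:
    """Position of pixel (row, col) in the flattened 784-value vector (row-major, assumed)."""
    return row * SIDE + col

def to_grid(pixels: Sequence[int]) -> List[List[int]]:
    """Reshape a flat sequence of 784 pixel values into a 28 x 28 grid."""
    if len(pixels) != N_PIXELS:
        raise ValueError(f"expected {N_PIXELS} pixels, got {len(pixels)}")
    return [list(pixels[r * SIDE:(r + 1) * SIDE]) for r in range(SIDE)]

# Example: a blank image with a single "ink" pixel at row 3, column 5.
flat = [0] * N_PIXELS
flat[pixel_index(3, 5)] = 255
grid = to_grid(flat)
print(grid[3][5])  # 255
```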


Photo credit: https://ml4a.github.io/ml4a/neural_networks/

Page 13

H2O Flow (Web Interface) Demo

• Download and unzip jar from www.h2o.ai
• In terminal:
  o java -jar h2o.jar
• Web browser:
  o localhost:54321
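The three steps above can also be scripted. A minimal Python sketch, assuming `java` is on the PATH and the unzipped jar is named `h2o.jar` in the working directory; `start_h2o` and `port_is_open` are hypothetical helpers that only wait until port 54321 accepts TCP connections, not an official H2O launcher.

```python
import socket
import subprocess
import time

FLOW_HOST = "localhost"
FLOW_PORT = 54321  # port used in the demo for the Flow web interface

def port_is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def start_h2o(jar_path: str = "h2o.jar", wait_s: float = 60.0) -> subprocess.Popen:
    """Run `java -jar <jar_path>` and block until the Flow port accepts connections."""
    proc = subprocess.Popen(["java", "-jar", jar_path])
    deadline = time.monotonic() + wait_s
    while time.monotonic() < deadline:
        if proc.poll() is not None:  # the JVM exited before H2O came up
            raise RuntimeError(f"java exited with code {proc.returncode}")
        if port_is_open(FLOW_HOST, FLOW_PORT):
            return proc
        time.sleep(1.0)
    proc.terminate()
    raise TimeoutError(f"nothing listening on port {FLOW_PORT} after {wait_s} s")
```

Usage: `proc = start_h2o()`, then open http://localhost:54321 in a web browser, and call `proc.terminate()` when finished.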

Page 14

H2O Live Demo

Page 15

More H2O Flow Examples

Page 16

Other H2O Interfaces

• R
• Python

Key Resources
• docs.h2o.ai

Page 17

More Advanced Topics

• Advanced Features
  o Hyperparameter Tuning
  o Model Stacking
  o Saving/Loading Models
  o Export Plain Old Java Object (POJO)
• Key Resources
  o docs.h2o.ai
• Joe’s Previous H2O Talks
  o bit.ly/joe_h2o_talk3
  o bit.ly/h2o_budapest_1
  o bit.ly/h2o_paris_1
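Hyperparameter tuning typically means training one model per combination of candidate settings and keeping the best one. A minimal sketch of building such a Cartesian grid in plain Python; the keys reuse H2O deep learning parameter names (`activation`, `epochs`, `hidden`), but the candidate values and the `expand_grid` helper are illustrative assumptions. In practice H2O's built-in grid search trains and compares the models for you.

```python
from itertools import product
from typing import Dict, List, Sequence

def expand_grid(space: Dict[str, Sequence]) -> List[Dict[str, object]]:
    """Return every combination of candidate values, one dict per model to train."""
    names = list(space)
    return [dict(zip(names, values)) for values in product(*(space[n] for n in names))]

# Illustrative search space (values are assumptions, not recommendations).
search_space = {
    "activation": ["Rectifier", "RectifierWithDropout"],
    "epochs": [10, 20],
    "hidden": [[200, 200], [512]],
}

grid = expand_grid(search_space)
print(len(grid))  # 8 = 2 x 2 x 2 candidate models
print(grid[0])    # {'activation': 'Rectifier', 'epochs': 10, 'hidden': [200, 200]}
```

The grid grows multiplicatively with every parameter added, which is why random search over the same space is a common alternative for larger problems.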

Page 18

Why H2O?

Page 19

Page 20

Szilard Pafka – Chief Data Scientist at Epoch

• Szilard’s talks / blog posts about H2O:
  o ML Benchmark
  o Intro to ML with H2O
  o H2O Scoring
  o Tweets

Page 21

Szilard Pafka – Why H2O?

• Szilard’s Summary Slide

Page 22

H2O for Kaggle

Page 23

H2O Community Support

Google forum – h2ostream

community.h2o.ai – Please try

Page 24

#AroundTheWorldWithH2Oai

• Strata Hadoop London
• PyData Amsterdam
• useR! 2016 Stanford
• satRdays Budapest
• London Kaggle Meetup
• Chelsea FC
• Paris ML Meetup
• Big Data London

Page 25

#AroundTheWorldWithH2Oai


Data Science Milan

Thank you

Page 26

H2O Usage in Italy


www.h2o.ai/community

Page 27

Page 28

www.h2o.ai/customers

Page 29

H2O in Action


Thank you

Data Science Milan – May 19, 2016

Bringing Deep Learning into production - Paolo Platter, AgileLab

http://www.slideshare.net/ds_mi/bringing-deep-learning-into-production-paolo-platter-agilelab

Page 30

What’s Next?

Page 31

H2O is Evolving

• H2O Open Tour NYC YouTube Playlist
  o Advanced data munging
  o Visual ML
  o Deep Water (3rd talk)
  o Sparkling Water
    • PySparkling & RSparkling
  o Steam

Next time?

Page 32

H2O’s Mission


Making Machine Learning Accessible to Everyone

Photo credit: Virgin Media

Page 33

End of First Talk – Thanks!


• Data Science Milan
• Gianmario Spacagna
• Politecnico di Milano
• Resources
  o bit.ly/h2o_milan_1
  o www.h2o.ai
  o docs.h2o.ai
• Contact
  o [email protected]
  o @matlabulous
  o github.com/woobe

Page 34

Extra Slides (H2O Flow Demo Screenshots – just in case)

Page 35

Upload the file without decompressing it first

Page 36

Change the data type of “label” from “Numeric” to “Enum” (categorical)

Page 37

Note: Size in Memory

Click on individual labels to explore data

Page 38

Page 39

Split the full dataset into training (80% = 48k records) and validation (20% = 12k) – a common machine learning practice
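An 80/20 split of the 60k MNIST training records gives 48k rows for training and 12k for validation (0.8 × 60,000 = 48,000). A minimal, seeded shuffle-and-cut sketch in plain Python; `train_valid_split` and the default seed are hypothetical, and the split H2O Flow performs may not produce exactly these counts.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def train_valid_split(rows: Sequence[T], train_frac: float = 0.8,
                      seed: int = 1234) -> Tuple[List[T], List[T]]:
    """Shuffle rows with a fixed seed, then cut them into training / validation parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train, valid = train_valid_split(range(60_000))
print(len(train), len(valid))  # 48000 12000
```

Fixing the seed makes the split reproducible, so models trained later can be compared on the same validation rows.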

Page 40

Click and select parameters for model training

Page 41

Users have full access to all available parameters to fine-tune the model training process

For example, I am using rectifier with dropout as the activation function to train the model for 20 epochs with class balancing, leaving other settings as default
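“Rectifier with dropout” combines the rectified linear unit, max(0, x), with dropout, which randomly switches off hidden units during training to reduce overfitting. The sketch below is a generic inverted-dropout formulation in plain Python, not H2O's internal implementation; the function names and the `dropout_ratio` argument are hypothetical.

```python
import random
from typing import List

def rectifier(xs: List[float]) -> List[float]:
    """ReLU: max(0, x) applied element-wise."""
    return [max(0.0, x) for x in xs]

def rectifier_with_dropout(xs: List[float], dropout_ratio: float,
                           training: bool, rng: random.Random) -> List[float]:
    """ReLU followed by inverted dropout.

    During training each unit is zeroed with probability `dropout_ratio`, and the
    survivors are scaled by 1 / (1 - dropout_ratio) so the expected activation is
    unchanged. When scoring (training=False) every unit is kept unscaled.
    """
    activations = rectifier(xs)
    if not training or dropout_ratio == 0.0:
        return activations
    keep = 1.0 - dropout_ratio
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

print(rectifier_with_dropout([-1.0, 2.0], 0.5, training=False, rng=random.Random(0)))  # [0.0, 2.0]
```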

Page 42

Training the model with estimated remaining time – users can stop the process early if they want to

Page 43

Performance (logloss) on validation set

Performance (logloss) on training set
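Logloss (multiclass cross-entropy) is the average negative log of the probability the model assigns to each record's true class, logloss = -(1/N) Σ log p(true class of record i). Lower is better, and a perfect, fully confident model scores 0. A minimal Python sketch; the clipping constant `eps` is an assumption to avoid log(0), and H2O's exact clipping may differ.

```python
import math
from typing import Sequence

def multiclass_logloss(probs: Sequence[Sequence[float]], labels: Sequence[int],
                       eps: float = 1e-15) -> float:
    """Mean negative log-probability assigned to the true class of each record."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    total = 0.0
    for row, y in zip(probs, labels):
        p = min(max(row[y], eps), 1.0 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(labels)

# Two records, classes {0, 1}: the true-class probabilities are 0.9 and 0.8.
print(round(multiclass_logloss([[0.9, 0.1], [0.2, 0.8]], [0, 1]), 4))  # 0.1643
```

Comparing this number on the training and validation sets is what reveals overfitting: a training logloss far below the validation logloss means the model is memorising the training data.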

Page 44

Confusion Matrix on Training Set (48k Records)

About 2% Error

Confusion Matrix on Validation Set (12k Records)

About 4% Error
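The error on each confusion matrix is the share of records off its diagonal, so about 4% on the 12k validation records means roughly 0.04 × 12,000 = 480 misclassified digits. A minimal Python sketch with made-up labels for three classes; in this sketch rows are actual classes and columns are predicted classes, and the helper names are hypothetical.

```python
from typing import List, Sequence

def confusion_matrix(actual: Sequence[int], predicted: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """cm[a][p] counts records whose actual class is a and predicted class is p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        cm[a][p] += 1
    return cm

def error_rate(cm: List[List[int]]) -> float:
    """Share of records off the diagonal, i.e. misclassified."""
    total = sum(map(sum, cm))
    correct = sum(cm[i][i] for i in range(len(cm)))
    return (total - correct) / total

actual    = [0, 1, 2, 2, 1, 0, 2, 1]
predicted = [0, 1, 2, 1, 1, 0, 2, 2]
cm = confusion_matrix(actual, predicted, n_classes=3)
print(cm)              # [[2, 0, 0], [0, 2, 1], [0, 1, 2]]
print(error_rate(cm))  # 0.25
```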

Page 45

Using the model to make predictions on the test set

Page 46

Confusion Matrix on Test Set (10k Records)

About 4% Error (similar to validation)

Page 47

Full prediction outputs, including individual probabilities and predicted label
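For a multiclass model each output row carries one probability per digit, and the predicted label is normally the class with the highest probability. A minimal Python sketch of building such a row, assuming that argmax rule; the column names `predict` and `p0` to `p9` and the helper `prediction_row` are hypothetical stand-ins for H2O's actual output columns.

```python
from typing import Dict, Sequence

def prediction_row(probs: Sequence[float]) -> Dict[str, object]:
    """Build one output row: the predicted label plus one probability per class."""
    if abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("class probabilities should sum to 1")
    best = max(range(len(probs)), key=lambda k: probs[k])  # argmax over classes
    row: Dict[str, object] = {"predict": str(best)}
    row.update({f"p{k}": p for k, p in enumerate(probs)})
    return row

probs = [0.01, 0.02, 0.05, 0.80, 0.02, 0.03, 0.01, 0.02, 0.03, 0.01]
row = prediction_row(probs)
print(row["predict"])  # 3
```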

Page 48