kaggle competitions, new friends, new skills and new opportunities

28
Kaggle Competitions, New Friends, New Skills and New Opportunities Jo-fai (Joe) Chow Data Scientist [email protected] @matlabulus

Upload: jo-fai-chow

Post on 06-Jan-2017

349 views

Category:

Presentations & Public Speaking


2 download

TRANSCRIPT

Page 1: Kaggle competitions, new friends, new skills and new opportunities

Kag g le Competitions, New Fr iends , New Sk i l l s and New Opportunities

Jo-fai (Joe) ChowData [email protected]

@matlabulus

Page 2: Kaggle competitions, new friends, new skills and new opportunities

2

About Me

• 2005 - 2015

• Water Engineero Consultant for Utilitieso EngD Research

• 2015 - Present

• Data Scientisto Virgin Mediao Domino Data Labo H2O.ai

Page 3: Kaggle competitions, new friends, new skills and new opportunities

3

About This Talk

• What happenedo Things I did since I

started participating in Kaggle competitions.

o New opportunities – results of new skills and friends.

Page 4: Kaggle competitions, new friends, new skills and new opportunities

4

First MOOC Experience

• One of the first Massive Open Online Courses.o Met some new friends.o Decided to collaborate

for fun.o “How about Kaggle?”o “What is Kaggle?”

Page 5: Kaggle competitions, new friends, new skills and new opportunities

5

First Kaggle Experience

• First time in my lifeo Supervised learning

• Random Forest• Support Vector Machine• Neural Networks

o Train, Validate & Predict.o “Is it black magic?”

Page 6: Kaggle competitions, new friends, new skills and new opportunities

6

First Kaggle Experience

• Problemso “Hey Joe, you are a nice

guy but we can’t work together.”

o “You love MATLAB so much. You even call yourself @matlabulous on twitter!”

o “We prefer R/Python.”

• Resultso I kept using MATLABo Lone wolfo No collaboration

Page 7: Kaggle competitions, new friends, new skills and new opportunities

7

Identifying Ski l ls Gap

• Obvious skills gap:o Open-source

Programming langaugeso Machine learning

techniqueso Collaboration

• Kind of relatedo Data visualisationo Handling large datasetso Explaining results

• That competition was a good wake up call.

Page 8: Kaggle competitions, new friends, new skills and new opportunities

8

From MATLAB to R/Python

MATLAB Python R

Neural Networks ✔️ ✔️ ✔️

Random Forest ✔️ ✔️ ✔️

SVM ✔️ ✔️ ✔️

Other Machine Learning Libraries

Toolboxes (commercial + open

source)

Scikit-learn and many more

CRAN, GitHub(A LOT!)

Data Visualisation I wasn’t good at it anyway …

Matplotlib(plus a lot more

since then)

ggplot2 (WOW!)(plus a lot more

since then)

Page 10: Kaggle competitions, new friends, new skills and new opportunities

10

Fi l l ing the Ski l ls Gap

• More MOOCo Machine Learning

• Andrew Ng (Coursera)o Data Analysis

• Jeff Leek (Coursera)• R

o Intro to Programming• Dave Evans (Udacity)• Python

• Things I also picked up:o Linux (Ubuntu)o Git o Cloud computingo HTML / CSS

Page 11: Kaggle competitions, new friends, new skills and new opportunities

11

Learning from other Kagglers

• Continuous learningo Kaggle’s forums and blogs.o New tools and tricks.o Many things you cannot

learn from school.o I am standing on the

shoulders of many Kagglers.

Page 12: Kaggle competitions, new friends, new skills and new opportunities

12

Side Project 1 – Crime Data Viz

shiny::runGitHub("rApps", "woobe", subdir = "crimemap")

Page 13: Kaggle competitions, new friends, new skills and new opportunities

13

Other Side Projects

• Colour Palette

o github.com/ woobe/rPlotter

• World Cup 2014 Correct Score Predictiono ML vs. my friendso 10 out of 64 (15.6%)o Friends’ avg. = 4 (6.3%)o github.com/

woobe/wc2014

Page 14: Kaggle competitions, new friends, new skills and new opportunities

14

Open Up Myself

• Before Kaggle/MOOCo I was drawing a circle

around myself.o Fear of change.o Domain-specific problem

solving.

• After Kaggle/MOOCo Data-driven approach.o Not a subject matter

expert? No worries o Free to try new tools, to

learn and to create.

Page 15: Kaggle competitions, new friends, new skills and new opportunities

15

New Opportunities

• LondonRo First presentation

outside water industry / academia.

o Very positive feedback.o Led to other projects.o bit.ly

/londonr_crimemap

Page 16: Kaggle competitions, new friends, new skills and new opportunities

16

New Opportunities

• useR! 2014 (UCLA)o Presented a poster.o Met new friends.o Life-changing event.o github.com/

woobe/useR_2014

Page 17: Kaggle competitions, new friends, new skills and new opportunities

17

New Friends

Ramnath Vaidyanathanhtmlwidgets

DataRobot

Nick @ DominoDataLab

H2O.ai &John Chambers!

rOpenSciRStudio

Matt Dowledata.table (also at H2O.ai)

Page 18: Kaggle competitions, new friends, new skills and new opportunities

18

About Domino and H2O

Page 19: Kaggle competitions, new friends, new skills and new opportunities

19

More Opportunities

• First blog post about H2Oo Things to try after useR!

– Part1: Deep Learning with H2O

Page 20: Kaggle competitions, new friends, new skills and new opportunities

20

More Opportunities

• Blog post about Domino and H2Oo I did it for fun. I did not

have any expectation.o It helped attract

customers to both Domino and H2O.

Page 21: Kaggle competitions, new friends, new skills and new opportunities

21

Becoming a Data Scientist

Page 22: Kaggle competitions, new friends, new skills and new opportunities

22

London Kagglers Assemble

• London Kaggle Meetupo Sep 2015o I met my Kaggle buddy

Mickael Le Galo He is a product data

scientist at Tictrac

Page 23: Kaggle competitions, new friends, new skills and new opportunities

23

London Kagglers Assemble

• Rossmann Store Saleso We got stuck at top 10% for a long period.o Mickael had a breakthrough in feature

engineering with 48 hours to go.o I re-trained all models and completed

model stacking just a few hours before the deadline (thanks to Domino Data Lab).

o Top 2% finish (our best result so far).

Page 24: Kaggle competitions, new friends, new skills and new opportunities

24

Joining H2O.ai

Page 25: Kaggle competitions, new friends, new skills and new opportunities

25

Summary of Benefits

• Directo Identify data science

skills gap.o Learn quickly from the

community.o Expand your network.o Prepare yourself for real-

life data challenges.

• Indirecto You also learn non-ML

skills along the way.o You learn to build small

data products (e.g. graph, web app, REST API) and help others gain insight.

Page 26: Kaggle competitions, new friends, new skills and new opportunities

26

Strata Hadoop London Talks

• “Model stacking and parameters tuning.”o Wednesday 1 Juneo 7pmo LondonR at Stratao Free event

• “Generalised Low Rank Models.”o Thursday 2 Juneo 2pmo Strata Hadoop main

conference event

Page 27: Kaggle competitions, new friends, new skills and new opportunities

27

Big Thank You

• London Kaggle Meetup Organisers

• Mickael• Marios • You (Kagglers)

• Prof. Dragan Savic (supervisor at uni. of Exeter)

• Mango Solutions• Virgin Media• Domino Data Lab• H2O.ai

Page 28: Kaggle competitions, new friends, new skills and new opportunities

28

Any Questions?

• Contacto [email protected] @matlabulouso github.com/woobe

• Links (All Slides)o github.com/h2oai/h2o-

meetups

• H2O in Londono Coming soon!

• Meetups• Office

o We’re hiring!o www.h2o.ai/careers