kaggle competitions, new friends, new skills and new opportunities
TRANSCRIPT
Kag g le Competitions, New Fr iends , New Sk i l l s and New Opportunities
Jo-fai (Joe) ChowData [email protected]
@matlabulus
2
About Me
• 2005 - 2015
• Water Engineero Consultant for Utilitieso EngD Research
• 2015 - Present
• Data Scientisto Virgin Mediao Domino Data Labo H2O.ai
3
About This Talk
• What happenedo Things I did since I
started participating in Kaggle competitions.
o New opportunities – results of new skills and friends.
4
First MOOC Experience
• One of the first Massive Open Online Courses.o Met some new friends.o Decided to collaborate
for fun.o “How about Kaggle?”o “What is Kaggle?”
5
First Kaggle Experience
• First time in my lifeo Supervised learning
• Random Forest• Support Vector Machine• Neural Networks
o Train, Validate & Predict.o “Is it black magic?”
6
First Kaggle Experience
• Problemso “Hey Joe, you are a nice
guy but we can’t work together.”
o “You love MATLAB so much. You even call yourself @matlabulous on twitter!”
o “We prefer R/Python.”
• Resultso I kept using MATLABo Lone wolfo No collaboration
7
Identifying Ski l ls Gap
• Obvious skills gap:o Open-source
Programming langaugeso Machine learning
techniqueso Collaboration
• Kind of relatedo Data visualisationo Handling large datasetso Explaining results
• That competition was a good wake up call.
8
From MATLAB to R/Python
MATLAB Python R
Neural Networks ✔️ ✔️ ✔️
Random Forest ✔️ ✔️ ✔️
SVM ✔️ ✔️ ✔️
Other Machine Learning Libraries
Toolboxes (commercial + open
source)
Scikit-learn and many more
CRAN, GitHub(A LOT!)
Data Visualisation I wasn’t good at it anyway …
Matplotlib(plus a lot more
since then)
ggplot2 (WOW!)(plus a lot more
since then)
9
What can people do with R?
James Cheshire, UCLLink
Paul Butler, FacebookLink
10
Fi l l ing the Ski l ls Gap
• More MOOCo Machine Learning
• Andrew Ng (Coursera)o Data Analysis
• Jeff Leek (Coursera)• R
o Intro to Programming• Dave Evans (Udacity)• Python
• Things I also picked up:o Linux (Ubuntu)o Git o Cloud computingo HTML / CSS
11
Learning from other Kagglers
• Continuous learningo Kaggle’s forums and blogs.o New tools and tricks.o Many things you cannot
learn from school.o I am standing on the
shoulders of many Kagglers.
12
Side Project 1 – Crime Data Viz
shiny::runGitHub("rApps", "woobe", subdir = "crimemap")
13
Other Side Projects
• Colour Palette
o github.com/ woobe/rPlotter
• World Cup 2014 Correct Score Predictiono ML vs. my friendso 10 out of 64 (15.6%)o Friends’ avg. = 4 (6.3%)o github.com/
woobe/wc2014
14
Open Up Myself
• Before Kaggle/MOOCo I was drawing a circle
around myself.o Fear of change.o Domain-specific problem
solving.
• After Kaggle/MOOCo Data-driven approach.o Not a subject matter
expert? No worries o Free to try new tools, to
learn and to create.
15
New Opportunities
• LondonRo First presentation
outside water industry / academia.
o Very positive feedback.o Led to other projects.o bit.ly
/londonr_crimemap
16
New Opportunities
• useR! 2014 (UCLA)o Presented a poster.o Met new friends.o Life-changing event.o github.com/
woobe/useR_2014
17
New Friends
Ramnath Vaidyanathanhtmlwidgets
DataRobot
Nick @ DominoDataLab
H2O.ai &John Chambers!
rOpenSciRStudio
Matt Dowledata.table (also at H2O.ai)
18
About Domino and H2O
19
More Opportunities
• First blog post about H2Oo Things to try after useR!
– Part1: Deep Learning with H2O
20
More Opportunities
• Blog post about Domino and H2Oo I did it for fun. I did not
have any expectation.o It helped attract
customers to both Domino and H2O.
21
Becoming a Data Scientist
22
London Kagglers Assemble
• London Kaggle Meetupo Sep 2015o I met my Kaggle buddy
Mickael Le Galo He is a product data
scientist at Tictrac
23
London Kagglers Assemble
• Rossmann Store Saleso We got stuck at top 10% for a long period.o Mickael had a breakthrough in feature
engineering with 48 hours to go.o I re-trained all models and completed
model stacking just a few hours before the deadline (thanks to Domino Data Lab).
o Top 2% finish (our best result so far).
24
Joining H2O.ai
25
Summary of Benefits
• Directo Identify data science
skills gap.o Learn quickly from the
community.o Expand your network.o Prepare yourself for real-
life data challenges.
• Indirecto You also learn non-ML
skills along the way.o You learn to build small
data products (e.g. graph, web app, REST API) and help others gain insight.
26
Strata Hadoop London Talks
• “Model stacking and parameters tuning.”o Wednesday 1 Juneo 7pmo LondonR at Stratao Free event
• “Generalised Low Rank Models.”o Thursday 2 Juneo 2pmo Strata Hadoop main
conference event
27
Big Thank You
• London Kaggle Meetup Organisers
• Mickael• Marios • You (Kagglers)
• Prof. Dragan Savic (supervisor at uni. of Exeter)
• Mango Solutions• Virgin Media• Domino Data Lab• H2O.ai
28
Any Questions?
• Contacto [email protected] @matlabulouso github.com/woobe
• Links (All Slides)o github.com/h2oai/h2o-
meetups
• H2O in Londono Coming soon!
• Meetups• Office
o We’re hiring!o www.h2o.ai/careers