Anthony Goldbloom CEO, Kaggle
e-mail [email protected] twitter @antgoldbloom
Predictive modeling competitions
Photo by mikebaird, www.flickr.com/photos/mikebaird
making data science a sport
Global competitions
Predicting HIV viral load
State of the art: 70%
After 1½ weeks: 70.8%
Competition close: 77%
HIV Research
Stock Price Prediction
Chess Ratings
Dr. Derek Gatherer
UK
Diverse experts solving diverse problems
John Blatz
Baltimore
Edmund & Adrian
London & USA
Jason Trigg
Pennsylvania
Chih-Li Sung & Roy Tseng
Penghu & Taipei
Jure Zbontar
Ljubljana
Thomas Mahony
Canberra
Emir Delic
Australia
Glen Maher
Canberra
Chris Raimondi
Baltimore
Claudia Perlich
USA
Grzegorz Swiszcz
Gera
Edmund & Adrian
London & USA
Rajstennaj Barrabas
USA
Jason Trigg
Pennsylvania
Felipe Maia
Uppsala University
Lee Baker
Las Cruces, NM
Cole Harris
Texas
Nan Zhou
Pittsburgh
Uri Blass
Tel-Aviv
Giuseppe Ragusa
Rome
Robert
Warsaw
Jeremy Howard
Australia
Ivan
Russian Federation
Chris DuBois
Portland
Philipp Emanuel Widmann
Heidelberg, DE
Dr. Christopher Hefele
New York
Travel Time Prediction
Grant Application Forecasting
Sales Forecasting
1. Motivation
2. Why host a competition?
3. Why compete?
4. How it works
5. Heritage Health Prize
6. Questions
“I keep saying the sexy job in the next ten years will be statisticians.” Hal Varian
Google Chief Economist
2009
Mismatch between those with data and
those with the skills to analyse it
Crowdsourcing
Additional slides Not MIT, not SAS … UoL?
2. Why host a competition?
Tourism Forecasting Competition
[Chart: forecast error (MASE) over time against the existing model; time axis: Aug 9, 2 weeks later, 1 month later, competition end]
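For readers unfamiliar with the metric on this chart, the following is a minimal sketch of the mean absolute scaled error (MASE) in its usual form: the forecast's mean absolute error divided by the in-sample mean absolute error of a naive forecast. The function name, the parameter names, and the `season` argument are illustrative assumptions; the slides do not say how the competition configured its scoring.

```python
from typing import Sequence


def mase(actual: Sequence[float], forecast: Sequence[float],
         train: Sequence[float], season: int = 1) -> float:
    """Mean absolute scaled error (illustrative helper, not the competition's code).

    The scale is the mean absolute error, on the training series, of a naive
    forecast that predicts each value with the value `season` steps earlier.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and equally long")
    if len(train) <= season:
        raise ValueError("train must contain more than `season` observations")

    # Mean absolute error of the forecast on the evaluation period.
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

    # In-sample mean absolute error of the (seasonal) naive forecast.
    naive_errors = [abs(train[i] - train[i - season])
                    for i in range(season, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("training series is constant; MASE is undefined")

    return mae / scale
```

A value below 1 means the forecast beats the naive benchmark on average; lower is better, which is why the chart tracks the error falling over time.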
Chess Ratings Competition
[Chart: error rate (RMSE) over time against the existing model (Elo); time axis: Aug 4, 1 month later, 2 months later, today]
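As a rough illustration of what this chart measures, the sketch below computes the standard Elo expected score for a game and the root mean squared error (RMSE) between predicted and actual results. The function names and the sample ratings are hypothetical, and how the competition grouped games before computing RMSE is not stated in the slides.

```python
import math
from typing import Sequence


def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo formula."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between predictions and outcomes."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and equally long")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical games: (rating of A, rating of B, actual score for A),
# where the score is 1 for a win, 0.5 for a draw, and 0 for a loss.
games = [(1500, 1500, 1.0), (1900, 1500, 1.0), (1500, 1700, 0.5)]
predictions = [elo_expected(a, b) for a, b, _ in games]
baseline_error = rmse(predictions, [score for _, _, score in games])
```

A competing rating system would be scored the same way, and a lower RMSE than `baseline_error` would mean it predicts results better than Elo on these games.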
Our User Base
• neural networks
• logistic regression
• support vector machine
• decision trees
• ensemble methods
• AdaBoost
• Bayesian networks
• genetic algorithms
• random forest
• Monte Carlo methods
• principal component analysis
• Kalman filter
• evolutionary fuzzy modeling
Users apply different techniques
Benchmarking
Successful grant applications: ~25%
NASA tried, now it’s our turn
Untouched problems
Successful grant applications: ~25%
Outcomes of a competition to predict the success of grant applications:
- Better identify likely successes to avoid wasting resources on hopeless applications
- Identify and communicate the characteristics of a successful application to future applicants
Who to hire?
Branding: “we do analytics”
3. Why compete?
Why Participants Compete
• Clean, real-world data
• Professional reputation & experience
• Interactions with experts in related fields
• Prizes
User base
4. How it works
Competition Mechanics
1. Upload: Use the wizard to post a competition
2. Submit: Participants make their entries
3. Evaluate & Exchange: Competitions are judged based on predictive accuracy
Competitions are judged on objective criteria
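The judging step can be pictured with a small sketch: each entry is scored against a held-back set of true outcomes and entries are ranked by that score. Everything here is a hypothetical illustration (the function names, the team names, and the use of plain classification accuracy as the metric); the slides do not describe how the platform implements scoring.

```python
from typing import Dict, List, Sequence, Tuple


def accuracy(solution: Sequence[int], prediction: Sequence[int]) -> float:
    """Share of predictions that match the held-back true outcomes."""
    if len(solution) != len(prediction) or len(solution) == 0:
        raise ValueError("prediction must be non-empty and match the solution length")
    matches = sum(1 for s, p in zip(solution, prediction) if s == p)
    return matches / len(solution)


def leaderboard(solution: Sequence[int],
                entries: Dict[str, Sequence[int]]) -> List[Tuple[str, float]]:
    """Score every entry and rank from most to least accurate."""
    scored = [(name, accuracy(solution, preds)) for name, preds in entries.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical held-back outcomes and three hypothetical entries.
solution = [1, 0, 1, 1]
entries = {
    "team_a": [1, 0, 0, 1],
    "team_b": [1, 0, 1, 1],
    "team_c": [0, 1, 0, 0],
}
board = leaderboard(solution, entries)
```

Because the ranking depends only on the metric and the held-back outcomes, two entries with the same predictions always receive the same score, which is the sense in which the criteria are objective.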
5. Heritage Health Prize
Netflix Prize
2006 – 2009
$1 million prize
50,000 registrations
Heritage Health Prize
2011
$3 million prize
Projected 100,000 registrations
6. Questions
Photo by gidzy, www.flickr.com/photos/gidzy
What could the world’s best analysts find in your data?
e-mail [email protected]
phone +61438400053