how to get started in kaggle competition

31
How to get started in Merja Kajava September 1, 2015 Helsinki Data Analytics and Science Meetup

Upload: merja-kajava

Post on 16-Apr-2017

969 views

Category:

Data & Analytics


0 download

TRANSCRIPT

Page 1: How to get started in Kaggle competition

How to get started in

Merja Kajava September 1, 2015

Helsinki Data Analytics and Science Meetup

Page 2: How to get started in Kaggle competition

is a competition platform for (aspiring) data

scientists

Page 3: How to get started in Kaggle competition
Page 4: How to get started in Kaggle competition

Why participate in Kaggle The data Competition steps Tips

Page 5: How to get started in Kaggle competition

Why participate in Kaggle competition?

Page 6: How to get started in Kaggle competition

1 Learn from the best.

Forums Scripts Solutions from prize winners

Page 7: How to get started in Kaggle competition

2 Work with cool datasets.

Flights in GE Flights Quest

Driver telematic analysis

Amazon employee access rights

Page 8: How to get started in Kaggle competition

+1 You can also win money

Page 9: How to get started in Kaggle competition

What kinds of competitions Kaggle has?

Page 10: How to get started in Kaggle competition

Public

In-class

Private

Page 11: How to get started in Kaggle competition

What languages can you use?

Any open-source language (sometimes also sponsor’s

proprietary languages)

Gnu Octave (no Matlab)

Page 12: How to get started in Kaggle competition

What is the data like?

Page 13: How to get started in Kaggle competition

Data comes from companies and non-profit organizations

Page 14: How to get started in Kaggle competition

Data sizes vary

Zip ~1 MB

Zip ~6 GB

Page 15: How to get started in Kaggle competition

Data comes in all shapes

Customer data Log files Timeseries HTML pages Images Documents

Page 16: How to get started in Kaggle competition

How does Kaggle competition work?

Page 17: How to get started in Kaggle competition

Competition flow

Duration typically 4 to 8 weeks Max. 5 entries per day

Page 18: How to get started in Kaggle competition

test

Model

train

submission.csvPredict

Build prediction model

Calculate CV to cross-validate

Page 19: How to get started in Kaggle competition

Evaluate submission

Typical evaluations

Area under the ROC curve Normalized Gini coefficient RMSLE …

Page 20: How to get started in Kaggle competition

Public leaderboard

Private leaderboard

~10-30% of test data

submission.csv

Submit entry

Page 21: How to get started in Kaggle competition

Choose two entries for final

Practical choice

Best entry in public leaderboard +

Best CV from local entries

Page 22: How to get started in Kaggle competition

Tips

Page 23: How to get started in Kaggle competition

Look at data. Visualize it.

Source https://www.kaggle.com/justfor/liberty-mutual-group-property-inspection-prediction/explore-data/notebook https://www.kaggle.com/odiseo1982/liberty-mutual-group-property-inspection-prediction/compare-variables-between-train-and-test/files

Page 24: How to get started in Kaggle competition

Focus on feature engineering

Feature selection Feature construction

Dates Locations Categories Segmentation Statistics

Page 25: How to get started in Kaggle competition

Build different models

3 target variables 4 cities = Build 12 models

Source https://www.kaggle.com/c/see-click-predict-fix/visualization/1390

Page 26: How to get started in Kaggle competition

Try different algorithms

Random forest Vowpal Wabbit GBM Xgboost

Page 27: How to get started in Kaggle competition

Build ensembles

Average of submissions Weighted average of submissions Ranked average of submissions

Stacked generalization Blending

Page 28: How to get started in Kaggle competition

Keep track of your submissions

Submission id

Page 29: How to get started in Kaggle competition

Next steps

Page 30: How to get started in Kaggle competition

Start competing

Create Kaggle account Choose competition Go for it!

Page 31: How to get started in Kaggle competition

Useful linksKaggle Blog

http://blog.kaggle.com

Kaggle Competitions: Where to begin

http://www.analyticsvidhya.com/blog/2015/06/start-journey-kaggle/

Kaggle Feature Engineering

http://machinelearningmastery.com/discover-feature-engineering-how-to-engineer-features-and-how-to-get-good-at-it/

Kaggle Ensembling Guide

http://mlwave.com/kaggle-ensembling-guide/