uvrgrp ml
David Callender
• Finished in the top 2% (18th out of more than 1,300) in a 3-year, $3 million machine learning competition
• Studied disease propagation in an urban setting using probabilistic graphical models at Dartmouth College
• Studied computational protein design at the University of Washington
• Studied the mathematical foundations of quantum mechanics at Macalester College
Machine Learning in R, circa 2013
David Callender
a.k.a. Using R on Kaggle
• Who will end up in the hospital?
• Drug effectiveness
• Computer security: determining employee access needs
• What will the salary be for a given job advertisement?
Not Just Kaggle
• Movie recommendations
• Product recommendations
• The entire Internet
• Popular productions
• Good business opportunities
• Probably a lot more too
Talk Outline
• Motivation
• Concepts
• Algorithms
  • Decision trees and forests
  • Neural networks
• Kaggle
• Interactive session with R packages
  • randomForest
  • gbm
  • neuralnet
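Supervised Learning
Predicting survival for passengers of the Titanic

Train the model with examples where you know the value of “Survived”:

| Survived | Pclass | Sex | Age | SibSp | Parch | Fare | Embarked |
|---|---|---|---|---|---|---|---|
| 0 | 3 | male | 22 | 1 | 0 | 7.25 | S |
| 1 | 1 | female | 38 | 1 | 0 | 71.2833 | C |
| 1 | 3 | female | 26 | 0 | 0 | 7.925 | S |
| 1 | 1 | female | 35 | 1 | 0 | 53.1 | S |
| 0 | 3 | male | 35 | 0 | 0 | 8.05 | S |
| 0 | 3 | male | 33 | 0 | 0 | 8.4583 | Q |
| 0 | 1 | male | 54 | 0 | 0 | 51.8625 | S |
| 0 | 3 | male | 2 | 3 | 1 | 21.075 | S |
| 1 | 3 | female | 27 | 0 | 2 | 11.1333 | S |
| 1 | 2 | female | 14 | 1 | 0 | 30.0708 | C |

Use the model to predict the value of “Survived”:

| Survived | Pclass | Sex | Age | SibSp | Parch | Fare | Embarked |
|---|---|---|---|---|---|---|---|
| ? | 3 | male | 34.5 | 0 | 0 | 7.8292 | Q |
| ? | 3 | female | 47 | 1 | 0 | 7 | S |
| ? | 2 | male | 62 | 0 | 0 | 9.6875 | Q |
| ? | 3 | male | 27 | 0 | 0 | 8.6625 | S |
| ? | 3 | female | 22 | 1 | 1 | 12.2875 | S |
| ? | 3 | male | 14 | 0 | 0 | 9.225 | S |
| ? | 3 | female | 30 | 0 | 0 | 7.6292 | Q |
| ? | 2 | male | 26 | 1 | 1 | 29 | S |
| ? | 3 | female | 18 | 0 | 0 | 7.2292 | C |
| ? | 3 | male | 21 | 2 | 0 | 24.15 | S |

Variable types: Survived is binary, Age and Fare are numeric, and Sex and Embarked are categorical.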
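The train-then-predict cycle can be sketched in a few lines of Python. This is a minimal sketch, not a model from the talk: the hypothetical `fit` and `predict` helpers learn only the most common “Survived” value for each value of Sex in the ten labelled rows, then apply that rule to the ten unlabelled rows.

```python
from collections import Counter

# (Sex, Survived) pairs from the ten labelled training rows
train = [("male", 0), ("female", 1), ("female", 1), ("female", 1), ("male", 0),
         ("male", 0), ("male", 0), ("male", 0), ("female", 1), ("female", 1)]
# Sex column of the ten unlabelled test rows (Survived = ?)
test_sex = ["male", "female", "male", "male", "female",
            "male", "female", "male", "female", "male"]


def fit(rows):
    """Learn the most common Survived value for each value of Sex."""
    counts = {}
    for sex, survived in rows:
        counts.setdefault(sex, Counter())[survived] += 1
    return {sex: c.most_common(1)[0][0] for sex, c in counts.items()}


def predict(model, sex_values):
    """Apply the learned rule to rows whose Survived value is unknown."""
    return [model[s] for s in sex_values]


model = fit(train)
predictions = predict(model, test_sex)
print(model)        # {'male': 0, 'female': 1}
print(predictions)  # [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]
```

Real models use all the columns; the point here is only the split between fitting on labelled rows and predicting on unlabelled ones.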
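Overfitting
A model that fits its training data too closely, noise included, can score well on that data yet predict new examples poorly.
Figure: overfitting on training set data (http://en.wikipedia.org/wiki/File:Overfitting_on_Training_Set_Data.pdf, Tomaso Poggio)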
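Overfitting can be reproduced on hypothetical toy data (not the figure's data). In this sketch the true rule is “label 1 when x ≥ 0.5”, two training labels are flipped to act as noise, and two models are compared: a 1-nearest-neighbour memorizer that reproduces every training label, and a model that learns a single threshold.

```python
# Hypothetical toy data: the true rule is "label 1 when x >= 0.5", but the
# training labels at x = 0.2 and x = 0.7 are flipped to simulate noise.
train_x = [i / 10 for i in range(10)]
train_y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
test_x = [i / 10 + 0.04 for i in range(10)]
test_y = [1 if x >= 0.5 else 0 for x in test_x]  # clean labels


def accuracy(model, xs, ys):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(xs)


def memorizer(x):
    """1-nearest neighbour: copy the label of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


# Simpler model: the single threshold that gets the most training labels right.
best_t = max(train_x, key=lambda t: sum((x >= t) == y for x, y in zip(train_x, train_y)))


def threshold_model(x):
    return int(x >= best_t)


for name, model in [("memorizer", memorizer), ("threshold", threshold_model)]:
    print(name, accuracy(model, train_x, train_y), accuracy(model, test_x, test_y))
```

The memorizer scores 1.0 on the training data but 0.8 on the test data, because it copies the two noisy labels; the threshold model scores 0.8 on training data and 1.0 on test data.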
Decision Trees
Figure: CART tree of Titanic survivors (http://en.wikipedia.org/wiki/File:CART_tree_titanic_survivors.png, Stephen Milborrow, made using R)
The unlabelled test passengers (Survived = ?) are classified by following each one down the tree, answering each question, from the root to a leaf.
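A tree is nested yes/no questions with a label at each leaf. The sketch below hand-codes a tree shaped like the one in the figure; the splits (sex, age > 9.5, sibsp > 2.5) are our assumption about what the figure shows, not a tree fitted here, and `TREE` and `classify` are hypothetical names. It classifies the ten unlabelled test passengers from the Supervised Learning slide.

```python
# Internal node: (question, branch_if_yes, branch_if_no); leaf: 0 = died, 1 = survived.
# The splits are assumed from the figure, not fitted to data.
TREE = (
    lambda p: p["Sex"] == "male",
    (lambda p: p["Age"] > 9.5,
     0,
     (lambda p: p["SibSp"] > 2.5, 0, 1)),
    1,
)


def classify(node, passenger):
    """Follow yes/no answers from the root until a leaf label is reached."""
    while not isinstance(node, int):
        question, yes, no = node
        node = yes if question(passenger) else no
    return node


# (Sex, Age, SibSp) for the ten unlabelled test passengers
test = [("male", 34.5, 0), ("female", 47, 1), ("male", 62, 0), ("male", 27, 0),
        ("female", 22, 1), ("male", 14, 0), ("female", 30, 0), ("male", 26, 1),
        ("female", 18, 0), ("male", 21, 2)]
predictions = [classify(TREE, {"Sex": s, "Age": a, "SibSp": n}) for s, a, n in test]
print(predictions)  # [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]
```

None of these test passengers is a young boy, so the SibSp question is never reached; a hypothetical 5-year-old boy with no siblings aboard would be classified as a survivor.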
Random Forest (RF)
Training: starting from the labelled training data (Survived, Pclass, Sex, Age, SibSp, Parch, Fare, Embarked), grow many trees. Each tree is trained on a bootstrap sample of the rows (bagging) and uses random subsets of the columns (random sub-spaces).
Prediction: combine the trees' outputs by voting (classification) or averaging (regression).
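The two sources of randomness can be shown in a short sketch. It is a simplification, not the `randomForest` package: each “tree” is a depth-one stump, each stump sees a bootstrap sample of the rows (bagging) and a random subset of `n_features` columns (random sub-spaces), and the forest predicts by majority vote. In R's `randomForest` the column subset is redrawn at every split; a stump has only one split, so the two coincide here. The rows are the Titanic training rows encoded as hypothetical numeric features [Pclass, IsFemale, Age, SibSp, Parch, Fare].

```python
import random


def majority(labels, default):
    """Most common 0/1 label (ties go to 1); `default` if there are no labels."""
    if not labels:
        return default
    return 1 if 2 * sum(labels) >= len(labels) else 0


def fit_stump(rows, labels, features):
    """Depth-one tree: the single split on an allowed feature with fewest errors."""
    overall = majority(labels, 0)
    best, best_err = None, None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] < t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            lp, rp = majority(left, overall), majority(right, overall)
            err = sum(y != lp for y in left) + sum(y != rp for y in right)
            if best_err is None or err < best_err:
                best, best_err = (f, t, lp, rp), err
    return best


def predict_stump(stump, row):
    f, t, lp, rp = stump
    return rp if row[f] >= t else lp


def fit_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bagging: bootstrap sample of rows
        feats = rng.sample(range(d), n_features)     # random sub-space of columns
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    votes = sum(predict_stump(s, row) for s in forest)  # each stump votes 0 or 1
    return 1 if 2 * votes > len(forest) else 0


# Training rows as [Pclass, IsFemale, Age, SibSp, Parch, Fare]
train_rows = [[3, 0, 22, 1, 0, 7.25], [1, 1, 38, 1, 0, 71.2833], [3, 1, 26, 0, 0, 7.925],
              [1, 1, 35, 1, 0, 53.1], [3, 0, 35, 0, 0, 8.05], [3, 0, 33, 0, 0, 8.4583],
              [1, 0, 54, 0, 0, 51.8625], [3, 0, 2, 3, 1, 21.075], [3, 1, 27, 0, 2, 11.1333],
              [2, 1, 14, 1, 0, 30.0708]]
train_labels = [0, 1, 1, 1, 0, 0, 0, 0, 1, 1]
test_rows = [[3, 0, 34.5, 0, 0, 7.8292], [3, 1, 47, 1, 0, 7.0], [2, 0, 62, 0, 0, 9.6875]]

forest = fit_forest(train_rows, train_labels)
preds = [predict_forest(forest, r) for r in test_rows]
print(preds)
```

An odd number of trees avoids tied votes; fixing `seed` makes the forest reproducible.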
AdaBoost & Gradient Boosting
• Initialize a set of weights, one for each training example, all with equal value
• Train a tree on the weighted training examples
• Add the tree to the set of trees
• Make predictions with the set of trees
• Adjust the weights so that the training examples you got wrong have more weight
• Repeat
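The loop above is AdaBoost. A minimal sketch, assuming one-dimensional inputs, labels in {-1, +1}, and decision stumps as the trees: each round fits the stump with the lowest weighted error, gives it a vote of α = ½ ln((1 − err)/err), and multiplies each example's weight by e^(−α·y·h(x)), so misclassified examples gain weight. The data are hypothetical, chosen so that no single stump fits them. Gradient boosting (as in `gbm`) keeps the same add-a-tree loop but fits each new tree to the gradient of the loss (the residuals, for squared error) instead of reweighting examples.

```python
import math


def stump_predict(stump, x):
    """Stump (t, p): predict p when x >= t, otherwise -p."""
    t, p = stump
    return p if x >= t else -p


def best_stump(xs, ys, w):
    """Train on weighted examples: the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for t in sorted(set(xs)):
        for p in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w) if stump_predict((t, p), xi) != yi)
            if err < best_err:
                best, best_err = (t, p), err
    return best, best_err


def adaboost(xs, ys, n_rounds):
    n = len(xs)
    w = [1.0 / n] * n                       # equal initial weights
    ensemble = []                           # the set of (vote, stump) pairs
    for _ in range(n_rounds):
        stump, err = best_stump(xs, ys, w)
        err = max(err, 1e-10)               # guard against log(0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of examples this stump got wrong, lower the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps in the set."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]   # hypothetical data that no single stump can fit
for rounds in (1, 3):
    model = adaboost(xs, ys, rounds)
    print(rounds, sum(predict(model, x) == y for x, y in zip(xs, ys)), "of 6 correct")
```

After one round the best stump gets 4 of 6 examples right; after three rounds the weighted vote gets all 6 right.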
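Logistic Regression, a.k.a. the Perceptron
Diagram: inputs → weighted sum → activation function → output
(Strictly, the classic perceptron uses a step activation; a single unit with the logistic (sigmoid) activation is logistic regression.)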
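The unit in the diagram computes a weighted sum of its inputs and passes it through an activation function. A minimal sketch with the logistic (sigmoid) activation, trained by batch gradient descent on the log loss; the learning rate, epoch count, and one-feature toy data are hypothetical choices for illustration.

```python
import math


def sigmoid(z):
    """Logistic activation: squashes the weighted sum into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b   # weighted sum
    return sigmoid(z)                               # activation function


def train(xs, ys, lr=0.5, epochs=1000):
    """Batch gradient descent on the log loss."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(w, b, x) - y        # d(log loss)/dz for this example
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b


xs = [[-2.0], [-1.0], [1.0], [2.0]]   # hypothetical one-feature data
ys = [0, 0, 1, 1]
w, b = train(xs, ys)
predicted = [int(predict_proba(w, b, x) >= 0.5) for x in xs]
print(predicted)  # [0, 0, 1, 1]
```

Replacing `sigmoid` with a 0/1 step function gives the classic perceptron's output rule, though the perceptron is trained with its own update rule rather than this gradient.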
Multilayer Feed-forward Neural Network
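Stacking such units into layers lets the network represent functions a single unit cannot, XOR being the classic example. The sketch below runs a forward pass through a 2-2-1 network; the weights are hand-picked for illustration, not learned (packages such as `neuralnet` would learn them from data).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One fully connected layer: weighted sums followed by the activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hand-picked (not learned) weights for a 2-2-1 network computing XOR:
# hidden unit 1 acts like OR, hidden unit 2 like NAND, the output like AND.
HIDDEN_W, HIDDEN_B = [[20, 20], [-20, -20]], [-10, 30]
OUT_W, OUT_B = [[20, 20]], [-30]


def forward(x):
    hidden = layer(x, HIDDEN_W, HIDDEN_B)   # input layer -> hidden layer
    return layer(hidden, OUT_W, OUT_B)[0]   # hidden layer -> single output


outputs = [round(forward([x1, x2])) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)  # [0, 1, 1, 0]
```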
R’s Popularity
Tools mentioned in Kaggle user profiles
From a blog entry by Ben Hammer: http://blog.kaggle.com/2011/11/27/kagglers-favorite-tools/
Summary of Recent Competition Winners

| Competition | Position | Algorithm | Other Algs. | Tools |
|---|---|---|---|---|
| Adzuna Salary | 1st | NN* | - | Python GPU |
| Adzuna Salary | 2nd | NN | - | C++ |
| Adzuna Salary | 3rd | NN | NB, SVM, LR | Python |
| Merck | 1st | NN* | - | Python GPU |
| Merck | 2nd | GBM & SVM | RF, PCA, KNN, SVM | R & Python |
| Merck | 3rd | RF & SVM | GBM, NN | R |
Learning More
• Pedro Domingos at the University of Washington
• www.coursera.org/course/machlearning
• www.coursera.org/uw
• “A Few Useful Things to Know about Machine Learning,” Communications of the ACM
• homes.cs.washington.edu/~pedrod
• blog.kaggle.com
• ufldl.stanford.edu/wiki/