Hot Topics in Machine Learning (or how to win a Kaggle competition)
TRANSCRIPT
Benedikt Wilbertz, Trendiction S.A., Luxembourg
June 17, 2016
Benedikt Wilbertz (Trendiction) Hot Topics in Machine Learning June 17, 2016 1 / 41
Introduction
Setting for Supervised Learning
$\mathcal{Y}$: prediction space (labels for classification, $\mathbb{R}^K$ for regression)
$Y$: $\mathcal{Y}$-valued random variable to predict
$\mathcal{X}$: typically $\mathcal{X} = \mathbb{R}^d$, the space of predictors (aka features)
$X$: $\mathcal{X}$-valued random variable modeling the distribution of the predictors
$\mathcal{N}$: $\mathcal{N} = (\mathcal{Y} \times \mathcal{X})^N$, the space containing all training samples of size $N$
$N$: random variable representing a training sample of size $N$, independent of $X$ and $Y$
All random variables $X, Y, N$ are defined on a joint probability space $(\Omega, \mathcal{S}, P)$. Let $f_N$ be a model trained on a realization of the random variable $N$ (i.e. a random sample from $\mathcal{Y} \times \mathcal{X}$ of size $N$). The optimal model in the least-squares sense is then given by:
Supervised Learning Problem

$$\mathbb{E}\,(Y - f_N(X))^2 \to \min_{f_N \in \mathrm{models}(N)}$$
Introduction
Error decomposition
In order to assess the performance of a prediction $f_N(X)$ which was trained on a random sample of $N$ observations from $\mathcal{Y} \times \mathcal{X}$, we fix a predictor value $x := X(\omega)$ and derive for the mean squared error:

$$\begin{aligned}
\mathrm{MSE}(x) &:= \mathbb{E}\big([Y - f_N(X)]^2 \,\big|\, X = x\big) = \ldots \\
&= \underbrace{\mathbb{E}\big([Y - \mathbb{E}(Y|X=x)]^2 \,\big|\, X = x\big)}_{\sigma^2(Y|X=x)\ \text{(irreducible error)}}
+ \underbrace{\big[\mathbb{E}(Y|X=x) - \mathbb{E}(f_N(X)|X=x)\big]^2}_{\text{(model bias)}^2} \\
&\quad + \underbrace{\mathbb{E}\big([f_N(X) - \mathbb{E}(f_N(X)|X=x)]^2 \,\big|\, X = x\big)}_{\sigma^2(f_N(X)|X=x)\ \text{(model variance)}}
\end{aligned}$$
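This decomposition can be checked numerically. The following is a minimal Monte Carlo sketch (toy data and model, not from the talk): $Y = \sin(X) + \text{noise}$ is fitted by a deliberately underpowered linear model, and the bias and variance terms are estimated over many training realizations of $N$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: Y = sin(X) + noise, model = degree-1 polynomial fit.
def sample_model(n=30):
    X = rng.uniform(0, 3, n)
    Y = np.sin(X) + rng.normal(0, 0.3, n)
    return np.polyfit(X, Y, deg=1)          # train f_N on one realization of N

x = 1.5                                      # fixed predictor value x := X(omega)
preds = np.array([np.polyval(sample_model(), x) for _ in range(2000)])

noise_var = 0.3 ** 2                         # irreducible error sigma^2(Y | X=x)
bias2 = (np.sin(x) - preds.mean()) ** 2      # squared model bias
var = preds.var()                            # model variance
mse = noise_var + bias2 + var                # MSE(x) per the decomposition above
```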
Deep Learning
Neural Networks and Deep Learning
Convolutional Networks
Very popular in the late 80s and 90s:
- 7 layers
- 60k parameters
Training
Stochastic Gradient Descent
Deep Learning
Neural Networks and Deep Learning
Renaissance of neural networks in 2012
Krizhevsky et al. ’12: ImageNet Classification with Deep Convolutional Neural Networks
trained on 1.2 million labeled images
highly optimized GPU code
achieved new state-of-the-art results for classification over 1000 object classes.
Deep Learning
Neural Networks and Deep Learning
GoogLeNet (Szegedy et al. ’14)
- 27 layers deep
- 1.5 GFLOP per forward pass
- 7M parameters
Deep Learning
Neural Networks and Deep Learning
Pushing deep learning to the limits
He et al. ’16: residual networks with 1k layers and 10M parameters
But what makes the difference from the 90s (apart from sample/parameter size)?
- Data augmentation and bootstrapping
- Dropout layers
- ReLU activation
- Fast GPUs
Improvements on the training:
- Regularization
- Nesterov/AdaGrad/AdaDelta/Adam variants of SGD
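One of the listed SGD variants, Adam, can be sketched in a few lines. The hyperparameter defaults below are the commonly used ones; the quadratic objective is just an illustrative choice.

```python
import numpy as np

# Minimal Adam update sketch (one of the SGD variants listed above).
def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad                 # 1st-moment (momentum) estimate
    v = b2 * v + (1 - b2) * grad ** 2            # 2nd-moment estimate
    m_hat = m / (1 - b1 ** t)                    # bias correction
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Usage: minimize f(w) = w^2 (gradient 2w), starting from w = 1.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.05)
```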
Deep Learning
Neural Networks and Deep Learning
Software packages
Hardware
NVIDIA GTX 1080: 8.8 TFLOPS for 800 EUR
CUDA 8.0 and Pascal architecture: half-precision arithmetic (FP16) will double the computing power
Google’s TPUs (Tensor Processing Units): custom ASICs accelerating certain operations in TensorFlow
Deep Learning
Neural Networks and Deep Learning
Applications
- Processing of 40M images/day (600 img/s) from social media for logo/brand recognition (only 0.5% contain a logo)
Gradient Boosting
Trees and Boosting
History
Random Forest (Breiman ’97)
Gradient Tree Boosting (Friedman ’99)
Gradient Tree Boosting + Regularization (XGBoost)
Basic idea of tree ensembles
Model:

$$\hat{y} = \sum_{k=1}^{K} f_k(x), \qquad f_k \in \mathcal{F}$$

Tree: $f_k(x) = w_{q(x)}$, with leaf weights $w \in \mathbb{R}^T$ and structure map $q: \mathbb{R}^d \to \{1, 2, \ldots, T\}$
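A minimal sketch of this representation (toy stumps with invented thresholds): each tree is a structure map $q$ plus a leaf-weight vector $w$, and the ensemble simply sums the $K$ leaf lookups.

```python
# A regression tree as (q, w): q maps x to a leaf index, w holds leaf weights.
# Toy example: stumps splitting on feature 0; thresholds/weights are made up.
def make_stump(threshold, w_left, w_right):
    q = lambda x: 0 if x[0] < threshold else 1   # leaf index in {0, 1}
    w = [w_left, w_right]
    return lambda x: w[q(x)]                     # f_k(x) = w_{q(x)}

trees = [make_stump(0.5, -1.0, 1.0), make_stump(1.5, 0.2, -0.2)]
predict = lambda x: sum(f(x) for f in trees)     # y_hat = sum over k of f_k(x)
```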
Gradient Boosting
Trees and Boosting
Optimizing the tree structure
Objective:

$$\min \; \sum_{i=1}^{n} l(\hat{y}_i, y_i) + \sum_{k=1}^{K} \Omega(f_k)$$

with regularization $\Omega(f_k) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$ and a general loss function $l$.

Problem: Tree construction is a batch process, so we cannot apply an online method like SGD.
Gradient Boosting
Trees and Boosting
Additive Training (Boosting)
Start from a constant prediction, add a new function each time:

$$\begin{aligned}
\hat{y}_i^{(0)} &= 0 \\
\hat{y}_i^{(1)} &= f_1(x_i) = \hat{y}_i^{(0)} + f_1(x_i) \\
\hat{y}_i^{(2)} &= f_1(x_i) + f_2(x_i) = \hat{y}_i^{(1)} + f_2(x_i) \\
&\;\;\vdots \\
\hat{y}_i^{(t)} &= \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i)
\end{aligned}$$
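A toy gradient-boosting loop for squared loss illustrates the additive scheme: each round fits a small base learner (here a hypothetical constant-leaf stump) to the current residual $y - \hat{y}^{(t-1)}$ and adds a shrunken copy of it.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0, 4, 200)
y = np.sin(X) + rng.normal(0, 0.1, 200)

def fit_stump(X, r):
    # Best single split on X, predicting the mean residual in each half.
    best = None
    for s in np.quantile(X, np.linspace(0.1, 0.9, 17)):
        left, right = r[X < s].mean(), r[X >= s].mean()
        err = ((np.where(X < s, left, right) - r) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, s, left, right)
    _, s, lo, hi = best
    return lambda Z: np.where(Z < s, lo, hi)

pred = np.zeros_like(y)                 # y_hat^(0) = 0
for t in range(100):
    f_t = fit_stump(X, y - pred)        # fit the residual of y_hat^(t-1)
    pred += 0.1 * f_t(X)                # y_hat^(t) = y_hat^(t-1) + shrunken f_t
```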
Gradient Boosting
Trees and Boosting
The prediction at round $t$ is $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$.

Using a 2nd-order Taylor expansion of $l$, this can be applied to any smooth loss function, such as Euclidean loss (regression), softmax loss (classification), NDCG (ranking problems), etc.

Using the gradient $g_i$ and Hessian $h_i$ of $l$, an optimal tree is grown (stopping when the regularized gain becomes negative), which optimizes in iteration $t$:

$$\begin{aligned}
\min \; & \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t) \\
\approx\; & \sum_{i=1}^{n} \Big[ l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t) + \text{const}
\end{aligned}$$

(Explicit solution for the leaf weights $w$; splits are constructed in a greedy way.)
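For squared loss (so $g_i = \hat{y}_i - y_i$ and $h_i = 1$), the explicit leaf-weight solution $w^* = -G/(H+\lambda)$ and the greedy split criterion can be sketched as follows; the $\lambda$ and $\gamma$ values and the data are illustrative.

```python
import numpy as np

lam, gamma = 1.0, 0.0  # illustrative regularization constants

def leaf_weight(g, h):
    # Explicit solution: w* = -G / (H + lambda), with G, H leaf sums
    return -g.sum() / (h.sum() + lam)

def split_gain(g, h, mask):
    # Gain of a candidate split: 1/2 [score(L) + score(R) - score(parent)] - gamma
    score = lambda g, h: g.sum() ** 2 / (h.sum() + lam)
    return 0.5 * (score(g[mask], h[mask]) + score(g[~mask], h[~mask])
                  - score(g, h)) - gamma

# Squared loss at prediction 0: g_i = -y_i, h_i = 1 (toy targets)
y = np.array([1.0, 1.2, -1.0, -0.8])
g, h = -y, np.ones_like(y)
mask = y > 0                                  # candidate split
```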
Gradient Boosting
Trees and Boosting
Software packages
Ensembles
Model ensembles
Stacking
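The stacking diagram itself is not reproduced in the transcript; the scheme can be sketched as follows (toy data, two hypothetical polynomial base learners): level-1 models produce out-of-fold predictions, and a level-2 model learns to blend them.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(0, 4, 300)
y = np.sin(X) + rng.normal(0, 0.1, 300)

# Two hypothetical base learners: a linear fit and a cubic fit.
def fit_poly(deg):
    return lambda Xtr, ytr: (lambda Z: np.polyval(np.polyfit(Xtr, ytr, deg), Z))

base = [fit_poly(1), fit_poly(3)]

# Out-of-fold predictions: each base model predicts points it was NOT trained on.
folds = np.arange(len(X)) % 5
Z = np.zeros((len(X), len(base)))
for k in range(5):
    tr, te = folds != k, folds == k
    for j, fit in enumerate(base):
        Z[te, j] = fit(X[tr], y[tr])(X[te])

# Level-2 model: least-squares blend of the out-of-fold predictions.
w, *_ = np.linalg.lstsq(Z, y, rcond=None)
blend = Z @ w
```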
Kaggle
Kaggle
Kaggle The Competition
The Task
Kaggle The Competition
Task / Rules
Timeframe: Oct 2015 – Feb 2016
Multiple-choice questions with 4 answers (no-information rate: 25%)
Trainingset: 2500 questions
Validationset: 8192 questions
Public Leaderboard based on 12.5%
Mandatory model submission one week before end
Final testset (12000 new questions) released one week before end
Private Leaderboard based only on these new questions
External data explicitly allowed
800 teams participated in stage I
170 in stage II
Kaggle The Competition
Challenges
What do you need to solve this problem?
External Data (Wikipedia, CK12, etc.)
NLP knowledge
Search infrastructure (Elasticsearch/Lucene)
Feature engineering and machine learning
Kaggle IBM Watson
Invitation for IBM Watson
Kaggle IBM Watson
Decline to Participate
Competition Questions
Examples
Question: A scientist claims he has found a cure for a skin disease. After publishing the results, the experiment was found to be biased. Why did publishing the results allow bias to be recognized within the experiment?
a) It allowed others to replicate the experiment.
b) It helped the scientist gain notoriety within his field.
c) It allowed the cure to be manufactured by the best company.
d) It helped other researchers find out more about the skin disease.
Our Answer: a) (509.6, 461.9, 427.0, 495.6)
Correct Answer: a)
Competition Questions
Examples
Question: What is the primary function of skin cells?
a) to deliver messages to the brain
b) to generate movement of muscles
c) to provide a physical barrier to the body
d) to produce carbohydrates for energy
Our Answer: c) (253.0, 261.9, 302.3, 277.0)
Correct Answer: c)
Competition Questions
Examples
Question: Which of the following would be most useful for calculating the density of a rock sample?
a) microscope and balance
b) graduated cylinder and balance
c) microscope and graduated cylinder
d) beaker and graduated cylinder
Our Answer: b) (267.5, 276.4, 271.3, 275.8)
Correct Answer: b)
Competition Our Approach
Information Retrieval IR
Idea: For each answer a)-d), create pairs of question + answer and score these 4 pairs in a search engine. The pair with the highest score wins.
Example
Put (This is a question) AND (this is an answer) into Google and rank by the number of hits.
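A toy sketch of this idea, assuming a three-sentence stand-in corpus instead of a real search engine: each question+answer pair is scored by counting "documents" that contain words from both the question and the answer, and the highest-scoring answer wins.

```python
# Toy corpus standing in for a real search index (sentences are made up).
corpus = [
    "publishing results allows others to replicate the experiment",
    "a replicated experiment can reveal bias",
    "skin disease is studied by researchers",
]

def score(question, answer):
    # Count corpus sentences containing a question word AND an answer word.
    q, a = set(question.lower().split()), set(answer.lower().split())
    return sum(1 for s in corpus
               if q & set(s.split()) and a & set(s.split()))

question = "why did publishing the results allow bias to be recognized"
answers = ["it allowed others to replicate the experiment",
           "it helped the scientist gain notoriety"]
best = max(answers, key=lambda a: score(question, a))
```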
Competition Our Approach
TF/IDF
Term frequency-inverse document frequency (tf-idf) is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.

Definition

$$\mathrm{TFIDF}(t, d, D) := f_{t,d} \cdot \mathrm{IDF}(t, D),$$

where $f_{t,d}$ is the frequency of term $t$ in document $d$ and

$$\mathrm{IDF}(t, D) = \log \frac{N}{|\{d \in D : t \in d\}|}$$

with $N$ being the total number of documents in the corpus.
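A direct transcription of this definition, over an invented three-document toy corpus:

```python
import math

def tfidf(t, d, D):
    f_td = d.count(t)                                  # term frequency f_{t,d}
    n_t = sum(1 for doc in D if t in doc)              # documents containing t
    idf = math.log(len(D) / n_t) if n_t else 0.0       # IDF(t, D)
    return f_td * idf

# Toy corpus: each document is a list of tokens.
D = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
```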
Competition Our Approach
BM25
Okapi BM25 (BM stands for Best Matching) is a ranking function used by search engines to rank matching documents according to their relevance to a given search query. It is based on the probabilistic retrieval framework developed in the 1970s and 1980s by Stephen E. Robertson, Karen Spärck Jones, and others.

Definition

$$\mathrm{BM25}(t, d, D) := \mathrm{IDF}(t, D) \cdot \frac{f_{t,d} \cdot (k_1 + 1)}{f_{t,d} + k_1 \cdot \left(1 - b + b \cdot \frac{|d|}{\mathrm{avgdl}}\right)},$$

where

$$\mathrm{IDF}(t, D) = \log \frac{N - n(t) + 0.5}{n(t) + 0.5},$$

$|d|$ is the length of document $d$, avgdl the average document length, and typically $k_1 \in [1.2, 2.0]$ and $b = 0.75$.
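The same in code, again over an invented toy corpus; note that this IDF variant goes negative for terms that occur in most documents, which the example below exercises.

```python
import math

k1, b = 1.2, 0.75  # typical parameter values

def bm25(t, d, D):
    N = len(D)
    n_t = sum(1 for doc in D if t in doc)               # n(t)
    idf = math.log((N - n_t + 0.5) / (n_t + 0.5))
    f_td = d.count(t)
    avgdl = sum(len(doc) for doc in D) / N
    return idf * f_td * (k1 + 1) / (f_td + k1 * (1 - b + b * len(d) / avgdl))

# Toy corpus: each document is a list of tokens.
D = [["graduated", "cylinder", "and", "balance"],
     ["microscope", "and", "balance"],
     ["density", "equals", "mass", "over", "volume"]]
```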
Competition Our Approach
Word embeddings
Word embeddings (word2vec, GloVe) are shallow, two-layer neural networks that are trained to reconstruct the linguistic contexts of words: the network is shown a word and must guess which words occurred in adjacent positions in an input text.

They build up a mapping $f_{\mathrm{emb}}: W \to \mathbb{R}^d$, where $d$ typically has size 100 or 300.

One important feature of this mapping is that semantically close words are mapped to similar locations of the $d$-dimensional vector space. This even allows doing some kind of arithmetic on words, e.g.

$$f_{\mathrm{emb}}(\text{Berlin}) - f_{\mathrm{emb}}(\text{Germany}) + f_{\mathrm{emb}}(\text{Italy}) \approx f_{\mathrm{emb}}(\text{Rome})$$
Problem
How to score question + answer? Sum/Average/Weighted by IDF?
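One simple option for the scoring question above: average the word vectors of the question and of each answer, then compare by cosine similarity. The 3-d "embeddings" below are invented stand-ins for real word2vec/GloVe vectors.

```python
import numpy as np

# Toy 3-d "embeddings" (real ones would come from word2vec/GloVe).
emb = {"skin": [1, 0, 0], "cells": [0.9, 0.1, 0], "barrier": [0.8, 0.2, 0],
       "muscles": [0, 1, 0], "brain": [0, 0, 1]}
emb = {w: np.array(v, float) for w, v in emb.items()}

def avg_vec(words):
    vs = [emb[w] for w in words if w in emb]      # skip out-of-vocabulary words
    return np.mean(vs, axis=0) if vs else np.zeros(3)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

q = avg_vec("function of skin cells".split())
answers = {"barrier": "physical barrier", "muscles": "movement of muscles"}
best = max(answers, key=lambda a: cosine(q, avg_vec(answers[a].split())))
```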
Competition Our Approach
PMI
Pointwise mutual information (PMI) is a measure of association used in information theory and statistics.
Definition
$$\mathrm{pmi}(x; y) := \log \frac{p(x, y)}{p(x)\,p(y)} = \log \frac{p(x|y)}{p(x)} = \log \frac{p(y|x)}{p(y)}.$$
Choosing $p(x, y)$ as the probability of the co-occurrence of words $x$ and $y$, we can use this measure to compare each single word in the question to all the words of the answers. The average (or median) of all these scores is then taken as the overall score of a question-answer pair.
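A sketch of this, estimating the probabilities from sentence-level co-occurrence counts in an invented toy corpus:

```python
import math
from collections import Counter
from itertools import combinations

# Toy corpus: PMI is estimated from sentence-level co-occurrence counts.
sentences = [["cylinder", "balance", "density"],
             ["cylinder", "density", "volume"],
             ["balance", "mass"],
             ["brain", "neuron"]]

uni, pair = Counter(), Counter()
for s in sentences:
    words = set(s)
    uni.update(words)
    pair.update(frozenset(p) for p in combinations(sorted(words), 2))

n = len(sentences)

def pmi(x, y):
    pxy = pair[frozenset((x, y))] / n               # p(x, y)
    if pxy == 0:
        return float("-inf")                        # never co-occurred
    return math.log(pxy / ((uni[x] / n) * (uni[y] / n)))
```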
Competition Our Approach
Feature Hashing
Embedding
Use a hashing algorithm of fixed length (say 4096) in order to encode words/sentences as fixed-length vectors.
Learning
(Motivated by T. Mikolov’s negative sampling in word2vec)
Using Quizlet’s flashcards, we generated an extended dataset:
N positive term-definition pairs
3N negative term-definition pairs (i.e. a term paired with a random definition)
Train a binary classifier using XGBoost with max depth 10 and 1000s of rounds.
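The hashing trick behind the embedding step can be sketched like this, using Python's built-in `hash` as a stand-in for a proper hash function; the signed update is a common variant to reduce collision bias.

```python
import numpy as np

DIM = 4096  # fixed hash length, as on the slide

def hash_vector(text):
    # Hashing trick: index each token by hash(token) mod DIM.
    v = np.zeros(DIM)
    for tok in text.lower().split():
        h = hash(tok)                      # stand-in for a proper hash function
        v[h % DIM] += 1 if h >= 0 else -1  # signed update reduces collision bias
    return v

# A term-definition pair becomes one fixed-length feature vector.
term, definition = "mitochondria", "organelle that produces energy"
pair_vec = np.concatenate([hash_vector(term), hash_vector(definition)])
```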
Competition Our Approach
Final Learning
Competition Our Approach
14 days to go. . .
Competition Last minute changes
Large Scale XGBoost
Pushing XGBoost to the limits. . .
50M quizlet cards
3 + 1 negative sampling yields 200M observations
Feature hashing produces a sparse matrix with 2,147,863,398 entries (just above R’s long-vector limit of 2^31 − 1)
Result from XGBoost:
long vectors not supported yet: ../../src/include/Rinlinedfuns.h:137
Running with 150M samples was fine, but needed a fast machine to finish before the competition deadline. . .
Competition Last minute changes
Last Minute Learning
Competition Last minute changes
Public Leaderboard / Model submission deadline
Competition Competitors
Cardal
Competition Competitors
Cardal’s approach
Data sources
Wikipedia, CK12, Quizlet, StudyStack, Saylor, Openstax, UtahOER, misc sources from AI2/Aristo
Processing
Hand-written parsers for all the sources (regex!!)
Uses 4 different stemmers
28 sets of features
Lucene Search/Scoring plus homebrewed search/score
Learning
Gradient boosting
Ensemble of 6 models, each uses its own feature mix
Lots of hand-tuned parameters
Competition Results
Private Leaderboard
Competition Aftermath
Private Leaderboard
Competition Summary
Summary
THANK YOU!