Andrey Gulin, "Introduction to MatrixNet"
DESCRIPTION
Seminar "Using modern information technologies to solve modern problems in particle physics" at the Yandex Moscow office, July 3, 2012. Andrey Gulin, developer, Yandex.
TRANSCRIPT
Moscow, 03.07.2012
Andrey Gulin
MatrixNet
Why is this relevant? — CERN event classification problem
— CERN solution quality 94.4%
— Yandex MatrixNet solution quality 95.8%
Machine Learning — Deterministic processes -> programming, computer science etc
— Noisy data -> statistics, machine learning etc
— Supervised / Unsupervised / Semisupervised
— Offline/online learning
ML applications — Yandex: ranking, spam classification, user behavior modeling etc
— CERN: event classification etc
— Finance: fraud detection, credit scoring etc
— …
Binary classification problem — Offline Supervised problem
— Given samples of 2 classes, predict the class of an unseen sample
— For each sample we know N real-valued features {x_i}
Solution quality measures — ROC (receiver operating characteristic) curve
— AUC (area under curve)
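A minimal Python sketch (not part of the talk) of the AUC definition: the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as half. The O(pos × neg) pairwise loop and the toy data are for illustration only.

    def auc(labels, scores):
        # AUC = probability that a random positive is scored above a random negative.
        # Pairwise O(n_pos * n_neg) version, fine for tiny samples.
        pos = [s for y, s in zip(labels, scores) if y == 1]
        neg = [s for y, s in zip(labels, scores) if y == 0]
        wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
        return wins / (len(pos) * len(neg))

    print(auc([1, 0, 1, 0, 1], [0.9, 0.3, 0.8, 0.6, 0.4]))  # -> 0.8333...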
Solution quality measures — Precision Recall curve
— BEP (Break Even Point), precision == recall
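Likewise, a small illustrative sketch of the precision/recall break-even point: sweep the score threshold from the top score down and report the point where precision and recall are closest. Function name and toy data are made up.

    def break_even_point(labels, scores):
        # Sweep thresholds in decreasing score order; return (precision, recall)
        # at the cutoff where the two are closest to equal.
        order = sorted(range(len(scores)), key=lambda i: -scores[i])
        total_pos = sum(labels)
        tp, best = 0, None
        for k, i in enumerate(order, start=1):
            tp += labels[i]
            precision, recall = tp / k, tp / total_pos
            if best is None or abs(precision - recall) < abs(best[0] - best[1]):
                best = (precision, recall)
        return best

    print(break_even_point([1, 0, 1, 0, 1], [0.9, 0.3, 0.8, 0.6, 0.4]))  # -> (0.667, 0.667)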
Solution quality measures — Log likelihood / cross entropy = sum {log P}
— Convex function with derivatives
— Used as a proxy for non-continuous functions like AUC/BEP etc
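An illustrative computation of the cross-entropy (negative log-likelihood), the smooth, differentiable proxy the slide refers to; the clipping constant is an assumption added to avoid log(0).

    import math

    def cross_entropy(labels, probs):
        # Average negative log-likelihood for binary labels; smooth and convex,
        # so it can stand in for step-like metrics such as AUC or BEP during training.
        eps = 1e-12  # clip to avoid log(0)
        return -sum(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
                    for y, p in zip(labels, probs)) / len(labels)

    print(cross_entropy([1, 0, 1], [0.9, 0.2, 0.7]))  # -> ~0.228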
Methods — Nearest Neighbors
— SVM
— Logistic regression (linear regression with logistic transform of the result)
— “Neural” networks = non-linear regression
— Decision Trees
— Boosted Decision Trees
Decision Tree — [slide diagram: example tree splitting on F1 > 3, then F2 > 3 and F1 > 6]
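A hand-written prediction function in the spirit of the tree on the slide; the exact arrangement of the child splits and the leaf values are not recoverable from the transcript, so both are assumptions for illustration.

    def tree_predict(f1, f2):
        # Root split on F1 > 3; child splits F1 > 6 and F2 > 3 as on the slide.
        # Leaf values are made up for illustration.
        if f1 > 3:
            return 0.8 if f1 > 6 else 0.6
        else:
            return 0.4 if f2 > 3 else 0.1

    print(tree_predict(f1=7.0, f2=1.0))  # falls into the F1 > 3, F1 > 6 leaf -> 0.8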
Bootstrapping — Take N random samples with replacement from the original set
— Easy way to estimate all sorts of statistics over the set
— Including building a model of the set (see the sketch below)
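A minimal sketch of bootstrapping as described above: resample the data set with replacement and recompute any statistic on each resample. All names and the toy data are illustrative.

    import random

    def bootstrap_estimate(data, statistic, n_boot=1000, seed=0):
        # Estimate the spread of any statistic by recomputing it on
        # resampled-with-replacement copies of the original data set.
        rng = random.Random(seed)
        estimates = []
        for _ in range(n_boot):
            resample = [rng.choice(data) for _ in data]  # N samples with replacement
            estimates.append(statistic(resample))
        return estimates

    data = [1.2, 0.7, 3.4, 2.2, 1.9, 0.5]
    means = bootstrap_estimate(data, lambda xs: sum(xs) / len(xs))
    print(min(means), max(means))  # rough spread of the sample mean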
Boosting — Building a strong model as a combination of “weak models”
— Iterative process; on each iteration we
— Approximate the current residual with the best “weak model”
— Scale the new “weak model” by a small number
— Add it to the solution
— Approximating the loss function gradient instead of the residual gives gradient boosting (sketched below)
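A compact sketch of gradient boosting for squared loss, where the negative gradient equals the residual and the “weak model” is a single-split decision stump; function names, the shrinkage value and the iteration count are assumptions, not the talk's code.

    def fit_stump(x, residual):
        # Best single-feature threshold split minimizing squared error on the residual.
        best = None
        n_features = len(x[0])
        for f in range(n_features):
            for threshold in sorted({row[f] for row in x}):
                left = [r for row, r in zip(x, residual) if row[f] <= threshold]
                right = [r for row, r in zip(x, residual) if row[f] > threshold]
                if not left or not right:
                    continue
                lv, rv = sum(left) / len(left), sum(right) / len(right)
                err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
                if best is None or err < best[0]:
                    best = (err, f, threshold, lv, rv)
        _, f, threshold, lv, rv = best
        return lambda row: lv if row[f] <= threshold else rv

    def boost(x, y, n_iter=100, shrinkage=0.1):
        # Gradient boosting for squared loss: the negative gradient is the residual,
        # so each iteration fits a stump to what is still unexplained,
        # scales it by a small number and adds it to the solution.
        models = []
        predictions = [0.0] * len(y)
        for _ in range(n_iter):
            residual = [yi - pi for yi, pi in zip(y, predictions)]
            stump = fit_stump(x, residual)
            models.append(stump)
            predictions = [p + shrinkage * stump(row) for p, row in zip(predictions, x)]
        return lambda row: sum(shrinkage * m(row) for m in models)

    x = [[1, 5], [2, 1], [6, 2], [7, 8]]
    y = [0.0, 0.0, 1.0, 1.0]
    model = boost(x, y)
    print([round(model(row), 2) for row in x])  # close to [0.0, 0.0, 1.0, 1.0]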
Overfitting
Boosting — Greedy selection and scaling produce a regularization effect
— If the “weak model” is a scaled feature, then boosting produces an L1-regularized solution (see the stagewise sketch below)
— If the “weak model” is a greedily constructed decision tree, then boosting gives a form of hierarchical sparsity constraint
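To illustrate the first point, a forward-stagewise sketch where the weak model is one scaled feature per iteration; this procedure is known to closely track an L1 (lasso) regularization path. Names, the step size, and the toy data are illustrative.

    def stagewise_linear_boost(x, y, n_iter=500, step=0.01):
        # Boosting with a single scaled feature as the weak model: on each step
        # nudge the weight of the feature most correlated with the residual.
        n_features = len(x[0])
        weights = [0.0] * n_features
        for _ in range(n_iter):
            residual = [yi - sum(w * xf for w, xf in zip(weights, row))
                        for yi, row in zip(y, x)]
            corr = [sum(row[f] * r for row, r in zip(x, residual))
                    for f in range(n_features)]
            f = max(range(n_features), key=lambda j: abs(corr[j]))
            weights[f] += step if corr[f] > 0 else -step
        return weights

    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    y = [2.0, -1.0, 1.0, 0.0]
    print([round(w, 1) for w in stagewise_linear_boost(x, y)])  # approximately [2.0, -1.0]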
MatrixNet
MatrixNet — MatrixNet is an implementation of the gradient boosted decision trees algorithm
— MatrixNet is a bit different from the standard approach:
— Using oblivious trees
— Accounting for sample count in each leaf
Oblivious Trees — [slide diagram: a tree where every node on the same level uses the same split, e.g. F1 > 3 at the root and F2 > 3 for both nodes of the next level]
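A small sketch of how an oblivious tree evaluates a sample: because every node on a level shares the same (feature, threshold) test, the leaf index is simply the binary code of the split outcomes. The splits below loosely follow the slide; leaf values are made up.

    def oblivious_tree_predict(row, splits, leaf_values):
        # Every level tests the same (feature, threshold), so the leaf index
        # is the bit string of split outcomes.
        index = 0
        for feature, threshold in splits:
            index = (index << 1) | (row[feature] > threshold)
        return leaf_values[index]

    splits = [(0, 3.0), (1, 3.0)]          # (feature index, threshold) per level
    leaf_values = [0.1, 0.4, 0.6, 0.9]     # one value per 2**levels leaves
    print(oblivious_tree_predict([5.0, 2.0], splits, leaf_values))  # -> 0.6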
Accounting for leaf sample count — Prefer trees with a large average in leaves with many samples
— E.g. multiplying the leaf average by sqrt(N/(N+100)) (N is the leaf sample count) produces a better model (see the sketch below)
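A sketch of the leaf-value damping mentioned above: shrink the leaf average by sqrt(N/(N+100)) so that leaves with few samples contribute less. The constant 100 is the one quoted on the slide; treating it as a tunable parameter is an assumption.

    import math

    def damped_leaf_value(residuals_in_leaf):
        # Leaf value shrunk toward zero when the leaf holds few samples:
        # average * sqrt(N / (N + 100)).
        n = len(residuals_in_leaf)
        if n == 0:
            return 0.0
        average = sum(residuals_in_leaf) / n
        return average * math.sqrt(n / (n + 100))

    print(damped_leaf_value([1.0] * 5))     # few samples  -> ~0.218
    print(damped_leaf_value([1.0] * 5000))  # many samples -> ~0.990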
Questions?