Andrey Gulin, "Introduction to MatrixNet"

Moscow, 03.07.2012 Andrey Gulin MatrixNet


DESCRIPTION

Seminar "Using modern information technologies to solve modern particle physics problems" at Yandex's Moscow office, July 3, 2012. Andrey Gulin, developer, Yandex

TRANSCRIPT

Page 1: Andrey Gulin, "Introduction to MatrixNet"

Moscow, 03.07.2012

Andrey Gulin

MatrixNet

Page 2:

Why is this relevant? — CERN event classification problem

— CERN solution quality 94.4%

— Yandex MatrixNet solution quality 95.8%

Page 3:

Machine Learning — Deterministic processes -> programming, computer science etc

— Noisy data -> statistics, machine learning etc

— Supervised / Unsupervised / Semisupervised

— Offline/online learning

Page 4:

ML applications — Yandex: ranking, spam classification, user behavior modeling etc

— CERN: event classification etc

— Finance: fraud detection, credit scoring etc

— …

Page 5:

Binary classification problem — Offline Supervised problem

— Given samples of 2 classes predict the class of unseen sample

— For each sample we know N real-valued features {xi}

Page 6:

Solution quality measures — ROC (receiver operating characteristic) curve

— AUC (area under curve)
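AUC has a direct interpretation as a rank statistic: the probability that a randomly chosen positive sample scores above a randomly chosen negative one (ties counting one half). A minimal sketch of computing it that way (function and variable names are illustrative, not from the talk):

```python
def roc_auc(labels, scores):
    """AUC as a rank statistic: fraction of positive/negative pairs
    where the positive sample scores higher (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```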

Page 7:

Solution quality measures — Precision Recall curve

— BEP (Break Even Point), precision == recall
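At the break-even point precision equals recall, which happens exactly at the cutoff where we keep as many top-scored samples as there are positives, so BEP can be computed without sweeping a full threshold range. A sketch (function name is illustrative):

```python
def break_even_point(labels, scores):
    """Precision == recall when we keep exactly as many top-scored
    samples as there are positives; BEP is the precision at that cutoff."""
    n_pos = sum(labels)
    top = sorted(zip(scores, labels), reverse=True)[:n_pos]
    return sum(y for _, y in top) / n_pos
```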

Page 8:

Solution quality measures — Log likelihood / cross entropy = sum {log P}

— Convex function with derivatives

— Used as a proxy for non-continuous functions like AUC/BEP etc
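A sketch of the cross-entropy proxy for binary labels: the negative mean log likelihood, with clipping so that log(0) never occurs (names and the clipping constant are illustrative):

```python
import math

def log_loss(labels, probs, eps=1e-12):
    """Negative mean log likelihood: smooth, convex, differentiable,
    unlike step-shaped measures such as AUC or BEP."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip away log(0)
        total += math.log(p) if y == 1 else math.log(1 - p)
    return -total / len(labels)
```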

Page 9:

Methods — Nearest Neighbors

— SVM

— Logistic regression (linear regression with logistic transform of the result)

— “Neural” networks = non-linear regression

— Decision Trees

— Boosted Decision Trees

Page 10:

Decision Tree — [slide shows an example tree: root split F1 > 3, with further splits F2 > 3 and F1 > 6 in the subtrees]
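The slide's example tree can be sketched as nested comparisons. The leaf values below are hypothetical placeholders chosen for illustration; only the split conditions come from the slide:

```python
def predict_tree(sample):
    """Evaluate the slide's example tree. Leaf values (0.1, 0.8, 0.5,
    1.0) are hypothetical placeholders, not taken from the talk."""
    if sample["F1"] > 3:
        # Right subtree refines with a second split on F1.
        return 1.0 if sample["F1"] > 6 else 0.5
    # Left subtree splits on F2.
    return 0.8 if sample["F2"] > 3 else 0.1
```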

Page 11:

Bootstrapping — Take N random samples with replacement from original set

— Easy way to estimate all sorts of statistics over the set

— Including building model of the set
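A sketch of estimating a statistic (here the mean) by resampling with replacement; function name, round count, and seeding are illustrative:

```python
import random

def bootstrap_means(data, n_rounds=1000, seed=0):
    """Draw len(data) samples with replacement, n_rounds times, and
    collect the statistic of interest (here the mean) per resample."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_rounds):
        resample = [rng.choice(data) for _ in data]
        means.append(sum(resample) / len(resample))
    return means
```

The spread of the returned values estimates the sampling variability of the mean without any distributional assumptions.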

Page 12:

Boosting — Building strong model as a combination of “weak models”

— Iterative process, on each iteration we

— Approximate current residual with the best “weak model”

— Scale new “weak model” by small number

— Add it to the solution

— Approximating loss function gradient instead of residual gives gradient boosting
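The iteration above can be sketched for squared loss, where the negative gradient is exactly the residual, so gradient boosting and residual boosting coincide. The "weak model" here is a one-feature decision stump; all names and parameters are illustrative:

```python
def fit_stump(x, r):
    """Fit the best single-threshold split of 1-D feature x to residuals r."""
    best = None
    for t in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((ri - (lv if xi <= t else rv)) ** 2
                  for xi, ri in zip(x, r))
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda xi: lv if xi <= t else rv

def boost(x, y, rounds=50, shrinkage=0.1):
    """Each round: fit a stump to the current residuals, scale it by a
    small number (shrinkage), and add it to the ensemble."""
    stumps, pred = [], [0.0] * len(x)
    for _ in range(rounds):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residual)
        stumps.append(stump)
        pred = [pi + shrinkage * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: sum(shrinkage * s(xi) for s in stumps)
```

The shrinkage factor is what the slide calls "scale new weak model by small number"; swapping the residual for the gradient of an arbitrary loss gives gradient boosting.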

Page 13:

Overfitting

Page 14:

Boosting — Greedy selection and scaling produce regularization effect

— If “weak model” is a scaled feature, then Boosting produces L1 regularized solution

— If “weak model” is a greedily constructed decision tree, then Boosting gives a form of hierarchical sparsity constraint

Page 15:

MatrixNet

Page 16:

MatrixNet — MatrixNet is an implementation of the gradient boosted decision trees algorithm

— MatrixNet differs a bit from the standard version:

— Using Oblivious Trees

— Accounting for sample count in each leaf

Page 17:

Oblivious Trees — [slide shows an example: the root splits on F1 > 3, and both depth-2 nodes apply the same split F2 > 3]
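Because every node at a given depth tests the same condition, an oblivious tree of depth d is just a lookup into 2^d leaves, with the leaf index built bit by bit from the split answers. A sketch of evaluation under that structure (names and leaf values are illustrative):

```python
def predict_oblivious(sample, conditions, leaf_values):
    """Oblivious tree: one (feature, threshold) condition per level,
    shared across the whole level. The sample's leaf is the bit index
    formed by the sequence of comparison outcomes."""
    index = 0
    for feature, threshold in conditions:
        index = (index << 1) | (sample[feature] > threshold)
    return leaf_values[index]
```

This index-based evaluation is what makes oblivious trees cheap to apply and easy to vectorize compared to ordinary decision trees.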

Page 18:

Accounting for leaf sample count — Prefer trees with large averages in leaves with many samples

— E.g. multiplying the leaf average by sqrt(N/(N+100)) (N = leaf sample count) produces a better model
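The sqrt(N/(N+100)) damping from the slide can be sketched as below; the function name is illustrative, only the formula comes from the talk. Leaves with few samples are pulled toward zero, while well-populated leaves keep nearly their full average:

```python
import math

def leaf_value(residuals_in_leaf):
    """Leaf average shrunk by sqrt(N / (N + 100)), per the slide:
    small-N leaves are damped heavily, large-N leaves barely at all."""
    n = len(residuals_in_leaf)
    avg = sum(residuals_in_leaf) / n
    return avg * math.sqrt(n / (n + 100.0))
```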

Page 19:

Questions?