SGN-41006 (4 cr) Chapter 6 (+ Extra Material): Non-Linear Discriminant Analysis

Jussi Tohka & Jari Niemi
Department of Signal Processing, Tampere University of Technology
January 28, 2014




Contents of This Lecture

1 Non-Linear Techniques (slides by Andrew W. Moore)
   Support Vector Machines (Linear and Non-Linear)
   Non-Linear Basis Functions
   Non-Linear Kernel Methods
   Non-Linear Logistic Regression

2 Linear Regression
   Ridge Regression
   SVM Regression

3 Regularization
   SVM in Regularization


Material

Chapter 6 (excluding 6.4), Sections 1.7 and 5.4.5 in WebCop:2011; Sections 3.4, 4.4.4, and 12.3.2 in HasTibFri:2009; slides by Andrew W. Moore (available at Moodle).


What Should You Already Know?

Regression, Logistic regression, Linear Discriminant Analysis



Linear Regression and Classification

Regression: the response variables are continuous (instead of categorical, as in classification).

For modelling the relations between continuous variables:

y = w_0 + w^T x.

We try to learn (w_0, w) = β given the (y_i, x_i) pairs.

Solution (ordinary least squares): β̂ = (X^T X)^{-1} X^T y
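The closed-form solution can be sketched with NumPy on toy data; the data and coefficient values below are illustrative, not from the lecture:

```python
import numpy as np

# Toy data from a known linear model: y = w0 + w^T x with
# w0 = 1.0 and w = (2.0, -3.0) (illustrative values, no noise).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + X @ np.array([2.0, -3.0])

# Augment X with a column of ones so that beta = (w0, w) is
# estimated jointly, as on the slide.
Xa = np.column_stack([np.ones(len(X)), X])

# Normal-equation solution: beta_hat = (X^T X)^{-1} X^T y.
# (np.linalg.solve avoids forming the inverse explicitly.)
beta_hat = np.linalg.solve(Xa.T @ Xa, Xa.T @ y)
print(beta_hat)  # recovers [1.0, 2.0, -3.0] up to numerical precision
```

In practice `np.linalg.lstsq(Xa, y, rcond=None)` is preferable, as it stays numerically stable when X^T X is poorly conditioned.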


Ridge Regression

When the columns of the design matrix X are correlated, X^T X becomes close to singular and the standard estimate becomes highly sensitive to random errors in the observed y_i.

The classic solution is ridge regression:

β̂ = (X^T X + λI)^{-1} X^T y,

where λ > 0 is the ridge parameter.
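A minimal NumPy sketch of the ridge estimate on deliberately collinear toy data; the column values and the choice λ = 0.1 are illustrative:

```python
import numpy as np

# Two nearly identical columns make X^T X close to singular.
rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1 + 1e-6 * rng.normal(size=50)])
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=50)

lam = 0.1                      # ridge parameter (illustrative, not tuned)
p = X.shape[1]

# Ridge estimate: beta_hat = (X^T X + lambda*I)^{-1} X^T y.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta_ridge)  # both coefficients finite, summing to roughly 2
```

The penalty trades a little bias for a large reduction in variance; in practice λ is chosen by cross-validation.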


SVM Regression

SVMs can be used to solve regression problems: minimize

(1/2) w^T w + C Σ_{i=1}^{n} (ξ_i + ξ̂_i)

subject to

w^T x_i + w_0 − y_i ≤ ε + ξ_i,    i = 1, ..., n,    (1)
y_i − (w^T x_i + w_0) ≤ ε + ξ̂_i,    i = 1, ..., n,    (2)
ξ_i, ξ̂_i ≥ 0,    i = 1, ..., n.    (3)
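Eliminating the slack variables, the constrained problem is equivalent to minimizing (1/2)||w||^2 + C Σ_i max(0, |y_i − (w^T x_i + w_0)| − ε). The sketch below minimizes this unconstrained form by plain subgradient descent on toy data; the step size, iteration count, and the 1/n averaging of the loss (which only rescales C) are my own illustrative choices, and a proper QP solver would be used in practice:

```python
import numpy as np

def svr_subgradient(X, y, C=10.0, eps=0.1, lr=0.01, n_iter=3000):
    """Subgradient descent on the unconstrained SVM-regression objective
    (1/2)||w||^2 + (C/n) * sum_i max(0, |y_i - (w^T x_i + w0)| - eps)."""
    n, p = X.shape
    w, w0 = np.zeros(p), 0.0
    for _ in range(n_iter):
        r = X @ w + w0 - y              # residuals
        active = np.abs(r) > eps        # points outside the eps-tube
        s = np.sign(r) * active         # subgradient of the eps-insensitive loss
        w -= lr * (w + (C / n) * (X.T @ s))
        w0 -= lr * (C / n) * s.sum()
    return w, w0

# Toy 1-D data on the line y = 2x + 0.5 (illustrative values).
rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(200, 1))
y = 2.0 * X[:, 0] + 0.5
w, w0 = svr_subgradient(X, y)
print(w, w0)  # slope slightly below 2 (shrunk by the penalty), intercept near 0.5
```

Points inside the ε-tube contribute no loss, which is why the fitted slope is slightly shrunk: the ||w||^2 term keeps pulling w down until enough points leave the tube to push back.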


Regularization

If the number of variables/features is large compared to the number of training samples, it is useful to try to find a good subset of features/variables and build the classifier/predictor based on these only.

One way to achieve this is to add a regularization term to the classifier/predictor learning.

We have already talked about regularized discriminant analysis.

Now we consider how to add regularization to logistic/linear regression, and show how SVMs can be interpreted as a regularization method.

Relation to feature/variable selection.


LASSO, Ridge and Elastic Net

To learn a classifier, minimize

D(β | X, y) + λ R(β),

where D is the data term (logistic or linear regression) and R is a regularization term.

Elastic net (Zou & Hastie, 2005): LASSO + Ridge,

R(β) = (1 − α) ||β||_1 + α ||β||_2^2.

The intercept is not penalized (although the notation indicates otherwise).

The regularized solution is equivalent to the constrained solution:

min D(β | X, y)  s.t.  R(β) ≤ t.
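For the least-squares data term with the pure LASSO penalty (the α = 0 end of the elastic net), the penalized problem can be minimized by proximal gradient descent, whose proximal step is coordinate-wise soft-thresholding. A minimal sketch; the toy data and the value λ = 5 are illustrative, and production use would rely on coordinate-descent software such as glmnet:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1: shrink each coordinate toward zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=1000):
    """Minimize 0.5*||y - X beta||^2 + lam*||beta||_1 by proximal gradient (ISTA)."""
    L = np.linalg.norm(X, 2) ** 2       # Lipschitz constant of the smooth part
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)     # gradient of the least-squares term
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta

# Sparse ground truth: only 2 of 10 coefficients are non-zero.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 10))
true_beta = np.zeros(10)
true_beta[0], true_beta[3] = 3.0, -2.0
y = X @ true_beta
beta = lasso_ista(X, y, lam=5.0)
print(np.nonzero(np.abs(beta) > 0.1)[0])  # should recover the support {0, 3}
```

Adding the ridge part of the elastic net only augments the gradient step with the derivative of α||β||_2^2; the soft-threshold still handles the ℓ1 part.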


Example: MEG Mind Reading

MEG mind reading competition: identify the movie the subject is watching based on 1 s of MEG data.

Problem: the training data consist of N = 727 samples with dimensionality 40800 (measurements at 200 Hz from 204 gradiometer channels).


Example: MEG Mind Reading: What Works?


Example: MEG Mind Reading: The Classifier


SVM in Regularization

The SVM can also be interpreted as a penalization (regularization) method: it can be solved by minimizing

Σ_{i=1}^{N} [1 − y_i g(x_i)]_+ + (1/(2C)) ||w||^2,

where [·]_+ denotes the positive part and y_i is the class label (either 1 or −1).

This is of the form loss + penalty, where the loss function is called the hinge loss.
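The loss + penalty view suggests minimizing this objective directly by subgradient descent with g(x) = w^T x + w_0. A minimal sketch on a linearly separable toy problem; the blob data, step size, and the 1/N averaging of the loss (which only rescales C) are my own illustrative choices:

```python
import numpy as np

def svm_hinge_subgradient(X, y, C=1.0, lr=0.01, n_iter=2000):
    """Subgradient descent on
    (1/N) * sum_i [1 - y_i*(w^T x_i + w0)]_+ + (1/(2C)) * ||w||^2.
    The intercept w0 is not penalized, as is standard for SVMs."""
    n, p = X.shape
    w, w0 = np.zeros(p), 0.0
    for _ in range(n_iter):
        margin = y * (X @ w + w0)
        active = margin < 1.0                    # points violating the margin
        w -= lr * (-(X[active].T @ y[active]) / n + w / C)
        w0 -= lr * (-y[active].sum() / n)
    return w, w0

# Two well-separated Gaussian blobs with labels +1 / -1 (toy data).
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(2.0, 1.0, size=(50, 2)),
               rng.normal(-2.0, 1.0, size=(50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])
w, w0 = svm_hinge_subgradient(X, y)
accuracy = (np.sign(X @ w + w0) == y).mean()
print(accuracy)  # high training accuracy on this separable problem
```

Only margin-violating points (margin < 1) contribute to the subgradient, which mirrors the fact that the dual SVM solution depends only on the support vectors.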


Summary

1 Summary of the Non-Linear Techniques: see the slides by Andrew W. Moore.

2 Regression: the response variables are continuous; classification: the response variable is categorical.

3 Regularization can help to make both regression and classification problems tractable when (for example) the data dimensionality is large.

4 SVM can be interpreted as a regularization method.
