SGN-41006 (4 cr) Chapter 6 (+ Extra Material): Non-Linear Discriminant Analysis

Jussi Tohka & Jari Niemi
Department of Signal Processing, Tampere University of Technology
January 28, 2014




Contents of This Lecture

1 Non-Linear Techniques (slides by Andrew W. Moore)
   Support Vector Machines (Linear and Non-Linear)
   Non-Linear Basis Functions
   Non-Linear Kernel Methods
   Non-Linear Logistic Regression

2 Linear Regression
   Ridge Regression
   SVM Regression

3 Regularization
   SVM in Regularization


Material

Chapter 6 (excluding 6.4), Sections 1.7 and 5.4.5 in WebCop:2011; Sections 3.4, 4.4.4, and 12.3.2 in HasTibFri:2009; slides by Andrew W. Moore (available at Moodle).


What Should You Already Know?

Regression, Logistic regression, Linear Discriminant Analysis



Linear Regression and Classification

Regression: the response variables are continuous (instead of categorical, as in classification).

For modelling the relations between continuous variables:

y = w_0 + w^T x.

We try to learn (w_0, w) = β given the (y_i, x_i) pairs.

Solution (ordinary least squares): β̂ = (X^T X)^{-1} X^T y
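The closed-form solution can be sketched with NumPy on toy data; the data and coefficient values below are illustrative, not from the lecture:

```python
import numpy as np

# Toy data from a known linear model: y = w0 + w^T x with
# w0 = 1.0 and w = (2.0, -3.0) (illustrative values, no noise).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + X @ np.array([2.0, -3.0])

# Augment X with a column of ones so that beta = (w0, w) is
# estimated jointly, as on the slide.
Xa = np.column_stack([np.ones(len(X)), X])

# Normal-equation solution: beta_hat = (X^T X)^{-1} X^T y.
# (np.linalg.solve avoids forming the inverse explicitly.)
beta_hat = np.linalg.solve(Xa.T @ Xa, Xa.T @ y)
print(beta_hat)  # recovers [1.0, 2.0, -3.0] up to numerical precision
```

In practice `np.linalg.lstsq(Xa, y, rcond=None)` is preferable, as it stays numerically stable when X^T X is poorly conditioned.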


Ridge Regression

When the columns of the design matrix X are correlated, X^T X becomes close to singular and the standard estimate becomes highly sensitive to random errors in the observed y_i.

The classic solution is ridge regression:

β̂ = (X^T X + λI)^{-1} X^T y,

where λ > 0 is the ridge parameter.
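A minimal NumPy sketch of the ridge estimate on deliberately collinear toy data; the column values and the choice λ = 0.1 are illustrative:

```python
import numpy as np

# Two nearly identical columns make X^T X close to singular.
rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1 + 1e-6 * rng.normal(size=50)])
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=50)

lam = 0.1                      # ridge parameter (illustrative, not tuned)
p = X.shape[1]

# Ridge estimate: beta_hat = (X^T X + lambda*I)^{-1} X^T y.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta_ridge)  # both coefficients finite, summing to roughly 2
```

The penalty trades a little bias for a large reduction in variance; in practice λ is chosen by cross-validation.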


SVM Regression

SVMs can be used to solve regression problems: minimize

(1/2) w^T w + C Σ_{i=1}^{n} (ξ_i + ξ̂_i)

subject to

w^T x_i + w_0 − y_i ≤ ε + ξ_i,    i = 1, ..., n,    (1)
y_i − (w^T x_i + w_0) ≤ ε + ξ̂_i,    i = 1, ..., n,    (2)
ξ_i, ξ̂_i ≥ 0,    i = 1, ..., n.    (3)
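Eliminating the slack variables, the constrained problem is equivalent to minimizing (1/2)||w||^2 + C Σ_i max(0, |y_i − (w^T x_i + w_0)| − ε). The sketch below minimizes this unconstrained form by plain subgradient descent on toy data; the step size, iteration count, and the 1/n averaging of the loss (which only rescales C) are my own illustrative choices, and a proper QP solver would be used in practice:

```python
import numpy as np

def svr_subgradient(X, y, C=10.0, eps=0.1, lr=0.01, n_iter=3000):
    """Subgradient descent on the unconstrained SVM-regression objective
    (1/2)||w||^2 + (C/n) * sum_i max(0, |y_i - (w^T x_i + w0)| - eps)."""
    n, p = X.shape
    w, w0 = np.zeros(p), 0.0
    for _ in range(n_iter):
        r = X @ w + w0 - y              # residuals
        active = np.abs(r) > eps        # points outside the eps-tube
        s = np.sign(r) * active         # subgradient of the eps-insensitive loss
        w -= lr * (w + (C / n) * (X.T @ s))
        w0 -= lr * (C / n) * s.sum()
    return w, w0

# Toy 1-D data on the line y = 2x + 0.5 (illustrative values).
rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(200, 1))
y = 2.0 * X[:, 0] + 0.5
w, w0 = svr_subgradient(X, y)
print(w, w0)  # slope slightly below 2 (shrunk by the penalty), intercept near 0.5
```

Points inside the ε-tube contribute no loss, which is why the fitted slope is slightly shrunk: the ||w||^2 term keeps pulling w down until enough points leave the tube to push back.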


Regularization

If the number of variables/features is large compared to the number of training samples, it is useful to try to find a good subset of features/variables and build the classifier/predictor based on these only.

One way to achieve this is to add a regularization term to the classifier/predictor learning.

We have already talked about regularized discriminant analysis.

Now we consider how to add regularization to logistic/linear regression, and show how SVMs can be interpreted as a regularization method.

Relation to feature/variable selection.


LASSO, Ridge and Elastic Net

To learn a classifier, minimize

D(β | X, y) + λ R(β),

where D is the data term (logistic or linear regression) and R is a regularization term.

Elastic net (Zou & Hastie, 2005): LASSO + Ridge,

R(β) = (1 − α) ||β||_1 + α ||β||_2^2.

The intercept is not penalized (although the notation indicates otherwise).

The regularized solution is equivalent to the constrained solution:

min D(β | X, y)  s.t.  R(β) ≤ t.
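For the least-squares data term with the pure LASSO penalty (the α = 0 end of the elastic net), the penalized problem can be minimized by proximal gradient descent, whose proximal step is coordinate-wise soft-thresholding. A minimal sketch; the toy data and the value λ = 5 are illustrative, and production use would rely on coordinate-descent software such as glmnet:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1: shrink each coordinate toward zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=1000):
    """Minimize 0.5*||y - X beta||^2 + lam*||beta||_1 by proximal gradient (ISTA)."""
    L = np.linalg.norm(X, 2) ** 2       # Lipschitz constant of the smooth part
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)     # gradient of the least-squares term
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta

# Sparse ground truth: only 2 of 10 coefficients are non-zero.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 10))
true_beta = np.zeros(10)
true_beta[0], true_beta[3] = 3.0, -2.0
y = X @ true_beta
beta = lasso_ista(X, y, lam=5.0)
print(np.nonzero(np.abs(beta) > 0.1)[0])  # should recover the support {0, 3}
```

Adding the ridge part of the elastic net only augments the gradient step with the derivative of α||β||_2^2; the soft-threshold still handles the ℓ1 part.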


Example: MEG Mind Reading

MEG mind reading competition: identify the movie the subject is watching based on 1 s of MEG data.

Problem: the training data consist of N = 727 samples with dimensionality 40800 (measurements at 200 Hz from 204 gradiometer channels).


Example: MEG Mind Reading: What Works?


Example: MEG Mind Reading: The Classifier


SVM in Regularization

The SVM can also be interpreted as a penalization (regularization) method: it can be solved by minimizing

Σ_{i=1}^{N} [1 − y_i g(x_i)]_+ + (1/(2C)) ||w||^2,

where [·]_+ denotes the positive part and y_i is the class label (either 1 or −1).

This is of the form loss + penalty, where the loss function is called the hinge loss.
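The loss + penalty view suggests minimizing this objective directly by subgradient descent with g(x) = w^T x + w_0. A minimal sketch on a linearly separable toy problem; the blob data, step size, and the 1/N averaging of the loss (which only rescales C) are my own illustrative choices:

```python
import numpy as np

def svm_hinge_subgradient(X, y, C=1.0, lr=0.01, n_iter=2000):
    """Subgradient descent on
    (1/N) * sum_i [1 - y_i*(w^T x_i + w0)]_+ + (1/(2C)) * ||w||^2.
    The intercept w0 is not penalized, as is standard for SVMs."""
    n, p = X.shape
    w, w0 = np.zeros(p), 0.0
    for _ in range(n_iter):
        margin = y * (X @ w + w0)
        active = margin < 1.0                    # points violating the margin
        w -= lr * (-(X[active].T @ y[active]) / n + w / C)
        w0 -= lr * (-y[active].sum() / n)
    return w, w0

# Two well-separated Gaussian blobs with labels +1 / -1 (toy data).
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(2.0, 1.0, size=(50, 2)),
               rng.normal(-2.0, 1.0, size=(50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])
w, w0 = svm_hinge_subgradient(X, y)
accuracy = (np.sign(X @ w + w0) == y).mean()
print(accuracy)  # high training accuracy on this separable problem
```

Only margin-violating points (margin < 1) contribute to the subgradient, which mirrors the fact that the dual SVM solution depends only on the support vectors.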


Summary

1 Summary of the Non-Linear Techniques: see the slides by Andrew W. Moore.

2 Regression: the response variables are continuous; classification: the response variable is categorical.

3 Regularization can help to make both regression and classification problems tractable when (for example) the data dimensionality is large.

4 SVM can be interpreted as a regularization method.
