SGN-41006 (4 cr) Chapter 6 (+ Extra Material): Non-Linear Discriminant Analysis
Jussi Tohka & Jari Niemi
Department of Signal Processing, Tampere University of Technology
January 28, 2014
Contents of This Lecture
1 Non-Linear Techniques (Slides by Andrew W. Moore)
  Support Vector Machines (Linear and Non-Linear)
  Non-Linear Basis Functions
  Non-Linear Kernel Methods
  Non-Linear Logistic Regression
2 Linear Regression
  Ridge Regression
  SVM Regression
3 Regularization
  SVM in Regularization
Material
Chapter 6 (excluding 6.4) and Sections 1.7 and 5.4.5 in WebCop:2011; Sections 3.4, 4.4.4, and 12.3.2 in HasTibFri:2009; slides by Andrew W. Moore (available at Moodle).
What Should You Already Know?
Regression, Logistic regression, Linear Discriminant Analysis
Linear Regression and Classification
Regression: response variables are continuous (instead of categorical as in classification).
For modelling the relations between continuous variables:
$y = w_0 + \mathbf{w}^T\mathbf{x}.$
Try to learn $(w_0, \mathbf{w}) = \boldsymbol{\beta}$ given $(y_i, \mathbf{x}_i)$ pairs.
Solution: $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$
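As a quick illustration (not part of the original slides), here is a minimal NumPy sketch of the least-squares solution above; the toy data and all names are invented for the example, and the column of ones models the intercept $w_0$.

import numpy as np

# Toy data: n samples, p features (illustrative values only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 + X @ np.array([1.0, -0.5, 0.3]) + 0.1 * rng.normal(size=100)

# Augment with a column of ones so that beta = (w0, w).
Xa = np.hstack([np.ones((X.shape[0], 1)), X])

# Normal-equation solution beta_hat = (X^T X)^{-1} X^T y
# (np.linalg.lstsq would be the numerically safer equivalent).
beta_hat = np.linalg.solve(Xa.T @ Xa, Xa.T @ y)
print(beta_hat)  # approximately [2.0, 1.0, -0.5, 0.3]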
Ridge Regression
When the columns of the design matrix $\mathbf{X}$ are correlated, $\mathbf{X}^T\mathbf{X}$ becomes close to singular and the standard estimate becomes highly sensitive to random errors in the observed $y_i$.
The classic solution is ridge regression:
$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X} + \lambda \mathbf{I})^{-1}\mathbf{X}^T\mathbf{y},$
where $\lambda$ is the ridge parameter.
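A minimal sketch of the ridge estimate above, assuming a NumPy design matrix X and response y; the function name ridge_fit is illustrative. In practice the intercept is usually left unpenalized (for example by centring the data first).

import numpy as np

def ridge_fit(X, y, lam):
    # Ridge estimate: beta_hat = (X^T X + lam * I)^{-1} X^T y.
    # lam is the ridge parameter lambda; lam = 0 recovers ordinary
    # least squares (when X^T X is invertible).
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)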
SVM Regression
SVMs can be used to solve regression problems: Minimize
$\frac{1}{2}\mathbf{w}^T\mathbf{w} + C \sum_{i=1}^{n} (\xi_i + \hat{\xi}_i)$
subject to
$\mathbf{w}^T\mathbf{x}_i + w_0 - y_i \le \varepsilon + \xi_i, \quad i = 1, \dots, n \quad (1)$
$y_i - (\mathbf{w}^T\mathbf{x}_i + w_0) \le \varepsilon + \hat{\xi}_i, \quad i = 1, \dots, n \quad (2)$
$\xi_i, \hat{\xi}_i \ge 0, \quad i = 1, \dots, n \quad (3)$
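One practical way to try this is scikit-learn's SVR; a hedged sketch follows. The toy data are invented for illustration, and the C and epsilon arguments play the roles of $C$ and $\varepsilon$ in the formulation above.

import numpy as np
from sklearn.svm import SVR

# Toy 1-D regression data (illustrative only).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=80)

# Linear epsilon-insensitive SVM regression: C weights the slack
# variables, epsilon is the half-width of the insensitive tube.
model = SVR(kernel="linear", C=1.0, epsilon=0.1)
model.fit(X, y)
y_hat = model.predict(X)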
Regularization
If the number of variables/features is large compared to the number of training samples, it is useful to try to find a good subset of features/variables and build a classifier/predictor based on these only.
One way to achieve this is to add a regularization term to the classifier/predictor learning.
We have already talked about regularized discriminant analysis.
Now, we consider how to add regularization to logistic/linear regression and show how the SVM can be interpreted as a regularization method.
Relation to feature/variable selection
LASSO, Ridge and Elastic Net
To learn a classifier, minimize:
$D(\boldsymbol{\beta} \mid \mathbf{X}, \mathbf{y}) + \lambda R(\boldsymbol{\beta})$
$D$ is the data term (logistic or linear regression)
$R$ is a regularization term
Elastic net (Zou & Hastie, 2005): LASSO + ridge
$R(\boldsymbol{\beta}) = (1-\alpha)\|\boldsymbol{\beta}\|_1 + \alpha\|\boldsymbol{\beta}\|_2^2$
Intercept not penalized (although the notation indicates otherwise)
Regularized solution equivalent to constrained solution:
$\min_{\boldsymbol{\beta}} D(\boldsymbol{\beta} \mid \mathbf{X}, \mathbf{y}) \quad \text{s.t.} \quad R(\boldsymbol{\beta}) \le t$
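A small sketch with scikit-learn's ElasticNet on invented high-dimensional data; note that scikit-learn parameterizes the penalty through alpha and l1_ratio, which corresponds only loosely to the $(\lambda, \alpha)$ notation above, and that the intercept is fit but not penalized.

import numpy as np
from sklearn.linear_model import ElasticNet

# Toy data with many features but only a few informative ones.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
beta_true = np.zeros(50)
beta_true[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]
y = X @ beta_true + 0.1 * rng.normal(size=100)

# Elastic net = LASSO + ridge: alpha scales the whole penalty and
# l1_ratio balances the L1 and L2 parts.
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X, y)
print(np.count_nonzero(model.coef_))  # only a few non-zero coefficients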
Example: MEG Mind Reading
MEG mind reading competition: identify the movie the subject is watching based on 1 second of MEG data.
Problem: the training data consist of N = 727 samples with dimensionality 40800 (measurements at 200 Hz from 204 gradiometer channels).
Example: MEG Mind Reading: What Works?
Example: MEG Mind Reading: The Classifier
SVM in Regularization
The SVM can also be interpreted as a penalization (regularization) method: it can be solved by minimizing
$\sum_{i=1}^{N} [1 - y_i g(\mathbf{x}_i)]_+ + \frac{1}{2C}\|\mathbf{w}\|^2,$
where $[\cdot]_+$ denotes the positive part and $y_i$ is the class label (either 1 or $-1$).
This is of the form loss + penalty, where the loss function is called the hinge loss.
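To make the loss + penalty reading concrete, here is a minimal sketch of the objective for a given linear classifier $g(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + w_0$; the function name and arguments are illustrative.

import numpy as np

def svm_objective(w, w0, X, y, C):
    # Hinge-loss form of the SVM objective:
    #   sum_i [1 - y_i * g(x_i)]_+  +  ||w||^2 / (2 C),
    # where g(x) = w^T x + w0 and each y_i is +1 or -1.
    margins = y * (X @ w + w0)                     # y_i * g(x_i)
    hinge = np.maximum(0.0, 1.0 - margins).sum()   # positive part [.]_+
    penalty = (w @ w) / (2.0 * C)
    return hinge + penalty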
Summary
1 Summary of non-linear techniques: see the slides by Andrew W. Moore.
2 Regression: response variables are continuous; classification: response variables are categorical.
3 Regularization can help to make both regression and classification problems tractable when (for example) the data dimensionality is large.
4 The SVM can be interpreted as a regularization method.