Linear Methods for Classification
20.04.2015:
Presentation for MA seminar in statistics
Eli Dahan
Outline
- Introduction – problem and solution
- LDA – Linear Discriminant Analysis
- LR – Logistic Regression (and Linear Regression)
- LDA vs. LR
- In a word – separating hyperplanes
Introduction - the problem
Given an observation X, should we assign it to group k or to group l?
*We can think of G as the “group label”.
Posterior probability: p_j = P(G = j | X = x)
Introduction - the solution
Linear decision boundary: the set where p_k = p_l.
If p_k > p_l, choose class k; if p_l > p_k, choose class l.
Linear Discriminant Analysis
Let P(G = k) = π_k and P(X = x | G = k) = f_k(x). Then by Bayes’ rule:
P(G = k | X = x) = π_k f_k(x) / Σ_{l=1}^{K} π_l f_l(x)
Decision boundary between classes k and l: the set where P(G = k | X = x) = P(G = l | X = x).
Linear Discriminant Analysis
Assuming f_k(x) ~ N(μ_k, Σ_k) with a common covariance Σ_1 = Σ_2 = … = Σ_K = Σ, we get a decision boundary that is linear in x.
For non-common covariances we get QDA (quadratic discriminant analysis); RDA (regularized DA) interpolates between the two.
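The linearity can be seen from the log posterior ratio of two classes: with a common Σ the quadratic terms in x cancel, leaving

```latex
\log\frac{P(G=k\mid X=x)}{P(G=l\mid X=x)}
= \log\frac{\pi_k}{\pi_l}
- \tfrac{1}{2}(\mu_k+\mu_l)^{\top}\Sigma^{-1}(\mu_k-\mu_l)
+ x^{\top}\Sigma^{-1}(\mu_k-\mu_l),
```

an equation linear in x, so the boundary {x : p_k = p_l} is a hyperplane.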
Linear Discriminant Analysis
Using empirical estimation methods:
π̂_k = N_k / N,  μ̂_k = Σ_{g_i = k} x_i / N_k,  Σ̂ = Σ_{k=1}^{K} Σ_{g_i = k} (x_i − μ̂_k)(x_i − μ̂_k)^T / (N − K)
LDA was a top classifier in the STATLOG study (Michie et al., 1994): many real data sets support linear boundaries, and the simple Gaussian estimates make it stable.
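As a rough sketch (not from the slides), the empirical estimates and the resulting discriminant scores in NumPy; the function names are my own:

```python
import numpy as np

def fit_lda(X, g, K):
    """Estimate pi_k, mu_k and the pooled covariance from training data."""
    N, p = X.shape
    pis, mus = [], []
    Sigma = np.zeros((p, p))
    for k in range(K):
        Xk = X[g == k]
        pis.append(len(Xk) / N)          # pi_hat_k = N_k / N
        mus.append(Xk.mean(axis=0))      # mu_hat_k = class mean
        Sigma += (Xk - mus[-1]).T @ (Xk - mus[-1])
    Sigma /= (N - K)                     # pooled within-class covariance
    return np.array(pis), np.array(mus), Sigma

def predict_lda(x, pis, mus, Sigma):
    """Assign x to the class with the largest linear discriminant score."""
    Sinv = np.linalg.inv(Sigma)
    scores = [x @ Sinv @ m - 0.5 * m @ Sinv @ m + np.log(pi)
              for pi, m in zip(pis, mus)]
    return int(np.argmax(scores))
```

The score being maximized is the usual linear discriminant function δ_k(x) = x^T Σ̂^{-1} μ̂_k − ½ μ̂_k^T Σ̂^{-1} μ̂_k + log π̂_k.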
Logistic Regression
Models the posterior probabilities of the K classes; they sum to one and remain in [0, 1]. The model uses K − 1 logit transformations relative to a reference class K:
log[ P(G = k | X = x) / P(G = K | X = x) ] = β_{k0} + β_k^T x,  k = 1, …, K − 1
• The decision boundaries {x : P(G = k | X = x) = P(G = l | X = x)} are again linear in x.
Logistic Regression
Model fit:
• The parameters are estimated by maximizing the conditional (multinomial) log-likelihood; the Newton-Raphson algorithm is used, and each Newton step amounts to an iteratively reweighted least squares (IRLS) fit.
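A minimal sketch of the Newton-Raphson (IRLS) fit for the two-class case, with my own variable names; a fixed iteration count stands in for a proper convergence check:

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=25):
    """Binary logistic regression via Newton-Raphson (IRLS).

    X: (N, p) observation matrix (an intercept column is added here),
    y: (N,) labels in {0, 1}. Returns the coefficient vector beta.
    """
    Xb = np.hstack([np.ones((len(X), 1)), X])   # prepend intercept
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))    # fitted probabilities
        W = p * (1.0 - p)                        # IRLS weights (diagonal of W)
        H = Xb.T @ (W[:, None] * Xb)             # Hessian X^T W X
        # Newton step: beta += (X^T W X)^{-1} X^T (y - p)
        beta = beta + np.linalg.solve(H, Xb.T @ (y - p))
    return beta
```

Note that on perfectly separable data the MLE does not exist and the iterations diverge, which is one practical caveat of this fit.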
Linear Regression
Recall the standard assumptions of multivariate regression:
• lack of multicollinearity, etc.
• Here: given N instances (an N×p observation matrix X), Y is an N×K indicator response matrix (K classes): row i has a 1 in the column of its class and 0 elsewhere.
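The indicator-matrix approach can be sketched as follows (my own function names); each column of Y is regressed on X at once, and a new x is classified by the largest fitted value:

```python
import numpy as np

def fit_indicator_regression(X, g, K):
    """Regress an N x K indicator response matrix on X (with intercept).

    Returns Bhat, the (p+1) x K coefficient matrix Bhat = (X^T X)^{-1} X^T Y.
    """
    N = len(X)
    Y = np.zeros((N, K))
    Y[np.arange(N), g] = 1.0                       # indicator responses
    Xb = np.hstack([np.ones((N, 1)), X])           # prepend intercept
    Bhat, *_ = np.linalg.lstsq(Xb, Y, rcond=None)  # least-squares fit
    return Bhat

def predict_indicator(x, Bhat):
    """Classify x as argmax_k of the fitted vector [1, x] @ Bhat."""
    return int(np.argmax(np.concatenate([[1.0], x]) @ Bhat))
```

A known weakness of this approach (not shown here) is class masking when K ≥ 3: a middle class can be dominated everywhere by the others.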
LDA Vs. LR
Similar results in practice; LDA does slightly better (56% vs. 67% error rate for LR).
One might presume the two are identical, since both end up with decision boundaries linear in x (look back at the derivations), but they are not.
LDA Vs. LR
LDA: parameters are fit by maximizing the full log-likelihood, based on the joint density P(X, G), which assumes Gaussian class densities (Efron 1975: in the worst case, ignoring the Gaussianity costs about a 30% reduction in efficiency).
Linearity is derived.
LR: leaves the marginal P(X) arbitrary (an advantage in model selection and in the ability to absorb extreme X values), and fits the parameters of P(G | X) by maximizing the conditional likelihood.
Linearity is assumed.
In a word – separating hyperplanes