linear methods for classification 20.04.2015: presentation for ma seminar in statistics eli dahan
Post on 17-Jan-2016
Linear Methods for Classification
20.04.2015:
Presentation for MA seminar in statistics
Eli Dahan
Outline
• Introduction - problem and solution
• LDA - Linear Discriminant Analysis
• LR: Logistic Regression (Linear Regression)
• LDA Vs. LR
• In a word – Separating Hyperplanes
Introduction - the problem
Given an observation X = x: does it belong to group k or to group l?
*We can think of G as the “group label”.
Posterior probability:
$p_j = P(G = j \mid X = x)$
Introduction - the solution
Linear decision boundary:
$p_k = p_l$
$p_k > p_l$ → choose group k
$p_l > p_k$ → choose group l
Linear Discriminant Analysis
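Let $P(G = k) = \pi_k$ and $P(X = x \mid G = k) = f_k(x)$. Then by Bayes rule:
$P(G = k \mid X = x) = \dfrac{f_k(x)\,\pi_k}{\sum_{j=1}^{K} f_j(x)\,\pi_j}$
Decision boundary:
$\{x : P(G = k \mid X = x) = P(G = l \mid X = x)\}$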
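A minimal sketch of Bayes rule and the resulting choice between two groups, in Python. The univariate Gaussian densities, the equal priors and the helper names (`gaussian_pdf`, `posteriors`) are assumptions made for illustration; they are not from the slides.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def posteriors(x, priors, densities):
    """Bayes rule: P(G=k | X=x) = f_k(x) * pi_k / sum_j f_j(x) * pi_j."""
    joint = [pi_k * f_k(x) for pi_k, f_k in zip(priors, densities)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical two-group setup: equal priors, unit-variance Gaussians.
priors = [0.5, 0.5]
densities = [
    lambda x: gaussian_pdf(x, mean=0.0, std=1.0),  # group k
    lambda x: gaussian_pdf(x, mean=2.0, std=1.0),  # group l
]

post = posteriors(0.5, priors, densities)
# Choose the group with the larger posterior (index 0 = k, index 1 = l).
predicted = max(range(len(post)), key=post.__getitem__)
```

At x = 1, midway between the two assumed means, both posteriors equal 0.5; that point is the decision boundary of this toy setup.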
Linear Discriminant Analysis
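Assuming $f_k(x) \sim N(\mu_k, \Sigma_k)$ and $\Sigma_1 = \Sigma_2 = \dots = \Sigma_K = \Sigma$, we get a decision boundary that is linear in x.
When the $\Sigma_k$ are not common, we get quadratic discriminant analysis, QDA (or its regularized variant, RDA).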
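For reference, a sketch of why a common covariance gives a boundary that is linear in x, following the standard Gaussian log-ratio calculation. Here $\pi_k$ is the prior of class k, $\mu_k$ its mean and $\Sigma$ the common covariance; the symbol $\delta_k$ for the discriminant function is notation assumed here, not taken from the slides.

```latex
\begin{aligned}
\log\frac{P(G=k \mid X=x)}{P(G=l \mid X=x)}
  &= \log\frac{f_k(x)}{f_l(x)} + \log\frac{\pi_k}{\pi_l} \\
  &= \log\frac{\pi_k}{\pi_l}
     - \tfrac{1}{2}(\mu_k + \mu_l)^T \Sigma^{-1} (\mu_k - \mu_l)
     + x^T \Sigma^{-1} (\mu_k - \mu_l) \\[4pt]
\delta_k(x) &= x^T \Sigma^{-1} \mu_k
     - \tfrac{1}{2}\mu_k^T \Sigma^{-1} \mu_k + \log \pi_k
\end{aligned}
```

The quadratic terms $x^T \Sigma^{-1} x$ cancel because $\Sigma$ is shared, so the log-ratio equals $\delta_k(x) - \delta_l(x)$ and the boundary $\delta_k(x) = \delta_l(x)$ is a hyperplane. With class-specific $\Sigma_k$ the quadratic terms remain.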
Linear Discriminant Analysis
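Using empirical (plug-in) estimates from the training data, where $N_k$ is the number of class-k observations:
$\hat\pi_k = N_k / N$
$\hat\mu_k = \sum_{g_i = k} x_i / N_k$
$\hat\Sigma = \sum_{k=1}^{K} \sum_{g_i = k} (x_i - \hat\mu_k)(x_i - \hat\mu_k)^T / (N - K)$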
A top classifier (Michie et al., 1994): the data support linear boundaries, and the estimates are stable.
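The estimation step can be sketched in NumPy as follows. This assumes the usual plug-in estimates (class proportions, class means, pooled within-class covariance divided by N - K); the function names and the synthetic two-class data are made up for illustration.

```python
import numpy as np

def fit_lda(X, g, n_classes):
    """Plug-in estimates: class priors, class means, pooled covariance."""
    N, p = X.shape
    priors = np.zeros(n_classes)
    means = np.zeros((n_classes, p))
    pooled = np.zeros((p, p))
    for k in range(n_classes):
        Xk = X[g == k]
        priors[k] = len(Xk) / N              # pi_k = N_k / N
        means[k] = Xk.mean(axis=0)           # mu_k = class mean
        centered = Xk - means[k]
        pooled += centered.T @ centered      # within-class scatter
    pooled /= (N - n_classes)                # Sigma = scatter / (N - K)
    return priors, means, pooled

def lda_scores(X, priors, means, pooled):
    """delta_k(x) = x' S^-1 mu_k - 0.5 mu_k' S^-1 mu_k + log pi_k."""
    A = np.linalg.solve(pooled, means.T)     # columns are S^-1 mu_k, shape (p, K)
    const = -0.5 * np.sum(means.T * A, axis=0) + np.log(priors)
    return X @ A + const

# Synthetic data (an assumption): two Gaussian classes with shared covariance.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1.0, size=(100, 2)),
               rng.normal([3, 3], 1.0, size=(100, 2))])
g = np.repeat([0, 1], 100)

priors, means, pooled = fit_lda(X, g, n_classes=2)
pred = lda_scores(X, priors, means, pooled).argmax(axis=1)
accuracy = (pred == g).mean()
```

Each observation is assigned to the class with the largest linear score, so only the K score vectors need to be stored after fitting.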
Logistic Regression
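Models the posterior probabilities of the K classes; they sum to one and remain in [0,1]:
$P(G = k \mid X = x) = \dfrac{\exp(\beta_{k0} + \beta_k^T x)}{1 + \sum_{j=1}^{K-1} \exp(\beta_{j0} + \beta_j^T x)}, \quad k = 1, \dots, K-1$
$P(G = K \mid X = x) = \dfrac{1}{1 + \sum_{j=1}^{K-1} \exp(\beta_{j0} + \beta_j^T x)}$
• Linear decision boundary, since the log-odds are linear in x:
$\log \dfrac{P(G = k \mid X = x)}{P(G = K \mid X = x)} = \beta_{k0} + \beta_k^T x$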
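A small sketch of how K - 1 linear predictors turn into K posterior probabilities, assuming the usual parametrization with the last class as reference (its linear predictor fixed at 0). The coefficients and the function name `logistic_posteriors` are hypothetical.

```python
import math

def logistic_posteriors(x, betas):
    """Posteriors for K classes from K-1 linear predictors.

    betas: list of (intercept, coefficients) for classes 1..K-1.
    Class K is the reference class, with linear predictor 0.
    """
    etas = [b0 + sum(bj * xj for bj, xj in zip(b, x)) for b0, b in betas]
    etas.append(0.0)                       # reference class K
    shift = max(etas)                      # subtract max for numerical stability
    weights = [math.exp(e - shift) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical coefficients for K = 3 classes and p = 2 inputs.
betas = [(0.5, [1.0, -1.0]),    # class 1 vs. class 3
         (-0.2, [0.3, 0.8])]    # class 2 vs. class 3
probs = logistic_posteriors([1.0, 2.0], betas)
```

By construction the probabilities sum to one, and the log-odds of any class against the reference class recover the linear predictor, which is why the boundaries between classes are hyperplanes.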
Logistic Regression
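Model fit:
• Parameters are fit by maximum likelihood, using the conditional likelihood of G given X: $\ell(\theta) = \sum_{i=1}^{N} \log p_{g_i}(x_i; \theta)$, where $p_k(x; \theta) = P(G = k \mid X = x; \theta)$
• The likelihood is maximized with the Newton-Raphson algorithm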
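A minimal Newton-Raphson sketch for the two-class case, in NumPy. It assumes labels coded 0/1, no regularization and data that are not perfectly separable (otherwise the maximum-likelihood estimate does not exist); the function name and the simulated data are illustrative only.

```python
import numpy as np

def fit_logistic_newton(X, y, n_iter=25, tol=1e-10):
    """Binary logistic regression by Newton-Raphson.

    X: (N, p) inputs without intercept column; y: (N,) labels in {0, 1}.
    Returns beta of length p + 1, intercept first.
    """
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))   # fitted probabilities
        grad = Xb.T @ (y - p)                  # gradient of the log-likelihood
        W = p * (1.0 - p)                      # diagonal weights
        hess = Xb.T @ (Xb * W[:, None])        # X' W X (negative Hessian)
        step = np.linalg.solve(hess, grad)     # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data (an assumption): one input, true logit -0.5 + 2x.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))
true_logit = -0.5 + 2.0 * X[:, 0]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

beta = fit_logistic_newton(X, y)
```

Each iteration solves a weighted least-squares system, which is why this procedure is also known as iteratively reweighted least squares.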
Linear Regression
Recall the common features of multivariate regression:
• Lack of multicollinearity, etc.
• Here: assuming N instances (an N×p observation matrix X), Y is an N×K indicator response matrix (K classes).
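A sketch of classification by least-squares regression on the indicator matrix, assuming an intercept column is added to X and each observation is assigned to the class with the largest fitted value. The three-class synthetic data and the function names are made up for illustration.

```python
import numpy as np

def fit_indicator_regression(X, g, n_classes):
    """Least-squares fit of the N x K indicator matrix Y on [1, X]."""
    Y = np.eye(n_classes)[g]                      # N x K indicator matrix
    Xb = np.column_stack([np.ones(len(X)), X])    # add intercept column
    B, *_ = np.linalg.lstsq(Xb, Y, rcond=None)    # (p + 1) x K coefficients
    return B

def predict_indicator_regression(X, B):
    """Return the class with the largest fitted value, and the fitted values."""
    Xb = np.column_stack([np.ones(len(X)), X])
    fitted = Xb @ B
    return fitted.argmax(axis=1), fitted

# Synthetic data (an assumption): three classes around non-collinear centers.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 4.0]])
g = np.repeat(np.arange(3), 100)
X = centers[g] + rng.normal(size=(300, 2))

B = fit_indicator_regression(X, g, n_classes=3)
pred, fitted = predict_indicator_regression(X, B)
accuracy = (pred == g).mean()
```

With an intercept in the model the fitted values in each row sum to one, but individual values can fall outside [0,1], so they are not proper probabilities.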
LDA Vs. LR
Similar results, with LDA slightly better (error rate of 56% for LDA vs. 67% for LR).
Presumably they behave alike because both end up with decision boundaries of the same linear form (see the earlier slides).
LDA Vs. LR
LDA: parameters are fit by maximizing the full log-likelihood, based on the joint density, which assumes Gaussian class densities (Efron 1975: in the worst case, ignoring Gaussianity costs about 30% in efficiency).
Linearity is derived
LR: P(X) is arbitrary (an advantage in model selection and in the ability to absorb extreme X values); the parameters of P(G|X) are fit by maximizing the conditional likelihood.
Linearity is assumed
In a word – separating hyperplanes