7_4 Linear Discriminant Analysis
TRANSCRIPT, 7/29/2019
http://slidepdf.com/reader/full/74-linear-discriminant-analysis

Advanced Statistical Methods in Insurance
7. Multivariate Data
7.4 Linear Discriminant Analysis
Salzburg Institute of Actuarial Studies
Problem Definition
` Given are samples from g different populations
` The main question of discriminant analysis: find, based on a so-called training sample, a decision rule that allows the correct classification of future observations into the (unknown) population they belong to
` d: S → {1, ..., g}, with S ⊂ R^p the sampling space; d is a decision rule which can be applied to an observation x_ω: d(x_ω) = k
` If ω ∈ Ω_k and d(x_ω) = k: correct decision
` If ω ∈ Ω_k and d(x_ω) ≠ k: wrong decision
©Hudec & Schlögl
Data Structure
Training sample with known group membership
Bayes Theorem
` Prior probability of group membership
  p(k) = P{ω ∈ Ω_k} > 0
` Class specific (conditional) distribution of x: f(x|k)
` Unconditional distribution of x
  f(x) = Σ_{k=1}^{g} p(k) f(x|k)
` Posterior probability
  P(k|x) = p(k) f(x|k) / f(x)
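As a sketch (my own illustration, not code from the slides), the posterior formula can be evaluated for univariate normal class densities; the priors and the parameter values below are assumed for the example:

```python
import math

def normal_pdf(x, mu, sigma):
    # Class specific density f(x|k), here a univariate normal
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def posteriors(x, priors, params):
    # P(k|x) = p(k) * f(x|k) / f(x),  with f(x) = sum_l p(l) * f(x|l)
    joint = [p * normal_pdf(x, mu, s) for p, (mu, s) in zip(priors, params)]
    fx = sum(joint)
    return [j / fx for j in joint]

# Two classes with equal priors; at the midpoint x = 1 both posteriors are 0.5
post = posteriors(1.0, [0.5, 0.5], [(0.0, 1.0), (2.0, 1.0)])
```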
Decision Principles
` Bayes Decision Rule
` Assign an object to that class k_est for which the posterior probability is maximal
` k_est = e(x) with p(k_est|x) ≥ p(l|x) for l = 1, ..., g
` Equivalently: p(k_est) * f(x|k_est) ≥ p(l) * f(x|l) for l = 1, ..., g
` Maximum-Likelihood Rule
` Assign an object to that class k_est for which the likelihood is maximal
` k_est = e(x) with f(x|k_est) ≥ f(x|l) for l = 1, ..., g
` In case of equality of all prior probabilities both rules are equivalent
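Both rules can be sketched as an argmax; this is my own minimal illustration, with densities and priors chosen so the two rules disagree near the class boundary:

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def bayes_rule(x, priors, densities):
    # argmax_k p(k) * f(x|k)
    return max(range(len(priors)), key=lambda k: priors[k] * densities[k](x))

def ml_rule(x, densities):
    # argmax_k f(x|k)
    return max(range(len(densities)), key=lambda k: densities[k](x))

densities = [lambda x: normal_pdf(x, 0.0, 1.0), lambda x: normal_pdf(x, 2.0, 1.0)]
# Near the midpoint the likelihood alone favours class 1, but the
# unequal priors tip the Bayes decision toward class 0
k_bayes = bayes_rule(1.2, [0.8, 0.2], densities)
k_ml = ml_rule(1.2, densities)
```

With equal priors the two rules coincide, as stated on the slide.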
Optimality of Bayes Decision
Conditional error rate:
ε(d|x) = P(d(x) ≠ k | x) = 1 − P(d(x) = k | x)
As the Bayes rule maximizes the second term on the right side, it minimizes the conditional error rate. Integration over the sampling space S leads to the minimization of the unconditional error rate ε(d) = P(d(x) ≠ k) for an object from population k.
[Figure: visualization of the optimality of the Bayes decision rule for p=1]
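The argument can be checked numerically; a small sketch with assumed posterior values at some fixed x:

```python
def conditional_error(k, post):
    # eps(d|x) = 1 - P(d(x) = k | x): error rate of deciding for class k at x
    return 1.0 - post[k]

post = [0.2, 0.7, 0.1]   # assumed posterior probabilities at some x
errors = [conditional_error(k, post) for k in range(3)]
# The Bayes rule picks the class with maximal posterior,
# which is exactly the class with minimal conditional error
bayes_choice = max(range(3), key=lambda k: post[k])
```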
Bayes Rule for g=2
Linear Decision Rules
Discrimination with p=2 and g=2
Assumption: In each population the data are multivariate normal with different centers but constant covariance matrix.
Discrimination with p=2 and g=2
[Figure: contour plot of the two class densities over a grid, contour levels 0.01 to 0.06]
Obviously the Bayes and the Maximum Likelihood decision rule both lead to a linear separation between the groups.
Discrimination with p=2 and g=2 (homoscedastic)
[Figure: scatter plot of the two homoscedastic training samples with the estimated separating line]
From the training set we can estimate the unknown parameters of the multivariate normal and can calculate an estimate for the optimum separating line.
Bayes versus Maximum Likelihood Rule
[Figure: two panels showing the Bayes rule boundary; one panel with prior probabilities 0.8 / 0.2, the other with prior probabilities 0.5 / 0.5]
Case of Non-Homogeneous Variances
[Figure: contour plot of two class densities with unequal covariance matrices, contour levels 0.01 to 0.06]
Results in quadratic separation of the populations.
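Why the separation becomes quadratic can be seen from the log-density difference; a univariate sketch of my own (parameters are assumed, not from the slides):

```python
import math

def log_normal_pdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def discriminant(x, mu1, s1, mu2, s2):
    # log f(x|1) - log f(x|2): linear in x for s1 == s2, quadratic for s1 != s2
    return log_normal_pdf(x, mu1, s1) - log_normal_pdf(x, mu2, s2)

# The second difference of a function on an equidistant grid vanishes
# exactly when the function is linear
curv_equal = discriminant(0, 0, 1, 2, 1) - 2 * discriminant(1, 0, 1, 2, 1) + discriminant(2, 0, 1, 2, 1)
curv_unequal = discriminant(0, 0, 1, 2, 2) - 2 * discriminant(1, 0, 1, 2, 2) + discriminant(2, 0, 1, 2, 2)
```

With equal variances the x² terms cancel and the boundary is linear; with unequal variances a quadratic term of coefficient (1/(2·s2²) − 1/(2·s1²)) remains.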
Empirical Data
[Figure: scatter plot of a heteroscedastic two-group training sample]
LDA & QDA
Nonparametric Discriminant Analysis
[Figure: nonparametric density estimates over the range −2 to 4; the dashed lines show the true class specific densities]
In this situation, where the true distributions are normal, the non-parametric density estimation will give sub-optimal classification results.
Nonparametric Discriminant Analysis
[Figure: nonparametric density estimates over the range −2 to 4; the dashed lines show the true class specific densities]
In this situation non-parametric density estimation will give better classification results than the parametric estimate shown on the next slide.
Inadequacy of Parametric Estimate
[Figure: parametric density estimate for the same data over the range −2 to 4]
Compare this example with the considerations on robustness from chapter 3.2!
Naïve Bayes
` In case of large p non-parametric methods suffer from the so-called curse of dimensionality (Bellman, 1961): estimates tend to break down, as the number of data points needed to derive reliable estimates increases very fast.
` In these situations the “Naïve Bayes” principle has its merits. It assumes that the class densities are products of marginal densities, which corresponds to assuming conditional independence of the variables within each class:
  f̂(x|k) = f̂((x_1, ..., x_p)'|k) = Π_{j=1}^{p} f̂(x_j|k)
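The product form can be sketched directly; here the marginal estimates are assumed to be univariate normals (my illustration, not the slides' estimator):

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def naive_class_density(x, marginals):
    # f^(x|k) = prod_j f^(x_j|k): product of per-coordinate marginal estimates
    d = 1.0
    for xj, (mu, s) in zip(x, marginals):
        d *= normal_pdf(xj, mu, s)
    return d

# p = 2 independent standard normal coordinates; at the origin the
# product is (1/sqrt(2*pi))^2 = 1/(2*pi)
dens = naive_class_density((0.0, 0.0), [(0.0, 1.0), (0.0, 1.0)])
```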
LDA versus Logistic Regression
[Figure: decision boundaries; green: logistic regression, magenta: LDA]
Fisher - LDA
` The popularity of LDA is due to Fisher, who developed the method without the assumption of multivariate Gaussian distributions within each class.
` Fisher showed that the same result can be achieved by searching for the most informative (with regard to the class structure) low-dimensional projections of the data.
Not the Most Informative Projection
Discriminant Analysis due to Fisher
[Figure: projection of the grouped data onto the discriminant direction]
This projection gives the best discrimination between the groups.
Fisher's Approach with g=2 Groups
` Looking for a linear combination of the observed variables
  y_k = a' x_k,  k = 1, 2
` which maximizes the variance criterion
  Q(a) = (ȳ_1 − ȳ_2)² / (s_1² + s_2²) → max
` leads to the solution
  a = W⁻¹ (x̄_1 − x̄_2)
` W ~ within sum of squares cross product matrix
  W = N_1 S_1 + N_2 S_2 = Σ_{n=1}^{N_1} (x_{1n} − x̄_1)(x_{1n} − x̄_1)' + Σ_{n=1}^{N_2} (x_{2n} − x̄_2)(x_{2n} − x̄_2)'
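A pure-Python sketch of the g=2 solution for p=2 (the toy data are made up, and the 2x2 inverse is written out by hand):

```python
def mean2(xs):
    n = float(len(xs))
    return [sum(x[0] for x in xs) / n, sum(x[1] for x in xs) / n]

def scatter2(xs, m):
    # sum_n (x_n - m)(x_n - m)': within-group cross product matrix
    S = [[0.0, 0.0], [0.0, 0.0]]
    for x in xs:
        d = [x[0] - m[0], x[1] - m[1]]
        for i in range(2):
            for j in range(2):
                S[i][j] += d[i] * d[j]
    return S

def fisher_direction(X1, X2):
    # a = W^{-1} (xbar_1 - xbar_2), W = pooled within cross product matrix
    m1, m2 = mean2(X1), mean2(X2)
    S1, S2 = scatter2(X1, m1), scatter2(X2, m2)
    W = [[S1[i][j] + S2[i][j] for j in range(2)] for i in range(2)]
    det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    # 2x2 inverse applied to diff
    return [(W[1][1] * diff[0] - W[0][1] * diff[1]) / det,
            (W[0][0] * diff[1] - W[1][0] * diff[0]) / det]

# Two square point clouds with identical scatter, shifted along the x-axis;
# the discriminant direction is then proportional to the mean difference
X1 = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
X2 = [(4.0, 0.0), (6.0, 0.0), (4.0, 2.0), (6.0, 2.0)]
a = fisher_direction(X1, X2)
```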
Fisher‘s Approach with g>2 Groups (1)
u a r i a l S t u d i e s
S a l z b u r g I n s t i t u t e o f A
c t
©Hudec & Schlögl25
Fisher's Approach with g>2 Groups (2)
It can be shown that the approach of Fisher leads to the same results as the LDA based on multivariate Gaussians with constant variance matrices.