Kernel Density Estimation: Theory and Application in Discriminant Analysis — Thomas Ledl, Universität Wien
Posted on 22-Dec-2015
Kernel Density Estimation: Theory and Application in Discriminant Analysis
Thomas Ledl
Universität Wien
Contents:
Introduction
Theory
Aspects of Application
Simulation Study
Summary
Introduction
25 observations: which distribution?
[Figure: 25 data points plotted on the range 0–4]
[Figure: five candidate density estimates for the 25 observations on the range 0–4, each a plausible fit, each marked with a „?“]
Kernel density estimator model:
[Figure: a kernel density estimate built from the 25 observations on the range 0–4]
The kernel K(.) and the bandwidth h are to be chosen.
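The estimator formula itself was an image in the transcript; the standard univariate kernel density estimator the slide refers to, with kernel K(.) and bandwidth h, is:

```latex
\hat{f}_h(x) \;=\; \frac{1}{n h} \sum_{i=1}^{n} K\!\left( \frac{x - X_i}{h} \right)
```

Here K is a symmetric density and h > 0 controls the amount of smoothing.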
kernel/bandwidth:
[Figure: four estimates from the same data on the range 0–4: triangular vs. Gaussian kernel, each with a „small“ and a „large“ bandwidth h]
Question 1:
Which choice of K(.) and h is the best for a descriptive purpose?
Classification:
[Figure: empirical class distributions plotted on roughly the range −3 to 3]
Classification:
Levelplot – LDA (based on the assumption of a multivariate normal distribution):
[Figure: level plot of posterior probabilities (0.34 to 0.93) over V1 and V2, with the observations of five classes, labelled 1–5, overlaid]
Classification:
[Figure: a second level plot over V1 and V2 with the same five classes overlaid; the slide title did not survive the transcript]
Classification:
Levelplot – KDE classificator:
[Figure: two level plots of KDE-based posterior probabilities (0.34 to 0.93) over V1 and V2, five classes, labelled 1–5, overlaid]
Question 2:
How does classification based on KDE perform in more than two dimensions?

Theory
Essential issues
Optimization criteria
Improvements of the standard model
Resulting optimal choices of the model parameters K(.) and h
Optimization criteria
Lp-distances:
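The distance formula on this slide was an image; in standard form, the Lp-distance between the estimate and the true density is:

```latex
L_p\big(\hat{f}, f\big) \;=\; \left( \int \big| \hat{f}(x) - f(x) \big|^{p} \, dx \right)^{1/p}
```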
[Figure: two densities f(.) and g(.) plotted on the range −2 to 4; the following slides measure the discrepancy between such a pair]
„Integrated absolute error“ = IAE
„Integrated squared error“ = ISE
[Figure: the pointwise difference between the two densities, whose absolute and squared versions are integrated]
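Written out (the formulas were images in the transcript), the two criteria are:

```latex
\mathrm{IAE} = \int \big| \hat{f}(x) - f(x) \big| \, dx,
\qquad
\mathrm{ISE} = \int \big( \hat{f}(x) - f(x) \big)^{2} \, dx
```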
Other ideas:
Consideration of horizontal distances for a more intuitive fit (Marron and Tsybakov, 1995)
Compare the number and position of modes
Minimization of the maximum vertical distance
Overview of some minimization criteria
L1-distance = IAE: difficult mathematical tractability
L∞-distance = maximum difference: does not consider the overall fit
„Modern“ criteria, which include some measure of the horizontal distances: difficult mathematical tractability
L2-distance = ISE, MISE, AMISE, ...: most commonly used
ISE, MISE, AMISE, ...
ISE is a random variable
MISE = E(ISE), the expectation of the ISE
AMISE = Taylor approximation of the MISE, easier to calculate
[Figure: MISE, IV and ISB and their asymptotic counterparts AMISE, AIV and AISB as functions of log10(h), alongside the underlying density]
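The formulas behind these labels (reconstructed in standard notation, with R(g) = ∫g²(x) dx and μ₂(K) = ∫u²K(u) du) are:

```latex
\mathrm{MISE}(h) = \mathrm{E}\!\int \big(\hat{f}_h(x) - f(x)\big)^2 \, dx,
\qquad
\mathrm{AMISE}(h) = \underbrace{\frac{R(K)}{n h}}_{\text{AIV}}
\;+\; \underbrace{\tfrac{1}{4}\, h^{4}\, \mu_2(K)^2\, R(f'')}_{\text{AISB}}
```

The two AMISE terms are the asymptotic integrated variance and the asymptotic integrated squared bias, matching the AIV/AISB labels in the figure.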
The AMISE-optimal bandwidth
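The optimal-bandwidth formula was an image in the transcript; minimizing the AMISE expression over h gives the standard result:

```latex
h_{\mathrm{AMISE}} = \left( \frac{R(K)}{\mu_2(K)^2 \, R(f'') \, n} \right)^{1/5}
```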
The AMISE-optimal bandwidth
Dependent on the kernel function K(.); over all kernels, the AMISE is minimized by the „Epanechnikov kernel“
[Figure: the Epanechnikov kernel on the interval −1 to 1]
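For reference (the kernel's formula did not survive the transcript), the Epanechnikov kernel is:

```latex
K(u) = \tfrac{3}{4}\,\big(1 - u^{2}\big)\, \mathbf{1}_{\{|u| \le 1\}}
```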
The AMISE-optimal bandwidth
Also dependent on the unknown density f(.), through R(f'')
How to proceed?
Data-driven bandwidth selection methods
Leave-one-out selectors:
Maximum likelihood cross-validation
Least-squares cross-validation (Bowman, 1984)
Criteria based on substituting R(f'') in the AMISE formula:
„Normal rule“ („rule of thumb“; Silverman, 1986)
Plug-in methods (Sheather and Jones, 1991; Park and Marron, 1990)
Smoothed bootstrap
Least-squares cross-validation (LSCV)
The undisputed selector in the 1980s
Gives an (up to a constant) unbiased estimator of the ISE
Suffers from more than one local minimizer; no agreement about which one to use
Bad convergence rate for the resulting bandwidth hopt
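As a concrete illustration of the criterion (my own sketch, not code from the talk), a numpy implementation of LSCV(h) = ∫f̂² dx − (2/n) Σᵢ f̂₋ᵢ(Xᵢ) with a Gaussian kernel, minimized over a made-up bandwidth grid:

```python
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def lscv(h, x):
    """LSCV(h) = integral of fhat^2 minus (2/n) * the mean of the
    leave-one-out estimates at the observations (Gaussian kernel)."""
    n = len(x)
    # integral of fhat^2 via the trapezoidal rule on a wide grid
    grid = np.linspace(x.min() - 4 * h, x.max() + 4 * h, 1024)
    fhat = gauss((grid[:, None] - x[None, :]) / h).sum(axis=1) / (n * h)
    int_f2 = float(np.sum((fhat[:-1] ** 2 + fhat[1:] ** 2) / 2 * np.diff(grid)))
    # leave-one-out density estimates: drop the i = j kernel term
    k = gauss((x[:, None] - x[None, :]) / h)
    loo = (k.sum(axis=1) - gauss(0.0)) / ((n - 1) * h)
    return int_f2 - 2.0 * loo.mean()

rng = np.random.default_rng(0)
x = rng.normal(size=200)
hs = np.linspace(0.05, 1.5, 60)
scores = np.array([lscv(h, x) for h in hs])
h_opt = float(hs[np.argmin(scores)])
```

The multiple-local-minima problem mentioned above shows up as a bumpy score curve; taking the global minimizer, as done here, is only one possible convention.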
Normal rule („rule of thumb“)
Assumes f(x) to be N(μ, σ²)
The easiest selector
Often oversmooths the function
The resulting bandwidth is given by:
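The bandwidth formula was an image in the transcript; Silverman's normal-reference bandwidth for a Gaussian kernel is:

```latex
h_{\mathrm{opt}} = \left( \frac{4}{3n} \right)^{1/5} \hat{\sigma}
\;\approx\; 1.06\, \hat{\sigma}\, n^{-1/5}
```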
Plug-in methods (Sheather and Jones, 1991; Park and Marron, 1990)
Do not substitute R(f'') in the AMISE formula directly, but estimate it via R(f(IV)), R(f(IV)) via R(f(VI)), etc.
Another parameter i to choose (the number of stages to go back); one stage is mostly sufficient
Better rates of convergence
Do not finally circumvent the problem of the unknown density, either
The multivariate case
h → H ... the bandwidth matrix
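The multivariate estimator formula did not survive the transcript; with a bandwidth matrix H it reads:

```latex
\hat{f}_H(x) = \frac{1}{n} \sum_{i=1}^{n} |H|^{-1/2}\, K\!\big( H^{-1/2} (x - X_i) \big)
```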
Issues of generalization in d dimensions
d² bandwidth-matrix entries instead of one bandwidth parameter
Unstable estimates
Bandwidth selectors are essentially straightforward to generalize
For plug-in methods it is „too difficult“ to give succinct expressions for d > 2 dimensions
Aspects of Application

Essential issues
Curse of dimensionality
Connection between goodness-of-fit and optimal classification
Two methods for discriminatory purposes
The „curse of dimensionality“
The data „disappears“ into the distribution tails in high dimensions: a good fit in the tails is desired!
[Figure: probability mass NOT in the „tail“ of a multivariate normal density, falling from near 100% towards 0% as the number of dimensions grows from 1 to 20]
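To make the effect concrete (an illustration of mine, not from the slides; the radius-2 ball defining the non-tail region is my assumption), a Monte Carlo check of how little probability mass of a standard d-variate normal stays inside a fixed ball as d grows:

```python
import numpy as np

rng = np.random.default_rng(42)
dims = [1, 2, 5, 10, 20]
inside = []
for d in dims:
    x = rng.standard_normal((200_000, d))
    # fraction of mass inside the ball of radius 2 (the non-"tail" part)
    inside.append(float(np.mean(np.sum(x**2, axis=1) <= 4.0)))
# "inside" shrinks rapidly with the dimension: almost all mass
# migrates into the tails, exactly the phenomenon the chart showed
```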
The „curse of dimensionality“
Much data is necessary to maintain a constant estimation error in high dimensions:

Dimensionality | Required sample size
1  | 4
2  | 19
3  | 67
4  | 223
5  | 768
6  | 2790
7  | 10700
8  | 43700
9  | 187000
10 | 842000
Essential issues
The AMISE-optimal parameter choice is L2-optimal; optimal classification (in high dimensions) corresponds to L1-optimality (the misclassification rate)
Estimation of the tails is important, yet the fit is worse in the tails
Calculation intensive for large n
Many observations required for a reasonable fit
Method 1:
Reduction of the data onto a subspace which allows a reasonably accurate estimation, yet does not destroy too much information („trade-off“)
Use the multivariate kernel density concept to estimate the class densities
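A minimal sketch of this method as I read it (my own illustration, not the study's code): project with PCA onto a few components, estimate each class density with a product-Gaussian KDE using a multivariate normal-rule bandwidth, and classify by the larger estimated density. The dataset, dimensions and bandwidth rule are assumptions.

```python
import numpy as np

def pca_project(X, k):
    Xc = X - X.mean(axis=0)
    # principal axes from the SVD of the centered data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, X.mean(axis=0), Vt[:k]

def kde_logdens(train, query):
    """Log of a product-Gaussian KDE with per-coordinate normal-rule bandwidths."""
    n, d = train.shape
    h = train.std(axis=0, ddof=1) * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
    u = (query[:, None, :] - train[None, :, :]) / h
    logk = -0.5 * np.sum(u**2, axis=2) - 0.5 * d * np.log(2 * np.pi) - np.log(h).sum()
    m = logk.max(axis=1, keepdims=True)  # log-sum-exp for stability
    return m[:, 0] + np.log(np.exp(logk - m).sum(axis=1)) - np.log(n)

rng = np.random.default_rng(1)
A = rng.normal(0.0, 1.0, size=(600, 10))   # class A: 600 obs in 10 dims
B = rng.normal(1.5, 1.0, size=(600, 10))   # class B: shifted mean
Z, mu, V = pca_project(np.vstack([A, B]), 3)
Za, Zb = Z[:600], Z[600:]
test = np.vstack([rng.normal(0.0, 1.0, (100, 10)), rng.normal(1.5, 1.0, (100, 10))])
Zt = (test - mu) @ V.T
pred = (kde_logdens(Zb, Zt) > kde_logdens(Za, Zt)).astype(int)
accuracy = float((pred == np.array([0] * 100 + [1] * 100)).mean())
```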
Method 2:
Use the univariate concept to „normalize“ the data nonparametrically, margin by margin
Use the classical methods like LDA and QDA for classification
Drawback: calculation intensive
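A sketch of this idea as I read it (an assumption on my part, not the study's code): estimate each marginal CDF with a kernel smoother, map every coordinate through Φ⁻¹(F̂(x)) so the margins look normal, then run a plain LDA. Data and constants are made up.

```python
import math
import numpy as np
from statistics import NormalDist

ndist = NormalDist()  # standard normal, for the inverse CDF

def smooth_cdf(col, q):
    """Kernel (smoothed) empirical CDF of one margin, normal-rule bandwidth."""
    h = np.std(col, ddof=1) * 1.06 * len(col) ** -0.2
    return np.array([np.mean([0.5 * (1 + math.erf((x - t) / (h * math.sqrt(2))))
                              for t in col]) for x in q])

def normalize(train, query):
    # map every margin through Phi^{-1}(Fhat(x)) -> roughly normal margins
    cols = []
    for j in range(train.shape[1]):
        u = np.clip(smooth_cdf(train[:, j], query[:, j]), 1e-6, 1 - 1e-6)
        cols.append([ndist.inv_cdf(p) for p in u])
    return np.array(cols).T

def lda_predict(Xa, Xb, q):
    """Pooled-covariance LDA with equal priors: smaller Mahalanobis distance wins."""
    S = (np.cov(Xa.T) * (len(Xa) - 1) + np.cov(Xb.T) * (len(Xb) - 1)) / (len(Xa) + len(Xb) - 2)
    Si = np.linalg.inv(S)
    da = np.einsum('ij,jk,ik->i', q - Xa.mean(0), Si, q - Xa.mean(0))
    db = np.einsum('ij,jk,ik->i', q - Xb.mean(0), Si, q - Xb.mean(0))
    return (db < da).astype(int)

rng = np.random.default_rng(2)
# two skewed (lognormal) classes, shifted on the log scale
A = rng.lognormal(0.0, 0.6, size=(200, 2))
B = rng.lognormal(1.0, 0.6, size=(200, 2))
train = np.vstack([A, B])
test = np.vstack([rng.lognormal(0.0, 0.6, (50, 2)), rng.lognormal(1.0, 0.6, (50, 2))])
Zt = normalize(train, train)
Zq = normalize(train, test)
pred = lda_predict(Zt[:200], Zt[200:], Zq)
accuracy = float((pred == np.array([0] * 50 + [1] * 50)).mean())
```

The double pass over every margin (CDF estimate, then inversion) is where the "calculation intensive" drawback mentioned above comes from.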
Method 2:
[Figure: panel a) a skewed density f(x) on the range 0 to 8; panel b) its estimated CDF F(x) and the target normal CDF G(x), illustrating how each observation x is mapped to t(x)]
Simulation Study
Criticism of former simulation studies
Carried out 20-30 years ago
Out-dated parameter selectors
Restriction to uncorrelated normals
Fruitless estimation because of high dimensions
No dimension reduction
The present simulation study
21 datasets
14 estimators
2 error criteria
21 x 14 x 2 = 588 classification scores
Many results
Each dataset has...
...2 classes for distinction
...600 observations per class
...200 test observations, 100 produced by each class
...therefore dimension 1400 x 10
Univariate prototype distributions:
[Figure: the seven prototype densities: Normal; Normal-noise small, medium and large; Exponential(1); Bimodal (close); Bimodal (far)]
21 datasets total:
10 datasets having equal covariance matrices
+ 10 datasets having unequal covariance matrices
+ 1 insurance dataset

Dataset Nr. | Abbrev. | contains
1  | NN1  | 10 normal distributions with "small noise"
2  | NN2  | 10 normal distributions with "medium noise"
3  | NN3  | 10 normal distributions with "large noise"
4  | SkN1 | 2 skewed (exp-)distributions and 7 normals
5  | SkN2 | 5 skewed (exp-)distributions and 5 normals
6  | SkN3 | 7 skewed (exp-)distributions and 3 normals
7  | Bi1  | 4 normals, 4 skewed and 2 bimodal (close) dist.
8  | Bi2  | 4 normals, 4 skewed and 2 bimodal (close) dist.
9  | Bi3  | 8 skewed and 2 bimodal (far) dist.
10 | Bi4  | 8 skewed and 2 bimodal (far) dist.
14 estimators
Classical methods: LDA and QDA (2 estimators)
Method 1 (multivariate density estimator): principal component reduction onto 2, 3, 4 and 5 dimensions (4) x multivariate „normal rule“ and multivariate LSCV criterion, resp. (2): 8 estimators
Method 2 („marginal normalizations“): univariate normal rule and Sheather-Jones plug-in (2) x subsequent LDA and QDA (2): 4 estimators
Misclassification criteria
The classical misclassification rate („error rate“)
The Brier score
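The Brier score formula was an image in the transcript; for predicted class probabilities p̂(j|xᵢ) and true labels yᵢ it is commonly written as (1/n) Σᵢ Σⱼ (p̂(j|xᵢ) − 1{yᵢ = j})². A tiny sketch (function and variable names are mine):

```python
import numpy as np

def brier_score(probs, labels):
    """Mean squared distance between the predicted class probabilities
    and the one-hot encoding of the true labels."""
    probs = np.asarray(probs, dtype=float)
    onehot = np.zeros_like(probs)
    onehot[np.arange(len(labels)), labels] = 1.0
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))

score = brier_score([[0.8, 0.2], [0.3, 0.7]], [0, 1])
```

Unlike the plain error rate, the Brier score also rewards well-calibrated posterior probabilities, not just correct hard decisions.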
Results
Error rate vs. Brier score
[Figure: scatterplot of Brier score against error rate over all estimators and datasets]
The choice of the misclassification criterion is not essential
Results
Error rates for method 1
[Figure: scatterplot of LSCV error rates against „normal rule“ error rates]
The choice of the multivariate bandwidth parameter (method 1) is not essential in most cases
Superiority of LSCV in the case of bimodals having unequal covariance matrices
Results
Error rates for method 2
[Figure: scatterplot of Sheather-Jones selector error rates against „normal rule“ error rates]
The choice of the univariate bandwidth parameter (method 2) is not essential
Results
Error rate regarding different subspaces
[Figure: error rates for projections onto 2, 3, 4 and 5 dimensions, shown separately for the NN-, SkN- and Bi-distributions]
The best trade-off is a projection onto 2-3 dimensions
Results
Equal covariance matrices: method 1 performs inferior to LDA
[Figure: error rates of LDA (classical) vs. LSCV(3), method 1]
Equal covariance matrices: method 2 sometimes slightly improves
[Figure: error rates of LDA (classical) vs. the normal rule (in method 2), per dataset NN1-NN3, SkN1-SkN3, Bi1-Bi4]
Results
Unequal covariance matrices: method 1 performs quite poorly, but not for skewed distributions
[Figure: error rates of QDA (classical) vs. LSCV(3), method 1]
Unequal covariance matrices: method 2 often improves essentially
[Figure: error rates of QDA (classical) vs. the normal rule (in method 2)]
Results
Required calculation time
[Figure: calculation time increasing from LDA/QDA over the multivariate „normal rule“ to the preliminary univariate normalizations, LSCV and the Sheather-Jones plug-in]
Is the additional calculation time justified?
Summary
Summary (1/3) – Classification Performance
Restriction to only a few dimensions
Improvements with respect to the classical discrimination methods by marginal normalizations (especially for unequal covariance matrices)
Poor performance of the multivariate kernel density classificator
LDA is undisputed in the case of equal covariance matrices and equal prior probabilities
Additional computation time seems not to be justified
Summary (1/3) – Classification Performance
Restriction to only a few dimensions Improvements with respect to the classical discrimination
methods by marginal normalizations (especially for unequal covariance matrices)
Poor performance of the multivariate kernel density classificator
LDA is undisputed in the case of equal covariance matrices and equal prior probabilities
Additional computation time seems not to be justified
![Page 71: Kernel Density Estimation Theory and Application in Discriminant Analysis Thomas Ledl Universität Wien](https://reader031.vdocument.in/reader031/viewer/2022032310/56649d795503460f94a5c261/html5/thumbnails/71.jpg)
Summary (1/3) – Classification Performance
Restriction to only a few dimensions Improvements with respect to the classical discrimination
methods by marginal normalizations (especially for unequal covariance matrices)
Poor performance of the multivariate kernel density classificator
LDA is undisputed in the case of equal covariance matrices and equal prior probabilities
Additional computation time seems not to be justified
![Page 72: Kernel Density Estimation Theory and Application in Discriminant Analysis Thomas Ledl Universität Wien](https://reader031.vdocument.in/reader031/viewer/2022032310/56649d795503460f94a5c261/html5/thumbnails/72.jpg)
Summary (1/3) – Classification Performance
Restriction to only a few dimensions Improvements with respect to the classical discrimination
methods by marginal normalizations (especially for unequal covariance matrices)
Poor performance of the multivariate kernel density classificator
LDA is undisputed in the case of equal covariance matrices and equal prior probabilities
Additional computation time seems not to be justified
![Page 73: Kernel Density Estimation Theory and Application in Discriminant Analysis Thomas Ledl Universität Wien](https://reader031.vdocument.in/reader031/viewer/2022032310/56649d795503460f94a5c261/html5/thumbnails/73.jpg)
Summary (1/3) – Classification Performance
Restriction to only a few dimensions Improvements with respect to the classical discrimination
methods by marginal normalizations (especially for unequal covariance matrices)
Poor performance of the multivariate kernel density classificator
LDA is undisputed in the case of equal covariance matrices and equal prior probabilities
Additional computation time seems not to be justified
![Page 74: Kernel Density Estimation Theory and Application in Discriminant Analysis Thomas Ledl Universität Wien](https://reader031.vdocument.in/reader031/viewer/2022032310/56649d795503460f94a5c261/html5/thumbnails/74.jpg)
Summary (2/3) – KDE for Data Description
Great variety in error criteria, parameter selection procedures and additional model improvements (three dimensions of choice)
No consensus on a feasible error criterion; it remains unclear what is finally optimized ("upper bounds" in L1 theory; the chain ISE, MISE, AMISE in L2 theory; several minima in LSCV, ...)
Different parameter selectors are of varying quality, depending on the underlying density
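For reference, the L2 criteria mentioned above are related as follows (standard kernel density estimation theory, not specific to this study):

```latex
\mathrm{ISE}(h) = \int \bigl(\hat f_h(x) - f(x)\bigr)^2 \, dx,
\qquad
\mathrm{MISE}(h) = \mathbb{E}\,\mathrm{ISE}(h),
\qquad
\mathrm{AMISE}(h) = \frac{R(K)}{nh} + \frac{h^4}{4}\,\mu_2(K)^2\,R(f''),
```

with $R(g) = \int g^2$ and $\mu_2(K) = \int u^2 K(u)\,du$, giving the asymptotically optimal bandwidth

```latex
h_{\mathrm{AMISE}} = \left(\frac{R(K)}{\mu_2(K)^2\,R(f'')\,n}\right)^{1/5}.
```

Each step in the chain replaces the previous target by an approximation, which is one reason it is unclear what is "finally" optimized.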
![Page 78: Kernel Density Estimation Theory and Application in Discriminant Analysis Thomas Ledl Universität Wien](https://reader031.vdocument.in/reader031/viewer/2022032310/56649d795503460f94a5c261/html5/thumbnails/78.jpg)
Summary (3/3) – Theory vs. Application
Comprehensive theoretical results about optimal kernels or optimal bandwidths are not relevant for classification
For discriminatory purposes, the issue of estimating log-densities is much more important
Some univariate model improvements do not generalize to higher dimensions
The widely ignored "curse of dimensionality" forces the user to find a trade-off between necessary dimension reduction and information loss
Dilemma: much data is required for accurate estimates, but much data leads to an explosion of computation time
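The curse of dimensionality can be made concrete with a small stdlib-Python calculation (an illustration, not a result from the study): the volume of a fixed-radius neighbourhood, i.e. the expected fraction of uniformly distributed data that a kernel of that radius can see, collapses as the dimension grows.

```python
import math

def fraction_in_ball(d, r=0.5):
    """Volume of the d-dimensional ball of radius r, i.e. the probability
    that a point uniform on the unit cube [0,1]^d lies within distance r
    of the cube's centre (valid for r <= 0.5)."""
    return math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0) * r ** d

# In 2 dimensions a radius-0.5 kernel still sees about 79% of the data;
# in 10 dimensions it sees well under 1%, so vastly more observations
# are needed for the same local accuracy.
fractions = {d: fraction_in_ball(d) for d in (1, 2, 5, 10)}
```

This is exactly the dilemma stated above: keeping neighbourhoods populated in high dimensions requires sample sizes that in turn blow up the computation time of the bandwidth selectors.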
![Page 83: Kernel Density Estimation Theory and Application in Discriminant Analysis Thomas Ledl Universität Wien](https://reader031.vdocument.in/reader031/viewer/2022032310/56649d795503460f94a5c261/html5/thumbnails/83.jpg)
The End