ceps/instead, g.-d. luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) variable bandwidth...
TRANSCRIPT
Adaptive kernel density estimation in Stata
Philippe Van KermCEPS/INSTEAD, G.-D. Luxembourg
9th London Stata User Group Meeting, 19-20 May 2003
Kernel density function estimation
• Official command: kdensity
f̂f (x) =1n
n∑i=1
1h
K
(x− xi
h
)(1)
– ‘Point mass’ of sample data diffused around xi’s, and averaged at x
Adaptive kernel density estimation May 17, 2003
Kernel density function estimation
• Official command: kdensity
f̂f (x) =1n
n∑i=1
1h
K
(x− xi
h
)(1)
– ‘Point mass’ of sample data diffused around xi’s, and averaged at x
– Fixed/constant/global bandwidth h: ‘degree of diffusion’ constant for allxi’s
Adaptive kernel density estimation May 17, 2003
Adaptive kernel density function estimation
• akdensity
f̂v(x) =1n
n∑i=1
1hi
K
(x− xi
hi
)(2)
– Different bandwidths for different xi’s
Adaptive kernel density estimation May 17, 2003
Adaptive kernel density function estimation
• akdensity
f̂v(x) =1n
n∑i=1
1hi
K
(x− xi
hi
)(2)
– Different bandwidths for different xi’s
– Degree of diffusion varies inversely with f(xi)
Adaptive kernel density estimation May 17, 2003
Adaptive kernel density function estimation
• akdensity
f̂v(x) =1n
n∑i=1
1hi
K
(x− xi
hi
)(2)
– Different bandwidths for different xi’s
– Degree of diffusion varies inversely with f(xi)
– Greater precision where data are abundant and ...
– ... greater smoothness where data are sparse
Adaptive kernel density estimation May 17, 2003
Adaptive kernel density function estimation
• akdensity
f̂v(x) =1n
n∑i=1
1hi
K
(x− xi
hi
)(2)
– Different bandwidths for different xi’s
– Degree of diffusion varies inversely with f(xi)
– Greater precision where data are abundant and ...
– ... greater smoothness where data are sparse
• Adaptive two-stage estimator (Abramson 1982):
– hi = h× λi;
Adaptive kernel density estimation May 17, 2003
Adaptive kernel density function estimation
• akdensity
f̂v(x) =1n
n∑i=1
1hi
K
(x− xi
hi
)(2)
– Different bandwidths for different xi’s
– Degree of diffusion varies inversely with f(xi)
– Greater precision where data are abundant and ...
– ... greater smoothness where data are sparse
• Adaptive two-stage estimator (Abramson 1982):
– hi = h× λi; λi =(G/f̃(xi)
)0.5
Adaptive kernel density estimation May 17, 2003
Adaptive kernel density function estimation
• akdensity
f̂v(x) =1n
n∑i=1
1hi
K
(x− xi
hi
)(2)
– Different bandwidths for different xi’s
– Degree of diffusion varies inversely with f(xi)
– Greater precision where data are abundant and ...
– ... greater smoothness where data are sparse
• Adaptive two-stage estimator (Abramson 1982):
– hi = h× λi; λi =(G/f̃(xi)
)0.5
– First step: compute pilot estimate (fixed bandwidth h) to generate λi
Adaptive kernel density estimation May 17, 2003
Adaptive kernel density function estimation
• akdensity
f̂v(x) =1n
n∑i=1
1hi
K
(x− xi
hi
)(2)
– Different bandwidths for different xi’s
– Degree of diffusion varies inversely with f(xi)
– Greater precision where data are abundant and ...
– ... greater smoothness where data are sparse
• Adaptive two-stage estimator (Abramson 1982):
– hi = h× λi; λi =(G/f̃(xi)
)0.5
– First step: compute pilot estimate (fixed bandwidth h) to generate λi
– Second step: compute density estimate with hi local bandwidths
Adaptive kernel density estimation May 17, 2003
Variability bands as a bonus
• akdensity estimates variability bands:
– f̂v(x)± b× SE(x)
Adaptive kernel density estimation May 17, 2003
Variability bands as a bonus
• akdensity estimates variability bands:
– f̂v(x)± b× SE(x)
– Pointwise variability bands
Adaptive kernel density estimation May 17, 2003
Variability bands as a bonus
• akdensity estimates variability bands:
– f̂v(x)± b× SE(x)
– Pointwise variability bands
– Not confidence intervals (accounts for sample variability but not bias!)
Adaptive kernel density estimation May 17, 2003
Variability bands as a bonus
• akdensity estimates variability bands:
– f̂v(x)± b× SE(x)
– Pointwise variability bands
– Not confidence intervals (accounts for sample variability but not bias!)
• Also for fixed bandwidth kernel estimates!
Adaptive kernel density estimation May 17, 2003
Syntax extract
• ‘High-level’ command (mimicks kdensity ):
akdensity varname ...[
, noadap tive width( #)[epan | gau ss
]stdb ands( #) ...
]
Adaptive kernel density estimation May 17, 2003
Syntax extract
• ‘High-level’ command (mimicks kdensity ):
akdensity varname ...[
, noadap tive width( #)[epan | gau ss
]stdb ands( #) ...
]
• ‘Low-level’ command (rarely used, but full control):
akdensity0 varname ... , width( # | varname) ...[
lambda( string) ...]
Adaptive kernel density estimation May 17, 2003
A simulated example
0.0
5.1
.15
dens
ity
0 10 20 30 40
True underlying density
Adaptive kernel density estimation May 17, 2003
0.0
5.1
.15
5 15 25 35
h = 1.11 (default)
FIXED bandwidth estimates
Adaptive kernel density estimation May 17, 2003
0.0
5.1
.15
5 15 25 35
h = 0.83
FIXED bandwidth estimates
Adaptive kernel density estimation May 17, 2003
0.0
5.1
.15
5 15 25 35
h = 0.55
FIXED bandwidth estimates
Adaptive kernel density estimation May 17, 2003
−.05
0.0
5.1
.15
.2
5 15 25 35
h = 1.11 (default)
VARIABLE bandwidth estimates
Adaptive kernel density estimation May 17, 2003
−.05
0.0
5.1
.15
.2
5 15 25 35
h = 0.83
VARIABLE bandwidth estimates
Adaptive kernel density estimation May 17, 2003
−.05
0.0
5.1
.15
.2
5 15 25 35
h = 0.55
VARIABLE bandwidth estimates
Adaptive kernel density estimation May 17, 2003
−.05
0.0
5.1
.15
.2
5 15 25 35
h = 1.11 (default) h = 0.83h = 0.55
FIXED bandwidth estimates
−.05
0.0
5.1
.15
.2
5 15 25 35
h = 1.11 (default) h = 0.83h = 0.55
VARIABLE bandwidth estimates
Adaptive kernel density estimation May 17, 2003
0.0
5.1
.15
.2
Variability−bias tradeoff−.
04−.
020
.02
.04
Poi
ntw
ise
std.
err
or.
5 15 25 35
Adaptive kernel density estimation May 17, 2003
A real example
KRTKPTMXGRJSPIR ITAFIBFISNONLUKSCAAUSDKNZLUS CH
1960
KRTK PTMX GR JSPIR ITAFIBFISNO NLUK SCAAUSDKNZ L US CH
1970
KRTK PTMX GR JSPIR ITAFIBF ISNONLUK S CAAUSDKNZ L USCH
1980
0 5000 10000 15000 20000 25000 30000
KRTK PTMX GR JSPIR ITAFIBF ISNONLUK S CAAUSDKNZ LUSCH
1990
0 5000 10000 15000 20000 25000 30000
Real GDP per capita − FIXED and VARIABLE bandwidths
Adaptive kernel density estimation May 17, 2003
KRTK PTMX GR JSPIR ITAFI BFISNO NLUK SCAAUS DKNZ L US CH
1970
Adaptive kernel density estimation May 17, 2003
KRTK PTMX GR JSPIR ITAFI BF ISNONLUK S CAAUSDKNZ L USCH
1980
0 5000 10000 15000 20000 25000 30000
Adaptive kernel density estimation May 17, 2003
Final remarks
• Fast implementation for large datasets (use of linear interpolation for thepilot estimate)
Adaptive kernel density estimation May 17, 2003
Final remarks
• Fast implementation for large datasets (use of linear interpolation for thepilot estimate)
• Works under both version 7 and version 8:
Adaptive kernel density estimation May 17, 2003
Final remarks
• Fast implementation for large datasets (use of linear interpolation for thepilot estimate)
• Works under both version 7 and version 8:
– use new graphics engine if called from Stata 8, and former engine ifcalled from Stata 7
Adaptive kernel density estimation May 17, 2003
Final remarks
• Fast implementation for large datasets (use of linear interpolation for thepilot estimate)
• Works under both version 7 and version 8:
– use new graphics engine if called from Stata 8, and former engine ifcalled from Stata 7
– graphics options therefore vary (in Stata 8, option plot(...)
especially useful!)
Adaptive kernel density estimation May 17, 2003
Final remarks
• Fast implementation for large datasets (use of linear interpolation for thepilot estimate)
• Works under both version 7 and version 8:
– use new graphics engine if called from Stata 8, and former engine ifcalled from Stata 7
– graphics options therefore vary (in Stata 8, option plot(...)
especially useful!)
– coded using if caller()<8 statements
Adaptive kernel density estimation May 17, 2003
Final remarks
• Fast implementation for large datasets (use of linear interpolation for thepilot estimate)
• Works under both version 7 and version 8:
– use new graphics engine if called from Stata 8, and former engine ifcalled from Stata 7
– graphics options therefore vary (in Stata 8, option plot(...)
especially useful!)
– coded using if caller()<8 statements
• Available in next Stata Journal issue (vol.3(2))
Adaptive kernel density estimation May 17, 2003