ceps/instead, g.-d. luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) variable bandwidth...

34
Adaptive kernel density estimation in Stata Philippe Van Kerm CEPS/INSTEAD, G.-D. Luxembourg 9th London Stata User Group Meeting, 19-20 May 2003

Upload: others

Post on 02-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

Adaptive kernel density estimation in Stata

Philippe Van KermCEPS/INSTEAD, G.-D. Luxembourg

9th London Stata User Group Meeting, 19-20 May 2003

Page 2: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

Kernel density function estimation

• Official command: kdensity

f̂f (x) =1n

n∑i=1

1h

K

(x− xi

h

)(1)

– ‘Point mass’ of sample data diffused around xi’s, and averaged at x

Adaptive kernel density estimation May 17, 2003

Page 3: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

Kernel density function estimation

• Official command: kdensity

f̂f (x) =1n

n∑i=1

1h

K

(x− xi

h

)(1)

– ‘Point mass’ of sample data diffused around xi’s, and averaged at x

– Fixed/constant/global bandwidth h: ‘degree of diffusion’ constant for allxi’s

Adaptive kernel density estimation May 17, 2003

Page 4: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

Adaptive kernel density function estimation

• akdensity

f̂v(x) =1n

n∑i=1

1hi

K

(x− xi

hi

)(2)

– Different bandwidths for different xi’s

Adaptive kernel density estimation May 17, 2003

Page 5: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

Adaptive kernel density function estimation

• akdensity

f̂v(x) =1n

n∑i=1

1hi

K

(x− xi

hi

)(2)

– Different bandwidths for different xi’s

– Degree of diffusion varies inversely with f(xi)

Adaptive kernel density estimation May 17, 2003

Page 6: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

Adaptive kernel density function estimation

• akdensity

f̂v(x) =1n

n∑i=1

1hi

K

(x− xi

hi

)(2)

– Different bandwidths for different xi’s

– Degree of diffusion varies inversely with f(xi)

– Greater precision where data are abundant and ...

– ... greater smoothness where data are sparse

Adaptive kernel density estimation May 17, 2003

Page 7: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

Adaptive kernel density function estimation

• akdensity

f̂v(x) =1n

n∑i=1

1hi

K

(x− xi

hi

)(2)

– Different bandwidths for different xi’s

– Degree of diffusion varies inversely with f(xi)

– Greater precision where data are abundant and ...

– ... greater smoothness where data are sparse

• Adaptive two-stage estimator (Abramson 1982):

– hi = h× λi;

Adaptive kernel density estimation May 17, 2003

Page 8: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

Adaptive kernel density function estimation

• akdensity

f̂v(x) =1n

n∑i=1

1hi

K

(x− xi

hi

)(2)

– Different bandwidths for different xi’s

– Degree of diffusion varies inversely with f(xi)

– Greater precision where data are abundant and ...

– ... greater smoothness where data are sparse

• Adaptive two-stage estimator (Abramson 1982):

– hi = h× λi; λi =(G/f̃(xi)

)0.5

Adaptive kernel density estimation May 17, 2003

Page 9: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

Adaptive kernel density function estimation

• akdensity

f̂v(x) =1n

n∑i=1

1hi

K

(x− xi

hi

)(2)

– Different bandwidths for different xi’s

– Degree of diffusion varies inversely with f(xi)

– Greater precision where data are abundant and ...

– ... greater smoothness where data are sparse

• Adaptive two-stage estimator (Abramson 1982):

– hi = h× λi; λi =(G/f̃(xi)

)0.5

– First step: compute pilot estimate (fixed bandwidth h) to generate λi

Adaptive kernel density estimation May 17, 2003

Page 10: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

Adaptive kernel density function estimation

• akdensity

f̂v(x) =1n

n∑i=1

1hi

K

(x− xi

hi

)(2)

– Different bandwidths for different xi’s

– Degree of diffusion varies inversely with f(xi)

– Greater precision where data are abundant and ...

– ... greater smoothness where data are sparse

• Adaptive two-stage estimator (Abramson 1982):

– hi = h× λi; λi =(G/f̃(xi)

)0.5

– First step: compute pilot estimate (fixed bandwidth h) to generate λi

– Second step: compute density estimate with hi local bandwidths

Adaptive kernel density estimation May 17, 2003

Page 11: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

Variability bands as a bonus

• akdensity estimates variability bands:

– f̂v(x)± b× SE(x)

Adaptive kernel density estimation May 17, 2003

Page 12: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

Variability bands as a bonus

• akdensity estimates variability bands:

– f̂v(x)± b× SE(x)

– Pointwise variability bands

Adaptive kernel density estimation May 17, 2003

Page 13: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

Variability bands as a bonus

• akdensity estimates variability bands:

– f̂v(x)± b× SE(x)

– Pointwise variability bands

– Not confidence intervals (accounts for sample variability but not bias!)

Adaptive kernel density estimation May 17, 2003

Page 14: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

Variability bands as a bonus

• akdensity estimates variability bands:

– f̂v(x)± b× SE(x)

– Pointwise variability bands

– Not confidence intervals (accounts for sample variability but not bias!)

• Also for fixed bandwidth kernel estimates!

Adaptive kernel density estimation May 17, 2003

Page 15: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

Syntax extract

• ‘High-level’ command (mimicks kdensity ):

akdensity varname ...[

, noadap tive width( #)[epan | gau ss

]stdb ands( #) ...

]

Adaptive kernel density estimation May 17, 2003

Page 16: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

Syntax extract

• ‘High-level’ command (mimicks kdensity ):

akdensity varname ...[

, noadap tive width( #)[epan | gau ss

]stdb ands( #) ...

]

• ‘Low-level’ command (rarely used, but full control):

akdensity0 varname ... , width( # | varname) ...[

lambda( string) ...]

Adaptive kernel density estimation May 17, 2003

Page 17: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

A simulated example

0.0

5.1

.15

dens

ity

0 10 20 30 40

True underlying density

Adaptive kernel density estimation May 17, 2003

Page 18: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

0.0

5.1

.15

5 15 25 35

h = 1.11 (default)

FIXED bandwidth estimates

Adaptive kernel density estimation May 17, 2003

Page 19: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

0.0

5.1

.15

5 15 25 35

h = 0.83

FIXED bandwidth estimates

Adaptive kernel density estimation May 17, 2003

Page 20: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

0.0

5.1

.15

5 15 25 35

h = 0.55

FIXED bandwidth estimates

Adaptive kernel density estimation May 17, 2003

Page 21: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

−.05

0.0

5.1

.15

.2

5 15 25 35

h = 1.11 (default)

VARIABLE bandwidth estimates

Adaptive kernel density estimation May 17, 2003

Page 22: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

−.05

0.0

5.1

.15

.2

5 15 25 35

h = 0.83

VARIABLE bandwidth estimates

Adaptive kernel density estimation May 17, 2003

Page 23: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

−.05

0.0

5.1

.15

.2

5 15 25 35

h = 0.55

VARIABLE bandwidth estimates

Adaptive kernel density estimation May 17, 2003

Page 24: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

−.05

0.0

5.1

.15

.2

5 15 25 35

h = 1.11 (default) h = 0.83h = 0.55

FIXED bandwidth estimates

−.05

0.0

5.1

.15

.2

5 15 25 35

h = 1.11 (default) h = 0.83h = 0.55

VARIABLE bandwidth estimates

Adaptive kernel density estimation May 17, 2003

Page 25: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

0.0

5.1

.15

.2

Variability−bias tradeoff−.

04−.

020

.02

.04

Poi

ntw

ise

std.

err

or.

5 15 25 35

Adaptive kernel density estimation May 17, 2003

Page 26: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

A real example

KRTKPTMXGRJSPIR ITAFIBFISNONLUKSCAAUSDKNZLUS CH

1960

KRTK PTMX GR JSPIR ITAFIBFISNO NLUK SCAAUSDKNZ L US CH

1970

KRTK PTMX GR JSPIR ITAFIBF ISNONLUK S CAAUSDKNZ L USCH

1980

0 5000 10000 15000 20000 25000 30000

KRTK PTMX GR JSPIR ITAFIBF ISNONLUK S CAAUSDKNZ LUSCH

1990

0 5000 10000 15000 20000 25000 30000

Real GDP per capita − FIXED and VARIABLE bandwidths

Adaptive kernel density estimation May 17, 2003

Page 27: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

KRTK PTMX GR JSPIR ITAFI BFISNO NLUK SCAAUS DKNZ L US CH

1970

Adaptive kernel density estimation May 17, 2003

Page 28: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

KRTK PTMX GR JSPIR ITAFI BF ISNONLUK S CAAUSDKNZ L USCH

1980

0 5000 10000 15000 20000 25000 30000

Adaptive kernel density estimation May 17, 2003

Page 29: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

Final remarks

• Fast implementation for large datasets (use of linear interpolation for thepilot estimate)

Adaptive kernel density estimation May 17, 2003

Page 30: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

Final remarks

• Fast implementation for large datasets (use of linear interpolation for thepilot estimate)

• Works under both version 7 and version 8:

Adaptive kernel density estimation May 17, 2003

Page 31: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

Final remarks

• Fast implementation for large datasets (use of linear interpolation for thepilot estimate)

• Works under both version 7 and version 8:

– use new graphics engine if called from Stata 8, and former engine ifcalled from Stata 7

Adaptive kernel density estimation May 17, 2003

Page 32: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

Final remarks

• Fast implementation for large datasets (use of linear interpolation for thepilot estimate)

• Works under both version 7 and version 8:

– use new graphics engine if called from Stata 8, and former engine ifcalled from Stata 7

– graphics options therefore vary (in Stata 8, option plot(...)

especially useful!)

Adaptive kernel density estimation May 17, 2003

Page 33: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

Final remarks

• Fast implementation for large datasets (use of linear interpolation for thepilot estimate)

• Works under both version 7 and version 8:

– use new graphics engine if called from Stata 8, and former engine ifcalled from Stata 7

– graphics options therefore vary (in Stata 8, option plot(...)

especially useful!)

– coded using if caller()<8 statements

Adaptive kernel density estimation May 17, 2003

Page 34: CEPS/INSTEAD, G.-D. Luxembourg · .1.15.2 5 15 25 35 h = 1.11 (default) VARIABLE bandwidth estimates Adaptive kernel density estimation May 17, 2003-.05 0.05.1.15.2 5 15 25 35 h =

Final remarks

• Fast implementation for large datasets (use of linear interpolation for thepilot estimate)

• Works under both version 7 and version 8:

– use new graphics engine if called from Stata 8, and former engine ifcalled from Stata 7

– graphics options therefore vary (in Stata 8, option plot(...)

especially useful!)

– coded using if caller()<8 statements

• Available in next Stata Journal issue (vol.3(2))

Adaptive kernel density estimation May 17, 2003