a unifying perspective on smoothing, mixed models and ......a unifying perspective on smoothing,...

30
A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich joint work with Stefan Lang (University of Innsbruck) 19.7.2007

Upload: others

Post on 07-Mar-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

A unifying perspective on smoothing,mixed models and correlated data

Thomas Kneib

Faculty of Mathematics and Economics, University of Ulm

Department of Statistics, Ludwig-Maximilians-University Munich

joint work withStefan Lang (University of Innsbruck)

19.7.2007

Page 2: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib What is Correlation?

What is Correlation?

• Development economics is often faced with data evolving in both time and space.

• Statistical analyses have to take the special structure into account, i.e.

– account for spatio-temporal correlations,

– account for space- and time-varying effects,

– model unobserved heterogeneity due to spatial and temporal variation.

• Are these really different tasks or merely different phrases for the same goal?

A unifying perspective on smoothing, mixed models and correlated data 1

Page 3: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib What is Correlation?

• What is (positive) correlation?

⇒ Observations which are positively correlated behave ”similar”.0

.51

1.5

2

0 .2 .4 .6 .8 1

Autocorrelation: 0.45

−1−.

50

.51

1.5

−3 −1.5 0 1.5 3

Autocorrelation: 0.96

• Correlation is commonly assumed to be a stochastic phenomenon.

• The above data have been generated from deterministic models:

yt = t + εt yt = sin(t) + εt

A unifying perspective on smoothing, mixed models and correlated data 2

Page 4: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib What is Correlation?

• Temporal correlation is often (at least partly) attributable to a trend function.

• The trend itself is typically introduced by unobserved, temporally / spatially varyingcovariates.

• Usually the response is not influenced by time or space directly (no causal relationship).

A unifying perspective on smoothing, mixed models and correlated data 3

Page 5: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Mixed Models I: Classical Perspective

Mixed Models I: Classical Perspective

• Longitudinal data: Repeated measurements

yit, i = 1, . . . , n, t = 1, . . . , T

on a fixed set of subjects i = 1, . . . , n at time points t = 1, . . . , T .

• Classical model for such data: Mixed effects / random effects models.

• Simplest example: Random intercepts

yit = x′itβ + bi + εit

where

bi i.i.d. N(0, τ 2),

εit i.i.d. N(0, σ2).

A unifying perspective on smoothing, mixed models and correlated data 4

Page 6: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Mixed Models I: Classical Perspective

• Two sources of random variation: Variation on the subject level (bi) and variation onthe measurement level (εit).

• Rationale: The observations i are a random sample from the population of individuals.

• The random effects distribution bi i.i.d. N(0, τ 2) describes the distribution ofindividual-specific effects bi in this population.

• Corresponding density:

p(b) ∝ exp

(

−1

2τ2b′b

)

where b = (b1, . . . , bn)′.

A unifying perspective on smoothing, mixed models and correlated data 5

Page 7: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Mixed Models I: Classical Perspective

• Estimation in mixed models is based on the joint likelihood

p(y, b) = p(y|b)p(b)

∝ exp

(

−1

2σ2(y − Xβ − Zb)′(y − Xβ − Zb)

)

exp

(

−1

2τ2b′b

)

→ maxβ,b

.

• Equivalently, we can consider the joint least-squares criterion

(y − Xβ − Zb)′(y − Xβ − Zb) +σ2

τ2b′b → min

β,b.

A unifying perspective on smoothing, mixed models and correlated data 6

Page 8: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Mixed Models II: Marginal Perspective

Mixed Models II: Marginal Perspective

• Hierarchical formulation of mixed models:

yit|bi ∼ N(x′itβ + bi, σ

2)

bi ∼ N(0, τ2).

• What happens, if we marginalize with respect to the bi?

⇒ Correlation between observations on one individual are induced due to the sharedrandom effects bi.

• To be more specific: An equicorrelation model is obtained

Corr(yit1, yit2) =Var(bi)

Var(bi) + Var(εit)=

τ2

τ2 + σ2= ρ,

A unifying perspective on smoothing, mixed models and correlated data 7

Page 9: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Mixed Models II: Marginal Perspective

• Marginal model in matrix notation:

yi ∼ N(Xiβ,Σi),

where

Σi = (σ2 + τ2)

1 ρ . . . . . . ρρ 1 ρ . . . ρ... . . . . . . . . . ...... . . . . . . ρρ . . . . . . ρ 1

.

A unifying perspective on smoothing, mixed models and correlated data 8

Page 10: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Mixed Models III: Penalised Likelihood Perspective

Mixed Models III: Penalised Likelihood Perspective

• Start with the model equation

yit = x′itβ + bi + εit

without a distributional assumption for bi.

• The bi are individual-specific regression coefficients that shall capture effects ofunobserved, individual-specific covariates.

• The number of these effects is large

⇒ Add a ridge penalty to stabilise estimation.

A unifying perspective on smoothing, mixed models and correlated data 9

Page 11: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Mixed Models III: Penalised Likelihood Perspective

• Instead of the least squares criterion

(y − Xβ − Zb)′(y − Xβ − Zb) → minβ,b

we minimise the penalised least squares criterion

(y − Xβ − Zb)′(y − Xβ − Zb) + λb′b → minβ,b

• The penalty shrinks parameters bi to zero, in particular if the database for individuali is small.

• The penalised least squares criterion is equivalent to the joint likelihood of the mixedmodel with

λ =σ2

τ2,

i.e. the error to signal ratio determines the strength of the penalisation.

A unifying perspective on smoothing, mixed models and correlated data 10

Page 12: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Mixed Models IV: Bayesian Perspective

Mixed Models IV: Bayesian Perspective

• Bayesian view: The random effects distribution can be considered as a priordistribution that expresses our knowledge about the individual-specific effects.

• bi ∼ N(0, τ2) a priori implies that

– we expect the effects to be ”not too far” from zero,

– we expect the family of effects in the population to be Gaussian.

⇒ Qualitatively similar to the random effects view.

• No formal differentiation between fixed and random effects: Both are randomquantities but with different a priori knowledge.

p(β) ∝ const p(b) ∝ exp

(

−1

2τ2b′b

)

A unifying perspective on smoothing, mixed models and correlated data 11

Page 13: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Mixed Models IV: Bayesian Perspective

• Estimation is based on the posterior

p(β, b|y) =p(y|β, b)p(β)p(b)

p(y)∝ p(y|β, b)p(b).

• The posterior mode coincides with the penalised least squares estimate.

A unifying perspective on smoothing, mixed models and correlated data 12

Page 14: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Mixed Models V: Summary

Mixed Models V: Summary

• Four views on the modelyit = x′

itβ + bi + εit

for longitudinal data:

– Mixed model perspective: bi is a random effect from the population distribution.

– Marginal perspective: the bi induce equicorrelation.

– Penalised likelihood perspective: the bi are individual-specific regressioncoefficients.

– Bayesian perspective: the random effects distribution expresses a priori knowledge.

• Both the mixed model and the Bayesian perspective combine features of the twofurther perspectives.

• Different rationales but the same goal: Describe / analyse why observations of oneindividual behave more similar than randomly selected measurements.

A unifying perspective on smoothing, mixed models and correlated data 13

Page 15: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Mixed Models V: Summary

• What do we gain by the different perspectives:

– Different estimation schemes have been developed by the different statisticalcommunities.

– Additional insight in more complicated types of models, e.g. concerningidentifiability problems when modelling both trend functions and correlation.

A unifying perspective on smoothing, mixed models and correlated data 14

Page 16: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Mixed Models VI: Extensions

Mixed Models VI: Extensions

• Similar considerations can be made for extended models such as

– Models with random slopes:

yit = x′itβ + z′itbi + εit.

– Nested multi-level models

yijt = x′ijtβ + bi + bij + εijt.

– Non-Nested multi-level models

yijt = x′ijtβ + bi + bj + εijt.

A unifying perspective on smoothing, mixed models and correlated data 15

Page 17: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Smoothing and Mixed Models

Smoothing and Mixed Models

• Consider trend estimation in the simple model

yt = ftrend(t) + εt, εt i.i.d. N(0, σ2).

• Model the trend function as a polynomial spline (in truncated line representation):

ftrend(t) = β0 + β1t + b1(t − κ1)+ + . . . + bd(t − κd)+.

⇒ Piecewise linear function estimate with changing slopes at the knots κj.

• In matrix notationy = Xβ + Zb + ε.

A unifying perspective on smoothing, mixed models and correlated data 16

Page 18: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Smoothing and Mixed Models

−2−1

01

23

0 .2 .4 .6 .8 1

(a) Basis functions

−2−1

01

23

0 .2 .4 .6 .8 1

(b) Scaled basis functions

−2−1

01

23

0 .2 .4 .6 .8 1

(c) Resulting model fit

A unifying perspective on smoothing, mixed models and correlated data 17

Page 19: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Smoothing and Mixed Models

• To avoid overfitting, introduce a penalty term for the truncated polynomials:

λ

d∑

j=1

b2j = λb′b.

⇒ Variability of the function estimate is controlled by the smoothing parameter λ.

• λ large ⇒ f̂(x) approaches a linear function.

• λ small ⇒ f̂(x) becomes a very wiggly estimate.

A unifying perspective on smoothing, mixed models and correlated data 18

Page 20: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Smoothing and Mixed Models

• Estimate the parameters of the trend function by minimising the penalised leastsquares criterion

(y − Xβ − Zb)′(y − Xβ − Zb) + λb′b → minβ,b

with smoothing parameter λ.

• This is the same objective function as for a mixed model

y = Xβ + Zb + ε

with distributional assumptions

[

εb

]

∼ N

([

00

]

,

[

σ2I 00 τ2I

])

where λ = σ2/τ2.

⇒ The smoothing approach for trend estimation can be considered a mixed modelwith very specific structure.

A unifying perspective on smoothing, mixed models and correlated data 19

Page 21: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Smoothing and Mixed Models

• Consequences:

– Mixed model methodology can be used to estimate the smoothing parameter λ(the ratio of error variance and random effects variance).

– Conditionally on b we are modelling a trend function but marginally the modelimplies correlation of the response.

⇒ Simultaneous modelling of trend functions and correlated errors may causeidentifiability problems.

– All four perspectives can be applied to the model, yielding for example a Bayesianinterpretation.

A unifying perspective on smoothing, mixed models and correlated data 20

Page 22: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Autoregressive Processes as Smoothers

Autoregressive Processes as Smoothers

• Consider the modelyit = x′

itβ + bt + εit

where εit i.i.d. N(0, σ2) and bt follows an autoregressive process of order 1 (AR(1))

bt = αbt−1 + ut, ut ∼ N(0, τ2).

• Note: bt is now a temporally correlated effect, not an individual-specific effect.

A unifying perspective on smoothing, mixed models and correlated data 21

Page 23: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Autoregressive Processes as Smoothers

• Correlation function of the autoregressive process (with parameter α):

ρ(bt, bs) = α|t−s|.

• This is a correlation function in discrete time. The continuous time analogue is theexponential correlation function

ρ(bt, bs) = exp

(

−|t − s|

φ

)

, α = exp

(

−1

φ

)

• It can be shown that the temporally correlated effect can be rewritten as

bt = f(t) =

T∑

s=1

ρ(bt, bs)γt.

⇒ The AR(1) assumption is equivalent to a basis function approach.

A unifying perspective on smoothing, mixed models and correlated data 22

Page 24: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Autoregressive Processes as Smoothers

−2−1

01

23

0 .2 .4 .6 .8 1

(a) Basis functions

−4−2

02

46

8

0 .2 .4 .6 .8 1

(b) Scaled basis functions

−2−1

01

23

0 .2 .4 .6 .8 1

(c) Resulting model fit

A unifying perspective on smoothing, mixed models and correlated data 23

Page 25: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Autoregressive Processes as Smoothers

• Consequences:

– The AR(1) correlation function can be interpreted as a (radial) basis function.

– A similar relation holds for stochastic processes with different types of correlationfunctions.

– The autoregressive process assumption turns into a penalty for the parametervector γt.

– The result can be immediately extended to spatial models with spatiallyautoregressive errors and spatial trend functions.

– The larger the autoregressive parameter, the smoother the basis function.

– Identifiability problems when including both a highly correlated autoregressive errorand a flexible trend function.

A unifying perspective on smoothing, mixed models and correlated data 24

Page 26: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Autoregressive Processes as Smoothers

−4−2

02

4

0 5 10 15 20 25

(a) data

−4−2

02

4

0 5 10 15 20 25

(b) trend function

−4−2

02

4

0 5 10 15 20 25

(c) autoregressive error

A unifying perspective on smoothing, mixed models and correlated data 25

Page 27: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib A Unifying Framework

A Unifying Framework

• Structured additive regression:

– Combines nonparametric regression, spatial regression, random effects, etc.

– General model equation:

y = f1(z1) + . . . + fr(zr) + x′β.

– Examples:

f(z) = f(x) z = x smooth function of a continuouscovariate x,

f(z) = fspat(s) z = s spatial effect,

f(z) = f(x1, x2) z = (x1, x2) interaction surface,

f(z) = bg z = g i.i.d. frailty bg, g is a groupingindex.

– Can be extended to non-Gaussian responses.

A unifying perspective on smoothing, mixed models and correlated data 26

Page 28: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib A Unifying Framework

• Generic representation of the different effect types:

– Vectors of function evaluations:

fj = Zjγj

– Prior distribution / random effects distribution / penalty term:

p(γ) ∝ exp

(

−1

2τ2γ′Kjγ

)

, Pen(γ) = λγ′Kjγ.

A unifying perspective on smoothing, mixed models and correlated data 27

Page 29: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib A Unifying Framework

• Four different perspectives:

– Penalised likelihood setting:

y − Xβ −

r∑

j=1

Zjγj

y − Xβ −

r∑

j=1

Zjγj

+

r∑

j=1

λjγ′jKjγj → min

β,γ1,...,γr

– Mixed model perspective: The γj are correlated random effects. Estimation isbased on the joint likelihood

p(y|γ1, . . . , γr)p(γ1, . . . , γr) → maxβ,γ1,...,γr

– Bayesian view: The mixed model distribution defines a prior for γj.

– Marginal view: After integrating out the random effects γj, we obtain a marginalmodel

y ∼ N(Xβ, V ),

where V is a covariance matrix with correlations induced by the random effects.

A unifying perspective on smoothing, mixed models and correlated data 28

Page 30: A unifying perspective on smoothing, mixed models and ......A unifying perspective on smoothing, mixed models and correlated data Thomas Kneib Faculty of Mathematics and Economics,

Thomas Kneib Conclusions

Conclusions

• Four different perspectives on semiparametric regression.

• Though looking different at first sight, there is a close connection between all them.

• In particular, semiparametric smoothing and modelling of correlations are relatedtasks.

• Identifiability problems can be encountered when flexibly modelling correlations andtemporal / spatial trend functions.

• The different perspectives allow to derive different estimation techniques.

A unifying perspective on smoothing, mixed models and correlated data 29