Covariance estimation with Cholesky decomposition and generalized linear model
Bo Chang
Graphical Models Reading Group
May 22, 2015
Bo Chang (UBC) Cholesky decomposition and GLM May 22, 2015 1 / 21
Modified Cholesky decomposition
Goal: Find a re-parameterization of a covariance matrix that is unconstrained and statistically interpretable.
Assume Y = (Y_1, …, Y_p)′ is an ordered (time-ordered) random vector with mean 0 and covariance matrix Σ. Regress each variable on its predecessors:

Y_t = ∑_{j=1}^{t−1} φ_{t,j} Y_j + ε_t.

Let σ_t² = Var(ε_t) and

Cov(ε) = diag(σ_1², …, σ_p²) = D.
Modified Cholesky decomposition
Rearranging

Y_t = ∑_{j=1}^{t−1} φ_{t,j} Y_j + ε_t,

we have TY = ε, where

T =
  [    1                                            ]
  [ −φ_{2,1}      1                                 ]
  [ −φ_{3,1}   −φ_{3,2}      1                      ]
  [    ⋮           ⋮                ⋱               ]
  [ −φ_{p,1}   −φ_{p,2}    ⋯    −φ_{p,p−1}    1    ]

Cov(TY) = Cov(ε) = TΣT′ = D.
Modified Cholesky decomposition
Definition: For a positive-definite covariance matrix Σ, its modified Cholesky decomposition is

TΣT′ = D,

where T is a unique unit lower-triangular matrix having ones on its diagonal and D is a unique diagonal matrix.

The precision matrix can be written as

Σ^{−1} = T′D^{−1}T.

T is unconstrained and statistically meaningful: its below-diagonal entries are the negated regression coefficients −φ_{t,j}.

T and D can be fitted by regressing each variable Y_t on its predecessors.
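The decomposition above can be computed directly from a standard Cholesky factorization. A minimal numpy sketch (the function name `modified_cholesky` is my own, not from the slides):

```python
import numpy as np

def modified_cholesky(Sigma):
    """Return (T, D) with T unit lower triangular and T Sigma T' = D."""
    C = np.linalg.cholesky(Sigma)   # Sigma = C C', C lower triangular
    d = np.diag(C)                  # positive diagonal of C
    L = C / d                       # unit lower triangular: Sigma = L diag(d^2) L'
    T = np.linalg.inv(L)            # hence T Sigma T' = diag(d^2)
    return T, np.diag(d ** 2)

# Example: a 3x3 positive-definite covariance matrix
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.6],
                  [0.3, 0.6, 1.0]])
T, D = modified_cholesky(Sigma)
assert np.allclose(T @ Sigma @ T.T, D)      # T Sigma T' = D
assert np.allclose(np.diag(T), np.ones(3))  # unit diagonal
```

The negatives of T's subdiagonal entries recover the regression coefficients φ_{t,j} from the autoregression above.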
Sparse estimation
k-banding: fit an AR(k) model,

Y_t = ∑_{i=1}^{k} φ_{t,t−i} Y_{t−i} + ε_t,

so that T has at most k nonzero subdiagonals. The resulting estimate of the precision matrix is also k-banded.
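A sketch of the k-banded fit from an n × p data matrix, regressing each column on its k nearest predecessors by ordinary least squares (function and variable names are mine, and mean-zero data is assumed as on the earlier slide):

```python
import numpy as np

def fit_kband(Y, k):
    """Regress each column of Y on its k nearest predecessors; return (T, D)."""
    n, p = Y.shape
    T = np.eye(p)
    d = np.empty(p)
    d[0] = Y[:, 0] @ Y[:, 0] / n                 # Var(eps_1) = Var(Y_1)
    for t in range(1, p):
        j0 = max(0, t - k)                       # band: only lags 1..k enter
        X = Y[:, j0:t]
        phi, *_ = np.linalg.lstsq(X, Y[:, t], rcond=None)
        T[t, j0:t] = -phi                        # row t of T holds -phi_{t,j}
        resid = Y[:, t] - X @ phi
        d[t] = resid @ resid / n
    return T, np.diag(d)

rng = np.random.default_rng(0)
Y = rng.standard_normal((500, 6))
T, D = fit_kband(Y, k=2)
Omega = T.T @ np.linalg.inv(D) @ T               # estimated precision matrix
# entries more than k off the diagonal are exactly zero
assert abs(Omega[5, 0]) < 1e-12 and abs(Omega[5, 1]) < 1e-12
```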
Sparse estimation
k-banding, nonparametric estimation: Wu and Pourahmadi (2003) used local polynomial estimators to smooth the subdiagonals of T. The model is

∑_{j=0}^{k} f_{j,p}(t/p) Y_{t−j} = σ_p(t/p) ε_t,

where f_{0,p}(·) = 1, f_{j,p}(·) and σ_p(·) are continuous functions on [0, 1], and the ε_t are independent with mean 0 and variance 1. Then

φ_{t,t−j} = f_{j,p}(t/p),  σ_t = σ_p(t/p).
Sparse estimation
Lasso penalty: Huang et al. (2006)
Minimize

n log|Σ| + n tr(D^{−1}TST′) + λ ∑_{t=2}^{p} ∑_{j=1}^{t−1} |φ_{t,j}|.
Zeros are placed in T with no regular pattern.
Sparsity of the precision matrix is not guaranteed.
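Given D, the objective decouples across rows of T into separate ℓ1-penalized regressions. A generic coordinate-descent lasso solver for one such row (a sketch under that interpretation, not Huang et al.'s actual algorithm; names are mine):

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    b = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r = y - X @ b + X[:, j] * b[j]   # residual excluding coordinate j
            rho = X[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
    return b

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = X @ np.array([1.5, 0.0, 0.0, -2.0, 0.0]) + 0.1 * rng.standard_normal(200)
# lam = 0 recovers ordinary least squares; a heavy penalty zeroes everything
ols, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(lasso_cd(X, y, lam=0.0, n_iter=500), ols, atol=1e-4)
assert np.all(lasso_cd(X, y, lam=1e6) == 0.0)
```

With intermediate λ, individual coefficients are zeroed with no regular pattern, which is exactly why sparsity of the precision matrix is not guaranteed.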
Sparse estimation
Nested lasso penalty / Adaptive banding: Levina et al. (2008)
Minimize

n log|Σ| + n tr(D^{−1}TST′) + λ ∑_{t=2}^{p} P(φ_t),

where

P(φ_t) = |φ_{t,t−1}| + |φ_{t,t−2}|/|φ_{t,t−1}| + ⋯ + |φ_{t,1}|/|φ_{t,2}|,

and 0/0 is defined to be zero.

This selects the best model that regresses the jth variable on its k closest predecessors, where k = k_j depends on j.
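The hierarchy enforced by the nested penalty is easy to see in a small helper (my own illustration): if a nearer coefficient is zero while a farther one is nonzero, the penalty is infinite, so zeros can only form a contiguous outer band.

```python
import math

def nested_penalty(phi):
    """P(phi_t) for phi ordered nearest lag first: (phi_{t,t-1}, ..., phi_{t,1})."""
    a = [abs(x) for x in phi]
    total = a[0]
    for prev, cur in zip(a, a[1:]):
        if cur == 0.0:
            continue                    # covers the 0/0 := 0 convention
        total += cur / prev if prev != 0.0 else math.inf
    return total

assert nested_penalty([0.5, 0.2, 0.0]) == 0.5 + 0.2 / 0.5  # banded: finite
assert nested_penalty([0.0, 0.0, 0.0]) == 0.0              # all zero: 0/0 = 0
assert nested_penalty([0.0, 0.3, 0.0]) == math.inf         # gap: infinite
```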
Sparse estimation
Forward adaptive banding: Leng and Li (2011)

Minimize the modified BIC:

n log|Σ| + n tr(D^{−1}TST′) + C_n log(n) ∑_{j=1}^{p} k_j,

s.t. k_j ≤ min{n/(log n)², j − 1},

where k_j is the band length for the jth variable.

Fit AR(k_j) to obtain T and D.
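A rough per-variable sketch of the selection step (my own simplification that scores each row separately; `Cn` and the helper name are illustrative, and mean-zero data is assumed):

```python
import numpy as np

def select_band(Y, Cn=1.0):
    """Pick a band length k_j for each variable by a per-row modified BIC."""
    n, p = Y.shape
    kmax_global = int(n / np.log(n) ** 2)       # k_j <= n/(log n)^2
    ks = [0]
    for t in range(1, p):
        best = (np.inf, 0)
        for k in range(0, min(kmax_global, t) + 1):  # also k_j <= t
            if k > 0:
                X = Y[:, t - k:t]
                phi, *_ = np.linalg.lstsq(X, Y[:, t], rcond=None)
                resid = Y[:, t] - X @ phi
            else:
                resid = Y[:, t]
            sig2 = resid @ resid / n
            bic = n * np.log(sig2) + Cn * np.log(n) * k  # fit term + penalty on band
            best = min(best, (bic, k))
        ks.append(best[1])
    return ks

rng = np.random.default_rng(2)
Y = rng.standard_normal((400, 5))   # independent columns: short bands should win
ks = select_band(Y)
assert all(0 <= k <= t for t, k in enumerate(ks))
```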
Cholesky decomposition: summary
Cholesky decomposition depends on the order in which the variables appear in the random vector Y.
It works when the variables have a natural ordering.
GLM for covariance matrices
Another way to reduce the number of covariance parameters is to use covariates, as in modeling the mean vector.
Path of development: linear → log-linear → GLM.
Linear covariance models
Linear covariance models (LCM):
Σ^{±1} = α_1 U_1 + ⋯ + α_q U_q,

where the U_i are known symmetric basis matrices (covariates), the α_i are unknown parameters, and the model may be written for Σ or for Σ^{−1}.
For q = p², any covariance matrix can be written as

Σ = (σ_{ij}) = ∑_{i=1}^{p} ∑_{j=1}^{p} σ_{ij} U_{ij},

where U_{ij} is the matrix with a 1 in the (i, j)th position and 0 elsewhere.
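A tiny example with q = 2: compound symmetry, Σ = α₁I + α₂(J − I) with J the all-ones matrix (my own choice of basis, to illustrate both the parameterization and the positive-definiteness constraint mentioned below):

```python
import numpy as np

p = 4
U1 = np.eye(p)                  # basis matrix carrying the variances
U2 = np.ones((p, p)) - U1       # basis matrix carrying the common covariance
alpha1, alpha2 = 2.0, 0.5
Sigma = alpha1 * U1 + alpha2 * U2    # compound-symmetry covariance

assert np.allclose(np.diag(Sigma), 2.0)
assert Sigma[0, 1] == 0.5
assert np.all(np.linalg.eigvalsh(Sigma) > 0)   # this (alpha1, alpha2) is valid

# positive-definiteness is NOT automatic: it constrains the alphas
bad = 2.0 * U1 - 1.0 * U2       # = 3I - J, smallest eigenvalue 3 - p = -1
assert np.linalg.eigvalsh(bad).min() < 0
```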
Linear covariance models
MLE: the score equation for α_i is

tr(Σ^{−1}U_i) − tr(SΣ^{−1}U_iΣ^{−1}) = 0,
which can be solved by an iterative method.
Constraint: the α_i are restricted so that the matrix is positive definite.

The parameters generally lack a direct statistical interpretation.
Log-linear covariance models
Log-linear covariance models:

log Σ = α_1 U_1 + ⋯ + α_q U_q,

where log is the matrix logarithm. The α_i are now unconstrained.
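For symmetric matrices the matrix log and exp can be computed through an eigendecomposition. This numpy sketch (helper names mine) shows why the α_i become unconstrained: the exponential of any symmetric matrix is positive definite.

```python
import numpy as np

def sym_expm(A):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T     # V diag(e^w) V'

def sym_logm(S):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T     # V diag(log w) V'

# any symmetric A = sum alpha_i U_i, however wild, yields a valid covariance
rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = B + B.T                                   # arbitrary symmetric "linear predictor"
Sigma = sym_expm(A)
assert np.all(np.linalg.eigvalsh(Sigma) > 0)  # positive definite
assert np.allclose(sym_logm(Sigma), A)        # log recovers the predictor
```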
GLM via Cholesky decomposition
Pourahmadi (1999):
Cholesky decomposition: Σ^{−1} = T′D^{−1}T.

T and log D are unconstrained.

Parametric models for φ_{t,j} and log σ_t²:

log σ_t² = z_t′λ,  φ_{t,j} = w_{t,j}′γ,

where z_t and w_{t,j} are q × 1 and d × 1 vectors of covariates, and λ and γ are parameter vectors.

Common covariates are powers of time and lag:

z_t = (1, t, t², …, t^{q−1})′,
w_{t,j} = (1, t − j, (t − j)², …, (t − j)^{d−1})′.
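Putting the pieces together: given λ and γ, the whole covariance matrix is determined by q + d numbers. A sketch (function name and the illustrative λ, γ values are mine):

```python
import numpy as np

def T_D_from_glm(lam, gam, p):
    """Build T and D from GLM parameters lam (log sigma_t^2) and gam (phi_{t,j})."""
    q, d = len(lam), len(gam)
    T = np.eye(p)
    sig2 = np.empty(p)
    for t in range(1, p + 1):
        z = np.array([float(t) ** m for m in range(q)])          # z_t = (1, t, ..., t^{q-1})
        sig2[t - 1] = np.exp(z @ lam)                            # log sigma_t^2 = z_t' lam
        for j in range(1, t):
            w = np.array([float(t - j) ** m for m in range(d)])  # w_{t,j} = (1, t-j, ...)
            T[t - 1, j - 1] = -(w @ gam)                         # phi_{t,j} = w_{t,j}' gam
    return T, np.diag(sig2)

lam = np.array([0.1, -0.02])          # q = 2
gam = np.array([0.5, -0.1, 0.005])    # d = 3
T, D = T_D_from_glm(lam, gam, p=6)
Tinv = np.linalg.inv(T)
Sigma = Tinv @ D @ Tinv.T                     # from Sigma^{-1} = T' D^{-1} T
assert np.all(np.linalg.eigvalsh(Sigma) > 0)  # always a valid covariance matrix
```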
GLM via Cholesky decomposition
Number of parameters: q + d .
Computing the MLE is relatively simple:

−2ℓ(λ, γ) = n log|D| + n tr(D^{−1}TST′).

Given D, the MLE of T has a closed form; similarly, given T, the MLE of D has a closed form.
References
Pourahmadi, M. (2011). Covariance estimation: The GLM and regularization perspectives. Statistical Science, 26(3), 369-387.
Pourahmadi, M. (2013). High-Dimensional Covariance Estimation: With High-Dimensional Data. John Wiley & Sons.
Pourahmadi, M. (1999). Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation. Biometrika, 86(3), 677-690.
Huang, J. Z., Liu, N., Pourahmadi, M., & Liu, L. (2006). Covariance matrix selection and estimation via penalised normal likelihood. Biometrika, 93(1), 85-98.
Leng, C., & Li, B. (2011). Forward adaptive banding for estimating large covariance matrices. Biometrika, 98(4), 821-830.
Levina, E., Rothman, A., & Zhu, J. (2008). Sparse estimation of large covariance matrices via a nested Lasso penalty. The Annals of Applied Statistics, 2(1), 245-263.
Wu, W. B., & Pourahmadi, M. (2003). Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika, 90(4), 831-844.
The End