Covariance estimation with Cholesky decomposition and generalized linear model
Bo Chang
Graphical Models Reading Group
May 22, 2015
Bo Chang (UBC) Cholesky decomposition and GLM May 22, 2015 1 / 21
Modified Cholesky decomposition
Goal: Find a re-parameterization of a covariance matrix that is unconstrained and statistically interpretable.
Assume Y = (Y_1, …, Y_p)′ is an ordered (time-ordered) random vector with mean 0 and covariance matrix Σ. Regress each variable on its predecessors:

Y_t = ∑_{j=1}^{t−1} φ_{t,j} Y_j + ε_t.

Let σ_t² = Var(ε_t) and

Cov(ε) = diag(σ_1², …, σ_p²) = D.
Modified Cholesky decomposition
Rearranging

Y_t = ∑_{j=1}^{t−1} φ_{t,j} Y_j + ε_t,

we have TY = ε, where

T =
  [    1                                            ]
  [ −φ_{2,1}      1                                 ]
  [ −φ_{3,1}   −φ_{3,2}      1                      ]
  [    ⋮           ⋮                ⋱               ]
  [ −φ_{p,1}   −φ_{p,2}    ⋯    −φ_{p,p−1}    1    ]

Cov(TY) = Cov(ε) = TΣT′ = D.
Modified Cholesky decomposition
Definition: For a positive-definite covariance matrix Σ, its modified Cholesky decomposition is

TΣT′ = D,

where T is a unique unit lower-triangular matrix having ones on its diagonal and D is a unique diagonal matrix.

The precision matrix can be written as

Σ^{−1} = T′D^{−1}T.

T is unconstrained and statistically meaningful: its below-diagonal entries are the negated regression coefficients −φ_{t,j}.

T and D can be fitted by regressing each variable Y_t on its predecessors.
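The decomposition above can be computed directly from a standard Cholesky factorization. A minimal numpy sketch (the function name `modified_cholesky` is my own, not from the slides):

```python
import numpy as np

def modified_cholesky(Sigma):
    """Return (T, D) with T unit lower triangular and T Sigma T' = D."""
    C = np.linalg.cholesky(Sigma)   # Sigma = C C', C lower triangular
    d = np.diag(C)                  # positive diagonal of C
    L = C / d                       # unit lower triangular: Sigma = L diag(d^2) L'
    T = np.linalg.inv(L)            # hence T Sigma T' = diag(d^2)
    return T, np.diag(d ** 2)

# Example: a 3x3 positive-definite covariance matrix
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.6],
                  [0.3, 0.6, 1.0]])
T, D = modified_cholesky(Sigma)
assert np.allclose(T @ Sigma @ T.T, D)      # T Sigma T' = D
assert np.allclose(np.diag(T), np.ones(3))  # unit diagonal
```

The negatives of T's subdiagonal entries recover the regression coefficients φ_{t,j} from the autoregression above.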
Sparse estimation
k-banding: fit an AR(k) model,

Y_t = ∑_{i=1}^{k} φ_{t,t−i} Y_{t−i} + ε_t,

so that T has at most k nonzero subdiagonals. The resulting estimate of the precision matrix is also k-banded.
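A sketch of the k-banded fit from an n × p data matrix, regressing each column on its k nearest predecessors by ordinary least squares (function and variable names are mine, and mean-zero data is assumed as on the earlier slide):

```python
import numpy as np

def fit_kband(Y, k):
    """Regress each column of Y on its k nearest predecessors; return (T, D)."""
    n, p = Y.shape
    T = np.eye(p)
    d = np.empty(p)
    d[0] = Y[:, 0] @ Y[:, 0] / n                 # Var(eps_1) = Var(Y_1)
    for t in range(1, p):
        j0 = max(0, t - k)                       # band: only lags 1..k enter
        X = Y[:, j0:t]
        phi, *_ = np.linalg.lstsq(X, Y[:, t], rcond=None)
        T[t, j0:t] = -phi                        # row t of T holds -phi_{t,j}
        resid = Y[:, t] - X @ phi
        d[t] = resid @ resid / n
    return T, np.diag(d)

rng = np.random.default_rng(0)
Y = rng.standard_normal((500, 6))
T, D = fit_kband(Y, k=2)
Omega = T.T @ np.linalg.inv(D) @ T               # estimated precision matrix
# entries more than k off the diagonal are exactly zero
assert abs(Omega[5, 0]) < 1e-12 and abs(Omega[5, 1]) < 1e-12
```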
Sparse estimation
k-banding, nonparametric estimation: Wu and Pourahmadi (2003) used local polynomial estimators to smooth the subdiagonals of T. The model is

∑_{j=0}^{k} f_{j,p}(t/p) Y_{t−j} = σ_p(t/p) ε_t,

where f_{0,p}(·) = 1, f_{j,p}(·) and σ_p(·) are continuous functions on [0, 1], and the ε_t are independent with mean 0 and variance 1. Then

φ_{t,t−j} = f_{j,p}(t/p),  σ_t = σ_p(t/p).
Sparse estimation
Lasso penalty: Huang et al. (2006)
Minimize

n log|Σ| + n tr(D^{−1}TST′) + λ ∑_{t=2}^{p} ∑_{j=1}^{t−1} |φ_{t,j}|.
Zeros are placed in T with no regular pattern.
Sparsity of the precision matrix is not guaranteed.
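Given D, the objective decouples across rows of T into separate ℓ1-penalized regressions. A generic coordinate-descent lasso solver for one such row (a sketch under that interpretation, not Huang et al.'s actual algorithm; names are mine):

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    b = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r = y - X @ b + X[:, j] * b[j]   # residual excluding coordinate j
            rho = X[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
    return b

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = X @ np.array([1.5, 0.0, 0.0, -2.0, 0.0]) + 0.1 * rng.standard_normal(200)
# lam = 0 recovers ordinary least squares; a heavy penalty zeroes everything
ols, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(lasso_cd(X, y, lam=0.0, n_iter=500), ols, atol=1e-4)
assert np.all(lasso_cd(X, y, lam=1e6) == 0.0)
```

With intermediate λ, individual coefficients are zeroed with no regular pattern, which is exactly why sparsity of the precision matrix is not guaranteed.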
Sparse estimation
Nested lasso penalty / Adaptive banding: Levina et al. (2008)
Minimize

n log|Σ| + n tr(D^{−1}TST′) + λ ∑_{t=2}^{p} P(φ_t),

where

P(φ_t) = |φ_{t,t−1}| + |φ_{t,t−2}|/|φ_{t,t−1}| + ⋯ + |φ_{t,1}|/|φ_{t,2}|,

and 0/0 is defined to be zero.

This selects the best model that regresses the jth variable on its k closest predecessors, where k = k_j depends on j.
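The hierarchy enforced by the nested penalty is easy to see in a small helper (my own illustration): if a nearer coefficient is zero while a farther one is nonzero, the penalty is infinite, so zeros can only form a contiguous outer band.

```python
import math

def nested_penalty(phi):
    """P(phi_t) for phi ordered nearest lag first: (phi_{t,t-1}, ..., phi_{t,1})."""
    a = [abs(x) for x in phi]
    total = a[0]
    for prev, cur in zip(a, a[1:]):
        if cur == 0.0:
            continue                    # covers the 0/0 := 0 convention
        total += cur / prev if prev != 0.0 else math.inf
    return total

assert nested_penalty([0.5, 0.2, 0.0]) == 0.5 + 0.2 / 0.5  # banded: finite
assert nested_penalty([0.0, 0.0, 0.0]) == 0.0              # all zero: 0/0 = 0
assert nested_penalty([0.0, 0.3, 0.0]) == math.inf         # gap: infinite
```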
Sparse estimation
Forward adaptive banding: Leng and Li (2011)

Minimize the modified BIC:

n log|Σ| + n tr(D^{−1}TST′) + C_n log(n) ∑_{j=1}^{p} k_j,

s.t. k_j ≤ min{n/(log n)², j − 1},

where k_j is the band length for the jth variable.

Fit AR(k_j) to obtain T and D.
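A rough per-variable sketch of the selection step (my own simplification that scores each row separately; `Cn` and the helper name are illustrative, and mean-zero data is assumed):

```python
import numpy as np

def select_band(Y, Cn=1.0):
    """Pick a band length k_j for each variable by a per-row modified BIC."""
    n, p = Y.shape
    kmax_global = int(n / np.log(n) ** 2)       # k_j <= n/(log n)^2
    ks = [0]
    for t in range(1, p):
        best = (np.inf, 0)
        for k in range(0, min(kmax_global, t) + 1):  # also k_j <= t
            if k > 0:
                X = Y[:, t - k:t]
                phi, *_ = np.linalg.lstsq(X, Y[:, t], rcond=None)
                resid = Y[:, t] - X @ phi
            else:
                resid = Y[:, t]
            sig2 = resid @ resid / n
            bic = n * np.log(sig2) + Cn * np.log(n) * k  # fit term + penalty on band
            best = min(best, (bic, k))
        ks.append(best[1])
    return ks

rng = np.random.default_rng(2)
Y = rng.standard_normal((400, 5))   # independent columns: short bands should win
ks = select_band(Y)
assert all(0 <= k <= t for t, k in enumerate(ks))
```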
Cholesky decomposition: summary
Cholesky decomposition depends on the order in which the variables appear in the random vector Y.
It works when the variables have a natural ordering.
GLM for covariance matrices
Another way to reduce the number of covariance parameters is to use covariates, as in modeling the mean vector.
Path of development: linear → log-linear → GLM.
Linear covariance models
Linear covariance models (LCM):
Σ^{±1} = α_1 U_1 + ⋯ + α_q U_q,

where the U_i are known symmetric basis matrices (covariates), the α_i are unknown parameters, and the model may be written for Σ or for Σ^{−1}.
For q = p², any covariance matrix can be written as

Σ = (σ_{ij}) = ∑_{i=1}^{p} ∑_{j=1}^{p} σ_{ij} U_{ij},

where U_{ij} is the matrix with a 1 in the (i, j)th position and 0 elsewhere.
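A tiny example with q = 2: compound symmetry, Σ = α₁I + α₂(J − I) with J the all-ones matrix (my own choice of basis, to illustrate both the parameterization and the positive-definiteness constraint mentioned below):

```python
import numpy as np

p = 4
U1 = np.eye(p)                  # basis matrix carrying the variances
U2 = np.ones((p, p)) - U1       # basis matrix carrying the common covariance
alpha1, alpha2 = 2.0, 0.5
Sigma = alpha1 * U1 + alpha2 * U2    # compound-symmetry covariance

assert np.allclose(np.diag(Sigma), 2.0)
assert Sigma[0, 1] == 0.5
assert np.all(np.linalg.eigvalsh(Sigma) > 0)   # this (alpha1, alpha2) is valid

# positive-definiteness is NOT automatic: it constrains the alphas
bad = 2.0 * U1 - 1.0 * U2       # = 3I - J, smallest eigenvalue 3 - p = -1
assert np.linalg.eigvalsh(bad).min() < 0
```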
Linear covariance models
MLE: the score equation for α_i is

tr(Σ^{−1}U_i) − tr(SΣ^{−1}U_iΣ^{−1}) = 0,
which can be solved by an iterative method.
Constraint: the α_i are restricted so that the matrix is positive definite.

The parameters generally lack a direct statistical interpretation.
Log-linear covariance models
Log-linear covariance models:

log Σ = α_1 U_1 + ⋯ + α_q U_q,

where log is the matrix logarithm. The α_i are now unconstrained.
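For symmetric matrices the matrix log and exp can be computed through an eigendecomposition. This numpy sketch (helper names mine) shows why the α_i become unconstrained: the exponential of any symmetric matrix is positive definite.

```python
import numpy as np

def sym_expm(A):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T     # V diag(e^w) V'

def sym_logm(S):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T     # V diag(log w) V'

# any symmetric A = sum alpha_i U_i, however wild, yields a valid covariance
rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = B + B.T                                   # arbitrary symmetric "linear predictor"
Sigma = sym_expm(A)
assert np.all(np.linalg.eigvalsh(Sigma) > 0)  # positive definite
assert np.allclose(sym_logm(Sigma), A)        # log recovers the predictor
```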
GLM via Cholesky decomposition
Pourahmadi (1999):
Cholesky decomposition: Σ^{−1} = T′D^{−1}T.

T and log D are unconstrained.

Parametric models for φ_{t,j} and log σ_t²:

log σ_t² = z_t′λ,  φ_{t,j} = w_{t,j}′γ,

where z_t and w_{t,j} are q × 1 and d × 1 vectors of covariates, and λ and γ are parameter vectors.

Common covariates are powers of time and lag:

z_t = (1, t, t², …, t^{q−1})′,
w_{t,j} = (1, t − j, (t − j)², …, (t − j)^{d−1})′.
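Putting the pieces together: given λ and γ, the whole covariance matrix is determined by q + d numbers. A sketch (function name and the illustrative λ, γ values are mine):

```python
import numpy as np

def T_D_from_glm(lam, gam, p):
    """Build T and D from GLM parameters lam (log sigma_t^2) and gam (phi_{t,j})."""
    q, d = len(lam), len(gam)
    T = np.eye(p)
    sig2 = np.empty(p)
    for t in range(1, p + 1):
        z = np.array([float(t) ** m for m in range(q)])          # z_t = (1, t, ..., t^{q-1})
        sig2[t - 1] = np.exp(z @ lam)                            # log sigma_t^2 = z_t' lam
        for j in range(1, t):
            w = np.array([float(t - j) ** m for m in range(d)])  # w_{t,j} = (1, t-j, ...)
            T[t - 1, j - 1] = -(w @ gam)                         # phi_{t,j} = w_{t,j}' gam
    return T, np.diag(sig2)

lam = np.array([0.1, -0.02])          # q = 2
gam = np.array([0.5, -0.1, 0.005])    # d = 3
T, D = T_D_from_glm(lam, gam, p=6)
Tinv = np.linalg.inv(T)
Sigma = Tinv @ D @ Tinv.T                     # from Sigma^{-1} = T' D^{-1} T
assert np.all(np.linalg.eigvalsh(Sigma) > 0)  # always a valid covariance matrix
```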
GLM via Cholesky decomposition
Number of parameters: q + d .
Computing the MLE is relatively simple:

−2ℓ(λ, γ) = n log|D| + n tr(D^{−1}TST′).

Given D, the MLE of T has a closed form; similarly, given T, the MLE of D has a closed form.
References
Pourahmadi, M. (2011). Covariance estimation: The GLM and regularization perspectives. Statistical Science, 26(3), 369-387.
Pourahmadi, M. (2013). High-Dimensional Covariance Estimation: With High-Dimensional Data. John Wiley & Sons.
Pourahmadi, M. (1999). Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation. Biometrika, 86(3), 677-690.
Huang, J. Z., Liu, N., Pourahmadi, M., & Liu, L. (2006). Covariance matrix selection and estimation via penalised normal likelihood. Biometrika, 93(1), 85-98.
Leng, C., & Li, B. (2011). Forward adaptive banding for estimating large covariance matrices. Biometrika, 98(4), 821-830.
Levina, E., Rothman, A., & Zhu, J. (2008). Sparse estimation of large covariance matrices via a nested Lasso penalty. The Annals of Applied Statistics, 2(1), 245-263.
Wu, W. B., & Pourahmadi, M. (2003). Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika, 90(4), 831-844.
The End