Hierarchical matrix approximation of large covariance matrices




A. Litvinenko¹, M. Genton², Ying Sun², R. Tempone¹
¹SRI-UQ Center and ²Spatio-Temporal Statistics & Data Analysis Group at KAUST
[email protected]

Abstract

We approximate large unstructured covariance matrices in the H-matrix format with log-linear computational cost and storage O(n log n). We compute the inverse, Cholesky decomposition, and determinant in H-format. As an example we consider the class of Matérn covariance functions, which are very popular in spatial statistics, geostatistics, machine learning, and image analysis. Applications include kriging and optimal design.

1. Matérn covariance

$$C(x, y) = C(|x - y|) = \sigma^2 \frac{1}{\Gamma(\nu)\, 2^{\nu-1}} \left(\frac{\sqrt{2\nu}\, r}{L}\right)^{\nu} K_\nu\!\left(\frac{\sqrt{2\nu}\, r}{L}\right),$$

where Γ is the gamma function, K_ν is the modified Bessel function of the second kind, r = |x − y|, and ν, L are non-negative parameters of the covariance.

Figure 1: Matérn covariance over r ∈ [−2, 2] for ν = 1, σ = 0.5 and l ∈ {0.5, 0.3, 0.2, 0.1} (left), and for ν ∈ {0.15, 0.3, 0.5, 1, 2, 30} (right).

As ν → ∞ [4],

$$C(r) = \sigma^2 \exp(-r^2/2L^2).$$

When ν = 0.5, the Matérn covariance is identical to the exponential covariance function.

$$C_{\nu=3/2}(r) = \left(1 + \frac{\sqrt{3}\, r}{L}\right) \exp\!\left(-\frac{\sqrt{3}\, r}{L}\right),$$

$$C_{\nu=5/2}(r) = \left(1 + \frac{\sqrt{5}\, r}{L} + \frac{5r^2}{3L^2}\right) \exp\!\left(-\frac{\sqrt{5}\, r}{L}\right).$$

Note: neither stationarity, C(x, y) = C(|x − y|), nor a tensor grid needs to be assumed.
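For concreteness, a minimal Python sketch (not from the poster) of evaluating the Matérn covariance above; SciPy's gamma and kv supply Γ and K_ν, and the function name and parameter defaults are illustrative.

```python
import numpy as np
from scipy.special import gamma, kv  # gamma function and modified Bessel K_nu

def matern(r, sigma=1.0, nu=0.5, L=0.1):
    """Matérn covariance C(r) for distances r >= 0 (vectorized)."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    out = np.full_like(r, sigma**2)   # C(0) = sigma^2 (K_nu itself diverges at 0)
    nz = r > 0
    scaled = np.sqrt(2.0 * nu) * r[nz] / L
    out[nz] = sigma**2 / (gamma(nu) * 2.0**(nu - 1.0)) * scaled**nu * kv(nu, scaled)
    return out

# Sanity check: nu = 0.5 reduces to the exponential covariance sigma^2 exp(-r/L).
r = np.array([0.0, 0.05, 0.1, 0.5])
assert np.allclose(matern(r, nu=0.5, L=0.1), np.exp(-r / 0.1))
```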

2. H-matrix approximation


Figure 2: Two approximation strategies [1]: fixed rank (left) and flexible rank (right) approximations, C ∈ R^{n×n}, n = 65².

Figure 3: Bisection of the index set I into I₁, I₂ and further into I₁₁, I₁₂, I₂₁, I₂₂ (cluster tree), and an admissible block t × s of H with bounding boxes Q_t, Q_s separated by dist(Q_t, Q_s).

1. Build the cluster tree T_I, I = {1, 2, ..., n}.
2. Build the block cluster tree T_{I×I}.
3. For each block (t × s) ∈ T_{I×I}, t, s ∈ T_I, check the admissibility condition
   min{diam(Q_t), diam(Q_s)} ≤ η · dist(Q_t, Q_s).
   If admissible, M|_{t×s} is approximated by a rank-k matrix block; otherwise M|_{t×s} is subdivided further, or stored as a dense block if it is small enough (see the sketch after this list).

Grid → cluster tree (T_I) + admissibility condition → block cluster tree (T_{I×I}) → H-matrix → H-matrix arithmetic.
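A schematic Python sketch of steps 1-3 for points on a line, under simplifying assumptions: the cluster tree is a balanced bisection of the index set, and admissible blocks are compressed by a truncated SVD (a production H-matrix code would use ACA or similar instead of forming the block first). The names `build_blocks` and `n_min` are illustrative.

```python
import numpy as np

def build_blocks(C, pts, t, s, eta=1.0, k=8, n_min=32, blocks=None):
    """Recursively partition the block t x s (index arrays) of C."""
    if blocks is None:
        blocks = []
    Qt, Qs = pts[t], pts[s]
    diam = min(Qt.max() - Qt.min(), Qs.max() - Qs.min())
    dist = max(0.0, max(Qt.min(), Qs.min()) - min(Qt.max(), Qs.max()))
    if diam <= eta * dist:                        # admissibility condition
        U, S, Vt = np.linalg.svd(C[np.ix_(t, s)], full_matrices=False)
        blocks.append((t, s, U[:, :k] * S[:k], Vt[:k]))   # rank-k factors
    elif len(t) <= n_min or len(s) <= n_min:      # small block: keep dense
        blocks.append((t, s, C[np.ix_(t, s)], None))
    else:                                         # subdivide both clusters
        for t2 in np.array_split(t, 2):
            for s2 in np.array_split(s, 2):
                build_blocks(C, pts, t2, s2, eta, k, n_min, blocks)
    return blocks

n = 512
pts = np.sort(np.random.default_rng(0).random(n))
C = np.exp(-np.abs(pts[:, None] - pts[None, :]) / 0.1)   # exponential covariance
blocks = build_blocks(C, pts, np.arange(n), np.arange(n))
```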

Operation         Sequential complexity          Parallel complexity
                  (Hackbusch et al. '99-'06)     (Kriemann '05)
storage(M)        N = O(kn log n)                N/P
Mx                N = O(kn log n)                N/P
M₁ ⊕ M₂           N = O(k²n log n)               N/P
M₁ ⊙ M₂, M⁻¹      N = O(k²n log² n)              N/P + O(n)
H-LU              N = O(k²n log² n)              N/P + O(k²n log² n / n^{1/d})

Table 1: Computational cost of H-matrix arithmetic, sequential and parallel (P = number of processors).

Let ε = ‖(C − C^H)z‖₂ / (‖C‖₂‖z‖₂), where z is a random vector.
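This quantity can be estimated cheaply when only matrix-vector products are available; a small sketch, where `apply_C` and `apply_CH` are hypothetical callables applying C and C^H:

```python
import numpy as np

def rel_error(apply_C, apply_CH, n, norm_C, seed=0):
    """Randomized estimate of ||(C - C^H) z|| / (||C|| ||z||)."""
    z = np.random.default_rng(seed).standard_normal(n)
    return np.linalg.norm(apply_C(z) - apply_CH(z)) / (norm_C * np.linalg.norm(z))
```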

n          rank k   size, MB (C / C̃)   t, sec. (C / C̃)   ε        max_{i=1..10} |λᵢ − λ̃ᵢ|, i   ε₂ for C̃
4.0·10³    10       48 / 3              0.8 / 0.08         7·10⁻³   7.0·10⁻², i = 9               2.0·10⁻⁴
1.05·10⁴   18       439 / 19            7.0 / 0.4          7·10⁻⁴   5.5·10⁻², i = 2               1.0·10⁻⁴
2.1·10⁴    25       2054 / 64           45.0 / 1.4         1·10⁻⁵   5.0·10⁻², i = 9               4.4·10⁻⁶

Table 2: Accuracy of the H-matrix approximation of the exponential covariance function, l₁ = l₃ = 0.1, l₂ = 0.5.

l₁      l₂      ε
0.01    0.02    3·10⁻²
0.1     0.2     8·10⁻³
0.5     1       2.8·10⁻⁵

Table 3: Dependence of the H-matrix accuracy on the covariance lengths l₁ and l₂, n = 129². The smaller the covariance lengths, the less accurate the H-matrix approximation.

Figure 4: Two realizations of a random field generated via Cholesky decomposition of a Matérn covariance matrix, ν = 0.4.
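A minimal dense sketch of Figure 4's construction (the poster does this in H-format): factor C = LLᵀ and set the field to Lz with z ∼ N(0, I). Here the exponential covariance (Matérn with ν = 0.5) stands in for ν = 0.4, and the small diagonal nugget is an assumption added to keep the factorization numerically stable.

```python
import numpy as np

n = 40                                            # 40 x 40 grid, so C is 1600 x 1600
x, y = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
pts = np.column_stack([x.ravel(), y.ravel()])
r = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)  # pairwise distances
C = np.exp(-r / 0.2) + 1e-10 * np.eye(n * n)      # exponential covariance + nugget
L = np.linalg.cholesky(C)                         # C = L L^T
field = (L @ np.random.default_rng(1).standard_normal(n * n)).reshape(n, n)
```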

3. Kullback-Leibler divergence

A measure of the information lost when the distribution Q is used to approximate P.

$$D_{KL}(P\|Q) = \sum_i P(i) \ln\frac{P(i)}{Q(i)}, \qquad D_{KL}(P\|Q) = \int_{-\infty}^{\infty} p(x) \ln\frac{p(x)}{q(x)}\, dx,$$

where p, q are the densities of P and Q. For multivariate normal distributions N₀ = N(μ₀, C) and N₁ = N(μ₁, C^H):

$$2 D_{KL}(N_0\|N_1) = \mathrm{tr}\!\left((C^{H})^{-1} C\right) + (\mu_1 - \mu_0)^T (C^{H})^{-1} (\mu_1 - \mu_0) - n - \ln\frac{\det C}{\det C^{H}}.$$

Figure 5: Relative H-matrix approximation error ‖C − C^H‖ (spectral and Frobenius norms) versus rank k, for covariance lengths L ∈ {0.1, 0.2, 0.5}, with ν = 0.5 (left) and ν = 1.5 (right).

k     KLD(C, C^H)           ‖C − C^H‖₂              ‖C(C^H)⁻¹ − I‖₂
      L=0.25    L=0.75      L=0.25    L=0.75        L=0.25    L=0.75
5     0.51      2.3         4.0e-2    0.1           4.8       63
6     0.34      1.6         9.4e-3    0.02          3.4       22
8     5.3e-2    0.4         1.9e-3    0.003         1.2       8
10    2.6e-3    0.2         7.7e-4    7.0e-4        6.0e-2    3.1
12    5.0e-4    2e-2        9.7e-5    5.6e-5        1.6e-2    0.5
15    1.0e-5    9e-4        2.0e-5    1.1e-5        8.0e-4    0.02
20    4.5e-7    4.8e-5      6.5e-7    2.8e-7        2.1e-5    1.2e-3
50    3.4e-13   5e-12       2.0e-13   2.4e-13       4e-11     2.7e-9

Table 4: Dependence of the KLD on the H-matrix rank k, Matérn covariance with L ∈ {0.25, 0.75} and ν = 0.5, domain G = [0, 1]²; ‖C(L=0.25)‖₂ = 212, ‖C(L=0.75)‖₂ = 568.

For ν = 1.5, the KLD and the inverse (C^H)⁻¹ are hard to compute numerically. The results in Table 4 (ν = 0.5) are better because the covariance matrix with ν = 0.5 has its smallest eigenvalues far enough from zero. The ν = 1.5 covariance is smoother: its eigenvalues decay faster, but its smallest eigenvalues come much closer to zero than in the ν = 0.5 case.

4. Other applications

4.1 Low-rank approximation for kriging and geostatistical optimal design

Let ŝ ∈ Rⁿ be the quantity to be estimated, C_ss its covariance matrix, and y ∈ Rᵐ the vector of measurements. The corresponding cross- and auto-covariance matrices are denoted by C_sy and C_yy, of sizes n × m and m × m, respectively.

Kriging estimate: ŝ = C_sy C_yy⁻¹ y.

The estimation variance σ̂ is the diagonal of the conditional covariance matrix C_ss|y:

$$\hat\sigma_s = \mathrm{diag}(C_{ss|y}) = \mathrm{diag}\!\left(C_{ss} - C_{sy} C_{yy}^{-1} C_{ys}\right).$$

Geostatistical optimal design criteria:

$$\phi_A = n^{-1}\,\mathrm{trace}\!\left[C_{ss|y}\right], \qquad \phi_C = c^T \left(C_{ss} - C_{sy} C_{yy}^{-1} C_{ys}\right) c,$$

where c is a vector.
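A dense-algebra sketch of these formulas (the poster's point is that C_yy⁻¹ and the trace are applied in H-format; here a plain Cholesky solve stands in, and all names are illustrative):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def krige(C_ss, C_sy, C_yy, y, c=None):
    """Kriging estimate, estimation variance, and design criteria phi_A, phi_C."""
    F = cho_factor(C_yy)
    s_hat = C_sy @ cho_solve(F, y)                 # kriging estimate
    C_cond = C_ss - C_sy @ cho_solve(F, C_sy.T)    # conditional covariance C_ss|y
    sigma_hat = np.diag(C_cond)                    # estimation variance
    phi_A = np.trace(C_cond) / C_ss.shape[0]       # A-criterion
    phi_C = None if c is None else c @ C_cond @ c  # C-criterion
    return s_hat, sigma_hat, phi_A, phi_C
```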

4.2 Weather forecast in Europe

Figure 6: European weather stations (≈ 2500). Collected data set M ∈ R^{2500×365}.

Figure 7: True temperature data and its low-rank approximation (rank-50 approximation of the matrix M) at one station; relative error 25%.
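The rank-50 approximation in Figure 7 amounts to a truncated SVD of the station-by-day data matrix; a sketch with a random stand-in for M, since the real station data is not part of the poster:

```python
import numpy as np

M = np.random.default_rng(2).standard_normal((2500, 365))  # stand-in for the data
U, S, Vt = np.linalg.svd(M, full_matrices=False)
k = 50
M_k = (U[:, :k] * S[:k]) @ Vt[:k]      # best rank-k approximation (Frobenius norm)
rel_err = np.linalg.norm(M - M_k) / np.linalg.norm(M)
```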

5. Open questions

1. Compute the whole spectrum of a large covariance matrix.

2. Compute the KLD for large matrices (det Σ?).

3. How sensitive is the KLD to the H-matrix accuracy?

4. Derive/estimate the KLD for non-Gaussian distributions.

Acknowledgements

A. Litvinenko is a member of the KAUST SRI UQ Center.

References

1. B. N. Khoromskij, A. Litvinenko, H. G. Matthies, Application of hierarchical matrices for computing the Karhunen-Loève expansion, Computing, Vol. 84, Issue 1-2, pp. 49-67, 2008.

2. R. Furrer, M. Genton, D. Nychka, Covariance tapering for interpolation of large spatial datasets, J. Comput. Graph. Stat., Vol. 15, No. 3, pp. 502-523, 2006.

3. M. Stein, Limitations on low rank approximations for covariance matrices of spatial data, Spatial Statistics, 2013.

4. J. Castrillón-Candás, M. Genton, R. Yokota, Multi-Level Restricted Maximum Likelihood Covariance Estimation and Kriging for Large Non-Gridded Datasets, 2014.