Hierarchical matrix approximation of large covariance matrices




A. Litvinenko¹, M. Genton², Ying Sun², R. Tempone¹
¹SRI-UQ Center and ²Spatio-Temporal Statistics & Data Analysis Group at KAUST
[email protected]

Abstract

We approximate large unstructured covariance matrices in the H-matrix format with log-linear computational cost and storage O(n log n). We compute the inverse, Cholesky decomposition, and determinant in H-format. As an example we consider the class of Matérn covariance functions, which are very popular in spatial statistics, geostatistics, machine learning, and image analysis. Applications include kriging and optimal design.

1. Matérn covariance

$$C(x, y) = C(|x - y|) = \sigma^2 \frac{1}{\Gamma(\nu)\, 2^{\nu-1}} \left(\frac{\sqrt{2\nu}\, r}{L}\right)^{\nu} K_\nu\!\left(\frac{\sqrt{2\nu}\, r}{L}\right),$$

where Γ is the gamma function, K_ν is the modified Bessel function of the second kind, r = |x − y|, and ν, L are non-negative parameters of the covariance.

Figure 1: Matérn covariance over r ∈ [−2, 2] for ν = 1, σ = 0.5 and l ∈ {0.5, 0.3, 0.2, 0.1} (left), and for ν ∈ {0.15, 0.3, 0.5, 1, 2, 30} (right).

As ν → ∞ [4],

$$C(r) = \sigma^2 \exp(-r^2/2L^2).$$

When ν = 0.5, the Matérn covariance is identical to the exponential covariance function.

$$C_{\nu=3/2}(r) = \left(1 + \frac{\sqrt{3}\, r}{L}\right) \exp\!\left(-\frac{\sqrt{3}\, r}{L}\right),$$

$$C_{\nu=5/2}(r) = \left(1 + \frac{\sqrt{5}\, r}{L} + \frac{5r^2}{3L^2}\right) \exp\!\left(-\frac{\sqrt{5}\, r}{L}\right).$$

Note: neither stationarity, C(x, y) = C(|x − y|), nor a tensor grid needs to be assumed.
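For concreteness, a minimal Python sketch (not from the poster) of evaluating the Matérn covariance above; SciPy's gamma and kv supply Γ and K_ν, and the function name and parameter defaults are illustrative.

```python
import numpy as np
from scipy.special import gamma, kv  # gamma function and modified Bessel K_nu

def matern(r, sigma=1.0, nu=0.5, L=0.1):
    """Matérn covariance C(r) for distances r >= 0 (vectorized)."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    out = np.full_like(r, sigma**2)   # C(0) = sigma^2 (K_nu itself diverges at 0)
    nz = r > 0
    scaled = np.sqrt(2.0 * nu) * r[nz] / L
    out[nz] = sigma**2 / (gamma(nu) * 2.0**(nu - 1.0)) * scaled**nu * kv(nu, scaled)
    return out

# Sanity check: nu = 0.5 reduces to the exponential covariance sigma^2 exp(-r/L).
r = np.array([0.0, 0.05, 0.1, 0.5])
assert np.allclose(matern(r, nu=0.5, L=0.1), np.exp(-r / 0.1))
```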

2. H-matrix approximation


Figure 2: Two approximation strategies [1]: fixed rank (left) and flexible rank (right) approximations, C ∈ R^{n×n}, n = 65².

Figure 3: Bisection of the index set I into I₁, I₂ and further into I₁₁, I₁₂, I₂₁, I₂₂ (cluster tree), and an admissible block t × s of H with bounding boxes Q_t, Q_s separated by dist(Q_t, Q_s).

1. Build the cluster tree T_I, I = {1, 2, ..., n}.
2. Build the block cluster tree T_{I×I}.
3. For each block (t × s) ∈ T_{I×I}, t, s ∈ T_I, check the admissibility condition
   min{diam(Q_t), diam(Q_s)} ≤ η · dist(Q_t, Q_s).
   If admissible, M|_{t×s} is approximated by a rank-k matrix block; otherwise M|_{t×s} is subdivided further, or stored as a dense block if it is small enough (see the sketch after this list).

Grid → cluster tree (T_I) + admissibility condition → block cluster tree (T_{I×I}) → H-matrix → H-matrix arithmetic.
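A schematic Python sketch of steps 1-3 for points on a line, under simplifying assumptions: the cluster tree is a balanced bisection of the index set, and admissible blocks are compressed by a truncated SVD (a production H-matrix code would use ACA or similar instead of forming the block first). The names `build_blocks` and `n_min` are illustrative.

```python
import numpy as np

def build_blocks(C, pts, t, s, eta=1.0, k=8, n_min=32, blocks=None):
    """Recursively partition the block t x s (index arrays) of C."""
    if blocks is None:
        blocks = []
    Qt, Qs = pts[t], pts[s]
    diam = min(Qt.max() - Qt.min(), Qs.max() - Qs.min())
    dist = max(0.0, max(Qt.min(), Qs.min()) - min(Qt.max(), Qs.max()))
    if diam <= eta * dist:                        # admissibility condition
        U, S, Vt = np.linalg.svd(C[np.ix_(t, s)], full_matrices=False)
        blocks.append((t, s, U[:, :k] * S[:k], Vt[:k]))   # rank-k factors
    elif len(t) <= n_min or len(s) <= n_min:      # small block: keep dense
        blocks.append((t, s, C[np.ix_(t, s)], None))
    else:                                         # subdivide both clusters
        for t2 in np.array_split(t, 2):
            for s2 in np.array_split(s, 2):
                build_blocks(C, pts, t2, s2, eta, k, n_min, blocks)
    return blocks

n = 512
pts = np.sort(np.random.default_rng(0).random(n))
C = np.exp(-np.abs(pts[:, None] - pts[None, :]) / 0.1)   # exponential covariance
blocks = build_blocks(C, pts, np.arange(n), np.arange(n))
```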

Operation         Sequential complexity          Parallel complexity
                  (Hackbusch et al. '99-'06)     (Kriemann '05)
storage(M)        N = O(kn log n)                N/P
Mx                N = O(kn log n)                N/P
M₁ ⊕ M₂           N = O(k²n log n)               N/P
M₁ ⊙ M₂, M⁻¹      N = O(k²n log² n)              N/P + O(n)
H-LU              N = O(k²n log² n)              N/P + O(k²n log² n / n^{1/d})

Table 1: Computational cost of H-matrix arithmetic, sequential and parallel (P = number of processors).

Let ε = ‖(C − C^H)z‖₂ / (‖C‖₂‖z‖₂), where z is a random vector.
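This quantity can be estimated cheaply when only matrix-vector products are available; a small sketch, where `apply_C` and `apply_CH` are hypothetical callables applying C and C^H:

```python
import numpy as np

def rel_error(apply_C, apply_CH, n, norm_C, seed=0):
    """Randomized estimate of ||(C - C^H) z|| / (||C|| ||z||)."""
    z = np.random.default_rng(seed).standard_normal(n)
    return np.linalg.norm(apply_C(z) - apply_CH(z)) / (norm_C * np.linalg.norm(z))
```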

n          rank k   size, MB (C / C̃)   t, sec. (C / C̃)   ε        max_{i=1..10} |λᵢ − λ̃ᵢ|, i   ε₂ for C̃
4.0·10³    10       48 / 3              0.8 / 0.08         7·10⁻³   7.0·10⁻², i = 9               2.0·10⁻⁴
1.05·10⁴   18       439 / 19            7.0 / 0.4          7·10⁻⁴   5.5·10⁻², i = 2               1.0·10⁻⁴
2.1·10⁴    25       2054 / 64           45.0 / 1.4         1·10⁻⁵   5.0·10⁻², i = 9               4.4·10⁻⁶

Table 2: Accuracy of the H-matrix approximation of the exponential covariance function, l₁ = l₃ = 0.1, l₂ = 0.5.

l₁      l₂      ε
0.01    0.02    3·10⁻²
0.1     0.2     8·10⁻³
0.5     1       2.8·10⁻⁵

Table 3: Dependence of the H-matrix accuracy on the covariance lengths l₁ and l₂, n = 129². The smaller the covariance lengths, the less accurate the H-matrix approximation.

Figure 4: Two realizations of a random field generated via Cholesky decomposition of a Matérn covariance matrix, ν = 0.4.
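A minimal dense sketch of Figure 4's construction (the poster does this in H-format): factor C = LLᵀ and set the field to Lz with z ∼ N(0, I). Here the exponential covariance (Matérn with ν = 0.5) stands in for ν = 0.4, and the small diagonal nugget is an assumption added to keep the factorization numerically stable.

```python
import numpy as np

n = 40                                            # 40 x 40 grid, so C is 1600 x 1600
x, y = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
pts = np.column_stack([x.ravel(), y.ravel()])
r = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)  # pairwise distances
C = np.exp(-r / 0.2) + 1e-10 * np.eye(n * n)      # exponential covariance + nugget
L = np.linalg.cholesky(C)                         # C = L L^T
field = (L @ np.random.default_rng(1).standard_normal(n * n)).reshape(n, n)
```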

3. Kullback-Leibler divergence

A measure of the information lost when the distribution Q is used to approximate P.

$$D_{KL}(P\|Q) = \sum_i P(i) \ln\frac{P(i)}{Q(i)}, \qquad D_{KL}(P\|Q) = \int_{-\infty}^{\infty} p(x) \ln\frac{p(x)}{q(x)}\, dx,$$

where p, q are the densities of P and Q. For multivariate normal distributions N₀ = N(μ₀, C) and N₁ = N(μ₁, C^H):

$$2 D_{KL}(N_0\|N_1) = \mathrm{tr}\!\left((C^{H})^{-1} C\right) + (\mu_1 - \mu_0)^T (C^{H})^{-1} (\mu_1 - \mu_0) - n - \ln\frac{\det C}{\det C^{H}}.$$

Figure 5: Relative H-matrix approximation error ‖C − C^H‖ (spectral and Frobenius norms) versus rank k, for covariance lengths L ∈ {0.1, 0.2, 0.5}, with ν = 0.5 (left) and ν = 1.5 (right).

k     KLD(C, C^H)           ‖C − C^H‖₂              ‖C(C^H)⁻¹ − I‖₂
      L=0.25    L=0.75      L=0.25    L=0.75        L=0.25    L=0.75
5     0.51      2.3         4.0e-2    0.1           4.8       63
6     0.34      1.6         9.4e-3    0.02          3.4       22
8     5.3e-2    0.4         1.9e-3    0.003         1.2       8
10    2.6e-3    0.2         7.7e-4    7.0e-4        6.0e-2    3.1
12    5.0e-4    2e-2        9.7e-5    5.6e-5        1.6e-2    0.5
15    1.0e-5    9e-4        2.0e-5    1.1e-5        8.0e-4    0.02
20    4.5e-7    4.8e-5      6.5e-7    2.8e-7        2.1e-5    1.2e-3
50    3.4e-13   5e-12       2.0e-13   2.4e-13       4e-11     2.7e-9

Table 4: Dependence of the KLD on the H-matrix rank k, Matérn covariance with L ∈ {0.25, 0.75} and ν = 0.5, domain G = [0, 1]²; ‖C(L=0.25)‖₂ = 212, ‖C(L=0.75)‖₂ = 568.

For ν = 1.5, the KLD and the inverse (C^H)⁻¹ are hard to compute numerically. The results in Table 4 (ν = 0.5) are better because the covariance matrix with ν = 0.5 has its smallest eigenvalues far enough from zero. The ν = 1.5 covariance is smoother: its eigenvalues decay faster, but its smallest eigenvalues come much closer to zero than in the ν = 0.5 case.

4. Other applications

4.1 Low-rank approximation for kriging and geostatistical optimal design

Let ŝ ∈ Rⁿ be the quantity to be estimated, C_ss its covariance matrix, and y ∈ Rᵐ the vector of measurements. The corresponding cross- and auto-covariance matrices are denoted by C_sy and C_yy, of sizes n × m and m × m, respectively.

Kriging estimate: ŝ = C_sy C_yy⁻¹ y.

The estimation variance σ̂ is the diagonal of the conditional covariance matrix C_ss|y:

$$\hat\sigma_s = \mathrm{diag}(C_{ss|y}) = \mathrm{diag}\!\left(C_{ss} - C_{sy} C_{yy}^{-1} C_{ys}\right).$$

Geostatistical optimal design criteria:

$$\phi_A = n^{-1}\,\mathrm{trace}\!\left[C_{ss|y}\right], \qquad \phi_C = c^T \left(C_{ss} - C_{sy} C_{yy}^{-1} C_{ys}\right) c,$$

where c is a vector.
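A dense-algebra sketch of these formulas (the poster's point is that C_yy⁻¹ and the trace are applied in H-format; here a plain Cholesky solve stands in, and all names are illustrative):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def krige(C_ss, C_sy, C_yy, y, c=None):
    """Kriging estimate, estimation variance, and design criteria phi_A, phi_C."""
    F = cho_factor(C_yy)
    s_hat = C_sy @ cho_solve(F, y)                 # kriging estimate
    C_cond = C_ss - C_sy @ cho_solve(F, C_sy.T)    # conditional covariance C_ss|y
    sigma_hat = np.diag(C_cond)                    # estimation variance
    phi_A = np.trace(C_cond) / C_ss.shape[0]       # A-criterion
    phi_C = None if c is None else c @ C_cond @ c  # C-criterion
    return s_hat, sigma_hat, phi_A, phi_C
```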

4.2 Weather forecast in Europe

Figure 6: European weather stations (≈ 2500). Collected data set M ∈ R^{2500×365}.

Figure 7: True temperature data and its low-rank approximation (rank-50 approximation of the matrix M) at one station; relative error 25%.
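The rank-50 approximation in Figure 7 amounts to a truncated SVD of the station-by-day data matrix; a sketch with a random stand-in for M, since the real station data is not part of the poster:

```python
import numpy as np

M = np.random.default_rng(2).standard_normal((2500, 365))  # stand-in for the data
U, S, Vt = np.linalg.svd(M, full_matrices=False)
k = 50
M_k = (U[:, :k] * S[:k]) @ Vt[:k]      # best rank-k approximation (Frobenius norm)
rel_err = np.linalg.norm(M - M_k) / np.linalg.norm(M)
```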

5. Open questions

1. Compute the whole spectrum of a large covariance matrix.

2. Compute the KLD for large matrices (det Σ?).

3. How sensitive is the KLD to the H-matrix accuracy?

4. Derive/estimate the KLD for non-Gaussian distributions.

Acknowledgements

A. Litvinenko is a member of the KAUST SRI UQ Center.

References

1. B. N. Khoromskij, A. Litvinenko, H. G. Matthies, Application of hierarchical matrices for computing the Karhunen-Loève expansion, Computing, Vol. 84, Issue 1-2, pp. 49-67, 2008.

2. R. Furrer, M. Genton, D. Nychka, Covariance tapering for interpolation of large spatial datasets, J. Comput. Graph. Stat., Vol. 15, No. 3, pp. 502-523, 2006.

3. M. Stein, Limitations on low rank approximations for covariance matrices of spatial data, Spatial Statistics, 2013.

4. J. Castrillón-Candás, M. Genton, R. Yokota, Multi-Level Restricted Maximum Likelihood Covariance Estimation and Kriging for Large Non-Gridded Datasets, 2014.