center for uncertainty quantification logo lock-up seminars... · numerical aerodynamics center for...

Fast and cheap approximation of large covariance matrices by the hierarchical matrix technique

Alexander Litvinenko

Extreme Computing Research Center and Uncertainty Quantification Center, KAUST

http://sri-uq.kaust.edu.sa/

The structure of the talk

1. A short bio sketch

2. Hierarchical matrices, domain decomposition, low-rank tensors

3. Stochastic PDEs and UQ

4. UQ Examples

5. Matern Covariance and Green functions





The structure of the talk

Main steps in my career





Points of my study and of my work.





My experience 08.2002-now





Bachelor and Master, Sobolev Institute of Math., Novosibirsk





PhD 2002-2006, MPI for Mathematics in the Sciences, Leipzig

After the defense, Prof. W. Hackbusch and the group, MPI for Mathematics in the Sciences, Leipzig





CV

What did I do during my PhD?





Problem setup

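The elliptic boundary value problem: find $u \in H^1(\Omega)$ such that

$$\sum_{1 \le i,j \le 2} \frac{\partial}{\partial x_i}\, \alpha_{i,j}(x)\, \frac{\partial}{\partial x_j} u = f \ \text{in } \Omega, \qquad u = g \ \text{on } \partial\Omega, \tag{1}$$

where $\alpha_{i,j} \in L^\infty(\Omega)$ such that $A(x) = (\alpha_{i,j})_{i,j=1,2}$ satisfies $0 < \underline{\lambda} \le \lambda_{\min}(A(x)) \le \lambda_{\max}(A(x)) \le \overline{\lambda}$ for all $x \in \Omega$.

⇒ Oscillatory or jumping coefficients are allowed.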





Motivation and goals

E.g. a) compute the solution on $\gamma$, b) compute the solution in a subdomain $\nu$, c) compute the solution on the interface, d) let $A_h x_h = b_h$ and $h \ll H$; one is interested only in $x_H = R_{H \leftarrow h} A_h^{-1} b_h$ or $x_H = R_{H \leftarrow h} A_h^{-1} P b_H$.





The idea of HDD

Apply Galerkin FE discretisation to (1). Construct the discrete solution in the form

$$u_h = F_h f_h + G_h g_h, \tag{2}$$

where $F_h$, $G_h$ are two solution operators, $f_h$ is the FE right-hand side and $g_h$ contains the FE Dirichlet boundary values.

Often only a few functionals of the solution are of interest!
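The splitting (2) can be illustrated on a 1D model problem. The sketch below is not the HDD algorithm itself: it assumes the problem $-u'' = f$ on $(0,1)$ with a finite-difference matrix as a stand-in for the FE discretisation, and it forms the explicit inverse only for illustration. All variable names are hypothetical.

```python
import numpy as np

n = 9                      # interior grid points
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)

# Interior operator for -u'' and its coupling to the two boundary values.
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
B = np.zeros((n, 2))
B[0, 0] = -1.0 / h**2
B[-1, 1] = -1.0 / h**2

F = np.linalg.inv(A)       # maps the right-hand side to the solution
G = -F @ B                 # maps the boundary data to the solution

f = 2.0 * np.ones(n)
g = np.array([1.0, 2.0])   # u(0) = 1, u(1) = 2
u = F @ f + G @ g

# A single functional of the solution, here the mean value over the grid.
mean_u = np.full(n, 1.0 / n) @ u
```

For this data the exact solution is $u(x) = x(1-x) + 1 + x$, which the second-order difference scheme reproduces at the grid points.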





IDEA: Leaves to Root and Root to Leaves algorithms





HDD with truncation of the small scales

[Figure: domain $\Omega$ with fine scale $h$ and coarse scale $H$, triangulations $T_H$, $T_h$, $T_r$, a subdomain $\nu$ with repeated cells, and the mean value.]

Domain decomposition tree $T_{T_h}$.

Application: multiscale problems (e.g. the skin problem, porous medium).

Use the microscopic model to extract all microscale details and then compute the macroscale behaviour.





[Figure: six H-matrix block-structure panels labelled A to F, with the local rank printed in each block.]

A, B: H-matrix approximations for $\Psi^g_{\omega_1}$ and $\Psi^g_{\omega_2}$; A and B have different block structures. C is an extension of A, D is an extension of B, where C and D have the same block structure. F = C + D, and E is the block 11 of F after the Schur complement.





CV

What did I do as a postdoc / scientific staff member?





PostDoc 2007-2013, TU Braunschweig

Work period 2007-2013, Prof. H.G. Matthies, Scientific computing group, TU Braunschweig





Numerical solution of stochastic PDEs

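$$-\nabla \cdot (\kappa(x,\omega) \nabla u(x,\omega)) = f(x,\omega), \quad x \in G \subset \mathbb{R}^d, \ \omega \in \Omega.$$

Let $S = L_2(\Omega)$, $U = L_2(G)$; one is looking for $u(x,\omega) \in U \otimes S$.

Further decomposition $L_2(\Omega) = L_2(\times_m \Omega_m) \cong \bigotimes_m L_2(\Omega_m) \cong \bigotimes_m L_2(\mathbb{R}, \Gamma_m)$ results in

$$u(x,\omega) \in U \otimes S = L_2(G) \otimes \bigotimes_m L_2(\mathbb{R}, \Gamma_m), \quad m = 1..M.$$

IDEA: reduce the cost from $O(n^M)$ to $O(Mn)$.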
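The cost reduction can be made concrete with a storage count. The sketch below assumes a rank-$r$ canonical (CP) representation, which is only one of several low-rank tensor formats and is not necessarily the one used in the talk. The helper names are hypothetical.

```python
import numpy as np

def full_storage(n, M):
    """Number of entries of a full tensor with M modes of size n."""
    return n ** M

def cp_storage(n, M, r):
    """Number of entries of a rank-r canonical (CP) representation."""
    return r * M * n

def cp_entry(factors, index):
    """Evaluate one tensor entry from CP factors without forming the tensor.

    factors[m] has shape (n, r); index is a tuple of M integers.
    """
    prod = np.ones(factors[0].shape[1])
    for U, i in zip(factors, index):
        prod = prod * U[i, :]
    return prod.sum()

rng = np.random.default_rng(0)
n, M, r = 4, 3, 2
factors = [rng.standard_normal((n, r)) for _ in range(M)]
# Reference for this tiny case: assemble the full tensor explicitly.
full = np.einsum("ir,jr,kr->ijk", *factors)
```

For example, `full_storage(10, 20)` is $10^{20}$ entries, while `cp_storage(10, 20, 5)` is 1000.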





Application: higher order moments

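Why tensors?

The 3rd moment of $u = \sum_{\alpha \in J} u_\alpha H_\alpha$, where $J$ is a multi-index set, is

$$M^{(3)}_u = \mathbb{E}\left( \sum_{\alpha \in J} \sum_{\beta \in J} \sum_{\gamma \in J} u_\alpha \otimes u_\beta \otimes u_\gamma \, H_\alpha H_\beta H_\gamma \right),$$

where $u_\alpha = K^{-1} f_\alpha$ and the multi-indices $\alpha, \beta, \gamma \in J$, e.g. $\alpha = (\alpha_1, ..., \alpha_M)$.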





Why tensors ? Stochastic Galerkin matrix is...

[see Keese 05, Zander 12, sglib 2006-2015]




Numerical Experiments

A few nice pictures of uncertainties from numerical aerodynamics





Density, mean and variance

The mean density and variance of the density. Case 9, RAE-2822.




Example 3:

5% and 95% quantiles for $c_p$ from 500 MC realisations.





Example 4:

5% and 95% quantiles for $c_f$ from 500 MC realisations.





Relative error, density variance, transonic flow, Z = 2600





Research Scientist 2013-now, KAUST

Research Scientist 2013-now, Prof. R. Tempone and D. Keyes,

Uncertainty Quantification (UQ) and Extreme Computing Research Center, KAUST





My interests and cooperations





Karhunen-Loeve expansion (KLE)

The spectral representation of the covariance function is

$$C_\kappa(x,y) = \sum_{i=0}^{\infty} \lambda_i k_i(x) k_i(y),$$

where $\lambda_i$ and $k_i(x)$ are the eigenvalues and eigenfunctions.

KLE [Loeve, 1977] is the series

$$\kappa(x,\omega) = \mu_k(x) + \sum_{i=1}^{\infty} \sqrt{\lambda_i}\, k_i(x)\, \xi_i(\omega),$$

where $\xi_i(\omega)$ are uncorrelated random variables and $k_i$ are basis functions in $L_2(D)$.

The eigenpairs $\lambda_i$, $k_i$ are the solution of

$$T k_i = \lambda_i k_i, \quad k_i \in L_2(D), \ i \in \mathbb{N},$$

where

$$T : L_2(D) \to L_2(D), \quad (Tu)(x) := \int_D \mathrm{cov}_k(x,y)\, u(y)\, dy.$$
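A discrete version of the eigenproblem and of the truncated series can be sketched as follows. This is a minimal illustration, assuming a 1D domain $D = [0,1]$, an exponential covariance with length 0.3, a uniform quadrature weight, and a zero mean; none of these choices comes from the talk.

```python
import numpy as np

n = 200
x = np.linspace(0.0, 1.0, n)
w = 1.0 / n                                          # uniform quadrature weight (assumption)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)   # exponential covariance (assumption)

# Nystrom-type discretisation of T k = lambda k: (C * w) k = lambda k.
lam, K = np.linalg.eigh(C * w)
lam, K = lam[::-1], K[:, ::-1]                       # decreasing eigenvalue order
K = K / np.sqrt(w)                                   # so that sum(w * k_i**2) = 1

m = 20                                               # truncation length
C_m = (K[:, :m] * lam[:m]) @ K[:, :m].T              # truncated spectral representation

rng = np.random.default_rng(1)
xi = rng.standard_normal(m)                          # uncorrelated random variables
kappa = K[:, :m] @ (np.sqrt(lam[:m]) * xi)           # one realisation with zero mean

rel_err = np.linalg.norm(C - C_m, 2) / np.linalg.norm(C, 2)
```

With this normalisation the relative spectral-norm error of the truncation equals `lam[m] / lam[0]`, the ratio of the first neglected eigenvalue to the largest one.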





H - Matrices

Dependence of the computational time and storage requirements of $C_H$ on the rank $k$, $n = 1089$.

k     time (sec.)   memory (MB)   ‖C − C_H‖_2 / ‖C‖_2
2     0.04          2e+6          3.5e-5
6     0.1           4e+6          1.4e-5
9     0.14          5.4e+6        1.4e-5
12    0.17          6.8e+6        3.1e-7
17    0.23          9.3e+6        6.3e-8

The time for the dense matrix $C$ is 3.3 sec. and the storage is 1.4e+8 MB.
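The reason a small rank $k$ suffices can be seen on a single off-diagonal block. The sketch below is not an H-matrix implementation: it only takes one block of an assumed exponential covariance on a 33 x 33 grid (1089 points), for two well-separated clusters, and truncates its SVD. The cluster choice and the covariance length 0.25 are assumptions.

```python
import numpy as np

m = 33                                        # 33 x 33 grid, 1089 points
g = np.linspace(0.0, 1.0, m)
X = np.array([(a, b) for a in g for b in g])

# Two well-separated clusters of grid points (opposite corners of the domain).
left = X[(X[:, 0] <= 0.25) & (X[:, 1] <= 0.25)]
right = X[(X[:, 0] >= 0.75) & (X[:, 1] >= 0.75)]

def cov_block(P, Q, ell=0.25):
    """Exponential covariance exp(-|x - y| / ell) between point sets P and Q."""
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)
    return np.exp(-d / ell)

B = cov_block(left, right)
U, s, Vt = np.linalg.svd(B, full_matrices=False)

def truncate(k):
    """Best rank-k approximation of the block in the spectral norm."""
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

k = 10
rel_err = np.linalg.norm(B - truncate(k), 2) / s[0]
dense_entries = B.size                        # storage of the dense block
lowrank_entries = k * (B.shape[0] + B.shape[1])   # storage of the two factors
```

An H-matrix applies this kind of truncation to every admissible block of a hierarchical partition and keeps only the small near-diagonal blocks dense.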





Sparse tensor decompositions of kernels cov(x , y) = cov(x − y)

We want to approximate $C \in \mathbb{R}^{N \times N}$, $N = n^d$, by $C_r = \sum_{k=1}^{r} V_k^1 \otimes ... \otimes V_k^d$ such that $\|C - C_r\| \le \varepsilon$.

The storage of $C$ is $O(N^2) = O(n^{2d})$ and the storage of $C_r$ is $O(r d n^2)$.

To define $V_k^i$ use e.g. SVD.

Approximate all $V_k^i$ in the H-matrix format ⇒ HKT format.

See basic arithmetics in [Hackbusch, Khoromskij, Tyrtyshnikov].

Assume $f(x,y)$, $x = (x_1, x_2)$, $y = (y_1, y_2)$; then the equivalent approximation problem is

$$f(x_1, x_2; y_1, y_2) \approx \sum_{k=1}^{r} \Phi_k(x_1, y_1) \Psi_k(x_2, y_2).$$
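For $d = 2$ the factors can be obtained from an SVD of a rearranged matrix. The sketch below assumes the rearrangement of Van Loan and Pitsianis as one possible way to "use e.g. SVD"; the talk does not say which construction was used. The function names, the grid size, and the row-major point ordering are assumptions.

```python
import numpy as np

def kron_terms(C, n, r):
    """Rank-r Kronecker approximation of an (n*n) x (n*n) matrix.

    After reordering the entries, a sum of Kronecker products becomes a
    low-rank matrix, so a truncated SVD yields the factors.
    """
    R = C.reshape(n, n, n, n).transpose(0, 2, 1, 3).reshape(n * n, n * n)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    A = [np.sqrt(s[k]) * U[:, k].reshape(n, n) for k in range(r)]
    B = [np.sqrt(s[k]) * Vt[k, :].reshape(n, n) for k in range(r)]
    return A, B

n = 8
g = np.linspace(0.0, 1.0, n)
X = np.array([(a, b) for a in g for b in g])       # point index = i1 * n + i2
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
C = np.exp(-dist)                                  # exponential kernel

def kron_error(r):
    """Relative Frobenius-norm error of the rank-r Kronecker approximation."""
    A, B = kron_terms(C, n, r)
    C_r = sum(np.kron(a, b) for a, b in zip(A, B))
    return np.linalg.norm(C - C_r) / np.linalg.norm(C)

errors = [kron_error(r) for r in (1, 2, 4)]
```

Each factor is an $n \times n$ matrix, so the $r$ terms need $2 r n^2$ entries instead of $n^4$.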





Numerical examples of tensor approximations

The Gaussian kernel

$$e^{-|x-y|^2} = e^{-\left(\sqrt{\sum_{\ell=1}^{d} (x_\ell - y_\ell)^2}\right)^2} = e^{-|x_1 - y_1|^2} \cdot e^{-|x_2 - y_2|^2}$$

has the Kronecker rank 1.

The exponential kernel $e^{-|x-y|}$ can be approximated by a tensor with low Kronecker rank, i.e.

$$e^{-|x-y|} \approx \sum_{\ell=1}^{r} \varphi_\ell(|x_1 - y_1|)\, \psi_\ell(|x_2 - y_2|).$$

r                       1      2      3      4      5       6       10
‖C − C_r‖_∞ / ‖C‖_∞     11.5   1.7    0.4    0.14   0.035   0.007   2.8e-8
‖C − C_r‖_2 / ‖C‖_2     6.7    0.52   0.1    0.03   0.008   0.001   5.3e-9

Very moderate tensor ranks for $e^{-|x-y|^{\nu}}$.
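The rank-1 claim for the Gaussian kernel is easy to verify numerically. The sketch below assumes a small tensor grid on $[0,1]^2$ with points ordered row by row; the grid size is arbitrary.

```python
import numpy as np

n = 6
g = np.linspace(0.0, 1.0, n)
# 1D Gaussian kernel matrix on the grid.
G1 = np.exp(-(g[:, None] - g[None, :]) ** 2)

# 2D Gaussian kernel on the tensor grid, point index = i1 * n + i2.
X = np.array([(a, b) for a in g for b in g])
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
G2 = np.exp(-sq)

# The 2D kernel matrix is exactly one Kronecker product of 1D matrices.
is_kron = np.allclose(G2, np.kron(G1, G1))
```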





Examples of realizations of random fields

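To generate a realization $\kappa(x, \theta^*)$ of a random field $\kappa(x, \theta)$, one needs to:

1. compute the factorization $C = LL^T$,

2. generate a realization $\xi(\theta^*)$ of a random vector $\xi(\theta)$,

3. compute the matrix-vector product $L \cdot \xi(\theta^*)$.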
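The three steps translate directly into a few lines of NumPy. The sketch below uses a dense Cholesky factor and an assumed exponential covariance on a 1D grid; in the setting of the talk the factor would be computed in the H-matrix format instead.

```python
import numpy as np

n = 100
x = np.linspace(0.0, 1.0, n)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.2)   # assumed exponential covariance

L = np.linalg.cholesky(C)              # step 1: C = L L^T
rng = np.random.default_rng(42)
xi = rng.standard_normal(n)            # step 2: realisation of the random vector
kappa = L @ xi                         # step 3: matrix-vector product

# Empirical check on many samples: the sample covariance approaches C.
samples = L @ rng.standard_normal((n, 20000))
C_emp = samples @ samples.T / samples.shape[1]
```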





Kullback-Leibler divergence (KLD)

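$D_{KL}(P\|Q)$ is a measure of the information lost when the distribution $Q$ is used to approximate $P$:

$$D_{KL}(P\|Q) = \sum_i P(i) \ln \frac{P(i)}{Q(i)}, \qquad D_{KL}(P\|Q) = \int_{-\infty}^{\infty} p(x) \ln \frac{p(x)}{q(x)}\, dx,$$

where $p$, $q$ are the densities of $P$ and $Q$. For multivariate normal distributions $(\mu_0, \Sigma_0)$ and $(\mu_1, \Sigma_1)$

$$2 D_{KL}(N_0\|N_1) = \mathrm{tr}(\Sigma_1^{-1} \Sigma_0) + (\mu_1 - \mu_0)^T \Sigma_1^{-1} (\mu_1 - \mu_0) - k - \ln\left(\frac{\det \Sigma_0}{\det \Sigma_1}\right),$$

where $k$ here denotes the dimension of the distributions (not the H-matrix rank).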
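The closed form for two multivariate normals can be evaluated as follows. This is a minimal dense sketch; the function name `kld_gauss` is hypothetical, and the log-determinants are computed with `numpy.linalg.slogdet` to avoid overflow for large matrices.

```python
import numpy as np

def kld_gauss(mu0, S0, mu1, S1):
    """KL divergence D_KL(N0 || N1) between two multivariate normals."""
    k = len(mu0)                                  # dimension of the distributions
    diff = mu1 - mu0
    trace_term = np.trace(np.linalg.solve(S1, S0))
    maha_term = diff @ np.linalg.solve(S1, diff)
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (trace_term + maha_term - k - (logdet0 - logdet1))

mu = np.zeros(2)
d_same = kld_gauss(mu, np.eye(2), mu, np.eye(2))
d_scaled = kld_gauss(mu, np.eye(2), mu, 2.0 * np.eye(2))
```

In the tables of the talk, $\Sigma_0$ and $\Sigma_1$ would be the exact covariance matrix and its H-matrix approximation, with equal means.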





k     KLD                    ‖C − C_H‖_2            ‖C (C_H)^{-1} − I‖_2
      L = 0.25   L = 0.75    L = 0.25   L = 0.75    L = 0.25   L = 0.75
5     0.51       2.3         4.0e-2     0.1         4.8        63
6     0.34       1.6         9.4e-3     0.02        3.4        22
8     5.3e-2     0.4         1.9e-3     0.003       1.2        8
10    2.6e-3     0.2         7.7e-4     7.0e-4      6.0e-2     3.1
12    5.0e-4     2e-2        9.7e-5     5.6e-5      1.6e-2     0.5
15    1.0e-5     9e-4        2.0e-5     1.1e-5      8.0e-4     0.02
20    4.5e-7     4.8e-5      6.5e-7     2.8e-7      2.1e-5     1.2e-3
50    3.4e-13    5e-12       2.0e-13    2.4e-13     4e-11      2.7e-9

Table: Dependence of the KLD on the approximation H-matrix rank $k$, Matern covariance with parameters $L = 0.25, 0.75$ and $\nu = 0.5$, domain $G = [0,1]^2$, $\|C_{(L=0.25,0.75)}\|_2 = 212, 568$.





k     KLD                    ‖C − C_H‖_2            ‖C (C_H)^{-1} − I‖_2
      L = 0.25   L = 0.75    L = 0.25   L = 0.75    L = 0.25   L = 0.75
5     nan        nan         0.05       6e-2        2.1e+13    1e+28
10    10         10e+17      4e-4       5.5e-4      276        1e+19
15    3.7        1.8         1.1e-5     3e-6        112        4e+3
18    1.2        2.7         1.2e-6     7.4e-7      31         5e+2
20    0.12       2.7         5.3e-7     2e-7        4.5        72
30    3.2e-5     0.4         1.3e-9     5e-10       4.8e-3     20
40    6.5e-8     1e-2        1.5e-11    8e-12       7.4e-6     0.5
50    8.3e-10    3e-3        2.0e-13    1.5e-13     1.5e-7     0.1

Table: Dependence of the KLD on the approximation H-matrix rank $k$, Matern covariance with parameters $L = 0.25, 0.75$ and $\nu = 1.5$, domain $G = [0,1]^2$, $\|C_{(L=0.25,0.75)}\|_2 = 720, 1068$.





Further applications of large covariance matrices

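1. Kriging estimate $\hat{s} := C_{sy} C_{yy}^{-1} y$.

2. Estimation of the variance $\hat{\sigma}$, which is the diagonal of the conditional covariance matrix: $C_{ss|y} = \mathrm{diag}\left(C_{ss} - C_{sy} C_{yy}^{-1} C_{ys}\right)$.

3. Geostatistical optimal design: $\varphi_A := n^{-1}\, \mathrm{trace}\, C_{ss|y}$, $\varphi_C := c^T \left(C_{ss} - C_{sy} C_{yy}^{-1} C_{ys}\right) c$.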





Current work with M. Genton and his spatial statistics group


Likelihood function with Matern covariance

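Maximize the log-likelihood function w.r.t. the parameter $\theta$:

$$\ell(\theta) = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \log \det C(\theta) - \frac{1}{2} \left(z^T(\theta^*) \cdot C(\theta)^{-1} z(\theta^*)\right). \tag{3}$$

After simplification with the Cholesky factorization $C(\theta) = L(\theta) L(\theta)^T$, for which $\log \det C(\theta) = 2 \sum_{i=1}^{n} \log L_{ii}(\theta)$, obtain

$$\ell(\theta) = -\frac{n}{2} \log(2\pi) - \sum_{i=1}^{n} \log L_{ii}(\theta) - \frac{1}{2} \left(z^T(\theta^*)\, v(\theta)\right), \tag{4}$$

where $v(\theta)$ solves $C(\theta) v = z(\theta^*)$, i.e. $L(\theta) L(\theta)^T v = z(\theta^*)$, which is computed via the two triangular systems $L(\theta) w = z(\theta^*)$ and $L^T(\theta) v = w$.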
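The Cholesky-based evaluation can be sketched with dense matrices. The code below uses the standard Gaussian log-likelihood for $z \sim N(0, C)$, in which $\log \det C$ enters with the factor $1/2$ and therefore equals the sum of $\log L_{ii}$. The exponential covariance, the parameter grid, and the names `cov_matrix` and `log_likelihood` are assumptions; a dedicated triangular solver would replace `numpy.linalg.solve` in practice.

```python
import numpy as np

def log_likelihood(C, z):
    """Gaussian log-likelihood of z ~ N(0, C) via one Cholesky factorisation."""
    n = len(z)
    L = np.linalg.cholesky(C)
    w = np.linalg.solve(L, z)          # L w = z
    v = np.linalg.solve(L.T, w)        # L^T v = w
    return (-0.5 * n * np.log(2.0 * np.pi)
            - np.sum(np.log(np.diag(L)))
            - 0.5 * z @ v)

n = 50
x = np.linspace(0.0, 1.0, n)

def cov_matrix(theta):
    """Exponential covariance with covariance length theta (assumption)."""
    return np.exp(-np.abs(x[:, None] - x[None, :]) / theta)

rng = np.random.default_rng(0)
z = np.linalg.cholesky(cov_matrix(0.3)) @ rng.standard_normal(n)   # synthetic data

thetas = np.array([0.05, 0.1, 0.3, 1.0, 3.0])
values = np.array([log_likelihood(cov_matrix(t), z) for t in thetas])
theta_best = thetas[np.argmax(values)]       # crude grid search over theta
```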





Matern Fields (Whittle, 63)

Taken from D. Simpson (see also Finn Lindgren, Havard Rue, David Bolin, ...)

Theorem. The covariance function of a Matern field

$$c(x,y) = \frac{1}{\Gamma(\nu + d/2)\, (4\pi)^{d/2}\, \kappa^{2\nu}\, 2^{\nu-1}}\, (\kappa \|x - y\|)^{\nu} K_\nu(\kappa \|x - y\|) \tag{5}$$

is the Green's function of the differential operator

$$L_\nu^2 = \left(\kappa^2 - \Delta\right)^{\nu + d/2}. \tag{6}$$
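Formula (5) can be evaluated without a special-function library in the case $\nu = 1/2$, where the modified Bessel function has the closed form $K_{1/2}(z) = \sqrt{\pi/(2z)}\, e^{-z}$. The sketch below covers only this case; the function names are hypothetical.

```python
import math

def bessel_k_half(z):
    """Modified Bessel function K_{1/2}(z) = sqrt(pi / (2 z)) * exp(-z)."""
    return math.sqrt(math.pi / (2.0 * z)) * math.exp(-z)

def matern_half(r, kappa, d):
    """Evaluate the covariance (5) for nu = 1/2 at distance r > 0."""
    nu = 0.5
    norm = (math.gamma(nu + d / 2.0) * (4.0 * math.pi) ** (d / 2.0)
            * kappa ** (2.0 * nu) * 2.0 ** (nu - 1.0))
    return (kappa * r) ** nu * bessel_k_half(kappa * r) / norm

# In one dimension the formula reduces to exp(-kappa * r) / (2 * kappa).
value = matern_half(0.3, 2.0, 1)
reference = math.exp(-2.0 * 0.3) / (2.0 * 2.0)
```

For $d = 1$ and $\nu = 1/2$ the exponent in (6) is $\nu + d/2 = 1$, so the result is the familiar Green's function of $\kappa^2 - d^2/dx^2$ on the real line.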





Gaussian Field and Green Function

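A Gaussian field $x(u)$, $u \in \mathbb{R}^d$, with the Matern covariance is a solution to the linear fractional SPDE

$$(\kappa^2 - \Delta)^{(\nu + d/2)/2}\, x(u) = W(u), \quad \kappa > 0, \ \nu > 0, \tag{7}$$

so that the covariance of $x$ is the Green's function of $(\kappa^2 - \Delta)^{\nu + d/2}$.

$W(u)$ is spatial Gaussian white noise with unit variance.

For all $x, y \in \Omega$ the Green function $G(x,y)$ is the solution of $L G(\cdot, y) = \delta_y$ with boundary condition $G(\cdot, y)|_\Gamma = 0$, where $\delta_y$ is the Dirac distribution at $y \in \Omega$. The Green function is the kernel of the inverse $L^{-1}$, i.e.,

$$u(x) = \int_\Omega G(x,y) f(y)\, dy. \tag{8}$$

For $L = -\Delta$, $G(x,y)$ is analytic in $\Omega$ for $x \ne y$.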
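The statement that the Green function is the kernel of the inverse can be checked numerically in 1D. The sketch below assumes $L = \kappa^2 - d^2/dx^2$ on $(0,1)$ with zero boundary values, discretised by finite differences, and compares a column of the scaled inverse with the free-space kernel $e^{-\kappa|x-y|}/(2\kappa)$. The value $\kappa = 20$ is chosen so that boundary effects are negligible at the midpoint.

```python
import numpy as np

kappa = 20.0
n = 400
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)

# Finite-difference matrix of kappa^2 - d^2/dx^2 with zero boundary values.
lap = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
A = kappa**2 * np.eye(n) + lap

# Discrete Green function: columns of A^{-1}, scaled by 1/h because the
# discrete Dirac delta at a grid point has height 1/h.
G = np.linalg.inv(A) / h

j = n // 2
exact = np.exp(-kappa * np.abs(x - x[j])) / (2.0 * kappa)
max_err = np.abs(G[:, j] - exact).max()
```

The sparse matrix `A` plays the role of a precision matrix, and its dense inverse is (up to scaling) the covariance matrix of the Matern field with $\nu = 1/2$ in 1D.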





Bridge between numerical methods for PDEs and covariance

How can we use this bridge between numerical methods for PDEs and covariance?

See, e.g., [Bebendorf, Hackbusch 02], "Existence of H-matrix approximation of the inverse FE-matrix of elliptic operators with $L^\infty$-coefficients". The Green functions of uniformly elliptic operators can be approximated by degenerate functions, giving rise to the existence of blockwise low-rank approximants of FEM inverses.





Matern function for different parameters

[Figure, two panels, horizontal axis from −2 to 2, vertical axis from 0 to 0.25. Left panel: Matern covariance (nu=1) for σ=0.5 and l=0.5, 0.3, 0.2, 0.1. Right panel: nu=0.15, 0.3, 0.5, 1, 2, 30.]

Figure: Matern function for different parameters (computed in sglib).





The eigenvalue problem

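$$-\Delta u = \lambda u \ \text{in } \Omega = (0,1)^3, \qquad u = 0 \ \text{on } \partial\Omega. \tag{9}$$

The eigenvalues in Eq. 9 are

$$\lambda = \lambda_{\alpha,\beta,\gamma} := \pi^2 (\alpha^2 + \beta^2 + \gamma^2), \quad \text{where } \alpha, \beta, \gamma \in \mathbb{N}. \tag{10}$$

To solve Eq. 9 numerically (for testing purposes) one usually first discretizes it, e.g., using a piecewise linear basis $\left(\varphi_i^{(N)}\right)$, $i = 1..N$, in a subspace $V_N \subset H_0^1(\Omega)$, and then applies any classical method, e.g. H-AMLS [Grasedyck & Gerds 15].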





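The discretized problem is

$$\text{find } (\lambda^{(N)}, x^{(N)}) \in \mathbb{R} \times \mathbb{R}^N \text{ with } K x^{(N)} = \lambda^{(N)} M x^{(N)}, \tag{11}$$

where $K \in \mathbb{R}^{N \times N}$ is the stiffness matrix

$$K_{i,j} := a\left(\varphi_j^{(N)}, \varphi_i^{(N)}\right)$$

and $M \in \mathbb{R}^{N \times N}$ the mass matrix

$$M_{i,j} := \left(\varphi_j^{(N)}, \varphi_i^{(N)}\right), \quad i, j = 1..N.$$

Here the discretization step is $h := \frac{1}{n+1}$ and $N = n^3$. The eigenvalues of the discrete problem Eq. 11 approximate the eigenvalues of the continuous problem Eq. 9. See an approximation analysis in [Grasedyck & Gerds 15].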





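Eigenvalues of $\left((\kappa^2 - \Delta)^{\nu + d/2}\right)^{-1}$

Theorem. Let $\Delta_3$ and $\Delta_1$ be the Laplace operators in 3D and 1D, and $I$ the identity matrix. Then

$$\Delta_3 = \Delta_1 \otimes I \otimes I + I \otimes \Delta_1 \otimes I + I \otimes I \otimes \Delta_1.$$

With $\lambda_{\alpha,\beta,\gamma} = \pi^2(\alpha^2 + \beta^2 + \gamma^2)$ denoting the eigenvalues of $-\Delta$ on $(0,1)^3$, the eigenvalues of the shifted Laplace operator $\kappa^2 - \Delta$ raised to the power $\nu + d/2$ are

$$\left(\kappa^2 + \lambda_{\alpha,\beta,\gamma}\right)^{\nu + d/2} = \left(\kappa^2 + \pi^2 (\alpha^2 + \beta^2 + \gamma^2)\right)^{\nu + d/2}, \tag{12}$$

and the eigenvalues of the inverse operator are

$$\frac{1}{\left(\kappa^2 + \pi^2 (\alpha^2 + \beta^2 + \gamma^2)\right)^{\nu + d/2}}.$$

This is the well-known Hilbert tensor $\frac{1}{\alpha^2 + \beta^2 + \gamma^2}$, a low-rank tensor decomposition of which is well known (see, e.g., B. Khoromskij).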
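The Kronecker-sum structure and its effect on the eigenvalues can be verified on a small grid. The sketch below assumes a finite-difference approximation of the negative 1D Laplacian with zero boundary values instead of the FE matrices of the talk, and the values of $\kappa$ and $\nu$ are arbitrary.

```python
import numpy as np

n = 6
h = 1.0 / (n + 1)
I = np.eye(n)
# 1D finite-difference approximation of -d^2/dx^2 with zero boundary values.
D1 = (2.0 * I - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

# 3D operator as a Kronecker sum of 1D operators.
D3 = (np.kron(np.kron(D1, I), I)
      + np.kron(np.kron(I, D1), I)
      + np.kron(np.kron(I, I), D1))

lam1 = np.linalg.eigvalsh(D1)
lam3 = np.linalg.eigvalsh(D3)
# All sums lam1[a] + lam1[b] + lam1[c], sorted for comparison with lam3.
sums = np.sort((lam1[:, None, None] + lam1[None, :, None]
                + lam1[None, None, :]).ravel())

kappa, nu, d = 1.0, 0.5, 3
power = nu + d / 2.0
eig_inverse = 1.0 / (kappa**2 + sums) ** power   # eigenvalues of the inverse operator
```

The 3D eigenvalues are exactly the sums of 1D eigenvalues, so the spectrum of the inverse operator is a function of $\alpha$, $\beta$, $\gamma$ that can be compressed in a low-rank tensor format.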





Conclusion

- Covariance matrices allow data-sparse low-rank approximations.

- With the application of H-matrices
  - we extend the class of covariance functions to work with,
  - we allow non-regular discretization of the covariance function on large spatial grids.

- There is a bridge between SPDEs and covariance matrices.





Literature

1. PCE of random coefficients and the solution of stochastic partial differential equations in the Tensor Train format, S. Dolgov, B. N. Khoromskij, A. Litvinenko, H. G. Matthies, 2015/3/11, arXiv:1503.03210

2. Efficient analysis of high dimensional data in tensor formats, M. Espig, W. Hackbusch, A. Litvinenko, H.G. Matthies, E. Zander, Sparse Grids and Applications, 31-56, 40, 2013

3. Application of hierarchical matrices for computing the Karhunen-Loeve expansion, B.N. Khoromskij, A. Litvinenko, H.G. Matthies, Computing 84 (1-2), 49-67, 31, 2009

4. Efficient low-rank approximation of the stochastic Galerkin matrix in tensor formats, M. Espig, W. Hackbusch, A. Litvinenko, H.G. Matthies, P. Waehnert, Comp. & Math. with Appl. 67 (4), 818-829, 2012

5. Numerical Methods for Uncertainty Quantification and Bayesian Update in Aerodynamics, A. Litvinenko, H. G. Matthies, in the book "Management and Minimisation of Uncertainties and Errors in Numerical Aerodynamics", pp. 265-282, 2013



