Likelihood Approximation with Parallel Hierarchical Matrices for Large Spatial Datasets


A. Litvinenko, Y. Sun, M. Genton, D. Keyes, CEMSE, KAUST

HIERARCHICAL LIKELIHOOD APPROXIMATION

Suppose we observe a mean-zero, stationary and isotropic Gaussian process Z with a Matérn covariance at n irregularly spaced locations. Let Z = (Z(s_1), ..., Z(s_n))^T; then Z ∼ N(0, C(θ)), where θ ∈ R^q is an unknown parameter vector of interest,

C_ij(θ) = cov(Z(s_i), Z(s_j)) = C(‖s_i − s_j‖, θ), and

C(r) := C_θ(r) = (2σ² / Γ(ν)) (r / (2ℓ))^ν K_ν(r / ℓ),   θ = (σ², ν, ℓ)^T,

is the Matérn covariance function. The MLE of θ is obtained by maximizing the Gaussian log-likelihood function:

L(θ) = −(n/2) log(2π) − (1/2) log|C(θ)| − (1/2) Z^T C(θ)^{−1} Z.
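These two formulas can be prototyped densely before any H-matrix machinery enters. The sketch below (a plain NumPy/SciPy illustration, not the poster's parallel implementation; the helper names matern_cov and log_likelihood are ours) evaluates the Matérn covariance in the parameterization given above and the log-likelihood via a Cholesky factorization — exactly the O(n³) computation that the H-matrix approximation is meant to replace.

import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.spatial.distance import cdist
from scipy.special import gamma, kv

def matern_cov(locs, sigma2, nu, ell):
    # C(r) = 2*sigma2/Gamma(nu) * (r/(2*ell))**nu * K_nu(r/ell), with C(0) = sigma2.
    r = cdist(locs, locs)
    C = np.full(r.shape, sigma2)
    m = r > 0
    C[m] = 2.0 * sigma2 / gamma(nu) * (r[m] / (2.0 * ell))**nu * kv(nu, r[m] / ell)
    return C

def log_likelihood(z, C):
    # L(theta) = -n/2 log(2 pi) - 1/2 log|C| - 1/2 z^T C^{-1} z, via Cholesky C = L L^T.
    n = len(z)
    cf = cho_factor(C, lower=True)
    logdet = 2.0 * np.sum(np.log(np.diag(cf[0])))
    quad = z @ cho_solve(cf, z)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

# tiny usage example on synthetic locations
rng = np.random.default_rng(0)
locs = rng.uniform(0.0, 1.0, size=(300, 2))
C = matern_cov(locs, sigma2=1.0, nu=0.5, ell=0.1)
z = np.linalg.cholesky(C + 1e-10 * np.eye(300)) @ rng.standard_normal(300)
print(log_likelihood(z, C))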

We approximate C ≈ C̃ in the H-matrix format with cost and storage O(kn log n), k ≪ n.

Theorem 1. Let ρ(C̃^{−1}C − I) ≤ ε < 1. Then |log|C| − log|C̃|| ≤ −n log(1 − ε). If, in addition, ‖C^{−1}‖ ≤ c₁ and ‖Z‖ ≤ c₀, then |L̃(θ; k) − L(θ)| ≤ c₀² · c₁ · ε − n log(1 − ε).
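The first bound in Theorem 1 is easy to check numerically for a generic approximation. In the sketch below (an illustration under our own assumptions: C̃ is modeled as C plus a small symmetric perturbation, standing in for an actual H-matrix approximation), ε is the spectral radius ρ(C̃^{−1}C − I) and the log-determinant difference indeed stays below −n log(1 − ε).

import numpy as np

rng = np.random.default_rng(1)
n = 200
A = rng.standard_normal((n, n))
C = A @ A.T + n * np.eye(n)              # SPD test matrix standing in for C(theta)
E = rng.standard_normal((n, n))
Ct = C + 1e-6 * (E + E.T) / 2.0          # stand-in for the H-matrix approximation C~

# epsilon = rho(C~^{-1} C - I), the quantity appearing in Theorem 1
eps = np.max(np.abs(np.linalg.eigvals(np.linalg.solve(Ct, C) - np.eye(n))))
lhs = abs(np.linalg.slogdet(C)[1] - np.linalg.slogdet(Ct)[1])
rhs = -n * np.log(1.0 - eps)
print(f"eps = {eps:.2e}:  |log|C| - log|C~|| = {lhs:.2e}  <=  -n log(1-eps) = {rhs:.2e}")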

Operation        Sequential Compl.     Parallel Compl. (shared memory)
building(C̃)     O(n log n)            O(n log n)/p + O(|V(T)\L(T)|)
storage(C̃)      O(kn log n)           O(kn log n)
C̃·z             O(kn log n)           O(kn log n)/p + n/√p
H-Cholesky       O(k²n log²n)          O(n log n)/p + O(k²n log²n / n^{1/d})

[Figure: Daily soil moisture, Mississippi basin. Box-plots of the estimated covariance length (range 0.02–0.06) for different H-matrix ranks k = {3, 7, 9}, ℓ = 0.0334.]

[Figure: log|C̃|, z^T C̃^{−1} z, and −L̃ plotted against ℓ (ν = 0.325, σ² = 0.98); moisture data, n = 66049, rank k = 11.]

PARALLEL HIERARCHICAL MATRICES (HACKBUSCH, KRIEMANN '05)

Advantages of approximating C by C̃: the H-matrix approximation is cheap; storage and matrix-vector products cost O(kn log n); LU factorization and inversion cost O(k²n log²n); efficient parallel implementations exist.
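Why these costs scale with the rank k: off-diagonal blocks of a smooth covariance between well-separated point clusters are numerically low-rank, so they can be stored as two thin factors and applied to a vector in O(kn) work instead of O(n²). A minimal sketch of this one ingredient (plain NumPy with a squared-exponential kernel and an ad-hoc block, not HLIBPro and not an admissibility-driven partition):

import numpy as np

rng = np.random.default_rng(2)
left = np.sort(rng.uniform(0.0, 1.0, 800))        # one cluster of 1-D locations
right = np.sort(rng.uniform(2.0, 3.0, 800))       # a well-separated second cluster
B = np.exp(-(left[:, None] - right[None, :])**2 / (2.0 * 0.5**2))  # off-diagonal block

U, s, Vt = np.linalg.svd(B, full_matrices=False)
k = int(np.sum(s > 1e-8 * s[0]))                  # numerical rank at tolerance 1e-8
Uk, Vk = U[:, :k] * s[:k], Vt[:k]                 # B ~= Uk @ Vk, stored in O(kn) memory

z = rng.standard_normal(800)
err = np.linalg.norm(B @ z - Uk @ (Vk @ z)) / np.linalg.norm(B @ z)
print(f"numerical rank k = {k} (block size 800), matvec relative error {err:.1e}")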

[Figures: H-matrix block structures with per-block ranks; see caption below.]

(1st) Matérn H-matrix approximation for the moisture example, n = 8000, ε = 10⁻³, ℓ = 0.64, ν = 0.325, σ² = 0.98; 29.3 MB vs. 488.3 MB for the dense matrix, set-up time 0.4 s. (2nd) Cholesky factor L̃ with accuracy ε = 10⁻⁸ in each block, 4.8 s, storage 52.8 MB. (3rd) Distribution across p processors. (4th) Kronecker product of H-matrices, n = 381K. (5th) Discretization of the Mississippi basin, [−84.8°, −72.9°] × [32.446°, 43.4044°].

NUMERICAL EXAMPLES

H-matrix approximation, ν = 0.5, domain G = [0, 1]², ‖C̃‖₂ = {212, 568} for ℓ = {0.25, 0.75}, n = 16049.

k     KLD                  ‖C − C̃‖₂            ‖C C̃⁻¹ − I‖₂
      ℓ=0.25    ℓ=0.75     ℓ=0.25    ℓ=0.75     ℓ=0.25    ℓ=0.75
10    2.6e-3    0.2        7.7e-4    7.0e-4     6.0e-2    3.1
50    3.4e-13   5e-12      2.0e-13   2.4e-13    4e-11     2.7e-9

Computing time and number of iterations for maximization of the log-likelihood L̃(θ; k), n = 66049.

k      size, GB   set-up of C̃, s   compute L̃, s   maximizing, s   # iters
10     1          7                 115             1994            13
20     1.7        11                370             5445            9
dense  38         42                657             ∞               -
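In the same dense-prototype spirit, the "maximizing" and "# iters" columns correspond to handing the negative log-likelihood to a generic optimizer. A hedged sketch, reusing the matern_cov and log_likelihood helpers from the first code example above (the optimizer, bounds, starting point, and the tiny nugget are our choices, not the poster's setup):

import numpy as np
from scipy.optimize import minimize

def neg_loglik(theta, locs, z):
    sigma2, nu, ell = theta
    C = matern_cov(locs, sigma2, nu, ell)
    C[np.diag_indices_from(C)] += 1e-6            # tiny nugget for numerical stability
    return -log_likelihood(z, C)

rng = np.random.default_rng(3)
locs = rng.uniform(0.0, 1.0, size=(500, 2))
C_true = matern_cov(locs, 1.0, 0.5, 0.2)
z = np.linalg.cholesky(C_true + 1e-8 * np.eye(500)) @ rng.standard_normal(500)

res = minimize(neg_loglik, x0=[0.5, 0.4, 0.1], args=(locs, z),
               method="L-BFGS-B", bounds=[(1e-3, 10.0), (0.05, 1.5), (1e-2, 1.0)])
print("estimate (sigma2, nu, ell):", res.x, " iterations:", res.nit)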

Moisture data. We used adaptive rank arithmetic with ε = 10⁻⁴ for each block of C̃ and ε = 10⁻⁸ for each block of C̃⁻¹. The number of processing cores is 40.

         compute C̃                              H-Cholesky L̃L̃^T                          inverse
n        compr. rate, %  time, s  size, MB      time, s  size, MB  ‖I − (L̃L̃^T)⁻¹C̃‖₂     time, s  size, MB  ‖I − C̃⁻¹C̃‖₂
10000    86              0.9      106           4.1      109       7.7e-6                 44       230       7.8e-5
30000    92.5            4.3      515           25       557       1.1e-3                 316      1168      1.1e-1

n = 512K, accuracy inside each block 10⁻⁸, matrix set-up 261 s, compression rate 99.98% (0.4 GB against 2006 GB). The H-LU factorization takes 843 s and requires 5.8 GB of RAM; the LU-based inversion error is 2 · 10⁻³.
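"Adaptive rank arithmetic" with a blockwise accuracy, as used above, means each low-rank block keeps just enough singular values to meet a prescribed tolerance ε instead of a fixed rank k. A minimal single-block sketch of that truncation rule (our own NumPy illustration of the idea, not HLIBPro's API; the kernel and block sizes are arbitrary):

import numpy as np

def truncate_block(B, eps):
    """Rank-k factors (A, Vt) with the smallest k such that ||B - A @ Vt||_2 <= eps * ||B||_2."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    k = max(1, int(np.sum(s > eps * s[0])))   # discarded singular values are <= eps * s[0]
    return U[:, :k] * s[:k], Vt[:k]

# usage: compress one smooth 400 x 300 block to relative accuracy 1e-4
x = np.linspace(0.0, 1.0, 400)[:, None]
y = np.linspace(2.0, 3.0, 300)[None, :]
B = 1.0 / (1.0 + (x - y)**2)                  # a generic smooth kernel block
A, Vt = truncate_block(B, 1e-4)
print(A.shape[1], np.linalg.norm(B - A @ Vt, 2) / np.linalg.norm(B, 2))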

(1st) −L vs. ν; (2nd) the same with nuggets {0.01, 0.005, 0.001} for a Gaussian covariance, n = 2000, k = 14, σ² = 1; (3rd) zoom of the 2nd figure; (4th) box-plots of estimated ν vs. the number of locations n.

REFERENCES AND ACKNOWLEDGEMENTS

[1] B. N. Khoromskij, A. Litvinenko, H. G. Matthies, Application of hierarchical matrices for computing the Karhunen-Loève expansion, Computing, Vol. 84, Issue 1-2, pp. 49-67, 2008.

[2] Y. Sun, M. Stein, Statistically and computationally efficient estimating equations for large spatial datasets, JCGS, 2016.

[3] A. Litvinenko, M. Genton, Y. Sun, D. Keyes, H-matrix techniques for approximating large covariance matrices and estimating their parameters, PAMM 16 (1), pp. 731-732, 2016.

[4] W. Nowak, A. Litvinenko, Kriging and spatial design accelerated by orders of magnitude: combining low-rank covariance approximations with FFT-techniques, J. Mathematical Geosciences, Vol. 45, N4, pp. 411-435, 2013.

Work supported by SRI-UQ and ECRC, KAUST. Thanks to Ronald Kriemann for HLIBPro.
