Bayesian Inversion: Algorithms

Andrew M. Stuart¹
¹Mathematics Institute and Centre for Scientific Computing
University of Warwick

Woudshoten Lectures 2013, October 4th 2013
Work funded by EPSRC, ERC and ONR
http://homepages.warwick.ac.uk/∼masdr/

A.M. Stuart. Inverse problems: a Bayesian perspective. Acta Numerica 19 (2010). ∼masdr/BOOKCHAPTERS/stuart15c.pdf
M. Dashti, K.J.H. Law, A.M. Stuart and J. Voss. MAP estimators and posterior consistency.... Inverse Problems 29 (2013), 095017. arXiv:1303.4795.
F. Pinski, G. Simpson, A.M. Stuart and H. Weber. Kullback-Leibler approximation for probability measures on infinite dimensional spaces. In preparation.
S.L. Cotter, G.O. Roberts, A.M. Stuart and D. White. MCMC methods for functions.... Statistical Science 28 (2013). arXiv:1202.0709.
M. Hairer, A.M. Stuart and S. Vollmer. Spectral gaps for a Metropolis-Hastings algorithm.... arXiv:1112.1392.
Outline
1 SETTING AND ASSUMPTIONS
2 MAP ESTIMATORS
3 KULLBACK-LEIBLER APPROXIMATION
4 SAMPLING
5 CONCLUSIONS
The Setting
Probability measure µ on Hilbert space H. Reference measure µ0 (often a prior). µ is related to µ0 (often via Bayes' Theorem) by

    \frac{d\mu}{d\mu_0}(u) = \frac{1}{Z_\mu}\exp\bigl(-\Phi(u)\bigr).

Another way of saying the same thing:

    \mathbb{E}^{\mu} f(u) = \frac{1}{Z_\mu}\,\mathbb{E}^{\mu_0}\Bigl(\exp\bigl(-\Phi(u)\bigr)\,f(u)\Bigr).
How do we get information from µ if we know µ0 and Φ?
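The reweighting identity above already suggests one naive computational answer: draw samples from µ0 and weight them by exp(−Φ). A minimal finite-dimensional sketch (all names illustrative: H = R^d, a factor C0_sqrt with C0 = C0_sqrt · C0_sqrtᵀ, and a user-supplied potential phi), assuming nothing beyond the identity itself:

```python
import numpy as np

def posterior_expectation(f, phi, C0_sqrt, n_samples=100_000, seed=0):
    """Estimate E^mu[f] when dmu/dmu0(u) = exp(-Phi(u)) / Z_mu, mu0 = N(0, C0).

    Self-normalized importance sampling: Z_mu = E^{mu0}[exp(-Phi)] is
    estimated from the same Gaussian samples as the numerator.
    """
    rng = np.random.default_rng(seed)
    d = C0_sqrt.shape[0]
    u = rng.standard_normal((n_samples, d)) @ C0_sqrt.T   # u ~ N(0, C0)
    log_w = -np.array([phi(ui) for ui in u])              # log weights: -Phi(u)
    log_w -= log_w.max()                                  # stabilize exponentials
    w = np.exp(log_w)
    f_vals = np.array([f(ui) for ui in u])
    return np.sum(w * f_vals) / np.sum(w)                 # E^mu0[e^{-Phi} f] / Z_mu
```

Such direct reweighting degenerates when exp(−Φ) concentrates away from the bulk of µ0, which is one motivation for the MAP, Kullback-Leibler and MCMC approaches in the rest of the talk.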
The Talk In One Picture

[Figure: panels labelled Gaussian, MAP, and Samples.]
The Assumptions
µ0 = N(0, C0), a centred Gaussian measure on H; µ0(X) = 1, with X (Banach) continuously embedded in H.
Let E = D(C0^{-1/2}) (the Cameron-Martin space).
Then E ⊂ X ⊆ H, with E (Hilbert) compactly embedded in X. The function Φ ∈ C(X; R+).
For all u, v with ‖u‖_X ≤ r, ‖v‖_X ≤ r there are constants M_i(r) such that

    |\Phi(u)| \le M_1(r), \qquad |\Phi(u) - \Phi(v)| \le M_2(r)\,\|u - v\|_X.
Probability Maximizers and Tikhonov Regularization
Define the Tikhonov-regularized least-squares functional I : E → R+ by

    I(u) := \frac{1}{2}\bigl\|C_0^{-1/2} u\bigr\|^2 + \Phi(u).

Let Bδ(z) be a ball of radius δ in X centred at z ∈ E = D(C0^{-1/2}).

Theorem (Dashti, Law, Stuart and Voss, 2013). The probability measure µ and the functional I are related by

    \lim_{\delta \to 0} \frac{\mu\bigl(B_\delta(z_1)\bigr)}{\mu\bigl(B_\delta(z_2)\bigr)} = \exp\bigl(I(z_2) - I(z_1)\bigr).

Thus probability maximizers are minimizers of the Tikhonov-regularized functional I.
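Since probability maximizers are minimizers of I, the MAP point can be computed by numerical optimization of the discretized functional. A minimal sketch, assuming a finite-dimensional discretization with a prior precision matrix C0_inv and a gradient grad_phi of Φ (both illustrative names; in practice one would prefer a Newton-type method):

```python
import numpy as np

def map_estimate(grad_phi, C0_inv, d, step=1e-2, n_iter=5_000):
    """Gradient descent on I(u) = 0.5 * u @ C0_inv @ u + Phi(u)."""
    u = np.zeros(d)                              # start from the prior mean
    for _ in range(n_iter):
        u -= step * (C0_inv @ u + grad_phi(u))   # grad I(u) = C0^{-1} u + grad Phi(u)
    return u
```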
Existence of Probability Maximizers
The minimization is well-defined:
Theorem (Stuart, Acta Numerica, 2010). There exists ū ∈ E such that

    I(\bar u) = \bar I := \inf\{\, I(u) : u \in E \,\}.

Furthermore, if {u_n} is a minimizing sequence satisfying I(u_n) → Ī, then there is a subsequence {u_{n'}} that converges strongly to ū in E.
Example: Navier-Stokes Inversion for Initial Condition
[Figures: stream function in the (x1, x2) plane and spectral coefficients |u_k| in the (k1, k2) plane, shown for two fields.]
Incompressible Navier-Stokes equations on ΩT = T² × (0, ∞):

    \partial_t v - \nu\Delta v + v\cdot\nabla v + \nabla p = f \quad \text{in } \Omega_T,
    \nabla\cdot v = 0 \quad \text{in } \Omega_T,
    v|_{t=0} = u \quad \text{in } \mathbb{T}^2.

Observations: y_{j,k} = v(x_j, t_k) + η_{j,k}, with η_{j,k} ∼ N(0, σ²I_{2×2}); concatenated, y = G(u) + η, η ∼ N(0, σ²I).

Prior covariance C0 = (−Δ_stokes)^{−2}; potential Φ(u) = \frac{1}{10^3\sigma^2}\,|y - G(u)|^2.
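In code the potential is just this scaled data misfit. A minimal sketch, with G an illustrative stand-in for the Navier-Stokes observation operator and the misfit normalization exposed as a parameter (the 10³σ² factor is as read off the slide; the standard Gaussian likelihood would use 2σ²):

```python
import numpy as np

def make_potential(G, y, sigma, scale=1.0e3):
    """Return Phi(u) = |y - G(u)|^2 / (scale * sigma^2).

    G : callable mapping the unknown initial condition to the stacked
        velocity observations (stand-in for the PDE solve + observation).
    """
    def phi(u):
        r = y - G(u)                         # stacked observation misfit
        return float(r @ r) / (scale * sigma**2)
    return phi
```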
Example: Navier-Stokes Inversion for Initial Condition
[Figure: relative errors |u⋆ − u†|/|u†|, |G(u⋆) − G(u†)|/|G(u†)| and |G(u⋆) − y|/|y| against 1/σ on log-log axes. Here u⋆ is the MAP estimator and u† the truth.]
The Objective Functional
Recall µ0 = N(0, C0) and µ(du) ∝ exp(−Φ(u)) µ0(du). Let A denote a set of simple measures on H (usually Gaussian).

Problem. Find ν ∈ A that minimizes I(ν) := D_KL(ν‖µ).

Here D_KL is the Kullback-Leibler divergence (relative entropy):

    D_{KL}(\nu\|\mu) = \begin{cases} \displaystyle\int_H \frac{d\nu}{d\mu}(x)\,\log\Bigl(\frac{d\nu}{d\mu}(x)\Bigr)\,\mu(dx) & \text{if } \nu \ll \mu, \\ +\infty & \text{else.} \end{cases}

We note, for intuition, the inequality

    d_{Hell}(\nu, \mu)^2 \le 2\,D_{KL}(\nu\|\mu).
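Between two finite-dimensional Gaussians D_KL is available in closed form, which is part of what makes the Gaussian family computationally attractive. A minimal sketch of that special case (here µ itself is not Gaussian, so in practice D_KL(ν‖µ) must be estimated, e.g. by sampling; this is for intuition only):

```python
import numpy as np

def kl_gauss(m1, C1, m0, C0):
    """Closed-form D_KL( N(m1, C1) || N(m0, C0) ) on R^d."""
    d = len(m1)
    C0_inv = np.linalg.inv(C0)
    dm = m0 - m1
    _, logdet0 = np.linalg.slogdet(C0)
    _, logdet1 = np.linalg.slogdet(C1)
    return 0.5 * (np.trace(C0_inv @ C1)      # tr(C0^{-1} C1)
                  + dm @ C0_inv @ dm         # Mahalanobis term
                  - d + logdet0 - logdet1)   # log det ratio
```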
Existence of Minimizers
The minimization is well-defined:
Theorem (Pinski, Simpson, Stuart, Weber, 2013). If A is closed under weak convergence and there is ν ∈ A with I(ν) < ∞, then there exists ν̄ ∈ A such that

    I(\bar\nu) = \bar I := \inf\{\, I(\nu) : \nu \in A \,\}.

Furthermore, if {ν_n} is a minimizing sequence satisfying I(ν_n) → Ī, then there is a subsequence {ν_{n'}} that converges to ν̄ in the Hellinger metric:

    d_{Hell}(\nu_{n'}, \bar\nu) \to 0.
Example: A := G = Gaussian measures on H.
Parameterization of G
The Gaussian case is equivalent to ν = N(m, (C0^{-1} + Γ)^{-1}), with

    J(m, \Gamma) = \begin{cases} D_{KL}(\nu\|\mu) & \text{if } (m, \Gamma) \in E \times HS(E, E^*), \\ +\infty & \text{else.} \end{cases}

Theorem (Pinski, Simpson, Stuart, Weber, 2013). There exists (m̄, Γ̄) ∈ E × HS(E, E*) such that

    J(\bar m, \bar\Gamma) = \bar J := \inf\{\, J(m, \Gamma) : (m, \Gamma) \in E \times HS(E, E^*) \,\}.

Furthermore, if {(m_n, Γ_n)} is a minimizing sequence satisfying J(m_n, Γ_n) → J̄, then there is a subsequence {(m_{n'}, Γ_{n'})} such that

    \|m_{n'} - \bar m\|_E + \|\Gamma_{n'} - \bar\Gamma\|_{HS(E, E^*)} \to 0.
Cautionary Examples
[Figure: ϕ(nx) and nϕ(nx) against x ∈ [−1, 1], for n = 1, 4, 16.]
µ0 is the Brownian bridge on [−1, 1]:

    \nu_n := N\bigl(0, (C_0^{-1} + \Gamma_n)^{-1}\bigr).

Either:

    (\Gamma_n u)(x) = \varphi(nx)\,u(x), \quad \varphi \in C^\infty_{per} \text{ with mean } \bar\varphi,
    \nu_n \Rightarrow \nu := N\bigl(0, (C_0^{-1} + \bar\varphi\,\mathrm{Id})^{-1}\bigr).

Or:

    (\Gamma_n u)(x) = n\,\varphi(nx)\,u(x), \quad \varphi \in C^\infty_0,\ \|\varphi\|_{L^1} = 1,
    \nu_n \Rightarrow \nu := N\bigl(0, (C_0^{-1} + \delta_0\,\mathrm{Id})^{-1}\bigr).
Regularization of J I
Let (S, ‖·‖_S) be compact in L(E, E*). Define

    J_\delta(m, \Gamma) = \begin{cases} J(m, \Gamma) + \delta\|\Gamma\|_S^2 & \text{if } (m, \Gamma) \in E \times S, \\ +\infty & \text{else.} \end{cases}

Theorem (Pinski, Simpson, Stuart, Weber, 2013). There exists (m̄, Γ̄) ∈ E × S such that

    J_\delta(\bar m, \bar\Gamma) = \bar J_\delta := \inf\{\, J_\delta(m, \Gamma) : (m, \Gamma) \in E \times S \,\}.

Furthermore, if ν_n = ν(m_n, Γ_n) is a minimizing sequence satisfying J_δ(m_n, Γ_n) → J̄_δ, then there is a subsequence {(m_{n'}, Γ_{n'})} such that

    d_{Hell}(\nu_{n'}, \bar\nu) + \|\Gamma_{n'} - \bar\Gamma\|_S \to 0.
Regularization of J II
Let H = L²(Ω), Ω ⊂ R^d, and C0 = (−Δ)^{−α} with α > d/2. Choose (Γu)(x) = B(x)u(x) and S = H^r, r > 0. Thus ν = N(m, C) with C^{-1} = C0^{-1} + B(·) Id, for a potential B.

    J_\delta(m, B) = \begin{cases} J(m, B) + \delta\|B\|_{H^r}^2 & \text{if } (m, B) \in H^\alpha \times H^r, \\ +\infty & \text{else.} \end{cases}

Theorem (Pinski, Simpson, Stuart, Weber, 2013). There exists (m̄, B̄) ∈ H^α × H^r such that

    J_\delta(\bar m, \bar B) = \bar J_\delta := \inf\{\, J_\delta(m, B) : (m, B) \in H^\alpha \times H^r \,\}.
Example: Conditioned Diffusion in a Double Well
[Figures: the double-well potential V(x) = (x² − 1)² against x, and a sample path X_t against t.]
Consider the conditioned diffusion

    dX_t = -\nabla V(X_t)\,dt + \sqrt{2\varepsilon}\,dW_t, \qquad X_{-T} = x^- < 0, \quad X_{+T} = x^+ > 0.

For x^- = −x^+, by symmetry, we can study paths satisfying X_0 = 0, X_T = x^+. The path-space distribution is approximated by N(m(t), C), with

    C^{-1} = \frac{1}{2\varepsilon}\Bigl(-\frac{d^2}{dt^2} + B\,I\Bigr).

B is either a constant or B = B(t); m and B are obtained by minimizing D_KL.
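A minimal sketch of this Gaussian path-space approximation after discretization (assumptions of the sketch, not of the slide: a uniform grid of n interior points on (0, T) with the path pinned at both ends, so −d²/dt² becomes the standard second-difference matrix):

```python
import numpy as np

def path_precision(B, T, n, eps):
    """Discretize C^{-1} = (1/2 eps)(-d^2/dt^2 + B*I) on n interior points
    of (0, T); B is a scalar or a length-n array B(t_i)."""
    dt = T / (n + 1)
    diag = 2.0 / dt**2 + B * np.ones(n)          # -d^2/dt^2 diagonal plus B
    off = -np.ones(n - 1) / dt**2                # Laplacian off-diagonals
    P = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    return P / (2.0 * eps)

def sample_path(m, P, rng):
    """Draw from N(m, P^{-1}) using a Cholesky factor of the precision P."""
    L = np.linalg.cholesky(P)
    return m + np.linalg.solve(L.T, rng.standard_normal(len(m)))
```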
Stochastic Root Finding & Optimization
Robbins-Monro with Derivatives and Iterate Averaging
Functions Estimated Via Sampling

Assume f(x) (the target function) can be estimated via F(y; x) as

    f(x) = \mathbb{E}_Y[F(Y; x)], \qquad f'(x) = \mathbb{E}_Y[\partial_x F(Y; x)].

Iteration Scheme (see, for instance, Asmussen & Glynn)

    x_{n+1} = x_n - a_n\Bigl(\frac{1}{M}\sum_{i=1}^{M}\partial_x F(Y_i; x_n)\Bigr)^{-1}\Bigl(\frac{1}{M}\sum_{i=1}^{M} F(Y_i; x_n)\Bigr),

with a_n ∼ n^{−γ}, γ ∈ (1/2, 1), and the Y_i i.i.d. Also let the iterate average be x̄_n ≡ (1/n)∑_{j=1}^{n} x_j.

Then x̄_n → x⋆, with f(x⋆) = 0, in distribution at rate n^{−1/2}.
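A one-dimensional sketch of this scheme (the sampler sample_F, returning paired samples of F and ∂ₓF, is an assumed interface; all names illustrative):

```python
import numpy as np

def robbins_monro(sample_F, x0, n_iter=10_000, M=1_000, gamma=0.75, seed=0):
    """Stochastic root finding for f(x) = E[F(Y; x)], with iterate averaging.

    sample_F : callable(rng, x, M) -> (F_vals, dF_vals), two length-M arrays
               of i.i.d. samples of F(Y; x) and dF/dx(Y; x).
    """
    rng = np.random.default_rng(seed)
    x, x_sum = x0, 0.0
    for n in range(1, n_iter + 1):
        F_vals, dF_vals = sample_F(rng, x, M)
        a_n = n ** (-gamma)                            # step size a_n ~ n^{-gamma}
        x -= a_n * F_vals.mean() / dF_vals.mean()      # Newton-like RM update
        x_sum += x
    return x_sum / n_iter                              # Polyak-Ruppert average
```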
Numerical Results With Constant Potential B
T = 5, ε = 0.15; 10⁴ iterations, 10³ samples per iteration, 10² points in (0, T) per sample
[Figures: the mean m_n(t) over the iterations n, and the constant potential B_n against n.]
Numerical Results With Variable Potential B
T = 5, ε = 0.15, δ = (1/2) × 10⁻⁴, r = 1 (so H¹ regularization); 10⁴ iterations, 10⁴ samples per iteration, 10² points in (0, T) per sample
[Figures: the mean m_n(t) and the variable potential B_n(t) over the iterations n.]
Model Comparison
95% Confidence Intervals about the Mean Path

[Figures: X_t against t with 95% confidence bands about the mean path, for constant B (left) and variable B (right).]
MCMC
MCMC: create an ergodic Markov chain u^{(k)} which is invariant for the approximate target µ (or µ^N, the approximation on R^N) so that

    \frac{1}{K}\sum_{k=1}^{K} f\bigl(u^{(k)}\bigr) \to \mathbb{E}^{\mu} f.

Recall µ0 = N(0, C0) and µ(du) ∝ exp(−Φ(u)) µ0(du).

Recall the Tikhonov functional I(u) = \frac{1}{2}\|C_0^{-1/2}u\|^2 + \Phi(u).
Standard Random Walk Algorithm
Metropolis, Rosenbluth, Rosenbluth, Teller and Teller, J. Chem. Phys. 1953.

Set k = 0 and pick u^{(0)}.
Propose v^{(k)} = u^{(k)} + β ξ^{(k)}, ξ^{(k)} ∼ N(0, C0).
Set u^{(k+1)} = v^{(k)} with probability a(u^{(k)}, v^{(k)}).
Set u^{(k+1)} = u^{(k)} otherwise.
k → k + 1.

Here a(u, v) = min{1, exp(I(u) − I(v))}.
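A minimal finite-dimensional sketch of this chain (illustrative names: a covariance factor C0_sqrt with C0 = C0_sqrt · C0_sqrtᵀ, and the Tikhonov functional I supplied as a callable):

```python
import numpy as np

def srwmh(I, u0, C0_sqrt, beta, n_steps, seed=0):
    """Standard random-walk Metropolis for the target exp(-I(u)) on R^d."""
    rng = np.random.default_rng(seed)
    u, Iu = u0, I(u0)
    chain = [u0]
    for _ in range(n_steps):
        xi = C0_sqrt @ rng.standard_normal(len(u))   # xi ~ N(0, C0)
        v = u + beta * xi                            # random-walk proposal
        Iv = I(v)
        if np.log(rng.uniform()) < Iu - Iv:          # accept w.p. min{1, e^{I(u)-I(v)}}
            u, Iu = v, Iv
        chain.append(u)
    return np.array(chain)
```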
New Random Walk Algorithm
Cotter, Roberts, Stuart and White, Stat. Sci. 2013.

Set k = 0 and pick u^{(0)}.
Propose v^{(k)} = √(1 − β²) u^{(k)} + β ξ^{(k)}, ξ^{(k)} ∼ N(0, C0).
Set u^{(k+1)} = v^{(k)} with probability a(u^{(k)}, v^{(k)}).
Set u^{(k+1)} = u^{(k)} otherwise.
k → k + 1.

Here a(u, v) = min{1, exp(Φ(u) − Φ(v))}.
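Only the proposal and the acceptance ratio change relative to the standard walk: the proposal preserves µ0 = N(0, C0) exactly, so the Gaussian part of the target cancels and only Φ appears. A sketch under the same illustrative assumptions (this is the pCN, preconditioned Crank-Nicolson, proposal of the Cotter et al. paper):

```python
import numpy as np

def pcn(phi, u0, C0_sqrt, beta, n_steps, seed=0):
    """pCN random walk: prior-reversible proposal, acceptance uses Phi only."""
    rng = np.random.default_rng(seed)
    u, pu = u0, phi(u0)
    chain = [u0]
    for _ in range(n_steps):
        xi = C0_sqrt @ rng.standard_normal(len(u))     # xi ~ N(0, C0)
        v = np.sqrt(1.0 - beta**2) * u + beta * xi     # pCN proposal
        pv = phi(v)
        if np.log(rng.uniform()) < pu - pv:            # min{1, e^{Phi(u)-Phi(v)}}
            u, pu = v, pv
        chain.append(u)
    return np.array(chain)
```

This prior-reversibility is the structural reason for the dimension-independent spectral gap quoted below.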
Example: Navier-Stokes Inversion for Forcing
[Figure: average acceptance probability against β for SRWMH (left) and RWMH (right), at mesh sizes Δx = 0.100, 0.050, 0.020, 0.010, 0.005, 0.002.]
Incompressible Navier-Stokes equations on ΩT = T² × (0, ∞):

    \partial_t v - \nu\Delta v + v\cdot\nabla v + \nabla p = u \quad \text{in } \Omega_T,
    \nabla\cdot v = 0 \quad \text{in } \Omega_T,
    v|_{t=0} = v_0 \quad \text{in } \mathbb{T}^2.

Observations: y_{j,k} = v(x_j, t_k) + ξ_{j,k}, with ξ_{j,k} ∼ N(0, σ²I_{2×2}); concatenated, y = G(u) + ξ, ξ ∼ N(0, σ²I).

Prior: OU process; Φ(u) = \frac{1}{\sigma^2}\,|y - G(u)|^2.
Spectral Gaps
Theorem (Hairer, Stuart, Vollmer, arXiv 2012).

For the standard random walk algorithm the spectral gap is bounded above by C N^{−1/2}.
For the new random walk algorithm the spectral gap is bounded below independently of dimension.
What We Have Shown
We have shown that:
Common Structure: a range of problems require extracting information from a probability measure on a Hilbert space, having density with respect to a Gaussian.
Algorithmic Approaches: we have laid the foundations of a range of computational methods related to this task.
MAP Estimators: maximum a posteriori estimators can be defined on Hilbert space; there is a link to Tikhonov regularization.
Kullback-Leibler Approximation: Kullback-Leibler approximation can be defined on Hilbert space, and finding the closest Gaussian results in a well-defined problem in the calculus of variations.
Sampling: MCMC methods can be defined on Hilbert space. This results in new algorithms that are robust to discretization.