
Page 1:

A few directions in data assimilation

Adrian Sandu

Computational Science LaboratoryComputer Science Department

Virginia Tech

“Compute the Future!”

Page 2:

Computational Science Laboratory: “Compute the Future!”

First IWDADM, Medellin/Baranquilla, 2019

Page 3:

Directions in DA

1. Robust DA
2. Bayesian localization
3. ML for localization
4. ML for UQ
5. Other

Page 4:

Direction 1: Robust data assimilation

Page 5:

Example: data assimilation of satellite observations in a global CTM improves global ozone predictions

TES ozone column observations

(NASA EOS Aura)

IONS-6 ozonesonde data for validation

Improved global ozone prediction, 1-15 Aug. 2007 {Singh, Sandu, et al, GMD 2010}

Figure: Past work: 3D/4D-Var systems for GEOS-Chem (NASA/Harvard), CMAQ (Environmental Protection Agency), STEM, partially WRF-Chem.

Page 6:

What happens when observations surprise the model?

- The presence of outliers in the data is a common occurrence.

- These outliers negatively affect the quality of the solution, so quality control becomes an important step.

- Data quality control that rejects observations on the basis of background-departure statistics leads to an inability to capture small scales in the analysis {Tavolato and Isaksen, QJRMS 2015}.

- Robust data assimilation is needed to overcome this issue.

Page 7:

Robust data assimilation

Figure: (a) Gaussian/Laplace/Huber PDFs; (b) L2/L1/Huber norms.

- Huber norm:
  $\|x\|_{\mathrm{HUB}} = \sum_\ell L_\tau(x_\ell), \qquad
  L_\tau(a) = \begin{cases} \frac{1}{2}a^2, & |a| \le \tau, \\ \tau\left(|a| - \frac{1}{2}\tau\right), & \text{otherwise.} \end{cases}$
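Spelled out in code, this is an elementwise switch between the quadratic and the linear branch. A minimal NumPy sketch (the constant τ/2 in the linear branch is chosen so the two branches match at |a| = τ; the exact constant is hard to read off the slide, so treat it as an assumption):

```python
import numpy as np

def huber_loss(a, tau):
    """Elementwise Huber loss: quadratic for |a| <= tau, linear beyond."""
    a = np.asarray(a, dtype=float)
    quad = 0.5 * a**2
    lin = tau * (np.abs(a) - 0.5 * tau)   # continuous match at |a| = tau
    return np.where(np.abs(a) <= tau, quad, lin)

def huber_norm(x, tau):
    """'Huber norm' of a vector: the sum of elementwise Huber losses."""
    return huber_loss(x, tau).sum()
```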

Page 8:

Motivation for robust data assimilation: resilience to observation outliers

1. Consider the following case:
   - Background: $x^b = 10$, $B = 1$.
   - Observations: $y_1 = 11$, $y_2 = 12$, $R = 1$.
   - Assuming Gaussian observation errors, the analysis reads:
     $x^a = \arg\min_x \|x - 10\|^2_{B^{-1}} + \|x - 11\|^2_{R^{-1}} + \|x - 12\|^2_{R^{-1}} \;\Rightarrow\; x^a = 11.$

2. With an observation outlier, $y_1 = 11$ and $y_2 = 20$: the Gaussian error assumption gives $x^a = 13.66$.

3. Assuming that:
   - the small-error distribution is Gaussian, and
   - the large-error distribution is Laplace,
   leads to a DA formulation in the Huber norm (here $\tau = 2$):
   $\arg\min_x \|x - 10\|^2_{B^{-1}} + \left\|R^{-1/2}(x - 11)\right\|_{\mathrm{HUB}} + \left\|R^{-1/2}(x - 20)\right\|_{\mathrm{HUB}} \;\Rightarrow\; x^a = 11.37.$
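This toy case is easy to check numerically. A minimal sketch with scipy.optimize (the scaling conventions here differ slightly from the slide's, so the robust analysis comes out near 11 rather than exactly the 11.37 quoted above; the qualitative behavior, insensitivity to the outlier, is the same):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def huber(a, tau):
    """Huber loss, same convention as the earlier sketch."""
    return 0.5 * a**2 if abs(a) <= tau else tau * (abs(a) - 0.5 * tau)

xb, B, R, tau = 10.0, 1.0, 1.0, 2.0

def gaussian_cost(x, obs):
    # background term plus Gaussian (L2) observation terms
    return (x - xb)**2 / B + sum((x - y)**2 / R for y in obs)

def robust_cost(x, obs):
    # background term plus Huber-norm observation terms
    return (x - xb)**2 / B + sum(huber((x - y) / np.sqrt(R), tau) for y in obs)

for obs in ([11.0, 12.0], [11.0, 20.0]):
    xg = minimize_scalar(gaussian_cost, args=(obs,)).x
    xr = minimize_scalar(robust_cost, args=(obs,)).x
    print(f"obs={obs}: Gaussian x_a = {xg:.2f}, Huber x_a = {xr:.2f}")
```

With the outlier pair [11, 20], the Gaussian analysis is pulled to about 13.67 while the Huber analysis stays close to 11.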

Page 9:

The need for a rigorous approach to solve Huber-norm data assimilation

Departure statistics for radiosonde temperatures are well described by a Huber-norm distribution {E. Holm, L. Isaksen, C. Tavolato, E. Andersson, ECMWF training course "Variational Quality Control", 2014}.

Current approaches only approximately solve the Huber-norm DA problem:

- {Tavolato and Isaksen, ECMWF letter 22, 2009; QJRMS 2015}: "Huberize the norm" in 4D-Var: first adjust the observation error covariances based on their background-departure statistics, then solve the traditional 4D-Var problem.

- {Roh, Szunyogh, et al., MWR 2013}: a "Huberized analysis" obtained by clipping the EnKF innovations.

Page 10:

Robust strong-constraint 4D-Var data assimilation

- The robust strong-constraint 4D-Var problem:
  $\min_{x_0} \mathcal{J}(x_0, z) := \frac{1}{2}\left\|x_0 - x_0^b\right\|^2_{B_0^{-1}} + \frac{1}{2}\sum_{i=1}^{N} \|z_i\|_{\mathrm{HUB}}$
  subject to:
  $z_i = R_i^{-1/2}\left[\mathcal{H}(x_i) - y_i\right], \quad i = 1, 2, \dots, N,$
  $x_i = \mathcal{M}_{i-1,i}(x_{i-1}), \quad i = 1, 2, \dots, N.$

- Can be solved by either approach:
  - ADMM, or
  - half-quadratic algorithms.

- Each inner iteration requires the solution of a new (modified) L2-4D-Var problem.

Page 11:

Traditional EnKF data assimilation

- Ensemble space at time $t_i$:
  $x_i = x_i^b + X_i^b w_i, \qquad X_i^b \in \mathbb{R}^{N_{\mathrm{var}} \times N_{\mathrm{ens}}}.$

- Traditional L2-EnKF: the analysis mean weights minimize {Hunt et al., 2007}:
  $w_i^a := \arg\min_w \mathcal{J}(w) := (N_{\mathrm{ens}} - 1)\|w\|_2^2 + \left\|\mathcal{H}(x_i^b + X_i^b w) - y_i\right\|^2_{R_i^{-1}}.$

- LETKF {Hunt et al., 2007} analysis ensemble weights (a NumPy sketch follows below):
  $S_i := \left((N_{\mathrm{ens}} - 1) I + (Y_i^b)^T R_i^{-1} Y_i^b\right)^{-1} = (N_{\mathrm{ens}} - 1)^{-1} W_i W_i^T,$
  $w_i^{a\langle\ell\rangle} = w_i^a + W_i(:, \ell), \qquad x_i^{a\langle\ell\rangle} = x_i^b + X_i^b w_i^{a\langle\ell\rangle}.$

- Perturbed-observations EnKF analysis ensemble weights:
  $w_i^{a\langle\ell\rangle} \approx S_i\left((N_{\mathrm{ens}} - 1)\, w_i^{b\langle\ell\rangle} + (Y_i^b)^T R_i^{-1}\left(y_i^{\langle\ell\rangle} - \mathcal{H}(x_i^b)\right)\right).$
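A minimal NumPy sketch of the LETKF weight computation above, without localization; `Yb` holds the observation-space ensemble anomalies $Y_i^b$, `d` the innovation, and all names are illustrative:

```python
import numpy as np

def letkf_weights(Yb, d, Rinv):
    """LETKF analysis weights (Hunt et al., 2007), unlocalized toy version.

    Yb   : (Nobs, Nens) observation-space ensemble anomalies
    d    : (Nobs,)      innovation y - H(mean of x^b)
    Rinv : (Nobs, Nobs) observation-error precision R^{-1}
    Returns the mean weights w^a and the ensemble weight matrix W.
    """
    Nens = Yb.shape[1]
    S = np.linalg.inv((Nens - 1) * np.eye(Nens) + Yb.T @ Rinv @ Yb)
    wa = S @ (Yb.T @ Rinv @ d)                       # analysis mean weights
    # symmetric square root with W @ W.T = (Nens - 1) * S
    vals, vecs = np.linalg.eigh((Nens - 1) * S)
    W = vecs @ np.diag(np.sqrt(np.maximum(vals, 0.0))) @ vecs.T
    return wa, W

# member update: x_a<l> = xb_mean + Xb @ (wa + W[:, l])
```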

Page 12:

Robust EnKF data assimilation I

1. The Huber-EnKF optimization problem in ensemble space:
   $\min_w \mathcal{J}(w, z_i) = (N_{\mathrm{ens}} - 1)\|w\|_2^2 + \|z_i\|_{\mathrm{HUB}}$
   subject to
   $z_i = R_i^{-1/2}\left[\mathcal{H}\left(x_i^b + X_i^b w\right) - y_i\right].$

2. The problem is solved iteratively by half-quadratic minimization. Let the solution at iteration $\{k\}$ be $w_i^{\{k\}}, u_i^{\{k\}}$. We proceed as follows.

3. Step 1: calculate (component-wise)
   $u_i^{\{k+1\}} = \sigma\left(R_i^{-1/2}\left[\mathcal{H}\left(x_i^b + X_i^b w_i^{\{k\}}\right) - y_i\right]\right), \qquad
   \sigma(a) = \begin{cases} 1, & |a| \le \tau; \\ \tau/|a|, & |a| > \tau. \end{cases}$

Page 13:

Robust EnKF data assimilation II

4. Step 2: update the state $x_i^{\{k+1\}} = x_i^b + X_i^b w_i^{\{k+1\}}$:
   $w_i^{\{k+1\}} = \arg\min_w (N_{\mathrm{ens}} - 1)\|w\|_2^2 + \left\|\mathcal{H}(x_i^b + X_i^b w) - y_i\right\|^2_{R_i^{\{k+1\}\,-1}}$
   where
   $R_i^{\{k+1\}\,-1} = R_i^{-1/2}\,\mathrm{diag}\!\left(u^{\{k\}}/2\right) R_i^{-1/2}.$
   Apply again the EnSRF with the modified observation covariance matrix (a sketch follows below).

5. Satisfactory convergence is reached after $M$ iterations. The analysis:
   $w_i^a = w_i^{\{M\}},$
   $S_i^{\{M\}} := \left((N_{\mathrm{ens}} - 1) I + (Y_i^b)^T R_i^{-1/2}\,\mathrm{diag}\!\left(u^{\{M\}}/2\right) R_i^{-1/2}\, Y_i^b\right)^{-1} = (N_{\mathrm{ens}} - 1)^{-1}\, W_i^{\{M\}} \left(W_i^{\{M\}}\right)^T,$
   $w_i^{a\langle\ell\rangle} = w_i^a + W_i^{\{M\}}(:, \ell).$

Page 14:

Numerical results with the Lorenz-96 model

Figure: 3D-Var results for the Lorenz-96 model. Panels: (a) observations with small random errors; (b) observations with outliers. Each panel shows RMSE (log scale) vs. time for the Forecast, L2, L1, Huber (half-quadratic), and Huber (shrinkage) methods. The frequency of observations is 0.1 time units. Erroneous observations occur every 0.2 time units. The Huber norm uses τ = 1.

Page 15:

Results with Shallow Water Equations on the sphere I

Figure: 4D-Var results for the shallow water model on the sphere. Panels: (a) observations with small random errors; (b) observations with outliers. Each panel shows RMSE vs. time for the Forecast, L2, and Huber (half-quadratic) methods. The Huber threshold is τ = 2.

Page 16:

Results with Shallow Water Equations on the sphere II

Figure: LETKF results for the shallow water model on the sphere. Panels: (a) observations with small random errors; (b) observations with outliers. Each panel shows RMSE vs. time for the Forecast, L2, L1, and Huber (half-quadratic) methods. The Huber threshold is τ = 1.

Page 17:

Direction 2: Adaptive localization via a Bayesian approach

Page 18:

A Bayesian Approach to Multivariate Adaptive Localization

- Assume that variables $x_i$ and $x_j$ have the localization radii $x_i \leftrightarrow r_i$, $x_j \leftrightarrow r_j$.

- The multivariate localization matrix accounts for the multiple radii:
  $\rho = \left[\, m\!\left(\ell(d(i,j)/r_i),\; \ell(d(j,i)/r_j)\right) \right]_{1 \le i,j \le n}.$

- The localization sub-matrix of state-space variables $x_i$ and $x_j$:
  $\begin{bmatrix} \ell(0) & \rho_{i,j} \\ \rho_{j,i} & \ell(0) \end{bmatrix}.$
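A minimal sketch of assembling such a multivariate localization matrix; the slide leaves the taper $\ell$ and the combination rule $m$ generic, so the Gaussian-shaped taper and the arithmetic mean used here are illustrative stand-ins only:

```python
import numpy as np

def ell(r):
    """Compactly supported taper ell(d/r): Gaussian shape, cut off at r = 2.
    (A stand-in for, e.g., a Gaspari-Cohn function; the slide leaves ell generic.)"""
    r = np.asarray(r, dtype=float)
    return np.where(r < 2.0, np.exp(-0.5 * r**2), 0.0)

def multivariate_localization(dist, radii, m=lambda a, b: 0.5 * (a + b)):
    """rho[i, j] = m( ell(d(i,j)/r_i), ell(d(j,i)/r_j) ).

    dist  : (n, n) symmetric pairwise distances d(i, j)
    radii : (n,)   per-variable localization radii r_i
    m     : symmetrizing combination; the arithmetic mean here is illustrative
    """
    a = ell(dist / radii[:, None])   # ell(d(i,j)/r_i), row-wise radii
    b = ell(dist / radii[None, :])   # ell(d(j,i)/r_j), column-wise radii
    return m(a, b)

# usage: apply rho as a Schur (elementwise) product with the sample covariance
```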

Page 19:

A Bayesian Approach to Adaptive Localization

- Analysis step, with $\upsilon$ denoting the localization and inflation parameters:
  $x^a = \mathcal{A}(x^b, y, \upsilon).$

- Assumption: the radii have univariate gamma background distributions:
  $\pi(\upsilon) = \prod_{j=1}^{g} \Gamma\!\left(\alpha^{(j)}, \beta^{(j)}\right).$

- Bayes' identity:
  $\pi(x, \upsilon \mid y) \propto \pi(y \mid x, \upsilon)\, \pi(x \mid \upsilon)\, \pi(\upsilon).$

- The negative log of the posterior probability:
  $\mathcal{J}(\upsilon) = \frac{1}{2}\left\|\mathcal{A}(x^b, y, \upsilon) - x^b\right\|^2_{P^b(\upsilon)^{-1}} + \frac{1}{2}\left\|y - H\,\mathcal{A}(x^b, y, \upsilon)\right\|^2_{R^{-1}} + \sum_{j=1}^{g}\left(\beta^{(j)}\upsilon^{(j)} - \left(\alpha^{(j)} - 1\right)\log \upsilon^{(j)}\right).$
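A minimal sketch of evaluating this negative log posterior for a candidate parameter vector $\upsilon$; the analysis operator `analysis` (standing in for $\mathcal{A}$), `Pb_inv`, and the linear observation operator `H` are illustrative placeholders:

```python
import numpy as np

def neg_log_posterior(upsilon, xb, y, H, R_inv, analysis, Pb_inv, alpha, beta):
    """J(upsilon): prior misfit + observation misfit + gamma-prior terms.

    analysis(xb, y, upsilon) -> x^a : placeholder for the EnKF analysis operator A
    Pb_inv(upsilon)          -> localized/inflated prior precision P_b(upsilon)^{-1}
    alpha, beta              : gamma hyper-parameters, one pair per upsilon^(j)
    """
    xa = analysis(xb, y, upsilon)
    dxb = xa - xb
    dy = y - H @ xa
    j_prior = 0.5 * dxb @ (Pb_inv(upsilon) @ dxb)
    j_obs = 0.5 * dy @ (R_inv @ dy)
    j_gamma = np.sum(beta * upsilon - (alpha - 1.0) * np.log(upsilon))
    return j_prior + j_obs + j_gamma
```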

Page 20:

Quasi-geostrophic Model Adaptive Localization

Figure: RMSE vs. inflation factor for a constant localization radius and for the adaptive (Bayesian) localization. The constant localization radius varies in increments of 5 over the range [5, 45]; only the best results are plotted, and these are used as the mean seeds for the adaptive algorithm. The adaptive Var(υ) takes one of the values 1/2, 1, 2, and 4.

Page 21:

Quasi-geostrophic Model Adaptive Localization

Figure: QGSO sample radii (localization radius vs. assimilation step, over 3000 steps), for α = 1.08, υ = 25, and Var(υ) = 4.

Page 22:

Direction 3: Adaptive localization via an ML approach

Page 23:

Machine learning algorithms in aid of data assimilation

- The Ensemble Kalman Filter (EnKF) provides a practical implementation of the statistical solution of the data assimilation problem.

- The EnKF is prone to sampling errors: typically 30-100 members for model state-space dimensions of $10^9$-$10^{12}$.

- Sampling errors manifest themselves as long-range spurious correlations and considerably degrade performance.

- The quality of the resulting analysis and forecast is greatly influenced by the choice of the localization function parameters, e.g., the radius of influence. The localization radius is generally tuned empirically to yield desirable results.

- Here we employ ML to tune the localization function.

Page 24:

Approach to learn good localization radii

1. Identify important features of the model results and data sets: means of a subset of state variables that are only weakly correlated, and blocks of the correlation matrix for variables that are highly correlated.

2. Define two complementary decision criteria:
   - the root mean squared error (RMSE) of the analysis, and
   - the uniformity of the rank histogram (Anderson, 1996).

3. Use a random forest approach.

4. The first learning algorithm uses the same (scalar) localization radius for all variables of the model. The value of this radius changes adaptively from one assimilation cycle to the next (see the sketch after this list).

5. The second proposed learning algorithm provides a different localization radius for each of the state variables, and these radii change at each assimilation cycle.
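A minimal scikit-learn sketch of the first variant, a single scalar radius predicted at each cycle; the features and training targets below are random placeholders rather than the actual features listed above:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Illustrative training set: one row of features per past assimilation cycle
# (e.g., summary statistics of the forecast ensemble), and the radius that gave
# the best analysis RMSE / rank-histogram uniformity in that cycle.
X_train = rng.normal(size=(500, 12))          # placeholder features
r_train = rng.uniform(5.0, 45.0, size=500)    # placeholder "best" radii

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, r_train)

# At the start of a new cycle: predict the localization radius from the current
# forecast-ensemble features, then run the (L)EnKF analysis with that radius.
x_new = rng.normal(size=(1, 12))
radius = float(model.predict(x_new)[0])
```

The second variant is analogous, with one predicted radius per state variable (a multi-output regression).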

Page 25:

Application of ML-based localization to data assimilation with a quasi-geostrophic model I

(a) Reference (b) Ensemble mean

Figure: Data assimilation with the QG model. Stream function.

Page 26:

Application of ML-based localization to data assimilation with a quasi-geostrophic model II

(a) Training phase (b) Testing phase

Figure: Data assimilation with the QG model. The analysis results with adaptive-in-time localization outperform those obtained with the hand-tuned radius.

Page 27:

Application of ML-based localization to data assimilation with a quasi-geostrophic model III

Figure: Data assimilation with the QG model. The evolution of the localization radius in time over all 100 assimilation cycles. The adaptive algorithm changes the radius considerably over the course of the simulation.

Page 28:

Machine learning algorithms in aid of uncertainty analysis in NWP

- Numerical weather prediction models incorporate a variety of physical processes, each described by multiple alternative physical schemes with specific parameters.

- The selection of the physical schemes and parameters during model configuration can significantly impact the accuracy of model forecasts. No combination works best under all conditions.

- We seek to understand the interplay between the choice of physics and the accuracy of the resulting forecasts under different conditions.

Page 29:

Formulation of the uncertainty problems

1. Estimate the systematic model error in certain quantities of interest at future times, and use this information to improve the WRF forecast:
   $\phi_{\mathrm{error}}\left(\theta, \Lambda(\Delta_{\mathrm{past}})\right) \approx \Lambda(\Delta_{\mathrm{forecast}}).$

2. Identify the specific physical processes that contribute most to the forecast uncertainty in the quantity of interest under specified meteorological conditions:
   $\phi_{\mathrm{physics}}\left(\Lambda(\Delta_{\mathrm{past}})\right) \approx \theta.$

The mappings are "learned" using artificial neural networks (ANN) and random forests (RF) via Scikit-learn; a sketch follows below.
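A minimal sketch of the two learned mappings using scikit-learn, which the slide names as the toolkit used; the arrays, encodings, and model sizes are illustrative placeholders:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_runs = 400

theta = rng.integers(0, 3, size=(n_runs, 4))   # placeholder physics-scheme choices
past_err = rng.normal(size=(n_runs, 10))       # placeholder Lambda(Delta_past) features
fcst_err = rng.normal(size=(n_runs, 10))       # placeholder Lambda(Delta_forecast)

# Problem 1: phi_error(theta, Lambda(Delta_past)) ~ Lambda(Delta_forecast)
phi_error = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
phi_error.fit(np.hstack([theta, past_err]), fcst_err)

# Problem 2: phi_physics(Lambda(Delta_past)) ~ theta
# (one classifier per physical process; only the first is shown for brevity)
phi_physics = RandomForestClassifier(n_estimators=200, random_state=0)
phi_physics.fit(past_err, theta[:, 0])
```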

Page 30:

Direction 4: ML for uncertainty quantification in atmospheric modeling

Page 31:

Case study: precipitation forecasts with WRF

(a) NCEP analysis at 12PM provides a proxy for truth

(b) WRF forecast at 12PM

Figure: The analysis and the WRF forecast for the simulation time 12PM on 5/1/2017. Accumulated precipitation in millimeters.

Page 32:

Problem one: predicting precipitation forecast errors (over Virginia) based on physics configuration

(a) NCEP analysis (b) Original WRF prediction

Figure: WRF prediction and NCEP analysis at 6PM on 5/1/2017. Zoom-in panels show the predictions over Virginia.

Page 33:

Problem one: predicting precipitation forecast errors (over Virginia) based on physics configuration

(a) Discrepancy between the original WRF forecast and the NCEP analysis

(b) Discrepancy between the corrected WRF forecast and the NCEP analysis

Figure: Discrepancy between the WRF forecasts and the NCEP analysis over Virginia at 6PM on 5/1/2017. The error correction brings the forecast closer to the analysis.

Page 34:

Problem two: physical processes that contribute most to WRF precipitation forecast uncertainty

Figure: Frequency of change in the physics with respect to a change in the input data from $\Delta^{\mathrm{test}}_{t=12\mathrm{PM}}$ to $\Delta^{\mathrm{test}}_{t=12\mathrm{PM}}/2$.

Page 35:

Other Directions

Page 36:

Data assimilation with WRFDA (04/27/2011 18:00 to 04/28/2011 00:00; CONUS, 60 km)


Optimal initial V (first level); optimal initial temperature (first level)

{Rao, Stefanescu, and Sandu, 2015}

Page 37:

Atmospheric motion vector wind data from geostationary satellite (GEOAMV): V-observations


Data errors

Contribution of data errors to the error in the analysis QoI

Data error impact factors

{Rao, Stefanescu, and Sandu, 2015}

Page 38:

Model errors introduced by excluding the microphysics package. Data for 11 PM on 27th April, 2011


Temperature model errors (first level)

Impact factor of temperature model errors on the analysis QoI

{Rao, Stefanescu, and Sandu, 2015}

Page 39:

Example: data assimilation of satellite observations in global CTM improves global ozone predictions


TES ozone column observations

(NASA EOS Aura)

IONS-6 ozonesonde data for validation

Improved global ozone prediction, 1-15 Aug. 2007 {Singh, Sandu, et al, GMD 2010}

Page 40:

Estimating the information content of data. Top and bottom 20% of data points by DFS (degrees of freedom for signal).


{Sandu et al., SIUQ 2013}

Page 41:

Assimilation of the top 20% most informative data points produces analyses similar to using all data


Figure: Ozone profiles vs. pressure [hPa]: (left) ozone concentrations [ppbv] for the ozonesonde, the analyses using the top-20% and bottom-20% most informative observations (by DFS), all observations, and the free model run; (middle) relative difference [%]; (right) standard deviation [%].

{Sandu et al., SIUQ 2013}

Page 42:

Summary: Directions in DA

Robust DA: V. Rao*, A. Sandu, M. Ng, and E.D. Niño-Ruiz*: "Robust Data Assimilation Using L1 and Huber Norms." SIAM Journal on Scientific Computing, Vol. 39, Issue 3, pp. B548-B570, 2017.

Bayesian localization: A. Popov* and A. Sandu: "A Bayesian Approach to Multivariate Adaptive Localization in Ensemble-Based Data Assimilation with Time-Dependent Extensions." Nonlinear Processes in Geophysics, Vol. 26, pp. 109-122, 2019.

ML for localization: A. Moosavi*, A. Attia*, and A. Sandu: "Tuning Covariance Localization Using Machine Learning." Paper 394, ICCS 2019, Faro, Algarve, Portugal, June 2019.

ML for UQ: A. Moosavi*, V. Hebbur Venkata Subba Rao*, and A. Sandu: "A Learning-Based Approach for Uncertainty Analysis in Numerical Weather Prediction." Paper 445, ICCS 2019, Faro, Algarve, Portugal, June 2019.

Other: http://csl.cs.vt.edu

Page 43:


Computational Science Laboratory: “Compute the Future!”

Page 44:

Other Directions: Sampling the posterior distribution

Page 45:

Non-Gaussian data assimilation by directly sampling the posterior

- Data assimilation: Model + Prior + Observations (with their associated uncertainties) → best description of the state (ensemble and variational methods).

- Goal: an ensemble representation of the posterior PDF in the general non-Gaussian and nonlinear case.

Sample the posterior PDF:

- MCMC (the gold standard): popular and guaranteed to converge, but it may take a very long time!

- Accelerated MCMC: Hamiltonian Monte Carlo (HMC). Not entirely new!

- Recursively use HMC for filtering and smoothing.

Page 46:

Hamiltonian Monte Carlo

- The Hamiltonian:
  $H(p, x) = \frac{1}{2} p^T M^{-1} p - \log \pi(x) = \underbrace{\tfrac{1}{2} p^T M^{-1} p}_{\text{kinetic energy}} + \underbrace{\mathcal{J}(x)}_{\text{potential energy}}$

- The Hamiltonian dynamics (a symplectic integrator is needed):
  $\frac{dx}{dt} = \nabla_p H = M^{-1} p, \qquad \frac{dp}{dt} = -\nabla_x H = -\nabla_x \mathcal{J}(x)$

- The canonical PDF of $(p, x)$:
  $\propto \exp\left(-H(p, x)\right) = \exp\left(-\tfrac{1}{2} p^T M^{-1} p - \mathcal{J}(x)\right) = \exp\left(-\tfrac{1}{2} p^T M^{-1} p\right) \pi(x)$

- To draw samples $\{x^{(e)}\}_{e=1,2,\dots}$ from $\propto \pi(x)$:
  - treat $x$ as a position variable,
  - add a "momentum" $p \sim \mathcal{N}(0, M)$ and sample from the joint PDF of both variables,
  → generating a Markov chain with invariant distribution $\propto \exp(-H(p, x))$.
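A minimal sketch of a leapfrog (velocity Verlet) step for these dynamics, the simplest of the symplectic integrators listed on the next slide; `grad_J` and `Minv` are illustrative names for $\nabla_x \mathcal{J}$ and $M^{-1}$:

```python
import numpy as np

def leapfrog(x, p, grad_J, Minv, h, m):
    """Integrate dx/dt = M^{-1} p, dp/dt = -grad J(x) for m steps of size h
    with the velocity-Verlet (leapfrog) scheme, which is symplectic."""
    p = p - 0.5 * h * grad_J(x)          # initial half kick
    for _ in range(m):
        x = x + h * (Minv @ p)           # drift
        p = p - h * grad_J(x)            # full kick
    p = p + 0.5 * h * grad_J(x)          # correct the final kick to a half kick
    return x, p
```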

Page 47:

HMC Sampling Algorithm, to sample from $\propto \pi(x)$:

- Initialize the chain with $x_0$.
- For $k = 0, 1, \dots$
  1. Draw $p_k \sim \mathcal{N}(0, M)$.
  2. Use a symplectic integrator (Verlet, two-stage, three-stage, ...) to propose a new state: $(p^*, x^*) = \Phi_T(p_k, x_k)$.
  3. Acceptance probability: $a^{(k)} = 1 \wedge e^{-\Delta H}$, with $\Delta H = H(p^*, x^*) - H(p_k, x_k)$.
  4. Discard both $p^*$ and $p_k$.
  5. Set
     $x_{k+1} = \begin{cases} x^* & \text{with probability } a^{(k)}, \\ x_k & \text{with probability } 1 - a^{(k)}. \end{cases}$
- Start sampling after convergence (burn-in).
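A minimal sketch of the complete sampler, wrapping the leapfrog proposal from the previous sketch in the accept/reject step above; the step size, mass matrix, and chain lengths are illustrative:

```python
import numpy as np

def hmc_sample(x0, J, grad_J, M, h=0.01, m=10, n_samples=1000, burn_in=50, rng=None):
    """Hamiltonian Monte Carlo targeting pi(x) proportional to exp(-J(x))."""
    rng = np.random.default_rng() if rng is None else rng
    Minv = np.linalg.inv(M)
    L = np.linalg.cholesky(M)
    H = lambda p, x: 0.5 * p @ (Minv @ p) + J(x)        # the Hamiltonian
    x, samples = np.array(x0, dtype=float), []
    for k in range(n_samples + burn_in):
        p = L @ rng.standard_normal(x.size)             # draw p ~ N(0, M)
        x_new, p_new = leapfrog(x, p, grad_J, Minv, h, m)   # symplectic proposal
        dH = H(p_new, x_new) - H(p, x)
        if np.log(rng.random()) < -dH:                  # accept with prob 1 ^ exp(-dH)
            x = x_new
        if k >= burn_in:                                # start sampling after burn-in
            samples.append(x.copy())
    return np.array(samples)
```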

Page 48:

HMC sampling filter

- Bayes' theorem:
  $\mathcal{P}^a(x) = \mathcal{P}(x \mid y) = \frac{\mathcal{P}(y \mid x)\, \mathcal{P}^b(x)}{\mathcal{P}(y)} \propto \mathcal{P}(y \mid x)\, \mathcal{P}^b(x)$

- Gaussian mixture framework:
  $\mathcal{P}^b(x) \propto \sum_{i=1}^{N} \pi_i \exp\left(-\tfrac{1}{2}(x - x_i^b)^T B_i^{-1} (x - x_i^b)\right),$
  $\mathcal{P}(y \mid x) \propto \exp\left(-\tfrac{1}{2}(\mathcal{H}(x) - y)^T R^{-1} (\mathcal{H}(x) - y)\right).$

- Posterior PDF:
  $\mathcal{P}^a(x) \propto \underbrace{\exp\left(-\mathcal{J}(x)\right)}_{\pi(x)}, \qquad \mathcal{J}(x) = -\log \mathcal{P}(y \mid x) - \log \mathcal{P}^b(x).$
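A minimal sketch of the resulting potential $\mathcal{J}(x) = -\log \mathcal{P}(y \mid x) - \log \mathcal{P}^b(x)$ for this Gaussian-mixture prior and Gaussian likelihood; the shapes and the observation operator `Hop` are illustrative. This is the $\mathcal{J}$ handed to the HMC sampler above (its gradient must also be supplied, e.g., by automatic differentiation or finite differences):

```python
import numpy as np
from scipy.special import logsumexp

def make_potential(xb_members, B_inv_list, log_weights, Hop, y, R_inv):
    """Return J(x) = -log P(y|x) - log P^b(x) for a Gaussian-mixture prior."""
    def J(x):
        # log of the mixture prior: log sum_i pi_i exp(-0.5 (x-x_i^b)^T B_i^{-1} (x-x_i^b))
        terms = [lw - 0.5 * (x - xb) @ (Binv @ (x - xb))
                 for xb, Binv, lw in zip(xb_members, B_inv_list, log_weights)]
        log_prior = logsumexp(terms)
        d = Hop(x) - y
        log_like = -0.5 * d @ (R_inv @ d)
        return -(log_like + log_prior)
    return J
```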

Page 49:

HMC Sampling Filter: given the observation $y_k$ at time point $t_k$, one assimilation cycle consists of:

- Forecast: given an analysis ensemble $\{x^a_{k-1}(e)\}_{e=1,2,\dots,N}$ at time $t_{k-1}$:
  $x^b_k(e) = \mathcal{M}_{t_{k-1} \to t_k}\left(x^a_{k-1}(e)\right), \qquad e = 1, 2, \dots, N.$

- Analysis: use HMC to sample from $\propto \pi(x) = \exp(-\mathcal{J}(x))$:
  1. Set the elements of the Hamiltonian and the integrator:
     - the mass matrix $M$ (e.g., a constant diagonal, or the diagonal of $B_k^{-1}$),
     - the (artificial) time-step settings $T = hm$ of the symplectic integrator.
  2. Initialize the chain $x_0$ (e.g., with the forecast, an EnKF, or a 3D-Var analysis).
  3. Generate the Markov chain and sample $\{x^a_k(e)\}_{e=1,2,\dots,N}$ upon convergence.
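A minimal sketch of one forecast/analysis cycle of the sampling filter, reusing `make_potential` and `hmc_sample` from the previous sketches; the model propagator, the per-member covariance `B_inv`, the gradient builder, and the thinning are illustrative choices:

```python
import numpy as np

def hmc_filter_cycle(ensemble_prev, model, Hop, y, R_inv, B_inv, grad_builder,
                     M, h=0.01, m=10, burn_in=50):
    """One cycle: propagate the analysis ensemble, then HMC-sample the new posterior."""
    # Forecast: x^b_k(e) = M_{t_{k-1} -> t_k}(x^a_{k-1}(e))
    forecast = np.array([model(xe) for xe in ensemble_prev])
    N = forecast.shape[0]
    # Analysis: Gaussian-mixture prior centered on the forecast members, equal weights
    log_w = np.full(N, -np.log(N))
    J = make_potential(forecast, [B_inv] * N, log_w, Hop, y, R_inv)
    grad_J = grad_builder(J)                 # e.g., autodiff or finite differences
    x0 = forecast.mean(axis=0)               # initialize the chain at the forecast mean
    chain = hmc_sample(x0, J, grad_J, M, h=h, m=m, n_samples=10 * N, burn_in=burn_in)
    return chain[::10][:N]                   # keep every 10th state as an analysis member
```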

Page 50:

Results using Lorenz-96: discontinuous quadratic H

Figure: Results with the Lorenz-96 model and a discontinuous quadratic observation operator. Panels (a) and (d) show RMSE vs. time (forecast and IEnKF curves) for the two-stage and three-stage integrators, respectively; panels (b), (c), (e), and (f) correspond to components x1 and x2. Settings: h = 0.01, m = 10, 50 burn-in steps, 10 inter-chain steps, N = 30.

Page 51:

HMC sampling smoother results with Lorenz-96

Figure: RMSE vs. time (log scale) for the HMC smoother, 4D-Var, and the forecast: (a) linear observation operator; (b) quadratic observation operator. Only every second component is observed. Two-stage integrator, h = 0.01, m = 10, 30 burn-in steps, 10 inter-chain steps.
