
Page 1:

A few directions in data assimilation

Adrian Sandu

Computational Science LaboratoryComputer Science Department

Virginia Tech

“Compute the Future!”

Page 2:

Computational Science Laboratory: “Compute the Future!”

First IWDADM, Medellin/Baranquilla, 2019

Page 3:

Directions in DA

1. Robust DA
2. Bayesian localization
3. ML for localization
4. ML for UQ
5. Other

Page 4:

Direction 1: Robust data assimilation

Page 5:

Example: data assimilation of satellite observations in a global CTM improves global ozone predictions

TES ozone column observations

(NASA EOS Aura)

IONS-6 ozonesonde data for validation

Improved global ozone prediction, 1-15 Aug. 2007 {Singh, Sandu, et al, GMD 2010}

Figure: Past work: 3D/4D-Var systems for GEOS-Chem (NASA/Harvard), CMAQ (Environmental Protection Agency), STEM, partially WRF-Chem.

Page 6:

What happens when observations surprise the model?

- The presence of outliers in the data is a common occurrence.

- These outliers negatively affect the quality of the solution, so quality control becomes an important step.

- Data quality control that rejects observations on the basis of background-departure statistics leads to an inability to capture small scales in the analysis {Tavolato and Isaksen, QJRMS 2015}.

- Robust data assimilation is needed to overcome this issue.

Page 7:

Robust data assimilation

Figure: (a) Gaussian/Laplace/Huber PDFs; (b) L2/L1/Huber norms.

- Huber norm:
  $\|x\|_{\mathrm{HUB}} = \sum_\ell L_\tau(x_\ell), \qquad
  L_\tau(a) = \begin{cases} \frac{1}{2}a^2, & |a| \le \tau, \\ \tau\left(|a| - \frac{1}{2}\tau\right), & \text{otherwise.} \end{cases}$
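Spelled out in code, this is an elementwise switch between the quadratic and the linear branch. A minimal NumPy sketch (the constant τ/2 in the linear branch is chosen so the two branches match at |a| = τ; the exact constant is hard to read off the slide, so treat it as an assumption):

```python
import numpy as np

def huber_loss(a, tau):
    """Elementwise Huber loss: quadratic for |a| <= tau, linear beyond."""
    a = np.asarray(a, dtype=float)
    quad = 0.5 * a**2
    lin = tau * (np.abs(a) - 0.5 * tau)   # continuous match at |a| = tau
    return np.where(np.abs(a) <= tau, quad, lin)

def huber_norm(x, tau):
    """'Huber norm' of a vector: the sum of elementwise Huber losses."""
    return huber_loss(x, tau).sum()
```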

Page 8:

Motivation for robust data assimilation: resilience to observation outliers

1. Consider the following case:
   - Background: $x^b = 10$, $B = 1$.
   - Observations: $y_1 = 11$, $y_2 = 12$, $R = 1$.
   - Assuming Gaussian observation errors, the analysis reads:
     $x^a = \arg\min_x \|x - 10\|^2_{B^{-1}} + \|x - 11\|^2_{R^{-1}} + \|x - 12\|^2_{R^{-1}} \;\Rightarrow\; x^a = 11.$

2. With an observation outlier, $y_1 = 11$ and $y_2 = 20$: the Gaussian error assumption gives $x^a = 13.66$.

3. Assuming that:
   - the small-error distribution is Gaussian, and
   - the large-error distribution is Laplace,
   leads to a DA formulation in the Huber norm (here $\tau = 2$):
   $\arg\min_x \|x - 10\|^2_{B^{-1}} + \left\|R^{-1/2}(x - 11)\right\|_{\mathrm{HUB}} + \left\|R^{-1/2}(x - 20)\right\|_{\mathrm{HUB}} \;\Rightarrow\; x^a = 11.37.$
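This toy case is easy to check numerically. A minimal sketch with scipy.optimize (the scaling conventions here differ slightly from the slide's, so the robust analysis comes out near 11 rather than exactly the 11.37 quoted above; the qualitative behavior, insensitivity to the outlier, is the same):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def huber(a, tau):
    """Huber loss, same convention as the earlier sketch."""
    return 0.5 * a**2 if abs(a) <= tau else tau * (abs(a) - 0.5 * tau)

xb, B, R, tau = 10.0, 1.0, 1.0, 2.0

def gaussian_cost(x, obs):
    # background term plus Gaussian (L2) observation terms
    return (x - xb)**2 / B + sum((x - y)**2 / R for y in obs)

def robust_cost(x, obs):
    # background term plus Huber-norm observation terms
    return (x - xb)**2 / B + sum(huber((x - y) / np.sqrt(R), tau) for y in obs)

for obs in ([11.0, 12.0], [11.0, 20.0]):
    xg = minimize_scalar(gaussian_cost, args=(obs,)).x
    xr = minimize_scalar(robust_cost, args=(obs,)).x
    print(f"obs={obs}: Gaussian x_a = {xg:.2f}, Huber x_a = {xr:.2f}")
```

With the outlier pair [11, 20], the Gaussian analysis is pulled to about 13.67 while the Huber analysis stays close to 11.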

Page 9:

The need for a rigorous approach to solve Huber-norm data assimilation

Departure statistics for radiosonde temperatures are well described by a Huber-norm distribution {E. Holm, L. Isaksen, C. Tavolato, E. Andersson, ECMWF training course "Variational Quality Control", 2014}.

Current approaches only approximately solve the Huber-norm DA problem:

- {Tavolato and Isaksen, ECMWF letter 22, 2009; QJRMS 2015}: "Huberize the norm" in 4D-Var: first adjust the observation error covariances based on their background-departure statistics, then solve the traditional 4D-Var problem.

- {Roh, Szunyogh, et al., MWR 2013}: a "Huberized analysis" obtained by clipping the EnKF innovations.

Page 10:

Robust strong-constraint 4D-Var data assimilation

- The robust strong-constraint 4D-Var problem:
  $\min_{x_0} \mathcal{J}(x_0, z) := \frac{1}{2}\left\|x_0 - x_0^b\right\|^2_{B_0^{-1}} + \frac{1}{2}\sum_{i=1}^{N} \|z_i\|_{\mathrm{HUB}}$
  subject to:
  $z_i = R_i^{-1/2}\left[\mathcal{H}(x_i) - y_i\right], \quad i = 1, 2, \dots, N,$
  $x_i = \mathcal{M}_{i-1,i}(x_{i-1}), \quad i = 1, 2, \dots, N.$

- Can be solved by either approach:
  - ADMM, or
  - half-quadratic algorithms.

- Each inner iteration requires the solution of a new (modified) L2-4D-Var problem.

Page 11:

Traditional EnKF data assimilation

- Ensemble space at time $t_i$:
  $x_i = x_i^b + X_i^b w_i, \qquad X_i^b \in \mathbb{R}^{N_{\mathrm{var}} \times N_{\mathrm{ens}}}.$

- Traditional L2-EnKF: the analysis mean weights minimize {Hunt et al., 2007}:
  $w_i^a := \arg\min_w \mathcal{J}(w) := (N_{\mathrm{ens}} - 1)\|w\|_2^2 + \left\|\mathcal{H}(x_i^b + X_i^b w) - y_i\right\|^2_{R_i^{-1}}.$

- LETKF {Hunt et al., 2007} analysis ensemble weights (a NumPy sketch follows below):
  $S_i := \left((N_{\mathrm{ens}} - 1) I + (Y_i^b)^T R_i^{-1} Y_i^b\right)^{-1} = (N_{\mathrm{ens}} - 1)^{-1} W_i W_i^T,$
  $w_i^{a\langle\ell\rangle} = w_i^a + W_i(:, \ell), \qquad x_i^{a\langle\ell\rangle} = x_i^b + X_i^b w_i^{a\langle\ell\rangle}.$

- Perturbed-observations EnKF analysis ensemble weights:
  $w_i^{a\langle\ell\rangle} \approx S_i\left((N_{\mathrm{ens}} - 1)\, w_i^{b\langle\ell\rangle} + (Y_i^b)^T R_i^{-1}\left(y_i^{\langle\ell\rangle} - \mathcal{H}(x_i^b)\right)\right).$
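A minimal NumPy sketch of the LETKF weight computation above, without localization; `Yb` holds the observation-space ensemble anomalies $Y_i^b$, `d` the innovation, and all names are illustrative:

```python
import numpy as np

def letkf_weights(Yb, d, Rinv):
    """LETKF analysis weights (Hunt et al., 2007), unlocalized toy version.

    Yb   : (Nobs, Nens) observation-space ensemble anomalies
    d    : (Nobs,)      innovation y - H(mean of x^b)
    Rinv : (Nobs, Nobs) observation-error precision R^{-1}
    Returns the mean weights w^a and the ensemble weight matrix W.
    """
    Nens = Yb.shape[1]
    S = np.linalg.inv((Nens - 1) * np.eye(Nens) + Yb.T @ Rinv @ Yb)
    wa = S @ (Yb.T @ Rinv @ d)                       # analysis mean weights
    # symmetric square root with W @ W.T = (Nens - 1) * S
    vals, vecs = np.linalg.eigh((Nens - 1) * S)
    W = vecs @ np.diag(np.sqrt(np.maximum(vals, 0.0))) @ vecs.T
    return wa, W

# member update: x_a<l> = xb_mean + Xb @ (wa + W[:, l])
```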

Page 12:

Robust EnKF data assimilation I

1. The Huber-EnKF optimization problem in ensemble space:
   $\min_w \mathcal{J}(w, z_i) = (N_{\mathrm{ens}} - 1)\|w\|_2^2 + \|z_i\|_{\mathrm{HUB}}$
   subject to
   $z_i = R_i^{-1/2}\left[\mathcal{H}\left(x_i^b + X_i^b w\right) - y_i\right].$

2. The problem is solved iteratively by half-quadratic minimization. Let the solution at iteration $\{k\}$ be $w_i^{\{k\}}, u_i^{\{k\}}$. We proceed as follows.

3. Step 1: calculate (component-wise)
   $u_i^{\{k+1\}} = \sigma\left(R_i^{-1/2}\left[\mathcal{H}\left(x_i^b + X_i^b w_i^{\{k\}}\right) - y_i\right]\right), \qquad
   \sigma(a) = \begin{cases} 1, & |a| \le \tau; \\ \tau/|a|, & |a| > \tau. \end{cases}$

Page 13:

Robust EnKF data assimilation II

4. Step 2: update the state $x_i^{\{k+1\}} = x_i^b + X_i^b w_i^{\{k+1\}}$:
   $w_i^{\{k+1\}} = \arg\min_w (N_{\mathrm{ens}} - 1)\|w\|_2^2 + \left\|\mathcal{H}(x_i^b + X_i^b w) - y_i\right\|^2_{R_i^{\{k+1\}\,-1}}$
   where
   $R_i^{\{k+1\}\,-1} = R_i^{-1/2}\,\mathrm{diag}\!\left(u^{\{k\}}/2\right) R_i^{-1/2}.$
   Apply again the EnSRF with the modified observation covariance matrix (a sketch follows below).

5. Satisfactory convergence is reached after $M$ iterations. The analysis:
   $w_i^a = w_i^{\{M\}},$
   $S_i^{\{M\}} := \left((N_{\mathrm{ens}} - 1) I + (Y_i^b)^T R_i^{-1/2}\,\mathrm{diag}\!\left(u^{\{M\}}/2\right) R_i^{-1/2}\, Y_i^b\right)^{-1} = (N_{\mathrm{ens}} - 1)^{-1}\, W_i^{\{M\}} \left(W_i^{\{M\}}\right)^T,$
   $w_i^{a\langle\ell\rangle} = w_i^a + W_i^{\{M\}}(:, \ell).$

Page 14:

Numerical results with the Lorenz-96 model

Figure: 3D-Var results for the Lorenz-96 model. Panels: (a) observations with small random errors; (b) observations with outliers. Each panel shows RMSE (log scale) vs. time for the Forecast, L2, L1, Huber (half-quadratic), and Huber (shrinkage) methods. The frequency of observations is 0.1 time units. Erroneous observations occur every 0.2 time units. The Huber norm uses τ = 1.

Page 15:

Results with Shallow Water Equations on the sphere I

Figure: 4D-Var results for the shallow water model on the sphere. Panels: (a) observations with small random errors; (b) observations with outliers. Each panel shows RMSE vs. time for the Forecast, L2, and Huber (half-quadratic) methods. The Huber threshold is τ = 2.

Page 16:

Results with Shallow Water Equations on the sphere II

Figure: LETKF results for the shallow water model on the sphere. Panels: (a) observations with small random errors; (b) observations with outliers. Each panel shows RMSE vs. time for the Forecast, L2, L1, and Huber (half-quadratic) methods. The Huber threshold is τ = 1.

Page 17:

Direction 2: Adaptive localization via a Bayesian approach

Page 18:

A Bayesian Approach to Multivariate Adaptive Localization

- Assume that variables $x_i$ and $x_j$ have the localization radii $x_i \leftrightarrow r_i$, $x_j \leftrightarrow r_j$.

- The multivariate localization matrix accounts for the multiple radii:
  $\rho = \left[\, m\!\left(\ell(d(i,j)/r_i),\; \ell(d(j,i)/r_j)\right) \right]_{1 \le i,j \le n}.$

- The localization sub-matrix of state-space variables $x_i$ and $x_j$:
  $\begin{bmatrix} \ell(0) & \rho_{i,j} \\ \rho_{j,i} & \ell(0) \end{bmatrix}.$
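A minimal sketch of assembling such a multivariate localization matrix; the slide leaves the taper $\ell$ and the combination rule $m$ generic, so the Gaussian-shaped taper and the arithmetic mean used here are illustrative stand-ins only:

```python
import numpy as np

def ell(r):
    """Compactly supported taper ell(d/r): Gaussian shape, cut off at r = 2.
    (A stand-in for, e.g., a Gaspari-Cohn function; the slide leaves ell generic.)"""
    r = np.asarray(r, dtype=float)
    return np.where(r < 2.0, np.exp(-0.5 * r**2), 0.0)

def multivariate_localization(dist, radii, m=lambda a, b: 0.5 * (a + b)):
    """rho[i, j] = m( ell(d(i,j)/r_i), ell(d(j,i)/r_j) ).

    dist  : (n, n) symmetric pairwise distances d(i, j)
    radii : (n,)   per-variable localization radii r_i
    m     : symmetrizing combination; the arithmetic mean here is illustrative
    """
    a = ell(dist / radii[:, None])   # ell(d(i,j)/r_i), row-wise radii
    b = ell(dist / radii[None, :])   # ell(d(j,i)/r_j), column-wise radii
    return m(a, b)

# usage: apply rho as a Schur (elementwise) product with the sample covariance
```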

Page 19:

A Bayesian Approach to Adaptive Localization

- Analysis step, with $\upsilon$ denoting the localization and inflation parameters:
  $x^a = \mathcal{A}(x^b, y, \upsilon).$

- Assumption: the radii have univariate gamma background distributions:
  $\pi(\upsilon) = \prod_{j=1}^{g} \Gamma\!\left(\alpha^{(j)}, \beta^{(j)}\right).$

- Bayes' identity:
  $\pi(x, \upsilon \mid y) \propto \pi(y \mid x, \upsilon)\, \pi(x \mid \upsilon)\, \pi(\upsilon).$

- The negative log of the posterior probability:
  $\mathcal{J}(\upsilon) = \frac{1}{2}\left\|\mathcal{A}(x^b, y, \upsilon) - x^b\right\|^2_{P^b(\upsilon)^{-1}} + \frac{1}{2}\left\|y - H\,\mathcal{A}(x^b, y, \upsilon)\right\|^2_{R^{-1}} + \sum_{j=1}^{g}\left(\beta^{(j)}\upsilon^{(j)} - \left(\alpha^{(j)} - 1\right)\log \upsilon^{(j)}\right).$
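A minimal sketch of evaluating this negative log posterior for a candidate parameter vector $\upsilon$; the analysis operator `analysis` (standing in for $\mathcal{A}$), `Pb_inv`, and the linear observation operator `H` are illustrative placeholders:

```python
import numpy as np

def neg_log_posterior(upsilon, xb, y, H, R_inv, analysis, Pb_inv, alpha, beta):
    """J(upsilon): prior misfit + observation misfit + gamma-prior terms.

    analysis(xb, y, upsilon) -> x^a : placeholder for the EnKF analysis operator A
    Pb_inv(upsilon)          -> localized/inflated prior precision P_b(upsilon)^{-1}
    alpha, beta              : gamma hyper-parameters, one pair per upsilon^(j)
    """
    xa = analysis(xb, y, upsilon)
    dxb = xa - xb
    dy = y - H @ xa
    j_prior = 0.5 * dxb @ (Pb_inv(upsilon) @ dxb)
    j_obs = 0.5 * dy @ (R_inv @ dy)
    j_gamma = np.sum(beta * upsilon - (alpha - 1.0) * np.log(upsilon))
    return j_prior + j_obs + j_gamma
```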

Page 20:

Quasi-geostrophic Model Adaptive Localization

Figure: RMSE vs. inflation factor for a constant localization radius and for the adaptive (Bayesian) localization. The constant localization radius varies in increments of 5 over the range [5, 45]; only the best results are plotted, and these are used as the mean seeds for the adaptive algorithm. The adaptive Var(υ) takes one of the values 1/2, 1, 2, and 4.

Page 21:

Quasi-geostrophic Model Adaptive Localization

Figure: QGSO sample radii (localization radius vs. assimilation step, over 3000 steps), for α = 1.08, υ = 25, and Var(υ) = 4.

Page 22:

Direction 3: Adaptive localization via an ML approach

Page 23:

Machine learning algorithms in aid of data assimilation

- The Ensemble Kalman Filter (EnKF) provides a practical implementation of the statistical solution of the data assimilation problem.

- The EnKF is prone to sampling errors: typically 30-100 members for model state-space dimensions of $10^9$-$10^{12}$.

- Sampling errors manifest themselves as long-range spurious correlations and considerably degrade performance.

- The quality of the resulting analysis and forecast is greatly influenced by the choice of the localization function parameters, e.g., the radius of influence. The localization radius is generally tuned empirically to yield desirable results.

- Here we employ ML to tune the localization function.

Page 24:

Approach to learn good localization radii

1. Identify important features of the model results and data sets: means of a subset of state variables that are only weakly correlated, and blocks of the correlation matrix for variables that are highly correlated.

2. Define two complementary decision criteria:
   - the root mean squared error (RMSE) of the analysis, and
   - the uniformity of the rank histogram (Anderson, 1996).

3. Use a random forest approach.

4. The first learning algorithm uses the same (scalar) localization radius for all variables of the model. The value of this radius changes adaptively from one assimilation cycle to the next (see the sketch after this list).

5. The second proposed learning algorithm provides a different localization radius for each of the state variables, and these radii change at each assimilation cycle.
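A minimal scikit-learn sketch of the first variant, a single scalar radius predicted at each cycle; the features and training targets below are random placeholders rather than the actual features listed above:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Illustrative training set: one row of features per past assimilation cycle
# (e.g., summary statistics of the forecast ensemble), and the radius that gave
# the best analysis RMSE / rank-histogram uniformity in that cycle.
X_train = rng.normal(size=(500, 12))          # placeholder features
r_train = rng.uniform(5.0, 45.0, size=500)    # placeholder "best" radii

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, r_train)

# At the start of a new cycle: predict the localization radius from the current
# forecast-ensemble features, then run the (L)EnKF analysis with that radius.
x_new = rng.normal(size=(1, 12))
radius = float(model.predict(x_new)[0])
```

The second variant is analogous, with one predicted radius per state variable (a multi-output regression).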

Page 25:

Application of ML-based localization to data assimilation with a quasi-geostrophic model I

(a) Reference (b) Ensemble mean

Figure: Data assimilation with the QG model. Stream function.

Page 26:

Application of ML-based localization to data assimilation with a quasi-geostrophic model II

(a) Training phase (b) Testing phase

Figure: Data assimilation with the QG model. The analysis results with adaptive-in-time localization outperform those obtained with the hand-tuned radius.

Page 27:

Application of ML-based localization to data assimilation with a quasi-geostrophic model III

Figure: Data assimilation with the QG model. The evolution of the localization radius in time over all 100 assimilation cycles. The adaptive algorithm changes the radius considerably over the course of the simulation.

Page 28:

Machine learning algorithms in aid of uncertainty analysis in NWP

- Numerical weather prediction models incorporate a variety of physical processes, each described by multiple alternative physical schemes with specific parameters.

- The selection of the physical schemes and parameters during model configuration can significantly impact the accuracy of model forecasts. No combination works best under all conditions.

- We seek to understand the interplay between the choice of physics and the accuracy of the resulting forecasts under different conditions.

Page 29:

Formulation of the uncertainty problems

1. Estimate the systematic model error in certain quantities of interest at future times, and use this information to improve the WRF forecast:
   $\phi_{\mathrm{error}}\left(\theta, \Lambda(\Delta_{\mathrm{past}})\right) \approx \Lambda(\Delta_{\mathrm{forecast}}).$

2. Identify the specific physical processes that contribute most to the forecast uncertainty in the quantity of interest under specified meteorological conditions:
   $\phi_{\mathrm{physics}}\left(\Lambda(\Delta_{\mathrm{past}})\right) \approx \theta.$

The mappings are "learned" using artificial neural networks (ANN) and random forests (RF) via Scikit-learn; a sketch follows below.
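A minimal sketch of the two learned mappings using scikit-learn, which the slide names as the toolkit used; the arrays, encodings, and model sizes are illustrative placeholders:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_runs = 400

theta = rng.integers(0, 3, size=(n_runs, 4))   # placeholder physics-scheme choices
past_err = rng.normal(size=(n_runs, 10))       # placeholder Lambda(Delta_past) features
fcst_err = rng.normal(size=(n_runs, 10))       # placeholder Lambda(Delta_forecast)

# Problem 1: phi_error(theta, Lambda(Delta_past)) ~ Lambda(Delta_forecast)
phi_error = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
phi_error.fit(np.hstack([theta, past_err]), fcst_err)

# Problem 2: phi_physics(Lambda(Delta_past)) ~ theta
# (one classifier per physical process; only the first is shown for brevity)
phi_physics = RandomForestClassifier(n_estimators=200, random_state=0)
phi_physics.fit(past_err, theta[:, 0])
```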

Page 30:

Direction 4: ML for uncertainty quantification in atmospheric modeling

Page 31:

Case study: precipitation forecasts with WRF

(a) NCEP analysis at 12PM provides a proxy for truth

(b) WRF forecast at 12PM

Figure: The analysis and the WRF forecast for the simulation time 12PM on 5/1/2017. Accumulated precipitation in millimeters.

Page 32:

Problem one: predicting precipitation forecast errors (over Virginia) based on physics configuration

(a) NCEP analysis (b) Original WRF prediction

Figure: WRF prediction and NCEP analysis at 6PM on 5/1/2017. Zoom-in panels show the predictions over Virginia.

Page 33:

Problem one: predicting precipitation forecast errors (over Virginia) based on physics configuration

(a) Discrepancy between the original WRF forecast and the NCEP analysis

(b) Discrepancy between the corrected WRF forecast and the NCEP analysis

Figure: Discrepancy between the WRF forecasts and the NCEP analysis over Virginia at 6PM on 5/1/2017. The error correction brings the forecast closer to the analysis.

Page 34:

Problem two: physical processes that contribute most to WRF precipitation forecast uncertainty

Figure: Frequency of change in the physics with respect to a change in the input data from $\Delta^{\mathrm{test}}_{t=12\mathrm{PM}}$ to $\Delta^{\mathrm{test}}_{t=12\mathrm{PM}}/2$.

Page 35:

Other Directions

Page 36:

Data assimilation with WRFDA (04/27/2011 18:00 to 04/28/2011 00:00; CONUS, 60 km)


Optimal initial V (first level); optimal initial temperature (first level)

{Rao, Stefanescu, and Sandu, 2015}

Page 37:

Atmospheric motion vector wind data from geostationary satellite (GEOAMV): V-observations


Data errors

Contribution of data errors to the error in the analysis QoI

Data error impact factors

{Rao, Stefanescu, and Sandu, 2015}

Page 38:

Model errors introduced by excluding the microphysics package. Data for 11 PM on 27th April, 2011


Temperature model errors (first level)

Impact factor of temperature model errors on the analysis QoI

{Rao, Stefanescu, and Sandu, 2015}

Page 39:

Example: data assimilation of satellite observations in global CTM improves global ozone predictions


TES ozone column observations

(NASA EOS Aura)

IONS-6 ozonesonde data for validation

Improved global ozone prediction, 1-15 Aug. 2007 {Singh, Sandu, et al, GMD 2010}

Page 40:

Estimating the information content of data. Top and bottom 20% of data points by DFS (degrees of freedom for signal).


{Sandu et al., SIUQ 2013}

Page 41:

Assimilation of the top 20% most informative data points produces analyses similar to using all data


Figure: Ozone profiles vs. pressure [hPa]: (left) ozone concentrations [ppbv] for the ozonesonde, the analyses using the top-20% and bottom-20% most informative observations (by DFS), all observations, and the free model run; (middle) relative difference [%]; (right) standard deviation [%].

{Sandu et al., SIUQ 2013}

Page 42:

Summary: Directions in DA

Robust DA: V. Rao*, A. Sandu, M. Ng, and E.D. Niño-Ruiz*: "Robust Data Assimilation Using L1 and Huber Norms." SIAM Journal on Scientific Computing, Vol. 39, Issue 3, pp. B548-B570, 2017.

Bayesian localization: A. Popov* and A. Sandu: "A Bayesian Approach to Multivariate Adaptive Localization in Ensemble-Based Data Assimilation with Time-Dependent Extensions." Nonlinear Processes in Geophysics, Vol. 26, pp. 109-122, 2019.

ML for localization: A. Moosavi*, A. Attia*, and A. Sandu: "Tuning Covariance Localization Using Machine Learning." Paper 394, ICCS 2019, Faro, Algarve, Portugal, June 2019.

ML for UQ: A. Moosavi*, V. Hebbur Venkata Subba Rao*, and A. Sandu: "A Learning-Based Approach for Uncertainty Analysis in Numerical Weather Prediction." Paper 445, ICCS 2019, Faro, Algarve, Portugal, June 2019.

Other: http://csl.cs.vt.edu

Page 43:


Computational Science Laboratory: “Compute the Future!”

Page 44:

Other Directions: Sampling the posterior distribution

Page 45:

Non-Gaussian data assimilation by directly sampling the posterior

- Data assimilation: Model + Prior + Observations (with their associated uncertainties) → best description of the state (ensemble and variational methods).

- Goal: an ensemble representation of the posterior PDF in the general non-Gaussian and nonlinear case.

Sample the posterior PDF:

- MCMC (the gold standard): popular and guaranteed to converge, but it may take a very long time!

- Accelerated MCMC: Hamiltonian Monte Carlo (HMC). Not entirely new!

- Recursively use HMC for filtering and smoothing.

Page 46:

Hamiltonian Monte Carlo

- The Hamiltonian:
  $H(p, x) = \frac{1}{2} p^T M^{-1} p - \log \pi(x) = \underbrace{\tfrac{1}{2} p^T M^{-1} p}_{\text{kinetic energy}} + \underbrace{\mathcal{J}(x)}_{\text{potential energy}}$

- The Hamiltonian dynamics (a symplectic integrator is needed):
  $\frac{dx}{dt} = \nabla_p H = M^{-1} p, \qquad \frac{dp}{dt} = -\nabla_x H = -\nabla_x \mathcal{J}(x)$

- The canonical PDF of $(p, x)$:
  $\propto \exp\left(-H(p, x)\right) = \exp\left(-\tfrac{1}{2} p^T M^{-1} p - \mathcal{J}(x)\right) = \exp\left(-\tfrac{1}{2} p^T M^{-1} p\right) \pi(x)$

- To draw samples $\{x^{(e)}\}_{e=1,2,\dots}$ from $\propto \pi(x)$:
  - treat $x$ as a position variable,
  - add a "momentum" $p \sim \mathcal{N}(0, M)$ and sample from the joint PDF of both variables,
  → generating a Markov chain with invariant distribution $\propto \exp(-H(p, x))$.
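A minimal sketch of a leapfrog (velocity Verlet) step for these dynamics, the simplest of the symplectic integrators listed on the next slide; `grad_J` and `Minv` are illustrative names for $\nabla_x \mathcal{J}$ and $M^{-1}$:

```python
import numpy as np

def leapfrog(x, p, grad_J, Minv, h, m):
    """Integrate dx/dt = M^{-1} p, dp/dt = -grad J(x) for m steps of size h
    with the velocity-Verlet (leapfrog) scheme, which is symplectic."""
    p = p - 0.5 * h * grad_J(x)          # initial half kick
    for _ in range(m):
        x = x + h * (Minv @ p)           # drift
        p = p - h * grad_J(x)            # full kick
    p = p + 0.5 * h * grad_J(x)          # correct the final kick to a half kick
    return x, p
```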

Page 47:

HMC Sampling Algorithm, to sample from $\propto \pi(x)$:

- Initialize the chain with $x_0$.
- For $k = 0, 1, \dots$
  1. Draw $p_k \sim \mathcal{N}(0, M)$.
  2. Use a symplectic integrator (Verlet, two-stage, three-stage, ...) to propose a new state: $(p^*, x^*) = \Phi_T(p_k, x_k)$.
  3. Acceptance probability: $a^{(k)} = 1 \wedge e^{-\Delta H}$, with $\Delta H = H(p^*, x^*) - H(p_k, x_k)$.
  4. Discard both $p^*$ and $p_k$.
  5. Set
     $x_{k+1} = \begin{cases} x^* & \text{with probability } a^{(k)}, \\ x_k & \text{with probability } 1 - a^{(k)}. \end{cases}$
- Start sampling after convergence (burn-in).
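A minimal sketch of the complete sampler, wrapping the leapfrog proposal from the previous sketch in the accept/reject step above; the step size, mass matrix, and chain lengths are illustrative:

```python
import numpy as np

def hmc_sample(x0, J, grad_J, M, h=0.01, m=10, n_samples=1000, burn_in=50, rng=None):
    """Hamiltonian Monte Carlo targeting pi(x) proportional to exp(-J(x))."""
    rng = np.random.default_rng() if rng is None else rng
    Minv = np.linalg.inv(M)
    L = np.linalg.cholesky(M)
    H = lambda p, x: 0.5 * p @ (Minv @ p) + J(x)        # the Hamiltonian
    x, samples = np.array(x0, dtype=float), []
    for k in range(n_samples + burn_in):
        p = L @ rng.standard_normal(x.size)             # draw p ~ N(0, M)
        x_new, p_new = leapfrog(x, p, grad_J, Minv, h, m)   # symplectic proposal
        dH = H(p_new, x_new) - H(p, x)
        if np.log(rng.random()) < -dH:                  # accept with prob 1 ^ exp(-dH)
            x = x_new
        if k >= burn_in:                                # start sampling after burn-in
            samples.append(x.copy())
    return np.array(samples)
```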

Page 48:

HMC sampling filter

- Bayes' theorem:
  $\mathcal{P}^a(x) = \mathcal{P}(x \mid y) = \frac{\mathcal{P}(y \mid x)\, \mathcal{P}^b(x)}{\mathcal{P}(y)} \propto \mathcal{P}(y \mid x)\, \mathcal{P}^b(x)$

- Gaussian mixture framework:
  $\mathcal{P}^b(x) \propto \sum_{i=1}^{N} \pi_i \exp\left(-\tfrac{1}{2}(x - x_i^b)^T B_i^{-1} (x - x_i^b)\right),$
  $\mathcal{P}(y \mid x) \propto \exp\left(-\tfrac{1}{2}(\mathcal{H}(x) - y)^T R^{-1} (\mathcal{H}(x) - y)\right).$

- Posterior PDF:
  $\mathcal{P}^a(x) \propto \underbrace{\exp\left(-\mathcal{J}(x)\right)}_{\pi(x)}, \qquad \mathcal{J}(x) = -\log \mathcal{P}(y \mid x) - \log \mathcal{P}^b(x).$
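A minimal sketch of the resulting potential $\mathcal{J}(x) = -\log \mathcal{P}(y \mid x) - \log \mathcal{P}^b(x)$ for this Gaussian-mixture prior and Gaussian likelihood; the shapes and the observation operator `Hop` are illustrative. This is the $\mathcal{J}$ handed to the HMC sampler above (its gradient must also be supplied, e.g., by automatic differentiation or finite differences):

```python
import numpy as np
from scipy.special import logsumexp

def make_potential(xb_members, B_inv_list, log_weights, Hop, y, R_inv):
    """Return J(x) = -log P(y|x) - log P^b(x) for a Gaussian-mixture prior."""
    def J(x):
        # log of the mixture prior: log sum_i pi_i exp(-0.5 (x-x_i^b)^T B_i^{-1} (x-x_i^b))
        terms = [lw - 0.5 * (x - xb) @ (Binv @ (x - xb))
                 for xb, Binv, lw in zip(xb_members, B_inv_list, log_weights)]
        log_prior = logsumexp(terms)
        d = Hop(x) - y
        log_like = -0.5 * d @ (R_inv @ d)
        return -(log_like + log_prior)
    return J
```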

Page 49:

HMC Sampling Filter: given the observation $y_k$ at time point $t_k$, one assimilation cycle consists of:

- Forecast: given an analysis ensemble $\{x^a_{k-1}(e)\}_{e=1,2,\dots,N}$ at time $t_{k-1}$:
  $x^b_k(e) = \mathcal{M}_{t_{k-1} \to t_k}\left(x^a_{k-1}(e)\right), \qquad e = 1, 2, \dots, N.$

- Analysis: use HMC to sample from $\propto \pi(x) = \exp(-\mathcal{J}(x))$:
  1. Set the elements of the Hamiltonian and the integrator:
     - the mass matrix $M$ (e.g., a constant diagonal, or the diagonal of $B_k^{-1}$),
     - the (artificial) time-step settings $T = hm$ of the symplectic integrator.
  2. Initialize the chain $x_0$ (e.g., with the forecast, an EnKF, or a 3D-Var analysis).
  3. Generate the Markov chain and sample $\{x^a_k(e)\}_{e=1,2,\dots,N}$ upon convergence.
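A minimal sketch of one forecast/analysis cycle of the sampling filter, reusing `make_potential` and `hmc_sample` from the previous sketches; the model propagator, the per-member covariance `B_inv`, the gradient builder, and the thinning are illustrative choices:

```python
import numpy as np

def hmc_filter_cycle(ensemble_prev, model, Hop, y, R_inv, B_inv, grad_builder,
                     M, h=0.01, m=10, burn_in=50):
    """One cycle: propagate the analysis ensemble, then HMC-sample the new posterior."""
    # Forecast: x^b_k(e) = M_{t_{k-1} -> t_k}(x^a_{k-1}(e))
    forecast = np.array([model(xe) for xe in ensemble_prev])
    N = forecast.shape[0]
    # Analysis: Gaussian-mixture prior centered on the forecast members, equal weights
    log_w = np.full(N, -np.log(N))
    J = make_potential(forecast, [B_inv] * N, log_w, Hop, y, R_inv)
    grad_J = grad_builder(J)                 # e.g., autodiff or finite differences
    x0 = forecast.mean(axis=0)               # initialize the chain at the forecast mean
    chain = hmc_sample(x0, J, grad_J, M, h=h, m=m, n_samples=10 * N, burn_in=burn_in)
    return chain[::10][:N]                   # keep every 10th state as an analysis member
```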

Page 50:

Results using Lorenz-96: discontinuous quadratic H

Figure: Results with the Lorenz-96 model and a discontinuous quadratic observation operator. Panels (a) and (d) show RMSE vs. time (forecast and IEnKF curves) for the two-stage and three-stage integrators, respectively; panels (b), (c), (e), and (f) correspond to components x1 and x2. Settings: h = 0.01, m = 10, 50 burn-in steps, 10 inter-chain steps, N = 30.

Page 51:

HMC sampling smoother results with Lorenz-96

Figure: RMSE vs. time (log scale) for the HMC smoother, 4D-Var, and the forecast: (a) linear observation operator; (b) quadratic observation operator. Only every second component is observed. Two-stage integrator, h = 0.01, m = 10, 30 burn-in steps, 10 inter-chain steps.
