
EnKF Overview and Theory

Jeff Whitaker [email protected]


The Numerical Weather Prediction Process

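[Diagram: the analysis-forecast cycle. Data assimilation combines the 00 UTC obs with a forecast to produce an analysis; the forecast model advances that analysis to give the forecast that is combined with the 06 UTC obs in the next data assimilation step, which produces the next analysis and forecast.]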

• Analyses and forecasts become more accurate when:

– Observations, forecast model and/or data assimilation components improve.

– Forecast model carries information from past observations.

The Data Assimilation Process

Background forecast $x^b$ with uncertainty $P^b$

Obs $y^o$ with uncertainty $R$

Bayes' theorem + Gaussian assumption (Kalman filter):

$x^a = x^b + K(y^o - Hx^b)$

where $K = P^b H^T (HP^bH^T + R)^{-1}$

$P^a = (I - KH)P^b$

Forecast model: $x^a \rightarrow x^b$, $P^a \rightarrow P^b$

Cycling xa and xb easy, but Pb and Pa are huge matrices!
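The Kalman filter analysis step above can be sketched in a few lines of NumPy. This is a minimal illustration on a made-up two-variable state, not an operational implementation; the function name `kalman_update` and all numbers are assumptions chosen for the example.

```python
import numpy as np

def kalman_update(xb, Pb, yo, H, R):
    """One Kalman filter analysis step for a linear observation operator H."""
    S = H @ Pb @ H.T + R                    # innovation covariance H Pb H^T + R
    K = np.linalg.solve(S, H @ Pb).T        # K = Pb H^T S^-1 (Pb and S are symmetric)
    xa = xb + K @ (yo - H @ xb)             # analysis mean
    Pa = (np.eye(len(xb)) - K @ H) @ Pb     # analysis error covariance
    return xa, Pa

# Toy two-variable state (illustrative values); only the first variable is observed.
xb = np.array([1.0, 2.0])
Pb = np.array([[1.0, 0.5],
               [0.5, 1.0]])
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
yo = np.array([2.0])

xa, Pa = kalman_update(xb, Pb, yo, H, R)
print(xa)
print(Pa)
```

Note that the second, unobserved variable is also adjusted, because Pb contains a nonzero covariance between the two variables.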

The EnKF approach

• Instead of evolving the full error covariance matrix (Pa), evolve a random sample (via an ensemble forecast).

• In the DA step, Pb is a sample estimate from the forecast ensemble, and each ensemble member is updated individually.
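As a rough sketch of how Pb can be estimated from a forecast ensemble: subtract the ensemble mean and form the sample covariance of the perturbations. The state size, ensemble size and "true" covariance below are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_state, n_ens = 3, 500            # illustrative sizes only

# Hypothetical "true" background error covariance used to draw the ensemble.
P_true = np.array([[1.0, 0.8, 0.2],
                   [0.8, 1.0, 0.5],
                   [0.2, 0.5, 1.0]])
ens = rng.multivariate_normal(np.zeros(n_state), P_true, size=n_ens).T  # (n_state, n_ens)

x_mean = ens.mean(axis=1, keepdims=True)    # ensemble mean
Xb = ens - x_mean                           # ensemble perturbations
Pb_sample = Xb @ Xb.T / (n_ens - 1)         # sample estimate of Pb

print(np.round(Pb_sample, 2))
```

With 500 members the estimate is close to `P_true`; with the ensemble sizes affordable in practice it is much noisier.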

The EnKF DA Process

Ensemble of background forecasts xb

Obs yo with uncertainty R

EnKF: estimate $P^b$ from the ensemble, then update every member with

$x^a = x^b + K(y^o - Hx^b)$

where $K = P^b H^T (HP^bH^T + R)^{-1}$

Forecast model: $x^a \rightarrow x^b$ for each member

Cycle every ensemble member instead of propagating Pa
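The cycle can be illustrated with a toy example. Everything here is an assumption made for the sketch: the "forecast model" is a 2-D rotation, both state variables are observed, and each member is updated against an observation perturbed with N(0,R) noise (one common way to keep the ensemble spread consistent). No inflation or localization is applied.

```python
import numpy as np

rng = np.random.default_rng(1)
n_ens, n_cycles = 20, 20
theta = 0.3
M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # toy linear forecast model (a rotation)
H = np.eye(2)                                     # observe both state variables
R = 0.25 * np.eye(2)                              # observation error covariance

truth = np.array([1.0, 0.0])
ens = truth[:, None] + rng.standard_normal((2, n_ens))   # initial ensemble
initial_spread = ens.std(axis=1, ddof=1).mean()

for _ in range(n_cycles):
    # Forecast step: advance the truth and every member with the model.
    truth = M @ truth
    ens = M @ ens
    # Simulate an observation of the truth.
    yo = H @ truth + rng.multivariate_normal(np.zeros(2), R)
    # DA step: estimate Pb from the ensemble, then update each member.
    X = ens - ens.mean(axis=1, keepdims=True)
    Pb = X @ X.T / (n_ens - 1)
    K = Pb @ H.T @ np.linalg.inv(H @ Pb @ H.T + R)
    obs_pert = rng.multivariate_normal(np.zeros(2), R, size=n_ens).T
    ens = ens + K @ (yo[:, None] + obs_pert - H @ ens)

final_spread = ens.std(axis=1, ddof=1).mean()
print(initial_spread, final_spread)
```

Only the members are carried from cycle to cycle; Pb is recomputed from them at each analysis time and Pa is never stored.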

Two categories of EnKF

• ‘stochastic’ EnKF (original formulation by Houtekamer and Mitchell, 1998 MWR) treats obs as ensemble by adding N(0,R) noise.

– Every member updated with the same familiar KF equation (simple!)

– Used by Environment Canada.

• ‘deterministic’ EnKF (ETKF, Bishop et al. 2001 MWR; EnSRF, Whitaker and Hamill 2002 MWR) avoids perturbing the obs by constructing analysis perturbations so that a Pa consistent with the KF is obtained.

– Mean updated with Kalman update equation.

– Perturbations updated differently.

– More accurate than stochastic approach for small ensembles.
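A small scalar sketch of the difference between the two categories, assuming a single state variable observed directly (H = 1) and a 10-member ensemble. The deterministic branch uses the single-observation EnSRF form, in which perturbations are updated with a reduced gain alpha*K; all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_ens, r = 10, 1.0                        # small ensemble, obs error variance
xb = rng.standard_normal(n_ens) + 3.0     # scalar background ensemble, H = 1
yo = 4.0

xb_mean = xb.mean()
xb_pert = xb - xb_mean
pb = xb_pert @ xb_pert / (n_ens - 1)      # sample background variance
k = pb / (pb + r)                         # scalar Kalman gain
pa_kf = (1.0 - k) * pb                    # analysis variance prescribed by the KF

# Stochastic: every member sees the ob plus its own N(0, r) perturbation.
eps = rng.normal(0.0, np.sqrt(r), n_ens)
eps -= eps.mean()                         # centre the perturbations
xa_stoch = xb + k * (yo + eps - xb)

# Deterministic: mean updated with K, perturbations with the reduced gain alpha*K.
alpha = 1.0 / (1.0 + np.sqrt(r / (pb + r)))
xa_mean = xb_mean + k * (yo - xb_mean)
xa_det = xa_mean + (1.0 - alpha * k) * xb_pert

pa_stoch = xa_stoch.var(ddof=1)
pa_det = xa_det.var(ddof=1)
print(pa_kf, pa_det, pa_stoch)
```

The deterministic analysis variance reproduces `pa_kf` exactly, while the stochastic one carries extra sampling noise from the observation perturbations, which matters most when the ensemble is small.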

Computational shortcuts in EnKF: (1) Simplifying Kalman gain calculation

The key here is that the huge matrix Pb is never explicitly formed.
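One common way to do this, sketched here under the assumption of a linear H, is to build the two products that appear in the gain, Pb H^T and H Pb H^T, directly from ensemble perturbations in state space and observation space. The sizes and the observing network below are made up; the full Pb is formed only to check the result.

```python
import numpy as np

rng = np.random.default_rng(3)
n_state, n_obs, n_ens = 200, 5, 20              # illustrative sizes
ens = rng.standard_normal((n_state, n_ens))

# Hypothetical observing network: every 40th state element is observed.
H = np.zeros((n_obs, n_state))
H[np.arange(n_obs), np.arange(0, n_state, n_state // n_obs)] = 1.0
R = 0.5 * np.eye(n_obs)

X = ens - ens.mean(axis=1, keepdims=True)       # state-space perturbations
Y = H @ ens                                     # ensemble in observation space
Y = Y - Y.mean(axis=1, keepdims=True)           # observation-space perturbations

PbHT = X @ Y.T / (n_ens - 1)                    # n_state x n_obs, no n_state^2 matrix
HPbHT = Y @ Y.T / (n_ens - 1)                   # n_obs x n_obs
K = PbHT @ np.linalg.inv(HPbHT + R)

# Reference calculation that does form Pb (feasible only for a toy problem).
Pb = X @ X.T / (n_ens - 1)
K_ref = Pb @ H.T @ np.linalg.inv(H @ Pb @ H.T + R)
print(np.abs(K - K_ref).max())
```

The largest array needed is n_state by n_obs rather than n_state by n_state.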

Computational shortcuts in EnKF: (2) serial processing of observations (requires observation error covariance R to be diagonal)

[Diagram: two ways to assimilate two observations.
Method 1: background forecasts + observations 1 and 2 → EnKF → analyses.
Method 2: background forecasts + observation 1 → EnKF → analyses after obs 1; analyses after obs 1 + observation 2 → EnKF → analyses.]
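A sketch of serial processing with a deterministic (EnSRF-style) update, assuming a linear H and diagonal R. Each observation is assimilated on its own, so only scalars and vectors are needed, and the ensemble left behind by one observation is the background for the next. The function name `serial_ensrf` and the test problem are invented for illustration, and localization is omitted for brevity.

```python
import numpy as np

def serial_ensrf(ens, yo, H, r_diag):
    """Assimilate observations one at a time; ens has shape (n_state, n_ens)."""
    ens = ens.copy()
    n_ens = ens.shape[1]
    for j in range(len(yo)):
        xmean = ens.mean(axis=1)
        X = ens - xmean[:, None]                 # state perturbations
        hx = H[j] @ ens                          # ensemble values of ob j
        hx_mean = hx.mean()
        hxp = hx - hx_mean                       # ob-space perturbations
        hpbht = hxp @ hxp / (n_ens - 1)          # scalar H Pb H^T
        pbht = X @ hxp / (n_ens - 1)             # vector Pb H^T
        K = pbht / (hpbht + r_diag[j])           # Kalman gain for this ob
        alpha = 1.0 / (1.0 + np.sqrt(r_diag[j] / (hpbht + r_diag[j])))
        xmean = xmean + K * (yo[j] - hx_mean)    # mean gets the full gain
        X = X - alpha * np.outer(K, hxp)         # perturbations get the reduced gain
        ens = xmean[:, None] + X
    return ens

rng = np.random.default_rng(9)
n_state, n_ens = 6, 8
ens_b = rng.standard_normal((n_state, n_ens))
H = np.zeros((2, n_state))
H[0, 1] = 1.0                                    # ob 1 observes state element 1
H[1, 4] = 1.0                                    # ob 2 observes state element 4
r_diag = np.array([0.5, 2.0])
yo = np.array([0.3, -1.0])

ens_a = serial_ensrf(ens_b, yo, H, r_diag)
print(ens_a.mean(axis=1))
```

Without localization, the analysis mean and sample covariance from this loop match a single batch update that uses both observations at once.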


NOAA EnKF implements two algorithms

• Serial EnKF (observations processed one at a time)

– Both stochastic and deterministic options available.

• Local Ensemble Transform Kalman Filter (LETKF)

– Update computed with all obs at once, but matrices are kept small by updating each grid point independently using only nearby observations.

LETKF Algorithm
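The algorithm details did not survive on this slide. As a hedged sketch only, the code below follows the commonly published ensemble-transform form: for each grid point, select nearby obs, solve a small k x k problem in ensemble space (k = ensemble size), and apply the resulting weights to that grid point's perturbations. The function names, the 1-D grid, the hard distance cutoff and all numbers are assumptions for illustration; a distance-dependent taper on the observation weights is omitted.

```python
import numpy as np

def local_weights(Yb, d, r_diag):
    """Ensemble-space weights for one local region.
    Yb: (n_obs_local, k) ob-space perturbations; d: yo minus mean of H(xb);
    r_diag: ob error variances. Column m of the result holds member m's weights."""
    k = Yb.shape[1]
    C = Yb.T / r_diag                                  # Yb^T R^-1
    evals, evecs = np.linalg.eigh((k - 1) * np.eye(k) + C @ Yb)
    Pa_tilde = (evecs / evals) @ evecs.T               # analysis covariance in ensemble space
    W = (evecs * np.sqrt((k - 1) / evals)) @ evecs.T   # symmetric sqrt of (k-1) Pa_tilde
    w_mean = Pa_tilde @ C @ d                          # weights for the mean update
    return W + w_mean[:, None]

def letkf(ens, yo, H, r_diag, grid_pos, ob_pos, radius):
    xmean = ens.mean(axis=1)
    Xb = ens - xmean[:, None]
    hx = H @ ens
    hmean = hx.mean(axis=1)
    Yb = hx - hmean[:, None]
    ana = np.empty_like(ens)
    for i in range(ens.shape[0]):                      # each grid point independently
        near = np.abs(ob_pos - grid_pos[i]) <= radius  # only nearby obs
        if not near.any():
            ana[i] = ens[i]                            # no obs nearby: keep background
            continue
        W = local_weights(Yb[near], yo[near] - hmean[near], r_diag[near])
        ana[i] = xmean[i] + Xb[i] @ W
    return ana

rng = np.random.default_rng(8)
n_state, k = 10, 6
grid_pos = np.arange(n_state, dtype=float)
ob_idx = np.array([1, 5, 8])                           # hypothetical ob locations
H = np.zeros((3, n_state))
H[np.arange(3), ob_idx] = 1.0
ob_pos = grid_pos[ob_idx]
r_diag = np.full(3, 0.5)
ens = rng.standard_normal((n_state, k))
yo = rng.standard_normal(3)

ana_local = letkf(ens, yo, H, r_diag, grid_pos, ob_pos, radius=2.0)
ana_global = letkf(ens, yo, H, r_diag, grid_pos, ob_pos, radius=np.inf)
```

With an infinite radius every grid point sees all obs and the result reduces to a global transform filter; the finite radius is what keeps each local problem small.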

Consequences of sampling error

• Ensemble sizes we can afford are O(100-1000). Rank of full Pb is at least $O(10^6)$ for current global ensemble resolution.

• Pb is estimated from $X^b (X^b)^T$, so most of its eigenvalues are zero.

• Errors in individual elements of Pb are $O(N_e^{-1/2})$, but correlated across elements.

• EnKF fails miserably if raw sample covariance used.

– Pa grossly underestimated, spread collapses, data assimilation ignores all observations.
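The rank deficiency is easy to see numerically. In this sketch (sizes chosen only for illustration) a 10-member ensemble of a 50-element state gives a sample covariance with at most 9 nonzero eigenvalues, because removing the ensemble mean uses up one degree of freedom.

```python
import numpy as np

rng = np.random.default_rng(4)
n_state, n_ens = 50, 10                       # illustrative sizes
ens = rng.standard_normal((n_state, n_ens))
Xb = ens - ens.mean(axis=1, keepdims=True)    # perturbations about the mean
Pb = Xb @ Xb.T / (n_ens - 1)                  # raw sample covariance

rank = np.linalg.matrix_rank(Xb)              # rank of Pb equals rank of Xb
eigvals = np.linalg.eigvalsh(Pb)
n_zero = int(np.sum(eigvals < 1e-10 * eigvals.max()))

print(rank, n_zero)
```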

Covariance localization – the secret sauce that makes it all work.

• Basic idea: small values in Pb cannot be estimated accurately, and setting them to zero increases the rank of the sample estimate.

• Hypothesis: Covariances decrease in significance with distance.

• Method: Taper covariance estimate to zero away from (block) diagonal using a pre-specified function.

– Also makes algorithms faster, since observations cannot impact the entire state.
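A minimal sketch of the tapering step on a 1-D grid. The slides do not name the taper function; the Gaspari-Cohn fifth-order piecewise polynomial used here is one widely used choice and is an assumption, as are the grid, ensemble size and length scale `c`.

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Compactly supported taper: 1 at zero distance, 0 at and beyond 2*c."""
    r = np.abs(np.asarray(dist, dtype=float)) / c
    taper = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    taper[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                    - (5.0 / 3.0) * ri**2 + 1.0)
    taper[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                    + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return taper

rng = np.random.default_rng(5)
n_state, n_ens = 40, 5                        # illustrative sizes
grid = np.arange(n_state, dtype=float)
ens = rng.standard_normal((n_state, n_ens))
Xb = ens - ens.mean(axis=1, keepdims=True)
Pb_raw = Xb @ Xb.T / (n_ens - 1)              # raw sample covariance, rank <= n_ens - 1

dist = np.abs(grid[:, None] - grid[None, :])  # pairwise grid-point distances
taper = gaspari_cohn(dist, c=5.0)
Pb_loc = taper * Pb_raw                       # element-wise (Schur) product

rank_raw = np.linalg.matrix_rank(Pb_raw)
rank_loc = np.linalg.matrix_rank(Pb_loc)
print(rank_raw, rank_loc)
```

Covariances between points more than `2*c` apart become exactly zero, and the rank of the localized estimate is much higher than that of the raw one.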

Localization: a simple example


Estimates of covariances from a small ensemble will be noisy, with signal-to-noise small especially when covariance is small.
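A quick Monte Carlo sketch of this point, using made-up numbers: two unit-variance variables with true covariance 0.1 or 0.9, estimated repeatedly from 20-member ensembles. The scatter of the estimate relative to the true value is far larger in the weakly correlated case.

```python
import numpy as np

rng = np.random.default_rng(6)
n_ens, n_trials = 20, 2000                    # illustrative sizes

def relative_error(rho):
    """Std. dev. of the sample covariance across trials, divided by the true value."""
    cov_true = np.array([[1.0, rho], [rho, 1.0]])
    # samples has shape (n_trials, n_ens, 2): many independent small ensembles
    samples = rng.multivariate_normal(np.zeros(2), cov_true, size=(n_trials, n_ens))
    pert = samples - samples.mean(axis=1, keepdims=True)
    cov_est = (pert[..., 0] * pert[..., 1]).sum(axis=1) / (n_ens - 1)
    return cov_est.std() / abs(rho)

err_weak = relative_error(0.1)                # small true covariance
err_strong = relative_error(0.9)              # large true covariance
print(err_weak, err_strong)
```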


Localization: a real-world example

• AMSUA_n15 channel 6 radiance at 150E, 50S.

• Increment to level 30 (~310 mb) temperature for a 1 K O-F for 40, 80, 160, 320 and 640 ens members with no localization.

Localization: a real-world example


What about uncertainty in the model itself?

Not included in Pb if every member is run with the same model.

The background error must account for any difference between the simulated and true environment. Methods used so far:

1) multiplicative inflation (mult. ens perts by a factor > 1).

2) additive inflation (random perts added to each member, e.g. differences between 24 and 48-h forecasts valid at the same time).

3) model-based schemes (e.g. stochastic kinetic energy backscatter for representing unresolved processes, multi-model/multi-parameterization).

Operational NCEP system used to use a combination of 1) and 2), now using a combination of 1) and 3).

Only 1) is taken care of within the EnKF itself. 2) is a separate step (runs after the EnKF update and before the forecast step). 3) happens inside the forecast model.
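Methods 1) and 2) act directly on the ensemble and can be sketched in a few lines. The function names, inflation factor and scale below are hypothetical, and the random array stands in for whatever perturbation source is used (for example a library of forecast differences).

```python
import numpy as np

def multiplicative_inflation(ens, factor):
    """Scale perturbations about the ensemble mean by factor (> 1 inflates)."""
    mean = ens.mean(axis=1, keepdims=True)
    return mean + factor * (ens - mean)

def additive_inflation(ens, perts, scale=1.0):
    """Add one perturbation per member; perts has the same shape as ens."""
    perts = perts - perts.mean(axis=1, keepdims=True)   # do not shift the ensemble mean
    return ens + scale * perts

rng = np.random.default_rng(0)
ens = rng.standard_normal((4, 10))                      # (n_state, n_ens), toy values
ens_mult = multiplicative_inflation(ens, 1.2)
ens_add = additive_inflation(ens, rng.standard_normal((4, 10)), scale=0.5)
print(ens.std(axis=1), ens_mult.std(axis=1))
```

Multiplicative inflation can only amplify directions already spanned by the ensemble, whereas additive perturbations can introduce new ones.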


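Relaxation To Prior Spread (RTPS) Inflation

Described in DOI: 10.1175/MWR-D-11-00276.1

Inflate posterior spread (std. dev.) $\sigma^a$ back toward prior spread $\sigma^b$:

$\sigma^a \leftarrow (1 - \alpha)\sigma^a + \alpha\sigma^b$

Equivalent to

$x'^a_i \leftarrow x'^a_i \left(\alpha \frac{\sigma^b - \sigma^a}{\sigma^a} + 1\right)$

where $\alpha$ is the relaxation coefficient and $x'^a_i$ is the analysis perturbation of member $i$.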
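A NumPy sketch of RTPS, assuming the usual form in which each state element's analysis perturbations are multiplied by alpha*(sb - sa)/sa + 1, with alpha a relaxation coefficient between 0 and 1. The function name, the value of alpha and the toy ensembles are illustrative assumptions.

```python
import numpy as np

def rtps_inflation(ens_b, ens_a, alpha):
    """Relax the analysis spread toward the background spread, element by element."""
    sb = ens_b.std(axis=1, ddof=1)                  # prior spread
    sa = ens_a.std(axis=1, ddof=1)                  # posterior spread
    factor = alpha * (sb - sa) / sa + 1.0           # per-element inflation factor
    mean_a = ens_a.mean(axis=1, keepdims=True)
    return mean_a + factor[:, None] * (ens_a - mean_a)

rng = np.random.default_rng(0)
ens_b = 2.0 * rng.standard_normal((4, 10))          # toy background ensemble
mean_b = ens_b.mean(axis=1, keepdims=True)
ens_a = mean_b + 0.5 * (ens_b - mean_b)             # stand-in analysis with halved spread
alpha = 0.9

ens_rtps = rtps_inflation(ens_b, ens_a, alpha)
sb = ens_b.std(axis=1, ddof=1)
sa = ens_a.std(axis=1, ddof=1)
print(sa, ens_rtps.std(axis=1, ddof=1), sb)
```

Because the factor depends on how much the spread was reduced, inflation is largest where observations had the most impact and vanishes where the analysis spread equals the background spread.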


Why does Pb matter?

Flow dependence and unobserved variables

Surface pressure observation can improve analysis of integrated water vapor (through flow-dependent cross-variable relationships). If climo Pb were used (3DVar) there would be no vapor increment.
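The mechanism can be sketched with a two-variable toy state (surface pressure and precipitable water). The ensemble below is synthetic and built so that low pressure goes with high water vapor; the diagonal `Pb_static` is a stand-in for a climatological covariance that carries no pressure-vapor cross-covariance. Both are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(7)
n_ens = 50

# Synthetic ensemble: ps in hPa, pw in mm, negatively correlated by construction.
ps = 1000.0 + 2.0 * rng.standard_normal(n_ens)
pw = 30.0 - 1.5 * (ps - 1000.0) + 0.5 * rng.standard_normal(n_ens)
ens = np.vstack([ps, pw])                     # state = [ps, pw]

H = np.array([[1.0, 0.0]])                    # only ps is observed
r = 1.0                                       # ob error variance
yo = 997.0                                    # ps ob, lower than the background mean

X = ens - ens.mean(axis=1, keepdims=True)
Pb_ens = X @ X.T / (n_ens - 1)                # flow-dependent estimate with cross-covariance
Pb_static = np.diag(np.diag(Pb_ens))          # same variances, no ps-pw cross-covariance

def increment(Pb):
    """Analysis increment [d_ps, d_pw] from the single ps ob."""
    K = Pb @ H.T / (H @ Pb @ H.T + r)
    return (K * (yo - H @ ens.mean(axis=1))).ravel()

inc_ens = increment(Pb_ens)
inc_static = increment(Pb_static)
print(inc_ens, inc_static)
```

Both covariances give the same pressure increment, but only the ensemble covariance moistens the analysis in response to the low pressure ob.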

[Figure: first-guess precipitable water and first-guess SLP contours, with the location of the ps ob marked.]

Why combine EnKF and Var?

Features from EnKF:

• Can propagate Pb across assimilation windows.

• More flexible treatment of model error (can be treated in ensemble).

• Automatic initialization of ensemble forecasts.

Features from Var:

• Treatment of sampling error in ensemble Pb estimate does not depend on H.

• Dual-resolution capability: can produce a high-res “control” analysis.

• Ease of adding extra constraints to cost function, including a static Pb component.

What is limiting performance now?

• Sampling error

– Run larger ensembles.

– Better (flow and scale dependent) localization methods.

– Better treatment of non-local observations (model-space localization).

• Model error

– Run higher resolution ensembles.

– Better parameterizations of model uncertainty.

• Non-Gaussian error statistics.

– Some phenomena not observed often/well enough, dynamical error growth will be nonlinear.

– Can arise from displacement errors in coherent features.

– Variables that are physically bounded (humidity, wind speed).
