Statistical Methods for Numerical Weather Prediction.
Thomas Bengtsson (Doug Nychka, Chris Snyder)
Geophysical Statistics Project – NCAR, Boulder, CO
www.cgd.ucar.edu/~tocke/ [email protected]
University of Missouri-Columbia, Department of Statistics
Oct 11, 2001.
GSP is supported by the National Science Foundation under grants DMS 9815344 and DMS 9312686.
Outline
1. Numerical Weather Prediction
• Ensemble Forecasting
• Lorenz ODEs
2. Atmospheric Data Assimilation & State-Space Framework
• Nonlinearities (Non-gaussian forecast distributions)
– Simulations
• High-dimensional (Computational issues)
3. Future Directions
Numerical Weather Prediction (NWP)
• A weather forecast is produced by integrating a system of nonlinear differential equations forward in time:
$$x_{t+\delta t} = x_t + \int_t^{t+\delta t} \Omega(u)\,du$$
Here, $x_t$ is the initial condition (the current state of the atmosphere) and $\Omega(t) = \dot{x}_t$ defines the physics.
• A NWP model must be able to incorporate and combine:
1. physical laws for the atmosphere (classical mechanics, thermodynamics).
2. statistical and numerical techniques.
• Forecasting (weather) is an uncertain proposition: a matter of probability?
Ensemble Forecasting
• A probabilistic view of prediction: p(xt).
• Difficult to solve forward integration of p(xt) analytically.
• An ensemble forecast is a (sample) collection of weather forecasts that verify at the same time.
• The ensemble is (generally) derived under the same dynamic model, starting from different initial conditions.
• Issue: how to sample from posterior?
NCEP Ensemble (Sivillo et al., 1997)
• 5640-m contour line of 500-hPa height field (15 November, 1995)
• Solid line ↔ actual weather, 15 Nov.
NCEP Ensemble (Sivillo et al., 1997)
• Ensemble spread (variance) decreased ↔ forecast more accurate.
Lorenz System (Lorenz, 1963)
• Simplification of the motion of a fluid heated from below in a gravitational field.
• Equations:
$$\dot{x}_t = -\sigma(x_t - y_t)$$
$$\dot{y}_t = r x_t - y_t - x_t z_t$$
$$\dot{z}_t = x_t y_t - b z_t$$
• $x_t \propto$ intensity of fluid flow
$y_t$ represents $\Delta T$ between ascending/descending currents
$z_t \propto$ temperature gradient.
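A minimal sketch of an ensemble forecast for the Lorenz (1963) system, assuming the classic parameter values σ = 10, r = 28, b = 8/3 and a fourth-order Runge-Kutta integrator; neither choice, nor the initial state and perturbation size, is stated on the slides. It illustrates how forecasts started from nearly identical initial conditions spread apart over time.

```python
import numpy as np

def lorenz63(state, sigma=10.0, r=28.0, b=8.0 / 3.0):
    """Right-hand side of the Lorenz (1963) equations (parameter values assumed)."""
    x, y, z = state
    return np.array([sigma * (y - x), r * x - y - x * z, x * y - b * z])

def rk4_step(f, state, dt):
    """Advance `state` by one fourth-order Runge-Kutta step of size dt."""
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def integrate(state, n_steps, dt=0.01):
    """Integrate the Lorenz system forward n_steps steps of length dt."""
    for _ in range(n_steps):
        state = rk4_step(lorenz63, state, dt)
    return state

rng = np.random.default_rng(0)
x0 = np.array([1.0, 1.0, 20.0])                          # hypothetical initial state
ensemble0 = x0 + 0.01 * rng.standard_normal((20, 3))     # 20 perturbed initial conditions

early = np.array([integrate(member, 10) for member in ensemble0])    # short lead time
late = np.array([integrate(member, 1500) for member in ensemble0])   # long lead time

spread_early = early.std(axis=0).mean()
spread_late = late.std(axis=0).mean()
print("ensemble spread, short vs long lead time:", spread_early, spread_late)
```

The ensemble spread at the long lead time is much larger than at the short one, which is the loss of predictability that motivates a probabilistic forecast.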
Ensemble Forecast
Atmospheric Data Assimilation & State-Space Framework
Data Assimilation
Updating our knowledge of the state of the atmosphere once new weather data is available.
Atmospheric Model
Weather observations → $y_t = H_t x_t + \varepsilon_t$
Atmospheric state → $x_t = G(x_{t-1})$
$y_t \in \mathbb{R}^{10^5}$, data
$x_t \in \mathbb{R}^{10^7}$, unobserved
$H_t$ maps state to observation (linear or nonlinear)
$G$ highly nonlinear, chaotic (known or approximate)
$\varepsilon_t$ (Gaussian) observation error, $\mathrm{cov}(\varepsilon_t) = R$
Sequential assimilation and forecasting:
$$p(x_t),\ y_t \xrightarrow{\text{Bayes}} p(x_t \mid y_t) \xrightarrow{G(\cdot)} p(x_{t+1}),\ y_{t+1} \xrightarrow{\text{Bayes}} p(x_{t+1} \mid y_{t+1})$$
Forecast Chaos
Linear Filtering
• Assuming $p(x_t)$ and $p(y_t \mid x_t)$ are both normal, assimilate $y_t$ and $p(x_t)$ using the Kalman Filter (KF) (Kalman, 1960):
$$E(x_t \mid Y_t) = E(x_t \mid Y_{t-1}) + K_t\,[y_t - H_t E(x_t \mid Y_{t-1})]$$
$$P^u_t = (I - K_t H_t)\,P^f_t,$$
where
$$K_t = P^f_t H_t' (H_t P^f_t H_t' + R)^{-1}, \quad\text{and}\quad P^f_t = E\{[x_t - E(x_t \mid Y_{t-1})][x_t - E(x_t \mid Y_{t-1})]'\}.$$
• Easy to implement sequentially in systems with linear dynamics.
• Covariance recursion expensive for high-dimensional systems.
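A minimal NumPy sketch of the KF update above, assuming dense matrices small enough to store explicitly. The function name `kalman_update` and the toy numbers are illustrative and not from the talk.

```python
import numpy as np

def kalman_update(mean_f, P_f, y, H, R):
    """One Kalman filter update step.

    mean_f, P_f : forecast mean E(x_t | Y_{t-1}) and covariance P^f_t
    y, H, R     : observation, observation operator, observation error covariance
    Returns the updated mean E(x_t | Y_t), covariance P^u_t, and gain K_t.
    """
    S = H @ P_f @ H.T + R                    # innovation covariance H P^f H' + R
    # K = P^f H' S^{-1}; solve a linear system instead of forming S^{-1}
    # (the transpose trick is valid because S and P^f are symmetric).
    K = np.linalg.solve(S, H @ P_f).T
    mean_u = mean_f + K @ (y - H @ mean_f)
    P_u = (np.eye(len(mean_f)) - K @ H) @ P_f
    return mean_u, P_u, K

# Toy example: two state variables, only the first one is observed.
mean_f = np.zeros(2)
P_f = np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
y = np.array([2.0])

mean_u, P_u, K = kalman_update(mean_f, P_f, y, H, R)
print(mean_u)   # the observed component moves halfway toward the observation
print(P_u)
```

The covariance update costs on the order of n² storage and n³ operations for a state of dimension n, which is the expense the last bullet refers to.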
The Ensemble Kalman Filter (EnsKF)
• EnsKF proceeds by estimating the first two moments of the forecast distribution using an ensemble (sample) of state vectors (hats denote ensemble estimates):
$$\hat{E}(x_t \mid Y_t) = \hat{E}(x_t \mid Y_{t-1}) + \hat{K}_t\,[y_t - H_t \hat{E}(x_t \mid Y_{t-1})]$$
Algorithm:
i. sample $x_i \sim p(x_{t-1} \mid Y_{t-1})$, for $i = 1, \ldots, m$.
ii. propagate $x^f_i = G(x_i)$, for $i = 1, \ldots, m$.
iii. calculate $\hat{E}(x_t \mid Y_{t-1}) = \frac{1}{m}\sum_{i=1}^{m} x^f_i$, and $\hat{P}^f_t$.
iv. update $\hat{E}(x_t \mid Y_{t-1})$ using the sample moments from iii.
• Advantages: covariance information is propagated in a compact, reduced-dimensional form; real-time efficiency; feasible to implement in high-dimensional systems; performs well in low-order systems with unstable dynamics.
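The four steps can be sketched as follows. This is an illustration under stated assumptions, not the implementation used for the talk: the function name `enkf_step` is hypothetical, the sample covariance uses the 1/(m−1) normalization, and the individual members are updated with perturbed observations, a common choice that the slide does not specify.

```python
import numpy as np

def enkf_step(ensemble, G, y, H, R, rng):
    """One ensemble Kalman filter cycle.

    ensemble : (m, n) array, rows are draws from p(x_{t-1} | Y_{t-1})   (step i)
    G        : function mapping a state vector to the next state vector
    Returns the updated mean and an updated (m, n) ensemble.
    """
    forecast = np.array([G(x) for x in ensemble])              # step ii
    mean_f = forecast.mean(axis=0)                             # step iii: sample mean
    P_f = np.atleast_2d(np.cov(forecast, rowvar=False))        # step iii: sample covariance
    S = H @ P_f @ H.T + R
    K = np.linalg.solve(S, H @ P_f).T                          # estimated gain
    mean_u = mean_f + K @ (y - H @ mean_f)                     # step iv
    # Assumption: update each member against a perturbed copy of y.
    eps = rng.multivariate_normal(np.zeros(len(y)), R, size=len(forecast))
    analysis = forecast + (y + eps - forecast @ H.T) @ K.T
    return mean_u, analysis

rng = np.random.default_rng(0)
prior = rng.standard_normal((2000, 2))        # hypothetical ensemble from N(0, I)
H = np.eye(2)
R = np.eye(2)
y = np.array([2.0, 2.0])

mean_u, analysis = enkf_step(prior, lambda x: x, y, H, R, rng)
print(mean_u)    # close to [1, 1], the exact posterior mean for this linear toy case
```

Only the m forecast vectors are stored and propagated; the n × n covariance is formed explicitly here for readability and would be avoided in a high-dimensional system.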
Nonlinear Filtering
• No universal analytical solution exists.
• Use extended KF (Jazwinski, 1970); or, Sequential Monte Carlo (SMC) filters (Doucet et al., 2001).
• Expect problems with the extended KF if:
– G(·) strongly nonlinear (Evensen, 1994; Miller et al., 1994).
use sample-based filter, e.g. ensemble Kalman filter (Evensen, 1994).
– p(xt) non-Gaussian
use mixture (Gaussian sum) Kalman filter (Alspach & Sorenson, 1972; Chen & Liu, 2000).
• Expect problems with SMC filters if:
– dim(xt) is large (Gilks et al., 1996; Robert & Casella, 1999)
A Mixture Ensemble Kalman Filter
• Suppose p(xt) is non-gaussian.
• We approximate p(xt) by a mixture of Gaussian distributions:
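$$p(x_t) = \sum_{i=1}^{k} p_i\,\mathrm{MN}(\mu_i, P_i)$$
By Bayes' theorem,
$$p(x_t \mid y_t) = \sum_{i=1}^{k} p_i^\star\,\mathrm{MN}(\mu_i^\star, P_i^\star)$$
Here $(\mu_i^\star, P_i^\star)$ are found by the KF, and $p_i^\star \propto p_i \times p(y_t \mid \mu_i, P_i)$.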
• Need to choose k, MN(µi,Pi)
Gaussian prior/posterior → k = 1
Kernel density estimate → k = ensemble size
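A sketch of the mixture update, assuming a linear observation operator H so that the component likelihood p(y_t | µ_i, P_i) is the Gaussian density with mean Hµ_i and covariance HP_iH′ + R. The function names and the two-component toy prior are illustrative.

```python
import numpy as np

def gaussian_logpdf(y, mean, cov):
    """Log density of MN(mean, cov) evaluated at y."""
    d = y - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(y) * np.log(2.0 * np.pi) + logdet + d @ np.linalg.solve(cov, d))

def mixture_kf_update(weights, means, covs, y, H, R):
    """Update every mixture component by the KF and reweight by its likelihood."""
    log_w, new_means, new_covs = [], [], []
    for p_i, mu_i, P_i in zip(weights, means, covs):
        S = H @ P_i @ H.T + R
        K = np.linalg.solve(S, H @ P_i).T
        new_means.append(mu_i + K @ (y - H @ mu_i))
        new_covs.append((np.eye(len(mu_i)) - K @ H) @ P_i)
        # p_i* is proportional to p_i times the likelihood of y under component i
        log_w.append(np.log(p_i) + gaussian_logpdf(y, H @ mu_i, S))
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())          # subtract the max for numerical stability
    return w / w.sum(), np.array(new_means), np.array(new_covs)

# Bimodal scalar prior with modes at -2 and +2; the observation favors the right mode.
weights = [0.5, 0.5]
means = [np.array([-2.0]), np.array([2.0])]
covs = [np.eye(1), np.eye(1)]
H = np.eye(1)
R = np.eye(1)
y = np.array([2.0])

post_w, post_means, post_covs = mixture_kf_update(weights, means, covs, y, H, R)
print(post_w, post_means.ravel())
```

With k = 1 this reduces to the ordinary KF update; with k equal to the ensemble size, each ensemble member carries its own component.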
A (Bootstrap) Particle Filter (Doucet et al., 2001)
$$p(x_{t-1}),\ y_{t-1} \xrightarrow{\text{Bayes}} p(x_{t-1} \mid y_{t-1}) \xrightarrow{G(\text{sample})} p(x_t),\ y_t \xrightarrow{\text{Bayes}} p(x_t \mid y_t)$$
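One cycle of this scheme can be sketched as below, assuming Gaussian observation errors with covariance R and multinomial resampling; the function name `bootstrap_pf_step` and the toy setup are illustrative.

```python
import numpy as np

def bootstrap_pf_step(particles, G, y, H, R, rng):
    """Propagate a sample through G, weight it by the likelihood of y, and resample.

    particles : (m, n) array, rows are draws from p(x_{t-1} | y_{t-1})
    Returns an (m, n) sample from p(x_t | y_t) and the normalized weights.
    """
    forecast = np.array([G(x) for x in particles])       # sample from p(x_t)
    innov = y - forecast @ H.T                           # (m, d) innovations
    # Gaussian log-likelihood of y for each particle, up to an additive constant
    log_w = -0.5 * np.einsum("ij,ij->i", innov, np.linalg.solve(R, innov.T).T)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    idx = rng.choice(len(forecast), size=len(forecast), p=w)   # Bayes step by resampling
    return forecast[idx], w

rng = np.random.default_rng(0)
particles = rng.standard_normal((5000, 1))     # hypothetical sample from N(0, 1)
H = np.eye(1)
R = np.eye(1)
y = np.array([2.0])

posterior, w = bootstrap_pf_step(particles, lambda x: x, y, H, R, rng)
print(posterior.mean(), posterior.var())       # near the exact values 1.0 and 0.5
```

No Gaussian assumption is placed on p(x_t), but the weights degenerate quickly as the state dimension grows, which is the difficulty noted for SMC filters.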
A Sampling Scheme
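1. Calculate $\mu_i^\star$ using the KF, and find $p_i^\star$.
2. Generate the following random quantities:
$$x^\star \sim \mathrm{MN}(0, P_i), \qquad y^\star = H x^\star + e,$$
where $e \sim \mathrm{MN}(0, R)$.
3. Find $u = x^\star - K_i y^\star$, and let $z_i = \mu_i^\star + u$. Use $z_i$ with probability $p_i^\star$.
Note that the perturbation $u$ has the correct (posterior) covariance:
$$\begin{aligned}
\mathrm{Cov}(u) &= \mathrm{Cov}(x^\star) + \mathrm{Cov}(K_i y^\star) - 2\,\mathrm{Cov}(x^\star, K_i y^\star) \\
&= P_i + P_i H^T (H P_i H^T + R)^{-1} H P_i - 2 P_i H^T (H P_i H^T + R)^{-1} H P_i \\
&= P_i - K_i H P_i = P^u_i
\end{aligned}$$
• No need to factor $P_i$ if $x^\star$ is a sample perturbation from the prior ensemble.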
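The covariance identity can be checked numerically. The sketch below uses arbitrary (hypothetical) choices of P_i, H and R, draws the perturbations of step 2, and compares the sample covariance of u with P_i − K_i H P_i.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, n_draws = 3, 2, 200_000

A = rng.standard_normal((n, n))
P = A @ A.T + n * np.eye(n)            # hypothetical prior covariance P_i (symmetric positive definite)
H = rng.standard_normal((d, n))        # hypothetical observation operator
R = 0.5 * np.eye(d)                    # hypothetical observation error covariance

S = H @ P @ H.T + R
K = P @ H.T @ np.linalg.inv(S)         # Kalman gain K_i
P_u = P - K @ H @ P                    # target posterior covariance P^u_i

# Step 2: x* ~ MN(0, P_i), y* = H x* + e with e ~ MN(0, R)
x_star = rng.multivariate_normal(np.zeros(n), P, size=n_draws)
e = rng.multivariate_normal(np.zeros(d), R, size=n_draws)
y_star = x_star @ H.T + e

# Step 3: u = x* - K_i y*
u = x_star - y_star @ K.T

sample_cov = np.cov(u, rowvar=False)
max_abs_err = np.abs(sample_cov - P_u).max()
print("largest entrywise deviation from P - K H P:", max_abs_err)

# Exact check: u = (I - K H) x* - K e, so Cov(u) = (I - K H) P (I - K H)' + K R K'
I_KH = np.eye(n) - K @ H
exact_cov = I_KH @ P @ I_KH.T + K @ R @ K.T
```

The deviation shrinks at the usual Monte Carlo rate as the number of draws grows, while the exact expression agrees with P_i − K_i H P_i to rounding error.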
Simulations
Form of prior $p(x_t)$                                  Statistics
Gaussian: $\mathrm{MN}(\mu, P)$                         $(\mu, P)$ = ensemble mean, cov.
Mixture: $\sum_{i=1}^{m} \frac{1}{m}\mathrm{MN}(\mu_i, P_i)$    $\mu_i$ = $i$-th ensemble member
                                                        $P_i$ = cov. in neighborhood of $\mu_i$
• m = (40,400), Ht = I, var(εt) = 4I, T = 5000.
• Error measure:
median of $\mathrm{RMSE}_t = \sqrt{(x_t - E(x_t \mid y_t))'(x_t - E(x_t \mid y_t))/3}$
        Gaussian, k = 1        Mixture, k = m
δt      m = 40    m = 400      m = 40    m = 400
.1      .51       .35          .54       ?
.25     .75       .68          .56       .52
.5      1.06      1.05         .76       .71
• Improvement is 25-30%.
Unstable Dynamics
Conditional Simulation Results
• Condition on posterior means from Gaussian assimilation located in saddle (T = 250).
• m = (40,400), Ht = I, var(εt) = 4I.
• Error measure: median of RMSEt
        Gaussian     Mixture
δt      m = 400      m = 40    m = 400
.5      1.64         .94       .73
• Improvement following unstable (saddle) area is 45-55%.
Computational Issues: Sequential Assimilation of Observations
Lemma: For uncorrelated (independent) measurement errors, sequential assimilation of observations yields the same result as simultaneous assimilation.
• Implication: The matrix inverse in the Kalman gain does not have to be explicitly calculated; each scalar observation requires only a scalar division.
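• The $j$th observation at time $t$ is related to the state by $y^j_t = h^j_t x_t + \varepsilon^j_t$. Assimilation of $y^j_t$ is given by
$$E(x_t \mid y^{(j)}_t, Y_{t-1}) = E(x_t \mid y^{(j-1)}_t, Y_{t-1}) + K^{(f,j-1)}_t\,[y^j_t - h^j_t E(x_t \mid y^{(j-1)}_t, Y_{t-1})],$$
where $y^{(k)}_t = (y^1_t, y^2_t, \ldots, y^k_t)'$, and
$$K^{(f,j-1)}_t = \frac{P^{(f,j-1)}_t h^{j\prime}_t}{h^j_t P^{(f,j-1)}_t h^{j\prime}_t + R}.$$
Here $h^j_t$ is a row vector, so the denominator is a scalar and $R$ denotes the error variance of $\varepsilon^j_t$.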
• To obtain E(xt|Yt) iterate above for all observations in yt.
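The lemma can be illustrated numerically. The sketch below assumes a diagonal R (uncorrelated errors) and arbitrary test matrices; the function names are illustrative. It assimilates the observations one at a time with scalar divisions only and compares the result with the simultaneous (batch) update.

```python
import numpy as np

def batch_update(mean, P, y, H, R):
    """Simultaneous assimilation of all observations (requires a matrix solve)."""
    S = H @ P @ H.T + R
    K = np.linalg.solve(S, H @ P).T
    return mean + K @ (y - H @ mean), (np.eye(len(mean)) - K @ H) @ P

def sequential_update(mean, P, y, H, r_diag):
    """Assimilate scalar observations one at a time; r_diag holds the error variances."""
    mean, P = mean.copy(), P.copy()
    for y_j, h_j, r_j in zip(y, H, r_diag):
        Ph = P @ h_j                          # P h_j' (a vector)
        gain = Ph / (h_j @ Ph + r_j)          # scalar division replaces the matrix inverse
        mean = mean + gain * (y_j - h_j @ mean)
        P = P - np.outer(gain, Ph)            # (I - K h_j) P, using the symmetry of P
    return mean, P

rng = np.random.default_rng(2)
n, d = 4, 3
A = rng.standard_normal((n, n))
P0 = A @ A.T + np.eye(n)                      # hypothetical forecast covariance
H = rng.standard_normal((d, n))               # hypothetical observation operator
r_diag = np.array([0.5, 1.0, 2.0])            # uncorrelated observation errors
mean0 = rng.standard_normal(n)
y = rng.standard_normal(d)

mean_b, P_b = batch_update(mean0, P0, y, H, np.diag(r_diag))
mean_s, P_s = sequential_update(mean0, P0, y, H, r_diag)
print(np.abs(mean_b - mean_s).max(), np.abs(P_b - P_s).max())
```

Both differences are at the level of floating-point rounding, in agreement with the lemma.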
Computational Issues: Limiting Impact of Observations.
• Observations are (fairly) local in space → ht is sparse.
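• (From sequential update) Need to compute $P^f_t h_t'$. With $\hat{E}(x_t \mid Y_{t-1})$ the ensemble mean,
$$\begin{aligned}
P^f_t h_t' &= \frac{1}{m-1} \sum_{i=1}^{m} [x^f_i - \hat{E}(x_t \mid Y_{t-1})][x^f_i - \hat{E}(x_t \mid Y_{t-1})]'\,h_t' \\
&= \frac{1}{m-1} \sum_{i=1}^{m} \alpha_i\,[x^f_i - \hat{E}(x_t \mid Y_{t-1})],
\end{aligned}$$
where $\alpha_i = h_t [x^f_i - \hat{E}(x_t \mid Y_{t-1})]$ is a scalar.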
• In terms of error structure, physically remote state variables should be uncorrelated. Observations over Boulder should not update the atmospheric state over London.
• By considering only local information in the updates, the effect of sampling variability in $x^f_i$ is decreased.
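A sketch of the ensemble computation of P^f h′ with a local cutoff. The one-dimensional grid of state variables, the hard cutoff radius, and the function name `localized_Pf_ht` are all assumptions made for illustration; the slides do not specify how locality is imposed.

```python
import numpy as np

def localized_Pf_ht(forecast, h, obs_index, radius):
    """Ensemble estimate of P^f h' with entries far from the observation set to zero.

    forecast  : (m, n) forecast ensemble, state variables on a 1-D grid 0..n-1 (assumption)
    h         : (n,) row of the observation operator for one scalar observation
    obs_index : grid position of the observation
    radius    : hypothetical cutoff distance beyond which no update is made
    """
    m, n = forecast.shape
    dev = forecast - forecast.mean(axis=0)     # x_i^f minus the ensemble mean
    alpha = dev @ h                            # alpha_i = h [x_i^f - mean], one scalar per member
    Pf_ht = dev.T @ alpha / (m - 1)            # (1/(m-1)) * sum_i alpha_i [x_i^f - mean]
    taper = (np.abs(np.arange(n) - obs_index) <= radius).astype(float)
    return taper * Pf_ht

rng = np.random.default_rng(3)
forecast = rng.standard_normal((20, 10))       # m = 20 members, n = 10 state variables
h = np.zeros(10)
h[3] = 1.0                                     # sparse h: observe state variable 3 only

local = localized_Pf_ht(forecast, h, obs_index=3, radius=2)
full = np.cov(forecast, rowvar=False) @ h      # unlocalized reference
print(local)
```

Inside the cutoff the result matches the full sample covariance column; outside it the spurious sample correlations are removed. The full n × n covariance is formed here only to provide a reference for comparison.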
MSE Properties of EnsKF
• What are the effects of sampling variability on EnsKF update?
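• Suppose linear dynamics: $\hat{E}(x_t \mid Y_{t-1}) \sim \mathrm{MN}(E(x_t \mid Y_{t-1}), \tfrac{1}{m} P^f_t)$, where hats denote ensemble (sample) estimates.
We have the following orthogonal decomposition:
$$\begin{aligned}
\hat{E}(x_t \mid Y_t) ={}& E(x_t \mid Y_{t-1}) + K_t\,[y_t - H_t E(x_t \mid Y_{t-1})] \\
&+ (I - K_t H_t)\,[\hat{E}(x_t \mid Y_{t-1}) - E(x_t \mid Y_{t-1})] \\
&+ (\hat{K}_t - K_t)\,[y_t - H_t E(x_t \mid Y_{t-1})] \\
&+ (K_t - \hat{K}_t) H_t\,[\hat{E}(x_t \mid Y_{t-1}) - E(x_t \mid Y_{t-1})]
\end{aligned}$$
• Let $\eta_t = (I - K_t H_t)[\hat{E}(x_t \mid Y_{t-1}) - E(x_t \mid Y_{t-1})]$.
• We wish to study $E(\eta_t'\eta_t)$ as a function of the eigenstructure (eigenvalues) of $P^f_t$.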
Effects of Sampling Variability on EnsKF
• Let the $i$th eigenvalue of $P^f_t$ be $\propto i^{-\theta}$, $\theta > 1$. Set $\mathrm{tr}(P^f_t) = 1$.
• Here, $H_t = I$ and $R = I$. Y-axis: $\sqrt{m\,E(\eta_t'\eta_t)/\mathrm{tr}(P^u_t)}$; X-axis: $\mathrm{tr}(R)/\mathrm{tr}(P^u_t)$.
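For this setup the quantities can be computed in closed form. If the ensemble mean has sampling covariance P^f/m, then E(η′η) = tr[(I − KH) P^f (I − KH)′]/m, and with H = I, R = I everything is diagonal in the eigenbasis of P^f. The sketch below assumes a state dimension n = 50 and ensemble size m = 40 (both hypothetical) and tabulates the terms for a few values of θ.

```python
import numpy as np

def sampling_error_terms(theta, n=50, m=40):
    """Return E(eta' eta), tr(P^u) and tr(R)/tr(P^u) for H = I, R = I.

    Eigenvalues of P^f are proportional to i^(-theta) and normalized so tr(P^f) = 1.
    n (state dimension) and m (ensemble size) are hypothetical choices.
    """
    lam = np.arange(1, n + 1, dtype=float) ** (-theta)
    lam /= lam.sum()                       # tr(P^f) = 1
    shrink = 1.0 / (lam + 1.0)             # eigenvalues of I - K, since K = P^f (P^f + I)^{-1}
    mse = np.sum(shrink**2 * lam) / m      # E(eta' eta) = tr[(I-K) P^f (I-K)'] / m
    tr_Pu = np.sum(shrink * lam)           # tr(P^u) = tr[(I-K) P^f]
    return mse, tr_Pu, n / tr_Pu           # tr(R) = n when R = I

for theta in (1.5, 2.0, 4.0):
    mse, tr_Pu, x_axis = sampling_error_terms(theta)
    print(theta, mse, tr_Pu, x_axis)
```

Because every eigenvalue of I − K lies between 0 and 1, m·E(η′η) never exceeds tr(P^u) in this setting.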
Summary
• We have presented a bootstrap mixture Kalman filter for recursively tracking atmospheric states.
• But, . . . what we really have is an updating procedure which is locally linear.
• Maybe, . . . the weighting (represented by the $p_i$'s) can resolve non-Gaussian structures.
Future Directions
• How to construct mixture? (order, kernels)
• Sequential parameter estimation; incorporate model (parameter) uncertainty.
• Validate mixture approach on higher dimensional system, e.g. Lorenz (1996).
• Formalize algorithms for rank deficient cases, i.e., when $m \ll \dim(x_t)$.