hunt_et_al_2007
DESCRIPTION
ds 7587TRANSCRIPT
![Page 1: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/1.jpg)
Efficient Data Assimilation for Spatiotemporal Chaos: a
Local Ensemble Transform Kalman Filter
Brian R. Hunt
Institute for Physical Science and Technology and Department of Mathematics
University of Maryland, College Park MD 20742
Eric J. Kostelich
Department of Mathematics and Statistics
Arizona State University, Tempe AZ 85287-1804
Istvan Szunyogh
Institute for Physical Science and Technology and Department of Atmospheric and Oceanic Science
University of Maryland, College Park MD 20742
12 December 2006
Abstract
Data assimilation is an iterative approach to the problem ofestimating the state of a dy-
namical system using both current and past observations of the system together with a model
for the system’s time evolution. Rather than solving the problem from scratch each time new
observations become available, one uses the model to “forecast” the current state, using a prior
state estimate (which incorporates information from past data) as the initial condition, then uses
current data to correct the prior forecast to a current stateestimate. This Bayesian approach
is most effective when the uncertainty in both the observations and in the state estimate, as it
evolves over time, are accurately quantified. In this article, we describe a practical method for
data assimilation in large, spatiotemporally chaotic systems. The method is a type of “ensemble
Kalman filter”, in which the state estimate and its approximate uncertainty are represented at
any given time by an ensemble of system states. We discuss both the mathematical basis of
this approach and its implementation; our primary emphasisis on ease of use and computa-
tional speed rather than improving accuracy over previously published approaches to ensemble
1
![Page 2: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/2.jpg)
Kalman filtering. We include some numerical results demonstrating the efficiency and accuracy
of our implementation for assimilating real atmospheric data with the global forecast model
used by the U.S. National Weather Service.
1 Introduction
Forecasting a physical system generally requires both a model for the time evolution of the system
and an estimate of the current state of the system. In some applications, the state of the system
can be measured directly with high accuracy. In other applications, such as weather forecasting,
direct measurement of the global system state is not feasible. Instead, the state must be inferred
from available data. While a reasonable state estimate based on current data may be possible, in
general one can obtain a better estimate by using both current and past data. “Data assimilation”
provides such an estimate on an ongoing basis, iteratively alternating between a forecast step and
a state estimation step; the latter step is often called the “analysis”. The analysis step combines
information from current data and from a prior short-term forecast (which is based on past data),
producing a current state estimate. This estimate is used toinitialize the next short-term forecast,
which is subsequently used in the next analysis, and so on. The data assimilation procedure is itself
a dynamical system driven by the physical system, and the practical problem is to achieve good
“synchronization” [40] between the two systems.
Data assimilation is widely used to study and forecast geophysical systems [13, 28]. The analy-
sis step is generally a statistical procedure (specifically, a Bayesian maximum likelihood estimate)
involving a prior (or “background”) estimate of the currentstate based on past data, and current data
(or “observations”) that are used to improve the state estimate. This procedure requires quantifi-
cation of the uncertainty in both the background state and the observations. While quantifying the
observation uncertainty can be a nontrivial problem, in this article we consider that problem to be
solved, and instead concentrate on the problem of quantifying the background uncertainty.
There are two main factors that create background uncertainty. One is the uncertainty in the
initial conditions from the previous analysis, which produces the background state via a short-term
forecast. The other is “model error”, the unknown discrepancy between the model dynamics and
actual system dynamics. Quantifying the uncertainty due tomodel error is a challenging problem,
and while this problem generally cannot be ignored in practice, we discuss only crude ways of
accounting for it in this article. For the time being, let us consider an idealized “perfect model”
scenario, in which there is no model error.
The main purpose of this article is to describe a practical framework for data assimilation that
is both relatively easy to implement and computationally efficient, even for large, spatiotemporally
chaotic systems. (By “spatiotemporally chaotic” we mean a spatially extended system that exhibits
2
![Page 3: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/3.jpg)
temporally chaotic behavior with weak long-range spatial correlations.) The emphasis here is on
methodology that scales well to high-dimensional systems and large numbers of observations, rather
than on what would be optimal given unlimited computationalresources. Ideally, one would keep
track of a probability distribution of system states, propagating the distribution using the Fokker-
Planck-Kolmogorov equation during the forecast step. While this approach provides a theoretical
basis for the methods used in practice [25], it would be computationally expensive even for a low-
dimensional system and is not at all feasible for a high-dimensional system. Instead one can use
a Monte Carlo approach, using a large ensemble of system states to approximate the distribution
(see [6] for an overview), or a parametric approach like the Kalman filter [26, 27], which assumes
Gaussian distributions and tracks their mean and covariance. (The latter approach was derived
originally for linear problems, but serves as a reasonable approximation for nonlinear problems
when the uncertainties remain sufficiently small.)
The methodology of this article is based on the Ensemble Kalman Filter [7, 8, 9], which has
elements of both approaches: it uses the Gaussian approximation and follows the time evolution of
the mean and covariance by propagating an ensemble of states. The ensemble can be reasonably
small relative to other Monte Carlo methods because it is used only to parametrize the distribution,
not to sample it thoroughly. The ensemble should be large enough to approximately span the space
of possible system states at a given time, because the analysis essentially determines which linear
combination of the ensemble members forms the best estimateof the current state, given the current
observations.
Many variations on the Ensemble Kalman Filter have been published in the geophysical liter-
ature, and this article draws ideas from a number of them [1, 2, 4, 17, 20, 21, 30, 36, 37, 45, 48].
These articles in turn draw ideas both from earlier work on geophysical data assimilation and from
the engineering and mathematics literature on nonlinear filtering. For the most part, we limit our
citations to ensemble-based articles rather than attempt to trace all ideas to their original sources.
We call the method described here a Local Ensemble TransformKalman Filter (LETKF), because
it is most closely related to the Local Ensemble Kalman Filter [36, 37] and the Ensemble Trans-
form Kalman Filter [4]. Indeed, it can produce analyses thatare equivalent to the LEKF in a more
efficient manner that is formally similar to the ETKF. While this article does not describe a fun-
damentally new method for data assimilation, it proposes a significant refinement of previously
published approaches that combines formal simplicity withthe flexibility to adapt to a variety of
applications.
In Section 2, we start by posing a general problem about whichtrajectory of a dynamical system
“best fits” a time series of data; this problem is solved exactly for linear problems by the Kalman
filter and approximately for nonlinear problems by ensembleKalman filters. Next we derive the
Kalman filter equations as a guide for what follows. Then we discuss ensemble Kalman filters
3
![Page 4: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/4.jpg)
in general and the issue of “localization”, which is important for applications to spatiotemporally
chaotic systems. Finally, we develop the basic LETKF equations, which provide a framework for
data assimilation that allows a system-dependent localization strategy to be developed and tuned.
We discuss also several options for “covariance inflation” to compensate for the effects of model
error and the deficiencies due to small sample size and linearapproximation that are inherent to
ensemble Kalman filters.
In Section 3, we give step-by-step instructions for efficient implementation of the approach de-
veloped in Section 2 and discuss options for further improving computational speed in certain cases.
Then in Section 4, we present a generalization that allows observations gathered at different times to
be assimilated simultaneously in a natural way. In Section 5, we present preliminary results using a
global atmospheric forecast model with real observations;these results compare favorably with the
data assimilation method used by the National Weather Service, and demonstrate the feasibility of
the LETKF algorithm for large models and data sets. Section 6is a brief conclusion. The notation
in this article is based largely on that proposed in [24], with some elements from [37].
2 Mathematical Formulation
Consider a system governed by the ordinary differential equation
dxdt
= F(t,x), (1)
wherex is anm-dimensional vector representing the state of the system ata given time. Suppose
we are given a set of (noisy) observations of the system made at various times, and we want to
determine which trajectory{x(t)} of (1) “best” fits the observations. For any givent, this trajectory
gives an estimate of the system state at timet.
To formulate this problem mathematically, we need to define “best fit” in this context. Let us
assume that the observations are the result of measuring quantities that depend on the system state
in a known way, with Gaussian measurement errors. In other words, an observation at timet j is a
triple (yoj ,H j ,R j), whereyo
j is a vector of observed values, andH j andR j describe the relationship
betweenyoj andx(t j):
yoj = H j(x(t j))+ ε j ,
whereε j is a Gaussian random variable with mean0 and covariance matrixR j . Notice that we are
assuming a perfect model here: the observations are based ona trajectory of(1), and our problem
is simply to infer which trajectory produced the observations. In a real application, the observations
come from a trajectory of the physical system for which(1) is only a model. So a more realistic (but
more complicated) problem would be to determine a pseudo-trajectory of(1), or a trajectory of an
4
![Page 5: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/5.jpg)
associated stochastic differential equation, that best fits the observations. Formulating this problem
mathematically then requires some assumptions about the size and nature of the model error. We
use the perfect model problem as motivation and defer the consideration of model error until later.
Given our assumptions about the observations, we can formulate a maximum likelihood estimate
for the trajectory of (1) that best fits the observations at timest1 < t2 < · · · < tn. The likelihood of a
trajectoryx(t) is proportional to
n
∏j=1
exp
(
−12[yo
j −H j(x(t j))]TR−1
j [yoj −H j(x(t j))]
)
.
The most likely trajectory is the one that maximizes this expression, or equivalently minimizes the
“cost function”
Jo({x(t)}) =n
∑j=1
[yoj −H j(x(t j))]
TR−1j [yo
j −H j(x(t j))]. (2)
Thus, the “most likely” trajectory is also the one that best fits the observations in a least square
sense.
Notice that (2) expresses the costJo as a function of the trajectory{x(t)}. To minimize the cost,
it is more convenient to writeJo as a function of the system state at a particular timet. Let Mt,t ′ be
the map that propagates a solution of (1) from timet to timet ′.1 Then
Jot (x) =
n
∑j=1
[yoj −H j(Mt,t j
(x))]TR−1j [yo
j −H j(Mt,t j(x))] (3)
expresses the cost in terms of the system statex at timet. Thus to estimate the state at timet, we
attempt to minimizeJot .
For a nonlinear model, there is no guarantee that a unique minimizer exists. And even if it does,
evaluatingJot is apt to be computationally expensive, and minimizing it may be impractical. But
if both the model and the observation operatorsH j are linear, the minimization is quite tractable,
becauseJot is then quadratic. Furthermore, instead of performing the minimization from scratch at
each successive timetn, one can compute the minimizer by an iterative method, namely the Kalman
filter [26, 27], which we now describe in the perfect model scenario. This method forms the basis
for the approach we will use in the nonlinear scenario.
2.1 Linear Scenario: the Kalman Filter
In the linear scenario, we can writeMt,t ′(x) = M t,t ′x and H j(x) = H jx whereM t,t ′ and H j are
matrices. Using the terminology from the introduction, we now describe how to perform a forecast1In the derivations that follow, we allowt ′ to be less thant, though in practice integrating (1) backward in time may
be problematic — for example, if (1) represents a discretization of a dissipative partial differential equation. Our use of
Mt,t′ for t ′ < t is entirely expository; the methodology we develop will notrequire backward integration of (1).
5
![Page 6: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/6.jpg)
step from timetn−1 to timetn followed by an analysis step at timetn, in such a way that if we start
with the most likely system state, in the sense described above, given the observations up to time
tn−1, we end up with the most likely state given the observations up to timetn. The forecast step
propagates the solution from timetn−1 to time tn, and the analysis step combines the information
provided by the observations at timetn with the propagated information from the prior observations.
This iterative approach requires that we keep track of not only the most likely state, but also its
uncertainty, in the sense described below. (Of course, the fact that the Kalman filter computes the
uncertainty in its state estimate may be viewed as a virtue.)
Suppose the analysis at timetn−1 has produced a state estimatexan−1 and an associated covariance
matrixPan−1. In probabilistic terms,xa
n−1 andPan−1 represent the mean and covariance of a Gaussian
probability distribution that represents the relative likelihood of the possible system states given the
observations from timet1 to tn−1. Algebraically, what we assume is that for some constantc,
n−1
∑j=1
[yoj −H jM tn−1,t j
x]TR−1j [yo
j −H jM tn−1,t jx] = [x− xa
n−1]T(Pa
n−1)−1[x− xa
n−1]+c. (4)
In other words, the analysis at timetn−1 has “completed the square” to express the part of the
quadratic cost functionJotn−1
that depends on the observations up to that time as a single quadratic
form plus a constant. The Kalman filter determinesxan andPa
n such that an analogous equation holds
at timetn.
First we propagate the analysis state estimatexan−1 and its covariancePa
n−1 using the forecast
model to produce a background state estimatexbn and covariance matrixPb
n for the next analysis:
xbn = M tn−1,tn
xan−1, (5)
Pbn = M tn−1,tn
Pan−1MT
tn−1,tn. (6)
Under a linear model, a Gaussian distribution of states at one time propagates to a Gaussian distri-
bution at any other time, and the equations above describe how the model propagates the mean and
covariance of such a distribution. (Usually, the Kalman filter adds a constant matrix to the right side
of (6) to represent additional uncertainty due to model error.)
Next, we want to rewrite the cost functionJotn given by (3) in terms of the background state
estimate and the observations at timetn. (This step is often formulated as applying Bayes’ rule to
the corresponding probability density functions.) In (4),x represents a hypothetical system state at
time tn−1. In our expression forJotn, we wantx to represent instead a hypothetical system state at
time tn, so we first replacex by M tn,tn−1x = M−1
tn−1,tnx in (4). Then using (5) and (6) yields
n−1
∑j=1
[yoj −H jM tn,t j
x]TR−1j [yo
j −H jM tn,t jx] = [x− xb
n]T(Pb
n)−1[x−xb
n]+c.
6
![Page 7: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/7.jpg)
It follows that
Jotn(x) = [x− xb
n]T(Pb
n)−1[x− xb
n]+ [yon−Hnx]TR−1
n [yon−Hnx]+c. (7)
To complete the data assimilation cycle, we determine the state estimatexan and its covariance
Pan so that
Jotn(x) = [x− xa
n]T(Pa
n)−1[x− xa
n]+c′
for some constantc′. Equating the terms of degree 2 inx, we get
Pan =
[
(Pbn)
−1+HTn R−1
n Hn
]−1. (8)
Equating the terms of degree 1, we get
xan = Pa
n
[
(Pbn)
−1xbn+HT
n R−1n yo
n
]
. (9)
Notice that when the model state is observed directly,Hn is the identity matrix, and equation (9)
expresses the analysis state estimate as a weighted averageof the background state estimate and the
observations, weighted according to the inverse covariance of each.
Equations (8) and (9) can be written in many different but equivalent forms, and it will be useful
later to rewrite both of them now. Using (8) to eliminate(Pbn)
−1 from (9) yields
xan = xb
n+PanHT
n R−1n (yo
n−Hnxbn). (10)
The matrixPanHT
n R−1n is called the “Kalman gain”. It multiplies the difference between the obser-
vations at timetn and the values predicted by the background state estimate toyield the increment
between the background and analysis state estimates. Next,multiplying (8) on the right by(Pbn)
−1Pbn
and combining the inverses yields
Pan = (I +Pb
nHTn R−1
n Hn)−1Pb
n. (11)
This expression provides a more efficient way than (8) to compute Pan, since it does not require
invertingPbn.
Initialization. The derivation above of the Kalman filter avoids the issue of how to initialize the
iteration. To solve the best fit problem we originally posed,we should make no assumptions about
the system state prior to the analysis at timet1. Formally we can regard the background covariance
Pb1 to be infinite, and forn = 1 use (8) and (9) with(Pb
1)−1 = 0. This works if there are enough
observations at timet1 to determine (aside from the measurement errors) the systemstate; that is,
if H1 has rank equal to the number of model variablesm. The analysis then determinesxa1 in the
appropriate least-square sense. However, if there are not enough observations, then the matrix to be
7
![Page 8: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/8.jpg)
inverted in (8) does not have full rank. To avoid this difficulty, one can assume a prior background
distribution at timet1, with Pb1 reasonably large but finite. This adds a small quadratic termto the
cost function being minimized, but with sufficient observations over time, the effect of this term on
the analysis at timetn decreases in significance asn increases.
2.2 Nonlinear Scenario: Ensemble Kalman Filtering
Many approaches to data assimilation for nonlinear problems are based on the Kalman filter, or at
least on minimizing a cost function similar to (7). At a minimum, a nonlinear model forces a change
in the forecast equations (5) and (6), while nonlinear observation operatorsHn force a change in the
analysis equations (10) and (11). The extended Kalman filter(see, for example, [25]) computes
xbn = Mtn−1,tn
(xan−1) using the nonlinear model, but computesPb
n using the linearizationM tn−1,tnof
Mtn−1,tnaroundxa
n−1. The analysis then uses the linearizationHn of Hn aroundxbn. This approach is
problematic for complex, high-dimensional models such as aglobal weather model for (at least) two
reasons. First, it is not easy to linearize such a model. Second, when the number of model variables
m is several million, computations involving them×m covariance matrices are very expensive.
Approaches used in operational weather forecasting generally eliminate, for pragmatic reasons,
the time iteration of the Kalman filter. The U.S. National Weather Service performs data assimi-
lation every 6 hours using the “3D-Var” method [32, 38], in which the background covariancePbn
in (7) is replaced by a constant matrixB representing typical uncertainty in a 6-hour forecast. This
simplification allows the analysis to be formulated in a manner that precomputes the most expensive
matrix operations, so that they do not have to be repeated at each timetn. The 3D-Var cost func-
tion also allows a nonlinear observation operatorHn, and is minimized numerically to produce the
analysis state estimatexan.
The “4D-Var” method [31, 42] used by the European Centre for Medium-Range Weather Fore-
casts uses a cost function that includes a constant-covariance background term as in 3D-Var, together
with a sum like (2) accounting for the observations collected over a 12-hour time window. Again
the cost function is minimized numerically; this procedureis computationally intensive, because
computing the gradient of the 4D-Var cost function requiresintegrating both the nonlinear model
and its linearization over the 12-hour window, and this procedure is repeated until a satisfactory
approximation to the minimum is found.
The key idea of ensemble Kalman filtering [7, 9] is to choose attime tn−1 an ensemble of initial
conditions whose spread aroundxan−1 characterizes the analysis covariancePa
n−1, propagate each
ensemble member using the nonlinear model, and computePbn based on the resulting ensemble at
time tn. Thus like the extended Kalman filter, the (approximate) uncertainty in the state estimate is
propagated from one analysis to the next, unlike 3D-Var (which does not propagate the uncertainty
at all) or 4D-Var (which propagates it only with the time window over which the cost function is
8
![Page 9: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/9.jpg)
minimized). Furthermore, ensemble Kalman filters do this without requiring a linearized model.
On the other hand, 4D-Var (with “weak constraint”) allows a wide variety of model error terms to
be incorporated into the cost function.
In spite of their differences, though, we emphasize that in the absence of computational limi-
tations, 4D-Var and ensemble Kalman filtering should be ableto produce similar results because
they both seek to minimize the same type of cost function. Indeed, in a perfect model scenario
[11], we obtained similar results with both methods when we used a sufficiently long time window
for 4D-Var and when we used enough ensemble members and performed the analysis sufficiently
frequently in a 4D version (described in Section 4 of this article) of our LETKF. In atmospheric data
assimilation, ensemble Kalman filtering has not yet equaledthe best results using 4D-Var, but it has
begun to achieve results that compare favorably with operational 3D-Var results [22, 46, 34]; see
also Section 5.
Perhaps the most important difference between ensemble Kalman filtering and the other methods
described above is that the former quantifies uncertainty only in the space spanned by the ensemble.
Assuming that computational resources restrict the numberof ensemble membersk to be much
smaller than the number of model variablesm, this can be a severe limitation. On the other hand, if
this limitation can be overcome (see the section on “Localization” below), then the analysis can be
performed in a much lower-dimensional space (k versusm). Thus, ensemble Kalman filtering has
the potential to be more computationally efficient than the other methods. Indeed, the main point
of this article is to describe how to do ensemble Kalman filtering efficiently without sacrificing
accuracy.
Notation. We start with an ensemble{xa(i)n−1
: i = 1,2, . . . ,k} of m-dimensional model state vectors
at timetn−1. One approach would be to let one of the ensemble members represent the best estimate
of the system state, but here we assume the ensemble to be chosen so that its average represents
the analysis state estimate. We evolve each ensemble memberaccording to the nonlinear model to
obtain a background ensemble{xb(i)n : i = 1,2, . . . ,k} at timetn:
xb(i)n = Mtn−1,tn
(xa(i)n−1
).
For the rest of this article, we will discuss what to do at the analysis timetn, and so we now drop the
subscriptn. Thus, for example,H andR will represent respectively the observation operator and the
observation error covariance matrix at the analysis time. Let ℓ be the number of scalar observations
used in the analysis.
For the background state estimate and its covariance, we usethe sample mean and covariance of
the background ensemble:
xb = k−1k
∑i=1
xb(i),
9
![Page 10: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/10.jpg)
Pb = (k−1)−1k
∑i=1
(xb(i)− xb)(xb(i)− xb)T = (k−1)−1Xb(Xb)T , (12)
whereXb is them× k matrix whoseith column isxb(i) − xb. (Notice that the rank ofPb is equal
to the rank ofXb, which is at mostk−1 because the sum of its columns is0. Thus, the ensemble
size limits the rank of the background covariance matrix.) The analysis must determine not only an
state estimatexa and covariancePa, but also an ensemble{xa(i) : i = 1,2, . . . ,k} with the appropriate
sample mean and covariance:
xa = k−1k
∑i=1
xa(i),
Pa = (k−1)−1k
∑i=1
(xa(i)− xa)(xa(i)− xa)T = (k−1)−1Xa(Xa)T , (13)
whereXa is them×k matrix whoseith column isxa(i)− xa.
In Section 2.3, we will describe how to determinexa andPa for a (possibly) nonlinear observa-
tion operatorH in a way that agrees with the Kalman filter equations (10) and (11) in the case that
H is linear.
Choice of analysis ensemble. Oncexa andPa are specified, there are still many possible choices
of an analysis ensemble (or equivalently, a matrixXa that satisfies (13) and the sum of whose
columns is zero). Many ensemble Kalman filters have been proposed, and one of the main differ-
ences among them is how the analysis ensemble is chosen. The simplest approach is to apply the
Kalman filter update (10) separately to each background ensemble member (rather than their mean)
to get the corresponding analysis ensemble member. However, this results in an analysis ensemble
whose sample covariance is smaller than the analysis covariancePa given by (11), unless the obser-
vations are artificially perturbed so that each ensemble member is updated using different random
realization of the perturbed observations [5, 20]. Ensemble square-root filters [1, 45, 4, 43, 36, 37]
instead use more involved but deterministic algorithms to generate an analysis ensemble with the
desired sample mean and covariance. As such, their analysescoincide exactly with the Kalman filter
equations in the linear scenario of the previous section. Wewill use this deterministic approach in
Section 2.3.
Localization. Another important issue in ensemble Kalman filtering of spatiotemporally chaotic
systems is spatial localization. If the ensemble hask members, then the background covariance
matrixPb given by (12) describes nonzero uncertainty only in the (at most)k-dimensional subspace
spanned by the ensemble, and a global analysis will allow adjustments to the system state only in
this subspace. If the system is high-dimensionally unstable — more precisely, if it has more than
k positive Lyapunov exponents — then forecast errors will grow in directions not accounted for by
10
![Page 11: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/11.jpg)
the ensemble, and these errors will not be corrected by the analysis. On the other hand, in a suffi-
ciently small local region, the system may behave like a low-dimensionally unstable system driven
by the dynamics in neighboring regions; such behavior was observed for a global weather model
in [39, 35]. Performing the analysis locally requires the ensemble to represent uncertainty in only
the local unstable space. By allowing the local analyses to choose different linear combinations of
the ensemble members in different regions, the global analysis is not confined to thek-dimensional
ensemble space and instead explores a much higher-dimensional space [12, 36, 37]. Another ex-
planation for the necessity of localization for spatiotemporally chaotic systems is that the limited
sample size provided by an ensemble will produce spurious correlations between distant locations
in the background covariance matrixPb [20, 17]. Unless they are suppressed, these spurious cor-
relations will cause observations from one location to affect, in an essentially random manner, the
analysis in locations an arbitrarily large distance away. If the system has a characteristic “correla-
tion distance”, then the analysis should ignore ensemble correlations over much larger distances. In
addition to providing better results in many cases, localization allows the analysis to be done more
efficiently as a parallel computation [30, 36, 37].
Localization is generally done either explicitly, considering only the observations from a region
surrounding the location of the analysis [29, 20, 30, 1, 36, 37], or implicitly, by multiplying the
entries inPb by a distance-dependent function that decays to zero beyonda certain distance, so
that observations do not affect the model state beyond that distance [21, 17, 45]. We will follow the
explicit approach here, doing a separate analysis for each spatial grid point of the model. (Our use of
“grid point” assumes the model to be a discretization of a partial differential equation, or otherwise
to be defined on a lattice, but the method is also applicable tosystems with other geometries.) The
choice of which observations to use for each grid point is up to the user of the method, and a good
choice will depend both on the particular system being modeled and on the size of the ensemble
(more ensemble members generally allow more distant observations to be used gainfully). It is
important, however, that most of the observations used in the analysis at a particular grid point also
be used in the analysis at neighboring grid points. This ensures that the analysis ensemble does not
change suddenly from one grid point to the next. For an atmospheric model, a reasonable approach
is to use observations within a cylinder of a given radius andheight centered at the analysis grid
point and to determine empirically which values of the radius and height work best. At its simplest,
the method we describe gives all of the chosen observations equal influence on the analysis, but we
will also describe how to make their influence decay gradually toward zero as their distance from
the analysis location increases.
Initial background ensemble. A common method for generating a background ensemble to use
at the first analysis time is to run the model for a while and to select model states at different ran-
11
![Page 12: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/12.jpg)
domly chosen times. The intent of this method is for the initial background ensemble to be sampled
from a climatological distribution. This is a reasonable choice for the background distribution when
no prior observations are available.
2.3 LETKF: A Local Ensemble Transform Kalman Filter
We now describe an efficient means of performing the analysisthat transforms a background en-
semble{xb(i) : i = 1,2, . . . ,k} into an appropriate analysis ensemble{xa(i) : i = 1,2, . . . ,k}, using
the notation defined above. We assume that the number of ensemble membersk is smaller than both
the number of model variablesm and the number of observationsℓ,2 even when localization has
reduced the effective values ofm andℓ considerably compared to a global analysis. (In this section
we assume that the choice of observations to use for the localanalysis has been performed already,
and consideryo, H, andR to be truncated to these observations; as such, correlations between errors
in the chosen observations and errors in other observationsare ignored.) Most of the analysis takes
place in ak-dimensional space, with as few operations as possible in the model and observation
spaces.
Formally, we want the analysis meanxa to minimize the Kalman filter cost function (7), modified
to allow for a nonlinear observation operatorH:
J(x) = (x− xb)T(Pb)−1(x− xb)+ [yo−H(x)]TR−1[yo−H(x)]. (14)
However, them×m background covariance matrixPb = (k−1)−1Xb(Xb)T has rank at mostk−1,
and is therefore not invertible. Nonetheless, as a symmetric matrix, it is one-to-one on its col-
umn spaceS, which is also the column space ofXb, or in other words the space spanned by the
background ensemble perturbations. So in this sense,(Pb)−1 is well-defined onS. ThenJ is also
well-defined forx− xb in S, and the minimization can be carried out in this subspace.3 As we have
said, this reduced dimensionality is an advantage from the point of view of efficiency, though the
restriction of the analysis mean toS is sure to be detrimental ifk is too small.
To perform the analysis onS, we must choose an appropriate coordinate system. A natural
approach is to use the singular vectors ofXb (the eigenvectors ofPb) to form a basis forS[1, 36, 37].
Here we avoid this step by using instead the columns ofXb to spanS, as in [4]. One conceptual
difficulty in this approach is that the sum of these columns iszero, so they are necessarily linearly2This assumption is only expository; our algorithm does not require an upper bound onk, but it is less efficient than
doing the analysis in model space ifm< k or in observation space ifℓ < k.3Considerably more general cost functions can be used, at theexpense of having to perform the minimization
numerically in the ensemble spaceS. Since this space is relatively low-dimensional, this approach is still feasible even
for high-dimensional systems. In [18] we use a non-quadratic background term within the LETKF framework, and [48]
uses a similar approach to allow a non-quadratic observation term.
12
![Page 13: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/13.jpg)
dependent. We could assume the firstk−1 columns to be independent and use them as a basis, but
this assumption is unnecessary and clutters the resulting equations. Instead, we regardXb as a linear
transformation from ak-dimensional spaceS ontoS, and perform the analysis inS. Let w denote
a vector inS; thenXbw belongs to the spaceSspanned by the background ensemble perturbations,
andx = xb+Xbw is the corresponding model state.
Notice that if w is a Gaussian random vector with mean0 and covariance(k− 1)−1I , then
x = xb+Xbw is Gaussian with meanxb and covariancePb = (k−1)−1Xb(Xb)T . This motivates the
cost function
J(w) = (k−1)wTw+[yo−H(xb+Xbw)]TR−1[yo−H(xb+Xbw)] (15)
on S. In particular, we claim that ifwa minimizes J, then xa = xb + Xbwa minimizes the cost
functionJ. Substituting the change of variables formula into (14) andusing (12) yields the identity
J(w) = (k−1)wT(I − (Xb)T [Xb(Xb)T ]−1Xb)w+J(xb+Xbw). (16)
The matrix I − (Xb)T [Xb(Xb)T ]−1Xb is the orthogonal projection onto the null spaceN of Xb.
(GenerallyN will be one-dimensional, spanned by the vector(1,1, . . . ,1)T , but it could be higher-
dimensional.) Thus, the first term on the right side of (16) depends only on the component ofw in
N, while the second term depends only on its component in the space orthogonal toN (which is in
one-to-one correspondence withSunderXb). Thus if wa minimizesJ, then it must be orthogonal
to N, and the corresponding vectorxa minimizesJ.
A cost function equivalent to (15) appears in [33]. More generally, implementations of 3D-Var
and 4D-Var commonly use a preconditioning step that expresses the cost function in a form similar
to (15).
Nonlinear observations. The most accurate way to allow for a nonlinear observation operatorH
would be to numerically minimizeJ in the k-dimensional spaceS, as in [48]. If H is sufficiently
nonlinear, thenJ could have multiple minima, but a numerical minimization using w = 0 (corre-
sponding tox = xb) as an initial guess would still be a reasonable approach. Having determinedwa
in this manner, one would compute the analysis covariancePa in S from the second partial deriva-
tives of J at wa, then useXb to transform the analysis results into the model space, as below. But
to formulate the analysis more explicitly, we now linearizeH about the background ensemble mean
xb. Of course, ifH is linear then we will find the minimum ofJ exactly. And if the spread of the
background ensemble is not too large, the linearization should be a decent approximation, simi-
lar to the approximation we have already made that a linear combination of background ensemble
members is also a plausible background model state.
13
![Page 14: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/14.jpg)
Since we only need to evaluateH in the ensemble space (or equivalently to evaluateH(xb +
Xbw) for w in S), the simplest way to linearizeH is to apply it to each of the ensemble members
xb(i) and interpolate. To this end, we define an ensembleyb(i) of background observation vectors by
yb(i) = H(xb(i)). (17)
We define also their meanyb, and theℓ×k matrixYb whoseith column isyb(i)− yb. We then make
the linear approximation
H(xb+Xbw) ≈ yb+Ybw. (18)
The same approximation is used in, for example, [21], and is equivalent to the joint state-observation
space method in [1].
Analysis. The linear approximation we have just made yields the quadratic cost function
J∗(w) = (k−1)wTw+[yo− yb−Ybw]TR−1[yo− yb−Ybw]. (19)
This cost function is in the form of the Kalman filter cost function (7), using the background mean
wb = 0 and background covariancePb = (k−1)−1I , with Yb playing the role of the observation
operator. The analogues of the analysis equations (10) and (11) are then
wa = Pa(Yb)TR−1(yo− yb), (20)
Pa = [(k−1)I +(Yb)TR−1Yb]−1. (21)
In model space, the analysis mean and covariance are then
xa = xb+Xbwa, (22)
Pa = XbPa(Xb)T . (23)
To initialize the ensemble forecast that will produce the background for the next analysis, we must
choose an analysis ensemble whose sample mean and covariance are equal toxa andPa. As men-
tioned above, this amounts to choosing a matrixXa so that the sum of its columns is zero and (13)
holds. Then one can form the analysis ensemble by addingxa to each of the columns ofXa.
Symmetric square root. Our choice of analysis ensemble is described byXa = XbWa, where
Wa = [(k−1)Pa]1/2 (24)
and by the 1/2 power of a symmetric matrix we mean its symmetric square root. Then Pa =
(k−1)−1Wa(Wa)T , and (13) follows from (23). The use of the symmetric square root to deter-
mineWa from Pa (as compared to, for example, a Cholesky factorization, or the choice described
14
![Page 15: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/15.jpg)
in [4]), is important for two main reasons. First, as we will see below, it ensures that the sum of
the columns ofXa is zero, so that the analysis ensemble has the correct samplemean (this is also
shown for the symmetric square root in [44]). Second, it ensures thatWa depends continuously on
Pa; while this may be a desirable property in general, it is crucial in a local analysis scheme, so
that neighboring grid points with slightly different matricesPa do not yield very different analysis
ensembles. Another potentially desirable property of the symmetric square root is that it minimizes
the (mean-square) distance betweenWa and the identity matrix, so that the analysis ensemble per-
turbations are in this sense as close as possible to the background ensemble perturbations subject to
the constraint on their sample covariance [36, 37].
To see that the sum of the columns ofXa is zero, we express this condition asXav = 0, wherev
is a column vector ofk ones:v = (1,1, . . . ,1)T . Notice that by (21),v is an eigenvector ofPa with
eigenvalue(k−1)−1:
(Pa)−1v = [(k−1)I +(Yb)TR−1Yb]v = (k−1)v,
because the sum of the columns ofYb is zero. Then by (24),v is also an eigenvector ofWa with
eigenvalue 1. Since the sum of the columns ofXb is zero,Xav = XbWav = Xbv = 0 as desired.
Finally, notice that we can form the analysis ensemble first in S by addingwa to each of the
columns ofWa; let {wa(i)} be the columns of the resulting matrix. These “weight” vectors specify
what linear combinations of the background ensemble perturbations to add to the background mean
to obtain the analysis ensemble in model space:
xa(i) = xb+Xbwa(i). (25)
Local implementation. Notice that once the background ensemble has been used to form yb
and Yb, it is no longer needed in the analysis, except in (25) to translate the results fromS to
model space. This point is useful to keep in mind when implementing a local filter that computes
a separate analysis for each model grid point. In principle,one should form a global background
observation ensembleyb(i)[g]
from the global background vectors, though in practice thiscan be done
locally when the global observation operatorH[g] uses local interpolation. After the background
observation ensemble is formed, the analyses at different grid points are completely independent
of each other and can be computed in parallel. The observations chosen for a given local analysis
dictate which coordinates ofyb(i)[g]
are used to form the local background observation ensembleyb(i),
and the analysis inSproduces the local weight vectors{wa(i)}. Computing the analysis ensemble
{xa(i)} for the analysis grid point using (25) then requires only using the background model states
at that grid point.
As long as the sets of observations used for a pair of neighboring grid points overlap heavily, the
local weight vectors{wa(i)} for the two grid points are similar. In a region over which theweight
15
![Page 16: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/16.jpg)
vectors do not change much, the analysis ensemble members are approximately linear combinations
of the background ensemble members, and thus should represent reasonably “physical” initial con-
ditions for the forecast model. However, if the model requires of its initial conditions high-order
smoothness and/or precise conformance to an conservation law, it may be necessary to post-process
the analysis ensemble members to smooth them and/or projectthem onto the manifold determined
by the conserved quantities before using them as initial conditions (this procedure is often called
“balancing” in geophysical data assimilation).
In other localization approaches [21, 17, 45], the influenceof an observation at a particular point
on the analysis at a particular model grid point decays smoothly to zero as the distance between the
two points increases. A similar effect can be achieved here by multiplying the entries in the inverse
observation error covariance matrixR−1 by a factor that decays from one to zero as the distance of
the observations from the analysis grid point increases. This “smoothed localization” corresponds
to gradually increasing the uncertainty assigned to the observations until beyond a certain distance
they have infinite uncertainty and therefore no influence on the analysis.
Covariance inflation. In practice, an ensemble Kalman filter that adheres strictlyto the Kalman
filter equations (10) and (11) may fail to synchronize with the “true” system trajectory that produces
the observations. One reason for this is model error, but even with a perfect model, the filter tends
to underestimate the uncertainty in its state estimate [45]. Regardless of the cause, underestimating
the uncertainty leads to overconfidence in the background state estimate, and, hence, the analysis
underweights the observations. If the discrepancy becomestoo large over time, the observations
are essentially ignored by the analysis, and the dynamics ofthe data assimilation system become
decoupled from the truth.
Generally this tendency is countered by anad hocprocedure (with at least one tunable pa-
rameter) that inflates either the background covariance or the analysis covariance during each data
assimilation cycle. (Doing this is analogous to adding a model error covariance term to the right side
of (6), as is usually done in the Kalman filter.) One “hybrid” approach adds a multiple of the back-
ground covariance matrixB from the 3D-Var method to the ensemble background covariance prior
to the analysis [16]. “Multiplicative inflation” [2, 17] instead multiplies the background covariance
matrix (or equivalently, the perturbations of the background ensemble members from their mean)
by a constant factor larger than one. “Additive inflation” adds a small multiple of the identity matrix
to either the background covariance or the analysis covariance during each cycle [36, 37]. Finally,
if one chooses the analysis ensemble in such a way that each member has a corresponding member
of the background ensemble, then one can inflate the analysisensemble by “relaxation” toward the
background ensemble: replacing each analysis perturbation from the mean by a weighted average
of itself and the corresponding background perturbation [47].
16
![Page 17: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/17.jpg)
Within the framework described in this article, the hybrid approach is not feasible because it re-
quires the analysis to consider uncertainty outside the space spanned by the background ensemble.
However, once the analysis ensemble is formed, one could develop a means of inflating it in direc-
tions (derived from the 3D-Var background covariance matrix B or otherwise) outside the ensemble
space so that uncertainty in these directions is reflected inthe background ensemble at the next anal-
ysis step. Additive inflation is feasible, but requires substantial additional computation to determine
the adjustment necessary in thek-dimensional spaceS that corresponds to adding a multiple of the
identity matrix to the model space covariancePb or Pa. Relaxation is simple to implement, and is
most efficiently done inSby replacingWa with a weighted average of it and the identity matrix.
Multiplicative inflation can be performed most easily on theanalysis ensemble by multiplying
Wa by an appropriate factor (namely√ρ , if one wants to multiply the analysis covariance byρ). To
perform multiplicative inflation on the background ensemble instead, one can multiplyXb by such
a factor, and adjust the background ensemble{xb(i)} accordingly before applying the observation
operatorH to form the background observation ensemble{yb(i)}. A more efficient approach, which
is equivalent ifH is linear, is simply to replace(k−1)I by (k−1)I/ρ in (21), since(k−1)I is the
inverse of the background covariance matrixPb in thek-dimensional spaceS. One can check that
this has the same effect on the analysis meanxa and covariancePa as multiplyingXb andYb by√ρ . If ρ is close to one, this is a good approximation to inflating the background ensemble before
applying the observation operator even when this operator is nonlinear.
Multiplicative inflation of the background covariance can be thought of as applying a discount
factor to the influence of past observations on the current analysis. Since this discount factor is
applied during each analysis, the cumulative effect is thatthe influence of an observation on future
analyses decays exponentially with time. The inflation factor determines the time scale over which
observations have a significant influence on the analysis. Other methods of covariance inflation
have a similar effect, causing observations from sufficiently far in the past essentially to be ignored.
Thus, covariance inflation localizes the analysis in time. This effect is especially desirable in the
presence of model error, because then the model can only reliably propagate information provided
by the observations for a limited period of time.
3 Efficient Computation of the Analysis
Here is a step-by-step description of how to perform the analysis described in the previous section,
designed for efficiency both in ease of implementation and inthe amount of computation and mem-
ory usage. Of course there are some trade-offs between theseobjectives, so in each step we first
describe the simplest approach and then in some cases mention alternate approaches and possible
gains in computational efficiency. We also give a rough accounting of the computational complexity
17
![Page 18: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/18.jpg)
of each step, and at the end discuss the overall computational complexity. After that, we describe
an approach that in some cases will produce a significantly faster analysis, at the expense of more
memory usage and more difficult implementation, by reorganizing some of the steps. As before, we
use “grid point” in this section to mean a spatial location inthe forecast model, whether or not the
model is actually based on a grid geometry; we use “array” to mean a vector or matrix. The use of
“columns” and “rows” below is for exposition only; one should of course store arrays in whatever
manner is most efficient for one’s computing environment.
The inputs to the analysis are a background ensemble ofm[g]-dimensional model state vectors
{xb(i)[g]
: i = 1,2, . . . ,k}, a functionH[g] from them[g]-dimensional model space to theℓ[g]-dimensional
observation space, anℓ[g]-dimensional vectoryo[g] of observations, and anℓ[g]×ℓ[g] observation error
covariance matrixR[g]
. The subscriptg here signifies that these inputs reflect the global model state
and all available observations, from which a local subset should be chosen for each local analysis.
How to choose which observations to use is entirely up to the user of this method, but a reasonable
general approach is to choose those observations made within a certain distance of the grid point at
which one is doing the local analysis and determine empirically which value of the cutoff distance
produces the “best” results. If one deems localization to beunnecessary in a particular application,
then one can ignore the distinction between local and global, and skip Steps 3 and 9 below.
Steps 1 and 2 are essentially global operations, but may be done locally in a parallel implemen-
tation. Steps 3–8 should be performed separately for each local analysis (generally this means for
each grid point, but see the parenthetical comment at the endof Step 3). Step 9 simply combines
the results of the local analyses to form a global analysis ensemble{xa(i)[g]
}, which is the final output
of the analysis.
1. Apply H[g] to eachxb(i)[g]
to form the global background observation ensemble{yb(i)[g]
}, and
average the latter vectors to get theℓ[g]-dimensional column vectoryb[g]. Subtract this vector
from each{yb(i)[g]
} to form the columns of theℓ[g] × k matrix Yb[g]. (This subtraction can be
done “in place”, since the vectors{yb(i)[g]
} are no longer needed.) This requiresk applications
of H, plus 2kℓ[g] (floating-point) operations. IfH is an interpolation operator that requires
only a few model variables to compute each observation variable, then the total number of
operations for this step is proportional tokℓ[g] times the average number of model variables
required to compute each scalar observation.
2. Average the vectors{xb(i)[g]
} to get the m[g]-dimensional vectorxb[g], and subtract this vector
from eachxb(i)[g]
to form the columns of the m[g]×k matrixXb
[g]. (Again the subtraction can be
done “in place”; the vectors{xb(i)[g]
} are no longer needed). This step requires a total of 2km[g]
operations. (IfH is linear, one can equivalently perform Step 2 before Step 1,and obtainyb[g]
18
![Page 19: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/19.jpg)
andYb[g] by applyingH to xb
[g] andXb[g].)
3. This step selects the necessary data for a given grid point(whether it is better to form the
local arrays described below explicitly or select them later as needed from the global arrays
depends on one’s implementation).Select the rows ofxb[g] andXb
[g] corresponding to the given
grid point, forming their local counterparts: the m-dimensional vectorxb and the m×k matrix
Xb, which will be used in Step 8. Likewise, select the rows ofyb[g] andYb
[g] corresponding to the
observations chosen for the analysis at the given grid point, forming theℓ-dimensional vector
yb and theℓ× k matrixYb. Select the corresponding rows ofyo[g] and rows and columns of
R[g]
to form theℓ-dimensional vectoryo and theℓ×ℓ matrixR. (For a high-resolution model,
it may be reasonable to use the same set of observations for multiple grid points, in which
case one should select here the rows ofXb[g] andxb
[g] corresponding to all of these grid points.)
4. Compute the k× ℓ matrix C = (Yb)TR−1. If desired, one can multiply entries ofR−1 or C
corresponding to a given observation by a factor less than one to decrease (or greater than one
to increase) its influence on the analysis. (For example, onecan use a multiplier that depends
on distance from the analysis grid point to discount observations near the edge of the local
region from which they are selected; this will smooth the spatial influence of observations, as
described in Section 2.3 under “Local Implementation”.) Since this is the only step in whichR
is used, it may be most efficient to computeC by solving the linear systemRCT = Yb rather
than invertingR. In some applications,R may be diagonal, but in othersR will be block
diagonal with each block representing a group of correlatedobservations. As long as the size
of each block is relatively small, invertingR or solving the linear system above will not be
computationally expensive. Furthermore, many or all of theblocks that make upR may be
unchanged from one analysis time to the next, so that their inverses need not be recomputed
each time. Based on these considerations, the number of operations required (at each grid
point) for this step in a typical application should be proportional tokℓ, multiplied by a factor
related to the typical block size ofR.
5. Compute the k×k matrixPa =[
(k−1)I/ρ +CYb]−1
, as in (21).Hereρ > 1 is a multiplica-
tive covariance inflation factor, as described at the end of the previous section. Though trying
some of the other approaches described there may be fruitful, a reasonable general approach
is to start withρ > 1 and increase it gradually until one finds a value that is optimal according
to some measure of analysis quality. MultiplyingC andYb requires less than 2k2ℓ operations,
while the number of operations needed to invert thek×k matrix is proportional tok3.
6. Compute the k× k matrixWa = [(k−1)Pa]1/2, as in (24). Again the number of operations
required is proportional tok3; it may be most efficient to compute the eigenvalues and eigen-
19
![Page 20: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/20.jpg)
vectors of[
(k−1)I/ρ +CYb]
in the previous step and then use them to compute bothPa and
Wa.
7. Compute the k-dimensional vectorwa = PaC(yo− yb), as in (20), and add it to each column
of Wa, forming a k× k matrix whose columns are the analysis vectors{wa(i)}. Computing
the formula forwa from right-to-left, the total number of operations required for this step is
less than 3k(ℓ+k).
8. Multiply Xb by eachwa(i) and addxb to get the analysis ensemble members{xa(i)} at the
analysis grid point, as in (25).This requires 2k2m operations.
9. After performing Steps 3–8 for each grid point, the outputs of Step 8 form the global analysis
ensemble{xa(i)[g]
}.
We now summarize the overall computational complexity of the algorithm described above. If
p is the number local analyses performed (equal to the number of grid points in the most basic
approach), then notice thatpm= m[g]
, while pℓ = qℓ[g]
, whereℓ is the average number of observa-
tions used in a local analysis andq is the average number of local analyses in which a particular
observation is used. Ifℓ is large compared tok andm, then the most computationally expensive step
is either Step 5, requiring approximately 2k2pℓ = 2k2qℓ[g]
operations over all the local analyses,
or Step 4, whose overall number of operations is proportional to kpℓ = kqℓ[g], but with a propor-
tionality constant dependent on the correlation structureof R[g]
. In any case, as long as the typical
number of correlated observations in a block ofR[g] remains constant, the overall computation time
grows at most linearly with the total numberℓ[g]
of observations. It also grows at most linearly with
the total numberm[g] of model variables; ifm[g] is large enough compared toℓ[g], then the most
expensive step is Step 8, with 2k2m[g]
overall operations. The terms in the computation time that
grow with the number of observations or number of model variables are at most quadratic in the
numberk of ensemble members. However, for a sufficiently large ensemble, the matrix operations
in Steps 5 and 6 that take of orderk3 operations per local analysis, ork3p operations overall, will
become significant.
In Section 5, we present some numerical results for which we find the computation time indeed
grows roughly quadratically withk, linearly withq, and sublinearly withℓ[g].
Batch processing of observations. Some of the steps above have aq-fold redundancy, in that
computations involving a given observation are repeated over an average ofq different local analy-
ses. For a general observation error covariance matrixR[g]
this redundancy may be unavoidable, but
it can be avoided as described below if the global observations can be partitioned into local groups
(or “batches”) numbered 1,2, . . . that meet the following conditions. First, all of the observations in
20
![Page 21: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/21.jpg)
a given batch must be used in the exact same subset of the localanalyses. Second, observations in
different batches must have uncorrelated errors, so that each batchj corresponds to a blockR j in
a block diagonal decomposition ofR[g]. (These conditions can always met ifR[g] is diagonal, by
making each batch consist of a single observation. However,as explained below, for efficiency one
should make the batches as large as possible while still meeting the first condition.) Then at Step 3,
instead of selecting (overlapping) submatrices ofyb[g], Yb
[g], yo[g], andR[g], for each grid point, letyb
j ,
Ybj , yo
j , represent the rows corresponding to the observations in batch j, and do the following for
each batch. Compute and store thek× k matrix C jYbj and thek-dimensional vectorC j(y
oj − yb
j ),
whereC j = (Ybj )
TR−1j as in Step 4. (This can be done separately for each batch, in parallel, and
the total number of operations is roughly 2k2ℓ[g].) Then do Steps 5–8 separately for each local
analysis; whenCYb andC(yo− yb) are required in Steps 5 and 7, compute them by summing the
corresponding arraysC jYbj andC j(y
oj − yb
j ) over the batchesj of observations that are used in the
local analysis. To avoid redundant addition in these steps,batches that are used in exactly the same
subset of the local analyses should be combined into a singlebatch. The total number of operations
required by the summations over batches roughlyk2ps, wheres is the average number of batches
used in each local analysis. Both this and the 2k2ℓ[g]
operations described before are smaller than
the roughly 2k2pℓ = 2k2qℓ[g] operations they combine to replace.
This approach has similarities with the “sequential” approach of [21] and [45], in which ob-
servations are divided into uncorrelated batches and a separate analysis is done for each batch; the
analysis is done in the observation space whose dimension isthe number of observations in a batch.
However, in the sequential approach, the analysis ensemblefor one batch of observations is used
as the background ensemble for the next batch of observations. Since batches with disjoint local
regions of influence can be analyzed separately, some parallelization is possible, though the LETKF
approach described above is more easily distributed over a large number of processors. For a serial
implementation, either approach may be faster depending onthe application and the ensemble size.
4 Asynchronous Observations: 4D-LETKF
In theory, one can perform a new analysis each time new observations are made. In practice, this
is a good approach if observations are made at regular and sufficiently infrequent time intervals.
However, in many applications, such as weather forecasting, observations are much too frequent
for this approach. Imagine, for example, a 6-hour interval between analyses, like at the National
Weather Service. Since weather can change significantly over such a time interval, it is important
to consider observations taken at intermediate times in a more sophisticated manner than to pretend
that they occur at the analysis time (or to simply ignore them). Operational versions of 3D-Var and
4D-Var (see Section 2.2) do take into account the timing of the observations, and one of the primary
21
![Page 22: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/22.jpg)
strengths of 4D-Var is that it does so in a precise manner, by considering which forecast model
trajectory best fits the observations over a given time interval (together with assumed background
statistics at the start of this interval).
We have seen that the analysis step in an ensemble Kalman filter considers model states that
are linear combinations of the background ensemble states at the analysis time, and compares these
model states to observations taken at the analysis time. Similarly, we can consider approximate
model trajectories that are linear combinations of the background ensemble trajectories over an
interval of time, and compare these approximate trajectories with the observations taken over that
time interval. Instead of asking which model trajectory best fits the observations, we ask which
linear combination of the background ensemble trajectories best fits the observations. As before, this
is relatively a low-dimensional optimization problem thatis much more computationally tractable
than the full nonlinear problem.
This approach is similar to that of an ensemble Kalman smoother [10, 8], but over a much
shorter time interval. As compared to a “filter”, which estimates the state of a system at time
t using observations made up to timet, a “smoother” estimates the system state at timet using
observations made before and after timet. Over a long time interval, one must generally take a more
sophisticated approach to smoothing than to simply consider linear combinations of an ensemble of
trajectories generated over the entire interval, both because the trajectories may diverge enough that
linear combinations of them will not approximate model trajectories, and because in the presence of
model error there may be no model trajectory that fits the observations over the entire interval. Over
a sufficiently short time interval however, the approximation of true system trajectories by linear
combinations of model trajectories with similar initial conditions is quite reasonable.
While this approach to assimilating asynchronous observations is suitable for any ensemble
Kalman filter [23], it is particularly simple to implement inthe LETKF framework. We call this
extension 4D-LETKF; see [19] for an alternate derivation ofthis algorithm.
To be more concrete, suppose that we have observations(τ j ,yoτ j
) taken at various timesτ j since
the previous analysis. LetHτ jbe the observation operator for timeτ j and letRτ j
be the error
covariance matrix for these observations. In Section 2.3, we mapped a vectorw in thek-dimensional
spaceS into observation space using the formulayb + Ybw, where the background observation
meanyb and perturbation matrixYb were formed by applying the observation operatorH to the
background ensemble at the analysis time. So now, for each timeτ j , we applyHτ jto the background
ensemble at timeτ j , calling the mean of the resulting vectorsybτ j
and forming their differences from
the mean into the matrixYbτ j
.
We now form a combined observation vectoryo by concatenating (vertically) the (column) vec-
torsyoτ j
, and similarly by vertical concatenation of the vectorsybτ j
and matricesYbτ j
respectively, we
form the combined background observation meanyb and perturbation matrixYb. We form the cor-
22
![Page 23: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/23.jpg)
responding error covariance matrixR as a block diagonal matrix with blocksRτ j(this assumes that
observations taken at different times have uncorrelated errors, though such correlations if present
could be included inR).
Given this notation, we can then use the same analysis equations as in the previous sections,
which are based on minimizing the cost functionJ∗ given by (19). (We could instead write down
the appropriate analogue to (15) and minimize the resultingnonlinear cost functionJ; this would
be no harder in principle than in the case of synchronous observations.) Referring to Section 3,
the only change is in Step 1, which one should perform for eachobservation timeτ j (using the
background ensemble and observation operator for that time) and then concatenate the results as
described above. Step 2 still only needs to be done at the analysis time, since its output is used only
in Step 8 to form the analysis ensemble in model space. All of the intermediate steps work exactly
the same, in terms of the output of Step 1.
In practice, the model will be integrated with a discrete time step that in general will not coincide
with the observation timesτ j . One should either interpolate the background ensemble trajectories
to the observation times, or simply round the observation times off to the nearest model integration
time. In either case, one must either store the background ensemble trajectories until the analysis
time, or perform Step 1 of Section 3 during the model integration and store its results. The latter
approach will require less storage if the number of observations per model integration time step is
less than the number of model variables.
One can perform localization in the same manner as with synchronous observations, but it may
be advantageous to take into account the timing of the observations when deciding which of them
to use in a given local analysis. For example, due to spatial propagation in the model dynamics,
one may wish to include earlier observations from a greater distance than later observations. On the
other hand, earlier observations may be less useful than observations closer to the analysis time due
to model error; it may help then to decrease the influence of the earlier observations as described in
Step 4 of Section 3.
5 Numerical Experiments with Real Atmospheric Observations
We have implemented the 4D-LETKF algorithm, as described inSections 3 and 4, with the oper-
ational Global Forecast System (GFS) model [14] of the U.S. National Centers for Environmental
Prediction (NCEP). This model is used (currently at higher resolution than we describe below) for
National Weather Service forecasts. Previously we have published results using this model with the
LEKF algorithm of [36, 37], in a perfect model scenario (withsimulated observations) [41]. Using
the same parameters for the LETKF algorithm, we have obtained results very similar to those in
[41], which we do not repeat here; with LETKF, the data assimilation steps run 3 to 5 times as fast
23
![Page 24: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/24.jpg)
as with LEKF.
Here, we present some preliminary results obtained using the 4D-LETKF algorithm with the
same model and real atmospheric observations collected in January and February 2004, and com-
pare them with results from the NCEP Spectral Statistical Interpolation (SSI) [15], a state-of-the-art
implementation of 3D-Var. Further results will appear in a future publication. Our data set in-
cludes all operationally assimilated observations exceptfor satellite radiances and measurements
of atmospheric humidity. Observations include vertical sounding profiles of temperature and wind
by weather balloons, surface pressure observations by landand sea stations, temperature and wind
reports by commercial aircraft, and wind vectors derived from satellite based observation of clouds.
For all of the results below, we assimilate observations every 6 hours, and we use a model
resolution of T62 (a 192×94 longitude-latitude grid) with 28 vertical levels, for a total of about
500,000 points. In our 4D-LETKF implementation, for each grid point, we selected observations
from within ah×h×v subset of the model grid, centered at the analysis grid point, with the vertical
heightvvarying (depending on the vertical level) from 1 to 7 grid points as in [41], and the horizontal
width h held constant for each experiment at either 5 or 7 grid points. The number of ensemble
membersk we use in each experiment is either 40 or 80. In all 4D-LETKF experiments we used
a spatially-dependent multiplicative covariance inflation factorρ , which we taper from 1.15 at the
surface to 1.1 at the top of the model atmosphere in the Southern Hemisphere, and from 1.25 at the
surface to 1.15 at the top in the Northern Hemisphere (between 30◦S and 30◦N latitudes, we linearly
interpolate between these values).
5.1 Analysis Quality
In this section, we compare the analyses from our 4D-LETKF implementation, usingk = 40 ensem-
ble members and a 7×7 horizontal grid (h= 7) for each local region, and from the NCEP SSI, using
the same model resolution and the same observational data set for its global analysis (we call this the
“benchmark” SSI analysis). To estimate the analysis error for a given state variable, we compute
the spatial RMS difference between its analysis and the operational high-resolution SSI analysis
computed by NCEP (we call this the “verification” SSI analysis). While this verification technique
favors the benchmark SSI analysis, which is obtained with the same data assimilation method, it can
provide useful information in regions where the 4D-LETKF and benchmark SSI analyses exclude
a large portion of the observations assimilated into the verifying SSI analysis. Such a region is the
Southern Hemisphere, where satellite radiances are known to have a strong positive impact on the
quality of the analysis.
We initialize 4D-LETKF with a random ensemble of physicallyplausible global states at mid-
night on 1 January 2004. Specifically, we take each initial ensemble member from an operational
NCEP analysis at a randomly chosen time between 15 January and 31 March 2004. The 4D-LETKF
24
![Page 25: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/25.jpg)
analyses start to synchronize with the observations after afew days. To exclude from our compari-
son the transient error due to initialization of 4D-LETKF, we average all estimated errors over the
month of February 2004 only.
Figure 1 shows the estimated analysis error of each method for temperature in the Southern
Hemisphere extratropics (20◦S to 90◦S latitudes) as a function of atmospheric pressure. The 4D-
LETKF analysis is more accurate than the benchmark SSI at allexcept near the surface, where the
two methods are quite similar in accuracy. The advantage of the 4D-LETKF analysis is especially
large in the upper atmosphere, where observations are extremely sparse. Figure 2 makes the same
comparison between the 48-hour forecasts generated from the respective analyses, again verified
against the operational high-resolution SSI analysis at the appropriate time. We see that the 4D-
LETKF forecasts are also more accurate than those from the background SSI analysis, especially in
the upper atmosphere.
atm
osph
eric
pre
ssur
e (h
Pa)
RMS analysis temperature error (K)
Figure 1: Vertical profile of the estimated analysis temperature error, in degrees Kelvin, for the 4D-
LETKF (solid) and benchmark SSI (dashed) analyses in the Southern Hemisphere extratropics. The
atmospheric height is indicated by pressure, in hectopascal. The estimate of the error is obtained by
calculating the root-mean-square difference between eachanalysis and the verifying SSI analysis
for latitudes between 20◦S and 90◦S and averaging over all the analysis times in February 2004.
For wind in the Southern Hemisphere and temperature and windin the Northern Hemisphere,
the RMS analysis and forecast errors for the 4D-LETKF and benchmark SSI are more similar to
each other, though in all cases the 4D-LETKF results are significantly better in the highest part of
the atmosphere (0 hPa to 100 hPa).
25
![Page 26: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/26.jpg)
atm
osph
eric
pre
ssur
e (h
Pa)
RMS forecast temperature error (K)
Figure 2: Estimated 48-hour forecast temperature error versus atmospheric pressure for the 4D-
LETKF (solid) and benchmark SSI (dashed) methods in the Southern Hemisphere extratropics.
The estimated forecast error is computed in the same way as the estimated analysis error shown in
Figure 1.
We emphasize that these results are obtained with modest tuning of the 4D-LETKF parameters,
and we expect further significant improvements from a more thorough exploration of the algorithm’s
parameter space, as well as a more sophisticated approach tomodel error, such as the adaptive bias-
correction technique of [3].
Qualitatively similar results with the same model and a similar data set are reported in [46], using
both an alternate implementation of the 4D-LETKF algorithm(with a different covariance inflation
approach) and a related method based on the Ensemble Square-Root Filter of [45]. The latter method
was slightly more accurate in the Northern Hemisphere and slightly less accurate in the Southern
Hemisphere, and both methods were more accurate than the corresponding benchmark SSI, when
verified both against the operational high-resolution SSI analysis and against observations. See
also [22] and [34] for comparisons of ensemble Kalman filter results to those of other operational
3D-Var methods, using forecast models from the Meteorological Service of Canada and the Japan
Meteorological Agency, respectively; the latter article also implements the LETKF approach.
5.2 Computational Speed
In this section, we present and discuss timing results from several representative analyses of the
4D-LETKF experiment using the GFS described above, with different numbers of observations. In
26
![Page 27: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/27.jpg)
addition, we vary the number of ensemble members (k = 40 or 80) and the horizontal width of
the local region (h = 5 or 7 grid points in both latitude and longitude). Though we use a parallel
implementation, we report in Table 1 below the total CPU timeused on a Linux cluster of forty 3.2
GHz Intel Xeon processors. The actual run time is many times faster; with the larger local region
(h = 7) the analysis takes about 6 minutes on our cluster withk = 40 ensemble members, and 18
minutes withk = 80. Thus, the results shown in Figures 1 and 2 can be obtained in an operational
setting that allots only a few minutes for each analysis. Furthermore, because the observations
are very nonuniformly distributed spatially, we expect to be able to reduce the parallel run time
considerably by balancing the load more evenly between processors. We will report details of our
parallel implementation in a future publication.
Table 1 shows the total CPU time in seconds for 4 different 4D-LETKF parameter sets at each
of 4 different analysis times. Different numbers of observations are available for each analysis time,
with about 50% more at 1200 GMT than at 0600 GMT. The computation time generally grows with
the number of observations, though not by as large a factor. Referring to the discussion immediately
following Steps 1 to 9 in Section 3, this indicates that the matrix multiplication portion of Step 5
that requires on the order ofk2qℓ[g]
total floating point operations is a significant component ofthe
computation time, but that other parts of the computation are significant too. (Recall thatℓ[g] is the
global number of observations andq is the average number of analyses in which each observation is
used, which in this implementation is roughly the average number of grid points per local region.)
As h increases from 5 to 7, the value ofq approximately doubles, and so does the computation
time. And ask increases from 40 to 80, the computation time grows by a factor of 4 to 5, indicating
that the time is roughly quadratic ink but suggesting that terms that are cubic ink are becoming
significant.
analysis time 0600 GMT 1800 GMT 0000 GMT 1200 GMT
# observations 159,947 193,877 236,168 245,850
k = 40,h = 5 945 945 1244 1142
k = 40,h = 7 1846 2076 2105 2200
k = 80,h = 5 4465 4453 5124 5010
k = 80,h = 7 9250 10631 10463 10943
Table 1: Total CPU time in seconds (on 3.2GHz Intel Xeon processors) for various analyses with
4D-LETKF using the GFS with approximately 500,000 model grid points. Columns represent
different analysis times, arranged in increasing order of the number of observations assimilated.
Rows represent different values of the ensemble sizek and horizontal localization widthh. Notice
that even on a single processor, all of these analyses can be done in less than real time.
27
![Page 28: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/28.jpg)
Indeed, examining the CPU time spent in various subroutineson different processors confirms
that most of the time is spent in Steps 5 and 6, and that in localanalyses where observations are
dense, the matrix multiplication in Step 5 dominates the computation time, while in local analyses
where observations are sparse, the matrix inverse and square root in Steps 5 and 6 dominate. We find
that the latter operations take more time in the analyses with the larger local region; this suggests
that the iterate eigenvalue routine we use takes longer to run in cases when the presence of more
observations causesPa to be further from a multiple of the identity matrix. There isalso some
computational overhead not accounted for in Section 3 whosecontribution to the computation time
is not negligible, in particular determining which observations to use for each local analysis.
Overall, our timing results indicate that with a model and data set of this size, a substantially
larger ensemble size than we currently use may be problematic, but that our implementation of
4D-LETKF should be able to assimilate more observations with at most linear growth in the com-
putation time. Furthermore, though we do not vary the numberof model variables in Table 1, our
examination of the time spent performing each of the steps from Section 3 suggests that we can
increase the model resolution significantly without havingmuch effect on the analysis computation
time (though the time spent running the model would of courseincrease accordingly).
6 Summary and Acknowledgments
In this article, we have described a general framework for data assimilation in spatiotemporally
chaotic systems using an ensemble Kalman filter that in its basic version (Section 3) is relatively
efficient and simple to implement. In a particular application, one may be able to improve accuracy
by experimenting with different approaches to localization (see the discussion in Sections 2.2 and
2.3), covariance inflation (see the end of Section 2.3), and/or asynchronous observations (Section 4).
For very large systems and/or when maximum efficiency is important, one should consider carefully
the comments about implementation in Section 3 (and at the end of Section 4, if applicable). One
can also apply this method to low-dimensional chaotic systems, without using localization.
In Section 5, we presented preliminary results for a relatively straightforward implementation
of the LETKF approach with real atmospheric data and an operational global forecast model. Our
results demonstrate that this implementation can produce results of operational quality within a few
minutes on a parallel computer of reasonable size. The efficiency of the basic algorithm provides
many opportunities to improve the quality of the results with the variations discussed and referred
to in this article.
Many colleagues have contributed to this article in variousways. We thank M. Cornick, E. Fer-
tig, J. Harlim, H. Li, J. Liu, T. Miyoshi, and J. Whitaker for sharing their thoughts and experiences
implementing and testing the LETKF algorithm in a variety ofscenarios. We also thank C. Bishop,
28
![Page 29: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/29.jpg)
T. Hamill, K. Ide, E. Kalnay, E. Ott, D. Patil, T. Sauer, J. Yorke, M. Zupanski, and the anonymous
reviewers for their generous input. This feedback resultedin many improvements to the exposition
in this article and to our implementation of the algorithm. We thank Y. Song, Z. Toth, and R. Wobus
for providing us with the observations and benchmark analyses used in Section 5, and we thank
G. Gyarmati for developing software to read the observations on our computers. This research was
supported by grants from NOAA/THORPEX, the J. S. McDonnell Foundation, and the National Sci-
ence Foundation (grant #ATM034225). The second author gratefully acknowledges support from
the NSF Interdisciplinary Grants in the Mathematical Sciences program (grant #DMS0408012).
References
[1] J. L. Anderson, An ensemble adjustment Kalman filter for data assimilation, Mon. Wea. Rev.
129, 2884–2903 (2001).
[2] J. L. Anderson & S. L. Anderson, A Monte Carlo implementation of the nonlinear filtering prob-
lem to produce ensemble assimilations and forecasts, Mon. Wea. Rev.127, 2741–2758 (1999).
[3] S.-J. Baek, B. R. Hunt, E. Kalnay, E. Ott, I. Szunyogh, Local ensemble Kalman filtering in the
presence of model bias, Tellus A58, 293–306 (2006).
[4] C. H. Bishop, B. J. Etherton, S. J. Majumdar, Adaptive sampling with the ensemble transform
Kalman filter. Part I: Theoretical aspects, Mon. Wea. Rev.129, 420–436 (2001).
[5] G. Burgers, P. J. van Leeuwen, G. Evensen, Analysis scheme in the ensemble Kalman filter,
Mon. Wea. Rev.126, 1719–1724 (1998).
[6] A. Doucet, N. de Freitas, N. Gordon, eds.,Sequential Monte Carlo Methods in Practice,
Springer-Verlag, 2001.
[7] G. Evensen, Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte
Carlo methods to forecast error statistics, J. Geophys. Res. 99, 10143–10162 (1994).
[8] G. Evensen, The ensemble Kalman filter: theoretical formulation and practical implementation,
Ocean Dynam.53, 343–367 (2003).
[9] G. Evensen,Data Assimilation: the Ensemble Kalman Filter, Springer (2006).
[10] G. Evensen & P. J. van Leeuwen, An ensemble Kalman smoother for nonlinear dynamics,
Mon. Wea. Rev.128, 1852–1867 (2000).
29
![Page 30: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/30.jpg)
[11] E. J. Fertig, J. Harlim, B. R. Hunt, A comparative study of 4D-VAR and a 4D ensemble Kalman
filter: perfect model simulations with Lorenz-96, Tellus A59 (2007), in press.
[12] I. Fukumori, A partitioned Kalman filter and smoother, Mon. Wea. Rev.130, 1370–1383
(2002).
[13] M. Ghil & P. Malanotte-Rizzoli, Data assimilation in meteorology and oceanography, Adv.
Geophys.33, 141–266 (1991).
[14] Global Climate and Weather Modeling Branch (Environmental Modeling Center,
Camp Springs MD), The GFS Atmospheric Model, NCEP Office Note442 (2003),
http://www.emc.ncep.noaa.gov/officenotes/FullTOC.html
[15] Global Climate and Weather Modeling Branch (Environmental Modeling Center,
Camp Springs MD), SSI Analysis System 2004, NCEP Office Note443 (2004),
http://www.emc.ncep.noaa.gov/officenotes/FullTOC.html
[16] T. M. Hamill & C. Snyder, A hybrid ensemble Kalman filter–3D variational analysis scheme
Mon. Wea. Rev.1282905–2919 (2000).
[17] T. M. Hamill, J. S. Whitaker, and C. Snyder, Distance-dependent filtering of background error
covariance estimates in an ensemble Kalman filter, Mon. Wea.Rev.129, 2776–2790 (2001).
[18] J. Harlim & B. R. Hunt, A non-Gaussian ensemble filter forassimilating infrequent noisy
observations, Tellus A, to appear.
[19] J. Harlim & B. R. Hunt, Four-dimensional local ensembletransform Kalman filter: numerical
experiments with a global circulation model, preprint.
[20] P. L. Houtekamer & H. L. Mitchell, Data assimilation using an ensemble Kalman filter tech-
nique, Mon. Wea. Rev.126, 796–811 (1998).
[21] P. L. Houtekamer & H. L. Mitchell, A sequential ensembleKalman filter for atmospheric data
assimilation, Mon. Wea. Rev.129, 123–137 (2001).
[22] P. L. Houtekamer, H. L. Mitchell, G. Pellerin, M. Buehner, M. Charron, L. Spacek, B. Hansen,
Atmospheric data assimilation with an ensemble Kalman filter: results with real observations,
Mon. Wea. Rev.133, 604–620 (2005).
[23] B. R. Hunt, E. Kalnay, E. J. Kostelich, E. Ott, D. J. Patil, T. Sauer, I. Szunyogh, J. A. Yorke,
A. V. Zimin, Four-dimensional ensemble Kalman filtering, Tellus A 56, 273–277 (2004).
30
![Page 31: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/31.jpg)
[24] K. Ide, P. Courtier, M. Ghil, A. C. Lorenc, Unified notation for data assimilation: operational,
sequential, and variational, J. Meteo. Soc. Japan75, 181–189 (1997).
[25] A. H. Jazwinski,Stochastic Processes and Filtering Theory, Academic Press (1970).
[26] R. E. Kalman, A new approach to linear filtering and prediction problems, Trans. ASME, Ser.
D: J. Basic Eng.82, 35–45 (1960).
[27] R. E. Kalman & R. S. Bucy, New results in linear filtering and prediction theory, Trans. ASME,
Ser. D: J. Basic Eng.83, 95–108 (1961).
[28] E. Kalnay,Atmospheric Modeling, Data Assimilation and Predictability, Cambridge Univ.
Press (2002).
[29] E. Kalnay & Z. Toth, Removing growing errors in the analysis, Proc. of the 10th Conf. on
Numerical Weather Prediction, Amer. Meteo. Soc., Portland, Oregon (1994).
[30] C. L. Keppenne, Data assimilation into a primitive-equation model with a parallel ensemble
Kalman filter, Mon. Wea. Rev.128, 1971–1981 (2000).
[31] F.-X. Le Dimet & O. Talagrand, Variational algorithms for analysis and assimilation of mete-
orological observations: theoretical aspects, Tellus A38, 97–110 (1986).
[32] A. C. Lorenc, A global three-dimensional multivariatestatistical interpolation scheme, Mon.
Wea. Rev.109, 701–721 (1981).
[33] A. C. Lorenc, The potential of the ensemble Kalman filterfor NWP—a comparison with 4D-
Var, Quart. J. Roy. Meteo. Soc.129, 3183–3203 (2003).
[34] T. Miyoshi & S. Yamane, Local ensemble transform Kalmanfiltering with an AGCM and a
T159/L48 resolution, preprint.
[35] M. Oczkowski, I. Szunyogh, D. J. Patil, Mechanisms for the development of locally low-
dimensional atmospheric dynamics, J. Atmos. Sci.62, 1135–1156 (2005).
[36] E. Ott, B. R. Hunt, I. Szunyogh, M. Corazza, E. Kalnay, D.J. Patil, J. A. Yorke, A. V. Zimin,
E. J. Kostelich, Exploiting local low dimensionality of theatmospheric dynamics for efficient
ensemble Kalman filtering,arXiv:physics/0203058v3 (2002).
[37] E. Ott, B. R. Hunt, I. Szunyogh, A. V. Zimin, E. J. Kostelich, M. Corazza, E. Kalnay, D. J.
Patil, J. A. Yorke, A local ensemble Kalman filter for atmospheric data assimilation, Tellus A56,
415–428 (2004).
31
![Page 32: Hunt_et_al_2007](https://reader034.vdocument.in/reader034/viewer/2022051216/563db9c9550346aa9a9fea4f/html5/thumbnails/32.jpg)
[38] D. F. Parrish & J. C. Derber, The National Meteorological Center’s spectral statistical-
interpolation analysis system, Mon. Wea. Rev.120, 1747–1763 (1992).
[39] D. J. Patil, B. R. Hunt, E. Kalnay, J. A. Yorke, E. Ott, Local low dimensionality of atmospheric
dynamics, Phys. Rev. Lett.86, 5878–5881 (2001).
[40] L. M. Pecora & T. L. Carroll, Synchronization in chaoticsystems, Phys. Rev. Lett. 64, 821–824
(1990).
[41] I. Szunyogh, E. J. Kostelich, G. Gyarmati, D. J. Patil, B. R. Hunt, E. Kalnay, E. Ott, J. A.
Yorke, Assessing a local ensemble Kalman filter: perfect model experiments with the National
Centers for Environmental Prediction global model, TellusA 57, 528–545 (2005).
[42] O. Talagrand & P. Courtier, Variational assimilation of meteorological observations with the
adjoint vorticity equation I: theory, Quart. J. Roy. Meteo.Soc.113, 1311–1328 (1987).
[43] M. K. Tippett, J. L. Anderson, C. H. Bishop, T. M. Hamill,J. S. Whitaker, Ensemble square-
root filters, Mon. Wea. Rev.131, 1485–1490 (2003).
[44] X. Wang, C. H. Bishop, S. J. Julier, Which is better, an ensemble of positive-negative pairs or
a centered spherical simplex ensemble?, Mon. Wea. Rev.132, 1590–1605 (2004).
[45] J. S. Whitaker, T. M. Hamill, Ensemble data assimilation without perturbed observations, Mon.
Wea. Rev.130, 1913–1924 (2002).
[46] J. S. Whitaker, T. M. Hamill, X. Wei, Y. Song, Z. Toth, Ensemble data assimilation with the
NCEP global forecast system, preprint.
[47] F. Zhang, C. Snyder, J. Sun, Impacts of initial estimateand observation availability on
convective-scale data assimilation with an ensemble Kalman filter, Mon. Wea. Rev.132, 1238–
1253 (2004).
[48] M. Zupanski, Maximum likelihood ensemble filter: theoretical aspects, Mon. Wea. Rev.133,
1710–1726 (2005).
32