AN INTRODUCTION TO NONLINEAR FILTERING
M.H.A.Davis
Department of Electrical Engineering, Imperial College, London SW7 2BT, England.
Steven I. Marcus
Department of Electrical Engineering, The University of Texas at Austin, Austin, Texas 78712, U.S.A.
ABSTRACT
In this paper we provide an introduction to nonlinear filtering from two points of view: the innovations approach and the approach based upon an unnormalized conditional density. The filtering problem concerns the estimation of an unobserved stochastic process {x_t} given observations of a related process {y_t}; the classic problem is to calculate, for each t, the conditional distribution of x_t given {y_s, 0 ≤ s ≤ t}. First, a brief review of key results on martingales and on Markov and diffusion processes is presented. Using the innovations approach, stochastic differential equations for the evolution of conditional statistics and of the conditional measure of x_t given {y_s, 0 ≤ s ≤ t} are given; these equations are the analogs for the filtering problem of the Kolmogorov forward equations. Several examples are discussed. Finally, a less complicated evolution equation is derived by considering an "unnormalized" conditional measure.
M. Hazewinkel and J. C. Willems (eds.), Stochastic Systems: The Mathematics of Filtering and Identificationand Applications, 53-75.Copyright @ 1981 by D. Reidel Publishing Company.
I. INTRODUCTION
Filtering problems concern "estimating" something about an unobserved stochastic process {x_t} given observations of a related process {y_t}; the classic problem is to calculate, for each t, the conditional distribution of x_t given {y_s, 0 ≤ s ≤ t}. This was solved in the context of linear system theory by Kalman and Bucy [1],[2] in 1960-1961, and the resulting "Kalman filter" has of course enjoyed immense success in a wide variety of applications. Attempts were soon made to generalize the results to systems with nonlinear dynamics. This is an essentially more difficult problem, being in general infinite-dimensional, but nevertheless equations describing the evolution of conditional distributions were obtained by several authors in the mid-sixties; for example, Bucy [3], Kushner [4], Shiryaev [5], Stratonovich [6], Wonham [7]. In 1969 Zakai [8] obtained these equations in substantially simpler form using the so-called "reference probability" method (see Wong [9]).
In 1968 Kailath [10] introduced the "innovations approach" to linear filtering, and the significance for nonlinear filtering was immediately appreciated [11], namely that the filtering problem ought to be formulated in the context of martingale theory. The definitive treatment from this point of view was given in 1972 by Fujisaki, Kallianpur and Kunita [12]. Textbook accounts including all the mathematical background can be found in Liptser and Shiryaev [13] and Kallianpur [14].
More recent work on nonlinear filtering has concentrated on the following areas (this list and the references are not intended to be exhaustive):
(i) Rigorous formulation of the theory of stochastic partial differential equations (Pardoux [15], Krylov and Rozovskii [16]);
(ii) Introduction of Lie algebraic and differential geometric methods (Brockett [17]);
(iii) Discovery of finite dimensional nonlinear filters (Benes [18]);
(iv) Development of "robust" or "pathwise" solutions of the filtering equations (Davis [19]);
(v) Functional integration and group representation methods (Mitter [30]).
All of these topics are dealt with in this volume and all of them use the basic equations of nonlinear filtering theory: the Fujisaki et al. equation [12] and/or the Zakai equation [8]. These equations can be derived in a quick and self-contained way, modulo some technical results, the statements of which are readily appreciated and the details of which can be found in the
references [13],[14]. This is the purpose of the present article.
The general problem can be described as follows. The signal
or state process {Xt} is a stochastic process which cannot be
observed directly. Information concerning {Xt} is obtained from
the observation process {y_t}, which we will assume is given by
$$ y_t = \int_0^t z_s\,ds + w_t \qquad (1) $$
where {z_t} is a process "related" to {x_t} (e.g., z_t = h(x_t)) and {w_t} is a Brownian motion process. The process {y_t} is to be thought of as noisy nonlinear observations of the signal {x_t}. The objective is to compute least-squares estimates of functions of the signal x_t given the "past" observations {y_s, 0 ≤ s ≤ t}, i.e. to compute quantities of the form E[φ(x_t) | y_s, 0 ≤ s ≤ t]. In addition, it is desired that this computation be done recursively in terms of a statistic {ξ_t} which can be updated using only new observations:
$$ \xi_{t+\tau} = \gamma\big(t, \tau, \xi_t, \{y_{t+u},\, 0 \le u \le \tau\}\big) \qquad (2) $$
and from which estimates can be calculated in a "pointwise" or "memoryless" fashion:
$$ E[\varphi(x_t) \mid y_s,\, 0 \le s \le t] = \theta(t, y_t, \xi_t). \qquad (3) $$
In general, ξ_t will be closely related to the conditional distribution of x_t given {y_s, 0 ≤ s ≤ t}, but in certain special cases ξ_t will be computable with a finite set of stochastic differential equations driven by {y_t} (see [20] for some examples).
In order to obtain specific results, additional structure will be assumed for the process {x_t}; we will assume throughout that {x_t} is a semimartingale (see Section II), but more detailed results will be derived under the assumption that {x_t} is a Markov process or, in particular, a vector diffusion process of the form
$$ x_t = x_0 + \int_0^t f(x_s)\,ds + \int_0^t G(x_s)\,d\beta_s \qquad (4) $$
where x_t ∈ ℝⁿ and β_t ∈ ℝᵐ is a vector of independent Brownian
motion processes. General terminology and precise assumptions will be presented in Section II. In Section III, Markov processes of the form (4) will be studied, and Kolmogorov's equations for the evolution of the unconditional distribution (i.e. without observations) of the process {x_t} will be presented. The corresponding equations for the conditional distribution of x_t given {y_s, 0 ≤ s ≤ t} will be derived in Section IV using the "innovations approach". Finally, in Section V we derive a less complex set of equations for an unnormalized conditional distribution of x_t, in the form given by Zakai [8].
II. TERMINOLOGY AND ASSUMPTIONS
In this section we review certain notions concerning stochastic processes and martingales; for further tutorial material on martingale integrals and stochastic calculus, the reader is referred to the tutorial of R. Curtain in this volume and the paper of Davis [21] (see also [9],[13],[22]-[24]). All stochastic processes will be defined on a fixed probability space (Ω, F, P) and a finite time interval [0,T], on which there is defined an increasing family of σ-fields {F_t, 0 ≤ t ≤ T}. It is assumed that each process {x_t} is adapted to F_t, i.e. x_t is F_t-measurable for all t. The σ-field generated by {x_s, 0 ≤ s ≤ t} is denoted by X_t = σ{x_s, 0 ≤ s ≤ t}. (x_t, F_t) is a martingale if x_t is adapted to F_t, E|x_t| < ∞, and E[x_t | F_s] = x_s for t ≥ s. (x_t, F_t) is a supermartingale if E[x_t | F_s] ≤ x_s, and a submartingale if E[x_t | F_s] ≥ x_s. The process (x_t, F_t) is a semimartingale if it has a decomposition x_t = x_0 + a_t + m_t, where (m_t, F_t) is a martingale and {a_t} is a process of bounded variation. Given two square integrable martingales (m_t, F_t) and (n_t, F_t), one can define the predictable quadratic covariation (⟨m,n⟩_t, F_t) to be the unique "predictable process of integrable variation" such that (m_t n_t − ⟨m,n⟩_t, F_t) is a martingale [29, p.34]. For the purposes of this paper, however, the only necessary facts concerning ⟨m,n⟩ are that (a) ⟨m,n⟩_t = 0 if m_t n_t is a martingale; and (b) if β is a standard Brownian motion process, then
$$ \langle \beta, \beta \rangle_t = t \quad \text{and} \quad \Big\langle \int_0^t \eta^1_s\,d\beta_s,\ \int_0^t \eta^2_s\,d\beta_s \Big\rangle_t = \int_0^t \eta^1_s \eta^2_s\,ds. $$
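The bracket facts above lend themselves to a quick numerical check. The following sketch (the partition size and random seed are our illustrative choices) approximates the quadratic variation ⟨β,β⟩₁ of a simulated Brownian path by a sum of squared increments:

```python
import random
import math

random.seed(0)
n = 200_000        # partition points on [0, 1]
dt = 1.0 / n

# Build a Brownian path increment by increment and accumulate the
# sum of squared increments (the discrete quadratic variation).
qv = 0.0
for _ in range(n):
    db = random.gauss(0.0, math.sqrt(dt))
    qv += db * db

# Since <beta, beta>_t = t, the sum should be close to t = 1.
print(qv)
```

As the partition is refined, the sum of squared increments converges almost surely to t, which is exactly fact (b) specialized to η¹ = η² = 1.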
In this tutorial exposition, the following hypotheses will be assumed for all nonlinear estimation problems:
H1. {y_t} is a real-valued process;
H2. {w_t} is a standard Brownian motion process;
H3. $E\big[\int_0^T z_s^2\,ds\big] < \infty$;
H4. {z_t} is independent of {w_t}.
Hypotheses (H1) and (H4) can be weakened, but the calculations become more involved [8],[12],[13, Chapter 8]. Similar results to those derived here can also be derived in the case that {w_t} is replaced by the sum of a Brownian motion and a counting process [25]. Hypotheses on the process {x_t} and the relationship between x_t and w_t will be imposed as they are needed in the sequel.
Finally, we will need two special cases of Itô's differential rule. Suppose that (ξ^i_t, F_t), i = 1,2, are semimartingales of the form
$$ \xi^i_t = \xi^i_0 + a^i_t + m^i_t \qquad (5) $$
where {m^i_t}, i = 1,2, are square integrable martingales, with {m^1_t} and {a^i_t} sample continuous. Then
$$ \xi^1_t \xi^2_t = \xi^1_0 \xi^2_0 + \int_0^t \xi^1_s\,d\xi^2_s + \int_0^t \xi^2_s\,d\xi^1_s + \langle m^1, m^2 \rangle_t. \qquad (6a) $$
Also, if ψ is a twice continuously differentiable function of a process x of the form (4), then
$$ \psi(x_t) = \psi(x_0) + \sum_{i=1}^n \int_0^t \frac{\partial \psi}{\partial x^i}(x_s)\,dx^i_s + \tfrac{1}{2}\sum_{i,j=1}^n \int_0^t \frac{\partial^2 \psi}{\partial x^i\,\partial x^j}(x_s)\,a^{ij}(x_s)\,ds \qquad (6b) $$
where A(x) = [a^{ij}(x)] := G(x)G'(x) and x^i denotes the i-th component of x.
III. MARKOV AND DIFFUSION PROCESSES
A very clear account of the material in this section can be found in Wong's book [9]. A stochastic process {x_t, t ∈ [0,T]} is a Markov process if for any 0 ≤ s ≤ t ≤ T and any Borel set B of the state space S,
$$ P(x_t \in B \mid X_s) = P(x_t \in B \mid x_s). $$
For any Markov process {x_t}, we can define the transition probability function
$$ P(s,x,t,B) := P(x_t \in B \mid x_s = x), $$
which can easily be shown to satisfy the Chapman-Kolmogorov equation: for any 0 ≤ s ≤ u ≤ t ≤ T,
$$ P(s,x,t,B) = \int_S P(u,y,t,B)\,P(s,x,u,dy). \qquad (7) $$
In addition, all finite dimensional distributions of a Markov process are determined by its initial distribution and transition probability function. A Markov process {x_t} is homogeneous if P(s+u, x, t+u, B) = P(s,x,t,B) for all 0 ≤ s ≤ t ≤ T and 0 ≤ s+u ≤ t+u ≤ T.
For a homogeneous Markov process {x_t} and f ∈ B(S) (i.e. f is a bounded measurable real-valued function on S), define
$$ T_t f(x) = E_x[f(x_t)] := \int_S f(y)\,P(0,x,t,dy). $$
The Chapman-Kolmogorov equation then implies that T_t is a semigroup of operators acting on B(S); i.e. T_{t+s} f(x) = T_t(T_s f)(x) for t, s ≥ 0. The generator L of T_t (or, of {x_t}) is the operator acting on a domain D(L) ⊂ B(S) given by
$$ L\varphi = \lim_{t \downarrow 0} \tfrac{1}{t}\,(T_t\varphi - \varphi), $$
the limit being uniform in x ∈ S and D(L) consisting of all functions such that this limit exists. It is immediate from this and the semigroup property that
$$ \frac{d}{dt} T_t\varphi = L\,T_t\varphi \qquad (8) $$
and (8) is, in abstract form, the backward equation for the process. Writing it out in integral form and recalling the definition of T_t gives the Dynkin formula:
$$ E_x[\varphi(x_t)] - \varphi(x) = E_x \int_0^t L\varphi(x_s)\,ds. \qquad (9) $$
This implies, using the Markov property again, that the process M^φ_t defined for φ ∈ D(L) by
$$ M^\varphi_t = \varphi(x_t) - \varphi(x_0) - \int_0^t L\varphi(x_s)\,ds \qquad (10) $$
is a martingale [26, p.4]. This property can be used as a definition of L; this is the approach pioneered by Stroock and Varadhan [26]. Then L is known as the extended generator of {x_t}, since there may be functions φ for which M^φ is a martingale but which are not in D(L) as previously defined.
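For a process with finitely many states, the semigroup T_t is simply the matrix exponential of the generator, and (8) can be verified directly. A minimal sketch in that spirit (the two-state rate matrix, the test function, and the step sizes are illustrative choices of ours; the matrix exponential is computed by a truncated power series):

```python
# Two-state Markov chain: L is the rate (generator) matrix acting on
# functions (rows sum to zero), and T_t = exp(tL).

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_exp(A, terms=30):
    # Truncated power series exp(A) = sum_k A^k / k!
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

L = [[-2.0, 2.0],
     [1.0, -1.0]]           # jump rates; rows sum to zero
phi = [1.0, 3.0]            # a function on S = {s1, s2}, as a vector
t, h = 0.5, 1e-5

def T(t):
    E = mat_exp([[t * v for v in row] for row in L])
    return [sum(E[i][j] * phi[j] for j in range(2)) for i in range(2)]

# Check (d/dt) T_t phi = L T_t phi by a central finite difference.
lhs = [(a - b) / (2 * h) for a, b in zip(T(t + h), T(t - h))]
Tt = T(t)
rhs = [sum(L[i][j] * Tt[j] for j in range(2)) for i in range(2)]
print(lhs, rhs)
```

Here D(L) is all of B(S), and the abstract limit defining L reduces to ordinary matrix calculus.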
There is another semigroup of operators associated with {x_t}, namely the operators which transfer the initial distribution of the process into the distributions at later times t. More precisely, let M(S) be the set of probability measures on S and denote
$$ \langle \psi, \mu \rangle = \int_S \psi(x)\,\mu(dx) $$
for ψ ∈ B(S), μ ∈ M(S). Suppose x_0 has distribution π ∈ M(S); then the distribution of x_t is given by
$$ U_t\pi(A) = P[x_t \in A] = E\big(I_A(x_t)\big) = \langle T_t I_A, \pi \rangle. $$
This shows that U_t is adjoint to T_t in that
$$ \langle \psi, U_t\pi \rangle = \langle T_t\psi, \pi \rangle \ \big(= E\psi(x_t)\big) $$
for ψ ∈ B(S), π ∈ M(S). Thus the generator of U_t is L*, the adjoint of L, and π_t := U_t π satisfies
$$ \frac{d}{dt}\pi_t = L^*\pi_t, \qquad \pi_0 = \pi. \qquad (11) $$
This is the forward equation of x_t in that it describes the evolution of the distribution π_t of x_t. The objective of filtering theory is to obtain a similar description of the conditional distribution of x_t given {y_s, s ≤ t}.
In order to get these results in more explicit form we consider in the remainder of this section a process {x_t} satisfying a stochastic differential equation of the form (4), where {β_t} is an ℝᵐ-valued standard Brownian motion process independent of x_0. For simplicity we assume that f and G do not depend explicitly on t (this is no loss of generality, since the "process" τ(t) = t can be accommodated by augmenting (4) with the equation dτ/dt = 1, τ(0) = 0). Under the usual Lipschitz and growth assumptions which guarantee existence and uniqueness of (strong) solutions of (4), the following results can be proved [9],[22]-[24].
Theorem 1: The solution of (4) is a homogeneous Markov process with infinitesimal generator
$$ L = \sum_{i=1}^n f^i(x)\,\frac{\partial}{\partial x^i} + \tfrac{1}{2}\sum_{i,j=1}^n a^{ij}(x)\,\frac{\partial^2}{\partial x^i\,\partial x^j} \qquad (12) $$
where A(x) = [a^{ij}(x)] := G(x)G'(x), and f^i and x^i denote the i-th components of f and x, respectively.
Hence Itô's rule (6b) in this case can be written as
$$ \psi(x_t) = \psi(x_0) + \int_0^t L\psi(x_s)\,ds + \int_0^t \nabla\psi'(x_s)\,G(x_s)\,d\beta_s, $$
emphasizing again that M^ψ_t (see (10)) is a martingale (here ∇ψ is the gradient of ψ with respect to x, expressed as a column vector). It can also be shown [24] that the solution of (4) satisfies the Feller and strong Markov properties, and is a diffusion process with drift vector f and diffusion matrix A. If this process has a smooth density then the abstract equations (8) and (11) translate into Kolmogorov's backward and forward equations for the transition density.
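Sample paths of a diffusion of the form (4) can be generated by the Euler-Maruyama discretization x_{k+1} = x_k + f(x_k)Δ + G(x_k)Δβ_k. A minimal scalar sketch (the coefficients f(x) = −x, G(x) = 1 and all numerical parameters are illustrative choices of ours):

```python
import random
import math

random.seed(1)

def simulate(x0, T, n):
    """Euler-Maruyama path of dx = -x dt + d(beta); returns x_T."""
    dt = T / n
    x = x0
    for _ in range(n):
        x += -x * dt + math.sqrt(dt) * random.gauss(0.0, 1.0)
    return x

x0, T = 2.0, 1.0
samples = [simulate(x0, T, 200) for _ in range(5000)]
mean = sum(samples) / len(samples)

# For the linear drift f(x) = -x, E[x_t] = x0 * exp(-t), which the
# Monte Carlo average should reproduce up to sampling error.
print(mean, x0 * math.exp(-T))
```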
Theorem 2 [24, p.104]: Assume that the solution {x_t} of (4) has a transition density:
$$ P(s,x,t,B) = \int_B p(s,x,t,y)\,dy $$
satisfying
a) for t − s ≥ δ > 0, p(s,x,t,y) is continuous and bounded in s, t, and x;
b) the partial derivatives $\frac{\partial p}{\partial s}$, $\frac{\partial p}{\partial x^i}$, and $\frac{\partial^2 p}{\partial x^i\,\partial x^j}$ exist.
Then for 0 ≤ s < t, p satisfies the Kolmogorov backward equation
$$ \frac{\partial}{\partial s}\,p(s,x,t,y) + L\,p(s,x,t,y) = 0 \qquad (13) $$
with lim_{s↑t} p(s,x,t,y) = δ(x−y) and L given by (12), acting on the x variable; i.e. p is the fundamental solution of (13).
Outline of Proof: From (7), we have
$$ p(s+h,x,t,y) - p(s,x,t,y) = \int p(s,x,s+h,z)\,\big[p(s+h,x,t,y) - p(s+h,z,t,y)\big]\,dz. $$
Dividing both sides by h and letting h → 0 yields (13) by using the definition of L.
More relevant to filtering problems is the Kolmogorovforward equation.
Theorem 3 [24, p.102]: Assume that {x_t} satisfying (4) has a transition density p(s,x,t,y), and that $\frac{\partial f}{\partial x^i}$, $\frac{\partial A}{\partial x^i}$, $\frac{\partial^2 A}{\partial x^i\,\partial x^j}$, $\frac{\partial p}{\partial t}$, $\frac{\partial p}{\partial y^i}$, and $\frac{\partial^2 p}{\partial y^i\,\partial y^j}$ exist. Then for 0 ≤ s < t, p satisfies the Kolmogorov forward equation
$$ \frac{\partial p}{\partial t}(s,x,t,y) = -\sum_{i=1}^n \frac{\partial}{\partial y^i}\big(f^i(y)\,p(s,x,t,y)\big) + \tfrac{1}{2}\sum_{i,j=1}^n \frac{\partial^2}{\partial y^i\,\partial y^j}\big(a^{ij}(y)\,p(s,x,t,y)\big) =: L^*\,p(s,x,t,y) \qquad (14) $$
where L* is the formal adjoint of L. Also, the initial condition is lim_{t↓s} p(s,x,t,y) = δ(y−x).
Outline of Proof: Assume, for simplicity of notation, that {x_t} is a scalar diffusion (n = 1). From (9), we have
$$ \frac{\partial}{\partial t} \int p(s,x,t,z)\,\varphi(z)\,dz = \int p(s,x,t,z)\,L\varphi(z)\,dz \qquad (15) $$
for any twice continuously differentiable function φ which vanishes outside some finite interval. The derivative and integral on the left-hand side of (15) can be interchanged, and an integration by parts then yields
$$ \int p(s,x,t,z)\,f(z)\,\frac{\partial\varphi}{\partial z}(z)\,dz = -\int \varphi(z)\,\frac{\partial}{\partial z}\big(f(z)\,p(s,x,t,z)\big)\,dz, $$
$$ \int p(s,x,t,z)\,g^2(z)\,\frac{\partial^2\varphi}{\partial z^2}(z)\,dz = \int \varphi(z)\,\frac{\partial^2}{\partial z^2}\big(g^2(z)\,p(s,x,t,z)\big)\,dz; $$
hence
$$ \int \Big\{ \frac{\partial p}{\partial t}(s,x,t,z) + \frac{\partial}{\partial z}\big[f(z)\,p(s,x,t,z)\big] - \tfrac{1}{2}\frac{\partial^2}{\partial z^2}\big[g^2(z)\,p(s,x,t,z)\big] \Big\}\,\varphi(z)\,dz = 0. $$
Since the expression in curly brackets is continuous and φ(z) is an arbitrary twice differentiable function vanishing outside a finite interval, (14) follows.
We note that if x_0 has distribution P_0, then the density of x_t is p(t,y) := ∫ p(0,x,t,y) P_0(dx), and p(t,y) also satisfies (14). Conditions for the existence of a density satisfying the differentiability hypotheses of Theorems 2 and 3 are given in [24, pp.96-99] (see also Pardoux [15]).
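The forward equation can be checked numerically on a concrete diffusion. For the illustrative scalar choice f(y) = −y, g(y) = 1, the Gaussian density p(y) = e^{−y²}/√π is stationary, so L*p should vanish identically; a sketch using central finite differences:

```python
import math

# For the scalar diffusion dx = -x dt + d(beta), the forward operator is
#   L* p = d/dy (y p) + (1/2) d^2 p / dy^2,
# and p(y) = exp(-y^2)/sqrt(pi) (the N(0, 1/2) density) is stationary.

def p(y):
    return math.exp(-y * y) / math.sqrt(math.pi)

h = 1e-4

def Lstar_p(y):
    d1 = ((y + h) * p(y + h) - (y - h) * p(y - h)) / (2 * h)  # d/dy (y p)
    d2 = (p(y + h) - 2 * p(y) + p(y - h)) / (h * h)           # p''
    return d1 + 0.5 * d2

# The residual should be zero up to finite-difference error.
residual = max(abs(Lstar_p(y)) for y in [-1.5, -0.5, 0.0, 0.7, 2.0])
print(residual)
```

This is a direct instance of the remark above: the stationary density is the t → ∞ limit of p(t,y), at which the right-hand side of (14) vanishes.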
IV. THE INNOVATIONS APPROACH TO NONLINEAR FILTERING
In this section we derive stochastic differential equations for the evolution of conditional statistics and of the conditional density for nonlinear filtering problems of the types discussed in Sections I and II; the equations will be the analogs of (9) and the Kolmogorov forward equation for the filtering problem. We will follow the innovations approach, as presented in [12] and [13]; this approach was originally suggested by Kailath [10] (for linear filtering) and Frost and Kailath [11].
Assume that the observations have the form (1) and that (H1)-(H4) hold. Define Y_t := σ{y_s, 0 ≤ s ≤ t}; for any process {μ_t} we use the notation μ̂_t := E[μ_t | Y_t]. Now introduce the innovations process:
$$ \nu_t := y_t - \int_0^t \hat{z}_s\,ds. \qquad (16) $$
The incremental innovations ν_{t+h} − ν_t represent the "new information" concerning the process {z_t} available from the observations between t and t+h, in the sense that ν_{t+h} − ν_t is
independent of Y_t. The following properties of the innovations process are crucial.
Lemma 1: The process (ν_t, Y_t) is a standard Brownian motion process. Furthermore, Y_s and σ{ν_u − ν_t, 0 ≤ s ≤ t < u ≤ T} are independent.
Proof: From (16) we have, for s ≤ t,
$$ E[\nu_t \mid Y_s] = \nu_s + E\Big[\int_s^t (z_u - \hat{z}_u)\,du + w_t - w_s \,\Big|\, Y_s\Big]. \qquad (17) $$
The second term on the right-hand side of (17) is zero; here we have used the fact that w_t − w_s is independent of Y_s. Hence (ν_t, Y_t) is a martingale. Consider now the quadratic variation of {ν_t}: for t ∈ [0,T] fix an integer n and define
$$ Q^n_t = \sum_{0 \le k < 2^n t} \big[\nu((k+1)/2^n) - \nu(k/2^n)\big]^2. $$
The almost sure limit (as n → ∞) of Q^n_t, denoted Q_t, is the quadratic variation of ν_t. It is easy to see that the quadratic variation of ∫_0^t (z_u − ẑ_u) du is zero, so that the quadratic variation of ν_t is the same as that of w_t, i.e. Q_t = t. But by a theorem of Doob [12, Lemma 2.1], a square integrable martingale with continuous sample paths and quadratic variation t is a standard Brownian motion, and the lemma follows.
Notice that the very specific conclusions of Lemma 1 regarding the structure of the innovations process are valid without any restrictions on the distributions of z_t. The next lemma is related to Kailath's "innovations conjecture". By definition ν_t is Y_t-measurable, so σ{ν_s, 0 ≤ s ≤ t} ⊂ Y_t. The innovations conjecture is that Y_t ⊂ σ{ν_s, 0 ≤ s ≤ t}, and hence that the two σ-fields are equal; i.e. the observations and innovations processes contain the same information. At the time that [12] was written, the answer to this question was not known under very general conditions on {z_t}; recently, it has been shown in [27] that the conjecture is true under the conditions (H1)-(H4). It is a well-known fact [13, Theorem 5.6] that all martingales of Brownian motion are stochastic integrals, and the point of a
positive answer to the innovations conjecture is that it enables any Y_t-martingale to be written as a stochastic integral with respect to the innovations process {ν_t}. The essential contribution of Fujisaki, Kallianpur and Kunita [12] was to show that this representation holds whether or not the innovations conjecture is valid. Specifically, they showed:

Lemma 2: Every square integrable martingale (m_t, Y_t) with respect to the observation σ-fields Y_t is sample continuous and has the representation
$$ m_t = E[m_0] + \int_0^t \eta_s\,d\nu_s \qquad (18) $$
where ∫_0^T E[η_s²] ds < ∞ and {η_t} is jointly measurable and adapted to Y_t. In other words, m_t can be written as a stochastic integral with respect to the innovations process. (But note that {η_t} is adapted to Y_t and not necessarily to σ{ν_s, 0 ≤ s ≤ t}.)
In order to obtain a general filtering equation, let us consider a real-valued F_t-semimartingale ξ_t and derive an equation satisfied by ξ̂_t. We have in mind semimartingales φ(x_t) where φ is some smooth real-valued function and {x_t} is the signal process, but it is just as easy to consider a general semimartingale of the form
$$ \xi_t = \xi_0 + \int_0^t \alpha_s\,ds + n_t \qquad (19) $$
where (n_t, F_t) is a martingale.
Theorem 4: Assume that {ξ_t} and {y_t} are given by (19) and (1), respectively, and that ⟨n, w⟩_t = 0. Then {ξ̂_t} satisfies the stochastic differential equation
$$ \hat{\xi}_t = \hat{\xi}_0 + \int_0^t \hat{\alpha}_s\,ds + \int_0^t \big[\widehat{\xi_s z_s} - \hat{\xi}_s\hat{z}_s\big]\,d\nu_s. \qquad (20) $$
Proof: First we define
$$ \mu_t := \hat{\xi}_t - \hat{\xi}_0 - \int_0^t \hat{\alpha}_s\,ds $$
and show that (μ_t, Y_t) is a martingale. Now, for s ≤ t,
$$ E[\hat{\xi}_t - \hat{\xi}_s \mid Y_s] = E[\xi_t - \xi_s \mid Y_s] = E\Big[\int_s^t \alpha_u\,du \,\Big|\, Y_s\Big] + E[n_t - n_s \mid Y_s] $$
$$ = E\Big[\int_s^t E[\alpha_u \mid Y_u]\,du \,\Big|\, Y_s\Big] + E\big[E[n_t - n_s \mid F_s] \mid Y_s\big]. \qquad (21) $$
The last term in (21) is zero, since (n_t, F_t) is a martingale; thus (21) proves that (μ_t, Y_t) is a martingale. Hence,
$$ \hat{\xi}_t = \hat{\xi}_0 + \int_0^t \hat{\alpha}_s\,ds + \mu_t = \hat{\xi}_0 + \int_0^t \hat{\alpha}_s\,ds + \int_0^t \eta_s\,d\nu_s \qquad (22) $$
where the last term in (22) follows from Lemma 2.
It remains only to identify the precise form of η_t, using Itô's differential rule (6a) and an idea introduced by Wong [28]. From (1) and (19), and since ⟨n, w⟩_t = 0,
$$ \xi_t y_t = \xi_0 y_0 + \int_0^t \xi_s(z_s\,ds + dw_s) + \int_0^t y_s(\alpha_s\,ds + dn_s). \qquad (23) $$
Also, from (16) and (22),
$$ \hat{\xi}_t y_t = \hat{\xi}_0 y_0 + \int_0^t \hat{\xi}_s(\hat{z}_s\,ds + d\nu_s) + \int_0^t y_s(\hat{\alpha}_s\,ds + \eta_s\,d\nu_s) + \int_0^t \eta_s\,ds. \qquad (24) $$
Now it follows immediately from properties of conditional expectations that for t ≥ s,
$$ E[\xi_t y_t - \hat{\xi}_t y_t \mid Y_s] = 0. $$
Calculating this from (23),(24) we see that
$$ \eta_t = \widehat{\xi_t z_t} - \hat{\xi}_t\hat{z}_t. \qquad (25) $$
Inserting (25) into (22) gives the desired result (20).
Formula (20) is not very useful as it stands (it is not a recursive equation for ξ̂_t), but we can use it to obtain more explicit results for filtering of Markov processes.
Theorem 5: Assume that {x_t} is a homogeneous Markov process with infinitesimal generator L, that {y_t} is given by (1) with z_t = h(x_t), and that {x_t} and {w_t} are independent. Then for any φ ∈ D(L), π_t(φ) := E[φ(x_t) | Y_t] satisfies
$$ \pi_t(\varphi) = \pi_0(\varphi) + \int_0^t \pi_s(L\varphi)\,ds + \int_0^t \big[\pi_s(h\varphi) - \pi_s(h)\pi_s(\varphi)\big]\,d\nu_s. \qquad (26) $$
Proof: Notice that (M^φ_t, F_t) (see (10)) is a martingale, so that ξ_t := φ(x_t) is of the form (19) with α_t := Lφ(x_t), n_t := M^φ_t. Also, it is shown in [12, Lemma 4.2] that the independence of {x_t} and {w_t} implies ⟨M^φ, w⟩_t = 0. The theorem then follows immediately from Theorem 4.
Remarks: (i) Since {π_t(φ): φ ∈ D(L)} determines a measure-valued stochastic process π_t, (26) can be regarded as a recursive (infinite-dimensional) stochastic differential equation for the conditional measure π_t of x_t given Y_t, and π_t(φ) is a conditional statistic computed from π_t in a memoryless fashion (see (2)-(3)). In general, however, it is not possible to derive a finite dimensional recursive filter, even for the conditional mean x̂_t; some special cases in which finite dimensional recursive filters exist are given in Examples 1 and 3 below.
(ii) If w_t in (1) were multiplied by r^{1/2} with r > 0, one would suspect that as r → ∞ the observations would become infinitely noisy, thus giving no information about the state; i.e. π_t(φ) would reduce to the unconditional expectation E[φ(x_t)]. In fact, in this case the last term in (26) is multiplied by r^{-1}, so (26) reduces to (9) as r → ∞.
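Although (26) is infinite dimensional in general, it can be approximated by Monte Carlo methods. The sketch below is a bootstrap particle filter, a technique not discussed in this paper, applied to an Euler discretization of the model; the choices f(x) = −x, h(x) = x and all numerical parameters are ours for illustration, and the particle weights are a discrete analog of the exponential likelihood factor:

```python
import random
import math

random.seed(2)

# Bootstrap particle filter: Monte Carlo approximation of pi_t for a
# time-discretized version of dx = f(x) dt + d(beta), dy = h(x) dt + dw.
f = lambda x: -x
h = lambda x: x

dt, steps, N = 0.01, 100, 2000
x = 1.0                                          # hidden state
particles = [random.gauss(1.0, 0.1) for _ in range(N)]

for _ in range(steps):
    # Simulate the true state and the observation increment.
    x += f(x) * dt + math.sqrt(dt) * random.gauss(0, 1)
    dy = h(x) * dt + math.sqrt(dt) * random.gauss(0, 1)
    # Propagate particles with the signal dynamics,
    particles = [p + f(p) * dt + math.sqrt(dt) * random.gauss(0, 1)
                 for p in particles]
    # weight each by exp(h dy - h^2 dt / 2), and resample.
    w = [math.exp(h(p) * dy - 0.5 * h(p) ** 2 * dt) for p in particles]
    particles = random.choices(particles, weights=w, k=N)

estimate = sum(particles) / N                    # approximates pi_t(x)
print(estimate, x)
```

The empirical measure of the particles plays the role of π_t, and averaging any function φ over the particles approximates π_t(φ).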
Example 1 [7]: Let {x_t} be a finite state Markov process taking values in S = {s_1, ..., s_N}. Let p^i_t be the probability that x_t = s_i, and assume that p_t := [p^1_t, ..., p^N_t]' satisfies
$$ \frac{d}{dt} p_t = A p_t. $$
(This is the forward equation for {x_t}; cf. (11).) Given the observations (1), the conditional distribution of x_t given Y_t can be determined from (26) as follows. Let φ(x) = [φ_1(x), ..., φ_N(x)]', where
$$ \varphi_i(x) = \begin{cases} 1, & x = s_i, \\ 0, & x \ne s_i. \end{cases} $$
Then applying (26) to each φ_i yields the following: let B = diag(h(s_1), ..., h(s_N)) and b = [h(s_1), ..., h(s_N)]'. If p̂^i_t = P[x_t = s_i | Y_t] and p̂_t = [p̂^1_t, ..., p̂^N_t]', we have
$$ \hat{p}_t = \hat{p}_0 + \int_0^t A\hat{p}_s\,ds + \int_0^t \big[B - (b'\hat{p}_s)I\big]\,\hat{p}_s\,\big(dy_s - (b'\hat{p}_s)\,ds\big). $$
In this case, the conditional distribution is determined recursively by N stochastic differential equations.
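This N-dimensional filter is easy to simulate with an Euler scheme; a sketch for a two-state chain (the rates, the values h(s_1) = 0, h(s_2) = 1, and the step size are illustrative choices of ours), checking that the scheme preserves total probability:

```python
import random
import math

random.seed(3)

# Euler scheme for the two-state filter: A is the rate matrix in
# d/dt p = A p (columns sum to zero), b = [h(s1), h(s2)].
A = [[-1.0, 2.0],
     [1.0, -2.0]]
b = [0.0, 1.0]
dt, steps = 0.001, 2000

state = 0                     # hidden chain, starting in s1
phat = [0.5, 0.5]             # approximate conditional distribution

for _ in range(steps):
    # Jump of the hidden chain: the rate out of state j is -A[j][j].
    if random.random() < -A[state][state] * dt:
        state = 1 - state
    # Observation increment dy = h(x_t) dt + dw.
    dy = b[state] * dt + math.sqrt(dt) * random.gauss(0.0, 1.0)
    bp = b[0] * phat[0] + b[1] * phat[1]
    drift = [A[i][0] * phat[0] + A[i][1] * phat[1] for i in range(2)]
    innov = dy - bp * dt
    phat = [phat[i] + drift[i] * dt + (b[i] - bp) * phat[i] * innov
            for i in range(2)]

print(phat)
```

Because the columns of A sum to zero and the innovation term has mean-zero coefficients under the estimate, each Euler step leaves the components of p̂ summing to one (up to rounding).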
Example 2: Assume that {x_t} is a diffusion process given by (4) with infinitesimal generator (12) and that the conditional distribution of x_t given Y_t has a density p(t,x). Then under appropriate differentiability hypotheses [13, Theorem 8.6], one can do an integration by parts in (26) (precisely as in Theorem 3 above) to obtain the stochastic partial differential equation
$$ dp(t,x) = L^* p(t,x)\,dt + p(t,x)\big[h(x) - \pi_t(h)\big]\,d\nu_t \qquad (27) $$
where
$$ \pi_t(h) = \int h(x)\,p(t,x)\,dx. \qquad (28) $$
This is a recursive equation for the computation of p(t,x); it is not only infinite dimensional but has a complicated structure due to the presence of the integral in (28). Equation (27) is the analog of the Kolmogorov forward equation; in fact, (27) reduces to (14) as the observation noise approaches ∞ (see Remark (ii)).
The conditional mean cannot in general be computed with a finite dimensional recursive filter, as is seen by letting φ(x) = x in (26):
$$ \hat{x}_t = \hat{x}_0 + \int_0^t \pi_s(f)\,ds + \int_0^t \big[\pi_s(hx) - \pi_s(h)\hat{x}_s\big]\,d\nu_s. \qquad (29) $$
Hence, π_t(f), π_t(hx), and π_t(h) are all necessary for the computation of x̂_t, etc. One case in which this calculation is possible is given in the next example.
Example 3 (Kalman-Bucy Filter): Suppose, for simplicity, that {x_t} and {y_t} are given by the following scalar "linear-Gaussian" equations:
$$ x_t = x_0 + \int_0^t a x_s\,ds + b w_t, $$
$$ y_t = \int_0^t c x_s\,ds + v_t, $$
where x_0 is Gaussian and independent of {w_t} and {v_t}. Then (29) yields
$$ \hat{x}_t = \hat{x}_0 + \int_0^t a\hat{x}_s\,ds + c\int_0^t \big[\pi_s(x^2) - \hat{x}_s^2\big]\,\big[dy_s - c\hat{x}_s\,ds\big] $$
$$ = \hat{x}_0 + \int_0^t a\hat{x}_s\,ds + c\int_0^t P_s\,\big(dy_s - c\hat{x}_s\,ds\big) \qquad (30) $$
where P_t := E[(x_t − x̂_t)² | Y_t] is the conditional error covariance. However, since {x_t} and {y_t} are jointly Gaussian, P_t is nonrandom and constitutes a "gain" process which can be precomputed and stored. P_t satisfies the differential equation (derived from (26) by noticing that the third central moment of a Gaussian distribution is zero):
$$ \frac{d}{dt} P_t = 2aP_t + b^2 - c^2 P_t^2. $$
Since P_t is nonrandom and the differential equation for x̂_t involves no other conditional statistics, it constitutes a recursive one-dimensional filter (the Kalman-Bucy filter) for the computation of the conditional mean.
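The filter can be checked in simulation by Euler integration of (30) together with the Riccati equation; the coefficients a, b, c below are illustrative choices of ours. Since the fixed point of the Euler update for P_t is exactly the positive root of 2aP + b² − c²P² = 0, the gain should settle at that value:

```python
import math
import random

random.seed(4)

# Scalar Kalman-Bucy filter: Euler integration of the conditional-mean
# equation (30) and the Riccati equation for the gain P_t.
a, b, c = -1.0, 1.0, 2.0
dt, steps = 0.001, 5000

x = random.gauss(0.0, 1.0)        # true state, x0 ~ N(0, 1)
xhat, P = 0.0, 1.0                # filter initialized at the prior

for _ in range(steps):
    x += a * x * dt + b * math.sqrt(dt) * random.gauss(0, 1)
    dy = c * x * dt + math.sqrt(dt) * random.gauss(0, 1)
    xhat += a * xhat * dt + c * P * (dy - c * xhat * dt)
    P += (2 * a * P + b * b - c * c * P * P) * dt

# P_t is deterministic and converges to the positive root of
# 2aP + b^2 - c^2 P^2 = 0.
P_inf = (a + math.sqrt(a * a + b * b * c * c)) / (c * c)
print(P, P_inf, xhat, x)
```

Note that the Riccati recursion involves no observations at all, which is what allows the gain to be precomputed and stored.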
V. THE UNNORMALIZED EQUATIONS
Throughout this section it will be assumed that {x_t} is a homogeneous Markov process with infinitesimal generator L, {y_t} is given by (1) with z_t = h(x_t), and {x_t} and {w_t} are independent. In this case, the conditional measure π_t satisfies the equation (26), but it is often more convenient to work with a less complicated equation which is obtained by considering an "unnormalized" version of π_t. The unnormalized equations are derived in [9, Chapter 6] and [8]; the use of measure transformations will follow these references, but we will use a shorter derivation of the unnormalized equations, via (26) and Itô's rule.
The first step is to define a new measure P_0 on the measurable space (Ω, F) by
$$ P_0(A) = \int_A \frac{dP_0}{dP}(\omega)\,P(d\omega) $$
for all A ∈ F, where
$$ \frac{dP_0}{dP} = \exp\Big( -\int_0^T h(x_s)\,dy_s + \tfrac{1}{2}\int_0^T h^2(x_s)\,ds \Big) $$
is the Radon-Nikodym derivative of P_0 with respect to P.
Lemma 3 [9, p.232]: P_0 has the following properties:
(a) P_0 is a probability measure, i.e. P_0(Ω) = 1;
(b) Under P_0, {y_t} is a standard Brownian motion;
(c) Under P_0, {x_t} and {y_t} are independent;
(d) {x_t} has the same distributions under P_0 as under P;
(e) P is absolutely continuous with respect to P_0 with Radon-Nikodym derivative
$$ \frac{dP}{dP_0} = \Big(\frac{dP_0}{dP}\Big)^{-1} = \exp\Big( \int_0^T h(x_s)\,dy_s - \tfrac{1}{2}\int_0^T h^2(x_s)\,ds \Big). $$
It can also be shown [13, Section 6.2] that
$$ \Lambda_t := \exp\Big( \int_0^t h(x_s)\,dy_s - \tfrac{1}{2}\int_0^t h^2(x_s)\,ds \Big) $$
is a martingale with respect to F_t and P_0, so that
$$ \Lambda_t = E_0\Big[\frac{dP}{dP_0} \,\Big|\, F_t\Big], $$
where E_0 is the expectation with respect to P_0. It can be shown [9, p.234] that
$$ \pi_t(\varphi) := E[\varphi(x_t) \mid Y_t] = \frac{E_0[\varphi(x_t)\Lambda_t \mid Y_t]}{E_0[\Lambda_t \mid Y_t]} =: \frac{\sigma_t(\varphi)}{\sigma_t(1)}. \qquad (31) $$
Hence conditional statistics of x_t given Y_t, in terms of the original measure P, can be calculated in terms of conditional statistics under the measure P_0. We now proceed to derive a recursive equation for the measure σ_t; an approach to solving (31) by a path integration of the numerator and denominator is pursued in some other papers in this volume.
Since σ_t(φ) = σ_t(1)·π_t(φ) and we have the equation (26) for π_t(φ), an equation for σ_t(φ) is derived by finding a stochastic differential equation for σ_t(1) := E_0[Λ_t | Y_t] and applying Itô's rule.
Lemma 4: E_0[Λ_t | Y_t] is given by the formula
$$ \hat{\Lambda}_t := E_0[\Lambda_t \mid Y_t] = \exp\Big( \int_0^t \pi_s(h)\,dy_s - \tfrac{1}{2}\int_0^t \pi_s^2(h)\,ds \Big). \qquad (32) $$
Proof: By Itô's rule, Λ_t satisfies
$$ \Lambda_t = 1 + \int_0^t \Lambda_s h(x_s)\,dy_s. \qquad (33) $$
It follows as in the proof of Theorem 4 that Λ̂_t is a martingale with respect to Y_t. Since {y_t} is a Brownian motion under P_0, there must exist a Y_t-adapted process {n_t} such that [13, Theorem 5.6]
$$ \hat{\Lambda}_t = 1 + \int_0^t n_s\,dy_s. \qquad (34) $$
We identify n_t by the same technique as in Theorem 4: from (33) and Itô's rule,
$$ \Lambda_t y_t = \int_0^t \Lambda_s\,dy_s + \int_0^t y_s \Lambda_s h(x_s)\,dy_s + \int_0^t \Lambda_s h(x_s)\,ds. \qquad (35) $$
From (34) and Itô's rule,
$$ \hat{\Lambda}_t y_t = \int_0^t \hat{\Lambda}_s\,dy_s + \int_0^t y_s n_s\,dy_s + \int_0^t n_s\,ds. \qquad (36) $$
Now E_0[Λ_t y_t − Λ̂_t y_t | Y_s] = 0 for t ≥ s, and calculating this from (35) and (36) yields
$$ n_t = \widehat{\Lambda_t h(x_t)} := E_0[\Lambda_t h(x_t) \mid Y_t]. \qquad (37) $$
But from (31),
$$ E_0[\Lambda_t h(x_t) \mid Y_t] = \pi_t(h)\,\hat{\Lambda}_t, $$
so (34) becomes
$$ \hat{\Lambda}_t = 1 + \int_0^t \hat{\Lambda}_s \pi_s(h)\,dy_s. \qquad (38) $$
However, this has the unique solution
$$ \hat{\Lambda}_t = \exp\Big( \int_0^t \pi_s(h)\,dy_s - \tfrac{1}{2}\int_0^t \pi_s^2(h)\,ds \Big), $$
and the lemma is proved.
Theorem 6: For any φ ∈ D(L), σ_t(φ) satisfies
$$ \sigma_t(\varphi) = \sigma_0(\varphi) + \int_0^t \sigma_s(L\varphi)\,ds + \int_0^t \sigma_s(h\varphi)\,dy_s. \qquad (39) $$
Proof: By Itô's rule, we have from (26) and (38):
$$ d\big(\hat{\Lambda}_t \pi_t(\varphi)\big) = \hat{\Lambda}_t\big[\pi_t(L\varphi)\,dt + \big(\pi_t(h\varphi) - \pi_t(h)\pi_t(\varphi)\big)\big(dy_t - \pi_t(h)\,dt\big)\big] $$
$$ \qquad + \pi_t(\varphi)\big[\hat{\Lambda}_t \pi_t(h)\,dy_t\big] + \big[\pi_t(h\varphi) - \pi_t(h)\pi_t(\varphi)\big]\hat{\Lambda}_t \pi_t(h)\,dt $$
$$ = \hat{\Lambda}_t \pi_t(L\varphi)\,dt + \hat{\Lambda}_t \pi_t(h\varphi)\,dy_t, $$
which gives (39) since σ_t(φ) = Λ̂_t π_t(φ).
The remarks following Theorem 5 are also applicable here. In addition, we note that the Stratonovich version of (39), which is utilized in a number of papers in this volume, is:
$$ \sigma_t(\varphi) = \sigma_0(\varphi) + \int_0^t \sigma_s(\bar{L}\varphi)\,ds + \int_0^t \sigma_s(h\varphi)\circ dy_s \qquad (40) $$
where
$$ \bar{L}\varphi(x) = L\varphi(x) - \tfrac{1}{2}h^2(x)\varphi(x) $$
and ∘ denotes a Stratonovich (symmetric) stochastic integral [9],[22].
Example 4: Under the assumptions of Example 2, we can derive a stochastic differential equation for q(t,x) := Λ̂_t p(t,x); this is interpreted as an unnormalized conditional density, since then
$$ p(t,x) = \frac{q(t,x)}{\int q(t,y)\,dy}. $$
As in Example 2, an integration by parts in (39) yields the stochastic partial differential equation:
$$ dq(t,x) = L^* q(t,x)\,dt + h(x)\,q(t,x)\,dy_t. \qquad (41) $$
Notice that (41) has a much simpler structure than (27): it does not involve an integral such as π_t(h), and it is a bilinear stochastic differential equation with {y_t} as its input. This structure is utilized by a number of papers in this volume.
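The bilinear structure of (41) makes an operator-splitting discretization natural: alternate a deterministic Fokker-Planck step for L* with a pointwise multiplicative update by exp(h(x)Δy − ½h²(x)Δt). A grid-based sketch for the illustrative scalar model f(x) = −x, g(x) = 1, h(x) = x (all numerical parameters are ours):

```python
import math
import random

random.seed(5)

# Splitting scheme for the Zakai equation (41) on a grid: a finite-
# difference step of dq/dt = -(f q)' + q''/2, then the multiplicative
# observation update q <- q * exp(h dy - h^2 dt / 2).
f = lambda x: -x
h = lambda x: x

dx, dt, steps = 0.05, 0.001, 1000
grid = [i * dx for i in range(-120, 121)]
n = len(grid)

q = [math.exp(-x * x / 2) for x in grid]   # unnormalized N(0, 1) start
x_true = 0.0

def fp_step(q):
    out = [0.0] * n
    for i in range(1, n - 1):
        adv = (f(grid[i + 1]) * q[i + 1] - f(grid[i - 1]) * q[i - 1]) / (2 * dx)
        dif = (q[i + 1] - 2 * q[i] + q[i - 1]) / (dx * dx)
        out[i] = q[i] + dt * (-adv + 0.5 * dif)
    return out

for _ in range(steps):
    x_true = x_true + f(x_true) * dt + math.sqrt(dt) * random.gauss(0, 1)
    dy = h(x_true) * dt + math.sqrt(dt) * random.gauss(0, 1)
    q = fp_step(q)
    q = [qi * math.exp(h(x) * dy - 0.5 * h(x) ** 2 * dt)
         for qi, x in zip(q, grid)]

mass = sum(q) * dx
mean = sum(x * qi for x, qi in zip(grid, q)) * dx / mass
print(mean, x_true)
```

The observation update never requires the normalizing integral π_t(h); normalization is deferred to the very end, exactly as suggested by q(t,x) = Λ̂_t p(t,x).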
ACKNOWLEDGMENT
The work of S. I. Marcus was supported in part by the U.S. National Science Foundation under grant ENG-76-11106.
REFERENCES
1. R. E. Kalman, "A new approach to linear filtering and prediction problems," J. Basic Eng. ASME, 82, 1960, pp. 33-45.
2. R. E. Kalman and R. S. Bucy, "New results in linear filtering and prediction theory," J. Basic Eng. ASME Series D, 83, 1961, pp. 95-108.
3. R. S. Bucy, "Nonlinear filtering," IEEE Trans. Automatic Control, AC-10, 1965, p. 198.
4. H. J. Kushner, "On the differential equations satisfied by conditional probability densities of Markov processes," SIAM J. Control, 2, 1964, pp. 106-119.
5. A. N. Shiryaev, "Some new results in the theory of controlled stochastic processes [Russian]," Trans. 4th Prague Conference on Information Theory, Czech. Academy of Sciences, Prague, 1967.
6. R. L. Stratonovich, Conditional Markov Processes and Their Application to the Theory of Optimal Control. New York: Elsevier, 1968.
7. W. M. Wonham, "Some applications of stochastic differential equations to optimal nonlinear filtering," SIAM J. Control, 2, 1965, pp. 347-369.
8. M. Zakai, "On the optimal filtering of diffusion processes,"Z. Wahr. Verw. Geb., 11, 1969, pp. 230-243.
9. E. Wong, Stochastic Processes in Information and Dynamical Systems. New York: McGraw-Hill, 1971.
10. T. Kailath, "An innovations approach to least-squares estimation -- Part I: Linear filtering in additive white noise," IEEE Trans. Automatic Control, AC-13, 1968, pp. 646-655.
11. P. A. Frost and T. Kailath, "An innovations approach to least-squares estimation -- Part III," IEEE Trans. Automatic Control, AC-16, 1971, pp. 217-226.
12. M. Fujisaki, G. Kallianpur, and H. Kunita, "Stochastic differential equations for the nonlinear filtering problem," Osaka J. Math., 9, 1972, pp. 19-40.
13. R. S. Liptser and A. N. Shiryaev, Statistics of Random Processes I. New York: Springer-Verlag, 1977.
14. G. Kallianpur, Stochastic Filtering Theory. Berlin-Heidelberg-New York: Springer-Verlag, 1980.
15. E. Pardoux, "Stochastic partial differential equations and filtering of diffusion processes," Stochastics, 2, 1979, pp. 127-168 [see also Pardoux's article in this volume].
16. N. V. Krylov and B. L. Rozovskii, "On the conditional distribution of diffusion processes [Russian]," Izvestia Akad. Nauk SSSR, Math. Series 42, 1978, pp. 356-378.
17. R. W. Brockett, this volume.

18. V. E. Benes, "Exact finite dimensional filters for certain diffusions with nonlinear drift," Stochastics, to appear.

19. M. H. A. Davis, this volume.

20. J. H. Van Schuppen, "Stochastic filtering theory: A discussion of concepts, methods, and results," in Stochastic Control Theory and Stochastic Differential Systems, M. Kohlmann and W. Vogel, eds. New York: Springer-Verlag, 1979.

21. M. H. A. Davis, "Martingale integrals and stochastic calculus," in Communication Systems and Random Process Theory, J. K. Skwirzynski, ed. Leiden: Noordhoff, 1978.
22. L. Arnold, Stochastic Differential Equations. New York: Wiley, 1974.

23. A. Friedman, Stochastic Differential Equations and Applications, Vol. 1. New York: Academic Press, 1975.
24. I. I. Gihman and A. V. Skorohod, Stochastic Differential Equations. New York: Springer-Verlag, 1972.
25. I. Gertner, "An alternative approach to nonlinear filtering," Stochastic Processes and their Applications, 7, 1978, pp. 231-246.
26. D. W. Stroock and S. R. S. Varadhan, Multidimensional Diffusion Processes. New York: Springer-Verlag, 1979.
27. D. Allinger and S. K. Mitter, "New results on the innovations problem for nonlinear filtering," Stochastics, to appear.
28. E. Wong, "Recent progress in stochastic processes -- a survey," IEEE Trans. Inform. Theory, IT-19, 1973, pp. 262-275.
29. J. Jacod, Calcul Stochastique et Problèmes de Martingales. Berlin-Heidelberg-New York: Springer-Verlag, 1979.
30. S. K. Mitter, this volume.