3 Recursive Bayesian Estimation
TRANSCRIPT
1
Recursive Bayesian Estimation
SOLO HERMELIN
Updated: 22.02.09 11.01.14
http://www.solohermelin.com
2
SOLO   Table of Content   Recursive Bayesian Estimation

Review of Probability
Conditional Probability
Total Probability Theorem
Conditional Probability - Bayes Formula
Statistical Independent Events
Expected Value or Mathematical Expectation
Variance and Central Moments
Characteristic Function and Moment-Generating Function
Probability Distribution and Probability Density Functions (Examples)
Normal (Gaussian) Distribution
Existence Theorems 1 & 2
Monte Carlo Method
Estimation of the Mean and Variance of a Random Variable
Generating Discrete Random Variables
Existence Theorem 3
Markov Processes
Functions of one Random Variable
The Laws of Large Numbers
Central Limit Theorem
Problem Definition
Stochastic Processes
3
SOLO   Table of Content (continue - 1)   Recursive Bayesian Estimation

Bayesian Estimation Introduction
Linear Gaussian Markov Systems - Closed-Form Solutions of Estimation
Kalman Filter
Extended Kalman Filter
General Bayesian Nonlinear Filters
Additive Gaussian Nonlinear Filter
Gauss-Hermite Quadrature Approximation
Unscented Kalman Filter
Monte Carlo Kalman Filter (MCKF)
Non-Additive Non-Gaussian Nonlinear Filter
Nonlinear Estimation Using Particle Filters
Importance Sampling (IS)
Sequential Importance Sampling (SIS)
Sequential Importance Resampling (SIR)
Monte Carlo Particle Filter (MCPF)
Bayesian Maximum Likelihood Estimate (Maximum A Posteriori - MAP Estimate)
4
SOLO   Table of Content (continue - 2)   Recursive Bayesian Estimation

References
Nonlinear Filters based on the Fokker-Planck Equation
5
SOLO   Recursive Bayesian Estimation

[Figure: hidden Markov model - hidden states x_0, x_1, ..., x_{k-1}, x_k propagated by f(x_{k-1}, w_{k-1}); measurements z_1, z_2, ..., z_{k-1}, z_k generated by h(x_k, v_k)]

A discrete nonlinear system is defined by

$x_k = f(k-1, x_{k-1}, w_{k-1})$   State vector dynamics

$z_k = h(k, x_k, v_k)$   Measurements

where $x_k$ is the state, $z_k$ the measurement, and $w_{k-1}$, $v_k$ are the State and Measurement Noise Vectors, respectively.

Problem Definition: Estimate the hidden states $x_k$ of a Non-linear Dynamic Stochastic System from Noisy Measurements $z_k$.

Since this is a probabilistic problem, we start with a reminder of Probability Theory.

Table of Content
6
SOLO   Review of Probability

A more detailed explanation of the subject is given in the "Probability" Presentation.

Probability Axiomatic Definition

Pr(A) is the probability of the event A if

(1) $\Pr(A) \ge 0$
(2) $\Pr(S) = 1$
(3) If $A = A_1 \cup A_2 \cup \dots \cup A_n$ and $A_i \cap A_j = \emptyset \;\forall i \ne j$,
    then $\Pr(A) = \Pr(A_1) + \Pr(A_2) + \dots + \Pr(A_n)$

Probability Geometric Definition

Assume that the probability of an event in a geometric region A (a subset of S) is defined as the ratio between the surface of A and the surface of S:

$\Pr(A) = \dfrac{Surface(A)}{Surface(S)}$

This definition satisfies the axioms (1), (2), (3) above.
7
SOLO   Review of Probability

(1) $\Pr(A) \ge 0$   (2) $\Pr(S) = 1$   (3) If $A = A_1 \cup \dots \cup A_n$ and $A_i \cap A_j = \emptyset \;\forall i \ne j$, then $\Pr(A) = \Pr(A_1) + \dots + \Pr(A_n)$

From those definitions we can prove the following:

(1') $\Pr(\emptyset) = 0$
Proof: $S = S \cup \emptyset$ and $S \cap \emptyset = \emptyset$, so by (3) $\Pr(S) = \Pr(S) + \Pr(\emptyset) \Rightarrow \Pr(\emptyset) = 0$

(2') $\Pr(\bar A) = 1 - \Pr(A)$
Proof: $S = A \cup \bar A$ and $A \cap \bar A = \emptyset$, so by (2) and (3) $1 = \Pr(S) = \Pr(A) + \Pr(\bar A) \Rightarrow \Pr(\bar A) = 1 - \Pr(A)$

(3') $0 \le \Pr(A) \le 1$
Proof: by (2') $\Pr(A) = 1 - \Pr(\bar A) \le 1$, since $\Pr(\bar A) \ge 0$ by (1); and $\Pr(A) \ge 0$ by (1)

(4') If $A \subset B \Rightarrow \Pr(A) \le \Pr(B)$
Proof: $B = (B - A) \cup A$ and $(B - A) \cap A = \emptyset$, so by (3) $\Pr(B) = \Pr(B - A) + \Pr(A) \ge \Pr(A)$

(5') $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$
Proof: $A \cup B = A \cup (B - A \cap B)$ with $A \cap (B - A \cap B) = \emptyset$, and $B = (A \cap B) \cup (B - A \cap B)$ with $(A \cap B) \cap (B - A \cap B) = \emptyset$. Applying (3) twice:
$\Pr(A \cup B) = \Pr(A) + \Pr(B - A \cap B)$ and $\Pr(B) = \Pr(A \cap B) + \Pr(B - A \cap B)$
$\Rightarrow \Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$

Table of Content
Review of Probability
8
SOLO   Review of Probability

Conditional Probability

[Figure: Venn diagram - S decomposed into elementary events, with A, B and their common elementary events]

Given two events A and B decomposed in elementary events:

$A = A_{\alpha_1} \cup A_{\alpha_2} \cup \dots \cup A_{\alpha_n} = \bigcup_{i=1}^{n} A_{\alpha_i}$, with $A_{\alpha_i} \cap A_{\alpha_j} = \emptyset \;\forall i \ne j$

$B = A_{\beta_1} \cup A_{\beta_2} \cup \dots \cup A_{\beta_m} = \bigcup_{k=1}^{m} A_{\beta_k}$, with $A_{\beta_k} \cap A_{\beta_l} = \emptyset \;\forall k \ne l$

$A \cap B = A_{\alpha\beta_1} \cup A_{\alpha\beta_2} \cup \dots \cup A_{\alpha\beta_r}$, $r \le m, n$, with $A_{\alpha\beta_i} \cap A_{\alpha\beta_j} = \emptyset \;\forall i \ne j$

$\Pr(A) = \Pr(A_{\alpha_1}) + \dots + \Pr(A_{\alpha_n})$,   $\Pr(B) = \Pr(A_{\beta_1}) + \dots + \Pr(A_{\beta_m})$,   $\Pr(A \cap B) = \Pr(A_{\alpha\beta_1}) + \dots + \Pr(A_{\alpha\beta_r})$

We want to find the probability of the event A under the condition that the event B has occurred, denoted Pr(A|B):

$\Pr(A|B) = \dfrac{\Pr(A_{\alpha\beta_1}) + \Pr(A_{\alpha\beta_2}) + \dots + \Pr(A_{\alpha\beta_r})}{\Pr(A_{\beta_1}) + \Pr(A_{\beta_2}) + \dots + \Pr(A_{\beta_m})} = \dfrac{\Pr(A \cap B)}{\Pr(B)}$

Review of Probability
9
SOLO   Review of Probability

Conditional Probability

If the events A and B are statistically independent, the fact that B occurred does not affect the probability of A occurring:

$\Pr(A|B) = \dfrac{\Pr(A \cap B)}{\Pr(B)}$,   $\Pr(B|A) = \dfrac{\Pr(A \cap B)}{\Pr(A)}$

$\Pr(A|B) = \Pr(A) \;\Rightarrow\; \Pr(A \cap B) = \Pr(A|B) \cdot \Pr(B) = \Pr(B|A) \cdot \Pr(A) = \Pr(A) \cdot \Pr(B)$

Definition:
n events $A_i$, $i = 1, 2, \dots, n$ are statistically independent if:

$\Pr\left(\bigcap_{l=1}^{r} A_{i_l}\right) = \prod_{l=1}^{r} \Pr(A_{i_l}) \quad \forall r = 2, \dots, n$

Table of Content
Review of Probability
10
SOLO   Review of Probability

Conditional Probability - Bayes Formula

Using the relation:

$\Pr(A \cap B_{\beta_l}) = \Pr(A|B_{\beta_l}) \cdot \Pr(B_{\beta_l}) = \Pr(B_{\beta_l}|A) \cdot \Pr(A)$

and, for $B = \bigcup_{k=1}^{m} B_{\beta_k}$ with $(A \cap B_{\beta_k}) \cap (A \cap B_{\beta_l}) = \emptyset \;\forall k \ne l$:

$\Pr(B) = \sum_{k=1}^{m} \Pr(A \cap B_{\beta_k})$

we obtain:

$\Pr(B_{\beta_l}|A) = \dfrac{\Pr(A|B_{\beta_l}) \cdot \Pr(B_{\beta_l})}{\Pr(B)} = \dfrac{\Pr(A|B_{\beta_l}) \cdot \Pr(B_{\beta_l})}{\sum_{k=1}^{m} \Pr(A|B_{\beta_k}) \cdot \Pr(B_{\beta_k})}$

Bayes Formula

Thomas Bayes 1702 - 1761

Table of Content
Review of Probability
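As a numeric illustration of the formula above, here is a minimal sketch in Python; the prior and likelihood numbers are hypothetical values chosen for the example, not taken from the slides:

```python
# Bayes formula for a discrete partition B_1, ..., B_m:
# Pr(B_l | A) = Pr(A | B_l) Pr(B_l) / sum_k Pr(A | B_k) Pr(B_k)

prior = [0.5, 0.3, 0.2]          # Pr(B_1), Pr(B_2), Pr(B_3): assumed example values
likelihood = [0.9, 0.5, 0.1]     # Pr(A | B_k) for each hypothesis B_k

evidence = sum(l * p for l, p in zip(likelihood, prior))    # Pr(A), total probability
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]

print(posterior)        # [0.7258..., 0.2419..., 0.0322...]
print(sum(posterior))   # 1.0: a valid probability distribution
```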
11
SOLO   Review of Probability

Total Probability Theorem

If $A_1 \cup A_2 \cup \dots \cup A_n = S$ and $A_i \cap A_j = \emptyset \;\forall i \ne j$,
we say that the set space S is decomposed in exhaustive and incompatible (exclusive) sets.

The Total Probability Theorem states that for any event B, its probability can be decomposed in terms of conditional probabilities as follows:

$\Pr(B) = \sum_{i=1}^{n} \Pr(A_i, B) = \sum_{i=1}^{n} \Pr(B|A_i)\, P(A_i)$

Proof: Using the relation

$\Pr(A_l \cap B) = \Pr(B|A_l) \cdot \Pr(A_l) = \Pr(A_l|B) \cdot \Pr(B)$

and, since for any event B, $B = \bigcup_{k=1}^{n} (A_k \cap B)$ with $(A_k \cap B) \cap (A_l \cap B) = \emptyset \;\forall k \ne l$:

$\Pr(B) = \sum_{k=1}^{n} \Pr(A_k \cap B)$

we obtain the result.

Table of Content
Review of Probability
12
SOLO   Review of Probability

Statistical Independent Events

From the Theorem of Addition (inclusion-exclusion):

$\Pr\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} \Pr(A_i) - \sum_{i \ne j} \Pr(A_i A_j) + \sum_{i \ne j \ne k} \Pr(A_i A_j A_k) - \dots + (-1)^{n-1} \Pr(A_1 A_2 \cdots A_n)$

If the $A_i$ are statistically independent, $\Pr\left(\bigcap_{l=1}^{r} A_{i_l}\right) = \prod_{l=1}^{r} \Pr(A_{i_l}) \;\forall r = 2, \dots, n$, so

$\Pr\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} \Pr(A_i) - \sum_{i \ne j} \Pr(A_i)\Pr(A_j) + \dots + (-1)^{n-1} \prod_{i=1}^{n} \Pr(A_i)$

Therefore

$1 - \Pr\left(\bigcup_{i=1}^{n} A_i\right) = \prod_{i=1}^{n} \left[1 - \Pr(A_i)\right]$,   i.e.   $\Pr\left(\bigcup_{i=1}^{n} A_i\right) = 1 - \prod_{i=1}^{n} \left[1 - \Pr(A_i)\right]$

Since $\overline{\bigcup_{i=1}^{n} A_i} = \bigcap_{i=1}^{n} \bar A_i$ (De Morgan) and $\Pr\left(\bigcap_{i=1}^{n} \bar A_i\right) = 1 - \Pr\left(\bigcup_{i=1}^{n} A_i\right)$, we get

$\Pr\left(\bigcap_{i=1}^{n} \bar A_i\right) = \prod_{i=1}^{n} \left[1 - \Pr(A_i)\right] = \prod_{i=1}^{n} \Pr(\bar A_i)$

If the n events $A_i$, $i = 1, 2, \dots, n$, are statistically independent, then the complements $\bar A_i$ are also statistically independent.

Table of Content
Review of Probability
13
SOLO   Review of Probability

Expected Value or Mathematical Expectation

Given a Probability Density Function $p_X(x)$ we define the Expected Value

For a Continuous Random Variable: $E(x) := \int_{-\infty}^{+\infty} x\, p_X(x)\, dx$

For a Discrete Random Variable: $E(x) := \sum_{k} x_k\, p_X(x_k)$

For a general function g(x) of the Random Variable x: $E[g(x)] := \int_{-\infty}^{+\infty} g(x)\, p_X(x)\, dx$

Since $\int_{-\infty}^{+\infty} p_X(x)\, dx = 1$,

$E(x) = \dfrac{\int_{-\infty}^{+\infty} x\, p_X(x)\, dx}{\int_{-\infty}^{+\infty} p_X(x)\, dx}$

The Expected Value is the centroid of the surface enclosed between the Probability Density Function and the x axis.

Table of Content
14
SOLO   Review of Probability

Variance

Given a Probability Density Function p(x) we define the Variance

$Var(x) := E\left\{[x - E(x)]^2\right\} = E\left[x^2 - 2\,x\,E(x) + E(x)^2\right] = E(x^2) - E(x)^2$

Central Moments

Given a Probability Density Function p(x) we define the Moment of order k about the origin

$\mu'_k(x) := E(x^k)$

and the Central Moment of order k about the Mean E(x)

$\mu_k(x) := E\left\{[x - E(x)]^k\right\} = \sum_{j=0}^{k} \binom{k}{j} (-1)^{k-j}\, E(x^j)\, E(x)^{k-j}$

Table of Content
15
SOLO   Review of Probability

Moments

Normal Distribution: $p_X(x; \sigma) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-x^2/2\sigma^2\right)$

$E[x^n] = \begin{cases} 1 \cdot 3 \cdots (n-1)\,\sigma^n & \text{for } n \text{ even} \\ 0 & \text{for } n \text{ odd} \end{cases}$

$E[|x|^n] = \begin{cases} 1 \cdot 3 \cdots (n-1)\,\sigma^n & \text{for } n = 2k \\ \sqrt{2/\pi}\; 2^k\, k!\, \sigma^{2k+1} & \text{for } n = 2k+1 \end{cases}$

Proof:
Start from $\int_{-\infty}^{\infty} \exp(-a x^2)\, dx = \sqrt{\pi/a}$, $a > 0$, and differentiate k times with respect to a:

$\int_{-\infty}^{\infty} x^{2k} \exp(-a x^2)\, dx = \dfrac{1 \cdot 3 \cdots (2k-1)}{(2a)^k} \sqrt{\dfrac{\pi}{a}}, \quad a > 0$

Substitute $a = 1/(2\sigma^2)$ to obtain $E[x^n]$ for n even; for n odd the integrand is odd, so $E[x^n] = 0$.

For the odd absolute moments, with the substitution $y = x^2/(2\sigma^2)$:

$E\left[|x|^{2k+1}\right] = \dfrac{2}{\sqrt{2\pi}\,\sigma} \int_0^{\infty} x^{2k+1} \exp\left(-x^2/2\sigma^2\right) dx = \sqrt{\dfrac{2}{\pi}}\, 2^k \sigma^{2k+1} \int_0^{\infty} y^k e^{-y}\, dy = \sqrt{\dfrac{2}{\pi}}\, 2^k\, k!\, \sigma^{2k+1}$

Now let us compute:

$E[x^4] = 3\,\sigma^4 = 3\left(E[x^2]\right)^2$

Chi-square
16
SOLO   Review of Probability

Functions of one Random Variable

Let y = g(x) be a given function of the random variable x defined on the domain Ω, with probability density $p_X(x)$. We want to find $p_Y(y)$.

Fundamental Theorem

Assume $x_1, x_2, \dots, x_n$ are all the solutions of the equation

$y = g(x_1) = g(x_2) = \dots = g(x_n)$

Then

$p_Y(y) = \dfrac{p_X(x_1)}{|g'(x_1)|} + \dfrac{p_X(x_2)}{|g'(x_2)|} + \dots + \dfrac{p_X(x_n)}{|g'(x_n)|}$, where $g'(x) := \dfrac{d\,g(x)}{d\,x}$

Proof

$p_Y(y)\, dy := \Pr\{y \le Y \le y + dy\} = \sum_{i=1}^{n} \Pr\{x_i \le x \le x_i + dx_i\} = \sum_{i=1}^{n} p_X(x_i)\, |dx_i| = \sum_{i=1}^{n} p_X(x_i)\, \dfrac{dy}{|g'(x_i)|}$

q.e.d.
17
SOLO   Review of Probability

Functions of one Random Variable (continue - 1)

Example 1: $y = a\,x + b \;\Rightarrow\; p_Y(y) = \dfrac{1}{|a|}\, p_X\!\left(\dfrac{y-b}{a}\right)$

Example 2: $y = \dfrac{a}{x} \;\Rightarrow\; p_Y(y) = \dfrac{a}{y^2}\, p_X\!\left(\dfrac{a}{y}\right)$

Example 3: $y = a\,x^2 \;\Rightarrow\; p_Y(y) = \dfrac{1}{2\sqrt{a\,y}} \left[ p_X\!\left(\sqrt{y/a}\right) + p_X\!\left(-\sqrt{y/a}\right) \right] U(y)$

Example 4: $y = |x| \;\Rightarrow\; p_Y(y) = \left[ p_X(y) + p_X(-y) \right] U(y)$

Table of Content
18
SOLO   Review of Probability

Characteristic Function and Moment-Generating Function

Given a Probability Density Function $p_X(x)$ we define the Characteristic Function, or Moment-Generating Function:

$\Phi_X(\omega) := E\left[\exp(j\omega x)\right] = \begin{cases} \int_{-\infty}^{+\infty} \exp(j\omega x)\, p_X(x)\, dx & x \text{ continuous} \\ \sum_{x} \exp(j\omega x)\, p_X(x) & x \text{ discrete} \end{cases}$

This is in fact the complex conjugate of the Fourier Transform of the Probability Density Function. This function is always defined, since the sufficient condition for the existence of a Fourier Transform,

$\int_{-\infty}^{+\infty} |p_X(x)|\, dx \overset{p_X(x) \ge 0}{=} \int_{-\infty}^{+\infty} p_X(x)\, dx = 1 < \infty$

is always fulfilled.

Given the Characteristic Function we can find the Probability Density Function $p_X(x)$ using the Inverse Fourier Transform:

$p_X(x) = \dfrac{1}{2\pi} \int_{-\infty}^{+\infty} \exp(-j\omega x)\, \Phi_X(\omega)\, d\omega$
19
SOLO   Review of Probability

Properties of the Moment-Generating Function

$\Phi_X(\omega) = \int_{-\infty}^{+\infty} \exp(j\omega x)\, p_X(x)\, dx$

$\Phi_X(\omega)\big|_{\omega=0} = \int_{-\infty}^{+\infty} p_X(x)\, dx = 1$

$\dfrac{d\,\Phi_X(\omega)}{d\,\omega} = j \int_{-\infty}^{+\infty} x\, \exp(j\omega x)\, p_X(x)\, dx \;\Rightarrow\; \dfrac{d\,\Phi_X(\omega)}{d\,\omega}\bigg|_{\omega=0} = j \int_{-\infty}^{+\infty} x\, p_X(x)\, dx = j\, E(x)$

$\dfrac{d^2\Phi_X(\omega)}{d\,\omega^2} = j^2 \int_{-\infty}^{+\infty} x^2 \exp(j\omega x)\, p_X(x)\, dx \;\Rightarrow\; \dfrac{d^2\Phi_X(\omega)}{d\,\omega^2}\bigg|_{\omega=0} = j^2 \int_{-\infty}^{+\infty} x^2\, p_X(x)\, dx = j^2\, E(x^2)$

$\dfrac{d^n\Phi_X(\omega)}{d\,\omega^n} = j^n \int_{-\infty}^{+\infty} x^n \exp(j\omega x)\, p_X(x)\, dx \;\Rightarrow\; \dfrac{d^n\Phi_X(\omega)}{d\,\omega^n}\bigg|_{\omega=0} = j^n \int_{-\infty}^{+\infty} x^n\, p_X(x)\, dx = j^n\, E(x^n)$

This is the reason why $\Phi_X(\omega)$ is also called the Moment-Generating Function.
20
SOLO   Review of Probability

Properties of the Moment-Generating Function

Develop $\Phi_X(\omega) = \int_{-\infty}^{+\infty} \exp(j\omega x)\, p_X(x)\, dx$ in a Taylor series about ω = 0:

$\Phi_X(\omega) = \Phi_X(0) + \dfrac{d\,\Phi_X}{d\,\omega}\bigg|_{\omega=0} \dfrac{\omega}{1!} + \dfrac{d^2\Phi_X}{d\,\omega^2}\bigg|_{\omega=0} \dfrac{\omega^2}{2!} + \dots + \dfrac{d^n\Phi_X}{d\,\omega^n}\bigg|_{\omega=0} \dfrac{\omega^n}{n!} + \dots$

$\qquad\;\; = 1 + j\,E(x)\, \dfrac{\omega}{1!} + j^2 E(x^2)\, \dfrac{\omega^2}{2!} + \dots + j^n E(x^n)\, \dfrac{\omega^n}{n!} + \dots$
21
SOLO   Review of Probability

Probability Distribution and Probability Density Functions (Examples)

(1) Binomial (Bernoulli): $p(k, n) = \dfrac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k} = \binom{n}{k} p^k (1-p)^{n-k}$

(2) Poisson's Distribution: $p(k, n) \approx \dfrac{k_0^{\,k}}{k!} \exp(-k_0)$

(3) Normal (Gaussian): $p(x; \mu, \sigma) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-(x-\mu)^2/2\sigma^2\right]$

(4) Laplacian Distribution: $p(x; \mu, b) = \dfrac{1}{2b} \exp\left(-\dfrac{|x-\mu|}{b}\right)$

[Figure: bar plot of P(k, n) versus k = 0, 1, 2, ..., 14]
22
SOLO   Review of Probability

Probability Distribution and Probability Density Functions (Examples)

(5) Gamma Distribution: $p(x; k, \theta) = \begin{cases} \dfrac{x^{k-1} \exp(-x/\theta)}{\Gamma(k)\,\theta^k} & x \ge 0 \\ 0 & x < 0 \end{cases}$

(6) Beta Distribution: $p(x; \alpha, \beta) = \dfrac{x^{\alpha-1}(1-x)^{\beta-1}}{\int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\, du} = \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}$

(7) Cauchy Distribution: $p(x; x_0, \gamma) = \dfrac{1}{\pi}\, \dfrac{\gamma}{(x-x_0)^2 + \gamma^2}$
23
SOLO   Review of Probability

Probability Distribution and Probability Density Functions (Examples)

(8) Exponential Distribution: $p(x; \lambda) = \begin{cases} \lambda \exp(-\lambda x) & x \ge 0 \\ 0 & x < 0 \end{cases}$

(9) Chi-square Distribution: $p(x; k) = \begin{cases} \dfrac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2-1} \exp(-x/2) & x \ge 0 \\ 0 & x < 0 \end{cases}$

where Γ is the gamma function: $\Gamma(a) = \int_0^{\infty} t^{a-1} \exp(-t)\, dt$

(10) Student's t-Distribution: $p(x; \nu) = \dfrac{\Gamma\left[(\nu+1)/2\right]}{\sqrt{\nu\pi}\;\Gamma(\nu/2)} \left(1 + x^2/\nu\right)^{-(\nu+1)/2}$
24
SOLO   Review of Probability

Probability Distribution and Probability Density Functions (Examples)

(11) Uniform Distribution (Continuous): $p(x; a, b) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & x < a,\; x > b \end{cases}$

(12) Rayleigh Distribution: $p(x; \sigma) = \dfrac{x}{\sigma^2} \exp\left(-\dfrac{x^2}{2\sigma^2}\right)$

(13) Rice Distribution: $p(x; v, \sigma) = \dfrac{x}{\sigma^2} \exp\left(-\dfrac{x^2 + v^2}{2\sigma^2}\right) I_0\!\left(\dfrac{x\,v}{\sigma^2}\right)$
25
SOLO   Review of Probability

Probability Distribution and Probability Density Functions (Examples)

(14) Weibull Distribution: $p(x; \gamma, \mu, \alpha) = \begin{cases} \dfrac{\gamma}{\alpha} \left(\dfrac{x-\mu}{\alpha}\right)^{\gamma-1} \exp\left[-\left(\dfrac{x-\mu}{\alpha}\right)^{\gamma}\right] & x \ge \mu,\; \gamma, \alpha > 0 \\ 0 & x < \mu \end{cases}$

Table of Content
26
SOLO   Review of Probability

Normal (Gaussian) Distribution

Karl Friedrich Gauss 1777 - 1855

Probability Density Function: $p(x; \mu, \sigma) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right) =: \mathcal{N}(x; \mu, \sigma)$

Cumulative Distribution Function: $P(x; \mu, \sigma) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{x} \exp\left(-\dfrac{(u-\mu)^2}{2\sigma^2}\right) du$

Mean Value: $E(x) = \mu$

Variance: $Var(x) = \sigma^2$

Moment-Generating Function:

$\Phi_X(\omega) = E\left[\exp(j\omega x)\right] = \dfrac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{+\infty} \exp(j\omega u)\, \exp\left(-\dfrac{(u-\mu)^2}{2\sigma^2}\right) du = \exp\left(j\mu\omega - \dfrac{\sigma^2\omega^2}{2}\right)$
28
SOLO   Review of Probability

Normal (Gaussian) Distribution (continue - 1)

Karl Friedrich Gauss 1777 - 1855

A Vector-Valued Gaussian Random Variable has the Probability Density Function

$p(\vec x; \bar x, P) = \left|2\pi P\right|^{-1/2} \exp\left[-\dfrac{1}{2} (\vec x - \bar x)^T P^{-1} (\vec x - \bar x)\right] =: \mathcal{N}(\vec x; \bar x, P)$

where

$\bar x = E(\vec x)$   Mean Value

$P = E\left[(\vec x - \bar x)(\vec x - \bar x)^T\right]$   Covariance Matrix

If P is diagonal, $P = \mathrm{diag}\left[\sigma_1^2\; \sigma_2^2\; \dots\; \sigma_k^2\right]$, then the components of the random vector $\vec x$ are uncorrelated, and

$p(\vec x; \bar x, P) = \prod_{i=1}^{k} \dfrac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\dfrac{(x_i - \bar x_i)^2}{2\sigma_i^2}\right)$

therefore the components of the random vector are also independent.
29
SOLO Review of Probability
The Laws of Large Numbers

The Law of Large Numbers is a fundamental concept in statistics and probability that describes how the average of a randomly selected sample of a large population is likely to be close to the average of the whole population. There are two laws of large numbers, the Weak Law and the Strong Law.

The Weak Law of Large Numbers

The Weak Law of Large Numbers states that if $X_1, X_2, \dots, X_n, \dots$ is an infinite sequence of random variables that have the same expected value μ and variance σ², and are uncorrelated (i.e., the correlation between any two of them is zero), then the sample mean

$\bar X_n := (X_1 + \dots + X_n)/n$

converges in probability (a weak convergence sense) to μ:

$\Pr\left(\left|\bar X_n - \mu\right| < \varepsilon\right) \to 1 \quad \text{for } n \to \infty$

The Strong Law of Large Numbers

The Strong Law of Large Numbers states that if $X_1, X_2, \dots, X_n, \dots$ is an infinite sequence of random variables that have the same expected value μ and variance σ², are uncorrelated, and $E(|X_i|) < \infty$, then

$\Pr\left(\lim_{n \to \infty} \bar X_n = \mu\right) = 1$

i.e., $\bar X_n$ converges almost surely to μ.
30
SOLO Review of Probability
The Law of Large Numbers
Differences between the Weak Law and the Strong Law
The Weak Law states that, for a specified large n, (X1 + ... + Xn) / n is likely to be near μ. Thus, it leaves open the possibility that | (X1 + ... + Xn) / n − μ | > ε happens an infinite number of times, although it happens at infrequent intervals.
The Strong Law shows that this almost surely will not occur. In particular, it implies that with probability 1, we have for any positive value ε, the inequality | (X1 + ... + Xn) / n − μ | > ε is true only a finite number of times (as opposed to an infinite, but infrequent, number of times).
Almost sure convergence is also called strong convergence of random variables. This version is called the strong law because random variables which converge strongly (almost surely) are guaranteed to converge weakly (in probability). The strong law implies the weak law.
31
SOLO   Review of Probability

The Law of Large Numbers

Proof of the Weak Law of Large Numbers

Given: $E(X_i) = \mu \;\forall i$,   $Var(X_i) = \sigma^2 \;\forall i$,   $E\left[(X_i - \mu)(X_j - \mu)\right] = 0 \;\forall i \ne j$

we have:

$E(\bar X_n) = E\left[(X_1 + \dots + X_n)/n\right] = n\mu/n = \mu$

$Var(\bar X_n) = E\left[(\bar X_n - \mu)^2\right] = \dfrac{1}{n^2} E\left[\left((X_1 - \mu) + \dots + (X_n - \mu)\right)^2\right] \overset{E[(X_i-\mu)(X_j-\mu)]=0}{=} \dfrac{n\sigma^2}{n^2} = \dfrac{\sigma^2}{n}$

Using Chebyshev's inequality on $\bar X_n$ we obtain:

$\Pr\left(\left|\bar X_n - \mu\right| \ge \varepsilon\right) \le \dfrac{\sigma^2}{n\,\varepsilon^2}$

Using this equation we obtain:

$\Pr\left(\left|\bar X_n - \mu\right| \le \varepsilon\right) = 1 - \Pr\left(\left|\bar X_n - \mu\right| > \varepsilon\right) \ge 1 - \dfrac{\sigma^2}{n\,\varepsilon^2}$

As n approaches infinity, the expression approaches 1.

q.e.d.

Monte Carlo Integration

Table of Content
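A short numeric illustration of this convergence, as a sketch assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 0.5                                   # true mean of uniform(0,1) samples
for n in (10, 100, 10_000, 1_000_000):
    x_bar = rng.uniform(0.0, 1.0, size=n).mean()
    # |X_bar_n - mu| shrinks roughly like sigma/sqrt(n), per Var(X_bar_n) = sigma^2/n
    print(n, abs(x_bar - mu))
```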
32
SOLO   Review of Probability

Central Limit Theorem

The first version of this theorem was postulated by the French-born English mathematician Abraham de Moivre in 1733, using the normal distribution to approximate the distribution of the number of heads resulting from many tosses of a fair coin. This was published in 1756 in "The Doctrine of Chances", 3rd Ed.

This finding was forgotten until 1812, when the French mathematician Pierre-Simon Laplace recovered it in his work "Théorie Analytique des Probabilités", in which he approximated the binomial distribution with the normal distribution. This is known as the De Moivre - Laplace Theorem.

The present form of the Central Limit Theorem was given by the Russian mathematician Alexandr Lyapunov in 1901.

Abraham de Moivre (1667-1754)
Pierre-Simon Laplace (1749-1827)
Alexandr Mikhailovich Lyapunov (1857-1918)
33
SOLO   Review of Probability

Central Limit Theorem (continue - 1)

Let $X_1, X_2, \dots, X_m$ be a sequence of independent random variables with the same probability distribution function $p_X(x)$. Define the statistical mean:

$\bar X = \dfrac{X_1 + X_2 + \dots + X_m}{m}$

We have:

$E(\bar X) = \dfrac{E(X_1) + E(X_2) + \dots + E(X_m)}{m} = \mu$

$\sigma_{\bar X}^2 = Var(\bar X) = E\left[(\bar X - \mu)^2\right] = \dfrac{1}{m^2} E\left[\left((X_1 - \mu) + \dots + (X_m - \mu)\right)^2\right] = \dfrac{m\sigma^2}{m^2} = \dfrac{\sigma^2}{m}$

Define also the new random variable

$Y := \dfrac{\bar X - E(\bar X)}{\sigma_{\bar X}} = \dfrac{(X_1 - \mu) + (X_2 - \mu) + \dots + (X_m - \mu)}{\sigma\sqrt{m}}$

The probability distribution of Y tends to become Gaussian (normal) as m tends to infinity, regardless of the probability distribution of the random variables, as long as the mean μ and the variance σ² are finite.
34
SOLO   Review of Probability

Central Limit Theorem (continue - 2)

$Y := \dfrac{\bar X - E(\bar X)}{\sigma_{\bar X}} = \dfrac{(X_1 - \mu) + (X_2 - \mu) + \dots + (X_m - \mu)}{\sigma\sqrt{m}}$

Proof

The Characteristic Function of Y:

$\Phi_Y(\omega) = E\left[\exp(j\omega Y)\right] = E\left[\exp\left(j\omega\, \dfrac{(X_1 - \mu) + \dots + (X_m - \mu)}{\sigma\sqrt{m}}\right)\right] = \prod_{i=1}^{m} E\left[\exp\left(j\,\dfrac{\omega}{\sqrt{m}}\, \dfrac{X_i - \mu}{\sigma}\right)\right] = \left[\Phi_{\frac{X_i - \mu}{\sigma}}\left(\dfrac{\omega}{\sqrt{m}}\right)\right]^m$

Develop $\Phi_{\frac{X_i - \mu}{\sigma}}\left(\dfrac{\omega}{\sqrt{m}}\right)$ in a Taylor series:

$\Phi_{\frac{X_i - \mu}{\sigma}}\left(\dfrac{\omega}{\sqrt{m}}\right) = 1 + \dfrac{j\omega}{1!\sqrt{m}}\, \dfrac{E(X_i - \mu)}{\sigma} + \dfrac{(j\omega)^2}{2!\, m}\, \dfrac{E\left[(X_i - \mu)^2\right]}{\sigma^2} + \dfrac{(j\omega)^3}{3!\, m^{3/2}}\, \dfrac{E\left[(X_i - \mu)^3\right]}{\sigma^3} + \dots = 1 - \dfrac{\omega^2}{2m} + o\!\left(\dfrac{1}{m}\right)$

using $E(X_i - \mu) = 0$ and $E\left[(X_i - \mu)^2\right] = \sigma^2$.
35
SOLO   Review of Probability

Central Limit Theorem (continue - 3)

Proof (continue - 1)

The Characteristic Function: $\Phi_Y(\omega) = \left[\Phi_{\frac{X_i - \mu}{\sigma}}\left(\dfrac{\omega}{\sqrt{m}}\right)\right]^m = \left[1 - \dfrac{\omega^2}{2m} + o\!\left(\dfrac{1}{m}\right)\right]^m$

Therefore

$\Phi_Y(\omega) = \left[1 - \dfrac{\omega^2}{2m} + o\!\left(\dfrac{1}{m}\right)\right]^m \xrightarrow{m \to \infty} \exp\left(-\omega^2/2\right)$

which is the Characteristic Function of the Normal Distribution, so by the Inverse Fourier Transform

$p_Y(y) = \dfrac{1}{2\pi} \int_{-\infty}^{+\infty} \exp(-j\omega y)\, \Phi_Y(\omega)\, d\omega \xrightarrow{m \to \infty} \dfrac{1}{2\pi} \int_{-\infty}^{+\infty} \exp(-j\omega y)\, \exp(-\omega^2/2)\, d\omega = \dfrac{1}{\sqrt{2\pi}} \exp(-y^2/2)$

The probability distribution of Y tends to become Gaussian (normal) as m tends to infinity (Convergence in Distribution).

Convergence Concepts

Monte Carlo Integration

Table of Content
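A sketch of the theorem in NumPy: whatever distribution the $X_i$ come from (here exponential, which is far from Gaussian), the normalized variable Y approaches N(0,1):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 250                                    # samples per statistical mean
mu, sigma = 1.0, 1.0                       # mean and std of exponential(1)

x = rng.exponential(scale=1.0, size=(10_000, m))
y = (x.mean(axis=1) - mu) / (sigma / np.sqrt(m))   # Y = (X_bar - mu)/(sigma/sqrt(m))

print(y.mean(), y.std())                   # approximately 0 and 1
# A histogram of y (e.g., np.histogram(y, bins=50)) is close to the N(0,1) density.
```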
38
SOLO   Review of Probability

Existence Theorems

Existence Theorem 1

Given a function G(x) such that

$G(-\infty) = 0, \qquad G(+\infty) = \lim_{x \to \infty} G(x) = 1$

$0 \le G(x_1) \le G(x_2)$ if $x_1 < x_2$   (G(x) is monotonic non-decreasing)

$G(x^+) = \lim_{x_n \to x,\; x_n \ge x} G(x_n) = G(x)$   (G(x) is continuous from the right)

we can find an experiment X and a random variable x, defined on X, such that its distribution function P(x) equals the given function G(x).

Proof of Existence Theorem 1

Assume that the outcome of the experiment X is any real number $-\infty < x < +\infty$. We consider as events all intervals, and the intersections or unions of intervals, on the real axis.

To specify the probability of those events we define $P(x_1) = \Pr\{x \le x_1\} = G(x_1)$. From our definition of G(x) it follows that P(x) is a distribution function.

Existence Theorem 2   Existence Theorem 3
39
SOLO   Review of Probability

Existence Theorems

Existence Theorem 2

If a function F(x, y) is such that

$F(-\infty, y) = 0, \qquad F(x, -\infty) = 0, \qquad F(+\infty, +\infty) = 1$

$F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1) \ge 0$

for every $x_1 < x_2$, $y_1 < y_2$, then two random variables x and y can be found such that F(x, y) is their joint distribution function.

Proof of Existence Theorem 2

Assume that the outcome of the experiment X is any real number $-\infty < x < +\infty$, and the outcome of the experiment Y is any real number $-\infty < y < +\infty$. We consider as events all intervals, and the intersections or unions of intervals, on the real axes x and y.

To specify the probability of those events we define $P(x_1, y_1) = \Pr\{x \le x_1, y \le y_1\} = F(x_1, y_1)$. From our definition of F(x, y) it follows that P(x, y) is a joint distribution function.

The proof is similar to that of Existence Theorem 1.
40
SOLO   Review of Probability   Monte Carlo Method
Monte Carlo methods are a class of computational algorithms that rely on repeated random sampling to compute their results. Monte Carlo methods are often used when simulating physical and mathematical systems. Because of their reliance on repeated computation and random or pseudo-random numbers, Monte Carlo methods are most suited to calculation by a computer. Monte Carlo methods tend to be used when it is infeasible or impossible to compute an exact result with a deterministic algorithm.
The term "Monte Carlo method" was coined in the 1940s by the physicists Stanislaw Ulam, Enrico Fermi, John von Neumann, and Nicholas Metropolis, working on nuclear weapon projects at the Los Alamos National Laboratory; the name is a reference to the Monte Carlo Casino in Monaco, where Ulam's uncle would borrow money to gamble.
Stanislaw Ulam 1909 - 1984
Enrico Fermi 1901 - 1954
John von Neumann 1903 - 1957
Nicholas Constantine Metropolis 1915 - 1999
Monte Carlo Casino
41
SOLO   Review of Probability

Monte Carlo Approximation

Monte Carlo runs generate a set of random samples that approximate the distribution p(x): the $x^{(L)}$ are samples drawn from p(x), $x^{(L)} \sim p(x)$. With P samples, expectations with respect to the distribution are approximated by

$\int f(x)\, p(x)\, dx \approx \dfrac{1}{P} \sum_{L=1}^{P} f\!\left(x^{(L)}\right)$

and, in the usual way for Monte Carlo, this can give all the moments etc. of the distribution up to some degree of approximation:

$\mu_1 = E(x) = \int x\, p(x)\, dx \approx \dfrac{1}{P} \sum_{L=1}^{P} x^{(L)}$

$\mu_n = E\left[(x - \mu_1)^n\right] = \int (x - \mu_1)^n\, p(x)\, dx \approx \dfrac{1}{P} \sum_{L=1}^{P} \left(x^{(L)} - \mu_1\right)^n$

Table of Content
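A minimal sketch of these approximations, assuming we can draw samples from p(x); here p is taken to be N(0,1) for concreteness:

```python
import numpy as np

rng = np.random.default_rng(2)
P = 100_000
x = rng.normal(0.0, 1.0, size=P)        # x^(L) ~ p(x), L = 1..P

f = lambda x: np.cos(x)                 # any integrand f(x)
integral = f(x).mean()                  # (1/P) sum f(x^(L))  ~  int f(x) p(x) dx

mu1 = x.mean()                          # first moment
mu2 = ((x - mu1) ** 2).mean()           # second central moment, etc.
print(integral, mu1, mu2)               # integral -> exp(-1/2) ~ 0.6065 for p = N(0,1)
```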
42
SOLO   Review of Probability

Estimation of the Mean and Variance of a Random Variable (Unknown Statistics)

A random variable x may take on any value in the range $-\infty$ to $+\infty$. Based on a sample of k values $x_i$, $i = 1, 2, \dots, k$, we wish to compute the sample mean, $\hat m_k$, and sample variance, $\hat\sigma_k^2$, as estimates of the population mean, m, and variance, σ². Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples:

$E(x_i) = E(x_j) = m, \qquad E(x_i^2) = E(x_j^2) = \sigma^2 + m^2 \quad \forall i, j$

$E(x_i x_j) \overset{x_i, x_j \text{ independent}}{=} E(x_i)\, E(x_j) = m^2 \quad \forall i \ne j$

Define the estimate of the population mean:

$\hat m_k := \dfrac{1}{k} \sum_{i=1}^{k} x_i, \qquad E(\hat m_k) = \dfrac{1}{k} \sum_{i=1}^{k} E(x_i) = m$   (Unbiased)

Compute:

$E\left[\dfrac{1}{k} \sum_{i=1}^{k} (x_i - \hat m_k)^2\right] = E\left[\dfrac{1}{k} \sum_{i=1}^{k} x_i^2 - \hat m_k^2\right] = \left(\sigma^2 + m^2\right) - \left(\dfrac{\sigma^2}{k} + m^2\right) = \dfrac{k-1}{k}\,\sigma^2$   (Biased)
43
SOLO   Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 1)

Since

$E\left[\dfrac{1}{k} \sum_{i=1}^{k} (x_i - \hat m_k)^2\right] = \dfrac{k-1}{k}\,\sigma^2$   (Biased)

the unbiased estimate of the sample variance of the population is defined as:

$\hat\sigma_k^2 := \dfrac{1}{k-1} \sum_{i=1}^{k} (x_i - \hat m_k)^2$, since $E(\hat\sigma_k^2) = E\left[\dfrac{1}{k-1} \sum_{i=1}^{k} (x_i - \hat m_k)^2\right] = \sigma^2$   (Unbiased)

Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples.
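A quick sketch contrasting the biased (1/k) and unbiased (1/(k-1)) sample-variance estimates over many Monte Carlo repetitions (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
k, runs, sigma2 = 10, 200_000, 4.0      # small k makes the bias visible

x = rng.normal(0.0, np.sqrt(sigma2), size=(runs, k))
m_hat = x.mean(axis=1, keepdims=True)           # sample mean of each run
ss = ((x - m_hat) ** 2).sum(axis=1)             # sum of squared deviations

print((ss / k).mean())        # ~ (k-1)/k * sigma^2 = 3.6  (biased)
print((ss / (k - 1)).mean())  # ~ sigma^2 = 4.0            (unbiased)
```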
44
SOLO   Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 2)

A random variable x may take on any value in the range $-\infty$ to $+\infty$. Based on a sample of k values $x_i$, $i = 1, 2, \dots, k$, we computed the sample mean, $\hat m_k$, and sample variance, $\hat\sigma_k^2$, as estimates of the population mean, m, and variance, σ²:

$E(\hat m_k) = E\left(\dfrac{1}{k} \sum_{i=1}^{k} x_i\right) = m, \qquad E(\hat\sigma_k^2) = E\left[\dfrac{1}{k-1} \sum_{i=1}^{k} (x_i - \hat m_k)^2\right] = \sigma^2$

Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples.
45
SOLO   Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 3)

We found:

$E(\hat m_k) = E\left(\dfrac{1}{k} \sum_{i=1}^{k} x_i\right) = m, \qquad E(\hat\sigma_k^2) = E\left[\dfrac{1}{k-1} \sum_{i=1}^{k} (x_i - \hat m_k)^2\right] = \sigma^2$

Let us compute the variance of the sample mean:

$\sigma_{\hat m_k}^2 := E\left[(\hat m_k - m)^2\right] = E\left[\left(\dfrac{1}{k}\sum_{i=1}^{k} x_i - m\right)^2\right] = \dfrac{1}{k^2} E\left[\left(\sum_{i=1}^{k} (x_i - m)\right)^2\right] = \dfrac{1}{k^2}\left[\sum_{i=1}^{k} E(x_i - m)^2 + \sum_{i=1}^{k}\sum_{j \ne i} \underbrace{E\left[(x_i - m)(x_j - m)\right]}_{=0}\right] = \dfrac{\sigma^2}{k}$

$\sigma_{\hat m_k}^2 := E\left[(\hat m_k - m)^2\right] = \dfrac{\sigma^2}{k}$

46
SOLO   Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 4)

Let us compute the variance of the sample-variance estimate:

$\sigma_{\hat\sigma_k^2}^2 := E\left[(\hat\sigma_k^2 - \sigma^2)^2\right] = E\left[\left(\dfrac{1}{k-1}\sum_{i=1}^{k} (x_i - \hat m_k)^2 - \sigma^2\right)^2\right]$

Writing $x_i - \hat m_k = (x_i - m) + (m - \hat m_k)$ and expanding, the cross terms involve $(x_i - m)$, $(x_j - m)$ and $(m - \hat m_k)$.

47
SOLO   Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 4)

Since $(x_i - m)$, $(x_j - m)$ and $(m - \hat m_k)$ are all independent for $i \ne j$, the expansion of the previous slide reduces, keeping terms up to order 1/k, to

$\sigma_{\hat\sigma_k^2}^2 \approx \dfrac{\mu_4 - \sigma^4}{k}, \qquad \mu_4 := E\left[(x_i - m)^4\right]$

48
SOLO   Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 5)

We found:

$E(\hat m_k) = m, \qquad E(\hat\sigma_k^2) = \sigma^2, \qquad \sigma_{\hat m_k}^2 := E\left[(\hat m_k - m)^2\right] = \dfrac{\sigma^2}{k}$

$\sigma_{\hat\sigma_k^2}^2 := E\left[(\hat\sigma_k^2 - \sigma^2)^2\right] \approx \dfrac{\mu_4 - \sigma^4}{k}, \qquad \mu_4 := E\left[(x_i - m)^4\right]$

Define the Kurtosis of the random variable $x_i$:

$\lambda := \dfrac{\mu_4}{\sigma^4}$

Then:

$\sigma_{\hat\sigma_k^2}^2 = E\left[(\hat\sigma_k^2 - \sigma^2)^2\right] \approx \dfrac{(\lambda - 1)\,\sigma^4}{k}$
49
SOLO   Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 6)

For high values of k, according to the Central Limit Theorem, the estimates of the mean and of the variance are approximately Gaussian random variables:

$(\hat m_k - m) \sim \mathcal{N}\left(0,\; \sigma^2/k\right) \quad \& \quad (\hat\sigma_k^2 - \sigma^2) \sim \mathcal{N}\left(0,\; (\lambda - 1)\,\sigma^4/k\right)$

We want to find a region around $\hat\sigma_k^2$ that will contain σ² with a predefined probability φ, as a function of the number of iterations k:

$\text{Prob}\left[0 \le |\hat\sigma_k^2 - \sigma^2| \le n_\sigma\, \sigma_{\hat\sigma_k^2}\right] = \varphi$

Since the $\hat\sigma_k^2$ are approximately Gaussian random variables, $n_\sigma$ is given by solving:

$\dfrac{1}{\sqrt{2\pi}} \int_{-n_\sigma}^{+n_\sigma} \exp\left(-\dfrac{1}{2}\,\zeta^2\right) d\zeta = \varphi$

Cumulative probability φ within $n_\sigma$ standard deviations of the mean, for a Gaussian random variable:

n_sigma    phi
1.000      0.6827
1.645      0.9000
1.960      0.9500
2.576      0.9900

With $\sigma_{\hat\sigma_k^2} = \sqrt{(\lambda - 1)/k}\;\sigma^2$, the φ-probability region is

$\sigma^2 - n_\sigma \sqrt{\dfrac{\lambda-1}{k}}\,\sigma^2 \;\le\; \hat\sigma_k^2 \;\le\; \sigma^2 + n_\sigma \sqrt{\dfrac{\lambda-1}{k}}\,\sigma^2$

50
SOLO   Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 7)

Rearranging the bounds:

$\dfrac{\hat\sigma_k^2}{1 + n_\sigma\sqrt{(\lambda-1)/k}} \;\le\; \sigma^2 \;\le\; \dfrac{\hat\sigma_k^2}{1 - n_\sigma\sqrt{(\lambda-1)/k}}$

so that, with probability φ,

$\sigma_+ := \dfrac{\hat\sigma_k}{\sqrt{1 + n_\sigma\sqrt{(\lambda-1)/k}}} \;\le\; \sigma \;\le\; \dfrac{\hat\sigma_k}{\sqrt{1 - n_\sigma\sqrt{(\lambda-1)/k}}} =: \sigma_-$
SOLO Review of ProbabilityEstimation of the Mean and Variance of a Random Variable (continue - 8)
52
SOLO Review of ProbabilityEstimation of the Mean and Variance of a Random Variable (continue - 9)
53
SOLO Review of ProbabilityEstimation of the Mean and Variance of a Random Variable (continue - 10)
kn
kn kk 1ˆ
1
:&1ˆ
1
:
00−
−
=−
+
=λ
σσλ
σσ
σσ
Monte-Carlo Procedure
Choose the Confidence Level φ and find the corresponding nσ
using the normal (Gaussian) distribution.
nσ φ
1.000 0.6827
1.645 0.9000
1.960 0.9500
2.576 0.9900
1
Run a few sample k0 > 20 and estimate λ according to2
( )
( )2
1
2
0
1
4
0
0
0
0
0
0
ˆ1
ˆ1
:ˆ
−
−=
∑
∑
=
=
k
iki
k
iki
k
mxk
mxkλ∑
==
0
010
1:ˆ
k
iik x
km
3 Compute and as function of kσ σ
4 Find k for which
[ ] ϕσσσ σσ =≤≤ 2ˆ
2k
2
kˆ-0Prob n
5 Run k-k0 simulations
54
SOLO   Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 11)

Monte-Carlo Procedure - Example

Assume a Gaussian distribution, for which the kurtosis is λ = 3.

1. Choose the Confidence Level φ = 95%, which gives the corresponding $n_\sigma = 1.96$:

n_sigma    phi
1.000      0.6827
1.645      0.9000
1.960      0.9500
2.576      0.9900

2. The kurtosis is λ = 3.

3. Find the k for which $\text{Prob}\left[0 \le |\hat\sigma_k^2 - \sigma^2| \le n_\sigma\sqrt{(\lambda-1)/k}\;\sigma^2\right] = \varphi$:

$\text{Prob}\left[0 \le |\hat\sigma_k^2 - \sigma^2| \le 1.96\,\sqrt{2/k}\;\sigma^2\right] = 0.95$

Assume also that we require $|\hat\sigma_k^2 - \sigma^2| \le 0.1\,\sigma^2$ with probability φ = 95%:

$1.96\,\sqrt{\dfrac{2}{k}} = 0.1 \;\Rightarrow\; k \approx 800$

4. Run k > 800 simulations.
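The sample-size computation of this example as a sketch in Python (scipy's normal quantile is assumed for a general φ):

```python
from math import ceil
from scipy.stats import norm

phi = 0.95                      # confidence level
eps = 0.1                       # required relative accuracy |sigma_k^2 - sigma^2| <= eps*sigma^2
lam = 3.0                       # kurtosis (Gaussian case)

n_sigma = norm.ppf(0.5 + phi / 2.0)            # 1.96 for phi = 0.95
k = ceil((lam - 1.0) * (n_sigma / eps) ** 2)   # from n_sigma*sqrt((lam-1)/k) = eps
print(n_sigma, k)                              # ~1.96, ~769 (the slide rounds to ~800)
```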
55
SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
• First attempts to generate "random numbers": draw balls out of a stirred urn; roll dice.
• 1927: L.H.C. Tippett published a table of 40,000 digits taken "at random" from census reports.
• 1939: M.G. Kendall and B. Babington-Smith created a mechanical machine to generate random numbers. They published a table of 100,000 digits.
• 1946: J. von Neumann proposed the "middle square method".
• 1948: D.H. Lehmer introduced the "linear congruential method".
• 1955: RAND Corporation published a table of 1,000,000 random digits obtained from electronic noise.
• 1965: M.D. MacLaren and G. Marsaglia proposed to combine two congruential generators.
• 1989: R.S. Wikramaratna proposed the additive congruential method.
56
SOLO   Review of Probability

Generating Discrete Random Variables

Pseudo-Random Number Generators

A Random Number represents the value of a random variable uniformly distributed on (0,1). Pseudo-Random Numbers constitute a sequence of values which, although deterministically generated, have all the appearances of being independent uniformly distributed on (0,1).

One approach (multiplicative congruential method):

1. Define $x_0$ = integer initial condition, or seed.

2. Using integers a and m, recursively compute

$x_n = a\, x_{n-1} \;\mathrm{modulo}\; m$   (i.e., $a\, x_{n-1} = k\, m + x_n$ with k, $x_n$ integers, $x_n < m$)

Therefore $x_n$ takes the values $0, 1, \dots, m-1$, and the quantity $u_n = x_n/m$, called a pseudo-random number, is an approximation to the value of a uniform (0,1) random variable.

In general the integers a and m should be chosen to satisfy three criteria:

1. For any initial seed, the resultant sequence has the "appearance" of being a sequence of independent uniform (0,1) random variables.

2. For any initial seed, the number of variables that can be generated before repetition begins is large.

3. The values can be computed efficiently on a digital computer.

Return to Monte Carlo Approximation
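A minimal sketch of the multiplicative congruential (Lehmer) generator, using the 32-bit constants given on the next slide ($a = 7^5$, $m = 2^{31} - 1$):

```python
def lehmer(seed, a=16807, m=2**31 - 1):
    """Multiplicative congruential generator: x_n = a*x_{n-1} mod m.
    Yields pseudo-random numbers u_n = x_n/m approximating uniform (0,1)."""
    x = seed
    while True:
        x = (a * x) % m
        yield x / m

gen = lehmer(seed=12345)
print([next(gen) for _ in range(5)])
```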
57
SOLO   Review of Probability

Generating Discrete Random Variables

Pseudo-Random Number Generators (continue - 1)

A guideline is to choose m to be a large prime number close to the computer word size.

Examples:

32-bit word computer: $m = 2^{31} - 1, \quad a = 7^5 = 16{,}807$

36-bit word computer: $m = 2^{35} - 31, \quad a = 5^5 = 3{,}125$

Another generator of pseudo-random numbers uses recursions of the type (mixed congruential method):

$x_n = (a\, x_{n-1} + c) \;\mathrm{modulo}\; m$   (i.e., $a\, x_{n-1} + c = k\, m + x_n$ with k, $x_n$ integers, $x_n < m$)
58
SOLO   Review of Probability

Generating Discrete Random Variables

Histograms

A histogram is a graphical display of tabulated frequencies, shown as bars. It shows what proportion of cases fall into each of several categories: it is a form of data binning. The categories are usually specified as non-overlapping intervals of some variable. The categories (bars) must be adjacent. The intervals (or bands, or bins) are generally of the same size.

Histograms are used to plot the density of data, and often for density estimation: estimating the probability density function of the underlying variable. The total area of a (normalized) histogram always equals 1. If the lengths of the intervals on the x-axis are all 1, then a histogram is identical to a relative frequency plot.

Mathematical Definition

In a more general mathematical sense, a histogram is a mapping $m_i$ that counts the number of observations that fall into various disjoint categories (known as bins), whereas the graph of a histogram is merely one way to represent a histogram. Thus, if we let n be the total number of observations and k be the total number of bins, the histogram $m_i$ meets the following condition:

$n = \sum_{i=1}^{k} m_i$

A cumulative histogram is a mapping that counts the cumulative number of observations in all of the bins up to the specified bin. That is, the cumulative histogram $M_i$ of a histogram $m_i$ is defined as:

$M_i = \sum_{j=1}^{i} m_j$

[Figure: an ordinary and a cumulative histogram of the same data - a random sample of 10,000 points from a normal distribution with a mean of 0 and a standard deviation of 1]

Return to Table of Content
59
SOLO   Review of Probability

Generating Discrete Random Variables

The Inverse Transform Method

Suppose we want to generate a discrete random variable X having probability density function:

$p(x) = \sum_{j} p_j\, \delta(x - x_j), \quad j = 0, 1, \dots, \quad \sum_j p_j = 1$

To accomplish this, generate a random number U that is uniformly distributed over (0,1) and set:

$X = \begin{cases} x_0 & \text{if } U < p_0 \\ x_1 & \text{if } p_0 \le U < p_0 + p_1 \\ \;\vdots \\ x_j & \text{if } \sum_{i=1}^{j-1} p_i \le U < \sum_{i=1}^{j} p_i \\ \;\vdots \end{cases}$

Since U is uniformly distributed, $\Pr(a \le U < b) = b - a$ for any $0 < a < b < 1$, so we have:

$\Pr(X = x_j) = \Pr\left(\sum_{i=1}^{j-1} p_i \le U < \sum_{i=1}^{j} p_i\right) = p_j$

and so X has the desired distribution.
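A sketch of the discrete inverse-transform method as stated above:

```python
import random

def inverse_transform(xs, ps):
    """Draw one value X from the discrete pdf {Pr(X = xs[j]) = ps[j]}."""
    u = random.random()                 # U ~ uniform(0,1)
    cumulative = 0.0
    for x, p in zip(xs, ps):
        cumulative += p
        if u < cumulative:              # sum_{i<j} p_i <= U < sum_{i<=j} p_i
            return x
    return xs[-1]                       # guard against floating-point round-off

samples = [inverse_transform([0, 1, 2], [0.2, 0.5, 0.3]) for _ in range(10_000)]
print([samples.count(v) / 10_000 for v in (0, 1, 2)])    # ~ [0.2, 0.5, 0.3]
```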
60
SOLO   Review of Probability

Generating Discrete Random Variables

The Inverse Transform Method (continue - 1)

Suppose we want to generate a discrete random variable X having probability density function $p(x) = \sum_j p_j\, \delta(x - x_j)$, $j = 0, 1, \dots$, $\sum_j p_j = 1$.

Draw X, N times, from p(x), and form the histogram of the results.
SOLO Review of Probability
Generating Discrete Random Variables
The Inverse Transform Method (continue – 2)
Generating a Poisson Random Variable: 1,1,0!
)( ===== ∑−
ii
i
i pii
eiXPp λλ
( )1
!
!1
1
1
+=+=
−
+−
+
ii
e
ie
p
pi
i
i
i λλ
λ
λ
λ
Draw X , N times, from Poisson Distribution
Histogram of the Results
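The recursion $p_{i+1} = p_i\,\lambda/(i+1)$ lets the inverse-transform search accumulate the CDF without computing factorials; a sketch:

```python
import math, random

def poisson_inverse_transform(lam):
    """Draw X ~ Poisson(lam) by inverting the CDF, using p_{i+1} = p_i*lam/(i+1)."""
    u = random.random()
    i, p = 0, math.exp(-lam)            # p_0 = e^{-lam}
    cdf = p
    while u >= cdf:                     # walk up the CDF until it exceeds U
        p *= lam / (i + 1)              # p_{i+1} = p_i * lam/(i+1)
        cdf += p
        i += 1
    return i

samples = [poisson_inverse_transform(4.0) for _ in range(100_000)]
print(sum(samples) / len(samples))      # ~ lam = 4.0 (mean of a Poisson)
```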
62
SOLO   Review of Probability

Generating Discrete Random Variables

The Inverse Transform Method (continue - 3)

Generating a Binomial Random Variable:

$p_i = \Pr(X = i) = \dfrac{n!}{i!\,(n-i)!}\, p^i (1-p)^{n-i}, \quad i = 0, 1, \dots, n, \quad \sum_i p_i = 1$

The successive probabilities satisfy the recursion:

$\dfrac{p_{i+1}}{p_i} = \dfrac{\dfrac{n!}{(i+1)!\,(n-i-1)!}\, p^{i+1}(1-p)^{n-i-1}}{\dfrac{n!}{i!\,(n-i)!}\, p^i(1-p)^{n-i}} = \dfrac{n-i}{i+1} \cdot \dfrac{p}{1-p}$

[Figure: bar plot of P(k, n) versus k = 0, 1, 2, ..., 14, and the histogram of the results]

Return to Table of Content
63
SOLO   Review of Probability

Generating Discrete Random Variables

The Acceptance-Rejection Technique

Suppose we have an efficient method for simulating a random variable having a probability density function $q_j$, $j \ge 0$. We want to use this to obtain a random variable that has the probability density function $p_j$, $j \ge 0$.

Let c be a constant such that: $\dfrac{p_j}{q_j} \le c \quad \forall j$ s.t. $q_j \ne 0$

If such a c exists, it must satisfy: $p_j \le c\, q_j \;\Rightarrow\; 1 = \sum_j p_j \le c \sum_j q_j = c$

Rejection Method

Step 1: Simulate the value of Y, having probability density function $q_j$.
Step 2: Generate a random number U (uniformly distributed over (0,1)).
Step 3: If $U < p_Y/(c\, q_Y)$, set X = Y and stop. Otherwise return to Step 1.
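A sketch of the rejection method for discrete distributions, following Steps 1-3 exactly; draw_q is any sampler for q:

```python
import random

def accept_reject(ps, qs, draw_q, c):
    """Steps 1-3 of the rejection method: p, q discrete pdfs with p_j <= c*q_j."""
    while True:
        y = draw_q()                          # Step 1: Y ~ q
        u = random.random()                   # Step 2: U ~ uniform(0,1)
        if u < ps[y] / (c * qs[y]):           # Step 3: accept with prob p_Y/(c q_Y)
            return y

ps = [0.5, 0.1, 0.4]                          # target pdf p
qs = [1/3, 1/3, 1/3]                          # proposal pdf q (uniform over 3 values)
c = max(p / q for p, q in zip(ps, qs))        # c = 1.5 here
samples = [accept_reject(ps, qs, lambda: random.randrange(3), c) for _ in range(30_000)]
print([samples.count(j) / 30_000 for j in range(3)])     # ~ [0.5, 0.1, 0.4]
```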
64
SOLO   Review of Probability

Generating Discrete Random Variables

The Acceptance-Rejection Technique (continue - 1)

Theorem

The random variable X obtained by the rejection method has probability density function $\Pr\{X = i\} = p_i$.

Proof

$\Pr\{X = i\} = \Pr\{Y = i \,|\, \text{Acceptance}\} \overset{\text{Bayes}}{=} \dfrac{\Pr\{Y = i,\, \text{Acceptance}\}}{\Pr\{\text{Acceptance}\}} = \dfrac{\Pr\{Y = i\}\; \Pr\left\{U \le \dfrac{p_i}{c\, q_i}\right\}}{\Pr\{\text{Acceptance}\}}$

By independence, and since U is uniformly distributed on (0,1):

$\Pr\{Y = i\}\; \Pr\left\{U \le \dfrac{p_i}{c\, q_i}\right\} = q_i\, \dfrac{p_i}{c\, q_i} = \dfrac{p_i}{c}$

Summing over all i yields:

$1 = \sum_i \Pr\{X = i\} = \dfrac{\sum_i p_i}{c\, \Pr\{\text{Acceptance}\}} \;\Rightarrow\; \Pr\{\text{Acceptance}\} = \dfrac{1}{c} \le 1$

$\Rightarrow\; \Pr\{X = i\} = p_i$

q.e.d.
65
SOLO   Review of Probability

Generating Discrete Random Variables

The Acceptance-Rejection Technique (continue - 2)

Example

Generate a truncated Gaussian using the Accept-Reject method. Consider the case

$p(x) \approx \begin{cases} \exp(-x^2/2)/\sqrt{2\pi} & x \in [-4, 4] \\ 0 & \text{otherwise} \end{cases}$

Consider the uniform proposal function

$q(x) = \begin{cases} 1/8 & x \in [-4, 4] \\ 0 & \text{otherwise} \end{cases}$

In the figure we can see the results of the Accept-Reject method using N = 10,000 samples.
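The same example as a continuous-case sketch: p is the truncated Gaussian, q the uniform density 1/8 on [-4, 4], and c can be taken as the ratio of the density maxima:

```python
import math, random

def p(x):                                    # (unnormalized) truncated Gaussian on [-4, 4]
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi) if -4 <= x <= 4 else 0.0

q_val = 1.0 / 8.0                            # uniform proposal density on [-4, 4]
c = p(0.0) / q_val                           # max of p/q, attained at x = 0

def draw_truncated_gaussian():
    while True:
        y = random.uniform(-4.0, 4.0)        # Y ~ q
        if random.random() < p(y) / (c * q_val):
            return y

samples = [draw_truncated_gaussian() for _ in range(10_000)]   # N = 10,000 as in the slide
print(min(samples), max(samples))            # all samples lie in [-4, 4]
```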
66
SOLO   Review of Probability

Generating Continuous Random Variables

The Inverse Transform Algorithm

Let U be a uniform (0,1) random variable. For any continuous distribution function F, the random variable X defined by

$X = F^{-1}(U)$

has distribution F. [ $F^{-1}(u)$ is defined to be that value of x such that F(x) = u. ]

Proof

Let $P_X(x)$ denote the Probability Distribution Function of $X = F^{-1}(U)$:

$P_X(x) = \Pr\left(X \le x\right) = \Pr\left(F^{-1}(U) \le x\right)$

Since F is a distribution function, F(x) is a monotonic increasing function of x, and the inequality "$a \le b$" is equivalent to the inequality "$F(a) \le F(b)$"; therefore

$P_X(x) = \Pr\left(F\left(F^{-1}(U)\right) \le F(x)\right) = \Pr\left(U \le F(x)\right) \overset{U\ \text{uniform}\ (0,1),\; 0 \le F(x) \le 1}{=} F(x)$
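A standard sketch of the algorithm: for the exponential distribution $F(x) = 1 - \exp(-\lambda x)$, the inverse is $F^{-1}(u) = -\ln(1-u)/\lambda$:

```python
import math, random

def exponential_inverse_transform(lam):
    """X = F^{-1}(U) with F(x) = 1 - exp(-lam*x)  =>  X = -ln(1-U)/lam."""
    u = random.random()
    return -math.log(1.0 - u) / lam

samples = [exponential_inverse_transform(2.0) for _ in range(100_000)]
print(sum(samples) / len(samples))     # ~ 1/lam = 0.5 (mean of the exponential)
```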
67
SOLO   Review of Probability

Importance Sampling

Let $Y = (Y_1, \dots, Y_m)$ be a vector of random variables having a joint probability density function $f(y_1, \dots, y_m)$, and suppose that we are interested in estimating

$\theta = E_f\left[h(Y_1, \dots, Y_m)\right] = \int h(y_1, \dots, y_m)\, f(y_1, \dots, y_m)\, dy_1 \cdots dy_m$

Suppose that a direct generation of the random vector Y so as to compute h(Y) is inefficient, possibly because (a) it is difficult to generate the random vector Y, or (b) the variance of h(Y) is large, or (c) both of the above.

Suppose that $W = (W_1, \dots, W_m)$ is another random vector, which takes values in the same domain as Y, and has a joint density function $g(w_1, \dots, w_m)$ from which samples can be easily generated. The estimate θ can be expressed as:

$\theta = E_f\left[h(Y_1, \dots, Y_m)\right] = \int h(w_1, \dots, w_m)\, \dfrac{f(w_1, \dots, w_m)}{g(w_1, \dots, w_m)}\, g(w_1, \dots, w_m)\, dw_1 \cdots dw_m = E_g\left[\dfrac{h(W)\, f(W)}{g(W)}\right]$

Therefore, we can estimate θ by generating values of the random vector W, and then using as the estimator the resulting average of the values $h(W)\, f(W)/g(W)$.

Return to Particle Filters
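A minimal importance-sampling sketch (NumPy and SciPy assumed): estimating θ = E_f[h(Y)] for a rare-event-style h by sampling from a shifted proposal g instead of f:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
N = 200_000
h = lambda y: (y > 4.0).astype(float)        # rare event under f = N(0,1)

w = rng.normal(4.0, 1.0, size=N)             # W ~ g = N(4,1): easy to generate here
weights = norm.pdf(w, 0, 1) / norm.pdf(w, 4, 1)    # f(W)/g(W)
theta_is = (h(w) * weights).mean()           # average of h(W) f(W)/g(W)

print(theta_is)                              # ~ 3.17e-5 = Pr(Y > 4) under N(0,1)
```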
68
SOLO   Review of Probability

Monte Carlo Integration

The Monte Carlo Method can be used to numerically evaluate multidimensional integrals

$I = \int g(x_1, \dots, x_m)\, dx_1 \cdots dx_m = \int g(\vec x)\, d\vec x$

To use Monte Carlo we factorize

$g(\vec x) = f(\vec x) \cdot p(\vec x)$

in such a way that $p(\vec x)$ is interpreted as a Probability Density Function:

$p(\vec x) \ge 0 \quad \& \quad \int p(\vec x)\, d\vec x = 1$

We assume that we can draw $N_S$ samples $\vec x^{\,i}$, $i = 1, \dots, N_S$, from $p(\vec x)$:

$\vec x^{\,i} \sim p(\vec x), \quad i = 1, \dots, N_S$

Using Monte Carlo we can approximate

$p(\vec x) \approx \dfrac{1}{N_S} \sum_{i=1}^{N_S} \delta(\vec x - \vec x^{\,i})$

$I = \int f(\vec x)\, p(\vec x)\, d\vec x \approx I_{N_S} = \int f(\vec x) \cdot \dfrac{1}{N_S} \sum_{i=1}^{N_S} \delta(\vec x - \vec x^{\,i})\, d\vec x = \dfrac{1}{N_S} \sum_{i=1}^{N_S} f(\vec x^{\,i})$
69
SOLO   Review of Probability

Monte Carlo Integration

We draw $N_S$ samples $\vec x^{\,i} \sim p(\vec x)$, $i = 1, \dots, N_S$, and form

$I = \int f(\vec x)\, p(\vec x)\, d\vec x \approx I_{N_S} = \dfrac{1}{N_S} \sum_{i=1}^{N_S} f(\vec x^{\,i})$

If the samples $\vec x^{\,i}$ are independent, then $I_{N_S}$ is an unbiased estimate of I.

According to the Law of Large Numbers, $I_{N_S}$ will almost surely converge to I:

$I_{N_S} \xrightarrow{a.s.,\; N_S \to \infty} I$

If the variance of $f(\vec x)$ is finite, i.e.

$\sigma_f^2 := \int \left[f(\vec x) - I\right]^2 p(\vec x)\, d\vec x < \infty$

then the Central Limit Theorem holds, and the estimation error converges in distribution to a Normal Distribution:

$\lim_{N_S \to \infty} \sqrt{N_S}\, (I_{N_S} - I) \sim \mathcal{N}(0, \sigma_f^2)$

The error of the MC estimate, $e = I_{N_S} - I$, is of order $O(N_S^{-1/2})$, meaning that the rate of convergence of the estimate is independent of the dimension of the integrand.

Numerical Integration of $p(z_k | x_k)$ and $p(x_k | x_{k-1})$

Return to Particle Filters
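A sketch of the factorization g = f·p for a 2-D integral, with p chosen as the uniform density on the integration box (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
N_S = 1_000_000

# I = int_0^1 int_0^1 (x1^2 + x2^2) dx1 dx2 = 2/3.
# Factorize g = f*p with p = uniform pdf on the unit square (p = 1), so f = g.
x = rng.uniform(0.0, 1.0, size=(N_S, 2))          # x^i ~ p(x)
f = (x ** 2).sum(axis=1)                          # f(x^i)

I_NS = f.mean()                                   # (1/N_S) sum f(x^i)
err = f.std() / np.sqrt(N_S)                      # O(N_S^{-1/2}), dimension-independent
print(I_NS, err)                                  # ~ 0.6667 +/- 4e-4
```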
70
SOLO   Review of Probability

Existence Theorems

Existence Theorem 3

Given a function S(ω) = S(-ω) or, equivalently, a positive-defined function R(τ) (R(τ) = R(-τ), and R(0) = max R(τ) for all τ), we can find a stochastic process x(t) having S(ω) as its power spectrum or R(τ) as its autocorrelation.

Proof of Existence Theorem 3

Define

$a^2 := \dfrac{1}{\pi} \int_{-\infty}^{+\infty} S(\omega)\, d\omega \quad \& \quad f(\omega) := \dfrac{S(\omega)}{\pi\, a^2} = f(-\omega)$

Since $f(\omega) \ge 0$ and $\int_{-\infty}^{+\infty} f(\omega)\, d\omega = 1$, according to Existence Theorem 1 we can find a random variable ω with the even density function f(ω), and probability distribution function

$P(\omega) := \int_{-\infty}^{\omega} f(\tau)\, d\tau$

We now form the process $x(t) := a \cos(\omega t + \vartheta)$, where $\vartheta$ is a random variable uniformly distributed in the interval $(-\pi, +\pi)$ and independent of ω.
71
SOLO   Review of Probability

Existence Theorems

Existence Theorem 3

Proof of Existence Theorem 3 (continue - 1)

Since $\vartheta$ is uniformly distributed in the interval $(-\pi, +\pi)$ and independent of ω:

$E\{x(t)\} = a\, E\{\cos\omega t\}\, \underbrace{E\{\cos\vartheta\}}_{0} - a\, E\{\sin\omega t\}\, \underbrace{E\{\sin\vartheta\}}_{0} = 0$

using

$E\left\{e^{j\varpi\vartheta}\right\} = \dfrac{1}{2\pi} \int_{-\pi}^{+\pi} e^{j\varpi\vartheta}\, d\vartheta = \dfrac{e^{j\varpi\pi} - e^{-j\varpi\pi}}{2\pi j\,\varpi} = \dfrac{\sin\varpi\pi}{\varpi\pi}$

i.e., $E\{\cos\varpi\vartheta\} + j\, E\{\sin\varpi\vartheta\} = \dfrac{\sin\varpi\pi}{\varpi\pi} = 0$ for $\varpi = 1$ and $\varpi = 2$.

Similarly:

$E\{x(t)\, x(t+\tau)\} = a^2\, E\left\{\cos(\omega t + \vartheta)\, \cos\left(\omega(t+\tau) + \vartheta\right)\right\} = \dfrac{a^2}{2}\, E\{\cos\omega\tau\} + \dfrac{a^2}{2}\, \underbrace{E\left\{\cos\left(\omega(2t+\tau) + 2\vartheta\right)\right\}}_{0} = \dfrac{a^2}{2}\, E\{\cos\omega\tau\}$
72
SOLO   Review of Probability

Existence Theorems

Existence Theorem 3

Proof of Existence Theorem 3 (continue - 2)

We have $x(t) := a \cos(\omega t + \vartheta)$ with

$E\{x(t)\} = 0$

$E\{x(t)\, x(t+\tau)\} = \dfrac{a^2}{2}\, E\{\cos\omega\tau\} = \dfrac{a^2}{2} \int_{-\infty}^{+\infty} \cos(\omega\tau)\, f(\omega)\, d\omega = R_x(\tau)$

Because of those two properties, x(t) is wide-sense stationary, with a power spectrum given by the Fourier transform of the autocorrelation:

$S_x(\omega) = \int_{-\infty}^{+\infty} R_x(\tau)\left[\cos\omega\tau - j\sin\omega\tau\right] d\tau \overset{R_x(\tau) = R_x(-\tau)}{=} \int_{-\infty}^{+\infty} R_x(\tau)\, \cos(\omega\tau)\, d\tau$

$R_x(\tau) = \dfrac{1}{2\pi} \int_{-\infty}^{+\infty} S_x(\omega)\left[\cos\omega\tau + j\sin\omega\tau\right] d\omega \overset{S_x(\omega) = S_x(-\omega)}{=} \dfrac{1}{2\pi} \int_{-\infty}^{+\infty} S_x(\omega)\, \cos(\omega\tau)\, d\omega$

Therefore, by the definition of f(ω):

$S_x(\omega) = \pi\, a^2 f(\omega) = S(\omega)$

q.e.d.
73
SOLO   Recursive Bayesian Estimation

Markov Processes

Andrei Andreevich Markov 1856 - 1922

A Markov Process is defined by:

$p\left[x(t), \Omega \,|\, x(\tau), \Omega;\; \tau \le t_1\right] = p\left[x(t), \Omega \,|\, x(t_1), \Omega\right] \quad \forall t > t_1$

i.e., for the Random Process, the past up to any time $t_1$ is fully summarized by the process value at $t_1$.

Examples of Markov Processes:

1. Continuous Dynamic System:
$\dot x(t) = f(t, x, u, w)$
$z(t) = h(t, x, u, v)$

2. Discrete Dynamic System:
$x_k = f_{k-1}(t_{k-1}, x_{k-1}, u_{k-1}, w_{k-1})$
$z_k = h_k(t_k, x_k, u_k, v_k)$

where
x - state space vector (n x 1)
u - input vector (m x 1)
z - measurement vector (p x 1)
w - white input noise vector (n x 1)
v - white measurement noise vector (p x 1)
74
SOLO   Recursive Bayesian Estimation

Markov Processes

Markov Process: the present discrete state probability depends only on the previous state:

$p(x_k \,|\, x_{k-1}, x_{k-2}, \dots, x_0) = p(x_k \,|\, x_{k-1})$

Using this property we obtain:

$p(x_k, x_{k-1}, \dots, x_1, x_0) = \underbrace{p(x_k \,|\, x_{k-1}, \dots, x_0)}_{p(x_k | x_{k-1})}\, p(x_{k-1}, \dots, x_0) = p(x_k \,|\, x_{k-1})\, \underbrace{p(x_{k-1} \,|\, x_{k-2}, \dots, x_0)}_{p(x_{k-1} | x_{k-2})}\, p(x_{k-2}, \dots, x_0) = \dots = p(x_0) \prod_{i=1}^{k} p(x_i \,|\, x_{i-1})$

The Markov Process is defined if we know $p(x_0)$ and $p(x_i \,|\, x_{i-1})$ for each i.

Table of Content
75
SOLO   Recursive Bayesian Estimation

Markov Processes

In a Markovian system the probability of the current true state depends only on the previous state, and is independent of the other, earlier states:

$p(x_k \,|\, x_{k-1}, x_{k-2}, \dots, x_0) = p(x_k \,|\, x_{k-1})$

Similarly, the measurement at the k-th time-step depends only upon the current true state, so it is conditionally independent of all earlier states, given the current state:

$p(z_k \,|\, x_k, x_{k-1}, \dots, x_0) = p(z_k \,|\, x_k)$

$p(z_k, x_k) = p(z_k \,|\, x_k)\, p(x_k) = p(x_k \,|\, z_k)\, p(z_k)$

From the definition of the Markovian system (see Figure), $p(x_k \,|\, x_{k-1})$ is defined by f and the statistics of x and w, and $p(z_k \,|\, x_k)$ is defined by h and the statistics of x and v.

[Figure: hidden Markov model - hidden states x_0, x_1, ..., x_{k-1}, x_k propagated by f_{k-1}(x_{k-1}, u_{k-1}, w_{k-1}); measurements z_1, z_2, ..., z_{k-1}, z_k generated by h_k(x_k, v_k)]
76
SOLO   Recursive Bayesian Estimation

Markov Processes

Analytic Computation of $p(x_k \,|\, x_{k-1})$ and $p(z_k \,|\, x_k)$

$x_k = f_{k-1}(x_{k-1}, u_{k-1}, w_{k-1})$, given $p_w(w_{k-1})$ and $p_{x_0}(x_0)$
$z_k = h_k(x_k, v_k)$, given $p_v(v_k)$

Suppose that for given $x_{k-1}$, $u_{k-1}$, $x_k$ we can obtain all $w_{k-1}^j$, $j = 1, \dots, N_{x_k}$, such that $x_k = f_{k-1}(x_{k-1}, u_{k-1}, w_{k-1}^j)$. Then, by the fundamental theorem for functions of a random variable,

$p(x_k \,|\, x_{k-1}) = \sum_{j=1}^{N_{x_k}} p_w(w_{k-1}^j)\; \left|\nabla_w f_{k-1}(x_{k-1}, u_{k-1}, w_{k-1}^j)\right|^{-1}$

which follows from

$p(x_k \,|\, x_{k-1})\, dx_k = \Pr\left\{x_k \le X_k \le x_k + dx_k \,|\, x_{k-1}\right\} = \sum_{j=1}^{N_{x_k}} p_w(w_{k-1}^j)\, |dw_{k-1}^j|$

In the same way, suppose that we can obtain all $v_k^j = h_k^{-1}(z_k, x_k)$, $j = 1, \dots, N_{z_k}$, such that $z_k = h_k(x_k, v_k^j)$. Then

$p(z_k \,|\, x_k) = \sum_{j=1}^{N_{z_k}} p_v(v_k^j)\; \left|\nabla_v h_k(x_k, v_k^j)\right|^{-1}$

This is a Conceptual, Not a Practical, Procedure.
77
SOLO   Recursive Bayesian Estimation

Markov Processes

Analytic Computation of $p(x_k \,|\, x_{k-1})$ and $p(z_k \,|\, x_k)$ (continue - 1)

For additive noise:

$x_k = f_{k-1}(x_{k-1}, u_{k-1}) + w_{k-1}$, given $p_w(w_{k-1})$ and $p_{x_0}(x_0)$
$z_k = h_k(x_k) + v_k$, given $p_v(v_k)$

we have

$w_{k-1} = x_k - f_{k-1}(x_{k-1}, u_{k-1}), \qquad v_k = z_k - h_k(x_k)$

therefore

$p(x_k \,|\, x_{k-1}) = p_w\left[x_k - f_{k-1}(x_{k-1}, u_{k-1})\right]$

and

$p(z_k \,|\, x_k) = p_v\left[z_k - h_k(x_k)\right]$
78
SOLO   Recursive Bayesian Estimation

Markov Processes

Numerical Computation of $p(x_k \,|\, x_{k-1})$ and $p(z_k \,|\, x_k)$

$x_k = f_{k-1}(x_{k-1}, w_{k-1})$
$z_k = h_k(x_k, v_k)$

$w_{k-1}$ & $v_k$ are system and measurement white-noise sequences, independent of past and current states and of each other, with known PDFs $p(w_{k-1})$ & $p(v_k)$.

We want to compute $p(x_k \,|\, Z_{1:k})$ recursively, assuming knowledge of $p(x_{k-1} \,|\, Z_{1:k-1})$, in two stages, prediction (before measurement) and update (after measurement):

1. Prediction (before measurement):

$p(x_k \,|\, Z_{1:k-1}) = \int p(x_k \,|\, x_{k-1})\; p(x_{k-1} \,|\, Z_{1:k-1})\; dx_{k-1}$

2. Update (after measurement):

$p(x_k \,|\, Z_{1:k}) = p(x_k \,|\, z_k, Z_{1:k-1}) \overset{\text{Bayes}}{=} \dfrac{p(z_k \,|\, x_k)\; p(x_k \,|\, Z_{1:k-1})}{p(z_k \,|\, Z_{1:k-1})} = \dfrac{p(z_k \,|\, x_k)\; p(x_k \,|\, Z_{1:k-1})}{\int p(z_k \,|\, x_k)\; p(x_k \,|\, Z_{1:k-1})\; dx_k}$

We need to evaluate the following integrals:

$p(x_k \,|\, x_{k-1}) = \int \delta\left[x_k - f_{k-1}(x_{k-1}, w_{k-1})\right] p(w_{k-1})\; dw_{k-1}$

$p(z_k \,|\, x_k) = \int \delta\left[z_k - h_k(x_k, v_k)\right] p(v_k)\; dv_k$

Analytic solutions for those integral equations do not exist in the general case. We use the numeric Monte Carlo Method to evaluate the integrals.

Generate (draw): $w_{k-1}^i \sim p(w_{k-1})$ & $v_k^i \sim p(v_k)$, $i = 1, \dots, N_S$. Then

$p(x_k \,|\, x_{k-1}) \approx \dfrac{1}{N_S} \sum_{i=1}^{N_S} \delta\left[x_k - f_{k-1}(x_{k-1}, w_{k-1}^i)\right]$, or, with $x_k^i = f_{k-1}(x_{k-1}, w_{k-1}^i)$: $p(x_k \,|\, x_{k-1}) \approx \dfrac{1}{N_S} \sum_{i=1}^{N_S} \delta(x_k - x_k^i)$

$p(z_k \,|\, x_k) \approx \dfrac{1}{N_S} \sum_{i=1}^{N_S} \delta\left[z_k - h_k(x_k, v_k^i)\right]$, or, with $z_k^i = h_k(x_k, v_k^i)$: $p(z_k \,|\, x_k) \approx \dfrac{1}{N_S} \sum_{i=1}^{N_S} \delta(z_k - z_k^i)$
Recursive Bayesian EstimationSOLO
( ) ( ) ( )( ) ( )kvkkk
xkkwkkkk
vpgivenvxhz
xpuwpgivenwuxfx
:,
,,:,, 011111 0
=
= −−−−−
Markov ProcessesMonte Carlo Computations of and . ( )kk xzp |( )1| −kk xxp
Generate (Draw) ( ) Sxi Nixpx ,,1~ 00 0
=For ∞∈ ,,1 k
Initialization0
1 At stage k-1
Generate (Draw) NS samples ( ) Skwik Niwpw ,,1~ 11 =−−
2 State Update ( ) Sikk
ik
ik Niwuxfx ,,1,, 111 == −−−
3 Generate (Draw) Measurement Noise ( ) Skvik Nivpv ,,1~ =
k:=k+1 & return to 1
Compute Histograms of to obtain ( )kk xzp |
kk xz |
( ) ( )∑=
− −≈SN
iS
ikkkk Nxxxxp
11 /| δ
( ) ( )∑=
−≈SN
iS
ikkkk Nzzxzp
1
/| δ
Compute Histograms of to obtain
1| −kk xx( )1| −kk xxp
4 Measurement , Update ( ) Sik
ik
ik Nivxhz ,,1, ==kz
SOLO   Stochastic Processes

Stochastic Processes deal with systems corrupted by noise. A description of those processes is given in the "Stochastic Processes" Presentation; here we give only one aspect of those processes.

A continuous dynamic system is described by:

$d\,x(t) = f(x, t)\, d t + d\,w(t), \quad t \in [t_0, t_f]$

$x(t)$ - n-dimensional state vector
$d\,w(t)$ - n-dimensional process noise vector

Assuming system measurements at discrete times $t_k$, given by:

$z_k = h\left[x(t_k), t_k, v_k\right], \quad t_k \in [t_0, t_f]$

$v_k$ - m-dimensional measurement noise vector at $t_k$

we are interested in the probability of the state at time t given the set of discrete measurements up to and including time $t_k < t$:

$p(x, t \,|\, Z_k)$, where $Z_k = \{z_1, z_2, \dots, z_k\}$ is the set of all measurements up to and including time $t_k$.

The time evolution of the probability density function is described by the Fokker-Planck equation.

[Figure: a solution of the one-dimensional Fokker-Planck equation, with both drift and diffusion terms; the initial condition is a Dirac delta function at x = 1, and the distribution drifts towards x = 0]

SOLO   Stochastic Processes

Fokker - Planck Equation

Adriaan Fokker (1887 - 1972), "Die mittlere Energie rotierender elektrischer Dipole im Strahlungsfeld", Annalen der Physik 43 (1914), 810-820
Max Planck (1858 - 1947), "Über einen Satz der statistischen Dynamik und seine Erweiterung in der Quantentheorie", Sitzungsberichte der Preussischen Akademie der Wissenschaften (1917), 324-341

The Fokker-Planck equation describes the time evolution of the probability density function of the position of a particle, and can be generalized to other observables as well. It is named after Adriaan Fokker and Max Planck and is also known as the Kolmogorov forward equation. The first use of the Fokker-Planck equation was the statistical description of Brownian motion of a particle in a fluid. In one spatial dimension x, the Fokker-Planck equation for a process with drift $D_1(x, t)$ and diffusion $D_2(x, t)$ is

$\dfrac{\partial}{\partial t} f(x, t) = -\dfrac{\partial}{\partial x}\left[D_1(x, t)\, f(x, t)\right] + \dfrac{\partial^2}{\partial x^2}\left[D_2(x, t)\, f(x, t)\right]$

More generally, the time-dependent probability distribution may depend on a set of N macrovariables $x_i$. The general form of the Fokker-Planck equation is then

$\dfrac{\partial f}{\partial t} = -\sum_{i=1}^{N} \dfrac{\partial}{\partial x_i}\left[D_i^1(x_1, \dots, x_N)\, f\right] + \sum_{i=1}^{N}\sum_{j=1}^{N} \dfrac{\partial^2}{\partial x_i\, \partial x_j}\left[D_{ij}^2(x_1, \dots, x_N)\, f\right]$

where $D^1$ is the drift vector and $D^2$ the diffusion tensor; the latter results from the presence of the stochastic force.
Fokker - Planck Equation (continue - 1)

SOLO   Stochastic Processes

The Fokker-Planck equation can be used for computing the probability densities of stochastic differential equations. Consider the Itô stochastic differential equation:

$d X_t = \mu(X_t, t)\, dt + \sigma(X_t, t)\, d W_t$

where $X_t \in \mathbb{R}^N$ is the state and $W_t$ is a standard M-dimensional Wiener process. If the initial probability distribution is $f(x, 0) = f_0(x)$, then the probability distribution of the state is given by the Fokker-Planck equation

$\dfrac{\partial}{\partial t} f(x, t) = -\dfrac{\partial}{\partial x}\left[D_1(x, t)\, f(x, t)\right] + \dfrac{\partial^2}{\partial x^2}\left[D_2(x, t)\, f(x, t)\right]$

with the drift and diffusion terms

$D_i^1(x, t) = \mu_i(x, t), \qquad D_{ij}^2(x, t) = \dfrac{1}{2} \sum_{k} \sigma_{ik}(x, t)\, \sigma_{jk}(x, t)$

Similarly, a Fokker-Planck equation can be derived for Stratonovich stochastic differential equations. In this case, noise-induced drift terms appear if the noise strength is state-dependent.
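A numeric sketch of the 1-D Fokker-Planck equation with drift D1(x) = -x (a pull toward 0) and constant diffusion D2, integrated with an explicit finite-difference scheme (NumPy assumed; the drift and diffusion values are example choices, and stability requires a small time step):

```python
import numpy as np

x = np.linspace(-5, 5, 501)
dx = x[1] - x[0]
dt = 1e-4                                 # explicit scheme: dt <~ dx^2/(2*D2)
D1 = -x                                   # drift D1(x) = -x (assumed example)
D2 = 0.5                                  # constant diffusion (assumed example)

f = np.exp(-(x - 1.0) ** 2 / 1e-2)        # narrow pulse near x = 1 (approx. delta)
f /= f.sum() * dx

for _ in range(20_000):                   # df/dt = -d(D1 f)/dx + d^2(D2 f)/dx^2
    drift = -np.gradient(D1 * f, dx)
    diffusion = D2 * np.gradient(np.gradient(f, dx), dx)
    f += dt * (drift + diffusion)
print((x * f).sum() * dx)                 # the mean drifts from 1 toward 0
```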
Fokker - Planck Equation (continue - 2)

Derivation of the Fokker-Planck Equation

SOLO   Stochastic Processes

Start with

$p_{x_k, x_{k-1}}(x_k, x_{k-1}) = p_{x_k|x_{k-1}}(x_k \,|\, x_{k-1})\; p_{x_{k-1}}(x_{k-1})$

and

$p_{x_k}(x_k) = \int_{-\infty}^{+\infty} p_{x_k, x_{k-1}}(x_k, x_{k-1})\; d x_{k-1} = \int_{-\infty}^{+\infty} p_{x_k|x_{k-1}}(x_k \,|\, x_{k-1})\; p_{x_{k-1}}(x_{k-1})\; d x_{k-1}$

Define $t_k = t$, $x_k = x(t)$ and $t_{k-1} = t - \Delta t$, $x_{k-1} = x(t - \Delta t)$, so that (Chapman-Kolmogorov Equation):

$p_{x(t)}[x(t)] = \int_{-\infty}^{+\infty} p_{x(t)|x(t-\Delta t)}\left[x(t) \,|\, x(t-\Delta t)\right]\; p_{x(t-\Delta t)}[x(t-\Delta t)]\; d x(t-\Delta t)$

Let us use the Characteristic Function of the transition density $p_{x(t)|x(t-\Delta t)}$:

$\Phi_{\Delta x}(s) := \int_{-\infty}^{+\infty} \exp\left\{-s\left[x(t) - x(t-\Delta t)\right]\right\}\; p_{x(t)|x(t-\Delta t)}\left[x(t) \,|\, x(t-\Delta t)\right]\; d x(t) = E\left\{\exp\left(-s\left[x(t) - x(t-\Delta t)\right]\right)\right\}$

The inverse transform is

$p_{x(t)|x(t-\Delta t)}\left[x(t) \,|\, x(t-\Delta t)\right] = \dfrac{1}{2\pi j} \int_{-j\infty}^{+j\infty} \exp\left\{s\left[x(t) - x(t-\Delta t)\right]\right\}\; \Phi_{\Delta x}(s)\; d s$

Using the Chapman-Kolmogorov Equation we obtain:

$p_{x(t)}[x(t)] = \dfrac{1}{2\pi j} \int_{-\infty}^{+\infty} \int_{-j\infty}^{+j\infty} \exp\left\{s\left[x(t) - x(t-\Delta t)\right]\right\}\; \Phi_{\Delta x}(s)\; d s\;\; p_{x(t-\Delta t)}[x(t-\Delta t)]\; d x(t-\Delta t)$

Stochastic Processes
Fokker - Planck Equation (continue - 3)

Derivation of the Fokker-Planck Equation (continue - 1)

SOLO   Stochastic Processes

The Characteristic Function can be expressed in terms of the moments about x(t - Δt) as:

$\Phi_{\Delta x}(s) = 1 + \sum_{i=1}^{\infty} \dfrac{(-s)^i}{i!}\; E\left\{\left[x(t) - x(t-\Delta t)\right]^i \,\Big|\; x(t-\Delta t)\right\}$

Therefore

$p_{x(t)}[x(t)] = \dfrac{1}{2\pi j} \int_{-\infty}^{+\infty} \int_{-j\infty}^{+j\infty} \exp\left\{s\left[x(t) - x(t-\Delta t)\right]\right\} \left[1 + \sum_{i=1}^{\infty} \dfrac{(-s)^i}{i!}\; E\left\{\left[x(t) - x(t-\Delta t)\right]^i \,\Big|\; x(t-\Delta t)\right\}\right] d s\;\; p_{x(t-\Delta t)}[x(t-\Delta t)]\; d x(t-\Delta t)$

Use the fact that

$\dfrac{1}{2\pi j} \int_{-j\infty}^{+j\infty} (-s)^i \exp\left\{s\left[x(t) - x(t-\Delta t)\right]\right\} d s = (-1)^i\; \dfrac{\partial^i\, \delta\left[x(t) - x(t-\Delta t)\right]}{\partial\, x(t)^i}, \quad i = 0, 1, 2, \dots$

where δ[u] is the Dirac delta function:

$\delta[u] = \dfrac{1}{2\pi j} \int_{-j\infty}^{+j\infty} \exp(s\, u)\; d s, \qquad \int_{-\infty}^{+\infty} F(u)\, \delta[u]\; d u = F(0^-) = F(0^+) = F(0) \quad \forall F$ s.t. $F(0^-) = F(0^+)$

to obtain:

$p_{x(t)}[x(t)] = \int_{-\infty}^{+\infty} \delta\left[x(t) - x(t-\Delta t)\right]\; p_{x(t-\Delta t)}[x(t-\Delta t)]\; d x(t-\Delta t)$
$\qquad + \sum_{i=1}^{\infty} \dfrac{(-1)^i}{i!} \int_{-\infty}^{+\infty} \dfrac{\partial^i\, \delta\left[x(t) - x(t-\Delta t)\right]}{\partial\, x(t)^i}\; E\left\{\left[x(t) - x(t-\Delta t)\right]^i \,\Big|\; x(t-\Delta t)\right\}\; p_{x(t-\Delta t)}[x(t-\Delta t)]\; d x(t-\Delta t)$

Stochastic Processes
Fokker - Planck Equation (continue - 4)

Derivation of the Fokker-Planck Equation (continue - 2)

SOLO   Stochastic Processes

Useful results related to integrals involving the Delta (Dirac) function:

$\delta[u - a] = \dfrac{1}{2\pi j} \int_{-j\infty}^{+j\infty} \exp\left[s (u - a)\right] d s, \qquad \int_{-\infty}^{+\infty} f(u)\, \delta[u - a]\; d u = f(a^-) = f(a^+) = f(a)$

$\int_{-\infty}^{+\infty} f(u)\, \dfrac{d\,\delta[u - a]}{d\,u}\; d u = -\dfrac{d\,f(u)}{d\,u}\bigg|_{u=a}$

and, in general,

$\int_{-\infty}^{+\infty} f(u)\, \dfrac{d^i\,\delta[u - a]}{d\,u^i}\; d u = (-1)^i\, \dfrac{d^i f(u)}{d\,u^i}\bigg|_{u=a}$

These follow by writing δ as the inverse transform $\dfrac{1}{2\pi j}\int_{-j\infty}^{+j\infty} \exp\left[s(u - a)\right] d s$, exchanging the order of integration, and integrating by parts.

Stochastic Processes
Fokker - Planck Equation (continue - 5)

Derivation of the Fokker-Planck Equation (continue - 3)

SOLO   Stochastic Processes

We found

$\int_{-\infty}^{+\infty} \delta\left[x(t) - x(t-\Delta t)\right]\; p_{x(t-\Delta t)}[x(t-\Delta t)]\; d x(t-\Delta t) = p_{x(t-\Delta t)}[x(t)]$

and, using the delta-function identities of the previous slide,

$\int_{-\infty}^{+\infty} \dfrac{\partial^i\, \delta\left[x(t) - x(t-\Delta t)\right]}{\partial\, x(t)^i}\; E\left\{\left[x(t) - x(t-\Delta t)\right]^i \,\Big|\; x(t-\Delta t)\right\}\; p_{x(t-\Delta t)}[x(t-\Delta t)]\; d x(t-\Delta t) = \dfrac{\partial^i \left( E\left\{\left[x(t) - x(t-\Delta t)\right]^i \,\Big|\; x(t)\right\}\; p_{x(t-\Delta t)}[x(t)] \right)}{\partial\, x(t)^i}$

Therefore

$p_{x(t)}[x(t)] = p_{x(t-\Delta t)}[x(t)] + \sum_{i=1}^{\infty} \dfrac{(-1)^i}{i!}\; \dfrac{\partial^i \left( E\left\{\left[x(t) - x(t-\Delta t)\right]^i \,\Big|\; x(t)\right\}\; p_{x(t-\Delta t)}[x(t)] \right)}{\partial\, x(t)^i}$

Rearranging, dividing by Δt, and taking the limit Δt → 0, we obtain:

$\lim_{\Delta t \to 0} \dfrac{p_{x(t)}[x(t)] - p_{x(t-\Delta t)}[x(t)]}{\Delta t} = \sum_{i=1}^{\infty} \dfrac{(-1)^i}{i!}\; \dfrac{\partial^i}{\partial\, x(t)^i} \left( \lim_{\Delta t \to 0} \dfrac{1}{\Delta t}\, E\left\{\left[x(t) - x(t-\Delta t)\right]^i \,\Big|\; x(t)\right\}\; p_{x(t-\Delta t)}[x(t)] \right)$

Stochastic Processes
Fokker – Planck Equation (continue – 6)
Derivation of the Fokker–Planck Equation (continue – 4)
SOLO
We found ( ) ( )[ ] ( ) ( )[ ] ( ) ( ) ( ) ( ) ( )[ ] ( ) ( ) ( )[ ]( )( )[ ]∑
∞
=
∆−∆−
→∆
∆−
→∆ ∂∆−∆−−∂
∆−=
∆−
1
|
00
|1lim
!
1lim
ii
ttxi
ttxtxi
t
ittxtx
t tx
txpttxttxtxE
tit
txptxp
Define: ( ) ( )[ ] ( ) ( ) ( ) ( )[ ] ( ) t
ttxttxtxEtxtxm
ittxtx
t
i
∆∆−∆−−
=− ∆−
→∆−
|lim: |
0
Therefore ( ) ( )[ ] ( ) ( ) ( )[ ] ( ) ( )[ ]( )( )[ ]∑
∞
=
−
∂−∂−=
∂∂
1 !
1
ii
txiii
tx
tx
txptxtxm
it
txp
( ) ( )ttxtxt
∆−=→∆−
0lim: and:
This equation is called the Stochastic Equation or Kinetic Equation.
It is a partial differential equation that we must solve, with the initial condition:
( ) ( )[ ] ( )[ ]000 0 txptxp tx ===
Stochastic Processes
Fokker – Planck Equation (continue – 7)
Derivation of the Fokker–Planck Equation (continue – 5)
SOLO
We want to find p_{x(t)}[x(t)], where x(t) is the solution of
$$\frac{dx(t)}{dt} = f(x,t) + n_g(t),\qquad t\in[t_0,t_f]$$
with n_g(t) a Wiener (Gauss) process:
$$\hat n_g := E\{n_g(t)\} = 0,\qquad E\{[n_g(t)-\hat n_g(t)][n_g(\tau)-\hat n_g(\tau)]\} = Q(t)\,\delta(t-\tau)$$
Then:
$$m_1[x(t^-)] := \lim_{\Delta t\to 0}\frac{E\{x(t)-x(t-\Delta t)\mid x(t-\Delta t)\}}{\Delta t} = E\left\{\frac{dx(t)}{dt}\,\Big|\,x(t)\right\} = f(x,t) + \underbrace{E\{n_g\}}_{0} = f(x,t)$$
$$m_2[x(t^-)] := \lim_{\Delta t\to 0}\frac{E\{[x(t)-x(t-\Delta t)]^2\mid x(t-\Delta t)\}}{\Delta t} = E\{n_g^2\mid x\} = E\{n_g^2\} = Q(t)$$
$$m_i[x(t^-)] := \lim_{\Delta t\to 0}\frac{E\{[x(t)-x(t-\Delta t)]^i\mid x(t-\Delta t)\}}{\Delta t} = 0,\qquad i>2$$
Therefore we obtain the Fokker–Planck Equation:
$$\frac{\partial p_x[x(t)]}{\partial t} = -\frac{\partial\bigl[f(x(t),t)\,p_x[x(t)]\bigr]}{\partial x(t)} + \frac{1}{2}\,Q(t)\,\frac{\partial^2 p_x[x(t)]}{\partial x(t)^2}$$
Return to Daum
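As a quick sanity check, not part of the original derivation, the scalar linear-drift case (an assumed illustrative example: f(x,t) = -a x with a > 0 and constant Q) admits a closed-form stationary solution of the Fokker–Planck equation. Setting the time derivative and the probability flux to zero:
$$0 = \frac{d}{dx}\bigl[a\,x\,p_\infty(x)\bigr] + \frac{Q}{2}\,\frac{d^2 p_\infty(x)}{dx^2}\;\Rightarrow\; a\,x\,p_\infty(x) + \frac{Q}{2}\,\frac{d p_\infty}{dx} = 0\;\Rightarrow\; p_\infty(x) = \sqrt{\frac{a}{\pi Q}}\;e^{-a x^2/Q}$$
i.e. a zero-mean Gaussian with variance Q/(2a), the well-known stationary density of the Ornstein–Uhlenbeck process.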
89
SOLO Recursive Bayesian Estimation
Bayesian Estimation Introduction
Problem: Estimate the hidden States of a Non-linear Dynamic Stochastic System from Noisy Measurements.
Given a nonlinear discrete stochastic Markovian system, we want to use the k discrete measurements Z_{1:k} = {z_1, z_2, …, z_k} to estimate the hidden state x_k. For this we want to compute the probability of x_k given all the measurements Z_{1:k}.
If we know p(x_k | Z_{1:k}) then x_k is estimated using:
$$\hat x_{k|k} := E\{x_k\mid Z_{1:k}\} = \int x_k\,p(x_k\mid Z_{1:k})\,dx_k$$
$$P_{k|k} = E\{(x_k-\hat x_{k|k})(x_k-\hat x_{k|k})^T\mid Z_{1:k}\} = \int (x_k-\hat x_{k|k})(x_k-\hat x_{k|k})^T\,p(x_k\mid Z_{1:k})\,dx_k$$
or, more generally, we can compute all moments of the probability distribution p(x_k | Z_{1:k}):
$$E\{g(x_k)\mid Z_{1:k}\} = \int g(x_k)\,p(x_k\mid Z_{1:k})\,dx_k$$
The knowledge of p(x_k | Z_{1:k}) also allows the computation of the Maximum a Posteriori (MAP) estimate using:
$$\hat x_{k|k}^{MAP} = \arg\max_{x_k}\,p(x_k\mid Z_{1:k})$$
[Figure: hidden Markov model; states x_0, x_1, …, x_{k-1}, x_k propagated by f(x_{k-1}, w_{k-1}); measurements z_1, …, z_k generated by h(x_k, v_k)]
90
SOLO Recursive Bayesian Estimation
Bayesian Estimation Introduction
To find the expression for p(x_k | Z_{1:k}) we use the theorem of joint probability (Bayes Rule):
$$p(x_k\mid Z_{1:k}) \overset{\text{Bayes}}{=} \frac{p(x_k, Z_{1:k})}{p(Z_{1:k})}$$
Since Z_{1:k} = {z_k, Z_{1:k-1}}:
$$p(x_k\mid Z_{1:k}) = \frac{p(x_k, z_k, Z_{1:k-1})}{p(z_k, Z_{1:k-1})}$$
The numerator of this expression is
$$p(x_k, z_k, Z_{1:k-1}) \overset{\text{Bayes}}{=} p(z_k\mid x_k, Z_{1:k-1})\,p(x_k\mid Z_{1:k-1})\,p(Z_{1:k-1})$$
Since the knowledge of x_k supersedes the need for Z_{1:k-1} = {z_1, z_2, …, z_{k-1}}:
$$p(z_k\mid x_k, Z_{1:k-1}) \equiv p(z_k\mid x_k)$$
and the denominator is
$$p(z_k, Z_{1:k-1}) \overset{\text{Bayes}}{=} p(z_k\mid Z_{1:k-1})\,p(Z_{1:k-1})$$
Therefore:
$$p(x_k\mid Z_{1:k}) = \frac{p(z_k\mid x_k)\,p(x_k\mid Z_{1:k-1})\,p(Z_{1:k-1})}{p(z_k\mid Z_{1:k-1})\,p(Z_{1:k-1})}$$
91
SOLO Recursive Bayesian Estimation
Bayesian Estimation Introduction
The final result is:
$$p(x_k\mid Z_{1:k}) = \frac{p(z_k\mid x_k)\,p(x_k\mid Z_{1:k-1})}{p(z_k\mid Z_{1:k-1})}$$
Since p(x_k | Z_{1:k}) is a probability distribution it must satisfy ∫ p(x_k | Z_{1:k}) dx_k = 1. Therefore:
$$1 = \int p(x_k\mid Z_{1:k})\,dx_k = \frac{\int p(z_k\mid x_k)\,p(x_k\mid Z_{1:k-1})\,dx_k}{p(z_k\mid Z_{1:k-1})} \;\Rightarrow\; p(z_k\mid Z_{1:k-1}) = \int p(z_k\mid x_k)\,p(x_k\mid Z_{1:k-1})\,dx_k$$
and:
$$p(x_k\mid Z_{1:k}) = \frac{p(z_k\mid x_k)\,p(x_k\mid Z_{1:k-1})}{\int p(z_k\mid x_k)\,p(x_k\mid Z_{1:k-1})\,dx_k}$$
This is a recursive relation that needs the value of p(x_k | Z_{1:k-1}), assuming that p(z_k | x_k) is obtained from the Markovian system definition (z_k = h(x_k, v_k)).
[Figure: hidden Markov model; hidden states x_0, …, x_k and measurements z_1, …, z_k]
92
SOLO Recursive Bayesian Estimation
Bayesian Estimation Introduction
The Correction Step is:
$$p(x_k\mid Z_{1:k}) = \frac{p(z_k\mid x_k)\,p(x_k\mid Z_{1:k-1})}{p(z_k\mid Z_{1:k-1})}$$
or:
$$\text{posterior} = \frac{\text{likelihood}\cdot\text{prior}}{\text{evidence}}$$
prior: given by the prediction equation, p(x_k | Z_{1:k-1})
likelihood: given by the observation model, p(z_k | x_k)
evidence: the normalizing constant in the denominator,
$$p(z_k\mid Z_{1:k-1}) = \int p(z_k\mid x_k)\,p(x_k\mid Z_{1:k-1})\,dx_k$$
93
SOLO Recursive Bayesian Estimation
Bayesian Estimation Introduction
Chapman – Kolmogorov Equation
Using:
$$p(x_k, x_{k-1}\mid Z_{1:k-1}) \overset{\text{Bayes}}{=} p(x_k\mid x_{k-1}, Z_{1:k-1})\,p(x_{k-1}\mid Z_{1:k-1})$$
and, since for a Markov Process the knowledge of x_{k-1} supersedes the need for Z_{1:k-1} = {z_1, z_2, …, z_{k-1}}:
$$p(x_k\mid x_{k-1}, Z_{1:k-1}) = p(x_k\mid x_{k-1})$$
we obtain:
$$p(x_k\mid Z_{1:k-1}) = \int p(x_k, x_{k-1}\mid Z_{1:k-1})\,dx_{k-1} = \int p(x_k\mid x_{k-1})\,p(x_{k-1}\mid Z_{1:k-1})\,dx_{k-1}$$
Sydney Chapman (1888 – 1970), Andrey Nikolaevich Kolmogorov (1903 – 1987)
[Figure: hidden Markov model; hidden states and measurements]
94
SOLO Recursive Bayesian Estimation
Bayesian Estimation Introduction - Summary
0 Initialize with p(x_0)
At stage k:
1 Prediction phase (before the z_k measurement): using p(x_{k-1} | Z_{1:k-1}) from time-step k-1 and p(x_k | x_{k-1}) of the Markov system, compute:
$$p(x_k\mid Z_{1:k-1}) = \int p(x_k\mid x_{k-1})\,p(x_{k-1}\mid Z_{1:k-1})\,dx_{k-1},\qquad \hat x_{k|k-1} = f(\hat x_{k-1|k-1})$$
2 Correction Step (after the z_k measurement): using p(x_k | Z_{1:k-1}) from the Prediction phase and p(z_k | x_k) of the Markov system, compute:
$$p(x_k\mid Z_{1:k}) = \frac{p(z_k\mid x_k)\,p(x_k\mid Z_{1:k-1})}{\int p(z_k\mid x_k)\,p(x_k\mid Z_{1:k-1})\,dx_k}$$
3 Filtering:
$$\hat x_{k|k} = E\{x_k\mid Z_{1:k}\} = \int x_k\,p(x_k\mid Z_{1:k})\,dx_k$$
$$P_{k|k} = E\{(x_k-\hat x_{k|k})(x_k-\hat x_{k|k})^T\mid Z_{1:k}\} = \int (x_k-\hat x_{k|k})(x_k-\hat x_{k|k})^T\,p(x_k\mid Z_{1:k})\,dx_k$$
k := k+1
[Figure: hidden Markov model; hidden states and measurements]
95
SOLO Recursive Bayesian Estimation
Bayesian Estimation Introduction - Summary
1 Prediction phase (before the z_k measurement):
$$p(x_k\mid Z_{1:k-1}) = \int p(x_k\mid x_{k-1})\,p(x_{k-1}\mid Z_{1:k-1})\,dx_{k-1}$$
2 Correction Step (after the z_k measurement):
$$p(x_k\mid Z_{1:k}) = \frac{p(z_k\mid x_k)\,p(x_k\mid Z_{1:k-1})}{\int p(z_k\mid x_k)\,p(x_k\mid Z_{1:k-1})\,dx_k}$$
This is a Conceptual Solution because the Integrals are Often Not Tractable.
An optimal solution is possible for some restricted cases:
• Linear Systems with Gaussian Noises (system and measurements)
• Grid-Based Filters
[Figure: hidden Markov model; hidden states and measurements]
Table of Content
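Since the recursion above is only conceptual, a minimal grid-based sketch can make the two steps concrete. The scalar random-walk model, the Gaussian densities, and all names below are illustrative assumptions, not part of the original deck:

import numpy as np

# Grid over the scalar state x_k (assumed model, for illustration only).
x = np.linspace(-10.0, 10.0, 401)
dx = x[1] - x[0]

def gauss(u, var):
    return np.exp(-0.5 * u**2 / var) / np.sqrt(2.0 * np.pi * var)

def predict(p_prev, q_var):
    # Chapman-Kolmogorov: p(x_k|Z_{1:k-1}) = integral of p(x_k|x_{k-1}) p(x_{k-1}|Z_{1:k-1})
    trans = gauss(x[:, None] - x[None, :], q_var)   # p(x_k | x_{k-1}) on the grid
    return trans @ p_prev * dx

def correct(p_pred, z, r_var):
    # Bayes: posterior proportional to likelihood p(z_k|x_k) times prior p(x_k|Z_{1:k-1})
    post = gauss(z - x, r_var) * p_pred
    return post / (post.sum() * dx)                 # divide by the evidence

p = gauss(x - 0.0, 4.0)                             # p(x_0)
for z in [1.2, 0.7, 1.5]:                           # measurements z_1..z_3
    p = correct(predict(p, q_var=0.5), z, r_var=1.0)
print("posterior mean:", (x * p).sum() * dx)

For a Gaussian model such as this one the grid result matches the Kalman filter developed next; the grid approach remains valid when the densities are non-Gaussian, at the cost of the grid resolution.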
96
SOLO Review of Probability
Linear Gaussian Systems
A Linear Combination of Independent Gaussian random vectors is also a Gaussian random vector:
$$S_m := a_1X_1 + a_2X_2 + \dots + a_mX_m$$
Gaussian distribution:
$$p_{X_i}(X_i;\mu_i,\sigma_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\left[-\frac{(X_i-\mu_i)^2}{2\sigma_i^2}\right]$$
Define the Characteristic Function (the Fourier-domain counterpart of the Moment-Generating Function):
$$\Phi_{X_i}(\omega) := E\{\exp(j\omega X_i)\} = \int_{-\infty}^{+\infty}\exp(j\omega X_i)\,p_{X_i}(X_i)\,dX_i = \exp\left(j\omega\mu_i - \tfrac{1}{2}\sigma_i^2\omega^2\right)$$
Proof: define Y_i := a_iX_i, so that
$$p_{Y_i}(Y_i) = \frac{1}{|a_i|}\,p_{X_i}\!\left(\frac{Y_i}{a_i}\right),\qquad \Phi_{Y_i}(\omega) = E\{\exp(j\omega Y_i)\} = \Phi_{X_i}(a_i\omega) = \exp\left(ja_i\mu_i\omega - \tfrac{1}{2}a_i^2\sigma_i^2\omega^2\right)$$
Since the X_i (hence the Y_i) are independent, p_{Y_1,\dots,Y_m} = p_{Y_1}\cdots p_{Y_m} and
$$\Phi_{S_m}(\omega) = E\{\exp[j\omega(Y_1+\dots+Y_m)]\} = \Phi_{Y_1}(\omega)\cdots\Phi_{Y_m}(\omega) = \exp\left[j\omega(a_1\mu_1+\dots+a_m\mu_m) - \tfrac{1}{2}\omega^2\left(a_1^2\sigma_1^2+\dots+a_m^2\sigma_m^2\right)\right]$$
97
SOLO Review of Probability
Linear Gaussian Systems (continue – 1)
A Linear Combination of Independent Gaussian random vectors is also a Gaussian random vector: S_m := a_1X_1 + a_2X_2 + … + a_mX_m.
Proof (continue – 1): We found
$$\Phi_{S_m}(\omega) = \exp\left[j\omega(a_1\mu_1+\dots+a_m\mu_m) - \tfrac{1}{2}\omega^2\left(a_1^2\sigma_1^2+\dots+a_m^2\sigma_m^2\right)\right]$$
Therefore the Linear Combination of Independent Gaussian Random Variables is a Gaussian Random Variable with
$$\mu_{S_m} = a_1\mu_1 + a_2\mu_2 + \dots + a_m\mu_m,\qquad \sigma^2_{S_m} = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \dots + a_m^2\sigma_m^2$$
and the S_m probability distribution is:
$$p_{S_m}(S_m;\mu_{S_m},\sigma_{S_m}) = \frac{1}{\sqrt{2\pi}\,\sigma_{S_m}}\exp\left[-\frac{(S_m-\mu_{S_m})^2}{2\sigma^2_{S_m}}\right]$$
q.e.d.
98
SOLO Recursive Bayesian Estimation
Linear Gaussian Markov Systems (continue – 2)
A Linear Gaussian Markov System is defined as
$$x_k = \Phi_{k-1}x_{k-1} + G_{k-1}u_{k-1} + \Gamma_{k-1}w_{k-1},\qquad z_k = H_kx_k + v_k$$
(the linear case of x_k = f(k-1, x_{k-1}, u_{k-1}, w_{k-1}), z_k = h(k, x_k, u_k, v_k)), with w_{k-1} and v_k white noises, zero mean, Gaussian, independent:
$$e_x(k) := x(k) - E\{x(k)\},\qquad E\{e_x(k)e_x^T(k)\} = P(k)$$
$$e_w(k) := w(k) - \underbrace{E\{w(k)\}}_{0},\qquad E\{e_w(k)e_w^T(l)\} = Q(k)\,\delta_{k,l}$$
$$e_v(k) := v(k) - \underbrace{E\{v(k)\}}_{0},\qquad E\{e_v(k)e_v^T(l)\} = R(k)\,\delta_{k,l},\qquad E\{e_w(k)e_v^T(l)\} = 0$$
$$\delta_{k,l} = \begin{cases}1 & k=l\\ 0 & k\ne l\end{cases}$$
$$p_w(w) = \mathcal N(w;0,Q) = \frac{1}{(2\pi)^{n/2}|Q|^{1/2}}\exp\left(-\tfrac12 w^TQ^{-1}w\right),\qquad p_v(v) = \mathcal N(v;0,R) = \frac{1}{(2\pi)^{p/2}|R|^{1/2}}\exp\left(-\tfrac12 v^TR^{-1}v\right)$$
$$p_x(x_{t=0}) = \mathcal N(x_0;\hat x_{0|0},P_{0|0}) = \frac{1}{(2\pi)^{n/2}|P_{0|0}|^{1/2}}\exp\left[-\tfrac12(x_0-\hat x_{0|0})^TP_{0|0}^{-1}(x_0-\hat x_{0|0})\right]$$
99
SOLO Recursive Bayesian Estimation
Linear Gaussian Markov Systems (continue – 3)
Prediction phase (before the z_k measurement): from x_k = Φ_{k-1}x_{k-1} + G_{k-1}u_{k-1} + Γ_{k-1}w_{k-1}, the expectation is
$$\hat x_{k|k-1} := E\{x_k\mid Z_{1:k-1}\} = \Phi_{k-1}E\{x_{k-1}\mid Z_{1:k-1}\} + G_{k-1}u_{k-1} + \Gamma_{k-1}\underbrace{E\{w_{k-1}\mid Z_{1:k-1}\}}_{0}$$
or
$$\hat x_{k|k-1} = \Phi_{k-1}\hat x_{k-1|k-1} + G_{k-1}u_{k-1}$$
$$P_{k|k-1} := E\{(x_k-\hat x_{k|k-1})(x_k-\hat x_{k|k-1})^T\mid Z_{1:k-1}\} = E\{[\Phi_{k-1}(x_{k-1}-\hat x_{k-1|k-1})+\Gamma_{k-1}w_{k-1}][\Phi_{k-1}(x_{k-1}-\hat x_{k-1|k-1})+\Gamma_{k-1}w_{k-1}]^T\mid Z_{1:k-1}\}$$
Since the cross terms E{(x_{k-1}-x̂_{k-1|k-1})w_{k-1}^T} vanish:
$$P_{k|k-1} = \Phi_{k-1}P_{k-1|k-1}\Phi_{k-1}^T + \Gamma_{k-1}Q_{k-1}\Gamma_{k-1}^T$$
Since x_k = Φ_{k-1}x_{k-1} + G_{k-1}u_{k-1} + Γ_{k-1}w_{k-1} is a Linear Combination of Independent Gaussian Random Variables:
$$p(x_k\mid Z_{1:k-1}) = \mathcal N(x_k;\hat x_{k|k-1},P_{k|k-1})$$
100
SOLO Recursive Bayesian Estimation
Linear Gaussian Markov Systems (continue – 4)
Correction Step (after the z_k measurement) - 1st Way:
$$p(x_k\mid Z_{1:k}) = \frac{p(z_k\mid x_k)\,p(x_k\mid Z_{1:k-1})}{p(z_k\mid Z_{1:k-1})}$$
For the particular vector measurement equation z_k = H_kx_k + v_k, where the measurement noise is Gaussian (normal) with zero mean, p_v(v) = N(v; 0, R_k), and independent of x_k, the conditional probability p_{z|x}(z | x) can be written, using Bayes rule, as:
$$p_{z|x}(z\mid x) = \frac{p_{x,z}(x,z)}{p_x(x)}$$
The measurement noise can be related to x and z by the function v = z - Hx =: f(x, z), and the change of variables (x, v) → (x, z) has the Jacobian
$$J = \frac{\partial(x,z)}{\partial(x,v)} = \begin{bmatrix}I_{n\times n} & 0\\ H & I_{p\times p}\end{bmatrix},\qquad |J|=1 \;\Rightarrow\; p_{x,z}(x,z) = p_{x,v}(x,v)\,/\,|J| = p_{x,v}(x, z-Hx)$$
Since the measurement noise v is independent of x, the joint probability of x and z is given by:
$$p_{x,z}(x,z) = p_{x,v}(x,v) = p_x(x)\,p_v(z-Hx) \;\Rightarrow\; p_{z|x}(z\mid x) = p_v(z-Hx)$$
101
SOLO Recursive Bayesian Estimation
Linear Gaussian Markov Systems (continue – 5)
Correction Step (after z_k) - 1st Way (continue – 1)
Consider a Gaussian vector x_k, where p_x(x_k) = N(x_k; x̂_{k|k-1}, P_{k|k-1}), and the measurement z_k = H_kx_k + v_k, where the Gaussian noise v_k, p_v(v_k) = N(v_k; 0, R_k), is independent of x_k:
$$p_z(z_k) = \int_{-\infty}^{+\infty}p_{x,z}(x_k,z_k)\,dx_k = \int_{-\infty}^{+\infty}p_{z|x}(z_k\mid x_k)\,p_x(x_k)\,dx_k$$
p_z(z_k) is Gaussian with
$$E\{z_k\} = H_kE\{x_k\} + \underbrace{E\{v_k\}}_{0} = H_k\hat x_{k|k-1}$$
$$\operatorname{cov}\{z_k\} = E\{[H_k(x_k-\hat x_{k|k-1})+v_k][H_k(x_k-\hat x_{k|k-1})+v_k]^T\} = H_kP_{k|k-1}H_k^T + R_k$$
(the cross terms vanish by independence). Hence:
$$p_z(z) = \frac{1}{(2\pi)^{p/2}|H_kP_{k|k-1}H_k^T+R_k|^{1/2}}\exp\left[-\tfrac12(z-H\hat x)^T\left(H_kP_{k|k-1}H_k^T+R_k\right)^{-1}(z-H\hat x)\right]$$
$$p_{x|Z}(x_k\mid Z_{1:k-1}) = \frac{1}{(2\pi)^{n/2}|P_{k|k-1}|^{1/2}}\exp\left[-\tfrac12(x_k-\hat x_{k|k-1})^TP_{k|k-1}^{-1}(x_k-\hat x_{k|k-1})\right]$$
$$p_{z|x}(z_k\mid x_k) = p_v(z_k-H_kx_k) = \frac{1}{(2\pi)^{p/2}|R_k|^{1/2}}\exp\left[-\tfrac12(z_k-H_kx_k)^TR_k^{-1}(z_k-H_kx_k)\right]$$
102
SOLO Recursive Bayesian Estimation
Linear Gaussian Markov Systems (continue – 6)
Correction Step (after z_k) - 1st Way (continue – 2)
Substituting the three Gaussian densities of the previous slide in Bayes rule:
$$p(x_k\mid Z_{1:k}) = \frac{p(z_k\mid x_k)\,p(x_k\mid Z_{1:k-1})}{p(z_k\mid Z_{1:k-1})} = \frac{|H_kP_{k|k-1}H_k^T+R_k|^{1/2}}{(2\pi)^{n/2}\,|P_{k|k-1}|^{1/2}\,|R_k|^{1/2}}\,\times$$
$$\exp\Bigl\{-\tfrac12\Bigl[(z_k-H_kx_k)^TR_k^{-1}(z_k-H_kx_k) + (x_k-\hat x_{k|k-1})^TP_{k|k-1}^{-1}(x_k-\hat x_{k|k-1}) - (z_k-H_k\hat x_{k|k-1})^T\left(H_kP_{k|k-1}H_k^T+R_k\right)^{-1}(z_k-H_k\hat x_{k|k-1})\Bigr]\Bigr\}$$
from which the quadratic form in the exponent must be rearranged (next slide).
103
SOLO Recursive Bayesian Estimation
Linear Gaussian Markov Systems (continue – 7)
Correction Step (after z_k) - 1st Way (continue – 3)
The quadratic form in the exponent,
$$q = (z-Hx)^TR^{-1}(z-Hx) + (x-\hat x_{k|k-1})^TP_{k|k-1}^{-1}(x-\hat x_{k|k-1}) - (z-H\hat x_{k|k-1})^T\left(R+HP_{k|k-1}H^T\right)^{-1}(z-H\hat x_{k|k-1})$$
is completed to a square. Using the Inverse Matrix Lemma,
$$\left(R+HP_{k|k-1}H^T\right)^{-1} = R^{-1} - R^{-1}H\left(P_{k|k-1}^{-1}+H^TR^{-1}H\right)^{-1}H^TR^{-1}$$
and defining
$$P_{k|k} := \left(P_{k|k-1}^{-1}+H^TR^{-1}H\right)^{-1} \overset{\text{Inv. Matrix Lemma}}{=} P_{k|k-1} - P_{k|k-1}H^T\left(R+HP_{k|k-1}H^T\right)^{-1}HP_{k|k-1}$$
a straightforward (if lengthy) expansion of q in the terms ξ := x - x̂_{k|k-1} and z - Hx̂_{k|k-1} gives the perfect square
$$q = \left[x-\hat x_{k|k-1}-P_{k|k}H^TR^{-1}(z-H\hat x_{k|k-1})\right]^TP_{k|k}^{-1}\left[x-\hat x_{k|k-1}-P_{k|k}H^TR^{-1}(z-H\hat x_{k|k-1})\right]$$
so that
$$p_{x|z}(x_k\mid Z_{1:k}) = \frac{1}{(2\pi)^{n/2}|P_{k|k}|^{1/2}}\exp\Bigl\{-\tfrac12\left[x-\hat x_{k|k-1}-P_{k|k}H^TR^{-1}(z-H\hat x_{k|k-1})\right]^TP_{k|k}^{-1}\left[x-\hat x_{k|k-1}-P_{k|k}H^TR^{-1}(z-H\hat x_{k|k-1})\right]\Bigr\}$$
104
SOLO Recursive Bayesian Estimation
Linear Gaussian Markov Systems (continue – 8)
Correction Step (after z_k) - 1st Way (continue – 4)
Then
$$p_{x|z}(x_k\mid Z_{1:k}) = \frac{1}{(2\pi)^{n/2}|P_{k|k}|^{1/2}}\exp\Bigl\{-\tfrac12\left[x_k-\hat x_{k|k-1}-P_{k|k}H_k^TR_k^{-1}(z_k-H_k\hat x_{k|k-1})\right]^TP_{k|k}^{-1}\left[\,\cdot\,\right]\Bigr\}$$
where:
$$P_{k|k} := \left(P_{k|k-1}^{-1}+H_k^TR_k^{-1}H_k\right)^{-1} = E\{(x_k-\hat x_{k|k})(x_k-\hat x_{k|k})^T\mid Z_{1:k}\}$$
and the maximizing value, which is also the conditional mean, is
$$\hat x_{k|k} := x_k^* = \arg\max_{x_k}p_{x|z}(x_k\mid Z_{1:k}) = \hat x_{k|k-1} + P_{k|k}H_k^TR_k^{-1}\left(z_k-H_k\hat x_{k|k-1}\right) = E\{x_k\mid Z_{1:k}\}$$
105
SOLO Recursive Bayesian Estimation
Linear Gaussian Markov Systems (continue – 9)
Summary 1st Way – Kalman Filter
0 Initial Conditions: $\hat x_{0|0} = E\{x_0\}$, $P_{0|0} := E\{(x_0-\hat x_{0|0})(x_0-\hat x_{0|0})^T\}$
Prediction phase (before the z_k measurement):
$$\hat x_{k|k-1} = \Phi_{k-1}\hat x_{k-1|k-1} + G_{k-1}u_{k-1},\qquad P_{k|k-1} = \Phi_{k-1}P_{k-1|k-1}\Phi_{k-1}^T + \Gamma_{k-1}Q_{k-1}\Gamma_{k-1}^T$$
For z_k = H_kx_k + v_k:
$$\hat z_{k|k-1} = E\{z_k\mid Z_{1:k-1}\} = H_k\hat x_{k|k-1} + \underbrace{E\{v_k\mid Z_{1:k-1}\}}_{0} = H_k\hat x_{k|k-1}$$
Correction Step (after the z_k measurement):
$$P_{k|k} := \left(P_{k|k-1}^{-1}+H_k^TR_k^{-1}H_k\right)^{-1},\qquad K_k := P_{k|k}H_k^TR_k^{-1}$$
$$\hat x_{k|k} = E\{x_k\mid Z_{1:k}\} = \hat x_{k|k-1} + K_k\left(z_k-H_k\hat x_{k|k-1}\right) = \hat x_{k|k-1} + K_k\left(z_k-\hat z_{k|k-1}\right)$$
106
SOLO Recursive Bayesian Estimation
Linear Gaussian Markov Systems (continue – 10)
Correction Step (after z_k) - 2nd Way
For z_k = H_kx_k + v_k, p_v(v) = N(v; 0, R), we found from p_z(z_k):
$$\hat z_{k|k-1} = E\{z_k\mid Z_{1:k-1}\} = H_k\hat x_{k|k-1}$$
$$P^{zz}_{k|k-1} = E\{(z_k-\hat z_{k|k-1})(z_k-\hat z_{k|k-1})^T\mid Z_{1:k-1}\} = H_kP_{k|k-1}H_k^T + R_k =: S_k$$
We also have
$$P^{xz}_{k|k-1} = E\{(x_k-\hat x_{k|k-1})(z_k-\hat z_{k|k-1})^T\mid Z_{1:k-1}\} = E\{(x_k-\hat x_{k|k-1})[H_k(x_k-\hat x_{k|k-1})+v_k]^T\} = P_{k|k-1}H_k^T$$
Define the innovation:
$$i_k := z_k - \hat z_{k|k-1} = z_k - H_k\hat x_{k|k-1}$$
107
SOLO Recursive Bayesian Estimation
Linear Gaussian Markov Systems (continue – 11)
Joint and Conditional Gaussian Random Variables - 2nd Way (continue – 1)
Define $y_k := \begin{bmatrix}x_k\\ z_k\end{bmatrix}$, assumed to be Gaussian distributed. Then
$$E\{y_k\mid Z_{1:k-1}\} = \begin{bmatrix}E\{x_k\mid Z_{1:k-1}\}\\ E\{z_k\mid Z_{1:k-1}\}\end{bmatrix} = \begin{bmatrix}\hat x_{k|k-1}\\ \hat z_{k|k-1}\end{bmatrix}$$
$$P^{yy}_{k|k-1} = E\left\{\begin{bmatrix}x_k-\hat x_{k|k-1}\\ z_k-\hat z_{k|k-1}\end{bmatrix}\begin{bmatrix}x_k-\hat x_{k|k-1}\\ z_k-\hat z_{k|k-1}\end{bmatrix}^T\Bigm|Z_{1:k-1}\right\} = \begin{bmatrix}P^{xx}_{k|k-1} & P^{xz}_{k|k-1}\\ P^{zx}_{k|k-1} & P^{zz}_{k|k-1}\end{bmatrix}$$
where:
$$P^{xx}_{k|k-1} = P_{k|k-1},\qquad P^{zz}_{k|k-1} = H_kP_{k|k-1}H_k^T + R_k =: S_k,\qquad P^{xz}_{k|k-1} = P_{k|k-1}H_k^T = \left(P^{zx}_{k|k-1}\right)^T$$
108
SOLO Recursive Bayesian Estimation
Linear Gaussian Markov Systems (continue – 12)
Joint and Conditional Gaussian Random Variables - 2nd Way (continue – 2)
We assumed that y_k = [x_k; z_k] is Gaussian distributed:
$$p_{x,z}(x_k,z_k\mid Z_{1:k-1}) = \frac{1}{(2\pi)^{(n+p)/2}|P^{yy}_{k|k-1}|^{1/2}}\exp\left[-\tfrac12\left(y_k-\hat y_{k|k-1}\right)^T\left(P^{yy}_{k|k-1}\right)^{-1}\left(y_k-\hat y_{k|k-1}\right)\right]$$
$$p_z(z_k\mid Z_{1:k-1}) = \frac{1}{(2\pi)^{p/2}|P^{zz}_{k|k-1}|^{1/2}}\exp\left[-\tfrac12\left(z_k-\hat z_{k|k-1}\right)^T\left(P^{zz}_{k|k-1}\right)^{-1}\left(z_k-\hat z_{k|k-1}\right)\right]$$
The conditional probability density function (pdf) of x_k given z_k is:
$$p_{x|z}(x_k\mid z_k,Z_{1:k-1}) = \frac{p_{x,z}(x_k,z_k\mid Z_{1:k-1})}{p_z(z_k\mid Z_{1:k-1})} = \frac{|P^{zz}_{k|k-1}|^{1/2}}{(2\pi)^{n/2}|P^{yy}_{k|k-1}|^{1/2}}\exp\left\{-\tfrac12\left[(y_k-\hat y)^T(P^{yy})^{-1}(y_k-\hat y) - (z_k-\hat z)^T(P^{zz})^{-1}(z_k-\hat z)\right]\right\}$$
109
SOLO Recursive Bayesian Estimation
Linear Gaussian Markov Systems (continue – 13)
Joint and Conditional Gaussian Random Variables - 2nd Way (continue – 3)
Define: $\xi_k := x_k - \hat x_{k|k-1}$ and $\varsigma_k := z_k - \hat z_{k|k-1}$, and write the exponent as
$$q := (y_k-\hat y)^T(P^{yy})^{-1}(y_k-\hat y) - \varsigma_k^T(P^{zz})^{-1}\varsigma_k = \begin{bmatrix}\xi_k\\ \varsigma_k\end{bmatrix}^T\begin{bmatrix}T^{xx} & T^{xz}\\ T^{zx} & T^{zz}\end{bmatrix}\begin{bmatrix}\xi_k\\ \varsigma_k\end{bmatrix} - \varsigma_k^T(P^{zz})^{-1}\varsigma_k$$
$$= \xi_k^TT^{xx}\xi_k + \xi_k^TT^{xz}\varsigma_k + \varsigma_k^TT^{zx}\xi_k + \varsigma_k^TT^{zz}\varsigma_k - \varsigma_k^T\left(P^{zz}_{k|k-1}\right)^{-1}\varsigma_k$$
where
$$\begin{bmatrix}T^{xx} & T^{xz}\\ T^{zx} & T^{zz}\end{bmatrix} := \begin{bmatrix}P^{xx}_{k|k-1} & P^{xz}_{k|k-1}\\ P^{zx}_{k|k-1} & P^{zz}_{k|k-1}\end{bmatrix}^{-1}$$
110
SOLO Recursive Bayesian Estimation
Linear Gaussian Markov Systems (continue – 14)
Joint and Conditional Gaussian Random Variables - 2nd Way (continue – 4)
Using the Inverse Matrix Lemma (block matrix inversion):
$$\begin{bmatrix}A_{n\times n} & B_{n\times m}\\ C_{m\times n} & D_{m\times m}\end{bmatrix}^{-1} = \begin{bmatrix}\left(A-BD^{-1}C\right)^{-1} & -\left(A-BD^{-1}C\right)^{-1}BD^{-1}\\ -D^{-1}C\left(A-BD^{-1}C\right)^{-1} & D^{-1}+D^{-1}C\left(A-BD^{-1}C\right)^{-1}BD^{-1}\end{bmatrix}$$
Applying it to the partitioned covariance P^{yy} gives
$$\left(T^{xx}\right)^{-1} = P^{xx}_{k|k-1} - P^{xz}_{k|k-1}\left(P^{zz}_{k|k-1}\right)^{-1}P^{zx}_{k|k-1}$$
$$T^{xz} = -T^{xx}P^{xz}_{k|k-1}\left(P^{zz}_{k|k-1}\right)^{-1},\qquad T^{zz} = \left(P^{zz}\right)^{-1} + \left(P^{zz}\right)^{-1}P^{zx}\,T^{xx}\,P^{xz}\left(P^{zz}\right)^{-1}$$
Substituting these in q, the remaining ς_k-only contributions cancel and q completes to a perfect square:
$$q = \left[\xi_k + \left(T^{xx}\right)^{-1}T^{xz}\varsigma_k\right]^TT^{xx}\left[\xi_k + \left(T^{xx}\right)^{-1}T^{xz}\varsigma_k\right]$$
111
SOLO Recursive Bayesian Estimation
Linear Gaussian Markov Systems (continue – 15)
Joint and Conditional Gaussian Random Variables - 2nd Way (continue – 5)
From the block inversion,
$$\left(T^{xx}\right)^{-1}T^{xz} = -P^{xz}_{k|k-1}\left(P^{zz}_{k|k-1}\right)^{-1}$$
so that, with ξ_k := x_k - x̂_{k|k-1} and ς_k := z_k - ẑ_{k|k-1},
$$\xi_k + \left(T^{xx}\right)^{-1}T^{xz}\varsigma_k = x_k - \hat x_{k|k-1} - \underbrace{P^{xz}_{k|k-1}\left(P^{zz}_{k|k-1}\right)^{-1}}_{K_k}\left(z_k-\hat z_{k|k-1}\right)$$
and, using |P^{yy}| = |P^{zz}|·|(T^{xx})^{-1}|,
$$p_{x|z}(x_k\mid Z_{1:k}) = \frac{|P^{zz}_{k|k-1}|^{1/2}}{(2\pi)^{n/2}|P^{yy}_{k|k-1}|^{1/2}}\exp\left\{-\tfrac12\left[x_k-\hat x_{k|k-1}-K_k(z_k-\hat z_{k|k-1})\right]^TT^{xx}\left[x_k-\hat x_{k|k-1}-K_k(z_k-\hat z_{k|k-1})\right]\right\}$$
112
SOLO Recursive Bayesian Estimation
Linear Gaussian Markov Systems (continue – 16)
Joint and Conditional Gaussian Random Variables - 2nd Way (continue – 6)
From this we can see that
$$\hat x_{k|k} = E\{x_k\mid z_k\} = \hat x_{k|k-1} + \underbrace{P^{xz}_{k|k-1}\left(P^{zz}_{k|k-1}\right)^{-1}}_{K_k}\left(z_k-\hat z_{k|k-1}\right)$$
$$P^{xx}_{k|k} = E\{(x_k-\hat x_{k|k})(x_k-\hat x_{k|k})^T\mid Z_{1:k}\} = \left(T^{xx}\right)^{-1} = P^{xx}_{k|k-1} - P^{xz}_{k|k-1}\left(P^{zz}_{k|k-1}\right)^{-1}P^{zx}_{k|k-1} = P_{k|k-1} - K_kP^{zz}_{k|k-1}K_k^T$$
where
$$P^{xx}_{k|k-1} = P_{k|k-1},\qquad P^{zz}_{k|k-1} = H_kP_{k|k-1}H_k^T + R_k =: S_k,\qquad P^{xz}_{k|k-1} = P_{k|k-1}H_k^T$$
113
SOLO Recursive Bayesian Estimation
Linear Gaussian Markov Systems (continue – 17)
Joint and Conditional Gaussian Random Variables - 2nd Way (continue – 7)
From this we can see that
$$P_{k|k} = P_{k|k-1} - P_{k|k-1}H_k^T\left(H_kP_{k|k-1}H_k^T+R_k\right)^{-1}H_kP_{k|k-1} = \left(P_{k|k-1}^{-1}+H_k^TR_k^{-1}H_k\right)^{-1}$$
$$K_k = P^{xz}_{k|k-1}\left(P^{zz}_{k|k-1}\right)^{-1} = P_{k|k-1}H_k^T\left(R_k+H_kP_{k|k-1}H_k^T\right)^{-1} = P_{k|k-1}H_k^TS_k^{-1}$$
or
$$P_{k|k} = P_{k|k-1} - K_kS_kK_k^T$$
114
SOLO Recursive Bayesian Estimation
Linear Gaussian Markov Systems (continue – 18)
Relation Between the 1st and 2nd Ways
2nd Way: we found that the optimal K_k is
$$K_k = P_{k|k-1}H_k^T\left[R_k+H_kP_{k|k-1}H_k^T\right]^{-1}$$
If R_k^{-1} and P_{k|k-1}^{-1} exist, the Inverse Matrix Lemma gives
$$\left[R_k+H_kP_{k|k-1}H_k^T\right]^{-1} = R_k^{-1} - R_k^{-1}H_k\left[P_{k|k-1}^{-1}+H_k^TR_k^{-1}H_k\right]^{-1}H_k^TR_k^{-1}$$
so
$$K_k = P_{k|k-1}H_k^TR_k^{-1} - P_{k|k-1}H_k^TR_k^{-1}H_k\left[P_{k|k-1}^{-1}+H_k^TR_k^{-1}H_k\right]^{-1}H_k^TR_k^{-1} = \left[P_{k|k-1}^{-1}+H_k^TR_k^{-1}H_k\right]^{-1}H_k^TR_k^{-1} = P_{k|k}H_k^TR_k^{-1}$$
1st Way = 2nd Way
115
SOLO Recursive Bayesian Estimation
Linear Gaussian Markov Systems (continue – 19)
Innovation
The innovation is the quantity:
$$i_k := z_k - \hat z_{k|k-1} = z_k - H_k\hat x_{k|k-1}$$
We found that:
$$E\{i_k\mid Z_{1:k-1}\} = E\{z_k-\hat z_{k|k-1}\mid Z_{1:k-1}\} = E\{z_k\mid Z_{1:k-1}\} - \hat z_{k|k-1} = 0$$
$$E\{i_ki_k^T\mid Z_{1:k-1}\} = E\{(z_k-\hat z_{k|k-1})(z_k-\hat z_{k|k-1})^T\mid Z_{1:k-1}\} = H_kP_{k|k-1}H_k^T + R_k =: S_k$$
Using the smoothing property of the expectation:
$$E_Y\left\{E_{X|Y}\{x\mid y\}\right\} = \int\left[\int x\,p_{X|Y}(x\mid y)\,dx\right]p_Y(y)\,dy = \int\!\!\int x\,\underbrace{p_{X|Y}(x\mid y)\,p_Y(y)}_{p_{X,Y}(x,y)}\,dx\,dy = \int x\,p_X(x)\,dx = E\{x\}$$
we have $E\{i_ki_j^T\} = E\{E\{i_ki_j^T\mid Z_{1:k-1}\}\}$. Assuming, without loss of generality, that k-1 ≥ j, the innovation i_j is a deterministic function of Z_{1:k-1} and can be taken outside the inner expectation:
$$E\{i_ki_j^T\} = E\left\{\underbrace{E\{i_k\mid Z_{1:k-1}\}}_{0}\,i_j^T\right\} = 0$$
116
SOLO Recursive Bayesian Estimation
Linear Gaussian Markov Systems (continue – 20)
Innovation (continue – 1)
The innovation i_k := z_k - ẑ_{k|k-1} = z_k - H_kx̂_{k|k-1} therefore satisfies:
$$E\{i_k\mid Z_{1:k-1}\} = 0,\qquad E\{i_ki_k^T\mid Z_{1:k-1}\} = S_k,\qquad E\{i_ki_j^T\} = 0\ (k\ne j),\quad\text{i.e.}\quad E\{i_ki_j^T\} = S_k\,\delta_{ij}$$
Thus the innovation sequence is zero mean and white for the Kalman (Optimal) Filter.
The uncorrelatedness property of the innovations implies that, since they are Gaussian, the innovations are independent of each other and thus the innovation sequence is Strictly White. Without the Gaussian assumption, the innovation sequence is Wide-Sense White.
Table of Content
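The zero-mean, white innovation property suggests a standard consistency check for an implemented filter. A minimal sketch, assuming the scalar innovation sequence and its covariances S_k have already been logged (the function name and interface are illustrative):

import numpy as np

def innovation_autocorrelation(innovations, S_list, max_lag=5):
    """Normalize each scalar innovation by sqrt(S_k) and report sample
    autocorrelations; for an optimal filter they should be near zero
    for all lags > 0 (the lag-0 value is 1 by construction)."""
    e = np.array([i / np.sqrt(s) for i, s in zip(innovations, S_list)])
    e = e - e.mean()
    denom = np.dot(e, e)
    return [np.dot(e[lag:], e[:-lag] if lag else e) / denom
            for lag in range(max_lag + 1)]

Significant autocorrelation at nonzero lags indicates model mismatch (wrong Q, R, or dynamics), which is exactly what the whiteness result above predicts should not happen for the optimal filter.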
117
SOLO Recursive Bayesian Estimation
Closed-Form Solutions of Estimation
Closed-Form solutions for the Optimal Recursive Bayesian Estimation can be derived only for special cases.
The most important case:
• Dynamic and measurement models are linear:
$$x_k = \Phi_{k-1}x_{k-1} + G_{k-1}u_{k-1} + \Gamma_{k-1}w_{k-1},\qquad z_k = H_kx_k + v_k$$
(the linear case of x_k = f(k-1, x_{k-1}, u_{k-1}, w_{k-1}), z_k = h(k, x_k, u_k, v_k))
• Random noises are Gaussian:
$$p_w(w) = \mathcal N(w;0,Q) = \frac{1}{(2\pi)^{n/2}|Q|^{1/2}}\exp\left(-\tfrac12 w^TQ^{-1}w\right),\qquad p_v(v) = \mathcal N(v;0,R) = \frac{1}{(2\pi)^{p/2}|R|^{1/2}}\exp\left(-\tfrac12 v^TR^{-1}v\right)$$
• Solution: KALMAN FILTER
• In other non-linear/non-Gaussian cases: USE APPROXIMATIONS
118
SOLO Recursive Bayesian Estimation
Closed-Form Solutions of Estimation (continue – 1)
• Dynamic and measurement models are linear:
$$x_k = \Phi_{k-1}x_{k-1} + G_{k-1}u_{k-1} + \Gamma_{k-1}w_{k-1},\qquad z_k = H_kx_k + v_k$$
• The Optimal Estimator is the Kalman Filter, developed by R. E. Kalman in 1960.
Noise statistics: e_x(k) := x(k) - E{x(k)}, E{e_x(k)e_x^T(k)} = P(k); E{e_w(k)e_w^T(l)} = Q(k)δ_{k,l}; E{e_v(k)e_v^T(l)} = R(k)δ_{k,l}; E{e_w(k)e_v^T(l)} = 0.
Rudolf E. Kalman (1930 – 2016)
• The K.F. is an Optimal Estimator in the Minimum Mean Square Error (MMSE) sense if: the state and measurement models are linear, and the random elements are Gaussian.
• Under those conditions, the covariance matrix is: independent of the state (can be calculated off-line), and equals the Cramér – Rao lower bound.
Table of Content
119
SOLO Kalman Filter
State Estimation in a Linear System (one cycle)
0 Initialization: $\hat x_0 = E\{x_0\}$, $P_0 = E\{(x_0-\hat x_0)(x_0-\hat x_0)^T\}$
k := k+1
1 State vector prediction: $\hat x_{k|k-1} = \Phi_{k-1}\hat x_{k-1|k-1} + G_{k-1}u_{k-1}$
2 Covariance matrix extrapolation: $P_{k|k-1} = \Phi_{k-1}P_{k-1|k-1}\Phi_{k-1}^T + Q_{k-1}$
3 Innovation covariance: $S_k = H_kP_{k|k-1}H_k^T + R_k$
4 Gain matrix computation: $K_k = P_{k|k-1}H_k^TS_k^{-1}$
5 Measurement & innovation: $i_k = z_k - \hat z_{k|k-1}$, with $\hat z_{k|k-1} = H_k\hat x_{k|k-1}$
6 Filtering: $\hat x_{k|k} = \hat x_{k|k-1} + K_ki_k$
7 Covariance matrix updating:
$$P_{k|k} = P_{k|k-1} - P_{k|k-1}H_k^TS_k^{-1}H_kP_{k|k-1} = P_{k|k-1} - K_kS_kK_k^T = (I-K_kH_k)P_{k|k-1} = (I-K_kH_k)P_{k|k-1}(I-K_kH_k)^T + K_kR_kK_k^T$$
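Steps 1-7 above transcribe directly into a few NumPy lines. This is a sketch of one cycle under the assumption that all matrices are supplied externally, not a library-grade implementation:

import numpy as np

def kalman_cycle(x_est, P_est, z, Phi, G, u, Q, H, R):
    # 1-2: state and covariance prediction
    x_pred = Phi @ x_est + G @ u
    P_pred = Phi @ P_est @ Phi.T + Q
    # 3-4: innovation covariance and gain
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    # 5-6: innovation and filtered state
    innov = z - H @ x_pred
    x_new = x_pred + K @ innov
    # 7: Joseph-form covariance update (the numerically safer equivalent above)
    I_KH = np.eye(len(x_est)) - K @ H
    P_new = I_KH @ P_pred @ I_KH.T + K @ R @ K.T
    return x_new, P_new

The Joseph form is chosen here because it preserves symmetry and positive-definiteness of P in finite precision; the shorter forms on the slide are algebraically identical.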
120
SOLO Kalman Filter
State Estimation in a Linear System (one cycle)
[Block diagram: Input Data → Sensor Data Processing and Measurement Formation → Observation-to-Track Association → Track Maintenance (Initialization, Confirmation and Deletion) → Filtering and Prediction → Gating Computations]
Samuel S. Blackman, "Multiple-Target Tracking with Radar Applications", Artech House, 1986
Samuel S. Blackman, Robert Popoli, "Design and Analysis of Modern Tracking Systems", Artech House, 1999
Rudolf E. Kalman (1930 – 2016)
121
SOLO Recursive Bayesian Estimation
General Bayesian Nonlinear Filters
• Additive Gaussian Noise:
  - Extended Kalman Filter (EKF)
  - Gauss Hermite Kalman Filter (GHKF)
  - Unscented Kalman Filter (UKF)
  - Monte Carlo Kalman Filter (MCKF)
• Non-Additive Non-Gaussian Noise:
  - Non-Resampling Particle Filters: Gaussian Particle Filter (GPF), Gauss Hermite Particle Filter (GHPF), Unscented Particle Filter (UPF), Monte Carlo Particle Filter (MCPF)
  - Resampling Particle Filters: Sequential Importance Sampling Particle Filter (SIS PF), Bootstrap Particle Filter (BPF)
Table of Content
122
SOLO Extended Kalman Filter
[Block diagram: Input Data → Sensor Data Processing and Measurement Formation → Observation-to-Track Association → Track Maintenance (Initialization, Confirmation and Deletion) → Filtering and Prediction → Gating Computations]
Samuel S. Blackman, "Multiple-Target Tracking with Radar Applications", Artech House, 1986
Samuel S. Blackman, Robert Popoli, "Design and Analysis of Modern Tracking Systems", Artech House, 1999
In the Extended Kalman Filter (EKF), the state transition and observation models need not be linear functions of the state but may instead be (differentiable) functions:
$$x(k+1) = f[x(k),u(k),k] + w(k)\qquad\text{(state vector dynamics)}$$
$$z(k+1) = h[x(k+1),u(k+1),k+1] + \nu(k+1)\qquad\text{(measurements)}$$
with e_x(k) := x(k) - E{x(k)}, E{e_x(k)e_x^T(k)} = P(k); E{e_w(k)e_w^T(l)} = Q(k)δ_{k,l}; E{e_w(k)e_v^T(l)} = 0 ∀ k,l.
The function f can be used to compute the predicted state from the previous estimate, and similarly the function h can be used to compute the predicted measurement from the predicted state. However, f and h cannot be applied to the covariance directly. Instead, a matrix of partial derivatives (the Jacobian) is computed. Taylor's expansion:
$$e_x(k+1) = f[x(k),u(k),k] - f[E\{x(k)\},u(k),k] + w(k) = \underbrace{\left.\frac{\partial f}{\partial x}\right|_{E\{x(k)\}}}_{\text{Jacobian}}e_x(k) + \frac{1}{2}e_x^T(k)\underbrace{\left.\frac{\partial^2 f}{\partial x^2}\right|_{E\{x(k)\}}}_{\text{Hessian}}e_x(k) + \dots + w(k)$$
$$e_z(k+1) = h[x(k+1),u(k+1),k+1] - h[E\{x(k+1)\},u(k+1),k+1] + \nu(k+1) = \left.\frac{\partial h}{\partial x}\right|_{E\{x(k+1)\}}e_x(k+1) + \frac{1}{2}e_x^T(k+1)\left.\frac{\partial^2 h}{\partial x^2}\right|_{E\{x(k+1)\}}e_x(k+1) + \dots + \nu(k+1)$$
123
SOLO Extended Kalman Filter
State Estimation (one cycle)
0 Initialization (k = 0): $\hat x_0 = E\{x_0\}$, $P_0 = E\{(x_0-\hat x_0)(x_0-\hat x_0)^T\}$
k := k+1
1 State vector prediction: $\hat x_{k|k-1} = f(k-1,\hat x_{k-1|k-1},u_{k-1})$
2 Jacobians computation:
$$\Phi_{k-1} = \left.\frac{\partial f}{\partial x}\right|_{\hat x_{k-1|k-1}}\qquad\&\qquad H_k = \left.\frac{\partial h}{\partial x}\right|_{\hat x_{k|k-1}}$$
3 Covariance matrix extrapolation: $P_{k|k-1} = \Phi_{k-1}P_{k-1|k-1}\Phi_{k-1}^T + Q_{k-1}$
4 Innovation covariance: $S_k = H_kP_{k|k-1}H_k^T + R_k$
5 Gain matrix computation: $K_k = P_{k|k-1}H_k^TS_k^{-1}$
6 Measurement & innovation: $i_k = z_k - \hat z_{k|k-1}$, with the predicted measurement $\hat z_{k|k-1} = h(k,\hat x_{k|k-1})$
7 Filtering: $\hat x_{k|k} = \hat x_{k|k-1} + K_ki_k$
8 Covariance matrix updating:
$$P_{k|k} = P_{k|k-1} - P_{k|k-1}H_k^TS_k^{-1}H_kP_{k|k-1} = P_{k|k-1} - K_kS_kK_k^T = (I-K_kH_k)P_{k|k-1} = (I-K_kH_k)P_{k|k-1}(I-K_kH_k)^T + K_kR_kK_k^T$$
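A compact sketch of one EKF cycle follows. Passing f, h and their Jacobians in as callables is an assumption made for illustration; the slides compute the Jacobians analytically at each step:

import numpy as np

def ekf_cycle(x_est, P_est, z, f, F_jac, h, H_jac, Q, R):
    # 1-3: nonlinear state prediction, Jacobian, covariance extrapolation
    x_pred = f(x_est)
    Phi = F_jac(x_est)
    P_pred = Phi @ P_est @ Phi.T + Q
    # 4-6: innovation uses the nonlinear h; covariances use its Jacobian
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    innov = z - h(x_pred)
    # 7-8: update
    x_new = x_pred + K @ innov
    P_new = P_pred - K @ S @ K.T
    return x_new, P_new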
124
SOLO Extended Kalman Filter
State Estimation (one cycle)
[Block diagram: Input Data → Sensor Data Processing and Measurement Formation → Observation-to-Track Association → Track Maintenance (Initialization, Confirmation and Deletion) → Filtering and Prediction → Gating Computations]
Samuel S. Blackman, "Multiple-Target Tracking with Radar Applications", Artech House, 1986
Samuel S. Blackman, Robert Popoli, "Design and Analysis of Modern Tracking Systems", Artech House, 1999
Rudolf E. Kalman (1930 – 2016)
125
SOLO Extended Kalman Filter
Criticism of the Extended Kalman Filter
Unlike its linear counterpart, the Extended Kalman Filter is not an optimal estimator. In addition, if the initial estimate of the state is wrong, or if the process is modeled incorrectly, the filter may quickly diverge, owing to its linearization. Another problem with the Extended Kalman Filter is that the estimated covariance matrix tends to underestimate the true covariance matrix and therefore risks becoming inconsistent in the statistical sense without the addition of "stabilizing noise".
Having stated this, the Extended Kalman Filter can give reasonable performance, and is arguably the de facto standard in navigation systems and GPS.
Table of Content
126
SOLO Recursive Bayesian Estimation
Additive Gaussian Nonlinear Filter
Consider the case of a Markovian process where the noise is additive and Gaussian:
$$x_k = f(x_{k-1}) + w_{k-1},\qquad z_k = h(x_k) + v_k$$
where w_k and v_k are independent white Gaussian noises, with zero mean and covariances Q_k and R_k, respectively:
$$p_w(w_k) = \mathcal N(w_k;0,Q_k) = \frac{1}{(2\pi)^{n/2}|Q_k|^{1/2}}\exp\left(-\tfrac12 w_k^TQ_k^{-1}w_k\right)$$
$$p_v(v_k) = \mathcal N(v_k;0,R_k) = \frac{1}{(2\pi)^{p/2}|R_k|^{1/2}}\exp\left(-\tfrac12 v_k^TR_k^{-1}v_k\right)$$
Therefore, since f(x_{k-1}) is a deterministic function, by adding the Gaussian noise w_{k-1} we obtain x_k, also a Gaussian random variable:
$$p(x_k\mid x_{k-1},Z_{1:k-1}) = \mathcal N(x_k;f(x_{k-1}),Q_{k-1})$$
127
SOLO Recursive Bayesian Estimation
Additive Gaussian Nonlinear Filter (continue – 1)
Using:
$$p(x_k,x_{k-1}\mid Z_{1:k-1}) \overset{\text{Bayes}}{=} p(x_k\mid x_{k-1},Z_{1:k-1})\,p(x_{k-1}\mid Z_{1:k-1}),\qquad p(x_k\mid x_{k-1},Z_{1:k-1}) = \mathcal N(x_k;f(x_{k-1}),Q_{k-1})$$
we obtain:
$$p(x_k\mid Z_{1:k-1}) = \int \mathcal N(x_k;f(x_{k-1}),Q_{k-1})\,p(x_{k-1}\mid Z_{1:k-1})\,dx_{k-1}$$
$$\hat x_{k|k-1} := E\{x_k\mid Z_{1:k-1}\} = \int x_k\,p(x_k\mid Z_{1:k-1})\,dx_k = \int\left[\int x_k\,\mathcal N(x_k;f(x_{k-1}),Q_{k-1})\,dx_k\right]p(x_{k-1}\mid Z_{1:k-1})\,dx_{k-1} = \int f(x_{k-1})\,p(x_{k-1}\mid Z_{1:k-1})\,dx_{k-1}$$
Assume that x_{k-1} is Gaussian with mean x̂_{k-1|k-1} and covariance P_{k-1|k-1}; then
$$p(x_{k-1}\mid Z_{1:k-1}) = \mathcal N(x_{k-1};\hat x_{k-1|k-1},P_{k-1|k-1}),\qquad \hat x_{k|k-1} = \int f(x_{k-1})\,\mathcal N(x_{k-1};\hat x_{k-1|k-1},P^{xx}_{k-1|k-1})\,dx_{k-1}$$
128
SOLO Recursive Bayesian Estimation
Additive Gaussian Nonlinear Filter (continue – 2)
$$P^{xx}_{k|k-1} = E\{(x_k-\hat x_{k|k-1})(x_k-\hat x_{k|k-1})^T\mid Z_{1:k-1}\} = E\{[f(x_{k-1})+w_{k-1}-\hat x_{k|k-1}][f(x_{k-1})+w_{k-1}-\hat x_{k|k-1}]^T\mid Z_{1:k-1}\}$$
$$= \int f(x_{k-1})\,f^T(x_{k-1})\,\mathcal N(x_{k-1};\hat x_{k-1|k-1},P^{xx}_{k-1|k-1})\,dx_{k-1} + Q_{k-1} - \hat x_{k|k-1}\hat x_{k|k-1}^T$$
Let us compute now ẑ_{k|k-1} = E{z_k | Z_{1:k-1}}, using the Gaussian approximation of p(x_k | Z_{1:k-1}):
$$p(x_k\mid Z_{1:k-1}) \approx \mathcal N(x_k;\hat x_{k|k-1},P^{xx}_{k|k-1})$$
Since x_k and v_k are independent:
$$\hat z_{k|k-1} = E\{z_k\mid Z_{1:k-1}\} = \int\left[h(x_k)+\underbrace{E\{v_k\}}_{0}\right]\mathcal N(x_k;\hat x_{k|k-1},P^{xx}_{k|k-1})\,dx_k = \int h(x_k)\,\mathcal N(x_k;\hat x_{k|k-1},P^{xx}_{k|k-1})\,dx_k$$
129
SOLO Recursive Bayesian Estimation
Additive Gaussian Nonlinear Filter (continue – 3)
$$P^{zz}_{k|k-1} = E\{(z_k-\hat z_{k|k-1})(z_k-\hat z_{k|k-1})^T\mid Z_{1:k-1}\} = E\{[h(x_k)+v_k-\hat z_{k|k-1}][h(x_k)+v_k-\hat z_{k|k-1}]^T\mid Z_{1:k-1}\}$$
$$= \int h(x_k)\,h^T(x_k)\,\mathcal N(x_k;\hat x_{k|k-1},P^{xx}_{k|k-1})\,dx_k + R_k - \hat z_{k|k-1}\hat z_{k|k-1}^T$$
In the same way:
$$P^{xz}_{k|k-1} = E\{(x_k-\hat x_{k|k-1})(z_k-\hat z_{k|k-1})^T\mid Z_{1:k-1}\} = \int x_k\,h^T(x_k)\,\mathcal N(x_k;\hat x_{k|k-1},P^{xx}_{k|k-1})\,dx_k - \hat x_{k|k-1}\hat z_{k|k-1}^T$$
130
SOLO Recursive Bayesian Estimation
Additive Gaussian Nonlinear Filter (continue – 4)
Summary
0 Initialization: $\hat x_0 = E\{x_0\}$, $P_{0|0} = E\{(x_0-\hat x_0)(x_0-\hat x_0)^T\}$
For k = 1, …, ∞:
1 State Prediction and its Covariance:
$$\hat x_{k|k-1} = \int f(x_{k-1})\,\mathcal N(x_{k-1};\hat x_{k-1|k-1},P^{xx}_{k-1|k-1})\,dx_{k-1}$$
$$P^{xx}_{k|k-1} = \int f(x_{k-1})\,f^T(x_{k-1})\,\mathcal N(x_{k-1};\hat x_{k-1|k-1},P^{xx}_{k-1|k-1})\,dx_{k-1} + Q_{k-1} - \hat x_{k|k-1}\hat x_{k|k-1}^T$$
2 Measurement Prediction and Covariances:
$$\hat z_{k|k-1} = \int h(x_k)\,\mathcal N(x_k;\hat x_{k|k-1},P^{xx}_{k|k-1})\,dx_k$$
$$P^{zz}_{k|k-1} = \int h(x_k)\,h^T(x_k)\,\mathcal N(x_k;\hat x_{k|k-1},P^{xx}_{k|k-1})\,dx_k + R_k - \hat z_{k|k-1}\hat z_{k|k-1}^T$$
$$P^{xz}_{k|k-1} = \int x_k\,h^T(x_k)\,\mathcal N(x_k;\hat x_{k|k-1},P^{xx}_{k|k-1})\,dx_k - \hat x_{k|k-1}\hat z_{k|k-1}^T$$
[Figure: hidden Markov model with additive noises f(x)+w, h(x)+v]
131
SOLO Recursive Bayesian Estimation
Additive Gaussian Nonlinear Filter (continue – 5)
Summary (continue – 1)
We showed that the Kalman Filter that uses these computations is given by:
3 Kalman Gain Computation: $K_k = P^{xz}_{k|k-1}\left(P^{zz}_{k|k-1}\right)^{-1}$
4 Update State and its Covariance:
$$\hat x_{k|k} = E\{x_k\mid Z_{1:k}\} = \hat x_{k|k-1} + K_k\left(z_k-\hat z_{k|k-1}\right)$$
$$P^{xx}_{k|k} = E\{(x_k-\hat x_{k|k})(x_k-\hat x_{k|k})^T\mid Z_{1:k}\} = P^{xx}_{k|k-1} - P^{xz}_{k|k-1}\left(P^{zz}_{k|k-1}\right)^{-1}P^{zx}_{k|k-1} = P^{xx}_{k|k-1} - K_kP^{zz}_{k|k-1}K_k^T$$
k := k+1 & return to 1
132
SOLO Recursive Bayesian Estimation
Additive Gaussian Nonlinear Filter (continue – 6)
To obtain the Kalman Filter, we must approximate integrals of the type:
$$I = \int g(x)\,\mathcal N(x;\hat x,P^{xx})\,dx$$
Three approximations are presented:
(1) Gauss – Hermite Quadrature Approximation
(2) Unscented Transformation Approximation
(3) Monte Carlo Approximation
Table of Content
133
SOLO Recursive Bayesian Estimation
Additive Gaussian Nonlinear Filter (continue – 7)
Gauss – Hermite Quadrature Approximation
$$I = \int g(x)\,\mathcal N(x;\hat x,P^{xx})\,dx = \frac{1}{(2\pi)^{n/2}|P^{xx}|^{1/2}}\int g(x)\exp\left[-\tfrac12(x-\hat x)^T\left(P^{xx}\right)^{-1}(x-\hat x)\right]dx$$
Let P^{xx} = SᵀS be a Cholesky decomposition, and define the change of variables
$$z := \frac{1}{\sqrt2}\,S^{-T}(x-\hat x),\qquad x = \hat x + \sqrt2\,S^Tz$$
Then
$$I = \frac{1}{\pi^{n/2}}\int g\left(\hat x+\sqrt2\,S^Tz\right)e^{-z^Tz}\,dz$$
This integral can be approximated using the Gauss – Hermite quadrature rule:
$$\int e^{-z^2}f(z)\,dz \approx \sum_{i=1}^{M}w_i\,f(z_i)$$
where the quadrature points z_i and weights w_i are defined as follows.
(Carl Friedrich Gauss 1777 – 1855, Charles Hermite 1822 – 1901, André-Louis Cholesky 1875 – 1918)
134
SOLO Recursive Bayesian Estimation
Additive Gaussian Nonlinear Filter (continue – 8)
Gauss – Hermite Quadrature Approximation (continue – 1)
$$\int e^{-z^2}f(z)\,dz \approx \sum_{i=1}^{M}w_i\,f(z_i)$$
The quadrature points z_i and weights w_i are defined as follows. A set of orthonormal Hermite polynomials is generated from the recurrence relationship:
$$H_{-1}(z) = 0,\qquad H_0(z) = 1/\pi^{1/4},\qquad H_{j+1}(z) = \frac{z}{\beta_{j+1}}\,H_j(z) - \frac{\beta_j}{\beta_{j+1}}\,H_{j-1}(z),\qquad \beta_j := \sqrt{j/2}$$
equivalently
$$z\,H_j(z) = \beta_j\,H_{j-1}(z) + \beta_{j+1}\,H_{j+1}(z),\qquad j=0,1,\dots,M-1$$
or in matrix form, with $h(z) := [H_0(z),H_1(z),\dots,H_{M-1}(z)]^T$ and $e_M := [0,\dots,0,1]^T$:
$$z\,h(z) = J_M\,h(z) + \beta_M\,H_M(z)\,e_M,\qquad J_M = \begin{bmatrix}0 & \beta_1 & 0 & \cdots & 0\\ \beta_1 & 0 & \beta_2 & & \\ 0 & \beta_2 & 0 & \ddots & \\ \vdots & & \ddots & \ddots & \beta_{M-1}\\ 0 & & & \beta_{M-1} & 0\end{bmatrix}$$
135
SOLO Recursive Bayesian Estimation
Additive Gaussian Nonlinear Filter (continue – 9)
Gauss – Hermite Quadrature Approximation (continue – 2)
Orthonormal Hermite polynomials in matrix form: z h(z) = J_M h(z) + β_M H_M(z) e_M, with J_M = J_M^T the symmetric tridiagonal matrix above.
Let us evaluate this equation at the M roots z_i for which H_M(z_i) = 0, i = 1, …, M:
$$z_i\,h(z_i) = J_M\,h(z_i),\qquad i=1,\dots,M$$
From this equation we can see that z_i and h(z_i) = [H_0(z_i), H_1(z_i), …, H_{M-1}(z_i)]^T are the eigenvalues and eigenvectors, respectively, of the symmetric matrix J_M.
Because of the symmetry of J_M the eigenvectors are orthogonal and can be normalized. Define:
$$v^i_j := H_j(z_i)/W_i\qquad\&\qquad W_i^2 := \sum_{j=0}^{M-1}H_j^2(z_i),\qquad i=1,\dots,M$$
We have:
$$\sum_{j=0}^{M-1}v^i_j\,v^l_j = \frac{1}{W_i\,W_l}\sum_{j=0}^{M-1}H_j(z_i)\,H_j(z_l) = \frac{h^T(z_i)\,h(z_l)}{W_i\,W_l} = \delta_{il}$$
Table of Content
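In practice the nodes z_i and weights w_i of the M-point rule are available from standard libraries (equivalently, from the eigen-decomposition of J_M described above). A small sketch of the scalar Gaussian expectation E{g(x)} for x ~ N(x̂, P), using the same e^{-z²} weight function as in the slides:

import numpy as np

def gauss_hermite_expectation(g, x_hat, P, M=10):
    # Nodes/weights for the rule: integral of e^{-z^2} f(z) dz ~ sum w_i f(z_i)
    z, w = np.polynomial.hermite.hermgauss(M)
    x = x_hat + np.sqrt(2.0 * P) * z        # change of variables x = x_hat + sqrt(2) S z
    return np.dot(w, g(x)) / np.sqrt(np.pi)

# E[x^2] for x ~ N(1, 4) should equal P + x_hat^2 = 5
print(gauss_hermite_expectation(lambda x: x**2, 1.0, 4.0))

In the multivariate case the same construction is applied along each of the n transformed coordinates, giving a tensor product of M^n quadrature points.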
136
SOLO Unscented Kalman Filter
When the state transition and observation models (that is, the predict and update functions f and h) are highly non-linear, the Extended Kalman Filter can give particularly poor performance [JU97]. This is because only the mean is propagated through the non-linearity. The Unscented Kalman Filter (UKF) [JU97] uses a deterministic sampling technique, known as the unscented transformation, to pick a minimal set of sample points (called "sigma points") around the mean. These sigma points are then propagated through the non-linear functions, and the covariance of the estimate is then recovered. The result is a filter which more accurately captures the true mean and covariance. (This can be verified using Monte Carlo sampling or through a Taylor series expansion of the posterior statistics.) In addition, this technique removes the requirement to analytically calculate Jacobians, which for complex functions can be a difficult task in itself.
$$x_k = f(k-1,x_{k-1},u_{k-1}) + w_{k-1}\qquad\text{(state vector dynamics)}$$
$$z_k = h(k,x_k) + \nu_k\qquad\text{(measurements)}$$
with e_x(k) := x(k) - E{x(k)}, E{e_xe_x^T} = P_x(k); E{e_w(k)e_w^T(l)} = Q(k)δ_{k,l}; E{e_w(k)e_v^T(l)} = 0 ∀ k,l.
The Unscented Algorithm, using e_x(k) := x(k) - E{x(k)} and E{e_x(k)e_x^T(k)} = P_x(k), determines e_z(k) := z(k) - E{z(k)} and E{e_z(k)e_z^T(k)} = P_z(k).
137
SOLO Unscented Kalman Filter
Propagating Means and Covariances Through Nonlinear Transformations
Consider a nonlinear function y = f(x). Assume x is a random variable with a probability density function p_X(x) (known or unknown) with mean and covariance
$$\hat x = E\{x\},\qquad P^{xx} = E\{(x-\hat x)(x-\hat x)^T\}$$
Develop the nonlinear function f in a Taylor series around x̂:
$$f(\hat x+\delta x) = \sum_{n=0}^{\infty}\frac{1}{n!}\left[(\delta x\cdot\nabla)^nf\right]_{\hat x},\qquad (\delta x\cdot\nabla)^nf := \left[\sum_{j=1}^{n_x}\delta x_j\frac{\partial}{\partial x_j}\right]^nf$$
Define also the operator $D^n_{\delta x}f := (\delta x\cdot\nabla)^nf$. Let us compute:
$$\hat y = E\{f(\hat x+\delta x)\} = \sum_{n=0}^{\infty}\frac{1}{n!}E\left\{(\delta x\cdot\nabla)^nf\right\}_{\hat x} = \sum_{n=0}^{\infty}\frac{1}{n!}E\{D^n_{\delta x}f\}_{\hat x}$$
where
$$E\{\delta x\} = E\{x-\hat x\} = 0,\qquad E\{\delta x\,\delta x^T\} = E\{(x-\hat x)(x-\hat x)^T\} = P^{xx}$$
138
SOLO Unscented Kalman Filter
Propagating Means and Covariances Through Nonlinear Transformations (continue – 1)
$$\hat y = E\{f(\hat x+\delta x)\} = f(\hat x) + E\{(\delta x\cdot\nabla)f\}_{\hat x} + \frac{1}{2!}E\{(\delta x\cdot\nabla)^2f\}_{\hat x} + \frac{1}{3!}E\{(\delta x\cdot\nabla)^3f\}_{\hat x} + \frac{1}{4!}E\{(\delta x\cdot\nabla)^4f\}_{\hat x} + \dots$$
Since all the differentials of f are computed around the (non-random) mean x̂:
$$E\{(\delta x\cdot\nabla)f\}_{\hat x} = \left[\underbrace{E\{\delta x\}}_{0}\cdot\nabla\right]f_{\hat x} = 0$$
$$E\{(\delta x\cdot\nabla)^2f\}_{\hat x} = E\{\nabla^T\delta x\,\delta x^T\nabla\}f_{\hat x} = \left[\nabla^T\,E\{\delta x\,\delta x^T\}\,\nabla\right]f_{\hat x} = \left[\nabla^TP^{xx}\nabla\right]f_{\hat x}$$
Therefore:
$$\hat y = E\{f(\hat x+\delta x)\} = f(\hat x) + \frac{1}{2}\left[\nabla^TP^{xx}\nabla\right]f_{\hat x} + \frac{1}{3!}E\{D^3_{\delta x}f\}_{\hat x} + \frac{1}{4!}E\{D^4_{\delta x}f\}_{\hat x} + \dots$$
139
SOLO Unscented Kalman Filter
Propagating Means and Covariances Through Nonlinear Transformations (continue – 2)
The Unscented Transformation (UT), proposed by Simon J. Julier and Jeffrey K. Uhlmann, uses a set of "sigma points" to provide an approximation of the probabilistic properties through the nonlinear function.
A set of "sigma points" S consists of p+1 vectors and their associated weights, S = {(x^{(i)}, W^{(i)}), i = 0, 1, …, p}.
(1) Compute the transformation of the sigma points through the nonlinear transformation f:
$$y^{(i)} = f\left(x^{(i)}\right),\qquad i=0,1,\dots,p$$
(2) Compute the approximation of the mean:
$$\hat y \approx \sum_{i=0}^{p}W^{(i)}y^{(i)}$$
The estimation is unbiased if:
$$E\left\{\sum_{i=0}^{p}W^{(i)}y^{(i)}\right\} = \sum_{i=0}^{p}W^{(i)}\underbrace{E\{y^{(i)}\}}_{\hat y} = \hat y\sum_{i=0}^{p}W^{(i)} = \hat y \;\Leftrightarrow\; \sum_{i=0}^{p}W^{(i)} = 1$$
(3) The approximation of the output covariance is given by:
$$P^{yy} \approx \sum_{i=0}^{p}W^{(i)}\left(y^{(i)}-\hat y\right)\left(y^{(i)}-\hat y\right)^T$$
140
SOLO Unscented Kalman Filter
Propagating Means and Covariances Through Nonlinear Transformations (continue – 3)
Unscented Transformation (UT) (continue – 1)
One set of points that satisfies the above conditions consists of a symmetric set of p = 2n_x points that lie on the covariance contour P^{xx}:
$$x^{(0)} = \hat x,\qquad W^{(0)} = W_0$$
$$x^{(i)} = \hat x + \left(\sqrt{\frac{n_x}{1-W_0}\,P^{xx}}\right)_i,\qquad W^{(i)} = \frac{1-W_0}{2n_x},\qquad i=1,\dots,n_x$$
$$x^{(i+n_x)} = \hat x - \left(\sqrt{\frac{n_x}{1-W_0}\,P^{xx}}\right)_i,\qquad W^{(i+n_x)} = \frac{1-W_0}{2n_x},\qquad i=1,\dots,n_x$$
where $\left(\sqrt{n_xP^{xx}/(1-W_0)}\right)_i$ is the i-th row or column of the matrix square root of n_xP^{xx}/(1-W_0) (the original covariance matrix P^{xx} multiplied by the number of dimensions of x, n_x/(1-W_0)). This implies:
$$\sum_{i=1}^{n_x}\left(\sqrt{\frac{n_x}{1-W_0}\,P^{xx}}\right)_i\left(\sqrt{\frac{n_x}{1-W_0}\,P^{xx}}\right)_i^T = \frac{n_x}{1-W_0}\,P^{xx}$$
141
SOLO Unscented Kalman Filter
Propagating Means and Covariances Through Nonlinear Transformations (continue – 4)
Unscented Transformation (UT) (continue – 2)
Unscented Algorithm: with $\delta x_i := x^{(i)}-\hat x = \pm\left(\sqrt{n_xP^{xx}/(1-W_0)}\right)_i$,
$$y^{(i)} = f\left(x^{(i)}\right) = \begin{cases}f(\hat x) & i=0\\[2pt] \displaystyle\sum_{n=0}^{\infty}\frac{1}{n!}D^n_{\delta x_i}f(\hat x) & i=1,\dots,n_x\\[2pt] \displaystyle\sum_{n=0}^{\infty}\frac{1}{n!}D^n_{-\delta x_i}f(\hat x) & i=n_x+1,\dots,2n_x\end{cases}$$
$$\hat y_{UT} = \sum_{i=0}^{2n_x}W^{(i)}y^{(i)} = W_0\,f(\hat x) + \frac{1-W_0}{2n_x}\sum_{i=1}^{n_x}\left[\sum_{n=0}^{\infty}\frac{1}{n!}D^n_{\delta x_i}f + \sum_{n=0}^{\infty}\frac{1}{n!}D^n_{-\delta x_i}f\right]_{\hat x}$$
Since $D^n_{-\delta x_i}f = (-1)^nD^n_{\delta x_i}f$ (even n terms add, odd n terms cancel):
$$\hat y_{UT} = f(\hat x) + \frac{1-W_0}{2n_x}\sum_{i=1}^{n_x}\left[D^2_{\delta x_i}f + \frac{2}{4!}D^4_{\delta x_i}f + \frac{2}{6!}D^6_{\delta x_i}f + \dots\right]_{\hat x}$$
142
SOLO Unscented Kalman Filter
Propagating Means and Covariances Through Nonlinear Transformations (continue – 5)
Unscented Transformation (UT) (continue – 3)
Since δx_i = ±(√(n_xP^{xx}/(1-W_0)))_i and the sigma-point spreads reconstruct P^{xx}:
$$\frac{1-W_0}{2n_x}\sum_{i=1}^{n_x}D^2_{\delta x_i}f = \frac{1-W_0}{2n_x}\,\frac{n_x}{1-W_0}\left[\nabla^TP^{xx}\nabla\right]f = \frac{1}{2}\left[\nabla^TP^{xx}\nabla\right]f$$
Finally:
$$\hat y_{UT} = f(\hat x) + \frac{1}{2}\left[\nabla^TP^{xx}\nabla\right]f_{\hat x} + \frac{1-W_0}{2n_x}\sum_{i=1}^{n_x}\left[\frac{2}{4!}D^4_{\delta x_i}f + \frac{2}{6!}D^6_{\delta x_i}f + \dots\right]_{\hat x}$$
We found for the true mean:
$$\hat y = f(\hat x) + \frac{1}{2}\left[\nabla^TP^{xx}\nabla\right]f_{\hat x} + \frac{1}{3!}E\{D^3_{\delta x}f\}_{\hat x} + \frac{1}{4!}E\{D^4_{\delta x}f\}_{\hat x} + \dots$$
We can see that the two expressions agree exactly to the third order.
143
SOLO Unscented Kalman Filter
Propagating Means and Covariances Through Nonlinear Transformations (continue – 6)
Unscented Transformation (UT) (continue – 4)
Accuracy of the Covariance: expanding the true covariance,
$$P^{yy} = E\{(y-\hat y)(y-\hat y)^T\} = E\left\{\left[\sum_{n=1}^{\infty}\frac{1}{n!}D^n_{\delta x}f - (\hat y - f(\hat x))\right]\left[\sum_{m=1}^{\infty}\frac{1}{m!}D^m_{\delta x}f - (\hat y - f(\hat x))\right]^T\right\}$$
Substituting the series for ŷ and collecting orders, the leading terms are
$$P^{yy} = A\,P^{xx}A^T - \frac{1}{4}\left[\nabla^TP^{xx}\nabla f\right]\left[\nabla^TP^{xx}\nabla f\right]^T + E\left\{\sum_{\substack{i,j\ge1\\ i+j>2}}\frac{1}{i!\,j!}D^i_{\delta x}f\left(D^j_{\delta x}f\right)^T\right\} - \sum_{\substack{i,j\ge1\\ i+j>2}}\frac{1}{(2i)!\,(2j)!}E\{D^{2i}_{\delta x}f\}\,E\{D^{2j}_{\delta x}f\}^T$$
where A denotes the Jacobian of f at x̂, so the first term is the familiar linearized covariance A P^{xx} Aᵀ, and the remaining terms are of fourth and higher order in δx.
144
SOLO Unscented Kalman Filter
Propagating Means and Covariances Through Nonlinear Transformations (continue – 7)
Unscented Transformation (UT) (continue – 5)
Accuracy of the Covariance (continue): the same expansion applied to the sigma-point covariance gives
$$P^{yy}_{UT} = A\,P^{xx}A^T - \frac{1}{4}\left[\nabla^TP^{xx}\nabla f\right]\left[\nabla^TP^{xx}\nabla f\right]^T + \text{(fourth- and higher-order sigma-point terms)}$$
so the true covariance and the UT covariance also agree exactly up to the third order. They differ only in the fourth- and higher-order terms, whose weighting in the unscented transformation can be influenced through the free parameter W_0 (or, in the scaled form of the transformation, through λ).
146
SOLO Unscented Kalman Filter
[Figure: illustration of the unscented transformation. The sigma points $\chi_i = \{\bar x,\ \bar x\pm\alpha(\sqrt{P_x})_i\}$, with weights β_i, are propagated through the nonlinearity f to $\psi_i = f(\chi_i)$; the weighted sample mean and the weighted sample covariance of the transformed points give]
$$\bar z = \sum_{i=0}^{2N}\beta_i\,\psi_i,\qquad P_z = \sum_{i=0}^{2N}\beta_i\left(\psi_i-\bar z\right)\left(\psi_i-\bar z\right)^T$$
Table of Content
147
SOLO Unscented Kalman Filter
UKF Summary
0 Initialization of the UKF:
$$\hat x_0 = E\{x_0\},\qquad P_{0|0} = E\{(x_0-\hat x_0)(x_0-\hat x_0)^T\}$$
Augment the state with the process and measurement noises, x^a := [x^T w^T v^T]^T:
$$\hat x^a_0 = E\{x^a\} = \begin{bmatrix}\hat x_0^T & 0 & 0\end{bmatrix}^T,\qquad P^a_{0|0} = E\{(x^a_0-\hat x^a_0)(x^a_0-\hat x^a_0)^T\} = \begin{bmatrix}P_{0|0} & 0 & 0\\ 0 & Q & 0\\ 0 & 0 & R\end{bmatrix}$$
System definition:
$$x_k = f(k-1,x_{k-1},u_{k-1}) + w_{k-1},\qquad E\{w_k\}=0,\ E\{w_kw_l^T\}=Q_k\delta_{k,l}$$
$$z_k = h(k,x_k) + v_k,\qquad E\{v_k\}=0,\ E\{v_kv_l^T\}=R_k\delta_{k,l}$$
For k = 1, …, ∞:
1 Calculate the Sigma Points (γ := √(L+λ)):
$$x^{(0)}_{k-1|k-1} = \hat x_{k-1|k-1}$$
$$x^{(i)}_{k-1|k-1} = \hat x_{k-1|k-1} + \gamma\left(\sqrt{P_{k-1|k-1}}\right)_i,\qquad x^{(i+L)}_{k-1|k-1} = \hat x_{k-1|k-1} - \gamma\left(\sqrt{P_{k-1|k-1}}\right)_i,\qquad i=1,\dots,L$$
2 State Prediction and its Covariance:
$$x^{(i)}_{k|k-1} = f\left(k-1,x^{(i)}_{k-1|k-1},u_{k-1}\right),\qquad i=0,1,\dots,2L$$
$$\hat x_{k|k-1} = \sum_{i=0}^{2L}W^{(m)}_ix^{(i)}_{k|k-1},\qquad W^{(m)}_0 = \frac{\lambda}{L+\lambda},\quad W^{(m)}_i = \frac{1}{2(L+\lambda)}\ (i=1,\dots,2L)$$
$$P_{k|k-1} = \sum_{i=0}^{2L}W^{(c)}_i\left(x^{(i)}_{k|k-1}-\hat x_{k|k-1}\right)\left(x^{(i)}_{k|k-1}-\hat x_{k|k-1}\right)^T,\qquad W^{(c)}_0 = \frac{\lambda}{L+\lambda}+(1-\alpha^2+\beta),\quad W^{(c)}_i = \frac{1}{2(L+\lambda)}$$
148
SOLO Unscented Kalman Filter
UKF Summary (continue – 1)
3 Measurement Prediction:
$$z^{(i)}_{k|k-1} = h\left(k,x^{(i)}_{k|k-1}\right),\qquad i=0,1,\dots,2L,\qquad \hat z_{k|k-1} = \sum_{i=0}^{2L}W^{(m)}_iz^{(i)}_{k|k-1}$$
4 Innovation and its Covariance:
$$i_k = z_k - \hat z_{k|k-1}$$
$$S_k = P^{zz}_{k|k-1} = \sum_{i=0}^{2L}W^{(c)}_i\left(z^{(i)}_{k|k-1}-\hat z_{k|k-1}\right)\left(z^{(i)}_{k|k-1}-\hat z_{k|k-1}\right)^T$$
5 Kalman Gain Computation:
$$P^{xz}_{k|k-1} = \sum_{i=0}^{2L}W^{(c)}_i\left(x^{(i)}_{k|k-1}-\hat x_{k|k-1}\right)\left(z^{(i)}_{k|k-1}-\hat z_{k|k-1}\right)^T,\qquad K_k = P^{xz}_{k|k-1}\left(P^{zz}_{k|k-1}\right)^{-1}$$
6 Update State and its Covariance:
$$\hat x_{k|k} = \hat x_{k|k-1} + K_ki_k,\qquad P_{k|k} = P_{k|k-1} - K_kS_kK_k^T$$
k := k+1 & return to 1
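The core of steps 1-5 is the unscented transform of a mean/covariance pair through a nonlinearity. A sketch using the same λ, α, β weighting as above (illustrative only; the augmented-state bookkeeping of step 0 is omitted):

import numpy as np

def sigma_points(x, P, alpha=1e-3, beta=2.0, kappa=0.0):
    L = x.size
    lam = alpha**2 * (L + kappa) - L
    S = np.linalg.cholesky((L + lam) * P)          # columns are gamma * sqrt(P)_i
    pts = np.column_stack([x, x[:, None] + S, x[:, None] - S])   # 2L+1 points
    Wm = np.full(2 * L + 1, 0.5 / (L + lam))
    Wc = Wm.copy()
    Wm[0] = lam / (L + lam)
    Wc[0] = lam / (L + lam) + (1.0 - alpha**2 + beta)
    return pts, Wm, Wc

def unscented_transform(fun, pts, Wm, Wc, noise_cov):
    Y = np.column_stack([fun(pts[:, i]) for i in range(pts.shape[1])])
    y = Y @ Wm                                     # weighted sample mean
    D = Y - y[:, None]
    Pyy = D @ np.diag(Wc) @ D.T + noise_cov        # weighted sample covariance
    return y, Pyy

One full UKF cycle then calls unscented_transform once with f and Q (prediction) and once with h and R (measurement prediction), and reuses the deviations D to form the cross-covariance and the gain.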
149
SOLO Unscented Kalman Filter
State Estimation (one cycle)
[Block diagram: Input Data → Sensor Data Processing and Measurement Formation → Observation-to-Track Association → Track Maintenance (Initialization, Confirmation and Deletion) → Filtering and Prediction → Gating Computations]
Samuel S. Blackman, "Multiple-Target Tracking with Radar Applications", Artech House, 1986
Samuel S. Blackman, Robert Popoli, "Design and Analysis of Modern Tracking Systems", Artech House, 1999
Simon J. Julier, Jeffrey K. Uhlmann
151
SOLO Numerical Integration Using a Monte Carlo Approximation
Monte Carlo Kalman Filter (MCKF)
A Monte Carlo approximation of the Expected Value integrals uses a discrete approximation to the Gaussian PDF N(x; x̂, P^{xx}).
Draw N_s samples from N(x; x̂, P^{xx}), where x^i, i = 1, 2, …, N_s, are a set of support points (random samples, or particles) with weights w^i = 1/N_s. Then N(x; x̂, P^{xx}) can be approximated by:
$$p(x) = \mathcal N(x;\hat x,P^{xx}) \approx \sum_{i=1}^{N_s}w^i\,\delta(x-x^i) = \frac{1}{N_s}\sum_{i=1}^{N_s}\delta(x-x^i)$$
We can see that for any x we have
$$\sum_{x^i\le x}w^i \approx \int_{-\infty}^{x}\mathcal N(\tau;\hat x,P^{xx})\,d\tau$$
The weight w^i is not the probability of the point x^i. The probability density near x^i is given by the density of the points in the region around x^i, which can be obtained by a normalized histogram of all x^i.
152
SOLO Numerical Integration Using a Monte Carlo Approximation
The Expected Value of any function g(x) can be estimated from:
$$E_{p(x)}\{g(x)\} = \int g(x)\,p(x)\,dx \approx \int g(x)\sum_{i=1}^{N_s}w^i\,\delta(x-x^i)\,dx = \sum_{i=1}^{N_s}w^i\,g(x^i) = \frac{1}{N_s}\sum_{i=1}^{N_s}g(x^i)$$
which is the sample mean.
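A two-line numerical check of this sample-mean approximation, with illustrative values:

import numpy as np

rng = np.random.default_rng(0)
x = rng.multivariate_normal(mean=[1.0, 0.0], cov=[[2.0, 0.3], [0.3, 1.0]], size=100_000)
# E[g(x)] ~ (1/Ns) sum g(x_i); here g(x) = ||x||^2, exact value = tr(P) + ||x_hat||^2 = 4
print(np.mean(np.sum(x**2, axis=1)))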
Monte Carlo Kalman Filter (MCKF) (continue – 1)
Given the system
$$x_k = f(k-1,x_{k-1},u_{k-1}) + w_{k-1},\qquad z_k = h(k,x_k) + v_k,\qquad E\{w_kw_l^T\}=Q_k\delta_{k,l},\ E\{v_kv_l^T\}=R_k\delta_{k,l}$$
and assuming that we computed the mean and covariance x̂_{k-1|k-1}, P_{k-1|k-1} at stage k-1, let us use the Monte Carlo approximation to compute the predicted mean and covariance x̂_{k|k-1}, P_{k|k-1} at stage k.
Draw N_s samples ("~" means generate (draw) samples from a predefined distribution):
$$x^i_{k-1|k-1} \sim p(x_{k-1}\mid Z_{1:k-1}) = \mathcal N(x_{k-1};\hat x_{k-1|k-1},P_{k-1|k-1}),\qquad i=1,\dots,N_s$$
$$\hat x_{k|k-1} = E_{p(x_{k-1}|Z_{1:k-1})}\{x_k\} \approx \frac{1}{N_s}\sum_{i=1}^{N_s}f\left(k-1,x^i_{k-1|k-1},u_{k-1}\right)$$
$$P^{xx}_{k|k-1} = E\{(x_k-\hat x_{k|k-1})(x_k-\hat x_{k|k-1})^T\mid Z_{1:k-1}\} = E\{x_kx_k^T\mid Z_{1:k-1}\} - \hat x_{k|k-1}\hat x_{k|k-1}^T$$
153
SOLO Numerical Integration Using a Monte Carlo Approximation
Monte Carlo Kalman Filter (MCKF) (continue – 2)
Using the Monte Carlo approximation we obtain:
$$P^{xx}_{k|k-1} \approx Q_{k-1} + \frac{1}{N_s}\sum_{i=1}^{N_s}f\left(k-1,x^i_{k-1|k-1},u_{k-1}\right)f^T\left(k-1,x^i_{k-1|k-1},u_{k-1}\right) - \left[\frac{1}{N_s}\sum_{i=1}^{N_s}f\left(k-1,x^i_{k-1|k-1},u_{k-1}\right)\right]\left[\frac{1}{N_s}\sum_{i=1}^{N_s}f\left(k-1,x^i_{k-1|k-1},u_{k-1}\right)\right]^T$$
Now we approximate the predictive PDF, p(x_k | Z_{1:k-1}), as N(x_k; x̂_{k|k-1}, P_{k|k-1}) and draw new N_s (not necessarily the same as before) samples:
$$x^i_{k|k-1} \sim p(x_k\mid Z_{1:k-1}) = \mathcal N(x_k;\hat x_{k|k-1},P_{k|k-1}),\qquad i=1,\dots,N_s$$
$$\hat z_{k|k-1} = E_{p(x_k|Z_{1:k-1})}\{z_k\} \approx \frac{1}{N_s}\sum_{i=1}^{N_s}h\left(k,x^i_{k|k-1}\right)$$
$$P^{zz}_{k|k-1} \approx R_k + \frac{1}{N_s}\sum_{i=1}^{N_s}h\left(k,x^i_{k|k-1}\right)h^T\left(k,x^i_{k|k-1}\right) - \left[\frac{1}{N_s}\sum_{i=1}^{N_s}h\left(k,x^i_{k|k-1}\right)\right]\left[\frac{1}{N_s}\sum_{i=1}^{N_s}h\left(k,x^i_{k|k-1}\right)\right]^T$$
154
SOLO Numerical Integration Using a Monte Carlo Approximation
Monte Carlo Kalman Filter (MCKF) (continue – 3)
In the same way we obtain:
$$P^{xz}_{k|k-1} \approx \frac{1}{N_s}\sum_{i=1}^{N_s}x^i_{k|k-1}\,h^T\left(k,x^i_{k|k-1}\right) - \left[\frac{1}{N_s}\sum_{i=1}^{N_s}x^i_{k|k-1}\right]\left[\frac{1}{N_s}\sum_{i=1}^{N_s}h\left(k,x^i_{k|k-1}\right)\right]^T$$
The Kalman Filter equations are:
$$K_k = P^{xz}_{k|k-1}\left(P^{zz}_{k|k-1}\right)^{-1},\qquad \hat x_{k|k} = \hat x_{k|k-1} + K_k\left(z_k-\hat z_{k|k-1}\right),\qquad P^{xx}_{k|k} = P^{xx}_{k|k-1} - K_kP^{zz}_{k|k-1}K_k^T$$
155
SOLO Monte Carlo Kalman Filter (MCKF)
MCKF Summary
0 Initialization of the MCKF: $\hat x_0 = E\{x_0\}$, $P_{0|0} = E\{(x_0-\hat x_0)(x_0-\hat x_0)^T\}$. Augment the state space to include the processing and measurement noises, x^a := [x^T w^T v^T]^T:
$$\hat x^a_0 = \begin{bmatrix}\hat x_0^T & 0 & 0\end{bmatrix}^T,\qquad P^a_{0|0} = \begin{bmatrix}P_{0|0} & 0 & 0\\ 0 & Q & 0\\ 0 & 0 & R\end{bmatrix}$$
System definition:
$$x_k = f(k-1,x_{k-1},u_{k-1}) + w_{k-1},\ p_w=\mathcal N(w;0,Q);\qquad z_k = h(k,x_k) + v_k,\ p_v=\mathcal N(v;0,R);\qquad p(x_0)=\mathcal N(x_0;\hat x_0,P_{0|0})$$
For k = 1, …, ∞:
1 Assuming for k-1 a Gaussian distribution with mean x̂^a_{k-1|k-1} and covariance P^a_{k-1|k-1}, generate (draw) N_s samples:
$$x^{a,i}_{k-1|k-1} \sim \mathcal N\left(x^a_{k-1};\hat x^a_{k-1|k-1},P^a_{k-1|k-1}\right),\qquad i=1,\dots,N_s$$
2 State Prediction and its Covariance:
$$x^{a,i}_{k|k-1} = f\left(k-1,x^{a,i}_{k-1|k-1},u_{k-1}\right),\qquad \hat x^a_{k|k-1} = \frac{1}{N_s}\sum_{i=1}^{N_s}x^{a,i}_{k|k-1},\qquad P^a_{k|k-1} = \frac{1}{N_s}\sum_{i=1}^{N_s}x^{a,i}_{k|k-1}\left(x^{a,i}_{k|k-1}\right)^T - \hat x^a_{k|k-1}\left(\hat x^a_{k|k-1}\right)^T$$
3 Assuming a Gaussian distribution with mean x̂_{k|k-1} and covariance P_{k|k-1}, generate (draw) new N_s samples:
$$x^{a,j}_{k|k-1} \sim \mathcal N\left(x^a_k;\hat x^a_{k|k-1},P^a_{k|k-1}\right),\qquad j=1,\dots,N_s$$
156
SOLO Monte Carlo Kalman Filter (MCKF)
MCKF Summary (continue – 1)
4 Measurement Prediction:
$$z^j_{k|k-1} = h\left(k,x^{a,j}_{k|k-1}\right),\qquad j=1,\dots,N_s,\qquad \hat z_{k|k-1} = \frac{1}{N_s}\sum_{j=1}^{N_s}z^j_{k|k-1}$$
5 Predicted Covariances Computations:
$$S_k = P^{zz}_{k|k-1} = \frac{1}{N_s}\sum_{j=1}^{N_s}\left(z^j_{k|k-1}-\hat z_{k|k-1}\right)\left(z^j_{k|k-1}-\hat z_{k|k-1}\right)^T$$
$$P^{xz}_{k|k-1} = \frac{1}{N_s}\sum_{j=1}^{N_s}\left(x^{a,j}_{k|k-1}-\hat x^a_{k|k-1}\right)\left(z^j_{k|k-1}-\hat z_{k|k-1}\right)^T$$
6 Kalman Gain Computation: $K^a_k = P^{xz}_{k|k-1}\left(P^{zz}_{k|k-1}\right)^{-1}$
7 Measurement & Innovation Computation: $i_k = z_k - \hat z_{k|k-1}$
8 Kalman Filter update:
$$\hat x^a_{k|k} = \hat x^a_{k|k-1} + K^a_ki_k,\qquad P^a_{k|k} = P^a_{k|k-1} - K^a_kS_k\left(K^a_k\right)^T$$
k := k+1 & return to 1
157
SOLO Monte Carlo Kalman Filter (MCKF)
[Block diagram: Input Data → Sensor Data Processing and Measurement Formation → Observation-to-Track Association → Track Maintenance (Initialization, Confirmation and Deletion) → Filtering and Prediction → Gating Computations]
Samuel S. Blackman, "Multiple-Target Tracking with Radar Applications", Artech House, 1986
Samuel S. Blackman, Robert Popoli, "Design and Analysis of Modern Tracking Systems", Artech House, 1999
Table of Content
158
SOLO Nonlinear Estimation Using Particle Filters
Non-Additive Non-Gaussian Nonlinear Filter
We assumed that p(x_k | Z_{1:k}) is a Gaussian PDF. If the true PDF is not Gaussian (multivariate, heavily skewed, or non-standard, i.e. not represented by any standard PDF), the Gaussian distribution can never describe it well.
$$x_k = f(x_{k-1},w_{k-1}),\qquad z_k = h(x_k,v_k)$$
w_{k-1} and v_k are system and measurement white-noise sequences, independent of past and current states and of each other, and having known PDFs p(w_{k-1}) and p(v_k).
We want to compute p(x_k | Z_{1:k}) recursively, assuming knowledge of p(x_{k-1} | Z_{1:k-1}), in two stages: prediction (before) and update (after measurement).
Prediction (before measurement): use the Chapman – Kolmogorov equation to obtain:
$$p(x_k\mid Z_{1:k-1}) = \int p(x_k\mid x_{k-1})\,p(x_{k-1}\mid Z_{1:k-1})\,dx_{k-1}$$
where:
$$p(x_k\mid x_{k-1}) = \int p(x_k\mid x_{k-1},w_{k-1})\,p(w_{k-1}\mid x_{k-1})\,dw_{k-1}$$
By assumption p(w_{k-1} | x_{k-1}) = p(w_{k-1}). Since, by knowing x_{k-1} and w_{k-1}, x_k is deterministically given by the system equation, we have
$$p(x_k\mid x_{k-1},w_{k-1}) = \delta\left(x_k-f(x_{k-1},w_{k-1})\right) = \begin{cases}1 & x_k=f(x_{k-1},w_{k-1})\\ 0 & x_k\ne f(x_{k-1},w_{k-1})\end{cases}$$
Therefore:
$$p(x_k\mid x_{k-1}) = \int\delta\left(x_k-f(x_{k-1},w_{k-1})\right)p(w_{k-1})\,dw_{k-1}$$
159
SOLO Nonlinear Estimation Using Particle Filters
Non-Additive Non-Gaussian Nonlinear Filter: x_k = f(x_{k-1}, w_{k-1}), z_k = h(x_k, v_k)
1 Prediction (before measurement):
$$p(x_k\mid Z_{1:k-1}) = \int p(x_k\mid x_{k-1})\,p(x_{k-1}\mid Z_{1:k-1})\,dx_{k-1},\qquad p(x_k\mid x_{k-1}) = \int\delta\left(x_k-f(x_{k-1},w_{k-1})\right)p(w_{k-1})\,dw_{k-1}$$
2 Update (after measurement):
$$p(x_k\mid Z_{1:k}) = p(x_k\mid z_k,Z_{1:k-1}) \overset{\text{Bayes}}{=} \frac{p(z_k\mid x_k)\,p(x_k\mid Z_{1:k-1})}{p(z_k\mid Z_{1:k-1})} = \frac{p(z_k\mid x_k)\,p(x_k\mid Z_{1:k-1})}{\int p(z_k\mid x_k)\,p(x_k\mid Z_{1:k-1})\,dx_k}$$
where
$$p(z_k\mid x_k) = \int p(z_k\mid x_k,v_k)\,p(v_k\mid x_k)\,dv_k$$
By assumption p(v_k | x_k) = p(v_k). Since, by knowing x_k and v_k, z_k is deterministically given by the measurement equation:
$$p(z_k\mid x_k,v_k) = \delta\left(z_k-h(x_k,v_k)\right) = \begin{cases}1 & z_k=h(x_k,v_k)\\ 0 & z_k\ne h(x_k,v_k)\end{cases}$$
Therefore:
$$p(z_k\mid x_k) = \int\delta\left(z_k-h(x_k,v_k)\right)p(v_k)\,dv_k$$
160
SOLO Nonlinear Estimation Using Particle Filters
Non-Additive Non-Gaussian Nonlinear Filter: x_k = f(x_{k-1}, w_{k-1}), z_k = h(x_k, v_k)
We need to evaluate the integrals of the 1 Prediction and 2 Update stages. Analytic solutions for those integral equations do not exist in the general case, so we use the numeric Monte Carlo method to evaluate them.
Generate (draw):
$$w^i_{k-1} \sim p(w_{k-1})\quad\&\quad v^i_k \sim p(v_k),\qquad i=1,\dots,N_S$$
Then
$$x^i_k = f(x_{k-1},w^i_{k-1}) \;\rightarrow\; p(x_k\mid x_{k-1}) \approx \frac{1}{N_S}\sum_{i=1}^{N_S}\delta\left(x_k-f(x_{k-1},w^i_{k-1})\right) = \frac{1}{N_S}\sum_{i=1}^{N_S}\delta\left(x_k-x^i_k\right)$$
$$z^i_k = h(x_k,v^i_k) \;\rightarrow\; p(z_k\mid x_k) \approx \frac{1}{N_S}\sum_{i=1}^{N_S}\delta\left(z_k-h(x_k,v^i_k)\right) = \frac{1}{N_S}\sum_{i=1}^{N_S}\delta\left(z_k-z^i_k\right)$$
161
SOLO Nonlinear Estimation Using Particle Filters
Non-Additive Non-Gaussian Nonlinear Filter
Monte Carlo computations of p(x_k | x_{k-1}) and p(z_k | x_k) for the system
$$x_k = f(k-1,x_{k-1},u_{k-1},w_{k-1}),\ w_{k-1}\sim p_w;\qquad z_k = h(k,x_k,v_k),\ v_k\sim p_v;\qquad x_0\sim p_{x_0}$$
0 Initialization: generate (draw) $x^i_0 \sim p_{x_0}(x_0),\ i=1,\dots,N_S$
For k = 1, …, ∞:
1 At stage k-1, generate (draw) N_S samples $w^i_{k-1}\sim p_w(w_{k-1})$
2 State Update: $x^i_k = f(k-1,x^i_{k-1},u_{k-1},w^i_{k-1}),\ i=1,\dots,N_S$, so that $p(x_k\mid x_{k-1}) \approx \frac{1}{N_S}\sum_i\delta(x_k-x^i_k)$
3 Generate (draw) measurement noise $v^i_k\sim p_v(v_k)$
4 Measurement Update: $z^i_k = h(k,x^i_k,v^i_k),\ i=1,\dots,N_S$, so that $p(z_k\mid x_k) \approx \frac{1}{N_S}\sum_i\delta(z_k-z^i_k)$
k := k+1 & return to 1
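Steps 0-4 above map directly onto a few NumPy lines. The scalar f, h and the noise distributions below are illustrative stand-ins for the system definition:

import numpy as np

rng = np.random.default_rng(1)
Ns = 1000
f = lambda x, u, w: 0.9 * x + u + w          # assumed f(x, u, w)
h = lambda x, v: x**2 / 20.0 + v             # assumed h(x, v)

x = rng.normal(0.0, 2.0, Ns)                 # 0: draw x_0^i ~ p(x_0)
for u in [0.5, -0.2, 0.1]:                   # stages k = 1, 2, 3
    w = rng.normal(0.0, 1.0, Ns)             # 1: draw w_{k-1}^i ~ p(w)
    x = f(x, u, w)                           # 2: state update x_k^i
    v = rng.normal(0.0, 0.5, Ns)             # 3: draw v_k^i ~ p(v)
    z = h(x, v)                              # 4: z_k^i; the clouds {x_k^i}, {z_k^i}
                                             #    are the particle approximations
print("predicted state mean:", x.mean(), " predicted measurement mean:", z.mean())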
162
SOLO Nonlinear Estimation Using Particle Filters
Non-Additive Non-Gaussian Nonlinear Filter: x_k = f(x_{k-1}, w_{k-1}), z_k = h(x_k, v_k)
1 Prediction (before measurement): substituting the Monte Carlo approximation of p(x_k | x_{k-1}) in the Chapman – Kolmogorov equation:
$$p(x_k\mid Z_{1:k-1}) = \int p(x_k\mid x_{k-1})\,p(x_{k-1}\mid Z_{1:k-1})\,dx_{k-1} \approx \frac{1}{N_S}\sum_{i=1}^{N_S}\int\delta\left(x_k-x^i_k\right)p(x_{k-1}\mid Z_{1:k-1})\,dx_{k-1} = \frac{1}{N_S}\sum_{i=1}^{N_S}\delta\left(x_k-x^i_{k|k-1}\right)$$
2 Update (after measurement):
$$p(x_k\mid Z_{1:k}) = \frac{p(z_k\mid x_k)\,p(x_k\mid Z_{1:k-1})}{\int p(z_k\mid x_k)\,p(x_k\mid Z_{1:k-1})\,dx_k}$$
Since we use N_S points to describe the probabilities, we call those points Particles.
Table of Content
163
Nonlinear Estimation Using Particle FiltersSOLO
We assumed that p (xk|Z1:k) is a Gaussian PDF. If the true PDF is not Gaussian (multivariate, heavily skewed or non-standard – not represented by any standard PDF) the Gaussian distribution can never described it well. In such cases approximate Grid-Based Filters and Particle Filters will yield an improvement at the cost of heavy computation demand.
To overcome this difficulty we use the Principle of Importance Sampling.

Suppose that $p(x_k|Z_{1:k})$ is a PDF from which it is difficult to draw samples. Also suppose that $q(x_k|Z_{1:k})$ is another PDF from which samples can easily be drawn (referred to as the Importance Density), for example a Gaussian PDF. Now assume that at each sample we can find the scale factor $w(x_k)$ between the two densities:
$$w(x_k) := \frac{p(x_k|Z_{1:k})}{q(x_k|Z_{1:k})} > 0$$
Using this we can write:
$$E_{p(x_k|Z_{1:k})}\{g(x_k)\} = \int g(x_k)\,p(x_k|Z_{1:k})\,dx_k = \int g(x_k)\,\frac{p(x_k|Z_{1:k})}{q(x_k|Z_{1:k})}\,q(x_k|Z_{1:k})\,dx_k = \frac{\int g(x_k)\,w(x_k)\,q(x_k|Z_{1:k})\,dx_k}{\int w(x_k)\,q(x_k|Z_{1:k})\,dx_k}$$
(the denominator equals one, since $\int w\,q\,dx_k = \int p\,dx_k = 1$; keeping it allows $w$ to be known only up to a constant)
Importance Sampling (IS)
164
SOLO
$$E_{p(x_k|Z_{1:k})}\{g(x_k)\} = \frac{\int g(x_k)\,w(x_k)\,q(x_k|Z_{1:k})\,dx_k}{\int w(x_k)\,q(x_k|Z_{1:k})\,dx_k}$$

Generate (draw) $N_s$ particle samples from $q$:
$$x_k^i \sim q(x_k|Z_{1:k}), \quad i = 1, \dots, N_s$$

and estimate $g(x_k)$ using a Monte Carlo approximation:
$$E_{p(x_k|Z_{1:k})}\{g(x_k)\} \approx \frac{\frac{1}{N_s}\sum_{i=1}^{N_s} g(x_k^i)\,w(x_k^i)}{\frac{1}{N_s}\sum_{i=1}^{N_s} w(x_k^i)} = \sum_{i=1}^{N_s} g(x_k^i)\,\tilde w_k^i$$

where the normalized weights are
$$\tilde w_k^i := \frac{w(x_k^i)}{\sum_{j=1}^{N_s} w(x_k^j)}$$
Importance Sampling (IS)
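A minimal Python sketch of self-normalized importance sampling as above: $E_p\{g(x)\}$ is estimated for a target p that can be evaluated but not easily sampled. The bimodal target, the function g, and the Gaussian importance density are illustrative assumptions, not from the slide:

```python
import numpy as np

rng = np.random.default_rng(1)
Ns = 10_000

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# target density p (assumed bimodal mixture) and test function g
p = lambda x: 0.3 * gauss_pdf(x, -2.0, 0.5) + 0.7 * gauss_pdf(x, 1.0, 1.0)
g = lambda x: x                              # estimate the mean of p

mu_q, sig_q = 0.0, 3.0                       # broad Gaussian importance density q
x = rng.normal(mu_q, sig_q, Ns)              # x^i ~ q(x)
w = p(x) / gauss_pdf(x, mu_q, sig_q)         # scale factors w(x^i) = p(x^i)/q(x^i)
w_tilde = w / w.sum()                        # normalized weights, sum to 1

print(np.sum(w_tilde * g(x)))                # ~0.1, since E_p{x} = 0.3*(-2) + 0.7*1
```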
Table of Content
165
SOLO
It would be useful if the importance density could be generated recursively (sequentially).
$$w(x_k) = \frac{p(x_k|z_k, Z_{1:k-1})}{q(x_k|Z_{1:k})} \overset{\text{Bayes}}{=} \frac{p(z_k|x_k)\,p(x_k|Z_{1:k-1})/p(z_k|Z_{1:k-1})}{q(x_k|Z_{1:k})} = c\,\frac{p(z_k|x_k)\,p(x_k|Z_{1:k-1})}{q(x_k|Z_{1:k})}, \qquad c := \frac{1}{p(z_k|Z_{1:k-1})}$$

Using:
$$p(x_k, x_{k-1}|Z_{1:k-1}) \overset{\text{Bayes}}{=} p(x_k|x_{k-1}, Z_{1:k-1})\,p(x_{k-1}|Z_{1:k-1})$$
we obtain:
$$p(x_k|Z_{1:k-1}) = \int p(x_k, x_{k-1}|Z_{1:k-1})\,dx_{k-1} = \int p(x_k|x_{k-1}, Z_{1:k-1})\,p(x_{k-1}|Z_{1:k-1})\,dx_{k-1}$$

In the same way:
$$q(x_k|Z_{1:k}) = \int q(x_k, x_{k-1}|Z_{1:k})\,dx_{k-1} = \int q(x_k|x_{k-1}, Z_{1:k})\,q(x_{k-1}|Z_{1:k-1})\,dx_{k-1}$$

Therefore:
$$w(x_k) = c\,\frac{p(z_k|x_k)\int p(x_k|x_{k-1}, Z_{1:k-1})\,p(x_{k-1}|Z_{1:k-1})\,dx_{k-1}}{\int q(x_k|x_{k-1}, Z_{1:k})\,q(x_{k-1}|Z_{1:k-1})\,dx_{k-1}}$$
Sequential Importance Sampling (SIS)
166
SOLO
It would be useful if the importance density could be generated recursively.
$$w(x_k) = c\,\frac{p(z_k|x_k)\int p(x_k|x_{k-1}, Z_{1:k-1})\,p(x_{k-1}|Z_{1:k-1})\,dx_{k-1}}{\int q(x_k|x_{k-1}, Z_{1:k})\,q(x_{k-1}|Z_{1:k-1})\,dx_{k-1}}$$
Suppose that at stage k-1 we have $N_s$ particle samples and their probabilities, $\{x_{k-1|k-1}^i, w_{k-1}^i\}$, $i = 1, \dots, N_s$, which constitute a random measure that characterizes the posterior PDF for times up to $t_{k-1}$. Then
$$p(x_{k-1}|Z_{1:k-1}) \approx \sum_{i=1}^{N_s} w_{k-1}^i\,\delta(x_{k-1} - x_{k-1|k-1}^i) \qquad q(x_{k-1}|Z_{1:k-1}) \approx \sum_{i=1}^{N_s} w_{k-1}^i\,\delta(x_{k-1} - x_{k-1|k-1}^i)$$

Substituting these random measures, we obtained:
$$w(x_k) = c\,\frac{p(z_k|x_k)\int p(x_k|x_{k-1}, Z_{1:k-1})\sum_{i=1}^{N_s} w_{k-1}^i\,\delta(x_{k-1} - x_{k-1|k-1}^i)\,dx_{k-1}}{\int q(x_k|x_{k-1}, Z_{1:k})\sum_{i=1}^{N_s} w_{k-1}^i\,\delta(x_{k-1} - x_{k-1|k-1}^i)\,dx_{k-1}}$$

Sequential Importance Sampling (SIS) (continue – 1)
167
SOLO
$$w(x_k) = \frac{p(x_k|Z_{1:k})}{q(x_k|Z_{1:k})} \overset{\text{Bayes}}{=} c\,\frac{p(z_k|x_k)\,p(x_k|Z_{1:k-1})}{q(x_k|Z_{1:k})}$$

Evaluating $w$ at a particle $x_{k|k-1}^i$ drawn from $q(x_k|x_{k-1|k-1}^i, z_k)$, the delta mixtures collapse the integrals; using
$$p(x_k, x_{k-1}|Z_{1:k-1}) \overset{\text{Bayes}}{=} p(x_k|x_{k-1})\,p(x_{k-1}|Z_{1:k-1}), \qquad q(x_k, x_{k-1}|Z_{1:k}) \overset{\text{Bayes}}{=} q(x_k|x_{k-1}, z_k)\,q(x_{k-1}|Z_{1:k-1})$$
we obtain:
$$w_k(x_{k|k-1}^i) = c\,\frac{p(z_k|x_{k|k-1}^i)\,p(x_{k|k-1}^i|x_{k-1|k-1}^i)}{q(x_{k|k-1}^i|x_{k-1|k-1}^i, z_k)}\cdot\frac{p(x_{k-1|k-1}^i|Z_{1:k-1})}{q(x_{k-1|k-1}^i|Z_{1:k-1})}$$

Since
$$w_{k-1}(x_{k-1}) = \frac{p(x_{k-1}|Z_{1:k-1})}{q(x_{k-1}|Z_{1:k-1})}$$
and defining
$$w_k^i := w_k(x_{k|k}^i) = \frac{p(x_{k|k}^i|Z_{1:k})}{q(x_{k|k}^i|Z_{1:k})}, \qquad w_{k-1}^i := w_{k-1}(x_{k-1|k-1}^i) = \frac{p(x_{k-1|k-1}^i|Z_{1:k-1})}{q(x_{k-1|k-1}^i|Z_{1:k-1})}$$
the weights satisfy the recursion:
$$w_k^i = c\,w_{k-1}^i\,\frac{p(z_k|x_{k|k-1}^i)\,p(x_{k|k-1}^i|x_{k-1|k-1}^i)}{q(x_{k|k-1}^i|x_{k-1|k-1}^i, z_k)}$$
Sequential Importance Sampling (SIS) (continue – 2)
168
SOLO
Sequential Importance Sampling (SIS) (continue – 3)
0 Initialization: generate (draw) $x_0^i \sim p_{x_0}(x_0)$, $i = 1, \dots, N_S$. For $k = 1, \dots, \infty$:

1 At stage k-1: generate (draw) $N_S$ samples $w_{k-1}^i \sim p_w(w_{k-1})$

2 State update: $x_k^i = f(x_{k-1}^i, u_{k-1}, w_{k-1}^i)$, $i = 1, \dots, N_S$

3 Start with the approximation
$$p(x_k|x_{k-1}) \approx \frac{1}{N_S}\sum_{i=1}^{N_S}\delta(x_k - x_k^i)$$
Generate (draw) $N_S$ samples $v_k^i \sim p_v(v_k)$, compute $z_k^i = h(x_k^i, v_k^i)$, and approximate
$$p(z_k|x_k) \approx \frac{1}{N_S}\sum_{i=1}^{N_S}\delta(z_k - z_k^i)$$

4 After the measurement $z_k$ we compute $p(x_k|Z_{1:k}) \approx \{x_k^i, \tilde w_k^i\}$ with
$$\tilde w_k^i = w_{k-1}^i\,\frac{p(z_k|x_k^i)\,p(x_k^i|x_{k-1}^i)}{q(x_k^i|x_{k-1}^i, Z_{1:k})}, \qquad w_k^i = \tilde w_k^i \Big/ \sum_{j=1}^{N}\tilde w_k^j$$
so that
$$p(x_k|Z_{1:k}) = \sum_{i=1}^{N} w_k^i\,\delta(x_k - x_k^i)$$

k := k+1 & run this again
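A minimal Python sketch of one SIS cycle, under the common simplifying assumption (not stated on this slide) that the importance density is the transition prior, $q(x_k|x_{k-1}, z_k) = p(x_k|x_{k-1})$, so the weight update reduces to $\tilde w_k^i = w_{k-1}^i\,p(z_k|x_k^i)$. The scalar model is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
Ns = 500
sig_w, sig_v = 0.5, 0.3              # assumed process / measurement noise std

f = lambda x, w: 0.9 * x + w         # hypothetical dynamics
h = lambda x: x                      # hypothetical measurement map

x = rng.normal(0.0, 1.0, Ns)         # particles x_{k-1}^i
w = np.full(Ns, 1.0 / Ns)            # weights w_{k-1}^i

def sis_step(x, w, z_k):
    # draw from the importance density (here: the transition prior)
    x_new = f(x, rng.normal(0.0, sig_w, Ns))
    # likelihood p(z_k | x_k^i) for additive Gaussian measurement noise
    lik = np.exp(-0.5 * ((z_k - h(x_new)) / sig_v) ** 2)
    w_new = w * lik                  # w~_k^i = w_{k-1}^i p(z_k | x_k^i)
    return x_new, w_new / w_new.sum()

x, w = sis_step(x, w, z_k=0.7)
print(np.sum(w * x))                 # weighted posterior-mean estimate
```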
169
SOLO
The resulting sequential importance sampling (SIS) algorithm is a Monte Carlo method that forms the basis for most sequential MC filters.
Sequential Importance Sampling (SIS) (continue – 4)
This sequential Monte Carlo method is known variously as:
• Bootstrap Filtering
• Condensation Algorithm
• Particle Filtering
• Interacting Particle Approximation
• Survival of the Fittest
170
SOLO
Degeneracy Problem
Sequential Importance Sampling (SIS) (continue – 5)
A common problem with the SIS particle filter is the degeneracy phenomenon, where after a few iterations all but one particle have negligible weights. It can be shown that the variance of the importance weights $w_k^i$ of the SIS algorithm can only increase over time, and this leads to the degeneracy problem. A suitable measure of degeneracy is given by:
$$\hat N_{eff} = \frac{1}{\sum_{i=1}^{N}(w_k^i)^2}, \qquad \text{where } \sum_{i=1}^{N} w_k^i = 1$$

To see this, let us look at the following two cases:

1. $w_k^i = 1/N$, $i = 1, \dots, N$ $\;\Rightarrow\;$ $\hat N_{eff} = \dfrac{1}{\sum_{i=1}^{N} 1/N^2} = N$

2. $w_k^i = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}$ $\;\Rightarrow\;$ $\hat N_{eff} = \dfrac{1}{1} = 1$
Hence, small Neff indicates a severe degeneracy and vice versa.
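The two limiting cases can be checked directly; a minimal Python sketch (the weight vectors below are illustrative):

```python
import numpy as np

def n_eff(w):
    # effective sample size of normalized weights: N_eff = 1 / sum_i (w_k^i)^2
    w = np.asarray(w)
    return 1.0 / np.sum(w ** 2)

print(n_eff(np.full(100, 0.01)))     # 100.0: uniform weights, N = 100 (case 1)
print(n_eff(np.eye(1, 100)[0]))      # 1.0:   all weight on a single particle (case 2)
```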
Table of Content
171
SOLO
The Bootstrap (Resampling)
• Popularized by Brad Efron (1979)
• The Bootstrap is a name generically applied to statistical resampling schemes that allow uncertainty in the data to be assessed from the data themselves; in other words, "pulling yourself up by your bootstraps".
The disadvantage of bootstrapping is that while (under some conditions) it is asymptotically consistent, it does not provide general finite-sample guarantees, and it has a tendency to be overly optimistic. The apparent simplicity may conceal the fact that important assumptions are being made when undertaking the bootstrap analysis (e.g., independence of samples), where these would be more formally stated in other approaches.
The advantage of bootstrapping over analytical methods is its great simplicity: it is straightforward to apply the bootstrap to derive estimates of standard errors and confidence intervals for complex estimators of complex parameters of the distribution, such as percentile points, proportions, odds ratios, and correlation coefficients.
Sequential Importance Resampling (SIR)
Bradley Efron (1938- ), Stanford U.
172
SOLO
Resampling
Sequential Importance Resampling (SIR) (continue – 1)
Whenever a significant degeneracy is observed during the sampling (i.e., when $N_{eff}$ falls below some threshold $N_{thr}$), where we obtained
$$p(x_k|Z_{1:k}) \approx \sum_{i=1}^{N} w_k^i\,\delta(x_k - x_k^i)$$
we need to resample and replace the representation $\{x_k^i, w_k^i\}$, $i = 1, \dots, N$ with a random measure $\{x_k^{i*}, 1/N\}$, $i = 1, \dots, N$. This is done by first computing the Cumulative Distribution Function (C.D.F.) of the sampled distribution $w_k^i$:

Initialize the C.D.F.: $c_1 = w_k^1$
For $i = 2, \dots, N$: compute the C.D.F.: $c_i = c_{i-1} + w_k^i$
173
SOLO
Resampling (continue – 1)
Sequential Importance Resampling (SIR) (continue – 2)
Using the Inverse Transform method we generate N independent, identically distributed (i.i.d.) variables from the uniform distribution, sort them in ascending order, and compare them with the Cumulative Distribution Function (C.D.F.) of the normalized weights.
174
SOLO
Resampling Algorithm (continue – 2)
Sequential Importance Resampling (SIR) (continue – 3)
0 Initialize the C.D.F.: $c_1 = w_k^1$; for $i = 2, \dots, N$ compute the C.D.F.: $c_i = c_{i-1} + w_k^i$

1 Start at the bottom of the C.D.F.: $i = 1$; draw from the uniform distribution: $u_1 \sim U[0, N^{-1}]$

2 For $j = 1, \dots, N$: move along the C.D.F.: $u_j = u_1 + (j-1)\,N^{-1}$

3 WHILE $u_j > c_i$: $i := i + 1$; END WHILE

4 Assign sample: $x_k^{j*} = x_k^i$; assign weight: $w_k^j = N^{-1}$; assign parent: $i^j = i$

5 END For
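A minimal Python sketch of the systematic-resampling algorithm above; the particle values and weights are illustrative assumptions:

```python
import numpy as np

def systematic_resample(x, w, rng):
    N = len(w)
    c = np.cumsum(w)                          # C.D.F. of the normalized weights
    u1 = rng.uniform(0.0, 1.0 / N)            # one draw at the bottom of the C.D.F.
    u = u1 + np.arange(N) / N                 # u_j = u_1 + (j - 1) / N
    idx = np.searchsorted(c, u)               # smallest i with u_j <= c_i
    return x[idx], np.full(N, 1.0 / N), idx   # samples x_k^{j*}, weights 1/N, parents i^j

rng = np.random.default_rng(3)
x = np.array([0.0, 1.0, 2.0, 3.0])
w = np.array([0.1, 0.6, 0.2, 0.1])
x_star, w_star, parents = systematic_resample(x, w, rng)
print(x_star, parents)          # the high-weight particle is typically duplicated
```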
175
SOLO
Resampling
Sequential Importance Resampling (SIR) (continue – 4)
0 Start with the approximation
$$p(x_k|Z_{1:k-1}) \approx \{x_k^i, N^{-1}\}: \quad \sum_{i=1}^{N}\frac{1}{N}\,\delta(x_k - x_k^i)$$

1 After the measurement $z_k$ we compute $p(x_k|Z_{1:k}) \approx \{x_k^i, \tilde w_k^i\}$:
$$\tilde w_k^i = w_{k-1}^i\,\frac{p(z_k|x_k^i)\,p(x_k^i|x_{k-1}^i)}{q(x_k^i|x_{k-1}^i, Z_{1:k})}, \qquad w_k^i = \tilde w_k^i \Big/ \sum_{j=1}^{N}\tilde w_k^j, \qquad p(x_k|Z_{1:k}) = \sum_{i=1}^{N} w_k^i\,\delta(x_k - x_k^i)$$

2 If $N_{eff} = 1 \Big/ \sum_{i=1}^{N}(w_k^i)^2 < N_{thr}$, resample to obtain $p(x_k|Z_{1:k}) \approx \{x_k^{i*}, N^{-1}\}$

3 Prediction:
$$x_{k+1}^i = f\big(k, x_k^{i*}, u_k, n_k^i\big)$$
to obtain $p(x_{k+1}|Z_{1:k}) \approx \{x_{k+1}^i, N^{-1}\}$

k := k+1 & run this again
176
SOLO
Resampling
Sequential Importance Resampling (SIR) (continue – 5)
Although the resampling step reduces the effect of degeneracy, it introduces other practical problems:

1. It limits the possibility of parallel implementation.

2. The particles that have high $w_k^i$ are statistically selected many times. This leads to a loss of diversity among the particles (sample impoverishment).
Several other techniques for generating samples from an unknown P.D.F., besides Importance Sampling, have been presented in the literature. If the P.D.F. is stationary, Markov Chain Monte Carlo (MCMC) methods have been proposed:
• Metropolis – Hastings (MH)
• Gibbs sampler (a special case of MH) (see the Probability Presentation)
177
SOLO
Selection of Importance Density
Sequential Importance Resampling (SIR) (continue – 6)
The choice of the Importance Density $q(x_k|x_{k-1}, z_k)$ is one of the most critical issues in the design of the Particle Filter.
The Optimal Choice
The Optimal Importance Density $q(x_k|x_{k-1}, z_k)$, the one that minimizes the variance of the importance weights conditioned upon $x_{k-1}^i$ and $z_k$, has been shown to be:
$$q(x_k|x_{k-1}^i, z_k)_{opt} = p(x_k|x_{k-1}^i, z_k) \overset{\text{Bayes}}{=} \frac{p(z_k|x_k, x_{k-1}^i)\,p(x_k|x_{k-1}^i)}{p(z_k|x_{k-1}^i)}$$
Substitution of this into:
$$w_k^i = w_{k-1}^i\,\frac{p(z_k|x_k^i)\,p(x_k^i|x_{k-1}^i)}{q(x_k^i|x_{k-1}^i, z_k)}$$
gives:
$$w_k^i = w_{k-1}^i\,p(z_k|x_{k-1}^i)$$
From this equation we can see that the importance weights at time k can be computed (and, if necessary, resampling can be performed) before the particles are propagated to time k.
In order to use the optimal importance function we must:

1. sample from $p(x_k|x_{k-1}^i, z_k)$;

2. evaluate:
$$p(z_k|x_{k-1}^i) = \int p(z_k|x_k)\,p(x_k|x_{k-1}^i)\,dx_k$$
In the general case either of these two tasks can be difficult.
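As a worked special case (a standard result in the particle-filter literature, stated here as an added illustration rather than taken from the slide): if the process noise is additive Gaussian and the measurement is linear with additive Gaussian noise, $x_k = f(x_{k-1}) + w_{k-1}$, $w_{k-1} \sim \mathcal{N}(0, Q)$, $z_k = H x_k + v_k$, $v_k \sim \mathcal{N}(0, R)$, then both tasks are tractable:

$$p(x_k|x_{k-1}^i, z_k) = \mathcal{N}(x_k;\, m_k^i,\, \Sigma), \qquad \Sigma = \big(Q^{-1} + H^T R^{-1} H\big)^{-1}, \qquad m_k^i = \Sigma\big(Q^{-1} f(x_{k-1}^i) + H^T R^{-1} z_k\big)$$
$$p(z_k|x_{k-1}^i) = \mathcal{N}\big(z_k;\, H f(x_{k-1}^i),\, R + H Q H^T\big)$$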
178
SOLO Sequential Importance Resampling Particle Filter (SIRPF)
SIRPF Summary
System definition:
$$x_k = f(k-1, x_{k-1}, u_{k-1}, w_{k-1}), \quad x_0 \sim p_{x_0}(x_0),\; w_k \sim p_w(w_k) \qquad z_k = h(k, x_k, v_k), \quad v_k \sim p_v(v_k)$$

0 Initialization of the SIRPF: $\hat x_0 \sim p_{x_0}(x_0)$. For $k = 1, \dots, \infty$:

1 Assuming for k-1 a Gaussian distribution with mean $\hat x_{k-1|k-1}$ and covariance $P_{k-1|k-1}$, generate $N_s$ samples:
$$x_{k-1}^i \sim \mathcal{N}\big(x_{k-1}; \hat x_{k-1|k-1}, P_{k-1|k-1}\big), \quad i = 1, \dots, N_s$$

2 State prediction:
$$x_{k|k-1}^i = f(k-1, x_{k-1|k-1}^i, u_{k-1}), \quad i = 1, \dots, N_s$$

3 Assuming a Gaussian distribution with mean $\hat x_{k|k-1}$ and covariance $P_{k|k-1}$, generate (draw) new $N_s$ samples:
$$x_{k|k-1}^j \sim \mathcal{N}\big(x_{k|k-1}; \hat x_{k|k-1}, P_{k|k-1}\big), \quad j = 1, \dots, N_s$$
Table of Content
179
SOLO Monte Carlo Particle Filter (MCPF)
MCPF Summary
System definition:
$$x_k = f(k-1, x_{k-1}, u_{k-1}) + w_{k-1}, \quad E\{w_k\} = 0,\; E\{w_k w_l^T\} = Q\,\delta_{kl} \qquad z_k = h(k, x_k) + v_k, \quad E\{v_k\} = 0,\; E\{v_k v_l^T\} = R\,\delta_{kl}$$

0 Initialization of the MCPF:
$$\hat x_0 = E\{x_0\}, \qquad P_{0|0} = E\{(x_0 - \hat x_0)(x_0 - \hat x_0)^T\}$$
With the augmented state $x^a := [x^T\; w^T\; v^T]^T$:
$$\hat x_0^a = E\{x_0^a\} = [\hat x_0^T\; 0\; 0]^T, \qquad P_{0|0}^a = E\{(x_0^a - \hat x_0^a)(x_0^a - \hat x_0^a)^T\} = \begin{bmatrix} P_{0|0} & 0 & 0 \\ 0 & Q & 0 \\ 0 & 0 & R \end{bmatrix}$$

For $k = 1, \dots, \infty$:

1 Assuming for k-1 a Gaussian distribution with mean $\hat x_{k-1|k-1}$ and covariance $P_{k-1|k-1}$, generate $N_s$ samples:
$$x_{k-1}^i \sim \mathcal{N}\big(x_{k-1}; \hat x_{k-1|k-1}, P_{k-1|k-1}\big), \quad i = 1, \dots, N_s$$

2 State prediction and its covariance:
$$x_{k|k-1}^i = f(k-1, x_{k-1|k-1}^i, u_{k-1}), \qquad \hat x_{k|k-1} = \frac{1}{N_s}\sum_{i=1}^{N_s} x_{k|k-1}^i, \qquad P_{k|k-1} = \frac{1}{N_s}\sum_{i=1}^{N_s}\big(x_{k|k-1}^i - \hat x_{k|k-1}\big)\big(x_{k|k-1}^i - \hat x_{k|k-1}\big)^T$$

3 Assuming a Gaussian distribution with mean $\hat x_{k|k-1}$ and covariance $P_{k|k-1}$, generate (draw) new $N_s$ samples:
$$x_{k|k-1}^j \sim \mathcal{N}\big(x_{k|k-1}; \hat x_{k|k-1}, P_{k|k-1}\big), \quad j = 1, \dots, N_s$$
180
SOLO Monte Carlo Particle Filter (MCPF)
MCPF Summary (continue – 1)

4 Measurement prediction:
$$z_{k|k-1}^j = h(k, x_{k|k-1}^j), \quad j = 1, \dots, N_s \qquad \hat z_{k|k-1} = \frac{1}{N_s}\sum_{j=1}^{N_s} z_{k|k-1}^j$$

5 Predicted covariances computations:
$$S_k = P_{k|k-1}^{zz} = \frac{1}{N_s}\sum_{j=1}^{N_s}\big(z_{k|k-1}^j - \hat z_{k|k-1}\big)\big(z_{k|k-1}^j - \hat z_{k|k-1}\big)^T \qquad P_{k|k-1}^{xz} = \frac{1}{N_s}\sum_{j=1}^{N_s}\big(x_{k|k-1}^j - \hat x_{k|k-1}\big)\big(z_{k|k-1}^j - \hat z_{k|k-1}\big)^T$$

6 Innovation and its covariance:
$$i_k = z_k - \hat z_{k|k-1}, \qquad S_k$$

7 Kalman gain computations:
$$K_k = P_{k|k-1}^{xz}\,\big(P_{k|k-1}^{zz}\big)^{-1}$$

8 Kalman filter:
$$\mu_{k|k}^x = \hat x_{k|k-1} + K_k\,i_k, \qquad \Sigma_{k|k}^{xx} = P_{k|k-1} - K_k S_k K_k^T$$

9 Importance sampling using the Gaussian mean $\mu_{k|k}^x$ and covariance $\Sigma_{k|k}^{xx}$: generate new $N_s$ samples
$$x_{k|k}^m \sim \mathcal{N}\big(x; \mu_{k|k}^x, \Sigma_{k|k}^{xx}\big), \quad m = 1, \dots, N_s$$

10 Weight update:
$$\tilde w_k^m = \frac{p(z_k|x_{k|k}^m)\,\mathcal{N}\big(x_{k|k}^m; \hat x_{k|k-1}, P_{k|k-1}\big)}{\mathcal{N}\big(x_{k|k}^m; \mu_{k|k}^x, \Sigma_{k|k}^{xx}\big)}, \qquad w_k^m = \tilde w_k^m \Big/ \sum_{l=1}^{N_s}\tilde w_k^l, \quad m = 1, \dots, N_s$$

11 Update state and its covariance:
$$\hat x_{k|k} = \sum_{m=1}^{N_s} w_k^m\,x_{k|k}^m, \qquad P_{k|k} = \frac{1}{N_s}\sum_{m=1}^{N_s}\big(x_{k|k}^m - \hat x_{k|k}\big)\big(x_{k|k}^m - \hat x_{k|k}\big)^T$$

k := k+1 & return to 1
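A minimal Python sketch of steps 4-8 above (measurement prediction, sample covariances, Kalman gain, and mean update); the dimensions and the measurement function are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
Ns = 1000

h = lambda x: x[:, :1] ** 2 / 20.0            # hypothetical measurement function

x_pred = rng.normal(0.0, 1.0, (Ns, 2))        # x_{k|k-1}^j samples from step 3
z_pred = h(x_pred)                            # step 4: z_{k|k-1}^j = h(x_{k|k-1}^j)

x_bar = x_pred.mean(axis=0)                   # sample mean of the predicted state
z_bar = z_pred.mean(axis=0)                   # z^hat_{k|k-1}

dx = x_pred - x_bar
dz = z_pred - z_bar
S = dz.T @ dz / Ns                            # steps 5-6: S_k = P^zz_{k|k-1}
                                              # (an additive measurement-noise covariance R
                                              # would enter here if v_k is not sampled into h)
Pxz = dx.T @ dz / Ns                          # step 7: P^xz_{k|k-1}
K = Pxz @ np.linalg.inv(S)                    # Kalman gain K_k

z_meas = np.array([0.15])                     # assumed measurement z_k
mu = x_bar + K @ (z_meas - z_bar)             # step 8: mean update with innovation i_k
print(mu)
```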
181
[Block diagram: Input Data → Sensor Data Processing and Measurement Formation → Observation-to-Track Association → Track Maintenance (Initialization, Confirmation and Deletion) → Filtering and Prediction → Gating Computations]
Samuel S. Blackman, "Multiple-Target Tracking with Radar Applications", Artech House, 1986
Samuel S. Blackman, Robert Popoli, "Design and Analysis of Modern Tracking Systems", Artech House, 1999
SOLO Monte Carlo Particle Filter (MCPF)
Table of Content
182
SOLO Estimators
Maximum Likelihood Estimate (MLE)

For the particular vector measurement equation
$$z = H x + v$$
where the measurement noise $v$ is Gaussian (normal) with zero mean, $v \sim \mathcal{N}(0, R)$, and independent of $x$, the conditional probability $p_{z|x}(z|x)$ can be written, using Bayes rule, as:
$$p_{z|x}(z|x) = \frac{p_{x,z}(x,z)}{p_x(x)}$$
The measurement noise $v$ can be related to $x$ and $z$ by the function:
$$v = z - H x = f(x, z)$$
with the Jacobian
$$J = \frac{\partial f}{\partial z} = \begin{bmatrix} \partial f_1/\partial z_1 & \cdots & \partial f_1/\partial z_p \\ \vdots & & \vdots \\ \partial f_p/\partial z_1 & \cdots & \partial f_p/\partial z_p \end{bmatrix} = I_{p \times p}$$
so that
$$p_{x,z}(x,z) = p_{x,v}(x,v) \big/ |J\,J^T|^{1/2} = p_{x,v}(x,v)$$

Since the measurement noise $v$ is independent of $x$, the joint probability of $x$ and $z$ is given by:
$$p_{x,z}(x,z) = p_{x,v}(x,v) = p_x(x)\,p_v(v)$$
183
SOLO Estimators
Maximum Likelihood Estimate (continue – 1)
$$z = H x + v, \qquad v \sim \mathcal{N}(0, R)$$

$$p_{z|x}(z|x) = \frac{p_{x,z}(x,z)}{p_x(x)} = p_v(v) = \frac{1}{(2\pi)^{p/2}|R|^{1/2}}\exp\left(-\frac{1}{2}\,v^T R^{-1} v\right) \quad \text{(Gaussian, zero mean)}$$

With $v = z - Hx$:
$$p_{z|x}(z|x) = p_v(z - Hx) = \frac{1}{(2\pi)^{p/2}|R|^{1/2}}\exp\left(-\frac{1}{2}(z - Hx)^T R^{-1}(z - Hx)\right)$$

$$\max_x p_{z|x}(z|x) \;\Leftrightarrow\; \min_x\left[(z - Hx)^T R^{-1}(z - Hx)\right] \;\Rightarrow\; \text{Weighted Least Squares (WLS) with } W = R^{-1}$$

$$\frac{\partial}{\partial x}\left[(z - Hx)^T R^{-1}(z - Hx)\right] = -2 H^T R^{-1}(z - Hx) = 0 \;\Rightarrow\; H^T R^{-1} z - H^T R^{-1} H\,x^* = 0$$
$$x = x^* := \big(H^T R^{-1} H\big)^{-1} H^T R^{-1} z$$

$$\frac{\partial^2}{\partial x^2}\left[(z - Hx)^T R^{-1}(z - Hx)\right] = 2\,H^T R^{-1} H$$
This is a positive definite matrix, therefore the solution minimizes $(z - Hx)^T R^{-1}(z - Hx)$ and maximizes $p_{z|x}(z|x)$.
$L(z, x) := p_{z|x}(z|x)$ is called the Likelihood Function and is a measure of how likely the parameter $x$ is, given the observation $z$.
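A minimal numerical check of the MLE/WLS formula $x^* = (H^T R^{-1} H)^{-1} H^T R^{-1} z$; the matrices and data below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

H = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])                    # three measurements of a 2-vector
R = np.diag([0.1, 0.2, 0.1])                  # measurement-noise covariance
x_true = np.array([1.0, -0.5])

z = H @ x_true + rng.multivariate_normal(np.zeros(3), R)   # z = Hx + v, v ~ N(0, R)

Ri = np.linalg.inv(R)
x_mle = np.linalg.solve(H.T @ Ri @ H, H.T @ Ri @ z)        # x* = (H'R^-1 H)^-1 H'R^-1 z
print(x_mle)                                               # close to x_true
```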
Table of Content
184
SOLO Estimators
Bayesian Maximum Likelihood Estimate (Maximum Aposteriori– MAP Estimate)
Consider a Gaussian vector $x \sim \mathcal{N}\big(\bar x(-), P(-)\big)$ and the measurement $z = H x + v$, where the Gaussian noise $v \sim \mathcal{N}(0, R)$ is independent of $x$.

$$p_x(x) = \frac{1}{(2\pi)^{n/2}|P(-)|^{1/2}}\exp\left(-\frac{1}{2}\big(x - \bar x(-)\big)^T P(-)^{-1}\big(x - \bar x(-)\big)\right)$$

$$p_{z|x}(z|x) = p_v(z - Hx) = \frac{1}{(2\pi)^{p/2}|R|^{1/2}}\exp\left(-\frac{1}{2}(z - Hx)^T R^{-1}(z - Hx)\right)$$

$$p_z(z) = \int_{-\infty}^{+\infty} p_{z,x}(z,x)\,dx = \int_{-\infty}^{+\infty} p_{z|x}(z|x)\,p_x(x)\,dx$$

$p_z(z)$ is Gaussian with
$$E(z) = E(Hx + v) = H\,E(x) + \underbrace{E(v)}_{0} = H\bar x(-)$$
$$\operatorname{cov}(z) = E\{[z - E(z)][z - E(z)]^T\} = E\{[H(x - \bar x(-)) + v][H(x - \bar x(-)) + v]^T\} = H P(-) H^T + R$$
(the cross terms vanish because $v$ is independent of $x$), hence
$$p_z(z) = \frac{1}{(2\pi)^{p/2}|H P(-) H^T + R|^{1/2}}\exp\left(-\frac{1}{2}\big[z - H\bar x(-)\big]^T\big[H P(-) H^T + R\big]^{-1}\big[z - H\bar x(-)\big]\right)$$
185
SOLO Estimators
Bayesian Maximum Likelihood Estimate (Maximum Aposteriori Estimate) (continue – 1)
Consider again the Gaussian vector $x \sim \mathcal{N}\big(\bar x(-), P(-)\big)$, the measurement $z = H x + v$ with $v \sim \mathcal{N}(0, R)$ independent of $x$, and $p_x(x)$, $p_{z|x}(z|x)$, and $p_z(z)$ as on the previous slide. Bayes rule gives the a posteriori density:
$$p_{x|z}(x|z) = \frac{p_{z|x}(z|x)\,p_x(x)}{p_z(z)} = \frac{|H P(-) H^T + R|^{1/2}}{(2\pi)^{n/2}|P(-)|^{1/2}|R|^{1/2}}\exp\left(-\frac{1}{2}\,q(x,z)\right)$$
from which
$$q(x,z) = (z - Hx)^T R^{-1}(z - Hx) + \big[x - \bar x(-)\big]^T P(-)^{-1}\big[x - \bar x(-)\big] - \big[z - H\bar x(-)\big]^T\big[H P(-) H^T + R\big]^{-1}\big[z - H\bar x(-)\big]$$
186
SOLO Estimators
Bayesian Maximum Likelihood Estimate (Maximum Aposteriori Estimate) (continue – 2)
Expand the quadratic form
$$q(x,z) = (z - Hx)^T R^{-1}(z - Hx) + \big[x - \bar x(-)\big]^T P(-)^{-1}\big[x - \bar x(-)\big] - \big[z - H\bar x(-)\big]^T\big[H P(-) H^T + R\big]^{-1}\big[z - H\bar x(-)\big]$$
using the matrix-inversion identity
$$\big[R + H P(-) H^T\big]^{-1} = R^{-1} - R^{-1} H\,\big[P(-)^{-1} + H^T R^{-1} H\big]^{-1} H^T R^{-1}$$
and define:
$$P(+) := \big[P(-)^{-1} + H^T R^{-1} H\big]^{-1}$$
Collecting the terms in $x$, the quadratic form becomes a perfect square:
$$q(x,z) = \big[x - \bar x(-) - P(+) H^T R^{-1}(z - H\bar x(-))\big]^T P(+)^{-1}\big[x - \bar x(-) - P(+) H^T R^{-1}(z - H\bar x(-))\big]$$
so that
$$p_{x|z}(x|z) = \frac{1}{(2\pi)^{n/2}|P(+)|^{1/2}}\exp\left(-\frac{1}{2}\,q(x,z)\right)$$
187
SOLO Estimators
Bayesian Maximum Likelihood Estimate (Maximum Aposteriori Estimate) (continue – 3)
then, with
$$P(+) := \big[P(-)^{-1} + H^T R^{-1} H\big]^{-1}$$
$$p_{x|z}(x|z) = \frac{1}{(2\pi)^{n/2}|P(+)|^{1/2}}\exp\left(-\frac{1}{2}\big[x - \hat x(+)\big]^T P(+)^{-1}\big[x - \hat x(+)\big]\right)$$
The maximum a posteriori estimate maximizes $p_{x|z}(x|z)$:
$$\max_x p_{x|z}(x|z) \;\Rightarrow\; x^*(+) := \hat x(+) = \bar x(-) + P(+) H^T R^{-1}\big(z - H\bar x(-)\big)$$
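A minimal numerical sketch of the MAP update, $P(+) = [P(-)^{-1} + H^T R^{-1} H]^{-1}$ and $x^*(+) = \bar x(-) + P(+) H^T R^{-1}(z - H\bar x(-))$; all numbers are illustrative assumptions:

```python
import numpy as np

x_prior = np.array([0.0, 0.0])                # prior mean x^bar(-)
P_prior = np.eye(2)                           # prior covariance P(-)
H = np.array([[1.0, 0.0]])                    # observe the first component only
R = np.array([[0.25]])                        # measurement-noise covariance
z = np.array([1.2])                           # assumed measurement

Ri = np.linalg.inv(R)
P_post = np.linalg.inv(np.linalg.inv(P_prior) + H.T @ Ri @ H)   # P(+)
x_map = x_prior + P_post @ H.T @ Ri @ (z - H @ x_prior)         # x*(+)
print(x_map)                                  # [0.96, 0.0] for these numbers
print(P_post)                                 # diag(0.2, 1.0)
```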
Table of Content
SOLO
Nonlinear Filters
Nonlinear Filters based on the Fokker-Planck Equation

A continuous dynamic system is described by:
$$dx(t) = f(x(t), t)\,dt + dw(t), \qquad t \in [t_0, t_f]$$
$x(t)$ is the n-dimensional state vector; $dw(t)$ is the n-dimensional process-noise vector, described by the covariance matrix $Q$:
$$d\hat w(t) := E\{dw(t)\} = 0, \qquad E\{[dw(t) - d\hat w(t)][dw(\tau) - d\hat w(\tau)]^T\} = Q(t)\,\delta(t - \tau)$$
$p[x(t), t]$ is the probability density of the state $x$ at time $t$; its time evolution is described by the Fokker-Planck equation.

Fred Daum, from the Raytheon Company, leads methods to design Nonlinear Filters starting from the Fokker-Planck Equation.
Return to Stochastic Processes
$$\frac{\partial p[x(t), t]}{\partial t} = -\frac{\partial}{\partial x}\Big\{f[x(t), t]\,p[x(t), t]\Big\} + \frac{1}{2}\,\frac{\partial}{\partial x}\left\{Q(t)\,\frac{\partial p[x(t), t]}{\partial x}\right\}$$
where
$$\frac{\partial}{\partial x}\Big\{f[x(t), t]\,p[x(t), t]\Big\} = \sum_{i=1}^{n}\frac{\partial}{\partial x_i}\Big\{f_i[x(t), t]\,p[x(t), t]\Big\}, \qquad \frac{\partial p[x(t), t]}{\partial x} = \left[\frac{\partial p}{\partial x_1}\;\frac{\partial p}{\partial x_2}\;\cdots\;\frac{\partial p}{\partial x_n}\right]^T$$

Fred Daum
SOLO
Nonlinear Filters
Nonlinear Filters based on the Fokker-Planck Equation

Assuming system measurements at discrete times $t_k$, given by:
$$z(t_k) = h(x(t_k), t_k, v_k), \qquad t_k \in [t_0, t_f]$$
$v_k$ is the m-dimensional measurement-noise vector at $t_k$, and $Z_k = \{z_1, z_2, \dots, z_k\}$ is the set of all measurements up to and including time $t_k$.

We are interested in the probability of the state $x$ at time $t$ given the set of discrete measurements up to and including time $t_k < t$: $p(x, t|Z_k)$.

Bayes' Rule:
$$p(x, t_k|Z_k) = p(x, t_k|z_k, Z_{k-1}) \overset{\text{Bayes}}{=} \frac{p(z_k|x, t_k)\,p(x, t_k|Z_{k-1})}{p(z_k|Z_{k-1})}$$

$p(x, t_k|Z_{k-1})$: probability of $x$ at time $t_k$ given $Z_{k-1}$ (a priori, before the measurement $z_k$)
$p(x, t_k|Z_k)$: probability of $x$ at time $t_k$ given $Z_k$ (a posteriori, after the measurement $z_k$)
$p(z_k|x, t_k)$: probability of the measurement $z_k$ given the state $x$ at time $t_k$ (likelihood of the measurement)
$p(z_k|Z_{k-1})$: probability of the measurement $z_k$ given $Z_{k-1}$ (normalization of the conditional probability)
SOLO
In the Classical Particle Filter solution the particles are drawn using the a priori density, which determines their distribution (see figure). After the measurement, the likelihood of the measurement is obtained, and nothing prevents the particles drawn beforehand from having a low density in the likelihood region. This is Particle Degeneracy, which produces the curse of dimensionality.

Nonlinear Filters
Nonlinear Filters based on the Fokker-Planck Equation
Fred Daum

[Figure: prior density; particles representing the prior density; likelihood of measurement; particle degeneracy as a cause of the curse of dimensionality]

The Particle Filter solutions have implementation problems. The number of particles necessary to reduce the filter error increases with the system dimension; Daum gives the filter error as a function of the number of particles, with the system dimension as a parameter.
http://sc.enseeiht.fr/doc/Seminar_Daum_2012_2.pdf
SOLO
Nonlinear Filters
Fred Daum

By taking the natural logarithm of the conditional probability we get, on the right side, a sum of logarithms:
$$\underbrace{\ln p(x, t_k|Z_k)}_{\text{a posteriori}} = \underbrace{\ln p(x, t_k|Z_{k-1})}_{\text{a priori}} + \underbrace{\ln p(z_k|x, t_k)}_{\text{likelihood}} - \underbrace{\ln p(z_k|Z_{k-1})}_{\text{normalization}}$$

To obtain the a posteriori probability $p(x, t_k|Z_k)$ from the a priori probability $p(x, t_k|Z_{k-1})$ and the likelihood $p(z_k|x, t_k)$, Daum uses a homotopy procedure (see next slide), choosing a continuous homotopy parameter $\lambda \in [0, 1]$. He searches for a function $f(x, \lambda)$ (not related to the filtered system) that describes the flow of the particles and is associated with $p(x, t_k|Z_k)$.

The homotopy:
$$\underbrace{\ln p(x, \lambda)}_{\text{a posteriori}} = \underbrace{\ln g(x)}_{\text{a priori}} + \lambda\,\underbrace{\ln h(x)}_{\text{likelihood}} - \underbrace{\ln K(\lambda)}_{\text{normalization}}$$

[Figure: induced flow of particles for Bayes' rule; the flow of the density (p.d.f.) from a priori to a posteriori carries the particles sampled from the a priori density to samples from the a posteriori density]

Since $p(x, \lambda)$ is the p.d.f. associated with a system defined by $f(x, \lambda)$, we have the Fokker-Planck equation:
$$\frac{\partial p(x, \lambda)}{\partial \lambda} = -\frac{\partial}{\partial x}\big[f(x, \lambda)\,p(x, \lambda)\big] + \frac{1}{2}\,\frac{\partial}{\partial x}\left[Q(x, \lambda)\,\frac{\partial p(x, \lambda)}{\partial x}\right]$$
$Q(x, \lambda)$: noise spectrum, to be defined.

Nonlinear Filters based on the Fokker-Planck Equation
Here we describe Daum's proposed methods, called Particle Flow Filters.

Particle Flow Equation:
$$\frac{dx}{d\lambda} = f(x, \lambda) + Q(x, \lambda)\,\frac{dw}{d\lambda}$$
192
Homotopy

In topology, two continuous functions from one topological space to another are called homotopic (Greek ὁμός homós = same, similar, and τόπος tópos = place) if one can be "continuously deformed" into the other, such a deformation being called a homotopy between the two functions. An outstanding use of homotopy is the definition of homotopy groups and cohomotopy groups, important invariants in algebraic topology.

[Figure: a homotopy of a coffee cup into a doughnut]

Formal definition

Formally, a homotopy between two continuous functions f and g from a topological space X to a topological space Y is defined to be a continuous function H : X × [0,1] → Y from the product of the space X with the unit interval [0,1] to Y such that, if x ∈ X, then H(x,0) = f(x) and H(x,1) = g(x). If we think of the second parameter of H as time, then H describes a continuous deformation of f into g: at time 0 we have the function f and at time 1 we have the function g. An alternative notation is to say that a homotopy between two continuous functions f, g : X → Y is a family of continuous functions h_t : X → Y for t ∈ [0,1] such that h_0 = f and h_1 = g, and the map t ↦ h_t is continuous from [0,1] to the space of all continuous functions X → Y. The two versions coincide by setting h_t(x) = H(x,t).
SOLO
SOLO Nonlinear Filters
Fred Daum
Fokker-Planck equation:
$$\frac{\partial p(x, \lambda)}{\partial \lambda} = -\frac{\partial}{\partial x}\big[f(x, \lambda)\,p(x, \lambda)\big] + \frac{1}{2}\,\frac{\partial}{\partial x}\left[Q(x, \lambda)\,\frac{\partial p(x, \lambda)}{\partial x}\right]$$

Definition of $p(x, \lambda)$:
$$\ln p(x, \lambda) = \ln g(x) + \lambda\,\ln h(x) - \ln K(\lambda)$$

Differentiating the definition with respect to $\lambda$ and using the Fokker-Planck equation, we have the partial differential equation for $f$ given $p$:
$$\frac{\partial \ln p(x, \lambda)}{\partial \lambda} = \ln h(x) - \frac{d\ln K(\lambda)}{d\lambda} = \frac{1}{p(x, \lambda)}\left\{-\frac{\partial}{\partial x}\big[f(x, \lambda)\,p(x, \lambda)\big] + \frac{1}{2}\,\frac{\partial}{\partial x}\left[Q(x, \lambda)\,\frac{\partial p(x, \lambda)}{\partial x}\right]\right\}$$

Expanding the first term:
$$\ln h(x) - \frac{d\ln K(\lambda)}{d\lambda} = -f(x, \lambda)\,\frac{\partial \ln p(x, \lambda)}{\partial x} - \frac{\partial f(x, \lambda)}{\partial x} + \frac{1}{2\,p(x, \lambda)}\,\frac{\partial}{\partial x}\left[Q(x, \lambda)\,\frac{\partial p(x, \lambda)}{\partial x}\right]$$
Nonlinear Filters based on the Fokker-Planck Equation
SOLO Nonlinear Filters
Fred Daum
Starting from the P.D.E. for $f$ given $p$:
$$\ln h(x) - \frac{d\ln K(\lambda)}{d\lambda} = -f(x, \lambda)\,\frac{\partial \ln p(x, \lambda)}{\partial x} - \frac{\partial f(x, \lambda)}{\partial x} + \frac{1}{2\,p(x, \lambda)}\,\frac{\partial}{\partial x}\left[Q(x, \lambda)\,\frac{\partial p(x, \lambda)}{\partial x}\right]$$

differentiate this equation with respect to $x$:
$$\left(\frac{\partial \ln h(x)}{\partial x}\right)^T = -\frac{\partial^2 \ln p(x, \lambda)}{\partial x^2}\,f(x, \lambda) - \frac{\partial \ln p(x, \lambda)}{\partial x}\,\frac{\partial f(x, \lambda)}{\partial x} - \frac{\partial^2 f(x, \lambda)}{\partial x^2} + \frac{\partial}{\partial x}\left\{\frac{1}{2\,p(x, \lambda)}\,\frac{\partial}{\partial x}\left[Q(x, \lambda)\,\frac{\partial p(x, \lambda)}{\partial x}\right]\right\}$$

One option to simplify the problem is to choose $Q(x, \lambda)$ such that:
$$-\frac{\partial \ln p(x, \lambda)}{\partial x}\,\frac{\partial f(x, \lambda)}{\partial x} - \frac{\partial^2 f(x, \lambda)}{\partial x^2} + \frac{\partial}{\partial x}\left\{\frac{1}{2\,p(x, \lambda)}\,\frac{\partial}{\partial x}\left[Q(x, \lambda)\,\frac{\partial p(x, \lambda)}{\partial x}\right]\right\} = 0$$

We obtain:
$$\left(\frac{\partial \ln h(x)}{\partial x}\right)^T = -\frac{\partial^2 \ln p(x, \lambda)}{\partial x^2}\,f(x, \lambda) \quad\Rightarrow\quad f(x, \lambda) = -\left[\frac{\partial^2 \ln p(x, \lambda)}{\partial x^2}\right]^{-1}\left(\frac{\partial \ln h(x)}{\partial x}\right)^T$$
Nonlinear Filters based on the Fokker-Planck Equation
SOLO Nonlinear Filters
Fred Daum
The second option to simplify the problem is to choose $Q(x, \lambda) = 0$.

Nonlinear Filters based on the Fokker-Planck Equation

Particle Flow Equation:
$$\frac{dx}{d\lambda} = f(x, \lambda)$$

Fokker-Planck equation:
$$\frac{\partial p(x, \lambda)}{\partial \lambda} = -\frac{\partial}{\partial x}\big[f(x, \lambda)\,p(x, \lambda)\big]$$

Definition of $p(x, \lambda)$:
$$\ln p(x, \lambda) = \ln g(x) + \lambda\,\ln h(x) - \ln K(\lambda)$$
so that
$$\frac{\partial \ln p(x, \lambda)}{\partial \lambda} = \ln h(x) - \frac{d\ln K(\lambda)}{d\lambda} = -\frac{1}{p(x, \lambda)}\,\frac{\partial}{\partial x}\big[f(x, \lambda)\,p(x, \lambda)\big]$$

Define:
$$\eta(x, \lambda) := -\left[\ln h(x) - \frac{d\ln K(\lambda)}{d\lambda}\right]p(x, \lambda) \qquad \text{(known)}$$

We obtain the P.D.E. for $f$ given $p$:
$$\frac{\partial}{\partial x}\big[p(x, \lambda)\,f(x, \lambda)\big] = \eta(x, \lambda)$$
SOLO Nonlinear Filters
Fred Daum
The second option to simplify the problem is to choose $Q(x, \lambda) = 0$.

Nonlinear Filters based on the Fokker-Planck Equation

We obtained:
$$\frac{\partial}{\partial x}\big[\underbrace{p(x, \lambda)\,f(x, \lambda)}_{q(x,\lambda)}\big] = \eta(x, \lambda), \qquad \eta(x, \lambda) := -\left[\ln h(x) - \frac{d\ln K(\lambda)}{d\lambda}\right]p(x, \lambda)$$
with $q = p\,f$, $f$ an unknown function, and $p$ and $\eta$ known at random points. In components:
$$\frac{\partial q_1}{\partial x_1} + \frac{\partial q_2}{\partial x_2} + \cdots + \frac{\partial q_n}{\partial x_n} = \eta(x, \lambda)$$

1. Linear PDE in the unknown f or q.
2. Constant-coefficient PDE in q.
3. First-order PDE.
4. Highly undetermined PDE.
5. Same as the Gauss divergence law in Maxwell's equations.
6. Same as Euler's equation in fluid dynamics.
7. Existence of solutions if and only if the integral of η is zero.

Exact flow solutions for g and h Gaussian densities:
$$f(x, \lambda) = A(\lambda)\,x + b(\lambda)$$
$$A(\lambda) := -\frac{1}{2}\,P H^T\big(\lambda\,H P H^T + R\big)^{-1} H$$
$$b(\lambda) := (I + 2\lambda A)\left[(I + \lambda A)\,P H^T R^{-1} z + A\,\bar x\right]$$
Automatically stable under very mild conditions & extremely fast.
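A minimal Python sketch of the exact flow above for a linear-Gaussian case, integrating $dx/d\lambda = A(\lambda)x + b(\lambda)$ with simple Euler steps; the model matrices, the prior, and the step count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
n, Ns, n_steps = 2, 200, 30

x_bar = np.zeros(n)                          # prior mean
P = np.eye(n)                                # prior covariance
H = np.array([[1.0, 0.5]])                   # linear measurement matrix
R = np.array([[0.1]])                        # measurement-noise covariance
z = np.array([1.0])                          # assumed measurement

x = rng.multivariate_normal(x_bar, P, Ns)    # particles drawn from the prior g
dlam = 1.0 / n_steps

for step in range(n_steps):
    lam = (step + 0.5) * dlam                # midpoint of the current lambda interval
    A = -0.5 * P @ H.T @ np.linalg.inv(lam * H @ P @ H.T + R) @ H
    b = (np.eye(n) + 2.0 * lam * A) @ (
        (np.eye(n) + lam * A) @ P @ H.T @ np.linalg.inv(R) @ z + A @ x_bar)
    x = x + dlam * (x @ A.T + b)             # Euler step of dx/dlam = A x + b

print(x.mean(axis=0))                        # approximately the Kalman posterior mean
```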
Fred Daum
SOLO Nonlinear Filters
F. Daum, J. Huang, Particle Flow for Nonlinear Filters, Bayesian Decision and Transport, 7 April 2014
199
Recursive Bayesian Estimation
References:
SOLO
1. Sage, A.P. & Melsa, J.L., "Estimation Theory with Applications to Communications and Control", McGraw Hill, 1971
2. Gordon, N.J., Salmond, D.J., Smith, A.F.M., "Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation", IEE Proceedings Radar and Signal Processing, Vol. 140, No. 2, April 1993, pp. 107-113
3. Doucet, A., de Freitas, N., Gordon, N., Eds., "Sequential Monte Carlo Methods in Practice", Springer, 2001
4. Karlsson, R., "Simulation Based Methods for Target Tracking", Department of Electrical Engineering, Linköpings Universitet, 2002
5. Arulampalam, S., Maskell, S., Gordon, N., Clapp, T., "A Tutorial on Particle Filters for On-line Non-linear/Non-Gaussian Bayesian Tracking", IEEE Transactions on Signal Processing, Vol. 50, No. 2, February 2002
6. Ristic, B., Arulampalam, S., Gordon, N., "Beyond the Kalman Filter: Particle Filters for Tracking Applications", Artech House, 2004
7. Haug, A.J., "A Tutorial on Bayesian Estimation and Tracking Techniques Applicable to Nonlinear and Non-Gaussian Processes", MITRE Corporation, January 2005
200
Recursive Bayesian Estimation
References (continue – 1):
SOLO
Fred Daum, "Particle Flow for Nonlinear Filters", 19 July 2012, http://sc.enseeiht.fr/doc/Seminar_Daum_2012_2.pdf
Fred Daum, Misha Krichman, "Non-Particle Filters", https://www.ll.mit.edu/asap/asap_06/pdf/Papers/23_Daum_Pa.pdf
F. Daum, J. Huang, "Particle Flow for Nonlinear Filters, Bayesian Decisions and Transport", 7 April 2014, http://meeting.xidian.edu.cn/workshop/miis2014/uploads/files/July-5th-930am_Fred%20Daum_Particle%20flow%20for%20nonliner%20filters,%20Bayesuan%20Decisions%20and%20Transport%20.pdf
Zhe Chen, "Bayesian Filtering: From Kalman Filters to Particle Filters, and Beyond", 18.05.06, http://www.dsi.unifi.it/users/chisci/idfric/Nonlinear_filtering_Chen.pdf
Table of Content
201
SOLO
Technion - Israel Institute of Technology: 1964-1968 BSc EE; 1968-1971 MSc EE
Israeli Air Force: 1970-1974
RAFAEL - Israeli Armament Development Authority: 1974-2013
Stanford University: 1983-1986 PhD AA
202
"Proceedings of the IEEE", March 2004, Special Issue on "Sequential State Estimation: From Kalman Filters to Particle Filters":
Julier, S.J. and Uhlmann, J.K., "Unscented Filtering and Nonlinear Estimation", pp. 401-422
Recursive Bayesian Estimation
203
SOLO
M. Sanjeev Arulampalam, Neil Gordon, Simon Maskell, Tim Clapp, Nando de Freitas, Arnaud Doucet
Branko Ristic
Genshiro Kitagawa, Christophe Andrieu
Dan Crişan, Fred Daum
Recursive Bayesian Estimation
204
SOLO Markov Chain Monte Carlo (MCMC)
Some MCMC Developments Related to Vision
Nicholas Constantine Metropolis (1915 - 1999)

[Timeline: Metropolis 1946; Hastings 1970; Heat bath; Miller, Grenander 1994; Green 1995; DDMCMC 2001-2005; Waltz 1972 (labeling); Rosenfeld, Hummel, Zucker 1976 (relaxation); Geman brothers 1984 (Gibbs sampler); Kirkpatrick 1983; Swendsen-Wang 1987 (clustering); Swendsen-Wang Cut 2003]
205
SOLO Markov Chain Monte Carlo (MCMC)
A Brief History of MCMC
Nicholas Constantine Metropolis ( 1915 – 1999)
1942-1946: Real use of Monte Carlo started during WWII, with the study of the atomic bomb (neutron diffusion in fissile material).
1948: Fermi, Metropolis, and Ulam obtained Monte Carlo estimates for the eigenvalues of the Schrödinger equation.
1950s: Formulation of the basic constructions of MCMC, e.g. the Metropolis method, with applications to statistical-physics models such as the Ising model.
1960-80: Use of MCMC to study phase transitions; material growth/defects, macromolecules (polymers), etc.
1980s: Gibbs sampler (Geman brothers), simulated annealing, data augmentation, Swendsen-Wang, etc.; global optimization; image and speech; quantum field theory.
1990s: Applications in genetics; computational biology.
206
SOLO Rao – Blackwell Theorem
The Rao-Blackwell Theorem provides a process by which a possible improvement in the efficiency of an estimator can be obtained by taking its conditional expectation with respect to a sufficient statistic.

The result for one parameter appeared in Rao (1945) and in Blackwell (1947). Lehmann and Scheffé (1950) called the result the Rao-Blackwell Theorem (RBT), and the process is described as Rao-Blackwellization (RB) by Berkson (1955). In computational terminology it is called the Rao-Blackwellized Filter (RBF).

Calyampudi Radhakrishna Rao and David Blackwell

The Rao-Blackwell Theorem states that if g(x) is any kind of estimator of a parameter θ, then the conditional expectation of g(x) given T(x), where T(x) is a sufficient statistic, is typically a better estimator of θ, and is never worse. Let x = (x_1, ..., x_n) be a random sample from a probability distribution p(x, θ), where θ = (θ_1, ..., θ_q) is an unknown vector parameter. Consider an estimator g(x) = (g_1(x), ..., g_q(x)) of θ and the q×q mean square and product matrix C(g):
$$C(g) = (c_{ij}) = \big(E\{[g_i(x) - \theta_i][g_j(x) - \theta_j]\}\big)$$

Let S be a sufficient statistic, which may be vector valued, such that the conditional expectation E{g|S} = T(x) is independent of θ. A general version of the Rao-Blackwell Theorem is:
$$C(g) - C(T) \text{ is nonnegative definite}$$