Chapter 5
Estimation Theory and Applications
References:
S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice Hall, 1993
Application Areas
1. Radar
A radar system transmits an electromagnetic pulse $s(n)$. It is reflected by an aircraft, causing an echo $r(n)$ to be received after $\tau_0$ seconds:

$r(n) = \alpha s(n - \tau_0) + w(n)$

where the range $R$ of the aircraft is related to the time delay by

$\tau_0 = 2R/c$
2. Mobile Communications
The position of the mobile terminal can be estimated using the time-of-arrival measurements received at the base stations.
Differences from Detection
1. Radar
A radar system transmits an electromagnetic pulse $s(n)$. After some time, it receives a signal $r(n)$. The detection problem is to decide whether $r(n)$ is an echo from an object or not.
2. Communications
In wired or wireless communications, we need to know the information sent from the transmitter to the receiver.

e.g., a binary phase shift keying (BPSK) signal consists of only two symbols, "0" and "1". The detection problem is to decide whether the received symbol is "0" or "1".
4. Image Processing
Fingerprint authentication: given a fingerprint image whose owner claims to be "A", we need to verify whether the claim is true or not
Other biometric examples include face authentication, iris authentication,
etc.
5. Biomedical Engineering
17 Jan. 2003, Hong Kong Economics Times
e.g., given some X-ray slides, the detection problem is to determine whether the patient has breast cancer or not
6. Seismology
To detect whether or not there is oil in a region
Estimate the value of a resistance $R$ from a set of voltage and current readings:

$V[n] = V_{\rm actual} + w_1[n], \quad I[n] = I_{\rm actual} + w_2[n], \quad n = 0, 1, \ldots, N-1$

Given pairs of $(V[n], I[n])$, we need to estimate the resistance $R$; ideally, $R = V/I$.

⇒ the parameter is not directly observed in the received signals
Estimate the position of the mobile terminal using time-of-arrival measurements:

$r[n] = \dfrac{\sqrt{(x_s - x_n)^2 + (y_s - y_n)^2}}{c} + w[n], \quad n = 0, 1, \ldots, N-1$

Given $r[n]$, we need to find the mobile position $(x_s, y_s)$, where $c$ is the signal propagation speed and $(x_n, y_n)$ represents the known position of the $n$th base station.

⇒ the parameters are not directly observed in the received signals
Parameter is unknown deterministic or random

Unknown deterministic: constant but unknown (classical approach), e.g., the DC value $A$ is an unknown constant

Random: random variable with prior knowledge of its PDF (Bayesian approach), e.g., if we have prior knowledge that the DC value is bounded by $-A_0$ and $A_0$ with a particular PDF ⇒ better estimate

Parameter is stationary or changing

Stationary: unknown deterministic for the whole observation period, e.g., the time-of-arrivals of a static source

Changing: unknown deterministic at different time instants, e.g., the time-of-arrivals of a moving source
Performance Measures for Classical Parameter Estimation
Accuracy:
Is the estimator biased or unbiased?
e.g., $x[n] = A + w[n], \quad n = 0, 1, \ldots, N-1$

where $w[n]$ is a zero-mean random noise with variance $\sigma_w^2$

Proposed estimators:

$\hat{A}_1 = x[0]$

$\hat{A}_2 = \dfrac{1}{N}\sum_{n=0}^{N-1} x[n]$

$\hat{A}_3 = \dfrac{1}{N-1}\sum_{n=0}^{N-1} x[n]$

$\hat{A}_4 = \left(\prod_{n=0}^{N-1} x[n]\right)^{1/N} = \left(x[0]\cdot x[1]\cdots x[N-1]\right)^{1/N}$
Biased: $E\{\hat{A}\} \ne A$

Unbiased: $E\{\hat{A}\} = A$

Asymptotically unbiased: $E\{\hat{A}\} = A$ only if $N \to \infty$

Taking the expected values for $\hat{A}_1$, $\hat{A}_2$ and $\hat{A}_3$, we have

$E\{\hat{A}_1\} = E\{x[0]\} = E\{A + w[0]\} = A + 0 = A$

$E\{\hat{A}_2\} = E\left\{\dfrac{1}{N}\sum_{n=0}^{N-1} x[n]\right\} = \dfrac{1}{N}\cdot N\cdot A + \dfrac{1}{N}\sum_{n=0}^{N-1} E\{w[n]\} = A + 0 = A$

$E\{\hat{A}_3\} = \dfrac{1}{1 - 1/N}\cdot A = \dfrac{N}{N-1}\cdot A$

Q. State the biasedness of $\hat{A}_1$, $\hat{A}_2$ and $\hat{A}_3$.
For $\hat{A}_4$, it is difficult to analyze the biasedness. However, for $w[n] = 0$:

$\left(x[0]\cdot x[1]\cdots x[N-1]\right)^{1/N} = (A\cdot A\cdots A)^{1/N} = (A^N)^{1/N} = A$

What is the value of the mean square error or variance?

They correspond to the fluctuation of the estimate in the second order:

$\text{MSE} = E\{(\hat{A} - A)^2\}$   (5.1)

$\text{var} = E\{(\hat{A} - E\{\hat{A}\})^2\}$   (5.2)

If the estimator is unbiased, then MSE = var
An optimum estimator should give estimates which are

Unbiased

Minimum variance (and hence minimum MSE as well)

Q. How do we know the estimator has the minimum variance?

Cramer-Rao Lower Bound (CRLB)

Performance bound in terms of the minimum achievable variance provided by any unbiased estimator

Used for classical parameter estimation

Requires knowledge of the noise PDF, and the PDF must have a closed form

Easier to determine than other variance bounds
Let the parameters to be estimated be $\boldsymbol{\theta} = [\theta_1, \theta_2, \ldots, \theta_P]^T$. The CRLB for $\theta_i$ in Gaussian noise is stated as follows:

$\text{CRLB}(\theta_i) = \left[\mathbf{J}(\boldsymbol{\theta})\right]_{i,i} = \left[\mathbf{I}^{-1}(\boldsymbol{\theta})\right]_{i,i}$   (5.4)

where

$\mathbf{I}(\boldsymbol{\theta}) = -\begin{bmatrix} E\left\{\dfrac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial \theta_1^2}\right\} & E\left\{\dfrac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial \theta_1 \partial \theta_2}\right\} & \cdots & E\left\{\dfrac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial \theta_1 \partial \theta_P}\right\} \\ \vdots & \ddots & & \vdots \\ E\left\{\dfrac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial \theta_P \partial \theta_1}\right\} & \cdots & & E\left\{\dfrac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial \theta_P^2}\right\} \end{bmatrix}$   (5.5)
Review of Gaussian (Normal) Distribution
The Gaussian PDF for a scalar random variable $x$ is defined as

$p(x) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\dfrac{1}{2\sigma^2}(x-\mu)^2\right)$   (5.6)

We can write $x \sim \mathcal{N}(\mu, \sigma^2)$

The Gaussian PDF for a random vector $\mathbf{x}$ of size $N$ is defined as

$p(\mathbf{x}) = \dfrac{1}{(2\pi)^{N/2}\det^{1/2}(\mathbf{C})}\exp\left(-\dfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\mathbf{C}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$   (5.7)

We can write $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{C})$
The covariance matrix $\mathbf{C}$ has the form of

$\mathbf{C} = E\{(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T\} = \begin{bmatrix} E\{(x[0]-\mu_0)^2\} & \cdots & E\{(x[0]-\mu_0)(x[N-1]-\mu_{N-1})\} \\ \vdots & \ddots & \vdots \\ E\{(x[N-1]-\mu_{N-1})(x[0]-\mu_0)\} & \cdots & E\{(x[N-1]-\mu_{N-1})^2\} \end{bmatrix}$   (5.8)

where

$\mathbf{x} = [x[0], x[1], \ldots, x[N-1]]^T$

$\boldsymbol{\mu} = E\{\mathbf{x}\} = [\mu_0, \mu_1, \ldots, \mu_{N-1}]^T$
If $\mathbf{x}$ is a zero-mean white vector and all vector elements have variance $\sigma^2$, then

$\mathbf{C} = E\{(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T\} = \sigma^2\cdot\mathbf{I}_N = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \sigma^2 \end{bmatrix}$

The Gaussian PDF for the random vector $\mathbf{x}$ can then be simplified as

$p(\mathbf{x}) = \dfrac{1}{(2\pi\sigma^2)^{N/2}}\exp\left(-\dfrac{1}{2\sigma^2}\sum_{n=0}^{N-1} x^2[n]\right)$   (5.9)

with the use of $\mathbf{C}^{-1} = \sigma^{-2}\cdot\mathbf{I}_N$ and $\det(\mathbf{C}) = (\sigma^2)^N = \sigma^{2N}$
Example 5.1
Determine the PDF of

$x[0] = A + w[0]$

and

$x[n] = A + w[n], \quad n = 0, 1, \ldots, N-1$

where $\{w[n]\}$ is a white Gaussian process with known variance $\sigma_w^2$ and $A$ is a constant.

$p(x[0]; A) = \dfrac{1}{\sqrt{2\pi\sigma_w^2}}\exp\left(-\dfrac{1}{2\sigma_w^2}(x[0]-A)^2\right)$

$p(\mathbf{x}; A) = \dfrac{1}{(2\pi\sigma_w^2)^{N/2}}\exp\left(-\dfrac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right)$
Example 5.2
Find the CRLB for $A$ based on a single measurement:

$x[0] = A + w[0]$

$p(x[0]; A) = \dfrac{1}{\sqrt{2\pi\sigma_w^2}}\exp\left(-\dfrac{1}{2\sigma_w^2}(x[0]-A)^2\right)$

$\Rightarrow \ln(p(x[0]; A)) = -\ln\left(\sqrt{2\pi\sigma_w^2}\right) - \dfrac{1}{2\sigma_w^2}(x[0]-A)^2$

$\Rightarrow \dfrac{\partial \ln(p(x[0]; A))}{\partial A} = -\dfrac{1}{2\sigma_w^2}\cdot 2\cdot(x[0]-A)\cdot(-1) = \dfrac{x[0]-A}{\sigma_w^2}$

$\Rightarrow \dfrac{\partial^2 \ln(p(x[0]; A))}{\partial A^2} = -\dfrac{1}{\sigma_w^2}$
As a result,
$-E\left\{\dfrac{\partial^2 \ln(p(x[0]; A))}{\partial A^2}\right\} = \dfrac{1}{\sigma_w^2}$

$\Rightarrow I(A) = \dfrac{1}{\sigma_w^2} \Rightarrow J(A) = \text{CRLB}(A) = \sigma_w^2$

This means the best we can do is to achieve estimator variance $= \sigma_w^2$, or

$\text{var}(\hat{A}) \ge \sigma_w^2$

where $\hat{A}$ is any unbiased estimator for estimating $A$
We also observe that a simple unbiased estimator
$\hat{A}_1 = x[0]$

achieves the CRLB:

$E\{(\hat{A}_1 - A)^2\} = E\{(x[0]-A)^2\} = E\{(A + w[0] - A)^2\} = E\{w^2[0]\} = \sigma_w^2$

Example 5.3

Find the CRLB for $A$ based on $N$ measurements:

$x[n] = A + w[n], \quad n = 0, 1, \ldots, N-1$

$p(\mathbf{x}; A) = \dfrac{1}{(2\pi\sigma_w^2)^{N/2}}\exp\left(-\dfrac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right)$
$\Rightarrow \ln(p(\mathbf{x}; A)) = -\ln\left((2\pi\sigma_w^2)^{N/2}\right) - \dfrac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}(x[n]-A)^2$

$\Rightarrow \dfrac{\partial \ln(p(\mathbf{x}; A))}{\partial A} = -\dfrac{1}{2\sigma_w^2}\cdot 2\cdot\sum_{n=0}^{N-1}(x[n]-A)\cdot(-1) = \dfrac{1}{\sigma_w^2}\sum_{n=0}^{N-1}(x[n]-A)$

$\Rightarrow \dfrac{\partial^2 \ln(p(\mathbf{x}; A))}{\partial A^2} = -\dfrac{N}{\sigma_w^2}$

$\Rightarrow -E\left\{\dfrac{\partial^2 \ln(p(\mathbf{x}; A))}{\partial A^2}\right\} = \dfrac{N}{\sigma_w^2}$
As a result,
$I(A) = \dfrac{N}{\sigma_w^2} \Rightarrow J(A) = \text{CRLB}(A) = \dfrac{\sigma_w^2}{N}$

This means the best we can do is to achieve estimator variance $= \sigma_w^2/N$, or

$\text{var}(\hat{A}) \ge \dfrac{\sigma_w^2}{N}$

where $\hat{A}$ is any unbiased estimator for estimating $A$
We also observe that a simple unbiased estimator
$\hat{A}_1 = x[0]$

does not achieve the CRLB:

$E\{(\hat{A}_1 - A)^2\} = E\{(x[0]-A)^2\} = E\{(A + w[0] - A)^2\} = E\{w^2[0]\} = \sigma_w^2$

On the other hand, the sample mean estimator

$\hat{A}_2 = \dfrac{1}{N}\sum_{n=0}^{N-1} x[n]$

achieves the CRLB:

$E\{(\hat{A}_2 - A)^2\} = E\left\{\left(\dfrac{1}{N}\sum_{n=0}^{N-1} x[n] - A\right)^2\right\} = E\left\{\left(\dfrac{1}{N}\sum_{n=0}^{N-1} w[n]\right)^2\right\} = \dfrac{\sigma_w^2}{N}$
⇒ sample mean is the optimum estimator for white Gaussian noise
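A minimal MATLAB sketch (illustrative, not from the slides; all values assumed) to verify that the sample mean variance matches the CRLB $\sigma_w^2/N$:

% Compare the empirical variance of the sample mean with the CRLB
A = 1; N = 50; sw2 = 2; M = 20000;        % assumed values
Ahat = zeros(M,1);
for m = 1:M
    x = A + sqrt(sw2)*randn(N,1);
    Ahat(m) = mean(x);                    % sample mean estimate
end
[var(Ahat) sw2/N]                         % empirical variance vs CRLB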
Example 5.4
Find the CRLB for $A$ and $\sigma_w^2$ given $\{x[n]\}$:

$x[n] = A + w[n], \quad n = 0, 1, \ldots, N-1$

Here $\boldsymbol{\theta} = [A, \sigma_w^2]^T$ and

$p(\mathbf{x}; \boldsymbol{\theta}) = \dfrac{1}{(2\pi\sigma_w^2)^{N/2}}\exp\left(-\dfrac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right)$

$\Rightarrow \ln(p(\mathbf{x}; \boldsymbol{\theta})) = -\dfrac{N}{2}\ln(2\pi) - \dfrac{N}{2}\ln(\sigma_w^2) - \dfrac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}(x[n]-A)^2$

$\Rightarrow \dfrac{\partial \ln(p(\mathbf{x}; \boldsymbol{\theta}))}{\partial A} = \dfrac{1}{\sigma_w^2}\sum_{n=0}^{N-1}(x[n]-A)$
$\Rightarrow \dfrac{\partial^2 \ln(p(\mathbf{x}; \boldsymbol{\theta}))}{\partial A^2} = -\dfrac{N}{\sigma_w^2}$

$\Rightarrow -E\left\{\dfrac{\partial^2 \ln(p(\mathbf{x}; \boldsymbol{\theta}))}{\partial A^2}\right\} = \dfrac{N}{\sigma_w^2}$

$\Rightarrow \dfrac{\partial^2 \ln(p(\mathbf{x}; \boldsymbol{\theta}))}{\partial A\,\partial \sigma_w^2} = -\dfrac{1}{\sigma_w^4}\sum_{n=0}^{N-1}(x[n]-A) = -\dfrac{1}{\sigma_w^4}\sum_{n=0}^{N-1} w[n]$

$\Rightarrow -E\left\{\dfrac{\partial^2 \ln(p(\mathbf{x}; \boldsymbol{\theta}))}{\partial A\,\partial \sigma_w^2}\right\} = \dfrac{1}{\sigma_w^4}\sum_{n=0}^{N-1} E\{w[n]\} = 0$
$\ln(p(\mathbf{x}; \boldsymbol{\theta})) = -\dfrac{N}{2}\ln(2\pi) - \dfrac{N}{2}\ln(\sigma_w^2) - \dfrac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}(x[n]-A)^2$

$\Rightarrow \dfrac{\partial \ln(p(\mathbf{x}; \boldsymbol{\theta}))}{\partial \sigma_w^2} = -\dfrac{N}{2\sigma_w^2} + \dfrac{1}{2\sigma_w^4}\sum_{n=0}^{N-1}(x[n]-A)^2$

$\Rightarrow \dfrac{\partial^2 \ln(p(\mathbf{x}; \boldsymbol{\theta}))}{\partial (\sigma_w^2)^2} = \dfrac{N}{2\sigma_w^4} - \dfrac{1}{\sigma_w^6}\sum_{n=0}^{N-1}(w[n])^2$

$\Rightarrow -E\left\{\dfrac{\partial^2 \ln(p(\mathbf{x}; \boldsymbol{\theta}))}{\partial (\sigma_w^2)^2}\right\} = -\dfrac{N}{2\sigma_w^4} + \dfrac{1}{\sigma_w^6}\cdot N\sigma_w^2 = \dfrac{N}{2\sigma_w^4}$

$\mathbf{I}(\boldsymbol{\theta}) = \begin{bmatrix} \dfrac{N}{\sigma_w^2} & 0 \\ 0 & \dfrac{N}{2\sigma_w^4} \end{bmatrix}$
$\mathbf{J}(\boldsymbol{\theta}) = \mathbf{I}^{-1}(\boldsymbol{\theta}) = \begin{bmatrix} \dfrac{\sigma_w^2}{N} & 0 \\ 0 & \dfrac{2\sigma_w^4}{N} \end{bmatrix}$

$\Rightarrow \text{CRLB}(A) = \dfrac{\sigma_w^2}{N}, \quad \text{CRLB}(\sigma_w^2) = \dfrac{2\sigma_w^4}{N}$

⇒ the CRLBs for $A$ with unknown and with known noise power are identical

Q. The CRLB for $A$ is not affected by knowledge of the noise power. Why?

Q. Can you suggest a method to estimate $\sigma_w^2$?
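As a hedged illustration of the last question, the sample variance is one candidate estimator (it is in fact the ML estimator derived later in Example 5.12). A minimal MATLAB sketch (assumed values, not from the slides) compares its spread with $\text{CRLB}(\sigma_w^2) = 2\sigma_w^4/N$:

% Estimate sigma_w^2 by the sample variance and compare with its CRLB
A = 1; N = 100; sw2 = 2; M = 20000;       % assumed values
s2hat = zeros(M,1);
for m = 1:M
    x = A + sqrt(sw2)*randn(N,1);
    s2hat(m) = mean((x - mean(x)).^2);    % sample variance with 1/N normalization
end
[var(s2hat) 2*sw2^2/N]                    % empirical variance vs CRLB (approximately equal)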
Example 5.5
Find the CRLB for the phase of a sinusoid in white Gaussian noise:

$x[n] = A\cos(\omega_0 n + \phi) + w[n], \quad n = 0, 1, \ldots, N-1$

where $A$ and $\omega_0$ are assumed known. The PDF is

$p(\mathbf{x}; \phi) = \dfrac{1}{(2\pi\sigma_w^2)^{N/2}}\exp\left(-\dfrac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n + \phi)\right)^2\right)$

$\Rightarrow \ln(p(\mathbf{x}; \phi)) = -\ln\left((2\pi\sigma_w^2)^{N/2}\right) - \dfrac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n + \phi)\right)^2$
$\dfrac{\partial \ln(p(\mathbf{x}; \phi))}{\partial \phi} = -\dfrac{A}{\sigma_w^2}\sum_{n=0}^{N-1}\left(x[n]\sin(\omega_0 n + \phi) - \dfrac{A}{2}\sin(2\omega_0 n + 2\phi)\right)$

$\dfrac{\partial^2 \ln(p(\mathbf{x}; \phi))}{\partial \phi^2} = -\dfrac{A}{\sigma_w^2}\sum_{n=0}^{N-1}\left(x[n]\cos(\omega_0 n + \phi) - A\cos(2\omega_0 n + 2\phi)\right)$

$E\left\{\dfrac{\partial^2 \ln(p(\mathbf{x}; \phi))}{\partial \phi^2}\right\} = -\dfrac{A}{\sigma_w^2}\sum_{n=0}^{N-1}\left(A\cos^2(\omega_0 n + \phi) - A\cos(2\omega_0 n + 2\phi)\right)$

$= -\dfrac{A^2}{\sigma_w^2}\sum_{n=0}^{N-1}\left(\dfrac{1}{2} + \dfrac{1}{2}\cos(2\omega_0 n + 2\phi) - \cos(2\omega_0 n + 2\phi)\right)$
$E\left\{\dfrac{\partial^2 \ln(p(\mathbf{x}; \phi))}{\partial \phi^2}\right\} = -\dfrac{NA^2}{2\sigma_w^2} + \dfrac{A^2}{2\sigma_w^2}\sum_{n=0}^{N-1}\cos(2\omega_0 n + 2\phi)$

As a result,

$\text{CRLB}(\phi) = \left(\dfrac{NA^2}{2\sigma_w^2} - \dfrac{A^2}{2\sigma_w^2}\sum_{n=0}^{N-1}\cos(2\omega_0 n + 2\phi)\right)^{-1} = \dfrac{2\sigma_w^2}{NA^2}\left(1 - \dfrac{1}{N}\sum_{n=0}^{N-1}\cos(2\omega_0 n + 2\phi)\right)^{-1}$

If $N \gg 1$, $\dfrac{1}{N}\sum_{n=0}^{N-1}\cos(2\omega_0 n + 2\phi) \approx 0$, then

$\text{CRLB}(\phi) \approx \dfrac{2\sigma_w^2}{NA^2}$
Example 5.6

Find the CRLB for $A$, $\omega_0$ and $\phi$ jointly in $x[n] = A\cos(\omega_0 n + \phi) + w[n]$, i.e., $\boldsymbol{\theta} = [A, \omega_0, \phi]^T$. Using the same PDF as above:

$\dfrac{\partial^2 \ln(p(\mathbf{x}; \boldsymbol{\theta}))}{\partial A^2} = -\dfrac{1}{\sigma_w^2}\sum_{n=0}^{N-1}\cos^2(\omega_0 n + \phi) = -\dfrac{1}{\sigma_w^2}\sum_{n=0}^{N-1}\left(\dfrac{1}{2} + \dfrac{1}{2}\cos(2\omega_0 n + 2\phi)\right) \approx -\dfrac{N}{2\sigma_w^2}$

$\Rightarrow E\left\{\dfrac{\partial^2 \ln(p(\mathbf{x}; \boldsymbol{\theta}))}{\partial A^2}\right\} \approx -\dfrac{N}{2\sigma_w^2}$

Similarly,

$E\left\{\dfrac{\partial^2 \ln(p(\mathbf{x}; \boldsymbol{\theta}))}{\partial A\,\partial \omega_0}\right\} = \dfrac{A}{2\sigma_w^2}\sum_{n=0}^{N-1} n\sin(2\omega_0 n + 2\phi) \approx 0$

$E\left\{\dfrac{\partial^2 \ln(p(\mathbf{x}; \boldsymbol{\theta}))}{\partial A\,\partial \phi}\right\} = \dfrac{A}{2\sigma_w^2}\sum_{n=0}^{N-1}\sin(2\omega_0 n + 2\phi) \approx 0$
$E\left\{\dfrac{\partial^2 \ln(p(\mathbf{x}; \boldsymbol{\theta}))}{\partial \omega_0^2}\right\} = -\dfrac{A^2}{\sigma_w^2}\sum_{n=0}^{N-1} n^2\sin^2(\omega_0 n + \phi) \approx -\dfrac{A^2}{2\sigma_w^2}\sum_{n=0}^{N-1} n^2$

$E\left\{\dfrac{\partial^2 \ln(p(\mathbf{x}; \boldsymbol{\theta}))}{\partial \omega_0\,\partial \phi}\right\} = -\dfrac{A^2}{\sigma_w^2}\sum_{n=0}^{N-1} n\sin^2(\omega_0 n + \phi) \approx -\dfrac{A^2}{2\sigma_w^2}\sum_{n=0}^{N-1} n$

$E\left\{\dfrac{\partial^2 \ln(p(\mathbf{x}; \boldsymbol{\theta}))}{\partial \phi^2}\right\} = -\dfrac{A^2}{\sigma_w^2}\sum_{n=0}^{N-1}\sin^2(\omega_0 n + \phi) \approx -\dfrac{NA^2}{2\sigma_w^2}$

$\mathbf{I}(\boldsymbol{\theta}) \approx \dfrac{1}{\sigma_w^2}\begin{bmatrix} \dfrac{N}{2} & 0 & 0 \\ 0 & \dfrac{A^2}{2}\sum_{n=0}^{N-1} n^2 & \dfrac{A^2}{2}\sum_{n=0}^{N-1} n \\ 0 & \dfrac{A^2}{2}\sum_{n=0}^{N-1} n & \dfrac{NA^2}{2} \end{bmatrix}$
After matrix inversion, we have

$\text{CRLB}(A) \approx \dfrac{2\sigma_w^2}{N}$

$\text{CRLB}(\omega_0) \approx \dfrac{12}{\text{SNR}\cdot N(N^2-1)}, \quad \text{SNR} = \dfrac{A^2}{2\sigma_w^2}$

$\text{CRLB}(\phi) \approx \dfrac{2(2N-1)}{\text{SNR}\cdot N(N+1)}$

Note that

$\text{CRLB}(\phi) \approx \dfrac{2(2N-1)}{\text{SNR}\cdot N(N+1)} \approx \dfrac{4}{\text{SNR}\cdot N} > \dfrac{1}{\text{SNR}\cdot N} = \dfrac{2\sigma_w^2}{NA^2}$

⇒ In general, the CRLB increases as the number of parameters to be estimated increases

⇒ The CRLB decreases as the number of samples increases
Parameter Transformation in CRLB
Find the CRLB for $\alpha = g(\theta)$ where $g(\cdot)$ is a function.

e.g., $x[n] = A + w[n], \quad n = 0, 1, \ldots, N-1$

What is the CRLB for $A^2$?

The CRLB for the parameter transformation $\alpha = g(\theta)$ is given by

$\text{CRLB}(\alpha) = \dfrac{\left(\dfrac{\partial g(\theta)}{\partial \theta}\right)^2}{-E\left\{\dfrac{\partial^2 \ln(p(\mathbf{x};\theta))}{\partial \theta^2}\right\}}$   (5.10)

For a nonlinear function, "$=$" is replaced by "$\approx$", and it is true only for large $N$
Example 5.7
Find the CRLB for the power of the DC value, i.e., $A^2$:

$x[n] = A + w[n], \quad n = 0, 1, \ldots, N-1$

$\alpha = g(A) = A^2 \Rightarrow \dfrac{\partial g(A)}{\partial A} = 2A \Rightarrow \left(\dfrac{\partial g(A)}{\partial A}\right)^2 = 4A^2$

From Example 5.3, we have

$-E\left\{\dfrac{\partial^2 \ln(p(\mathbf{x};A))}{\partial A^2}\right\} = \dfrac{N}{\sigma_w^2}$

As a result,

$\text{CRLB}(A^2) \approx 4A^2\cdot\dfrac{\sigma_w^2}{N} = \dfrac{4A^2\sigma_w^2}{N}, \quad N \gg 1$
Example 5.8
Find the CRLB for $\alpha = c_1 A + c_2$ from

$x[n] = A + w[n], \quad n = 0, 1, \ldots, N-1$

$\alpha = g(A) = c_1 A + c_2 \Rightarrow \dfrac{\partial g(A)}{\partial A} = c_1 \Rightarrow \left(\dfrac{\partial g(A)}{\partial A}\right)^2 = c_1^2$

As a result,

$\text{CRLB}(\alpha) = c_1^2\cdot\text{CRLB}(A) = \dfrac{c_1^2\sigma_w^2}{N}$
Maximum Likelihood Estimation
Parameter estimation is achieved via maximizing the likelihood function

An optimum realizable approach that can give performance close to the CRLB

Used for classical parameter estimation

Requires knowledge of the noise PDF, and the PDF must have a closed form

Generally computationally demanding

Let $p(\mathbf{x}; \boldsymbol{\theta})$ be the PDF of the observed vector $\mathbf{x}$ parameterized by the parameter vector $\boldsymbol{\theta}$. The maximum likelihood (ML) estimate is

$\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} p(\mathbf{x}; \boldsymbol{\theta})$   (5.11)
Example 5.9
Given

$x[n] = A + w[n], \quad n = 0, 1, \ldots, N-1$

where $A$ is an unknown constant and $w[n]$ is a white Gaussian noise with known variance $\sigma_w^2$. Find the ML estimate of $A$.

$p(\mathbf{x}; A) = \dfrac{1}{(2\pi\sigma_w^2)^{N/2}}\exp\left(-\dfrac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right)$

Since $\arg\max_{\boldsymbol{\theta}} p(\mathbf{x};\boldsymbol{\theta}) = \arg\max_{\boldsymbol{\theta}} \ln(p(\mathbf{x};\boldsymbol{\theta}))$, taking the logarithm of $p(\mathbf{x}; A)$ gives

$\ln(p(\mathbf{x}; A)) = -\ln\left((2\pi\sigma_w^2)^{N/2}\right) - \dfrac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}(x[n]-A)^2$
Differentiating with respect to $A$ yields

$\dfrac{\partial \ln(p(\mathbf{x}; A))}{\partial A} = \dfrac{1}{\sigma_w^2}\sum_{n=0}^{N-1}(x[n]-A)$

$\hat{A} = \arg\max_A \ln(p(\mathbf{x}; A))$ is determined from

$\dfrac{1}{\sigma_w^2}\sum_{n=0}^{N-1}(x[n]-\hat{A}) = 0 \Rightarrow \sum_{n=0}^{N-1}(x[n]-\hat{A}) = 0 \Rightarrow \hat{A} = \dfrac{1}{N}\sum_{n=0}^{N-1} x[n]$

Note that

The ML estimate is identical to the sample mean

It attains the CRLB

Q. How about if $\sigma_w^2$ is unknown?
Example 5.10
Find the ML estimate for the phase of a sinusoid in white Gaussian noise:

$x[n] = A\cos(\omega_0 n + \phi) + w[n], \quad n = 0, 1, \ldots, N-1$

where $A$ and $\omega_0$ are assumed known. The PDF is

$p(\mathbf{x}; \phi) = \dfrac{1}{(2\pi\sigma_w^2)^{N/2}}\exp\left(-\dfrac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n + \phi)\right)^2\right)$

$\Rightarrow \ln(p(\mathbf{x}; \phi)) = -\ln\left((2\pi\sigma_w^2)^{N/2}\right) - \dfrac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n + \phi)\right)^2$
It is obvious that the maximum of $p(\mathbf{x}; \phi)$ or $\ln(p(\mathbf{x}; \phi))$ corresponds to the minimum of

$\dfrac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n + \phi)\right)^2$ or $\sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n + \phi)\right)^2$

Differentiating with respect to $\phi$ and then setting the result to zero:

$\sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n + \phi)\right)\sin(\omega_0 n + \phi) = \sum_{n=0}^{N-1}\left(x[n]\sin(\omega_0 n + \phi) - \dfrac{A}{2}\sin(2\omega_0 n + 2\phi)\right) = 0$

$\Rightarrow \sum_{n=0}^{N-1} x[n]\sin(\omega_0 n + \hat{\phi}) = \dfrac{A}{2}\sum_{n=0}^{N-1}\sin(2\omega_0 n + 2\hat{\phi})$

The ML estimate for $\phi$ is determined from the root of the above equation.

Q. Any ideas to solve the nonlinear equation?
An approximate ML (AML) solution may exist, depending on the structure of the ML expression. For example, there exists an AML solution for $\phi$:
$\sum_{n=0}^{N-1} x[n]\sin(\omega_0 n + \hat{\phi}) = \dfrac{A}{2}\sum_{n=0}^{N-1}\sin(2\omega_0 n + 2\hat{\phi})$

$\Rightarrow \dfrac{1}{N}\sum_{n=0}^{N-1} x[n]\sin(\omega_0 n + \hat{\phi}) = \dfrac{A}{2}\cdot\dfrac{1}{N}\sum_{n=0}^{N-1}\sin(2\omega_0 n + 2\hat{\phi}) \approx \dfrac{A}{2}\cdot 0 = 0, \quad N \gg 1$
The AML solution is obtained from
$\sum_{n=0}^{N-1} x[n]\sin(\omega_0 n + \hat{\phi}) = 0$

$\Rightarrow \cos(\hat{\phi})\sum_{n=0}^{N-1} x[n]\sin(\omega_0 n) + \sin(\hat{\phi})\sum_{n=0}^{N-1} x[n]\cos(\omega_0 n) = 0$

$\Rightarrow \sin(\hat{\phi})\cdot\sum_{n=0}^{N-1} x[n]\cos(\omega_0 n) = -\cos(\hat{\phi})\cdot\sum_{n=0}^{N-1} x[n]\sin(\omega_0 n)$
$\hat{\phi} = -\tan^{-1}\left(\dfrac{\sum_{n=0}^{N-1} x[n]\sin(\omega_0 n)}{\sum_{n=0}^{N-1} x[n]\cos(\omega_0 n)}\right)$
In fact, the AML solution is reasonable:
$\hat{\phi} = -\tan^{-1}\left(\dfrac{\sum_{n=0}^{N-1}\left(A\cos(\omega_0 n + \phi) + w[n]\right)\sin(\omega_0 n)}{\sum_{n=0}^{N-1}\left(A\cos(\omega_0 n + \phi) + w[n]\right)\cos(\omega_0 n)}\right)$

$\approx -\tan^{-1}\left(\dfrac{-\dfrac{NA}{2}\sin(\phi) + \sum_{n=0}^{N-1} w[n]\sin(\omega_0 n)}{\dfrac{NA}{2}\cos(\phi) + \sum_{n=0}^{N-1} w[n]\cos(\omega_0 n)}\right), \quad N \gg 1$

$= \tan^{-1}\left(\dfrac{\dfrac{NA}{2}\sin(\phi) - \sum_{n=0}^{N-1} w[n]\sin(\omega_0 n)}{\dfrac{NA}{2}\cos(\phi) + \sum_{n=0}^{N-1} w[n]\cos(\omega_0 n)}\right)$
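A minimal MATLAB sketch of the closed-form AML phase estimator (values assumed for illustration; atan2 is used here instead of the plain arctangent of the slides to avoid the quadrant ambiguity):

% AML phase estimate: phi_hat = -atan( sum(x.*sin(w0*n)) / sum(x.*cos(w0*n)) )
N = 100; n = 0:N-1;
w0 = 0.2*pi; A = sqrt(2); phi = 0.3*pi;               % assumed signal parameters
x = A*cos(w0*n + phi) + sqrt(0.1)*randn(1,N);         % noisy observations
phi_hat = -atan2(sum(x.*sin(w0*n)), sum(x.*cos(w0*n)))  % should be close to 0.3*pi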
For parameter transformation, if there is a one-to-one relationship between $\alpha = g(\boldsymbol{\theta})$ and $\boldsymbol{\theta}$, the ML estimate for $\alpha$ is simply

$\hat{\alpha} = g(\hat{\boldsymbol{\theta}})$   (5.12)
Example 5.11
Given $N$ samples of a white Gaussian process $w[n]$, $n = 0, 1, \ldots, N-1$, with unknown variance $\sigma^2$, determine the power of $w[n]$ in dB.

The power in dB is related to $\sigma^2$ by

$P = 10\log_{10}(\sigma^2)$

which is a one-to-one relationship. To find the ML estimate for $P$, we first find the ML estimate for $\sigma^2$.
Denoting the observed samples of the process by $x[n]$, $n = 0, 1, \ldots, N-1$:

$p(\mathbf{x}; \sigma^2) = \dfrac{1}{(2\pi\sigma^2)^{N/2}}\exp\left(-\dfrac{1}{2\sigma^2}\sum_{n=0}^{N-1} x^2[n]\right)$

$\Rightarrow \ln(p(\mathbf{x}; \sigma^2)) = -\dfrac{N}{2}\ln(2\pi) - \dfrac{N}{2}\ln(\sigma^2) - \dfrac{1}{2\sigma^2}\sum_{n=0}^{N-1} x^2[n]$

Differentiating the log-likelihood function w.r.t. $\sigma^2$:

$\dfrac{\partial \ln(p(\mathbf{x}; \sigma^2))}{\partial \sigma^2} = -\dfrac{N}{2\sigma^2} + \dfrac{1}{2\sigma^4}\sum_{n=0}^{N-1} x^2[n]$

Setting the resultant expression to zero:

$\dfrac{N}{2\hat{\sigma}^2} = \dfrac{1}{2\hat{\sigma}^4}\sum_{n=0}^{N-1} x^2[n] \Rightarrow \hat{\sigma}^2 = \dfrac{1}{N}\sum_{n=0}^{N-1} x^2[n]$

As a result,

$\hat{P} = 10\log_{10}(\hat{\sigma}^2) = 10\log_{10}\left(\dfrac{1}{N}\sum_{n=0}^{N-1} x^2[n]\right)$
Example 5.12
Given

$x[n] = A + w[n], \quad n = 0, 1, \ldots, N-1$

where $A$ is an unknown constant and $w[n]$ is a white Gaussian noise with unknown variance $\sigma^2$. Find the ML estimates of $A$ and $\sigma^2$.

Here $\boldsymbol{\theta} = [A, \sigma^2]^T$ and

$p(\mathbf{x}; \boldsymbol{\theta}) = \dfrac{1}{(2\pi\sigma^2)^{N/2}}\exp\left(-\dfrac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right)$

$\Rightarrow \dfrac{\partial \ln(p(\mathbf{x}; \boldsymbol{\theta}))}{\partial A} = \dfrac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)$

$\dfrac{\partial \ln(p(\mathbf{x}; \boldsymbol{\theta}))}{\partial \sigma^2} = -\dfrac{N}{2\sigma^2} + \dfrac{1}{2\sigma^4}\sum_{n=0}^{N-1}(x[n]-A)^2$
Solving the first equation:
$\hat{A} = \dfrac{1}{N}\sum_{n=0}^{N-1} x[n] = \bar{x}$

Putting $A = \hat{A} = \bar{x}$ in the second equation and setting it to zero:

$\hat{\sigma}^2 = \dfrac{1}{N}\sum_{n=0}^{N-1}(x[n]-\bar{x})^2$
Numerical Computation of ML Solution
When the ML solution is not of closed form, it can be computed by
Grid search
Numerical methods: Newton-Raphson, Golden section, bisection, etc
Example 5.13
From Example 5.10, the ML solution of $\phi$ is determined from

$\sum_{n=0}^{N-1} x[n]\sin(\omega_0 n + \hat{\phi}) = \dfrac{A}{2}\sum_{n=0}^{N-1}\sin(2\omega_0 n + 2\hat{\phi})$

Suggest methods to find $\hat{\phi}$.

Approach 1: Grid search

Let

$g(\phi) = \sum_{n=0}^{N-1} x[n]\sin(\omega_0 n + \phi) - \dfrac{A}{2}\sum_{n=0}^{N-1}\sin(2\omega_0 n + 2\phi)$

It is obvious that

$\hat{\phi}$ = root of $g(\phi)$
The idea of grid search is simple:
⇒ Search over all possible values of $\phi$, or a given range of $\phi$, to find the root $\hat{\phi}$

Values are discrete ⇒ tradeoff between resolution and computation

e.g., range for $\hat{\phi}$: any value in $[0, 2\pi)$ ⇒ with 1000 discrete points, the resolution is $2\pi/1000$
MATLAB source code:

N=100; n=[0:N-1];
w = 0.2*pi;                      % known frequency
A = sqrt(2);                     % known amplitude
p = 0.3*pi;                      % true phase
np = 0.1;                        % noise power
q = sqrt(np).*randn(1,N);
x = A.*cos(w.*n+p)+q;
for j=1:1000
    pe = j/1000*2*pi;            % candidate phase
    s1 = sin(w.*n+pe);
    s2 = sin(2.*w.*n+2.*pe);
    g(j) = x*s1' - A/2*sum(s2);  % evaluate g at the grid point
end
pe = [1:1000]/1000;
plot(pe,g)

Note: the x-axis is $\phi/(2\pi)$
stem(pe,g)
axis([0.14 0.16 -2 2])
$g(0.152\cdot 2\pi) = -0.2324, \quad g(0.153\cdot 2\pi) = 0.2168$

$\Rightarrow \hat{\phi} = 0.153\cdot 2\pi = 0.306\pi \ (\pm 0.001\pi)$
For a smaller resolution, say 200 discrete points:
clear pe; clear s1; clear s2; clear g;
for j=1:200
    pe = j/200*2*pi;
    s1 = sin(w.*n+pe);
    s2 = sin(2.*w.*n+2.*pe);
    g(j) = x*s1' - A/2*sum(s2);
end
pe = [1:200]/200;
plot(pe,g)
stem(pe,g)
axis([0.14 0.16 -2 2])
$g(0.150\cdot 2\pi) = -1.1306, \quad g(0.155\cdot 2\pi) = 1.1150$

$\Rightarrow \hat{\phi} = 0.155\cdot 2\pi = 0.310\pi \ (\pm 0.005\pi)$

⇒ Accuracy increases as the number of grid points increases
Approach 2: Newton-Raphson method, with the iteration $\phi_{k+1} = \phi_k - g(\phi_k)/g'(\phi_k)$:

p1 = 0;                              % initial phase estimate
for k=1:10
    s1 = sin(w.*n+p1);
    s2 = sin(2.*w.*n+2.*p1);
    c1 = cos(w.*n+p1);
    c2 = cos(2.*w.*n+2.*p1);
    g  = x*s1' - A/2*sum(s2);        % g(p1)
    g1 = x*c1' - A*sum(c2);          % g'(p1)
    p1 = p1 - g/g1;                  % Newton-Raphson update
    p1_vector(k) = p1;
end
stem(p1_vector/(2*pi))
Newton/Raphson method converges at ~ 3rd iteration
$\hat{\phi} = 0.1525\cdot 2\pi = 0.305\pi$
Q. Can you comment on the grid search & Newton/Raphson method?
ML Estimation for General Linear Model
The general linear data model is given by
$\mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}$   (5.14)

where

$\mathbf{x}$ is the observed vector of size $N$

$\mathbf{w}$ is a Gaussian noise vector with known covariance matrix $\mathbf{C}$

$\mathbf{H}$ is a known matrix of size $N \times p$

$\boldsymbol{\theta}$ is the parameter vector of size $p$

Based on (5.7), the PDF of $\mathbf{x}$ parameterized by $\boldsymbol{\theta}$ is

$p(\mathbf{x}; \boldsymbol{\theta}) = \dfrac{1}{(2\pi)^{N/2}\det^{1/2}(\mathbf{C})}\exp\left(-\dfrac{1}{2}(\mathbf{x}-\mathbf{H}\boldsymbol{\theta})^T\mathbf{C}^{-1}(\mathbf{x}-\mathbf{H}\boldsymbol{\theta})\right)$   (5.15)
Maximizing $\ln(p(\mathbf{x}; \boldsymbol{\theta}))$ is equivalent to minimizing $(\mathbf{x}-\mathbf{H}\boldsymbol{\theta})^T\mathbf{C}^{-1}(\mathbf{x}-\mathbf{H}\boldsymbol{\theta})$, which gives the ML solution

$\hat{\boldsymbol{\theta}} = (\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}^{-1}\mathbf{x}$   (5.16)

For white noise, $\mathbf{C} = \sigma_w^2\mathbf{I}$, and (5.16) reduces to

$\hat{\boldsymbol{\theta}} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}$   (5.17)
Example 5.14
Given $N$ pairs of $(x, y)$ where $x$ is error-free but $y$ is subject to error:

$y[n] = m\cdot x[n] + c + w[n], \quad n = 0, 1, \ldots, N-1$

where $\mathbf{w}$ is a Gaussian noise vector with known covariance matrix $\mathbf{C}$. Find the ML estimates for $m$ and $c$.

$y[n] = m\cdot x[n] + c + w[n] \Rightarrow y[n] = [x[n] \ \ 1]\cdot\boldsymbol{\theta} + w[n], \quad \boldsymbol{\theta} = [m, c]^T$

$\Rightarrow \begin{aligned} y[0] &= [x[0] \ \ 1]\,\boldsymbol{\theta} + w[0] \\ y[1] &= [x[1] \ \ 1]\,\boldsymbol{\theta} + w[1] \\ &\cdots \\ y[N-1] &= [x[N-1] \ \ 1]\,\boldsymbol{\theta} + w[N-1] \end{aligned}$
Writing in matrix form:

$\mathbf{y} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}$

where

$\mathbf{y} = [y[0], y[1], \ldots, y[N-1]]^T$

$\mathbf{H} = \begin{bmatrix} x[0] & 1 \\ x[1] & 1 \\ \vdots & \vdots \\ x[N-1] & 1 \end{bmatrix}$

Applying (5.16) gives

$\hat{\boldsymbol{\theta}} = [\hat{m}, \hat{c}]^T = (\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}^{-1}\mathbf{y}$
Example 5.15
Find the ML estimates of $A$, $\omega_0$ and $\phi$ for

$x[n] = A\cos(\omega_0 n + \phi) + w[n], \quad n = 0, 1, \ldots, N-1, \quad N \gg 1$

where $w[n]$ is a white Gaussian noise with variance $\sigma_w^2$. Recall from Example 5.6 that $\boldsymbol{\theta} = [A, \omega_0, \phi]^T$ and

$p(\mathbf{x}; \boldsymbol{\theta}) = \dfrac{1}{(2\pi\sigma_w^2)^{N/2}}\exp\left(-\dfrac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n + \phi)\right)^2\right)$

The ML solution for $\boldsymbol{\theta}$ can be found by minimizing

$J(A, \omega_0, \phi) = \sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n + \phi)\right)^2$
This can be achieved by using a 3-D grid search or the Newton-Raphson method, but it is computationally complex.
Another, simpler solution is as follows:

$J(A, \omega_0, \phi) = \sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n + \phi)\right)^2 = \sum_{n=0}^{N-1}\left(x[n]-A\cos(\phi)\cos(\omega_0 n) + A\sin(\phi)\sin(\omega_0 n)\right)^2$

Since $J(A, \omega_0, \phi)$ is not quadratic in $A$ and $\phi$, the first step is to use the parameter transformation

$\alpha_1 = A\cos(\phi), \quad \alpha_2 = -A\sin(\phi) \quad \Rightarrow \quad A = \sqrt{\alpha_1^2 + \alpha_2^2}, \quad \phi = -\tan^{-1}\left(\dfrac{\alpha_2}{\alpha_1}\right)$
The model becomes $\mathbf{x} = \mathbf{H}\boldsymbol{\alpha} + \mathbf{w}$ with $\boldsymbol{\alpha} = [\alpha_1, \alpha_2]^T$ and $\mathbf{H} = [\mathbf{c} \ \ \mathbf{s}]$, where $\mathbf{c} = [1, \cos(\omega_0), \ldots, \cos(\omega_0(N-1))]^T$ and $\mathbf{s} = [0, \sin(\omega_0), \ldots, \sin(\omega_0(N-1))]^T$. Substituting $\hat{\boldsymbol{\alpha}} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}$ back into $J(\alpha_1, \alpha_2, \omega_0)$:

$J(\omega_0) = (\mathbf{x} - \mathbf{H}\hat{\boldsymbol{\alpha}})^T(\mathbf{x} - \mathbf{H}\hat{\boldsymbol{\alpha}})$

$= \left((\mathbf{I} - \mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T)\mathbf{x}\right)^T\left((\mathbf{I} - \mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T)\mathbf{x}\right)$

$= \mathbf{x}^T(\mathbf{I} - \mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T)(\mathbf{I} - \mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T)\mathbf{x}$

$= \mathbf{x}^T(\mathbf{I} - \mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T)\mathbf{x}$

$= \mathbf{x}^T\mathbf{x} - \mathbf{x}^T\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}$

Minimizing $J(\omega_0)$ is identical to maximizing $\mathbf{x}^T\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}$, or

$\hat{\omega}_0 = \arg\max_{\omega_0} \mathbf{x}^T\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}$

⇒ the 3-D search is reduced to a 1-D search
After determining $\hat{\omega}_0$, $\hat{\boldsymbol{\alpha}}$ can be obtained as well.
For sufficiently large $N$:

$\mathbf{x}^T\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x} = [\mathbf{c}^T\mathbf{x} \ \ \mathbf{s}^T\mathbf{x}]\begin{bmatrix} \mathbf{c}^T\mathbf{c} & \mathbf{c}^T\mathbf{s} \\ \mathbf{s}^T\mathbf{c} & \mathbf{s}^T\mathbf{s} \end{bmatrix}^{-1}\begin{bmatrix} \mathbf{c}^T\mathbf{x} \\ \mathbf{s}^T\mathbf{x} \end{bmatrix}$

$\approx [\mathbf{c}^T\mathbf{x} \ \ \mathbf{s}^T\mathbf{x}]\begin{bmatrix} N/2 & 0 \\ 0 & N/2 \end{bmatrix}^{-1}\begin{bmatrix} \mathbf{c}^T\mathbf{x} \\ \mathbf{s}^T\mathbf{x} \end{bmatrix}$

$= \dfrac{2}{N}\left((\mathbf{c}^T\mathbf{x})^2 + (\mathbf{s}^T\mathbf{x})^2\right) = \dfrac{2}{N}\left|\sum_{n=0}^{N-1} x[n]\exp(-j\omega_0 n)\right|^2$

$\Rightarrow \hat{\omega}_0 = \arg\max_{\omega_0} \dfrac{1}{N}\left|\sum_{n=0}^{N-1} x[n]\exp(-j\omega_0 n)\right|^2$

⇒ periodogram maximum
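A minimal MATLAB sketch (assumed values) of this frequency estimate via coarse periodogram maximization; in practice one would refine the grid around the peak:

% Frequency estimation by maximizing the periodogram over a grid
N = 100; n = 0:N-1;
w0 = 0.2*pi; A = sqrt(2); phi = 0.3*pi;               % assumed signal parameters
x = A*cos(w0*n + phi) + sqrt(0.1)*randn(1,N);
wgrid = pi*(1:999)/1000;                              % candidate frequencies in (0, pi)
P = zeros(size(wgrid));
for i = 1:length(wgrid)
    P(i) = abs(sum(x.*exp(-1j*wgrid(i)*n)))^2/N;      % periodogram value
end
[~, idx] = max(P);
w0_hat = wgrid(idx)                                   % should be close to 0.2*pi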
Least Squares Methods
Parameter estimation is achieved via minimizing a least squares (LS) cost function

Generally not optimum but computationally simple

Used for classical parameter estimation
No knowledge of the noise PDF is required
Can be considered as a generalization of LS filtering
Variants of LS Methods
1. Standard LS
Consider the general linear data model:

$\mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}$

where

$\mathbf{x}$ is the observed vector of size $N$

$\mathbf{w}$ is a zero-mean noise vector with unknown covariance matrix

$\mathbf{H}$ is a known matrix of size $N \times p$

$\boldsymbol{\theta}$ is the parameter vector of size $p$

The LS solution is given by

$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} (\mathbf{x}-\mathbf{H}\boldsymbol{\theta})^T(\mathbf{x}-\mathbf{H}\boldsymbol{\theta}) = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}$   (5.18)

which is equal to (5.17)
⇒ the LS solution is optimum if the covariance matrix of $\mathbf{w}$ is $\mathbf{C} = \sigma_w^2\cdot\mathbf{I}$ and $\mathbf{w}$ is Gaussian distributed
Define
$\mathbf{e} = \mathbf{x} - \mathbf{H}\boldsymbol{\theta}$

where

$\mathbf{e} = [e(0) \ e(1) \ \cdots \ e(N-1)]^T$

Then (5.18) is equivalent to

$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} \sum_{k=0}^{N-1} e^2(k)$   (5.19)
which is similar to LS filtering
Q. Any differences between (5.19) and LS filtering?
Example 5.16
Given

$x[n] = A + w[n], \quad n = 0, 1, \ldots, N-1$

where $A$ is an unknown constant and $w[n]$ is a zero-mean noise. Find the LS solution of $A$.

Using (5.19),

$\hat{A} = \arg\min_A \sum_{n=0}^{N-1}\left(x[n]-A\right)^2$

Differentiating $\sum_{n=0}^{N-1}(x[n]-A)^2$ with respect to $A$ and setting the result to 0:

$\hat{A} = \dfrac{1}{N}\sum_{n=0}^{N-1} x[n]$
On the other hand, writing $\{x[n]\}$ in matrix form:
$\mathbf{x} = \mathbf{H}A + \mathbf{w}$, where $\mathbf{H} = [1 \ 1 \ \cdots \ 1]^T$

Using (5.18),

$\hat{A} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x} = \left([1 \ 1 \ \cdots \ 1]\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}\right)^{-1}[1 \ 1 \ \cdots \ 1]\begin{bmatrix} x[0] \\ x[1] \\ \vdots \\ x[N-1] \end{bmatrix} = \dfrac{1}{N}\sum_{n=0}^{N-1} x[n]$

Both (5.18) and (5.19) give the same answer, and the LS solution is optimum if the noise is white Gaussian.

Example 5.17
Consider the LS filtering problem again. Given
$d[n] = \mathbf{X}^T[n]\cdot\mathbf{W} + q[n], \quad n = 0, 1, \ldots, N-1$

where

$d[n]$ is the desired response

$\mathbf{X}[n] = [x[n] \ x[n-1] \ \cdots \ x[n-L+1]]^T$ is the input signal vector

$\mathbf{W} = [w_0 \ w_1 \ \cdots \ w_{L-1}]^T$ is the unknown filter weight vector

$q[n]$ is zero-mean noise

Writing in matrix form:

$\mathbf{d} = \mathbf{H}\cdot\mathbf{W} + \mathbf{q}$

Using (5.18): $\hat{\mathbf{W}} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{d}$
where

$\mathbf{H} = \begin{bmatrix} \mathbf{X}^T(0) \\ \mathbf{X}^T(1) \\ \vdots \\ \mathbf{X}^T(N-1) \end{bmatrix} = \begin{bmatrix} x[0] & 0 & \cdots & 0 \\ x[1] & x[0] & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ x[N-1] & x[N-2] & \cdots & x[N-L] \end{bmatrix}$

with $x[-1] = x[-2] = \cdots = 0$
Note that

$\mathbf{H}^T\mathbf{H} = \mathbf{R}_{xx}$

$\mathbf{H}^T\mathbf{d} = \mathbf{R}_{dx}$

where $\mathbf{R}_{xx}$ is not the original version but the modified version of (3.6)
The LS solution is then

$\hat{\mathbf{W}} = \mathbf{R}_{xx}^{-1}\mathbf{R}_{dx}$

Example 5.18

For $x[n] = A\cos(\omega_0 n + \phi) + w[n]$ with known $\omega_0$ and $\phi$, minimizing $\sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n + \phi)\right)^2$ gives the LS amplitude estimate

$\hat{A} = \dfrac{\sum_{n=0}^{N-1} x[n]\cos(\omega_0 n + \phi)}{\sum_{n=0}^{N-1} \cos^2(\omega_0 n + \phi)}$
2. Weighted LS
Use a more general form of LS via a symmetric weighting matrix $\mathbf{W}$, such that $\mathbf{W}^T = \mathbf{W}$:

$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} (\mathbf{x}-\mathbf{H}\boldsymbol{\theta})^T\mathbf{W}(\mathbf{x}-\mathbf{H}\boldsymbol{\theta}) = (\mathbf{H}^T\mathbf{W}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{W}\mathbf{x}$   (5.20)

Due to the presence of $\mathbf{W}$, it is generally difficult to write the cost function $(\mathbf{x}-\mathbf{H}\boldsymbol{\theta})^T\mathbf{W}(\mathbf{x}-\mathbf{H}\boldsymbol{\theta})$ in scalar form as in (5.19)
Rationale of using $\mathbf{W}$: put larger weights on data with smaller errors, and smaller weights on data with larger errors.

When $\mathbf{W} = \mathbf{C}^{-1}$, where $\mathbf{C}$ is the covariance matrix of the noise vector:

$\hat{\boldsymbol{\theta}} = (\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}^{-1}\mathbf{x}$   (5.21)

which is equal to the ML solution and is optimum for Gaussian noise
Example 5.19
Given two noisy measurements of $A$:

$x_1 = A + w_1 \quad \text{and} \quad x_2 = A + w_2$

where $w_1$ and $w_2$ are zero-mean uncorrelated noises with known variances $\sigma_1^2$ and $\sigma_2^2$. Determine the optimum weighted LS solution.
Use

$\mathbf{W} = \mathbf{C}^{-1} = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}^{-1} = \begin{bmatrix} 1/\sigma_1^2 & 0 \\ 0 & 1/\sigma_2^2 \end{bmatrix}$

Grouping $x_1$ and $x_2$ into matrix form:

$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}\cdot A + \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}$ or $\mathbf{x} = \mathbf{H}\cdot A + \mathbf{w}$

Using (5.21),

$\hat{A} = (\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}^{-1}\mathbf{x} = \left([1 \ \ 1]\begin{bmatrix} 1/\sigma_1^2 & 0 \\ 0 & 1/\sigma_2^2 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)^{-1}[1 \ \ 1]\begin{bmatrix} 1/\sigma_1^2 & 0 \\ 0 & 1/\sigma_2^2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$
As a result,
$\hat{A} = \left(\dfrac{1}{\sigma_1^2} + \dfrac{1}{\sigma_2^2}\right)^{-1}\left(\dfrac{x_1}{\sigma_1^2} + \dfrac{x_2}{\sigma_2^2}\right) = \dfrac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\cdot x_1 + \dfrac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\cdot x_2$

Note that

If $\sigma_2^2 > \sigma_1^2$, a larger weight is placed on $x_1$, and vice versa

If $\sigma_1^2 = \sigma_2^2$, the solution is equal to the standard sample mean

The solution will be more complicated if $w_1$ and $w_2$ are correlated

Exact values for $\sigma_1^2$ and $\sigma_2^2$ are not necessary; only their ratio is needed

Defining $\lambda = \sigma_1^2/\sigma_2^2$, we have

$\hat{A} = \dfrac{1}{1+\lambda}\cdot x_1 + \dfrac{\lambda}{1+\lambda}\cdot x_2$
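A small MATLAB sketch of Example 5.19 (variances assumed), comparing the optimum weighting with the plain average:

% Optimum weighted LS combination of two noisy measurements
A = 4; s1 = 1; s2 = 4; M = 20000;         % noise variances sigma1^2 = 1, sigma2^2 = 4
x1 = A + sqrt(s1)*randn(M,1);
x2 = A + sqrt(s2)*randn(M,1);
Ahat  = (s2*x1 + s1*x2)/(s1 + s2);        % weighted LS estimate
Amean = (x1 + x2)/2;                      % unweighted sample mean
[var(Ahat) var(Amean)]                    % weighted LS should show the smaller variance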
3. Nonlinear LS
The LS cost function cannot be represented as a linear model as in

$\mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}$

In general, it is more complex to solve. e.g., the LS estimates for $A$, $\omega_0$ and $\phi$ can be found by minimizing

$\sum_{n=0}^{N-1}\left(x[n] - A\cos(\omega_0 n + \phi)\right)^2$

whose solution is not straightforward, as seen in Example 5.15.

Grid search and numerical methods are used to find the minimum
4. Constrained LS
The linear LS cost function is minimized subject to constraints:
$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} (\mathbf{x}-\mathbf{H}\boldsymbol{\theta})^T(\mathbf{x}-\mathbf{H}\boldsymbol{\theta})$ subject to $S$   (5.22)

where $S$ is a set of equalities/inequalities in terms of $\boldsymbol{\theta}$

Generally it can be solved by linear/nonlinear programming, but simpler solutions exist for linear and quadratic constraint equations, e.g.,

Linear constraint equation: $\theta_1 + \theta_2 + \theta_3 = 10$

Quadratic constraint equation: $\theta_1^2 + \theta_2^2 + \theta_3^2 = 100$

Other types of constraints: $\theta_1 > \theta_2 > \theta_3 > 10$, $\theta_1 + 2\theta_2 + 3\theta_3 \ge 100$
Consider the case where the constraint set $S$ is

$\mathbf{A}\boldsymbol{\theta} = \mathbf{b}$

which contains $r$ linear equations. The constrained LS problem for the linear model is
$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} (\mathbf{x}-\mathbf{H}\boldsymbol{\theta})^T(\mathbf{x}-\mathbf{H}\boldsymbol{\theta})$ subject to $\mathbf{A}\boldsymbol{\theta} = \mathbf{b}$   (5.23)

The technique of Lagrangian multipliers can solve (5.23) as follows.

Define the Lagrangian

$J_c = (\mathbf{x}-\mathbf{H}\boldsymbol{\theta})^T(\mathbf{x}-\mathbf{H}\boldsymbol{\theta}) + \boldsymbol{\lambda}^T(\mathbf{A}\boldsymbol{\theta} - \mathbf{b})$   (5.24)

where $\boldsymbol{\lambda}$ is an $r$-length vector of Lagrangian multipliers.

The procedure is to first solve for $\boldsymbol{\lambda}$ and then $\boldsymbol{\theta}$
Expanding (5.24):

$J_c = \mathbf{x}^T\mathbf{x} - 2\boldsymbol{\theta}^T\mathbf{H}^T\mathbf{x} + \boldsymbol{\theta}^T\mathbf{H}^T\mathbf{H}\boldsymbol{\theta} + \boldsymbol{\lambda}^T\mathbf{A}\boldsymbol{\theta} - \boldsymbol{\lambda}^T\mathbf{b}$

Differentiating $J_c$ with respect to $\boldsymbol{\theta}$:

$\dfrac{\partial J_c}{\partial \boldsymbol{\theta}} = -2\mathbf{H}^T\mathbf{x} + 2\mathbf{H}^T\mathbf{H}\boldsymbol{\theta} + \mathbf{A}^T\boldsymbol{\lambda}$

Setting the result to zero:

$-2\mathbf{H}^T\mathbf{x} + 2\mathbf{H}^T\mathbf{H}\hat{\boldsymbol{\theta}}_c + \mathbf{A}^T\boldsymbol{\lambda} = \mathbf{0}$

$\Rightarrow \hat{\boldsymbol{\theta}}_c = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x} - \dfrac{1}{2}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{A}^T\boldsymbol{\lambda} = \hat{\boldsymbol{\theta}} - \dfrac{1}{2}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{A}^T\boldsymbol{\lambda}$

where $\hat{\boldsymbol{\theta}}$ is the unconstrained LS solution. Putting $\hat{\boldsymbol{\theta}}_c$ into $\mathbf{A}\boldsymbol{\theta} = \mathbf{b}$:

$\mathbf{A}\hat{\boldsymbol{\theta}}_c = \mathbf{A}\hat{\boldsymbol{\theta}} - \dfrac{1}{2}\mathbf{A}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{A}^T\boldsymbol{\lambda} = \mathbf{b} \Rightarrow \boldsymbol{\lambda} = 2\left(\mathbf{A}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{A}^T\right)^{-1}(\mathbf{A}\hat{\boldsymbol{\theta}} - \mathbf{b})$
Putting $\boldsymbol{\lambda}$ back into $\hat{\boldsymbol{\theta}}_c$:

$\hat{\boldsymbol{\theta}}_c = \hat{\boldsymbol{\theta}} - (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{A}^T\left(\mathbf{A}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{A}^T\right)^{-1}(\mathbf{A}\hat{\boldsymbol{\theta}} - \mathbf{b})$

The idea of constrained LS can be illustrated by finding the minimum value of $y$:

$y = x^2 - 3x + 2$ subject to $x - y = 1$
5. Total LS
Motivation: noises at both $\mathbf{x}$ and $\mathbf{H}\boldsymbol{\theta}$:

$\mathbf{x} + \mathbf{w}_1 = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}_2$   (5.25)

where $\mathbf{w}_1$ and $\mathbf{w}_2$ are zero-mean noise vectors

A typical example is LS filtering in the presence of both input noise and output noise. The noisy input is

$x(k) = s(k) + n_i(k), \quad k = 0, 1, \ldots, N-1$

and the noisy output is

$r(k) = s(k)\otimes h(k) + n_o(k), \quad k = 0, 1, \ldots, N-1$

The parameters to be estimated are $\{h(k)\}$ given $x(k)$ and $r(k)$
Another example is in frequency estimation using linear prediction:
For a single sinusoid $s(k) = \cos(\omega k + \phi)$, it is true that

$s(k) = 2\cos(\omega)s(k-1) - s(k-2)$

i.e., $s(k)$ is perfectly predicted by $s(k-1)$ and $s(k-2)$:

$s(k) = a_0 s(k-1) + a_1 s(k-2)$

It is desirable to obtain $a_0 = 2\cos(\omega)$ and $a_1 = -1$ in the estimation process.

In the presence of noise, the observed signal is

$x(k) = s(k) + w(k), \quad k = 0, 1, \ldots, N-1$
The linear prediction model is now
$x(k) = a_0 x(k-1) + a_1 x(k-2), \quad k = 2, 3, \ldots, N-1$:

$x(2) = a_0 x(1) + a_1 x(0)$
$x(3) = a_0 x(2) + a_1 x(1)$
$\cdots$
$x(N-1) = a_0 x(N-2) + a_1 x(N-3)$

$\Rightarrow \begin{bmatrix} x(2) \\ x(3) \\ \vdots \\ x(N-1) \end{bmatrix} = \begin{bmatrix} x(1) & x(0) \\ x(2) & x(1) \\ \vdots & \vdots \\ x(N-2) & x(N-3) \end{bmatrix}\begin{bmatrix} a_0 \\ a_1 \end{bmatrix}$

$\Rightarrow \begin{bmatrix} s(2) \\ s(3) \\ \vdots \\ s(N-1) \end{bmatrix} + \begin{bmatrix} w(2) \\ w(3) \\ \vdots \\ w(N-1) \end{bmatrix} = \left(\begin{bmatrix} s(1) & s(0) \\ s(2) & s(1) \\ \vdots & \vdots \\ s(N-2) & s(N-3) \end{bmatrix} + \begin{bmatrix} w(1) & w(0) \\ w(2) & w(1) \\ \vdots & \vdots \\ w(N-2) & w(N-3) \end{bmatrix}\right)\begin{bmatrix} a_0 \\ a_1 \end{bmatrix}$

⇒ noise appears in both the data vector and the data matrix, matching the total LS model of (5.25)
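As a sketch of how such a total LS problem can be solved, the standard SVD-based TLS solution (a well-known technique, though not given in the slides; all values assumed) applied to linear-prediction frequency estimation:

% Total LS frequency estimation via linear prediction, solved with the SVD
N = 100; k = 0:N-1;
wt = 0.4*pi; phi = 0.2*pi;               % true frequency and phase (assumed)
x = cos(wt*k + phi) + 0.05*randn(1,N);   % noisy single sinusoid
b = x(3:N).';                            % left-hand side: x(2),...,x(N-1)
D = [x(2:N-1).' x(1:N-2).'];             % data matrix columns: x(k-1), x(k-2)
[~,~,V] = svd([D b]);                    % SVD of the augmented matrix [D b]
a = -V(1:2,3)/V(3,3);                    % TLS solution [a0; a1]
w_hat = acos(a(1)/2)                     % since a0 = 2cos(w); assumes |a0/2| <= 1

At low noise levels, $a_1$ should also come out close to $-1$, as predicted by the noise-free recursion.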
6. Mixed LS
A combination of LS, weighted LS, nonlinear LS, constrained LS and/or total LS
Examples: weighted LS with constraints, total LS with constraints, etc.
Questions for Discussion
1. Suppose you have $N$ pairs of $(x_i, y_i)$, $i = 1, 2, \ldots, N$, and you need to fit them to the model $y = ax$. Assuming that only $\{y_i\}$ contain zero-mean noise, determine the least squares estimate for $a$.

(Hint: the relationship between $x_i$ and $y_i$ is

$y_i = a x_i + n_i, \quad i = 1, 2, \ldots, N$

where $\{n_i\}$ are the noise in $\{y_i\}$.)
2. Use least squares to estimate the line $y = ax$ in Q.1, but now only $\{x_i\}$ contain zero-mean noise.
3. In a radar system, the received signal is

$r(n) = \alpha s(n - \tau_0) + w(n)$

where the range $R$ of an object is related to the time delay by

$\tau_0 = 2R/c$

Suppose we get an unbiased estimate of $\tau_0$, say $\hat{\tau}_0$, and its variance is $\text{var}(\hat{\tau}_0)$. Determine the corresponding range variance $\text{var}(\hat{R})$, where $\hat{R}$ is the estimate of $R$.

If $\text{var}(\hat{\tau}_0) = (0.1\,\mu\text{s})^2$ and $c = 3\times 10^8\ \text{m s}^{-1}$, what is the value of $\text{var}(\hat{R})$?