
Page 1: ASP Lecture 3 Estimation Intro 2015

Advanced Signal Processing

Introduction to Estimation Theory

Danilo Mandic,

room 813, ext: 46271

Department of Electrical and Electronic Engineering

Imperial College London, UK
[email protected], URL: www.commsp.ee.ic.ac.uk/∼mandic

© D. P. Mandic, Advanced Signal Processing, Spring 2015

Page 2: ASP Lecture 3 Estimation Intro 2015

Aims of this lecture

◦ To introduce the notions of: Estimator, Estimate, Estimandum

◦ To discuss the bias and variance in statistical estimation theory, asymptotically unbiased and consistent estimators

◦ Performance metric: the Mean Square Error (MSE)

◦ The bias–variance dilemma and the MSE in this context

◦ To derive a feasible MSE estimator

◦ A class of Minimum Variance Unbiased (MVU) estimators

◦ Extension to the vector parameter case

◦ Point estimators, confidence intervals, statistical goodness of an estimator, the role of noise


Page 3: ASP Lecture 3 Estimation Intro 2015

Role of estimation in signal processing

(try also the function specgram in Matlab)

◦ An enabling technology in many electronic signal processing systems

1. Radar   2. Sonar   3. Speech   4. Image analysis   5. Biomedicine
6. Communications   7. Control   8. Seismics   9. Almost everywhere ...

◦ Radar and sonar: range and azimuth

◦ Image analysis: motion estimation, segmentation

◦ Speech: features used in recognition and speaker verification

◦ Seismics: oil reservoirs

◦ Communications: equalization, symbol detection

◦ Biomedicine: various applications


Page 4: ASP Lecture 3 Estimation Intro 2015

Statistical estimation problem

for simplicity, consider a DC level in WGN, x[n] = A + w[n], w ∼ N(0, σ²)

Problem statement: We seek to determine a set of parameters θ = [θ1, . . . , θp]^T from a set of data points x = [x[0], . . . , x[N − 1]]^T, such that the values of these parameters would yield the highest probability of obtaining the observed data. In other words,

max_θ p(x; θ)     reads: “p(x) parametrised by θ”

◦ The unknown parameters may be seen as deterministic or random variables

◦ There are essentially two alternatives for treating the statistical estimation problem:

– No a priori distribution assumed: Maximum Likelihood
– A priori distribution known: Bayesian estimation

◦ Key problem → to estimate a group of parameters from a discrete-time signal or dataset.


Page 5: ASP Lecture 3 Estimation Intro 2015

Estimation of a scalar random variable

Given an N-point dataset x[0], x[1], . . . , x[N − 1] which depends on an unknown (scalar) parameter θ, define an “estimator” as some function, g, of the dataset, that is

θ̂ = g(x[0], x[1], . . . , x[N − 1])

which may be used to estimate θ (single parameter case).

(in our DC level estimation problem, θ = A)

◦ This defines the problem of “parameter estimation”

◦ Also need to determine g(·)

◦ Theory and techniques of statistical estimation are available

◦ Estimation based on PDFs which contain unknown but deterministic parameters is termed classical estimation

◦ In Bayesian estimation, the unknown parameters are assumed to be random variables, which may be prescribed “a priori” to lie within some range of allowable parameters (or desired performance)


Page 6: ASP Lecture 3 Estimation Intro 2015

The statistical estimation problem

First step: to model, mathematically, the data

◦ We employ a Probability Density Function (PDF) to describe the inherently random measurement process, that is

p(x[0], x[1], . . . , x[N − 1]; θ)

which is “parametrised” by the unknown parameter θ

Example: for N = 1, and θ denoting the mean value, a generic form of PDF for the class of Gaussian PDFs with any value of θ is given by

p(x[0]; θ) = (1/√(2πσ²)) exp[ −(1/(2σ²)) (x[0] − θ)² ]

(figure: Gaussian PDFs for two candidate values of θ, evaluated at the observed x[0])

Clearly, the observed value of x[0] impacts upon the likely value of θ.


Page 7: ASP Lecture 3 Estimation Intro 2015

Estimator vs. Estimate

The parameter to be estimated is then viewed as a realisation of the random variable θ

◦ Data are described by the joint PDF of the data and parameters:

p(x, θ) = p(x | θ) · p(θ)     (conditional PDF × prior PDF)

◦ An estimator is a rule that assigns a value θ̂ to each realisation of x = [x[0], . . . , x[N − 1]]^T

◦ An estimate of θ (θ is also called the ’estimandum’) is the value obtained for a given realisation of x = [x[0], . . . , x[N − 1]]^T, in the form θ̂ = g(x)

Example: for a noisy straight line: p(x; θ) = (1/(2πσ²)^(N/2)) exp[ −(1/(2σ²)) ∑_{n=0}^{N−1} (x[n] − A − Bn)² ]

◦ Performance is critically dependent upon this PDF assumption - the estimator should be robust to slight mismatch between the measurement and the PDF assumption


Page 8: ASP Lecture 3 Estimation Intro 2015

Example: finding the parameters of a straight line

Specification of the PDF is critical in determining a good estimator

In practice, we choose a PDF which fits the problem constraints and any “a priori” information; but it must also be mathematically tractable.

Example: Assume that “on the average” the data are increasing

(figure: noisy data x[n] scattered about an ideal noiseless straight line with intercept A, plotted against n)

Data: straight line embedded in random noise w[n] ∼ N(0, σ²)

x[n] = A + Bn + w[n],   n = 0, 1, . . . , N − 1

p(x; A, B) = (1/(2πσ²)^(N/2)) exp[ −(1/(2σ²)) ∑_{n=0}^{N−1} (x[n] − A − Bn)² ]

Unknown parameters: A, B ⇔ θ ≡ [A B]^T

Careful: what would be the effects of bias in A and B?


Page 9: ASP Lecture 3 Estimation Intro 2015

Bias in parameter estimation

Estimation theory (scalar case): estimate the value of an unknown parameter, θ, from a set of observations of a random variable described by that parameter

θ̂ = g(x[0], x[1], . . . , x[N − 1])

Example: given a set of observations from a Gaussian distribution, estimate the mean or variance from these observations.

◦ Recall that in linear mean square estimation, when estimating a value of a random variable y from an observation of a related random variable x, the coefficients a and b in the estimate ŷ = ax + b depend upon the mean and variance of x and y, as well as on their correlation.

The difference between the expected value of the estimate and the actual value θ is called the bias and will be denoted by B:

B = E{θ̂_N} − θ

where θ̂_N denotes the estimate obtained from the N data samples x[0], . . . , x[N − 1]


Page 10: ASP Lecture 3 Estimation Intro 2015

Asymptotic unbiasedness

If the bias is zero, then the expected value of the estimate is equal to the true value, that is

E{θ̂_N} = θ  ⇔  B = E{θ̂_N} − θ = 0

and the estimate is said to be unbiased.

If B ≠ 0 then the estimator θ̂ = g(x) is said to be biased.

Example: Consider the sample mean estimator of the signal x[n] = A + w[n], w ∼ N(0, 1), given by

Â = x̄ = (1/(N + 2)) ∑_{n=0}^{N−1} x[n],   that is, θ̂ = Â

Is the above sample mean estimator of the true mean A biased?
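(A quick check: E{Â} = (1/(N + 2)) ∑_{n=0}^{N−1} E{x[n]} = NA/(N + 2), so the bias B = −2A/(N + 2) is non-zero whenever A ≠ 0, yet it vanishes as N → ∞.)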

More often: an estimator is biased but bias B → 0 when N → ∞

lim_{N→∞} E{θ̂_N} = θ

Such an estimator is said to be asymptotically unbiased.
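A minimal Matlab sketch (not from the slides; the values of A and the trial count are assumed) which checks this numerically for the 1/(N + 2) estimator above:

% Monte Carlo check of asymptotic unbiasedness (illustrative sketch)
A = 3; M = 10000;                     % assumed DC level and number of trials
for N = [10 100 1000]
    Ahat = zeros(M,1);
    for m = 1:M
        x = A + randn(N,1);           % one realisation of x[n] = A + w[n], w ~ N(0,1)
        Ahat(m) = sum(x)/(N+2);       % the 1/(N+2)-scaled sample mean
    end
    fprintf('N = %4d: empirical bias = %+.4f  (theory -2A/(N+2) = %+.4f)\n', ...
            N, mean(Ahat) - A, -2*A/(N+2));
end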


Page 11: ASP Lecture 3 Estimation Intro 2015

How about the variance?

◦ It is desirable that an estimator be either unbiased or asymptotically unbiased (think about the power of estimation error due to DC offset)

◦ For an estimate to be meaningful, it is necessary that we use the available statistics effectively, that is,

Var → 0 as N → ∞

or in other words

lim_{N→∞} var{θ̂_N} = lim_{N→∞} E{ |θ̂_N − E{θ̂_N}|² } = 0

If θ̂_N is unbiased then E{θ̂_N} = θ, and from the Tchebycheff inequality, ∀ ε > 0

Pr{ |θ̂_N − θ| ≥ ε } ≤ var{θ̂_N} / ε²

⇒ if Var → 0 as N → ∞, then the probability that θ̂_N differs by more than ε from the true value will go to zero (showing consistency).

In this case, θ̂_N is said to converge to θ with probability one.


Page 12: ASP Lecture 3 Estimation Intro 2015

Mean square convergence

Another form of convergence, stronger than convergence with probability one, is mean square convergence.

An estimate θ̂_N is said to converge to θ in the mean–square sense, if

lim_{N→∞} E{ |θ̂_N − θ|² } = 0     (the quantity under the limit is the mean square error)

◦ For an unbiased estimator this is equivalent to the previous condition that the variance of the estimate goes to zero

◦ An estimate is said to be consistent if it converges, in some sense, to the true value of the parameter

◦ We say that the estimator is consistent if it is asymptotically unbiased and has a variance that goes to zero as N → ∞


Page 13: ASP Lecture 3 Estimation Intro 2015

Example: Assessing the performance of the Sample Mean as an estimator

Consider the estimation of a DC level, A, in random noise, which could be modelled as

x[n] = A + w[n]

w[n] ∼ some zero mean random process.

◦ Aim: to estimate A given {x[0], x[1], . . . , x[N − 1]}

◦ Intuitively, the sample mean is a reasonable estimator

Â = (1/N) ∑_{n=0}^{N−1} x[n]

Q1: How close will Â be to A?

Q2: Are there better estimators than the sample mean?


Page 14: ASP Lecture 3 Estimation Intro 2015

Mean and variance of the Sample Mean estimator

x[n] = A + w[n],   w[n] ∼ N(0, σ²)

Estimator = f(random data) ⇒ a random variable itself

⇒ its performance must be judged statistically

(1) What is the mean of Â?

E{Â} = E{ (1/N) ∑_{n=0}^{N−1} x[n] } = (1/N) ∑_{n=0}^{N−1} E{x[n]} = A   → unbiased

(2) What is the variance of Â?

Assumption: The samples w[n] are uncorrelated

E{ (Â − E{Â})² } = Var{Â} = Var{ (1/N) ∑_{n=0}^{N−1} x[n] } = (1/N²) ∑_{n=0}^{N−1} Var{x[n]} = (1/N²) · N σ² = σ²/N

Notice the variance → 0 as N → ∞ → consistent (see your P&A sets)
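A short Matlab sketch (not from the slides; A, σ, N and the trial count are assumed) which confirms the mean and the σ²/N variance empirically:

% Empirical mean and variance of the sample-mean estimator (illustrative)
A = 1; sigma = 2; N = 50; M = 20000;   % assumed values
Ahat = zeros(M,1);
for m = 1:M
    x = A + sigma*randn(N,1);          % x[n] = A + w[n], w ~ N(0, sigma^2)
    Ahat(m) = mean(x);                 % sample-mean estimate
end
fprintf('mean(Ahat) = %.4f  (true A    = %.4f)\n', mean(Ahat), A);
fprintf('var(Ahat)  = %.4f  (sigma^2/N = %.4f)\n', var(Ahat), sigma^2/N);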


Page 15: ASP Lecture 3 Estimation Intro 2015

Minimum Variance Unbiased (MVU) estimation

Aim: to establish “good” estimators of unknown deterministic parameters

Unbiased estimator → “on the average” yields the true value of the unknown parameter, independent of its particular value, i.e.

E(θ̂) = θ,   a < θ < b

where (a, b) denotes the range of possible values of θ

Example: Unbiased estimator for a DC level in White Gaussian Noise (WGN). If we are given

x[n] = A + w[n],   n = 0, 1, . . . , N − 1

where A is the unknown, but deterministic, parameter to be estimated, which lies within the interval (−∞, ∞), then the sample mean can be used as an estimator of A, namely

Â = (1/N) ∑_{n=0}^{N−1} x[n]


Page 16: ASP Lecture 3 Estimation Intro 2015

Careful: the estimator is parameter dependent!

An estimator may be unbiased for certain values of the unknown parameter but not for all of them; such an estimator is not unbiased.

Consider another sample mean estimator:

Ǎ = (1/(2N)) ∑_{n=0}^{N−1} x[n]

Therefore: E{Ǎ} = 0 when A = 0, but E{Ǎ} = A/2 when A ≠ 0 (parameter dependent)

Hence Ǎ is not an unbiased estimator

◦ A biased estimator introduces a “systematic error” which should not generally be present

◦ Our goal is to avoid bias if we can, as we are interested in stochastic signal properties and bias is largely deterministic


Page 17: ASP Lecture 3 Estimation Intro 2015

Remedy

(also look in the Assignment dealing with PSD in your CW )

Several unbiased estimates of the same quantity may be averaged together, i.e. given the L independent estimates {θ̂_1, θ̂_2, . . . , θ̂_L}

We may choose to average them, to yield

θ̂ = (1/L) ∑_{l=1}^{L} θ̂_l

Our assumption was that the individual estimators are unbiased, with equal variance, and uncorrelated with one another.

Then (NB: averaging biased estimators will not remove the bias)

E{θ̂} = θ    and    Var{θ̂} = (1/L²) ∑_{l=1}^{L} Var{θ̂_l} = (1/L) Var{θ̂_l}

Note, as L → ∞, θ̂ → θ (consistent)
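A minimal Matlab sketch (setup assumed, not from the slides) of the 1/L variance reduction obtained by averaging L independent, unbiased, equal-variance estimates:

% Averaging L independent unbiased estimates (illustrative sketch)
theta = 5; sigma = 1; N = 20; L = 16; M = 5000;   % assumed values
est_one = zeros(M,1); est_avg = zeros(M,1);
for m = 1:M
    est = zeros(L,1);
    for l = 1:L
        x = theta + sigma*randn(N,1);  % independent data record for estimator l
        est(l) = mean(x);              % unbiased estimate from record l
    end
    est_one(m) = est(1);               % a single estimate on its own
    est_avg(m) = mean(est);            % average of the L estimates
end
fprintf('var(single estimate)   = %.5f\n', var(est_one));
fprintf('var(averaged estimate) = %.5f  (approx. var(single)/L = %.5f)\n', ...
        var(est_avg), var(est_one)/L);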


Page 18: ASP Lecture 3 Estimation Intro 2015

Effects of averaging for real world data

Problem 3.4 from your P/A sets: heart rate estimation

The heart rate, h, of a patient is automatically recorded by a computer every 100 ms. In one second the measurements {ĥ_1, ĥ_2, . . . , ĥ_10} are averaged to obtain ĥ. Given that E{ĥ_i} = αh for some constant α and var(ĥ_i) = 1 for all i, determine whether averaging improves the estimator if α = 1 and if α = 1/2.

ĥ = (1/10) ∑_{i=1}^{10} ĥ_i,     E{ĥ} = (1/10) ∑_{i=1}^{10} E{ĥ_i} = αh

If α = 1 the average is unbiased; if α = 1/2 it will not be unbiased unless the estimator is formed as ĥ = (1/5) ∑_{i=1}^{10} ĥ_i.

var{ĥ} = (1/L²) ∑_{i=1}^{10} var{ĥ_i} = 1/10    (for L = 10, var(ĥ_i) = 1)

(figure: PDFs of the estimates before and after averaging; for α = 1 they are centred at h, for α = 1/2 at h/2; averaging narrows the PDFs but does not shift their centres)
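A small Matlab simulation of this problem (the measurement noise is assumed Gaussian, which the problem does not state):

% Heart rate averaging sketch: E{hi} = alpha*h, var(hi) = 1 (illustrative)
h = 70; M = 10000;                      % assumed true heart rate and trial count
for alpha = [1 0.5]
    hi   = alpha*h + randn(M, 10);      % 10 measurements per trial
    hbar = mean(hi, 2);                 % average of the 10 measurements
    fprintf('alpha = %.1f: bias(single) = %+6.2f, bias(average) = %+6.2f, var(average) = %.3f\n', ...
            alpha, mean(hi(:,1)) - h, mean(hbar) - h, var(hbar));
end
% Averaging always reduces the variance (towards 1/10), but for alpha = 1/2 the
% average stays biased at about -h/2 unless the 1/5 scaling is used instead.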


Page 19: ASP Lecture 3 Estimation Intro 2015

Minimum variance criterion

⇒ An optimality criterion is necessary to define an optimal estimator

Mean Square Error (MSE)

MSE(θ̂) = E{ (θ̂ − θ)² }

measures the mean squared deviation of the estimator from the true value.

This criterion leads, however, to unrealisable estimators - namely, ones which are not solely a function of the data

MSE(θ̂) = E{ [ (θ̂ − E(θ̂)) + (E(θ̂) − θ) ]² } = Var(θ̂) + [E(θ̂) − θ]² = Var(θ̂) + B²(θ̂)

⇒ MSE = VARIANCE OF THE ESTIMATOR + SQUARED BIAS


Page 20: ASP Lecture 3 Estimation Intro 2015

Example: An MSE estimator with ’gain factor’

Consider the following estimator for a DC level in WGN

Â = a · (1/N) ∑_{n=0}^{N−1} x[n]

Task: Find a which results in minimum MSE

Given E{Â} = aA and Var(Â) = a²σ²/N, we have

MSE(Â) = a²σ²/N + (a − 1)²A²

Of course, the choice of a = 1 removes the bias, but does it also minimise the MSE?


Page 21: ASP Lecture 3 Estimation Intro 2015

Continued: MSE estimator with ’gain’

But, can we find a analytically? Differentiating with respect to a yields

∂MSE(Â)/∂a = 2aσ²/N + 2(a − 1)A²

and setting the result to zero gives the optimal value

a_opt = A² / (A² + σ²/N)

but we do not know the value of A

◦ The optimal value depends upon A which is the unknown parameter

◦ Comment - any criterion which depends on the value of the unknown parameter to be found is likely to yield unrealisable estimators

◦ Practically, the minimum MSE estimator needs to be abandoned, and the estimator must be constrained to be unbiased
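A brief Matlab sketch (A, σ and N assumed) which sweeps the gain a and compares the numerical minimiser of the MSE with the analytical a_opt:

% MSE(a) = a^2*sigma^2/N + (a-1)^2*A^2 for the scaled sample mean (illustrative)
A = 1; sigma = 1; N = 10;                 % assumed values
a = linspace(0, 1.5, 301);
mse = (a.^2)*sigma^2/N + ((a - 1).^2)*A^2;
[~, k] = min(mse);
a_opt = A^2/(A^2 + sigma^2/N);
fprintf('numerical minimiser a = %.3f, analytical a_opt = %.3f\n', a(k), a_opt);
% a_opt depends on the unknown A itself, which is exactly why this MSE-optimal
% estimator is unrealisable in practice.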


Page 22: ASP Lecture 3 Estimation Intro 2015

A counter-example: A little bias can help

(but the estimator is difficult to control)

Q: Let {y[n]}, n = 1, . . . , N, be i.i.d. Gaussian variables ∼ N(0, σ²). Consider the following estimate of σ²

σ̂² = (α/N) ∑_{n=1}^{N} y²[n],    α > 0

Find the α which minimises the MSE of σ̂².

S: It is straightforward to show that E{σ̂²} = ασ² and

MSE(σ̂²) = E{ (σ̂² − σ²)² } = E{σ̂⁴} + σ⁴(1 − 2α)
        = (α²/N²) ∑_{n=1}^{N} ∑_{s=1}^{N} E{ y²[n] y²[s] } + σ⁴(1 − 2α)
        = (α²/N²) (N²σ⁴ + 2Nσ⁴) + σ⁴(1 − 2α) = σ⁴[ α²(1 + 2/N) + (1 − 2α) ]

The minimum MSE is obtained for α_min = N/(N + 2) and has the value min MSE(σ̂²) = 2σ⁴/(N + 2). Given that the minimum variance of an unbiased estimator (the CRLB, covered later) is 2σ⁴/N, this is an example of a biased estimator which achieves a lower MSE than the CRLB.
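A Monte Carlo sketch (σ², N and the trial count assumed) comparing the two choices of α:

% MSE of (alpha/N)*sum(y.^2) for alpha = 1 and alpha = N/(N+2) (illustrative)
sigma2 = 2; N = 10; M = 100000;           % assumed values
y  = sqrt(sigma2)*randn(M, N);            % M independent records of length N
s  = sum(y.^2, 2)/N;                      % alpha = 1 (unbiased for zero-mean data)
sb = (N/(N+2))*s;                         % alpha = N/(N+2) (biased)
fprintf('MSE(alpha = 1)       = %.4f  (CRLB 2*sigma^4/N       = %.4f)\n', ...
        mean((s  - sigma2).^2), 2*sigma2^2/N);
fprintf('MSE(alpha = N/(N+2)) = %.4f  (theory 2*sigma^4/(N+2) = %.4f)\n', ...
        mean((sb - sigma2).^2), 2*sigma2^2/(N+2));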


Page 23: ASP Lecture 3 Estimation Intro 2015

Example: Estimation of hearing threshold (hearing aids)

Objective estimation of hearing threshold:

◦ Design AM or FM stimuli with carrier frequency up to 20 kHz

◦ Modulate this carrier with e.g. 40 Hz envelope

◦ Present this sound to the listener

◦ Measure the electroencephalogram, which should contain a 40 Hz signal if the subject was able to detect the sound

◦ This is a good objective estimate of the hearing curve


Page 24: ASP Lecture 3 Estimation Intro 2015

Key points

Simulation: Auditory steady state response (ASSR), 40 Hz amplitude modulated

◦ An estimator is a random variable; its performance can only be completely described statistically, i.e. by a PDF.

◦ An estimator's performance cannot be conclusively assessed by one computer simulation; we often average M trials of independent simulations, called “Monte Carlo” analysis.

◦ Performance and computational complexity are often traded: the optimal estimator is replaced by a suboptimal but realisable one.

(figure: three EEG segments (Time series) with their individual periodograms (PSD analysis), and the averaged periodogram (Averaged PSD analysis) in the bottom-right panel; axes: Time (s) and Frequency (Hz))

Simulation: Left column: the 24 s of ITE EEG, sampled at 256 Hz, was split into 3 segments of length 8 s. Right column: the respective periodograms (rectangular window). Bottom right: the averaged periodogram; observe the reduction in variance (in red).
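A minimal Matlab sketch of the same idea (a synthetic 40 Hz tone in noise stands in for the EEG; all values are assumed):

% Periodogram averaging over three 8 s segments (illustrative sketch)
fs = 256; T = 24; t = (0:T*fs-1)'/fs;
x  = 1e-6*sin(2*pi*40*t) + 2e-6*randn(size(t));  % "ASSR-like" tone buried in noise
L  = 3; Ns = length(x)/L;                         % three segments of 8 s each
P  = zeros(Ns, L);
f  = (0:Ns-1)'*fs/Ns;                             % frequency axis per segment
for l = 1:L
    seg = x((l-1)*Ns + (1:Ns));
    P(:,l) = abs(fft(seg)).^2 / Ns;               % periodogram (rectangular window)
end
Pavg = mean(P, 2);                                % averaged periodogram, lower variance
semilogy(f, P, ':', f, Pavg, 'r'); xlim([20 45]);
xlabel('Frequency (Hz)'); ylabel('PSD');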


Page 25: ASP Lecture 3 Estimation Intro 2015

Desired: minimum variance unbiased (MVU) estimator

Minimising the variance of an unbiased estimator concentrates the PDF of the error about zero ⇒ estimation error is therefore less likely to be large

◦ Existence of the MVU estimator

The MVU estimator is an unbiased estimator with minimum variance for all θ, that is, θ̂_3 on the graph.


Page 26: ASP Lecture 3 Estimation Intro 2015

Methods to find the MVU estimator

◦ The MVU estimator may not always exist

◦ A single unbiased estimator may not exist – in which case a search for the MVU is fruitless!

1. Determine the Cramer-Rao lower bound (CRLB) and find some estimator which satisfies it

2. Apply the Rao-Blackwell-Lehmann-Scheffe (RBLS) theorem

3. Restrict the class of estimators to be not only unbiased, but also linear (BLUE)

4. Sequential vs. block estimators

5. Adaptive estimators


Page 27: ASP Lecture 3 Estimation Intro 2015

Extensions to the vector parameter case

◦ If θ = [θ1, θ2, . . . , θp]^T ∈ R^{p×1} is a vector of unknown parameters, an estimator is unbiased if

E(θ̂_i) = θ_i,   a_i < θ_i < b_i,   for i = 1, 2, . . . , p

and by defining E(θ̂) = [E(θ̂_1), E(θ̂_2), . . . , E(θ̂_p)]^T, an unbiased estimator has the property E(θ̂) = θ within the p-dimensional space of parameters

◦ An MVU estimator has the additional property that Var(θ̂_i), for i = 1, 2, . . . , p, is minimum among all unbiased estimators


Page 28: ASP Lecture 3 Estimation Intro 2015

Summary and food for thought

◦ We are now equipped with performance metrics for assessing the goodness of any estimator (bias, variance, MSE)

◦ Since MSE = var + bias², some biased estimators may yield low MSE. However, we prefer the minimum variance unbiased (MVU) estimators

◦ Even a simple Sample Mean estimator is a very rich example of the advantages of statistical estimators

◦ The knowledge of the parametrised PDF p(data; parameters) is very important for designing efficient estimators

◦ We have introduced statistical “point estimators”; would it be useful to also know the “confidence” we have in our point estimate?

◦ In many disciplines it is useful to design so-called “set membership estimates”, where the output of an estimator belongs to a pre-defined bound (range) of values

◦ We will next address linear, best linear unbiased, maximum likelihood,least squares, sequential least squares, and adaptive estimators


Page 29: ASP Lecture 3 Estimation Intro 2015

Homework: Check another proof for the MSE expression

MSE(θ̂) = var(θ̂) + bias²(θ̂)

Note: var(x) = E[x²] − [E[x]]²    (∗)

Idea: let x = θ̂ − θ and substitute into (∗) to give

var(θ̂ − θ) = E[(θ̂ − θ)²] − [E[θ̂ − θ]]²    (∗∗)
  term (1)      term (2)       term (3)

Let us now evaluate these terms:

(1) var(θ̂ − θ) = var(θ̂)    (since θ is a deterministic constant)

(2) E[(θ̂ − θ)²] = MSE

(3) [E[θ̂ − θ]]² = [E[θ̂] − E[θ]]² = [E[θ̂] − θ]² = bias²(θ̂)

Substitute (1), (2), (3) into (∗∗) to give

var(θ̂) = MSE − bias² ⇒ MSE = var(θ̂) + bias²(θ̂)


Page 30: ASP Lecture 3 Estimation Intro 2015

Recap: Unbiased estimators

Due to the linearity property of the expectation operator E{·}, that is

E{a+ b} = E{a}+ E{b}

the sample mean operator can be simply shown to be unbiased, i.e.

E{Â} = (1/N) ∑_{n=0}^{N−1} E{x[n]} = (1/N) ∑_{n=0}^{N−1} A = A

◦ In some applications, the value of A may be constrained to be positive: a component value such as an inductor, capacitor or resistor would be positive (prior knowledge)

◦ For N data points in random noise, unbiased estimators generally have symmetric PDFs centred about their true value, i.e. Â ∼ N(A, σ²/N)

