Chapter 8: Least squares (beginning of chapter)



Least Squares

• So far, we have been trying to determine an estimator which was unbiased and had minimum variance.

• Next we'll consider a class of estimators that in many cases have no optimality properties associated with them, but are still applicable in a number of problems due to their ease of use.

• The Least Squares approach originates from 1795, when Gauss invented the method (at the time he was only 18 years old)¹

• However, the method was properly formalized into its current form and published in 1806 by Legendre.

¹ Compare the date with maximum likelihood from 1912 and the CRLB and RBLS from the mid-1900s.


• The method became widely known as Gauss was the only one able to describe the orbit of Ceres, a minor planet in the asteroid belt between Mars and Jupiter that was discovered in 1801.

• In the least squares (LS) approach we attempt to minimize the squared difference between the given data x[n] and the assumed signal model².

• Note that there is no assumption about the noise PDF.

² In Gauss' case, the goal was to minimize the squared difference between the measurements of the location of Ceres and a family of functions describing the orbit.

[Figure: noisy samples x(n) scattered around a fitted line y(n), n = 0, …, 25. LS minimizes the total length of the vertical distances between the samples and the line.]


Introductory Example

• Consider the model

x[n] = A + Bn + w[n],   n = 0, …, N−1

where A and B are unknown deterministic parameters and w[n] ∼ N(0, σ²).


• The LS estimate θ̂_LS is the value of θ that minimizes the LS error criterion

J(θ) = ∑_{n=0}^{N−1} (x[n] − s[n; θ])²,

where s[n; θ] is the deterministic part of our model, parameterized by θ.


• In the previous example, θ = [A, B]ᵀ and the LS criterion is

J(θ) = ∑_{n=0}^{N−1} (x[n] − s[n; θ])² = ∑_{n=0}^{N−1} (x[n] − (A + Bn))²,

and the LS estimator is defined as

θ̂_LS = arg min_θ ∑_{n=0}^{N−1} (x[n] − (A + Bn))²

• Due to the quadratic form, this estimation problem is easily solved, as we will soon see.


Types of LS Problems: Linear and Nonlinear

• LS problems can be divided into two categories: linear and nonlinear.

• Problems where the data depends on the parameters in a linear manner are easily solved, whereas nonlinear problems may be impossible to solve in closed form.


Example of a linear LS problem

• Consider estimating the DC level A of a signal:

x[n] = A + w[n],   n = 0, …, N−1

• According to the LS approach, we can estimate A by minimizing

J(A) = ∑_{n=0}^{N−1} (x[n] − A)²


• Due to the quadratic form, differentiating is easy:

∂J(A)/∂A = ∑_{n=0}^{N−1} 2(x[n] − A) · (−1)

• Note that the quadratic form also guarantees that the zero of the derivative is the unique global minimum.


• Setting the derivative equal to zero gives the minimum of J(A):

∑_{n=0}^{N−1} 2(x[n] − A) · (−1) = 0

∑_{n=0}^{N−1} x[n] − ∑_{n=0}^{N−1} A = 0

∑_{n=0}^{N−1} x[n] = NA

Â = (1/N) ∑_{n=0}^{N−1} x[n]
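The closed form is easy to sanity-check numerically. Below is a minimal Python sketch (the simulated data, seed, and variable names are illustrative, not from the slides) comparing the sample mean against a brute-force evaluation of J(A) on a grid:

```python
import numpy as np

rng = np.random.default_rng(0)
N, A_true, sigma = 100, 2.5, 1.0
x = A_true + sigma * rng.standard_normal(N)      # x[n] = A + w[n]

A_ls = x.mean()                                  # closed-form LS estimate

# Brute force: evaluate J(A) on a grid and take the minimizer.
grid = np.linspace(A_ls - 1.0, A_ls + 1.0, 2001)
J = ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)
print(A_ls, grid[J.argmin()])                    # agree to grid resolution
```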


Example of a nonlinear LS problem

• Consider the signal model

x[n] = cos 2πf₀n + w[n]

where the frequency f₀ is to be estimated. The LSE is defined by the error function

J(f₀) = ∑_{n=0}^{N−1} (x[n] − cos 2πf₀n)²


• Now differentiation gives

∂J(f₀)/∂f₀ = ∑_{n=0}^{N−1} 2(x[n] − cos 2πf₀n)(2πn sin 2πf₀n)

= 4π ∑_{n=0}^{N−1} n (x[n] sin 2πf₀n − cos 2πf₀n sin 2πf₀n)

• We see that it is not possible to solve for the root in closed form.

• The problem could be solved using grid search or iterative minimization.
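A grid search is simple to sketch in Python; everything below (data, grid resolution, variable names) is illustrative rather than from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
N, f0_true, sigma = 100, 0.28, 0.5
n = np.arange(N)
x = np.cos(2 * np.pi * f0_true * n) + sigma * rng.standard_normal(N)

# Evaluate J(f0) on a dense grid over (0, 0.5) and take the minimizer.
f_grid = np.linspace(0.001, 0.499, 4999)
J = ((x[:, None] - np.cos(2 * np.pi * f_grid[None, :] * n[:, None])) ** 2).sum(axis=0)
print(f_grid[J.argmin()])   # close to the true 0.28
```

The grid must be dense enough to land near the global minimum; the estimate can then be refined with a local iterative step.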


• Non-exhaustive heuristics tend to fail in this kind of problem due to numerous local optima.

• An example of using Matlab's nlinfit is shown below.

• In this example the true frequency was f₀ = 0.28. The estimation fails miserably and results in f̂₀ = 0.02.

• You can also see that the optimizer gets stuck in a local minimum after a few iterations.

• Note: later we will find an approximate estimator using the maximum likelihood principle. This could be used as a starting point for LS as well.


Example of a nonlinear LS problem

[Figure: true signal, noisy samples, and estimated signal for n = 0, …, 100 (amplitudes roughly −4 to 4). The iterative fit locks onto a local minimum, so the estimated cosine does not follow the true signal.]
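A comparable experiment can be run with SciPy's general-purpose least-squares solver; this is a hedged sketch (scipy.optimize.least_squares stands in for nlinfit, and the data and starting points are illustrative):

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(7)
N, f0_true = 100, 0.28
n = np.arange(N)
x = np.cos(2 * np.pi * f0_true * n) + 0.5 * rng.standard_normal(N)

def residuals(f):
    return x - np.cos(2 * np.pi * f * n)

# A poor starting point typically stalls in a nearby local minimum...
print(least_squares(residuals, x0=[0.02]).x)
# ...while a start close to the truth recovers f0 close to 0.28.
print(least_squares(residuals, x0=[0.278]).x)
```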


Linear Least Squares - Vector Parameter

• We assume that

s = Hθ

where H is the observation matrix, a known N × p matrix (N > p) of full rank p. (No noise PDF assumption!)

• The LSE is found by minimizing

J(θ) = (x − Hθ)ᵀ(x − Hθ)
     = xᵀx − xᵀHθ − θᵀHᵀx + θᵀHᵀHθ
     = xᵀx − 2xᵀHθ + θᵀHᵀHθ


• By setting the gradient ∂J(θ)/∂θ = −2Hᵀx + 2HᵀHθ to zero, we obtain the normal equations HᵀHθ = Hᵀx and the estimator

θ̂ = (HᵀH)⁻¹Hᵀx.

• The minimum LS error is

J_min = (x − Hθ̂)ᵀ(x − Hθ̂)
      = (x − H(HᵀH)⁻¹Hᵀx)ᵀ(x − H(HᵀH)⁻¹Hᵀx)
      = xᵀ(I − H(HᵀH)⁻¹Hᵀ)(I − H(HᵀH)⁻¹Hᵀ)x
      = xᵀ(I − H(HᵀH)⁻¹Hᵀ)x
      = xᵀ(x − Hθ̂)
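For the introductory line model s[n] = A + Bn, the normal equations take two lines of numpy. A minimal sketch (the simulated data and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N, A_true, B_true = 50, 1.0, 0.25
n = np.arange(N)
x = A_true + B_true * n + 0.5 * rng.standard_normal(N)

# Observation matrix for s[n] = A + B*n: an N x 2 full-rank matrix.
H = np.column_stack([np.ones(N), n])

# Solve the normal equations H^T H theta = H^T x.
theta_hat = np.linalg.solve(H.T @ H, H.T @ x)

# Minimum LS error: J_min = x^T (x - H theta_hat).
J_min = x @ (x - H @ theta_hat)
print(theta_hat, J_min)
```

In production code one would prefer np.linalg.lstsq (or a QR factorization) over forming HᵀH explicitly, for numerical stability.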


Idempotence of (I − H(HᵀH)⁻¹Hᵀ)

• In the previous slide we used the property that the matrix (I − H(HᵀH)⁻¹Hᵀ) is idempotent³.

• Note that the matrix is easy to show to be symmetric.

• The idempotence can be shown as follows:

(I − H(HᵀH)⁻¹Hᵀ)(I − H(HᵀH)⁻¹Hᵀ)
= I − H(HᵀH)⁻¹Hᵀ − H(HᵀH)⁻¹Hᵀ + H(HᵀH)⁻¹ (HᵀH)(HᵀH)⁻¹ Hᵀ     [the middle factor (HᵀH)(HᵀH)⁻¹ = I]
= I − 2H(HᵀH)⁻¹Hᵀ + H(HᵀH)⁻¹Hᵀ
= I − H(HᵀH)⁻¹Hᵀ

³ A square matrix A is idempotent (a projection matrix) if AA = A.
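Both properties are quick to check numerically, here with a random full-rank H (illustrative; any N × p matrix of full rank works):

```python
import numpy as np

rng = np.random.default_rng(3)
H = rng.standard_normal((10, 3))   # full rank with probability 1
P = np.eye(10) - H @ np.linalg.inv(H.T @ H) @ H.T

print(np.allclose(P @ P, P))       # True: idempotent
print(np.allclose(P, P.T))         # True: symmetric
```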


Weighted Linear Least Squares

• The least squares approach can be extended by introducing weights for the errors of the individual samples.

• This is done by an N × N symmetric, positive definite weighting matrix W:

J(θ) = (x − Hθ)ᵀW(x − Hθ)

• A derivation similar to the unweighted LS case shows that the weighted LS estimator is given by

θ̂ = (HᵀWH)⁻¹HᵀWx


• Note that setting W = I gives the unweighted case.

• The minimum LS error is given by

J_min = xᵀ(W − WH(HᵀWH)⁻¹HᵀW)x


Weighted LS: example

• Consider the DC level estimation problem

x[n] = A + w[n],   n = 0, 1, …, N−1,

where the noise samples have different variances: w[n] ∼ N(0, σₙ²).


• In this case it would be reasonable to choose the weights such that the samples with smaller variance have higher weight, for example

W = diag( 1/σ₀², 1/σ₁², …, 1/σ²_{N−1} )


• The weighted LS estimator now becomes

Â = (HᵀWH)⁻¹HᵀWx = ⋯ = ( ∑_{n=0}^{N−1} x[n]/σₙ² ) / ( ∑_{n=0}^{N−1} 1/σₙ² )
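The matrix form and the closed-form weighted average can be checked against each other numerically. A sketch (the per-sample variances and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
N, A_true = 200, 1.5
sigma2 = rng.uniform(0.1, 4.0, N)               # per-sample noise variances
x = A_true + np.sqrt(sigma2) * rng.standard_normal(N)

H = np.ones((N, 1))
W = np.diag(1.0 / sigma2)                       # weight = 1 / variance

# Matrix form: (H^T W H)^{-1} H^T W x
A_wls = np.linalg.solve(H.T @ W @ H, H.T @ W @ x).item()

# Closed form: sum(x[n]/sigma_n^2) / sum(1/sigma_n^2)
A_closed = (x / sigma2).sum() / (1.0 / sigma2).sum()
print(A_wls, A_closed)                          # identical up to rounding
```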


Chapter 5: General MVU estimation


General MVU Estimation using Sufficient Statistic

• Previously we saw that it's possible to find the MVU estimator if it is efficient (achieves the CRLB).

• As a special case, the linear model makes this practical.

• In this chapter we'll consider the case where the MVU estimator does not achieve the bound. It is still possible to find the MVU estimator.

• The key to this is the so-called sufficient statistic, which is a function of the data that contains all available information about it.⁴

⁴ The Finnish translation is quite descriptive: tyhjentävä statistiikka, which says that the statistic drains (exhausts, condenses) all the information from the data.


Sufficient statistic

• Let us consider the example of DC level in WGN, where the MVU estimator is

Â = (1/N) ∑_{n=0}^{N−1} x[n].

• In addition to the MVUE, consider the first-sample estimator Ǎ = x[0].

• One way of viewing their difference is that the latter one disregards the majority of the samples.


• The question arises: which set (or function) of the data would be sufficient to contain all the necessary information?

• Possible candidates for a sufficient set of data (sufficient to compute the MVU estimate):

1: S₁ = {x[0], x[1], x[2], …, x[N−1]} (all N samples as such)

2: S₂ = {x[0] + x[1], x[2], …, x[N−1]} (N−2 samples plus a sum of two)

3: S₃ = { ∑_{n=0}^{N−1} x[n] } (the sum of all the samples)


• All of the sets S₁, S₂ and S₃ are sufficient, because the MVUE can be derived from any of them.

• The set S₃ is special in the sense that it compresses the information into a single value.

• In general, among all sufficient sets, the one that contains the least number of elements is called the minimal sufficient statistic. There may be several minimal sufficient statistics: in our example it's easy to come up with other one-element sufficient statistics.

• Next we’ll define sufficiency in mathematical terms.


Definition of Sufficiency

Definition
Consider a set of data x = {x[0], x[1], …, x[N−1]} sampled from a distribution depending on the parameter θ. When estimating θ, a function of the data T(x) is called a sufficient statistic if the conditional PDF (after observing the value of T(x))

p(x | T(x) = T₀; θ)

does not depend on the parameter θ.

• A direct use of the above definition is typically difficult. Instead, it is usually more convenient to use the Neyman-Fisher factorization theorem that we'll describe soon.


• However, before that, let's see an example of the difficult way.


Example: Proof of Sufficiency Using the Definition

• Consider the familiar DC level in WGN example. Let's show that the statistic T(x) = ∑ x[n] is sufficient.

• The PDF of the data is

p(x; A) = 1/(2πσ²)^{N/2} · exp[ −(1/(2σ²)) ∑_{n=0}^{N−1} (x[n] − A)² ]

• We need to show that the conditional PDF p(x | T(x) = T₀; A) does not depend on A.


• The PDF can be evaluated by the following rule from probability theory:

p(E₁ | E₂) = p(E₁, E₂) / p(E₂)

• In our case this becomes

p(x | T(x) = T₀; A) = p(x, T(x) = T₀; A) / p(T(x) = T₀)


• The denominator p(T(x) = T₀) is easy, because a sum of Gaussian random variables (N(A, σ²)) is Gaussian with distribution N(NA, Nσ²), or

p(T(x) = T₀) = p( ∑_{n=0}^{N−1} x[n] = T₀ ) = 1/√(2πNσ²) · exp[ −(1/(2Nσ²)) (T₀ − NA)² ]

• The numerator p(x, T(x) = T₀; A) is the joint PDF of the data and the statistic.


• However, T(x) depends on the data x such that their joint probability takes nonzero values only when T(x) = T₀ holds.⁵

• This condition can be written more compactly as:

p(x, T(x) = T₀; A) = p(x; A) δ(T(x) − T₀),

where δ(·) is the Dirac delta function⁶.

⁵ This is analogous to the case of two random variables x and y with the relationship y ≡ 2x. In this case their joint PDF is p(x, y) = p(x)δ(y − 2x). The delta term ensures that the joint PDF is zero outside the line y = 2x.

⁶ The defining characteristic of δ(·) is ∫_{−∞}^{∞} p(x)δ(x) dx = p(0), i.e., it can be used to pick a single value of a continuous function.


• Thus, we have

p(x | T(x) = T₀; A) = p(x, T(x) = T₀; A) / p(T(x) = T₀)

= p(x; A) δ(T(x) − T₀) / ( 1/√(2πNσ²) · exp[ −(1/(2Nσ²)) (T₀ − NA)² ] )

= ( 1/(2πσ²)^{N/2} · exp[ −(1/(2σ²)) ∑_{n=0}^{N−1} (x[n] − A)² ] ) / ( 1/√(2πNσ²) · exp[ −(1/(2Nσ²)) (T₀ − NA)² ] ) · δ(T(x) − T₀)


• Algebraic manipulation shows that A vanishes from the formula:

p(x | T(x) = T₀; A) = √N / (2πσ²)^{(N−1)/2} · exp[ −(1/(2σ²)) ∑_{n=0}^{N−1} x²[n] ] · exp[ T₀² / (2Nσ²) ] · δ(T(x) − T₀),

or, without the delta function:

p(x | T(x) = T₀; A) = { √N / (2πσ²)^{(N−1)/2} · exp[ −(1/(2σ²)) ∑_{n=0}^{N−1} x²[n] ] · exp[ T₀² / (2Nσ²) ],  if T(x) = T₀
                     { 0,  otherwise.

• Therefore T(x) is sufficient.
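The omitted algebra is short enough to spell out. On the support of δ(T(x) − T₀) we may substitute ∑ x[n] = T₀, and the exponents combine as follows (a sketch in LaTeX):

```latex
% Numerator exponent, expanded using \sum_n x[n] = T_0:
-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2
  = -\frac{1}{2\sigma^2}\sum_{n=0}^{N-1} x^2[n]
    + \frac{A T_0}{\sigma^2} - \frac{N A^2}{2\sigma^2}
% Dividing by the denominator adds its exponent with the sign flipped:
+\frac{1}{2N\sigma^2}(T_0 - NA)^2
  = \frac{T_0^2}{2N\sigma^2} - \frac{A T_0}{\sigma^2} + \frac{N A^2}{2\sigma^2}
% The A-dependent terms cancel exactly, leaving
% -\frac{1}{2\sigma^2}\sum_n x^2[n] + \frac{T_0^2}{2N\sigma^2},
% which is the exponent above: A has vanished.
```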


• If nothing else, we learned that using the definition directly is difficult. There's also a more straightforward way, as we'll see next.


Theorem for Finding a Sufficient Statistic

Theorem (Neyman-Fisher Factorization Theorem)
The function T(x) is a sufficient statistic for θ if and only if we can factorize the PDF p(x; θ) as

p(x; θ) = g(T(x), θ) · h(x)

where g is a function depending on x only through T(x) and h is a function depending only on x.


Example: DC Level in WGN

• Let us find the factorization in the DC level in WGN case. As discussed previously, this should give T(x) = ∑ x[n].

• Start with the PDF:

p(x; A) = 1/(2πσ²)^{N/2} · exp[ −(1/(2σ²)) ∑_{n=0}^{N−1} (x[n] − A)² ]


• The exponent can be broken into parts:

∑_{n=0}^{N−1} (x[n] − A)² = ∑_{n=0}^{N−1} (x²[n] − 2Ax[n] + A²)
                         = ∑_{n=0}^{N−1} x²[n] − 2A ∑_{n=0}^{N−1} x[n] + NA²


• Let's separate the PDF into two functions: one with A and T(x) only, and one with x only:

p(x; A) = [ 1/(2πσ²)^{N/2} · exp( −(1/(2σ²)) (NA² − 2A ∑_{n=0}^{N−1} x[n]) ) ] · [ exp( −(1/(2σ²)) ∑_{n=0}^{N−1} x²[n] ) ]

where the first bracketed factor is g(T(x), A) and the second is h(x).

• Thus,

T(x) = ∑_{n=0}^{N−1} x[n]

is a sufficient statistic for A.


• Note that the factorization suggests

T(x) = 2 ∑_{n=0}^{N−1} x[n],

which is also a sufficient statistic for A.

• In fact, any one-to-one mapping of T(x) is a sufficient statistic as well, because we can always return to T(x).


Example: Power of WGN

• Consider estimating the variance σ² of a signal with the following model:

x[n] = 0 + w[n],

where w[n] ∼ N(0, σ²). Find a sufficient statistic.

• Solution: Start with the PDF:

p(x; σ²) = 1/(2πσ²)^{N/2} · exp[ −(1/(2σ²)) ∑_{n=0}^{N−1} x²[n] ].


• This can be factored in a silly way into the required form:

p(x; σ²) = [ 1/(2πσ²)^{N/2} · exp( −(1/(2σ²)) ∑_{n=0}^{N−1} x²[n] ) ] · 1

where the bracketed factor is g(T(x), σ²) and h(x) = 1.

Thus, T(x) = ∑_{n=0}^{N−1} x²[n] is a sufficient statistic for σ².

• Other examples of sufficient statistics follow later, as we're really using the concept to find the MVUE.


Finding the MVUE using a sufficient statistic

• Let T(x) be a complete sufficient statistic for a parameter θ. There are two ways of finding an MVU estimator using T(x):

1. Find any unbiased estimator θ̌ and determine θ̂ = E(θ̌ | T(x)).

2. Find some function g so that θ̂ = g(T(x)) is an unbiased estimator of θ.

• As one could predict, the latter one is easier to use. That will be the one that we'll generally employ.

• However, let us find the MVU estimator in our familiar example case (DC level in WGN) using both methods.


Method 1: The difficult way: θ̂ = E(θ̌ | T(x))

• First we need a sufficient statistic T(x) and an unbiased estimator Ǎ. These can be, for example,

T(x) = ∑_{n=0}^{N−1} x[n]   and   Ǎ = x[0].

• The MVU estimator Â can then be expressed in terms of the two as Â = E(Ǎ | T(x)).

• How to calculate?


• In general, the conditional expectation E(x | y) can be calculated by

E(x | y) = ∫_{−∞}^{∞} x · p(x | y) dx = ∫_{−∞}^{∞} x · ( p(x, y) / p(y) ) dx

• In particular, if x and y are jointly Gaussian, this can be shown to be⁷

E(x | y) = E(x) + ( cov(x, y) / var(y) ) (y − E(y))

⁷ See the proof in Appendix 10A of S. Kay's book or http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Conditional_distributions


• Therefore, in our case

E(Ǎ | T(x)) = ∫_{−∞}^{∞} Ǎ p(Ǎ | T(x)) dǍ
            = E(Ǎ) + ( cov(Ǎ, T(x)) / var(T(x)) ) (T(x) − E(T(x)))
            = A + ( σ² / (Nσ²) ) ( ∑_{n=0}^{N−1} x[n] − NA )
            = (1/N) ∑_{n=0}^{N−1} x[n],

which we know is the MVUE.


Method 2: The easy way: Find a transformation g such that g(T(x)) is unbiased

• We need to find a function g(·) such that g(T(x)) is unbiased, i.e.,

E( g( ∑_{n=0}^{N−1} x[n] ) ) = A


• Because E(∑ x[n]) = NA, the function has to satisfy

E( g( ∑_{n=0}^{N−1} x[n] ) ) / E( ∑_{n=0}^{N−1} x[n] ) = A / (NA) = 1/N

• Thus, a simple choice is g(x) = x/N, giving

Â = (1/N) ∑_{n=0}^{N−1} x[n]


The RBLS Theorem

• We are now ready to state the theorem named after statisticians Rao and Blackwell. It was extended by Lehmann and Scheffé (the "Additionally..." part); hence the lengthy name.

• The theorem uses the concept of completeness of a statistic, which needs to be defined.

• Verification (and even understanding) of completeness can be difficult, so generally we're just happy to believe that all statistics we'll see are complete.


• Definitions of completeness include:

  • A statistic is complete if there is only one function of the statistic that is unbiased.

  • A statistic T(y) is complete if the only real-valued function g defined on the range of T(y) that satisfies

    E_θ(g(T)) = 0, ∀θ

    is the function g(T) = 0.

• You may also check⁸ for the definition and examples of completeness.

⁸ http://en.wikipedia.org/wiki/Completeness_(statistics)


• For example, in our DC level case the statistic T(x) = ∑ x[n] is complete, because the only function for which

E[g(T(x))] = 0   for all A ∈ ℝ

is the zero function g(·) ≡ 0.


Theorem (Rao-Blackwell-Lehmann-Scheffé)
If θ̌ is an unbiased estimator of θ and T(x) is a sufficient statistic for θ, then θ̂ = E(θ̌ | T) is

1. a valid estimator for θ (not dependent on θ),
2. unbiased,
3. of variance at most that of θ̌, for all θ.

Additionally, if the sufficient statistic is complete, then θ̂ is the MVU estimator.


Corollary
If the sufficient statistic T(x) is complete, then there's only one function g(·) that makes it unbiased. Applied to the statistic, g gives the MVUE:

θ̂ = g(T(x)).

Proof: See Appendix 5B and the discussion on pp. 109-110 of Kay's book.


Example: Mean of Uniform Noise

• Suppose we have observed the data

x[n] = w[n],

where w[n] is i.i.d. noise with PDF U(0, β).⁹

• What is the MVU estimator of the mean θ = β/2 from the data x? Is it just the sample mean as usual?

• Earlier we saw that the CRLB regularity conditions do not hold, so that theorem didn't help us.

• Let's first find the sufficient statistic using the factorization theorem.

⁹ The German Tank Problem is an instance of this model: http://en.wikipedia.org/wiki/German_tank_problem


• The PDF of the samples can be written as

p(x[n]; θ) = (1/β) [u(x[n]) − u(x[n] − β)],

where u(x) is the unit step function

u(x) = { 1,  x > 0
       { 0,  x < 0


• The joint PDF of all the samples is

p(x; θ) = (1/β^N) ∏_{n=0}^{N−1} [u(x[n]) − u(x[n] − β)]

• In other words, each value has equal probability density as long as it is in the interval [0, β]; otherwise the density is zero. This can be written as a formula:

p(x; θ) = { 1/β^N,  if 0 < x[n] < β for all n = 0, 1, …, N−1
          { 0,      otherwise

• This form we cannot factorize as required by the theorem: there are N conditions that cannot be written as a single formula.


• However, we can cleverly express the N conditions using only two: the conditions

0 < x[n] < β for all n = 0, 1, . . . ,N− 1

are equivalent to

0 < min(x[n]) and max(x[n]) < β.


• Thus,

p(x; θ) = { 1/β^N,  if 0 < min(x[n]) and max(x[n]) < β
          { 0,      otherwise

or in a single line using the unit step function u(·):

p(x; θ) = (1/β^N) u(β − max(x[n])) · u(min(x[n])),

where the first two factors form g(T(x), θ) and u(min(x[n])) is h(x).


• Hence we have a factorization as required by the Neyman-Fisher factorization theorem, and we know that the sufficient statistic is T(x) = max(x[n]).

• Surprise: all the data needed by the MVU estimator is contained in the largest sample alone.


• If we had unlimited time, we could now prove that T(x) is complete. Well, it is.

• In order to find the function g(·) that makes T(x) unbiased, we could check what E(T(x)) is. We omit this proof as well, and just note that

E(T(x)) = ( 2N / (N+1) ) θ

• Therefore, the MVU estimator results from applying a g that makes the result unbiased. Let's choose

g(x) = ( (N+1) / (2N) ) x
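The omitted expectation is a standard order-statistics computation; a sketch (assuming x[n] i.i.d. U(0, β) with θ = β/2):

```latex
% CDF of T = max(x[n]) for i.i.d. U(0, \beta) samples:
P(T \le t) = \prod_{n=0}^{N-1} P(x[n] \le t) = (t/\beta)^N, \qquad 0 \le t \le \beta
% Differentiate to get the density of T, then integrate:
E(T) = \int_0^\beta t \cdot \frac{N t^{N-1}}{\beta^N}\, dt
     = \frac{N}{N+1}\,\beta = \frac{2N}{N+1}\,\theta
```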


• Result: the MVU estimator of the mean of the uniform distribution is

θ̂ = g(T(x)) = ( (N+1) / (2N) ) max(x[n]).


• The optimality can also be seen in practice. The plot below shows the distribution of the MVU estimator (top) in a test of 100 realizations of uniform noise with β = 3. The other unbiased estimators that one might think of are the sample mean (center) and the sample median (bottom), both of which have significantly worse performance than the MVUE.


[Figure: histograms of the three estimators over the test runs.
MVU: sample variance = 0.00022122, sample mean = 1.4997.
MEAN: sample variance = 0.0090779, sample mean = 1.5099.
MEDIAN: sample variance = 0.024855, sample mean = 1.5179.]
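The comparison is easy to reproduce. A Monte Carlo sketch in Python (the trial count and seed are illustrative; the slide used 100 realizations):

```python
import numpy as np

rng = np.random.default_rng(5)
beta, N, trials = 3.0, 100, 10_000
x = rng.uniform(0.0, beta, size=(trials, N))    # each row: one realization

est_mvu = (N + 1) / (2 * N) * x.max(axis=1)     # (N+1)/(2N) * max(x[n])
est_mean = x.mean(axis=1)                       # sample mean
est_med = np.median(x, axis=1)                  # sample median

for name, est in [("MVU", est_mvu), ("MEAN", est_mean), ("MEDIAN", est_med)]:
    print(f"{name}: mean = {est.mean():.4f}, variance = {est.var():.6f}")
# All three center on the true mean 1.5, but the MVU estimator's
# variance is orders of magnitude smaller than the mean's and median's.
```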


Finding the MVU estimator using a sufficient statistic - vector parameter

• Both theorems extend to the vector case in a natural way.

Theorem (Neyman-Fisher Factorization - vector parameter)
If we can factorize the PDF p(x; θ) as

p(x; θ) = g(T(x), θ) h(x)    (1)

where g is a function depending on x only through T(x) ∈ ℝ^{r×1}, and also on θ, and h is a function depending only on x, then T(x) is a sufficient statistic for θ. Conversely, if T(x) is a sufficient statistic for θ, then the PDF can be factorized as in (1).


Theorem (Rao-Blackwell-Lehmann-Scheffé - vector parameter)
If θ̌ is an unbiased estimator of θ and T(x) is an r × 1 sufficient statistic for θ, then θ̂ = E(θ̌ | T) is

1. a valid estimator for θ (not dependent on θ),
2. unbiased,
3. of lesser or equal variance than that of θ̌, for all θ (each element of θ̂ has lesser or equal variance).

Additionally, if the sufficient statistic is complete, then θ̂ is the MVU estimator.


Corollary
If the sufficient statistic T(x) is complete, then there's only one function g(·) that makes it unbiased. Applied to the statistic, g gives the MVUE:

θ̂ = g(T(x)).


Example: Estimation of Both DC Level and Noise Power

• Consider the DC level in WGN problem,

x[n] = A + w[n],

where w[n] is WGN, and the problem is to estimate A and the variance σ² of w[n] simultaneously.

• The parameter vector is θ = [A, σ²]ᵀ.


• The PDF is

p(x; θ) = 1/(2πσ²)^{N/2} · exp[ −(1/(2σ²)) ∑_{n=0}^{N−1} (x[n] − A)² ]
        = 1/(2πσ²)^{N/2} · exp[ −(1/(2σ²)) ( ∑_{n=0}^{N−1} x²[n] − 2A ∑_{n=0}^{N−1} x[n] + NA² ) ]

• By selecting T(x) = [T₁(x), T₂(x)]ᵀ with T₁(x) = ∑ x[n] and T₂(x) = ∑ x²[n], we arrive at the Neyman-Fisher factorization.


• The factorization is now

p(x; θ) = [ 1/(2πσ²)^{N/2} · exp( −(1/(2σ²)) (T₂(x) − 2A T₁(x) + NA²) ) ] · 1

where the bracketed factor is g(T(x); θ) and h(x) = 1.

• The sufficient statistic is thus

T(x) = [ ∑_{n=0}^{N−1} x[n],  ∑_{n=0}^{N−1} x²[n] ]ᵀ


• The next step is to find a transformation g(·) that makes the statistic T(x) unbiased.

• Let's find the expectations:

E(T(x)) = [ ∑_{n=0}^{N−1} E(x[n]),  ∑_{n=0}^{N−1} E(x²[n]) ]ᵀ = [ NA,  N(σ² + A²) ]ᵀ


• The question is: how to remove the bias from both components?

• How about simply dividing by N:

g(T(x)) = (1/N) [ ∑_{n=0}^{N−1} x[n],  ∑_{n=0}^{N−1} x²[n] ]ᵀ   ⟹   E(g(T(x))) = [ A,  σ² + A² ]ᵀ ≠ [ A,  σ² ]ᵀ

No! The second component won't become unbiased this easily.


• What about trying to cancel the extra A² term by using T₁(x):

g(T(x)) = [ (1/N) T₁(x),  (1/N) T₂(x) − ((1/N) T₁(x))² ]ᵀ = [ x̄,  (1/N) ∑_{n=0}^{N−1} x²[n] − x̄² ]ᵀ

• This way the expectation becomes quite close to the true value:

E(g(T(x))) = [ E(x̄),  E( (1/N) ∑_{n=0}^{N−1} x²[n] − x̄² ) ]ᵀ = [ A,  σ² + A² − E(x̄²) ]ᵀ


• Since x̄ is Gaussian, we know the expectation of its square: E(x̄²) = A² + σ²/N, which yields

E(g(T(x))) = [ A,  σ² + A² − A² − σ²/N ]ᵀ = [ A,  (N−1)σ²/N ]ᵀ

If we multiply the second component by N/(N−1), the result becomes unbiased. Thus, the MVUE is

θ̂ = [ x̄,  (N/(N−1)) ( (1/N) ∑_{n=0}^{N−1} x²[n] − x̄² ) ]ᵀ = ⋯ = [ x̄,  (1/(N−1)) ∑_{n=0}^{N−1} (x[n] − x̄)² ]ᵀ
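Unbiasedness of both components is easy to confirm by simulation. A minimal sketch (parameters and names illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
A_true, sigma2_true, N, trials = 1.0, 2.0, 10, 200_000
x = A_true + np.sqrt(sigma2_true) * rng.standard_normal((trials, N))

A_hat = x.mean(axis=1)              # first component: sample mean
s2_hat = x.var(axis=1, ddof=1)      # second: 1/(N-1) * sum (x[n] - xbar)^2

print(A_hat.mean(), s2_hat.mean())  # ~1.0 and ~2.0: both unbiased
print(x.var(axis=1, ddof=0).mean()) # ~(N-1)/N * 2.0 = 1.8: 1/N version is biased
```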


• The estimator (or rather its second component) is not efficient: it can be shown that the variance of the second component is larger than the CRLB,

C_θ̂ = diag( σ²/N,  2σ⁴/(N−1) )   while   I⁻¹(θ) = diag( σ²/N,  2σ⁴/N )