Chapter 8: Least squares (beginning of chapter)
Least Squares
• So far, we have been trying to determine an estimator which was unbiased and had minimum variance.
• Next we'll consider a class of estimators that in many cases have no optimality properties associated with them, but are still applicable in a number of problems due to their ease of use.
• The Least Squares approach originates from 1795, when Gauss invented the method (at the time he was only 18 years old)1
• However, the method was properly formalized into its current form and published in 1806 by Legendre.
1Compare the date with maximum likelihood from 1912 and the CRLB and RBLS from the mid-1900s.
Least Squares
• The method became widely known as Gauss was the only one able to describe the orbit of Ceres, a minor planet in the asteroid belt between Mars and Jupiter that was discovered in 1801.
• In the least squares (LS) approach we attempt to minimize the squared difference between the given data x[n] and the assumed signal model2.
• Note that there is no assumption about the noise PDF.
2In Gauss' case, the goal was to minimize the squared difference between the measurements of the location of Ceres and a family of functions describing the orbit.
Least Squares
[Figure: data points x(n) and a fitted line y(n); LS minimizes the total length of the vertical distances.]
Introductory Example
• Consider the model
x[n] = A + Bn + w[n],  n = 0, …, N−1
where A and B are unknown deterministic parameters and w[n] ∼ N(0, σ²).
Introductory Example
• The LS estimate θ̂_LS is the value of θ that minimizes the LS error criterion:
J(θ) = Σ_{n=0}^{N−1} (x[n] − s[n; θ])²,
where s[n; θ] is the deterministic part of our model parameterized by θ.
Introductory Example
• In the previous example, θ = [A, B]^T and the LS criterion is
J(θ) = Σ_{n=0}^{N−1} (x[n] − s[n; θ])² = Σ_{n=0}^{N−1} (x[n] − (A + Bn))²,
and the LS estimator is defined as
θ̂_LS = arg min_θ Σ_{n=0}^{N−1} (x[n] − (A + Bn))²
• Due to the quadratic form, this estimation problem is easily solved, as we will soon see.
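The minimization above can also be carried out numerically. Below is a minimal sketch in Python/NumPy (the lecture's own examples use Matlab); the values A = 2, B = 0.5, N = 50 and the noise level are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
A_true, B_true = 2.0, 0.5                 # assumed example values
n = np.arange(N)
x = A_true + B_true * n + rng.normal(0.0, 1.0, N)   # x[n] = A + Bn + w[n]

# The LS criterion J(theta) = sum_n (x[n] - (A + Bn))^2 is minimized by
# solving a linear problem in theta = [A, B]^T.
H = np.column_stack([np.ones(N), n])      # one column per unknown
theta_ls, *_ = np.linalg.lstsq(H, x, rcond=None)
A_hat, B_hat = theta_ls
```

Note that nothing about the noise PDF enters the code: only the squared error is minimized.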
Types of LS Problems: Linear and Nonlinear
• The LS problems can be divided into two categories: linear and nonlinear.
• The problems where the data depends on the parameters in a linear manner are easily solved, whereas nonlinear problems may be impossible to solve in closed form.
Example of a linear LS problem
• Consider estimating the DC level A of a signal:
x[n] = A + w[n],  n = 0, …, N−1
• According to the LS approach, we can estimate A by minimizing
J(A) = Σ_{n=0}^{N−1} (x[n] − A)²
Example of a linear LS problem
• Due to the quadratic form, differentiating is easy:
∂J(A)/∂A = Σ_{n=0}^{N−1} 2(x[n] − A) · (−1)
• Note that the quadratic form also guarantees that the zero of the derivative is the unique global minimum.
Example of a linear LS problem
• When set equal to zero, we get the minimum of J(A):
Σ_{n=0}^{N−1} 2(x[n] − A) · (−1) = 0
Σ_{n=0}^{N−1} x[n] − Σ_{n=0}^{N−1} A = 0
Σ_{n=0}^{N−1} x[n] = NA
Â = (1/N) Σ_{n=0}^{N−1} x[n]
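As a quick numerical sanity check (a sketch with assumed values A = 3 and unit noise variance), the closed-form solution can be compared against a brute-force minimization of J(A):

```python
import numpy as np

rng = np.random.default_rng(1)
x = 3.0 + rng.normal(0.0, 1.0, 100)       # DC level A = 3 in unit-variance noise

# Closed-form LS estimate: the sample mean
A_hat = x.sum() / x.size

# Brute-force check: evaluate J(A) on a grid and locate its minimum
grid = np.linspace(A_hat - 1.0, A_hat + 1.0, 2001)
J = ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)
A_grid = grid[np.argmin(J)]
```

The grid minimizer coincides with the sample mean, as the derivation predicts.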
Example of a nonlinear LS problem
• Consider the signal model
x[n] = cos 2πf0n + w[n]
where the frequency f0 is to be estimated. The LSE is defined by the error function
J(f0) = Σ_{n=0}^{N−1} (x[n] − cos 2πf0n)²
Example of a nonlinear LS problem
• Now, the differentiation gives
∂J(f0)/∂f0 = Σ_{n=0}^{N−1} 2(x[n] − cos 2πf0n)(2πn sin 2πf0n)
= 4π Σ_{n=0}^{N−1} n (x[n] sin 2πf0n − cos 2πf0n sin 2πf0n)
• We see that it is not possible to solve for the root in closed form.
• The problem could be solved using a grid search or iterative minimization.
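A grid search is straightforward to sketch in Python/NumPy; the true frequency 0.28 below matches the value used in the slides' experiment, while N and the noise level are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100
f0_true = 0.28                                  # example value from the slides
n = np.arange(N)
x = np.cos(2 * np.pi * f0_true * n) + rng.normal(0.0, 0.3, N)

# Evaluate J(f0) on a dense grid over (0, 0.5) and take the global minimizer;
# unlike a local optimizer, this cannot get stuck in a local minimum.
freqs = np.linspace(0.001, 0.499, 4981)
J = np.array([((x - np.cos(2 * np.pi * f * n)) ** 2).sum() for f in freqs])
f0_hat = freqs[int(np.argmin(J))]
```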
Example of a nonlinear LS problem
• Non-exhaustive heuristics tend to fail in this kind of problem due to numerous local optima.
• An example of using Matlab's nlinfit is shown below.
• In this example the true frequency was f0 = 0.28. The estimation fails miserably and results in the estimate f̂0 = 0.02.
• You can also see that the optimizer gets stuck in a local minimum after a few iterations.
• Note: later we will find an approximate estimator using the maximum likelihood principle. This could be used as a starting point for LS as well.
Linear Least Squares
[Figure: plot showing the true signal, the noisy samples, and the estimated signal.]
Linear Least Squares - Vector Parameter
• We assume that
s = Hθ
where H is the observation matrix, a known N × p matrix (N > p) of full rank p. (No noise PDF assumption!)
• The LSE is found by minimizing
J(θ) = (x − Hθ)^T (x − Hθ)
= x^T x − x^T Hθ − θ^T H^T x + θ^T H^T Hθ
= x^T x − 2x^T Hθ + θ^T H^T Hθ
Linear Least Squares - Vector Parameter
• By setting the gradient ∂J(θ)/∂θ = −2H^T x + 2H^T Hθ to zero we obtain the normal equations H^T Hθ = H^T x and the estimator
θ̂ = (H^T H)^{−1} H^T x.
• The minimum LS error
Jmin = (x − Hθ̂)^T (x − Hθ̂)
= (x − H(H^T H)^{−1}H^T x)^T (x − H(H^T H)^{−1}H^T x)
= x^T (I − H(H^T H)^{−1}H^T)(I − H(H^T H)^{−1}H^T) x
= x^T (I − H(H^T H)^{−1}H^T) x
= x^T (x − Hθ̂)
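The derivation above translates directly into code. A minimal sketch with a random observation matrix and made-up parameter values:

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 40, 3
H = rng.normal(size=(N, p))                    # known full-rank observation matrix
theta_true = np.array([1.0, -2.0, 0.5])        # assumed example values
x = H @ theta_true + rng.normal(0.0, 0.1, N)

# Normal equations: H^T H theta = H^T x
theta_hat = np.linalg.solve(H.T @ H, H.T @ x)

# Minimum LS error, in two equivalent forms from the derivation
r = x - H @ theta_hat
J_min = r @ r                                  # (x - H theta)^T (x - H theta)
J_min_alt = x @ (x - H @ theta_hat)            # x^T (x - H theta)
```

Solving the normal equations with `solve` is preferable to forming (H^T H)^{−1} explicitly.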
Idempotence of (I − H(H^T H)^{−1}H^T)
• In the previous slide we used the property that the matrix (I − H(H^T H)^{−1}H^T) is idempotent3.
• Note that the matrix is easy to show to be symmetric.
• The idempotence can be shown as follows:
(I − H(H^T H)^{−1}H^T)(I − H(H^T H)^{−1}H^T)
= I − H(H^T H)^{−1}H^T − H(H^T H)^{−1}H^T + H(H^T H)^{−1} (H^T H)(H^T H)^{−1} H^T
= I − H(H^T H)^{−1}H^T − H(H^T H)^{−1}H^T + H(H^T H)^{−1}H^T
= I − H(H^T H)^{−1}H^T
3A square matrix A is idempotent (or a projection matrix) if AA = A.
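Both properties are easy to verify numerically; a small sketch with an arbitrary full-rank H:

```python
import numpy as np

rng = np.random.default_rng(4)
N, p = 10, 3
H = rng.normal(size=(N, p))                       # arbitrary full-rank N x p matrix
P_perp = np.eye(N) - H @ np.linalg.inv(H.T @ H) @ H.T

# P_perp is symmetric and idempotent, as shown in the derivation above:
# it projects onto the orthogonal complement of the column space of H.
```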
Weighted Linear Least Squares
• The least squares approach can be extended by introducing weights for the errors of the individual samples.
• This is done by an N × N symmetric weighting matrix W:
J(θ) = (x − Hθ)^T W (x − Hθ)
where W is an N × N positive definite matrix.
• A similar derivation as in the unweighted LS case shows that the weighted LS estimator is given by
θ̂ = (H^T WH)^{−1} H^T Wx
Weighted Linear Least Squares
• Note that setting W = I gives the unweighted case.
• The minimum LS error is given by
Jmin = x^T (W − WH(H^T WH)^{−1}H^T W) x
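A sketch of the weighted estimator with made-up per-sample noise levels, also checking that W = I recovers the unweighted estimator and that the two forms of Jmin agree:

```python
import numpy as np

rng = np.random.default_rng(5)
N, p = 30, 2
H = rng.normal(size=(N, p))
theta_true = np.array([1.0, 2.0])                 # assumed example values
sigma = rng.uniform(0.1, 2.0, N)                  # per-sample noise std (assumed)
x = H @ theta_true + sigma * rng.normal(size=N)

W = np.diag(1.0 / sigma**2)                       # inverse-variance weights
theta_w = np.linalg.solve(H.T @ W @ H, H.T @ W @ x)

# W = I reduces to the unweighted estimator
theta_I = np.linalg.solve(H.T @ np.eye(N) @ H, H.T @ np.eye(N) @ x)
theta_ls = np.linalg.solve(H.T @ H, H.T @ x)

# Minimum weighted LS error: residual form vs. the closed form above
r = x - H @ theta_w
J_min = r @ W @ r
J_min_alt = x @ (W - W @ H @ np.linalg.solve(H.T @ W @ H, H.T @ W)) @ x
```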
Weighted LS: example
• Consider the DC level estimation problem
x[n] = A + w[n],  n = 0, 1, …, N−1,
where the noise samples have different variances: w[n] ∼ N(0, σ_n²).
Weighted LS: example
• In this case it would be reasonable to choose the weights such that the samples with smaller variance get higher weight, for example,
W = diag(1/σ_0², 1/σ_1², …, 1/σ_{N−1}²)
Weighted LS: example
• The weighted LS estimator now becomes
Â = (H^T WH)^{−1}H^T Wx = · · · = ( Σ_{n=0}^{N−1} x[n]/σ_n² ) / ( Σ_{n=0}^{N−1} 1/σ_n² )
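A numerical check of this inverse-variance weighted average (with an assumed A = 1.5 and random per-sample variances) against the general matrix expression:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 200
A_true = 1.5                                      # assumed DC level
sigma = rng.uniform(0.2, 3.0, N)                  # differing noise std per sample
x = A_true + sigma * rng.normal(size=N)

# Closed-form weighted estimate: inverse-variance weighted average
A_w = np.sum(x / sigma**2) / np.sum(1.0 / sigma**2)

# Same thing via the matrix form with H = ones, W = diag(1/sigma_n^2)
H = np.ones((N, 1))
W = np.diag(1.0 / sigma**2)
A_mat = float(np.linalg.solve(H.T @ W @ H, H.T @ W @ x)[0])
```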
Chapter 5: General MVU estimation
General MVU Estimation Using a Sufficient Statistic
• Previously we saw that it's possible to find the MVU estimator if it is efficient (achieves the CRLB).
• As a special case, the linear model makes this practical.
• In this chapter we'll consider the case where the MVU estimator does not achieve the bound. It is still possible to find the MVU estimator.
• The key to this is the so-called sufficient statistic, which is a function of the data that contains all available information about it.4
4The Finnish translation is quite descriptive: tyhjentävä statistiikka, which says that the statistic drains (exhausts, condenses) all the information from the data.
Sufficient statistic
• Let us consider the example of DC level in WGN, where the MVU estimator is
Â = (1/N) Σ_{n=0}^{N−1} x[n].
• In addition to the MVUE, consider the first-sample estimator: Ǎ = x[0]
• One way of viewing their difference is that the latter one disregards the majority of the samples.
Sufficient statistic
• The question arises: which set (or function) of the data would be sufficient to contain all the necessary information?
• Possible candidates for a sufficient set of data (sufficient to compute the MVU estimate):
1: S1 = {x[0], x[1], x[2], …, x[N−1]} (all N samples as such)
2: S2 = {x[0] + x[1], x[2], …, x[N−1]} (N−2 samples + a sum of two)
3: S3 = { Σ_{n=0}^{N−1} x[n] } (sum of all the samples)
Sufficient statistic
• All of the sets S1, S2 and S3 are sufficient, because the MVUE can be derived from any of them.
• The set S3 is special in the sense that it compresses the information into a single value.
• In general, among all sufficient sets, the one that contains the least number of elements is called the minimal sufficient statistic. There may be several minimal sufficient statistics: for example, in our example it's easy to come up with other one-element sufficient sets.
• Next we’ll define sufficiency in mathematical terms.
Definition of Sufficiency
Definition
Consider a set of data x = {x[0], x[1], …, x[N−1]} sampled from a distribution depending on the parameter θ. When estimating θ, a function of the data T(x) is called a sufficient statistic if the conditional PDF (after observing the value of T(x))
p(x | T(x) = T0; θ)
does not depend on the parameter θ.
• A direct use of the above definition is typically difficult. Instead, it is usually more convenient to use the Neyman-Fisher factorization theorem that we'll describe soon.
Definition of Sufficiency
• However, before that, let's see an example of the difficult way.
Example: Proof of Sufficiency using the Definition
• Consider the familiar DC level in WGN example. Let's show that the statistic T(x) = Σ x[n] is sufficient.
• The PDF of the data is
p(x; A) = (1/(2πσ²)^{N/2}) exp[ −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)² ]
• We need to show that the conditional PDF p(x | T(x) = T0; A) does not depend on A.
Example: Proof of Sufficiency using the Definition
• The PDF can be evaluated by the following rule from probability theory
p(E1 | E2) = p(E1, E2) / p(E2)
• In our case this becomes
p(x | T(x) = T0; A) = p(x, T(x) = T0; A) / p(T(x) = T0)
Example: Proof of Sufficiency using the Definition
• The denominator p(T(x) = T0) is easy, because a sum of Gaussian random variables (N(A, σ²)) is Gaussian with distribution N(NA, Nσ²), or
p(T(x) = T0) = p( Σ_{n=0}^{N−1} x[n] = T0 ) = (1/√(2πNσ²)) exp[ −(1/(2Nσ²)) (T0 − NA)² ]
• The numerator p(x, T(x) = T0; A) is the joint PDF of the data and the statistic.
Example: Proof of Sufficiency using the Definition
• However, T(x) depends on the data x such that their joint probability takes nonzero values only when T(x) = T0 holds.5
• This condition can be written more compactly as:
p(x, T(x) = T0; A) = p(x; A) δ(T(x) − T0),
where δ(·) is the Dirac delta function6.
5This is analogous to the case of two random variables x and y with the relationship y ≡ 2x. In this case their joint PDF is p(x, y) = p(x)δ(y − 2x). The delta term ensures that the joint PDF is zero outside the line y = 2x.
6The defining characteristic of δ(·) is ∫_{−∞}^{∞} p(x)δ(x) dx = p(0), i.e., it can be used to pick a single value of a continuous function.
Example: Proof of Sufficiency using the Definition
• Thus, we have
p(x | T(x) = T0; A) = p(x, T(x) = T0; A) / p(T(x) = T0)
= p(x; A) δ(T(x) − T0) / [ (1/√(2πNσ²)) exp( −(1/(2Nσ²)) (T0 − NA)² ) ]
= { (1/(2πσ²)^{N/2}) exp[ −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)² ] } / { (1/√(2πNσ²)) exp[ −(1/(2Nσ²)) (T0 − NA)² ] } · δ(T(x) − T0)
Example: Proof of Sufficiency using the Definition
• Algebraic manipulation shows that A vanishes from the formula:
p(x | T(x) = T0; A) = ( √N / (2πσ²)^{(N−1)/2} ) exp[ −(1/(2σ²)) Σ_{n=0}^{N−1} x²[n] ] exp[ T0² / (2Nσ²) ] δ(T(x) − T0),
or, without the delta function:
p(x | T(x) = T0; A) = ( √N / (2πσ²)^{(N−1)/2} ) exp[ −(1/(2σ²)) Σ_{n=0}^{N−1} x²[n] ] exp[ T0² / (2Nσ²) ], if T(x) = T0, and 0 otherwise.
• Therefore T(x) is sufficient.
Example: Proof of Sufficiency using the Definition
• If nothing else, we learned that using the definition directly is difficult. There's also a more straightforward way, as we'll see next.
Theorem for Finding a Sufficient Statistic
Theorem (Neyman-Fisher Factorization Theorem)
The function T(x) is a sufficient statistic for θ if and only if we can factorize the PDF p(x; θ) as
p(x; θ) = g(T(x), θ) · h(x)
where g is a function depending on x only through T(x) and h is a function depending only on x.
Example: DC Level in WGN
• Let us find the factorization in the DC level in WGN case. As discussed previously, this should give T(x) = Σ x[n].
• Start with the PDF:
p(x; A) = (1/(2πσ²)^{N/2}) exp[ −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)² ]
Example: DC Level in WGN
• The exponent can be broken into parts:
Σ_{n=0}^{N−1} (x[n] − A)² = Σ_{n=0}^{N−1} (x²[n] − 2Ax[n] + A²) = Σ_{n=0}^{N−1} x²[n] − 2A Σ_{n=0}^{N−1} x[n] + NA²
Example: DC Level in WGN
• Let's separate the PDF into two functions: one with A and T(x) only and one with x only:
p(x; A) = (1/(2πσ²)^{N/2}) exp[ −(1/(2σ²)) (NA² − 2A Σ_{n=0}^{N−1} x[n]) ] · exp[ −(1/(2σ²)) Σ_{n=0}^{N−1} x²[n] ]
where the first part is g(T(x), A) and the last factor is h(x).
• Thus,
T(x) = Σ_{n=0}^{N−1} x[n]
is a sufficient statistic for A.
Example: DC Level in WGN
• Note that the factorization suggests
T(x) = 2 Σ_{n=0}^{N−1} x[n],
which is also a sufficient statistic for A.
• In fact, any one-to-one mapping of T(x) is a sufficient statistic as well, because we can always return to T(x).
Example: Power of WGN
• Consider estimating the variance σ² of a signal with the following model:
x[n] = 0 + w[n],
where w[n] ∼ N(0, σ²). Find a sufficient statistic.
• Solution: Start with the PDF:
p(x; σ²) = (1/(2πσ²)^{N/2}) exp[ −(1/(2σ²)) Σ_{n=0}^{N−1} x²[n] ].
Example: Power of WGN
• This can be factored in a silly way into the required form:
p(x; σ²) = (1/(2πσ²)^{N/2}) exp[ −(1/(2σ²)) Σ_{n=0}^{N−1} x²[n] ] · 1,
with the whole first part as g(T(x), σ²) and h(x) = 1.
Thus, T(x) = Σ_{n=0}^{N−1} x²[n] is a sufficient statistic for σ².
• Other examples of sufficient statistics follow later as we're really using the concept to find the MVUE.
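A quick numerical illustration of the estimator this statistic induces (assumed σ² = 2): since the mean is known to be zero, E(x²[n]) = σ², so dividing T(x) by N already gives an unbiased estimate.

```python
import numpy as np

rng = np.random.default_rng(9)
N = 5000
var_true = 2.0                                    # assumed example noise power
x = rng.normal(0.0, np.sqrt(var_true), N)

# Sufficient statistic and the estimator built from it
T = (x**2).sum()
var_hat = T / N            # unbiased here because the mean is known to be zero
```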
Finding the MVUE using a sufficient statistic
• Let T(x) be a complete sufficient statistic for a parameter θ. There are two ways of finding an MVU estimator using T(x):
1 Find any unbiased estimator θ̌ and determine θ̂ = E(θ̌ | T(x)).
2 Find some function g so that θ̂ = g(T(x)) is an unbiased estimator of θ.
• As one could predict, the latter one is easier to use. That will be the one that we'll generally employ.
• However, let us find the MVU estimator in our familiar example case (DC level in WGN) using both methods.
Method 1: The difficult way: θ̂ = E(θ̌ | T(x))
• First we need a sufficient statistic T(x) and an unbiased estimator Ǎ. These can be, for example,
T(x) = Σ_{n=0}^{N−1} x[n] and Ǎ = x[0].
• The MVU estimator Â can then be expressed in terms of the two as Â = E(Ǎ | T(x)).
• How to calculate?
Method 1: The difficult way: θ̂ = E(θ̌ | T(x))
• In general, the conditional expectation E(x | y) can be calculated by:
E(x | y) = ∫_{−∞}^{∞} x · p(x | y) dx = ∫_{−∞}^{∞} x · ( p(x, y)/p(y) ) dx
• In particular, if x and y are Gaussian, this can be shown to be7
E(x | y) = E(x) + ( cov(x, y)/var(y) ) (y − E(y))
7See proof in Appendix 10 A of S. Kay’s book or:http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Conditional_distributions
Method 1: The difficult way: θ̂ = E(θ̌ | T(x))
• Therefore, in our case
E(Ǎ | T(x)) = ∫_{−∞}^{∞} Ǎ p(Ǎ | T(x)) dǍ
= E(Ǎ) + ( cov(Ǎ, T(x))/var(T(x)) ) (T(x) − E(T(x)))
= A + ( σ²/(Nσ²) ) ( Σ_{n=0}^{N−1} x[n] − NA )
= (1/N) Σ_{n=0}^{N−1} x[n],
which we know is the MVUE.
Method 2: The easy way: Find a transformation g such that g(T(x)) is unbiased
• The easy way: we need to find a function g(·) such that g(T(x)) is unbiased, i.e.,
E( g( Σ_{n=0}^{N−1} x[n] ) ) = A
Method 2: The easy way: Find a transformation g such that g(T(x)) is unbiased
• Because E(Σ x[n]) = NA, the function has to satisfy
E( g( Σ_{n=0}^{N−1} x[n] ) ) / E( Σ_{n=0}^{N−1} x[n] ) = A/(NA) = 1/N
• Thus, a simple choice of g is g(x) = x/N, and
Â = (1/N) Σ_{n=0}^{N−1} x[n]
The RBLS Theorem
• We are now ready to state the theorem named after statisticians Rao and Blackwell. It was extended by Lehmann and Scheffé (the "Additionally..." part); hence the lengthy name.
• The theorem uses the concept of completeness of a statistic, which needs to be defined.
• Verification (and even understanding) of completeness can be difficult, so generally we're just happy to believe that all statistics we'll see are complete.
The RBLS Theorem
• Definitions of completeness include:
• A statistic is complete if there is only one function of the statistic that is unbiased.
• A statistic T(y) is complete if the only real-valued function g defined on the range of T(y) that satisfies
E_θ(g(T)) = 0, ∀θ
is the function g(T) = 0.
• You may also check8 for the definition and examples of completeness.
8http://en.wikipedia.org/wiki/Completeness_(statistics)
The RBLS Theorem
• For example, in our DC level case, the statistic T(x) = Σ x[n] is complete, because the only function for which
E[g(T(x))] = 0 for all A ∈ R
is the zero function g(·) ≡ 0.
The RBLS Theorem
Theorem (Rao-Blackwell-Lehmann-Scheffé theorem)
If θ̌ is an unbiased estimator of θ and T(x) is a sufficient statistic for θ, then θ̂ = E(θ̌ | T) is
1 a valid estimator for θ (not dependent on θ),
2 unbiased,
3 of variance at most that of θ̌, for all θ.
Additionally, if the sufficient statistic is complete, then θ̂ is the MVU estimator.
The RBLS Theorem
Corollary
If the sufficient statistic T(x) is complete, then there's only one function g(·) that makes it unbiased. Applied to the statistic, g gives the MVUE:
θ̂ = g(T(x)).
Proof. See Appendix 5B and the discussion on pp. 109-110 of Kay's book.
Example: Mean of Uniform Noise
• Suppose we have observed the data
x[n] = w[n],
where w[n] is i.i.d. noise with PDF U(0, β).9
• What is the MVU estimator of the mean θ = β/2 from the data x? Is it just the sample mean, as usual?
• Earlier we saw that the CRLB regularity conditions do not hold, so that theorem didn't help us.
• Let's first find the sufficient statistic using the factorization theorem.
9The German Tank Problem is an instance of this model:http://en.wikipedia.org/wiki/German_tank_problem
Example: Mean of Uniform Noise
• The PDF of the samples can be written as
p(x[n]; θ) = (1/β)[u(x[n]) − u(x[n] − β)],
where u(x) is the unit step function
u(x) = 1 for x > 0 and u(x) = 0 for x < 0
Example: Mean of Uniform Noise
• The joint PDF of all the samples is
p(x; θ) = (1/β^N) Π_{n=0}^{N−1} [u(x[n]) − u(x[n] − β)]
• In other words, each value has an equal probability as long as it is in the interval [0, β]; otherwise zero. This can be written as a formula:
p(x; θ) = 1/β^N, if 0 < x[n] < β for all n = 0, 1, …, N−1, and 0 otherwise
• This form we cannot factorize as required by the theorem: there are N conditions that cannot be written as a formula.
Example: Mean of Uniform Noise
• However, we can cleverly express the N conditions using only two: the conditions
0 < x[n] < β for all n = 0, 1, …, N−1
are equivalent to
0 < min(x[n]) and max(x[n]) < β.
Example: Mean of Uniform Noise
• Thus,
p(x; θ) = 1/β^N, if 0 < min(x[n]) and max(x[n]) < β, and 0 otherwise
or in a single line using the unit step function u(·):
p(x; θ) = (1/β^N) u(β − max(x[n])) · u(min(x[n])),
where (1/β^N) u(β − max(x[n])) = g(T(x), θ) and u(min(x[n])) = h(x).
Example: Mean of Uniform Noise
• Hence we have a factorization required by the Neyman-Fisher factorization theorem, and we know that the sufficient statistic is T(x) = max(x[n]).
• Surprise: all the data needed by the MVU estimator is contained in the largest sample alone.
Example: Mean of Uniform Noise
• If we had unlimited time, we could now prove that T(x) is complete. Well, it is.
• In order to find the function g(·) that makes T(x) unbiased we could check what E(T(x)) is. We omit this proof as well, and just note that
E(T(x)) = ( 2N/(N+1) ) θ
• Therefore, the MVU estimator results from the application of such g that the result is unbiased. Let's choose
g(x) = ( (N+1)/(2N) ) x
Example: Mean of Uniform Noise
• Result: the MVU estimator of the mean of the uniform distribution is
θ̂ = g(T(x)) = ( (N+1)/(2N) ) max(x[n]).
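The estimator can be compared against the sample mean and median in a quick Monte Carlo sketch; β = 3 matches the slides' test, while the number of realizations here is made up:

```python
import numpy as np

rng = np.random.default_rng(7)
beta, N, trials = 3.0, 100, 2000          # beta = 3 as in the slides' test
theta_true = beta / 2                     # mean of U(0, beta)

x = rng.uniform(0.0, beta, size=(trials, N))
mvu = (N + 1) / (2 * N) * x.max(axis=1)   # MVUE: (N+1)/(2N) * max(x[n])
mean_est = x.mean(axis=1)
median_est = np.median(x, axis=1)
```

All three are (close to) unbiased across realizations, but the variance ordering MVU < mean < median matches the slides' experiment.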
Example: Mean of Uniform Noise
• The optimality can also be seen in practice. The plot below shows the distribution of the MVU estimator (top) in a test of 100 realizations of uniform noise with β = 3. The other unbiased estimators that one might think of are the sample mean (center) and the sample median (bottom), both of which have significantly worse performance than the MVUE.
Example: Mean of Uniform Noise
[Figure: histograms of the estimates. MVU: sample variance = 0.00022122, sample mean = 1.4997. MEAN: sample variance = 0.0090779, sample mean = 1.5099. MEDIAN: sample variance = 0.024855, sample mean = 1.5179.]
Finding the MVU Estimator Using a Sufficient Statistic - Vector Parameter
• Both theorems extend to the vector case in a natural way.
Theorem (Neyman-Fisher Factorization - vector parameter)
If we can factorize the PDF p(x; θ) as
p(x; θ) = g(T(x), θ) h(x)  (1)
where g is a function depending on x only through T(x) ∈ R^{r×1}, and also on θ, and h is a function depending only on x, then T(x) is a sufficient statistic for θ. Conversely, if T(x) is a sufficient statistic for θ, then the PDF can be factorized as in (1).
Finding the MVU Estimator Using a Sufficient Statistic - Vector Parameter
Theorem (Rao-Blackwell-Lehmann-Scheffé - vector parameter)
If θ̌ is an unbiased estimator of θ and T(x) is an r × 1 sufficient statistic for θ, then θ̂ = E(θ̌ | T) is
1 a valid estimator for θ (not dependent on θ),
2 unbiased,
3 of lesser or equal variance than that of θ̌, for all θ (each element of θ̂ has lesser or equal variance).
Additionally, if the sufficient statistic is complete, then θ̂ is the MVU estimator.
Finding the MVU Estimator Using a Sufficient Statistic - Vector Parameter
Corollary
If the sufficient statistic T(x) is complete, then there's only one function g(·) that makes it unbiased. Applied to the statistic, g gives the MVUE:
θ̂ = g(T(x)).
Example: Estimation of Both DC Level and Noise Power
• Consider the DC level in WGN problem,
x[n] = A + w[n],
where w[n] is WGN, and the problem is to estimate A and the variance σ² of w[n] simultaneously.
• The parameter vector is θ = [A, σ²]^T.
Example: Estimation of Both DC Level and Noise Power
• The PDF is
p(x; θ) = (1/(2πσ²)^{N/2}) exp[ −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)² ]
= (1/(2πσ²)^{N/2}) exp[ −(1/(2σ²)) ( Σ_{n=0}^{N−1} x²[n] − 2A Σ_{n=0}^{N−1} x[n] + NA² ) ]
• By selecting T(x) = [T1(x), T2(x)]^T with T1(x) = Σ x[n] and T2(x) = Σ x²[n], we arrive at the Neyman-Fisher factorization.
Example: Estimation of Both DC Level and Noise Power
• The factorization is now
p(x; θ) = (1/(2πσ²)^{N/2}) exp[ −(1/(2σ²)) (T2(x) − 2A T1(x) + NA²) ] · 1,
with the first part as g(T(x); θ) and h(x) = 1.
• The sufficient statistic is thus
T(x) = [ Σ_{n=0}^{N−1} x[n], Σ_{n=0}^{N−1} x²[n] ]^T
Example: Estimation of Both DC Level and Noise Power
• The next step is to find a transformation g(·) that makes the statistic T(x) unbiased.
• Let's find the expectations:
E(T(x)) = [ Σ_{n=0}^{N−1} E(x[n]), Σ_{n=0}^{N−1} E(x²[n]) ]^T = [ NA, N(σ² + A²) ]^T
Example: Estimation of Both DC Level and Noise Power
• The question is: how to remove the bias from both components?
• How about simply dividing by N:
g(T(x)) = (1/N) [ Σ_{n=0}^{N−1} x[n], Σ_{n=0}^{N−1} x²[n] ]^T ⟹ E(g(T(x))) = [ A, σ² + A² ]^T ≠ [ A, σ² ]^T
No! The second component won't become unbiased this easily.
Example: Estimation of Both DC Level and Noise Power
• What about trying to cancel the extra A² term by using T1(x):
g(T(x)) = [ (1/N) T1(x), (1/N) T2(x) − ((1/N) T1(x))² ]^T = [ x̄, (1/N) Σ_{n=0}^{N−1} x²[n] − x̄² ]^T
• This way the expectation becomes quite close to the true value:
E(g(T(x))) = [ E(x̄), E( (1/N) Σ_{n=0}^{N−1} x²[n] − x̄² ) ]^T = [ A, σ² + A² − E(x̄²) ]^T
Example: Estimation of Both DC Level and Noise Power
• Since x̄ is Gaussian, we know the expectation of its square: E(x̄²) = A² + σ²/N, which yields
E(g(T(x))) = [ A, σ² + A² − A² − σ²/N ]^T = [ A, (N−1)σ²/N ]^T
If we multiply the second component by N/(N−1) the result will become unbiased. Thus, the MVUE is
θ̂ = [ x̄, (N/(N−1)) ( (1/N) Σ_{n=0}^{N−1} x²[n] − x̄² ) ]^T = · · · = [ x̄, (1/(N−1)) Σ_{n=0}^{N−1} (x[n] − x̄)² ]^T
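A sketch verifying that this estimator is simply the sample mean paired with the unbiased sample variance (example values A = 2, σ² = 1.5 are assumed):

```python
import numpy as np

rng = np.random.default_rng(8)
N = 10_000
A_true, var_true = 2.0, 1.5                       # assumed example values
x = A_true + rng.normal(0.0, np.sqrt(var_true), N)

# MVU estimator built from the sufficient statistic (T1, T2)
T1, T2 = x.sum(), (x**2).sum()
A_hat = T1 / N
var_hat = N / (N - 1) * (T2 / N - A_hat**2)

# Identical to the familiar unbiased sample variance
var_alt = ((x - A_hat) ** 2).sum() / (N - 1)
```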
Example: Estimation of Both DC Level and Noise Power
• The estimator (or its second component) is not efficient. It can be shown that the variance of the second component is larger than the CRLB:
C_θ̂ = diag( σ²/N, 2σ⁴/(N−1) ) while I^{−1}(θ) = diag( σ²/N, 2σ⁴/N )