Lecture 8: The Principle of Maximum Likelihood
Slide 1
Lecture 8
The Principle of Maximum Likelihood
Slide 2
Syllabus
Lecture 01 Describing Inverse Problems
Lecture 02 Probability and Measurement Error, Part 1
Lecture 03 Probability and Measurement Error, Part 2
Lecture 04 The L2 Norm and Simple Least Squares
Lecture 05 A Priori Information and Weighted Least Squares
Lecture 06 Resolution and Generalized Inverses
Lecture 07 Backus-Gilbert Inverse and the Trade-off of Resolution and Variance
Lecture 08 The Principle of Maximum Likelihood
Lecture 09 Inexact Theories
Lecture 10 Nonuniqueness and Localized Averages
Lecture 11 Vector Spaces and Singular Value Decomposition
Lecture 12 Equality and Inequality Constraints
Lecture 13 L1, L∞ Norm Problems and Linear Programming
Lecture 14 Nonlinear Problems: Grid and Monte Carlo Searches
Lecture 15 Nonlinear Problems: Newton's Method
Lecture 16 Nonlinear Problems: Simulated Annealing and Bootstrap Confidence Intervals
Lecture 17 Factor Analysis
Lecture 18 Varimax Factors, Empirical Orthogonal Functions
Lecture 19 Backus-Gilbert Theory for Continuous Problems; Radon's Problem
Lecture 20 Linear Operators and Their Adjoints
Lecture 21 Fréchet Derivatives
Lecture 22 Exemplary Inverse Problems, incl. Filter Design
Lecture 23 Exemplary Inverse Problems, incl. Earthquake Location
Lecture 24 Exemplary Inverse Problems, incl. Vibrational Problems
Slide 3
Purpose of the Lecture
Introduce the spaces of all possible data and all possible models, and the idea of likelihood.
Use maximization of likelihood as a guiding principle for solving inverse problems.
Slide 4
Part 1
The spaces of all possible data, all possible models, and the idea of likelihood
Slide 5
Viewpoint: the observed data are one point in the space of all possible observations, or, d^obs is a point in S(d).
Slide 6
[Figure: the space S(d) drawn as three axes d1, d2, d3 with origin O; plot of d^obs]
Slide 7
[Figure: the same axes d1, d2, d3, now with the point d^obs marked; plot of d^obs]
Slide 8
Now suppose the data are independent, each drawn from a Gaussian distribution with the same mean m1 and variance σ² (but with m1 and σ unknown).
Slide 9
[Figure: axes d1, d2, d3; plot of p(d)]
Slide 10
[Figure: plot of p(d), a cloud centered on the line d1 = d2 = d3 with radius proportional to σ]
Slide 11
Now interpret p(d^obs) as the probability that the observed data were in fact observed. The quantity L = log p(d^obs) is called the likelihood.
Slide 12
To find the parameters of the distribution, maximize p(d^obs) with respect to m1 and σ; that is, maximize the probability that the observed data were in fact observed. This is the Principle of Maximum Likelihood.
Slide 13
Example
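The equations on this slide did not survive transcription. The following is a reconstruction of the standard joint p.d.f. and log-likelihood for N independent Gaussian data with common mean m1 and variance σ², consistent with the slides that follow:

```latex
% Joint p.d.f. of N independent Gaussian data with common mean m_1 and variance sigma^2
p(\mathbf{d}) = (2\pi)^{-N/2}\,\sigma^{-N}
  \exp\!\left[-\frac{1}{2\sigma^{2}}\sum_{i=1}^{N}\left(d_i - m_1\right)^{2}\right]

% Log-likelihood of the observed data
L = \log p(\mathbf{d}^{\mathrm{obs}})
  = -\frac{N}{2}\log(2\pi) - N\log\sigma
    - \frac{1}{2\sigma^{2}}\sum_{i=1}^{N}\left(d_i^{\mathrm{obs}} - m_1\right)^{2}
```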
Slide 14
Solving the two equations ∂L/∂m1 = 0 and ∂L/∂σ = 0:
Slide 15
Solving the two equations gives
m1^est = (1/N) Σi d_i^obs, the usual formula for the sample mean, and
(σ²)^est = (1/N) Σi (d_i^obs − m1^est)², almost the usual formula for the sample standard deviation (it divides by N rather than by N − 1).
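A minimal numerical check of these two estimators (the data values are hypothetical; note that NumPy's std with ddof=0 uses the same 1/N normalization as the maximum-likelihood formula):

```python
import numpy as np

# Synthetic data: N independent draws from a Gaussian with the
# (pretend-unknown) mean m1 = 4.0 and standard deviation sigma = 0.5.
rng = np.random.default_rng(seed=0)
d_obs = rng.normal(loc=4.0, scale=0.5, size=100)

N = len(d_obs)

# Maximum-likelihood estimates from the two likelihood equations:
m1_est = d_obs.sum() / N                          # sample mean
sigma2_est = ((d_obs - m1_est) ** 2).sum() / N    # divides by N, not N-1

print(m1_est, np.sqrt(sigma2_est))
# np.std with ddof=0 uses the same 1/N normalization:
assert np.isclose(np.sqrt(sigma2_est), np.std(d_obs, ddof=0))
```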
Slide 16
These two estimates are tied to the assumption that the data are Gaussian-distributed; a different p.d.f. might give different formulas.
Slide 17
[Figure: example of a likelihood surface L(m1, σ), with the maximum-likelihood point marked]
Slide 18
[Figure: two panels (A) and (B) showing p(d1, ..., dN) in the (d1, d2) plane]
The likelihood maximization process will fail if the p.d.f. has no well-defined peak.
Slide 19
Part 2
Using the maximization of likelihood as a guiding principle for solving inverse problems
Slide 20
The linear inverse problem with Gaussian-distributed data of known covariance [cov d]: assume the theory Gm = d gives the mean of the data.
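The p.d.f. on this slide was lost in transcription. Reconstructed from the weighted least-squares expression on the next slides, it has the standard multivariate Gaussian form:

```latex
p(\mathbf{d}) \propto
  \exp\!\left[-\tfrac{1}{2}\,
    (\mathbf{d} - \mathbf{Gm})^{T}\,[\operatorname{cov}\mathbf{d}]^{-1}\,
    (\mathbf{d} - \mathbf{Gm})\right]
```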
Slide 21
Principle of maximum likelihood: maximize L = log p(d^obs), which means minimizing E = (d^obs − Gm)^T [cov d]^{-1} (d^obs − Gm) with respect to m.
Slide 22
Principle of maximum likelihood: maximize L = log p(d^obs), i.e. minimize E = e^T [cov d]^{-1} e with e = d^obs − Gm. This is just weighted least squares.
Slide 23
Principle of maximum likelihood: when the data are Gaussian-distributed, solve Gm = d with weighted least squares, with weighting [cov d]^{-1} (see the sketch below).
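A minimal sketch of the weighted least-squares solve, assuming a known data covariance matrix C; the values of G, d_obs, and C here are hypothetical:

```python
import numpy as np

# Hypothetical linear problem d = G m, with N = 4 data and M = 2 model parameters.
G = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
d_obs = np.array([0.9, 2.1, 2.9, 4.2])

# Known data covariance [cov d]; diagonal here (uncorrelated data),
# but the same code handles a full covariance matrix.
C = np.diag([0.1, 0.1, 0.4, 0.4]) ** 2

# Weighted least squares: minimize (d - Gm)^T C^{-1} (d - Gm),
# i.e. solve the normal equations G^T C^{-1} G m = G^T C^{-1} d.
Ci_G = np.linalg.solve(C, G)        # C^{-1} G
Ci_d = np.linalg.solve(C, d_obs)    # C^{-1} d
m_est = np.linalg.solve(G.T @ Ci_G, G.T @ Ci_d)
print(m_est)
```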
Slide 24
Special case of uncorrelated data, each datum with a different variance, [cov d]_ii = σ_di²: minimize E = Σi σ_di^{-2} (d_i^obs − (Gm)_i)².
Slide 25
Special case of uncorrelated data, each datum with a different variance, [cov d]_ii = σ_di²: minimize E = Σi σ_di^{-2} (d_i^obs − (Gm)_i)², with each error weighted by its certainty (the reciprocal of its variance).
Slide 26
But what about a priori information?
Slide 27
Probabilistic representation of a priori information: the probability that the model parameters are near m is given by the p.d.f. pA(m).
Slide 28
Probabilistic representation of a priori information: the probability that the model parameters are near m is given by the p.d.f. pA(m), centered at the a priori value <m>.
Slide 29
Probabilistic representation of a priori information: the probability that the model parameters are near m is given by the p.d.f. pA(m), whose variance reflects the uncertainty in the a priori information.
Slide 30
[Figure: two panels in the (m1, m2) plane, each centered on (<m1>, <m2>): a compact p.d.f. labeled "certain" and a broad p.d.f. labeled "uncertain"]
Slide 31
[Figure: a p.d.f. in the (m1, m2) plane, centered on (<m1>, <m2>)]
Slide 32
[Figure: a priori information in the form of a linear relationship between m1 and m2, and its approximation with a Gaussian p.d.f. centered on (<m1>, <m2>)]
Slide 33
[Figure: a p.d.f. in the (m1, m2) plane that is constant (p = constant) in one region and zero (p = 0) elsewhere]
Slide 34
Assessing the information content in pA(m): do we know a little about m, or a lot about m?
Slide 35
Information Gain, S; −S is called the Relative Entropy.
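The integral on this slide did not survive transcription. Assuming the standard Kullback-Leibler form (with the slide's sign convention relating S and the relative entropy), the information gain of pA(m) relative to a null p.d.f. pN(m) is:

```latex
S = \int p_A(\mathbf{m})\,
    \log\!\frac{p_A(\mathbf{m})}{p_N(\mathbf{m})}\;d\mathbf{m}
```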
Slide 36
Relative Entropy, S, also called Information Gain: pN(m) is the null p.d.f., representing the state of no knowledge.
Slide 37
Relative Entropy, S, also called Information Gain: a uniform p.d.f. might work for the null p.d.f.
Slide 38
Probabilistic representation of the data: the probability that the data are near d is given by the p.d.f. p(d).
Slide 39
Probabilistic representation of the data: the probability that the data are near d is given by the p.d.f. p(d), centered at the observed data d^obs.
Slide 40
Probabilistic representation of the data: the probability that the data are near d is given by the p.d.f. p(d), whose variance reflects the uncertainty in the measurements.
Slide 41
Probabilistic representation of both the prior information and the observed data: assume the observations and the a priori information are uncorrelated, so the joint p.d.f. is the product p(m, d) = pA(m) p(d).
Slide 42
[Figure: example of the joint p.d.f. in the (model m, datum d) plane, centered on the a priori model m^ap and the observed data d^obs]
Slide 43
The theory d = g(m) is a surface in the combined space of data and model parameters, on which the estimated model parameters and predicted data must lie.
Slide 44
The theory d = g(m) is a surface in the combined space of data and model parameters, on which the estimated model parameters and predicted data must lie. For a linear theory, the surface is planar.
Slide 45
The principle of maximum likelihood says: maximize the joint p.d.f. p(m, d) = pA(m) p(d) on the surface d = g(m).
Slide 46
[Figure: (A) the joint p.d.f. in the (model m, datum d) plane with the theory curve d = g(m); the maximum-likelihood point on the curve gives m^est and d^pred = g(m^est), near the a priori model m^ap and the observed data d^obs. (B) the p.d.f. p(s) as a function of position s along the curve, with its maximum at s_max]
Slide 47
[Figure: the same construction for a case where the estimate is pulled toward the a priori value, m^est ≈ m^ap; the companion panel shows p(s) along the curve with its maximum at s_max]
Slide 48
[Figure: (A) the same construction for a case where the prediction honors the data, d^pred ≈ d^obs, with m^est and m^ap marked. (B) p(s) along the curve, with its maximum at s_max]
Slide 49
Principle of maximum likelihood with Gaussian-distributed data and Gaussian-distributed a priori information: minimize
E + L = (d^obs − Gm)^T [cov d]^{-1} (d^obs − Gm) + (m − <m>)^T [cov m]^{-1} (m − <m>)
Slide 50
This is just weighted least squares, with F and f as sketched below, so we already know the solution.
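The block-matrix definitions on this slide were lost in transcription. Reconstructed from the standard weighted least-squares formulation (an assumption consistent with the slides that follow), they have the form:

```latex
\mathbf{F} =
\begin{bmatrix}
  [\operatorname{cov}\mathbf{d}]^{-1/2}\,\mathbf{G} \\[2pt]
  [\operatorname{cov}\mathbf{m}]^{-1/2}\,\mathbf{I}
\end{bmatrix}
\qquad
\mathbf{f} =
\begin{bmatrix}
  [\operatorname{cov}\mathbf{d}]^{-1/2}\,\mathbf{d}^{\mathrm{obs}} \\[2pt]
  [\operatorname{cov}\mathbf{m}]^{-1/2}\,\langle\mathbf{m}\rangle
\end{bmatrix}
```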
Slide 51
Solve Fm = f with simple least squares.
Slide 52
When [cov d] = σ_d² I and [cov m] = σ_m² I, this reduces to damped least squares.
Slide 53
This provides an answer to the question: what should be the value of ε² in damped least squares?
The answer: it should be set to the ratio of the variances of the data and the a priori model parameters, ε² = σ_d²/σ_m² (see the sketch below).
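A minimal sketch of damped least squares with ε² set this way; all numbers are hypothetical, and the a priori model is taken as <m> = 0, the usual damped least-squares convention:

```python
import numpy as np

# Hypothetical linear problem Gm = d.
G = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
d_obs = np.array([0.9, 2.1, 2.9, 4.2])

sigma_d = 0.2   # standard deviation of each datum
sigma_m = 1.0   # standard deviation of the a priori model parameters

# epsilon^2 = ratio of data variance to a priori model variance.
eps2 = sigma_d**2 / sigma_m**2

# Damped least squares: m = (G^T G + eps2 I)^{-1} G^T d.
M = G.shape[1]
m_est = np.linalg.solve(G.T @ G + eps2 * np.eye(M), G.T @ d_obs)
print(m_est)
```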
Slide 54
If the a priori information is Hm = h with covariance [cov h]_A, then Fm = f becomes the form sketched below.
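The block matrices on this slide were also lost in transcription. By analogy with the previous construction (an assumption consistent with the final slide), they become:

```latex
\mathbf{F} =
\begin{bmatrix}
  [\operatorname{cov}\mathbf{d}]^{-1/2}\,\mathbf{G} \\[2pt]
  [\operatorname{cov}\mathbf{h}]_A^{-1/2}\,\mathbf{H}
\end{bmatrix}
\qquad
\mathbf{f} =
\begin{bmatrix}
  [\operatorname{cov}\mathbf{d}]^{-1/2}\,\mathbf{d}^{\mathrm{obs}} \\[2pt]
  [\operatorname{cov}\mathbf{h}]_A^{-1/2}\,\mathbf{h}
\end{bmatrix}
```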
Slide 55
Gm = d^obs with covariance [cov d]; Hm = h with covariance [cov h]_A:
m^est = (F^T F)^{-1} F^T f
with F and f as defined above. This is the most useful formula in inverse theory.
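A minimal end-to-end sketch of this final recipe, assuming the reconstructed F and f above; all matrices are hypothetical, with a flatness-type prior Hm = h used for illustration:

```python
import numpy as np

# Hypothetical problem: Gm = d_obs (4 data, 3 model parameters),
# plus a priori information Hm = h (flatness: adjacent parameters equal).
G = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
d_obs = np.array([1.1, 1.9, 3.2, 6.0])
H = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
h = np.zeros(2)

sigma_d, sigma_h = 0.2, 1.0   # data and prior standard deviations

# With diagonal covariances sigma^2 I, [cov]^{-1/2} is just 1/sigma.
# Stack the weighted theory and prior into Fm = f ...
F = np.vstack([G / sigma_d, H / sigma_h])
f = np.concatenate([d_obs / sigma_d, h / sigma_h])

# ... and solve with simple least squares: m = (F^T F)^{-1} F^T f.
m_est = np.linalg.solve(F.T @ F, F.T @ f)
print(m_est)
```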