TRANSCRIPT
1
PRINCIPLES OF RETRIEVAL THEORY
Clive D Rodgers
Atmospheric, Oceanic and Planetary Physics
University of Oxford
ESA Advanced Atmospheric Training Course
September 15th – 20th, 2008
2
ATMOSPHERIC REMOTE SENSING: THE INVERSE PROBLEM
Topics
1. Introduction
2. Bayesian approach
3. Information content
4. Error analysis and characterisation
3
ADVERTISEMENT
C. D. Rodgers, Inverse Methods for Atmospheric Sounding: Theory and Practice,
World Scientific Publishing Co., 2000.
4
WHAT IS AN INVERSE OR RETRIEVAL PROBLEM?
• Almost any measurement you make...
• When you measure some function of the quantity you really want, you have a retrieval
problem.
• Sometimes it’s trivial, sometimes it isn’t.
• Various aspects:
• Formulate the problem properly:
◦ Describe the measurement in terms of some Forward Model
◦ Don’t forget experimental error!
• Finding a solution, inverting the forward model
◦ Algebraic
◦ Numerical
◦ No unique solution
◦ No solution at all
• Finding the ‘best’ solution
◦ Uniqueness - a unique solution may not be the best...
◦ Accuracy
◦ Efficiency
• Understanding the answer
5
THINGS TO THINK ABOUT
• Why isn’t the problem trivial?
• Forward models which are not explicitly invertible
• Ill-conditioned or ill-posed problems
• Errors in the measurement (and in the forward model) can map into errors in the solution in
a non-trivial way.
• What to measure?
• Does it actually contain the information you want?
• Updating existing knowledge
• You always have some prior knowledge of the ‘unknown’
• the measurement improves that knowledge
• the measurement may not be enough by itself to completely determine the unknown
• Ill-posed problems
• You cannot solve an ill-posed problem. You have to convert it into a well-posed problem.
• Which of an infinite manifold of solutions do you want?
6
MATHEMATICAL CONCEPTS I
• Measurement Vector: y = (y1, y2, ...ym)
• Any measurement is of a finite number of quantities.
• Arrange them as a vector for computational purposes
• State Vector: x = (x1, x2, ...xn)
• The desired quantity is often continuous - e.g. a temperature profile
• We can only make a finite number of measurements and calculations
• Express the unknown in terms of a finite number of parameters
• They do not all have to be of the same type
• Arrange them as a vector for computational purposes
• Examples:
◦ Temperature on a set of pressure levels, with a specified interpolation rule.
6
MATHEMATICAL CONCEPTS I
• Measurement Vector: y = (y1, y2, ...ym)
• Any measurement is of a finite number of quantities.
• Arrange them as a vector for computational purposes
• State Vector: x = (x1, x2, ...xn)
• The desired quantity is often continuous - e.g. a temperature profile
• We can only make a finite number of measurements and calculations
• Express the unknown in terms of a finite number of parameters
• They do not all have to be of the same type
• Arrange them as a vector for computational purposes
• Examples:
◦ Temperature on a set of pressure levels, with a specified interpolation rule.
◦ Fourier coefficients for a set of waves
7
MATHEMATICAL CONCEPTS II
Using vectors, it is convenient to think in terms of linear algebra and vector spaces - even if the
forward model is not linear
Measurement Space
– Measurement space is the space of measurement vectors, dimension m.
State Space
– State space is the space of state vectors, dimension n.
Generally the two vector spaces will have different dimensions.
8
MATHEMATICAL CONCEPTS III
Forward Function and Model
– The Forward Function f(x) maps from state space onto measurement space, depending on the
physics of the measurement.
– The Forward Model F(x) is the best we can do in the circumstances to model the forward
function.
Inverse or Retrieval Method
– The inverse problem is one of finding an inverse mapping R(y):
Given a point in measurement space, which point or set of points in state space could have mapped into it?
9
NOTATION
I have tried to make it mnemonic as far as possible:
Matrices                 Bold upper case
Vectors                  Bold lower case
State vectors            x
Measurement vectors      y
Covariance matrices      S (based on σ, and not wanting to use Σ)
Measurement error        ε
Forward model            F(x) (really ought to be f(x), as it's a vector)
Jacobian                 K (originally a Kernel of an integral transform)
Gain matrix              G (I've used up K, which might stand for Kalman gain)
Averaging Kernel         A
Different vectors, matrices of the same type are distinguished by superscripts, subscripts, etc:
a priori                 x_a (background)
estimate                 x̂
first guess              x_0
'true' value             x (no subscript)
10
STANDARD ILLUSTRATION
Idealised thermal-emission nadir sounder represented as a linear forward model:
y = Kx + ε
K is the ‘weighting function’ matrix, ε is measurement error or noise.
• Vertical coordinate is notionally ln(p), discretised at 100 levels from 0 in steps of 0.1 to 9.9 –
around 0 to 70 km.
• Eight channels (elements of y).
• State vector is notionally temperature at 100 levels.
• Measurement error (when considered) is 0.5 K.
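The standard illustration can be simulated in a few lines. The following NumPy sketch is not from the lecture: the Gaussian-shaped weighting functions are an assumed stand-in for the actual K, kept only to show the shapes involved (8 channels, 100 levels, 0.5 K noise).

```python
import numpy as np

# Toy linear forward model y = K x + eps, in the spirit of the standard
# illustration: 100-level state, 8 channels, 0.5 K measurement noise.
# The Gaussian weighting functions are an assumption for illustration.
rng = np.random.default_rng(0)

n, m = 100, 8                      # state and measurement dimensions
z = 0.1 * np.arange(n)             # vertical coordinate, ln(p) from 0 to 9.9

# One weighting function per channel, peaking at a different height
peaks = np.linspace(1.0, 8.0, m)
K = np.exp(-0.5 * (z[None, :] - peaks[:, None]) ** 2)
K /= K.sum(axis=1, keepdims=True)  # normalise each row

x_true = 250.0 + 20.0 * np.sin(z / 2.0)   # a smooth 'temperature' profile
eps = rng.normal(0.0, 0.5, size=m)        # 0.5 K measurement error

y = K @ x_true + eps               # the simulated measurement
print(y.shape)                     # (8,)
```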
11
STANDARD WEIGHTING FUNCTIONS
12
EXACT RETRIEVAL SIMULATION
The state vector is a set of eight coefficients of a degree-seven polynomial.
(a) Original profile: US standard atmosphere
(b) Exact retrieval with no experimental error
(c) Exact retrieval with simulated 0.5 K error
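The exact-retrieval experiment can be sketched numerically. Everything below is a hypothetical stand-in (assumed weighting functions and a random "true" profile, not the US standard atmosphere), but it reproduces the qualitative point: the 8×8 system inverts exactly without noise, and amplifies a 0.5-unit error badly.

```python
import numpy as np

# Represent the state by the 8 coefficients of a degree-seven polynomial,
# so the forward model reduces to an 8x8 system that can be solved exactly.
rng = np.random.default_rng(1)

n, m = 100, 8
z = 0.1 * np.arange(n)
peaks = np.linspace(1.0, 8.0, m)
K = np.exp(-0.5 * (z[None, :] - peaks[:, None]) ** 2)
K /= K.sum(axis=1, keepdims=True)

# P maps polynomial coefficients c to a profile on the 100 levels
P = np.vander(z / z.max(), 8, increasing=True)   # (100, 8)

c_true = rng.normal(size=8)
x_true = P @ c_true

A = K @ P                       # effective 8x8 forward matrix
y_clean = A @ c_true

# (b) exact retrieval, no experimental error: recovers the profile
c_b = np.linalg.solve(A, y_clean)

# (c) exact retrieval with simulated error: the noise is strongly amplified
c_c = np.linalg.solve(A, y_clean + rng.normal(0.0, 0.5, size=m))

err_b = np.abs(P @ c_b - x_true).max()
err_c = np.abs(P @ c_c - x_true).max()
print(err_b, err_c)
```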
13
NOISE FREE MEASUREMENTS
Row Space and Null Space
Consider an error-free linear measurement, equivalent to solving linear equations:
y = Kx
The rows of K are the weighting functions k_i:
y_i = k_i^T x
The k_i are a set of vectors in state space; the measurements are projections of the state x onto them.
They span a subspace called the row space, of dimension equal to the rank of K,
p ≤ min(n,m). If p < m then the weighting functions are not linearly independent.
Only those components of x in the row space can be measured.
The null space is the part of state space which is not in the row space.
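A small numerical example makes the row space / null space distinction concrete. The 2×3 K below is invented for illustration: any component of the state lying in the null space of K is invisible to the measurement.

```python
import numpy as np

# Row space vs null space for a toy 2x3 K (m = 2 measurements, n = 3 unknowns).
K = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])

p = np.linalg.matrix_rank(K)      # rank = dimension of the row space
print(p)                          # 2

# A null-space vector: K maps it to zero, so it cannot be measured
x_null = np.array([1.0, -1.0, 1.0])
print(K @ x_null)                 # [0. 0.]

# Two states differing only by a null-space component give identical measurements
x = np.array([3.0, 2.0, 1.0])
print(np.allclose(K @ x, K @ (x + 5.0 * x_null)))   # True
```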
14
ILL-POSED AND WELL-POSED PROBLEMS
Ill or well posed? · · · Under- or over-determined? · · · Under- or over-constrained?
Which is which?
1. p = m = n. Well posed.
The number of unknowns is equal to the number of measurements, and they are all independent.
2. p = m < n. Underconstrained, ill-posed
More unknowns than measurements, but the measurements are all independent.
3. p = n < m. Overconstrained, ill-posed
More measurements than unknowns, so they could be inconsistent, but the unknowns are all in
the row space, so there is information about all of them.
15
ILL-POSED AND WELL-POSED PROBLEMS II
Another category: Mixed-determined, the problem is both underconstrained and overconstrained.
1. p < m = n.
The number of unknowns is equal to the number of measurements, but the measurements
are not independent, so they could be inconsistent, and the number of independent pieces of
information is less than the number of unknowns. Simple example:
y_1 = x_1 + x_2 + ε_1   (1)
y_2 = x_1 + x_2 + ε_2   (2)
2. p < m < n.
More unknowns than measurements, but the measurements are not independent, so they could
be inconsistent.
3. p < n < m.
More measurements than the rank, so they could be inconsistent, more unknowns than the rank,
so not all are defined by the measurement.
16
ILL-POSED AND WELL-POSED PROBLEMS III
Summary
If p < n then the system is underconstrained; there is a null space.
If p < m then the system is overconstrained in some part of the row space.
How do we identify the row and null spaces?
One straightforward way is Gram-Schmidt orthogonalisation, but . . .
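As a sketch of the Gram-Schmidt route: orthogonalise the rows of K, discarding any row that is a linear combination of earlier ones. What survives is an orthonormal basis for the row space; its size is the rank p. (The classical form below is numerically fragile for near-dependent rows, which is one reason the lecture prefers SVD.)

```python
import numpy as np

# Identify the row space by Gram-Schmidt orthogonalisation of the rows of K.
def gram_schmidt(rows, tol=1e-10):
    """Return an orthonormal basis for the span of the given row vectors."""
    basis = []
    for r in rows:
        v = r.astype(float).copy()
        for b in basis:
            v -= (b @ v) * b          # remove components along earlier vectors
        norm = np.linalg.norm(v)
        if norm > tol:                # keep only linearly independent directions
            basis.append(v / norm)
    return np.array(basis)

# Three weighting functions, only two of which are independent (row 2 = row 0 + row 1)
K = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])

B = gram_schmidt(K)
print(B.shape[0])                         # p = 2 basis vectors
print(np.allclose(B @ B.T, np.eye(2)))    # True: orthonormal
```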
17
SINGULAR VECTOR DECOMPOSITION [>>]
Is the neatest way of doing the job.
18
MATRIX ALGEBRA – EIGENVECTORS
The eigenvalue problem associated with an arbitrary square matrix A, of order n, is to find
eigenvectors l and scalar eigenvalues λ which satisfy
Al = λl
If A is a coordinate transformation, then l has the same representation in the untransformed and
transformed coordinates, apart from a factor λ.
This is the same as (A − λI) l = 0, a homogeneous set of equations, which can only have a
solution if |A − λI| = 0, giving a polynomial equation of degree n, with n solutions for λ. They
will be complex in general.
An eigenvector can be scaled by an arbitrary factor. It is conventional to normalise them so that
lT l = 1 or l†l = 1 (Hermitian adjoint)
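A quick numerical check of the complex case (the matrix below is a rotation, chosen for illustration): its eigenvalues are a complex-conjugate pair, and NumPy returns the eigenvectors normalised so that l†l = 1.

```python
import numpy as np

# Eigenvalue problem A l = λ l for a general (non-symmetric) real matrix:
# the eigenvalues come out complex in general, as the slide notes.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])        # a 90-degree rotation

lam, L = np.linalg.eig(A)
print(lam)                         # a complex-conjugate pair, ±1j

# Each column of L satisfies A l = λ l, normalised so that l†l = 1
for i in range(2):
    l = L[:, i]
    print(np.allclose(A @ l, lam[i] * l))        # True
    print(np.isclose(np.vdot(l, l), 1.0))        # True
```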
19
EIGENVECTORS II
We can assemble the eigenvectors as columns in a matrix L:
AL = LΛ
where Λ is a diagonal matrix, with the eigenvalues on the diagonal.
Transpose:                    L^T A^T = Λ L^T
Multiply by R = (L^T)^-1:     A^T = R Λ L^T
Postmultiply by R:            A^T R = R Λ
Thus:
R is the matrix of eigenvectors of AT .
AT has the same eigenvalues as A.
In the case of a symmetric matrix, S = S^T, we must have L = R, so that L^T L = L L^T = I, or
L^T = L^-1, and the eigenvectors are orthogonal.
In this case the eigenvalues are real.
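These symmetric-case facts are easy to verify numerically. The sketch below (a randomly symmetrised matrix, chosen only for illustration) uses `eigh`, which assumes and exploits symmetry:

```python
import numpy as np

# For a symmetric S: L = R, L^T = L^{-1}, and the eigenvalues are real.
rng = np.random.default_rng(2)
B = rng.normal(size=(4, 4))
S = B + B.T                        # symmetrise

lam, L = np.linalg.eigh(S)         # eigh assumes (and exploits) symmetry

print(np.isrealobj(lam))                      # True: real eigenvalues
print(np.allclose(L.T @ L, np.eye(4)))        # True: L^T = L^{-1}
print(np.allclose(S @ L, L * lam))            # True: S L = L Λ
```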
20
EIGENVECTORS – GEOMETRIC INTERPRETATION
Consider the scalar equation:
xTSx = 1
where S is symmetric. This is the equation of a quadratic surface centered on the origin, in
n-space.
The normal to the surface is the vector ∂(xTSx)/∂x, i.e. Sx, and x is the radius vector, so
Sx = λx
is the problem of finding points where the normal and the radius vector are parallel. These are
where the principal axes intersect the surface.
At these points, x^T S x = 1, so x^T λ x = 1, or:
λ = 1 / (x^T x)
So the eigenvalues are the reciprocals of the squares of the lengths of the principal axes.
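For a 2D ellipse this is easy to check directly. The diagonal S below is an assumed example with its principal axes along the coordinate axes, so the semi-axis lengths 1/√λ can be read off:

```python
import numpy as np

# Check: for the ellipse x^T S x = 1, the eigenvalues of S are the
# reciprocals of the squared lengths of the principal semi-axes.
S = np.array([[2.0, 0.0],
              [0.0, 0.5]])         # principal axes along the coordinate axes

lam, L = np.linalg.eigh(S)         # eigh returns eigenvalues in ascending order
axis_lengths = 1.0 / np.sqrt(lam)  # semi-axis lengths 1/sqrt(λ)
print(axis_lengths)

# Points where the principal axes intersect the surface:
# x = (axis length) * eigenvector, which satisfies x^T S x = 1
for length, l in zip(axis_lengths, L.T):
    x = length * l
    print(np.isclose(x @ S @ x, 1.0))   # True
```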
21
GEOMETRIC INTERPRETATION II
The lengths are independent of the coordinate system, so will also be invariant under an arbitrary
orthogonal transformation, i.e. one in which (distance)2 = xTx is unchanged.
Consider using the eigenvectors of S to transform the equation for the quadratic surface:
x^T L Λ L^T x = 1   or   y^T Λ y = 1   or   Σ_i λ_i y_i^2 = 1
where y = L^T x or x = L y. This transforms the surface into its principal axis representation.
22
EIGENVECTORS - USEFUL RELATIONSHIPS
Asymmetric matrices                        Symmetric matrices
A R = R Λ,   L^T A = Λ L^T                 S L = L Λ
L^T = R^-1,  R^T = L^-1                    L^T = L^-1
L R^T = L^T R = I                          L L^T = L^T L = I
A = R Λ L^T = Σ_i λ_i r_i l_i^T            S = L Λ L^T = Σ_i λ_i l_i l_i^T
A^T = L Λ R^T = Σ_i λ_i l_i r_i^T
A^-1 = R Λ^-1 L^T                          S^-1 = L Λ^-1 L^T
A^n = R Λ^n L^T                            S^n = L Λ^n L^T
L^T A R = Λ                                L^T S L = Λ
L^T A^n R = Λ^n                            L^T S^n L = Λ^n
L^T A^-1 R = Λ^-1                          L^T S^-1 L = Λ^-1
|A| = Π_i λ_i                              |S| = Π_i λ_i
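A few of the symmetric-column identities can be spot-checked numerically; the randomly symmetrised matrix below is only an illustrative example:

```python
import numpy as np

# Spot-check the tabulated identities for a symmetric S.
rng = np.random.default_rng(3)
B = rng.normal(size=(3, 3))
S = B + B.T

lam, L = np.linalg.eigh(S)
Lam = np.diag(lam)

print(np.allclose(S, L @ Lam @ L.T))                            # S = L Λ L^T
print(np.allclose(np.linalg.inv(S), L @ np.diag(1/lam) @ L.T))  # S^-1 = L Λ^-1 L^T
print(np.allclose(np.linalg.matrix_power(S, 3),
                  L @ np.diag(lam**3) @ L.T))                   # S^n = L Λ^n L^T
print(np.isclose(np.linalg.det(S), lam.prod()))                 # |S| = Π λ_i
```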
23
SINGULAR VECTOR DECOMPOSITION
The standard eigenvalue problem is meaningless for non-square matrices.
A ‘shifted’ eigenvalue problem associated with an arbitrary non-square matrix K, m rows and n
columns can be constructed:
Kv = λu,   Kᵀu = λv   (3)
where v, of length n, and u, of length m, are called the singular vectors of K.
This is equivalent to the symmetric problem:
( 0   K ) (u)     (u)
( Kᵀ  0 ) (v) = λ (v)
From (3) we can get
KᵀKv = λKᵀu = λ²v,   KKᵀu = λKv = λ²u   (4)
so u and v are the eigenvectors of KKT (m×m) and KTK (n× n) respectively.
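Relations (3) and (4) can be verified directly with a numerical SVD. A sketch with an arbitrary 4×3 matrix (the values and the NumPy routine are illustrative, not from the lectures):

```python
import numpy as np

# For an m x n matrix K, the left/right singular vectors are eigenvectors
# of K K^T and K^T K respectively, with eigenvalues lambda^2 (eq. 4).
rng = np.random.default_rng(0)
K = rng.standard_normal((4, 3))          # m = 4, n = 3

U, s, Vt = np.linalg.svd(K, full_matrices=False)
V = Vt.T

# K v = lambda u  and  K^T u = lambda v  (eq. 3)
assert np.allclose(K @ V, U * s)
assert np.allclose(K.T @ U, V * s)

# K^T K v = lambda^2 v  and  K K^T u = lambda^2 u  (eq. 4)
assert np.allclose(K.T @ K @ V, V * s**2)
assert np.allclose(K @ K.T @ U, U * s**2)
```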
24
SINGULAR VECTOR DECOMPOSITION II
Care is needed in constructing a matrix of singular vectors, because individual u and v vectors
correspond to each other, yet there are potentially different numbers of v and u vectors.
If the rank of K is p, then there will be p non-zero singular values, and both KKT and KTK will
have p non-zero eigenvalues.
The surplus eigenvectors will have zero eigenvalues, and can be discarded, so we can write:
( 0   K ) (U)   (U)
( Kᵀ  0 ) (V) = (V) Λ
where Λ is p×p, U is m×p, and V is n×p.
There will be n+m− p more eigenvectors of the composite matrix, all with zero eigenvalue.
25
SINGULAR VECTORS - USEFUL RELATIONSHIPS
KV = UΛ            KᵀU = VΛ
UᵀKV = VᵀKᵀU = Λ
K = UΛVᵀ           Kᵀ = VΛUᵀ
VᵀV = UᵀU = Iₚ
KKᵀU = UΛ²         KᵀKV = VΛ²   (5)
27
SINGULAR VECTOR DECOMPOSITION [<<]
Is the neatest way of doing the job
Express K as
ₘKₙ = ₘUₚ Λₚ Vᵀₙ
where the subscripts indicate the sizes of the matrices (K is m×n, U is m×p, Λ is p×p, Vᵀ is p×n).
Then the forward model (no noise) becomes:
y = Kx = UΛVᵀx
so that
Uᵀy = ΛVᵀx
or
y′ = Λx′
where y′ = Uᵀy and x′ = Vᵀx are both of order p.
The rows of VT , or the columns of V (in state space) are a basis for the row space of K. Similarly
the columns of U (in measurement space) are a basis of its column space.
28
SINGULAR VECTOR DECOMPOSITION II
We can also see that an exact solution is
x′ = Λ⁻¹y′
or
x = VΛ⁻¹Uᵀy
This solution is unique only if p = n. If p < n, any multiple of a vector with zero singular value
can be added and still satisfy the equations.
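The non-uniqueness for p < n can be made concrete. A sketch with an assumed rank-2, 2×3 matrix (the numbers are illustrative):

```python
import numpy as np

# The exact solution x = V Lam^{-1} U^T y; when rank p < n, adding any
# null-space vector of K still satisfies Kx = y, so it is not unique.
K = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])        # m = 2, n = 3, rank p = 2 < n
y = np.array([2.0, 3.0])

U, s, Vt = np.linalg.svd(K, full_matrices=False)
x = Vt.T @ ((U.T @ y) / s)             # x = V Lam^{-1} U^T y

assert np.allclose(K @ x, y)           # reproduces the measurement exactly

# A null-space vector (zero singular value direction): K n0 = 0
n0 = np.array([1.0, 1.0, -1.0])
assert np.allclose(K @ n0, 0.0)
assert np.allclose(K @ (x + 5.0 * n0), y)   # solution is not unique
```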
29
SVD OF THE STANDARD WEIGHTING FUNCTIONS
30
EXACT RETRIEVAL SIMULATION
The state vector is a set of eight coefficients of a degree-seven polynomial.
(a) Original profile: US standard atmosphere
(b) Exact retrieval with no experimental error
(c) Exact retrieval with simulated 0.5 K error
31
APPROACHES TO INVERSE PROBLEMS
• Bayesian Approach
• What is the pdf of the state, given the measurement and the a priori ?
• Optimisation Approaches:
• Maximum Likelihood
• Maximum A Posteriori
• Minimum Variance
• Backus-Gilbert - resolution/noise trade-off
• Ad hoc Approaches
• Relaxation
• Exact algebraic solutions
32
BAYESIAN APPROACH
This is the most general approach to the problem (that I know of).
Knowledge is represented in terms of probability density functions:
• P (x) is the a priori p.d.f. of the state, describing what we know about the state before we make
the measurement.
• P (y) is the a priori p.d.f. of the measurement.
• P (x, y) is the joint a priori p.d.f. of x and y.
• P (y|x) is the p.d.f. of the measurement given the state - this depends on experimental error
and the forward function.
• P (x|y) is the p.d.f. of the state given the measurement - this is what we want to find.
33
BAYES THEOREM
The theorem states:
P (x, y) = P (x|y)P (y)
and of course
P (y, x) = P (y|x)P (x)
so that
P(x|y) = P(y|x)P(x) / P(y)
If we have a prior p.d.f. for x, P(x), and we know statistically how y is related to x via P (y|x),
then we can find an un-normalised version of P (x|y), namely P (y|x)P (x), which can be
normalised if required.
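The normalisation step can be shown with a toy discrete state. A sketch (the two states and the probability values are assumed for illustration):

```python
# Bayes' theorem with a toy discrete state: P(x|y) is proportional to
# P(y|x) P(x); normalising by P(y) = sum_x P(y|x) P(x) gives the posterior.
prior = {'warm': 0.7, 'cold': 0.3}                  # P(x), assumed numbers
likelihood = {'warm': 0.2, 'cold': 0.9}             # P(y|x) for one observed y

unnorm = {x: likelihood[x] * prior[x] for x in prior}   # P(y|x) P(x)
P_y = sum(unnorm.values())                              # P(y)
posterior = {x: unnorm[x] / P_y for x in prior}         # P(x|y)

assert abs(sum(posterior.values()) - 1.0) < 1e-12       # properly normalised
# The measurement favours 'cold': its posterior exceeds its prior.
assert posterior['cold'] > prior['cold']
```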
34
BAYES THEOREM GEOMETRICALLY
35
APPLICATION OF THE BAYESIAN APPROACH
We need explicit forms for the p.d.f’s:
• Assume that experimental error is Gaussian:
−ln P(y|x) = ½ (y − F(x))ᵀ Sε⁻¹ (y − F(x)) + const
where F(x) is the Forward model:
y = F(x) + ε
and Sε is the covariance matrix of the experimental error, ε:
Sε = E{εεᵀ} = E{(y − F(x))(y − F(x))ᵀ}
• Assume that the a priori p.d.f. is Gaussian (less justifiable):
−ln P(x) = ½ (x − xa)ᵀ Sa⁻¹ (x − xa) + const
i.e. x is distributed normally with mean xa and covariance Sa.
36
APPLICATION OF THE BAYESIAN APPROACH II
• Thus the pdf of the state when the measurements and the a priori are given is:
−2 ln P(x|y) = [y − F(x)]ᵀ Sε⁻¹ [y − F(x)] + [x − xa]ᵀ Sa⁻¹ [x − xa] + const
• If we want a state estimate x̂ rather than a p.d.f., then we must calculate some function of
P(x|y), such as its mean or its maximum:
x̂ = ∫ P(x|y) x dx   or   dP(x|y)/dx = 0
37
BAYESIAN SOLUTION FOR THE LINEAR PROBLEM
The linear problem has a forward model:
F(x) = Kx
so the p.d.f. P(x|y) becomes:
−2 ln P(x|y) = [y − Kx]ᵀ Sε⁻¹ [y − Kx] + [x − xa]ᵀ Sa⁻¹ [x − xa] + c₁
This is quadratic in x, so has to be of the form:
−2 ln P(x|y) = [x − x̂]ᵀ Ŝ⁻¹ [x − x̂] + c₂
Equate the terms that are quadratic in x:
xᵀKᵀSε⁻¹Kx + xᵀSa⁻¹x = xᵀŜ⁻¹x
giving
Ŝ⁻¹ = KᵀSε⁻¹K + Sa⁻¹
• ‘The Fisher Information Matrix’.
38
BAYESIAN SOLUTION FOR THE LINEAR PROBLEM II
Equating the terms linear in xᵀ gives:
(−Kx)ᵀSε⁻¹y + xᵀSa⁻¹(−xa) = xᵀŜ⁻¹(−x̂)
This must be valid for any x. Cancel the xᵀ’s, and substitute for Ŝ⁻¹:
KᵀSε⁻¹y + Sa⁻¹xa = (KᵀSε⁻¹K + Sa⁻¹)x̂
giving:
x̂ = (KᵀSε⁻¹K + Sa⁻¹)⁻¹(KᵀSε⁻¹y + Sa⁻¹xa)
The mean x̂ and the covariance Ŝ define the full posterior pdf.
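The posterior mean and covariance formulas can be exercised on a toy linear retrieval. A sketch (the sizes, covariances, and random Jacobian are assumed for illustration):

```python
import numpy as np

# Posterior mean and covariance for the linear Gaussian problem:
# xhat = (K^T Se^-1 K + Sa^-1)^-1 (K^T Se^-1 y + Sa^-1 xa),
# Shat^-1 = K^T Se^-1 K + Sa^-1.
rng = np.random.default_rng(1)
n, m = 3, 5
K = rng.standard_normal((m, n))
Sa = np.eye(n) * 4.0                   # prior covariance (assumed)
Se = np.eye(m) * 0.25                  # measurement error covariance (assumed)
xa = np.zeros(n)
x_true = rng.standard_normal(n)
y = K @ x_true + 0.5 * rng.standard_normal(m)

Sa_inv, Se_inv = np.linalg.inv(Sa), np.linalg.inv(Se)
S_hat = np.linalg.inv(K.T @ Se_inv @ K + Sa_inv)       # posterior covariance
x_hat = S_hat @ (K.T @ Se_inv @ y + Sa_inv @ xa)       # posterior mean

# The retrieval fits the data better than the prior state does:
assert np.sum((y - K @ x_hat)**2) < np.sum((y - K @ xa)**2)
# Posterior covariance is "smaller" than the prior: Sa - Shat is positive definite.
assert np.all(np.linalg.eigvalsh(Sa - S_hat) > 0)
```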
39
A GEOMETRIC INTERPRETATION OF THE SOLUTION
40
AN ALGEBRAIC INTERPRETATION OF THE SOLUTION
The expected value is:
x̂ = (KᵀSε⁻¹K + Sa⁻¹)⁻¹(KᵀSε⁻¹y + Sa⁻¹xa)   (1)
Underconstrained case
There must exist at least one ‘exact’ solution xe = Gy in the sense that Kxe = y, i.e. KG = I.
For example G = Kᵀ(KKᵀ)⁻¹.
Replace y by Kxe in (1):
x̂ = (KᵀSε⁻¹K + Sa⁻¹)⁻¹(KᵀSε⁻¹Kxe + Sa⁻¹xa)
Overconstrained case
The least squares solution xl satisfies KᵀSε⁻¹Kxl = KᵀSε⁻¹y.
Inserting this in (1) gives:
x̂ = (KᵀSε⁻¹K + Sa⁻¹)⁻¹(KᵀSε⁻¹Kxl + Sa⁻¹xa)
41
AN INTERPRETATION OF THE SOLUTION II
x̂ = (KᵀSε⁻¹K + Sa⁻¹)⁻¹(KᵀSε⁻¹Kxe + Sa⁻¹xa)
or
x̂ = (KᵀSε⁻¹K + Sa⁻¹)⁻¹(KᵀSε⁻¹Kxl + Sa⁻¹xa)
• Both represent a weighted mean of a solution (exact, xe, or least squares, xl) with xa, using
relative weights KᵀSε⁻¹K and Sa⁻¹ respectively – their Fisher information matrices.
• This is exactly like the familiar combination of scalar measurements x₁ and x₂ of an unknown
x, with variances σ₁² and σ₂² respectively:
x̂ = (1/σ₁² + 1/σ₂²)⁻¹(x₁/σ₁² + x₂/σ₂²)
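The scalar combination rule above can be checked with assumed numbers (x₁ = 10 ± 1 and x₂ = 14 ± 2 are illustrative values):

```python
# Combining two scalar measurements of the same unknown by inverse-variance
# weighting, the scalar analogue of the Bayesian weighted mean above.
x1, s1 = 10.0, 1.0
x2, s2 = 14.0, 2.0

w1, w2 = 1.0 / s1**2, 1.0 / s2**2
xbar = (w1 * x1 + w2 * x2) / (w1 + w2)   # weighted mean
var = 1.0 / (w1 + w2)                    # combined variance

assert abs(xbar - 10.8) < 1e-12          # (10/1 + 14/4) / (1 + 1/4) = 10.8
assert var < min(s1**2, s2**2)           # combining always reduces variance
```

Note how the result sits closer to the more precise measurement, exactly as the retrieval sits closer to whichever of xa or the data carries more Fisher information.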
42
End of Section
43
INFORMATION CONTENT AND ERROR ANALYSIS
Clive D Rodgers
Atmospheric, Oceanic and Planetary Physics
University of Oxford
ESA Advanced Atmospheric Training Course
September 15th – 20th, 2008
44
INFORMATION CONTENT OF A MEASUREMENT
Information in a general qualitative sense:
Conceptually, what does y tell you about x?
We need to answer this
• to determine if a conceptual instrument design actually works
• to optimise designs
Use the linear problem for simplicity to illustrate the ideas.
y = Kx + ε
45
SHANNON INFORMATION
• The Shannon information content of a measurement of x is the change in the entropy of the
probability density function describing our knowledge of x.
• Entropy is defined by:
S{P} = −∫ P(x) log(P(x)/M(x)) dx
M(x) is a measure function. We will take it to be constant.
• Compare this with the statistical mechanics definition of entropy:
S = −k Σᵢ pᵢ ln pᵢ
P(x)dx corresponds to pᵢ. 1/M(x) is a kind of scale for dx.
• The Shannon information content of a measurement is the change in entropy between the p.d.f.
before, P(x), and the p.d.f. after, P(x|y), the measurement:
H = S{P(x)} − S{P(x|y)}
What does this all mean?
46
ENTROPY OF A BOXCAR PDF
Consider a uniform p.d.f. in one dimension, constant in (0, a):
P(x) = 1/a,  0 < x < a
and zero outside. The entropy is given by
S = −∫₀ᵃ (1/a) ln(1/a) dx = ln a
Similarly, the entropy of any constant pdf in a finite volume V of arbitrary shape is:
S = −∫_V (1/V) ln(1/V) dv = ln V
i.e. the entropy is the log of the volume of state space occupied by the p.d.f.
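The boxcar result S = ln a can be checked by direct numerical integration of −P ln P (the interval width a = 3 is an assumed example):

```python
import math

# Numerically integrate -P ln P for a boxcar pdf P = 1/a on (0, a) with a
# Riemann sum, and compare with the analytic result S = ln a (measure M = 1).
a = 3.0
N = 100000
dx = a / N
S_num = sum(-(1.0 / a) * math.log(1.0 / a) * dx for _ in range(N))

assert abs(S_num - math.log(a)) < 1e-9
```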
47
ENTROPY OF A GAUSSIAN PDF
Consider the Gaussian distribution:
P(x) = (2π)^(−n/2) |S|^(−1/2) exp[−½(x − x̄)ᵀS⁻¹(x − x̄)]
If you evaluate the entropy of a Gaussian distribution you will find it is proportional to log |S|^(1/2).
• The contours of P(x) in n-space are ellipsoidal, described by
(x − x̄)ᵀS⁻¹(x − x̄) = constant
• The principal axes of the ellipsoid are the eigenvectors of S, and their lengths are proportional
to the square roots of the corresponding eigenvalues.
• The volume of the ellipsoid is proportional to the root of the product of the eigenvalues, which
is proportional to |S|^(1/2).
• Therefore entropy is the log of the volume enclosed by some particular contour of P(x). A
‘volume of uncertainty’.
48
ENTROPY AND INFORMATION
Information content is the change in entropy,
• i.e. the log of the ratio of the volumes of uncertainty before and after making a measurement.
• A generalisation of ‘signal to noise’.
• In the Boxcar case, the log of the ratio of the volumes before and after
• In the Gaussian case:
H = log|Sa| − log|Ŝ| = −log|(KᵀSε⁻¹K + Sa⁻¹)⁻¹Sa⁻¹|
• minus the log of the determinant of the weight of xa in the Bayesian expectation.
49
The log of the ratio of the volumes of the ellipsoids
50
DEGREES OF FREEDOM FOR SIGNAL AND NOISE
The state estimate that maximises P(x|y) in the linear Gaussian case is the one which minimises
χ² = [y − Kx̂]ᵀSε⁻¹[y − Kx̂] + [x̂ − xa]ᵀSa⁻¹[x̂ − xa]
The r.h.s. has initially m + n degrees of freedom, of which n are fixed by choosing x to be x̂, so
the expected value of χ² is m.
These m degrees of freedom can be assigned to degrees of freedom for noise dn and degrees of
freedom for signal ds according to:
dn = E{[y − Kx̂]ᵀSε⁻¹[y − Kx̂]}
and
ds = E{[x̂ − xa]ᵀSa⁻¹[x̂ − xa]}
Using tr(CD) = tr(DC), we can see that
ds = E{tr([x̂ − xa][x̂ − xa]ᵀSa⁻¹)} = tr(E{[x̂ − xa][x̂ − xa]ᵀ}Sa⁻¹)   (6)
51
DEGREES OF FREEDOM FOR SIGNAL AND NOISE II
With some manipulation we can find
ds = tr((KᵀSε⁻¹K + Sa⁻¹)⁻¹KᵀSε⁻¹K) = tr(KSaKᵀ(KSaKᵀ + Sε)⁻¹)   (7)
and
dn = tr((KᵀSε⁻¹K + Sa⁻¹)⁻¹Sa⁻¹) + m − n = tr(Sε(KSaKᵀ + Sε)⁻¹)   (8)
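Formulas (7) and (8) can be checked numerically, including the identity ds + dn = m. A sketch with assumed toy covariances and a random Jacobian:

```python
import numpy as np

# Degrees of freedom for signal and noise from eqs. (7)-(8) for a toy
# problem; the two forms of ds agree, and ds + dn = m.
rng = np.random.default_rng(2)
m, n = 6, 4
K = rng.standard_normal((m, n))
Sa = 2.0 * np.eye(n)                   # assumed prior covariance
Se = 0.5 * np.eye(m)                   # assumed noise covariance

Sa_inv, Se_inv = np.linalg.inv(Sa), np.linalg.inv(Se)
G = np.linalg.inv(K.T @ Se_inv @ K + Sa_inv)

ds1 = np.trace(G @ K.T @ Se_inv @ K)                               # first form of (7)
ds2 = np.trace(K @ Sa @ K.T @ np.linalg.inv(K @ Sa @ K.T + Se))    # second form of (7)
dn = np.trace(Se @ np.linalg.inv(K @ Sa @ K.T + Se))               # (8)

assert np.isclose(ds1, ds2)
assert np.isclose(ds1 + dn, m)
assert 0 < ds1 < n                     # ds cannot exceed the state dimension
```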
52
INDEPENDENT MEASUREMENTS
If the measurement error covariance is not diagonal, the elements of the y vector will not be
statistically independent. Likewise for the a priori.
The measurements will not be independent functions of the state if K is not diagonal.
Therefore it helps to understand where the information comes from if we transform to a different
basis.
First, statistical independence. Define:
ỹ = Sε^(−1/2) y   x̃ = Sa^(−1/2) x
The transformed covariances S̃a and S̃ε both become unit matrices. [>>]
53
SQUARE ROOTS OF MATRICES
The square root of an arbitrary matrix is defined as A^(1/2) where
A^(1/2) A^(1/2) = A
Using Aⁿ = RΛⁿLᵀ for n = 1/2:
A^(1/2) = RΛ^(1/2)Lᵀ
This square root of a matrix is not unique, because the diagonal elements of Λ^(1/2) in RΛ^(1/2)Lᵀ can
have either sign, leading to 2ⁿ possibilities.
We only use square roots of symmetric covariance matrices. In this case S^(1/2) = LΛ^(1/2)Lᵀ is
symmetric.
54
SQUARE ROOTS OF SYMMETRIC MATRICES
Symmetric matrices can also have non-symmetric roots satisfying S = (S^(1/2))ᵀS^(1/2), of which the
Cholesky decomposition:
S = TᵀT
where T is upper triangular, is the most useful.
There are an infinite number of non-symmetric square roots: if S^(1/2) is a square root, then clearly so
is XS^(1/2) where X is any orthonormal matrix.
The inverse symmetric square root is S^(−1/2) = LΛ^(−1/2)Lᵀ.
The inverse Cholesky decomposition is S⁻¹ = T⁻¹T⁻ᵀ.
The inverse square root T⁻¹ is triangular, and its numerical effect is implemented efficiently by
back substitution.
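Both kinds of root, and the substitution trick, can be demonstrated on an assumed 2×2 covariance (note NumPy's `cholesky` returns the lower factor C with S = CCᵀ, so T = Cᵀ):

```python
import numpy as np

# A symmetric covariance has a symmetric root L Lam^{1/2} L^T and a
# triangular (Cholesky) root T with S = T^T T; both are verified below.
S = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# Symmetric square root from the eigen-decomposition
lam, L = np.linalg.eigh(S)
S_half = L @ np.diag(np.sqrt(lam)) @ L.T
assert np.allclose(S_half @ S_half, S)

# Cholesky: numpy returns lower-triangular C with S = C C^T,
# so T = C^T is the upper-triangular factor with S = T^T T.
C = np.linalg.cholesky(S)
T = C.T
assert np.allclose(T.T @ T, S)

# The inverse root applied by substitution, no explicit inverse needed:
# solve C z = b for the lower factor row by row.
b = np.array([1.0, 2.0])
z = np.zeros(2)
for i in range(2):
    z[i] = (b[i] - C[i, :i] @ z[:i]) / C[i, i]
assert np.allclose(C @ z, b)
```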
56
INDEPENDENT MEASUREMENTS [<<]
If the measurement error covariance is not diagonal, the elements of the y vector will not be
statistically independent. Likewise for the a priori.
The measurements will not be independent functions of the state if K is not diagonal.
Therefore it helps to understand where the information comes from if we transform to a different
basis.
First, statistical independence. Define:
ỹ = Sε^(−1/2) y   x̃ = Sa^(−1/2) x
The transformed covariances S̃a and S̃ε both become unit matrices.
The forward model becomes:
ỹ = K̃x̃ + ε̃
where K̃ = Sε^(−1/2) K Sa^(1/2).
The solution covariance becomes:
Ŝ = (Iₙ + K̃ᵀK̃)⁻¹
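The whitening transform can be verified numerically. A sketch (the sizes and the randomly generated covariances are assumptions for illustration):

```python
import numpy as np

# Pre-whitening turns both covariances into unit matrices, with
# Ktilde = Se^{-1/2} K Sa^{1/2}.
def mat_pow(S, p):
    """Symmetric matrix power via the eigen-decomposition S = L Lam L^T."""
    lam, L = np.linalg.eigh(S)
    return L @ np.diag(lam**p) @ L.T

rng = np.random.default_rng(3)
m, n = 5, 3
K = rng.standard_normal((m, n))
A = rng.standard_normal((n, n)); Sa = A @ A.T + n * np.eye(n)   # assumed prior cov.
B = rng.standard_normal((m, m)); Se = B @ B.T + m * np.eye(m)   # assumed noise cov.

# Transformed covariances: Sa^{-1/2} Sa Sa^{-1/2} = I_n, likewise for Se
assert np.allclose(mat_pow(Sa, -0.5) @ Sa @ mat_pow(Sa, -0.5), np.eye(n))
assert np.allclose(mat_pow(Se, -0.5) @ Se @ mat_pow(Se, -0.5), np.eye(m))

# Transformed Jacobian and solution covariance (I_n + Kt^T Kt)^{-1}
Kt = mat_pow(Se, -0.5) @ K @ mat_pow(Sa, 0.5)
S_hat_t = np.linalg.inv(np.eye(n) + Kt.T @ Kt)
# In this basis the prior covariance is I, so no posterior variance exceeds 1:
assert np.all(np.linalg.eigvalsh(S_hat_t) <= 1.0 + 1e-12)
```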
57
TRANSFORM AGAIN
Now make K diagonal. Rotate both x and y to yet another basis, defined by the singular vectors
of K:
y = Kx + ε → y = UΛVT x + ε
Define:
x′ = VT x y′ = UT y ε′= UT
ε
The forward model becomes:
y′ = Λx′ + ε′
(1)
The Jacobian is now diagonal, Λ, and the a priori and noise covariances are still unit matrices,
hence the solution covariance becomes:
ˆS = (In + KT K)−1 → S′ = (In + Λ2
)−1
which is diagonal, and the solution itself is
x′ = (In + Λ2)−1
(Λy′ + x′a)
not x′ = Λ−1y′ as you might expect from (1).
• Elements for which λi � 1 or (1 + λ2i )−1 � 1 are well measured
• Elements for which λi � 1 or (1 + λ2i )−1 � 1 are poorly measured.
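The equivalence of the two bases is easy to verify numerically. The following sketch uses an arbitrary random pre-whitened Jacobian (not the lecture's standard example) and checks that the solution computed in the original basis agrees with x̂′ = (In + Λ²)⁻¹(Λy′ + x′a):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 5
K = rng.standard_normal((m, n))          # pre-whitened Jacobian: Sa = In, Se = Im
U, s, Vt = np.linalg.svd(K, full_matrices=False)

x_true = rng.standard_normal(n)
xa = rng.standard_normal(n)              # a priori state, already in whitened units
y = K @ x_true + 0.1 * rng.standard_normal(m)

# Solution in the original basis: xhat = (In + K^T K)^-1 (K^T y + xa)
xhat = np.linalg.solve(np.eye(n) + K.T @ K, K.T @ y + xa)

# Same solution in the transformed basis: xhat' = (In + L^2)^-1 (L y' + xa')
y_p = U.T @ y
xa_p = Vt @ xa
xhat_p = (s * y_p + xa_p) / (1 + s**2)

# The two agree component by component
assert np.allclose(Vt @ xhat, xhat_p)
```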
58
INFORMATION
Shannon Information in the Transformed Basis
Because it is a ratio of volumes, the linear transformation does not change the information
content. So consider information in the x′, y′ system:
H = S{S′a} − S{S′}
= ½ log|In| − ½ log|(Λ² + I)⁻¹|
= Σi ½ log(1 + λi²)   (9)
59
DEGREES OF FREEDOM
Degrees of Freedom in the Transformed Basis
The number of independent quantities measured can be thought of as the number of singular values for which λi ≳ 1.
The degrees of freedom for signal is
ds = Σi λi²(1 + λi²)⁻¹
It is also the sum of the eigenvalues of In − Ŝ.
Summary: for each independent component x′i
• The information content is ½ log(1 + λi²)
• The degrees of freedom for signal is λi²(1 + λi²)⁻¹
60
THE OBSERVABLE AND NULL SPACES
• The part of measurement space that can be seen is that spanned by the weighting functions.
Anything outside that is in the null space of K.
• Any orthogonal linear combination of the weighting functions will form a basis (coordinate
system) for the observable space. An example is those singular vectors of K which have non-zero
singular values.
• The vectors which have zero singular values form a basis for the null space.
• Any component of the state in the null space maps onto the origin in measurement space.
• This implies that there are distinct states, in fact whole subspaces, which map onto the same
point in measurement space, and cannot be distinguished by the measurement.
61
THE NEAR NULL SPACE
However
• the solution can have components in the null space - from the a priori.
• components observable in principle can have near zero contributions from the measurement, the
‘near null space’
In the x′, y′ system:
• vectors with λ = 0 are in the null space
• vectors with λ ≪ 1 are in the near null space
• vectors with λ ≳ 1 are in the non-null space, and are observable.
62
Diagnostics for the Standard Case
Table 1: Singular values of K, together with contributions of each vector to the degrees of freedom
ds and information content H for both covariance matrices. Measurement noise variance is 0.25 K².
Diagonal covariance Full covariance
i λi dsi Hi (bits) λi dsi Hi (bits)
1 6.51929 0.97701 2.72149 27.81364 0.99871 4.79865
2 4.79231 0.95827 2.29147 18.07567 0.99695 4.17818
3 3.09445 0.90544 1.70134 9.94379 0.98999 3.32105
4 1.84370 0.77269 1.06862 5.00738 0.96165 2.35227
5 1.03787 0.51858 0.52731 2.39204 0.85123 1.37443
6 0.55497 0.23547 0.19368 1.09086 0.54337 0.56546
7 0.27941 0.07242 0.05423 0.46770 0.17948 0.14270
8 0.13011 0.01665 0.01211 0.17989 0.03135 0.02297
totals 4.45653 8.57024 5.55272 16.75571
Diagonal covariance: Sa = 100 In K²
Full covariance: Sa,ij = 100 exp(−|zi − zj|/h) K²
where h = 1 in ln(p) coordinates.
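The per-component entries of Table 1 follow directly from the singular values; a quick check (assuming H is in bits, i.e. log base 2):

```python
import math

# Singular values for the diagonal-covariance case (Table 1)
lam = [6.51929, 4.79231, 3.09445, 1.84370,
       1.03787, 0.55497, 0.27941, 0.13011]

ds = [l**2 / (1 + l**2) for l in lam]          # degrees of freedom per component
H = [0.5 * math.log2(1 + l**2) for l in lam]   # information per component, in bits

print(round(ds[0], 5), round(H[0], 5))   # 0.97701 2.72149, as tabulated
print(round(sum(ds), 4), round(sum(H), 4))  # totals ≈ 4.4565 and 8.5702
```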
63
FTIR Measurements of CO
30 levels; 894 measurements:
Apparently heavily overconstrained.
Table 2: Singular Values of K
5.345 3.498 0.033 0.0046 7.15e-4 2.56e-4
7.76e-5 2.13e-3 1.71e-5 1.38e-5 9.01e-6 6.73e-6
5.82e-6 4.79e-6 2.87e-6 3.52e-6 3.75e-6 1.91e-6
9.83e-7 2.37e-7 7.71e-7 1.18e-7 1.48e-6 1.95e-7
1.37e-7 6.67e-8 3.50e-8 3.37e-8 5.83e-9 6.29e-9
Noise is 0.03 in these units;
Prior variance is about 1.
There are 2.5 degrees of freedom.
Lots of near null space.
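The quoted degrees of freedom can be roughly reproduced from Table 2. This is a sketch under an assumption: that the tabulated singular values are in prior-scaled units, so that whitening amounts to dividing by the 0.03 noise level.

```python
# Singular values of K from Table 2 (FTIR CO example)
sv = [5.345, 3.498, 0.033, 0.0046, 7.15e-4, 2.56e-4,
      7.76e-5, 2.13e-3, 1.71e-5, 1.38e-5, 9.01e-6, 6.73e-6,
      5.82e-6, 4.79e-6, 2.87e-6, 3.52e-6, 3.75e-6, 1.91e-6,
      9.83e-7, 2.37e-7, 7.71e-7, 1.18e-7, 1.48e-6, 1.95e-7,
      1.37e-7, 6.67e-8, 3.50e-8, 3.37e-8, 5.83e-9, 6.29e-9]

noise = 0.03                      # measurement noise in the same units
lam = [s / noise for s in sv]     # effective singular values after whitening
ds = sum(l**2 / (1 + l**2) for l in lam)
print(round(ds, 2))               # ≈ 2.6, consistent with the quoted 2.5
```

Only the first three or four singular values contribute; the remaining twenty-odd components are the near null space.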
64
A SIMULATED RETRIEVAL
[Figure: a simulated retrieval. ——— ‘true state’; · · · · a priori; — — noise-free simulation; — · — with simulated noise]
How do we understand the nature of a
retrieval?
65
ERROR ANALYSIS AND CHARACTERISATION
Conceptually
• The measurement y is a function of some unknown state x:
y = f(x, b) + ε
where:
• y is the measurement vector, length m
• x is the state vector, length n
• f is a function describing the physics of the measurement,
• b is a set of ‘known’ parameters of this function,
• ε is measurement error, with covariance Sε.
66
ERROR ANALYSIS & CHARACTERISATION II
• The retrieval x̂ is a function of the form:
x̂ = R(y, b̂, c)
where:
• R represents the retrieval method
• b̂ is our estimate of the forward function parameters b
• c represents any parameters used in the inverse method that do not affect the measurement, e.g. a priori.
The Transfer function
Thus the retrieval is related to the ‘truth’ x formally by:
x̂ = R(f(x, b) + ε, b̂, c)
which may be regarded as the transfer function of the measurement and retrieval system as a whole:
x̂ = T(x, b, ε, b̂, c)
67
ERROR ANALYSIS & CHARACTERISATION III
Characterisation means evaluating:
• ∂x̂/∂x = A, sensitivity to actual state: Averaging Kernel Matrix
Error analysis involves evaluating:
• ∂x̂/∂ε = Gy, sensitivity to noise (or to measurement!)
• ∂x̂/∂b = Gb, sensitivity to non-retrieved parameters
• ∂x̂/∂c = Gc, sensitivity to retrieval method parameters
and understanding the effect of replacing f by a numerical Forward Model F.
68
THE FORWARD MODEL
We often need to approximate the forward function by a Forward Model:
F(x, b) ≈ f(x, b)
where F models the physics of the measurement, including the instrument, as well as we can.
• It usually has parameters b which have experimental error
• The vector b is not a target for retrieval
• There may be parameters b′ of the forward function that are not included in the forward model:
F(x, b) ≈ f(x, b, b′)
69
LINEARISE THE TRANSFER FUNCTION
The retrieved quantity is expressed as:
x̂ = R(f(x, b, b′) + ε, b̂, xa, c)
where we have also separated xa, the a priori, from other components of c.
Replace f by F + Δf:
x̂ = R(F(x, b) + Δf(x, b, b′) + ε, b̂, xa, c)
where Δf is the error in the forward model relative to the real physics:
Δf = f(x, b, b′) − F(x, b)
Linearise F about x = xa, b = b̂:
x̂ = R(F(xa, b̂) + Kx(x − xa) + Kb(b − b̂) + Δf(x, b, b′) + ε, b̂, xa, c)
where Kx = ∂F/∂x (the weighting function) and Kb = ∂F/∂b.
70
LINEARISE THE TRANSFER FUNCTION II
x̂ = R(F(xa, b̂) + Kx(x − xa) + Kb(b − b̂) + Δf(x, b, b′) + ε, b̂, xa, c)
Linearise R with respect to its first argument, y:
x̂ = R[F(xa, b̂), b̂, xa, c] + Gy[Kx(x − xa) + Kb(b − b̂) + Δf(x, b, b′) + ε]
where Gy = ∂R/∂y (the contribution function)
71
CHARACTERISATION
Some rearrangement gives:
x̂ − xa = R(F(xa, b̂), b̂, xa, c) − xa   bias
+ A(x − xa)   smoothing
+ Gy εy   error   (10)
where
A = Gy Kx = ∂x̂/∂x
the averaging kernel, and
εy = Kb(b − b̂) + Δf(x, b, b′) + ε
is the total error in the measurement relative to the forward model.
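For a concrete R these quantities are explicit. The sketch below assumes the linear MAP (optimal estimation) form of the retrieval — an assumption, since the slides keep R general — with an arbitrary random Jacobian standing in for Kx:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 10, 6
K = rng.standard_normal((m, n))          # Jacobian Kx (illustrative)
Sa = np.eye(n)                           # a priori covariance
Se = 0.25 * np.eye(m)                    # measurement noise covariance

# Gain (contribution function): Gy = (Sa^-1 + K^T Se^-1 K)^-1 K^T Se^-1
Gy = np.linalg.solve(np.linalg.inv(Sa) + K.T @ np.linalg.inv(Se) @ K,
                     K.T @ np.linalg.inv(Se))
A = Gy @ K                               # averaging kernel A = Gy Kx

ds = np.trace(A)                         # degrees of freedom for signal
areas = A.sum(axis=1)                    # area of each averaging-kernel row
print(ds)                                # between 0 and n
```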
72
BIAS
This is the error you would get on doing a simulated error-free retrieval of the a priori.
A priori is what you know about the state before you make the measurement. Any reasonable
retrieval method should return the a priori given a measurement vector that corresponds to it, so
the bias should be zero.
But check your system to be sure it is. . .
73
THE AVERAGING KERNEL
The retrieved state is a smoothed version of the true state with smoothing functions given by the rows of A, plus error terms:
x̂ = xa + A(x − xa) + Gy εy = (I − A)xa + Ax + Gy εy
You can either:
• accept that the retrieval is an estimate of a smoothed state, not the true state
or
• consider the retrieval as an estimate of the true state, with an error contribution due to smoothing.
The error analysis is different in the two cases, because in the second case there is an extra term.
• If the state represents a profile, then the averaging kernel is a smoothing function with a width and an area.
• The width is a measure of the resolution of the observing system
• The area (generally between zero and unity) is a simple measure of the amount of real information that appears in the retrieval.
74
STANDARD EXAMPLE: AVERAGING KERNEL
75
STANDARD EXAMPLE: RETRIEVAL GAIN
76
THE OTHER TWO PARAMETERS
“The retrieved quantity is expressed as:
x̂ = R(f(x, b, b′) + ε, b̂, xa, c)
where we have also separated xa, the a priori, from other components of c.”
We should also look at the sensitivity of the retrieval to the inverse model parameters, xa and c.
The linear expansion in x, b and ε gave:
x̂ = [R(F(xa, b̂), b̂, xa, c)] + A(x − xa) + Gy εy
We argued that the term in square brackets should be equal to xa, for any reasonable retrieval.
This has the consequence that, at least to first order, ∂R/∂c = 0 for any reasonable retrieval.
77
THE OTHER TWO PARAMETERS II
x̂ = [R(F(xa, b̂), b̂, xa, c)] + A(x − xa) + Gy εy
It also follows that:
∂R/∂xa = ∂x̂/∂xa = In − A
for any reasonable retrieval.
Thus the reasonableness criterion has the consequence that there can be no inverse model parameters c that matter, other than xa.
Incidentally:
The term in square brackets should not depend on b̂ either.
This implies that Gb + Gy Kb = 0, which is no more than:
∂x̂/∂b + (∂x̂/∂y)(∂y/∂b) = 0
78
ERROR ANALYSIS
Some further rearrangement gives for the error in x̂:
x̂ − x = (A − I)(x − xa)   smoothing
+ Gy εy   measurement error
where the bias term has been dropped, and:
εy = Kb(b − b̂) + Δf(x, b, b′) + ε
Thus the error sources can be split up as:
x̂ − x = (A − I)(x − xa)   smoothing
+ Gy Kb(b − b̂)   model parameters
+ Gy Δf(x, b, b′)   modelling error
+ Gy ε   measurement noise
Some of these are easy to estimate, some are not.
79
MEASUREMENT NOISE
measurement noise = Gyε
This is the easiest component to evaluate.
ε is usually random noise, and is often unbiassed and uncorrelated between channels, and has a
known covariance matrix.
The covariance of the measurement noise is:
Sn = Gy Sε Gyᵀ
Note that Sn is not necessarily diagonal, so there will be errors which are correlated between
different elements of the state vector.
This is true of all of the error sources.
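A sketch of the noise propagation, again assuming the linear MAP gain and a random illustrative Jacobian, showing that diagonal channel noise still produces correlated retrieval errors:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 8, 5
K = rng.standard_normal((m, n))
Se = 0.25 * np.eye(m)                    # diagonal: uncorrelated channel noise
Gy = np.linalg.solve(np.eye(n) + K.T @ np.linalg.inv(Se) @ K,
                     K.T @ np.linalg.inv(Se))

Sn = Gy @ Se @ Gy.T                      # retrieval noise covariance
off_diag = Sn - np.diag(np.diag(Sn))
# Sn is generally not diagonal: errors correlate between state elements
assert np.abs(off_diag).max() > 0
```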
80
SMOOTHING ERROR
To estimate the actual smoothing error, you need to know the true state:
smoothing error = (A − I)(x − xa)
To characterise the statistics of this error, you need its mean and covariance over some ensemble.
The mean should be zero.
The covariance is:
Ss = E{(A − I)(x − xa)(x − xa)ᵀ(A − I)ᵀ}
= (A − I) E{(x − xa)(x − xa)ᵀ} (A − I)ᵀ
= (A − I) Sa (A − I)ᵀ
where Sa is the covariance of an ensemble of states about the a priori state. This is best regarded as the covariance of a climatology.
To estimate the smoothing error, you need to know the climatological covariance matrix.
To do the job properly, you need the real climatology, not just some ad hoc matrix that has been used as a constraint in the retrieval. The real climatology is often not available. Much of the smoothing error can be in fine spatial scales that may never have been measured.
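When the retrieval is the linear MAP estimate and Sa really is the ensemble covariance, the smoothing and noise covariances sum to the posterior covariance Ŝ — a useful consistency check. A sketch with a random Jacobian:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 10, 6
K = rng.standard_normal((m, n))
Sa = np.eye(n)
Se = 0.25 * np.eye(m)

Shat = np.linalg.inv(np.linalg.inv(Sa) + K.T @ np.linalg.inv(Se) @ K)
Gy = Shat @ K.T @ np.linalg.inv(Se)
A = Gy @ K

Ss = (A - np.eye(n)) @ Sa @ (A - np.eye(n)).T   # smoothing error covariance
Sn = Gy @ Se @ Gy.T                              # retrieval noise covariance
assert np.allclose(Ss + Sn, Shat)                # Ss + Sn = posterior covariance
```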
81
STANDARD EXAMPLE: ERROR COMPONENTS
[Figure: error components as standard deviations. ——— a priori; · · · · smoothing error; — — retrieval noise; — · — total retrieval error]
82
FORWARD MODEL ERRORS
Forward model parameters
forward model parameter error = Gy Kb(b − b̂)
This one is easy (in principle).
If you have estimated the forward model parameters properly, their individual errors will be unbiassed, so the mean error will be zero.
The covariance is:
Sf = Gy Kb Sb Kbᵀ Gyᵀ
where Sb is the error covariance matrix of b̂, namely E{(b − b̂)(b − b̂)ᵀ}.
However, remember that this is most probably a systematic error, not a random one.
83
FORWARD MODEL ERRORS II
Modelling error
modelling error = Gy∆f = Gy(f(x, b, b′)− F(x, b))
Note that this is evaluated at the true state, and with the true value of b, but hopefully its
sensitivity to these quantities is not large.
This can be hard to evaluate, because it requires a model f which includes the correct physics. If F is simply a numerical approximation for efficiency’s sake, it may not be too difficult, but if f is not known in detail, or so horrendously complex that no proper model is feasible, then modelling error can be tricky to estimate.
This is also usually a systematic error.
84
ERROR COVARIANCE MATRIX
An Error Covariance Matrix S is defined as
Sij = E{εi εj}
Diagonal elements are the familiar error variances.
The corresponding probability density function (PDF), if Gaussian, is:
P(y) ∝ exp(−½ (y − ȳ)ᵀ S⁻¹ (y − ȳ))
Contours of the PDF are
(y − ȳ)ᵀ S⁻¹ (y − ȳ) = const
i.e. ellipsoids, with arbitrary axes.
85
CORRELATED ERRORS
• How do we understand an error covariance matrix?
• What corresponds to error bars for a profile?
We would really like a PDF to be of the form
P(z) ∝ Πi exp(−zi²/2σi²)
i.e. each zi to have independent errors.
This can be done by diagonalising S, substituting S = LΛLᵀ, where L is the matrix of eigenvectors li. Then
P(x) ∝ exp(−½ (x − x̄)ᵀ L Λ⁻¹ Lᵀ (x − x̄))
So if we put z = Lᵀ(x − x̄) then the zi are independent with variance λi.
86
ERROR PATTERNS
• Error covariance is described in state space by an ellipsoid
• We have, in effect, transformed to the principal axes of the ellipsoid
• We can express the error in e.g. a state vector estimate x̂, due to an error covariance S, in terms of Error Patterns ei = λi^½ li such that the total error is of the form
x̂ − x = Σi βi ei
where the error patterns are orthogonal, and the coefficients βi are independent random variables with unit variance.
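The diagonalisation and the error patterns are a few lines of linear algebra; a sketch with an illustrative random covariance:

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((5, 5))
S = B @ B.T + np.eye(5)                  # an illustrative SPD error covariance

lam, L = np.linalg.eigh(S)               # S = L diag(lam) L^T
assert np.allclose(L @ np.diag(lam) @ L.T, S)

# Error patterns e_i = sqrt(lam_i) * l_i: orthogonal, and they rebuild S
E = L * np.sqrt(lam)                     # column i is the pattern e_i
assert np.allclose(E @ E.T, S)

# A total error realisation: sum_i beta_i e_i with independent unit-variance beta_i
beta = rng.standard_normal(5)
err = E @ beta
```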
87
STANDARD EXAMPLE: NOISE ERROR PATTERNS
There are only eight, same as the rank of the problem.
88
STANDARD EXAMPLE: SMOOTHING ERROR PATTERNS
The largest ten out of one hundred, the dimension of the state vector.
89
Unsolved Problem
How do you describe an error covariance to the naive data user?
90
End of Section