TRANSCRIPT
1
PRINCIPLES OF RETRIEVAL THEORY
Clive D Rodgers
Atmospheric, Oceanic and Planetary Physics
University of Oxford
ESA Advanced Atmospheric Training Course
September 15th – 20th, 2008
2
ATMOSPHERIC REMOTE SENSING: THE INVERSE PROBLEM
Topics
1. Introduction
2. Bayesian approach
3. Information content
4. Error analysis and characterisation
3
ADVERTISEMENT
C. D. Rodgers, Inverse Methods for Atmospheric Sounding: Theory and Practice,
World Scientific Publishing Co., 2000.
4
WHAT IS AN INVERSE OR RETRIEVAL PROBLEM?
• Almost any measurement you make...
• When you measure some function of the quantity you really want, you have a retrieval
problem.
• Sometimes it’s trivial, sometimes it isn’t.
• Various aspects:
• Formulate the problem properly:
◦ Describe the measurement in terms of some Forward Model
◦ Don’t forget experimental error!
• Finding a solution, inverting the forward model
◦ Algebraic
◦ Numerical
◦ No unique solution
◦ No solution at all
• Finding the ‘best’ solution
◦ Uniqueness - a unique solution may not be the best...
◦ Accuracy
◦ Efficiency
• Understanding the answer
5
THINGS TO THINK ABOUT
• Why isn’t the problem trivial?
• Forward models which are not explicitly invertible
• Ill-conditioned or ill-posed problems
• Errors in the measurement (and in the forward model) can map into errors in the solution in
a non-trivial way.
• What to measure?
• Does it actually contain the information you want?
• Updating existing knowledge
• You always have some prior knowledge of the ‘unknown’
• the measurement improves that knowledge
• the measurement may not be enough by itself to completely determine the unknown
• Ill-posed problems
• You cannot solve an ill-posed problem. You have to convert it into a well-posed problem.
• Which of an infinite manifold of solutions do you want?
6
MATHEMATICAL CONCEPTS I
• Measurement Vector: y = (y1, y2, ...ym)
• Any measurement is of a finite number of quantities.
• Arrange them as a vector for computational purposes
• State Vector: x = (x1, x2, ...xn)
• The desired quantity is often continuous - e.g. a temperature profile
• We can only make a finite number of measurements and calculations
• Express the unknown in terms of a finite number of parameters
• They do not all have to be of the same type
• Arrange them as a vector for computational purposes
• Examples:
◦ Temperature on a set of pressure levels, with a specified interpolation rule.
6
MATHEMATICAL CONCEPTS I
• Measurement Vector: y = (y1, y2, ...ym)
• Any measurement is of a finite number of quantities.
• Arrange them as a vector for computational purposes
• State Vector: x = (x1, x2, ...xn)
• The desired quantity is often continuous - e.g. a temperature profile
• We can only make a finite number of measurements and calculations
• Express the unknown in terms of a finite number of parameters
• They do not all have to be of the same type
• Arrange them as a vector for computational purposes
• Examples:
◦ Temperature on a set of pressure levels, with a specified interpolation rule.
◦ Fourier coefficients for a set of waves
7
MATHEMATICAL CONCEPTS II
Using vectors, it is convenient to think in terms of linear algebra and vector spaces - even if the
forward model is not linear
Measurement Space
– Measurement space is the space of measurement vectors, dimension m.
State Space
– State space is the space of state vectors, dimension n.
Generally the two vector spaces will have different dimensions.
8
MATHEMATICAL CONCEPTS III
Forward Function and Model
– The Forward Function f(x) maps from state space onto measurement space, depending on the
physics of the measurement.
– The Forward Model F(x) is the best we can do in the circumstances to model the forward
function.
Inverse or Retrieval Method
– The inverse problem is one of finding an inverse mapping R(y):
Given a point in measurement space, which point or set of points in state space could have mapped into it?
9
NOTATION
I have tried to make it mnemonic as far as possible:
Matrices                 Bold upper case
Vectors                  Bold lower case
State vectors            x
Measurement vectors      y
Covariance matrices      S (based on σ, and not wanting to use Σ)
Measurement error        ε
Forward model            F(x) (really ought to be f(x), as it's a vector)
Jacobian                 K (originally a Kernel of an integral transform)
Gain matrix              G (I've used up K, which might stand for Kalman gain)
Averaging Kernel         A
Different vectors, matrices of the same type are distinguished by superscripts, subscripts, etc:
a priori                 x_a (background)
estimate                 x̂
first guess              x_0
'true' value             x (no subscript)
10
STANDARD ILLUSTRATION
Idealised thermal-emission nadir sounder represented as a linear forward model:
y = Kx + ε
K is the ‘weighting function’ matrix, ε is measurement error or noise.
• Vertical coordinate is notionally ln(p), discretised at 100 levels from 0 in steps of 0.1 to 9.9 –
around 0 to 70 km.
• Eight channels (elements of y).
• State vector is notionally temperature at 100 levels.
• Measurement error (when considered) is 0.5 K.
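The standard illustration can be simulated in a few lines. The following NumPy sketch is not from the lecture: the Gaussian-shaped weighting functions are an assumed stand-in for the actual K, kept only to show the shapes involved (8 channels, 100 levels, 0.5 K noise).

```python
import numpy as np

# Toy linear forward model y = K x + eps, in the spirit of the standard
# illustration: 100-level state, 8 channels, 0.5 K measurement noise.
# The Gaussian weighting functions are an assumption for illustration.
rng = np.random.default_rng(0)

n, m = 100, 8                      # state and measurement dimensions
z = 0.1 * np.arange(n)             # vertical coordinate, ln(p) from 0 to 9.9

# One weighting function per channel, peaking at a different height
peaks = np.linspace(1.0, 8.0, m)
K = np.exp(-0.5 * (z[None, :] - peaks[:, None]) ** 2)
K /= K.sum(axis=1, keepdims=True)  # normalise each row

x_true = 250.0 + 20.0 * np.sin(z / 2.0)   # a smooth 'temperature' profile
eps = rng.normal(0.0, 0.5, size=m)        # 0.5 K measurement error

y = K @ x_true + eps               # the simulated measurement
print(y.shape)                     # (8,)
```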
11
STANDARD WEIGHTING FUNCTIONS
12
EXACT RETRIEVAL SIMULATION
The state vector is a set of eight coefficients of a degree-seven polynomial.
(a) Original profile: US standard atmosphere
(b) Exact retrieval with no experimental error
(c) Exact retrieval with simulated 0.5 K error
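The exact-retrieval experiment can be sketched numerically. Everything below is a hypothetical stand-in (assumed weighting functions and a random "true" profile, not the US standard atmosphere), but it reproduces the qualitative point: the 8×8 system inverts exactly without noise, and amplifies a 0.5-unit error badly.

```python
import numpy as np

# Represent the state by the 8 coefficients of a degree-seven polynomial,
# so the forward model reduces to an 8x8 system that can be solved exactly.
rng = np.random.default_rng(1)

n, m = 100, 8
z = 0.1 * np.arange(n)
peaks = np.linspace(1.0, 8.0, m)
K = np.exp(-0.5 * (z[None, :] - peaks[:, None]) ** 2)
K /= K.sum(axis=1, keepdims=True)

# P maps polynomial coefficients c to a profile on the 100 levels
P = np.vander(z / z.max(), 8, increasing=True)   # (100, 8)

c_true = rng.normal(size=8)
x_true = P @ c_true

A = K @ P                       # effective 8x8 forward matrix
y_clean = A @ c_true

# (b) exact retrieval, no experimental error: recovers the profile
c_b = np.linalg.solve(A, y_clean)

# (c) exact retrieval with simulated error: the noise is strongly amplified
c_c = np.linalg.solve(A, y_clean + rng.normal(0.0, 0.5, size=m))

err_b = np.abs(P @ c_b - x_true).max()
err_c = np.abs(P @ c_c - x_true).max()
print(err_b, err_c)
```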
13
NOISE FREE MEASUREMENTS
Row Space and Null Space
Consider an error-free linear measurement, equivalent to solving linear equations:
y = Kx
The rows of K are the weighting functions k_i:
y_i = k_i^T x
The k_i are a set of vectors in state space; the measurements are projections of the state x onto them.
They span a subspace called the row space, of dimension equal to the rank of K,
p ≤ min(n,m). If p < m then the weighting functions are not linearly independent.
Only those components of x in the row space can be measured.
The null space is the part of state space which is not in the row space.
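A small numerical example makes the row space / null space distinction concrete. The 2×3 K below is invented for illustration: any component of the state lying in the null space of K is invisible to the measurement.

```python
import numpy as np

# Row space vs null space for a toy 2x3 K (m = 2 measurements, n = 3 unknowns).
K = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])

p = np.linalg.matrix_rank(K)      # rank = dimension of the row space
print(p)                          # 2

# A null-space vector: K maps it to zero, so it cannot be measured
x_null = np.array([1.0, -1.0, 1.0])
print(K @ x_null)                 # [0. 0.]

# Two states differing only by a null-space component give identical measurements
x = np.array([3.0, 2.0, 1.0])
print(np.allclose(K @ x, K @ (x + 5.0 * x_null)))   # True
```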
14
ILL-POSED AND WELL-POSED PROBLEMS
Ill or well posed? · · · Under- or over-determined? · · · Under- or over-constrained?
Which is which?
1. p = m = n. Well posed.
The number of unknowns is equal to the number of measurements, and they are all independent.
2. p = m < n. Underconstrained, ill-posed
More unknowns than measurements, but the measurements are all independent.
3. p = n < m. Overconstrained, ill-posed
More measurements than unknowns, so they could be inconsistent, but the unknowns are all in
the row space, so there is information about all of them.
15
ILL-POSED AND WELL-POSED PROBLEMS II
Another category: Mixed-determined, the problem is both underconstrained and overconstrained.
1. p < m = n.
The number of unknowns is equal to the number of measurements, but the measurements
are not independent, so they could be inconsistent, and the number of independent pieces of
information is less than the number of unknowns. Simple example:
y_1 = x_1 + x_2 + ε_1   (1)
y_2 = x_1 + x_2 + ε_2   (2)
2. p < m < n.
More unknowns than measurements, but the measurements are not independent, so they could
be inconsistent.
3. p < n < m.
More measurements than the rank, so they could be inconsistent, more unknowns than the rank,
so not all are defined by the measurement.
16
ILL-POSED AND WELL-POSED PROBLEMS III
Summary
If p < n then the system is underconstrained; there is a null space.
If p < m then the system is overconstrained in some part of the row space.
How do we identify the row and null spaces?
One straightforward way is Gram-Schmidt orthogonalisation, but . . .
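As a sketch of the Gram-Schmidt route: orthogonalise the rows of K, discarding any row that is a linear combination of earlier ones. What survives is an orthonormal basis for the row space; its size is the rank p. (The classical form below is numerically fragile for near-dependent rows, which is one reason the lecture prefers SVD.)

```python
import numpy as np

# Identify the row space by Gram-Schmidt orthogonalisation of the rows of K.
def gram_schmidt(rows, tol=1e-10):
    """Return an orthonormal basis for the span of the given row vectors."""
    basis = []
    for r in rows:
        v = r.astype(float).copy()
        for b in basis:
            v -= (b @ v) * b          # remove components along earlier vectors
        norm = np.linalg.norm(v)
        if norm > tol:                # keep only linearly independent directions
            basis.append(v / norm)
    return np.array(basis)

# Three weighting functions, only two of which are independent (row 2 = row 0 + row 1)
K = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])

B = gram_schmidt(K)
print(B.shape[0])                         # p = 2 basis vectors
print(np.allclose(B @ B.T, np.eye(2)))    # True: orthonormal
```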
17
SINGULAR VECTOR DECOMPOSITION [>>]
Is the neatest way of doing the job.
18
MATRIX ALGEBRA – EIGENVECTORS
The eigenvalue problem associated with an arbitrary square matrix A, of order n, is to find
eigenvectors l and scalar eigenvalues λ which satisfy
Al = λl
If A is a coordinate transformation, then l has the same representation in the untransformed and
transformed coordinates, apart from a factor λ.
This is the same as (A − λI) l = 0, a homogeneous set of equations, which can only have a
solution if |A − λI| = 0, giving a polynomial equation of degree n, with n solutions for λ. They
will be complex in general.
An eigenvector can be scaled by an arbitrary factor. It is conventional to normalise them so that
lT l = 1 or l†l = 1 (Hermitian adjoint)
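A quick numerical check of the complex case (the matrix below is a rotation, chosen for illustration): its eigenvalues are a complex-conjugate pair, and NumPy returns the eigenvectors normalised so that l†l = 1.

```python
import numpy as np

# Eigenvalue problem A l = λ l for a general (non-symmetric) real matrix:
# the eigenvalues come out complex in general, as the slide notes.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])        # a 90-degree rotation

lam, L = np.linalg.eig(A)
print(lam)                         # a complex-conjugate pair, ±1j

# Each column of L satisfies A l = λ l, normalised so that l†l = 1
for i in range(2):
    l = L[:, i]
    print(np.allclose(A @ l, lam[i] * l))        # True
    print(np.isclose(np.vdot(l, l), 1.0))        # True
```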
19
EIGENVECTORS II
We can assemble the eigenvectors as columns in a matrix L:
AL = LΛ
where Λ is a diagonal matrix, with the eigenvalues on the diagonal.
Transpose:                    L^T A^T = Λ L^T
Multiply by R = (L^T)^-1:     A^T = R Λ L^T
Postmultiply by R:            A^T R = R Λ
Thus:
R is the matrix of eigenvectors of AT .
AT has the same eigenvalues as A.
In the case of a symmetric matrix, S = S^T, we must have L = R, so that L^T L = L L^T = I, or
L^T = L^-1, and the eigenvectors are orthogonal.
In this case the eigenvalues are real.
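These symmetric-case facts are easy to verify numerically. The sketch below (a randomly symmetrised matrix, chosen only for illustration) uses `eigh`, which assumes and exploits symmetry:

```python
import numpy as np

# For a symmetric S: L = R, L^T = L^{-1}, and the eigenvalues are real.
rng = np.random.default_rng(2)
B = rng.normal(size=(4, 4))
S = B + B.T                        # symmetrise

lam, L = np.linalg.eigh(S)         # eigh assumes (and exploits) symmetry

print(np.isrealobj(lam))                      # True: real eigenvalues
print(np.allclose(L.T @ L, np.eye(4)))        # True: L^T = L^{-1}
print(np.allclose(S @ L, L * lam))            # True: S L = L Λ
```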
20
EIGENVECTORS – GEOMETRIC INTERPRETATION
Consider the scalar equation:
xTSx = 1
where S is symmetric. This is the equation of a quadratic surface centered on the origin, in
n-space.
The normal to the surface is the vector ∂(xTSx)/∂x, i.e. Sx, and x is the radius vector, so
Sx = λx
is the problem of finding points where the normal and the radius vector are parallel. These are
where the principal axes intersect the surface.
At these points, x^T S x = 1, so x^T λ x = 1, or:
λ = 1 / (x^T x)
So the eigenvalues are the reciprocals of the squares of the lengths of the principal axes.
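For a 2D ellipse this is easy to check directly. The diagonal S below is an assumed example with its principal axes along the coordinate axes, so the semi-axis lengths 1/√λ can be read off:

```python
import numpy as np

# Check: for the ellipse x^T S x = 1, the eigenvalues of S are the
# reciprocals of the squared lengths of the principal semi-axes.
S = np.array([[2.0, 0.0],
              [0.0, 0.5]])         # principal axes along the coordinate axes

lam, L = np.linalg.eigh(S)         # eigh returns eigenvalues in ascending order
axis_lengths = 1.0 / np.sqrt(lam)  # semi-axis lengths 1/sqrt(λ)
print(axis_lengths)

# Points where the principal axes intersect the surface:
# x = (axis length) * eigenvector, which satisfies x^T S x = 1
for length, l in zip(axis_lengths, L.T):
    x = length * l
    print(np.isclose(x @ S @ x, 1.0))   # True
```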
21
GEOMETRIC INTERPRETATION II
The lengths are independent of the coordinate system, so will also be invariant under an arbitrary
orthogonal transformation, i.e. one in which (distance)2 = xTx is unchanged.
Consider using the eigenvectors of S to transform the equation for the quadratic surface:
x^T L Λ L^T x = 1   or   y^T Λ y = 1   or   Σ_i λ_i y_i^2 = 1
where y = L^T x or x = L y. This transforms the surface into its principal axis representation.
22
EIGENVECTORS - USEFUL RELATIONSHIPS
Asymmetric matrices                        Symmetric matrices
A R = R Λ,   L^T A = Λ L^T                 S L = L Λ
L^T = R^-1,  R^T = L^-1                    L^T = L^-1
L R^T = L^T R = I                          L L^T = L^T L = I
A = R Λ L^T = Σ_i λ_i r_i l_i^T            S = L Λ L^T = Σ_i λ_i l_i l_i^T
A^T = L Λ R^T = Σ_i λ_i l_i r_i^T
A^-1 = R Λ^-1 L^T                          S^-1 = L Λ^-1 L^T
A^n = R Λ^n L^T                            S^n = L Λ^n L^T
L^T A R = Λ                                L^T S L = Λ
L^T A^n R = Λ^n                            L^T S^n L = Λ^n
L^T A^-1 R = Λ^-1                          L^T S^-1 L = Λ^-1
|A| = Π_i λ_i                              |S| = Π_i λ_i
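A few of the symmetric-column identities can be spot-checked numerically; the randomly symmetrised matrix below is only an illustrative example:

```python
import numpy as np

# Spot-check the tabulated identities for a symmetric S.
rng = np.random.default_rng(3)
B = rng.normal(size=(3, 3))
S = B + B.T

lam, L = np.linalg.eigh(S)
Lam = np.diag(lam)

print(np.allclose(S, L @ Lam @ L.T))                            # S = L Λ L^T
print(np.allclose(np.linalg.inv(S), L @ np.diag(1/lam) @ L.T))  # S^-1 = L Λ^-1 L^T
print(np.allclose(np.linalg.matrix_power(S, 3),
                  L @ np.diag(lam**3) @ L.T))                   # S^n = L Λ^n L^T
print(np.isclose(np.linalg.det(S), lam.prod()))                 # |S| = Π λ_i
```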
23
SINGULAR VECTOR DECOMPOSITION
The standard eigenvalue problem is meaningless for non-square matrices.
A ‘shifted’ eigenvalue problem associated with an arbitrary non-square matrix K, m rows and n
columns can be constructed:
Kv = λu,   Kᵀu = λv   (3)
where v, of length n, and u, of length m, are called the singular vectors of K.
This is equivalent to the symmetric problem:
( 0   K ) (u)     (u)
( Kᵀ  0 ) (v) = λ (v)
From (3) we can get
KᵀKv = λKᵀu = λ²v,   KKᵀu = λKv = λ²u   (4)
so u and v are the eigenvectors of KKT (m×m) and KTK (n× n) respectively.
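Relations (3) and (4) can be verified directly with a numerical SVD. A sketch with an arbitrary 4×3 matrix (the values and the NumPy routine are illustrative, not from the lectures):

```python
import numpy as np

# For an m x n matrix K, the left/right singular vectors are eigenvectors
# of K K^T and K^T K respectively, with eigenvalues lambda^2 (eq. 4).
rng = np.random.default_rng(0)
K = rng.standard_normal((4, 3))          # m = 4, n = 3

U, s, Vt = np.linalg.svd(K, full_matrices=False)
V = Vt.T

# K v = lambda u  and  K^T u = lambda v  (eq. 3)
assert np.allclose(K @ V, U * s)
assert np.allclose(K.T @ U, V * s)

# K^T K v = lambda^2 v  and  K K^T u = lambda^2 u  (eq. 4)
assert np.allclose(K.T @ K @ V, V * s**2)
assert np.allclose(K @ K.T @ U, U * s**2)
```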
24
SINGULAR VECTOR DECOMPOSITION II
Care is needed in constructing a matrix of singular vectors, because individual u and v vectors
correspond to each other, yet there are potentially different numbers of v and u vectors.
If the rank of K is p, then there will be p non-zero singular values, and both KKT and KTK will
have p non-zero eigenvalues.
The surplus eigenvectors will have zero eigenvalues, and can be discarded, so we can write:
( 0   K ) (U)   (U)
( Kᵀ  0 ) (V) = (V) Λ
where Λ is p×p, U is m×p, and V is n×p.
There will be n+m− p more eigenvectors of the composite matrix, all with zero eigenvalue.
25
SINGULAR VECTORS - USEFUL RELATIONSHIPS
KV = UΛ            KᵀU = VΛ
UᵀKV = VᵀKᵀU = Λ
K = UΛVᵀ           Kᵀ = VΛUᵀ
VᵀV = UᵀU = Iₚ
KKᵀU = UΛ²         KᵀKV = VΛ²   (5)
27
SINGULAR VECTOR DECOMPOSITION [<<]
Is the neatest way of doing the job
Express K as
ₘKₙ = ₘUₚ Λₚ Vᵀₙ
where the subscripts indicate the sizes of the matrices (K is m×n, U is m×p, Λ is p×p, Vᵀ is p×n).
Then the forward model (no noise) becomes:
y = Kx = UΛVᵀx
so that
Uᵀy = ΛVᵀx
or
y′ = Λx′
where y′ = Uᵀy and x′ = Vᵀx are both of order p.
The rows of VT , or the columns of V (in state space) are a basis for the row space of K. Similarly
the columns of U (in measurement space) are a basis of its column space.
28
SINGULAR VECTOR DECOMPOSITION II
We can also see that an exact solution is
x′ = Λ⁻¹y′
or
x = VΛ⁻¹Uᵀy
This solution is unique only if p = n. If p < n, any multiple of a vector with zero singular value
can be added and still satisfy the equations.
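The non-uniqueness for p < n can be made concrete. A sketch with an assumed rank-2, 2×3 matrix (the numbers are illustrative):

```python
import numpy as np

# The exact solution x = V Lam^{-1} U^T y; when rank p < n, adding any
# null-space vector of K still satisfies Kx = y, so it is not unique.
K = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])        # m = 2, n = 3, rank p = 2 < n
y = np.array([2.0, 3.0])

U, s, Vt = np.linalg.svd(K, full_matrices=False)
x = Vt.T @ ((U.T @ y) / s)             # x = V Lam^{-1} U^T y

assert np.allclose(K @ x, y)           # reproduces the measurement exactly

# A null-space vector (zero singular value direction): K n0 = 0
n0 = np.array([1.0, 1.0, -1.0])
assert np.allclose(K @ n0, 0.0)
assert np.allclose(K @ (x + 5.0 * n0), y)   # solution is not unique
```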
29
SVD OF THE STANDARD WEIGHTING FUNCTIONS
30
EXACT RETRIEVAL SIMULATION
The state vector is a set of eight coefficients of a degree-seven polynomial.
(a) Original profile: US standard atmosphere
(b) Exact retrieval with no experimental error
(c) Exact retrieval with simulated 0.5 K error
31
APPROACHES TO INVERSE PROBLEMS
• Bayesian Approach
• What is the pdf of the state, given the measurement and the a priori ?
• Optimisation Approaches:
• Maximum Likelihood
• Maximum A Posteriori
• Minimum Variance
• Backus-Gilbert - resolution/noise trade-off
• Ad hoc Approaches
• Relaxation
• Exact algebraic solutions
32
BAYESIAN APPROACH
This is the most general approach to the problem (that I know of).
Knowledge is represented in terms of probability density functions:
• P (x) is the a priori p.d.f. of the state, describing what we know about the state before we make
the measurement.
• P (y) is the a priori p.d.f. of the measurement.
• P (x, y) is the joint a priori p.d.f. of x and y.
• P (y|x) is the p.d.f. of the measurement given the state - this depends on experimental error
and the forward function.
• P (x|y) is the p.d.f. of the state given the measurement - this is what we want to find.
33
BAYES THEOREM
The theorem states:
P (x, y) = P (x|y)P (y)
and of course
P (y, x) = P (y|x)P (x)
so that
P(x|y) = P(y|x)P(x) / P(y)
If we have a prior p.d.f. for x, P(x), and we know statistically how y is related to x via P (y|x),
then we can find an un-normalised version of P (x|y), namely P (y|x)P (x), which can be
normalised if required.
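The normalisation step can be shown with a toy discrete state. A sketch (the two states and the probability values are assumed for illustration):

```python
# Bayes' theorem with a toy discrete state: P(x|y) is proportional to
# P(y|x) P(x); normalising by P(y) = sum_x P(y|x) P(x) gives the posterior.
prior = {'warm': 0.7, 'cold': 0.3}                  # P(x), assumed numbers
likelihood = {'warm': 0.2, 'cold': 0.9}             # P(y|x) for one observed y

unnorm = {x: likelihood[x] * prior[x] for x in prior}   # P(y|x) P(x)
P_y = sum(unnorm.values())                              # P(y)
posterior = {x: unnorm[x] / P_y for x in prior}         # P(x|y)

assert abs(sum(posterior.values()) - 1.0) < 1e-12       # properly normalised
# The measurement favours 'cold': its posterior exceeds its prior.
assert posterior['cold'] > prior['cold']
```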
34
BAYES THEOREM GEOMETRICALLY
35
APPLICATION OF THE BAYESIAN APPROACH
We need explicit forms for the p.d.f’s:
• Assume that experimental error is Gaussian:
−ln P(y|x) = ½ (y − F(x))ᵀ Sε⁻¹ (y − F(x)) + const
where F(x) is the Forward model:
y = F(x) + ε
and Sε is the covariance matrix of the experimental error, ε:
Sε = E{εεᵀ} = E{(y − F(x))(y − F(x))ᵀ}
• Assume that the a priori p.d.f. is Gaussian (less justifiable):
−ln P(x) = ½ (x − xa)ᵀ Sa⁻¹ (x − xa) + const
i.e. x is distributed normally with mean xa and covariance Sa.
36
APPLICATION OF THE BAYESIAN APPROACH II
• Thus the pdf of the state when the measurements and the a priori are given is:
−2 ln P(x|y) = [y − F(x)]ᵀ Sε⁻¹ [y − F(x)] + [x − xa]ᵀ Sa⁻¹ [x − xa] + const
• If we want a state estimate x̂ rather than a p.d.f., then we must calculate some function of
P(x|y), such as its mean or its maximum:
x̂ = ∫ P(x|y) x dx   or   dP(x|y)/dx = 0
37
BAYESIAN SOLUTION FOR THE LINEAR PROBLEM
The linear problem has a forward model:
F(x) = Kx
so the p.d.f. P(x|y) becomes:
−2 ln P(x|y) = [y − Kx]ᵀ Sε⁻¹ [y − Kx] + [x − xa]ᵀ Sa⁻¹ [x − xa] + c₁
This is quadratic in x, so has to be of the form:
−2 ln P(x|y) = [x − x̂]ᵀ Ŝ⁻¹ [x − x̂] + c₂
Equate the terms that are quadratic in x:
xᵀKᵀSε⁻¹Kx + xᵀSa⁻¹x = xᵀŜ⁻¹x
giving
Ŝ⁻¹ = KᵀSε⁻¹K + Sa⁻¹
• ‘The Fisher Information Matrix’.
38
BAYESIAN SOLUTION FOR THE LINEAR PROBLEM II
Equating the terms linear in xᵀ gives:
(−Kx)ᵀSε⁻¹y + xᵀSa⁻¹(−xa) = xᵀŜ⁻¹(−x̂)
This must be valid for any x. Cancel the xᵀ’s, and substitute for Ŝ⁻¹:
KᵀSε⁻¹y + Sa⁻¹xa = (KᵀSε⁻¹K + Sa⁻¹)x̂
giving:
x̂ = (KᵀSε⁻¹K + Sa⁻¹)⁻¹(KᵀSε⁻¹y + Sa⁻¹xa)
The mean x̂ and the covariance Ŝ define the full posterior pdf.
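The posterior mean and covariance formulas can be exercised on a toy linear retrieval. A sketch (the sizes, covariances, and random Jacobian are assumed for illustration):

```python
import numpy as np

# Posterior mean and covariance for the linear Gaussian problem:
# xhat = (K^T Se^-1 K + Sa^-1)^-1 (K^T Se^-1 y + Sa^-1 xa),
# Shat^-1 = K^T Se^-1 K + Sa^-1.
rng = np.random.default_rng(1)
n, m = 3, 5
K = rng.standard_normal((m, n))
Sa = np.eye(n) * 4.0                   # prior covariance (assumed)
Se = np.eye(m) * 0.25                  # measurement error covariance (assumed)
xa = np.zeros(n)
x_true = rng.standard_normal(n)
y = K @ x_true + 0.5 * rng.standard_normal(m)

Sa_inv, Se_inv = np.linalg.inv(Sa), np.linalg.inv(Se)
S_hat = np.linalg.inv(K.T @ Se_inv @ K + Sa_inv)       # posterior covariance
x_hat = S_hat @ (K.T @ Se_inv @ y + Sa_inv @ xa)       # posterior mean

# The retrieval fits the data better than the prior state does:
assert np.sum((y - K @ x_hat)**2) < np.sum((y - K @ xa)**2)
# Posterior covariance is "smaller" than the prior: Sa - Shat is positive definite.
assert np.all(np.linalg.eigvalsh(Sa - S_hat) > 0)
```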
39
A GEOMETRIC INTERPRETATION OF THE SOLUTION
40
AN ALGEBRAIC INTERPRETATION OF THE SOLUTION
The expected value is:
x̂ = (KᵀSε⁻¹K + Sa⁻¹)⁻¹(KᵀSε⁻¹y + Sa⁻¹xa)   (1)
Underconstrained case
There must exist at least one ‘exact’ solution xe = Gy in the sense that Kxe = y, i.e. KG = I.
For example G = Kᵀ(KKᵀ)⁻¹.
Replace y by Kxe in (1):
x̂ = (KᵀSε⁻¹K + Sa⁻¹)⁻¹(KᵀSε⁻¹Kxe + Sa⁻¹xa)
Overconstrained case
The least squares solution xl satisfies KᵀSε⁻¹Kxl = KᵀSε⁻¹y.
Inserting this in (1) gives:
x̂ = (KᵀSε⁻¹K + Sa⁻¹)⁻¹(KᵀSε⁻¹Kxl + Sa⁻¹xa)
41
AN INTERPRETATION OF THE SOLUTION II
x̂ = (KᵀSε⁻¹K + Sa⁻¹)⁻¹(KᵀSε⁻¹Kxe + Sa⁻¹xa)
or
x̂ = (KᵀSε⁻¹K + Sa⁻¹)⁻¹(KᵀSε⁻¹Kxl + Sa⁻¹xa)
• Both represent a weighted mean of a solution (exact, xe, or least squares, xl) with xa, using
relative weights KᵀSε⁻¹K and Sa⁻¹ respectively – their Fisher information matrices.
• This is exactly like the familiar combination of scalar measurements x₁ and x₂ of an unknown
x, with variances σ₁² and σ₂² respectively:
x̂ = (1/σ₁² + 1/σ₂²)⁻¹(x₁/σ₁² + x₂/σ₂²)
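The scalar combination rule above can be checked with assumed numbers (x₁ = 10 ± 1 and x₂ = 14 ± 2 are illustrative values):

```python
# Combining two scalar measurements of the same unknown by inverse-variance
# weighting, the scalar analogue of the Bayesian weighted mean above.
x1, s1 = 10.0, 1.0
x2, s2 = 14.0, 2.0

w1, w2 = 1.0 / s1**2, 1.0 / s2**2
xbar = (w1 * x1 + w2 * x2) / (w1 + w2)   # weighted mean
var = 1.0 / (w1 + w2)                    # combined variance

assert abs(xbar - 10.8) < 1e-12          # (10/1 + 14/4) / (1 + 1/4) = 10.8
assert var < min(s1**2, s2**2)           # combining always reduces variance
```

Note how the result sits closer to the more precise measurement, exactly as the retrieval sits closer to whichever of xa or the data carries more Fisher information.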
42
End of Section
43
INFORMATION CONTENT AND ERROR ANALYSIS
Clive D Rodgers
Atmospheric, Oceanic and Planetary Physics
University of Oxford
ESA Advanced Atmospheric Training Course
September 15th – 20th, 2008
44
INFORMATION CONTENT OF A MEASUREMENT
Information in a general qualitative sense:
Conceptually, what does y tell you about x?
We need to answer this
• to determine if a conceptual instrument design actually works
• to optimise designs
Use the linear problem for simplicity to illustrate the ideas.
y = Kx + ε
45
SHANNON INFORMATION
• The Shannon information content of a measurement of x is the change in the entropy of the
probability density function describing our knowledge of x.
• Entropy is defined by:
S{P} = −∫ P(x) log(P(x)/M(x)) dx
M(x) is a measure function. We will take it to be constant.
• Compare this with the statistical mechanics definition of entropy:
S = −k Σᵢ pᵢ ln pᵢ
P(x)dx corresponds to pᵢ. 1/M(x) is a kind of scale for dx.
• The Shannon information content of a measurement is the change in entropy between the p.d.f.
before, P(x), and the p.d.f. after, P(x|y), the measurement:
H = S{P(x)} − S{P(x|y)}
What does this all mean?
46
ENTROPY OF A BOXCAR PDF
Consider a uniform p.d.f. in one dimension, constant in (0, a):
P(x) = 1/a,  0 < x < a
and zero outside. The entropy is given by
S = −∫₀ᵃ (1/a) ln(1/a) dx = ln a
Similarly, the entropy of any constant pdf in a finite volume V of arbitrary shape is:
S = −∫_V (1/V) ln(1/V) dv = ln V
i.e. the entropy is the log of the volume of state space occupied by the p.d.f.
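The boxcar result S = ln a can be checked by direct numerical integration of −P ln P (the interval width a = 3 is an assumed example):

```python
import math

# Numerically integrate -P ln P for a boxcar pdf P = 1/a on (0, a) with a
# Riemann sum, and compare with the analytic result S = ln a (measure M = 1).
a = 3.0
N = 100000
dx = a / N
S_num = sum(-(1.0 / a) * math.log(1.0 / a) * dx for _ in range(N))

assert abs(S_num - math.log(a)) < 1e-9
```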
47
ENTROPY OF A GAUSSIAN PDF
Consider the Gaussian distribution:
P(x) = (2π)^(−n/2) |S|^(−1/2) exp[−½(x − x̄)ᵀS⁻¹(x − x̄)]
If you evaluate the entropy of a Gaussian distribution you will find it is proportional to log |S|^(1/2).
• The contours of P(x) in n-space are ellipsoidal, described by
(x − x̄)ᵀS⁻¹(x − x̄) = constant
• The principal axes of the ellipsoid are the eigenvectors of S, and their lengths are proportional
to the square roots of the corresponding eigenvalues.
• The volume of the ellipsoid is proportional to the root of the product of the eigenvalues, which
is proportional to |S|^(1/2).
• Therefore entropy is the log of the volume enclosed by some particular contour of P(x). A
‘volume of uncertainty’.
48
ENTROPY AND INFORMATION
Information content is the change in entropy,
• i.e. the log of the ratio of the volumes of uncertainty before and after making a measurement.
• A generalisation of ‘signal to noise’.
• In the Boxcar case, the log of the ratio of the volumes before and after
• In the Gaussian case:
H = log|Sa| − log|Ŝ| = −log|(KᵀSε⁻¹K + Sa⁻¹)⁻¹Sa⁻¹|
• minus the log of the determinant of the weight of xa in the Bayesian expectation.
49
The log of the ratio of the volumes of the ellipsoids
50
DEGREES OF FREEDOM FOR SIGNAL AND NOISE
The state estimate that maximises P(x|y) in the linear Gaussian case is the one which minimises
χ² = [y − Kx̂]ᵀSε⁻¹[y − Kx̂] + [x̂ − xa]ᵀSa⁻¹[x̂ − xa]
The r.h.s. has initially m + n degrees of freedom, of which n are fixed by choosing x to be x̂, so
the expected value of χ² is m.
These m degrees of freedom can be assigned to degrees of freedom for noise dn and degrees of
freedom for signal ds according to:
dn = E{[y − Kx̂]ᵀSε⁻¹[y − Kx̂]}
and
ds = E{[x̂ − xa]ᵀSa⁻¹[x̂ − xa]}
Using tr(CD) = tr(DC), we can see that
ds = E{tr([x̂ − xa][x̂ − xa]ᵀSa⁻¹)} = tr(E{[x̂ − xa][x̂ − xa]ᵀ}Sa⁻¹)   (6)
51
DEGREES OF FREEDOM FOR SIGNAL AND NOISE II
With some manipulation we can find
ds = tr((KᵀSε⁻¹K + Sa⁻¹)⁻¹KᵀSε⁻¹K) = tr(KSaKᵀ(KSaKᵀ + Sε)⁻¹)   (7)
and
dn = tr((KᵀSε⁻¹K + Sa⁻¹)⁻¹Sa⁻¹) + m − n = tr(Sε(KSaKᵀ + Sε)⁻¹)   (8)
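Formulas (7) and (8) can be checked numerically, including the identity ds + dn = m. A sketch with assumed toy covariances and a random Jacobian:

```python
import numpy as np

# Degrees of freedom for signal and noise from eqs. (7)-(8) for a toy
# problem; the two forms of ds agree, and ds + dn = m.
rng = np.random.default_rng(2)
m, n = 6, 4
K = rng.standard_normal((m, n))
Sa = 2.0 * np.eye(n)                   # assumed prior covariance
Se = 0.5 * np.eye(m)                   # assumed noise covariance

Sa_inv, Se_inv = np.linalg.inv(Sa), np.linalg.inv(Se)
G = np.linalg.inv(K.T @ Se_inv @ K + Sa_inv)

ds1 = np.trace(G @ K.T @ Se_inv @ K)                               # first form of (7)
ds2 = np.trace(K @ Sa @ K.T @ np.linalg.inv(K @ Sa @ K.T + Se))    # second form of (7)
dn = np.trace(Se @ np.linalg.inv(K @ Sa @ K.T + Se))               # (8)

assert np.isclose(ds1, ds2)
assert np.isclose(ds1 + dn, m)
assert 0 < ds1 < n                     # ds cannot exceed the state dimension
```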
52
INDEPENDENT MEASUREMENTS
If the measurement error covariance is not diagonal, the elements of the y vector will not be
statistically independent. Likewise for the a priori.
The measurements will not be independent functions of the state if K is not diagonal.
Therefore it helps to understand where the information comes from if we transform to a different
basis.
First, statistical independence. Define:
ỹ = Sε^(−1/2) y   x̃ = Sa^(−1/2) x
The transformed covariances S̃a and S̃ε both become unit matrices. [>>]
53
SQUARE ROOTS OF MATRICES
The square root of an arbitrary matrix is defined as A^(1/2) where
A^(1/2) A^(1/2) = A
Using Aⁿ = RΛⁿLᵀ for n = 1/2:
A^(1/2) = RΛ^(1/2)Lᵀ
This square root of a matrix is not unique, because the diagonal elements of Λ^(1/2) in RΛ^(1/2)Lᵀ can
have either sign, leading to 2ⁿ possibilities.
We only use square roots of symmetric covariance matrices. In this case S^(1/2) = LΛ^(1/2)Lᵀ is
symmetric.
54
SQUARE ROOTS OF SYMMETRIC MATRICES
Symmetric matrices can also have non-symmetric roots satisfying S = (S^(1/2))ᵀS^(1/2), of which the
Cholesky decomposition:
S = TᵀT
where T is upper triangular, is the most useful.
There are an infinite number of non-symmetric square roots: if S^(1/2) is a square root, then clearly so
is XS^(1/2) where X is any orthonormal matrix.
The inverse symmetric square root is S^(−1/2) = LΛ^(−1/2)Lᵀ.
The inverse Cholesky decomposition is S⁻¹ = T⁻¹T⁻ᵀ.
The inverse square root T⁻¹ is triangular, and its numerical effect is implemented efficiently by
back substitution.
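Both kinds of root, and the substitution trick, can be demonstrated on an assumed 2×2 covariance (note NumPy's `cholesky` returns the lower factor C with S = CCᵀ, so T = Cᵀ):

```python
import numpy as np

# A symmetric covariance has a symmetric root L Lam^{1/2} L^T and a
# triangular (Cholesky) root T with S = T^T T; both are verified below.
S = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# Symmetric square root from the eigen-decomposition
lam, L = np.linalg.eigh(S)
S_half = L @ np.diag(np.sqrt(lam)) @ L.T
assert np.allclose(S_half @ S_half, S)

# Cholesky: numpy returns lower-triangular C with S = C C^T,
# so T = C^T is the upper-triangular factor with S = T^T T.
C = np.linalg.cholesky(S)
T = C.T
assert np.allclose(T.T @ T, S)

# The inverse root applied by substitution, no explicit inverse needed:
# solve C z = b for the lower factor row by row.
b = np.array([1.0, 2.0])
z = np.zeros(2)
for i in range(2):
    z[i] = (b[i] - C[i, :i] @ z[:i]) / C[i, i]
assert np.allclose(C @ z, b)
```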
56
INDEPENDENT MEASUREMENTS [<<]
If the measurement error covariance is not diagonal, the elements of the y vector will not be
statistically independent. Likewise for the a priori.
The measurements will not be independent functions of the state if K is not diagonal.
Therefore it helps to understand where the information comes from if we transform to a different
basis.
First, statistical independence. Define:
ỹ = Sε^(−1/2) y   x̃ = Sa^(−1/2) x
The transformed covariances S̃a and S̃ε both become unit matrices.
The forward model becomes:
ỹ = K̃x̃ + ε̃
where K̃ = Sε^(−1/2) K Sa^(1/2).
The solution covariance becomes:
Ŝ = (Iₙ + K̃ᵀK̃)⁻¹
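The whitening transform can be verified numerically. A sketch (the sizes and the randomly generated covariances are assumptions for illustration):

```python
import numpy as np

# Pre-whitening turns both covariances into unit matrices, with
# Ktilde = Se^{-1/2} K Sa^{1/2}.
def mat_pow(S, p):
    """Symmetric matrix power via the eigen-decomposition S = L Lam L^T."""
    lam, L = np.linalg.eigh(S)
    return L @ np.diag(lam**p) @ L.T

rng = np.random.default_rng(3)
m, n = 5, 3
K = rng.standard_normal((m, n))
A = rng.standard_normal((n, n)); Sa = A @ A.T + n * np.eye(n)   # assumed prior cov.
B = rng.standard_normal((m, m)); Se = B @ B.T + m * np.eye(m)   # assumed noise cov.

# Transformed covariances: Sa^{-1/2} Sa Sa^{-1/2} = I_n, likewise for Se
assert np.allclose(mat_pow(Sa, -0.5) @ Sa @ mat_pow(Sa, -0.5), np.eye(n))
assert np.allclose(mat_pow(Se, -0.5) @ Se @ mat_pow(Se, -0.5), np.eye(m))

# Transformed Jacobian and solution covariance (I_n + Kt^T Kt)^{-1}
Kt = mat_pow(Se, -0.5) @ K @ mat_pow(Sa, 0.5)
S_hat_t = np.linalg.inv(np.eye(n) + Kt.T @ Kt)
# In this basis the prior covariance is I, so no posterior variance exceeds 1:
assert np.all(np.linalg.eigvalsh(S_hat_t) <= 1.0 + 1e-12)
```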
57
TRANSFORM AGAIN
Now make K diagonal. Rotate both x and y to yet another basis, defined by the singular vectors
of K:
y = Kx + ε → y = UΛVT x + ε
Define:
x′ = VT x y′ = UT y ε′= UT
ε
The forward model becomes:
y′ = Λx′ + ε′
(1)
The Jacobian is now diagonal, Λ, and the a priori and noise covariances are still unit matrices,
hence the solution covariance becomes:
ˆS = (In + KT K)−1 → S′ = (In + Λ2
)−1
which is diagonal, and the solution itself is
x′ = (In + Λ2)−1
(Λy′ + x′a)
not x′ = Λ−1y′ as you might expect from (1).
• Elements for which λi � 1 or (1 + λ2i )−1 � 1 are well measured
• Elements for which λi � 1 or (1 + λ2i )−1 � 1 are poorly measured.
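The equivalence of the two bases is easy to verify numerically. The following sketch uses an arbitrary random pre-whitened Jacobian (not the lecture's standard example) and checks that the solution computed in the original basis agrees with x̂′ = (In + Λ²)⁻¹(Λy′ + x′a):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 5
K = rng.standard_normal((m, n))          # pre-whitened Jacobian: Sa = In, Se = Im
U, s, Vt = np.linalg.svd(K, full_matrices=False)

x_true = rng.standard_normal(n)
xa = rng.standard_normal(n)              # a priori state, already in whitened units
y = K @ x_true + 0.1 * rng.standard_normal(m)

# Solution in the original basis: xhat = (In + K^T K)^-1 (K^T y + xa)
xhat = np.linalg.solve(np.eye(n) + K.T @ K, K.T @ y + xa)

# Same solution in the transformed basis: xhat' = (In + L^2)^-1 (L y' + xa')
y_p = U.T @ y
xa_p = Vt @ xa
xhat_p = (s * y_p + xa_p) / (1 + s**2)

# The two agree component by component
assert np.allclose(Vt @ xhat, xhat_p)
```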
58
INFORMATION
Shannon Information in the Transformed Basis
Because it is a ratio of volumes, the linear transformation does not change the information
content. So consider information in the x′, y′ system:
H = S{S′a} − S{S′}
= ½ log|In| − ½ log|(Λ² + I)⁻¹|
= Σi ½ log(1 + λi²)   (9)
59
DEGREES OF FREEDOM
Degrees of Freedom in the Transformed Basis
The number of independent quantities measured can be thought of as the number of singular values for which λi ≳ 1.
The degrees of freedom for signal is
ds = Σi λi²(1 + λi²)⁻¹
It is also the sum of the eigenvalues of In − Ŝ.
Summary: for each independent component x′i
• The information content is ½ log(1 + λi²)
• The degrees of freedom for signal is λi²(1 + λi²)⁻¹
60
THE OBSERVABLE AND NULL SPACES
• The part of measurement space that can be seen is that spanned by the weighting functions.
Anything outside that is in the null space of K.
• Any orthogonal linear combination of the weighting functions will form a basis (coordinate
system) for the observable space. An example is those singular vectors of K which have non-zero
singular values.
• The vectors which have zero singular values form a basis for the null space.
• Any component of the state in the null space maps onto the origin in measurement space.
• This implies that there are distinct states, in fact whole subspaces, which map onto the same
point in measurement space, and cannot be distinguished by the measurement.
61
THE NEAR NULL SPACE
However
• the solution can have components in the null space - from the a priori.
• components observable in principle can have near zero contributions from the measurement, the
‘near null space’
In the x′, y′ system:
• vectors with λ = 0 are in the null space
• vectors with λ ≪ 1 are in the near null space
• vectors with λ ≳ 1 are in the non-null space, and are observable.
62
Diagnostics for the Standard Case
Table 1: Singular values of K, together with contributions of each vector to the degrees of freedom
ds and information content H for both covariance matrices. Measurement noise variance is 0.25 K².
Diagonal covariance Full covariance
i λi dsi Hi (bits) λi dsi Hi (bits)
1 6.51929 0.97701 2.72149 27.81364 0.99871 4.79865
2 4.79231 0.95827 2.29147 18.07567 0.99695 4.17818
3 3.09445 0.90544 1.70134 9.94379 0.98999 3.32105
4 1.84370 0.77269 1.06862 5.00738 0.96165 2.35227
5 1.03787 0.51858 0.52731 2.39204 0.85123 1.37443
6 0.55497 0.23547 0.19368 1.09086 0.54337 0.56546
7 0.27941 0.07242 0.05423 0.46770 0.17948 0.14270
8 0.13011 0.01665 0.01211 0.17989 0.03135 0.02297
totals 4.45653 8.57024 5.55272 16.75571
Diagonal covariance: Sa = 100 In K²
Full covariance: Sa,ij = 100 exp(−|zi − zj|/h) K²
where h = 1 in ln(p) coordinates.
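The per-component entries of Table 1 follow directly from the singular values; a quick check (assuming H is in bits, i.e. log base 2):

```python
import math

# Singular values for the diagonal-covariance case (Table 1)
lam = [6.51929, 4.79231, 3.09445, 1.84370,
       1.03787, 0.55497, 0.27941, 0.13011]

ds = [l**2 / (1 + l**2) for l in lam]          # degrees of freedom per component
H = [0.5 * math.log2(1 + l**2) for l in lam]   # information per component, in bits

print(round(ds[0], 5), round(H[0], 5))   # 0.97701 2.72149, as tabulated
print(round(sum(ds), 4), round(sum(H), 4))  # totals ≈ 4.4565 and 8.5702
```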
63
FTIR Measurements of CO
30 levels; 894 measurements:
Apparently heavily overconstrained.
Table 2: Singular Values of K
5.345 3.498 0.033 0.0046 7.15e-4 2.56e-4
7.76e-5 2.13e-3 1.71e-5 1.38e-5 9.01e-6 6.73e-6
5.82e-6 4.79e-6 2.87e-6 3.52e-6 3.75e-6 1.91e-6
9.83e-7 2.37e-7 7.71e-7 1.18e-7 1.48e-6 1.95e-7
1.37e-7 6.67e-8 3.50e-8 3.37e-8 5.83e-9 6.29e-9
Noise is 0.03 in these units;
Prior variance is about 1.
There are 2.5 degrees of freedom.
Lots of near null space.
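The quoted degrees of freedom can be roughly reproduced from Table 2. This is a sketch under an assumption: that the tabulated singular values are in prior-scaled units, so that whitening amounts to dividing by the 0.03 noise level.

```python
# Singular values of K from Table 2 (FTIR CO example)
sv = [5.345, 3.498, 0.033, 0.0046, 7.15e-4, 2.56e-4,
      7.76e-5, 2.13e-3, 1.71e-5, 1.38e-5, 9.01e-6, 6.73e-6,
      5.82e-6, 4.79e-6, 2.87e-6, 3.52e-6, 3.75e-6, 1.91e-6,
      9.83e-7, 2.37e-7, 7.71e-7, 1.18e-7, 1.48e-6, 1.95e-7,
      1.37e-7, 6.67e-8, 3.50e-8, 3.37e-8, 5.83e-9, 6.29e-9]

noise = 0.03                      # measurement noise in the same units
lam = [s / noise for s in sv]     # effective singular values after whitening
ds = sum(l**2 / (1 + l**2) for l in lam)
print(round(ds, 2))               # ≈ 2.6, consistent with the quoted 2.5
```

Only the first three or four singular values contribute; the remaining twenty-odd components are the near null space.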
64
A SIMULATED RETRIEVAL
[Figure: a simulated retrieval. ——— ‘true state’; · · · · a priori; — — noise-free simulation; — · — with simulated noise]
How do we understand the nature of a
retrieval?
65
ERROR ANALYSIS AND CHARACTERISATION
Conceptually
• The measurement y is a function of some unknown state x:
y = f(x, b) + ε
where:
• y is the measurement vector, length m
• x is the state vector, length n
• f is a function describing the physics of the measurement,
• b is a set of ‘known’ parameters of this function,
• ε is measurement error, with covariance Sε.
66
ERROR ANALYSIS & CHARACTERISATION II
• The retrieval x̂ is a function of the form:
x̂ = R(y, b̂, c)
where:
• R represents the retrieval method
• b̂ is our estimate of the forward function parameters b
• c represents any parameters used in the inverse method that do not affect the measurement, e.g. a priori.
The Transfer function
Thus the retrieval is related to the ‘truth’ x formally by:
x̂ = R(f(x, b) + ε, b̂, c)
which may be regarded as the transfer function of the measurement and retrieval system as a whole:
x̂ = T(x, b, ε, b̂, c)
67
ERROR ANALYSIS & CHARACTERISATION III
Characterisation means evaluating:
• ∂x̂/∂x = A, sensitivity to actual state: Averaging Kernel Matrix
Error analysis involves evaluating:
• ∂x̂/∂ε = Gy, sensitivity to noise (or to measurement!)
• ∂x̂/∂b = Gb, sensitivity to non-retrieved parameters
• ∂x̂/∂c = Gc, sensitivity to retrieval method parameters
and understanding the effect of replacing f by a numerical Forward Model F.
68
THE FORWARD MODEL
We often need to approximate the forward function by a Forward Model:
F(x, b) ≈ f(x, b)
where F models the physics of the measurement, including the instrument, as well as we can.
• It usually has parameters b which have experimental error
• The vector b is not a target for retrieval
• There may be parameters b′ of the forward function that are not included in the forward model:
F(x, b) ≈ f(x, b, b′)
69
LINEARISE THE TRANSFER FUNCTION
The retrieved quantity is expressed as:
x̂ = R(f(x, b, b′) + ε, b̂, xa, c)
where we have also separated xa, the a priori, from other components of c.
Replace f by F + Δf:
x̂ = R(F(x, b) + Δf(x, b, b′) + ε, b̂, xa, c)
where Δf is the error in the forward model relative to the real physics:
Δf = f(x, b, b′) − F(x, b)
Linearise F about x = xa, b = b̂:
x̂ = R(F(xa, b̂) + Kx(x − xa) + Kb(b − b̂) + Δf(x, b, b′) + ε, b̂, xa, c)
where Kx = ∂F/∂x (the weighting function) and Kb = ∂F/∂b.
70
LINEARISE THE TRANSFER FUNCTION II
x̂ = R(F(xa, b̂) + Kx(x − xa) + Kb(b − b̂) + Δf(x, b, b′) + ε, b̂, xa, c)
Linearise R with respect to its first argument, y:
x̂ = R[F(xa, b̂), b̂, xa, c] + Gy[Kx(x − xa) + Kb(b − b̂) + Δf(x, b, b′) + ε]
where Gy = ∂R/∂y (the contribution function)
71
CHARACTERISATION
Some rearrangement gives:
x̂ − xa = R(F(xa, b̂), b̂, xa, c) − xa   bias
+ A(x − xa)   smoothing
+ Gy εy   error   (10)
where
A = Gy Kx = ∂x̂/∂x
the averaging kernel, and
εy = Kb(b − b̂) + Δf(x, b, b′) + ε
is the total error in the measurement relative to the forward model.
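For a concrete R these quantities are explicit. The sketch below assumes the linear MAP (optimal estimation) form of the retrieval — an assumption, since the slides keep R general — with an arbitrary random Jacobian standing in for Kx:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 10, 6
K = rng.standard_normal((m, n))          # Jacobian Kx (illustrative)
Sa = np.eye(n)                           # a priori covariance
Se = 0.25 * np.eye(m)                    # measurement noise covariance

# Gain (contribution function): Gy = (Sa^-1 + K^T Se^-1 K)^-1 K^T Se^-1
Gy = np.linalg.solve(np.linalg.inv(Sa) + K.T @ np.linalg.inv(Se) @ K,
                     K.T @ np.linalg.inv(Se))
A = Gy @ K                               # averaging kernel A = Gy Kx

ds = np.trace(A)                         # degrees of freedom for signal
areas = A.sum(axis=1)                    # area of each averaging-kernel row
print(ds)                                # between 0 and n
```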
72
BIAS
This is the error you would get on doing a simulated error-free retrieval of the a priori.
A priori is what you know about the state before you make the measurement. Any reasonable
retrieval method should return the a priori given a measurement vector that corresponds to it, so
the bias should be zero.
But check your system to be sure it is. . .
73
THE AVERAGING KERNEL
The retrieved state is a smoothed version of the true state with smoothing functions given by the rows of A, plus error terms:
x̂ = xa + A(x − xa) + Gy εy = (I − A)xa + Ax + Gy εy
You can either:
• accept that the retrieval is an estimate of a smoothed state, not the true state
or
• consider the retrieval as an estimate of the true state, with an error contribution due to smoothing.
The error analysis is different in the two cases, because in the second case there is an extra term.
• If the state represents a profile, then the averaging kernel is a smoothing function with a width and an area.
• The width is a measure of the resolution of the observing system
• The area (generally between zero and unity) is a simple measure of the amount of real information that appears in the retrieval.
74
STANDARD EXAMPLE: AVERAGING KERNEL
75
STANDARD EXAMPLE: RETRIEVAL GAIN
76
THE OTHER TWO PARAMETERS
“The retrieved quantity is expressed as:
x̂ = R(f(x, b, b′) + ε, b̂, xa, c)
where we have also separated xa, the a priori, from other components of c.”
We should also look at the sensitivity of the retrieval to the inverse model parameters, xa and c.
The linear expansion in x, b and ε gave:
x̂ = [R(F(xa, b̂), b̂, xa, c)] + A(x − xa) + Gy εy
We argued that the term in square brackets should be equal to xa, for any reasonable retrieval.
This has the consequence that, at least to first order, ∂R/∂c = 0 for any reasonable retrieval.
77
THE OTHER TWO PARAMETERS II
x̂ = [R(F(xa, b̂), b̂, xa, c)] + A(x − xa) + Gy εy
It also follows that:
∂R/∂xa = ∂x̂/∂xa = In − A
for any reasonable retrieval.
Thus the reasonableness criterion has the consequence that there can be no inverse model parameters c that matter, other than xa.
Incidentally:
The term in square brackets should not depend on b̂ either.
This implies that Gb + Gy Kb = 0, which is no more than:
∂x̂/∂b + (∂x̂/∂y)(∂y/∂b) = 0
78
ERROR ANALYSIS
Some further rearrangement gives for the error in x̂:
x̂ − x = (A − I)(x − xa)   smoothing
+ Gy εy   measurement error
where the bias term has been dropped, and:
εy = Kb(b − b̂) + Δf(x, b, b′) + ε
Thus the error sources can be split up as:
x̂ − x = (A − I)(x − xa)   smoothing
+ Gy Kb(b − b̂)   model parameters
+ Gy Δf(x, b, b′)   modelling error
+ Gy ε   measurement noise
Some of these are easy to estimate, some are not.
79
MEASUREMENT NOISE
measurement noise = Gyε
This is the easiest component to evaluate.
ε is usually random noise, and is often unbiassed and uncorrelated between channels, and has a
known covariance matrix.
The covariance of the measurement noise is:
Sn = Gy Sε Gyᵀ
Note that Sn is not necessarily diagonal, so there will be errors which are correlated between
different elements of the state vector.
This is true of all of the error sources.
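A sketch of the noise propagation, again assuming the linear MAP gain and a random illustrative Jacobian, showing that diagonal channel noise still produces correlated retrieval errors:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 8, 5
K = rng.standard_normal((m, n))
Se = 0.25 * np.eye(m)                    # diagonal: uncorrelated channel noise
Gy = np.linalg.solve(np.eye(n) + K.T @ np.linalg.inv(Se) @ K,
                     K.T @ np.linalg.inv(Se))

Sn = Gy @ Se @ Gy.T                      # retrieval noise covariance
off_diag = Sn - np.diag(np.diag(Sn))
# Sn is generally not diagonal: errors correlate between state elements
assert np.abs(off_diag).max() > 0
```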
80
SMOOTHING ERROR
To estimate the actual smoothing error, you need to know the true state:
smoothing error = (A − I)(x − xa)
To characterise the statistics of this error, you need its mean and covariance over some ensemble.
The mean should be zero.
The covariance is:
Ss = E{(A − I)(x − xa)(x − xa)ᵀ(A − I)ᵀ}
= (A − I) E{(x − xa)(x − xa)ᵀ} (A − I)ᵀ
= (A − I) Sa (A − I)ᵀ
where Sa is the covariance of an ensemble of states about the a priori state. This is best regarded as the covariance of a climatology.
To estimate the smoothing error, you need to know the climatological covariance matrix.
To do the job properly, you need the real climatology, not just some ad hoc matrix that has been used as a constraint in the retrieval. The real climatology is often not available. Much of the smoothing error can be in fine spatial scales that may never have been measured.
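When the retrieval is the linear MAP estimate and Sa really is the ensemble covariance, the smoothing and noise covariances sum to the posterior covariance Ŝ — a useful consistency check. A sketch with a random Jacobian:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 10, 6
K = rng.standard_normal((m, n))
Sa = np.eye(n)
Se = 0.25 * np.eye(m)

Shat = np.linalg.inv(np.linalg.inv(Sa) + K.T @ np.linalg.inv(Se) @ K)
Gy = Shat @ K.T @ np.linalg.inv(Se)
A = Gy @ K

Ss = (A - np.eye(n)) @ Sa @ (A - np.eye(n)).T   # smoothing error covariance
Sn = Gy @ Se @ Gy.T                              # retrieval noise covariance
assert np.allclose(Ss + Sn, Shat)                # Ss + Sn = posterior covariance
```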
81
STANDARD EXAMPLE: ERROR COMPONENTS
[Figure: error components as standard deviations. ——— a priori; · · · · smoothing error; — — retrieval noise; — · — total retrieval error]
82
FORWARD MODEL ERRORS
Forward model parameters
forward model parameter error = Gy Kb(b − b̂)
This one is easy (in principle).
If you have estimated the forward model parameters properly, their individual errors will be unbiassed, so the mean error will be zero.
The covariance is:
Sf = Gy Kb Sb Kbᵀ Gyᵀ
where Sb is the error covariance matrix of b̂, namely E{(b − b̂)(b − b̂)ᵀ}.
However, remember that this is most probably a systematic error, not a random one.
83
FORWARD MODEL ERRORS II
Modelling error
modelling error = Gy∆f = Gy(f(x, b, b′)− F(x, b))
Note that this is evaluated at the true state, and with the true value of b, but hopefully its
sensitivity to these quantities is not large.
This can be hard to evaluate, because it requires a model f which includes the correct physics. If F is simply a numerical approximation for efficiency’s sake, it may not be too difficult, but if f is not known in detail, or so horrendously complex that no proper model is feasible, then modelling error can be tricky to estimate.
This is also usually a systematic error.
84
ERROR COVARIANCE MATRIX
An Error Covariance Matrix S is defined as
Sij = E{εi εj}
Diagonal elements are the familiar error variances.
The corresponding probability density function (PDF), if Gaussian, is:
P(y) ∝ exp(−½ (y − ȳ)ᵀ S⁻¹ (y − ȳ))
Contours of the PDF are
(y − ȳ)ᵀ S⁻¹ (y − ȳ) = const
i.e. ellipsoids, with arbitrary axes.
85
CORRELATED ERRORS
• How do we understand an error covariance matrix?
• What corresponds to error bars for a profile?
We would really like a PDF to be of the form
P(z) ∝ Πi exp(−zi²/2σi²)
i.e. each zi to have independent errors.
This can be done by diagonalising S, substituting S = LΛLᵀ, where L is the matrix of eigenvectors li. Then
P(x) ∝ exp(−½ (x − x̄)ᵀ L Λ⁻¹ Lᵀ (x − x̄))
So if we put z = Lᵀ(x − x̄) then the zi are independent with variance λi.
86
ERROR PATTERNS
• Error covariance is described in state space by an ellipsoid
• We have, in effect, transformed to the principal axes of the ellipsoid
• We can express the error in e.g. a state vector estimate x̂, due to an error covariance S, in terms of Error Patterns ei = λi^½ li such that the total error is of the form
x̂ − x = Σi βi ei
where the error patterns are orthogonal, and the coefficients βi are independent random variables with unit variance.
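The diagonalisation and the error patterns are a few lines of linear algebra; a sketch with an illustrative random covariance:

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((5, 5))
S = B @ B.T + np.eye(5)                  # an illustrative SPD error covariance

lam, L = np.linalg.eigh(S)               # S = L diag(lam) L^T
assert np.allclose(L @ np.diag(lam) @ L.T, S)

# Error patterns e_i = sqrt(lam_i) * l_i: orthogonal, and they rebuild S
E = L * np.sqrt(lam)                     # column i is the pattern e_i
assert np.allclose(E @ E.T, S)

# A total error realisation: sum_i beta_i e_i with independent unit-variance beta_i
beta = rng.standard_normal(5)
err = E @ beta
```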
87
STANDARD EXAMPLE: NOISE ERROR PATTERNS
There are only eight, same as the rank of the problem.
88
STANDARD EXAMPLE: SMOOTHING ERROR PATTERNS
The largest ten out of one hundred, the dimension of the state vector.
89
Unsolved Problem
How do you describe an error covariance to the naive data user?
90
End of Section