
1

PRINCIPLES OF RETRIEVAL THEORY

Clive D Rodgers

Atmospheric, Oceanic and Planetary Physics

University of Oxford

ESA Advanced Atmospheric Training Course

September 15th – 20th, 2008

2

ATMOSPHERIC REMOTE SENSING: THE INVERSE PROBLEM

Topics

1. Introduction

2. Bayesian approach

3. Information content

4. Error analysis and characterisation

3

ADVERTISEMENT

C. D. Rodgers, Inverse Methods for Atmospheric Sounding: Theory and Practice, World Scientific Publishing Co., 2000.

4

WHAT IS AN INVERSE OR RETRIEVAL PROBLEM?

• Almost any measurement you make...
• When you measure some function of the quantity you really want, you have a retrieval problem.
• Sometimes it's trivial, sometimes it isn't.

• Various aspects:
  • Formulate the problem properly:
    – Describe the measurement in terms of some Forward Model
    – Don't forget experimental error!
  • Finding a solution, inverting the forward model:
    – Algebraic
    – Numerical
    – No unique solution
    – No solution at all
  • Finding the 'best' solution:
    – Uniqueness: a unique solution may not be the best...
    – Accuracy
    – Efficiency
  • Understanding the answer

5

THINGS TO THINK ABOUT

• Why isn't the problem trivial?
  • Forward models which are not explicitly invertible
  • Ill-conditioned or ill-posed problems
  • Errors in the measurement (and in the forward model) can map into errors in the solution in a non-trivial way.

• What to measure?
  • Does it actually contain the information you want?

• Updating existing knowledge
  • You always have some prior knowledge of the 'unknown'
  • The measurement improves that knowledge
  • The measurement may not be enough by itself to completely determine the unknown

• Ill-posed problems
  • You cannot solve an ill-posed problem. You have to convert it into a well-posed problem.
  • Which of an infinite manifold of solutions do you want?

6

MATHEMATICAL CONCEPTS I

• Measurement Vector: y = (y₁, y₂, ..., yₘ)
  • Any measurement is of a finite number of quantities.
  • Arrange them as a vector for computational purposes.

• State Vector: x = (x₁, x₂, ..., xₙ)
  • The desired quantity is often continuous – e.g. a temperature profile.
  • We can only make a finite number of measurements and calculations.
  • Express the unknown in terms of a finite number of parameters.
  • They do not all have to be of the same type.
  • Arrange them as a vector for computational purposes.
  • Examples:
    – Temperature on a set of pressure levels, with a specified interpolation rule.
    – Fourier coefficients for a set of waves.

7

MATHEMATICAL CONCEPTS II

Using vectors, it is convenient to think in terms of linear algebra and vector spaces – even if the forward model is not linear.

Measurement Space

– Measurement space is the space of measurement vectors, dimension m.

State Space

– State space is the space of state vectors, dimension n.

Generally the two vector spaces will have different dimensions.

8

MATHEMATICAL CONCEPTS III

Forward Function and Model

– The Forward Function f(x) maps from state space onto measurement space, depending on the physics of the measurement.

– The Forward Model F(x) is the best we can do in the circumstances to model the forward function.

Inverse or Retrieval Method

– The inverse problem is one of finding an inverse mapping R(y):

Given a point in measurement space, which point or set of points in state space could have mapped into it?

9

NOTATION

I have tried to make it mnemonic as far as possible:

Matrices: Bold upper case
Vectors: Bold lower case
State vectors: x
Measurement vectors: y
Covariance matrices: S (based on σ, and not wanting to use Σ)
Measurement error: ε
Forward model: F(x) (really ought to be f(x): it's a vector)
Jacobian: K (originally the Kernel of an integral transform)
Gain matrix: G (I've used up K, which might stand for Kalman gain)
Averaging kernel: A

Different vectors and matrices of the same type are distinguished by superscripts, subscripts, etc.:

a priori: x_a (background)
estimate: x̂
first guess: x₀
'true' value: x (no subscript)

10

STANDARD ILLUSTRATION

Idealised thermal-emission nadir sounder represented as a linear forward model:

y = Kx + ε

K is the 'weighting function' matrix; ε is measurement error or noise.

• Vertical coordinate is notionally ln(p), discretised at 100 levels from 0 to 9.9 in steps of 0.1 – around 0 to 70 km.

• Eight channels (elements of y).

• State vector is notionally temperature at 100 levels.

• Measurement error (when considered) is 0.5 K.
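As a concrete sketch, this standard illustration can be simulated in a few lines of Python. The deck does not reproduce the actual weighting functions, so the Gaussian-shaped rows of K below are placeholders, as is the smooth stand-in temperature profile; only the sizes (8 channels, 100 levels in ln(p) from 0 to 9.9) and the 0.5 K noise come from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
z = 0.1 * np.arange(100)                        # ln(p) grid, 0 to 9.9
peaks = np.linspace(1.0, 8.0, 8)                # assumed channel peak heights
K = np.exp(-(z[None, :] - peaks[:, None])**2)   # placeholder weighting functions
K /= K.sum(axis=1, keepdims=True)               # normalise each row (8 x 100)

x = 240.0 + 30.0 * np.cos(z / 3.0)              # stand-in temperature profile (K)
eps = rng.normal(0.0, 0.5, size=8)              # 0.5 K measurement noise
y = K @ x + eps                                 # simulated measurement vector
print(y.round(2))
```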

11

STANDARD WEIGHTING FUNCTIONS

12

EXACT RETRIEVAL SIMULATION

The state vector is a set of eight coefficients of a degree-seven polynomial.

(a) Original profile: US standard atmosphere
(b) Exact retrieval with no experimental error
(c) Exact retrieval with simulated 0.5 K error

13

NOISE FREE MEASUREMENTS

Row Space and Null Space

Consider an error-free linear measurement, equivalent to solving linear equations:

y = Kx

The rows of K are the weighting functions kᵢ:

yᵢ = kᵢᵀ x

The kᵢ are a set of vectors in state space; the measurements are projections of the state x onto them.

They span a subspace called the row space, of dimension equal to the rank of K, p ≤ min(n, m). If p < m then the weighting functions are not linearly independent.

Only those components of x in the row space can be measured.

The null space is the part of state space which is not in the row space.

14

ILL-POSED AND WELL-POSED PROBLEMS

Ill or well posed? · · · Under- or over-determined? · · · Under- or over-constrained?

Which is which?

1. p = m = n. Well posed.

The number of unknowns is equal to the number of measurements, and they are all independent.

2. p = m < n. Underconstrained, ill-posed

More unknowns than measurements, but the measurements are all independent.

3. p = n < m. Overconstrained, ill-posed

More measurements than unknowns, so they could be inconsistent, but the unknowns are all in the row space, so there is information about all of them.

15

ILL-POSED AND WELL-POSED PROBLEMS II

Another category: mixed-determined, where the problem is both underconstrained and overconstrained.

1. p < m = n.

The number of unknowns is equal to the number of measurements, but the measurements are not independent, so they could be inconsistent, and the number of independent pieces of information is less than the number of unknowns. Simple example (checked numerically after this list):

y₁ = x₁ + x₂ + ε₁ (1)

y₂ = x₁ + x₂ + ε₂ (2)

2. p < m < n.

More unknowns than measurements, but the measurements are not independent, so they could be inconsistent.

3. p < n < m.

More measurements than the rank, so they could be inconsistent; more unknowns than the rank, so not all are defined by the measurement.
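A minimal numerical check of the simple example in case 1: the matrix of coefficients has m = n = 2 but rank p = 1, so the problem is mixed-determined.

```python
import numpy as np

# y1 = x1 + x2 + eps1 and y2 = x1 + x2 + eps2: two measurements,
# two unknowns, but only one independent piece of information.
K = np.array([[1.0, 1.0],
              [1.0, 1.0]])
print(np.linalg.matrix_rank(K))   # -> 1, i.e. p < m = n
```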

16

ILL-POSED AND WELL-POSED PROBLEMS III

Summary

If p < n then the system is underconstrained; there is a null space.

If p < m then the system is overconstrained in some part of the row space.

How do we identify the row and null spaces?

One straightforward way is Gram-Schmidt orthogonalisation, but . . .
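A sketch of the Gram-Schmidt route, for illustration only: the example matrix and tolerance are arbitrary, and classical Gram-Schmidt is numerically fragile, which is part of why the next slides prefer SVD.

```python
import numpy as np

def row_space_basis(K, tol=1e-10):
    """Classical Gram-Schmidt on the rows of K: returns an orthonormal
    basis for the row space; dependent rows leave a negligible residual."""
    basis = []
    for k in K:
        r = k.astype(float)
        for q in basis:
            r = r - (q @ r) * q        # remove components along kept rows
        norm = np.linalg.norm(r)
        if norm > tol:                 # independent row: keep its direction
            basis.append(r / norm)
    return np.array(basis)             # p x n, where p = rank of K

K = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],         # duplicate row makes K rank-deficient
              [0.0, 1.0, 1.0]])
print(row_space_basis(K).shape[0])     # -> 2 independent directions
```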

17

SINGULAR VECTOR DECOMPOSITION [>>]

Is the neatest way of doing the job.

18

MATRIX ALGEBRA – EIGENVECTORS

The eigenvalue problem associated with an arbitrary square matrix A, of order n, is to find eigenvectors l and scalar eigenvalues λ which satisfy

Al = λl

If A is a coordinate transformation, then l has the same representation in the untransformed and transformed coordinates, apart from a factor λ.

This is the same as (A − λI)l = 0, a homogeneous set of equations, which can only have a solution if |A − λI| = 0, giving a polynomial equation of degree n, with n solutions for λ. They will be complex in general.

An eigenvector can be scaled by an arbitrary factor. It is conventional to normalise them so that lᵀl = 1 or l†l = 1 (Hermitian adjoint).

19

EIGENVECTORS II

We can assemble the eigenvectors as columns in a matrix L:

AL = LΛ

where Λ is a diagonal matrix, with the eigenvalues on the diagonal.

Transpose: LᵀAᵀ = ΛLᵀ

Multiply by R = (Lᵀ)⁻¹: Aᵀ = RΛLᵀ

Postmultiply by R: AᵀR = RΛ

Thus:

R is the matrix of eigenvectors of Aᵀ.

Aᵀ has the same eigenvalues as A.

In the case of a symmetric matrix, S = Sᵀ, we must have L = R, so that LᵀL = LLᵀ = I or Lᵀ = L⁻¹, and the eigenvectors are orthogonal.

In this case the eigenvalues are real.
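These relationships are easy to check numerically. A small sketch (random test matrix, assumed non-defective) verifying that R = (Lᵀ)⁻¹ holds the eigenvectors of Aᵀ and that Aᵀ shares A's eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))               # arbitrary asymmetric matrix
lam, L = np.linalg.eig(A)                 # AL = L Lambda (columns of L)
R = np.linalg.inv(L.T)                    # R = (L^T)^-1

print(np.allclose(A.T @ R, R @ np.diag(lam)))    # A^T R = R Lambda
lamT = np.linalg.eigvals(A.T)                    # same eigenvalues as A
print(np.allclose(np.sort_complex(lam), np.sort_complex(lamT)))
```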

20

EIGENVECTORS – GEOMETRIC INTERPRETATION

Consider the scalar equation:

xᵀSx = 1

where S is symmetric. This is the equation of a quadratic surface centred on the origin, in n-space.

The normal to the surface is the vector ∂(xᵀSx)/∂x, i.e. Sx, and x is the radius vector, so

Sx = λx

is the problem of finding points where the normal and the radius vector are parallel. These are where the principal axes intersect the surface.

At these points, xᵀSx = 1, so λxᵀx = 1, or:

λ = 1 / (xᵀx)

So the eigenvalues are the reciprocals of the squares of the lengths of the principal axes.

21

GEOMETRIC INTERPRETATION II

The lengths are independent of the coordinate system, so will also be invariant under an arbitrary orthogonal transformation, i.e. one in which (distance)² = xᵀx is unchanged.

Consider using the eigenvectors of S to transform the equation for the quadratic surface:

xᵀLΛLᵀx = 1   or   yᵀΛy = 1   or   Σᵢ λᵢ yᵢ² = 1

where y = Lᵀx or x = Ly. This transforms the surface into its principal axis representation.
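A quick numerical illustration of the geometry, with an arbitrary 2×2 symmetric positive definite S: the point where a principal axis meets the surface xᵀSx = 1 lies at distance 1/√λ from the origin, and there the normal Sx is parallel to the radius vector x.

```python
import numpy as np

S = np.array([[2.0, 0.5],
              [0.5, 1.0]])                 # arbitrary symmetric positive definite
lam, L = np.linalg.eigh(S)                 # eigenvalues ascending, orthonormal L
x = L[:, 0] / np.sqrt(lam[0])              # axis point at distance 1/sqrt(lambda)
print(np.isclose(x @ S @ x, 1.0))          # lies on the quadric surface
print(np.allclose(S @ x, lam[0] * x))      # normal parallel to radius vector
```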

22

EIGENVECTORS - USEFUL RELATIONSHIPS

Asymmetric matrices                      Symmetric matrices

AR = RΛ                                  SL = LΛ
LᵀA = ΛLᵀ
Lᵀ = R⁻¹, Rᵀ = L⁻¹                       Lᵀ = L⁻¹
LRᵀ = LᵀR = I                            LLᵀ = LᵀL = I
A = RΛLᵀ = Σᵢ λᵢ rᵢ lᵢᵀ                  S = LΛLᵀ = Σᵢ λᵢ lᵢ lᵢᵀ
Aᵀ = LΛRᵀ = Σᵢ λᵢ lᵢ rᵢᵀ
A⁻¹ = RΛ⁻¹Lᵀ                             S⁻¹ = LΛ⁻¹Lᵀ
Aⁿ = RΛⁿLᵀ                               Sⁿ = LΛⁿLᵀ
LᵀAR = Λ                                 LᵀSL = Λ
LᵀAⁿR = Λⁿ                               LᵀSⁿL = Λⁿ
LᵀA⁻¹R = Λ⁻¹                             LᵀS⁻¹L = Λ⁻¹
|A| = Πᵢ λᵢ                              |S| = Πᵢ λᵢ

23

SINGULAR VECTOR DECOMPOSITION

The standard eigenvalue problem is meaningless for non-square matrices.

A 'shifted' eigenvalue problem associated with an arbitrary non-square matrix K, with m rows and n columns, can be constructed:

Kv = λu
Kᵀu = λv    (3)

where v, of length n, and u, of length m, are called the singular vectors of K.

This is equivalent to the symmetric problem:

[ O   K ] [u]     [u]
[ Kᵀ  O ] [v] = λ [v]

From (3) we can get

KᵀKv = λKᵀu = λ²v
KKᵀu = λKv = λ²u    (4)

so u and v are the eigenvectors of KKᵀ (m×m) and KᵀK (n×n) respectively.

24

SINGULAR VECTOR DECOMPOSITION II

Care is needed in constructing a matrix of singular vectors, because individual u and v vectors correspond to each other, yet there are potentially different numbers of v and u vectors.

If the rank of K is p, then there will be p non-zero singular values, and both KKᵀ and KᵀK will have p non-zero eigenvalues.

The surplus eigenvectors will have zero eigenvalues and can be discarded, and we can write:

[ O   K ] [U]   [U]
[ Kᵀ  O ] [V] = [V] Λ

where Λ is p×p, U is m×p, and V is n×p.

There will be n + m − p more eigenvectors of the composite matrix, all with zero eigenvalue.

25

SINGULAR VECTORS - USEFUL RELATIONSHIPS

KV = UΛ
KᵀU = VΛ

UᵀKV = VᵀKᵀU = Λ

K = UΛVᵀ
Kᵀ = VΛUᵀ

VᵀV = UᵀU = I_p

KKᵀU = UΛ²
KᵀKV = VΛ²    (5)
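All of these identities can be checked directly with a library SVD. A sketch using a random K of the standard illustration's shape (8×100), where the reduced SVD returns exactly the p = min(m, n) vectors kept above:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 8, 100
K = rng.normal(size=(m, n))                          # full rank: p = min(m, n) = 8
U, s, Vt = np.linalg.svd(K, full_matrices=False)     # reduced SVD
Lam = np.diag(s)

print(np.allclose(K, U @ Lam @ Vt))                  # K = U Lambda V^T
print(np.allclose(K @ Vt.T, U @ Lam))                # K V = U Lambda
print(np.allclose(K.T @ U, Vt.T @ Lam))              # K^T U = V Lambda
print(np.allclose(U.T @ U, np.eye(m)))               # U^T U = I_p
print(np.allclose(K @ K.T @ U, U @ Lam**2))          # K K^T U = U Lambda^2
```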


27

SINGULAR VECTOR DECOMPOSITION [<<]

Is the neatest way of doing the job.

Express K as

K (m×n) = U (m×p) Λ (p×p) Vᵀ (p×n)

where the bracketed dimensions give the sizes of the matrices.

Then the forward model (no noise) becomes:

y = Kx = UΛVᵀx

so that

Uᵀy = ΛVᵀx

or

y′ = Λx′

where y′ = Uᵀy and x′ = Vᵀx are both of order p.

The rows of Vᵀ, or the columns of V (in state space), are a basis for the row space of K. Similarly, the columns of U (in measurement space) are a basis of its column space.

28

SINGULAR VECTOR DECOMPOSITION II

We can also see that an exact solution is

x′ = Λ⁻¹y′

or

x = VΛ⁻¹Uᵀy

This is only a unique solution if p = n. If p < n, any multiples of vectors with zero singular values can be added, and still satisfy the equations.
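A sketch of this exact solution in an underconstrained case (random K and x for illustration, p = 8 < n = 100): the retrieved state reproduces the measurements exactly, but differs from the true state by its null-space component.

```python
import numpy as np

rng = np.random.default_rng(3)
K = rng.normal(size=(8, 100))                     # p = 8 < n = 100
x_true = rng.normal(size=100)
y = K @ x_true                                    # noise-free measurement

U, s, Vt = np.linalg.svd(K, full_matrices=False)
x_hat = Vt.T @ ((U.T @ y) / s)                    # x = V Lambda^-1 U^T y

print(np.allclose(K @ x_hat, y))                  # True: an exact solution
print(np.allclose(x_hat, x_true))                 # False: null space not recovered
```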

29

SVD OF THE STANDARD WEIGHTING FUNCTIONS

30

EXACT RETRIEVAL SIMULATION

The state vector is a set of eight coefficients of a degree-seven polynomial.

(a) Original profile: US standard atmosphere
(b) Exact retrieval with no experimental error
(c) Exact retrieval with simulated 0.5 K error

31

APPROACHES TO INVERSE PROBLEMS

• Bayesian Approach
  • What is the pdf of the state, given the measurement and the a priori?

• Optimisation Approaches:
  • Maximum Likelihood
  • Maximum A Posteriori
  • Minimum Variance
  • Backus-Gilbert – resolution/noise trade-off

• Ad hoc Approaches
  • Relaxation
  • Exact algebraic solutions

32

BAYESIAN APPROACH

This is the most general approach to the problem (that I know of).

Knowledge is represented in terms of probability density functions:

• P(x) is the a priori p.d.f. of the state, describing what we know about the state before we make the measurement.

• P(y) is the a priori p.d.f. of the measurement.

• P(x, y) is the joint a priori p.d.f. of x and y.

• P(y|x) is the p.d.f. of the measurement given the state – this depends on experimental error and the forward function.

• P(x|y) is the p.d.f. of the state given the measurement – this is what we want to find.

33

BAYES THEOREM

The theorem states:

P(x, y) = P(x|y) P(y)

and of course

P(y, x) = P(y|x) P(x)

so that

P(x|y) = P(y|x) P(x) / P(y)

If we have a prior p.d.f. for x, P(x), and we know statistically how y is related to x via P(y|x), then we can find an un-normalised version of P(x|y), namely P(y|x) P(x), which can be normalised if required.
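A scalar sketch of the theorem on a grid, with assumed numbers: prior x ~ N(0, 2²), measurement y = x + ε with unit-variance noise, and observed y = 1.5. The un-normalised posterior is just the product P(y|x) P(x):

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 2001)
dx = x[1] - x[0]

prior = np.exp(-0.5 * (x / 2.0) ** 2)         # P(x), unnormalised
likelihood = np.exp(-0.5 * (1.5 - x) ** 2)    # P(y|x) at the observed y = 1.5
posterior = prior * likelihood                # P(y|x) P(x)
posterior /= posterior.sum() * dx             # normalise if required

print((x * posterior).sum() * dx)             # posterior mean: about 1.2
```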

34

BAYES THEOREM GEOMETRICALLY

35

APPLICATION OF THE BAYESIAN APPROACH

We need explicit forms for the p.d.f.s:

• Assume that experimental error is Gaussian:

−ln P(y|x) = ½ (y − F(x))ᵀ S_ε⁻¹ (y − F(x)) + const

where F(x) is the Forward Model:

y = F(x) + ε

and S_ε is the covariance matrix of the experimental error ε:

S_ε = E{εεᵀ} = E{(y − F(x))(y − F(x))ᵀ}

• Assume that the a priori p.d.f. is Gaussian (less justifiable):

−ln P(x) = ½ (x − x_a)ᵀ S_a⁻¹ (x − x_a) + const

i.e. x is distributed normally with mean x_a and covariance S_a.

36

APPLICATION OF THE BAYESIAN APPROACH II

• Thus the pdf of the state when the measurements and the a priori are given is:

−2 ln P(x|y) = [y − F(x)]ᵀ S_ε⁻¹ [y − F(x)] + [x − x_a]ᵀ S_a⁻¹ [x − x_a] + const

• If we want a state estimate x̂ rather than a p.d.f., then we must calculate some function of P(x|y), such as its mean or its maximum:

x̂ = ∫ P(x|y) x dx    or    dP(x|y)/dx = 0

37

BAYESIAN SOLUTION FOR THE LINEAR PROBLEM

The linear problem has a forward model:

F(x) = Kx

so the p.d.f. P(x|y) becomes:

−2 ln P(x|y) = [y − Kx]ᵀ S_ε⁻¹ [y − Kx] + [x − x_a]ᵀ S_a⁻¹ [x − x_a] + c₁

This is quadratic in x, so it has to be of the form:

−2 ln P(x|y) = [x − x̂]ᵀ Ŝ⁻¹ [x − x̂] + c₂

Equate the terms that are quadratic in x:

xᵀ Kᵀ S_ε⁻¹ K x + xᵀ S_a⁻¹ x = xᵀ Ŝ⁻¹ x

giving

Ŝ⁻¹ = Kᵀ S_ε⁻¹ K + S_a⁻¹

• 'The Fisher Information Matrix'.

38

BAYESIAN SOLUTION FOR THE LINEAR PROBLEM II

Equating the terms linear in xᵀ gives:

(−Kx)ᵀ S_ε⁻¹ y + xᵀ S_a⁻¹ (−x_a) = xᵀ Ŝ⁻¹ (−x̂)

This must be valid for any x. Cancel the xᵀ's, and substitute for Ŝ⁻¹:

Kᵀ S_ε⁻¹ y + S_a⁻¹ x_a = (Kᵀ S_ε⁻¹ K + S_a⁻¹) x̂

giving:

x̂ = (Kᵀ S_ε⁻¹ K + S_a⁻¹)⁻¹ (Kᵀ S_ε⁻¹ y + S_a⁻¹ x_a)

The mean x̂ and the covariance Ŝ define the full posterior pdf.
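Putting the pieces together, a sketch of the linear Gaussian solution with the standard illustration's sizes. The random K, the 10 K a priori spread, and the profiles are illustrative stand-ins; only the 8×100 shape and the 0.5 K noise come from the earlier slide.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 8, 100
K = rng.normal(size=(m, n))                    # stand-in weighting functions
S_eps_inv = np.eye(m) / 0.5**2                 # 0.5 K uncorrelated noise
S_a_inv = np.eye(n) / 10.0**2                  # assumed 10 K a priori spread
x_a = np.full(n, 250.0)                        # assumed a priori profile (K)

x_true = x_a + rng.normal(0.0, 10.0, size=n)
y = K @ x_true + rng.normal(0.0, 0.5, size=m)

S_hat_inv = K.T @ S_eps_inv @ K + S_a_inv      # Fisher information matrix
x_hat = np.linalg.solve(S_hat_inv, K.T @ S_eps_inv @ y + S_a_inv @ x_a)
S_hat = np.linalg.inv(S_hat_inv)               # posterior covariance

print(x_hat[:3])                               # posterior mean (first levels)
print(np.sqrt(np.diag(S_hat))[:3])             # posterior standard deviations
```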

39

A GEOMETRIC INTERPRETATION OF THE SOLUTION

40

AN ALGEBRAIC INTERPRETATION OF THE SOLUTION

The expected value is:

x̂ = (Kᵀ S_ε⁻¹ K + S_a⁻¹)⁻¹ (Kᵀ S_ε⁻¹ y + S_a⁻¹ x_a)    (1)

Underconstrained case

There must exist at least one 'exact' solution x_e = Gy, in the sense that Kx_e = y, i.e. KG = I. For example, G = Kᵀ(KKᵀ)⁻¹.

Replace y by Kx_e in (1):

x̂ = (Kᵀ S_ε⁻¹ K + S_a⁻¹)⁻¹ (Kᵀ S_ε⁻¹ K x_e + S_a⁻¹ x_a)

Overconstrained case

The least squares solution x_l satisfies Kᵀ S_ε⁻¹ K x_l = Kᵀ S_ε⁻¹ y.

Inserting this in (1) gives:

x̂ = (Kᵀ S_ε⁻¹ K + S_a⁻¹)⁻¹ (Kᵀ S_ε⁻¹ K x_l + S_a⁻¹ x_a)

41

AN INTERPRETATION OF THE SOLUTION II

x̂ = (Kᵀ S_ε⁻¹ K + S_a⁻¹)⁻¹ (Kᵀ S_ε⁻¹ K x_e + S_a⁻¹ x_a)

or

x̂ = (Kᵀ S_ε⁻¹ K + S_a⁻¹)⁻¹ (Kᵀ S_ε⁻¹ K x_l + S_a⁻¹ x_a)

• Both represent a weighted mean of a solution (exact x_e, or least squares x_l) with x_a, using relative weights Kᵀ S_ε⁻¹ K and S_a⁻¹ respectively – their Fisher information matrices.

• This is exactly like the familiar combination of scalar measurements x₁ and x₂ of an unknown x, with variances σ₁² and σ₂² respectively:

x̂ = (1/σ₁² + 1/σ₂²)⁻¹ (x₁/σ₁² + x₂/σ₂²)

42

End of Section
