
Page 1: A Variational Approach for Sharpening High Dimensional Images · wims.math.ecnu.edu.cn/ChunLi/973_Seminar_20120320/used... · 2012-03-20

Westfälische Wilhelms-Universität Münster

Institut für Numerische und Angewandte Mathematik

A Variational Approach for Sharpening

High Dimensional Images

Diplomarbeit

submitted by

Michael Möller

Advisors

Prof. Dr. Martin Burger

Prof. Dr. Andrea Bertozzi

Prof. Dr. Todd Wittman

Münster, 17.06.2009


Abstract

Earth observing satellites usually not only take ordinary red-green-blue images, but provide us with several images including the near-infrared and infrared spectrum. These images are called multispectral, for about 4-7 different bands, or hyperspectral, for higher dimensional images of up to 210 bands. The additional bands greatly help in classification and identification tasks. The drawback of the additional spectral information is that each spectral band has rather low spatial resolution. This is the reason why many multispectral satellites such as Quickbird or the Landsat 7 satellite include a panchromatic image at high spatial resolution. A panchromatic image is a grayscale image that spans a wide range of frequencies. This panchromatic image can be used to enhance the spatial resolution of a multispectral image in a technique called pan-sharpening. Pan-sharpening has been an active research field for many years, but only little work has been done to extend this procedure to higher dimensional imagery.

In this thesis we propose a new pan-sharpening technique called Variational Wavelet Pan-sharpening (VWP) that combines wavelet fusion, the edges of the panchromatic image, and spectral correlation preserving terms in an energy minimization problem. In particular, we focus on preserving the spectral information present in the multispectral imagery.

We show the existence and uniqueness of a minimizer for our energy functional and derive three different numerical schemes for the minimization process. Results are presented on Quickbird data and evaluated against other pan-sharpening methods with the help of several image quality metrics. Finally, we show the extension of our pan-sharpening method to sharpening hyperspectral imagery with a (not necessarily panchromatic) master image at high resolution.

A summary of our work on pan-sharpening and the extension to hyperspectral images can be found in [MWB08] and [MWB09], respectively.


CONTENTS

1. Introduction . . . 9
   1.1 Pan-Sharpening . . . 9
   1.2 Variational Image Processing . . . 10

2. Overview of Wavelets and Multiresolution Analysis . . . 13
   2.1 Introduction . . . 13
      2.1.1 Multiresolution Analysis (MRA) . . . 13
      2.1.2 Wavelets in MRA . . . 14
      2.1.3 Computation of the Wavelet Transform . . . 14
   2.2 Orthogonal Wavelets in Two Dimensions . . . 17
      2.2.1 Extension of Orthonormal MRA to Two Dimensions . . . 17
      2.2.2 The Fast Wavelet Transform in Two Dimensions . . . 19

3. Recent Pan-Sharpening Techniques . . . 21
   3.1 IHS Image Fusion . . . 21
   3.2 Brovey Image Fusion . . . 22
   3.3 PCA Image Fusion . . . 22
   3.4 Wavelet Pan-Sharpening . . . 24
      3.4.1 Possible Problems in Wavelet Image Fusion . . . 25
   3.5 P+XS Image Fusion . . . 28
   3.6 The Linear Combination Assumption . . . 29

4. Variational Wavelet Pan-Sharpening . . . 31
   4.1 Energy Functional . . . 31
      4.1.1 Geometry Forcing Term . . . 31
      4.1.2 Wavelet Matching Term . . . 34
      4.1.3 Color Preserving Term . . . 35
      4.1.4 Spectral Correlation Preserving Term . . . 35
      4.1.5 The Alternate Energy . . . 36

5. Analysis of the Energy Functional . . . 39
   5.1 Existence of a Minimizer . . . 40
      5.1.1 The Sub-Level-Sets of J(u) are Bounded . . . 41
      5.1.2 J(u) is Weakly Lower Semi-Continuous . . . 46
   5.2 Uniqueness of the Minimizer . . . 48
   5.3 Optimality Conditions . . . 49
      5.3.1 The Subdifferential ∂J(u) . . . 50
      5.3.2 The Euler-Lagrange Equation of the Regularized Functional . . . 51
   5.4 The Condition div(θ) ∈ L² . . . 52

6. Numerical Implementation . . . 53
   6.1 Gradient Descent Methods . . . 53
      6.1.1 Explicit Time Stepping . . . 54
      6.1.2 ADI Method . . . 59
   6.2 Split Bregman Method . . . 62
      6.2.1 Iterative Regularization Using Bregman Distances . . . 62
      6.2.2 Derivation of the Split Bregman Method for Minimizing the VWP Energy . . . 63


   6.3 Stopping Criteria for Minimization Methods . . . 67
   6.4 Image Registration . . . 68

7. Image Quality Metrics . . . 69
   7.1 Spectral Quality Metrics . . . 69
      7.1.1 Relative Dimensionless Global Error in Synthesis (ERGAS) . . . 69
      7.1.2 Spectral Angle Mapper (SAM) . . . 69
      7.1.3 Spectral Information Divergence (SID) . . . 70
      7.1.4 Universal Image Quality Index (Q-average) . . . 70
      7.1.5 Root Mean Squared Error (RMSE) . . . 70
      7.1.6 Relative Average Spectral Error (RASE) . . . 70
      7.1.7 Correlation Coefficient (CC) . . . 71
   7.2 Spatial Quality Metric . . . 71
      7.2.1 Filtered Correlation Coefficients (FCC) . . . 71

8. Numerical Results . . . 73
   8.1 Comparison of Different Pan-Sharpening Methods . . . 73
   8.2 Comparison of VWP and AVWP . . . 79
   8.3 Comparison of Different Minimization Methods for VWP . . . 81
      8.3.1 Alternating Directions Implicit vs. Explicit Method . . . 81
      8.3.2 Dependence of the Gradient Descent Methods on the Regularization Parameter ε . . . 82
      8.3.3 Split Bregman vs. Gradient Descent Methods . . . 83

9. Extension to Hyperspectral Imagery . . . 87
   9.1 Hyperspectral Images . . . 87
   9.2 Acquisition of a Master Image . . . 87
   9.3 Numerical Results . . . 87

10. Conclusions . . . 93


LIST OF FIGURES

1.1 RGB and false color image of an urban scene . . . 9
1.2 Main idea of pan-sharpening illustrated on the color bands . . . 10

2.1 Fourier transform of a separable scaling function and three separable wavelet functions calculated from a one-dimensional Daubechies 4 wavelet, source: [Fug] . . . 19
2.2 Second level wavelet decomposition of an image . . . 19
2.3 Concept of the two-dimensional fast wavelet decomposition . . . 20

3.1 Low resolution multispectral image . . . 23
3.2 PCA transform . . . 23
3.3 Panchromatic image as a dataset after the histogram matching . . . 23
3.4 Reconstruction of the sharpened image . . . 24
3.5 Concept of wavelet image fusion . . . 24
3.6 Aliasing effects . . . 25
3.7 Aliasing due to wavelet image fusion . . . 26
3.8 Fusing one dimensional signals . . . 26
3.9 Reconstructed signal using the stationary wavelet transform . . . 27
3.10 Pan-sharpened image using the stationary wavelet transform . . . 27
3.11 Principle of ΠS . . . 28
3.12 Response of the multispectral bands and the panchromatic image to the different wavelengths for the IKONOS satellite system, source: [CKCK] . . . 30
3.13 Response of the multispectral bands and the panchromatic image to the different wavelengths for the Quickbird satellite system, source: [OGAFN05] . . . 30

4.1 Examples of enforcing higher curvature in 1d . . . 33
4.2 Example of enforcing higher curvature in 2d . . . 34
4.3 Matching wavelet coefficients for a discrete wavelet transform . . . 34
4.4 Construction of the matching image for AVWP . . . 37

6.1 Idea of the method of steepest descent . . . 53

8.1 First example image for the comparison of different pan-sharpening methods . . . 74
8.2 Second example image for the comparison of different pan-sharpening methods . . . 75
8.3 Third example image for the comparison of different pan-sharpening methods . . . 76
8.4 Influence of the parameter in VWP . . . 79
8.5 Comparison of VWP and AVWP . . . 80
8.6 Decay of the energy in VWP and AVWP for the ADI and explicit method . . . 82
8.7 Influence of the additional ε regularization in the ADI method . . . 83
8.8 Influence of the ε regularization of θ in Split Bregman . . . 84
8.9 Energy decay for ADI and Split Bregman . . . 85
8.10 Split Bregman fusion result . . . 85

9.1 Selected scene of the Urban image . . . 88
9.2 First example for sharpening hyperspectral images . . . 88
9.3 Second example for sharpening hyperspectral images . . . 89
9.4 Third example for sharpening hyperspectral images . . . 89
9.5 Fourth example for sharpening hyperspectral images . . . 90
9.6 Pixels selected to investigate the spectral signature . . . 90


9.7 Spectral response of pixel (1) . . . 90
9.8 Spectral response of pixels (2) and (3) . . . 91
9.9 Normed spectral response of pixels (2) and (3) . . . 91


Acknowledgements

I would like to thank

Martin Burger and Andrea Bertozzi for their advice and support throughout my whole thesis. Thank you for giving me the opportunity to do the research for my thesis at the University of California, Los Angeles.

Todd Wittman for all his guidance and the time he spent answering my questions.

Hem Wadhar, Paul Jones, Jan Hegeman, Ralf Engbers and Franziska Schneider for proof-reading, useful mathematical discussions and their LaTeX advice.

Melissa Strait, Sheida Rahmani and Daria Merkurev for providing the Matlab code for the image quality metrics.

Luminita Vese, Stanley Osher, Tom Goldstein, Ernest Esser, Julia Dobrosotskaya and Jarome Darbon for their discussions on image processing, wavelets, minimization problems and functional analysis.

My family, Maria, Gerd, Jan and Meike, my girlfriend Theresa, as well as Marianne, Charifa, Holger and Finn.

All my friends from Münster and Los Angeles, and everyone who supported and motivated me throughout the thesis.

This work was supported by the US Department of Defense, ONR grant N000140810363, NSF grantACI-0321917 and NSF grant DMS-0601395.

This thesis is dedicated to the memory of my beloved mother, Maria Möller.


1. INTRODUCTION

1.1 Pan-Sharpening

Satellite images can be important for several different detection and classification tasks. Modern satellites such as the Quickbird and Landsat-7 satellites not only take ordinary color images with sensors for red, green and blue (RGB), but also include near-infrared or infrared bands. Usually multispectral images consist of four to seven spectral bands that can be of great help for identification tasks, since the human eye fails to capture this information. For instance, vegetation responds much more strongly in the infrared band than man-made materials. This difference can be used for the detection of vegetation and camouflage in satellite images. Multispectral imagery is used for many other military applications such as perspective views (with the help of terrain elevation data) and relocatable target graphics. Further applications are estimating water depth, soil moisture content, or detecting the presence of fires ([Wik, Pri]).

One advantage of multispectral imagery is illustrated in Figure 1.1. The left image shows a common RGB image of an urban scene, while we display the near-infrared, red and green bands on the right. All vegetation has a strong response in the near-infrared channel, such that in the right image the vegetation appears red, opposed to its natural color. The identification of vegetation is much easier, and a green man-made object in a forest would immediately be visible in the false color image.

Fig. 1.1: RGB and false color image of an urban scene

The downside of having more spectral information is that these images can only be taken at rather low spatial resolution. It is an optics problem in sensor design that the more precise the frequency range a sensor records, the lower its spatial resolution. Therefore, many satellite imaging systems produce a so-called panchromatic image to accompany the multispectral imagery. A panchromatic image is a grayscale image which is taken over a wide range of frequencies. It usually spans the range of all multispectral bands, but has much higher spatial resolution. For example, the Quickbird satellite produces four-band multispectral images with 2.4 m resolution and panchromatic images with 0.6 m resolution.

At a resolution of 2.4 m it might become hard to see and identify certain objects. An analyst might run into problems because of the low spatial resolution of the multispectral data: when the multispectral analysis indicates a car (as an example of a man-made object) on the lawn, it will only be a 2-pixel object. No shape information is available, so it becomes almost impossible to distinguish between noise (i.e. a false alarm) and an object of interest.

The panchromatic image has a four times higher resolution and therefore sixteen times more spatial information. Shapes are much easier to identify in the panchromatic image, but here, of course, no spectral information is available. An image with both good spectral and good spatial resolution would be desirable. The technique of using the panchromatic image to enhance the resolution of the multispectral image is called sensor fusion or pan-sharpening.

The idea of pan-sharpening is illustrated in Figure 1.2.

Fig. 1.2: Main idea of pan-sharpening illustrated on the color bands

The goal of pan-sharpening is to combine the high spatial resolution of the panchromatic image with the precise spectral information of the multispectral image. The resulting image should have high visual quality to aid in detection and classification tasks. However, the pan-sharpened image should also contain the same spectral (color) information as the original multispectral data for precise identification of targets. This becomes especially important as the number of bands increases. Currently, hyperspectral sensors can take images in up to two hundred and ten different frequencies. Since every material has a unique spectral signature, these bands can even be used to identify exactly the material a certain object is made of. Again, having additional shape information would greatly help an analyst and decrease the rate of false alarms. Therefore, the pan-sharpened image should possess both high spatial and high spectral quality.

In this thesis we propose a new method for pan-sharpening that naturally extends to higher dimensional imagery and especially focuses on preserving the spectral information. Our new and very flexible approach is a variational method that involves the usage of wavelets. We will therefore give an introduction to wavelets and their extension to two dimensions in Chapter 2. In Chapter 3 recent pan-sharpening methods are presented before we describe our variational wavelet pan-sharpening (VWP) energy model in Chapter 4 and examine basic properties like the existence and uniqueness of a minimizer as well as optimality conditions in Chapter 5. The optimality conditions will lead us to the numerical implementation in Chapter 6. To evaluate the quality of the results VWP gives in comparison to other pan-sharpening methods, we will take eight image quality metrics into account, which we will briefly introduce in Chapter 7. Numerical results for pan-sharpening will be presented in Chapter 8 before we show the extension of our method to hyperspectral imagery in Chapter 9. Finally, we draw conclusions and suggest further areas of research in this field in Chapter 10.

1.2 Variational Image Processing

Variational image processing has experienced greatly increasing popularity during the last 30 years, and a large number of methods and applications have been proposed. In this section we explain the idea of variational image processing. The theoretical background will be given in Chapter 5, where we analyze our energy functional. This section is just a brief introduction to the ideas and concepts of variational models. For more detailed information we refer to [CS05] and [AK06].

The idea behind variational methods is based on simple calculus extremum problems. For example, take the following problem: we want to know the height and the radius of a cylindrical can that minimize the material needed for a given volume of the can. Mathematically speaking, this means we want to minimize the surface of the can under the constraint that the volume V is given, i.e. find the minimizer of A(h, r) = 2πrh + 2πr² such that V(h, r) = πr²h equals a given volume V0.

This example is, of course, easy to solve exactly (and the minimal surface is reached for r0 = (V0/(2π))^(1/3), h0 = 2r0). Now we can further extend the problem: the designer of the cans does not like the efficient shape of the cans. Although the surface for a given volume is minimal for h0 = 2r0, he would like to make the can look fancier by stretching it out, and decides that for the can to be better looking it should have height h = 3r. Clearly, having the most efficient shape and choosing h = 3r at the same time is not possible. One has to decide how important the minimal surface is in comparison to the preferred shape h = 3r.
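The claimed optimum is easy to check numerically. A small Python sketch (the volume V0 = 330 is an arbitrary choice for illustration, not a value from the text) verifies that the optimal height is twice the optimal radius and that the optimal radius beats nearby feasible radii:

```python
import math

def can_surface(r, V0):
    """Surface area of a closed cylinder of radius r holding volume V0."""
    h = V0 / (math.pi * r * r)          # height fixed by the volume constraint
    return 2 * math.pi * r * h + 2 * math.pi * r * r

V0 = 330.0                               # arbitrary example volume
r0 = (V0 / (2 * math.pi)) ** (1.0 / 3)   # claimed optimal radius
h0 = V0 / (math.pi * r0 * r0)            # corresponding height

print(h0 / r0)                           # ≈ 2, i.e. h0 = 2*r0
print(can_surface(r0, V0) < can_surface(0.9 * r0, V0))
print(can_surface(r0, V0) < can_surface(1.1 * r0, V0))
```

Eliminating h via the constraint reduces the problem to one variable, which is exactly how the closed-form solution above is derived.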

Page 11: A Variational Approach for Sharpening High Dimensional Imageswims.math.ecnu.edu.cn/ChunLi/973_Seminar_20120320/used... · 2012-03-20 · 2 Abstract Earth observing satellites usually

1.2. Variational Image Processing 11

To do so, one could minimize a combination of both constraints and weight the importance of the shape term with a parameter λ, i.e.

(r0, h0) = argmin_{r,h} [A(r, h) + λ(h − 3r)²]   s.t.   V(h, r) = V0.   (1.1)

The term (h − 3r)² now ensures that the actual height is close to the desired height 3r. The larger λ gets, the more important it is for this term to be small, and h will approach 3r.
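This trade-off can be observed numerically. In the sketch below (an illustration, not from the text: the volume V0 is arbitrary, the constraint is enforced by eliminating h, and a crude grid search stands in for a proper optimizer), the ratio h/r moves from 2 (the pure surface minimum) towards the designer's 3 as λ grows:

```python
import math

V0 = 330.0   # arbitrary fixed volume

def energy(r, lam):
    """A(r,h) + lam*(h - 3r)^2 with h eliminated via the constraint V = pi*r^2*h."""
    h = V0 / (math.pi * r * r)
    A = 2 * math.pi * r * h + 2 * math.pi * r * r
    return A + lam * (h - 3 * r) ** 2

def best_radius(lam, lo=1.0, hi=10.0, n=50000):
    """Crude grid search for the minimizing radius."""
    return min((lo + (hi - lo) * i / n for i in range(n + 1)),
               key=lambda r: energy(r, lam))

ratios = {}
for lam in (0.0, 1.0, 100.0):
    r = best_radius(lam)
    ratios[lam] = V0 / (math.pi * r ** 3)   # this equals h/r
    print(lam, round(ratios[lam], 3))       # ratio climbs from 2 towards 3
```

For λ = 0 the surface term wins (h/r ≈ 2); for large λ the shape term dominates and h/r approaches 3, exactly the behavior described above.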

The example of finding a can with small surface close to a desired shape is, of course, constructed for illustration purposes. The idea behind variational image processing methods, however, is the same. For certain problems, we design a so-called energy (functional) depending on an image, where a low value of the energy functional corresponds to a good quality image. The main difference is that in the above can example we were minimizing over two variables r, h ∈ R. Images are functions themselves that map an image domain Ω onto R (every point of our image gets a certain intensity). The minimization will therefore not be over Rⁿ but over a function space (typically a Banach space). The energy one minimizes becomes a functional (a function depending on a function).

The first step of creating a variational method is to design the energy functional. In our above can example we had two easy constraints, namely that the surface should be minimal and that the height of the can should be close to three times the radius, which then led to minimizing [A(r, h) + λ(h − 3r)²]. Let us look at a problem where variational methods are used for image processing: assume we have a noisy image u0, which consists of the true image f and some Gaussian noise n with variance σ.

To denoise this image variationally, the challenge is to find a good energy functional to minimize, such that the denoised image is the argument that minimizes the functional. Here the choice of the energy is much less obvious than for our simple can example.

One goal is that our denoised image should be close to the image we were given, since we still want to preserve the main features of the image. On the other hand, we want to reduce the noise. A characteristic of noise is an extremely fast changing, alternating gradient. We therefore want to penalize large gradient values. The idea could then be to define the energy functional

J(u) = ∫ (u − u0)² dx + λ ∫ |∇u| dx.   (1.2)

This energy functional is the famous ROF model, or total variation (TV) model [ROF92], which has been applied in various modifications to a great number of image processing tasks. The first term assures that the final image is close to the image we were given; the second term reduces the noise of the image. As in our previous example, we have a parameter (or Lagrange multiplier) λ that weights the importance of the two terms. Decreasing λ will lead to a result that is closer to the original image, which also means that it will be more noisy. Increasing λ will force the L¹ norm of the gradient to be smaller. This will reduce the noise but also the texture in the image, giving a cartoon-like image.
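A minimal 1-D sketch of this idea, assuming a smoothed total variation |∇u| ≈ sqrt((∇u)² + ε²) so that plain gradient descent applies (the step size, ε, and λ below are illustrative choices, not the schemes derived later in this thesis):

```python
import math, random

def tv_denoise_1d(u0, lam, eps=0.1, tau=0.02, iters=500):
    """Gradient descent on a smoothed 1-D ROF energy:
    J(u) = sum_i (u_i - u0_i)^2 + lam * sum_i sqrt((u_{i+1} - u_i)^2 + eps^2)."""
    u = list(u0)
    n = len(u)
    for _ in range(iters):
        grad = [2 * (u[i] - u0[i]) for i in range(n)]       # data fidelity term
        for i in range(n - 1):                              # smoothed TV term
            d = u[i + 1] - u[i]
            g = lam * d / math.sqrt(d * d + eps * eps)
            grad[i] -= g
            grad[i + 1] += g
        u = [u[i] - tau * grad[i] for i in range(n)]
    return u

random.seed(0)
clean = [0.0] * 32 + [1.0] * 32                     # a step edge
noisy = [c + random.gauss(0, 0.2) for c in clean]
denoised = tv_denoise_1d(noisy, lam=0.5)

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

print(mse(noisy, clean), mse(denoised, clean))      # the error drops after denoising
```

Raising `lam` flattens the result further (less noise, less texture); lowering it keeps the result closer to the noisy input, mirroring the role of λ in (1.2).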

Notice that, opposed to the example of the shape of a can, we have several more problems to consider. The first question is in which function space we want to find the minimizer: what is an appropriate class of functions/images we should be looking at? (For the can it was clear that the radius and the height had to be real numbers.) Secondly, how do we find the minimizer? For the can problem we were in R, so we could take derivatives; for a functional we first have to define what a derivative is. Furthermore, we need to investigate the questions of existence and uniqueness of minimizers. Finally, for general variational methods we will not be able to calculate the minimizer analytically (as in the can example), but will have to develop a numerical scheme for the minimization.

The modeling of the energy will be done in Chapter 4, before we address the theoretical questions mentioned above in Chapter 5.


2. OVERVIEW OF WAVELETS AND MULTIRESOLUTION ANALYSIS

Our new approach to pan-sharpening includes wavelets. In this chapter we will summarize the theorybehind wavelets and show how wavelets extend to two dimensions.

2.1 Introduction

The idea of wavelet transforms is to analyze a signal's frequency information locally, according to the scale at which we are looking at the signal. When we view our signal through a large window, we are more likely to see the gross features, while we notice small features when we look through a smaller window. The result in wavelet analysis is to see both the forest and the trees, so to speak ([Gra95]).

A wavelet ψ is a function with zero average:

∫_{−∞}^{+∞} ψ(t) dt = 0.   (2.1)

This so-called mother wavelet ψ is dilated with a scale parameter s and translated by u to form wavelet atoms ([Mal98]) given by

ψ_{u,s}(t) = (1/√s) ψ((t − u)/s).   (2.2)

The wavelet transform of a signal f ∈ L²(R) is then the response of all the different wavelet atoms to the signal:

Wf(u, s) = ∫_{−∞}^{+∞} f(t) (1/√s) ψ*((t − u)/s) dt.   (2.3)

Unlike the Fourier transform, where signals are analyzed by superposition of sinusoidal functions, the wavelet transform allows us to use functions of local support for the analysis. Therefore, wavelets are much better suited for signals with sharp spikes and discontinuities.
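As a concrete illustration (not taken from the thesis), (2.3) can be approximated by a Riemann sum. Using the Haar wavelet as the mother wavelet, the response is large when a wavelet atom straddles a discontinuity and vanishes where the signal is constant:

```python
def haar(t):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere (zero average)."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def wavelet_response(f, u, s, a=-2.0, b=4.0, n=6000):
    """Riemann sum for Wf(u, s) = ∫ f(t) (1/√s) ψ((t - u)/s) dt."""
    dt = (b - a) / n
    return sum(f(a + i * dt) * haar((a + i * dt - u) / s)
               for i in range(n)) * dt / s ** 0.5

# the wavelet itself has zero average (integrate over [0, 2))
dt = 1e-4
avg = sum(haar(i * dt) for i in range(20000)) * dt
print(avg)                                           # 0.0

def step(t):
    """A signal with a jump at t = 1."""
    return 1.0 if t > 1.0 else 0.0

big = abs(wavelet_response(step, 0.75, 0.5))         # atom straddling the jump
small = abs(wavelet_response(step, 2.5, 0.5))        # atom on a constant region
print(big, small)                                    # large response at the edge only
```

A sinusoid would respond everywhere; the locally supported wavelet atom isolates exactly where the discontinuity sits, which is the point made above.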

2.1.1 Multiresolution Analysis (MRA)

The idea behind MRA is to analyze a signal at a resolution that contains only the relevant details for a particular task. A signal f ∈ L²(R) is approximated on a discrete grid. One can think of an MRA as a collection of embedded grids on which the signal can be approximated. To shorten this introduction, we directly focus on orthogonal MRA.

Definition 2.1.1. A sequence {V_j}_{j∈Z} of closed subspaces of L²(R) is an orthonormal MRA if the following six properties are satisfied:

1. (Nestedness) ∀j ∈ Z: V_{j+1} ⊂ V_j.

2. (Shift invariance) ∀(j, k) ∈ Z²: f(t) ∈ V_j ⇔ f(t − 2^j k) ∈ V_j.

3. (Dyadic similarity) ∀j ∈ Z: f(t) ∈ V_j ⇔ f(t/2) ∈ V_{j+1}.

4. (Separation) lim_{j→+∞} V_j = ∩_{j=−∞}^{+∞} V_j = {0}.

5. (Completeness) lim_{j→−∞} V_j = closure(∪_{j=−∞}^{+∞} V_j) = L²(R).

6. (Translation seed) There exists φ such that {φ(t − n)}_{n∈Z} is an orthonormal basis of V_0.


To get a better understanding of this definition, let us interpret the different properties heuristically. The nestedness (1) is a causality property which implies that the information of our signal at a certain scale is sufficient to calculate an approximation at the next coarser scale. Functions in V_j are invariant to shifts proportional to 2^j (2). This encourages us to think of the V_j as uniform grids with intervals of length 2^j. When one enlarges a function by a factor of two, more details can be seen. Vice versa, scaling down by a factor of two means going to the next coarser resolution. These ideas are expressed in the dyadic similarity (3). When the resolution decreases, we finally lose all details of f (4), and when it is increased we gain details. In the limit, our approximation converges to the original signal f (5).

The existence of such an MRA with an orthonormal basis of V_0 as in (6) is not obvious. It was shown in [Mal98, p. 225, Theorem 1] that an orthonormal basis can be obtained when a Riesz basis is given. Given such an orthonormal basis, the dyadic similarity implies that for

φ_{j,k}(t) = 2^{−j/2} φ((t − 2^j k)/2^j),   j, k ∈ Z,   (2.4)

the set {φ_{j,k}}_{k∈Z} becomes an orthonormal basis of V_j for every j. This allows us to give an easy formula for the approximation of a signal f ∈ L²(R) at a desired resolution. Because of the completeness property, f can be arbitrarily well approximated by its projection onto V_j:

f_j = P_j f = Σ_k ⟨f, φ_{j,k}⟩ φ_{j,k}.    (2.5)

φ is called father wavelet or (more commonly) scaling function.
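For the Haar MRA, where φ is the indicator function of [0, 1), the projection (2.5) has a very concrete form: P_j f replaces f on each dyadic interval of length 2^j by its mean. A small numerical sketch (Haar assumed; the helper is ours, not code from the thesis):

```python
import numpy as np

def haar_projection(f, j):
    """Project samples f (length divisible by 2**j) onto V_j for the
    Haar MRA: each block of 2**j samples is replaced by its mean."""
    f = np.asarray(f, dtype=float)
    block = 2 ** j
    means = f.reshape(-1, block).mean(axis=1)
    return np.repeat(means, block)

f = np.array([4.0, 2.0, 1.0, 3.0])
f1 = haar_projection(f, 1)           # means over pairs: 3.0 and 2.0
assert np.allclose(f1, [3, 3, 2, 2])
# Coarser approximations lose detail but keep the mean (lowpass behaviour).
assert np.isclose(haar_projection(f, 2).mean(), f.mean())
```

Each further level halves the resolution, matching the heuristic picture of the V_j as uniform grids with spacing 2^j.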

2.1.2 Wavelets in MRA

In the framework of multiresolution analysis, wavelets correspond to the details lost when going from a certain approximation level V_j to the next coarser one V_{j+1}. Due to the dyadic similarity, it is sufficient to look at a pair of consecutive subspaces: V_1 ⊂ V_0. Projecting a signal f_0 ∈ V_0 onto V_1, f_1 = P_1 f_0, means wiping out some details that are impossible to detect in V_1 ([CS05]). Let W_1 denote the range of (I − P_1)|_{V_0}, i.e. the space of details that can be seen in V_0 but not in V_1. Then

V_0 = V_1 ⊕ W_1,   P_1 W_1 = {0}.    (2.6)

This procedure can easily be generalized to spaces W_j that satisfy V_j = V_{j+1} ⊕ W_{j+1}. These spaces W_j inherit the dyadic similarity of the MRA:

ψ(t) ∈ W_j ⇔ ψ(t/2) ∈ W_{j+1}.    (2.7)

Furthermore, the completeness of the Vj yields

L^2(R) = V_J ⊕ (⊕_{j≤J} W_j) = ⊕_{j=−∞}^{+∞} W_j.    (2.8)

A signal f ∈ L^2(R) is therefore the sum of all its details at all scales. Without going into the details of the proof, one can construct a so-called mother wavelet ψ such that

the family of dilated and translated mother wavelet functions

{ψ_{j,n}(t) = (1/√(2^j)) ψ((t − 2^j n)/2^j)}_{n∈Z}    (2.9)

forms an orthonormal basis of Wj .

2.1.3 Computation of the Wavelet Transform

By the dyadic similarity we know that 2^{−1/2} φ(t/2) ∈ V_1, and because of the nestedness, 2^{−1/2} φ(t/2) is also an element of V_0. Since {φ(t − n)}_{n∈Z} is an orthonormal basis of V_0, we can express 2^{−1/2} φ(t/2) as the expansion


2^{−1/2} φ(t/2) = Σ_{n=−∞}^{+∞} h[n] φ(t − n)    (2.10)

with

h[n] = ⟨2^{−1/2} φ(t/2), φ(t − n)⟩.    (2.11)

This h[n] can be interpreted as a discrete filter. One can prove that the scaling function of an MRA never has zero average and can therefore be normalized to average one. This transfers to the filter, implying a lowpass condition on h[n].

An analogous argument can be made for the wavelet function: ψ(t/2) is an element of W_1. Since V_0 = V_1 ⊕ W_1, ψ(t/2) is also an element of V_0 and can therefore be expressed by the following expansion:

2^{−1/2} ψ(t/2) = Σ_{n=−∞}^{+∞} g[n] φ(t − n)    (2.12)

with

g[n] = ⟨2^{−1/2} ψ(t/2), φ(t − n)⟩.    (2.13)

g[n] can be interpreted as a highpass filter. Heuristically this also makes sense: the scaling function gives an approximation of the original signal and corresponds to a lowpass filter. The wavelet functions carry the missing information and should therefore correspond to a highpass filter containing the high frequency detail information.
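For the Haar basis the inner products (2.11) and (2.13) can be evaluated in closed form, and the lowpass/highpass conditions can be checked numerically (an illustrative sketch under the Haar assumption, not code from the thesis):

```python
import numpy as np

# Haar scaling function: phi = 1 on [0,1). Then 2^{-1/2} phi(t/2) equals
# 2^{-1/2} on [0,2), so by (2.11) h[0] = h[1] = 1/sqrt(2), all other h[n] = 0.
h = np.array([1.0, 1.0]) / np.sqrt(2.0)
# Haar wavelet: psi = 1 on [0,1/2), -1 on [1/2,1); (2.13) gives
# g[0] = 1/sqrt(2) and g[1] = -1/sqrt(2).
g = np.array([1.0, -1.0]) / np.sqrt(2.0)

# Lowpass condition: the filter coefficients of h sum to sqrt(2)
# (nonzero average of the scaling function).
assert np.isclose(h.sum(), np.sqrt(2.0))
# Highpass condition: the coefficients of g sum to zero (zero-mean wavelet).
assert np.isclose(g.sum(), 0.0)
# Both filters have unit energy, reflecting the orthonormality of the basis.
assert np.isclose((h ** 2).sum(), 1.0) and np.isclose((g ** 2).sum(), 1.0)
```

The same two conditions (nonzero sum for h, zero sum for g) hold for every orthogonal wavelet filter pair, not only for Haar.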

To gain a better understanding of why the projection onto V_j corresponds to a lowpass filter and why the coefficients from the projection onto W_j correspond to a highpass filtering, we can look at the decomposition in the Fourier domain.

The projection of f ∈ L^2(R) onto V_j is given by the expansion in the orthonormal scaling basis:

P_{V_j} f = Σ_{n=−∞}^{+∞} ⟨f, φ_{j,n}⟩ φ_{j,n}.    (2.14)

The inner products

a_j[n] = ⟨f, φ_{j,n}⟩    (2.15)

are the discrete approximation of f at the scale 2^j. This L^2 inner product can be written as a convolution:

a_j[n] = ∫_{−∞}^{+∞} f(t) (1/√(2^j)) φ((t − 2^j n)/2^j) dt = (f ∗ φ̄_j)(2^j n)    (2.16)

with φ̄_j(t) = √(2^{−j}) φ(−2^{−j} t).

For the interpretation of these coefficients it is helpful to look at them in the Fourier domain, because the convolution becomes a simple multiplication there. Doing so, it turns out that the Fourier transform φ̂ of our scaling function φ typically has its energy concentrated in [−π, π].

The Fourier transform of each function φ̄_j(t) can be computed as √(2^j) φ̂*(2^j ω) and has almost all of its energy concentrated in the interval [−2^{−j}π, 2^{−j}π]. The multiplication in the Fourier domain therefore has the effect that all Fourier coefficients outside the interval [−2^{−j}π, 2^{−j}π] become (almost) zero. The high frequencies are discarded, which is why the discrete approximation a_j[n] can be interpreted as a lowpass filtering of f sampled at intervals 2^j.


An analogous argument can be made for the detail coefficients d_j[n] = ⟨f, ψ_{j,n}⟩. Again we look at the Fourier transform of the projection function (in this case the mother wavelet ψ). Typically, the energy of ψ̂ is essentially concentrated in [−2π, −π] ∪ [π, 2π], so the projection filters out both the low and the very high frequencies. Now we have an impression of the effect of the projections onto V_j and W_j and of how to associate filters h[n] and g[n] with certain wavelet and scaling functions. But how can we use this knowledge to derive a scheme for fast calculation of wavelet decompositions and reconstructions?

The most common algorithm is the fast orthogonal wavelet transform. It computes each level of decomposition from the approximation coefficients of the previous level using the filters h[n] and g[n].

Theorem 2.1.2. Let the approximation coefficients of the projection onto V_j be a_j[n] = ⟨f, φ_{j,n}⟩ and the detail coefficients of the projection onto W_j be d_j[n] = ⟨f, ψ_{j,n}⟩. We further denote x̄[n] = x[−n].

The next level of decomposition can then be calculated from the previous one using the filters h[n] and g[n] as follows:

a_{j+1}[p] = Σ_{n=−∞}^{+∞} h[n − 2p] a_j[n] = (a_j ∗ h̄)[2p],    (2.17)

d_{j+1}[p] = Σ_{n=−∞}^{+∞} g[n − 2p] a_j[n] = (a_j ∗ ḡ)[2p].    (2.18)

Proof. Because of the nestedness of the V_j, one can express φ_{j+1,p} with the help of the φ_{j,n}:

φ_{j+1,p} = Σ_{n=−∞}^{+∞} ⟨φ_{j+1,p}, φ_{j,n}⟩ φ_{j,n}.    (2.20)

We change the variable t to t′ = 2^{−j} t − 2p and obtain:

⟨φ_{j+1,p}, φ_{j,n}⟩ = ∫_{−∞}^{+∞} (1/√(2^{j+1})) φ((t − 2^{j+1} p)/2^{j+1}) · (1/√(2^j)) φ*((t − 2^j n)/2^j) dt

= ∫_{−∞}^{+∞} (1/√2) φ(t′/2) φ*(t′ − (n − 2p)) dt′

= ⟨(1/√2) φ(t′/2), φ(t′ − (n − 2p))⟩ = h[n − 2p].

Similarly, W_{j+1} is also a subset of V_j. Thus, ψ_{j+1,p} can also be expressed in terms of the φ_{j,n}:

ψ_{j+1,p} = Σ_{n=−∞}^{+∞} ⟨ψ_{j+1,p}, φ_{j,n}⟩ φ_{j,n}.    (2.21)

Doing the same calculation as above proves that

ψ_{j+1,p} = Σ_{n=−∞}^{+∞} g[n − 2p] φ_{j,n}.    (2.22)

These two decomposition formulas show that a_{j+1} and d_{j+1} can be computed by filtering with h̄ or ḡ, respectively, followed by subsampling by a factor of two.

Vice versa, the finer-scale approximation can be computed from the detail and approximation coefficients of the coarser scale.

Theorem 2.1.3. With x̌[n] = x[p] if n = 2p, and x̌[n] = 0 if n = 2p + 1, the following reconstruction formula holds:

a_j[p] = Σ_{n=−∞}^{+∞} h[p − 2n] a_{j+1}[n] + Σ_{n=−∞}^{+∞} g[p − 2n] d_{j+1}[n]

= (ǎ_{j+1} ∗ h)[p] + (ď_{j+1} ∗ g)[p].    (2.23)


Proof. V_{j+1} and W_{j+1} are orthogonal complements of each other in V_j, and their direct sum gives V_j. This is the reason why the union of their orthonormal bases is an orthonormal basis of V_j, so φ_{j,p} can be written as the following expansion:

φ_{j,p} = Σ_{n=−∞}^{+∞} ⟨φ_{j,p}, φ_{j+1,n}⟩ φ_{j+1,n} + Σ_{n=−∞}^{+∞} ⟨φ_{j,p}, ψ_{j+1,n}⟩ ψ_{j+1,n}.    (2.24)

The relations ⟨φ_{j+1,p}, φ_{j,n}⟩ = h[n − 2p] and ⟨ψ_{j+1,p}, φ_{j,n}⟩ = g[n − 2p] from the proof of the decomposition formula then give Equation (2.23).

Numerically, one only needs the filters h and g for a given wavelet basis. The decomposition can then be done by filtering with h̄ and ḡ followed by downsampling. To reconstruct the previous level of approximation, one upsamples the coefficients (by inserting zeros between each pair of sample values) and filters the upsampled coefficients with h and g respectively. Adding up these results gives the reconstruction.

This procedure is called the fast orthogonal wavelet transform. In applications we usually deal with finite, digital signals. The fact that the signal is digital is easy to incorporate into our calculations, since being digital means nothing but having an approximation at a certain scale. We therefore simply assume that our input signal is already given by the coefficients of some V_J. The simplest way of dealing with the finiteness of the signal is periodization, which in the algorithm leads to circular convolutions with the corresponding filters.
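For the Haar filters the whole scheme fits in a few lines. The following sketch (Haar filters assumed; the function names are ours, not from the thesis) performs one decomposition and reconstruction step with circular indexing and verifies the perfect reconstruction property:

```python
import numpy as np

h = np.array([1.0, 1.0]) / np.sqrt(2.0)   # Haar lowpass filter
g = np.array([1.0, -1.0]) / np.sqrt(2.0)  # Haar highpass filter

def dwt_step(a):
    """One level of the fast orthogonal Haar transform with periodization:
    a_{j+1}[p] = sum_n h[n-2p] a_j[n], d_{j+1}[p] = sum_n g[n-2p] a_j[n]."""
    a = np.asarray(a, dtype=float)
    n = np.arange(len(a) // 2) * 2
    approx = h[0] * a[n] + h[1] * a[(n + 1) % len(a)]
    detail = g[0] * a[n] + g[1] * a[(n + 1) % len(a)]
    return approx, detail

def idwt_step(approx, detail):
    """Inverse step: upsample by inserting zeros, filter with h and g, add."""
    a = np.zeros(2 * len(approx))
    a[0::2] = h[0] * approx + g[0] * detail
    a[1::2] = h[1] * approx + g[1] * detail
    return a

x = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
ca, cd = dwt_step(x)
assert np.allclose(idwt_step(ca, cd), x)   # perfect reconstruction
```

Longer filters (e.g. Daubechies) follow the same pattern but need genuine circular convolutions instead of the two-tap indexing used here.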

So far we have only looked at one-dimensional signals, i.e. f ∈ L^2(R). How can the wavelet transform be extended to images, i.e. to two-dimensional signals?

2.2 Orthogonal Wavelets in Two Dimensions

2.2.1 Extension of Orthonormal MRA to Two Dimensions

In this section we follow the concept of [Mal98] to introduce two-dimensional MRA. One idea for the extension could be the following: to any orthogonal wavelet basis {ψ_{j,n}}_{(j,n)∈Z^2} of L^2(R) one can associate an orthonormal wavelet basis of L^2(R^2) by

{ψ_{j1,n1}(x_1) ψ_{j2,n2}(x_2)}_{(j1,n1,j2,n2)∈Z^4}.    (2.25)

Using this basis would mean analyzing a signal at two different scales along the two different directions, which is inappropriate for most tasks in image processing. We would rather analyze both directions at the same scale. This idea leads to a different choice of wavelet basis and to the so-called separable multiresolutions.

We want to adapt the concept of MRA to the two-dimensional case in a straightforward way: we call the orthogonal projection of an image f(x_1, x_2) at the resolution 2^{−j} its approximation on a space V^2_j ⊂ L^2(R^2), which is the set of all approximations at that scale. These subspaces V^2_j have to fulfill exactly the same properties as for MRA in the one-dimensional case (Definition 2.1.1). If we only take separable multiresolutions into account, then the two-dimensional subspaces can be decomposed into a tensor product of spaces:

V^2_j = V_j ⊗ V_j.    (2.26)

Any function in V^2_j can be expressed as a linear expansion of the following form:

f(x_1, x_2) = Σ_{m=−∞}^{+∞} a[m] f_m(x_1) g_m(x_2),   f_m, g_m ∈ V_j.    (2.27)


One can prove that there exists an orthonormal basis of each V^2_j generated by dilation and translation of the two-dimensional scaling function φ^2(x) = φ(x_1)φ(x_2), where φ is a one-dimensional scaling function. This basis is given by

{φ^2_{j,n}(x) = φ_{j,n1}(x_1) φ_{j,n2}(x_2) = (1/2^j) φ((x_1 − 2^j n_1)/2^j) φ((x_2 − 2^j n_2)/2^j)}_{n∈Z^2}.    (2.28)

We would, of course, like to keep the concepts we know from the one-dimensional case, i.e. a wavelet subspace that represents the details lost by going to the next coarser resolution:

V^2_j = V^2_{j+1} ⊕ W^2_{j+1}.    (2.29)

Let W^2_{j+1} be the orthogonal complement of V^2_{j+1} in V^2_j = V_j ⊗ V_j. The following theorem shows how to construct an orthonormal wavelet basis of each subspace W^2_j and of L^2(R^2) ([Mal98]).

Theorem 2.2.1. Let φ be a scaling function and ψ the corresponding wavelet generating an orthonormal wavelet basis of L^2(R). We define three wavelets:

ψ^1(x) = φ(x_1)ψ(x_2),   ψ^2(x) = ψ(x_1)φ(x_2),   ψ^3(x) = ψ(x_1)ψ(x_2),    (2.30)

and denote for 1 ≤ k ≤ 3

ψ^k_{j,n}(x) = (1/2^j) ψ^k((x_1 − 2^j n_1)/2^j, (x_2 − 2^j n_2)/2^j).    (2.31)

Then the wavelet family

{ψ^1_{j,n}, ψ^2_{j,n}, ψ^3_{j,n}}_{n∈Z^2}    (2.32)

is an orthonormal basis of W^2_j and

{ψ^1_{j,n}, ψ^2_{j,n}, ψ^3_{j,n}}_{(j,n)∈Z^3}    (2.33)

is an orthonormal basis of L^2(R^2).

This theorem shows how to design a two-dimensional wavelet basis. In one dimension, the projection onto the wavelet basis functions gave us the details of the signal and acted like a highpass filter. In the two-dimensional case we have three wavelet basis functions. To analyze their practical meaning we can look at their Fourier transforms, just like we did for the one-dimensional functions. Since we have a separable wavelet basis, the Fourier transform of each ψ^k is just the product of the Fourier transforms of the corresponding ψ and φ. Figure 2.1 from [Fug] shows the magnitude of the Fourier transform of the three wavelet functions and the scaling function.

The energy of the scaling function is, as expected, concentrated in the low frequency area, so it acts like a lowpass filter. As in the one-dimensional case, a projection with φ gives an approximation of the original data. The energy of ψ^1 and ψ^2 is distributed along certain directions: ψ̂^1(ω_1, ω_2) is large at low horizontal and high vertical frequencies, whereas ψ̂^2(ω_1, ω_2) is large at high horizontal and low vertical frequencies. Therefore, one can think of the coefficients of ψ^1 as the vertical details and of ψ^2 as the horizontal details. ψ^3 represents the diagonal details, since ψ̂^3(ω_1, ω_2) is large at high vertical and high horizontal frequencies.

Figure 2.2 shows an example of a wavelet decomposition of an image.


Fig. 2.1: Fourier transform of a separable scaling function and three separable wavelet functions calculated from a one-dimensional Daubechies 4 wavelet, source [Fug]

Fig. 2.2: Second level wavelet decomposition of an image

2.2.2 The Fast Wavelet Transform in Two Dimensions

The one-dimensional fast wavelet transform can easily be extended to two dimensions. We denote for any j and any n = (n_1, n_2): a_j[n] = ⟨f, φ^2_{j,n}⟩ and d^k_j[n] = ⟨f, ψ^k_{j,n}⟩ for 1 ≤ k ≤ 3. For any pair of one-dimensional filters y[m] and z[m] we write the separable product filter yz[m] = y[m_1] z[m_2] and ȳ[m] = y[−m]. Let h[m] and g[m] be the conjugate mirror filters associated with the one-dimensional functions φ and ψ as in (2.11) and (2.13).

As in the one-dimensional case, the coefficients of the next level of decomposition can be calculated from the previous approximation coefficients by two-dimensional convolutions with the corresponding filters. Thanks to the separable wavelets, these two-dimensional convolutions can be factored into one-dimensional convolutions along the rows and columns of the image.

a_{j+1}[n] = (a_j ∗ h̄h̄)[2n],    (2.34)

d^1_{j+1}[n] = (a_j ∗ h̄ḡ)[2n],    (2.35)

d^2_{j+1}[n] = (a_j ∗ ḡh̄)[2n],    (2.36)

d^3_{j+1}[n] = (a_j ∗ ḡḡ)[2n].    (2.37)


A fast way of computing these coefficients is to convolve the rows of a_j with h̄ and ḡ and subsample the rows of each result by two. Then the columns of each of the two filtering results are convolved with h̄ and ḡ and subsampled again. This concept is illustrated in Figure 2.3.

Fig. 2.3: Concept of the two-dimensional fast wavelet decomposition

The reconstruction can be done in an analogous way: we denote by y̌[n] = y̌[n_1, n_2] the image with twice the side length of y[n], obtained by inserting zeros in every other row and column. The approximation coefficients a_j can be recovered from the approximation and detail coefficients a_{j+1} and d^k_{j+1}, 1 ≤ k ≤ 3, by the following formula:

a_j[n] = (ǎ_{j+1} ∗ hh)[n] + (ď^1_{j+1} ∗ hg)[n] + (ď^2_{j+1} ∗ gh)[n] + (ď^3_{j+1} ∗ gg)[n].    (2.38)

These two schemes allow the decomposition of images into approximation and detail coefficients with perfect reconstruction, a fast algorithm and, thanks to the subsampling, without having to store more data than the original image contains. The above wavelet transform is completely non-redundant; nevertheless, there are some tasks in image processing, e.g. image fusion, where this non-redundancy might cause problems.

In the next chapter we describe recent pan-sharpening methods, including wavelet pan-sharpening in Section 3.4, and go into the details of these non-redundancy problems.


3. RECENT PAN-SHARPENING TECHNIQUES

Several methods have been proposed for pan-sharpening multispectral imagery. Many techniques express the panchromatic image as a linear combination of the multispectral bands, including the intensity-hue-saturation (IHS) and Brovey methods. Other methods project the images into a different space, like Principal Component Analysis (PCA). Several authors have proposed using the wavelet transform to extract geometric edge information from the panchromatic image. Recently, Ballester et al. proposed a variational method called P+XS image fusion that explicitly forces the edges of the pan-sharpened image to line up with those in the panchromatic image ([BCIV06]). Because of the huge variety of different methods and hybrid methods, we focus in this chapter on summarizing the basic and most popular pan-sharpening methods. The methods presented here are the ones we used for the evaluation and comparison of our proposed method in the numerical results section.

Before we jump into the description of the actual methods, let us give a mathematical formulation of the problem and clarify the notation. Let us denote the true panchromatic image on the image domain Ω by Pan : Ω → R and the true multispectral image by ~Mul : Ω → R^N. We are given the discretization of the panchromatic image on a fine grid Ω_1 with I_1 × J_1 pixels, denoted by P ∈ R^{I_1×J_1}. The discretization of the multispectral image ~M ∈ R^{I_2×J_2×N} is given on a coarser grid Ω_2 with I_2 × J_2 pixels and consists of N different bands. For our Quickbird data we usually have I_1 = 4I_2, J_1 = 4J_2 and N = 4. The problem of pan-sharpening is the following: find the discretization of the true multispectral image on the finer grid Ω_1. This fused image is denoted by ~u ∈ R^{I_1×J_1×N}.

All methods involve calculations with the panchromatic image and the multispectral bands. In the continuous model with Pan and Mul_i this is well defined, but in the discrete case P and M_i live on different grids. To give sense to expressions like P − M_i we must first perform some interpolation, also called upsampling. Whenever calculations with the multispectral and panchromatic image are performed, M_i is extended to the grid Ω_1 of P by bilinear interpolation, and we denote the upsampled image by ↑M_i.
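The upsampling operator can be sketched with plain bilinear interpolation (an illustrative sketch; `upsample_bilinear` is a hypothetical helper, and production code would typically rely on a library resampler):

```python
import numpy as np

def upsample_bilinear(m, factor):
    """Bilinearly interpolate a 2-D band onto a grid `factor` times finer,
    a generic sketch of the upsampling operator written "uparrow M_i"."""
    rows, cols = m.shape
    r = np.linspace(0, rows - 1, factor * rows)
    c = np.linspace(0, cols - 1, factor * cols)
    # Interpolate along each row first, then along each resulting column.
    tmp = np.array([np.interp(c, np.arange(cols), row) for row in m])
    return np.array([np.interp(r, np.arange(rows), col) for col in tmp.T]).T

m = np.array([[0.0, 4.0], [8.0, 12.0]])
u = upsample_bilinear(m, 2)
assert u.shape == (4, 4)
# corner values are preserved; interior values are weighted averages
assert np.isclose(u[0, 0], 0.0) and np.isclose(u[-1, -1], 12.0)
```

For the Quickbird setting described above, factor = 4 would map the coarse multispectral grid onto the panchromatic grid.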

3.1 IHS Image Fusion

The most popular pan-sharpening method is the intensity-hue-saturation (IHS) fusion technique. It has been used as a standard procedure in many commercial packages ([CKCK]). The idea is to transform an image from the red-green-blue (RGB) into the intensity-hue-saturation (IHS) space, replace the intensity image by the panchromatic image, and apply the inverse transform.

More mathematically, the first step is the transformation from the RGB to the IHS color space:

( I  )   (  1/3     1/3     1/3  ) ( ↑M_1 )
( v_1 ) = ( −√2/6   −√2/6   2√2/6 ) ( ↑M_2 )
( v_2 )   ( 1/√2    −1/√2    0    ) ( ↑M_3 ) .    (3.1)

To reduce the spectral distortion one applies so-called histogram matching, which ensures that the mean and standard deviation of the panchromatic image match the mean and standard deviation of the intensity image. This normalization is

P = (P − µ(P))/σ(P) · σ(I) + µ(I),    (3.2)

where µ and σ denote the mean and the standard deviation, respectively. After that, P has the same mean and the same standard deviation as I. As a next step the intensity image I is replaced by the


panchromatic image and the inverse transform is applied:

( u_1 )   ( 1   −1/√2    1/√2 ) ( P  )
( u_2 ) = ( 1   −1/√2   −1/√2 ) ( v_1 )
( u_3 )   ( 1    √2      0    ) ( v_2 ) .    (3.3)

These two steps are equivalent to the fusion formula

( u_1 )   ( ↑M_1 + (P − I) )
( u_2 ) = ( ↑M_2 + (P − I) )
( u_3 )   ( ↑M_3 + (P − I) ) .    (3.4)
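The histogram matching (3.2) and the fusion formula (3.4) can be sketched together in a few lines of numpy (an illustrative sketch with hypothetical helper names; the bands are assumed to be already upsampled to the panchromatic grid):

```python
import numpy as np

def match_moments(pan, ref):
    """Histogram matching in the sense of (3.2): rescale pan so that its
    mean and standard deviation equal those of the reference image."""
    return (pan - pan.mean()) / pan.std() * ref.std() + ref.mean()

def ihs_fuse(bands, pan):
    """IHS fusion via (3.4): add the difference between the matched pan
    image and the intensity image to every band. `bands` is an
    (N, rows, cols) stack already upsampled to the panchromatic grid."""
    intensity = bands.mean(axis=0)
    pan_matched = match_moments(pan, intensity)
    return bands + (pan_matched - intensity)

bands = np.stack([np.eye(2) * (i + 1) for i in range(3)])  # toy 3-band image
pan = np.array([[2.0, 0.5], [0.5, 2.0]])
fused = ihs_fuse(bands, pan)
assert fused.shape == bands.shape
# every band receives the same additive spatial correction,
# so differences between bands are preserved exactly
assert np.allclose(fused[1] - fused[0], bands[1] - bands[0])
```

Note that the sketch uses equal weights 1/3 for the intensity; the matrix form (3.1) defines exactly this intensity for three bands.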

Some work has been done to also include the near-infrared band and further improve the spectral quality of this fusion method [THHC04]. The classical four-band IHS is

( u_1 )   ( ↑M_1 + (P − I) )
( u_2 ) = ( ↑M_2 + (P − I) )
( u_3 )   ( ↑M_3 + (P − I) )
( u_4 )   ( ↑M_4 + (P − I) ) ,    (3.5)

with I = (1/4) Σ_{i=1}^{4} ↑M_i. In later papers, I was chosen as I = Σ_{i=1}^{4} α_i ↑M_i for different mixing coefficients α_i, which were determined experimentally for the IKONOS system ([CKCK]).

To see the general assumption behind all IHS methods, consider the case where we already have the true high resolution image M_i^true. Then our fusion method should keep these high resolution images, i.e. u_i = M_i^true, which would imply that P = Σ_{i=1}^{4} α_i M_i^true. The main assumption of IHS therefore seems to be that the panchromatic image is a linear combination of the different multispectral bands. This assumption is made by many fusion methods; we discuss it in Section 3.6.

3.2 Brovey Image Fusion

The Brovey fusion method is based on multiplication with a ratio. The multispectral bands are normalized and each band is multiplied with the panchromatic image ([DYKS07]):

u_i = ↑M_i / ((1/N) Σ_{j=1}^{N} ↑M_j) · P.    (3.6)

If we again assume that we are given the true multispectral image, the fusion method should not change it, and we obtain the same kind of condition as for IHS: P = (1/N) Σ_{i=1}^{N} M_i^true. The panchromatic image is again assumed to be a linear combination of the different bands.
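Equation (3.6) translates directly into code. The sketch below is illustrative (`brovey_fuse` is a hypothetical helper, and the small `eps` guarding against division by zero is our addition, not part of (3.6)):

```python
import numpy as np

def brovey_fuse(bands, pan, eps=1e-12):
    """Brovey fusion (3.6): divide each upsampled band by the mean image
    of all bands and multiply by the panchromatic image."""
    mean_image = bands.mean(axis=0)
    return bands / (mean_image + eps) * pan

bands = np.array([[[2.0, 1.0]], [[4.0, 3.0]]])   # two 1x2 bands
pan = np.array([[6.0, 2.0]])
fused = brovey_fuse(bands, pan)
# first pixel: band mean 3, ratios 2/3 and 4/3, scaled by pan = 6
assert np.allclose(fused[:, 0, 0], [4.0, 8.0])
# Brovey preserves the band ratios: fused_i / fused_j = M_i / M_j,
# which is why it keeps hue but can distort overall intensity
assert np.allclose(fused[1] / fused[0], bands[1] / bands[0])
```

The ratio-preservation checked in the last assertion is the characteristic property of the Brovey method.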

3.3 PCA Image Fusion

Principal Component Analysis (PCA) is a statistical method that uses an orthogonal linear transform to project the data into a basis in which the greatest variance lies in the first coordinate (called the first principal component), the second greatest variance in the second coordinate, and so on. Mathematically, the data is normalized to have zero mean, then the covariance matrix and its eigenvalues and eigenvectors are calculated. The eigenvector with the highest eigenvalue is the first principal component ([Shl05, She92]).

For the task of pan-sharpening, the PCA transform is applied to all multispectral bands. The idea is that the principal component with the highest variance should contain most of the spatial information (since edges have large changes in intensity and therefore cause large variance). After histogram matching, the first principal component is replaced by the panchromatic image to enhance the edges and increase the spatial quality of the image. To illustrate this procedure we take the low resolution multispectral image shown in Figure 3.1.


Fig. 3.1: Low resolution multispectral image

As mentioned above PCA is a statistical method. The image is not viewed in the spatial domain,but as a huge dataset of pixels and their intensity values. Since our multispectral image has four bandswe also get four datasets. The next step is to apply the PCA transform. The concept of an image as adataset and the corresponding PCA transform is shown in Figure 3.2.

Fig. 3.2: PCA transform

The first principal component PC1 corresponds to the highest variance (the blue data points in Figure 3.2). Next we take the panchromatic image and match its histogram to that of the first principal component, i.e.

P = (P − µ(P))/σ(P) · σ(PC1) + µ(PC1),    (3.7)

as shown in Figure 3.3. After that, the first principal component is replaced by this panchromatic image and the inverse PCA transform is applied to obtain the sharpened image (Figure 3.4).

Fig. 3.3: Panchromatic image as a dataset after the histogram matching


Fig. 3.4: Reconstruction of the sharpened image

PCA does not explicitly assume the panchromatic image to be a linear combination of the multispectral bands, but it is still a linear method. In Section 8 we will see that PCA is a fast method with good spatial resolution improvement, but that it also causes spectral distortion.
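The PCA replacement step described above can be sketched compactly in numpy (an illustrative sketch; `pca_sharpen` is a hypothetical helper, not the thesis implementation, with the bands flattened into an (N, pixels) matrix):

```python
import numpy as np

def pca_sharpen(bands, pan):
    """PCA pan-sharpening sketch: project the bands onto their principal
    components, replace the first component by the moment-matched
    panchromatic image, and transform back."""
    n, rows, cols = bands.shape
    x = bands.reshape(n, -1)
    mu = x.mean(axis=1, keepdims=True)
    cov = np.cov(x)                            # band covariance matrix
    vals, vecs = np.linalg.eigh(cov)           # eigenvalues ascending
    vecs = vecs[:, ::-1]                       # put PC1 first
    pcs = vecs.T @ (x - mu)                    # principal components
    p = pan.reshape(-1).astype(float)
    # histogram matching as in (3.7): match mean and std of PC1
    p = (p - p.mean()) / p.std() * pcs[0].std() + pcs[0].mean()
    pcs[0] = p                                 # replace first component
    return (vecs @ pcs + mu).reshape(n, rows, cols)

rng = np.random.default_rng(0)
bands = rng.random((4, 8, 8))
pan = rng.random((8, 8))
out = pca_sharpen(bands, pan)
assert out.shape == bands.shape
# the inverse transform preserves the per-band means
assert np.allclose(out.mean(axis=(1, 2)), bands.mean(axis=(1, 2)))
```

Because the eigenvector matrix is orthogonal, the back-transform leaves the remaining components, and hence much of the spectral content, untouched; the distortion comes from the replaced first component alone.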

3.4 Wavelet Pan-Sharpening

There have been several approaches to wavelet image fusion for various fields of application. In general, wavelet image fusion is based on three steps: first, the two images are transformed into the wavelet domain and thereby decomposed into their horizontal, vertical and diagonal geometric details and an approximation image containing the remaining information. Depending on the desired level of decomposition, this process can be repeated on the approximation image. Second, one applies a fusion rule choosing certain wavelet coefficients from each of the two image decompositions. Finally, the inverse wavelet transform is performed ([DGS05, HCB02, SLLS06]).

For the task of pan-sharpening, several different methods and hybrid methods combining wavelet fusion with other techniques have been proposed. For our evaluation, we follow the idea of Zhou et al. in [ZCS98] and perform a second level decomposition of the panchromatic image and of each band. As a fusion rule, we take the detail coefficients of the panchromatic image (which should contain the geometry of the image) and the approximation image of each band (which should contain the colors). This process is illustrated in Figure 3.5.

Fig. 3.5: Concept of wavelet image fusion
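For a one-level Haar transform (the thesis uses a two-level decomposition and, in Chapter 2, general conjugate mirror filters) the fusion rule can be sketched as follows; `wavelet_fuse` and the helpers are hypothetical names, not the thesis implementation:

```python
import numpy as np

h = np.array([1.0, 1.0]) / np.sqrt(2.0)   # Haar lowpass
g = np.array([1.0, -1.0]) / np.sqrt(2.0)  # Haar highpass

def _analyze(a, filt):
    """Filter each row with filt and subsample by two (Haar, even length)."""
    return filt[0] * a[:, 0::2] + filt[1] * a[:, 1::2]

def _synthesize(lo, hi):
    """Inverse row step: upsample with zeros, filter with h and g, add."""
    out = np.zeros((lo.shape[0], 2 * lo.shape[1]))
    out[:, 0::2] = h[0] * lo + g[0] * hi
    out[:, 1::2] = h[1] * lo + g[1] * hi
    return out

def dwt2(a):
    """Separable 2-D Haar step: rows first, then columns."""
    lo, hi = _analyze(a, h), _analyze(a, g)
    return (_analyze(lo.T, h).T, _analyze(lo.T, g).T,
            _analyze(hi.T, h).T, _analyze(hi.T, g).T)

def idwt2(ll, lh, hl, hh):
    """Inverse separable step: columns first, then rows."""
    lo = _synthesize(ll.T, lh.T).T
    hi = _synthesize(hl.T, hh.T).T
    return _synthesize(lo, hi)

def wavelet_fuse(band, pan):
    """Fusion rule: approximation from the (upsampled) band,
    all three detail sub-images from the panchromatic image."""
    ll_band, _, _, _ = dwt2(band)
    _, lh_pan, hl_pan, hh_pan = dwt2(pan)
    return idwt2(ll_band, lh_pan, hl_pan, hh_pan)

rng = np.random.default_rng(1)
band = rng.random((8, 8))   # toy upsampled multispectral band
pan = rng.random((8, 8))    # toy panchromatic image
fused = wavelet_fuse(band, pan)
# the fused image carries the band's approximation and the pan's details
assert np.allclose(dwt2(fused)[0], dwt2(band)[0])
assert np.allclose(dwt2(fused)[1], dwt2(pan)[1])
```

Repeating the analysis on the approximation sub-image yields the second decomposition level used in the evaluation.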

As pointed out in [CH03], [OGAFN05], [HZ95] and [AABG02], the choice of wavelet transform (redundant or non-redundant) heavily influences the fusion result. In the next section we discuss why using a non-redundant representation, i.e. subsampling at each decomposition step, might become a problem for image fusion.


3.4.1 Possible Problems in Wavelet Image Fusion

What is aliasing?

The usual way of discretizing an analog signal is to record its sample values at certain intervals T. An approximation of the true signal can then be recovered by interpolation. In 1935 the mathematician Edmund Taylor Whittaker proved a theorem that gives a sufficient condition on the support of the Fourier transform f̂ of a signal to compute f(t) exactly from its sample values {f(nT)}_{n∈Z}. Shannon rediscovered this theorem and applied it to communication theory ([Mal98]).

Theorem 3.4.1 (Shannon, Whittaker). If the support of f̂ is included in [−π/T, π/T], then

f(t) = Σ_{n=−∞}^{+∞} f(nT) h_T(t − nT)    (3.8)

with

h_T(t) = sin(πt/T)/(πt/T).    (3.9)

For any application this means that we get a perfect reconstruction of our original signal if f does not vary too much with respect to the size of our sampling interval T: the function f should contain only low frequencies.
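The interpolation formula (3.8) can be verified numerically. The sketch below truncates the infinite sum to a finite sampling window, so the reconstruction is accurate only up to the truncation error:

```python
import numpy as np

# Whittaker-Shannon interpolation (3.8): reconstruct a band-limited signal
# from its samples f(nT), using h_T(t) = sin(pi t/T)/(pi t/T).
# np.sinc(x) is defined as sin(pi x)/(pi x), so h_T(t) = np.sinc(t/T).
T = 1.0
n = np.arange(-2000, 2001)                    # truncated sampling window
f = lambda t: np.cos(2 * np.pi * 0.1 * t)     # frequency 0.1 < 1/(2T)
samples = f(n * T)

t = np.linspace(-3.0, 3.0, 13)                # includes off-grid points
recon = (samples[:, None] * np.sinc((t[None, :] - n[:, None] * T) / T)).sum(axis=0)
assert np.allclose(recon, f(t), atol=1e-2)    # truncation limits accuracy
```

Choosing a frequency above 1/(2T) instead would violate the support condition, and the reconstruction error would no longer vanish with the truncation: this is exactly the aliasing discussed next.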

This effect can become a problem in image processing when images are subsampled, as illustrated by the following example taken from [ali].

The left image in Figure 3.6 shows a section of a brick wall, where the original image has a size of 2048 × 1024 pixels. After subsampling the original image by a factor of twelve, the image should look like the middle image, but actually performing the subsampling gives us the right image. It is difficult to identify the brick wall at all. A lot of tiny bricks are almost completely obscured by the dominant horizontal mortar bands, and the repeating pattern is clearly at a frequency very different from that of the bricks.

Fig. 3.6: Aliasing eects

This effect is called aliasing. By shrinking our original image by a factor of twelve, the sampling interval becomes too large for the frequency at which the brick rows repeat, and the Whittaker condition is violated. The sampling interval is very close to the brick spacing, so that our sampling might fall on a mortar pixel several times in a row. This is why the subsampled image looks wrong.

Why might the fast discrete wavelet transform be inappropriate for wavelet image fusion?

We have seen above that the fast discrete wavelet transform gives a perfect reconstruction although it downsamples the signal at every level of decomposition. This is remarkable when we think of the worst case, when a signal [1, 0, 1, 0, 1, 0] has to be recovered from [0, 0, 0]. The aliasing effects have to be canceled out during the reconstruction using the detail wavelet information. The property of aliasing cancellation, and therefore perfect reconstruction, puts very strict constraints on the filters (and is also the reason why most wavelets look rather complicated). As proved in Chapter 2, there are MRAs whose discrete wavelet transform (DWT) does yield perfect reconstruction. However, this kind of wavelet transform is very unstable with respect to manipulation of the wavelet coefficients. Any wavelet image fusion model will take some coefficients from one image and some from another and obtain the reconstructed image by applying the inverse wavelet transform to the selected coefficients. In this case, aliasing effects do not necessarily cancel.

An example of this instability is the fact that the DWT is not translation invariant. When subsampling the signal [1, 0, 1, 0, 1, 0, ...], it makes a huge difference whether one drops all ones (approximation [0, 0, 0, ...]) or all zeros (approximation [1, 1, 1, ...]), which corresponds to a shift of the image by one pixel.
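This shift sensitivity of pure subsampling is easy to verify:

```python
import numpy as np

x = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
shifted = np.roll(x, 1)          # the same signal shifted by one sample

# Subsampling keeps every other value; a one-sample shift flips the
# result from all ones to all zeros.
assert np.allclose(x[0::2], [1.0, 1.0, 1.0])
assert np.allclose(shifted[0::2], [0.0, 0.0, 0.0])
```

In the full DWT the detail coefficients compensate for this during reconstruction, but a fusion rule that mixes coefficients from two images breaks that compensation.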

This effect can be seen in wavelet pan-sharpening. The result of a wavelet pan-sharpened image using a non-redundant orthogonal wavelet transform is shown in Figure 3.7. The aliasing effects can be seen around the edges of the images, where some staircasing occurs.

Fig. 3.7: Aliasing due to wavelet image fusion

To get a better understanding of the fusing process, we look at a very simple one-dimensional example of a peak signal and perform the same process as we would for pan-sharpening an image. The left and the middle image in Figure 3.8 are the original and the low resolution (blurred) signal, respectively. The blurred signal has a slightly smaller slope towards the peak. We decompose the original and the blurry signal via a third level discrete wavelet transform. Our fusion rule is to take all detail coefficients of the original signal and the approximation coefficients of the blurry signal. (In this case the sharpening of course does not make any sense, since the original signal is given; we just want to illustrate what happens during the sharpening process.) With our new wavelet coefficients we perform the inverse transform and get the result on the right of Figure 3.8.

Fig. 3.8: Fusing one dimensional signals

One can see that the upper part of the peak is very similar to the original signal, but the bottom is a very bad approximation. Notice that the bottom part is even worse than in the blurred signal: the reconstruction is non-zero in parts where even the blurred signal is not! Furthermore, the reconstruction is not smooth but has two steps. This is exactly the staircasing effect we see in wavelet image fusion using the fast orthogonal wavelet transform, as shown above. Therefore, a different method for injecting high frequency components, one that gives more realistic reconstructions, is desirable.

Why can this problem be solved using the stationary wavelet transform?

Many authors have pointed out that the discrete wavelet transform is not very well suited for image fusion ([CH03, OGAFN05, HZ95, AABG02]). To avoid aliasing effects and to gain translation invariance, one can use the à trous or stationary wavelet transform. The motivation and the theory behind this kind of wavelet transform are almost the same as for the orthogonal wavelet transform described above. The only difference is that the signal is not subsampled after each decomposition; we keep more information than is needed to reconstruct the original signal, i.e. the stationary wavelet transform is redundant. To analyze the signal at different scales although it is not downsampled, we have to upsample the filters instead: zeros are inserted into the filters every other sample. The name à trous (French for "with holes") comes from this insertion of zeros.
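One level of an undecimated Haar analysis illustrates the translation invariance gained by upsampling the filters instead of subsampling the signal (an illustrative sketch with circular boundary handling; the actual filters used in the thesis may differ):

```python
import numpy as np

def swt_level(a, level):
    """One level of an undecimated (a trous) Haar analysis: the two-tap
    filter is applied at spacing 2**level, i.e. upsampled with zeros,
    and the signal itself is never subsampled (circular boundaries)."""
    step = 2 ** level
    idx = np.arange(len(a))
    approx = (a[idx] + a[(idx + step) % len(a)]) / np.sqrt(2.0)
    detail = (a[idx] - a[(idx + step) % len(a)]) / np.sqrt(2.0)
    return approx, detail

x = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
a0, d0 = swt_level(x, 0)
a1, d1 = swt_level(np.roll(x, 1), 0)
# Shifting the input merely shifts the coefficients: translation invariance.
assert np.allclose(np.roll(a0, 1), a1)
assert np.allclose(np.roll(d0, 1), d1)
```

Both coefficient sequences keep the full signal length, which is exactly the redundancy that makes the transform robust under coefficient manipulation.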

In this transform we do not subsample the images, which gives us translational invariance. Aliasing does not occur. The stationary wavelet transform is therefore much more robust against manipulating certain coefficients, since it averages over the redundant information during the reconstruction process. Doing the same experiment of decomposing and fusing a peak function with the à trous transformation, we get the result shown in Figure 3.9.

Fig. 3.9: Reconstructed signal using the stationary wavelet transform

One can see that the resulting function is much smoother than the orthogonal wavelet reconstruction. The bottom of the signal is less broad and seems to give a much better approximation of the original signal. This is the reason why the stationary wavelet transform is much better suited for the task of image fusion. This huge difference in smoothness can also be seen by comparing the wavelet pan-sharpened image using the redundant transform with our previous example.
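The same fusion rule can be applied with the stationary transform via `pywt.swt`/`pywt.iswt`. A sketch with illustrative data; we assume the PyWavelets convention that `swt` returns the coefficient pairs coarsest level first, and the signal length must be divisible by 2^level:

```python
import numpy as np
import pywt

x = np.linspace(-1.0, 1.0, 256)                       # length divisible by 2**3
original = np.maximum(0.0, 1.0 - 8.0 * np.abs(x))
blurred = np.convolve(original, np.ones(9) / 9.0, mode="same")

swt_orig = pywt.swt(original, "db2", level=3)         # [(cA3, cD3), (cA2, cD2), (cA1, cD1)]
swt_blur = pywt.swt(blurred, "db2", level=3)

# Coarsest approximation from the blurred signal, all details from the original.
fused = [(swt_blur[0][0], swt_orig[0][1])] + swt_orig[1:]
reconstruction = pywt.iswt(fused, "db2")
print(reconstruction.shape[0])  # 256: no subsampling, every level keeps full length
```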

Fig. 3.10: Pan-sharpened image using the stationary wavelet transform


We can conclude that the stationary wavelet transform clearly outperforms the orthogonal wavelet transform, although we have to admit that the à trous transform also has some disadvantages: for each level of decomposition, we gain three times the amount of data our original image had. Therefore, the transform itself takes much longer to calculate and much more memory is needed. This especially increases the runtime for variational methods incorporating wavelet and spatial terms, when transformations from the wavelet to the spatial domain need to be done at each iteration.

3.5 P+XS Image Fusion

The P+XS method recently proposed by Ballester et al. ([BCIV06]) uses a variational method to perform pan-sharpening. Their energy functional consists of three terms:

1. Linear combination matching term

This term is obtained from the assumption that the panchromatic image is a linear combination of the different bands of the multispectral image with some mixing coefficients $\alpha_i$. That means if $\vec{u}(x)$ is the true high resolution multispectral image, one assumes that $P(x) = \alpha_1 u_1(x) + \alpha_2 u_2(x) + \alpha_3 u_3(x) + \alpha_4 u_4(x)$. Therefore, as a first term of the energy functional one uses the square of the $L^2$ norm of the difference between the linear combination $\sum_{n=1}^{4} \alpha_n u_n$ and the panchromatic image $P(x)$:

$$\int_\Omega \Big( \sum_{n=1}^{4} \alpha_n u_n - P \Big)^2 dx.$$

This assumption is common among various pan-sharpening methods and will be discussed in Section 3.6.

2. Matching term for the color of the low resolution image

The second term matches the colors we know from the low resolution multispectral image $M_n$ by the assumption that every low resolution pixel is formed from the high resolution pixel by a low-pass filtering followed by a subsampling. Therefore, one would like to minimize the following term:

$$\sum_{n=1}^{4} \int_\Omega \Pi_S \big( (k_n * u_n) - \uparrow\! M_n \big)^2 dx.$$

Here $k_n$ denotes a convolution kernel. In the P+XS paper, the convolution kernel was said to be known from the specific satellite data, but it was not explicitly given in the paper. We therefore used a $5 \times 5$ pixel Gaussian kernel. $\Pi_S$ is a Dirac comb that indicates which of the colored pixels in the high resolution image we actually know from the low resolution image. The principle of the discrete Dirac comb is shown in Figure 3.11.

Fig. 3.11: Principle of ΠS

$\Pi_S$ has to be calculated by a registration algorithm. The issue of registering the panchromatic and multispectral images occurs for every method and will be addressed in Section 6.4. In the example above, once we have the registration, the Dirac comb is implemented by taking every fourth pixel to be one.
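A minimal NumPy sketch of this term, with illustrative sizes and a hypothetical 4x resolution ratio; `scipy.ndimage.gaussian_filter` stands in for the unknown kernel $k_n$:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

H, W, ratio = 64, 64, 4
rng = np.random.default_rng(0)
u = rng.random((H, W))                                # one high resolution band
M = gaussian_filter(u, sigma=1.0)[::ratio, ::ratio]   # simulated low resolution band

# Dirac comb: one at every fourth pixel in each direction, zero elsewhere.
comb = np.zeros((H, W))
comb[::ratio, ::ratio] = 1.0

# Upsample M to the panchromatic grid (zero-order hold) and evaluate the
# masked residual Pi_S * ((k * u) - upsampled M)^2.
M_up = np.kron(M, np.ones((ratio, ratio)))
residual = comb * (gaussian_filter(u, sigma=1.0) - M_up) ** 2
print(residual.sum())  # 0.0 here, since M was generated by the same blur and subsampling
```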


3. Matching term for the geometry of the panchromatic image

The third term is based on the assumption that the geometric information of an image is contained in its level sets, independent of their actual level ([CCM02]). The level sets of an image can be represented by the vector field consisting of all unit normal vectors of those level sets. If we denote the unit normal vector field to all level sets in the panchromatic image by $\theta$, then $P$ satisfies $\theta \cdot \nabla P = |\nabla P|$.

One aligns the normal vectors of the level sets to enforce that each multispectral band has the same level sets as the panchromatic image. Therefore, every band of the restored image should satisfy $|\nabla u_n| - \theta \cdot \nabla u_n = 0$, which only holds for $\nabla u_n \parallel \theta$. The integral over the sum of these terms is added to the energy functional. This constraint can be weighted for each band separately by introducing parameters $\gamma_n$. After integration by parts, this leads to the energy term

$$\sum_{n=1}^{N} \gamma_n \int_\Omega \big( |\nabla u_n| + \operatorname{div}(\theta) \cdot u_n \big)\, dx. \tag{3.10}$$

As shown in [BCIV06], the vector field $\theta$ can be calculated almost everywhere under certain general assumptions. In practice, [BCIV06] calculated $\theta(x) = \frac{\nabla P(x)}{|\nabla P(x)|}$ if $|\nabla P(x)| \ne 0$ and $\theta(x) = 0$ elsewhere. For our data, numerical experiments showed that this calculation of $\theta$ introduces a lot of noise, since no regularization is used for the derivatives of $P$, which is why we used $\theta(x) = \frac{\nabla P(x)}{|\nabla P(x)|_\varepsilon}$ with $|\nabla P(x)|_\varepsilon = \sqrt{(D_x P)^2 + (D_y P)^2 + \varepsilon^2}$ for a small value of $\varepsilon$ instead.
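The regularized normal field is straightforward to compute with finite differences. A sketch; the value of `eps` and the derivative stencil from `np.gradient` are our assumptions:

```python
import numpy as np

def regularized_normals(P, eps=1e-3):
    # theta = grad(P) / |grad(P)|_eps with |grad(P)|_eps = sqrt(Px^2 + Py^2 + eps^2)
    Py, Px = np.gradient(P.astype(float))   # row (y) and column (x) derivatives
    norm_eps = np.sqrt(Px**2 + Py**2 + eps**2)
    return Px / norm_eps, Py / norm_eps

# On a flat image the regularized normals vanish instead of being undefined,
flat_tx, flat_ty = regularized_normals(np.ones((8, 8)))
# while on a linear ramp they have (almost) unit length.
ramp_tx, ramp_ty = regularized_normals(np.tile(np.arange(8.0), (8, 1)))
print(np.abs(flat_tx).max(), np.hypot(ramp_tx, ramp_ty).max())
```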

This rather intuitive explanation of aligning level sets has a strong link to the concept of using Bregman distances for regularization as proposed by [OBG+05]. We will discuss this connection in detail in Chapter 4.

Adding the three P+XS terms up gives the resulting energy functional

$$J(u) = \sum_{n=1}^{N} \gamma_n \int_\Omega \big( |\nabla u_n| + \operatorname{div}(\theta) \cdot u_n \big)\, dx + \lambda \int_\Omega \Big( \sum_{n=1}^{N} \alpha_n u_n - P \Big)^2 dx + \mu \sum_{n=1}^{N} \int_\Omega \Pi_S \big( (k_n * u_n) - M_n \big)^2 dx. \tag{3.11}$$

The P+XS model does a great job in sharpening the image, and one can produce a variety of different types of images by adjusting the parameters. However, P+XS still incorporates the linear combination assumption. Furthermore, the Dirac comb and the convolution kernel must be determined, which are generally neither given nor obvious to implement.

3.6 The Linear Combination Assumption

We have mentioned before that many methods assume that there is a linear relationship between the panchromatic image and the different bands of the form

$$P = \sum_{i=1}^{N} \alpha_i u_i. \tag{3.12}$$

To get an idea of this kind of approximation we can look at the spectral response of the different sensors for different satellite systems like IKONOS (Figure 3.12) or Quickbird (Figure 3.13).


Fig. 3.12: Response of the multispectral bands and the panchromatic image to the different wavelengths for the IKONOS satellite system, source: [CKCK]

Fig. 3.13: Response of the multispectral bands and the panchromatic image to the different wavelengths for the Quickbird satellite system, source: [OGAFN05]

Both graphics show the effect of a certain wavelength of light (x-axis) on the intensity or illumination of the image (y-axis) for the four multispectral sensors as well as the panchromatic sensor. The assumption now states that we can approximate the panchromatic signal by a linear combination of the four sensors. Looking at both graphs, this assumption does not seem to be generally true. Since the panchromatic sensor covers frequencies which are not covered by any of the multispectral sensors, this assumption might not even be a good approximation. Furthermore, if the panchromatic sensor does not cover all of these wavelengths, one might run into problems with higher dimensional imagery. In this case a linear combination cannot be used for sharpening these bands at all, because the corresponding mixing coefficients would be zero. Modern hyperspectral images include over two hundred different bands and go far into the infrared spectrum. A method that could extend to an arbitrary number of bands without placing high requirements on the panchromatic image would therefore be desirable.
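The assumption can be probed numerically: given the bands and the pan image, least squares yields the best mixing coefficients, and the residual shows how well the linear model fits. A synthetic sketch; the data and coefficients are made up and chosen so the assumption holds exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
bands = rng.random((4, 1000))                 # four multispectral bands, flattened
alpha_true = np.array([0.1, 0.3, 0.4, 0.2])
pan = alpha_true @ bands                      # pan is exactly a linear combination here

# Least squares estimate of the mixing coefficients alpha.
alpha = np.linalg.lstsq(bands.T, pan, rcond=None)[0]
print(np.allclose(alpha, alpha_true))         # True when the assumption holds exactly
```

On real sensor data the residual of this fit is nonzero, which is precisely the concern raised above.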


4. VARIATIONAL WAVELET PAN-SHARPENING

In this chapter we propose a new variational method for pan-sharpening. We combine the ideas from P+XS image fusion with wavelet pan-sharpening and add additional terms to preserve the spectral quality of the multispectral image. Furthermore, we do not make any assumptions on the panchromatic image, such that our method easily extends to fusing high dimensional images with an arbitrary high resolution image. We will talk more about the extension to hyperspectral images in Chapter 9.

4.1 Energy Functional

Several groups have demonstrated the great potential of combining PDE and wavelet methods [DB08, Mal02, CSZ06b]. Following the same idea, we want to introduce a matching term to a wavelet fused image and enhance texture by incorporating the geometry matching term from the P+XS model. Carrying out parts of the minimization in the wavelet domain allows us to use different parameters for the different levels of the wavelet decomposition. Furthermore, we add two more terms that help to improve the spectral quality and preserve the relation between the different bands. In the following we will develop our energy functional by describing each term separately.

4.1.1 Geometry Forcing Term

To force the geometry of our sharpened multispectral bands to be the same as in the panchromatic image, we use the same term as the P+XS model, which was described in Section 3.5. In particular, we want to align all level lines of each multispectral band with the level lines of the panchromatic image. Since this is the main idea for introducing the geometry of the panchromatic image into the multispectral bands, we want to summarize some results which allow the conclusion that, to a large extent, the geometry of an image is determined by its level lines [BCIV06, Ser82, CM99, ACMM01].

The data that need to be examined and processed are digital images. To record such images, the sensor of a camera transfers the continuum of light energies to a finite interval of values by means of a nonlinear contrast function ([BCIV06]). This contrast depends on sensor properties as well as on physical conditions like illumination and the reflection properties of the photographed object. These properties are generally unknown but should not influence the observed geometry of the scene. Hence, we can assume that we know images only modulo an arbitrary contrast change. Therefore, an image $u$ becomes a representative of a whole equivalence class of images. We consider all images $v$ to be in this equivalence class that can be obtained from $u$ via a contrast change, for instance by $v = g(u)$ for a continuous strictly increasing function $g$ ([BCIV06]). We can conclude that an image is characterized by its upper level sets $X_\lambda = \{x \mid u(x) \ge \lambda\}$, because these level sets are invariant under contrast changes. Furthermore, the characterization by level sets is complete, since an image can be recovered from its given level sets by

$$u(x) = \sup\{\lambda \mid x \in X_\lambda\}. \tag{4.1}$$

This justifies the conclusion that the geometric information of an image is contained in the family of level sets of the image ([BCIV06]). The level sets themselves are given by their boundaries $\partial X_\lambda$. Ambrosio et al. showed in [ACMM01] that functions whose upper level sets are sets of finite perimeter (which is particularly the case for functions of bounded variation) can be described by a countable family of Jordan curves with finite length. Furthermore, it is shown that for images in $BV$ we can compute the unit normal vectors to these Jordan curves almost everywhere. To extend this representation of the level lines to the whole domain we use the numerical regularization $\theta(x) = \frac{\nabla P(x)}{|\nabla P(x)|_\varepsilon}$ with $|\nabla P(x)|_\varepsilon = \sqrt{(D_x P)^2 + (D_y P)^2 + \varepsilon^2}$ for a small $\varepsilon$. The energy functional should now punish images $u$ whose bands $u_n$ do not have the same unit normal vectors as $\theta$. Just like the P+XS model we use

$$\int_\Omega \big[\, |\nabla u_n| - \theta \cdot \nabla u_n \,\big]\, dx$$

to enforce this constraint. Notice that $\int_\Omega [\, |\nabla u_n| - \theta \cdot \nabla u_n \,]\, dx = 0$ holds if and only if $\nabla u_n$ is parallel to $\theta$.


For an exact unit normal vector field $\theta$ this would mean that the level lines of the two images coincide. By introducing the additional regularization parameter $\varepsilon$, this is not exactly true, but it should be a good approximation for small $\varepsilon$.

The above paragraph motivates the term $\int_\Omega |\nabla u_n|\, dx + \int_\Omega \operatorname{div}(\theta) \cdot u_n\, dx$ from an intuitive geometric point of view, namely that the level lines of the images should coincide to reflect the same geometry. Another point of view is the more abstract question of regularization. By the idea of aligning the isocontours we got $\int_\Omega |\nabla u_n|\, dx$, which is the famous total variation of $u_n$ (assuming $\nabla u_n$ is in $L^1$), and an additional term which we can write as $-\int_\Omega (-\operatorname{div}(\theta)) \cdot u_n\, dx$.

A more rigorous form of the total (or bounded) variation of $u_n$ is

$$|u_n|_{BV} = \sup_{g \in C_0^\infty,\ \|g\|_\infty \le 1} \int_\Omega u_n (\nabla \cdot g)\, dx, \tag{4.2}$$

which equals $\int_\Omega |\nabla u_n|\, dx$ for $u_n \in H^1(\Omega)$, but has the great advantage that it generally allows $u_n$ to be discontinuous, a property that is crucial for images with sharp edges. More details on this topic will be given in Chapter 5.

For the second term we notice that $-\operatorname{div}(\theta)$ is in the subdifferential of the TV norm at $P$ for a smooth panchromatic image with non-zero gradient. Again, using a more rigorous form, we could rewrite the second term as

$$-\langle p, u_n \rangle \quad \text{with } p \in \partial |P|_{BV}. \tag{4.3}$$

Now we add and subtract $|P|_{BV}$ to our geometry enforcing term and rewrite one of these terms using $|P|_{BV} = \langle p, P \rangle$. The resulting term is

$$D_p(u_n, P) := |u_n|_{BV} - |P|_{BV} - \langle p, u_n - P \rangle, \tag{4.4}$$

exactly the Bregman distance between $u_n$ and $P$ with respect to the total variation. Bregman distances were introduced in [Bre67] and first applied as a regularization in image processing in [OBG+05]. Recently, it was shown that Bregman distances are useful for and naturally appear in error estimates for variational methods ([BO04, BRH07]).

Unlike the original P+XS model, we want to be able to weight both parts of the geometry matching term separately by introducing parameters $\gamma$ and $\eta$. Using integration by parts, this leads to the energy term

$$E_g = \sum_{n=1}^{N} \Big[ \gamma \int_\Omega |\nabla u_n|\, dx + \eta \int_\Omega \operatorname{div}(\theta) \cdot u_n\, dx \Big]. \tag{4.5}$$

At first glance, weighting the two parts with separate parameters seems a little odd and even contradicts the geometric derivation from above. We found during our numerical experiments that choosing a slightly higher parameter $\eta > \gamma$ gives better spatial quality. In the next paragraph we will look at the meaning of using different parameters in a mathematical sense and further motivate our decision by some simple examples.

The optimality condition for minimizing the convex functional

$$\gamma \int_\Omega |\nabla u|\, dx + \eta \int_\Omega \operatorname{div}(\theta) \cdot u\, dx \tag{4.6}$$

is

$$\operatorname{div}\Big( \frac{\nabla u}{|\nabla u|} \Big) = \frac{\eta}{\gamma}\, \operatorname{div}(\theta). \tag{4.7}$$

First of all, notice that the divergence of the normals (a quantity of the form $\operatorname{div}\frac{\nabla a}{|\nabla a|}$) has a geometric meaning: it is the mean curvature of the level lines. The optimality condition yields that for $\eta = \gamma$ we enforce the curvature of our desired image $u$ to be the same as in our high resolution panchromatic image $P$. If we now choose $\eta$ to be greater than $\gamma$, we enforce the curvature of the desired image $u$ to be even higher. The desired curvature is the one of the panchromatic image multiplied by $\frac{\eta}{\gamma} > 1$. To see the effect of a higher curvature, we implemented an evolution of the form

$$u^{k+1} = u^k + \tau \Big( \operatorname{div}\Big( \frac{\nabla u^k}{|\nabla u^k|} \Big) - \frac{\eta}{\gamma}\, \operatorname{div}(\theta) \Big) \tag{4.8}$$

for some example signals in one dimension, where we took the data as our initialization $u^0$. The results are shown in Figure 4.1.

Fig. 4.1: Examples of enforcing higher curvature. Left: Gaussian signal, middle: step function, right: sinusoidal signal

One can see that all results (green curves) have a higher contrast than the original signals. The peaks and parts of high curvature have larger values after the evolution. For the second example, the step function, the result can be confirmed analytically even in higher dimensions. In the continuous model, the above evolution corresponds to the partial differential equation

$$\partial_t u = -p + \alpha q \tag{4.9}$$

with $p(t) \in \partial J(u(t))$, $u(0) = u^0$, $q = p(0)$ and $\alpha = \frac{\eta}{\gamma} > 1$. Let $u^0$ be the characteristic function of the unit circle. Then one subgradient is just a multiple of the function itself, i.e. $q = \iota u^0$. We therefore use the ansatz $u(t) = f(t)\, u^0$ and $p(t) = \upsilon(t)\, u^0$. Since $\|p\|_* = 1$ we get

$$\upsilon(t) = \frac{1}{\|u^0\|_*} = \iota \tag{4.10}$$

and our PDE becomes

$$u^0\, \partial_t f = -\iota u^0 + \alpha \iota u^0. \tag{4.11}$$

On the parts where $u^0 \ne 0$ we obtain

$$\partial_t f = (\alpha - 1)\iota, \tag{4.12}$$

which gives us

$$f(t) = (\alpha - 1)\iota t + f(0), \tag{4.13}$$

where $f(0) = 1$ because $u^0$ was the characteristic function of the unit circle. On the parts where $u^0 = 0$ the subgradient $q$ is also zero and $u(t) = 0 = u^0$. Hence, we can conclude

$$u(t) = \big( 1 + (\alpha - 1)\iota t \big)\, u^0. \tag{4.14}$$

The contrast increases monotonically in time, which is consistent with the experimental results from Figure 4.1. For images this might lead to clear edges with a higher contrast and better visual quality. Figure 4.2 shows some pan-sharpening results with $\frac{\eta}{\gamma} = 1$ and $\frac{\eta}{\gamma} = 1.4$ as well as the difference between the two images. We can see that the edges are enhanced in the image where we enforced a higher curvature. The right image shows that the difference between the images is mainly concentrated around the edge set. The colors do not seem to be greatly affected. However, one has to be careful not to distort the spectral quality of the multispectral image by introducing too many spatial details. A more detailed discussion on this topic will be given in Chapter 7.


Fig. 4.2: Example of enforcing higher curvature. Left: sharpened image with $\eta = \gamma$, middle: sharpened image with $\eta = 1.5\gamma$, right: absolute value of the difference between the left and middle image

4.1.2 Wavelet Matching Term

The combination of wavelets and variational methods has recently been applied to many image processing tasks ([CSZ06b, DB08, Mal02]). We perform a second level wavelet decomposition of the panchromatic image and each multispectral band to match the colors of the low resolution multispectral image with sharper edges. Then the high level wavelet coefficients are matched to the corresponding coefficients of the panchromatic image, while the low level approximation coefficients are matched to the low resolution multispectral band. Figure 4.3 illustrates the choice of the matching wavelet coefficients.

Fig. 4.3: Matching wavelet coefficients for a discrete wavelet transform

This choice of wavelet coefficients is well known in the literature, and the same rule is used for wavelet pan-sharpening (see Section 3.4). Following our discussion on wavelet image fusion (Section 3.4.1), we use stationary wavelets for better fusion results. The drawback of using stationary wavelets is the slower speed of the transform and the large increase in data.

To formalize the wavelet matching in a mathematical context for our energy functional, let us briefly recall the wavelet notation. For a one-dimensional wavelet transform, let $\phi$ be a scaling function and $\psi$ the corresponding wavelet generating a wavelet orthonormal basis of $L^2(\mathbb{R})$. We define the wavelets $\psi^1(x) = \psi(x_1)\phi(x_2)$, $\psi^2(x) = \phi(x_1)\psi(x_2)$, $\psi^3(x) = \psi(x_1)\psi(x_2)$ as described in Section 2.2.1 and denote for $1 \le k \le 3$, $j \in \mathbb{Z}$ and $n = (n_1, n_2) \in \mathbb{Z}^2$

$$\psi^k_{j,n}(x) = \frac{1}{2^j}\, \psi^k\Big( \frac{x_1 - 2^j n_1}{2^j}, \frac{x_2 - 2^j n_2}{2^j} \Big). \tag{4.15}$$

Further we define a two-dimensional scaling function by

$$\phi^2_{j,n}(x) = \frac{1}{2^j}\, \phi\Big( \frac{x_1 - 2^j n_1}{2^j} \Big)\, \phi\Big( \frac{x_2 - 2^j n_2}{2^j} \Big). \tag{4.16}$$

Then the approximation coefficients of a two-dimensional function are given by the scalar product with $\phi^2$, and the three detail coefficients (which can be seen as horizontal, vertical and diagonal details, Section 2.2.1) are given by the scalar product with $\psi^k$, $k \in \{1, 2, 3\}$. We define the approximation matching coefficients for band $i$ as

$$a^i_j[n] = \langle \uparrow\! M_i, \phi^2_{j,n} \rangle, \tag{4.17}$$

where $\uparrow\! M_i$ denotes upsampling of the low resolution multispectral band. In our experiments we used bilinear interpolation. The matching detail coefficients are taken from the scalar product with the panchromatic image and are equal for all different bands,

$$d_{k,j}[n] = \langle P, \psi^k_{j,n} \rangle \quad \text{for } 1 \le k \le 3. \tag{4.18}$$

If we denote the desired approximation coefficients for band $i$ by $\alpha^i_j[n]$ and the desired detail coefficients by $\beta^i_{k,j}[n]$, then we add the following term to our energy functional:

$$E_w = \sum_n c_0 \big( a^i_L[n] - \alpha^i_L[n] \big)^2 \phi^2_{j,n}(x) + \sum_n \sum_{j=1}^{L} \sum_{k=1}^{3} c_j \big( d_{k,j}[n] - \beta^i_{k,j}[n] \big)^2 \psi^k_{j,n}(x), \tag{4.19}$$

where $c_0$ is the parameter for the approximation coefficient matching, $c_j$, $1 \le j \le L$, are the parameters for the different levels of detail coefficient matching, and $L$ is the level of decomposition. In our experiments we used $L = 2$. Notice that we assumed that our continuous representations of the images are elements of $V^2_0 = \operatorname{span}(\phi^2_{0,n})$, since this holds for the discrete formulation anyway. For $c_0 = c_1 = c_2$ this term would become a least squares match to a wavelet fused image. In our variational context we choose these parameters according to the type of image we would like to produce. For high spatial quality we want to introduce the edge information of the panchromatic image and therefore increase $c_1$ and $c_2$. Vice versa, we would choose larger values of $c_0$ for better spectral quality.
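For a single band, the coefficient-matching rule of Figure 4.3 can be sketched with `pywt`, using random illustrative data and simple nearest-neighbour upsampling (both assumptions for the sake of the example):

```python
import numpy as np
import pywt

rng = np.random.default_rng(2)
pan = rng.random((64, 64))                   # high resolution panchromatic image
ms_band = rng.random((16, 16))               # one low resolution multispectral band
ms_up = np.kron(ms_band, np.ones((4, 4)))    # upsampled to the panchromatic grid

cP = pywt.wavedec2(pan, "db2", level=2)      # [cA2, (cH2, cV2, cD2), (cH1, cV1, cD1)]
cM = pywt.wavedec2(ms_up, "db2", level=2)

# Matching rule: approximation coefficients from the upsampled band,
# detail coefficients of both levels from the panchromatic image.
fused_coeffs = [cM[0]] + cP[1:]
fused = pywt.waverec2(fused_coeffs, "db2")
print(fused.shape)  # (64, 64)
```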

4.1.3 Color Preserving Term

Using just the first two terms (4.5) and (4.19) yields very good spatial results, but we also want to enforce spectral quality. To preserve the spectral information within each band, we would like to preserve the colors of the resized multispectral image at those parts of the image that have no edges or texture. We add the following term to our energy functional:

$$E_c = \nu \sum_{i=1}^{N} \int_{\Omega \setminus \Gamma} (u_i - \uparrow\! M_i)^2\, dx. \tag{4.20}$$

$\Gamma$ denotes the set of edges and texture in the panchromatic image, which can be determined by any appropriate edge detector. In our experiments we calculated $\Gamma = \exp\big( -\frac{d}{|\nabla P|^2} \big)$ with a suitable constant $d$. This edge detector has been used successfully in many image processing applications such as the Perona-Malik model [PM90]. Unlike variational segmentation algorithms like Mumford-Shah [MS89], the edge set does not have to be evolved, since it is computed from the panchromatic image beforehand.

4.1.4 Spectral Correlation Preserving Term

So far, we constrain the colors within each band, but none of the terms couples the different bands. As mentioned earlier, a single pixel's spectral signature can be used for classification tasks. As the number of bands increases, the classification becomes more specific, and for a large number of bands even the material an object is made of can be determined precisely. In this case it is crucial to preserve the frequency information from the original low resolution multispectral image. To achieve this, we propose that every possible ratio of two different spectral bands of our pan-sharpened high resolution image should equal the ratio of the same bands of the original multispectral image. We would like to obtain at every pixel $\frac{u_i}{u_j} = \frac{\uparrow M_i}{\uparrow M_j} \Rightarrow u_i \cdot \uparrow\! M_j - u_j \cdot \uparrow\! M_i = 0$. Therefore, we add the sum of the squares of the corresponding $L^2$ norms to our energy functional:

$$E_s = \mu \sum_{\substack{i,j=1 \\ i<j}}^{N} \int_\Omega (u_i \cdot \uparrow\! M_j - u_j \cdot \uparrow\! M_i)^2\, dx. \tag{4.21}$$


Another way of looking at this term is that it minimizes the spectral angle between each pixel's frequency vector in the low resolution and in the sharpened multispectral image. For an arbitrary fixed pixel, let $\vec{a}$ be the frequency vector in the low resolution image and $\vec{b}$ the frequency vector in the corresponding high resolution image. The minimizer of the above energy term satisfies $a(i) \cdot b(j) - a(j) \cdot b(i) = 0\ \forall i, j$. This can be rewritten as $a(i) = \frac{a(j)}{b(j)} \cdot b(i)\ \forall i, j$, which proves that $\vec{a} \parallel \vec{b}$. This implies that the spectral angle $\arccos\big( \frac{\langle \vec{a}, \vec{b} \rangle}{\|\vec{a}\| \cdot \|\vec{b}\|} \big)$ is zero. The spectral angle is widely used to compare spectral information. Besides the well known spectral quality metric SAM ([YGB92]), the spectral angle is also used in hyperspectral imaging for material comparison and classification ([Shi03]).
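The spectral angle itself is cheap to compute per pixel. A sketch with made-up four-band spectra:

```python
import numpy as np

def spectral_angle(a, b):
    # Angle between two spectra; 0 means the band ratios are preserved.
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0))

a = np.array([0.2, 0.4, 0.6, 0.8])
b = 3.0 * a                                  # parallel spectrum: brighter, same ratios
c = np.array([0.8, 0.6, 0.4, 0.2])           # a different spectral signature

print(spectral_angle(a, b) < 1e-6, spectral_angle(a, c) > 0.5)  # True True
```

A uniformly brighter pixel has the same spectral angle as the original, which is exactly the invariance the term above is designed to exploit.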

The Total VWP Energy

The fitting terms $E_s$ (4.21) and $E_c$ (4.20) are new, whereas $E_g$ (4.5) is a simple modification of the P+XS method and $E_w$ (4.19) puts the ideas of wavelet fusion similar to [ZCS98] together with the ideas of [CH03] in a variational setting. The total energy functional can then be written as

$$E(u) = E_w + E_g + E_c + E_s. \tag{4.22}$$

This energy functional contains two different types of terms: three terms in the spatial domain and one matching term in the wavelet domain. Any minimization method will have to alternate between the wavelet and the spatial domain in each iteration. This will slow down the whole algorithm significantly, especially in the case of stationary wavelets. We therefore propose an alternate energy which can be minimized entirely in the spatial domain.

4.1.5 The Alternate Energy

The variational wavelet pan-sharpening method on the alternate energy (AVWP) is based on two ideas. First, if we choose our wavelet matching coefficients $c_k$ equal for each level $k$, then the whole term is just a matching to a wavelet fused image. Second, away from the edges the matching to the low resolution multispectral image gives us the best color values we can get for our image. Therefore, we combine the terms $E_w$ and $E_c$ into one matching term that matches the low resolution image away from edges and the wavelet fused image on the edges. Again, the edge detection method we used for our algorithm is $\exp\big( -\frac{d}{|\nabla P|^2} \big)$.

Denoting the wavelet fused image for the $n$th band by $W_n$ and the new matching image by $Z_n$, we have:

$$Z_n = \exp\Big( -\frac{d}{|\nabla P|^2} \Big) \cdot W_n + \Big( 1 - \exp\Big( -\frac{d}{|\nabla P|^2} \Big) \Big) \cdot \uparrow\! M_n. \tag{4.23}$$

The terms $E_c$ and $E_w$ in our original energy are then replaced by

$$E_a = \nu \sum_{n=1}^{N} \int_\Omega (u_n - Z_n)^2\, dx. \tag{4.24}$$

The construction of this matching image is illustrated in Figure 4.4.
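Constructing $Z_n$ is a pointwise blend. A sketch with assumed inputs: a wavelet fused band `W`, the upsampled multispectral band `M_up`, and the panchromatic image `P`:

```python
import numpy as np

def matching_image(W, M_up, P, d=1e-3):
    # Z = chi * W + (1 - chi) * M_up with chi = exp(-d / |grad P|^2)
    gy, gx = np.gradient(P.astype(float))
    with np.errstate(divide="ignore"):
        chi = np.exp(-d / (gx**2 + gy**2))    # ~1 on edges, ~0 in flat regions
    return chi * W + (1.0 - chi) * M_up

P = np.zeros((8, 8))
P[:, 4:] = 1.0                                # vertical step edge
Z = matching_image(np.ones((8, 8)), np.zeros((8, 8)), P)
print(Z[0, 0], Z[0, 4] > 0.5)                 # flat region keeps ↑M, edge region takes W
```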


Fig. 4.4: Construction of the matching image for AVWP

AVWP minimizes the energy

$$E(u) = E_g + E_s + E_a. \tag{4.25}$$

After initializing the matching image, this energy can be minimized in the spatial domain only and will therefore allow for much faster computation. To get a better impression of the difference between the alternate energy and the original energy, we can look at the optimality conditions, i.e.

$$2\nu \big( u_n - [\chi_\Gamma W_n + (1 - \chi_\Gamma) M_n] \big) = 0 \tag{4.26}$$

for the alternate energy and

$$2\nu \big( u_n - (1 - \chi_\Gamma) M_n \big) + 2c\, (u_n - W_n) = 0 \tag{4.27}$$

for the original energy, where we denote $\chi_\Gamma = \exp\big( -\frac{d}{|\nabla P|^2} \big)$ and choose all matching wavelet coefficients equal, $c := c_0 = c_1 = c_2$. The matching of the multispectral image is exactly the same, whereas in the alternate energy we reduced the wavelet matching to the edge set. Choosing the parameters $c_j$ differently for the different levels of the wavelet decomposition gives different importance to the constraints that the detail coefficients of the fused image should be close to those of the panchromatic image and that the approximation coefficients should be close to those of the multispectral image.

In the next section we will discuss some theoretical aspects of this energy, including the existence and uniqueness of a minimizer, as well as the derivation of optimality conditions, which we will use to construct numerical minimization methods in Chapter 6.


5. ANALYSIS OF THE ENERGY FUNCTIONAL

In this chapter we will discuss some basic properties of our energy functional (4.25). For the sake of simplicity we focus on the alternate energy in the spatial domain. In the results section we will see that using the original energy (4.22) gives slightly better results, but is much slower. In particular, the alternate energy is much better suited for large amounts of data, i.e. hyperspectral images.

In the first section of this chapter we will prove the existence, and in the second section the uniqueness of a minimizer, before we derive an optimality condition in the third section. This condition can finally be used to construct numerical methods for finding the minimizer. Throughout this chapter we will follow the concept of similar proofs for TV denoising from [Bur07].

Before we start with the actual proof, we want to briefly discuss in which function space we look for a minimizer. At the very least, we assume that the function $u$ is integrable, i.e. $u \in L^1(\Omega)$, but this cannot be the only regularity assumption. A smooth (even constant) function in $L^1$ can have the same norm as a very noisy image, and we want to be able to distinguish between noise and clear images. In particular, our energy functional involves the alignment of level lines, for which some kind of gradient or variation has to be defined. More specifically, by having the term $\int_\Omega |\nabla u|\, dx$ in our energy we implicitly made an assumption on the regularity of the function $u$ we are looking for: we take the $L^1$ norm of the gradient of $u$. Of course, classical differentiability would be a very strong assumption; it appears much more reasonable to choose the Sobolev space $W^{1,1}(\Omega) = \{ u \in L^1(\Omega) \mid \frac{\partial u}{\partial x_i} \in L^1\ \forall i \}$, where $\frac{\partial u}{\partial x_i}$ is understood in a weak sense, which means there exists a function $g$ such that

$$\Big\langle u, \frac{\partial \phi}{\partial x_i} \Big\rangle_{L^2} = -\langle g, \phi \rangle_{L^2} \quad \forall \phi \in C_0^\infty. \tag{5.1}$$

We then identify $\frac{\partial u}{\partial x_i}$ with $g$. The choice of $W^{1,1}(\Omega)$ as a space to minimize in would give sense to the integral over the norm of the gradient. Unfortunately, the Sobolev space $W^{1,1}$ is too small to contain all desired images. It is a well known fact that (sharp) edges are a crucial feature of images for visualizing and understanding their geometry ([Can86]). Mathematically speaking, edges correspond to jumps in our image. We can therefore conclude that jump functions like the simple one-dimensional example of a Heaviside function

$$H(x) = \begin{cases} 0 & \text{for } x < 0 \\ 1 & \text{for } x \ge 0 \end{cases} \tag{5.2}$$

should definitely be in the function space in which we choose to find a minimizer. It is easy to show that these kinds of jump functions are not elements of Sobolev spaces. Their derivatives correspond to Dirac $\delta$-distributions. To include the jump functions, we have to enlarge our function space and therefore also change our original energy functional $\int_\Omega |\nabla u|\, dx$ to a weaker formulation that makes sense for a wider class of images. Hence, we define the so-called total variation of a function as

$$|u|_{BV} = \sup_{\phi \in C_0^\infty(\Omega, \mathbb{R}^d),\ \|\phi\|_\infty \le 1} \int_\Omega u \operatorname{div}(\phi)\, dx \tag{5.3}$$

and choose the space of bounded variation

BV (Ω) = u ∈ L1 | |u|BV <∞ (5.4)

as the space in which we search for a minimizer. Note, that for smooth functions u with a non-zero gradientwe can choose φ = − ∇u|∇u| and apply integration by parts to obtain that in this case |u|BV =

∫Ω|∇u|dx.
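To make the distinction concrete, the following sketch (plain NumPy; the helper names and grid are our own toy choices, not part of the thesis) computes a discrete total variation of a Heaviside-like step image and shows that it stays finite, and in fact resolution-independent, even though the step has no weak gradient in $W^{1,1}$:

```python
import numpy as np

def total_variation(u, h=1.0):
    """Discrete isotropic total variation of a 2-D image with grid spacing h."""
    dx = np.diff(u, axis=1)                  # forward differences in x
    dy = np.diff(u, axis=0)                  # forward differences in y
    dx = np.pad(dx, ((0, 0), (0, 1)))        # zero-pad back to the image shape
    dy = np.pad(dy, ((0, 1), (0, 0)))
    return float(np.sqrt(dx**2 + dy**2).sum() * h)

def step_image(n):
    """Heaviside-like step on the unit square: 0 on the left half, 1 on the right."""
    u = np.zeros((n, n))
    u[:, n // 2:] = 1.0
    return u

# The discrete TV equals jump height (1) times edge length (1) at every resolution.
tv_coarse = total_variation(step_image(64), h=1 / 64)
tv_fine = total_variation(step_image(256), h=1 / 256)
```

Refining the grid leaves the value unchanged, which mirrors the fact that the jump itself, not any smoothed version of it, carries the variation.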

Page 40: A Variational Approach for Sharpening High Dimensional Imageswims.math.ecnu.edu.cn/ChunLi/973_Seminar_20120320/used... · 2012-03-20 · 2 Abstract Earth observing satellites usually

40 5. Analysis of the Energy Functional

5.1 Existence of a Minimizer

The proof of the existence of a minimizer will be based on the fundamental theorem of optimization. The only assumptions that will be made are that $\Omega$ is an open, bounded domain with a piecewise smooth boundary and that $Z_n, \mathrm{div}(\theta) \in L^2(\Omega)$. The assumption $Z_n \in L^2(\Omega)$ is a weak one, since it is very natural to assume that a reasonable image is at least in $L^2$; even an image with Gaussian noise is an element of $L^2$. The assumption $\mathrm{div}(\theta) \in L^2(\Omega)$ will be discussed in more detail at the end of this chapter.

Theorem 5.1.1. Let $J : (X, \tau) \to \mathbb{R} \cup \{+\infty\}$ be a functional on a topological space $X$ with topology $\tau$ which satisfies the following two conditions:

• Lower semi-continuity: for any sequence $u_k \in X$ with $u_k \overset{\tau}{\to} u \in X$,
\[
J(u) \leq \liminf_k J(u_k). \tag{5.5}
\]

• Compactness of sub-level sets: there exists an $\alpha \in \mathbb{R}$ such that
\[
S_\alpha = \{u \in X \,|\, J(u) \leq \alpha\} \tag{5.6}
\]
is nonempty and compact.

Then there exists a minimizer $\bar u \in X$, i.e.
\[
J(\bar u) = \inf_{u \in X} J(u). \tag{5.7}
\]

Proof. If there is an $\alpha$ such that the sub-level set $S_\alpha$ is nonempty, the infimum of $J$ is finite and there exists a minimizing sequence $u_k$ with $J(u_k) \to \inf_u J(u)$. For $k$ sufficiently large, $u_k \in S_\alpha$. Since $S_\alpha$ is compact, there must be a subsequence $u_{k_n}$ that converges to some $\bar u \in S_\alpha$. The lower semi-continuity then implies
\[
J(\bar u) \leq \liminf_{k_n} J(u_{k_n}) = \inf_u J(u), \tag{5.8}
\]
which proves the existence of a minimizer.

As discussed earlier, we look at the space of bounded variation, $BV(\Omega)$, as the Banach space in which we want to find a minimizer for each band. The fact that $BV(\Omega)$ indeed is a Banach space can be proved with the help of the $L^1$ lower semi-continuity (the proof can be found in [CS05]).

Let us start by proving that our energy functional $J : BV^N(\Omega) \to \mathbb{R}$ satisfies the second condition of Theorem 5.1.1. We denote the whole energy functional by $J$ and the part relevant for one band $u_n$ by $J_n$.

First of all, note that $J$ is bounded from below. We have
\[
J(u) = \nu \sum_{n=1}^N \|u_n - Z_n\|_{L^2}^2 + \mu \sum_{\substack{i,j=1\\ i<j}}^N \int_\Omega (u_i \cdot {\uparrow}M_j - u_j \cdot {\uparrow}M_i)^2\,dx + \sum_{n=1}^N \big[\gamma |u_n|_{BV} + \eta \langle \mathrm{div}(\theta), u_n \rangle_{L^2}\big]. \tag{5.9}
\]

We use $\mu \sum_{i<j} \int_\Omega (u_i \cdot {\uparrow}M_j - u_j \cdot {\uparrow}M_i)^2\,dx \geq 0$ and $\gamma |u_n|_{BV} \geq 0$ and expand the $L^2$ norm of the matching term to obtain
\[
\begin{aligned}
J(u) &\geq \nu \sum_{n=1}^N \big(\|u_n\|^2 - 2\langle u_n, Z_n\rangle + \|Z_n\|^2\big) + \sum_{n=1}^N \langle \eta\,\mathrm{div}(\theta), u_n \rangle_{L^2} \\
&= \nu \sum_{n=1}^N \Big(\|u_n\|^2 - 2\Big\langle u_n, Z_n - \frac{\eta}{2\nu}\mathrm{div}(\theta)\Big\rangle + \|Z_n\|^2 + \Big\|Z_n - \frac{\eta}{2\nu}\mathrm{div}(\theta)\Big\|^2 - \Big\|Z_n - \frac{\eta}{2\nu}\mathrm{div}(\theta)\Big\|^2\Big) \\
&= \nu \sum_{n=1}^N \Big(\Big\|u_n - Z_n + \frac{\eta}{2\nu}\mathrm{div}(\theta)\Big\|^2 + \|Z_n\|^2 - \Big\|Z_n - \frac{\eta}{2\nu}\mathrm{div}(\theta)\Big\|^2\Big) \\
&\geq -\nu \sum_{n=1}^N \Big\|Z_n - \frac{\eta}{2\nu}\mathrm{div}(\theta)\Big\|^2.
\end{aligned} \tag{5.10}
\]
Similarly, each $J_n$ is bounded from below by $J_n \geq -\nu \big\|Z_n - \frac{\eta}{2\nu}\mathrm{div}(\theta)\big\|^2$.
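The completing-the-square step can be sanity-checked in a finite-dimensional analog. The sketch below (random stand-ins for $Z_n$ and $\mathrm{div}(\theta)$; all names are our own toy choices) verifies that the two sign-indefinite terms of $J_n$ never drop below the bound $-\nu\|Z_n - \frac{\eta}{2\nu}\mathrm{div}(\theta)\|^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
nu, eta = 0.7, 0.3                      # toy weights, nu > 0
Z = rng.normal(size=100)                # stand-in for one band's data Z_n
div_theta = rng.normal(size=100)        # stand-in for div(theta)

def indefinite_part(u):
    """nu*||u - Z||^2 + eta*<div(theta), u>, the terms that are not obviously >= 0."""
    return nu * np.sum((u - Z) ** 2) + eta * np.sum(div_theta * u)

lower_bound = -nu * np.sum((Z - eta / (2 * nu) * div_theta) ** 2)

# Try candidates u at wildly different scales; none may violate the bound.
worst = min(indefinite_part(s * rng.normal(size=100)) for s in (0.01, 1.0, 100.0))
```

The bound is attained exactly at $u = Z_n - \frac{\eta}{2\nu}\mathrm{div}(\theta)$ only when additionally $\|Z_n\| = 0$, which matches the chain of inequalities above.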

5.1.1 The Sub-Level-Sets of J(u) are Bounded

First of all, notice that there is an $\alpha$ such that $S_\alpha = \{u \in X \,|\, J(u) \leq \alpha\}$ is not empty: $J(0)$, for instance, is just $J(0) = \nu \sum_{n=1}^N \|Z_n\|_2^2$, such that for large enough $\alpha$, $0 \in S_\alpha$. For a $u \in S_\alpha$ we know (following the above calculation) that each part $J_n(u_n)$ has to be bounded as well:
\[
\begin{aligned}
\alpha \geq J(u) &= J_n(u_n) + \nu \sum_{\substack{i=1\\ i \neq n}}^N \|u_i - Z_i\|_{L^2}^2 + \mu \sum_{\substack{i,j=1,\ i<j\\ i,j \neq n}}^N \int_\Omega (u_i \cdot {\uparrow}M_j - u_j \cdot {\uparrow}M_i)^2\,dx + \sum_{\substack{i=1\\ i \neq n}}^N \big[\gamma |u_i|_{BV} + \eta \langle \mathrm{div}(\theta), u_i \rangle_{L^2}\big] \\
&\geq J_n(u_n) - \nu \sum_{\substack{i=1\\ i \neq n}}^N \Big\|Z_i - \frac{\eta}{2\nu}\mathrm{div}(\theta)\Big\|^2 \\
\Rightarrow\ J_n(u_n) &\leq \alpha_n := \alpha + \nu \sum_{\substack{i=1\\ i \neq n}}^N \Big\|Z_i - \frac{\eta}{2\nu}\mathrm{div}(\theta)\Big\|^2.
\end{aligned} \tag{5.11}
\]

To prove the compactness of $S_\alpha$, let us first show that $S_\alpha$ is bounded in the $BV$ norm. Inequality 5.11 implies that it is sufficient to show that each $u_n$ with $J_n(u_n) \leq \alpha_n$ is bounded.

$\|u_n\|_2$ is Bounded

As a first step we look at a $u_n$ with $J_n(u_n) \leq \alpha_n$ (for a large enough $\alpha$) and try to get an estimate for $\|u_n\|_2$. We have
\[
J_n(u_n) = \gamma |u_n|_{BV} + \eta \int_\Omega \mathrm{div}(\theta)\, u_n\,dx + \mu \sum_{\substack{i=1\\ i \neq n}}^N \int_\Omega (u_i \cdot {\uparrow}M_n - u_n \cdot {\uparrow}M_i)^2\,dx + \nu \int_\Omega (u_n - Z_n)^2\,dx \leq \alpha_n. \tag{5.12}
\]


All terms except for $\eta \int_\Omega \mathrm{div}(\theta)\, u_n\,dx$ are nonnegative. Bounding this term by its negative absolute value, we obtain
\[
\alpha_n \geq \gamma |u_n|_{BV} + \mu \sum_{\substack{i=1\\ i \neq n}}^N \int_\Omega (u_i \cdot {\uparrow}M_n - u_n \cdot {\uparrow}M_i)^2\,dx + \nu \int_\Omega (u_n - Z_n)^2\,dx - \eta \Big| \int_\Omega \mathrm{div}(\theta)\, u_n\,dx \Big|, \tag{5.13}
\]
and (knowing that $Z_n, \mathrm{div}(\theta) \in L^2$) use
\[
\gamma |u_n|_{BV} \geq 0, \tag{5.14}
\]
\[
\mu \sum_{\substack{i=1\\ i \neq n}}^N \int_\Omega (u_i \cdot {\uparrow}M_n - u_n \cdot {\uparrow}M_i)^2\,dx \geq 0, \tag{5.15}
\]
\[
\nu \int_\Omega (u_n - Z_n)^2\,dx = \nu \|u_n - Z_n\|_2^2 \geq \nu (\|u_n\|_2 - \|Z_n\|_2)^2 = \nu \big(\|u_n\|_2^2 - 2\|u_n\|_2 \|Z_n\|_2 + \|Z_n\|_2^2\big), \tag{5.16}
\]
where the second step is the inverse triangle inequality, and
\[
\eta \Big| \int_\Omega \mathrm{div}(\theta)\, u_n\,dx \Big| \leq \eta \|\mathrm{div}(\theta)\|_2 \cdot \|u_n\|_2, \tag{5.17}
\]
to get
\[
\nu \big(\|u_n\|_2^2 - 2\|u_n\|_2 \|Z_n\|_2 + \|Z_n\|_2^2\big) - \eta \|\mathrm{div}(\theta)\|_2 \cdot \|u_n\|_2 \leq \alpha_n. \tag{5.18}
\]

Using $\nu > 0$, this leads to the fact that all $u_n$ with $J_n(u_n) \leq \alpha_n$ satisfy the quadratic inequality
\[
\|u_n\|_2^2 - \Big(2\|Z_n\|_2 + \frac{\eta}{\nu}\|\mathrm{div}(\theta)\|_2\Big) \|u_n\|_2 - \Big(\frac{\alpha_n}{\nu} - \|Z_n\|_2^2\Big) \leq 0, \tag{5.19}
\]
which has solutions for large enough $\alpha_n$. The $L^2$ norm of $u_n$ is then bounded by
\[
\|u_n\|_2 \leq \Big(\|Z_n\|_2 + \frac{\eta}{2\nu}\|\mathrm{div}(\theta)\|_2\Big) + \sqrt{\Big(\|Z_n\|_2 + \frac{\eta}{2\nu}\|\mathrm{div}(\theta)\|_2\Big)^2 + \frac{\alpha_n}{\nu} - \|Z_n\|_2^2} =: C. \tag{5.20}
\]
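The step from the quadratic inequality to the explicit bound $C$ is just the root formula. A quick numerical check (arbitrary toy values for the coefficients, chosen by us for illustration) confirms that every $x \geq 0$ satisfying $x^2 - bx - c \leq 0$ lies below the larger root:

```python
import numpy as np

def largest_root(b, c):
    """Larger root of x**2 - b*x - c = 0; this plays the role of the constant C."""
    return b / 2 + np.sqrt((b / 2) ** 2 + c)

# b stands in for 2*||Z_n||_2 + (eta/nu)*||div(theta)||_2,
# c stands in for alpha_n/nu - ||Z_n||_2^2 (toy values).
b, c = 3.0, 2.0
xs = np.linspace(0.0, 10.0, 2001)
feasible = xs[xs**2 - b * xs - c <= 0]   # all grid points satisfying the inequality
bound = largest_root(b, c)
```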

$\|u_n\|_1$ is Bounded

To go from the $L^2$ norm of $u_n$ to the $L^1$ norm of $u_n$, we need the assumption that our image domain is bounded in order to apply the following theorem.

Theorem 5.1.2. Let $0 < p < q \leq \infty$ and let $\Omega \subset \mathbb{R}^N$ be open and bounded ($|\Omega| = \int_\Omega dx < \infty$). Then $L^q(\Omega) \subset L^p(\Omega)$ and
\[
\|u\|_p \leq |\Omega|^{\frac{1}{p} - \frac{1}{q}} \|u\|_q \quad \forall u \in L^q(\Omega). \tag{5.21}
\]

Applying this to $u_n$, we automatically get the boundedness of $\|u_n\|_1$ from the previous section by
\[
\|u_n\|_1 \leq |\Omega|^{\frac{1}{2}} \|u_n\|_2 \leq |\Omega|^{\frac{1}{2}} C. \tag{5.22}
\]
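Theorem 5.1.2 is a direct consequence of Hölder's inequality; on a discrete grid over the unit square it can be checked for any sample image (the grid and random image below are our own toy setup):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 128
h = 1.0 / n                       # grid spacing; |Omega| = 1 for the unit square
u = rng.normal(size=(n, n))       # an arbitrary "image"

l1_norm = np.sum(np.abs(u)) * h**2            # discrete ||u||_1
l2_norm = np.sqrt(np.sum(u**2) * h**2)        # discrete ||u||_2
omega_measure = 1.0                           # |Omega|
```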

$|u_n|_{BV}$ is Bounded

We start again with Inequality 5.13,
\[
\alpha_n \geq \gamma |u_n|_{BV} + \mu \sum_{\substack{i=1\\ i \neq n}}^N \int_\Omega (u_i \cdot {\uparrow}M_n - u_n \cdot {\uparrow}M_i)^2\,dx + \nu \int_\Omega (u_n - Z_n)^2\,dx - \eta \Big| \int_\Omega \mathrm{div}(\theta)\, u_n\,dx \Big|, \tag{5.23}
\]


but use different inequalities for our estimate this time:
\[
\mu \sum_{\substack{i=1\\ i \neq n}}^N \int_\Omega (u_i \cdot {\uparrow}M_n - u_n \cdot {\uparrow}M_i)^2\,dx \geq 0, \tag{5.24}
\]
\[
\nu \int_\Omega (u_n - Z_n)^2\,dx \geq 0, \tag{5.25}
\]
\[
\eta \Big| \int_\Omega \mathrm{div}(\theta)\, u_n\,dx \Big| \leq \eta \|\mathrm{div}(\theta)\|_2 \cdot \|u_n\|_2. \tag{5.26}
\]
This leads to
\[
\gamma |u_n|_{BV} - \eta \|\mathrm{div}(\theta)\|_2 \cdot \|u_n\|_2 \leq \alpha_n. \tag{5.27}
\]
Assuming $\gamma > 0$, we get
\[
|u_n|_{BV} \leq \frac{\alpha_n + \eta \|\mathrm{div}(\theta)\|_2 \cdot \|u_n\|_2}{\gamma}, \tag{5.28}
\]
and using the previous result that $u_n$ is bounded in the $L^2$ norm (Inequality 5.20), we conclude that $|u_n|_{BV}$ is bounded:
\[
|u_n|_{BV} \leq \frac{\alpha_n + \eta \|\mathrm{div}(\theta)\|_2 \cdot C}{\gamma}. \tag{5.29}
\]

Note that the assumption div(θ) ∈ L2 is needed here.

Transition to a Weaker Topology

In summary, we have shown that $\|u_n\|_{BV} = |u_n|_{BV} + \|u_n\|_1$ is bounded for any $u_n$ with $J_n(u_n) \leq \alpha_n$, which implies that the sub-level set $S_\alpha$ of the whole energy functional is bounded. We know that in finite-dimensional spaces a set is compact if and only if it is bounded and closed. Unfortunately, this is not true for infinite-dimensional function spaces like $BV$. However, we can use a weaker topology to obtain compactness, and therefore define the weak-∗ topology as follows.

Definition 5.1.3. Let $X$ be a Banach space and $X^*$ its dual space. We say a sequence $v_k \in X^*$ converges to an element $v \in X^*$ in the weak-∗ sense, $v_k \overset{*}{\rightharpoonup} v$, if and only if $\langle v_k, u \rangle \to \langle v, u \rangle$ for all $u \in X$.

In the weak-∗ topology, boundedness does imply pre-compactness, as the Banach-Alaoglu theorem states ([ABM05]).

Theorem 5.1.4. Let $X^*$ be the dual of a Banach space $X$ and let $C > 0$ be a real number. Then the set
\[
\{v \in X^* \,|\, \|v\|_{X^*} \leq C\} \tag{5.30}
\]
is compact in the weak-∗ topology.

Notice that this theorem states the compactness of a subset of a dual space. Our goal will therefore be to construct a space whose dual is $BV$, such that we can apply the Banach-Alaoglu theorem. We should further mention that $BV$ is a non-reflexive Banach space, so the dual of $BV$ is not the space we wish to construct here. For our total energy functional we need a space whose dual space is $BV^N$, but once we have a $Y$ with $Y^* = BV$ we obtain $(Y^N)^* = BV^N$.

For the construction of such a space $Y$ we examine the space
\[
Y_0 := \{(c, \nabla \bullet \varphi) \,|\, c \in \mathbb{R},\ \varphi \in C(\Omega, \mathbb{R}^d),\ \varphi|_{\partial\Omega} \bullet \vec n = 0\} = \{(c, \nabla \bullet \varphi) \,|\, c \in \mathbb{R},\ \varphi \in C_0(\Omega, \mathbb{R}^d)\} \tag{5.31}
\]
with the norm
\[
\|(c, \psi)\|_Y = \max\Big\{\frac{|c|}{|\Omega|},\ \inf_{\varphi,\ \psi = \nabla \bullet \varphi} \|\varphi\|_\infty\Big\}. \tag{5.32}
\]


We define $Y$ to be the completion of $Y_0$ in this norm:
\[
Y = \overline{Y_0}. \tag{5.33}
\]
$Y$ is complete and hence a Banach space. We will show that the dual of $Y$ can be identified with the space $BV$.

First inclusion: $BV(\Omega) \hookrightarrow Y^*$

We first observe that every function $u \in BV$ defines a linear functional on $Y_0$ given by
\[
l_u : Y_0 \to \mathbb{R}, \qquad (c, \psi) \mapsto \bar u c + \int_\Omega u\, \psi\,dx, \tag{5.34}
\]
where $\bar u := \frac{1}{|\Omega|} \int_\Omega u\,dx$. Since any $(c, \psi) \in Y_0$ can be written as $(c, \nabla \bullet \varphi) \in Y_0$, we know
\[
|l_u(c, \psi)| = \Big| \frac{1}{|\Omega|} \Big(\int_\Omega u\,dx\Big) c + \int_\Omega u\, \nabla \bullet \varphi\,dx \Big| \leq \frac{1}{|\Omega|} \|u\|_1 |c| + \Big| \int_\Omega u\, \nabla \bullet \varphi\,dx \Big|. \tag{5.35}
\]
$\varphi$ is of course bounded, and we notice that $C_0^\infty(\Omega, \mathbb{R}^d)$ is dense in $C_0(\Omega, \mathbb{R}^d)$, such that taking the supremum over all $\varphi \in C_0^\infty(\Omega, \mathbb{R}^d)$ gives the same result as taking the supremum over all $\varphi \in C_0(\Omega, \mathbb{R}^d)$. Therefore we get
\[
\Big| \int_\Omega u\, \nabla \bullet \varphi\,dx \Big| \leq \|\varphi\|_\infty \sup_{\tilde\varphi \in C_0^\infty(\Omega, \mathbb{R}^d),\ \|\tilde\varphi\|_\infty \leq 1} \int_\Omega u\, \nabla \bullet \tilde\varphi\,dx = \|\varphi\|_\infty\, |u|_{BV}. \tag{5.36}
\]
If we now use Inequality 5.36 in Inequality 5.35, we obtain
\[
|l_u(c, \psi)| \leq \frac{1}{|\Omega|} |c|\, \|u\|_1 + \|\varphi\|_\infty\, |u|_{BV} \leq \max\Big\{\frac{|c|}{|\Omega|},\ \|\varphi\|_\infty\Big\} \big(\|u\|_1 + |u|_{BV}\big). \tag{5.37}
\]
Because this estimate must hold for any $\varphi$ with $\psi = \nabla \bullet \varphi$, we can take the infimum and get
\[
|l_u(c, \psi)| \leq \|(c, \psi)\|_Y\, \|u\|_{BV}. \tag{5.38}
\]

$l_u$ is bounded (and hence continuous) on $Y_0$ and can therefore be uniquely extended to a continuous linear functional on $Y$.

We will now show that the constructed mapping
\[
I : BV \to Y^*, \qquad u \mapsto l_u \tag{5.39}
\]
is injective. Let $u_1, u_2 \in BV(\Omega)$ be such that $l_{u_1} = l_{u_2}$. Then
\[
0 = (l_{u_1} - l_{u_2})(c, \psi) = (\bar u_1 - \bar u_2) c + \int_\Omega (u_1 - u_2)\, \nabla \bullet \varphi\,dx \quad \forall \varphi \in C_0(\Omega, \mathbb{R}^d),\ \psi = \nabla \bullet \varphi,\ \forall c \in \mathbb{R}. \tag{5.40}
\]
Since the above equation holds for arbitrary $c$ and $\nabla \bullet \varphi$, we can use $c = 0$ and take the supremum over all $\varphi$ with $\|\varphi\|_\infty \leq 1$ to get $|u_1 - u_2|_{BV} = 0$. The function $v := u_1 - u_2$ has zero total variation, which implies that $v$ is constant almost everywhere. Taking $c = 1$ and $\varphi = 0$ in the above equation gives us $\bar v = 0$: $v$ is a constant function with zero mean, hence $v = 0 = u_1 - u_2$. This proves that $l_{u_1} = l_{u_2} \Rightarrow u_1 = u_2$ in $BV(\Omega)$, and $I$ is injective.

To identify $BV$ with $Y^*$, we still need to show that $I$ is also surjective; in other words, we need the inclusion $Y^* \subset BV$.


Second inclusion: $I : BV \to Y^*$ is onto

By definition, the dual space of $Y$ consists of all continuous linear functionals of the form $l_1(c) + l_2(\psi)$ with $l_1 : \mathbb{R} \to \mathbb{R}$ and $l_2 : G \to \mathbb{R}$, where $G := \{\psi \,|\, \psi = \nabla \bullet \varphi,\ \varphi \in C_0(\Omega, \mathbb{R}^d)\}$.

A linear functional on $\mathbb{R}$ is easy to characterize:
\[
l_1(c) = \kappa \cdot c \quad \text{for some } \kappa \in \mathbb{R}. \tag{5.41}
\]
Let us now look at $l_2 : G \to \mathbb{R}$. For the characterization of this functional we need the following theorem.

Theorem 5.1.5. Let $\Omega \subset \mathbb{R}^d$ be a smooth and bounded domain. Let $k, p, d \in \mathbb{N}$ satisfy $k \cdot p > d$. Then $W^{k,p}(\Omega) \subset C(\Omega)$. Furthermore, for $(k - j) \cdot p > d$ we have the embedding $W^{k,p}(\Omega) \subset C^j(\Omega)$.

In our case $d = 2$, such that for $p \geq 3$ we can embed $W_0^{1,p}(\Omega; \mathbb{R}^d) = \{v \in L^p(\Omega; \mathbb{R}^d) \,|\, \nabla v \in L^p(\Omega; \mathbb{R}^{d \times d}),\ v|_{\partial\Omega} = 0\}$ into $C_0(\Omega; \mathbb{R}^d)$. $l_2$ is therefore also a continuous linear functional on $\{\nabla \bullet \varphi \,|\, \varphi \in W_0^{1,p}(\Omega; \mathbb{R}^d)\}$.

For any $\psi := \nabla \bullet v$ with $v \in W_0^{1,p}$ we have
\[
\int_\Omega \psi\,dx = \int_\Omega \nabla \bullet v\,dx = \int_{\partial\Omega} v \bullet \vec n\,ds = 0, \tag{5.42}
\]
such that we can conclude
\[
\dot L^p(\Omega) := \Big\{u \in L^p(\Omega) \,\Big|\, \int_\Omega u\,dx = 0\Big\} \subset \{\nabla \bullet \varphi \,|\, \varphi \in W_0^{1,p}(\Omega; \mathbb{R}^d)\} \subset \{\nabla \bullet \varphi \,|\, \varphi \in C_0(\Omega, \mathbb{R}^d)\}. \tag{5.43}
\]
Therefore, $l_2$ is also a continuous linear functional on $\dot L^p(\Omega)$. Since $\dot L^p(\Omega)$ by definition is a subset of $L^p$, we can apply the Riesz representation theorem for $L^p$ spaces.

Theorem 5.1.6 (Riesz representation theorem for $L^p$ spaces). Let $1 < p < \infty$ and $L \in (\dot L^p(\Omega))^*$. Then there is a unique $\omega \in \dot L^{p'}(\Omega)$, $p' = \frac{p}{p-1}$, such that
\[
L(v) = \int_\Omega \omega \cdot v\,dx \quad \forall v \in \dot L^p(\Omega). \tag{5.44}
\]

The theorem allows us to write
\[
l_2|_{\dot L^p(\Omega)}(\psi) = \int_\Omega \omega \cdot \psi\,dx \quad \forall \psi \in \dot L^p(\Omega). \tag{5.45}
\]
We have $\dot L^p(\Omega) \subset \{\nabla \bullet \varphi \,|\, \varphi \in C_0(\Omega, \mathbb{R}^d)\}$, which implies the opposite inclusion for the dual spaces: $\{\nabla \bullet \varphi \,|\, \varphi \in C_0(\Omega, \mathbb{R}^d)\}^* \subset \dot L^{p'}(\Omega)$. Hence, we can extend this representation of $l_2$ to the whole space $\{\nabla \bullet \varphi \,|\, \varphi \in C_0(\Omega, \mathbb{R}^d)\}$ and get
\[
l_2(\psi) = \int_\Omega \omega \cdot \psi\,dx \quad \forall \psi \in \{\psi = \nabla \bullet \varphi \,|\, \varphi \in C_0(\Omega, \mathbb{R}^d)\}. \tag{5.46}
\]

We now define $u := \kappa + \omega$ as a function in $L^1(\Omega)$ (the constant function $\kappa$ is an $L^1$ function since $\Omega$ is bounded, and $\omega$ is an $L^{p'}(\Omega)$ function, which is a subset of $L^1(\Omega)$ for bounded $\Omega$) and look at the expression
\[
\bar u c + \int_\Omega u\, \nabla \bullet \varphi\,dx = \frac{c}{|\Omega|} \int_\Omega (\kappa + \omega)\,dx + \int_\Omega (\kappa + \omega)\, \nabla \bullet \varphi\,dx = \frac{c}{|\Omega|} \Big[\underbrace{\int_\Omega \kappa\,dx}_{=\kappa|\Omega|} + \underbrace{\int_\Omega \omega\,dx}_{=0}\Big] + \kappa \underbrace{\int_\Omega \nabla \bullet \varphi\,dx}_{=0} + \underbrace{\int_\Omega \omega\, \nabla \bullet \varphi\,dx}_{=l_2(\nabla \bullet \varphi)}. \tag{5.47}
\]
$\kappa$ is just a constant, so the first integral gives us $\kappa|\Omega|$; the $|\Omega|$ cancels out, such that $\kappa c = l_1(c)$ remains. Since $\omega \in \dot L^{p'}$, it has zero mean and the second integral vanishes. The same happens to the integral over $\nabla \bullet \varphi$, which is easy to see by integration by parts, and the last integral gives $l_2(\psi)$ for $\psi = \nabla \bullet \varphi$. This leads to
\[
l_1(c) + l_2(\nabla \bullet \varphi) = \bar u c + \int_\Omega u\, \nabla \bullet \varphi\,dx. \tag{5.48}
\]
We started with an arbitrary functional $l_1 + l_2 \in Y^*$ and showed that there is a $u \in L^1$ such that $l_1(c) + l_2(\nabla \bullet \varphi) = \bar u c + \int_\Omega u\, \nabla \bullet \varphi\,dx$, or, in the notation from above, that there is a $u \in L^1$ such that $I(u) = l_u = l_1 + l_2$. If we can now show that $u$ is of bounded variation, we know that for every $(l_1 + l_2) \in Y^*$ there exists a $u \in BV$ such that $I(u) = l_1 + l_2$, which proves that $I$ is surjective.

We know that $l_1(c) + l_2(\nabla \bullet \varphi)$ is a linear, continuous functional on $Y$, so it has to be bounded on bounded subsets of $Y$. We look at the subset $\{\|(c, \nabla \bullet \varphi)\|_Y = 1\}$ and can conclude that
\[
\sup_{c \in \mathbb{R},\ \nabla \bullet \varphi,\ \max\{|c|/|\Omega|,\, \|\varphi\|_\infty\} = 1} \Big( \bar u c + \int_\Omega u\, \nabla \bullet \varphi\,dx \Big) < +\infty. \tag{5.49}
\]
The first summand is bounded, so we can conclude that $\sup_{\nabla \bullet \varphi,\ \|\varphi\|_\infty = 1} \int_\Omega u\, \nabla \bullet \varphi\,dx = |u|_{BV} < \infty$. Since we know already that $u \in L^1$, we get $u \in BV(\Omega)$, which ends the proof that $I$ is surjective.

In summary, we have constructed an isomorphism from $BV(\Omega)$ to $Y^*$, so that we can identify $BV$ with the dual of $Y$ and get a weak-∗ topology on $BV$.

Conclusion 5.1.7. We have a weak-∗ topology on $BV(\Omega)$. We say $u_n \in BV(\Omega)$ converges to $u \in BV(\Omega)$ in the weak-∗ sense, $u_n \overset{*}{\rightharpoonup} u$, if and only if
\[
\bar u_n \cdot c + \int_\Omega u_n\, \nabla \bullet \varphi\,dx \to \bar u \cdot c + \int_\Omega u\, \nabla \bullet \varphi\,dx \quad \forall c \in \mathbb{R},\ \forall \varphi \in C_0(\Omega, \mathbb{R}^d). \tag{5.50}
\]
Therefore, the Banach-Alaoglu theorem can be applied, and we finally conclude the compactness of the sub-level sets of our energy functional $J(u)$ in the weak-∗ topology.

5.1.2 J(u) is Weakly Lower Semi-Continuous

We continue our proof of the existence of a minimizer by showing that the first condition of the fundamental theorem of optimization, the lower semi-continuity, is met by our energy functional. This, of course, has to be examined in the same weak-∗ topology we used for the compactness above.

$|u_n|_{BV}$ is Lower Semi-Continuous

Let $u_n \overset{*}{\rightharpoonup} u$ in $BV(\Omega)$. We recall the definition $|u|_{BV} = \sup_{\varphi \in C_0^\infty(\Omega, \mathbb{R}^d),\ \|\varphi\|_\infty \leq 1} \int_\Omega u\, \nabla \bullet \varphi\,dx$ and pick a sequence $\varphi_k$ for which $\int_\Omega u\, \nabla \bullet \varphi_k\,dx$ converges to the supremum: $|u|_{BV} = \lim_k \int_\Omega u\, \nabla \bullet \varphi_k\,dx$. The weak-∗ convergence gives us
\[
\int_\Omega u\, \nabla \bullet \varphi_k\,dx = \lim_n \int_\Omega u_n\, \nabla \bullet \varphi_k\,dx = \liminf_n \int_\Omega u_n\, \nabla \bullet \varphi_k\,dx \leq \liminf_n \sup_{\varphi \in C_0^\infty,\ \|\varphi\|_\infty \leq 1} \int_\Omega u_n\, \nabla \bullet \varphi\,dx = \liminf_n |u_n|_{BV}. \tag{5.51}
\]
Now we take the limit $k \to \infty$ and obtain
\[
|u|_{BV} \leq \liminf_n |u_n|_{BV}, \tag{5.52}
\]
which shows that $|\cdot|_{BV}$ is weak-∗ lower semi-continuous.


$\int_\Omega (u_n - Z_n)^2\,dx$ is Lower Semi-Continuous

The above expression is the square of the $L^2$ norm of $u_n - Z_n$. Since squaring does not affect the continuity, we just have to show that the $L^2$ norm $\|\cdot\|_2$ is weak-∗ lower semi-continuous on $BV$. For that we make use of the following theorem from [ABM05].

Theorem 5.1.8. Let $\Omega$ be a 1-regular open bounded subset of $\mathbb{R}^K$. For all $p$ with $1 \leq p \leq \frac{K}{K-1}$, the embedding
\[
BV(\Omega) \hookrightarrow L^p(\Omega) \tag{5.53}
\]
is continuous. More precisely, there exists a constant $C$ which depends only on $\Omega$, $p$ and $K$, such that for all $u$ in $BV(\Omega)$,
\[
\Big( \int_\Omega |u|^p\,dx \Big)^{\frac{1}{p}} \leq C \|u\|_{BV(\Omega)}. \tag{5.54}
\]

This means that in our case (for a two-dimensional $\Omega$) $BV$ is continuously embedded in $L^2$, and we can interpret $BV$ as a subset of $L^2$. Let us recall that $BV$ is the dual of $Y$. We therefore have $Y^* \subset L^2$, which gives us the reverse inclusion for the underlying spaces: $(L^2)^* = L^2 \subset Y$. (Notice that $L^2$ is a Hilbert space, which gives us $(L^2)^* = L^2$.)

With this knowledge we can approach the lower semi-continuity of the $L^2$ norm. Since $L^2$ is a Hilbert space, we can write
\[
\|u_n - Z_n\|_2 = \sup_{\varphi \in L^2,\ \|\varphi\|_2 = 1} \langle u_n - Z_n, \varphi \rangle_2. \tag{5.55}
\]

We pick a sequence $(\varphi_q) \subset L^2$ such that $\langle u_n - Z_n, \varphi_q \rangle \to \|u_n - Z_n\|_2$. From the paragraph before, we know we can interpret $L^2$ as a subset of $Y$ and can therefore apply the weak-∗ convergence: for a weak-∗ convergent sequence $(u_n)_k \overset{*}{\rightharpoonup} u_n$ we get
\[
\langle u_n - Z_n, \varphi_q \rangle_2 = \lim_{k \to \infty} \langle (u_n)_k - Z_n, \varphi_q \rangle_2 = \liminf_{k \to \infty} \langle (u_n)_k - Z_n, \varphi_q \rangle_2 \leq \liminf_{k \to \infty} \sup_{\varphi \in L^2,\ \|\varphi\|_2 = 1} \langle (u_n)_k - Z_n, \varphi \rangle_2 = \liminf_{k \to \infty} \|(u_n)_k - Z_n\|_2. \tag{5.56}
\]
Taking the limit $q \to \infty$ finally proves the weak-∗ lower semi-continuity of the $L^2$ norm. Notice that for a two-dimensional image domain $\Omega$ we did not need any further assumptions, since Theorem 5.1.8 provided a continuous embedding into $L^2$. For higher dimensions we cannot apply this theorem and therefore need to take the intersection of $BV$ with $L^2$ as a smaller space.

$\int_\Omega \mathrm{div}(\theta)\, u_n\,dx$ is Lower Semi-Continuous

Since we have $\mathrm{div}(\theta) \in L^2 \subset Y$, the weak-∗ convergence implies
\[
\langle u_n, \mathrm{div}(\theta) \rangle = \lim_{k \to \infty} \langle (u_n)_k, \mathrm{div}(\theta) \rangle = \liminf_{k \to \infty} \langle (u_n)_k, \mathrm{div}(\theta) \rangle, \tag{5.57}
\]
and we even get equality.

$\int_\Omega (u_n M_i - u_i M_n)^2\,dx$ is Lower Semi-Continuous

As for the other $L^2$ term, we use that it is sufficient to show that $\|u_n M_i - u_i M_n\|_{L^2}$ is lower semi-continuous and that we can write
\[
\|u_n M_i - u_i M_n\|_{L^2} = \sup_{\varphi \in L^2,\ \|\varphi\|_2 \leq 1} \langle \varphi, u_n M_i - u_i M_n \rangle. \tag{5.58}
\]


Let $\varphi_k \in L^2$ be a sequence such that $\langle \varphi_k, u_n M_i - u_i M_n \rangle \to \|u_n M_i - u_i M_n\|_{L^2}$. Then
\[
\langle \varphi_k, u_n M_i - u_i M_n \rangle = \langle M_i \varphi_k, u_n \rangle - \langle M_n \varphi_k, u_i \rangle. \tag{5.59}
\]
$M_i$, $M_n$ and $\varphi_k$ are all $L^2$ functions. As seen in Section 5.1.2, every $L^2$ function on a bounded domain is also an $L^1$ function, so the products $M_i \varphi_k$ and $M_n \varphi_k$ are integrable (as the Cauchy-Schwarz inequality shows); since the bands $M_i$ are moreover bounded images, the products are in fact again in $L^2$. Since we have two inner products with $L^2$ functions and $L^2 \subset Y$, we can apply the weak-∗ convergence and obtain
\[
\langle M_i \varphi_k, u_n \rangle - \langle M_n \varphi_k, u_i \rangle = \lim_{m \to \infty} \big[ \langle M_i \varphi_k, (u_n)_m \rangle - \langle M_n \varphi_k, (u_i)_m \rangle \big] = \liminf_{m \to \infty} \langle \varphi_k, (u_n)_m M_i - (u_i)_m M_n \rangle. \tag{5.60}
\]
The same procedure as for the other cases, taking the supremum after the $\liminf$ and letting $k \to \infty$, proves the weak-∗ lower semi-continuity of $\int_\Omega (u_n M_i - u_i M_n)^2\,dx$.

We have shown that all parts of our energy functional are weak-∗ lower semi-continuous. Since the sum of weak-∗ lower semi-continuous functions is again weak-∗ lower semi-continuous, the first condition of the fundamental theorem of optimization follows, which finally proves the existence of a minimizer.

5.2 Uniqueness of the Minimizer

To obtain uniqueness of the minimizer, we show strict convexity of our energy functional. A functional $J : X \to \mathbb{R} \cup \{\infty\}$ is called strictly convex if and only if
\[
J(\alpha u + (1 - \alpha) v) \leq \alpha J(u) + (1 - \alpha) J(v) \quad \forall u, v \in X,\ \forall \alpha \in [0, 1], \tag{5.61}
\]
where equality only holds for $u = v$ or $\alpha \in \{0, 1\}$. Once we have shown strict convexity, we get uniqueness by the following theorem.

Theorem 5.2.1. Let $J : X \to \mathbb{R} \cup \{\infty\}$ be strictly convex. Then there is at most one minimum of $J$, and it is a global one.

Proof. 1. Any local minimum is a global minimum. Let $u$ be a local minimum. Then there is an $\varepsilon > 0$ such that $J(u) \leq J(z)$ for all $z \in B_\varepsilon(u)$. If we assume that there is a $v \in X$, $v \neq u$, such that $J(v) < J(u)$, then
\[
J(\alpha v + (1 - \alpha) u) < \alpha J(v) + (1 - \alpha) J(u) < J(u) \quad \forall \alpha \in (0, 1). \tag{5.62}
\]
For $\alpha$ small enough, $\alpha v + (1 - \alpha) u$ lies in $B_\varepsilon(u)$, which contradicts $J(u) \leq J(z)$ for all $z \in B_\varepsilon(u)$.

2. There is at most one global minimum. Let $u \neq v$ be two global minima of $J$ with $J(u) = J(v) =: c$. Then
\[
J(\alpha u + (1 - \alpha) v) < \alpha J(u) + (1 - \alpha) J(v) = c \quad \forall \alpha \in (0, 1), \tag{5.63}
\]
which is a contradiction to the assumption that $u$ and $v$ are global minima.

To avoid long formulas, we examine each term of our energy functional separately. It is easy to see that for strict convexity of the sum it is sufficient to show that one of the terms is strictly convex and the others are convex.

$|u_n|_{BV} + \int_\Omega \mathrm{div}(\theta)\, u_n\,dx$ is Convex

The second part, $\int_\Omega \mathrm{div}(\theta)\, u_n\,dx$, is linear in $u_n$, such that the convexity of this part is trivial. For the total variation part we get
\[
\begin{aligned}
|\alpha u + (1 - \alpha) v|_{BV} &= \sup_{\varphi \in C_0^\infty(\Omega, \mathbb{R}^d),\ \|\varphi\|_\infty \leq 1} \int_\Omega \big(\alpha u + (1 - \alpha) v\big)\, \nabla \bullet \varphi\,dx \\
&\leq \alpha \sup_{\varphi \in C_0^\infty(\Omega, \mathbb{R}^d),\ \|\varphi\|_\infty \leq 1} \int_\Omega u\, \nabla \bullet \varphi\,dx + (1 - \alpha) \sup_{\varphi \in C_0^\infty(\Omega, \mathbb{R}^d),\ \|\varphi\|_\infty \leq 1} \int_\Omega v\, \nabla \bullet \varphi\,dx \\
&= \alpha |u|_{BV} + (1 - \alpha) |v|_{BV}.
\end{aligned} \tag{5.64}
\]


$\int_\Omega (u_n - Z_n)^2\,dx$ is Strictly Convex

Let $u \neq v$ and $\alpha \in (0, 1)$. Then
\[
\int_\Omega \big(\alpha u + (1 - \alpha) v - Z_n\big)^2\,dx = \int_\Omega \big( \alpha (u - Z_n) + (1 - \alpha)(v - Z_n) \big)^2\,dx. \tag{5.65}
\]
We define $\tilde u = u - Z_n$ and $\tilde v = v - Z_n$, for which of course $\tilde u \neq \tilde v$ still holds. We obtain
\[
\int_\Omega (\alpha \tilde u + (1 - \alpha) \tilde v)^2\,dx = \int_\Omega \alpha^2 \tilde u^2 + 2\alpha(1 - \alpha) \tilde u \tilde v + (1 - \alpha)^2 \tilde v^2\,dx, \tag{5.66}
\]
and since $\tilde u \neq \tilde v$, we know $0 < \int_\Omega (\tilde u - \tilde v)^2\,dx = \int_\Omega (\tilde u^2 - 2 \tilde u \tilde v + \tilde v^2)\,dx$, which implies that
\[
\int_\Omega 2 \tilde u \tilde v\,dx < \int_\Omega (\tilde u^2 + \tilde v^2)\,dx. \tag{5.67}
\]
We can use this in the above identity, since the factor $\alpha(1 - \alpha) > 0$ for all $\alpha \in (0, 1)$. We get
\[
\begin{aligned}
\int_\Omega \alpha^2 \tilde u^2 + 2\alpha(1 - \alpha) \tilde u \tilde v + (1 - \alpha)^2 \tilde v^2\,dx &< \int_\Omega \alpha^2 \tilde u^2 + \alpha(1 - \alpha)(\tilde u^2 + \tilde v^2) + (1 - \alpha)^2 \tilde v^2\,dx \\
&= \int_\Omega \alpha \tilde u^2 + \alpha(1 - \alpha) \tilde v^2 + (1 - 2\alpha + \alpha^2) \tilde v^2\,dx \\
&= \int_\Omega \alpha \tilde u^2 + (1 - \alpha) \tilde v^2\,dx \\
&= \alpha \int_\Omega \tilde u^2\,dx + (1 - \alpha) \int_\Omega \tilde v^2\,dx,
\end{aligned} \tag{5.68}
\]
which proves that the fidelity term of our energy functional is strictly convex.

$\int_\Omega \sum_{i \neq n} (u_n M_i - u_i M_n)^2\,dx$ is Convex

We examine
\[
\begin{aligned}
\int_\Omega \sum_{i \neq n} \big( [\alpha u_n + (1 - \alpha) v_n] M_i - [\alpha u_i + (1 - \alpha) v_i] M_n \big)^2\,dx &= \int_\Omega \sum_{i \neq n} \big( \alpha \underbrace{(u_n M_i - u_i M_n)}_{=: t_1} + (1 - \alpha) \underbrace{(v_n M_i - v_i M_n)}_{=: t_2} \big)^2\,dx \\
&\leq \alpha \int_\Omega \sum_{i \neq n} t_1^2\,dx + (1 - \alpha) \int_\Omega \sum_{i \neq n} t_2^2\,dx, 
\end{aligned} \tag{5.69}
\]
where the last step is justified by the previous calculation for the convexity of the fidelity term. Since $u_j \neq v_j\ \forall j$ does not necessarily imply that $t_1 = (u_n M_i - u_i M_n) \neq (v_n M_i - v_i M_n) = t_2$, we do not get strict convexity in this case.

Nevertheless, we have shown that all terms of our energy functional are convex and, moreover, that the fidelity term is strictly convex. By Theorem 5.2.1 we can therefore conclude that for a positive fidelity term weight $\nu > 0$ the minimizer is unique.

We have shown the existence and uniqueness of a minimizer in this section. In the next section we would like to derive an optimality condition for the minimizer of our energy functional that allows us to numerically compute the solution of the minimization problem.
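Both convexity statements lend themselves to a quick numerical sanity check in a discrete analog (all arrays below are toy stand-ins of our own choosing). The midpoint of two distinct images yields a strictly smaller fidelity value than the average, while shifting every band by the same multiple of its upsampled band, $v_j = u_j + s\,M_j$, leaves all cross terms $u_n M_i - u_i M_n$ unchanged, so two different band collections can share the same matching energy:

```python
import numpy as np

rng = np.random.default_rng(2)
Z = rng.normal(size=50)                               # stand-in for the data Z_n
u1, v1 = rng.normal(size=50), rng.normal(size=50)     # two distinct candidates

def fidelity(w):
    """Discrete analog of the fidelity term, the sum of (w - Z_n)^2."""
    return float(np.sum((w - Z) ** 2))

midpoint_value = fidelity(0.5 * u1 + 0.5 * v1)
average_value = 0.5 * fidelity(u1) + 0.5 * fidelity(v1)

# Non-strictness of the matching term: v_j = u_j + s*M_j has identical cross terms.
M = rng.uniform(0.5, 1.5, size=(3, 40))   # stand-ins for upsampled bands M_j
u = rng.normal(size=(3, 40))
v = u + 2.0 * M                           # a genuinely different band collection

def cross_terms(w):
    """All pairwise terms w_n*M_i - w_i*M_n entering the matching energy."""
    return np.array([w[n] * M[i] - w[i] * M[n]
                     for n in range(3) for i in range(3) if i != n])
```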

5.3 Optimality Conditions

The type of optimality condition that is useful or available for a certain problem depends on the regularity of the functional we are trying to minimize. Analogously to differentiability for functions on $\mathbb{R}^n$, we want to define certain limits for general functionals on Banach spaces. Let us first define Gâteaux and Fréchet differentiability.


Definition 5.3.1. Let $J : X \to \mathbb{R} \cup \{\infty\}$ be a functional on a Banach space $X$. We say $J$ is differentiable in the direction of $v$ if
\[
dJ(u; v) := \lim_{t \searrow 0} \frac{J(u + tv) - J(u)}{t} \tag{5.70}
\]
exists and is finite. If, for a given $u$, $J$ is differentiable in all directions, the collection of all these directional derivatives $dJ(u; \cdot)$ is called the Gâteaux derivative in $u$.

If, for a certain $u \in X$, $J$ is Gâteaux differentiable and there is a linear functional $J'(u) \in X^*$ such that
\[
J'(u) v = dJ(u; v) \quad \forall v \in X \quad \text{and} \tag{5.71}
\]
\[
\frac{|J(u + v) - J(u) - J'(u) v|}{\|v\|} \to 0 \quad \text{for } \|v\| \to 0, \tag{5.72}
\]
then $J$ is called Fréchet differentiable in $u$, and $J'(u)$ is the Fréchet derivative.

For a Gâteaux differentiable functional $J$ we can show that a minimum $u$ satisfies
\[
dJ(u; v) \geq 0 \quad \forall v \in X. \tag{5.73}
\]
If the functional $J$ is even Fréchet differentiable, we know that $dJ(u; v) = -dJ(u; -v)$, which leads to the optimality condition $J'(u) = 0$, similar to the first-order optimality condition we know from real analysis.

In our case, $J$ is not Fréchet differentiable, since the total variation term $|u|_{BV}$ is not Fréchet differentiable. To find an optimality condition for non-differentiable functionals, we define the subdifferential of $J : X \to \mathbb{R} \cup \{\infty\}$ on a Banach space $X$ with dual $X^*$ to be
\[
\partial J(u) := \{p \in X^* \,|\, J(u) + \langle p, v - u \rangle \leq J(v)\ \forall v \in X\}. \tag{5.74}
\]
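A one-dimensional example makes the definition concrete: for $J(x) = |x|$ every slope $p \in [-1, 1]$ satisfies the subgradient inequality at $u = 0$, while steeper slopes do not. The brute-force check below (the helper name is our own) tests the defining inequality on a grid of points:

```python
def is_subgradient(p, u, J, test_points):
    """Check the defining inequality J(u) + p*(v - u) <= J(v) on sample points v."""
    return all(J(u) + p * (v - u) <= J(v) + 1e-12 for v in test_points)

J = abs
points = [x / 10.0 for x in range(-50, 51)]

# Every slope in [-1, 1] supports the graph of |x| from below at u = 0 ...
inside = all(is_subgradient(p / 10.0, 0.0, J, points) for p in range(-10, 11))
# ... but a slope of 2 overshoots |x|, e.g. at v = 1.
outside = is_subgradient(2.0, 0.0, J, points)
```

This is exactly why the subdifferential is set-valued at kinks such as the one the total variation has at $u$ with $|\nabla u| = 0$.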

The minimizer can then be characterized by the following theorem from [ABM05].

Theorem 5.3.2. An element $u \in X$ is a minimum of a convex functional $J : X \to \mathbb{R} \cup \{\infty\}$ if and only if $0 \in \partial J(u)$.

5.3.1 The Subdifferential $\partial J(u)$

One can show that the subdifferential of our energy functional is the sum of the first derivatives of the differentiable terms and the subdifferential of the total variation, similar to [ET99]:
\[
\partial J(u) = \Big\{ 2\mu \sum_{\substack{j=1\\ j \neq n}}^N (u_n \cdot {\uparrow}M_j - u_j \cdot {\uparrow}M_n)\, {\uparrow}M_j + 2\nu (u_n - Z_n) + \eta\, \mathrm{div}(\theta) + \gamma p \ \Big|\ p \in \partial |u_n|_{BV} \Big\}. \tag{5.75}
\]
The subdifferential of $|u_n|_{BV}$ can be written as
\[
\partial |u|_{BV} = \{p \in BV^*(\Omega) \,|\, \|p\| \leq 1,\ \langle p, u \rangle = |u|_{BV}\}, \tag{5.76}
\]
and one can conclude that there exists a $g \in L^\infty(\Omega; \mathbb{R}^d)$, $\|g\|_{L^\infty} \leq 1$, with
\[
2\mu \sum_{\substack{j=1\\ j \neq n}}^N (u_n \cdot {\uparrow}M_j - u_j \cdot {\uparrow}M_n)\, {\uparrow}M_j + 2\nu (u_n - Z_n) + \eta\, \mathrm{div}(\theta) + \gamma\, \nabla \bullet g = 0 \tag{5.77}
\]
and $\langle \nabla \bullet g, u_n \rangle = |u_n|_{BV}$. This is a rather complicated characterization which is not necessarily useful for the actual calculation, as we know nothing about $g$, in particular nothing about the relation between $u$ and $g$.


5.3.2 The Euler-Lagrange Equation of the Regularized Functional

The problems with the optimality condition arise from the non-differentiability of the total variation term. In domains where the gradient of the function $u$ exists and is not zero, $|\nabla u| \neq 0$, $|u|_{BV}$ becomes differentiable, and the subgradient (which then equals the Fréchet derivative) is
\[
p(x) = -\mathrm{div}\Big( \frac{\nabla u(x)}{|\nabla u(x)|} \Big). \tag{5.78}
\]
To create numerical methods, one can additionally regularize the total variation term such that the above formula can be applied everywhere. The requirement that $u$ be differentiable is automatically resolved by the fact that in practice $u$ is not a function but only the discretization of a $BV$ function, represented by a discrete matrix; since such a discrete matrix could even be the interpolation of a $C^\infty$ function, the differentiability is not a problem. For the non-zero assumption we define
\[
|\nabla u(x)|_\varepsilon := \sqrt{(\partial_x u)^2 + (\partial_y u)^2 + \varepsilon^2}
\]
for a small $\varepsilon$, which makes $|\nabla u(x)|_\varepsilon$ positive but still a good approximation of the gradient magnitude of $u$. If we replace our original energy term $\int_\Omega |\nabla u|\,dx$ by the regularized term $\int_\Omega |\nabla u|_\varepsilon\,dx$, we can apply 5.78 to 5.77 to get the so-called Euler-Lagrange equation
\[
0 = 2\mu \sum_{\substack{j=1\\ j \neq q}}^N (u_q \cdot {\uparrow}M_j - u_j \cdot {\uparrow}M_q)\, {\uparrow}M_j + 2\nu (u_q - Z_q) + \eta\, \mathrm{div}(\theta) - \gamma\, \mathrm{div}\Big( \frac{\nabla u_q}{|\nabla u_q|_\varepsilon} \Big) \tag{5.79}
\]
for all bands $q$ as an optimality condition.

So far we have just considered the alternate energy in the spatial domain. To complete the theoretical part of this thesis, we would like to derive a condition similar to 5.79 for the optimality of the original energy, which includes the term in the wavelet domain. For the total variation term we again use the additional regularization to obtain a simple formula for the derivative. The spectral matching term $E_s$ (4.21) stays the same, and the matching term away from edges $E_c$ (4.20) has a derivative similar to that of the matching term of the alternate energy (4.24), only on the domain $\Omega \setminus \Gamma$.

The only really new term we have to worry about is the part in the wavelet domain:
\[
E_w = \sum_n c_0 \big( a_L^i[n] - \alpha_L^i[n] \big)^2 \varphi_{L,n}^2 + \sum_n \sum_{j=1}^L \sum_{k=1}^3 c_j \big( d_{k,j}^i[n] - \beta_{k,j}^i[n] \big)^2 \psi_{j,n}^k.
\]
It would be desirable to be able to express the optimality condition of this matching term in the spatial domain, or the other terms in the wavelet domain, to avoid iterative calculations of wavelet decompositions and reconstructions. Unfortunately, this is not possible: it is shown in [CSZ06a] in Theorem 3.1 that "the space of $BV(\mathbb{R}^2)$ cannot be characterized by size properties on wavelet coefficients", such that the two models of total variation and wavelet representation cannot be used in the same space. Nevertheless, it is possible to derive an optimality condition. Following the derivation of the Euler-Lagrange equation in [CSZ06a], we get
\[
\frac{\delta E_w}{\delta u_q} = 2 \sum_n c_0 \big( a_L^q[n] - \alpha_L^q[n] \big) \varphi_{L,n}^2 + 2 \sum_n \sum_{j=1}^L \sum_{k=1}^3 c_j \big( d_{k,j}^q[n] - \beta_{k,j}^q[n] \big) \psi_{j,n}^k. \tag{5.80}
\]
This is straightforward in the sense that including a wavelet term does not affect the other parts of the energy; it equals the derivative with respect to the wavelet coefficients in the wavelet domain. The complete optimality condition for the original energy can then be written as
\[
\begin{aligned}
0 = {}& 2\mu \sum_{\substack{j=1\\ j \neq q}}^N (u_q \cdot {\uparrow}M_j - u_j \cdot {\uparrow}M_q)\, {\uparrow}M_j + 2\nu \chi_{\Omega \setminus \Gamma} (u_q - M_q) + \eta\, \mathrm{div}(\theta) - \gamma\, \mathrm{div}\Big( \frac{\nabla u_q}{|\nabla u_q|_\varepsilon} \Big) \\
&+ 2 \sum_n c_0 \big( a_L^q[n] - \alpha_L^q[n] \big) \varphi_{L,n}^2 + 2 \sum_n \sum_{j=1}^L \sum_{k=1}^3 c_j \big( d_{k,j}^q[n] - \beta_{k,j}^q[n] \big) \psi_{j,n}^k
\end{aligned} \tag{5.81}
\]
for all bands $q$, where $\chi_{\Omega \setminus \Gamma}$ is the characteristic function of $\Omega \setminus \Gamma$.

5.4 The Condition $\mathrm{div}(\theta) \in L^2$

The assumption that $\mathrm{div}(\theta)$ is an element of $L^2$ is a smoothness condition on $P$, since it imposes something about the derivative or, in the more general form of the energy with Bregman distances, something about the subgradient $\partial |P|_{BV}$ of the panchromatic image. In general, a subgradient with respect to the $BV$ semi-norm is an element of $BV^*$, which means it is merely a distribution and not necessarily a function. Many authors have pointed out that a condition $p \in L^2$ for a $p$ in the subdifferential of the total variation (a so-called source condition) yields the square integrability of the mean curvature of the isocontours and of the discontinuity set. This is obvious for $p = \mathrm{div}(\theta) = \nabla \cdot \frac{\nabla P}{|\nabla P|}$, where we immediately recognize the curvature, but it can also be shown for more discontinuous $P$, where the expression $\nabla \cdot \frac{\nabla P}{|\nabla P|}$ is not necessarily defined anymore. See [BRH07] for details.

The square integrability of the mean curvature is indeed a restriction on the set of possible images. The characteristic function of a square, for instance, would not be a valid panchromatic image, because its curvature is not an $L^2$ function: the four corners of the square have infinite curvature. In practice we avoid this problem with the additional regularization of choosing
\[
\theta = \frac{\nabla P}{\sqrt{P_x^2 + P_y^2 + \varepsilon^2}}
\]
for an $\varepsilon > 0$. Since we match the geometry of our fused image to that of the regularized panchromatic image, we can expect the result to be smooth as well. The effect of the $\varepsilon$-regularization will be investigated in the numerical results in Section 8.3.2.
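The regularized direction field is easy to compute in practice. The sketch below (synthetic square indicator and helper name of our own choosing) shows that $\theta$ is well defined everywhere, even on the flat regions and along the jumps of a square's characteristic function, where $\nabla P / |\nabla P|$ itself would be undefined, and that $|\theta| < 1$ holds pointwise:

```python
import numpy as np

def theta_field(P, eps):
    """theta = grad P / sqrt(Px^2 + Py^2 + eps^2), defined everywhere for eps > 0."""
    Py, Px = np.gradient(P)          # np.gradient returns derivatives along axes 0, 1
    norm = np.sqrt(Px**2 + Py**2 + eps**2)
    return Px / norm, Py / norm

# The problematic example from the text: the characteristic function of a square.
P = np.zeros((64, 64))
P[16:48, 16:48] = 1.0

tx, ty = theta_field(P, eps=0.1)
```

Shrinking $\varepsilon$ makes $|\theta|$ approach 1 near the jumps while leaving it near 0 in the flat regions, which is the behavior exploited by the edge-alignment term.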

It was shown in [BO04] that the set of images satisfying a source condition like p ∈ L2 for a p ∈ ∂|P |BVis equal to the set f |∃g with f = argminf

λ2 ‖f − g‖

2 + |f |BV . This could motivate a dierent choice of

the subgradient p ∈ ∂|P |BV . Instead of using p = div

(∇P√

P 2x+P 2

y+ε2

)one could perform BV -denoising

on P such that the optimality condition leads to p = λ(P − P ), where P denotes the minimizer of theBV -functional. This way the choice of λ rather than ε would determine the amount of details of geometryused for the sharping. Since P has to satisfy a source condition anyways, this selection of the subgradientmight be more natural and could be a topic for further research.

In the next chapter we will derive two numerical schemes to solve the above equation by gradient descent. To avoid the additional regularization of the total variation term, we further present a more recent method using splitting and Bregman distances to minimize the energy functional without additional regularization.


6. NUMERICAL IMPLEMENTATION

In this chapter we will develop three numerical methods to determine the minimizer of our energy functionals. For the sake of simplicity and clarity we only present the numerical methods for the alternate energy. Each of the presented methods can easily be extended to the original energy by including the wavelet matching term in an alternating minimization. The minimization of the wavelet matching term by itself is easy, since that optimization problem can be solved exactly. One can simply alternate between taking an implicit time step towards the matching wavelet coefficient and minimizing the remaining spatial terms with any of the three methods described below.

6.1 Gradient Descent Methods

To develop numerical schemes for finding a minimizer we start with the optimality condition (5.79),

\[ 0 = 2\mu \sum_{j=1,\,j\neq q}^{N} (u_q \cdot \uparrow M_j - u_j \cdot \uparrow M_q)\uparrow M_j + 2\nu(u_q - Z_q) + \eta\,\mathrm{div}(\theta) - \gamma\,\mathrm{div}\!\left(\frac{\nabla u_q}{|\nabla u_q|_\varepsilon}\right). \tag{6.1} \]

This needs to be solved for each band u_q. To do so, we apply a so-called gradient descent or method of steepest descent. The idea of this method is to introduce an artificial time variable and move in the direction of the steepest descent of the energy functional, which is given by its negative derivative. Since we have proved that our energy functional is strictly convex, we can iterate and reduce the energy functional in every step until we reach the minimum (or at least approximate it very well). This is illustrated in Figure 6.1: the black ovals represent the isocontours of the energy, and the red arrows symbolize the steps one would take when minimizing the energy with a gradient descent method. We always move perpendicular to the contour lines until we reach the minimum. Notice that the direction we move in is not always optimal; especially for the first steps the red arrows do not point towards the minimum.

Fig. 6.1: Idea of the method of steepest descent

Mathematically speaking, for our problem we solve the following partial differential equation (PDE) to steady state:

\[ -\frac{d}{dt} u_q = 2\mu \sum_{j=1,\,j\neq q}^{N} (u_q \cdot \uparrow M_j - u_j \cdot \uparrow M_q)\uparrow M_j + 2\nu(u_q - Z_q) + \eta\,\mathrm{div}(\theta) - \gamma\,\mathrm{div}\!\left(\frac{\nabla u_q}{|\nabla u_q|_\varepsilon}\right). \tag{6.2} \]

Solving this PDE raises the question of discretization. We will see that the time discretization is especially important in terms of stability and time step restrictions. In the next subsection we present a simple numerical method based on explicit time stepping.

6.1.1 Explicit Time Stepping

We discretize in time first, using

\[ \frac{d}{dt} u_q \approx \frac{(u_q)^{k+1} - (u_q)^{k}}{\delta t} \tag{6.3} \]

in an explicit way:

\[ -\frac{(u_q)^{k+1} - (u_q)^{k}}{\delta t} = 2\mu \sum_{j=1,\,j\neq q}^{N} (u_q^k \cdot \uparrow M_j - u_j^k \cdot \uparrow M_q)\uparrow M_j + 2\nu(u_q^k - Z_q) + \eta\,\mathrm{div}(\theta) - \gamma\,\mathrm{div}\!\left(\frac{\nabla u_q^k}{|\nabla u_q^k|_\varepsilon}\right). \tag{6.4} \]

Solving for u_q^{k+1} gives us

\[ u_q^{k+1} = u_q^{k} - 2\delta t\,\mu \sum_{j=1,\,j\neq q}^{N} (u_q^k \cdot \uparrow M_j - u_j^k \cdot \uparrow M_q)\uparrow M_j - 2\delta t\,\nu(u_q^k - Z_q) - \delta t\,\eta\,\mathrm{div}(\theta) + \delta t\,\gamma\,\mathrm{div}\!\left(\frac{\nabla u_q^k}{|\nabla u_q^k|_\varepsilon}\right). \tag{6.5} \]

Updating u_q does not require the solution of an equation: it is explicitly given by the above formula, which makes the method easy to implement, and each iteration is computationally rather cheap. The drawback is that an explicit method does not have very good convergence properties and puts restrictions on the time step δt we take. To specify the restrictions and to show that the method is stable for a small enough time step, we make use of a theorem from [Bur07].
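To make the scheme concrete, one explicit update of the form (6.5) can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the implementation used in this thesis: the function names, the choice of forward/backward differences for the regularized TV divergence, and the replicate (Neumann) boundary handling are our own assumptions. Here u, Z and M are lists of equally sized 2D arrays (current bands, data bands, and the upsampled multispectral bands ↑M), and div_theta is the precomputed field div(θ).

```python
import numpy as np

def div_tv(u, eps):
    """Regularized TV divergence div(grad u / |grad u|_eps), using
    forward differences for the gradient, backward differences for the
    divergence, and replicate (Neumann) boundaries."""
    ux = np.diff(u, axis=1, append=u[:, -1:])    # forward difference in x
    uy = np.diff(u, axis=0, append=u[-1:, :])    # forward difference in y
    mag = np.sqrt(ux ** 2 + uy ** 2 + eps ** 2)  # |grad u|_eps
    px, py = ux / mag, uy / mag
    dx = np.diff(px, axis=1, prepend=px[:, :1])  # backward difference in x
    dy = np.diff(py, axis=0, prepend=py[:1, :])  # backward difference in y
    return dx + dy

def explicit_step(u, q, Z, M, div_theta, dt, mu, nu, eta, gamma, eps):
    """One explicit gradient-descent update (6.5) for band q.
    u: list of current bands, Z: data bands, M: upsampled bands."""
    uq = u[q]
    coupling = sum((uq * M[j] - u[j] * M[q]) * M[j]
                   for j in range(len(u)) if j != q)
    return (uq
            - 2 * dt * mu * coupling
            - 2 * dt * nu * (uq - Z[q])
            - dt * eta * div_theta
            + dt * gamma * div_tv(uq, eps))
```

With μ = η = γ = 0 the update reduces to a plain relaxation of u_q towards the data Z_q, which is a convenient sanity check.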

Theorem 6.1.1. Let the functions G_{i,j} : ℝ^{N×N} → ℝ be continuously differentiable for all i, j ∈ {1, ..., N}. Further let

\[ \frac{\partial G_{i,j}}{\partial u_{i,j}}(u) \le 0 \quad \forall\, i,j = 1,\dots,N,\ u \in \mathbb{R}^{N\times N}, \tag{6.6} \]

\[ \sum_{(l,m)\neq(i,j)} \frac{\partial G_{i,j}}{\partial u_{l,m}}(u) \ge 0 \quad \forall\, u \in \mathbb{R}^{N\times N}. \tag{6.7} \]

Then the explicit time stepping scheme

\[ u_{i,j}^{k+1} = u_{i,j}^{k} + \tau\, G_{i,j}(u^k) \tag{6.8} \]

is monotone and therefore stable in the supremum norm if the time step τ satisfies

\[ \tau \max_{i,j} \sup_{v} \left|\frac{\partial G_{i,j}}{\partial u_{i,j}}(v)\right| \le 1. \tag{6.9} \]

Page 55: A Variational Approach for Sharpening High Dimensional Imageswims.math.ecnu.edu.cn/ChunLi/973_Seminar_20120320/used... · 2012-03-20 · 2 Abstract Earth observing satellites usually

6.1. Gradient Descent Methods 55

Proof. Let u^k, v^k ∈ ℝ^{N×N} be such that u^k_{i,j} ≤ v^k_{i,j} for all i, j. We want to show that this inequality still holds in the next iteration, u^{k+1}_{i,j} ≤ v^{k+1}_{i,j} for all i, j, which means the method is monotone. First notice that, because of the differentiability of G, the fundamental theorem of calculus and a change of variables tell us that

\[ G_{i,j}(u^k) - G_{i,j}(v^k) = \int_0^1 \nabla G_{i,j}\big((1-\sigma)v^k + \sigma u^k\big) \cdot (u^k - v^k)\, d\sigma. \tag{6.10} \]

We can use this formula and write

\[ \begin{aligned} u^{k+1}_{i,j} - v^{k+1}_{i,j} &= u^k_{i,j} - v^k_{i,j} + \tau\big(G_{i,j}(u^k) - G_{i,j}(v^k)\big) \\ &= u^k_{i,j} - v^k_{i,j} + \int_0^1 \tau\, \nabla G_{i,j}\big((1-\sigma)v^k + \sigma u^k\big)\cdot(u^k - v^k)\, d\sigma \\ &= \int_0^1 \Big[ u^k_{i,j} - v^k_{i,j} + \tau \sum_{l,m} \frac{\partial G_{i,j}}{\partial u_{l,m}}\big((1-\sigma)v^k + \sigma u^k\big)\,(u^k_{l,m} - v^k_{l,m}) \Big]\, d\sigma \\ &= \int_0^1 \Big[ \Big(1 + \tau\, \frac{\partial G_{i,j}}{\partial u_{i,j}}\big((1-\sigma)v^k + \sigma u^k\big)\Big)\,(u^k_{i,j} - v^k_{i,j}) \\ &\qquad\qquad + \tau \sum_{(l,m)\neq(i,j)} \frac{\partial G_{i,j}}{\partial u_{l,m}}\big((1-\sigma)v^k + \sigma u^k\big)\,(u^k_{l,m} - v^k_{l,m}) \Big]\, d\sigma. \end{aligned} \tag{6.11} \]

Condition (6.9), together with (6.6), ensures that

\[ 1 + \tau\, \frac{\partial G_{i,j}}{\partial u_{i,j}}\big((1-\sigma)v^k + \sigma u^k\big) \ge 0. \tag{6.12} \]

We know that u^k_{i,j} ≤ v^k_{i,j}, so u^k_{i,j} − v^k_{i,j} ≤ 0, which means the entire first term is non-positive. For the second term we know that \( \sum_{(l,m)\neq(i,j)} \frac{\partial G_{i,j}}{\partial u_{l,m}}((1-\sigma)v^k + \sigma u^k) \ge 0 \) by condition (6.7), and applying the relation u^k_{l,m} − v^k_{l,m} ≤ 0 again we get that the second term is less than or equal to zero as well. We have therefore shown

\[ u^{k+1}_{i,j} - v^{k+1}_{i,j} \le 0, \tag{6.13} \]

which proves that the method is monotone.

According to Equation (6.5) we can define the G in our case to be

\[ G = -2\mu \sum_{j=1,\,j\neq q}^{N} (u_q^k \cdot \uparrow M_j - u_j^k \cdot \uparrow M_q)\uparrow M_j - 2\nu(u_q^k - Z_q) - \eta\,\mathrm{div}(\theta) + \gamma\,\mathrm{div}\!\left(\frac{\nabla u_q^k}{|\nabla u_q^k|_\varepsilon}\right). \tag{6.14} \]


Or, rewritten,

\[ G = \underbrace{\Big(-2\nu - 2\mu \sum_{j=1,\,j\neq q}^{N} (\uparrow M_j)^2\Big)\, u_q^k}_{l(u_q^k)} + \underbrace{\gamma\,\mathrm{div}\!\left(\frac{\nabla u_q^k}{|\nabla u_q^k|_\varepsilon}\right)}_{d(u_q^k)} + \underbrace{2\mu \sum_{j=1,\,j\neq q}^{N} (u_j^k \cdot \uparrow M_j)\uparrow M_q + 2\nu Z_q - \eta\,\mathrm{div}(\theta)}_{C}. \tag{6.15} \]

For the sake of clarity and simplicity we will leave out the indices in the following calculations, i.e. we denote u = u_q^k. Notice that the last term C is negligible for the stability analysis since it is constant with respect to u. The first term is easy to treat since it is linear in u:

\[ \frac{\partial l_{i,j}}{\partial u_{i,j}}(u) = -2\nu - 2\mu \sum_{j=1,\,j\neq q}^{N} (\uparrow M_j)^2 \le 0, \tag{6.16} \]

\[ \frac{\partial l_{i,j}}{\partial u_{l,m}}(u) = 0 \ge 0 \quad \forall\,(l,m)\neq(i,j). \tag{6.17} \]

The only complicated term we have to worry about is the part d(u) coming from the total variation. So far we have just written d(u) = γ div(∇u/|∇u|_ε), but the actual discrete d we need here depends on the discretization of the derivatives. In terms of stability, central differences might not be the best choice; however, they make the expression symmetric in both coordinates, as opposed to using forward differences for the first and backward differences for the second derivative, which is why we analyze the stability using central differences for both derivatives. The discrete d_{i,j} then becomes

\[ d_{i,j}(u) = \gamma\, D_c\cdot\!\left(\frac{D_c u}{|D_c u|_\varepsilon}\right) = \gamma\, D_c^x\!\left(\frac{\tfrac{1}{2}(u_{i+1,j} - u_{i-1,j})}{\sqrt{\tfrac14 (u_{i+1,j}-u_{i-1,j})^2 + \tfrac14 (u_{i,j+1}-u_{i,j-1})^2 + \varepsilon^2}}\right) + \gamma\, D_c^y\!\left(\frac{\tfrac{1}{2}(u_{i,j+1} - u_{i,j-1})}{\sqrt{\tfrac14 (u_{i+1,j}-u_{i-1,j})^2 + \tfrac14 (u_{i,j+1}-u_{i,j-1})^2 + \varepsilon^2}}\right). \tag{6.18} \]

To keep the following expressions as short as possible we denote

\[ S_{i,j} := \sqrt{\tfrac14 (u_{i+1,j}-u_{i-1,j})^2 + \tfrac14 (u_{i,j+1}-u_{i,j-1})^2 + \varepsilon^2}. \tag{6.19} \]

Applying the second central differences we get

\[ d_{i,j}(u) = \frac{\gamma}{4}\left[\frac{u_{i+2,j} - u_{i,j}}{S_{i+1,j}} - \frac{u_{i,j} - u_{i-2,j}}{S_{i-1,j}}\right] + \frac{\gamma}{4}\left[\frac{u_{i,j+2} - u_{i,j}}{S_{i,j+1}} - \frac{u_{i,j} - u_{i,j-2}}{S_{i,j-1}}\right]. \tag{6.20} \]

Now we have to calculate the derivatives of this term to see if the conditions of the above theorem are met. It is easy to verify that the derivative of d_{i,j} with respect to u_{i,j} is given by

\[ \frac{\partial d_{i,j}}{\partial u_{i,j}}(u) = \frac{\gamma}{4}\left[\frac{(u_{i+2,j}-u_{i,j})^2}{4S_{i+1,j}^3} - \frac{1}{S_{i+1,j}} + \frac{(u_{i,j}-u_{i-2,j})^2}{4S_{i-1,j}^3} - \frac{1}{S_{i-1,j}} + \frac{(u_{i,j+2}-u_{i,j})^2}{4S_{i,j+1}^3} - \frac{1}{S_{i,j+1}} + \frac{(u_{i,j}-u_{i,j-2})^2}{4S_{i,j-1}^3} - \frac{1}{S_{i,j-1}}\right]. \tag{6.21} \]

To prove that this derivative is indeed less than or equal to zero, we look at each term separately. We need

\[ \frac{(u_{i+2,j}-u_{i,j})^2}{4S_{i+1,j}^3} - \frac{1}{S_{i+1,j}} \le 0. \tag{6.22} \]

Since S_{i+1,j} > 0, it is sufficient to show that

\[ (u_{i+2,j}-u_{i,j})^2 - 4S_{i+1,j}^2 \le 0, \tag{6.23} \]

which is easily done by using the definition of S_{i+1,j}:

\[ \begin{aligned} (u_{i+2,j}-u_{i,j})^2 - 4S_{i+1,j}^2 &= (u_{i+2,j}-u_{i,j})^2 - 4\Big[\tfrac14 (u_{i+2,j}-u_{i,j})^2 + \tfrac14 (u_{i+1,j+1}-u_{i+1,j-1})^2 + \varepsilon^2\Big] \\ &= -(u_{i+1,j+1}-u_{i+1,j-1})^2 - 4\varepsilon^2 \\ &\le 0. \end{aligned} \tag{6.24} \]

The same inequality for the other three terms in (6.21) follows similarly. Now we have to look at all derivatives of d_{i,j} with respect to u_{l,m} for (l,m) ≠ (i,j). First of all, this derivative is zero if l ∉ {i−2, i−1, i, i+1, i+2} or m ∉ {j−2, j−1, j, j+1, j+2}. The cases m = j are simple:

\[ \frac{\partial d_{i,j}}{\partial u_{i+2,j}} = \frac{\gamma}{4}\left[\frac{1}{S_{i+1,j}} - \frac{(u_{i+2,j}-u_{i,j})^2}{4S_{i+1,j}^3}\right] \ge 0, \tag{6.25} \]

\[ \frac{\partial d_{i,j}}{\partial u_{i-2,j}} = \frac{\gamma}{4}\left[\frac{1}{S_{i-1,j}} - \frac{(u_{i,j}-u_{i-2,j})^2}{4S_{i-1,j}^3}\right] \ge 0, \tag{6.26} \]

\[ \frac{\partial d_{i,j}}{\partial u_{i+1,j}} = 0, \tag{6.27} \]

\[ \frac{\partial d_{i,j}}{\partial u_{i-1,j}} = 0, \tag{6.28} \]

where the first two inequalities follow from what we have shown above. The cases l = i follow by symmetry. It is left to show that

\[ \frac{\partial d_{i,j}}{\partial u_{i+1,j+1}} + \frac{\partial d_{i,j}}{\partial u_{i+1,j-1}} + \frac{\partial d_{i,j}}{\partial u_{i-1,j+1}} + \frac{\partial d_{i,j}}{\partial u_{i-1,j-1}} \ge 0. \tag{6.29} \]

By the symmetric structure of d_{i,j} (or by a long computation) we can see that the remaining four terms cancel out,

\[ \frac{\partial d_{i,j}}{\partial u_{i+1,j+1}} + \frac{\partial d_{i,j}}{\partial u_{i+1,j-1}} + \frac{\partial d_{i,j}}{\partial u_{i-1,j+1}} + \frac{\partial d_{i,j}}{\partial u_{i-1,j-1}} = 0, \tag{6.30} \]

which altogether proves that the second necessary condition of Theorem 6.1.1 is met:

\[ \sum_{(l,m)\neq(i,j)} \frac{\partial G_{i,j}}{\partial u_{l,m}}(u) \ge 0 \quad \forall\, u \in \mathbb{R}^{N\times N}. \tag{6.31} \]


We can now apply the above theorem and conclude that the explicit method is stable in the maximum norm if

\[ \tau \max_{i,j} \sup_{v} \left|\frac{\partial G_{i,j}}{\partial u_{i,j}}(v)\right| \le 1. \tag{6.32} \]

To get an estimate for this worst case scenario we make use of the fact that our data is normalized such that each pixel value lies in the interval [0, 1]. The biggest difference between two pixel values is therefore one: (u_{i,j} − u_{l,m})² ≤ 1. Looking for the supremum of all derivatives, we have to look at terms of the form

\[ \frac{(u_{i+2,j}-u_{i,j})^2}{4S_{i+1,j}^3} - \frac{1}{S_{i+1,j}}. \tag{6.33} \]

We have shown above (Inequality (6.22)) that this expression is less than or equal to zero. The first term is non-negative and the second term negative. The biggest absolute value is therefore achieved when the first term is zero and the second term is maximal in absolute value. The smallest value one of the square roots can attain is reached when the pixel differences are zero, i.e. S_{i,j} ≥ ε. This allows us to state

\[ \left|\frac{(u_{i+2,j}-u_{i,j})^2}{4S_{i+1,j}^3} - \frac{1}{S_{i+1,j}}\right| \le \left|\frac{1}{S_{i+1,j}}\right| \le \frac{1}{\varepsilon}. \tag{6.34} \]

Using this inequality we get

\[ \begin{aligned} \left|\frac{\partial G_{i,j}}{\partial u_{i,j}}(u)\right| = \bigg|\! -2\nu - 2\mu \Big(\sum_{p=1,\,p\neq q}^{N} (\uparrow M_p)^2\Big)_{i,j} + \frac{\gamma}{4}\bigg[&\frac{(u_{i+2,j}-u_{i,j})^2}{4S_{i+1,j}^3} - \frac{1}{S_{i+1,j}} + \frac{(u_{i,j}-u_{i-2,j})^2}{4S_{i-1,j}^3} - \frac{1}{S_{i-1,j}} \\ +\ &\frac{(u_{i,j+2}-u_{i,j})^2}{4S_{i,j+1}^3} - \frac{1}{S_{i,j+1}} + \frac{(u_{i,j}-u_{i,j-2})^2}{4S_{i,j-1}^3} - \frac{1}{S_{i,j-1}}\bigg]\bigg| \\ \le\ 2\nu + 2\mu \Big(\sum_{p=1,\,p\neq q}^{N} (\uparrow M_p)^2\Big)_{i,j} &+ \frac{\gamma}{\varepsilon} \end{aligned} \tag{6.35} \]

and, using that the largest possible value in the multispectral image is also 1, we can finally derive a time step restriction of

\[ \tau \le \frac{1}{2\nu + 2\mu N + \dfrac{\gamma}{\varepsilon}}. \tag{6.36} \]

The factor that mainly restricts the time step is therefore the additional regularization parameter ε: the dominant term in the bound behaves like γ/ε. The smaller ε gets, the closer the additionally regularized TV term is to the original TV term, but also the smaller our time step has to be, and the slower the minimization process becomes. This tradeoff will be examined more closely in Section 8.3.2. For high dimensional imagery (N large) the band coupling term 2μN can also become a limiting factor.
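Since all quantities in (6.36) are known before the iteration starts, the admissible step size can be computed up front. A tiny helper (the function name is ours) makes the dependence on the parameters explicit:

```python
def explicit_cfl_bound(nu, mu, n_bands, gamma, eps):
    """Upper bound (6.36) on the explicit time step:
    tau <= 1 / (2*nu + 2*mu*N + gamma/eps)."""
    return 1.0 / (2 * nu + 2 * mu * n_bands + gamma / eps)
```

For fixed ν, μ and N the bound is dominated by γ/ε once ε is small, so halving ε roughly halves the admissible time step.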

Page 59: A Variational Approach for Sharpening High Dimensional Imageswims.math.ecnu.edu.cn/ChunLi/973_Seminar_20120320/used... · 2012-03-20 · 2 Abstract Earth observing satellites usually

6.1. Gradient Descent Methods 59

6.1.2 ADI Method

To obtain faster convergence we would like to improve our method by making it implicit and therefore choosing a different time discretization. We start with the gradient descent equation (6.2),

\[ -\frac{d}{dt} u_q = 2\mu \sum_{j=1,\,j\neq q}^{N} (u_q \cdot \uparrow M_j - u_j \cdot \uparrow M_q)\uparrow M_j + 2\nu(u_q - Z_q) + \eta\,\mathrm{div}(\theta) - \gamma\,\mathrm{div}\!\left(\frac{\nabla u_q}{|\nabla u_q|_\varepsilon}\right), \tag{6.37} \]

but this time we apply an alternating direction implicit (ADI) method similar to the ideas of Douglas, Peaceman and Rachford in [PHHR55, JP55]. In time, we discretize the terms which are linear in u implicitly. It is more difficult to use an implicit discretization for the total variation term: the denominator should not be treated implicitly, because that would turn the equation we have to solve for u at the new time step into a non-linear equation. Hence, we lag the denominator by one time step. The numerator is divided into two parts: we discretize the derivatives in the numerator of the TV term implicitly in the x-direction and explicitly in the y-direction for half a time step, and vice versa for another half time step.

This way we obtain, for the implicit x-discretization, a linear system for each row of the image, and for the implicit y-discretization one for each column, both with tridiagonal matrices.

1. Half a time step implicit in x- and explicit in y-direction:

\[ u_q^{t+1} = u_q^{t} + \frac{\delta t}{2}\Big[ D_x(C^t\cdot D_x u_q^{t+1}) + D_y(C^t\cdot D_y u_q^{t}) - 2\mu \sum_{j=1,\,j\neq q}^{N} (u_q^{t+1}\cdot \uparrow M_j - u_j^{t}\cdot \uparrow M_q)\uparrow M_j - 2\nu (u_q^{t+1} - \uparrow Z_q) - \eta\,\mathrm{div}(\theta) \Big]. \tag{6.38} \]

2. Half a time step implicit in y- and explicit in x-direction:

\[ u_q^{t+1} = u_q^{t} + \frac{\delta t}{2}\Big[ D_x(C^t\cdot D_x u_q^{t}) + D_y(C^t\cdot D_y u_q^{t+1}) - 2\mu \sum_{j=1,\,j\neq q}^{N} (u_q^{t+1}\cdot \uparrow M_j - u_j^{t}\cdot \uparrow M_q)\uparrow M_j - 2\nu (u_q^{t+1} - \uparrow Z_q) - \eta\,\mathrm{div}(\theta) \Big]. \tag{6.39} \]

Here we use the notation C^t = γ/|∇u_q^t|_ε. We bring all terms that include u_q^{t+1} to the left-hand side and all others to the right. Since the steps in the two directions are completely analogous, we show the following formulas for the x-direction only:

\[ \Big(1 + \delta t\,\nu + \delta t\,\mu \sum_{j=1,\,j\neq q}^{N} (\uparrow M_j)^2\Big) u_q^{t+1} - \frac{\delta t}{2} D_x(C^t\cdot D_x u_q^{t+1}) = u_q^{t} + \frac{\delta t}{2} D_y(C^t\cdot D_y u_q^{t}) + \delta t\,\mu \sum_{j=1,\,j\neq q}^{N} u_j^{t}\cdot \uparrow M_q \uparrow M_j + \delta t\,\nu \uparrow Z_q - \frac{\delta t}{2}\eta\,\mathrm{div}(\theta). \tag{6.40} \]


Using backward differences for the first D_x and forward differences for the second D_x, we obtain the discretization

\[ \big[D_x(C^t\cdot D_x u_q^{t+1})\big]_{i,j} = C^t_{i,j}\, u^{t+1}_{q;\,i,j+1} - (C^t_{i,j} + C^t_{i,j-1})\, u^{t+1}_{q;\,i,j} + C^t_{i,j-1}\, u^{t+1}_{q;\,i,j-1} \tag{6.41} \]

at every pixel. This way Equation (6.40) becomes a linear system for each row of our image. For an n×m image we define the tridiagonal matrix A_i by

\[ A_i = \begin{pmatrix} b_{i1} & c_{i1} & 0 & \cdots & & 0 \\ a_{i2} & b_{i2} & c_{i2} & 0 & \cdots & 0 \\ & \ddots & \ddots & \ddots & & \\ \vdots & & a_{i,m-1} & b_{i,m-1} & c_{i,m-1} \\ 0 & \cdots & 0 & & a_{im} & b_{im} \end{pmatrix}, \tag{6.42} \]

with

\[ \begin{aligned} a_{i1} &= 0, & a_{ij} &= -\frac{\delta t}{2}\, C^t(i, j-1), \\ c_{im} &= 0, & c_{ij} &= -\frac{\delta t}{2}\, C^t(i, j), \\ b_{ij} &= 1 + \delta t\,\nu + \delta t\,\mu \sum_{k=1,\,k\neq q}^{N} (M_k(i,j))^2 - c_{ij} - a_{ij}. \end{aligned} \tag{6.43} \]

Since C^t(i, j) and all weight parameters are positive, the above matrix is diagonally dominant for every i, which allows us to solve each linear system very efficiently with the Thomas algorithm [Cd72] (also called the tridiagonal matrix algorithm, TDMA) in O(m·n) steps. Given a linear system Au = f with a tridiagonal, diagonally dominant matrix A and the same notation for A as above (lower secondary diagonal a⃗, main diagonal b⃗, upper secondary diagonal c⃗), the Thomas algorithm reads

\[ \begin{aligned} \bar c_1 &= \frac{c_1}{b_1}, & \bar f_1 &= \frac{f_1}{b_1}, \\ \bar c_k &= \frac{c_k}{b_k - \bar c_{k-1} a_k}, & \bar f_k &= \frac{f_k - \bar f_{k-1} a_k}{b_k - \bar c_{k-1} a_k}, \\ u_n &= \bar f_n, & u_k &= \bar f_k - \bar c_k\, u_{k+1}. \end{aligned} \tag{6.44} \]

The fast solution with this algorithm is possible because the rows and columns of the image are not coupled, which is a consequence of treating only one direction implicitly. The procedure for the step in the y-direction is analogous; there we solve a linear system for each column of the image. Notice that for both steps we assumed Neumann boundary conditions for our image.
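The recursion (6.44) translates almost verbatim into code. The sketch below is our own (the function name and the convention that a[0] and c[-1] are unused are assumptions); it solves one tridiagonal system Au = f in O(n) operations:

```python
import numpy as np

def thomas(a, b, c, f):
    """Solve a tridiagonal system A u = f with the Thomas algorithm (6.44).
    a: lower diagonal (a[0] unused), b: main diagonal,
    c: upper diagonal (c[-1] unused). Assumes A is diagonally dominant."""
    n = len(b)
    cp = np.empty(n)
    fp = np.empty(n)
    cp[0] = c[0] / b[0]
    fp[0] = f[0] / b[0]
    # forward sweep: eliminate the lower diagonal
    for k in range(1, n):
        denom = b[k] - cp[k - 1] * a[k]
        cp[k] = c[k] / denom if k < n - 1 else 0.0
        fp[k] = (f[k] - fp[k - 1] * a[k]) / denom
    # backward substitution
    u = np.empty(n)
    u[-1] = fp[-1]
    for k in range(n - 2, -1, -1):
        u[k] = fp[k] - cp[k] * u[k + 1]
    return u
```

In the ADI setting, one such solve is performed per image row (x-step) or per image column (y-step), with the coefficients assembled from (6.43).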

The numerical cost of one time step of the ADI scheme is higher than for the explicit method, but the time step for the ADI method can be chosen larger and the convergence properties are much better. In Section 8 we show that the ADI method is much faster than the explicit time stepping scheme for both the original and the alternate energy.

For comparison purposes, a CFL bound similar to (6.36) would be desirable, and we can obtain one in a similar fashion as before. As explained above, the equation we solve in every time step is

\[ \Big(1 + \delta t\,\nu + \delta t\,\mu \sum_{j=1,\,j\neq q}^{N} (\uparrow M_j)^2\Big) u_q^{t+1} - \frac{\delta t}{2} D_x(C^t\cdot D_x u_q^{t+1}) = u_q^{t} + \frac{\delta t}{2} D_y(C^t\cdot D_y u_q^{t}) + \delta t\,\mu \sum_{j=1,\,j\neq q}^{N} u_j^{t}\cdot \uparrow M_q \uparrow M_j + \delta t\,\nu \uparrow Z_q - \frac{\delta t}{2}\eta\,\mathrm{div}(\theta). \tag{6.45} \]


If we define

\[ B = \delta t\,\nu - \frac{\delta t}{2} D_x(C^t\cdot D_x) + \delta t\,\mu \sum_{j=1,\,j\neq q}^{N} (\uparrow M_j)^2, \tag{6.46} \]

\[ G = D_y(C^t\cdot D_y u_q^{t}) + 2\mu \sum_{j=1,\,j\neq q}^{N} u_j^{t}\cdot \uparrow M_q \uparrow M_j + 2\nu \uparrow Z_q - \eta\,\mathrm{div}(\theta), \tag{6.47} \]

we can rewrite the above equation as

\[ (1 + B)\, u_q^{t+1} = u_q^{t} + \frac{\delta t}{2}\, G(u_q^{t}), \tag{6.48} \]

which has the same form as in the explicit case, except for the additional matrix (1 + B) on the left-hand side.

To adapt Theorem 6.1.1 to this case, let us summarize some results from [Bur06].

Theorem 6.1.2. Let A ∈ ℝ^{n×n} be such that A_{i,j} ≤ 0 for all i ≠ j and

\[ 0 \neq A_{i,i} \ge -\sum_{j\neq i} A_{i,j}. \tag{6.49} \]

Then A^{−1} exists. Furthermore, let b, u ∈ ℝ^n with b ≤ 0 and u such that Au = b; then u ≤ 0. A matrix A with these properties is called a monotone matrix (M-matrix).

The name M-matrix comes from the fact that the inverse of an M-matrix preserves inequalities: f ≥ g ⇒ A^{−1}f ≥ A^{−1}g.

Lemma 6.1.3. The matrix \( 1 + B = 1 + \delta t\,\nu - \frac{\delta t}{2} D_x(C^t\cdot D_x) + \delta t\,\mu \sum_{j=1,\,j\neq q}^{N} (\uparrow M_j)^2 \) is an M-matrix.

Proof. The diagonal contributions 1 + δtν and δtμ Σ_{j=1,j≠q}^N (↑M_j)² are greater than or equal to zero, so we only have to examine −(δt/2)D_x(C^t·D_x). Notice that the differential operator D_x is of course meant in a discrete sense here. By Equation (6.41) we have

\[ -\frac{\delta t}{2}\big[D_x(C^t\cdot D_x u_q^{t+1})\big]_{i,j} = \frac{\delta t}{2}\Big( -C^t_{i,j}\, u^{t+1}_{q;\,i,j+1} + (C^t_{i,j} + C^t_{i,j-1})\, u^{t+1}_{q;\,i,j} - C^t_{i,j-1}\, u^{t+1}_{q;\,i,j-1} \Big). \tag{6.50} \]

For γ > 0 we have C^t = γ/|∇u_q^t|_ε > 0, such that the secondary diagonals of 1 + B are non-positive, the main diagonal is positive, and the desired inequality (6.49) is satisfied. This shows that 1 + B is indeed an M-matrix.

Therefore, we can use Theorem 6.1.1 with a slight modification in the proof. Analogously to the previous proof we show the monotonicity under the exact same assumptions via

\[ u^k_{i,j} \le v^k_{i,j} \;\Rightarrow\; (1+B)\big(u^{k+1} - v^{k+1}\big) \le 0 \;\overset{(1+B)\ \text{M-matrix}}{\Longrightarrow}\; u^{k+1}_{i,j} \le v^{k+1}_{i,j}, \tag{6.51} \]

and get the same stability condition \( \tau \max_{i,j} \sup_{v} \left|\frac{\partial G_{i,j}}{\partial u_{i,j}}(v)\right| \le 1 \).

The function G for the ADI method basically consists of constant terms and the TV part in the y-direction. Heuristically speaking, we have only half as many derivative terms from the TV part as for the fully explicit method, and the u_q-dependent parts of the matching term and the correlation preserving term are now on the left-hand side and not included in G. A calculation similar to the explicit case then leads to the time step restriction

\[ \delta t \le \frac{4\varepsilon}{\gamma}. \tag{6.52} \]

The time step restriction still depends linearly on the additional regularization parameter ε, but nevertheless the condition has improved. We got rid of the factors with ν and μ and gained a factor of four in the term with ε (where we have to admit that it is effectively only a factor of two, since the other factor of two comes from taking only half a time step implicitly in the x-direction). Even though the asymptotics for small ε are similar to those of the explicit case, the factor of two does improve the speed, since we might need only half as many iterations. Also, the bound no longer depends on the other parameters, and in particular it is independent of the total number of bands, which might be an advantage for hyperspectral images.

Nevertheless, the ADI method still has some undesirable drawbacks. The time step is still restricted and depends on the additional regularization parameter ε of the total variation. The smaller ε is, the closer we get to a non-differentiable problem and the method becomes inefficient. The bigger ε is, the better the convergence will be, both because of the possibility to choose a larger time step and because of the additional smoothness. The problem with choosing a large ε is that the difference between the energy we originally wanted to minimize and the one we actually minimize becomes large, which results in blurry edges. We will examine this problem in more detail in Chapter 8.

To avoid the additional regularization and obtain a more efficient method, we use the Split Bregman method, which was recently proposed by Goldstein and Osher in [GO08].

6.2 Split Bregman Method

Before we start describing the actual method to minimize our energy, let us summarize some of the results of Osher, Burger, Goldfarb, Xu and Yin on regularization using Bregman distances ([OBG+05]).

6.2.1 Iterative Regularization Using Bregman Distances

In [OBG+05] the authors proposed a new method for image denoising that iterates between calculating the normals of the TV-denoised image and aligning the normals of the current image with these denoised normals. This idea led to a general iterative regularization using Bregman distances. For a smooth and strictly convex functional E(u) the Bregman distance is defined as

\[ D_E^p(u, v) = E(u) - E(v) - \langle p, u - v\rangle, \tag{6.53} \]

where p is the element of the subdifferential of E at v; because of the smoothness this element of the subdifferential is unique. D_E^p(u, v) is not a distance in the usual sense, since it is generally not symmetric and does not satisfy the triangle inequality. Still, it is a measure of similarity between u and v, since D_E^p(u, v) ≥ 0 and D_E^p(u, v) = 0 for u = v. For a non-differentiable E it is not clear how to use the above definition for arbitrary u and v, because the subdifferential might contain more than one element. One could instead define a multi-valued version of the Bregman distance; in [OBG+05] a unique subgradient is selected by the numerical minimization algorithm.

Assume we have two convex energy functionals E and H. The problem to solve is

\[ \min_u E(u) + \lambda H(u). \tag{6.54} \]

It was suggested in [OBG+05] to solve iteratively

\[ u^{k+1} = \operatorname*{argmin}_u D_E^{p^k}(u, u^k) + \lambda H(u) = \operatorname*{argmin}_u E(u) - \langle p^k, u - u^k\rangle + \lambda H(u). \tag{6.55} \]

Assuming H is differentiable, the subgradient at the new iterate can be updated by

\[ p^{k+1} = p^k - \lambda \nabla H(u^{k+1}), \tag{6.56} \]

which follows directly from the optimality condition of (6.55).

In [OBG+05] the following convergence results were shown for images in BV (here J denotes the regularizing functional E).

Theorem 6.2.1. Assume that there exists a minimizer ū ∈ BV(Ω) of H such that J(ū) < ∞. Then:

1. H is monotonically decreasing: H(u^{k+1}) ≤ H(u^k),

2. the iteration converges to a minimizer of H: H(u^k) ≤ H(ū) + J(ū)/k,

3. the regularization functional of the iterates stays bounded: J(u^k) ≤ 5J(ū).

This theorem shows that Bregman iteration is especially well suited for constrained optimization problems, since it converges to a minimizer of H without having to take λ → ∞ as in other minimization methods. Following [GO08], we will now show how the above Bregman iteration can be used to create a fast and exact minimization algorithm for our energy by splitting the total variation term from the rest of the energy functional.
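To make the iteration (6.55)-(6.56) concrete, consider a deliberately simple one-dimensional example (our own, not from the thesis): E(u) = u²/2 and H(u) = (u − f)²/2, for which every subproblem has a closed-form minimizer. As Theorem 6.2.1 predicts, H(u^k) decreases monotonically and u^k approaches the minimizer f of H for any fixed λ:

```python
def bregman_iteration(f, lam, n_iter):
    """Bregman iteration for min E(u) + lam*H(u) with E(u) = u**2/2
    and H(u) = (u - f)**2/2. Each step minimizes
    E(u) - p_k*u + lam*H(u), whose optimality condition gives
    u = (p_k + lam*f)/(1 + lam); the subgradient is then updated by
    p_{k+1} = p_k - lam*H'(u_{k+1})."""
    u, p = 0.0, 0.0
    history = []
    for _ in range(n_iter):
        u = (p + lam * f) / (1.0 + lam)
        p = p - lam * (u - f)   # H'(u) = u - f
        history.append(u)
    return history
```

With f = 1 and λ = 1 the iterates are u^k = 1 − 2^{−k}, i.e. geometric convergence towards the constraint-exact solution without ever sending λ → ∞.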


6.2.2 Derivation of the Split Bregman Method for Minimizing the VWP Energy

The starting point for the Split Bregman method has to lie before the Euler-Lagrange equation, since that is the point at which we needed the additional regularization. We look at our original minimization problem for the nth band:

\[ E(u_n) = \nu \int_\Omega (u_n - Z_n)^2\, dx + \mu \sum_{j=1,\,j\neq n}^{N} \int_\Omega (u_n\cdot \uparrow M_j - u_j\cdot \uparrow M_n)^2\, dx + \Big[\gamma \int_\Omega |\nabla u_n|\, dx + \eta \int_\Omega \mathrm{div}(\theta)\cdot u_n\, dx\Big]. \tag{6.57} \]

The key idea is to split the total variation term from the other terms, such that the non-differentiable L¹ part becomes decoupled from the rest. To shorten the equations in the derivation of the Split Bregman formulation, we denote the energy functional without the total variation term by R(u_n):

\[ R(u_n) = \nu \int_\Omega (u_n - Z_n)^2\, dx + \mu \sum_{j=1,\,j\neq n}^{N} \int_\Omega (u_n\cdot \uparrow M_j - u_j\cdot \uparrow M_n)^2\, dx + \eta \int_\Omega \mathrm{div}(\theta)\cdot u_n\, dx. \tag{6.58} \]

Instead of minimizing (6.57), we introduce a new variable d_n = ∇u_n and look at the constrained optimization problem

\[ \min_{u_n, d_n} E(u_n, d_n) = \min_{u_n, d_n} \gamma |d_n|_{L^1} + R(u_n) \quad\text{such that}\quad d_n = \nabla u_n, \tag{6.59} \]

which of course is equivalent to minimizing (6.57). Now we have a constrained optimization problem, which can be solved with Bregman iteration as described in the previous section. We convert (6.59) into an unconstrained problem by adding a fidelity term that enforces d to be close to ∇u:

\[ \min_{u_n, d_n} \gamma |d_n|_{L^1} + R(u_n) + \frac{\lambda}{2}\|d_n - \nabla u_n\|_2^2 = \min_{u_n, d_n} E(u_n, d_n) + H(u_n, d_n) \tag{6.60} \]

with H(u_n, d_n) = (λ/2)‖d_n − ∇u_n‖₂². Now we are exactly at the point where we can use Bregman iteration to solve the above equation. Theorem 6.2.1 then tells us that for any choice of λ the iteration will converge to a minimizer of H, i.e. d_n = ∇u_n. The problem now depends on the two variables u and d, so we also need the Bregman formulation for both variables,

\[ \begin{aligned} (u_n^{k+1}, d_n^{k+1}) &= \operatorname*{argmin}_{u_n, d_n}\; D_E^{p}\big((u_n, d_n), (u_n^k, d_n^k)\big) + \frac{\lambda}{2}\|d_n - \nabla u_n\|_2^2 \\ &= \operatorname*{argmin}_{u_n, d_n}\; E(u_n, d_n) - \langle p_u^k, u_n - u_n^k\rangle - \langle p_d^k, d_n - d_n^k\rangle + \frac{\lambda}{2}\|d_n - \nabla u_n\|_2^2, \\ p_u^{k+1} &= p_u^k - \lambda \nabla^T(\nabla u_n^{k+1} - d_n^{k+1}), \\ p_d^{k+1} &= p_d^k - \lambda (d_n^{k+1} - \nabla u_n^{k+1}). \end{aligned} \tag{6.61} \]

It is shown in [OBG+05] that this rather complicated formulation can be carried out in a simple two-step procedure without having to use the subgradients p_u and p_d. The scheme (6.61) is equivalent to

\[ \begin{aligned} (u_n^{k+1}, d_n^{k+1}) &= \operatorname*{argmin}_{u_n, d_n}\; \gamma\Big|\sqrt{d_x^2 + d_y^2}\Big|_{L^1} + R(u_n) + \frac{\lambda}{2}\|d_n - \nabla u_n - b_n^k\|_2^2, \\ b_n^{k+1} &= b_n^k + (\nabla u_n^{k+1} - d_n^{k+1}). \end{aligned} \tag{6.62} \]

In [OBG+05] Bregman iteration was used for TV denoising; the above two-step formulation is the equivalent of the "adding back the noise to the signal" algorithm.


The minimization over the two variables u_n^{k+1} and d_n^{k+1} is done by alternating minimization. To optimize with respect to u for a given d we look at the optimality condition, i.e. the first variation of the energy with respect to u has to be zero. Since we split the non-differentiable L¹ part from the rest of the energy, we can compute the minimizer u_n^{k+1} for a given d_n^k and get

\[ 0 = 2\nu(u_n - Z_n) + \eta\,\mathrm{div}(\theta) + 2\mu \sum_{j=1,\,j\neq n}^{N} (u_n\cdot \uparrow M_j - u_j\cdot \uparrow M_n)\uparrow M_j + \lambda\,\mathrm{div}(d_n^k - b_n^k - \nabla u_n), \tag{6.63} \]

which can be rewritten as

\[ \Big(2\nu + 2\mu \sum_{j=1,\,j\neq n}^{N} (\uparrow M_j)^2 - \lambda\Delta\Big)\, u_n^{k+1} = 2\nu Z_n - \eta\,\mathrm{div}(\theta) + 2\mu \uparrow M_n \Big(\sum_{j=1,\,j\neq n}^{N} u_j^{k*} \uparrow M_j\Big) - \lambda\,\mathrm{div}(d_n^k - b_n^k). \tag{6.64} \]

Notice that we decoupled the bands by using u_j^{k*} = u_j^{k+1} for j < n and u_j^{k*} = u_j^{k} for j > n on the right-hand side of the equation. After discretizing the above PDE we have to solve a linear system, which is done by a standard Gauss-Seidel algorithm as follows. The Laplace operator is discretized by

\[ \Delta u_n^{k+1} \approx u_n^{k+1}(i, j-1) + u_n^{k+1}(i, j+1) + u_n^{k+1}(i-1, j) + u_n^{k+1}(i+1, j) - 4\, u_n^{k+1}(i, j). \tag{6.65} \]

Then we iterate Gauss-Seidel by

for i = 2 : I − 1, for j = 2 : J − 1:

\[ u_n^{k+1}(i,j) = \frac{A(i,j) + \lambda\big(u_n^{k+1}(i,j-1) + u_n^{k+1}(i,j+1) + u_n^{k+1}(i-1,j) + u_n^{k+1}(i+1,j)\big)}{D(i,j)} \tag{6.66} \]

with

\[ A = 2\nu Z_n - \eta\,\mathrm{div}(\theta) + 2\mu \uparrow M_n \Big(\sum_{j=1,\,j\neq n}^{N} u_j^{k*} \cdot \uparrow M_j\Big) - \lambda\,\mathrm{div}(d_n^k - b_n^k) \quad\text{and} \tag{6.67} \]

\[ D = 2\nu + 2\mu \sum_{j=1,\,j\neq n}^{N} (\uparrow M_j)^2 + 4\lambda. \tag{6.68} \]

For boundary pixels, i ∈ {1, I} or j ∈ {1, J}, the equations need to be adapted to include the Neumann boundary conditions.
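One sweep of the update (6.66) can be sketched as follows. The arrays A and D are assumed to be precomputed according to (6.67) and (6.68), and the sweep below only visits interior pixels (the Neumann boundary adaptation is omitted for brevity). This is an illustration of the sweep structure, not the thesis implementation:

```python
import numpy as np

def gauss_seidel_sweep(u, A, D, lam):
    """One in-place Gauss-Seidel sweep for the linear system (6.64).
    u: current band (modified in place), A: constant right-hand side
    per (6.67), D: diagonal coefficient per (6.68), lam: penalty weight."""
    I, J = u.shape
    for i in range(1, I - 1):
        for j in range(1, J - 1):
            # already-updated neighbors are used immediately (Gauss-Seidel)
            u[i, j] = (A[i, j] + lam * (u[i, j - 1] + u[i, j + 1]
                                        + u[i - 1, j] + u[i + 1, j])) / D[i, j]
    return u
```

A quick consistency check: if u is constant and A is chosen so that this constant solves the linear system, the sweep leaves u unchanged.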

As explored in [GO08], it is most efficient to use just one Gauss-Seidel iteration and also just one optimization step for d; more iterations might be needed for high-precision applications. The optimization we have to solve for d is the following (for the sake of simplicity we omit the band index n):

\[ d_{\mathrm{opt}} = \operatorname*{argmin}_{d}\; \gamma\Big|\sqrt{d_x^2 + d_y^2}\Big|_{L^1} + \frac{\lambda}{2}\|d - \nabla u - b^k\|_2^2. \tag{6.69} \]

The solution to this optimization problem can be given directly by the generalized shrinkage formula.

Lemma 6.2.2. The solution to the optimization problem (6.69) is given by

\[ d_x^{k+1} = \max\Big(s^k - \frac{\gamma}{\lambda},\, 0\Big)\, \frac{\nabla_x u^{k+1} + b_x^k}{s^k}, \tag{6.70} \]

\[ d_y^{k+1} = \max\Big(s^k - \frac{\gamma}{\lambda},\, 0\Big)\, \frac{\nabla_y u^{k+1} + b_y^k}{s^k}, \tag{6.71} \]

with \( s^k = \sqrt{|\nabla_x u^{k+1} + b_x^k|^2 + |\nabla_y u^{k+1} + b_y^k|^2} \).


Proof. The minimization (6.69) decouples across pixels, and after discretization the expression becomes, for each pixel i,

\[ d^i_{\mathrm{opt}} = \operatorname*{argmin}_{d^i}\, \psi(d^i) = \operatorname*{argmin}_{d^i}\; \gamma\sqrt{(d_x^i)^2 + (d_y^i)^2} + \frac{\lambda}{2}\Big[(d_x^i - c_x^i)^2 + (d_y^i - c_y^i)^2\Big] \tag{6.72} \]

with c = ∇u + b^k. We distinguish between two cases:

1. \( |\vec c^i| = \sqrt{(c_x^i)^2 + (c_y^i)^2} \) is smaller than or equal to γ/λ. Then

\[ \begin{aligned} \psi(d^i) &= \gamma\sqrt{(d_x^i)^2 + (d_y^i)^2} + \frac{\lambda}{2}\Big[(d_x^i)^2 - 2d_x^i c_x^i + (c_x^i)^2 + (d_y^i)^2 - 2d_y^i c_y^i + (c_y^i)^2\Big] \\ &= \gamma|\vec d^i| + \frac{\lambda}{2}\Big[|\vec d^i|^2 + |\vec c^i|^2 - 2\langle \vec d^i, \vec c^i\rangle\Big] \\ &\overset{\text{Cauchy-Schwarz}}{\ge} \frac{\lambda}{2}|\vec d^i|^2 + \frac{\lambda}{2}|\vec c^i|^2 + \gamma|\vec d^i| - \lambda|\vec d^i|\,|\vec c^i| \\ &\ge \frac{\lambda}{2}|\vec c^i|^2 + \underbrace{(\gamma - \lambda|\vec c^i|)}_{\ge 0}\,|\vec d^i| \\ &\ge \frac{\lambda}{2}|\vec c^i|^2 = \psi(0), \end{aligned} \tag{6.73} \]

which shows that in this case zero is a minimizer.

2. \( |\vec c^i| = \sqrt{(c_x^i)^2 + (c_y^i)^2} \) is greater than γ/λ.

We look at the derivative of ψ(d^i) for d^i ≠ 0 and set it to zero to find a minimizer. This gives us

\[ D_x\psi(d^i) = \gamma\frac{d_x^i}{|\vec d^i|} + \lambda(d_x^i - c_x^i) \overset{!}{=} 0 \quad\text{and} \tag{6.74} \]

\[ D_y\psi(d^i) = \gamma\frac{d_y^i}{|\vec d^i|} + \lambda(d_y^i - c_y^i) \overset{!}{=} 0, \tag{6.75} \]

which leads to

\[ d_x^i = c_x^i - \frac{\gamma}{\lambda}\frac{d_x^i}{|\vec d^i|} \quad\text{and} \tag{6.76} \]

\[ d_y^i = c_y^i - \frac{\gamma}{\lambda}\frac{d_y^i}{|\vec d^i|}. \tag{6.77} \]

If we look at the minimization problem

\[ \operatorname*{argmin}_{d^i}\; \gamma\sqrt{(d_x^i)^2 + (d_y^i)^2} + \frac{\lambda}{2}\Big[(d_x^i - c_x^i)^2 + (d_y^i - c_y^i)^2\Big], \tag{6.78} \]

we can see that any minimizer will certainly fulfill sign(d_x^i) = sign(c_x^i) and sign(d_y^i) = sign(c_y^i). We can therefore replace \( d_x^i/|\vec d^i| = c_x^i/|\vec c^i| \) and \( d_y^i/|\vec d^i| = c_y^i/|\vec c^i| \) in the above equation and get that

\[ d_x^i = c_x^i - \frac{\gamma}{\lambda}\frac{c_x^i}{|\vec c^i|} \quad\text{and} \tag{6.79} \]

\[ d_y^i = c_y^i - \frac{\gamma}{\lambda}\frac{c_y^i}{|\vec c^i|} \tag{6.80} \]

meet the necessary condition ∇ψ(d^i) = 0, and since our energy is strictly convex this is the unique global minimizer. Notice that the above equation is only valid for d^i ≠ 0, which is guaranteed by the fact that we are looking at the case \( |\vec c^i| > \gamma/\lambda \). This means \( d_x^i = \big(1 - \tfrac{\gamma}{\lambda}\tfrac{1}{|\vec c^i|}\big) c_x^i \) with a positive prefactor, which yields d_x^i ≠ 0 if c_x^i ≠ 0, and similarly d_y^i ≠ 0 if c_y^i ≠ 0. Since \( |\vec c^i| > \gamma/\lambda > 0 \), at least one of the components of c must be non-zero.


Putting the two cases together gives the desired formula.
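In code, the shrinkage formulas (6.70)-(6.71) reduce to a few vectorized lines. A minimal sketch (array names are our own; the small floor on s^k only guards the division where the gradient and b vanish, in which case the max(·, 0) factor is zero anyway):

```python
import numpy as np

def shrink(grad_x, grad_y, bx, by, gamma, lam):
    """Generalized shrinkage (6.70)-(6.71): soft-thresholding of the
    vector field c = grad u + b with threshold gamma/lam."""
    cx, cy = grad_x + bx, grad_y + by
    s = np.sqrt(cx ** 2 + cy ** 2)
    factor = np.maximum(s - gamma / lam, 0.0) / np.maximum(s, 1e-12)
    return factor * cx, factor * cy
```

Below the threshold the output is exactly zero; above it the vector keeps its direction and its magnitude shrinks by γ/λ.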

Our final algorithm can then be written as follows. While the stopping condition is not met, for all bands n:

1. calculate u_n^{k+1} by one step of Gauss-Seidel for Equation (6.64),

2. calculate d_n^{k+1} by the shrinkage formulas (6.70)-(6.71),

3. update b^{k+1} = b^k + (∇u^{k+1} − d^{k+1}).

The choice of the stopping condition is discussed in the next section. The comparison of the three algorithms we presented will be given in Section 8.3.
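As a self-contained illustration of this loop structure, the sketch below runs the three steps on a one-dimensional TV-denoising toy problem, min_u γ|Du|₁ + ν‖u − f‖₂², with an exact dense solve standing in for the Gauss-Seidel step. Everything here (the names, the 1D setting, the dense solve) is our own simplification, not the VWP implementation:

```python
import numpy as np

def split_bregman_tv_1d(f, gamma, nu, lam=1.0, n_iter=300):
    """Split Bregman loop for the 1D toy problem
        min_u  gamma * |D u|_1 + nu * ||u - f||_2^2,   d = D u,
    with forward differences D. Each sweep mirrors the three steps of
    the algorithm above: solve for u, shrink for d, add back the
    constraint residual to b."""
    n = len(f)
    D = np.diff(np.eye(n), axis=0)             # forward-difference matrix
    A = 2 * nu * np.eye(n) + lam * D.T @ D     # system matrix of the u-step
    d = np.zeros(n - 1)
    b = np.zeros(n - 1)
    u = f.copy()
    for _ in range(n_iter):
        u = np.linalg.solve(A, 2 * nu * f + lam * D.T @ (d - b))
        c = D @ u + b
        d = np.maximum(np.abs(c) - gamma / lam, 0.0) * np.sign(c)  # shrinkage
        b = b + D @ u - d
    return u
```

The u-step preserves the mean of f exactly (the columns of Dᵀ sum to zero), and a large γ flattens the signal towards a constant, which is the expected TV behavior.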

In addition to the results in [GO08], we want to give the convergence speed of this method. Our observation holds not only for the application of Split Bregman to anisotropic TV regularization, but for Split Bregman methods in general, which is why we examine problems of the form

    \min_u |\phi(u)|_{L^1} + H(u, f),   (6.81)

with a quadratic H. As seen above, the idea of Split Bregman is to introduce a new variable d = \phi(u) and solve the resulting constrained problem

    \min_{u,d} J(u, d) = \min_{u,d} |d|_{L^1} + H(u, f) \quad \text{such that} \quad d = \phi(u)   (6.82)

with Bregman iteration. To be able to give a convergence rate, we need the definition of a source condition.

Definition 6.2.3 (Source condition). We denote

    L(u, d, b) = J(u, d) - \langle b, d - \phi(u) \rangle.   (6.83)

We say that (\bar{u}, \bar{d}) satisfies a source condition if there exists a Lagrange multiplier \bar{b} such that

    L(\bar{u}, \bar{d}, b) \leq L(\bar{u}, \bar{d}, \bar{b}) \leq L(u, d, \bar{b}) \quad \forall u, d, b.   (6.84)

A theorem about the convergence speed of Bregman iteration, proved in [BRH07], reads as follows for our Split Bregman problem.

Theorem 6.2.4. Let (\bar{u}, \bar{d}) with \nabla\bar{u} = \bar{d} be the solution to the problem, such that (\bar{u}, \bar{d}) satisfies a source condition. Then the estimate

    D^{p_u^k, p_d^k}_J \left( (\bar{u}, \bar{d}), (u^k, d^k) \right) \leq \frac{\|\bar{b}\|}{2\lambda k} = O\left( \frac{1}{k} \right)   (6.85)

holds, where \bar{b} is the Lagrange multiplier from the source condition.

This means we immediately get a convergence rate for (u^k, d^k) towards a solution of the original minimization problem as soon as we have a source condition for the true solution.

Lemma 6.2.5. For the Split Bregman algorithm the true solution \bar{u} of the minimization problem always satisfies a source condition.

Proof. The existence of a Lagrange multiplier in our case means that for

    L(u, d, b) := |d|_{L^1} + H(u, f) - \langle b, d - \phi(u) \rangle   (6.86)

there exists a \bar{b} such that

    L(\bar{u}, \bar{d}, \bar{b}) \leq L(u, d, \bar{b}) \quad \forall u, d.   (6.87)

This means that a source condition is satisfied if and only if the following two conditions hold.


1. Optimality condition for d:

    p_d - \bar{b} = 0, \quad p_d \in \partial|\bar{d}|_{L^1}.   (6.88)

2. Optimality condition for u:

    q + \phi^*(\bar{b}) = 0, \quad q \in \partial H(\bar{u}, f).   (6.89)

Using Equation (6.88) in (6.89) we obtain

    q + \phi^*(p_d) = 0 \quad \text{with} \quad q \in \partial H(\bar{u}, f), \; p_d \in \partial|\bar{d}|_{L^1}.   (6.90)

Since (\bar{u}, \bar{d}) satisfies \phi(\bar{u}) = \bar{d} and \bar{u} was a solution to the minimization problem

    \bar{u} = \operatorname{argmin}_u |\phi(u)| + H(u, f),   (6.91)

we know that it satisfies the optimality condition

    q + p = 0,   (6.92)

for q \in \partial H(\bar{u}, f) and p \in \partial_u|\phi(\bar{u})|, which can be rewritten as p = \phi^*(p_d) for a p_d \in \partial_d|\bar{d}|_{L^1} and therefore shows that Condition (6.90) is always satisfied.

6.3 Stopping Criteria for Minimization Methods

The question that we want to discuss in this section is: When do we want to consider a minimization method to have converged?

Generally, we suggest two stopping criteria:

1. We can consider the method to have converged if the norm of the difference of two successive iterates is below a certain threshold T, i.e.

    \|u^{k+1} - u^k\| \leq T.   (6.93)

The choice of the norm and the threshold is somewhat arbitrary. We used the l^1 norm and normalized it by |\Omega|, which becomes the number of pixels in the discrete case. This way we have

    \frac{1}{I \cdot J \cdot N} \sum_{i,j,n}^{I,J,N} |u^{k+1}_{i,j,n} - u^k_{i,j,n}| \leq T.   (6.94)

As a threshold we usually use values on the order of magnitude of 10^{-4}. The choice of the l^1 norm then tells us that when the stopping criterion is met, the average change in value per pixel is below 10^{-4}.

The problem with this stopping criterion is that for the gradient descent methods it is time step dependent. Reducing the time step by a factor of ten means relaxing the stopping condition by a factor of ten. This can cause the method to stop before it has converged. Furthermore, the time step dependence makes it hard to compare with the Split Bregman method, which does not use such a time step. To overcome this problem we use a second criterion to determine the convergence of our algorithms.

2. A more universal and less time step dependent criterion is to look at the total energy at each iteration. One could consider the iteration to have reached steady state when the relative change in energy between two iterates is less than 0.05%,

    \frac{|E(u^{k+1}) - E(u^k)|}{E(u^k)} \cdot 100 \leq 0.05,   (6.95)

or, for comparison, run each method for many iterations and look at the decay of energy. The results of these numerical experiments are shown in Section 8.3. Notice that the calculation of the total energy is good for comparison purposes but also computationally expensive, so for the actual implementation we recommend criterion 1.
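Both criteria are straightforward to implement; a small NumPy sketch (illustrative, not the thesis Matlab code):

```python
import numpy as np

def residual_criterion(u_new, u_old, T=1e-4):
    # criterion 1: normalized l1 difference of two successive iterates
    return np.abs(u_new - u_old).mean() <= T

def energy_criterion(E_new, E_old, percent=0.05):
    # criterion 2: relative change in energy, in percent
    return abs(E_new - E_old) / abs(E_old) * 100.0 <= percent
```

Criterion 1 only needs the two iterates already in memory, which is why it is the cheaper choice in practice.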


6.4 Image Registration

One task that is needed before any of the pan-sharpening algorithms can be carried out was omitted in the description so far. The panchromatic image and a resized multispectral image need to be aligned in a process called registration. For the Quickbird data, the difference in resolution between the two images was precisely a factor of four. The images were taken from the same satellite, such that after selecting a small example scene from the huge satellite image only translational registration needs to be done. To do so, we upsample the multispectral image to the size of the panchromatic image (usually using bilinear interpolation), calculate the edge set of both images and use a simple registration algorithm that shifts the panchromatic image up to five pixels in each direction and calculates the L2 norm of the difference of the edge sets of both images. We choose the position with the lowest L2 norm and just take the intersection of the panchromatic and the multispectral image for the pan-sharpening process. For the calculation of the edge set we did numerical experiments with Canny, Sobel, the edge detection method we use in our energy, exp(−d|∇P|²), and (for consistency with our energy model) the L2 norm of the difference of all unit normal vectors to the level lines of both images. All methods gave similar results, differing by at most one pixel in each direction.
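The exhaustive translational search described above can be sketched as follows (illustrative NumPy code with a hypothetical function name; the interior crop avoids comparing wrapped-around pixels):

```python
import numpy as np

def register_translation(edges_pan, edges_ms, max_shift=5):
    """Search integer shifts in [-max_shift, max_shift]^2 and return the
    (dy, dx) shift of the panchromatic edge image that minimizes the L2
    norm of the difference to the multispectral edge image."""
    best, best_err = (0, 0), np.inf
    m = max_shift
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(edges_pan, dy, axis=0), dx, axis=1)
            # compare only the interior, so wrapped-around border pixels are ignored
            err = np.linalg.norm(shifted[m:-m, m:-m] - edges_ms[m:-m, m:-m])
            if err < best_err:
                best_err, best = err, (dy, dx)
    return best
```

With only (2·5+1)² = 121 candidate shifts, this brute-force search is cheap compared to the fusion itself.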


7. IMAGE QUALITY METRICS

For the multispectral data we work with, we do not have the true solution, also called ground truth. A problem we have to face during the evaluation of different methods is to judge which image is better. In this context we somehow need to define "better". A global mathematical quantity that measures image quality would be desirable, but such a quantity is not easy to define and, in particular, model dependent. The question of how to judge image quality is actually closely related to designing a variational method: if we had one quantity that measures image quality, we could define this quantity as our energy functional and minimize it. Already, this shows that there is no optimal way of judging image quality.

Generally, we distinguish between spatial and spectral quality. Visual spatial quality is determined by the amount of detail and the sharpness of edges in an image, but it is still hard to measure and to quantify. The spectral quality of an image is even harder to capture. For RGB images it could be defined as how close the colors of the sharpened image are to the colors of the original low resolution image. In the case of pan-sharpening we have to compare the colors of the small low resolution image to those of the larger high resolution sharpened image. Furthermore, the human eye seems to be much more sensitive to shapes than to color, which shows that in general spectral quality is much harder to see than spatial quality.

To still be able to quantify the quality of the fusion results we use a broad variety of eight different quality metrics which are common in the literature for these evaluation purposes. An evaluation of different pan-sharpening methods including VWP was done by [SRM08], whose Matlab source code for the quality metrics we used in the results section. In the next two subsections we briefly introduce the spatial metric as well as the seven spectral quality metrics.

7.1 Spectral Quality Metrics

7.1.1 Relative Dimensionless Global Error in Synthesis (ERGAS)

The ratio between the root mean square error and the mean of each band is the central quantity in the relative dimensionless global error in synthesis. ERGAS was used in [DYKS07] and [AWC+07] for the evaluation of pan-sharpening methods and is given by

    \mathrm{ERGAS} = 100 \, \frac{h}{l} \sqrt{ \frac{1}{N} \sum_n^N \left( \frac{\mathrm{RMSE}(u_n, M_n)}{\mu(u_n)} \right)^2 },   (7.1)

where h/l is the ratio between the pixel sizes of P and M_i and \mu(u_n) is the mean of the nth band. A low ERGAS value indicates good spectral quality of the image. Notice that the minimal value (zero) is only reached for \vec{u} = \vec{M}.

7.1.2 Spectral Angle Mapper (SAM)

The spectral angle mapper looks at the values of single pixels across all bands and views them as spectral vectors on a pixel-by-pixel basis. As a quality metric the average change in angle of those spectral vectors is computed [YGB92, DYKS07, AWC+07]. For each pixel we calculate

    \alpha_{i,j} = \arccos\left( \frac{\sum_n u_n M_n}{\sqrt{(\sum_n u_n^2)(\sum_n M_n^2)}} \right) \;\Leftrightarrow\; \alpha_{i,j} = \arccos\left( \frac{\langle \vec{u}_{i,j}, \vec{M}_{i,j} \rangle}{\|\vec{u}_{i,j}\| \, \|\vec{M}_{i,j}\|} \right)   (7.2)

and then take the average spectral angle over all pixels,

    \mathrm{SAM} = \frac{\sum_i \sum_j \alpha_{i,j}}{I \cdot J}.   (7.3)


If the two spectral vectors point in similar directions, the SAM value is small and indicates good spectral quality of the image. SAM is widely used to measure spectral fidelity and is especially popular for hyperspectral identification and classification tasks.
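A NumPy sketch of Equations (7.2)-(7.3) (illustrative; a small eps guards against division by zero for dark pixels):

```python
import numpy as np

def sam(u, M, eps=1e-12):
    """Average spectral angle in radians; u, M have shape (I, J, N)."""
    dot = (u * M).sum(axis=2)
    denom = np.linalg.norm(u, axis=2) * np.linalg.norm(M, axis=2)
    # clip guards arccos against rounding slightly outside [-1, 1]
    return np.arccos(np.clip(dot / (denom + eps), -1.0, 1.0)).mean()
```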

7.1.3 Spectral Information Divergence (SID)

In the SID quality metric ([Cha99]) each spectral vector \vec{x} is modeled as a random variable. The spectral information divergence of two vectors \vec{x} and \vec{y} with normalized vectors \vec{p} = \vec{x} / \sum_i x_i and \vec{q} = \vec{y} / \sum_i y_i is defined as

    \mathrm{SID}(\vec{x}, \vec{y}) = D(\vec{x}\|\vec{y}) + D(\vec{y}\|\vec{x}),   (7.4)

where

    D(\vec{x}\|\vec{y}) = \sum_{n=1}^N p_n \log\left( \frac{p_n}{q_n} \right)   (7.5)

is called the relative entropy or Kullback-Leibler divergence of \vec{x} with respect to \vec{y}. The average SID over all pairs of spectral responses can be used as a quality metric. Its reference value is 0.
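Equations (7.4)-(7.5) for a single pair of spectral vectors can be sketched as follows (illustrative; a small eps guards the logarithm against zero entries):

```python
import numpy as np

def sid(x, y, eps=1e-12):
    """Spectral information divergence: symmetrized KL divergence
    of the normalized spectra p and q."""
    p = x / x.sum()
    q = y / y.sum()
    d_xy = np.sum(p * np.log((p + eps) / (q + eps)))
    d_yx = np.sum(q * np.log((q + eps) / (p + eps)))
    return d_xy + d_yx
```

By construction SID is symmetric, non-negative, and zero exactly when the two normalized spectra coincide.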

7.1.4 Universal Image Quality Index (Q-average)

For two signals \vec{x} and \vec{y} the Q-average formula models distortion as a combination of three factors ([WB02]):

    Q = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \cdot \frac{2\bar{x}\bar{y}}{\bar{x}^2 + \bar{y}^2} \cdot \frac{2\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2},   (7.6)

where \sigma_x^2 is the variance, \sigma_{xy} the covariance and \bar{x} the average of a signal x. The first term is the correlation coefficient between the two signals, the second term measures the change in mean and the third the change in variance. Rewritten, this formula can be expressed as

    Q = \frac{4 \sigma_{xy} \bar{x}\bar{y}}{(\sigma_x^2 + \sigma_y^2)(\bar{x}^2 + \bar{y}^2)}.   (7.7)

Q-average takes values between −1 and 1, with 1 being the best possible value. To compare the spectral quality of multispectral images we use it slightly differently than in the original Q-average paper [WB02]. Instead of looking at the metric values bandwise, we take the average over all metric values of all spectral response vectors to measure how the relation between the bands has been distorted.
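A sketch of Equation (7.7) for two flattened signals (illustrative):

```python
import numpy as np

def q_index(x, y):
    """Universal image quality index Q = 4*cov*mx*my / ((vx+vy)*(mx^2+my^2))."""
    x = np.asarray(x, float).ravel()
    y = np.asarray(y, float).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return 4.0 * cov * mx * my / ((vx + vy) * (mx ** 2 + my ** 2))
```

Identical signals give Q = 1; any change in mean, variance, or correlation pushes the value below 1.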

7.1.5 Root Mean Squared Error (RMSE)

The root mean square error is a simple metric that directly takes the difference in pixel values into account. The formula for RMSE is given by

    \mathrm{RMSE} = \sqrt{ \frac{\sum_n \sum_i \sum_j (u_{i,j,n} - M_{i,j,n})^2}{N \cdot I \cdot J} }.   (7.8)

The smaller this value, the better the spectral quality in this metric. Notice that in the RMSE metric any change of the multispectral image decreases the spectral quality. That also means that the more we sharpen the edges, the higher RMSE becomes. In this metric, we clearly see the trade-off between spatial and spectral quality.

7.1.6 Relative Average Spectral Error (RASE)

The relative average spectral error (as used in [Cho06]) is a metric which depends monotonically on the RMSE in each band. It is given by

    \mathrm{RASE} = \frac{100}{\bar{M}} \sqrt{ \frac{1}{N} \sum_{n=1}^N \mathrm{RMSE}^2(u_n, M_n) },   (7.9)

where \bar{M} is the mean radiance of the original multispectral bands M_n.
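Equations (7.8) and (7.9) can be sketched together (illustrative; band index last):

```python
import numpy as np

def rmse(u, M):
    # root mean square error over all entries
    return np.sqrt(np.mean((u - M) ** 2))

def rase(u, M):
    """M has shape (I, J, N); the prefactor is 100 over the mean radiance of M."""
    N = M.shape[2]
    mean_sq = np.mean([rmse(u[..., n], M[..., n]) ** 2 for n in range(N)])
    return 100.0 / M.mean() * np.sqrt(mean_sq)
```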


7.1.7 Correlation Coefficient (CC)

The correlation coefficient is a very popular statistical method for measuring the similarity of two data sets. The CC ranges from minus one to one, where values close to one indicate a high similarity and values close to minus one an inverse relationship. The formula for the correlation between two bands A and B is

    \mathrm{Corr}(A, B) = \frac{\sum_i \sum_j (A_{i,j} - \bar{A})(B_{i,j} - \bar{B})}{\sqrt{\left( \sum_i \sum_j (A_{i,j} - \bar{A})^2 \right)\left( \sum_i \sum_j (B_{i,j} - \bar{B})^2 \right)}}.   (7.10)

Since we want to sharpen, and therefore change, the bands, it is obvious that a high correlation coefficient between each original multispectral band and the sharpened band is not necessarily desirable. We therefore follow the idea in [VHY04] and calculate the correlation coefficients for different combinations of bands. The correlation coefficient between bands i and j in the original multispectral image should be close to the same correlation coefficient in the sharpened image. The relation between the bands should stay the same: if the bands were similar or dissimilar before the sharpening process, they should still be similar or dissimilar afterwards. We therefore use the average change in correlation coefficient over all combinations of bands as a quality metric:

    \mathrm{CC}(u, M) = \frac{\sum_n^N \sum_{m<n}^N |\mathrm{Corr}(M_n, M_m) - \mathrm{Corr}(u_n, u_m)|}{N(N-1)/2}.   (7.11)

Low values of CC indicate good spectral quality.
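Equations (7.10)-(7.11) in NumPy (illustrative sketch):

```python
import numpy as np

def corr(A, B):
    # Pearson correlation of two bands, Equation (7.10)
    A = A - A.mean(); B = B - B.mean()
    return (A * B).sum() / np.sqrt((A ** 2).sum() * (B ** 2).sum())

def cc_metric(u, M):
    """Average change in pairwise band correlations, Equation (7.11)."""
    N = M.shape[2]
    diffs = [abs(corr(M[..., n], M[..., m]) - corr(u[..., n], u[..., m]))
             for n in range(N) for m in range(n)]
    return sum(diffs) / len(diffs)
```

Since Corr is invariant under positive affine rescaling of a band, this metric is insensitive to brightness or contrast changes and only measures changes in the relation between the bands, which is exactly the intent described above.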

7.2 Spatial Quality Metric

Big differences in the spatial quality of two images can be determined visually. The human eye is quite sensitive to blurred edges or missing texture. However, spatial quality is hard to quantify in a precise mathematical way by a metric. In 1998 Zhou et al. proposed a spatial quality metric which looks at the correlation coefficients between filtered versions of the panchromatic image and each sharpened band ([ZCS98]). We will refer to this metric as filtered correlation coefficients (FCC).

7.2.1 Filtered Correlation Coefficients (FCC)

The FCC aims to extract the high frequency components of the panchromatic image and each multispectral band. The high frequencies should reflect edges and texture, and therefore a high spatial quality image should have high frequency information similar to the panchromatic image. However, the magnitude of these edges does not necessarily have to coincide, which is the reason why Zhou et al. proposed to look at their correlation coefficients ([ZCS98]).

The first step is to filter the panchromatic image and each band with the following convolution mask:

           -1 -1 -1
    mask = -1  8 -1
           -1 -1 -1

Then the average correlation coefficient of the filtered panchromatic image and all filtered bands is calculated to obtain FCC. An FCC value close to one indicates high spatial quality. We should mention that the FCC value is very sensitive to noise (in the panchromatic or the sharpened image). Also, the introduction of too much texture will affect the FCC value, because this will strongly influence the filtering results. We will see that in some cases our visual analysis will not necessarily confirm the FCC results.
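A sketch of the FCC computation with the mask above applied via periodic shifts (illustrative; the `corr` helper repeats Equation (7.10)):

```python
import numpy as np

def highpass(img):
    # convolution with the 3x3 high-pass mask above (periodic boundary)
    s = sum(np.roll(np.roll(img, dy, 0), dx, 1)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))
    return 8.0 * img - s

def corr(A, B):
    A = A - A.mean(); B = B - B.mean()
    return (A * B).sum() / np.sqrt((A ** 2).sum() * (B ** 2).sum())

def fcc(pan, u):
    """Mean correlation of the filtered Pan with each filtered band of u (I, J, N)."""
    hp = highpass(pan)
    return np.mean([corr(hp, highpass(u[..., n])) for n in range(u.shape[2])])
```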


8. NUMERICAL RESULTS

In this chapter we present pan-sharpening results for VWP, AVWP and all pan-sharpening techniques we introduced in Chapter 3. This chapter is structured as follows: In the first section we compare our proposed models with other pan-sharpening methods and evaluate several fusion results with the help of image quality metrics. The comparison between AVWP and VWP is done in the second section. Here we investigate the question of how much quality we lose and how much speed we gain by choosing AVWP instead of VWP. The third and last section compares the three different minimization methods: explicit time stepping, ADI and Split Bregman. For the entire numerical results chapter, we should mention that the multispectral images were data on an unusual scale (of roughly one to one thousand), which is why we normalized each band such that the intensity values lie between zero and one.

8.1 Comparison of Different Pan-Sharpening Methods

In this section we compare VWP and AVWP to the fusion results of all methods described in Chapter 3, namely classical four-band IHS image fusion, Brovey image fusion, PCA image fusion, wavelet pan-sharpening in a two-level wavelet decomposition using Matlab's sym4 stationary wavelets, and P+XS image fusion with parameters λ = 5, µ = 15, γ_i = 0.6 ∀i and α_i = 0.25 ∀i. The results of all methods for three different images are shown in Figures 8.1, 8.2 and 8.3.


Fig. 8.1: First example image for the comparison of different pan-sharpening methods


Fig. 8.2: Second example image for the comparison of different pan-sharpening methods


Fig. 8.3: Third example image for the comparison of different pan-sharpening methods


First, we can say that the results of all sharpening methods are a huge improvement in visual spatial quality compared to the low resolution image. Nevertheless, there are still big differences among the methods. Brovey, IHS and PCA usually give the best spatial resolution; the images look very clear and have sharp edges. However, we can also see that in these images the colors from the original low resolution multispectral image are not necessarily preserved. For example, in Figure 8.1 the color of the swimming pool in the Brovey method has clearly changed to a darker blue. The PCA and the IHS method changed the colors of the trees to a lighter green. The drawback of the high spatial quality of these methods seems to be spectral distortion. Similar effects can also be observed in the other Figures 8.2 and 8.3. It appears that the spectral signal of vegetation is distorted in the PCA and IHS method. As mentioned earlier, this is highly undesirable, since the reason for taking multispectral images as opposed to ordinary images is to use the spectral signature for identification and classification tasks.

The results for P+XS, wavelet fusion, AVWP and VWP in the first two images do not look as sharp as the other methods in terms of spatial quality, but in turn preserve the colors of the original multispectral image much better. Visually, it is difficult to determine which of the images gives the best spectral quality. Therefore, we use the seven different image quality metrics we described in Chapter 7 to help us with the evaluation. The values of all quality metrics for all methods corresponding to Figures 8.1, 8.2 and 8.3 are shown in Tables 8.1, 8.2 and 8.3, respectively.

Tab. 8.1: Image quality metrics for Figure 8.1

              ERGAS   Q-AVE   RASE   RMSE   SAM    SID    CC    FCC
    Brovey    3.44    0.984   13.9   55.3   0.00    0.5   4.4   0.976
    IHS       3.38    0.984   13.6   54.2   1.64   19.6   3.5   0.982
    PCA       3.31    0.982   13.1   52.3   2.03   14.0   2.5   0.976
    Wavelet   1.99    0.991    8.0   32.0   1.24    2.0   1.4   0.972
    P+XS      1.95    0.986    7.6   30.5   1.70    2.4   1.7   0.839
    AVWP      1.42    0.997    5.7   22.8   0.22    0.7   0.6   0.885
    VWP       1.79    0.996    7.2   28.7   0.43    0.3   1.2   0.951

Tab. 8.2: Image quality metrics for Figure 8.2

              ERGAS   Q-AVE   RASE   RMSE   SAM    SID    CC    FCC
    Brovey    2.80    0.988   11.3   44.7   0.00   0.03   3.2   0.971
    IHS       2.84    0.989   11.2   44.5   1.35   2.39   4.2   0.974
    PCA       2.52    0.977   10.5   41.7   1.96   2.72   0.3   0.956
    Wavelet   1.44    0.995    5.7   22.8   0.99   0.51   2.0   0.973
    P+XS      2.06    0.985    7.8   30.9   1.80   1.84   2.1   0.796
    AVWP      1.07    0.998    4.3   17.2   0.30   0.04   0.5   0.861
    VWP       1.12    1        4.5   17.8   0.34   0.07   1.5   0.939

Tab. 8.3: Image quality metrics for Figure 8.3

              ERGAS   Q-AVE   RASE   RMSE   SAM    SID    CC    FCC
    Brovey    3.22    0.985   13.3   31.5   0.00   2.5    6.8   0.953
    IHS       3.21    0.981   13.2   31.2   1.39   4.1    6.9   0.956
    PCA       3.06    0.963   13.2   31.1   2.39   7.1    1.9   0.944
    Wavelet   2.39    0.987    9.8   23.2   1.25   5.0    3.5   0.977
    P+XS      1.88    0.986    7.9   18.7   1.62   3.4    5.9   0.843
    AVWP      1.56    0.996    6.4   15.1   0.10   0.4    1.4   0.876
    VWP       2.27    0.992    9.3   22.0   0.39   0.8    3.3   0.967

To a large extent the quality metrics confirm our visual observations. Brovey, IHS and PCA have clearly worse spectral quality metric values than the other methods. The fact that Brovey has a SAM


value of zero on every image is due to the algorithm: Brovey multiplies each multispectral band by the same factor, such that the spectral angle will always be zero. Nevertheless, the other quality metric values show that Brovey still gives spectral distortion, since the ERGAS, Q-AVE, RASE, RMSE and CC values are of the same order of magnitude as for IHS and PCA.

Wavelet fusion and P+XS give better spectral quality. However, our proposed VWP and AVWP methods still outperform these methods in terms of spectral quality. For all three images the best values for ERGAS, Q-AVE, RASE and RMSE are achieved by AVWP or VWP. SAM gives better values only for the Brovey method, where SAM = 0 by definition. For the IHS, PCA, wavelet and P+XS methods the SAM quality metric (which we explicitly wanted to minimize by our energy functional) is over three times higher than for our proposed VWP and AVWP methods.

One can see that there is a trade-off between spectral and spatial resolution. Generally, the better the spatial resolution gets, the more difficult it is to preserve the spectral quality. The FCC metric indicates high spatial quality for the IHS method. However, the low FCC values for P+XS, especially in comparison to the wavelet and VWP methods, seem somewhat strange and cannot necessarily be confirmed visually, which shows that FCC values are related to the spatial quality but should not be considered as the only measure.

The third image is somewhat different from the first two. Here wavelet fusion gets the best spatial quality value, closely followed by VWP. Compared to IHS and PCA, VWP does better spectrally as well as spatially. Notice that in our example images we generally focused more on spectral quality. This emphasis on spectral or spatial quality is determined by the choice of parameters. One can already see that the AVWP method had slightly better spectral results, while the VWP method had slightly better spatial results. This is not a general conclusion, but rather due to the specific choice of parameters. The parameters we used to produce the VWP and AVWP images in Figures 8.1, 8.2 and 8.3 are shown in Table 8.4.

Tab. 8.4: Choice of parameters for high spectral quality

            γ     ν    µ     ε         η     c0   c1   c2
    AVWP    1     4    500   10^-4     1     -    -    -
    VWP     0.5   15   100   5·10^-2   0.5   4    2    1

A great advantage of VWP and variational methods in general is their flexibility. Brovey, PCA and IHS require no parameters, and the only possible change in wavelet fusion would be the level of decomposition. In a variational method, the parameters have a huge influence on the result, and thereby one can choose which term in the energy functional is more important for a certain purpose. Putting more weight on spectral terms will especially enforce spectral quality, as shown in the images above. On the other hand, using exactly the same model one can also focus on spatial quality and get results of spatial quality similar to IHS. Figure 8.4 shows three fusion results of the VWP method for different choices of parameters: one image with very high spectral quality, one image with high spatial quality, and a compromise between spectral and spatial quality.

The VWP fused image (f) has better spatial quality than the IHS fusion result, and one can see that the color of the trees is more realistic and closer to the original low resolution color in the VWP result. To produce this kind of high spatial quality image we use the parameters shown in Table 8.5.

Tab. 8.5: Choice of parameters for high spatial quality

            γ     ν   µ     ε       η     c0    c1   c2
    VWP     0.7   4   100   10^-3   1.4   0.5   4    4
    AVWP    0.7   4   100   10^-3   1.4   -     -    -


Fig. 8.4: Influence of the parameters in VWP

In summary, we can say that VWP can produce a very wide range of good fusion results, from images with spatial quality equal to IHS while still having more realistic colors, to spectral quality better than any other fusion method. There is definitely a trade-off between spectral and spatial quality, but VWP has the flexibility to give almost any desired combination of spectral and spatial quality by adjusting the parameters.

8.2 Comparison of VWP and AVWP

In this section we want to investigate how much image quality we give up by using AVWP instead of VWP and how much speed we gain. The difficulty here is that we know that there is a trade-off between spectral and spatial quality. To get a fair comparison we have to make sure that either the spectral or


the spatial quality of the VWP and AVWP fusion results is roughly the same, and then compare their spatial or spectral quality, respectively. As seen earlier, spectral quality is really hard to measure, and we have seven different metrics for spectral quality, which makes it almost impossible to produce (or even define) images of equal spectral quality. Hence, we will take a VWP and an AVWP fused image of visually equal quality and compare their spectral quality. Figure 8.5 shows sharpening results of VWP and AVWP on two different test images.

Fig. 8.5: Comparison of VWP and AVWP

It is hard to tell the difference between the VWP and the AVWP sharpened images. As for the evaluation with the other methods, we can have a look at the image quality metrics and the runtime T of both methods (Table 8.6). The spatial quality metric FCC indicates almost exactly equal spatial quality

Tab. 8.6: Quality metric values for the comparison of VWP and AVWP

    Image 1
            ERGAS   Q-AVE   RASE   RMSE   SAM    SID      CC       FCC     T in s
    VWP     1.86    0.995   7.5    28.1   0.16   0.0003   0.0356   0.945   110.5
    AVWP    1.94    0.994   7.8    29.3   0.20   0.0004   0.0393   0.945    20

    Image 2
            ERGAS   Q-AVE   RASE   RMSE   SAM    SID      CC       FCC     T in s
    VWP     1.19    0.999   4.7     8.4   0.12   0.0004   0.0026   0.923    37.5
    AVWP    1.23    0.998   4.9     8.7   0.16   0.0007   0.0024   0.921     8

for the two fusion results on both images, while the spectral quality is better in all categories for VWP (except for the correlation coefficients on Image 2). On the other hand, we can see that the difference in quality metric values is not very big. Furthermore, we could give up very little spatial quality to get the same spectral quality. However, the difference in runtime is huge. AVWP is about 4.5 times faster on the smaller Image 2 and 5.5 times faster on Image 1 (using ADI for the minimization). This difference in runtime of course increases as the size or the dimension of the imagery increases. Taking into account that the stationary wavelet decomposition in VWP not only affects the runtime but also the memory needed, it appears AVWP is better suited for hyperspectral imagery. On our computer (with 3GB memory) Matlab ran out of memory calculating the stationary wavelet decomposition of an 82-band 345 × 276 pixel hyperspectral image for VWP.

We can conclude that AVWP is the better choice if either a faster runtime is more important than a slight increase in image quality or the dimension of the imagery is large. For small images where very high precision is desirable, the original VWP energy is probably the better choice.


8.3 Comparison of Different Minimization Methods for VWP

In this section we would like to compare the three different minimization methods we described in Chapter 6. We implemented the ADI and explicit method as described above for AVWP and VWP in Matlab. For AVWP we additionally implemented the Split Bregman method. For the wavelet matching parts in VWP and AVWP, Matlab's sym4 stationary wavelet transform was used in a second-level wavelet decomposition. The runtimes for all methods are given using an Intel Core Duo processor with 2GHz and 3GB memory.

First we compare the two gradient descent timestepping methods, which is easier since they are based on the same idea. The advantage we hope to get out of the ADI method is that it is stable for slightly larger time steps and has better convergence properties, such that we have to perform fewer iterations.

8.3.1 Alternating Directions Implicit vs. Explicit method

We want to compare both methods for both energies, VWP and AVWP. For this comparison, we use the parameters shown in Table 8.7.

Tab. 8.7: Choice of parameters for the comparison

            γn   ν   µ     ε         η   c0   c1   c2
    VWP     1    2   100   2·10^-2   1   2    1    0.5
    AVWP    1    2   100   2·10^-2   1   -    -    -

Numerically we found that with these parameters the explicit method remains stable for time steps τ = ε/9 and the ADI method for τ = 10 · ε. Let us first investigate the decay of the L1 residual between two successive iterates per pixel, |u^{k+1} − u^k|_{L^1} / |Ω|. We run both methods for 200 iterations on a 4-band 330 × 280 pixel image and measure the time for each method to fulfill

    \frac{|u^{k+1} - u^k|_{L^1}}{|\Omega|} \leq 10^{-5}   (8.1)

as well as the total time and the final value of the residual in the last iteration. The resulting values for AVWP are shown in Table 8.8.

                iterations to meet (8.1)   time to meet (8.1) in s   total time in s   final residual
    ADI         47                         34                        138               1.9·10^-8
    explicit    141                        44.5                       63               4·10^-6

Tab. 8.8: Comparison of ADI and explicit timestepping for minimizing AVWP

We can see that the ADI method reached the stopping criterion much faster, in terms of the number of iterations as well as the time needed. The total runtime clearly shows that each step of the explicit method is much faster than one step of the ADI method, but one ADI step is more effective. Doing two hundred iterations, the ADI method takes more than twice as long but in turn gives a residual that is over a hundred times smaller. A similar observation can be made for the VWP energy minimization (Table 8.9).

                iterations to meet (8.1)   time to meet (8.1) in s   total time in s   final residual
    ADI         88                         214                       497               9·10^-9
    explicit    160                        333                       418               7·10^-6

Tab. 8.9: Comparison of ADI and explicit timestepping for minimizing VWP

In this case the difference is even more drastic. The explicit method takes almost twice as many iterations and about two minutes longer to reach the stopping condition. The final residual for the ADI method is even lower for VWP than for AVWP. However, the conversions between the spatial and


the wavelet domain are computationally much more expensive, as we have already seen in the previoussection. 88 iterations of the ADI took 214 seconds in VWP and 47 iterations took only 34 seconds inAVWP.

Besides looking at the residual as a stopping condition, we can also compare the methods in terms of their decay of the energy. The graphs for VWP and AVWP for both methods are shown in Figure 8.6 for 200 iterations.

Fig. 8.6: Decay of the energy in VWP and AVWP for the ADI and explicit method

As expected, the energy decays much faster for the ADI method for both energy models, until both minimization methods reach the same energy level as a steady state. Notice that our stopping criterion seemed to be reasonable but a little strict. Especially for VWP it looks like we almost have a steady state after about 25 iterations for ADI and 75 iterations of the explicit method. For visually satisfying results we can stop the iteration even earlier, such that the runtimes in the above tables do not necessarily reflect the true algorithm runtimes. The above image can be sharpened using ADI on the AVWP energy in about 16 seconds.

Generally, we have seen that the ADI method clearly outperforms the explicit timestepping. For the VWP as well as for the AVWP model we need fewer iterations and get much faster convergence with ADI. We should mention that the convergence of both methods heavily depends on the choice of the additional regularization ε we needed in the total variation term. Because this regularization seems to be important, we will investigate its influence on the results and runtime in the next section.

8.3.2 Dependence of the Gradient Descent Methods on the Regularization Parameter ε

As seen in the derivation of the method, there is a time step restriction which depends on ε. The larger ε, the larger we can choose our time step, and the faster our convergence will hopefully be. So the first question we want to address here is: Why don't we choose ε really large for faster convergence? The answer is that for big ε the additional ε-regularized energy is clearly different from the energy we wanted to minimize. First of all this is a theoretical problem, since the additional regularization is not justified by our model described in Chapter 4 and we no longer find the minimum of our original energy functional. Secondly, and this is even more important, we can see the difference in the images for too large ε. Figure 8.7 shows three pan-sharpening results for different values of ε.

We can clearly see the influence of ε. The sharpening result with the biggest ε has mostly blurry edges; only the very strong edges look sharp. Decreasing ε by a factor of ten greatly improves the image quality, as we can see in the second image. In particular, the texture of the trees looks much nicer and sharper now. The improvements for a really small ε are not that great anymore. Furthermore, the gradient descent becomes very expensive for this kind of ε: While the first two images could be produced in under 20 iterations taking under 20 seconds, the third image needed 584 iterations and over 8 minutes to reach the stopping criterion that the L1 residual between two successive iterates per time step and pixel is less than 10⁻³:

\frac{|u^{k+1} - u^k|_{L^1}}{\tau \, |\Omega|} \le 10^{-3}.


Fig. 8.7: Influence of the additional ε regularization in the ADI method

This effect of the additional regularization parameter is well known for total variation denoising, but seems to be a little less drastic for our problem. The reason for that is probably the edge enforcing term div(θ), which seems to decrease the smearing of the edges. The above experiments show that we can pick an ε for which we get relatively fast convergence without great visual differences to the fusion results obtained with a much smaller ε. In the next section we will further investigate the effect of the additional regularization by comparing the gradient descent methods with the Split Bregman method, which does not require such a regularization.
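For reference, the ε-regularized total variation discussed above can be sketched in a few lines; we assume the common smoothing |∇u|_ε = sqrt(u_x² + u_y² + ε²) and simple forward differences, which may differ from the exact discretization used in our implementation:

```python
import numpy as np

def tv_eps(u, eps):
    """Regularized total variation, summed over all pixels, with
    |grad u|_eps = sqrt(u_x^2 + u_y^2 + eps^2) (assumed definition)."""
    ux = np.diff(u, axis=1, append=u[:, -1:])  # forward differences,
    uy = np.diff(u, axis=0, append=u[-1:, :])  # Neumann boundary
    return np.sqrt(ux**2 + uy**2 + eps**2).sum()

u = np.zeros((8, 8))
u[:, 4:] = 1.0              # image with a single vertical edge
small = tv_eps(u, 2e-2)     # close to the true total variation
large = tv_eps(u, 2e-1)     # larger eps inflates the energy in flat regions
```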

8.3.3 Split Bregman vs. Gradient Descent Methods

We have seen in Section 8.3.1 that the ADI method outperforms the explicit timestepping, which is why we will focus on the comparison of ADI and Split Bregman for minimizing AVWP in this section. We have seen that the parameter ε influences the quality and the speed of the ADI method. As we can see in the derivation of the Split Bregman method (Section 6.2), we do not need this kind of additional regularization here. The alternate energy can be minimized without further modifications. However, we still have to calculate θ, which is given by θ = ∇P / |∇P|_ε. For the ADI method it is a natural choice to use the same ε for the calculation of ∇P / |∇P|_ε as we do for the total variation part ∇u / |∇u|_ε. For the Split Bregman method we can choose the ε for calculating θ freely, since it does not affect the stability of the method at all. Thereby, we have to be aware of the fact that the total variation term does not have an additional regularization term in the denominator, such that we minimize

\int \operatorname{div}\!\left(\frac{\nabla P}{|\nabla P|_\varepsilon}\right) u + |\nabla u| \, dx

as opposed to ∫ div(∇P / |∇P|_ε) u + |∇u|_ε dx in the ADI version. For a relatively big ε the term ∇P / |∇P|_ε will be close to zero in regions where there is only little variation, i.e. a small ∇P. In these regions the total variation, which is not artificially reduced by ε, will take over and greatly smoothen those regions.
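For illustration, θ and div(θ) can be discretized as follows; a numpy sketch assuming |v|_ε = sqrt(|v|² + ε²) and central differences via np.gradient, which is not necessarily the discretization used in our implementation:

```python
import numpy as np

def theta(P, eps):
    """theta = grad(P) / |grad(P)|_eps for a grayscale image P."""
    Py, Px = np.gradient(P.astype(float))   # axis 0 (y) first, axis 1 (x) second
    mag = np.sqrt(Px**2 + Py**2 + eps**2)
    return Px / mag, Py / mag

def div_theta(tx, ty):
    """Divergence of the vector field (tx, ty) with central differences."""
    return np.gradient(tx, axis=1) + np.gradient(ty, axis=0)

P = np.outer(np.linspace(0, 1, 16), np.ones(16))  # smooth vertical ramp
tx, ty = theta(P, eps=2e-2)
d = div_theta(tx, ty)   # nearly zero in the interior for a linear ramp
```

Note that the field θ always satisfies |θ| ≤ 1 by construction, and that |θ| ≈ 1 on strong edges while |θ| ≈ 0 where |∇P| is small compared to ε.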

Using the example of the previous section, this effect can clearly be seen: If we take ε = 2 · 10⁻², as done for fast convergence of the ADI method, we get a reasonable image with ADI (which regularizes both θ and TV), but a strongly denoised, almost cartoon-like image with the Split Bregman method (which only regularizes θ), as we can see in Figure 8.8. The choice of the parameter ε definitely needs to be adapted for the Split Bregman method.


Fig. 8.8: Influence of the ε regularization of θ in Split Bregman

The fact that these differences in the sharpening results are just based on the additional regularization with ε can easily be shown by looking at the energies. The AVWP energy is different from the one we are minimizing with the ADI method. The following equations show both energies, the one from the original AVWP model (8.2) and the one we numerically minimize when using the ε regularization (8.3):

E = \sum_{n=1}^{N} \gamma_n \left[ \int_\Omega |\nabla u_n| \, dx + \eta \int_\Omega \operatorname{div}(\theta)\, u_n \, dx \right]
    + \mu \sum_{i,j=1,\ i<j}^{N} \int_\Omega \left( u_i \cdot \uparrow\! M_j - u_j \cdot \uparrow\! M_i \right)^2 dx
    + \nu \sum_{n=1}^{N} \int_\Omega (u_n - Z_n)^2 \, dx \qquad (8.2)

E_\varepsilon = \sum_{n=1}^{N} \gamma_n \left[ \int_\Omega |\nabla u_n|_\varepsilon \, dx + \eta \int_\Omega \operatorname{div}(\theta)\, u_n \, dx \right]
    + \mu \sum_{i,j=1,\ i<j}^{N} \int_\Omega \left( u_i \cdot \uparrow\! M_j - u_j \cdot \uparrow\! M_i \right)^2 dx
    + \nu \sum_{n=1}^{N} \int_\Omega (u_n - Z_n)^2 \, dx \qquad (8.3)
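For concreteness, a discrete version of both energies can be written down directly; a numpy sketch in which integrals become pixel sums, gradients use np.gradient, ↑M is passed in as an already upsampled array, and γ_n is taken constant (all of these discretization choices are ours, not necessarily the implementation's):

```python
import numpy as np

def avwp_energy(u, div_theta, M_up, Z, gamma, eta, mu, nu, eps=0.0):
    """Discrete AVWP energy: eps = 0 gives (8.2), eps > 0 the
    regularized energy (8.3).  Shapes: u, M_up, Z are (N, H, W),
    div_theta is (H, W); gamma is the (constant) weight gamma_n."""
    N = u.shape[0]
    E = 0.0
    for n in range(N):                        # TV + edge alignment term
        uy, ux = np.gradient(u[n])
        E += gamma * (np.sqrt(ux**2 + uy**2 + eps**2).sum()
                      + eta * (div_theta * u[n]).sum())
    for i in range(N):                        # spectral ratio term, i < j
        for j in range(i + 1, N):
            E += mu * ((u[i] * M_up[j] - u[j] * M_up[i])**2).sum()
    E += nu * ((u - Z)**2).sum()              # data fidelity term
    return E
```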

Now, we can run the ADI and the Split Bregman method and calculate the decay of energy for E andEε during the iteration. Figure 8.9 shows the results.


Fig. 8.9: Left figure: Decay of the energy (8.2); Right figure: Decay of the energy (8.3)

The Split Bregman method minimizes the original AVWP energy (left part of Figure 8.9) while the ADI method minimizes the ε-regularized energy (right part of Figure 8.9). For this rather large value of ε the regularized energy does not even seem to be a good approximation of AVWP, as we can see from the huge difference in the final energies of the two methods as well as the very different fusion results. We can make the observation that to approximate our AVWP model with the energy ADI uses, we need to choose a much smaller ε. On the other hand we could also say that the fusion results we get with the ADI method are really good for this choice of parameter, even though we do not minimize the original AVWP energy. The additional regularization makes the energy differentiable and therefore much easier and faster to minimize. Notice that the steady state in Figure 8.9 is reached much faster by the ADI method.

A moderate ε in the Split Bregman method gives very good and sharp results and does not increasethe runtime at all. As seen in previous sections, the ADI method becomes very slow with a decreasing ε.Figure 8.10 shows the fusion result for a good choice of ε.

Fig. 8.10: Split Bregman fusion result

A final conclusion for the comparison of ADI and Split Bregman is hard to draw, since it depends on the question of how much we are willing to move away from our original model and additionally regularize the total variation term. One thing we can definitely say is that the Split Bregman algorithm is equally fast while minimizing the energy more accurately. The actual runtime comparison is difficult, since ADI and Split Bregman do not minimize the same energy, and comparisons of the residual depend on the size of the timestep for the ADI method.

For the Split Bregman method, a different calculation of div(θ) might make sense. In the implemented version with Split Bregman we additionally regularize θ but not the TV-term. To entirely get rid of the ε regularization, one could choose the subgradient of |·|_BV at P to be determined by a BV minimization: p = 2λ(P − \bar{P}) with

\bar{P} = \operatorname{argmin}_u \; \lambda \int (P - u)^2 \, dx + |u|_{BV},

instead of p = −div(θ). This approach might be an interesting topic for future research.

In the next section we will see how the pan-sharpening algorithm extends to higher dimensions, i.e. to hyperspectral images.


9. EXTENSION TO HYPERSPECTRAL IMAGERY

9.1 Hyperspectral Images

So far we have discussed the possibilities for sharpening multispectral (4-6 band) images with the help of a higher resolution panchromatic image. Similar techniques could be interesting for sharpening even higher dimensional data, so called hyperspectral images, which record the light intensities in up to 210 different frequency channels.

Hyperspectral imagery offers a lot more information than ordinary color or multispectral images. While multispectral images can be used for detecting vegetation and camouflage, estimating water depth and soil moisture content, or seeing the presence of fires, hyperspectral images even give the opportunity of identifying the material the objects in the image are made of. Hyperspectral images are usually used for target detection, material mapping, material identification or mapping details of the surface properties. For all these tasks the image is typically seen as a collection of spectral vectors whose length equals the number of bands. These spectral vectors can be compared with a library of so-called endmembers, which are spectral signatures of known materials. The comparison of these vectors can be done in numerous different metrics. One of the most common metrics is the spectral angle mapper (SAM), which we used as a quality metric and which we specifically wanted to minimize with our energy function. Precise material classification is a difficult task and requires precise spectral information. Since VWP specifically focuses on preserving spectral information, it seems to be very well suited for sharpening hyperspectral images.
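As a reference for how such a comparison works, the spectral angle between a pixel spectrum and an endmember takes only a few lines; a small numpy sketch (the function names are ours, not from any library):

```python
import numpy as np

def spectral_angle(v, w):
    """Spectral Angle Mapper (SAM): angle in radians between two spectra."""
    cos = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def classify(pixel, endmembers):
    """Index of the endmember with the smallest spectral angle to the pixel."""
    return int(np.argmin([spectral_angle(pixel, e) for e in endmembers]))
```

Note that spectral_angle(v, c·v) is zero for any c > 0, which is exactly the invariance under contrast changes that the VWP model exploits.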

In this section we investigate the extension of VWP to hyperspectral data. We work with two datasets:A 210 band AVIRIS hyperspectral image of San Diego Harbor and a 210 band HYDICE hyperspectralimage of an urban scene in Texas which is freely available online ([Pri]).

9.2 Acquisition of a Master Image

A problem with the data is that no panchromatic image is included. Since we do not assume that the high resolution image we use for the sharpening is panchromatic (i.e. we do not assume its spectral response spans all bands), it would be sufficient to have any kind of high resolution image of the same scene.

Therefore, we look up the exact same scenes on Google Maps™ [Goo], which offers very high resolution images, and use a screenshot for the sharpening. Because the Google Maps™ image is not panchromatic anymore, we will refer to it as the master image.

The registration of the master image with the hyperspectral scene turns out to be very difficult. The hyperspectral image seems to be spatially distorted and, as opposed to the registration in the pan-sharpening case, not only translation, but also rotation, shearing and scaling without preserving the aspect ratio is needed. For the examples shown in this report we did the registration on small parts of the image manually.
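Once the transformation parameters (translation, rotation, shear, scale) have been estimated, applying them is straightforward; a minimal nearest-neighbor backward warp in numpy, with purely hypothetical parameter values:

```python
import numpy as np

def affine_warp(img, A, t):
    """Backward-map each target pixel through x -> A x + t and sample
    the source image with nearest-neighbor interpolation."""
    H, W = img.shape
    ys, xs = np.indices((H, W))
    src = A @ np.stack([ys.ravel(), xs.ravel()]).astype(float) + t[:, None]
    sy = np.clip(np.round(src[0]).astype(int), 0, H - 1)
    sx = np.clip(np.round(src[1]).astype(int), 0, W - 1)
    return img[sy, sx].reshape(H, W)

# hypothetical registration: slight rotation, shear and anisotropic scaling
a = np.deg2rad(2.0)
R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
S = np.array([[1.05, 0.03], [0.0, 0.97]])
master = np.random.rand(64, 64)
registered = affine_warp(master, R @ S, t=np.array([3.0, -1.5]))
```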

9.3 Numerical Results

Once we have a hyperspectral image and a corresponding registered master image, we run our AVWP algorithm on this data. We specifically try to preserve the spectral quality with large weights on the spectral quality enforcing terms. For the images shown in this report we used γ = 1, η = 1, ν = 2, µ = 300 and ε = 5 · 10⁻² for the ADI method, and γ = 1, η = 1, ν = 2, µ = 500 and ε = 5 · 10⁻⁴ for the Split Bregman method.

The first example is taken from the urban scene in Texas. Figure 9.1 shows the whole image, where the red rectangle marks the part of the image we selected for the sharpening process.


Fig. 9.1: Selected scene of the Urban image

We look up the same scene on Google Maps™ and extract a master image for the sharpening. Figure 9.2 shows the master image, the low resolution image and the sharpened image.

The visual quality improves greatly. The blocky low resolution image becomes a much smoother image with sharp edges, in which a lot of spatial details from the master image can be seen. In a few regions some colors flow over certain edges, which is due to a slightly inaccurate registration. Nevertheless, we gain a lot of spatial quality through the sharpening process.

Fig. 9.2: left image: master image '© 2009 Google - Imagery © 2009 DigitalGlobe, Sanborn, Cnes/Spot Image, GeoEye, U.S. Geological Survey'; middle and right image: low resolution and sharpened image

The increase in resolution is even more dramatic for the San Diego harbor image, for which Google Maps™ allows an even higher zoom. Figure 9.3 shows a building with some boxes in front of it. In the low resolution hyperspectral image we can see that there is something in front of the building, but the identification of the actual object (and its shape) is impossible, even if we had the material information from analyzing the hyperspectral bands. In the sharpened image we can clearly see the shape of the objects and identify them as boxes or containers. On this hyperspectral image of 345 × 276 pixels with 82 bands, the ADI method takes about 7 minutes and 46 seconds and the Split Bregman method about 6 minutes and 11 seconds to converge on an Intel Core Duo processor with 2 GHz and 3 GB memory (to determine the convergence of the algorithms we looked at the decay of the residual and thereby determined a reasonable stopping condition manually). The two different minimization methods give visually indistinguishable results, such that we only present one sharpened image per test image.


Fig. 9.3: left image: master image '© 2009 Google - Imagery © 2009 DigitalGlobe, GeoEye'; middle and right image: low resolution and sharpened image displayed as false color images

In Figure 9.4 we see the top of a building and the master image shows that there are some pipes onthe roof, probably from an air conditioning system. The hyperspectral image itself is much too blurry tosee this detail and an analysis of this image would probably only indicate that there is a certain amountof metal on the roof. Only the sharpened image contains both pieces of information: the shape and thespectral signature of the material of the pipes.

Fig. 9.4: left image: master image '© 2009 Google - Imagery © 2009 DigitalGlobe, GeoEye'; middle and right image: low resolution and sharpened image displayed as false color images

In general we can say that the visual quality is greatly increased by the sharpening process. While in the low resolution hyperspectral images small objects cannot be visually identified, the sharpened images are very close to the master images in terms of spatial quality. A lot of details can be seen.

We should mention that we do not know when the Google Maps picture was taken. The boxes in Figure 9.3, for instance, could be things we see in the master image that were not present in the hyperspectral image and therefore appear as phantoms. One has to be careful with the introduced spatial information if the two images were not taken at the same time. On the other hand, this might also be an advantage, because high resolution satellite images are much easier to obtain than hyperspectral images. One could visually update an older hyperspectral image with the spatial information of a recent master image.

As mentioned earlier, the most important issue in sharpening hyperspectral images is that the spectral signature of each pixel is preserved, since this information is used for material classification. Figure 9.5 shows another hyperspectral sharpened image displayed in false color.


Fig. 9.5: left image: master image '© 2009 Google - Imagery © 2009 DigitalGlobe, GeoEye'; middle and right image: low resolution and sharpened image displayed as false color images

To investigate how the spectral signature changed during the sharpening process, we select three pixels in the 82 band hyperspectral scene and look at their spectral signatures. These pixels are marked with red crosses in Figure 9.6.

Fig. 9.6: Pixels whose spectral signatures we investigate are marked in red. We will refer to them as pixels (1), (2) and (3) from left to right.

Pixel (1) is in the middle of a box, where there are no edges or texture even in the master image. At such pixels we match our current iteration to the low resolution image and therefore do not change the intensity in any band at all. Indeed, the spectral signatures of the low resolution image and the sharpened image at that point are almost identical, as shown in Figure 9.7.

Fig. 9.7: Spectral response of pixel (1)


At pixels (2) and (3) we are sure to change the signature, because we want to increase the contrast at these points to enhance the edges. Looking at Figure 9.8, we can clearly see this change in contrast.

Fig. 9.8: Spectral response of pixels (2) and (3)

Our energy model allows this increase of contrast, but enforces the frequency vectors to stay parallel. In other words, we keep the spectral angle at zero degrees. If we normalize both spectral responses to have Euclidean norm 1, we get the result shown in Figure 9.9.

Fig. 9.9: Normed spectral response of pixels (2) and (3)

The spectral responses match almost perfectly for both pixels. The frequency vectors stay parallel, such that any classification or identification method using the spectral angle will give exactly the same result on the spatially enhanced image as it would on the low resolution version.
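This invariance is easy to check numerically; a toy example with a made-up four-band spectrum whose sharpened version is just a scaled copy of the low resolution one:

```python
import numpy as np

low = np.array([0.2, 0.5, 0.3, 0.7])   # hypothetical low resolution spectrum
sharp = 1.6 * low                       # contrast increased, direction kept

# normalizing both responses to Euclidean norm 1 makes them coincide,
# i.e. the spectral angle between them is zero
n_low = low / np.linalg.norm(low)
n_sharp = sharp / np.linalg.norm(sharp)
assert np.allclose(n_low, n_sharp)
```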


10. CONCLUSIONS

We proposed a variational method based on the ideas of P+XS and wavelet image fusion for the task of pan-sharpening multispectral images. The model incorporates the alignment of all unit normal vectors of the level sets of each band with those of the panchromatic image. It includes color and edge matching in the wavelet domain and further preserves frequency information by keeping the ratio of all bands constant and by additionally matching the colors away from edges and texture.

VWP can produce a wide range of images, where the user can choose the relative importance of spectral and spatial quality by adjusting the parameters. Images of high spatial and high spectral quality were shown. We implemented eight different quality metrics to evaluate the performance of our proposed method in comparison to the most common pan-sharpening methods. Our method seems to be the best choice if one wants to preserve the spectral information of the multispectral image.

Among all pan-sharpening methods, VWP is the only method that easily extends to hyperspectral imagery while preserving the spectral information. We showed several sharpening results using Google Maps™ images as our master images and proved that the spectral signature is preserved up to a constant factor. Any material classification method that uses the normed spectral response vectors will do equally well on the sharpened image as on the original image, while an analyst's work is greatly simplified by the additional spatial information.

For future research one could look into incorporating technical information about the satellite sensors into the variational framework. Other, totally different types of sensors and images could be used, such as SAR or LIDAR images or MRI data for medical imagery. To automate the sharpening process, a robust registration method is needed. To reduce the bleeding of colors over some of the edges in the hyperspectral case, deblurring could be included in the sharpening method, and one could experiment with nonlocal total variation regularization. Furthermore, the proposed method could be combined with hyperspectral analysis methods like demixing to improve not only the spatial but also the spectral resolution of the image. Most detection and classification methods only take the spectral information of a hyperspectral image into account without incorporating spatial information. New methods suitable for sharpened images, which also take the spatial information into account, could be developed.


BIBLIOGRAPHY

[AABG02] B. Aiazzi, L. Alparone, S. Baronti, and A. Garzelli. Context-driven fusion of high spatial and spectral resolution images based on oversampled multiresolution analysis. IEEE Transactions on Geoscience and Remote Sensing, 40:2300–2312, 2002.

[ABM05] H. Attouch, G. Buttazzo, and G. Michaille. Variational Analysis in Sobolev and BV Spaces: Applications to PDEs and Optimization (MPS-SIAM Series on Optimization 6). SIAM, Philadelphia, USA, 2005.

[ACMM01] L. Ambrosio, V. Caselles, S. Masnou, and J. M. Morel. Connected components of sets of finite perimeter and applications to image processing. Journal of the EMS, (3):213–266, 2001.

[AK06] G. Aubert and P. Kornprobst. Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations (second edition), volume 147 of Applied Mathematical Sciences. Springer-Verlag, Berlin, Germany, 2006.

[ali] Wake Forest University. Online: http://www.wfu.edu/~matthews/misc/DigPhotog/alias/, Jan. 2009.

[AWC+07] L. Alparone, L. Wald, J. Chanussot, C. Thomas, P. Gamba, and L. M. Bruce. Comparison of pansharpening algorithms: Outcome of the 2006 GRS-S data-fusion contest. IEEE Transactions on Geoscience and Remote Sensing, 45(10):3012–3021, Oct. 2007.

[BCIV06] C. Ballester, V. Caselles, L. Igual, and J. Verdera. A variational model for P+XS image fusion. International Journal of Computer Vision, 69(1):43–58, August 2006.

[BO04] M. Burger and S. Osher. Convergence rates of convex variational regularization. Inverse Problems, 20:1411–1421, 2004.

[Bre67] L. M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7:200–217, 1967.

[BRH07] M. Burger, E. Resmerita, and L. He. Error estimation for Bregman iterations and inverse scale space methods in image restoration. Computing, 81(2-3):109–135, 2007.

[Bur06] M. Burger. Numerik Partieller Differentialgleichungen - Lecture Notes. Westfälische Wilhelms-Universität Münster, 2006.

[Bur07] M. Burger. Image Processing - Lecture Notes. Westfälische Wilhelms-Universität Münster,2007.

[Can86] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679–698, 1986.

[CCM02] V. Caselles, B. Collmar, and J.-M. Morel. Geometry and color in natural images. Journal of Mathematical Imaging and Vision, 16:89–107, 2002.

[Cd72] S. D. Conte and C. de Boor. Elementary Numerical Analysis. McGraw-Hill, 1972.

[CH03] Y. Chibani and A. Houacine. Redundant versus orthogonal wavelet decomposition for multisensor image fusion. Pattern Recognition, 36:879–887, 2003.


[Cha99] C.-I Chang. Spectral information divergence for hyperspectral image analysis. Geoscience and Remote Sensing Symposium, 1999. IGARSS '99 Proceedings. IEEE 1999 International, 1:509–511, 1999.

[Cho06] M. Choi. A new intensity-hue-saturation fusion approach to image fusion with a tradeoff parameter. IEEE Transactions on Geoscience and Remote Sensing, 44(6):1672–1682, June 2006.

[CKCK] M. Choi, H.-C. Kim, N. I. Cho, and H. O. Kim. An improved intensity-hue-saturation method for IKONOS image fusion. Submitted to IJRS. Online: http://amath.kaist.ac.kr/research/paper/06-9.pdf.

[CM99] V. Caselles and J.-M. Morel. Topographic maps and local contrast changes in natural images. International Journal on Computer Vision, 33:5–27, 1999.

[CS05] T. Chan and J. Shen. Image Processing And Analysis: Variational, PDE, Wavelet, AndStochastic Methods. Society for Industrial and Applied Mathematics, Philadelphia, USA,2005.

[CSZ06a] T. F. Chan, J. Shen, and H.-M. Zhou. Total variation wavelet inpainting. Journal of Mathematical Imaging and Vision, 25:107–125, 2006.

[CSZ06b] T. F. Chan, J. Shen, and H.-M. Zhou. A total variation wavelet inpainting model with multilevel fitting parameters. Congress, Advanced signal processing algorithms, architectures, and implementations XVI, 69(1), August 2006.

[DB08] J. Dobrosotskaya and A. L. Bertozzi. A wavelet-Laplace variational technique for image deconvolution and inpainting. IEEE Transactions on Image Processing, 17:657–663, May 2008.

[DGS05] Q. Du, O. Gungor, and J. Shan. Performance evaluation for pan-sharpening techniques. Geoscience and Remote Sensing Symposium, 2005. IGARSS '05. Proceedings. 2005 IEEE International, 6:4264–4266, July 2005.

[DYKS07] Q. Du, N. H. Younan, R. King, and V. P. Shah. On the performance evaluation of pan-sharpening techniques. IEEE Geoscience and Remote Sensing Letters, 4(4):518–522, October 2007.

[ET99] I. Ekeland and R. Temam. Convex Analysis and Variational Problems. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1999.

[Fug] D. L. Fugal. Conceptual wavelets in digital signal processing.Online: http://www.conceptualwavelets.com/, May 2009.

[GO08] T. Goldstein and S. Osher. The split Bregman method for l1 regularized problems. UCLA CAM report, 08-29, 2008.

[Goo] Google Inc. Google Maps. Online: http://maps.google.com, Jan. 2009.

[Gra95] A. Graps. An introduction to wavelets. IEEE Computational Science and Engineering, 2(2),1995.

[HCB02] P. Hill, N. Canagarajah, and D. Bull. Image fusion using complex wavelets. The BritishMachine Vision Conference, 2002.

[HZ95] G. Hong and Y. Zhang. The effects of different types of wavelets on image fusion. International Conference on Image Processing, 1995. Proceedings., 3:248–251, October 1995.

[JP55] J. Douglas Jr. and D. W. Peaceman. Numerical solution of two-dimensional heat-flow problems. AIChE Journal, 1(4):505–512, 1955.

[Mal98] S. Mallat. A Wavelet Tour of Signal Processing. Academic Press, 1998.


[Mal02] F. Malgouyres. Mathematical analysis of a model which combines total variation and wavelet for image restoration. Journal of Information Processes, 2(1):1–10, 2002.

[MS89] D. Mumford and J. Shah. Optimal approximations by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics, 42(5):577–685, 1989.

[MWB08] M. Möller, T. Wittman, and A. L. Bertozzi. Variational wavelet pan-sharpening. UCLA CAM report, submitted to IEEE Transactions on Geoscience and Remote Sensing, 08(81), 2008.

[MWB09] M. Möller, T. Wittman, and A. L. Bertozzi. A variational approach to hyperspectral image fusion. Proceedings of the SPIE Conference on Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XV. Orlando, Florida, 2009.

[OBG+05] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin. An iterative regularization method for total variation-based image restoration. SIAM Multiscale Modeling and Simulation, 4:460–489, 2005.

[OGAFN05] X. Otazu, M. González-Audícana, O. Fors, and J. Núñez. Introduction of sensor spectral response into image fusion methods. Application to wavelet-based methods. IEEE Transactions on Geoscience and Remote Sensing, 43(10), 2005.

[PHHR55] D. W. Peaceman and H. H. Rachford Jr. The numerical solution of parabolic and elliptic differential equations. Journal of the Society for Industrial and Applied Mathematics, 3(1):28–41, 1955.

[PM90] P. Perona and J. Malik. Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7):629–639, 1990.

[Pri] Air University Space Primer. Chapter 12: Multispectral imagery.Online: http://www.au.af.mil/au/awc/space/primer/multispectral_imagery.pdf,Jan. 2009.

[ROF92] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60:259–268, November 1992.

[Ser82] J. Serra. Image Analysis and Mathematical Morphology. Academic Press, Inc., 1982.

[She92] V. K. Shettigara. A generalized component substitution technique for spatial enhancement of multispectral images using a higher resolution data set. Photogrammetric Engineering and Remote Sensing, 58(5):561–567, 1992.

[Shi03] P. Shippert. Introduction to hyperspectral image analysis. Online Journal of Space Communication, 2003. Online: http://www.geo.utep.edu/pub/hurtado/5336/labs/lab7/readings/shippert_hyperspectral.PDF.

[Shl05] J. Shlens. A tutorial on principal component analysis, December 2005. Online: http://www.snl.salk.edu/shlens/pub/notes/pca.pdf.

[SLLS06] Y. Song, M. Li, Q. Li, and L. Sun. A new wavelet based multi-focus image fusion scheme and its application on optical microscopy. Proceedings of the 2006 IEEE International Conference on Robotics and Biomimetics, December 2006.

[SRM08] M. Strait, S. Rahmani, and D. Merkurev. Evaluation of pan-sharpening methods. UCLA Technical Report, August 2008.

[THHC04] T.-M. Tu, P. S. Huang, C.-L. Hung, and C.-P. Chang. A fast intensity-hue-saturation fusion technique with spectral adjustment for IKONOS imagery. IEEE Geoscience and Remote Sensing Letters, 4(1), October 2004.



[VHY04] V. Vijayaraj, C. G. O'Hara, and N. H. Younan. Quality analysis of pansharpened images. Proceedings of the 2004 IEEE International Geoscience and Remote Sensing Symposium (IGARSS '04), vol. 1, September 2004.

[WB02] Z. Wang and A. C. Bovik. A universal image quality index. IEEE Signal Processing Letters, 9(3):81–84, 2002.

[Wik] Wikipedia. Online: http://en.wikipedia.org/wiki/Multispectral_imaging, Jan. 2009.

[YGB92] R. H. Yuhas, A. F. H. Goetz, and J. W. Boardman. Discrimination among semi-arid landscape endmembers using the spectral angle mapper (SAM) algorithm. Summaries of the 3rd Annual JPL Airborne Geoscience Workshop, pages 147–149, 1992.

[ZCS98] J. Zhou, D. L. Civico, and J. A. Silander. A wavelet transform method to merge Landsat TM and SPOT panchromatic data. International Journal of Remote Sensing, 19(4), 1998.


STATUTORY DECLARATION (EIDESSTATTLICHE ERKLÄRUNG)

In accordance with the Diplom examination regulations for the Mathematics degree programme at the Westfälische Wilhelms-Universität Münster of 15 July 1998, I affirm that I have written the present thesis independently and that, apart from the program Matlab, I have used no aids other than those listed in the bibliography.

Münster, 17 June 2009