
Perceptual Wavelet Coding and

Quality Assessment

for Still Image

Shu-Yu Zhu

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science

Graduate Department of Electrical and Computer Engineering, University of Toronto

© Copyright by Shu-Yu Zhu, 2000

National Library of Canada
Acquisitions and Bibliographic Services
395 Wellington Street
Ottawa ON K1A 0N4
Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

Perceptual Wavelet Coding and Quality Assessment

for Still Image

Shu-Yu Zhu

Master of Applied Science, 2000

Graduate Department of Electrical and Computer Engineering

University of Toronto

Abstract

This thesis investigates the use of human perceptual models to improve the performance of image compression and to provide objective quality measures that are more meaningful than the traditional mean-square error (MSE) or the peak signal-to-noise ratio (PSNR).

A perceptual wavelet coder is developed to satisfy a wide range of requirements, from perceptually lossless quality to high compression ratios. A perceptual model is designed to allow the coder to allocate bits to each subband so as to minimize the overall perceptual distortion. An option is included to allow a region of interest to be compressed at a desired perceptual quality. The coder achieves scalability through multi-rate quantization and entropy coding. Results demonstrate that the perceptual wavelet coder performs better than both the JPEG coder and a wavelet coder without a perceptual model.

A vision model is implemented for perceptual quality assessment. The model gives fairly accurate predictions of where distortion is most noticeable to human eyes. The numerical measures generated from the model give more accurate assessments than MSE and PSNR.

Acknowledgements

This thesis would not have been possible without the help of many people, to whom I would like to express my appreciation here. First, I would like to thank Professor Venetsanopoulos for introducing me to the area of multimedia and for providing me with the resources and guidance for my research. I would also like to thank Professor Plataniotis for his insightful advice and suggestions. This research is affiliated with MAS Corporation, and I would like to thank Dr. Samuel Zhou for giving me the chance to work on industry projects and for always being encouraging and helpful.

My two years of Master's study would not have been as interesting without the friendship of the people in the Communications Group. I would especially like to thank Salima, Eddy, Li-Wei, Ryan, Wing-Chung, and Kelvin. Your fun-loving spirit and warm-heartedness always cheer me up. I would also like to thank my friend John, whose support and goodwill have always been my source of motivation.

Finally, I would like to thank my parents, to whom this thesis is dedicated. I would not be where I am without them. You are the greatest parents in the world.

Contents

Abstract
Acknowledgements
List of Tables
List of Figures

1 Introduction
1.1 Significance of the Research
1.2 Review of Previous Work
1.3 Research Goals and Directions
1.4 Thesis Organization

2 Background
2.1 Overview on Image Compression
2.1.1 Lossless Compression
2.1.2 Quantization
2.1.3 Lossy Compression
2.2 Wavelet Transform and Image Compression
2.2.1 Space-Frequency Localization
2.2.2 The Continuous Wavelet Transform and Wavelet Bases
2.2.3 Multiresolution Analysis and Filter Banks
2.2.4 Common Wavelet Schemes
2.3 Human Visual System and Perceptual Model
2.3.1 Quality Factors
2.3.2 Optics
2.3.3 Color Space
2.3.4 Contrast Sensitivity
2.3.5 Masking
2.3.6 Multi-resolution Structure
2.3.7 Error Summation
2.3.8 Psychovisual Validation

3 Perceptual Image Codec
3.1 Overview of the Algorithm
3.2 Wavelet Transform
3.3 Perceptual Model
3.3.1 Contrast Threshold Function
3.3.2 Luminance and Texture Masking
3.3.3 Contrast Masking
3.3.4 Perceptual Distortion Metric
3.3.5 Region of Interest Quantization
3.4 Multi-layer Quantizer and Entropy Coder
3.5 Experimental Results

4 Quality Assessment Using Vision Model
4.1 Vision Model
4.1.1 Optics
4.1.2 Sampling
4.1.3 Bandpass Contrast Responses
4.1.4 Oriented Responses
4.2 Transducer
4.2.1 Distance Metric
4.3 Experimental Result

5 Conclusion
5.1 Contributions
5.2 Future Research

Appendix A

Bibliography

List of Tables

2.1 Examples of Visual Resolution for Various Displays [41]
3.1 Filter Coefficients (all coefficients start at zero; for biorthogonal filters, the first row is the analysis filter and the second row is the synthesis filter, and they are symmetric about zero)
3.2 Basis Function Magnitudes A(λ, θ) for 6 Levels of an Antonini 9/7 DWT

List of Figures

2.1 Block Diagram for Lossy Compression
2.2 Space-Frequency Localization for (a) Local Fourier Bases and (b) Wavelet Bases [35]
2.3 Wavelet Functions
2.4 The Elementary Haar Scaling Function and Wavelet
2.5 A Two-Channel Perfect Reconstruction Filter Bank
2.6 A 1-D Octave-Band Decomposition
2.7 A 2-D Octave-Band Decomposition
2.8 Parent-Child Dependencies of Subbands
2.9 Point Spread Function of the Human Eye [42]
2.10 Sensitivity of the Three Types of Cones
2.11 Brightness Adjustment Curve
3.1 Block Diagram of the Perceptual Image Codec
3.2 Subband Indexing (λ, θ)
3.3 Estimated Threshold for Y (bottom), Cr (middle), and Cb (top) [41]
3.4 Nonlinear Transducer Function
3.5 Rate-Distortion Curves with Optimal Solution
3.6 Bit Allocation Tree
3.7 Original 'lenna'
3.8 Original 'flower'
3.9 Original 'pepper'
3.10 Original 'baboon'
3.11 PSNR Curves for All Three Coders
3.12 Reconstructed 'flower' Using JPEG Coder (90:1)
3.13 Reconstructed 'flower' Using Perceptual Wavelet Coder (90:1)
3.14 Reconstructed 'baboon' Using JPEG Coder (80:1)
3.15 Reconstructed 'baboon' Using Perceptual Wavelet Coder (80:1)
3.16 Reconstructed 'lenna' Using Basic Wavelet Coder (50:1)
3.17 Reconstructed 'lenna' Using Perceptual Wavelet Coder (50:1)
3.18 Reconstructed 'flower' Using Basic Wavelet Coder (90:1)
3.19 Reconstructed 'flower' Using Perceptual Wavelet Coder (90:1)
3.20 Reconstructed 'pepper' Using Basic Wavelet Coder (80:1)
3.21 Reconstructed 'pepper' Using Perceptual Wavelet Coder (80:1)
3.22 Reconstructed 'baboon' Using Basic Wavelet Coder (80:1)
3.23 Reconstructed 'baboon' Using Perceptual Wavelet Coder (80:1)
3.24 Reconstructed 'lenna' Using Perceptual Wavelet Coder Without ROI (80:1)
3.25 Reconstructed 'lenna' Using Perceptual Wavelet Coder With ROI (80:1)
4.1 Flow Diagram of the Vision Model [20]
4.2 Distance Map for 'lenna'
4.3 Distance Map for 'flower'
4.4 Distance Map for 'pepper'
4.5 Distance Map for 'baboon'
4.6 Mean Distance Measure from the Distance Map
4.7 Maximum Distance Measure from the Distance Map
4.8 Histogram Bin (90% of Maximum) from the Distance Map

Chapter 1

INTRODUCTION

1.1 Significance of the Research

Over the past few decades, an increasing number of multimedia applications have emerged rapidly with the advent of digital communication and the Internet. Efficient representation of digital signals has become an enabling technology in this new digital era. Of the various types of data transferred over networks, images comprise the bulk of the traffic: it is currently estimated that image data transfers take up over 90% of the volume on the Internet [35]. The ability to store, transmit and process digital images is usually limited by disk space, available bandwidth, and processor speed. Even with the tremendous advancement in computer hardware and network systems, it is still impractical to deal with uncompressed images. For instance, one second of HDTV (high definition TV) video at a resolution of 2k x 1k with 24 bits/pixel and 30 frames/sec takes about 1.5 Gb. In other words, with a conventional 56 kb/s modem it would take more than seven hours to transmit one second of such video, and a 5 GB DVD (digital versatile disc) can hold only about 26 seconds of it. Other applications such as video conferencing, remote sensing, medical imaging, facsimile transmission, digital cameras, and Internet transmission and browsing all rely upon compression.
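The storage figures above can be checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions: "2k x 1k" is read as 2048 x 1024 pixels, 1 GB is taken as 10^9 bytes, and the 56 kb/s modem is assumed to carry 56,000 bits/s with no protocol overhead. With 2000 x 1000 pixels the totals change by only a few percent.

```python
# Back-of-the-envelope cost of raw (uncompressed) HDTV video.
# Assumptions: "2k x 1k" = 2048 x 1024 pixels, 1 GB = 1e9 bytes,
# and a 56 kb/s modem moves exactly 56,000 bits per second.
WIDTH, HEIGHT = 2048, 1024
BITS_PER_PIXEL = 24
FRAMES_PER_SECOND = 30
MODEM_BITS_PER_SECOND = 56_000
DVD_BYTES = 5e9

# Bits needed for one second of footage.
bits_per_second = WIDTH * HEIGHT * BITS_PER_PIXEL * FRAMES_PER_SECOND
# Hours of modem time needed to send that one second of footage.
modem_hours = bits_per_second / MODEM_BITS_PER_SECOND / 3600
# Seconds of footage that fit on the disc (bytes converted to bits).
dvd_seconds = DVD_BYTES * 8 / bits_per_second

print(f"raw video: {bits_per_second / 1e9:.2f} Gb per second of footage")
print(f"56 kb/s modem: {modem_hours:.1f} hours per second of footage")
print(f"5 GB DVD: {dvd_seconds:.1f} seconds of footage")
```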

Traditionally, compression has mainly been used in telecommunication applications, where the emphasis is placed on a high compression ratio. Today, compression is also used in many commercial products, such as interactive HDTV and graphic arts archives, in which image quality is the main consideration. Since humans are the ultimate observers of the images, compression algorithms that take advantage of human vision models can allocate bits to the signal components that are most meaningful to the human visual system (HVS), and thus achieve better perceptual quality. Vision models are also helpful in image quality assessment, since the traditional measures of image quality, mean square error (MSE) and peak signal-to-noise ratio (PSNR), do not reflect the subjective perception of human observers.

1.2 Review of Previous Work

Subband coding has attracted a lot of attention in recent years because it provides better performance than DCT-based methods and is able to achieve full scalability. The idea behind subband coding is to decompose the signal into frequency subbands that can then be encoded either independently or jointly. The decomposition leads to energy compaction. Therefore, with careful design of the quantization, the subband coefficients can be greatly compressed, and spatial and bitstream scalability are provided naturally. Earlier work on subband coding for image compression can be found in [19][45]. Shapiro's embedded zerotree wavelet (EZW) coder [30] demonstrated that fully embedded codes can be generated using subband coding. EZW was also among the first to exploit the relationship between parent and child subbands, and it inspired several embedded subband coders. The set partitioning in hierarchical trees (SPIHT) algorithm [29] further exploits the parent-child relationship and the zerotree structure.


Vision science was first introduced to image processing in the late 1970s [15][16]. However, since knowledge of the HVS was limited at that time, the models could not interpret human perception very well. Recently, several objective quality assessment algorithms using vision models have been proposed [3][9][41]. Safranek and Johnston [28] introduced a quantization noise detection model for subband coding. A similar approach was taken by Watson for DCT-based coding [40]. A review of perceptual optimization schemes can be found in [32].

1.3 Research Goals and Directions

One of the main objectives of this research is to develop an image compression scheme that can be used for a wide range of applications, from high-end commercial applications, where no distortion can be tolerated, to Internet and wireless applications, where bandwidth is highly restricted. The compression scheme should therefore be able to achieve both perceptually lossless quality at medium compression ratios and reasonable perceptual quality at very high compression ratios. It should also provide scalability, a highly desirable property for applications that require image transmission. With these requirements in mind, wavelet coding stands out as a good choice for our scheme for several reasons: it can achieve much higher compression ratios than DCT-based methods without completely distorting the image; it can generate fully embedded codes, so that scalability can be achieved easily; and it provides efficient ways of incorporating human perceptual models into the compression scheme, because both the wavelet transform and the HVS are well localized in space and frequency. Although some studies show that wavelet coding does not perform as well as the popular JPEG scheme at low compression ratios, by incorporating a good perceptual model the wavelet coding scheme has the potential to outperform JPEG even at low compression ratios. Therefore, in our study we design a perceptual model that includes some of the well-known visual characteristics.

In some proposed perceptual compression schemes, the perceptual model used in the encoder is duplicated in the decoder to extract the weighting factor applied to each coefficient. The disadvantage of these approaches is that embedded coding cannot be achieved, since the coefficients to be decoded are interdependent, which prevents layered coding. Therefore, to incorporate the perceptual model efficiently in our compression scheme without sacrificing embedded coding, a weighting factor is generated for each subband coefficient according to the perceptual importance of that coefficient. The perceptual distortion can then be calculated, so that the quantization step of each subband is assigned to minimize the perceptual distortion instead of the mean square error. Since the codec design is asymmetric, the decoder does not need to know the perceptual model to reconstruct the image, and a general-purpose wavelet decoder without added complexity can be used. This is an advantage in applications where real-time decoding is required.

In some applications, users may be interested in a specific region of the image and would like it to maintain a certain level of quality. Therefore, an option is added to our coding scheme so that the region of interest can be encoded separately to satisfy a given perceptual quality.
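To make the idea of a perceptually weighted distortion concrete, the sketch below compares plain squared error with a weighted sum of squared quantization errors for one subband. It is only an illustration under stated assumptions, not the coder developed in Chapter 3. The weights are hypothetical numbers standing in for per-coefficient perceptual importance, the quantizer is a simple midtread uniform quantizer, and the function names are our own.

```python
import math

def quantize_midtread(c, step):
    """Uniform midtread quantizer: reconstruct c at the nearest multiple of step."""
    return step * math.floor(c / step + 0.5)

def weighted_distortion(coeffs, weights, step):
    """Sum of w_i * (c_i - Q(c_i))^2 over one subband (hypothetical weighting)."""
    return sum(w * (c - quantize_midtread(c, step)) ** 2
               for c, w in zip(coeffs, weights))

subband = [0.4, 1.3, -2.2]
uniform = [1.0, 1.0, 1.0]        # equal weights: this is plain squared error
perceptual = [2.0, 0.5, 0.5]     # hypothetical weights favoring the first coefficient

print(weighted_distortion(subband, uniform, step=1.0))
print(weighted_distortion(subband, perceptual, step=1.0))
```

The quantizer and the errors are the same in both calls. Only the weighting changes how much each error counts, so an allocator that minimizes the weighted sum will spend bits where the weights are large.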

Since accurate image quality assessment is important in designing and evaluating compression schemes, another objective of our research is to investigate the use of vision models in image quality assessment. The vision model used for quality assessment is also based on fundamental human visual characteristics, but it can be more elaborate than the one used for compression, since there are fewer constraints such as complexity, scalability, and compatibility with coding modules. We would like first to establish the validity of using a vision model to evaluate image quality, and then to determine numerical measures for assessing image quality in place of MSE and PSNR. This model will also be used to evaluate the images compressed using our perceptual wavelet coder. The visual discrimination model developed by the David Sarnoff Research Center [20] is chosen as the base model for our implementation.

1.4 Thesis Organization

This thesis is organized as follows. Chapter 2 provides some background on image compression, wavelet theory, and the human visual system. The chapter first gives an overview of image compression, then explains some of the fundamentals and principles of wavelet theory and how wavelet techniques can be applied to image compression, and finally summarizes some of the human visual properties and how they can be incorporated in compression and quality assessment. Chapter 3 introduces a perceptual wavelet scheme for image compression. A detailed description of each component is given, and the experimental results are presented. Chapter 4 presents a vision model for image quality assessment. This model is used to evaluate the images from Chapter 3, and the results are presented. Finally, Chapter 5 summarizes the results and contributions of the research and suggests some possible future directions.

Chapter 2

Background

Perceptual coding of images involves three major fields of research: digital signal processing, information theory and vision science. This chapter provides the relevant background from all three fields in the context of image compression. An overview of image compression is given in Section 2.1, followed by a review of wavelet coding in Section 2.2. Finally, Section 2.3 summarizes various human visual properties and perceptual models used in image compression and quality assessment.

2.1 Overview on Image Compression

The ultimate goal of compression is to reduce the number of bits needed to represent the signal. Data compression techniques can be divided into two categories: lossless compression and lossy compression. Lossless compression implies perfect reconstruction of the original image. It has evolved from the practical application of the theoretical work of Shannon and others on the probabilistic view of information and its representation. Lossless compression is a highly mature field, and only incremental improvements have been achieved in recent times. The limitation of lossless compression is its low achievable compression ratio. For a typical natural image, one can expect a compression ratio of about 2:1. At the very high end, lossless methods can achieve a ratio of 4:1. Since such ratios are not acceptable for most applications, researchers have looked seriously at lossy compression in recent times. Lossy compression can offer orders of magnitude greater compression than lossless compression. The goal of lossy compression is a loss that end users, judging subjectively, cannot discern.

All compression techniques rely on two features of an image to achieve reduction: redundancy and irrelevancy. Lossless compression relies only on the redundancy of the data, exploiting unequal symbol probabilities and symbol predictability. Lossy compression additionally exploits irrelevancy: a large amount of data can be eliminated this way without significant subjective loss.

2.1.1 Lossless Compression

Lossless compression is a branch of information theory, and it is part of source coding. In source coding, a given source can be coded at its entropy rate with an optimal code. Suppose a source consists of a set of symbols $x_i$, $i = 1, \ldots, N$, whose probabilities of occurrence are given by $p_i$. The entropy of this source is defined as:

$$ H = -\sum_{i=1}^{N} p_i \log_2 p_i \tag{2.1} $$

Such a source can be encoded at a rate arbitrarily close to H bits/sample with an arbitrarily small probability of error. Although this bound can only be reached for infinitely long strings of symbols, the entropy serves as a benchmark and target rate for lossless compression. A brief review of some lossless compression techniques is given in the following.
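The entropy of a source, the sum of $-p_i \log_2 p_i$ over its symbols, can be computed directly. In practice the probabilities are estimated from symbol counts. The sketch below, with function names of our own choosing, does both.

```python
import math
from collections import Counter

def entropy(probs):
    """Entropy in bits/symbol: H = -sum p_i log2 p_i (zero-probability terms contribute 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def empirical_entropy(symbols):
    """Entropy of the symbol frequencies observed in a finite sequence."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return entropy(c / total for c in counts.values())

print(entropy([0.5, 0.25, 0.25]))
print(empirical_entropy("aaaabbcd"))
```

The empirical estimate is only as good as the sample it is computed from. This is the same difficulty the Huffman discussion below raises about knowing symbol probabilities a priori.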


DPCM

Differential Pulse Code Modulation tries to remove the redundancy in data by predicting each data sample from its neighboring samples and coding the prediction error:

$$ e[n] = x[n] - \sum_{i=1}^{p} a[i]\, x[n-i] \tag{2.2} $$

where $a[i]$ are the prediction coefficients and $p$ is the predictor order. The prediction coefficients can be derived by regression analysis, or they can adapt to the data by learning; this is called Adaptive DPCM (ADPCM).
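A lossless DPCM round trip can be sketched in a few lines. This is a minimal 1-D sketch under stated assumptions. The predictor is fixed and integer-valued, so the residuals stay integers. The Python list `a` is 0-based, so `a[0]` plays the role of $a[1]$ in the prediction formula. The function names are illustrative. With non-integer coefficients one would round each prediction to an integer, identically in encoder and decoder, to keep the scheme lossless.

```python
def predict(history, a):
    """Linear prediction sum_i a[i] * x[n-1-i] from already-known samples (0-based a)."""
    return sum(coef * history[-1 - i]
               for i, coef in enumerate(a) if i < len(history))

def dpcm_encode(x, a):
    """Return the prediction errors e[n] = x[n] - prediction."""
    errors = []
    for n, sample in enumerate(x):
        errors.append(sample - predict(x[:n], a))
    return errors

def dpcm_decode(errors, a):
    """Rebuild the samples by adding each error back to the same prediction."""
    x = []
    for e in errors:
        x.append(e + predict(x, a))
    return x

samples = [100, 102, 105, 105, 104, 108]
residuals = dpcm_encode(samples, a=[1])      # previous-sample predictor
print(residuals)
print(dpcm_decode(residuals, a=[1]) == samples)
```

The residuals cluster near zero even though the samples themselves are large. This skewed distribution is what a subsequent entropy coder exploits.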

Huffman Coding

Huffman coding produces an optimal prefix code on a symbol-by-symbol basis. We first need to know the probabilities of occurrence of each symbol in the source alphabet, and order them in ascending order. Each symbol is placed at a separate leaf node of a tree. Then we merge the two nodes with the smallest probabilities into one node, with the two original nodes as its children. The parent node is assigned the sum of the probabilities of the children nodes, and the children nodes are labeled 0 and 1. This procedure is then iterated until a single node remains, which has all original leaf nodes as its descendants. The binary codeword for each leaf node can be read sequentially down from the root node, and these codewords are distinct (in fact, prefix-free) by construction. Huffman coding requires prior knowledge of the probabilities of occurrence of the various symbols, which in practice are not usually available a priori. Inferring the symbol distributions is therefore the first task of an entropy coder.
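The merge procedure above can be sketched with Python's `heapq` module. The priority queue replaces explicit re-sorting, and a counter breaks ties so that entries with equal probabilities never need to be compared further. This is a minimal sketch; the function name and the example alphabet are our own.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code for a {symbol: probability} map; returns {symbol: bitstring}."""
    if len(probs) == 1:                      # a single symbol still needs one bit
        return {sym: "0" for sym in probs}
    tiebreak = count()                       # keeps heap entries comparable on ties
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, low = heapq.heappop(heap)     # smallest probability -> branch "0"
        p1, _, high = heapq.heappop(heap)    # next smallest -> branch "1"
        # Prepending the branch label builds each codeword from the root downward.
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
average_length = sum(probs[s] * len(bits) for s, bits in code.items())
print(code, average_length)
```

Because every probability in this example is a power of 1/2, the average codeword length equals the source entropy of 1.75 bits/symbol. The arithmetic coding discussion below explains why this fails for other distributions.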

Arithmetic Coding

Huffman coding is limited to assigning an integer number of bits to each symbol. This mechanism is rate-efficient only if the symbol probabilities are exactly reciprocal powers of two. In all other cases, the system fails to achieve the entropy rate. A non-integer number of bits per symbol can be obtained by encoding long strings simultaneously, since symbols can be grouped into substrings whose probabilities approach the ideal. An arithmetic coder assigns real-valued symbol bit rates to achieve a compact encoding of the entire message directly. Subintervals of the unit interval [0, 1) are used to represent the symbols of the code. The first symbol is encoded by choosing the corresponding subinterval of [0, 1), whose length is chosen to equal the symbol's probability of occurrence. Successive symbols are encoded by expanding the selected subinterval to a unit interval and choosing a corresponding subinterval within it. At the end of the string, a single element of the designated subinterval is sent as the code, along with an end-of-message symbol. This scheme can achieve virtually the entropy rate, since a non-integer number of bits can be assigned to each symbol.
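The interval-narrowing idea can be illustrated with floating-point arithmetic. This is a toy sketch, not a practical coder. Real arithmetic coders use finite-precision integer arithmetic with renormalization. Here the decoder is told the message length instead of relying on an end-of-message symbol. The symbol ordering and function names are our own choices.

```python
import math

def cumulative(probs):
    """Map each symbol to the total probability of the symbols sorted before it."""
    cum, total = {}, 0.0
    for sym in sorted(probs):
        cum[sym] = total
        total += probs[sym]
    return cum

def arithmetic_encode(message, probs):
    """Narrow [0, 1) once per symbol; any number in [low, low + width) identifies the message."""
    cum = cumulative(probs)
    low, width = 0.0, 1.0
    for sym in message:
        low += width * cum[sym]
        width *= probs[sym]
    return low, width

def arithmetic_decode(point, length, probs):
    """Recover `length` symbols by locating `point` in successively narrower subintervals."""
    cum = cumulative(probs)
    low, width, out = 0.0, 1.0, []
    for _ in range(length):
        for sym in sorted(probs):
            start = low + width * cum[sym]
            if point < start + width * probs[sym]:
                out.append(sym)
                low, width = start, width * probs[sym]
                break
    return "".join(out)

probs = {"a": 0.5, "b": 0.25, "c": 0.25}
low, width = arithmetic_encode("aab", probs)
print(low, width, -math.log2(width))
print(arithmetic_decode(low + width / 2, 3, probs))
```

The final width is the product of the symbol probabilities, so $-\log_2$ of the width (4 bits for "aab") equals the total self-information of the message. This is how arithmetic coding spends a fractional number of bits per symbol.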

Run-Length Coding

Through an appropriate transformation, significant compression can be obtained. Some types of transformation produce long runs of a single symbol (e.g., 0), and it is very useful to code each run of 0s by a symbol representing the length of the run. This is called run-length coding, and it is employed in most compression methods with high compression ratios. Fax data is by nature well suited to this type of coding. Other types of images require some clever transformation, after which most of the data has very little energy.
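A minimal run-length coder in Python, using `itertools.groupby` to find the runs. This is a generic sketch that codes runs of every symbol as (symbol, length) pairs. Schemes such as JPEG instead pair each nonzero value with the length of the zero run preceding it, and this example does not reproduce that format.

```python
from itertools import groupby

def run_length_encode(seq):
    """Replace each run of identical symbols by a (symbol, run length) pair."""
    return [(sym, len(list(run))) for sym, run in groupby(seq)]

def run_length_decode(pairs):
    """Expand (symbol, run length) pairs back into the original sequence."""
    return [sym for sym, length in pairs for _ in range(length)]

coeffs = [37, -3, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0]
pairs = run_length_encode(coeffs)
print(pairs)
print(run_length_decode(pairs) == coeffs)
```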

2.1.2 Quantization

Quantization is essentially a mapping from a continuously parameterized set to a discrete set, and dequantization is the mapping in the reverse direction. Quantization is an irreversible process. When the elements of the continuously parameterized set are single real numbers (i.e., scalars), they can be represented by integers; this is called scalar quantization. A scalar quantizer is simply a mapping from $\mathbb{R}$ to $\mathbb{Z}$, whose purpose is to permit a finite-precision representation of the data. When the elements are points in a vector space (i.e., vectors), they can be represented by one of a fixed discrete set of vectors; this is called vector quantization. Vector quantization is the process of discretizing a vector space by partitioning it into cells and selecting a representative from each cell. It takes blocks of data and assigns a codeword to each block. Since the codebook is available to the decoder, only the index of each codeword needs to be sent. Vector quantizers demonstrate good performance in practice. However, their main drawbacks are the extensive training required to select the codebook and the slow encoding process required to search for the best codeword. Vector quantization is therefore useful in applications where encoding time is not important, such as CD applications.
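Both kinds of quantizer can be sketched in a few lines. This is a minimal illustration under stated assumptions. The scalar quantizer rounds to the nearest multiple of a fixed step. The vector quantizer uses a tiny hypothetical codebook rather than a trained one, and its brute-force nearest-codeword search is exactly the slow step described above. The function names are our own.

```python
import math

def scalar_quantize(x, step):
    """Uniform scalar quantizer: map a real number to an integer index (R -> Z)."""
    return math.floor(x / step + 0.5)      # index of the nearest multiple of step

def scalar_dequantize(index, step):
    """Reconstruct the representative value of a quantization cell."""
    return index * step

def vq_encode(vector, codebook):
    """Vector quantizer: index of the codeword nearest to `vector` (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((v - c) ** 2 for v, c in zip(vector, codebook[i])))

codebook = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]   # hypothetical, untrained codebook
index = scalar_quantize(2.6, 1.0)
print(index, scalar_dequantize(index, 1.0))
print(vq_encode((0.9, 1.2), codebook))            # only this index needs to be sent
```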

Since a discrete representation is used to represent what originally is a continuous variable, we need to measure the degree of distortion in the representation. The most common measure is the mean-squared error between the N original values $x_i$ and their dequantized representations $\hat{x}_i$:

$$ \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( x_i - \hat{x}_i \right)^2 \tag{2.3} $$

Once a quantitative measure of distortion and all constraints are given, one can formulate a quantization approach that minimizes the distortion.
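The mean-squared error, and the PSNR mentioned alongside it in Chapter 1, are easy to compute. This is a minimal sketch under stated assumptions. Images are taken as flattened, equal-length lists of 8-bit samples, so the peak value is 255. PSNR is taken in its usual form, 10 log10(peak² / MSE) dB. The quantization step of 8 is an arbitrary choice.

```python
import math

def mse(original, reconstructed):
    """Mean-squared error between two equal-length sample sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical signals."""
    error = mse(original, reconstructed)
    return math.inf if error == 0 else 10 * math.log10(peak ** 2 / error)

original = [52, 55, 61, 66, 70, 61, 64, 73]
step = 8
# Uniform quantization to the nearest multiple of `step`.
reconstructed = [step * math.floor(x / step + 0.5) for x in original]
print(mse(original, reconstructed), psnr(original, reconstructed))
```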

2.1.3 Lossy Compression

Lossy compression usually consists of three main blocks: transform, quantizer, and encoder, as shown in Figure 2.1. The transform decorrelates the image data and compacts their spectral energy. The quantizer allocates bit precision to the transform coefficients, and the encoder converts the quantized data into symbols for transmission. The well-known JPEG compression standard is described below as an illustration of lossy compression.

Figure 2.1: Block Diagram for Lossy Compression

JPEG

JPEG is a well-known standard for still image compression [26]. It is based on a widely used linear transform, the discrete cosine transform (DCT). For highly correlated signals the DCT closely approximates the Karhunen-Loève transform, and it offers a good compromise between energy compaction and computational complexity. The 2-D DCT pair of an N × N block can be expressed as follows:

$$ X(k,l) = \alpha(k)\,\alpha(l) \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} x(m,n) \cos\frac{(2m+1)k\pi}{2N} \cos\frac{(2n+1)l\pi}{2N} \tag{2.4} $$

$$ x(m,n) = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} \alpha(k)\,\alpha(l)\, X(k,l) \cos\frac{(2m+1)k\pi}{2N} \cos\frac{(2n+1)l\pi}{2N} \tag{2.5} $$

where $\alpha(0) = \sqrt{1/N}$ and $\alpha(k) = \sqrt{2/N}$ for $k \geq 1$, for $m, n, k, l = 0, 1, \ldots, N-1$.

The ISO JPEG standard is specified as follows:

• The image is divided into 8×8 blocks, and the DCT is performed on each block.

• The resulting coefficients are quantized using uniform quantization whose step sizes are specified by a predefined quantization matrix. This quantization controls the bit rate of the compression as well as the image degradation. The DC coefficient receives the highest quantization precision because it contains most of the block's energy. The AC coefficients are quantized more coarsely. The human visual system can be incorporated in this step by choosing quantization step sizes that match the contrast sensitivity function.

• The quantized coefficients are scanned in zig-zag order to form a 1-D data string, on which run-length coding is performed. Finally, the result is entropy coded using either Huffman or arithmetic coding.
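The first steps of this pipeline can be sketched in plain Python: the 8×8 block DCT, uniform quantization, and the zig-zag scan. This is an illustration under simplifying assumptions, not a JPEG implementation. A single hypothetical step size replaces JPEG's quantization matrix. The level shift, run-length coding and entropy coding are omitted. The function names are our own. `dct2` evaluates the orthonormal forward DCT with α(0) = √(1/N) and α(k) = √(2/N).

```python
import math

N = 8  # JPEG block size

def alpha(k):
    """Orthonormal DCT scale factor."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct2(block):
    """Forward 2-D DCT: X(k,l) = a(k)a(l) sum_m sum_n x(m,n) cos(...) cos(...)."""
    return [[alpha(k) * alpha(l) * sum(
                block[m][n]
                * math.cos((2 * m + 1) * k * math.pi / (2 * N))
                * math.cos((2 * n + 1) * l * math.pi / (2 * N))
                for m in range(N) for n in range(N))
             for l in range(N)]
            for k in range(N)]

def zigzag_order(n=N):
    """(row, col) visiting order of the JPEG zig-zag scan."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col indexes an anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals run bottom-left to top-right, odd ones top-right to bottom-left.
        order.extend((r, s - r) for r in (reversed(rows) if s % 2 == 0 else rows))
    return order

def quantize_and_scan(block, step=16):
    """DCT, uniform quantization with one hypothetical step size, then zig-zag to 1-D."""
    coeffs = dct2(block)
    return [round(coeffs[r][c] / step) for r, c in zigzag_order()]

flat = [[100] * N for _ in range(N)]
scanned = quantize_and_scan(flat)
print(scanned[:4], scanned.count(0))
```

For a flat block all the energy lands in the DC coefficient, and the scanned string ends in a long run of zeros. This is the situation run-length coding is designed for.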

2.2 Wavelet Transform and Image Compression

Wavelet image compression has become a focus of research over the past few years. The advantage of the wavelet transform in image processing over other transforms, such as the Fourier transform, is that it is well localized in both the space and frequency domains [6][8][22][37]. This feature allows wavelet compression to achieve better coding gain than DCT-based methods, because it leads to some dramatic simplifications in image statistics, especially for non-stationary natural images. Since the HVS is also well localized in both space and frequency, wavelet compression can incorporate its properties efficiently.

Wavelet coding belongs to the more general class of subband coding. The principle of subband coding is to hierarchically decompose an image, usually with a two-channel analysis/synthesis filter bank, until the energy is highly concentrated in the lowest-frequency subband. Tremendous energy compaction can be achieved this way, which is crucial in image compression. Among subband filters for image coding, biorthogonal wavelets play a prominent role. The search for the best filters among thousands of wavelet filters has led to the discovery of several biorthogonal filters well suited to image compression [10][21][35].
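One level of a 2-D two-channel decomposition can be sketched with the Haar filter pair. Haar is the simplest wavelet filter bank and is not one of the biorthogonal filters cited above. The sketch filters the rows and then the columns, and shows how most of the energy of a smooth image lands in the lowpass (LL) subband. Function names are our own, and the subband labels follow one common convention. Applying `haar_2d` again to the LL output would give the next octave of the decomposition.

```python
import math

def haar_1d(signal):
    """One level of the orthonormal Haar analysis bank on an even-length list."""
    s = 1 / math.sqrt(2)
    pairs = range(0, len(signal), 2)
    low = [(signal[i] + signal[i + 1]) * s for i in pairs]    # scaled sums (lowpass)
    high = [(signal[i] - signal[i + 1]) * s for i in pairs]   # scaled differences (highpass)
    return low, high

def transpose(rows):
    return [list(col) for col in zip(*rows)]

def haar_2d(image):
    """Filter every row, then every column; returns the LL, LH, HL, HH subbands."""
    row_low, row_high = [], []
    for row in image:
        low, high = haar_1d(row)
        row_low.append(low)
        row_high.append(high)

    def split_columns(block):
        lows, highs = zip(*(haar_1d(col) for col in transpose(block)))
        return transpose(lows), transpose(highs)

    ll, lh = split_columns(row_low)
    hl, hh = split_columns(row_high)
    return ll, lh, hl, hh

def energy(block):
    return sum(v * v for row in block for v in row)

image = [[r + c for c in range(4)] for r in range(4)]   # a smooth 4x4 ramp
subbands = haar_2d(image)
print([round(energy(b), 3) for b in subbands], energy(image))
```

Because the Haar filters used here are orthonormal, the four subband energies add up to the image energy. The LL subband alone carries more than 95% of it.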

This section gives a brief review of wavelet transform theory and its application to image compression. A review of wavelet filter design is also provided. Finally, two common wavelet coding techniques are described.

2.2.1 Space-Frequency Localization

While space and frequency are usually viewed as two different domains, it is often valuable to represent signals in a unified space-frequency plane. This is especially true for images, whose frequency content varies with spatial location. The wavelet transform has generated great interest in image processing because it provides space-frequency localization, allowing the simultaneous representation of an image in both space and frequency.

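Before our discussion of the space-frequency localization of the wavelet transform, let us first consider another widely used transform, the Fourier Transform. The 2-D Discrete Fourier Transform (DFT) pair for an M×N image f(x, y) is defined as:

$$F(u,v) = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)\, e^{-j2\pi(ux/M + vy/N)}$$

$$f(x,y) = \frac{1}{MN}\sum_{u=0}^{M-1}\sum_{v=0}^{N-1} F(u,v)\, e^{j2\pi(ux/M + vy/N)}$$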

While the DFT gives a frequency representation of an M×N image, it does not provide any spatial information. Conversely, the inverse transform provides a complete spatial characterization of the image, but it contains no frequency content.

The general approach to developing a simultaneous space-frequency representation of a signal is to use window functions that isolate a segment of the signal of arbitrary length, and to perform frequency analysis on that segment [22]. The entire signal can be represented by translating the window function. If the Fourier Transform is used for the frequency analysis, the Short-Time Fourier Transform of a signal f with window function g can be defined as:

$$Sf(t,\omega) = \int_{-\infty}^{\infty} f(s)\,\overline{g(s-t)}\,e^{-j\omega s}\,ds \tag{2.8}$$

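For a window function localized around the time point s = t, this process essentially computes the Fourier Transform of the signal in a small window around the point t. A basic fact about the Short-Time Fourier Transform is that it is invertible: f can be recovered from Sf(t, ω) using the following inversion formula, where ‖g‖ denotes the L² norm of the window g:

$$f(s) = \frac{1}{2\pi\|g\|^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} Sf(t,\omega)\,g(s-t)\,e^{j\omega s}\,d\omega\,dt \tag{2.9}$$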

Equation 2.9 holds as long as the window function g is any nonzero function in L²(R). It is clear that the performance of this space-frequency analysis depends on the choice of the window function. Ideally, we would like the analysis to be able to discriminate between any two frequency components and between any two pulses in the space domain. This is, however, not possible according to the following fundamental theorem [13], Heisenberg's Inequality:

$$\Delta x\,\Delta\omega \ge \frac{1}{2} \tag{2.10}$$

where Δx is the spatial resolution and Δω is the frequency resolution, measured as the standard deviations of the normalized energy densities |f(x)|²/‖f‖² and |F(ω)|²/(2π‖f‖²). The uncertainty formula in (2.10) applies to any nonzero function in L²(R), and it states that the standard deviations in the space and frequency domains cannot both be made arbitrarily small. Rather, there is a trade-off between spatial and frequency resolution. Furthermore, the equality in (2.10) holds if and only if the function is among the set of translated, modulated and dilated Gaussians:

$$f(x) = a\,e^{j\xi x}\,e^{-b(x-u)^2}, \qquad a \in \mathbb{C},\; b > 0,\; \xi, u \in \mathbb{R}$$


The result suggests that Gaussians, under translation, modulation, and dilation, form the elementary building blocks for decomposing signals. Effectively, these signals are individual packets of energy that are as concentrated as possible in space and frequency. Another attractive property of the Gaussian function is that its Fourier Transform is also a Gaussian function. Therefore, Gabor proposed the use of a Gaussian window, since it is smooth in both the space and frequency domains and offers the best compromise between spatial and frequency resolution.
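The claim that Gaussians meet the bound of (2.10) with equality can be checked numerically. The sketch below uses a hypothetical helper `rms_width` built on a plain midpoint sum. It measures Δx and Δω for a Gaussian and for a two-sided exponential. The frequency-domain energy densities use the known closed-form Fourier transforms rather than a numerical transform. The integration ranges are assumptions, chosen wide enough to capture the tails.

```python
import math


def rms_width(density, lo, hi, n=100_000):
    """Standard deviation of a zero-centred, non-negative density on [lo, hi]
    (normalised internally), via a midpoint Riemann sum."""
    h = (hi - lo) / n
    mass = second = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        p = density(x)
        mass += p
        second += x * x * p
    return math.sqrt(second / mass)


# Gaussian f(x) = exp(-x^2/2): |f|^2 = exp(-x^2), and |F(w)|^2 is proportional to exp(-w^2).
gauss_product = (rms_width(lambda x: math.exp(-x * x), -10, 10)
                 * rms_width(lambda w: math.exp(-w * w), -10, 10))

# Two-sided exponential f(x) = exp(-|x|): |f|^2 = exp(-2|x|), and F(w) = 2 / (1 + w^2).
exp_product = (rms_width(lambda x: math.exp(-2 * abs(x)), -40, 40)
               * rms_width(lambda w: 1.0 / (1 + w * w) ** 2, -1000, 1000))

print(f"Gaussian: {gauss_product:.4f}   two-sided exponential: {exp_product:.4f}")
```

The Gaussian product comes out at about 0.5, the lower bound. The exponential gives roughly 0.707; its exact value is 1/√2, slightly reduced here by truncating the frequency range.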

Although the above window method achieves a simultaneous space-frequency representation, it has some drawbacks for image compression. One of them is that Δx and Δω remain constant in the analysis. This is rather inflexible when a higher frequency resolution or spatial resolution is desired. Also, the constant frequency resolution is not compatible with models of the human visual system. Research has shown that the frequency resolution of the HVS is inversely proportional to the center frequency [S]. Therefore, it is more desirable to have high frequency resolution at low frequencies and low frequency resolution at high frequencies. Another disadvantage of the above method is that Gaussian functions do not form an orthonormal basis in L²(R). In other words, they are highly redundant, and expansion of a signal in these functions generally leads to an increase in the sampling rate, which works against compression. As will be explained in the following sections, wavelet analysis, while using the same principle as the window method, is able to resolve these issues. Figure 2.2 illustrates the two different types of space-frequency localization with Fourier bases and wavelet bases.
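The fixed-resolution behavior can be seen in a small discrete version of the Gabor (Gaussian-window) transform. The sketch below is illustrative only; the function name, the test signal, and the window width σ = 16 samples are assumptions. Because σ is the same at every frequency, both signal components are analyzed with identical space and frequency resolution. This is the rigidity that wavelet analysis relaxes.

```python
import cmath
import math


def stft_magnitude(x, t, sigma):
    """|STFT| of the sequence x at centre sample t, using a Gaussian window of
    standard deviation sigma, for DFT bins k = 0 .. len(x) // 2."""
    n = len(x)
    window = [math.exp(-((m - t) ** 2) / (2 * sigma ** 2)) for m in range(n)]
    return [abs(sum(x[m] * window[m] * cmath.exp(-2j * math.pi * k * m / n)
                    for m in range(n)))
            for k in range(n // 2 + 1)]


N = 256
# Hypothetical test signal: 10 cycles per N samples in the first half, 40 in the second.
signal = [math.cos(2 * math.pi * (10 if m < N // 2 else 40) * m / N) for m in range(N)]

peaks = {}
for t in (64, 192):                  # window centres in the two halves
    mags = stft_magnitude(signal, t, sigma=16)
    peaks[t] = max(range(len(mags)), key=mags.__getitem__)
print(peaks)                         # {64: 10, 192: 40}
```

Evaluated at the centers of the two halves, the dominant bin follows the local frequency of the signal.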

2.2.2 The Continuous Wavelet Transform and Wavelet Bases

An alternative interpretation of the Short-Time Fourier Transform in (2.8) is that the signal f(t) is decomposed onto translated and modulated versions of the window function g(x).

Figure 2.2: Space-Frequency Localization for (a) local Fourier bases, and (b) wavelet bases [33]

In analogy to the Short-Time Fourier Transform, the Continuous Wavelet Transform (CWT) is a decomposition of a function onto translated and dilated versions of some basic function ψ(x), the "mother wavelet". More specifically, each wavelet function can be expressed as:

$$\psi_{s,u}(x) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{x-u}{s}\right)$$

The parameter u is the amount of translation and the parameter s is the scale factor, which controls the size of the analysis window. When the scale factor is small, the detailed, non-stationary behavior of the signal can be captured; as the scale s increases, the impulse response of ψ_{s,u} spreads out in space and detects the global behavior of the signal. Figure 2.3 gives an example of a mother wavelet function and two of its dilated versions.

Figure 2.3: Wavelet Functions: (a) s < 1, (b) Mother Wavelet, (c) s > 1

The wavelet decomposition can be expressed as the inner product of the function f(x) with the wavelet bases in L²(R):

$$CWT(s,u) = \langle f(x), \psi_{s,u}(x) \rangle = \frac{1}{\sqrt{s}}\int_{-\infty}^{\infty} f(x)\,\psi^*\!\left(\frac{x-u}{s}\right)dx$$

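For a real wavelet ψ(x), given some mild conditions, namely that it is absolutely integrable and has zero mean, an inversion formula is available, and f(x) can be represented as:

$$f(x) = \frac{1}{C_\psi}\int_{0}^{\infty}\int_{-\infty}^{\infty} CWT(s,u)\,\frac{1}{\sqrt{s}}\,\psi\!\left(\frac{x-u}{s}\right)du\,\frac{ds}{s^2}$$

where Ψ(ω) is the Fourier transform of ψ and C_ψ = ∫₀^∞ |Ψ(ω)|²/ω dω is the admissibility constant, which must be finite.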

As evident in Equation 2.13, the wavelet transform can also be thought of as applying a bandpass filter with impulse response (1/√s)ψ*(−x/s) to the input signal f(x), evaluated at location u. This interpretation is important in the implementation of the wavelet transform, as will be discussed later.

Since the parameters s and u in the CWT are continuous, decomposition onto these functions generates a lot of redundancy. As we know from the remarkable Sampling Theorem discovered by Whittaker [43], a function bandlimited to |ω| ≤ π can be perfectly reconstructed from its values at the integer points. There is an analogous sampling theorem for the wavelet transform, in which a discrete set of wavelet functions can represent f(t). This discrete set of wavelet functions can be obtained by discretizing the parameters s and u. A useful sampling grid is the dyadic sampling grid, where the parameters s and u take on the following values:

$$s = 2^{-m}, \qquad u = n\,2^{-m}, \qquad m, n \text{ integer} \tag{2.13}$$

The wavelet functions become:

$$\psi_{m,n}(x) = 2^{m/2}\,\psi(2^m x - n)$$

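The next step in constructing the wavelet transform is finding a function ψ(x) such that the functions ψ_{m,n}(x) form an orthonormal basis. An orthonormal basis of wavelet functions was first discovered in 1910 by Haar. His early example was:

$$\psi(x) = \begin{cases} 1, & 0 \le x < 1/2 \\ -1, & 1/2 \le x < 1 \\ 0, & \text{otherwise} \end{cases}$$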

The function has zero mean and is absolutely integrable, so it is an admissible wavelet for the CWT. The Haar wavelet function has compact support in the spatial domain, but it is discontinuous and therefore not differentiable. Also, the Haar wavelet function does not have compact support in the frequency domain, and its Fourier transform decays only as 1/|ω|, which leads to poor frequency localization.
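The orthonormality and zero mean of the dyadic Haar family can be verified numerically. The sketch below assumes the indexing ψ_{m,n}(x) = 2^{m/2}ψ(2^m x − n), where larger m means a finer scale; the helper names are hypothetical. Every breakpoint of the wavelets used falls on the integration grid, so the midpoint sums reproduce the inner products up to rounding.

```python
def haar(x):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1:
        return -1.0
    return 0.0


def psi(m, n, x):
    """Dyadic wavelet psi_{m,n}(x) = 2^(m/2) * psi(2^m x - n)."""
    return 2 ** (m / 2) * haar(2 ** m * x - n)


def inner(f, g, lo=-4.0, hi=4.0, steps=8192):
    """Midpoint-rule approximation of the L2 inner product of real f and g on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) * g(lo + (k + 0.5) * h) for k in range(steps))


pairs = [((0, 0), (0, 0)), ((2, 1), (2, 1)),                     # norms
         ((0, 0), (1, 0)), ((1, 0), (1, 1)), ((0, 0), (0, 1))]   # distinct members
gram = {(a, b): inner(lambda x: psi(*a, x), lambda x: psi(*b, x)) for a, b in pairs}
mean = inner(lambda x: psi(0, 0, x), lambda x: 1.0)              # integral of psi

for key, value in gram.items():
    print(key, round(value, 12))
print("mean", round(mean, 12))
```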

It is desirable to find wavelet functions that have compact support in the spatial domain, which enables an FIR filter implementation, and whose FIR filters are regular. Regularity means that the iterated filter sequence converges to a continuous and differentiable function ψ(x) [37]. A sporadic set of wavelet bases was discovered in the mid-1980s. A fundamental theory was then developed that nicely encapsulated almost all wavelet bases and made the application of wavelets more intuitive. This theory is called Multiresolution Analysis, and it is discussed next.

2.2.3 Multiresolution Analysis and Filter Banks

Multiresolution Analysis (MRA) is a method of decomposing an image into a hierarchy of resolution levels [22]. The theory can be formulated as follows [10]:


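Given a tower of subspaces V_j ⊂ L²(R) such that

1. ⋯ ⊂ V_{−1} ⊂ V_0 ⊂ V_1 ⊂ ⋯;

2. the union of all V_j is dense in L²(R);

3. the intersection of all V_j is {0};

4. f(t) ∈ V_j if and only if f(2t) ∈ V_{j+1};

5. there exists a φ ∈ V_0 such that {φ(t − n) : n integer} is an orthonormal basis of V_0,

then there exists a function ψ(t) ∈ V_1 such that {ψ_{m,n}(t) = 2^{m/2}ψ(2^m t − n) : m, n integer} is an orthonormal basis for L²(R).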

By axioms 1 to 3 we are given a tower of subspaces of the whole function space, corresponding to the full structure of the signal space. In the idealized space L²(R), all the detailed texture of images can be represented, whereas in the intermediate spaces V_j only the details up to a fixed granularity are available. Axiom 4 states that the structure of the details is the same at each scale, and it is only the granularity that changes; this is the multiresolution aspect. Finally, axiom 5 states that the intermediate space V_0, and hence all spaces V_j, have a simple basis given by translations of a single function.

A simple example using the Haar wavelet can illustrate the above axioms. Let V_0 = {f : f restricted to [k, k+1) is constant, k integer}, with φ(t) = χ_{[0,1)}(t), the indicator function of [0, 1). It is clear that {φ(t − n) : n integer} forms an orthonormal basis for V_0. All the other subspaces are defined from V_0 by axiom 4. The spaces V_k, k → +∞, are made up of functions that are constant on smaller and smaller dyadic intervals. Let φ_{kn}(t) = 2^{k/2}φ(2^k t − n), so that {φ_{0n}} is an orthonormal basis for V_0 and {φ_{1n}} is one for V_1. Since V_0 ⊂ V_1, we must have φ ∈ V_1, so

$$\phi(t) = \sum_n h_0(n)\,\phi_{1n}(t) = \sqrt{2}\sum_n h_0(n)\,\phi(2t-n)$$

for some coefficients h_0(n). By orthonormality of the φ_{1n}, we have h_0(n) = ⟨φ_{1n}, φ⟩, and by normality of φ = φ_{00} we have Σ_n |h_0(n)|² = 1. This places a constraint on the coefficients h_0(n) in the expansion for φ, which is often called the scaling function. Like φ, the wavelet ψ will also be a linear combination of the φ_{1n}:

$$\psi(t) = \sum_n h_1(n)\,\phi_{1n}(t)$$

For the Haar wavelet, we have h_0(0), h_0(1) = 1/√2, 1/√2 and h_1(0), h_1(1) = 1/√2, −1/√2. This is easy to compute directly from the function φ_{00} = χ_{[0,1)}, since:

$$\phi(t) = \phi(2t) + \phi(2t-1) = \tfrac{1}{\sqrt{2}}\phi_{10}(t) + \tfrac{1}{\sqrt{2}}\phi_{11}(t), \qquad \psi(t) = \phi(2t) - \phi(2t-1) = \tfrac{1}{\sqrt{2}}\phi_{10}(t) - \tfrac{1}{\sqrt{2}}\phi_{11}(t)$$

Figure 2.4 shows the Haar scaling function and wavelet at level 0. Functions at other levels are rescaled versions of these functions.

Figure 2.4: The Elementary Haar Scaling Function and Wavelet

The multiresolution analysis provides an intuitive approach to image processing by dissecting an image into manageable pieces. We can view the elements of the spaces V_k, k → +∞, as providing greater and greater detail, and we can view the projections of functions onto the spaces V_k, k → −∞, as providing coarsening with reduced detail. From the axioms, we can show that the subspace V_0 ⊂ V_1 has an orthogonal complement W_0, which in turn is spanned by the wavelets {ψ_{0n} : n integer}. This gives the orthogonal sum V_1 = V_0 ⊕ W_0. Similarly, we can get:

$$V_{j+1} = V_j \oplus W_j, \qquad V_0 = V_{-J} \oplus W_{-J} \oplus \cdots \oplus W_{-1}, \qquad L^2(\mathbb{R}) = \overline{\bigoplus_{j} W_j} \tag{2.23}$$

From the point of view of signal processing, if we start with a signal f ∈ V_0, we can decompose it as f = f_{−1} + δ_{−1} ∈ V_{−1} ⊕ W_{−1}. We can view f_{−1} as a first coarse-resolution approximation of f, and the difference δ_{−1} as the first detail signal. This process can continue, deriving coarser-resolution signals and the corresponding detail signals. Because the decomposition at each step is orthogonal, the original signal can always be recovered from the final coarse-resolution signal and the sequence of detail signals. Moreover, each step is a unitary transform, which means the energy of the signal is preserved. It can be shown [Xi] that the coarse-resolution signal can be obtained by lowpass filtering with coefficients {h_0(n)} followed by downsampling by a factor of 2, and the detail signal can be obtained by highpass filtering with coefficients {h_1(n)} followed by downsampling by a factor of 2. Therefore, orthonormal wavelet bases give rise to lowpass and highpass filters, whose coefficients are given by various dot products between the dilated and translated scaling and wavelet functions. Conversely, these filters fully encapsulate the wavelet functions, which can be recovered from them by iterative procedures.
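One level of this filtering-and-downsampling scheme can be sketched with the Haar filters derived above. This is a minimal illustration that assumes an even-length signal; no boundary handling is needed for two-tap filters. The signal and the function names are hypothetical. The sketch checks the two properties stated in the text: perfect recovery of the input and preservation of energy.

```python
import math

H0 = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # Haar lowpass taps h0(0), h0(1)
H1 = [H0[1], -H0[0]]                         # h1(n) = (-1)^n h0(1 - n)


def analyze(x):
    """One analysis level for an even-length signal: the coarse and detail signals
    are inner products of x with the two-tap filters at even shifts
    (filtering plus downsampling by 2)."""
    half = len(x) // 2
    coarse = [H0[0] * x[2 * k] + H0[1] * x[2 * k + 1] for k in range(half)]
    detail = [H1[0] * x[2 * k] + H1[1] * x[2 * k + 1] for k in range(half)]
    return coarse, detail


def synthesize(coarse, detail):
    """Upsample by 2 and apply the same taps again (the transpose of analyze)."""
    x = []
    for c, d in zip(coarse, detail):
        x += [H0[0] * c + H1[0] * d, H0[1] * c + H1[1] * d]
    return x


def energy(v):
    return sum(t * t for t in v)


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]   # hypothetical test signal
coarse, detail = analyze(x)
restored = synthesize(coarse, detail)
print(energy(x), energy(coarse), energy(detail))
```

For this signal the coarse part holds about 440 of the 446 units of energy. This is an example of the energy compaction discussed in the text.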

The idea described above can be encapsulated in what is commonly called a two-channel perfect reconstruction filter bank, as shown in Figure 2.5. The filter H0 refers to the lowpass filter, and H1 refers to the highpass filter. We know from the wavelet theory above that the highpass filter is derived from the lowpass filter: h_1(n) = (−1)^n h_0(−n + 1). The reconstruction filters are precisely the same filters, but applied after time reversal. Therefore, there is only one filter to design, H0.

Figure 2.5: A Two-Channel Perfect Reconstruction Filter Bank (analysis and synthesis stages)

For typical images, the detail signal generally contains limited energy, while the approximation signal usually carries most of the energy. Since energy compaction is important in compression, this decomposition mechanism is frequently applied to the lowpass signal for a number of levels, as shown in Figure 2.6. Such a decomposition is called an octave-band or Mallat decomposition. The reconstruction process then duplicates the decomposition in reverse. Since the transformation is unitary at each step, the concatenation of unitary transformations is still unitary, and there is no loss of energy in the process. The loss of information in compression is the result of the quantization that follows the decomposition.
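The octave-band recursion can be sketched by repeatedly splitting only the approximation signal. The following is a minimal Haar-based illustration that assumes the signal length is divisible by 2^levels. The ramp input and the helper names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)


def haar_step(x):
    """One orthonormal Haar analysis step: (approximation, detail), each half as long."""
    a = [S * (x[2 * k] + x[2 * k + 1]) for k in range(len(x) // 2)]
    d = [S * (x[2 * k] - x[2 * k + 1]) for k in range(len(x) // 2)]
    return a, d


def haar_step_inverse(a, d):
    x = []
    for ak, dk in zip(a, d):
        x += [S * (ak + dk), S * (ak - dk)]
    return x


def octave_decompose(x, levels):
    """Octave-band (Mallat) decomposition: only the approximation is split again.
    Returns [a_J, d_J, d_(J-1), ..., d_1], coarsest band first.
    Assumes len(x) is divisible by 2**levels."""
    a, details = list(x), []
    for _ in range(levels):
        a, d = haar_step(a)
        details.append(d)
    return [a] + details[::-1]


def octave_reconstruct(bands):
    """Undo octave_decompose by running the inverse steps from coarse to fine."""
    a = bands[0]
    for d in bands[1:]:
        a = haar_step_inverse(a, d)
    return a


def energy(v):
    return sum(t * t for t in v)


ramp = [float(k) for k in range(16)]          # hypothetical smooth test signal
bands = octave_decompose(ramp, levels=3)
print([len(b) for b in bands], [round(energy(b), 6) for b in bands])
# [2, 2, 4, 8] [1156.0, 64.0, 16.0, 4.0]
```

Most of the energy (1156 of 1240) ends up in the two coarsest approximation coefficients, while the total is preserved.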

Figure 2.6: A 1-D Octave-Band Decomposition

To extend this one-dimensional structure to two dimensions, as needed in image processing, we simply apply it to the row vectors first and then to the column vectors. With the lowpass and highpass filters, this results in a decomposition into quadrants, corresponding to four subband channels: low-low, low-high, high-low, and high-high, as shown in Figure 2.7. Again, the low-low subband typically contains the most energy and is decomposed several times.
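The separable row-then-column procedure can be sketched for a single level, again with Haar filters for simplicity. The subband labels follow one common convention: the first letter names the filter applied along the rows, the second the filter applied along the columns. Treat this labeling as an assumption, since authors differ. The test image is hypothetical.

```python
import math

S = 1 / math.sqrt(2)


def haar_1d(v):
    """Orthonormal Haar step on an even-length list: lowpass half, then highpass half."""
    half = len(v) // 2
    low = [S * (v[2 * k] + v[2 * k + 1]) for k in range(half)]
    high = [S * (v[2 * k] - v[2 * k + 1]) for k in range(half)]
    return low + high


def haar_2d(img):
    """One separable level: filter every row, then every column.

    "HL" means highpass along the rows and lowpass along the columns."""
    rows = [haar_1d(r) for r in img]
    cols = [haar_1d(list(c)) for c in zip(*rows)]   # each column of the row-filtered image
    out = [list(r) for r in zip(*cols)]             # transpose back to row-major order
    h, w = len(img) // 2, len(img[0]) // 2

    def quad(r0, c0):
        return [row[c0:c0 + w] for row in out[r0:r0 + h]]

    return {"LL": quad(0, 0), "HL": quad(0, w), "LH": quad(h, 0), "HH": quad(h, w)}


# Hypothetical 4x4 image with a vertical edge between columns 2 and 3.
img = [[10, 10, 10, 50] for _ in range(4)]
bands = haar_2d(img)
for name, band in bands.items():
    print(name, [[round(v, 6) for v in row] for row in band])
```

For this image, LL carries the local averages (20 and 60). The vertical edge shows up only in HL (−40), and LH and HH are zero.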

Figure 2.7: A 2-D Octave-Band Decomposition

With the filter bank structure developed, the next step is to design the filter set {H0, H1, F0, F1} that satisfies the perfect reconstruction property. We will now describe a general design approach. Referring to Figure 2.5, Y0 and Y1 are the outputs after analysis filtering followed by downsampling, and X̂ is the reconstruction after upsampling and synthesis filtering. Let us first consider the effect of downsampling and upsampling a signal:

$$\text{downsampling by 2:}\quad Y(z) = \tfrac{1}{2}\left[X(z^{1/2}) + X(-z^{1/2})\right]$$

$$\text{upsampling by 2:}\quad U(z) = Y(z^{2})$$

$$\text{downsampling followed by upsampling:}\quad U(z) = \tfrac{1}{2}\left[X(z) + X(-z)\right]$$

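From the above equations, we can derive an equation for the reconstructed signal X̂:

$$\hat{X}(z) = \tfrac{1}{2}\left[F_0(z)H_0(z) + F_1(z)H_1(z)\right]X(z) + \tfrac{1}{2}\left[F_0(z)H_0(-z) + F_1(z)H_1(-z)\right]X(-z) \tag{2.29}$$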

In Equation 2.29, the term proportional to X(z) is the desired signal, and we want its coefficient to be either unity or at most a delay z^{−l}, for some integer l. The term proportional to X(−z) is called the alias term, which we want to set to zero. This leads to the following two equations:

$$F_0(z)H_0(-z) + F_1(z)H_1(-z) = 0 \tag{2.30}$$

$$F_0(z)H_0(z) + F_1(z)H_1(z) = 2z^{-l} \tag{2.31}$$


As mentioned before, the orthonormality of wavelets allows the design of only one filter, H0(z), and the other three can be derived from it:

$$H_1(z) = -z^{-1}H_0(-z^{-1}), \qquad F_0(z) = H_0(z^{-1}), \qquad F_1(z) = H_1(z^{-1})$$

To simplify things, we can let F0(z) = H1(−z) and F1(z) = −H0(−z). Then Equation 2.30 is satisfied. Now let R(z) = z^l F0(z)H0(z). Substituting into Equation 2.31, with the delay l odd, we get:

$$R(z) + R(-z) = 2$$

Note that if we write R(z) as a polynomial in z, all even powers must be zero except the constant term, which must be 1. The product filter can be expressed as:

$$R(z) = 1 + \sum_{n\ \text{odd}} r(n)\,z^{-n}$$

Once R(z) is designed, F0(z) and H0(z) can be obtained by factoring z^{−l}R(z). Finite-length two-channel filter banks that satisfy the perfect reconstruction property are called FIR perfect reconstruction Quadrature Mirror Filter (QMF) banks. Detailed accounts of these filter banks can be found in [10][36]. They play an important role in image compression.
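The design conditions (2.30) and (2.31) can be checked with exact rational arithmetic. The sketch below uses the LeGall 5/3 biorthogonal pair as an assumed example: analysis lowpass (−1, 2, 6, 2, −1)/8 and synthesis lowpass (1, 2, 1)/2. It builds the remaining two filters with the choice F0(z) = H1(−z), F1(z) = −H0(−z) described above. It then tests whether the distortion term is a pure delay. The helper names are hypothetical.

```python
from fractions import Fraction as Fr


def conv(a, b):
    """Taps of the product A(z)B(z), where A(z) = sum_n a[n] z^-n."""
    out = [Fr(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def neg_z(a):
    """Taps of A(-z): the z^-n tap is multiplied by (-1)^n."""
    return [x if n % 2 == 0 else -x for n, x in enumerate(a)]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def filter_bank_terms(h0, f0):
    """Choose H1(z) = F0(-z) and F1(z) = -H0(-z), then return the taps of the
    distortion term F0H0 + F1H1 and the alias term F0(z)H0(-z) + F1(z)H1(-z)."""
    h1 = neg_z(f0)
    f1 = [-x for x in neg_z(h0)]
    distortion = add(conv(f0, h0), conv(f1, h1))
    alias = add(conv(f0, neg_z(h0)), conv(f1, neg_z(h1)))
    return distortion, alias


def is_pure_delay(taps):
    """True if the taps equal 2 z^-l for some l (perfect reconstruction)."""
    nonzero = [n for n, x in enumerate(taps) if x != 0]
    return len(nonzero) == 1 and taps[nonzero[0]] == 2


# LeGall 5/3 pair (assumed example): analysis lowpass H0, synthesis lowpass F0.
h0 = [Fr(-1, 8), Fr(2, 8), Fr(6, 8), Fr(2, 8), Fr(-1, 8)]
f0 = [Fr(1, 2), Fr(1), Fr(1, 2)]
print([str(p) for p in conv(f0, h0)])   # ['-1/16', '0', '9/16', '1', '9/16', '0', '-1/16']
d, a = filter_bank_terms(h0, f0)
print(is_pure_delay(d), all(x == 0 for x in a))          # True True

# A naive pair with F0 = H0 = (1, 2, 1)/4 cancels aliasing but not distortion.
bad = [Fr(1, 4), Fr(1, 2), Fr(1, 4)]
d_bad, a_bad = filter_bank_terms(bad, bad)
print(is_pure_delay(d_bad), all(x == 0 for x in a_bad))  # False True
```

The 5/3 product filter (−1, 0, 9, 16, 9, 0, −1)/16 has a unit center tap and zeros at the other even offsets, so the distortion term is 2z^{−3}. The naive pair still cancels aliasing, as this choice of F0 and F1 always does, but fails the distortion condition.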

2.2.4 Common Wavelet Schemes

The performance breakthrough of modern wavelet coders is due to the exploitation of the correlation between parent and child subbands. Shapiro's embedded zerotree wavelet (EZW) coder [30] is among the first to exploit this parent-child relationship. The set partitioning in hierarchical trees (SPIHT) algorithm [29] further exploits the parent-child relationship and the zerotree structure. Both EZW and SPIHT are well known in the wavelet community, and this section provides a brief description of both coders.

Embedded Zerotree Wavelet Coder

EZW is an effective wavelet coding algorithm for low bit-rate coding because it achieves scalability and efficiency while retaining fairly low complexity. It has the property that the bits in the bit stream are generated in order of importance, yielding a fully embedded code. The embedded code represents a sequence of binary decisions that distinguish an image from the "null" image. Using an embedded coding algorithm, an encoder can terminate the encoding at any point, thereby allowing a target rate or distortion metric to be met. The decoder can also cease decoding at any point in a given bit stream and still produce the same image that would have been encoded at the bit rate corresponding to the truncated bit stream. EZW does not require training, pre-stored tables or codebooks, or any prior knowledge of the image source.

The EZW algorithm consists of a discrete wavelet transform; a zerotree structure, which provides a compact multiresolution representation of the significance map; successive approximation of significant coefficients; a prioritization protocol, which determines the order of importance by precision, magnitude, scale, and spatial location; and adaptive multilevel arithmetic coding, which performs entropy coding.

The discrete wavelet transform used in the algorithm employs octave-band decomposition. The filters used are based on 9-tap symmetric quadrature mirror filters (QMF).

One important aspect of low bit-rate coding is the use of zerotree coding. After scalar quantization followed by entropy coding, the probability of the zero symbol is extremely high at low bit rates. Typically, a large fraction of the bit budget must be spent on encoding the binary decision as to whether a coefficient has a zero or nonzero quantized value. The zerotree structure can be used to improve the compression of the significance map. It rests on the hypothesis that if a wavelet coefficient at a coarse scale is insignificant with respect to a threshold T, then all wavelet coefficients of the same orientation in the same spatial location at finer scales are also insignificant with respect to T. In a hierarchical subband system, every coefficient at a given scale can be related to a set of coefficients at the next finer scale of similar orientation. The coefficient corresponding to the same spatial location at the coarser scale is called the parent, and all coefficients corresponding to the same spatial location at the next finer scale of similar orientation are called children. Figure 2.8 illustrates the parent-child dependencies.

Figure 2.8: Parent-Child Dependencies of Subbands

The coefficients are scanned in such a way that no child node is scanned before its parent. An element of a zerotree is a zerotree root if it is not a descendant of a previously found zerotree root. The significance map can be represented as a string of symbols from a 3-symbol alphabet: zerotree root, isolated zero, and significant. Zerotrees reduce the cost of encoding the significance map by exploiting self-similarity. Zerotree-like structures can be applied to other subband configurations such as the DCT, wavelet packets, etc.
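The zerotree classification can be sketched for a small coefficient array. This is a simplified, hypothetical illustration under several assumptions. The array is square, and its LL band is the single coefficient at (0, 0). The sketch uses Shapiro's dominant-pass symbols, in which the significant symbol is split by sign into POS and NEG. Each coefficient is evaluated on its own, without the scan order and without skipping coefficients that have already been coded.

```python
def children(i, j, n):
    """Children of coefficient (i, j) in an n x n dyadic wavelet array whose LL band
    is the single coefficient (0, 0)."""
    if (i, j) == (0, 0):
        return [(0, 1), (1, 0), (1, 1)]
    if 2 * i >= n or 2 * j >= n:
        return []                                   # finest scale: no children
    return [(2 * i, 2 * j), (2 * i, 2 * j + 1), (2 * i + 1, 2 * j), (2 * i + 1, 2 * j + 1)]


def descendants(i, j, n):
    stack, out = children(i, j, n), []
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(children(*node, n))
    return out


def ezw_symbol(c, i, j, t):
    """Dominant-pass symbol of c[i][j] at threshold t."""
    v = c[i][j]
    if abs(v) >= t:
        return "POS" if v >= 0 else "NEG"
    if any(abs(c[a][b]) >= t for a, b in descendants(i, j, len(c))):
        return "IZ"     # isolated zero: insignificant, but a descendant is significant
    return "ZTR"        # zerotree root: it and all its descendants are insignificant


def initial_threshold(c):
    """Largest power of two not exceeding max |c| (assumes max |c| >= 1)."""
    m = max(abs(v) for row in c for v in row)
    t = 1
    while 2 * t <= m:
        t *= 2
    return t


coeffs = [[40, -10, 3, 2],
          [5, -18, 20, 1],
          [1, 2, 0, -1],
          [0, -3, 4, 2]]       # hypothetical 2-level wavelet coefficients

t0 = initial_threshold(coeffs)
for t in (t0, t0 // 2):
    print(t, [ezw_symbol(coeffs, i, j, t) for i, j in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# 32 ['POS', 'ZTR', 'ZTR', 'ZTR']
# 16 ['POS', 'IZ', 'ZTR', 'NEG']
```

At the lower threshold, the coefficient −10 becomes an isolated zero because one of its children (20) is significant. The coefficient 5 remains a zerotree root, so a single symbol covers its whole tree.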

After zerotree coding, successive approximation quantization is applied. The idea is to apply a sequence of thresholds, each half the previous one, to determine significance. Each time a coefficient is encoded as significant, its magnitude is appended to a list and refined in subsequent passes. This way, embedded coding can be achieved. Finally, adaptive arithmetic coding is used for entropy coding. Since there are never more than three symbols, the adaptation algorithm can learn and keep track of changing symbol probabilities very quickly.
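The effect of successive approximation on one coefficient can be sketched as follows. The sketch assumes thresholds that halve each pass, starting from T0, with every magnitude below 2T0. It mimics EZW's bookkeeping only loosely. Once a coefficient becomes significant at threshold T, its magnitude is known to lie in [T, 2T). Each pass, including that one, then contributes one refinement bit that halves the uncertainty interval. The decoder reconstructs at the interval's midpoint. The function name `refine` is hypothetical.

```python
def refine(value, t0, passes):
    """Decoder-side estimates of one coefficient after each pass.

    Pass k uses threshold t0 / 2**k. Assumes |value| < 2 * t0."""
    mag = abs(value)
    sign = 1 if value >= 0 else -1
    lo = hi = None
    t = t0
    estimates = []
    for _ in range(passes):
        if lo is None and mag >= t:       # dominant pass: coefficient becomes significant
            lo, hi = t, 2 * t
        if lo is not None:                # subordinate pass: one refinement bit
            mid = (lo + hi) / 2
            if mag >= mid:
                lo = mid
            else:
                hi = mid
        estimates.append(0 if lo is None else sign * (lo + hi) / 2)
        t /= 2
    return estimates


print(refine(45, 32, 5))    # [40.0, 44.0, 46.0, 45.0, 45.5]
print(refine(-10, 32, 4))   # [0, 0, -10.0, -11.0]
```

Truncating the bit stream after any pass still leaves the decoder with the best estimate available at that precision, which is the embedded property described above.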

Image Codec Based on Set Partitioning in Hierarchical Trees

An alternative implementation based on the principles of EZW, using set partitioning in hierarchical trees (SPIHT), was developed. It surpasses the performance of the original EZW, and its encoding and decoding are extremely fast. It employs three of the underlying principles of EZW. The first is partial ordering of the transformed image elements by magnitude, with the order transmitted by a subset partitioning algorithm that is duplicated at the decoder. The second is ordered bit-plane transmission of refinement bits. The third is exploitation of the self-similarity of the image wavelet transform across different scales. The crucial difference between this algorithm and EZW is the way subsets of coefficients are partitioned and how the significance information is conveyed. Also, arithmetic coding of the bit streams is necessary in EZW. In SPIHT, by contrast, the subset partitioning is so effective, and the significance information so compact, that binary uncoded transmission achieves about the same or better performance. The encoding algorithm can be stopped at any compressed rate or allowed to run until it achieves a nearly lossless image.

One of the main features of this algorithm is that the ordering data is not explicitly transmitted. The encoder and decoder have the same sorting algorithm, so the decoder can recover the ordering information from the encoder's execution path. The sorting algorithm divides the set of pixels into partitioning subsets and tests the maximum coefficient magnitude of each subset against a certain threshold. If the subset is insignificant, then all coefficients in the subset are insignificant; if the subset is significant, then it is partitioned into new subsets. This division continues until the magnitude test has been applied to all single-coordinate significant subsets, in order to identify each significant coefficient. A set partitioning rule using the ordering in the hierarchy defined by the subband pyramid is defined to reduce the number of magnitude comparisons. The objective is to create new partitions such that subsets expected to be insignificant contain a large number of elements and subsets expected to be significant contain only one element.
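The set-partitioning idea can be illustrated with a simplified quadtree split. This is not SPIHT's actual partitioning rule, which splits sets along the spatial orientation trees described below. It is a hypothetical sketch of the generic test-then-split principle on square blocks, and it assumes a power-of-two array size.

```python
def block_significant(c, r0, c0, size, t):
    """Set-significance test: is any |c| in the size x size block at (r0, c0) >= t?"""
    return any(abs(c[r][k]) >= t
               for r in range(r0, r0 + size) for k in range(c0, c0 + size))


def partition_search(c, t):
    """Find all coefficients with |c| >= t by recursively splitting significant
    square sets into four quadrants. Returns (sorted coordinates, number of tests).
    Assumes len(c) is a power of two."""
    found, tests = [], 0
    stack = [(0, 0, len(c))]
    while stack:
        r0, c0, size = stack.pop()
        tests += 1
        if not block_significant(c, r0, c0, size, t):
            continue                      # one answer covers every member of the set
        if size == 1:
            found.append((r0, c0))
            continue
        h = size // 2
        for dr, dc in ((h, h), (h, 0), (0, h), (0, 0)):   # pushed so the top-left pops first
            stack.append((r0 + dr, c0 + dc, h))
    return sorted(found), tests


coeffs = [[40, -10, 3, 2],
          [5, -18, 20, 1],
          [1, 2, 0, -1],
          [0, -3, 4, 2]]          # hypothetical wavelet coefficients

print(partition_search(coeffs, 16))   # ([(0, 0), (1, 1), (1, 2)], 13)
```

For this array, 13 set tests locate the three significant coefficients, compared with 16 tests for checking each coefficient individually. The saving grows when large insignificant regions are common, as they are at low bit rates.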

Similar to the zerotree, the spatial orientation tree naturally defines the spatial relationship on the hierarchical pyramid. Each node of the tree corresponds to a pixel, and its direct descendants correspond to the pixels of the same spatial orientation in the next finer level of the pyramid. The tree is defined in such a way that each node has either no offspring or four offspring, which always form a group of 2×2 adjacent pixels. This tree structure is used in partitioning subsets in the sorting algorithm.

The SPIHT algorithm uses the principles of partial ordering by magnitude, set partitioning by significance of magnitudes with respect to a sequence of octavely decreasing thresholds, ordered bit-plane transmission, and self-similarity across scales in an image wavelet transform. This algorithm realizes these principles in a matched encoder and decoder, and its performance surpasses that of the original EZW algorithm.

2.3 Human Visual System and Perceptual Model

One of the major limitations in digital image systems is the lack of a well-accepted image quality metric. Commonly used error measures such as the mean-squared error (MSE) or peak signal-to-noise ratio (PSNR) operate on a pixel-by-pixel basis. They neglect the influence of image content (such as edges, textured regions, and large luminance variations) and of the viewing conditions on the actual visibility of artifacts. While these measures are simple, they do not correlate well with perceived quality. Therefore, in many cases designers have to resort to subjective tests in order to obtain reliable ratings for the quality of compressed images. These tests are usually complex and time-consuming, and thus often impractical.
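The pixel-by-pixel nature of MSE and PSNR can be made concrete with a toy example. The sketch below uses hypothetical image values and assumes 8-bit data (peak 255). It compares a uniform brightness shift of 4 levels with a single pixel changed by 16 levels. Both give exactly the same MSE and PSNR, even though the two distortions are spatially very different and could well differ in visibility.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized images (lists of rows)."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def psnr(a, b, peak=255):
    """Peak signal-to-noise ratio in dB for images with maximum value `peak`."""
    return 10 * math.log10(peak ** 2 / mse(a, b))


ref = [[100] * 4 for _ in range(4)]          # hypothetical flat 4x4 image
shifted = [[104] * 4 for _ in range(4)]      # every pixel raised by 4
spike = [row[:] for row in ref]
spike[1][2] = 116                            # one pixel raised by 16

print(mse(ref, shifted), mse(ref, spike))    # 16.0 16.0
print(round(psnr(ref, shifted), 2), round(psnr(ref, spike), 2))
```

Both distortions score about 36.09 dB. A pixel-wise measure cannot tell them apart, which is the gap perceptual metrics aim to close.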

The missing link between the physical parameters and the subjective viewing of the end users may be established using a visual model, which allows direct psychophysical measurement as a function of physical parameters. In response to these problems, a number of objective quality assessments that incorporate perceptual factors have been proposed [12]. There is a broad range of applications for objective quality metrics, including:

• evaluating and comparing image codecs;

• quality monitoring and control;

• perceptual image compression and restoration.

Jayant [17] gives a general description of the application of perceptual quality metrics in signal processing. Ahumada [?] provides a summary of perceptual models for image quality assessment. Daly [9] reviews a number of visual factors that should be incorporated in perceptual models. An ideal metric based on models of the human visual system can achieve consistency and accuracy in quality assessment and improve visual quality in the design of compression algorithms. However, the human visual system is extremely complex and is still not well understood. As we acquire additional knowledge of visual factors, it can be expected that a model providing consistent performance over a wide range of images can be developed.

This section provides a brief review of several prominent phenomena of the human visual system and how they may be incorporated in perceptual models. It outlines the advantages and limitations of a number of quality metrics. The section concludes with the validation and evaluation of perceptual quality metrics.

2.3.1 Quality Factors

Viewing conditions and image content play an important role in quality assessment. Research has shown that image quality depends on viewing distance, display size, resolution, brightness, contrast, sharpness, colorfulness, and so on [3][44]. It is often useful to relate these factors in visual modeling. For instance, the viewing distance is usually specified in terms of display size. One reason for doing this is the assumption that the ratio of preferred viewing distance to screen height is constant. Recent experiments show, however, that this assumption is only true for small displays, where the preferred viewing distance is around 6 to 7 screen heights. With increasing display size, the preferred viewing distance approaches 3 to 4 screen heights [1]. Display resolution is another important quality factor. In vision modeling, the size and resolution of the image projected onto the retina are more meaningful measures. Given a viewing distance d in inches and a display resolution r in pixels/inch, the effective display visual resolution (DVR) v in pixels/degree of visual angle is

$$v = d \cdot r \cdot \tan(1^\circ) \tag{2.37}$$

The maximum spatial frequency corresponds to the perceptual Nyquist frequency, which is half the display visual resolution: f_max = v/2 cycles/degree. Some illustrative examples are given in Table 2.1.
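The DVR and Nyquist relations can be evaluated numerically. The sketch assumes the DVR takes the form v = d·r·tan(1°), which reproduces the printing entries of Table 2.1. The function names are hypothetical.

```python
import math


def display_visual_resolution(distance_in, pixels_per_in):
    """DVR in pixels per degree of visual angle, assuming v = d * r * tan(1 degree)."""
    return distance_in * pixels_per_in * math.tan(math.radians(1.0))


def nyquist_cpd(dvr):
    """Perceptual Nyquist frequency in cycles per degree: half the DVR."""
    return dvr / 2


for name, r in [("low-quality printing", 300), ("high-quality printing", 1200)]:
    v = display_visual_resolution(12, r)
    print(f"{name}: {v:.1f} pixels/degree, Nyquist {nyquist_cpd(v):.1f} cycles/degree")
```

At a 12-inch viewing distance this gives 62.8 and 251.4 pixels/degree, matching the table. The corresponding Nyquist limits are about 31.4 and 125.7 cycles/degree.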

Table 2.1: Examples of Visual Resolution for Various Displays [41]

Display                  Resolution (pixels/inch)   Distance (inches)   DVR (pixels/degree)
Computer Display         72                         12                  15.1
HDTV                     –                          –                   60.3
Low Quality Printing     300                        12                  62.8
High Quality Printing    1200                       12                  251.4

The optics of the eye constitute the first processing stage in the human visual system. Although the optical characteristics of individual eyes vary considerably, they are correlated in such a way that healthy eyes can produce a sharp image of a distant object on the retina. The retinal image is a distorted version of the input, and the most noticeable distortion is blurring. Due to a variety of physical and geometrical optical factors, a point object gives rise to a retinal distribution that is bell-shaped in cross section. It is called the point spread function. There are a number of visual factors that contribute to the spreading of light. For small pupil diameters up to 3 to 4 mm, the point spread function approaches the diffraction limit, which is given by:

$$PSF(\rho) = \left[\frac{2J_1(\pi d\rho/\lambda)}{\pi d\rho/\lambda}\right]^2$$

where d is the pupil diameter, λ is the wavelength of light, ρ is the visual angle in radians, and J_1 is the first-order Bessel function of the first kind. As the pupil diameter increases, the width of the point spread function also increases, because the distortion due to cornea and lens imperfections becomes large compared to diffraction effects. The current best estimate of the foveal point spread function of the human eye was proposed by Westheimer [42]:

where ρ is the distance in minutes of arc from the center of the image. This function is illustrated in Figure 2.9, and it applies to standard viewing conditions of white targets with a pupil diameter in the vicinity of 3 mm.


Figure 2.9: Point Spread Function of the Human Eye[42]

2.3.3 Color Space

After the image is projected onto the retina, the photoreceptors sample the image and convert it to signals interpretable by the brain. There are two different types of photoreceptors: rods and cones. Rods are responsible for vision at low light levels, and they can be neglected for the applications considered here. Cones are responsible for vision at higher light levels. There are three types of cones, L-cones, M-cones, and S-cones, which are sensitive to long, medium, and short wavelengths, respectively (see Figure 2.10). They form the basis of color perception.

Various color spaces have been developed for different purposes. The common red, green, and blue (RGB) color model is used in color CRT monitors and color raster graphics, and it employs a Cartesian coordinate system.

The hue, saturation, and value (HSV) color model is user oriented, being based on the intuitive appeal of the artist's tint, shade, and tone. The coordinate system is cylindrical, and the subset of the space within which the model is defined is a hexcone.


Figure 2.10: Sensitivity of the Three Types of Cone

The opponent color theory states that the sensations of red and green, as well as blue and yellow, are encoded in separate visual pathways [44]. The principal components of opponent-color space are black-white (B-W), red-green (R-G), and blue-yellow (B-Y). The B-W channel, which encodes luminance, is determined mainly by medium to long wavelengths. The R-G channel discriminates between medium and long wavelengths, while the B-Y channel discriminates between short and medium wavelengths. Opponent-color spaces have an advantage in psychophysical experiments based on opponent-color stimuli, because their channels can adapt to these stimuli, which facilitates model design and analysis.

The perceptually uniform color spaces CIE L*u*v* and CIE L*a*b* have also been proposed for vision models. They are defined such that the Euclidean distance between color coordinates in these spaces provides an approximation to the perceived difference. This can be advantageous in vision modeling, since such models try to determine the amount of perceived difference between reference and test images.

The Y'CbCr color space is used in many standards, including PAL, NTSC, JPEG, and MPEG. It takes into account certain properties of the human visual system: the opponent color theory, the fact that humans are less sensitive to color than to luminance, and the nonlinearity of the human visual system. It happens that conventional CRT displays also have a nonlinear relationship between the signal voltage v and the display intensity I:

$$I = v^{\gamma} \qquad (2.40)$$

Applying the inverse of this function is referred to as gamma correction. Coincidentally, the lightness sensitivity of human vision is close to the inverse of function (2.40). Therefore, coding images in the gamma-corrected domain is not only more meaningful perceptually but also compensates for CRT nonlinearities. Y'CbCr operates in the gamma-corrected domain, where Y' is luminance, Cb is the difference between the blue primary and luminance, and Cr is the difference between the red primary and luminance. The conversion formula between Y'CbCr and the standard CIE 1931 XYZ can be found in Appendix A.
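To make the luminance/color-difference split concrete, here is a minimal sketch that gamma-corrects linear intensities and converts the resulting R'G'B' triple to Y'CbCr. The BT.601 luma weights (0.299, 0.587, 0.114), the full-range scaling of Cb and Cr, and the display gamma of 2.2 are assumptions made for this example. The thesis's own conversion (via CIE XYZ) is the one given in Appendix A.

```python
def gamma_correct(intensity: float, gamma: float = 2.2) -> float:
    """Invert the CRT power law I = v**gamma (gamma = 2.2 is an assumed value)."""
    return intensity ** (1.0 / gamma)


def rgb_to_ycbcr(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Map gamma-corrected R'G'B' in [0, 1] to Y'CbCr using BT.601 weights (assumed).

    Y' lies in [0, 1]; Cb and Cr are scaled color differences in [-0.5, 0.5].
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) / 1.772  # blue primary minus luma, normalized
    cr = (r - y) / 1.402  # red primary minus luma, normalized
    return y, cb, cr


# Linear intensities are gamma-corrected first, then converted.
linear_rgb = (0.2, 0.5, 0.9)
y, cb, cr = rgb_to_ycbcr(*(gamma_correct(c) for c in linear_rgb))
print(f"Y'={y:.3f} Cb={cb:.3f} Cr={cr:.3f}")
```

Because the luma weights sum to one, any gray input (R' = G' = B') yields Cb = Cr = 0, so all chromatic information is carried by the two difference channels.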

2.3.4 Contrast Sensitivity

Contrast is a measure of the relative variation of luminance. It is an important concept in human vision because we perceive light in terms of contrast rather than absolute luminance level. No single definition of contrast is suitable for all stimuli. For periodic patterns of symmetrical deviations ranging from $L_{min}$ to $L_{max}$, the Michelson contrast is often used:

$$C_M = \frac{L_{max} - L_{min}}{L_{max} + L_{min}}$$

For a pattern with a single increment or decrement $\Delta L$ on a uniform background luminance $L$, the Weber contrast is often used:

$$C_W = \frac{\Delta L}{L}$$
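The Michelson and Weber definitions translate directly into code. The luminance values in the usage lines are arbitrary illustrative numbers.

```python
def michelson_contrast(l_max: float, l_min: float) -> float:
    """Contrast of a periodic pattern swinging between l_min and l_max."""
    return (l_max - l_min) / (l_max + l_min)


def weber_contrast(delta_l: float, background: float) -> float:
    """Contrast of a single increment/decrement delta_l on a uniform background."""
    return delta_l / background


# A grating between 40 and 60 cd/m^2, and a 5 cd/m^2 spot on a 50 cd/m^2 field.
print(michelson_contrast(60.0, 40.0))  # 0.2
print(weber_contrast(5.0, 50.0))       # 0.1
```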

Neither of these two definitions is appropriate for measuring contrast in complex images. Peli [25] proposed a local band-limited contrast for complex images, where the image is decomposed into a pyramid of lowpass and bandpass subbands, and a contrast value is assigned to every point in the image as a function of the spatial frequency band:

$$c_i(x,y) = \frac{BP_i(x,y)}{LP_i(x,y)}$$

where $BP_i(x,y)$ is the bandpass image of band i, and $LP_i(x,y)$ contains the energy below band i. This definition is in good agreement with psychophysical contrast-masking experiments with Gabor patches [25]. Notice also that subband coding for compression bears a resemblance to the structure used here, which means that contrast values under this definition can be obtained easily within the subband coding scheme.
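A one-dimensional, undecimated sketch of the band-limited contrast follows. It assumes a simple [1, 2, 1]/4 smoothing filter with edge replication to split each level into a bandpass part and the lowpass remainder; the filters used in [25] differ, and a real implementation would work in 2-D.

```python
def smooth(signal: list[float]) -> list[float]:
    """[1, 2, 1]/4 lowpass filter with edge replication (an assumed filter choice)."""
    n = len(signal)
    return [
        (signal[max(i - 1, 0)] + 2 * signal[i] + signal[min(i + 1, n - 1)]) / 4
        for i in range(n)
    ]


def band_limited_contrast(signal: list[float], levels: int) -> list[list[float]]:
    """Peli-style contrast c_i = BP_i / LP_i for an undecimated 1-D pyramid."""
    contrasts = []
    current = list(signal)
    for _ in range(levels):
        lower = smooth(current)                          # LP_i: energy below band i
        band = [a - b for a, b in zip(current, lower)]   # BP_i: detail in band i
        contrasts.append([bp / lp if lp != 0 else 0.0 for bp, lp in zip(band, lower)])
        current = lower
    return contrasts


luminance = [100.0, 100.0, 100.0, 200.0, 100.0, 100.0, 100.0]
c = band_limited_contrast(luminance, levels=2)
print([round(v, 3) for v in c[0]])
```

Note that the contrast is normalized by the local lowpass luminance rather than a global mean, which is what makes the measure local.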

The minimum contrast necessary for an observer to detect a difference is defined as the contrast threshold, and contrast sensitivity is the inverse of the contrast threshold. Contrast sensitivity functions (CSFs) quantify the dependence of contrast sensitivity on the frequency of the stimuli. There are a number of estimates of the contrast sensitivity function in the literature [1][9][18]. The shape of the CSF curve can be altered greatly by various stimulus configurations. Generally, the CSF is assumed to be a bandpass function. Achromatic contrast sensitivity is generally higher than chromatic sensitivity, especially at high frequencies. The full range of color is perceived only at low frequencies. As frequency increases, blue-yellow sensitivity declines first, then red-green sensitivity begins to diminish, and perception becomes achromatic.

2.3.5 Masking

Masking refers to the phenomenon whereby the visibility of a signal is reduced by the presence of another signal. In the context of image compression, it is usually helpful to regard the distortion as being masked by the original image, which acts as the background. In vision models, two types of masking are distinguished: intra-channel masking and inter-channel masking. Intra-channel masking models the masking that occurs between stimuli located in the same frequency channel. It usually includes two types of masking: luminance masking and texture masking.

The ability of the human eye to detect the magnitude difference between an object and its background depends on the average background luminance. According to Weber's law [23], if the luminance of a test stimulus is just noticeable against the surrounding luminance, the ratio of the just-noticeable luminance difference to the stimulus luminance is almost constant. However, due to the ambient illumination on the display, noise in dark areas tends to be less perceptible than noise in regions of high luminance. In general, visibility thresholds are lowest, and distortions therefore most visible, in regions whose gray levels are close to mid-gray. A psychophysical experiment conducted by Safranek [28] yields a brightness adjustment curve, as shown in Figure 2.11.

Texture masking refers to the reduction in visibility of stimuli due to increased spatial nonuniformity of the background luminance. In many vision models, visibility thresholds are defined as functions of the amplitude of the luminance edge on which the perturbation is superimposed. Simple image structures such as edges or curves provide only a small degree of masking compared to texture regions, because the observer typically has prior knowledge of what those simple patterns look like. However, the visibility threshold in a texture region may decrease as the observer becomes familiar with the image.

Most vision models are limited to intra-channel masking. However, recent psychophysical experiments suggest that masking also occurs between channels of different orientation [11]. Therefore, inter-channel masking should be taken into account in vision modeling as well.

Figure 2.11: Brightness Adjustment Curve

Care must be taken in incorporating masking into perceptual coding, since the masking models obtained through experiments are highly dependent on the masker and the target stimulus. The masking threshold will vary with the stimuli's bandwidth, phase, and orientation, as well as with the observer's familiarity with the stimuli. Incorrect predictions of masking can be the primary cause of failures in perceptual modeling.

2.3.6 Multi-resolution Structure

The neurons in the primary visual cortex act as oriented bandpass filters: each responds to a certain range of spatial frequencies and orientations about its center values. For the achromatic visual pathway, the spatial frequency bandwidth is estimated to be approximately 1 to 2 octaves and the orientation bandwidth about 20 to 60 degrees. The chromatic pathways are assumed to have a similar spatial frequency bandwidth, but their orientation bandwidths are significantly larger, ranging from 60 to 130 degrees [44]. Given these bandwidths, the spatial frequency plane for the achromatic channel can be covered by 4-6 spatial-frequency-selective and 4-8 orientation-selective mechanisms. For the chromatic channels, 2-3 orientation-selective mechanisms are sufficient.

The fundamental requirement in incorporating the above visual characteristics into vision models is joint localization in space, spatial frequency, and orientation. A pyramid structure with self-similar filters and dyadic subsampling is therefore appealing, and it has been adopted in many vision models. Again, we see that the wavelet transform offers a natural pyramid structure for dealing with the multi-resolution characteristics of the visual system.

2.3.7 Error Summation

In many applications it is necessary to use a single number to indicate image quality. There is thus a need to integrate the 3-D distortion maps for the various channels and convert them into a scalar. It is believed that the brain integrates information across channels according to rules of probability or vector summation [44]. Minkowski summation provides a good estimate for probability summation:

$$E = \left( \sum_{n} |e(n)|^{\beta} \right)^{1/\beta}$$

where e(n) is the perceptual error at location n, and the exponent β determines the slope of the psychometric function near threshold. Different exponents β have been found to yield good results in different experiments. β = 2 corresponds to the ideal observer under independent Gaussian noise, which assumes that the observer has complete knowledge of the stimuli. Higher exponents (β = 4 to 5) are used based on the intuition that a few large distortions tend to draw the viewer's attention more than many small ones.
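The effect of the exponent can be checked numerically with a minimal Minkowski pooling function. The two error maps below are made-up examples: sixteen errors of unit size versus a single error of size 4.

```python
def minkowski_pool(errors: list[float], beta: float) -> float:
    """Pool perceptual errors e(n) into one number: (sum |e(n)|^beta)^(1/beta)."""
    return sum(abs(e) ** beta for e in errors) ** (1.0 / beta)


many_small = [1.0] * 16  # sixteen barely visible errors
one_large = [4.0]        # a single conspicuous error
for beta in (2.0, 4.0):
    print(beta, minkowski_pool(many_small, beta), minkowski_pool(one_large, beta))
```

With β = 2 both maps pool to 4; with β = 4 the many small errors pool to only 2 while the single large error still pools to 4, which is exactly the "a few large distortions dominate" behavior described above.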

Alternatively, the distortion can be computed locally for every pixel, yielding a perceptual distortion map for better visualization of the distribution of distortions. Such distortion maps can help the designer identify problems in the encoder.

2.3.8 Psychovisual Validation

Once a perceptual model is developed, it must be validated by subjective tests. However, the accuracy and robustness of the validation are highly dependent on the psychovisual experiments used.

One simple approach is to use a rating scale to evaluate a set of images, as suggested in CCIR Recommendation 500-3. The observers rate each image on a scale from 1 to 5, indicating bad, poor, fair, good, or excellent. The scores are then analyzed with statistical techniques. The limitation of this approach is that it can only differentiate images with relatively large differences in quality. It can also be inconsistent across different types of artifacts, such as ringing and blocking artifacts. Modifications to this rating approach have been made, such as rating sub-regions of the image instead of the whole image.
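The text does not say which statistical techniques are applied to the ratings. A common minimal analysis, assumed here, is the mean opinion score (MOS) with a normal-approximation 95% confidence interval; the ratings in the usage line are hypothetical.

```python
import math
from statistics import mean, stdev


def mean_opinion_score(ratings: list[int]) -> tuple[float, float]:
    """Return the mean rating and a normal-approximation 95% confidence half-width.

    Ratings are on the 1 (bad) to 5 (excellent) scale; at least two are required.
    """
    if len(ratings) < 2 or not all(1 <= r <= 5 for r in ratings):
        raise ValueError("need at least two ratings on the 1-5 scale")
    mos = float(mean(ratings))
    half_width = 1.96 * stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


print(mean_opinion_score([5, 4, 4, 3, 4]))
```

Two images whose intervals overlap heavily cannot be distinguished by the test, which illustrates why the rating approach only separates images with relatively large quality differences.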

A pair comparison approach is often used for compression tests. A set of images compressed with different methods or at different ratios is compared with one another. CCIR Recommendation 500-3 recommends a scale from -3 to 3, corresponding to much worse, worse, slightly worse, the same, slightly better, better, and much better. The advantage of this approach is that images with different types of artifacts can be compared.

In high bit-rate applications, it is often important to measure the just noticeable distortion (JND) point. One technique for evaluating the ability of a perceptual model to predict the JND point is as follows. Both the original and the compressed images are displayed to the observers alternately for a short period of time (i.e., 1 second), and the observers then decide which is the original image and which is the compressed one. The JND point is typically defined as the compression point at which the observer correctly identifies the compressed image 75% of the time. The display time and the observers' familiarity with the artifacts in the images play a major role in JND experiments. To reduce inconsistency, these experiments should be performed under consistent viewing conditions, i.e., with a fixed display time, a limited number of times each image is displayed, and observers familiar with the type of artifacts introduced by the particular compression process.
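Given the proportion of correct identifications measured at several compression ratios, the 75% point can be located by interpolation. This sketch uses plain linear interpolation between measured points (fitting a psychometric function would be more robust), and the function name and data are hypothetical. Chance performance in this two-alternative task is 50%.

```python
def jnd_point(ratios: list[float], proportion_correct: list[float], target: float = 0.75):
    """Interpolate the compression ratio at which observers reach `target` correct.

    Assumes proportion correct grows with compression ratio. Returns None if the
    target is never reached in the measured range.
    """
    points = sorted(zip(ratios, proportion_correct))
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        if p0 == target:
            return r0
        if p0 < target <= p1:
            return r0 + (target - p0) * (r1 - r0) / (p1 - p0)
    if points and points[-1][1] == target:
        return points[-1][0]
    return None


# Hypothetical results: proportion of trials in which the compressed image was spotted.
print(jnd_point([10, 20, 30, 40], [0.5, 0.6, 0.8, 0.95]))
```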

Chapter 3

Perceptual Image Codec

The block diagram of the perceptual image codec is shown in Figure 3.1. The decoder is the same as in the general transform coding scheme and is fairly straightforward. The encoder is able to compress the image at a specified bit rate or perceptual quality, and the use of multi-rate quantization and entropy coding allows the encoder to generate an embedded code, so that both the encoder and the decoder can terminate at any point and still be able to reconstruct the image at a lower bit rate. In some applications a particular region is more important than the rest of the image, and it might be desirable to control its quality by encoding it at a predefined perceptual quality. Therefore, the encoder includes a region of interest (ROI) request that can be set to encode the ROI at a specified perceptual quality; the rest of the image is encoded with the remaining bit budget. The encoder contains a perceptual model that produces a weighting factor $W_{JND}$ for each subband coefficient. This factor indicates the importance of each coefficient in contributing to the visibility of the image. When the ROI request is set, the perceptual model also outputs the quantization step for the ROI with the given perceptual quality. The Y'CbCr color space is selected for the codec because it is widely used and takes into account some properties of the HVS. Each component of the encoder is addressed below, and the experimental results are given in the last section.

Figure 3.1: Block Diagram of the Perceptual Image Codec: (a) encoder; decoder stages: decoding, inverse quantization, inverse transformation, decoded image

3.1 Overview of the Algorithm

The encoder consists of the following major functions:

• Transform performs the wavelet decomposition: it takes the original image and outputs the wavelet coefficients for each subband; the parameters include the number of decomposition levels and the filter coefficients.

• HVSmodel uses the perceptual model to calculate the weighting factor for each coefficient: it takes in the wavelet coefficients and the original image and outputs the weighting factors, as well as the quantization step size for the ROI if that option is set; the parameters include viewing distance, display resolution, and psychovisual parameters.

• Quantize performs the quantization of the wavelet coefficients: the quantization step sizes for each subband are determined by minimizing the perceptual error using the weighting factors from HVSmodel; it takes in the wavelet coefficients and the weighting factors and outputs the quantized coefficients; the parameters include the smallest quantization step size, the maximum number of quantizers, and the precision of the quantized coefficients.

• Coder performs the entropy coding using adaptive arithmetic coding: it takes in the quantized coefficients and outputs the bitstream; the parameters include histogram capacity and adaptive model.

The decoder reverses the process; it consists of Decode, Dequantize, and InverseTransform. The perceptual model is not needed in the decoder.

3.2 Wavelet Transform

The wavelet transform in the perceptual codec employs an octave-band decomposition using a two-channel perfect-reconstruction analysis/synthesis filter bank, as shown in Figure 2.6. Let the level, l, denote the number of filter stages, and let the orientation, o, denote the four possible combinations of lowpass and highpass filters. The orientation is indexed as follows: {0, 1, 2, 3} = {LL, HL, LH, HH}. Each combination of level and orientation (l, o) specifies a single band. Figure 3.2 illustrates this terminology using a three-level decomposition.

Figure 3.2: Subband Indexing (l, o)
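The (l, o) convention can be made concrete by enumerating the bands of an L-level decomposition. The helper names are hypothetical; the sketch only assumes the indexing of this section and dyadic subsampling, under which only the coarsest level retains an LL band.

```python
def subband_indices(levels: int) -> list[tuple[int, int]]:
    """List the bands (l, o) of an octave-band decomposition, finest level first.

    Orientation o: 0 = LL, 1 = HL, 2 = LH, 3 = HH.
    """
    bands = []
    for level in range(1, levels + 1):
        bands.extend((level, o) for o in (1, 2, 3))
    bands.append((levels, 0))  # the single remaining lowpass band
    return bands


def subband_shape(height: int, width: int, level: int) -> tuple[int, int]:
    """Size of any band at `level`, assuming dimensions divisible by 2**level."""
    return height >> level, width >> level


print(subband_indices(3))
print(len(subband_indices(5)), subband_shape(512, 512, 5))
```

An L-level decomposition therefore has 3L + 1 bands; with the five levels chosen below and 512×512 images, the coarsest bands are 16×16.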

Two factors need to be determined in the transform stage: the number of decomposition levels and the wavelet filter set. There are no rules for selecting the number of decomposition levels; it usually ranges from 3 to 6. After conducting some experiments, we found that five decomposition levels are suitable for our purpose. Some of the best-known filters for image coding are included in our codec: the set of filters evaluated by Villasenor [38], a linear-phase 9/7 pair from Odegard [24], and a few Daubechies filters. Table 3.1 gives the coefficients of the filters used in our experiments in addition to the filters found in [38]. The performance difference between these filters is mainly image dependent. It is possible to add a filter selection stage by applying all the different filters to the image and then selecting the filter that produces the fewest significant coefficients, which often results in the highest compression ratio, as suggested in [21].

Table 3.1: Filter Coefficients (all coefficients start at index zero; for biorthogonal filters, the first row is the analysis filter and the second row is the synthesis filter; all filters are symmetric about zero)

Filter      Coefficients
Antonini    0.85269, 0.37740, -0.11062, -0.02355, -0.03783
Villa1      0.37528, -0.02385, -0.11062, 0.3774, 0.8327
Odegard     0.658848, 0.415092, -0.04069, -0.06454
            [illegible], 0.38697, -0.09307, -0.03343, 0.05237

3.3 Perceptual Model

The perceptual model used in our codec takes into account contrast sensitivity in the different frequency subbands, local background luminance and texture, and contrast masking. It produces a JND threshold for each subband coefficient, $W_{JND}(x, y, l, o)$, where (l, o) specifies the subband and (x, y) is the spatial location of the coefficient. This threshold can then be used to calculate the perceptual error for each coefficient.

3.3.1 Contrast Threshold Function

The contrast sensitivity threshold model used here is based on Watson's model [41]. This intensity-based model accounts for display resolution, viewing distance, and the level and orientation of the subband.

A set of psychovisual experiments performed in [41] indicated the following:

• Contrast sensitivity declines with increasing spatial frequency.

• The size of the noise stimuli decreases with increasing spatial frequency.

• The noise amplitudes are typically very close to the basis function amplitudes.

We can see that spatial frequency is the dominant factor in the contrast threshold function. The spatial frequency can be obtained from the effective display visual resolution, as given in Equation 2.37. The discrete wavelet transform essentially operates by bisecting a frequency band at each level. At the first level, the spatial frequency is taken as the Nyquist frequency of the display resolution (i.e., r/2), and it is halved at each subsequent level. Therefore, for a display resolution of r pixels/degree, the spatial frequency at level l is:

$$f(l) = r \cdot 2^{-l} \ \text{cycles/degree} \qquad (3.1)$$

The noise threshold for the luminance component is estimated as:

$$\log Y(l,o) = \log a + k \left( \log f(l) - \log g_o f_0 \right)^2 \qquad (3.2)$$

where a = 0.495, k = 0.466, $f_0$ = 0.401, $g_0$ = 1.501, $g_1 = g_2$ = 1, and $g_3$ = 0.534. The term a defines the minimum threshold. The term $g_o$ takes into account the effect of orientation on spatial frequency. Orientation o = 0 is approximately a factor of two lower in spatial frequency than orientations o = 1 and 2. However, since the signal energy of orientation 0 is spread over all orientations, it is less visually efficient than energy concentrated in a narrow range, as in orientations 1 and 2. Thus $g_0$ is set to be less than 2. For orientation 3, the spatial frequency is about √2 above that of orientations 1 and 2 due to the Cartesian splitting of the spectrum. But since the spectrum in orientation 3 is distributed over two orthogonal orientations (45° and 135°), the parameter $g_3$ is again smaller than the purely geometric value of 1/√2. For the chrominance channels Cb and Cr, the effects of spatial frequency and orientation are similar to those of the Y channel. However, their thresholds are generally higher, by a factor of two for Cr and a factor of four for Cb. Figure 3.3 shows the thresholds obtained through experiments for all three color components at different orientations and spatial frequencies.

Figure 3.3: Estimated Thresholds for Y (bottom), Cr (middle), and Cb (top) versus spatial frequency (log cycles/degree) [41]

Since the noise amplitude resulting from uniform quantization is approximately the basis function amplitude, the contrast threshold function can be computed as:

$$T(l,o) = \frac{Y(l,o)}{A(l,o)} \qquad (3.3)$$

where A(l, o) is the basis function amplitude for the popular Antonini 9/7 DWT, as given in Table 3.2. The basis function amplitudes for other wavelet bases can be obtained through experiments.

Table 3.2: Basis Function Magnitudes for 6 Levels of an Antonini 9/7 DWT (rows: orientation; columns: levels 1 to 6)
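Equation (3.2), the level-to-frequency rule, and the chrominance factors can be combined into a small threshold calculator. This is a sketch under stated assumptions: logarithms are taken base 10 (as in Watson's formulation), the Cr and Cb factors are applied as plain multipliers, and the coefficient-domain threshold is computed as T = Y / A with the basis amplitude A supplied by the caller from Table 3.2.

```python
import math

# Parameters of Equation (3.2), as listed in the text.
A_MIN = 0.495   # a: minimum threshold
K_CURV = 0.466  # k: curvature of the log-parabola
F0 = 0.401      # f_0 in cycles/degree
G = {0: 1.501, 1: 1.0, 2: 1.0, 3: 0.534}  # g_o for o = LL, HL, LH, HH
# Approximate chrominance scaling from the text (Cr about 2x, Cb about 4x the Y threshold).
COMPONENT_FACTOR = {"Y": 1.0, "Cr": 2.0, "Cb": 4.0}


def spatial_frequency(r: float, level: int) -> float:
    """f(l) = r * 2**-l cycles/degree for a display resolution of r pixels/degree."""
    return r * 2.0 ** (-level)


def noise_threshold(r: float, level: int, orientation: int, component: str = "Y") -> float:
    """Y(l, o) from the log-parabola of Equation (3.2), using base-10 logs (assumed)."""
    f = spatial_frequency(r, level)
    log_y = math.log10(A_MIN) + K_CURV * (math.log10(f) - math.log10(G[orientation] * F0)) ** 2
    return COMPONENT_FACTOR[component] * 10.0 ** log_y


def contrast_threshold(r: float, level: int, orientation: int,
                       basis_amplitude: float, component: str = "Y") -> float:
    """Threshold on a DWT coefficient, assuming T = Y / A with A taken from Table 3.2."""
    return noise_threshold(r, level, orientation, component) / basis_amplitude


for level in range(1, 6):
    print(level, round(spatial_frequency(32.0, level), 2), round(noise_threshold(32.0, level, 1), 3))
```

At 32 pixels/degree the finest level (16 cycles/degree) gets a much higher threshold than the coarse levels, which is what lets the codec quantize the finest subbands more coarsely.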

3.3.2 Luminance and Texture Masking

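The visibility threshold due to average background luminance and texture masking can be described by the following expression [7]:

$$\mathrm{JND}(x,y) = \max\left\{ f_t\big(bg(x,y), mg(x,y)\big),\; f_l\big(bg(x,y)\big) \right\} \qquad (3.4)$$

where $f_t$ represents the visibility threshold due to texture masking, and $f_l$ represents the visibility threshold due to average local background luminance. mg(x, y) denotes the maximal weighted average of luminance gradients around pixel (x, y), which indicates the busyness of that region; bg(x, y) is the average local background luminance. The two functions $f_t$ and $f_l$ are defined as follows:

$$f_t\big(bg(x,y), mg(x,y)\big) = mg(x,y)\,\alpha\big(bg(x,y)\big) + \beta\big(bg(x,y)\big) \qquad (3.5)$$

$$f_l\big(bg(x,y)\big) = \begin{cases} T_0\left(1 - \sqrt{bg(x,y)/127}\right) + 3, & bg(x,y) \le 127 \\ \gamma\,\big(bg(x,y) - 127\big) + 3, & bg(x,y) > 127 \end{cases} \qquad (3.6)$$

$$\alpha\big(bg(x,y)\big) = bg(x,y) \cdot 0.0001 + 0.115 \qquad (3.7)$$

$$\beta\big(bg(x,y)\big) = \lambda - bg(x,y) \cdot 0.01 \qquad (3.8)$$

where $T_0$ = 17, γ = 3/128, and λ = 1/2.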
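The luminance and texture masking functions can be transcribed directly into code. The max combination and the dark-region branch of $f_l$ follow Chou and Li's JND formulation, which the constants $T_0$ = 17, γ = 3/128, and λ = 1/2 point to; treat those parts as assumptions if the thesis's own equations differ.

```python
import math

T0 = 17.0
GAMMA = 3.0 / 128.0
LAMBDA = 0.5


def alpha(bg: float) -> float:
    return bg * 0.0001 + 0.115


def beta(bg: float) -> float:
    return LAMBDA - bg * 0.01


def f_texture(bg: float, mg: float) -> float:
    """Texture-masking threshold: grows linearly with local gradient activity mg."""
    return mg * alpha(bg) + beta(bg)


def f_luminance(bg: float) -> float:
    """Luminance-masking threshold: high in dark regions, minimal near mid-gray (127)."""
    if bg <= 127:
        return T0 * (1.0 - math.sqrt(bg / 127.0)) + 3.0
    return GAMMA * (bg - 127.0) + 3.0


def visibility_threshold(bg: float, mg: float) -> float:
    """Pixel-domain threshold taken as the larger of the two masking effects."""
    return max(f_texture(bg, mg), f_luminance(bg))


for bg in (0, 64, 127, 200, 255):
    print(bg, round(f_luminance(bg), 2), round(visibility_threshold(bg, 20.0), 2))
```

The luminance term ranges from 20 gray levels in black regions down to 3 at mid-gray and back up to 6 at white, which matches the qualitative behavior of the brightness adjustment curve in Section 2.3.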

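The value of mg(x, y) is determined by performing a weighted average of the luminance changes around the pixel (x, y) in four directions (0°, 45°, 90°, and 135°). The weighting coefficient decreases as the distance from pixel (x, y) increases. Four 5×5 gradient operators, $G_k(i, j)$ for k = 1, …, 4, are used to calculate the luminance changes in each direction. Let the pixel value at (x, y) be p(x, y); the value of mg(x, y) can then be calculated as follows:

$$mg(x,y) = \max_{k=1,\dots,4} \left| \mathrm{grad}_k(x,y) \right|, \qquad \mathrm{grad}_k(x,y) = \frac{1}{16} \sum_{i=1}^{5}\sum_{j=1}^{5} p(x-3+i,\, y-3+j)\, G_k(i,j)$$

The average local background luminance bg(x, y) is calculated using a weighted lowpass operator B(i, j):

$$bg(x,y) = \frac{1}{32} \sum_{i=1}^{5}\sum_{j=1}^{5} p(x-3+i,\, y-3+j)\, B(i,j)$$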
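A sketch of the mg and bg computations follows. Loud assumption: the operator values are not printed in this section, so the 5×5 gradient operators and background weights below are our reading of the Chou and Li JND model and should be checked against that reference before use. Border pixels are handled by replication, which is also an assumption. Images are plain lists of rows.

```python
# ASSUMED operator values (Chou and Li style); verify against the original reference.
GRADIENT_OPERATORS = [
    [[0, 0, 0, 0, 0], [1, 3, 8, 3, 1], [0, 0, 0, 0, 0], [-1, -3, -8, -3, -1], [0, 0, 0, 0, 0]],
    [[0, 0, 1, 0, 0], [0, 8, 3, 0, 0], [1, 3, 0, -3, -1], [0, 0, -3, -8, 0], [0, 0, -1, 0, 0]],
    [[0, 0, 1, 0, 0], [0, 0, 3, 8, 0], [-1, -3, 0, 3, 1], [0, -8, -3, 0, 0], [0, 0, -1, 0, 0]],
    [[0, 1, 0, -1, 0], [0, 3, 0, -3, 0], [0, 8, 0, -8, 0], [0, 3, 0, -3, 0], [0, 1, 0, -1, 0]],
]
BACKGROUND_WEIGHTS = [
    [1, 1, 1, 1, 1],
    [1, 2, 2, 2, 1],
    [1, 2, 0, 2, 1],
    [1, 2, 2, 2, 1],
    [1, 1, 1, 1, 1],
]


def _weighted_sum(image, x, y, kernel):
    """Correlate a 5x5 kernel with the neighborhood of (x, y); borders are replicated."""
    height, width = len(image), len(image[0])
    total = 0.0
    for i in range(5):
        for j in range(5):
            yy = min(max(y - 2 + i, 0), height - 1)
            xx = min(max(x - 2 + j, 0), width - 1)
            total += image[yy][xx] * kernel[i][j]
    return total


def max_gradient(image, x, y):
    """mg(x, y): largest weighted luminance change over the four directions."""
    return max(abs(_weighted_sum(image, x, y, g)) / 16.0 for g in GRADIENT_OPERATORS)


def background_luminance(image, x, y):
    """bg(x, y): weighted local mean luminance (the weights sum to 32)."""
    return _weighted_sum(image, x, y, BACKGROUND_WEIGHTS) / 32.0


# A horizontal step edge from 0 to 100 between rows 3 and 4.
step = [[0.0] * 8 for _ in range(4)] + [[100.0] * 8 for _ in range(4)]
print(max_gradient(step, 4, 4), background_luminance(step, 4, 4))
```

Each gradient operator sums to zero, so flat regions give mg = 0 and only the luminance term of the masking model applies there.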

The perceptual coefficients resulting from the luminance and texture masking for each subband are calculated as follows:

3.3.3 Contrast Masking

As described in Section 2.3.5, contrast masking refers to the reduction in visibility of a signal due to the presence of another. In image compression, the signal we want to be masked is the quantization noise. In our model, we consider the masking effects from signals within the same channel (intra-channel masking).

The increase in the visual threshold due to a large coefficient magnitude at the same location in the same subband can be taken into account by an adjustment coefficient. It is modeled by a nonlinear transducer function [34], as shown in Figure 3.4. The adjustment coefficient can be calculated as:

where c(x, y, l, o) is the subband coefficient, and ε is the slope of the line in Figure 3.4, which depends on the distortion measure. We found through experiments that ε = 0.32 is appropriate for our model.

Figure 3.4: Nonlinear Transducer Function
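The exact adjustment formula is not spelled out in the text. A common way to realize a transducer of this shape, as in Watson's DCT masking model, is a factor that stays at 1 below threshold and grows as a power ε of the coefficient-to-threshold ratio above it (a line of slope ε on log-log axes). The sketch assumes that form with ε = 0.32; it is an illustration, not the codec's equation, and the function names are hypothetical.

```python
def masking_adjustment(coeff: float, threshold: float, epsilon: float = 0.32) -> float:
    """Assumed adjustment factor: 1 below threshold, (|c| / T)^epsilon above it."""
    ratio = abs(coeff) / threshold
    return max(1.0, ratio ** epsilon)


def masked_threshold(coeff: float, threshold: float, epsilon: float = 0.32) -> float:
    """Threshold raised by the presence of a large coefficient at the same location."""
    return threshold * masking_adjustment(coeff, threshold, epsilon)


for c in (0.5, 2.0, 10.0, 50.0):
    print(c, round(masked_threshold(c, 1.0), 3))
```

Because ε is well below 1, the threshold grows much more slowly than the coefficient itself: a coefficient 50 times the base threshold raises it only by a factor of about 3.5.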

3.3.4 Perceptual Distortion Metric

In order to minimize the perceptual distortion resulting from compression, we need a perceptual distortion metric. The probability summation model is used to determine our distortion metric. The probability of detecting distortion at the location of a subband coefficient is determined by the psychometric function:

Here β is chosen to be 4, and e(x, y, l, o) denotes the quantization error at location (x, y) in subband (l, o):

The overall probability of the observer noticing the distortion in subband (l, o) is:

It is clear that minimizing the probability of detecting a difference in the subband is equivalent to minimizing the metric D(l, o).
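The text does not spell out the psychometric function. A standard choice for probability summation, assumed here, is the Weibull form P = 1 - exp(-|e / W_JND|^β). Under that assumption the subband probability 1 - Π(1 - P) collapses to 1 - exp(-D) with D = Σ |e / W_JND|^β, so minimizing D minimizes the detection probability, consistent with the claim above. The error and threshold values are hypothetical.

```python
import math


def detection_probability(error: float, w_jnd: float, beta: float = 4.0) -> float:
    """Assumed Weibull psychometric function: P = 1 - exp(-|e / W_JND|^beta)."""
    return 1.0 - math.exp(-abs(error / w_jnd) ** beta)


def subband_metric(errors, thresholds, beta: float = 4.0) -> float:
    """D(l, o) under the assumed form: sum of |e / W_JND|^beta over the subband."""
    return sum(abs(e / w) ** beta for e, w in zip(errors, thresholds))


def subband_detection_probability(errors, thresholds, beta: float = 4.0) -> float:
    """Probability summation 1 - prod(1 - P) collapses to 1 - exp(-D(l, o))."""
    return 1.0 - math.exp(-subband_metric(errors, thresholds, beta))


errors = [0.5, 1.0, 1.5]        # hypothetical quantization errors e(x, y, l, o)
thresholds = [1.0, 1.0, 1.0]    # hypothetical W_JND values
print(subband_metric(errors, thresholds), subband_detection_probability(errors, thresholds))
```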

3.3.5 Region of Interest Quantization

The region of interest (ROI) is quantized by a different set of quantizers, defined to satisfy a certain perceptual quality. Since the maximum quantization error of a uniform quantizer with step size Q is Q/2, the quantization step for each subband is calculated as:

where s is the quality adjustment factor. When s is set to 1, the distortion is not perceptible under the predefined viewing conditions.
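The step-size formula itself is not given in the text. One rule consistent with the stated reasoning (the worst-case error Q/2 must not exceed s times the JND threshold of any ROI coefficient) is Q = 2s · min W_JND over the ROI; the sketch assumes that rule and uses hypothetical threshold and coefficient values.

```python
def roi_step_size(w_jnd_roi: list[float], s: float = 1.0) -> float:
    """Largest step whose worst-case error Q/2 stays within s * W_JND everywhere in the ROI.

    Taking the minimum threshold over the ROI is an assumption of this sketch.
    """
    return 2.0 * s * min(w_jnd_roi)


def uniform_quantize(coeff: float, step: float) -> float:
    """Midtread uniform quantizer: reconstruct at the nearest multiple of step."""
    return step * round(coeff / step)


thresholds = [0.8, 1.2, 2.0]   # hypothetical W_JND values inside the ROI
coeffs = [3.3, -7.9, 12.4]
q = roi_step_size(thresholds)  # 1.6
errors = [abs(c - uniform_quantize(c, q)) for c in coeffs]
print(q, max(errors) <= q / 2)
```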

3.4 Multi-layer Quantizer and Entropy Coder

A multi-layer uniform quantizer is used in our codec. This quantizer, coupled with a multi-layer entropy coder, enables embedded coding of the compressed images. Before the subband coefficients are quantized, the quantizer step for each subband needs to be determined for a given bit budget. This is accomplished through a rate-distortion-based algorithm using integer programming [31]. The goal is to minimize the overall distortion for a given bit rate. Let $R_T$ be the total bit budget, $R_i$ the bit rate for subband i, and D the overall distortion; then we want to find the minimum D such that:

$$\sum_{i=1}^{K} R_i \le R_T$$

where K is the total number of subbands. To simplify things, we define the overall distortion D as:

$$D = \sum_{i=1}^{K} D_i$$

where $D_i$ is the perceptual distortion for each subband. Given these conditions, the optimal solution can be achieved by the well-known constant-slope condition. First, we define a cost function combining the rate and distortion through a positive Lagrange multiplier λ:

$$J(\lambda) = \sum_{i=1}^{K} D_i + \lambda \sum_{i=1}^{K} R_i$$

Next, let us express $D_i$ as a function of rate and set the derivative of the cost function with respect to a specific $R_i$ to zero to find the minimum:

$$\frac{\partial D_i(R_i)}{\partial R_i} = -\lambda \qquad (3.24)$$

$D_i(R_i)$ is an operational distortion-rate function, which depends on the quantization scheme for the subband. The solution to Equation 3.24 is unique if we assume $D_i(R_i)$ to be continuous and convex. In conclusion, for a solution to be optimal, the set of chosen rates must correspond to constant-slope points on their respective weighted distortion-rate curves. This is illustrated in Figure 3.5.

To allocate the optimal bit rate for each subband, we first construct a bit allocation tree consisting of K subtrees corresponding to all of the subbands, as shown in Figure 3.6.

Figure 3.5: Rate-Distortion Curves with Optimal Solution

Each subtree is composed of N nodes, with each node n corresponding to a specific point $(R_i(n), D_i(n))$ on the rate-distortion curve of that subband. In our codec, we found that N = 10 is sufficient, and the quantization step at each node is set to a power of two, i.e., $Q(n) = 2^n$. The rate increases and the distortion decreases when traversing down a subtree. The topmost node of a subtree corresponds to the zero-rate point, where the distortion is maximum. At each node n, we also define the parameter $\lambda_i(n)$ as the ratio between ΔD and ΔR, where ΔD and ΔR denote the magnitude differences in distortion and rate between the current node and the leaf node N:

$$\lambda_i(n) = \frac{\Delta D}{\Delta R} = \frac{\left| D_i(n) - D_i(N) \right|}{\left| R_i(N) - R_i(n) \right|}$$

The bit allocation algorithm begins with the initial tree and obtains a series of pruned trees iteratively. At each iteration, the node having the smallest $\lambda_i(n)$ is pruned, since this represents the best trade-off between rate and distortion at that step. The subtree containing the pruned node has a new leaf node, and $\lambda_i(n)$ must be recalculated for all the remaining nodes in that subtree. At any iteration, the overall rate of the tree can be calculated from the leaf nodes of all of the subtrees. The algorithm terminates when the total rate falls below the target rate. Each subband is then assigned an optimal quantization step according to the leaf nodes of the final pruned tree.

Figure 3.6: Bit Allocation Tree
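The pruning procedure can be sketched as a greedy loop over per-subband operational rate-distortion points. The function name is hypothetical, the curve values in the usage example are made up, and rates are assumed to increase strictly down each subtree. Pruning below node n makes n the new leaf, and the slopes of that subtree are recomputed on the next pass.

```python
def allocate_bits(rd_curves, target_rate):
    """Greedy pruning of the bit allocation tree.

    rd_curves[i] lists (rate, distortion) points of subband i, ordered from the
    topmost (zero-rate) node down to the leaf; rates must strictly increase.
    Returns the index of the final leaf node of each subtree.
    """
    leaf = [len(curve) - 1 for curve in rd_curves]  # start from the full tree

    def total_rate():
        return sum(curve[n][0] for curve, n in zip(rd_curves, leaf))

    while total_rate() > target_rate:
        best = None  # (slope, subband, node)
        for i, curve in enumerate(rd_curves):
            r_leaf, d_leaf = curve[leaf[i]]
            for n in range(leaf[i]):
                r_n, d_n = curve[n]
                slope = abs(d_n - d_leaf) / (r_leaf - r_n)  # lambda_i(n) = dD / dR
                if best is None or slope < best[0]:
                    best = (slope, i, n)
        if best is None:  # every subtree is already reduced to its zero-rate node
            break
        _, i, n = best
        leaf[i] = n  # prune below node n: it becomes the new leaf of subtree i
    return leaf


curves = [
    [(0, 100.0), (1, 40.0), (2, 20.0), (3, 15.0)],
    [(0, 50.0), (1, 30.0), (2, 26.0), (3, 25.0)],
]
print(allocate_bits(curves, target_rate=3))  # [2, 1]
```

In the example the second subband is pruned twice before the first one is touched, because its finer quantizers buy very little distortion reduction per bit.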

The multi-rate quantization scheme used in the codec is equivalent to the successive approximation quantization (SAQ) used in Shapiro's embedded zerotree wavelet coder. Here, the multi-rate quantization is coupled with adaptive arithmetic coding. Multi-rate coding is achieved by progressive quantization and coding of each subband in a sequence of N layers, representing progressively finer quantization step sizes. Let us define a set of N quantizers, $Q_1, \dots, Q_N$, and N quantization layers, $L_1, \dots, L_N$. The symbols for quantizer $Q_1$ are encoded into layer $L_1$, while the information necessary to recover the symbols for quantizer $Q_n$, given that the symbols for quantizers $Q_1, \dots, Q_{n-1}$ are known, is encoded into layer $L_n$. In this way, the decoder is able to recover the subband coefficients quantized by any of the quantizers $Q_n$ by decoding layers $L_1, \dots, L_n$ only. One of the objectives of this approach is for the total number of bits needed to encode layers $L_1, \dots, L_n$ to be approximately the same as the number of bits required to encode the output of quantizer $Q_n$ directly. This way, coding efficiency is not sacrificed in obtaining the multi-rate property. It can be proved that for pulse code modulation (PCM) coding, this coding efficiency goal can be achieved if and only if every quantization interval $I_n$ of $Q_n$ is contained in some quantization interval $I_{n-1}$ of $Q_{n-1}$ [33]. For uniform quantizers, this means the step sizes are nested:

$$\Delta_{n-1} = K_n \Delta_n$$

where $\Delta_n$ is the step size of $Q_n$ and the $K_n$ are integers. In our codec, $K_n$ is set to two for all n.
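To see why nested intervals with step ratio $K_n$ = 2 make the layers cheap, note that with a sign-magnitude quantizer the coarse index is recovered from the fine index by integer division, so each refinement layer only has to carry the remainder (one bit of magnitude per coefficient). The deadzone form of the quantizer is an assumption; the text states only that it is uniform. The helper names are hypothetical.

```python
def quantize_index(coeff: float, step: float) -> int:
    """Sign-magnitude (deadzone) uniform quantizer index: sign(c) * floor(|c| / step)."""
    magnitude = int(abs(coeff) // step)
    return -magnitude if coeff < 0 else magnitude


def coarser_index(fine_index: int, k: int = 2) -> int:
    """Index under a quantizer whose step is k times larger (nested intervals)."""
    magnitude = abs(fine_index) // k
    return -magnitude if fine_index < 0 else magnitude


def refinement_symbol(fine_index: int, k: int = 2) -> int:
    """What layer L_n carries to go from the coarse index to the fine one."""
    return abs(fine_index) % k


c = 17.9
for step in (16.0, 8.0, 4.0, 2.0, 1.0):
    idx = quantize_index(c, step)
    print(step, idx, refinement_symbol(idx))
```

The printed indices 1, 2, 4, 8, 17 show the magnitude growing by one binary digit per layer, which is the successive approximation behavior described above.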

Arithmetic coding is used in the multi-rate coder because it can approach the information-theoretic lower bound for encoding each layer arbitrarily closely. Huffman coding is not a viable alternative here because it cannot realize bit rates of less than one bit per symbol. Since the number of symbols in each layer is relatively small, the arithmetic coder can adapt quickly.
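The one-bit-per-symbol limitation matters because refinement layers are often highly skewed binary sources. The sketch below compares the entropy of such a source with Huffman's 1 bit/symbol floor, and computes the ideal code length of an adaptive binary model with Laplace (add-one) counts, an assumed model choice, that an arithmetic coder could follow.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits/symbol of a binary source with P(1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def adaptive_code_length(bits: list[int]) -> float:
    """Ideal code length in bits of an adaptive binary model with Laplace counts."""
    counts = [1, 1]
    total = 0.0
    for b in bits:
        total += -math.log2(counts[b] / (counts[0] + counts[1]))
        counts[b] += 1
    return total


print(binary_entropy(0.95))             # about 0.29 bits/symbol; Huffman needs 1
print(adaptive_code_length([0] * 100))  # log2(101), about 6.66 bits for 100 symbols
```

The second line also illustrates fast adaptation: starting from no knowledge, the model codes a run of 100 identical symbols in under 7 bits.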

3.5 Experimental Results

Four 512×512 color images were selected for our experiments: 'lenna', 'flower', 'pepper', and 'baboon'. They are shown in Figures 3.7-3.10. These images include a wide variety of image features, such as a facial region, a natural scene, texture regions, and bright colors. In addition to our perceptual wavelet coder (PWC), two other coders were used for comparison purposes: a basic wavelet coder without the perceptual model (BWC) and a JPEG coder. The images were compressed at nine different compression ratios, from 10:1 to 90:1. The perceptual wavelet coder was designed for a visual resolution of 32 pixels/degree. That is, for a display resolution of 40 pixels/cm (a typical resolution for a computer monitor), the coder is optimized for a viewing distance of around 45 cm.
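The 32 pixels/degree figure can be checked with a short calculation. This assumes the usual geometry, in which one degree of visual angle at distance d covers 2 · d · tan(0.5°) of the screen; Equation 2.37 may be written differently.

```python
import math


def pixels_per_degree(pixels_per_cm: float, viewing_distance_cm: float) -> float:
    """Visual resolution r: pixels spanned by one degree of visual angle.

    One degree at distance d covers 2 * d * tan(0.5 deg) cm of the screen (assumed form).
    """
    return 2.0 * viewing_distance_cm * math.tan(math.radians(0.5)) * pixels_per_cm


r = pixels_per_degree(40.0, 45.0)
print(round(r, 2), "pixels/degree;", round(r / 2, 2), "cycles/degree at level 1")
```

The result, about 31.4 pixels/degree, rounds to the 32 pixels/degree design value; the first wavelet level then sits near 16 cycles/degree.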

Figure 3.7: Original 'lenna'

Figure 3.8: Original 'flower'

Figure 3.9: Original 'pepper'

Figure 3.10: Original 'baboon'

The peak signal-to-noise ratio (PSNR) was obtained for each image at the different compression ratios. The resulting PSNR versus compression ratio curves for each image with all three coders are shown in Figure 3.11. The first thing we notice from Figure 3.11 is that the PSNR for the JPEG coder is consistently lower than for the wavelet coders, especially at high compression ratios. This is not surprising, since wavelet coding is known to generally offer better coding gain than DCT-based coding. The basic wavelet coder has a slightly higher PSNR than the perceptual coder. This was also expected, since the basic wavelet coder is optimized to minimize the mean-square error (MSE), while the perceptual coder is optimized to minimize the perceptual error metric.
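For reference, a minimal PSNR computation is sketched below. It assumes 8-bit samples (peak value 255) and a single channel; the text does not state whether PSNR was computed on Y' alone or averaged over color components.

```python
import math


def psnr(original: list[list[float]], decoded: list[list[float]], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized single-channel images."""
    if len(original) != len(decoded) or any(len(a) != len(b) for a, b in zip(original, decoded)):
        raise ValueError("images must have the same size")
    squared = [(a - b) ** 2 for row_a, row_b in zip(original, decoded) for a, b in zip(row_a, row_b)]
    mse = sum(squared) / len(squared)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


ref = [[10.0, 20.0], [30.0, 40.0]]
out = [[11.0, 21.0], [31.0, 41.0]]
print(round(psnr(ref, out), 2))
```

Because PSNR depends only on the MSE, it rewards exactly what the basic wavelet coder optimizes, which is why its higher PSNR does not contradict the subjective comparison that follows.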

Figure 3.11: PSNR Curves for All Three Coders (panels: Flower, Baboon, Lenna, Pepper; PSNR versus compression ratio)

Let us now look at each image for subjective evaluation. First, we consider the JPEG coder versus the wavelet coders. At low compression ratios, all three coders perform quite well, and there are no perceptible distortions in the reconstructed images. As the compression ratio increases, the distortions from the wavelet coders appear as blurriness around edges, also known as the ringing effect. The distortion resulting from the JPEG coder takes the form of block artifacts, which are generally more distracting than the ringing effect because they give images a disjointed look. Figures 3.12-3.15 illustrate the two types of distortion using 'flower' and 'baboon' compressed by the perceptual coder and the JPEG coder. We can see that the perceptual quality of the wavelet-coded images is better than that of the JPEG-coded images. Also, at high compression ratios, the image quality under JPEG compression degrades more rapidly than under wavelet compression.

Next, we compare the subjective performance of the basic wavelet coder and the perceptual wavelet coder. The perceptual coder generally shows less ringing than the basic coder. As is evident from Figures 3.16 to 3.19, the basic wavelet coder produces more noticeable distortions at the edges in both 'lenna' and 'flower'. The performance of the two coders on 'pepper' is similar, except at very high compression ratios, where the perceptual coder seems better able to preserve the texture of the pepper than the basic coder, as shown in Figures 3.20 and 3.21. For 'baboon', the texture region of the reconstructed images differs between the two coders. The basic wavelet coder produces many small white holes in the texture region, which are quite visible, whereas the perceptual wavelet coder blurs the region, which tends to be less perceptible. Figures 3.22 and 3.23 demonstrate the two effects.

A separate set of experiments was performed on high-resolution ILIAS images to achieve perceptually lossless quality. The results show that around the compression ratio at which distortion just starts to become noticeable, the distortion is easier to identify with the JPEG coder than with the perceptual wavelet coder. In other words, the perceptual coder can achieve a higher compression ratio while still maintaining perceptually lossless quality. However, for reasons of confidentiality, these results cannot be presented here.

The option of separate quantization for a region of interest (ROI) at a specified perceptual quality was tested on 'lenna', with the facial region selected as the region of interest. The scale factor is set to one. As the compression ratio increases, the quality of the facial region remains the same. Figures 3.24 and 3.25 show the reconstructed images from the perceptual wavelet coder without and with the ROI option at a high compression ratio. While the distortion is apparent in the facial region in Figure 3.24, the facial region in Figure 3.25 remains perceptually lossless under the predefined conditions. The trade-off is that the background in Figure 3.25 is more blurred than that in Figure 3.24.


Figure 3.12: Reconstructed 'flower' Using JPEG Coder (90:1)

Figure 3.13: Reconstructed 'flower' Using Perceptual Wavelet Coder (90:1)

Figure 3.14: Reconstructed 'baboon' Using JPEG Coder (80:1)

Figure 3.15: Reconstructed 'baboon' Using Perceptual Wavelet Coder (80:1)

Figure 3.16: Reconstructed 'lenna' Using Basic Wavelet Coder (50:1)

Figure 3.17: Reconstructed 'lenna' Using Perceptual Wavelet Coder (50:1)

Figure 3.18: Reconstructed 'flower' Using Basic Wavelet Coder (90:1)

Figure 3.19: Reconstructed 'flower' Using Perceptual Wavelet Coder (90:1)

Figure 3.20: Reconstructed 'pepper' Using Basic Wavelet Coder (80:1)

Figure 3.21: Reconstructed 'pepper' Using Perceptual Wavelet Coder (80:1)

Figure 3.22: Reconstructed 'baboon' Using Basic Wavelet Coder (80:1)

Figure 3.23: Reconstructed 'baboon' Using Perceptual Wavelet Coder (80:1)

Figure 3.24: Reconstructed 'lenna' Using Perceptual Wavelet Coder Without ROI (80:1)

Figure 3.25: Reconstructed 'lenna' Using Perceptual Wavelet Coder With ROI (80:1)

Chapter 4

Quality Assessment Using Vision Model

It is clear that an objective measure of image quality that is more accurate than the simple PSNR measure would be very helpful in designing compression algorithms. The PSNR measure often leads to inaccurate predictions of perceptual quality when comparing two different algorithms, because it operates on a pixel-by-pixel basis, which is not how humans perceive images. Subjective tests, on the other hand, are usually time-consuming and tend to be inconsistent. It is therefore desirable to have a vision model that can mimic the way humans perceive images and express the quality numerically. In this chapter, we construct and investigate a mechanistic model based on Sarnoff's visual discrimination model [20]. This vision model is more elaborate than the one used in the compression scheme of the previous chapter, because it is free of some of the constraints imposed by the encoder. The quality assessment based on this vision model is then applied to the compressed images from the previous chapter. The flow diagram of the model is shown in Figure 4.1. Each of its components is described in the next section, followed by a section presenting the experimental results.


Figure 4.1: Flow Diagram of the Vision Model [20]


4.1 Vision Model

A mechanistic model of the human visual system is one in which each component attempts to model the functional response of a physiological mechanism in the visual pathway. The vision model consists of six stages, each representing a particular visual characteristic; each stage is addressed below. Since most compression coders distribute the distortions more or less equally between the chromatic and achromatic channels, the performance difference between a luminance-only quality assessment measure and its full-color extension is small [44]. Therefore, we apply the vision model only to the luminance component.

The input images are convolved with a function approximating the point spread function given in Equation 2.39. The use of the point spread function is justified by the fact that a point object gives rise to a retinal light distribution that is bell-shaped in cross-section; in effect, the intensity of each pixel spreads out to its neighboring pixels. The concept is simple: we first calculate the physical distance between pixels in terms of visual angle, and then convert the point spread function into a 2-D discrete function. One concern is that the point spread function is not separable. However, since its value falls off exponentially, it yields a fairly short FIR filter and is therefore still reasonable to compute. An additional operation is performed when the fixation depth does not match the image depth. In this case, a blur spot forms on the retina. We calculate the size of this blur circle from the distance between the exit pupil and the image surface together with the depth information, and then convolve the image with the resulting disk-shaped kernel in the same fashion as with the point spread function. This kernel can be combined with the point spread function, so that the convolution only needs to be performed once.
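The optics stage above can be sketched as follows. Equation 2.39 is not reproduced in this section, so the sketch assumes a Westheimer-style PSF of radial distance in arcminutes as a stand-in; the coefficients, the truncation tolerance, the blur-disk diameter and all function names are assumptions for illustration. It shows the two points made in the text: the non-separable PSF truncates to a short FIR filter, and the defocus disk can be pre-convolved with it so that the image is filtered only once.

```python
import numpy as np


def psf_value(r_arcmin):
    """Assumed Westheimer-style optical PSF; r is the radial distance in arcmin.

    Stand-in for Equation 2.39 (the coefficients are an assumption).
    """
    r = np.abs(r_arcmin)
    return 0.952 * np.exp(-2.59 * r ** 1.36) + 0.048 * np.exp(-2.43 * r ** 1.74)


def psf_kernel(pixels_per_degree, tol=1e-4):
    """Sample the PSF on the pixel grid and truncate it into a short FIR filter."""
    arcmin_per_px = 60.0 / pixels_per_degree  # pixel spacing as visual angle
    radius = 1
    while psf_value(radius * arcmin_per_px) > tol * psf_value(0.0):
        radius += 1  # grow until the exponential tail is negligible
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = psf_value(np.hypot(xx, yy) * arcmin_per_px)
    return k / k.sum()  # unit DC gain preserves mean luminance


def disk_kernel(diameter_px):
    """Uniform disk modelling the defocus blur circle (diameter in pixels)."""
    if diameter_px <= 1.0:
        return np.ones((1, 1))
    r = diameter_px / 2.0
    n = int(np.ceil(r))
    ax = np.arange(-n, n + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = (np.hypot(xx, yy) <= r).astype(float)
    return k / k.sum()


def conv2_full(a, b):
    """Full 2-D convolution of two small kernels (combines PSF and blur disk)."""
    out = np.zeros((a.shape[0] + b.shape[0] - 1, a.shape[1] + b.shape[1] - 1))
    for i in range(b.shape[0]):
        for j in range(b.shape[1]):
            out[i:i + a.shape[0], j:j + a.shape[1]] += b[i, j] * a
    return out


def conv2_same(img, k):
    """'Same'-size 2-D convolution with reflected borders (odd-sized kernel)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    p = np.pad(np.asarray(img, float), ((ph, ph), (pw, pw)), mode="reflect")
    h, w = p.shape[0] - 2 * ph, p.shape[1] - 2 * pw
    out = np.zeros((h, w))
    for i in range(kh):
        for j in range(kw):
            out += k[kh - 1 - i, kw - 1 - j] * p[i:i + h, j:j + w]
    return out


# Usage: combine optics and a 3-pixel defocus blur once, then filter once.
kernel = conv2_full(psf_kernel(120.0), disk_kernel(3.0))
retina = conv2_same(np.random.default_rng(0).random((32, 32)), kernel)
```

Pre-combining the kernels costs one small kernel-by-kernel convolution, after which the image pass is the same as for the PSF alone.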


4.1.2 Sampling

A Gaussian convolution followed by point sampling is used to simulate the sampling of the retinal cone mosaic. For foveal viewing, the image is sampled at 120 pixels per degree of visual angle, resulting in a retinal image of 512x512 pixels. For non-foveal viewing, the sampling density $d$ (in pixels per degree) is calculated as:

$$ d = \frac{120}{1 + k e} $$

where $e$ is the eccentricity in degrees and $k$ is set to 0.4, as estimated from psychophysical data by Watson [39].
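A minimal sketch of the sampling stage, assuming the density formula above, a Gaussian pre-filter whose width (half the sample spacing) is our own choice, and nearest-pixel point sampling. The function names are hypothetical, and only downsampling is handled.

```python
import numpy as np


def sampling_density(ecc_deg, k=0.4, foveal_ppd=120.0):
    """Retinal sampling density in pixels per degree: d = 120 / (1 + k e)."""
    return foveal_ppd / (1.0 + k * ecc_deg)


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflected borders."""
    img = np.asarray(img, dtype=float)
    if sigma <= 0:
        return img.copy()
    r = max(1, int(np.ceil(3 * sigma)))
    t = np.arange(-r, r + 1)
    g = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    g /= g.sum()
    h, w = img.shape
    p = np.pad(img, ((r, r), (0, 0)), mode="reflect")
    vert = sum(g[i] * p[i:i + h, :] for i in range(2 * r + 1))  # blur along axis 0
    p = np.pad(vert, ((0, 0), (r, r)), mode="reflect")
    return sum(g[i] * p[:, i:i + w] for i in range(2 * r + 1))  # blur along axis 1


def retinal_sample(img, input_ppd, ecc_deg=0.0, k=0.4):
    """Gaussian pre-filter followed by point sampling at the retinal density."""
    step = input_ppd / sampling_density(ecc_deg, k)  # input pixels per retinal sample
    if step < 1.0:
        raise ValueError("sketch only handles downsampling (step >= 1)")
    blurred = gaussian_blur(img, sigma=step / 2.0)  # assumed anti-aliasing width
    h, w = blurred.shape
    rows = np.round(np.arange(0, h, step)).astype(int).clip(0, h - 1)
    cols = np.round(np.arange(0, w, step)).astype(int).clip(0, w - 1)
    return blurred[np.ix_(rows, cols)]
```

For example, an image displayed at 240 pixels per degree and viewed foveally is blurred with sigma = 1 pixel and then sampled at every second pixel.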

4.1.3 Bandpass Contrast Responses

The raw luminance signal is converted to units of local contrast. Contrast is a basic perceptual attribute of an image: the absolute luminance means relatively little to the human eye, since it is contrast that we perceive. A local band-limited contrast measure for complex images is employed here [25]. The first step is to decompose the image into a Laplacian pyramid, resulting in several levels of bandpass signals, each level separated from its neighbors by one octave [5]. The resulting structure is very similar to a wavelet subband decomposition, except that a simple Gaussian filter is used for fast computation. After decomposition, at each point in each level, the Laplacian value is divided by the corresponding point upsampled from the Gaussian pyramid level two levels down in resolution:

$$ C_l(x,y) = \frac{I(x,y) * \left[G_l(x,y) - G_{l+1}(x,y)\right]}{I(x,y) * G_{l+2}(x,y)} $$

where $C_l(x,y)$ is the contrast at pyramid level $l$ and location $(x,y)$, $I(x,y)$ is the input image, $*$ denotes convolution, and $G_l(x,y)$ is a Gaussian convolution kernel:

$$ G_l(x,y) = \frac{1}{2\pi\sigma_l^2} \exp\left(-\frac{x^2 + y^2}{2\sigma_l^2}\right) $$

where $\sigma_l = 2^{l-1}\sigma_0$. This operation results in a local measure of contrast, localized in both space and frequency. A brightness adjustment is then added to model the reduced visibility threshold in dark regions; the adjustment curve in Figure 2.11 is used.
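The contrast computation can be sketched as below. For brevity the sketch keeps every level at full resolution (blurring with Gaussians of doubling width instead of subsampling a pyramid), and the values of $\sigma_0$ and the number of levels are assumptions; the brightness adjustment of Figure 2.11 is omitted, and the function names are hypothetical.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflected borders (kernel normalized to sum 1)."""
    img = np.asarray(img, dtype=float)
    r = max(1, int(np.ceil(3 * sigma)))
    t = np.arange(-r, r + 1)
    g = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    g /= g.sum()
    h, w = img.shape
    p = np.pad(img, ((r, r), (0, 0)), mode="reflect")
    vert = sum(g[i] * p[i:i + h, :] for i in range(2 * r + 1))
    p = np.pad(vert, ((0, 0), (r, r)), mode="reflect")
    return sum(g[i] * p[:, i:i + w] for i in range(2 * r + 1))


def contrast_pyramid(img, levels=3, sigma0=1.0, eps=1e-6):
    """Local band-limited contrast C_l for l = 1..levels, kept at full resolution.

    C_l = I*(G_l - G_{l+1}) / I*G_{l+2}, with sigma_l = 2**(l-1) * sigma0.
    Also returns the local luminance images L_l = I*G_{l+2}.
    """
    img = np.asarray(img, dtype=float)
    blurred = {l: gaussian_blur(img, 2.0 ** (l - 1) * sigma0)
               for l in range(1, levels + 3)}
    contrast, local_lum = [], []
    for l in range(1, levels + 1):
        band = blurred[l] - blurred[l + 1]      # Laplacian (difference-of-Gaussians) band
        lum = np.maximum(blurred[l + 2], eps)   # local mean luminance, two levels coarser
        contrast.append(band / lum)
        local_lum.append(lum)
    return contrast, local_lum
```

Because numerator and denominator both scale with the input, multiplying the image by a constant leaves $C_l$ unchanged, which is the sense in which the measure responds to contrast rather than to absolute luminance.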

4.1.4 Oriented Responses

At this point, we have a pyramid structure of contrast values. To take into account the orientation bandwidth of the bandpass signal, each pyramid level is convolved with four pairs of spatially oriented filters. Each pair consists of a directional second derivative of a Gaussian and its Hilbert transform. This is a type of steerable filter, meaning that a filter of arbitrary orientation can be synthesized as a linear combination of a set of basis filters [14]. Furthermore, the Gaussian derivative filter is separable, and its Hilbert transform can be approximated by four basis functions, which are also separable. Then, to obtain a meaningful analysis of the local orientation, the orientation strength along each of the four directions (0°, 45°, 90°, 135°) is calculated as the sum of the squared outputs of each pair of orientation filters, resulting in a phase-independent energy response:

$$ e_{l,\theta}(x,y) = \left[o_{l,\theta}(x,y)\right]^2 + \left[h_{l,\theta}(x,y)\right]^2 $$

where $o_{l,\theta}$ and $h_{l,\theta}$ are the outputs of the oriented operator and its Hilbert transform at pyramid level $l$ and orientation $\theta$. The advantage of the phase independence of the energy response is that it makes the model less sensitive to the exact location of an edge, a property also exhibited by the human visual system.
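The oriented energy stage can be illustrated with the second-derivative-of-Gaussian filter (G2) and its polynomial Hilbert approximation (H2) from Freeman and Adelson [14]. This is a sketch, not the thesis implementation: the 9-tap support, the 0.67 sample spacing and the helper names are assumptions, and each filter is evaluated directly along the rotated coordinate rather than steered from basis filters (the two give the same filter, because both are polynomials in that coordinate times a radially symmetric Gaussian).

```python
import numpy as np


def conv2_same(img, k):
    """'Same'-size 2-D convolution with reflected borders (odd-sized kernel)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    img = np.asarray(img, float)
    p = np.pad(img, ((ph, ph), (pw, pw)), mode="reflect")
    h, w = img.shape
    out = np.zeros((h, w))
    for i in range(kh):
        for j in range(kw):
            out += k[kh - 1 - i, kw - 1 - j] * p[i:i + h, j:j + w]
    return out


def oriented_pair(theta, half=4, spacing=0.67):
    """G2 filter and its H2 quadrature (Hilbert) approximation at angle theta."""
    t = np.arange(-half, half + 1) * spacing
    x, y = np.meshgrid(t, t)                   # x varies along columns, y along rows
    u = x * np.cos(theta) + y * np.sin(theta)  # coordinate along the filter direction
    env = np.exp(-(x ** 2 + y ** 2))           # radially symmetric Gaussian envelope
    g2 = 0.9213 * (2.0 * u ** 2 - 1.0) * env   # even: second derivative of a Gaussian
    h2 = 0.9780 * (u ** 3 - 2.254 * u) * env   # odd: polynomial Hilbert approximation
    return g2, h2


def oriented_energy(contrast, thetas_deg=(0, 45, 90, 135)):
    """Phase-independent energy e = o**2 + h**2 for each orientation."""
    out = []
    for th in np.deg2rad(thetas_deg):
        g2, h2 = oriented_pair(th)
        o = conv2_same(contrast, g2)
        h = conv2_same(contrast, h2)
        out.append(o ** 2 + h ** 2)
    return out
```

For a vertical step edge, the 0° channel (which differentiates along x) responds strongly while the 90° channel stays near zero. Where the even filter G2 crosses zero at the edge, the odd filter H2 peaks, so the energy varies smoothly across the edge instead of oscillating with phase.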

Each energy measure is first normalized by a value close to the square of the grating contrast detection threshold for that pyramid level and luminance. The threshold is given by

$$ M_t\big(\nu_l, L_l(x,y)\big) = \frac{1}{a\,\nu_l\,e^{-b\nu_l}\sqrt{1 + c\,e^{b\nu_l}}} $$

where $\nu_l$ is the peak spatial frequency for pyramid level $l$, $L_l(x,y)$ is the local luminance used in the contrast calculation described in the previous section, and $a$, $b$ and $c$ are parameters of the contrast sensitivity function that depend on the local luminance and on the display width $W$ in degrees. This parameter can be adjusted for more robust performance. Next, a sigmoid non-linearity is applied to the normalized energy measure to reproduce the dipper shape of contrast discrimination functions. These two operations can be combined into one scalar operation on each energy measure:

$$ T_{l,\theta}(x,y) = \frac{2\left[e_{l,\theta}(x,y)/M_t^2\right]^{n/2}}{\left[e_{l,\theta}(x,y)/M_t^2\right]^{(n-w)/2} + 1} $$

where $n$ is chosen as 1.5 and $w$ is set to 0.07. Again, calibrating this function on a few typical images can improve the prediction accuracy of the model. This function has a number of interesting properties when considering a grating stimulus of contrast $c$ and frequency $\nu_l$. For small values of $c$, the maximum transducer output at level $l$ accelerates as $c^n$, while for large values of $c$, the function is compressive as $c^w$. For an intermediate value of $c$ at the contrast detection threshold for frequency $\nu_l$, the transducer output is 1. An eccentricity-dependent pooling stage can be added to improve the performance further. This is achieved by averaging the transducer output over a small neighborhood, convolving it with a disk-shaped kernel of diameter $d_0 = 5$ for foveal inputs. For stimuli outside the fovea, the diameter $d_p$ of this kernel increases as a linear function of eccentricity:

$$ d_p = d_0\,(1 + k_p e) $$

where $e$ is the eccentricity in degrees and $k_p$ is a scaling factor. This eccentricity-dependent increase in pooling is useful in modeling the eccentricity-dependent loss in performance.
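The normalization, transducer and pooling steps reduce to a few scalar operations. A sketch under stated assumptions: the threshold $M_t$ is passed in as a precomputed contrast value, because the CSF parameters $a$, $b$ and $c$ are not given here; $k_p$ has no value in the text and is left as a required argument; and the function names are hypothetical.

```python
import numpy as np


def transducer(energy, m_t, n=1.5, w=0.07):
    """Normalize energy by the squared detection threshold m_t, then apply the sigmoid.

    T = 2 x**(n/2) / (x**((n - w)/2) + 1), with x = energy / m_t**2.
    """
    x = np.asarray(energy, dtype=float) / m_t ** 2
    return 2.0 * x ** (n / 2.0) / (x ** ((n - w) / 2.0) + 1.0)


def pooling_diameter(ecc_deg, k_p, d0=5.0):
    """Pooling disk diameter d_p = d0 * (1 + k_p * e); k_p is a free parameter."""
    return d0 * (1.0 + k_p * ecc_deg)


def disk_pool(t_map, diameter):
    """Average the transducer output over a disk of the given diameter (pixels)."""
    arr = np.asarray(t_map, dtype=float)
    r = diameter / 2.0
    n = int(np.ceil(r))
    ax = np.arange(-n, n + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = (np.hypot(xx, yy) <= r).astype(float)
    k /= k.sum()  # symmetric kernel, so correlation equals convolution
    p = np.pad(arr, n, mode="reflect")
    h, wd = arr.shape
    return sum(k[i, j] * p[i:i + h, j:j + wd]
               for i in range(2 * n + 1) for j in range(2 * n + 1))
```

Evaluating `transducer` at an energy equal to $M_t^2$ gives exactly 1, and doubling a very weak or a very strong contrast scales the output by roughly $2^{1.5}$ or $2^{0.07}$, matching the properties listed above.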

4.2.1 Distance Metric

At this point, we have an $m$-dimensional vector for each spatial position of the image, where $m$ is the number of pyramid levels multiplied by the number of orientations. Before calculating the distance between these vectors, we first need to upsample each pyramid level to the full 512x512 size, which results in a set of $m$ arrays $P_i(x)$, $i = 1, 2, \ldots, m$, for each input image $x$. A distance measure $D$ between the two input images $x_1$ and $x_2$ is then calculated as:

$$ D(x_1, x_2) = \left\{ \sum_{i=1}^{m} \left| P_i(x_1) - P_i(x_2) \right|^{Q} \right\}^{1/Q} $$

where $Q$ is set to 2.4. The result from this stage is a 2-D distance map indicating the perceptual difference between the two images at each spatial location.
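The distance computation is a per-pixel Minkowski sum over the $m$ channels. A minimal sketch, assuming pixel replication for the upsampling step (the text does not say which interpolation is used) and hypothetical function names:

```python
import numpy as np


def upsample_to(level, shape):
    """Pixel-replication upsampling of one pyramid level to the full image size."""
    fy, fx = shape[0] // level.shape[0], shape[1] // level.shape[1]
    if (fy * level.shape[0], fx * level.shape[1]) != tuple(shape):
        raise ValueError("full size must be an integer multiple of the level size")
    return np.repeat(np.repeat(level, fy, axis=0), fx, axis=1)


def distance_map(resp_1, resp_2, q=2.4):
    """Minkowski (Q-norm) distance across the m channels at every pixel."""
    a = np.stack(resp_1)  # shape (m, H, W): m = levels x orientations
    b = np.stack(resp_2)
    return np.sum(np.abs(a - b) ** q, axis=0) ** (1.0 / q)
```

With $Q = 2.4$, a pixel where two channels each differ by 1 gets a distance of $2^{1/2.4} \approx 1.33$, so the measure sits between summing the channel differences ($Q = 1$) and taking only the largest one ($Q \to \infty$).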

Since a single value is often useful in evaluating image quality, we need to extract a meaningful value from the 2-D distance map. In practice, two different measures are commonly used: the average across the map and the maximum. We also calculated a histogram to obtain the number of points exceeding 90% of the maximum value. This measure may offer additional insight for interpreting the distance map.
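The three summary measures can be extracted from a distance map as follows. This is a sketch with a hypothetical function name, in which "exceeding" is taken to mean strictly greater than 90% of the maximum:

```python
import numpy as np


def summarize_distance_map(dmap, frac=0.9):
    """Mean, maximum, and count of points exceeding frac * maximum."""
    d = np.asarray(dmap, dtype=float)
    d_max = d.max()
    return {
        "mean": float(d.mean()),
        "max": float(d_max),
        "count_above": int(np.count_nonzero(d > frac * d_max)),
    }


print(summarize_distance_map([[0.0, 1.0], [9.5, 10.0]]))
# {'mean': 5.125, 'max': 10.0, 'count_above': 2}
```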


4.3 Experimental Result

The vision model is applied to the four images of the previous chapter. The distance maps between the four images and their reconstructions compressed at 20:1 are shown in Figures 4.2 to 4.5. The bright regions indicate a high probability of the distortion being noticed by a human observer, while the dark regions indicate a low probability. We can see that the distance maps are generally able to predict the areas to which human eyes are likely to be sensitive. The distance map predicts that distortions in uniform regions are more perceptible than those in texture regions, which agrees with our expectation. It also predicts that distortions in small regions are less likely to be detected.

Figure 4.2: Distance Map for 'lenna'

Three numerical measures were extracted from the distance map: the mean distance, the maximum distance, and the number of points exceeding 90% of the maximum distance. Figures 4.6 to 4.8 plot these three measures versus the compression ratio for the four images. Two observations can be made from the mean distance curves: first, the mean distance for the perceptual coder is generally lower than for the basic coder; second, the JPEG coder has a lower mean distance than the wavelet coders at low compression ratios and a higher mean distance at high compression ratios. These results correspond reasonably well to our expectations from the subjective evaluation in the previous chapter. A discrepancy between the mean distance measure and the subjective evaluation occurs for the image 'baboon'. The mean distance measure suggests that the JPEG coder consistently has a lower error than the wavelet coders. However, the image reconstructed using JPEG showed more apparent distortion than those reconstructed using the wavelet coders, starting at the compression ratio 40:1.

Figure 4.3: Distance Map for 'flower'

Let us now turn our attention to the maximum distance measure in Figure 4.7. Notice that there is a large elevation in the maximum distance curve for the JPEG coder starting at the compression ratio 40:1. The fact that the mean distance curve for the JPEG coder is lower than that of the wavelet coders may be explained by 'baboon' having a large texture region, where JPEG usually performs better. From the maximum distance measure, we can see that, again, the perceptual coder has lower distance values than the basic coder and, except at a few low compression points, lower distance values than the JPEG coder as well. In general, the maximum distance measure should not be used by itself to judge the overall picture quality, since a few points with large errors can lead to an inaccurate prediction. At a few places on the curves, the maximum distance decreases as the compression ratio increases, although we know that the picture quality should decrease with increasing compression ratio.

Figure 4.4: Distance Map for 'pepper'

The histogram plot in Figure 4.8 does not give much information, but there is one interesting point worth noting. There is a sharp increase in the histogram curve of 'baboon' for the perceptual coder at the compression ratio 90:1, and subjective inspection likewise shows a sudden quality degradation for 'baboon' using the perceptual coder at 90:1. Overall, the mean distance measure gives a fair assessment of image quality, but the maximum distance and histogram measures can provide useful additional information for a more accurate quality assessment.


Figure 4.5: Distance Map for 'baboon'


Figure 4.6: Mean Distance Measure from the Distance Map (panels: mean distance versus compression ratio for Flower, Baboon, Lenna and Pepper; curves: JPEG, BWC and PWC)

Figure 4.7: Maximum Distance Measure from the Distance Map (panels: maximum distance versus compression ratio for Flower, Baboon, Lenna and Pepper; curves: JPEG, BWC and PWC)


Figure 4.8: Histogram Bin (90% of Maximum) from the Distance Map (panels: number of points exceeding 90% of the maximum distance versus compression ratio for Flower, Baboon, Lenna and Pepper; curves: JPEG, BWC and PWC)

Chapter 5

Conclusion

5.1 Contributions

The focus of this thesis was to explore the use of human visual characteristics in image compression and quality assessment. A perceptual wavelet coder was developed to satisfy a wide range of requirements. It consists of four major components: the wavelet transform, the perceptual model, the quantizer and the entropy coder.

For the wavelet transform, the selection of the number of decomposition levels and of the wavelet filters was investigated. We found that five levels of decomposition are suitable for our purpose. We included some of the best-known filters for image compression in our wavelet coder, and the performance differences between these filters were found to be mainly image-dependent. The perceptual model was designed to include several well-known characteristics of the HVS: contrast sensitivity, contrast masking, luminance and texture masking, and probability summation. The model generates a weighting factor for each subband coefficient according to its visual importance. The perceptual distortion is then calculated from the weighted coefficients. A bit allocation algorithm was used to determine the quantization step for each subband by minimizing the perceptual error of the reconstructed image. Multi-rate quantization was used in conjunction with an adaptive arithmetic coder to generate fully embedded bit streams, so that scalability can be achieved. The perceptual wavelet coder was compared with two other coders: a wavelet coder without the perceptual model, and the JPEG coder. The subjective results showed that at low compression ratios, the JPEG coder and the wavelet coders perform comparably; in fact, in many cases the results can be considered perceptually lossless under the specified viewing conditions. As the compression ratio increases, the perceptual quality of the JPEG coder falls off quickly, but the wavelet coders can still maintain reasonable quality at high compression ratios. Between the two wavelet coders, the perceptual coder generally demonstrated better visual quality than the wavelet coder without the perceptual model. These results justify the use of a perceptual model in image compression. Results obtained from perceptually lossless compression of high-resolution ILIAS images also showed that the perceptual wavelet coder can achieve a higher compression ratio at which the images are still considered perceptually lossless. An option was included in the coder to allow a specified region of interest to be compressed at a desired perceptual quality.

To investigate the use of a vision model in image quality assessment, a vision model based on Sarnoff's visual discrimination model was implemented and applied to the compressed images from the previous experiments. The 2-D distance maps produced by the model gave fairly accurate predictions of where the distortion is more noticeable to human eyes. Three numerical measures were extracted from the distance map: the mean distance, the maximum distance, and the number of points exceeding 90% of the maximum distance. The results showed that a reasonable assessment can be made from the mean distance measure, but with additional information from the other two measures, a more accurate judgement can be made.


5.2 Future Research

Both perceptual coding of visual information and perceptual quality assessment have important implications in many applications; therefore, the topic is worth exploring further. Some suggestions for future research are listed below.

• Much work remains to be done in designing an accurate human vision model. From the point of view of vision science, extensive psychovisual experiments are required to establish a better understanding of the HVS. From the point of view of image processing, better schemes for incorporating the vision model into coding can potentially improve the perceptual quality.

• The correlation between the color components was not fully exploited in this thesis. It may prove beneficial to incorporate it into the vision model. It is also worthwhile to investigate other color spaces.

• Higher compression ratios might be achieved using a hierarchical vector quantization scheme in the wavelet coder. It would also be interesting to investigate how to extend our perceptual scheme from scalar quantization to vector quantization.

• Finally, our perceptual wavelet coding scheme for still images can be extended to video compression. Two possible structures can be used for video coding. In the hybrid structure, the intra-frames are coded using the wavelet scheme, and the inter-frames are estimated from the intra-frames by motion estimation and compensation. Another possible video coding structure is the 3-D extension of wavelet coding to the temporal domain. The HVS in the temporal domain would also need to be considered.

Appendix A

Color Space Conversion

The Y'CbCr color space is defined in ITU-R Recommendation 601. Conversion from Y'CbCr to standard CIE 1931 XYZ tristimulus values requires two linear transformations and a gamma correction. Y'CbCr coding uses 8 bits for each component: Y' is coded with an offset of 16 and an amplitude range of 219, while Cb and Cr are coded with an offset of 128 and an amplitude range of ±112. The extremes of the coding range are reserved for synchronization and signal-processing headroom, which requires clipping prior to conversion. Nonlinear R'G'B' values in the range [0, 1] are computed from the clipped Y'CbCr values by the first linear transformation. Gamma correction in Equation 2.40 is applied to R'G'B' to obtain linear RGB values. For display with standard phosphors, these linear RGB values can be converted to CIE 1931 XYZ tristimulus values by the second linear transformation.
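The first linear transformation is not reproduced in this copy. As an assumption about its form, the sketch below uses the standard ITU-R BT.601 luma weights (Kr = 0.299, Kb = 0.114) for 8-bit data with the offsets and ranges quoted above, including the clipping step. The gamma correction of Equation 2.40 and the RGB-to-XYZ matrix are not included, and the function name is hypothetical.

```python
import numpy as np


def ycbcr601_to_rgb_prime(y, cb, cr):
    """8-bit studio-range Y'CbCr (BT.601) to nonlinear R'G'B' in [0, 1]."""
    # Clip to the nominal coding range (headroom and footroom are reserved).
    y = np.clip(np.asarray(y, dtype=float), 16.0, 235.0)
    cb = np.clip(np.asarray(cb, dtype=float), 16.0, 240.0)
    cr = np.clip(np.asarray(cr, dtype=float), 16.0, 240.0)
    yn = (y - 16.0) / 219.0   # luma: offset 16, range 219 -> [0, 1]
    pb = (cb - 128.0) / 224.0  # chroma: offset 128, range +/-112 -> [-0.5, 0.5]
    pr = (cr - 128.0) / 224.0
    r = yn + 1.402 * pr
    g = yn - 0.344136 * pb - 0.714136 * pr
    b = yn + 1.772 * pb
    # Out-of-gamut combinations are clipped back into [0, 1].
    return np.clip(np.stack([r, g, b], axis=-1), 0.0, 1.0)


print(ycbcr601_to_rgb_prime(235, 128, 128))  # nominal white: [1. 1. 1.]
```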


Bibliography

[1] A. J. Ahumada and H. A. Peterson, "Luminance-Model-Based DCT Quantization for Color Image Compression," Human Vision, Visual Processing, and Digital Display III, pp. 365-374, 1992.

[2] A. J. Ahumada, "Computational Image Quality Metrics: A Review," Society for Information Display International Symposium, Digest of Technical Papers, pp. 305-308, 1993.

[3] A. J. Ahumada and C. H. Null, "Image Quality: A Multi-Dimensional Problem," Digital Images and Human Vision, MIT Press, pp. 141-148, 1993.

[4] M. Ardito, M. Gunetti, and M. Visca, "Preferred Viewing Distance and Display Parameters," MOSAIC Handbook, pp. 165-151, 1996.

[5] P. J. Burt and E. H. Adelson, "The Laplacian Pyramid as a Compact Image Code," IEEE Trans. on Communications, vol. COM-31, no. 4, pp. 532-540, April 1983.

[6] Y. T. Chan, "Wavelet Basics," Kluwer Academic Publishers, 1995.

[7] C. H. Chou and Y. C. Li, "A Perceptually Tuned Subband Image Coder Based on the Measure of Just-Noticeable-Distortion Profile," IEEE Trans. on Circuits and Systems for Video Technology, vol. 5, pp. 467-476, Dec. 1995.

[8] R. J. Clark, "Digital Compression of Still Images and Videos," Academic Press, 1993.

[9] S. Daly, "The Visible Differences Predictor: An Algorithm for the Assessment of Image Fidelity," Digital Images and Human Vision, pp. 179-206, Cambridge, MA: MIT Press, 1993.

[10] I. Daubechies, "Ten Lectures on Wavelets," SIAM, 1992.

[11] M. D'Zmura and B. Singer, "Spatial Pooling of Contrast Gain Control," J. Opt. Soc. Amer. A, vol. 13, no. 11, pp. 2135-2140, 1996.

[12] M. P. Eckert and A. P. Bradley, "Perceptual Quality Metrics Applied to Still Image Compression," Signal Processing, vol. 70, no. 3, pp. 177-200, 1998.

[13] G. Folland, "Harmonic Analysis in Phase Space," Princeton University Press, 1989.

[14] W. T. Freeman and E. H. Adelson, "The Design and Use of Steerable Filters," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 13, no. 9, pp. 891-906, September 1991.

[15] D. J. Granrath, "The Role of Human Visual Models in Image Processing," Proceedings of the IEEE, vol. 69, pp. 552-561, 1981.

[16] C. F. Hall and E. L. Hall, "A Nonlinear Model for the Spatial Characteristics of the Human Visual System," IEEE Trans. on Systems, Man, and Cybernetics, vol. 7, pp. 161-170, March 1977.

[17] N. Jayant, J. Johnston, and R. Safranek, "Signal Compression Based on Models of Human Perception," Proceedings of the IEEE, vol. 81, no. 10, pp. 1385-1422, 1993.

[18] D. H. Kelly, "Motion and Vision. II. Stabilized Spatio-Temporal Threshold Surface," J. Opt. Soc. Amer., vol. 69, no. 10, pp. 1340-1349, 1979.

[19] Y. H. Kim and J. Modestino, "Adaptive Entropy-Coded Subband Coding of Images," IEEE Trans. on Image Processing, vol. 1, pp. 31-48, January 1992.

[20] J. Lubin, "The Use of Psychophysical Data and Models in the Analysis of Display System Performance," Digital Images and Human Vision, MIT Press, pp. 163-178, 1993.

[21] E. Majani, "Biorthogonal Wavelets for Image Compression," Proc. SPIE, VCIP-94, 1994.

[22] S. G. Mallat, "Multifrequency Channel Decomposition of Images and Wavelet Models," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 12, pp. 2091-2110, December 1989.

[23] A. N. Netravali and B. G. Haskell, "Digital Pictures: Representation and Compression," New York: Plenum, 1988.

[24] J. E. Odegard and C. S. Burrus, "Smooth Biorthogonal Wavelets for Applications in Image Compression," Proceedings of DSP Workshop, Norway, September 1996.

[25] E. Peli, "Contrast in Complex Images," J. Opt. Soc. Amer. A, vol. 7, no. 10, pp. 2032-2040, 1990.

[26] W. Pennebaker and J. Mitchell, "JPEG Still Image Data Compression Standard," Van Nostrand, 1993.

[27] Ido Rabinovitch, "High Quality Image Compression Using the Wavelet Transform," Master's Thesis, University of Toronto, 1996.

[28] R. J. Safranek and J. D. Johnston, "A Perceptually Tuned Subband Image Coder with Image Dependent Quantization and Post-Quantization Data Compression," IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 1943-1948, 1989.

[29] A. Said and W. A. Pearlman, "A New, Fast, and Efficient Image Codec Based on Set Partitioning in Hierarchical Trees," IEEE Trans. on Circuits and Systems for Video Technology, vol. 6, pp. 243-250, June 1996.

[30] J. Shapiro, "Embedded Image Coding Using Zerotrees of Wavelet Coefficients," IEEE Trans. on Signal Processing, vol. 41, pp. 3445-3462, Dec. 1993.

[31] Y. Shoham and A. Gersho, "Efficient Bit Allocation for an Arbitrary Set of Quantizers," IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 36, no. 9, pp. 1445-1453, September 1988.

[32] K. T. Soon, K. K. Pang, and K. Y. Ngan, "Classified Perceptual Coding with Adaptive Quantization," IEEE Trans. on Circuits and Systems for Video Technology, vol. 6, no. 4, pp. 375-388, 1996.

[33] D. Taubman and A. Zakhor, "Multirate 3-D Subband Coding of Video," IEEE Trans. on Image Processing, vol. 3, no. 5, pp. 572-588, September 1994.

[34] P. C. Teo and D. J. Heeger, "Perceptual Image Distortion," IEEE International Conference on Image Processing, pp. 982-986, 1994.

[35] P. N. Topiwala (editor), "Wavelet Image and Video Compression," Kluwer Academic Publishers, 1998.

[36] P. Vaidyanathan, "Multirate Systems and Filter Banks," Prentice-Hall, 1993.

[37] O. Rioul and M. Vetterli, "Wavelets and Signal Processing," IEEE Signal Processing Magazine, pp. 14-38, October 1991.

[38] J. Villasenor et al., "Filter Evaluation and Selection in Wavelet Image Compression," Proc. Data Compression Conference, IEEE, pp. 351-360, March 1994.

[39] A. B. Watson, "Detection and Recognition of Simple Spatial Forms," Physical and Biological Processing of Images, 1983.

[40] A. B. Watson, "DCTune: A Technique for Visual Optimization of DCT Quantization Matrices for Individual Images," Society for Information Display Digest of Technical Papers XXIV, pp. 946-949, SPIE, 1993.

[41] A. B. Watson, G. Y. Yang, J. A. Solomon, and J. Villasenor, "Visibility of Wavelet Quantization Noise," IEEE Trans. on Image Processing, vol. 6, no. 8, August 1997.

[42] G. Westheimer, "The Eye as an Optical Instrument," Handbook of Perception and Human Performance, vol. 1, chapter 4, John Wiley & Sons, 1986.

[43] E. Whittaker, "On the Functions which are Represented by the Expansions of the Interpolation Theory," Proc. Royal Soc. Edinburgh, Section A 35, pp. 181-194, 1915.

[44] S. Winkler, "Issues in Vision Modeling for Perceptual Video Quality Assessment," Signal Processing, vol. 78, pp. 231-252, 1999.

[45] J. Woods and S. O'Neill, "Subband Coding of Images," IEEE Trans. on Acoustics, Speech, Signal Processing, vol. 34, pp. 1278-1288, October 1986.
