Perceptual Wavelet Coding and
Quality Assessment
for Still Image
Shu-Yu Zhu
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Graduate Department of Electrical and Computer Engineering University of Toronto
© Copyright by Shu-Yu Zhu, 2000
National Library of Canada / Bibliothèque nationale du Canada
Acquisitions and Bibliographic Services / Acquisitions et services bibliographiques
395 Wellington Street, Ottawa ON K1A 0N4, Canada
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Perceptual Wavelet Coding and Quality Assessment
for Still Image
Shu-Yu Zhu
Master of Applied Science, 2000
Graduate Department of Electrical and Computer Engineering
University of Toronto
Abstract
This thesis investigates the use of human perceptual models to improve the performance
of image compression and to provide objective quality measures more meaningful than
the traditional mean-square-error (MSE) or the peak signal-to-noise ratio (PSNR).
A perceptual wavelet coder is developed to satisfy a wide range of requirements, from
perceptually lossless quality to high compression ratio. A perceptual model is designed
to allow the coder to allocate bits for each subband based on minimizing the overall
perceptual distortion. An option is included to allow a region of interest to be compressed
at a desired perceptual quality. The coder achieves scalability by multi-rate quantization
and entropy coding. Results demonstrated better performance of the perceptual wavelet
coder than of a JPEG coder and a wavelet coder without a perceptual model.
A vision model is implemented for perceptual quality assessment. The model showed
fairly accurate prediction of where the distortion is more noticeable to human eyes.
The numerical measures generated from the model showed more accurate assessment
than MSE and PSNR.
Acknowledgements
This thesis would not be possible without the help of many people, to whom I would like
to express my appreciation here. First I would like to thank Professor Venetsanopoulos
for introducing me to the area of multimedia and for providing me with the resources
and guidance for my research. I would also like to thank Professor Plataniotis for his
insightful advice and suggestions. This research is affiliated with MAS corporation,
and I would like to thank Dr. Samuel Zhou for giving me the chance to work on industry
projects and for always being encouraging and helpful.
My two years of Master's study would not have been as interesting without the friendship of
the people in the Communications group. I would especially like to thank Salima, Eddy,
Li-Wei, Ryan, Wing-Chung, and Kelvin. Your fun-loving spirit and warmheartedness
always cheer me up. I would also like to thank my friend John, whose support and good
will have always been my source of motivation.
Finally I would like to thank my parents, to whom this thesis is dedicated. I would
not be where I am without them. You are the greatest parents in the world.
Contents
Abstract
Acknowledgements
List of Tables
List of Figures
1 Introduction
1.1 Significance of the Research
1.2 Review of Previous Work
1.3 Research Goals and Directions
1.4 Thesis Organization
2 Background
2.1 Overview on Image Compression
2.1.1 Lossless Compression
2.1.2 Quantization
2.1.3 Lossy Compression
2.2 Wavelet Transform and Image Compression
2.2.1 Space-Frequency Localization
2.2.2 The Continuous Wavelet Transform and Wavelet Bases
2.2.3 Multiresolution Analysis and Filter Banks
2.2.4 Common Wavelet Schemes
2.3 Human Visual System and Perceptual Model
2.3.1 Quality Factors
2.3.2 Optics
2.3.3 Color Space
2.3.4 Contrast Sensitivity
2.3.5 Masking
2.3.6 Multi-resolution Structure
2.3.7 Error Summation
2.3.8 Psychovisual Validation
3 Perceptual Image Codec
3.1 Overview of the Algorithm
3.2 Wavelet Transform
3.3 Perceptual Model
3.3.1 Contrast Threshold Function
3.3.2 Luminance and Texture Masking
3.3.3 Contrast Masking
3.3.4 Perceptual Distortion Metric
3.3.5 Region of Interest Quantization
3.4 Multi-layer Quantizer and Entropy Coder
3.5 Experimental Results
4 Quality Assessment Using Vision Model
4.1 Vision Model
4.1.1 Optics
4.1.2 Sampling
4.1.3 Bandpass Contrast Responses
4.1.4 Oriented Responses
4.2 Transducer
4.2.1 Distance Metric
4.3 Experimental Result
5 Conclusion
5.1 Contributions
5.2 Future Research
Appendix A
Bibliography
List of Tables
2.1 Examples of Visual Resolution for Various Displays[41]
3.1 Filter Coefficients (all coefficients start at zero; for biorthogonal filters, the
first row is the analysis filter and the second row is the synthesis filter, and they
are symmetric about zero)
3.2 Basis Function Magnitudes $A_{\lambda,\theta}$ for 6 levels of an Antonini 9/7 DWT
List of Figures
2.1 Block Diagram for Lossy Compression
2.2 Space-Frequency Localization for (a) local Fourier bases, and (b) wavelet bases[35]
2.3 Wavelet Functions
2.4 The Elementary Haar Scaling Function and Wavelet
2.5 A Two-Channel Perfect Reconstruction Filter Bank
2.6 A 1-D Octave-Band Decomposition
2.7 A 2-D Octave-Band Decomposition
2.8 Parent-Child Dependencies of Subbands
2.9 Point Spread Function of the Human Eye[42]
2.10 Sensitivity of the Three Types of Cone
2.11 Brightness Adjustment Curve
3.1 Block Diagram of the Perceptual Image Codec
3.2 Subband Indexing (λ, θ)
3.3 Estimated Threshold for Y (bottom), Cr (middle), and Cb (top)[41]
3.4 Nonlinear Transducer Function
3.5 Rate-Distortion Curves with Optimal Solution
3.6 Bit Allocation Tree
3.7 Original 'lenna'
3.8 Original 'flower'
3.9 Original 'pepper'
3.10 Original 'baboon'
3.11 PSNR Curves for All Three Coders
3.12 Reconstructed 'flower' Using JPEG Coder (90:1)
3.13 Reconstructed 'flower' Using Perceptual Wavelet Coder (90:1)
3.14 Reconstructed 'baboon' Using JPEG Coder (80:1)
3.15 Reconstructed 'baboon' Using Perceptual Wavelet Coder (80:1)
3.16 Reconstructed 'lenna' Using Basic Wavelet Coder (50:1)
3.17 Reconstructed 'lenna' Using Perceptual Wavelet Coder (50:1)
3.18 Reconstructed 'flower' Using Basic Wavelet Coder (90:1)
3.19 Reconstructed 'flower' Using Perceptual Wavelet Coder (90:1)
3.20 Reconstructed 'pepper' Using Basic Wavelet Coder (80:1)
3.21 Reconstructed 'pepper' Using Perceptual Wavelet Coder (80:1)
3.22 Reconstructed 'baboon' Using Basic Wavelet Coder (80:1)
3.23 Reconstructed 'baboon' Using Perceptual Wavelet Coder (80:1)
3.24 Reconstructed 'lenna' Using Perceptual Wavelet Coder Without ROI (80:1)
3.25 Reconstructed 'lenna' Using Perceptual Wavelet Coder With ROI (80:1)
4.1 Flow Diagram of the Vision Model[20]
4.2 Distance Map for 'lenna'
4.3 Distance Map for 'flower'
4.4 Distance Map for 'pepper'
4.5 Distance Map for 'baboon'
4.6 Mean Distance Measure from the Distance Map
4.7 Maximum Distance Measure from the Distance Map
4.8 Histogram Bin (90% of Maximum) from the Distance Map
Chapter 1
INTRODUCTION
1.1 Significance of the Research
Over the past few decades, an increasing number of multimedia applications have been emerging
rapidly with the advent of digital communication and the internet. Efficient representation
of digital signals becomes an enabling technology in this new digital era. Of the various
types of data transferred over networks, images comprise the bulk of the traffic. It is
currently estimated that image data transfer takes up over 90% of the volume on the
internet[35]. The ability to store, transmit and process digital images is usually limited
by disk space, available bandwidth, and processor speed. Even with the tremendous
advancement in computer hardware and network systems, it is still impractical to deal with
uncompressed images. For instance, transmitting one second of an HDTV (high definition
TV) video sequence at a resolution of 2k x 1k with 24 bits/pixel and 30 frames/sec
takes about 1.3 Gb. In other words, with a conventional modem of 56 kb/s, it would
take more than seven hours to transmit one second of such video, and the most recent
DVD (digital versatile disk) of 5 GB can only hold about 30 seconds of the video.
Other applications such as video conferencing, remote sensing, medical imaging, facsimile
transmission, digital cameras, internet transmission and browsing, etc. all rely upon
compression.
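The HDTV figures above can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope illustration: it assumes that "2k x 1k" means 2000 x 1000 pixels and that decimal prefixes are used, neither of which the text states. Under those assumptions the raw rate is 1.44 x 10^9 bits per second of video, which is about 1.3 Gb if a gigabit is taken as 2^30 bits.

```python
# Back-of-the-envelope check of the uncompressed HDTV figures.
# Assumptions (not stated in the text): 2k x 1k = 2000 x 1000 pixels,
# decimal prefixes for the modem rate and the disk capacity.
WIDTH, HEIGHT = 2000, 1000
BITS_PER_PIXEL = 24
FRAMES_PER_SEC = 30

# Raw bits needed for one second of video.
bits_per_second = WIDTH * HEIGHT * BITS_PER_PIXEL * FRAMES_PER_SEC

# Time to push one second of video through a 56 kb/s modem, in hours.
MODEM_BPS = 56_000
modem_hours = bits_per_second / MODEM_BPS / 3600

# Seconds of raw video that fit on a 5 GB disk.
DISK_BITS = 5 * 10**9 * 8
disk_seconds = DISK_BITS / bits_per_second

print(f"raw rate: {bits_per_second / 1e9:.2f} x 10^9 bits per second of video")
print(f"56 kb/s modem: {modem_hours:.1f} hours per second of video")
print(f"5 GB disk: {disk_seconds:.0f} seconds of video")
```

Changing the assumed resolution to 2048 x 1024 shifts the numbers by only a few percent, so the orders of magnitude do not depend on that choice.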
Traditionally, compression is mainly used in telecommunication applications where
the emphasis is placed on high compression ratio. Today, compression is also used in
many commercial products, such as interactive HDTV, graphic arts archives, etc., in
which image quality is the main consideration. Since the human is the ultimate observer
of the images, compression algorithms that take advantage of human vision models allow
allocation of bits to the signals that are most meaningful to the human visual system (HVS),
and thus lead to better perceptual quality. A vision model is also helpful in image quality
assessment, since the traditional measures of image quality, mean square error (MSE) and
peak signal-to-noise ratio (PSNR), do not reflect the subjective perception of human observers.
1.2 Review of Previous Work
Subband coding has attracted a lot of attention in recent years because it provides better
performance than DCT-based methods, and it is able to achieve full scalability. The idea
behind subband coding is to decompose the signal into frequency subbands that can then be
encoded either independently or jointly. The decomposition leads to energy compaction,
and therefore, with careful design of the quantization, the subband coefficients can be greatly
compressed, providing spatial and bitstream scalability naturally. Earlier work on subband
coding for image compression can be found in [19][45]. Shapiro's embedded zerotree
wavelet (EZW) coder[30] demonstrates that fully embedded codes can be generated using
subband coding. EZW is also among the first to exploit the relationship between parent
and child subbands. It inspired several embedded subband coders. The set partitioning
in hierarchical trees (SPIHT) algorithm[29] further exploits the parent-child relationship
and zerotree structure.
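The energy compaction mentioned above can be seen on a toy example. The following is a minimal sketch, assuming a one-level orthonormal Haar split as the subband decomposition; the Haar filter and the sample values are chosen here purely for illustration and are not the filters or data used by the coders cited in this section.

```python
import math

def haar_split(signal):
    """One level of an orthonormal Haar analysis on an even-length signal.

    Returns (lowpass, highpass) subbands, each half the input length.
    """
    s = 1 / math.sqrt(2)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return low, high

def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)

# Hypothetical smooth row of pixel values.
signal = [10, 11, 12, 12, 13, 15, 16, 16]
low, high = haar_split(signal)
total = energy(signal)

print(f"low band share of energy:  {energy(low) / total:.4f}")
print(f"high band share of energy: {energy(high) / total:.4f}")
```

Because the transform is orthonormal, the two subband energies add up to the energy of the input. For a smooth signal nearly all of it lands in the low band, so the high band can be quantized coarsely at little cost.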
Vision science was first introduced to image processing in the late 70's[15][16]. However,
since at that time the knowledge of the HVS was limited, the models could not
interpret human perception very well. Recently several objective quality assessment
algorithms using vision models have been proposed[3][9][41]. Safranek and Johnston[28]
introduced a quantization noise detection model in subband coding. A similar approach
was taken by Watson for DCT-based coding[40]. A review on perceptual optimization
schemes can be found in [32].
1.3 Research Goals and Directions
One of the main objectives of this research is to develop an image compression scheme
that can be used for a wide range of applications, from high-end commercial applications,
where no distortion can be tolerated, to internet and wireless types of applications, where
bandwidth is highly restricted. Therefore the compression scheme should be able to
achieve both perceptually lossless quality at medium compression ratios and reasonable
perceptual quality at very high compression ratios. It should also provide scalability, a
highly desirable property for applications that require image transmission. With these
requirements in mind, wavelet coding stands out as a good choice for our scheme for
several reasons: it can achieve much higher compression ratios than DCT-based methods
without completely distorting the image; it can generate fully embedded codes so that
scalability can be achieved easily; and it provides efficient ways of incorporating human
perceptual models in the compression scheme because both the wavelet transform and the HVS
are well localized in space and frequency. Although some studies show that wavelet
coding does not perform as well as the popular JPEG scheme at low compression ratios,
by incorporating a good perceptual model, the wavelet coding scheme has the potential
to outperform the JPEG scheme even at low compression ratios. Therefore, in our study we
design a perceptual model that includes some of the well-known visual characteristics. In
some proposed perceptual compression schemes, the perceptual model used in the encoder
is duplicated in the decoder to extract the weighting factor applied to each coefficient.
The disadvantage of these approaches is that embedded coding cannot be achieved
since the coefficients to be decoded are interdependent, which prevents layered coding.
Therefore, to efficiently incorporate the perceptual model in our compression scheme
without sacrificing the embedded coding, a weighting factor for each subband coefficient is
generated according to the perceptual importance of that coefficient. Then the perceptual
distortion can be calculated so that the quantization steps for each subband are assigned
in a way that minimizes the perceptual distortion instead of the mean square error. Since
the codec design is asymmetric, the decoder does not need to know the perceptual model
to reconstruct the image; a general-purpose wavelet decoder without added complexity
can be used. This has an advantage in applications where real-time decoding is required.
In some applications, users may be interested in a specific region of the image and would
like it to maintain a certain level of quality. Therefore, an option is added to our coding
scheme so that the region of interest can be encoded separately to satisfy a perceptual
quality.
Since an accurate image quality assessment is important in designing and evaluating
compression schemes, another objective of our research is to investigate the use of a vision
model in image quality assessment. The vision model used for quality assessment is also
based on the fundamental human visual characteristics, but it can be more elaborate than
the ones used for compression since there are fewer constraints such as complexity, scalability,
and compatibility with coding modules. We would like to first establish the validity of
using a vision model in evaluating image quality; then we would try to determine some
numerical measures for assessing image quality in place of MSE and PSNR. This model will
also be used to evaluate the images compressed using our perceptual wavelet coder. The
visual discrimination model developed by David Sarnoff Research Center[20] is chosen as
the base model for our implementation.
1.4 Thesis Organization
This thesis is organized as follows. Chapter 2 provides some background on image
compression, wavelet theory, and the human visual system. The chapter first gives an overview
of image compression, then it explains some of the fundamentals and principles of
wavelet theory and how wavelet techniques can apply to image compression, and finally
it summarizes some of the human visual properties and how they can be incorporated in
compression and quality assessment. Chapter 3 introduces a perceptual wavelet scheme
for image compression. A detailed description of each component is given, and the
experimental results are presented. Chapter 4 presents a vision model for image quality
assessment. This model is used to evaluate the images from Chapter 3 and the results are
presented. Finally, Chapter 5 summarizes the results and contributions of the research,
and suggests some possible future directions.
Chapter 2
Background
Perceptual coding of images involves three major fields of research: digital signal
processing, information theory and vision science. This chapter provides the relevant
background from all three fields in the context of image compression. An overview on image
compression is given in Section 2.1, followed by a review on wavelet coding in Section 2.2,
and finally Section 2.3 summarizes various human visual properties and perceptual
models used in image compression and quality assessment.
2.1 Overview on Image Compression
The ultimate goal of compression is to reduce the number of bits needed to represent the
signal. Data compression techniques can be divided into two categories: lossless
compression and lossy compression. Lossless compression implies perfect reconstructability of
the original image. It has evolved from the practical application of the theoretical work of
Shannon and others on the probabilistic view of information and its representation. Lossless
compression is a highly mature field, and only incremental improvements have been achieved
in recent times. The limitation of lossless compression is its low achievable compression
ratio. For a typical natural image, one can expect a compression of about 2:1. At the
very high end, lossless compression can achieve a ratio of 4:1. Since this compression ratio
is not acceptable for most applications, researchers have looked seriously at lossy compression
in recent times. Lossy compression can offer orders of magnitude greater compression than
lossless compression. The goal with lossy compression is to achieve indiscernible loss as
interpreted subjectively by end users.
All compression techniques rely on two features in an image to achieve reduction:
redundancy and irrelevancy. Lossless compression relies only on the redundancy feature
of data, exploiting unequal symbol probabilities and symbol predictability. Lossy
compression relies on the additional feature of data: irrelevancy. A large amount of data can
be eliminated this way without significant subjective loss.
2.1.1 Lossless Compression
Lossless compression is a branch of information theory. It is part of source coding.
In source coding, a given source can achieve its entropy with an optimal code. Suppose
a source consists of a set of symbols x_i, i = 1, ..., N, whose probability of occurrence is
given by p_i; the entropy of this source is defined as:

$$H = -\sum_{i=1}^{N} p_i \log_2 p_i$$

Such a source can be encoded with H bits/sample, permitting a reconstruction of
arbitrarily small error. Though this rate can only be achieved for infinite strings of
symbols, the entropy measure serves as a benchmark and target rate for lossless
compression. A brief review of some lossless compression techniques is given in the following.
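As a quick numerical illustration of the entropy benchmark, the sketch below estimates the Shannon entropy, the negative sum of p_i log2 p_i over all symbols, from the symbol frequencies of a short message. The function name `entropy_bits` and the sample message are hypothetical, and estimating probabilities from relative frequencies is an assumption made for the example.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Shannon entropy in bits/symbol, with p_i estimated from relative frequencies."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical source: 'a' half of the time, 'b' a quarter, 'c' and 'd' an eighth each.
message = "aaaabbcd"
H = entropy_bits(message)
print(f"entropy: {H} bits/symbol")
```

A fixed-length code for four symbols would spend 2 bits per symbol on this message, so the gap between that and H is what an entropy coder can hope to recover.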
DPCM
Differential Pulse Code Modulation tries to remove the redundancy in data by predicting
the data sample from its neighboring samples and coding the prediction error:

$$e[n] = x[n] - \sum_{i=1}^{P} a[i]\, x[n-i]$$

where x[n] is the current sample, e[n] is the prediction error, P is the number of
neighboring samples used, and a[i] are the prediction coefficients, which can be derived
by regression analysis. The prediction coefficients can also adapt to the data by
learning; this is called Adaptive DPCM.
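The prediction step can be sketched in a few lines. This is a minimal lossless illustration, assuming a one-dimensional causal linear predictor with fixed coefficients and no quantizer in the loop; the function names, the sample row, and the previous-sample predictor are all hypothetical choices made for the example.

```python
def dpcm_encode(samples, coeffs):
    """Return prediction errors e[n] = x[n] - sum_i coeffs[i-1] * x[n-i].

    Samples before the start of the signal are treated as absent (contribute 0).
    """
    errors = []
    for n, x in enumerate(samples):
        pred = sum(a * samples[n - i]
                   for i, a in enumerate(coeffs, start=1) if n - i >= 0)
        errors.append(x - pred)
    return errors

def dpcm_decode(errors, coeffs):
    """Invert dpcm_encode by rebuilding each sample from its own predictions."""
    samples = []
    for n, e in enumerate(errors):
        pred = sum(a * samples[n - i]
                   for i, a in enumerate(coeffs, start=1) if n - i >= 0)
        samples.append(e + pred)
    return samples

row = [100, 102, 104, 105, 105, 107]   # hypothetical row of pixel values
coeffs = [1]                            # predict each sample by the previous one
residual = dpcm_encode(row, coeffs)
print(residual)
```

After the first sample the residual values are small and clustered around zero, which is what makes them cheaper to entropy code than the original samples.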
Huffman Coding
Huffman coding can come close to the entropy of a source when coding symbol by symbol. We
first need to know the probabilities of occurrence for each symbol in the source alphabet,
and order them in ascending order. Each symbol is placed at a separate leaf node of a tree.
Then we merge the two nodes with the smallest probabilities into one node, with the two
original nodes as children. The parent node is assigned the sum of the probabilities of the
children nodes. The children nodes are labeled 0 and 1. This procedure is then iterated until
there is a single node which has all original leaf nodes as its descendants. The new binary
symbol for each leaf node can be read sequentially down from the root node, and these
symbols are distinct by construction. Huffman coding requires prior knowledge of the
probabilities of occurrence of the various symbols, which in practice are not usually available
a priori. Inferring the symbol distributions is therefore the first task of an entropy coder.
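The merging procedure described above can be sketched with a priority queue. This is an illustrative implementation under assumptions of my own: the function name `huffman_code` is hypothetical, ties are broken by insertion order, and the example probabilities are chosen as reciprocal powers of two so that the code lengths are easy to verify by hand.

```python
import heapq
from itertools import count

def huffman_code(probabilities):
    """Build a prefix code {symbol: bitstring} from {symbol: probability}."""
    tiebreak = count()  # unique sequence numbers so the heap never compares dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single-symbol alphabet still needs one bit per symbol.
        return {sym: "0" for sym in probabilities}
    while len(heap) > 1:
        # Merge the two least probable nodes; label the branches 0 and 1.
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
print(code, avg_len)
```

For this particular distribution the average code length equals the entropy of the source; for probabilities that are not powers of one half, the average length would sit somewhat above the entropy.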
Arithmetic Coding
Huffman coding is limited to assigning an integer number of bits to each symbol. This
mechanism is rate efficient only if the symbol probabilities are exactly reciprocal powers of
two. In all other cases, the system fails to achieve the entropy rate. A non-integer number
of bits can be assigned to a symbol by encoding long strings simultaneously. Symbols
can be grouped into substrings which have densities approaching the ideal. An arithmetic
coder assigns real-valued symbol bit rates to achieve a compact encoding of the entire
message directly. Subintervals of the unit interval [0, 1] are used to represent symbols of
the code. The first letter is encoded by choosing a corresponding subinterval of [0, 1].
The length of the subinterval is chosen to equal its expected probability of occurrence.
Successive symbols are encoded by expanding the selected subinterval to a unit interval,
and choosing a corresponding subinterval. At the end of the string, a single element of the
designated subinterval is sent as the code along with an end-of-message symbol. This
scheme can achieve virtually the entropy rate since non-integer bits can be assigned to
each symbol.
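The interval-narrowing idea can be illustrated with exact rational arithmetic. The sketch below is a simplification rather than a practical coder: it uses Python's `Fraction` instead of finite-precision integers, it passes the message length to the decoder instead of sending an end-of-message symbol, and the two-symbol alphabet and all function names are hypothetical.

```python
from fractions import Fraction

def build_intervals(probabilities):
    """Map each symbol to its [low, high) slice of the unit interval."""
    intervals, low = {}, Fraction(0)
    for sym, p in probabilities.items():
        intervals[sym] = (low, low + p)
        low += p
    return intervals

def arithmetic_encode(message, probabilities):
    """Narrow [0, 1) once per symbol and return the final [low, high) interval."""
    intervals = build_intervals(probabilities)
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        s_low, s_high = intervals[sym]
        low, width = low + width * s_low, width * (s_high - s_low)
    return low, low + width

def arithmetic_decode(value, length, probabilities):
    """Recover `length` symbols from any value inside the final interval."""
    intervals = build_intervals(probabilities)
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in intervals.items():
            if s_low <= value < s_high:
                out.append(sym)
                # Expand the chosen subinterval back to the unit interval.
                value = (value - s_low) / (s_high - s_low)
                break
    return "".join(out)

probs = {"a": Fraction(3, 4), "b": Fraction(1, 4)}
low, high = arithmetic_encode("aab", probs)
decoded = arithmetic_decode(low, 3, probs)
print(low, high, decoded)
```

The width of the final interval equals the product of the symbol probabilities, so the number of bits needed to name a point inside it is close to the message's total information content. This is how a non-integer number of bits per symbol arises.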
Run-Length Coding
Through an appropriate transformation, significant compression can be obtained. Some
types of transformation can produce long runs of a single symbol (e.g. 0), and it is very
useful to code the runs of '0' by a symbol representing the length of the run. This is
run-length coding, and it is employed in most of the compression methods with high
compression ratios. Fax data is by nature suitable for this type of coding. Other types
of image require some clever transformation after which most of the data has very little
energy.
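As a small illustration, runs of zeros can be replaced by their lengths. The (zero run, value) pairing below is one common convention and is an assumption here, as are the function names.

```python
def run_length_encode(data):
    """Return ((zero_run, value) pairs, trailing_zero_count) for a list of integers."""
    pairs, run = [], 0
    for v in data:
        if v == 0:
            run += 1                 # extend the current run of zeros
        else:
            pairs.append((run, v))   # zeros seen since the last nonzero value
            run = 0
    return pairs, run

def run_length_decode(pairs, trailing):
    """Invert run_length_encode."""
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        out.append(v)
    out.extend([0] * trailing)
    return out

data = [5, 0, 0, 0, 3, 0, 0, 0, 0, 0, 1, 0, 0]
pairs, trailing = run_length_encode(data)
```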
2.1.2 Quantization
Quantization is essentially a mapping from a continuously parameterized set to a discrete
set, and dequantization is a mapping in the reverse direction. Quantization is a
non-reversible process. When the elements of the continuously parameterized set are single
real numbers (i.e. scalars), they can be represented by integers; this is called scalar
quantization. A scalar quantizer is simply a mapping from R to Z. The purpose is to
permit a finite-precision representation of data. When the elements are points in a vector
space (i.e. vectors), they can be represented by one of a fixed discrete set of vectors; this
is called vector quantization. Vector quantization is the process of discretizing a vector
space by partitioning it into cells and selecting a representative from each cell. It can
take blocks of data and assign a codeword to each block. Since the codewords are available to
the decoder, only the index of the codeword needs to be sent. Vector quantizers demonstrate
good performance in practice; however, their main drawbacks are the extensive training
required to select the codebook and the slow encoding process required to search for the
best codeword. Vector quantization is useful in applications where encoding time is not
important, such as CD applications.
Since a discrete representation is used to represent what is originally a continuous
variable, we need to measure the degree of distortion in the representation. The most
common measure is the mean-squared error:
Once a quantitative measure of distortion and all constraints are given, one can formulate
a quantization approach that minimizes the distortion.
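A uniform scalar quantizer and its mean-squared error can be sketched as below. The step size, the sample values and the function names are illustrative assumptions; the mean-squared error is computed as the average of the squared differences between the original and reconstructed samples.

```python
import math

def quantize(x, step):
    """Map a real number to the integer index of the nearest multiple of `step`."""
    return math.floor(x / step + 0.5)

def dequantize(index, step):
    """Map an integer index back to its representative real value."""
    return index * step

def mse(original, reconstructed):
    """Average squared difference between two equally long sequences."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)

samples = [0.12, 0.49, -0.77, 1.3]
step = 0.25
indices = [quantize(x, step) for x in samples]
restored = [dequantize(q, step) for q in indices]
distortion = mse(samples, restored)
```

For a uniform quantizer of this kind, the reconstruction error of each sample is bounded by half the step size, so a smaller step lowers the distortion at the cost of more indices to encode.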
2.1.3 Lossy Compression
Lossy compression usually consists of three main blocks: transform, quantizer, and encoder,
as shown in Figure 2.1. The transform decorrelates the image data and compacts
their spectral energy. The quantizer allocates bit precision to the transform coefficients,
and the encoder converts the quantized data into symbols for transmission. The well-known
JPEG compression standard is described below as an illustration of lossy compression.
Figure 2.1: Block Diagram for Lossy Compression
JPEG
JPEG is a well-known standard for still image compression[26]. It is based on a widely
used linear transform, the discrete cosine transform (DCT). The DCT belongs to a more general
class of Karhunen-Loeve decompositions, and it offers a good compromise between energy
compaction and computational complexity. The 2-D DCT pair can be expressed as
follows:
for m, n, k, l = 0, 1, ..., N - 1.
The ISO JPEG standard is specified as follows:
• The image is divided into 8x8 blocks, and the DCT is performed on each block.
• The resultant coefficients are quantized using uniform quantization whose step sizes
are specified according to a predefined quantization matrix. This quantization
controls the bit rate of the compression as well as the image degradation. The DC
coefficient receives the highest quantization precision because it contains most of
the block's energy. The AC coefficients are quantized more coarsely. The human visual
system can be incorporated in this step by determining appropriate quantization
step sizes to match the contrast sensitivity function.
• The quantized coefficients are scanned in zig-zag order to form a 1-D data string, on
which run-length coding is performed. Finally, they are entropy coded using
either Huffman or arithmetic coding.
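The first steps of this pipeline can be sketched for a single 8x8 block. This is a rough sketch under stated assumptions: the DCT uses the orthonormal scaling, the step-size matrix `steps` is a made-up example and not the JPEG quantization table, and the function names are hypothetical.

```python
import math

N = 8

def dct2(block):
    """2-D DCT-II of an N x N block, with orthonormal scaling."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for k in range(N):
        for l in range(N):
            s = 0.0
            for m in range(N):
                for n in range(N):
                    s += (block[m][n]
                          * math.cos((2 * m + 1) * k * math.pi / (2 * N))
                          * math.cos((2 * n + 1) * l * math.pi / (2 * N)))
            out[k][l] = c(k) * c(l) * s
    return out

def zigzag_indices(n=N):
    """Visit an n x n grid along anti-diagonals, alternating direction."""
    order = []
    for d in range(2 * n - 1):
        diag = [(i, d - i) for i in range(n) if 0 <= d - i < n]
        order.extend(diag if d % 2 else reversed(diag))
    return order

def quantize_and_scan(block, steps):
    """Transform, quantize uniformly with per-coefficient steps, then zig-zag scan."""
    coeffs = dct2(block)
    q = [[round(coeffs[k][l] / steps[k][l]) for l in range(N)] for k in range(N)]
    return [q[k][l] for k, l in zigzag_indices()]

# Illustrative step sizes only: finer for low frequencies, coarser for high ones.
steps = [[8 + 4 * (k + l) for l in range(N)] for k in range(N)]
flat = [[100] * N for _ in range(N)]
scan = quantize_and_scan(flat, steps)
```

For a constant block, all of the energy lands in the DC coefficient, so the scanned string is a single nonzero value followed by a long run of zeros, which is what makes the subsequent run-length coding effective.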
2.2 Wavelet Transform and Image Compression
Wavelet image compression has become a focus of research over the past few years. The
advantage of the wavelet transform in image processing over other transforms, such as
the Fourier Transform, is that the transform is well-localized in both the space and frequency
domains[6][8][22][37]. This feature allows wavelet compression to achieve better coding
gain than DCT-based methods because it leads to some dramatic simplification in image
statistics, especially for non-stationary natural images. Since the HVS is also well-localized in
both the space and frequency domains, wavelet compression has the advantage of being able
to incorporate those properties efficiently.
Wavelet coding belongs to a more general class of subband coding. The principle of
subband coding is to hierarchically decompose an image, usually with a two-channel
analysis/synthesis filter bank, until the energy is highly concentrated in the lowest frequency
subband. Tremendous energy compaction can be achieved this way, which is crucial in
image compression. Among subband filters for image coding, biorthogonal wavelets play
a prominent role. The search for the best filters among thousands of wavelet filters has led
to the discovery of several suitable biorthogonal filters for image compression[10][21][35].
This section gives a brief review of wavelet transform theory and its application to
image compression. A review of wavelet filter design is also provided. Finally, two
common wavelet coding techniques are described.
2.2.1 Space-Frequency Localization
While space and frequency are usually viewed as two different domains, it is often valuable
to represent signals in a unified space-frequency plane. This is especially true for
images, where the frequency content varies with the spatial location. The reason the wavelet
transform has generated great interest in image processing is that it provides space-frequency
localization, allowing simultaneous representation of an image in both space and frequency.
Before our discussion of the space-frequency localization of the wavelet transform, let us first
consider another widely used transform, the Fourier Transform. The 2-D Discrete Fourier
Transform (DFT) pair is defined as:
While the DFT gives a frequency representation of an MxN image, it does not provide any
spatial information. Conversely, the inverse transform provides a complete spatial
characterization of the image, but it contains no frequency content.
The general approach to developing a simultaneous space-frequency representation of
a signal is to use window functions that isolate a segment of the signal of arbitrary
length, and to perform frequency analysis on that segment[22]. The entire signal can be
represented by translating the window function. If the Fourier Transform is used for the
frequency analysis, the Short-Time Fourier Transform can be defined as:
For a window function localized around the time point s = t, this process essentially
computes the Fourier Transform of the signal in a small window around the point t. A
basic fact about the Short-Time Fourier Transform is that it is invertible: f(t) can be
recovered from $S_f(t, \omega)$ using the following inversion formula:
Equation 2.9 holds as long as the window function $\phi$ is any nonzero function in
$L^2(R)$. It is clear that the performance of this space-frequency analysis depends
on the choice of the window function. Ideally, we would like the analysis to be able to
discriminate between any two frequency components and between any two pulses in the
space domain. This is, however, not possible according to the following fundamental
theorem[13], Heisenberg's Inequality:
where $\Delta x$ is the spatial resolution and $\Delta\omega$ is the frequency resolution. The uncertainty
formula in (2.10) applies to any nonzero function in $L^2(R)$, and it states that the standard
deviations in the space and frequency domains cannot both be made arbitrarily small. Rather, there is a
trade-off between spatial and frequency resolution. Furthermore, equality in 2.10
holds if and only if the function is among the set of translated, modulated and dilated
Gaussians:
The result suggests that Gaussians, under the influence of translations, modulations, and
dilations, form the elementary building blocks for decomposing signals. Effectively, these
signals are individual packets of energy that are as concentrated as possible in space
and frequency. Another attractive property of the Gaussian function is that its Fourier
Transform is also a Gaussian function. Therefore, Gabor proposed the use of a Gaussian
window, since it is smooth in both the space and frequency domains and it offers the best
compromise between spatial and frequency resolutions.
Although the above window method achieves a simultaneous space-frequency representation,
it has some drawbacks for image compression. One of them is that $\Delta x$ and $\Delta\omega$
remain constant in the analysis. This is rather inflexible when a higher frequency
resolution or spatial resolution is desired. Also, the constant frequency resolution is not
compatible with models of the human visual system. Research has shown that the frequency
resolution is inversely proportional to the center frequency for the HVS[S]. Therefore, it is
more desirable to have high frequency resolution at low frequencies and low frequency
resolution at high frequencies. Another disadvantage of the above method is that Gaussian
functions do not form an orthonormal basis in $L^2(R)$. In other words, they are
highly redundant, and expansion of a signal in these functions generally leads to an
increase in the sampling rate, which works against compression. As will be explained in
the following sections, wavelet analysis, while using the same principle as the window
method, is able to resolve these issues. Figure 2.2 illustrates the two different types of
space-frequency localization with Fourier bases and wavelet bases.
2.2.2 The Continuous Wavelet Transform and Wavelet Bases
An alternative interpretation of the Short-Time Fourier Transform in 2.8 is that the signal
f(t) is decomposed onto translated and modulated versions of the window function $\phi(x)$.
Figure 2.2: Space-Frequency Localization for (a) local Fourier bases, and (b) wavelet
bases[33]
In analogy to the Short-Time Fourier Transform, the Continuous Wavelet Transform,
CWT, is a decomposition of a function onto translated and dilated versions of some
basic function ("mother wavelet") $\psi(x)$. More specifically, each wavelet function can be
expressed as:
$$\psi_{s,u}(x) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{x-u}{s}\right)$$
The parameter u is the amount of translation and the parameter s is the scale factor,
which controls the size of the analysis window. When the scale factor is small, the detailed,
non-stationary behavior of the signal can be captured, and as the scale s increases,
the impulse response of $\psi_{s,u}$ spreads out in space and detects the global behavior of the
signal. Figure 2.3 gives an example of a mother wavelet function and two of its dilated
versions.
Figure 2.3: Wavelet Functions: (a) s < 1, (b) mother wavelet, (c) s > 1
The wavelet decomposition can be expressed as the inner product of the function f(x)
with the wavelet bases in $L^2(R)$:
$$CWT(s, u) = \langle f(x), \psi_{s,u}(x) \rangle = \frac{1}{\sqrt{s}} \int f(x)\,\psi\!\left(\frac{x-u}{s}\right) dx \qquad (2.13)$$
Given some mild conditions on the analyzing function $\psi(x)$, namely that it is absolutely
integrable and has mean zero, there is an inversion formula available. f(x) can be represented
as:
$$f(x) = \frac{1}{C_\psi} \int_{-\infty}^{\infty} \int_{0}^{\infty} CWT(s, u)\,\frac{1}{\sqrt{s}}\,\psi\!\left(\frac{x-u}{s}\right) \frac{1}{s^2}\, ds\, du$$
As is evident in Equation 2.13, the wavelet transform can also be thought of as applying a
bandpass filter with impulse response $\frac{1}{\sqrt{s}}\psi\!\left(-\frac{x}{s}\right)$ to the input signal f(x) at location u[6].
This interpretation is important in the implementation of the wavelet transform, as will be
discussed later.
Since the parameters s and u in the CWT are continuous, decomposition onto these
functions will generate a lot of redundancy. As we know from the remarkable Sampling
Theorem discovered by Whittaker[43], a bandlimited function can be perfectly reconstructed
from the values of that function at the integer points. There is an analogous sampling
theorem for the wavelet transform, where a discrete set of wavelet functions can represent
f(t). This discrete set of wavelet functions can be obtained by discretizing the parameters
s and u. A useful sampling grid is the dyadic sampling grid, where the parameters s and
u take on the following values:
$$s = 2^{-m}, \quad u = n\,2^{-m}; \quad m, n \text{ integer} \qquad (2.15)$$
The wavelet functions become:
$$\psi_{m,n}(x) = 2^{m/2}\,\psi(2^m x - n)$$
The next step in constructing the wavelet transform is finding a function $\psi(x)$ so that
the $\psi_{m,n}(x)$ form an orthonormal basis. An orthonormal basis of wavelet functions was first
discovered in 1910 by Haar. His early example was:
$$\psi(x) = \begin{cases} 1, & 0 \le x < 1/2 \\ -1, & 1/2 \le x < 1 \\ 0, & \text{otherwise} \end{cases}$$
The function has zero mean and is absolutely integrable, so it is an admissible wavelet for
the CWT. The Haar wavelet function has compact support in the spatial domain, but it is
discontinuous and therefore not differentiable. Also, the Haar wavelet function has
non-compact support in the frequency domain, which leads to poor frequency localization.
It is desirable to find wavelet functions that have compact support in the spatial domain,
which enables an FIR filter implementation, and for which the FIR filter is regular.
Regularity means that the filter sequence converges to a continuous and differentiable
function $\psi(x)$[37]. A sporadic set of wavelet bases was discovered in the mid-1980s, and
a fundamental theory was then developed that nicely encapsulated almost all wavelet bases and made
the application of wavelets more intuitive. It is called Multiresolution Analysis, which
will be discussed next.
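The dilation, translation and dyadic sampling of a mother wavelet can be checked numerically for the Haar case. The sketch below is illustrative only: the helper names (`haar`, `wavelet`, `dyadic`, `inner`) are hypothetical, and the inner product is approximated by a midpoint sum on a grid chosen so that the jumps of the Haar function fall on cell boundaries.

```python
import math

def haar(x):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0

def wavelet(s, u):
    """Dilated and translated wavelet: (1/sqrt(s)) * psi((x - u) / s)."""
    return lambda x: haar((x - u) / s) / math.sqrt(s)

def dyadic(m, n):
    """Wavelet on the dyadic grid: s = 2**-m, u = n * 2**-m."""
    return wavelet(2.0 ** -m, n * 2.0 ** -m)

def inner(f, g, a=-8.0, b=8.0, cells=16000):
    """Midpoint-rule approximation of the inner product of f and g on [a, b]."""
    h = (b - a) / cells
    return sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h) for k in range(cells)) * h

w = wavelet(4.0, 1.0)
norm_sq = inner(w, w)                           # the scaling by 1/sqrt(s) keeps unit norm
mean = inner(w, lambda x: 1.0)                  # admissible wavelets have zero mean
across_scales = inner(dyadic(0, 0), dyadic(1, 0))
across_shifts = inner(dyadic(1, 0), dyadic(1, 1))
```

The vanishing inner products across scales and across shifts are two instances of the orthonormality that makes the dyadic Haar family a basis.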
2.2.3 Multiresolution Analysis and Filter Banks
Multiresolution Analysis (MRA) is a method of decomposing an image into a hierarchy of
resolution levels[22]. The theory can be formulated as follows[10]:
Given a tower of subspaces $V_k \subset L^2(R)$ such that
1. $V_k \subset V_{k+1}$ for all k;
2. $\bigcap_k V_k = \{0\}$;
3. $\bigcup_k V_k$ is dense in $L^2(R)$;
4. $f(t) \in V_k \Leftrightarrow f(2t) \in V_{k+1}$;
5. There exists a $\phi \in V_0$ such that $\{\phi(t - n) : n \text{ integer}\}$ is an orthonormal
basis of $V_0$.
Then there exists a function $\psi(t) \in V_1$ such that $\{\psi_{mn}(t) = 2^{m/2}\,\psi(2^m t - n) : m, n \text{ integer}\}$
is an orthonormal basis for $L^2(R)$.
By axioms 1 to 3 we are given a tower of subspaces of the whole function space,
corresponding to the full structure of the signal space. In the idealized space $L^2(R)$, all the
detailed texture of images can be represented, whereas in the intermediate spaces $V_k$
only the details up to a fixed granularity are available. Axiom 4 states that the structure
of the details is the same at each scale, and it is only the granularity that changes; this
is the multiresolution aspect. Finally, axiom 5 states that the intermediate space $V_0$, and
hence all spaces $V_k$, have a simple basis given by translations of a single function.
A simple example using the Haar wavelet illustrates the above axioms. Let $V_0 = \{ f :
f|_{[n,n+1)} = \text{constant},\ n \text{ integer} \}$, with $\phi(t) = \mathrm{rect}_{[0,1)}(t)$. It is clear that $\{\phi(t - n) : n \text{ integer}\}$
forms an orthonormal basis for $V_0$. All the other subspaces are defined from $V_0$ by axiom
4. The spaces $V_k$, $k \to \infty$, are made up of functions that are constant on smaller and
smaller dyadic intervals. Let $\phi_{kn}(t) = 2^{k/2}\,\phi(2^k t - n)$, so that $\phi_{0n}$ is an orthonormal
basis for $V_0$ and $\phi_{1n}$ is one for $V_1$. Since $V_0 \subset V_1$, we must have that $\phi \in V_1$, so
$\phi(t) = \sum_n h_0(n)\,\phi_{1n}(t) = \sqrt{2}\sum_n h_0(n)\,\phi(2t - n)$ for some coefficients $h_0(n)$. By orthonormality of the $\phi_{1n}$, we have
$h_0(n) = \langle \phi_{1n}, \phi \rangle$, and by normality of $\phi = \phi_{00}$ we have that $\sum_n |h_0(n)|^2 = 1$. This
places a constraint on the coefficients $h_0(n)$ in the expansion for $\phi$, which is often called
the scaling function. Like $\phi$, the wavelet $\psi$ will also be a linear combination of the $\phi_{1n}$:
$$\psi(t) = \sum_n h_1(n)\,\phi_{1n}(t)$$
For the Haar wavelet, we have $h_0(0), h_0(1) = \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}$ and $h_1(0), h_1(1) = \frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}$. This is
easy to compute from the function $\phi_{00} = \mathrm{rect}_{[0,1)}$ directly, since:
$$\phi_{00}(t) = \tfrac{1}{\sqrt{2}}\left[\phi_{10}(t) + \phi_{11}(t)\right], \qquad \psi_{00}(t) = \tfrac{1}{\sqrt{2}}\left[\phi_{10}(t) - \phi_{11}(t)\right]$$
Figure 2.4 shows the Haar scaling function and wavelet at level 0. Functions at other
levels are rescaled versions of these functions.
Figure 2.4: The Elementary Haar Scaling Function and Wavelet
The multiresolution analysis provides an intuitive approach to image processing by
dissecting an image into manageable pieces. We can view the elements of the spaces $V_k$, $k \to \infty$,
as providing greater and greater detail, and we can view the projections of functions
onto the spaces $V_k$, $k \to -\infty$, as providing coarsening with reduced detail. From the axioms, we
can show that the subspace $V_0 \subset V_1$ has an orthogonal complement, $W_0$, which in turn is
spanned by the wavelets $\psi_{0n}$. This gives the orthogonal sum $V_1 = V_0 \oplus W_0$. Similarly,
we can get:
$$V_{k+1} = V_k \oplus W_k \quad \text{for all } k \qquad (2.23)$$
From the point of view of signal processing, if we start with a signal $f \in V_0$, we
can decompose it as $f = f_{-1} + \delta_{-1} \in V_{-1} \oplus W_{-1}$. We can view $f_{-1}$ as a first coarse
resolution approximation of f, and the difference $\delta_{-1}$ as the first detail signal. This
process can continue, deriving coarser resolution signals and the corresponding detail
signals. Because of the orthogonal decompositions at each step, the original signal can always
be recovered from the final coarse resolution signal and the sequence of detail signals,
and the decomposition at each step is a unitary transform, which means the energy of the
signal is preserved. It can be shown [Xi] that the coarse resolution signal can be obtained
by performing lowpass filtering with coefficients $\{h_0(n)\}$, followed by downsampling by
a factor of 2, and the detail signal can be obtained by performing highpass filtering with
coefficients $\{h_1(n)\}$, followed by downsampling by a factor of 2. Therefore, orthonormal
wavelet bases give rise to lowpass and highpass filters, whose coefficients are given by
various dot products between the dilated and translated scaling and wavelet functions.
Conversely, these filters fully encapsulate the wavelet functions, which are recoverable
from the filters by iterative procedures.
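One decomposition step with the Haar filters can be sketched as follows. This is an illustrative sketch: the function names are hypothetical, an even-length input is assumed, and filtering followed by downsampling is written directly as one inner product per pair of samples.

```python
import math

S = 1 / math.sqrt(2)
H0 = [S, S]      # lowpass (scaling) coefficients h0(0), h0(1)
H1 = [S, -S]     # highpass (wavelet) coefficients h1(0), h1(1)

def analyze(x):
    """One Haar step: lowpass and highpass outputs, each kept at every second sample."""
    approx = [H0[0] * x[i] + H0[1] * x[i + 1] for i in range(0, len(x), 2)]
    detail = [H1[0] * x[i] + H1[1] * x[i + 1] for i in range(0, len(x), 2)]
    return approx, detail

def synthesize(approx, detail):
    """Invert analyze(): rebuild each pair of samples from its coarse and detail values."""
    x = []
    for a, d in zip(approx, detail):
        x.append(S * (a + d))
        x.append(S * (a - d))
    return x

def energy(v):
    return sum(t * t for t in v)

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, detail = analyze(signal)
restored = synthesize(approx, detail)
```

Because the step is unitary, the energy of the coarse and detail signals together equals the energy of the input, and the input is recovered exactly up to rounding.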
The idea described above can be encapsulated in what is commonly called a two-channel
perfect reconstruction filter bank, as shown in Figure 2.5. The filter $H_0$ refers to the
lowpass filter, and $H_1$ refers to the highpass filter. We know from the wavelet theory above
that the highpass filter is derived from the lowpass filter, $h_1(n) = (-1)^n h_0(-n + 1)$. The
reconstruction filters are precisely the same filters, but applied after reversing. Therefore,
there is only one filter to design, $H_0$.
Figure 2.5: A Two-Channel Perfect Reconstruction Filter Bank (analysis and synthesis sections)
For typical images, the detail signal generally contains limited energy, while the
approximation signal usually carries most of the energy. Since energy compaction is important
in compression, this decomposition mechanism is frequently applied to the lowpass
signal for a number of levels, as shown in Figure 2.6. Such a decomposition is called an
octave-band or Mallat decomposition. The reconstruction process then duplicates the
decomposition in reverse. Since the transformation is unitary at each step, the concatenation
of unitary transformations is still unitary and there is no loss of energy in the
process. The loss of information in compression is the result of the quantization following
the decomposition.
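An octave-band decomposition can be sketched by reapplying a single analysis step to the lowpass output only. The Haar filters are used here purely for illustration, the function names are hypothetical, and the input length is assumed to be divisible by 2 raised to the number of levels.

```python
import math

S = 1 / math.sqrt(2)

def haar_step(x):
    """Split x into a coarse half and a detail half (Haar filters, downsampled by 2)."""
    approx = [S * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    detail = [S * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return approx, detail

def octave_band(x, levels):
    """Return [a_L, d_L, ..., d_1]; only the lowpass branch is split again."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.insert(0, d)          # coarsest detail band first
    return [approx] + details

def inverse_octave_band(bands):
    """Duplicate the decomposition in reverse, from the coarsest level up."""
    approx = bands[0]
    for d in bands[1:]:
        finer = []
        for a, dd in zip(approx, d):
            finer += [S * (a + dd), S * (a - dd)]
        approx = finer
    return approx

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
bands = octave_band(signal, 3)
restored = inverse_octave_band(bands)
total_energy = sum(t * t for band in bands for t in band)
```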
To extend this one-dimensional structure to two dimensions, as needed in image
processing, we simply need to apply this structure to the row vectors first, then to the column
vectors. With the lowpass and highpass filters, this results in a decomposition into
quadrants, corresponding to four resulting channels: low-low, low-high, high-low, and
high-high, as shown in Figure 2.7. Again, the low-low subband typically contains the
most energy and is decomposed several times.
Figure 2.6: A 1-D Octave-Band Decomposition
Figure 2.7: A 2-D Octave-Band Decomposition
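The row-then-column procedure can be sketched for one level with the Haar filters. This is an illustrative sketch with hypothetical function names; the subband naming, in which the first letter refers to the filter applied along the rows and the second to the filter applied along the columns, is an assumed convention.

```python
import math

S = 1 / math.sqrt(2)

def split(v):
    """1-D Haar step: lowpass half and highpass half of an even-length vector."""
    lo = [S * (v[i] + v[i + 1]) for i in range(0, len(v), 2)]
    hi = [S * (v[i] - v[i + 1]) for i in range(0, len(v), 2)]
    return lo, hi

def transpose(m):
    return [list(r) for r in zip(*m)]

def split_columns(m):
    """Apply the 1-D step down every column of matrix m."""
    lo_cols, hi_cols = [], []
    for col in transpose(m):
        lo, hi = split(col)
        lo_cols.append(lo)
        hi_cols.append(hi)
    return transpose(lo_cols), transpose(hi_cols)

def haar_2d(image):
    """One 2-D level: filter the rows first, then the columns of each half."""
    row_lo, row_hi = [], []
    for row in image:
        lo, hi = split(row)
        row_lo.append(lo)
        row_hi.append(hi)
    ll, lh = split_columns(row_lo)
    hl, hh = split_columns(row_hi)
    return ll, lh, hl, hh

ll, lh, hl, hh = haar_2d([[1.0, 2.0], [3.0, 4.0]])
flat_ll, flat_lh, flat_hl, flat_hh = haar_2d([[3.0] * 4 for _ in range(4)])
```

For the constant image, all of the energy ends up in the low-low quadrant, which is the subband that would be decomposed again at the next level.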
With the filter bank structure developed, the next step is to design the filter set
$\{H_0, H_1, F_0, F_1\}$ that satisfies the perfect reconstruction property. We will now describe a
general design approach. Referring to Figure 2.5, $Y_0$ and $Y_1$ are the outputs after analysis filtering
followed by downsampling, and $\hat{X}$ is the reconstruction after upsampling and synthesis
filtering. Let us first consider the effect of downsampling and upsampling a signal:
From the above equations, we can derive an equation for the reconstructed signal $\hat{X}$:
In Equation 2.29, the term proportional to $X(z)$ is the desired signal, and we want its
coefficient to be either unity or at most a delay $z^{-l}$, for some integer l. The term
proportional to $X(-z)$ is called the alias term, which we want to set to zero. This
leads to the following two equations:
As mentioned before, the orthonormality of the wavelets allows the design of only one filter,
$H_0(z)$, from which the other three can be derived:
To simplify things, we can let $F_0(z) = H_1(-z)$ and $F_1(z) = -H_0(-z)$. Then Equation 2.30
is satisfied. Let $R(z) = z^{l} F_0(z) H_0(z)$; after substituting into Equation 2.31,
we get:
Note that if we write $R(z)$ as a polynomial in z, all even powers must be zero, except
the constant term, which must be 1. The product filter can be expressed as:
Once $R(z)$ is designed, $F_0(z)$ and $H_0(z)$ can be obtained by factoring $z^{-l}R(z)$. Finite-length
two-channel filter banks that satisfy the perfect reconstruction property are called
FIR perfect reconstruction Quadrature Mirror Filter (QMF) banks. A detailed account
of these filter banks can be found in [10][36]. They play an important role in image
compression.
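The two design conditions can be checked numerically for the Haar filters. In this sketch, which uses hypothetical helper names, a filter is stored as a list whose entry n multiplies $z^{-n}$. The forms of the two conditions used here are the standard ones for a two-channel bank and are stated as an assumption: the distortion term $F_0(z)H_0(z) + F_1(z)H_1(z)$ should be a pure delay $2z^{-l}$, and the alias term $F_0(z)H_0(-z) + F_1(z)H_1(-z)$ should vanish.

```python
import math

def poly_mul(a, b):
    """Multiply two polynomials in z**-1 given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_add(a, b):
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def negate_z(a):
    """Coefficients of A(-z): the sign of every odd power is flipped."""
    return [(-1) ** n * c for n, c in enumerate(a)]

s = 1 / math.sqrt(2)
h0 = [s, s]                                          # Haar lowpass filter
h1 = [(-1) ** n * h0[1 - n] for n in range(2)]       # h1(n) = (-1)^n h0(-n + 1)
f0 = negate_z(h1)                                    # F0(z) = H1(-z)
f1 = [-c for c in negate_z(h0)]                      # F1(z) = -H0(-z)

distortion = poly_add(poly_mul(f0, h0), poly_mul(f1, h1))
alias = poly_add(poly_mul(f0, negate_z(h0)), poly_mul(f1, negate_z(h1)))
```

For the Haar filters the distortion term reduces to $2z^{-1}$ and the alias term to zero, so the bank reconstructs its input with a one-sample delay.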
2.2.4 Common Wavelet Schemes
The performance breakthrough of modern wavelet coders is due to the exploitation of
the correlation between parent and child subbands. Shapiro's embedded zerotree wavelet
(EZW) coder[30] is among the first to exploit this parent-child relationship, and the set
partitioning in hierarchical trees (SPIHT) algorithm[29] further exploits the parent-child
relationship and the zerotree structure. Both EZW and SPIHT are well known in the wavelet
community, and this section provides a brief description of both coders.
Embedded Zerotree Wavelet Coder
EZW is an effective wavelet coding algorithm for low bit-rate coding because it
achieves scalability and efficiency while retaining a fairly low complexity. It has the
property that the bits in the bit stream are generated in order of importance, yielding
a fully embedded code. The embedded code represents a sequence of binary decisions
that distinguish an image from the "null" image. Using an embedded coding algorithm,
an encoder can terminate the encoding at any point, thereby allowing a target rate or
distortion metric to be met. The decoder can also cease decoding at any point in a given
bit stream and still produce the same image that would have been encoded at the bit rate
corresponding to the truncated bit stream. EZW does not require training, pre-stored
tables or codebooks, or any prior knowledge of the image source.
The EZW algorithm contains a discrete wavelet transform; a zerotree structure, which
provides a compact multiresolution representation of the significance map; successive
approximation of significant coefficients; a prioritization protocol, which determines the order
of importance by precision, magnitude, scale and spatial location; and adaptive multilevel
arithmetic coding, which performs entropy coding.
The discrete wavelet transform used in the algorithm employs octave-band decomposition.
The filters used are based on the 9-tap symmetric quadrature mirror filters (QMF).
One important aspect of low bit-rate coding is the use of zerotree coding. After scalar
quantization followed by entropy coding, the probability of the zero symbol is extremely
high at low bit rates. Typically a large fraction of the bit budget must be spent on encoding
the binary decision as to whether a coefficient has a zero or nonzero quantized
value. The zerotree structure can be used to improve the compression of the significance map
based on the hypothesis that if a wavelet coefficient at a coarse scale is insignificant with
respect to a threshold T, then all wavelet coefficients of the same orientation in the same
spatial location at finer scales are also insignificant with respect to T. In a hierarchical
subband system, every coefficient at a given scale can be related to a set of coefficients
at the next finer scale of similar orientation. The coefficient at the coarse scale
is called the parent, and all coefficients corresponding to the
same spatial location at the next finer scale of similar orientation are called children.
Figure 2.8 illustrates the parent-child dependencies.
Figure 2.8: Parent-Child Dependencies of Subbands
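The parent-child relation can be sketched for coefficients stored in a single square array. The layout is an assumption made for illustration: a full dyadic decomposition is stored in the usual pyramid arrangement, so that a detail coefficient at (i, j) has its four children at (2i, 2j), (2i, 2j+1), (2i+1, 2j) and (2i+1, 2j+1). The lowest-frequency coefficient at (0, 0) is left out of this simple rule, and the function names are hypothetical.

```python
def children(i, j, size):
    """Children of detail coefficient (i, j) in a size x size pyramid layout."""
    if (i, j) == (0, 0) or 2 * i >= size or 2 * j >= size:
        return []                       # finest scale, or the excluded lowpass term
    return [(2 * i, 2 * j), (2 * i, 2 * j + 1),
            (2 * i + 1, 2 * j), (2 * i + 1, 2 * j + 1)]

def descendants(i, j, size):
    """All coefficients below (i, j) in the tree, at every finer scale."""
    stack, out = children(i, j, size), []
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(children(node[0], node[1], size))
    return out

def is_zerotree(coeffs, i, j, threshold):
    """True if (i, j) and all of its descendants are insignificant."""
    nodes = [(i, j)] + descendants(i, j, len(coeffs))
    return all(abs(coeffs[r][c]) < threshold for r, c in nodes)

coeffs = [[0] * 8 for _ in range(8)]
coeffs[0][1] = 3      # small coefficient at the coarsest scale
coeffs[2][5] = 40     # large coefficient two scales finer, in the same tree
```

Here the coefficient at (0, 1) is itself insignificant with respect to a threshold of 32, but it is not the root of a zerotree, because one of its descendants is significant; in the three-symbol alphabet described below it would be coded as an isolated zero.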
The coefficients are scanned in such a way that no child node is scanned before its
parent. An element of a zerotree is a zerotree root if it is not a descendant of a previously
found zerotree root. The significance map can be represented as a string of symbols from
a 3-symbol alphabet: zerotree root, isolated zero and significant. The zerotree reduces the
cost of encoding the significance map using self-similarity. Zerotree-like structures can
be applied to other subband configurations such as DCT, wavelet packets, etc.
After zerotree coding, successive approximation quantization is applied. The idea
is to sequentially apply a sequence of thresholds to determine significance. Each time
a coefficient is encoded as significant, its magnitude is appended to a list. This way,
embedded coding can be achieved. Finally, adaptive arithmetic coding is used for entropy
coding. Since there are never more than three symbols, the adaptation
algorithm can quickly learn and keep track of changing symbol probabilities.
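The sequence of thresholds can be sketched as follows. Two details are assumptions made for illustration and are not stated in the text above: the initial threshold is taken as the largest power of two not exceeding the largest coefficient magnitude, and each later threshold is half of the previous one. The zerotree coding of the significance map and the refinement of magnitudes are omitted, and the function name is hypothetical.

```python
import math

def successive_approximation(coeffs, passes):
    """Return [(threshold, newly_significant_indices), ...] for halving thresholds."""
    threshold = 2 ** math.floor(math.log2(max(abs(c) for c in coeffs)))
    significant = []          # indices in the order they became significant
    history = []
    for _ in range(passes):
        new = [i for i, c in enumerate(coeffs)
               if abs(c) >= threshold and i not in significant]
        significant.extend(new)
        history.append((threshold, new))
        threshold = threshold / 2
    return history

coeffs = [63, -34, 49, 10, 7, 13, -12, 7]
history = successive_approximation(coeffs, 3)
```

Stopping after any pass still leaves a usable, coarser description of the coefficients, since the largest magnitudes are identified first; this ordering is what makes the code embedded.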
Image Codec Based on Set Partitioning in Hierarchical Trees
An alternative implementation based on the principles of EZW, using set partitioning
in hierarchical trees (SPIHT), was developed. It surpasses the performance of the original
EZW, and the encoding and decoding are extremely fast. Three of the underlying
principles of EZW are employed: partial ordering of the transformed image elements by
magnitude, with transmission of order by a subset partitioning algorithm that is duplicated
at the decoder; ordered bit plane transmission of refinement bits; and exploitation of the
self-similarity of the image wavelet transform across different scales. The crucial difference
between this algorithm and EZW is the way subsets of coefficients are partitioned and how the
significance information is conveyed. Also, arithmetic coding of the bit stream is necessary
in EZW, whereas in this algorithm the subset partitioning is so effective that the significance
information is compact enough that binary uncoded transmission achieves about the same
or better performance. The encoding algorithm can be stopped at any compressed rate
or left to run until it achieves a nearly lossless image.
One of the main features of this algorithm is that the ordering data is not explicitly
transmitted. The encoder and decoder have the same sorting algorithm so that the
decoder can recover the ordering information from the encoder's execution path. The sorting
algorithm divides the set of pixels into partitioning subsets and performs magnitude
test on the maximum coefficient of the subset against a certain threshold. If the subset is
insignificant, then all coefficients in the subset are insignificant; if the subset is significant,
then it is partitioned into new subsets. This division continues until the magnitude
test has been applied to all single-coordinate significant subsets in order to identify each significant
coefficient. A set partitioning rule using ordering in the hierarchy defined by the subband
pyramid is defined to reduce the number of magnitude comparisons. The objective is
to create new partitions such that subsets expected to be insignificant contain a large
number of elements and subsets expected to be significant contain only one element.
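The magnitude test and the recursive division can be sketched as follows. For simplicity this example splits a rectangular block into quadrants rather than using the spatial orientation tree sets of SPIHT, so it only illustrates the idea of discarding large insignificant subsets with one comparison; the function names are hypothetical.

```python
def significant(coeffs, rows, cols, n):
    """Magnitude test: 1 if the largest |coefficient| in the subset reaches 2**n."""
    peak = max(abs(coeffs[r][c]) for r in rows for c in cols)
    return int(peak >= 2 ** n)

def halves(idx):
    """Split an index range in two (a single index stays as one part)."""
    mid = len(idx) // 2
    return [idx] if mid == 0 else [idx[:mid], idx[mid:]]

def find_significant(coeffs, n, rows=None, cols=None):
    """Return (coordinates of significant coefficients, number of magnitude tests)."""
    rows = range(len(coeffs)) if rows is None else rows
    cols = range(len(coeffs[0])) if cols is None else cols
    tests = 1
    if not significant(coeffs, rows, cols, n):
        return [], tests                      # whole subset dismissed at once
    if len(rows) == 1 and len(cols) == 1:
        return [(rows[0], cols[0])], tests    # single significant coefficient
    found = []
    for rs in halves(rows):
        for cs in halves(cols):
            sub, t = find_significant(coeffs, n, rs, cs)
            found += sub
            tests += t
    return found, tests

coeffs = [[34, -20, 3, 1],
          [10,   5, 2, 0],
          [ 1,   2, 17, 1],
          [ 0,   1, 1, 1]]
found, tests = find_significant(coeffs, 4)    # threshold 2**4 = 16
```

On this toy block the three significant coefficients are located with fewer magnitude tests than the 16 needed to test every coefficient individually.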
Similar to the zerotree, the spatial orientation tree naturally defines the spatial
relationship on the hierarchical pyramid. Each node of the tree corresponds to a pixel, and
its direct descendants correspond to the pixels of the same spatial orientation in the next
finer level of the pyramid. The tree is defined in such a way that each node has either no
offspring or four offspring, which always form a group of 2x2 adjacent pixels. This tree
structure is used in partitioning subsets in the sorting algorithm.
The SPIHT algorithm uses the principles of partial ordering by magnitude, set
partitioning by significance of magnitudes with respect to a sequence of octavely decreasing
thresholds, ordered bit plane transmission, and self-similarity across scales in an image
wavelet transform. The algorithm realizes these principles in a matched encoder and
decoder, and its performance surpasses that of the original EZW algorithm.
2.3 Human Visual System and Perceptual Model
One of the major limitations in digital image systems is the lack of a well-accepted image
quality metric. Commonly used error measures such as mean-squared error (MSE) or
peak signal-to-noise ratio (PSNR) operate on a pixel-by-pixel basis and neglect the
importance of image content such as edges, textured regions, and large luminance variations;
and the viewing conditions on the actual visibility of artifacts. While these measures are
simple, they do not correlate well with perceived quality. Therefore, in many cases
designers have to resort to subjective tests in order to obtain reliable ratings for the quality
of compressed images. These tests are usually complex and time-consuming, and thus
often impractical.
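The pixel-by-pixel nature of these measures can be seen in a short sketch. It assumes 8-bit images stored as nested lists and uses the common definition PSNR = 10 log10(peak^2 / MSE); the two test images are invented for illustration.

```python
import math

def mse(ref, test):
    """Mean squared error between two equally sized images (nested lists)."""
    diffs = [(a - b) ** 2 for ra, rb in zip(ref, test) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(ref, test)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)

ref = [[100] * 4 for _ in range(4)]
uniform = [[102] * 4 for _ in range(4)]   # every pixel off by 2
spike = [row[:] for row in ref]
spike[1][1] = 108                         # one pixel off by 8
```

Both distorted images have the same MSE and PSNR, although a uniform brightness shift and an isolated spike would generally not look equally objectionable, which is the shortcoming discussed above.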
The missing link between the physical parameters and the subjective viewing of the
end users may be established using a visual model, which allows direct psychophysical
measurement as a function of physical parameters. In response to these problems, a
number of objective quality assessments that incorporate perceptual factors have been
proposed[12]. There is a broad range of applications for objective quality metrics,
including:
• evaluating and comparing image codecs;
• quality monitoring and control;
• perceptual image compression and restoration.
Jayant[17] gives a general description of the application of perceptual quality metrics in
signal processing. Ahumada[?] provides a summary of perceptual models for image
quality assessment. Daly[9] reviews a number of visual factors that should be incorporated
in the perceptual models. An ideal metric based on models of the human visual system can
achieve consistency and accuracy in quality assessment and improve visual quality in the
design of compression algorithms; however, the human visual system is extremely complex
and is still not well understood. As we acquire additional knowledge of visual factors,
it can be expected that a model providing consistent performance over a wide range of
images can be developed.
This section provides a brief review of several prominent phenomena of the human
visual system, and how they may be incorporated in perceptual models. It outlines the
advantages and limitations of a number of quality metrics. The section concludes with
the validation and evaluation of the perceptual quality metrics.
2.3.1 Quality Factors
The viewing conditions and image content play an important role in quality assessment.
Research has shown that image quality depends on viewing distance, display size,
resolution, brightness, contrast, sharpness, colorfulness, and so on[3][44]. It is often useful
to relate these factors in visual modeling. For instance, the viewing distance is usually
specified in terms of display size. One of the reasons for doing this is the
assumption that the ratio of preferred viewing distance to screen height is constant. Recent
experiments show, however, that this assumption is only true for small displays, where
the preferred viewing distance is around 6 to 7 screen heights, whereas the preferred viewing
distance approaches 3 to 4 screen heights with increasing display size[1]. Display
resolution is another important quality factor. In vision modeling, the size and resolution
of the image projected onto the retina are more meaningful measures. Given a viewing
distance d in inches and a display resolution r in pixels/inch, the effective display visual
resolution (DVR) v in pixels/degree of visual angle is
$$v = d\,r\,\tan\left(\frac{\pi}{180}\right) \approx \frac{d\,r}{57.3} \qquad (2.37)$$
The maximum spatial frequency corresponds to the perceptual Nyquist frequency, which
is half the display resolution: $f_{max} = v/2$. Some illustrative examples are given in Table 2.1.
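A small numerical sketch of the display visual resolution follows. It assumes the usual geometric relation v = d r tan(π/180), i.e. the number of pixels subtended by one degree of visual angle at distance d; the function names and the example display (72 pixels/inch viewed at 12 inches) are chosen for illustration.

```python
import math

def display_visual_resolution(distance_in, resolution_ppi):
    """Pixels per degree of visual angle for a display viewed at distance_in inches."""
    return distance_in * resolution_ppi * math.tan(math.pi / 180.0)

def nyquist_frequency(dvr):
    """Highest representable spatial frequency in cycles/degree (half the DVR)."""
    return dvr / 2.0

v = display_visual_resolution(12, 72)    # roughly 15 pixels/degree
f_max = nyquist_frequency(v)             # roughly 7.5 cycles/degree
```

Doubling either the viewing distance or the pixel density doubles the DVR, which is why the same image can show artifacts at close range that disappear from farther away.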
Table 2.1: Examples of Visual Resolution for Various Displays[41]
Display                  Resolution       Distance    DVR
                         (pixels/inch)    (inches)    (pixels/degree)
Computer Display         72               12          15.1
HDTV                     48               72          60.3
Low Quality Printing     300              12          62.8
High Quality Printing    1200             12          251.4
The optics of the eye constitute the first processing stage in the human visual system.
Although the optical characteristics of individual eyes vary considerably, they are
correlated in such a way that healthy eyes can produce a sharp image of a distant object on
the retina. The retinal image is a distorted version of the input, and the most noticeable
distortion is blurring. Due to a variety of physical and geometrical optical factors, a point
object gives rise to a retinal distribution that is bell-shaped in cross section. It is called
the point spread function. There are a number of visual factors that contribute to the
spreading of light. For small pupil diameters up to 3-4 mm, the point spread function
approaches the diffraction limit, which is given by:
$$PSF(\rho) = \left[\frac{2\,J_1(\pi d \rho / \lambda)}{\pi d \rho / \lambda}\right]^2 \qquad (2.38)$$
where d is the pupil diameter, λ is the wavelength of light, ρ is the visual angle in radians,
and J_1 is the first-order Bessel function. As the pupil diameter increases, the width of
the point spread function also increases because the distortion due to cornea and lens
imperfections becomes large compared to diffraction effects. The current best estimate of
the foveal point spread function of the human eye is that proposed by Westheimer[42]:
$$PSF(\rho) = 0.952\,e^{-2.59|\rho|^{1.36}} + 0.048\,e^{-2.43|\rho|^{1.74}} \qquad (2.39)$$
where ρ is the distance in minutes of arc from the image. This function is illustrated in
Figure 2.9, and it applies to standard viewing conditions of white targets with pupil
diameter in the vicinity of 3 mm.
Figure 2.9: Point Spread Function of the Human Eye[42]
2.3.3 Color Space
After the image is projected onto the retina, the photoreceptors sample the image and
convert it to signals interpretable by the brain. There are two different types of
photoreceptors, rods and cones. Rods are responsible for vision at low light levels, and they
can be neglected for the applications considered here. Cones are responsible for vision at
higher light levels. There are three types of cones: L-cones, M-cones, and S-cones, which
are sensitive to long, medium, and short wavelengths, respectively (see Figure 2.10).
They form the basis of color perception.
Various color spaces have been developed for different purposes. The common red,
green, and blue (RGB) color model is used in color CRT monitors and color raster
graphics, and it employs a Cartesian coordinate system.
The hue, saturation, and value (HSV) color model is user oriented, being based on the
intuitive appeal of the artist's tint, shade, and tone. The coordinate system is cylindrical,
and the subset of the space within which the model is defined is a hexcone.
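The relation between the Cartesian RGB model and the cylindrical HSV model can be tried out with Python's standard `colorsys` module, which implements the hexcone conversion for components in the range 0 to 1. The sample colors are arbitrary.

```python
import colorsys

# Pure red, an orange, and a mid gray, given as (R, G, B) in [0, 1].
red = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
orange = colorsys.rgb_to_hsv(1.0, 0.5, 0.0)
gray = colorsys.rgb_to_hsv(0.5, 0.5, 0.5)

# Hue is returned as a fraction of a full turn; convert to degrees.
orange_hue_deg = orange[0] * 360.0

# The conversion is invertible.
back = colorsys.hsv_to_rgb(*orange)
```

Gray values lie on the axis of the hexcone: their saturation is zero and only the value coordinate carries information.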
Figure 2.10: Sensitivity of the Three Types of Cone
The opponent color theory states that the sensations of red and green as well as
blue and yellow are encoded in separate visual pathways[44]. The principal components
of opponent-colors space are black-white (B-W), red-green (R-G), and blue-yellow
(B-Y). The B-W channel, which encodes luminance, is determined mainly by medium
to long wavelengths. The R-G channel discriminates between medium and long
wavelengths, while the B-Y channel discriminates between short and medium wavelengths.
The opponent-color space has an advantage in psychophysical experiments based on
opponent-color stimuli because its channels can adapt to these stimuli, which
facilitates model design and analysis.
The perceptually uniform color spaces, CIE L*u*v* and CIE L*a*b*, have also been
proposed for vision models. They are defined such that the Euclidean distance between
color coordinates in these spaces provides an approximation to the perceived difference.
This can be advantageous in vision modeling, since vision models try to determine the amount of
perceived difference between reference and test images.
The Y'CbCr color space is used in many standards, including PAL, NTSC, JPEG,
MPEG, etc. It takes into account certain properties of the human visual system: the
opponent color theory, the fact that humans are less sensitive to color than to luminance,
and the nonlinearity of the human visual system. It happens that conventional CRT
displays also have a nonlinear relationship between signal voltage c and display intensity
I:
$$I \propto c^{\gamma} \qquad (2.40)$$
Applying the inverse of this function is referred to as gamma correction. Coincidentally,
the lightness sensitivity of human vision is close to the inverse of the function 2.40.
Therefore, coding images in the gamma-corrected domain is not only more meaningful
perceptually, but also compensates for CRT nonlinearities. Y'CbCr operates in the
gamma-corrected domain, where Y' is luminance, Cb is the difference between the blue primary and
luminance, and Cr is the difference between the red primary and luminance. The conversion
formula between Y'CbCr and standard CIE 1931 XYZ can be found in Appendix A.
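Gamma correction and the luminance/color-difference decomposition can be sketched as below. The power-law exponent 2.2 and the ITU-R BT.601 luma weights and scale factors are assumptions chosen for the example; they are not necessarily the constants of the conversion given in Appendix A.

```python
def gamma_encode(intensity, gamma=2.2):
    """Map linear intensity in [0, 1] to a gamma-corrected value (assumed gamma)."""
    return intensity ** (1.0 / gamma)

def gamma_decode(value, gamma=2.2):
    """Inverse mapping: the CRT-like power law from signal to intensity."""
    return value ** gamma

def rgb_to_ycbcr(r, g, b):
    """Gamma-corrected R'G'B' in [0, 1] to (Y', Cb, Cr), BT.601 weights assumed."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) / 1.772      # scaled blue difference, in [-0.5, 0.5]
    cr = (r - y) / 1.402      # scaled red difference, in [-0.5, 0.5]
    return y, cb, cr

white = rgb_to_ycbcr(1.0, 1.0, 1.0)
blue = rgb_to_ycbcr(0.0, 0.0, 1.0)
mid_gray_signal = gamma_encode(0.18)   # a dark linear intensity maps to a mid-range signal
```

Achromatic inputs produce zero color differences, so all of their information is carried by Y', which is what allows the chroma channels to be coded more coarsely.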
2.3.4 Contrast Sensitivity
Contrast is a measure of the relative variation of luminance. It is an important concept
in human vision because we perceive light in terms of contrast rather than the absolute
luminance level. There does not exist a unique definition of contrast suitable for all
stimuli. For periodic patterns of symmetrical deviations ranging from L_min to L_max,
Michelson contrast is often used:
$$C_M = \frac{L_{max} - L_{min}}{L_{max} + L_{min}}$$
For patterns with a single increment or decrement ΔL to a uniform background
luminance L, Weber contrast is often used:
$$C_W = \frac{\Delta L}{L}$$
Neither of these two definitions is appropriate for measuring contrast in complex images.
Peli[25] proposed a local band-limited contrast for complex images, where the image is
decomposed into a pyramid of lowpass and bandpass subbands, and a contrast value is
assigned to every point in the image as a function of the spatial frequency band:
$$C_i(x, y) = \frac{BP_i(x, y)}{LP_i(x, y)}$$
where BP_i(x, y) is the bandpass image of band i, and LP_i(x, y) contains the energy
below band i. This definition is in good agreement with psychophysical contrast-masking
experiments with Gabor patches[25]. Notice also that the subband coding for compression
bears a resemblance to the structure used here, which means that the contrast value using
this definition can be obtained easily with the subband coding scheme.
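The three contrast definitions can be compared in a few lines. The sketch assumes the standard forms of Michelson and Weber contrast and takes the band-limited contrast as the pointwise ratio of a bandpass image to the lowpass image containing the energy below that band; the subband arrays here are made-up numbers rather than the output of a real pyramid.

```python
def michelson(l_max, l_min):
    """Michelson contrast of a periodic pattern."""
    return (l_max - l_min) / (l_max + l_min)

def weber(delta_l, background):
    """Weber contrast of an increment or decrement on a uniform background."""
    return delta_l / background

def band_limited_contrast(bandpass, lowpass, eps=1e-12):
    """Pointwise ratio BP_i(x, y) / LP_i(x, y); eps guards against division by zero."""
    return [[bp / max(lp, eps) for bp, lp in zip(row_bp, row_lp)]
            for row_bp, row_lp in zip(bandpass, lowpass)]

c_m = michelson(150.0, 50.0)
c_w = weber(10.0, 100.0)
c_local = band_limited_contrast([[2.0, -1.0]], [[100.0, 50.0]])
```

Unlike the two global measures, the band-limited contrast varies across the image: the same bandpass amplitude yields a higher contrast where the local lowpass luminance is lower.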
The minimum contrast necessary for an observer to detect the difference is defined
as the contrast threshold. Contrast sensitivity is the inverse of the contrast threshold.
Contrast sensitivity functions (CSF) are used to quantify the dependency of the
contrast sensitivity on the frequency of the stimuli. There are a number of estimates of the
contrast sensitivity function in the literature[1][9][18]. The shape of the CSF curve can be
altered greatly by various stimulus configurations. Generally the CSF is assumed to
be a bandpass function. Achromatic contrast sensitivity is generally higher than
chromatic, especially for high frequencies. The full range of color is perceived only at low
frequencies. As frequency increases, blue-yellow sensitivity declines first, then red-green
sensitivity begins to diminish, and the perception becomes achromatic.
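To make the bandpass shape concrete, the sketch below evaluates one published achromatic CSF approximation, the Mannos-Sakrison formula. It is used here purely as an example of a bandpass CSF and is not necessarily one of the estimates cited above; the frequency grid is arbitrary.

```python
import math

def csf_mannos_sakrison(f):
    """Relative contrast sensitivity at spatial frequency f (cycles/degree)."""
    return 2.6 * (0.0192 + 0.114 * f) * math.exp(-((0.114 * f) ** 1.1))

def contrast_threshold(sensitivity):
    """Threshold is the inverse of sensitivity."""
    return 1.0 / sensitivity

freqs = [0.5 * k for k in range(1, 121)]            # 0.5 ... 60 cycles/degree
peak_freq = max(freqs, key=csf_mannos_sakrison)     # location of the maximum
```

Sensitivity falls off on both sides of the peak, so the contrast threshold is lowest at mid frequencies and rises toward both low and high frequencies.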
Masking refers to the phenomenon whereby the visibility of a signal is reduced due to
the presence of another signal. In the context of image compression, it is usually helpful
to regard the distortion as being masked by the original image acting as background. In
vision models, two types of masking are often considered: intra-channel masking and
inter-channel masking. Intra-channel masking models the masking that occurs between
stimuli located in the same frequency channel. It usually includes two types of masking,
luminance masking and texture masking.
The ability of human eyes to detect the magnitude difference between an object and
its background is dependent on the average value of background luminance. According to
Weber's Law[23], if the luminance of a test stimulus is just noticeable against the
surrounding luminance, the ratio of the just noticeable luminance difference to the stimulus's luminance
is almost constant. However, due to the ambient illumination on the display, the noise in
dark areas tends to be less perceptible than that occurring in regions of high luminance.
In general, high visibility thresholds occur in regions of gray levels close to the mid-gray
luminance. A psychophysical experiment conducted by Safranek[28] yields a brightness
adjustment curve, as shown in Figure 2.11.
Texture masking refers to the reduction in visibility of stimuli due to the increase in
spatial nonuniformity of background luminance. In many vision models, visibility
thresholds are defined as functions of the amplitude of the luminance edge in which the perturbation
is increased. Simple image structures such as edges or curves have only a small degree of
masking compared to texture regions because the observer typically has prior knowledge
of what those simple patterns look like. However, the visibility threshold in a texture
region may decrease as the observer becomes familiar with the image.
Figure 2.11: Brightness Adjustment Curve
Most vision models are limited to intra-channel masking. However, recent
psychophysical experiments suggest that masking also occurs between channels of different
orientation[11]. Therefore, we should take into account the inter-channel masking in
vision modeling as well.
Care must be taken in incorporating masking for perceptual coding, since the
masking models obtained through experiments are highly dependent on the masker and the
target stimulus. The masking threshold will vary with the stimuli's bandwidth, phase, and
orientation, as well as the familiarity of the observer with the stimuli. Incorrect predictions
of masking can be the primary cause of failures in perceptual modeling.
2.3.6 Multi-resolution Structure
The neurons in the primary visual cortex serve as oriented bandpass filters, and they
respond to a certain range of spatial frequencies and orientations about their center values.
For achromatic visual pathways, it is estimated that the spatial frequency bandwidth is
approximately 1 to 2 octaves and the orientation bandwidth is about 20 to 60 degrees.
The chromatic pathways are assumed to have similar spatial frequency bandwidths, but
their orientation bandwidths are significantly larger, ranging from 60 to 130 degrees[44].
Given these bandwidths, the spatial frequency plane for the achromatic channel can be
covered by 4-6 spatial frequency-selective and 4-8 orientation-selective mechanisms. For
chromatic channels, 2-3 orientation-selective mechanisms are sufficient.
The fundamental requirement in incorporating the above visual characteristics in
vision models is the joint localization in space, spatial frequency, and orientation. The
design of a pyramid structure with self-similar filters and dyadic subsampling is appealing.
It has been adopted in many vision models. Again, we see that the wavelet transform
offers a natural pyramid structure for dealing with the multi-resolution characteristics
of the visual system.
2.3.7 Error Summation
It is often necessary in many applications to use a single number to indicate the image
quality. There is thus a need to integrate the 3-D distortion maps for various channels
and convert them into a scalar. It is believed that the brain integrates information in
various channels according to rules of probability or vector summation[44]. The Minkowski
summation provides a good estimate for probability summation:
$$E = \left(\sum_{n} |e(n)|^{\beta}\right)^{1/\beta}$$
where e(n) is the perceptual error at location n, and the exponent β determines the slope
of the psychometric function near threshold. Different exponents β have been found to
yield good results for different experiments. β = 2 corresponds to the ideal observer under
independent Gaussian noise, which assumes that the observer has complete knowledge
of the stimuli. Higher exponents (β = 4, 5) are used based on the intuition that a few
high distortions tend to draw the viewer's attention more than many lower ones.
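The effect of the exponent can be checked numerically. The sketch assumes the unnormalized Minkowski form, the β-th root of the sum of |e(n)|^β, and compares two invented error maps with equal energy: many small errors versus one large error.

```python
def minkowski_sum(errors, beta):
    """Pool a list of perceptual errors into a single number."""
    return sum(abs(e) ** beta for e in errors) ** (1.0 / beta)

many_small = [1.0] * 16            # sixteen errors of magnitude 1
one_large = [4.0] + [0.0] * 15     # a single error of magnitude 4

pooled_2 = (minkowski_sum(many_small, 2), minkowski_sum(one_large, 2))
pooled_4 = (minkowski_sum(many_small, 4), minkowski_sum(one_large, 4))
```

With β = 2 the two maps are rated equally, whereas with β = 4 the map with the single large error scores higher, matching the intuition that a few strong distortions dominate the impression.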
Alternatively, the distortion can be computed locally for every pixel, yielding a
perceptual distortion map for better visualization of the distribution of distortions. Such
distortion maps can help the designer to better identify problems in the encoder.
2.3.8 Psychovisual Validation
Once a perceptual model is developed, it must be validated by subjective tests.
However, the accuracy and robustness of the validation are highly dependent on the
psychovisual experiments used.
One simple approach is to use a rating scale to evaluate a set of images, as suggested
in CCIR Recommendation 500-3. The observers rate each image on a scale from 1
to 5, indicating bad, poor, fair, good, or excellent. The scores are then analyzed by
statistical techniques. The limitation of this approach is that it can only differentiate
images with relatively large differences in quality. It can also be inconsistent across different
types of artifacts, such as ringing artifacts and block artifacts. Modifications to this rating
approach have been made, such as rating sub-regions of the image instead of the whole image.
A pair comparison approach is often used for compression tests. A set of images
compressed with different methods or at different ratios is compared with each other.
CCIR Recommendation 500-3 recommends a scale from -3 to 3 corresponding to much
worse, worse, slightly worse, same, slightly better, better, and much better. The
advantage of this approach is that images with different types of artifacts can be compared.
In high bit-rate applications, it is often important to measure the just noticeable
distortion (JND) point. There is a technique to evaluate the ability of the perceptual model
to predict the JND point. Both the original and the compressed images are displayed to
the observers alternately for a short period of time (i.e., 1 second), then the observers
decide which is the original image and which is the compressed image. The JND point is
typically defined as the compression point at which the observer correctly identifies the
compressed image 75% of the time. The display time and the familiarity of the observers
with the artifacts in the images play a major role in the JND experiments. To reduce
inconsistency, these experiments should be performed under consistent viewing conditions,
i.e., fixed display time, a limited number of times each image is displayed, and
observers familiar with the type of artifacts introduced by the particular compression
process.
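Given the fraction of correct identifications measured at a few compression ratios, the 75% point can be estimated by interpolation. The sketch below uses simple linear interpolation between neighboring measurements, which is an assumption made for illustration (a fitted psychometric function would be the more usual choice), and the data values are invented.

```python
def jnd_point(ratios, fraction_correct, target=0.75):
    """Compression ratio at which the fraction correct crosses the target.

    Assumes ratios are sorted in increasing order; returns None if the
    target is never bracketed by two consecutive measurements.
    """
    pairs = list(zip(ratios, fraction_correct))
    for (r0, p0), (r1, p1) in zip(pairs, pairs[1:]):
        if p0 <= target <= p1 and p1 > p0:
            return r0 + (target - p0) * (r1 - r0) / (p1 - p0)
    return None

ratios = [10, 20, 30]                 # hypothetical compression ratios
fraction_correct = [0.5, 0.7, 0.9]    # 0.5 corresponds to guessing
jnd = jnd_point(ratios, fraction_correct)
```

A fraction of 0.5 corresponds to pure guessing in this two-alternative task, so 75% lies halfway between chance and perfect identification.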
Chapter 3
Perceptual Image Codec
The block diagram of the perceptual image codec is shown in Figure 3.1. The decoder
is the same as in the general transform coding scheme and is fairly straightforward.
The encoder is able to compress the image with a specified bit rate or perceptual quality.
The use of multi-rate quantization and entropy coding allows the encoder to generate
embedded coding, so that both the encoder and the decoder can terminate at any point
and still be able to reconstruct the image at a lower bit rate. In some applications a particular
region is more important than the rest of the image, and it might be desirable to have
certain quality control and encode it with a predefined perceptual quality. Therefore the
encoder includes a region of interest (ROI) request that can be set to encode the ROI
with a specified perceptual quality. The rest of the image is encoded with the remaining
bit budget. The encoder contains a perceptual model which produces a weighting factor
W_JND for each subband coefficient. This factor indicates the importance of each
coefficient in contributing to the visibility of the image. When the ROI request is set, the
perceptual model also outputs the quantization step for the ROI region with the given
perceptual quality. The Y'CbCr color space is selected for the codec because it is widely
used and it takes into account some properties of the HVS. Each component of the encoder
is addressed below, and the experimental results are given in the last section.
Figure 3.1: Block Diagram of the Perceptual Image Codec
3.1 Overview of the Algorithm
The encoder consists of the following major functions:
• Transform performs the wavelet decomposition; it takes the original image and
outputs the wavelet coefficients for each subband; the parameters include the number
of decomposition levels and filter coefficients.
• HVSmodel uses the perceptual model to calculate the weighting factor for each
coefficient; it takes in the wavelet coefficients and the original image and outputs
the weighting factors and the quantization step size for the ROI if the option is
set; the parameters include viewing distance, display resolution, and psychovisual
parameters.
• Quantize performs the quantization on the wavelet coefficients; the quantization
step sizes for each subband are determined by minimizing the perceptual error using
the weighting factors from the HVSmodel; it takes in the wavelet coefficients and
the weighting factors and outputs the quantized coefficients; the parameters include
smallest quantization step size, maximum number of quantizers, and precision for
quantized coefficients.
• Coder performs the entropy coding using adaptive arithmetic coding; it takes in the
quantized coefficients and outputs the bitstream; the parameters include histogram
capacity and adaptive model.
The decoder reverses the process and includes Decode, Dequantize, and
InverseTransform. The perceptual model is not needed in the decoder.
3.2 Wavelet Transform
The wavelet transform in the perceptual codec employs an octave-band decomposition
using a two-channel perfect-reconstruction analysis/synthesis filter bank, as shown in
Figure 2.6. Let the level, l, denote the number of filter stages, and the orientation, o, denote
the four possible combinations of lowpass and highpass filters. The orientation is indexed
as follows: {0, 1, 2, 3} = {LL, HL, LH, HH}. Each combination of level and orientation
(l, o) specifies a single band. Figure 3.2 illustrates this terminology using a three-level
decomposition.
Figure 3.2: Subband Indexing (l, o)
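The (l, o) indexing can be enumerated with a short sketch. It assumes the usual octave-band layout in which only the LL band is split again, so the orientations HL, LH and HH appear at every level while LL survives only at the coarsest level; the function names are illustrative.

```python
ORIENTATION = {0: "LL", 1: "HL", 2: "LH", 3: "HH"}

def subbands(levels):
    """List the (level, orientation) pairs of an octave-band decomposition."""
    bands = [(l, o) for l in range(1, levels + 1) for o in (1, 2, 3)]
    bands.append((levels, 0))   # the single remaining lowpass band
    return bands

def subband_size(image_size, level):
    """Side length of a subband at the given level for a square image."""
    return image_size // (2 ** level)

bands = subbands(3)
names = [f"({l},{o}) {ORIENTATION[o]}" for l, o in bands]
```

A decomposition with L levels therefore has 3L + 1 subbands, for example 10 bands for the three-level case and 16 for five levels.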
There are two factors to be determined in the transform stage: the number of
decomposition levels and the wavelet filter set. There are no rules for selecting the number
of decomposition levels. It usually ranges from 3 to 6 levels. After conducting some
experiments, we found that five decomposition levels are suitable for our purpose. Some of
the best known filters for image coding are included in our codec. These include the set of
filters evaluated by Villasenor[38], a linear-phase 9/7 pair from Odegard[24], and a few
Daubechies filters. Table 3.1 gives the coefficients of the filters used in our experiments
in addition to the filters found in [38]. The performance difference between these filters
is mainly image dependent. It is possible to add a filter selection stage by applying all
the different filters to the image and then selecting the filter that produces the least number
of significant coefficients, which often results in the highest compression ratio, as suggested
in [2?].
Table 3.1: Filter Coefficients (all coefficients start at zero; for biorthogonal filters, the first row
is the analysis filter and the second row is the synthesis filter, and they are symmetric about zero)
Filter      Coefficients
Antonini    0.85269, 0.37740, -0.11062, -0.02355, -0.03783
Villa 1     0.37528, -0.02385, -0.11062, 0.3774, 0.8327
Odegard     0.658848, 0.415092, -0.04069, -0.06454
            O.6XZ 1, 0.38697, -0.09307, -0.03343, 0.05237
3.3 Perceptual Model
The perceptual model used in our codec takes into account contrast sensitivity at different
frequency subbands, local background luminance and texture, and contrast masking. It
produces a JND threshold for each subband coefficient, W_JND(x, y, l, θ), where (l, θ)
specifies the subband and (x, y) is the spatial location of the coefficient. This threshold
can then be used to calculate the perceptual error for each coefficient.
3.3.1 Contrast Threshold Function
The contrast sensitivity threshold model used here is based on Watson's model[41]. This
intensity-based model accounts for display resolution, viewing distance, and the level and
orientation of the subband.
A set of psychovisual experiments performed in [41] indicated the following factors:
The contrast sensitivity declines with increasing spatial frequency.
The size of the noise stimuli decreases with increasing spatial frequency.
The noise amplitudes are typically very close to the basis function amplitudes.
We can see that spatial frequency is the dominant factor in the contrast threshold function.
The spatial frequency can be obtained from the effective display visual resolution,
as given in Equation 2.37. The discrete wavelet transform operates essentially by bisecting
a frequency band at each level. At the first level, the spatial frequency is taken
as the Nyquist frequency of the display resolution (i.e., r/2). The spatial frequency is
halved at each subsequent level. Therefore, for a display resolution of r
pixels/degree, the spatial frequency at level l is:
$$f(l) = r\,2^{-l} \ \text{cycles/degree} \qquad (3.1)$$
The noise threshold for the luminance component is estimated as:
$$\log Y_{l,\theta} = \log a + k\left(\log f - \log g_\theta f_0\right)^2 \qquad (3.2)$$
where a = 0.495, k = 0.466, f_0 = 0.401, g_0 = 1.501, g_1 = g_2 = 1, and g_3 = 0.534. The term
a defines the minimum threshold. The term g_θ takes into account the effect of orientation
on spatial frequency. Orientation θ = 0 is approximately a factor of two lower in spatial
frequency than orientations θ = 1, 2. However, since the signal energy of orientation
0 is spread over all orientations, it is less visually efficient than energy concentrated in a
narrow range, as in orientations 1 and 2. Thus g_0 is set to be less than 2. For orientation 3,
the spatial frequency is about √2 above that of orientations 1 and 2 due to the Cartesian
splitting of the spectrum. But since the spectrum in orientation 3 is distributed over
two orthogonal orientations (45°, 135°), again the parameter g_3 is less than √2. For
the chrominance channels Cb and Cr, the effects of spatial frequency and orientation are
similar to those of the Y channel. However, their thresholds are generally higher, by a
factor of two for the Cr threshold and a factor of four for the Cb threshold. Figure 3.3 shows
the thresholds obtained through experiments for all three color components at different
orientations and spatial frequencies.
Figure 3.3: Estimated Threshold for Y (bottom), Cr (middle), and Cb (top) [41] (horizontal axis: spatial frequency, log cycles/degree)
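As a rough numerical illustration of the threshold model described above, the following Python sketch evaluates the level-dependent spatial frequency and the luminance noise threshold with the stated parameters. It assumes base-10 logarithms and a threshold written as Y(l, θ); the function names and the dictionary `G` are illustrative and not taken from the thesis.

```python
import math

# Parameters quoted in the text for the luminance channel.
A = 0.495       # minimum threshold
K = 0.466
F0 = 0.401
G = {0: 1.501, 1: 1.0, 2: 1.0, 3: 0.534}   # orientation factors g_theta


def spatial_frequency(level, resolution):
    """Spatial frequency (cycles/degree) at a level for a display of
    `resolution` pixels/degree: the Nyquist rate at level 1, halved per level."""
    return resolution * 2.0 ** (-level)


def noise_threshold(level, orientation, resolution):
    """Threshold from log Y = log a + k (log f - log(g_theta f0))^2.
    Assumption: logarithms are base 10."""
    f = spatial_frequency(level, resolution)
    log_y = math.log10(A) + K * (math.log10(f) - math.log10(G[orientation] * F0)) ** 2
    return 10.0 ** log_y


if __name__ == "__main__":
    for level in range(1, 6):
        row = [round(noise_threshold(level, o, 32.0), 3) for o in range(4)]
        print(level, row)
```

The parabola in log frequency has its minimum, equal to a, where f = g_θ f_0, so the threshold rises toward both very fine and very coarse levels.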
Since the noise amplitude resulting from uniform quantization is approximately the
basis function amplitude, the contrast threshold function can be computed as:
where A_{l,θ} is the basis function amplitude for the popular Antonini 9/7 DWT, as given
in Table 3.2. The basis function amplitudes for other wavelet bases can be obtained
through experiments.
Table 3.2: Basis Function Magnitudes for 6 levels of an Antonini 9/7 DWT
Orientation    Level 1    2    3    4    5    6
3.3.2 Luminance and Texture Masking
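The visibility threshold due to average background luminance and texture masking can
be described by the following expression['i]:
where f_t represents the visibility threshold due to texture masking, and f_l represents the
visibility threshold due to average local background luminance. mg(x, y) denotes the
maximal weighted average of luminance gradients around pixel (x, y), which indicates the
busyness of that region; bg(x, y) is the average local background luminance. The two
functions f_t and f_l are defined as follows:
$$f_t(bg(x,y), mg(x,y)) = mg(x,y)\,\alpha(bg(x,y)) + \beta(bg(x,y)) \qquad (3.5)$$
$$f_l(bg(x,y)) = \begin{cases} T_0\left(1 - \sqrt{bg(x,y)/127}\right) + 3 & \text{for } bg(x,y) \le 127 \\ \gamma\,(bg(x,y) - 127) + 3 & \text{for } bg(x,y) > 127 \end{cases} \qquad (3.6)$$
$$\alpha(bg(x,y)) = bg(x,y) \cdot 0.0001 + 0.115 \qquad (3.7)$$
$$\beta(bg(x,y)) = \lambda - bg(x,y) \cdot 0.01 \qquad (3.8)$$
where $T_0 = 17$, $\gamma = 3/128$, and $\lambda = 1/2$.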
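The two masking functions can be tried out numerically with the sketch below. It is written under explicit assumptions: the low-luminance branch of f_l, the form of f_t, and the function β follow the commonly published form of this background-luminance and texture-masking model, and combining the two thresholds with a maximum is likewise an assumption, because the combining expression is not legible in this copy. All function names are illustrative.

```python
import math

# Constants quoted in the text.
T0 = 17.0
GAMMA = 3.0 / 128.0
LAMBDA = 0.5


def alpha(bg):
    """Slope of the texture-masking line as a function of background luminance."""
    return bg * 0.0001 + 0.115


def beta(bg):
    """Intercept of the texture-masking line (assumed published form)."""
    return LAMBDA - bg * 0.01


def f_texture(bg, mg):
    """Visibility threshold due to texture masking (assumed form mg*alpha + beta)."""
    return mg * alpha(bg) + beta(bg)


def f_luminance(bg):
    """Visibility threshold due to average local background luminance."""
    if bg <= 127:
        # Assumed low-luminance branch: threshold grows as the background darkens.
        return T0 * (1.0 - math.sqrt(bg / 127.0)) + 3.0
    return GAMMA * (bg - 127.0) + 3.0


def visibility_threshold(bg, mg):
    """Assumption: the larger of the two thresholds dominates."""
    return max(f_texture(bg, mg), f_luminance(bg))


if __name__ == "__main__":
    for bg, mg in [(20, 5), (127, 5), (127, 60), (220, 60)]:
        print(bg, mg, round(visibility_threshold(bg, mg), 3))
```

With these constants the luminance threshold is smallest at mid-gray (bg = 127) and rises toward both dark and bright backgrounds.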
The value of mg(x, y) is determined by performing a weighted average of the luminance
changes around the pixel (x, y) in four directions (0°, 45°, 90°, 135°). The weighting
coefficient decreases as the distance from pixel (x, y) increases. Four gradient operators
are used to calculate the luminance changes in each direction:
Let the pixel value at (x, y) be p(x, y); the value of mg(x, y) can be calculated as follows:
The average local background luminance bg(x, y) is calculated using a weighted lowpass
operator:
The perceptual coefficients resulting from the luminance and texture masking for each
subband are calculated as follows:
3.3.3 Contrast Masking
As described in Section 2.3.5, contrast masking refers to the reduction in visibility of
a signal by the presence of another. In image compression, the signal we want to be
masked is the quantization noise. In our model, we consider the masking effects from
signals within the same channel (intra-channel masking).
The increase in the visual threshold due to a large coefficient magnitude at the same
location in the same subband can be taken into account by an adjustment coefficient.
It is modeled by a nonlinear transducer function[34], as shown in Figure 3.4.
The adjustment coefficient can be calculated as:
where i(x, y, l, θ) is the subband coefficient, and ε is the slope of the line in Figure 3.4,
which depends on the distortion measure. We found through experiments that in our
model ε = 0.32 is appropriate.
Figure 3.4: Nonlinear Transducer Function
3.3.4 Perceptual Distortion Metric
In order to minimize the perceptual distortion resulting from compression, we need a
perceptual distortion metric. The probability summation model is used in determining
our distortion metric. The probability of detecting distortion at the location of a subband
coefficient is determined by the psychometric function:
and β is chosen to be 4. e(x, y, l, θ) denotes the quantization error at location (x, y) and
subband (l, θ):
The overall probability of the observer noticing the distortion at subband (l, θ) is:
It is clear that minimizing the probability of detecting a difference in the subband is
equivalent to minimizing the metric D(l, θ).
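A short sketch may help show why the pooled detection probability and a Minkowski-type metric move together. It assumes the widely used Weibull-style psychometric function p = 1 - exp(-|e/T|^β) and independent detection across coefficients; the exact expressions in the thesis are not legible here, so the function names and forms below are assumptions.

```python
import math

BETA = 4.0  # exponent quoted in the text


def detection_probability(error, threshold):
    """Assumed psychometric function for one coefficient:
    p = 1 - exp(-|e / T|^beta)."""
    return 1.0 - math.exp(-abs(error / threshold) ** BETA)


def subband_distortion(errors, thresholds):
    """Minkowski pooling of threshold-normalized errors with exponent beta."""
    total = sum(abs(e / t) ** BETA for e, t in zip(errors, thresholds))
    return total ** (1.0 / BETA)


def subband_detection_probability(errors, thresholds):
    """Probability summation: the distortion is noticed unless every
    coefficient independently goes undetected."""
    miss = 1.0
    for e, t in zip(errors, thresholds):
        miss *= 1.0 - detection_probability(e, t)
    return 1.0 - miss


if __name__ == "__main__":
    errors = [0.5, 1.0, 0.2]
    thresholds = [1.0, 2.0, 1.0]
    d = subband_distortion(errors, thresholds)
    print(d, subband_detection_probability(errors, thresholds), 1.0 - math.exp(-d ** BETA))
```

Under these assumptions the pooled probability equals 1 - exp(-D^β), which is monotonic in D, so minimizing one minimizes the other.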
3.3.5 Region of Interest Quantization
The region of interest (ROI) is quantized by a different set of quantizers defined to satisfy
a certain perceptual quality. Since the maximum quantization error for a uniform quantizer
with quantizer step Q is Q/2, the quantization step for each subband is calculated as:
where s is the quality adjustment factor. When s is set to 1, the distortion is not
perceptible under the predefined condition.
3.4 Multi-layer Quantizer and Entropy Coder
A multi-layer uniform quantizer is used in our codec. This quantizer, coupled with a
multi-layer entropy coder, enables the embedded coding of the compressed images. Before
quantizing the subband coefficients, the quantizer step for each subband needs to be
determined for a given bit budget. This is accomplished through a rate-distortion based
algorithm using integer programming[31]. The goal is to minimize the overall distortion
for a given bit rate. Let R_T be the total bit budget, R_i be the bit rate for each subband,
and D be the overall distortion; then we want to find the minimum D such that:
where K is the total number of subbands. To simplify things, we define the overall
distortion D as:
where D_i is the perceptual distortion for each subband. Given these conditions, the
optimal solution can be achieved by the well-known constant-slope condition. First,
we define a cost function combining the rate and distortion through a positive Lagrange
multiplier:
Next, let us express D_i as a function of rate, and set the derivative of the cost function to
zero to find the minimum D with respect to a specific R_i:
D_i(R_i) is an operational distortion-rate function which will depend on the quantization
scheme for the subband. The solution to Equation 3.24 is unique if we assume D_i(R_i)
to be continuous and convex. In conclusion, for a solution to be optimal, the set of
chosen rates has to correspond to constant-slope points on their respective weighted
distortion-rate curves. This is illustrated in Figure 3.5.
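For orientation, the standard form of this constrained problem and its constant-slope condition can be written out as follows. This is a generic textbook derivation and not a transcription of the thesis equations; in particular, the unweighted sum of subband distortions is an assumption, and any per-subband weights would simply multiply each D_i.

```latex
\begin{aligned}
&\min_{R_1,\dots,R_K} \; D = \sum_{i=1}^{K} D_i(R_i)
  \quad \text{subject to} \quad \sum_{i=1}^{K} R_i \le R_T, \\
&J(\lambda) = \sum_{i=1}^{K} D_i(R_i) + \lambda \sum_{i=1}^{K} R_i,
  \qquad \lambda > 0, \\
&\frac{\partial J}{\partial R_i} = 0
  \;\Longrightarrow\;
  \frac{dD_i(R_i)}{dR_i} = -\lambda \quad \text{for every } i .
\end{aligned}
```

Every subband therefore operates at a point where its distortion-rate curve has the same slope, which is the condition the text calls constant-slope.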
To allocate the optimal bit rate for each subband, we first construct a bit allocation
tree consisting of K subtrees corresponding to all of the subbands, as shown in Figure 3.6.
Figure 3.5: Rate-Distortion Curves with Optimal Solution
Each subtree is composed of N nodes, with each node n corresponding to a specific point
(R_i(n), D_i(n)) on the rate-distortion curve of that subband. In our codec, we found that
N = 10 is sufficient, and the quantization step at each node is set to a power of two,
i.e., Q(n) = 2^n. The rate increases and the distortion decreases when traversing down
a subtree. The topmost node of a subtree corresponds to the zero-rate point, where the
distortion is maximum. At each node n, we also define the parameter λ_i(n) as the ratio
between ΔD and ΔR, where ΔD and ΔR denote the magnitude differences of distortion
and rate between the current node and the leaf node N:
The bit allocation algorithm begins with the initial tree and obtains a series of pruned
trees iteratively. At each iteration, the node having the smallest λ_i(n) is pruned, since
this represents the best trade-off between rate and distortion at that step. The subtree
containing the pruned node has a new leaf node, and λ_i(n) must be recalculated for
all the remaining nodes in that subtree. At any iteration, the overall rate of the tree
can be calculated from the leaf nodes of all of the subtrees. The algorithm terminates
when the total rate falls below the target rate. Each subband is then assigned an optimal
quantization step according to the leaf node of the final pruned tree.
Figure 3.6: Bit Allocation Tree
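The pruning procedure can be sketched as a small greedy loop. This is a simplified illustration under stated assumptions: each subtree is stored as a list of operating points ordered from the zero-rate root to the finest node, rates are strictly increasing along a subtree, "pruning at node n" is taken to mean that n becomes the new leaf, and the loop stops as soon as the total rate no longer exceeds the target. The function name `allocate_bits` and the sample numbers are hypothetical.

```python
def allocate_bits(rates, dists, target_rate):
    """Greedy tree pruning for bit allocation.

    rates[i][n], dists[i][n]: operating point of subband i at node n.
    Node 0 is the zero-rate root; rate rises and distortion falls with n.
    Returns the index of the leaf node kept for every subband.
    """
    leaves = [len(r) - 1 for r in rates]

    def total_rate():
        return sum(r[leaf] for r, leaf in zip(rates, leaves))

    while total_rate() > target_rate:
        best = None  # (slope, subband, node)
        for i, leaf in enumerate(leaves):
            for n in range(leaf):
                delta_r = rates[i][leaf] - rates[i][n]   # rate saved
                delta_d = dists[i][n] - dists[i][leaf]   # distortion added
                slope = delta_d / delta_r
                if best is None or slope < best[0]:
                    best = (slope, i, n)
        if best is None:
            break  # every subtree is already pruned back to its root
        _, i, n = best
        leaves[i] = n  # node n becomes the new leaf of subtree i
    return leaves


if __name__ == "__main__":
    rates = [[0, 1, 2, 3], [0, 1, 2, 3]]
    dists = [[100, 40, 15, 5], [50, 30, 21, 15]]
    print(allocate_bits(rates, dists, target_rate=4))
```

Recomputing every slope on each pass keeps the sketch short; only the slopes of the subtree that was just pruned actually change, which is the saving the text describes.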
The multi-rate quantization scheme used in the codec is equivalent to the successive
approximation quantization (SAQ) used in Shapiro's embedded zerotree wavelet coder.
But instead of Huffman coding, the multi-rate quantization is coupled with adaptive
arithmetic coding. The multi-rate coding is achieved by progressive quantization and
coding of each subband in a sequence of N layers, representing progressively finer
quantization step sizes. Let us define a set of N quantizers, Q_1, ..., Q_N, and N quantization
layers L_1, ..., L_N. The symbols for quantizer Q_1 are encoded into layer L_1, while the
information necessary to recover the symbols for quantizer Q_n, given that the symbols
for quantizers Q_1, ..., Q_{n-1} are known, is encoded into layer L_n. In this way, the decoder
is able to recover the subband coefficients quantized by any of the quantizers, Q_n, by
decoding layers L_1, ..., L_n only. One of the objectives of this approach is for the total
number of bits needed to encode layers L_1, ..., L_n to be approximately the same as the number
of bits required to encode the output of quantizer Q_n. This way the coding efficiency
is not sacrificed in obtaining the multi-rate property. It can be proved that for pulse
code modulation (PCM) coding, this coding efficiency goal can be achieved if and only
if every quantization interval of Q_n, I_n, is contained in some quantization interval of
Q_{n-1}, I_{n-1}[33]:
where K_n are integers. In our codec K_n is set to two for all n.
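The nesting property with K_n = 2 can be illustrated with a plain floor quantizer whose step is halved from one layer to the next. This is a sketch under assumptions: the thesis does not specify the quantizer details here (for example, whether a dead zone is used), and the function names are illustrative.

```python
import math


def layer_symbols(x, coarsest_step, num_layers):
    """Symbols written to layers L_1..L_N for a single coefficient.

    Layer 1 carries the index of the coarsest quantizer; each later layer
    carries the one-bit refinement that halves the interval (K_n = 2).
    Assumption: a plain floor quantizer without a dead zone.
    """
    step = coarsest_step
    index = math.floor(x / step)
    symbols = [index]
    for _ in range(1, num_layers):
        step /= 2.0
        finer = math.floor(x / step)
        symbols.append(finer - 2 * index)  # always 0 or 1
        index = finer
    return symbols


def decode_index(symbols, n):
    """Rebuild the index of quantizer Q_n from layers L_1..L_n only."""
    index = symbols[0]
    for bit in symbols[1:n]:
        index = 2 * index + bit
    return index


if __name__ == "__main__":
    symbols = layer_symbols(13.7, coarsest_step=8.0, num_layers=4)
    print(symbols)
    print([decode_index(symbols, n) for n in range(1, 5)])
```

Because each fine interval lies inside exactly one coarse interval, a layer only has to say which half was chosen, so decoding the first n layers reproduces what quantizer Q_n alone would have produced.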
Arithmetic coding is used in the multi-rate coder because it is able to approach the
information-theoretic lower bound for encoding each layer arbitrarily closely. Huffman
coding is not a viable alternative here because it cannot realize bit rates of less than
one bit per symbol. Since the number of symbols in each layer is relatively small, the
arithmetic coder can adapt quickly.
3.5 Experimental Results
Four color images of size 512x512 were selected for our experiments: lenna, flower, pepper,
and baboon. They are shown in Figures 3.7 to 3.10. These images include a wide variety
of image features such as facial regions, natural scenes, textured regions, bright colors, etc.
In addition to our perceptual wavelet coder (PWC), two more compression coders were
used for comparison purposes: a basic wavelet coder without the perceptual model (BWC) and
a JPEG coder. The images were compressed at nine different compression ratios, from
10:1 to 90:1. The perceptual wavelet coder was designed for a visual resolution of 32
pixels/degree. That is, for a display resolution of 40 pixels/cm (a typical resolution for
a computer monitor), the coder is optimized for a viewing distance of around 45 cm.
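The relation between display resolution, viewing distance, and visual resolution can be checked with elementary geometry. The sketch below is an assumption about how that conversion is done (one degree of visual angle subtends 2 d tan(0.5°) on the screen); it is not a transcription of Equation 2.37, which is outside this section, and the function name is illustrative.

```python
import math


def pixels_per_degree(pixels_per_cm, viewing_distance_cm):
    """Visual resolution of a display seen head-on from a given distance.

    One degree of visual angle covers 2 * d * tan(0.5 deg) centimetres
    on a screen at distance d.
    """
    cm_per_degree = 2.0 * viewing_distance_cm * math.tan(math.radians(0.5))
    return pixels_per_cm * cm_per_degree


if __name__ == "__main__":
    print(pixels_per_degree(40.0, 45.0))
```

For 40 pixels/cm at 45 cm this gives a value close to 31.4 pixels/degree, in line with the "around 45 cm" figure quoted for a 32 pixels/degree design point.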
Figure 3.7: Original 'lenna'
Figure 3.8: Original 'flower'
Figure 3.9: Original 'pepper'
Figure 3.10: Original 'baboon'
The peak signal-to-noise ratio (PSNR) was obtained for each image at different
compression ratios. The resulting PSNR versus compression ratio curves for each image with
all three coders are shown in Figure 3.11. The first thing we can notice from Figure 3.11
is that the PSNR for the JPEG coder is consistently lower than that of the wavelet coders,
especially at high compression ratios. This is not surprising, since we know wavelet coding
generally offers better coding gain than DCT-based coding. The basic wavelet coder has a
slightly higher PSNR than the perceptual coder. This was also expected, since the basic wavelet
coder is optimized to minimize the mean-square error (MSE) while the perceptual coder
is optimized to minimize the perceptual error metric.
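For reference, PSNR for 8-bit data is computed from the mean-square error as 10 log10(255^2 / MSE). The sketch below works on flat sequences of pixel values; the function name and the choice of a flat list instead of a 2-D image are illustrative simplifications.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    print(psnr([10, 20, 30, 40], [11, 19, 31, 39]))
```

Because the measure depends only on the MSE, a coder that minimizes MSE directly will tend to score higher on it than a coder tuned to a perceptual metric, as the text observes.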
Figure 3.11: PSNR Curves for All Three Coders (panels: Flower, Baboon, Lenna, Pepper; horizontal axis: compression ratio; vertical axis: PSNR)
Let us now look at each image for subjective evaluation. First, we will consider the
JPEG coder versus the wavelet coders. At low compression ratios, all three coders perform
quite well, and there are no perceptible distortions in the reconstructed images. As
the compression ratio increases, the distortions from the wavelet coders appear as blurriness
around edges, also known as the ringing effect. The distortion resulting from the JPEG coder
takes the form of block artifacts, which are generally more distracting than the ringing
effect because they give images a disjointed look. Figures 3.12 to 3.15 illustrate the two types of
distortion using 'flower' and 'baboon' compressed by the perceptual coder and the JPEG
coder. We can see that the perceptual quality of the wavelet-coded images is better than
that of the JPEG-coded images. Also, at high compression ratios, the image quality
using JPEG compression degrades more rapidly than that using wavelet compression.
Next we will compare the subjective performance of the basic wavelet coder and
the perceptual wavelet coder. The perceptual coder generally demonstrates fewer ringing
effects than the basic coder. As is evident from Figures 3.16 to 3.19, the basic wavelet coder
produces more noticeable distortions at the edges in both 'lenna' and 'flower'. The performances
of the two coders on 'pepper' are similar, except at very high compression ratios, where
the perceptual coder seems to preserve the texture of the pepper better than
the basic coder, as shown in Figures 3.20 and 3.21. For 'baboon', the texture region
of the reconstructed images differs between the two coders. The basic wavelet coder produces
many small white holes in the texture region, which are quite visible. The perceptual
wavelet coder blurs out the region, which tends to be less perceptible. Figures 3.22 and
3.23 demonstrate the two effects. A separate set of experiments was performed on
high-resolution Ibl-L'i images to achieve perceptually lossless quality. The results show
that around the compression ratio where distortion just starts to be noticeable, it is easier
to identify the distortion from the JPEG coder than from the perceptual wavelet coder. In other
words, it is possible for the perceptual coder to achieve a higher compression ratio while
still maintaining perceptually lossless quality. However, for reasons of confidentiality,
these results cannot be presented here. The option of separate quantization for a region
of interest at a specified perceptual quality was tested on 'lenna', with the facial region
selected as the region of interest. The scale factor is set at one. As the compression
ratio increases, the quality of the facial region remains the same. Figures 3.24 and 3.25
show the reconstructed images from the perceptual wavelet coder without and with the
ROI option, respectively, at a high compression ratio. While the distortion is apparent in the facial
region in Figure 3.24, the facial region in Figure 3.25 remains perceptually lossless under
the predefined condition. The sacrifice is that the background in Figure 3.25 is more
blurred than that in Figure 3.24.
Figure 3.12: Reconstructed 'flower' Using JPEG Coder (90:1)
Figure 3.13: Reconstructed 'flower' Using Perceptual Wavelet Coder (90:1)
Figure 3.14: Reconstructed 'baboon' Using JPEG Coder (80:1)
Figure 3.15: Reconstructed 'baboon' Using Perceptual Wavelet Coder (80:1)
Figure 3.16: Reconstructed 'lenna' Using Basic Wavelet Coder (50:1)
Figure 3.17: Reconstructed 'lenna' Using Perceptual Wavelet Coder (50:1)
Figure 3.18: Reconstructed 'flower' Using Basic Wavelet Coder (90:1)
Figure 3.19: Reconstructed 'flower' Using Perceptual Wavelet Coder (90:1)
Figure 3.20: Reconstructed 'pepper' Using Basic Wavelet Coder (80:1)
Figure 3.21: Reconstructed 'pepper' Using Perceptual Wavelet Coder (80:1)
Figure 3.22: Reconstructed 'baboon' Using Basic Wavelet Coder (80:1)
Figure 3.23: Reconstructed 'baboon' Using Perceptual Wavelet Coder (80:1)
Figure 3.24: Reconstructed 'lenna' Using Perceptual Wavelet Coder Without ROI (80:1)
Figure 3.25: Reconstructed 'lenna' Using Perceptual Wavelet Coder With ROI (80:1)
Chapter 4
Quality Assessment Using Vision
Model
It is clear that an objective measure for evaluating image quality that is more accurate than
the simple PSNR measure would be very helpful in designing compression algorithms. The PSNR
measure often leads to inaccurate predictions of perceptual quality when comparing two
different algorithms because it operates on a pixel-by-pixel basis, which is not how humans
perceive images. Subjective tests are usually time-consuming and tend to be inconsistent.
It is desirable to have a vision model that can mimic the way humans perceive images
and express the quality numerically. In this chapter, we construct and investigate a
mechanistic model based on Sarnoff's visual discrimination model[20]. This vision model
is more elaborate than the one used in the compression scheme in the previous chapter
because it does not have some of the constraints imposed by the encoder. The quality
assessment using this vision model is then applied to the compressed images from the
previous chapter. The flow diagram of the model is shown in Figure 4.1, and each of its
components is described in the next section, followed by a results section.
Figure 4.1: Flow Diagram of the Vision Model[20]
4.1 Vision Model
A mechanistic model of the human visual system is one in which each component
tries to model the functional response of physiological mechanisms in the visual
pathway. The vision model consists of six stages, each representing a particular
visual characteristic. Each stage will be addressed below. Since most compression
coders distribute the distortions more or less equally between chromatic and achromatic
channels, the performance difference between a luminance-only quality assessment measure
and its full-color extension is small[44]. Therefore, we will apply the vision model
only to the luminance component.
The input images are convolved with a function approximating the point spread function
given in Equation 2.39. The use of the point spread function is justified by the fact that a
point object gives rise to a retinal light distribution that is bell-shaped in cross-section.
It can be viewed as the intensity of one pixel spreading out to its neighboring pixels.
The concept is simple: we first calculate the physical distance between pixels in
terms of visual angle, then convert the point spread function into a 2-D discrete function.
One concern here may be that the point spread function is not separable. But since
its value falls off exponentially, it is a fairly short FIR filter and thus still reasonable to
compute. An additional operation is performed when the fixation depth does not match
the image depth. In this case, a blur spot will form on the retina. We need to calculate
the size of the blur circle using the distance from the exit pupil to the image surface and
the depth information, and then convolve this disk-shaped convolution kernel with the
image in the same fashion as the point spread function. This kernel can be combined with
the point spread function so that we only need to perform the convolution once.
4.1.2 Sampling
A Gaussian convolution followed by point sampling is used to simulate
the sampling of the retinal cone mosaic. For foveal viewing, the image is sampled at 120
pixels per degree of visual angle, resulting in a retinal image of 512x512 pixels. For
non-foveal viewing, the sampling density is calculated as:
where e is the eccentricity in degrees, and k is set to 0.4, as estimated from psychophysical
data by Watson[39].
4.1.3 Bandpass Contrast Responses
The raw luminance signal is converted to units of local contrast. Contrast is a basic
perceptual attribute of an image. The absoiute luminance does not rnean too much
to the human eye. it is the contrast that we perceive. A local band-limited contrast
rnethod for complex image is employed here[25]. The first step is to decompose the
image into a Laplacian pyrarnid. resulting in several levels of bandpaçs signals. each level
separated from its neighbors by one octave[5]. The resulting structure is ven similar
to the wavelet subband de composition^ cxcept that a simple Gaussian filter is used for
fast cornputation. After decomposition. at each point in each level. the Laplacian value
is divided by the corresponding point upsampled from the Gaussian pyramid level two
levels d o m in resolution:
where Cr(x: y) is the contrast at pyramid Ievel1, location (x,y); ï(x. y) is the input image;
and Gi (xt y) is a Gaussian convolution kemel:
where $\sigma_l = 2^{l-1}\sigma_0$. This operation results in a local measure of contrast, localized in
both space and frequency. Then a brightness adjustment is added to model the reduced
visibility threshold in dark regions. The adjustment curve in Figure 2.11 is used.
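The contrast computation described in this section can be sketched on a 1-D luminance signal. This is a simplified illustration under stated assumptions: all levels are kept at full resolution (no downsampling or upsampling), edges are replicated, and the names `gaussian_kernel`, `blur`, and `local_contrast` are hypothetical helpers rather than the thesis implementation.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, sigma):
    """Convolve a signal with a Gaussian, replicating the edge samples."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out


def local_contrast(signal, level, sigma0=1.0):
    """Band-limited contrast: (G_l - G_{l+1}) response over G_{l+2} response.

    Assumes strictly positive luminance so the denominator never vanishes.
    """
    sigmas = [sigma0 * 2 ** (level - 1 + k) for k in range(3)]
    g0, g1, g2 = (blur(signal, s) for s in sigmas)
    return [(a - b) / c for a, b, c in zip(g0, g1, g2)]


if __name__ == "__main__":
    edge = [10.0] * 32 + [50.0] * 32
    contrast = local_contrast(edge, level=1)
    print(max(abs(v) for v in contrast))
```

Because the measure is a ratio of two filtered versions of the same signal, scaling the luminance by a constant leaves the contrast unchanged, and a uniform region yields zero contrast.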
4.1.4 Oriented Responses
At this point, we have a pyramid structure of contrast values. To take into account the
orientation bandwidth of the bandpass signal, each pyramid level is convolved with four
pairs of spatially oriented filters. Each pair consists of a directional second derivative
of a Gaussian and its Hilbert transform. This is a type of steerable filter, meaning that
a filter of arbitrary orientation can be synthesized as a linear combination of a set of
basis filters [14]. Furthermore, the Gaussian derivative filter is separable, and its Hilbert
transform can be approximated by four basis functions, which are also separable. Then, to
have a meaningful analysis of the local orientation, the orientation strength along each
of the four directions (0°, 45°, 90°, 135°) is calculated as the sum of squares of the outputs of
the pair of orientation filters, resulting in a phase independent energy response:
$$e_{l,\theta}(x,y) = o_{l,\theta}(x,y)^2 + h_{l,\theta}(x,y)^2$$
where $o$ and $h$ are the responses of the oriented operator and its Hilbert transform at
pyramid level $l$ and orientation $\theta$. The advantage of
phase independence of the energy response is that it makes the model less sensitive to
the exact location of an edge, a property also exhibited by the human visual system.
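The phase independence of the energy response can be illustrated numerically. The sketch below swaps in a Gaussian-windowed cosine and sine as a stand-in quadrature pair (not the second-derivative-of-Gaussian filters and Hilbert transforms used in the model) and applies them to a 1-D grating at several phases. All names and parameter values are assumptions for illustration.

```python
import math


def quadrature_pair(period=8.0, sigma=8.0, radius=48):
    """Even and odd filters: Gaussian-windowed cosine and sine."""
    omega = 2.0 * math.pi / period
    xs = list(range(-radius, radius + 1))
    window = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in xs]
    even = [g * math.cos(omega * x) for g, x in zip(window, xs)]
    odd = [g * math.sin(omega * x) for g, x in zip(window, xs)]
    return xs, even, odd


def oriented_energy(phase, period=8.0):
    """Return (o, h, o^2 + h^2) for a unit grating with the given phase."""
    xs, even, odd = quadrature_pair(period=period)
    omega = 2.0 * math.pi / period
    grating = [math.cos(omega * x + phase) for x in xs]
    o = sum(f * s for f, s in zip(even, grating))
    h = sum(f * s for f, s in zip(odd, grating))
    return o, h, o * o + h * h


if __name__ == "__main__":
    for phase in (0.0, 0.7, math.pi / 2, 2.0):
        o, h, energy = oriented_energy(phase)
        print(f"phase={phase:5.2f}  o={o:8.3f}  h={h:8.3f}  energy={energy:8.3f}")
```

The individual responses o and h swing with the phase of the grating, while their squared sum stays essentially constant, which is the insensitivity to exact edge location described above.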
Each energy measure is first normalized by a value $M_t$, which is close to the square of
the grating contrast detection threshold for that pyramid level and luminance:
$$\frac{1}{M_t(v_l, L_l(x,y))} = a\, v_l\, e^{-b v_l} \sqrt{1 + 0.06\, e^{b v_l}}$$
where
$$a = \frac{540\,(1 + 0.7/L_l)^{-0.2}}{1 + \dfrac{12}{w\,(1 + v_l/3)^2}}, \qquad b = 0.3\,(1 + 100/L_l)^{0.15}$$
and $v_l$ is the peak spatial frequency for pyramid level $l$, $L_l(x,y)$ is the local luminance
used in the contrast calculation described in the previous section, and $w$ is the display width
in degrees. This value can be adjusted for more robust performance. Next, a sigmoid
non-linearity is applied to the normalized energy measure to reproduce the dipper shape
of contrast discrimination functions. These two operations can be combined as one scalar
operation on each energy measure:
$$T(\hat{e}) = \frac{2\,\hat{e}^{\,n/2}}{\hat{e}^{\,(n-w)/2} + 1}$$
where $\hat{e}$ is the normalized energy measure, $n$ is chosen as 1.5, and the exponent $w$ is set to 0.07. Again, calibration of this function using
a few typical images can be performed to improve the model prediction accuracy. This
function has a number of interesting properties when considering a grating stimulus of
contrast $c$ and frequency $v_l$. For small values of $c$, the maximum transducer output
at level $l$ accelerates as $c^n$, while for large values of $c$, the function is compressive as
$c^w$. For an intermediate value of $c$ at the contrast detection threshold for frequency $v_l$,
the transducer output is 1. An eccentricity dependent pooling stage can be added to
improve the performance further. This is achieved by averaging the transducer output
over a small neighborhood by convolving with a disk-shaped kernel of diameter $d_0 = 5$ for
foveal inputs. For stimuli outside the fovea, the diameter $d_p$ of this kernel increases as
a linear function of eccentricity:
where $e$ is the eccentricity in degrees, and $k_p$ is a scaling factor. This eccentricity dependent
increase in pooling is useful in modeling the eccentricity-dependent loss in performance.
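The transducer properties and the pooling rule described above can be checked numerically. The sketch below assumes a sigmoid of the form T = 2 ê^(n/2) / (ê^((n-w)/2) + 1) acting on the normalized energy ê (proportional to squared contrast), and assumes the pooling diameter grows as max(d_0, k_p e). Both forms, the function names, and the example value of `k_p` are assumptions for illustration and are not taken from the thesis.

```python
def transducer(normalized_energy, n=1.5, w=0.07):
    """Sigmoid non-linearity on a normalized energy measure (assumed form)."""
    e = normalized_energy
    return 2.0 * e ** (n / 2.0) / (e ** ((n - w) / 2.0) + 1.0)


def pooling_diameter(eccentricity_deg, k_p, d0=5.0):
    """Disk kernel diameter: d0 in the fovea, linear growth beyond it (assumed)."""
    return max(d0, k_p * eccentricity_deg)


if __name__ == "__main__":
    # Energy is taken as contrast squared, in units of the detection threshold.
    for c in (1e-4, 1.0, 1e4):
        print(f"c = {c:8.1e} -> T = {transducer(c * c):.6g}")
    # k_p = 2.0 is a hypothetical value chosen only for this example.
    for ecc in (0.0, 1.0, 10.0):
        print(f"e = {ecc:4.1f} deg -> diameter = {pooling_diameter(ecc, k_p=2.0)}")
```

Under this assumed form the output is exactly 1 at threshold, is proportional to c^n well below threshold, and is proportional to c^w well above it, matching the three properties listed in the text.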
4.2.1 Distance Metric
At this point, we have an $m$-dimensional vector for each spatial position of the image,
where $m$ is the number of pyramid levels multiplied by the number of orientations. Before
calculating the distance between these vectors, we first need to upsample each pyramid
level to the full 512x512 size, which results in a set of $m$ arrays $P_i(\vec{x})$, $i = 1, \ldots, m$, for
each input image $\vec{x}$. Then a distance measure $D$ between the two input images $\vec{x}_1$ and
$\vec{x}_2$ is calculated as:
$$D(\vec{x}_1, \vec{x}_2) = \left\{ \sum_{i=1}^{m} \left| P_i(\vec{x}_1) - P_i(\vec{x}_2) \right|^{Q} \right\}^{1/Q}$$
where $Q$ is set at 2.4. The result from this stage is a 2-D distance map indicating the
perceptual difference between two images at each spatial location.
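A minimal sketch of the distance stage, assuming the per-location distance is a Minkowski sum with exponent Q over the m response arrays. The function name and the nested-list representation of the arrays are illustrative choices.

```python
def distance_map(responses_a, responses_b, q=2.4):
    """Per-pixel Minkowski distance between two stacks of response arrays.

    responses_a, responses_b: lists of m 2-D arrays (lists of rows), all the
    same size, one array per pyramid level and orientation.
    """
    rows = len(responses_a[0])
    cols = len(responses_a[0][0])
    accum = [[0.0] * cols for _ in range(rows)]
    for plane_a, plane_b in zip(responses_a, responses_b):
        for r in range(rows):
            for c in range(cols):
                accum[r][c] += abs(plane_a[r][c] - plane_b[r][c]) ** q
    return [[value ** (1.0 / q) for value in row] for row in accum]


if __name__ == "__main__":
    a = [[[1.0, 2.0]], [[0.0, 0.0]]]   # m = 2 channels, each 1 x 2
    b = [[[1.0, 5.0]], [[0.0, 1.0]]]
    print(distance_map(a, b))
```

With Q above 2, channels with large differences dominate the sum more than they would in a Euclidean distance.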
Since a single value is often useful in evaluating the image quality, we need to extract
a meaningful value from the 2-D distance map. In practice, two different measures are
usually used: the average across the map and the maximum. We also calculated a
histogram to get the number of points exceeding 90% of the maximum value. This measure
may offer additional insight for interpreting the distance map.
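The three scalar summaries of the distance map can be sketched as follows. The function name and the toy map are illustrative.

```python
def summarize(dmap, fraction=0.9):
    """Return (mean, maximum, count of points above fraction * maximum)."""
    values = [v for row in dmap for v in row]
    peak = max(values)
    mean = sum(values) / len(values)
    count = sum(1 for v in values if v > fraction * peak)
    return mean, peak, count


if __name__ == "__main__":
    toy_map = [[0.0, 1.0], [9.5, 10.0]]
    print(summarize(toy_map))
```

The count includes the maximum itself, so it is at least 1 for any map whose maximum is positive.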
4.3 Experimental Result
The vision model is applied to the four images in the previous chapter. The distance
maps for the four images and their reconstructed images compressed at 20:1 are shown
in Figures 4.2 - 4.5. The bright regions indicate a high probability of the distortion being
noticed by a human observer, while the dark regions indicate a low probability. We can see
that the distance maps are generally able to predict the areas to which human eyes are likely
to be sensitive. The distance map predicts that distortion in uniform regions is
more perceptible than in textured regions, which agrees with our expectation. It also
predicts that distortion in small regions is less likely to be detected.
Figure 4.2: Distance Map for 'lenna'
Three numerical measures were extracted from the distance map: mean distance,
maximum distance, and the number of points exceeding 90% of the maximum distance.
Figures 4.6 - 4.8 show the plots of these three measures versus the compression ratio
for the four images. Two observations can be made from the mean distance curves:
Figure 4.3: Distance Map for 'flower'
the first one is that the mean distance for the perceptual coder is generally lower than
that for the basic coder, and the second one is that the JPEG coder has a lower mean distance
at low compression ratios and a higher mean distance at high compression ratios than
the wavelet coders. These results reasonably correspond to our expectations from the
subjective evaluation in the previous chapter. A discrepancy between the mean distance
measure and the subjective evaluation occurs for the image 'baboon'. The mean distance
measure suggests that the JPEG coder consistently has a lower error than the wavelet
coders. However, the reconstructed image using JPEG showed more apparent distortion
than those using the wavelet coders starting at the compression ratio 40:1. Let us now turn
our attention to the maximum distance measure in Figure 4.7. Notice that there is a large
elevation in the maximum distance curve for the JPEG coder starting at the compression
ratio 40:1. The reason that the mean distance curve for the JPEG coder is lower than
that of the wavelet coders may be accounted for by the fact that 'baboon' has a
large texture region, and JPEG usually performs better in highly textured regions. From
Figure 4.4: Distance Map for 'pepper'
the maximum distance measure, we can see that again, the perceptual coder has lower
distance values than the basic coder, and except at a few low compression points, the
perceptual coder has lower distance values than the JPEG coder as well. Generally,
the maximum distance measure should not be used by itself to judge the overall picture
quality, since a few points with large error might lead to inaccurate prediction. At a few
places on the curve, the maximum distance decreases as the compression ratio increases,
but we know that the picture quality should decrease with increased compression ratio.
The histogram plot in Figure 4.8 does not give us much information, but there is
one interesting point worth noting. There is a sharp increase in the histogram curve
of 'baboon' for the perceptual coder at the compression ratio 90:1. From subjective
inspection, there is a sudden quality degradation for 'baboon' using the perceptual coder
at compression ratio 90:1. Overall, the mean distance measure is a fair assessment for
evaluating image quality, but the maximum distance measure and the histogram measure
can provide some additional useful information for more accurate quality assessment.
Figure 4.5: Distance Map for 'baboon'
[Plots: mean distance versus compression ratio for Flower, Baboon, Lenna, and Pepper; curves for the JPEG, BWC, and PWC coders]
Figure 4.6: Mean Distance Measure from the Distance Map
[Plots: maximum distance versus compression ratio for Flower, Baboon, Lenna, and Pepper; curves for the JPEG, BWC, and PWC coders]
Figure 4.7: Maximum Distance Measure from the Distance Map
[Plots: histogram (90%) versus compression ratio for Flower, Baboon, Lenna, and Pepper; curves for the JPEG, BWC, and PWC coders]
Figure 4.8: Histogram Bin (90% of Maximum) from the Distance Map
Chapter 5
Conclusion
5.1 Contributions
The focus of this thesis was to explore the use of human visual characteristics in image
compression and quality assessment. A perceptual wavelet coder was developed to satisfy
a wide range of requirements. It consists of four major components: wavelet transform,
perceptual model, quantizer and entropy coder.
For the wavelet transform, the selection of levels of decomposition and wavelet filters was
investigated. We found that a decomposition level of five is suitable for our purpose. We
included some of the best known filters for image compression in our wavelet coder, and
the performance differences between these filters were found to be mainly image dependent.
The perceptual model was designed to include several well-known characteristics
of the HVS: contrast sensitivity, contrast masking, luminance and texture masking, and
probability summation. The model generates a weighting factor for each subband coefficient
according to its visual importance. The perceptual distortion is then calculated
based on the weighted coefficients. A bit allocation algorithm was used to determine
the quantization steps for each subband based on minimizing the perceptual error of the
reconstructed image. The multi-rate quantization was used in conjunction with an adaptive
arithmetic coder to generate fully embedded bit streams so that scalability can be
achieved. The perceptual wavelet coder was compared with two other coders: a wavelet
coder without the perceptual model and a JPEG coder. The subjective results
showed that at low compression ratios, the JPEG coder and the wavelet coders perform
comparably. In fact, in many cases, they can be considered perceptually lossless under
the specified viewing conditions. As the compression ratio increases, the perceptual quality
of the JPEG coder falls off quickly, but the wavelet coders can still maintain reasonable
quality at high compression ratios. Between the two wavelet coders, the perceptual coder
generally demonstrated better visual quality than the wavelet coder without the perceptual
model. These results justified the use of a perceptual model in image compression. Results
obtained from perceptually lossless compression for high resolution ILIAS images also
showed that it is possible for the perceptual wavelet coder to achieve a higher compression
ratio at which the images are still considered perceptually lossless. An option was included
in the coder to allow a specified region of interest to be compressed at a desired perceptual
quality.
To investigate the use of a vision model in image quality assessment, a vision model
based on Sarnoff's visual discrimination model was implemented and applied to the
compressed images from previous experiments. The 2-D distortion maps produced by the
model showed fairly accurate prediction of where the distortion is more noticeable to
human eyes. Three numerical measures were extracted from the distortion map: mean
distance, maximum distance, and the number of points exceeding 90% of the maximum
distance. Results showed that a reasonable assessment can be made from the mean distance
measure, but with additional information from the other two measures, a more accurate
judgement can be made.
5.2 Future Research
Both perceptual coding of visual information and perceptual quality assessment have
important implications in many applications; therefore, it is worth exploring the topic
further. Some suggestions for future research are listed below.
• A lot of work remains to be done in designing an accurate human vision model.
  From the point of view of vision science, extensive psychovisual experiments are
  required to establish a better understanding of the HVS. From the point of view of
  image processing, better schemes for incorporating the vision model in coding can
  potentially improve the perceptual quality.
• The correlation between the color components was not fully exploited in the thesis.
  It may prove beneficial to incorporate that in the vision model. Also, it is worthwhile
  to investigate other types of color spaces.
• Higher compression ratios might be achieved using the Hierarchical Vector Quantization
  scheme in the wavelet coder. It should also be interesting to investigate how
  to expand our perceptual scheme for scalar quantization to vector quantization.
• Finally, our perceptual wavelet coding scheme for still images can be extended to
  video compression. Two possible structures can be used for video coding. In
  the hybrid structure, the intra-frame is coded using the wavelet scheme, and the
  inter-frame is estimated from the intra-frame by motion estimation and compensation.
  Another possible video coding structure is the 3-D extension of wavelet coding to
  the temporal domain. The HVS in the temporal domain also needs to be considered.
Appendix A
Color Space Conversion
The Y'CbCr color space is defined in ITU-R Recommendation 601. Conversion from
Y'CbCr to standard CIE 1931 XYZ tristimulus values requires two linear transformations
and a gamma correction. Y'CbCr coding uses 8 bits for each component: Y' is
coded with an offset of 16 and an amplitude range of 219, while Cb and Cr are coded
with an offset of 128 and an amplitude range of ±112. The extremes of the coding range
are reserved for synchronization and signal processing headroom, which requires clipping
prior to conversion. Nonlinear R'G'B' values in the range of [0,1] are computed from
the clipped Y'CbCr values by the first linear transformation.
Gamma correction in Equation 2.40 is applied to R'G'B' to obtain linear RGB values.
For display with standard phosphors, these linear RGB values can be converted to CIE
1931 XYZ tristimulus values by the second linear transformation.
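The first step, from 8-bit Y'CbCr to nonlinear R'G'B', can be sketched as follows. This is a hedged illustration: it uses the Recommendation 601 luma coefficients (0.299, 0.587, 0.114) and the offsets and ranges quoted above, but the function names are hypothetical, and the gamma correction of Equation 2.40 and the RGB-to-XYZ matrix are not reproduced here.

```python
KR, KB = 0.299, 0.114        # Rec. 601 luma weights for R' and B'
KG = 1.0 - KR - KB           # luma weight for G' (0.587)


def clip(value, low, high):
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))


def ycbcr_to_rgb_prime(y, cb, cr):
    """Convert 8-bit Y'CbCr codes to nonlinear R'G'B' in [0, 1]."""
    # Clip away the reserved extremes of the coding range first.
    y = clip(y, 16, 235)
    cb = clip(cb, 16, 240)
    cr = clip(cr, 16, 240)
    luma = (y - 16) / 219.0          # 0..1
    pb = (cb - 128) / 224.0          # -0.5..0.5
    pr = (cr - 128) / 224.0          # -0.5..0.5
    r = luma + 2.0 * (1.0 - KR) * pr
    b = luma + 2.0 * (1.0 - KB) * pb
    g = (luma - KR * r - KB * b) / KG
    return tuple(clip(v, 0.0, 1.0) for v in (r, g, b))


if __name__ == "__main__":
    print(ycbcr_to_rgb_prime(235, 128, 128))   # reference white
    print(ycbcr_to_rgb_prime(81, 90, 240))     # close to saturated red
```

Deriving the green channel from the luma equation rather than hard-coding a matrix keeps the three channels consistent with the chosen luma weights.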
Bibliography
[1] A. J. Ahumada and H. A. Peterson, "Luminance-Model-Based DCT Quantization for Color Image Compression," Human Vision, Visual Processing, and Digital Display III, pp. 365-374, 1992.
[2] A. J. Ahumada, "Computational Image Quality Metrics: A Review," Society for Information Display International Symposium, Digest of Technical Papers, pp. 305-308, 1993.
[3] A. J. Ahumada and C. H. Null, "Image Quality: A Multi-Dimensional Problem," Digital Images and Human Vision, MIT Press, pp. 141-148, 1993.
[4] M. Ardito, M. Gunetti, and M. Visca, "Preferred Viewing Distance and Display Parameters," MOSAIC Handbook, pp. 165-181, 1996.
[5] P. J. Burt and E. H. Adelson, "The Laplacian Pyramid as a Compact Image Code," IEEE Trans. on Communications, vol. COM-31, no. 4, pp. 532-540, April 1983.
[6] Y. T. Chan, "Wavelet Basics," Kluwer Academic Publishers, 1995.
[7] C. H. Chou and Y. C. Li, "A Perceptually Tuned Subband Image Coder Based on the Measure of Just-Noticeable Distortion Profile," IEEE Circuits and Systems for Video Tech., vol. 5, pp. 467-476, Dec. 1995.
[8] R. J. Clark, "Digital Compression of Still Images and Videos," Academic Press, 1993.
[9] S. Daly, "The Visible Differences Predictor: An Algorithm for the Assessment of Image Fidelity," Digital Images and Human Vision, pp. 179-206, Cambridge, MA: MIT Press, 1993.
[10] I. Daubechies, "Ten Lectures on Wavelets," SIAM, 1992.
[11] M. D'Zmura and B. Singer, "Spatial Pooling of Contrast Gain Control," J. Opt. Soc. of Amer. A, vol. 13, no. 11, pp. 2135-2140, 1996.
[12] M. P. Eckert and A. P. Bradley, "Perceptual Quality Metrics Applied to Still Image Compression," Signal Processing, vol. 70, no. 3, pp. 177-200, 1998.
[13] G. Folland, "Harmonic Analysis in Phase Space," Princeton University Press, 1989.
[14] W. T. Freeman and E. H. Adelson, "The Design and Use of Steerable Filters," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 13, no. 9, pp. 891-906, September 1991.
[15] D. J. Granrath, "The Role of Human Visual Models in Image Processing," Proceedings of the IEEE, vol. 69, pp. 552-561, 1981.
[16] C. F. Hall and E. L. Hall, "A Nonlinear Model for the Spatial Characteristics of the Human Visual System," IEEE Trans. on Systems, Man, and Cybernetics, vol. 7, pp. 161-170, March 1977.
[17] N. Jayant, J. Johnston, and R. Safranek, "Signal Compression Based on Models of Human Perception," Proceedings of the IEEE, vol. 81, no. 10, pp. 1385-1422, 1993.
[18] D. H. Kelly, "Motion and Vision. Stabilized Spatio-Temporal Threshold Surface," J. Opt. Soc. Amer., vol. 69, no. 10, pp. 1340-1349, 1979.
[19] Y. H. Kim and J. Modestino, "Adaptive Entropy-Coded Subband Coding of Images," IEEE Trans. on Image Processing, vol. 1, pp. 31-48, January 1992.
[20] J. Lubin, "The Use of Psychophysical Data and Models in the Analysis of Display System Performance," Digital Images and Human Vision, MIT Press, pp. 163-178, 1993.
[21] E. Majani, "Biorthogonal Wavelets for Image Compression," Proc. SPIE, VCIP-94, 1994.
[22] S. G. Mallat, "Multifrequency Channel Decomposition of Images and Wavelet Models," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 12, pp. 2091-2110, December 1989.
[23] A. N. Netravali and B. C. Haskell, "Digital Pictures: Representation and Compression," New York: Plenum, 1985.
[24] J. E. Odegard and C. S. Burrus, "Smooth Biorthogonal Wavelets for Applications in Image Compression," Proceedings of DSP Workshop, Norway, September 1996.
[25] E. Peli, "Contrast in Complex Images," J. Opt. Soc. Amer. A, vol. 7, no. 10, pp. 2032-2040, 1990.
[26] W. Pennebaker and J. Mitchell, "JPEG Still Image Data Compression Standard," Van Nostrand, 1993.
[27] Ido Rabinovitch, "High Quality Image Compression Using the Wavelet Transform," Master Thesis, University of Toronto, 1996.
[28] R. J. Safranek and J. D. Johnston, "A Perceptually Tuned Subband Image Coder with Image Dependent Quantization and Post-quantization Data Compression," IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 1943-1948, 1989.
[29] A. Said and W. A. Pearlman, "A New, Fast, and Efficient Image Codec Based on Set Partitioning in Hierarchical Trees," IEEE Trans. on Circuits and Systems for Video Technology, vol. 6, pp. 243-250, June 1996.
[30] J. Shapiro, "Embedded Image Coding Using Zerotrees of Wavelet Coefficients," IEEE Trans. on Signal Processing, vol. 41, pp. 3445-3462, Dec. 1993.
[31] Y. Shoham and A. Gersho, "Efficient Bit Allocation for an Arbitrary Set of Quantizers," IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 36, no. 9, pp. 1445-1453, September 1988.
[32] K. T. Soon, K. K. Pang, and K. Y. Ngan, "Classified Perceptual Coding with Adaptive Quantization," IEEE Trans. on Circuits and Systems for Video Technology, vol. 6, no. 4, pp. 375-388, 1996.
[33] D. Taubman and A. Zakhor, "Multirate 3-D Subband Coding of Video," IEEE Trans. on Image Processing, vol. 3, no. 5, pp. 572-588, September 1994.
[34] P. C. Teo and D. J. Heeger, "Perceptual Image Distortion," IEEE International Conference on Image Processing, pp. 982-986, 1994.
[35] P. N. Topiwala (editor), "Wavelet Image and Video Compression," Kluwer Academic Publishers, 1998.
[36] P. Vaidyanathan, "Multirate Systems and Filter Banks," Prentice-Hall, 1993.
[37] O. Rioul and M. Vetterli, "Wavelet and Signal Processing," IEEE Signal Processing Magazine, pp. 14-38, October 1991.
[38] J. Villasenor et al., "Filter Evaluation and Selection in Wavelet Image Compression," Proc. Data Compression Conference, IEEE, pp. 351-360, March 1994.
[39] A. B. Watson, "Detection and Recognition of Simple Spatial Forms," Physical and Biological Processing of Images, 1983.
[40] A. B. Watson, "DCTune: A Technique for Visual Optimization of DCT Quantization Matrices for Individual Images," Society for Information Display Digest of Technical Papers XXIV, pp. 946-949, SPIE, 1993.
[41] A. B. Watson, G. Y. Yang, J. A. Solomon, and J. Villasenor, "Visibility of Wavelet Quantization Noise," IEEE Trans. on Image Processing, vol. 6, no. 8, August 1997.
[42] G. Westheimer, "The Eye as an Optical Instrument," Handbook of Perception and Human Performance, vol. 1, chapter 4, John Wiley & Sons, 1986.
[43] E. Whittaker, "On the Functions which are Represented by the Expansions of Interpolation Theory," Proc. Royal Soc., Edinburgh, Section A 35, pp. 181-194, 1915.
[44] S. Winkler, "Issues in Vision Modeling for Perceptual Video Quality Assessment," IEEE Signal Processing, vol. 78, pp. 231-252, 1999.
[45] J. Woods and S. O'Neill, "Subband Coding of Images," IEEE Trans. on Acoustics, Speech, Signal Processing, vol. 34, pp. 1275-1285, October 1986.