Relations between local and global perceptual image quality and visual masking
Md Mushfiqul Alam, Pranita Patil, Martin T. Hagan, and Damon M. Chandler
School of Electrical and Computer Engineering, Oklahoma State University, Stillwater, OK 74078
ABSTRACT
Perceptual quality assessment of digital images and videos is important for various image-processing applications. For assessing image quality, researchers have often used the idea of visual masking (or distortion visibility) to design image-quality predictors, specifically for near-threshold distortions. However, it is still unknown how local distortion visibilities relate to local quality scores when the quality of natural images is assessed. Furthermore, the mechanism by which local quality scores are pooled into a global quality score is also crucial for better prediction of perceptual image quality. In this paper, the local and global qualities of six images at six distortion levels were measured using subjective experiments. A Gabor-noise target was used as the distortion in the quality-assessment experiments to remain consistent with our previous study [Alam, Vilankar, Field, and Chandler, Journal of Vision, 2014], in which the local root-mean-square (RMS) contrast thresholds for detecting the Gabor-noise target were measured at each spatial location of the undistorted images. Comparison of the results of this quality-assessment experiment with the previous detection experiment shows that masking predicted the local quality scores more than 95% correctly above a 15 dB threshold, within 5% of the subjective scores. Furthermore, it was found that an approximate squared summation of the local quality scores predicted the global quality scores well (Spearman rank-order correlation 0.97).
Keywords: Image quality, local image quality, visual masking, local detection thresholds, natural scenes.
1. INTRODUCTION
Perceptual quality assessment of digital images is important for maintaining quality of service to digital-media consumers. Even though the quality assessment of images and videos has become more challenging with the increasing variety of display formats,1 consumer demand for high-quality images and videos has recently increased due to better compression schemes such as H.264 and HEVC. Furthermore, several internet-based media providers, such as Netflix and Hulu Plus, and display-device manufacturers, such as Sony, Samsung, and LG, have made it possible for consumers to watch high-quality ultra-HD videos. Because the demand for and availability of high-quality media are increasing, better assessment of the perceptual quality of high-quality images and videos is becoming more important.
Although the quality assessment of high-quality images is important, most images in current image-quality databases are heavily distorted. The subjective quality ratings in these databases are generally expressed as mean opinion scores (MOS) or difference mean opinion scores (DMOS); the higher the MOS, or the lower the DMOS, the better the quality of the image. Although it is difficult to set a specific threshold for classifying an image as low- or high-quality, roughly 70% of the images in current image-quality databases are heavily distorted: in the LIVE image-quality database,2 67.2% of the images (660 out of 982) have DMOS above 25; in the CSIQ image-quality database,3 63.4% of the images (549 out of 866) have DMOS above 20; and in the TID database,4 86.8% of the images (1475 out of 1700) have MOS above 2.5. Even though such heavily distorted images are important for better understanding the perceptual strategies used to evaluate quality at various distortion levels, it is still an open research question whether human observers employ a single strategy or multiple strategies at different distortion levels.3,5,6 Furthermore, if human observers employ different strategies at different distortion levels, current image-processing algorithms can benefit from a study exploring such strategies.
Further author information: E-mail: {mdma, pranita, mhagan, damon.chandler}@okstate.edu
Invited Paper
Human Vision and Electronic Imaging XX, edited by Bernice E. Rogowitz, Thrasyvoulos N. Pappas, Huib de Ridder, Proc. of SPIE-IS&T Electronic Imaging, SPIE Vol. 9394, 93940M · © 2015 SPIE-IS&T
CCC code: 0277-786X/15/$18 · doi: 10.1117/12.2084935
Proc. of SPIE-IS&T Vol. 9394 93940M-1
Downloaded From: http://proceedings.spiedigitallibrary.org/ on 05/07/2016 Terms of Use: http://spiedigitallibrary.org/ss/TermsOfUse.aspx
The visual masking7 phenomenon has been used to estimate relative distortion visibility in images and videos, specifically for near-threshold distortions.3,8–15 For example, Damera-Venkata et al.11 used a traditional contrast-masking model to develop a noise quality measure for natural images. Similarly, Chandler and Hemami13 presented the visual signal-to-noise ratio (VSNR), which applied two different schemes for near-threshold and supra-threshold distortions. Specifically, VSNR used a wavelet-based model of visual masking and visual summation to measure visual fidelity for near-threshold distortion. Recently, Laparra et al.16 used an improved divisive-normalization-based masking model for better assessment of image quality. Similarly, Larson and Chandler3 proposed the most-apparent-distortion (MAD) image-quality assessment algorithm, which adopted a two-stage strategy: a detection-based strategy for high-quality images, and an appearance-based strategy for low-quality images. In the detection-based strategy of MAD, local luminance and contrast masking were used to model distortion visibility.
Although the masking phenomenon has often been used in image-quality assessment models, two issues have not yet been addressed properly. First, it is generally assumed that masking is most effective for assessing the quality of near-threshold distortions; however, this assumption has not been tested experimentally, and testing it is crucial for better assessment of image quality at different distortion levels. Second, it is still an open question how masking in the local regions of an image might affect the global quality of the image. This paper addresses these two issues by presenting the results of a series of controlled subjective experiments for assessing the local and global quality of natural images. The local and global quality scores, along with the local detection thresholds (masking thresholds) obtained from our previous study,17,18 were used to analyze the relationships between masking and image quality, and between local and global quality.
It should be noted that the experiments presented in this paper used a small but diverse set of images with only a single type of distortion. Although testing with a larger number of images and more distortion types would be interesting, we believe this study represents an important first step toward a better understanding of the relationships between masking and local quality, and between local and global quality.
2. EXPERIMENT METHODS
We performed two experiments to measure (a) the local quality and (b) the global quality of the images. This section provides details of the apparatus, stimuli, procedure, and subjects employed to measure the local and global qualities of the natural images. Note that in our previous study17,18 on masking, we measured local detection thresholds within natural scenes. In the experiments described in this paper, we kept the experimental environment and the stimuli consistent with that study17,18 so that the relationship between masking and image quality could be explored consistently.
2.1 Apparatus
Stimuli were displayed on a Samsung SyncMaster S24B240 LED monitor. The monitor was driven by a computer equipped with an Intel Core 2 Quad Q6600 processor, 3.0 GB of RAM, and an NVIDIA GeForce 8600 GT graphics card. The screen dimensions were 23.6 inches diagonally, 20.5 inches horizontally, and 11.5 inches vertically. The display resolution was 1920 × 1080 pixels at a frame rate of 60 Hz. The total angle subtended by the display was around 47 degrees, and the maximum possible radial frequency was 20.4 cycles/degree (c/deg). The minimum and maximum luminances of the display were set to 0.09 and 114 cd/m2, respectively. The relationship between digital pixel value and displayed luminance was linearized in software by using a lookup table built from luminance measurements made with a Konica Minolta Chroma Meter (CS-100A). The subjects viewed the stimuli binocularly through natural pupils in a darkened room at a distance of approximately 60 cm. The local-quality experiment was conducted using a single monitor, and the global-quality experiment was conducted by placing two similar monitors with the same settings side by side.
2.2 Stimuli
This section describes the stimulus-generation steps for (a) the global-quality experiment and (b) the local-quality experiment.
Figure 1. Stimuli of the experiments. The full-sized distorted images Dl (l = 0, ..., 5) were the stimuli of the global-quality assessment experiment; the total angle subtended by each of these stimuli was 12.5 degrees. The distorted patches Dp,l (p = 1, ..., 36 and l = 0, ..., 5) were the stimuli of the local-quality assessment experiment; the total angle subtended by each of these stimuli was 2.1 degrees without the context and 5.2 degrees with the context. The details of the stimulus-generation steps are given in Section 2.2.
2.2.1 Stimuli of Global-Quality Assessment Experiment
Six reference images (D0 or R ∈ {log_seaside, swarm, elk, native_american, monument, aerial_city}) were chosen as mask images from the CSIQ database,3 one from each of six different categories. The top row of Figure 4 shows the mask images along with their names and corresponding categories. It should be noted that each of these images was normalized to span the 8-bit digital range of 0–255, which produced sharp, high-contrast images that are useful for image-quality assessment; however, this also means that the contrasts were not necessarily identical to those of the original scenes. The dimension of each image was 510 × 510 pixels (12.5 degrees).
The target (distortion) was a vertically oriented Gabor-noise pattern with a center radial frequency of 3.7 c/deg and a one-octave bandwidth. The top image of Figure 1 shows the Gabor-noise pattern (GN), which was 510 × 510 pixels in dimension and ranged between −1 and +1 with zero mean. The details of the target-generation steps can be found in the methods section of our previous study on masking.18
Figure 1 shows how the stimuli were generated. Note that, including the reference image, there are six distorted images Dl (l = 0, ..., 5). The distorted images were calculated via:
Dl(x, y) = ⌊ml × GN(x, y) + D0(x, y)⌉,

Dl(x, y) =
    0,           if Dl(x, y) < 0,
    255,         if Dl(x, y) > 255,
    Dl(x, y),    otherwise,    (1)
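The stimulus generation of Equation (1) is a scale, add, round, and clip pipeline. A minimal NumPy sketch (the function and variable names are ours, not from the paper; the reference image and noise pattern here are random stand-ins):

```python
import numpy as np

def distort(D0, GN, m):
    """Eq. (1): scale the zero-mean Gabor-noise pattern GN (values in
    [-1, 1]) by the multiplier m, add it to the 8-bit reference image
    D0, round to the nearest integer, and clip to [0, 255]."""
    Dl = np.rint(m * GN + D0.astype(np.float64))  # round-to-nearest, the ⌊·⌉ of Eq. (1)
    return np.clip(Dl, 0, 255).astype(np.uint8)   # 8-bit clipping branch of Eq. (1)

# Illustration with stand-in data; m = 46.78 is the level-3 multiplier
# for the elk image (Table 1).
rng = np.random.default_rng(0)
D0 = rng.integers(0, 256, size=(510, 510), dtype=np.uint8)
GN = np.clip(rng.standard_normal((510, 510)) / 3.0, -1.0, 1.0)
D3 = distort(D0, GN, 46.78)
```

With m = 0 the pipeline returns the reference image unchanged, which is why D0 can serve as the l = 0 "distorted" image.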
Table 1. Target contrasts (Cl, l = 0, ..., 5) and multipliers (ml) at the six distortion levels (l = 0, ..., 5) of the six images (R ∈ {log_seaside, swarm, elk, native_american, monument, aerial_city}).

Target contrasts:    C0    C1      C2      C3      C4      C5
linear               0     0.052   0.103   0.207   0.413   0.826
decibel (dB)         −∞    −25.74  −19.72  −13.69  −7.68   −1.66

Multipliers:         m0    m1      m2      m3      m4      m5
log_seaside          0     14.24   28.79   59.19   127.08  293.61
swarm                0     11.12   22.48   45.32   90.82   178.34
elk                  0     11.63   23.39   46.78   93.57   187.18
native_american      0     14.92   30.53   63.13   133.62  319.15
monument             0     13.82   27.88   56.31   116.41  269.81
aerial_city          0     10.96   22.06   44.04   87.25   172.81
where GN is the Gabor-noise pattern, ml (l = 0, ..., 5) are the multipliers, ⌊ ⌉ denotes the rounding operation, D0 is the undistorted mask image, x = 1, ..., 510, and y = 1, ..., 510. The values of the multipliers (ml) were chosen such that the RMS contrasts19 of the target were fixed multiples of the maximum RMS contrast detection threshold found in our previous study.18 From that study, the maximum local detection threshold for patches with luminance > 3 cd/m2 was CT,max = 0.207 (−13.69 dB). The top three rows of Table 1 show the target RMS contrasts at the six distortion levels (Cl, l = 0, ..., 5). The relationship between CT,max and Cl (in linear RMS contrast units) is:
C0 = 0 × CT,max,   C1 = 0.25 × CT,max,   C2 = 0.5 × CT,max,
C3 = 1.0 × CT,max, C4 = 2.0 × CT,max,    C5 = 4.0 × CT,max.    (2)
The bottom seven rows of Table 1 show the values of the multipliers (ml) for the six distortion levels (l = 0, ..., 5) of the six images (R ∈ {log_seaside, swarm, elk, native_american, monument, aerial_city}) needed to achieve the corresponding target contrasts. Thus, we generated a total of 36 distorted images, each of 510 × 510 pixels (12.5 degrees), and used these 36 distorted images for the global-quality assessment experiment.
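The mapping of Equation (2) from threshold multiples to target contrasts can be checked numerically. A small sketch (variable names are ours), assuming the dB row of Table 1 uses the usual 20·log10(C) convention for RMS contrast, which the tabulated values are consistent with:

```python
import math

CT_max = 0.207                          # maximum local detection threshold (-13.69 dB)
multiples = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0]

C = [k * CT_max for k in multiples]     # linear RMS contrasts C0..C5, Eq. (2)

def rms_db(c):
    """RMS contrast in decibels, 20*log10(c); C0 = 0 maps to -inf."""
    return 20.0 * math.log10(c) if c > 0 else float("-inf")

C_db = [rms_db(c) for c in C]
# The exact multiples (0.05175, 0.1035, 0.207, 0.414, 0.828) differ from
# the tabulated linear values (0.052, 0.103, 0.207, 0.413, 0.826) only in
# the last digit, so the dB values agree with Table 1 to within ~0.1 dB.
```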
2.2.2 Stimuli of Local-Quality Assessment Experiment
Each of the reference images (R or D0) and the Gabor-noise target (GN) was of size 510 × 510 pixels. For measuring the local quality scores, each reference image was divided into 36 patches (Dp,0, p = 1, ..., 36) of size 85 × 85 pixels (around 2.1 degrees). The Gabor-noise target (GN) was also divided into 36 corresponding patches (GNp, p = 1, ..., 36) of size 85 × 85 pixels (2.1 degrees). The stimuli (Dp,l) were generated by multiplying the Gabor-noise patch (GNp) by the scalar ml corresponding to the image R and the distortion level l, and adding the product to the undistorted image patch Dp,0 via:
Dp,l(x, y) = ml × GNp(x, y) + Dp,0(x, y),    (3)
where p = 1, ..., 36, x = 1, ..., 85, and y = 1, ..., 85.
To better simulate the spatially-localized condition, the patches (Dp,l) were additionally padded with 64 pixels (around 1.6 degrees) of context from the reference image. The angle subtended by the stimuli was 5.2 degrees with the context, and 2.1 degrees without the context. Let us denote the stimuli with context as Dp,l. To reduce edge effects, before being added to the undistorted image patch Dp,0, the Gabor-noise patch (GNp) was multiplied by a circular window (2.1 deg, or 85 × 85 pixels) given by
w(r) = 1 − 1 / (1 + exp(γ − r/β)),    (4)

where r = √(u² + v²), β = 3, γ = 10, u = −42.5, −41.5, ..., 41.5, and v = −42.5, −41.5, ..., 41.5. Similarly, the context-padded stimuli (Dp,l) were gradually alpha-blended with the background luminance (14 cd/m2) via

Dp,l = w × Dp,l + (1 − w) × Γ,    (5)
Figure 2. Illustration of the procedures of (a) the global-quality assessment experiment and (b) the local-quality assessment experiment.
where Dp,l is the windowed padded stimulus, and Γ is the digital value 105, yielding a background luminance of 14 cd/m2. The circular window w (5.2 deg, or 213 × 213 pixels) was generated via Equation 4 with the parameters β = 3, γ = 30, u = −106.5, −105.5, ..., 105.5, and v = −106.5, −105.5, ..., 105.5. The windowed padded stimuli (Dp,l) were viewed by the subjects during the local-quality assessment experiment. Example stimuli are shown in the bottom row of Figure 1. The total number of stimuli for the local-quality assessment experiment was 1296 (6 images × 6 distortion levels per image × 36 patches per distortion level).
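The windowing and blending of Equations (4) and (5) can be sketched as follows; this is a minimal NumPy illustration in which the function name and the uniform stand-in patch are ours:

```python
import numpy as np

def logistic_window(size, beta, gamma):
    """Radial logistic window of Eq. (4): w(r) = 1 - 1/(1 + exp(gamma - r/beta)).
    Pixel coordinates match the u, v grids in the text
    (e.g. -42.5, -41.5, ..., 41.5 for size 85)."""
    coords = np.arange(size) - size / 2.0
    u, v = np.meshgrid(coords, coords)
    r = np.sqrt(u ** 2 + v ** 2)
    return 1.0 - 1.0 / (1.0 + np.exp(gamma - r / beta))

# Eq. (4): window that tapers the 85-px Gabor-noise patch (beta = 3, gamma = 10)
w85 = logistic_window(85, beta=3.0, gamma=10.0)

# Eq. (5): alpha-blend the 213-px context-padded stimulus into the uniform
# background (digital value 105, i.e. 14 cd/m^2); the patch is a stand-in.
w213 = logistic_window(213, beta=3.0, gamma=30.0)
patch = np.full((213, 213), 128.0)
blended = w213 * patch + (1.0 - w213) * 105.0
```

The window is approximately 1 at the center and falls smoothly to 0 toward the edges, so the blended stimulus fades into the background rather than ending at a hard border.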
2.3 Procedure
The subjective scores of the images and image patches were measured based on the linear displacement of the images and patches across calibrated monitors placed side by side at equal viewing distance from the observer.3 In this section, the procedure of the global-quality assessment experiment is described first, followed by the procedure of the local-quality assessment experiment.
2.3.1 Procedure of Global-Quality Assessment Experiment
The distorted images (Dl) were displayed on two monitors placed side by side, such that the six distortion levels (l = 0, ..., 5) of the same image appeared in the same row, and different images appeared in different rows. Figure 2(a) shows the initial arrangement of the images on the displays.
The scores of the images were indicated only by the horizontal positions of the images. Each of the 36 images could be dragged and dropped to another position on the display using the mouse. Subjects were instructed to place poorer-quality images to the right and better-quality images to the left within the display. Subjects were also instructed to carefully compare the relative horizontal distance of one image with the different distortion levels of the same image, as well as with the distortion levels of other images.
During horizontal movement of an image, the position of only that image changed horizontally. However, for better comparison between two different images, when an image was moved vertically, all six distortion levels of that image moved vertically together, keeping their horizontal positions the same. For better viewing, if an image was selected with the mouse, all six distortion levels of that image appeared in front of the other images. Subjects viewed the images in a darkened room and were given no time limit for the quality-assessment task. The horizontal positions of the images were saved at the end of the experiment.
2.3.2 Procedure of Local-Quality Assessment Experiment
The number of patches to score in the local-quality assessment experiment (1296 patches) was significantly higher than in the global-quality assessment experiment. Thus, we adopted a three-step procedure to measure the local quality scores of the patches: within-patch local quality assessment, within-image local quality assessment, and inter-image local quality assessment. The three-step procedure is illustrated in Figure 2(b). The steps are discussed in the following:
First step: Within-patch local quality. The pth patch of an image R had six distortion levels (l = 0, ..., 5). We denote the local quality scores of the six distortion levels of a fixed patch p of a fixed image R as the within-patch local quality. In this step, the six distortion-level patches of the same patch p of the same image R were displayed on a single monitor. The initial horizontal positions of the six patches were random (the patches were not placed according to their distortion levels), and the vertical positions of the patches were the same. Subjects were instructed to place the better-quality patches to the left and the poorer-quality patches to the right using the mouse. Subjects could score new sets of patches, or change the scores of previous sets, using two separate push buttons. At the end of this step, the horizontal positions of the patches were saved as the within-patch local quality scores.
Second step: Within-image local quality. In this step, the six distortion levels of different patches (different p) of the same image (same R) were viewed together. The initial horizontal positions of the patches were set from the results of the within-patch step. Subjects were instructed to carefully move the patches such that their relative horizontal positions reflected the quality variations due to the distortion levels as well as the patch contents. At the end of this step, the horizontal positions of the patches were saved as the within-image local quality scores.
Third step: Inter-image local quality. The first two steps captured the quality variations due to local content and distortion level. However, local quality may also vary due to the global content variations across different images (different R). To account for these variations, in this step four sets of distortion-level patches were chosen from each image R; the four sets contained the poorest-quality patches as measured in the previous two steps. Subjects viewed the sets of patches from the different images (different R) together. Subjects were instructed to move the position of the poorest-quality patch of each set by comparing it with the other sets; they were not allowed to change the positions of any other patches in a set. At the end of this step, the horizontal positions of the patches were saved. The average movement of the four sets per image R was used to calculate six multiplication factors, by which the scores from the second step were multiplied to incorporate the inter-image local quality variations.
2.4 Subjects
Four adults (MA, YZ, TP, and JH), including the first author (MA), participated in the experiment. All subjects had normal or corrected-to-normal visual acuity, and all were experienced with subjective image-quality assessment experiments. Each distorted image and distorted patch was scored at least once by each subject.
3. EXPERIMENT RESULTS AND ANALYSIS
In this section, the subject consistency of the experiments is shown first. Then, qualitative observations on the local quality scores and detection thresholds are given, followed by a quantitative analysis of their relationship. At the end of the section, the relationship between the local quality scores and the global quality scores is discussed.
3.1 Subject Consistency
Subjects were consistent in judging both the local and global image-quality scores. Table 2 shows the Pearson correlation coefficient (CC) and the Spearman rank-order correlation coefficient (SROCC) between each pair of subjects for both the local-quality and global-quality experiments. Note that before the CC was calculated, the scores were passed through a logistic transform3,20 to remove any nonlinearity between the scores. The average CC and SROCC values for the local-quality experiment were 0.931 and 0.907, respectively, and the average CC and SROCC
Table 2. Subject consistency in terms of Pearson correlation coefficient (CC) and Spearman rank-order correlation coefficient (SROCC).

                        MA/YZ   MA/TP   MA/JH   YZ/TP   YZ/JH   TP/JH   Average
Local quality   CC      0.959   0.930   0.925   0.941   0.927   0.903   0.931
                SROCC   0.951   0.938   0.884   0.935   0.876   0.862   0.907
Global quality  CC      0.995   0.990   0.993   0.995   0.993   0.985   0.992
                SROCC   0.986   0.981   0.988   0.995   0.987   0.977   0.986
Figure 3. Local quality maps (LQMs) of the six images (R ∈ {log_seaside, swarm, elk, native_american, monument, aerial_city}) at the six distortion levels (l = 0, ..., 5). Note that the distortion levels are defined by the target contrasts over the full-sized images, as described in Table 1. The local quality (LQ) scores were calculated by averaging the scores from the four subjects. The scores corresponding to the gray-scale values in the LQMs are indicated by the colorbar at the right.
Figure 4. Masking maps drawn from the results of our previous study.18 Each patch in a masking map denotes the RMS contrast detection threshold for detecting a Gabor-noise target (as shown in Figure 1) placed over the corresponding patch of the mask image. The thresholds corresponding to the gray-scale values in the masking maps are indicated by the colorbar at the right. Note that in this paper, masking maps are also referred to as distortion visibility maps (DVMs).
values for the global-quality experiment were 0.992 and 0.986, respectively, indicating that the subjects were highly consistent with one another.
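The consistency numbers in Table 2 are straightforward to compute. A minimal NumPy sketch follows; note that the paper additionally fits a logistic transform before the Pearson CC, which this sketch omits, and the simple argsort-based ranking here does not average tied ranks:

```python
import numpy as np

def pearson_cc(a, b):
    """Pearson (linear) correlation coefficient between two score sets."""
    return float(np.corrcoef(a, b)[0, 1])

def srocc(a, b):
    """Spearman rank-order correlation: the Pearson CC of the ranks.
    (No tie handling in this sketch.)"""
    rank = lambda x: np.argsort(np.argsort(np.asarray(x))).astype(float)
    return pearson_cc(rank(a), rank(b))

# Stand-in scores for two subjects over the 36 patches of one condition
rng = np.random.default_rng(1)
s1 = rng.uniform(0.0, 1.0, 36)
s2 = np.clip(s1 + rng.normal(0.0, 0.05, 36), 0.0, 1.0)
cc, rho = pearson_cc(s1, s2), srocc(s1, s2)
```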
3.2 Local Quality and Masking: Qualitative Observations
The local quality scores are presented in the form of local quality maps (LQMs). Figure 3 shows the average LQMs of the six reference images at the six distortion levels. The local quality (LQ) scores were calculated by first averaging
Table 3. The number of patches in which the target contrast (Cp) is above the target detection threshold (CT,p); l denotes the distortion level (here l = 1, ..., 5). The image names are shown in the top row.

               log_seaside  swarm  elk  native_american  monument  aerial_city
l = 1          32           33     15   36               36        36
l = 2          35           36     31   36               36        36
l = 3, ..., 5  36           36     36   36               36        36
the scores from the four subjects and then normalizing to the range 0 to 1. Note that in Figure 3, the local quality scores corresponding to the gray-scale values of the LQMs are indicated by the colorbar at the right.
The standard deviations of the local quality scores across the four subjects are shown in Figure 5 in the form of standard-deviation (LQM-std) maps. From Figure 5, note that for the lowest and highest distortion levels (l = 0, 1, and l = 5) the standard deviations were lower than at the other distortion levels (l = 2, ..., 4). Overall, the average, minimum, and maximum standard deviations of the local quality scores were 0.076, 0, and 0.338, respectively.
To observe the qualitative relation between local quality and masking, Figure 4 shows the local detection thresholds (CT,p) of the six images in the form of masking maps, or distortion visibility maps (DVMs). The detection thresholds were measured in our previous study18 using the same Gabor-noise target used here. Note that the detection thresholds corresponding to the gray-scale values in the masking maps are indicated by the colorbar at the right.
Score saturation at very low distortions. At distortion level l = 0 (C0 = −∞ dB), almost all patches of all six images show the best scores (LQ very close to 1.0) in the LQMs of Figure 3. Note that many patches at distortion levels l = 1 (C1 = −25.7 dB) and l = 2 (C2 = −19.7 dB) also show scores very close to 1.0. For example, the scores of most patches in the elk image saturated at values close to 1.0 at distortion levels l = 0, ..., 2. Such saturation is also visible in the center region of the LQMs of the swarm image and in the building regions of the LQMs of the aerial_city image.
Score saturation is less visible at the higher distortion levels than at the lower ones. The reason for score saturation at the lower distortion levels could be that the local target contrasts at those levels were below the detection thresholds. Table 3 shows the number of patches in which the target contrast (Cp) is above the target detection threshold (CT,p). Note that, except for the elk, log_seaside, and swarm images at distortion levels l = 1 and l = 2, all patches contained supra-threshold distortions. However, the LQMs in Figure 3 suggest that many patches showed score saturation even in the supra-threshold regime (see the bright patches at distortion levels l = 1 and l = 2 of monument, aerial_city, and native_american).
Map similarities at medium target contrasts. A visual inspection of the LQMs in Figure 3 and the masking maps in Figure 4 reveals that the patterns of the LQMs of an image follow the pattern of the masking map of the same image. For example, the masking map of the swarm image shows lower distortion visibility at the center of the map; similarly, the LQMs of swarm (distortion levels l = 1, ..., 5) show higher-quality regions at the center. However, for most of the images the pattern similarity is visible at the medium to higher distortion levels (l = 3, 4, 5; Cl ≥ −13.7 dB). For example, for the elk image, the pattern similarity between the masking map and the LQMs is most visible at distortion levels l = 3, 4, and 5. The other images show similar pattern similarities.
Score uncertainties at very low distortions. During the local-quality experiment, subjects scored the reference patches along with the distorted patches. Notice that for several reference patches, the scores were less than the maximum score of 1.0. For example, a careful visual examination of the LQMs of monument, elk, log_seaside, and swarm at distortion level l = 0 shows that a few undistorted patches are slightly gray (scores less than 1.0) compared with the other bright patches of the corresponding LQMs.
During the experiment, subjects were aware that undistorted patches were present among the stimuli. However, because the horizontal positions of the patches at the different distortion levels were randomized, subjects had to identify the undistorted patch by visual examination. It should be noted that the initial randomization was crucial for removing the possibility of a locational bias in the
Figure 5. The standard deviations of the local quality maps (LQMs) of the six images at the six distortion levels. The standard deviations were calculated from the scores of the four subjects. The standard deviations corresponding to the gray-scale values in the LQM-std maps are indicated by the colorbar at the right.
scores. However, the initial randomization also added uncertainty to the identification of the undistorted patches during the experiment, which resulted in scores slightly less than 1.0 for some undistorted patches.
3.3 Local Quality and Masking: Quantitative Analysis
In this section, the relationship between the local qualities and the detection thresholds is discussed using quantitative measures.
3.3.1 Local Quality Prediction using Threshold Elevation
Figure 6 shows plots of the log local quality versus the target threshold elevation at the five distortion levels (l = 1, ..., 5). The log of the local quality scores is shown for better visualization. The left side of the green dotted line denotes the below-threshold region, and the right side denotes the supra-threshold region. The data were fitted using a sigmoid function,
log(LQ) = (τ1 − τ2) / (1 + exp(−(ΔC − τ3)/τ4)) + τ2,    (6)
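A fit of this form can be reproduced with a standard nonlinear least-squares routine. The sketch below runs on synthetic data; the parameter values here are illustrative, not the measured fits reported in Figure 6:

```python
import numpy as np
from scipy.optimize import curve_fit

def log_lq(dC, t1, t2, t3, t4):
    """Eq. (6): log(LQ) as a sigmoid in threshold elevation dC (in dB).
    Approaches t2 (near 0, i.e. perfect quality) for dC << t3,
    and t1 (lower quality) for dC >> t3, when t4 > 0."""
    return (t1 - t2) / (1.0 + np.exp(-(dC - t3) / t4)) + t2

# Synthetic data generated from known parameters plus a little noise
rng = np.random.default_rng(2)
dC = np.linspace(-10.0, 40.0, 36)
true_params = (-0.6, -0.02, 20.0, 2.0)
y = log_lq(dC, *true_params) + rng.normal(0.0, 0.01, dC.size)

# Recover the parameters by least squares (an initial guess is needed)
popt, _ = curve_fit(log_lq, dC, y, p0=(-1.0, 0.0, 15.0, 3.0), maxfev=10000)
```

The recovered parameters should be close to the generating ones; with real data, the quality of the fit depends on how well the sampled threshold elevations span the fall-off region around τ3.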
and the fitted curves are shown as red solid lines in Figure 6. The fit parameters τ1, τ2, τ3, and τ4 are also shown in Figure 6. From Figure 6, note that for the lower distortion levels (Cl ≤ −13.7 dB), many patches were scored as perfect quality (LQ ≈ 1) even in the supra-threshold region. Specifically, at target contrasts Cl=1 = −25.7 dB and Cl=2 = −19.7 dB, many patches were scored as perfect quality up to 10 dB of threshold elevation. At target contrast Cl=3 = −13.7 dB, patches were scored as perfect quality up to 5 dB of threshold elevation. Beyond target contrast −13.7 dB (Cl=4,5), very few patches were scored as perfect quality.
Furthermore, note from Figure 6 that as the target contrast increases, the fall-off from highest quality to lower quality becomes wider. For example, for Cl=1 and Cl=2 the fall-off occurs at threshold elevations of around 17 dB and 20 dB, respectively. However, for Cl>2 the fall-off becomes wider, and the relation between log local quality and threshold elevation becomes more linear.
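The sigmoid of Eq. (6) can be fitted with a standard least-squares routine. The sketch below is illustrative only: the data are synthetic, generated from the fit parameters reported for Cl=2 and lightly perturbed; the fitting procedure and starting values are our assumptions, not the authors' code.

```python
import numpy as np
from scipy.optimize import curve_fit

def log_lq(delta_c, t1, t2, t3, t4):
    """Eq. (6): log local quality as a sigmoid of threshold elevation (dB)."""
    return (t1 - t2) / (1.0 + np.exp(-(delta_c - t3) / t4)) + t2

# Hypothetical stand-in data for one distortion level, generated from the
# published fit for C_{l=2} = -19.7 dB and perturbed with small noise.
rng = np.random.default_rng(0)
delta_c = np.linspace(-10.0, 40.0, 80)               # threshold elevation, dB
clean = log_lq(delta_c, -0.58, -0.052, 20.67, 0.76)
noisy = clean + rng.normal(0.0, 0.01, delta_c.size)

# Least-squares fit of (tau1, tau2, tau3, tau4), seeded near plausible values.
params, _ = curve_fit(log_lq, delta_c, noisy, p0=[-0.5, 0.0, 20.0, 1.0])
```

With real data, `delta_c` and the measured log local-quality scores per patch would replace the synthetic arrays.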
3.3.2 Local Quality Prediction Performance using Threshold Elevation
The first research goal of this paper is to measure how far beyond the near-threshold regime masking remains valid for image-quality prediction. We quantified the performance of local-quality prediction based only on threshold elevation
Proc. of SPIE-IS&T Vol. 9394 93940M-9
Downloaded From: http://proceedings.spiedigitallibrary.org/ on 05/07/2016 Terms of Use: http://spiedigitallibrary.org/ss/TermsOfUse.aspx
[Figure 6: five panels of log LQ versus threshold elevation ΔC (dB), at target contrasts (a) Cl=1 = −25.7 dB, (b) Cl=2 = −19.7 dB, (c) Cl=3 = −13.7 dB, (d) Cl=4 = −7.7 dB, and (e) Cl=5 = −1.7 dB. Fitted sigmoid parameters (τ1, τ2, τ3, τ4): (−0.25, −0.018, 17.48, 0.19); (−0.58, −0.052, 20.67, 0.76); (−1.14, −0.032, 25.42, 4.61); (−0.11, −2.02, 30.55, −7.81); (−4.61, 0.23, 36.84, 13.07).]
Figure 6. Relation between local image quality and target threshold elevation at five distortion levels (l = 1, ..., 5). The left side of the green dotted line denotes the below-threshold region, and the right side of the green dotted line denotes the supra-threshold region. Data were fitted using a sigmoid function, and the fitted curves are shown by red solid lines. The fit parameters τ1, τ2, τ3, and τ4 are also shown.
via a “percent-correct-prediction” measure. First, both the experimental local-quality scores and the predicted
Figure 7. The predicted and experimental local quality maps (LQMs) for six images: log_seaside, swarm, elk, native_american, monument, and aerial_city, at the five target contrasts Cl=1 = −25.7 dB through Cl=5 = −1.7 dB. For each image, the top row shows the distorted images and the second row shows the local quality maps from the experiment. The third row shows the predicted local quality maps created via sigmoid fitting on the threshold elevations measured from our previous masking experiment.18 The fourth row shows the predicted local quality maps created via sigmoid fitting on the threshold elevations from Watson and Solomon’s masking model,21 which was fitted to predict our previous masking experiment data.18 The percent-correct prediction values are listed below each predicted map. The experiment data along with model codes are available at http://vision.okstate.edu/masking/.
Figure 8. Percent correct prediction of the local quality scores using only the threshold elevation, plotted as a function of threshold elevation ΔC. The green line indicates the prediction using the experimental detection thresholds, and the red line indicates the prediction using Watson and Solomon’s masking model,21 which was fitted on our masking database.
local-quality scores (obtained via the sigmoid fitting) for the below-threshold region were set to perfect quality via

LQ = { 1,  ΔC ≤ 0;  LQ,  ΔC > 0 },    (7)
Figure 9. The global quality prediction for varying β: (a) scatter plot between GQ and the predicted ĜQ, shown for β = 0.1, 1.8, and 35, and (b) root-mean-square error between GQ and ĜQ for varying β (minimum RMSE of 0.072 at β = 1.8).
and similarly for the predicted local-quality scores,

L̂Q = { 1,  ΔC ≤ 0;  L̂Q,  ΔC > 0 }.    (8)
In the “percent-correct-prediction” measure, a correct prediction occurs when the predicted local-quality score is within ±5% of the experimental score, and a wrong prediction occurs when it falls outside that range.
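Equations (7)–(8) and the ±5% criterion can be combined into a single scoring routine. The sketch below assumes the ±5% tolerance is an absolute tolerance on the [0, 1] quality scale (the text does not spell this out), and the patch data are hypothetical:

```python
import numpy as np

def percent_correct(lq_exp, lq_pred, delta_c, tol=0.05):
    """Percent-correct-prediction measure: per eqs. (7)-(8), both the
    experimental and predicted local-quality scores are clamped to 1.0
    (perfect quality) for below-threshold patches; a prediction then counts
    as correct if it lies within +/- tol of the experimental score."""
    lq_exp = np.where(delta_c <= 0, 1.0, lq_exp)    # eq. (7)
    lq_pred = np.where(delta_c <= 0, 1.0, lq_pred)  # eq. (8)
    correct = np.abs(lq_pred - lq_exp) <= tol
    return 100.0 * correct.mean()

# Hypothetical scores for five patches (threshold elevations in dB).
delta_c = np.array([-3.0, -1.0, 2.0, 10.0, 25.0])
lq_exp  = np.array([0.97, 0.99, 0.95, 0.80, 0.40])
lq_pred = np.array([0.90, 0.92, 0.93, 0.70, 0.42])
print(percent_correct(lq_exp, lq_pred, delta_c))  # -> 80.0
```

The two below-threshold patches are clamped to 1.0 on both sides and therefore always count as correct, which matches the clamping step in the text.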
We predicted the local quality via sigmoid fitting using the threshold elevations taken directly from our previous masking study.18 Furthermore, we predicted the local quality scores via sigmoid fitting using the threshold elevations obtained from Watson and Solomon’s masking model, which was trained on our masking database.18
Figure 7 shows the predicted and experimental local quality maps (LQMs) for six images: log_seaside, swarm, elk, native_american, monument, and aerial_city. For each image, the top row shows the distorted images and the second row shows the local quality maps from the experiment. The third row shows the predicted local quality maps from the experimental detection thresholds. The fourth row shows the predicted local quality maps from the fitted Watson and Solomon masking model.21 The experiment data along with model codes are available at http://vision.okstate.edu/masking/.
From Figure 7, note that the percent-correct predictions using both the experimental and model threshold elevations are quite high at the lower target-contrast levels and decrease at the higher target-contrast levels. However, note that for most of the images, even at the fourth distortion level (Cl=4 = −7.7 dB), the masking thresholds alone could predict the local quality scores more than 80% correctly.
Figure 8 summarizes the prediction performance of the local quality scores using only the threshold elevation. The green line indicates the prediction using the experimental detection thresholds, and the red line indicates the prediction using Watson and Solomon’s masking model,21 which was fitted on our masking database. From Figure 8 we can summarize that masking predicted the local quality scores more than 95% correctly, within ±5% of the subject scores, for threshold elevations above 15 dB.
3.4 Relation between Local Quality and Global Quality
From the global-quality assessment experiment, we measured global quality scores of the 36 full-size images. The global quality scores were measured specifically to explore the summing mechanism by which local quality scores generate global quality scores. For each of the 36 full-size images we measured 36 local quality scores from
the local quality assessment experiment. Although the local quality scores can be summed using various schemes,22,23 we used a simple summation of powered local-quality scores:
ĜQ = ∑_{p=1}^{36} |LQp|^β,    (9)
where β is the power; β was optimized to achieve the minimum prediction error. Figure 9(a) shows the scatter plots of the experimental global quality (GQ) and the predicted global quality (ĜQ) for three different values of β. Figure 9(b) shows the root-mean-square error (RMSE) between the experimental and predicted global quality scores for varying β. We found that the best prediction performance occurs at β = 1.8 (Spearman rank-order correlation of 0.97 between GQ and ĜQ at β = 1.8). Thus, an approximately squared summation of local-quality scores predicted the global quality scores suitably. However, we currently have only 36 global-quality scores and 1296 local quality scores, using only one distortion type. More data at each distortion level would help explore the summation rule at varying distortion levels.
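A minimal sketch of the β-pooling of Eq. (9) follows. Eq. (9) writes a plain sum over the 36 patches; since the plotted ĜQ stays on the same [0, 1] scale as the subjective scores, we assume a mean (normalized sum) here. The grid search over β and the synthetic data are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def predict_gq(lq, beta):
    """Eq. (9)-style pooling: powered local-quality scores, here normalized
    by the patch count (an assumption) to keep the result in [0, 1]."""
    lq = np.asarray(lq, dtype=float)
    return np.mean(np.abs(lq) ** beta)

def best_beta(lqms, gq_subjective, betas):
    """Grid-search beta to minimize RMSE against subjective global scores."""
    rmse = [np.sqrt(np.mean([(predict_gq(lq, b) - g) ** 2
                             for lq, g in zip(lqms, gq_subjective)]))
            for b in betas]
    i = int(np.argmin(rmse))
    return betas[i], rmse[i]

# Hypothetical data: 4 images x 36 local scores, with "subjective" global
# scores generated at beta = 1.8 so the search should recover that value.
rng = np.random.default_rng(1)
lqms = [rng.uniform(0.2, 1.0, 36) for _ in range(4)]
gq = [predict_gq(lq, 1.8) for lq in lqms]
beta, err = best_beta(lqms, gq, np.linspace(0.5, 4.0, 71))
print(beta)  # recovers ~1.8 on this synthetic data
```

With the paper's data, `lqms` would hold the 36 local quality scores per full-size image and `gq` the 36 measured global scores.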
4. CONCLUSIONS AND FUTURE WORK
The local and global qualities of six images with six distortion levels were measured using subjective experiments. Gabor noise was used as the distortion in the quality-assessment experiments to be consistent with our previous study, in which the local RMS-contrast detection thresholds for detecting the Gabor-noise target were measured at each spatial location of the undistorted images. Results of the experiment showed that masking predicted the local quality scores more than 95% correctly, within ±5% of the subject scores, for threshold elevations above 15 dB. Furthermore, we found that an approximately squared summation of local-quality scores predicted the global quality scores suitably (Spearman rank-order correlation 0.97). Our future work includes designing a perceptual model to predict the local image-quality scores via neural networks.
ACKNOWLEDGMENTS
This material is based upon work supported by, or in part by, the National Science Foundation, Grant Number1054612, and the U.S. Army Research Laboratory (USARL) and the U.S. Army Research Office (USARO) undercontract/grant number W911NF-10-1-0015.
REFERENCES
[1] Bovik, A. C., “Automatic prediction of perceptual image and video quality,” (2013).
[2] Sheikh, H. R., Sabir, M. F., and Bovik, A. C., “A statistical evaluation of recent full reference image quality assessment algorithms,” Image Processing, IEEE Transactions on 15(11), 3440–3451 (2006).
[3] Larson, E. C. and Chandler, D. M., “Most apparent distortion: full-reference image quality assessment and the role of strategy,” Journal of Electronic Imaging 19(1), 011006 (2010).
[4] Ponomarenko, N., Lukin, V., Zelensky, A., Egiazarian, K., Carli, M., and Battisti, F., “TID2008 - a database for evaluation of full-reference visual quality assessment metrics,” Advances of Modern Radioelectronics 10(4), 30–45 (2009).
[5] Chandler, D. M., Alam, M. M., and Phan, T. D., “Seven challenges for image quality research,” in [IS&T/SPIE Electronic Imaging], 901402–901402, International Society for Optics and Photonics (2014).
[6] Chandler, D. M., “Seven challenges in image quality assessment: past, present, and future research,” ISRN Signal Processing 2013 (2013).
[7] Legge, G. E. and Foley, J. M., “Contrast masking in human vision,” J. of Opt. Soc. Am. 70, 1458–1470 (1980).
[8] Daly, S. J., “Visible differences predictor: an algorithm for the assessment of image fidelity,” in [Digital Images and Human Vision], Watson, A. B., ed., 179–206 (1993).
[9] Heeger, D. J. and Teo, P. C., “A model of perceptual image fidelity,” in [Proceedings of the International Conference on Image Processing, 1995], 2, 343–345, IEEE (1995).
[10] Watson, A. B., Borthwick, R., and Taylor, M., “Image quality and entropy masking,” Proceedings of SPIE 3016 (1997).
[11] Damera-Venkata, N., Kite, T. D., Geisler, W. S., Evans, B. L., and Bovik, A. C., “Image quality assessment based on a degradation model,” Image Processing, IEEE Transactions on 9(4), 636–650 (2000).
[12] Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli, E. P., “Image quality assessment: from error visibility to structural similarity,” Image Processing, IEEE Transactions on 13(4), 600–612 (2004).
[13] Chandler, D. M. and Hemami, S. S., “VSNR: a wavelet-based visual signal-to-noise ratio for natural images,” IEEE Transactions on Image Processing 16(9), 2284–2298 (2007).
[14] Ninassi, A., Meur, O. L., Callet, P. L., Barba, D., et al., “On the performance of human visual system based image quality assessment metric using wavelet domain,” in [Proceedings of the SPIE Conference Human Vision and Electronic Imaging XIII], 6806 (2008).
[15] Aydin, T. O., Cadık, M., Myszkowski, K., and Seidel, H.-P., “Video quality assessment for computer graphics applications,” in [ACM Transactions on Graphics (TOG)], 29(6), 161, ACM (2010).
[16] Laparra, V., Munoz-Marı, J., and Malo, J., “Divisive normalization image quality metric revisited,” JOSA A 27(4), 852–864 (2010).
[17] Alam, M. M., Vilankar, K. P., and Chandler, D. M., “A database of local masking thresholds in natural images,” in [IS&T/SPIE Electronic Imaging], 86510G–86510G, International Society for Optics and Photonics (2013).
[18] Alam, M. M., Vilankar, K. P., Field, D. J., and Chandler, D. M., “Local masking in natural images: a database and analysis,” Journal of Vision 14(8), 22 (2014). Database available at: http://vision.okstate.edu/masking/.
[19] Moulden, B., Kingdom, F. A. A., and Gatley, L. F., “The standard deviation of luminance as a metric for contrast in random-dot images,” Perception 19, 79–101 (1990).
[20] VQEG, “Final report from the video quality experts group on the validation of objective models of video quality assessment, phase ii,” (August 2003). http://www.vqeg.org.
[21] Watson, A. B. and Solomon, J. A., “A model of visual contrast gain control and pattern masking,” J. of Opt. Soc. Am. A 14(9), 2379–2391 (1997).
[22] Wang, Z. and Shang, X., “Spatial pooling strategies for perceptual image quality assessment,” in [Image Processing, 2006 IEEE International Conference on], 2945–2948, IEEE (2006).
[23] Chandler, D. M. and Hemami, S. S., “Effects of natural images on the detectability of simple and compound wavelet subband quantization distortions,” J. Opt. Soc. Am. A 20 (July 2003).