Perceptual Image Quality Assessment Using Deep Networks
TRANSCRIPT
Objective image quality metric design consists of feature crafting and pooling strategy selection. In the literature, CNNs are used for both feature design and pooling. However, they overlook local information and spatial correlation by using large pooling windows that cover most of the image patch. Initially, we designed CNNs that are more sensitive to local information by using smaller pooling windows. We also increased the level of abstraction by adding more convolutional and learning layers, but these changes did not lead to successful subjective score prediction.
We then decided to use stacked autoencoders (SAEs) to leverage spatially correlated local information instead of the direct pooling used in CNNs. SAEs are trained to pool quality attributes by learning the spatial pooling patterns. As input to the network, we downsample the images. Even after downsampling, redundancies remain, and we need a more descriptive representation of the image; the SAE provides low-dimensional, descriptive features that can be used to estimate the subjective quality of the images with a softmax classifier.
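A minimal sketch of the pipeline the transcript describes, downsampling followed by a single tied-weight autoencoder layer, using NumPy only. All sizes, the learning rate, and the toy data below are illustrative assumptions, not the poster's actual settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def downsample(img, factor=2):
    """Downsample by block averaging (one simple choice of filter)."""
    h, w = img.shape
    return img[:h - h % factor, :w - w % factor] \
        .reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy "patches": 64 random 16x16 patches, downsampled to 8x8 and vectorized.
patches = rng.random((64, 16, 16))
X = np.stack([downsample(p).ravel() for p in patches])   # shape (64, 64)

# One autoencoder layer with tied weights: encode 64 inputs to 16 hidden units.
n_in, n_hid = X.shape[1], 16
W = rng.normal(scale=0.1, size=(n_in, n_hid))
b_h, b_o = np.zeros(n_hid), np.zeros(n_in)

lr = 0.5
for _ in range(200):                       # plain gradient descent
    H = sigmoid(X @ W + b_h)               # encoder
    Xr = sigmoid(H @ W.T + b_o)            # tied-weight decoder
    err = Xr - X                           # reconstruction error
    dXr = err * Xr * (1 - Xr)              # gradient at decoder pre-activation
    dH = (dXr @ W) * H * (1 - H)           # gradient at encoder pre-activation
    W -= lr * (X.T @ dH + dXr.T @ H) / len(X)
    b_h -= lr * dH.mean(axis=0)
    b_o -= lr * dXr.mean(axis=0)

features = sigmoid(X @ W + b_h)            # low-dimensional representation
print(features.shape)                      # (64, 16)
```

In the full method these features would feed the softmax classifier; here the sketch stops at the learned representation.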
Perceptual Image Quality Assessment Using Deep Networks
Dogancan Temel
Georgia Institute of Technology, USA
School of Electrical and Computer Engineering, Center for Signal and Information Processing - CSIP
PROBLEM
MODEL VALIDATION/VERIFICATION - DATABASES [5-6]
IMAGE QUALITY ASSESSMENT FRAMEWORK
CONCLUSION
Spearman Correlation Coefficient (SCC)

METRIC       Jp2K   Jpeg   Wn     Gblur  FF     All
Error        0.953  0.931  0.991  0.873  0.936  0.909
SSIM [1]     0.980  0.962  0.982  0.971  0.974  0.949
PerSIM [2]   0.977  0.958  0.991  0.973  0.945  0.950
SAE-Error    0.967  0.972  0.954  0.865  0.962  0.939
SAE-SSIM     0.951  0.956  0.931  0.955  0.950  0.927
SAE-PerSIM   0.936  0.927  0.923  0.932  0.903  0.908
CNN-based models that work successfully for no-reference image quality assessment do not work for full-reference image quality assessment because of the lack of structure in the pixel-wise error maps.
Stacked autoencoder-based pooling works successfully for residual (error) maps that have sparse structures, but it cannot be used directly with quality/degradation attribute maps, which already have hand-crafted feature characteristics.
Parameters including, but not limited to, sparsity, number of layers, resolution, and optimization type are crucial in designing the stacked autoencoders.
In image quality modeling applications using stacked autoencoders, fine-tuning is important, especially when the sparsity requirements are low.
Pre-processing the training images, such as decreasing the resolution and selecting patches, can be important since learning the training set as it is can lead to overfitting.
[Diagram: imaging pipeline (Capture, Store, Transfer, Display, Perceive) with distortion sources at each stage; images from http://www.cambridgeincolour.com/tutorials/image-noise.htm]
- Color calibration: color saturation, color quantization with dither
- Data storage: Jpeg, Jp2k, quantization
- Data communication: Jpeg transmission, Jp2K transmission
- Data acquisition: non-eccentricity, sparse sampling and reconstruction
SAE design parameters: sparsity, resolution, number of layers, optimization type
DESIGN OF STACKED AUTOENCODERS USING ERROR MAPS
DESIGN OF STACKED AUTOENCODERS USING QUALITY ATTRIBUTES
RESULTS
[Figure panels: SAE-PerSIM, SAE-SSIM, SAE-Error]
REFERENCES
[1] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image Quality Assessment: From Error Visibility to Structural Similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.
[2] D. Temel and G. AlRegib, "PerSIM: Multi-Resolution Image Quality Assessment in the Perceptually Uniform Color Domain," IEEE International Conference on Image Processing, Quebec, Canada, Sept. 27-30, 2015.
[3] L. Kang, P. Ye, Y. Li, and D. Doermann, "Convolutional Neural Networks for No-Reference Image Quality Assessment," Computer Vision and Pattern Recognition, Jun. 2014.
[4] A. Chetouani, A. Beghdadi, S. Chen, and G. Mostafaoui, "A Novel Free Reference Image Quality Metric Using Neural Network Approach," International Workshop on Video Processing and Quality Metrics for Consumer Electronics, 2010.
[5] N. Ponomarenko, O. Ieremeiev, V. Lukin, K. Egiazarian, L. Jin, J. Astola, B. Vozel, K. Chehdi, M. Carli, F. Battisti, and C.-C. Jay Kuo, "Color image database TID2013: Peculiarities and preliminary results," Proceedings of the 4th European Workshop on Visual Information Processing, 2013.
[6] H. R. Sheikh, Z. Wang, L. Cormack, and A. C. Bovik, "LIVE Image Quality Assessment Database Release 2," http://live.ece.utexas.edu/research/quality.
STACKED AUTOENCODER BASED LEARNING FRAMEWORK
Training framework: the full-reference (FR) reference image and the distorted image produce residual images (data). Together with the image scores (labels), these train the stacked autoencoder that forms the image quality assessment (IQA) model.
Testing framework: the reference image and the distorted image again produce residual images (data). The IQA model outputs quality scores, and a validation metric compares them against the labels to measure performance.
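The residual images used as data in both frameworks can be sketched as below; the poster does not specify the exact residual definition, so the absolute pixel-wise difference here is an assumption:

```python
import numpy as np

def residual_map(reference, distorted):
    """Pixel-wise absolute error between reference and distorted images."""
    return np.abs(reference.astype(np.float64) - distorted.astype(np.float64))

ref = np.full((8, 8), 128, dtype=np.uint8)   # toy flat reference image
dist = ref.copy()
dist[0, 0] = 138                             # simulate a local distortion
res = residual_map(ref, dist)
print(res[0, 0], res.sum())                  # 10.0 10.0
```

Casting to float before subtracting avoids unsigned-integer wraparound in the difference.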
SAE structure
[x_1, x_2, x_3, ..., x_{N1}]: vectorized input image (grayscale)
[y_1, y_2, y_3, ..., y_{N2}]: quality score vector
[h_1^n, h_2^n, h_3^n, ..., h_{H_n}^n]: hidden units in the nth level
N1: number of pixels in the input image
H_n: number of hidden units in the nth level
N2: number of quality levels
In the best-performing design, N1 = 625, H1 = H2 = H3 = 100, N2 = 100.
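The best-performing layer sizes above can be sketched as a forward pass; the random weight initialization and the sigmoid/softmax activation choices are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

sizes = [625, 100, 100, 100]          # N1, then H1 = H2 = H3 = 100
Ws = [rng.normal(scale=0.05, size=(a, b)) for a, b in zip(sizes, sizes[1:])]
W_soft = rng.normal(scale=0.05, size=(sizes[-1], 100))   # N2 = 100 levels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))        # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

x = rng.random(625)                   # vectorized 25x25 grayscale patch
h = x
for W in Ws:                          # stacked encoder levels
    h = sigmoid(h @ W)
p = softmax(h @ W_soft)               # distribution over quality levels
print(p.shape, round(p.sum(), 6))     # (100,) 1.0
```

N1 = 625 corresponds to a 25x25 patch, which is why the toy input is a 625-vector.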
[Diagram: full-reference (FR) pipeline, where the difference of the reference and distorted images is fed to the SAE]
Databases              Patches (#)   Images (#)
LIVE (Full Set)        188,544       982
TID2013 (Full Set)     576,000       3,000
Test Set - LIVE        11,784        982
Train Set - TID2013    27,000        3,000

Training with the full set leads to overfitting. Downsampled and randomized patches are used to avoid overfitting.
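The randomized patch selection described above could be sketched as follows; the patch size matches the 625-dimensional SAE input, but the sample count and image size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_patches(img, patch=25, n=8):
    """Sample n random patches and vectorize each (25x25 = 625 values)."""
    h, w = img.shape
    out = []
    for _ in range(n):
        r = rng.integers(0, h - patch + 1)   # random top-left row
        c = rng.integers(0, w - patch + 1)   # random top-left column
        out.append(img[r:r + patch, c:c + patch].ravel())
    return np.stack(out)

img = rng.random((128, 128))                 # toy grayscale image
P = random_patches(img)
print(P.shape)                               # (8, 625)
```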
LITERATURE VERSUS THE PROPOSED APPROACH
[Diagram: state of the art versus the proposed approach, illustrated with SAE-PerSIM and SAE-SSIM]
Most of the learning-based IQA models are not solely data-driven. The authors in [3] propose a CNN-based approach. However, they overlook local information by using a 26x26 pooling window for a 32x32 feature map.
SAE-based pooling: not used in the analyzed image quality literature before.
SAE-based feature pooling versus the state of the art: almost all state-of-the-art quality metrics use mean pooling over full-resolution feature maps. However, SAEs can be used to obtain more descriptive representations of the quality features and to map the distortion/quality maps to final scores.
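For contrast with the learned SAE pooling, the mean-pooling baseline mentioned above amounts to collapsing a quality map to a single number; the values here are made up for illustration:

```python
import numpy as np

# A tiny quality map with local SSIM-like values: one badly degraded region.
quality_map = np.array([[0.9, 0.9],
                        [0.9, 0.1]])

score = quality_map.mean()      # mean pooling: spatial structure is lost
print(score)
```

An SAE instead maps the whole vectorized map through learned weights, so the spatial pattern of degradation can influence the final score.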
ARCHITECTURE (WHY SAE?)
Spearman Correlation Coefficient (SCC):

SCC = 1 - 6 * sum_{i=1}^{N} (x_i - y_i)^2 / (N (N^2 - 1)),

where x_i and y_i are the ranks of X_i and Y_i.
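The SCC can be computed directly from ranks; this sketch assumes no ties in either sequence:

```python
import numpy as np

def scc(X, Y):
    """Spearman correlation via the rank-difference formula (no ties)."""
    x = np.argsort(np.argsort(X)) + 1      # ranks of X
    y = np.argsort(np.argsort(Y)) + 1      # ranks of Y
    d = x - y                              # rank differences
    N = len(X)
    return 1 - 6 * np.sum(d**2) / (N * (N**2 - 1))

print(scc([1, 2, 3, 4], [10, 20, 30, 40]))   # 1.0  (same ordering)
print(scc([1, 2, 3, 4], [40, 30, 20, 10]))   # -1.0 (reversed ordering)
```

With ties present, the general rank-correlation formula (e.g. as implemented in standard statistics libraries) should be used instead.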