Perceptual Image Quality Assessment Using Deep Networks



  • Objective image quality metric design is composed of feature crafting and pooling strategy selection.

    CNNs have been used in the literature for both feature design and pooling. However, they overlook local
    information and spatial correlation by using large pooling windows that cover most of the image patch.
    Initially, we designed CNNs that are more sensitive to local information by using smaller pooling windows.
    We also increased the level of abstraction by using more convolutional and learning layers, but these
    changes did not lead to successful subjective score prediction.

    We therefore decided to use SAEs to leverage spatially correlated local information instead of direct
    pooling as in the CNNs. SAEs are trained to pool quality attributes by learning the spatial pooling
    patterns. As input to the network, we downsample the images. Even after downsampling, redundancies remain
    and a more descriptive representation of the image is needed, so the SAE provides low-dimensional,
    descriptive features that can be used to estimate the subjective quality of the images with a softmax
    classifier.
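The encoder-plus-softmax pipeline described above can be sketched as follows. Only the layer sizes (625-dimensional input, three 100-unit layers, softmax over 100 quality levels) come from the design stated on the poster; the sigmoid activations and the random weights are placeholders, since an actual SAE would be pre-trained layer-wise and fine-tuned.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Layer sizes from the best-performing design: a 25x25 downsampled patch
# (625 pixels) -> three 100-unit encoder layers -> softmax over 100 levels.
layer_dims = [625, 100, 100, 100]
weights = [rng.normal(0, 0.1, (m, n)) for n, m in zip(layer_dims[:-1], layer_dims[1:])]
biases = [np.zeros(m) for m in layer_dims[1:]]
W_out = rng.normal(0, 0.1, (100, 100))  # softmax layer over 100 quality levels
b_out = np.zeros(100)

def encode(x):
    """Pass a vectorized patch through the stacked encoder layers."""
    h = x
    for W, b in zip(weights, biases):
        h = sigmoid(W @ h + b)
    return h

def predict_quality(x):
    """Return a distribution over quality levels for one patch."""
    return softmax(W_out @ encode(x) + b_out)

patch = rng.random(625)  # stand-in for a downsampled residual patch
p = predict_quality(patch)
print(p.shape)  # (100,)
```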

    Perceptual Image Quality Assessment Using Deep Networks

    Dogancan Temel
    Georgia Institute of Technology, USA
    School of Electrical and Computer Engineering, Center for Signal and Information Processing - CSIP
    [email protected]

    PROBLEM | MODEL | VALIDATION/VERIFICATION | MODEL - DATABASES [5-6] | IMAGE QUALITY ASSESSMENT FRAMEWORK

    CONCLUSION

    METRICS (Correlation, SCC)   Jp2K    Jpeg    Wn      Gblur   FF      All
    Error                        0.953   0.931   0.991   0.873   0.936   0.909
    SSIM [1]                     0.980   0.962   0.982   0.971   0.974   0.949
    PerSIM [2]                   0.977   0.958   0.991   0.973   0.945   0.950
    SAE-Error                    0.967   0.972   0.954   0.865   0.962   0.939
    SAE-SSIM                     0.951   0.956   0.931   0.955   0.950   0.927
    SAE-PerSIM                   0.936   0.927   0.923   0.932   0.903   0.908

    CNN-based models that work successfully for no-reference image quality assessment do not work for
    full-reference image quality assessment because of the lack of structure in the pixel-wise error maps.

    Stacked autoencoder-based pooling works successfully for residual (error) maps that have sparse
    structures, but it cannot be directly used with quality/degradation attribute maps, which already have
    hand-crafted feature characteristics.

    Parameters including but not limited to sparsity, number of layers, resolution, and optimization type
    are crucial in designing the stacked autoencoders.

    In image quality modeling applications using stacked autoencoders, fine-tuning is important, especially
    when the sparsity requirements are low.

    Pre-processing the training images, such as decreasing the resolution and selecting patches, can be
    important, since learning the training set as-is can lead to overfitting.
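Such pre-processing could look like the sketch below, assuming naive stride-based downsampling and random 25x25 patch selection; the patch count, stride, and image size are illustrative, not the exact protocol used on the poster.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_patches(image, patch_size=25, n_patches=50, stride=2):
    """Downsample by `stride`, then sample random patches (illustrative values)."""
    small = image[::stride, ::stride]  # naive downsampling
    H, W = small.shape
    patches = []
    for _ in range(n_patches):
        r = rng.integers(0, H - patch_size + 1)
        c = rng.integers(0, W - patch_size + 1)
        patches.append(small[r:r + patch_size, c:c + patch_size].ravel())
    return np.stack(patches)  # (n_patches, patch_size**2)

img = rng.random((256, 256))  # stand-in for a training image
X = extract_patches(img)
print(X.shape)  # (50, 625)
```

Randomizing the patch locations keeps the network from memorizing fixed regions of the training images, which is the overfitting concern raised above.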

    [Figure: image communication pipeline (Capture, Store, Transfer, Display, Perceive) and the distortions
    introduced at each stage: Color Calibration (color saturation; color quantization with dither),
    Data Storage (JPEG, JP2K, quantization), Data Communication (JPEG transmission, JP2K transmission),
    Data Acquisition (non-eccentricity; sparse sampling and reconstruction).
    Image source: http://www.cambridgeincolour.com/tutorials/image-noise.htm]

    DESIGN OF STACKED AUTOENCODERS USING ERROR MAPS

    DESIGN OF STACKED AUTOENCODERS USING QUALITY ATTRIBUTES

    [Plots: performance as a function of the SAE design parameters: sparsity, resolution, number of layers]

    RESULTS

    [Plot: SAE-PerSIM]

    REFERENCES

    [1] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image Quality Assessment: From Error Visibility to Structural Similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.

    [2] D. Temel and G. AlRegib, "PerSIM: Multi-Resolution Image Quality Assessment in the Perceptually Uniform Color Domain," IEEE International Conference on Image Processing, Quebec, Canada, Sept. 27-30, 2015.

    [3] L. Kang, Y. Peng, Y. Li, and D. Doermann, "Convolutional Neural Networks for No-Reference Image Quality Assessment," Computer Vision and Pattern Recognition, Jun. 2014.

    [4] A. Chetouani, A. Beghdadi, S. Chen, and G. Mostafaoui, "A Novel Free Reference Image Quality Metric Using Neural Network Approach," Int. Workshop Video Process. Qual. Metrics Cons. Electron., 2010.

    [5] N. Ponomarenko, O. Ieremeiev, V. Lukin, K. Egiazarian, L. Jin, J. Astola, B. Vozel, K. Chehdi, M. Carli, F. Battisti, and C.-C. Jay Kuo, "Color image database TID2013: Peculiarities and preliminary results," Proceedings of the 4th European Workshop on Visual Information Processing, 2013.

    [6] H. R. Sheikh, Z. Wang, L. Cormack, and A. C. Bovik, "LIVE Image Quality Assessment Database Release 2," http://live.ece.utexas.edu/research/quality.

    [Plots: SAE-SSIM and SAE-Error performance as a function of optimization type]

    STACKED AUTOENCODER BASED LEARNING FRAMEWORK

    Training framework: a full-reference (FR) pair, reference image and distorted image, is turned into
    residual images (data), which are paired with image scores (labels) and fed to a stacked autoencoder
    to train the image quality assessment (IQA) model.

    Testing framework: an FR reference/distorted image pair is turned into residual images (data) and
    passed through the trained IQA model to obtain quality scores, which are validated against the labels
    to measure metric performance.
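The residual-map step of the framework can be sketched minimally as below, assuming a simple absolute-difference residual (the poster also pools SSIM and PerSIM attribute maps); the image size, noise level, and quality label are illustrative only.

```python
import numpy as np

def residual_map(reference, distorted):
    """Pixel-wise error map between reference and distorted images,
    used as the (data) input in the FR training/testing frameworks."""
    return np.abs(reference.astype(float) - distorted.astype(float))

rng = np.random.default_rng(1)
ref = rng.random((64, 64))                                    # stand-in reference image
dist = np.clip(ref + rng.normal(0, 0.05, ref.shape), 0, 1)    # simulated degradation

data = residual_map(ref, dist)  # residual "image" fed to the SAE
label = 87                      # hypothetical subjective quality level (of 100)
print(data.shape)  # (64, 64)
```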

    SAE structure

    [x_1, x_2, x_3, ..., x_{N1}]: vectorized input image (grayscale)
    [y_1, y_2, y_3, ..., y_{N2}]: quality score vector
    [h_1^n, h_2^n, h_3^n, ..., h_{Hn}^n]: hidden units of the n-th layer
    N1: number of pixels in the input image
    Hn: number of hidden units in the n-th level
    N2: number of quality levels

    In the best performing design, N1 = 625, H1 = H2 = H3 = 100, N2 = 100.

    [Diagram: full-reference (FR) difference used as SAE input]

    Databases             Patches (#)   Images (#)
    LIVE (Full Set)       188,544       982
    TID2013 (Full Set)    576,000       3,000
    Test Set - LIVE       11,784        982
    Train Set - TID2013   27,000        3,000

    Training with the full set leads to overfitting. Downsampled and randomized patches are used to avoid
    overfitting.

    LITERATURE VERSUS THE PROPOSED APPROACH

    [Plots: SAE-PerSIM and SAE-SSIM]

    State of the art: Most of the learning-based IQA models are not solely data-driven. The authors in [3]
    propose a CNN-based approach; however, they overlook local information by using a 26x26 pooling window
    on a 32x32 feature map.

    Proposed approach (SAE-based, not used in the analyzed image quality literature before): Almost all
    state-of-the-art quality metrics use mean pooling over the full-resolution feature maps. Instead, SAEs
    can be used to obtain more descriptive representations of the quality features when mapping the
    distortion/quality maps to final scores.
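The contrast between mean pooling and learned pooling can be illustrated with a toy snippet; the random projection below merely stands in for a trained SAE encoder and is purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
quality_map = rng.random((25, 25))  # stand-in for an SSIM/PerSIM attribute map

# Conventional pooling: collapse the whole map to a single number.
mean_score = quality_map.mean()

# SAE-style pooling: keep the vectorized map and let a learned encoder
# (here a fixed random projection as a placeholder) produce a 100-D code
# that preserves spatial structure the mean discards.
W = rng.normal(0, 0.1, (100, 625))
code = 1.0 / (1.0 + np.exp(-(W @ quality_map.ravel())))
print(code.shape)  # (100,)
```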

    ARCHITECTURE (WHY SAE?)

    Spearman Correlation Coefficient (SCC):

    SCC = 1 - (6 * sum_{i=1}^{N} (x_i - y_i)^2) / (N (N^2 - 1)),

    where x_i and y_i are the ranks of the paired scores X_i and Y_i.
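A small sketch of the SCC formula, assuming no tied scores (ties would require fractional ranks, as in SciPy's implementation):

```python
import numpy as np

def spearman_scc(X, Y):
    """SCC = 1 - 6*sum(d^2) / (N(N^2-1)), applied to the ranks of X and Y.
    Valid when all values are distinct (no ties)."""
    x = np.argsort(np.argsort(X)) + 1  # 1-based ranks
    y = np.argsort(np.argsort(Y)) + 1
    d = x - y
    N = len(X)
    return 1 - 6 * np.sum(d**2) / (N * (N**2 - 1))

# Perfectly monotone predictions give SCC = 1.0
scores = np.array([1.2, 3.4, 2.2, 5.0, 4.1])
preds  = np.array([10., 30., 20., 50., 40.])
print(spearman_scc(scores, preds))  # 1.0
```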
