Perceptual Image Quality Assessment Using Deep Networks



  • Objective image quality metric design is composed of feature crafting and pooling strategy selection.

    CNNs have been used in the literature for both feature design and pooling. However, they overlook local
    information and spatial correlation by using large pooling windows that cover most of the image patch.
    Initially, we designed CNNs that are more sensitive to local information by using smaller pooling windows.
    We also increased the level of abstraction by using more convolutional and learning layers, but these
    changes did not lead to successful subjective score prediction.

    We therefore decided to use SAEs to leverage spatially correlated local information instead of direct
    pooling as in the CNNs. SAEs are trained to pool quality attributes by learning the spatial pooling
    patterns. As input to the network, we downsample the images. Even after downsampling, redundancies remain
    and a more descriptive representation of the image is needed, so the SAE provides low-dimensional,
    descriptive features that can be used to estimate the subjective quality of the images with a softmax
    classifier.
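The encoder-plus-softmax pipeline described above can be sketched as follows. Only the layer sizes (625-dimensional input, three 100-unit layers, softmax over 100 quality levels) come from the design stated on the poster; the sigmoid activations and the random weights are placeholders, since an actual SAE would be pre-trained layer-wise and fine-tuned.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Layer sizes from the best-performing design: a 25x25 downsampled patch
# (625 pixels) -> three 100-unit encoder layers -> softmax over 100 levels.
layer_dims = [625, 100, 100, 100]
weights = [rng.normal(0, 0.1, (m, n)) for n, m in zip(layer_dims[:-1], layer_dims[1:])]
biases = [np.zeros(m) for m in layer_dims[1:]]
W_out = rng.normal(0, 0.1, (100, 100))  # softmax layer over 100 quality levels
b_out = np.zeros(100)

def encode(x):
    """Pass a vectorized patch through the stacked encoder layers."""
    h = x
    for W, b in zip(weights, biases):
        h = sigmoid(W @ h + b)
    return h

def predict_quality(x):
    """Return a distribution over quality levels for one patch."""
    return softmax(W_out @ encode(x) + b_out)

patch = rng.random(625)  # stand-in for a downsampled residual patch
p = predict_quality(patch)
print(p.shape)  # (100,)
```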

    Perceptual Image Quality Assessment Using Deep Networks

    Dogancan Temel
    Georgia Institute of Technology, USA
    School of Electrical and Computer Engineering, Center for Signal and Information Processing - CSIP
    [email protected]

    PROBLEM | MODEL | VALIDATION/VERIFICATION | MODEL - DATABASES [5-6] | IMAGE QUALITY ASSESSMENT FRAMEWORK

    CONCLUSION

    METRICS (Correlation, SCC)   Jp2K    Jpeg    Wn      Gblur   FF      All
    Error                        0.953   0.931   0.991   0.873   0.936   0.909
    SSIM [1]                     0.980   0.962   0.982   0.971   0.974   0.949
    PerSIM [2]                   0.977   0.958   0.991   0.973   0.945   0.950
    SAE-Error                    0.967   0.972   0.954   0.865   0.962   0.939
    SAE-SSIM                     0.951   0.956   0.931   0.955   0.950   0.927
    SAE-PerSIM                   0.936   0.927   0.923   0.932   0.903   0.908

    CNN-based models that work successfully for no-reference image quality assessment do not work for
    full-reference image quality assessment because of the lack of structure in the pixel-wise error maps.

    Stacked autoencoder-based pooling works successfully for residual (error) maps that have sparse
    structures, but it cannot be directly used with quality/degradation attribute maps, which already have
    hand-crafted feature characteristics.

    Parameters including but not limited to sparsity, number of layers, resolution, and optimization type
    are crucial in designing the stacked autoencoders.

    In image quality modeling applications using stacked autoencoders, fine-tuning is important, especially
    when the sparsity requirements are low.

    Pre-processing the training images, such as decreasing the resolution and selecting patches, can be
    important, since learning the training set as-is can lead to overfitting.
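Such pre-processing could look like the sketch below, assuming naive stride-based downsampling and random 25x25 patch selection; the patch count, stride, and image size are illustrative, not the exact protocol used on the poster.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_patches(image, patch_size=25, n_patches=50, stride=2):
    """Downsample by `stride`, then sample random patches (illustrative values)."""
    small = image[::stride, ::stride]  # naive downsampling
    H, W = small.shape
    patches = []
    for _ in range(n_patches):
        r = rng.integers(0, H - patch_size + 1)
        c = rng.integers(0, W - patch_size + 1)
        patches.append(small[r:r + patch_size, c:c + patch_size].ravel())
    return np.stack(patches)  # (n_patches, patch_size**2)

img = rng.random((256, 256))  # stand-in for a training image
X = extract_patches(img)
print(X.shape)  # (50, 625)
```

Randomizing the patch locations keeps the network from memorizing fixed regions of the training images, which is the overfitting concern raised above.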

    [Figure: image communication pipeline (Capture, Store, Transfer, Display, Perceive) and the distortions
    introduced at each stage: Color Calibration (color saturation; color quantization with dither),
    Data Storage (JPEG, JP2K, quantization), Data Communication (JPEG transmission, JP2K transmission),
    Data Acquisition (non-eccentricity; sparse sampling and reconstruction).
    Image source: http://www.cambridgeincolour.com/tutorials/image-noise.htm]

    DESIGN OF STACKED AUTOENCODERS USING ERROR MAPS

    DESIGN OF STACKED AUTOENCODERS USING QUALITY ATTRIBUTES

    [Plots: performance as a function of the SAE design parameters: sparsity, resolution, number of layers]

    RESULTS

    [Plot: SAE-PerSIM]

    REFERENCES

    [1] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image Quality Assessment: From Error Visibility to Structural Similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.

    [2] D. Temel and G. AlRegib, "PerSIM: Multi-Resolution Image Quality Assessment in the Perceptually Uniform Color Domain," IEEE International Conference on Image Processing, Quebec, Canada, Sept. 27-30, 2015.

    [3] L. Kang, Y. Peng, Y. Li, and D. Doermann, "Convolutional Neural Networks for No-Reference Image Quality Assessment," Computer Vision and Pattern Recognition, Jun. 2014.

    [4] A. Chetouani, A. Beghdadi, S. Chen, and G. Mostafaoui, "A Novel Free Reference Image Quality Metric Using Neural Network Approach," Int. Workshop Video Process. Qual. Metrics Cons. Electron., 2010.

    [5] N. Ponomarenko, O. Ieremeiev, V. Lukin, K. Egiazarian, L. Jin, J. Astola, B. Vozel, K. Chehdi, M. Carli, F. Battisti, and C.-C. Jay Kuo, "Color image database TID2013: Peculiarities and preliminary results," Proceedings of the 4th European Workshop on Visual Information Processing, 2013.

    [6] H. R. Sheikh, Z. Wang, L. Cormack, and A. C. Bovik, "LIVE Image Quality Assessment Database Release 2," http://live.ece.utexas.edu/research/quality.

    [Plots: SAE-SSIM and SAE-Error performance as a function of optimization type]

    STACKED AUTOENCODER BASED LEARNING FRAMEWORK

    Training framework: a full-reference (FR) pair, reference image and distorted image, is turned into
    residual images (data), which are paired with image scores (labels) and fed to a stacked autoencoder
    to train the image quality assessment (IQA) model.

    Testing framework: an FR reference/distorted image pair is turned into residual images (data) and
    passed through the trained IQA model to obtain quality scores, which are validated against the labels
    to measure metric performance.
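The residual-map step of the framework can be sketched minimally as below, assuming a simple absolute-difference residual (the poster also pools SSIM and PerSIM attribute maps); the image size, noise level, and quality label are illustrative only.

```python
import numpy as np

def residual_map(reference, distorted):
    """Pixel-wise error map between reference and distorted images,
    used as the (data) input in the FR training/testing frameworks."""
    return np.abs(reference.astype(float) - distorted.astype(float))

rng = np.random.default_rng(1)
ref = rng.random((64, 64))                                    # stand-in reference image
dist = np.clip(ref + rng.normal(0, 0.05, ref.shape), 0, 1)    # simulated degradation

data = residual_map(ref, dist)  # residual "image" fed to the SAE
label = 87                      # hypothetical subjective quality level (of 100)
print(data.shape)  # (64, 64)
```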

    SAE structure

    [x_1, x_2, x_3, ..., x_{N1}]: vectorized input image (grayscale)
    [y_1, y_2, y_3, ..., y_{N2}]: quality score vector
    [h_1^n, h_2^n, h_3^n, ..., h_{Hn}^n]: hidden units of the n-th layer
    N1: number of pixels in the input image
    Hn: number of hidden units in the n-th level
    N2: number of quality levels

    In the best performing design, N1 = 625, H1 = H2 = H3 = 100, N2 = 100.

    [Diagram: full-reference (FR) difference used as SAE input]

    Databases             Patches (#)   Images (#)
    LIVE (Full Set)       188,544       982
    TID2013 (Full Set)    576,000       3,000
    Test Set - LIVE       11,784        982
    Train Set - TID2013   27,000        3,000

    Training with the full set leads to overfitting. Downsampled and randomized patches are used to avoid
    overfitting.

    LITERATURE VERSUS THE PROPOSED APPROACH

    [Plots: SAE-PerSIM and SAE-SSIM]

    State of the art: Most of the learning-based IQA models are not solely data-driven. The authors in [3]
    propose a CNN-based approach; however, they overlook local information by using a 26x26 pooling window
    on a 32x32 feature map.

    Proposed approach (SAE-based, not used in the analyzed image quality literature before): Almost all
    state-of-the-art quality metrics use mean pooling over the full-resolution feature maps. Instead, SAEs
    can be used to obtain more descriptive representations of the quality features when mapping the
    distortion/quality maps to final scores.
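The contrast between mean pooling and learned pooling can be illustrated with a toy snippet; the random projection below merely stands in for a trained SAE encoder and is purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
quality_map = rng.random((25, 25))  # stand-in for an SSIM/PerSIM attribute map

# Conventional pooling: collapse the whole map to a single number.
mean_score = quality_map.mean()

# SAE-style pooling: keep the vectorized map and let a learned encoder
# (here a fixed random projection as a placeholder) produce a 100-D code
# that preserves spatial structure the mean discards.
W = rng.normal(0, 0.1, (100, 625))
code = 1.0 / (1.0 + np.exp(-(W @ quality_map.ravel())))
print(code.shape)  # (100,)
```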

    ARCHITECTURE (WHY SAE?)

    Spearman Correlation Coefficient (SCC):

    SCC = 1 - (6 * sum_{i=1}^{N} (x_i - y_i)^2) / (N (N^2 - 1)),

    where x_i and y_i are the ranks of the paired scores X_i and Y_i.
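A small sketch of the SCC formula, assuming no tied scores (ties would require fractional ranks, as in SciPy's implementation):

```python
import numpy as np

def spearman_scc(X, Y):
    """SCC = 1 - 6*sum(d^2) / (N(N^2-1)), applied to the ranks of X and Y.
    Valid when all values are distinct (no ties)."""
    x = np.argsort(np.argsort(X)) + 1  # 1-based ranks
    y = np.argsort(np.argsort(Y)) + 1
    d = x - y
    N = len(X)
    return 1 - 6 * np.sum(d**2) / (N * (N**2 - 1))

# Perfectly monotone predictions give SCC = 1.0
scores = np.array([1.2, 3.4, 2.2, 5.0, 4.1])
preds  = np.array([10., 30., 20., 50., 40.])
print(spearman_scc(scores, preds))  # 1.0
```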
