


Perceptually Motivated Method for Image Inpainting Comparison

I.A. Molodetskikh¹, M.V. Erofeev¹, D.S. Vatolin¹

[email protected] | [email protected] | [email protected]

¹Lomonosov Moscow State University, Moscow, Russia

The field of automatic image inpainting has progressed rapidly in recent years, but no one has yet proposed a standard method of evaluating algorithms. This absence is due to the problem's challenging nature: image-inpainting algorithms strive for realism in the resulting images, but realism is a subjective concept intrinsic to human perception. Existing objective image-quality metrics provide a poor approximation of what humans consider more or less realistic.

To improve the situation and to better organize both prior and future research in this field, we conducted a subjective comparison of nine state-of-the-art inpainting algorithms and propose objective quality metrics that exhibit high correlation with the results of our comparison.

Keywords: image inpainting, objective quality metric, quality perception, subjective evaluation, deep learning.

1. Introduction

Image inpainting, or hole filling, is the task of filling in missing parts of an image. Given an incomplete image and a hole mask, an inpainting algorithm must generate the missing parts so that the result looks realistic. Inpainting is a widely researched topic. Many classical algorithms have been proposed [5, 26], but over the past few years most research has focused on using deep neural networks to solve this problem [12, 16, 17, 19, 23, 31, 32].

Because of the many avenues of research in this field, the need to evaluate algorithms emerges. The goal of an inpainting algorithm is to make the final image as realistic as possible, but image realism is a concept intrinsic to humans. Therefore, the most accurate way to evaluate an algorithm's performance is a subjective experiment where many participants compare the outcomes of different algorithms and choose the one they consider the most realistic.

Unfortunately, conducting a subjective experiment involves considerable time and resources, so many authors resort to evaluating their proposed methods using traditional objective image-similarity metrics such as PSNR, SSIM and mean l2 loss relative to the ground-truth image. This strategy, however, is inadequate. One reason is that evaluation by measuring similarity to the ground-truth image assumes that only a single, best inpainting result exists—a false assumption in most cases.

Thus, a perceptually motivated objective metric for inpainting-quality assessment is desirable. The objective metric should approximate the notion of image realism and yield results similar to those of a subjective study when comparing outputs from different algorithms.

We conducted a subjective evaluation of nine state-of-the-art classical and deep-learning-based approaches to image inpainting. Using the results, we examine different methods of objective inpainting-quality evaluation, including both full-reference methods (taking both the resulting image and the ground-truth image as input) and no-reference methods (taking only the resulting image as input).

2. Related work

Little work has been done on objective image-inpainting-quality evaluation or on inpainting detection in general. The somewhat related field of manipulated-image detection has seen moderate research, including both classical and deep-learning-based approaches. This field focuses on detecting altered image regions, usually involving a set of common manipulations: copy-move (copying an image fragment and pasting it elsewhere in the same image), splicing (pasting a fragment from another image), fragment removal (deleting an image fragment and then performing either a copy-move or inpainting to fill in the missing area), various effects such as Gaussian blur, and recompression. Among these manipulations, the most interesting for this work is fragment removal with inpainting.

The approaches to image-manipulation detection can be divided into classical [13, 20] and deep-learning-based [2, 21, 34, 35]. These algorithms aim to locate the manipulated image regions by outputting a mask or a set of bounding boxes enclosing suspicious regions. Unfortunately, they are not directly applicable to inpainting-quality estimation because they have a different goal: whereas an objective quality-estimation metric should strive to accurately compare realistically inpainted images similar to the originals, a forgery-detection algorithm should strive to accurately tell inpainted images apart from the originals.

3. Inpainting subjective evaluation

The gold standard for evaluating image-inpainting algorithms is human perception, since each algorithm strives to produce images that look the most realistic to humans. Thus, to obtain a baseline for creating an objective inpainting-quality metric, we conducted a subjective evaluation of multiple state-of-the-art algorithms, including both classical and deep-learning-based ones. To assess the overall quality and applicability of the current approaches and to see how they compare with manual photo editing, we also asked professional photo editors to fill in missing regions of the test photos.

3.1 Test data set

Since human photo editors were to perform inpainting, our data set could not include publicly available images. We therefore created our own private set of test images by taking photographs of various outdoor scenes, which are the most likely target for inpainting.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


Fig. 1. Images for the subjective inpainting comparison. The black square in the center is the area to be inpainted.

[Fig. 2: bar chart "Overall (3 images)", x-axis "Subjective Score" (0–5). Methods ranked from lowest to highest score: Deep Image Prior, Shift-Net, Globally and Locally Consistent, High-Resolution Inpainting, Partial Convolutions, Generative Inpainting (ImageNet), Exemplar-Based (patch size 9), Exemplar-Based (patch size 13), Statistics of Patch Offsets, Content-Aware Fill, Generative Inpainting (Places2), Artist #1, Artist #3, Artist #2, Ground Truth. Legend: Ground Truth, Human Artist, Classical Method, Deep Learning-Based Method.]

Fig. 2. Subjective-comparison results across three images inpainted by human artists.

Each test image was 512 × 512 pixels with a square hole in the middle measuring 180 × 180 pixels. We chose a square instead of a free-form shape because one algorithm in our comparison [30] lacks the ability to fill in free-form holes. The data set comprised 33 images in total. Fig. 1 shows examples.
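For concreteness, this test geometry is easy to reproduce. A minimal NumPy sketch follows; the convention that 1 marks pixels to be filled is our assumption, since the paper does not state one:

```python
import numpy as np

# 512x512 test mask with a centered 180x180 square hole.
# Assumed convention: 1 = pixel to be inpainted, 0 = known pixel.
size, hole = 512, 180
offset = (size - hole) // 2  # 166 pixels on each side centers the hole
mask = np.zeros((size, size), dtype=np.uint8)
mask[offset:offset + hole, offset:offset + hole] = 1
```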

3.2 Inpainting methods

We evaluated three classical [1, 5, 7] and six deep-learning-based approaches [10, 16, 27, 29, 30, 32]. Additionally, we hired three professional photo-restoration and photo-retouching artists to manually inpaint three randomly selected images from our test data set.

3.3 Test method

The subjective evaluation took place through the http://subjectify.us platform. Human observers were shown pairs of images and asked to pick from each pair the one they found more realistic. Each pair consisted of two different inpainting results for the same picture (the set also contained the original image). In total, 6945 valid pairwise judgements were collected from 215 participants.

The judgements were then used to fit a Bradley-Terry model [3]; the resulting subjective scores are those that maximize the likelihood of the observed pairwise judgements.
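The paper does not detail the fitting procedure, but a standard way to obtain such scores is the minorization-maximization (MM) algorithm for the Bradley-Terry model. A minimal sketch, with the `wins` matrix layout and function name being our choices:

```python
import numpy as np

def bradley_terry_scores(wins, n_iter=1000, tol=1e-9):
    """Fit Bradley-Terry log-strengths from pairwise preference counts.

    wins[i, j] = number of times result i was preferred over result j.
    Returns log-strengths (higher = judged more realistic). Assumes every
    result wins at least one comparison; otherwise add a small prior.
    """
    comparisons = wins + wins.T        # n_ij: comparisons between i and j
    total_wins = wins.sum(axis=1)      # W_i: total wins of result i
    p = np.ones(len(wins))             # initial strengths
    for _ in range(n_iter):
        denom = comparisons / (p[:, None] + p[None, :])
        np.fill_diagonal(denom, 0.0)
        p_new = total_wins / denom.sum(axis=1)  # MM update
        p_new /= p_new.sum()           # fix the scale (model is scale-free)
        if np.max(np.abs(p_new - p)) < tol:
            p = p_new
            break
        p = p_new
    return np.log(p)
```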

3.4 Results of the subjective comparison

Fig. 2 shows the results for the three images inpainted by the human artists. The artists outperformed all automatic algorithms, and among the deep-learning-based methods, only generative image inpainting [32] outperformed the classical inpainting methods.

[Fig. 3: bar chart "Overall (33 images)", x-axis "Subjective Score" (0–4). Methods ranked from lowest to highest score: Deep Image Prior, Shift-Net, Globally and Locally Consistent, High-Resolution Inpainting, Partial Convolutions, Exemplar-Based (patch size 13), Statistics of Patch Offsets, Generative Inpainting (ImageNet), Content-Aware Fill, Generative Inpainting (Places2), Ground Truth. Legend: Ground Truth, Classical Method, Deep Learning-Based Method.]

Fig. 3. Subjective-comparison results for 33 images inpainted using automatic methods.

[Fig. 4 panels: Artist #1; Statistics of Patch Offsets [7].]

Fig. 4. Comparison of inpainting results from Artist #1 and statistics of patch offsets [7] (preferred in the subjective comparison).


The individual results for each of these three images appear in Fig. 5. In only one case did an algorithm beat an artist: statistics of patch offsets [7] scored higher than one artist on the "Urban Flowers" photo. Fig. 4 shows the respective results. Additionally, for the "Splashing Sea" photo, two artists actually "outperformed" the original image: their results turned out to be more realistic.

We additionally performed a subjective comparison of the various inpainting algorithms on the entire 33-image test set, collecting 3969 valid pairwise judgements from 147 participants. The overall results appear in Fig. 3.


[Fig. 5: three bar charts of subjective scores, one per test photo, each ranked from lowest to highest score. Panel "Urban Flowers": Deep Image Prior, Shift-Net, Globally and Locally Consistent, Partial Convolutions, Exemplar-Based (patch size 9), High-Resolution Inpainting, Generative Inpainting (ImageNet), Exemplar-Based (patch size 13), Generative Inpainting (Places2), Content-Aware Fill, Artist #1, Statistics of Patch Offsets, Artist #2, Artist #3, Ground Truth. Panel "Splashing Sea": Shift-Net, Globally and Locally Consistent, Deep Image Prior, Partial Convolutions, High-Resolution Inpainting, Statistics of Patch Offsets, Content-Aware Fill, Generative Inpainting (ImageNet), Generative Inpainting (Places2), Exemplar-Based (patch size 13), Exemplar-Based (patch size 9), Artist #1, Ground Truth, Artist #2, Artist #3. Panel "Forest Trail": Deep Image Prior, Statistics of Patch Offsets, Shift-Net, Generative Inpainting (ImageNet), High-Resolution Inpainting, Exemplar-Based (patch size 13), Globally and Locally Consistent, Exemplar-Based (patch size 9), Content-Aware Fill, Generative Inpainting (Places2), Partial Convolutions, Artist #3, Artist #1, Artist #2, Ground Truth. Legend: Ground Truth, Human Artist, Classical Method, Deep Learning-Based Method.]

Fig. 5. Results of the subjective study comparing images inpainted by human artists with images inpainted by conventional and deep-learning-based methods.

They confirm our observations from the first comparison: among the deep-learning-based approaches we evaluated, generative image inpainting [32] seems to be the only one that can outperform the classical methods.

4. Objective inpainting-quality estimation

Using the results we obtained from the subjective comparison, we evaluated several approaches to objective inpainting-quality estimation. In particular, we used these objective metrics to estimate the inpainting quality of the images from our test set and then compared them with the subjective results. For each of the 33 images, we applied every tested metric to every inpainting result (as well as to the ground-truth image) and computed the Pearson and Spearman correlation coefficients with the subjective result. The final value was an average of the correlations over all 33 test images.
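In code, this evaluation reduces to a short loop over test images; a sketch using SciPy, where the array layout and names are our assumptions:

```python
import numpy as np
from scipy import stats

def mean_correlations(metric_scores, subjective_scores):
    """Average per-image Pearson/Spearman correlations between a metric
    and the subjective scores.

    Both arguments: arrays of shape (n_images, n_methods); each row holds
    the scores of every inpainting result (plus ground truth) for one image.
    """
    pearson, spearman = [], []
    for metric_row, subj_row in zip(metric_scores, subjective_scores):
        pearson.append(stats.pearsonr(metric_row, subj_row)[0])
        spearman.append(stats.spearmanr(metric_row, subj_row)[0])
    return np.mean(pearson), np.mean(spearman)
```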

4.1 Full-reference metrics

To construct a full-reference metric that encourages semantic similarity rather than per-pixel similarity, as in [11], we evaluated metrics that compute the difference between the ground-truth and inpainted-image feature maps produced by an image-classification neural network. We selected five of the most popular architectures: VGG [22] (16- and 19-layer variants), ResNet-V1-50 [8], Inception-V3 [25], Inception-ResNet-V2 [24] and Xception [4]. We used the models pretrained on the ImageNet [6] data set. The mean squared error between the feature maps was the metric result.
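As an illustration, the VGG-16 variant of such a metric can be sketched in Keras. The block5_conv3 layer is one of the feature layers that appears in Fig. 6; the preprocessing and input handling below are our assumptions:

```python
import numpy as np
import tensorflow as tf

# VGG-16 pretrained on ImageNet, truncated at a deep convolutional layer.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False)
features = tf.keras.Model(base.input, base.get_layer("block5_conv3").output)

def full_reference_score(inpainted, ground_truth):
    """MSE between deep feature maps of the two images (lower is better).

    Both inputs: uint8 RGB arrays of shape (512, 512, 3).
    """
    batch = np.stack([inpainted, ground_truth]).astype(np.float32)
    batch = tf.keras.applications.vgg16.preprocess_input(batch)
    f_inp, f_gt = features(batch, training=False).numpy()
    return float(np.mean((f_inp - f_gt) ** 2))
```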

We additionally included the structural-similarity (SSIM) index [28] as a full-reference metric. SSIM is widely used to compare image quality, but it falls short when applied to inpainting-quality estimation.

4.2 No-reference metrics

We picked several popular image-classification neural-network architectures and trained them to differentiate images without any inpainting from partially inpainted images. The architectures included VGG [22] (16- and 19-layer variants), ResNet-V1-50 [8], ResNet-V2-50 [9], Inception-V3 [25], Inception-V4 [24] and PNASNet-Large [15].

For training, we used clean and inpainted images based on the COCO [14] data set. To create the inpainted images, we used five inpainting algorithms [5, 7, 10, 29, 32] in eight total configurations.

The network architectures take a square image as an input and output the score—a single number where 0 means the image contains inpainted regions and 1 means the image is "clean." The loss function was mean squared error. Some network architectures were additionally trained to output the predicted class using one-hot encoding (similar to binary classification); the loss function for this case was softmax cross-entropy.

The network architectures were identical to the ones used for image classification, with one difference: we altered the number of outputs from the last fully connected layer. This change allowed us to initialize the weights of all previous layers from the models pretrained on ImageNet, greatly improving the results compared with training from random initialization.
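A minimal Keras sketch of this modification for the ResNet-50 backbone; the sigmoid output, optimizer, and input size are our assumptions (the paper specifies only the single-score output and the MSE loss):

```python
import tensorflow as tf

# ImageNet-pretrained backbone; only the classification head is replaced
# by a single realism-score output (0 = inpainted regions present, 1 = clean).
backbone = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(512, 512, 3))
score = tf.keras.layers.Dense(1, activation="sigmoid")(backbone.output)
model = tf.keras.Model(backbone.input, score)
model.compile(optimizer="adam", loss="mse")  # MSE loss, as described above
```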

For some experiments, we tried using RGB noise features [34] and spectral weight normalization [18].

In addition to the typical validation on part of the data set, we also monitored the correlation of network predictions with the subjective scores collected in Section 3. We used the networks to estimate the inpainting quality of the 33-image test set, then computed correlations with the subjective results in the same way as in the final comparison. The training of each network was stopped once the correlation of the network predictions with the subjective scores peaked and started to decrease (possibly because the networks were overfitting to the inpainting results of the algorithms we used to create the training data set).
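This stopping rule can be written as a custom Keras callback; `correlation_fn` below is a hypothetical helper that evaluates the current model on the 33-image test set and returns the mean correlation, computed as in Section 4:

```python
import tensorflow as tf

class StopAtPeakCorrelation(tf.keras.callbacks.Callback):
    """Stop training once correlation with the subjective scores
    peaks and starts to decrease (one-epoch patience)."""

    def __init__(self, correlation_fn):
        super().__init__()
        self.correlation_fn = correlation_fn  # model -> mean correlation
        self.best = float("-inf")

    def on_epoch_end(self, epoch, logs=None):
        corr = self.correlation_fn(self.model)
        if corr > self.best:
            self.best = corr
        else:
            # Correlation stopped improving: assume the peak has passed.
            self.model.stop_training = True
```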

4.3 Results

Fig. 6 shows the overall results. The no-reference methods achieve slightly weaker correlation with the subjective-evaluation responses than do the best full-reference methods. But the results of most no-reference methods are still considerably better than those of the full-reference SSIM. The best correlation among the no-reference methods came from the Inception-V4 model with spectral weight normalization.


[Fig. 6: two bar charts (Pearson and Spearman, correlation axis 0.0–0.9) ranking all evaluated metrics from weakest to strongest correlation. In both charts PNASNET-Large shows the weakest correlation and SSIM ranks near the bottom, while full-reference feature-map metrics such as VGG-16 (block5_conv3), VGG-19 (block5_conv4), ResNet-V1-50 (res5c_branch2c), Xception (block14_sepconv2_act) and VGG-16 (block5_pool) rank highest. Legend: Full-Reference, No-Reference.]

Fig. 6. Mean Pearson and Spearman correlations between objective inpainting-quality metrics and subjective human comparisons. The error bars show the standard deviations.


It is important to emphasize that we did not train the networks to maximize correlation with human responses. We trained them to distinguish "clean" images from inpainted images, yet their output showed good correlation with human responses. This confirms the observations made in [33] that deep features are good for modelling human perception.

5. Conclusion

We have proposed a number of perceptually motivated no-reference and full-reference objective metrics for image-inpainting quality. We evaluated the metrics by correlating them with human responses from a subjective comparison of state-of-the-art image-inpainting algorithms.

The results of the subjective comparison indicate that although a deep-learning-based approach to image inpainting holds the lead, classical algorithms remain among the best in the field.

We achieved good correlation with the subjective-comparison results without specifically training our proposed objective quality-evaluation metrics on the subjective-comparison response data set.

6. Acknowledgement

This work was partially supported by the Russian Foundation for Basic Research under Grant 19-01-00785 a.

7. References

[1] https://research.adobe.com/project/content-aware-fill/.

[2] J. H. Bappy, A. K. Roy-Chowdhury, J. Bunk, L. Nataraj, and B. S. Manjunath. Exploiting spatial structure for localizing manipulated image regions. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017.

[3] R. A. Bradley and M. E. Terry. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4):324–345, 1952.

[4] F. Chollet. Xception: Deep learning with depthwise separable convolutions. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.

[5] A. Criminisi, P. Pérez, and K. Toyama. Region filling and object removal by exemplar-based image inpainting. IEEE Transactions on Image Processing, 13(9):1200–1212, 2004.

[6] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.

[7] K. He and J. Sun. Statistics of patch offsets for image completion. In European Conference on Computer Vision, pages 16–29. Springer, 2012.

[8] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.

[9] K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision, pages 630–645, 2016.

[10] S. Iizuka, E. Simo-Serra, and H. Ishikawa. Globally and locally consistent image completion. ACM Transactions on Graphics (ToG), 36(4):107, 2017.

[11] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision, pages 694–711. Springer, 2016.

[12] H. Li, G. Li, L. Lin, H. Yu, and Y. Yu. Context-aware semantic inpainting. IEEE Transactions on Cybernetics, 2018.

[13] H. Li, W. Luo, X. Qiu, and J. Huang. Image forgery localization via integrating tampering possibility maps. IEEE Transactions on Information Forensics and Security, 12(5):1240–1252, 2017.

[14] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755, 2014.

[15] C. Liu, B. Zoph, M. Neumann, J. Shlens, W. Hua, L.-J. Li, L. Fei-Fei, A. Yuille, J. Huang, and K. Murphy. Progressive neural architecture search. In The European Conference on Computer Vision (ECCV), September 2018.

[16] G. Liu, F. A. Reda, K. J. Shih, T.-C. Wang, A. Tao, and B. Catanzaro. Image inpainting for irregular holes using partial convolutions. In The European Conference on Computer Vision (ECCV), 2018.

[17] P. Liu, X. Qi, P. He, Y. Li, M. R. Lyu, and I. King. Semantically consistent image completion with fine-grained details. arXiv preprint arXiv:1711.09345, 2017.

[18] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.

[19] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros. Context encoders: Feature learning by inpainting. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.

[20] C.-M. Pun, X.-C. Yuan, and X.-L. Bi. Image forgery detection using adaptive oversegmentation and feature point matching. IEEE Transactions on Information Forensics and Security, 10(8):1705–1716, 2015.

[21] R. Salloum, Y. Ren, and C.-C. J. Kuo. Image splicing localization using a multi-task fully convolutional network (MFCN). Journal of Visual Communication and Image Representation, 51:201–209, 2018.

[22] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[23] Y. Song, C. Yang, Z. Lin, H. Li, Q. Huang, and C. J. Kuo. Image inpainting using multi-scale feature image translation. arXiv preprint arXiv:1711.08590, 2017.

[24] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.

[25] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the Inception architecture for computer vision. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.

[26] A. Telea. An image inpainting technique based on the fast marching method. Journal of Graphics Tools, 9(1):23–34, 2004.

[27] D. Ulyanov, A. Vedaldi, and V. Lempitsky. Deep image prior. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

[28] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.

[29] Z. Yan, X. Li, M. Li, W. Zuo, and S. Shan. Shift-Net: Image inpainting via deep feature rearrangement. In The European Conference on Computer Vision (ECCV), September 2018.

[30] C. Yang, X. Lu, Z. Lin, E. Shechtman, O. Wang, and H. Li. High-resolution image inpainting using multi-scale neural patch synthesis. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.

[31] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang. Free-form image inpainting with gated convolution. arXiv preprint arXiv:1806.03589, 2018.

[32] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang. Generative image inpainting with contextual attention. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

[33] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang. The unreasonable effectiveness of deep features as a perceptual metric. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

[34] P. Zhou, X. Han, V. I. Morariu, and L. S. Davis. Learning rich features for image manipulation detection. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

[35] X. Zhu, Y. Qian, X. Zhao, B. Sun, and Y. Sun. A deep learning approach to patch-based image inpainting forensics. Signal Processing: Image Communication, 67:90–99, 2018.