Distinguishing Photographic Images and Photorealistic Computer Graphics Using Visual Vocabulary on Local Image Edges

Rong Zhang, Rand-Ding Wang, and Tian-Tsong Ng

23/4/18

Definition

photographic images (PIM): generated from natural scenes by digital imaging devices; also called natural images.

photorealistic computer graphics (PRCG): created by a variety of rendering software with high photorealism.

Outline

Introduction
Data Sets
Image Classification Based on Image Edge Vocabulary
Experimental Results
Conclusions and Future Work

Introduction

Acquisition of PIM

Creation of PRCG
By skilled artists or professional programmers
Using artificial models
Virtual scenes
Simplified generation process, due to time cost and computational complexity

We aim to distinguish natural images from photorealistic computer graphics.

Basic idea: exploit the statistical properties of local edge patches in images.

Data Sets

We collect two data sets: a PIM data set of 1000 photographic images and a PRCG data set of 900 photorealistic computer graphics images.

Considerations
To explore the essential properties of natural images, we do not use images from the Internet, which may have undergone unknown post-processing and compression at various quality factors.

All images in the PIM set are high-quality JPEGs taken with 8 consumer cameras and have had no post-processing outside the camera.

The PRCG set contains 800 images from the Columbia PRCG data set and 100 CG images with high visual realism from the website www.raph.com.

Image Classification Based on Image Edge Vocabulary

Image Edge Vocabulary
It is derived from the bag-of-words model in text categorization and the bag-of-visual-words model in visual categorization.

A visual word corresponds to a cluster center, and the vocabulary consists of a set of cluster centers.

The bag-of-visual-words idea lets us efficiently capture the significant difference between PIM and PRCG in the statistical distribution of the geometric structure of local edge patches.


Representation of 3×3 Edge Patches
Convert color to grayscale.
Detect edges.
Define neighborhood blocks around edge points.
Randomly select 1000 3×3 local patches.
Each patch is regarded as a 9-tuple of real numbers (log of gray values), i.e. a vector in $\mathbb{R}^9$.
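The patch-sampling steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the slides do not name the edge detector, so a central-difference gradient threshold stands in for it, and `sample_edge_patches`, `grad_thresh` and the `log(1 + g)` offset are hypothetical choices rather than the authors' settings.

```python
import math
import random

def sample_edge_patches(gray, n_patches=1000, grad_thresh=0.1, seed=0):
    """Return up to n_patches 3x3 log-intensity patches centred on edge pixels.

    gray: 2-D list of gray values in [0, 255].
    The edge test (gradient magnitude above grad_thresh) is an assumption;
    the source does not say which edge detector is used.
    """
    h, w = len(gray), len(gray[0])
    # log(1 + g) avoids log(0); the +1 offset is an assumption.
    logim = [[math.log(1.0 + g) for g in row] for row in gray]
    centres = []
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = (logim[r][c + 1] - logim[r][c - 1]) / 2.0
            gy = (logim[r + 1][c] - logim[r - 1][c]) / 2.0
            if math.hypot(gx, gy) > grad_thresh:
                centres.append((r, c))
    # Keep a random subset when there are more edge points than requested.
    if len(centres) > n_patches:
        centres = random.Random(seed).sample(centres, n_patches)
    # Flatten each 3x3 neighbourhood row by row into a 9-vector.
    return [[logim[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            for r, c in centres]

# Toy image: dark left half, bright right half, so edges lie on the boundary.
toy = [[10] * 4 + [200] * 4 for _ in range(6)]
patches = sample_edge_patches(toy, n_patches=5)
```

Images with fewer edge points than `n_patches` simply yield fewer patches.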


Data preprocessing
Define the contrast $\|x\|_D$ (D-norm):

$$\|x\|_D = \sqrt{\sum_{i \sim j} (x_i - x_j)^2}$$

where $i \sim j$ means that pixels $i$ and $j$ are 4-connected neighbors.
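As a minimal sketch of the D-norm, the snippet below evaluates the sum over 4-connected neighbor pairs for a 3×3 patch stored row by row. Counting each unordered pair once is an assumption about how the sum over $i \sim j$ is meant; `NEIGHBOURS` and `d_norm` are illustrative names.

```python
import math

# Pairs (i, j) of 4-connected neighbours in a 3x3 patch flattened row by row.
# Each unordered pair is listed once (an assumption about the sum over i ~ j).
NEIGHBOURS = (
    [(3 * r + c, 3 * r + c + 1) for r in range(3) for c in range(2)]    # horizontal
    + [(3 * r + c, 3 * r + c + 3) for r in range(2) for c in range(3)]  # vertical
)

def d_norm(x):
    """Contrast (D-norm) of a 9-element patch vector."""
    return math.sqrt(sum((x[i] - x[j]) ** 2 for i, j in NEIGHBOURS))

flat = [0.5] * 9                      # no contrast at all
step = [0, 0, 1, 0, 0, 1, 0, 0, 1]    # vertical step edge
flat_contrast = d_norm(flat)
step_contrast = d_norm(step)
```

The flat patch has zero contrast, while the step edge has contrast √3 because exactly three neighbor pairs straddle the edge.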


Subtracting the mean and normalizing the contrast lead to a new vector:

$$y = \frac{\tilde{x}}{\|\tilde{x}\|_D}, \qquad \text{where } \tilde{x} = x - \frac{1}{9}\sum_{i=1}^{9} x_i$$
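A small sketch of this normalization step follows. Returning `None` for a patch with zero contrast is an added safeguard, not something stated in the slides, and the D-norm helper again assumes each neighbor pair is counted once.

```python
import math

# 4-connected neighbour pairs of a row-major 3x3 patch, each pair once.
NEIGHBOURS = (
    [(3 * r + c, 3 * r + c + 1) for r in range(3) for c in range(2)]
    + [(3 * r + c, 3 * r + c + 3) for r in range(2) for c in range(3)]
)

def d_norm(x):
    return math.sqrt(sum((x[i] - x[j]) ** 2 for i, j in NEIGHBOURS))

def normalise_patch(x, eps=1e-12):
    """Return y = (x - mean) / ||x - mean||_D, or None for a flat patch."""
    mean = sum(x) / len(x)
    centred = [xi - mean for xi in x]
    contrast = d_norm(centred)
    if contrast < eps:
        return None  # zero contrast: the normalisation is undefined
    return [ci / contrast for ci in centred]

y = normalise_patch([0, 0, 1, 0, 0, 1, 0, 0, 1])
```

After this step every retained patch has zero mean and unit D-norm, so only its geometric structure remains.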


Change basis to the 2-dimensional Discrete Cosine Transform (DCT) basis corresponding to the image patches:

$$v = A y, \qquad A = \operatorname{diag}\!\left(\frac{1}{\|e_1\|^2}, \frac{1}{\|e_2\|^2}, \ldots, \frac{1}{\|e_8\|^2}\right) B^T$$

where $B = [e_1, \ldots, e_8]$ and $e_1, \ldots, e_8$ are the contrast-normalized non-constant DCT basis vectors.
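The change of basis can be sketched as follows, assuming the basis vectors are the eight non-constant 3×3 DCT-II patterns, each scaled to unit D-norm, and that coordinate $i$ of $v$ is the Euclidean projection of $y$ onto $e_i$ divided by $\|e_i\|^2$. The function names are illustrative.

```python
import math

# 4-connected neighbour pairs of a row-major 3x3 patch, each pair once.
NEIGHBOURS = (
    [(3 * r + c, 3 * r + c + 1) for r in range(3) for c in range(2)]
    + [(3 * r + c, 3 * r + c + 3) for r in range(2) for c in range(3)]
)

def d_norm(x):
    return math.sqrt(sum((x[i] - x[j]) ** 2 for i, j in NEIGHBOURS))

def dct_basis():
    """Eight non-constant 3x3 DCT-II basis patches, scaled to unit D-norm."""
    basis = []
    for u in range(3):
        for w in range(3):
            if u == 0 and w == 0:
                continue  # skip the constant pattern
            e = [math.cos(math.pi * (2 * r + 1) * u / 6)
                 * math.cos(math.pi * (2 * c + 1) * w / 6)
                 for r in range(3) for c in range(3)]
            n = d_norm(e)
            basis.append([t / n for t in e])
    return basis

def to_sphere(y, basis):
    """v_i = <e_i, y> / ||e_i||^2 (Euclidean inner product and norm)."""
    return [sum(a * b for a, b in zip(e, y)) / sum(a * a for a in e)
            for e in basis]

# A normalised vertical step edge: zero mean and unit D-norm.
s = math.sqrt(3)
y = [t / s for t in [-1 / 3, -1 / 3, 2 / 3] * 3]
v = to_sphere(y, dct_basis())
```

For a zero-mean, unit-contrast `y`, the resulting `v` has Euclidean length 1, because the DCT patterns are mutually orthogonal and diagonalize the quadratic form behind the D-norm.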


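$v$ lies on the 7-sphere in 8-D Euclidean space:

$$S^7 = \{\, v \in \mathbb{R}^8 : \|v\| = 1 \,\}$$

The zero-mean constraint $\sum_{i=1}^{9} y_i = 0$ is already absorbed by the change of basis, which discards the constant DCT component.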


Calculate the angular distance between two points:

$$d(x_1, x_2) = 1 - \cos\langle x_1, x_2 \rangle = 1 - \frac{x_1 \cdot x_2}{\|x_1\|\,\|x_2\|}$$
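For reference, the distance is a one-liner in code. This sketch takes plain Python lists as vectors, and `angular_distance` is an illustrative name.

```python
import math

def angular_distance(x1, x2):
    """d(x1, x2) = 1 - cos<x1, x2>: 0 for equal directions, 2 for opposite ones."""
    dot = sum(a * b for a, b in zip(x1, x2))
    n1 = math.sqrt(sum(a * a for a in x1))
    n2 = math.sqrt(sum(b * b for b in x2))
    return 1.0 - dot / (n1 * n2)

same = angular_distance([1.0, 0.0], [2.0, 0.0])
orthogonal = angular_distance([1.0, 0.0], [0.0, 1.0])
opposite = angular_distance([1.0, 0.0], [-1.0, 0.0])
```

On the unit sphere the denominator equals 1, so minimizing this distance is the same as maximizing the dot product.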


Construction of Visual Vocabulary
The 7-sphere is covered efficiently by 17,520 Voronoi cells of roughly the same size, whose sampling points form the set

$$O = \{o_1, o_2, \ldots, o_N\}, \qquad N = 17520$$

where $o_i$ is the sampling point of the $i$-th cell and is an 8-D vector.


The problem of observing the distribution of data points on the 7-sphere is thus converted into computing the probabilities that data points fall into the corresponding Voronoi cells.

We can then use histograms with 17,520 bins to describe the probability distributions of the geometric structure of edge patches for PIM and PRCG, respectively.
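The histogram over Voronoi cells can be sketched as a nearest-sampling-point count. The actual 17,520-point set is not given in the slides, so the 16 points ±e_k in 8-D below are a purely hypothetical stand-in, as are the function names.

```python
def nearest_cell(v, sampling_points):
    """Index of the sampling point closest to the unit vector v.

    On the unit sphere the nearest point has the largest dot product.
    """
    return max(range(len(sampling_points)),
               key=lambda k: sum(a * b for a, b in zip(v, sampling_points[k])))

def cell_probabilities(points, sampling_points):
    """Fraction of data points falling into each Voronoi cell."""
    counts = [0] * len(sampling_points)
    for v in points:
        counts[nearest_cell(v, sampling_points)] += 1
    total = len(points)
    return [c / total for c in counts] if total else [0.0] * len(counts)

# Hypothetical stand-in for the real sampling set: +e_k and -e_k for k = 0..7.
toy_points = []
for k in range(8):
    for sign in (1.0, -1.0):
        p = [0.0] * 8
        p[k] = sign
        toy_points.append(p)

data = [
    [0.8, 0.0, 0.0, 0.0, 0.0, 0.6, 0.0, 0.0],   # closest to +e_0
    [0.8, 0.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # closest to +e_0
    [0.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # equal to -e_1
]
probs = cell_probabilities(data, toy_points)
```

With the real sampling set, one such histogram would be accumulated for PIM patches and another for PRCG patches.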


Figure: Probabilities of edge patches in Voronoi cells, sorted by decreasing probability: (left) 200K edge patches from 200 PIM images; (right) 98,599 edge patches from 100 CG images collected by ourselves. Both panels: x-axis "Voronoi cells (sorted)", 0 to 18,000; y-axis "Probability".


We find that a small subset of the Voronoi cells accounts for the majority of the patches, for both PIM and PRCG.

We treat the sampling points of the Voronoi cells with higher probability as key sampling points.

We construct the image edge vocabulary from these key sampling points.
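One way to read the selection of key sampling points is as a probability threshold on the per-cell histograms. The sketch below takes that reading. Taking the union of the key cells of both classes is an assumption, and the probabilities are made-up toy values, not results from the slides.

```python
def key_cells(probabilities, threshold):
    """Indices of Voronoi cells whose probability reaches the threshold."""
    return [k for k, p in enumerate(probabilities) if p >= threshold]

def build_vocabulary(pim_probs, prcg_probs, threshold):
    """Cells that are key cells for PIM or for PRCG (the union is an assumption)."""
    return sorted(set(key_cells(pim_probs, threshold))
                  | set(key_cells(prcg_probs, threshold)))

# Toy per-cell probabilities over five cells (illustrative values only).
pim_probs = [0.40, 0.30, 0.20, 0.05, 0.05]
prcg_probs = [0.50, 0.05, 0.05, 0.35, 0.05]

large_vocab = build_vocabulary(pim_probs, prcg_probs, threshold=0.10)
small_vocab = build_vocabulary(pim_probs, prcg_probs, threshold=0.33)
```

Raising the threshold shrinks the vocabulary, which is the knob behind the different vocabulary sizes compared in the experiments.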


Image Classification

Experimental Results

We evaluate the proposed method on the remaining 800 PIM and 800 PRCG images from our data sets (1,600 images in total), using 10-fold cross-validation.
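For readers unfamiliar with the protocol, a 10-fold split of 1,600 samples can be sketched as below. The slides do not say whether the folds were stratified by class or how they were shuffled, so this plain shuffled split is only an assumption, and `k_fold_splits` is an illustrative name.

```python
import random

def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for k shuffled, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_splits(1600, k=10))
```

Each image is tested exactly once, on a classifier trained with the other 1,440 images, and the reported accuracy is the average over the ten folds.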


Image Classification
We determine the vocabulary size according to different probability thresholds.
Table: Classification accuracy for different vocabulary sizes

As a performance-cost tradeoff

On the same datasets, the results of Farid’s method


Generalization Capability
To verify the generalization capability, we remove all 50 images taken by the Samsung camera from the PIM samples and train the classifier on the remaining 750 PIM images and the 800 Columbia PRCG images.

With the proposed method only one of these images is misclassified (classification accuracy 98%), whereas Farid's method fails to classify any of them correctly.

This result suggests that the visual vocabulary based on local edge patches better characterizes the general properties of a particular image source.

More experiments are needed to confirm this.


Compression Attack
The 1,600 images are compressed with JPEG quality factors of 90, 70, and 40, respectively.
Table: Comparative experimental results of the proposed approach and Farid's ([5]) on data sets with different JPEG quality factors.

Conclusions and Future Work
Conclusions

We have proposed a new approach to PIM and PRCG classification based on the idea of bag-of-visual-words.

By projecting the image patch data onto a 7-dimensional sphere through a series of transforms, we observe the distribution of data points over the individual Voronoi cells.

The visual vocabulary is then constructed by determining the key sampling points of the Voronoi cells.

A given image is then represented as a histogram of visual words.

Finally, we employ an SVM classifier.
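The histogram-of-visual-words representation summarized above can be sketched as follows. Assigning every patch to its nearest visual word, rather than discarding patches that fall outside the key cells, is an assumption. The three-word vocabulary and the helper names are hypothetical, and the SVM stage is omitted.

```python
def axis(k, sign=1.0, dim=8):
    """Unit vector sign * e_k, used here only to build toy data."""
    p = [0.0] * dim
    p[k] = sign
    return p

def encode_image(patch_vectors, vocabulary):
    """Normalised histogram of nearest visual words for one image."""
    hist = [0.0] * len(vocabulary)
    for v in patch_vectors:
        # For unit vectors, the nearest word has the largest dot product.
        best = max(range(len(vocabulary)),
                   key=lambda k: sum(a * b for a, b in zip(v, vocabulary[k])))
        hist[best] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Hypothetical three-word vocabulary and four patch vectors from one image.
vocabulary = [axis(0), axis(1), axis(0, -1.0)]
image_patches = [
    [0.8, 0.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # nearest to word 0
    [0.6, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # nearest to word 1
    axis(0, -1.0),                              # equal to word 2
    [0.8, 0.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # nearest to word 0
]
features = encode_image(image_patches, vocabulary)
```

The resulting fixed-length vector is what a classifier such as an SVM would take as input for each image.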

Our experimental results demonstrate the discriminative power of the features.

They indicate that the intrinsic difference between PIM and PRCG may be captured by the geometric structure of local edge patches.

Our conclusions are of great significance for digital image forensics as well as for photorealism evaluation in computer graphics.

Future work

To improve the proposed method, we are considering a closer analogy to document retrieval.

We wish to make sense of the visual vocabulary set.

We have made initial attempts to evaluate the generalization capability and the resistance to compression of the proposed approach. More experiments are being done on a wider range of images.

Thank you
