CLASSIFICATION OF DIGITAL CAMERAS BASED ON DEMOSAICING ARTIFACTS
Sevinc Bayram (a), Husrev T. Sencar (b), Nasir Memon (b)
(a) Dept. of Electrical and Computer Eng., Polytechnic University, Brooklyn, NY, USA
(b) Dept. of Computer and Information Sci., Polytechnic University, Brooklyn, NY, USA
E-mail: [email protected], [email protected], [email protected]
ABSTRACT
In this paper, we consider using traces of the demosaicing operation in digital cameras to identify the
source digital camera-model of an image. For this purpose, we employ two methods and define a
set of image characteristics. The first method detects simple forms of demosaicing based on
extracting periodicity features, and the second method tries to estimate the demosaicing
parameters assuming a linear model. The features obtained from the two methods are used to
design classifiers to distinguish digital cameras. To determine the reliability of the designated
image features in differentiating the source camera-model, we consider both images taken under
similar settings at fixed sceneries and images taken under independent conditions. We provide
experimental results and show reasonable accuracy on identifying source among two, three, four
and five camera models.
1. INTRODUCTION
In the analog world, an image (a photograph) has generally been accepted as a proof of
occurrence of the depicted event. In today's digital age, sophisticated, low-cost hardware
and software tools that enable easy creation, distribution and modification of digital images are
widely available. This trend has brought with it new challenges
concerning the integrity and authenticity of digital images. As a consequence, we can no longer
take the authenticity of images, analog or digital, for granted. This is especially true when it
comes to legal photographic evidence. Image forensics, in this context, is concerned with
determining some underlying fact about an image. For example, image forensics is the body of
techniques that attempt to provide authoritative answers to questions such as:
• Is this image an original image or was it created by cut and paste operations from
different images?
• Was this image captured by a camera manufactured by vendor X or vendor Y?
• Did this image originate from camera X as claimed? At time Y? At location Z?
• Does this image truly represent the original scene or was it digitally tampered to deceive
the viewer? For example, was this coffee stain actually a blood stain that was re-colored?
• Does this image show a real scenery or is it generated by a computer?
• Was this image manipulated to embed a secret message? That is, is this image a stego-
image or a cover-image?
The above questions are just a few examples of issues faced routinely in investigations by law
enforcement agencies.
One way to provide image authenticity is by the use of digital watermark techniques which
essentially requires that a watermark be embedded during the creation of the image [5].
Motivated by this concern, some manufacturers today offer digital cameras with watermarking
capabilities. For example, Epson offers an image authentication system (IAS). The IAS
software, when uploaded to the camera, enables adding invisible watermarks to images to detect
tampering [19]. Similarly, the Kodak DC-290 digital camera has built-in watermarking
capabilities. In [17], the authors further extended these approaches to enable
creating a link with the photographer by also embedding photographer’s iris image (obtained
through the viewfinder). The embedded data can be later extracted to verify the image integrity,
establish the image origin and verify the image authenticity (identify the camera and the
photographer). On the other hand, it is a fact that the overwhelming majority of images captured
today do not contain a digital watermark. Therefore, watermarking cannot be offered as a general
solution to the problem of image forensics. Consequently, to determine origin, veracity and
nature of digital images, alternative approaches need to be considered. The setting of this
problem is further complicated by the requirement that the methods rely on as little prior
knowledge as possible about the digital camera and the actual conditions under which the image
was captured. At the present time, there is a severe lack of techniques that could achieve
these goals.
The past few years have seen a growth of research on image forensics. The work has focused
mainly on two types of problems. The first is the source identification problem where the aim is
to determine through what means a given image was generated (e.g., digital-camera, camcorder,
computer graphics, scanner, etc.) and then associating the image with a class of sources that have
common characteristics [14] or matching it to a specific source [1,2,3,4,16]. The second problem
is determining whether a given image has undergone any form of modification or processing after
it was initially captured [13,14,15]. In this paper, we focus on the source camera-model identification
problem. That is, given an image, can we determine the model of the digital camera that was used
to capture it?
The underlying assumption for the success of our approach is that all images produced by a
digital camera will exhibit certain characteristics regardless of the captured scene, which are
unique to that camera, due to its proprietary image formation pipeline. (It should be noted that all
digital cameras encode the camera model, type, date, time, and compression information in the
EXIF image header; however, it is not possible to authenticate this information.) For this purpose,
we rely on detecting and classifying effects of a common processing technique in digital cameras,
namely demosaicing, which is central to the operation of most digital cameras. Due to its proprietary
design and implementation, the demosaicing operation provides a unique opportunity to determine
the source camera-model of a given digital image.
In our initial approach to the source camera-model identification problem [1], we utilized a set of
image features based on the premise that every digital camera will exhibit certain characteristics
regardless of the image content. These characteristics include various color processing features,
image quality metrics [6], and higher-order statistics of images [7]. This approach is essentially
based on construction of a classifier that is able to capture the variations in the designated image
features among various digital camera-models. In [2], we exploited the fact that most state-of-the-art
digital cameras employ a single color filter array (CFA) rather than having different filters for
each color band. Our approach in [2] was inspired by the technique proposed by Popescu et al.
which is intended for image tamper detection [8]. The rationale for their technique is that the
process of image tampering very often requires up-sampling operation which in turn introduces
certain artifacts that can be detected by the use of statistical measures. In a similar manner, we
have applied variants of such measures to characterize the specifics of the deployed interpolation
algorithm. Later, in [3], we improved our approach based on the premise that most proprietary
algorithms deploy simpler forms of interpolation in smooth image parts, and that these can
therefore be captured more effectively (as opposed to busy image parts, where interpolation requires
more careful processing). For this purpose, we utilize the results of [9], where the characteristics
of the second order derivative of interpolated signals are analyzed. In this paper, the features
considered in [2] and [3] are combined and tested with image-sets acquired under both similar
circumstances and independently. Results obtained for up to five digital camera-models show the
success of the proposed approach.
The rest of this paper is organized as follows: In Section 2, we briefly describe the image
formation process in digital cameras. Details of our approach to source camera-model
identification by detecting traces of demosaicing operation are provided in Section 3. We present
our experimental results in Section 4 and conclude in Section 5.
2. DEMOSAICING AND IMAGE FORMATION IN DIGITAL CAMERAS
The basic architecture and sequence of processing steps remain very similar in all digital cameras
(despite the proprietary nature of the underlying technology). The basic structure of image
formation pipeline in digital cameras can be illustrated as given in Figure 1. In a digital camera,
the light entering the camera through the lens is first filtered (the most important being an anti-
aliasing filter) and focused onto an array of charge-coupled device (CCD) elements, i.e., pixels.
The CCD array is the main and most expensive component of a digital camera. Each light sensing
element of CCD array integrates the incident light over the whole spectrum and obtains an
electric signal representation of the scenery. Since each CCD element is essentially
monochromatic, capturing color images requires separate CCD arrays for each color component.
However, due to cost considerations, most digital cameras use only a single CCD array, arranged
behind a pattern in which each element has a different spectral filter, typically one of red,
green or blue (RGB). This mask sits in front of the sensor and is called the color filter array (CFA).
Consequently, each CCD element only senses one band of wave-lengths, and the raw image
collected from the array is a mosaic of red, green and blue pixels. Since the human visual system is
more sensitive to green light, CFA patterns typically have more green values than red and blue.
Figure-2 displays two common CFA patterns used in RGB and CMYK color spaces.
Figure 1. The more important stages of camera pipeline
Figure 2. (a) CFA pattern using RGB values. (b) CFA pattern using CMYK values.
As a result of the use of a CFA, each pixel in the image has only one color component associated
with it. The missing RGB values are calculated from the neighboring pixel values by an
operation called demosaicing. Demosaicing is essentially a form of interpolation that uses color
information from neighboring pixels to obtain the color value of a pixel. This is carried out by
properly weighting and combining the color values of pixels selected within a window around the
missing value. In practice, the interpolation operation is defined by the size of the window and how
the weighting coefficients are determined (the interpolation kernel). Although each manufacturer
uses proprietary methods, i.e., kernels with different sizes, shapes and different interpolation
algorithms, demosaicing techniques can be grouped into two classes. The first class includes well-known
interpolation techniques such as nearest neighbor, bilinear and bicubic interpolation.
These techniques treat the color channels as three independent images and simply use
neighboring pixels with different interpolation kernels to calculate the missing color component. In
smooth regions of an image, these single-channel algorithms can provide satisfactory results, but
they usually fail in high-frequency regions, especially along edges, where they leave heavy
interpolation artifacts. Better performance can be achieved using the fact that the three color channels
are highly correlated to each other. The second class of algorithms uses these inter-channel
correlations in addition to in-channel interpolation, e.g., edge-directed interpolation,
constant-hue-based interpolation, second-order gradients, alias-canceling interpolation,
homogeneity-directed interpolation, pattern matching, vector-based interpolation, Fourier-based filtering, etc. A
survey of these second-class algorithms can be found in [11].
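To make the single-channel (first-class) interpolation concrete, the following sketch simulates a Bayer mosaic and fills in the green channel by averaging the four nearest green samples. This is an illustrative Python/NumPy toy, not any camera's actual firmware; the RGGB layout and function names are assumptions for the example.

```python
import numpy as np

def bayer_mosaic(rgb):
    """Sample an RGB image through a Bayer CFA (RGGB layout assumed)."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w))
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # red sites
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # green sites
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # green sites
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # blue sites
    return mosaic

def bilinear_green(mosaic):
    """Estimate green at red/blue sites by averaging the 4 nearest greens."""
    h, w = mosaic.shape
    gmask = np.zeros((h, w), dtype=bool)
    gmask[0::2, 1::2] = True
    gmask[1::2, 0::2] = True
    green = np.where(gmask, mosaic, 0.0)
    pad = np.pad(green, 1)  # zero padding; borders are therefore approximate
    for i in range(h):
        for j in range(w):
            if not gmask[i, j]:  # red or blue site: all 4 neighbours are green
                green[i, j] = (pad[i, j+1] + pad[i+2, j+1] +
                               pad[i+1, j] + pad[i+1, j+2]) / 4.0
    return green
```

In a flat region, this reconstruction is exact, which is precisely why smooth image parts expose the interpolation so cleanly.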
Following demosaicing, a white balancing operation is performed; this is basically the process of
removing unrealistic color casts, so that objects which appear white to the human visual system are
rendered white in the digital image. This is followed by colorimetric interpretation and gamma
correction. Gamma correction is necessary to redistribute tonal information to correspond more
closely to the way the human eye perceives brightness, as digital cameras respond to light
linearly whereas the human visual system perceives light logarithmically. After these steps, noise
reduction, anti-aliasing and sharpening are performed to avoid color artifacts. Finally, the
image is compressed and saved to memory.
Despite the similarity in their architectures, the processing details at all stages vary widely from
one manufacturer to another, and even among different camera models produced by the same
manufacturer. It should also be noted that many components in the image formation pipeline of
various digital cameras (e.g., lens, optical filters, CCD array) are produced by a limited number
of manufacturers. Therefore, due to this overlap, different cameras may exhibit similar qualities,
and this should be taken into consideration in associating image features with the properties of
digital cameras. However, interpolation (demosaicing) algorithm and the design of the CFA
pattern remain proprietary to each digital camera model, and the variations in the interpolated
pixel values can be exploited to classify the images as originating from a certain class of digital
cameras.
3. IDENTIFYING TRACES OF INTERPOLATION
Images taken by digital cameras with a single CCD sensor are evidently interpolated using the
CFA and the designated demosaicing algorithms. These algorithms are fine-tuned to prevent
visual artifacts, in the form of over-smoothed edges or poor color transitions, in busy parts of the
images. Therefore, they are highly non-linear, and their behavior depends strongly on the nature of the
depicted scenery. On the other hand, in relatively smooth parts of the image, these algorithms
exhibit a rather linear characteristic. As a result, in such regions the interpolation operation is less likely to depend on
the nature of the captured content. Thus, in our analysis we primarily focus on image parts that do
not exhibit significant spatial variation.
The interpolation algorithm used to estimate the missing color values introduces correlations
between neighboring pixels. As CFA patterns are typically periodic, these correlations will also be
periodic. For example, consider the Bayer pattern shown in Figure 2-a. To calculate the missing
green values at the red and blue positions, one can simply use bilinear interpolation, i.e.,
averaging the four nearest green neighbors. In this case, one out of two pixels will be correlated to its
neighboring pixels with the same weights of 0.25, and the interpolation kernel will be 3x3. As
another example, assume the interpolation kernel is 7x7. For calculating the green
values one could use the following weighting coefficients: 0.22 for the 4 nearest neighbors
and 0.01 for the other 12 neighbors, or 0.19 for the nearest ones
and 0.02 for the others. In our approach, we start by assuming that each pixel value is
correlated to its neighbors with the associated weighting coefficients and each camera
manufacturer uses different interpolation kernel and/or different weighting coefficients. The crux
of our approach lies on estimating these coefficients and associating them with the digital
camera-model used to capture the images.
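The periodic nature of these correlations can be seen in a toy example (illustrative Python/NumPy, unrelated to any camera's code): if every pixel on one checkerboard lattice is produced by 0.25-weighted averaging of its four neighbours, the residual of that hypothesised kernel vanishes exactly on the interpolated lattice and nowhere else, exposing the periodicity.

```python
import numpy as np

rng = np.random.default_rng(0)
chan = rng.uniform(0.0, 255.0, (8, 8))

# interpolate one checkerboard lattice (interior only) from its 4 neighbours;
# each interpolated site's neighbours lie on the other lattice, so they keep
# their original values
interp = np.zeros((8, 8), dtype=bool)
for i in range(1, 7):
    for j in range(1, 7):
        if (i + j) % 2 == 0:
            chan[i, j] = 0.25 * (chan[i-1, j] + chan[i+1, j] +
                                 chan[i, j-1] + chan[i, j+1])
            interp[i, j] = True

# residual of the hypothesised kernel p(i,j) = 0.25 * (4-neighbour sum)
pad = np.pad(chan, 1, mode='edge')
pred = 0.25 * (pad[:-2, 1:-1] + pad[2:, 1:-1] +
               pad[1:-1, :-2] + pad[1:-1, 2:])
residual = np.abs(chan - pred)
```

The zero-residual sites form a periodic lattice; detecting that lattice, and the weights that produce it, is exactly what the two methods below attempt.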
To detect and classify the traces of the interpolation operation in images, we rely on two methods.
The first method is based on analyzing inter-pixel differences [9]. The second method is based on
the use of Expectation-Maximization algorithm which analyzes the correlation of each pixel
value to its neighbors [8]. Among the two methods, the former is better suited for extremely
smooth parts of the image where a simpler form of interpolation is deployed. On the other hand,
the latter can also be applied to less smooth parts of the image where the variation is not
significant. In our analysis, we partition the image into blocks based on their level of smoothness,
decided using the relative deviation of the variance in each block.
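The block-selection step might be sketched as follows (assumed Python/NumPy; the 100x100 block size matches Section 4, but the relative-deviation threshold is an invented placeholder, since the paper does not give its value):

```python
import numpy as np

def low_activity_blocks(image, block=100, thresh=0.05):
    """Return top-left corners of blocks whose relative deviation of
    intensity (std/mean) falls below `thresh`, i.e. the smooth blocks
    from which features are extracted.  The threshold value here is a
    placeholder, not taken from the paper."""
    h, w = image.shape
    corners = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            b = image[i:i + block, j:j + block]
            rel_dev = b.std() / (b.mean() + 1e-9)  # guard against mean ~ 0
            if rel_dev < thresh:
                corners.append((i, j))
    return corners
```

Dividing by the mean makes the smoothness measure insensitive to overall brightness, which is one plausible reading of "relative deviation of the variance."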
3.1 Use of Second-Order Derivatives
Low-order interpolation introduces a periodicity in the variance of the second-order derivative of
an interpolated signal, which can be used to determine both the interpolation algorithm and the rate of
interpolation [9]. In this regard, the method first obtains the second-order derivative of each row and
averages it over all rows. To clarify, let r(i,:) denote a row of the image, and let R be the number of
rows and C the number of columns. The second-order derivative of each row can be
computed as follows:

    sd_r(i,j) = 2 r(i,j) - r(i,j+1) - r(i,j-1)    (3.1)

where 1 <= j < C - 1. After obtaining the second-order derivatives of each row, the magnitudes of
these rows are averaged together to form a pseudo-variance signal. If the image is interpolated,
this pseudo-variance signal exhibits a periodicity. When observed in the frequency domain, the
locations of the peaks of the variance signal reveal the interpolation rate, and the magnitudes of the
peaks indicate the interpolation method. We employed a similar methodology to characterize
the interpolation rate and the method employed by a digital camera.
It should be noted that most digital cameras encode and compress images in JPEG format. Due to
8x8 block coding, the DC coefficients may also introduce peaks in the second-order derivative
implying the presence of some form of interpolation operation at a rate of 8. Therefore, in
detecting the interpolation algorithm, the peaks due to JPEG compression have to be ignored. The
variation in peak magnitudes indicates differences in the deployed interpolation
algorithm. Therefore, the features extracted for each camera include the locations of the peaks
(except for the ones due to JPEG compression), their magnitudes, and the energy of each
frequency component with respect to the other frequency components in all color bands.
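A minimal sketch of this detector (an assumed Python/NumPy rendering of the procedure above, not the authors' code) computes the row-wise second-order derivative of Eq. 3.1, averages magnitudes into the pseudo-variance signal, and inspects its spectrum; the JPEG-induced rate-8 peak would be masked out in practice.

```python
import numpy as np

def pseudo_variance_spectrum(chan):
    """Second-order derivative of each row (Eq. 3.1), magnitudes averaged
    over rows into a pseudo-variance signal, returned together with its
    DFT magnitude.  Spectral peaks indicate the interpolation rate; the
    peak at rate 8 (from JPEG's 8x8 blocks) must be ignored in practice."""
    sd = 2.0 * chan[:, 1:-1] - chan[:, 2:] - chan[:, :-2]
    pv = np.mean(np.abs(sd), axis=0)
    spec = np.abs(np.fft.rfft(pv - pv.mean()))  # drop DC before the FFT
    return pv, spec

# toy check: interpolating every other column creates a rate-2 periodicity,
# so the spectrum should peak at the Nyquist bin
rng = np.random.default_rng(0)
chan = rng.uniform(0.0, 255.0, (64, 66))
for j in range(2, 64, 2):            # fill even columns by averaging odd ones
    chan[:, j] = 0.5 * (chan[:, j - 1] + chan[:, j + 1])
```

At interpolated columns the second derivative is exactly zero, so the pseudo-variance alternates with period 2 and its spectrum peaks at the highest frequency bin, which is the rate-2 signature described above.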
3.2 Use of Expectation-Maximization Algorithm
The Expectation/Maximization (EM) algorithm consists of two major steps: an expectation step,
followed by a maximization step [10]. The expectation is with respect to the unknown variables,
using the current estimate of the parameters, and conditioned upon the observations. The
maximization step then provides a new estimate of the parameters. These two steps are iterated
until convergence. The EM algorithm generates two outputs. One is a two-dimensional data
array, called the probability map, in which each entry indicates the similarity of the corresponding
image pixel to one of two groups of samples, namely, those correlated to their neighbors within a
selected kernel and those that are not. The other output is the estimate of the weighting (interpolation)
coefficients, which designate the amount of contribution from each pixel in the interpolation
kernel.
Since color filter arrays have more green values, the red and blue color channels are more
heavily interpolated. Therefore, we employed the EM algorithm only on the red channel; however, it can
be trivially extended to the other channels. Since no a priori information is assumed on the size of the
interpolation kernel (which designates the number of neighboring components used in estimating
the value of a missing color component), probability maps and weighting coefficients are obtained
for varying kernel sizes. Let p(i,j) denote the pixel value for a selected color channel. We
assume this pixel value is correlated to its neighbors by the equation:

    p(i,j) = Σ_{n,m = -N}^{N} β(n,m) p(i-n, j-m),  (n,m) ≠ (0,0)    (3.2)

In this equation, N determines the interpolation kernel size, and we denote the weighting coefficient
matrix by β. In the expectation step, we first assume a random β matrix, and using this matrix
we calculate the probability of each pixel being correlated to its neighbors. In the maximization step we
estimate new β parameters according to the probabilities calculated in the expectation step. We
execute these steps iteratively until β converges. When designing our classifiers, we use
these β parameters (weighting coefficients) as features.
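The procedure can be sketched along the lines of Popescu and Farid [8] (assumed Python/NumPy; the Gaussian residual width sigma and the flat outlier model are illustrative choices, and a real implementation would run per CFA lattice rather than over all pixels):

```python
import numpy as np

def em_interp_coeffs(chan, N=1, iters=20, sigma=5.0):
    """Estimate weighting coefficients beta in the linear model of Eq. 3.2
    over a (2N+1)x(2N+1) kernel, excluding the centre pixel.
    E-step: posterior probability that each pixel fits the linear model
    (Gaussian residual) rather than a flat outlier model (probability map).
    M-step: weighted least squares re-estimate of beta."""
    h, w = chan.shape
    offsets = [(n, m) for n in range(-N, N + 1)
                      for m in range(-N, N + 1) if (n, m) != (0, 0)]
    X = np.array([[chan[i - n, j - m] for (n, m) in offsets]
                  for i in range(N, h - N) for j in range(N, w - N)])
    y = np.array([chan[i, j]
                  for i in range(N, h - N) for j in range(N, w - N)])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # initial guess
    p_out = 1.0 / 256.0                           # flat outlier density
    for _ in range(iters):
        r = y - X @ beta
        p_in = np.exp(-r**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
        w_post = p_in / (p_in + p_out)            # E-step: probability map
        sw = np.sqrt(w_post)                      # M-step: weighted LS
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta, w_post
```

On a perfectly linear region the residuals vanish, so the probability map saturates near its maximum; on textured content the posterior drops, which is what separates interpolated from non-interpolated pixels.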
In our earlier work [2], we extracted features assuming interpolation kernels of sizes
3x3, 4x4 and 5x5, which were subsequently used to design classifiers for distinguishing
camera models. Experimental results showed that the accuracy improves with increasing
kernel size. Therefore, in this paper, we assume a 5x5 interpolation kernel; hence, the EM
algorithm is used to estimate the weighting coefficients in 5x5 neighborhoods. In [3], we also
showed that combining the features extracted using the second-order derivative method and the
EM algorithm improves performance. Based on this, in this work, instead of designing
classifiers for each set of features we design our classifiers on the joint feature set of 78 features
in total.
4. EXPERIMENTAL RESULTS
To test the effectiveness of the proposed features in identifying the source camera-model, we
performed two sets of experiments. First, we tested the success of the proposed method on
images obtained by subsequently photographing the same scenery under similar settings with
different camera-models. In the second set of experiments, we used images taken by digital
cameras with no prior constraint on the scenery and the acquisition settings. In our experiments,
we used a publicly available support vector machine implementation as a classifier [18]. To
reduce the computational complexity to a minimum, each image is portioned into blocks of
100x100 pixels and categorized into three levels based on the degree of variation in the blocks
and features are extracted only from low-activity blocks as described in Sections 3.1 and 3.2.
Also, the sequential forward floating search (SFSS) [20] algorithm is used to reduce the
dimensionality of the feature vector by selecting the most distinguishing features.
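The split-half evaluation protocol can be illustrated with a dependency-free stand-in (Python/NumPy). The paper uses the LIBSVM classifier [18] on 78-dimensional feature vectors; here a nearest-centroid rule on fabricated features stands in for the SVM purely to show the train/test protocol, and all feature values below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
# fabricated 78-dimensional feature vectors for two hypothetical camera models
feats = {"camA": rng.normal(0.0, 1.0, (200, 78)),
         "camB": rng.normal(1.5, 1.0, (200, 78))}
train = {c: f[:100] for c, f in feats.items()}   # half for training
test  = {c: f[100:] for c, f in feats.items()}   # unseen half for testing

# stand-in classifier (the paper uses an SVM [18]): nearest class centroid
centroids = {c: f.mean(axis=0) for c, f in train.items()}

def predict(x):
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

correct = sum(predict(x) == c for c, f in test.items() for x in f)
accuracy = correct / 200.0
```

The essential point is the disjoint split: the classifier never sees the test images during training, so the reported accuracies measure generalization rather than memorization.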
In the first part of the experiments, we considered 5 distinct camera models: Canon Powershot A80,
Datron DC4300, HP Photosmart 635, Kodak Easyshare LS420 and Sony DSC-P72 with image
sizes 1024x768, 1280x960, 1600x1200, 1752x1168, and 1280x960, respectively. For each
camera, we captured 200 images at default settings. As exemplified in Figure-5, the pictures were
taken from the same scene. This ensures that camera properties are not detected based on textural
attributes of the scenery. Half of the images were used for training in building the classifier; the
previously unseen half of the images was then used for testing the designed classifiers.
(a) Canon Powershot A80 (b) Datron DC4300 (c) HP Photosmart 635
(d) Kodak Easyshare LS420 (e) Sony Cybershot DSC-P72
Figure 5. Sample set of pictures taken by 5 camera models.
We first designed a classifier for discriminating 4 cameras: Canon, Datron, HP and Sony. The
accuracy is measured as 88%, and the corresponding confusion matrix is given in Table 1. We
also designed a classifier considering all 5 cameras. In this case, the detection accuracy is measured to
be 84.8%. The corresponding confusion matrix is provided in Table 2. We can conclude from
Table 2 that the Kodak and Canon camera models are likely to use similar interpolation algorithms
for demosaicing, as they are frequently misclassified as each other.
Table 1 - Confusion table for 4 cameras

                    Predicted
Actual     Canon  Datron   HP  Sony
Canon        84      7      2     6
Datron        4     87      1     3
HP            9      6     93     3
Sony          3      0      4    88
Table 2 - Confusion table for 5 cameras

                    Predicted
Actual     Canon  Datron   HP  Kodak  Sony
Canon        73      6      0    10     8
Datron        4     88      0     3     1
HP            0      0     96     4     1
Kodak        16      4      2    78     5
Sony          7      2      2     5    85
In the second part of the experiments, to evaluate the reliability of features in identification of
different camera-models, we collected images taken under independent circumstances. We
consider three cameras: Sony Cybershot DSC-P72, Sony Cybershot DSC-S90 and Canon
Powershot S1 IS, with image sizes 2304x1728, 2304x1728 and 2048x1536, respectively.
Similarly, we obtained 200 images for each camera and used half of them for training our
classifier and the other half for testing purposes. When classifiers are designed to distinguish
between all possible pairs, the accuracies of the classifiers are measured as 100% in all cases. In
the case of identifying the source camera-model among three cameras, the accuracy of the
classifier is measured to be 95.66% with confusion matrix as given in Table 3. We also provide
scatter plots in Figure-6 to show how well we can discriminate these cameras. Figures 6-a and 6-b
display the scatter of two features obtained from the Sony P72 & Sony S90 and the Sony S90 & Canon S1 pairs,
respectively. Similarly, Figure 6-c gives a scatter plot for three features. In all figures the
cameras separate into clusters very well.
Table 3 - Confusion table for 3 cameras (with different scenes)

                      Predicted
Actual       Sony P72  Sony S90  Canon S1
Sony P72        96         0         7
Sony S90         0        99         1
Canon S1         4         1        92
Figure-6. The scatter diagrams of features: (a) Sony P72 and Sony S90, (b) Sony S90 and Canon S1,
(c) Sony P72, Sony S90 and Canon S1.
5. DISCUSSION
In this work, we provide a methodology to differentiate digital camera-models based on
demosaicing artifacts. To determine the reliability of the proposed features in distinguishing
camera-models, we performed two sets of experiments. In the first set of experiments, we designed
our classifiers using images taken of the same scene under similar settings. In these
experiments, the accuracy results for identifying 4 and 5 cameras are measured to be 88% and
85%, respectively. These results indicate that, even when the scene and camera settings are the same,
demosaicing artifacts can be used to detect and classify the interpolation algorithms used by different
camera models. In the second set of experiments, we considered a separate set of images taken
under independent, unrelated conditions. In this case, when classifying
between 2 and 3 cameras we are able to achieve 100% and 96% accuracy, respectively. These
results show that this methodology can be extended to practical scenarios.
REFERENCES
[1] M. Kharrazi, H. T. Sencar and N. Memon, “Digital Camera Model Identification,” Proc. of
ICIP, 2004.
[2] S. Bayram, H. T. Sencar and N. Memon, “Source Camera Identification Based on CFA
Interpolation,” Proc. of IEEE ICIP, 2005.
[3] S. Bayram, H. T. Sencar, N. Memon, and I. Avcibas, “Improvements on source camera-
model identification based on CFA interpolation”, Proc. of WG 11.9 International Conference
on Digital Forensics, 2006
[4] J. Lukas, J. Fridrich, and M. Goljan, “Determining Digital Image Origin Using Sensor
Imperfections,” Proc. of IS&T SPIE, Vol 5680, 2005.
[5] Special Issue on Data Hiding, IEEE Transactions on Signal Processing, Vol. 41, No. 6, 2003.
[6] I. Avcibas, N. Memon and B. Sankur, “Steganalysis using Image Quality Metrics,” IEEE
Transactions on Image Processing , 2003.
[7] S. Lyu and H. Farid, “Detecting Hidden Messages Using Higher-Order Statistics and Support
Vector Machines,” Proc. of Information Hiding Workshop, 2002
[8] A. Popescu and H. Farid, “Exposing Digital Forgeries by Detecting Traces of Re-sampling,”
IEEE Transactions on Signal Processing, 2004.
[9] A. C. Gallagher, “Detection of Linear and Cubic Interpolation in JPEG Compressed
Images,” Proc. of CRV, 2005.
[10] T. Moon, “The Expectation Maximization Algorithm,” IEEE Signal Processing
Magazine, 1996.
[11] B. K. Gunturk, J. Glotzbach, Y. Altunbasak, R. W. Schafer and R. M. Mersereau,
"Demosaicking: Color filter array interpolation in single chip digital cameras," IEEE Signal
Process. Mag., no. 9, Sep. 2004.
[12] J. Adams, K. Parulski and K. Spaulding, “Color Processing in Digital Cameras,” IEEE
Micro, Vol. 18, No.6, 1998.
[13] S. Bayram, I. Avcibas, B. Sankur and N. Memon, “Image Manipulation Detection,”
Accepted for publication in Journal of Electronic Imaging, 2006.
[14] H. Farid and S. Lyu, “Higher-order Wavelet Statistics and their Application to Digital
Forensics,” IEEE Workshop on Statistical Analysis in Computer Vision, Madison, Wisconsin,
June 2003.
[15] J. Fridrich, D. Soukal and J. Lukas, “Detection of copy-move forgery in digital images”, In
Proceedings of DFRWS, 2003.
[16] O. Celiktutan, B. Sankur, I. Avcibas and N. Memon, “Source cell phone identification”,
Thirteenth International Conference on Advanced Computing & Communication - ADCOM
2005 of ACS, 2005.
[17] P. Blythe and J. Fridrich, "Secure Digital Camera" in Digital Forensic Research Workshop,
2004.
[18] C. Chang and C. Lin, “LIBSVM: A library for support vector machines,” 2001, Software
available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
[19] Software document is available at http://files.support.epson.com/pdf/ias___/ias___u1.pdf
[20] P. Pudil, F. J. Ferri, J. Novovicov and J. Kittler, “Floating search methods for feature
selection with nonmonotonic criterion functions”, In Proceedings of the 12th ICPR, Vol. 2, pp.
279-283, 1994.