IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 12, NO. 7, NOVEMBER 2010 605

Digital Cinema Watermarking for Estimating the Position of the Pirate
Min-Jeong Lee, Kyung-Su Kim, and Heung-Kyu Lee

Abstract—Many illegal copies of digital video productions for cinema release can be found on the Internet before their official release. During the illegal copying of cinema footage, composite geometric distortions commonly occur due to the angle of the camcorder relative to the screen. We propose a novel spread-spectrum video watermarking scheme that satisfies the requirements for protecting digital cinema. It enables the detector not only to extract the embedded message but also to estimate the position from which the camcorder recording was made. The proposed position estimating model (PEM) can identify the seat in a theater with a mean absolute error (MAE) of (33.84, 9.53, 50.38) cm. Experimental results using various types of films show that the presented method provides a mathematical model for detecting and investigating the position of the pirate.

Index Terms—Digital cinema, in-theater piracy, local auto-correlation function, video watermarking.

I. INTRODUCTION

MANY illegal copies of movies can be found on the Internet or in street markets before their official release. Over 90% of these copies were made by recording films with camcorders in movie theaters [1]. These illegal acts inflict a great loss on the motion picture industry. Many entertainment companies use copy protection technologies as countermeasures against illegal recording. Moreover, the digital cinema system, which uses digital technology to distribute and project motion pictures, has been introduced and has come into wide use. Digital Cinema Initiatives (DCI) defines a forensic marking system in its standards in order to protect copyrights [2]. Digital watermarking technology seems to match the requirements of both conventional film and digital cinema in terms of copyright protection.

According to the DCI specifications, the forensic mark data payload should contain the following information about movie playback: a time stamp and location information. This serves to warn the designated theater against camcorder piracy and to deter piracy. However, only identifying when and

Manuscript received September 30, 2009; revised March 02, 2010 and May 10, 2010; accepted July 10, 2010. Date of publication August 05, 2010; date of current version October 15, 2010. This work was supported by the NRL (National Research Lab) program through the National Research Foundation of Korea funded by the Ministry of Education, Science and Technology (No. R0A-2007-000-20023-0). The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Alex C. Kot.

M.-J. Lee and H.-K. Lee are with the Department of Computer Science, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Korea (e-mail: [email protected]).

K.-S. Kim is with the Network Security Research Team, KT Network R&D Laboratory, Daejeon 305-701, Korea.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMM.2010.2061221

Fig. 1. Scenario for pirate identification. (a) Illegal capturing of the watermarked movie in the theater. (b) Position estimation through watermark detection.

where the illegal recording happens is not sufficient for the original purpose of copyright protection and traitor tracing. It is better to identify the pirate or to limit the number of pirate suspects. A scenario for identifying the pirate, proposed in [3], is considered here. Fig. 1 describes the scenario in detail. The pirate records the movie during playback in the theater. In the movie, the information about when and where the movie is playing is embedded as a watermark by our proposed scheme. The pirate illegally puts the captured movie into circulation on the Internet. Then, the forensic marking system finds the illegally captured movie on the Internet and tries to find the pirate.

Our proposed detection scheme extracts the watermark and decodes the embedded message as the time and the theater. Another advantage of our scheme is that it can estimate the approximate position of the pirate. It then remains to match each seat with its corresponding person. The information extracted from the embedded watermark determines when and where the pirated copy was made, and it also helps to match the persons who illegally recorded the movie against the databases stored in electronic ticket offices or in the payment system. Finally, the number of piracy suspects is restricted, and this helps to find the pirate.

In order to realize this scenario, some assumptions must hold. First, the watermark embedding process has to be done in real time. Second, the embedded watermark must survive camcorder piracy. Illegal copies are made by recording the projected movie at various angles, depending on the location of the pirate. They suffer from composite geometric distortions including translation, rotation, scaling, and perspective projection. Therefore, the watermarking system should be robust to the arbitrary geometric distortions and signal processing attacks that accompany camcorder piracy.

1520-9210/$26.00 © 2010 IEEE


The main contributions of this paper are as follows. Our scheme is robust to arbitrary geometric distortions. In particular, robustness to projective distortion is one of the advantages of our scheme compared to conventional watermarking techniques for digital cinema [4]–[6]. The robustness to geometric distortions is achieved by estimating the distortions. The distortion estimation method can also be used for position estimation with the position estimating model (PEM); thus, correct estimation of the geometric distortion is essential.

The remainder of this paper is organized as follows. In Section II, previous watermarking schemes and position estimation techniques are reviewed. Section III introduces the watermark design and embedding procedure. Section IV describes the watermark detection procedure based on the local auto-correlation function (LACF) and the position estimating model. Experimental results are given in Section V. Section VI presents our conclusions.

II. RELATED WORK

A. Geometric Distortions Resilient Watermarking

Many watermarking schemes have been designed to resist geometric distortions in still images using such methods as invariant transforms [7], [8], image features [9], [10], template insertion [11], and periodic sequences [12]–[14]. These approaches are robust against rotation, scaling, and translation (RST) distortions, which are affine transforms. However, these image watermarking algorithms cannot be directly applied to video watermarking applications, because illegal copies suffer not only from affine transforms but also from arbitrary geometric distortions. The arbitrary geometric distortions can be characterized as composite perspective distortions.

Several papers have addressed watermarking for digital cinema. Leest et al. [4] proposed a video watermarking scheme that exploits the temporal axis to embed the watermark by changing the luminance value of each frame, thereby achieving robustness against geometric distortions. Since luminance changes between frames may cause a flickering effect, the luminance modulation has to be performed slowly and smoothly. Delannay et al. [5] investigated the restoration of geometrically distorted images caused by the camera acquisition angle. The compensation of the distortion required both unmodified and modified content. Lubin et al. [6] embedded the watermark into a low spatial-temporal frequency domain for invisible, robust, and secure watermarking. To determine the spatial-temporal regions of video sequences in the embedding procedure, a vision-model-based masking computation was employed. However, these papers did not consider the projective distortions caused by camcorder capture. Therefore, we focus on perspective projection to achieve robustness to camcorder piracy. Our previous work [15] presented a blind watermarking scheme for digital cinema using the LACF to resist projective transforms.

B. Position Estimation Techniques

There are only a few papers that focus on position estimation. Chupeau et al. [16] proposed a forensic tracking system without embedding any mark. Their scheme determined the camcorder

Fig. 2. Watermark embedding procedure.

viewing angle to the screen and derived the approximate position of the pirate in the theater using feature points. The estimation of the eight-parameter homographic model required both temporally synchronized source videos and captured videos. Nakashima et al. [3] proposed an audio watermarking system that finds the position of the pirate in the theater using a detection strength model. Their position estimation depends on the construction of the inside of the theater; that is, different numbers and locations of loudspeakers and microphones may decrease the accuracy of the position estimation. Our previous work [17] introduced a video watermarking scheme that estimates the position of the pirate using the LACF proposed in [15]. The estimation model of that work was designed for projections with only one vanishing point, such as vertical projection or horizontal projection. However, projective transforms with more than one vanishing point may occur during camcorder capture in practice. The estimation model of geometric distortion should therefore be designed to estimate composite projective transforms.

In this paper, we improve the geometric distortion estimation methods proposed in [15] and [17] to find out where the pirate was in the theater.

III. WATERMARK EMBEDDING

This section describes how the watermark is designed and embedded in the host video. The watermark pattern is generated and then inserted into the video frames in a spread-spectrum manner, taking the human visual system (HVS) into account. Fig. 2 shows the embedding procedure, which is designed to satisfy the requirements for digital cinema security [2].

In the presented scheme, the embedded watermark is used in three ways: 1) to estimate geometric distortions for recovering the watermark pattern, 2) to extract the embedded message from the recovered watermark, and 3) to find the position of the pirate. In order to accomplish these roles, the watermark pattern should be periodic. The periodicity is obtained by tiling the basic pattern [18]. The basic pattern for a periodic watermark, which follows a Gaussian distribution with zero mean and unit variance, is generated using a secret key and consists of a


2-D random sequence of size $(w/M) \times (h/N)$. Here $w$ and $h$ denote the width and the height of the host video, and $M$ and $N$ denote the number of repetitions in the horizontal and vertical directions, respectively. The 2-D basic pattern is then modulated to contain the bit payload (e.g., the serial number of the theater and the time stamp). The modulated basic pattern is repeated $M \times N$ times to obtain the periodicity. A periodic watermark pattern of size $w \times h$ is thus obtained, and the pattern is embedded using an additive spread-spectrum method with perceptual scaling. The watermark is embedded in a video frame as follows:

$f'(x, y) = f(x, y) + \alpha\,\lambda(x, y)\,p(x, y)$   (1)

Here, $f$ and $f'$ denote the original and watermarked frames, $p$ is the periodic watermark pattern, $\alpha$ is a global weighting factor, and $\lambda(x, y)$ is a local weighting factor for the pixel $(x, y)$ derived from the HVS. We employ an HVS function optimized for real-time embedding, which adopts eight compass operators as a local weighting function [19]. The compass operator measures gradients in a selected number of directions and can reduce computational costs by utilizing its separable property.
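As a rough illustration of the embedding pipeline above — modulating and tiling a basic pattern, then adding it with global and local weights — the following sketch uses simplified stand-ins: the sign-flip payload modulation and the gradient-magnitude mask replace the paper's modulation and eight-compass-operator HVS function, and `embed_frame` and all parameter choices are hypothetical.

```python
import numpy as np

def embed_frame(frame, basic, payload_bits, alpha=2.0):
    """Additive spread-spectrum embedding sketch: modulate the basic
    pattern with the payload, tile it M x N times, and add it to the
    frame with global weight alpha and a local HVS-like weight."""
    h, w = frame.shape
    bh, bw = basic.shape
    N, M = h // bh, w // bw
    # Simplified payload modulation: flip the sign of one column band
    # of the basic pattern per zero bit (a stand-in, not the paper's).
    mod = basic.copy()
    seg = bw // max(len(payload_bits), 1)
    for i, b in enumerate(payload_bits):
        if b == 0:
            mod[:, i * seg:(i + 1) * seg] *= -1
    pattern = np.tile(mod, (N, M))  # periodic watermark of size h x w
    # Crude local weight: embed more strongly in textured areas
    # (the paper uses eight compass operators instead).
    gy, gx = np.gradient(frame.astype(float))
    lam = 1.0 + np.abs(gx) + np.abs(gy)
    lam /= lam.max()
    return frame + alpha * lam * pattern
```

With a 64×64 frame and a 16×16 basic pattern, for instance, the pattern is tiled 4×4 times before weighting.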

IV. WATERMARK DETECTION

The entire detection process performs as follows: 1) estimate the watermark using a whitening filter, 2) find the geometric distortions on the estimated watermark pattern, 3) recover the watermark from the distortions, and 4) extract the embedded message. Fig. 3 describes the process of watermark detection.

A. Preprocessing

Because a blind detector is used, the embedded watermark is estimated by employing the Wiener filter [20] as a denoising filter. Subtracting the denoised frame from the captured frame, we obtain an approximate version of the embedded watermark pattern. The Wiener filter estimates the original signal from the watermarked frame:

$\hat{f}(x, y) = \mu(x, y) + \dfrac{\sigma^2(x, y) - \nu^2}{\sigma^2(x, y)}\,\big(f'(x, y) - \mu(x, y)\big)$   (2)

where $\mu(x, y)$ and $\sigma^2(x, y)$ are the local mean and local variance of the watermarked frame $f'$, respectively, and $\nu^2$ is the noise variance. Since the detector has no knowledge of the probability distribution of the noise, the average of the local variances is chosen as $\nu^2$. The estimated watermark yielded by the Wiener filter is given by

$\hat{p}(x, y) = f'(x, y) - \hat{f}(x, y)$   (3)

To enhance the energy of the estimated watermark, we sum the values of each pixel of the estimated watermark over the series of frames spanning $t$ seconds.
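The preprocessing step can be sketched as follows. This is a minimal NumPy approximation, assuming a circular box filter for the local statistics; `box_filter` and `estimate_watermark` are hypothetical helper names, and the paper's exact window size and border handling are not specified here.

```python
import numpy as np

def box_filter(a, win=3):
    """Circular box filter via FFT; a simplified stand-in for the
    local-neighborhood averaging used by the Wiener filter."""
    k = np.zeros_like(a, dtype=float)
    k[:win, :win] = 1.0 / (win * win)
    k = np.roll(k, (-(win // 2), -(win // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(a) * np.fft.fft2(k)))

def estimate_watermark(frames, win=3):
    """Per frame: denoise with the Wiener rule of eq. (2), take the
    residual of eq. (3) as the watermark estimate, and accumulate the
    residuals over the frame series. The window size is an assumption."""
    acc = np.zeros_like(np.asarray(frames[0], dtype=float))
    for f in frames:
        f = np.asarray(f, dtype=float)
        mu = box_filter(f, win)                        # local mean
        var = np.maximum(box_filter(f * f, win) - mu * mu, 0.0)
        nu2 = var.mean()                               # noise variance estimate
        gain = np.maximum(var - nu2, 0.0) / np.maximum(var, 1e-12)
        denoised = mu + gain * (f - mu)                # eq. (2)
        acc += f - denoised                            # eq. (3), accumulated
    return acc
```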

B. Geometric Distortion Estimation

As a result of camcorder capture, the shapes of the captured cinematic footage in the rectangular frames are generally quadrangles. The distances and the angles of the scenes are not preserved, and parallel lines do not project to parallel lines unless

Fig. 3. Watermark detection procedure.

they are parallel to the image plane. A rectangle is transformed into a quadrangle by perspective projection. Let $\mathbf{x} = (x_1, x_2, x_3)^T$ be the homogeneous vector that represents a point in the original frame and $\mathbf{x}' = (x_1', x_2', x_3')^T$ be the homogeneous vector that represents a point in the geometrically distorted frame. The projective transformation is a linear transformation on homogeneous 3-vectors represented by a nonsingular 3×3 matrix $H$ [21]:

$\mathbf{x}' = H\mathbf{x}$   (4)

Note that $H$ is a homogeneous matrix, so only the ratios of the matrix elements are significant in the homogeneous representation of a point. There are eight independent ratios among the nine elements of $H$, and it follows that a projective transformation has eight degrees of freedom (DOF). Thus, four pairs of point-to-point correspondences between the original and distorted frames are required to determine the eight DOF. In this paper, the four corner points of the video frame are employed as the pairs of point-to-point correspondences. The corner points of the original video frame are selected as the original points, and the distorted coordinates of the original corner points taken by the camcorder are chosen as the corresponding points. Given that both the embedder and the detector know the coordinates of the four original points, it is necessary to know the coordinates of only the four distorted points. When the movie is recorded by the camcorder, the corner points of the movie are not always visible in the pirated copy. Also, they may be removed from the frame by


Fig. 4. Example of LACF on the projected frame.

cropping after camcorder capture. Therefore, the coordinates of the distorted corner points should be obtained by computationally estimating the geometric distortion itself, not by directly finding the corner points in the frames of the copy.

1) Local Auto-Correlation Function (LACF): In our previous work [15], the local auto-correlation function (LACF) was employed for estimating projective transforms. It computes the auto-correlation function of two local areas of the image that are parallel to each other, instead of computing the auto-correlation of the whole image. The estimation model that uses two parallel areas for the LACF can estimate projective transforms with only one vanishing point. Since camcorder capture produces projections with more than one vanishing point, which convert a rectangle into a quadrangle, we employ an estimation model that simultaneously considers the LACF results of both two vertical local areas and two horizontal local areas.

As shown in Fig. 4, four local areas are needed for the LACF: two horizontally parallel local areas $H_1$ and $H_2$, and two vertically parallel local areas $V_1$ and $V_2$. The width and the height of the video frame are denoted as $w$ and $h$, respectively. The LACF on the estimated watermark pattern is modeled as

$\mathrm{LACF}_R(\tau_x, \tau_y) = \sum_{(x, y) \in R} \hat{p}(x, y)\,\hat{p}(x + \tau_x, y + \tau_y)$   (5)

where $w_R$ and $h_R$ are the width and the height of the region $R$, and $d_x$ and $d_y$ are the distances along the $x$-axis and $y$-axis from the upper-left corner point for selecting the region $R$ for the LACF. The distance and the size of the regions are adaptively selected according to the size of the basic pattern and practical lower bounds on the projective distortion. Moreover, the calculation of the LACF is accelerated by an FFT-based formulation as follows:

$\mathrm{LACF}_R = \mathcal{F}^{-1}\{\mathcal{F}\{\hat{p}_R\} \cdot \mathcal{F}\{\hat{p}_R\}^{*}\}$   (6)

where the operator $*$ denotes complex conjugation and $\mathcal{F}$ denotes the Fourier transform. The LACF yields multiple periodic peaks since the periodic pattern was

Fig. 5. Example of projected image and its LACF results.

embedded as the watermark. After that, the local auto-correlation peaks (LACPs) are detected by applying an adaptive threshold:

$T_{\mathrm{LACF}} = \mu_{\mathrm{LACF}} + \sqrt{2}\,\sigma_{\mathrm{LACF}}\,\operatorname{erfc}^{-1}(P_{\mathrm{fp}})$   (7)

where $\mu_{\mathrm{LACF}}$ and $\sigma_{\mathrm{LACF}}$ denote the average and standard deviation of the LACF, respectively. $P_{\mathrm{fp}}$ is a value that is related to the false positive error rate, and $\operatorname{erfc}^{-1}(\cdot)$ returns the value of the inverse of the complementary error function for its input. Fig. 5 shows the LACF results of a watermarked image that has undergone arbitrary projective distortions. The four LACF results show different angles and intervals; however, the angles and intervals between LACPs within the same LACF result are equal to each other. To measure the angle and interval between LACPs in region $R$ accurately, we first pick the coordinate $(x_d, y_d)$ of the LACP that has the maximum auto-correlation value as the datum point. The coordinate $(x_m, y_m)$ of the LACP that has the minimum distance from the datum point is then chosen, and the angle $\theta_R$ and the interval $l_R$ of region $R$ are obtained as follows:

$\theta_R = \tan^{-1}\!\left(\dfrac{y_m - y_d}{x_m - x_d}\right), \qquad l_R = \sqrt{(x_m - x_d)^2 + (y_m - y_d)^2}$   (8)

Based on these four pairs of angles and intervals, we construct the coordinates of the distorted frame.
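The LACF of eq. (6) and the peak-based angle/interval measurement of eqs. (7) and (8) can be sketched as follows. This is a simplified stand-in: `fp_sigma` replaces the erfc-based threshold term, region selection within the frame is omitted, and the function names are hypothetical.

```python
import numpy as np

def lacf(region):
    """FFT-based local auto-correlation of one region, as in eq. (6):
    LACF = F^{-1}{ F{p} . conj(F{p}) }."""
    F = np.fft.fft2(region)
    ac = np.real(np.fft.ifft2(F * np.conj(F)))
    return np.fft.fftshift(ac)  # move zero lag to the center

def peak_angle_interval(ac, fp_sigma=4.0):
    """Detect LACPs with the adaptive threshold mu + fp_sigma * sigma
    (eq. (7), with fp_sigma standing in for the erfc^{-1} term), then
    measure the angle and interval of eq. (8) between the datum peak
    (global maximum) and its nearest neighboring peak."""
    thr = ac.mean() + fp_sigma * ac.std()
    ys, xs = np.nonzero(ac > thr)
    dy0, dx0 = np.unravel_index(np.argmax(ac), ac.shape)
    d = np.hypot(ys - dy0, xs - dx0)
    d[d == 0] = np.inf          # exclude the datum point itself
    j = np.argmin(d)
    dy, dx = ys[j] - dy0, xs[j] - dx0
    return np.arctan2(dy, dx), float(np.hypot(dy, dx))
```

On an undistorted tiled pattern the recovered interval equals the tiling period; under projective distortion it shrinks or grows with the local scale, which is exactly what the coordinate construction below exploits.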

2) Calculating Coordinates: Next, the coordinates must be calculated using the intervals $(l_{H_1}, l_{H_2}, l_{V_1}, l_{V_2})$ and the angles $(\theta_{H_1}, \theta_{H_2}, \theta_{V_1}, \theta_{V_2})$ between LACPs. Note that the transform description of at least four points is needed to obtain the eight DOF of the coefficient matrix of the projective transform in (4). Fig. 6 shows the process for obtaining the coordinates of the corresponding distorted points. The original points $p_1$, $p_2$, $p_3$, and $p_4$ are known to the detector. First, the relative coordinates $q_1$, $q_2$, $q_3$, and $q_4$ that compose a quadrangle are calculated from each of the LACF results. It is assumed that the upper-left


Fig. 6. Calculation process of four distorted coordinates.

corner point $q_1$ is located at $(0, 0)$. Then $q_2$, $q_3$, and $q_4$ are calculated by the following equation:

$q_2 = q_1 + M l_{H_1}(\cos\theta_{H_1}, \sin\theta_{H_1}), \quad q_4 = q_1 + N l_{V_1}(\cos\theta_{V_1}, \sin\theta_{V_1}), \quad q_3 = q_4 + M l_{H_2}(\cos\theta_{H_2}, \sin\theta_{H_2})$   (9)

where $M$ and $N$ are the horizontal and vertical repetition times of the basic watermark pattern. Next, the quadrangle $q_1 q_2 q_3 q_4$ is translated so as to bring the centroid of the corner points to the center of the video frame, so that the barycentric coordinates of the quadrangle $q_1 q_2 q_3 q_4$ are fit onto the center of the original video frame. The translation vector $\mathbf{t}$ is determined as the distance between the center coordinates of the original frame and the barycentric coordinates of the quadrangle $q_1 q_2 q_3 q_4$:

$\mathbf{t} = \mathbf{c} - \dfrac{1}{4}\sum_{i=1}^{4} q_i$   (10)

Here $\mathbf{c}$ represents the coordinates of the center of the video frame. Then the translated form of the quadrangle $q_1 q_2 q_3 q_4$ is given by

$q_i' = \begin{bmatrix} I & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} q_i, \qquad i = 1, \ldots, 4$   (11)

where $I$ is a 2×2 identity matrix and $q_i$ is taken in homogeneous coordinates. The corner points of the translated quadrangle, $q_1'$, $q_2'$, $q_3'$, and $q_4'$, are finally obtained, and they are then used to compute the projective matrix $H$ in (4).
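The centering step of eqs. (10) and (11) can be sketched as below. The construction of the quadrangle from the LACF angles and intervals (eq. (9)) is omitted, and `center_quadrangle` is a hypothetical helper name.

```python
import numpy as np

def center_quadrangle(q, frame_w, frame_h):
    """Translate the quadrangle so that its centroid (barycenter)
    lands on the center of the video frame: t = c - centroid
    (eq. (10)), applied as the homogeneous [[I, t], [0^T, 1]]
    transform of eq. (11)."""
    q = np.asarray(q, dtype=float)            # 4 x 2 corner points
    c = np.array([frame_w / 2.0, frame_h / 2.0])
    t = c - q.mean(axis=0)                    # translation vector
    T = np.array([[1.0, 0.0, t[0]],
                  [0.0, 1.0, t[1]],
                  [0.0, 0.0, 1.0]])           # [[I, t], [0^T, 1]]
    qh = np.hstack([q, np.ones((4, 1))])      # homogeneous coordinates
    return (T @ qh.T).T[:, :2]
```

A pure translation preserves the shape of the quadrangle, so the projective estimate that follows is unaffected apart from the centering itself.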

3) Computing Projective Matrix: We now compute the coefficient matrix of the projective transform using the estimated pairs of four points and the direct linear transformation (DLT) algorithm [21]. The DLT algorithm is a linear algorithm for determining $H$ given a set of four 2-D to 2-D point correspondences between $\mathbf{x}_i$ and $\mathbf{x}_i'$ for $i = 1, \ldots, 4$. To derive a linear solution for $H$, (4) is expressed in terms of the vector cross product as

$\mathbf{x}_i' \times H\mathbf{x}_i = \mathbf{0}$   (12)

The cross product may be given explicitly as

$\mathbf{x}_i' \times H\mathbf{x}_i = \begin{pmatrix} y_i'\,\mathbf{h}^{3T}\mathbf{x}_i - w_i'\,\mathbf{h}^{2T}\mathbf{x}_i \\ w_i'\,\mathbf{h}^{1T}\mathbf{x}_i - x_i'\,\mathbf{h}^{3T}\mathbf{x}_i \\ x_i'\,\mathbf{h}^{2T}\mathbf{x}_i - y_i'\,\mathbf{h}^{1T}\mathbf{x}_i \end{pmatrix}$   (13)

where $\mathbf{h}^{jT}$ denotes the $j$th row of the matrix $H$ and $\mathbf{x}_i' = (x_i', y_i', w_i')^T$. Since $\mathbf{h}^{jT}\mathbf{x}_i = \mathbf{x}_i^T\mathbf{h}^j$ for $j = 1, 2, 3$, this gives a set of three equations in the entries of $H$, which may be written in the form $A_i\mathbf{h} = \mathbf{0}$, where

$A_i = \begin{bmatrix} \mathbf{0}^T & -w_i'\,\mathbf{x}_i^T & y_i'\,\mathbf{x}_i^T \\ w_i'\,\mathbf{x}_i^T & \mathbf{0}^T & -x_i'\,\mathbf{x}_i^T \\ -y_i'\,\mathbf{x}_i^T & x_i'\,\mathbf{x}_i^T & \mathbf{0}^T \end{bmatrix}, \qquad \mathbf{h} = \begin{pmatrix} \mathbf{h}^1 \\ \mathbf{h}^2 \\ \mathbf{h}^3 \end{pmatrix}$   (14)

Although there are three equations in (14), only two of them are linearly independent, since the equations involve homogeneous vectors. The set of equations is then written as $\bar{A}_i\mathbf{h} = \mathbf{0}$, where

$\bar{A}_i = \begin{bmatrix} \mathbf{0}^T & -w_i'\,\mathbf{x}_i^T & y_i'\,\mathbf{x}_i^T \\ w_i'\,\mathbf{x}_i^T & \mathbf{0}^T & -x_i'\,\mathbf{x}_i^T \end{bmatrix}$   (15)

Each point correspondence gives two equations in the entries of $H$. Given the set of four point correspondences obtained from the LACF results, we obtain a set of equations built from the matrix rows contributed by each correspondence:

$A\mathbf{h} = \mathbf{0}, \qquad A = \begin{bmatrix} \bar{A}_1 \\ \bar{A}_2 \\ \bar{A}_3 \\ \bar{A}_4 \end{bmatrix}$   (16)

The solution of (16) is the eigenvector of $A^TA$ with the least eigenvalue. The DLT method uses the singular value decomposition (SVD), one of the most useful matrix decompositions, to obtain a solution of (16). Given the non-square matrix $A$, the SVD is a factorization of $A$ as follows:

$A = UDV^T$   (17)

where $U$ is an orthogonal 8×8 matrix, $V$ is an orthogonal 9×9 matrix, and $D$ is a diagonal matrix that contains the singular values in descending order. Then $\mathbf{h}$ is obtained from the last column of $V$ as

$\mathbf{h} = \mathbf{v}_9$   (18)

Finally, we can recover the matrix $H$, since every row of $H$ is obtained from the corresponding entries of $\mathbf{h}$.
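The DLT computation of eqs. (12)–(18) can be sketched with NumPy's SVD. This follows the standard formulation of [21], dropping the third (dependent) row of (14); `dlt_homography` and `apply_h` are hypothetical names.

```python
import numpy as np

def dlt_homography(src, dst):
    """Direct linear transformation: for each correspondence build the
    two independent rows of eq. (15), stack them into the 8 x 9 matrix
    A of eq. (16), and take the right singular vector of the smallest
    singular value as h (eqs. (17)-(18))."""
    A = []
    for (x, y), (xp, yp) in zip(src, dst):
        X = np.array([x, y, 1.0])           # homogeneous source point
        A.append(np.concatenate([np.zeros(3), -X, yp * X]))
        A.append(np.concatenate([X, np.zeros(3), -xp * X]))
    _, _, Vt = np.linalg.svd(np.asarray(A))  # A = U D V^T
    H = Vt[-1].reshape(3, 3)                 # h: last column of V
    return H / H[2, 2]                       # fix the homogeneous scale

def apply_h(H, p):
    """Map a 2-D point through H in homogeneous coordinates."""
    v = H @ np.array([p[0], p[1], 1.0])
    return v[:2] / v[2]
```

With exact correspondences the 8×9 system has a one-dimensional null space, so the recovered matrix equals the true homography up to scale.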

C. Watermark Extraction

The watermark pattern is recovered from the geometric distortion using the inverse matrix $H^{-1}$. Because $H$ is nonsingular by the definition of a projective transformation, the inverse of $H$ always exists. In order to calculate the normalized correlation, the basic pattern of size


$(w/M) \times (h/N)$ is generated using a secret key. The reference watermarks are generated by modulating the basic pattern according to the bit payload, each representing 2 bits. The restored periodic watermark is folded so that it is the same size as the basic pattern. The normalized correlation between the folded watermark pattern $\hat{p}_f$ and the reference watermark patterns $w_b$ can then be calculated; it can be performed in less time with the fast Fourier transform (FFT) by

$\rho_b = \dfrac{\sum_{x, y} \hat{p}_f(x, y)\,w_b(x, y)}{\sqrt{\sum_{x, y} \hat{p}_f(x, y)^2}\,\sqrt{\sum_{x, y} w_b(x, y)^2}}$   (19)

where $w_b$ denotes the modulated basic pattern containing the bit payload $b$. If the normalized correlation exceeds a predefined threshold $T$, the bit payload is extracted successfully. The decision $D$, which plays the role of verifying the existence of the watermark, is made by

$D = \operatorname{arg\,max}_b\,\rho_b \quad \text{if} \quad \max_b \rho_b > T$   (20)

where $\operatorname{arg\,max}$ stands for the argument of the maximum and $T$ is the detection threshold defined by

$T = \mu_\rho + \sqrt{2}\,\sigma_\rho\,\operatorname{erfc}^{-1}(P_{\mathrm{fp}})$   (21)

where $\mu_\rho$ is the average and $\sigma_\rho$ is the standard deviation of the normalized correlations $\rho_b$. $P_{\mathrm{fp}}$ is a predefined value that is related to the false positive error rate, and $\operatorname{erfc}^{-1}(\cdot)$ returns the value of the inverse of the complementary error function for its input.
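The folding and correlation-based extraction of eqs. (19)–(21) can be sketched as follows. Direct spatial correlation is used here instead of the FFT acceleration, the candidate references stand in for the payload-modulated basic patterns, and `fp_sigma` replaces the erfc-based threshold term; all names are hypothetical.

```python
import numpy as np

def fold(pattern, M, N):
    """Fold the restored periodic watermark back to basic-pattern
    size by averaging the M x N tiles (M horizontal, N vertical)."""
    h, w = pattern.shape
    bh, bw = h // N, w // M
    return pattern.reshape(N, bh, M, bw).mean(axis=(0, 2))

def extract_payload(folded, references, fp_sigma=3.0):
    """Normalized correlation against each candidate reference
    pattern (eq. (19)); the payload is the argmax (eq. (20)) if it
    exceeds the adaptive threshold mu + fp_sigma * sigma, a stand-in
    for the erfc-based threshold of eq. (21)."""
    f = folded - folded.mean()
    rho = np.array([
        float(np.sum(f * (r - r.mean())) /
              (np.linalg.norm(f) * np.linalg.norm(r - r.mean()) + 1e-12))
        for r in references])
    thr = rho.mean() + fp_sigma * rho.std()
    b = int(np.argmax(rho))
    return (b if rho[b] > thr else None), rho
```

Folding averages the noise across tiles while the watermark adds coherently, which is why the correlation of the correct candidate stands well clear of the adaptive threshold.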

D. Position Estimating Model (PEM)

So far, we have estimated the geometric distortions and obtained the homographic matrix $H$ in Section IV-B. Using the obtained $H$, the watermark is extracted by recovering the watermark pattern from the geometric distortions. The extracted information from the embedded watermark determines when and where the movie was captured. Now, it is time to estimate the position of the camcorder to find the pirate. To estimate the position of the camcorder, we decompose $H$ into the parameters that indicate the rotation and translation of the camcorder and the camera calibration. Since the zooming operation during camcorder capture hinders the estimation of distance, the obtained parameters do not directly represent the position of the camcorder. Instead, a position vector is computed using the obtained distortion parameters. Also, since the information about the theater is available after the watermark is extracted, the seating plane, i.e., the slope of the seats of the theater, is obtained. Therefore, the position of the camcorder can be calculated by computing the intersection between the seating plane and the position vector. Before decomposing the matrix $H$, the parameters and the matrices for camera projection are introduced.

1) Camera Projective Geometry: Let us introduce the notation: the world coordinate of the screen is represented by a homogeneous 4-vector $\mathbf{X} = (X, Y, Z, 1)^T$, and the image coordinate captured by the camcorder is represented by a homogeneous 3-vector $\mathbf{x}'$ as defined in (4). Then the camera projection mapping from world to image coordinates is written in terms of matrix multiplication with the 3×4 homogeneous camera projection matrix $P$ as

$\mathbf{x}' = P\mathbf{X}$   (22)

which defines the camera projection matrix for CCD cameras (camcorders) of central projection as

$P = KR\,[\,I \mid -\tilde{C}\,]$   (23)

where $K$ is the camera calibration matrix, $R$ is a rotation matrix representing the orientation of the camera coordinate frame, and $\tilde{C}$ represents the coordinates of the camera center in the world coordinate frame. The principal point in terms of pixel dimensions is the middle of the scene, and image coordinates are measured in pixels; thus, the 3×3 camera calibration matrix $K$ of a CCD camera is given by

$K = \begin{bmatrix} \alpha_x f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix}$   (24)

where $\alpha_x$ is the non-square pixel aspect ratio of the CCD camera and $f$ is the focal length of the camcorder. Note that the skew parameter is assumed to be zero so that the $x$- and $y$-axes are perpendicular. In general, a nonzero skew might arise as a result of recapturing an image of an already captured image; that is, capturing the movie may cause a skewing of the pixel elements in the CCD array [22]. However, in this scenario, only the corner points of the screen are used to estimate the projection. No interior point in the actual scene is used, and therefore the assumption of zero skew is reasonable.

A three-dimensional rotation is composed of rotations by three angles about the $x$-, $y$-, and $z$-axes. The rotations in a clockwise direction when looking towards the origin are written in terms of the rotation matrices $R_x(\theta_x)$, $R_y(\theta_y)$, and $R_z(\theta_z)$:

$R_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & \sin\theta_x \\ 0 & -\sin\theta_x & \cos\theta_x \end{bmatrix}, \quad R_y = \begin{bmatrix} \cos\theta_y & 0 & -\sin\theta_y \\ 0 & 1 & 0 \\ \sin\theta_y & 0 & \cos\theta_y \end{bmatrix}, \quad R_z = \begin{bmatrix} \cos\theta_z & \sin\theta_z & 0 \\ -\sin\theta_z & \cos\theta_z & 0 \\ 0 & 0 & 1 \end{bmatrix}$   (25)

The rotation matrix $R$ is then represented as the matrix multiplication:

$R = R_z R_y R_x$   (26)
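The rotation matrices of eqs. (25) and (26) can be sketched as follows. The sign convention (clockwise looking toward the origin) and the multiplication order $R_z R_y R_x$ are assumptions consistent with the text.

```python
import numpy as np

def rot_x(t):
    """Rotation about the x-axis (sign convention assumed)."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, s], [0, -s, c]])

def rot_y(t):
    """Rotation about the y-axis."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, -s], [0, 1, 0], [s, 0, c]])

def rot_z(t):
    """Rotation about the z-axis."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]])

def rotation(tx, ty, tz):
    """Compose the full rotation R of eq. (26); the order R_z R_y R_x
    is an assumption."""
    return rot_z(tz) @ rot_y(ty) @ rot_x(tx)
```

Whatever the exact convention, the composition remains orthogonal with determinant one, which is what the decomposition of $H$ relies on.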

Fig. 7 represents the projective geometry in a 2-D view. The projective geometry consists of a screen and a camcorder. The origin of the theater is set to the position at distance $d$ from the screen. The location of the camcorder is described by shifting by $\tilde{C}$ and rotating by $R$, as explained in (23). It is


Fig. 7. 2-D view of projective geometry. (a) Top view. (b) Side view. The origin of the theater is set to the position at distance $d$ from the screen. $R$ means a rotation representing the orientation of the camera coordinate frame and $\tilde{C}$ represents the coordinates of the camera center in the world coordinate frame.

Fig. 8. 2-D view of redefined projective geometry. (a) Top view. (b) Side view. The redefined geometry is regarded as screen displacement instead of camcorder movement. It is assumed that the screen is translated by $(t_x, t_y, t_z)$ and rotated by $R$ with $(\theta_x, \theta_y, \theta_z)$. It is then assumed that the camcorder is located at the origin. Note that since the center of the screen is moved on the screen plane, $t_z$ is equal to zero.

a trivial solution to obtain the camera projection matrix anddecompose it into , , and . However, we only have the ho-mographic matrix . Also, the camera calibration matrix isnot known since we do not know about the camcorder whichis used for the piracy. Moreover, the zooming effect when thecamcorder is capturing the movie makes the estimated distancefrom the origin not to be trusted. Thus, it is not possible to dis-tinguish whether the zooming effect occurs or the camcordermoves closer to the screen. So we should redefine the projectivegeometry to estimate the position more flexibly.

2) Redefined Projective Geometry: Although the size of the captured movie varies with the resolution of the camcorder, the size of an input movie in the detection process is unified to that of the original. Also, the zooming effect hinders the estimation of distance, so the focal length of the camcorder at capture time is of no use; in the redefined geometry, we set the focal length of the camcorder equal to the distance between the camcorder and the screen. In addition, the parameters and coordinates in the redefined geometry are expressed in pixels rather than normalized. The most important change in the redefined geometry is to consider screen displacement instead of camcorder displacement from the origin. We define the displacement of the screen from its center by a translation (t_x, t_y, t_z) and a rotation R; the camcorder is then assumed to be located at the origin. Note that since the center of the screen moves along the screen plane, t_z is equal to zero. The redefined projective geometry is illustrated in Fig. 8. The equation representing the camera projection is changed to reflect the screen displacement: C in (23) is replaced by (t_x, t_y, 0), and (23) consequently becomes

(27)

From Section IV-D3 onwards, we use (27) instead of (23) to estimate the position.


3) Calculating the Position: Since P in (23) has 12 entries and 11 degrees of freedom, 11 equations are needed to solve for P, requiring at least six point-to-point correspondences. However, we have only four point-to-point correspondences, and the obtained homographic matrix H has nine entries. In order to obtain the position of the camcorder from H, we assume that the screen is planar and omit the z-coordinates of the planar screen. Then (22) and (27) are concisely expressed as

(28)

where r_1 and r_2 are the first and second columns of R. Since (28) has the same form as (4), we decompose the matrix in (4) as follows:

(29)

Finally, we obtain nine equations from (29), as shown in (30) at the bottom of the page. One term from (29) is used for normalization by dividing all entries. If the distance is set to be very large, that term converges, since the remaining term is remarkably small by comparison. By eliminating the denominator and the numerator, four of the equations in (30) can be rewritten to estimate the four unknowns as follows:

(31)

The four equations with four unknowns in (31) can be solved. Using their solutions, t_x and t_y are obtained by solving the following two equations:

(32)
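The paper solves its own equation system (28)-(32) because the calibration matrix is unknown. For comparison, when a calibration matrix K is known, the standard planar-homography decomposition described in [21] recovers the rotation and translation directly. The sketch below is that generic method, not the paper's system; all names are illustrative:

```python
import numpy as np

def decompose_planar_homography(H, K):
    """Recover rotation R and translation t from a plane-induced
    homography H = s * K [r1 r2 t], the standard planar result.

    This is a generic sketch assuming K is known; the paper instead
    solves equations (28)-(32) without knowledge of K.
    """
    M = np.linalg.inv(K) @ H
    s = 1.0 / np.linalg.norm(M[:, 0])   # fix the scale from ||r1|| = 1
    r1 = s * M[:, 0]
    r2 = s * M[:, 1]
    r3 = np.cross(r1, r2)               # complete the right-handed frame
    t = s * M[:, 2]
    R = np.column_stack([r1, r2, r3])
    # Re-orthonormalize R via SVD to absorb noise in H.
    U, _, Vt = np.linalg.svd(R)
    return U @ Vt, t
```

In practice the homography is noisy, which is why the SVD projection back onto the rotation group is included.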

Since t_x and t_y are in pixels, they need to be converted to the units of the world coordinate system, such as centimeters. The actually measured size of the screen is used in this conversion:

$$ (t_x^{\,\mathrm{cm}},\, t_y^{\,\mathrm{cm}}) = \frac{H_{\mathrm{scr}}}{H_{\mathrm{mov}}}\,(t_x,\, t_y) \tag{33} $$

where H_scr stands for the height of the screen in centimeters and H_mov stands for the height of the captured movie in pixels. Finally, the position vector is computed from the actual measurements as follows:

(34)
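The pixel-to-centimeter conversion in (33) is a single scale factor derived from the measured screen height; a minimal sketch (names illustrative):

```python
def pixels_to_cm(v_px, screen_height_cm, movie_height_px):
    """Convert estimated coordinates from pixels to centimeters
    using the measured screen height, as in Eq. (33).
    Variable names are illustrative, not from the paper."""
    scale = screen_height_cm / movie_height_px
    return tuple(scale * c for c in v_px)
```

For example, with a 1.24 m (124 cm) tall projection and a 1080-pixel-tall movie, a full-height pixel offset maps back to 124 cm.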

Then the seating plane can be determined according to the construction of the theater interior. The information about the theater is obtained from the extracted watermark message. The equation of the seating plane, represented in the y-z plane, is defined as follows:

(35)

(30)


Fig. 9. Snapshot examples of test videos.

TABLE I
EXPERIMENTAL PARAMETERS FOR LACF

where c_1 stands for the distance at which the z-coordinate of the seating plane equals zero and c_2 is the slope angle of the theater seats. Finally, the estimated position is obtained by calculating the intersection between (34) and (35) (see Figs. 11-13). Note that the origin of the screen is reset to (0, 0, 0) instead of (0, 0, d), and the y-axis is changed to follow the Cartesian convention, with values increasing from bottom to top.
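The intersection of the estimated position vector (34) with the seating plane (35) can be sketched as a ray-plane intersection. The plane form y = c1 + tan(c2)·z and the parameter names c1, c2 are assumptions consistent with the described offset and slope angle:

```python
import numpy as np

def intersect_ray_with_seating_plane(p, c2_slope, c1):
    """Intersect the estimated position ray with the seating plane.

    The ray runs from the origin along direction p = (px, py, pz).
    The seating plane is modeled here as y = c1 + tan(c2) * z, an
    assumed form (offset c1, slope angle c2 in radians).
    """
    px, py, pz = p
    m = np.tan(c2_slope)
    # Solve lam * py = c1 + m * lam * pz for the ray parameter lam.
    denom = py - m * pz
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the seating plane")
    lam = c1 / denom
    return (lam * px, lam * py, lam * pz)
```

With a flat floor (c2 = 0) at height c1, the intersection simply scales the direction until its y-component reaches c1.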

V. EXPERIMENTAL RESULTS

On HD-resolution clips of digital cinema, as shown in Fig. 9, we evaluate 1) the fidelity of watermark embedding, 2) the robustness of watermark detection, and 3) the accuracy of the PEM against the camcorder capture attack. A 40-bit payload was embedded into each 5-min clip to adhere to the DCI specification [2]. A 2-D basic pattern of size 120 × 120 is generated and modulated to carry 2 bits of the payload. The modulated pattern is then tiled 16 times along the horizontal axis and nine times along the vertical axis, so the watermark pattern has dimensions 1920 × 1080 and is embedded over the entire frame. To insert the 40-bit payload into the test clips, a set of 20 differently modulated patterns is required. In the detection process, the detector accumulates the estimated watermark pattern over a one-second period in order to enhance the energy of the watermarked signal. For the LACF, the parameters in (5) are set as in Table I. The thresholds in (7) and (21) are set to 4.75 and 6.00 to achieve the respective target false-alarm probabilities. This allows us to be confident that the specified error rate will not be exceeded.
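Under these parameters, the pattern construction above can be sketched as follows. This is a minimal illustration: the basic pattern is pseudo-random, and the 2-bit modulation is a placeholder (sign flips on the two halves), since the paper's actual modulation scheme is defined elsewhere:

```python
import numpy as np

def build_tiled_watermark(rng, bits2, size=120, tiles_h=16, tiles_v=9):
    """Tile one 120x120 basic pattern into a 1920x1080 watermark.

    The 2-bit modulation here is a placeholder (sign flips on the
    two halves of the pattern); the paper's modulation differs.
    """
    basic = rng.standard_normal((size, size))   # pseudo-random basic pattern
    half = size // 2
    signs = [1 if b else -1 for b in bits2]
    basic[:, :half] *= signs[0]                 # modulate each half with 1 bit
    basic[:, half:] *= signs[1]
    return np.tile(basic, (tiles_v, tiles_h))   # 9 x 16 tiles -> 1080 x 1920
```

A 40-bit payload then requires 20 such differently modulated patterns, one per 2-bit chunk.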

We set up two test sites: a seminar room for the small-scale test and an auditorium for the large-scale test. The seminar room is about 5 m wide and 6 m long with a screen

Fig. 10. Comparison of the two different-sized test environments (a seminar room for the small-scale test and an auditorium for the large-scale test). (a) Top view of the small-scale environment. (b) Top view of the large-scale environment. (c) Side view of the small-scale environment. (d) Side view of the large-scale environment. The seminar room is about 5 m wide and 6 m long with a screen of size 2.20 m × 1.24 m. Since the seminar room has no height difference between seating rows, the parameters c_1 and c_2 in (35) for the seating plane are determined by the room geometry, with c_2 about 0.4636 radians. The auditorium is about 18 m wide and 21 m long with 286 fixed seats and a screen of size 4 m × 2.25 m. Since the auditorium has its own slope between seating rows, the seating plane of the large-scale test is determined by measuring the construction of the auditorium.

which has a width of about 2.20 m, so HD-resolution clips projected on the screen measure 2.20 m and 1.24 m in the horizontal and vertical directions, respectively. Since the seminar room has no height difference between seating rows, the seating plane of the small-scale test is determined by the size of the seminar room; in this case, the slope parameter in (35) is about 0.4636 radians [see Fig. 10(a) and (c)]. The auditorium is about 18 m wide and 21 m long with 286 fixed seats. The width of the screen in the auditorium is about 4 m, so the movie clips projected on the screen measure 4 m and 2.25 m in the horizontal and vertical directions, respectively. Since the auditorium has its own slope between seating rows, the seating plane of the large-scale test is determined by measuring the construction of the auditorium [see Fig. 10(b) and (d)]. Seventeen differently located seats were selected for movie capture across the small-scale and large-scale tests.

A. Fidelity Evaluation

We assessed the visual quality of the watermarked videos using objective and subjective quality measures. The peak signal-to-noise ratio (PSNR) is used as the objective measure. After watermark embedding, the average PSNR over the test videos was 46.0 dB. In general, PSNR values above 45 dB on HD-resolution videos guarantee no perceptual difference from the original. We also conducted the subjective fidelity evaluation described in [6]. Clips were projected onto a


wide screen using an EPSON EMP-TW1000 projector. The projected clips measured about 2.20 m and 1.24 m in the horizontal and vertical directions, respectively. Five expert observers participated in a two-alternative forced-choice experiment in which each trial consisted of two presentations of the same clip, once with and once without the watermark. Observers viewed the screen from a distance of two picture heights and were asked to indicate which clip contained the watermark. Each video clip was played four times per trial, and each trial lasted 5 min. No observer could identify the watermarked clip with certainty in any case.
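The objective measure used above is the standard PSNR; a minimal sketch for 8-bit frames:

```python
import numpy as np

def psnr(original, watermarked, peak=255.0):
    """Peak signal-to-noise ratio in dB between two 8-bit frames."""
    diff = original.astype(np.float64) - watermarked.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```

Per-frame PSNR values are typically averaged over the clip, which is how a figure such as 46.0 dB would be reported.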

B. Robustness Evaluation

Robustness against camcorder capture was tested; this attack comprises not only geometric distortions such as rotation, scaling, translation, and projection but also signal-processing distortions such as format conversion, frame-rate change, and gamma correction. Clips were projected on the screen in two different environments: the small-scale test in the seminar room and the large-scale test in the auditorium. The clips were captured with a tripod-mounted SONY HDR-FX1 camcorder. To assess the performance of our scheme, normalized correlation is employed as a measure of robustness. The concept of "extracting rate" is also introduced as another measure of robustness: the extracting rate is the percentage of correctly extracted frames out of the total number of video frames.
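The two robustness measures just defined can be sketched as follows; the exact correlation formula used by the detector is not restated in this section, so a common zero-mean normalized correlation is assumed:

```python
import numpy as np

def extracting_rate(correct_frames, total_frames):
    """Fraction of frames whose payload was correctly extracted."""
    return correct_frames / total_frames

def normalized_correlation(w_ref, w_est):
    """Normalized correlation between reference and estimated
    watermark patterns (assumed zero-mean, unit-norm form)."""
    a = (w_ref - w_ref.mean()).ravel()
    b = (w_est - w_est.mean()).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Identical patterns score 1.0 and inverted patterns score -1.0, so values such as 0.28-0.39 indicate a weak but detectable watermark signal.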

The results for the extracting rate and normalized correlation are summarized in Table II. The average extracting rate and normalized correlation are 0.75 and 0.28 for the small-scale test, and 0.91 and 0.34 for the large-scale test, respectively. The extraction results at locations "b", "c", and "g" have lower correlation values than the other locations (see Fig. 14). This means that horizontal shifting of the camcorder imposes stronger geometric distortion on the captured videos, since the width of an HD video is almost twice its height. Also, at larger distances from the screen to the camcorder, watermark detection performs worse than at closer distances. We therefore conducted an additional experiment to analyze the effect of horizontal distortion on the captured videos. We tested five more samples at various positions along the x-axis, with the y-values set to approximately zero and nearly the same z-values across samples (see Fig. 16). The results for the extracting rate and normalized correlation are summarized in Table III. The average extracting rate and normalized correlation are 0.65 and 0.39, respectively. The extraction results at locations "o" and "q", whose x-coordinates are farther from the screen center than the others, have lower correlation values. This confirms that horizontal shifting of the camcorder lowers the performance of the proposed scheme. However, in all cases the 40-bit payloads were extracted with 100% reliability, and the average correlation values were larger than the preset threshold for the target false-alarm rate.

Through these experiments, the presented scheme is shown to be robust against camcorder capture.

Lastly, we measured the robustness against various attacks inaddition to camcorder capture. The six videos captured in the

TABLE II
EXTRACTING RATE AND NORMALIZED CORRELATION OF THE CAMCORDER CAPTURED VIDEOS

TABLE III
EXTRACTING RATE AND NORMALIZED CORRELATION OF THE CAMCORDER CAPTURED VIDEOS FOR HORIZONTALLY DISTORTED CASES

TABLE IV
EXTRACTING RATE AND NORMALIZED CORRELATION OF THE CAMCORDER CAPTURED VIDEOS AFTER ADDITIONAL ATTACKS

large-scale environment were used for this experiment. The tested attacks include: 1) geometric distortions such as rotation, scaling, and translation; 2) white-noise addition; 3) general filtering such as median filtering, low-pass filtering, and sharpening; and 4) video-processing attacks such as frame-rate change and format conversion. The averaged extracting rate and normalized correlation for the attacked videos are summarized in Table IV. According to the table, the proposed scheme is less robust to geometric attacks. This is because the captured videos were already geometrically distorted, having been captured at various positions. That is, the geometric attacks are


Fig. 11. Estimated position vectors and seating plane of the small-scale test. The estimated position vectors are illustrated as colored dotted lines. The seating plane is determined according to the construction of the interior building and plotted as a dash-dot line. In this case, the parameters c_1 and c_2 in (35) for the seating plane are determined by the room geometry, with c_2 about 0.4636 radians.

Fig. 12. Intersected positions between the seating plane and the estimated position vectors of the small-scale test (enlarged version of Fig. 11). The pentagram mark depicts the actual position of the camcorder. The black circle represents the estimated position in the y-z plane.

regarded as secondary geometric distortions. Among them, the scaling attacks have the smallest extracting rate because our scheme employs a periodic watermark pattern: the correlation calculation depends on the size of one basic pattern, not the size of the periodic pattern. As the watermark signal becomes smaller, the variance of the correlation grows, and so does its threshold. The size of the basic pattern used in the experiment is 120 × 120 for HD video clips. If


Fig. 13. Estimated positions of the camcorder in the x-z plane of the small-scale test.

Fig. 14. Six actual positions and six estimated positions of the camcorders in the small-scale test. The circle boundaries are set based on the estimated position that shows the largest error from its actual one.

the HD video clip is downscaled to 720 × 480, the size of the basic pattern decreases from 120 × 120 to about 45 × 54. Hence, it is clear that the scaling attack decreases the robustness of the proposed scheme. However, our scheme has better robustness to white-noise addition, general filtering, and video-processing attacks.

C. Accuracy Evaluation

To evaluate the estimation accuracy of the proposed model, we performed the evaluation under camcorder capture attack. As mentioned before, a camcorder capture attack in practice involves not only geometric distortions but also various signal-processing attacks. Moreover, the screen is assumed to be planar even though it is somewhat curved, and lens distortion of the camcorders is not considered either. These facts make our PEM less accurate and may cause estimation errors.

For the accuracy tests, clips were projected on the screen in two different environments: the small-scale test in the seminar room and the large-scale test in the auditorium.

1) Small-Scale Testing: The small-scale test was performed in a seminar room of size 5 m × 6 m, and the size of the screen is 2.20 m × 1.24 m. In the small-scale test environment, the slope parameter in (35) is about 0.4636 radians, since the seminar room has no height difference between seating rows. When the camcorder is placed, its vertical position is settled by considering the seating plane. Figs. 11-15 show the experimental results of estimating the suspicious seats where the piracy took place. Fig. 11 describes the estimated position vectors of


TABLE V
SMALL-SCALE ACCURACY TEST RESULT OF POSITION ESTIMATES ON CAMCORDER CAPTURED VIDEOS

Fig. 15. Estimated positions of the camcorder in the small-scale test in 3-D plots. (a) 3-D view 1. (b) 3-D view 2.

six differently captured videos and the seating plane, and Fig. 12 shows enlarged versions of Fig. 11. The estimated position is obtained by calculating the intersection between (34) and (35). For example, the intersection point in Fig. 12(a) directly yields the corresponding coordinate values. The pentagram marks denote the actual positions in the figures. All x- and z-coordinate values for the six different positions are depicted in Figs. 13 and 14. Fig. 13 shows the coordinates of the estimated positions of the camcorder on the x-z plane, and Fig. 14 puts the six actual positions and six estimated positions of the camcorders from Fig. 13 together into one graph. Fig. 15 shows all x-, y-, and z-coordinate values of the estimation results in 3-D plots.

Fig. 16. Five actual positions (varied along the x-axis, with y-values approximately zero and nearly the same z-values) and the five estimated positions of the camcorders in the small-scale test. The circle boundaries are set based on the estimated position that shows the largest error from its actual one.

The total results are analyzed using statistical measures as shown in Table V. The mean absolute error (MAE) and standard deviation of the estimation errors for the six positions are (2.40, 3.44, 6.87) and (2.68, 3.84, 7.18) cm, respectively. The locations "b" and "c", whose x-coordinates have larger magnitudes than the others, have higher mean errors than the other locations. We therefore conducted an additional experiment to analyze the effect of horizontal distortion on the accuracy of the PEM. The five actual positions and the corresponding estimated positions of the camcorders are depicted in Fig. 16. The accuracy results are analyzed using statistical measures as shown in Table VI. The MAE of the estimation error for the five additional positions is (4.65, 2.33, 4.66) cm. The MAE for the vertically distorted positions, such as positions "a", "e", and "f" in Table V, is (2.51, 2.67, 5.35) cm. Compared to the vertically distorted positions, the MAE of the five added positions is higher on the x-axis, since the videos captured at those positions are more severely distorted horizontally. Also, we captured a video at position "n" near position "b", which had the highest error among the previous six test results. The MAEs of "b" and "n" are (5.81, 7.17, 14.3) cm and (2.56, 1.96, 3.92) cm, respectively. The reason why position "b" has a higher error than position "n" in spite of


TABLE VI
SMALL-SCALE ACCURACY TEST RESULT OF POSITION ESTIMATES ON HORIZONTALLY DISTORTED VIDEOS

Fig. 17. Estimated position vectors and seating plane of the large-scale test. The estimated position vectors are illustrated as colored dotted lines. The auditorium has its own slope between seating rows, so the seating plane of the large-scale test is determined by measuring the construction of the auditorium.

similar capturing positions is deduced to be the quality difference between the interlaced screen display and its asynchronous camcorder capture. From this result, we can estimate the position of the camcorder with little error in the small-scale test.
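The per-axis error statistics reported in Tables V-VII can be reproduced with a few lines. Whether the standard deviation is taken over signed or absolute errors is not stated, so absolute errors are assumed here:

```python
import numpy as np

def mae_and_std(actual, estimated):
    """Per-axis mean absolute error and standard deviation of the
    position-estimation errors (absolute errors assumed)."""
    err = np.abs(np.asarray(estimated, dtype=float) -
                 np.asarray(actual, dtype=float))
    return err.mean(axis=0), err.std(axis=0)
```

Each row of `actual` and `estimated` is one (x, y, z) position in centimeters, so the result is a triple such as (2.40, 3.44, 6.87) cm.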

2) Large-Scale Testing: The large-scale test was performed in an auditorium of size 18 m × 21 m, and the size of the screen is 4 m × 2.25 m. Since the auditorium has its own slope between seating rows, the seating plane of the large-scale test is determined by measuring the construction of the auditorium. Figs. 17-21 show the experimental results of estimating the suspicious seats where the piracy took place. Fig. 17 describes the estimated position vectors of six differently captured videos and the seating plane, and Fig. 18 shows enlarged versions of Fig. 17. All x- and z-coordinate values for the six different positions are depicted in Figs. 19 and 20. Fig. 19 shows the coordinates of the estimated positions of the camcorder on the x-z plane, and Fig. 20 puts the six actual positions and six estimated positions of the camcorders from Fig. 19 together into one graph. Fig. 21 shows all x-, y-, and z-coordinate values of the estimation results in 3-D plots.

The total results are analyzed using statistical measures as shown in Table VII. The MAE and standard deviation of the estimation errors for the six positions are (33.84, 9.53, 50.38) cm and (38.61, 10.80, 57.64) cm, respectively. As in the small-scale test, the results show a tendency for videos captured farther from the screen to yield lower performance. The proposed algorithm can only use information about the original, undistorted watermark pattern and the distorted watermark pattern after camcorder capture. From these two patterns, a 3 × 3 homographic matrix is provided by the definition of 2-D projective geometry, and a homographic matrix alone cannot fully determine a 3-D position. So the proposed scheme employs the concept of a position vector, which has an initial point and a direction but no magnitude. In this scheme, the magnitude of the position vector is obtained by calculating the intersection between the position vector and the seating plane. That is, what we estimate is not the exact position but the initial point and the direction. Therefore, if the estimated direction has an error, the resulting position error is proportional to the distance of the position from the screen.

It also shows that horizontal distortion decreases the accuracy of the proposed method more than vertical distortion. Compared to the small-scale test results, the large-scale test results have severe mean errors, because the auditorium has a larger horizontal viewing angle than the seminar room. As the viewing angle gets larger, the degree of geometric distortion gets stronger, and the viewing angle is inversely related to the ratio of screen size to room size. The ratio of the screen width to the auditorium width is about 0.22, while the ratio of the screen width to the seminar room width is about 0.45. The horizontal viewing angle extends up to 56 degrees in the auditorium, while it extends up to only 23 degrees in the seminar room, when the z-position is equally 6 m from the screen in both cases. The scale of the auditorium for the large-scale test is similar to a real theater environment, but the screen is smaller than the digital cinema standard, whose screens are generally more than 8 m wide. If the experiments were performed in a real theater whose screen is 8 m wide, the ratio of the screen width to the theater width would be larger


Fig. 18. Intersected positions between the seating plane and the estimated position vectors of the large-scale test (enlarged version of Fig. 17). The pentagram mark depicts the actual position of the camcorder. The black circle represents the estimated position in the y-z plane.

Fig. 19. Estimated positions of the camcorder in the x-z plane of the large-scale test.


TABLE VII
LARGE-SCALE ACCURACY TEST RESULT OF POSITION ESTIMATES ON CAMCORDER CAPTURED VIDEOS

Fig. 20. Six actual positions and six estimated positions of the camcorders in the large-scale test. The circle boundaries are set based on the estimated position that shows the largest error from its actual one.

Fig. 21. Estimated positions of the camcorder in the large-scale test in 3-D plots. (a) 3-D view 1. (b) 3-D view 2.

than that of our large-scale test, and the degree of geometric distortion would be weaker than in our large-scale environment since the viewing angle gets smaller. As a consequence, we expect that our proposed scheme would produce acceptable performance in real digital cinema environments.
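The quoted 56- and 23-degree figures are consistent with measuring the worst-case horizontal viewing angle from a seat at the side wall at 6 m from the screen; this geometric model is inferred from those numbers rather than stated explicitly:

```python
import math

def horizontal_viewing_angle(room_width_m, distance_m):
    """Horizontal viewing angle for a seat at the side wall,
    an inferred model matching the 56- and 23-degree figures."""
    return math.degrees(math.atan((room_width_m / 2) / distance_m))
```

For the 18 m wide auditorium at 6 m this gives about 56 degrees, and for the 5 m wide seminar room about 23 degrees.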

VI. CONCLUSION

Given that many pirated copies of digital cinema are captured by camcorder, we proposed a video watermarking scheme to protect digital cinema against camcorder capture. First, our scheme provides robust watermark detection against camcorder capture to extract information about when and where the piracy occurred. Second, the position of the pirate, in addition to the time and location information, is estimated by our PEM. This limits the number of piracy suspects in the theater and helps to identify the pirate by matching persons against the databases stored in electronic ticket offices or payment systems. We showed that the proposed scheme is robust against the composite geometric distortions that commonly occur due to the angle of the camcorder relative to the screen. In our experiments, the PEM estimated the position of the camcorder with an MAE of (33.84, 9.53, 50.38) cm. These results indicate that our PEM can be applied in real theaters.

REFERENCES

[1] U.S. Piracy Fact Sheet, Motion Picture Association of America, 2005. [Online]. Available: http://www.mpaa.org/uspiracyfactsheet.pdf.

[2] Digital Cinema System Specification Version 1.2, Digital Cinema Initiatives, LLC, 2008. [Online]. Available: http://www.dcimovies.com/DCIDigitalCinemaSystemSpecv1_2.pdf.

[3] Y. Nakashima, R. Tachibana, and N. Babaguchi, "Watermarked movie soundtrack finds the position of the camcorder in a theater," IEEE Trans. Multimedia, vol. 11, no. 3, pp. 443-454, Apr. 2009.

[4] A. Leest, J. Haitsma, and T. Kalker, "On digital cinema and watermarking," in Proc. SPIE Security and Watermarking of Multimedia Contents V, Jan. 2003, vol. 5020, pp. 526-535.

[5] D. Delannay, J. Delaigle, B. Macq, and M. Barlaud, "Compensation of geometrical deformations for watermark extraction in the digital cinema application," in Proc. SPIE Security and Watermarking of Multimedia Contents III, Jan. 2001, vol. 4314, pp. 149-157.

[6] J. Lubin, J. Bloom, and H. Cheng, "Robust, content-dependent, high-fidelity watermark for tracking in digital cinema," in Proc. SPIE Security and Watermarking of Multimedia Contents V, Jan. 2003, vol. 5020, pp. 536-545.

[7] J. Ó Ruanaidh and T. Pun, "Rotation, scale and translation invariant spread spectrum digital image watermarking," Signal Process., vol. 66, pp. 303-317, Nov. 1998.

[8] C. Lin, J. Bloom, I. Cox, M. Miller, and Y. Lui, "Rotation, scale, and translation-resilient watermarking for images," IEEE Trans. Image Process., vol. 10, no. 5, pp. 767-782, May 2001.

[9] M. Alghoniemy and A. Tewfik, "Geometric distortion correction in image watermarking," in Proc. SPIE Security and Watermarking of Multimedia Contents II, Jan. 2000, vol. 3971, pp. 82-89.

[10] P. Bas, J. Chassery, and B. Macq, "Geometrically invariant watermarking using feature points," IEEE Trans. Image Process., vol. 11, no. 9, pp. 1014-1028, Sep. 2002.

[11] S. Pereira and T. Pun, "Robust template matching for affine resistant image watermarks," IEEE Trans. Image Process., vol. 9, no. 6, pp. 1123-1129, Jun. 2000.

[12] M. Kutter, "Watermarking resisting to translation, rotation, and scaling," in Proc. SPIE Multimedia Systems and Applications, Jan. 1998, vol. 3528, pp. 423-431.

[13] S. Voloshynovskiy, F. Deguillaume, and T. Pun, "Multibit digital watermarking robust against local nonlinear geometrical distortions," in Proc. IEEE Int. Conf. Image Processing, Oct. 2001, pp. 999-1002.

[14] D. Bogumil, "Reversing global and local geometrical distortions in image watermarking," in Proc. Int. Workshop Information Hiding, May 2004, vol. 3200, pp. 25-37.

[15] M. J. Lee, K. S. Kim, H. Y. Lee, T. W. Oh, Y. H. Suh, and H. K. Lee, "Robust watermark detection against D-A/A-D conversion for digital cinema using local auto-correlation function," in Proc. IEEE Int. Conf. Image Processing, Oct. 2008, pp. 425-428.

[16] B. Chupeau, A. Massoudi, and F. Lefébvre, "In-theater piracy: Finding where the pirate was," in Proc. SPIE Security, Forensics, Steganography, and Watermarking of Multimedia Contents X, Jan. 2008, vol. 6819, pp. 68190T-1-10.

[17] M. J. Lee, K. S. Kim, and H. K. Lee, "Forensic tracking watermarking against in-theater piracy," in Proc. Int. Workshop Information Hiding, Jun. 2009, vol. 5806, pp. 117-131.

[18] M. Maes, T. Kalker, J.-P. Linnartz, J. Talstra, G. Depovere, and J. Haitsma, "Digital watermarking for DVD video copy protection," IEEE Signal Process. Mag., vol. 17, no. 5, pp. 47-57, Sep. 2000.

[19] K. S. Kim, H. Y. Lee, D. H. Im, and H. K. Lee, "Practical, real-time, and robust watermarking on the spatial domain for high-definition video contents," IEICE Trans. Inf. Syst., vol. E91-D, pp. 1359-1368, May 2008.

[20] I. G. Karybali and K. Berberidis, "Efficient spatial image watermarking via new perceptual masking and blind detection schemes," IEEE Trans. Inf. Forensics Security, vol. 1, no. 2, pp. 256-274, Jun. 2006.

[21] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge, U.K.: Cambridge Univ. Press, 2004.

[22] W. Wang and H. Farid, "Detecting re-projected video," in Proc. Int. Workshop Information Hiding, May 2008, vol. 5284, pp. 72-86.

Min-Jeong Lee received the B.S. degree in computer engineering from Kyungpook National University, Daegu, Korea, in 2006, and entered the integrated Master's and Ph.D. program in computer science at Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea. She is currently pursuing the Ph.D. degree in the Multimedia Computing Laboratory, Department of Computer Science, KAIST.

Her research interests are focused on multimedia watermarking and fingerprinting, media forensics, and secure applications.

Kyung-Su Kim received the B.S. degree in computer engineering from Inha University, Incheon, Korea, in 2005, and the M.S. and Ph.D. degrees, both in computer science, from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 2007 and 2010, respectively.

He is now with the Network Security Research Team, KT Network R&D Laboratory, Daejeon. His research interests include image/video watermarking and fingerprinting, error concealment methods, information security, multimedia signal processing, multimedia communications, and network security.

Heung-Kyu Lee received the B.S. degree in electronics engineering from Seoul National University, Seoul, Korea, in 1978, and the M.S. and Ph.D. degrees in computer science from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 1981 and 1984, respectively.

Since 1986, he has been a Professor in the Department of Computer Science, KAIST. His major interests are digital watermarking, digital fingerprinting, and digital rights management.