
Digital Signal Processing 22 (2012) 190–198


Digital Signal Processing

www.elsevier.com/locate/dsp

Real-time video watermarking system on the compressed domain for high-definition video contents: Practical issues

Min-Jeong Lee a, Dong-Hyuck Im a, Hae-Yeoun Lee b, Kyung-Su Kim a,1, Heung-Kyu Lee a,∗

a Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology, Guseong-dong, Yuseong-gu, Daejeon, Republic of Korea
b School of Computer and Software Engineering, Kumoh National Institute of Technology, 1, Yangho-dong, Gumi, Gyeongbuk, Republic of Korea

Article info

Article history: Available online 30 August 2011

Keywords: Real-time video watermarking; Robust video watermarking; Practical video watermarking; Quantization index modulation; Copyright protection

Abstract

Every day, we encounter high-quality multimedia contents from HDTV broadcasting, DVD, and high-speed Internet services. Unfortunately, these contents are processed and distributed without protection. This paper proposes a practical video watermarking technique on the compressed domain that runs in real time and is robust against video processing attacks. In particular, we focus on video processing that is commonly used in practice, such as resolution downscaling, framerate changing, and transcoding. Most previous watermarking algorithms fail to survive when these operations are strong or combined. We quickly extract the low frequency coefficients of frames by partly decoding videos and apply a quantization index modulation scheme to embed and detect the watermark. We implement a prototype system on an Intel architecture computer and measure its performance against video processing attacks that frequently occur in the real world. Simulation results show that our video watermarking system satisfies real-time requirements and is robust enough to protect the copyright of HD video contents.

© 2011 Elsevier Inc. All rights reserved.

1. Introduction

Due to advances in information technology, we easily encounter high-quality digital contents from high-definition television (HDTV) broadcasting, DVD, and high-speed Internet services. People have also started using advanced IT devices in their daily lives. Homes are connected to very high-speed Internet that provides many video services such as Internet protocol television (IPTV) and video on demand (VOD). Downloading and watching movies or television programs on a PC is as natural as watching TV. While walking or riding the subway, people watch TV through digital multimedia broadcasting (DMB) and movies on portable multimedia players (PMP). DMB and movies are even displayed on cellular phones.

Although digital multimedia has many advantages, it has brought problems regarding the preservation of copyright. Illegal copies have the same quality as the originals and are easy to copy and distribute. With fast Internet connections, people obtain contents for free through P2P and web hard drive services. However, the video watermarking techniques developed so far are neither practical nor powerful enough to protect the copyright of digital multimedia.

• Practical video processing In movie and broadcasting markets, most contents are digitized and produced in HDTV quality.

* Corresponding author. E-mail address: [email protected] (H.-K. Lee).

1 Fax: +82 42 350 8144.

1051-2004/$ – see front matter © 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.dsp.2011.08.001

These valuable contents are provided over the air or through the Internet without protection. People usually save these HDTV contents on their computers with capture devices and change the resolution, framerate, or coding format for display on portable devices such as iPod, PDA, and PMP. Moreover, they illegally share or distribute the processed contents with anonymous users. Unlike the case of images, cropping and rotation are rare in video, but downscaling, framerate changing, and transcoding occur frequently. For example, 1920 × 1080 pixel HD videos are downscaled to 640 × 480 pixel VGA or 320 × 240 pixel QVGA videos for playback on portable devices with a small screen. Framerates are also changed from 30 frames per second (fps) to 24 fps or 15 fps. Because of bandwidth limits, the MPEG-4 format is widely used in Internet broadcasting services, so video files are transcoded from MPEG-2 to MPEG-4. In most cases, these operations are applied simultaneously.

• Video protection requirements There have been many works on the requirements for watermarking and fingerprinting [1]. Considering the current state of video applications, video watermarking systems should satisfy two main requirements: real-time operation, and robustness to severe downscaling, framerate changing, and transcoding.

1. Real-time requirements: video contents keep growing in size, and processing them requires more computation. Under these conditions, a video watermarking system should process video frames without delaying display, which means that it must not incur heavy computational cost.



This requirement now applies to watermarking systems for high-definition video contents as well.

2. Robustness: high-definition video contents are downscaled, changed in framerate, and transcoded for display on a variety of devices. From the viewpoint of a watermarking system, these operations are attacks on the watermark embedded in the video. Video watermarking systems should be able to detect the watermark correctly after these attacks, which remains one of the most difficult problems in watermarking research.

This paper proposes a practical real-time video watermarking algorithm for high-definition video contents that is designed and tested for use in actual broadcasting services. We apply a quantization index modulation technique to the low frequency components of full-frame DCT frames, which are calculated by partially decoding compressed videos. In particular, we focus on real-time requirements and robustness against the most frequent attacks: downscaling with arbitrary ratios, framerate changing, transcoding, and their combinations. Our algorithm is not yet robust against other geometric distortion attacks, but these attacks are rare in practice. Our video watermarking algorithm is a blind scheme that does not need the original video for watermark detection. A prototype system has been implemented and tested in constrained environments on an Intel architecture personal computer.

This paper is organized as follows. Section 2 reviews previous video watermarking algorithms and analyzes their problems with respect to practical video processing. Section 3 proposes our video watermarking algorithm on the compressed domain. Simulation results are shown in Section 4. Section 5 concludes the paper.

2. Backgrounds

Video watermarking research is classified into two categories: watermarking on the uncompressed domain and watermarking on the compressed domain [2,3]. Our video watermarking algorithm belongs to the compressed domain category because it does not fully decompress videos into raw frames.

2.1. Uncompressed domain video watermarking

Video sequences consist of a series of consecutive still images, or frames. Since the general problem of video watermarking seems similar to that of image watermarking, image watermarking techniques can be applied directly to video. A compressed video stream is decompressed into raw frames, and the watermark is embedded into the video signal on the uncompressed domain.

Huang et al. [4] described a video watermarking scheme based on a block-matching algorithm in the uncompressed domain. Their scheme was robust against typical signal processing attacks; however, it had difficulty finding the embedding position during detection after a scaling attack. A severe scaling attack makes the blocks very small, so motion vectors for block matching cannot be obtained. Koz and Alatan [5] fully decoded compressed videos and exploited the temporal sensitivity of the human visual system for video watermarking. They focused on typical video attacks such as additive Gaussian noise, video compression, and temporal synchronization loss, but not scaling.

Although computing power has increased, calculating raw frames from compressed videos and compressing them again after watermark embedding is still complex and time consuming. It is also difficult to design watermarking algorithms that are robust against severe scaling without degrading the visual quality of videos. Building real-time video watermarking systems with uncompressed domain watermarking algorithms is therefore possible but disadvantageous.

2.2. Compressed domain video watermarking

In compressed domain watermarking, compressed video streams are partly decoded, modified to carry the watermark, and re-compressed to form watermarked videos. Compressed domain watermarking algorithms are much faster than uncompressed domain watermarking algorithms because full decoding of compressed videos is not required.

Chan et al. [6] proposed a video watermarking algorithm that embeds the watermark into middle frequency coefficients of the DWT. They fully decoded compressed videos and applied the DWT to raw frames. However, middle frequency DWT coefficients are not robust against severe scaling distortions. Langelaar and Lagendijk [7] proposed a video watermarking algorithm called differential energy watermark (DEW). They partitioned a video into groups of blocks and further divided each group into two sets of equal size depending on the watermark embedding key. A single payload bit was embedded by comparing the energy of selected DCT coefficients within the two sets. This algorithm was not robust against transcoding, particularly if the group of pictures (GOP) structure was changed. Noorkami and Mersereau [8] proposed a video watermarking scheme that holds the video bitrate increase to reasonable values. They limited the watermark to nonzero quantized AC residuals in P-frames and I-frames. Since the location of the nonzero quantized AC residuals was lost after decoding, a likelihood ratio test was used to find the location of the watermarked coefficients. This test prevents the scheme from satisfying real-time requirements for HD videos. Also, since video downscaling completely changes the organization of block DCT coefficients and decreases the value of each block DCT coefficient, it is difficult to apply the likelihood ratio test to detect the watermark in videos after strong downscaling attacks.

Wang and Pearmain [9] proposed a blind video watermarking algorithm based on the fact that the low frequency coefficients of the full-frame DCT of an image are similar to those of the full-frame DCT of its scaled version. However, their watermarking algorithm does not satisfy our practical video watermarking requirements. They also did not consider composite video processing attacks. For example, their algorithm for resisting scaling attacks cannot survive framerate changing. Our video watermarking follows their basic approach, but we design and modify the scheme to achieve both the real-time requirement and the robustness requirement against practical video processing.

According to the syntax of MPEG-2 video bitstreams, only the organized variable length codewords (VLC) of block DCT coefficients and the macro-block motion vectors are available. Lu et al. [10] designed a real-time video watermarking algorithm using these VLCs of MPEG-2 video bitstreams. Ye et al. [11] proposed an adaptive video watermarking scheme that modifies the relations of DCT coefficient pairs with equal quantization steps. They used a block classification criterion to select suitable watermarking positions. Both algorithms run in real time, but they are not robust against certain attacks. Video processing such as downscaling and transcoding usually regenerates the whole MPEG-2 stream and completely changes its organization, structure, and all bitstream fields, including user or extra fields in the header.

Although these compressed domain video watermarking algorithms could survive typical attacks, they are not robust enough to resist practical video processing attacks and geometric distortion attacks.

3. Real-time video watermarking system on the compressed domain

This section proposes a practical real-time video watermarking algorithm using a quantization index modulation technique on the compressed domain. We first describe the structure of our video watermarking system and then explain how the watermark is embedded and detected.

Fig. 1. Copyright protection structure in broadcasting service.

3.1. Structure of video watermarking system

Our video watermarking algorithm is developed for use in broadcasting services. Fig. 1 shows the structure of a copyright protection system for broadcasting applications. When videos come directly from a camera, we produce watermarked compressed videos by embedding the watermark during encoding. When videos are already compressed, we first partly decode them and then re-encode them with watermark embedding. Watermarked videos are stored on a disk server to be served later or are broadcast directly. After broadcasting, a digital set-top box on the client side decodes the compressed videos with fingerprint embedding and displays them on screen in real time.

Two kinds of copyright information are embedded into video contents: watermark and fingerprint information. The digital watermark proves ownership of the contents, and the fingerprint traces illegal distributors. Since the digital watermark should be the same for all contents, it is embedded before broadcasting. Since each delivered content should carry a different digital fingerprint, unique fingerprints are embedded in the digital set-top box before display. In this paper, we limit the message to watermark information for ownership protection.

People record videos and share them with others. Since HDTV contents are too large to share, they process the videos and share them through P2P or web hard drive services. Downscaling, framerate changing, and transcoding are the most frequent operations applied to videos. When we gather video contents from P2P and web hard drive services, the digital watermark and fingerprint are detected in those videos to determine the owner of the contents or the illegal distributors.

Our video watermarking algorithm is compatible with all DCT-based hybrid compression schemes, for example, MPEG-2, MPEG-4, and ITU-T H.263. It is therefore applicable to other applications such as DVD, IPTV, and VOD services. Throughout this paper, we assume that videos come from previously compressed videos, because videos from a camera are a small variation of this case. Also, since the watermarking and fingerprinting algorithms are exactly the same except for the embedding location, we only explain the watermarking algorithm.

3.2. Full-frame DCT

3.2.1. Robustness

When we analyzed practical video processing, the operation that distorted a video most strongly was downsizing from HD resolution to QVGA resolution, which reduces the frame area to about 1/27 of the original. Most video watermarking algorithms would fail under this processing. Moreover, downsizing occurred together with framerate changing and transcoding. We therefore exploit the fact that low frequency components in the full-frame DCT domain are approximately preserved under severe resizing, because downscaling in the uncompressed domain has the same effect as truncating the high frequency bands [9] and multiplying the coefficients by a constant in the full-frame DCT domain [12]. More precisely, the constant C is determined as

\[
C = \sqrt{\frac{h' \cdot w'}{h \cdot w}} \tag{1}
\]

where h and w are the height and width of the original image, and h′ and w′ are the height and width of the downscaled image, respectively.

Fig. 2 shows downscaling effects in the uncompressed domain and in the full-frame DCT domain. The low frequency coefficients of the full-frame DCT are approximately equivalent to those of the full-frame DCT after downscaling (see Fig. 2(b)). Also, in Fig. 2(c), the coefficients of the downscaled image are smaller by a factor of about 2 (= $\sqrt{(M \times N)/((M/2) \times (N/2))}$). This means that if we embed the watermark into low frequency coefficients of the full-frame DCT and account for these downscaling effects, the embedded watermark can survive downscaling. The difference in coefficient values must be considered when designing watermark embedding and detection so that they resist scaling attacks.
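The scaling behavior described above can be checked numerically. The following is a minimal sketch, not part of the original prototype: it evaluates the constant C of Eq. (1) for HD to QVGA and, on a hypothetical smooth 1-D signal, compares low frequency orthonormal DCT coefficients before and after downscaling by pair averaging. The function names, the test signal, and the choice of pair averaging as the downscaling filter are assumptions made for illustration.

```python
import math

def scaling_constant(h, w, h_scaled, w_scaled):
    """Eq. (1): factor relating low frequency full-frame DCT coefficients."""
    return math.sqrt((h_scaled * w_scaled) / (h * w))

def dct(signal):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(signal)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                               for i, x in enumerate(signal)))
    return out

# Hypothetical smooth test signal built from DCT basis functions 0, 1 and 3.
n = 64
signal = [100 + 40 * math.cos((2 * i + 1) * math.pi / (2 * n))
          + 15 * math.cos((2 * i + 1) * 3 * math.pi / (2 * n)) for i in range(n)]

# Assumed downscaling filter: average neighboring pairs (factor 2).
half = [(signal[2 * m] + signal[2 * m + 1]) / 2 for m in range(n // 2)]

c_1d = math.sqrt(len(half) / len(signal))      # 1-D analogue of Eq. (1)
full, small = dct(signal), dct(half)
ratios = [small[k] / full[k] for k in (0, 1, 3)]  # nonzero low frequencies only

c_qvga = scaling_constant(1080, 1920, 240, 320)   # HD to QVGA, equals 1/sqrt(27)
print(c_1d, ratios, c_qvga)
```

In this sketch the low frequency ratios stay close to the predicted constant, with a small deviation that grows with frequency, which is consistent with embedding in low frequency coefficients.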

3.2.2. Real-time processing

Calculating full-frame DCT coefficients by fully decompressing videos and applying the DCT to raw frames is computationally expensive. Fortunately, full-frame DCT coefficients can be calculated directly from 8 × 8 block DCT coefficients after compressed videos are partly decoded to the 8 × 8 block DCT level [9,13].

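Let the frame size be LN × MN and the size of a single block B_{i,j} be N × N. L and M represent the number of blocks in rows and columns, respectively. The coefficients of the full-frame DCT are calculated as follows:

\[
\mathrm{FullDCT} = \sqrt{\frac{1}{LM}}\; A_1 \cdot
\begin{pmatrix}
B_{0,0} & B_{0,1} & \cdots & B_{0,M-1} \\
B_{1,0} & B_{1,1} & \cdots & B_{1,M-1} \\
\vdots & \vdots & \ddots & \vdots \\
B_{L-1,0} & B_{L-1,1} & \cdots & B_{L-1,M-1}
\end{pmatrix}
\cdot A_2^{T} \tag{2}
\]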

The FullDCT matrix has size LN × MN. B_{i,j} is an N × N matrix that holds the DCT coefficients of one sub-block. A_1 and A_2 are square matrices of size LN × LN and MN × MN, respectively, and are defined as:


Fig. 2. Downscaling effects in the uncompressed domain and in the full-frame DCT domain.

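\[
A_1(u,i) =
\begin{cases}
\sqrt{1/2}\; a(u,i), & u = 0,\ i \bmod N \neq 0 \\
\sqrt{2}\; a(u,i), & u \neq 0,\ i \bmod N = 0 \\
a(u,i), & \text{otherwise}
\end{cases}
\qquad
A_2(v,j) =
\begin{cases}
\sqrt{1/2}\; a(v,j), & v = 0,\ j \bmod N \neq 0 \\
\sqrt{2}\; a(v,j), & v \neq 0,\ j \bmod N = 0 \\
a(v,j), & \text{otherwise}
\end{cases}
\tag{3}
\]

where

\[
a(u,i) = \cos\left(\frac{(2i+1)u\pi}{2LN}\right), \quad u, i = 0, 1, \ldots, LN-1
\]

\[
a(v,j) = \cos\left(\frac{(2j+1)v\pi}{2MN}\right), \quad v, j = 0, 1, \ldots, MN-1
\]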

Owing to this property, our video watermarking algorithm has low computational cost and processing time, as required for real-time operation.
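The idea of obtaining full-frame DCT coefficients directly from block DCT coefficients can be illustrated with a small sketch. It does not reproduce the matrices of Eq. (3); instead it assumes the standard orthonormal DCT-II and builds the conversion matrices as the product of the full-size DCT matrix and a block-diagonal inverse block DCT, which plays the role of A_1 and A_2 in Eq. (2). All function names and the tiny 8 × 12 test frame with N = 4 are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix T (n x n), so that coefficients = T * samples."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def block_diag(block, count):
    """Square matrix with `count` copies of `block` on its diagonal."""
    n = len(block)
    out = [[0.0] * (n * count) for _ in range(n * count)]
    for b in range(count):
        for r in range(n):
            for c in range(n):
                out[b * n + r][b * n + c] = block[r][c]
    return out

def dct2(x):
    """Separable 2-D DCT of a matrix given as a list of rows."""
    return matmul(matmul(dct_matrix(len(x)), x), transpose(dct_matrix(len(x[0]))))

def block_dct(frame, n):
    """N x N block DCT, stored in the same layout as the frame (as in Eq. (2))."""
    out = [[0.0] * len(frame[0]) for _ in frame]
    for r0 in range(0, len(frame), n):
        for c0 in range(0, len(frame[0]), n):
            coef = dct2([row[c0:c0 + n] for row in frame[r0:r0 + n]])
            for r in range(n):
                out[r0 + r][c0:c0 + n] = coef[r]
    return out

def full_dct_from_blocks(blocks, n):
    """Full-frame DCT computed from block DCT coefficients only."""
    rows, cols = len(blocks), len(blocks[0])
    inverse_block = transpose(dct_matrix(n))   # inverse of an orthonormal DCT
    a1 = matmul(dct_matrix(rows), block_diag(inverse_block, rows // n))
    a2 = matmul(dct_matrix(cols), block_diag(inverse_block, cols // n))
    return matmul(matmul(a1, blocks), transpose(a2))

# Hypothetical 8 x 12 frame with 4 x 4 blocks (L = 2, M = 3, N = 4).
frame = [[float((7 * r + 13 * c) % 32) for c in range(12)] for r in range(8)]
direct = dct2(frame)
fast = full_dct_from_blocks(block_dct(frame, 4), 4)
max_error = max(abs(a - b) for ra, rb in zip(direct, fast) for a, b in zip(ra, rb))
print(max_error)
```

Because the conversion matrices depend only on the frame and block sizes, they can be computed once, and only the rows that correspond to the coefficients actually needed have to be applied, which is where a practical implementation could save time.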

3.3. Watermark embedding

We employ a quantization index modulation (QIM) scheme to embed the watermark into the low frequency coefficients of the full-frame DCT. Fig. 3 shows our watermark embedding procedure for MPEG-2 videos. First, we partly decompress the video and acquire block DCT coefficients. Full-frame DCT coefficients are calculated from the block DCT coefficients as explained in Section 3.2.2. QIM embeds the watermark in selected low frequency coefficients by quantizing them. Finally, we convert the result back to block DCT coefficients and re-encode them to produce the watermarked MPEG-2 video. If videos come directly from a camera, there is no compressed video to decode; instead, the watermark is embedded after the block DCT coefficients are calculated during video encoding.

Fig. 3. Watermark embedding procedure for MPEG-2 video contents.

Fig. 4. Analysis of re-encoding effect on DCT domain after resizing from HD video to QVGA video and performance of Q-step selection according to its analysis result. (a) Difference histogram of DCT coefficients between HD frames scaled by 1/√27 and QVGA frames downscaled from the HD frames. Most difference values are centered on 0 and the probability distribution can be modeled by a Laplacian distribution f(x|μ,b), where μ = −0.18 and b = 23.93. (b) Performance of the theoretical BER and the resultant BER under different Q-steps.

To embed the watermark using QIM in the DCT domain, determining an appropriate quantization step (Q-step) is crucial for robustness against various signal processing attacks, especially downscaling. We assume that downscaling from HD to QVGA is the strongest distortion in practice, so our aim is to make the watermark robust against it. Recall from Section 3.2.1 that the magnitude of the DCT coefficients of an HD video is approximately √27 times larger than that of the QVGA video downscaled from it. However, full-frame DCT coefficient values differ slightly after resizing because they are quantized and rounded during re-encoding. To achieve watermark robustness, we analyzed the differences between the coefficients before and after resizing. These differences are expected to be very close to zero, and the experimental result supports this expectation. Fig. 4(a) is a histogram that shows the re-encoding effect on the DCT domain after resizing from HD video to QVGA video. It shows the difference between the DCT coefficients of HD frames multiplied by 1/√27 and those of QVGA frames downscaled from the HD frames. A 5-min HD video clip is used. As expected, most difference values are very close to zero. The histogram also approximately follows a Laplacian distribution with location parameter μ and scale parameter b. The location parameter μ equals the mean of the histogram, and the scale parameter b is obtained as σ/√2, where σ is the standard deviation of the histogram. The Q-step size is determined using the concept of a confidence interval. The bit error rate (BER) τ of the detection scheme is regarded as the area outside the 100(1 − τ)% confidence interval [α, β] of the difference histogram. If the Q-step size is chosen as twice the larger of |α| and |β|, it guarantees that the BER τ is caused only by difference values v that fall in the range v < −max(|α|, |β|) or v > max(|α|, |β|) when the HD video is downscaled to QVGA. For a target BER τ and a difference histogram with location parameter μ and scale parameter b, the Q-step size Δ is determined as follows [14]:

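\[
\Delta = 2\max(|\alpha|, |\beta|)
= 2\max\left( 2\left|\frac{\sum_{j=1}^{n} |X_j - \mu|}{\chi^2_{2n,\,1-\tau/2}}\right|,\;
2\left|\frac{\sum_{j=1}^{n} |X_j - \mu|}{\chi^2_{2n,\,\tau/2}}\right| \right) \tag{4}
\]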

where X is the sequence of difference values in the histogram and $\chi^2_{2n,p}$ denotes the pth quantile of the χ² distribution with 2n degrees of freedom. For instance, the positive confidence bounds for the desired BERs 0.01, 0.001, and 0.0001 are indicated by a dotted line, a dash-dot line, and a dashed line in Fig. 4(a), respectively. To set the Q-step size in the embedding process, thousands of sample frames are randomly selected from the host video. A difference histogram like Fig. 4(a) is calculated and fitted to a Laplacian distribution by estimating the location and scale parameters. The Q-step size is then chosen according to the tradeoff between the desired BER and visual quality. Once the quantization step is determined for the HD video, it is adapted in the detection process when the watermarked video may have been downscaled from HD. Fig. 4(b) shows the theoretical BER and the resultant BER under different Q-steps. For example, a Q-step size of 547 is calculated to satisfy a BER of 0.00001. When the scheme embedded and detected the watermark with this Q-step size, the watermark was extracted with a BER of 0.000014.

Fig. 5. Substituting values for watermark embedding in QIM.
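The relation between the Laplacian fit and the Q-step can be sketched in a few lines. This is a simplified illustration and not the chi-square interval of Eq. (4): it assumes the difference values follow a Laplacian distribution exactly and takes [α, β] as the symmetric interval around μ whose two-sided tail probability, exp(−t/b), equals the target BER. The function names are hypothetical; the parameters μ = −0.18 and b = 23.93 are those reported for Fig. 4(a).

```python
import math

def fit_laplacian(differences):
    """Location = mean, scale = sigma / sqrt(2), as described in the text."""
    n = len(differences)
    mu = sum(differences) / n
    variance = sum((d - mu) ** 2 for d in differences) / n
    return mu, math.sqrt(variance) / math.sqrt(2)

def qstep_from_tail(mu, b, target_ber):
    """Q-step = 2 * max(|alpha|, |beta|) for the interval [mu - t, mu + t].

    For a Laplacian(mu, b) variable, P(|v - mu| > t) = exp(-t / b), so
    t = b * ln(1 / target_ber) leaves probability target_ber outside.
    """
    t = b * math.log(1.0 / target_ber)
    alpha, beta = mu - t, mu + t
    return 2 * max(abs(alpha), abs(beta))

# Fit on a tiny hypothetical sample, only to show the estimator.
mu_demo, b_demo = fit_laplacian([-2.0, 0.0, 2.0])

# Q-steps for several target BERs with the parameters reported for Fig. 4(a).
for ber in (1e-2, 1e-3, 1e-4, 1e-5):
    print(ber, round(qstep_from_tail(-0.18, 23.93, ber), 1))

qstep = qstep_from_tail(-0.18, 23.93, 1e-5)
```

Under these assumptions the Q-step for a BER of 0.00001 comes out in the same range as the value of 547 quoted above, although the derivation differs from Eq. (4).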

Our next concern is choosing the embedding positions among the full-frame DCT coefficients. Although the watermark should be embedded in low frequency coefficients for robustness against downscaling, coefficients around the DC component usually have large values, so modifying them severely degrades visual quality. Also, the closer a coefficient is to the DC component, the more its value varies after re-encoding. Hence, coefficients near the DC component are not a good choice for embedding, and we choose mid-frequency components as a trade-off between robustness and visual quality. Considering the influence of MPEG compression on the watermarked video, the lower part of the mid-frequency band (mid-low frequency) is appropriate for embedding. Since our watermark has to survive severe resizing and we assume that the smallest video size is QVGA (320 × 240), we select the embedding positions in the mid-low frequency band. We also found that modifying consecutive coefficients perceptibly affects visual quality, so the embedding positions are separated by an interval. In this way, the energy of the embedded watermark is spread over a wide range of mid-frequency components. However, the larger the interval, the farther the last embedding position is from the low frequencies, which decreases the robustness of the scheme. We therefore choose the interval empirically so that the watermark bits are arranged within the mid-frequency band.

After setting the parameters for QIM, the watermark is embedded by substituting DCT coefficients with quantized values. The watermark is a binary sequence w = {w_1, w_2, ..., w_n}, where w_k ∈ {0, 1} and n is the length of the watermark. Let x = {x_1, x_2, ..., x_n} be the selected full-frame DCT coefficients of a frame and y = {y_1, y_2, ..., y_n} be the modified coefficients after watermark embedding. We use the embedding function E(x, w) below, which produces the substituted values at minimum distance from the original values:

\[
y_k = E(x_k, w_k) = \mathrm{round}\left(\frac{x_k}{\Delta}\right)\cdot\Delta + d(x_k, w_k) \tag{5}
\]

where Δ is the Q-step size and the function d(x_k, w_k) denotes the dither value corresponding to the watermark bit w_k:

\[
d(x_k, w_k) =
\begin{cases}
\dfrac{\Delta}{2} & \text{if } (R \bmod 2 = 0,\ w_k = 0) \text{ or } (R \bmod 2 = 1,\ w_k = 1) \\[2mm]
-\dfrac{\Delta}{2} & \text{if } (R \bmod 2 = 0,\ w_k = 1) \text{ or } (R \bmod 2 = 1,\ w_k = 0)
\end{cases}
\tag{6}
\]

where R stands for round(x_k/Δ).

Embedding the watermark using QIM is depicted in Fig. 5. Assume that the gray circle is an original coefficient value. If a watermark bit “1” is embedded into this coefficient, the coefficient is replaced by the middle value of the w1 interval nearest to the original coefficient value. If a watermark bit “0” is embedded, it is replaced by the middle value of the nearest w0 interval.

After watermark embedding by QIM using these parameters, the modified full-frame DCT coefficients are decomposed into 8 × 8 block DCT coefficients for MPEG-2 video re-encoding.
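Eqs. (5) and (6) translate almost directly into code. The sketch below is illustrative only: the function names are hypothetical, rounding is implemented as floor(x/Δ + 0.5), and the coefficient value 1234.0 is made up. The Q-step of 547 is the example value quoted in the text.

```python
import math

def dither(r, bit, delta):
    """Eq. (6): +delta/2 when the parity of R equals the bit, else -delta/2."""
    return delta / 2 if r % 2 == bit else -delta / 2

def qim_embed(x, bit, delta):
    """Eq. (5): replace coefficient x by the quantized value that carries `bit`."""
    r = math.floor(x / delta + 0.5)        # R = round(x / delta)
    return r * delta + dither(r, bit, delta)

delta = 547.0                               # example Q-step from the text
x = 1234.0                                  # hypothetical full-frame DCT coefficient
y0 = qim_embed(x, 0, delta)
y1 = qim_embed(x, 1, delta)
print(y0, y1)

# With this rule every output is (k + 1/2) * delta, and the parity of k is the bit.
parities = [math.floor(y / delta) % 2 for y in (y0, y1)]
```

Each coefficient moves by at most Δ, since the rounding step moves it by at most Δ/2 and the dither by exactly Δ/2. This is why the Q-step directly trades robustness against visual quality.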

3.4. Watermark detection

We use a QIM scheme to detect the watermark that was embedded in the low frequency coefficients of the full-frame DCT. The watermark detection process is quite similar to the embedding process shown in Fig. 3. Full-frame DCT coefficients are calculated in the same way as in the embedding procedure. The differences are that we do not modify the coefficients but estimate the embedded watermark, and that we do not re-encode the video. We use the same notation as in watermark embedding and add notation for watermark detection. Let w′ = {w′_1, w′_2, ..., w′_n} be the sequence of extracted watermark bits. To find the minimum distance between the watermarked values and the nearest quantized values that represent watermark bits, we define the detection function D(y) as follows:

\[
D(y_k) = w'_k = \arg\min_{w} \left\| y_k - N(y_k, w) \right\| \tag{7}
\]

where N(y_k, w) is a function that returns the nearest fixed quantized value representing bit w for each watermarked coefficient. We use a modified Q-step size for downscaled videos. Note that the largest frame size we assume is HD and that the Q-step size determined in the embedding process is set for HD videos. The Q-step size in the detection process is therefore calculated from the downscaled size using the constant in Eq. (1).

The function D(y) returns the watermark bit that makes ‖y_k − N(y_k, w)‖ the smallest. The minimum distance between each watermarked coefficient and the quantized values of watermark bits “0” and “1” is calculated. Detecting the watermark using QIM is depicted in Fig. 6. The left gray circle is detected as a watermark bit “1” because the distance between its value and a quantized value representing the watermark bit “1” (w1) is the shortest. The right gray circle is detected as a watermark bit “0” because the distance between its value and a quantized value representing the watermark bit “0” (w0) is the shortest.
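A minimal sketch of the detection rule in Eq. (7) follows, including the Q-step adaptation for downscaled video. It is a hedged illustration rather than the prototype's code: the function names, the coefficient values, and the small additive noise standing in for re-encoding effects are hypothetical, and the downscaling attack is modeled simply as multiplication by the constant C of Eq. (1).

```python
import math

def qim_embed(x, bit, delta):
    """Eqs. (5) and (6): quantize x to a value of the form (k + 1/2) * delta."""
    r = math.floor(x / delta + 0.5)
    return r * delta + (delta / 2 if r % 2 == bit else -delta / 2)

def nearest_level(y, bit, delta):
    """N(y, w): nearest quantized value carrying `bit`, i.e. (2m + bit + 1/2) * delta."""
    offset = (bit + 0.5) * delta
    m = math.floor((y - offset) / (2 * delta) + 0.5)
    return 2 * m * delta + offset

def qim_detect(y, delta):
    """Eq. (7): the bit whose nearest quantized value is closest to y."""
    return min((0, 1), key=lambda bit: abs(y - nearest_level(y, bit, delta)))

delta_hd = 547.0                                     # Q-step chosen for HD
c = math.sqrt((240 * 320) / (1080 * 1920))           # Eq. (1), HD to QVGA
delta_qvga = c * delta_hd                            # Q-step used by the detector

bits = [1, 0, 1, 1]
coefficients = [1234.0, -860.5, 15.0, 40321.7]       # hypothetical HD coefficients
noise = [20.0, -20.0, 10.0, -15.0]                   # hypothetical re-encoding error

marked = [qim_embed(x, b, delta_hd) for x, b in zip(coefficients, bits)]
attacked = [c * y + e for y, e in zip(marked, noise)]   # modeled downscaling
detected = [qim_detect(y, delta_qvga) for y in attacked]
print(detected)
```

In this model a bit is decoded correctly as long as the error added after scaling stays below half of the adapted Q-step, C·Δ/2, which is the margin that the Q-step selection in Section 3.3 is meant to provide.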

4. Simulation results

We implemented a prototype system on a 3.6 GHz Intel Pentium IV processor with 1 GB RAM. We tested eight MPEG-2 HD videos from fields such as movies, entertainment shows, and documentaries; each video contained more than 5 different scenes and was 10 seconds long (300 frames). Captured frames from the test videos are shown in Fig. 7. Our video watermarking system embeds 24 watermark bits, encoded with a Golay error-correcting code (ECC), into each frame of the 10-second videos. The Golay code encodes a 12-bit message into a 23-bit codeword in such a way that any error of up to three bits can be corrected [15]. Therefore, 46 error-correction-coded watermark bits are embedded in one frame. The first embedding position is the 3000th coefficient in zigzag scan order, and the interval between positions is 100. We analyzed perceptual quality, real-time performance, and robustness against video processing attacks. To be applied in a real broadcasting service, the structure of the prototype system would need slight modification, but the simulation results would be the same.
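The payload arithmetic and the embedding positions described above can be made concrete with a short sketch. Several details are assumptions, because the text does not specify them: the zigzag scan is taken to be the JPEG-style scan extended to the full 1080 × 1920 coefficient matrix, indices are 0-based, and the helper name `zigzag_position` is hypothetical.

```python
def zigzag_position(index, height, width):
    """(row, col) of the index-th coefficient (0-based) in a JPEG-style zigzag
    scan of a height x width matrix. The scan direction is an assumption."""
    remaining = index
    for d in range(height + width - 1):          # d = row + col on each anti-diagonal
        row_min = max(0, d - (width - 1))
        row_max = min(d, height - 1)
        count = row_max - row_min + 1
        if remaining < count:
            # Odd diagonals run from top-right to bottom-left, even ones the other way.
            row = row_min + remaining if d % 2 == 1 else row_max - remaining
            return row, d - row
        remaining -= count
    raise IndexError("index outside the coefficient matrix")

# Payload: 24 message bits = two 12-bit blocks, each coded into a 23-bit Golay word.
message_bits = 24
coded_bits = (message_bits // 12) * 23           # 46 bits per frame

# One embedding position per coded bit, starting at 3000 with interval 100.
first, interval = 3000, 100
positions = [zigzag_position(first + interval * k, 1080, 1920)
             for k in range(coded_bits)]
print(coded_bits, positions[0], positions[-1])
```

Under these assumptions all 46 positions lie within roughly the first 120 anti-diagonals of the full-frame DCT. For comparison, a 240 × 320 QVGA frame has 240 coefficient rows.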


Fig. 6. Estimating values for watermark detection in QIM.

Fig. 7. Snapshot examples of test videos.

Table 1
Execution time of units for HD MPEG-2 watermarking. Decoding includes MPEG-2 partial decoding and full-frame DCT calculation.

            Decoding   Watermarking procedure    Display   Total time
                       Embedding    Detection
Embedder    0.011      0.015        –            0.005     0.031
Detector    0.011      –            0.001        0.005     0.017

4.1. Visual quality

Signal processing applications measure peak signal-to-noise ratio (PSNR) values to quantify the quality of images or frames. Generally, watermarked images with a PSNR over 42 dB are regarded as being as good as the original images. The overall PSNR of our video watermarking algorithm was around 46.9 dB. In particular, the effects of modifying coefficients in the frequency domain are spread over the whole frame in the uncompressed domain. Therefore, we could not notice the modification with the naked eye. This means that our video watermarking algorithm does not harm the visual quality of videos.
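For reference, the PSNR between an original and a watermarked frame can be computed as in the following sketch. It assumes 8-bit samples (peak value 255) and frames given as lists of rows; the function name and the tiny test frames are hypothetical.

```python
import math

def psnr(original, watermarked, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    squared_error = sum((a - b) ** 2
                        for row_o, row_w in zip(original, watermarked)
                        for a, b in zip(row_o, row_w))
    mse = squared_error / (len(original) * len(original[0]))
    if mse == 0:
        return math.inf                     # identical frames
    return 10 * math.log10(peak ** 2 / mse)

# Hypothetical 4 x 4 frames: every sample differs by one gray level (MSE = 1).
original = [[100] * 4 for _ in range(4)]
watermarked = [[101] * 4 for _ in range(4)]
value = psnr(original, watermarked)
print(round(value, 2))
```

A mean squared error of 1 corresponds to about 48.1 dB, so the reported 46.9 dB corresponds to a mean squared error only slightly above one gray level squared.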

4.2. Real-time performance

We used a performance tuning tool, Intel VTune Performance Analyzer, which helps identify system bottlenecks and measures the processing time of each function [16,17]. Our system consists of three major units: MPEG-2 partial decoding with calculation of full-frame DCT coefficients, watermark embedding or detection using QIM, and MPEG-2 re-encoding. The processing time of each unit is summarized in Table 1.

Partly decoding compressed MPEG-2 videos took a constant time. We could not measure the video re-encoding time. Video encoding is computationally so expensive that it cannot be performed in real time even on the most recent personal computers; hardware systems optimized for video encoding should be used. After partial decoding, full-frame DCT coefficients are simply calculated from block DCT coefficients. The computing cost of calculating full-frame DCT coefficients from block DCT coefficients is almost one fourth of that from the pixel domain [13]. Watermark embedding in QIM just substitutes several coefficients, and watermark detection in QIM checks a quantization table. Therefore, these watermarking processes can be performed without expensive computation. Video watermarking algorithms on the uncompressed domain usually consider the human visual system (HVS) to increase the robustness and invisibility of the watermark. However, calculating an HVS mask takes a great amount of time, and hence uncompressed domain video watermarking algorithms are computationally expensive [18–20].

Table 2
The average BER comparison between [9] and the proposed method with no repetition after video processing attacks such as downscaling, format conversion, and framerate change attacks. The host videos are 1920 × 1080 MPEG-2 at 30 fps, and DR means downscaling ratio.

Attack                                        [9]      Proposed
No attack                                     0.000    0.000
Downscaling to 640 × 480 (DR = 0.15)          0.233    0.002
Downscaling to 480 × 270 (DR = 0.06)          0.261    0.002
Downscaling to 320 × 240 (DR = 0.04)          0.274    0.004
Format conversion, MPEG-2 → MPEG-4            0.096    0.000
Framerate conversion, 30 fps → 24 fps         0.000    0.000
Framerate conversion, 30 fps → 15 fps         0.000    0.000

4.3. Robustness

To measure the robustness of our video watermarking system, we embedded the watermark in eight HD videos (1920 × 1080 pixels, 30 fps, MPEG-2 format). The watermarked HD videos were processed in several ways: changing framerates, transcoding to MPEG-4, downscaling to arbitrary ratios, and compositions of these attacks. We then tried to detect the embedded watermark from the processed HD videos and measured the bit error rate (BER) of the watermark information after each attack. Table 2 summarizes the average results in comparison with [9]. Although the method of [9] produced no bit errors for 'No attack' and 'Framerate conversion', it became weaker as the downscaling ratio became smaller, and bit errors also occurred for 'Format conversion'. Another disadvantage of [9] is that, because it modifies low-frequency DCT coefficients, it cannot be directly applied to HD video contents: it is likely to cause visual artifacts between the unwatermarked and watermarked frames, such as a flickering effect in the watermarked video.
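The BER used throughout this section is the fraction of watermark bits that are extracted incorrectly. A minimal sketch of that computation follows; the function name and the 8-bit example payload are hypothetical and serve only to illustrate the metric.

```python
def bit_error_rate(embedded, extracted):
    """Fraction of positions where the extracted bit differs from the embedded bit."""
    if len(embedded) != len(extracted):
        raise ValueError("bit sequences must have the same length")
    errors = sum(1 for a, b in zip(embedded, extracted) if a != b)
    return errors / len(embedded)

# Hypothetical payload and a detection result with one flipped bit.
embedded = [1, 0, 1, 1, 0, 0, 1, 0]
extracted = [1, 0, 1, 0, 0, 0, 1, 0]
ber = bit_error_rate(embedded, extracted)
print(ber)
```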


Our video watermarking system is robust to downscaling to arbitrary sizes. Because it is not practical to describe all cases, we show the sizes most common in illegally distributed videos, such as HD, VGA, and QVGA. HD videos without downscaling are distributed to those who attach great importance to video quality, VGA videos to those who watch videos on a PC, and QVGA videos to those who use PDAs or PMPs. We tested the four most common sizes of watermarked videos. The downscaling ratio from HD to QVGA is about 1/27, at which most video watermarking algorithms cannot survive. Since we embedded the watermark into full-frame DCT coefficients, which are not affected by scaling, our video watermarking algorithm could correctly detect the watermark. Even though there are small errors due to the re-quantization effect, the algorithm shows robustness against downscaling. In practice, downscaling is accompanied by framerate changing and transcoding, i.e., videos undergo composite processing.
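The DR values reported in Table 2 and Fig. 8 are consistent with defining the downscaling ratio as the ratio of pixel counts between the downscaled frame and the original frame. The sketch below assumes that definition (it is inferred from the reported values, not stated explicitly in the text), and the function and variable names are hypothetical.

```python
def downscaling_ratio(src, dst):
    """Ratio of pixel counts: (dst width * dst height) / (src width * src height)."""
    return (dst[0] * dst[1]) / (src[0] * src[1])

HD = (1920, 1080)
targets = {
    "720 x 480": (720, 480),
    "640 x 480 (VGA)": (640, 480),
    "480 x 270": (480, 270),
    "320 x 240 (QVGA)": (320, 240),
}

for name, size in targets.items():
    print(name, downscaling_ratio(HD, size))
```

Under this assumption, HD to QVGA gives 76800 / 2073600 = 1/27, which matches the ratio quoted in the text.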

We also transcoded watermarked MPEG-2 HD videos to MPEG-4 HD videos. Unlike block DCT watermarking algorithms, we embed the watermark into, and detect it from, full-frame DCT coefficients. Since full-frame DCT coefficients are not affected by changing the video coding method, our video watermarking algorithm was robust against transcoding. Moreover, our scheme is robust against framerate change. Since we embedded the watermark information into a single frame, we could detect the watermark correctly after the framerate was changed. Therefore, changing framerates does not affect the robustness of our video watermarking algorithm.

In practice, videos undergo composite processing. We simulated such processing by combining framerate changing, downscaling, and transcoding. Table 3 shows the BERs averaged over the test videos. Combining framerate changing and transcoding did not affect watermark detection performance in our video watermarking system. However, BERs increased as the downscaling became stronger, and downscaling to QVGA was the worst case among the composite attacks. To obtain better performance on composite attacks and more comprehensive results on various transcoding attacks (1920 × 1080 MPEG-2 at 30.1 Mbps was transcoded to targets ranging from 720 × 480 MPEG-4 at 3.0 Mbps to 320 × 240 MPEG-4 at 0.5 Mbps), we extended our test by adopting a repetition time, i.e., accumulating the watermarks extracted from consecutive frames over a predefined number of seconds. As the repetition number increased, watermark detection became more reliable, as depicted in Fig. 8. Although the average BERs increased as the attacks became stronger, increasing the repetition time brought the BERs close to zero.
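The text does not specify how the watermarks extracted from consecutive frames are accumulated. One simple possibility, assumed here purely for illustration, is a per-bit majority vote over the frames in the repetition window. The function names and the example bit patterns below are hypothetical.

```python
def accumulate_bits(frames):
    """Majority-vote each bit position across per-frame extractions.

    Ties (possible with an even number of frames) resolve to 0 in this sketch.
    """
    n_frames = len(frames)
    n_bits = len(frames[0])
    votes = [sum(frame[i] for frame in frames) for i in range(n_bits)]
    return [1 if 2 * v > n_frames else 0 for v in votes]

def bit_error_rate(reference, extracted):
    """Fraction of bit positions that differ."""
    errors = sum(1 for a, b in zip(reference, extracted) if a != b)
    return errors / len(reference)

payload = [1, 0, 1, 1, 0, 0, 1, 0]
# Hypothetical per-frame extractions, each with one bit flipped at a different position.
frames = [
    [0, 0, 1, 1, 0, 0, 1, 0],  # bit 0 flipped
    [1, 0, 1, 0, 0, 0, 1, 0],  # bit 3 flipped
    [1, 0, 1, 1, 0, 1, 1, 0],  # bit 5 flipped
]

per_frame = [bit_error_rate(payload, f) for f in frames]
combined = accumulate_bits(frames)
print(per_frame, bit_error_rate(payload, combined))
```

In this toy case each frame alone has a nonzero BER, while the voted result has none, because the errors fall at different bit positions. This mirrors the qualitative trend reported for longer repetition times.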

Our video watermarking system could successfully detect the embedded watermark under composite video processing. These results support the claim that our video watermarking algorithm satisfies the robustness requirement.

Table 3
Average bit error rate of watermark detection in composite attacks (no repetition).

Composite attacks with MPEG-4 format conversion

                     Resizing
                     640 × 480    480 × 270    320 × 240
Framerate 30 fps     0.006        0.02         0.047
Framerate 24 fps     0.007        0.021        0.043
Framerate 15 fps     0.01         0.022        0.032

Fig. 8. Average BER of watermark detection with repetition (0, 0.5, and 1 second) in various composite attacks: downscaling (DR = 0.167–0.04), format conversion, bitrate (BR = 0.5–3.0 Mbps), and framerate change (15, 24, and 30 fps).


5. Conclusions

As digital content markets and infrastructures emerge, high-quality digital contents have become the center of market share, and protecting the copyright of valuable video contents has become an important market concern. Previous video watermarking algorithms were not robust enough against practical video processing. We proposed a real-time video watermarking algorithm on the compressed domain for HD videos that is robust against practical video processing and was developed for broadcasting services. We partly decoded compressed videos and applied the QIM technique for watermarking. Since watermarking is performed on the compressed domain, the algorithm satisfies real-time requirements. Since we embedded the watermark into full-frame DCT coefficients, which are not affected by framerate changing, downscaling, and transcoding, the algorithm satisfies robustness requirements. We implemented a prototype system on an Intel architecture computer and showed by simulation that our algorithm satisfies both real-time and robustness requirements.

Our major contribution is a practical real-time video watermarking scheme for HD videos that is robust against video processing common in practice, such as downscaling to arbitrary sizes, changing framerates, transcoding to other video formats, and combinations of these operations. Our video watermarking algorithm is blind, i.e., it does not use the original videos for detection. Its drawback is weakness against other geometric distortions such as rotation and random bending. We believe that the techniques described in this paper can be applied to other video watermarking applications.

Acknowledgments

This research was supported by the WCU (World Class University) program (Project No. R31-30007) and the NRL (National Research Lab) program (No. R0A-2007-000-20023-0) under the National Research Foundation of Korea and funded by the Ministry of Education, Science and Technology of Korea.

References

[1] I. Cox, J. Kilian, T. Leighton, T. Shamoon, Secure spread spectrum watermarking for multimedia, IEEE Trans. Image Process. 6 (12) (1997) 1673–1687.

[2] F. Hartung, B. Girod, Watermarking of uncompressed and compressed video, Signal Process. 66 (3) (1998) 283–301.

[3] G. Doerr, J.-L. Dugelay, A guide tour of video watermarking, Signal Process.: Image Comm. 18 (4) (2003) 263–282.

[4] H. Huang, Y. Lin, W. Hsu, Robust technique for watermark embedding in a video stream based on a block-matching algorithm, Opt. Eng. 47 (3) (2008) 037402.

[5] A. Koz, A.A. Alatan, Oblivious spatio-temporal watermarking of digital video by exploiting the human visual system, IEEE Trans. Circuits Syst. Video Techn. 18 (3) (2008) 326–337.

[6] P.W. Chan, M.R. Lyu, R.T. Chin, A novel scheme for hybrid digital video watermarking: approach, evaluation and experimentation, IEEE Trans. Circuits Syst. Video Techn. 15 (12) (2005) 1638–1649.

[7] G. Langelaar, R. Lagendijk, Optimal differential energy watermarking of DCT encoded images and video, IEEE Trans. Image Process. 10 (1) (2001) 148–158.

[8] M. Noorkami, R.M. Mersereau, Digital video watermarking in p-frames with controlled video bit-rate increase, IEEE Trans. Inform. Forensics Security 3 (3) (2008) 441–455.

[9] Y. Wang, A. Pearmain, Blind MPEG-2 video watermarking robust against geometric attacks: a set of approaches in DCT domain, IEEE Trans. Image Process. 15 (6) (2006) 1536–1543.

[10] C.-S. Lu, J.-R. Chen, K.-C. Fan, Real-time frame-dependent video watermarking in VLC domain, Signal Process.: Image Comm. 20 (7) (2005) 624–642.

[11] D. Ye, C. Zou, Y. Dai, Z. Wang, A new adaptive watermarking for real-time MPEG videos, Appl. Math. Comput. 185 (2) (2007) 907–918.

[12] R.C. Gonzalez, R.E. Woods, Digital Image Processing, Addison–Wesley Publishing Company, 1992.

[13] J. Jiang, G. Feng, The spatial relationship of DCT coefficients between a block and its sub-blocks, IEEE Trans. Signal Process. 50 (5) (2002) 1160–1169.

[14] S. Kotz, T. Kozubowski, K. Podgorski, The Laplace Distribution and Generalizations, Birkhäuser, Boston, 2001.

[15] A. Calderbank, G.D. Forney Jr., A. Vardy, Minimal tail-biting trellises: the Golay code and more, IEEE Trans. Inform. Theory 45 (5) (1999) 1435–1455.

[16] M. Atkins, R. Subramaniam, PC software performance tuning, IEEE Computer 29 (9) (1996) 47–54.

[17] Intel VTune Performance Analyzer, Intel Corporation, software available online, http://developer.intel.com/vtune, 2000.

[18] J.F. Delaigle, C.D. Vleeschouwer, B. Macq, Watermarking algorithm based on a human visual model, Signal Process. 66 (3) (1998) 319–335.

[19] I.J. Cox, M.L. Miller, A review of watermarking and the importance of perceptual modeling, in: Proceedings of SPIE, vol. 3016, 1997, pp. 92–99.

[20] H.-Y. Lee, H. Kim, H.-K. Lee, Robust image watermarking using local invariant features, Opt. Eng. 45 (3) (2006) 037002.

Min-Jeong Lee received the B.S. degree in Computer Engineering from Kyungpook National University, Republic of Korea, in 2006, and the Ph.D. degree (Integrated Master's Ph.D. Program) in Computer Science from Korea Advanced Institute of Science and Technology (KAIST), Republic of Korea, in 2011. She is currently working as a post-doctoral researcher at KAIST. Her research interests focus on video watermarking, with particular attention to multimedia forensics and information security.

Dong-Hyuck Im received his B.S. degree in Computer Science from Yonsei University, Korea, in 2001, and his M.S. and Ph.D. degrees in Computer Science from Korea Advanced Institute of Science and Technology (KAIST), Korea, in 2006 and 2009, respectively. From 2001 to 2003, he was a researcher at Virtual-Tek, Korea. He is now with KT, Korea. His major interests are digital content protection, including digital watermarking, digital rights management, and conditional access systems.

Hae-Yeoun Lee received his M.S. and Ph.D. degrees in Computer Science from Korea Advanced Institute of Science and Technology (KAIST), Korea, in 1997 and 2006, respectively. From 2001 to 2006, he was with Satrec Initiative, Korea. From 2006 to 2007, he was a post-doctoral researcher at Weill Medical College, Cornell University, USA. He is now with Kumoh National Institute of Technology, Korea. His major interests are digital watermarking, image processing, remote sensing, and digital rights management.

Kyung-Su Kim received his B.S. degree in Computer Engineering from Inha University, Incheon, Republic of Korea, in 2005, and his M.S. and Ph.D. degrees, both in Computer Science, from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea, in 2007 and 2010, respectively. He is now with the Network Security Research Team, KT Network R&D Lab., Daejeon, Republic of Korea. His research interests include image/video watermarking and fingerprinting, error concealment methods, information security, multimedia signal processing, multimedia communications, and network security.

Heung-Kyu Lee received a B.S. degree in Electronics Engineering from Seoul National University, Seoul, Korea, in 1978, and M.S. and Ph.D. degrees in Computer Science from Korea Advanced Institute of Science and Technology (KAIST), Korea, in 1981 and 1984, respectively. Since 1986 he has been a professor in the Department of Computer Science, KAIST. His major interests are digital watermarking, digital fingerprinting, and digital rights management.
