
Statistical Analysis of Watermarking Schemes for Copyright Protection of Images

JUAN R. HERNANDEZ, STUDENT MEMBER, IEEE,

AND FERNANDO PEREZ-GONZALEZ, MEMBER, IEEE

In this paper, we address the problem of the performance analysis of image watermarking systems that do not require the availability of the original image during ownership verification. We focus on a statistical approach to obtain models that can serve as a basis for the application of decision theory to the design of efficient detector structures. Special attention is paid to the possible nonexistence of a statistical description of the original image. Different modeling approaches are proposed for the cases when such a statistical characterization is known and when it is not. Watermarks may encode a message, and the performance of the watermarking system is evaluated using as measures the probability of false alarm, the probability of detection when the presence of the watermark is tested, and the probability of error when the information that it carries is extracted. Finally, the modeling techniques studied are applied to the analysis of two watermarking schemes, one of them defined in the spatial domain, and the other in the discrete cosine transform (DCT) domain. The theoretical results are contrasted with empirical data obtained through experimentation covering several cases of interest. We show how choosing an appropriate statistical model for the original image can lead to considerable improvements in performance.

Keywords: Codes, copyright protection, cryptography, decision-making, image communication, image processing, information theory.

I. INTRODUCTION

Recent years have witnessed a stunning proliferation of techniques for representation, storage, and distribution of digital multimedia information. Nowadays, network design is oriented to digital data delivery, while content providers are rapidly transforming their archives to a digital format. Unfortunately, all these developments also have a serious drawback: digital copies can be made identical to the original; moreover, it is fairly simple to manipulate or reuse the information in an unauthorized way. In the past, the final quality of the copies and the impracticality of massive copying were key factors that limited any widespread distribution of illegal copies; emerging networks, on the contrary, allow a fast and bulky dissemination with no loss of quality. This creates new threats to copyright protection and puts the whole creative process in danger.

Manuscript received September 15, 1998; revised January 3, 1999. The authors are with Departamento de Tecnoloxías das Communicacions, University of Vigo, Vigo 36200 Spain. Publisher Item Identifier S 0018-9219(99)04949-X.

Cryptography is an effective solution to the digital distribution problem, but it has to be coupled with costly and specialized hardware in order to preclude direct access to data in digital format. However, most cryptographic protocols are concerned with secure communications rather than with subsequent copyright infringements. A good example is the cryptographic devices (set-top boxes) used for access control in digital television broadcasting [1]. There, the major goal lies in preventing unauthorized customers from accessing programs that are being broadcast in scrambled form [2], but once data have been (even legally) unscrambled it is quite simple to save them for further manipulation or dissemination. In other scenarios, such as the Internet, the "network value" heavily depends on the existence of relatively cheap and general-purpose hardware that eliminates a significant capital cost both to the user and the service provider, as explained by Metcalfe's Law [3]. Consequently, there is an increasing need for software that allows for protection of ownership rights. It is in this context where watermarking techniques come to our help.

A digital watermark is a distinguishing piece of information that is adhered to the data that it is intended to protect. There are two kinds of watermarks: perceptible and imperceptible. For obvious reasons, the latter are more suitable to become part of a digital copyright system. Imperceptible watermarks obstruct illegal copying by ensuring that ownership information is unnoticeably embedded into the digital data. Considering that watermarking can be applied to data of a very different nature, the imperceptibility constraint must be achieved by carefully taking into account the properties of the human senses. For instance, early work in the image watermarking field did not tackle adequately the perceptibility issue and embedded the watermark in the least significant (the most "insignificant") bits of the original image [4]–[7], thus making it easy to remove or alter this additional information. On the other hand, more recent methods consider the characteristics of the human visual system (e.g., low sensitivity to edge changes) to enhance the robustness of the watermark.

0018–9219/99$10.00 © 1999 IEEE

1142 PROCEEDINGS OF THE IEEE, VOL. 87, NO. 7, JULY 1999

The previous example leads us to the robustness issue [8]. In addition to imperceptibility, there are some desirable features that a watermark should have. First, it should be resilient to standard manipulations (e.g., MUSICAM compression in the case of audio signals [9]) as well as intentional manipulations (e.g., the watermark-removal programs discussed below). Second, it should be statistically unremovable (or better yet, undetectable), which means that a statistical analysis of different pieces of data watermarked by the same provider should not lead to any gain from the attacker's point of view. Finally, the watermark should withstand multiple watermarking to facilitate the tracking of the subsequent transactions to which an image is subject. Note that for different applications, the watermark should resist quite different types of manipulations. An example is photocopying, which is a possible attack for document images, but almost useless when thinking of color images. Furthermore, the computational power required by an attacker will strongly depend on the application; watermark removal in a digital video sequence would be much more expensive than for a still image. Ideally, watermark destruction attacks should also affect the original data in a similar way; however, to what extent ownership should be preserved after data is severely distorted is a difficult question.

Watermarking, like cryptography, needs secret keys that map rights to owners. However, in most applications, embedment of additional information is required. This hidden information may consist of, among others, owner, distributor, or recipient identifiers, transaction dates, serial numbers, etc., that may become vital when tracking some illegal distribution. In some cases, a trusted authority that issues certificates is needed [10]. For instance, signed time stamps could be imperceptibly hidden in the original data to prevent counterfeiting based on successive watermarks [11], [12]. In fact, the issue of data hiding (also called steganography) leads to the problem of correctly extracting (decoding) this information once in possession of the secret key. In most cases, there will be a certain probability of error for the extracted information. This probability of error can be used as a measure of the performance of a watermarking system. Note that this probability will increase with the number of bits in the hidden message, thus imposing a limit on the length of the secret message one wants to convey.

In addition to the data-hiding problem there is the detection problem, in which it is tested whether data were watermarked with a certain key, therefore serving for ownership determination purposes. The detection problem produces a binary answer: data were (or were not) watermarked with a given key. Consequently, the problem can be formulated as a statistical hypothesis test, for which a probability of false alarm (i.e., of deciding that a given key was used while it was not) and a probability of detection (i.e., of correctly deciding that a given key was used) can be defined as quality measures. More precisely, one would fix a certain value of the probability of false alarm and then evaluate if the system gives an acceptable probability of detection. Note that the probability of false alarm should be kept to an extremely low value if the watermarking system is to be used for commercial purposes, since the existence of "false positives" would undermine its credibility.

In this paper we will concentrate on watermarking of still images, which is the case that has generated the largest amount of research in the field. However, it is interesting to point out that watermarking has also been applied to other types of data, such as document images [13]–[20], audio signals [21], [22], video signals [23]–[31], three-dimensional (3-D) objects [32], and even software or hardware [33], [34]. Many image watermarking methods have mushroomed over the past years, even with commercial products available or in preparation (e.g., NEC, Sony, Hitachi, or Kodak). Although some of them need knowledge of the original image [35], [36], we will assume throughout this paper that this is not available during the watermark extraction and detection processes. While such knowledge would greatly simplify these tasks, especially if the watermarked image has suffered common geometric distortions, any Internet-oriented long-term product should regard it as unacceptable, given that resorting to the originals would be unmanageable when huge quantities of images need to be compared by intelligent agents searching the net for unauthorized copies. The detection and extraction tasks will then have to be performed without access to the source image.

While cryptographic protocols have achieved the desired security level that expedites their use in electronic commerce applications [37], this cannot be said yet for watermarking systems. Recent advances in watermark-removal programs, such as unZign and StirMark [38] and others [39], [40] that have succeeded in washing the watermark away with little impact on the perceptibility constraint, even for commercial systems, are quite discouraging but will foster new research in the field. Actually, most methods that do not require knowledge of the original image are not robust to simple geometric transformations or cropping. This problem, also known as the synchronization problem, is one of the current salient points in image watermarking that will deserve much attention in the near future.

The first stages in the development of watermarking techniques have produced an impressive number of algorithms, even though in most cases no theoretical limits to their performance were given. We believe that such a theoretical approach is the only way to turn digital copyright protection into a mature discipline, at a level comparable to other branches of communications and cryptography. In this paper we will show how a careful modeling of the problem can help not only to assess the performance of the various methods but also to considerably improve them. We will focus on the study of watermarking systems through the application of statistical analysis techniques, and we will use statistical decision theory to derive the detection structures involved when ownership of an image must be verified. The reasons why a statistical approach is convenient are manifold. First, through a statistical formulation of the problem, performance measures can be defined to study rigorously to what extent a watermarking technique can be applied as a copyright protection mechanism. Dependence of the achievable level of protection on the characteristics of images can also be analyzed with such an approach. Furthermore, with this kind of analysis it is possible to obtain efficient detectors that optimize the performance and to assign appropriate values to system parameters so that a certain level of performance is guaranteed.

Fig. 1. General model of a watermarking system.

In Section II, we present a general model of a watermarking system and some definitions of concepts that will be revisited throughout the paper. In Section III, we particularize the general model to watermarking techniques based on the addition of a spread spectrum signal carrying some information. The rest of the paper refers to this kind of watermark. In Section IV, we formulate statistically the problems of watermark detection and extraction of the information carried by the watermark, when a statistical description of the original image is possible. In Section V, we formulate the same problems when such a statistical characterization of the original image is unknown. Finally, in Sections VIII and IX, we apply the analytical techniques developed in Sections IV and V to the study of two watermarking techniques, one performed in the spatial domain and the other performed in the discrete cosine transform (DCT) domain. The theoretical results derived in both sections are contrasted with empirical data obtained through experiments performed with several test images.

II. GENERAL MODEL OF A WATERMARKING SYSTEM

In Fig. 1 we have represented in block diagram form the general model of a watermarking system. A similar model was proposed in [41] in an information theoretical context. From now on, variables in bold letters will represent vectors whose elements will be referenced using the notation $\mathbf{x} = (x[1], \ldots, x[L])$. An image $\mathbf{x}$ is transformed into a watermarked version $\mathbf{y}$ by applying a watermarking function that also takes as inputs a secret key $K$ only known to the copyright owner and a message $M$ taken from a finite discrete alphabet $\mathcal{M}$ with $|\mathcal{M}|$ elements. If the source output is not watermarked, then it is left unaltered.

The watermarked version $\mathbf{y}$ is delivered in place of $\mathbf{x}$ to the intended recipient. It can then suffer unintentional distortions or attacks aimed at destroying the watermark information. Note that even intentional attacks are performed without any knowledge about $K$ or $\mathbf{x}$, since they are not publicly available. The alterations suffered by $\mathbf{y}$ can thus be modeled as a noisy channel whose input $\mathbf{y}$ and output $\mathbf{z}$ are linked by the conditional distribution $f(\mathbf{z} \mid \mathbf{y})$.

Two tests are involved in the ownership verification process. First, a watermark detector decides whether the image $\mathbf{z}$ under test contains a watermark generated with a certain key $K$. Hence, this detector takes as inputs the image $\mathbf{z}$ and the secret key $K$ and yields a Boolean output which indicates the decision. If the watermark detector decides that a watermark is present, then authorship by the person who possesses the secret key $K$ is proved and extraction of the hidden message can be performed afterwards. This task is accomplished by a watermark decoder, whose inputs are $\mathbf{z}$ and $K$. As a result, it outputs an estimate $\hat{M}$ of the hidden message. Note that we have assumed that both the watermark detector and the watermark decoder have no access to the original image.

The watermark detector is characterized by two performance measures: the probability of false alarm $P_F$ and the probability of detection $P_D$. The former indicates the probability of yielding a positive result in the watermark detection test when $\mathbf{z}$ does not actually contain a watermark generated from $K$. The latter is the probability of getting a positive result when the image does contain such a watermark. These two probabilities have been proposed for quality assessment in [42]. To express these probabilities in mathematical notation, let us denote the event "the image is watermarked" by $H_1$ and the event "the image is not watermarked" by $H_0$. Then

$$P_F = \Pr\{\text{positive detection} \mid H_0\} \tag{1}$$

$$P_D = \Pr\{\text{positive detection} \mid H_1\} \tag{2}$$

The performance of the watermark decoder is given by the probability of error $P_e$, defined as the probability of getting a wrong estimate $\hat{M}$ of the hidden message $M$

$$P_e = \Pr\{\hat{M} \neq M\} \tag{3}$$

As a measure of the overall performance of the watermarking system, this probability is of little practical use on its own. Two conditional error probabilities are especially interesting for their applicability to the design of detectors and decoders. One of them is the probability of error conditioned to a given original image $\mathbf{x}$. The other is the probability of error conditioned to a certain key $K$

$$P_e(\mathbf{x}) = \Pr\{\hat{M} \neq M \mid \mathbf{x}\} \tag{4}$$

$$P_e(K) = \Pr\{\hat{M} \neq M \mid K\} \tag{5}$$

The former indicates how appropriate an image is for data hiding and the latter expresses the average performance seen by a copyright holder.

In the following sections we study a special kind of watermarking function based on the addition of a spread spectrum signal. No explicit assumption will be made about the domain where the image $\mathbf{x}$ is defined. It could be just a spatial domain representation of the image luminance or any kind of representation obtained after applying transforms such as the fast Fourier transform (FFT), DCT, Karhunen–Loève transform (KLT), wavelet transform, etc.

Fig. 2. Generation of a spread spectrum watermark.

III. ADDITIVE SPREAD SPECTRUM WATERMARKS

Additive spread spectrum watermarking systems, on which most of the proposed watermarking techniques are based, constitute a special case of the general model depicted in Section II. They are inspired by the spread spectrum modulation techniques employed for digital communications in jamming environments [43], [44], since data hiding can be seen as a communication problem in which the original image plays the role of channel noise and attackers may try to disrupt the transfer of information. In this kind of scheme a spread spectrum two-dimensional signal (the watermark) carrying some hidden information is added to the original image.

The procedure followed to generate a watermark can be summarized as follows (Fig. 2). Suppose that we want to hide $N$ bits of information and the original image $\mathbf{x}$ has $L$ elements. First, a pseudorandom sequence $s[k]$ is generated using a pseudonoise generator initialized to a state which depends on the value of the key $K$. To guarantee invisibility, this sequence is multiplied element by element by a perceptual mask $\alpha[k]$ obtained after analyzing the original image employing a psychovisual model. This mask takes care of the fact that alterations performed to different elements of $\mathbf{x}$ have different influences on the overall perceptual distortion. Next, the set of indexes $\{1, \ldots, L\}$ is partitioned into $N$ subsets that we will denote by $S_1, \ldots, S_N$, satisfying $S_i \cap S_j = \emptyset$, $i \neq j$. Each of these sets represents the elements of $\mathbf{x}$ in which a certain information bit is going to be hidden. The partition can in general be key dependent to provide an additional level of resilience to attacks directed against specific hidden information bits. Then, the final expression of the watermark is

$$w[k] = b_i \, \alpha[k] \, s[k] \tag{6}$$

where $k$ is any index in $S_i$, and $b_i$ is a coefficient used to encode the $i$th bit of the hidden message. Finally, the watermarked image $\mathbf{y}$ is obtained by adding the watermark to the original image. Using vector notation, the watermark can be expressed as

$$\mathbf{w} = \mathbf{P}\,\mathbf{b} \tag{7}$$

where $\mathbf{b} = (b_1, \ldots, b_N)^T$ encodes the hidden message and $\mathbf{P}$ is an $L \times N$ matrix whose elements satisfy

$$p_{k,i} = \begin{cases} \alpha[k]\, s[k], & \text{if } k \in S_i \\ 0, & \text{otherwise.} \end{cases} \tag{8}$$

The columns of this matrix, $\mathbf{p}_1, \ldots, \mathbf{p}_N$, will hereafter be called modulation pulses, since the watermark can be expressed as a linear combination of them and can thus be compared to multipulse amplitude modulation schemes used in communications [45], [46]. The resulting watermarked image is $\mathbf{y} = \mathbf{x} + \mathbf{P}\mathbf{b}$. Equation (6) indicates that in the watermark generation process the message vector $\mathbf{b}$, defined in a space with $N$ dimensions, is mapped in a pseudorandom fashion onto a space with many more dimensions ($L \gg N$). This is what in communications theory is called "spreading the spectrum." The high degree of redundancy introduced in this transformation and the dependence of the mapping on the value of a secret key, only known to the copyright owner, are the facts that provide the robustness necessary to resist both unintentional alterations and malicious attacks.
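The generation procedure of this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the image is a flat list of integers, the perceptual mask is a constant (a stand-in for a real psychovisual model), the pseudorandom sequence takes values in {-1, +1}, the bit coefficients are antipodal, and the function name `generate_watermark` and the key-seeded `random.Random` generator are hypothetical choices.

```python
import random

def generate_watermark(bits, mask, key):
    """Build w[k] = b_i * alpha[k] * s[k] for every k in subset S_i."""
    n_elems, n_bits = len(mask), len(bits)
    rng = random.Random(key)                           # key-seeded pseudonoise generator (assumed)
    s = [rng.choice((-1, 1)) for _ in range(n_elems)]  # pseudorandom sequence s[k]
    # Key-dependent partition: subset_of[k] = i means index k belongs to S_i.
    subset_of = [k % n_bits for k in range(n_elems)]
    rng.shuffle(subset_of)
    b = [1 if bit else -1 for bit in bits]             # antipodal bit coefficients (assumed)
    w = [b[subset_of[k]] * mask[k] * s[k] for k in range(n_elems)]
    return w, s, subset_of

image_rng = random.Random(0)
x = [image_rng.randint(0, 255) for _ in range(1024)]   # toy "image" as a flat vector
alpha = [2] * len(x)                                   # constant mask, not a psychovisual model
w, s, subset_of = generate_watermark([1, 0, 1, 1], alpha, key=1234)
y = [xk + wk for xk, wk in zip(x, w)]                  # watermarked image y = x + w
```

Because the generator is seeded by the key, the same key always reproduces the same sequence and partition, which is what lets a verifier regenerate the modulation pulses without storing them.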

IV. WATERMARK DETECTION AND DECODING WITH KNOWN IMAGE STATISTICS

Given an image under test $\mathbf{z}$ and a key $K$, and assuming that the watermarked image has suffered neither attacks nor unintentional distortions, the watermark detection test can be formulated as the binary hypothesis test

$$H_1: \mathbf{z} = \mathbf{x} + \mathbf{P}(\mathbf{x})\,\mathbf{b} \qquad\qquad H_0: \mathbf{z} = \mathbf{x} \tag{9}$$

where $\mathbf{x}$ and $\mathbf{z}$ are images, $\mathbf{b}$ is the message vector, and $\mathbf{P}(\mathbf{x})$ is the matrix of modulation pulses. The goal of the watermark detection test is to decide whether the image $\mathbf{z}$ contains a watermark generated by the copyright holder who possesses the key $K$. It is not necessary to decode the hidden message. Therefore, $\mathbf{b}$ must be regarded as a random vector with a probability mass function (pmf) equal to the probability distribution of the messages. Even though $K$ is known (it is the key under test), the matrix $\mathbf{P}$ is a function of the original image, which is supposed to be unknown. For each possible value of the message vector $\mathbf{b}$ there is a unique value of $\mathbf{x}$ satisfying the equation $\mathbf{z} = \mathbf{x} + \mathbf{P}(\mathbf{x})\mathbf{b}$. The functional relationship between $\mathbf{P}$ and $\mathbf{x}$ is usually quite complex, so it is in practice very difficult to know the set of possible original images that could have been mapped onto $\mathbf{z}$ during the watermarking process. We can alternatively assume that $\mathbf{P}(\mathbf{x})$ is approximately the same for all the possible values of $\mathbf{x}$ and substitute $\mathbf{P}(\mathbf{x})$ by $\mathbf{P}(\mathbf{z})$. This is a reasonable approximation since the distortions introduced by the watermark are small, so they are expected to produce negligible alterations in the perceptual mask. Then, the test in (9) is approximately equivalent to the binary hypothesis test

$$H_1: \mathbf{z} = \mathbf{x} + \mathbf{P}(\mathbf{z})\,\mathbf{b} \qquad\qquad H_0: \mathbf{z} = \mathbf{x} \tag{10}$$

and the dependence of $\mathbf{P}$ on $\mathbf{x}$ has disappeared. Let $\hat{H}$ be the decision made in the watermark detection test (the detector function in Fig. 1). The probability of false alarm, defined as $P_F = \Pr\{\hat{H} = H_1 \mid H_0\}$, will be required to lie below a certain maximum value to guarantee that the watermarking system is reliable. The design of the detector structure will be aimed at maximizing the probability of detection $P_D = \Pr\{\hat{H} = H_1 \mid H_1\}$ which corresponds to the maximum allowable $P_F$.

In the watermark decoding test it is assumed that the image under test $\mathbf{z}$ does contain a watermark belonging to the copyright holder who possesses the secret key under test. For this reason, the decoding process is performed only if $H_1$ is decided in the watermark detection test. The goal of the decoder is to obtain an estimate $\hat{\mathbf{b}}$ of the message vector $\mathbf{b}$ in such a way that the probability of error is minimized. Following similar arguments as in the discussion on the watermark detection test, the matrix $\mathbf{P}(\mathbf{x})$ can be approximated by $\mathbf{P}(\mathbf{z})$.

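Assuming that the key $K$ is fixed, there are two random vectors in the statistical decision tests we have just formulated: the original image $\mathbf{x}$ and the message vector $\mathbf{b}$. If the distribution of $\mathbf{x}$ is known, the optimum decision tests are given by the Neyman–Pearson rule [47]. In the watermark detection test, for example, the optimum detector which maximizes the $P_D$ conditioned to a given $K$ for any value of $P_F$ is given by the test

$$\frac{f(\mathbf{z} \mid H_1, K)}{f(\mathbf{z} \mid H_0, K)} \underset{H_0}{\overset{H_1}{\gtrless}} \eta \tag{11}$$

where $\eta$ is a threshold. If the messages are assumed to be equiprobable, then all message vectors $\mathbf{b}$ have the same probability. Then, the optimum watermark decoder which minimizes the probability of error conditioned to $K$ is given by the maximum likelihood structure. The output of the decoder is therefore

$$\hat{\mathbf{b}} = \arg\max_{\mathbf{b}} f(\mathbf{z} \mid \mathbf{b}, K) \tag{12}$$

Expressions (11) and (12) define the detector and decoder functions, respectively, represented in Fig. 1.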

Once these functions have been designed, it is interesting to study the influence of image characteristics on the performance. A measure of the goodness of an image $\mathbf{x}$ for watermarking purposes can be obtained by conditioning the probabilities $P_F$, $P_D$, and $P_e$ to $\mathbf{x}$ and treating the secret key $K$ and the message vector $\mathbf{b}$ as the only random variables in the system. In other words, we can regard a given original image as a fixed deterministic vector and model the pseudorandom sequence and the partition statistically. This leads to a statistical model of the matrix $\mathbf{P}$, which can be used to compute the conditional probabilities of false alarm and detection

$$P_F(\mathbf{x}) = \Pr\{\hat{H} = H_1 \mid H_0, \mathbf{x}\} \tag{13}$$

$$P_D(\mathbf{x}) = \Pr\{\hat{H} = H_1 \mid H_1, \mathbf{x}\} \tag{14}$$

and the conditional probability of error

$$P_e(\mathbf{x}) = \Pr\{\hat{\mathbf{b}} \neq \mathbf{b} \mid \mathbf{x}\} \tag{15}$$

where $\hat{H}$ denotes the detector decision and $\hat{\mathbf{b}}$ the decoder output. Given $\mathbf{x}$, $P_F(\mathbf{x})$ indicates the proportion of keys that, after being applied in the detection test performed to the original image $\mathbf{x}$, yield a positive result. The probability $P_D(\mathbf{x})$ gives the proportion of keys that, after being applied in the watermarking and detection processes, yield a positive result. The probability $P_e(\mathbf{x})$ indicates the proportion of keys for which a decoding error occurs when applied in the watermarking and decoding processes.

Usually the log-likelihood functions involved in the watermark detector and decoder can be expressed as a function of a vector of sufficient statistics $\mathbf{r}$. In this case, the conditional probabilities $P_F(\mathbf{x})$, $P_D(\mathbf{x})$, and $P_e(\mathbf{x})$ can be evaluated by studying the conditional distributions of $\mathbf{r}$ given $\mathbf{x}$ under each hypothesis.

V. WATERMARK DETECTION AND DECODING WITH UNKNOWN IMAGE STATISTICS

In some cases the distribution of the original image $\mathbf{x}$ may be unknown or difficult to approximate. For example, there are no satisfactory statistical models for images in the spatial domain. The approach discussed above for the design of the watermark detector and decoder is not applicable in such a situation.

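In Section III, we saw how in the watermark generation process the message vector $\mathbf{b}$ was mapped onto a vector with a much higher number of dimensions. Therefore, it is reasonable to apply a transformation to reduce the number of dimensions of the image under test $\mathbf{z}$ and then analyze the statistical decision problem in the transformed space. One such transformation is, for instance

$$r_i = \sum_{k \in S_i} z[k]\, \alpha[k]\, s[k], \qquad i = 1, \ldots, N \tag{16}$$

which is nothing but the correlation receiver applied in spread spectrum communications [43], [44]. Here $\alpha[k]$ is the perceptual mask, $s[k]$ the pseudorandom sequence, and $S_i$ the set of indexes carrying the $i$th bit. Expressed in vector notation, with $\mathbf{P}$ the matrix of modulation pulses,

$$\mathbf{r} = \mathbf{P}^T \mathbf{z} \tag{17}$$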
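A correlation receiver of this kind can be sketched as follows. The sketch assumes integer-valued toy data, a {-1, +1} pseudorandom sequence, and antipodal bit coefficients; the helper name `correlate` and the random partition are illustrative choices, not taken from the paper. It checks one property that follows directly from the additive model: embedding the watermark shifts each correlator output by the bit coefficient times the mask energy in the corresponding subset.

```python
import random

def correlate(z, alpha, s, subset_of, n_bits):
    """r_i = sum over k in S_i of z[k] * alpha[k] * s[k]."""
    r = [0] * n_bits
    for k, zk in enumerate(z):
        r[subset_of[k]] += zk * alpha[k] * s[k]
    return r

rng = random.Random(7)
n_elems, n_bits = 2048, 4
x = [rng.randint(0, 255) for _ in range(n_elems)]            # toy original image
alpha = [rng.choice((1, 2, 3)) for _ in range(n_elems)]      # toy perceptual mask
s = [rng.choice((-1, 1)) for _ in range(n_elems)]            # pseudorandom sequence
subset_of = [rng.randrange(n_bits) for _ in range(n_elems)]  # index k belongs to S_i
b = [1, -1, -1, 1]                                           # antipodal bit coefficients
y = [x[k] + b[subset_of[k]] * alpha[k] * s[k] for k in range(n_elems)]

r_marked = correlate(y, alpha, s, subset_of, n_bits)
r_clean = correlate(x, alpha, s, subset_of, n_bits)
# Mask energy per subset: sum of alpha[k]^2 over k in S_i.
energy = [sum(alpha[k] ** 2 for k in range(n_elems) if subset_of[k] == i)
          for i in range(n_bits)]
shift = [rm - rc for rm, rc in zip(r_marked, r_clean)]       # equals b_i * energy_i
```

The residual `r_clean` is the contribution of the original image, which acts as noise in the decision problem; the watermark only contributes the deterministic shift.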

In practice we will make some assumptions about the distribution of the original image $\mathbf{x}$ to design the dimensionality reduction function. In some situations it is possible to exploit statistical properties such as ergodicity and quasi-stationarity to obtain estimates of the first- and second-order moments of $\mathbf{x}$. In those cases, even though the exact shape of the distribution of the original image is not known, at least means, variances, and cross covariances can be approximated and this information can help to improve substantially the performance associated with the dimensionality reduction transformation.

Fig. 3. Dimensionality reduction and watermark detection.

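For instance, a key-independent linear minimum mean square error (MMSE) estimate of the watermark $\mathbf{w}$ can be computed before entering the correlation receiver. This filtering operation can considerably reduce the noise contribution due to the original image. Let $\mathbf{m}_x$ be the mean of the original image $\mathbf{x}$ and let $\mathbf{K}_x$ be the covariance matrix of $\mathbf{x}$, defined as $\mathbf{K}_x = E[(\mathbf{x} - \mathbf{m}_x)(\mathbf{x} - \mathbf{m}_x)^T]$. If we assume that the watermarked image does not suffer any alteration, then $\mathbf{z} = \mathbf{x} + \mathbf{w}$. As we said before, $\mathbf{w}$ is actually a function of $\mathbf{x}$ since the perceptual mask is computed from it. We can, however, assume that the perceptual mask is approximately the same for all the possible values of $\mathbf{x}$ that can be mapped onto $\mathbf{z}$ after being watermarked with a certain pair $K$, $\mathbf{b}$. Thus, we can assume that $\mathbf{x}$ and $\mathbf{w}$ are statistically independent. Under this approximation, the optimum linear MMSE estimator of $\mathbf{w}$ is

$$\hat{\mathbf{w}} = \mathbf{m}_w + \mathbf{K}_w (\mathbf{K}_x + \mathbf{K}_w)^{-1} (\mathbf{z} - \mathbf{m}_x - \mathbf{m}_w) \tag{18}$$

where $\mathbf{m}_w$ is the mean value of $\mathbf{w}$ and $\mathbf{K}_w$ is its covariance matrix. Then, the number of dimensions can be reduced by applying the correlation receiver as we did before, obtaining as a result

$$\mathbf{r} = \mathbf{P}^T \hat{\mathbf{w}} \tag{19}$$

where $\mathbf{P}$ is the matrix of modulation pulses. Note that in practice the moments of $\mathbf{x}$ will be estimated from $\mathbf{z}$ since $\mathbf{x}$ is not available in the watermark verification process. In fact, a reasonable approximation is to use the corresponding moments of $\mathbf{z}$. We can also estimate the moments of $\mathbf{w}$ from the image under test, since a good approximation to the perceptual mask can be obtained from $\mathbf{z}$.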
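To make the filtering step concrete, the following sketch works out the simplest special case of a linear MMSE estimate, assuming the quantity being estimated is the watermark. The assumptions are strong and are not claimed by the paper: image samples are treated as uncorrelated with a common mean `mean_x` and variance `var_x`, and the watermark is zero mean with per-sample variance equal to the squared mask value (as it would be for a {-1, +1} sequence). Under these assumptions the matrix estimator collapses to a scalar gain per sample. The function name is hypothetical.

```python
def mmse_watermark_estimate(z, alpha, mean_x, var_x):
    """Per-sample linear MMSE estimate of w from z = x + w (diagonal covariances)."""
    estimate = []
    for zk, ak in zip(z, alpha):
        var_w = ak * ak                     # Var(w[k]) for a zero-mean +/-1 sequence times alpha[k]
        gain = var_w / (var_x + var_w)      # scalar Wiener-type gain
        estimate.append(gain * (zk - mean_x))
    return estimate

# Three samples around a known mean of 128 with image variance 12 and mask value 2.
w_hat = mmse_watermark_estimate([128.0, 130.0, 126.0], [2.0, 2.0, 2.0],
                                mean_x=128.0, var_x=12.0)
```

In a real image the mean and variance would be local, position-dependent estimates taken from the image under test, so this scalar form is only a starting point.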

Once the dimensionality reduction function is defined, it is possible to study the statistical properties of its output $\mathbf{r}$ for a fixed original image $\mathbf{x}$, assuming that the key $K$ is taken at random. Then we can make use of this statistical characterization to design optimum detector and decoder structures. In Fig. 3 we represent graphically the scenario corresponding to the watermark detection process. Suppose we choose a certain original image $\mathbf{x}$. If it is watermarked (hypothesis $H_1$) taking some pair $K$, $\mathbf{b}$ at random, the resulting vector $\mathbf{y}$ has a distribution that corresponds to the transformation of the distribution of $K$ and $\mathbf{b}$ when the watermarking function is applied. After passing through the dimensionality reduction function, the resulting observation vector $\mathbf{r}$ will have some conditional distribution $f(\mathbf{r} \mid H_1, \mathbf{x})$.

Fig. 4. Dimensionality reduction and watermark decoding.

If $\mathbf{x}$ is not watermarked (hypothesis $H_0$), the application of the dimensionality reduction function will result in an observation vector $\mathbf{r}$ with a conditional distribution $f(\mathbf{r} \mid H_0, \mathbf{x})$, which ideally can be evaluated by applying the transformation to the distribution of $K$ (note that the function is independent of $\mathbf{b}$). Given an observed vector $\mathbf{r}$, the detector which maximizes the conditional probability of detection $P_D(\mathbf{x})$ for every value of the conditional probability of false alarm $P_F(\mathbf{x})$ is given by

$$\frac{f(\mathbf{r} \mid H_1, \mathbf{x})}{f(\mathbf{r} \mid H_0, \mathbf{x})} \underset{H_0}{\overset{H_1}{\gtrless}} \eta \tag{20}$$

where $\eta$ is a threshold. Even though the detector depends on $\mathbf{x}$, which is not available during the watermark verification process, in practice only some "macroscopic" properties of $\mathbf{x}$ will be required, and it will be possible to estimate them from $\mathbf{z}$. Furthermore, certain families of dimensionality reduction functions, for example those whose outputs are sums over many elements of the image, allow the application of the central limit theorem to approximate the distribution of $\mathbf{r}$ by a Gaussian distribution, since the elements of the watermark are statistically independent if the pseudorandom sequence is modeled as an independent identically distributed (i.i.d.) sequence. This kind of function appears, for instance, when the modulation pulses are sparsely spread and the image samples are assumed to be statistically independent if they are not too close.
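When the Gaussian approximation holds, fixing the probability of false alarm determines the threshold, and the probability of detection follows from it. The sketch below illustrates this for a single scalar statistic, which is a simplification of the vector case treated in the text. It assumes the statistic is Gaussian with zero mean under the unwatermarked hypothesis, mean `mu` under the watermarked hypothesis, and the same standard deviation `sigma` in both cases; these parameters and the function names are illustrative assumptions.

```python
from statistics import NormalDist

def threshold_for_false_alarm(p_fa, sigma):
    """Threshold eta such that Pr{r > eta | H0} = p_fa when r ~ N(0, sigma^2)."""
    return NormalDist(0.0, sigma).inv_cdf(1.0 - p_fa)

def detection_probability(eta, mu, sigma):
    """Pr{r > eta | H1} when r ~ N(mu, sigma^2)."""
    return 1.0 - NormalDist(mu, sigma).cdf(eta)

# Fix a very low false-alarm probability, then see what detection probability results.
eta = threshold_for_false_alarm(1e-6, sigma=1.0)
p_d = detection_probability(eta, mu=8.0, sigma=1.0)
```

This mirrors the procedure described in the introduction: the false-alarm probability is set first, and the system is then judged by the detection probability it achieves at that operating point.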

A similar approach can be undertaken for the design of the watermark decoder. In this case the scenario is represented in Fig. 4. After watermarking the original image $\mathbf{x}$ with a given message vector $\mathbf{b}$ and a random secret key $K$, the observation vector $\mathbf{r}$ has a distribution $f(\mathbf{r} \mid \mathbf{b}, \mathbf{x})$ that can be evaluated by applying the transformation to the distribution of $K$. Therefore, given an observed vector $\mathbf{r}$, the optimum decoder which minimizes the conditional probability of error $P_e(\mathbf{x})$ assuming that all codewords have the same a priori probability is given by the maximum likelihood (ML) decoder

$$\hat{\mathbf{b}} = \arg\max_{\mathbf{b}} f(\mathbf{r} \mid \mathbf{b}, \mathbf{x}) \tag{21}$$

In general, different decoders are associated with different images $\mathbf{x}$, since the probability density functions (pdf's) are conditioned to $\mathbf{x}$. Even though the original image is assumed to be unknown during the decoding process, the decoder will actually depend on a few image-dependent parameters that can be estimated from the image $\mathbf{z}$ under test. The same kind of approximations resulting from the application of the central limit theorem can be made for certain dimensionality reduction functions, as in the discussion of the watermark detector. It is even possible that, by exploiting this kind of approximation, a decoder independent of $\mathbf{x}$ results (see Section VIII).
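The idea of a probability of error conditioned to a fixed image, with the key as the random element, can be illustrated with a small Monte Carlo experiment. The sketch below is not the decoder derived in the paper: it assumes a correlation receiver followed by a sign decision, which coincides with ML decoding only under a symmetric, equal-variance Gaussian model of the correlator outputs. The constant mask, the per-sample random subset assignment, and both function names are illustrative assumptions.

```python
import random

def decode_error(x, alpha, bits, key):
    """Watermark x with a key-seeded sequence, correlate, decode by sign.

    Returns True if at least one bit is decoded incorrectly.
    """
    rng = random.Random(key)
    n_bits = len(bits)
    r = [0.0] * n_bits
    for xk in x:
        sk = rng.choice((-1, 1))            # pseudorandom sequence sample
        i = rng.randrange(n_bits)           # key-dependent subset S_i for this index
        yk = xk + bits[i] * alpha * sk      # watermarked sample
        r[i] += yk * alpha * sk             # correlator output for bit i
    decoded = [1 if ri >= 0 else -1 for ri in r]
    return decoded != bits

def error_rate_over_keys(x, alpha, bits, n_keys):
    """Fraction of keys 0..n_keys-1 for which decoding fails on the fixed image x."""
    errors = sum(decode_error(x, alpha, bits, key) for key in range(n_keys))
    return errors / n_keys

img_rng = random.Random(3)
x = [img_rng.uniform(-5.0, 5.0) for _ in range(1024)]   # low-power, zero-mean toy image
p_e_hat = error_rate_over_keys(x, alpha=2.0, bits=[1, -1, 1, -1], n_keys=200)
```

Running the same function on a toy image with much larger sample values gives a visibly higher error rate, which is the image dependence that the conditional probability of error is meant to capture.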

VI. THE IMPACT OF DISTORTIONS AND ATTACKS

In Sections IV and V, we have assumed that the watermarked image did not suffer any alteration during distribution. However, in practice the watermarked image may be altered either on purpose or accidentally by linear filtering distortions, cropping, scaling, rotations, etc., and the watermarking system should still be able to detect and extract the watermark. The distortions are limited to those not producing excessive degradations, since otherwise the image would become unusable. Distortions and attacks introduce an additional transformation between watermarking and verification that changes the statistical distributions of the observations involved in the watermark detection and decoding tests. As a consequence, the performance of these tests can be degraded. Given a watermarking system with a certain structure, the goal of an attacker is to alter the image in such a way that it is not severely distorted and the distribution of the observations is transformed so that the probability of detection is decreased. The main obstacle the attacker must deal with is the uncertainty about the value of the secret key used by the copyright owner.

The ideal solution against attacks is the application of robust statistical decision theory [48], [49]. Instead of deriving the optimum watermark detector and decoder for certain distributions of the observation vector, robust detection and decoding devices are designed to maximize the worst-case performance, associated with the worst-case attack. Robustness criteria can also be applied to the watermarking system as a whole, searching simultaneously for the watermarking function and the watermark decoder and detector (Fig. 1) such that the worst-case performance is optimized. However, it is difficult to model all the possible attacks that can appear in a practical situation. For this reason, the application of robust statistical theory to the design of watermarking schemes is an ambitious task that can hardly provide useful results.

The most harmful attack against an image watermarking system based on additive spread spectrum watermarks consists of geometrical transformations such as scaling and rotation in the spatial domain. The sensitivity of the watermark detector to this kind of manipulation is due to the white nature of the pseudorandom sequence: with such a sequence, a slight mismatch between the modulation pulses generated during the verification test and the ones actually present in the image drastically degrades the distribution of the observations when the watermark is present, so that it can even become indistinguishable from their distribution when the watermark is absent. A promising approach to achieve robustness against geometrical distortions is the use of transforms invariant to rotation and scaling [50].
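The sensitivity to misalignment described above can be illustrated with a minimal sketch. It assumes a one-dimensional white sequence with levels −1 and +1 and uses a one-sample cyclic shift as a stand-in for a geometrical distortion; the function names and the key value are hypothetical and chosen only for this illustration.

```python
import random

def white_sequence(length, key):
    """Key-seeded i.i.d. sequence with equiprobable levels -1 and +1."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(length)]

def normalized_correlation(a, b):
    """Average of the element-wise product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

def shifted(seq, offset):
    """Cyclically shift a sequence by `offset` samples (toy misalignment)."""
    offset %= len(seq)
    return seq[-offset:] + seq[:-offset] if offset else list(seq)

s = white_sequence(4096, key=1234)
aligned = normalized_correlation(s, s)                 # every product is +1
misaligned = normalized_correlation(s, shifted(s, 1))  # products are random signs
print(aligned, misaligned)
```

With perfect alignment the normalized correlation is one, while a single-sample offset leaves only a small random residue, which is the behavior the text attributes to white sequences.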

VII. ERROR-PROTECTION CODES

The performance of the watermark decoder can be improved if channel codes are used to encode the hidden messages carried by the watermark. Consider the message vectors associated with the possible messages that can be encoded by the watermark employing the generation mechanism explained in Section III. The watermarks that result after multiplying each of these vectors by the matrix of pulses can be seen as points in a high-dimensional space. A channel encoder basically transforms these points into a different set of points in such a way that the distance between any two of them is increased [51]. Placing the watermarks farther from each other in the space helps to reduce the probability of error in the watermark decoding stage. In fact, the watermark generation procedure presented in Section III can be seen as an encoding scheme that places the codewords in the subspace spanned by the pulse vectors.

One of the problems that image watermarking has to deal with is the extremely low signal-to-noise ratio (SNR) in each dimension of the image space. In other words, the power of the original image, which is unknown to the decoder, is much stronger than the power of the watermark. As a consequence, an acceptable probability of error in watermark decoding can be achieved only by adding a large amount of redundancy during the encoding process. For this reason, the number of hidden message bits that can be embedded into an image is limited, depending on the size of the image and its power.

Ideally, channel codes should be designed with as many degrees of freedom as dimensions are available. Unfortunately, this is a difficult task considering the low SNR per dimension that usually appears. A practical approach that can be undertaken is to use a block code or a convolutional code and increase the number of pulses so that the message length is left the same.

Channel codes have been successfully used in image watermarking systems [50]. The use of Bose–Chaudhuri–Hocquenghem (BCH) and Golay codes [51], [52] in the context of spatial-domain image watermarking and the influence of parameters such as the redundancy and the minimum distance on the performance of the watermark decoder as well as the watermark detection test are studied in [53]. Promising results, not published, have also been obtained for convolutional codes.

VIII. SPATIAL DOMAIN WATERMARKING

Let us now apply the ideas presented in Sections IV and V to the analysis of spread spectrum watermarking of images defined in the spatial domain. We will thus assume in this section that the image vector represents the luminance component of a digitized image. Although we will concentrate our attention on grey-level images for simplicity, our derivations can be easily extended to color images by including in the image mathematical model three vectors, each associated with one of the three components in a luminance and color differences representation.

Unfortunately, there are no statistical distributions suitable for modeling the luminance component of common images in the spatial domain [54]. Without a satisfactory statistical model for the original image, we cannot apply decision theory as described in Section IV to the design of the optimal watermark detector and decoder structures that optimize the performance conditioned to each value of the secret key (the performance that each copyright owner sees).

However, we can project the image under test onto a subspace and apply the design techniques presented in Section V. An interesting candidate among the transformations that can be used to reduce the number of dimensions is the correlation receiver already discussed in Section V. The correlation receiver provides sufficient statistics for both the watermark detection and the watermark decoding problems when the watermark is immersed in zero-mean white Gaussian noise. For this reason, this is a reasonably good choice.

Images found in practical situations are nonstationary in the spatial domain, since the broad range of objects that can be represented in the same image may result in considerable variations in the statistical properties of luminance samples along the image. Nevertheless, in most cases the aforementioned statistical properties do not substantially differ in adjacent pixels. In other words, the image can be approximated as a quasi-stationary random process. If we also assume that it is ergodic, then we can estimate the first and second moments of the original image at each pixel by computing averages in block neighborhoods. As we said before, the original image is not available in the copyright verification process, so these statistics must be calculated from the image intended to be tested.

As discussed in Section V, knowledge about the first- and second-order moments of the original image can be used to improve the performance of detection and decoding when done in the projection subspace, even if this knowledge is an approximation to the actual values. A Wiener filter, for instance, can be used to obtain a linear MMSE estimate of the watermark. This estimate eliminates part of the original image component before the projection process, thus improving the achievable performance in detection and decoding.
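The idea of estimating local moments in block neighborhoods and using them in a Wiener-type estimate of the watermark can be sketched as follows. This is only an illustration under stated assumptions, not necessarily the filter used in the paper: the image is treated as white, the watermark variance at each pixel is taken to be the square of a given mask value, and the variance measured on the image under test is taken as the sum of the image and watermark variances. The function names and the `radius` parameter are hypothetical.

```python
def local_moments(image, radius=1):
    """Estimate mean and variance at each pixel from a square block neighborhood."""
    rows, cols = len(image), len(image[0])
    means = [[0.0] * cols for _ in range(rows)]
    variances = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            block = [image[i][j]
                     for i in range(max(0, r - radius), min(rows, r + radius + 1))
                     for j in range(max(0, c - radius), min(cols, c + radius + 1))]
            m = sum(block) / len(block)
            means[r][c] = m
            variances[r][c] = sum((v - m) ** 2 for v in block) / len(block)
    return means, variances

def wiener_watermark_estimate(image, mask, radius=1):
    """Pixelwise linear MMSE estimate of the watermark (white-image assumption)."""
    means, variances = local_moments(image, radius)
    estimate = []
    for r, row in enumerate(image):
        out = []
        for c, y in enumerate(row):
            var_w = mask[r][c] ** 2              # assumed watermark variance
            var_y = max(variances[r][c], var_w)  # guard so the gain never exceeds 1
            gain = var_w / var_y if var_y > 0 else 0.0
            out.append(gain * (y - means[r][c]))
        estimate.append(out)
    return estimate
```

In busy regions the measured variance is large, so the gain is small and most of the image component is suppressed; in flat regions the gain approaches one.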

We will also assume that the watermarked image may have been distorted by a linear filter, either as a consequence of alterations occurring during distribution or as a result of attacks aimed at destroying or corrupting the watermark. This filter can be combined with the Wiener filter discussed above to form an equivalent linear system represented by a matrix. Therefore, the signal at the input of the correlation receiver is the watermarked image after this equivalent filter. After projection, we obtain the vector

(22)

Recall that the secret key is the only random element in our model. Hence, the original image is deterministic and the matrix of pulses is random. The coefficients of the Wiener filter are computed from the image under test and depend, therefore, on the watermark it contains. This fact implies that the filter is actually random and statistically dependent on the watermark. However, considering the small alterations that watermarks produce, the variability experienced by the filter coefficients can be expected to be small. Thus, it is legitimate to assume that the filter is deterministic and independent of the watermark. After this approximation, only the matrix of pulses is random in (22).

To obtain a statistical characterization of the projected vector, we need to define a model for the matrix of pulses. As indicated in Section III, the columns of this matrix are the pulses that compose the watermark. Since these pulses are nonoverlapping, each row of the matrix has only one nonzero element for every value of the key. Furthermore, the value of the nonzero element in a given row is the product of the perceptual mask and the pseudorandom sequence at the corresponding pixel. The pseudorandom sequence is key dependent and thus must be treated as a random vector. In order to simplify the discussion, we will model it as outcomes of an i.i.d. random process with a discrete marginal distribution with two equiprobable levels {−1, 1}. The results given below can be straightforwardly extended to any kind of marginal distribution.

We will assume that the pulses are sparsely scattered over the whole image in a key-dependent pseudorandom fashion to provide diversity that strengthens the robustness to attacks directed against particular bits. We also assume that the watermark covers all the pixels of the image. Then, the sets of pixels associated with the pulses constitute a partition of the set of pixel indexes, so each image pixel is assigned to only one of those sets. This pixel assignment mechanism must be modeled as a random procedure since it is key dependent. We will assume that every pixel index is included in any of the sets with the same probability and that the assignment is performed independently for each index. As a consequence, for every value of the key, any row of the matrix of pulses has one and only one nonzero element, which belongs to any of the columns with the same probability. Under these assumptions, and after some algebraic manipulations, the first- and second-order moments of the elements of the projected vector conditioned to an original image and a message vector can be shown to be [45], [55]

(23)

Var

(24)


Cov

(25)

where are the elements of and . Inpractical situations the cross-covariance terms are negligiblecompared to the terms in the diagonal of the covariancematrix. Hence, we can assume that the elements ofareapproximately uncorrelated.

Since every modulation pulse is sparsely spread out over the whole image, if the kernel of the filter for every element is small compared to the image size, i.e., if every row of the filter matrix has only a few nonzero elements, then the elements of the projected vector can be expressed as a sum of statistically independent terms. The number of terms summed up, on average, is in practice large since a high level of redundancy is necessary if an acceptable performance is desired. Hence, we can apply the central limit theorem and assume that the projected vector is approximately Gaussian.

Thus we have come up with a Gaussian model for the observed vector that can be exploited to obtain detector and decoder structures.
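The random model just described (a key-dependent ±1 sequence, an independent and equiprobable assignment of each pixel to one set, and projection by correlation) can be sketched in a few lines. This is an illustration under stated assumptions only: the image is a one-dimensional list of zero-mean Gaussian samples (as if the local mean had already been removed), the mask is constant, no filter is applied, and all function names, sizes, and key values are hypothetical.

```python
import random

def keyed_sequence_and_partition(num_pixels, num_bits, key):
    """Draw the +/-1 sequence and the pixel-to-set assignment from a key."""
    rng = random.Random(key)
    sequence = [rng.choice((-1, 1)) for _ in range(num_pixels)]
    # Each pixel joins one of the num_bits sets independently, with equal probability.
    assignment = [rng.randrange(num_bits) for _ in range(num_pixels)]
    return sequence, assignment

def embed(image, mask, bits, sequence, assignment):
    """Add mask[k] * sequence[k] * bits[set of pixel k] to each pixel."""
    return [x + a * s * bits[i]
            for x, a, s, i in zip(image, mask, sequence, assignment)]

def project(test_image, mask, sequence, assignment, num_bits):
    """Correlate the test image with each pulse; returns one value per set."""
    r = [0.0] * num_bits
    for y, a, s, i in zip(test_image, mask, sequence, assignment):
        r[i] += y * a * s
    return r

host_rng = random.Random(0)
num_pixels, bits = 20000, [1, -1, -1, 1]
image = [host_rng.gauss(0.0, 10.0) for _ in range(num_pixels)]  # zero-mean toy host
mask = [2.0] * num_pixels
sequence, assignment = keyed_sequence_and_partition(num_pixels, len(bits), key=42)
marked = embed(image, mask, bits, sequence, assignment)
r = project(marked, mask, sequence, assignment, len(bits))
decoded = [1 if value >= 0 else -1 for value in r]
print(decoded)
```

Each projected element accumulates several thousand independent terms here, which is the situation in which the Gaussian approximation above is expected to hold.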

A. Watermark Decoder

The optimum ML watermark decoder, derived inSection V, is given by (21), where is the distri-bution at which we have just arrived. If we observe (23)and (24), assuming that ,and that the covariance matrix is approximately diagonal,we can infer that the observation vectorcan be modeledas the output of an additive white Gaussian noise (AWGN)channel, , , where

(26)

and are samples of an i.i.d. zero-mean Gaussian random process with variance

(27)

We know from communication theory that the optimum ML decoder for an AWGN channel seeks the message vector closest to the observation vector in the Euclidean distance sense. Therefore, this decoder structure minimizes the probability of error conditioned to the original image. In other words, given some original image, this decoder minimizes the chances that the key under test yields an error while extracting the hidden message.
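The nearest-codeword rule can be sketched as a brute-force search over a codebook. The sketch assumes the codewords are already expressed in the same units as the observation (i.e., scaled by the channel gain); the repetition codebook and the observation values are hypothetical examples.

```python
from itertools import product

def min_distance_decode(observation, codebook):
    """Return the codeword closest to the observation in Euclidean distance."""
    def squared_distance(codeword):
        return sum((r - v) ** 2 for r, v in zip(observation, codeword))
    return min(codebook, key=squared_distance)

# Hypothetical rate-1/3 repetition code: one message bit, three dimensions.
repetition_codebook = [(1, 1, 1), (-1, -1, -1)]
decoded = min_distance_decode((0.4, -0.1, 0.3), repetition_codebook)

# Full antipodal constellation: every combination of -1 and +1 is a codeword.
antipodal_codebook = list(product((-1, 1), repeat=3))
decoded_antipodal = min_distance_decode((0.7, -2.1, 0.05), antipodal_codebook)
print(decoded, decoded_antipodal)
```

With the full antipodal codebook the search returns the sign pattern of the observation, whereas the repetition codebook corrects the single negative sample in the first example.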

When a binary antipodal constellation is used to encode the possible messages, i.e., when all possible combinations of elements taken from {−1, 1} are valid message vectors, the minimum Euclidean distance decoder is equivalent to a bit-by-bit hard-decision decoder with the decision threshold located at the origin. Then, the output of the decoder is

sign (28)

and the probability of making an error when decoding a bit [also known as bit error rate (BER)] is (see footnote 1)

(29)

which can be easily computed from the channel parameters.
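As an illustration of how such a BER can be evaluated numerically, the sketch below computes the Gaussian tail function Q with the standard library and applies it to an antipodal AWGN channel. It assumes the textbook form in which each decoded element has mean `gain` times the bit and noise standard deviation `noise_std`; these parameter names are hypothetical stand-ins for the channel parameters of (26) and (27).

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bit_error_rate(gain, noise_std):
    """BER of sign detection for antipodal bits in additive Gaussian noise."""
    return q_function(gain / noise_std)

for ratio in (0.5, 1.0, 2.0, 3.0):
    print(ratio, bit_error_rate(ratio, 1.0))
```

The error rate falls quickly as the ratio of gain to noise standard deviation grows, which is why longer pulses (more accumulated terms per bit) improve decoding.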

B. Watermark Detector

The optimum watermark detector, whose structure has been already derived in Section V, is given by (20). An equivalent expression for this Neyman–Pearson test is

(30)

where is the Gaussian pdf that has just beenderived in the analysis of the ML watermark decoder. Inorder to obtain the pdf of under hypothesis , i.e., when

is not watermarked, let us suppose thatinstead of istested. Then, the observation vector is

(31)

In this case, it can be easily shown that the observation vector is zero mean and white, and that the variance of its elements is

(32)

Every element of can still be expressed as a sum ofindependent random variables. Therefore, under hypothesis

, the observation vector can be accurately approximatedby a zero-mean white Gaussian vector with variance givenby (32).

Footnote 1: $Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} e^{-t^{2}/2}\, dt$.

Fig. 5. PDFs involved in the watermark detection problem, panels (a) and (b).

In Fig. 5 we have represented graphically an example of the Gaussian distributions conditioned to each of the two hypotheses when there are four equiprobable messages. Given an observation, the watermark detector must decide to which of these distributions it belongs. Suppose that some of the pulses are modulated by message-independent known coefficients. These reserved pulses can be used to improve the performance of the watermark detection process since the uncertainty about the possible message vectors that the watermark may carry is thus reduced. Assume also that the remaining pulses are modulated by coefficients in a binary antipodal constellation. Then, the logarithmic function in (30) can be expressed as [45]

(33)

The probability of false alarm indicates the chance that, given a certain nonwatermarked image, the key under test yields a positive result in the watermark detection test when applied to that image. On the other hand, the probability of detection indicates the chance that, given a certain original image, the secret key under test yields a positive result in the watermark detection test when it is used both to watermark the image and to detect the presence of the watermark. Although there is no closed-form expression for the probabilities of false alarm and detection conditioned to the original image as a function of the threshold, it is possible to obtain Chernoff bounds [47]. These bounds, as well as approximations, were derived in [45].
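To give a feel for how a Chernoff bound relates to the tail probability it bounds, the sketch below compares the two for a Gaussian test statistic. This is only an illustration: the bounds derived in [45] apply to the actual log-likelihood function of the detector, which is not reproduced here, and the function names are hypothetical.

```python
import math

def gaussian_tail(threshold, mean=0.0, std=1.0):
    """Exact P(X >= threshold) for X ~ N(mean, std**2)."""
    return 0.5 * math.erfc((threshold - mean) / (std * math.sqrt(2.0)))

def chernoff_bound_gaussian(threshold, mean=0.0, std=1.0):
    """Chernoff bound on P(X >= threshold) for X ~ N(mean, std**2).

    Minimizing exp(-s * threshold) * E[exp(s * X)] over s > 0 gives
    exp(-z**2 / 2) with z = (threshold - mean) / std, valid for z >= 0.
    """
    z = (threshold - mean) / std
    return 1.0 if z <= 0 else math.exp(-0.5 * z * z)

for t in (1.0, 2.0, 4.0, 6.0):
    print(t, gaussian_tail(t), chernoff_bound_gaussian(t))
```

The bound remains computable at thresholds where the true probability is far too small to be measured by simulation, which is the regime relevant to false alarms.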

C. Attacks

Attacks suffered by the watermarked image affect the channel parameters given in (26) and (27), thus altering the achievable performance in both the watermark detection and decoding tests. Linear filtering attacks are already included in the model we assumed when we began the analysis. Their impact on performance can be studied by assigning values to the coefficients of the filter matrix. The effect of attacks in which the image is cropped, so that some watermark energy is lost, can be studied by removing from the summations in (26) and (27) those terms whose index corresponds to pixels that do not survive the attack. Undoubtedly, the most harmful kind of attack consists of geometrical transformations of the image in the spatial domain [56]. If the watermarked image is either scaled or rotated, there will be a mismatch between the modulation pulses generated during the projection process and the pulses that are actually in the image under test. Given the white nature of the pseudorandom sequence from which these pulses are generated, we can expect a rapid degradation of the equivalent channel parameters in terms of the signal-to-noise ratio as we increase or decrease the rotation angle or the scaling factor.

A countermeasure to weaken the effect of attacks based on geometrical transformations is the design of a spatial synchronization algorithm that estimates the transformation suffered by the image. Since the original image is not available, only knowledge about the watermark can be exploited to recover the size and orientation that it had before the attack. Consider a vector of parameters defining the geometrical transformation that was performed by the attacker. An estimate of this vector can be obtained if we search for the vector of parameters that, after being applied to transform the modulation pulses locally generated in the projection process, maximizes the log-likelihood function in (30). In mathematical notation [45]

(34)

However, this exhaustive search technique is impractical due to the narrowness of the peak of the log-likelihood function, which is a consequence of the white nature of the watermark. Resilience to scaling and rotation distortions is in fact still a challenging problem in image watermarking.
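The exhaustive search and the narrow peak it has to find can be illustrated with a toy one-dimensional version. The sketch makes several simplifying assumptions that are not in the paper: the transformation is a cyclic shift described by a single integer parameter, plain correlation is used as a stand-in for the log-likelihood function in (30), and the host signal is zero-mean Gaussian. All names and values are hypothetical.

```python
import random

def correlation_score(test_signal, pulse, shift):
    """Correlate the pulse, cyclically moved by `shift`, with the test signal."""
    n = len(pulse)
    return sum(test_signal[(k + shift) % n] * pulse[k] for k in range(n))

def estimate_shift(test_signal, pulse, candidate_shifts):
    """Exhaustive search: return the candidate shift with the highest score."""
    return max(candidate_shifts,
               key=lambda d: correlation_score(test_signal, pulse, d))

rng = random.Random(7)
n, true_shift = 4096, 5
pulse = [rng.choice((-1, 1)) for _ in range(n)]
host = [rng.gauss(0.0, 3.0) for _ in range(n)]
marked = [h + p for h, p in zip(host, pulse)]
attacked = [marked[(k - true_shift) % n] for k in range(n)]  # toy attack

candidates = range(-8, 9)
scores = {d: correlation_score(attacked, pulse, d) for d in candidates}
estimate = estimate_shift(attacked, pulse, candidates)
print(estimate)
```

Only the exact shift produces a large score; neighboring candidates give no hint of how close they are, so the search cannot be guided and every candidate has to be tried. With two-dimensional scalings and rotations the candidate set grows accordingly.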

D. Experimental Results

To verify the validity of the model described in previous sections, analytical results have been contrasted with empirical data obtained through experimentation. We have used five test images, shown in Fig. 6. These images were chosen for their different characteristics in terms of flat areas, noisy textures, etc. In Fig. 7 we show an example of a watermark and a watermarked version for “Lena.”

The perceptual model we have chosen for the experimental work in spatial domain watermarking is described in [54], [57], and [58] and exploits the spatial masking properties of the human visual system (HVS). The perceptual mask, obtained after the analysis of the original image, indicates the maximum allowable standard deviation of noisy alterations at each pixel. The pseudorandom sequence is assumed to have a discrete marginal distribution with two levels, {−1, 1}. This means that the sequence has unit variance at every pixel, so if it is multiplied element by element by the perceptual mask the invisibility constraint will be satisfied.

Fig. 6. Original images used in the experiments, panels (a) to (e).

In our experiments, a spatially variant Wiener filter is used before the correlation receiver to eliminate part of the noise contribution due to the original image. The coefficients of this filter are computed using (18) under the assumption that the original image is white, i.e., that its covariance matrix is diagonal. The Wiener filtering operation is represented by a matrix. If we assume that the watermarked image has not suffered any alteration, then the covariance matrix of the watermarked image is the sum of the covariance matrices of the original image and the watermark, since the two are uncorrelated. Hence, the Wiener filter can be expressed as a matrix whose coefficients are

if

otherwise(35)

where is the variance of , which can be estimatedfrom the pixels in a block neighborhood around. Themean vector in (18) can be similarly estimated from.

In Figs. 8–10 we show plots of the BER associated with the watermark decoder derived in Section VIII-A as a function of the average size of the modulation pulses. In all cases the empirical measures have been obtained by taking 100 keys at random. The first figure represents the BER when the watermarked image is not altered during distribution. In Fig. 9 we have plotted the BER when an attacker adds to the watermarked image Gaussian noise whose variance at each pixel is shaped by the perceptual mask so that the perceptual distortion is minimized. Fig. 10 shows the BER that results when the image is distorted by a Wiener filter aimed at obtaining an estimate of the original image so that the watermark is partially destroyed. We can see that the analytical approximations are reasonably close to the empirical results. The dependence of the performance on the image characteristics is also evident from the plots. The shift of the theoretical curves with respect to the empirical data is due to the watermark-dependent nature of the Wiener filter coefficients, which has not been statistically modeled.

Fig. 7. Example of (a) a watermark and (b) a watermarked version for “Lena” (256 × 256).

Fig. 8. BER without attacks.

We have also measured through experimentation the performance of the watermark detector derived in Section VIII-B and contrasted the empirical data with the analytical results. Measures have been obtained by taking 400 keys at random. In all cases the watermark carries 240 bits of hidden information. In Figs. 11–15 plots of the receiver operating characteristic (ROC) are shown for all the test images. The empirical curves actually represent the experimentally measured probability of detection versus the Chernoff bound for the probability of false alarm, because the values of the probability of false alarm in the range of thresholds in which the probability of detection begins to fall are so small that they cannot be estimated through simulations. For a fair comparison of the performance results, all the images have been cropped down to a size of 128 × 128 pixels. The curves show that the Chernoff bound provides a fairly good approximation of the ROC. Comparing the figures we can see that the performance of the watermark detection test clearly depends on the characteristics of the image contents. Note for instance the difference between the ROC of “Lena,” an image with many flat regions, and “Brick” or “b32,” both with more noisy textures.

IX. DCT DOMAIN WATERMARKING

In this section we will assume that the image vector is the DCT of the luminance component of the original image, applied in blocks of 8 × 8 pixels, as in the JPEG algorithm. Then, this vector can be split into 64 vectors, each gathering the elements that correspond to one of the 8 × 8 DCT coefficients.

Fig. 9. BER with additive Gaussian noise attack.

Fig. 10. BER with Wiener filtering attack.

A. Statistical Model for the DCT Coefficients

The DCT coefficients of common images have interesting properties that can be exploited to obtain good watermark detectors and decoders. One of the most interesting characteristics of the DCT is energy compaction. In fact, it has been proved that the DCT converges to the Karhunen–Loève transform (KLT) for images that can be statistically modeled as first-order Markov processes with a correlation factor close to one [59]. As a consequence, we can assume that the DCT coefficients are uncorrelated.
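Energy compaction can be checked numerically on a single block. The sketch below implements an orthonormal two-dimensional DCT-II directly from its definition and applies it to a smooth synthetic 8 × 8 block; the ramp values are hypothetical and the direct implementation is chosen for clarity rather than speed.

```python
import math

def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def scale(u):
        return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)

    # basis[u][x] is the u-th orthonormal cosine basis vector at position x.
    basis = [[scale(u) * math.cos((2 * x + 1) * u * math.pi / (2 * n))
              for x in range(n)] for u in range(n)]
    # Transform along each row first, then along each column.
    rows_done = [[sum(basis[v][y] * block[x][y] for y in range(n))
                  for v in range(n)] for x in range(n)]
    return [[sum(basis[u][x] * rows_done[x][v] for x in range(n))
             for v in range(n)] for u in range(n)]

# Smooth synthetic block: a gentle luminance ramp on a bright background.
ramp = [[100.0 + 2.0 * x + 3.0 * y for y in range(8)] for x in range(8)]
coeffs = dct_2d(ramp)
total_energy = sum(v * v for row in ramp for v in row)
dc_fraction = coeffs[0][0] ** 2 / total_energy
print(dc_fraction)
```

For this smooth block almost all of the energy ends up in the dc coefficient, and the remainder falls into the lowest-frequency ac coefficients.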

Fig. 11. ROC for “Lena” (256 × 256), cropped down to 128 × 128 pixels, with N = 240; Ns = 0.

Fig. 12. ROC for “Tiger” (128 × 128), with N = 240; Ns = 0.

Another interesting property is that the histograms of the ac coefficients can be better approximated than luminance samples in the spatial domain by analytical expressions of known pdf’s. An early proposal as a statistical description of the ac coefficients is the Gaussian model. However, more recent studies have shown that a more accurate approximation is the generalized Gaussian pdf, given by the expression [60]

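$$f_X(x) = A\, e^{-|\beta x|^{c}} \qquad (36)$$

where both $A$ and $\beta$ can be expressed as a function of $c$ and the standard deviation $\sigma$

$$\beta = \frac{1}{\sigma} \left( \frac{\Gamma(3/c)}{\Gamma(1/c)} \right)^{1/2} \qquad (37)$$

$$A = \frac{\beta c}{2\, \Gamma(1/c)} \qquad (38)$$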


Fig. 13. ROC for “Brick” (160 × 200), cropped down to 128 × 128 pixels, with N = 240; Ns = 0.

Fig. 14. ROC for “b32” (160 × 200), cropped down to 128 × 128 pixels, with N = 240; Ns = 0.

Note that the Gaussian as well as the Laplace distributions are just special cases of this pdf, given by the shape parameter values $c = 2$ and $c = 1$, respectively. It turns out that coefficients in the low-frequency range are reasonably well modeled by a generalized Gaussian distribution and sometimes by a Laplace distribution ($c = 1$). Coefficients at high frequencies, however, are in many cases better modeled by a Laplace distribution and sometimes even by a Gaussian distribution [60]. What seems to be clear is that the Gaussian model is not a good model for DCT coefficients in most cases, especially at low and medium frequencies.
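The generalized Gaussian family and its two special cases can be evaluated with a short sketch. It assumes the common parameterization in which the density is $A\,e^{-|\beta x|^{c}}$, with $\beta = \sigma^{-1}\sqrt{\Gamma(3/c)/\Gamma(1/c)}$ and $A = \beta c / (2\Gamma(1/c))$, so that $\sigma$ is the standard deviation; the function name is hypothetical.

```python
import math

def generalized_gaussian_pdf(x, sigma, c):
    """Generalized Gaussian density with standard deviation sigma and shape c."""
    beta = math.sqrt(math.gamma(3.0 / c) / math.gamma(1.0 / c)) / sigma
    amplitude = beta * c / (2.0 * math.gamma(1.0 / c))
    return amplitude * math.exp(-abs(beta * x) ** c)

def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density, for comparison with c = 2."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def laplace_pdf(x, sigma):
    """Zero-mean Laplace density with standard deviation sigma (c = 1)."""
    return math.exp(-math.sqrt(2.0) * abs(x) / sigma) / (sigma * math.sqrt(2.0))

for c in (0.5, 1.0, 2.0):
    print(c, generalized_gaussian_pdf(0.0, 1.0, c), generalized_gaussian_pdf(3.0, 1.0, c))
```

For a fixed standard deviation, smaller values of $c$ give a sharper peak at the origin and heavier tails, which is the kind of histogram shape reported for low- and mid-frequency ac coefficients.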


Fig. 15. ROC for “Fabric” (160 × 200), cropped down to 128 × 128 pixels, with N = 240; Ns = 0.

We will assume that the DCT coefficients are statistically independent, even though this property is not necessarily implied by the fact that the DCT coefficients are approximately uncorrelated, considering that a non-Gaussian model is more accurate as an approximation to their distribution. It will also be assumed that samples of DCT coefficients at the same frequency, in different blocks, are statistically independent. Thus, we will model the elements of the image vector associated with the same DCT coefficient as outcomes from an i.i.d. random process with a generalized Gaussian marginal pdf. The shape parameter and the standard deviation, which completely specify the distribution, can be different for each DCT coefficient. The values of these parameters will be indexed by the element whose DCT coefficient they describe.

The statistical characterization of the DCT coefficients of the original image is an invaluable help for the design of satisfactory watermark detectors and decoders in terms of performance. Since the original image is unknown, the parameters defining the distribution of its DCT coefficients must be estimated from the image under test. Given the small alterations produced during the watermarking process due to the limitations imposed by the invisibility constraint, and considering that the watermark can also be modeled statistically, fairly good estimates of the distribution parameters can be obtained in practice.

B. Watermark Decoder

Let be the message vectors thatcorrespond to the possible hidden messages. Let us alsodefine the watermark obtained from each of these vectorsas , , whose elements will be

denoted by . The decoder that minimizesthe probability of error conditioned to a given value of thesecret key, assuming that all the codewordshave the samea priori probability (i.e., the ML decoder),is the one that seeks the message vectorsatisfying

(39)

Using (36) and (37), and given that the elements ofareassumed to be independent, this is equivalent to

(40)It can be shown that, assuming that the message vectorsverify , , ,the expressions

(41)

are sufficient statistics for the hidden information decodingproblem, and the ML decoder is

(42)

where . When message vectors form abinary antipodal constellation, in which all possible com-binations of elements in { 1, 1} are valid codewordsrepresenting different messages, the ML detectoris a bit-by-bit hard decisor

sign (43)


So finally we have found a transformation that, after being applied to the observed coefficients, drastically reduces the number of dimensions without losing any useful information that can help in the decoding problem. Now we can analyze the performance of the proposed watermark decoder in terms of the probability of error conditioned to a given original image. To do so, we have to switch back to a statistical model in which the original image is fixed and deterministic and only the secret key is random.
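A per-bit statistic of this kind can be sketched from the generalized Gaussian likelihood. The form used below is an assumption worked out for this illustration, not a transcription of (41): for one set of coefficients it compares the likelihood of the two bit values, which under a density proportional to $e^{-|\beta x|^{c}}$ reduces to a sum of differences of $c$-th powers. A single shape parameter `c` is assumed for all coefficients, so the constant factor relating $\beta$ to $1/\sigma$ is dropped; all names are hypothetical.

```python
def bit_statistic(coeffs, mask, sequence, sigma, c):
    """Sum over one set of (|y + a*s|**c - |y - a*s|**c) / sigma**c.

    A positive value means the coefficients are closer (in the c-th power
    sense) to the watermark with bit +1 than to the one with bit -1.
    """
    return sum((abs(y + a * s) ** c - abs(y - a * s) ** c) / sg ** c
               for y, a, s, sg in zip(coeffs, mask, sequence, sigma))

def decode_bit(coeffs, mask, sequence, sigma, c):
    """Bit-by-bit hard decision on the sign of the statistic."""
    return 1 if bit_statistic(coeffs, mask, sequence, sigma, c) >= 0 else -1

coeffs = [3.0, -1.0, 0.5]
mask = [1.0, 1.0, 2.0]
sequence = [1, -1, 1]
sigma = [2.0, 2.0, 2.0]
gaussian_case = bit_statistic(coeffs, mask, sequence, sigma, c=2.0)
heavy_tail_case = bit_statistic(coeffs, mask, sequence, sigma, c=0.5)
print(gaussian_case, heavy_tail_case)
```

For `c = 2` the statistic collapses to a scaled correlation between the coefficients and the watermark, i.e., the correlation receiver; for smaller `c` large coefficients contribute less, so strong image components do not dominate the decision.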

Assuming that the watermarked image has not sufferedany alteration, then , . Afterplugging this expression into (41), and treating the termsand the sets as the only random elements in the system,we can compute first- and second-order moments and drawsome conclusions. Let us assume, without loss of generality,that . Then

(44)

Let us define the vector , where. If the pseudorandom sequence

is modeled as outcomes of an i.i.d. random process, aswe did in Section VIII, then the mean and variance ofconditioned to a certain partition are

(45)

VarVar

(46)

If the sets are sparsely scattered over the wholeimage, and the same statistical model as that used inSection VIII is applicable here, then it can be proved aftersome algebra that

(47)

VarVar

(48)

where we have used the relations andVar Var Var . If we assumethat the pseudorandom sequence has a uniform discretemarginal distribution with two levels, {1, 1}, as we didin Section VIII, then, by observing the definition ofwecan infer that

(49)

Var (50)

and we can substitute these expressions in (47) and (48) toobtain the mean and variance of the elements of the vector

conditioned to a given original image. It can also beproved that when , the expected value of isnegative, with amplitude given by (47). The variance ofin this case is exactly the same as that given by (48).

In the definition of the sufficient statistics we can seethat they can each be expressed as a sum of statisticallyindependent terms. Therefore, if is not too large, wecan apply the central limit theorem and approximate thedistribution of by a vector Gaussian pdf.

Let us define the SNR

SNRVar

(51)

Then, under the Gaussian approximation and assuming thatmessage vectors form a binary antipodal constellation with

points, so a bit-by-bit hard decisor is used, theprobability of bit error is

SNR (52)

C. Watermark Detector

Given a certain key, the detector that maximizes the probability of detection for any desired probability of false alarm is given by (11), repeated here

(53)

Assuming equiprobable messages and using the expression of the generalized Gaussian pdf given in (36), this is equivalent to

(54)

where is the th message vector. Forsimplicity, we will concentrate on the case in which a“pure” watermark not carrying any hidden information isemployed. Then, under this assumption the log-likelihoodfunction is considerably simpler

(55)

Once we have derived the optimum structure based ona statistical characterization of the original image, wecan study the probabilities of false alarm and detectionconditioned to a given fixed original image, assumingthat the secret key is taken at random. The goal is toobtain estimates of the proportion of keys that produce afalse positive when the detection test is applied directlyto the original image and the proportion of keys thatyield a positive result when they are applied both in thewatermarking stage and the detection test. Let us firststudy the distribution of the log-likelihood function whenhypothesis is true. In this case, and assuming that the


image does not suffer any alteration, we have that ,. Hence has the form

(56)

which is a sum of statistically independent terms (the pseudorandom sequence is i.i.d.) and, applying the central limit theorem, can thus be approximated by a Gaussian random variable. Assuming that all the elements of the pseudorandom sequence have two equiprobable levels, {−1, 1}, it can be easily shown that the mean and variance of the log-likelihood function are

(57)

Var

(58)

When hypothesis is true, i.e., when ,the log-likelihood function has the form

(59)

which is also a sum of statistically independent terms.Hence is also approximately Gaussian under hypothesis

. If we compare this expression to (56), consideringthe fact that equiprobably, for any

, then we can infer that each term in the summa-tion may take the same two values as in (56), with oppositesign. Thus, the distribution of under is symmetricalto the distribution under with respect to the origin. Let

us define and Var . Then,under the Gaussian approximation, the probabilities of falsealarm and detection are

(60)

(61)

If we define the signal-to-noise ratio

SNR (62)

and we call the value such that, then the ROC is given by the expression

SNR (63)

which depends exclusively on the value of SNR. Therefore, this SNR can be used to compare the performance of the ML watermark detector for different images.
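The dependence of the ROC on a single SNR value can be sketched under the Gaussian approximation. The sketch assumes that the log-likelihood function is Gaussian with means of equal magnitude and opposite sign under the two hypotheses and the same variance, and that the SNR is defined as the squared mean divided by the variance; this definition and the function names are assumptions made for the illustration.

```python
import math
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 1.0 - _STANDARD_NORMAL.cdf(x)

def q_inverse(p):
    """Value x such that Q(x) = p, for 0 < p < 1."""
    return _STANDARD_NORMAL.inv_cdf(1.0 - p)

def detection_probability(false_alarm_probability, snr):
    """ROC point for symmetric Gaussian hypotheses with snr = mean**2 / variance.

    The two means are 2 * sqrt(snr) standard deviations apart, so the
    threshold that gives the requested false alarm rate is shifted by
    that amount when evaluated under the watermarked hypothesis.
    """
    return q_function(q_inverse(false_alarm_probability) - 2.0 * math.sqrt(snr))

for snr_db in (0.0, 6.0, 12.0):
    snr = 10.0 ** (snr_db / 10.0)
    print(snr_db, detection_probability(1e-6, snr))
```

Because the whole curve is fixed by one number, comparing that number across images or across detector designs is equivalent to comparing their ROCs.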

Fig. 16. DCT coefficients where the watermark is embedded.

D. Experimental Results

To contrast the analytical expressions with empirical results, we have performed experiments with two of the images shown in Section VIII-D, “Lena” and “Brick.” The former is a good representative of images with flat areas and sharp edges, while the latter is an example of images containing noisy textures. In all the experiments the DCT coefficients in middle frequencies shown in Fig. 16 have been altered, following the ideas presented in [61]–[63].

We assume that the perceptual mask determines the maximum amplitude distortion that each coefficient of the original image may suffer while satisfying the invisibility constraint. A good psychovisual model in the DCT domain (with 8 × 8 blocks) is essential to obtain this mask. For the work presented in this section we have followed the model proposed in [64] and [65], similar to those proposed in [66] and [67], that has also been applied to derive adaptive quantization matrices for the JPEG algorithm [68]. This model has been simplified here by disregarding the so-called contrast-masking effect, for which the perceptual mask at a certain coefficient depends on the amplitude of the coefficient itself. This effect has been taken into account by other authors [69], [70]. On the other hand, the background intensity effect, for which the mask depends on the magnitude of the dc coefficient (i.e., the background), has been taken into account. The watermark power obtained from the application of this model has been further reduced by 12 dB to introduce a certain degree of conservativeness in the watermark due to those effects that have been overlooked (e.g., spatial masking in the frequency domain [54]). In Fig. 17 we show an example of a watermark and a watermarked version for “Lena.”

In the experiments, we have used the same value of the distribution parameter $c$ for all the DCT coefficients, leaving it as a system parameter. The variance of each original image coefficient is estimated from the watermarked image. In Figs. 18 and 19 we show plots of the BER for the two test images. Both empirical curves and analytical approximations corresponding to four values of $c$ are included in each plot. In all cases, empirical measures have been performed by taking 100 different keys at random. We can see that the values computed using the analytical expressions derived in Section IX-B are good approximations of the empirical values. Note that in both cases the best performance is achieved with the same value of $c$. It is clear from the figures that by choosing an appropriate value of $c$, performance can be substantially improved. For example, in both images the decoder based on the Gaussian model for the DCT coefficients of the original image considerably degrades the BER. This suggests that the correlation receiver used in [69] and [71], which is optimum in the Gaussian case, is not a good candidate for watermark detection purposes.

Fig. 17. Example of (a) a watermark and (b) a watermarked version for “Lena” (256 × 256).

Fig. 18. BER versus pulse size for “Lena” (256 × 256).

In Figs. 20 and 21 we show in more detail how the parameter c influences the value of the SNR defined in (51). The curves have been computed using the theoretical expressions derived in Section IX-B, and the points of each curve corresponding to three values of c, including c = 1 (Laplace) and c = 2 (Gaussian), have been circled. Note that the value of c achieving the optimal performance is different in each image: the maximum SNR for “Brick” is attained at a lower value of c than for “Lena.” Again, there is a patent difference between the performance achieved with the correlation receiver (Gaussian case, c = 2) and the maximum of the curve.

Fig. 19. BER versus pulse size for “Brick” (160 × 200).

Fig. 20. SNR as a function of c for “Lena” (256 × 256).

Fig. 21. SNR as a function of c for “Brick” (160 × 200).

Table 1. Empirical and Theoretical Signal-to-Noise Ratio SNR1 (in dB)

Fig. 22. Probability of false alarm with “Lena” (256 × 256).

Fig. 23. Probability of detection with “Lena” (256 × 256).

Fig. 24. Probability of false alarm with “Brick” (160 × 200).

We have also performed experiments with the two aforementioned images to measure the performance of the watermark detector test derived in Section IX-C. In all cases, empirical measures have been obtained by taking 1000 keys at random. The DCT coefficients in which the watermark has been embedded are also the ones shown in Fig. 16.

In Table 1 we have gathered both theoretical values and empirical measures of the signal-to-noise ratio SNR1 defined in (62). As we know, this parameter completely determines the shape of the ROC, so it can be used as a performance measure for comparison purposes. We also show in Figs. 22–25 curves of the theoretical and empirical probabilities of false alarm and detection as a function of the threshold. We can see that the analytical approximations derived in Section IX-C are quite accurate. The different levels of performance achieved with different values of c are also evident. We can see that for both images the Gaussian assumption leads to the worst performance results.
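To see why a single SNR value can fix the whole ROC, consider the following sketch. It assumes that the test statistic is Gaussian with equal variance under both hypotheses and that the SNR is the squared mean shift divided by that variance; the exact definition in (62) may differ in detail, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution

def q(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 1.0 - _STD.cdf(x)

def roc_point(threshold, snr_db):
    """(P_F, P_D) for a normalized statistic: N(0, 1) under H0, N(sqrt(SNR), 1) under H1."""
    d = math.sqrt(10.0 ** (snr_db / 10.0))
    return q(threshold), q(threshold - d)

def pd_at_pf(pf, snr_db):
    """Probability of detection at a fixed false-alarm probability pf."""
    d = math.sqrt(10.0 ** (snr_db / 10.0))
    return q(_STD.inv_cdf(1.0 - pf) - d)

# Detection probability at P_F = 1e-2 for two illustrative SNR values (in dB).
pd_low = pd_at_pf(1e-2, 5.0)
pd_high = pd_at_pf(1e-2, 10.0)
```

Under these assumptions the threshold only selects a point on the curve, while the SNR selects the curve itself, which is why the SNR values in Table 1 suffice to compare detectors.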

X. CONCLUDING REMARKS

In this paper we have discussed the statistical analysis of image watermarking algorithms in which the original image is not needed during the watermark detection and extraction processes. In this context, watermarking can be seen as a communication problem in which a signal carrying some information is transmitted through a noisy channel where the noise is the original image itself, unknown to the receiver. Watermark verification can hence be seen as a statistical decision problem involving two tests: first, detect the very presence of the watermark, then estimate the information it can optionally carry.

Fig. 25. Probability of detection with “Brick” (160 × 200).

Special attention has been paid to the possible nonavailability of adequate statistical models for the original image (i.e., the channel noise). When such a characterization is impossible, a design approach based on a statistical analysis conditioned on a given original image has been proposed that allows us to derive efficient structures for watermark detection and extraction. If a reasonably accurate model exists, we have shown that it can be exploited to derive detection structures that can considerably improve the performance of the watermarking system. In both cases, a careful theoretical analysis of watermarking techniques using a statistical approach constitutes a rigorous basis for the development of adequate embedding and detection algorithms.

In addition, a theoretical analysis is, in our opinion, vital in order to get a better understanding of the different problems that arise in watermarking and to assess rigorously the suitability of different algorithms, considering the performance requirements of copyright protection applications. With these ideas in mind, we have focused on spread spectrum techniques, and we have analyzed how image characteristics, different kinds of attacks, and system parameters such as the length of the bit string carried by the watermark influence the overall performance of the system.

There are still many open research problems in the field of watermarking for copyright protection. A theoretical approach to the study of watermarking techniques will produce immediate benefits, as we have shown in this paper.

REFERENCES

[1] W. Ciciora, “Inside the set-top box,” IEEE Spectrum, vol. 32, pp. 70–75, Apr. 1995.

[2] B. Macq and J. Quisquater, “Cryptology for digital TV broadcasting,” Proc. IEEE, vol. 83, pp. 944–957, Feb. 1995.

[3] A. J. Viterbi, “Four laws of nature and society: The governing principles of digital wireless communications networks,” in Wireless Communications. Signal Processing Perspectives, H. Poor and G. Wornell, Eds. Englewood Cliffs, NJ: Prentice-Hall, 1998.

[4] R. G. van Schyndel, A. Z. Tirkel, and C. F. Osborne, “A digital watermark,” in Proc. IEEE Int. Conf. Image Processing, Austin, TX, 1994, pp. 86–89.

[5] R. B. Wolfgang and E. J. Delp, “A watermarking technique for digital imagery: Further studies,” in Proc. Int. Conf. Imaging Science, Las Vegas, NV, June/July 1997, pp. 279–287.

[6] R. B. Wolfgang and E. J. Delp, “A watermark for digital images,” in Proc. 1996 Int. Conf. Image Processing, Lausanne, Switzerland, Sept. 1996, vol. 3, pp. 219–222.

[7] K. Matsui and K. Tanaka, “Video-steganography: How to embed a signature in a picture,” in Proc. IMA Intellectual Property, Jan. 1994, vol. 1, pp. 187–206.

[8] I. J. Cox, J. Kilian, F. T. Leighton, and T. Shamoon, “Secure spread spectrum watermarking for multimedia,” IEEE Trans. Image Processing, vol. 6, pp. 1673–1687, Dec. 1997.

[9] P. Noll, “Digital audio coding for visual communications,” Proc. IEEE, vol. 83, pp. 925–943, June 1995.

[10] J. Nechvatal, “Public key cryptography,” in Contemporary Cryptology, G. Simmons, Ed. New York: IEEE Press, 1992.

[11] D. Bayer, S. Haber, and W. Stornetta, “Improving the efficiency and reliability of digital time-stamping,” in Sequences II: Methods in Communications, Security and Computer Science. New York: Springer-Verlag, 1993, pp. 329–334.

[12] S. Haber and W. Stornetta, “How to time-stamp a digital document,” J. Cryptology, vol. 3, no. 2, pp. 99–112, 1991.

[13] J. Brassil, S. Low, N. Maxemchuk, and L. O’Gorman, “Electronic marking and identification techniques to discourage document copying,” in Proc. Infocom’94, pp. 1278–1287.

[14] J. Brassil, S. Low, N. Maxemchuk, and L. O’Gorman, “Electronic marking and identification techniques to discourage document copying,” IEEE J. Select. Areas Commun., vol. 13, pp. 1495–1504, Oct. 1995.


[15] S. H. Low, N. F. Maxemchuk, J. T. Brassil, and L. O’Gorman, “Document marking and identification using both line and word shifting,” in Proc. Infocom’95, Apr. 1995.

[16] J. Brassil, S. Low, N. F. Maxemchuk, and L. O’Gorman, “Hiding information in document images,” in Proc. Conf. Information Sciences and Systems (CISS-95), Mar. 1995, Johns Hopkins Univ., Baltimore, MD, pp. 482–489.

[17] A. K. Choudhury, N. F. Maxemchuk, S. Paul, and H. G. Schulzrinne, “Copyright protection for electronic publishing over computer networks,” IEEE Network Mag., vol. 9, pp. 12–21, May/June 1995.

[18] N. F. Maxemchuk, “Electronic document distribution,” AT&T Tech. J., vol. 73, no. 5, pp. 73–80, Sept./Oct. 1994.

[19] S. H. Low and N. F. Maxemchuk, “Performance comparison of two text marking methods,” IEEE J. Select. Areas Commun., vol. 16, pp. 561–572, May 1998.

[20] S. H. Low, N. F. Maxemchuk, and A. M. Lapone, “Document identification for copyright protection using centroid detection,” IEEE Trans. Commun., vol. 46, pp. 372–383, Mar. 1998.

[21] L. Boney, A. H. Tewfik, and K. N. Hamdy, “Digital watermarks for audio signals,” in EUSIPCO’96, VIII European Signal Proc. Conf., Trieste, Italy, Sept. 1996, pp. 1697–1700.

[22] C. Neubauer, J. Herre, and K. Brandenburg, “Continuous steganographic data transmission using uncompressed audio,” in Proc. Information Hiding Workshop, Portland, OR, Apr. 1998, pp. 208–217.

[23] M. D. Swanson, B. Zhu, B. Chau, and A. H. Tewfik, “Object-based transparent video watermarking,” in Electron. Proc. IEEE SPS Workshop Multimedia Signal Processing, Princeton, NJ, June 1997.

[24] M. D. Swanson, B. Zhu, and A. H. Tewfik, “Multiresolution scene-based video watermarking using perceptual models,” IEEE J. Select. Areas Commun., vol. 16, pp. 540–550, May 1998.

[25] F. Hartung and B. Girod, “Digital watermarking of raw and compressed video,” in Digital Compression Technologies and Systems for Video Communications (SPIE Proceedings Series), vol. 2952, N. Ohta, Ed. San Diego, CA: SPIE, 1996, pp. 205–213.

[26] F. Hartung and B. Girod, “Fast public-key watermarking of compressed video,” in Proc. IEEE ICIP’97, Santa Barbara, CA, Oct. 1997, vol. I, pp. 528–531.

[27] F. Hartung, P. Eisert, and B. Girod, “Digital watermarking of MPEG-4 facial animation parameters,” Comput. & Graphics, vol. 22, no. 3, pp. 425–435, Aug. 1998.

[28] F. Hartung and B. Girod, “Copyright protection in video delivery networks by watermarking of pre-compressed video,” in Multimedia Applications, Services and Techniques – ECMAST’97 (Springer Lecture Notes in Computer Science), vol. 1242, S. Fdida and M. Morganti, Eds. Heidelberg, Germany: Springer, 1997, pp. 423–436.

[29] F. Hartung and B. Girod, “Digital watermarking of MPEG-2 coded video in the bitstream domain,” in Proc. ICASSP’97, Munich, Germany, Apr. 1997, pp. 2621–2624.

[30] G. C. Langelaar, R. L. Lagendijk, and J. Biemond, “Real-time labeling methods for MPEG compressed video,” in Proc. 18th Symp. Information Theory in the Benelux, Veldhoven, The Netherlands, May 1996, pp. 25–32.

[31] T.-L. Wu and S. F. Wu, “Selective encryption and watermarking of MPEG video,” submitted for publication.

[32] R. Ohbuchi, H. Masuda, and M. Aono, “Watermarking three-dimensional polygonal models through geometric and topological modifications,” IEEE J. Select. Areas Commun., vol. 16, pp. 551–560, May 1998.

[33] G. Bleumer, “Biometric yet privacy protecting person authentication,” in Information Hiding: Second International Workshop, D. Aucsmith, Ed. Berlin, Germany: Springer-Verlag, 1998, pp. 101–112.

[34] T. Sander and C. F. Tschudin, “On software protection via function hiding,” in Information Hiding: Second International Workshop, D. Aucsmith, Ed. Berlin, Germany: Springer-Verlag, 1998, pp. 113–125.

[35] I. J. Cox, J. Kilian, T. Leighton, and T. Shamoon, “A secure, robust watermark for multimedia,” in Information Hiding, G. Goos, J. Hartmanis, and J. Leeuwen, Eds. Berlin, Germany: Springer-Verlag, pp. 185–206.

[36] C. I. Podilchuk and W. Zeng, “Perceptual watermarking of still images,” in Electron. Proc. IEEE SPS Workshop Multimedia Signal Processing, Princeton, NJ, June 1997.

[37] D. Lynch and L. Lundquist, Digital Money: The New Era of Internet Commerce. New York: Wiley, 1995.

[38] F. A. Petitcolas, R. J. Anderson, and M. G. Kuhn, “Attacks on copyright marking systems,” in Information Hiding: Second International Workshop, D. Aucsmith, Ed. Berlin, Germany: Springer-Verlag, 1998, pp. 219–239.

[39] J.-P. M. Linnartz and M. van Dijk, “Analysis of the sensitivity attack against electronic watermarks in images,” in Information Hiding: Second International Workshop, D. Aucsmith, Ed. Berlin, Germany: Springer-Verlag, 1998, pp. 259–273.

[40] M. Maes, “Twin peaks: The histogram attack on fixed depth image watermarks,” in Information Hiding: Second International Workshop, D. Aucsmith, Ed. Berlin, Germany: Springer-Verlag, 1998, pp. 291–306.

[41] J. R. Hernandez and F. Perez-Gonzalez, “Shedding more light on image watermarks,” in Information Hiding: Second International Workshop, D. Aucsmith, Ed. Berlin, Germany: Springer-Verlag, 1998, pp. 192–208.

[42] N. Nikolaidis and I. Pitas, “Robust image watermarking in the spatial domain,” Signal Processing, vol. 66, pp. 385–404, May 1998.

[43] S. Glisic and B. Vucetic, Spread Spectrum CDMA for Wireless Communications. Norwood, MA: Artech House, 1997.

[44] A. Viterbi, CDMA. Principles of Spread Spectrum Communication. Reading, MA: Addison-Wesley, 1995.

[45] J. R. Hernandez, F. Perez-Gonzalez, J. M. Rodríguez, and G. Nieto, “Performance analysis of a 2D-multipulse amplitude modulation scheme for data hiding and watermarking of still images,” IEEE J. Select. Areas Commun., vol. 16, pp. 510–524, May 1998.

[46] B. Sklar, Digital Communications. Fundamentals and Applications. Englewood Cliffs, NJ: Prentice-Hall, 1988.

[47] H. L. Van Trees, Detection, Estimation and Modulation Theory. New York: Wiley, 1968, pt. I.

[48] S. A. Kassam and H. V. Poor, “Robust techniques for signal processing: A survey,” Proc. IEEE, vol. 73, pp. 433–481, Mar. 1985.

[49] P. J. Huber, Robust Statistics. New York: Wiley, 1981.

[50] J. J. K. O. Ruanaidh and T. Pun, “Rotation, scale and translation invariant spread spectrum digital image watermarking,” Signal Processing, vol. 66, pp. 303–318, May 1998.

[51] S. Wilson, Digital Modulation and Coding. Englewood Cliffs, NJ: Prentice-Hall, 1996.

[52] S. Lin and D. Costello, Error Control Coding: Fundamentals and Applications. Englewood Cliffs, NJ: Prentice-Hall, 1983.

[53] J. R. Hernandez, J. M. Rodríguez, and F. Pérez-González, “Improving the performance of spatial watermarking of images using channel coding,” Signal Processing, to be published.

[54] A. N. Netravali and B. G. Haskell, Digital Pictures. Representation, Compression and Standards. New York: Plenum, 1995.

[55] J. R. Hernandez, F. Perez-Gonzalez, and J. M. Rodríguez, “The impact of channel coding on the performance of spatial watermarking for copyright protection,” in Proc. ICASSP’98, Seattle, WA, May 1998, vol. 5, pp. 2973–2976.

[56] J. R. Hernandez, F. Perez-Gonzalez, and J. M. Rodríguez, “Coding and synchronization: A boost and a bottleneck for the development of image watermarking,” in Proc. COST 254 Workshop Intelligent Communications, SSGRR, L’Aquila, Italia, June 1998, pp. 77–82.

[57] J. S. Lim, Two-Dimensional Signal and Image Processing. Englewood Cliffs, NJ: Prentice-Hall, 1990.

[58] G. L. Anderson and A. N. Netravali, “Image restoration based on a subjective criterion,” IEEE Trans. Syst., Man., Cybern., vol. SMC-6, pp. 845–853, Dec. 1976.

[59] R. Clarke, “Relation between the Karhunen–Loève and cosine transforms,” Proc. Inst. Elect. Eng., vol. 128, pt. F, pp. 359–360, 1981.

[60] R. J. Clarke, Transform Coding of Images. New York: Academic, 1985.

[61] J. Zhao and E. Koch, “Embedding robust labels into images for copyright protection,” in Proc. Int. Congr. Intellectual Property Rights for Specialized Information, Knowledge and New Technologies, Vienna, Austria, Aug. 21–25, 1995, pp. 242–251.


[62] E. Koch, J. Rindfrey, and J. Zhao, “Copyright protection for multimedia data,” in Digital Media and Electronic Publishing. New York: Academic, 1996, pp. 203–213.

[63] J. J. K. O. Ruanaidh, W. J. Dowling, and F. M. Boland, “Watermarking digital images for copyright protection,” in IEE Proc. Vision, Image and Signal Processing, Aug. 1996, vol. 143, no. 4, pp. 250–256.

[64] A. J. Ahumada and H. A. Peterson, “Luminance-model-based DCT quantization for color image compression,” in Human Vision, Visual Processing, and Digital Display III (Proc. SPIE), 1992.

[65] J. A. Solomon, A. B. Watson, and A. J. Ahumada, “Visibility of DCT basis functions: Effects of contrast masking,” in Proc. Data Compression Conf., Snowbird, UT, 1994, pp. 361–370.

[66] C. I. Podilchuk and W. Zeng, “Image-adaptive watermarking using visual models,” IEEE J. Select. Areas Commun., vol. 16, pp. 525–539, May 1998.

[67] J. Delaigle, C. D. Vleeschouwer, and B. Macq, “Watermarking algorithm based on a human visual system,” Signal Processing, vol. 66, pp. 319–336, May 1998.

[68] A. B. Watson, “Visual optimization of DCT quantization matrices for individual images,” in Proc. AIAA Computing in Aerospace 9, San Diego, CA, 1993, pp. 286–291.

[69] M. Barni, F. Bartolini, V. Cappellini, and A. Piva, “A DCT-domain system for robust image watermarking,” Signal Processing, vol. 66, pp. 357–372, May 1998.

[70] A. Piva, M. Barni, F. Bartolini, and V. Cappellini, “DCT-based watermark recovering without resorting to the uncorrupted original image,” in Proc. IEEE ICIP’97, Santa Barbara, CA, vol. I, Oct. 1997, pp. 520–523.

[71] J.-P. Linnartz, T. Kalker, and G. Depovere, “Modeling the false alarm and missed detection rate for electronic watermarks,” in Information Hiding: Second International Workshop, D. Aucsmith, Ed. Berlin, Germany: Springer-Verlag, 1998, pp. 330–344.

Juan R. Hernandez (Student Member, IEEE) received the Ingeniero de Telecomunicación degree from the University of Vigo, Spain, in 1993, the M.S. degree in electrical engineering from Stanford University, Stanford, CA, in 1996, and the Ph.D. degree in telecommunications engineering from the University of Vigo, Spain, in 1998.

From 1993 to 1995, he was a member of the Department of Communication Technologies, University of Vigo, where he worked on hardware for digital signal processing and access control systems for digital television. Since 1996, he has been a member of this same department, where he is working as a Research Assistant. His research interests include digital communications and copyright protection in multimedia.

Fernando Perez-Gonzalez (Member, IEEE) received the Ingeniero de Telecomunicación degree from the University of Santiago, Spain, in 1990 and the Ph.D. degree from the University of Vigo, Spain, in 1993, both in telecommunications engineering.

He joined the faculty of the School of Telecommunications Engineering, University of Vigo, as an Assistant Professor in 1990 and is currently an Associate Professor in the same institution. He has visited the University of New Mexico, Albuquerque, NM, for different periods spanning ten months. His research interests lie in the areas of digital communications, adaptive algorithms, robust control, and copyright protection. He has been the Project Manager of different projects concerned with digital television, both for satellite and terrestrial broadcasting. He is coeditor of the book Intelligent Methods in Signal Processing and Communications (Birkhauser, 1997) and has been Guest Editor of a special section of Signal Processing magazine devoted to signal processing for communications.

