
CHAPTER 3

DIGITAL WATERMARKING SCHEMES USING DCT

The discrete cosine transform (DCT) has made its mark in audio and image watermarking. Its lower computational complexity distinguishes it from other transforms such as the DST and DFT, and makes it a better choice for decorrelating the data prior to watermark embedding. The schemes that embed data in the DCT domain differ, in principle, in the number of coefficients taken for embedding, the type of coefficients (low, middle or high frequency; AC or DC), the methodology used for embedding [51]-[53], and the way they find the coefficients that produce minimum distortion and maximum robustness.

Although DCT based watermarking schemes have low embedding complexity, using the low frequency coefficients or the DC coefficient as watermarking locations reduces imperceptibility [51][95][97]. Using selected frequency coefficients together with a better embedding strategy can provide a good balance between imperceptibility and robustness. A watermark embedded in a single coefficient may not survive attacks, but a statistical property of a group of coefficients, such as the mean or the Euclidean norm, is more likely to remain robust when used for data embedding. Robustness and security can be improved further by means of attack characterization and by embedding a permuted version of the watermark rather than the original watermark.

This chapter presents two blind digital audio watermarking schemes using the DCT for mono audio. In both schemes the watermark is embedded into a selected group of mid band coefficients of the DCT transformed audio. Selecting mid band DCT coefficients for embedding inherently makes the watermarked audio robust to filtering attacks. The modified mean or the Euclidean norm of the group of mid band DCT coefficients is quantized for watermark embedding. The mid band frequency components are selected through experiments and are the coefficients that correspond to frequencies between 5 kHz and 11 kHz. MP3 compression and other signal processing operations have minimal effect on these coefficients, as the results also show.

Embedding the watermark in the audio mainly involves preprocessing, block selection and DCT coefficient selection. The energy threshold for block selection is decided through experiments and is adaptive to different audio signals. In addition, the block size is made dynamic so that an intruder cannot identify the embedding blocks, and so that combining multiple copies of the watermarked audio does not eliminate the watermark.

A DCT based image watermarking scheme, presented at the end of the chapter, also uses the mid band DCT coefficients for watermark embedding. Unlike traditional DCT based image watermarking schemes, where the block DCT is applied directly to the image, our scheme first transforms the image into a sub band image. The sub band image is obtained through averaging and differencing of neighboring pixel values. The block DCT is then applied to the sub band image, and one watermark bit is embedded in each DCT block.

The rest of the chapter is organized as follows. Section 3.1 briefly reviews the related audio watermarking approaches that use the DCT for embedding and gives the problem statement. Sections 3.2.1 and 3.2.2 present our proposed schemes. Section 3.3 discusses the experimental results, mainly the imperceptibility and the robustness of the watermarked audio, followed by a comparison with contemporary schemes in Section 3.4. Section 3.5 presents an image watermarking scheme using mid band DCT coefficients along with its test results, followed by a summary of the chapter in the last section.

3.1 Problem Statement

In the past few years a number of audio watermarking schemes have evolved for copyright protection. The DCT based approaches are considered simple and effective because the watermark is embedded in the transform domain. They are useful because the DCT compacts the signal energy into a few coefficients and decorrelates adjacent samples within the audio.

The problem with these schemes is that they do not consider the energy of the individual blocks for watermark embedding. This matters because a watermark hidden in low energy blocks is susceptible to being removed altogether through an attack without disturbing the cover audio. Low energy blocks, when watermarked, also produce more audible disturbance in the watermarked audio. The effect of common signal processing operations, including MP3 conversion, on the different DCT coefficients and DCT blocks also needs to be analyzed. Finally, fixed size DCT blocks used for embedding are prone to watermark estimation and collusion attacks.

Our proposed schemes therefore apply a minimum energy threshold to the individual blocks before watermark embedding. The mid frequency DCT coefficients of each selected block are used for embedding the watermark bits. The embedding region is the range of frequencies from 5 kHz to 11 kHz. The lower frequency region, i.e. below 5 kHz, is perceptually highly significant, while the high frequency region is perceptually less significant and is therefore susceptible to modification by common signal processing operations such as compression. Selecting the mid band maintains a balance between robustness and imperceptibility. The number of bits embedded within one DCT block is controlled through a key, which meets the adjustability requirement.

3.2 Proposed Watermarking Schemes for Audio

3.2.1 Scheme 1: Using Quantization of the Modified Mean

The first scheme uses quantization of the modified mean of the selected mid band DCT coefficients as the embedding function. In the second scheme the embedding function uses quantization [126] of the Euclidean norm of the group of selected mid band DCT coefficients.

The original audio is divided into multiple segments in such a way that every segment is able to carry one watermark. Every segment is divided into frames of 20 ms to 25 ms, i.e. a variable block size determined by the key, and the frames that satisfy the minimum energy threshold are used for embedding the bit(s) of watermark information.

The two dimensional watermark is converted into a bit stream. If the size of the image is M x N, with M and N as the number of rows and columns respectively, it is converted into a one dimensional bit stream of length M x N as follows.

Let W be the binary watermark thus produced:

$W = \{\, w(k) : w(k) \in \{0, 1\},\ 1 \le k \le M \times N \,\}$ ---- (3.1)

with Len = length(W) = M x N.
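The conversion described above can be sketched as follows. This is a minimal illustration only: the row-by-row scan order and the helper names `image_to_bits` and `bits_to_image` are assumptions, since the text does not state how the 2-D watermark is scanned.

```python
def image_to_bits(image):
    """Flatten an M x N binary image (list of rows) into a 1-D bit list, row by row."""
    return [int(pixel) & 1 for row in image for pixel in row]


def bits_to_image(bits, rows, cols):
    """Rebuild the M x N image from a bit list produced by image_to_bits."""
    if len(bits) != rows * cols:
        raise ValueError("bit stream length must equal rows * cols")
    return [bits[r * cols:(r + 1) * cols] for r in range(rows)]


# Tiny 2 x 3 example watermark (illustrative values only).
watermark = [[1, 0, 1],
             [0, 1, 1]]
W = image_to_bits(watermark)
```

The same pair of helpers covers the reverse step used at extraction, where the recovered bit stream is reshaped into a 2-D image.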

3.2.1.1 Watermark Embedding Method

Step 1: The audio signal is sampled at a rate of 44.1 kHz. The samples are then partitioned into segments for carrying the watermark bits.

$X = \{ x(1), x(2), \ldots, x(N) \}$

//X is the entire audio signal with N samples; x(i) is the ith sample.

The segments are partitioned into sub segments of equal length L consisting of frames. Each frame has a duration of 20 to 25 ms. The samples of all the frames make up the sampled audio segment, represented through the following equation:

$S_t = \{ F_t(1), F_t(2), \ldots, F_t(n) \}$ ----- (3.2)

//S_t is the tth segment used to embed watermark bit(s), n is the number of frames in it and F_t(i) is its ith frame.

Step 2: Whether a frame is rejected or selected depends on its energy. Let X_l denote the lth frame of the segment. The average energy of the audio frame is calculated as

$E(X_l) = \sum_{i=1}^{p} A_i \cdot A_i$ ----- (3.3)

where A_i is the ith sample of the frame and p is the total number of samples within the frame, which varies with the key input. The silent portions of the audio are not considered for this step.

Step 3: The frames X_l whose energy is greater than a threshold are selected for watermark embedding.
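Steps 1 to 3 can be sketched as below. This is a hedged illustration, not the implementation used for the experiments: drawing the frame lengths from a pseudo random generator seeded with the key is an assumption (the text only says the frame size varies with the key), and the names `split_into_frames`, `frame_energy` and `select_frames` are hypothetical.

```python
import random


def split_into_frames(samples, key, fs=44100, min_ms=20, max_ms=25):
    """Cut samples into consecutive frames of 20-25 ms whose lengths are drawn
    from a PRNG seeded with the key (an illustrative assumption)."""
    rng = random.Random(key)
    lo, hi = fs * min_ms // 1000, fs * max_ms // 1000
    frames, start = [], 0
    while True:
        length = rng.randint(lo, hi)
        if start + length > len(samples):
            break  # drop the incomplete tail
        frames.append(samples[start:start + length])
        start += length
    return frames


def frame_energy(frame):
    """Frame energy as the sum of squared sample values (cf. Eq. 3.3)."""
    return sum(a * a for a in frame)


def select_frames(frames, threshold):
    """Return the indexes of the frames whose energy exceeds the threshold."""
    return [i for i, f in enumerate(frames) if frame_energy(f) > threshold]
```

Because the extractor is given the same key, it can regenerate the same frame boundaries, while someone without the key cannot easily locate the embedding blocks.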

Step 4: For watermark embedding on the lth selected sub segment, i.e. frame X_l, a p point 1-D DCT is performed:

$D_l = F(X_l)$ ----- (3.4)

where the p point 1-D DCT of the frame with p samples is given as [20]

$F(n) = w(n) \sum_{i=0}^{p-1} X_l(i) \cos\left( \frac{\pi (2i + 1) n}{2p} \right), \quad n = 0, 1, \ldots, p-1$ ------ (3.5)

where $w(n) = \sqrt{1/p}$ for n = 0 and $w(n) = \sqrt{2/p}$ for n ≠ 0.
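A direct, unoptimized sketch of the orthonormal p point DCT of Step 4 is given below, together with its inverse, which the embedding uses to return to the time domain. It is written for clarity rather than speed (a practical implementation would use a fast transform), and the function names `dct` and `idct` are chosen here for illustration.

```python
import math


def dct(frame):
    """Orthonormal p-point 1-D DCT (DCT-II) of a list of samples."""
    p = len(frame)
    coeffs = []
    for n in range(p):
        w = math.sqrt(1.0 / p) if n == 0 else math.sqrt(2.0 / p)
        total = sum(frame[i] * math.cos(math.pi * (2 * i + 1) * n / (2 * p))
                    for i in range(p))
        coeffs.append(w * total)
    return coeffs


def idct(coeffs):
    """Inverse of dct(): rebuilds the p time-domain samples."""
    p = len(coeffs)
    frame = []
    for i in range(p):
        total = 0.0
        for n in range(p):
            w = math.sqrt(1.0 / p) if n == 0 else math.sqrt(2.0 / p)
            total += w * coeffs[n] * math.cos(math.pi * (2 * i + 1) * n / (2 * p))
        frame.append(total)
    return frame
```

With this normalization the transform preserves energy, so scaling a group of coefficients changes the frame energy by a predictable amount.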

Step 5: Select the mid frequency components with the index range [low: high] for every selected frame. This mid band range is selected through experiments.

Step 6: Treat the coefficients at these indexes as a vector. Embed the watermark bit by quantizing the modified mean of the selected group of coefficients and redistributing them as follows.

The pseudo code for the EmbedUsingMean function is as follows.

Function: EmbedUsingMean (D, d, low, high, num)
D -> vector containing mid band DCT coefficients
low -> lower index corresponding to mid band.
high -> higher index corresponding to mid band.
num -> no. of watermark bits to be embedded.
//Divide the coefficients in the mid band into num segments and find the mean of each such segment.
//Let Mk be the modified mean of the kth such segment, whose coefficients are Dk.
Begin
  for k = 1 to num
    Mk = mean(abs(Dk)) ------ (3.6)
    //Two quantizers are used to embed bit 0 and bit 1 respectively. Calculate n as
    n = └Mk/¥┘ + 1
    //└ ┘ is the floor function, which returns the largest integer not greater than its argument.
    //¥ is the quantization step size that controls the imperceptibility and robustness.
    if (w(k) == 1) //w(k) is the kth bit of the watermark
      nm = n - M(n-1, 2) //Quantizer for embedding 1
    else
      nm = n - M(n, 2) //Quantizer for embedding 0
    endif
    //M is the modulo function, which returns the remainder after dividing its first argument by the second.

    xm = ¥ * nm - ¥/2 //centre of the quantization cell with index nm, so that └xm/¥┘ + 1 = nm at extraction

    //Finally the selected mid band DCT coefficients of each segment are modified
    Dk(i) = Dk(i) * xm / Mk
  End //End of for
  //Inverse DCT is applied on the DCT block to obtain the frame in the time domain again.
  //The p point inverse DCT (IDCT) is given as follows [20]
  $X_l(i) = \sum_{n=0}^{p-1} w(n)\, D_l(n) \cos\left( \frac{\pi (2i + 1) n}{2p} \right), \quad i = 0, 1, \ldots, p-1$ ------- (3.7)
  //where $w(n) = \sqrt{1/p}$ for n = 0 and $w(n) = \sqrt{2/p}$ for n ≠ 0
End //End of EmbedUsingMean

Step 7: Repeat steps 3 to 6 to embed successive watermark bits.

Step 8: Repeat steps 2 to 7 to embed multiple watermarks.

The embedding key, which is passed to the extraction module, is given by the tuple

Key = {¥, threshold, num} ------- (3.8)

3.2.1.2 Watermark Extraction

Watermark bit(s) are extracted from each frame of the sub segments. All the watermark bits retrieved from one sub segment constitute a watermark. The steps involved in retrieving the watermark are as follows.

Step 1: Steps 2 and 3 of the watermark embedding process are repeated to divide the audio into segments and each segment into frames using the key given in Eq. 3.8.

Step 2: For each frame of the sub segment, call the ExtractUsingMean function given below.

Function: ExtractUsingMean (D, d, low, high, num)
//Using Eq. 3.6, the modified mean Mk is calculated for the mid band coefficients distributed in num sections
Begin
  for k = 1 to num
    //Calculate n using the key, as in the embedding
    n = └Mk/¥┘ + 1
    //The individual watermark bit is extracted using the following rule
    hide(k) = M(n, 2)
  End //End of for
End //End of ExtractUsingMean

Step 3: Step 2 is repeated for each frame and the bits thus extracted are appended.

Step 4: The one dimensional bit stream is converted into a 2-D image.

Step 5: Repeat steps 2 to 4 to extract multiple copies of the watermark.

The number of bits per frame is decided by the value of the num parameter. num divides the mid band coefficients within a block into different segments, and each segment is then used to embed one watermark bit. Varying the value of num satisfies the adjustability requirement, as it controls the payload of the scheme. The effect of num on the SNR and on the robustness to different attacks, for a fixed frame length, is presented and discussed in Sections 3.3.1 and 3.3.2 respectively.
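The embedding and extraction of scheme 1 can be sketched together as below, operating on the DCT coefficients of one frame. This is an illustrative sketch under stated assumptions rather than the code used for the experiments: the target mean is placed at the centre of the quantization cell with index nm (so that the extraction rule reads back the same parity), a target index of 0 is moved to 2 to keep the target positive, coefficients left over when the band does not divide evenly by num are ignored, and the helper names are hypothetical.

```python
import math


def split_band(low, high, num):
    """Return num equal, contiguous index ranges inside [low, high] (inclusive)."""
    size = (high - low + 1) // num
    return [range(low + k * size, low + (k + 1) * size) for k in range(num)]


def embed_using_mean(coeffs, bits, low, high, step):
    """Embed len(bits) bits by quantizing the mean absolute value of each group."""
    out = list(coeffs)
    for idx, bit in zip(split_band(low, high, len(bits)), bits):
        mean = sum(abs(out[i]) for i in idx) / len(idx)
        if mean == 0.0:
            continue  # an all-zero group cannot be rescaled
        n = math.floor(mean / step) + 1                    # index of the current cell
        nm = n - (n - 1) % 2 if bit == 1 else n - n % 2    # odd cell for 1, even for 0
        if nm == 0:
            nm = 2                                         # assumption: keep the target positive
        target = step * nm - step / 2                      # centre of cell nm
        for i in idx:
            out[i] *= target / mean
    return out


def extract_using_mean(coeffs, low, high, num, step):
    """Recover num bits from the parity of the cell index of each group mean."""
    bits = []
    for idx in split_band(low, high, num):
        mean = sum(abs(coeffs[i]) for i in idx) / len(idx)
        bits.append((math.floor(mean / step) + 1) % 2)
    return bits
```

Placing the mean at the centre of a cell leaves a margin of half a step on either side, which is why a larger step improves robustness at the cost of SNR.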

3.2.2 Scheme II: Using Quantization of the Euclidean Norm

In scheme II, which also uses the mid band DCT coefficients, the watermark is preprocessed to make it secure. The preprocessing encrypts the watermark so that even if the extraction algorithm is known to an intruder, in the worst case he will only be able to extract the encrypted watermark, which will not reveal a meaningful watermark to him.

PREPROCESSING OF WATERMARK

For the robustness and security of the watermark, the watermark is scrambled (encrypted) using Baker's chaotic map. Other encryption techniques, such as the Arnold transform or encryption through a linear shift register, can also be used. Since Baker's map is widely used in the literature for chaotic mapping of a 2-D matrix, we also use it for this purpose.

Let the size of the image I(x, y) to be embedded be M x N, with M and N as the number of rows and columns respectively. Before the image is converted into a one dimensional binary sequence, a permuted version I`(x, y) is produced from I(x, y) using Baker's chaotic map and the security key k1. Then I`(x, y) is converted into a binary watermark given by

$W = \{\, w(k) : w(k) \in \{0, 1\},\ 1 \le k \le M \times N \,\}$ ---- (3.9)

with Len = Length(W) = M x N.

This prevents correct estimation or removal of the watermark without the secret key. What is extracted is the chaotically permuted watermark, which must be decoded to the original watermark through the key.
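The scrambling step can be illustrated with one common discretized form of the Baker map, in which a square n x n image is cut into vertical strips whose widths form the key. This is only a sketch under assumptions: the text does not say which discretization is used, how the key k1 is structured, or how a non-square watermark is handled (it would need padding or a different generalization), and the function names are hypothetical.

```python
def baker_permutation(n, key):
    """Map each pixel (r, s) of an n x n image to its new position under one
    iteration of a discretized Baker map with strip widths given by key."""
    if sum(key) != n or any(n % width for width in key):
        raise ValueError("key entries must divide n and sum to n")
    mapping = {}
    offset = 0                      # start of the current strip
    for width in key:               # width of the current strip
        q = n // width
        for r in range(offset, offset + width):
            for s in range(n):
                new_r = q * (r - offset) + s % q
                new_s = (s - s % q) // q + offset
                mapping[(r, s)] = (new_r, new_s)
        offset += width
    return mapping


def scramble(image, key, rounds=1):
    """Apply the map `rounds` times to a square image given as a list of rows."""
    n = len(image)
    mapping = baker_permutation(n, key)
    for _ in range(rounds):
        out = [[0] * n for _ in range(n)]
        for (r, s), (nr, ns) in mapping.items():
            out[nr][ns] = image[r][s]
        image = out
    return image


def unscramble(image, key, rounds=1):
    """Invert scramble() with the same key and number of rounds."""
    n = len(image)
    mapping = baker_permutation(n, key)
    for _ in range(rounds):
        out = [[0] * n for _ in range(n)]
        for (r, s), (nr, ns) in mapping.items():
            out[r][s] = image[nr][ns]
        image = out
    return image
```

Since the map only permutes pixel positions, the scrambled watermark has the same number of ones and zeros as the original, and the original is recovered exactly when the same key and round count are used.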

3.2.2.1 Embedding

Step 1: Step 1 to step 7 of the embedding of Section 3.2.1.1 are repeated with the relaxed block selection criteria.

Step 2: Treat the coefficients at these indexes as a vector. Embed the watermark bit by quantizing the Euclidean norm of the selected group of coefficients and redistributing them as follows.

The pseudo code for the EmbedUsingEuclidianNorm function is as follows.

Function: EmbedUsingEuclidianNorm (D, d, low, high, num)
D -> vector containing mid band DCT coefficients
low -> lower index corresponding to mid band.
high -> higher index corresponding to mid band.
num -> no. of watermark bits to be embedded.
//Divide the coefficients in the mid band into num segments and find the Euclidean norm of each such segment.
//Let Uk be the Euclidean norm of the mid band DCT coefficients Dk of the kth segment.
Begin
  for k = 1 to num
    Uk = power(sum(power(Dk, 2)), 1/2) ------- (3.10)
    //Two quantizers are used to embed bit 0 and bit 1 respectively. Calculate n as
    n = └Uk/¥┘ + 1
    if (w(k) == 1) //w(k) is the kth bit of the watermark
      nm = n - M(n-1, 2) //Quantizer for embedding 1
    else
      nm = n - M(n, 2) //Quantizer for embedding 0
    endif

    xm = ¥ * nm - ¥/2 //centre of the quantization cell with index nm, so that └xm/¥┘ + 1 = nm at extraction

    //Finally the selected mid band DCT coefficients of each segment are modified
    Dk(i) = Dk(i) * xm / Uk
  End //End of for
  //Inverse DCT is applied on the DCT block to obtain the frame in the time domain again, using Eq. 3.7.
End //End of EmbedUsingEuclidianNorm

The embedding key, which is passed to the extraction module, is given by the tuple

Key = {¥, num, k1} ------- (3.11)

3.2.2.2 Extraction

Step 1: Steps 2 and 3 of the watermark embedding process are repeated to divide the audio into segments and each segment into frames using the key given in Eq. 3.11.

Step 2: For each frame of the sub segment, the ExtractUsingEuclidianNorm function is called.

Function: ExtractUsingEuclidianNorm (D, d, low, high, num)
//Using Eq. 3.10, the Euclidean norm Uk is calculated for the mid band coefficients distributed in num sections
Begin
  for k = 1 to num
    n = └Uk/¥┘ + 1 //Calculate n using the key, as in the embedding
    hide(k) = M(n, 2) //The individual watermark bit is extracted
  End //End of for
End //End of ExtractUsingEuclidianNorm

Step 3: Step 2 is repeated for each frame and the bits thus extracted are appended.

Step 4: The one dimensional bit stream is converted into a 2-D image.

Step 5: Using the security key k1, the image is converted back to the original watermark.

Step 6: Repeat steps 2 to 5 to extract multiple copies of the watermark.
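For a single group of coefficients, the norm based quantization of scheme II can be sketched as follows. As with any such sketch, several points are assumptions rather than the thesis implementation: the target norm is placed at the centre of the cell with index nm, a target index of 0 is replaced by 2, and the function names are hypothetical.

```python
import math


def embed_bit_in_norm(group, bit, step):
    """Scale a group of coefficients so that its Euclidean norm sits at the centre
    of an odd cell (bit 1) or an even cell (bit 0) of width step."""
    norm = math.sqrt(sum(c * c for c in group))
    if norm == 0.0:
        return list(group)   # an all-zero group cannot be rescaled
    n = math.floor(norm / step) + 1
    nm = n - (n - 1) % 2 if bit == 1 else n - n % 2
    if nm == 0:
        nm = 2               # assumption: keep the target norm positive
    target = step * nm - step / 2
    return [c * target / norm for c in group]


def extract_bit_from_norm(group, step):
    """Read the bit back from the parity of the cell index of the norm."""
    norm = math.sqrt(sum(c * c for c in group))
    return (math.floor(norm / step) + 1) % 2
```

Scaling every coefficient by the same positive factor changes the norm without changing any sign or the relative shape of the group, so the distortion is spread evenly over the selected coefficients.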

3.3 Performance of the Proposed Schemes

The performance of the watermarking schemes is tested for audio of different content categories. For the imperceptibility test, both SNR and MOS are used. SNR is the metric most widely used in the literature for imperceptibility evaluation. Since no unified test data exist for audio watermarking, we have used audio samples having the same attributes (sampling rate, sample resolution, format) as used in the literature.

Sixteen audio samples belonging to four different categories of music are used for testing. All test samples are in wave format (Waveform File Format) with a bit rate of 705 kbps. The sampling rate is 44.1 kHz and the sample resolution is 16 bit. Two different watermarks in the form of 2-D images of 20 x 50 pixels are used as the embedding information. Matlab 2007b is used for implementing the schemes. For simulating the attacks, Matlab as well as GoldWave tools are used.

The audio categories are classical, pop, country and folk, with samples of different lengths. Figures 3.1(a) to 3.1(p) show the time waveforms of the test audios used for testing the performance.

Figure 3.1 (a) to Figure 3.1 (p): Waveforms of audio 1 to audio 16 [plots not reproduced; x-axis: time (msec), y-axis: amplitude]

Figure 3.1: Time waveforms of the test audios

3.3.1 Imperceptibility of the Watermarked Audio

The imperceptibility of the proposed method is analyzed by finding the signal to noise ratio (SNR) between the original and watermarked audio and by determining the mean opinion score (MOS). These metrics are discussed in Chapter 1 under Sections 1.4.1 and 1.4.2 respectively.
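For reference, the SNR between an original and a watermarked signal is commonly computed as sketched below. The exact definition used in this work is the one given in Chapter 1, which is not reproduced here, so this form and the name `snr_db` should be read as assumptions.

```python
import math


def snr_db(original, watermarked):
    """SNR in dB: energy of the original over the energy of the embedding distortion."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, watermarked))
    if noise == 0:
        return math.inf   # identical signals
    return 10 * math.log10(signal / noise)
```

A higher value means the watermark changed the signal less; the 30 dB target used in this chapter corresponds to a distortion energy one thousandth of the signal energy.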

Two different watermarks, shown in the following figure, are used for watermarking the different audios.

Figure 3.2: Different test watermarks

The first set of experiments shows the effect of watermarking on the perceptibility of the audio. The SNR is tabulated for different values of the quantization step size. The mid band DCT coefficients selected for watermark embedding are the coefficients that correspond to the frequency range of 5 kHz to 11 kHz. As expected, the SNR of the watermarked audio keeps decreasing as the strength factor ¥ increases. We select the maximum value of ¥ in such a way that there is no perceptible difference between the original and the watermarked audio and the SNR is always greater than 30 dB. The mid frequency band of 5 to 11 kHz corresponds to the index range 86:222 for a block size of 441, and to the index range 250:550 for a block size of 1000.
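The relation between a frequency and a DCT coefficient index can be sketched as below. Coefficient k of a p point DCT (counting the DC term as k = 0) oscillates at roughly k * fs / (2p) Hz, which gives a nominal index for any band edge. This is a rule of thumb only; the index ranges quoted above were selected experimentally, so they need not coincide with this nominal mapping. The function name `dct_index` is hypothetical.

```python
def dct_index(freq_hz, frame_len, fs=44100):
    """Nominal 0-based index of the DCT coefficient closest to freq_hz.

    Coefficient k of a frame_len-point DCT corresponds to about
    k * fs / (2 * frame_len) Hz.
    """
    return round(2 * frame_len * freq_hz / fs)


# Nominal band edges for the two block sizes mentioned in the text.
for frame_len in (441, 1000):
    low, high = dct_index(5000, frame_len), dct_index(11000, frame_len)
    print(frame_len, low, high)
```

In a 1-based environment such as Matlab, one would add 1 to these indexes.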

The SNR results for varying values of ¥ (qStep), presented in Figure 3.3 to Figure 3.7, show that the highest SNR is obtained for the pop music category and that, for ¥ equal to 0.09, all the audios of the different categories have an SNR greater than 30 dB. The MOS obtained from 10 subjects (five male and five female) within an age range of 18 to 30 years is 4.5, which implies that the watermark is inaudible. MOS based perceptibility determination is time consuming, so most researchers in the literature have not used it. Instead they use the SNR and the segmental SNR between the original and watermarked audio for imperceptibility evaluation.

Figure 3.3: SNR(dB) Vs qStep for Classical audio samples

Figure 3.4: SNR(dB) Vs qStep for Country audio samples

Figure 3.5: SNR(dB) Vs qStep for Pop audio samples

Figure 3.6: SNR(dB) Vs qStep for Folk audio samples

Figure 3.7: Average SNR(dB) Vs qStep for all categories of audios

[Plots not reproduced; x-axis: qStep, y-axis: SNR (dB). Figures 3.3 to 3.6 plot samples 1 to 4 and their average; Figure 3.7 plots the Classical, Country, Pop and Folk averages and the overall average.]

After selecting the quantization step, a scalar that controls the SNR, and the range of coefficients in which the watermark bit is embedded, we compare the imperceptibility, in terms of SNR, of our scheme, which uses selected coefficients, with that of a scheme in which all the DCT coefficients are used for embedding. The first set of experiments in this category is done for scheme 1, in which the mean is used for watermark embedding, with two different watermarks. Figure 3.8 (a) and Figure 3.8 (b) compare the SNR for the same value of the quantization step ¥ for the two watermarks mentioned above.

Figure 3.8 (a): SNR(dB) for mid band Vs all DCT for Modified Mean using watermark 1

Figure 3.8 (b): SNR(dB) for mid band Vs. all DCT for Modified Mean using watermark 2

Figure 3.8: SNR(dB) for mid band Vs all DCT for Modified Mean for different watermarks

The results show that the type or content of the watermark has no effect on the SNR. The average SNR of the mid band DCT scheme is 37.99 dB, compared with 38.23 dB for the all DCT scheme, so the average SNR is approximately the same for both at the same value of ¥. The second experiment measures the average extraction time (in sec) of the watermark for the mid band and all DCT variants of the modified mean scheme.

Figure 3.9: Average Extraction time(sec) for mid band Vs all for Modified Mean

[Plots for Figures 3.8 and 3.9 not reproduced. Figure 3.8: SNR (dB) per sample, series "Midband DCT" and "All DCT"; Figure 3.9: time (sec) per sample, series "time (mid band DCT)" and "time (All DCT)".]

The average retrieval time is 6.60 sec for all DCT, compared with 5.69 sec for mid band DCT, which implies that retrieval is faster for the mid band approach.

The same set of experiments is repeated for all DCT and mid band DCT when the Euclidean norm of the vector formed from the DCT coefficients is used for watermark embedding. The results are presented in Figure 3.10 to Figure 3.12.

Figure 3.10: SNR(dB) for mid band Vs all for Euclidean norm using watermark 1

Figure 3.11: SNR(dB) for mid band Vs all for Euclidean norm using watermark 2

Figure 3.12: Average Extraction time(sec) for mid band Vs all for Euclidean norm

[Plots not reproduced. Figures 3.10 and 3.11: SNR (dB) per sample; Figure 3.12: extraction time (sec) per sample; series "Euclidean Norm Midband DCT" and "Euclidean Norm All DCT".]

The average SNR of the mid band DCT scheme is 35.21 dB, compared with 32.57 dB for the all DCT scheme, so the average SNR is better for mid band DCT at the same value of ¥. The average extraction time is 5.69 sec for mid band DCT, compared with 6.65 sec for all DCT.

The effect of num and the frame length, i.e. indirectly the payload, on the SNR is analyzed for a fixed value of ¥. As the number of bits embedded per frame increases, the SNR decreases, but the reduction is not prominent. Figure 3.13 shows the effect of bits per frame, i.e. num, on the SNR for fixed values of ¥ and frame size for the different groups of samples.

Figure 3.13: Effect of num i.e. bits/frame on the SNR(dB)

For this experiment the frame size is taken as 1024 samples and ¥ is taken as 0.06. The SNR in the figure is the average over all the samples in the same category of music. Similar results are obtained for other frame sizes with increasing num.

Overall, the comparison reveals that, with Euclidean norm embedding, the SNR of the watermarked audio in which all the DCT coefficients are modified (32.57 dB) is lower than that obtained when only the mid band DCT coefficients are used (35.21 dB), while with modified mean embedding the two are approximately equal (38.23 dB and 37.99 dB). The SNR of the mid band DCT scheme using the mean is better than that of the one using the Euclidean norm. The SNR obtained after embedding watermark 1 and watermark 2 is approximately the same, implying that the content of the watermark has no effect on the SNR. The extraction time is also lower for the mid band schemes and is almost the same across watermarks.

[Plot for Figure 3.13 not reproduced; x-axis: bits/frame (num = 1 to 5), y-axis: SNR (dB); series: Classical, Country, Pop, Folk.]

The next section presents the robustness results under different attacks.

3.3.2 Robustness of Watermarked Audio

Robustness is tested by exposing the watermarked audio to attacks and then calculating the normalized correlation between the embedded and the extracted watermark. Any watermarking scheme must be robust against intentional and unintentional attacks. The robustness of the schemes is measured through the correlation coefficient (CC) between the original and the retrieved watermark and through the bit error rate of the extracted watermark relative to the original. The higher the value of CC, the more robust the watermarking scheme.
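The two robustness measures can be sketched as follows for binary watermarks. The normalized correlation formula shown is one common definition and is an assumption here, since this section does not spell out the exact formula; the function names are hypothetical.

```python
import math


def normalized_correlation(original, extracted):
    """Normalized correlation between two equal-length bit sequences."""
    num = sum(a * b for a, b in zip(original, extracted))
    den = (math.sqrt(sum(a * a for a in original))
           * math.sqrt(sum(b * b for b in extracted)))
    return num / den if den else 0.0


def bit_error_rate(original, extracted):
    """Fraction of bit positions that differ between the two sequences."""
    errors = sum(a != b for a, b in zip(original, extracted))
    return errors / len(original)
```

A perfectly recovered watermark gives a correlation of 1 and a bit error rate of 0; every wrongly extracted bit lowers the first and raises the second.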

The following StirMark attacks [127] are used for evaluating the robustness of the watermarked audio.

Closed loop attack: The watermarks are extracted directly from the watermarked audio without subjecting it to any attack.

Re-sampling attacks: The watermarked audio is re-sampled. Two types of attack are performed under this category.
o Type I re-sampling: The watermarked audio is sampled at a lower rate than the original and then re-sampled back to the original sampling rate.
o Type II re-sampling: The watermarked audio is sampled at a higher rate than the original and then re-sampled back to the original sampling rate.

Re-quantization attacks: Each sample of the watermarked audio is re-quantized. Two types of attack are performed under this category.
o Type I re-quantization: Each sample of the watermarked audio is quantized using fewer bits than the original and then re-quantized back to the original resolution.
o Type II re-quantization: Each sample of the watermarked audio is quantized using more bits than the original and then re-quantized back to the original resolution.

Amplify: The watermarked audio is amplified at different amplification rates. This attack makes schemes that embed the watermark in the amplitude of individual samples vulnerable, as the amplitude is altered.

Low pass filtering: The watermarked audio is passed through a low pass filter with a cut-off frequency of 11 kHz.

High pass filtering: The watermarked audio is passed through a high pass filter with a cut-off frequency of 100 Hz.

Echo addition: Echoes with different delays are added to the watermarked audio. This attack mainly targets echo hiding schemes, which add the watermark bits in the form of echoes with different delays.

LSB flipping/invert: The LSB of each sample is flipped, i.e. set to 1 if the original bit was 0 and vice versa.

AAC attack: MPEG-4 Advanced Audio Coding (AAC) compression is applied to the watermarked audio, which is then converted back to the original format.

MP3 compression attack: MP3 compression at different bit rates is applied to the watermarked audio, which is then decompressed to the original bit rate.

Noise addition: Noise such as additive white Gaussian noise (AWGN) with variable SNR is added to the watermarked audio.
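Three of the simpler attacks listed above can be simulated in a few lines, as sketched below. These are illustrative stand-ins written for this text, not the StirMark or GoldWave implementations used in the experiments; the sketch assumes floating point samples in [-1, 1) for re-quantization and amplification, and integer PCM samples for LSB flipping.

```python
def requantize(samples, bits):
    """Type I re-quantization: round samples in [-1, 1) to `bits`-bit resolution,
    then express them again on the original floating point scale."""
    scale = 2 ** (bits - 1)
    return [max(-scale, min(scale - 1, round(x * scale))) / scale for x in samples]


def amplify(samples, gain):
    """Scale every sample by a constant gain."""
    return [gain * x for x in samples]


def flip_lsb(pcm_samples):
    """Invert the least significant bit of each integer PCM sample."""
    return [s ^ 1 for s in pcm_samples]
```

Feeding the attacked signal back through the extraction procedure and comparing the recovered bits with the embedded ones gives the CC values reported for each attack.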

Figures 3.14 (a) to 3.14 (i) show the average CC obtained when the watermarked audio is subjected to different attacks, for the two proposed schemes. In all the figures the x-axis depicts the different attacks and the y-axis gives the corresponding value of CC.

Figure 3.14 (a): Robustness against Re-Quantization for both schemes
Figure 3.14 (b): Robustness against Re-Sampling for both schemes
Figure 3.14 (c): Robustness against Amplification for both schemes
Figure 3.14 (d): Robustness against Filtering for both schemes (low pass 11 kHz, high pass 100 Hz)
Figure 3.14 (e): Robustness against Random Cropping for both schemes
Figure 3.14 (f): Robustness against echo addition for both schemes (delays of 10 msec and 5 msec)
Figure 3.14 (g): Robustness against LSB Zero and LSB invert for both schemes
Figure 3.14 (h): Robustness against mp3 compression for both schemes
Figure 3.14 (i): Robustness against Noise Addition for both schemes (35 dB and 30 dB)

[Plots not reproduced; each plots CC per attack for the Modified Mean and Euclidean Norm schemes.]

Figure 3.14: Robustness against different attacks for mid band schemes

From the robustness results it can be inferred that both of our schemes resist the common signal processing attacks while maintaining good imperceptibility. The average SNR of the watermarked audio is 38 dB for scheme 1 and 35 dB for scheme 2, both well above the IFPI recommendation of 20 dB. Scheme 2 performs better than scheme 1, especially under the compression attack, showing that the Euclidean norm of the mid band DCT coefficients of a frame is altered less when the watermarked audio is subjected to compression.

3.4 Comparison with Other Schemes

The imperceptibility and robustness results of the proposed mid band schemes are compared with those of some contemporary schemes. Although a direct comparison cannot be made, as there is no common test data for the cover audio and the watermark, we compare on the basis of the results reported by the respective authors. The schemes in which all the DCT coefficients are used respond poorly to filtering and compression attacks, even at low compression, so their robustness results are not explicitly presented here.

3.4.1 Comparison on the Basis of SNR

The proposed schemes are compared with the DCT based scheme proposed by Dhar [97], the echo hiding based scheme by Bender [78], the reduced SVD based scheme by Healy, Wang [114], the SVD based scheme by Ali [119], the SVD based scheme by Fathi [117] and the time domain embedding scheme by Shahriar [118].

Figure 3.15: Comparison on the basis of SNR for mid band scheme

3.4.2 Comparison on the Basis of Payload

One of the requirements on which watermarking schemes can be compared is the payload, expressed in bits per second (bps). As recommended by IFPI, the payload should be more than 0.5 bps for copyright applications and more than 20 bps for other applications. An adjustability requirement is also imposed so that a watermarking scheme can be used for multiple applications. Both of our schemes meet the adjustability criterion: a variable payload is achieved, with a minimum of 86 bps. The variable rate is obtained by changing the number of bits embedded in every individual frame used for watermarking. In our experiments, the frame size has been fixed at 1024 samples because mp3 compression processes the frames with a block size of 1024. A frame size of 1024 therefore adds robustness to the DCT coefficients against the mp3 compression attack. Although explicit results showing the effect of frame size on the robustness against mp3 compression are not presented here, it is inferred from the experiments that both proposed schemes work best for a frame size of 1024 samples.
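The relation between frame size, bits per frame and payload can be made concrete with a small calculation. The sketch below assumes a sampling rate of 44.1 kHz and 2 bits per frame. Both values are assumptions, not stated in this section, chosen because they reproduce the 86.13 bps listed in Table 3.1. The function name `payload_bps` is hypothetical.

```python
def payload_bps(sample_rate_hz, frame_size, bits_per_frame):
    """Payload in bits per second when every frame carries bits_per_frame bits."""
    frames_per_second = sample_rate_hz / frame_size
    return frames_per_second * bits_per_frame

# Assumed parameters: 44.1 kHz audio, 1024-sample frames (frame size from the text).
for bits in (2, 4, 8):
    print(bits, "bits/frame ->", round(payload_bps(44100, 1024, bits), 2), "bps")
```

Raising the number of bits per frame scales the payload linearly, which is the adjustability described above. The cost in robustness or imperceptibility is not modelled here.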

The payload of our proposed schemes is compared with some contemporary and state of the art schemes. The results are presented in tabulated form next.

Table 3.1: Comparison on the basis of payload of schemes

Scheme                          SNR (dB)   Payload (bps)
Dhar [2011]                     22.15      N/A
Bender [1996]                   21.47      N/A
Wang [2011]                     27.23      187
Ali [2010]                      25.24      N/A
E. Fathi [2009]                 27.18      N/A
Shahriar [2013]                 23.93      61.7
Proposed with Euclidean norm    35.21      86.13
Proposed with modified mean     37.99      86.13

3.4.3 Comparison on the Basis of Robustness to Attacks

For comparing the robustness against attacks given in Section 3.4.2, we regulate the strength factor to achieve an SNR in the range of 32 dB to 35 dB for our audios.

Table 3.2: Robustness comparison with other Schemes

Attack                        Ours (modified mean)  Ours (Euclidean norm)  Wang [2011]  Fathi [2010]  Dhar DCT [2011]
Closed loop                   1                     1                      0.94         1             1
Re-sampling to half           1                     1                      X            X             1
Re-sampling to 32 kHz         1                     1                      X            0.88          X
Re-quantization to 8 bit      1                     1                      1            1             1
Low pass filtering (11 kHz)   1                     1                      X            0.94          1
High pass filtering (100 Hz)  1                     1                      X            0.99          1
Amplify 5%                    1                     0.96                   X            X             X
Amplify 10%                   1                     0.75                   X            X             X
Invert                        1                     1                      X            X             X
LSB zero                      1                     1                      X            X             X
Noise addition                0.70                  0.73                   X            0.96          X
mp3 320 kbps                  0.88                  0.89                   X            X             X
mp3 256 kbps                  0.78                  0.78                   0.94         0.96          X
mp3 96 kbps                   0.65                  0.67                   0.93         X             X
mp3 48 kbps                   0                     0.5                    X            X             X
mp3 32 kbps                   0                     0.48                   X            X             X

The average of the correlation coefficients obtained from the different watermarked audios is then compared directly with the results given by the different authors in Table 3.2. In Table 3.2, X indicates that the author has not reported results for the corresponding attack category.

The analysis of the comparison shows that our schemes have a higher SNR, are robust to attacks and outperform the other schemes except against the compression attacks. Of the two proposed schemes, the one which uses the Euclidean norm of the mid band DCT coefficients for watermark embedding, i.e. Scheme 2, shows good robustness against higher compression rates in addition to some other attacks. Scheme 1 shows better robustness against amplification and echo addition.

In the next section, we present a novel image watermarking scheme using the mid band DCT coefficients. The idea is to watermark an image and audio separately so that the combination can be used for watermarking video.

3.5 Watermarking of Digital Image using Mid-band DCT

In this section, a DCT based additive image watermarking scheme is presented. The scheme provides higher resistance to image processing attacks, mainly JPEG compression. In our approach, the watermark is embedded in the mid frequency band of the DCT blocks only in the sub band which carries the low frequency components, and the high frequency sub band components remain untouched. The intent is to develop a highly compression resistant image watermarking scheme which shows the least blocking artifacts and which, when combined with audio watermarking, can also be used for watermarking a video.

In an image, adjacent pixels generally have almost the same intensity value, so the intensity of a neighboring pixel can easily be predicted from a given pixel intensity. When the DCT is performed on an image, this correlation is removed, which helps us embed the watermark in scattered form across different DCT coefficients.
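The decorrelating effect of the DCT on smooth image data can be seen on a single block. The following is a minimal sketch in Python, assuming an orthonormal 8 × 8 DCT-II and a synthetic, smoothly varying block. The helper names and the test block are hypothetical. For such a block almost all of the energy ends up in a few low frequency coefficients.

```python
import math

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
                     for m in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def dct2(block):
    """2-D DCT of a square block: C * block * C^T."""
    c = dct_matrix(len(block))
    c_t = [list(col) for col in zip(*c)]
    return matmul(matmul(c, block), c_t)

# Synthetic smooth block: neighboring pixels differ by one grey level.
block = [[100.0 + i + j for j in range(8)] for i in range(8)]
coeffs = dct2(block)

total_energy = sum(v * v for row in coeffs for v in row)
low_energy = sum(coeffs[u][v] ** 2 for u in range(2) for v in range(2))
fraction = low_energy / total_energy
print(f"energy in the 2x2 low frequency corner: {100 * fraction:.2f}%")
```

Because the remaining coefficients carry little energy, small changes to mid band coefficients are spread over the whole block in the pixel domain.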

The approach adopted by us is slightly different from the traditional mid band DCT based schemes, in which the mid band DCT coefficients are directly used for watermarking. Since the major work in this thesis is on audio watermarking, we limit the writing to the description of our proposed scheme and do not explicitly compare it with other image watermarking schemes. The next section describes the watermark embedding and extraction algorithms using the modified mid band DCT based scheme.


3.5.1 Watermark Embedding Algorithm

The algorithm used to embed a watermark in an image is given below.

Step 1: Segment the image I(i, j) into two sub band blocks, each half the size of the original image, i.e. I1(i/2, j) and I2(i/2, j). I1(i/2, j) gives the high intensity pixels block and I2(i/2, j) gives the low intensity pixels block.

I(i, j) = I1(i/2, j) + I2(i/2, j)

Step 2: Break the sub band I1(i/2, j) into blocks of size 8 × 8.

Step 3: Find the 2-D DCT of each block.

Step 4: Use the private key to generate two highly uncorrelated pseudo random number sequences with values in {-1, 0, 1}.

Step 5: Preprocess the watermark by converting it into a binary sequence: W(m × n) → W(s × 1), where s = m × n.

Step 6: Embed the watermark in the mid band of the coefficient block of each DCT block using the pseudo random number sequences and the watermark sequence.

Step 7: Apply the inverse 2-D DCT to the blocks to obtain the averaged image blocks again.

Step 8: Apply the inverse operation of step 2 to obtain the watermarked image.

Inserting the watermark in the mid band of the coefficient block of each averaged DCT block gives extra robustness to the watermark. The use of the key provides security to the watermarking system. As the watermark is embedded in the mid frequency band of the transformed high pixel intensity image, robustness against the JPEG attack is greatly increased.
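Steps 3 to 7 of the embedding algorithm can be sketched for a single 8 × 8 block as follows. This is a minimal sketch under stated assumptions, not the implementation used for the reported results. The mid band positions (`MID_BAND`, here the anti-diagonals with 3 ≤ u + v ≤ 5), the additive rule with a gain `strength`, and all function names are hypothetical. The sub band split of step 1 is not modelled, so the block is assumed to come from the sub band that is to be watermarked.

```python
import math
import random

# Assumed mid band positions inside an 8x8 DCT block (not taken from the text).
MID_BAND = [(u, v) for u in range(8) for v in range(8) if 3 <= u + v <= 5]

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix: row k holds the k-th cosine basis vector."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
             for m in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def dct2(block):
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))    # step 3

def idct2(coeffs):
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)   # step 7

def pn_sequences(key, length):
    """Step 4: two key-dependent sequences over {-1, 0, 1}."""
    rng = random.Random(key)
    pn0 = [rng.choice((-1, 0, 1)) for _ in range(length)]
    pn1 = [rng.choice((-1, 0, 1)) for _ in range(length)]
    return pn0, pn1

def embed_bit(block, bit, pn0, pn1, strength=5.0):
    """Step 6: add the PN sequence selected by the bit to the mid band."""
    coeffs = dct2(block)
    pn = pn1 if bit else pn0
    for (u, v), p in zip(MID_BAND, pn):
        coeffs[u][v] += strength * p
    return idct2(coeffs)

block = [[128.0 + 4 * i - 3 * j for j in range(8)] for i in range(8)]
pn0, pn1 = pn_sequences(key=1234, length=len(MID_BAND))
marked = embed_bit(block, 1, pn0, pn1)
max_change = max(abs(m - b) for mr, br in zip(marked, block) for m, b in zip(mr, br))
print(f"largest pixel change in the block: {max_change:.2f}")
```

A larger `strength` makes the embedded sequence easier to detect after compression but increases the pixel change, so the value would have to be tuned against the desired visual quality.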

3.5.2 Watermark Extraction Algorithm

The steps involved in the watermark extraction algorithm are given below. The watermarked image is taken as the input along with the key used for watermark embedding.

Step 1: Repeat steps 1 and 2 of the watermark embedding algorithm.

Step 2: Using the private key, extract the watermark.

Step 3: Compare the extracted watermark with the original watermark.

Step 4: A similar watermark proves the authenticity.
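The steps above do not spell out how a bit is decided from the key. A common choice for schemes that use two pseudo random sequences is to correlate the mid band coefficients with both sequences and pick the better match. The sketch below illustrates that rule and is an assumption, not a confirmed description of the extraction step. The two sequences are written out by hand so that the example is deterministic (in practice they would be regenerated from the private key), and the host coefficients are toy values.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0

def extract_bit(mid_band_coeffs, pn0, pn1):
    """Blind decision: choose the bit whose sequence correlates best."""
    return 1 if pearson(mid_band_coeffs, pn1) > pearson(mid_band_coeffs, pn0) else 0

# Hand-written stand-ins for the two key-generated sequences over {-1, 0, 1}.
pn0 = [1, -1, 0] * 5
pn1 = [1, 1, 0] * 5

rng = random.Random(7)
message = [1, 0, 1, 1, 0]
recovered = []
for bit in message:
    host = [rng.uniform(-1.0, 1.0) for _ in pn0]        # toy mid band coefficients
    pn = pn1 if bit else pn0
    marked = [h + 10.0 * p for h, p in zip(host, pn)]   # additive embedding
    recovered.append(extract_bit(marked, pn0, pn1))

print("recovered bits:", recovered)
```

No original image is needed for the decision, which matches the blind setting in which only the watermarked image and the key are available.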

The following section shows the watermark extracted from the watermarked image under some attacks. The percentage similarity between the original watermark and the retrieved watermark is obtained through the correlation function.

3.5.3 Results and Discussions

For testing the robustness of the proposed algorithm, four test images are used. The watermarked image is subjected to compression at different quality factors in addition to the common image processing attacks. The watermark is retrieved from the compressed watermarked image using the extraction process, and the percentage similarity between the extracted watermark and the original watermark is calculated. The same procedure is repeated for the different standard images, and the results are reported in terms of SNR and CC, i.e. the correlation between the original and the retrieved watermark at different quality factors (Q).

Figure 3.16 shows the standard images used for testing the watermarking algorithm. For embedding, two different watermarks are used; they are the same as those used for audio watermarking and are given in Figure 3.2.

Figure 3.16: Standard test images [128]: (a) Lena image, (b) Boat image, (c) Baboon image, (d) Barbara image

Intermediate snapshots of the Lena image as it undergoes the embedding stage and the compression attack are shown in Figure 3.17.

Figure 3.17: Lena image undergone different operations: (a) original Lena image, (b) averaged Lena, (c) watermarked Lena, (d) compressed Lena

The watermarked images are subjected to compression at different compression ratios, referenced by the quality factor. The lower the quality factor, the higher the compression. The quality factor depicts the amount of compression and reflects the size of the compressed image as compared to the original one. The following table gives the robustness against JPEG compression for the four test images.

Table 3.3: Correlation Coefficient (CC) Vs Quality Factor

Quality Factor   CC for Baboon   CC for Lena   CC for Boat   CC for Barbara
5                0.75            0.935         0.89          0.92
10               0.77            0.959         0.90          0.95
20               0.79            0.9859        0.94          0.98
40               0.79            0.9859        0.93          0.98
45               0.79            0.9859        0.94          0.98
50               0.79            0.9859        0.94          0.98
60               0.80            0.9859        0.94          0.98
80               0.79            0.9894        0.94          0.98

The performance of the watermarking algorithm under other common signal processing attacks is checked for the Lena image and is presented in the following table.


Table 3.4: Correlation coefficient (CC) for Lena image undergone different attacks

Attacks      CC       Attacks                 CC
Q=5          0.935    Blur Factor=.2          0.9052
Q=10         0.959    Gaussian v=.01          0.9484
Q=20         0.9859   Gaussian v=.02          0.8844
Q=40         0.9859   Gaussian v=.04          0.7267
Q=45         0.9859   Gaussian v=.06          0.664
Q=50         0.9859   Gaussian v=.1           0.5384
Q=60         0.9859   Gaussian v=.15          0.3524
Q=80         0.9894   Salt & Pepper v=0.1     0.8936
Cropped 1    0.9859   Salt & Pepper v=0.15    0.8084
Cropped 2    0.9859   Salt & Pepper v=0.2     0.5476

In Table 3.4, v refers to the variance and Q refers to the quality factor.

For all the attacking operations Adobe Photoshop is used, and the code is built in MATLAB 7. The corr2 function is used for finding the correlation between the original watermark and the retrieved watermark. The imnoise function is used to introduce both types of noise, i.e. Gaussian noise and salt and pepper noise. The results show that the method works best against the compression attack and is tolerant to common image processing attacks.

Watermarking techniques in which the watermark is embedded in a transform domain are typically better candidates than spatial domain techniques, for reasons of both robustness and visual impact. Embedding in the DCT domain proved to be highly resistant to JPEG compression as well as to significant amounts of random noise. By anticipating which coefficients would be modified by the subsequent transform and quantization, we were able to produce a watermarking technique with moderate robustness, good capacity, and low visual artifacts. However, all DCT based images suffer from visual artifacts because the DCT is performed on blocks, and our approach is no exception. Even so, the visual quality is good and the change is not noticeable. The visual quality can be further improved by introducing a slight blur.
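The CC values in Tables 3.3 and 3.4 are two-dimensional correlation coefficients between the original and the retrieved watermark. For readers without MATLAB, the following is a minimal sketch in Python of the same quantity that MATLAB's corr2 computes (mean-removed, normalized cross correlation of two equal-sized arrays). The 4 × 4 binary watermark is a hypothetical toy example.

```python
import math

def corr2(a, b):
    """2-D correlation coefficient of two equal-sized arrays (lists of rows)."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same size")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mean_x) ** 2 for x in xs) *
                    sum((y - mean_y) ** 2 for y in ys))
    return num / den

# Toy watermark and a retrieved copy with a single flipped bit.
original = [[1, 0, 1, 0],
            [0, 1, 0, 1],
            [1, 1, 0, 0],
            [0, 0, 1, 1]]
retrieved = [row[:] for row in original]
retrieved[0][0] = 0

print(f"CC = {corr2(original, retrieved):.4f}")
```

A value of 1 means the retrieved watermark is identical to the original, and the value drops as more bits are corrupted by an attack.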

3.6 Summary

We proposed two digital audio watermarking algorithms in the DCT domain for mono audios. One scheme uses quantization of the modified mean of the selected mid band coefficients, and the other uses the Euclidean norm of the selected mid band coefficients for watermark embedding. In Scheme 1, for the robustness of the audio watermark, an energy threshold is set and only the blocks that meet this minimum energy threshold are selected. In Scheme 2, all the blocks are used for watermark embedding, and the watermark is preprocessed before embedding through Baker's map encryption, which produces a permuted version of the watermark. This preprocessing adds security to the watermark, since estimating the watermark becomes difficult without possession of the security key. Both schemes use mid band DCT coefficients and give better results than the approach which uses all the coefficients. Experimental results have illustrated the robust and inaudible nature of our embedding schemes as well. In addition, the watermark can be extracted without the help of the original audio signal, and the procedures for embedding and extraction can be easily implemented. For both schemes, the SNR and the payload are far above what is recommended by IFPI.

At the end of the chapter, an image watermarking scheme which uses sub band DCT for watermark embedding is proposed. The proposed scheme is a DCT based additive watermarking scheme which provides higher resistance to image processing attacks, mainly JPEG compression. The watermark is embedded in the mid frequency band of the DCT blocks only in the sub band which carries the low frequency components, and the high frequency sub band components remain untouched.
