CHAPTER 3
DIGITAL WATERMARKING SCHEMES USING DCT
The DCT is an important transform that has made its mark in audio and image watermarking. Its
lower complexity distinguishes it from other transforms such as the DST and the DFT as far as
decorrelating the data for watermark embedding is concerned. The schemes that embed data in
the DCT domain differ in principle in the number of coefficients taken for embedding, the type
of coefficients (low, middle or high frequency; AC or DC), the methodology used for
embedding [51]-[53], and the way they find the coefficients that produce minimum distortion
and maximum robustness.
Although DCT based watermarking schemes have low embedding complexity, the use of low
frequency coefficients or the DC coefficient as watermarking locations reduces
imperceptibility [51][95][97]. The use of selected frequency coefficients and a better
embedding strategy can provide a good balance between imperceptibility and robustness. A
watermark embedded on a single coefficient may not survive attacks, but statistical properties
of a group of coefficients, such as the mean or the Euclidean norm, are more likely to remain
robust when used for data embedding. Robustness and security can be improved further through
attack characterization and by embedding a permuted version of the watermark rather than the
original watermark directly.
This chapter presents two blind digital audio watermarking schemes using the DCT for mono
audio. In both schemes the watermark is embedded into a selected group of mid band
coefficients of the DCT transformed audio. Selecting the mid band DCT coefficients for
embedding inherently makes the watermarked audio robust to filtering attacks. The modified
mean or the Euclidean norm of the group of mid band DCT coefficients is quantized for
watermark embedding. The mid band frequency components are selected through experiments and
are the coefficients that correspond to the frequencies between 5 kHz and 11 kHz. MP3
compression and other signal processing operations have minimal effect on these coefficients,
which is also evident from the results.
Embedding the watermark in the audio mainly involves preprocessing, block selection and DCT
coefficient selection. The energy threshold for block selection is decided through
experiments and is adaptive to different audio signals. In addition, the block size is made
dynamic so that an intruder cannot identify the embedding blocks and so that combining
multiple copies of the watermarked audio does not eliminate the watermark.
A DCT based image watermarking scheme presented at the end of the chapter also uses the mid
band DCT coefficients for watermark embedding. Unlike traditional DCT based image
watermarking schemes, where the block DCT is applied directly to the image, our scheme first
transforms the image into a sub band image. The sub band image is obtained through averaging
and differencing of neighboring pixel values. The block DCT is applied to the sub band image
and one watermark bit is embedded in each DCT block.
The rest of the chapter is organized as follows. Section 3.1 gives a brief account of the
related audio watermarking approaches that use the DCT for embedding, together with the
problem statement. Sections 3.2.1 and 3.2.2 present our proposed schemes. Section 3.3
discusses the experimental results, mainly the imperceptibility and the robustness of the
watermarked audio, followed by the comparison with contemporary schemes in Section 3.4.
Section 3.5 presents an image watermarking scheme using mid band DCT coefficients along with
the test results, followed by a summary of the chapter in the last section.
3.1 Problem Statement
In the past few years a number of audio watermarking schemes have evolved for copyright
protection. The DCT based approaches are considered simple and effective because the
watermark is embedded in the transform domain. The DCT is useful because it compacts the
signal energy into a few coefficients and decorrelates adjacent samples within the audio.
The problem with these schemes is that they do not consider the energy of the individual
blocks for watermark embedding. This matters because watermark data hidden in low energy
blocks can be removed altogether through an attack without disturbing the cover audio. Low
energy blocks also produce more audible disturbance when watermarked. The effect of common
signal processing operations, including MP3 conversion, on the different DCT coefficients
and DCT blocks needs to be analyzed. Fixed size DCT blocks used for embedding are also prone
to watermark estimation and collusion attacks.
Our proposed schemes therefore apply a minimum energy threshold to the individual blocks for
watermark embedding. The mid frequency DCT coefficients are used for embedding watermark
bits in each selected block. The region of embedding is the range of frequencies from 5 kHz
to 11 kHz. The lower frequency region, below 5 kHz, is perceptually highly significant, while
the high frequency region is perceptually less significant and is therefore susceptible to
modification by common signal processing operations such as compression. Selecting the mid
band maintains a balance between robustness and imperceptibility. The number of bits
embedded within one DCT block is controlled through a key and thus meets the adjustability
requirement.
3.2 Proposed Watermarking Schemes for Audio
3.2.1 Scheme 1: Using Quantization of the Modified Mean
The first scheme uses the quantization of the modified mean of the selected mid band DCT
coefficients as the embedding function. In the second scheme the embedding function
uses quantization [126] of the Euclidean norm of the group of selected mid band DCT
coefficients.
The original audio is divided into multiple segments in such a way that every segment can
carry one watermark. Every segment is divided into frames of 20 ms to 25 ms, i.e. the block
size varies with the key, and the frames that pass the minimum energy threshold are used for
embedding the bit(s) of watermark information.
The two dimensional watermark is converted into a bit stream. If the size of the image is
M x N, with M and N as the number of rows and columns respectively, then it is converted
into a one dimensional bit stream of length M x N as follows.
Let W be the binary watermark thus produced:
$W = \{\, w(i) : 1 \le i \le Len,\; w(i) \in \{0, 1\} \,\}$ ---- (3.1)
with Len = length(W) = M x N.
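The conversion of the watermark image into the bit stream W can be sketched as follows. This is a minimal illustration rather than the thesis implementation: the row-by-row scanning order and the function names `image_to_bits` and `bits_to_image` are assumptions, since the text does not state the order in which pixels are read.

```python
def image_to_bits(image):
    """Flatten an M x N binary image (a list of rows) into a 1-D bit list.

    Assumption: pixels are read row by row; the source does not fix the order.
    """
    bits = []
    for row in image:
        for pixel in row:
            bits.append(1 if pixel else 0)
    return bits


def bits_to_image(bits, rows, cols):
    """Rebuild the rows x cols image from a bit list produced by image_to_bits."""
    if len(bits) != rows * cols:
        raise ValueError("bit stream length must equal rows * cols")
    return [bits[r * cols:(r + 1) * cols] for r in range(rows)]


watermark = [[1, 0, 1],
             [0, 1, 1]]           # toy 2 x 3 watermark
W = image_to_bits(watermark)      # Len = M x N = 6
```

The same pair of helpers covers the final extraction step, where the recovered bit stream is turned back into a 2-D image.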
3.2.1.1 Watermark Embedding Method
Step 1: The audio signal is sampled at a rate of 44.1 kHz. The entire sample sequence is
then partitioned into segments for carrying watermark bits.
X = {x(1), x(2), ..., x(N)}
//X is the entire audio signal with N samples; the ith sample is represented as x(i).
The segments are partitioned into sub segments of equal length L consisting of
frames. Each frame has a duration of 20 to 25 ms. The samples of all frames make up
the overall sampled audio segment, represented through the following equation:
St = {Ft(1), Ft(2), ..., Ft(n)} ----- (3.2)
//St is the tth segment used to embed watermark bit(s), n is the number of frames in it and
Ft(i) corresponds to the ith frame.
Step 2: The rejection or selection of a frame depends upon the energy of the frame.
Let Xl denote the lth frame of the segment.
The average energy of the audio frame is calculated. Silent portions of the audio are
not considered in this step.
$E(X_l) = \sum_{i=1}^{p} A_i \cdot A_i$ ----- (3.3)
where A_i is the ith sample of the frame and p is the total number of samples within the
frame, which varies with the key input.
Step 3: The frames Xl whose frame energy is greater than a threshold are selected
for watermark embedding.
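Steps 1 to 3 can be sketched as follows. This is a minimal illustration under stated assumptions: the energy is taken as the plain sum of squared samples, the threshold value is arbitrary, and the helper `keyed_frame_length`, which draws a frame length between 20 ms and 25 ms from a key, is hypothetical (the source only says that the block size varies with the key).

```python
import random


def keyed_frame_length(key, sampling_rate_hz=44100):
    """Hypothetical helper: pick a frame length between 20 ms and 25 ms from a key."""
    shortest = sampling_rate_hz * 20 // 1000   # 882 samples at 44.1 kHz
    longest = sampling_rate_hz * 25 // 1000    # 1102 samples at 44.1 kHz
    return random.Random(key).randint(shortest, longest)


def frame_energy(frame):
    """Energy of a frame as the sum of squared sample amplitudes (assumed form)."""
    return sum(a * a for a in frame)


def split_into_frames(samples, frame_len):
    """Cut the signal into consecutive, non-overlapping frames; drop the remainder."""
    usable = len(samples) - len(samples) % frame_len
    return [samples[i:i + frame_len] for i in range(0, usable, frame_len)]


def select_frames(frames, threshold):
    """Return the indices of the frames whose energy exceeds the threshold."""
    return [i for i, frame in enumerate(frames) if frame_energy(frame) > threshold]


signal = [0.0, 0.01, -0.01, 0.0,    # near-silent frame
          0.5, -0.4, 0.3, -0.6,     # loud frame
          0.02, 0.0, -0.02, 0.01,   # quiet frame
          0.7, 0.1, -0.5, 0.2]      # loud frame
frames = split_into_frames(signal, 4)            # toy frame length of 4 samples
selected = select_frames(frames, threshold=0.1)  # indices of frames used for embedding
```

Because the extractor repeats the same framing and thresholding, both sides need the same frame length and threshold, which is why these values travel with the key.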
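Step 4: For watermark embedding on the lth selected sub segment, i.e. frame Xl, a p point
1-D DCT is performed:
$D_l = \mathcal{F}(X_l)$ ----- (3.4)
where the p point 1-D DCT of the frame with p samples x(0), ..., x(p-1) is given as [20]
$\mathcal{F}(n) = w(n) \sum_{i=0}^{p-1} x(i) \cos\left(\frac{\pi (2i+1) n}{2p}\right), \quad n = 0, 1, \ldots, p-1$ ------ (3.5)
where $w(n) = \sqrt{1/p}$ for n = 0 and
$w(n) = \sqrt{2/p}$ for n ≠ 0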
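The transform of Step 4 can be sketched directly from the standard orthonormal DCT-II definition. This is an illustrative, unoptimized version written in plain Python, assuming the orthonormal weights w(0) = sqrt(1/p) and w(n) = sqrt(2/p); the function names `dct` and `idct` are chosen here for illustration, and a practical implementation would use a fast library routine instead.

```python
import math


def dct(x):
    """Orthonormal p-point DCT-II of a list of samples (direct O(p^2) evaluation)."""
    p = len(x)
    coeffs = []
    for n in range(p):
        w = math.sqrt(1.0 / p) if n == 0 else math.sqrt(2.0 / p)
        total = sum(x[i] * math.cos(math.pi * (2 * i + 1) * n / (2 * p))
                    for i in range(p))
        coeffs.append(w * total)
    return coeffs


def idct(coeffs):
    """Inverse of dct(): rebuilds the p time-domain samples."""
    p = len(coeffs)
    samples = []
    for i in range(p):
        total = 0.0
        for n in range(p):
            w = math.sqrt(1.0 / p) if n == 0 else math.sqrt(2.0 / p)
            total += w * coeffs[n] * math.cos(math.pi * (2 * i + 1) * n / (2 * p))
        samples.append(total)
    return samples


frame = [0.5, -0.4, 0.3, -0.6, 0.1, 0.2]
D = dct(frame)          # DCT block of the frame
restored = idct(D)      # matches the frame up to rounding error
```

With these weights the transform is orthonormal, so the frame energy is preserved in the coefficients and scaling a group of coefficients has a predictable effect on the distortion.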
Step 5: Select the mid frequency components with the index range [low: high] for every
selected frame. This mid band range is selected through experiments.
Step 6: Treat the coefficients at these indexes as a vector. Embed the watermark bit by
quantizing the modified mean of the selected group of coefficients and redistributing it
over them as follows.
The pseudo code for the EmbedUsingMean function is presented as follows.
Function: EmbedUsingMean (D, d, low, high, num)
D -> vector containing mid band DCT coefficients
low -> lower index corresponding to mid band.
high -> higher index corresponding to mid band.
num -> no. of watermark bits to be embedded.
//Divide the coefficients in the mid band into num segments and find the mean for
each such segment.
//Let Dk be the coefficients of the kth such segment and Mk their mean
Begin
for k = 1 to num
$M_k = \mathrm{mean}(\mathrm{abs}(D_k))$ ------ (3.6)
//Two quantizers are used to embed bit 0 and bit 1 respectively. Calculate n as
n = floor(Mk/¥) + 1
//floor is the floor function, which returns the highest integer not greater than the
argument passed.
//¥ is the quantization step size that controls the imperceptibility and robustness.
if (w(k) == 1) //w(k) is the kth bit of the watermark
nm = n - M(n-1, 2) //Quantizer for embedding 1
else
nm = n - M(n, 2) //Quantizer for embedding 0
endif //end of if
//M is the modulo function, which returns the remainder of the division of its arguments
xm = ¥*nm - ¥/2 //centre of the nm th quantization interval, so that floor(xm/¥) + 1 = nm
//Finally the selected mid band DCT coefficients of each segment are modified
Dk(i) = Dk(i) * xm / Mk
end //End of for
//The inverse DCT is applied on the DCT blocks to obtain the frame in the time domain
again.
//The p point inverse DCT (IDCT) is given as follows [20]
$x(i) = \sum_{n=0}^{p-1} w(n)\, \mathcal{F}(n) \cos\left(\frac{\pi (2i+1) n}{2p}\right), \quad i = 0, 1, \ldots, p-1$ ------- (3.7)
//where $w(n) = \sqrt{1/p}$ for n = 0 and
$w(n) = \sqrt{2/p}$ for n ≠ 0
End //End of EmbedUsingMean
Step 7: Repeat Step 3 to Step 6 to embed successive watermark bits.
Step 8: Repeat Step 2 to Step 7 to embed multiple watermarks.
The embedding key, which is passed to the extraction module, is given by the tuple
represented through key.
Key = {¥, threshold, num} ------- (3.8)
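The embedding rule of Step 6 can be sketched as follows. This is a minimal illustration, not the thesis code: it assumes the target mean is placed at the centre of the selected quantization interval (step * nm - step / 2) so that the parity rule of the extractor recovers the bit, it drops any coefficients left over when the band does not divide evenly by num, and it moves nm from 0 to 2 to avoid a negative target, a case the source does not discuss. The names `embed_bit_by_mean` and `embed_using_mean` are illustrative.

```python
import math


def embed_bit_by_mean(group, bit, step):
    """Rescale a coefficient group so that its mean absolute value encodes one bit."""
    mean_abs = sum(abs(c) for c in group) / len(group)
    if mean_abs == 0.0:
        raise ValueError("an all-zero group cannot be rescaled")
    n = math.floor(mean_abs / step) + 1
    # Odd interval index carries bit 1, even interval index carries bit 0.
    nm = n - (n - 1) % 2 if bit == 1 else n - n % 2
    if nm == 0:
        nm = 2  # assumption: avoid a negative target mean when n == 1 and bit == 0
    target = step * nm - step / 2.0  # centre of the nm-th interval (assumed form)
    return [c * target / mean_abs for c in group]


def embed_using_mean(coeffs, bits, low, high, step):
    """Embed len(bits) bits into coeffs[low..high], one bit per equal-sized group."""
    size = (high - low + 1) // len(bits)
    marked = list(coeffs)
    for k, bit in enumerate(bits):
        start = low + k * size
        marked[start:start + size] = embed_bit_by_mean(
            marked[start:start + size], bit, step)
    return marked


coeffs = [0.9, 0.31, -0.22, 0.17, -0.4, 0.05, 0.12, -0.3, 0.02]  # toy DCT block
marked = embed_using_mean(coeffs, bits=[1, 0], low=1, high=6, step=0.06)
```

Placing the target at the interval centre leaves a margin of half a step on either side, which is what lets the mean drift slightly under an attack without flipping the extracted bit.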
3.2.1.2 Watermark Extraction
Watermark bit(s) are extracted from each frame of the sub segments. All the watermark bits
retrieved from one sub segment constitute a watermark. The steps involved in retrieving the
watermark are as follows.
Step 1: Step 2 and Step 3 of the watermark embedding process are repeated to divide
the audio into segments and each segment into frames using the key given in eqn
3.8.
Step 2: For each frame of the sub segment, call the ExtractUsingMean function
given as
Function: ExtractUsingMean (D, d, low, high, num)
//Using eqn 3.6, the modified mean Mk is calculated for the mid band coefficients
distributed in num sections
Begin
for k = 1 to num
//Calculate n using the key, as in the embedding
n = floor(Mk/¥) + 1
//The individual watermark bit is extracted using the following rule
hide(k) = M(n, 2)
end //End of for
End //End of ExtractUsingMean
Step 3: Step 2 is repeated for each frame and the bits thus extracted are appended.
Step 4: The one dimensional bit stream is converted into a 2-D image.
Step 5: Repeat Step 2 to Step 4 to extract multiple copies of the watermark.
The number of bits per frame is decided by the value of the num parameter. Num divides the
mid band coefficients within a block into segments, and each segment is then used to embed
one watermark bit. Varying the value of num meets the adjustability requirement, as it
controls the payload of the scheme. The effect of num on the SNR and on the robustness to
different attacks for a fixed frame length is presented and discussed in Sections 3.3.1 and
3.3.2 respectively.
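The extraction rule and the role of num can be sketched as follows. This is an illustrative version only: the function name `extract_using_mean` and the convention of dropping leftover coefficients when the band does not divide evenly by num are assumptions, and the toy coefficients are chosen so that each group mean sits in the middle of a quantization interval.

```python
import math


def extract_using_mean(coeffs, low, high, num, step):
    """Read num bits from coeffs[low..high], one bit per equal-sized group."""
    size = (high - low + 1) // num
    bits = []
    for k in range(num):
        start = low + k * size
        group = coeffs[start:start + size]
        mean_abs = sum(abs(c) for c in group) / len(group)
        n = math.floor(mean_abs / step) + 1   # index of the quantization interval
        bits.append(n % 2)                    # odd interval -> 1, even interval -> 0
    return bits


# Toy watermarked block: the first group has mean 0.15, the second has mean 0.21.
coeffs = [0.9, 0.15, -0.15, 0.15, 0.21, -0.21, 0.21, 0.3]
bits = extract_using_mean(coeffs, low=1, high=6, num=2, step=0.06)
```

No original audio is needed at this stage, only the step size, the band limits and num, which is what makes the scheme blind.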
3.2.2 Scheme 2: Using Quantization of the Euclidean Norm
In scheme 2, which also uses the mid band DCT coefficients, the watermark is preprocessed to
make it secure. The preprocessing encrypts the watermark so that even if the extraction
algorithm is known to an intruder, in the worst case he will only be able to extract the
encrypted watermark, which will not reveal a meaningful watermark to him.
PREPROCESSING OF WATERMARK
For the robustness and security of the watermark, the watermark is scrambled, i.e. encrypted,
using Baker's chaotic map. Other encryption techniques such as the Arnold transform or
encryption through a linear shift register can also be used. Since Baker's map is widely used
in the literature for chaotic mapping of a 2-D matrix, we also use it for this purpose.
Let the size of the image I(x, y) to be embedded be M x N, with M and N as the number of rows
and columns respectively.
Before the image is converted into a one dimensional binary sequence, a permuted version
I`(x, y) of the image I(x, y) is produced using Baker's chaotic map and the security key k1.
Then I`(x, y) is converted into a binary watermark given by
$W = \{\, w(i) : 1 \le i \le Len,\; w(i) \in \{0, 1\} \,\}$ ---- (3.9)
with Len = Length(W) = M x N.
This prevents correct estimation or removal of the watermark without the secret key.
What is extracted is a chaotically permuted watermark, which has to be decoded to the
original watermark through the key.
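The effect of the keyed scrambling can be illustrated with a much simpler stand-in. The sketch below does not implement Baker's chaotic map; it uses a pseudo-random permutation seeded by the key k1 purely to show the scramble and descramble round trip, and the function names are illustrative.

```python
import random


def scramble(bits, key):
    """Permute the bit stream with a key-seeded shuffle (stand-in for Baker's map)."""
    order = list(range(len(bits)))
    random.Random(key).shuffle(order)
    return [bits[i] for i in order]


def unscramble(scrambled, key):
    """Invert scramble(): regenerate the same permutation from the key and undo it."""
    order = list(range(len(scrambled)))
    random.Random(key).shuffle(order)
    restored = [0] * len(scrambled)
    for position, source_index in enumerate(order):
        restored[source_index] = scrambled[position]
    return restored


W = [1, 0, 1, 1, 0, 0, 1, 0]
k1 = 12345                      # illustrative security key
W_scrambled = scramble(W, k1)   # this is what gets embedded
W_restored = unscramble(W_scrambled, k1)
```

As with the chaotic map, an extractor without k1 recovers only the permuted bits, which carry no recognizable image.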
3.2.2.1 Embedding
Step 1: Step 1 to Step 7 of the embedding of Section 3.2.1.1 are repeated with the relaxed block selection criteria.
Step 2: Treat the coefficients at these indexes as a vector.
Embed the watermark bit by quantizing the Euclidean norm of the selected group of coefficients and redistributing it over them as follows.
The pseudo code for the EmbedUsingEuclidianNorm function is presented as follows.
Function: EmbedUsingEuclidianNorm (D, d, low, high, num)
D -> vector containing mid band DCT coefficients
low -> lower index corresponding to mid band.
high -> higher index corresponding to mid band.
num -> no. of watermark bits to be embedded.
//Divide the coefficients in the mid band into num segments and find the Euclidean norm
for each such segment.
//Let Dk be the mid band DCT coefficients of the kth segment and Uk their Euclidean
norm.
Begin
for k = 1 to num
$U_k = \left( \sum_{i} D_k(i)^2 \right)^{1/2}$ ------- (3.10)
//Two quantizers are used to embed bit 0 and bit 1 respectively. Calculate n as
n = floor(Uk/¥) + 1
if (w(k) == 1) //w(k) is the kth bit of the watermark
nm = n - M(n-1, 2) //Quantizer for embedding 1
else
nm = n - M(n, 2) //Quantizer for embedding 0
endif //end of if
xm = ¥*nm - ¥/2 //centre of the nm th quantization interval, so that floor(xm/¥) + 1 = nm
//Finally the selected mid band DCT coefficients of each segment are modified
Dk(i) = Dk(i) * xm / Uk
end //End of for
//The inverse DCT is applied on the DCT blocks to obtain the frame in the time
domain again using equation 3.7.
End //End of EmbedUsingEuclidianNorm
The embedding key, which is passed to the extraction module, is given by
the tuple represented through key.
Key = {¥, num, k1} ------- (3.11)
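The norm based variant differs from scheme 1 only in the statistic that is quantized. The sketch below is illustrative and makes the same assumptions as a centre-of-interval quantizer: the target norm is step * nm - step / 2, and nm is moved from 0 to 2 to keep the target positive. The function names are not from the source.

```python
import math


def euclidean_norm(group):
    """Euclidean norm of a group of coefficients, as in eq. 3.10."""
    return math.sqrt(sum(c * c for c in group))


def embed_bit_by_norm(group, bit, step):
    """Rescale a coefficient group so that its Euclidean norm encodes one bit."""
    norm = euclidean_norm(group)
    if norm == 0.0:
        raise ValueError("an all-zero group cannot be rescaled")
    n = math.floor(norm / step) + 1
    nm = n - (n - 1) % 2 if bit == 1 else n - n % 2   # odd -> bit 1, even -> bit 0
    if nm == 0:
        nm = 2  # assumption: keep the target norm positive
    target = step * nm - step / 2.0  # centre of the nm-th interval (assumed form)
    return [c * target / norm for c in group]


def extract_bit_by_norm(group, step):
    """Recover the bit from the parity of the quantization interval index."""
    n = math.floor(euclidean_norm(group) / step) + 1
    return n % 2


group = [0.3, -0.4, 0.12]   # toy mid band group, norm about 0.514
marked_zero = embed_bit_by_norm(group, 0, step=0.06)
marked_one = embed_bit_by_norm(group, 1, step=0.06)
```

Because every coefficient in the group is multiplied by the same factor, the signs and relative magnitudes of the coefficients are preserved; only the overall scale of the group changes.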
3.2.2.2 Extraction
Step 1: Step 2 and Step 3 of the watermark embedding process are repeated to divide the
audio into segments and each segment into frames using the key given in eqn 3.11.
Step 2: For each frame of the sub segment, the ExtractUsingEuclidianNorm function
is called.
Function: ExtractUsingEuclidianNorm (D, d, low, high, num)
//Using eqn 3.10, the Euclidean norm Uk is calculated for the mid band coefficients
distributed in num sections
Begin
for k = 1 to num
n = floor(Uk/¥) + 1 //Calculate n using the key, as in the embedding
hide(k) = M(n, 2) //The individual watermark bit is extracted
end //End of for
End //End of ExtractUsingEuclidianNorm
Step 3: Step 2 is repeated for each frame and the bits thus extracted are appended.
Step 4: The one dimensional bit stream is converted into a 2-D image.
Step 5: Using the security key k1, the permuted image is converted back to the original watermark.
Step 6: Repeat Step 2 to Step 5 to extract multiple copies of the watermark.
3.3 Performance of the Proposed Schemes
The performance of the watermarking schemes is tested on audio of different content
categories. SNR as well as MOS are used for the imperceptibility test. SNR is the metric most
widely used in the literature for imperceptibility evaluation. Since no unified test data
exist for audio watermarking, we use audio samples having the same attributes (sampling rate,
sample resolution, format) as those used in the literature.
Sixteen audio samples belonging to four different categories of music are used for testing.
All test samples are in WAVE format (Waveform Audio File Format) with a bit rate of 705 kbps.
The sampling rate is 44.1 kHz and the sample resolution is 16 bit. Two different watermarks
in the form of 2-D images of 20 x 50 pixels are used as embedding information. Matlab 2007b
is used for implementing the schemes. Matlab as well as GoldWave tools are used for
simulating the attacks.
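The quoted bit rate follows from the sampling rate and the sample resolution of mono audio, and the watermark size fixes the number of bits each segment has to carry. The short check below is only an arithmetic illustration; the variable names are chosen here.

```python
sampling_rate_hz = 44100
bits_per_sample = 16
channels = 1  # the test audio is mono

# Uncompressed PCM bit rate in kbps.
bit_rate_kbps = sampling_rate_hz * bits_per_sample * channels / 1000.0

# Number of bits in one 20 x 50 binary watermark.
watermark_bits = 20 * 50
```

The result is 705.6 kbps, which the text rounds down to 705 kbps, and every segment has to provide room for 1000 watermark bits.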
The audio samples are classical, pop, country and folk music of different lengths. Figures
3.1 (a) to 3.1 (p) show the waveforms of the test audio samples used for testing the
performance.
Figure 3.1 (a) to Figure 3.1 (p): Waveforms of audio 1 to audio 16 (x-axis: time (msec), y-axis: amplitude)
Figure 3.1: Time waveforms of the test audios
3.3.1 Imperceptibility of the Watermarked Audio
The imperceptibility of the proposed method is analyzed by finding the signal to noise ratio
between the original and the watermarked audio and by determining the mean opinion score.
Both of these metrics are discussed in Chapter 1, under Sections 1.4.1 and 1.4.2
respectively.
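For reference, the SNR between an original and a watermarked signal can be computed as sketched below. The formula used is the common definition, 10 log10 of the signal energy over the energy of the difference; it is assumed here to match the definition given in Chapter 1, which is not reproduced in this chapter.

```python
import math


def snr_db(original, watermarked):
    """SNR in dB: energy of the original over the energy of the embedding distortion."""
    signal_energy = sum(x * x for x in original)
    noise_energy = sum((x - y) ** 2 for x, y in zip(original, watermarked))
    if noise_energy == 0.0:
        return float("inf")   # identical signals
    return 10.0 * math.log10(signal_energy / noise_energy)


original = [0.5, -0.4, 0.3, -0.6]
watermarked = [0.505, -0.405, 0.305, -0.605]   # each sample off by 0.005
value = snr_db(original, watermarked)          # about 39.3 dB
```

A larger quantization step moves the coefficients further from their original values, which increases the distortion energy in the denominator and lowers the SNR.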
Two different watermarks are used for watermarking the different audio samples; they are
shown in the following figure.
Figure 3.2: Different test watermarks
The first set of experiments shows the effect of watermarking on the perceptibility of the
audio. The SNR is tabulated for different values of the quantization step size. The mid band
DCT coefficients selected for watermark embedding are the coefficients that correspond to
the frequency range of 5 kHz to 11 kHz. As expected, the SNR of the watermarked audio keeps
decreasing as the value of the strength factor ¥ increases. We select the maximum value of ¥
in such a way that there is no perceptible difference between the original and the
watermarked audio and the SNR is always greater than 30 dB. The mid frequency band of 5 to
11 kHz corresponds to the index range 86:222 for a block size of 441, and to the index range
250:550 for a block size of 1000.
The SNR for varying values of ¥ (qStep), presented in Figure 3.3 to Figure 3.7, shows that
the highest SNR is obtained for the pop music category and that for ¥ equal to 0.09 the audio
samples of all categories have an SNR greater than 30 dB. The MOS obtained from 10 subjects
(five male and five female) in the age range of 18 to 30 years is 4.5, which implies that
the watermark is inaudible. MOS based perceptibility determination is time consuming, so
most researchers in the literature have not used it. Instead they use the SNR and the
segmental SNR between the original and the watermarked audio for imperceptibility
evaluation.
Figure 3.3: SNR(dB) Vs qStep for Classical audio samples
Figure 3.4: SNR(dB) Vs qStep for Country audio samples
Figure 3.5: SNR(dB) Vs qStep for Pop audio samples
Figure 3.6: SNR(dB) Vs qStep for Folk audio samples
(Figures 3.3 to 3.6: x-axis qStep, y-axis SNR(dB); curves for sample 1 to sample 4 and their average)
Figure 3.7: Average SNR(dB) Vs qStep for all categories of audios
After selecting the quantization step, a scalar that controls the SNR, and the range of
coefficients in which the watermark bit is embedded, we compare the imperceptibility, in
terms of SNR, of our scheme, in which only selected coefficients are used, with that of a
scheme in which all the DCT coefficients are used for embedding. The first set of
experiments in this category is done for scheme 1, in which the mean is used for watermark
embedding, with two different watermarks. Figure 3.8 (a) and Figure 3.8 (b) give the
comparison of SNR for the same value of the quantization step ¥ for the two watermarks
mentioned above.
Figure 3.8 (a): SNR(dB) for mid band Vs all DCT for Modified Mean using watermark 1
Figure 3.8 (b): SNR(dB) for mid band Vs. all DCT for Modified Mean using watermark 2
Figure 3.8: SNR(dB) for mid band Vs all DCT for Modified Mean for different watermarks
The results of the experiments show that the type or content of the watermark has no effect
on the SNR. The average SNR of the mid band DCT scheme is 37.99 dB, compared to 38.23 dB for
the all DCT scheme, so the average SNR is approximately the same for both at the same value
of ¥. The second experiment finds the average extraction time (in sec) of the watermark for
the mid band and all DCT variants of the modified mean scheme.
Figure 3.9: Average Extraction time(sec) for mid band Vs all for Modified Mean
The average retrieval time is 6.60 sec for all DCT compared to 5.69 sec for mid band DCT, which implies that retrieval is faster for the mid band approach.
The same set of experiments is repeated for all DCT and mid band DCT when the Euclidean norm of the vector formed from the DCT coefficients is used for watermark embedding; the results are presented in Figure 3.10 to Figure 3.12.
Figure 3.10: SNR(dB) for mid band Vs all for Euclidian norm using watermark 1
Figure 3.11: SNR(dB) for mid band Vs all for Euclidian norm using watermark 2
Figure 3.12: Average Extraction time(sec) for mid band Vs all for Euclidian norm
The average SNR of the mid band DCT scheme is 35.21 dB, compared to 32.57 dB for the all DCT
scheme. The comparison shows that the average SNR is better for mid band DCT at the same
value of ¥. The average extraction time is also lower for mid band DCT: 5.69 sec compared to
6.65 sec for all DCT.
The effect of num and the frame length, i.e. indirectly the payload, on the SNR is analyzed
for a fixed value of ¥. As the number of bits embedded per frame increases, the SNR
decreases, but the reduction is not prominent. Figure 3.13 shows the effect of the bits per
frame, i.e. num, on the SNR for a fixed value of ¥ and frame size for the different groups
of samples.
Figure 3.13: Effect of num i.e. bits/frame on the SNR(dB)
For this experiment the frame size is taken as 1024 samples and ¥ is taken as 0.06. The SNR
in the figure is the average over all the samples in the same category of music. Similar
results are obtained for different values of frame size and increasing values of num.
Overall, the comparison reveals that modifying all the DCT coefficients does not improve the
SNR over using only the mid band DCT coefficients: for the Euclidean norm scheme the SNR is
lower when all coefficients are modified (32.57 dB against 35.21 dB), and for the modified
mean scheme the two are approximately equal (38.23 dB against 37.99 dB). The SNR of the mid
band DCT scheme using the mean is better than that of the one using the Euclidean norm. The
SNR obtained after embedding watermark 1 and watermark 2 is approximately the same, implying
that the content of the watermark has no effect on the SNR. The extraction time of the
watermark from the watermarked audio is also lower for the mid band schemes and is almost the
same across the watermarks.
The next section presents the robustness results under different attacks.
3.3.2 Robustness of Watermarked Audio
Robustness is tested by exposing the watermarked audio to attacks and then calculating the
normalized correlation between the embedded and the extracted watermark. For any
watermarking scheme it is essential to be robust against intentional and unintentional
attacks. The robustness of the watermarking schemes is measured through the correlation
coefficient (CC) between the original and the retrieved watermark and the bit error rate of
the extracted watermark with respect to the original. The higher the value of CC, the more
robust the watermarking scheme.
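Both robustness measures can be computed from the original and the extracted bit streams as sketched below. The normalized correlation is written in its common form, the inner product divided by the product of the norms; this is an assumption, since the exact expression used for CC is not restated in this section.

```python
import math


def normalized_correlation(original_bits, extracted_bits):
    """Inner product of the two bit streams divided by the product of their norms."""
    cross = sum(a * b for a, b in zip(original_bits, extracted_bits))
    energy = sum(a * a for a in original_bits) * sum(b * b for b in extracted_bits)
    return cross / math.sqrt(energy) if energy > 0 else 0.0


def bit_error_rate(original_bits, extracted_bits):
    """Fraction of bit positions in which the extracted watermark differs."""
    errors = sum(1 for a, b in zip(original_bits, extracted_bits) if a != b)
    return errors / len(original_bits)


original = [1, 0, 1, 1, 0, 1, 0, 0]
extracted = [1, 0, 1, 0, 0, 1, 0, 0]   # one bit lost after an attack
cc = normalized_correlation(original, extracted)
ber = bit_error_rate(original, extracted)
```

In this toy case one wrong bit out of eight gives a BER of 0.125 and a CC of about 0.87; a CC of 1 and a BER of 0 mean the watermark was recovered exactly.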
The following StirMark attacks [127] are used for evaluating the robustness of the
watermarked audio.
Closed loop attack: The watermarks are extracted from the watermarked audio directly,
without subjecting the watermarked audio to any attack.
Re-sampling attacks: The watermarked audio is re-sampled. Two types of attack are carried
out under this category.
o Type I re-sampling: The watermarked audio is sampled at a lower rate than the original and
then re-sampled back to the original sampling rate.
o Type II re-sampling: The watermarked audio is sampled at a higher rate than the original
and then re-sampled back to the original sampling rate.
Re-quantization attack: Each sample of the watermarked audio is re-quantized. Two types of
attack are carried out under this category.
o Type I re-quantization: Each sample of the watermarked audio is quantized using fewer bits
than the original and then re-quantized back to the original resolution.
o Type II re-quantization: Each sample of the watermarked audio is quantized using more bits
than the original and then re-quantized back to the original resolution.
Amplify: The watermarked audio is amplified at different amplification rates. This attack
makes schemes that embed the watermark in the amplitude of the individual samples
vulnerable, as the amplitude is altered.
Low pass filtering: The watermarked audio is passed through a low pass filter with a pass
frequency of 11 kHz.
High pass filtering: The watermarked audio is passed through a high pass filter with a pass
frequency of 100 Hz.
Echo addition: Echoes with different delays are added to the watermarked audio. This attack
is essentially a counter to echo hiding schemes, which add the watermark bits in the form of
echoes with different delays.
LSB flipping/invert: The LSB of each sample is flipped, i.e. set to 1 if the original bit
was 0 and vice versa.
AAC attack: MPEG-4 Advanced Audio Coding (AAC) compression is applied to the watermarked
audio, and the audio is then converted back into the original format.
MP3 compression attack: MP3 compression at different bit rates is applied to the watermarked
audio, and the audio is then decompressed to the original bit rate.
Noise addition: Different types of noise, such as additive white Gaussian noise (AWGN) with
variable SNR, are added to the watermarked audio.
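Some of the simpler attacks in this list can be simulated in a few lines. The sketch below is illustrative only and is not the StirMark or GoldWave implementation used in the experiments: samples are assumed to be floats in [-1, 1] (or integer PCM values for the LSB attack), and the function names and parameter values are chosen here.

```python
import math
import random


def requantize(samples, bits):
    """Type I re-quantization: round float samples in [-1, 1] to a coarser bit depth."""
    levels = 2 ** (bits - 1)
    return [max(-1.0, min(1.0, round(s * levels) / levels)) for s in samples]


def amplify(samples, gain):
    """Scale every sample by a constant gain."""
    return [s * gain for s in samples]


def flip_lsb(pcm_samples):
    """Invert the least significant bit of each integer PCM sample."""
    return [s ^ 1 for s in pcm_samples]


def add_awgn(samples, snr_db, seed=0):
    """Add white Gaussian noise whose power is set by the requested SNR in dB."""
    rng = random.Random(seed)
    signal_power = sum(s * s for s in samples) / len(samples)
    sigma = math.sqrt(signal_power / (10.0 ** (snr_db / 10.0)))
    return [s + rng.gauss(0.0, sigma) for s in samples]


audio = [0.5, 0.1234, -0.4, 0.3]
coarse = requantize(audio, 8)          # 16 bit audio reduced to an 8 bit grid
louder = amplify([0.1, -0.2], 2.0)
flipped = flip_lsb([0, 1, -2, 7])
noisy = add_awgn(audio, snr_db=30, seed=1)
```

Each attacked signal would then be passed through the extraction steps and scored against the original watermark.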
Figures 3.14(a) – 3.14 (i) show the test result of the average CC obtained when the
watermarked audio is subjected to different attacks using the two proposed schemes. For all
the figures the x-axis depicts the different attacks and the corresponding value of CC is given
on the y-axis.
Figure 3.14 (a): Robustness against Re-Quantization for both schemes
Figure 3.14 (b): Robustness against Re-Sampling for both schemes
Figure 3.14 (c): Robustness against Amplification for both schemes
Figure 3.14 (d): Robustness against Filtering (low pass 11 kHz, high pass 100 Hz) for both schemes
Figure 3.14 (e): Robustness against Random Cropping for both schemes
Figure 3.14 (f): Robustness against echo addition (delays of 5 msec and 10 msec) for both schemes
Figure 3.14 (g): Robustness against LSB Zero and LSB invert for both schemes
Figure 3.14 (h): Robustness against mp3 compression for both schemes
Figure 3.14 (i): Robustness against Noise Addition for both schemes
(In all panels: x-axis Attack, y-axis CC; series Modified mean and Euclidean norm)
Figure 3.14: Robustness against different attacks for mid band schemes
From the robustness results it can be inferred that both of our schemes resist the common
signal processing attacks while maintaining good imperceptibility. The average SNR of the
watermarked audio is 38 dB for scheme 1 and 35 dB for scheme 2, which is far above the IFPI
recommendation of 20 dB. Scheme 2 performs better than scheme 1, especially under the
compression attack, showing that the Euclidean norm of the mid band DCT coefficients of a
frame is altered less when the watermarked audio is subjected to compression.
3.4 Comparison with Other Schemes
The imperceptibility and robustness results of the proposed mid band schemes are compared
with those of some contemporary schemes. Although a direct comparison cannot be made, as
there is no common test data for the cover audio and the watermark, we compare on the basis
of the results reported by their authors. The schemes in which all the DCT coefficients are
used respond poorly to filtering and compression attacks, even at low compression, so their
robustness results are not explicitly presented here.
3.4.1 Comparison on the Basis of SNR
The proposed schemes are compared with the DCT based scheme proposed by Dhar [97], the echo
hiding based scheme by Bender [78], the reduced SVD based scheme by Healy, Wang [114], the
SVD based scheme by Ali [119], the SVD based scheme by Fathi [117] and the time domain
embedding scheme by Shahriar [118].
Figure 3.15: Comparison on the basis of SNR for mid band scheme
3.4.2 Comparison on the Basis of Payload
One of the requirements on which watermarking schemes can be compared is the payload,
expressed in bits per second (bps). As recommended by IFPI, the payload should be more than
0.5 bps for copyright applications and more than 20 bps for other applications. An
adjustability requirement is also imposed so that the watermarking scheme can be used for
multiple applications. Both of our schemes meet the adjustability criterion: a variable
payload is achieved, with a minimum of 86 bps. The rate is varied through the number of bits
embedded in every individual frame used for watermarking. In our experiments, the frame size
has been fixed at 1024 samples, as mp3 compression processes the frames with a block size of
1024. A frame size of 1024 adds robustness to the DCT coefficients against the mp3
compression attack. Although explicit results on the effect of frame size on the robustness
against mp3 compression are not presented here, the experiments indicate that both proposed
schemes work best with a frame size of 1024 samples.
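The relation between frame size, bits per frame and payload can be checked with a short calculation. The sketch below is illustrative: the 44.1 kHz sampling rate and the value of two bits per frame are assumptions that are not stated in this section, chosen because together they reproduce the 86.13 bps reported for the proposed schemes.

```python
def payload_bps(bits_per_frame, frame_size=1024, sample_rate=44100):
    """Payload in bits per second for frame-based embedding.

    frames per second = sample_rate / frame_size
    payload           = bits_per_frame * frames per second
    sample_rate is an assumed value (CD-quality audio).
    """
    return bits_per_frame * sample_rate / frame_size

# Assumed: 2 bits per 1024-sample frame at 44.1 kHz.
print(round(payload_bps(2), 2))
```

Raising the number of bits per frame scales the payload linearly, which is the adjustability the schemes rely on.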
The payload of our proposed schemes is compared with that of some contemporary and state of
the art schemes. The results are presented in Table 3.1.
Table 3.1: Comparison on the basis of payload of schemes
Scheme                          SNR (dB)   Payload (bps)
Dhar [2011]                     22.15      N/A
Bender [1996]                   21.47      N/A
Wang [2011]                     27.23      187
Ali [2010]                      25.24      N/A
E. Fathi [2009]                 27.18      N/A
Shahriar [2013]                 23.93      61.7
Proposed with Euclidean norm    35.21      86.13
Proposed with modified mean     37.99      86.13
3.4.3 Comparison on the Basis of Robustness to Attacks
For comparing the robustness against the attacks given in Section 3.4.2, we regulate the
strength factor to achieve an SNR in the range of 32 dB to 35 dB for our audios.
Table 3.2: Robustness comparison with other schemes
Attack                        Ours (modified mean)  Ours (Euclidean norm)  Wang [2011]  Fathi [2010]  Dhar DCT [2011]
Closed loop                   1                     1                      0.94         1             1
Re-sampling to half           1                     1                      X            X             1
Re-sampling to 32 kHz         1                     1                      X            0.88          X
Re-quantization to 8 bit      1                     1                      1            1             1
Low pass filtering (11 kHz)   1                     1                      X            0.94          1
High pass filtering (100 Hz)  1                     1                      X            0.99          1
Amplify 5%                    1                     0.96                   X            X             X
Amplify 10%                   1                     0.75                   X            X             X
Invert                        1                     1                      X            X             X
LSB zero                      1                     1                      X            X             X
Noise addition                0.70                  0.73                   X            0.96          X
Compression, 320 kbps         0.88                  0.89                   X            X             X
Compression, 256 kbps         0.78                  0.78                   0.94         0.96          X
Compression, 96 kbps          0.65                  0.67                   0.93         X             X
Compression, 48 kbps          0                     0.5                    X            X             X
Compression, 32 kbps          0                     0.48                   X            X             X
The average of the correlation coefficients obtained from the different watermarked audios is
then compared directly with the results given by the different authors in Table 3.2. In
Table 3.2, X indicates that the author did not report a result for that attack category.
The comparison shows that our schemes have a higher SNR, are robust to attacks and
outperform the other schemes except against the compression attacks. Of the two proposed
schemes, Scheme 2, which uses the Euclidean norm of the mid band DCT coefficients for
watermark embedding, shows good robustness at higher compression rates in addition to some
other attacks. Scheme 1 shows better robustness against amplification and echo addition.
In the next section, we present a novel image watermarking scheme using the mid band DCT
coefficients. The idea is to watermark an image and audio separately so that the two can be
used together for watermarking video.
3.5 Watermarking of Digital Image using Mid-band DCT
In this section, a DCT based additive image watermarking scheme is presented. The scheme
provides higher resistance to image processing attacks, mainly JPEG compression. In our
approach the watermark is embedded in the mid frequency band of the DCT blocks, only in the
sub band that carries the low frequency components; the high frequency sub band components
remain untouched. The intent is to develop an image watermarking scheme that is highly
resistant to compression, shows minimal blocking artifacts and, when combined with audio
watermarking, can also be used for watermarking a video.
In an image, adjacent pixels generally have almost the same intensity value, so the intensity
of a neighboring pixel can easily be predicted from a given pixel intensity. When the DCT is
performed on an image this correlation is removed, which helps in embedding the watermark in
scattered form across different DCT coefficients.
Our approach differs slightly from the traditional mid band DCT based schemes, in which the
mid band DCT coefficients are used directly for watermarking. Since the major work in this
thesis is on audio watermarking, we limit the discussion to a description of the proposed
scheme and do not explicitly compare it with other image watermarking schemes. The next
section describes the watermark embedding and extraction algorithms of the modified mid band
DCT based scheme.
3.5.1 Watermark Embedding Algorithm
The algorithm used to embed a watermark in an image is given below.
Step 1: Segment the image I(i, j) into two sub band blocks, each half the size of the
original image, i.e. I1(i/2, j) and I2(i/2, j). I1(i/2, j) gives the high intensity pixels
block and I2(i/2, j) gives the low intensity pixels block:
I(i, j) = I1(i/2, j) + I2(i/2, j).
Step 2: Break I1(i/2, j) into blocks of size 8 x 8.
Step 3: Find the 2-D DCT of each block.
Step 4: Use the private key to generate two pseudo random number sequences over the domain
{-1, 0, 1} which are highly uncorrelated.
Step 5: Preprocess the watermark by converting it into a binary sequence:
W(m*n) -> W(s*1), where s = m*n.
Step 6: Embed the watermark in the mid band of each DCT coefficient block using the pseudo
random number sequences and the watermark sequence.
Step 7: Apply the inverse 2-D DCT to the blocks to obtain the averaged image blocks again.
Step 8: Apply the inverse operation of Step 2 to obtain the watermarked image.
Inserting the watermark in the mid band of the coefficient block of each averaged DCT block
gives extra robustness to the watermark, and the use of the key gives security to the
watermarking system. As the watermark is embedded in the mid frequency band of the
transformed high pixel intensity image, robustness against the JPEG attack is greatly increased.
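Steps 3, 4, 6 and 7 can be sketched for a single 8 x 8 block as follows. This is an illustration under stated assumptions, not the implementation used in the experiments: the mid band is taken to be anti-diagonals 3 to 6 of the coefficient block, one watermark bit selects one of the two key-seeded sequences, and the strength `gain` is an arbitrary value. The names `dct_matrix`, `pn_pair`, `embed_bit` and `MID_BAND` are hypothetical, and the sub band split of Step 1 is not covered.

```python
import numpy as np

N = 8  # block size from Step 2

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix C, so that C @ x @ C.T is the 2-D DCT of x."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

C = dct_matrix()

# Assumed mid band: coefficients (u, v) with 3 <= u + v <= 6.
u, v = np.indices((N, N))
MID_BAND = (u + v >= 3) & (u + v <= 6)

def pn_pair(key, length):
    """Step 4: two key-seeded sequences over {-1, 0, 1}."""
    rng = np.random.default_rng(key)
    return rng.integers(-1, 2, size=length), rng.integers(-1, 2, size=length)

def embed_bit(block, bit, key, gain=5.0):
    """Steps 3, 6, 7: DCT, add the selected sequence in the mid band, inverse DCT."""
    pn0, pn1 = pn_pair(key, int(MID_BAND.sum()))
    coeffs = C @ block @ C.T                        # 2-D DCT (Step 3)
    coeffs[MID_BAND] += gain * (pn1 if bit else pn0)  # additive embedding (Step 6)
    return C.T @ coeffs @ C                         # inverse 2-D DCT (Step 7)

# Illustrative block; a real block would come from the sub band image.
block = np.arange(64, dtype=float).reshape(N, N)
marked = embed_bit(block, bit=1, key=42)
```

Only the mid band coefficients change, so the low frequency content that dominates visual quality and the high frequency content that compression discards are both left alone. A larger `gain` trades imperceptibility for robustness.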
3.5.2 Watermark Extraction Algorithm
The steps of the watermark extraction algorithm are given below. The inputs are the
watermarked image and the key used for watermark embedding.
Step 1: Repeat Steps 1 and 2 of the watermark embedding algorithm.
Step 2: Using the private key, extract the watermark.
Step 3: Compare the extracted watermark with the original watermark.
Step 4: A similar watermark proves the authenticity.
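Step 2 does not spell out how the key recovers a bit. One common choice for additive schemes of this kind, assumed here rather than taken from the text, is to regenerate both key-seeded sequences and pick the bit whose sequence correlates more strongly with the mid band DCT coefficients of the block. The names `pn_pair` and `detect_bit` are hypothetical.

```python
import numpy as np

def pn_pair(key, length):
    """Regenerate the two key-seeded sequences over {-1, 0, 1}."""
    rng = np.random.default_rng(key)
    return rng.integers(-1, 2, size=length), rng.integers(-1, 2, size=length)

def detect_bit(mid_coeffs, key):
    """Return 1 if the coefficients correlate better with the second sequence,
    otherwise 0. mid_coeffs is a 1-D array of mid band DCT coefficients."""
    pn0, pn1 = pn_pair(key, mid_coeffs.size)
    return int(np.dot(mid_coeffs, pn1) > np.dot(mid_coeffs, pn0))

# Illustrative check with an idealized host (all mid band coefficients zero).
pn0, pn1 = pn_pair(7, 22)
print(detect_bit(5.0 * pn1, 7), detect_bit(5.0 * pn0, 7))
```

Without the key the two sequences cannot be regenerated, which is where the security of the scheme comes from. With a real host image the coefficients add noise to both correlations, so detection becomes statistical rather than exact.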
The following section presents the watermarks extracted from the watermarked image after
some attacks. The percentage similarity between the original watermark and the retrieved
watermark is obtained through the correlation function.
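The correlation between two watermark images can be computed as a two-dimensional correlation coefficient. The sketch below follows the standard definition (mean-removed inner product divided by the product of the norms), which is also what MATLAB's corr2 computes; the NumPy function here is an illustrative equivalent, and the sample watermark is made up.

```python
import numpy as np

def corr2(a, b):
    """2-D correlation coefficient between two equally sized arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()  # remove the mean of each array
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

# Illustrative 2 x 2 binary watermark.
w = np.array([[1, 0], [0, 1]])
print(corr2(w, w), corr2(w, 1 - w))
```

A value of 1 means the retrieved watermark matches the original exactly, and a value near 0 means the two are unrelated. The result is undefined for a constant array, since its variance is zero.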
3.5.3 Results and Discussions
Four test images are used to test the robustness of the proposed algorithm. The watermarked
image is subjected to compression at different quality factors in addition to the common
image processing attacks. The watermark is retrieved from the compressed watermarked image
using the extraction process, and the percentage similarity between the extracted watermark
and the original watermark is calculated. The same procedure is repeated for the different
standard images, and the results are reported as SNR and CC, i.e. the correlation between
the original and the retrieved watermark at different quality factors (Q).
Figure 3.16 gives the standard images used for testing the watermarking algorithm. For
embedding, two different watermarks are used; they are the same as those used for audio
watermarking and are given in Figure 3.2.
Figure 3.16 (a): Lena image
Figure 3.16 (b): Boat image
Figure 3.16 (c): Baboon image
Figure 3.16 (d): Barbara image
Figure 3.16: Standard test images [128]
Intermediate snapshots of the Lena image at the embedding stage and after the compression
attack are shown in Figure 3.17.
(a) Original Lena image (b) Averaged Lena (c) Watermarked Lena (d) Compressed Lena
Figure 3.17: Lena image undergone different operations
The watermarked images are subjected to compression at different compression ratios,
expressed through the quality factor. The lower the quality factor, the higher the
compression. The quality factor indicates the amount of compression and is reflected in the
size of the compressed image relative to the original one. The following table gives the
robustness against JPEG compression for the four test images.
Table 3.3: Correlation Coefficient (CC) Vs Quality Factor
Quality Factor   CC for Baboon   CC for Lena   CC for Boat   CC for Barbara
5                0.75            0.935         0.89          0.92
10               0.77            0.959         0.90          0.95
20               0.79            0.9859        0.94          0.98
40               0.79            0.9859        0.93          0.98
45               0.79            0.9859        0.94          0.98
50               0.79            0.9859        0.94          0.98
60               0.80            0.9859        0.94          0.98
80               0.79            0.9894        0.94          0.98
The performance of the watermarking algorithm under other common signal processing attacks
is checked for the Lena image and presented in Table 3.4.
Table 3.4: Correlation coefficient (CC) for the Lena image after different attacks
Attack       CC        Attack                    CC
Q=5          0.935     Blur, factor=0.2          0.9052
Q=10         0.959     Gaussian, v=0.01          0.9484
Q=20         0.9859    Gaussian, v=0.02          0.8844
Q=40         0.9859    Gaussian, v=0.04          0.7267
Q=45         0.9859    Gaussian, v=0.06          0.664
Q=50         0.9859    Gaussian, v=0.1           0.5384
Q=60         0.9859    Gaussian, v=0.15          0.3524
Q=80         0.9894    Salt & Pepper, v=0.1      0.8936
Cropped 1    0.9859    Salt & Pepper, v=0.15     0.8084
Cropped 2    0.9859    Salt & Pepper, v=0.2      0.5476
In Table 3.4, v refers to the variance and Q refers to the quality factor.
Adobe Photoshop is used for all the attacking operations, and the code is built in MATLAB 7.
The corr2 function is used to find the correlation between the original watermark and the
retrieved watermark. The imnoise function is used to introduce both types of noise, i.e.
Gaussian noise and salt and pepper noise. The results show that the method works best
against the compression attack and is tolerant of common image processing attacks.
Watermarking techniques in which the watermark is embedded in a transform domain are
typically better candidates than spatial domain techniques, for reasons of both robustness
and visual impact. Embedding in the DCT domain proved to be highly resistant to JPEG
compression as well as to significant amounts of random noise. By anticipating which
coefficients would be modified by the subsequent transform and quantization, we were able to
produce a watermarking technique with moderate robustness, good capacity and low visual
artifacts. Since the DCT is performed on blocks, all DCT based images suffer from visual
artifacts, and our approach is no exception. The visual quality is nevertheless good, with no
noticeable change, and it can be improved further by introducing a slight blur.
3.6 Summary
We proposed two digital audio watermarking algorithms in the DCT domain for mono audios. One
scheme quantizes the modified mean of the selected mid band coefficients and the other uses
the Euclidean norm of the selected mid band coefficients for watermark embedding. In
Scheme 1, for robustness of the audio watermark, an energy threshold is set and only the
blocks that meet this minimum energy threshold are selected. In Scheme 2, all the blocks are
used for watermark embedding, and the watermark is preprocessed before embedding through
Baker's map encryption, which produces a permuted version of the watermark. This
preprocessing adds security to the watermark, since estimating the watermark becomes
difficult without possession of the security key. Both schemes use mid band DCT coefficients
and give better results than the approach which uses all the coefficients. Experimental
results have illustrated the robust and inaudible nature of our embedding schemes as well.
In addition, the watermark can be extracted without the help of the original audio signal,
and the procedures for embedding and extraction can be easily implemented. For both schemes,
the SNR and the payload are far higher than what is recommended by IFPI.
At the end of the chapter, an image watermarking scheme which uses sub band DCT for
watermark embedding is proposed. The proposed scheme is a DCT based additive watermarking
scheme which provides higher resistance to image processing attacks, mainly JPEG
compression. The watermark is embedded in the mid frequency band of the DCT blocks, only in
the sub band that carries the low frequency components; the high frequency sub band
components remain untouched.