978-1-4244-4547-9/09/$26.00 ©2009 IEEE TENCON 2009

Speech Based Watermarking for Digital Images

Vandana S. Inamdar Department of Computer Engineering and IT

College of Engineering Pune, India

[email protected]

Priti P. Rege Department of Electronics and Telecommunication

College of Engineering Pune, India

[email protected]

Aarti Bang Department of Electronics

Vishwakarma Institute of Information Technology, Pune, India

[email protected]

Abstract— This paper presents a novel scheme for watermarking digital images for copyright protection and authentication. We propose embedding the owner's speech signal as the watermark. Since speech is biometric data, the watermark is more meaningful and more closely correlated with the copyright holder. The main concern is capacity, because speech consists of a large number of samples; linear predictive coding is therefore used to encode the audio data. The speech parameters are imperceptibly inserted, using a spread spectrum technique, into the mid-frequency band of the wavelet transform of the image, which makes the scheme robust to lossy compression. Applications of such a speech-hiding scheme include copy protection, authentication, and covert communication.

Keywords: watermarking; copyright protection; biometric watermarking; wavelet transform

I. INTRODUCTION

The rapid growth of digital media and communication networks has highlighted the need for Intellectual Property Rights (IPR) protection technology for digital multimedia, and watermarking of multimedia data has become a research hotspot in recent years. Watermarking can be used to identify the owner, license information, or other information related to the digital object carrying the watermark, and it can provide a mechanism for determining whether a particular work has been tampered with or copied illegally [1]. In digital watermarking, a low-energy signal is imperceptibly embedded in another signal. Watermarking systems are characterized by properties such as imperceptibility, robustness, and capacity; the importance of each property depends on the requirements of the application and the role the watermark will play. Embedding the watermark requires modifying the original image, so the watermarking process in essence inserts a controlled amount of distortion into the image, and recovering this distortion allows one to identify the owner of the image. An invisible or transparent watermark exploits the properties of the human visual system to minimize the perceptual distortion in the watermarked image.

II. RELATED WORK

Early work on digital watermarking focused on information hiding in the spatial domain [2], [3]. In spatial-domain techniques, watermark casting is achieved by directly modifying the pixel values of the host image; Lee and Chen, for example, proposed inserting the watermark by changing the least significant bit of some pixels in an image [4]. Recent efforts are mostly based on transform-domain techniques. The widely used transforms are the DCT, Walsh-Hadamard, DFT, and wavelet transforms, and bit replacement, data modulation, and spread spectrum modulation are widely used [1]. In particular, Cox et al. [5] described a method in which the watermark is embedded in the large discrete cosine transform coefficients using ideas borrowed from spread spectrum communication. Although much effort has been made in this field, most methods embed a character string or logo as the copyright information. Such watermarks have some limitations: 1) they are usually less meaningful and less intuitive for identification, and 2) they have low correlation with the copyright holder, since the holder's information is not inherent and may change with time. Using them as a watermark may lead to imitation, tampering, and repudiation. A traditional watermarking method does not convincingly validate the claimed identity of the person, as the host might be fraudulently watermarked with a particular string pattern or logo by an impersonator [13]. Recently, biometrics has been merged into watermarking technology to enhance the credibility of conventional watermarking methods [8], [14]. The motivation of the present work is to develop a watermarking algorithm that embeds speech as biometric data. As a human biometric, speech is inherent, does not change with time, is universal, and is easily quantifiable [9]. Using speech as a watermark, we can overcome the limitations mentioned above.

III. PROPOSED WATERMARKING TECHNIQUE

The watermarking scheme consists of the following steps:

A) Watermark preparation

B) Watermark embedding

C) Watermark extraction

It has been shown that the discrete wavelet transform is an effective choice for the spread spectrum method, owing to the natural similarity between the space-frequency tiling of the DWT and the perceptual characteristics of the human visual system. Therefore, the spread spectrum technique in the wavelet domain is used for watermarking.

A. Watermark Preparation

The audio signal is first encoded using PCM at an 8 kHz sampling rate. Because the speech file consists of a large number of samples, linear predictive coding (LPC) is applied to reduce the size of the audio watermark. LPC is efficient in exploiting parametric redundancy. The basic idea behind LPC analysis is that speech samples can be approximated as a linear combination of past speech samples [10]. A linear predictor forecasts the amplitude of a signal at time n, x(n), using a linearly weighted combination of the p past samples x(n-1), x(n-2), ..., x(n-p):

$\hat{x}(n) = \sum_{k=1}^{p} a_k\, x(n-k)$    (1)

where the integer variable n is the discrete time index, $\hat{x}(n)$ is the prediction of x(n), and the $a_k$ are the predictor coefficients. The prediction error e(n), defined as the difference between the actual sample value x(n) and its predicted value $\hat{x}(n)$, is given by

$e(n) = x(n) - \hat{x}(n)$    (2)

From equation (2), a signal generated or modeled by a linear predictor can be described by the following feedback equation.

$x(n) = \sum_{k=1}^{p} a_k\, x(n-k) + e(n)$    (3)

In this model, the random input excitation, i.e., the prediction error, is e(n) = Gu(n), where u(n) is a zero-mean, unit-variance random signal and the gain term G is the square root of the variance of e(n). Taking the Z-transform of equation (3) shows that the linear prediction model is an all-pole digital filter with transfer function

$H(z) = \dfrac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}$    (4)

The system is thus excited by an impulse train for voiced speech or by a random noise sequence for unvoiced speech. The parameters of this model are the voiced/unvoiced classification, the pitch period for voiced speech (the time between consecutive excitation pulses), the gain G, and the coefficients {a_k} of the digital filter; all of these parameters vary slowly with time. The autocorrelation method is used to estimate the predictor coefficients: by minimizing the sum of squared differences between the actual speech samples and the linearly predicted ones over a finite interval, a unique set of predictor coefficients is determined. Speech can thus be modeled as the output of a linear, time-varying system excited by either quasi-periodic pulses (during voiced speech) or random noise (during unvoiced speech). The prediction coefficients characterize the formants of the speech signal, so the LPC system needs to estimate these coefficients. However, speech reconstructed from the LPC coefficients alone sounds metallic and robotic. To improve the quality of the reconstructed sound, voice-excited LPC is used [11]: the input speech in each frame is filtered with the estimated LPC analysis filter, and the filtered signal is called the residual. For a good reconstruction of the excitation signal at the decoder, only the low frequencies of the residual are needed, so the DCT of the residual is taken and only the first forty coefficients are transmitted. Sound synthesized using the LPC and residual coefficients is much more natural and maintains high fidelity. The analysis steps are:

1. The input speech is segmented into overlapping frames of 20 ms.
2. A pre-emphasis filter is used to adjust the spectrum of the signal.
3. The voicing detector classifies the current frame as voiced or unvoiced and outputs one bit indicating the voicing state.
4. The pre-emphasized signal is used for LP analysis, where twelve LPC coefficients are derived for each frame.
5. The quantized LPCs are used to build the prediction-error filter, which filters the pre-emphasized speech to obtain the prediction-error signal at its output.
6. Forty residual coefficients are generated and transferred to the decoder to construct a more natural sound [11].

In this way, residual coefficients in addition to the LPC coefficients are extracted and embedded as the watermark. To increase the security of the algorithm, the coefficients are randomized using a secret key, which makes it difficult for an impostor to identify the watermark locations and remove the watermark. A per-frame sketch of this analysis is given below.
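As an illustration only (not the authors' code), the following Python sketch carries out this per-frame analysis with NumPy/SciPy: autocorrelation LPC via the Levinson-Durbin recursion, a residual from the prediction-error filter, a truncated DCT of that residual, and voice-excited resynthesis. The frame length, the pre-emphasis constant, and all function names are assumptions of the sketch.

import numpy as np
from scipy.signal import lfilter
from scipy.fft import dct, idct

ORDER = 12        # twelve LPC coefficients per 20 ms frame, as in the paper
N_RESIDUAL = 40   # forty low-frequency DCT coefficients of the residual
FRAME_LEN = 160   # 20 ms at 8 kHz

def levinson_durbin(r, order):
    # Solve the autocorrelation normal equations for a_k in x_hat(n) = sum_k a_k x(n-k).
    a = np.zeros(order)
    err = r[0] + 1e-12
    for m in range(1, order + 1):
        k = (r[m] - np.dot(a[:m - 1], r[m - 1:0:-1])) / err
        if m > 1:
            a[:m - 1] -= k * a[m - 2::-1]
        a[m - 1] = k
        err *= (1.0 - k * k)
    return a

def analyze_frame(frame):
    # Pre-emphasis, LPC analysis, prediction-error (residual) filtering, truncated DCT.
    x = lfilter([1.0, -0.95], [1.0], frame)
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    a = levinson_durbin(r, ORDER)
    residual = lfilter(np.concatenate(([1.0], -a)), [1.0], x)
    res_dct = dct(residual, norm="ortho")[:N_RESIDUAL]
    return a, res_dct            # 12 + 40 = 52 watermark values for this frame

def synthesize_frame(a, res_dct):
    # Voice-excited synthesis: the all-pole filter of eq. (4) driven by the decoded residual.
    excitation = idct(np.pad(res_dct, (0, FRAME_LEN - len(res_dct))), norm="ortho")
    y = lfilter([1.0], np.concatenate(([1.0], -a)), excitation)
    return lfilter([1.0], [1.0, -0.95], y)   # undo the pre-emphasis

# Example on a synthetic voiced-like frame:
frame = np.sin(2 * np.pi * 100 * np.arange(FRAME_LEN) / 8000) + 0.05 * np.random.randn(FRAME_LEN)
lpc, res = analyze_frame(frame)
approx = synthesize_frame(lpc, res)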

B. Watermark Embedding

Fig. 1 shows a typical watermarking model for a still image. The watermark X is inserted into the image I with a secret key K. The watermarked image Iw passes through a distribution channel and may undergo various attacks; the watermark is extracted with the same secret key. The scheme operates in the wavelet transform domain, using a single-level decomposition with the db2 filter. The two-dimensional discrete wavelet transform decomposes an image into four subbands, namely LL, LH, HL, and HH, which correspond respectively to the coarse approximation and the horizontal, vertical, and diagonal details of the image.
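For concreteness, a minimal sketch of this single-level db2 decomposition is shown below. PyWavelets is an assumed dependency (the paper only specifies a single-level db2 decomposition), and the random array merely stands in for the host image.

import numpy as np
import pywt

image = np.random.rand(512, 512)             # stand-in for the host image
# PyWavelets returns (approximation, (horizontal, vertical, diagonal)) detail coefficients;
# they are labeled with the paper's LL/LH/HL/HH names here.
LL, (LH, HL, HH) = pywt.dwt2(image, 'db2')
# Each subband is roughly a quarter of the image; LH or HL is the mid-frequency band used here.
recovered = pywt.idwt2((LL, (LH, HL, HH)), 'db2')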

Figure 1. Overview of the watermarking system: the watermark embedder combines the cover signal I, the watermark signal X, and the key K to produce Iw, which passes through a noisy, possibly attacked communication channel before the watermark detector recovers X* using the same key K.

The challenges are which frequency band and which coefficients to select. Embedding the watermark in the low-frequency band is most resistant to JPEG compression, blurring, added noise, rescaling, and sharpening, while embedding in the high-frequency band is most resistant to histogram equalization, intensity adjustment, and gamma correction [7]. Although embedding in the low-frequency band survives most attacks, its high energy means that any modification to its coefficients is easily noticed. We therefore select a mid-frequency band, either LH or HL, for embedding. Within this band, coefficients with large perceptual capacity should be selected, because they allow stronger watermarks to be embedded with the least perceptual distortion [6]; hence the large coefficients of the band are selected for watermarking. The embedding process consists of the following steps:

1) Apply the wavelet transform to the image to decompose it into four subbands.

2) Select the LH subband of the decomposed image and generate the perceptual mask that identifies its significant perceptual components (the watermark indices). The method employs the largest N wavelet coefficients, where N is chosen equal to the length of the watermark signal.

3) Insert the watermark into the selected wavelet coefficients using the multiplicative embedding rule of equation (5):

$v_i' = v_i\,(1 + \alpha x_i)$    (5)

where $x_i$ is the i-th watermark value, $v_i$ the original wavelet coefficient, $v_i'$ the modified wavelet coefficient, and $\alpha$ the strength of the watermark. As the strength is increased, the robustness of the watermark and the quality of its recovery improve, but the image suffers visible degradation; the value of the watermark strength is therefore chosen as a trade-off between imperceptibility and robustness.

4) Generate the watermarked image Iw by applying the inverse wavelet transform. (A sketch of these embedding steps follows.)
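The following is a sketch of steps 1) to 4) under the same assumptions (NumPy and PyWavelets). The value of alpha, the key handling via an RNG seed, and the explicit return of the index mask are illustrative choices of the sketch, not details fixed by the paper.

import numpy as np
import pywt

def embed_watermark(image, watermark, alpha=0.05, key=1234):
    wm = np.asarray(watermark, dtype=float)
    perm = np.random.default_rng(key).permutation(len(wm))   # key-based randomization of the order
    wm = wm[perm]

    LL, (LH, HL, HH) = pywt.dwt2(image, 'db2')               # step 1: single-level decomposition
    flat = LH.ravel().copy()
    idx = np.argsort(np.abs(flat))[::-1][:len(wm)]           # step 2: N largest LH coefficients
    flat[idx] *= (1.0 + alpha * wm)                          # step 3: eq. (5)
    LH_marked = flat.reshape(LH.shape)

    Iw = pywt.idwt2((LL, (LH_marked, HL, HH)), 'db2')        # step 4: inverse transform
    return Iw, idx                                           # idx plays the role of the perceptual mask

host = np.random.rand(512, 512)                              # stand-in host image
wm_values = np.random.randn(2000)                            # stand-in LPC/residual watermark values
watermarked, mask = embed_watermark(host, wm_values)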

C. Watermark Extraction

To detect the watermark, the following extraction algorithm is applied.

1) Apply the wavelet transform to both the original image and the watermarked image to obtain their four subbands.

2) Apply the perceptual mask (the watermark indices) to extract the wavelet coefficients that contain the watermark; the mask used during insertion identifies the locations of the wavelet coefficients in the received image where the watermark information is to be found. Extract the watermark by inverting the insertion equation, i.e., using equation (6):

$X_i^* = \dfrac{1}{\alpha}\left(\dfrac{v_i'}{v_i} - 1\right)$    (6)

where $X_i^*$ is the i-th recovered watermark value.

3) The extracted watermark is in randomized form. Generate the ordered watermark using the same secret key.

4) The extracted watermark values are the LPC and residual coefficients of the speech; pass them to the LPC decoder to synthesize the speech.

5) The quantitative measure used to evaluate the fidelity of the extracted watermark against the original is the similarity factor (S.F.), defined by equation (7); a short code sketch follows equations (7) and (8).


$\mathrm{sim}(X, X^*) = \dfrac{X \cdot X^*}{\sqrt{X^* \cdot X^*}}$    (7)

$\mathrm{sim\_norm}(X, X^*) = \dfrac{\mathrm{sim}(X, X^*)}{\sqrt{X \cdot X}}$    (8)
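Continuing the same hedged sketch, and assuming the embed_watermark() function and the host/watermarked/mask/wm_values variables defined above, the extraction of steps 1) to 3) and the similarity measures of equations (7) and (8) might look as follows. The scheme is non-blind, so the original image, the key, and the mask are all available to the detector.

import numpy as np
import pywt

def extract_watermark(original, watermarked, mask, alpha=0.05, key=1234):
    _, (LH_o, _, _) = pywt.dwt2(original, 'db2')             # step 1: transform both images
    _, (LH_w, _, _) = pywt.dwt2(watermarked, 'db2')
    v = LH_o.ravel()[mask]                                   # step 2: coefficients under the mask
    v_marked = LH_w.ravel()[mask]
    x_star = (v_marked / v - 1.0) / alpha                    # eq. (6)

    perm = np.random.default_rng(key).permutation(len(x_star))
    ordered = np.empty_like(x_star)
    ordered[perm] = x_star                                   # step 3: undo the key randomization
    return ordered

def similarity(x, x_star):
    sim = np.dot(x, x_star) / np.sqrt(np.dot(x_star, x_star))   # eq. (7)
    return sim, sim / np.sqrt(np.dot(x, x))                      # eq. (8): normalized form

recovered = extract_watermark(host, watermarked, mask)
sf, sf_norm = similarity(wm_values, recovered)               # sf_norm should be close to 1 with no attack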

IV. RESULTS

Fig. 2a shows the original image, the watermark, the watermarked image, and the recovered watermark without any attack. The watermarked image is subjected to common attacks such as JPEG compression, cropping, histogram equalization, salt & pepper noise, unsharpening, Wiener filtering, and Gaussian noise, which are simulated and checked for extraction. Figs. 2b to 2h show that the watermark survives these attacks. Fig. 3 illustrates the variation of the similarity factor with the quality factor for different lengths of the speech watermark: as the capacity is increased, the S.F. decreases, and a decrease in quality factor results in clearly visible distortion of the image and a decrease in the similarity factor.

Strength of watermark (α): The watermark strength plays an important role in the trade-off between robustness and imperceptibility. Fig. 4 shows the variation of the similarity factor with watermark strength. Table I is formulated from empirical results: the perceptual quality of the recovered watermark is fair, and as good as the original, above a similarity factor of 0.6; the recovered watermark is noisy but still intelligible for similarity factors between 0.4 and 0.6; below 0.4 it is unintelligible. As the strength of the watermark is increased, its robustness and recovery improve, but the image suffers visible degradation.

TABLE I. Similarity factor ranges based on empirical results

S.F. range           Speech watermark recovery
0 ≤ S.F. ≤ 0.4       Poor
0.4 < S.F. ≤ 0.5     Presence of watermark (detection threshold T = 0.4)
0.5 < S.F. ≤ 0.6     Average
0.6 < S.F. ≤ 1       Good
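As a small illustration of how Table I would be applied in practice (the thresholds come straight from the table; the function name is ours):

def classify_recovery(sf, threshold=0.4):
    # Map a similarity factor to the empirical bands of Table I; T = 0.4 is the detection threshold.
    if sf <= threshold:
        return "poor - watermark not reliably detected"
    if sf <= 0.5:
        return "watermark present, noisy recovery"
    if sf <= 0.6:
        return "average recovery"
    return "good recovery"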

Length of watermark: Since the watermark is embedded into only one of the mid-frequency subbands, only the coefficients of that subband are available for watermark casting, so the length of the watermark is upper-bounded by the resolution of the image: the higher the resolution, the greater the embedding capacity. Another factor that determines the embedding capacity is the type of compression applied to the speech; about sixty percent compression of the speech is achieved with this method.
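A back-of-the-envelope check of the reported compression, assuming non-overlapping 20 ms frames (the paper actually uses overlapping frames, and side information such as voicing and gain is ignored here):

samples = 4 * 8000                    # 4 s of speech at 8 kHz = 32,000 samples
frames = (4 * 1000) // 20             # 200 frames of 20 ms
values_per_frame = 12 + 40            # 12 LPC + 40 residual DCT coefficients per frame
embedded = frames * values_per_frame  # 10,400 watermark values
print(1.0 - embedded / samples)       # about 0.67, in the same ballpark as the ~60% figure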

(a) Host: Lena.jpg; speech watermark of 4 s duration; S.F. = 0.93, SNR = 37.26 dB (no attack)

(b) JPEG compression with quality factor 50; S.F. = 0.45, SNR = 14.851 dB

(c) Cropped image; S.F. = 0.76, SNR = 17.32 dB


(d) Histogram-equalized image; S.F. = 0.85, SNR = 12.01 dB

(e) Salt & pepper noise; S.F. = 0.51, SNR = 14.851 dB

(f) Unsharpened image; S.F. = 0.831, SNR = 13.83 dB

(g) Wiener filtering; S.F. = 0.61, SNR = 25.3 dB

(h) Gaussian noise; S.F. = 0.40, SNR = 13.15 dB

Figure 2. Watermarked images and recovered watermarks under the different attacks

Figure 3. Variation of the similarity factor with quality factor


Figure 4. Variation of the similarity factor with watermark strength

V. CONCLUSION

The performance of the proposed scheme is illustrated by embedding speech data of different lengths within five different images to produce stego-images. When these stego-images are decoded, the speech data is completely recoverable and intelligible. In addition, the system's ability to cope with added noise and compression of the stego-image has been demonstrated; however, the proposed algorithm is fragile against scaling and rotation. Although a few researchers [15] have attempted to embed speech as a watermark in images, reconstruction of the speech is not addressed there. In this method, the speech is reconstructed from the recovered watermark, so the copyright holder can use the meaning and features of the speech to prove copyright. The experimental results show impressive quality of the watermarked image. Furthermore, when the embedded LPC coefficients are extracted and passed to the decoder after the image has been attacked by common signal processing operations, a listening test is performed to evaluate the intelligibility of the synthesized speech; its perceptual quality is verified to match the original. Future work should aim at increasing the embedding capacity; the security of the algorithm can be increased by combining cryptographic techniques with watermarking, and the work can be extended to video watermarking.

REFERENCES

[1] F. Hartung and M. Kutter, "Multimedia watermarking techniques," Proceedings of the IEEE, vol. 87, no. 7, pp. 1079-1107, July 1999.

[2] P. S. Huang, C. S. Chiang, C. P. Chang, and T. Mu, "Robust spatial watermarking technique for color images via direct saturation adjustment," IEE Proceedings - Vision, Image and Signal Processing, vol. 152, pp. 561-574, 2005.

[3] S. Kimpan, A. Lasakul, and S. Chitwong, "Variable block size based adaptive watermarking in spatial domain," IEEE International Symposium on Communications and Information Technology, pp. 374-377, 2004.

[4] Y. K. Lee and L. H. Chen, "High capacity image steganographic model," IEE Proceedings - Vision, Image and Signal Processing, vol. 147, pp. 288-294, 2000.

[5] I. Cox, J. Kilian, F. Leighton, and T. Shamoon, "Secure spread spectrum watermarking for multimedia," IEEE Transactions on Image Processing, vol. 6, pp. 1673-1687, Dec. 1997.

[6] W. Zhu, Z. Xiong, and Y.-Q. Zhang, "Multiresolution watermarking for images and video," IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 4, June 1999.

[7] C. Temi, S. Choomchuay, and A. Lasakul, "Robust image watermarking using multiresolution analysis of wavelet," in Proceedings of IEEE ISCIT 2005.

[8] A. K. Jain and U. Uludag, "Hiding biometric data," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 11, pp. 1494-1498, November 2003.

[9] A. K. Jain, A. Ross, and S. Prabhakar, "An introduction to biometric recognition," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 1, pp. 4-20, January 2004.

[10] L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice Hall.

[11] D. T. Magill and C. K. Un, "Residual-excited linear predictive coder," The Journal of the Acoustical Society of America, vol. 55, issue S1, p. S81, April 1974.

[12] H. Wang, X. Cui, and Z. Cao, "A speech based algorithm for watermarking of relational databases," in IEEE International Symposiums on Information Processing.

[13] C.-Y. Low, A. B.-J. Teoh, and C. Tee, "Fusion of LSB and DWT biometric watermarking for offline handwritten signature," 2008 Congress on Image and Signal Processing, IEEE Computer Society, pp. 702-708, 2008.

[14] C.-Y. Low, A. B.-J. Teoh, and C. Tee, "Support vector machines (SVM) based biometric watermarking for offline handwritten signature," in Proceedings of the 3rd IEEE Conference on Industrial Electronics and Applications (ICIEA), 2008.

[15] M. Vatsa, R. Singh, and A. Noore, "Feature based RDWT watermarking for multimodal biometric system," Image and Vision Computing, vol. 27, pp. 293-304, 2009.
