
Information Hiding in Image and Audio Files 2007-2010


CHAPTER 7

INFORMATION HIDING IN AUDIO FILES

This chapter deals with hiding information in audio files. Today's technology allows audio files to be copied and redistributed over the Internet at very low or almost no cost. It is therefore necessary to have methods that confine access to these audio files and ensure their security. For this reason, information is usually embedded in audio files for the purpose of copyright protection or authentication of digital media. In a computer-based audio steganography system, secret messages are embedded in digital sound [102]. Audio steganography exploits the weaknesses of the Human Auditory System (HAS) to hide information in the audio [101].

In the past few years, several algorithms for embedding and extracting messages in audio sequences have been proposed. All of these algorithms exploit the characteristics of the HAS in order to hide data in the host signal in a perceptually transparent manner [103]. However, embedding secret messages in digital sound is usually more difficult than embedding messages in other media, such as digital images [102]. In addition, the amount of data that can be embedded transparently into an audio sequence is considerably lower than the amount that can be embedded in images or video sequences, because an audio signal has fewer dimensions than two-dimensional image or video files [103]. On the other hand, many attacks that are malicious against image steganography algorithms (e.g. geometrical distortions, spatial scaling, etc.) are not applicable to audio steganography schemes [103]. Embedding information in audio also appears more secure because fewer steganalysis techniques exist for attacking audio [103]. Furthermore, the natural sensitivity of the ear and the difficulty of working with audio have resulted in far fewer algorithms and techniques than exist for images. Existing audio steganography schemes can embed messages in WAV, AU, AIFF and even MP3 sound file formats.

Information hiding in audio signals has a wide range of applications.

The most important and obvious application of Audio

Steganography is covert communication using innocuous cover

signals, like a telephone conversation [104]. Another application,

known as (digital) watermarking [98], refers to embedding an

unobtrusive mark into an object, which can be used to identify the

object or act as a copyright protection of digital media. For

example, a digital watermark [98] can be inserted into a piece of

music so that it can be monitored automatically for payment

purposes [4]. One of the applications provides a mechanism for

embedding important control, descriptive or reference information

in a given signal. This information can be used for tracking the use

of a particular clip, including billing for commercials and audio

broadcast. It can be used to track audio creation, manipulation and

modification history within a given signal without the overhead

associated with creating a separate header or history file. It can

also be used to track access to a given signal. This information is

important in rights management applications [104].

7.1. Characteristics

An effective audio steganographic scheme should possess the following three characteristics: inaudibility of distortion (perceptual transparency), data rate (capacity) and robustness. Figure 7.1 gives the simplest visualization of the requirements of information hiding in digital audio, the so-called magic triangle, whose corners are formed by these three requirements [106].



Figure 7.1 Magic Triangle for Data Hiding

Inaudibility of distortion: It evaluates the audible distortion due

to signal modifications like message embedding or attacking. The

data hiding scheme has to insert additional data without affecting

the perceptual quality of the host audio signal [103].

Robustness: It measures the ability of the embedded data to withstand intentional and unintentional attacks. Unintentional attacks generally include common data manipulations such as re-sampling and re-quantization. Intentional attacks include addition of noise, resizing, rescaling, etc. [103].

Data Rate (Capacity): It refers to the amount of information that

a data hiding scheme can successfully embed without introducing

perceptual distortion. In other words the bit rate of the message is

the number of the embedded bits within a unit of time and is

usually given in bits per second (bps) [103].

7.2. Overview of Human Auditory System

The human auditory system (HAS) operates over a wide dynamic range. When digital images are used as cover files, the difficulty the human eye has in distinguishing colors is exploited. Similarly, with digital audio one can count on the differing sensitivity of the human ear to sounds of low and high intensity: higher-intensity sounds are usually perceived better than lower-intensity ones, so it is easier to hide data among low-intensity sounds without the human ear noticing the alteration [107]. In addition, some environmental distortions are so common that the listener ignores them in most cases [108]. These are the weaknesses of the HAS that can be exploited for adding data to audio signals.

The effects of the HAS that are relevant to steganography are temporal masking and frequency masking. In temporal masking, a weaker audible signal on either side (pre and post) of a strong masker becomes imperceptible. Similarly, in frequency masking, if two signals occurring simultaneously are close together in frequency, the stronger masking signal may make the weaker signal inaudible [109].

7.2.1 Digital Audio Files

There are two critical parameters to most digital audio representations: sample quantization method and temporal sampling rate. The most popular format for representing samples of high-quality digital audio is 16-bit linear quantization, as used for example in the Waveform Audio File Format (WAV) and the Audio Interchange File Format (AIFF). Popular temporal sampling rates for audio include 8 kHz (kilohertz), 9.6 kHz, 10 kHz, 12 kHz, 16 kHz, 22.05 kHz and 44.1 kHz. Sampling rate affects data hiding in that it puts an upper bound on the usable portion of the frequency range. Generally, the higher the sampling rate, the larger the usable data space [108].

There are three problems which need to be considered while dealing with audio files [109]:

When audio files in the Microsoft .wav (dot wave) format are read, the sample values are mapped to the range (-1, 1). For processing these signals, each value must be converted to integer format.

The human auditory system (HAS) is more sensitive than the human visual system (HVS), so variations in the audio signal are easily perceived.

It is harder to manipulate audio signals than digital images.
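The conversion mentioned in the first point can be sketched as follows. This is a minimal illustration in Python rather than Matlab; the function names and the scale factor of 32768 for 16-bit audio are assumptions made for the example, not the exact procedure used in this work.

```python
def float_to_int16(samples):
    """Map floating-point samples in [-1, 1] to signed 16-bit integers.

    Assumption: full scale corresponds to 32768 quantization steps,
    and values are clipped to the valid 16-bit range.
    """
    out = []
    for x in samples:
        q = int(round(x * 32768))
        out.append(max(-32768, min(32767, q)))
    return out


def int16_to_float(samples):
    """Map signed 16-bit integers back to floating-point values."""
    return [q / 32768 for q in samples]


def to_unsigned16(q):
    """View a signed 16-bit value as its 16-bit pattern (0..65535)."""
    return q & 0xFFFF
```

Working on the unsigned 16-bit pattern (for example `to_unsigned16(-1) == 0xFFFF`) makes later bit-level operations on the LSBs and MSBs straightforward.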

Matlab supports two audio file formats: WAV and AU audio files.

WAV files: WAV (or WAVE) is also known as Audio for Windows. It is

a Microsoft and IBM audio file format standard for storing an audio

bit stream on PCs. It is the main format used on Windows systems

for raw and typically uncompressed audio. The usual bit stream

encoding is the Pulse Code Modulation (PCM) format. It supports

multi-channel data, with up to 32 bits per sample.

Au files: The Au file format is a simple audio file format introduced

by Sun Microsystems. Reading an .au file returns amplitude values

in the range [-1, +1]. It supports multichannel data in the following

formats:

8-bit mu-law

8-, 16-, and 32-bit linear

Floating-point

Here, WAV audio files are used for the following reasons [110]:

It is the main format used on Windows systems for raw and typically uncompressed audio.

It can digitize sounds 100% faithfully to the original source, thus maintaining maximum audio quality.

WAV files are very easy to edit and manipulate.


7.3. METHODS OF AUDIO STEGANOGRAPHY

Some commonly used methods of audio steganography are listed

and discussed below in brief.

1. Least Significant Bit (LSB) Coding

2. Parity Coding

3. Phase Encoding

4. Spread Spectrum

5. Echo Data Hiding

7.3.1 Least Significant Bit (LSB) Coding: LSB coding is one of the earliest techniques studied for information hiding in digital audio (as well as other media types). In this technique, the LSB of the binary representation of each sample of the digitized audio file is replaced with a bit of the binary equivalent of the secret message [111,112]. The capacity is only one bit per sample of the cover audio, which may be too low for many applications.

7.3.2 Parity Coding: Instead of breaking a signal down into individual samples, the parity coding method [114] breaks a signal down into separate regions of samples and encodes each bit from the secret message in a sample region's parity bit. If the parity bit of a selected region does not match the secret bit to be encoded, the process flips the LSB of one of the samples in the region. Advantage: The sender has more choice in encoding the secret bit, and the signal can be changed in a more unobtrusive manner. Disadvantage: Like LSB coding, this method is not robust. The capacity remains the same as that of the LSB method.
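The parity coding idea can be sketched as follows. This is only an illustration under stated assumptions: the region parity is taken to be the XOR of the LSBs of the samples in the region, regions have a fixed size, and the first sample of a region is the one whose LSB is flipped. The function names and these choices are hypothetical and are not taken from [114].

```python
def region_parity(region):
    """Parity (0 or 1) of a region, assumed here to be the XOR of all sample LSBs."""
    parity = 0
    for sample in region:
        parity ^= sample & 1
    return parity


def parity_embed(samples, bits, region_size):
    """Encode one message bit per region of `region_size` integer samples."""
    if len(bits) * region_size > len(samples):
        raise ValueError("cover signal is too short for the message")
    stego = list(samples)
    for i, bit in enumerate(bits):
        start = i * region_size
        if region_parity(stego[start:start + region_size]) != bit:
            # Flip the LSB of one sample in the region (here: the first one).
            stego[start] ^= 1
    return stego


def parity_extract(stego, n_bits, region_size):
    """Recover the message bits as the parities of consecutive regions."""
    return [
        region_parity(stego[i * region_size:(i + 1) * region_size])
        for i in range(n_bits)
    ]
```

Because any sample in the region may be chosen for the flip, a practical encoder could pick the sample where the change is least perceptible, which is the freedom referred to in the advantage above.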

7.3.3 Phase Coding: Phase coding [114] relies on the fact that the phase components of sound are not as perceptible to the human ear as noise is. It “works by substituting the phase of an initial audio segment with a reference phase that represents the data. The phase of subsequent segments is then adjusted in order to preserve the relative phase between segments”. Disadvantage: It is a complex method and has a low data transmission rate.

7.3.4 Spread Spectrum (SS): The SS method [95, 111] attempts to spread the encoded data across the available frequencies as much as possible. This is analogous to a system using an implementation of LSB coding that randomly spreads the message bits over the entire sound file. However, unlike LSB coding, the SS method spreads the secret message over the sound file's frequency spectrum, using a code that is independent of the actual signal. As a result, the final signal occupies a bandwidth in excess of what is actually required for transmission. Advantage: It offers a moderate data transmission rate while maintaining a high level of robustness. Disadvantage: It can introduce noise into a sound file.

7.3.5 Echo Data Hiding: Text can be embedded in audio data by introducing an echo into the original signal [111]. The data is then hidden by varying three parameters of the echo: initial amplitude, decay rate, and offset. If only one echo is produced from the original signal, then only one bit of information can be encoded.

7.4. TIME-DOMAIN METHODS

Data hiding in the least significant bits (LSBs) of audio samples in the time domain is one of the simplest algorithms and offers a very high data rate for additional information [115]. This section discusses the implemented methods that use either the least significant bit or multiple least significant bits for hiding data. The first method is simple LSB coding, in which 1, 2, 4 or 8 LSBs are used for hiding information in the cover audio. The next two methods combine LSB coding with an encryption method for hiding data; they provide additional security for the secret data hidden in the host audio signal. The last group consists of novel methods proposed to increase the capacity of the cover audio while maintaining the perceptual quality of the audio signal.

The common performance measures used for all these methods are MSE (Mean Squared Error), PSNR (Peak Signal-to-Noise Ratio) and SNR (Signal-to-Noise Ratio). In addition, subjective listening tests have been performed to assess the quality of the host audio signals.
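For orientation, the three objective measures can be computed as in the sketch below. The chapter does not state the exact formulas it used, so these are the common textbook definitions; the peak value of 1.0 (samples normalized to (-1, 1)) and the function names are assumptions, and the numbers produced need not match the tables that follow.

```python
import math


def mse(cover, stego):
    """Mean squared error between cover and stego samples."""
    return sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)


def psnr_db(cover, stego, peak=1.0):
    """Peak signal-to-noise ratio in dB; `peak` is an assumed full-scale value."""
    err = mse(cover, stego)
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)


def snr_db(cover, stego):
    """Signal-to-noise ratio in dB: cover energy over embedding-noise energy."""
    noise = sum((c - s) ** 2 for c, s in zip(cover, stego))
    if noise == 0:
        return math.inf
    signal = sum(c ** 2 for c in cover)
    return 10 * math.log10(signal / noise)
```

Lower MSE and higher PSNR and SNR indicate that the stego signal is closer to the cover signal.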

In all of these methods, the steps for data embedding and data extraction are explained considering an audio signal as the secret message. Similar steps are followed when an image or text is used as the secret message, except that during extraction, for images every 8 bits are converted to a byte, and for text every 7 bits are converted to the decimal value corresponding to a character.
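The 7-bit text handling described above can be sketched as follows. The helper names are hypothetical, and the most-significant-bit-first ordering within each character is an assumption made for the example.

```python
def text_to_bits(text, width=7):
    """Convert text to a list of bits, `width` bits per character (MSB first)."""
    bits = []
    for ch in text:
        code = ord(ch)
        if code >= (1 << width):
            raise ValueError("character does not fit in %d bits: %r" % (width, ch))
        bits.extend((code >> shift) & 1 for shift in range(width - 1, -1, -1))
    return bits


def bits_to_text(bits, width=7):
    """Group every `width` bits into a decimal value and map it to a character."""
    chars = []
    for i in range(0, len(bits) - width + 1, width):
        code = 0
        for b in bits[i:i + width]:
            code = (code << 1) | b
        chars.append(chr(code))
    return "".join(chars)
```

For images the same grouping applies with a width of 8 bits per pixel value, and for 16-bit audio with a width of 16 bits per sample.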

For experimental results, 10 cover audio clips are used with 8 secret messages. The cover clips are mono WAV audio files of different types (animal, speech, vocal, music) represented by 16 bits per sample. The 10 cover audio signals have varying sampling rates and durations: Guitar (44100 Hz, 2 seconds), Triangle (44100 Hz, 4 seconds), Bugle (11025 Hz, 9 seconds), Speech1 (22050 Hz, 3 seconds), Speech2 (22050 Hz, 3 seconds), Alice (44100 Hz, 3 seconds), Trance (44100 Hz, 4 seconds), Echo (44100 Hz, 4 seconds), Birds (11025 Hz, 7 seconds) and Faces (44100 Hz, 2 seconds).



Of these 8 secret messages, 3 are audio clips, 2 are text files and the remaining 3 are gray-scale images. The 3 secret audio clips are of very short duration, viz., Pingpong (22050 Hz), Chimes (11025 Hz) and Newmail (22050 Hz). The text messages are excerpts from 2 famous personalities, viz., Abraham Lincoln and our former President Dr. A. P. J. Abdul Kalam. These two text files are 2027 and 6826 characters long respectively. The 3 images used are the standard Lena and Baboon images along with a Logo image, each of dimensions 128*128.

Subjective quality evaluation of these methods has also been

carried out by performing listening tests involving ten people.

7.4.1 Least Significant Bit Coding:

Steps for Data embedding:

1. Read the cover audio file and generate a copy of it, which can then be modified.

2. Read the secret message to be hidden; its size must be less than the size of the cover audio signal (at most 1/16 of the cover size if only 1 LSB is to be used). Convert it into a binary sequence of message bits.

3. Replace the LSB of each sample of the cover audio with a message bit.

4. Write the modified cover samples to a file, forming the stego audio signal.

Steps for Data Extraction/ Reconstruction:

1. Read the stego audio file.



2. Extract the LSB of each sample of the audio file.

3. After every 16 least significant bits are retrieved, convert them to their decimal equivalent.

4. Finally, reconstruct the secret signal.

Similar steps for embedding and extraction are applied when using 2, 4 or 8 LSBs for hiding data.
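The embedding and extraction steps for 1, 2, 4 or 8 LSBs can be sketched as below. The sketch assumes that samples are already available as unsigned 16-bit integers (0 to 65535) and that the message is a list of bits; reading and writing the WAV file and regrouping the extracted bits into 16-bit values are left out. The function names and the parameter `k` (number of LSBs used per sample) are hypothetical.

```python
def lsb_embed(samples, bits, k=1):
    """Replace the k least significant bits of successive samples with message bits."""
    bits = list(bits)
    if len(bits) > len(samples) * k:
        raise ValueError("message is too long for this cover and k")
    mask = (1 << k) - 1
    stego = list(samples)
    # Pad with zeros so the message fills a whole number of samples.
    padded = bits + [0] * (-len(bits) % k)
    for i in range(0, len(padded), k):
        chunk = 0
        for b in padded[i:i + k]:
            chunk = (chunk << 1) | b
        idx = i // k
        stego[idx] = (stego[idx] & ~mask) | chunk
    return stego


def lsb_extract(stego, n_bits, k=1):
    """Read back n_bits message bits from the k LSBs of successive samples."""
    bits = []
    for s in stego:
        if len(bits) >= n_bits:
            break
        bits.extend((s >> shift) & 1 for shift in range(k - 1, -1, -1))
    return bits[:n_bits]
```

With `k` LSBs per sample, each sample value changes by at most `2**k - 1`, which is why the distortion grows quickly as more LSBs are used.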

7.4.2 LSB Coding with Encryption

Here, instead of directly substituting the LSB, the secret bit is first encrypted and then inserted in the cover file. Later, during retrieval, the decryption algorithm has to be applied to get back the secret message.

A. Considering Parity:

This method [48] combines LSB coding with an encryption technique for hiding data in cover audio. Instead of directly replacing the LSBs of the digitized samples of cover audio with the message bits, the method first checks the parity of the samples and then carries out data embedding. The process of data embedding and data retrieval is the same as the method given in Section 3.2 of Chapter 3. The only difference is that here the parity is computed over 16 bits instead of 8 bits of the cover signal, because the sample size is 16 bits.

B. Using XOR operation on LSBs

Like the earlier method, this method [48] combines LSB coding with an encryption technique for hiding data in cover audio. It performs an XOR operation on the LSBs and then, depending on the result of the XOR operation and the message bit to be embedded, the LSB of the sample is modified or kept unchanged. The process of data embedding and data retrieval is the same as the method given in Section 3.1 of Chapter 3. The only difference is that here 16-bit samples are considered instead of 8-bit samples of the cover signal. Here the XOR operation is performed on 2 LSBs. It can be further extended to 3 LSBs, 4 LSBs and up to 16 LSBs so as to increase the level of encryption. Table 7.1 gives the tabular representation of the data embedding process.

Table 7.1 Procedure for data embedding

LSB   Bit next to LSB   XOR result   Action if message bit is 0   Action if message bit is 1

0     0                 0            No Change                    Flip LSB
0     1                 1            Flip LSB                     No Change
1     0                 1            Flip LSB                     No Change
1     1                 0            No Change                    Flip LSB
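The rule in Table 7.1 amounts to making the XOR of the two lowest bits equal to the message bit. A minimal sketch for a single integer sample follows; the function names are hypothetical.

```python
def xor_embed_sample(sample, bit):
    """Embed one message bit so that (LSB XOR next bit) equals the message bit."""
    lsb = sample & 1
    next_bit = (sample >> 1) & 1
    if (lsb ^ next_bit) != bit:
        sample ^= 1  # flip the LSB; the bit next to the LSB is never modified
    return sample


def xor_extract_sample(sample):
    """Recover the message bit as the XOR of the LSB and the bit next to it."""
    return (sample & 1) ^ ((sample >> 1) & 1)
```

Reading the LSB directly from a stego sample therefore does not give the message bit, which is the extra layer of protection this method adds over plain LSB coding.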

Results and Discussions:

Experiments have been carried out for the Least Significant Bit method by using 1, 2, 4 and 8 LSBs respectively of the cover samples for hiding data (audio, text, image). Table 7.2 gives the results of the LSB method, Table 7.3 those of the 2 LSBs method, Table 7.4 those of the 4 LSBs method and Table 7.5 those of the 8 LSBs method. The entries for MSE, PSNR and SNR for each cover signal in these tables are average values taken over all the secret messages embedded in that cover audio.

Table 7.2 Results of LSB method

Cover MSE PSNR (dB) SNR (dB) BER

Guitar 3.58E-10 191.50 71.23 0.02404

Triangle 2.64E-10 193.32 69.47 0.01775

Bugle 3.49E-10 191.68 70.93 0.02342

Speech1 3.78E-10 191.09 87.69 0.02536

Speech2 3.78E-10 191.10 86.32 0.02535

Alice 2.84E-10 192.93 80.65 0.01906

Trance 2.68E-10 182.13 67.65 0.01804

Echo 2.69E-10 193.22 81.17 0.01805

Birds 3.67E-10 191.34 70.29 0.02463

Faces 3.35E-10 192.02 81.34 0.02245

Average 3.25E-10 191.03 76.67 0.02182

Remark: The maximum SNR value obtained is 87.69 dB, for a speech signal, when only the LSB of each cover sample is replaced by message bits.



Table 7.3 Results of 2 LSBs method

Cover MSE PSNR SNR BER

Guitar 1.37E-09 186.22 65.95 0.03567

Triangle 9.37E-10 188.52 64.66 0.02363

Bugle 1.30E-09 186.55 65.81 0.03374

Speech1 1.57E-09 185.46 82.05 0.04136

Speech2 1.60E-09 185.36 80.58 0.04196

Alice 1.00E-09 188.22 75.94 0.02564

Trance 9.48E-10 188.55 62.96 0.02415

Echo 1.01E-09 187.89 75.85 0.02411

Birds 1.45E-09 185.92 64.87 0.03774

Faces 1.18E-09 187.12 76.45 0.03057

Average 1.24E-09 186.98 71.51 0.03186

Remark: The maximum SNR value obtained is 82.05, for a speech signal, when 2 LSBs of each cover sample are replaced by message bits.



Table 7.4 Results of 4 LSBs method

Cover MSE PSNR SNR BER

Guitar 1.79E-08 176.03 55.76 0.04756

Triangle 1.08E-08 178.33 54.48 0.02686

Bugle 1.69E-08 176.37 55.62 0.04462

Speech1 2.05E-08 175.20 71.79 0.05538

Speech2 2.06E-08 175.11 70.33 0.05576

Alice 1.17E-08 178.39 66.11 0.03013

Trance 1.07E-08 178.77 53.18 0.02763

Echo 1.29E-08 177.10 65.05 0.02753

Birds 1.90E-08 175.70 54.66 0.05065

Faces 1.51E-08 177.08 66.40 0.03963

Average 1.56E-08 176.81 61.34 0.04057


Remark: The maximum SNR value obtained is 71.79, for a speech signal, when 4 LSBs of each cover sample are replaced by message bits.



Table 7.5 Results of 8 LSBs method

Cover MSE PSNR SNR BER

Guitar 2.79E-06 154.64 34.36 0.05408

Triangle 1.52E-06 156.58 32.74 0.02682

Bugle 2.56E-06 155.04 34.28 0.04941

Speech1 3.52E-06 153.74 50.24 0.06873

Speech2 3.58E-06 153.56 48.78 0.06955

Alice 1.56E-06 157.23 44.94 0.03011

Trance 1.44E-06 157.48 31.89 0.02763

Echo 1.89E-06 155.45 43.41 0.02743

Birds 3.04E-06 154.25 33.22 0.05909

Faces 2.14E-06 155.80 45.13 0.04148

Average 2.40E-06 155.37 39.90 0.04543


Remark: The maximum SNR value obtained is 50.24, for a speech signal, when 8 LSBs of each cover sample are replaced by message bits.



From listening tests, it has been observed that as long as no more than 4 LSBs are used for data embedding, there is no audible distortion in the host audio signal, and the perceptual quality of the host audio is good. However, when more than 4 LSBs are used for hiding data, a hissing sound is introduced in the host audio signal. Thus, the perceptual quality of the host audio signal deteriorates as the number of LSBs used for hiding data increases.

Table 7.6 gives the results of the method considering parity. In this table, the values against the entries named audio, text and image in the Secret column represent the average values over the three secret audio clips, 2 text files and 3 images respectively for each cover signal.

Figure 7.2 is the plot of the audio signal (Pingpong) which is the

secret message hidden in the cover audio signals. Figure 7.3 is the

plot of the secret signal retrieved according to the extraction

process given in 7.4.2 A. It can be seen from both these figures

that there is no difference between the original and the retrieved

message, thereby assuring that the recovery is 100%.

Figure 7.4 is the plot of the message retrieved when the LSBs of the stego signal are extracted directly. This indicates that direct extraction of the LSBs results only in noise if embedding is done using the parity method, thereby increasing security. Figure 7.5 shows the image of Lena which was used as a secret message. Figure 7.6 shows the retrieved image of Lena using the extraction process mentioned in 7.4.2 B.

Figure 7.7 shows the result when the LSBs of the stego audio signal are extracted directly without applying the extraction process, which clearly results in noise.



Table 7.6 Results of Proposed method (considering parity)

Cover Secret MSE PSNR SNR BER

Guitar

audio 4.49E-10 189.81 69.53 0.03015

Text 1.60E-10 195.02 74.74 0.01074

image 4.22E-10 189.64 69.37 0.03126

Triangle

audio 3.80E-10 190.81 66.95 0.02556

Text 7.91E-11 198.1 74.25 0.00529

image 3.79E-10 191.07 67.22 0.02248

Bugle

audio 4.38E-10 189.93 69.17 0.02941

Text 1.46E-10 195.46 74.7 0.00977

image 4.66E-10 189.63 68.88 0.03131

Speech1

audio 4.66E-10 189.64 86.23 0.03129

Text 2.01E-10 194.04 90.65 0.01349

image 4.65E-10 189.65 86.25 0.03117

Speech2

audio 4.67E-10 189.63 84.85 0.03133

Text 2.04E-10 193.98 89.2 0.01367

image 4.33E-10 189.66 84.87 0.03115

Alice

audio 3.89E-10 190.64 78.36 0.02612

Text 8.84E-11 197.62 85.34 0.00593

image 3.66E-10 190.58 78.30 0.02518

Trance

audio 3.82E-10 190.77 65.18 0.02561

Text 8.15E-11 197.98 72.39 0.00546

image 3.44E-10 190.94 65.35 0.02316

Echo

audio 3.86E-10 190.73 78.69 0.02587

Text 8.21E-11 197.94 85.9 0.00551

image 3.86E-10 190.96 78.91 0.02307

Birds

audio 4.61E-10 189.69 68.65 0.03092

Text 1.75E-10 194.63 73.59 0.01176

image 4.66E-10 189.64 68.6 0.03128

Faces

audio 4.17E-10 190.19 79.52 0.02795

Text 1.22E-10 196.22 85.55 0.00818

image 3.10E-10 189.64 78.97 0.03125

Average 3.20E-10 192.14 76.67 0.02185

Remark: With this method, the maximum SNR value obtained is 90.65, for a speech signal. The average of the SNR values comes out to be 76.67, which is equal to the SNR value obtained from the standard LSB method as shown in Table 7.2.

Figure 7.2 Plot of secret signal used in parity method

Figure 7.3 Plot of secret signal retrieved using the parity method

Figure 7.4 Plot of signal retrieved by extracting LSBs directly which looks like noise

Figure 7.5 Original Secret Image

Figure 7.6 Retrieved Secret Image using Parity method

Figure 7.7 Image retrieved by extracting LSBs directly which looks like noise



Table 7.7 gives the results of the XOR method.

Table 7.7 Results of XOR method

Cover Secret MSE PSNR SNR BER


Guitar

audio 4.50E-10 189.8 69.53 0.03021

Text 1.60E-10 195.04 74.77 0.01073

image 4.66E-10 189.64 69.37 0.03127

Triangle

audio 3.78E-10 190.83 66.98 0.02539

Text 8.03E-11 198.04 74.2 0.00536

image 3.36E-10 191.06 67.21 0.02257

Bugle

audio 4.36E-10 189.95 69.19 0.02928

Text 1.47E-10 195.41 74.66 0.00984

image 4.65E-10 189.65 68.90 0.03119

Speech1

audio 4.67E-10 189.63 86.23 0.03132

Text 2.01E-10 194.06 90.65 0.01346

image 4.66E-10 189.64 86.23 0.03127

Speech2

audio 4.66E-10 189.64 84.85 0.03128

Text 2.05E-10 193.94 89.16 0.01376

image 4.67E-10 189.63 84.85 0.03135

Alice

audio 3.88E-10 190.65 81.33 0.02603

Text 8.87E-11 197.59 85.31 0.00595

image 3.74E-10 190.59 78.31 0.02513

Trance

audio 3.81E-10 190.78 65.19 0.02556

Text 8.12E-11 198 72.41 0.00545

image 3.46E-10 190.93 65.35 0.02321

Echo

audio 3.74E-10 190.84 78.8 0.02509

Text 8.28E-11 197.91 85.87 0.00555

image 3.49E-10 190.90 78.86 0.02339

Birds

audio 4.61E-10 189.68 68.64 0.03096

Text 1.75E-10 194.65 73.61 0.01174

image 4.66E-10 189.64 68.6 0.03127

Faces

audio 4.17E-10 190.18 79.51 0.02799

Text 1.22E-10 196.22 85.55 0.00817

image 4.65E-10 189.64 78.97 0.03124

Average 3.25E-10 192.13 76.76 0.02184

Remark: From this method, the maximum SNR value obtained is 90.65 for a

speech signal. The values for MSE, PSNR and SNR obtained from this method are

almost equal to the values obtained from parity method as shown in Table 7.6.

7.4.3 Methods to Increase the Capacity of Cover Audio

The use of only one LSB of each host audio sample gives a capacity equal to the sampling rate, which can vary from 8 kbps to 44.1 kbps (if all samples are used). However, modifying the LSBs of audio samples introduces noise that becomes audible as the number of LSBs used for hiding data increases [114]. Thus, there is a limit on the depth of the LSB layer in each sample of host audio that can be used for data hiding. The maximum number of bits that can be used for LSB audio steganography without causing noticeable perceptual distortion to the host audio signal is 4 LSBs, if 16 bits per sample audio sequences are used [113].



Thus, the methods proposed in this section attempt to increase the capacity of the cover audio while maintaining the perceptual quality of the host audio. The first method is based on the magnitude of the samples of cover audio: depending on the magnitude values, multiple and variable numbers of LSBs are used for data hiding. Experimental results show that this method does not give good results either in increasing the capacity of the cover audio or in maintaining the perceptual quality of the host audio. This motivated the search for approaches that are better not only than this method but also than other existing approaches. Three novel methods have therefore been proposed.

The next method is an extension of the XOR method discussed in 7.4.2 (B). In it, the XOR operation is performed on different combinations of bits in the samples of cover audio, and 8 LSBs of each cover sample are then used for hiding data. The last 2 methods are based on checking the Most Significant Bits (MSBs) of the samples of cover audio; depending upon the values of the MSBs of a sample, the number of LSBs used for data hiding is decided. In this way, multiple and variable numbers of LSBs are used for embedding secret data. These proposed methods remarkably increase the capacity for data hiding as compared to the standard LSB method without causing any noticeable perceptual distortion to the host audio signal. In all these methods, the increase in capacity of the cover audio is measured against the original capacity, which is considered to be 4 LSBs for each sample of the cover audio.

A. Considering Magnitude of Samples of Cover audio

In this, multiple and variable number of LSBs are used for hiding

data based on the magnitude of the sample values.



It is observed that the magnitude of the samples of cover audio is such that a maximum of 4 LSBs can be used for hiding. Thus, in order to implement the proposed approach, the cover audio samples are multiplied by a constant factor of 2 so as to increase their magnitude. The number of LSBs used for hiding is then decided according to the magnitude of each sample:

If the first 6 MSBs are ones, then use 6 LSBs for data embedding.

If the first 5 MSBs are ones, then use 5 LSBs for data embedding.

If the first 4 or fewer MSBs are ones, then use 4 LSBs for data embedding.

Capacity by the proposed method: CP = P1*6 + P2*5 + P3*4 (1)

Here P1, P2 and P3 are the probabilities of the samples with 6 MSBs as ones, 5 MSBs as ones and 4 or fewer MSBs as ones respectively.

Percentage Increase in Capacity = (CP/C)*100 (2)

Here C = 4 bits per sample

B. Considering the XOR operation:

The primary merit of the XOR operation is that it is simple to implement and computationally inexpensive. Hence, the LSB coding using the XOR method discussed in 7.4.2 (B) can be extended further by utilizing more than just the LSB for data embedding. The method can be modified so as to utilize multiple LSBs for data embedding.



One such approach proposed here uses 8 LSBs, applying the XOR operation to different combinations of bits. XOR is performed on the 16th and 8th bits, the 15th and 7th bits, the 14th and 6th bits, the 13th and 5th bits, the 12th and 4th bits, the 11th and 3rd bits, the 10th and 2nd bits, and the 9th and 1st bits. Depending upon the result of each XOR operation and the message bit to be embedded, the 8 LSBs of the digitized sample of cover audio are used for data embedding. This increases the capacity of the cover for data embedding by 8 times as compared to the earlier method considering the XOR operation, which uses only the LSB for hiding the data. To illustrate this approach, an example is presented below.

Consider the bits in the binary representation of a sample of cover audio and the message bits to be embedded as given below. Table 7.8 gives the tabular representation of the procedure for data embedding using the above approach.

Original Sample bits: 1000000000000001

Message bits: 10100010

Table 7.8 Data embedding procedure for multiple LSBs

Bit 1    Bit 2   XOR result   Message Bit   Action

1 (16)   0 (8)   1            1             No change
0 (15)   0 (7)   0            0             No change
0 (14)   0 (6)   0            1             Flip Bit 2
0 (13)   0 (5)   0            0             No change
0 (12)   0 (4)   0            0             No change
0 (11)   0 (3)   0            0             No change
0 (10)   0 (2)   0            1             Flip Bit 2
0 (9)    1 (1)   1            0             Flip Bit 2

Modified Sample bits: 1000000000100010

In the above table, the numbers in brackets in the first 2 columns indicate the positions of the bits in the digitized sample of cover audio. As can be seen from the table, the action takes place on Bit 2 in the second column; it is either flipped or left unchanged as shown in the last column, as these bits form the 8 LSBs.

The retrieval of bits is done by performing XOR operation on bits as

done in the embedding process, and the result of the XOR operation

will give the message bits back.
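The embedding and retrieval for one 16-bit sample can be sketched as follows, using the worked example as a check. Bit positions follow the text (bit 1 is the LSB, bit 16 the MSB); the function names and the treatment of the sample as an unsigned 16-bit integer are assumptions of this sketch.

```python
def xor8_embed(sample, bits):
    """Embed 8 message bits by pairing bit 16 with 8, 15 with 7, ..., 9 with 1."""
    if len(bits) != 8:
        raise ValueError("exactly 8 message bits are required per sample")
    for i, bit in enumerate(bits):
        high = 15 - i  # zero-based index of bits 16, 15, ..., 9
        low = 7 - i    # zero-based index of bits 8, 7, ..., 1
        xor = ((sample >> high) & 1) ^ ((sample >> low) & 1)
        if xor != bit:
            sample ^= 1 << low  # flip the bit in the lower byte only
    return sample


def xor8_extract(sample):
    """Recover the 8 message bits as the XOR of each bit pair."""
    return [((sample >> (15 - i)) & 1) ^ ((sample >> (7 - i)) & 1) for i in range(8)]
```

For the example above, `xor8_embed(0b1000000000000001, [1, 0, 1, 0, 0, 0, 1, 0])` gives `0b1000000000100010`, matching the modified sample bits. Since only the lower 8 bits are ever modified, the upper 8 bits remain available unchanged at the receiver.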

The MSE value for the given example, where three bits have been changed, comes out to be 9.39e-07. If we assume that all 8 bits are changed during the data embedding process, then the MSE value is 6.08e-05. However, the probability of all 8 bits being changed during embedding is very low. Hence, this proposed method proves to be better in increasing the capacity of cover audio to embed additional information.

C. Considering 2 MSBs:

This method checks the values of the first 2 Most Significant Bits (MSBs) of the digitized samples of the cover audio for data embedding. Table 7.9 gives the tabular representation of the data embedding procedure. The steps for data embedding and data retrieval are as follows:

Steps for Data Embedding:


1. Read the cover audio signal.

2. Read the audio signal to be embedded. Convert it into a

sequence of binary bits.

3. Every message bit from step 2 is embedded into the variable

and multiple LSBs of the samples of the digitized cover audio.

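4. For embedding, the first 2 MSBs of each cover sample are checked:

if they are '00', then use 4 LSBs for data embedding.

if they are '01', then use 5 LSBs for data embedding.

if they are '10', then use 6 LSBs for data embedding.

if they are '11', then use 7 LSBs for data embedding.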




5. The modified cover audio samples are then written to the file

forming the stego audio signal.

Table 7.9 Data embedding procedure for proposed method using 2 MSBs

MSB1    MSB2    No. of LSBs used for data embedding
0       0       4
0       1       5
1       0       6
1       1       7

Steps for Data Retrieval:

1. Read the stego audio signal.

2. Retrieval of message bits is done by checking the first 2 MSBs
of the samples:

if they are ‘00’, then retrieve 4 LSBs.

if they are ‘01’, then retrieve 5 LSBs.

if they are ‘10’, then retrieve 6 LSBs.

if they are ‘11’, then retrieve 7 LSBs.

3. Each group of 16 retrieved message bits is converted into its
decimal equivalent, and finally the secret audio signal is
reconstructed from these values.
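
The embedding and retrieval steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the code used for the experiments: samples are taken to be 16-bit unsigned integers, the message is a list of 0/1 values, the receiver is assumed to know the message length, and the names `lsb_count`, `embed` and `extract` are hypothetical.

```python
# Number of LSBs used for each value of the first 2 MSBs (Table 7.9).
LSBS_FOR_MSBS = {0b00: 4, 0b01: 5, 0b10: 6, 0b11: 7}


def lsb_count(sample):
    """Return how many LSBs of a 16-bit sample carry message bits."""
    return LSBS_FOR_MSBS[(sample >> 14) & 0b11]


def embed(cover, bits):
    """Embed a list of 0/1 message bits into a list of 16-bit cover samples."""
    stego, pos = [], 0
    for sample in cover:
        if pos < len(bits):
            n = lsb_count(sample)
            chunk = bits[pos:pos + n]
            chunk = chunk + [0] * (n - len(chunk))  # zero-pad the final chunk
            value = int("".join(str(b) for b in chunk), 2)
            sample = (sample & ~((1 << n) - 1)) | value  # overwrite n LSBs
            pos += n
        stego.append(sample)
    if pos < len(bits):
        raise ValueError("cover audio is too short for the message")
    return stego


def extract(stego, n_bits):
    """Read back n_bits message bits; the MSBs tell how many LSBs to take."""
    bits = []
    for sample in stego:
        if len(bits) >= n_bits:
            break
        n = lsb_count(sample)
        value = sample & ((1 << n) - 1)
        bits.extend((value >> k) & 1 for k in range(n - 1, -1, -1))
    return bits[:n_bits]


cover = [0x0001, 0x4000, 0x8000, 0xC000]  # first 2 MSBs: 00, 01, 10, 11
message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
stego = embed(cover, message)
print(extract(stego, len(message)) == message)  # True
```

Because at most 7 LSBs are overwritten, the first 2 MSBs of a sample never change, so the extractor selects the same number of LSBs as the embedder did.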

The capacity obtained by the proposed method is computed using
Eq. (1).

CP = P1*7 + P2*6 + P3*5 + P4*4 (1)

Here CP = increase in capacity (in bits per sample); P1, P2, P3 and
P4 are the probabilities of the samples with the first 2 MSBs as
‘11’, ‘10’, ‘01’ and ‘00’ respectively.

The percentage increase in capacity is given by Eq. (2).

% increase in capacity = (CP / C) * 100 (2)

Here C = 4 bits per sample.

Assuming that all the four probabilities P1, P2, P3 and P4 are 0.25
each, that is, they are equi-probable, the capacity will be as
given in Eq. (3).

ECP = 0.25*7 + 0.25*6 + 0.25*5 + 0.25*4 = 5.5 (3)

Here ECP = estimated increase in capacity. Eq. (3) gives the
estimate of the increase in capacity of the cover audio for the
given method.

Table 7.10 gives the percentage distribution of samples for the
cover audio signals used for this method.

Table 7.10 Distribution of samples using proposed method (2 MSBs)

                % of samples with first 2 MSBs as
Cover signal    ‘11’     ‘10’     ‘01’     ‘00’
Guitar          0.44     45.47    53.80    0.29
Triangle        0.11     57.39    42.42    0.08
Bugle           0        86.04    13.96    0
Speech1         10.46    34.34    44.50    10.71
Speech2         8.59     28.10    53.74    9.57
Alice           0.12     49.92    49.86    0.10
Trance          0        50.21    49.79    0
Echo            0        10.40    89.60    0
Birds           0        49.90    50.10    0
Faces           0.51     48.39    50.65    0.45
Average         2.02     46.02    49.84    2.12

The last row in Table 7.10 gives the average percentage of samples
for each of the 4 combinations of the first 2 MSBs. It can be seen
from Table 7.10 that most of the samples have their first 2 MSBs as
‘10’ or ‘01’. The average percentage of samples with the first 2
MSBs as ‘10’ is 46.02, and with the first 2 MSBs as ‘01’ it is
49.84. Together, these two combinations account for almost 96% of
the total number of samples. The other two combinations, ‘11’ and
‘00’, account for a very low percentage of the total number of
samples and can therefore be neglected. Hence, only the MSB of the
samples needs to be considered. Considering only MSB being 1 covers
the combinations ‘11’ and ‘10’; considering only MSB being 0 covers
the combinations ‘00’ and ‘01’. This indicates that looking only at
the Most Significant Bit (MSB) of the digitized cover samples
suffices to extend and further simplify the logic of the method.
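
As a quick check of Eq. (1), the capacity can be evaluated for the equi-probable case and for the average distribution in the last row of Table 7.10. The snippet below is only an illustrative calculation; the function name `capacity` and the dictionary layout are assumptions made for the example.

```python
# Number of LSBs used for each pattern of the first 2 MSBs (Table 7.9).
LSBS = {"11": 7, "10": 6, "01": 5, "00": 4}


def capacity(percentages):
    """Eq. (1): expected bits per sample for a distribution given in percent."""
    return sum(percentages[p] / 100 * LSBS[p] for p in LSBS)


equiprobable = {"11": 25, "10": 25, "01": 25, "00": 25}
# Last row (Average) of Table 7.10.
average = {"11": 2.02, "10": 46.02, "01": 49.84, "00": 2.12}

print(capacity(equiprobable))        # 5.5
print(round(capacity(average), 2))   # 5.48
```

For the averaged distribution the value stays close to the equi-probable estimate, because the two dominant patterns, ‘10’ and ‘01’, use 6 and 5 LSBs respectively.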

D. Considering 1 MSB:

This method is an extension of the above method. It considers the
value of only the Most Significant Bit (MSB) of the digitized
samples of the cover audio for data hiding. Table 7.11 gives the
tabular representation of the embedding procedure. The steps for
data embedding and extraction are explained in detail as follows:


1. Read the cover audio signal.

2. Read the secret message to be embedded. Convert it into a
sequence of binary bits.

3. The message bits from step 2 are embedded into a variable
number of LSBs of the samples of the digitized cover audio.

4. For embedding purpose, the MSB of each cover sample is checked:

If MSB is ‘0’, then use 6 LSBs for data embedding.

If MSB is ‘1’, then use 7 LSBs for data embedding.

5. The modified cover audio samples are then written to the file,
forming the stego audio signal.

Table 7.11 Embedding procedure for proposed method using MSB

MSB    No. of LSBs used for data embedding
0      6
1      7

Steps for Data Retrieval:

1. Read the stego audio signal.

2. Retrieval of message bits is done by checking the MSB of the
samples:

If MSB of the sample is ‘0’, then retrieve 6 LSBs.

If MSB of the sample is ‘1’, then retrieve 7 LSBs.

3. Each group of 16 retrieved message bits is converted into its
decimal equivalent, and finally the secret audio signal is
reconstructed from these values.

The capacity obtained by the proposed method is computed using
Eq. (4).

CP = P1*7 + P2*6 (4)

Here P1 and P2 are the probabilities of the samples with MSB value
as ‘1’ and ‘0’ respectively.

The percentage increase in capacity is given by Eq. (5).

% increase in capacity = (CP / C) * 100 (5)

Here C = 4 bits per sample.

Assuming that the probabilities P1 and P2 are 0.5 each, that is,
they are equi-probable, the capacity will be as given in Eq. (6).

ECP = 0.5*7 + 0.5*6 = 6.5 (6)

Here ECP = estimated increase in capacity. Eq. (6) gives the
estimate of the increase in capacity of the cover audio for the
given method.
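
The estimate for the 1-MSB method can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the percentage is taken as CP / C * 100, an assumed form that agrees with the tabulated values (for example, a capacity of 7 bits per sample is listed as 175%).

```python
C = 4  # original capacity in bits per sample


def capacity_1msb(p1, p0):
    """Eq. (4): p1 and p0 are the probabilities of MSB = 1 and MSB = 0."""
    return p1 * 7 + p0 * 6


def percent_of_original(cp):
    """Capacity as a percentage of the original capacity C (assumed form)."""
    return cp / C * 100


ecp = capacity_1msb(0.5, 0.5)
print(ecp, percent_of_original(ecp))  # 6.5 162.5

# Grouping the average row of Table 7.10 by MSB:
# MSB = 1 covers '11' and '10', MSB = 0 covers '01' and '00'.
p1 = (2.02 + 46.02) / 100
p0 = (49.84 + 2.12) / 100
print(round(capacity_1msb(p1, p0), 2))  # 6.48
```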

Results and Discussions:

In the experimental results of these methods, there are 2
additional performance measures: incr_cap (increase in capacity in
terms of bits per sample) and % incr_cap (percentage increase in
capacity). Table 7.12 gives the results of the method which
considers the magnitude of samples.

Table 7.12 Results of method considering magnitude of samples of cover audio

Cover Signal   Secret Signal   MSE   PSNR   Increased Capacity   % Increased Capacity   BER

Guitar

Audio 0.00022 132.73 4.0007 100.019 0.34396

Text 0.00023 132.73 4.0058 100.146 0.34395

Image 0.00023 132.73 4.0009 100.024 0.34386

Triangle

Audio 2.39E-05 142.54 4.0003 100.007 0.32832

Text 2.39E-05 142.54 4.0021 100.054 0.32866

Image 2.39E-05 142.54 4.0003 100.009 0.32872

Bugle

Audio 3.26E-08 172.15 4.0000 100.000 0.38189

Text 3.15E-09 182.12 4.0000 100.000 0.38166

Image 1.42E-08 174.83 4.0000 100.000 0.38164

Speech1

Audio 0.09742 106.44 4.3020 107.550 0.44762

Text 0.09731 106.44 4.2321 105.803 0.44749

Image 0.09735 106.44 4.3002 107.504 0.44765

Speech2

Audio 6.22E-02 108.39 4.2152 105.379 0.44477

Text 0.06222 108.38 4.0942 102.356 0.44468

Image 0.06218 108.39 4.2291 105.728 0.44488

Alice

Audio 0.00096 126.45 4.0159 100.400 0.42863

Text 0.00096 126.46 4.0086 100.215 0.42859

Image 0.00097 126.46 4.0148 100.370 0.42850

Trance

audio 2.17E-08 174.25 4.0000 100.000 0.36265

Text 1.77E-09 184.63 4.0000 100.000 0.36273

image 8.08E-09 177.31 4.0000 100.000 0.36282

Echo

audio 0.00234 122.62 4.0020 100.050 0.40077

Text 0.00234 122.62 4.0000 100.000 0.39822

image 0.00234 122.62 4.0000 100.000 0.40391

Birds

audio 3.43E-08 171.75 4.0000 100.000 0.38578

Text 3.81E-09 181.29 4.0000 100.000 0.38522

image 1.71E-08 174.05 4.0000 100.000 0.38529

Faces

audio 0.00097 126.44 4.0034 100.087 0.43654

Text 0.00097 126.44 4.0024 100.062 0.43645

image 0.00097 126.44 4.0026 100.060 0.43628

Average 0.00032 134.91 4.0043 100.108 0.39607

Remark: The maximum capacity obtained using this method is 4.3 bits per sample,
as compared to the original capacity, which is considered to be 4 bits per sample.

Table 7.13 gives the results of the XOR method with multiple LSBs.

Table 7.13 Results of XOR method with multiple LSBs

Cover Signal   Secret Signal   MSE   PSNR   SNR   BER


Guitar

Audio 4.88E-06 150.3 30.03 0.10626

Text 4.41E-07 160.63 40.36 0.01064

Image 2.09E-06 153.13 32.87 0.04548

Triangle

Audio 2.55E-06 152.97 29.13 0.05233

Text 3.24E-07 161.63 37.78 0.00536

Image 1.26E-06 155.30 31.45 0.02302

Bugle

Audio 4.48E-06 150.61 29.85 0.09655

Text 3.92E-07 161.12 40.37 0.00973

Image 1.95E-06 153.45 32.70 0.04141

Speech1

Audio 5.67E-06 149.71 46.31 0.13415

Text 5.46E-07 159.73 56.33 0.01353

Image 2.48E-06 152.38 48.98 0.05719

Speech2

Audio 5.96E-06 149.44 44.66 0.13674

Text 5.61E-07 159.62 54.84 0.01382

Image 2.66E-06 152.08 47.30 0.05821

Alice

Audio 2.66E-06 152.88 40.6 0.05921

Text 2.34E-07 163.35 51.073 0.00597

Image 1.18E-06 155.63 43.35 0.02514

Trance

Audio 2.56E-06 153.04 27.45 0.05433

Text 2.26E-07 163.56 37.97 0.00545

Image 1.11E-06 155.86 30.28 0.02321

Echo

audio 3.08E-06 151.96 39.92 0.05205

Text 4.13E-07 160.97 48.93 0.00557

Image 1.78E-06 153.83 41.79 0.02519

Birds

audio 5.39E-06 149.78 28.74 0.11608

Text 4.78E-07 160.26 39.22 0.01175

Image 2.35E-06 152.64 31.61 0.04945

Faces

audio 3.65E-06 151.54 40.86 0.08153

Text 3.35E-07 161.80 51.13 0.00821

Image 1.61E-06 154.27 43.59 0.03470

Average 2.11E-06 155.45 39.98 0.04541

Remark: It can be seen from the above table that the MSE values are lower
(better) than the estimated MSE value of 6.08e-05, which assumes that all 8 bits
are changed during the embedding process.

Table 7.14 gives the results of the method considering 2 MSBs.

Table 7.14 Results of proposed method (considering 2 MSBs)

Cover Signal   Secret Signal   MSE   PSNR   SNR   Incr_cap   % Incr_cap   BER

Guitar

Audio 3.04E-07 162.71 42.44 5.53 138.30 0.10241

Text 2.70E-08 172.8 52.53 5.51 137.95 0.01073

Image 1.13E-07 165.81 45.54 5.52 138.21 0.04538

Triangle

Audio 1.60E-07 165.38 41.54 5.53 138.34 0.05243

Text 2.31E-08 172.91 49.06 5.8 145.02 0.00537

Image 6.60E-08 168.17 44.33 5.52 138.65 0.02273

Bugle

Audio 3.66E-07 162.01 41.25 5.82 145.64 0.09702

Text 3.12E-08 172.16 51.41 5.90 147.81 0.00984

Image 1.32E-07 165.15 44.4 5.84 146.14 0.04125

Speech1

Audio 5.24E-07 160.29 56.88 5.45 136.28 0.11361

Text 4.89E-08 170.56 67.15 5.39 135.02 0.01343

Image 2.17E-07 163 59.59 5.44 136.19 0.05706

Speech2

Audio 4.63E-07 160.76 55.98 5.41 135.41 0.11443

Text 2.78E-08 173.1 68.32 5.26 131.7 0.01373

Image 1.92E-07 163.52 58.74 5.4 135.03 0.05815

Alice

Audio 1.79E-07 165.19 52.91 5.5 137.62 0.05909

Text 1.41E-08 175.62 63.34 5.475 136.99 0.00594

Image 6.26E-08 168.39 56.11 5.5 137.79 0.02521

Trance

Audio 1.59E-07 165.68 40.09 5.49 137.55 0.05437

Text 1.29E-08 175.94 50.36 5.5 137.63 0.00548

Image 5.59E-08 168.88 43.29 5.49 137.49 0.02312

Echo

Audio 1.82E-07 164.36 52.32 5.67 141.84 0.05215

Text 3.17E-08 172.13 60.09 6 150 0.00561

Image 1.11E-07 165.98 53.94 5.75 143.83 0.02471

Birds

Audio 3.07E-07 162.61 41.57 5.49 137.49 0.10632

Text 2.70E-08 172.71 51.67 5.5 137.59 0.01177

Image 1.20E-07 165.57 44.53 5.49 137.49 0.04956

Faces

Audio 2.42E-07 163.86 53.19 5.48 137.32 0.08163

Text 1.99E-08 174.04 63.37 5.51 137.92 0.00825

Image 8.40E-08 167.11 56.44 5.49 137.28 0.03473

Average 1.43E-07 167.55 52.08 5.55 139.05 0.04352

Remark: It can be seen from the table that the highest increase in capacity
obtained is 6 bits per sample. The estimated increase in capacity for this method
was calculated to be (4+5+6+7)/4 = 5.5 from Eq. (3) given in (C) of this section.
From the table, for all cover signals, the increase in capacity is either close to
or more than this estimated value. The average value for increase in capacity
comes out to be 5.55.

Table 7.15 gives the results of the method considering only 1 MSB.

Table 7.15 Results of proposed method (considering 1 MSB)

Cover Signal   Secret Signal   MSE   PSNR (dB)   SNR   Increased Capacity in bits   % Increase in Capacity   BER

Guitar

Audio 1.07E-06 157.35 37.08 6.52 163.14 0.10648

Text 8.83E-08 167.67 47.40 6.51 163.13 0.01072

Image 3.89E-07 160.46 40.19 6.53 163.43 0.04536

Triangle

Audio 5.44E-07 159.99 36.14 6.53 163.51 0.05226

Text 6.99E-08 168.2 44.35 6.82 170.66 0.00535

Image 2.31E-07 162.73 38.88 6.53 163.85 0.02269

Bugle

Audio 1.26E-06 156.61 35.85 6.83 170.95 0.09703

Text 1.02E-07 167.13 46.38 6.91 172.93 0.00966

Image 4.56E-07 159.77 39.02 6.84 171.22 0.04122

Speech1

Audio 1.16E-06 156.88 53.47 6.45 161.40 0.12397

Text 9.69E-08 167.23 63.82 6.40 160.26 0.01355

Image 4.47E-07 159.85 56.45 6.45 161.3 0.05713

Speech2

Audio 1.13E-06 157.01 52.23 6.41 160.40 0.12467

Text 8.30E-08 168.2 63.42 6.26 156.55 0.01368

Image 4.26E-07 160.07 55.29 6.39 159.78 0.05808

Alice

Audio 5.88E-07 160.01 47.73 6.50 162.63 0.05908

Text 4.52E-08 170.56 58.27 6.47 161.91 0.00595

Image 2.09E-07 163.14 50.86 6.50 162.75 0.02524

Trance

Audio 5.31E-07 160.42 34.84 6.49 162.57 0.05415

Text 4.44E-08 170.65 45.07 6.505 162.78 0.00547

Image 1.90E-07 163.57 37.98 6.5 162.61 0.02308

Echo

Audio 6.41E-07 158.72 46.68 6.69 167.84 0.05182

Text 9.63E-08 167.38 55.33 7 175 0.00567

Image 4.31E-07 160.12 48.08 6.77 169.28 0.02509

Birds

Audio 1.14E-06 157.10 36.06 6.49 162.47 0.11600

Text 9.43E-08 167.4 46.36 6.49 162.56 0.01166

Image 4.04E-07 160.29 39.25 6.49 162.49 0.04925

Faces

Audio 8.04E-07 158.62 47.95 6.49 162.36 0.08148

Text 6.65E-08 168.77 58.09 6.52 163.08 0.00817

Image 2.84E-07 161.82 51.15 6.49 162.39 0.03470

Average 4.37E-07 162.59 47.12 6.56 164.17 0.04462

Remark: It can be seen from the table that the highest increase in capacity
obtained is 7 bits per sample. The estimated increase in capacity for this method
was calculated to be 6.5 from Eq. (6) given in (D) of this section. From the table,
for all cover signals, the increase in capacity is either close to or more than this
estimated value. The average value for increase in capacity comes out to be 6.56.

Figure 7.8 shows the plot of the cover audio signal (Trance).
Figure 7.9 shows the plot of the stego signal obtained after
applying the method considering 2 MSBs, and Figure 7.10 shows the
plot of the stego signal obtained after applying the method
considering 1 MSB. From the figures, no visible difference is found
between the stego signals obtained from either method and the
original cover audio signal.

Figure 7.8 Plot of Cover Audio Signal

Figure 7.9 Plot of Stego audio signal (Using 2 MSBs)

Figure 7.10 Plot of Stego audio signal (Using MSB)

Discussion

In the time domain, the LSB method has been implemented using 1
LSB, 2 LSBs, 4 LSBs and 8 LSBs for hiding data in cover audio. It
is seen from the results that if the number of LSBs used for data
embedding exceeds 4, there is some audible distortion in the host
audio signal.

In order to increase the security, two methods using LSB coding

along with encryption to hide information (audio, image and text) in

digital audio files have been proposed. In the first method the

information is hidden by altering LSBs indirectly considering parity.

In the second method, information is hidden based on the result of

XOR operation of LSBs and the message bit to be embedded. In

both these methods, direct LSB extraction will only result in noise.

Thus, by using encryption along with steganography, these methods

provide an additional level of security. From experimental results, it

is seen that the proposed methods are effective. From listening



tests, no difference is found between the original audio signal and

the stego audio signal. The hidden information is recovered without

any error.

In order to increase the capacity of the cover audio, several
methods have been proposed. The first method uses multiple and
variable LSBs for hiding data, considering the magnitude of the
cover samples. Experimental results of this method have shown that
it does not succeed in increasing the capacity of the cover audio.
So, three novel approaches to increase the capacity have been
proposed, and they give good results. The first of the three is
based on the XOR operation performed on different combinations of
bits and uses the last 8 LSBs of the cover samples for hiding data.
The other two methods embed data in multiple and variable LSBs
depending on the MSBs of the cover audio samples: one checks the
first 2 MSBs of the cover samples, and the other, an extension of
it, checks only the MSB of the cover samples. From the results, it
is seen that there is a remarkable increase in the capacity of the
cover audio for hiding additional data, without affecting the
perceptual transparency of the host audio signal.

Using the first method, based on the XOR operation, the results
obtained are much better than the estimated results. Considering 2
MSBs, the average capacity increases to 5.55 bits per sample, as
compared to the original capacity of 4 bits per sample. Considering
1 MSB, the average capacity increases to 6.56 bits per sample, as
compared to the original capacity of 4 bits per sample.

From subjective listening tests, it has been seen that there is no
noticeable difference in perceptual quality between the stego audio
signals obtained from the proposed methods and the cover audio
signal. The main advantages of the proposed methods are that they
are simple in logic and that the hidden information is recovered
without any error. Thus they attain the basic requirements of data
hiding. The steganalysis of the proposed methods is also more
challenging, because a varied number of bits is flipped in the
audio samples and the adversary cannot identify exactly how many
bits are used for hiding the data.

7.5 Transform-based Methods

A transform-based method embeds secret information by modifying
the transform coefficients of the cover object. It is seen that the
transform-based method has the potential to achieve higher payload
capacity and is more robust than the LSB method.

Here, two transforms, i.e., DCT and Haar transform, have been used
for hiding data in cover audio signals. For the experimental
results, 3 audio signals (Guitar, Triangle and Bugle) have been
used as cover and 3 audio signals (Pingpong, Chimes and Newmail) as
secret messages. The performance measures used are MSE, PSNR and
BER between the cover and stego audio signals, and MSE between the
original secret message and the retrieved secret message.

Using the DCT, three different methods have been implemented. In
the first 2 methods, DCT is applied to blocks of 8 samples each.
However, since the results obtained were not favorable, the block
size has been increased to 64 samples in the last method, which is
based on considering various dividing factors (2, 5, 10, 20, 25,
50, 75, and 100). A similar approach has been applied using the
Haar transform as well.

7.5.1 DCT Transform

Discrete Cosine Transform (DCT) is a very popular technique for
data compression as it gives high energy compaction. Therefore,
there is more possibility of hiding data with it than with other
transforms. Here, 3 different methods are proposed for hiding
information using DCT.

7.5.1.1 Method 1:

In this method, DCT is applied to blocks of 8 samples each of cover

and secret audio signals. Every 8th DCT coefficient of cover is

replaced by a DCT coefficient of the secret message. The steps

involved in data embedding and extraction are discussed in detail

below:


1. Read the cover and secret audio files.

2. Split both the files into blocks of 8 samples each. Each block
of the cover audio is used for data embedding.

3. Apply DCT to blocks of 8 samples of both the files.

4. The 8th DCT coefficient of each block of the cover audio is
replaced by a DCT coefficient of the secret message.

5. Apply inverse DCT to the modified DCT coefficients of each
block.

6. The resultant samples are then written back to the file.
This becomes the stego audio file.

Steps for Data Extraction/ Retrieval:

1. Read the stego audio file.

2. Split the file into blocks of 8 samples each and apply DCT to
each block.

3. Extract the 8th DCT coefficient from each block.

4. Group the extracted coefficients into blocks of 8 and apply
inverse DCT to each block.

5. The resultant samples are written back to the file.
This becomes the retrieved secret message.
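
The steps of Method 1 can be sketched as follows. This is a minimal sketch and not the implementation used for the experiments. It assumes an orthonormal DCT-II, signal lengths that are multiples of 8, and floating-point samples that are not quantized when written to file; the function names are hypothetical.

```python
import math

N = 8  # block size used by Method 1

# Orthonormal DCT-II matrix; its transpose gives the inverse transform.
D = [[math.sqrt((1 if k == 0 else 2) / N)
      * math.cos(math.pi * (2 * i + 1) * k / (2 * N))
      for i in range(N)] for k in range(N)]


def dct(block):
    return [sum(D[k][i] * block[i] for i in range(N)) for k in range(N)]


def idct(coeffs):
    return [sum(D[k][i] * coeffs[k] for k in range(N)) for i in range(N)]


def embed(cover, secret):
    """Replace the 8th DCT coefficient of each cover block by one secret coefficient."""
    payload = [c for i in range(0, len(secret), N) for c in dct(secret[i:i + N])]
    if len(payload) > len(cover) // N:
        raise ValueError("cover audio is too short for the secret signal")
    stego = []
    for j in range(len(cover) // N):
        coeffs = dct(cover[j * N:(j + 1) * N])
        if j < len(payload):
            coeffs[N - 1] = payload[j]
        stego.extend(idct(coeffs))
    return stego


def extract(stego, n_secret):
    """Collect the 8th coefficient of every block and invert in groups of 8."""
    coeffs = [dct(stego[j * N:(j + 1) * N])[N - 1] for j in range(len(stego) // N)]
    return [s for i in range(0, n_secret, N) for s in idct(coeffs[i:i + N])]


cover = [math.sin(0.1 * i) for i in range(128)]        # 16 cover blocks
secret = [0.5 * math.cos(0.3 * i) for i in range(16)]  # 2 secret blocks
recovered = extract(embed(cover, secret), len(secret))
print(max(abs(a - b) for a, b in zip(recovered, secret)) < 1e-9)  # True
```

Each cover block carries a single secret coefficient, so 8 cover blocks are needed for every secret block. In this idealised setting the recovery is exact up to floating-point error; the experimental MSE values also reflect effects that the sketch leaves out.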

7.5.1.2 Method 2:

In this method, DCT is applied to blocks of 8 samples each of cover
and secret audio. Unlike the earlier method, where the secret
coefficients are directly embedded in every 8th cover coefficient,
here the secret coefficients are first modified and then embedded.
The process of data embedding and extraction is discussed below.


1. Read the cover and secret audio files.

2. Split both the files into blocks of 8 samples each.

3. Apply DCT to blocks of 8 samples of both the files.

4. The maximum coefficient value in each secret block is divided
by 2. The resulting value is then multiplied with all other
coefficients in the respective block. The computed values from
each block are saved and used in the decoding process.

5. The 8th DCT coefficient of each block of the cover audio is
replaced by a DCT coefficient of the secret audio file.

6. Apply inverse DCT to the modified DCT coefficients of each
block.

7. The resultant samples are then written back to the file.

Steps for Data Extraction/ Retrieval:

1. Read the stego audio file.

2. Split the file into blocks of 8 samples and apply DCT to each
block.

3. Extract the 8th DCT coefficient from each block. Each
coefficient is then divided by the value computed in the encoding
process for its respective block.

4. Apply inverse DCT to the modified DCT coefficients.


7.5.1.3 DCT Using a dividing Factor:

In order to improve the results of the two earlier DCT methods, a
new method has been proposed here. Unlike the earlier methods,
where the audio signal is divided into blocks of 8 samples each,
here the audio signal is split into blocks of 64 samples each.






Steps for Data Embedding:

1. Read the cover and secret audio files.

2. Split both the files into blocks of 64 samples each.

3. Apply DCT to blocks of 64 samples of both the files.

4. The secret DCT coefficients are then divided by a
predetermined constant factor.

5. The last 8 DCT coefficients of each block of the cover audio
are replaced by the secret DCT coefficients from step 4.

6. Apply inverse DCT to the modified cover DCT coefficients of
each block.

7. The resultant samples are then written back to the file to
form the stego file.

Steps for Data Extraction/ Retrieval:

1. Read the stego audio file.

2. Split the file into blocks of 64 samples and apply DCT to each
block.

3. Extract the last 8 DCT coefficients from each block.

4. The extracted coefficients are then multiplied by the constant
factor that was used during the embedding process.

5. Group the coefficients from step 4 into blocks of 64 and
apply inverse DCT to each block.

6. The resultant samples are written back to the file.
This becomes the retrieved secret message.

Different factors such as 2, 5, 10, 20, 25, 50, 75 and 100 were used
in the implementation.
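
The role of the dividing factor can be illustrated with a short sketch. It is not the experimental implementation: it assumes an orthonormal DCT-II, signal lengths that are multiples of 64, and unquantized floating-point samples, and all function names are hypothetical.

```python
import math

N, K = 64, 8  # block size and number of replaced coefficients per block

# Orthonormal DCT-II matrix; its transpose gives the inverse transform.
D = [[math.sqrt((1 if k == 0 else 2) / N)
      * math.cos(math.pi * (2 * i + 1) * k / (2 * N))
      for i in range(N)] for k in range(N)]


def dct(block):
    return [sum(D[k][i] * block[i] for i in range(N)) for k in range(N)]


def idct(coeffs):
    return [sum(D[k][i] * coeffs[k] for k in range(N)) for i in range(N)]


def embed(cover, secret, factor):
    """Hide the scaled secret DCT coefficients in the last K coefficients of each cover block."""
    payload = [c / factor
               for i in range(0, len(secret), N)
               for c in dct(secret[i:i + N])]
    if len(payload) > (len(cover) // N) * K:
        raise ValueError("cover audio is too short for the secret signal")
    stego = []
    for j in range(len(cover) // N):
        coeffs = dct(cover[j * N:(j + 1) * N])
        part = payload[j * K:(j + 1) * K]
        if part:
            coeffs[N - K:] = part
        stego.extend(idct(coeffs))
    return stego


def extract(stego, n_secret, factor):
    """Read the last K coefficients of each block, rescale, and invert in blocks of N."""
    coeffs = []
    for j in range(len(stego) // N):
        coeffs.extend(c * factor for c in dct(stego[j * N:(j + 1) * N])[N - K:])
    return [s for i in range(0, n_secret, N) for s in idct(coeffs[i:i + N])]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


cover = [math.sin(0.05 * i) for i in range(512)]      # 8 cover blocks
secret = [0.5 * math.cos(0.9 * i) for i in range(64)]  # 1 secret block
for factor in (2, 100):
    stego = embed(cover, secret, factor)
    print(factor, mse(cover, stego))
```

Dividing the secret coefficients shrinks the values written into the cover, so in this toy example the cover-to-stego MSE falls as the factor grows, which is the same direction as the DCT MSE column of Table 7.34.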

7.5.2 Haar Transform

The Haar transform has become more popular after the introduction
of wavelets. It is fast to compute and is useful for data
compression.

7.5.2.1 Haar Transform Using a dividing Factor:

In this method, the Haar transform is applied for hiding audio data
in cover audio.





Steps for Data Embedding:

1. Read the cover and secret audio files.

2. Split both the files into blocks of 64 samples each.

3. Apply Haar transform to blocks of 64 samples of both the
files.

4. The secret transform coefficients are then divided by the
same predetermined constant factor as is used in DCT, for
comparison.

5. The last 8 coefficients of each block of the cover audio are
replaced by the secret coefficients from step 4.

6. Apply inverse Haar transform to the modified cover transform
coefficients of each block.

7. The resultant samples are then written back to the file to
form the stego file.

Steps for Data Extraction/ Retrieval:

1. Read the stego audio file.

2. Split the file into blocks of 64 samples and apply Haar
transform to each block.

3. Extract the last 8 coefficients from each block.

4. The extracted coefficients are then multiplied by the constant
factor that was used during the embedding process.

5. Group the coefficients from step 4 into blocks of 64 and
apply inverse Haar transform to each block.

6. The resultant samples are written back to the file.
This forms the retrieved secret message.

Different factors such as 2, 5, 10, 20, 25, 50, 75 and 100 were used
in the implementation.
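
A single-block version of the Haar-based steps can be sketched as below. The text does not say which form of the Haar transform was used; the sketch assumes a full multi-level orthonormal Haar decomposition, which is only one possible choice, and the function names are hypothetical.

```python
import math

K = 8  # number of coefficients replaced at the end of each block
R2 = math.sqrt(2)


def haar(block):
    """Full orthonormal Haar transform of a block whose length is a power of 2."""
    out = list(block)
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / R2 for i in range(half)]
        dif = [(out[2 * i] - out[2 * i + 1]) / R2 for i in range(half)]
        out[:n] = avg + dif  # averages first, then details
        n = half
    return out


def ihaar(coeffs):
    """Inverse of haar(): rebuild the samples level by level."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / R2, (a - d) / R2]
        out[:2 * n] = merged
        n *= 2
    return out


def embed_block(cover_block, secret_coeffs, factor):
    """Replace the last K Haar coefficients of one cover block by scaled secret coefficients."""
    coeffs = haar(cover_block)
    coeffs[-K:] = [c / factor for c in secret_coeffs]
    return ihaar(coeffs)


def extract_block(stego_block, factor):
    """Read the last K Haar coefficients of a stego block and undo the scaling."""
    return [c * factor for c in haar(stego_block)[-K:]]


cover_block = [math.sin(0.1 * i) for i in range(64)]
secret_coeffs = haar([0.5 * math.cos(0.3 * i) for i in range(64)])[:K]
stego_block = embed_block(cover_block, secret_coeffs, 10)
recovered = extract_block(stego_block, 10)
print(max(abs(a - b) for a, b in zip(recovered, secret_coeffs)) < 1e-9)  # True
```

A full secret block of 64 coefficients would be spread over 8 cover blocks in the same way, 8 coefficients at a time, and inverted with `ihaar` after extraction.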

7.6 Results:

Table 7.16 gives the results of Method 1 using DCT. Table 7.17

gives the results of Method 2 using DCT.



Table 7.16 Results of Method 1 using DCT

Cover Secret MSE PSNR MSE(message) BER

Guitar

Pingpong 0.000239 132.54 2.05E-10 0.10905

Chimes 0.001914 123.51 1.48E-08 0.33038

Newmail 0.022459 112.82 0.001245 0.38871

Triangle

Pingpong 0.000138 134.92 1.55E-10 0.06375

Chimes 0.000994 126.35 8.89E-07 0.23517

Newmail 0.020875 113.13 0.002986 0.40665

Bugle

Pingpong 0.000218 132.94 1.57E-10 0.10930

Chimes 0.001772 123.84 5.79E-10 0.31338

Newmail 0.01872 113.6 0.01065 0.36702

Average 0.00748 123.74 0.00165 0.25815

Remarks: The minimum MSE value obtained is 0.000138 and the minimum MSE

value for the message is 1.55E-10.

Table 7.17 Results of Method 2 using DCT

Cover Secret MSE PSNR MSE(message) BER


Guitar

Pingpong 4.58E-05 139.71 0.001821 0.05705

Chimes 8.53E-05 137.02 0.012151 0.18178

Newmail 0.014356 114.75 0.00101 0.27789

Triangle

Pingpong 4.47E-05 139.83 0.004064 0.05473

Chimes 6.43E-05 138.24 1.46643 0.15011

Newmail 0.012909 115.22 0.001868 0.30976

Bugle

Pingpong 4.23E-05 140.06 0.00205 0.08592

Chimes 7.96E-05 137.32 0.006976 0.23206

Newmail 0.01447 114.72 0.001079 0.28766

Average 4.68E-03 130.76 0.16638 0.18188

Remarks: The minimum MSE value obtained is 4.23E-05 and the minimum MSE
value for the message is 0.00101. It can be seen that the MSE values obtained
from this method are better than those obtained from Method 1, as shown in
Table 7.16. In contrast, the MSE for the message is better with Method 1 than
with Method 2.

Table 7.18 to Table 7.25 show the results of using DCT with the
factors 2, 5, 10, 20, 25, 50, 75 and 100.

Table 7.18 Results of using DCT with a factor 2

Cover Secret MSE PSNR MSE(message) BER


Guitar

Pingpong 5.97E-05 138.56 1.31E-09 0.09193

Chimes 0.00048 129.53 1.00E-08 0.28185

Newmail 0.00661 118.13 9.19E-08 0.34975

Triangle

Pingpong 4.43E-05 139.86 3.13E-10 0.05722

Chimes 0.00026 132.16 3.46E-10 0.19971

Newmail 0.00627 118.35 1.43E-06 0.36926

Bugle

Pingpong 5.45E-05 138.96 3.09E-10 0.09534

Chimes 0.00044 129.86 4.76E-10 0.27221

Newmail 0.00678 118.02 4.20E-08 0.32807

Average 2.33E-03 129.27 1.75E-07 0.22726





Table 7.19 Results of using DCT with a factor 5

Cover Secret MSE PSNR MSE(message) BER

Guitar

Pingpong 9.56E-06 146.53 2.18E-09 0.07297

Chimes 7.66E-05 137.49 1.10E-08 0.23606

Newmail 0.00106 126.08 1.29E-07 0.30464

Triangle

Pingpong 1.88E-05 143.59 1.93E-09 0.05427

Chimes 5.41E-05 139 2.29E-09 0.16844

Newmail 0.001 126.26 1.43E-06 0.32585

Bugle

Pingpong 8.82E-06 146.87 2.02E-09 0.08647

Chimes 7.12E-05 137.8 1.52E-09 0.24022

Newmail 0.00108 125.98 4.30E-08 0.28879

Average 3.75E-04 136.62 1.80E-07 0.19753





Table 7.20 Results of using DCT with a factor 10

Cover Secret MSE PSNR MSE(message) BER

Guitar

Pingpong 2.39E-06 152.55 8.63E-09 0.05978

Chimes 1.92E-05 143.51 1.45E-08 0.20082

Newmail 0.00026 132.1 1.06E-07 0.26828

Triangle

Pingpong 1.50E-05 144.56 7.73E-09 0.05238

Chimes 2.41E-05 142.51 9.76E-09 0.14756

Newmail 0.00026 132.11 1.44E-06 0.29201

Bugle

Pingpong 2.29E-06 152.72 7.79E-09 0.08171

Chimes 1.81E-05 143.76 5.13E-09 0.22104

Newmail 0.00027 132 4.60E-08 0.26219

Average 9.68E-05 141.75 1.83E-07 0.17619




Table 7.21 Results of using DCT with a factor 20

Cover Secret MSE PSNR MSE(message) BER

Guitar

Pingpong 5.98E-07 158.57 3.84E-08 0.04871

Chimes 4.79E-06 149.53 2.94E-08 0.16668

Newmail 6.62E-05 138.12 1.07E-07 0.23208

Triangle

Pingpong 1.40E-05 144.86 3.17E-08 0.05105

Chimes 1.64E-05 144.17 4.10E-08 0.12906

Newmail 7.62E-05 137.03 1.47E-06 0.25879

Bugle

Pingpong 6.62E-07 158.12 3.07E-08 0.07856

Chimes 4.78E-06 149.54 1.98E-08 0.20655

Newmail 6.80E-05 138 5.88E-08 0.23853

Average 2.80E-05 146.44 2.03E-07 0.15666





Table 7.22 Results of using DCT with a factor 25

Cover Secret MSE PSNR MSE(message) BER

Guitar

Pingpong 3.83E-07 160.5 6.01E-08 0.04563

Chimes 3.07E-06 151.46 4.16E-08 0.15653

Newmail 4.24E-05 140.06 1.11E-07 0.22069

Triangle

Pingpong 1.39E-05 144.9 5.04E-08 0.05058

Chimes 1.55E-05 144.44 6.53E-08 0.12388

Newmail 5.37E-05 139.03 1.47E-06 0.24789

Bugle

Pingpong 4.66E-07 159.65 4.80E-08 0.07756

Chimes 3.18E-06 151.3 3.12E-08 0.20294

Newmail 4.36E-05 139.93 7.91E-08 0.23155

Average 1.96E-05 147.92 2.17E-07 0.15081



Table 7.23 Results of using DCT with a factor 50

Cover Secret MSE PSNR MSE(message) BER


Guitar

Pingpong 9.66E-08 166.48 2.60E-07 0.03723

Chimes 7.68E-07 157.48 1.61E-07 0.12444

Newmail 1.06E-05 146.08 2.49E-07 0.18627

Triangle

Pingpong 1.37E-05 144.96 2.07E-07 0.04922

Chimes 1.41E-05 144.82 2.77E-07 0.10985

Newmail 2.36E-05 142.59 1.66E-06 0.21607

Bugle

Pingpong 2.05E-07 163.22 1.95E-07 0.07536

Chimes 1.06E-06 156.07 1.23E-07 0.19323

Newmail 1.12E-05 145.86 1.40E-07 0.21395

Average 8.37E-06 151.95 3.64E-07 0.13396




Table 7.24 Results of using DCT with a factor 75

Cover Secret MSE PSNR MSE(message) BER

Guitar

Pingpong 4.37E-08 169.93 5.97E-07 0.03365

Chimes 3.42E-07 160.99 3.86E-07 0.10756

Newmail 4.71E-06 149.59 4.25E-07 0.16708

Triangle

Pingpong 1.37E-05 144.97 4.51E-07 0.04862

Chimes 1.39E-05 144.9 6.32E-07 0.10305

Newmail 1.81E-05 143.76 1.90E-06 0.19809

Bugle

Pingpong 1.56E-07 164.38 4.50E-07 0.07429

Chimes 6.66E-07 158.09 2.79E-07 0.18937

Newmail 5.14E-06 149.22 4.11E-07 0.20557

Average 6.31E-06 153.98 6.15E-07 0.12525





Table 7.25 Results of using DCT with a factor 100

Cover Secret MSE PSNR MSE(message) BER

Guitar

Pingpong 2.52E-08 172.32 1.08E-06 0.03152

Chimes 1.93E-07 163.47 7.19E-07 0.09655

Newmail 2.65E-06 152.09 4.82E-07 0.15345

Triangle

Pingpong 1.36E-05 144.98 7.99E-07 0.04825

Chimes 1.37E-05 144.94 1.14E-06 0.09934

Newmail 1.61E-05 144.26 2.45E-06 0.18599

Bugle

Pingpong 1.40E-07 164.88 7.82E-07 0.07382

Chimes 5.29E-07 159.09 4.95E-07 0.18671

Newmail 3.04E-06 151.5 3.85E-07 0.20077

Average 5.55E-06 155.28 9.26E-07 0.11960



Table 7.26 Results of Haar Transform using factor 2

Cover Secret MSE PSNR MSE(message) BER


Guitar

Pingpong 0.001602 124.28 0.001616 0.30611

Chimes 0.003039 121.5 0.006040 0.33029

Newmail 0.004626 119.67 0.059137 0.33848

Triangle

Pingpong 0.000674 128.04 0.001436 0.29409

Chimes 0.00137 124.96 0.006171 0.30435

Newmail 0.002641 122.11 0.113585 0.32930

Bugle

Pingpong 0.00133 125.09 0.001463 0.34766

Chimes 0.002759 121.92 0.006068 0.35639

Newmail 0.00433 119.96 0.066949 0.36226

Average 0.002486 123.06 0.029163 0.32988

Remarks: The minimum MSE value obtained is 0.000674 and the minimum MSE

value for the message is 0.001436.


Table 7.27 Results of Haar Transform using factor 5

Cover Secret MSE PSNR MSE(message) BER

Guitar

Pingpong 0.000146 134.67 9.13e-05 0.27855

Chimes 0.001626 124.22 0.002970 0.32694

Newmail 0.003708 120.64 0.043101 0.33230

Triangle

Pingpong 9.43E-05 136.58 1.89e-09 0.02231

Chimes 0.000757 127.54 0.002932 0.30493

Newmail 0.002233 122.84 0.083414 0.32267

Bugle

Pingpong 9.61E-05 136.5 1.82e-09 0.03666

Chimes 0.001471 124.65 0.002725 0.35773

Newmail 0.003527 120.85 0.049024 0.35860

Average 0.001518 127.61 0.020473 0.26007





Table 7.28 Results of Haar Transform using factor 10

Cover Secret MSE PSNR MSE(message) BER

Guitar

Pingpong 2.42E-05 142.49 8.82e-09 0.03391

Chimes 0.000436 129.94 0.000421 0.30724

Newmail 0.002509 122.34 0.022203 0.33143

Triangle

Pingpong 6.71E-05 138.06 8.84e-09 0.02222

Chimes 0.000249 132.36 0.000290 0.28337

Newmail 0.0017 124.02 0.044652 0.32140

Bugle

Pingpong 4.50E-05 139.79 8.93e-09 0.03612

Chimes 0.00046 129.7 0.000115 0.32392

Newmail 0.002468 122.41 0.026795 0.35648

Average 8.84E-04 131.23 1.05E-02 0.22401





Table 7.29 Results of Haar Transform using factor 20

Cover Secret MSE PSNR MSE(message) BER

Guitar

Pingpong 9.95E-06 146.35 3.75e-08 0.03335

Chimes 0.000103 136.18 5.26e-08 0.07698

Newmail 0.001285 125.24 0.001178 0.29326

Triangle

Pingpong 6.03E-05 138.52 3.78e-08 0.02211

Chimes 0.000111 135.88 3.80e-08 0.07316

Newmail 0.001107 125.88 0.004375 0.29919

Bugle

Pingpong 3.21E-05 141.26 3.84e-08 0.03577

Chimes 0.00022 132.89 3.18e-08 0.09144

Newmail 0.001305 125.17 0.003715 0.34331

Average 4.70E-04 134.15 1.03E-03 0.14095




Table 7.30 Results of Haar Transform using factor 25

Cover Secret MSE PSNR MSE(message) BER

Guitar

Pingpong 8.21E-06 147.18 6.19e-08 0.03302

Chimes 6.81E-05 137.99 6.78e-08 0.07502

Newmail 0.001027 126.21 9.61e-07 0.08422

Triangle

Pingpong 5.96E-05 138.58 6.18e-08 0.02211

Chimes 9.30E-05 136.64 6.13e-08 0.07272

Newmail 0.001002 126.32 1.35e-06 0.09584

Bugle

Pingpong 3.05E-05 141.47 6.04e-08 0.03565

Chimes 0.000143 134.76 4.70e-08 0.09098

Newmail 0.001084 125.97 0.000226 0.30057

Average 3.91E-04 135.01 2.54E-05 0.09002





Table 7.31 Results of Haar Transform using factor 50

Cover Secret MSE PSNR MSE(message) BER

Guitar

Pingpong 5.85E-06 148.65 2.47e-07 0.03285

Chimes 2.09E-05 143.13 1.80e-07 0.06986

Newmail 0.000261 132.16 1.08e-06 0.07763

Triangle

Pingpong 5.85E-05 138.65 2.61e-07 0.02207

Chimes 6.93E-05 137.92 2.54e-07 0.07182

Newmail 0.000297 131.6 1.54e-06 0.09165

Bugle

Pingpong 2.84E-05 141.79 2.53e-07 0.03553

Chimes 0.000143 134.76 1.68e-07 0.08996

Newmail 0.000392 130.39 5.45e-07 0.09202

Average 1.42E-04 137.67 5.03E-07 0.06482





Table 7.32 Results of Haar Transform using factor 75

Cover Secret MSE PSNR MSE(message) BER

Guitar

Pingpong 5.39E-06 149.01 5.69e-07 0.03282

Chimes 1.21E-05 145.5 3.66e-07 0.06736

Newmail 0.000119 135.58 1.18e-06 0.07438

Triangle

Pingpong 5.83E-05 138.67 5.77e-07 0.02197

Chimes 6.50E-05 138.2 5.82e-07 0.0715

Newmail 0.000166 134.12 1.92e-06 0.08978

Bugle

Pingpong 2.80E-05 141.85 5.81e-07 0.03547

Chimes 0.000135 135.02 3.77e-07 0.08924

Newmail 0.000246 132.43 6.42e-07 0.09097

Average 9.28E-05 138.93 7.55E-07 0.06372




Table 7.33 Results of Haar Transform using factor 100

Cover Secret MSE PSNR MSE(message) BER

Guitar

Pingpong 5.22E-06 149.15 1.01e-06 0.03282

Chimes 9.01E-06 146.78 6.25e-07 0.06585

Newmail 6.90E-05 137.94 1.38e-06 0.07177

Triangle

Pingpong 5.83E-05 138.67 1.03e-06 0.02195

Chimes 6.35E-05 138.3 1.04e-06 0.07121

Newmail 0.000121 135.52 2.25e-06 0.08819

Bugle

Pingpong 2.79E-05 141.88 9.82e-07 0.03542

Chimes 0.000132 135.12 6.68e-07 0.08872

Newmail 0.000194 133.44 9.18e-07 0.09058

Average 7.55E-05 139.64 1.10E-06 0.06295
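
The per-row PSNR values in the table above appear to follow from the MSE values through PSNR = 10 log10(peak^2 / MSE) with a peak of 65535. That peak is not stated in this section; it is inferred here from the tabulated numbers and should be read as an assumption. A minimal Python sketch of both measures, checked against three rows copied from the table:

```python
import math

PEAK = 65535.0  # assumed peak value, inferred from the tabulated MSE/PSNR pairs


def mse(reference, test):
    """Mean squared error between two equal-length sample sequences."""
    if len(reference) != len(test):
        raise ValueError("sequences must have the same length")
    return sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)


def psnr_db(mse_value, peak=PEAK):
    """Peak signal-to-noise ratio in dB for a given MSE."""
    if mse_value == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse_value)


# (MSE, reported PSNR) pairs copied from individual rows of the table
rows = [(0.00046, 129.7), (9.95e-06, 146.35), (5.22e-06, 149.15)]
for m, reported in rows:
    print(f"MSE={m:.3g}  computed={psnr_db(m):.2f} dB  reported={reported} dB")
```

The Average rows do not satisfy this relation: the tabulated average PSNR seems to be the mean of the per-row PSNR values rather than the PSNR of the average MSE.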



Table 7.34 Comparison of MSE (between cover and stego) and MSE (message) for DCT and Haar transform methods for various dividing factors

Factor   MSE (cover vs. stego)         MSE (message)
         DCT        Haar transform     DCT        Haar transform
2        2.33E-03   0.002486           1.75E-07   0.029163
5        3.75E-04   0.001518           1.80E-07   1.05E-02
10       9.68E-05   8.84E-04           1.83E-07   1.03E-03
20       2.80E-05   4.70E-04           2.03E-07   2.54E-05
25       1.96E-05   3.91E-04           2.17E-07   2.54E-05
50       8.37E-06   1.42E-04           3.64E-07   5.03E-07
75       6.31E-06   9.28E-05           6.15E-07   7.55E-07
100      5.55E-06   7.55E-05           9.26E-07   1.10E-06

Remarks: The minimum MSE (between cover and stego) is 5.55E-06 for DCT and 7.55E-05 for the Haar transform, both obtained with dividing factor 100. The minimum MSE for the message is 1.75E-07 for DCT (factor 2) and 5.03E-07 for the Haar transform (factor 50).

Table 7.35 Comparison of BER for DCT and Haar transform methods for various factors

Factor   BER (using DCT)   BER (using Haar transform)
2        0.22726           0.32988
5        0.19753           0.26007
10       0.17619           0.22401
20       0.15666           0.14095
25       0.15081           0.09002
50       0.13396           0.06482
75       0.12525           0.06372
100      0.11960           0.06295

Remarks: The minimum BER is 0.11960 for DCT and 0.06295 for the Haar transform, both obtained with dividing factor 100.
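
This section does not restate how BER is computed. As an illustration only, the Python sketch below assumes that the original and retrieved message samples lie in [-1, 1], are quantized to 16-bit two's-complement codes, and are compared bit by bit. The helper names `to_int16` and `bit_error_rate` are hypothetical.

```python
def to_int16(sample):
    """Quantize a sample in [-1.0, 1.0] to a 16-bit two's-complement code (assumed format)."""
    clipped = max(-1.0, min(1.0, sample))
    return int(round(clipped * 32767)) & 0xFFFF


def bit_error_rate(original, retrieved, bits=16):
    """Fraction of bits that differ between two equal-length sample sequences."""
    if len(original) != len(retrieved):
        raise ValueError("sequences must have the same length")
    wrong = sum(
        bin(to_int16(a) ^ to_int16(b)).count("1")
        for a, b in zip(original, retrieved)
    )
    return wrong / (bits * len(original))


original = [0.0, 0.25, -0.5, 1.0]
retrieved = [0.0, 0.25, -0.5, 0.99]
print(bit_error_rate(original, retrieved))
```

Under this definition a small amplitude error can flip many bits at once, so BER need not rank methods in the same order as MSE.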



7.7 Discussion

DCT and Haar transforms have been used for hiding data in audio signals. In the first two methods, the DCT is applied to the cover and secret audio signals in blocks of 8 samples. In the first method, every 8th cover DCT coefficient is directly replaced by a secret DCT coefficient. The results show that audible distortion is introduced in the stego audio signal, whereas the perceptual quality of the retrieved secret audio is high. In the second method, the secret coefficients are multiplied by the maximum coefficient of each block of 8 samples, and these modified coefficients are then embedded at every 8th cover coefficient. The results show that the perceptual quality of the resulting stego signal is quite good, whereas the quality of the retrieved secret signal is poor. There is thus a trade-off between the quality of the stego signal and the quality of the retrieved secret message.
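
The first method can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this work. It assumes that "every 8th cover DCT coefficient" means the last coefficient of each 8-sample cover block, that the DCT is the orthonormal DCT-II, and that signal lengths are multiples of 8. The function names are hypothetical.

```python
import math

N = 8  # block length used by the first two methods


def dct(block):
    """Orthonormal DCT-II of one block."""
    n = len(block)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(block))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = coeffs[0] * math.sqrt(1.0 / n)
        s += sum(c * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k, c in enumerate(coeffs) if k > 0)
        out.append(s)
    return out


def blocks(samples, n=N):
    """Split samples into consecutive full blocks of length n."""
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


def embed(cover, secret):
    """Replace the 8th DCT coefficient of each cover block with one secret DCT coefficient."""
    secret_coeffs = [c for b in blocks(secret) for c in dct(b)]
    cover_blocks = blocks(cover)
    if len(secret_coeffs) > len(cover_blocks):
        raise ValueError("cover is too short for this secret")
    stego = []
    for idx, b in enumerate(cover_blocks):
        coeffs = dct(b)
        if idx < len(secret_coeffs):
            coeffs[N - 1] = secret_coeffs[idx]  # direct replacement, no scaling
        stego.extend(idct(coeffs))
    return stego


def extract(stego, n_secret):
    """Read back the 8th coefficient of each stego block and invert the secret DCT."""
    coeffs = [dct(b)[N - 1] for b in blocks(stego)][:n_secret]
    return [s for b in blocks(coeffs) for s in idct(b)]


cover = [math.sin(0.1 * i) for i in range(64)]   # 8 cover blocks
secret = [0.01 * i for i in range(8)]            # 1 secret block
recovered = extract(embed(cover, secret), len(secret))
print(max(abs(a - b) for a, b in zip(secret, recovered)))
```

Because the secret coefficient is written unscaled, it can be much larger than the cover coefficient it replaces, which is consistent with the audible distortion reported for this method.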

In the third method, the DCT is applied to blocks of 64 samples, rather than 8, for both the cover and the secret audio signals. The secret coefficients are normalized by a constant dividing factor, and several values of this factor have been tried. The modified secret coefficients are then embedded in the last 8 cover coefficients of each block. This has been done to improve on the results of the first two methods. The Haar transform is used for hiding information in audio in the same way, and the performance of the two transforms is compared.
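
A minimal Python sketch of the third method with the Haar transform is given below. It is an illustration under assumptions, not the code used in this work. It assumes an orthonormal full-depth Haar decomposition, signal lengths that are multiples of 64, and secret coefficients consumed 8 at a time in order. It works in floating point without quantization to a sound file, so recovery here is exact up to rounding error. The function names are hypothetical.

```python
import math

BLOCK = 64  # block length of the third method
SLOTS = 8   # secret coefficients carried by each cover block


def haar(block):
    """Orthonormal Haar transform of a block whose length is a power of two."""
    out = list(block)
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        dif = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = avg + dif
        n = half
    return out


def ihaar(coeffs):
    """Inverse of haar()."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)]
        out[:2 * n] = merged
        n *= 2
    return out


def blocks(samples, n=BLOCK):
    """Split samples into consecutive full blocks of length n."""
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


def embed(cover, secret, factor):
    """Divide secret coefficients by factor and store 8 in the tail of each cover block."""
    scaled = [c / factor for b in blocks(secret) for c in haar(b)]
    cover_blocks = blocks(cover)
    if len(scaled) > SLOTS * len(cover_blocks):
        raise ValueError("cover is too short for this secret")
    stego = []
    for idx, b in enumerate(cover_blocks):
        coeffs = haar(b)
        chunk = scaled[idx * SLOTS:(idx + 1) * SLOTS]
        if chunk:
            coeffs[-SLOTS:] = chunk  # last 8 coefficients of the block
        stego.extend(ihaar(coeffs))
    return stego


def extract(stego, n_secret, factor):
    """Read the last 8 coefficients of each block, undo the division, invert the transform."""
    coeffs = []
    for b in blocks(stego):
        coeffs.extend(c * factor for c in haar(b)[-SLOTS:])
    return [s for b in blocks(coeffs[:n_secret]) for s in ihaar(b)]


cover = [math.sin(0.01 * i) for i in range(512)]        # 8 cover blocks
secret = [0.5 * math.cos(0.3 * i) for i in range(64)]   # 1 secret block
stego = embed(cover, secret, factor=20)
recovered = extract(stego, len(secret), factor=20)
print(max(abs(a - b) for a, b in zip(secret, recovered)))
```

Replacing `haar` and `ihaar` with a 64-point DCT and its inverse gives the DCT variant of the same scheme. In this toy example a larger dividing factor shrinks the embedded values and hence the change to the cover, which matches the falling cover-stego MSE in Table 7.34.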

However, after trying various dividing factors with different cover audio signals, it was difficult to settle on a single factor, as the best choice depends strongly on the cover and the secret audio to be embedded. With MSE and PSNR as performance measures, DCT performs better than Haar. For BER, however, DCT performs better only for dividing factors up to 10; for larger factors, Haar performs better.
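
The BER crossover can be checked directly against the averages in Table 7.35. The short Python sketch below only restates the tabulated values; the variable names are illustrative.

```python
# Average BER per dividing factor, copied from Table 7.35: (DCT, Haar)
ber = {
    2: (0.22726, 0.32988),
    5: (0.19753, 0.26007),
    10: (0.17619, 0.22401),
    20: (0.15666, 0.14095),
    25: (0.15081, 0.09002),
    50: (0.13396, 0.06482),
    75: (0.12525, 0.06372),
    100: (0.11960, 0.06295),
}

# Factors at which each transform gives the lower average BER
dct_wins = [f for f, (d, h) in sorted(ber.items()) if d < h]
haar_wins = [f for f, (d, h) in sorted(ber.items()) if h < d]

print("DCT lower BER at factors:", dct_wins)
print("Haar lower BER at factors:", haar_wins)
```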
