Information Hiding in Image and Audio Files 2007-2010
CHAPTER 7
INFORMATION HIDING IN AUDIO FILES
This chapter deals with hiding information in audio files. Today's
technology allows audio files to be copied and redistributed over
the Internet at very low or almost no cost. It is therefore necessary to
have methods that restrict access to these audio files and ensure
their security. Information is usually embedded in audio
files for the purpose of copyright protection or for authentication of
digital media. In a computer-based audio steganography system,
secret messages are embedded in digital sound [102]. In audio
steganography, the weaknesses of the Human Auditory System (HAS)
are used to hide information in the audio [101].
In the past few years, several algorithms for the embedding and
extraction of messages in audio sequences have been proposed. All
of the developed algorithms exploit the characteristics of the human
auditory system (HAS) in order to hide data into the host signal in a
perceptually transparent manner [103]. However, embedding secret
messages in digital sound is usually a more difficult process than
embedding messages in other media, such as digital images [102].
In addition, the amount of data that can be embedded transparently
into an audio sequence is considerably lower than the amount of
data that can be embedded in images or video sequences, because an
audio signal has fewer dimensions than two-dimensional image or
video files [103]. On the other hand, many attacks that are
malicious against image steganography algorithms (e.g. geometrical
distortions, spatial scaling, etc.) are not applicable to audio
steganography schemes [103]. Embedding information into audio
seems more secure because fewer steganalysis techniques exist for
attacking audio [103]. Furthermore, the natural sensitivity of audio and the
difficulty of working with it have resulted in far fewer algorithms and
techniques than for images. Existing audio steganography
schemes can embed messages in WAV, AU, AIFF and even MP3
sound file formats.
Information hiding in audio signals has a wide range of applications.
The most important and obvious application of audio
steganography is covert communication using innocuous cover
signals, such as a telephone conversation [104]. Another application,
known as (digital) watermarking [98], refers to embedding an
unobtrusive mark into an object, which can be used to identify the
object or act as a copyright protection of digital media. For
example, a digital watermark [98] can be inserted into a piece of
music so that it can be monitored automatically for payment
purposes [4]. One of the applications provides a mechanism for
embedding important control, descriptive or reference information
in a given signal. This information can be used for tracking the use
of a particular clip, including billing for commercials and audio
broadcast. It can be used to track audio creation, manipulation and
modification history within a given signal without the overhead
associated with creating a separate header or history file. It can
also be used to track access to a given signal. This information is
important in rights management applications [104].
7.1. Characteristics
An effective audio steganographic scheme should possess the
following three characteristics: inaudibility of distortion (perceptual
transparency), data rate (capacity) and robustness. Figure 7.1
gives the simplest visualization of the requirements of information
hiding in digital audio, the so-called magic triangle; these three
requirements form the corners of the magic triangle [106].
Figure 7.1 Magic Triangle for Data Hiding
Inaudibility of distortion: It evaluates the audible distortion due
to signal modifications like message embedding or attacking. The
data hiding scheme has to insert additional data without affecting
the perceptual quality of the host audio signal [103].
Robustness: It measures the ability of the embedded data to
withstand intentional and unintentional attacks.
Unintentional attacks generally include common data manipulations
such as re-sampling, re-quantization, etc. Intentional attacks include
addition of noise, resizing, rescaling, etc. [103].
Data Rate (Capacity): It refers to the amount of information that
a data hiding scheme can successfully embed without introducing
perceptual distortion. In other words the bit rate of the message is
the number of the embedded bits within a unit of time and is
usually given in bits per second (bps) [103].
7.2. Overview of Human Auditory System
The human auditory system (HAS) operates over a wide dynamic
range. Just as the difficulty of the human eye in distinguishing colors
is exploited when digital images are used as cover files,
the different sensitivity of the human ear to sounds of low and high
intensity can be exploited when digital audio is used:
usually, higher-intensity sounds are perceived better than lower-intensity ones, and it
is thus easier to hide data among low-intensity sounds without the human
ear noticing the alteration [107]. In addition, some
environmental distortions are so common that they are ignored by the
listener in most cases [108]. These are the weaknesses of the
HAS that can be exploited to add data to audio signals.
The effects of the human auditory system (HAS) relevant to
steganography are temporal masking and frequency masking. In
temporal masking, a weaker audible signal on either side (pre and
post) of a strong masker becomes imperceptible. Similarly, in
frequency masking, if two signals occurring simultaneously are close
together in frequency, the stronger masking signal may make the
weaker signal inaudible [109].
7.2.1 Digital Audio Files
There are two critical parameters in most digital audio
representations: sample quantization method and temporal
sampling rate. The most popular format for representing samples of
high-quality digital audio is 16-bit linear quantization, e.g.
Waveform Audio File Format (WAV) and Audio Interchange File Format
(AIFF). Popular temporal sampling rates for audio include 8 kHz
(kilohertz), 9.6 kHz, 10 kHz, 12 kHz, 16 kHz, 22.05 kHz and 44.1
kHz. Sampling rate impacts data hiding in that it puts an upper
bound on the usable portion of the frequency range. Generally, the
higher the sampling rate, the larger the usable data space [108].
There are three problems that need to be considered when dealing
with audio files [109]:
1. For audio files in the Microsoft .wav (dot wave) format, the range
is mapped to (-1, 1). To process these signals, each value
must be converted to integer format.
2. The human auditory system (HAS) is more sensitive than the human
visual system (HVS), so variations in the audio signal are
easily perceived.
3. Audio signals are harder to manipulate than digital
images.
Matlab supports two audio file formats: WAV and AU audio files.
WAV files: WAV (or WAVE) is also known as Audio for Windows. It is
a Microsoft and IBM audio file format standard for storing an audio
bit stream on PCs. It is the main format used on Windows systems
for raw and typically uncompressed audio. The usual bit stream
encoding is the Pulse Code Modulation (PCM) format. It supports
multi-channel data, with up to 32 bits per sample.
Au files: The Au file format is a simple audio file format introduced
by Sun Microsystems. Reading an .au file returns amplitude values
in the range [-1, +1]. It supports multichannel data in the following
formats:
8-bit mu-law
8-, 16-, and 32-bit linear
Floating-point
Here, WAV audio files are used for the following reasons [110]:
1. It is the main format used on Windows systems for raw and
typically uncompressed audio.
2. It can digitize sounds 100% faithfully to the original source,
thus maintaining maximum audio quality.
3. A WAV file is very easy to edit and manipulate.
7.3. METHODS OF AUDIO STEGANOGRAPHY
Some commonly used methods of audio steganography are listed
and discussed below in brief.
1. Least Significant Bit (LSB) Coding
2. Parity Coding
3. Phase Encoding
4. Spread Spectrum
5. Echo Data Hiding
7.3.1 Least Significant Bit (LSB) Coding: LSB coding is one of the earliest
techniques studied for information hiding in digital audio (as well
as other media types). In this technique, the LSB of the binary
representation of each sample of the digitized audio file is replaced with
a bit of the binary equivalent of the secret message [111,112]. The capacity is only
one bit per sample of the cover audio, which may be insufficient for many
applications.
7.3.2 Parity Coding: Instead of breaking a
signal down into individual samples, the parity coding method [114]
breaks a signal down into separate regions of samples and encodes
each bit from the secret message in a sample region's parity bit. If
the parity bit of a selected region does not match the secret bit to
be encoded, the process flips the LSB of one of the samples in the
region. Advantage: The sender has more of a choice in encoding the
secret bit, and the signal can be changed in a more unobtrusive
manner. Disadvantage: This method, like LSB coding, is not robust.
The capacity remains the same as that of the LSB method.
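The parity coding idea can be illustrated with a minimal Python sketch. Several details are assumptions made only for illustration and are not specified in the text: samples are plain non-negative integers, regions have a fixed size, the parity bit of a region is taken to be the XOR of the LSBs of its samples, and the sample whose LSB is flipped is always the first one in the region. The function names are hypothetical.

```python
def region_parity(region):
    """Parity (0 or 1) of a region, taken here as the XOR of the sample LSBs."""
    parity = 0
    for sample in region:
        parity ^= sample & 1
    return parity


def embed_parity(samples, bits, region_size):
    """Encode one message bit per region by adjusting the region's parity."""
    if len(bits) * region_size > len(samples):
        raise ValueError("cover is too short for the message")
    stego = list(samples)
    for i, bit in enumerate(bits):
        start = i * region_size
        if region_parity(stego[start:start + region_size]) != bit:
            # Flip the LSB of one sample; the first one is an arbitrary choice.
            stego[start] ^= 1
    return stego


def extract_parity(stego, n_bits, region_size):
    """Read the message back as the parity of each region."""
    return [
        region_parity(stego[i * region_size:(i + 1) * region_size])
        for i in range(n_bits)
    ]
```

Because any sample in a region may be the one that is flipped, a real encoder could choose the sample where the change is least noticeable; the sketch does not attempt this.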
7.3.3 Phase Coding: Phase coding [114] relies on the fact that the
phase components of sound are not as perceptible to the human ear
as noise is. It "works by substituting the phase of an initial audio
segment with a reference phase that represents the data. The
phase of subsequent segments is then adjusted in order to preserve
the relative phase between segments". Disadvantage: It is a
complex method and has a low data transmission rate.
7.3.4 Spread Spectrum (SS): It [95, 111] attempts to spread out
the encoded data across the available frequencies as much as
possible. This is analogous to a system using an implementation of
the LSB coding that randomly spreads the message bits over the
entire sound file. However, unlike LSB coding, the SS method
spreads the secret message over the sound file's frequency
spectrum, using a code that is independent of the actual signal. As
a result, the final signal occupies a bandwidth in excess of what is
actually required for transmission. Advantage: It offers moderate
data transmission rate while maintaining a high level of robustness.
Disadvantage: It can introduce noise into a sound file.
7.3.5 Echo Data Hiding: Text can be embedded in audio
data by introducing an echo to the original signal [111]. The data is then
hidden by varying three parameters of the echo: initial amplitude,
decay rate, and offset. If only one echo is produced from the
original signal, then only one bit of information can be encoded.
7.4. TIME - DOMAIN METHODS
Data hiding in the least significant bits (LSBs) of audio samples in
the time domain is one of the simplest algorithms, with a very high
data rate of additional information [115]. This section discusses the
implemented methods, which use either the least
significant bit or multiple least significant bits for hiding data. The
first method is the simple LSB coding method, in which 1 LSB, 2
LSBs, 4 LSBs and 8 LSBs have been used for hiding information in
the cover audio. The next two methods combine the LSB coding
method with an encryption method for hiding data. These two
methods ensure additional security of the hidden or secret data in
the host audio signal. The last group of methods deals with different and
novel ways proposed to increase the capacity of the cover audio
while maintaining the perceptual quality of the audio signal.
The common performance measures used for all these methods are
MSE (Mean Squared Error), PSNR (Peak Signal-to-Noise Ratio) and
SNR (Signal-to-Noise Ratio). In addition, subjective listening
tests have also been performed to assess the quality of the host audio
signals.
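The exact formulas used for these measures are not stated in this section. The following Python sketch uses the common textbook definitions and is an assumption in that respect: MSE is the mean squared difference between cover and stego samples, SNR compares the cover signal energy with the energy of the embedding noise, and PSNR compares a chosen peak value with the MSE. The `peak` parameter (default 1.0, suitable for samples normalized to the range -1 to 1) and the function names are hypothetical.

```python
import math


def mse(cover, stego):
    """Mean squared error between cover and stego samples."""
    if len(cover) != len(stego) or not cover:
        raise ValueError("signals must be non-empty and of equal length")
    return sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)


def snr_db(cover, stego):
    """Signal-to-noise ratio in dB: cover energy over embedding-noise energy."""
    noise = sum((c - s) ** 2 for c, s in zip(cover, stego))
    signal = sum(c * c for c in cover)
    if noise == 0:
        return math.inf  # identical signals: no embedding noise
    return 10 * math.log10(signal / noise)


def psnr_db(cover, stego, peak=1.0):
    """Peak signal-to-noise ratio in dB for an assumed peak sample value."""
    error = mse(cover, stego)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)
```

With a different peak convention (for example the full 16-bit integer range) the PSNR values shift by a constant, so results are only comparable when the same convention is used throughout.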
In all of these methods, the steps for data embedding and data
extraction procedures are explained considering an audio signal as
the secret message. Similar steps are followed when an image or
text is used as secret messages except that during extraction
procedure in case of images, every 8 bits will be transformed to a
byte and in case of text, every 7 bits will be transformed to form a
decimal value corresponding to a character.
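The grouping of extracted bits into 16-bit audio values, 8-bit image bytes or 7-bit character codes can be sketched in Python as follows. The bit order within each group (most significant bit first) and the helper names are assumptions for illustration; the text does not specify them.

```python
def text_to_bits(text, width=7):
    """Turn each character into `width` bits, most significant bit first."""
    bits = []
    for ch in text:
        code = ord(ch)
        if code >= (1 << width):
            raise ValueError("character does not fit in the chosen width")
        bits.extend((code >> shift) & 1 for shift in range(width - 1, -1, -1))
    return bits


def bits_to_values(bits, width):
    """Group bits into integers of `width` bits (16 audio, 8 image, 7 text)."""
    values = []
    for i in range(0, len(bits) - width + 1, width):
        value = 0
        for b in bits[i:i + width]:
            value = (value << 1) | b
        values.append(value)
    return values


def bits_to_text(bits, width=7):
    """Rebuild a text message from its extracted bit sequence."""
    return "".join(chr(v) for v in bits_to_values(bits, width))
```

For example, `bits_to_text(text_to_bits("Hi"))` gives back the original two characters, and calling `bits_to_values` with a width of 8 or 16 covers the image and audio cases.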
For the experimental results, 10 cover audio clips are used with 8 secret
messages. The audio clips are mono WAV audio files from different
categories (animal, speech, vocal, music), represented by 16
bits per sample. These 10 cover audio signals have varying
sampling rates and durations: Guitar (44100 Hz, 2
seconds), Triangle (44100 Hz, 4 seconds), Bugle (11025 Hz, 9
seconds), Speech1 (22050 Hz, 3 seconds), Speech2 (22050 Hz, 3
seconds), Alice (44100 Hz, 3 seconds), Trance (44100 Hz, 4
seconds), Echo (44100 Hz, 4 seconds), Birds (11025 Hz, 7 seconds)
and Faces (44100 Hz, 2 seconds).
Out of these 8 secret messages, 3 are audio clips, 2 are text files
and the remaining 3 are gray-scale images. The 3
audio clips used as secret messages are of very short duration, viz.
Pingpong (22050 Hz), Chimes (11025 Hz) and Newmail (22050 Hz).
The text messages are excerpts from 2 famous
personalities, viz. Abraham Lincoln and our former President Dr. A.
P. J. Abdul Kalam. These two text files are 2027 and
6826 characters long respectively. The 3 images used are the standard
Lena and Baboon images along with a Logo image, each of 128*128
dimensions.
Subjective quality evaluation of these methods has also been
carried out by performing listening tests involving ten people.
7.4.1 Least Significant Bit Coding:
Steps for Data Embedding:
1. Read the cover audio file and generate a copy of the file,
which can then be modified.
2. Read the secret message to be hidden; its size must be less than the
size of the cover audio signal (at most 1/16 of the cover size if
only 1 LSB is to be used). Convert it into a binary sequence of
message bits.
3. Replace the LSB of each sample of the cover audio with a
message bit.
4. Write the modified cover samples to a file, forming the
stego audio signal.
Steps for Data Extraction/Reconstruction:
1. Read the stego audio file.
2. Extract the LSB of each sample of the audio file.
3. After every 16 least significant bits are retrieved,
convert them to their decimal equivalent.
4. Finally, reconstruct the secret signal.
Similar steps for embedding and extraction are applied when 2
LSBs, 4 LSBs and 8 LSBs are used for hiding data.
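The embedding and extraction steps above can be sketched in Python as follows. This is an illustrative sketch, not the implementation used for the experiments. It assumes that the cover samples have already been read and converted to non-negative 16-bit integers, that the message is a list of 0/1 values, that message bits are packed most significant bit first, and that a final short group of bits is padded with zeros. File reading and writing are left out, and the function names are hypothetical.

```python
def embed_lsb(samples, bits, k=1):
    """Replace the k least significant bits of each sample with message bits."""
    if len(bits) > len(samples) * k:
        raise ValueError("message does not fit in the cover")
    mask = (1 << k) - 1
    stego = list(samples)  # work on a copy of the cover
    for i in range(0, len(bits), k):
        chunk = bits[i:i + k]
        chunk = chunk + [0] * (k - len(chunk))  # zero-pad a final short chunk
        value = 0
        for b in chunk:
            value = (value << 1) | b
        index = i // k
        stego[index] = (stego[index] & ~mask) | value
    return stego


def extract_lsb(stego, n_bits, k=1):
    """Read back n_bits message bits from the k LSBs of each stego sample."""
    bits = []
    for sample in stego:
        for shift in range(k - 1, -1, -1):
            if len(bits) == n_bits:
                return bits
            bits.append((sample >> shift) & 1)
    return bits
```

Setting `k` to 2, 4 or 8 corresponds to the multi-LSB variants; each sample then changes by at most 2^k - 1 quantization steps.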
7.4.2 LSB Coding with Encryption
Here, instead of directly substituting the LSB, the secret bit is first
encrypted and then inserted in the cover file. Later, during
retrieval, the decryption algorithm has to be applied to get back the
secret message.
A. Considering Parity :
This method [48] combines LSB coding with an encryption technique
for hiding data in cover audio. In this method, instead of directly
replacing the LSBs of the digitized samples of the cover audio with the message
bits, the method first checks the parity of the samples and then
carries out data embedding. The process of data embedding and
data retrieval is the same as the method given in Section 3.2 of Chapter
3. The only difference is that here the parity of 16 bits is considered
instead of the parity of 8 bits of the cover signal, because the
sample size is 16 bits.
B. Using XOR operation on LSB’s
Like the earlier method, this method [48] combines LSB
coding with an encryption technique for hiding data in cover audio. This
method performs an XOR operation on the LSBs and then, depending
on the result of the XOR operation and the message bit to be
embedded, the LSB of the sample is modified or kept unchanged.
The process of data embedding and data retrieval is the same as the
method given in Section 3.1 of Chapter 3. The only difference is that
here 16 bits are considered instead of 8 bits of the cover signal.
Here the XOR operation is performed on 2 LSBs. The XOR operation can
be further expanded to 3 LSBs, 4 LSBs, up to 16 LSBs, so as to
increase the level of encryption. Table 7.1 gives the tabular
representation of the data embedding process.
Table 7.1 Procedure for data embedding
LSB   Bit next to LSB   XOR result   Action if message bit is 0   Action if message bit is 1
0     0                 0            No Change                    Flip LSB
0     1                 1            Flip LSB                     No Change
1     0                 1            Flip LSB                     No Change
1     1                 0            No Change                    Flip LSB
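The rule in Table 7.1 amounts to making the XOR of the two lowest bits of each sample equal to the message bit. A minimal Python sketch under that reading is given below; it assumes integer samples and one message bit per sample, and the function names are hypothetical.

```python
def xor_of_two_lsbs(sample):
    """XOR of the LSB and the bit next to it."""
    return (sample & 1) ^ ((sample >> 1) & 1)


def embed_xor(samples, bits):
    """Flip the LSB only when the XOR result differs from the message bit."""
    if len(bits) > len(samples):
        raise ValueError("message does not fit in the cover")
    stego = list(samples)
    for i, bit in enumerate(bits):
        if xor_of_two_lsbs(stego[i]) != bit:
            stego[i] ^= 1  # "Flip LSB" rows of Table 7.1
    return stego


def extract_xor(stego, n_bits):
    """The message bit is the XOR of the two lowest bits of each sample."""
    return [xor_of_two_lsbs(sample) for sample in stego[:n_bits]]
```

Reading the LSBs of the stego samples directly does not give the message, since each message bit also depends on the neighbouring bit of the cover sample.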
Results and Discussions:
Experiments have been carried out for the Least Significant Bit method
by utilizing 1 LSB, 2 LSBs, 4 LSBs and 8 LSBs respectively of the cover
samples for hiding data (audio, text, image). Table 7.2 gives the
results of the LSB method. Table 7.3 gives the results of the 2 LSBs
method. Table 7.4 gives the results of the 4 LSBs method. Table
7.5 gives the results of the 8 LSBs method. The entries for MSE,
PSNR and SNR for all cover signals in all these methods are the
average values taken over all the secret messages
embedded in the cover audio.
Table 7.2 Results of LSB method
Cover MSE PSNR (dB) SNR (dB) BER
Guitar 3.58E-10 191.50 71.23 0.02404
Triangle 2.64E-10 193.32 69.47 0.01775
Bugle 3.49E-10 191.68 70.93 0.02342
Speech1 3.78E-10 191.09 87.69 0.02536
Speech2 3.78E-10 191.10 86.32 0.02535
Alice 2.84E-10 192.93 80.65 0.01906
Trance 2.68E-10 182.13 67.65 0.01804
Echo 2.69E-10 193.22 81.17 0.01805
Birds 3.67E-10 191.34 70.29 0.02463
Faces 3.35E-10 192.02 81.34 0.02245
Average 3.25E-10 191.03 76.67 0.02182
Remark: The maximum SNR value obtained is 87.69 dB, which is for a speech
signal when only the LSB of the cover sample is replaced by message bits
Table 7.3 Results of LSB2 method
Cover MSE PSNR SNR BER
Guitar 1.37E-09 186.22 65.95 0.03567
Triangle 9.37E-10 188.52 64.66 0.02363
Bugle 1.30E-09 186.55 65.81 0.03374
Speech1 1.57E-09 185.46 82.05 0.04136
Speech2 1.60E-09 185.36 80.58 0.04196
Alice 1.00E-09 188.22 75.94 0.02564
Trance 9.48E-10 188.55 62.96 0.02415
Echo 1.01E-09 187.89 75.85 0.02411
Birds 1.45E-09 185.92 64.87 0.03774
Faces 1.18E-09 187.12 76.45 0.03057
Average 1.24E-09 186.98 71.51 0.03186
Remark: The maximum SNR value obtained is 82.05 which is for a speech
signal when 2 LSB’s of cover sample are replaced by message bits
Table 7.4 Results of LSB4 method
Cover MSE PSNR SNR BER
Guitar 1.79E-08 176.03 55.76 0.04756
Triangle 1.08E-08 178.33 54.48 0.02686
Bugle 1.69E-08 176.37 55.62 0.04462
Speech1 2.05E-08 175.20 71.79 0.05538
Speech2 2.06E-08 175.11 70.33 0.05576
Alice 1.17E-08 178.39 66.11 0.03013
Trance 1.07E-08 178.77 53.18 0.02763
Echo 1.29E-08 177.10 65.05 0.02753
Birds 1.90E-08 175.70 54.66 0.05065
Faces 1.51E-08 177.08 66.40 0.03963
Average 1.56E-08 176.81 61.34 0.04057
Remark: The maximum SNR value obtained is 71.79 which is for a speech
signal when 4 LSB’s of cover sample are replaced by message bits
Table 7.5 Results of LSB8 method
Cover MSE PSNR SNR BER
Guitar 2.79E-06 154.64 34.36 0.05408
Triangle 1.52E-06 156.58 32.74 0.02682
Bugle 2.56E-06 155.04 34.28 0.04941
Speech1 3.52E-06 153.74 50.24 0.06873
Speech2 3.58E-06 153.56 48.78 0.06955
Alice 1.56E-06 157.23 44.94 0.03011
Trance 1.44E-06 157.48 31.89 0.02763
Echo 1.89E-06 155.45 43.41 0.02743
Birds 3.04E-06 154.25 33.22 0.05909
Faces 2.14E-06 155.80 45.13 0.04148
Average 2.40E-06 155.37 39.90 0.04543
Remark: The maximum SNR value obtained is 50.24 which is for a speech
signal when 8 LSBs of cover sample are replaced by message bits
From the listening tests, it has been observed that as long as no more than 4 LSBs are
used for data embedding, there is no audible distortion in the host
audio signal. Thus, the perceptual quality of the host audio is good.
However, when more than 4 LSBs are used for hiding data, a
hissing sound is introduced in the host audio signal. Thus, the
perceptual quality of the host audio signal deteriorates as the
number of LSBs used for hiding data increases.
Table 7.6 gives the results of the method considering parity. In this
table, the values against the entries named audio, text and image in
the Secret column represent the average values over the 3 secret
audio clips, 2 text files and 3 images respectively for each cover
signal.
Figure 7.2 is the plot of the audio signal (Pingpong) which is the
secret message hidden in the cover audio signals. Figure 7.3 is the
plot of the secret signal retrieved according to the extraction
process given in 7.4.2 A. It can be seen from both these figures
that there is no difference between the original and the retrieved
message, thereby assuring that the recovery is 100%.
Figure 7.4 is the plot of the message retrieved when the LSBs of the
stego signal are extracted directly. This indicates that the direct
extraction of LSBs will only result in noise if embedding is done
using the parity method, thereby increasing security.
Figure 7.5 shows
the image of Lena which was used as a secret message.
Figure 7.6 shows the retrieved image of Lena using the
extraction process mentioned in 7.4.2 B.
Figure 7.7 shows the result when the LSBs of the stego audio signal
are extracted directly without applying the extraction process, which
clearly results in noise.
Table 7.6 Results of Proposed method (considering parity)
Cover     Secret  MSE       PSNR    SNR    BER
Guitar    audio   4.49E-10  189.81  69.53  0.03015
Guitar    text    1.60E-10  195.02  74.74  0.01074
Guitar    image   4.22E-10  189.64  69.37  0.03126
Triangle  audio   3.80E-10  190.81  66.95  0.02556
Triangle  text    7.91E-11  198.1   74.25  0.00529
Triangle  image   3.79E-10  191.07  67.22  0.02248
Bugle     audio   4.38E-10  189.93  69.17  0.02941
Bugle     text    1.46E-10  195.46  74.7   0.00977
Bugle     image   4.66E-10  189.63  68.88  0.03131
Speech1   audio   4.66E-10  189.64  86.23  0.03129
Speech1   text    2.01E-10  194.04  90.65  0.01349
Speech1   image   4.65E-10  189.65  86.25  0.03117
Speech2   audio   4.67E-10  189.63  84.85  0.03133
Speech2   text    2.04E-10  193.98  89.2   0.01367
Speech2   image   4.33E-10  189.66  84.87  0.03115
Alice     audio   3.89E-10  190.64  78.36  0.02612
Alice     text    8.84E-11  197.62  85.34  0.00593
Alice     image   3.66E-10  190.58  78.30  0.02518
Trance    audio   3.82E-10  190.77  65.18  0.02561
Trance    text    8.15E-11  197.98  72.39  0.00546
Trance    image   3.44E-10  190.94  65.35  0.02316
Echo      audio   3.86E-10  190.73  78.69  0.02587
Echo      text    8.21E-11  197.94  85.9   0.00551
Echo      image   3.86E-10  190.96  78.91  0.02307
Birds     audio   4.61E-10  189.69  68.65  0.03092
Birds     text    1.75E-10  194.63  73.59  0.01176
Birds     image   4.66E-10  189.64  68.6   0.03128
Faces     audio   4.17E-10  190.19  79.52  0.02795
Faces     text    1.22E-10  196.22  85.55  0.00818
Faces     image   3.10E-10  189.64  78.97  0.03125
Average           3.20E-10  192.14  76.67  0.02185
Remark: With this method, the maximum SNR value obtained is 90.65 for a
speech signal. The overall average SNR comes out to
be 76.67, which is equal to the SNR value obtained from the standard LSB
method as shown in Table 7.2.
Figure 7.2 Plot of secret signal used in parity method
Figure 7.3 Plot of secret signal retrieved using the parity method
Figure 7.4 Plot of signal retrieved by extracting LSBs directly which looks like noise
Figure 7.5 Original Secret Image
Figure 7.6 Retrieved Secret Image using Parity method
Figure 7.7 Image retrieved by extracting LSBs directly which looks like noise
Table 7.7 gives the results of the XOR method.
Table 7.7 Results of XOR method
Cover     Secret  MSE       PSNR    SNR    BER
Guitar    audio   4.50E-10  189.8   69.53  0.03021
Guitar    text    1.60E-10  195.04  74.77  0.01073
Guitar    image   4.66E-10  189.64  69.37  0.03127
Triangle  audio   3.78E-10  190.83  66.98  0.02539
Triangle  text    8.03E-11  198.04  74.2   0.00536
Triangle  image   3.36E-10  191.06  67.21  0.02257
Bugle     audio   4.36E-10  189.95  69.19  0.02928
Bugle     text    1.47E-10  195.41  74.66  0.00984
Bugle     image   4.65E-10  189.65  68.90  0.03119
Speech1   audio   4.67E-10  189.63  86.23  0.03132
Speech1   text    2.01E-10  194.06  90.65  0.01346
Speech1   image   4.66E-10  189.64  86.23  0.03127
Speech2   audio   4.66E-10  189.64  84.85  0.03128
Speech2   text    2.05E-10  193.94  89.16  0.01376
Speech2   image   4.67E-10  189.63  84.85  0.03135
Alice     audio   3.88E-10  190.65  81.33  0.02603
Alice     text    8.87E-11  197.59  85.31  0.00595
Alice     image   3.74E-10  190.59  78.31  0.02513
Trance    audio   3.81E-10  190.78  65.19  0.02556
Trance    text    8.12E-11  198     72.41  0.00545
Trance    image   3.46E-10  190.93  65.35  0.02321
Echo      audio   3.74E-10  190.84  78.8   0.02509
Echo      text    8.28E-11  197.91  85.87  0.00555
Echo      image   3.49E-10  190.90  78.86  0.02339
Birds     audio   4.61E-10  189.68  68.64  0.03096
Birds     text    1.75E-10  194.65  73.61  0.01174
Birds     image   4.66E-10  189.64  68.6   0.03127
Faces     audio   4.17E-10  190.18  79.51  0.02799
Faces     text    1.22E-10  196.22  85.55  0.00817
Faces     image   4.65E-10  189.64  78.97  0.03124
Average           3.25E-10  192.13  76.76  0.02184
Remark: From this method, the maximum SNR value obtained is 90.65 for a
speech signal. The values for MSE, PSNR and SNR obtained from this method are
almost equal to the values obtained from parity method as shown in Table 7.6.
7.4.3 Methods to Increase the Capacity of Cover Audio
The use of only one LSB of the host audio sample gives a capacity
equal to the sampling rate, which could vary from 8 kbps to
44.1 kbps (if all samples are used). However, adjusting the LSBs of audio
samples introduces noise that becomes audible as the number of LSBs
used for hiding data increases [114]. Thus, there is a limit on the
depth of the LSB layer in each sample of host audio that can be
used for data hiding. It is seen that the maximum number of bits
that can be used for LSB audio steganography without causing
noticeable perceptual distortion to the host audio signal is 4 LSBs, if
16-bit-per-sample audio sequences are used [113].
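The relation between sampling rate, number of LSBs and capacity is simple arithmetic and can be checked with a short Python sketch. It assumes a mono signal in which every sample is used; the function names are hypothetical.

```python
def lsb_capacity_bps(sampling_rate_hz, lsbs_per_sample=1):
    """Embedding capacity in bits per second when every sample is used."""
    return sampling_rate_hz * lsbs_per_sample


def cover_capacity_bits(sampling_rate_hz, duration_s, lsbs_per_sample=1):
    """Total number of message bits that fit in a mono cover clip."""
    return int(sampling_rate_hz * duration_s) * lsbs_per_sample
```

For instance, a 2-second mono clip sampled at 44100 Hz offers 88200 bits with 1 LSB per sample and 352800 bits with 4 LSBs per sample.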
Thus, the methods proposed in this section attempt to increase the
capacity of the cover audio while maintaining the perceptual quality
of the host audio. The first method proposed in this section is based
on the magnitude of the samples of the cover audio. Depending on the
magnitude values, multiple and variable LSBs are used for data
hiding. Experimental results show that this method does not give
good results in terms of either increasing the capacity of the cover
audio or maintaining the perceptual quality of the host audio.
This motivated the search for approaches that are better than
both this method and the other existing approaches.
Here, three novel methods have been proposed.
The next method is an extension of the XOR method discussed in
7.4.2 (B). In this, the XOR operation is performed on different
combinations of bits in the samples of the cover audio, and then 8 LSBs
of each cover sample are used for hiding data. The last 2 methods
are based on checking the Most Significant Bits (MSBs) of the
samples of the cover audio; depending upon the values of the MSBs of
the corresponding samples, the number of LSBs used for data hiding is
decided. In this way, multiple and variable LSBs are used for
embedding secret data. These proposed methods remarkably
increase the capacity for data hiding as compared to the standard
LSB method without causing any noticeable perceptual distortion to
the host audio signal. In all these methods, the increase in capacity
of the cover audio is compared with the original capacity, which is
considered to be 4 LSBs for each sample of the cover audio.
A. Considering Magnitude of Samples of Cover audio
In this method, a multiple and variable number of LSBs is used for hiding
data based on the magnitude of the sample values.
It is observed that the magnitude of the samples of the cover audio is
such that a maximum of 4 LSBs can be used for hiding. Thus, in
order to implement the proposed approach, the cover audio
samples are multiplied by a constant factor of 2. This is done so as
to increase the magnitude of the samples. Depending on the
magnitude of the samples, the number of LSBs to be used
for hiding is decided:
If the first 6 MSBs are ones, then use 6 LSBs for data embedding.
If the first 5 MSBs are ones, then use 5 LSBs for data embedding.
If the first 4 or fewer MSBs are ones, then use 4 LSBs for data embedding.
Capacity by the proposed method: CP = P1*6 + P2*5 + P3*4 (1)
Here P1, P2 and P3 are the probabilities of the samples with 6 MSBs
as ones, 5 MSBs as ones and 4 or fewer MSBs as ones
respectively.
Percentage Increase in Capacity = (CP/C)*100 (2)
Here C = 4 bits per sample.
B. Considering the XOR operation:
The primary merit of the XOR operation is that it is simple to
implement and computationally inexpensive. Hence, the LSB
coding using the XOR method discussed in 7.4.2 (B) can be extended
further by utilizing more than just the LSB for data embedding. The
method can be modified so as to utilize multiple LSBs for data
embedding.
One such approach proposed here uses 8 LSBs, with the XOR
operation performed on different combinations of bits. In this approach, XOR is
performed on the 16th bit and 8th bit, 15th bit and 7th bit, 14th bit
and 6th bit, 13th bit and 5th bit, 12th bit and 4th bit, 11th bit and
3rd bit, 10th bit and 2nd bit, and 9th bit and 1st bit. Depending
upon the result of the XOR operation and the message bit to be
embedded, the 8 LSBs of the digitized sample of cover audio can be
used for data embedding. This increases the capacity of the cover
for data embedding by 8 times as compared to the earlier XOR method,
which uses only the LSB for hiding the data.
To clearly understand the above-mentioned approach, an example is
presented below.
Consider the bits in the binary representation of a sample of cover
audio and the message bits to be embedded as given below. Table
7.8 gives the tabular representation of the procedure for data
embedding using the above approach.
Original Sample bits: 1000000000000001
Message bits: 10100010
Table 7.8 Data embedding procedure for multiple LSBs
Bit 1    Bit 2   XOR result   Message Bit   Action
1 (16)   0 (8)   1            1             No change
0 (15)   0 (7)   0            0             No change
0 (14)   0 (6)   0            1             Flip Bit 2
0 (13)   0 (5)   0            0             No change
0 (12)   0 (4)   0            0             No change
0 (11)   0 (3)   0            0             No change
0 (10)   0 (2)   0            1             Flip Bit 2
0 (9)    1 (1)   1            0             Flip Bit 2
Modified Sample bits: 1000000000100010
In the above table, the numbers in brackets in the first 2
columns indicate the positions of the bits in the digitized sample of
cover audio. As can be seen from the table, action takes place on
Bit 2 in the second column; it is either flipped or left unchanged as shown in the
last column, as these bits form the 8 LSBs.
The retrieval of bits is done by performing XOR operation on bits as
done in the embedding process, and the result of the XOR operation
will give the message bits back.
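Since each of the 8 LSBs is set so that its XOR with the paired upper bit equals the message bit, the whole procedure reduces to a byte-wise XOR. The following Python sketch illustrates this under the assumptions that a sample is an unsigned 16-bit integer and that the first message bit is paired with bits 16 and 8, as in Table 7.8; the function names are hypothetical.

```python
def embed_xor8(sample, message_byte):
    """Set the low byte so that (high byte XOR low byte) equals the message."""
    high = (sample >> 8) & 0xFF          # bits 16..9 are never modified
    low = high ^ (message_byte & 0xFF)   # new bits 8..1
    return (sample & 0xFF00) | low


def extract_xor8(sample):
    """Recover the 8 message bits as high byte XOR low byte."""
    return ((sample >> 8) & 0xFF) ^ (sample & 0xFF)


# Worked example from Table 7.8.
stego_sample = embed_xor8(0b1000000000000001, 0b10100010)
```

Here `stego_sample` is 0b1000000000100010, the modified sample given with Table 7.8, and `extract_xor8(stego_sample)` gives back 0b10100010.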
The MSE value for the given example, where three bits have been
changed, comes out to be 9.39e-07. If we assume that all 8 bits are
changed during the data embedding process, then the MSE value is
6.08e-05. However, the probability that all 8 bits are changed during embedding is
very low. Hence, this proposed method
proves to be better in increasing the capacity of cover audio to
embed additional information.
C. Considering 2 MSB’s:
This method checks the values of the first 2 Most Significant Bits
(MSBs) of the digitized samples of the cover audio for data
embedding. Table 7.9 gives the tabular representation of the data
embedding procedure. The steps for data embedding and data
retrieval are as follows:
Steps for Data Embedding:
1. Read the cover audio signal.
2. Read the audio signal to be embedded. Convert it into a
sequence of binary bits.
3. Embed the message bits from step 2 into the variable
and multiple LSBs of the samples of the digitized cover audio.
4. For embedding, check the first 2 MSBs of each cover sample:
if they are '00', then use 4 LSBs for data embedding.
if they are '01', then use 5 LSBs for data embedding.
if they are '10', then use 6 LSBs for data embedding.
if they are '11', then use 7 LSBs for data embedding.
5. Write the modified cover audio samples to the file,
forming the stego audio signal.
Table 7.9 Data embedding procedure for proposed method using 2 MSBs
MSB1 | MSB2 | No. of LSBs used for data embedding
0 | 0 | 4
0 | 1 | 5
1 | 0 | 6
1 | 1 | 7
Steps for Data Retrieval:
1. Read the Stego audio signal.
2. Retrieval of message bits is done by checking the first 2 MSBs
of each sample:
if they are ‘00’, then retrieve 4 LSBs
if they are ‘01’, then retrieve 5 LSBs
if they are ‘10’, then retrieve 6 LSBs
if they are ‘11’, then retrieve 7 LSBs
3. After every 16 message bits are retrieved, they are
converted into their decimal equivalent, and finally the secret
audio signal is reconstructed.
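The embedding and retrieval steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the code used for the experiments: samples are taken to be unsigned 16-bit integers, the message is a string of '0'/'1' characters, a final partial chunk is padded with zeros, and the names lsb_count, embed, extract and to_words are hypothetical. Since at most 7 LSBs are modified, the first 2 MSBs are untouched and the receiver can recompute how many bits each sample carries.

```python
# Number of LSBs used for each value of the first 2 MSBs (Table 7.9).
LSBS_BY_MSBS = {0b00: 4, 0b01: 5, 0b10: 6, 0b11: 7}


def lsb_count(sample: int) -> int:
    """Return how many LSBs a 16-bit sample (assumed unsigned) carries."""
    return LSBS_BY_MSBS[(sample >> 14) & 0b11]


def embed(samples, bits):
    """Write the bit string into a variable number of LSBs per sample."""
    stego, pos = [], 0
    for s in samples:
        k = lsb_count(s)
        if pos < len(bits):
            chunk = bits[pos:pos + k].ljust(k, "0")   # pad the last chunk
            s = ((s >> k) << k) | int(chunk, 2)       # replace the k LSBs
            pos += k
        stego.append(s)
    if pos < len(bits):
        raise ValueError("cover is too short for the message")
    return stego


def extract(stego, n_bits):
    """Read back n_bits message bits from the stego samples."""
    out, total = [], 0
    for s in stego:
        if total >= n_bits:
            break
        k = lsb_count(s)
        out.append(format(s & ((1 << k) - 1), f"0{k}b"))
        total += k
    return "".join(out)[:n_bits]


def to_words(bits):
    """Step 3 of retrieval: regroup every 16 bits into a decimal value."""
    return [int(bits[i:i + 16], 2) for i in range(0, len(bits), 16)]


cover = [0x0123, 0x4567, 0x89AB, 0xCDEF]   # first 2 MSBs: 00, 01, 10, 11
message = "1010001011110000"               # one 16-bit secret sample
stego = embed(cover, message)
recovered = extract(stego, len(message))
print(recovered == message, to_words(recovered))
```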
The capacity of the proposed method is computed using Eq. (1).
Here CP = increase in capacity; P1, P2, P3 and P4 are the
probabilities of the samples with the first 2 MSBs as ‘11’, ‘10’,
‘01’ and ‘00’ respectively.
CP = P1*7 + P2*6 + P3*5 + P4*4 (1)
The percentage increase in capacity is given by Eq. (2).
Here C = 4 bits per sample.
Percentage Increase in Capacity = (CP/C)*100 (2)
Assuming that all the four probabilities P1, P2, P3 and P4 are 0.25
each, that is, they are equi-probable, the capacity will be as
given in Eq. (3). Here ECP = Estimated increase in capacity.
ECP = 0.25*7 + 0.25*6 + 0.25*5 + 0.25*4 = 5.5 (3)
Eq. (3) gives the estimate of the increase in capacity of the
cover audio for this method.
Table 7.10 gives the percentage distribution of samples for the
cover audio signals used for this method.
Table 7.10 Distribution of samples using proposed method (2 MSBs)
Cover signal | % of samples with first 2 MSBs as ‘11’ | % of samples with first 2 MSBs as ‘10’ | % of samples with first 2 MSBs as ‘01’ | % of samples with first 2 MSBs as ‘00’
Guitar | 0.44 | 45.47 | 53.80 | 0.29
Triangle | 0.11 | 57.39 | 42.42 | 0.08
Bugle | 0 | 86.04 | 13.96 | 0
Speech1 | 10.46 | 34.34 | 44.50 | 10.71
Speech2 | 8.59 | 28.10 | 53.74 | 9.57
Alice | 0.12 | 49.92 | 49.86 | 0.10
Trance | 0 | 50.21 | 49.79 | 0
Echo | 0 | 10.40 | 89.60 | 0
Birds | 0 | 49.90 | 50.10 | 0
Faces | 0.51 | 48.39 | 50.65 | 0.45
Average | 2.02 | 46.02 | 49.84 | 2.12
The last row in Table 7.10 gives the average percentage of samples
for each of the 4 combinations of the first 2 MSBs. It can be seen
from Table 7.10 that most of the samples have their first 2 MSBs
as ‘10’ or ‘01’. The average percentage of samples with the first 2
MSBs as ‘10’ is 46.02, and with the first 2 MSBs as ‘01’ it is
49.84. Together, these two combinations account for almost 96% of
the total number of samples. The other two combinations, ‘11’ and
‘00’, account for a very low percentage of the samples and can
therefore be neglected. Hence, only the MSB of the samples needs to
be considered. An MSB of 1 covers the combinations ‘11’ and ‘10’,
and an MSB of 0 covers the combinations ‘00’ and ‘01’. This
indicates that looking only at the Most Significant Bit (MSB) of
the digitized cover samples suffices to extend and further
simplify the logic of the method.
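As an illustration of Eq. (1) and Eq. (2), the short sketch below evaluates the expected capacity for the equi-probable case and for the average row of Table 7.10. It is only a numerical check of the formulas; the function names capacity and percentage_capacity are hypothetical, and using the average row as a representative distribution is an assumption.

```python
# Bits embedded per sample for each value of the first 2 MSBs (Table 7.9).
BITS_PER_PATTERN = {"11": 7, "10": 6, "01": 5, "00": 4}
BASE_CAPACITY = 4  # C in Eq. (2), bits per sample


def capacity(percent_by_pattern):
    """Eq. (1): expected number of embedded bits per sample."""
    return sum(percent_by_pattern[p] / 100 * bits
               for p, bits in BITS_PER_PATTERN.items())


def percentage_capacity(cp):
    """Eq. (2): capacity relative to the 4-bit baseline, in percent."""
    return cp / BASE_CAPACITY * 100


equiprobable = {"11": 25.0, "10": 25.0, "01": 25.0, "00": 25.0}
# Average row of Table 7.10 (assumed representative of a typical cover).
table_average = {"11": 2.02, "10": 46.02, "01": 49.84, "00": 2.12}

for name, dist in [("equi-probable", equiprobable),
                   ("Table 7.10 average", table_average)]:
    cp = capacity(dist)
    print(f"{name}: CP = {cp:.2f} bits/sample, "
          f"{percentage_capacity(cp):.1f}%")
```

With the average row of Table 7.10, the estimate is about 5.48 bits per sample, close to the equi-probable estimate of 5.5 from Eq. (3).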
D. Considering 1 MSB:
This method is an extension of the above method. It considers only
the Most Significant Bit (MSB) of the digitized samples of the
cover audio for data hiding. Table 7.11 gives the tabular
representation of the embedding procedure. The steps for data
embedding and extraction are explained in detail as follows:
Steps for Data embedding:
1. Read the cover audio signal.
2. Read the secret message to be embedded. Convert it into a
sequence of binary bits.
3. The message bits from step 2 are embedded into a variable
number of LSBs of each sample of the digitized cover audio.
4. For embedding purpose, the MSB of each cover sample is
checked:
If MSB is ‘0’, then use 6 LSBs for data embedding
If MSB is ‘1’, then use 7 LSBs for data embedding
5. The modified cover audio samples are then written to the file,
forming the stego audio signal.
Table 7.11 Embedding procedure for proposed method using MSB
MSB | No. of LSBs used for data embedding
0 | 6
1 | 7
Steps for Data Retrieval:
1. Read the stego audio signal.
2. Retrieval of message bits is done by checking the MSB of each
sample:
If MSB of the sample is ‘0’, then retrieve 6 LSBs
If MSB of the sample is ‘1’, then retrieve 7 LSBs
3. After every 16 message bits are retrieved, they are
converted into their decimal equivalent, and finally the secret
audio signal is reconstructed.
The capacity of the proposed method is computed using Eq. (4).
Here P1 and P2 are the probabilities of the samples with MSB value
‘1’ and ‘0’ respectively.
CP = P1*7 + P2*6 (4)
The percentage increase in capacity is given by Eq. (5).
Here C = 4 bits per sample.
Percentage Increase in Capacity = (CP/C)*100 (5)
Assuming that the probabilities P1 and P2 are 0.5 each, that is,
they are equi-probable, the capacity will be as given in Eq. (6).
Here ECP = Estimated increase in capacity.
ECP = 0.5*7 + 0.5*6 = 6.5 (6)
Eq. (6) gives the estimate of the increase in capacity of the
cover audio for this method.
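As a worked instance of Eq. (4) and Eq. (5), the probabilities can also be taken from the average row of Table 7.10 instead of being assumed equal. Merging ‘11’ with ‘10’ gives P1 and merging ‘01’ with ‘00’ gives P2. This is only an illustrative calculation, under the assumption that the average row is representative of a typical cover signal:

```latex
\begin{align*}
P_1 &= 0.0202 + 0.4602 = 0.4804, \qquad P_2 = 0.4984 + 0.0212 = 0.5196 \\
CP  &= 7 P_1 + 6 P_2 = 3.3628 + 3.1176 = 6.4804 \\
\frac{CP}{C} \times 100 &= \frac{6.4804}{4} \times 100 \approx 162.0
\end{align*}
```

This value is close to the equi-probable estimate of 6.5 bits per sample.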
Results and Discussions:
The experimental results of these methods report 2 additional
performance measures: incr_cap (increase in capacity in terms of
bits per sample) and % incr_cap (percentage increase in capacity).
Table 7.12 gives the results of the method which considers the
magnitude of samples.
Table 7.12 Results of method considering magnitude of samples of cover audio
Cover Secret MSE PSNR Incr_cap %Incr_cap BER
Guitar
Audio 0.00022 132.73 4.0007 100.019 0.34396
Text 0.00023 132.73 4.0058 100.146 0.34395
Image 0.00023 132.73 4.0009 100.024 0.34386
Triangle
Audio 2.39E-05 142.54 4.0003 100.007 0.32832
Text 2.39E-05 142.54 4.0021 100.054 0.32866
Image 2.39E-05 142.54 4.0003 100.009 0.32872
Bugle
Audio 3.26E-08 172.15 4.0000 100.000 0.38189
Text 3.15E-09 182.12 4.0000 100.000 0.38166
Image 1.42E-08 174.83 4.0000 100.000 0.38164
Speech1
Audio 0.09742 106.44 4.3020 107.550 0.44762
Text 0.09731 106.44 4.2321 105.803 0.44749
Image 0.09735 106.44 4.3002 107.504 0.44765
Speech2
Audio 6.22E-02 108.39 4.2152 105.379 0.44477
Text 0.06222 108.38 4.0942 102.356 0.44468
Image 0.06218 108.39 4.2291 105.728 0.44488
Alice
Audio 0.00096 126.45 4.0159 100.400 0.42863
Text 0.00096 126.46 4.0086 100.215 0.42859
Image 0.00097 126.46 4.0148 100.370 0.42850
Trance
audio 2.17E-08 174.25 4.0000 100.000 0.36265
Text 1.77E-09 184.63 4.0000 100.000 0.36273
image 8.08E-09 177.31 4.0000 100.000 0.36282
Echo
audio 0.00234 122.62 4.0020 100.050 0.40077
Text 0.00234 122.62 4.0000 100.000 0.39822
image 0.00234 122.62 4.0000 100.000 0.40391
Birds
audio 3.43E-08 171.75 4.0000 100.000 0.38578
Text 3.81E-09 181.29 4.0000 100.000 0.38522
image 1.71E-08 174.05 4.0000 100.000 0.38529
Faces
audio 0.00097 126.44 4.0034 100.087 0.43654
Text 0.00097 126.44 4.0024 100.062 0.43645
image 0.00097 126.44 4.0026 100.060 0.43628
Average 0.00032 134.91 4.0043 100.108 0.39607
Remark: The maximum increase in capacity obtained using this method is 4.3 bits per
sample as compared to original capacity which is considered to be 4 bits per sample.
Table 7.13 gives the results of the XOR method with multiple LSBs.
Table 7.13 Results of XOR method with multiple LSB’s
Cover Secret MSE PSNR SNR BER
Guitar
Audio 4.88E-06 150.3 30.03 0.10626
Text 4.41E-07 160.63 40.36 0.01064
Image 2.09E-06 153.13 32.87 0.04548
Triangle
Audio 2.55E-06 152.97 29.13 0.05233
Text 3.24E-07 161.63 37.78 0.00536
Image 1.26E-06 155.30 31.45 0.02302
Bugle
Audio 4.48E-06 150.61 29.85 0.09655
Text 3.92E-07 161.12 40.37 0.00973
Image 1.95E-06 153.45 32.70 0.04141
Speech1
Audio 5.67E-06 149.71 46.31 0.13415
Text 5.46E-07 159.73 56.33 0.01353
Image 2.48E-06 152.38 48.98 0.05719
Speech2
Audio 5.96E-06 149.44 44.66 0.13674
Text 5.61E-07 159.62 54.84 0.01382
Image 2.66E-06 152.08 47.30 0.05821
Alice
Audio 2.66E-06 152.88 40.6 0.05921
Text 2.34E-07 163.35 51.073 0.00597
Image 1.18E-06 155.63 43.35 0.02514
Trance
Audio 2.56E-06 153.04 27.45 0.05433
Text 2.26E-07 163.56 37.97 0.00545
Image 1.11E-06 155.86 30.28 0.02321
Echo
audio 3.08E-06 151.96 39.92 0.05205
Text 4.13E-07 160.97 48.93 0.00557
Image 1.78E-06 153.83 41.79 0.02519
Birds
audio 5.39E-06 149.78 28.74 0.11608
Text 4.78E-07 160.26 39.22 0.01175
Image 2.35E-06 152.64 31.61 0.04945
Faces
audio 3.65E-06 151.54 40.86 0.08153
Text 3.35E-07 161.80 51.13 0.00821
Image 1.61E-06 154.27 43.59 0.03470
Average 2.11E-06 155.45 39.98 0.04541
Remark: It can be seen from the above table that the MSE values are better
than the estimated MSE value 6.08e-05 which considers all 8 bits being changed
during the embedding process.
Table 7.14 gives the results of the method considering 2 MSBs.
Table 7.14 Results of proposed method (considering 2 MSBs)
Cover Secret MSE PSNR SNR Incr_cap %Incr_cap BER
Guitar
Audio 3.04E-07 162.71 42.44 5.53 138.30 0.10241
Text 2.70E-08 172.8 52.53 5.51 137.95 0.01073
Image 1.13E-07 165.81 45.54 5.52 138.21 0.04538
Triangle
Audio 1.60E-07 165.38 41.54 5.53 138.34 0.05243
Text 2.31E-08 172.91 49.06 5.8 145.02 0.00537
Image 6.60E-08 168.17 44.33 5.52 138.65 0.02273
Bugle
Audio 3.66E-07 162.01 41.25 5.82 145.64 0.09702
Text 3.12E-08 172.16 51.41 5.90 147.81 0.00984
Image 1.32E-07 165.15 44.4 5.84 146.14 0.04125
Speech1
Audio 5.24E-07 160.29 56.88 5.45 136.28 0.11361
Text 4.89E-08 170.56 67.15 5.39 135.02 0.01343
Image 2.17E-07 163 59.59 5.44 136.19 0.05706
Speech2
Audio 4.63E-07 160.76 55.98 5.41 135.41 0.11443
Text 2.78E-08 173.1 68.32 5.26 131.7 0.01373
Image 1.92E-07 163.52 58.74 5.4 135.03 0.05815
Alice
Audio 1.79E-07 165.19 52.91 5.5 137.62 0.05909
Text 1.41E-08 175.62 63.34 5.475 136.99 0.00594
Image 6.26E-08 168.39 56.11 5.5 137.79 0.02521
Trance
Audio 1.59E-07 165.68 40.09 5.49 137.55 0.05437
Text 1.29E-08 175.94 50.36 5.5 137.63 0.00548
Image 5.59E-08 168.88 43.29 5.49 137.49 0.02312
Echo
Audio 1.82E-07 164.36 52.32 5.67 141.84 0.05215
Text 3.17E-08 172.13 60.09 6 150 0.00561
Image 1.11E-07 165.98 53.94 5.75 143.83 0.02471
Birds
Audio 3.07E-07 162.61 41.57 5.49 137.49 0.10632
Text 2.70E-08 172.71 51.67 5.5 137.59 0.01177
Image 1.20E-07 165.57 44.53 5.49 137.49 0.04956
Faces
Audio 2.42E-07 163.86 53.19 5.48 137.32 0.08163
Text 1.99E-08 174.04 63.37 5.51 137.92 0.00825
Image 8.40E-08 167.11 56.44 5.49 137.28 0.03473
Average 1.43E-07 167.55 52.08 5.55 139.05 0.04352
Remark: It can be seen from the table that the highest increase in capacity
obtained is 6 bits per sample. The estimated increase in capacity for this method
was calculated to be (4+5+6+7)/4 = 5.5 from Eq. (3) given in (C) of this section.
From the table, it can be seen that for all cover signals, the increase in capacity
is either close to or greater than this estimated value. The average value for the
increase in capacity comes out to be 5.55.
Table 7.15 gives the results of the method considering only 1 MSB.
Table 7.15 Results of proposed method (considering 1 MSB)
Cover Secret MSE PSNR(dB) SNR Incr_cap(bits) %Incr_cap BER
Guitar
Audio 1.07E-06 157.35 37.08 6.52 163.14 0.10648
Text 8.83E-08 167.67 47.40 6.51 163.13 0.01072
Image 3.89E-07 160.46 40.19 6.53 163.43 0.04536
Triangle
Audio 5.44E-07 159.99 36.14 6.53 163.51 0.05226
Text 6.99E-08 168.2 44.35 6.82 170.66 0.00535
Image 2.31E-07 162.73 38.88 6.53 163.85 0.02269
Bugle
Audio 1.26E-06 156.61 35.85 6.83 170.95 0.09703
Text 1.02E-07 167.13 46.38 6.91 172.93 0.00966
Image 4.56E-07 159.77 39.02 6.84 171.22 0.04122
Speech1
Audio 1.16E-06 156.88 53.47 6.45 161.40 0.12397
Text 9.69E-08 167.23 63.82 6.40 160.26 0.01355
Image 4.47E-07 159.85 56.45 6.45 161.3 0.05713
Speech2
Audio 1.13E-06 157.01 52.23 6.41 160.40 0.12467
Text 8.30E-08 168.2 63.42 6.26 156.55 0.01368
Image 4.26E-07 160.07 55.29 6.39 159.78 0.05808
Alice
Audio 5.88E-07 160.01 47.73 6.50 162.63 0.05908
Text 4.52E-08 170.56 58.27 6.47 161.91 0.00595
Image 2.09E-07 163.14 50.86 6.50 162.75 0.02524
Trance
Audio 5.31E-07 160.42 34.84 6.49 162.57 0.05415
Text 4.44E-08 170.65 45.07 6.505 162.78 0.00547
Image 1.90E-07 163.57 37.98 6.5 162.61 0.02308
Echo
Audio 6.41E-07 158.72 46.68 6.69 167.84 0.05182
Text 9.63E-08 167.38 55.33 7 175 0.00567
Image 4.31E-07 160.12 48.08 6.77 169.28 0.02509
Birds
Audio 1.14E-06 157.10 36.06 6.49 162.47 0.11600
Text 9.43E-08 167.4 46.36 6.49 162.56 0.01166
Image 4.04E-07 160.29 39.25 6.49 162.49 0.04925
Faces
Audio 8.04E-07 158.62 47.95 6.49 162.36 0.08148
Text 6.65E-08 168.77 58.09 6.52 163.08 0.00817
Image 2.84E-07 161.82 51.15 6.49 162.39 0.03470
Average 4.37E-07 162.59 47.12 6.56 164.17 0.04462
Remark: It can be seen from the table that the highest increase in capacity
obtained is 7 bits per sample. The estimated increase in capacity for this method
was calculated to be 6.5 from Eq. (6) given in (D) of this section. From the table,
it can be seen that for all cover signals, the increase in capacity is either close
to or greater than this estimated value. The average value for the increase in
capacity comes out to be 6.56.
Figure 7.8 shows the plot of the cover audio signal (Trance).
Figure 7.9 shows the plot of the stego signal obtained after
applying the method considering 2 MSBs, and Figure 7.10 shows
the plot of the stego signal obtained after applying the method
considering only the MSB. From the figures, no visible difference
is found between the stego signals obtained from either method and
the original cover audio signal.
Figure 7.8 Plot of Cover Audio Signal
Figure 7.9 Plot of Stego audio signal (Using 2 MSBs)
Figure 7.10 Plot of Stego audio signal (Using MSB)
Discussion
In the time domain, the LSB method has been implemented using 1, 2,
4 and 8 LSBs for hiding data in cover audio. It is seen from the
results that if more than 4 LSBs are used for data embedding, then
there is some audible distortion in the host audio signal.
In order to increase the security, two methods using LSB coding
along with encryption to hide information (audio, image and text) in
digital audio files have been proposed. In the first method, the
information is hidden by altering LSBs indirectly, considering
parity. In the second method, information is hidden based on the
result of an XOR operation between the LSBs and the message bit to
be embedded. In both these methods, direct LSB extraction will only
result in noise. Thus, by using encryption along with
steganography, these methods provide an additional level of
security. The experimental results show that the proposed methods
are effective. In listening tests, no difference is found between
the original audio signal and the stego audio signal. The hidden
information is recovered without any error.
In order to increase the capacity of the cover audio, several
methods have been proposed. The first method uses multiple and
variable LSBs for hiding data, considering the magnitude of the
cover samples. Experimental results have shown that this method
does not succeed in increasing the capacity of the cover audio.
So, three novel approaches to increase the capacity have been
proposed, and they give good results. The first of the three is
based on an XOR operation performed on different combinations of
bits and uses the 8 LSBs of the cover samples for hiding data. The
other two methods embed data in multiple and variable LSBs
depending on the MSBs of the cover audio samples: one checks the
first 2 MSBs of the cover samples, and the other, an extension of
it, checks only the MSB. The results show a remarkable increase in
the capacity of the cover audio for hiding additional data,
without affecting the perceptual transparency of the host audio
signal.
Using the first method, which considers the XOR operation, the
results obtained are much better than the estimated results.
Considering 2 MSBs, the average capacity increases to 5.55 bits
per sample, compared to the original capacity of 4 bits per
sample. Considering only the MSB, the average capacity increases
to 6.56 bits per sample, again compared to the original capacity
of 4 bits per sample.
From subjective listening tests, it has been seen that there is no
noticeable difference in perceptual quality between the stego audio
signals obtained from the proposed methods and the cover audio
signal. The main advantages of the proposed methods are that they
are simple in logic and the hidden information is recovered without
any error. Thus they attain the basic requirements of data hiding.
Steganalysis of the proposed methods is also more challenging,
because a varying number of bits is flipped in the audio samples
and the adversary cannot identify exactly how many bits are used
for hiding the data.
7.5 Transform-based Methods
A transform-based method embeds secret information by modifying
the transform coefficients of the cover object. Transform-based
methods have the potential to achieve higher payload capacity and
are more robust than the LSB method.
Here, two transforms, i.e., DCT and Haar transform, have been used
for hiding data in cover audio signals. For experimental results, 3
audio signals (Guitar, Triangle and Bugle) as cover and 3 audio
signals (Pingpong, Chimes and Newmail) as secret messages have
been used. The performance measures used are MSE, PSNR and
BER between the cover and stego audio signals, and MSE between the
original secret message and the retrieved secret message.
Using the DCT, three different methods have been implemented. In
the first 2 methods, DCT is applied to blocks of 8 samples each.
However, since the results obtained with these methods are not
favorable, the block size has been increased to 64 samples in the
third method.
The last method using DCT is based on considering various dividing
factors (2, 5, 10, 20, 25, 50, 75, and 100). A similar approach
has been applied using the Haar transform as well.
7.5.1 DCT Transform
Discrete Cosine Transform (DCT) is a very popular technique for
data compression as it gives high energy compaction. Because most
of the signal energy is concentrated in a few coefficients, the
remaining coefficients offer more room for hiding data. Here 3
different methods are proposed for hiding information using DCT.
7.5.1.1 Method 1:
In this method, DCT is applied to blocks of 8 samples each of cover
and secret audio signals. Every 8th DCT coefficient of cover is
replaced by a DCT coefficient of the secret message. The steps
involved in data embedding and extraction are discussed in detail
below:
Steps for Data embedding:
1. Read the cover and secret audio file.
2. Split both the files into blocks of 8 samples each. Each block
of the cover audio is used for data embedding.
3. Apply DCT to blocks of 8 samples of both the files.
4. Each 8th DCT coefficient of each block of the cover audio is
replaced by the DCT coefficient of the secret message.
5. Apply inverse DCT to the modified DCT coefficients of each
block.
6. The resultant coefficients are then written back to the file.
This becomes the stego audio file.
Steps for Data Extraction/ Retrieval:
1. Read the stego audio file.
2. Split the file into blocks of 8 samples each and apply DCT to
each block.
3. Extract every 8th DCT coefficient from each block.
4. Group the extracted coefficients into blocks of 8 and apply
inverse DCT to each block.
5. The resultant coefficients are written back to the file.
This becomes the retrieved secret message.
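The steps of Method 1 can be sketched as follows. This is a minimal sketch, not the implementation used for Table 7.16: it assumes an orthonormal DCT-II, works on floating-point samples, ignores the quantization that occurs when samples are written back to a file, and drops trailing samples that do not fill a block. The helper names dct, idct, blocks, embed and extract are hypothetical.

```python
import math

BLOCK = 8  # block size used in Method 1


def dct(block):
    """Orthonormal DCT-II of one block (assumed normalization)."""
    n = len(block)
    return [math.sqrt((1 if k == 0 else 2) / n)
            * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                  for i, x in enumerate(block))
            for k in range(n)]


def idct(coeffs):
    """Inverse of dct() above."""
    n = len(coeffs)
    return [sum(math.sqrt((1 if k == 0 else 2) / n) * c
                * math.cos(math.pi * (i + 0.5) * k / n)
                for k, c in enumerate(coeffs))
            for i in range(n)]


def blocks(signal, size=BLOCK):
    """Split into full blocks; a trailing partial block is dropped."""
    return [signal[i:i + size]
            for i in range(0, len(signal) - size + 1, size)]


def embed(cover, secret):
    """Replace the 8th DCT coefficient of each cover block by one
    DCT coefficient of the secret signal."""
    secret_coeffs = [c for b in blocks(secret) for c in dct(b)]
    stego = []
    for i, b in enumerate(blocks(cover)):
        coeffs = dct(b)
        if i < len(secret_coeffs):
            coeffs[-1] = secret_coeffs[i]
        stego.extend(idct(coeffs))
    return stego


def extract(stego, n_secret):
    """Collect the 8th coefficient of each stego block, regroup into
    blocks of 8 and apply the inverse DCT."""
    coeffs = [dct(b)[-1] for b in blocks(stego)][:n_secret]
    return [x for b in blocks(coeffs) for x in idct(b)]


cover = [math.sin(0.1 * i) for i in range(128)]          # 16 cover blocks
secret = [0.01 * math.cos(0.3 * i) for i in range(16)]   # 16 coefficients
stego = embed(cover, secret)
recovered = extract(stego, len(secret))
print(max(abs(a - b) for a, b in zip(secret, recovered)))
```

Note that one cover block carries only one secret coefficient, so the cover must be at least 8 times as long as the secret signal.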
7.5.1.2 Method 2:
In this method, DCT is applied to blocks of 8 samples each of cover
and secret audio. Unlike the earlier method where the secret
coefficients are directly embedded in every 8th cover coefficients,
here the secret coefficients are first modified and then embedded.
The processes of data embedding and extraction are discussed below.
Steps for Data embedding:
1. Read the cover and secret audio files.
2. Split both the files into blocks of 8 samples each. Each block
of the cover audio is used for data embedding.
3. Apply DCT to blocks of 8 samples of both the files.
4. The maximum coefficient value in each secret block is divided
by 2. The value is then multiplied with all other coefficients in
their respective blocks. The computed values from each block
are saved and used in the decoding process.
5. The 8th DCT coefficient of each block of the cover audio is
replaced by each DCT coefficient of the secret audio file.
6. Apply inverse DCT to the modified DCT coefficients of each
block.
7. The resultant coefficients are then written back to the file.
Steps for Data Extraction/ Retrieval:
1. Read the stego audio file.
2. Split the file into blocks of 8 samples and apply DCT to each.
3. Extract 8th DCT coefficient from each block. Each coefficient is
then divided by the value computed in the encoding process
in its respective block.
4. Apply inverse DCT to the modified DCT coefficients.
5. The resultant coefficients are written back to the file.
7.5.1.3 DCT Using a dividing Factor:
In order to improve the results of the DCT applied to audio signals
in the earlier two methods implemented, a new method has been
proposed here. In this method, unlike the earlier methods where
the audio signal is divided into blocks of 8 samples each, here the
audio signal is split into blocks of 64 samples each.
Steps for Data embedding:
1. Read the cover and secret audio files.
2. Split both the files into blocks of 64 samples each. Each block
of the cover audio is used for data embedding.
3. Apply DCT to blocks of 64 samples of both the files.
4. The secret DCT coefficients are then divided by a
predetermined constant factor.
5. The last 8 DCT coefficients of each block of the cover audio
are replaced by the secret DCT coefficients from step 4.
6. Apply inverse DCT to the modified cover DCT coefficients of
each block.
7. The resultant coefficients are then written back to the file to
form the stego file.
Steps for Data Extraction/ Retrieval:
1. Read the stego audio file.
2. Split the file into blocks of 64 samples and apply DCT to each.
3. Extract the last 8 DCT coefficients from each block.
4. The extracted coefficients are then multiplied by the constant
factor that was used during embedding process.
5. Group the coefficients from step 4 into blocks of 64 and
apply inverse DCT to each block.
6. The resultant coefficients are written back to the file.
This becomes the retrieved secret message.
Different factors such as 2, 5, 10, 20, 25, 50, 75 and 100 were used
in the implementation.
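The dividing-factor method can be sketched as follows, again as an illustration under assumptions rather than the code behind Tables 7.18 to 7.25: an orthonormal DCT-II is assumed, samples are floating-point values, file quantization is ignored, and the names dct, idct, chunks, embed, extract and mse are hypothetical. The example also compares the cover distortion for two factors.

```python
import math

BLOCK, SLOTS = 64, 8  # block size and number of replaced coefficients


def dct(block):
    """Orthonormal DCT-II of one block (assumed normalization)."""
    n = len(block)
    return [math.sqrt((1 if k == 0 else 2) / n)
            * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                  for i, x in enumerate(block))
            for k in range(n)]


def idct(coeffs):
    """Inverse of dct() above."""
    n = len(coeffs)
    return [sum(math.sqrt((1 if k == 0 else 2) / n) * c
                * math.cos(math.pi * (i + 0.5) * k / n)
                for k, c in enumerate(coeffs))
            for i in range(n)]


def chunks(seq, size):
    """Split into full chunks; a trailing partial chunk is dropped."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def embed(cover, secret, factor):
    """Replace the last 8 DCT coefficients of each 64-sample cover block
    by secret DCT coefficients divided by the factor."""
    payload = [c / factor for b in chunks(secret, BLOCK) for c in dct(b)]
    stego = []
    for i, b in enumerate(chunks(cover, BLOCK)):
        coeffs = dct(b)
        part = payload[i * SLOTS:(i + 1) * SLOTS]
        if len(part) == SLOTS:
            coeffs[-SLOTS:] = part
        stego.extend(idct(coeffs))
    return stego


def extract(stego, n_secret, factor):
    """Read the last 8 coefficients of each block, multiply by the
    factor, regroup into blocks of 64 and apply the inverse DCT."""
    coeffs = [c * factor
              for b in chunks(stego, BLOCK) for c in dct(b)[-SLOTS:]]
    return [x for b in chunks(coeffs[:n_secret], BLOCK) for x in idct(b)]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


cover = [math.sin(0.05 * i) for i in range(512)]        # 8 cover blocks
secret = [0.5 * math.cos(0.4 * i) for i in range(64)]   # 1 secret block
for factor in (2, 10):
    stego = embed(cover, secret, factor)
    recovered = extract(stego, len(secret), factor)
    print(factor, mse(cover, stego), mse(secret, recovered))
```

In this sketch, one secret block of 64 samples is spread over 8 cover blocks, and a larger factor gives a smaller MSE between the cover and the stego signal.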
7.5.2 Haar Transform
The Haar transform has become more popular after the introduction
of wavelets. It is fast to compute and is well suited to data
compression.
7.5.2.1 Haar Transform Using a dividing Factor:
In this method, Haar transform is applied for hiding audio data in
cover audio.
Steps for Data embedding:
1. Read the cover and secret audio files.
2. Split both the files into blocks of 64 samples each. Each block
of the cover audio is used for data embedding.
3. Apply Haar transform to blocks of 64 samples of both the
files.
4. The secret transform coefficients are then divided by the
same predetermined constant factor as is used in DCT for
comparison.
5. The last 8 coefficients of each block of the cover audio are
replaced by the secret coefficients from step 4.
6. Apply inverse Haar transform to the modified cover transform
coefficients of each block.
7. The resultant coefficients are then written back to the file to
form the stego file.
Steps for Data Extraction/ Retrieval:
1. Read the stego audio file.
2. Split the file into blocks of 64 samples and apply Haar
transform to each block.
3. Extract the last 8 coefficients from each block.
4. The extracted coefficients are then multiplied by the constant
factor that was used during embedding process.
5. Group such coefficients from step 4 into blocks of 64 and
apply inverse Haar transform to it.
6. The resultant coefficients are written back to the file.
This forms the retrieved secret message.
Different factors such as 2, 5, 10, 20, 25, 50, 75 and 100 were used
in the implementation.
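The core of the Haar-based embedding can be sketched for a single 64-sample block as follows. This is only a sketch under assumptions: it uses the orthonormal Haar wavelet decomposition with coefficients ordered from coarse to fine, which may differ from the Haar matrix ordering used in the experiments, it ignores file quantization, and the names haar, ihaar, embed_block and extract_block are hypothetical.

```python
import math

ROOT2 = math.sqrt(2.0)


def haar(block):
    """Full orthonormal Haar decomposition (length must be a power of 2).
    Output order: overall average first, then details from coarse to fine."""
    out = list(block)
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / ROOT2 for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / ROOT2 for i in range(half)]
        out[:n] = avg + diff
        n = half
    return out


def ihaar(coeffs):
    """Inverse of haar() above."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / ROOT2, (a - d) / ROOT2]
        out[:2 * n] = merged
        n *= 2
    return out


def embed_block(cover_block, secret_coeffs, factor):
    """Replace the last coefficients of one cover block by secret
    coefficients divided by the factor, then invert the transform."""
    coeffs = haar(cover_block)
    coeffs[-len(secret_coeffs):] = [c / factor for c in secret_coeffs]
    return ihaar(coeffs)


def extract_block(stego_block, count, factor):
    """Read the last `count` coefficients and multiply by the factor."""
    return [c * factor for c in haar(stego_block)[-count:]]


cover_block = [math.sin(0.05 * i) for i in range(64)]
secret_block = [0.5 * math.cos(0.4 * i) for i in range(64)]
secret_coeffs = haar(secret_block)[:8]   # first 8 of 64 secret coefficients
stego_block = embed_block(cover_block, secret_coeffs, factor=10)
recovered = extract_block(stego_block, 8, factor=10)
print(max(abs(a - b) for a, b in zip(secret_coeffs, recovered)))
```

Repeating this over 8 consecutive cover blocks carries all 64 coefficients of one secret block, which are then regrouped and passed through the inverse Haar transform as in the retrieval steps above.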
7.6 Results:
Table 7.16 gives the results of Method 1 using DCT. Table 7.17
gives the results of Method 2 using DCT.
Table 7.16 Results of Method 1 using DCT
Cover Secret MSE PSNR MSE(message) BER
Guitar
Pingpong 0.000239 132.54 2.05E-10 0.10905
Chimes 0.001914 123.51 1.48E-08 0.33038
Newmail 0.022459 112.82 0.001245 0.38871
Triangle
Pingpong 0.000138 134.92 1.55E-10 0.06375
Chimes 0.000994 126.35 8.89E-07 0.23517
Newmail 0.020875 113.13 0.002986 0.40665
Bugle
Pingpong 0.000218 132.94 1.57E-10 0.10930
Chimes 0.001772 123.84 5.79E-10 0.31338
Newmail 0.01872 113.6 0.01065 0.36702
Average 0.00748 123.74 0.00165 0.25815
Remarks: The minimum MSE value obtained is 0.000138 and the minimum MSE
value for the message is 1.55E-10.
Table 7.17 Results of Method 2 using DCT
Cover Secret MSE PSNR MSE(message) BER
Guitar
Pingpong 4.58E-05 139.71 0.001821 0.05705
Chimes 8.53E-05 137.02 0.012151 0.18178
Newmail 0.014356 114.75 0.00101 0.27789
Triangle
Pingpong 4.47E-05 139.83 0.004064 0.05473
Chimes 6.43E-05 138.24 1.46643 0.15011
Newmail 0.012909 115.22 0.001868 0.30976
Bugle
Pingpong 4.23E-05 140.06 0.00205 0.08592
Chimes 7.96E-05 137.32 0.006976 0.23206
Newmail 0.01447 114.72 0.001079 0.28766
Average 4.68E-03 130.76 0.16638 0.18188
Remarks: The minimum MSE value obtained is 4.23E-05 and the minimum MSE
value for the message is 0.00101. It can be seen that the MSE values obtained
from this method are lower than the MSE values obtained from Method 1, as
shown in Table 7.16. On the other hand, the MSE values for the message are
lower for Method 1 than for Method 2.
Table 7.18 to Table 7.25 show the results of using DCT with the
dividing factors 2, 5, 10, 20, 25, 50, 75 and 100.
Table 7.18 Results of Using DCT with a factor 2
Cover Secret MSE PSNR MSE(message) BER
Guitar
Pingpong 5.97E-05 138.56 1.31E-09 0.09193
Chimes 0.00048 129.53 1.00E-08 0.28185
Newmail 0.00661 118.13 9.19E-08 0.34975
Triangle
Pingpong 4.43E-05 139.86 3.13E-10 0.05722
Chimes 0.00026 132.16 3.46E-10 0.19971
Newmail 0.00627 118.35 1.43E-06 0.36926
Bugle
Pingpong 5.45E-05 138.96 3.09E-10 0.09534
Chimes 0.00044 129.86 4.76E-10 0.27221
Newmail 0.00678 118.02 4.20E-08 0.32807
Average 2.33E-03 129.27 1.75E-07 0.22726
Remarks: The minimum MSE value obtained is 4.43E-05 and the minimum MSE
value for the message is 3.09E-10.
Table 7.19 Results of Using DCT with a factor 5
Cover Secret MSE PSNR MSE(message) BER
Guitar
Pingpong 9.56E-06 146.53 2.18E-09 0.07297
Chimes 7.66E-05 137.49 1.10E-08 0.23606
Newmail 0.00106 126.08 1.29E-07 0.30464
Triangle
Pingpong 1.88E-05 143.59 1.93E-09 0.05427
Chimes 5.41E-05 139 2.29E-09 0.16844
Newmail 0.001 126.26 1.43E-06 0.32585
Bugle
Pingpong 8.82E-06 146.87 2.02E-09 0.08647
Chimes 7.12E-05 137.8 1.52E-09 0.24022
Newmail 0.00108 125.98 4.30E-08 0.28879
Average 3.75E-04 136.62 1.80E-07 0.19753
Remarks: The minimum MSE value obtained is 8.82E-06 and the minimum MSE
value for the message is 1.52E-09.
Table 7.20 Results of Using DCT with a factor 10
Cover Secret MSE PSNR MSE(message) BER
Guitar
Pingpong 2.39E-06 152.55 8.63E-09 0.05978
Chimes 1.92E-05 143.51 1.45E-08 0.20082
Newmail 0.00026 132.1 1.06E-07 0.26828
Triangle
Pingpong 1.50E-05 144.56 7.73E-09 0.05238
Chimes 2.41E-05 142.51 9.76E-09 0.14756
Newmail 0.00026 132.11 1.44E-06 0.29201
Bugle
Pingpong 2.29E-06 152.72 7.79E-09 0.08171
Chimes 1.81E-05 143.76 5.13E-09 0.22104
Newmail 0.00027 132 4.60E-08 0.26219
Average 9.68E-05 141.75 1.83E-07 0.17619
Remarks: The minimum MSE value obtained is 2.29E-06 and the minimum MSE
value for the message is 5.13E-09.
Table 7.21 Results of Using DCT with a factor 20
Cover Secret MSE PSNR MSE(message) BER
Guitar
Pingpong 5.98E-07 158.57 3.84E-08 0.04871
Chimes 4.79E-06 149.53 2.94E-08 0.16668
Newmail 6.62E-05 138.12 1.07E-07 0.23208
Triangle
Pingpong 1.40E-05 144.86 3.17E-08 0.05105
Chimes 1.64E-05 144.17 4.10E-08 0.12906
Newmail 7.62E-05 137.03 1.47E-06 0.25879
Bugle
Pingpong 6.62E-07 158.12 3.07E-08 0.07856
Chimes 4.78E-06 149.54 1.98E-08 0.20655
Newmail 6.80E-05 138 5.88E-08 0.23853
Average 2.80E-05 146.44 2.03E-07 0.15666
Remarks: The minimum MSE value obtained is 5.98E-07 and the minimum MSE
value for the message is 1.98E-08.
Table 7.22 Results of Using DCT with a factor 25
Cover Secret MSE PSNR MSE(message) BER
Guitar
Pingpong 3.83E-07 160.5 6.01E-08 0.04563
Chimes 3.07E-06 151.46 4.16E-08 0.15653
Newmail 4.24E-05 140.06 1.11E-07 0.22069
Triangle
Pingpong 1.39E-05 144.9 5.04E-08 0.05058
Chimes 1.55E-05 144.44 6.53E-08 0.12388
Newmail 5.37E-05 139.03 1.47E-06 0.24789
Bugle
Pingpong 4.66E-07 159.65 4.80E-08 0.07756
Chimes 3.18E-06 151.3 3.12E-08 0.20294
Newmail 4.36E-05 139.93 7.91E-08 0.23155
Average 1.96E-05 147.92 2.17E-07 0.15081
Remarks: The minimum MSE value obtained is 3.83E-07 and the minimum MSE
value for the message is 3.12E-08.
Table 7.23 Results of using DCT with a factor 50
Cover Secret MSE PSNR MSE(message) BER
Guitar
Pingpong 9.66E-08 166.48 2.60E-07 0.03723
Chimes 7.68E-07 157.48 1.61E-07 0.12444
Newmail 1.06E-05 146.08 2.49E-07 0.18627
Triangle
Pingpong 1.37E-05 144.96 2.07E-07 0.04922
Chimes 1.41E-05 144.82 2.77E-07 0.10985
Newmail 2.36E-05 142.59 1.66E-06 0.21607
Bugle
Pingpong 2.05E-07 163.22 1.95E-07 0.07536
Chimes 1.06E-06 156.07 1.23E-07 0.19323
Newmail 1.12E-05 145.86 1.40E-07 0.21395
Average 8.37E-06 151.95 3.64E-07 0.13396
Remarks: The minimum MSE value obtained is 9.66E-08 and the minimum MSE
value for the message is 1.23E-07.
Table 7.24 Results of using DCT with a factor 75
Cover Secret MSE PSNR MSE(message) BER
Guitar
Pingpong 4.37E-08 169.93 5.97E-07 0.03365
Chimes 3.42E-07 160.99 3.86E-07 0.10756
Newmail 4.71E-06 149.59 4.25E-07 0.16708
Triangle
Pingpong 1.37E-05 144.97 4.51E-07 0.04862
Chimes 1.39E-05 144.9 6.32E-07 0.10305
Newmail 1.81E-05 143.76 1.90E-06 0.19809
Bugle
Pingpong 1.56E-07 164.38 4.50E-07 0.07429
Chimes 6.66E-07 158.09 2.79E-07 0.18937
Newmail 5.14E-06 149.22 4.11E-07 0.20557
Average 6.31E-06 153.98 6.15E-07 0.12525
Remarks: The minimum MSE value obtained is 4.37E-08 and the minimum MSE
value for the message is 2.79E-07.
Table 7.25 Results of using DCT with a factor 100
Cover Secret MSE PSNR MSE(message) BER
Guitar
Pingpong 2.52E-08 172.32 1.08E-06 0.03152
Chimes 1.93E-07 163.47 7.19E-07 0.09655
Newmail 2.65E-06 152.09 4.82E-07 0.15345
Triangle
Pingpong 1.36E-05 144.98 7.99E-07 0.04825
Chimes 1.37E-05 144.94 1.14E-06 0.09934
Newmail 1.61E-05 144.26 2.45E-06 0.18599
Bugle
Pingpong 1.40E-07 164.88 7.82E-07 0.07382
Chimes 5.29E-07 159.09 4.95E-07 0.18671
Newmail 3.04E-06 151.5 3.85E-07 0.20077
Average 5.55E-06 155.28 9.26E-07 0.11960
Remarks: The minimum MSE value obtained is 2.52E-08 and the minimum MSE
value for the message is 3.85E-07.
Table 7.26 Results of Haar Transform using factor 2
Cover Secret MSE PSNR MSE(message) BER
Guitar
Pingpong 0.001602 124.28 0.001616 0.30611
Chimes 0.003039 121.5 0.006040 0.33029
Newmail 0.004626 119.67 0.059137 0.33848
Triangle
Pingpong 0.000674 128.04 0.001436 0.29409
Chimes 0.00137 124.96 0.006171 0.30435
Newmail 0.002641 122.11 0.113585 0.32930
Bugle
Pingpong 0.00133 125.09 0.001463 0.34766
Chimes 0.002759 121.92 0.006068 0.35639
Newmail 0.00433 119.96 0.066949 0.36226
Average 0.002486 123.06 0.029163 0.32988
Remarks: The minimum MSE value obtained is 0.000674 and the minimum MSE
value for the message is 0.001436.
Table 7.27 Results of Haar Transform using factor 5
Cover Secret MSE PSNR MSE(message) BER
Guitar
Pingpong 0.000146 134.67 9.13e-05 0.27855
Chimes 0.001626 124.22 0.002970 0.32694
Newmail 0.003708 120.64 0.043101 0.33230
Triangle
Pingpong 9.43E-05 136.58 1.89e-09 0.02231
Chimes 0.000757 127.54 0.002932 0.30493
Newmail 0.002233 122.84 0.083414 0.32267
Bugle
Pingpong 9.61E-05 136.5 1.82e-09 0.03666
Chimes 0.001471 124.65 0.002725 0.35773
Newmail 0.003527 120.85 0.049024 0.35860
Average 0.001518 127.61 0.020473 0.26007
Remarks: The minimum MSE value obtained is 9.43E-05 and the minimum MSE
value for the message is 1.82E-09.
Table 7.28 Results of Haar Transform using factor 10
Cover Secret MSE PSNR MSE(message) BER
Guitar
Pingpong 2.42E-05 142.49 8.82e-09 0.03391
Chimes 0.000436 129.94 0.000421 0.30724
Newmail 0.002509 122.34 0.022203 0.33143
Triangle
Pingpong 6.71E-05 138.06 8.84e-09 0.02222
Chimes 0.000249 132.36 0.000290 0.28337
Newmail 0.0017 124.02 0.044652 0.32140
Bugle
Pingpong 4.50E-05 139.79 8.93e-09 0.03612
Chimes 0.00046 129.7 0.000115 0.32392
Newmail 0.002468 122.41 0.026795 0.35648
Average 8.84E-04 131.23 1.05E-02 0.22401
Remarks: The minimum MSE value obtained is 2.42E-05 and the minimum MSE
value for the message is 8.82E-09.
Table 7.29 Results of Haar Transform using factor 20
Cover Secret MSE PSNR MSE(message) BER
Guitar
Pingpong 9.95E-06 146.35 3.75e-08 0.03335
Chimes 0.000103 136.18 5.26e-08 0.07698
Newmail 0.001285 125.24 0.001178 0.29326
Triangle
Pingpong 6.03E-05 138.52 3.78e-08 0.02211
Chimes 0.000111 135.88 3.80e-08 0.07316
Newmail 0.001107 125.88 0.004375 0.29919
Bugle
Pingpong 3.21E-05 141.26 3.84e-08 0.03577
Chimes 0.00022 132.89 3.18e-08 0.09144
Newmail 0.001305 125.17 0.003715 0.34331
Average 4.70E-04 134.15 1.03E-03 0.14095
Remarks: The minimum MSE value obtained is 9.95E-06 and the minimum MSE
value for the message is 3.18E-08.
Table 7.30 Results of Haar Transform using factor 25
Cover Secret MSE PSNR MSE(message) BER
Guitar
Pingpong 8.21E-06 147.18 6.19e-08 0.03302
Chimes 6.81E-05 137.99 6.78e-08 0.07502
Newmail 0.001027 126.21 9.61e-07 0.08422
Triangle
Pingpong 5.96E-05 138.58 6.18e-08 0.02211
Chimes 9.30E-05 136.64 6.13e-08 0.07272
Newmail 0.001002 126.32 1.35e-06 0.09584
Bugle
Pingpong 3.05E-05 141.47 6.04e-08 0.03565
Chimes 0.000143 134.76 4.70e-08 0.09098
Newmail 0.001084 125.97 0.000226 0.30057
Average 3.91E-04 135.01 2.54E-05 0.09002
Remarks: The minimum MSE value obtained is 8.21E-06 and the minimum MSE
value for the message is 4.70E-08.
Table 7.31 Results of Haar Transform using factor 50
Cover Secret MSE PSNR MSE(message) BER
Guitar
Pingpong 5.85E-06 148.65 2.47e-07 0.03285
Chimes 2.09E-05 143.13 1.80e-07 0.06986
Newmail 0.000261 132.16 1.08e-06 0.07763
Triangle
Pingpong 5.85E-05 138.65 2.61e-07 0.02207
Chimes 6.93E-05 137.92 2.54e-07 0.07182
Newmail 0.000297 131.6 1.54e-06 0.09165
Bugle
Pingpong 2.84E-05 141.79 2.53e-07 0.03553
Chimes 0.000143 134.76 1.68e-07 0.08996
Newmail 0.000392 130.39 5.45e-07 0.09202
Average 1.42E-04 137.67 5.03E-07 0.06482
Remarks: The minimum MSE value obtained is 5.85E-06 and the minimum MSE
value for the message is 1.68E-07.
Table 7.32 Results of Haar Transform using factor 75
Cover Secret MSE PSNR MSE(message) BER
Guitar
Pingpong 5.39E-06 149.01 5.69e-07 0.03282
Chimes 1.21E-05 145.5 3.66e-07 0.06736
Newmail 0.000119 135.58 1.18e-06 0.07438
Triangle
Pingpong 5.83E-05 138.67 5.77e-07 0.02197
Chimes 6.50E-05 138.2 5.82e-07 0.0715
Newmail 0.000166 134.12 1.92e-06 0.08978
Bugle
Pingpong 2.80E-05 141.85 5.81e-07 0.03547
Chimes 0.000135 135.02 3.77e-07 0.08924
Newmail 0.000246 132.43 6.42e-07 0.09097
Average 9.28E-05 138.93 7.55E-07 0.06372
Remarks: The minimum MSE value obtained is 5.39E-06 and the minimum MSE
value for the message is 3.66E-07.
Table 7.33 Results of Haar Transform using factor 100
Cover Secret MSE PSNR MSE(message) BER
Guitar
Pingpong 5.22E-06 149.15 1.01e-06 0.03282
Chimes 9.01E-06 146.78 6.25e-07 0.06585
Newmail 6.90E-05 137.94 1.38e-06 0.07177
Triangle
Pingpong 5.83E-05 138.67 1.03e-06 0.02195
Chimes 6.35E-05 138.3 1.04e-06 0.07121
Newmail 0.000121 135.52 2.25e-06 0.08819
Bugle
Pingpong 2.79E-05 141.88 9.82e-07 0.03542
Chimes 0.000132 135.12 6.68e-07 0.08872
Newmail 0.000194 133.44 9.18e-07 0.09058
Average 7.55E-05 139.64 1.10E-06 0.06295
Remarks: The minimum MSE value obtained is 5.22E-06 and the minimum MSE
value for the message is 6.25E-07.
Table 7.34 Comparison of MSE (between cover and stego) and MSE (message) for
DCT and Haar transform methods for various factors
Factor MSE(DCT) MSE(Haar) MSE(message,DCT) MSE(message,Haar)
2 2.33E-03 0.002486 1.75E-07 0.029163
5 3.75E-04 0.001518 1.80E-07 0.020473
10 9.68E-05 8.84E-04 1.83E-07 1.05E-02
20 2.80E-05 4.70E-04 2.03E-07 1.03E-03
25 1.96E-05 3.91E-04 2.17E-07 2.54E-05
50 8.37E-06 1.42E-04 3.64E-07 5.03E-07
75 6.31E-06 9.28E-05 6.15E-07 7.55E-07
100 5.55E-06 7.55E-05 9.26E-07 1.10E-06
Remarks: The minimum MSE values obtained using DCT and Haar transform are
5.55E-06 and 7.55E-05 respectively, both for the dividing factor 100. The
minimum MSE values for the message are 1.75E-07 for DCT (factor 2) and
5.03E-07 for Haar transform (factor 50).
Table 7.35 Comparison of BER for DCT and Haar transform methods for
various factors
Factor BER (using DCT) BER (using Haar Transform)
2 0.22726 0.32988
5 0.19753 0.26007
10 0.17619 0.22401
20 0.15666 0.14095
25 0.15081 0.09002
50 0.13396 0.06482
75 0.12525 0.06372
100 0.11960 0.06295
Remarks: The minimum BER values obtained using DCT and Haar transform are
0.11960 and 0.06295 respectively, both for the dividing factor 100.
7.7 Discussion
DCT and Haar transforms have been used for hiding data in audio
signals. In the first two methods, the DCT is applied to the cover
and secret audio signals in blocks of 8 samples. In the first
method, every 8th cover DCT coefficient is directly replaced by a
secret DCT coefficient. The results of this method show that audible
distortion is introduced in the stego audio signal, whereas the
perceptual quality of the retrieved secret audio is high. In the
second method, the secret coefficients are multiplied with the max
coefficient for each block of 8 samples. These modified coefficients
are then embedded at every 8th cover coefficient. The results of
this method show that the perceptual quality of the resultant stego
signal is quite good, whereas the quality of the retrieved secret
signal is poor. So, there is a trade-off between the quality of the
stego audio and the quality of the secret message retrieved.
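The first method can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: "every 8th coefficient" is read here as the last coefficient of each 8-sample block, the DCT is the orthonormal DCT-II, samples stay in floating point (no quantization to the audio sample format), and all function names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def dct(block, basis):
    return [sum(c * x for c, x in zip(row, block)) for row in basis]


def idct(coeffs, basis):
    # The basis is orthonormal, so its inverse is its transpose.
    n = len(coeffs)
    return [sum(basis[k][i] * coeffs[k] for k in range(n)) for i in range(n)]


def embed_every_8th(cover, secret_coeffs, n=8):
    """Replace the last DCT coefficient of each n-sample cover block
    with one secret coefficient (assumed reading of method 1)."""
    basis = dct_matrix(n)
    stego = []
    for b, s in enumerate(secret_coeffs):
        coeffs = dct(cover[b * n:(b + 1) * n], basis)
        coeffs[n - 1] = s
        stego.extend(idct(coeffs, basis))
    stego.extend(cover[len(secret_coeffs) * n:])  # blocks left untouched
    return stego


def extract_every_8th(stego, count, n=8):
    """Read back the last DCT coefficient of the first `count` blocks."""
    basis = dct_matrix(n)
    return [dct(stego[b * n:(b + 1) * n], basis)[n - 1] for b in range(count)]


cover = [math.sin(0.1 * i) for i in range(32)]
secret_coeffs = [0.5, -0.25, 0.125]  # stand-ins for secret DCT coefficients
stego = embed_every_8th(cover, secret_coeffs)
recovered = extract_every_8th(stego, len(secret_coeffs))
```

Because the secret coefficient is written unscaled, its magnitude directly determines how far each stego block departs from the cover, which fits the audible distortion reported for this method.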
In the third method, instead of applying DCT to blocks of 8 samples,
it is applied to blocks of 64 samples each for the cover as well as
the secret audio signals. In this method, various constant dividing
factors have been used for normalizing the secret coefficients.
These modified secret coefficients are then embedded in the last 8
cover coefficients of each block. This has been done to improve on
the results of the earlier two methods. On a similar basis, the Haar
transform is used for hiding information in audio using the same
method, and the performance of the two transforms is compared.
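A sketch of the third method for a single block is given below, again under assumptions: an orthonormal DCT-II, eight secret coefficients per 64-sample cover block, floating-point samples, and hypothetical function names. It only illustrates why a larger dividing factor lowers the cover-to-stego MSE; since it skips quantization, it does not reproduce the retrieval error seen in the tables.

```python
import math


def dct_basis(n):
    """Orthonormal DCT-II basis as an n x n list of rows."""
    return [[math.sqrt((1 if k == 0 else 2) / n) *
             math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]


def forward(block, basis):
    return [sum(b * x for b, x in zip(row, block)) for row in basis]


def inverse(coeffs, basis):
    n = len(coeffs)
    return [sum(basis[k][i] * coeffs[k] for k in range(n)) for i in range(n)]


def embed_block(cover_block, secret_coeffs, factor, basis):
    """Write secret coefficients, divided by `factor`, into the last
    positions of the cover block's DCT spectrum."""
    coeffs = forward(cover_block, basis)
    start = len(coeffs) - len(secret_coeffs)
    coeffs[start:] = [s / factor for s in secret_coeffs]
    return inverse(coeffs, basis)


def extract_block(stego_block, count, factor, basis):
    """Read the last `count` coefficients and undo the division."""
    coeffs = forward(stego_block, basis)
    return [c * factor for c in coeffs[len(coeffs) - count:]]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


basis = dct_basis(64)
cover = [0.5 * math.sin(2 * math.pi * 2 * i / 64) for i in range(64)]
secret = [0.4, -0.3, 0.2, -0.1, 0.05, 0.3, -0.2, 0.1]  # illustrative values

for factor in (2, 10, 100):
    stego = embed_block(cover, secret, factor, basis)
    print(factor, mse(cover, stego))
```

The receiver needs the same dividing factor to rescale the extracted coefficients, so in this sketch the factor acts as a shared parameter of the scheme.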
However, after using various dividing factors for different cover
audio signals, it was difficult to settle on a single factor, as the
best choice depends strongly on the cover and secret audio to be
embedded. It is observed that the performance of DCT is better than
that of Haar when MSE and PSNR are the performance measures. For
BER, however, DCT performs better only for dividing factors up to
10; for factors of 20 and above, Haar performs better (Table 7.35).
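The BER crossover can be read directly from the averages in Table 7.35. The snippet below is only a convenience check over those tabulated values; the variable and function names are illustrative.

```python
# Average BER per dividing factor, copied from Table 7.35: (DCT, Haar)
ber = {
    2: (0.22726, 0.32988),
    5: (0.19753, 0.26007),
    10: (0.17619, 0.22401),
    20: (0.15666, 0.14095),
    25: (0.15081, 0.09002),
    50: (0.13396, 0.06482),
    75: (0.12525, 0.06372),
    100: (0.11960, 0.06295),
}


def better_transform(dct_ber, haar_ber):
    """Name the transform with the lower bit error rate."""
    return "DCT" if dct_ber < haar_ber else "Haar"


winners = {factor: better_transform(*pair) for factor, pair in ber.items()}
# Smallest tabulated factor at which Haar gives the lower BER
crossover = min(f for f, w in winners.items() if w == "Haar")

for factor, winner in winners.items():
    print(factor, winner)
```

The crossover lies between the tabulated factors 10 and 20; no factor between them was tested, so the exact switching point is not known from these results.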