
INSA de Lyon

2007 N° 07 ISAL 0093

THESIS

presented to obtain the degree of

Doctor of Philosophy

in Computer Science

A dissertation presented by

Çağatay Dikici

December 3, 2007

Informed Watermarking and Compression of Multi-Sources

prepared at LIRIS

under the supervision of Atilla Baskurt and Khalid Idrissi

The thesis jury is composed of:

Reviewers:

M. Jean Marc Chassery (Directeur de Recherche CNRS)

M. Bülent Sankur (Professeur)

Examiners:

Mme. Christine Guillemot (Directrice de Recherche INRIA)

M. Fabrice Meriaudeau (Professeur)

M. William Puech (Maître de conférences, HDR)

M. Florent Dupont (Maître de conférences, HDR)

M. Khalid Idrissi (Maître de conférences)

M. Atilla Baskurt (Professeur)


Abstract

Informed Watermarking and Compression of Multi-Sources (December 2007). Çağatay Dikici, B.S., Bogazici University; M.S., Bogazici University.

Technological advances in the fields of telecommunications and multimedia, together with the diverse choice of portable handheld devices over the last decade, have driven the creation of novel services such as multimedia content sharing, video-conferencing and content protection, all running on low-power devices. Hence, alternative low-complexity coding techniques need to be developed to replace conventional ones. Coding with state information, a potential solution that shifts complexity from the encoder to the decoder, has two main applications:

1) Distributed Source Coding (DSC), for compressing a source when a correlated version of it is available only to the decoder.

2) Informed Data Hiding (IDH), for embedding a watermark into a host signal when the host signal is available only to the encoder.

For each problem stated above, practical code designs that operate close to the theoretical limits are proposed. The proposed capacity-approaching codes combine good error correcting codes, such as Low Density Parity-Check (LDPC) codes, with good quantization codes, such as Trellis Coded Quantization (TCQ).

Moreover, the theoretically achievable rate limits are derived for a relaxed IDH setup in which a noisy observation of the host signal is available to the decoder.

Finally, motivated by the strong duality between DSC and IDH, a hybrid scheme that uses both data hiding and compression is proposed. In addition to the derivation of the theoretical channel capacity and rate distortion function, a complete framework is presented.

Keywords: Coding with State Information, Compression, Watermarking, Distributed Source Coding, Writing on Dirty Paper, Low Density Parity Check Codes, Trellis Coded Quantization.


Résumé

Informed Watermarking and Compression of Multi-Sources (December 2007). Çağatay Dikici, B.S., Bogazici University; M.S., Bogazici University.

Technological advances in telecommunications, multimedia and mobile systems have opened the door to the emergence and development of new services, such as the sharing of multimedia databases, video-conferencing or content protection, all running on low-power systems. Hence the need for new coding techniques of reduced complexity. Coding techniques exploiting the presence of side information are a potential solution for shifting the coding complexity to the decoder. They apply in particular to two coding principles:

1) Distributed Source Coding (DSC), for compressing a given signal, knowing that another signal correlated with the original one is available at the decoder.

2) Informed Data Hiding (IDH), for inserting a message into a host signal, the latter being known only to the encoder.

For each of these two techniques, we propose solutions that approach the theoretical limits. To this end, we combine high-performance channel codes of the LDPC type with trellis-based quantization (TCQ). We also study the theoretical limits that can be achieved by IDH when a noisy version of the host signal is available at the decoder.

Finally, exploiting the strong duality between DSC and IDH, we propose a complete practical hybrid scheme employing both techniques, together with a theoretical study of the rate-distortion function and the capacity of such a system.

Keywords: coding with side information, compression, watermarking, distributed source coding, LDPC, TCQ.


Acknowledgements

to my family

First of all, I would like to express my deepest gratitude to my supervisors, Prof. Atilla Baskurt and Khalid Idrissi, for their patient guidance, encouragement and excellent advice throughout this study.

I am grateful to Prof. Christine Guillemot for her enthusiasm, for sharing her fruitful ideas on information theory, for her valuable assistance in maturing my theoretical foundation in source-channel coding, and for her hospitality during our collaboration. Special thanks go to Caroline Fontaine for her advice and valuable discussions.

I am thankful to my thesis reviewers, Prof. Bülent Sankur and Prof. Jean-Marc Chassery. They provided a critical reading, valuable suggestions and constructive remarks which have been very important for the improvement of this dissertation. I would also like to thank my other committee members, Mr. Fabrice Meriaudeau, Mr. William Puech and Mr. Florent Dupont.

Thanks to all LIRIS members, especially the three stimulators: the judo zen Guillaume Lavoué, the sailor Julien Ricard, and the theater boy Nicolas Zlatoff. I would like to thank my interns Benoît, Damien, David and Stephane, as well as la Migraine team: Rémi, Greg, Elise, Fab, Antho and Claris for their weekly motivation.

Finally, great thanks to Laurent, Eléonore and my family, who encouraged me to finalize this dissertation.


Contents

I Problem Statement and Preliminaries

Introduction

1 Preliminaries
1.1 Notations and Conventions
1.2 Entropy and Mutual Information
1.3 Causality
1.4 Source Coding
1.5 Channel Coding
1.6 Distributed Source Coding
1.7 Writing on Dirty Paper
1.8 Message Passing Algorithm
1.9 Trellis Coded Quantization (TCQ)
1.10 Low Density Parity Check (LDPC) Codes
1.11 Conclusion

II Contributions

2 Distributed Source Coding
2.1 Introduction
2.2 Theoretical Background
2.3 Related Works
2.4 Practical Code Design
2.5 Practical Application for Still-Image Coding
2.6 Conclusion

3 Informed Data Hiding
3.1 Introduction
3.2 Theoretical Background
3.3 Prior Work
3.4 Proposed Scheme-1: Extension to Cox Miller
3.5 Proposed Scheme-2: Superposition Coding
3.6 Conclusion

4 Dirty Paper Coding with Partial State Information
4.1 Introduction
4.2 Problem Statement
4.3 Achievable Rate
4.4 Capacity/Rate Gain/Loss Analysis
4.5 Conclusion

5 Data Hiding and Distributed Source Coding
5.1 Introduction
5.2 Theoretical Background
5.3 Contribution 1: Capacity Analysis for Multivariate Gaussian Source
5.4 Contribution 2: Practical Code Design
5.5 Conclusion

Conclusion

A Achievable Rate Region Calculations for Two Partial Side Informations Known to the Encoder and the Decoder Respectively
A.1 Derivation of the Achievable Rate Region
A.2 Maximization of the Rate
A.3 Entropy of the Multivariate Gaussian Distribution

B Codes and Degree Distributions for Generating LDPC Matrices
B.1 Degree Distributions of the rate 2/3 code, for the 2:1 compression rate in DSC
B.2 Degree Distribution of the rate 1/2 code, for Informed Data Hiding

C Publications of the author

D Cited Author Index


List of Figures

1 A multimedia communication setup for a low-power device which has data hiding and efficient compression capability.
2 A point-to-point source-channel coding setup.
3 Coding with state information.
4 Coding of two correlated sources.
5 Costa's "Writing on Dirty Paper" setup.
6 Channel coding with state information.
7 Data Hiding + Source Coding scheme.
8 Chapter dependencies of this dissertation.
1.1 The Venn diagram of the relationship between entropy and mutual information.
1.2 Binary entropy function H(a) versus a.
1.3 Uniform density function p(x) versus x, where p(x) = 1/a for 0 ≤ x ≤ a.
1.4 A compression system.
1.5 A communication system.
1.6 Counting problem on a straight line.
1.7 A 1/2 recursive systematic convolutional code with memory 2 and generator matrix (3, 5) in octal digits.
1.8 State transitions of the recursive systematic convolutional code (1, 3) in octal digits.
1.9 Output points and corresponding partitions for 2 bits per sample.
1.10 Viterbi decoding of a vector of length 4.
1.11 Bipartite graph representation of the parity check matrix H.
1.12 Belief propagation on the bipartite graph of H.
1.13 Performance comparison of the error rates of a (3, 6) regular LDPC code, a turbo code and an optimized irregular LDPC code. The channel is binary-input additive white Gaussian noise.
1.14 LDPC coding example. Cartoon copyright © 2007 piyalemadra.com, used with permission. (a) Original binary cartoon of size 100 × 100, with 0s corresponding to white and 1s to black pixels. The ratio between the number of black pixels and the total number of pixels is 0.2445. (b) Visualization of the cartoon coded with a 1/2 rate systematic LDPC code, such that the output of the encoder contains the original image and its parity checks, of size 100 × 100. (c) During transmission, both the cartoon and its parity check bits are exposed to bit errors, such that the error probability of a received bit is 0.07.
1.15 LDPC decoding. (a) After 1 iteration. (b) After 5 iterations. (c) The original cartoon is decoded without any error after 10 iterations.
2.1 The 16 cases of correlated source coding.
2.2 Lines and points of Table-2.2.
2.3 Admissible Slepian-Wolf rate region R for the case {1011}.
2.4 Admissible Slepian-Wolf rate region R for the cases {0011} and {0001}.
2.5 Wyner-Ziv setup.
2.6 Graph of R_{X|Y}(D), R*_{X|Y}(D), and H(p_z) − H(D) versus D for p_z = 0.28. For the binary symmetric case, R*_{X|Y}(D) has a rate loss with respect to R_{X|Y}(D) except at the points (H(p_z), 0) and (0, p_z), where there is no rate loss.
2.7 Wyner-Ziv setup for the Gaussian case.
2.8 2:1 rate DSC compression using a 2/3 convolutional code.
2.9 2:1 rate DSC compression code design using two systematic 4/5 convolutional codes with an interleaver and iterative MAP decoding. The blocks π correspond to a pseudo-random interleaver, and the block π⁻¹ is the corresponding deinterleaver. The Log-Likelihood Ratio (LLR) calculations log(p(x = 1|y)/p(x = 0|y)) use the correlation noise level and the received side information Y. Iterative decoding is done using a Soft-Input Soft-Output (SISO) decoder.
2.10 2:1 rate DSC compression code design using two systematic 2/3 rate parallel concatenated convolutional codes and 1/2 rate puncturing matrices P.
2.11 2:1 rate DSC compression using a systematic 2/3 rate LDPC code.
2.12 Eight output points and corresponding partitions for 4 subsets.
2.13 Wyner-Ziv coding as a concatenation of a good quantization code and a Slepian-Wolf coder.
2.14 Our proposed 2:1 rate DSC compression code design using LDPC codes.
2.15 Decoding bit error rate versus entropy rate of the correlation noise power H(p1), for the 2:1 rate Slepian-Wolf compression comparison. The LDPC simulations use a regular LDPC matrix of input length 4000 and an irregular LDPC matrix of length 10^4. The graph also contains the S-W limit and the best performances achieved using a convolutional code (Aaron and Girod, 2002), a punctured turbo code (Lajnef, 2006), and an irregular LDPC code of length 10^5 (Liveris et al, 2002a).
2.16 Encoder and decoder structure. The source is compressed using LDPC binning, the side information Y available to the decoder is the image reconstructed from the low-frequency (LL2) wavelet composition, and the two received signals are decoded jointly.
2.17 Construction of the side information. Only the Low-Low wavelet composition of the second level is transmitted. The decoder reconstructs the side information by setting all other coefficients to 0.
2.18 Left: side information at the receiver; center: output of the first decoding iteration; right: decoding output after 5 iterations.
3.1 Channel coding with state information setup.
3.2 Watermarked image.
3.3 Costa setup.
3.4 Informed embedding of Miller et al. on DCT coefficients of still images.
3.5 Proposed informed embedding setup on DWT coefficients of still images.
3.6 Analysis and synthesis steps of the Le Gall DWT.
3.7 Wavelet composition of the Lena image.
3.8 A 100-bit message M is inserted into the Lena image using the LH2, HL2 and HH2 DWT coefficients. No perceptual shaping is applied.
3.9 The same 100-bit message M is inserted into the Lena image using perceptual shaping.
3.10 Comparison of embedding a 40-bit message M into the asia image with and without perceptual shaping.
3.11 Superposition of 2 codes.
3.12 Embedding process of the message M into the work s using superposition coding. LDPC coding of M to find the channel code c1 is followed by TCQ coding of αs − c1 to find the source code c0. The watermarked signal c0 + c1 + (1 − α)s is sent through the attack channel.
3.13 Superposition watermarking extraction by BCJR and LDPC decoding iterations.
3.14 Embedding a 40-bit payload into the Cameraman image.
3.15 Maximum level of attack for which the secret message can still be decoded perfectly.
4.1 Channel coding with state information.
4.2 Graphs of R(α) for P = Q = N = 1 and {L, K} pairs {0, 0}, {0, 1}, {1, 0}, {1, 1}, {0, ∞} and {1, ∞}. The rate of transmission R(α) is calculated in nats per unit transmission (the maximum value 0.3466 nats/transmission corresponds to 1/2 bit/transmission).
4.3 Capacity gain (between R_Case-B(α) and R_Costa(α)) versus SNR, for different α values, where P = 1, SSR = −6 dB and various 10 log(Q/K) values, with perfect knowledge of the channel state information at the encoder (L = 0).
4.4 Maximum achievable rate loss (between R_Case-E(α) and R_Costa(α)) versus SNR, for different α values, where P = 1, SSR = −6 dB and various 10 log(Q/L) values.
4.5 Maximum achievable rate gain or loss (between R(α) and R_Costa(α)) versus SNR, for different α values, where P = 1, SSR = −6 dB and various 10 log(Q/K) values, with partial knowledge of the channel state information at the encoder (L = 1).
5.1 A communication system between Alice and Bob via a non-secure Carrier.
5.2 Data Hiding + Source Coding scheme.
5.3 Channel coding with two-sided state information.
5.4 Rate distortion theory with side information at the decoder: Wyner-Ziv setup.
5.5 Multivariate Gaussian channel of the IDH-DSC scheme.
5.6 Multivariate Gaussian case: the Carrier's point of view.
5.7 Gaussian test channel that achieves the lower bound found in Equation 5.21. Input: W ∼ N(0, Q + D1 − D2); output: W ∼ N(0, Q + D1).
5.8 Equivalent setup of the test channel in Figure-5.7 using an addition and a multiplication operator.
5.9 Equivalent scheme of the Gaussian channel.
5.10 Embedding performance for 1/200 bit per sample with 2:1 compression of the watermarked string, using a 2/3 rate LDPC code with block length 4000. Minimum 0.02 bit per sample entropy rate loss with respect to the no-embedding case.


List of Tables

2.2 Achievable rate regions according to the Slepian-Wolf theorem.
3.2 Robustness test of the proposed algorithm for the image "asia.pgm". A 40-bit message is embedded into the asia image with DWT perceptual shaping. For each attack listed, the maximum attack level at which the secret message M can still be decoded without any error is given.
4.2 Special cases of the proposed channel coding setup.
5.2 Channel coding with state information problems.
5.3 Source coding with state information problems.



List of Abbreviations

Notation  Description
AWGN  Additive White Gaussian Noise
BCJR  Bahl, Cocke, Jelinek and Raviv algorithm
BSC  Binary Symmetric Channel
CCSI  Channel Coding with State Information
C-SNR  Correlation Signal-to-Noise Ratio
DCT  Discrete Cosine Transform
DISCUS  DIstributed Source Coding Using Syndromes
DSC  Distributed Source Coding
DRM  Digital Rights Management
DWT  Discrete Wavelet Transform
ECC  Error Correcting Codes
G-P  Gel'fand-Pinsker
IDCT  Inverse Discrete Cosine Transform
IDH  Informed Data Hiding
IDWT  Inverse Discrete Wavelet Transform
i.i.d.  independent and identically distributed
l.c.e.  lower convex envelope
LDPC  Low Density Parity Check
LDPCA  Low Density Parity Check Accumulate
LLR  Log-Likelihood Ratio
LR  Likelihood Ratio
MAC  Multiple-Access Channel
MAP  Maximum A Posteriori
ML  Maximum Likelihood
MSE  Mean Squared Error
PAM  Pulse Amplitude Modulation
QIM  Quantization Index Modulation
RSC  Reed Solomon Codes
r.v.  random variable
SCSI  Source Coding with State Information
SISO  Soft-Input Soft-Output
SLDPCA  Sum Low Density Parity Check Accumulate
SNR  Signal-to-Noise Ratio
S-W  Slepian-Wolf
TCM  Trellis Coded Modulation
TCQ  Trellis Coded Quantization
TTCQ  Turbo Trellis Coded Quantization
W-Z  Wyner-Ziv


Part I

Problem Statement and Preliminaries


Introduction

Consider the communication setup in Figure-1, which strongly motivates the foundation of this dissertation. A low-power multimedia device such as a mobile phone has various functionalities, e.g. an embedded camera and a WiFi or 3G network connection. Despite the limited power and bandwidth, telecommunications operators want to deploy multimedia applications like video-conferencing, in which the mobile device has to handle several tasks: capturing sound and video, compressing them under some fidelity criteria and sending them over the network on the uplink side; and receiving the stream, decompressing the content and displaying it on the downlink side. During multimedia content transmission, one would also like to hide seamless (in the sense of not being easily detectable by the human audio-visual system) extra information, either to enhance the multimedia content or simply for Digital Rights Management (DRM) reasons.

Figure 1: A multimedia communication setup for a low-power device which has data hiding and efficient compression capability.

State-of-the-art conventional audio-video compression standards exploit the redundancy of the data only at the encoder and, with the help of entropy coding, can compress it close to the theoretical limits. Hence, in classical compression techniques, the encoder is more complex than the decoder. One of the objectives of our work is to shift the encoder complexity to an intermediate powerful server, which transcodes the data into a conventional compression stream and sends it to the receiver for a simple decoding end.

The second problem is to hide information in the multimedia data at the sender side under a fidelity criterion: the host multimedia content may be modified by no more than an acceptable noise level; the modified version of the content is transmitted to the receiver; and the hidden information must be extractable at the receiver end without access to the original multimedia data. Since the original multimedia content is accessible only to the sender, this setup is also known as "blind watermarking".

To formalize the blind watermarking and compression problems stated above in a rigorous manner, we briefly explain the coding concept expressed by Shannon and the notion of coding with state information.

Coding Concept

Figure 2: A point to point source-channel coding setup.

Consider the communication system in Figure-2. A signal generated by an information source needs to be transmitted to a receiver through a channel, where the channel is generally imperfect and hence creates errors during transmission. The aim of the transmitter and receiver pair is to minimize its resources, such as transmitting power and the number of channel uses, while guaranteeing signal reconstruction with a given fidelity. One can try to minimize the number of bits representing the input source, which corresponds to compression. On the other hand, redundant data needs to be added in order to recover from the errors introduced during transmission. Hence, in the literature, the compression process is called source coding, while the error correction codes are called channel coding. The duality between source and channel coding has been studied since Shannon (1959): in source coding the redundancy of the input source is removed, while in channel coding a controlled amount of redundant data is added in order to correct the transmission errors.

Coding with State Information

Figure 3: Coding with state information.

In this section, we extend the basic source coding and channel coding setups by introducing a state information S that determines the output of the channel. This state information can be accessible perfectly or partially to the transmitter, to the receiver, or to both, depending on the setup. In this dissertation, we are mainly interested in two communication problems with state information, in order to solve the data hiding and source coding problems for low-power devices. The first setup is "channel coding with state information (CCSI) known to the transmitter". Since only the transmitter, and not the receiver, has access to the state information in this setup, the blind watermarking application can be posed as a CCSI-known-to-the-transmitter problem. Gel'fand and Pinsker (1980) and Costa (1983) made valuable contributions in this field.

The second setup is "source coding with state information (SCSI) known to the receiver". It considers the theoretical compression rate limits of a source when state information correlated with the input source is accessible to the receiver. Even though the theoretical foundations of SCSI known to the receiver date back to the 1970s, with Slepian and Wolf (1973) and Wyner and Ziv (1976), practical applications for efficient source coding on low-power devices and sensors only appeared in the 2000s (Puri and Ramchandran, 2002; Aaron and Girod, 2002). Although the random binning argument and coset construction used by Slepian and Wolf (1973) to prove the achievable rate limits are not practically applicable, good error correcting codes can be employed for a sub-optimal solution.


Moreover, echoing the classical source-channel coding duality, Cover and Chiang (2002); Pradhan et al (2003); Su et al (2000) have shown the strong duality between source and channel coding with state information.

We employ error correcting coding techniques to tackle the two problems stated above. The main idea of Error Correcting Codes (ECC) is to add redundancy to the data to be transmitted in an appropriate manner, which serves to detect and correct the parts corrupted by the channel. There exist two classes of ECC: "convolutional codes" and "block codes". Some examples of block codes are Hamming codes, Reed Solomon Codes (RSC) and Low Density Parity Check (LDPC) codes (Gallager, 1963). They all use a parity check matrix to create the redundancy and have good error correcting capabilities. However, if we take into consideration the performance on larger blocks of data and the possibility of soft decoding, LDPC codes have more advantages (Mackay, 2003).
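To make the parity check idea concrete, here is a minimal sketch (illustrative only, not from the thesis) using the classic (7, 4) Hamming block code: the matrix H both defines the redundancy added by the encoder and locates a single bit error at the decoder through the syndrome.

```python
import numpy as np

# Systematic (7,4) Hamming code: G adds 3 parity bits, H checks them.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

msg = np.array([1, 0, 1, 1])
codeword = msg @ G % 2          # 4 data bits -> 7 coded bits (controlled redundancy)

received = codeword.copy()
received[2] ^= 1                # the channel flips one bit

syndrome = H @ received % 2     # non-zero syndrome: an error is detected
# For a single error, the syndrome equals the column of H at the error position.
err_pos = next(i for i in range(7) if np.array_equal(H[:, i], syndrome))
received[err_pos] ^= 1          # corrected
assert np.array_equal(received, codeword)
```

LDPC codes rely on the same H·c = 0 (mod 2) principle, but with a very large, sparse H decoded iteratively rather than by syndrome lookup.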

Convolutional codes are generated by a finite state machine whose output depends on the current sample and the current state. A trellis path is a sequence of state transitions. Since convolutional codes do not permit all possible state transitions, a sequence of state transitions produced by a convolutional code is a valid trellis path. The decoder seeks the most probable valid trellis path, in either the Maximum Likelihood (ML) sense or the Maximum A Posteriori (MAP) sense. The decoding can use hard decisions or soft decisions, where soft decisions enable iterative decoding. Berrou and Glavieux (1996); Berrou et al (1993) proposed a coding algorithm (the turbo code) based on the concatenation of convolutional codes through an interleaver, decoded iteratively with an optimal decoding algorithm for linear codes known as BCJR (named after the initials of Bahl, Cocke, Jelinek and Raviv, who proposed it in Bahl et al (1974)), which operates close to the theoretical limits. Other techniques have been proposed to improve the performance of turbo codes, such as puncturing (Acikel and Ryan, 1997) and interleaving (Benedetto et al, 1998; Tepe and Anderson, 1998).
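As an illustration of the finite state machine view, the following sketch (hypothetical; the common octal generator pair (7, 5) is used here, not the thesis's specific recursive systematic code) implements a rate-1/2 feed-forward convolutional encoder with memory 2:

```python
def conv_encode(bits, g1=0b111, g2=0b101, memory=2):
    """Rate-1/2 feed-forward convolutional encoder: two output bits per input
    bit, each a parity over the current bit and the `memory` previous bits."""
    state = 0                                # the `memory` most recent input bits
    out = []
    for b in bits:
        reg = (b << memory) | state          # current bit followed by the past bits
        out.append(bin(reg & g1).count("1") % 2)
        out.append(bin(reg & g2).count("1") % 2)
        state = reg >> 1                     # shift: drop the oldest bit
    return out

print(conv_encode([1, 0, 1, 1]))             # [1, 1, 1, 0, 0, 0, 0, 1]
```

Each possible value of `state` is a trellis node, and the encoder's restricted transitions are exactly what Viterbi (ML) or BCJR (MAP) decoding exploits.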

After the invention of turbo codes, LDPC codes were rediscovered by Mackay and Neal (1997) using the belief propagation algorithm. An LDPC code can be represented as a bipartite graph in which variable nodes and check nodes are connected by edges. The variable nodes need to satisfy all the check node equations: the modulo-2 sum of the variable nodes connected to a check node must be 0. The bipartite graph is regular or irregular depending on whether the number of edges connected to each variable node or check node is the same or not. Regular and irregular code performances are studied in Mackay and Neal (1997) and in Richardson and Urbanke (2001a); Chung et al (2001a); Chung (2000); Chung et al (2001b), respectively. With a careful design of the bipartite graph, irregular codes outperform regular ones. Richardson et al (2001) proposed a density evolution method for the design of LDPC codes which performs within 0.13 dB of the theoretical limits, surpassing the best codes previously known (turbo codes).
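The check-node constraint itself takes only a few lines. The toy matrix below is hypothetical, and far smaller and denser than a practical LDPC matrix, which is large and sparse:

```python
import numpy as np

# Toy parity check matrix: rows are check nodes, columns are variable nodes.
H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 0, 1, 1]])

def satisfies_checks(word):
    # A valid codeword makes the mod-2 sum at every check node equal to 0.
    return not np.any(H @ word % 2)

print(satisfies_checks(np.array([1, 1, 0, 0, 1, 0])))   # True: a codeword
print(satisfies_checks(np.array([1, 1, 1, 0, 1, 0])))   # False: one bit off
```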

In this dissertation, we propose a complete system that performs combined distributed source coding and data hiding. After a detailed survey of state-of-the-art DSC and data hiding schemes, we apply a high-performance DSC method (based on LDPC) and a high-performance data hiding method (based on Trellis Coded Quantization (TCQ) and LDPC). For the derivation of the theoretical bounds of the proposed system, we extend Costa (1983)'s work on "Writing on Dirty Paper" by introducing one partial state information at the encoder and another partial state information at the decoder, and we analyze the maximum achievable rates of this setup. This extension reduces to 6 different cases, of which 4 are already known and 2 are novel with interesting application areas.

Afterwards, the combination of data hiding and Distributed Source Coding is studied. Based on a practical application scenario, the theoretical rate distortion and channel capacity expressions of this setup are derived. The rate-distortion function of our setup is in fact an extension of the Wyner-Ziv theorem with an appropriate correlation relation between the state information and the input source. For the channel capacity, we use one of the special cases of our findings on "Dirty Paper Coding with Partial State Information". A practical code design is given, applying LDPC and BCJR decoding on TCQ.

Summary of Contributions

The contributions of this dissertation can be summarized as follows. Our first major contribution is in the field of combined data hiding and distributed source coding. We derive the rate distortion function and the capacity formula of the embedding process for the Gaussian input case, and we propose a practical code design using LDPC and TCQ which operates close to the theoretical limits.

Our second major contribution is in the area of channel coding with partial side information in Gaussian input channels. The maximum achievable rates are derived for channel coding with side information partially/perfectly available to the encoder and partially/perfectly available to the decoder. It is thus an extension of Costa's "writing on dirty paper" setup, and this contribution is employed for the calculation of the channel capacity in the combined data hiding and distributed source coding problem.

Moreover, we propose a Slepian-Wolf coding scheme based on LDPC codes which operates 0.08 bit per channel use away from the theoretical limits. This proposed system is applied to a still-image coding system where the image is coded such that the low-pass DWT coefficients are available to the decoder.

Finally, our contributions in the Informed Data Hiding field can be summarized as the proposition of two embedding methods. The first is a low-rate embedding method for the DWT coefficients of still images using perceptual shaping; the second is a high-rate embedding method for continuous synthetic data using the superposition of a good source code C0 based on TCQ and a good channel code C1 based on LDPC. By applying iterative decoding between the BCJR algorithm and belief propagation, the embedded message can be decoded with an error rate of Pe ≤ 10^-5, even for an AWGN attack noise level that is 1.5 dB away from the theoretical limits.

We now give a brief explanation of our contributions, in their order of appearance in this dissertation.

Distributed Source Coding

Slepian and Wolf (1973) derived the compression rate limits of separate encoding and joint decoding of correlated sources drawn from a discrete alphabet (see Figure-4). After the extension of this theorem with a distortion constraint by Wyner and Ziv (1976), the first practical code designs appeared in the early 2000s, driven by the idea of encoding on low-power devices such as sensors.

Figure 4: Coding of two correlated sources.

In distributed source coding, the statistical dependency between the two correlated sources is exploited at the decoder. For instance, one of the sources is coded with a low-rate error correcting code, and only the parity checks are sent through the channel. The second source, assumed to be a noisy version of the first, is available at the decoder; the decoder then tries to correct the noisy parts of the second source using the parity checks of the first source.

In this dissertation, we propose a practical code design for the Slepian-Wolf problem based on LDPC codes for discrete alphabet input. We use 2/3 rate LDPC codes which operate close to the theoretical limits. A 2/3 rate LDPC code corresponds to a 2:1 compression, because of the ratio between (input source):(parity checks). The LDPC encoding and decoding used for DSC are as in the traditional LDPC codes described in Chapter-1.10, with some decoding modifications applied for DSC as described in Chapter-2.4.
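A minimal sketch of the syndrome (binning) idea behind such a design, with a hypothetical toy matrix and a brute-force search standing in for the belief propagation decoder: the encoder transmits only the syndrome of the source, and the decoder resolves it against its correlated side information.

```python
import numpy as np
from itertools import combinations

# Toy parity check matrix (hypothetical): a 6-bit source x is compressed
# to its 3-bit syndrome s = H x mod 2, i.e. a 2:1 compression.
H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 0, 1, 1]])

def dsc_decode(s, y, max_flips=1):
    # Find the fewest bit flips of the side information y whose
    # syndrome matches the transmitted syndrome s.
    for k in range(max_flips + 1):
        for pos in combinations(range(len(y)), k):
            cand = y.copy()
            cand[list(pos)] ^= 1
            if np.array_equal(H @ cand % 2, s):
                return cand
    return None

x = np.array([1, 0, 1, 1, 0, 0])   # source, seen only by the encoder
y = x.copy(); y[3] ^= 1            # decoder's noisy, correlated copy
s = H @ x % 2                      # the only data actually transmitted
print(dsc_decode(s, y))            # recovers x: [1 0 1 1 0 0]
```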

Our system operates at a correlation noise entropy 0.08 bit per channel use away from the Slepian-Wolf limit for a 2:1 compression rate. We also compare the performance of our method with existing systems. Please note that the system developed in this part is employed in the joint data-hiding compression system of Chapter-5.

We also apply our proposed coding scheme to still-image compression: in this setup, a refinement coding is made using LDPC codes, while the low-frequency Discrete Wavelet Transform (DWT) component of the image is accessible to the decoder.

Blind Watermarking

Suppose that the secret message embedded within a cover signal or image is the information we want to transmit; then the blind watermarking problem can be viewed as channel coding with side information known to the encoder (see Figure-5). Hence the theoretical limits of the secret message embedding rate can be calculated for noncausal memoryless systems. A striking result was found by Costa (1983): for the Gaussian input case, the capacity of the system is independent of the cover data S, and there is no capacity loss due to the unavailability of the cover data at the decoder side.

Figure 5: Costa’s “Writing on Dirty Paper” setup.

We develop two data hiding schemes. The first achieves a low embedding rate: it modifies the DWT coefficients of a still image using trellis coding, controlling the embedding strength based on the perceptual sensitivity of the DWT coefficients. The performance of our proposed method, in terms of the error probability of message extraction under several attacks, is given.

The second proposed scheme focuses on high-rate embedding performance by combining a good source code and a good channel code. During the embedding process of the secret message, we employ LDPC encoding. Moreover, in order to respect the embedding criterion, a 6-level output TCQ is used. Hence an LDPC code and a TCQ code are concatenated for the data hiding process. During transmission, the watermarked signal is exposed to an AWGN attack channel. At the decoder side, the received signal is decoded with belief propagation on the LDPC side and BCJR decoding on the TCQ side. Since both decoding methods produce soft output probabilities, the decoding is done in an iterative manner. For low SNR values, our system operates 1.5 dB away from Costa's limit. As in the DSC case, the blind watermarking system using superposition coding is one of the main building blocks of the overall data-hiding compression system proposed in Chapter-5.
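A minimal numeric sketch of the superposition arithmetic described above (following the structure of Figure-3.12; a plain scalar quantizer stands in for TCQ and a random codeword for the LDPC output, both hypothetical simplifications):

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha, step = 8, 0.5, 0.2

s = rng.normal(size=n)                         # host (cover) signal
c1 = step * rng.choice([-1.0, 1.0], size=n)    # stand-in for the LDPC channel code of M

def quantize(v, step):
    # Stand-in for TCQ: nearest point of a scalar lattice.
    return step * np.round(v / step)

c0 = quantize(alpha * s - c1, step)            # source code of alpha*s - c1
watermarked = c0 + c1 + (1 - alpha) * s        # signal sent over the attack channel

# The embedding distortion watermarked - s equals the quantization residual:
print(np.mean((watermarked - s) ** 2))
```

Since c0 is the nearest lattice point to αs − c1, the per-sample embedding distortion is bounded by half the quantization step, which is how such a scheme respects the embedding power constraint.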

Dirty Paper Coding with Partial State Information

As described in the previous section, Costa derives the capacity of a channel coding problem whose state information is known only to the encoder. However, in some settings, partial information on the state of the channel may be available to the encoder or to the decoder (and the two need not be the same). Hence we derive the capacity of the Gaussian channel with a state information which is partially known to the encoder and to the decoder, as in Figure-6.

Figure 6: Channel coding with state information.

Unlike Costa's case, the maximum achievable rate of this system also depends on the state S. Our contributions can be listed as follows.

• The analytic expression of the maximum achievable rate is found to be

\max_{\alpha} R(\alpha) = R(\alpha^*) = \frac{1}{2} \ln\left(1 + \frac{P(QK + QL + KL)}{N(QK + QL + KL) + QLK}\right), \qquad (1)

which is obtained for \alpha^* = PQK / (PQK + QNK + L(PQ + PK + QK + NQ + NK)).
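A quick numeric check of Equation (1), assuming the parameter roles used in this chapter (P: input power, N: channel noise power, Q: state power, K and L: the noise powers on the state observations at the decoder and encoder, respectively): with L = 0 the expression collapses to Costa's capacity, as expected.

```python
from math import log, isclose

def max_rate(P, Q, K, L, N):
    """Maximum achievable rate of Eq. (1), in nats per transmission."""
    t = Q*K + Q*L + K*L
    return 0.5 * log(1 + P*t / (N*t + Q*L*K))

def alpha_star(P, Q, K, L, N):
    """Optimal scaling parameter of Eq. (1)."""
    return P*Q*K / (P*Q*K + Q*N*K + L*(P*Q + P*K + Q*K + N*Q + N*K))

P, Q, K, L, N = 1.0, 1.0, 0.5, 0.2, 1.0        # illustrative values only
print(max_rate(P, Q, K, L, N), alpha_star(P, Q, K, L, N))

# Sanity check: perfect state knowledge at the encoder (L = 0) recovers
# Costa's capacity 0.5*ln(1 + P/N) and the classic alpha* = P/(P + N).
assert isclose(max_rate(P, Q, K, 0.0, N), 0.5 * log(1 + P/N))
assert isclose(alpha_star(P, Q, K, 0.0, N), P / (P + N))
```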


• The general setup reduces to 6 different cases, of which 4 are well known (Cases A, C, D, F) and 2 are new (Cases B, E). The maximum achievable rates are calculated for all 6 cases, and the two new cases are compared with Costa's setup.

• In order to achieve the maximum achievable rate, the encoder needs to know the channel variance parameters. However, in real-world applications, the exact parameters are not always known to the encoder. We analyze the rate gains/losses for the 6 cases and the general setup when the encoding is done at a non-optimal operating point, and we compare the gain/loss analysis with respect to Costa's setup.

This general setup is relevant for diverse practical applications, such as watermarking under desynchronization attacks and point-to-point communication over a fading channel where the receiver has an estimate of the channel state.

Informed Data Hiding and Distributed Source Coding

We employ all the contributions up to this point to build a system combining Informed Data Hiding (IDH) and Distributed Source Coding (DSC). Motivated by the application scenario in Figure-5.1, we derive the channel capacity and rate distortion function of a point-to-point communication system between Alice and Bob supplied by an untrusted Carrier. Alice sends a secret message by inserting it into a cover data under a power constraint, knowing that a correlated version of the cover data is accessible to Bob. Because the transmission is not secure, Alice does not share her original copy and transmits only the watermarked signal. From the Carrier's point of view, he wants to minimize his bandwidth while respecting a quality of service, so he wants to compress Alice's message given that Bob shares his noisy copy at the decoding end. Our main contributions can be listed as follows.

• The analytic expression of the rate distortion function of the Carrier, in nats per channel use, is found to be

R_{W|S}(D_2) = \begin{cases} \frac{1}{2} \ln\left(\frac{D_1}{D_2} + \frac{QK}{(Q+K)D_2}\right), & 0 < D_2 < D_1 + \frac{QK}{Q+K}, \\ 0, & D_2 \geq D_1 + \frac{QK}{Q+K}. \end{cases} \qquad (2)

• The embedding capacity of the overall system, in nats per channel use, is found to be

C = \frac{1}{2} \ln\left(1 + \frac{D_1(D_1 + Q - D_2)}{D_2(D_1 + Q)}\right). \qquad (3)
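Both expressions are straightforward to evaluate; a small helper, with illustrative (hypothetical) parameter values only:

```python
from math import log

def carrier_rate(D2, D1, Q, K):
    """Rate distortion function of the Carrier, Eq. (2), in nats per channel use."""
    if D2 >= D1 + Q*K / (Q + K):
        return 0.0
    return 0.5 * log(D1/D2 + Q*K / ((Q + K) * D2))

def embedding_capacity(D1, D2, Q):
    """Embedding capacity of the overall system, Eq. (3), in nats per channel use."""
    return 0.5 * log(1 + D1*(D1 + Q - D2) / (D2*(D1 + Q)))

# Q: cover power, K: side-information noise, D1: embedding distortion,
# D2: compression distortion (all values below chosen only for illustration).
print(carrier_rate(D2=0.1, D1=0.2, Q=1.0, K=0.5))     # ~0.84 nats
print(embedding_capacity(D1=0.2, D2=0.1, Q=1.0))      # ~0.52 nats
```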


Figure 7: Data Hiding + Source Coding Scheme.

• A practical code design for the Gaussian case is proposed using the concatenation of the systems proposed in Chapter-2 and Chapter-3: a data hiding method based on the superposition of a source code and a channel code, and a compression method based on DSC principles, are combined in a single system. The decoding is done with belief propagation iterations for decompressing the watermarked signal, and BCJR-belief propagation iterations for extracting the hidden mark. The performance of the system is compared with the theoretical bounds derived in that chapter.

• A toy example for the discrete case is proposed, and a performance analysis is given without deriving the theoretical limits.

Organization of the Dissertation

This dissertation consists of two parts and five chapters. Part-I, titled "Problem Statement and Preliminaries", includes a general introduction and a chapter introducing preliminary notions such as the definition of information theoretic elements, basic source and channel coding concepts, and details of the two powerful coding techniques used in this dissertation: Trellis Coded Quantization (TCQ) and Low Density Parity Check (LDPC) codes (Chapter-1).

Part-II is dedicated to the contributions of this dissertation and contains four chapters. In Chapter-2, we review the theoretical background of source coding with side information, namely the Slepian-Wolf and Wyner-Ziv theorems, and survey state-of-the-art Distributed Source Coding implementations in the literature. We then introduce our practical code design for the Slepian-Wolf problem, based on LDPC codes for discrete alphabet input, and compare this scheme with existing systems. Finally, we present the extension of our practical code design to still-image compression, using LDPC codes for binning and the low-frequency DWT coefficients as the side information available to the decoder.

In Chapter-3, we give the theoretical background of channel coding with side information, namely the Gel'fand-Pinsker theorem and Costa's "writing on dirty paper" setup, and describe existing informed data hiding implementations in the literature. We then present our proposed informed data hiding methods: a low embedding rate method on the DWT coefficients of still images, and a high embedding rate method using the superposition of a good source code (TCQ) and a good channel code (LDPC).

In Chapter-4, we give our information theoretic contributions on channel coding with side information. We extend Costa's "channel coding with side information perfectly known to the encoder" setup to "channel coding with side information partially known to the encoder and partially known to the decoder (where the two need not be the same)". The maximum achievable rate is calculated for this setup. This global setup reduces to 6 different sub-cases, of which 4 are well-known setups and 2 are new. We analyze all 6 sub-cases and the general setup, and compare them with Costa's initial setup.

In Chapter-5 we have all the ingredients to construct our final system, which performs source-channel coding: hiding information within a host signal and compressing it using distributed compression techniques. The problem is formalized as a point-to-point communication between Alice and Bob via an untrusted carrier. Alice wants to hide some information in her original copy and send it through the carrier. The carrier wants to compress this watermarked data, and the only thing he has is a noisy copy of the original data shared by Bob at the receiver end. We derive the rate-distortion function of the Carrier and the capacity of the embedding system for the Gaussian input case. Surprisingly, the absence of Bob's noisy copy at the Carrier's encoder does not affect the rate-distortion function of the Carrier. Similarly, the absence of Alice's original copy at Bob's side does not affect the embedding capacity formula. After these theoretical findings, we propose a practical code design for the Gaussian case, using the system proposed for DSC in Chapter-2 and the high-rate embedding system proposed for IDH in Chapter-3. The chapter closes with a practical code proposition for the Binary Symmetric Channel.

The dependencies between the chapters are shown in Figure-8.


[Figure: chapter dependency graph linking Chapter 1 (Introduction and Preliminaries) to Chapter 2 (Distributed Source Coding), Chapter 3 (Informed Watermarking), Chapter 4 (Dirty Paper Coding with Partial State Information) and Chapter 5 (Data Hiding and Distributed Source Coding).]

Figure 8: Chapter dependencies of this dissertation.


Chapter 1

Preliminaries

Contents

1.1 Notations and Conventions
1.2 Entropy and Mutual Information
1.3 Causality
1.4 Source Coding
1.5 Channel Coding
1.6 Distributed Source Coding
1.7 Writing on Dirty Paper
1.8 Message Passing Algorithm
1.9 Trellis Coded Quantization (TCQ)
1.9.1 Viterbi algorithm
1.9.2 BCJR
1.10 Low Density Parity Check (LDPC) Codes
1.10.1 Decoding with belief propagation
1.10.1.1 Definitions
1.10.1.2 Initialization
1.10.1.3 Check node iteration
1.10.1.4 Variable node iteration
1.10.1.5 Final Guess
1.10.2 Encoding
1.10.3 Performance of 1/2 LDPC codes
1.10.4 A visual example
1.11 Conclusion


In this chapter, we introduce the notation used throughout this dissertation and define information theoretic quantities such as entropy, differential entropy and mutual information. After a brief explanation of source coding and channel coding limits in terms of these entropy-related quantities, we give a practical source coding example with Trellis Coded Quantization (TCQ) and a channel coding example with LDPC coding. The chapter also explains decoding algorithms such as Viterbi decoding, BCJR decoding and belief propagation.

1.1 Notations and Conventions

Throughout this dissertation, we use standard concepts and results from information theory, which can be found, for example, in Cover and Thomas (1991). Random variables are denoted by capital letters, the specific values they may take by the corresponding lower case letters, and sets by calligraphic font. Similarly, random vectors, their realizations and their alphabets are denoted, respectively, by boldface capital letters, boldface lowercase letters and calligraphic letters subscripted by the corresponding dimension.

Thus, for example, X^n denotes a random n-vector (X_1, ..., X_n), and x^n = (x_1, ..., x_n) is a specific vector value in X^n, the n-th Cartesian power of X, drawn independently and identically distributed (i.i.d.). For a pair of discrete random variables (X, Y) with joint distribution p(x, y), the entropy of X is denoted by H(X), the conditional entropy of X given Y by H(X|Y), the joint entropy by H(X, Y) and the mutual information by I(X; Y). A more detailed description of the entropy-related quantities can be found in Chapter-1.2.

A distortion measure d is a mapping from the set X × Y into the set of non-negative reals, d : X × Y → R+. Two distortion functions used in this chapter are:

• The Hamming (probability of error) distortion, given by

d(x, y) = \begin{cases} 0, & \text{if } x = y, \\ 1, & \text{if } x \neq y, \end{cases} \qquad (1.1)

which also corresponds to the probability of error distortion, since E d(X, Y) = Pr(X ≠ Y).

• The squared error distance, given by

d(x, y) = (x - y)^2. \qquad (1.2)

The distortion d(x, y) between two sequences x, y of length n is given by

d(\mathbf{x}, \mathbf{y}) = \frac{1}{n} \sum_{i=1}^{n} d(x_i, y_i). \qquad (1.3)
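Both distortion measures and the sequence average of Equation (1.3) are direct to implement (an illustrative helper, not thesis code):

```python
def hamming_distortion(x, y):
    """Eqs. (1.1) and (1.3): fraction of positions where the sequences differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

def squared_error_distortion(x, y):
    """Eqs. (1.2) and (1.3): mean squared error between the sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

print(hamming_distortion([0, 1, 1, 0], [0, 1, 0, 0]))      # 0.25
print(squared_error_distortion([0.0, 1.0], [0.5, 1.0]))    # 0.125
```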


1.2 Entropy and Mutual Information

Entropy is one of the key elements of information theory. Borrowed from thermodynamics, it is known as the uncertainty of a random variable. It is measured in nats (natural log base) or in bits (log2 base). Before defining the entropy, we introduce the Shannon information content.

Assume a discrete random variable X drawn from a finite set, x ∈ X, with probability mass function p(x) = Pr{X = x}.

Definition 1.1 The information content of an outcome x is defined to be

i(x) = log2(1/p(x)) = −log2 p(x).  (1.4)

Definition 1.2 The entropy is defined to be the average Shannon information content of an outcome:

H(X) ≡ E{i(x)} = −Σ_{x∈X} p(x) log2 p(x),  (1.5)

with the convention that p(x) log2 p(x) ≡ 0 for p(x) = 0, since lim_{θ→0+} θ log2 θ = 0.

Now we introduce joint and conditional entropy, and mutual information.

Definition 1.3 The joint entropy H(X, Y) of a pair of discrete r.v. X, Y drawn from (x, y) ∈ X × Y, with joint probability mass function p(x, y) = Pr{X = x, Y = y}, is

H(X, Y) = −Σ_{x∈X} Σ_{y∈Y} p(x, y) log2 p(x, y).  (1.6)

Definition 1.4 If (x, y) ∼ p(x, y), then the conditional entropy H(Y|X) is

H(Y|X) = Σ_{x∈X} p(x) H(Y|X = x)
       = −Σ_{x∈X} p(x) Σ_{y∈Y} p(y|x) log2 p(y|x)
       = −Σ_{x∈X} Σ_{y∈Y} p(x, y) log2 p(y|x).  (1.7)

The relation between the joint and conditional entropy can be expressed as

H(X, Y) = H(X) + H(Y|X)  (1.8)
        = H(Y) + H(X|Y).  (1.9)


Definition 1.5 The relative entropy or Kullback-Leibler divergence between two probability mass functions p(x) and q(x) is

D(p‖q) = Σ_{x∈X} p(x) log2 ( p(x) / q(x) ).  (1.10)

Definition 1.6 Given (x, y) ∼ p(x, y), the mutual information I(X; Y) is the relative entropy between the joint distribution and the product distribution p(x)p(y):

I(X; Y) = D( p(x, y) ‖ p(x)p(y) )
        = Σ_{x∈X} Σ_{y∈Y} p(x, y) log2 ( p(x, y) / (p(x)p(y)) ).  (1.11)

Some of the relationships between entropy and mutual information are:

I(X; X) = H(X), (1.12)

I(X; Y ) = H(X) + H(Y ) − H(X, Y ), (1.13)

I(X; Y ) = H(X) − H(X|Y ), (1.14)

I(X; Y ) = H(Y ) − H(Y |X), (1.15)

I(X; Y ) = I(Y ; X). (1.16)

Figure 1.1: The Venn diagram of the relationship between entropy and mutual in-formation.

The Venn diagram shown in Figure-1.1 expresses the relationships between H(X), H(Y), H(X, Y), H(X|Y), H(Y|X) and I(X; Y).
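As a quick sanity check of these identities, the following Python sketch computes the entropies of a small, made-up joint pmf and verifies that Equations 1.11 and 1.13 give the same mutual information (the joint distribution below is illustrative only, not taken from this dissertation):

```python
import numpy as np

p_xy = np.array([[0.25, 0.25],
                 [0.40, 0.10]])   # hypothetical joint distribution p(x, y)

def H(p):
    """Entropy in bits of a pmf given as an array (zero entries contribute 0)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
I = H(p_x) + H(p_y) - H(p_xy)                            # Equation 1.13
D = np.sum(p_xy * np.log2(p_xy / np.outer(p_x, p_y)))    # Equation 1.11
assert np.isclose(I, D)   # both formulas agree
print(I)
```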

In this dissertation, we also use the entropy and mutual information of more than two random variables. We now define the chain rules used to calculate the entropy related functions of more than two random variables.


Definition 1.7 (Chain rule for entropy) Let the random variables X1, X2, ..., Xn be drawn according to p(x1, x2, ..., xn). Then

H(X1, X2, ..., Xn) = Σ_{i=1}^{n} H(Xi | Xi−1, ..., X1).  (1.17)

Definition 1.8 The conditional mutual information of random variables X and Y given Z is

I(X; Y|Z) = H(X|Z) − H(X|Y, Z).  (1.18)

Definition 1.9 (Chain rule for mutual information)

I(X1, X2, ..., Xn; Y) = Σ_{i=1}^{n} I(Xi; Y | Xi−1, ..., X1).  (1.19)

The definition of information content can be extended to continuous random variables drawn from an infinite set. Let X be a continuous r.v. with probability density function f(x) and support set S.

Definition 1.10 The differential entropy is defined to be

h(X) = ∫_S f(x) log2 (1/f(x)) dx.  (1.20)

Below, we give numerical examples of the entropy and differential entropy of several probability distributions.

Example 1.1 (Binary distribution) The entropy of a r.v. X from the finite set X = {0, 1}, where p(0) = a and p(1) = 1 − a with 0 ≤ a ≤ 1, is

H(X) = a log2(1/a) + (1 − a) log2(1/(1 − a)) ≜ H(a).  (1.21)

The graph of H(a) versus a is shown in Figure-1.2. Note that H(a) is maximized at a = 1/2.

Example 1.2 (Uniform distribution) Consider a random variable distributed uniformly between 0 and a, as seen in Figure-1.3. Then its differential entropy is

h(X) = ∫_0^a (1/a) log2 a dx = log2 a.  (1.22)


Figure 1.2: Binary entropy function H(a) versus a.

Figure 1.3: Uniform density function p(x) versus x where p(x) = 1/a for 0 ≤ x ≤ a.

Example 1.3 (Gaussian distribution) Consider a random variable with a Gaussian distribution X ∼ N(0, P), hence having probability density function f(x) = (1/√(2πP)) exp(−x^2/2P). Then its differential entropy is

h(X) = −∫_{−∞}^{∞} f(x) log2 f(x) dx = (1/2) log2 (2πeP).  (1.23)

Remark 1.1 Among all probability density functions with variance P, the Gaussian distribution has the greatest differential entropy.

1.3 Causality

A system is called causal if its output depends only on its past and present inputs. Otherwise, if the output also depends on future inputs, the system is said to be noncausal. In this dissertation, we focus on noncausal systems.


1.4 Source Coding

Definition 1.11 We define the rate distortion function of the discrete memoryless system in Figure-1.4, with fidelity criterion d(X, X̂) ≤ D, as

R(D) = min_{p(x̂|x): E{d(X,X̂)} ≤ D} I(X; X̂),  (1.24)

where the minimum is taken over all conditional distributions p(x̂|x) for which the joint distribution p(x, x̂) satisfies the expected distortion constraint.

Figure 1.4: A compression system.

Thus, the rate-distortion function gives the minimum rate R needed to compress the input with a maximum distortion level D.

1.5 Channel Coding

Definition 1.12 We define the channel capacity of the discrete memoryless system in Figure-1.5 as

C = max_{p(x)} I(X; Y),  (1.25)

where the maximum is taken over all possible input distributions.

The operational meaning of the channel capacity is that it is the highest rate, in bits per channel use, at which information can be sent with arbitrarily low error probability.

Figure 1.5: A communication system.

1.6 Distributed Source Coding

In the area of compression of correlated multi-sources, Slepian and Wolf (1973) showed that separate encoding of each source with joint decoding at the receiving end incurs no rate loss with respect to a joint encoding and joint decoding system. Even with separate encoding, the joint decoder can exploit the correlation between the sources. The idea is that each separate encoder partitions the possible inputs into random subsets and sends only the index of the subset (known as the syndrome), and the decoder uses channel coding principles in order to estimate the sources from their syndromes. Motivated by the idea of developing low complexity encoders for low-power handhelds, in this dissertation we propose to transmit the parity checks of a high performance error correcting code such as LDPC for the separate compression of correlated sources.

1.7 Writing on Dirty Paper

Costa (1983) introduced the terminology “Writing on Dirty Paper” for the problem of coding with state information at the encoder. The encoder communicates with the decoder using a signal X with a limited power P and tries to send a message M, given that the state information S of the channel is accessible only to the encoder. Costa showed that the non-availability of the state information to the decoder does not affect the capacity of the system. The state information plays the role of dirt: instead of canceling out this dirt with its limited power P, the encoder can use its power in the direction of the dirt, and can achieve the same capacity as when the state information is accessible to the decoder. An auxiliary variable U is used for encoding, such that U = X + αS, where X is the output of the encoder and α is a multiplicative constant between 0 and 1. If α is chosen as P/(P + N), the rate of the communication is maximized. It is then enough to send the appropriate X by shifting αS to the closest U value indexed by the message M.

Chen and Wornell (1998) and Cox et al (1999) were the first to realize that this setup can be used to determine the capacity of the blind watermarking problem. In this dissertation, highly motivated by this setup, we investigate the capacity of a system where the state information is partially available to the decoder. Moreover, a practical code design for writing on dirty paper will be proposed.

1.8 Message Passing Algorithm

Message passing is a simple and powerful algorithm that is used to solve diverse research problems, from counting problems to marginalization problems. Since it is fundamental both to belief propagation for LDPC decoding and to the BCJR algorithm for turbo codes, we illustrate it with a simple counting problem on a straight line (see Figure-1.6). Instead of dedicating one person to count the whole group, the head and the tail of the line each send the message "1" to their neighbor. If a person receives a message from only one of his neighbors, he adds 1 to the message and transmits it to his other-side neighbor. If he receives the messages of both of his neighbors, the size of the line can be found as the sum (left message + right message + 1). If there is no loop, the message passing algorithm converges to the exact solution. The details of the BCJR (backward-forward) algorithm and of belief propagation can be found in the following sections.

Figure 1.6: Counting problem on a straight line.
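The counting scheme of Figure-1.6 can be sketched in a few lines of Python; the chain below is a toy stand-in for the line of people, not code from this dissertation:

```python
# Messages start at both ends of the line with value 1 and grow by 1 at every
# hop; a person holding both incoming messages announces left + right + 1.
def line_size_via_messages(n):
    left, right = {}, {}
    for i in range(1, n):                   # forward pass from the head
        left[i] = 1 if i == 1 else left[i - 1] + 1
    for i in range(n - 2, -1, -1):          # backward pass from the tail
        right[i] = 1 if i == n - 2 else right[i + 1] + 1
    i = n // 2                              # any interior person knows the total
    return left.get(i, 0) + right.get(i, 0) + 1

assert line_size_via_messages(7) == 7
```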

1.9 Trellis Coded Quantization (TCQ)

Trellis Coded Quantization (TCQ) is a limit-achieving vector quantization method proposed by Marcellin and Fischer (1990). It uses the set partitioning idea of Ungerboeck (1982)'s Trellis Coded Modulation (TCM). Let X^n be a random n-vector (X1, ..., Xn) where each element is i.i.d. with probability density function P(X). We want to quantize this vector at m bits per sample, hence to transmit one of 2^m symbols per sample. The basic idea of TCQ is that the elements of the quantized data Y^n constitute a Markov sequence; X^n is viewed as the output of a noisy channel whose input is Y^n, and the aim is to find the sequence Y^n that is most probable given X^n. First, the number of possible symbols is doubled to 2^{m+1} and then partitioned into 2^{k+1} subsets, where k ≤ m. TCQ uses a rate k/(k + 1) convolutional code to expand k input bits to k + 1 bits that select one of the 2^{k+1} subsets, and uses the remaining m − k bits to select one of the 2^{m−k} symbols in the selected subset. Then, by minimizing the MSE between X^n and the possible sequences Y^n, Y^n is found. Using a convolutional code with set partitioning gives better performance than conventional techniques. The min-sum algorithm, also known as the Viterbi algorithm, can be applied to find the most probable sequence Y^n.

We now give a brief example of a trellis and of TCQ with a rate 1/2 convolutional code with memory 2. A systematic recursive convolutional code with generator matrix (011, 101) in binary form, or (3, 5) in octal digits, can be seen in Figure-1.7. The blocks D are unit-time delay elements.

The convolutional code in Figure-1.7 corresponds to the state diagram in Figure-1.8, where the states are given by the 2-bit memory and the state transitions are described by the arrows, marked with the corresponding input/output pairs i_k / y_{1,k} y_{0,k}. The output sequences are mapped to 4 reconstruction levels D0, D1, D2 and D3.

Figure 1.7: A rate 1/2 recursive systematic convolutional code with memory 2 and generator matrix (3, 5) in octal digits.

For instance, for 2 bits per sample, the reconstruction levels are doubled, as seen in Figure-1.9. For time instant i and each possible subset D_k, 0 ≤ k ≤ 3, the MSE cost (X_i − D_k)^2 of selecting the closest element within the subset is calculated. Starting from state 00 at time 0, one input bit chooses one of the 2 possible subsets. Then, using the Viterbi algorithm as explained in Chapter-1.9.1, the most probable path p_1^n is found. When at time t the path p_t selects one of the four dictionaries D_k through the convolutional code, one extra bit is needed to choose the index of the element within the sub-dictionary D_k.

1.9.1 Viterbi algorithm

The Viterbi algorithm, also known as the min-sum algorithm (Viterbi, 1967), finds the most probable sequence among the valid codewords. For all time instants t = 1, ..., n and all possible output levels k = 1, ..., 4, the MSE cost (X_t − D_k)^2 of each output is calculated. Initializing the cost of state 0 at t = 0 as 0 and that of the other states as ∞, each node transmits its current state cost plus the cost of the chosen arc. In the next step, each node chooses the minimum cost message among the messages it receives and sends it to the next time step. At the end, the minimum cost over all codewords is found. Finally, the most probable path, which minimizes the word error, is found by back-tracing the minimum sum path. For instance, Figure-1.10 shows all the possible paths of a trellis of length 4; the Viterbi algorithm searches for the minimum cost path among all these paths.
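A minimal Python sketch of this min-sum search is given below. The 4-state next-state and output tables are illustrative placeholders (they do not reproduce the exact trellis of Figure-1.8), but the dynamic program is the one just described:

```python
import numpy as np

# Hypothetical trellis tables: nxt maps (state, bit) to the next state,
# outk maps (state, bit) to the index of a reconstruction level in D.
nxt  = {(s, b): ((s << 1) | b) & 3 for s in range(4) for b in (0, 1)}
outk = {(s, b): 2 * b + (s & 1)   for s in range(4) for b in (0, 1)}
D = np.array([-1.5, -0.5, 0.5, 1.5])   # illustrative reconstruction levels

def viterbi_quantize(x):
    INF = float("inf")
    cost  = [0.0, INF, INF, INF]       # start in state 0, others impossible
    paths = [[] for _ in range(4)]     # survivor sequence per state
    for xt in x:
        new = [(INF, None)] * 4
        for s in range(4):
            if cost[s] == INF:
                continue
            for b in (0, 1):           # branch metric = MSE of the branch level
                s2 = nxt[(s, b)]
                c = cost[s] + (xt - D[outk[(s, b)]]) ** 2
                if c < new[s2][0]:
                    new[s2] = (c, paths[s] + [D[outk[(s, b)]]])
        cost  = [c for c, _ in new]
        paths = [p for _, p in new]
    best = int(np.argmin(cost))        # survivor of min cost = most probable Y^n
    return paths[best], cost[best]

y, mse = viterbi_quantize([0.3, -1.2, 0.9, 1.4])
```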

1.9.2 BCJR

While the Viterbi algorithm is a maximum-likelihood decoding method that minimizes the probability of word (sequence) error, Bahl et al (1974) proposed an algorithm, also known as BCJR, which minimizes the symbol error probability. Borrowing from the message passing algorithm, BCJR calculates the probability of each symbol given the observed sequence.

The state transitions of the Markov source are governed by the transition probabilities

pt(m|m′) = Pr{St = m|St−1 = m′},


Figure 1.8: State transition of the recursive systematic convolutional code (1, 3) inoctal digits.

Figure 1.9: Output points and corresponding partitions for 2 bits per sample.

and the outputs by the probabilities

q_t(X|m′, m) = Pr{x_t = X | S_t = m, S_{t−1} = m′},

for 1 ≤ t ≤ τ. Since the output is deterministic given the previous and current states, q_t(X|m′, m) takes only the values 0 or 1, depending on whether that transition is possible.

The decoder receives the sequence Y_1^τ and tries to estimate the a posteriori transition probabilities given the observation Y_1^τ, i.e.

Pr{S_{t−1} = m′; S_t = m | Y_1^τ} = Pr{S_{t−1} = m′; S_t = m; Y_1^τ} / Pr{Y_1^τ}.  (1.26)

For this purpose, it is more convenient to estimate the quantity σ_t(m′, m) = Pr{S_{t−1} = m′; S_t = m; Y_1^τ}.


Figure 1.10: Viterbi decoding of a vector with length 4.

Let us define the probability functions

α_t(m) = Pr{S_t = m; Y_1^t},
β_t(m) = Pr{Y_{t+1}^τ | S_t = m},
γ_t(m′, m) = Pr{S_t = m; Y_t | S_{t−1} = m′},
σ_t(m′, m) = Pr{S_{t−1} = m′; S_t = m; Y_1^τ} = α_{t−1}(m′) · γ_t(m′, m) · β_t(m).

Now,

γ_t(m′, m) = Σ_{U=0}^{1} Pr{S_t = m | S_{t−1} = m′} · Pr{u_t = U | S_t = m, S_{t−1} = m′} · Pr{Y_t | U}
           = Σ_{U=0}^{1} p_t(m|m′) · q_t(U|m′, m) · R(Y_t|U)  (1.27)

is calculated for each possible transition and for t = 1, 2, ..., τ, where R(Y_t|U) is the appropriate symbol transition probability of the channel.

Then, for t = 1, 2, ..., τ,

α_t(m) = Σ_{m′=0}^{M−1} Pr{S_{t−1} = m′; S_t = m; Y_1^t}
       = Σ_{m′=0}^{M−1} Pr{S_{t−1} = m′; Y_1^{t−1}} · Pr{S_t = m; Y_t | S_{t−1} = m′, Y_1^{t−1}}
       = Σ_{m′=0}^{M−1} Pr{S_{t−1} = m′; Y_1^{t−1}} · Pr{S_t = m; Y_t | S_{t−1} = m′}
       = Σ_{m′=0}^{M−1} α_{t−1}(m′) · γ_t(m′, m).  (1.28)

The boundary conditions of α_0(m) at t = 0 are

α_0(0) = 1;  α_0(m) = 0 for m ≠ 0.  (1.29)


Similarly, for t = 1, 2, ..., τ − 1,

β_t(m) = Σ_{m′=0}^{M−1} Pr{S_{t+1} = m′; Y_{t+1}^τ | S_t = m}
       = Σ_{m′=0}^{M−1} Pr{S_{t+1} = m′; Y_{t+1} | S_t = m} · Pr{Y_{t+2}^τ | S_{t+1} = m′}
       = Σ_{m′=0}^{M−1} β_{t+1}(m′) · γ_{t+1}(m, m′).  (1.30)

The boundary condition for β_τ is

β_τ(m) = 1/M,  (1.31)

since the termination state probability is equally distributed over all M possible states.

Finally, σ is calculated as

σ_t(m′, m) = Pr{S_{t−1} = m′; Y_1^{t−1}} · Pr{S_t = m; Y_t | S_{t−1} = m′} · Pr{Y_{t+1}^τ | S_t = m}
           = α_{t−1}(m′) · γ_t(m′, m) · β_t(m).  (1.32)

The recursive calculation of σ_t(m′, m) can be done in the 4 steps given below.

1. Initialize α_0(m) and β_τ(m).

2. Calculate γ_t(m′, m) and α_t(m) for all t = 1, 2, ..., τ and all possible transitions.

3. Recursively compute β_t(m).

4. Compute σ_t(m′, m).

The pseudo-code of the BCJR algorithm can be found in Algorithm-1.

1.10 Low Density Parity Check (LDPC) Codes

Low Density Parity Check (LDPC) codes were first proposed by Gallager (1963) and reinvented by Mackay and Neal (1997). A rate k/n linear binary (n, k) LDPC code is a block code defined by an (n − k) × n sparse parity check matrix H, which has a small number of 1s in each row and column (see for instance Equation 1.33). Another representation of the parity check code is by its bipartite graph (see Figure 1.11) (Mackay, 2003).


Algorithm 1 BCJR Algorithm

Require: The received vector Y^τ = {Y_1, Y_2, ..., Y_τ}
Ensure: σ_t(m′, m)
  Initialize α_0(m) and β_τ(m) according to Equations 1.29 and 1.31
  while t : 1 ≤ t ≤ τ do
    calculate γ_t(m′, m) as in Equation 1.27
    calculate α_t(m) as in Equation 1.28
  end while
  while t : τ − 1 ≥ t ≥ 1 do
    calculate β_t(m) as in Equation 1.30
  end while
  while t : 1 ≤ t ≤ τ do
    calculate σ_t(m′, m) as in Equation 1.32
  end while
  Return σ_t(m′, m)
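Assuming the γ_t(m′, m) terms of Equation 1.27 have already been computed from the trellis and the channel likelihoods R(Y_t|U), the forward-backward recursions of Algorithm 1 can be sketched in vectorized Python as follows (the per-step normalization is a standard numerical-stability device, not part of the original algorithm):

```python
import numpy as np

def bcjr_posteriors(gamma):
    # gamma[t-1][m_prev, m] plays the role of gamma_t(m', m) of Equation 1.27
    tau, M, _ = gamma.shape
    alpha = np.zeros((tau + 1, M)); alpha[0, 0] = 1.0        # Equation 1.29
    beta  = np.zeros((tau + 1, M)); beta[tau, :] = 1.0 / M   # Equation 1.31
    for t in range(1, tau + 1):                              # forward, Eq. 1.28
        alpha[t] = alpha[t - 1] @ gamma[t - 1]
        alpha[t] /= alpha[t].sum()                           # scaling
    for t in range(tau - 1, -1, -1):                         # backward, Eq. 1.30
        beta[t] = gamma[t] @ beta[t + 1]
        beta[t] /= beta[t].sum()
    sigma = alpha[:-1, :, None] * gamma * beta[1:, None, :]  # Equation 1.32
    return sigma / sigma.sum(axis=(1, 2), keepdims=True)     # posterior, Eq. 1.26

rng = np.random.default_rng(0)
g = rng.random((5, 4, 4))   # made-up likelihood-weighted transition terms
post = bcjr_posteriors(g)   # post[t-1, m', m] ~ Pr{S_{t-1}=m', S_t=m | Y}
```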

H = [ 1 0 1 0 1 0 0 0 0 0
      1 1 0 1 0 1 0 0 0 0
      0 1 0 0 1 0 0 0 1 1
      0 0 1 1 0 0 0 1 1 0
      0 0 0 0 0 1 1 1 0 1 ].  (1.33)

An ensemble of LDPC codes is described by the degree distribution polynomials λ(x) and ρ(x) (Richardson et al, 2001; Chung, 2000). λ(x) is given as

λ(x) = Σ_i λ_i x^{i−1},  (1.34)

and ρ(x) is defined as

ρ(x) = Σ_j ρ_j x^{j−1},  (1.35)

where λ_i is the fraction of edges incident on degree-i bit nodes and ρ_j is the fraction of edges incident on degree-j check nodes. A code is said to be regular (w_c, w_r) if the degree polynomials are λ(x) = x^{w_c−1} and ρ(x) = x^{w_r−1}. The rate of an LDPC code with a given pair of degree profiles is bounded by

R ≥ 1 − ( ∫_0^1 ρ(x) dx ) / ( ∫_0^1 λ(x) dx ),  (1.36)

with equality if and only if the rows of the parity check matrix are linearly independent.


Figure 1.11: Bipartite graph representation of the parity check matrix H.

1.10.1 Decoding with belief propagation

The transmitter sends a codeword x such that Hx = 0. The receiver receives the vector y with transition probability p(y|x). The aim of the decoder is to find the maximum likelihood codeword x_ML = arg max_x p(y|x).

If the graph of H does not include cycles, the sum-product algorithm converges to the exact solution (Pearl, 1988). The sum-product algorithm on the bipartite graph is given below. We use the following notation, also shown in Figure 1.12.

1.10.1.1 Definitions

• The set of bits n that participate in check m is N(m) ≡ {n : H_{mn} = 1}. For example, N(1) ≡ {1, 3, 5, 7} in Figure 1.11.

• The set of checks in which bit n participates is M(n) ≡ {m : H_{mn} = 1}. For example, M(1) ≡ {1, 2} in Figure 1.11.


Figure 1.12: Belief propagation on bipartite graph H.

• N(m)\n is the set N(m) with bit n excluded.

• q^x_{mn} is the probability that the n-th bit of the vector x equals x, given the information obtained via checks other than check m.

• r^x_{mn} is the probability that check m is satisfied if bit n of x is considered fixed at x and the other bits follow q_{mn′} : n′ ∈ N(m)\n.

• δq_{mn} is the difference between the probabilities that the n-th bit of x is 0 and that it is 1, given the information obtained via checks other than check m: δq_{mn} = q^0_{mn} − q^1_{mn}.

• δr_{mn} is the probability that check m is satisfied if bit n of x is 0 minus that if bit n of x is 1: δr_{mn} = r^0_{mn} − r^1_{mn}.

1.10.1.2 Initialization

Depending on the vector y received from the channel and on the channel model, the likelihood probabilities p(x_n|y) are calculated for each bit n. For instance, for a memoryless binary symmetric channel with crossover probability ρ, p(x_1 = 0|y_1 = 0) = 1 − ρ and p(x_1 = 1|y_1 = 0) = ρ.

The q^0_{mn} and q^1_{mn} values are initialized with the corresponding likelihood probabilities received from the channel, such that q^0_{mn} = p(x_n = 0|y) and q^1_{mn} = p(x_n = 1|y). Then each variable node sends the message δq_{mn} to its connected checks.

1.10.1.3 Check node iteration

Each check node i sends a message r^a_{ij} to the connected bit j, which is an approximation to the probability that check i is satisfied given that symbol j equals a:

r^a_{ij} = Pr{check i satisfied | x_j = a}.  (1.37)

In particular,

r^0_{mn} ≈ Σ_{x_{n′}: n′ ∈ N(m)\n} p( Σ_{z ∈ N(m)} x_z = 0 mod 2 | x_n = 0 ) · Π_{n′ ∈ N(m)\n} q^{x_{n′}}_{mn′}.  (1.38)

There is a shortcut for calculating r^a_{ij}, by first calculating δr_{mn}:

δr_{mn} = Π_{n′ ∈ N(m)\n} δq_{mn′},  (1.39)

where r^0_{mn} = (1/2)(1 + δr_{mn}) and r^1_{mn} = (1/2)(1 − δr_{mn}). The δr_{mn} can be calculated efficiently by using the backward-forward algorithm of Bahl et al (1974).

1.10.1.4 Variable node iteration

In this step, the q^0_{mn} and q^1_{mn} values are calculated by using the output of the check node iteration:

q^0_{mn} = α_{mn} p(x_n = 0|y) Π_{m′ ∈ M(n)\m} r^0_{m′n},  (1.40)

and

q^1_{mn} = α_{mn} p(x_n = 1|y) Π_{m′ ∈ M(n)\m} r^1_{m′n},  (1.41)

where α_{mn} is a normalization factor such that q^0_{mn} + q^1_{mn} = 1.

1.10.1.5 Final Guess

Posterior probabilities of each bit can be calculated as

q^0_n = α_n p(x_n = 0|y) Π_{m ∈ M(n)} r^0_{mn},  (1.42)

and

q^1_n = α_n p(x_n = 1|y) Π_{m ∈ M(n)} r^1_{mn}.  (1.43)


The estimate x̂ can be found by simply thresholding the posterior probabilities:

x̂_n = arg max_i q^i_n.  (1.44)

To check whether a codeword has been decoded, we verify that all check nodes are satisfied, i.e. Hx̂ = 0 mod 2. If x̂ is not a codeword, the check-node and variable-node iterations are repeated. The iterations halt either when a codeword is found or when a maximum number of iterations is reached.
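The whole decoding loop can be summarized by the following Python sketch for a binary symmetric channel, written with explicit loops for readability rather than efficiency; H, y and ρ below are toy inputs, not from this dissertation:

```python
import numpy as np

def decode_bp(H, y, rho, max_iter=50):
    m, n = H.shape
    p0 = np.where(y == 0, 1 - rho, rho)            # p(x_n = 0 | y_n) for a BSC
    dq = np.where(H == 1, 2 * p0 - 1, 0.0)         # delta q_mn = q0 - q1
    xhat = y.copy()
    for _ in range(max_iter):
        dr = np.zeros_like(dq)                     # check nodes, Equation 1.39
        for i in range(m):
            idx = np.flatnonzero(H[i])
            for j in idx:
                dr[i, j] = np.prod(dq[i, idx[idx != j]])
        r0, r1 = (1 + dr) / 2, (1 - dr) / 2
        q0n, q1n = p0.copy(), 1 - p0               # posteriors, Eqs. 1.42-1.43
        for j in range(n):
            checks = np.flatnonzero(H[:, j])
            q0n[j] *= np.prod(r0[checks, j]); q1n[j] *= np.prod(r1[checks, j])
            for i in checks:                       # variable nodes, Eqs. 1.40-1.41
                a0 = p0[j] * np.prod(r0[np.setdiff1d(checks, i), j])
                a1 = (1 - p0[j]) * np.prod(r1[np.setdiff1d(checks, i), j])
                dq[i, j] = (a0 - a1) / (a0 + a1)   # normalized so q0 + q1 = 1
        xhat = (q1n > q0n).astype(int)             # thresholding, Equation 1.44
        if not np.any(H @ xhat % 2):               # halt if a codeword is found
            break
    return xhat

H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 1]])
y = np.array([0, 1, 1, 0, 1, 1])
print(decode_bp(H, y, rho=0.07))
```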

1.10.2 Encoding

Assume an (n − k) × n sparse parity check matrix H in systematic form, H = [P | I_{n−k}], where P has dimension (n − k) × k and I is the identity matrix. Then the corresponding generator matrix G is simply the n × k dense matrix G = [I_k | Pᵗ]ᵗ. Hence, from a k-bit input vector t, a length-n codeword vector x is obtained by a simple matrix product, x = G · t. The method of Richardson and Urbanke (2001b) can be used for fast encoding of LDPC codes.

1.10.3 Performance of 1/2 LDPC codes

In this part, we evaluate the error correcting capacity of rate 1/2 binary LDPC codes for various block lengths and degree distribution polynomials.

Let x be an n/2-length binary string with i.i.d. Bernoulli(1/2) bits. Using a rate 1/2 LDPC coder, x is coded as an n-bit vector r, which is then modulated to R using 2-level Pulse Amplitude Modulation (PAM) as

R_i = { −√Q, if r_i = 0,
        +√Q, if r_i = 1. }  (1.45)

Then the AWGN channel outputs Y = R + Z, where Z is an i.i.d. r.v. ∼ N(0, N). The decoder initializes the likelihood ratio as

p(r_i = 1|Y_i) / p(r_i = 0|Y_i) = f_N(Y_i − √Q) / f_N(Y_i + √Q)
 = exp( −(Y_i − √Q)^2 / 2N + (Y_i + √Q)^2 / 2N )
 = exp( 2 Y_i √Q / N ),  (1.46)

where f_N denotes the probability density function of a Gaussian distribution with mean 0 and variance N. Belief propagation decoding is then done as explained in Chapter-1.10.1. Performance comparisons of the decoding error rates are given in Richardson et al (2001) for a (3, 6) regular LDPC code, a turbo code and an optimized irregular LDPC code (see Figure 1.13). Please note that the comparison is made for a code length of 10^6 for all codes.
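In practice, the initialization of Equation 1.46 is usually carried out in the log domain, where the log-likelihood ratio is simply linear in Y_i; a one-line sketch:

```python
import numpy as np

def llr_2pam_awgn(Y, Q, N):
    # log p(r=1|Y) - log p(r=0|Y) = 2*sqrt(Q)*Y / N, from Equation 1.46
    return 2.0 * np.sqrt(Q) * Y / N

print(llr_2pam_awgn(np.array([0.8, -1.1]), Q=1.0, N=0.5))
```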

Figure 1.13: Performance comparison of the error rates of a (3, 6) regular LDPC code, a turbo code and an optimized irregular LDPC code. The channel is binary-input additive white Gaussian noise.

1.10.4 A visual example

In this section we give a visual example of LDPC coding, using a black and white cartoon image as the input binary string to be coded. Let the image in Figure-1.14(a) be a binary string that needs to be transmitted through a noisy channel. This 100 × 100 cartoon is composed of 1s and 0s that correspond to black and white pixels respectively. We add redundancy in order to detect and correct the erroneous bits during the transmission. A rate 1/2 systematic regular LDPC code with the degree polynomials λ(x) = x^2 and ρ(x) = x^5 is used to code the original image (each information bit participates in 3 checks, and each check bit is calculated as the sum of 6 information bits). The encoded image with its redundancy bits can be seen in Figure-1.14(b). Afterwards, the encoded bits are transmitted through a Binary Symmetric Channel with crossover probability p(BSC) = 0.07. The decoder receives the noisy image in Figure-1.14(c) and uses the belief propagation method explained in Chapter-1.10.1, taking into account that the a priori probability of the systematic bits is known to be P(x = 1) = 0.2445 and that the channel characteristic is p(BSC) = 0.07.

The output of the belief propagation is shown in Figure-1.15 after 1 iteration (a), after 5 iterations (b), and after 10 iterations (c).

1.11 Conclusion

Figure 1.14: LDPC coding example. Cartoon copyright © 2007 piyalemadra.com, used with permission. (a) Original binary cartoon of size 100 × 100, with 0s corresponding to white and 1s to black pixels. The ratio between the number of black pixels and the total number of pixels is 0.2445. (b) Visualization of the cartoon coded with a rate 1/2 systematic LDPC code, such that the output of the encoder contains the original image and its parity checks of size 100 × 100. (c) During the transmission, both the cartoon and its parity check bits are exposed to bit errors, such that the error probability of a received bit is 0.07.

This chapter has introduced both the theoretical and the practical tools that will be used in this dissertation. The entropy and mutual information definitions will be used in the capacity calculations of the proposed systems in Chapter-4 and Chapter-5. Furthermore, the high performance channel code LDPC and the high performance source code TCQ will be utilized in the design of our practical codes for data hiding and Slepian-Wolf source coding in the following chapters.


Figure 1.15: LDPC decoding. (a) After 1 iteration. (b) After 5 iterations. (c) Theoriginal cartoon is decoded without any error after 10 iterations.


Part II

Contributions


Chapter 2

Distributed Source Coding

Contents

2.1 Introduction
    2.1.1 List of Symbols
2.2 Theoretical Background
    2.2.1 Slepian-Wolf Coding of Discrete Sources
    2.2.2 Wyner-Ziv Theorem
2.3 Related Works
    2.3.1 Code Design for Slepian-Wolf Coding
        2.3.1.1 Convolutional Codes
        2.3.1.2 Turbo Codes
        2.3.1.3 LDPC Codes
    2.3.2 Code Design for Wyner-Ziv Coding
2.4 Practical Code Design
    2.4.1 Input Constraints and Theoretical Correlation Noise Analysis for a Given Rate
    2.4.2 LDPC Code Generation and Coset Index Calculation
    2.4.3 Modified Sum Product Algorithm
    2.4.4 Experimental Setup and Performance Analysis
2.5 Practical Application for Still-Image Coding
    2.5.1 Side Information
        2.5.1.1 Coset Creation
        2.5.1.2 Iterative Decoding
    2.5.2 Experimental Results
2.6 Conclusion


A practical code design for the Slepian-Wolf setup is proposed. Based on LDPC binning techniques, a performance only 0.08 bits/channel use away from the theoretical limits is achieved. The system is applied to a still image coding scheme in which the decoder has access to the low-pass wavelet coefficients of the image; a complementary coding based on DSC principles is done for the refinement of the side information at the decoder.¹

2.1 Introduction

Slepian and Wolf (1973) derived the achievable rate region for the problem of lossless source coding with side information. Wyner and Ziv (1976) later derived the rate distortion function for such a system. In the early 2000s, a potential application of the Slepian-Wolf and Wyner-Ziv theorems was realized: the compression complexity burden on low-power devices can be shifted to the decoder side, and practical code designs have been proposed based on channel coding principles. In this chapter, we introduce the recent work on constructing practical codes for source coding with side information using the framework of LDPC codes. The organization of the chapter is as follows. In Chapter-2.2, the details of the two stimulating theorems for DSC are given: the Slepian-Wolf theorem for lossless compression of correlated sources, and the Wyner-Ziv theorem for the rate-distortion function of a source where a correlated version of it is available only to the decoder. The prior work on designing practical codes in this area is given in Chapter-2.3. Chapter-2.4 gives the details of our LDPC-based code design and compares our setup with the existing DSC systems. Finally, the application of our code design to the compression of still images is proposed in Chapter-2.5, covering the distributed source coding of a still image given that the low-pass wavelet coefficients are available to the decoder.

2.1.1 List of Symbols

A list of the symbols used in this chapter can be found below.

X, Y        Two i.i.d. correlated input sources.
X̂, Ŷ        Estimations at the decoder.
H(X)        Entropy of X.
H(X, Y)     Joint entropy of X and Y.
H(X|Y)      Conditional entropy of X given Y.
R_X, R_Y    Achievable rates of X and Y.
D           Distortion level.
Π, Π⁻¹      Interleaver and deinterleaver.

¹The contents of this chapter have been presented partially in Dikici et al (2005) and Dikici et al (2006b).


2.2 Theoretical Background

2.2.1 Slepian-Wolf Coding of Discrete Sources

The Slepian-Wolf theorem states the admissible rate regions for coding two correlated i.i.d. sources X and Y drawn from a finite alphabet. The encoding and decoding of these two correlated sources depend upon the information available at the encoders and decoders. Figure-2.1 generalizes 16 different cases by simply switching on and off the 4 switches S1, S2, S3 and S4 (Slepian and Wolf, 1973). A state variable s_i is associated with switch S_i, taking the value 0 if the switch is open and 1 if it is closed. The quadruple {s1 s2 s3 s4} will be used to specify the settings of the switches. The cases vary in novelty and interest. For example, case {1111} has been known since Shannon: two correlated sources can be jointly compressed with a total rate of R_X + R_Y ≥ H(X, Y). The admissible regions of the cases {0011} and {0001}, however, are the most interesting results of the Slepian-Wolf theorem.

Figure 2.1: 16 Cases of correlated source coding.

Table-2.2 lists twelve theorems whose implications, in connection with Figure-2.2, give the admissible rate region R for the 16 cases. Certain lines and points in Figure-2.2 are labeled with the names of the theorems of Table-2.2. The admissible region of a setup is determined immediately from these lines and points together with Theorems f and g. The symbol x in the first column of Table-2.2 states that the theorem holds both when the corresponding switch is open and when it is closed.

For instance, in order to find the admissible region of the setting {1011}, Table-2.2 states that Theorems B, E, a, c, d, e, f, g apply. The first two show that R can extend neither below the line B nor below the line E of Figure-2.2. The next four show that the points a, c, d and e lie in R. Theorem f shows that the points above a on the R_Y axis and the points on B to the right of c lie in R. Finally, Theorem g shows that the line segment ac is in R (see Figure-2.3 for the rate region of the case {1011}).

Table 2.2: Achievable rate regions according to the Slepian-Wolf Theorem

s1 s2 s3 s4 | Theorem | Statement
It is necessary that:
0xxx | A | R_X ≥ H(X|Y)
x0xx | B | R_Y ≥ H(Y|X)
xx0x | C | R_Y ≥ H(Y)
xxx0 | D | R_X ≥ H(X)
xxxx | E | R_X + R_Y ≥ H(X, Y)
It is sufficient that (ε_X, ε_Y, ε_XY > 0):
1xx1 | a | R_X = 0, R_Y = H(X, Y) + ε_XY
x11x | b | R_X = H(X, Y) + ε_XY, R_Y = 0
xx1x | c | R_X = H(X) + ε_X, R_Y = H(Y|X) + ε_Y
xxx1 | d | R_X = H(X|Y) + ε_X, R_Y = H(Y) + ε_Y
xxxx | e | R_X = H(X) + ε_X, R_Y = H(Y) + ε_Y
Bit stuffing:
xxxx | f | (R_X, R_Y) ∈ R implies (R_X + δ_X, R_Y + δ_Y) ∈ R, for δ_X, δ_Y ≥ 0
Limited time sharing:
xxxx | g | If (R_X, R_Y) ∈ R and (R_X′, R_Y′) ∈ R with R_X + R_Y = H(X, Y) and R_X′ + R_Y′ = H(X, Y), then (R_X″, R_Y″) ∈ R, where R_X″ = λR_X + (1 − λ)R_X′ and R_Y″ = λR_Y + (1 − λ)R_Y′ for 0 ≤ λ ≤ 1.

For the setting {0011}, Theorems A, B, E, c, d, e, f, g hold. According to the first three theorems, R can extend neither to the left of line A nor below the lines E and B. The points c and e lie in R. Theorem f then shows that every point above d on A and every point to the right of c on B is also in R. By Theorem g, the line segment dc is in R. The region R of Subfigure-2.4(a) is thus established. The novelty of the Slepian-Wolf setup is that the minimum admissible regions of two separate encoders with a single decoder ({0011}) and of a single encoder with a single decoder ({1111}) overlap on the operating line segment dc. The admissible region of {0011} can be expressed as:

RX ≥ H(X|Y ) (2.1)

RY ≥ H(Y |X) (2.2)

RX + RY ≥ H(X, Y ) (2.3)
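To make the region concrete, here is a tiny Python check of Equations 2.1-2.3; the entropy values below are hypothetical, chosen only for illustration:

```python
# Hypothetical entropies: H(X|Y) = 0.5, H(Y|X) = 0.4, H(X,Y) = 1.6 bits.
def sw_admissible(Rx, Ry, HxGy=0.5, HyGx=0.4, Hxy=1.6):
    return Rx >= HxGy and Ry >= HyGx and Rx + Ry >= Hxy

assert sw_admissible(0.6, 1.0)       # inside the region
assert not sw_admissible(0.6, 0.5)   # violates the sum-rate bound (2.3)
```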

Figure 2.2: Lines and points of Table-2.2.

Figure 2.3: Admissible Slepian-Wolf rate region R for the case {1011}.

Moreover, the setting {0001} corresponds to “Coding with Side Information at the Decoder”, or “Distributed Source Coding”, where a source X is compressed such that a correlated version Y is accessible at the decoder. Table-2.2 shows that Theorems A, B, C, E, d, e, f, g all apply. Locating the lines A, B, C, E in Figure-2.2, R can extend neither to the left of A nor below C. The point d is in R; then, by Theorem f, all the points to the right of d on C and every point above d on A are in R (see Subfigure-2.4(b)).

2.2.2 Wyner-Ziv Theorem

Wyner and Ziv (1976) found the rate-distortion function of a source X given that correlated information Y is available at the encoder, at the decoder, or at both, where distortion is defined as a nonnegative function d(X, X̂) ≥ 0. As seen in Figure-2.5, two switches A and B control whether the side information Y is available to the encoder or the decoder. Wyner and Ziv analyzed the rate distortion of three cases:

Figure 2.4: Admissible Slepian-Wolf rate region R for the cases {0011} (a) and {0001} (b).

• Switches A and B are open, i.e. no side information:

Then classical Shannon theory yields

R_X(D) = min_{p(x̂|x): E{d(X,X̂)} ≤ D} I(X; X̂).  (2.4)

• Switches A and B are closed, i.e. both the encoder and the decoder have access to the side information:

In this case the rate distortion function is

R_{X|Y}(D) = min_{p(x̂|x,y): E{d(X,X̂)} ≤ D} I(X; X̂|Y).  (2.5)

• Switch A is open and B is closed, i.e. only the decoder has access to the side information:

Then Wyner and Ziv show that the rate is

R*_{X|Y}(D) = min_{p(z|x) p(x̂|y,z): E{d(X,X̂)} ≤ D} I(X; Z) − I(Y; Z).  (2.6)

Wyner and Ziv (1976) show that

R_{X|Y}(D) ≤ R*_{X|Y}(D) ≤ R_X(D).  (2.7)

For D = 0, the theorem is consistent with the Slepian-Wolf theorem, such that R_{X|Y}(0) = R*_{X|Y}(0) = H(X|Y).

Figure 2.5: Wyner-Ziv Setup.

Wyner and Ziv derived the rate distortion function for the binary symmetric case, where X is assumed to be the unbiased input of a BSC channel with crossover probability p_z, 0 ≤ p_z ≤ 0.5, and Y is the corresponding output. Y can be expressed as Y = X ⊕ Z, where Z is a Bernoulli(p_z) distributed binary string and ⊕ is addition in modulo 2 arithmetic. The rate distortion function R*_{X|Y} for the Hamming distance distortion measure is shown by Wyner and Ziv (1976) to be

R*_{X|Y}(D) = l.c.e. { H(p_z ∗ D) − H(D), (p_z, 0) },  0 ≤ D ≤ p_z,  (2.8)

where l.c.e. is the lower convex envelope, p_z ∗ D = p_z(1 − D) + D(1 − p_z), and H(λ) = −λ ln λ − (1 − λ) ln(1 − λ) is the binary entropy function defined in Chapter-1.2. As seen from the graph of R*_{X|Y} in Figure-2.6, R*_{X|Y} = H(p_z ∗ D) − H(D) for 0 ≤ D ≤ d_c, and R*_{X|Y} is a straight line segment between (d_c, H(p_z ∗ d_c) − H(d_c)) and (p_z, 0). Hence, if we define g(D) = H(p_z ∗ D) − H(D), then d_c is the solution of the equation

g(d_c) / (d_c − p_z) = g′(d_c),  (2.9)

where g′(d_c) is the derivative of g(D) with respect to D at the point d_c.
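The critical distortion d_c has no closed form, but Equation-2.9 is easy to solve numerically; the sketch below uses bisection with a finite-difference derivative, for p_z = 0.28 as in Figure-2.6:

```python
import numpy as np

pz = 0.28
Hb = lambda p: -p * np.log2(p) - (1 - p) * np.log2(1 - p)   # binary entropy
g  = lambda d: Hb(pz * (1 - d) + d * (1 - pz)) - Hb(d)      # g(D) of Eq. 2.9
gp = lambda d, h=1e-6: (g(d + h) - g(d - h)) / (2 * h)      # numeric g'(D)
f  = lambda d: g(d) - (d - pz) * gp(d)                      # root of f is d_c

lo, hi = 1e-4, pz - 1e-4          # bisection on (0, p_z); f changes sign there
for _ in range(60):
    mid = (lo + hi) / 2
    if f(lo) * f(mid) <= 0: hi = mid
    else: lo = mid
print((lo + hi) / 2)              # numerical estimate of d_c
```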

The graph of R_{X|Y}(D), the rate distortion curve when Y is accessible both at the encoder and at the decoder, can also be found in Figure-2.6. The analytic form of R_{X|Y}(D) is given in Cover and Thomas (1991):

R_{X|Y}(D) = { H(p_z) − H(D), 0 < D ≤ p_z,
               0,              D ≥ p_z. }  (2.10)

Hence, for the binary symmetric source case, R_{X|Y}(D) = R*_{X|Y}(D) at only two (R, D) points, (H(X|Y), 0) and (0, p_z); otherwise there is a rate loss, R_{X|Y}(D) < R*_{X|Y}(D).

A more interesting result is found, for coding with switch A open and B closed, in the continuous Gaussian case: there is no rate loss with respect to R_{X|Y}(D) for any value of D (Wyner and Ziv, 1976). Let X have an i.i.d. Gaussian distribution N(0, Q) and Y = X + Z, where Z is Gaussian i.i.d. with N(0, N) and independent of X (see Figure-2.7). Then the rate distortion function R*_{X|Y}(D) is equal to R_{X|Y}(D), which was calculated by Berger (1971):

R*_{X|Y}(D) = R_{X|Y}(D) = { (1/2) ln( QN / ((Q + N)D) ), 0 < D < QN/(Q + N),
                             0,                            D ≥ QN/(Q + N), }  (2.11)

Figure 2.6: Graph of R_{X|Y}(D), R*_{X|Y}(D), and H(p_z) − H(D) versus D for p_z = 0.28. For the binary symmetric case, R*_{X|Y}(D) has a rate loss with respect to R_{X|Y}(D) except at the points (H(p_z), 0) and (0, p_z), where there is no rate loss.

Figure 2.7: Wyner-Ziv Setup for Gaussian case.


because the term I(X; Z|X̂, Y) = 0 on the right-hand side of I(X; Z) − I(Y; Z) = I(X; X̂|Y) − I(X; Z|X̂, Y) in Equation-2.6 for the Gaussian case.

2.3 Related Works

In this section, we survey the existing practical code designs for the Slepian-Wolf coding problem. Starting from Wyner's proposition based on parity check codes, we present the state of the art techniques based on convolutional codes, turbo codes, punctured turbo codes and finally LDPC codes. Furthermore, we present the Wyner-Ziv lossy compression design as a quantization problem followed by Slepian-Wolf coding, and mention the existing practical code designs.

2.3.1 Code Design for Slepian-Wolf Coding

Slepian and Wolf (1973) proposed a coding scheme based on random binning in their proofs. However, because of its non-constructive nature, it is not applicable to practical code design. Wyner (1974) first proposed a coding scheme based on good parity-check codes for the {0001} setup of the Slepian-Wolf coding problem. The idea is to partition the codeword space into cosets using a "good" parity-check code H (good in the sense that the codewords in the same coset are as far apart as possible), and to transmit only the coset index s to the receiver. The receiver can then estimate the source from its coset index s and the correlated input Y. Hence the receiver tries to estimate X by assuming that Y is a noisy observation of X, and tries to eliminate the noise of Y by using the parity check information of X sent by the encoder.

The two n-bit binary source vectors X and Y can be modeled as Y = X ⊕ U, where ⊕ is the modulo-2 sum and U is an n-length binary string with Bernoulli(p1) distribution. Assume that an (n − k) × n parity check matrix H partitions the n-dimensional vector space into 2^{n−k} disjoint cosets. The code vectors of X must satisfy H · Xᵗ = 0. Decoding is done by calculating the syndrome of Y, s = H · Yᵗ = H · Uᵗ. Then, using a decoding function f(s), the decoder finds the error sequence and estimates X̂ = Y ⊕ f(s). The probability of error Pr{X̂ ≠ X} = Pr{f(H · Uᵗ) ≠ U} → 0 as n → ∞. For practical code designs using syndromes, however, one had to wait until the early 2000s.

2.3.1.1 Convolutional Codes

Pradhan and Ramchandran (1999) first used a channel coding technique known as DIstributed Source Coding Using Syndromes (DISCUS) for the Slepian-Wolf problem (setup {0001}). Borrowing from the set partitioning idea of Ungerboeck (1982)'s Trellis Coded Modulation (TCM), Pradhan and Ramchandran (1999) and Kusuma et al (2001) proposed a trellis-structured framework with 4 levels and various numbers of states. In order to obtain a 2:1 compression rate, a 2/3 systematic convolutional code is used: for an n-bit input X, the convolutional code outputs the n bits of X and an n/2-bit coset index s, and only s is sent through the channel. The decoder finds the sequence closest to Y within the received coset of X (see Figure-2.8).

Figure 2.8: 2 : 1 rate DSC compression using a 2/3 convolutional code.

Let us give an intuitive example for the binary case. Assume that X and Y are 3-bit binary strings, where the bits of X are drawn i.i.d. with Pr{Xi = 0} = Pr{Xi = 1} = 0.5. The correlated information Y is drawn such that the Hamming distance between X and Y is at most 1. For instance, given that X has the value 101, the possible sequences of Y are 001, 100, 101 and 111.

The entropy and the conditional entropy of X and Y are H(X) = 3, H(Y) = 3 and H(X|Y) = 2 bits. According to the S-W theorem, H(X|Y) bits per channel use are enough to transmit X without loss. First, let us assume the case where Y is accessible both to the encoder and to the decoder. Since the encoder has access to Y, it can code just the error pattern between X and Y, and the decoder can successfully decode X using this error pattern. There are four possible sequences of X given Y, hence two bits are sufficient to communicate without any loss, which achieves the rate H(X|Y).

However, in the Slepian-Wolf setup {0001}, Y is not accessible to the encoder. With a carefully designed parity check code, X can still be sent with H(X|Y) = 2 bits. Assume the 2 × 3 parity check matrix

H = [ 1 1 0
      1 0 1 ],  (2.12)

where each row defines a parity check equation as a modulo-2 summation of the input bits. Hence the syndrome bits are calculated as c1 = x1 ⊕ x2 and c2 = x1 ⊕ x3. The encoding of the sequence X = 101 is done by calculating the pair c1c2: c1 = 1 ⊕ 0 = 1 and c2 = 1 ⊕ 1 = 0, hence 10. The decoder has access to the side information Y and to the syndrome of X, and decodes the most probable sequence of X: it verifies whether both check equations are satisfied and, by changing at most 1 bit of Y, estimates X. For instance, for Y = 100 and the syndrome 10, the decoder checks whether c1 = y1 ⊕ y2 = 1 and c2 = y1 ⊕ y3 = 0. Since only c2 is not satisfied, flipping the value of the third bit of Y is enough to satisfy both equations, so X̂ = 101 is estimated without any error.
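This worked example can be reproduced in a few lines of Python (the decoder below simply tries all patterns of at most one flipped bit):

```python
# H from Equation 2.12; syndrome s = H x^t (mod 2).
H = [(1, 1, 0), (1, 0, 1)]
syndrome = lambda v: tuple(sum(h * b for h, b in zip(row, v)) % 2 for row in H)

def decode(y, s):
    # try Y and its three 1-bit flips; return the candidate matching the syndrome
    for flip in [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]:
        cand = tuple((a + b) % 2 for a, b in zip(y, flip))
        if syndrome(cand) == s:
            return cand

x = (1, 0, 1)
print(syndrome(x))                     # (1, 0): the transmitted coset index 10
print(decode((1, 0, 0), syndrome(x)))  # recovers (1, 0, 1) from Y = 100
```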

In Pradhan and Ramchandran (2000), a practical code design for the setup {0011} of the Slepian-Wolf problem was proposed, using two convolutional codes for compressing X and Y separately; hence it can operate not only at the corner points, as in the setups {0001} or {0010}, but also in the intermediate rate region of the setup {0011}.

2.3.1.2 Turbo Codes

Afterwards, more powerful channel coding techniques were employed for the coset construction. The turbo code, invented by Berrou et al (1993) and improved by Benedetto et al (1998); Tepe and Anderson (1998); Berrou and Glavieux (1996), was applied to the DSC problem by Garcia-Frias and Zhao (2001). Bajcsy and Mitran (2001a) used the parallel concatenation of finite state machines using the Latin squares proposed in (Bajcsy and Mitran, 2001b). Aaron and Girod (2002) used two parallel 4/5 rate systematic convolutional codes with an interleaver and transmitted the parity bits to obtain a 2:1 compression rate. After the calculation of the likelihood ratios of the input bits given the side information Y and the parity bits, X is estimated in an iterative manner using the MAP algorithm (see Figure-2.9). For a 2:1 compression rate, Aaron and Girod (2002) achieve lossless compression with a correlation noise entropy H(p1) ≤ 0.381, which corresponds to a gap of 0.154 with respect to the S-W limit.

Garcia-Frias and Zhao (2002) employed the puncturing concept of Acikel and Ryan (1997). Several other systems have been proposed using turbo codes (Chou et al, 2003; Liveris et al, 2002b, 2003b). Lajnef (2006) proposed a turbo coding scheme based on puncturing, obtaining a 2:1 compression rate by using two 2/3 rate parallel systematic convolutional codes with an interleaver. The overall system has n/2 + n/2 parity bits; by using a puncturing matrix, half of the parity bits are dropped and a compression rate of 2:1 is obtained. Using iterative SISO decoding, the system achieves lossless transmission with a correlation noise entropy H(p1) ≤ 0.4233, which is 0.0767 away from the S-W limit.

2.3.1.3 LDPC Codes

As described in detail in Chapter-1.10, the LDPC code is a powerful error correcting code invented by Gallager (1963), and reinvented and improved by Mackay and Neal (1997); Richardson et al (2001); Chung et al (2001a). Because of its good distance properties, Liveris et al (2002a) first used LDPC codes in the DSC field (see Figure-2.11). By using 2/3 rate irregular LDPC codes with long block lengths such as 10^6 and a compression rate of 2:1, Liveris et al (2002a) achieve lossless transmission with a correlation noise entropy H(p1) ≤ 0.466, which is 0.034 away from the S-W limit; this is so far the best error probability result obtained in the literature for a given correlation noise.

LDPC codes were then used in Schonberg et al (2002) for coding the general Slepian-Wolf problem ({0011}), by replacing the convolutional codes of Pradhan and Ramchandran (2000) with LDPC codes.

Varodayan et al (2005, 2006) also proposed S-W coding schemes based on LDPC Accumulate (LDPCA) and Sum LDPC Accumulate (SLDPCA) codes.

2.3.2 Code Design for Wyner-Ziv Coding

Figure 2.10: 2:1 rate DSC compression code design using two systematic 2/3 rate parallel concatenated convolutional codes and 1/2 rate puncturing matrices P.

Figure 2.11: 2:1 rate DSC compression using a systematic 2/3 rate LDPC code.

Since, for the Gaussian input case, the Wyner-Ziv theorem states that there is no rate loss whether or not the side information is accessible to the encoder, researchers have focused their efforts on the design of DSC codes close to the S-W limit. The state of the art practical designs treat the Wyner-Ziv problem as the concatenation of a good source code (quantization), which achieves good rate distortion performance, and a S-W coder, which achieves lossless compression with side information (see Figure-2.13). The input X is first quantized by a good source code such as TCQ (Marcellin and Fischer, 1990), nested lattices as in Zamir and Shamai (1998), or a Lloyd-Max based quantizer as in Rebollo-Monedero and Girod (2005). Then the quantized stream is coded with a S-W lossless coder based on a systematic turbo code (Aaron et al, 2003) or a systematic LDPC code (Liu et al, 2006). In Pradhan and Ramchandran (1999), with an 8-level Lloyd-Max quantization of the input source whose outputs are labelled into 4 subsets D0, D1, D2 and D3 (see Figure-2.12), their convolutional code based S-W coder described in Chapter-2.3.1.1 performs 7 dB away from the 1 bit/sample Wyner-Ziv distortion limit for a Correlation Signal-to-Noise Ratio (C-SNR) of 12 dB between X and Y.

Furthermore, the S-W and W-Z problems have been extended to three sources in Liveris et al (2003a) and Lajnef et al (2006).

Figure 2.12: Eight output points and the corresponding partitions into 4 subsets.

Moreover, the S-W and W-Z coding paradigms have been applied to video coding (Puri and Ramchandran, 2002; Puri et al, 2006; Girod et al, 2005; Aaron et al, 2003; Westerlaken et al, 2005; Liveris et al, 2002b), sensor networks (Xiong et al, 2004; Pradhan and Ramchandran, 2000; Kusuma et al, 2001; Pradhan et al, 2002), multiple description coding (Stankovic et al, 2007) and multiple-camera arrays (Zhu et al, 2003; Gehrig and Dragotti, 2004).

Figure 2.13: Wyner Ziv Coding as a concatenation of a good quantization code anda Slepian-Wolf Coder.

2.4 Practical Code Design

In this section, we describe our proposed LDPC based S-W coding scheme in detail. Based on LDPC coding for the syndrome calculations, we use a modified sum-product algorithm (belief propagation) for the decoding.

2.4.1 Input Constraints and Theoretical Correlation Noise Analysis for a Given Rate

Let $X^n = \{X_1, X_2, .., X_n\}$ be a sequence of n i.i.d. binary random variables with $\Pr\{X_i = 1\} = \Pr\{X_i = 0\} = 0.5$, noncausally available to the encoder. Similarly, let $U^n = \{U_1, U_2, .., U_n\}$ be a sequence of n i.i.d. binary random variables with $\Pr\{U_i = 1\} = 1 - \Pr\{U_i = 0\} = p_1$, where $0 \leq p_1 \leq 0.5$. The side information $Y^n = \{Y_1, Y_2, .., Y_n\}$, noncausally available to the decoder, is modeled as $Y_i = X_i \oplus U_i$, where $\oplus$ is the modulo-2 sum operation. The entropies of X and Y are $H(X) = H(Y) = H(0.5) = 0.5\log_2(2) + 0.5\log_2(2) = 1$ bit per channel use. The conditional entropy of X given Y is $H(X|Y) = H(U) = H(p_1) = p_1 \log_2(1/p_1) + (1-p_1)\log_2(1/(1-p_1))$ bit per channel use. Hence, for instance, for a fixed compression rate of $X^n$ of 1/2 bit per channel use, which corresponds to n/2 bits, the S-W theorem states that $\Pr\{X \neq \hat{X}\} \to 0$ for any correlation probability $p_1$ such that $H(X|Y) = H(p_1) \leq 1/2$. In our experiments, we fix the compression rate to 1/2 bit per channel use and find the maximum correlation level $p_1$ for an arbitrarily small probability of error such as $\Pr\{X \neq \hat{X}\} = 10^{-5}$.
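To make these limits concrete, the following minimal Python sketch (ours, not part of the original experiments) evaluates the binary entropy and numerically recovers the $p_1 \approx 0.11$ limit for rate 1/2:

```python
import numpy as np

def binary_entropy(p):
    """H(p) = p*log2(1/p) + (1-p)*log2(1/(1-p)), in bits per channel use."""
    p = np.clip(p, 1e-12, 1 - 1e-12)   # avoid log(0) at the endpoints
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

# Largest correlation level p1 with H(p1) <= 1/2, i.e. the S-W limit for rate 1/2:
grid = np.linspace(0.0, 0.5, 1_000_001)
print(grid[binary_entropy(grid) <= 0.5].max())   # ~0.110
```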

2.4.2 LDPC Code Generation and Coset Index Calculation

For these experiments, we generate the LDPC matrices using the degree polynomials found and distributed by the Communications Theory Lab (LTHC) at Ecole Polytechnique Fédérale de Lausanne (EPFL) (Amraoui et al, 2003). The degree distribution polynomials used in this dissertation can be found in Appendix-B.

In order to obtain a 2 : 1 compression rate for X, we use 2/3 rate systematic LDPC codes where, for n input bits, n/2 parity check bits are calculated (see Figure-2.14). The encoder discards the systematic bits and transmits only the n/2 parity bits. The decoder calculates the likelihood ratios and runs a modified Sum-Product Algorithm as explained in the following section.

Figure 2.14: Our proposed 2 : 1 rate DSC compression code design using LDPC codes.
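A minimal sketch of this encoding step, assuming the parity-check matrix has the systematic form H = [I | A] so that the transmitted bits are z = A·x mod 2 (the sparse A below is random and purely illustrative; the actual experiments use the EPFL/LTHC degree distributions of Appendix-B):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000                                          # toy block length for the sketch
A = (rng.random((n // 2, n)) < 0.01).astype(np.uint8)   # toy sparse parity part of H

def sw_encode(x, A):
    """2:1 S-W compression: keep only the n/2 parity bits z = A.x mod 2
    and discard the n systematic bits."""
    return (A.astype(np.int64) @ x.astype(np.int64)) % 2

x = rng.integers(0, 2, n, dtype=np.uint8)          # Bernoulli(0.5) source X^n
z = sw_encode(x, A)                                # the n/2 bits sent to the decoder
```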

2.4.3 Modified Sum Product Algorithm

The classical LDPC decoding is done by belief propagation, also known as the Sum-Product Algorithm, as described in Chapter-1.10. The algorithm is designed for a channel where every transmitted symbol is exposed to the same channel characteristics. However, in S-W coding using syndromes, not all the received data are exposed to the same channel: while there is correlation noise between X and Y, the syndrome of X sent by the encoder does not contain any error. Hence we modify the decoding algorithm for S-W, starting with the likelihood ratio calculations.

For a rate 2/3 systematic LDPC code, an n-bit input is coded into a total of 3n/2 bits, where n bits are the systematic input bits and the remaining n/2 bits are the parity bits z, which satisfy $0 = H \cdot R^t$ for the 3n/2-length vector $R = \{z_1, z_2, .., z_{n/2}, X_1, X_2, .., X_n\}$. The decoder receives the syndrome vector z and the side information Y = X ⊕ U as explained in Chapter-2.4.1. In Figure-2.14, the 3n/2 variable nodes correspond to the circles at the left-hand side of the decoder. We group the variable nodes into two sets: the check bits in blue and the systematic bits in pink. Since the check bits are not exposed to error, we initialize the likelihood ratios of the blue circles as:

$$\frac{p(R_i = 1 \mid z_i)}{p(R_i = 0 \mid z_i)} = \begin{cases} \infty, & \text{if } z_i = 1 \\ 0, & \text{if } z_i = 0 \end{cases} \qquad (2.13)$$

for i = 1, 2, .., n/2. The likelihood ratios of the systematic bits are calculated for i = n/2 + 1, n/2 + 2, .., 3n/2 as

$$\frac{p(R_i = 1 \mid Y_{i-n/2})}{p(R_i = 0 \mid Y_{i-n/2})} = \begin{cases} \dfrac{1-p_1}{p_1}, & \text{if } Y_{i-n/2} = 1 \\[4pt] \dfrac{p_1}{1-p_1}, & \text{if } Y_{i-n/2} = 0 \end{cases} \qquad (2.14)$$
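In code, this initialization might look as follows: a sketch working in the log domain, with a large finite constant standing in for the infinite ratios of Equation-2.13 (the function and variable names are ours):

```python
import numpy as np

def init_llrs(z, y, p1, big=1e9):
    """Initial log-likelihood ratios log[p(R=1|.)/p(R=0|.)] for the 3n/2
    variable nodes, ordered as R = [z_1..z_{n/2}, X_1..X_n]."""
    llr_parity = np.where(z == 1, big, -big)   # Eq. 2.13: syndrome bits are noiseless
    llr_sys = np.where(y == 1,                 # Eq. 2.14: BSC(p1) side information
                       np.log((1 - p1) / p1),
                       np.log(p1 / (1 - p1)))
    return np.concatenate([llr_parity, llr_sys])
```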

The next step is to modify the definitions given in Chapter-1.10.1.1 by grouping the variable bits into systematic variable bits and parity variable bits. The set $N(m)$, which signifies the set of variable bits that participate in checksum m, is divided into two subsets $N_1(m)$ and $N_2(m)$, the sets of systematic bits and parity bits that participate in check m respectively, so that $N(m) = N_1(m) \cup N_2(m)$.

Moreover, $r^x_{mn}$ is redefined as the probability that check m is satisfied, computed only over the systematic bits: systematic bit n of X is considered fixed at $X_n$, and the other systematic bits have distributions $q_{mn'} : n' \in N_1(m)\setminus n$.

The check node iteration $\delta r_{mn}$ of Equation-1.39 is modified as

$$\delta r_{mn} = (-1)^{\sum_{i \in N_2(m)} R_i} \prod_{n' \in N_1(m)\setminus n} \delta q_{mn'}. \qquad (2.15)$$

The variable node iteration equations are the same as in Equation-1.40 and Equation-1.41, and are calculated only for the systematic variable nodes.

Similarly, the final guess step is calculated for the systematic variable nodes. The decoding starts with the initialization of the $q^0_{mn}$ and $q^1_{mn}$ values using the ratios $\frac{p(R_i=1|z_i)}{p(R_i=0|z_i)}$ and $\frac{p(R_i=1|Y_{i-n/2})}{p(R_i=0|Y_{i-n/2})}$ calculated as in Equations-2.13 and 2.14. Then the check-node and variable-node iterations are repeated until a valid codeword is found or the maximum number of iterations is reached. Finally the decoder calculates


$$q^0_n = \alpha_n\, p(x_n = 0 \mid [\mathbf{z}\,|\,\mathbf{y}]) \prod_{m \in M(n)} r^0_{mn}, \qquad (2.16)$$

and

$$q^1_n = \alpha_n\, p(x_n = 1 \mid [\mathbf{z}\,|\,\mathbf{y}]) \prod_{m \in M(n)} r^1_{mn}; \qquad (2.17)$$

and outputs the estimate $\hat{x}$ by thresholding the posterior probabilities

$$\hat{x}_i = \arg\max_j\, q^j_{i+n/2}. \qquad (2.18)$$

The bit error rate Pe of the system is then calculated as:

$$P_e = \frac{\sum_{i=1}^{n} (x_i \oplus \hat{x}_i)}{n}, \qquad (2.19)$$

where $\sum$ denotes the summation over real numbers while $\oplus$ is the modulo-2 summation. Hence Pe is the ratio of the number of erroneous bits to the total number of bits.

2.4.4 Experimental Setup and Performance Analysis

In this section, we compare our proposed S-W system with the existing ones. We code the input at several block lengths varying from $4 \times 10^3$ to $1 \times 10^5$. Please note that the LDPC decoding process performs better for larger block lengths; however, the decoding complexity increases with the block length, so there is a trade-off between performance and decoding complexity.

The length-n input binary string X, drawn from a Bernoulli(0.5) distribution, is coded with a 2/3 rate systematic LDPC code generated with a degree distribution as in Appendix-B. The noise binary string U, drawn from a Bernoulli($p_1$) distribution, is modulo-2 added to X to create Y = X ⊕ U, where $0 \leq H(p_1) \leq 0.5$. Recall from the S-W theorem that X can be compressed at a rate $R_X \geq H(X|Y) = H(p_1)$. In our experiments, we fix $R_X = 0.5$ and search for the maximum correlation noise $p_1$ at which the decoder can extract X with a low probability of error ($P_e(X \neq \hat{X}) \leq 10^{-5}$). Please note that, according to the S-W theorem, the theoretical limit is $p_1 = 0.11$, which corresponds to an entropy of $H(p_1) = 0.5$.

The simulation results can be seen in Figure-2.15. The best published performances of a convolutional code, a turbo code and a punctured turbo code are H(p1) = 0.35, H(p1) = 0.39 and H(p1) = 0.42 respectively (Aaron and Girod, 2002; Lajnef, 2006; Liveris et al, 2002a). Our length-4000 and length-$10^4$ regular LDPC codes perform with a low probability of decoding error at H(p1) = 0.36 and H(p1) = 0.37, which reside between the best convolutional code and the best turbo code. Our irregular length-$10^4$ code achieves H(p1) = 0.42 and has a performance similar to the best punctured turbo code in Lajnef et al (2006). Xiong et al (2004) have achieved a better performance for a block length of $10^5$, at the cost of a higher decoding complexity.


Figure 2.15: Decoding bit error rate versus entropy rate of the correlation noise H(p1) for 2 : 1 rate Slepian-Wolf compression. The simulations use a length-4000 regular LDPC matrix and a length-$10^4$ irregular LDPC matrix. The graph also contains the S-W limit and the best performances achieved using a convolutional code (Aaron and Girod, 2002), a punctured turbo code (Lajnef, 2006), and an irregular LDPC code of length $10^5$ (Liveris et al, 2002a).

2.5 Practical Application for Still-Image Coding

In this section, we propose a compression scheme for still images that exploits the theory of distributed coding of correlated multi-sources. Two corrupted versions of an image are encoded separately but decoded jointly (Dikici et al, 2006a). Our approach has two main ingredients: i) the use of the low-pass wavelet decomposition coefficients for creating the side information, and ii) LDPC-based coset creation using the quantized version of the original image in the pixel domain. In the case of coding for mobile terminals, the proposed codec exploits channel coding principles in order to have a simple encoder with a low transmission rate and a high PSNR.

The application of distributed source coding techniques to still images is not trivial, because the image should be divided into two sources X1, X2 which will be encoded separately. One solution to that problem is to sub-sample the image into two images (Ozonat, 2000). However, we are interested in distributed image compression given that a compressed version of that image is accessible at the decoder (see Figure-2.16). For instance, the low-frequency component of the image is accessible to the decoder as side information, and a low-power device wants to improve the quality of this side information by using low-complexity coding techniques. We introduce an efficient distributed coding technique for still images, using the low-pass discrete wavelet transform as the side information and LDPC coding as the mapping of the cosets. In our setup, the low-pass component of the discrete wavelet decomposition of the image is assumed to be accessible to the decoder as side information X2. For X1, a uniformly quantized version of the original image is used. Instead of classical source encoding of X1, after LDPC coding, the coset index of X1 is sent to the decoder. The decoder finds the value of the syndrome that is closest to X2.

Figure 2.16: Encoder and decoder structure. The source is compressed using LDPC binning, the side information Y available to the decoder is the image reconstructed from the low-frequency (LL2) wavelet decomposition, and the two received signals are decoded jointly.

We will explain the extraction of the side information (Section-2.5.1), the coset calculation using quantization and LDPC coding (Section-2.5.1.1), the iterative joint decoding (Section-2.5.1.2), and our experimental results (Section-2.5.2).

2.5.1 Side Information

Side information is the information available to the decoder which is correlated with the original signal X; it is used at the decoder in order to estimate X with the help of the received coset index. We use the following assumptions for the side information:

Let X(M, N) be an M × N gray-level image matrix with integer pixel values in the range 0 to 255. The image X is decomposed into its 2-level wavelet coefficients employing the 5/3 tap filter set of Le Gall and Tabatabai (2000). The side information image Y is reconstructed from the DWT synthesis using only the low-pass component LL2 and setting the rest of the coefficients to 0. A visualization of the SI computed at the encoder can be seen in Figure-2.17.
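A sketch of this side-information construction using PyWavelets (here 'bior2.2' stands in for the 5/3 Le Gall filter pair, and the helper name is ours; the thesis implementation may differ):

```python
import numpy as np
import pywt

def side_information(img, wavelet="bior2.2", levels=2):
    """Keep only the LL2 approximation band, zero every detail band,
    and synthesize the SI image Y by the inverse 2-D DWT."""
    coeffs = pywt.wavedec2(img.astype(float), wavelet, level=levels)
    zeroed = [coeffs[0]] + [tuple(np.zeros_like(d) for d in bands)
                            for bands in coeffs[1:]]
    return pywt.waverec2(zeroed, wavelet)
```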


The correlation noise between the original image X and the side information Y can be modeled by a Laplacian distribution $f(X,Y) = \frac{\alpha}{2} e^{-\alpha|X-Y|}$, where α can be estimated at the encoder using the residual error between the LL wavelet decomposition of the first level and that of the second level. We observed that using the estimate of α instead of calculating the real distance values does not significantly degrade the performance of the system.

2.5.1.1 Coset Creation

The image X is quantized with an n-bit uniform quantizer and the quantized bits are coded with a 2/3 rate LDPC coder as explained in Section-2.4.2. After discarding the systematic output bits of the LDPC coder, only the parity bits (the coset index) z are sent to the decoder.

2.5.1.2 Iterative Decoding

Assuming that the correlation noise between the side information Y and the quantized signal Xq has a Laplacian distribution with variance 2/α², the decoder calculates the likelihood function p(Xq|Y = y), using the appropriate Laplacian parameter α between the quantized image Xq and the side information Y, in order to initialize the LDPC belief propagation decoding. Then the modified sum-product algorithm is employed as explained in Chapter-2.4.3.
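A sketch of this likelihood initialization (our own helper names; q_centers, the reconstruction levels of the uniform quantizer, is an assumption of the sketch):

```python
import numpy as np

def laplacian_likelihoods(y, q_centers, alpha):
    """p(Xq = q | Y = y) for each quantizer cell, from the Laplacian model
    f(x - y) = (alpha/2) * exp(-alpha * |x - y|); one row per pixel."""
    lik = 0.5 * alpha * np.exp(-alpha * np.abs(q_centers[None, :] - y[:, None]))
    return lik / lik.sum(axis=1, keepdims=True)   # normalize over quantizer cells
```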

2.5.2 Experimental Results

Figure 2.17: Construction of the Side Information. Only the Low-Low wavelet decomposition of the second level is transmitted. The decoder reconstructs the side information by setting all other coefficients to 0.

The proposed algorithm is applied to the image 'Lena'. In our experimental setup, we examine the effects of the quantization and calculate the rate-distortion operating points. The image Lena is processed in the following steps:

• The input image is first linearly quantized at 256, 128, 64, 32 and 16 levels respectively. Please recall that 256-level quantization corresponds to lossless quantization because the input image pixels have a depth of 8 bits.

• The quantized bits are coded with a 2/3 rate systematic LDPC coder which is generated pseudo-randomly with an appropriate length n.


• The decoder has access to the side information Y reconstructed from the LL2 DWT coefficients.

The effects of the decoding iterations can be seen in Figure-2.18. In this figure, a 130 × 160 pixel subset of the outputs at a compression rate of 16 : 5 is given. The leftmost picture is the side information, reconstructed by setting to 0 all of the wavelet coefficients except the LL ones received from the encoder. The decoding of the cosets after the first iteration is in the center, and the rightmost one is the output of the decision based on the decoding after 5 iterations. The quality improvement on the edges, such as the face, shoulder and hair regions, can be seen. Moreover, the PSNR values of these three images are 28.9 dB, 34.16 dB and 34.97 dB respectively.

Figure 2.18: Left: side information at the receiver; Center: first iteration output of the decoded image; Right: decoding output after 5 iterations.

2.6 Conclusion

This chapter has proposed a close-to-limit Slepian-Wolf lossless compression scheme for an input source when correlated side information is accessible only to the decoder. The parity bits of a systematic 2/3 rate LDPC code are used to achieve a 2 : 1 rate compression of the input. For the binary symmetric case, correlation noise entropies of H(p1) = 0.39 and H(p1) = 0.42 are achieved using a regular length-4000 and an irregular length-$10^4$ LDPC matrix respectively. Since the Slepian-Wolf limit for a 2 : 1 compression rate over a BSC corresponds to a correlation noise entropy of 0.5, the proposed system operates 0.08 bit per channel use away from the theoretical limit. Furthermore, this study shows the feasibility of such a multi-source coding scheme for still images, in which the low-pass wavelet coefficients and the LDPC binning of the image are encoded separately and decoded jointly at the receiver.


Chapter 3

Informed Data Hiding

Contents

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 60

3.1.1 Types of watermark . . . . . . . . . . . . . . . . . . . . . . 60

3.1.2 Types of attack models . . . . . . . . . . . . . . . . . . . . 61

3.1.3 List of Symbols . . . . . . . . . . . . . . . . . . . . . . . . . 62

3.2 Theoretical Background . . . . . . . . . . . . . . . . . . . 63

3.3 Prior Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

3.4 Proposed Scheme-1: Extension to Cox Miller . . . . . . 68

3.4.1 Embedding on Discrete Wavelet Transform Coefficients . . 69

3.4.2 Perceptual Shaping for DWT . . . . . . . . . . . . . . . . . 70

3.4.3 Attack Channel . . . . . . . . . . . . . . . . . . . . . . . . . 72

3.4.4 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . 73

3.5 Proposed Scheme-2: Superposition Coding . . . . . . . . 73

3.5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

3.5.2 Code Construction . . . . . . . . . . . . . . . . . . . . . . . 74

3.5.2.1 Source Code C0 . . . . . . . . . . . . . . . . . . . . 74

3.5.2.2 Channel Code C1 . . . . . . . . . . . . . . . . . . . 75

3.5.3 Encoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

3.5.4 Gaussian Attack Channel . . . . . . . . . . . . . . . . . . . 76

3.5.5 Decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

3.5.6 Details of Joint Iterative Decoding C0 and C1 . . . . . . . . 77

3.5.7 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . 78

3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 79


We address the data hiding problem where the host signal is not accessible to the decoder. Exploiting Costa's theorem, which states that the non-availability of the host signal to the decoder does not affect the capacity, we propose two practical code designs for the informed data hiding problem. The first is for embedding a low-rate message within the DWT coefficients of still images using trellis coded modulation; in conjunction with a perceptual shaping function during the embedding process, the robustness of this method against several types of attacks is reported. The second code design is for embedding a high-rate message within continuous signals using a combination of a good source code (TCQ) and a good channel code (LDPC). After an AWGN attack channel, the receiver decodes the hidden message by BCJR and belief propagation decoding in an iterative manner. For a 1/2 data embedding rate, the hidden message can be extracted with a low decoding error such as $P_e \leq 10^{-5}$, even for an attack channel variance 1.5 dB away from the theoretical limit.

3.1 Introduction

In this chapter, we compare the existing informed data hiding techniques and propose a high embedding rate informed data hiding method with blind detection, in order to use it in our complete system, which will be explained in Chapter-5. Before passing to the details of the theoretical and implementation issues of informed data hiding, we define the basic notions of watermarking systems and explain where our work fits in.

Humans have been interested in hiding information (a message) within an innocent host signal (a cover) since medieval times (Hartung and Kutter, 1999). This hiding process is named differently depending on the application. For instance, Steganography, originating from a Greek word meaning “covert communication”, stands for point-to-point secret communication that is not known to third parties; hence the secret information need not be robust to manipulations. Watermarking, on the other hand, must satisfy the desideratum of robustness to malicious attacks: even if third parties know of the existence of the mark, it is hard to remove the hidden message. To meet this robustness requirement, the information embedding rate in watermarking is much lower than that of steganography. Data Hiding or Data Embedding resides between steganography and watermarking: third parties know that there exists a message embedded in the signal, but there is no need to protect it. The idea is to embed complementary information into the host data.

3.1.1 Types of watermark

The watermarking process can be grouped as robust or fragile. In robust watermarking, the mark must remain detectable even after severe processing. The attacker's goal is to make the detector unable to detect the mark while keeping an acceptable perceptual quality. Example applications are inserting a mark for detecting the illegal use of a copy, or finding the distributor of an illegal copy by inserting a special message into each copy, which is known as fingerprinting. On the other hand, fragile watermarking is used for authentication, that is, the control of tampering. It can be used by DVD detectors to authenticate the data; with small variations of the signal, the detector has to fail. The third parties want to change the watermarked data while the detector can still extract the message, or to create a valid work from new data.

3.1.2 Types of attack models

Because there exist different types of watermarking applications, the malicious attacks also vary. The overall system has been modeled as a game between the watermarker and the attacker (Moulin and Mihcak, 2004). Given the amount of knowledge the watermarker has about his attacker, the watermarker tries to maximize the embedding rate capacity while the attacker tries to minimize it. This game-theoretical approach is used as a tool for calculating the capacity under a worst-case attack.

Below, we give several possible assumptions about the attacker (Craver et al, 1998):

• Attacker knows nothing.

• Attacker knows the algorithm. This is the most widely used assumption in watermarking. The security depends on the key, not on the algorithm. This assumption is related to Kerckhoffs's law in cryptography, which states that a cryptosystem should be secure even if everything about the system except the key is publicly available (Kerckhoffs, 1883).

• Attacker has access to several watermarked data (collusion attack). In this model, the access can be to different host signals coded with the same mark, or to the same host coded with different marks (Stone, 1996).

• Attacker has access to the detector as a black box (oracle attack). Several attacks can be applied in order to remove the mark (gradient descent attack, sensitivity analysis attack, etc.).

There exist various types of attacks, mainly classified into four groups in Hartung et al (1999):

• Simple Attacks are attacks that add noise to the whole watermarked data without trying to identify and isolate the mark. Some examples of this type are: linear or non-linear filtering, compression, addition of noise and quantization.

• Synchronization Attacks are attacks that attempt to disable the detection of the mark by geometric distortion, spatio-temporal shifts, zooming, rotation or cropping (Petitcolas, 2000).


• Fake Watermark Attacks are attacks that try to confuse the decoder by producing fake original data or fake watermarked data (Holliman and Memon, 2000).

• Removal Attacks attempt to analyze the watermarked data, and to estimate and remove the mark from the host data. Examples are denoising and collusion attacks (Stone, 1996).

Software packages such as Stirmark (Kuhn and Petitcolas, 2000) and Checkmark (Pereira, 2001) are publicly available for simulating various kinds of attacks on still images.

In this chapter, we focus solely on the blind watermarking problem, where the cover data or image S, in which the hidden information M will be embedded, is accessible only to the encoder but not to the decoder (see Figure-3.1). Since the original cover image is not accessible to the receiver, the decoding process is called blind decoding. We analyze this problem in an information-theoretical way. After introducing the prior work in this field, we propose two coding schemes. The first one is for a low embedding rate of data in images: we use the Discrete Wavelet Transform coefficients of the host image for robust embedding, and up to 1000 bits can be efficiently embedded into 256 × 256 images with acceptable perceptual coding. Our second work is high rate informed data hiding rather than digital watermarking, because our assumptions of a continuous input and an AWGN channel fit an IDH system better. In this work, we embed the secret data at a rate of 1/2 bit per host sample, and it performs close to the theoretical limits in low SNR embedding regimes facing AWGN attacks.

Figure 3.1: Channel Coding with State Information Setup.

3.1.3 List of Symbols

The list of symbols used in this chapter can be found below.

M : discrete message to be transmitted (watermark).
$\mathcal{M}$ : alphabet of the watermark.
$\hat{M}$ : decoded watermark.
$P_e$ : probability of decoding error.
S : state information.
X : stegotext.
W : watermarked data.
Z : attack noise.
Y : received signal.
U : auxiliary variable.
$\alpha$ : a constant for coding with side information.
D : distortion level.
P, Q, N : variances of X, S and Z respectively.
$C_0$ : source code.
$C_1$ : channel code.

3.2 Theoretical Background

A simple watermark text “Art&Flowers” is inserted into the cover image, and it can easily be seen by the human eye in Figure-3.2. A malicious user can easily remove this watermark and use the image for his own purposes without legal permission. In this example the watermark data is actually independent of the picture itself, such that some of the inserted watermark resides within the white background where it is plainly visible. Hence the watermarking process needs to satisfy three main constraints:

• Insertion strength to guarantee imperceptibility,

• Robustness to malicious attacks,

• Capacity to accommodate the secret message.

Figure 3.2: Watermarked image.

Embedding as a function of the host data and of the secret message is referred to as informed embedding, because of the participation of the host data in the embedding process (Cox and Miller, 2002). The watermarking problem was first recognized as channel coding with side information (see Figure-3.3) by Chen and Wornell (1998).

Gel’fand and Pinsker (1980) derived the capacity formula for a class of discrete channels $\{\mathcal{X}, p(y|x,s), \mathcal{Y}, \mathcal{S}\}$ with a noncausal state $S^n = \{S_1, S_2, .., S_n\}$, $S_i$ i.i.d. $\sim p(s)$. A discrete message M with finite cardinality $|\mathcal{M}|$, where all possible values are equally probable, is encoded with a deterministic function $f : \mathcal{M} \times \mathcal{S}^n \to \mathcal{X}^n$ satisfying a distortion measure $E\{d(f(M, S^n), 0)\} \leq P$, and then transmitted through the channel with conditional probability function p(y|x, s). The decoding function $g : \mathcal{Y}^n \to \mathcal{M}$ estimates $\hat{M}$. The average probability of error is then

$$P_e = \frac{1}{|\mathcal{M}|} \sum_{k=1}^{|\mathcal{M}|} \Pr\{g(Y^n) \neq k \mid M = k\} \to 0 \text{ as } n \to \infty. \qquad (3.1)$$

The greatest integer $2^{nC}$ less than or equal to $|\mathcal{M}|$ can be sent per n channel uses. The supremum of the rates C is then defined to be the capacity of the channel and can be calculated as:

$$C = \max_{p(x,u|s)} \left[\, I(U;Y) - I(U;S) \,\right], \qquad (3.2)$$

where U is an auxiliary random variable with finite cardinality and the maximization is over p(x, u|s).

The Gel’fand-Pinsker setup has been extended to the continuous-alphabet Gaussian channel in Costa (1983). In Costa's setup (as seen in Figure-3.3(a)), a message M drawn from a discrete finite set $\mathcal{M}$ is sent through a channel with a power-limited signal X: $\frac{1}{n}E\{X^2\} \leq P$; the channel output is modeled as Y = X + S + Z, where S is an interference signal known by the encoder, drawn from $\mathcal{N}(0, Q)$, and Z is a noise component drawn from $\mathcal{N}(0, N)$. The aim is to find the theoretical upper bound on the quantity of secret information M that can be transmitted through this channel with probability of decoding error $P(\hat{M} \neq M) \to 0$. Surprisingly, Costa shows that the capacity of this channel is independent of the interference signal S and equals

$$R = \frac{1}{2} \ln\left(1 + \frac{P}{N}\right), \qquad (3.3)$$

which also equals the rate in the case where S is accessible to both the encoder and the decoder. The key point for achieving this rate without the accessibility of S to the decoder is to use an auxiliary random variable U such that U = X + αS, where α is the constant α = P/(P + N). For sending the index M, the encoder searches within the possible U's for that message M such that the difference between U and the scaled interference αS satisfies the power constraint $\frac{1}{n}(U - \alpha S)^2 \leq P$. It then sends X = U − αS over the channel. The channel outputs Y = X + S + Z, and the decoder finds the closest U to Y and estimates M as the index of the bin that U resides in. A more detailed derivation of Costa's capacity can be found in Chapter-4.
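As a small numerical illustration (ours, not from the thesis), the sketch below evaluates this capacity in bits and Costa's scaling factor; the power values anticipate those used in Section-3.5.7:

```python
import math

def costa_capacity_bits(P, N):
    """C = 1/2 log2(1 + P/N) bits per channel use (Eq. 3.3 converted
    from nats); independent of the interference power Q."""
    return 0.5 * math.log2(1 + P / N)

def costa_alpha(P, N):
    """Costa's scaling constant alpha = P / (P + N)."""
    return P / (P + N)

# With embedding power P equal to the attack variance N, half a bit per
# sample is achievable: the R = 1/2, N = P = 0.062 operating point of Section-3.5.7.
print(costa_capacity_bits(0.062, 0.062), costa_alpha(0.062, 0.062))  # 0.5, 0.5
```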

(a) Costa’s writing on dirty paper setup.

(b) Costa’s setup applied to the watermarking problem.

Figure 3.3: Costa setup.

Cox et al (1999) realized, as in Figure-3.3(b), that if the channel state S is assumed to be the host signal of the watermark, the work is defined to be S + X, and Z is defined to be the attack noise, then this blind watermarking problem can be modeled as Costa's “writing on dirty paper”: even if the original host data is not accessible to the decoder, there is no loss in the capacity of the channel.

Costa's work has been extended to arbitrarily distributed interference by Cohen and Lapidoth (2002) and Erez et al (2005). The theoretical limits of watermarking systems have been studied in Moulin and O'Sullivan (2003) and Chen and Wornell (1999), taking into account the notion of privacy of the watermark with a key.


3.3 Prior Work

Costa's research provides a theoretical solution using a random binning argument, but this solution cannot be implemented practically because of its complexity.

Quantization Index Modulation (QIM), proposed by Chen and Wornell (1998), uses lattice codes, where the message to be embedded divides the lattice into sub-lattices and, given the host signal, the aim is to quantize it using the proper sub-lattice. They improved QIM using Costa's approach and named it Distortion-Compensated QIM (DC-QIM) (Chen and Wornell, 2001). This system has superior performance compared to spread-spectrum techniques; the drawback, however, is that when the embedding rate is high it is hard to efficiently sub-divide the quantization lattice.

Chou et al (2000) have applied error correcting codes (ECC) to this coding concept. They used the distributed coding concept explained in Chapter-2 and the duality between DSC and IDH (Cover and Chiang, 2002; Pradhan et al, 2003). A trellis-based convolutional code has been used in order to partition the space. Le Guelvouit (2005) has proposed a system based on turbo TCQ where the message forces the trellis to pass through a certain path. Bastug and Sankur (2004) have proposed LDPC codes to improve the quality of the watermark payload.

Afterwards, the combination of good quantizer codes with good channel codes has been proposed by several researchers.

Eggers et al (2003) have proposed a system called the “Scalar Costa Scheme”, which is similar to QIM but differs from it by taking the watermark-to-noise ratio (WNR) into account. In the encoding process, Costa's α = P/(P + N) is employed for a better performance, while QIM assumed an infinite-length code, hence fixing α = 1.

Miller et al (2004) have developed an informed coding scheme that guarantees a robustness level. A modified trellis path is utilized in order to find the best embedding noise correlated with the host signal. The coding process can be briefly explained in the following steps.

• Choice of the embedding region of the host signal:

Discrete Cosine Transform (DCT) coefficients of the host image are calculated for each 8 × 8 block. Discarding the DC coefficient, the first twelve AC coefficients are selected, as seen in Figure-3.4(a).

• Informed Coding:

A trellis with a length equal to the number of bits to be sent is created, and depending on the message bits M, all the arcs except those corresponding to the message M are deleted from the trellis. Using the selected DCT coefficients and a pseudo-random key agreed on by the encoder-decoder pair, the signal most correlated with the host image S is found.


(a) The 12 selected DCT coefficients to modify in the embedding process.

(b) Geometric interpretation of the embedding process. The aim is to move the host image S to the closest point within the target region that corresponds to the index M to be sent.

Figure 3.4: Informed embedding of Miller et al. on DCT coefficients of still images.

• Embedding with Perceptual Shaping:

In the embedding process, the cover image S needs to be modified such that the decoder can decode the correct embedded bits with high probability. The process can be interpreted geometrically as in Figure-3.4(b). In this Voronoi diagram, the space is divided into five different regions, each corresponding to a message index. The region of the message M to be sent is labelled g (good index) while the other regions are labelled b (bad index). Assume that the host signal S resides in the region b1; the embedding process modifies the image such that it falls into the good region, satisfying perceptual quality and robustness. Watson's metric is used for the modification of the DCT coefficients (Watson, 1993), while the work image W must be decoded correctly under a fixed distortion level.

Nested lattice codes have been proposed by Zamir et al (2002), where there exist two codes, a source code Λ1 and a channel code Λ2, such that the codewords of Λ2 are a subset of those of Λ1: Λ1 ⊃ Λ2. However, it is hard to generate nested lattice codes where both have good distance properties. Bennatan et al (2006) then proposed a coding method using the superposition of a good source code C0 with a good channel code C1. Exploiting the duality between the Multiple Access Channel (MAC) and writing on dirty paper, they obtained a performance 1.2 dB away from the limit for a 1/4 embedding rate using joint TCQ and LDPC coding.


3.4 Proposed Scheme-1: Extension to Cox Miller

The algorithm of Miller et al (2004) suffers from block visual artifacts because of the modification of DCT coefficients. Even after perceptual shaping using Watson's algorithm (Watson, 1993), the effect of the embedding can be detected easily. In this section, we propose an informed embedding and coding technique similar to Miller et al (2004), but we employ the Discrete Wavelet Transform (DWT) in the embedding process in order to minimize the block effects. Furthermore, a perceptual shaping based on DWT coefficients is applied to adjust the embedding strength depending on the sensitivity of the human visual system to the alteration of the DWT coefficients.

Figure 3.5: Proposed informed embedding setup on DWT coefficients of still images.

The block diagram of the proposed system can be seen in Figure-3.5. After the extraction of the DWT coefficients, the selected ones pass through the informed coder and embedder. The informed coder finds the most correlated signal on the modified trellis, where the trellis path is fixed by the message bits M. Then the embedder modifies the host image in the direction of the correlated signal, so that the output signal can be decoded correctly with a robustness measure. The embedding is done by taking the perceptual effects of each coefficient into account. A detailed explanation of the blocks can be found in the following subsections.


3.4.1 Embedding on Discrete Wavelet Transform Coefficients

JPEG-2000 fixed two wavelet types in its standard: the reversible 5/3 tap Le Gall filter and the irreversible 9/7 tap Cohen-Daubechies-Feauveau filter (Marcellin et al, 2000). Since the first one is perfectly reconstructible, we used the Le Gall filter in our experiments, where the low-pass and high-pass z-transforms are given as H0(z) and H1(z) respectively (Le Gall and Tabatabai, 2000):

$$H_0(z) = \frac{1}{8}\, z\, (1 + z^{-1})^2\, (-z - z^{-1} + 4), \qquad (3.4)$$

$$H_1(z) = \frac{1}{2}\, z\, (1 + z^{-1})^2. \qquad (3.5)$$

The analysis and synthesis steps for a 2-D image based on the 1-D Le Gall filter can be explained as follows. The wavelet decomposition is organized in levels, and at each level there exist four frequency components: Low-Low (LL), Low-High (LH), High-Low (HL) and High-High (HH). For each level, these four components are calculated by down-sampling and applying the analysis filters H0 and H1 in the horizontal and vertical directions (see Figure-3.6(a)). The LL component can then be used to calculate the next higher level.

(a) Analysis. (b) Synthesis.

Figure 3.6: Analysis and Synthesis steps of Le Gall DWT.

The reconstruction of the image from the DWT components can be done using the synthesis filters given as

$$g_0(n) = (-1)^n h_1[n], \qquad (3.6)$$

$$g_1(n) = (-1)^n h_0[n]. \qquad (3.7)$$

Similar to the analysis process, each component is up-sampled, followed by the application of the synthesis filters g0 and g1 in both the vertical and horizontal directions (see Figure-3.6(b)). The two-level DWT coefficients of the Lena image are visualized in Figure-3.7.


Figure 3.7: Wavelet decomposition of the Lena image.

In our work, we choose the LH2, HL2 and HH2 components of the DWT coefficients for the embedding process. The reason is that, as has been shown, maximum robustness is attained when watermarks are embedded into well-populated bands. Since the first-level coefficients contain many zeros, we selected all of the second-level coefficients except LL2, the low-pass one.

Moreover, for an objective comparison with Miller's work on DCT, we created a trellis of the same length; hence the same number of coefficients must participate in the embedding process. Matching the ratio 12/64, the number of DCT coefficients over the total number of coefficients used in Miller et al.'s work, the combination of the LH2, HL2 and HH2 bands has the same ratio of 3/16.

In a first experiment, we use the same informed encoding and embedding process without any perceptual shaping, which means that the embedding strength on all three bands is the same. As seen in Figure-3.8(a), even without any perceptual shaping, our embedding algorithm achieves a PSNR value of 39 dB for the image Lena. Compared with DCT-domain embedding, instead of the block artifacts of DCT, the embedding noise is distributed over the whole image. The difference between the host image and the coded image in the wavelet domain can be seen in Figure-3.8(b). Please note that the components LH2, HL2 and HH2 are modified equally.

3.4.2 Perceptual Shaping for DWT

Watson (1993) has proposed a contrast masking method for perceptual quality shaping based on DCT coefficients. Using a weight matrix T which contains the weight of each DCT coefficient, together with the local features of the image (the low-pass component), a metric defining the effect of each DCT coefficient can be calculated. This metric is used for determining the perceptual shaping weights in Miller et al.'s method. Moreover, the visual impact of the DWT components has been studied in Watson et al (1997) and Levický and Foriš (2004). The weights of LH and HL are the same because the calculation of these two components includes one low-pass and one high-pass filter. However, the third component HH has been shown to be less sensitive to perturbations. After our subjective tests, we defined fixed weighting ratios T for the components: 2/7 for LH, 2/7 for HL and 3/7 for HH. Then, as in the DCT case, a metric for each DWT coefficient is calculated to determine the embedding power for a better perceptual output.


(a) Coded. PSNR value of 39.0005 dB.

(b) DWT coefficient differences. MSE: 6.157; dmin, dmax: −23, 26.

Figure 3.8: A 100-bit message M is inserted into the Lena image using the LH2, HL2 and HH2 DWT coefficients. No perceptual shaping is applied.

Figure-3.9(a) shows the embedding of the same number of bits as in Figure-3.8, but with the perceptual shaping described above. Because of the perceptual shaping, the modifications are concentrated at the contours of the image (see Figure-3.9(b)). Furthermore, the insertion into the HH2 component is 1/3 more than that into the LH2 and HL2 components.

(a) Coded. PSNR value of 38.8 dB.

(b) DWT coefficient differences. MSE: 6.21; dmin, dmax: −67, 75.

Figure 3.9: The same 100-bit message M is inserted into the Lena image using perceptual shaping.

Compared with the embedding without perceptual shaping in Figure-3.8, a similar PSNR value is achieved in Figure-3.9, with the errors concentrated at the less-sensitive DWT coefficients.


Another visual example of the effect of perceptual shaping can be seen in Figure-3.10, where the asia image is coded with and without perceptual shaping and the results are compared.

(a) Coded with DWT embedding without any perceptual shaping. PSNR value of 39.4 dB.

(b) DWT coefficient differences of (a). MSE: 2.95; dmin, dmax: −16, 19.

(c) Coded with DWT embedding using perceptual shaping. PSNR value of 40.237 dB.

(d) DWT coefficient differences of (c). MSE: 6.157; dmin, dmax: −23, 26.

Figure 3.10: Comparison of embedding a 40-bit message M into the asia image with and without perceptual shaping.

3.4.3 Attack Channel

For the attack channel, we simulate various attacks, from linear filtering to compression, using Stirmark (Petitcolas, 2000). Since the proposed embedding method depends on the trellis length, hence on the image dimensions, attacks that modify the image dimensions can easily de-synchronize the system. We therefore do not apply attack types such as cropping, geometric distortion, affine transform and rotation. The list of attacks that we apply to the watermarked images is: JPEG compression, convolution filtering, median filtering, additive noise, PSNR (all pixel values increased by the same quantity), rotation and scale, small random distortions, and auto-correlation.

3.4.4 Simulation Results

With the combination of coding on selected DWT coefficients and embedding with perceptual shaping, we obtain a superior image quality with respect to Miller et al.'s work while preserving the same amount of robustness. For instance, Figure-3.14 on page 80 compares the embedding outputs of a 40-bit message M into the Cameraman image using Miller's algorithm (Figure-3.14(a)) and our algorithm (Figure-3.14(b)).

Table-3.2 on page 80 shows the performance of the perceptually shaped asia image when facing several attacks. The right column indicates the maximum attack level at which the embedded message still survives decoding. Several attacked images for which the embedded message can still be decoded correctly can be found in Figure-3.15 at the end of this chapter (page 81).

3.5 Proposed Scheme-2: Superposition Coding

The Miller et al (2004) algorithm works quite well for a certain insertion rate, such as a thousand bits per 256 × 256 image; however, it cannot embed at higher rates because there are insufficient coefficients to fill out the trellis. For this reason, for high-rate embedding systems such as 1 bit per 2 coefficients of the cover signal, we developed a system similar to Bennatan et al (2006). The coding is done by the superposition of a good channel code C1 and a good source code C0. The receiver performs iterative decoding between the channel code estimation and the source code estimation. We use an LDPC coder as the channel code and TCQ as the source code.

3.5.1 Definition

Assume a source code C0 quantizes a continuous i.i.d. input source vector $\mathbf{x} = x_1^n$, having values in the range [−A, A], with a mean square distortion $\frac{1}{n}\sum_{i=1}^{n} x_i^2 \leq P$. Moreover, a length-n channel code C1 can be constructed according to a zero-mean distribution with a variance Q (the value of Q is determined as a function of P and the attack noise power N, as given in Section-3.5.2), where Q < P.

The superposition code is defined as C = C0 + C1, where the addition is the standard addition over the real-number field. C corresponds to the auxiliary variable U of Costa. The aim is to find the vector c that is closest to the scaled host signal αs.


(a) Code C0 for time instant t. (b) Pulse Amplitude Modulation of code C1 for time instant t.

(c) Code C0 + C1 for time instant t.

Figure 3.11: Superposition of 2 codes.

3.5.2 Code Construction

Code constructions close to the theoretical limits are proposed for C0 and C1. Here are the detailed explanations of the two codes.

3.5.2.1 Source Code C0

C0 is designed to meet the fidelity criterion between the host signal s and the watermarked signal w such that $\frac{1}{n}\sum_{i=1}^{n}(s_i - w_i)^2 \leq P$. We select the quantization code C0 as a Trellis Coded Quantization (TCQ) with a rate-1/2 convolutional code of feedback polynomials (671, 1631) in octal (please refer to Section-1.9 for more information on TCQ). For an input in the range [−A, A], 6-level PAM output signals [−5A/4, −3A/4, −A/4, A/4, 3A/4, 5A/4] are used, labelled with the 4-level output of the convolutional code as [D3, D0, D1, D2, D3, D0] (see Figure-3.11(a)). The reason for not distributing the 6 PAM output signals within [−A, A] is the fact that for the boundary points of the input there would exist only one choice in the trellis, which leads to a performance loss (Marcellin and Fischer, 1990). Forney and Ungerboeck (1998) have proposed several techniques, including the replication of the output signal levels. According to our simulation results, our source code C0 can quantize an input x uniformly distributed in the range [−1, 1] to $Q_{C_0}(x)$ with a mean distortion of P = 0.062, where $Q_{C_0}(x)$ is the reconstruction of the quantized vector x. The rate-distortion limit is 0.0585, which can be calculated from

$$R(D) = H(X) - H(D) \approx \log_2 2A - \frac{1}{2}\log_2(2\pi e D) \qquad (3.8)$$

for R = 1. Hence C0 is able to quantize the input source with a gap of 0.19 dB from the theoretical limit.
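The 0.0585 figure can be checked by solving Equation-3.8 for D at R = 1 and A = 1 (a quick sketch of ours, not thesis code):

```python
import math

def rd_limit(R=1.0, A=1.0):
    """Solve R = log2(2A) - 0.5*log2(2*pi*e*D) for the distortion D."""
    return (2 * A) ** 2 / (2 ** (2 * R) * 2 * math.pi * math.e)

print(rd_limit())   # ~0.0585, to be compared with the measured TCQ distortion P = 0.062
```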


3.5.2.2 Channel Code C1

C1 is designed to spread the secret message M into a codeword such that 1 bit of the codeword is embedded into one sample of the host signal. Since we want to achieve a 1/2 embedding rate, we design an irregular LDPC code with rate 1/2 (please refer to Section-1.10 for more information on LDPC). The input of the LDPC code is the n/2-bit message M and the output is the n-bit codeword. The codeword is two-level PAM modulated with strength $-\sqrt{Q}$ or $+\sqrt{Q}$ depending on whether the codeword bit value is 0 or 1 (see Figure-3.11(b)). Exploiting the duality between the MAC channel and dirty paper coding, an optimum Q value can be calculated following Boutros and Caire (2002) as:

$$Q = \alpha P, \qquad (3.9)$$

where P is C0's quantization MSE distortion level, $\alpha$ corresponds to Costa's $\alpha = \frac{P}{P+N}$, and N is the noise variance of the attack channel.

For the LDPC coding, we generate the LDPC matrices using the degree polynomials found and distributed by the Communications Theory Lab (LTHC) at Ecole Polytechnique Fédérale de Lausanne (EPFL) (Amraoui et al, 2003). The irregular 1/2 rate degree distribution polynomial used in this section can be found in Appendix-B. It achieves a performance 0.11 dB away from the Shannon limit.

In order to visualize the superposition scheme for a time instant t, the possible combinations of $c_{0,t} + c_{1,t}$ can be seen in Figure-3.11(c).

3.5.3 Encoder

Given the n/2-bit message M and the n-sample host $\mathbf{s} = \{s_1, s_2, .., s_n\}$, the encoder searches for the vector $\mathbf{c} = \mathbf{c}_0 + \mathbf{c}_1$ that is closest to the scaled host vector $\alpha\mathbf{s}$, where α is a scaling constant equal to α = P/(P + N).

The encoding process can be seen in Figure-3.12. The encoder starts with the computation of c1: the 1/2 rate LDPC coding of the n/2-bit message M outputs an n-bit codeword k composed of 0's and 1's. Then the length-n vector c1 is found by 2-level PAM via

$$c_{1,i} = \begin{cases} -\sqrt{Q}, & \text{if } k_i = 0 \\ +\sqrt{Q}, & \text{if } k_i = 1, \end{cases} \qquad (3.10)$$

where Q is a constant scalar, Q = αP. Hence the variance of the c1 vector equals Q.

The second step is to search for the length-n vector c0 such that c0 + c1 is closest to αs. Since our TCQ coder can quantize a vector with a variance P, the vector αs − c1 is given as input to the vector quantizer. The Viterbi algorithm searches all possible paths on the trellis to find the minimum-error sequence. The output vector of the quantizer is assigned to c0 as

$$\mathbf{c}_0 = Q_{C_0}(\alpha \mathbf{s} - \mathbf{c}_1). \qquad (3.11)$$


Figure 3.12: Embedding process of the message M into the work s using superposition coding. LDPC coding of M to find the channel code c1 is followed by TCQ coding of αs − c1 to find the source code c0. The watermarked signal c0 + c1 + (1 − α)s is sent through the attack channel.

The superposition code c is then

$$\mathbf{c} = \mathbf{c}_0 + \mathbf{c}_1 = Q_{C_0}(\alpha \mathbf{s} - \mathbf{c}_1) + \mathbf{c}_1. \qquad (3.12)$$

Since the quantization code $Q_{C_0}$ assures a quantization error limited by P, the encoder obtains the embedding noise as:

$$\mathbf{x} = \mathbf{c}_0 + \mathbf{c}_1 - \alpha \mathbf{s}. \qquad (3.13)$$

The watermarked signal w is then w = s + x.
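Putting Equations-3.10 to 3.13 together, the embedding step might be sketched as follows. The uniform quantizer below is only a runnable stand-in for the TCQ coder $Q_{C_0}$, and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def pam(bits, Q):
    """2-level PAM of the LDPC codeword (Eq. 3.10): 0 -> -sqrt(Q), 1 -> +sqrt(Q)."""
    return np.where(bits == 1, np.sqrt(Q), -np.sqrt(Q))

def quantize_c0(v, step=0.25):
    """Stand-in for the TCQ quantizer Q_C0 (a plain uniform quantizer,
    used only to make the sketch runnable)."""
    return step * np.round(v / step)

def embed(codeword, s, P, N):
    """Superposition embedding (Eqs. 3.10-3.13): w = s + (c0 + c1 - alpha*s)."""
    alpha = P / (P + N)
    c1 = pam(codeword, Q=alpha * P)      # channel code, variance Q = alpha*P
    c0 = quantize_c0(alpha * s - c1)     # source code closest to alpha*s - c1
    x = c0 + c1 - alpha * s              # embedding noise, power limited by P
    return s + x                         # watermarked signal w

# toy usage: k would come from a rate-1/2 LDPC encoding of the message M
s = rng.uniform(-2, 2, 8)
k = rng.integers(0, 2, 8)
w = embed(k, s, P=0.062, N=0.0439)
```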

3.5.4 Gaussian Attack Channel

The stego signal w is subjected to additive channel noise Z which is i.i.d. $\mathcal{N}(0, N)$. Hence the attack channel outputs

y = w + z = x + s + z. (3.14)

3.5.5 Decoder

The decoder searches the c0 and c1 pair such that the conditional probability

$$P(\mathbf{y} \mid \mathbf{c}_0 + \mathbf{c}_1) \qquad (3.15)$$

is maximized. Since the encoding is done by computing c1 followed by the search for c0, the decoding iterations first start with the estimation of c0 and terminate with the estimation of c1. The main steps of the decoding process can be seen in Figure-3.13. The receiver computes

$$\tilde{\mathbf{y}} = \alpha \mathbf{y} = \alpha \mathbf{s} + \alpha \mathbf{x} + \alpha \mathbf{z} = \mathbf{c}_0 + \mathbf{c}_1 - (1-\alpha)\mathbf{x} + \alpha \mathbf{z} \qquad (3.16)$$

$$= \mathbf{c}_0 + \mathbf{c}_1 + \tilde{\mathbf{z}}, \qquad (3.17)$$

where Equation-3.16 follows from Equation-3.13, and the effective noise $\tilde{\mathbf{z}} = -(1-\alpha)\mathbf{x} + \alpha \mathbf{z}$ is Gaussian distributed with mean 0 and variance

$$\sigma^2_{\tilde{z}} = (1-\alpha)^2 P + \alpha^2 N = \alpha N, \qquad (3.18)$$

because α = P/(P + N): indeed, $(1-\alpha)^2 P + \alpha^2 N = \frac{N^2 P + P^2 N}{(P+N)^2} = \frac{PN}{P+N} = \alpha N$.

The decoding alternates between a BCJR decoder and an LDPC belief propagation decoder, which output soft decision probabilities for c0 and c1 respectively. The decoding is done in an iterative manner, and the final guess is made from P(c0) after a certain number of iterations or once a codeword k is found.

Figure 3.13: Superposition watermarking extraction by BCJR and LDPC decoding iterations.

3.5.6 Details of Joint Iterative Decoding C0 and C1

The joint iterative decoding can be described in three steps: the two update rules of the plain-likelihood calculations, the BCJR iteration and the LDPC iteration. The details of each step can be found below.

• Update rules of plain likelihood calculations:


Plain-likelihood is the ratio between the probabilities of the possible outcomes given the observations. The likelihood calculations are done before starting each BCJR or LDPC iteration in order to initialize the cost function of every path in the trellis or in the LDPC bipartite graph. There exist two likelihood calculations: the first is the n × 4 matrix v passed from Y to the BCJR decoder, and the second is the n × 2 matrix r passed from Y to the LDPC decoder (a code sketch of both update rules is given after this list). The element $v_{ti}$, the i'th element of the t'th row of v, corresponds to the likelihood of $c_{0,t} = D_i$ given $y_t$ and r, where $D_i$ is the i'th output level of the TCQ coder closest to the channel output $y_t$. Each element of v can be calculated as

$$v_{ti} = \frac{\sum_{b=1}^{2} r_{tb} \cdot f_{\sigma_{\tilde{z}}}\!\left(y_t - D_i + (-1)^b \sqrt{Q}\right)}{\sum_{i=1}^{4} \sum_{b=1}^{2} r_{tb} \cdot f_{\sigma_{\tilde{z}}}\!\left(y_t - D_i + (-1)^b \sqrt{Q}\right)} \qquad (3.19)$$

for t = 1, 2, .., n and i = 1, 2, 3, 4, where $f_{\sigma_{\tilde{z}}}$ is the probability density function of a Gaussian r.v. $\mathcal{N}(0, \alpha N)$, and $r_{t1}$, $r_{t2}$ are the messages coming from the LDPC node iteration, signifying the likelihood that the t'th element of c1 is 0 or 1. At the beginning of the decoding, the LDPC decoder sends $r_{t1} = r_{t2} = 1/2$, which means there is no prior knowledge on $c_{1,t}$.

The element $r_{tb}$, the b'th element of the t'th row of r, corresponds to the likelihood of $c_{1,t} = (b-1)$ given $y_t$ and v, where $D_i$ is the i'th output level of the TCQ coder closest to the channel output $y_t$. Each element of r can be calculated as

$$r_{tb} = \frac{\sum_{i=1}^{4} v_{ti} \cdot f_{\sigma_{\tilde{z}}}\!\left(y_t - D_i + (-1)^b \sqrt{Q}\right)}{\sum_{i=1}^{4} v_{ti} \cdot f_{\sigma_{\tilde{z}}}\!\left(y_t - D_i - \sqrt{Q}\right) + \sum_{i=1}^{4} v_{ti} \cdot f_{\sigma_{\tilde{z}}}\!\left(y_t - D_i + \sqrt{Q}\right)} \qquad (3.20)$$

for t = 1, 2, .., n and b = 1, 2. Similarly, $f_{\sigma_{\tilde{z}}}$ is the probability density function of a Gaussian r.v. $\mathcal{N}(0, \alpha N)$, and $D_i$ is the i'th output level of the TCQ coder.

• Iteration BCJR: The branch metrics of the trellis are initialized by the received vectors r for each sample. Then a BCJR iteration is done as explained in Section-1.9.2, and the BCJR outputs the probability $P(\mathbf{c}_0 \mid \mathbf{y}, \mathbf{r})$, which is mapped to the message matrix v.

• Iteration LDPC: The variable node likelihoods v are calculated as explained in the previous item. Then 10 LDPC iterations are executed between the variable nodes and the check nodes as explained in Section-1.10. The LDPC decoder outputs the likelihood probability $P(\mathbf{c}_1 \mid \mathbf{y}, \mathbf{v})$, which is mapped back to the message matrix r.
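As referenced above, a vectorized sketch of the two plain-likelihood update rules of Equations-3.19 and 3.20 is given below. It is an illustration under our own naming (gauss_pdf, update_likelihoods, and the per-sample matrix D of nearest TCQ levels are assumptions of the sketch), not the thesis implementation:

```python
import numpy as np

def gauss_pdf(x, var):
    """Density of N(0, var); here var = alpha*N, the effective noise variance."""
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def update_likelihoods(y, D, Q, var, r):
    """One round of the plain-likelihood updates (Eqs. 3.19-3.20).
    y: length-n scaled channel output; D: n x 4 matrix of the 4 TCQ output
    levels nearest each y_t; r: n x 2 LDPC messages, initialized to 1/2.
    Returns the n x 4 matrix v and the refreshed n x 2 matrix r."""
    # f(y_t - D_i + (-1)^b sqrt(Q)) for b = 1, 2 as in the equations
    f = np.stack([gauss_pdf(y[:, None] - D - np.sqrt(Q), var),
                  gauss_pdf(y[:, None] - D + np.sqrt(Q), var)], axis=2)  # n x 4 x 2
    v = (f * r[:, None, :]).sum(axis=2)
    v /= v.sum(axis=1, keepdims=True)            # Eq. 3.19, normalized over i
    r_new = (f * v[:, :, None]).sum(axis=1)
    r_new /= r_new.sum(axis=1, keepdims=True)    # Eq. 3.20, normalized over b
    return v, r_new
```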

3.5.7 Simulation Results

In our simulations we embed a $10^5$-bit message M within a host signal S of length $2 \cdot 10^5$, i.i.d. uniformly distributed in the range [−1/α, 1/α]. Since our embedding method can achieve an MSE performance of P = 0.062, the theoretical maximum AWGN variance can be calculated from Equation-3.3 with the values R = 1/2 and P = 0.062. Hence, in theory, the maximum attack variance is found to be N = 0.062.

In our experiments, starting from N = 0.062 and decreasing N by small amounts, we search for the maximum AWGN variance N at which the probability of message error is low enough ($P_e \le 10^{-5}$). For each N value, we created 20 random host signals and embedded a random message M with the appropriate $\alpha = P/(P+N)$ value. After a maximum of 100 decoding iterations, the error rate in decoding M is calculated. We achieve a decoding error rate of $3 \cdot 10^{-6}$ for N = 0.0439, where there is a

$$10 \log_{10}\left(\frac{0.062}{0.0439}\right) = 1.5 \text{ dB} \qquad (3.21)$$

gap from the theoretical setup.

3.6 Conclusion

In this chapter, we have proposed two practical informed watermarking code designs, one for low-rate data embedding in the DWT coefficients of still images, and the other for high-rate data embedding using the superposition of a good source code (TCQ) and a good channel code (LDPC) under an AWGN attack channel.

In the low embedding-rate code design, a message M of up to 1000 bits is embedded into the LH2, HL2 and HH2 components of the DWT transform coefficients using a trellis, where the valid trellis path is driven by the message M. Based on the Watson perceptual metric, the sensitivity of each DWT coefficient of the host image is calculated, and the embedding process takes this sensitivity metric into account.

For the high embedding-rate code design, we use continuous-alphabet synthetic state information and embed the message M at a rate of 1/2 bit per channel use. Embedding is done by the superposition of a good source code based on TCQ and a good channel code based on LDPC codes. By using an iterative decoding algorithm, BCJR for the source code and belief propagation for the channel code, the message can be recovered up to an AWGN attack noise level that is 1.5 dB away from the theoretical embedding limit. This high-rate embedding system can be used in conjunction with a compression system like the one in Chapter-2 to build a joint embedding and compression system.


(a) Coded with Miller et al. PSNR value of 31.5 dB. (b) Coded with the proposed method and perceptual shaping. PSNR value of 32.2 dB.

Figure 3.14: Embedding a 40-bit payload into the Cameraman image.

Table 3.2: Robustness test of the proposed algorithm for the image "asia.pgm". A 40-bit message is embedded into the asia image with DWT perceptual shaping. For each attack listed below, the maximum attack level at which the secret message M can still be decoded without any error is given.

Stirmark 4.0, asia image

Attack Type              Maximum level
JPEG compression         Quality factor of 12%
Convolution filtering    Gaussian filter
Median filtering         3 × 3
Additive noise           3%
Rotation and scale       ±0.25°
Auto-correlation         3
PSNR                     by 100


(a) Convolution 1 (Gaussian). (b) JPEG QF = 12%. (c) Median 3 × 3. (d) Noise 3%.

Figure 3.15: Maximum level of attacked images at which the secret message can still be decoded perfectly.


Chapter 4

Dirty Paper Coding with Partial State Information

Contents

4.1 Introduction
    4.1.1 List of Symbols
4.2 Problem statement
4.3 Achievable Rate
    4.3.0.1 Case A
    4.3.0.2 Case B
    4.3.0.3 Case C
    4.3.0.4 Case D
    4.3.0.5 Case E
    4.3.0.6 Case F
4.4 Capacity/rate gain/loss analysis
    4.4.1 For optimum values of α
    4.4.2 For non-optimum values of α
4.5 Conclusion


A generalization of the dirty paper coding problem is considered, in which (possibly different) noisy versions of the state information, assumed to be i.i.d. Gaussian random variables, are available at the encoder and at the decoder. This chapter derives the maximum achievable rate formula for this general problem. The general setup encompasses the cases where the state information is perfectly known either at the encoder, at the decoder, or at both, and it generalizes the analysis to cases where the state information is known only partially. In addition, this chapter shows that in realistic situations where the AWGN noise power is not known at the encoder, partial information at the decoder can increase the maximum achievable rate with respect to Costa's coding setup.¹

¹ This chapter corresponds to a paper that will soon be submitted.

4.1 Introduction

The problem of coding for communication over a channel whose conditional probability distribution is controlled by a random state parameter finds applications in diverse areas, ranging from coding for information storage in a memory with defective cells to data hiding and coding for multiple-input multiple-output communication. The particular case where the state is known causally at the encoder only (no channel state information at the receiver) was first considered by Shannon in 1958 (Shannon, 1958). In Gel'fand and Pinsker (1980), Gel'fand and Pinsker consider the channel coding problem with non-causal state information available at the transmitter. In their setup, the transmitter wishes to send a message M ∈ {1, ..., |M|} over a memoryless channel defined by the transition probabilities p(y|x, s), where X and Y are the channel input and output and S is an i.i.d. random variable representing the sequence of states {S₁, ..., S_N} of the channel, known non-causally at the encoder but unknown at the decoder. The general Gel'fand-Pinsker problem suffers from some capacity loss when compared with channel coding with side information available at both encoder and decoder. In Costa (1983), Costa has shown that there is no loss in capacity if the channel state is additive white Gaussian interference ("dirt"). The design of codes approaching Costa's capacity is known as the dirty paper coding problem. The capacity loss is derived in Zaidi and Duhamel (2005) for an additive white Gaussian channel state S partially available at the encoder but not to the decoder. The capacity for information storage in a memory where the channel state is perfectly available at the decoder but not to the encoder is derived in Heegard and Gamal (1983). The authors in Moulin and Wang (2007) consider a generalized Gel'fand-Pinsker coding problem and derive capacity formulas, as well as random coding and sphere packing exponents.

In this chapter, we focus on the particular problem of dirty paper coding with correlated partial state information at the encoder and at the decoder. The Gel'fand-Pinsker coding problem where (possibly different) noisy versions of the channel state sequence are available at both sides was actually first considered in Salehi (1992) for a binary input-binary output channel. The targeted application was information storage in a memory with defective cells. Here, the problem we focus on can be regarded as a special case of the coding problem with two-sided state information examined in Cover and Chiang (2002). The state information and the channel input and output are assumed to be i.i.d. Gaussian random variables. The maximum achievable rate formulas are derived for this general problem as a function of α by writing, as in Costa (1983), U = X + αS, where U is an auxiliary random variable. This gives the general capacity formula for the cases where there is only partial or no side information at the encoder side while there is perfect side information at the decoder side. The analytic expressions of the capacity/maximum achievable rate gains and losses with respect to Costa's setup are given for six particular cases, with optimum and non-optimum values of the α parameter. It is shown that in the general case, a capacity gain or loss can be obtained in a realistic situation where the optimum α is not known.

4.1.1 List of Symbols

The list of symbols used in this chapter can be found below.

M : discrete message to be transmitted (watermark)
$\mathcal{M}$ : alphabet of the watermark
$\hat{M}$ : decoded watermark
Pe : probability of decoding error
R : communication rate
X : stegotext
S : state information
S1 : partial state information available to the encoder
S2 : partial state information available to the decoder
θ, T : additive random noises of S1 and S2
Z : channel noise
Y : received signal
U : auxiliary variable
α : a constant for coding with side information
$\mathcal{N}$ : Gaussian distribution
Σ : covariance matrix
P, Q : variances of X and S respectively
L, K, N : variances of θ, T and Z respectively

4.2 Problem statement

Consider the communication problem shown in Figure 4.1. We use the same notation as Costa (1983) throughout this chapter. An index M ∈ {1, ..., |M|} will be sent to the receiver in n uses of the channel, where |M| is the greatest integer smaller than or equal to $e^{nR}$, and R is the rate in nats per transmission. Let S = (S₁, S₂, ..., Sₙ) be the sequence of non-causal states of the channel for n transmissions, assumed to be a sequence of independent identically distributed (i.i.d.) $\mathcal{N}(0, QI)$ random variables. We consider the cases where this sequence of states is partially known to the encoder and to the decoder non-causally, expressed throughout this chapter as S1 = (S_{1,1}, S_{1,2}, ..., S_{1,n}) and S2 = (S_{2,1}, S_{2,2}, ..., S_{2,n}) respectively. This problem can be cast into a two-sided state information setup close to the one considered in Cover and Chiang (2002), where S is defined by a pair of independent and identically distributed (i.i.d.) correlated state informations (S1, S2) available at the sender and at the receiver respectively. The state information available at the encoder and at the decoder is expressed in terms of the channel state as S1 = S + θ and S2 = S + T, where θ and T are i.i.d. random variables distributed according to $\mathcal{N}(0, LI)$ and $\mathcal{N}(0, KI)$, and I is the n × n identity matrix.

Figure 4.1: Channel coding with state information.

Based on M and S1, the encoder sends a codeword X, which must satisfy the power constraint $(1/n)\sum_{i=1}^{n} X_i^2 \le P$. The channel output is given by Y = X + S + Z, where the channel noise Z is i.i.d. according to $\mathcal{N}(0, NI)$. Upon receipt of Y and S2, the decoder creates an estimate $\hat{M}(Y, S_2)$ of the index M. Under the assumption that the index M is uniformly distributed over {1, ..., |M|}, the probability of error Pe


is given by

$$P_e = \frac{1}{|\mathcal{M}|} \sum_{k=1}^{|\mathcal{M}|} \Pr\left\{ \hat{M}(Y, S_2) \neq k \mid M = k \right\}. \qquad (4.1)$$

The general formula for the capacity of this setup in the case of finite alphabets is given by Cover and Chiang (2002):

$$C = \max_{p(x,u|s_1)} \left[ I(U; Y, S_2) - I(U; S_1) \right], \qquad (4.2)$$

where the maximum is over all joint distributions of the form p(u)p(s1, s2, x|u)p(y|x, s1, s2), and U is an auxiliary random variable with finite cardinality. But in our case the alphabets are continuous, and the only general capacity expression that has been stated is that of Moulin and Wang (2007):

$$C = \sup_{p(x,u|s_1)} \min_{p(y|x,s)} \left[ I(U; Y, S_2) - I(U; S_1) \right]. \qquad (4.3)$$

So, here we will be interested in the estimation of the maximum achievable rate for particular distributions and constructions, and we will see that in some cases it can be identified with the capacity.

Perfect codes can be created as in Cover and Chiang (2002) using the random binning argument. First, $e^{n(I(U;Y,S_2)-2\epsilon)}$ i.i.d. sequences of U are generated according to the distribution p(u), and each of them is indexed as U(i) with $i \in \{1, 2, \ldots, e^{n(I(U;Y,S_2)-2\epsilon)}\}$. These sequences are then randomly distributed into $e^{n(R-4\epsilon)}$ bins, where R corresponds to the rate of the system. Given the state S1 = S + θ and the message M ∈ {1, ..., |M|}, the encoder searches for a codeword U(i) within the bin indexed by M such that the pair (U(i), S1) is jointly typical. It then sends the corresponding X which is jointly typical with (U(i), S1). During the transmission, the signal is exposed to the additive interference S and the noise Z. The receiver receives Y = X + S + Z from the channel and observes the non-causal state information S2 = S + T. The decoder searches for the sequence U(i) such that (U(i), Y, S2) is strongly jointly typical and declares $\hat{M}$ as the index of the bin containing the sequence U(i). All possible error events have probabilities going to 0 as n → ∞ (Cover and Chiang, 2002).
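As a purely illustrative companion to the binning argument, the following toy sketch replaces joint typicality by minimum-distance searches, which is only a heuristic stand-in; the sizes, the decoding metric and all parameter values below are assumptions of this sketch, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_codewords, n_bins = 8, 4096, 16       # toy sizes, far from the asymptotic regime
P, Q, N, K = 1.0, 1.0, 1.0, 1.0
alpha = P / (P + N)                        # assumed scaling, as in Costa's construction

# i.i.d. U sequences, thrown uniformly into the bins (one bin per message).
U = rng.normal(0.0, np.sqrt(P + alpha**2 * Q), size=(n_codewords, n))
bin_of = rng.integers(0, n_bins, size=n_codewords)

def encode(message, s1):
    """Inside bin `message`, pick the U sequence closest to alpha*s1
    (a minimum-distance proxy for joint typicality with S1)."""
    idx = np.flatnonzero(bin_of == message)
    i = idx[np.argmin(np.linalg.norm(U[idx] - alpha * s1, axis=1))]
    return U[i] - alpha * s1               # transmitted X, so that U = X + alpha*S1

def decode(y, s2):
    """Score every U sequence by the residual of the linear model
    Y ~ U + (1 - alpha) * E[S | S2], with E[S | S2] = Q/(Q+K) * S2
    (a heuristic stand-in for strong joint typicality)."""
    resid = y - U - (1.0 - alpha) * (Q / (Q + K)) * s2
    return bin_of[np.argmin(np.linalg.norm(resid, axis=1))]
```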

4.3 Achievable Rate

We assume that X, S, Z, θ and T are random variables with respective Gaussian distributions $\mathcal{N}(0, PI)$, $\mathcal{N}(0, QI)$, $\mathcal{N}(0, NI)$, $\mathcal{N}(0, LI)$ and $\mathcal{N}(0, KI)$. Hence the joint distribution f(X, S, Z, θ, T) is a multivariate Gaussian $\sim \mathcal{N}(0, \Sigma)$, where the covariance matrix Σ is:


$$\Sigma = \begin{bmatrix} PI & 0 & 0 & 0 & 0 \\ 0 & QI & 0 & 0 & 0 \\ 0 & 0 & NI & 0 & 0 \\ 0 & 0 & 0 & LI & 0 \\ 0 & 0 & 0 & 0 & KI \end{bmatrix}. \qquad (4.4)$$

We consider U = X + αS1 = X + αS + αθ, where α is a parameter to be determined. The achievable rate is then a function of the parameter α and is given by R(α) = I(U; Y, S₂) − I(U; S₁), where²

$$R(\alpha) = \frac{1}{2} \ln\!\left( \frac{P\left((P+Q+N)(Q+K) - Q^2\right)}{PQK(1-\alpha)^2 + NK\left(P + \alpha^2(Q+L)\right) + \alpha^2 L (PQ + PK + QK + NQ) + PNQ} \right). \qquad (4.5)$$

Similarly to Costa (1983), the graphs of R(α) versus α are presented in Figure 4.2, where P = Q = N = 1, for several {L, K} pairs: {0, 0}, {0, 1}, {1, 0}, {1, 1}, {0, ∞} and {1, ∞}.

Maximizing R(α) over α, we get³

$$\max_\alpha R(\alpha) = R(\alpha^*) = \frac{1}{2} \ln\!\left( 1 + \frac{P(QK + QL + KL)}{N(QK + QL + KL) + QLK} \right), \qquad (4.6)$$

which is obtained for $\alpha^* = PQK/\bigl(PQK + QNK + L(PQ + PK + QK + NQ + NK)\bigr)$.

Therefore, if the noise powers Q, N, L, K are known at the encoder, we can obtain the maximum achievable rate given in Equation 4.6.
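As a sanity check on Equations 4.5 and 4.6, the closed-form maximizer can be compared against a brute-force grid search. The following sketch does this for the parameter values of Figure 4.2 (P = Q = N = 1), with {L, K} = {1, 1} chosen here as one example.

```python
import numpy as np

def R(alpha, P, Q, N, L, K):
    """Achievable rate of Equation 4.5, in nats per channel use."""
    num = P * ((P + Q + N) * (Q + K) - Q**2)
    den = (P*Q*K * (1 - alpha)**2 + N*K * (P + alpha**2 * (Q + L))
           + alpha**2 * L * (P*Q + P*K + Q*K + N*Q) + P*N*Q)
    return 0.5 * np.log(num / den)

def alpha_star(P, Q, N, L, K):
    """Maximizer of R(alpha) given after Equation 4.6."""
    return P*Q*K / (P*Q*K + Q*N*K + L*(P*Q + P*K + Q*K + N*Q + N*K))

def R_max(P, Q, N, L, K):
    """Maximum achievable rate of Equation 4.6."""
    g = Q*K + Q*L + K*L
    return 0.5 * np.log(1.0 + P*g / (N*g + Q*L*K))

P = Q = N = L = K = 1.0
alphas = np.linspace(0.0, 1.0, 100001)
a_grid = alphas[np.argmax(R(alphas, P, Q, N, L, K))]
print(alpha_star(P, Q, N, L, K), a_grid)               # both ~ 1/7
print(R_max(P, Q, N, L, K), R(a_grid, P, Q, N, L, K))  # both ~ 0.2799 nats
```

For these values the grid search and the closed form agree, which is a useful check when re-deriving the special cases below.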

Table 4.2: Special cases of the proposed channel coding setup.

CASES          Encoder state S1   Decoder state S2   Rate loss for α_opt   Citation
General Case   Partial            Partial            R_loss general        Dikici et al., Section-4.3
Case A         Perfect            Perfect            0
Case B         Perfect            Partial            0                     Dikici et al., Section-4.3.0.2
Case C         Perfect            ∅                  0                     Costa (1983)
Case D         Partial            Perfect            0
Case E         Partial            ∅                  R_loss Case E         Zaidi and Duhamel (2005)
Case F         ∅                  Perfect            0                     Heegard and Gamal (1983)

The system can be further analyzed for six particular cases, as listed in Table 4.2. Let us first recall that the capacity in the most favorable case, where there is perfect knowledge of S both at the encoder and at the decoder, is equal to $C^* = \frac{1}{2}\ln\left(1 + \frac{P}{N}\right)$. Costa showed that this capacity is achievable through Gaussian distributions and the construction U = X + αS and that, as long as we keep perfect knowledge of S at the encoder, the capacity is still reached even if there is no side information at the decoder side. Hence this construction is particularly interesting, and our purpose here is to study the maximum achievable rates it reaches in several other cases. We will first consider, as Costa did, perfect knowledge of S at the encoder side, deriving cases A, B and C to distinguish the different amounts of information at the decoder side. Cases A ([perfect, perfect]) and C ([perfect, ∅]) are not new, since they correspond to the ones explored by Costa. In both, the maximum achievable rate is equal to C* and the capacity is therefore reached. The conclusion concerning the capacity for Case B ([perfect, partial]) could be derived from Case C, as it is weaker, but we give here the proper expression of the achievable rate, which was not stated by Costa, and see in Section 4.4 that for non-optimal values of α there is some possible gain.

² See Appendix A.1 for the derivation of the achievable rate.
³ See Appendix A.2 for the method of derivation.

Figure 4.2: P = Q = N = 1; graphs of R(α) for {L, K} pairs {0, 0}, {0, 1}, {1, 0}, {1, 1}, {0, ∞} and {1, ∞}. The rate of transmission R(α) is calculated in nats per transmission (the maximum value 0.3466 nats/transmission corresponds to 0.5 bit/transmission).


4.3.0.1 Case A

S1 = S, S2 = S. This corresponds to the encoder-decoder state pair [perfect, perfect], where K → 0 and L → 0. The achievable rate is then

$$R_{\text{Case-A}} = \lim_{K \to 0,\, L \to 0} R(\alpha) = \frac{1}{2} \ln\!\left( 1 + \frac{P}{N} \right), \qquad (4.7)$$

which is independent of α and reaches C*, hence showing that it is in fact the capacity, and that the capacity is achieved by this construction. Hence there is no need for an auxiliary variable U, and we simply have U = X. The graph of Case A is presented in Figure 4.2, where P = Q = N = 1 and K = L = 0.

4.3.0.2 Case B

S1 = S, S2 = S + T . This corresponds to the encoder-decoder state pair [perfect,partial] where L → 0. The achievable rate of the system is given by

RCase-B(α) = limL→0

R(α) =1

2ln

(

P (K(P + Q + N) + Q(P + N))

PQK(1 − α)2 + NK(P + α2Q) + PNQ

)

. (4.8)

RCase-B(α) is maximized for α⋄ = P/(P + N) which corresponds to a rate ofRCase-B(α⋄) = 1

2 ln(

1 + PN

)

= C∗. Hence, here also the capacity can be reachedby this construction. It is not really surprising, since Costa showed (as we recallin Case C, see below) that the capacity C∗ can be reached by this construction,when there is a perfect side information at the encoder and even if there is no sideinformation at the decoder. The graph of Case B is presented in Figure 4.2 whereP = Q = N = K = 1.

4.3.0.3 Case C

S1 = S, S2 = S + T. This corresponds to the encoder-decoder state pair [perfect, ∅], where L → 0 and K → ∞. The achievable rate becomes

$$R_{\text{Case-C}}(\alpha) = \lim_{K \to \infty,\, L \to 0} R(\alpha) = \frac{1}{2} \ln\!\left( \frac{P(P+Q+N)}{PQ(1-\alpha)^2 + N(P + \alpha^2 Q)} \right). \qquad (4.9)$$

This rate is maximized for $\alpha^\diamond = P/(P+N)$, giving $R_{\text{Case-C}}(\alpha^\diamond) = \frac{1}{2}\ln\left(1 + \frac{P}{N}\right) = C^*$. As Costa showed, the capacity is then reached. The graph of Costa's limit can be seen in Figure 4.2 for P = Q = N = 1.


Now, the more interesting cases are the ones where the knowledge at the encoder side is only partial. We will first consider in Case D the situation where S is perfectly known at the decoder side, and show that the maximum achievable rate still reaches C*. Then we will consider in Case E the possibility for the decoder to access no side information at all, and see that there is a loss in terms of maximum achievable rate. At last, we will consider in Case F the case where there is no knowledge at the encoder but perfect knowledge at the decoder side, showing that the maximum achievable rate again reaches C*.

4.3.0.4 Case D

S1 = S + θ, S2 = S. The encoder-decoder state pair is [partial, perfect], where K → 0. The achievable rate in this case is

$$R_{\text{Case-D}}(\alpha) = \lim_{K \to 0} R(\alpha) = \frac{1}{2} \ln\!\left( \frac{P(P+N)}{\alpha^2 L (P+N) + PN} \right). \qquad (4.10)$$

The rate $R_{\text{Case-D}}$ is independent of the state power Q. It is maximized for $\alpha^\nabla = 0$, which corresponds to a maximum achievable rate of $R_{\text{Case-D}}(\alpha^\nabla) = \frac{1}{2}\ln\left(1 + \frac{P}{N}\right) = C^*$. Indeed, if the state is perfectly known to the decoder but the encoder has only a noisy version of the state, the rate is maximized when we take U = X, and the capacity is still reached with this construction. The graph of $R_{\text{Case-D}}$ is given in Figure 4.2 for P = Q = N = L = 1.

4.3.0.5 Case E

S1 = S+θ, S2 = S+T . The encoder-decoder state pair is [partial, ∅] where K → ∞.For this setup the rate is

RCase-E(α) = limK→∞

R(α) =1

2ln

(

P (P + Q + N)

PQ(1 − α)2 + N(P + α2(Q + L)) + α2L(P + Q)

)

,

(4.11)

It is maximized for α† = PQ/(PQ + QN + LP + LQ + LN) which correspondsto a rate of

RCase-E(α†) =1

2ln

(

1 +P (Q + L)

N(Q + L) + QL

)

. (4.12)

The graph of Case E can be seen in Figure 4.2 for P = Q = N = L = 1. Please notethat there exists a loss in Case E with respect to Case A (RCase-E(α†) < RCase-A).Here, we cannot state that RCase-E(α†) corresponds to a capacity: it is the maximumachievable rate for our construction. Zaidi et. al. (Zaidi and Duhamel, 2005)analyze the capacity loss of a setup similar to the Case E such that the channel stateS is not perfectly available to the encoder and is defined by S = S1 + θ where inour case S1 = S + θ. A practical code construction technique for this setup can befound in Zamir et al (2002).


4.3.0.6 Case F

S1 = ∅, S2 = S. The encoder-decoder state pair is [∅, perfect], where K → 0 and α = 0. For this setup the rate is

$$R_{\text{Case-F}} = \lim_{K \to 0} R(0) = \frac{1}{2} \ln\!\left( 1 + \frac{P}{N} \right). \qquad (4.13)$$

Since there is no state information available at the encoder, the auxiliary variable U is simply U = X. Note that the capacity is reached, which both states its value for this case and shows that this construction achieves it.

4.4 Capacity/rate gain/loss analysis

In this section, we analyze the capacity of the dirty paper codes with partial state information at the encoder and decoder sides given in Equation 4.5, together with the special cases of this setup given in Section 4.3. Moreover, we analyze the rate gain/loss between the special cases when the encoder does not have knowledge of the optimum coding parameter α. Since the capacity/maximum achievable rate has a non-negative value, the gain is calculated by defining the capacity/maximum achievable rate of a system as max{0, R(α)}.

4.4.1 For optimum values of α

If the transmitter uses the optimum value of the α parameter for each setup, there is no capacity gain or loss for the particular cases A, B, C, D and F. The achievable capacity in these cases is given by

$$R_{\text{Case-A}} = R_{\text{Case-B}}(\alpha^\diamond) = R_{\text{Case-C}}(\alpha^\diamond) = R_{\text{Case-D}}(\alpha^\nabla) = \frac{1}{2} \ln\!\left( 1 + \frac{P}{N} \right) = C^*. \qquad (4.14)$$

In Case E, the optimum value of α yields a maximum achievable rate loss:

$$R_{\text{loss Case E}} = R_{\text{Case-E}}(\alpha^\dagger) - R_{\text{Costa}}(\alpha^\diamond) = -\frac{1}{2} \ln\!\left( 1 + \frac{PQL}{N\left((Q+L)(P+N) + QL\right)} \right). \qquad (4.15)$$

Similarly, for the optimum value of α, the maximum achievable rate loss for the general case is

$$R_{\text{loss general}} = R(\alpha^*) - R_{\text{Costa}}(\alpha^\diamond) = -\frac{1}{2} \ln\!\left( 1 + \frac{PQLK}{N\left((P+N)(QK + QL + LK) + QLK\right)} \right). \qquad (4.16)$$


4.4.2 For non-optimum values of α

However, in actual systems the transmitter does not have perfect knowledge of the additive variances N, Q, L and K, so it cannot always code with the optimum α parameter. Assuming that the coding is done with a non-optimum α, we analyze the rate gain or loss with respect to Costa's coding setup [perfect, ∅]. For instance, when using the same non-optimal α, there exists a rate gain in Case B [perfect, partial] with respect to Costa's setup, which is given by:

$$C_{\text{gain CaseB-C}}(\alpha) = \max\{0, R_{\text{Case-B}}(\alpha)\} - \max\{0, R_{\text{Costa}}(\alpha)\}$$

$$= \begin{cases} \frac{1}{2} \ln\!\left( \left(\dfrac{K(P+Q+N) + Q(P+N)}{P+Q+N}\right) \cdot \left(\dfrac{PQ(1-\alpha)^2 + N(P+\alpha^2 Q)}{PQK(1-\alpha)^2 + NK(P+\alpha^2 Q) + PNQ}\right) \right) & \text{if } R_{\text{Costa}}(\alpha) > 0, \\[3mm] \frac{1}{2} \ln\!\left( \dfrac{P\left(K(P+Q+N) + Q(P+N)\right)}{PQK(1-\alpha)^2 + NK(P+\alpha^2 Q) + PNQ} \right) & \text{else if } R_{\text{Case-B}}(\alpha) > 0, \\[3mm] 0 & \text{otherwise.} \end{cases} \qquad (4.17)$$

Let us define the Signal-to-State Ratio (SSR) and the Signal-to-Noise Ratio (SNR) as

$$\text{SSR} = 10 \log_{10}\left(\frac{P}{Q}\right) \quad \text{and} \quad \text{SNR} = 10 \log_{10}\left(\frac{P}{N}\right).$$

The graphs showing the capacity gains between Case B and Costa's setup, for an SNR value ranging between −15 dB and +15 dB, for different values of the α parameter and of 10 log(Q/K) (∞, 6 dB, 2.1 dB and −1 dB), are presented in Figure 4.3. We fix P = 1, L = 0 and SSR = −6 dB.⁴ We observe that, given the values of P, Q, K and fixing α, there is one SNR value for which $R_{\text{Case-B}}(\alpha) = R_{\text{Costa}}(\alpha^\diamond)$, hence zero capacity gain at that SNR; for all other SNR values there always exists a capacity gain with respect to $R_{\text{Costa}}(\alpha)$. It is also evident that, for fixed P, Q, N values and an estimate of α, decreasing the 10 log(Q/K) value decreases the capacity gain. Voloshynovskiy et al (2004) assumed the statistics of the state, modeled as a mixture of Gaussian distributions, to be available at the decoder. When a noisy version of the state is available at the decoder (Case B) with 10 log(Q/K) ≅ 2 dB (see Figure 4.3(c)), the same rate gain with respect to Costa's setup is observed as in Voloshynovskiy et al (2004). For higher values of 10 log(Q/K), a higher capacity gain can be obtained compared with having only the statistical distribution of the state information at the decoder. A numerical sketch of these gain curves is given below.

⁴ Such low SSR values are relevant for practical applications such as watermarking.
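The gain curves of Figure 4.3 can be reproduced numerically from Equations 4.8, 4.9 and 4.17. The following sketch does so for one illustrative (α, Q/K) setting; the specific values are assumptions chosen to match the figure's operating point (P = 1, L = 0, SSR = −6 dB).

```python
import numpy as np

def R_case_B(alpha, P, Q, N, K):
    """Equation 4.8: encoder perfect, decoder partial (L -> 0)."""
    num = P * (K * (P + Q + N) + Q * (P + N))
    den = P*Q*K * (1 - alpha)**2 + N*K * (P + alpha**2 * Q) + P*N*Q
    return 0.5 * np.log(num / den)

def R_costa(alpha, P, Q, N):
    """Equation 4.9: encoder perfect, no side information at the decoder."""
    num = P * (P + Q + N)
    den = P*Q * (1 - alpha)**2 + N * (P + alpha**2 * Q)
    return 0.5 * np.log(num / den)

def gain(alpha, P, Q, N, K):
    """Capacity gain of Equation 4.17, with rates clipped at zero."""
    return (np.maximum(0.0, R_case_B(alpha, P, Q, N, K))
            - np.maximum(0.0, R_costa(alpha, P, Q, N)))

P, alpha = 1.0, 0.4                 # one illustrative alpha value
Q = P / 10**(-6 / 10)               # SSR = 10 log10(P/Q) = -6 dB
K = Q / 10**(6 / 10)                # 10 log10(Q/K) = 6 dB
snr_db = np.linspace(-15.0, 15.0, 61)
N = P / 10**(snr_db / 10)           # SNR = 10 log10(P/N)
print(np.round(gain(alpha, P, Q, N, K), 4))   # one curve of Figure 4.3(b)
```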

In Case E [partial, ∅], without the optimum α at the transmitter, there is a maximum achievable rate loss with respect to Costa's setup, given by

$$C_{\text{loss CaseE-C}}(\alpha) = \max\{0, R_{\text{Case-E}}(\alpha)\} - \max\{0, R_{\text{Costa}}(\alpha)\}$$

$$= \begin{cases} \frac{1}{2} \ln\!\left( \dfrac{PQ(1-\alpha)^2 + N(P+\alpha^2 Q)}{PQ(1-\alpha)^2 + N\left(P+\alpha^2 (Q+L)\right) + \alpha^2 L (P+Q)} \right) & \text{if } R_{\text{Case-E}}(\alpha) > 0, \\[3mm] \frac{1}{2} \ln\!\left( \dfrac{PQ(1-\alpha)^2 + N(P+\alpha^2 Q)}{P(P+Q+N)} \right) & \text{else if } R_{\text{Costa}}(\alpha) > 0, \\[3mm] 0 & \text{otherwise.} \end{cases} \qquad (4.18)$$

The maximum achievable rate loss versus SNR graphs between Case E and Costa's setup can be found in Figure 4.4, where P = 1, SSR = −6 dB and 10 log(Q/L) = 2.1 dB and 6 dB, for an SNR value ranging between −15 dB and 15 dB.

Finally, without the optimum α parameter, there exists a maximum achievable rate gain or loss between the general case [partial, partial] and Costa's setup [perfect, ∅], expressed as a function of P, N, Q, L, K and α. Figure 4.5 shows the maximum achievable rate gain/loss versus SNR for an SNR value ranging between −15 dB and 15 dB. Note that P = L = 1, SSR = −6 dB, 10 log(Q/K) = 2.1 dB (for Figure 4.5(a)) and 10 log(Q/K) = 6 dB (for Figure 4.5(b)). The maximum achievable rate gain/loss is plotted for several α values: 0, 0.2, 0.4 and 0.6.

4.5 Conclusion

This chapter has analyzed the maximum achievable rate losses and gains for the general setup where partial state information is available at the encoder and at the decoder under Gaussian interference. In particular, we derived the capacity for the cases [partial or ∅, perfect], showing that Costa's construction enables us to reach it; this is not the case for [partial, partial or ∅], for which only a maximum achievable rate has been stated. We then analyzed the gain/loss in terms of achievable rates when the optimal coding parameter α is not accessible to the encoder. This general setup is relevant for practical applications such as watermarking under desynchronization attacks and point-to-point communication over a fading channel where the receiver has an estimate of the channel state.


(a) Capacity gain for 10 log(Q/K) = ∞. (b) Capacity gain for 10 log(Q/K) = 6 dB. (c) Capacity gain for 10 log(Q/K) = 2.1 dB. (d) Capacity gain for 10 log(Q/K) = −1 dB.

Figure 4.3: Capacity gain (between $R_{\text{Case-B}}(\alpha)$ and $R_{\text{Costa}}(\alpha)$) versus SNR, for different α values, where P = 1, SSR = −6 dB and various 10 log(Q/K) values, with perfect knowledge of the channel state information at the encoder (L = 0).


(a) Maximum achievable rate loss for 10 log(Q/L) = 2.1 dB. (b) Maximum achievable rate loss for 10 log(Q/L) = 6 dB.

Figure 4.4: Maximum achievable rate loss (between $R_{\text{Case-E}}(\alpha)$ and $R_{\text{Costa}}(\alpha)$) versus SNR, for different α values, where P = 1, SSR = −6 dB and various 10 log(Q/L) values.


(a) Maximum achievable rate gain/loss for 10 log(Q/K) = 2.1 dB. (b) Maximum achievable rate gain/loss for 10 log(Q/K) = 6 dB.

Figure 4.5: Maximum achievable rate gain or loss (between R(α) and $R_{\text{Costa}}(\alpha)$) versus SNR, for different α values, where P = 1, SSR = −6 dB and various 10 log(Q/K) values, with partial knowledge of the channel state information at the encoder (L = 1).


Chapter 5

Data Hiding and Distributed Source Coding

Contents

5.1 Introduction
    5.1.1 List of Symbols
    5.1.2 Formal Statement of Problem
        5.1.2.1 Data Hiding (F1, G1)
        5.1.2.2 Source Coding (F2, G2)
        5.1.2.3 Summary of the overall setup
    5.1.3 Summary of Results
5.2 Theoretical Background
    5.2.1 Channel Coding with Side Information (CCSI)
    5.2.2 Source Coding with Side Information (SCSI)
5.3 Contribution 1: Capacity Analysis for Multivariate Gaussian Source
    5.3.1 Evaluation of the Rate Distortion Function of the Carrier
    5.3.2 Capacity of the channel
5.4 Contribution 2: Practical Code Design
    5.4.1 Practical Code Design for the Multivariate Gaussian Case
        5.4.1.1 Data Hiding Coder-Decoder Pair (F1 − G1)
        5.4.1.2 Distributed Source Coding Coder-Decoder Pair (F2 − G2)
        5.4.1.3 Theoretical Limits and Performance Analysis of the Proposed System
    5.4.2 Practical Code Design for Discrete Case
        5.4.2.1 Data Hiding Coder-Decoder Pair (F1 − G1)
        5.4.2.2 Distributed Source Coding Coder-Decoder Pair (F2 − G2)
        5.4.2.3 Experimental Setup
5.5 Conclusion


We address the problem of combining Informed Data Hiding and Distributed Source Coding within a single system. With existing limited-power devices such as multi-sensor systems, PDAs, etc., researchers are attracted to low-complexity data compression and watermarking applications. In this work, we provide an original framework based on Distributed Source Coding (DSC) and Informed Data Hiding (IDH) which uses the duality between source and channel coding with side information. A mark M is inserted into a host signal S with a fidelity criterion $d(S, W) \le D_1$, and then the watermarked signal W is compressed, given that $\bar{S}$, a noisy version of the host signal, is available only to the decoder. The decoder estimates both the message M, with a low probability of error $P_e(\hat{M} \neq M) \le 10^{-5}$, and the watermarked signal W with a fidelity criterion $d(W, \hat{W}) \le D_2$. The rate-distortion function of the compression of the watermarked signal W and the capacity of the overall system are derived for the Gaussian case. Moreover, a practical code design based on Trellis Coded Quantization (TCQ) and Low Density Parity-Check (LDPC) codes is proposed and evaluated for both the binary and the Gaussian input cases.¹

5.1 Introduction

Both the Gel'fand-Pinsker (G-P) model of channel coding with side information at the encoder (Gel'fand and Pinsker, 1980) and the Slepian-Wolf (S-W) model of lossless source coding with side information at the decoder (Slepian and Wolf, 1973) have various practical applications, such as blind watermarking, distributed video coding and writing on defective cells. The G-P model was extended to continuous-alphabet Gaussian sources by Costa (1983), and the lossy version of S-W was developed by Wyner-Ziv (W-Z) (Wyner and Ziv, 1976). The duality between these channel coding and source coding problems is studied in Pradhan et al (2003); Su et al (2000), and a more general model, where the state information is partially available to the encoder and the decoder (and need not be the same), is studied both for the source coding case (Cover and Chiang, 2002) and for the channel coding case (Moulin and Wang, 2007; Voloshynovskiy et al, 2004).

Recently, various combinations of these source-channel coding schemes have been investigated, for instance combined data hiding and lossy compression with state information available only to the encoder in Maor and Merhav (2005); Yang and Sun (2006), and joint source-channel coding for W-Z and G-P channels with two parallel channels in Merhav and Shamai (2003). In this chapter, we address two problems: i) data hiding with the state information available to the encoder and partial state information available to the decoder; ii) lossy compression with partial state information available only to the decoder. These two problems can be applied together within the scenario described in the following.

Consider the communication problem shown in Figure 5.1. Alice wants to send a message M to Bob via a non-secure Carrier. She uses a host signal S which is available only to her, while $\bar{S}$, a noisy version of the host signal, is available to Bob. Alice does not share her host signal with Bob or the Carrier; however, Bob shares his noisy version with the Carrier at the decoding end. Alice embeds her secret message M within the host signal S with a fidelity criterion such that $E\,d(S, W) \le D_1$. The Carrier wants to compress the watermarked signal W while guaranteeing a quality of service (QoS) to Alice and Bob, such that the delivered copy $\hat{W}$ satisfies the constraint $E\,d(W, \hat{W}) \le D_2$. Hence the Carrier compresses W knowing that Bob will share his noisy copy $\bar{S}$ at the decoding end. After the delivery of $\hat{W}$ to Bob, Bob extracts the hidden message $\hat{M}$ using his noisy copy $\bar{S}$ with a low probability of error Pe.

¹ This chapter corresponds to a paper that will soon be submitted. It has been presented partially in Dikici et al (2006b) and is also related to the work of Dikici et al (2006c).

Figure 5.1: A communication system between Alice and Bob via a non-secure Carrier.

The novelty of our work is that we analyze the theoretical limits of the system and then propose a practical code design that operates close to these limits. One application area of this system is the development of a low-complexity encoder for a mobile handheld that compresses the redundancy of multimedia data while also carefully embedding hidden information such as meta-data.

5.1.1 List of Symbols

The list of symbols used in this chapter can be found below.

M : discrete message to be transmitted (watermark)
$\mathcal{M}$ : alphabet of the watermark
$\hat{M}$ : decoded watermark
Pe : probability of decoding error
RC : capacity of the data-hiding system
RS : compression rate
F1 − G1 : encoder-decoder pair for data hiding
F2 − G2 : encoder-decoder pair for the Wyner-Ziv compression
X : stegotext
S : state information
$\bar{S}$ : partial state information available to the decoder
W : watermarked data
$\hat{W}$ : decompressed watermarked data at the decoder
B, T, Z : additive random noises
U : auxiliary variable
α : a constant for coding with side information
D, D1, D2 : distortion levels
$\mathcal{N}$ : Gaussian distribution
Σ : covariance matrix
Q, K, N : variances of S, T and Z respectively
h(X) : differential entropy of X
h(X, Y, Z) : joint differential entropy of X, Y and Z
I(X; Y |Z) : mutual information of X and Y given Z
E : expectation operator
C0 : source code
C1 : channel code

5.1.2 Formal Statement of Problem

In this section we give a precise statement of the problem that was stated informally in the previous section.

Here we consider a discrete-valued hidden message M, and continuous-valued host signal S and side information $\bar{S}$. Specifically, the sequence $\{(S_n, \bar{S}_n)\}_{n=1}^{\infty}$ represents independent samples of a pair of dependent random variables $(S, \bar{S})$ with joint probability $p(s, \bar{s})$, taking values in the continuous infinite alphabet $\mathcal{S} \times \bar{\mathcal{S}}$; that is, for any n and $s^n \times \bar{s}^n \in \mathcal{S}^n \times \bar{\mathcal{S}}^n$, $p(s^n, \bar{s}^n) = \prod_i^n p(s_i, \bar{s}_i)$. $(S, \bar{S}, W, \hat{W})$ has joint probability distribution $p(S, \bar{S}, W, \hat{W})$ and takes values in the set $\mathcal{S} \times \bar{\mathcal{S}} \times \mathcal{W} \times \hat{\mathcal{W}}$. An index $M \in \{1, \ldots, 2^{nR_C}\}$ will be sent to the receiver in n uses of the channel, where $R_C$ is the embedding capacity of the channel per transmission. The sequence $\{X_n\}_{n=1}^{\infty}$, which takes values in the infinite set $\mathcal{X}$ with a power constraint $E(d(S, S+X)) \le D_1$, is used to transmit the index M, where X is independent given S and $\bar{S}$. Furthermore, the coded signal W is compressed by sending an index $V \in \{1, \ldots, 2^{nR_S}\}$ with a fidelity criterion $E(d(W, \hat{W})) \le D_2$, where $R_S$ is the rate of the Carrier per transmission for a distortion $D_2$.

Figure 5.2: Data Hiding + Source Coding scheme.

The goal is to form the best estimate $\hat{M}$ of M with probability of decoding error $P_e \rightarrow 0$, while respecting the fidelity criteria $E(d(S, W)) \le D_1$ and $E(d(W, \hat{W})) \le D_2$, where S is available only to the embedding process and $\bar{S}$ is available to the decompression and extraction.

This problem involves an interplay between source coding and channel coding with side information. We consider the following system, involving embedding-extraction and compression-decompression pairs, denoted [F1 − G1] and [F2 − G2] respectively. The data hiding (F1, G1) and source coding (F2, G2) mappings are defined in the following sections.


5.1.2.1 Data Hiding (F1, G1)

There is a mapping pair F1 and G1 given as

$$F_1 : \mathcal{M} \times \mathcal{S}^n \rightarrow \mathcal{X}^n, \qquad (5.1)$$

where $E(d(X, 0)) \le D_1$ and W is defined as W = X + S, so that $E(d(S, W)) \le D_1$; and

$$G_1 : \hat{\mathcal{W}}^n \times \bar{\mathcal{S}}^n \rightarrow \mathcal{M}, \qquad (5.2)$$

where $E(d(W, \hat{W})) \le D_2$. Given an encoder-decoder pair [F1 − G1], the error probability averaged over all possible messages M and all host signals $S^n$ is defined by $p(F_1, G_1) = \Pr\{\hat{M} \neq M\}$.

Definition 5.1 $R_C$ is an achievable rate if there exists an encoder-decoder pair F1-G1 such that $p(F_1, G_1) \rightarrow 0$. The capacity C is the supremum of the achievable rates.

5.1.2.2 Source Coding (F2, G2)

A source code $(n, v, \Delta)$ is defined by two mappings F2 and G2, an encoder and a decoder respectively, where

$$F_2 : \mathcal{W}^n \rightarrow \{1, 2, \ldots, v\}, \qquad (5.3)$$

$$G_2 : \{1, 2, \ldots, v\} \times \bar{\mathcal{S}}^n \rightarrow \hat{\mathcal{W}}^n, \qquad (5.4)$$

and

$$d(W, \hat{W}) = \Delta. \qquad (5.5)$$

Definition 5.2 A pair $(R_S, D_2)$ is said to be achievable if, for arbitrary $\epsilon > 0$, there exists (for n sufficiently large) a code $(n, v, \Delta)$ with

$$v \le 2^{n(R_S+\epsilon)}, \quad \Delta \le D_2 + \epsilon. \qquad (5.6)$$

Definition 5.3 The rate-distortion function $R(D_2)$ is

$$R(D_2) = \min_{(R_S, D_2) \in \mathcal{R}} R_S, \qquad (5.7)$$

where $\mathcal{R}$ is the set of achievable $(R_S, D_2)$ pairs.


5.1.2.3 Summary of the overall setup

The sender has access to the realization of the secret message M and the non-causal host signal realization $s^n$. The encoder function F1 finds

$$x^n = F_1(M, s^n) \qquad (5.8)$$

with a power criterion $\frac{1}{n}\sum x_i^2 \le D_1$. Then the sender passes the watermarked signal $w^n = s^n + x^n$ to the unreliable Carrier. The Carrier compresses the watermarked signal as

$$v = F_2(w^n) = F_2(s^n + F_1(M, s^n)), \qquad (5.9)$$

and transmits it to the receiver. The receiver shares its noisy version of the host signal with the Carrier, and the Carrier reconstructs the watermarked signal

$$\hat{w}^n = G_2(v, \bar{s}^n) = G_2(F_2(s^n + F_1(M, s^n)), \bar{s}^n), \qquad (5.10)$$

with a fidelity criterion $d(w^n, \hat{w}^n) \le D_2$. In the final step, the receiver estimates the secret message

$$\hat{M} = G_1(\hat{w}^n, \bar{s}^n) = G_1(G_2(F_2(s^n + F_1(M, s^n)), \bar{s}^n), \bar{s}^n). \qquad (5.11)$$

The composition of these four mappings is sketched below.
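To make the chain of mappings in Equations 5.8-5.11 concrete, the following sketch wires together four trivially simple stand-ins (a BPSK embedder, a uniform scalar quantizer, and a sign detector). These stand-ins and all parameter values are assumptions of the sketch only; the actual components of this chapter are the TCQ/LDPC designs of Section-5.4.

```python
import numpy as np

rng = np.random.default_rng(1)
n, Q, K, D1, step = 10_000, 1.0, 0.001, 0.1, 0.05   # illustrative values

def F1(m_bits, s):
    """Stand-in embedder (Eq. 5.8): add a BPSK mark of power D1."""
    return s + np.sqrt(D1) * (2.0 * m_bits - 1.0)    # w = s + x

def F2(w):
    """Stand-in Carrier compressor (Eq. 5.9): scalar quantizer index."""
    return np.round(w / step).astype(int)

def G2(v, s_bar):
    """Stand-in decompressor (Eq. 5.10); side information unused here."""
    return v * step

def G1(w_hat, s_bar):
    """Stand-in extractor (Eq. 5.11): sign of w_hat - E[S | S_bar]."""
    return (w_hat - (Q / (Q + K)) * s_bar > 0).astype(int)

s = rng.normal(0.0, np.sqrt(Q), n)
s_bar = s + rng.normal(0.0, np.sqrt(K), n)           # decoder side information
m = rng.integers(0, 2, n)
m_hat = G1(G2(F2(F1(m, s)), s_bar), s_bar)
print("bit error rate:", np.mean(m_hat != m))        # ~0 for these easy values
```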

5.1.3 Summary of Results

In this chapter, we give the rate-distortion function of the Carrier and the capacity formula of the system with continuous-alphabet Gaussian distributed state information. Let f(S, X, T) have a multivariate Gaussian distribution $\sim \mathcal{N}(0, \Sigma_{S,X,T})$, where the covariance matrix is $\Sigma_{S,X,T} = \mathrm{diag}(Q, D_1, K)$. Defining the state information available to the decoders G1 and G2 as $\bar{S} = S + T$ and the watermarked signal as W = S + X, Theorem-5.2 states that the minimum rate of the Carrier for a mean distortion level $E\{d(W, \hat{W})\} \le D_2$ is

$$R_S(D_2) = \begin{cases} \frac{1}{2} \ln\!\left( \dfrac{D_1}{D_2} + \dfrac{QK}{(Q+K) D_2} \right), & 0 < D_2 < D_1 + \dfrac{QK}{Q+K}, \\[3mm] 0, & D_2 \ge D_1 + \dfrac{QK}{Q+K}, \end{cases} \qquad (5.12)$$

in nats per channel use. Moreover, according to Theorem-5.3, the capacity of the overall system is

$$R_C = \frac{1}{2} \ln\!\left( 1 + \frac{D_1 (D_1 + Q - D_2)}{D_2 (D_1 + Q)} \right) \qquad (5.13)$$

in nats per channel use. A small numerical sketch of both formulas is given below.
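Both closed-form expressions are straightforward to evaluate; the following sketch computes them at one illustrative (D1, Q, K, D2) operating point (the values are assumptions, not taken from the experiments).

```python
import numpy as np

def R_S(D2, D1, Q, K):
    """Carrier rate-distortion function, Equation 5.12 (nats per channel use)."""
    ceiling = D1 + Q * K / (Q + K)
    return 0.0 if D2 >= ceiling else 0.5 * np.log(D1 / D2 + Q*K / ((Q + K) * D2))

def R_C(D1, Q, D2):
    """Overall embedding capacity, Equation 5.13 (nats per channel use)."""
    return 0.5 * np.log(1.0 + D1 * (D1 + Q - D2) / (D2 * (D1 + Q)))

D1, Q, K, D2 = 0.1, 1.0, 0.5, 0.05       # illustrative operating point
print(R_S(D2, D1, Q, K))                  # Carrier compression rate
print(R_C(D1, Q, D2))                     # embedding capacity of the system
```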

Some of our remarks can be found below:

• Remark-1: The rate-distortion function $R_S(D_2)$ of the Carrier is the same as in the case where the state information $\bar{S}$ is accessible to both the compressor (F2) and the de-compressor (G2).


• Remark-2: The overall capacity $R_C$ does not depend on whether $\bar{S}$ is accessible to the decoder G1 or not. In return, the accessibility of $\bar{S}$ to the de-compressor (G2) affects the capacity $R_C$ indirectly, because $R_C$ depends on $D_2$ (Equation-5.13), where $D_2$ depends on K (Equation-5.12).

• Remark-3: Unlike the capacity term found in Equation-4.8, the overall capacity $R_C$ depends on the variance Q of the host signal S.

Finally, a practical coding approach for the Gaussian case is proposed, using the superposition coding of Chapter-3.5 and the LDPC binning method of Chapter-2.4, and a similar coding scheme is given for the binary symmetric case.

The remainder of this chapter is organized as follows. After giving the theoretical background of source-channel coding in Chapter-5.2, Chapter-5.3 focuses on the rate-distortion function of the Carrier and the overall capacity analysis of the system; the proofs of the rate-distortion function and of the capacity term can be found in that section. Then the practical code design for the Gaussian case is given in Chapter-5.4.1 and the practical code design for the binary symmetric case in Chapter-5.4.2.

5.2 Theoretical Background

5.2.1 Channel Coding with Side Information (CCSI)

The capacity of the memoryless channel $p(y|x, s, \bar{s})$ with state information $(S, \bar{S})$ i.i.d. $\sim p(s, \bar{s})$, all taking values in finite alphabets, with $S^n$ available to the sender and $\bar{S}^n$ available to the receiver non-causally (see Figure 5.3), is given in Cover and Chiang (2002) as

$$C = \max_{p(x,u|s)} \left[ I(U; Y, \bar{S}) - I(U; S) \right], \qquad (5.14)$$

where the maximum is over all joint distributions of the form $p(u)p(s, \bar{s}, x|u)p(y|x, s, \bar{s})$, and U is an auxiliary random variable with finite cardinality.

Figure 5.3: Channel coding with two-sided state information.


Moreover, the general capacity expression for the continuous-alphabet case has been stated in Moulin and Wang (2007):

$$C = \sup_{p(x,u|s)} \min_{p(y|x,s)} \left[ I(U; Y, \bar{S}) - I(U; S) \right], \qquad (5.15)$$

where s is the state information partially available to the encoder.

The achievable rate region for the continuous-alphabet Gaussian case was derived in Chapter-4. The reader can refer to Chapter-3.2 for more detailed background on CCSI.

5.2.2 Source Coding with Side Information (SCSI)

The details of the two main theorems concerning SCSI are given in Chapter-2.2. While Slepian and Wolf (1973) derived the minimum achievable rate for the lossless compression of a discrete input source, Wyner and Ziv (1976) extended this theory to the lossy case and derived the rate-distortion function for the binary symmetric case and for the continuous-alphabet Gaussian input case.

Figure 5.4: Rate-distortion theory with side information at the decoder: the Wyner-Ziv setup.

5.3 Contribution 1: Capacity Analysis for Multivariate Gaussian Source

In this section, we derive the capacity of the multivariate Gaussian IDH-DSC communication problem shown in Figure 5.5. An index M ∈ {1, ..., m} will be sent to the receiver in n uses of the channel, where m is the greatest integer smaller than or equal to $e^{nR_C}$, and $R_C$ is the rate in nats per transmission. Let S = (S₁, S₂, ..., Sₙ) be the sequence of non-causal states of the channel for n transmissions, perfectly known to the encoder and assumed to be a sequence of independent identically distributed (i.i.d.) $\mathcal{N}(0, Q)$ random variables. We consider the case where this sequence of states is partially known to the decoder, non-causally, as $\bar{S} = (\bar{S}_1, \bar{S}_2, \ldots, \bar{S}_n)$, modeled as $\bar{S} = S + T$ where T is an i.i.d. random variable according to $\mathcal{N}(0, K)$. We use the squared error metric as the distortion measure for the Gaussian source. We first evaluate the rate-distortion function of the Carrier, $R_S(D_2)$, and then find the capacity of the overall system, $R_C$.


Figure 5.5: Multivariate Gaussian channel of the IDH-DSC scheme.

5.3.1 Evaluation of the Rate Distortion Function of the Carrier

Consider the communication channel from the Carrier's point of view in Figure-5.6. The noisy version of the host signal, $\bar{s}^n$ with $\bar{S} = S + T$, is available to the encoder when Switch-A is closed, and it is not available to the encoder when Switch-A is open. We are interested in the case where Switch-A is open; however, to derive the rate-distortion function for this case, the case where the switch is closed is employed. We assume that the joint distribution f(S, X, T) is multivariate Gaussian $\sim \mathcal{N}(0, \Sigma_{S,X,T})$ with covariance matrix $\Sigma_{S,X,T} = \mathrm{diag}(Q, D_1, K)$.

Definition 5.4 If Switch-A is closed, the rate-distortion function $R_{W|\bar{S}}(D_2)$ for compressing W given a noisy observation $\bar{S}$ available both to the encoder and to the decoder, with a fidelity criterion $d(W, \hat{W}) \le D_2$, is defined as

$$R_{W|\bar{S}}(D_2) = \min_{p(\hat{w}|w,\bar{s}) : E\{d(w,\hat{w})\} \le D_2} I(W; \hat{W}|\bar{S}). \qquad (5.16)$$

Definition 5.5 If Switch-A is open, the rate-distortion function $R^*_{W|\bar{S}}(D_2)$ for compressing W given a noisy observation $\bar{S}$ available only to the decoder, with a fidelity criterion $d(W, \hat{W}) \le D_2$ (see Figure 5.6), is:

$$R^*_{W|\bar{S}}(D_2) = \inf_{p(\hat{w}|w,\bar{s}) : E\{d(w,\hat{w})\} \le D_2} \left[ I(W; E) - I(\bar{S}; E) \right], \qquad (5.17)$$

where E is an auxiliary variable.

Theorem 5.1 The rate-distortion function $R_{W|\bar{S}}(D_2)$ is:

$$R_{W|\bar{S}}(D_2) = \begin{cases} \frac{1}{2} \ln\!\left( \dfrac{D_1}{D_2} + \dfrac{QK}{(Q+K) D_2} \right), & 0 < D_2 < D_1 + \dfrac{QK}{Q+K}, \\[3mm] 0, & D_2 \ge D_1 + \dfrac{QK}{Q+K}. \end{cases} \qquad (5.18)$$

Figure 5.6: Multivariate Gaussian case: the Carrier's point of view.

Proof: We first find a lower bound on the rate-distortion function, then prove that it is achievable.

Since $E\{d(w, \hat{w})\} \le D_2$, we observe

$$\begin{aligned}
I(W; \hat{W}|\bar{S}) &= h(W|\bar{S}) - h(W|\bar{S}, \hat{W}) \\
&= h(W, \bar{S}) - h(\bar{S}) - h(W - \hat{W}|\bar{S}, \hat{W}) \\
&\ge h(W, \bar{S}) - h(\bar{S}) - h(W - \hat{W}) \qquad (5.19) \\
&\ge h(W, \bar{S}) - h(\bar{S}) - h\bigl(\mathcal{N}(0, E\,d(W, \hat{W}))\bigr) \qquad (5.20) \\
&= h(W, \bar{S}) - \tfrac{1}{2} \ln\bigl((2\pi e)(Q+K)\bigr) - \tfrac{1}{2} \ln\bigl((2\pi e) D_2\bigr) \\
&= \tfrac{1}{2} \ln\Bigl((2\pi e)^2 \bigl((Q+D_1)(Q+K) - Q^2\bigr)\Bigr) - \tfrac{1}{2} \ln\bigl((2\pi e)^2 (Q+K) D_2\bigr) \qquad (5.21) \\
&= \tfrac{1}{2} \ln\!\left( \frac{D_1 (Q+K) + QK}{(Q+K) D_2} \right) = \tfrac{1}{2} \ln\!\left( \frac{D_1}{D_2} + \frac{QK}{(Q+K) D_2} \right), \qquad (5.22)
\end{aligned}$$

where h is the differential entropy defined in Chapter-1.2. Note that Equation 5.19 follows from the fact that conditioning reduces entropy, Equation 5.20 follows from the fact that the Gaussian distribution maximizes entropy for a given variance, and Equation 5.21 follows from the fact that the joint probability $p(w, \bar{s})$ is a multivariate Gaussian distribution² with mean 0 and covariance matrix

$$\Sigma_{(w,\bar{s})} = \begin{bmatrix} Q + D_1 & Q \\ Q & Q + K \end{bmatrix}. \qquad (5.23)$$

Hence

$$R_{W|\bar{S}}(D_2) \ge \frac{1}{2} \ln\!\left( \frac{D_1}{D_2} + \frac{QK}{(Q+K) D_2} \right) \text{ in nats,} \qquad (5.24)$$

or

$$R_{W|\bar{S}}(D_2) \ge \frac{1}{2} \log_2\!\left( \frac{D_1}{D_2} + \frac{QK}{(Q+K) D_2} \right) \text{ in bits.} \qquad (5.25)$$

To find the conditional density $f(\hat{w}|w)$ that achieves this lower bound, it is more convenient to look at the test channel (the conditional density $f(w|\hat{w})$) and construct $f(\hat{w}|w)$ to achieve equality in the bound. We choose the joint distribution as shown in Figure 5.7. If $D_2 \le \max\{0, \min\{Q + D_1,\, D_1 + \frac{QK}{Q+K}\}\}$, we choose

$$W = \hat{W} + B, \quad \hat{W} \sim \mathcal{N}(0, Q + D_1 - D_2), \quad B \sim \mathcal{N}(0, D_2). \qquad (5.26)$$

For greater values of $D_2$, if $Q + D_1 < D_1 + \frac{QK}{Q+K}$ we choose $\hat{W} = 0$ with probability 1, achieving $R(D_2) = 0$; and if $Q + D_1 \ge D_1 + \frac{QK}{Q+K}$ we choose $\hat{W} = \bar{S}$ with probability 1, achieving $R(D_2) = 0$.

This completes the proof. □

Figure 5.7: Gaussian test channel achieving the lower bound of Equation 5.21. Input: $\hat{W} \sim \mathcal{N}(0, Q + D_1 - D_2)$; output: $W \sim \mathcal{N}(0, Q + D_1)$.

² See Appendix A.3 for the formula of the joint differential entropy of multivariate Gaussian distributed random variables.


Note that the equivalent of this test channel can be constructed with W as the input and $\hat{W}$ as the output, by using an addition and a multiplication operation (see Figure 5.8). The equivalent channel outputs $\hat{W} = (W + Z) \cdot a$, where Z is i.i.d. $\mathcal{N}\!\left(0, \frac{D_2(D_1+Q)}{D_1+Q-D_2}\right)$ and a is the constant multiplier $a = \frac{D_1+Q-D_2}{D_1+Q}$. Then $\hat{W}$ has a Gaussian distribution with mean 0 and variance

$$\sigma_{\hat{W}}^2 = a^2 \left( (D_1 + Q) + \frac{D_2 (D_1 + Q)}{D_1 + Q - D_2} \right) = D_1 + Q - D_2. \qquad (5.27)$$

We will use this equivalent channel in the capacity calculations of the overall system. A quick numerical check of this construction is given after Figure 5.8.

Figure 5.8: Equivalent setup of the test channel of Figure-5.7, using an addition and a multiplication operator.
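A quick Monte Carlo check of this equivalent channel, under the stated variances, confirms both the output variance of Equation 5.27 and that the achieved mean distortion is indeed D2 (the numeric values below are arbitrary assumptions).

```python
import numpy as np

rng = np.random.default_rng(2)
D1, Q, D2, n = 0.1, 1.0, 0.05, 1_000_000
a = (D1 + Q - D2) / (D1 + Q)             # multiplier of Figure 5.8
Nz = D2 * (D1 + Q) / (D1 + Q - D2)       # variance of the additive Z

W = rng.normal(0.0, np.sqrt(D1 + Q), n)  # watermarked signal, variance D1 + Q
W_hat = a * (W + rng.normal(0.0, np.sqrt(Nz), n))

print(W_hat.var(), D1 + Q - D2)          # matches Equation 5.27
print(np.mean((W - W_hat)**2), D2)       # achieved distortion ~ D2
```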

Theorem 5.2 For the independent multivariate Gaussian case, the rate-distortion function $R^*_{W|\bar{S}}(D_2)$ has the value

$$R^*_{W|\bar{S}}(D_2) = R_{W|\bar{S}}(D_2) = \begin{cases} \frac{1}{2} \ln\!\left( \dfrac{D_1}{D_2} + \dfrac{QK}{(Q+K) D_2} \right), & 0 < D_2 < D_1 + \dfrac{QK}{Q+K}, \\[3mm] 0, & D_2 \ge D_1 + \dfrac{QK}{Q+K}. \end{cases} \qquad (5.28)$$

Proof: We give a proof similar to those in Wyner and Ziv (1976); Oohama (1997). Let $\bar{S}$ and E be conditionally independent given W; then the term $I(W; E) - I(\bar{S}; E)$ in Equation-5.17 satisfies

$$\begin{aligned}
I(W; E) - I(\bar{S}; E) &= h(E|\bar{S}) - h(E|W) \\
&= h(E|\bar{S}) - h(E|W, \bar{S}) \\
&= I(W; E|\bar{S}) \qquad (5.29) \\
&\ge I(W; \hat{W}|\bar{S}), \qquad (5.30)
\end{aligned}$$

where Equation-5.29 follows from the conditional independence assumption, and Equation-5.30 follows from the data processing inequality. Equality in Equation-5.30 holds if and only if

$$h(\hat{W}, Z \mid W, \bar{S}) = 0. \qquad (5.31)$$

For the independent Gaussian variables X, T and W in Figure-5.5, the equation $h(\hat{W}, Z \mid W, \bar{S}) = 0$ holds, and there is no rate loss with respect to the case where Switch-A is closed. Hence $R^*_{W|\bar{S}}(D_2) = R_{W|\bar{S}}(D_2)$, equal to the value in Equation-5.28. □

5.3.2 Capacity of the channel

In this section, we derive the achievable communication rate between Alice and Bob. With our findings on the rate-distortion function of the Carrier in the previous section, the overall system can be sketched as in Figure-5.9 by replacing the Carrier step with its equivalent channel setup given in Figure-5.8.

The setup in Figure-5.9 is closely related to Case B of "Dirty Paper Coding with Partial State Information" (Chapter-4). The two differences between Figure-4.1 and Figure-5.9 are: i) the absence of the random variable θ in Figure-5.9, and ii) a multiplication element added to the output of the channel in Figure-5.9, so that it outputs $\hat{W} = a \cdot (X + S + Z)$ while the setup in Figure-4.1 outputs Y = X + S + Z. We follow the same methodology as in Chapter-4.3 in order to find the achievable rate region.

Figure 5.9: Equivalent scheme of the Gaussian channel.

Theorem 5.3 The capacity $R_C$ for the communication system given in Figure-5.9 is

$$R_C = \frac{1}{2}\ln\!\left(1 + \frac{D_1(D_1+Q-D_2)}{D_2(D_1+Q)}\right). \qquad (5.32)$$


Proof: Let $X$, $S$, $Z$, and $T$ be i.i.d. random variables with respective Gaussian distributions $\mathcal{N}(0,D_1)$, $\mathcal{N}(0,Q)$, $\mathcal{N}\!\big(0,\frac{D_2(D_1+Q)}{D_1+Q-D_2}\big)$, and $\mathcal{N}(0,K)$. We define the variance of the r.v. $Z$ as $\frac{D_2(D_1+Q)}{D_1+Q-D_2} = N$, and the multiplication constant $\frac{D_1+Q-D_2}{D_1+Q} = a$. Then the joint distribution $f(X,S,Z,T)$ is multivariate Gaussian $\sim \mathcal{N}(0, \Sigma_{X,S,Z,T})$ with covariance matrix $\Sigma_{X,S,Z,T} = \mathrm{diag}(D_1, Q, N, K)$.

The channel outputs $\hat{W} = a\cdot(X+S+Z)$. Assuming $U = X + \alpha S$, where $\alpha$ is a constant to be determined, the joint distribution $f(U, \hat{W}, \tilde{S})$ is then multivariate Gaussian with mean 0 and covariance matrix $\Sigma_{U,\hat{W},\tilde{S}} = B\,\Sigma_{X,S,Z,T}\,B^t$, where $B$ is the matrix that satisfies the equation

$$\begin{pmatrix} U \\ \hat{W} \\ \tilde{S} \end{pmatrix} = B \cdot \begin{pmatrix} X \\ S \\ Z \\ T \end{pmatrix}. \qquad (5.33)$$

The solution for the matrix $B$,

$$B = \begin{pmatrix} 1 & \alpha & 0 & 0 \\ a & a & a & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}, \qquad (5.34)$$

yields the covariance matrix

$$\Sigma_{U,\hat{W},\tilde{S}} = \begin{pmatrix} D_1+\alpha^2 Q & a(D_1+\alpha Q) & \alpha Q \\ a(D_1+\alpha Q) & a^2(D_1+Q+N) & aQ \\ \alpha Q & aQ & Q+K \end{pmatrix}. \qquad (5.35)$$

Then the relevant mutual informations can be calculated to yield

$$I(U;\hat{W},\tilde{S}) = h(U) + h(\hat{W},\tilde{S}) - h(U,\hat{W},\tilde{S})$$
$$= h(X+\alpha S) + h\big(a(X+S+Z),\, S+T\big) - h(U,\hat{W},\tilde{S}) \qquad (5.36)$$
$$= \tfrac{1}{2}\ln\!\big((2\pi e)(D_1+\alpha^2 Q)\big) + \tfrac{1}{2}\ln\!\big((2\pi e)^2\, a^2\big((D_1+Q+N)(Q+K)-Q^2\big)\big)$$
$$\quad - \tfrac{1}{2}\ln\!\big((2\pi e)^3\, a^2\big(D_1 Q K(1-\alpha)^2 + NK(D_1+\alpha^2 Q) + D_1 N Q\big)\big) \qquad (5.37)$$

and similarly

$$I(U;S) = h(U) + h(S) - h(U,S) = \frac{1}{2}\ln\!\left(\frac{D_1+\alpha^2 Q}{D_1}\right). \qquad (5.38)$$


Then the term $I(U;\hat{W},\tilde{S}) - I(U;S)$ can be given as a function of $\alpha$:

$$R(\alpha) = \frac{1}{2}\ln\!\left(\frac{D_1\big(K(D_1+Q+N) + Q(D_1+N)\big)}{D_1 Q K(1-\alpha)^2 + NK(D_1+\alpha^2 Q) + D_1 N Q}\right). \qquad (5.39)$$

Equation-5.39 has the same form as Equation-4.8 on page 90. In the same way, if Equation-5.39 is maximized with respect to $\alpha$, the maximum achievable rate is found to be

$$R(\alpha^\diamond) = \frac{1}{2}\ln\!\left(1+\frac{D_1}{N}\right) = \frac{1}{2}\ln\!\left(1+\frac{D_1(D_1+Q-D_2)}{D_2(D_1+Q)}\right) \qquad (5.40)$$

for $\alpha^\diamond = D_1/(D_1+N)$. Please note that the maximum achievable rate does not depend on the correlation noise $K$. Since the achievable rate cannot exceed the capacity of the case where the state information $S$ is perfectly available both to the encoder and to the decoder, which is equal to $R(\alpha^\diamond)$, the capacity of this channel is

$$R_C = R(\alpha^\diamond) = \frac{1}{2}\ln\!\left(1+\frac{D_1(D_1+Q-D_2)}{D_2(D_1+Q)}\right). \qquad (5.41)$$

This completes the proof. □
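A minimal numerical companion to Theorem 5.3 (our own sketch; the parameter values are the ones used later in Section 5.4.1.3):

    import numpy as np

    def capacity(D1, Q, D2):
        # Capacity R_C of Equation-5.41, in nats per channel use.
        return 0.5 * np.log(1.0 + D1 * (D1 + Q - D2) / (D2 * (D1 + Q)))

    def alpha_star(D1, Q, D2):
        # Maximizing constant: alpha = D1 / (D1 + N), with N as in the proof.
        N = D2 * (D1 + Q) / (D1 + Q - D2)
        return D1 / (D1 + N)

    print(capacity(0.062, 1.0, 0.0586) / np.log(2))  # ~ 0.5 bit/channel use
    print(alpha_star(0.062, 1.0, 0.0586))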

5.4 Contribution 2: Practical Code Design

In the following two sections, we give practical code designs for a hybrid scheme that uses both channel coding and rate distortion with state information at the encoder and decoder, for the Data Hiding and Distributed Source Coding problem introduced in the previous sections. The first code design evaluates the theoretical limits calculated for the Gaussian side information case in Chapter-5.3: for the embedding part, the superposition data hiding code explained in Chapter-3.5 is applied, while for the source coding part we use the DSC mechanism explained in Chapter-2.4. The second practical design addresses the case of side information drawn from a discrete alphabet.

5.4.1 Practical Code Design for the Multivariate Gaussian Case

The theoretical rate distortion function of the Carrier and the overall capacity limits of the communication system given in Figure-5.5 for the Gaussian case were calculated as Equation-5.28 and Equation-5.32 in Chapter-5.3.

In this section, we propose a hybrid scheme for the Gaussian case which uses both channel coding and rate distortion with state information at the encoder and the decoder, respectively. Briefly, Alice has an $n$-length host vector $\mathbf{s}$ where each element of the vector is i.i.d. with probability distribution $\mathcal{N}(0,Q)$. Bob has a noisy version of this host vector, $\tilde{\mathbf{s}} = \mathbf{s} + \mathbf{t}$, where each element of $\mathbf{t}$ is i.i.d. with probability distribution $\mathcal{N}(0,K)$. At the decoding end, Bob shares this noisy version with the Carrier. Alice embeds an $n/2$-bit message $M$ within $\mathbf{s}$ (which corresponds to an embedding rate of $R_C = 1/2$ bit per channel use) such that the watermarked signal $\mathbf{w}$ satisfies the fidelity criterion $\frac{1}{n}\sum_{i=1}^{n}(w_i - s_i)^2 \le D_1$. The Carrier then compresses the vector $\mathbf{w}$ to $R_S = 1$ bit/channel use and decompresses it at the decoder side as $\hat{\mathbf{w}}$, using the noisy version $\tilde{\mathbf{s}}$ shared by Bob, such that the MSE distortion level satisfies $\frac{1}{n}\sum_{i=1}^{n}(\hat{w}_i - w_i)^2 \le D_2$. In the final stage, Bob extracts the hidden message $\hat{M}$ with the help of $\hat{\mathbf{w}}$ and $\tilde{\mathbf{s}}$. The decoding error probability can be calculated as

$$P_e = \frac{\sum_{i=1}^{n/2} \left(M_i \oplus \hat{M}_i\right)}{n/2}, \qquad (5.42)$$

where $\sum$ is defined as summation over the real numbers, while $\oplus$ is modulo-2 summation.
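As a small illustration (our own helper functions, not part of the proposed codec), the two fidelity criteria and the error probability of Equation-5.42 can be evaluated as follows, with M and M̂ given as 0/1 integer arrays:

    import numpy as np

    def mse(x, y):
        # Per-sample squared error, to be checked against D1 or D2.
        return np.mean((np.asarray(x) - np.asarray(y)) ** 2)

    def bit_error_rate(M, M_hat):
        # P_e of Equation-5.42: modulo-2 sum of the bits, averaged over n/2.
        return np.mean(np.asarray(M) ^ np.asarray(M_hat))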

Up to this point, we have only fixed the embedding rate $R_C$ at 1/2 bit per channel use and the compression rate $R_S$ at 1 bit per channel use. The details of each block are given below.

5.4.1.1 Data Hiding Coder-Decoder Pair (F1 − G1)

For the $F_1-G_1$ pair of Alice and Bob, we use the superposition embedding described in Chapter-3.5. $F_1$ is composed of an LDPC coder and a TCQ coder. A rate-1/2 LDPC code $C_1$ modulates the hidden message $M$ with a variance of $\alpha^\diamond D_1$ as described in Chapter-3.5.3, where $\alpha^\diamond = \frac{D_1}{D_1 + \frac{D_2(D_1+Q)}{D_1+Q-D_2}}$ is the constant that maximizes Equation-5.40. Then the quantization code $C_0$ finds the embedding error signal $\mathbf{x}$, which has a variance $D_1$. Finally, $F_1$ outputs the watermarked signal $\mathbf{w} = \mathbf{s} + \mathbf{x}$.

The decoder $G_1$ receives the noisy observation $\hat{\mathbf{w}}$ from the Carrier and accesses the noisy state information $\tilde{\mathbf{s}}$. It then extracts the message $\hat{M}$ using the joint LDPC-BCJR decoding algorithm explained in Chapter-3.5.6.

According to the performance of the data hiding system explained in Chapter-3.5, for $Q = 1$ the data can be embedded with an embedding noise variance $D_1 = 0.062$. For an embedding rate of 1/2 bit per channel use, the hidden message can be decoded even after an AWGN noise that is 1.5 dB away from the theoretical AWGN noise level.

5.4.1.2 Distributed Source Coding Coder-Decoder Pair (F2 − G2)

We now explain our code design for the Carrier's $F_2-G_2$ pair. For $F_2$, a 4-level Lloyd-Max quantizer is used to quantize $\mathbf{w}$ into a 2-bit-per-sample vector $\mathbf{w}_q$. The Carrier then codes these $2n$ quantized bits with a rate-2/3 LDPC code as explained in Chapter-2.4, and only the $n$-bit parity vector $\mathbf{z}$ is transmitted to Bob.

At the decoder end $G_2$, with the help of the noisy state information $\tilde{\mathbf{s}}$ shared by Bob, the Carrier applies an iterative belief propagation decoding process (see Chapter-2.4.3 for details).
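To make the quantization step concrete, the following sketch designs a 4-level Lloyd-Max quantizer empirically from samples of w (a simple sample-based training loop; this is our own minimal rendition, not the exact design used in the experiments):

    import numpy as np

    def lloyd_max(samples, levels=4, iters=100):
        # Empirical Lloyd-Max scalar quantizer: alternate nearest-neighbor
        # partitioning and centroid updates; returns codebook and thresholds.
        codebook = np.quantile(samples, (np.arange(levels) + 0.5) / levels)
        for _ in range(iters):
            thresholds = (codebook[:-1] + codebook[1:]) / 2
            idx = np.digitize(samples, thresholds)
            codebook = np.array([samples[idx == k].mean() for k in range(levels)])
        return codebook, thresholds

    rng = np.random.default_rng(1)
    w = rng.normal(0.0, np.sqrt(1.0 + 0.062), 100_000)  # w has variance Q + D1
    codebook, thresholds = lloyd_max(w)
    wq = np.digitize(w, thresholds)  # 2-bit indices, then coded by the rate-2/3 LDPC code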


5.4.1.3 Theoretical Limits and Performance Analysis of the Proposed System

The theoretical limits of the rate-distortion function and the channel capacity were calculated as Equation-5.12 and Equation-5.13. Let us fix the embedding capacity $R_C$ at 1/2 bit/channel use, the bit-rate of the Carrier $R_S$ at 1 bit/channel use (2 bits/sample quantization followed by 2:1 compression), the embedding power $D_1$ at 0.062, and the variance of the host signal $Q$ at 1. The theoretical $D_2$ value needed to achieve this capacity can be found by evaluating Equation-5.13:

$$\frac{1}{2} = \frac{1}{2}\log_2\!\left(1 + \frac{0.062\,(0.062 + 1 - D_2)}{D_2\,(0.062+1)}\right), \qquad (5.43)$$

which yields $D_2 = 0.0586$. If we substitute this theoretical $D_2$ value into the rate distortion function of Equation-5.12 to find the corresponding $K$ value that achieves a rate of 1 bit/channel use, we end up with

$$1 = \frac{1}{2}\log_2\!\left(\frac{0.062}{0.0586} + \frac{K}{(1+K)\cdot 0.0586}\right), \qquad (5.44)$$

which corresponds to $K = 0.2082$. In our system the embedding process can be perfectly reconstructed up to an MSE level of $D_2 = 0.0422$, which corresponds to a gap of

$$10\log_{10}\!\left(\frac{0.0586}{0.0422}\right) = 1.43 \text{ dB} \qquad (5.45)$$

from the theoretical setup.
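These numbers can be reproduced by inverting Equations 5.43 and 5.44 in closed form (a convenience check we add here; the algebra follows directly from the two equations):

    import numpy as np

    D1, Q, Rc, Rs = 0.062, 1.0, 0.5, 1.0

    # Equation-5.43: 2^(2 Rc) - 1 = D1 (D1 + Q - D2) / (D2 (D1 + Q)).
    g = 2 ** (2 * Rc) - 1
    D2 = D1 * (D1 + Q) / (g * (D1 + Q) + D1)       # -> 0.0586

    # Equation-5.44: 2^(2 Rs) = D1/D2 + K / ((Q + K) D2), with Q = 1.
    r = (2 ** (2 * Rs) - D1 / D2) * D2             # r = K / (1 + K)
    K = r / (1 - r)                                # -> 0.2082

    print(D2, K, 10 * np.log10(D2 / 0.0422))       # -> 0.0586, 0.2082, ~1.43 dB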

5.4.2 Practical Code Design for the Discrete Case

In this section, we develop a toy example for the combined IDH-DSC setup in the binary symmetric case. A simple embedding process is followed by DSC coding based on LDPC binning. The aim is to achieve a low embedding rate under a fidelity criterion based on the Hamming distance. The watermarked signal is then compressed using Slepian-Wolf coding.

5.4.2.1 Data Hiding Coder-Decoder Pair (F1 − G1)

For the informed data hiding of $M$ within $S$, we use a basic quantization based on a memoryless coset construction. The algorithm is described as follows: the 3-bit words are partitioned into 4 cosets such that the two elements of each coset are at a Hamming distance of 3 from one another: Coset 00 = {000, 111}, Coset 01 = {001, 110}, Coset 10 = {010, 101}, Coset 11 = {011, 100}. According to the two data bits of $M$, the coset with that index is chosen. After creating the codebook, a 2-bit chunk of $M$ and an $R$-bit chunk of $S$ are taken, and the least significant 3 bits of this sub-block of the host signal $S$ are selected for embedding. The 3-bit value of $S$ is quantized to $W$: $W(S, M) = \arg\min_{Z \in \text{Coset } M} \|Z - S\|$, where the distance metric is the Hamming distance, so that $W$ differs from the host bits $S$ in at most one bit.


This insertion of 2 bits within a block of length $R$ continues until all the data is embedded. As an example, assume that the 2-bit message 01 is being embedded into the least significant 3 bits of $S$, which are 010. The element of Coset 01 with minimum Hamming distance to 010 is chosen as the quantization output, which is $W = 110$ in this case. At the decoder side, the extraction of the watermark is straightforward: knowing the codebook and the insertion frequency $R$, the index of the coset in which the received block resides is decoded as the embedded data. A minimal sketch of this embed/extract pair follows.
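The toy quantizer is compact enough to state in full; the sketch below (function names are ours) reproduces the worked example above:

    # 2 message bits are embedded into the 3 least significant host bits.
    COSETS = {
        (0, 0): [(0, 0, 0), (1, 1, 1)],
        (0, 1): [(0, 0, 1), (1, 1, 0)],
        (1, 0): [(0, 1, 0), (1, 0, 1)],
        (1, 1): [(0, 1, 1), (1, 0, 0)],
    }

    def hamming(u, v):
        return sum(a != b for a, b in zip(u, v))

    def embed(m, s3):
        # Quantize the host bits s3 to the closest element of Coset m.
        return min(COSETS[m], key=lambda z: hamming(z, s3))

    def extract(w3):
        # The index of the coset containing w3 is the hidden message.
        return next(m for m, coset in COSETS.items() if tuple(w3) in coset)

    assert embed((0, 1), (0, 1, 0)) == (1, 1, 0)   # the example above: 01 into 010
    assert extract((1, 1, 0)) == (0, 1)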

5.4.2.2 Distributed Source Coding Coder-Decoder Pair (F2 − G2)

For the $F_2-G_2$ pair, we use syndrome coding based on LDPC codes. The Carrier codes the watermarked signal bits $W$ with a rate-2/3 LDPC code as explained in Chapter-2.4, and only the parity vector $\mathbf{z}$ is transmitted to Bob. At the decoder end $G_2$, with the help of the noisy state information shared by Bob, the Carrier applies an iterative belief propagation decoding process (see Chapter-2.4.3 for details).

5.4.2.3 Experimental Setup

In our experiments, we fix $R = 20$ and embed a 50-bit message $M$ into a 4000-bit signal $S$ distributed Bernoulli(1/2). Then, using the rate-2/3 LDPC binning scheme explained in Chapter-2.4, $W$ is compressed to a 2000-bit signal and transmitted to the decoder. The decoder performs a modified belief propagation decoding using the parity bits of $W$ and the side information $\tilde{S} = S \oplus T$, where $T$ is a binary string with Bernoulli($p_1$) distribution. The performance of the system for a block length of 4000 is compared with the performance of the DSC system without any embedding explained in Chapter-2.4.4. In Figure-5.10, the decoding bit error rate of the LDPC decoder is plotted versus the entropy of the correlation noise, $H(p_1)$. The dashed curve corresponds to the case where there is no embedding into $S$, while the other corresponds to the compression performance of $W$ after a 1/20 bit per sample embedding rate. Compared with the no-embedding case, the embedding process incurs a performance loss of 0.02 bit per sample, which is acceptable.

5.5 Conclusion

In this chapter, both theoretical and practical analyses of the combined IDH and DSC system were carried out. On the theoretical side, strong information theoretic results were obtained, namely the derivation of the rate distortion function for the non-trusted Carrier and the capacity formula of the overall embedding system. We also drew interesting remarks from these theoretical findings: the absence of Bob's noisy state information at the Carrier's encoding stage does not change the rate distortion curve, and similarly, the absence of the original host signal at Bob's side does not change the capacity of the embedding system. Moreover, practical code designs for the Gaussian case and the BSC case were proposed, building on the DSC method proposed in Chapter-2 and the IDH method of Chapter-3.

Figure 5.10: Embedding performance for 1/200 bit per sample with a 2:1 compression of the watermarked string, using a rate-2/3 LDPC code with block length 4000. There is a minimum entropy rate loss of 0.02 bit per sample with respect to the no-embedding case.


Conclusion

Strongly motivated by the duality between source coding and channel coding with state information, we set out to propose a system that combines data hiding and efficient compression functionalities, to study the theoretical limits of the proposed system, and to evaluate the proposed practical designs against those limits. This subject intersects a wide range of signal processing fundamentals, such as error correcting codes, vector quantization, likelihood marginalization and iterative decoding, while the analysis of the system limits is strongly related to information theory.

The contributions of this dissertation can be grouped into theoretical findings and practical code designs.

Information Theoretical Contributions

In this dissertation, the theoretical rate distortion function and embedding capacity bounds are derived for the infinite-alphabet Gaussian case. Our theoretical contributions can be itemized as follows:

1. The maximum achievable rate of the communication system in Figure-4.1 on page 86, where the channel state information S is partially available to the encoder as S1 and partially available to the decoder as S2, is derived in Chapter-4. This general setup is reduced to simpler cases, and each case is analyzed in detail.

2. The capacity of the communication system in Figure-5.9 on page 113 is evaluated, where the state information S is perfectly available to the encoder and partially available to the decoder as S̃, and the channel outputs the compressed signal Ŵ.

3. The rate distortion function of the communication system in Figure-5.6 on page 110 is derived, where S + X is compressed while a noisy version S + T is accessible to the decoder.


Table 5.2: Channel Coding with State Information Problems

Problem                                  Encoder State S1   Channel State Sa   Decoder State S2   Type of source
Gel'fand and Pinsker (1980)              S                  S                  -                  Discrete
Costa (1983)                             S                  S                  -                  Gaussian
Cover and Chiang (2002)                  S1                 S                  S2                 Discrete
Dikici et al., Chapter-4 (General Case)  S1                 S                  S2                 Gaussian
Dikici et al., Chapter-4 (Case-B)        S                  S                  S2                 Gaussian
Public Watermarking                      S                  -                  -                  Discrete or Gaussian

Table-5.2 briefly summarizes the existing theoretical studies in the field of channel coding with side information and compares them with our theoretical contributions in this area. Each problem is defined by the channel state Sa and its availability to the encoder and to the decoder, while the type of the state can be drawn from discrete or continuous alphabet sets. The fourth and fifth rows correspond to our information theoretical contributions no. 1 and no. 2, respectively.

Similarly, Table-5.3 positions our contribution no. 3 with respect to the source coding with side information problems. The encoder input, the decoder's side information and the type of sources investigated in each problem are given.

Table 5.3: Source Coding with State Information Problems

Problem                    Encoder access   Decoder access   Type of source
Slepian and Wolf (1973)    S                S + T            Lossless rate, Discrete Case
Wyner and Ziv (1976)       S                S + T            R(D) function, BSC and Gaussian
Dikici et al. (Chapter-5)  S + X            S + T            R(D) function, Gaussian

Proposed Practical Code Designs

Our proposed practical code designs can be grouped into two categories: distributed source coding and data hiding.

In DSC, we proposed a Slepian-Wolf coder based on LDPC binning which has a performance gap of 0.08 bits per channel use with respect to the maximum correlation noise variance for 2:1 rate compression. Moreover, this coding method is applied to an image compression system in which the low-pass DWT coefficients are assumed to be known to the decoder as side information.

In data hiding, we proposed a low embedding rate, robust image watermarking scheme based on the system of Miller et al. (2004), using the DWT coefficients of the image and a perceptual shaping for the embedding process. Furthermore, a high embedding rate system is proposed by concatenating a good source code based on TCQ and a good channel code based on LDPC. The system operates at an AWGN variance 1.5 dB away from the theoretical limits for an embedding rate of 1/2 bit per channel use.

Finally, the combination of our Slepian-Wolf coding scheme and our superposition data hiding scheme is used to evaluate our theoretical findings in Chapter-5.

Perspectives

The perspectives can be grouped into an application point of view and a theoretical point of view.

From the application perspective, several improvements on the proposed schemes are possible. For instance, the Lloyd-Max quantizer used in the Slepian-Wolf coder of Chapter-5.4.1.2 can be replaced by a more effective quantization code. Moreover, in the high embedding rate practical design of Chapter-3.5.2.2, the 2-level PAM coding of the channel code C1 can be done more efficiently by also considering the side information available to the encoder. Finally, a practical code design for the general case given in Chapter-4 can be proposed using a modified version of the informed high embedding rate code of Chapter-3.5.

One practical application based on the schemes proposed in this dissertation could be the transmission of a high resolution image or video, given that a coarse version is publicly and freely available. The second stream enhances the coarse version if the receiver has purchased the key embedded in that stream. Another application could be the embedding of meta-data into images for indexing purposes.

Finally, from the theoretical point of view, the two main directions that we will continue to study are:

• Our information theoretical contributions can be extended to the case where the state information is not i.i.d. but drawn from a Gauss-Markov source. By doing so, more realistic theoretical limits for image and video signals can be found.

• The theoretical setup of communicating with a non-trusted Carrier in Chapter-5 can be extended to the encrypted domain, such that Alice transmits her signal to Bob using encryption and the Carrier tries to compress the encrypted-domain signal under a fidelity criterion.


Appendix


Appendix A

Achievable Rate Region Calculations for Two Partial Side Informations Known to the Encoder and the Decoder Respectively

A.1 Derivation of the Achievable Rate Region

Recalling that $Y = X + S + Z$ and $S_2 = S + T$, the joint distribution of $U, Y, S_2$ is a multivariate Gaussian distribution $f(U,Y,S_2) \sim \mathcal{N}(0, B\Sigma B^t)$, where

$$\begin{pmatrix} U \\ Y \\ S_2 \end{pmatrix} = B \cdot \begin{pmatrix} X \\ S \\ Z \\ \theta \\ T \end{pmatrix} \qquad (A.1)$$

and where

$$B = \begin{pmatrix} I & \alpha I & 0 & \alpha I & 0 \\ I & I & I & 0 & 0 \\ 0 & I & 0 & 0 & I \end{pmatrix}. \qquad (A.2)$$

Then,

$$B\,\Sigma\,B^t = \begin{pmatrix} (P+\alpha^2(Q+L))I & (P+\alpha Q)I & \alpha Q I \\ (P+\alpha Q)I & (P+Q+N)I & QI \\ \alpha QI & QI & (Q+K)I \end{pmatrix}. \qquad (A.3)$$


Hence, the joint entropy¹ of the random variables $(U, Y, S_2)$ is

$$h(U, Y, S_2) = \frac{1}{2}\ln\!\left((2\pi e)^3 \left| B\Sigma B^t \right|\right). \qquad (A.4)$$

The relevant mutual informations can be calculated to yield

$$I(U;Y,S_2) = h(U) + h(Y,S_2) - h(U,Y,S_2)$$
$$= h(X+\alpha S+\alpha\theta) + h(X+S+Z,\, S+T) - h(U,Y,S_2)$$
$$= \tfrac{1}{2}\ln\!\big((2\pi e)(P+\alpha^2(Q+L))\big) + \tfrac{1}{2}\ln\!\big((2\pi e)^2\big((P+Q+N)(Q+K)-Q^2\big)\big)$$
$$\quad - \tfrac{1}{2}\ln\!\big((2\pi e)^3\big(PQK(1-\alpha)^2 + NK(P+\alpha^2(Q+L)) + \alpha^2 L(PQ+PK+QK+NQ) + PNQ\big)\big) \qquad (A.5)$$

and similarly

$$I(U;S_1) = h(U) + h(S+\theta) - h(U, S+\theta) = \frac{1}{2}\ln\!\left(\frac{P+\alpha^2(Q+L)}{P}\right). \qquad (A.6)$$

A.2 Maximization of the Rate

The rate function in Equation 4.5 has the form

$$R(\alpha) = \frac{1}{2}\ln\!\left(\frac{D}{A\alpha^2 + B\alpha + C}\right), \qquad (A.7)$$

where $A$, $B$, $C$ and $D$ are constants depending on the values $P, Q, K$ and $L$. The denominator of the $\ln$ term is a quadratic polynomial and is minimized when $\alpha = -B/2A$. Then, the maximum of $R(\alpha)$ with respect to $\alpha$ has the form

$$R(-B/2A) = \frac{1}{2}\ln\!\left(\frac{4AD}{4AC - B^2}\right). \qquad (A.8)$$

Since the term $D$ can be expressed as $C + [\ldots]$, the rate can be written as

$$R(-B/2A) = \frac{1}{2}\ln\!\left(\frac{4A(C+[\ldots]) - B^2 + B^2}{4AC - B^2}\right) = \frac{1}{2}\ln\!\left(1 + \frac{4A[\ldots] + B^2}{4AC - B^2}\right), \qquad (A.9)$$

and then it is straightforward to obtain the rate by replacing $A$, $B$, $C$ and $[\ldots]$ by their values.
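The minimization step can be verified symbolically; a short sympy check (ours, with generic positive symbols):

    import sympy as sp

    A, B, C, alpha = sp.symbols('A B C alpha', positive=True)
    denom = A * alpha**2 + B * alpha + C
    alpha_min = sp.solve(sp.diff(denom, alpha), alpha)[0]
    print(alpha_min)                                  # -B/(2*A)
    # The minimized denominator equals (4AC - B^2)/(4A), giving Equation A.8.
    print(sp.simplify(denom.subs(alpha, alpha_min)))  # C - B**2/(4*A)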

¹See Appendix A.3.


A.3 Entropy of Multivariate Gaussian Distribution

It is well known that if $X$ has a multivariate Gaussian distribution $X \sim \mathcal{N}(\mu, \Sigma)$ with mean $\mu$ and covariance matrix $\Sigma$, then

$$f_X(x_1, x_2, \ldots, x_n) = \frac{1}{(2\pi)^{n/2}\, |\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right), \qquad (A.10)$$

where $|\Sigma|$ is the determinant of the covariance matrix.

The joint entropy of $f$ is

$$h(f) = -\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f(x)\ln(f(x))\, dx = \frac{1}{2}\big(n + n\ln(2\pi) + \ln|\Sigma|\big) = \frac{1}{2}\ln\!\big((2\pi e)^n\, |\Sigma|\big). \qquad (A.11)$$

Moreover, if $Y$ is a linear transformation of $X$ such that $Y = BX$, then $Y$ also has a multivariate Gaussian distribution, $Y \sim \mathcal{N}(B\mu, B\Sigma B^t)$.
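Equation A.11 is easy to confirm by Monte Carlo (a check we add for convenience, with an arbitrary 2x2 covariance matrix):

    import numpy as np

    rng = np.random.default_rng(2)
    Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
    x = rng.multivariate_normal(np.zeros(2), Sigma, size=1_000_000)
    inv, det = np.linalg.inv(Sigma), np.linalg.det(Sigma)
    # ln f(x) for the bivariate Gaussian density, then h = -E[ln f(X)].
    log_f = -0.5 * np.einsum('ni,ij,nj->n', x, inv, x) - np.log(2 * np.pi) - 0.5 * np.log(det)
    print(-log_f.mean())                                # Monte Carlo estimate of h(f)
    print(0.5 * np.log((2 * np.pi * np.e) ** 2 * det))  # closed form, Equation A.11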


Appendix B

Codes and Degree Distributions for Generating LDPC Matrices

You can find below the LDPC degree distributions used for Distributed Source Coding in Chapter-2 and Informed Data Hiding in Chapter-3.

B.1 Degree Distributions of rate 2/3 code, for 2:1 compression rate in DSC

• Regular code: λ(x) and ρ(x) are given as

$$\lambda(x) = x^2, \qquad (B.1)$$

and

$$\rho(x) = x^5. \qquad (B.2)$$

• Irregular code: λ(x) and ρ(x) are given as

$$\lambda(x) = 0.41584493083218\,x + 0.32456702571975\,x^{2} + 0.17761981591744\,x^{6} + 0.0025725519244473\,x^{8} + 0.0046654731946759\,x^{18} + 0.039272974694212\,x^{20} + 0.015612811744969\,x^{21} + 0.0017256946022807\,x^{26} + 0.01811872137005\,x^{99}, \qquad (B.3)$$

and

$$\rho(x) = 0.80851063829787\,x^{17} + 0.19148936170213\,x^{18}. \qquad (B.4)$$


B.2 Degree Distribution of rate 1/2 code, for Informed Data Hiding

λ(x) and ρ(x) are given as

$$\lambda(x) = 0.4811081282955\,x + 0.31433341715558\,x^{2} + 0.15356804095148\,x^{6} + 0.050990413597444\,x^{19}, \qquad (B.5)$$

and

$$\rho(x) = x^{7}. \qquad (B.6)$$
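For readers who want to reuse these polynomials: under the convention that is consistent with the quoted rates here (coefficients read as fractions of nodes of degree exponent+1; this node-perspective reading is our own assumption, not something stated in this appendix), the design rate 1 − m/n of the ensemble can be recovered as follows:

    def design_rate(lam, rho):
        # Design rate 1 - m/n from node-degree fractions: the edge-count balance
        # n * avg_var_degree = m * avg_check_degree fixes the number of checks m.
        avg_var = sum(d * c for d, c in lam.items())
        avg_chk = sum(d * c for d, c in rho.items())
        return 1.0 - avg_var / avg_chk

    # Equations B.5-B.6 (degree = exponent + 1).
    lam = {2: 0.4811081282955, 3: 0.31433341715558, 7: 0.15356804095148, 20: 0.050990413597444}
    rho = {8: 1.0}
    print(design_rate(lam, rho))   # -> 0.5, matching the rate-1/2 label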


Appendix C

Publications of the author

Publications Related to the Thesis

In Preparation

• «Dirty Paper Coding with Partial State Information».

• «Joint Data Hiding and Wyner-Ziv Coding, Theory and Practice».

International Conferences and Workshops

• Dikici, C., Idrissi, K. and Baskurt, A. «Dirty-paper writing based on LDPC codes for Data Hiding». International Workshop on Multimedia Content Representation, Classification and Security (MRCS), pages 114–120, LNCS, September 2006.

• Dikici, C., Idrissi, K. and Baskurt, A. «Joint Data-Hiding and Source Coding of Still Images». European Signal Processing Conference (EUSIPCO), September 2006.

• Dikici, C., Idrissi, K. and Baskurt, A. «Joint Data-Hiding and Source Coding with Partially Available Side Information». SPIE Security, Steganography, and Watermarking of Multimedia Contents VIII, volume 6072, 60721E, February 2006.

• Dikici, C., Guermazi, R., Idrissi, K. and Baskurt, A. «Distributed Source Coding of Still Images». European Signal Processing Conference (EUSIPCO), September 2005.


National Conferences

• Dikici, C., Idrissi, K. and Baskurt, A. «Tatouage informé pour le codage distribué». CORESA, September 2006.

National Plenary

• Dikici, C., «Codage et tatouage avec information adjacente». GDR ISIS, Thème D: Télécommunications: Journée Plénière, Paris, December 2006.

Other Publications

• Dikici, C. and Bozma, I. «Video Coding Based on Pre-attentive Processing».SPIE Real-Time Imaging , volume 5671, pages 212–220, January 2005.

• Dikici, C., Civanlar, R. and Bozma, I. «Fovea based Coding for Video Streaming». International Conference on Image Analysis and Recognition (ICIAR), LNCS, volume 3211, pages 285–294, Porto, September 2004.

• Dikici, C., Alp, U., Ayaz, H., Karadeniz, M., Civanlar, R. and Bozma, I. «Fovea based Real-Time Video Processing and Streaming». Proc. of Signal Processing and Applications Conference (SIU), Istanbul, 2003 [in Turkish].

• Alp, U., Ayaz, H., Karadeniz, M., Dikici, C. and Bozma, I. «Remote Control of a Robot over the Internet». Proc. of Signal Processing and Applications Conference (SIU), Istanbul, 2003 [in Turkish].

• Sarac, I., Dikici, C. and Sankur, B. «New framing protocol for IP over SONET/SDH». Proc. of 1st Communication Conference, Ankara, 2001 [in Turkish].


Appendix D

Cited Author Index

This index lists the names of all authors cited in the references of this dissertation. It is designed to enable the reader to locate the chapters in which the work of specific authors is discussed. Entries refer the reader to page numbers. Thus, the entry “Cheng, S. 49, 50, 53” means that Cheng, S. is cited on pages 49, 50 and 53, respectively.

Aaron, A. M. xi, 3, 47–50, 53, 54

Acikel, O. F. 4, 47

Amraoui, A. 51, 75

Anderson, J. B. 4, 47

Bahl, L. 4, 22, 29

Bajcsy, J. 47

Baskurt, A. 38, 54, 101

Bastug, A. 66

Bauml, R. 66

Benedetto, S. 4, 47

Bennatan, A. 67, 73

Berger, T. 43

Berrou, C. 4, 47

Bilgin, A. 69

Boliek, M. P. 69

Boutros, J. 75

Burshtein, D. 67, 73

Caire, G. 67, 73, 75

Chen, B. 20, 64–66

Cheng, S. 49, 50, 53

Chiang, M. 4, 66, 85–87, 101, 107, 122

Chou, J. 4, 47, 66, 101

Chung, S. Y. 4, 26, 47, 51, 75

Cocke, J. 4, 22, 29

Cohen, A. S. 65

Costa, M. 3, 5, 7, 20, 64, 84, 85, 88, 101,122

Cover, T. M. 4, 14, 43, 66, 85–87, 101,107, 122

Cox, I. J. 20, 64–66, 68, 73, 123

Craver, S. 61

Dikici, C. 38, 54, 101

Divsalar, D. 4, 47

Doherty, L. 45, 50

Doërr, G. J. 66, 68, 73, 123

Dragotti, P. L. 50

Duhamel, P. 84, 88, 91

Eggers, J. J. 4, 66, 101

Erez, U. 65, 67, 91


Fischer, T. R. 21, 49, 74

Foriš, P. 70

Forney, G. D. Jr. 4, 47, 74

Gallager, R. G. 4, 25, 47

Gamal, A. E. 84, 88

Garcia-Frias, J. 47

Gehrig, N. 50

Gel’fand, S. I. 3, 64, 84, 101, 122

Georghiades, C. N. xi, 47–50, 53, 54

Girod, B. xi, 3, 4, 47–50, 53, 54, 61, 66,101

Glavieux, A. 4, 47

Gormish, M. J. 69

Guermazi, R. 38

Guillemot, C. 49, 53

Hartung, F. 60, 61

Heegard, C. 84, 88

Holliman, M. 62

Idrissi, K. 38, 54, 101

Ishwar, P. 50

Jelinek, F. 4, 22, 29

Kerckhoffs, A. 61

Klein Gunnewiek, R. 50

Koval, O. 93, 101

Kuhn, M. 62

Kusuma, J. 45, 50

Kutter, M. 60

Lagendijk, R. L. 50

Lajnef, K. xi, 47, 49, 53, 54

Lan, C. F. 49

Lapidoth, A. 65

Le Gall, D. 55, 69

Le Guelvouit, G. 66

Levický, D. 70

Liu, Z. 49

Liveris, A. D. xi, 47–50, 53, 54

Mackay, D. J. C. 4, 25, 47

Majumbar, A. 50

Maor, A. 101

Marcellin, M. W. 21, 49, 69, 74

McKellips, A. L. 20, 65

Memon, N. 62

Merhav, N. 101

Mihcak, K. M. 93, 101

Mihcak, M. K. 61

Miller, M. L. 20, 64–66, 68, 73, 123

Mitran, P. 47

Montorsi, G. 4, 47

Moulin, P. 61, 65, 84, 87, 101, 108

Narayanan, K. 49

Neal, R. M. 4, 25, 47

Oohama, Y. 112

O’Sullivan, J. A. 65

Ozonat, K. 54

Pearl, J. 27

Pereira, S. 62

Pérez-González, F. 93, 101

Petitcolas, F. A. P. 61, 62, 72

Pinsker, M. S. 3, 64, 84, 101, 122

Pollara, F. 4, 47

Pradhan, S. S. 4, 45, 47–50, 66, 101

Pun, T. 93, 101

Puri, R. 50

Ramchandran, K. 3, 4, 45, 47–50, 66,101

Rane, S. 50


Raviv, J. 4, 22, 29

Rebollo-Monedero, D. 49, 50

Richardson, T. J. 4, 26, 30, 47

Ryan, W. E. 4, 47

Salehi, M. 84

Sankur, B. 66

Schonberg, D. 48

Setton, E. 49, 50

Shamai, S. 49, 65, 67, 73, 91, 101

Shannon, C. E. 2, 84

Shokrollahi, M. A. 4, 26, 30, 47

Siohan, P. 49, 53

Slepian, D. 3, 6, 19, 38, 39, 45, 101, 108,122

Solomon, J. A. 70

Stankovic, V. 50

Stone, H. S. 61, 62

Su, J. K. 4, 61, 101

Sun, W. 101

Tabatabai, A. 55, 69

Tepe, K. E. 4, 47

Thitimajshima, P. 4, 47

Thomas, J. A. 14, 43

Tzschoppe, R. 66

Ungerboeck, G. 21, 45, 74

Urbanke, R. L. 4, 26, 30, 47, 51, 75

Varodayan, D. 48

Villasenor, J. 70

Viterbi, A. 22

Voloshynovskiy, S. 93, 101

Wang, Y. 84, 87, 101, 108

Watson, A. B. 67, 68, 70

Westerlaken, R. P. 50

Wolf, J. 3, 6, 19, 38, 39, 45, 101, 108, 122

Wornell, G. W. 20, 64–66

Wyner, A. 3, 6, 38, 41–43, 45, 101, 108,112, 122

Xiong, Z. xi, 47–50, 53, 54

Yang, E. H. 101

Yang, G. Y. 70

Yang, Y. 50

Yeo, B. L. 61

Yeung, M. M. 61

Zaidi, A. 84, 88, 91

Zamir, R. 49, 65, 67, 91

Zhao, Y. 47

Zhu, X. 50

Ziv, J. 3, 6, 38, 41–43, 101, 108, 112, 122


Bibliography

Aaron, A. M. and Girod, B. «Compression with Side Information Using Turbo Codes». In DCC '02: Proceedings of the Data Compression Conference (DCC '02), page 252. IEEE Computer Society, Washington, DC, USA. 2002.

Aaron, A. M., Setton, E. and Girod, B. «Towards practical Wyner-Ziv codingof video». In Proceedings of the IEEE Image Processing, ICIP , volume 2,3, pages869–872. 2003.

Acikel, O. F. and Ryan, W. E. «Punctured turbo-codes for BPSK/QPSK chan-nels». IEEE Trans. Commun., 47(9):1315–1323. 1997.

Amraoui, A., Chung, S. Y. and Urbanke, R. L. «LTHC: Ldpcopt». http://lthcwww.epfl.ch/research/ldpcopt/. Access Date: Oct 2007. 2003.

Bahl, L., Cocke, J., Jelinek, F. and Raviv, J. «Optimal decoding of linear codes for minimizing symbol error rate (Corresp.)». IEEE Trans. Inform. Theory, 20(2):284–287. 1974.

Bajcsy, J. and Mitran, P. «Coding for the Slepian-Wolf problem with turbo codes». In GlobeCom'01, San Antonio. 2001a.

Bajcsy, J. and Mitran, P. «Design of fractional rate FSM encoders using Latin squares». In IEEE Int. Symp. Inform. Theory - Recent Results Session, Washington. 2001b.

Bastug, A. and Sankur, B. «Improving the payload of watermarking channels via LDPC coding». IEEE Signal Processing Lett., 11(2):90–92. 2004.

Benedetto, S., Divsalar, D., Montorsi, G. and Pollara, F. «Serial concatenation of interleaved codes: Performance analysis, design, and iterative decoding». IEEE Trans. Inform. Theory, 44(5):909–926. 1998.

Bennatan, A., Burshtein, D., Caire, G. and Shamai, S. «Superposition coding for side-information channels». IEEE Trans. Inform. Theory, 52(5):1872–1889. 2006.


Berger, T. Rate-Distortion Theory: A mathematical basis for data compression.Prentice-Hall. 1971.

Berrou, C. and Glavieux, A. «Near optimum error correcting coding and decoding: turbo-codes». IEEE Trans. Commun., 44(6):1261–1271. 1996.

Berrou, C., Glavieux, A. and Thitimajshima, P. «Near Shannon limit error-correcting coding and decoding: Turbo-Codes». In IEEE International Conferenceon Communications, Geneve. 1993.

Boutros, J. and Caire, G. «Iterative multiuser joint decoding: Unified framework and asymptotic analysis». IEEE Trans. Inform. Theory, 48(7):1772–1793. 2002.

Chen, B. and Wornell, G. W. «Digital watermarking and information embedding using dither modulation». In IEEE Second Workshop on Multimedia Signal Processing, pages 273–278. 1998.

Chen, B. and Wornell, G. W. «Provably robust digital watermarking». In SPIE: Multimedia Systems and Applications II (part of Photonics East 99), Boston, volume 3845, pages 43–54. 1999.

Chen, B. and Wornell, G. W. «Quantization index modulation: A class of provably good methods for digital watermarking and information embedding». IEEE Trans. Inform. Theory, 47(5):1423–1443. 2001.

Chou, J., Pradhan, S. S. and Ramchandran, K. «A robust blind watermarking scheme based on distributed source coding principles». In ACM Multimedia, pages 49–56. 2000.

Chou, J., Pradhan, S. S. and Ramchandran, K. «Turbo and trellis-basedconstructions for source coding with side information». In IEEE Data CompressionConf. (DCC), Snowbird, UT . 2003.

Chung, S. Y. On the Construction of Some Capacity-Approaching Coding Schemes..Ph.D. thesis, MA: MIT Press. 2000.

Chung, S. Y., Forney, G. D. J., Richardson, T. J. and Urbanke, R. L. «On the design of Low-Density Parity-Check codes within 0.0045 dB of the Shannon limit». IEEE Commun. Lett., 5(2):58–60. 2001a.

Chung, S. Y., Richardson, T. J. and Urbanke, R. L. «Analysis of sum-product decoding of low-density parity-check codes using a Gaussian approximation». IEEE Trans. Inform. Theory, 47(2):657–670. 2001b.

Cohen, A. S. and Lapidoth, A. «The Gaussian watermarking game». IEEE Trans. Inform. Theory, 48(6):1639–1667. 2002.

Costa, M. «Writing on dirty paper (Corresp.)». IEEE Trans. Inform. Theory ,29(3):439–441. 1983.


Cover, T. M. and Chiang, M. «Duality between channel capacity and rate distortion with two-sided state information». IEEE Trans. Inform. Theory, 48(6):1629–1638. 2002.

Cover, T. M. and Thomas, J. A. Elements of information theory . Wiley-Interscience, New York, NY, USA. 1991.

Cox, I. J. and Miller, M. L. «The First 50 Years of Electronic Watermarking». EURASIP Journal on Applied Signal Processing, 2002(2):126–132. Doi:10.1155/S1110865702000525. 2002.

Cox, I. J., Miller, M. L. and McKellips, A. L. «Watermarking as communications with side information». Proceedings of the IEEE (USA), 87(7):1127–1141. 1999.

Craver, S., Memon, N., Yeo, B. L. and Yeung, M. M. «Resolving Rightful Ownerships with Invisible Watermarking Techniques: Limitations, Attacks, and Implications». IEEE Journal on Selected Areas in Communications, 16(4):573–586. 1998.

Dikici, C., Guermazi, R., Idrissi, K. and Baskurt, A. «Distributed Source Coding of Still Images». In Proc. of European Signal Processing Conf. EUSIPCO, Antalya. 2005.

Dikici, C., Idrissi, K. and Baskurt, A. «Joint Data-Hiding and Source Coding of Still Images». In Proc. of European Signal Processing Conf. EUSIPCO, Florence. 2006a.

Dikici, C., Idrissi, K. and Baskurt, A. «Joint Data-Hiding and Source Coding with Partially Available Side Information». In Proc. of SPIE Electronic Imaging, volume 6072. 2006b.

Dikici, C., Idrissi, K. and Baskurt, A. «Tatouage Informé pour le Codage Distribué». In Proc. of CORESA. 2006c.

Eggers, J. J., Bauml, R., Tzschoppe, R. and Girod, B. «Scalar Costa scheme for information embedding». IEEE Trans. Signal Processing, 51(4):1003–1019. 2003.

Erez, U., Shamai, S. and Zamir, R. «Capacity and lattice strategies for canceling known interference». IEEE Trans. Inform. Theory, 51(11):3820–3833. 2005.

Forney, G. D. J. and Ungerboeck, G. «Modulation and coding for linear Gaussian channels». IEEE Trans. Inform. Theory, 44(6):2384–2415. 1998.

Gallager, R. G. Low-Density Parity-Check Codes.. Ph.D. thesis, MA: MIT Press.1963.


Garcia-Frias, J. and Zhao, Y. «Compression of correlated binary sources using turbo codes». IEEE Commun. Lett., 5(10):417–419. 2001.

Garcia-Frias, J. and Zhao, Y. «Compression of binary memoryless sources using punctured turbo codes». IEEE Commun. Lett., 6(9):394–396. 2002.

Gehrig, N. and Dragotti, P. L. «Distributed Compression in Camera Sensor Networks». In IEEE International Workshop on Multimedia Signal Processing, Siena, Italy. 2004.

Gel'fand, S. I. and Pinsker, M. S. «Coding for Channel with Random Parameters». Prob. Contr. Inform. Theory, 9(1):19–31. 1980.

Girod, B., Aaron, A. M., Rane, S. and Rebollo-Monedero, D. «Distributed video coding». In Special Issue on Video Coding and Delivery, Proceedings of the IEEE, volume 93, pages 71–83. 2005.

Hartung, F. and Kutter, M. «Multimedia watermarking techniques». Proc. IEEE, 87(7):1079–1107. 1999.

Hartung, F., Su, J. K. and Girod, B. «Spread Spectrum Watermarking: Malicious Attacks and Counterattacks». In SPIE Electronic Imaging, Security and Watermarking of Multimedia Contents, pages 147–158. 1999.

Heegard, C. and Gamal, A. E. «On the capacity of computer memory with defects». IEEE Trans. Inform. Theory, 29(5):731–739. 1983.

Holliman, M. and Memon, N. «Counterfeiting attacks on oblivious block-wise independent invisible watermarking schemes». IEEE Trans. Image Processing, 9(3):432–441. 2000.

Kerckhoffs, A. «La cryptographie militaire». Journal des sciences militaires ,9(1):5–38. 1883.

Kuhn, M. and Petitcolas, F. A. P. «Stirmark». http://www.petitcolas.net/fabien/watermarking/stirmark/. Access Date: Oct 2007. 2000.

Kusuma, J., Doherty, L. and Ramchandran, K. «Distributed compression for sensor networks». In IEEE Intl. Conf. on Image Processing (ICIP), Thessaloniki, Greece, volume 1, pages 82–85. 2001.

Lajnef, K. Etude du codage de sources distribuées pour de nouveaux concepts encompression vidéo. Ph.D. thesis, Thèse de doctorat en Traitement du Signal,Université de Rennes 1. 2006.

Lajnef, K., Guillemot, C. and Siohan, P. «Distributed coding of three binary and Gaussian correlated sources using punctured turbo codes». EURASIP Journal on Applied Signal Processing, 86(11):3131–3149. ISSN 0165-1684. 2006.


Le Gall, D. and Tabatabai, A. «Sub-band coding of digital images using symmetric short kernel filters and arithmetic coding techniques». In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 2, pages 761–764. 2000.

Le Guelvouit, G. «Trellis-coded quantization for public-key watermarking». InIEEE Int. Conf. on Acoustics, Speech and Signal Processing . 2005.

Levický, D. and Foriš, P. «Human Visual System Models in Digital Image Watermarking». In Radioengineering, volume 13, pages 38–43. 2004.

Liu, Z., Cheng, S., Liveris, A. D. and Xiong, Z. «Slepian-Wolf Coded Nested Lattice Quantization for Wyner-Ziv Coding: High-Rate Performance Analysis and Code Design». IEEE Trans. Inform. Theory, 52(10):4358–4379. 2006.

Liveris, A. D., Lan, C. F., Narayanan, K., Xiong, Z. and Georghiades, C. N. «Slepian-Wolf coding of three binary sources using LDPC codes». In International Symposium on Turbo Codes and Related Topics. 2003a.

Liveris, A. D., Xiong, Z. and Georghiades, C. N. «Compression of binary sources with side information at the decoder using LDPC codes». IEEE Commun. Lett., 6(10):440–442. 2002a.

Liveris, A. D., Xiong, Z. and Georghiades, C. N. «A Distributed Source Coding Technique For Highly Correlated Images Using Turbo-Codes». In IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), Orlando. 2002b.

Liveris, A. D., Xiong, Z. and Georghiades, C. N. «Distributed compression of binary sources using conventional parallel and serial concatenated convolutional codes». In IEEE Data Compression Conf. (DCC), Snowbird, UT. 2003b.

Mackay, D. J. C. Information Theory, Inference, and Learning Algorithms. Cambridge University Press. 2003.

Mackay, D. J. C. and Neal, R. M. «Near Shannon limit performance of low density parity check codes». Electronics Letters, 33(6):457–458. 1997.

Maor, A. and Merhav, N. «On Joint Information Embedding and Lossy Compression». IEEE Trans. Inform. Theory, 51(8):2998–3008. 2005.

Marcellin, M. W. and Fischer, T. R. «Trellis Coded Quantization of Memoryless and Gauss-Markov Sources». IEEE Trans. Commun., 38(1):82–93. 1990.

Marcellin, M. W., Gormish, M. J., Bilgin, A. and Boliek, M. P. «An Overview of JPEG-2000». In Data Compression Conference, pages 523–544. 2000.

Merhav, N. and Shamai, S. «On joint source-channel coding for the Wyner-Ziv source and the Gel'fand-Pinsker channel». IEEE Trans. Inform. Theory, 49(11):2844–2855. 2003.


Miller, M. L., Doërr, G. J. and Cox, I. J. «Applying informed coding and embedding to design a robust high-capacity watermark». IEEE Trans. Image Processing, 13(6):792–807. 2004.

Moulin, P. and Mihcak, M. K. «The parallel-Gaussian watermarking game». IEEE Trans. Inform. Theory, 50(2):272–289. 2004.

Moulin, P. and O'Sullivan, J. A. «Information-Theoretic Analysis of Information Hiding». IEEE Trans. Inform. Theory, 49(3):563–593. 2003.

Moulin, P. and Wang, Y. «Capacity and Random-Coding Exponents for Channel Coding With Side Information». IEEE Trans. Inform. Theory, 53(4):1326–1347. 2007.

Oohama, Y. «Gaussian Multiterminal Source Coding». IEEE Trans. Inform. Theory, 43(6):1912–1923. 1997.

Ozonat, K. «Lossless distributed source coding for highly correlated still images».2000.

Pearl, J. Probabilistic Reasoning in Intelligent Systems : Networks of PlausibleInference. Morgan Kaufmann. 1988.

Pereira, S. «Checkmark». http://watermarking.unige.ch/Checkmark. AccessDate: Oct 2007. 2001.

Petitcolas, F. A. P. «Watermarking schemes evaluation». IEEE Trans. Signal Processing, 17(5):58–64. 2000.

Pradhan, S. S., Chou, J. and Ramchandran, K. «Duality between source coding and channel coding and its extension to the side information case». IEEE Trans. Inform. Theory, 49(5):1181–1203. 2003.

Pradhan, S. S., Kusuma, J. and Ramchandran, K. «Distributed compression in a dense micro-sensor network». IEEE Signal Processing Mag., 19(3):51–60. 2002.

Pradhan, S. S. and Ramchandran, K. «Distributed Source Coding Using Syndromes (DISCUS): Design and Construction». In DCC '99: Proceedings of the Conference on Data Compression, page 158. IEEE Computer Society, Washington, DC, USA. 1999.

Pradhan, S. S. and Ramchandran, K. «Distributed source coding: Symmetric rates and applications to sensor networks». In IEEE Data Compression Conf. (DCC), Snowbird, UT. 2000.

Puri, R., Majumbar, A., Ishwar, P. and Ramchandran, K. «Distributed video coding in wireless sensor networks». IEEE Signal Processing Mag., 23(4):94–106. 2006.


Puri, R. and Ramchandran, K. «PRISM: A new robust video architecture based on distributed compression principles». In Allerton Conf. Communication Control, and Computing, Allerton, IL. 2002.

Rebollo-Monedero, D. and Girod, B. «Design of optimal quantizers for distributed coding of noisy sources». In IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), Philadelphia. 2005.

Richardson, T. J., Shokrollahi, M. A. and Urbanke, R. L. «Design of capacity-approaching irregular low-density parity-check codes». IEEE Trans. Inform. Theory, 47(2):619–637. 2001.

Richardson, T. J. and Urbanke, R. L. «The capacity of Low-Density Parity-Check codes under message-passing decoding». IEEE Trans. Inform. Theory, 47(2):599–618. 2001a.

Richardson, T. J. and Urbanke, R. L. «Efficient encoding of low-density parity-check codes». IEEE Trans. Inform. Theory , 47(2):638–656. 2001b.

Salehi, M. «Capacity and Coding for Memories with Real-Time Noisy Defect Information at Encoder and Decoder». In Proceedings of the IEE Communication, Speech and Vision, volume 139, pages 113–117. 1992.

Schonberg, D., Pradhan, S. S. and Ramchandran, K. «LDPC Codes Can Approach the Slepian-Wolf Bound for General Binary Sources». In 40th Allerton Conf. Communication Control, and Computing, Allerton, IL, pages 576–585. 2002.

Shannon, C. E. «Channels with side information at the transmitter». In IBM J.of Research and Development , volume 2, pages 289–293. 1958.

Shannon, C. E. «Coding theorems for a discrete source with a fidelity criterion».In IRE Nat. Conv. Rec., Pt. 4 , pages 142–163. 1959.

Slepian, D. and Wolf, J. «Noiseless coding of correlated information sources».IEEE Trans. Inform. Theory , 19(4):471–480. 1973.

Stankovic, V., Yang, Y. and Xiong, Z. «Distributed Source Coding for Multimedia Multicast Over Heterogeneous Networks». IEEE Journal of Selected Topics in Signal Processing, 1(2):220–230. 2007.

Stone, H. S. «Analysis of attacks on image watermarks with randomized coefficients». 1996.

Su, J. K., Eggers, J. J. and Girod, B. «Illustration of the duality between channel coding and rate distortion with side information». In 34th Asilomar Conf. Signals, Systems and Computers, Pacific Grove, CA, USA, Oct. 29-Nov. 1. 2000.


Tepe, K. E. and Anderson, J. B. «Turbo codes for binary symmetric and binary erasure channels». In IEEE International Symposium on Information Theory, page 59. 1998.

Ungerboeck, G. «Channel Coding with Multilevel/Phase Signals». IEEE Trans.Inform. Theory , 28(1):55–67. 1982.

Varodayan, D., Aaron, A. M. and Girod, B. «Rate-adaptive distributed source coding using low-density parity-check codes». In 39th Asilomar Conf. Signals, Systems and Computers, Pacific Grove, CA, USA. 2005.

Varodayan, D., Aaron, A. M. and Girod, B. «Rate-adaptive codes for distributed source coding». EURASIP Journal on Applied Signal Processing, 86(11):3123–3130. 2006.

Viterbi, A. «Error bounds for convolutional codes and an asymptotically optimum decoding algorithm». IEEE Trans. Inform. Theory, 13(2):260–269. 1967.

Voloshynovskiy, S., Koval, O., Pérez-González, F., Mihcak, K. M. and Pun, T. «Data-hiding with host state at the encoder and partial side information at the decoder». URL http://vision.unige.ch/publications/postscript/2005/VoloshynovskiyKovalPerezGonzalezMihcakPun_SP2005.pdf (preprint). 2004.

Watson, A. B. «DCT quantization matrices visually optimized for individual im-ages». In SPIE Human Vision, Visual Processing, and Digital Display IV , volume1913, pages 202–216. 1993.

Watson, A. B., Yang, G. Y., Solomon, J. A. and Villasenor, J. «Visibility of wavelet quantization noise». IEEE Trans. Image Processing, 6(8):1164–1175. 1997.

Westerlaken, R. P., Klein Gunnewiek, R. and Lagendijk, R. L. «Turbo-Code Based Wyner-Ziv Video Compression». In Twenty-sixth Symposium on Information Theory in the Benelux, pages 113–120. 2005.

Wyner, A. «Recent results in the Shannon theory». IEEE Trans. Inform. Theory ,20(1):2–10. 1974.

Wyner, A. and Ziv, J. «The rate-distortion function for source coding with side information at the decoder». IEEE Trans. Inform. Theory, 22(1):1–10. 1976.

Xiong, Z., Liveris, A. D. and Cheng, S. «Distributed source coding for sensornetworks». IEEE Signal Processing Mag., 21(5):80–94. 2004.

Yang, E. H. and Sun, W. «Combined Source Coding and Watermarking». In Information Theory Workshop, Proceedings of the IEEE, pages 322–326. 2006.


Zaidi, A. and Duhamel, P. «On coding with a partial knowledge of the state information». In Proceedings of the IEEE 39th Asilomar conference on Signals, Systems and Computers, pages 657–661. 2005.

Zamir, R. and Shamai, S. «Nested linear/lattice codes for Wyner-Ziv encoding». In IEEE Information Theory Workshop, Killarney, Ireland, pages 92–93. 1998.

Zamir, R., Shamai, S. and Erez, U. «Nested linear/lattice codes for structured multiterminal binning». IEEE Trans. Inform. Theory, 48(6):1250–1276. 2002.

Zhu, X., Aaron, A. M. and Girod, B. «Distributed compression for large camera arrays». In IEEE Workshop on Statistical Signal Processing, St Louis, Missouri. 2003.


Keywords: Coding with State Information, Compression, Watermarking, Distributed Source Coding, Writing on Dirty Paper, Low Density Parity Check Codes, Trellis Coded Quantization.

Laboratoire d'InfoRmatique en Images et Systèmes d'information, UMR 5205 CNRS