TRANSCRIPT
Distributed Source Coding for Wireless Sensor Networks
Mark Perillo, March 14, 2007
Overview
l Review of Information Theory Principles
l Distributed Source Coding Principles
− Lossless
− Lossy
− The CEO Problem
l Practical Distributed Source Coding Schemes
l Ideas For Work In Distributed Source Coding
Review of Information Theory Principles
Review of Information Theory Principles
l H(X): Entropy - a measure of the information contained in a random variable
  H(X) = -\sum_i p_i \log p_i
− For Bernoulli random variable X
l p = 0.5, q = 0.5 → H(X) = H(p) = 1 bit
l p = 0.1, q = 0.9 → H(X) = H(p) = 0.47 bit
− For uniform random variable Y \in \{1, 2, 3, \dots, N\}, H(Y) = \log N
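The entropy values above are easy to check numerically; a minimal Python sketch (function names are mine, not from the slides):

```python
from math import log2

def bernoulli_entropy(p):
    """H(p) = -p log p - (1-p) log(1-p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def uniform_entropy(n):
    """Entropy of a uniform RV on {1, ..., N} is log N bits."""
    return log2(n)

print(bernoulli_entropy(0.5))            # 1.0 bit
print(round(bernoulli_entropy(0.1), 2))  # 0.47 bit, matching the slide
print(uniform_entropy(8))                # 3.0 bits
```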
Review of Information Theory Principles
l H(X|Y): Conditional entropy - a measure of the information remaining in X once Y is known
  H(X|Y) = -\sum_x \sum_y p(x,y) \log p(x|y)
− For example
l Uniform random variable X → H(X) = \log N
l But if we know Y, and X = Y or X = Y+1 with equal probability, then H(X|Y) = 1 bit
Review of Information Theory Principles
l H(X,Y): Joint entropy - a measure of the total information in X and Y
  H(X,Y) = -\sum_x \sum_y p(x,y) \log p(x,y)
l I(X;Y): Mutual information - a measure of the amount of information shared by two random variables X and Y
  I(X;Y) = \sum_x \sum_y p(x,y) \log \frac{p(x,y)}{p(x)\, p(y)}
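These definitions can be computed directly from a joint distribution table; a short sketch (the example distribution is mine):

```python
from math import log2

def joint_entropy(pxy):
    """H(X,Y) = -sum over (x,y) of p(x,y) log p(x,y)."""
    return -sum(p * log2(p) for row in pxy for p in row if p > 0)

def mutual_information(pxy):
    """I(X;Y) = sum p(x,y) log [p(x,y) / (p(x) p(y))]."""
    px = [sum(row) for row in pxy]              # marginal of X
    py = [sum(col) for col in zip(*pxy)]        # marginal of Y
    return sum(p * log2(p / (px[i] * py[j]))
               for i, row in enumerate(pxy)
               for j, p in enumerate(row) if p > 0)

# X = Y over two equally likely values: fully dependent, so I(X;Y) = H(X) = 1 bit
pxy = [[0.5, 0.0], [0.0, 0.5]]
print(joint_entropy(pxy), mutual_information(pxy))   # 1.0 1.0
```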
Review of Information Theory Principles
[Venn diagram: H(X) and H(Y) overlap in I(X;Y); the exclusive regions are H(X|Y) and H(Y|X); the union is H(X,Y)]
Review of Information Theory Principles
l Source coding
− A large block of n copies of an i.i.d. RV X can be compressed into nH(X) bits
− Based on the theory of typical sets and the Asymptotic Equipartition Property (AEP)
Asymptotic Equipartition Property
l X1, X2, X3…Xn are i.i.d. random variables
l It is very likely that
  -\frac{1}{n} \log p(X_1, X_2, \dots, X_n) \approx H(X), \quad \text{i.e.,} \quad p(X_1, X_2, \dots, X_n) \approx 2^{-nH(X)}
l This is a direct result of the weak law of large numbers
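The weak-law argument above can be seen empirically; a small simulation (parameters and seed are mine):

```python
import random
from math import log2

random.seed(0)
p, n = 0.3, 10000
H = -p * log2(p) - (1 - p) * log2(1 - p)   # entropy of a Bernoulli(0.3) symbol

# Draw one long i.i.d. sequence and compute -(1/n) log p(sequence)
seq = [1 if random.random() < p else 0 for _ in range(n)]
neg_log_prob = -sum(log2(p) if s == 1 else log2(1 - p) for s in seq) / n

# By the weak law of large numbers the two values should be close,
# i.e., the drawn sequence has probability about 2^{-nH(X)}
print(H, neg_log_prob)
```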
Asymptotic Equipartition Property
l If we have a large sequence of random variables, then it is very likely that the drawn sequence will have joint probability about equal to 2^{-nH(X)}
l There are a total of about 2^{nH(X)} of these "typical" sequences in the typical set, where nearly all of the probability is concentrated
(x_1, x_2, \dots, x_n) \in A_\epsilon^{(n)} \iff 2^{-n(H(X)+\epsilon)} \le p(x_1, x_2, \dots, x_n) \le 2^{-n(H(X)-\epsilon)}
Asymptotic Equipartition Property
l If we do a good job compressing the “typical” sequences, the overall quality of the job we do will be good
− Even if we do a poor job compressing the “atypical” sequences
l In fact, we can compress a block of X with length n into nH(X) bits
l Similarly, a block of (X,Y) with length n can be compressed into nH(X,Y) bits
Jointly Typical Sequences
A_\epsilon^{(n)} = \left\{ (x^n, y^n) : \left| -\frac{1}{n} \log p(x^n) - H(X) \right| < \epsilon, \; \left| -\frac{1}{n} \log p(y^n) - H(Y) \right| < \epsilon, \; \left| -\frac{1}{n} \log p(x^n, y^n) - H(X,Y) \right| < \epsilon \right\}
Rate Distortion Theory
l The source coding theorem states that in order to reconstruct a discrete input sequence with no loss of information (i.e., no distortion), we must encode at a rate of H(X)
l But what if some distortion is allowable?
l And what if we want to describe a continuous random variable with a discrete alphabet?
l What rates are required then?
Rate Distortion Theory
l The answer to these questions comes in the form of the rate distortion function R(D)
l Typical distortion measures:
− Hamming distance (discrete sources)
− Mean squared error (continuous sources)
R(D) = \min_{p(\hat{x}|x) : \sum_{x,\hat{x}} p(x)\, p(\hat{x}|x)\, d(x,\hat{x}) \le D} I(X; \hat{X})
Rate Distortion Theory
l Examples
− Bernoulli source (parameter p) with Hamming distortion measure:
  R(D) = H(p) - H(D) \text{ for } 0 \le D \le p; \quad R(D) = 0 \text{ for } D > p
− Gaussian source (variance \sigma^2) with MSE distortion measure:
  R(D) = \frac{1}{2} \log \frac{\sigma^2}{D} \text{ for } 0 \le D \le \sigma^2; \quad R(D) = 0 \text{ for } D > \sigma^2
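Both rate-distortion functions are easy to implement; a minimal sketch (function names are mine):

```python
from math import log2

def rd_bernoulli(p, D):
    """R(D) for a Bernoulli(p) source with Hamming distortion (p <= 1/2)."""
    if D > p:
        return 0.0
    Hb = lambda q: 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)
    return Hb(p) - Hb(D)

def rd_gaussian(var, D):
    """R(D) for a Gaussian source with variance var and MSE distortion."""
    return 0.5 * log2(var / D) if D <= var else 0.0

print(rd_bernoulli(0.5, 0.0))  # 1.0 bit: lossless rate equals H(0.5)
print(rd_gaussian(4.0, 1.0))   # 1.0 bit buys a 4x reduction in MSE
```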
Distributed Source Coding Principles
Distributed Source Coding Example
l Two temperature sensors with 8-bit ADCs
l Entropy H(X) = H(Y) = 8 bits
l But Y = X + N, N={0,1} with equal probability
l H(Y|X) = 1 bit
l Collocated encoder:
− R = H(X,Y) = H(X) + H(Y|X) = 8 + 1 = 9 bits
l Separate encoders with no awareness of each other:
− R = H(X) + H(Y) = 8 + 8 = 16 bits
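The chain-rule arithmetic behind the two rates can be written out explicitly:

```python
# Two-sensor example: X is a uniform 8-bit reading, Y = X + N with N in {0,1}
# equally likely, so H(Y|X) = H(N) = 1 bit.
H_X = 8.0
H_Y_given_X = 1.0
H_XY = H_X + H_Y_given_X      # chain rule: H(X,Y) = H(X) + H(Y|X)
separate_naive = H_X + 8.0    # encoders unaware of each other send H(X) + H(Y)

print(H_XY, separate_naive)   # 9.0 vs. 16.0 bits
```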
Lossless Distributed Source Coding
[Figure: X and Y enter a single joint encoder; the decoder recovers both; R = H(X,Y)]
Lossless Distributed Source Coding
[Figure: X and Y encoded separately at rates H(X) and H(Y); R = H(X) + H(Y)]
Lossless Distributed Source Coding
[Figure: X encoded at rate H(X); Y encoded separately at rate H(Y|X); R = H(X) + H(Y|X) = H(X,Y)]
Can we do this? Slepian and Wolf proved that you can indeed
Slepian-Wolf Coding
l Can we encode data sequences separately with no rate increase? YES!!!
l Proof is achieved via analysis of an encoding strategy using “binning”
Slepian-Wolf Coding
l Choose rates Rx and Ry to meet Slepian-Wolf requirements
l Assign every sequence x^n to a random bin in \{1, 2, \dots, 2^{nR_X}\} and every y^n to a random bin in \{1, 2, \dots, 2^{nR_Y}\}
l Transmit the bin indices to the decoder
l Look for a jointly typical pair (x^n, y^n) whose sequences match the received bin indices
R_X \ge H(X|Y), \quad R_Y \ge H(Y|X), \quad R_X + R_Y \ge H(X,Y)
Slepian-Wolf Coding
[Figure: every sequence x^n = \{0,1,0,\dots\} is assigned to one of the 2^{nR_X} bins]
Slepian-Wolf Coding
[Figure: product-bin grid of 2^{nR_X} x-bins by 2^{nR_Y} y-bins, containing the ~2^{nH(X,Y)} jointly typical (x^n, y^n) sequences; more than one jointly typical pair in a product bin will result in decoding errors]
Slepian-Wolf Coding
l When do we get decoding errors?
− When the X and Y sequences are not jointly typical
l Rare from Joint AEP
− When more than one X sequence in the same bin is jointly typical with Y (or vice versa)
l Rare if Rx > H(X|Y) (and Ry > H(Y|X))
− When more than one jointly typical (X,Y) sequence lies in a single product bin
l Rare if Rx + Ry > H(X,Y)
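The binning strategy and its rare failure modes can be simulated on a toy scale; a sketch with block length and rates chosen by me (the correlation model is "y differs from x in at most one bit"):

```python
import random
from itertools import product

random.seed(1)
n, rx_bits = 10, 8            # block length 10, 8-bit bin index (rate 0.8)
num_bins = 2 ** rx_bits

# Random binning: assign every length-n binary sequence to a random bin
bins = {x: random.randrange(num_bins) for x in product((0, 1), repeat=n)}

def decode(bin_index, y, t=1):
    """Return the unique sequence in the bin within Hamming distance t of y."""
    candidates = [x for x, b in bins.items()
                  if b == bin_index and sum(a != c for a, c in zip(x, y)) <= t]
    return candidates[0] if len(candidates) == 1 else None

trials, successes = 200, 0
for _ in range(trials):
    x = tuple(random.randrange(2) for _ in range(n))
    y = list(x)
    flip = random.randrange(n + 1)   # flip one position, or none
    if flip < n:
        y[flip] ^= 1
    if decode(bins[x], tuple(y)) == x:
        successes += 1

# Decoding fails only when another sequence close to y shares x's bin;
# with these parameters that is rare, so the rate stays near 1
print(successes / trials)
```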
Slepian-Wolf Coding
[Figure: achievable rate region in the (R_X, R_Y) plane, bounded by R_X \ge H(X|Y), R_Y \ge H(Y|X), and R_X + R_Y \ge H(X,Y); corner points (H(X), H(Y|X)) and (H(X|Y), H(Y)); separate encoders operate at (H(X), H(Y))]
Slepian-Wolf Coding
l With two variables, we can achieve considerable savings in the amount of traffic required
l In general, for n sensors, the Slepian-Wolf requirement is as follows
R(S) > H(X(S) \mid X(S^c)) \quad \forall S \subseteq \{1, 2, \dots, n\}
\text{where } R(S) = \sum_{i \in S} R_i \text{ and } X(S) = \{X_j : j \in S\}
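The subset condition above translates directly into a membership check for a candidate rate vector; a sketch (the helper name and the example entropies are mine, reusing the earlier two-sensor numbers):

```python
from itertools import combinations

def in_slepian_wolf_region(rates, cond_entropy):
    """Check R(S) > H(X(S) | X(S^c)) for every nonempty subset S.

    cond_entropy(S) must return H(X(S) | X(S^c)) for a tuple of indices S.
    """
    m = len(rates)
    for k in range(1, m + 1):
        for S in combinations(range(m), k):
            if sum(rates[i] for i in S) <= cond_entropy(S):
                return False
    return True

# Two-sensor example: H(X|Y) ~ 1, H(Y|X) = 1, H(X,Y) = 9
H = {(0,): 1.0, (1,): 1.0, (0, 1): 9.0}
print(in_slepian_wolf_region([8.0, 1.5], lambda S: H[S]))  # True
print(in_slepian_wolf_region([8.0, 0.5], lambda S: H[S]))  # False: Ry <= H(Y|X)
```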
Source Coding w/ Side Information
l Look at a special case when ONLY X needs to be recovered at the decoder
[Figure: X's encoder sends at rate R_x and Y's encoder at rate R_y; the decoder outputs only \hat{X}]
Source Coding w/ Side Information
l For U such that p(x, y, u) = p(x, y)\, p(u|y) (i.e., X \to Y \to U forms a Markov chain)
l Then we can send at rates
  R_x \ge H(X|U), \quad R_y \ge I(Y;U)
Source Coding w/ Side Information
l One extreme: U = Y
  R_x \ge H(X|U) = H(X|Y), \quad R_y \ge I(Y;U) = I(Y;Y) = H(Y)
l Another extreme: U is uncorrelated with X, Y
  R_x \ge H(X|U) = H(X), \quad R_y \ge I(Y;U) = 0
l The results from Wyner basically say that we can operate anywhere between these two extremes
R-D w/ Side Information
l If we want to reproduce X with some distortion D, what rate must we send at, given that we have access to side information Y?
[Figure: X encoded at rate R_Y^*(D); the decoder has side information Y and outputs \hat{X} with E[d(X, \hat{X})] \le D]
R-D w/ Side Information
l If X, Y are uncorrelated, this is just the rate distortion function R(D)
l Since side information can only help, R_Y^*(D) \le R(D)
l If no distortion is allowed, then R_Y^*(0) = H(X|Y), as Slepian and Wolf showed
R-D w/ Side Information
l In general,
  R_Y^*(D) = \min_{p(w|x)} \min_f \left( I(X;W) - I(Y;W) \right)
  \text{subject to } \sum_x \sum_w \sum_y p(x,y)\, p(w|x)\, d(x, f(y,w)) \le D
l f is the reconstruction function
l w is the encoded version of x
R-D w/ Side Information
l Remember: in lossless (S-W coding), we pay no penalty for separation of X and Y
l Unfortunately, this is not the case in lossy source coding
l In general, R_Y^*(D) \ge R_{X|Y}(D), where R_{X|Y}(D) is the rate required when X's encoder has access to Y
l However, equality is achieved when (X,Y) are jointly Gaussian
Distributed Data Compression
l The rate-distortion region for this general problem is unknown
[Figure: X and Y encoded separately at rates R_X and R_Y; the decoder outputs (\hat{X}, \hat{Y})]
The CEO Problem
The CEO Problem
[Figure: a source X ~ p(X) observed by L agents through noisy channels W(y_1|x), W(y_2|x), \dots, W(y_L|x); each agent encodes separately and the decoder outputs \hat{X}]
The CEO Problem
l If the CEO's L agents were able to convene, they could smooth the data (i.e., average out the noise) to obtain the true value of X and then send data at the rate R(D) required to keep distortion below D
l But what happens if they cannot convene?
− Assume a sum rate requirement \sum_i R_i \le R
The CEO Problem
l Berger et al. originally found the limits of this problem for a Hamming distortion measure
l For R > H(X), P_e(R) does not go to 0
l In fact, as R gets large, P_e(R) = 2^{-\alpha(p,W) R}, where \alpha(p,W) is a constant for a given source distribution and joint source/observation distribution
l In other words, there is always a penalty for not being able to convene
The Quadratic Gaussian CEO Problem
[Figure: X observed through additive noises N_1, N_2, \dots, N_L; each noisy observation is encoded separately and the decoder outputs \hat{X}]
The Quadratic Gaussian CEO Problem
l Viswanathan and Berger found the limits of this problem as well
l As R and L get very large, what happens to D as a function of the sum rate R?
The Quadratic Gaussian CEO Problem
l Result:
  \lim_{L \to \infty} \lim_{R \to \infty} R \cdot D(L, R) = \beta(\sigma_X^2, \sigma_N^2)\, \sigma_X^2
  where \beta is a constant obtained from an infimum of I(Y; U | X) over test channels Q(u|y)
l Oohama proved that for Gaussian X, N, these bounds are tight
The Quadratic Gaussian CEO Problem
l This means that distortion decays as D = \frac{\sigma_N^2}{2R}
l Compare with the case when agents can all convene: D = \sigma_X^2\, 2^{-2R}
l Agents can smooth out the data and get rid of the noise
l So again, a penalty is paid for not being able to convene
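The gap between the two decay laws is striking even for small rates; a quick numeric comparison (the variance values are mine, for illustration):

```python
# Distortion decay: CEO setting (agents cannot convene) vs. centralized
sigma_N2 = 1.0   # observation-noise variance
sigma_X2 = 1.0   # source variance

def d_ceo(R):
    return sigma_N2 / (2 * R)          # decays only as 1/R

def d_convene(R):
    return sigma_X2 * 2 ** (-2 * R)    # decays exponentially in R

for R in (1, 5, 10):
    print(R, d_ceo(R), d_convene(R))   # the exponential term wins quickly
```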
The Quadratic Gaussian CEO Problem
• How do we achieve this?
[Figure: each agent applies block coding followed by Slepian-Wolf coding; the decoder performs SW decoding, then block decoding, then estimation, achieving D = \frac{\sigma_N^2}{2R}]
Practical Distributed Source Coding Schemes
Draper et al’s Work
l Side information aware limit:
− For a given rate constraint R:
  D \ge \min_{u, f} E\left[ d(x, f(y_D, u)) \right]
  over auxiliary variables u with u \to y_E \to (x, y_N, y_D) and reconstruction functions f, subject to R > I(y_E; u) - I(y_D; u)
  (y_E, y_N, y_D are the encoder's, network's, and decoder's observations)
Draper et al’s Work
l How to do this:
− Construct a code of ~2^{nI(y_E; u)} typical codewords
− Bin each of these into one of 2^{nR} bins
l ~2^{n(I(y_E; u) - R)} codewords per bin
− Block encode and transmit the codeword's bin (coset index)
− Decoder chooses the codeword in the coset that is jointly typical with its observation
Draper et al’s Work
l Side information aware limit (across a network cut):
− For a given rate constraint R_{cut} across a cut (A, A^c):
  D \ge \min_{u, f} E\left[ d(x, f(y_{A^c}, u)) \right]
  over u with u \to y_A \to (x, y_{A^c}) and reconstruction functions f, subject to R_{cut} > I(y_A; u) - I(y_{A^c}; u)
− One of these constraints exists for every possible cut
Draper et al’s Work
l Quadratic Gaussian case:
  R(d) = \frac{1}{2} \log \frac{\sigma^2_{x|y_D} - \sigma^2_{x|y_D, y_E}}{d - \sigma^2_{x|y_D, y_E}}
  d(R) = \sigma^2_{x|y_D, y_E} + \left( \sigma^2_{x|y_D} - \sigma^2_{x|y_D, y_E} \right) 2^{-2R}
  \sigma^2_{x|y_D}: minimum distortion given the decoder's observation
  \sigma^2_{x|y_D, y_E}: minimum distortion given the encoder's and decoder's observations
Draper et al’s Work
l Serial Network
[Figure and equation: relays y_1, \dots, y_L in series; a recursive bound on the achievable distortion d_l at hop l, starting from d_1 = \sigma^2_{x|y_1}]
Draper et al’s Work
l Parallel Network
[Figure and equation: agents y_1, \dots, y_L in parallel; a recursive bound on the achievable distortion after combining l observations, starting from d_0 = \sigma^2_x]
Draper et al’s Work
l We can apply the serial and parallel results to get the achievable distortion of a general sensor network tree
Draper et al’s Work
• Serial Network
• Source variance = 4
• Noise variance = 4/3
• R = 2.5 bits
Source Coding Using Syndromes
l Originally introduced by Wyner in 1974
l Used by many in the literature recently, notably DISCUS by Pradhan et al.
l Exploits duality between Slepian-Wolf coding and channel coding
DISCUS Example
l X \in \{0,1\}^n and Y \in \{0,1\}^n, n = 3
l X and Y are correlated such that they differ in at most 1 bit
l Thus, given Y, X can take 4 values:
l Same as Y, or differing in the 1st, 2nd, or 3rd bit
l H(X|Y) = 2 bits
− If we send all of Y, this is the S-W limit
DISCUS Example
l Divide codeword space into 4 cosets
− 0: {000,111}, 1: {001,110}, 2: {010,101}, 3: {011,100}
− Send only the coset index (4 cosets = 2 bits)
− This meets the S-W bound
l Given Y, only one of the members of the coset can have a Hamming distance of 0 or 1
l The coset index is just the syndrome of x using the parity-check matrix H:
  s = Hx, \quad H = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}
DISCUS Example
l Decoder knows that choices for X are [010] and [101]
l Since Y is [110], it has to be [010] since [101] has a Hamming distance of 2
s = Hx^T = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}
i.e., the coset index is 2
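The full 3-bit example can be run end to end; a sketch using the parity-check matrix from the slide (helper names are mine):

```python
from itertools import product

# 2x3 parity-check matrix of the (3,1) repetition code
H = [(1, 0, 1),
     (0, 1, 1)]

def syndrome(v):
    """s = Hv over GF(2)."""
    return tuple(sum(h * b for h, b in zip(row, v)) % 2 for row in H)

def coset(s):
    """All length-3 words whose syndrome is s."""
    return [v for v in product((0, 1), repeat=3) if syndrome(v) == s]

def decode(s, y):
    """Pick the coset member closest in Hamming distance to side info y."""
    return min(coset(s), key=lambda v: sum(a != b for a, b in zip(v, y)))

x, y = (0, 1, 0), (1, 1, 0)
s = syndrome(x)        # only these 2 bits are transmitted (the S-W limit)
print(coset(s))        # the coset of x: [(0,1,0), (1,0,1)]
print(decode(s, y))    # (0,1,0): recovered from s and the side information y
```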
Source Coding Using Syndromes
l Let's say we have correlated X and Y
l X = Y + U
− U is a Bernoulli RV with p < 0.5
l Y sends full information
− RY = H(Y)
l X sends partial information, hopefully at the Slepian-Wolf limit
Source Coding Using Syndromes
l Let's generate a parity-check matrix H (m × n)
l X’s encoder generates n samples and then calculates the syndrome sx = HX
− There is no guarantee that X is a valid codeword (i.e., in the null space of H), so sX can take any value
l X’s encoder sends sx to the decoder
Source Coding Using Syndromes
l Decoder generates syndrome of Y
− sY = HY
l Decoder calculates s_X + s_Y = s_{X+Y} = s_U
l If X and Y are highly correlated, we can expect that there aren’t too many differences
− i.e., U should be mostly zeros
l Decoder can now approximate U (= f(sU) ) and therefore, approximate X
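The steps above can be sketched with a concrete code; the (7,4) Hamming code is my choice here (the slides only require "a parity-check matrix H"). Its columns are the binary representations of 1..7, so a nonzero syndrome of U directly names the position where X and Y differ:

```python
# Parity-check matrix of the (7,4) Hamming code; column j holds binary(j+1)
H = [(0, 0, 0, 1, 1, 1, 1),
     (0, 1, 1, 0, 0, 1, 1),
     (1, 0, 1, 0, 1, 0, 1)]

def syndrome(v):
    """s = Hv over GF(2)."""
    return tuple(sum(h * b for h, b in zip(row, v)) % 2 for row in H)

def sw_decode(s_x, y):
    """Recover X from its 3-bit syndrome and side information Y
    (valid when X and Y differ in at most one position)."""
    s_y = syndrome(y)
    s_u = tuple(a ^ b for a, b in zip(s_x, s_y))   # syndrome of U = X xor Y
    if s_u == (0, 0, 0):
        return tuple(y)                            # U is all zeros: X = Y
    pos = s_u[0] * 4 + s_u[1] * 2 + s_u[2] - 1     # position of the single flip
    x = list(y)
    x[pos] ^= 1
    return tuple(x)

x = (1, 0, 1, 1, 0, 0, 1)
y = (1, 0, 1, 0, 0, 0, 1)          # differs from x only in position 3
print(sw_decode(syndrome(x), y))   # x recovered from 3 bits instead of 7
```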
Source Coding Using Syndromes
l Alternative way to do this (from the DISCUS paper):
− s = Hx
− y' = y + [0|s]
− Find x', the closest codeword to y' in the coset with s = [00…0]
l i.e., the closest valid codeword in the null space of H
− x = x' + [0|s]
Source Coding Using Syndromes
l In general, if we have an (n, k, 2t+1) code,
− R = n - k
− The S-W bound is the log of the number of possible outcomes for X given Y, assuming each is equally likely:
  R_X^{SW} = \log \sum_{i=0}^{t} \binom{n}{i}
− In the previous example, n = 3, k = 1, t = 1:
  R_X^{SW} = \log(1 + 3) = 2, \quad R = n - k = 2
Source Coding Using Syndromes
l An (n, k, 2t+1) code can be used if X and Y differ in at most t positions
l Let's look at an X, Y correlation structure that can be modeled as a BSC with crossover probability p
l The probability of a correct decode is the probability that the number of crossovers is at most t:
  P = \sum_{i=0}^{t} \binom{n}{i} p^i (1-p)^{n-i}
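This binomial tail is one line of Python; a sketch using the earlier (3,1,3) repetition-code example:

```python
from math import comb

def p_correct(n, t, p):
    """Probability that at most t of the n positions cross over on a BSC(p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(t + 1))

# (3,1,3) code corrects t = 1 difference; decode fails only if 2+ bits differ
print(p_correct(3, 1, 0.1))   # 0.972
```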
Source Coding Using Syndromes
l The benefit of using good channel codes should be apparent now
l There has been a lot of work recently applying state of the art codes like LDPC and Turbo codes to the S-W problem
l Also, some have used convolutional codes in similar ways
DISCUS with Continuous Sources
[Figure: encoder quantizes X to W; W and Y are correlated; the decoder recovers X' from the coset index and Y]
l A fictitious channel P(Y|W) exists
l W is "sent" to Y at a rate of I(W;Y)
l This means fewer bits actually need to be sent
l The required rate is now H(W|Y) = H(W) - I(W;Y)
DISCUS with Continuous Sources
l Let's say that we set the fictitious rate at 1 bit
l This means we need to send two bits
l Let's make our cosets {1,5}, {2,6}, {3,7}, {4,8}
l Sample X and then send the coset index
[Figure: quantizer levels 1 2 3 4 5 6 7 8]
DISCUS with Continuous Sources
l X quantizes to a value of 6
l X sends coset index 1 → {2,6}
l Decoder knows X has been quantized to 2 or 6
l Remember that the fictitious channel sent the other bit
l This means that, due to knowledge of Y and the correlation structure, the decoder can disambiguate between the 2 possible values (1 bit)
[Figure: X and Y are correlated; W is X's quantized value]
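The quantized-coset scheme above fits in a few lines; a toy sketch using the slide's levels and cosets (function names are mine):

```python
levels = [1, 2, 3, 4, 5, 6, 7, 8]
cosets = [{1, 5}, {2, 6}, {3, 7}, {4, 8}]   # 4 cosets -> 2-bit index

def encode(x_level):
    """Send only the coset index of the quantized sample."""
    return next(i for i, c in enumerate(cosets) if x_level in c)

def decode(index, y):
    """Side information Y supplies the 'fictitious channel' bit:
    pick the coset member closest to Y."""
    return min(cosets[index], key=lambda level: abs(level - y))

idx = encode(6)                # X quantizes to 6 -> coset {2, 6}, index 1
print(idx, decode(idx, 5.7))   # decoder recovers 6 because Y is near 6
```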
Time-Sharing
[Figure: the Slepian-Wolf rate region in the (R_x, R_y) plane, with corner points (H(X), H(Y|X)) and (H(X|Y), H(Y)); separate encoders at (H(X), H(Y))]
l Most typical distributed coding schemes operate on the corners of the achievability region
l But you can operate anywhere on the curve
l Easiest way is via time-sharing
− i.e., X sends full data, Y sends partial data for some portion of time, and then they reverse roles for some portion of time
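Time-sharing is just a convex combination of the two corner points; a small sketch (numbers reuse the earlier two-sensor example, with H(X) = H(Y) = 8 and H(X|Y) = H(Y|X) = 1):

```python
def time_share(corner_a, corner_b, lam):
    """Rate pair achieved by using corner_a a fraction lam of the time."""
    return tuple(lam * a + (1 - lam) * b for a, b in zip(corner_a, corner_b))

corner_a = (8.0, 1.0)   # X sends full data, Y sends conditional data
corner_b = (1.0, 8.0)   # roles reversed
rx, ry = time_share(corner_a, corner_b, 0.5)
print(rx, ry, rx + ry)  # (4.5, 4.5); the sum rate stays at H(X,Y) = 9
```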
Critescu et al’s Work
l Compare approaches for correlated data gathering in tree networks
− S-W model
l Coding complex, Transmission Optimization simple
− Joint Entropy Coding model
l No S-W, encoding occurs after others’ data is explicitly known
l Coding simple, Transmission Optimization complex
l How does the choice affect rate allocation?
Critescu et al’s Work
l Main result:
− S-W Coding - allocate high rates to nodes close to the sink
l Since they have to route over fewer hops
l Liu et al in Mobicom 2006 paper came up with similar results
− Joint Entropy Coding - allocate higher rates to the furthest nodes and lower rates to nearby nodes
l Since they can use others as side information
Critescu et al’s Work
l SW scheme
− Optimize the spanning tree and then use an LP for rate allocation
− The LP solution has a closed form:
  R_1 = H(X_1), \quad R_2 = H(X_2 | X_1), \quad \dots, \quad R_N = H(X_N | X_{N-1}, \dots, X_2, X_1)
l Approximation:
− Each node finds the set C_i of neighborhood nodes that are closer on the SPT
− Transmit at a rate R_i = H(X_i | C_i)
Critescu et al’s Work
l Clustered SW scheme
− Since S-W for many nodes is very complex, we can do it in clusters
− How do we choose clusters such that
− NP-complete problem for all but a few degenerate cases
{ }{ }
= ∏
==
C
iI
ICi iC
ii
KI1,
* detminarg1
Critescu et al’s Work
l Joint Entropy Coding scheme
− Problem is NP-complete
− Authors provide SPT-based heuristics
Critescu et al’s Work
Sensor Correlation
l A lot of the solutions for limits of distributed source coding assume the specific cases of
− Binary sources
l Correlation modeled as a BSC
l Distortion measured as Hamming distance
− Jointly Gaussian sources
l Distortion measured as MSE
Sensor Correlation
l How should we model correlation of real world sensor data?
− We could use a training phase where the correlation is learned over time
− We can assume that nearby sensors have higher correlation
l True in many real-world applications
Sensor Correlation
• Pattem et al's approach
− If X_1 and X_2 are separated by a distance d(X_1, X_2):
  H(X_2 | X_1) = \left( 1 - \frac{1}{d(X_1, X_2)/c + 1} \right) H(X_1)
− Fit this model to empirical rainfall data
Sensor Correlation
l Each additional sensor adds an additional
  \left( 1 - \frac{1}{d(X_1, X_2)/c + 1} \right) H(X_1)
bits of information
l If all sensors are equally spaced by d, the total information is about
  H_{total} = H(X_1) + (n - 1) \left( 1 - \frac{1}{d/c + 1} \right) H(X_1)
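The correlation model above yields simple closed forms; a sketch with illustrative numbers of my choosing (8-bit readings, spacing equal to the correlation parameter c):

```python
def cond_entropy(d, c, H1):
    """H(X2|X1) = (1 - 1/(d/c + 1)) * H(X1) under Pattem et al.'s model."""
    return (1 - 1 / (d / c + 1)) * H1

def total_entropy(n, d, c, H1):
    """n equally spaced sensors: H(X1) + (n-1) * H(X2|X1)."""
    return H1 + (n - 1) * cond_entropy(d, c, H1)

# With d = c, each extra sensor contributes half of a full reading's entropy
print(cond_entropy(1.0, 1.0, 8.0))      # 4.0 bits
print(total_entropy(5, 1.0, 1.0, 8.0))  # 8 + 4*4 = 24 bits
```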
Sensor Correlation
l Cristescu et al assume a Gaussian Markov field:
  f(X) = \frac{1}{(2\pi)^{N/2} \det(K)^{1/2}}\, e^{-\frac{1}{2}(X-\mu)^T K^{-1} (X-\mu)}, \quad K_{ij} = \sigma^2 e^{-c\, d_{ij}}
  H(X) = \frac{1}{2} \log \left( (2\pi e)^N \det K \right)
  H(Y_C \mid Y_{C^c}) = \frac{1}{2} \log \left( (2\pi e)^{|C|} \frac{\det K}{\det K_{C^c}} \right)
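For two sensors the determinant is available in closed form, so the Gaussian entropy can be sketched without linear-algebra libraries (function name and parameter values are mine):

```python
from math import log2, exp, pi, e

def gmf_entropy_2(sigma2, c, d):
    """Differential entropy (bits) of two jointly Gaussian sensors with
    covariance K = sigma2 * [[1, r], [r, 1]], r = exp(-c*d):
    H = 0.5 * log((2*pi*e)^2 * det K), det K = sigma2^2 * (1 - r^2)."""
    r = exp(-c * d)
    det_K = sigma2**2 * (1 - r**2)
    return 0.5 * log2((2 * pi * e) ** 2 * det_K)

# As the sensors move apart (d grows), correlation decays and the joint
# entropy grows toward that of two independent sensors
print(gmf_entropy_2(1.0, 1.0, 0.1) < gmf_entropy_2(1.0, 1.0, 10.0))  # True
```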
Sensor Correlation
l Liu et al use 3 models in their analysis:
− Hard Continuity Field: |X_1 - X_2| \le d
  l In general, you could use |X_1 - X_2| \le f(d)
− Linear Covariance Continuity Field: E\left[ (X_1 - X_2)^2 \right] \le d^2
  l In general, you could use E\left[ (X_1 - X_2)^2 \right] \le f(d)
− Gaussian Markov Field: \sigma_{1,2} = \sigma^2 e^{-cd}
Ideas For Work In Distributed Source Coding
Ideas For Work In Distributed Source Coding
l There has been a lot of work in this field, as applied to WSNs, especially in the last 5 or so years
l Some research has looked at minimizing cost of gathering data in WSN
− What about maximizing network lifetime, a la DAPR, MiLAN, etc?
− What about jointly optimizing transmission ranges with network topology and rate allocation?
Ideas For Work In Distributed Source Coding
l When calculating cost of data gathering, most look at fundamental limits, assuming long block lengths, no overhead
− What happens when latency is important and we cannot encode long blocks?
l DISCUS example shows that significant gains can still be made
− What happens when packet overhead can’t be ignored?
l e.g., when the cost of sending data at 1 bit/sample isn't much different from sending at 10 bits/sample
References
References
l Slepian and Wolf, “Noiseless Coding of Correlated Information Sources,” IEEE Transactions on Information Theory, July 1973.
l Wyner, “Recent results in the Shannon theory,” IEEE Transactions on Information Theory, Jan. 1974.
l Wyner, “On Source Coding with Side Information at the Decoder,” IEEE Transactions on Information Theory, May 1975.
l Wyner and Ziv, “The Rate-Distortion Function for Source Coding with Side Information at the Decoder,” IEEE Transactions on Information Theory, Jan 1976.
l Cover and Thomas, “Elements of Information Theory,” 1991.
References
l Berger, Zhang, and Viswanathan, “The CEO Problem,” IEEE Transactions on Information Theory, May 1996.
l Viswanathan and Berger, “The Quadratic Gaussian CEO Problem,” IEEE Transactions on Information Theory, Sep. 1997.
l Oohama, “The Rate-Distortion Function for the Quadratic Gaussian CEO Problem,” IEEE Transactions on Information Theory, May 1998.
l Gastpar, “The Wyner-Ziv Problem With Multiple Sources,” IEEE Transactions on Information Theory, Nov. 2004.
References
l Luo and Pottie, "Balanced Aggregation Trees for Routing Correlated Data in Wireless Sensor Networks," ISIT 2005.
l Xiong, Liveris, and Cheng, “Distributed Source Coding for Sensor Networks,” INFOCOM, 2003.
l Draper and Wornell, “Side Information Aware Coding Strategies for Sensor Networks,” IEEE JSAC, Aug. 2004.
l Pradhan and Ramchandran, "Distributed Source Coding Using Syndromes (DISCUS): Design and Construction," IEEE Transactions on Information Theory, Feb. 2003.
References
l Cristescu, Beferull-Lozano, and Vetterli, “On Network Correlated Data Gathering,” INFOCOM 2004.
l Cristescu, Beferull-Lozano, Vetterli, and Wattenhofer, "Network Correlated Data Gathering With Explicit Communication: NP-Completeness and Algorithms," IEEE/ACM Transactions on Networking, Feb. 2006.
l Liu, Adler, Towsley, and Zhang, “On Optimal Communication Cost for Gathering Correlated Data Through Wireless Sensor Networks”, MobiCom, 2006.
l Pattem, Krishnamachari, and Govindan, "The Impact of Spatial Correlation on Routing with Compression in Wireless Sensor Networks," IPSN, 2004.