TRANSCRIPT
Distributed Source Coding for Wireless Sensor Networks
Mark Perillo, March 14, 2007
Overview
l Review of Information Theory Principles
l Distributed Source Coding Principles
− Lossless
− Lossy
− The CEO Problem
l Practical Distributed Source Coding Schemes
l Ideas For Work In Distributed Source Coding
Review of Information Theory Principles
Review of Information Theory Principles
l H(X): Entropy - a measure of the information contained in a random variable
  H(X) = -\sum_i p_i \log p_i
− For Bernoulli random variable X
l p = 0.5, q = 0.5 → H(X) = H(p) = 1 bit
l p = 0.1, q = 0.9 → H(X) = H(p) = 0.47 bit
− For uniform random variable Y \in \{1, 2, 3, \dots, N\}, H(Y) = \log N
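The entropy values above are easy to check numerically; a minimal Python sketch (function names are mine, not from the slides):

```python
from math import log2

def bernoulli_entropy(p):
    """H(p) = -p log p - (1-p) log(1-p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def uniform_entropy(n):
    """Entropy of a uniform RV on {1, ..., N} is log N bits."""
    return log2(n)

print(bernoulli_entropy(0.5))            # 1.0 bit
print(round(bernoulli_entropy(0.1), 2))  # 0.47 bit, matching the slide
print(uniform_entropy(8))                # 3.0 bits
```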
Review of Information Theory Principles
l H(X|Y): Conditional entropy - a measure of the information remaining in X once Y is known
  H(X|Y) = -\sum_x \sum_y p(x,y) \log p(x|y)
− For example
l Uniform random variable X → H(X) = \log N
l But if we know Y, and X = Y or X = Y+1 with equal probability, then H(X|Y) = 1 bit
Review of Information Theory Principles
l H(X,Y): Joint entropy - a measure of the total information in X and Y
  H(X,Y) = -\sum_x \sum_y p(x,y) \log p(x,y)
l I(X;Y): Mutual information - a measure of the amount of information shared by two random variables X and Y
  I(X;Y) = \sum_x \sum_y p(x,y) \log \frac{p(x,y)}{p(x)\, p(y)}
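These definitions can be computed directly from a joint distribution table; a short sketch (the example distribution is mine):

```python
from math import log2

def joint_entropy(pxy):
    """H(X,Y) = -sum over (x,y) of p(x,y) log p(x,y)."""
    return -sum(p * log2(p) for row in pxy for p in row if p > 0)

def mutual_information(pxy):
    """I(X;Y) = sum p(x,y) log [p(x,y) / (p(x) p(y))]."""
    px = [sum(row) for row in pxy]              # marginal of X
    py = [sum(col) for col in zip(*pxy)]        # marginal of Y
    return sum(p * log2(p / (px[i] * py[j]))
               for i, row in enumerate(pxy)
               for j, p in enumerate(row) if p > 0)

# X = Y over two equally likely values: fully dependent, so I(X;Y) = H(X) = 1 bit
pxy = [[0.5, 0.0], [0.0, 0.5]]
print(joint_entropy(pxy), mutual_information(pxy))   # 1.0 1.0
```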
Review of Information Theory Principles
[Venn diagram: H(X) and H(Y) overlap in I(X;Y); the exclusive regions are H(X|Y) and H(Y|X); the union is H(X,Y)]
Review of Information Theory Principles
l Source coding
− A large block of n copies of an i.i.d. RV X can be compressed into nH(X) bits
− Based on the theory of typical sets and the Asymptotic Equipartition Property (AEP)
Asymptotic Equipartition Property
l X1, X2, X3…Xn are i.i.d. random variables
l It is very likely that
  -\frac{1}{n} \log p(X_1, X_2, \dots, X_n) \approx H(X), \quad \text{i.e.,} \quad p(X_1, X_2, \dots, X_n) \approx 2^{-nH(X)}
l This is a direct result of the weak law of large numbers
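The weak-law argument above can be seen empirically; a small simulation (parameters and seed are mine):

```python
import random
from math import log2

random.seed(0)
p, n = 0.3, 10000
H = -p * log2(p) - (1 - p) * log2(1 - p)   # entropy of a Bernoulli(0.3) symbol

# Draw one long i.i.d. sequence and compute -(1/n) log p(sequence)
seq = [1 if random.random() < p else 0 for _ in range(n)]
neg_log_prob = -sum(log2(p) if s == 1 else log2(1 - p) for s in seq) / n

# By the weak law of large numbers the two values should be close,
# i.e., the drawn sequence has probability about 2^{-nH(X)}
print(H, neg_log_prob)
```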
Asymptotic Equipartition Property
l If we have a large sequence of random variables, then it is very likely that the drawn sequence will have joint probability about equal to 2^{-nH(X)}
l There are a total of about 2^{nH(X)} of these "typical" sequences in the typical set, where nearly all of the probability is concentrated
(x_1, x_2, \dots, x_n) \in A_\epsilon^{(n)} \iff 2^{-n(H(X)+\epsilon)} \le p(x_1, x_2, \dots, x_n) \le 2^{-n(H(X)-\epsilon)}
Asymptotic Equipartition Property
l If we do a good job compressing the “typical” sequences, the overall quality of the job we do will be good
− Even if we do a poor job compressing the “atypical” sequences
l In fact, we can compress a block of X with length n into nH(X) bits
l Similarly, a block of (X,Y) with length n can be compressed into nH(X,Y) bits
Jointly Typical Sequences
A_\epsilon^{(n)} = \left\{ (x^n, y^n) : \left| -\frac{1}{n} \log p(x^n) - H(X) \right| < \epsilon, \; \left| -\frac{1}{n} \log p(y^n) - H(Y) \right| < \epsilon, \; \left| -\frac{1}{n} \log p(x^n, y^n) - H(X,Y) \right| < \epsilon \right\}
Rate Distortion Theory
l The source coding theorem states that in order to reconstruct a discrete input sequence with no loss of information (i.e., no distortion), we must encode at a rate of H(X)
l But what if some distortion is allowable?
l And what if we want to describe a continuous random variable with a discrete alphabet?
l What rates are required then?
Rate Distortion Theory
l The answer to these questions comes in the form of the rate distortion function R(D)
l Typical distortion measures:
− Hamming distance (discrete sources)
− Mean squared error (continuous sources)
R(D) = \min_{p(\hat{x}|x) : \sum_{x,\hat{x}} p(x)\, p(\hat{x}|x)\, d(x,\hat{x}) \le D} I(X; \hat{X})
Rate Distortion Theory
l Examples
− Bernoulli source (parameter p) with Hamming distortion measure:
  R(D) = H(p) - H(D) \text{ for } 0 \le D \le p; \quad R(D) = 0 \text{ for } D > p
− Gaussian source (variance \sigma^2) with MSE distortion measure:
  R(D) = \frac{1}{2} \log \frac{\sigma^2}{D} \text{ for } 0 \le D \le \sigma^2; \quad R(D) = 0 \text{ for } D > \sigma^2
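Both rate-distortion functions are easy to implement; a minimal sketch (function names are mine):

```python
from math import log2

def rd_bernoulli(p, D):
    """R(D) for a Bernoulli(p) source with Hamming distortion (p <= 1/2)."""
    if D > p:
        return 0.0
    Hb = lambda q: 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)
    return Hb(p) - Hb(D)

def rd_gaussian(var, D):
    """R(D) for a Gaussian source with variance var and MSE distortion."""
    return 0.5 * log2(var / D) if D <= var else 0.0

print(rd_bernoulli(0.5, 0.0))  # 1.0 bit: lossless rate equals H(0.5)
print(rd_gaussian(4.0, 1.0))   # 1.0 bit buys a 4x reduction in MSE
```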
Distributed Source Coding Principles
Distributed Source Coding Example
l Two temperature sensors with 8-bit ADCs
l Entropy H(X) = H(Y) = 8 bits
l But Y = X + N, N={0,1} with equal probability
l H(Y|X) = 1 bit
l Collocated encoder:
− R = H(X,Y) = H(X) + H(Y|X) = 8 + 1 = 9 bits
l Separate encoders with no awareness of each other:
− R = H(X) + H(Y) = 8 + 8 = 16 bits
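The chain-rule arithmetic behind the two rates can be written out explicitly:

```python
# Two-sensor example: X is a uniform 8-bit reading, Y = X + N with N in {0,1}
# equally likely, so H(Y|X) = H(N) = 1 bit.
H_X = 8.0
H_Y_given_X = 1.0
H_XY = H_X + H_Y_given_X      # chain rule: H(X,Y) = H(X) + H(Y|X)
separate_naive = H_X + 8.0    # encoders unaware of each other send H(X) + H(Y)

print(H_XY, separate_naive)   # 9.0 vs. 16.0 bits
```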
Lossless Distributed Source Coding
[Figure: X and Y enter a single joint encoder; the decoder recovers both; R = H(X,Y)]
Lossless Distributed Source Coding
[Figure: X and Y encoded separately at rates H(X) and H(Y); R = H(X) + H(Y)]
Lossless Distributed Source Coding
[Figure: X encoded at rate H(X); Y encoded separately at rate H(Y|X); R = H(X) + H(Y|X) = H(X,Y)]
Can we do this? Slepian and Wolf proved that you can indeed
Slepian-Wolf Coding
l Can we encode data sequences separately with no rate increase? YES!!!
l Proof is achieved via analysis of an encoding strategy using “binning”
Slepian-Wolf Coding
l Choose rates Rx and Ry to meet Slepian-Wolf requirements
l Assign every sequence x^n to a random bin in \{1, 2, \dots, 2^{nR_X}\} and every y^n to a random bin in \{1, 2, \dots, 2^{nR_Y}\}
l Transmit the bin indices to the decoder
l Look for a jointly typical pair (x^n, y^n) whose sequences match the received bin indices
R_X \ge H(X|Y), \quad R_Y \ge H(Y|X), \quad R_X + R_Y \ge H(X,Y)
Slepian-Wolf Coding
[Figure: every sequence x^n = \{0,1,0,\dots\} is assigned to one of the 2^{nR_X} bins]
Slepian-Wolf Coding
[Figure: product-bin grid of 2^{nR_X} x-bins by 2^{nR_Y} y-bins, containing the ~2^{nH(X,Y)} jointly typical (x^n, y^n) sequences; more than one jointly typical pair in a product bin will result in decoding errors]
Slepian-Wolf Coding
l When do we get decoding errors?
− When the X and Y sequences are not jointly typical
l Rare from Joint AEP
− When more than one X sequence in the same bin is jointly typical with Y (or vice versa)
l Rare if Rx > H(X|Y) (and Ry > H(Y|X))
− When more than one jointly typical (X,Y) sequence lies in a single product bin
l Rare if Rx + Ry > H(X,Y)
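The binning strategy and its rare failure modes can be simulated on a toy scale; a sketch with block length and rates chosen by me (the correlation model is "y differs from x in at most one bit"):

```python
import random
from itertools import product

random.seed(1)
n, rx_bits = 10, 8            # block length 10, 8-bit bin index (rate 0.8)
num_bins = 2 ** rx_bits

# Random binning: assign every length-n binary sequence to a random bin
bins = {x: random.randrange(num_bins) for x in product((0, 1), repeat=n)}

def decode(bin_index, y, t=1):
    """Return the unique sequence in the bin within Hamming distance t of y."""
    candidates = [x for x, b in bins.items()
                  if b == bin_index and sum(a != c for a, c in zip(x, y)) <= t]
    return candidates[0] if len(candidates) == 1 else None

trials, successes = 200, 0
for _ in range(trials):
    x = tuple(random.randrange(2) for _ in range(n))
    y = list(x)
    flip = random.randrange(n + 1)   # flip one position, or none
    if flip < n:
        y[flip] ^= 1
    if decode(bins[x], tuple(y)) == x:
        successes += 1

# Decoding fails only when another sequence close to y shares x's bin;
# with these parameters that is rare, so the rate stays near 1
print(successes / trials)
```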
Slepian-Wolf Coding
[Figure: achievable rate region in the (R_X, R_Y) plane, bounded by R_X \ge H(X|Y), R_Y \ge H(Y|X), and R_X + R_Y \ge H(X,Y); corner points (H(X), H(Y|X)) and (H(X|Y), H(Y)); separate encoders operate at (H(X), H(Y))]
Slepian-Wolf Coding
l With two variables, we can achieve considerable savings in the amount of traffic required
l In general, for n sensors, the Slepian-Wolf requirement is as follows
R(S) > H(X(S) \mid X(S^c)) \quad \forall S \subseteq \{1, 2, \dots, n\}
\text{where } R(S) = \sum_{i \in S} R_i \text{ and } X(S) = \{X_j : j \in S\}
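The subset condition above translates directly into a membership check for a candidate rate vector; a sketch (the helper name and the example entropies are mine, reusing the earlier two-sensor numbers):

```python
from itertools import combinations

def in_slepian_wolf_region(rates, cond_entropy):
    """Check R(S) > H(X(S) | X(S^c)) for every nonempty subset S.

    cond_entropy(S) must return H(X(S) | X(S^c)) for a tuple of indices S.
    """
    m = len(rates)
    for k in range(1, m + 1):
        for S in combinations(range(m), k):
            if sum(rates[i] for i in S) <= cond_entropy(S):
                return False
    return True

# Two-sensor example: H(X|Y) ~ 1, H(Y|X) = 1, H(X,Y) = 9
H = {(0,): 1.0, (1,): 1.0, (0, 1): 9.0}
print(in_slepian_wolf_region([8.0, 1.5], lambda S: H[S]))  # True
print(in_slepian_wolf_region([8.0, 0.5], lambda S: H[S]))  # False: Ry <= H(Y|X)
```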
Source Coding w/ Side Information
l Look at a special case when ONLY X needs to be recovered at the decoder
[Figure: X's encoder sends at rate R_x and Y's encoder at rate R_y; the decoder outputs only \hat{X}]
Source Coding w/ Side Information
l For U such that p(x, y, u) = p(x, y)\, p(u|y) (i.e., X \to Y \to U forms a Markov chain)
l Then we can send at rates
  R_x \ge H(X|U), \quad R_y \ge I(Y;U)
Source Coding w/ Side Information
l One extreme: U = Y
  R_x \ge H(X|U) = H(X|Y), \quad R_y \ge I(Y;U) = I(Y;Y) = H(Y)
l Another extreme: U is uncorrelated with X, Y
  R_x \ge H(X|U) = H(X), \quad R_y \ge I(Y;U) = 0
l The results from Wyner basically say that we can operate anywhere between these two extremes
R-D w/ Side Information
l If we want to reproduce X with some distortion D, what rate must we send at, given that we have access to side information Y?
[Figure: X encoded at rate R_Y^*(D); the decoder has side information Y and outputs \hat{X} with E[d(X, \hat{X})] \le D]
R-D w/ Side Information
l If X, Y are uncorrelated, this is just the rate distortion function R(D)
l Since side information can only help, R_Y^*(D) \le R(D)
l If no distortion is allowed, then R_Y^*(0) = H(X|Y), as Slepian and Wolf showed
R-D w/ Side Information
l In general,
  R_Y^*(D) = \min_{p(w|x)} \min_f \left( I(X;W) - I(Y;W) \right)
  \text{subject to } \sum_x \sum_w \sum_y p(x,y)\, p(w|x)\, d(x, f(y,w)) \le D
l f is the reconstruction function
l w is the encoded version of x
R-D w/ Side Information
l Remember: in lossless (S-W coding), we pay no penalty for separation of X and Y
l Unfortunately, this is not the case in lossy source coding
l In general, R_Y^*(D) \ge R_{X|Y}(D), where R_{X|Y}(D) is the rate required when X's encoder has access to Y
l However, equality is achieved when (X,Y) are jointly Gaussian
Distributed Data Compression
l The rate-distortion region for this general problem is unknown
[Figure: X and Y encoded separately at rates R_X and R_Y; the decoder outputs (\hat{X}, \hat{Y})]
The CEO Problem
The CEO Problem
[Figure: a source X ~ p(X) observed by L agents through noisy channels W(y_1|x), W(y_2|x), \dots, W(y_L|x); each agent encodes separately and the decoder outputs \hat{X}]
The CEO Problem
l If the CEO's L agents were able to convene, they could smooth the data (i.e., average out the noise) to obtain the true value of X and then send data at the rate R(D) required to keep distortion below D
l But what happens if they cannot convene?
− Assume a sum rate requirement \sum_i R_i \le R
The CEO Problem
l Berger et al. originally found the limits of this problem for a Hamming distortion measure
l For R > H(X), P_e(R) does not go to 0
l In fact, as R gets large, P_e(R) = 2^{-\alpha(p,W) R}, where \alpha(p,W) is a constant for a given source distribution and joint source/observation distribution
l In other words, there is always a penalty for not being able to convene
The Quadratic Gaussian CEO Problem
[Figure: X observed through additive noises N_1, N_2, \dots, N_L; each noisy observation is encoded separately and the decoder outputs \hat{X}]
The Quadratic Gaussian CEO Problem
l Viswanathan and Berger found the limits of this problem as well
l As R and L get very large, what happens to D as a function of the sum rate R?
The Quadratic Gaussian CEO Problem
l Result:
  \lim_{L \to \infty} \lim_{R \to \infty} R \cdot D(L, R) = \beta(\sigma_X^2, \sigma_N^2)\, \sigma_X^2
  where \beta is a constant obtained from an infimum of I(Y; U | X) over test channels Q(u|y)
l Oohama proved that for Gaussian X, N, these bounds are tight
The Quadratic Gaussian CEO Problem
l This means that distortion decays as D = \frac{\sigma_N^2}{2R}
l Compare with the case when agents can all convene: D = \sigma_X^2\, 2^{-2R}
l Agents can smooth out the data and get rid of the noise
l So again, a penalty is paid for not being able to convene
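The gap between the two decay laws is striking even for small rates; a quick numeric comparison (the variance values are mine, for illustration):

```python
# Distortion decay: CEO setting (agents cannot convene) vs. centralized
sigma_N2 = 1.0   # observation-noise variance
sigma_X2 = 1.0   # source variance

def d_ceo(R):
    return sigma_N2 / (2 * R)          # decays only as 1/R

def d_convene(R):
    return sigma_X2 * 2 ** (-2 * R)    # decays exponentially in R

for R in (1, 5, 10):
    print(R, d_ceo(R), d_convene(R))   # the exponential term wins quickly
```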
The Quadratic Gaussian CEO Problem
• How do we achieve this?
[Figure: each agent applies block coding followed by Slepian-Wolf coding; the decoder performs SW decoding, then block decoding, then estimation, achieving D = \frac{\sigma_N^2}{2R}]
Practical Distributed Source Coding Schemes
Draper et al’s Work
l Side information aware limit:
− For a given rate constraint R:
  D \ge \min_{u, f} E\left[ d(x, f(y_D, u)) \right]
  over auxiliary variables u with u \to y_E \to (x, y_N, y_D) and reconstruction functions f, subject to R > I(y_E; u) - I(y_D; u)
  (y_E, y_N, y_D are the encoder's, network's, and decoder's observations)
Draper et al’s Work
l How to do this:
− Construct a code of ~2^{nI(y_E; u)} typical codewords
− Bin each of these into one of 2^{nR} bins
l ~2^{n(I(y_E; u) - R)} codewords per bin
− Block encode and transmit the codeword's bin (coset index)
− Decoder chooses the codeword in the coset that is jointly typical with its observation
Draper et al’s Work
l Side information aware limit (across a network cut):
− For a given rate constraint R_{cut} across a cut (A, A^c):
  D \ge \min_{u, f} E\left[ d(x, f(y_{A^c}, u)) \right]
  over u with u \to y_A \to (x, y_{A^c}) and reconstruction functions f, subject to R_{cut} > I(y_A; u) - I(y_{A^c}; u)
− One of these constraints exists for every possible cut
Draper et al’s Work
l Quadratic Gaussian case:
  R(d) = \frac{1}{2} \log \frac{\sigma^2_{x|y_D} - \sigma^2_{x|y_D, y_E}}{d - \sigma^2_{x|y_D, y_E}}
  d(R) = \sigma^2_{x|y_D, y_E} + \left( \sigma^2_{x|y_D} - \sigma^2_{x|y_D, y_E} \right) 2^{-2R}
  \sigma^2_{x|y_D}: minimum distortion given the decoder's observation
  \sigma^2_{x|y_D, y_E}: minimum distortion given the encoder's and decoder's observations
Draper et al’s Work
l Serial Network
[Figure and equation: relays y_1, \dots, y_L in series; a recursive bound on the achievable distortion d_l at hop l, starting from d_1 = \sigma^2_{x|y_1}]
Draper et al’s Work
l Parallel Network
[Figure and equation: agents y_1, \dots, y_L in parallel; a recursive bound on the achievable distortion after combining l observations, starting from d_0 = \sigma^2_x]
Draper et al’s Work
l We can apply the serial and parallel results to get the achievable distortion of a general sensor network tree
Draper et al’s Work
• Serial Network
• Source variance = 4
• Noise variance = 4/3
• R = 2.5 bits
Source Coding Using Syndromes
l Originally introduced by Wyner in 1974
l Used by many in the literature recently, notably DISCUS by Pradhan et al.
l Exploits duality between Slepian-Wolf coding and channel coding
DISCUS Example
l X \in \{0,1\}^n and Y \in \{0,1\}^n, n = 3
l X and Y are correlated such that they differ in at most 1 bit
l Thus, given Y, X can take 4 values:
l Same as Y, or differing in the 1st, 2nd, or 3rd bit
l H(X|Y) = 2 bits
− If we send all of Y, this is the S-W limit
DISCUS Example
l Divide codeword space into 4 cosets
− 0: {000,111}, 1: {001,110}, 2: {010,101}, 3: {011,100}
− Send only the coset index (4 cosets = 2 bits)
− This meets the S-W bound
l Given Y, only one of the members of the coset can have a Hamming distance of 0 or 1
l The coset index is just the syndrome of x using the parity-check matrix H:
  s = Hx, \quad H = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}
DISCUS Example
l Decoder knows that choices for X are [010] and [101]
l Since Y is [110], it has to be [010] since [101] has a Hamming distance of 2
s = Hx^T = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}
i.e., the coset index is 2
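The full 3-bit example can be run end to end; a sketch using the parity-check matrix from the slide (helper names are mine):

```python
from itertools import product

# 2x3 parity-check matrix of the (3,1) repetition code
H = [(1, 0, 1),
     (0, 1, 1)]

def syndrome(v):
    """s = Hv over GF(2)."""
    return tuple(sum(h * b for h, b in zip(row, v)) % 2 for row in H)

def coset(s):
    """All length-3 words whose syndrome is s."""
    return [v for v in product((0, 1), repeat=3) if syndrome(v) == s]

def decode(s, y):
    """Pick the coset member closest in Hamming distance to side info y."""
    return min(coset(s), key=lambda v: sum(a != b for a, b in zip(v, y)))

x, y = (0, 1, 0), (1, 1, 0)
s = syndrome(x)        # only these 2 bits are transmitted (the S-W limit)
print(coset(s))        # the coset of x: [(0,1,0), (1,0,1)]
print(decode(s, y))    # (0,1,0): recovered from s and the side information y
```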
Source Coding Using Syndromes
l Let's say we have correlated X and Y
l X = Y + U
− U is a Bernoulli RV with p < 0.5
l Y sends full information
− RY = H(Y)
l X sends partial information, hopefully at the Slepian-Wolf limit
Source Coding Using Syndromes
l Let's generate a parity-check matrix H (m × n)
l X’s encoder generates n samples and then calculates the syndrome sx = HX
− There is no guarantee that X is a valid codeword (i.e., in the null space of H), so sX can take any value
l X’s encoder sends sx to the decoder
Source Coding Using Syndromes
l Decoder generates syndrome of Y
− sY = HY
l Decoder calculates s_X + s_Y = s_{X+Y} = s_U
l If X and Y are highly correlated, we can expect that there aren’t too many differences
− i.e., U should be mostly zeros
l Decoder can now approximate U (= f(sU) ) and therefore, approximate X
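The steps above can be sketched with a concrete code; the (7,4) Hamming code is my choice here (the slides only require "a parity-check matrix H"). Its columns are the binary representations of 1..7, so a nonzero syndrome of U directly names the position where X and Y differ:

```python
# Parity-check matrix of the (7,4) Hamming code; column j holds binary(j+1)
H = [(0, 0, 0, 1, 1, 1, 1),
     (0, 1, 1, 0, 0, 1, 1),
     (1, 0, 1, 0, 1, 0, 1)]

def syndrome(v):
    """s = Hv over GF(2)."""
    return tuple(sum(h * b for h, b in zip(row, v)) % 2 for row in H)

def sw_decode(s_x, y):
    """Recover X from its 3-bit syndrome and side information Y
    (valid when X and Y differ in at most one position)."""
    s_y = syndrome(y)
    s_u = tuple(a ^ b for a, b in zip(s_x, s_y))   # syndrome of U = X xor Y
    if s_u == (0, 0, 0):
        return tuple(y)                            # U is all zeros: X = Y
    pos = s_u[0] * 4 + s_u[1] * 2 + s_u[2] - 1     # position of the single flip
    x = list(y)
    x[pos] ^= 1
    return tuple(x)

x = (1, 0, 1, 1, 0, 0, 1)
y = (1, 0, 1, 0, 0, 0, 1)          # differs from x only in position 3
print(sw_decode(syndrome(x), y))   # x recovered from 3 bits instead of 7
```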
Source Coding Using Syndromes
l Alternative way to do this (from the DISCUS paper):
− s = Hx
− y' = y + [0|s]
− Find x', the closest codeword to y' in the coset with s = [00…0]
l i.e., the closest valid codeword in the null space of H
− x = x' + [0|s]
Source Coding Using Syndromes
l In general, if we have an (n, k, 2t+1) code,
− R = n - k
− The S-W bound is the log of the number of possible outcomes for X given Y, assuming each is equally likely:
  R_X^{SW} = \log \sum_{i=0}^{t} \binom{n}{i}
− In the previous example, n = 3, k = 1, t = 1:
  R_X^{SW} = \log(1 + 3) = 2, \quad R = n - k = 2
Source Coding Using Syndromes
l An (n, k, 2t+1) code can be used if X and Y differ in at most t positions
l Let's look at an X, Y correlation structure that can be modeled as a BSC with crossover probability p
l The probability of a correct decode is the probability that the number of crossovers is at most t:
  P = \sum_{i=0}^{t} \binom{n}{i} p^i (1-p)^{n-i}
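This binomial tail is one line of Python; a sketch using the earlier (3,1,3) repetition-code example:

```python
from math import comb

def p_correct(n, t, p):
    """Probability that at most t of the n positions cross over on a BSC(p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(t + 1))

# (3,1,3) code corrects t = 1 difference; decode fails only if 2+ bits differ
print(p_correct(3, 1, 0.1))   # 0.972
```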
Source Coding Using Syndromes
l The benefit of using good channel codes should be apparent now
l There has been a lot of work recently applying state of the art codes like LDPC and Turbo codes to the S-W problem
l Also, some have used convolutional codes in similar ways
DISCUS with Continuous Sources
[Figure: encoder quantizes X to W; W and Y are correlated; the decoder recovers X' from the coset index and Y]
l A fictitious channel P(Y|W) exists
l W is "sent" to Y at a rate of I(W;Y)
l This means fewer bits actually need to be sent
l The required rate is now H(W|Y) = H(W) - I(W;Y)
DISCUS with Continuous Sources
l Let's say that we set the fictitious rate at 1 bit
l This means we need to send two bits
l Let's make our cosets {1,5}, {2,6}, {3,7}, {4,8}
l Sample X and then send the coset index
[Figure: quantizer levels 1 2 3 4 5 6 7 8]
DISCUS with Continuous Sources
l X quantizes to a value of 6
l X sends coset index 1 → {2,6}
l Decoder knows X has been quantized to 2 or 6
l Remember that the fictitious channel sent the other bit
l This means that, due to knowledge of Y and the correlation structure, the decoder can disambiguate between the 2 possible values (1 bit)
[Figure: X and Y are correlated; W is X's quantized value]
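The quantized-coset scheme above fits in a few lines; a toy sketch using the slide's levels and cosets (function names are mine):

```python
levels = [1, 2, 3, 4, 5, 6, 7, 8]
cosets = [{1, 5}, {2, 6}, {3, 7}, {4, 8}]   # 4 cosets -> 2-bit index

def encode(x_level):
    """Send only the coset index of the quantized sample."""
    return next(i for i, c in enumerate(cosets) if x_level in c)

def decode(index, y):
    """Side information Y supplies the 'fictitious channel' bit:
    pick the coset member closest to Y."""
    return min(cosets[index], key=lambda level: abs(level - y))

idx = encode(6)                # X quantizes to 6 -> coset {2, 6}, index 1
print(idx, decode(idx, 5.7))   # decoder recovers 6 because Y is near 6
```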
Time-Sharing
[Figure: the Slepian-Wolf rate region in the (R_x, R_y) plane, with corner points (H(X), H(Y|X)) and (H(X|Y), H(Y)); separate encoders at (H(X), H(Y))]
l Most typical distributed coding schemes operate on the corners of the achievability region
l But you can operate anywhere on the curve
l Easiest way is via time-sharing
− i.e., X sends full data, Y sends partial data for some portion of time, and then they reverse roles for some portion of time
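Time-sharing is just a convex combination of the two corner points; a small sketch (numbers reuse the earlier two-sensor example, with H(X) = H(Y) = 8 and H(X|Y) = H(Y|X) = 1):

```python
def time_share(corner_a, corner_b, lam):
    """Rate pair achieved by using corner_a a fraction lam of the time."""
    return tuple(lam * a + (1 - lam) * b for a, b in zip(corner_a, corner_b))

corner_a = (8.0, 1.0)   # X sends full data, Y sends conditional data
corner_b = (1.0, 8.0)   # roles reversed
rx, ry = time_share(corner_a, corner_b, 0.5)
print(rx, ry, rx + ry)  # (4.5, 4.5); the sum rate stays at H(X,Y) = 9
```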
Critescu et al’s Work
l Compare approaches for correlated data gathering in tree networks
− S-W model
l Coding complex, Transmission Optimization simple
− Joint Entropy Coding model
l No S-W, encoding occurs after others’ data is explicitly known
l Coding simple, Transmission Optimization complex
l How does the choice affect rate allocation?
Critescu et al’s Work
l Main result:
− S-W Coding - allocate high rates to nodes close to the sink
l Since they have to route over fewer hops
l Liu et al in Mobicom 2006 paper came up with similar results
− Joint Entropy Coding - allocate higher rates to the furthest nodes and lower rates to nearby nodes
l Since they can use others as side information
Critescu et al’s Work
l SW scheme
− Optimize the spanning tree and then use an LP for rate allocation
− The LP solution has a closed form:
  R_1 = H(X_1), \quad R_2 = H(X_2 | X_1), \quad \dots, \quad R_N = H(X_N | X_{N-1}, \dots, X_2, X_1)
l Approximation:
− Each node finds the set C_i of neighborhood nodes that are closer on the SPT
− Transmit at a rate R_i = H(X_i | C_i)
Critescu et al’s Work
l Clustered SW scheme
− Since S-W for many nodes is very complex, we can do it in clusters
− How do we choose clusters such that
− NP-complete problem for all but a few degenerate cases
{ }{ }
= ∏
==
C
iI
ICi iC
ii
KI1,
* detminarg1
Critescu et al’s Work
l Joint Entropy Coding scheme
− Problem is NP-complete
− Authors provide SPT-based heuristics
Critescu et al’s Work
Sensor Correlation
l A lot of the solutions for limits of distributed source coding assume the specific cases of
− Binary sources
l Correlation modeled as a BSC
l Distortion measured as Hamming distance
− Jointly Gaussian sources
l Distortion measured as MSE
Sensor Correlation
l How should we model correlation of real world sensor data?
− We could use a training phase where the correlation is learned over time
− We can assume that nearby sensors have higher correlation
l True in many real-world applications
Sensor Correlation
• Pattem et al's approach
− If X_1 and X_2 are separated by a distance d(X_1, X_2):
  H(X_2 | X_1) = \left( 1 - \frac{1}{d(X_1, X_2)/c + 1} \right) H(X_1)
− Fit this model to empirical rainfall data
Sensor Correlation
l Each additional sensor adds an additional
  \left( 1 - \frac{1}{d(X_1, X_2)/c + 1} \right) H(X_1)
bits of information
l If all sensors are equally spaced by d, the total information is about
  H_{total} = H(X_1) + (n - 1) \left( 1 - \frac{1}{d/c + 1} \right) H(X_1)
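The correlation model above yields simple closed forms; a sketch with illustrative numbers of my choosing (8-bit readings, spacing equal to the correlation parameter c):

```python
def cond_entropy(d, c, H1):
    """H(X2|X1) = (1 - 1/(d/c + 1)) * H(X1) under Pattem et al.'s model."""
    return (1 - 1 / (d / c + 1)) * H1

def total_entropy(n, d, c, H1):
    """n equally spaced sensors: H(X1) + (n-1) * H(X2|X1)."""
    return H1 + (n - 1) * cond_entropy(d, c, H1)

# With d = c, each extra sensor contributes half of a full reading's entropy
print(cond_entropy(1.0, 1.0, 8.0))      # 4.0 bits
print(total_entropy(5, 1.0, 1.0, 8.0))  # 8 + 4*4 = 24 bits
```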
Sensor Correlation
l Cristescu et al assume a Gaussian Markov field:
  f(X) = \frac{1}{(2\pi)^{N/2} \det(K)^{1/2}}\, e^{-\frac{1}{2}(X-\mu)^T K^{-1} (X-\mu)}, \quad K_{ij} = \sigma^2 e^{-c\, d_{ij}}
  H(X) = \frac{1}{2} \log \left( (2\pi e)^N \det K \right)
  H(Y_C \mid Y_{C^c}) = \frac{1}{2} \log \left( (2\pi e)^{|C|} \frac{\det K}{\det K_{C^c}} \right)
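For two sensors the determinant is available in closed form, so the Gaussian entropy can be sketched without linear-algebra libraries (function name and parameter values are mine):

```python
from math import log2, exp, pi, e

def gmf_entropy_2(sigma2, c, d):
    """Differential entropy (bits) of two jointly Gaussian sensors with
    covariance K = sigma2 * [[1, r], [r, 1]], r = exp(-c*d):
    H = 0.5 * log((2*pi*e)^2 * det K), det K = sigma2^2 * (1 - r^2)."""
    r = exp(-c * d)
    det_K = sigma2**2 * (1 - r**2)
    return 0.5 * log2((2 * pi * e) ** 2 * det_K)

# As the sensors move apart (d grows), correlation decays and the joint
# entropy grows toward that of two independent sensors
print(gmf_entropy_2(1.0, 1.0, 0.1) < gmf_entropy_2(1.0, 1.0, 10.0))  # True
```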
Sensor Correlation
l Liu et al use 3 models in their analysis:
− Hard Continuity Field: |X_1 - X_2| \le d
  l In general, you could use |X_1 - X_2| \le f(d)
− Linear Covariance Continuity Field: E\left[ (X_1 - X_2)^2 \right] \le d^2
  l In general, you could use E\left[ (X_1 - X_2)^2 \right] \le f(d)
− Gaussian Markov Field: \sigma_{1,2} = \sigma^2 e^{-cd}
Ideas For Work In Distributed Source Coding
Ideas For Work In Distributed Source Coding
l There has been a lot of work in this field, as applied to WSNs, especially in the last 5 or so years
l Some research has looked at minimizing cost of gathering data in WSN
− What about maximizing network lifetime, a la DAPR, MiLAN, etc?
− What about jointly optimizing transmission ranges with network topology and rate allocation?
Ideas For Work In Distributed Source Coding
l When calculating cost of data gathering, most look at fundamental limits, assuming long block lengths, no overhead
− What happens when latency is important and we cannot encode long blocks?
l DISCUS example shows that significant gains can still be made
− What happens when packet overhead can’t be ignored?
l e.g., when the cost of sending data at 1 bit/sample isn't much different from sending at 10 bits/sample
References
References
l Slepian and Wolf, “Noiseless Coding of Correlated Information Sources,” IEEE Transactions on Information Theory, July 1973.
l Wyner, “Recent results in the Shannon theory,” IEEE Transactions on Information Theory, Jan. 1974.
l Wyner, “On Source Coding with Side Information at the Decoder,” IEEE Transactions on Information Theory, May 1975.
l Wyner and Ziv, “The Rate-Distortion Function for Source Coding with Side Information at the Decoder,” IEEE Transactions on Information Theory, Jan 1976.
l Cover and Thomas, “Elements of Information Theory,” 1991.
References
l Berger, Zhang, and Viswanathan, “The CEO Problem,” IEEE Transactions on Information Theory, May 1996.
l Viswanathan and Berger, “The Quadratic Gaussian CEO Problem,” IEEE Transactions on Information Theory, Sep. 1997.
l Oohama, “The Rate-Distortion Function for the Quadratic Gaussian CEO Problem,” IEEE Transactions on Information Theory, May 1998.
l Gastpar, “The Wyner-Ziv Problem With Multiple Sources,” IEEE Transactions on Information Theory, Nov. 2004.
References
l Luo and Pottie, "Balanced Aggregation Trees for Routing Correlated Data in Wireless Sensor Networks," ISIT 2005.
l Xiong, Liveris, and Cheng, “Distributed Source Coding for Sensor Networks,” INFOCOM, 2003.
l Draper and Wornell, “Side Information Aware Coding Strategies for Sensor Networks,” IEEE JSAC, Aug. 2004.
l Pradhan and Ramchandran, "Distributed Source Coding Using Syndromes (DISCUS): Design and Construction," IEEE Transactions on Information Theory, Feb. 2003.
References
l Cristescu, Beferull-Lozano, and Vetterli, “On Network Correlated Data Gathering,” INFOCOM 2004.
l Cristescu, Beferull-Lozano, Vetterli, and Wattenhofer, "Network Correlated Data Gathering With Explicit Communication: NP-Completeness and Algorithms," IEEE/ACM Transactions on Networking, Feb. 2006.
l Liu, Adler, Towsley, and Zhang, “On Optimal Communication Cost for Gathering Correlated Data Through Wireless Sensor Networks”, MobiCom, 2006.
l Pattem, Krishnamachari, and Govindan, "The Impact of Spatial Correlation on Routing with Compression in Wireless Sensor Networks," IPSN, 2004.