Adaptive Linear Filtering Compression on Realtime Sensor Networks
Aaron B. Kiely† Mingsen Xu‡ Wen-Zhan Song‡ Renjie Huang‡ Behrooz Shirazi‡
† Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109
‡ Sensorweb Research Laboratory, Washington State University, Vancouver, WA 98686
Abstract—We present a lightweight lossless compression algorithm for realtime sensor networks. Our proposed Adaptive Linear Filtering Compression (ALFC) algorithm performs predictive compression, using adaptive linear filtering to predict sample values followed by entropy coding of prediction residuals, encoding a variable number of samples into fixed-length packets. Adaptive prediction eliminates the need to determine prediction coefficients a priori and, more importantly, allows compression to dynamically adjust to a changing source. The algorithm requires only integer arithmetic operations and thus is compatible with sensor platforms that do not support floating-point operations. Significant robustness to packet losses is provided by including small but sufficient overhead data to allow samples in each packet to be independently decoded. Real-world evaluations on seismic data from a wireless sensor network testbed show that ALFC provides more effective compression and uses fewer resources than some other lossless compression approaches such as S-LZW. Experiments in a multi-hop sensor network also show that ALFC can significantly improve raw data throughput and energy efficiency.1
I. INTRODUCTION

Wireless sensor networks have attracted significant interest from the research community for a broad range of applications. Several recent applications involve high data-rate signals, such as monitoring industrial plants, volcano hazards, and civil structures such as buildings or bridges. For these high-rate applications, collecting high-fidelity data subject to the limited radio bandwidth available to sensor nodes presents a key challenge. In addition to the limited physical bit rate of the radios used in low-power platforms, radio links may experience frequent packet losses due to congestion, interference, and multipath effects. These problems are exacerbated over multi-hop routing paths. There is a fundamental tradeoff between the network size and the total quantity of raw signal data that can be collected.
Data compression is an important tool to maximize data return over unreliable and low-rate radio links. For example, it has been shown that in some applications a compression computation yielding only a single byte of data reduction may be worth spending between roughly four thousand (Chipcon CC2420) and two million (MaxStream XTend) cycles of computation.
1 The research described in this paper was supported by the NASA ESTO AIST program and the USGS Volcano Hazard program under research grant NNX06AE42G. The research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, and at Washington State University.
Sensor networks have some significant limitations, such as limits on computational capability, memory, and packet size, as well as high packet loss rates and small energy supplies. These constraints present significant challenges in designing compression schemes for sensor networks.
At present, sensor nodes (e.g., MICAz, TelosB) usually have relatively modest computational power and do not support floating-point arithmetic, thus requiring a low-complexity compression approach that can operate in real time without the use of floating-point operations. Even on platforms that support floating-point arithmetic (e.g., iMote2), it may be desirable to avoid such calculations to reduce power consumption.
Simultaneous transmission from multiple nodes through multi-hop relay within a network can lead to congestion, collisions, and high packet losses, demanding a compression scheme that allows packets to be decompressed even when preceding packets have been lost. Lossless compression is frequently demanded by science users to preserve the fidelity of critical data (e.g., earthquake data). But packet losses may be unavoidable in realtime high-fidelity sensor networks due to limited bandwidth or node mobility. Motivated by this, some sensor network applications are designed to provide reliable delivery of critical data and best-effort delivery of non-critical data.
Many sensor platforms employ hardware that uses the IEEE 802.15.4 standard, which limits the maximum packet size to 128 bytes. The TinyOS operating system sets the default data payload length to 29 bytes, and many systems recommend a packet size of no more than 90 bytes. Providing independently decodable packets under such limitations requires that increased attention be paid to compression parameters or other overhead information included with each packet.
In this paper, we present a lightweight compression scheme for sensor networks, called Adaptive Linear Filtering Compression (ALFC). Our intended application is compression of seismic data, but this method may work well for other data types.
Adaptive prediction eliminates the need to determine prediction coefficients a priori and, more importantly, allows the compressor to dynamically adjust to a changing source. This is particularly important for seismic data because the source behavior can vary so dramatically depending on seismic activity. To eliminate floating-point operations, we make use of rational approximations to real-valued quantities in linear prediction. To ensure that samples in a packet can be decoded without requiring preceding packets to be available to the decoder, we alter the prediction approach for the first few samples in the packet so that it does not rely on sample values in preceding packets.
Predicted sample values are used to losslessly encode source samples using a variable-length coding scheme. We map each sample value to a non-negative integer and then encode the resulting sequence using Golomb codes. This general strategy is used in the Rice entropy coding algorithm and the LOCO-I image compressor, among myriad other applications.
We have implemented and evaluated the proposed algorithm in a multi-hop sensor network testbed with a seismic data feed. The testbed experiments show that our ALFC algorithm can significantly increase data quantity and energy efficiency, and provide better performance compared to a recent related approach in the literature.
The rest of the paper is organized as follows. In section II, we review related work on predictive lossless seismic data compression and data compression in sensor networks. We introduce our proposed linear prediction, packetization and entropy coding algorithm in section III, then present our evaluation experiments in section IV. Finally, we conclude our paper and present potential future research in section V.
II. RELATED WORK
A. Predictive Lossless Seismic Data Compression
Lossless predictive compression generally consists of a prediction stage, where each sample value is predicted based on past sample values, and a coding stage, where an entropy coding method is applied to losslessly encode prediction residuals (the differences between predicted and actual sample values). A variety of different approaches for the prediction and entropy coding stages have been used in the literature for lossless compression of seismic data.
The least complex predictors include simply using the previous sample value for prediction, or using linear prediction with fixed coefficients determined a priori. Better performance can be obtained at the expense of higher complexity by selecting prediction coefficients separately optimized for each block of samples, as has been done for two different source models. More sophisticated prediction approaches in the literature include the use of a 20-stage adaptive recursive least squares lattice adaptive filtering approach, several repeated stages of adaptive linear prediction using gradient adaptive lattice filters on very large blocks of samples, and a source model designed to incorporate structural system information when external exogenous input information is available. One prior approach makes use of a somewhat more conventional adaptive linear filtering approach, as in our approach. Unlike our sensor network application, however, that work is applied to much larger blocks of data (each at least 50 KB), and employs somewhat higher-complexity prediction, requiring more arithmetic operations and making use of floating-point prediction coefficients.
For lossless encoding of prediction residuals, approaches include a method based on arithmetic coding that has been used in several compressors. Slightly lower compression effectiveness is provided by a simpler bi-level coding approach. The bi-level coding method has similar complexity to our approach, which relies on Golomb codes, but note that selecting the coding parameters to obtain the best performance from bi-level coding assumes knowledge of the standard deviation of the prediction residuals being encoded in the block. Even lower-complexity ad hoc entropy coding approaches can be found in other published compression methods.
The most significant differences between our approach and these related works are our design requirement that packets can be decompressed independently, as a means of dealing with potentially high packet losses, and the constraints imposed by small packet sizes and limited computational capabilities. Higher computational capabilities would permit higher-complexity compression approaches, e.g., performing a least-squares optimization to determine optimal prediction coefficients, optimizing the filter prediction order for each packet, using floating-point coefficients, or using arithmetic coding. The use of relatively short packets means that the problem of efficiently encoding overhead information (e.g., to indicate coding parameters or prediction coefficient information) becomes more significant than in typical seismic data compression scenarios. We also note that nearly all of the other approaches encode a fixed number of samples into variable-length encoded units, while our approach encodes a variable number of samples into fixed-length packets.
B. Compression in Sensor Networks
The literature includes many papers on data compression for sensor networks, though many of these approaches have not been evaluated in real sensor network testbeds.
Much prior work focuses on exploiting the high spatial correlation in data from fixed sensors in dense networks. Distributed source coding approaches have been used to exploit data dependencies between nodes in a sensor network. One such approach minimizes the amount of inter-node communication for compression using both a quantized source and correlated side information within each individual node. This is an interesting theoretical approach, but the choice of the correlated side information is essential to the performance of the algorithm and normally not well known in practice. In another scheme, an intelligent data-gathering node determines the degree of correlation between data from different source nodes and, based on this information, tells each data node the allowable amount of compression. Each data source node executes a very simple compression algorithm that allows correlation between sources to be exploited even when the node does not have access to data from neighboring nodes. Note that the corresponding experiments were performed using light, temperature, and humidity sensors, and it is unclear how well this compression approach would work in the presence of seismic events, when the nature of the data dependency between nodes might change dramatically and without warning.
Several methods have been proposed to use wavelets and their variants in analyzing and compressing the sensed data. Ganesan's DIMENSIONS was one of the first systems addressing multi-resolution data access and spatio-temporal pattern mining in a sensor network. In one such system, nodes are partitioned into different clusters and organized in a multi-level hierarchy. Within each cluster, the cluster head performs a two-dimensional wavelet transformation and stores the coefficients locally. These coefficients are passed to the next level in the hierarchy for wavelet transformation at a coarser resolution. While it demonstrates promising results, it relies on two key assumptions that limit its general applicability: (1) nodes are distributed in a regular grid; (2) cluster heads can always communicate with their parents. Wagner proposed an architecture for distributed wavelet analysis that removes the assumption about the regularity of the grid. Moreover, an algorithm has been proposed for performing the wavelet transformation by tracing through the path in the minimum spanning tree and applying the wavelet filter along the path. It minimizes inter-node communication by transmitting partial coefficients forward and updating future sensors until the full coefficients are computed. This approach implicitly assumes that the path will be sufficiently long that wavelet analysis will be effective. Furthermore, it is not clearly explained how to choose an optimal path for compression, and the spatial correlation is not fully explored.
While several authors propose compression algorithms based on spatio-temporal correlation, some recent works do not assume any correlation based on the spatial relationship of sensor nodes or on temporality between data packets. As such, they are more general and are better equipped to handle the sorts of low-correlation situations common to mobile sensor networks. One work develops a lossless, energy-aware compression algorithm and identifies the major factor of power usage in resource-constrained devices: it shows that sending a single bit is roughly equivalent to performing 485∼1267 ADD operations. When profiling such usage, it becomes obvious where the most energy can be saved, and that is in reducing the number of bits sent over the radio. Another work proposes and evaluates a family of basic lossless compression algorithms, S-LZW, based on LZW, tailored to static and mobile sensor networks, and applicable to a wide range of applications. The authors further discuss additional steps for transforming or preconditioning the data to further reduce energy consumption. S-LZW-MC (S-LZW with Mini Cache) is an extension that makes use of a cache of the most recently used entries. This additional cache serves as an appendix to the standard S-LZW dictionary, under the assumption that the sensor's incoming data tends to be repetitive over short intervals. S-LZW-MC with the Burrows-Wheeler Transform (BWT) or Structured Transpose (ST) provides more overall energy saving. In the dictionary-based methods (such as LZW and its extensions), senders and receivers maintain the same hash-indexed dictionary, either static or dynamic,
and take the same symbols to represent the incoming data strings. However, as discussed by the authors, some packets may get lost during transmission, and the dictionary may then lose synchronization. Packet loss in the sensor network can thus be prohibitive for S-LZW compression algorithms, since it can halt decompression of the affected block.
Some recent works also apply linear predictive coding to data compression in wireless sensor networks. One study examines the performance of different variations of the linear predictive coding (LPC) compression algorithm on different hardware configurations. The authors do not address the issue of modifying the compression algorithms to provide robustness to packet losses on unreliable networks. Another work applies second-order linear prediction with coefficients determined a priori based on training data. Compression is applied in sensor networks to humidity sensor data using small (29-byte) packets. Rather than employ a compression scheme that allows independent decoding of packets to provide robustness to packet losses, they assume that lost packets will be re-transmitted as many times as needed. A third work considers the problem of micro-acoustic/seismic data compression in wireless sensor networks. An algorithm is employed so that events are automatically detected and no data is transmitted unless an event is detected. Prediction approaches considered are limited to (at most) taking the differences between successive samples. Compression is performed on blocks of 512 samples, which is fairly large compared to our application.
III. ADAPTIVE LINEAR PREDICTION COMPRESSION
In this section, we describe our linear predictive compression approach, which was originally presented in earlier work. We model a sensor as a one-dimensional data source that produces integer-valued samples x1, x2, . . . . The instrument has a dynamic range of b bits, and without loss of generality we may assume that each sample value is in the range [−2^(b−1), 2^(b−1) − 1]. In our intended application, the source is a single-component seismometer with an instrument dynamic range of b = 16 bits. Each node has relatively modest computational power, and so our compression approach must have relatively low complexity. Encoded sample values are transmitted using fixed-length packets, and so we would like to losslessly encode as many samples as possible in each packet. A significant additional problem is that packets are often lost on the channel. For this reason we impose the additional constraint that the decoding of a received packet must not depend on the contents of other packets. We assume that time stamp information is already included with each packet, so the decoder can properly synchronize received sample values once they are decoded. Our compression approach relies on adaptive linear prediction of sample values and entropy coding of prediction residuals.
1) Adaptive Linear Prediction: We maintain a running estimate of the mean input signal value, µi. This estimate is used to compute a "de-biased" version of the source samples,
di = xi − µi.
We apply Mth-order adaptive linear prediction to the de-biased signal di. That is, the predicted value d̂i is a linear combination of the preceding M de-biased values,

d̂i = Σj=1..M wj · di−j = wi^T ui. (1)

Here wi = [w1, w2, . . . , wM]^T is a vector of weight coefficients that are adapted to the source, and ui = [di−1, di−2, . . . , di−M]^T is the vector of the preceding M de-biased sample values. The predicted sample value is

x̂i = µi + d̂i.
The estimation error, or prediction residual, is

ei = xi − x̂i = di − d̂i.
The prediction x̂i is used to losslessly encode xi using a variable-length coding scheme described in Section III-C. The entropy coding procedure takes into consideration the fact that xi is an integer while x̂i usually is not.
After encoding xi, we use the sign algorithm to update the weight vector:

wi+1 = wi + α · ui · sign(ei),
and we update the mean value estimate via
µi+1 = µi + β · (xi − µi).
Here α and β are parameters that control the adaptation of the weight vector and the mean estimate to the source statistics.
It might seem more natural to perform mean estimation as part of the sign algorithm instead of as a separate step. Under this alternative, one could define extended vectors w′i = [w′0, w′1, . . . , w′M]^T and u′i = [κ, xi−1, xi−2, . . . , xi−M]^T, where κ is some fixed constant. Then we could predict the value of sample xi directly as x̂′i = w′i^T u′i. We do not adopt this alternative approach because the prediction of the first sample value in a packet becomes less straightforward and because compression effectiveness becomes somewhat more sensitive to parameter selection.
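As an illustration, one iteration of the real-valued algorithm (prediction (1), residual computation, sign-algorithm weight update, and mean update) can be sketched in C as follows. The prediction order M = 4 and the step-size constants are assumed values chosen for illustration, not taken from the text:

```c
#include <stddef.h>

#define M     4        /* prediction order (assumed for illustration) */
#define ALPHA 3.05e-5  /* weight-adaptation step, roughly 2^-15       */
#define BETA  3.9e-3   /* mean-adaptation step, roughly 2^-8          */

/* One real-valued iteration: returns the predicted sample value and
   updates the weights w[], the de-biased history u[] (u[0] = d_{i-1}),
   and the running mean estimate *mu. */
double alfc_real_step(double w[M], double u[M], double *mu, double x)
{
    double dhat = 0.0;                 /* (1): dhat_i = w_i^T u_i */
    for (int j = 0; j < M; j++) dhat += w[j] * u[j];

    double xhat = *mu + dhat;          /* predicted sample value  */
    double e = x - xhat;               /* prediction residual     */
    double d = x - *mu;                /* de-biased sample        */

    double sg = (e > 0) - (e < 0);     /* sign algorithm update   */
    for (int j = 0; j < M; j++) w[j] += ALPHA * u[j] * sg;
    *mu += BETA * (x - *mu);           /* mean estimate update    */

    for (int j = M - 1; j > 0; j--) u[j] = u[j - 1];  /* shift history */
    u[0] = d;
    return xhat;
}
```

Because the weight update uses only the sign of the residual, no multiplications by e are needed, which is what later makes the integer-only version practical.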
2) Prediction Using Integer Arithmetic: To eliminate floating-point operations in the basic algorithm of Section III-A1, we use rational approximations to real-valued data samples to produce a version of the algorithm that requires only integer arithmetic. Specifically, the real-valued quantities d̂i, x̂i, µi, ei, wi are approximated using rational values:

d̂i = Di/2^R
x̂i = Xi/2^R
µi = Ωi/2^R
ei = Ei/2^R
wi = Wi/2^R

Here R is some fixed integer, Di, Xi, Ωi, Ei are integer variables, and Wi is a vector of integers. The value of R effectively controls the resolution at which linear prediction calculations are performed; we use R = 14 in our experiments. Since the sample values being predicted are integer-valued, compression effectiveness should not be highly sensitive to the value of R used, provided that R is not so large that arithmetic overflow occurs. In this version of the algorithm, we perform round-off operations that result in di being integer-valued and, consequently, ui being a vector of integers. The adaptation parameters α and β are chosen to be α = 2^−A, β = 2^−B for some integers A and B, so that the multiplications needed to perform the updates can be accomplished via bit-shift operations.
Each iteration of the integer version of the prediction algorithm consists of the following steps:
(i) Compute

Di = Wi^T ui. (2)

(ii) Compute

Xi = Di + Ωi.

(iii) Encode the integer sample value xi using the rational predicted value x̂i = Xi/2^R.
(iv) Compute the (integer) de-biased value di:

di = xi − ⌊(Ωi + 2^(R−1) − 1)/2^R⌋.

(v) Compute the prediction error

Ei = di · 2^R − Di.

(vi) Update the weight vector:

Wi+1 = Wi + sign(Ei) · ⌊(2^R ui + (2^(A−1) − 1)I)/2^A⌋,

where I denotes a vector of ones, and the floor operation is applied to each component of the vector.
(vii) Update the mean value estimate:

Ωi+1 = Ωi − ⌊(Ωi − xi · 2^R + 2^(B−1) − 1)/2^B⌋.
B. Encoding Into Independent Packets
We collect samples from the source, placing the sample values into a buffer. After each source sample is collected, we determine whether the encoded bit cost of the samples in the buffer exceeds the available packet size. If not, then we gather the next sample. Otherwise, we encode the samples in the buffer (excluding the newest one), and reset the buffer to contain only the newest sample.

Compression speed improves if we can quickly identify situations where the samples in our buffer can fit within a packet. Towards this end, one can make use of bounds on the size of encoded samples whenever possible and avoid explicitly computing the coding cost except when necessary.
To ensure that samples in a packet can be decoded without requiring preceding packets to be available to the decoder, we make the following modifications at the start of each packet:
(i) We place encoded quantized versions of µi and wi at the beginning of each packet. The values of µi and wi are set to these quantized versions in both the encoder and decoder. Quantization is uniform, using some number Qµ of bits of resolution for the value of µi and Qw bits for each component of wi.
(ii) We alter the prediction approach for the first M samples in the packet so that it does not rely on sample values in the preceding packet.
We describe these modifications in further detail below.
1) Quantization: The range of possible values of µi is, at least in principle, equal to the range of possible sample values xi. This range is uniformly partitioned into 2^Qµ bins, and the index of the quantizer bin containing the value of µi is encoded in the packet using Qµ bits. The quantizer index could be encoded a little more efficiently using a variable-length code, since smaller-magnitude values of µi are presumably more likely than larger magnitudes, but we did not investigate such a scheme.
Each component of wi is clipped as needed to ensure that its magnitude does not exceed some cap 2^C (we use C = 2 in our experiments). Each component is then quantized using a uniform quantizer with 2^Qw bins spanning the range [−2^C, 2^C]. We make use of a simple variable-length coding scheme to exploit the fact that the components of wi are usually non-increasing in magnitude and alternating in sign, i.e.,

(i) |w1| ≥ |w2| ≥ |w3| ≥ . . .
(ii) sign(wj) = (−1)^(j+1).

We say that a quantized weight vector is ordinary if it satisfies these conditions.
We use a single bit to indicate whether the quantized weight vector is ordinary. If it is not ordinary, the weight components are sent uncoded using an additional M · Qw bits. If the weight vector is ordinary, we encode |w1| directly using Qw − 1 bits, and for j > 1, we encode the value of |wj| using ⌈log2 |wj−1|⌉ bits. The quantized weight vector is computed and encoded for every data packet. In our experiments on seismic data, the quantized weight vector was ordinary more than 90% of the time.
2) Predicting the Initial Samples in a Packet: Since the procedure for decoding a packet cannot depend on the contents of another packet, for the first M samples in a packet we must modify the regular prediction approach, since we do not have enough data to perform the calculations of equations (1) or (2) directly. Instead, we do the following:
(i) The first de-biased sample value in a packet is predicted to be zero. That is to say, for the first sample, prediction makes use of the quantized bias estimate but not the weight vector.
(ii) For subsequent samples in the packet, when the calculation in equations (1) or (2) would require the use of samples from the preceding packet, the first de-biased sample value is repeated enough times to artificially produce M de-biased sample values to fill the vector ui.
(iii) Updates of the weight vector wi (or Wi) are not performed for the first M samples in a packet, during which time this modified prediction strategy is in effect.
3) Packet Overhead and Parameter Tradeoffs: Compression-related packet overhead consists of the following:
(i) The quantized value of µi (encoded using Qµ bits).
(ii) The quantized value of wi (encoded using at most M · Qw + 1 bits).
(iii) The value of an index indicating which variable-length code was used to encode prediction residuals (using ⌈log2 b⌉ bits); see Section III-C.
We summarize the tradeoffs involved in selecting compression parameters:
(i) Higher-order prediction (a larger value of M) allows for more accurate prediction, but increases the number of components of wi, thus generally increasing the amount of overhead used to encode wi at the start of each packet.
(ii) Higher-resolution quantization of µi and wi (i.e., larger values of Qµ and Qw) increases prediction accuracy but increases packet overhead.
(iii) Larger adaptation step sizes (larger values of α = 2^−A, β = 2^−B) allow faster initial adaptation and adaptation to a dynamically changing source, but provide worse steady-state performance when the source is not rapidly changing. In our experiments, we set A = 15 and B = 8.
C. Entropy Coding
Source sample values are collected in a buffer. After each source sample arrives, we determine whether the encoded bit cost of the samples in the buffer exceeds the available space in the packet. If not, then we proceed to the next sample. Otherwise, we encode the samples in the buffer (excluding the newest one) using the entropy coding procedure described in the remainder of this section, and reset the buffer to contain only the newest sample.

Thus, with the arrival of each new sample, we must determine whether the accumulated samples fit within a packet. Since the coding option used for a packet can change as new samples arrive, explicitly computing the encoded length of the samples in the buffer is not always as simple as incrementing an encoded bit count with the cost of the new sample. The obvious brute-force approach is to simply apply the entropy coding procedure to the samples in the buffer.

Fortunately, we can often avoid the brute-force calculation by bounding the encoded bit cost to quickly identify cases where the packet can accommodate the accumulated samples. For example, the entropy coding procedure guarantees that the bit cost to encode n samples is no more than n · b bits. As another example, if we have computed the bit cost to encode the first n − 1 samples, we can bound the cost to encode n samples based on a bound on the incremental cost to encode a single sample. We omit further details of our approach to bounding the encoded bit cost.
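For instance, the n · b-bit guarantee yields a trivially cheap sufficient condition for skipping the exact cost computation. The payload and overhead sizes used below are illustrative parameters, not values from the text:

```c
/* Quick sufficient test: if even the worst case (all n samples sent
   uncoded at b bits each) plus per-packet overhead fits, the exact
   encoded cost need not be computed.  A failed test does NOT mean the
   samples don't fit -- it merely falls back to the exact computation. */
int surely_fits(unsigned n, unsigned b, unsigned overhead_bits,
                unsigned payload_bits)
{
    return overhead_bits + n * b <= payload_bits;
}
```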
Our entropy coding problem, then, is to efficiently encode a length-n sequence of integer-valued samples xi given real-valued predictions x̂i. To do this, we map each sample value to a non-negative integer and then encode the resulting sequence of non-negative integers using a Golomb code. This general strategy is used in the Rice entropy coding algorithm and the LOCO-I image compressor, among myriad other applications.
1) Mapping: It is sensible to refine the predicted value x̂i to take into account the fact that the true sample value xi is an integer and is constrained by the instrument dynamic range. Accordingly, we define

[x̂i] = min{max{round(x̂i), xmin}, xmax},

where xmin = −2^(b−1) and xmax = 2^(b−1) − 1 are the minimum and maximum possible sample values. We use this refined prediction to calculate the integer-valued prediction residual

ei = xi − [x̂i].
We map the signed integer quantity ei to a nonnegative integer fi using a slight variation of the standard residual mapping:

fi = 2|ei| − δi, if |ei| ≤ θ
fi = |ei| + θ, otherwise.

Here we define

θ = min{[x̂i] − xmin, xmax − [x̂i]}

and

δi = 1, if sign(ei) = sign(x̂i − [x̂i])
δi = 0, otherwise.
This mapping is invertible and ensures that fi ∈ [0, 2^b − 1], i.e., fi is a nonnegative integer with a dynamic range that matches that of the original source. More significantly, the mapping assigns smaller-magnitude residuals ei to smaller values of fi. Since smaller-magnitude prediction residuals should occur more frequently than larger magnitudes, we would like to encode fi using a variable-length code that assigns shorter codewords to smaller integers, such as the codes that we discuss next.
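A direct transcription of this mapping might look like the following. The refined prediction p = [x̂i] and the sign of the rounding remainder x̂i − [x̂i] are passed in as arguments since they come from the prediction step; a small guard for ei = 0 is added so the result stays nonnegative:

```c
#include <stdint.h>
#include <stdlib.h>

/* Map signed residual e_i to a nonnegative integer f_i.
   p    = refined integer prediction [x^_i]
   srem = sign(x^_i - [x^_i]), the sign of the rounding remainder
   xmin, xmax = instrument range limits */
int32_t map_residual(int32_t e, int32_t p, int srem,
                     int32_t xmin, int32_t xmax)
{
    int32_t theta = (p - xmin < xmax - p) ? p - xmin : xmax - p;
    if (abs(e) <= theta) {
        int sg = (e > 0) - (e < 0);
        /* delta = 1 when the residual sign matches the remainder sign;
           the sg != 0 guard keeps f_i = 0 for a perfect prediction */
        int delta = (sg == srem && sg != 0) ? 1 : 0;
        return 2 * abs(e) - delta;
    }
    return abs(e) + theta;   /* residuals outside the symmetric window */
}
```

Within the window the map interleaves matching-sign and opposite-sign residuals onto odd and even integers, so small-magnitude residuals always receive small indices.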
2) Variable Length Coding: For a positive integer m, the mth Golomb code defines a reversible prefix-free mapping of nonnegative integers to variable-length binary codewords.

We restrict our choices to codes for which m = 2^k for some nonnegative integer k. Coding in this case becomes especially simple: the codeword for the integer j consists of the unary representation of ⌊j/2^k⌋ (i.e., ⌊j/2^k⌋ zeros followed by a one) concatenated with the k least significant bits of the binary representation of j. Following convention, we refer to this special case as a Golomb-power-of-2 (GPO2) code with parameter k.
The samples in a packet are either all sent uncoded, using b bits for each sample, or they are all encoded using the same GPO2 code with some fixed parameter k. The coding option selected is explicitly encoded as part of the packet overhead, as described in Section III-B3. For a source with b-bit dynamic range, the cost of using code parameter k ≥ b − 1 is always at least as large as the cost of sending the samples uncoded. Thus, in our application, when we use a GPO2 code, it must have a parameter k satisfying

0 ≤ k ≤ b − 2.
This gives us b − 1 GPO2 code choices, along with the uncoded option, so our selected code can be indicated using ⌈log2 b⌉ bits of overhead.
The coded samples do not usually perfectly fill a packet; following the last encoded sample in the packet, any remaining unused bits in the packet (fill bits) are set to zero. Because each GPO2 codeword begins with a unary-encoded value, these fill bits will never be mistaken for a coded sample value, and the decoder can correctly determine the number of samples encoded in the packet.
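A minimal GPO2 encoder matching this description (unary quotient in zeros, a terminating one, then the k least-significant bits) might look like this; the MSB-first bit-buffer helper is a local convenience, not something from the text:

```c
#include <stdint.h>

/* Append one bit (MSB-first) to a byte buffer; *pos counts bits written. */
static void put_bit(uint8_t *buf, uint32_t *pos, int bit)
{
    if (bit) buf[*pos >> 3] |= (uint8_t)(0x80u >> (*pos & 7));
    (*pos)++;
}

/* Codeword length for nonnegative j under the GPO2 code with parameter k:
   floor(j/2^k) zeros, one terminating 1-bit, then k LSBs of j. */
uint32_t gpo2_len(uint32_t j, unsigned k)
{
    return (j >> k) + 1 + k;
}

/* Emit the GPO2 codeword for j (no bounds checking; illustration only). */
void gpo2_put(uint8_t *buf, uint32_t *pos, uint32_t j, unsigned k)
{
    for (uint32_t q = j >> k; q > 0; q--) put_bit(buf, pos, 0);
    put_bit(buf, pos, 1);
    for (int b = (int)k - 1; b >= 0; b--)
        put_bit(buf, pos, (int)((j >> b) & 1));
}
```

Note that every codeword contains a terminating one-bit, which is why a run of zero-valued fill bits at the end of a packet can never be parsed as another complete codeword.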
We now turn our attention to the problem of selecting a coding option that efficiently encodes some number n of non-negative integers f1, f2, . . . , fn. The traditional solution to this problem is the Rice algorithm's brute-force approach: explicitly compute the coding cost of each option and select the best one. But the brute-force approach is unnecessarily complex; if we simply compute the sum

F = f1 + f2 + · · · + fn,

then the mean value F/n allows us to narrow the possible optimum code choices to at most three candidates. Furthermore, by simply comparing this mean value to a list of pre-defined thresholds, we can perform code selection in a way that gives compression effectiveness that is quite close to that obtained under an optimum code selection.2
Our code selection procedure uses the following steps (further details and mathematical background can be found in ):

(i) If the mean value F/n is sufficiently large, the fi are sent uncoded. Specifically, we first check whether F/n > µ†_b ≜ 2^(b−1)/(2 ln 2) (µ†_16 ≈ 23637). If this condition is satisfied, then the uncoded option is selected. Otherwise, proceed to the next step.

(ii) Compute K as follows. If F/n + 49/128 < 1 then K = 0; otherwise K is the unique nonnegative integer satisfying 2^K < F/n + 49/128 ≤ 2^(K+1). This can be implemented in C as:

for (K = 0; (n << (K + 1)) <= F + ((n * 49) >> 7); K++);

(iii) Assign k = min{K, b − 2}.

(iv) Compute the bit cost of using the GPO2 code with parameter k to encode f1, f2, . . . , fn. If this cost exceeds the uncoded cost (which is n · b bits) then we use the uncoded option; otherwise we select the GPO2 code with parameter k.
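The four steps above can be sketched as one C function. This is our illustrative rendering for b = 16 (so µ†_16 ≈ 23637), not the paper's implementation; it assumes the GPO2 codeword length (f ≫ k) + 1 + k bits:

```c
#include <stdint.h>

/* Code selection for b = 16: returns the GPO2 parameter k in 0..b-2,
 * or -1 when the uncoded option is selected. */
static int select_code(const uint32_t *f, uint32_t n) {
    const unsigned b = 16;
    uint64_t F = 0;
    uint32_t i;
    for (i = 0; i < n; i++) F += f[i];               /* F = f1 + ... + fn */

    /* (i) uncoded if the mean F/n exceeds mu_dagger_16 ~= 23637 */
    if (F > (uint64_t)23637 * n) return -1;

    /* (ii) K from the integer form of 2^K < F/n + 49/128 <= 2^(K+1) */
    unsigned K = 0;
    while (((uint64_t)n << (K + 1)) <= F + (((uint64_t)n * 49) >> 7)) K++;

    /* (iii) clamp to the largest coded parameter */
    unsigned k = K < b - 2 ? K : b - 2;

    /* (iv) exact GPO2 bit cost vs. the n*b-bit uncoded cost */
    uint64_t cost = 0;
    for (i = 0; i < n; i++) cost += (uint64_t)(f[i] >> k) + 1 + k;
    return cost > (uint64_t)n * b ? -1 : (int)k;
}
```

Note that only one pass over the data (to form F) plus one pass for the cost check is needed, in contrast to the per-parameter passes of a brute-force search.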
Analysis in  shows that the GPO2 code parameterk selected under this strategy is always within one of theoptimum parameter values. Our experiments with seismic datasamples suggest that the increased bit rate due to occasionalsuboptimum code selection is negligible.
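For reference, the Rice-style brute-force search that the threshold rule approximates can be sketched as follows; as above, the codeword-length formula (f ≫ k) + 1 + k is assumed from the GPO2 structure, and the function name is ours:

```c
#include <stdint.h>

/* Exhaustive reference selection: evaluate the exact bit cost of every
 * GPO2 parameter k = 0..b-2 plus the uncoded option (n*b bits), and
 * return the cheapest (-1 = uncoded).  O(n*b) instead of O(n). */
static int best_k_exhaustive(const uint32_t *f, uint32_t n, unsigned b) {
    uint64_t best_cost = (uint64_t)n * b;    /* uncoded cost */
    int best = -1;
    for (unsigned k = 0; k + 2 <= b; k++) {
        uint64_t cost = 0;
        for (uint32_t i = 0; i < n; i++)
            cost += (uint64_t)(f[i] >> k) + 1 + k;
        if (cost < best_cost) { best_cost = cost; best = (int)k; }
    }
    return best;
}
```

Comparing the threshold-based choice against this exhaustive search on sample data is a convenient way to confirm the within-one-of-optimum behavior cited above.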
2The Rice algorithm also differs from our coding problem in that the Rice coder encodes a fixed number of samples into a variable number of encoded bits, whereas in our problem we encode a variable number of samples into fixed-length packets.
IV. EXPERIMENTS AND EVALUATIONS
We have conducted compression evaluations on a real sensor network testbed including several sensor nodes. One such node is pictured in Fig. 1. The core component of each node is an iMote2 sensor mote, a new-generation hardware platform for wireless sensor networks.
Fig. 1. Sensor node in Oasis  project.
The iMote2 CPU core frequency can be configured to operate from 13 MHz to 416 MHz. In our evaluation, we configured the PXA271 processor on the iMote2 to operate in a low-voltage (0.85 V) and low-frequency (13 MHz) mode to conserve energy. An MDA320CA sensor board is connected to the iMote2 through a serial peripheral interface, and a low-pass filter is added to the sensor board to reduce noise. To provide real data sets to the node for compression evaluation, we wired the sensor connectors to a PCI-DAC6703 (Digital-to-Analog Converter) board installed in a PC. The PCI-DAC6703 provides 16 channels of 16-bit analog output and 8 digital I/O bits. We read real data sets from an archive and wrote them to the PCI-DAC6703 board. In this way the PCI-DAC6703 simulated real sensor data during the experiments. To form a multi-hop network, we set the radio power to level 2, which is the lowest setting.
We use the following data sets in our compression experiments:
(i) FEQ: Frequent EarthQuake swarm data recorded on Sep. 26, 2004, provided by the U.S. Geological Survey (USGS).
(ii) LLA: Low-Level Activity data from a crater, recorded on Jul. 20, 2007, provided by USGS.
(iii) NS: Noisy Storm data with rain, wind, flood and mudflows, recorded on Nov. 6, 2006, provided by USGS.
(iv) Calgeo: A seismic data set consisting of dramatically changing data samples. This data set is part of the Calgary Corpus3 and is widely used for compression benchmarking.
(v) SS: A data set of environmental measurements (such as light, temperature and sound), retrieved from the SensorScope system . It is relatively less volatile.
Our experiments with ALFC use compression parameters R = 14, A = 15, B = 8 (see Section III-A2), and quantizer resolution parameters Qw = 5, Qµ = 11 (see Section III-B). The packet payload is set to 56 bytes. We evaluate compression performance using prediction order M = 1 as a "baseline" measure, and also with M varying from 3 to 5.
We also compare the compression performance of ALFC with the S-LZW compressor presented in , a recently developed compressor for sensor networks, discussed in Section II-B. S-LZW is a dictionary-based compressor that partitions input data into small blocks to achieve a balance between dictionary size and the "hit" rate, i.e., the rate at which matches are found in the dictionary. Both ALFC and S-LZW are low-complexity lossless compression algorithms designed for sensor networks, and neither depends on correlations between nodes in the network.
A. Compression Ratio and Resource Usage
In , Sadler et al. introduce S-LZW-MC-BWT, which incorporates a mini-cache and the Burrows-Wheeler Transform (BWT) to achieve the best compression among the relevant S-LZW variations.4
Table I lists the compression ratios achieved on the SS and Calgeo data sets using the S-LZW-MC-BWT compressor with different mini-cache sizes (results taken from ), and for ALFC with different prediction orders.
Over the range of parameters evaluated, ALFC demonstrates an advantage in compression effectiveness for both data sets. Both compressors provide much better compression on the better-behaved SS data set. The volatile seismic data in the Calgeo data set reduces the mini-cache's hit rate in S-LZW-MC-BWT, and reduces the accuracy of linear prediction in ALFC. Interestingly, on the Calgeo data set, first-order prediction outperforms higher-order prediction in ALFC, and a smaller mini-cache yields better performance in S-LZW-MC-BWT. It is worth mentioning that Calgeo stores data as 32-bit words whose first and second 16-bit halves behave very differently, but both S-LZW and ALFC simply treat the input as a continuous raw byte stream.
TABLE I
AVERAGE PACKET-LEVEL COMPRESSION RATIO ACHIEVED USING S-LZW-MC-BWT AND OUR LINEAR PREDICTIVE ALGORITHM

Data     S-LZW-MC-BWT (Mini-Cache Size)   Proposed Algorithm (Prediction Order M)
Set        8     16     32     64            1      3      4      5
Calgeo    1.3   1.25   1.25   1.25          1.54   1.43   1.38   1.35
SS        4.1   4.2    3.9    3.5           6.26   6.0    6.30   6.13
We next compare the code size and memory usage of the algorithms. The default memory usage of S-LZW-MC is about 2 kB, which is smaller than the other compression algorithms presented in . Our ALFC implementation uses even less memory: 768 bytes, which makes it truly lightweight and suitable for resource-constrained sensor nodes. Our ALFC implementation also has a smaller code size of 19.4 kB, compared to the 32 kB required for S-LZW-MC. S-LZW-MC-BWT has an even larger code size because it also includes a Burrows-Wheeler transform.
Computation cost comparisons between S-LZW-MC-BWT and ALFC are somewhat less direct because the former is
4Reference  also introduces the S-LZW-MC-ST variation, but thiscompressor assumes that the pattern of the data to be compressed is known inadvance and attempts to reorder the data to improve compression performance.Since we assume no such advance knowledge, we make no comparisons withthis variation.
implemented on an 8 MHz TI MSP430 processor (with 10 kB RAM and 48 kB on-chip flash memory) while the latter is implemented on the iMote2 running at 13 MHz. S-LZW-MC-BWT compresses data in blocks of 528 bytes, yielding compression times of 0.625 ms/byte and 0.587 ms/byte for the SS and Calgeo data sets, respectively. For ALFC, the compression time is 0.25 ms/byte for the SS data set and 0.4 ms/byte for the Calgeo data set. Prediction order was observed to have little impact on the compression speed of ALFC. Compared with S-LZW-MC-BWT, our ALFC implementation provides slightly slower but more effective compression. Fig. 2 shows the compression effectiveness and computation cost of our ALFC implementation for the different data sets. It is worth keeping in mind that dictionary-based methods such as the S-LZW variations are much less robust to packet losses than ALFC.
Fig. 2. Comparison of compression ratio and computation cost.
Fig. 2 illustrates the compression ratio of ALFC when the prediction order M is varied (indicated in the figure as "LP-M", or "baseline" for M = 1). ALFC performs significantly better on the SS data set than on the other data sets, with an average compression ratio of 6.17. For the FEQ data set, M = 5 achieves the best compression ratio of 2.3. Fifth-order prediction also delivers the best performance for the LLA data set. For the NS data set, the compression ratio for M = 3 is 1.7, which is 13% better than that of M = 4. M = 1 also performs better than M = 4, suggesting that higher-order linear prediction is not worth the added overhead on some data sets. The same situation occurs for the Calgeo data set, which contains volatile seismic data.
We also evaluate local energy savings under ALFC, i.e., the energy saved when nodes use compression and transmit compressed packets rather than transmitting raw packets. Here, we take into account the energy for computation, memory access and radio communication. The iMote2 uses a CC2420 radio transceiver.5 Fig. 3 shows that for all of the test data sets, sensor nodes consume less local energy when ALFC is used than when no compression is used. For the seismic data sets from volcanoes (FEQ, LLA and NS), fifth-order prediction minimizes local energy consumption. The figure also indicates that the local energy saving depends on the dynamics of the source data. For the Calgeo data set, the
Fig. 3. Normalized local energy consumption: the ratio of energy consumption with compression to the energy consumption without compression.
compression algorithm with the most overhead bits is the most energy-expensive: fifth-order prediction consumes approximately 7% more energy than the three alternatives considered.
B. Network Throughput and Energy Efficiency
In this section, we evaluate the performance of our ALFC algorithm in a multi-hop environment over various seismic data sets with varying network traffic. The testbed consists of 5 sensor nodes that form a multi-hop topology, as shown in Fig. 5. The routing protocol is an enhanced version of MultihopLQI in TinyOS-1.x. The underlying MAC protocol is the default MAC protocol in TinyOS-1.x, which is CSMA-based. A desktop PC is connected to the sink node to collect upstream data from all sensor nodes. The data injected into the sensor nodes for our tests is from the FEQ data set, which contains 30,000 packets of continuous seismic data. The payload of each packet is 56 bytes. The injected data rate into each node ranges from 5.4 kbps to 20.8 kbps. The duration of the experiments ranges from 10 minutes to 40 minutes, and results are averaged over 5 repeated experiments.
Fig. 5. Topology of the multi-hop sensor network testbed
We first evaluate the energy saving in a network environment. Improved compression yields fewer transmissions, resulting in lower energy consumption by the radio transceivers. In addition, for a given amount of raw data, increased compression reduces the network load, which leads to fewer collisions and thus fewer retransmissions. We observed from Fig. 2 that fifth-order prediction provides the most effective compression among the alternatives evaluated, and thus minimizes energy consumption over the network. Fig. 4(a) shows that the margin becomes even larger with increasing data rate. The energy consumption without compression is
Fig. 4. Performance evaluation based on the FEQ data set in a multi-hop network with varying data rate: (a) network energy consumption, (b) delivery ratio, (c) energy efficiency.
as high as 2.5, almost twice that of ALFC with fifth-order prediction, when the data rate is 20.8 kbps. In other words, compression significantly reduces energy consumption by reducing the total number of transmissions.
We further evaluate the network efficiency by measuring the packet delivery ratio and energy efficiency. The packet delivery ratio refers to the ratio of raw signal samples successfully decompressed at the gateway to the raw samples initially collected at the sensor nodes. When the network is not saturated (e.g., at data rates from 5.4 kbps to 10.8 kbps), the delivery ratio of the compressed data flow is slightly lower than in the uncompressed case. This is because compressed packets contain more raw data samples than uncompressed packets, so the loss of each compressed packet costs more original samples. However, as the data rate increases, the network becomes saturated and congested, and more packets are dropped due to buffer overflow or transmission collisions. When the data rate increases from 10.8 kbps to 14.1 kbps, the delivery ratio of the uncompressed data flow quickly drops from 0.92 to 0.69, while that of the compressed data flow drops only slightly, from 0.9 to 0.8. It can be seen in Fig. 4(b) that fifth-order prediction provides the best delivery ratio of 0.74 at the data rate of 20.8 kbps, whereas that of the uncompressed data flow is only 0.5. Compression mitigates both buffer-overflow failures and channel failures due to collisions. That is, compression is essential to supporting high data-rate sensor networks.
Finally, we evaluate energy efficiency, which refers to the number of delivered bits per energy unit. It can be calculated from the delivery ratio and the network energy consumption. Using fifth-order prediction, the energy-efficiency index drops slowly from 2.4 to 1.0 as we increase the data rate from 5.4 kbps to 20.8 kbps. Experiencing packet loss and network congestion, the index of the uncompressed data flow suffers a sudden drop at the data rate of 10.8 kbps, and keeps decreasing to 0.2 at the data rate of 20.8 kbps. Fig. 4(c) indicates that, on average, energy efficiency is improved by more than a factor of two by our ALFC algorithm compared to the uncompressed case.
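The energy-efficiency index described above amounts to delivered raw bits per unit of energy; a minimal sketch of the calculation, with an illustrative function name and argument set that are ours rather than the paper's:

```c
/* Energy-efficiency index: successfully delivered raw bits per unit
 * of energy, derived from the delivery ratio, the raw bits offered
 * to the network, and the measured energy consumption. */
static double efficiency_index(double delivery_ratio,
                               double raw_bits_offered,
                               double energy_consumed) {
    return delivery_ratio * raw_bits_offered / energy_consumed;
}
```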
V. CONCLUSION AND FUTURE WORK
We have presented a lightweight lossless compression algorithm based on adaptive linear prediction of sample values and entropy coding of prediction residuals. The relatively low overhead of linear prediction makes it suitable for real-time high-fidelity compression. Decoding the data source does not rely on previously received packets, hence compression is highly robust to packet losses. We have also evaluated our approach on a real sensor network testbed by injecting real seismic data sets.
We note two areas for potential improvement. The first would be to devise a more effective strategy for encoding overhead information than the one currently employed. Along these lines, we could envision strategies for joint quantization and encoding of overhead information. For example, we might want to alter the quantization of wi depending on the magnitude of the bias estimate or the value of the GPO2 parameter k. Second, the algorithm would clearly be more practical if it were able to adaptively choose quantizer resolution based on observation of the data, rather than requiring user intervention to make this selection.
ACKNOWLEDGMENT

We thank the anonymous reviewers and Paul Castro, our shepherd, for their valuable feedback that helped to improve this paper.
REFERENCES
 R. Szewczyk, J. Polastre, A. Mainwaring, J. Anderson, and D. Culler,“Analysis of a large scale habitat monitoring application,” in Proc. 2ndACM Conf. on Embedded Networked Sensor Systems (SenSys 2004),Baltimore, MD, USA, Nov. 2004.
 J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. Culler, and K. Pister,“System architecture directions for networked sensors,” in SIGOPS Oper.Syst. Rev., vol. 34, no. 5, pp. 93–104, Dec. 2000.
 C. Intanagonwiwat, R. Govindan, and D. Estrin, “Directed diffusion: ascalable and robust communication paradigm for sensor networks,” inthe 6th annual international Conf. on Mobile computing and networking(MobiCom 2000), Boston, MA, USA, Aug. 2000.
 R. Adler, P. Buonadonna, J. Chhabra, M. Flanigan, L. Krishnamurthy,N. Kushalnagar, L. Nachman, and M. Yarvis, “Design and deploymentof industrial sensor networks: Experiences from the north sea and asemiconductor plant,” in Proc. 3rd ACM Conf. Embedded NetworkedSensor Systems (SenSys 2005), San Diego, CA, USA, Nov. 2005.
 G. Werner-Allen, K. Lorincz, J. Johnson, J. Lees, and M. Welsh,“Fidelity and yield in a volcano monitoring sensor network,” in Proc. 7thUSENIX Symposium on Operating Systems Design and Implementation(OSDI), Seattle, WA, USA, Nov. 2006.
 W.-Z. Song, B. Shirazi, R. Lahusen, S. Kedar, S. Chien, F. Webb,J. Pallister, D. Dzurisin, S. Moran, M. Lisowski, D. Tran, A. Davis, andD. Pieri, “Optimized autonomous space in-situ sensor-web for volcanomonitoring,” in IEEE Aerospace 2008, Big Sky, MT, USA, Mar. 2008.
 N. Xu, S. Rangwala, K. K. Chintalapudi, D. Ganesan, A. Broad,R. Govindan, and D. Estrin, “A wireless sensor network for structuralmonitoring,” in Proc. 2nd ACM Conf. on Embedded Networked SensorSystems (SenSys 2004), Baltimore, MD, USA, Nov. 2004.
 S. N. Pakzad, S. Kim, G. L. Fenves, S. D. Glaser, D. E. Culler,and J. W. Demmel, “Multi-purpose wireless accelerometers for civilinfrastructure monitoring,” in 5th International Workshop on StructuralHealth Monitoring (IWSHM 2005), Stanford, Sep. 2005.
 C. M. Sadler and M. Martonosi, “Data compression algorithms forenergy-constrained devices in delay tolerant networks,” in Proc. 4thACM Conf. on Embedded Networked Sensor Systems (SenSys 2006),Boulder, CO, USA, Nov. 2006.
 R. F. Rice, “Some Practical Universal Noiseless Coding Techniques, PartIII,” JPL Publication 83-17, Jet Propulsion Laboratory, Mar. 1983.
 Consultative Committee for Space Data Systems, CCSDS 121.0-B-1:Lossless Data Compression, Blue Book, issue 1, May 1997.
 M. Weinberger, G. Seroussi, and G. Sapiro, “The LOCO-I lossless imagecompression algorithm: principles and standardization into JPEG-LS,”IEEE Trans. Image Process., vol. 9, no. 8, pp. 1309–1324, 2000.
 SEED Reference Manual: Standard for the Exchange of EarthquakeData, SEED Format Version 2.4, Incorporated Research Institutions forSeismology, Oct. 2007.
 Y.W. Nijim, S.D. Stearns, and W.B. Mikhael, “Lossless compressionof seismic signals using differentiation,” IEEE Trans. Geosci. RemoteSens., vol. 34, no. 1, pp. 52–56, 1996.
 S.D. Stearns, L.Z. Tan, and N. Magotra, “Lossless compression ofwaveform data for efficient storage and transmission,” IEEE Trans.Geosci. Remote Sens., vol. 31, no. 3, pp. 645–654, 1993.
 Y.W. Nijim, S.D. Stearns, and W.B. Mikhael, “Quantitative performance evaluation of the lossless compression approach using pole-zero modeling,” IEEE Trans. Geosci. Remote Sens., vol. 38, no. 1, pp. 39–43, 2000.
 N. Magotra, W. McCoy, F. Livingston, and S. Stearns, “Lossless data compression using adaptive filters,” in Proc. 20th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Detroit, MI, USA, May 1995.
 R.W. Ives, N. Magotra, S.D. Stearns, “Effects of multiple-pass filteringin lossless predictive compression of waveform data,” IEEE Trans.Geosci. Remote Sens., vol. 40, no. 11, pp. 2448–2453, 2002.
 Y. Zhang and J. Li, “Efficient seismic response data storage and trans-mission using ARX model-based sensor data compression algorithm,”Earthq. Eng. Struct. Dyn., vol. 35, no. 6, pp. 781–788, 2006.
 G. Mandyam, N. Magotra, and W. McCoy, “Lossless seismic datacompression using adaptive linear prediction,” IEEE International Geo-science & Remote Sensing Symposium (IGARSS 1996), vol. 2, no. 2, pp.1029–1031, May, 1996.
 S.D. Stearns, “Arithmetic coding in lossless waveform compression,”IEEE Transactions on Signal Processing, vol. 43, no. 8, pp. 1874–1879,Aug. 1995.
 S. S. Pradhan, J. Kusuma, and K. Ramchandran, “Distributed compres-sion in a dense microsensor network,” IEEE Signal Process. Mag., vol.19, no. 2, pp. 51–60, Mar. 2002.
 J. Chou, D. Petrovic, and K. Ramachandran, “A distributed and adaptivesignal processing approach to reducing energy consumption in sensornetworks,” in Proc. 22nd Annual Joint Conference of the IEEE Computerand Communications Societies (INFOCOM 2003), San Francisco, CA,USA, Apr. 2003.
 D. Ganesan, D. Estrin, and J. Heidemann, “Dimensions: Why do weneed a new data handling architecture for sensor networks,” in Proc. theACM Workshop on Hot Topics in Networks, Princeton, NJ, USA, Oct.2002.
 R. S. Wagner, R. G. Baraniuk, S. Du, D. B. Johnson, and A. Cohen,“An architecture for distributed wavelet analysis and processing insensor networks,” in Proc. 5th International Conference on InformationProcessing in Sensor Networks (IPSN 2006), Nashville, TN, USA, Apr.2006.
 A. Ciancio and A. Ortega, “A distributed wavelet compression algorithmfor wireless multihop sensor networks using lifting,” in 30th IEEE In-ternational Conf. on Acoustics, Speech, and Signal Processing (ICASSP2005), Philadelphia, PA, USA, Mar. 2005.
 K. C. Barr and K. Asanovic, “Energy aware lossless data compres-sion,”in 1st international Conf. on Mobile Systems, Applications andServices (MobiSys), San Francisco, CA, USA, May 2003.
 T. A. Welch, “A technique for high-performance data compression,”IEEE Computer, vol. 17, no. 6, pp. 8–19, Jun. 1984.
 S. Puthenpurayil, R. Gu, and S.S. Bhattacharyya, “Energy-Aware DataCompression for Wireless Sensor Networks,” in 32nd InternationalConference on Acoustics, Speech, and Signal Processing (ICASSP 2007),Honolulu, HI, USA, Apr. 2007.
 F. Huang and Y. Liang, “Towards Energy Optimization in EnvironmentalWireless Sensor Networks for Lossless and Reliable Data Gathering,” inIEEE Internatonal Conf. on Mobile Adhoc and Sensor Systems (MASS2007), Pisa, Italy, Oct. 2007.
 C. Alippi, R. Camplani, and C. Galperti, “Lossless CompressionTechniques in Wireless Sensor Networks: Monitoring MicroacousticEmissions,” in 2007 International Workshop on Robotic and SensorsEnvironments (ROSE 2007), Ottawa, Canada, Oct. 2007.
 A. B. Kiely, “Lossless compression of seismic data into fixed-lengthpackets,” IPN Progress Report, vol. 42-173, pp. 1–13, May, 2008.
 A. Gersho, “Adaptive filtering with binary reinforcement,” IEEE Trans.Inform. Theory, vol. 30, no. 2, pp. 191–199, Mar. 1984.
 S. W. Golomb, “Run-length encodings,” IEEE Trans. Inform. Theory,vol. 12, no. 3, pp. 399–401, Jul. 1966.
 A. B. Kiely, “Selecting the Golomb parameter in Rice coding,” IPNProgress Report, vol. 42-159, pp. 1–18, Nov. 2004.
 T. Schmid, H. Dubois-Ferriere, and M. Vetterli, “Sensorscope: Experi-ences with a wireless building monitoring sensor network,” in Workshopon Real-World Wireless Sensor Networks (RealWSN), Stockholm, Swe-den, Jun. 2005.