

2010 Data Compression Conference, Snowbird, UT, USA (2010.03.24-2010.03.26)

File-Size Preserving LZ Encoding for Reversible Data Embedding

Hidetoshi Yokoo

Department of Computer Science, Gunma University

Kiryu 376–8515, JAPAN

Email: [email protected]

We propose an LZ77 variation that, instead of reducing the file size, keeps the size unchanged and embeds additional data in the file. We evaluate its asymptotic embedding capacity and show that, for a data string drawn from a stationary and ergodic α-ary source with entropy rate H, the expected embedding capacity per symbol achieved by our algorithm approaches log α − H bits as the data length increases.
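To make the quantity log α − H concrete, the following minimal sketch evaluates it for a memoryless (i.i.d.) source, which is a special case of a stationary and ergodic source whose entropy rate equals its per-symbol entropy. The base-2 logarithm is assumed because the capacity is stated in bits, and the function names are hypothetical ones chosen for this illustration.

```python
import math


def entropy_bits(probs):
    """Shannon entropy, in bits per symbol, of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def embedding_capacity(probs):
    """Asymptotic capacity log2(alpha) - H, in bits per symbol.

    Assumption: the source is i.i.d. with the given symbol probabilities,
    so the entropy rate H is just the per-symbol entropy.
    """
    alpha = len(probs)  # alphabet size
    return math.log2(alpha) - entropy_bits(probs)


if __name__ == "__main__":
    # A skewed 4-ary source leaves room for embedding.
    print(embedding_capacity([0.5, 0.25, 0.125, 0.125]))
    # A uniform 4-ary source is incompressible and leaves no room.
    print(embedding_capacity([0.25, 0.25, 0.25, 0.25]))
```

For the skewed source, H = 1.75 bits and log α = 2 bits, so the capacity is 0.25 bits per symbol. For the uniform source, the capacity is zero.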

If one uses an ideal (i.e., asymptotically optimal) compression method and embeds additional data by simply appending them to the end of the compressed data, the above result on the embedding capacity is obvious. We should therefore argue for the significance of our study.

We first note that actual data embedding applications rarely use such a trivial method, which embeds data in the space freed by compression. In particular, when we use such a compression method and try to achieve asymptotically optimal embedding in the above sense, we may have to wait a long time before actual embedding can begin. If we wish to perform real-time embedding, we have to develop a new method instead of resorting to compression methods. In that case, whether the developed method attains asymptotically optimal embedding must be established for each method individually.
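The trivial compress-then-append method discussed above can be sketched as follows. This is only an illustration under stated assumptions: Python's zlib stands in for the ideal compressor (it is not asymptotically optimal), the function names are hypothetical, and unused space is filled with zero bytes. Note that the whole input is compressed before any payload byte is placed, which illustrates the waiting problem.

```python
import zlib


def embed_by_appending(data: bytes, payload: bytes) -> bytes:
    """Return a container of exactly len(data) bytes: compressed data + payload."""
    compressed = zlib.compress(data)
    room = len(data) - len(compressed)  # space freed by compression
    if room < len(payload):
        raise ValueError("not enough room freed by compression")
    padding = bytes(room - len(payload))  # zero bytes fill the remainder
    return compressed + payload + padding


def extract(container: bytes):
    """Recover the original data and the appended tail (payload + padding)."""
    decomp = zlib.decompressobj()
    data = decomp.decompress(container)
    # unused_data holds every byte past the end of the compressed stream.
    return data, decomp.unused_data


if __name__ == "__main__":
    original = b"abcabcabc" * 100
    container = embed_by_appending(original, b"secret")
    restored, tail = extract(container)
    print(len(container) == len(original), restored == original)
    print(tail.startswith(b"secret"))
```

The container has the same length as the original file, and the original is recovered exactly, so the embedding is reversible in this toy setting.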

Our method uses the reference multiplicity of LZ77 [1], which has already been applied to various data embedding problems. Interested readers should refer to [2] and the references therein for application examples. LZ77 parses a data stream into phrases, each being the longest match to the previously seen part, and encodes each phrase by its length and the distance to the match. If the same phrase has appeared more than once, the multiplicity of the match can be used to embed extra information. For example, if a phrase has already appeared four times, we can embed a 2-bit watermark in the codeword by choosing one of the four copies. Based on this idea, we have developed an encoding algorithm that specializes in data embedding. It performs neither compression nor expansion. Instead, it asymptotically achieves the embedding rate mentioned in the first paragraph above. We emphasize that a method that extends practical schemes for real-time embedding can achieve this rate with no compression support.
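The four-copies example can be illustrated with a toy sketch of the multiplicity idea alone. It is not the file-size preserving algorithm proposed here: it handles a single phrase, uses absolute positions in the history instead of LZ77 distances, and all function names are hypothetical. With m earlier copies of a phrase, the rank of the chosen copy carries floor(log2 m) payload bits.

```python
def occurrences(history: str, phrase: str):
    """Start positions of all (possibly overlapping) copies of phrase in history."""
    positions = []
    start = history.find(phrase)
    while start != -1:
        positions.append(start)
        start = history.find(phrase, start + 1)
    return positions


def embed_choice(history: str, phrase: str, bits: str):
    """Choose the copy whose rank encodes the leading payload bits.

    Returns (position, length, bits_consumed). Assumes bits is a string of
    '0'/'1'; it is zero-padded if shorter than the available capacity.
    """
    positions = occurrences(history, phrase)
    if not positions:
        raise ValueError("phrase does not occur in history")
    k = len(positions).bit_length() - 1  # floor(log2(multiplicity))
    index = int(bits[:k].ljust(k, "0"), 2) if k else 0
    return positions[index], len(phrase), k


def extract_choice(history: str, position: int, length: int):
    """Decode the phrase and recover the bits hidden in the choice of copy."""
    phrase = history[position:position + length]
    positions = occurrences(history, phrase)
    k = len(positions).bit_length() - 1
    index = positions.index(position)
    hidden = format(index, "b").zfill(k) if k else ""
    return phrase, hidden


if __name__ == "__main__":
    history = "abcabcabcabc"  # "abc" has appeared four times
    pos, length, used = embed_choice(history, "abc", "10")
    print(pos, length, used)
    print(extract_choice(history, pos, length))
```

Here the payload bits "10" select the third copy (position 6), and the decoder recovers both the phrase "abc" and the two hidden bits from the chosen position.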

Finally, as a generalization of the proposed encoder, we define a class of compression–embedding hybrid schemes. We have empirically evaluated the sum of the savings obtained by compression and by embedding over this class.

REFERENCES

[1] J. Ziv and A. Lempel, "A universal algorithm for sequential data compression," IEEE Trans. Inform. Theory, vol. IT-23, no. 3, pp. 337–349, May 1977.

[2] D. Dubé and V. Beaudoin, "Constructing optimal whole-bit recycling codes," 2009 IEEE Information Theory Workshop on Networking and Information Theory (ITW 2009), Volos, Greece, June 2009.


1068-0314/10 $26.00 © 2010 IEEE

DOI 10.1109/DCC.2010.78
