Page 1: [IEEE 2010 Data Compression Conference - Snowbird, UT, USA (2010.03.24-2010.03.26)] 2010 Data Compression Conference - File-Size Preserving LZ Encoding for Reversible Data Embedding

File-Size Preserving LZ Encoding for Reversible Data Embedding

Hidetoshi Yokoo

Department of Computer Science, Gunma University, Kiryu 376–8515, JAPAN

Email: [email protected]

We propose an LZ77 variation that, instead of reducing the file size, leaves it of the same size but embeds additional data in the file. We evaluate its asymptotic embedding capacity, and show that, for a data string drawn from a stationary and ergodic α-ary source with the entropy rate H, the expected embedding capacity per symbol achieved by our algorithm approaches log α − H bits as the data length increases.
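As a rough illustration of this rate, assume base-2 logarithms and a hypothetical byte-oriented source. The values α = 256 and H = 5 bits per symbol are made up for the example and are not taken from the paper:

```latex
\[
  \log_2 \alpha - H \;=\; \log_2 256 - 5 \;=\; 8 - 5 \;=\; 3 \ \text{bits per symbol}.
\]
```

Under these assumptions, a string of n symbols could carry roughly 3n embedded bits for large n, which equals the space n log α − nH that an ideal compressor would free up.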

If one uses an ideal (= asymptotically optimal) compression method and embeds additional data by simply appending them at the end of the compressed data, the above result on the embedding capacity is quite obvious. Therefore, we should argue about the significance of our study.

We first note that, in actual data embedding applications, it is rare to apply such a trivial method that uses the space left by compression to embed data. In particular, if we use such a compression method and try to achieve asymptotically optimal embedding in the above sense, we may have to wait a long time before actual embedding can begin. If we wish to perform real-time embedding, we have to develop a new method instead of resorting to compression methods. In that case, whether the developed method can attain asymptotically optimal embedding becomes a separate problem for each method.

Our method uses the reference multiplicity of LZ77 [1], which has already been applied to various data embedding problems. Interested readers should refer to [2] and the references therein for application examples. LZ77 parses a data stream into phrases as the longest matches to the previous part and encodes each phrase by its length and distance to the match. If the same phrase has appeared more than once, the multiplicity of the match can be used to embed extra information. For example, if a phrase has already appeared four times, we can embed a 2-bit watermark in the codeword by choosing one of the four copies. Based on this idea, we have developed an encoding algorithm that specializes in data embedding. It performs neither compression nor expansion. Instead, it asymptotically achieves the embedding rate mentioned in the first paragraph above. We emphasize that a method that extends practical schemes for real-time embedding can achieve this rate with no compression support.
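The multiplicity idea can be sketched in a few lines of Python. This is only a toy illustration of choosing one copy among several matches, not the file-size preserving encoder proposed here; the function names `occurrences`, `embed`, and `extract` are hypothetical, and the sketch assumes matches lie entirely inside the already-seen history and that floor(log2 m) bits are embedded when m copies exist.

```python
def occurrences(history: str, phrase: str) -> list[int]:
    """Start positions of every (possibly overlapping) copy of phrase in history."""
    positions = []
    start = history.find(phrase)
    while start != -1:
        positions.append(start)
        start = history.find(phrase, start + 1)
    return positions


def embed(history: str, phrase: str, bits: str) -> tuple[int, int, int]:
    """Encode phrase as (distance, length), choosing the copy named by the payload.

    Returns (distance, length, number_of_payload_bits_consumed).
    """
    copies = occurrences(history, phrase)
    if not copies:
        raise ValueError("phrase does not occur in history")
    k = len(copies).bit_length() - 1  # floor(log2(number of copies))
    if len(bits) < k:
        raise ValueError("payload too short for this phrase")
    index = int(bits[:k], 2) if k else 0  # which copy to point at
    distance = len(history) - copies[index]
    return distance, len(phrase), k


def extract(history: str, distance: int, length: int) -> tuple[str, str]:
    """Recover the phrase and the embedded bits from a (distance, length) codeword."""
    start = len(history) - distance
    phrase = history[start:start + length]
    copies = occurrences(history, phrase)
    k = len(copies).bit_length() - 1  # decoder recomputes k from the history
    index = copies.index(start)
    bits = format(index, "b").zfill(k) if k else ""
    return phrase, bits


# Four earlier copies of "abc" allow a 2-bit payload to be hidden in the choice.
history = "abcabcabcabc"
distance, length, used = embed(history, "abc", "10")
phrase, payload = extract(history, distance, length)
```

With four copies in the history, the decoder recovers both the phrase and the two payload bits from the distance alone, so the codeword carries the extra information at no cost in length.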

Finally, as a generalization of the proposed encoder, we define a class of compression–embedding hybrid schemes. We have empirically evaluated the sum of savings obtained by both compression and embedding over the class.

REFERENCES

[1] J. Ziv and A. Lempel, A universal algorithm for sequential data compression, IEEE Trans. Inform. Theory, vol. IT-23, no. 3, pp. 337–349, May 1977.

[2] D. Dube and V. Beaudoin, Constructing optimal whole-bit recycling codes, 2009 IEEE Information Theory Workshop on Networking and Information Theory (ITW 2009), Volos, Greece, June 2009.

2010 Data Compression Conference

1068-0314/10 $26.00 © 2010 IEEE

DOI 10.1109/DCC.2010.78
