[ieee 2010 data compression conference - snowbird, ut, usa (2010.03.24-2010.03.26)] 2010 data...


[IEEE 2010 Data Compression Conference - Snowbird, UT, USA (2010.03.24-2010.03.26)] 2010 Data Compression Conference - A Pseudo-Random Number Generator Based on LZSS

A pseudo-random number generator based on LZSS

Weiling Chang1*, Binxing Fang1,2, Xiaochun Yun2, Shupeng Wang2, Xiangzhan Yu1

1. Research Centre of Computer Network and Information Security Technology, Harbin Institute of Technology, Harbin 150001, China

2. Institute of Computing Technology, Chinese Academy of Science, Beijing 100190, China

*Email: [email protected]

A pseudo-random sequence generator (PRNG), L12RC4, inspired by the LZSS compression algorithm and the RC4 stream cipher, is presented and implemented.

In LZSS, the encoded file consists of a sequence of items, each of which is either a single character (literal) or a pointer of the form (index, length). The probability distributions of the literal, length, and flag-bit values are not uniform, so it is worthwhile to code them with an entropy coder such as Huffman or arithmetic coding. The index value, however, is determined not only by the context of the data stream but also by its temporal position in the window. It therefore has a uniform or near-uniform probability distribution, and we can use this characteristic to design a pseudo-random number generator.
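To make the item structure concrete, the following is a minimal sketch of an LZSS-style tokenizer, not the authors' implementation. The function names, the brute-force match search, and the default parameter values (`index_bits`, `length_bits`, `min_match`) are assumptions chosen for illustration. The index is taken as the match position modulo the window size, which is one common way to make it depend on the position in the window rather than only on the matched content.

```python
def lzss_tokens(data, index_bits=12, length_bits=4, min_match=3):
    """Return a list of ('lit', byte) and ('ptr', index, length) items.

    Assumption: index = absolute match position modulo the window size,
    as in ring-buffer LZSS variants. The search is brute force, so this
    is only suitable for small inputs.
    """
    window = 1 << index_bits
    max_match = min_match + (1 << length_bits) - 1
    tokens, pos = [], 0
    while pos < len(data):
        best_len, best_start = 0, 0
        for start in range(max(0, pos - window), pos):
            length = 0
            # Matches may overlap the current position; the decoder
            # copies byte by byte, so this is safe.
            while (length < max_match and pos + length < len(data)
                   and data[start + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_start = length, start
        if best_len >= min_match:
            tokens.append(("ptr", best_start % window, best_len))
            pos += best_len
        else:
            tokens.append(("lit", data[pos]))
            pos += 1
    return tokens


def lzss_decode(tokens, index_bits=12):
    """Invert lzss_tokens; index_bits must match the encoder's value."""
    window = 1 << index_bits
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, index, length = tok
            pos = len(out)
            # Distance back to the unique start in [pos - window, pos)
            # whose position is congruent to index modulo the window.
            distance = (pos - index) % window or window
            for i in range(length):
                out.append(out[pos - distance + i])
    return bytes(out)


sample = b"abcabcabcabcxyzxyzxyz"
items = lzss_tokens(sample)
indices = [t[1] for t in items if t[0] == "ptr"]
```

Collecting the `index` fields, as in the last line, gives the values whose frequency distribution the text goes on to discuss. A real encoder would also pack the flag bits, literals, and lengths into a bit stream, which this sketch omits.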

The variance-to-mean ratio (VMR) is a normalized measure of the dispersion of a probability distribution. It is defined as the ratio of the variance $\sigma^2$ to the mean $\mu$: $D = \sigma^2 / \mu$. If the VMR is approximately 1, we can conclude that the distribution is random. If the ratio is greater than 1, the distribution is clumped, and for values less than 1 the distribution is uniform. The more uniform the distribution, the smaller the variance.

Figure 1 shows the VMR values for differently parameterized LZSS. In Figure 1, the x-axis is the INDEX_BIT_COUNT value used by LZSS and the y-axis is the VMR of the index-value frequencies. One-pass mode means compressing the test file with LZSS once; double-pass mode means compressing the compressed file again with LZSS. As Figure 1 shows, the VMR of the double-pass mode is less than 1 and the VMR of the one-pass mode is greater than 1, and for the one-pass mode the VMR declines as INDEX_BIT_COUNT increases. Thus the double-pass mode has better uniformity than the one-pass mode.
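As an illustration of the measure, the sketch below computes the VMR of index-value frequencies for a list of index values. The function name and the use of the population variance over all `2**index_bits` possible index values are assumptions; the text does not state which variance estimator was used.

```python
import random
from collections import Counter


def vmr_of_index_frequency(indices, index_bits):
    """Variance-to-mean ratio of how often each index value occurs.

    Assumption: every one of the 2**index_bits possible values counts
    as a bin (unused values have frequency 0), and the population
    variance is used.
    """
    if not indices:
        raise ValueError("need at least one index value")
    n_bins = 1 << index_bits
    counts = Counter(indices)
    freqs = [counts.get(value, 0) for value in range(n_bins)]
    mean = sum(freqs) / n_bins
    variance = sum((f - mean) ** 2 for f in freqs) / n_bins
    return variance / mean


# Perfectly even use of every index value: no dispersion at all.
even = vmr_of_index_frequency([0, 1, 2, 3], index_bits=2)

# Every pointer hits the same index value: strongly clumped.
clumped = vmr_of_index_frequency([0] * 16, index_bits=2)

# Independent uniform draws: the ratio should land near 1.
rng = random.Random(0)
drawn = [rng.randrange(256) for _ in range(256 * 50)]
near_one = vmr_of_index_frequency(drawn, index_bits=8)
```

Here `even` is 0.0 and `clumped` is 12.0, matching the reading that small ratios indicate evenly spread frequencies and large ratios indicate clumping.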

The probability distribution of index values is approximately uniform, so they can be used to generate pseudo-random bit sequences. To remove the statistical characteristics of the original file, we first compress the test file with the original LZSS, then encode the compressed file again with different randomization algorithms, which are modifications of the original LZSS algorithm, and test the randomness of the output. The results of the NIST and Diehard test suites indicate that L12RC4 is a good PRNG; it appears to be sound and may be suitable for use in some cryptographic applications.
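For a sense of what such testing involves, the sketch below implements the frequency (monobit) test, the simplest test in the NIST SP 800-22 suite. It is not the evaluation used for L12RC4, whose construction is not detailed here, and the helper names are hypothetical. A sequence is conventionally considered to pass when the p-value is at least 0.01.

```python
import math


def bits_from_bytes(data):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def monobit_p_value(bits):
    """Frequency (monobit) test p-value for a sequence of 0/1 values."""
    n = len(bits)
    if n == 0:
        raise ValueError("need at least one bit")
    # Map 1 -> +1 and 0 -> -1, then sum; a balanced sequence sums near 0.
    s_n = sum(1 if b else -1 for b in bits)
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


balanced = monobit_p_value(bits_from_bytes(b"\xaa" * 16))  # 1010... pattern
biased = monobit_p_value([1] * 100)                        # all ones
short = monobit_p_value([1, 0, 1, 1, 0, 1, 0, 1, 0, 1])
```

The alternating pattern gives a p-value of 1.0 even though it is obviously not random, which is why a suite applies many different tests rather than relying on bit balance alone. The all-ones sequence fails with a p-value far below 0.01.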

This work is supported by the National High-Tech Development 863 Program of China (Grant Nos. 2009AA01A403, 2007AA01Z406, 2007AA010501, 2009AA01Z437).

2010 Data Compression Conference

1068-0314/10 $26.00 © 2010 IEEE

DOI 10.1109/DCC.2010.77

524