

[IEEE 2010 Data Compression Conference - Snowbird, UT, USA (2010.03.24-2010.03.26)] 2010 Data Compression Conference - Lossless Compression of Maps, Charts, and Graphs via Color Separation

Lossless Compression of Maps, Charts, and Graphs via Color Separation

Saif alZahir and Arber Borici

Graphics and Image Processing Lab, Computer Science
University of N. British Columbia, BC, Canada

E-mail: [email protected] and [email protected]

In this research, we present a fast and efficient lossless compression scheme for discrete-color digital map images, charts, and graphs stored in raster image format. The proposed scheme determines the number of distinct colors in the given image and creates a separate bi-level data layer for each color. The bi-level layers are then compressed individually using the proposed algorithm. The scheme comprises two components: (i) a codebook; and (ii) our row-column reduction coding algorithm, RCRC.

The codebook is a fixed-to-variable Huffman dictionary based on symbol entropy. To construct an efficient and practical codebook, we performed a frequency analysis on a sample of more than 250,000 non-overlapping 8×8 blocks. The data were obtained by partitioning 120 randomly chosen noiseless binary image samples, which we believe strongly represent the binary layers of map, chart, and graph images. Based on the frequency analysis, we identified a total of 65,534 unique blocks. Of these, we included only the 6,952 blocks that occurred more than once. We employed the Huffman algorithm to generate the Huffman codes for the selected blocks. To apply this codebook to our test images, we compare each 8×8 block of an input layer matrix to the 8×8 block entries in the codebook. If a match occurs, we compress the block using the corresponding Huffman code, and we continue until all blocks are processed.

The second component of our scheme is a new algorithm, RCRC, designed to handle those blocks that are not found in the codebook. RCRC proceeds as follows. For each 8×8 block, we generate a row reference vector (RRV), a column reference vector (CRV), and a reduced block (RB). We construct the RRV as follows:

(i) we create an empty 8×1 vector;
(ii) we compare the first row of bits in the block with the second row. If they are identical, we place ‘1’ in the first RRV location and ‘0’ in the second RRV location;
(iii) we compare the third row with the first row. If they are identical, we assign ‘0’ to the third RRV location as well. We continue until we encounter a non-identical row or reach the last row of the block;
(iv) if two consecutive rows are not identical, we place ‘1’ for both rows in the corresponding RRV locations. At this point, the second of these rows becomes the current row and we proceed as above;
(v) we remove the duplicate rows (those marked ‘0’ in the RRV) from the block to obtain a row-reduced block.

We follow the same procedure for the columns of the row-reduced block. In this case, we generate the 1×8 column reference vector (CRV), whose zero-valued entries correspond to columns that have been removed from the block. The output of RCRC is the concatenation of the bits in the RRV, the CRV, and the row-column-reduced block (RB).

Our experimental results show that our lossless compression scheme achieves an average compression of 0.035 bpp for map images and 0.03 bpp for charts and graphs. These results are better than most results reported in the literature. Moreover, our scheme is simple and fast.
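The color-separation step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify a data layout, so the representation here (an image as a list of rows of hashable color values) and the function names `separate_colors` and `merge_layers` are assumptions made for illustration only.

```python
from typing import Dict, Hashable, List

Image = List[List[Hashable]]   # rows of color values (assumed representation)
Layer = List[List[int]]        # rows of 0/1 bits


def separate_colors(image: Image) -> Dict[Hashable, Layer]:
    """Create one bi-level layer per distinct color in a rectangular image."""
    height, width = len(image), len(image[0])
    layers: Dict[Hashable, Layer] = {}
    for y, row in enumerate(image):
        for x, color in enumerate(row):
            if color not in layers:
                layers[color] = [[0] * width for _ in range(height)]
            layers[color][y][x] = 1   # this pixel belongs to `color`
    return layers


def merge_layers(layers: Dict[Hashable, Layer]) -> Image:
    """Rebuild the image from its layers (each pixel is set in exactly one)."""
    first = next(iter(layers.values()))
    height, width = len(first), len(first[0])
    image: Image = [[None] * width for _ in range(height)]
    for color, layer in layers.items():
        for y in range(height):
            for x in range(width):
                if layer[y][x]:
                    image[y][x] = color
    return image


image = [["r", "r", "b"],
         ["g", "r", "b"]]
layers = separate_colors(image)
print(len(layers))       # number of distinct colors: 3
print(layers["r"])       # [[1, 1, 0], [0, 1, 0]]
```

Because every pixel is set in exactly one layer, merging the layers reproduces the input, which is what makes the separation step itself lossless.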
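The codebook construction described in the abstract (block frequency analysis followed by Huffman coding of the blocks that occur more than once) might be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: the names `split_blocks` and `build_codebook` are hypothetical, layer dimensions are assumed to be multiples of the block size, and tie-breaking among equal frequencies is arbitrary.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict, Iterable, List, Sequence, Tuple

Block = Tuple[Tuple[int, ...], ...]   # hashable square block of 0/1 bits


def split_blocks(layer: Sequence[Sequence[int]], size: int = 8) -> List[Block]:
    """Partition a bi-level layer into non-overlapping size x size blocks.

    Assumes both dimensions are multiples of `size`; padding is not handled.
    """
    height, width = len(layer), len(layer[0])
    return [
        tuple(tuple(layer[y + dy][x:x + size]) for dy in range(size))
        for y in range(0, height, size)
        for x in range(0, width, size)
    ]


def build_codebook(blocks: Iterable[Block], min_count: int = 2) -> Dict[Block, str]:
    """Map each block seen at least `min_count` times to a Huffman bit string."""
    freq = {b: n for b, n in Counter(blocks).items() if n >= min_count}
    if len(freq) == 1:                      # a single symbol still needs one bit
        return {next(iter(freq)): "0"}
    tie = count()                           # unique tie-breaker for the heap
    heap = [(n, next(tie), {b: ""}) for b, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n0, _, codes0 = heapq.heappop(heap)   # two least frequent subtrees
        n1, _, codes1 = heapq.heappop(heap)
        merged = {b: "0" + c for b, c in codes0.items()}
        merged.update({b: "1" + c for b, c in codes1.items()})
        heapq.heappush(heap, (n0 + n1, next(tie), merged))
    return heap[0][2] if heap else {}
```

A block missing from the returned dictionary would be passed to the second component (RCRC); how the bit stream signals which coder was used for a block is not stated in the abstract and is left out of this sketch.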
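The RRV/CRV construction described above can be sketched in Python as follows. Several details are assumptions rather than statements from the abstract: each row is compared with the most recently retained row, the reduced block is serialized in row-major order, and the decoder `rcrc_decode` (not described in the abstract) is included only to show that the encoding is reversible. All function names are hypothetical.

```python
from typing import List, Tuple

Block = List[List[int]]   # rows of 0/1 bits


def reduce_rows(block: Block) -> Tuple[List[int], Block]:
    """Return (reference vector, reduced block) for the rows of `block`.

    A row equal to the current (most recently kept) row is marked 0 and
    dropped; any other row is marked 1, kept, and becomes the current row.
    """
    ref, kept = [1], [block[0]]
    for row in block[1:]:
        if row == kept[-1]:
            ref.append(0)
        else:
            ref.append(1)
            kept.append(row)
    return ref, kept


def transpose(block: Block) -> Block:
    return [list(col) for col in zip(*block)]


def rcrc_encode(block: Block) -> List[int]:
    """Concatenate RRV, CRV, and the row-column-reduced block (row-major)."""
    rrv, row_reduced = reduce_rows(block)
    crv, kept_cols = reduce_rows(transpose(row_reduced))   # columns as rows
    rb = transpose(kept_cols)
    return rrv + crv + [bit for row in rb for bit in row]


def expand_rows(ref: List[int], kept: Block) -> Block:
    """Inverse of reduce_rows: repeat the previous row wherever ref is 0."""
    out: Block = []
    rows = iter(kept)
    for flag in ref:
        out.append(next(rows) if flag else list(out[-1]))
    return out


def rcrc_decode(bits: List[int], size: int = 8) -> Block:
    rrv, crv = bits[:size], bits[size:2 * size]
    n_rows, n_cols = sum(rrv), sum(crv)
    payload = bits[2 * size:]
    rb = [payload[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
    row_reduced = transpose(expand_rows(crv, transpose(rb)))
    return expand_rows(rrv, row_reduced)


# Top half all zeros, bottom half with the four right-hand columns set.
block = [[0] * 8 for _ in range(4)] + [[0, 0, 0, 0, 1, 1, 1, 1] for _ in range(4)]
code = rcrc_encode(block)
print(len(code))                      # 20 bits instead of 64
print(rcrc_decode(code) == block)     # True
```

In this example the RRV and CRV are both 1 0 0 0 1 0 0 0 and the reduced block is 2×2, so the block is represented by 8 + 8 + 4 = 20 bits. A block with no repeated adjacent rows or columns would instead cost 8 + 8 + 64 = 80 bits under this sketch.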

2010 Data Compression Conference

1068-0314/10 $26.00 © 2010 IEEE

DOI 10.1109/DCC.2010.102
