Lossless Compression Handbook
Editor: Khalid Sayood
Elsevier
Contents
List of Contributors xvii
Preface xix
Part I: Theory
Chapter 1: Information Theory behind Source Coding 3 Frans M. J. Willems and Tjalling J. Tjalkens 1.1 Introduction 3
1.1.1 Definition of Entropy 3 1.1.2 Properties of Entropy 4 1.1.3 Entropy as an Information Measure 5 1.1.4 Joint Entropy and Conditional Entropy 6 1.1.5 Properties of Joint Entropy and Conditional Entropy 7 1.1.6 Interpretation of Conditional Entropy 8
1.2 Sequences and Information Sources 9 1.2.1 Sequences 9 1.2.2 Information Sources 9 1.2.3 Memoryless Sources 10 1.2.4 Binary Sources 10 1.2.5 Discrete Stationary Sources 10 1.2.6 The Entropy Rate 11
1.3 Variable-Length Codes for Memoryless Sources 11 1.3.1 A Source Coding System, Variable-Length Codes for Source Symbols 12 1.3.2 Unique Decodability, Prefix Codes 13 1.3.3 Kraft's Inequality for Prefix Codes and Its Counterpart 14 1.3.4 Redundancy, Entropy, and Bounds 16 1.3.5 Variable-Length Codes for Blocks of Symbols 17
1.4 Variable-Length Codes for Sources with Memory 18 1.4.1 Block Codes Again 18 1.4.2 The Elias Algorithm 19 1.4.3 Representation of Sequences by Intervals 19 1.4.4 Competitive Optimality 25
1.5 Fixed-Length Codes for Memoryless Sources, the AEP 26 1.5.1 The Fixed-Length Source Coding Problem 27 1.5.2 Some Probabilities 27 1.5.3 An Example Demonstrating the Asymptotic Equipartition Property 28 1.5.4 The Idea behind Fixed-Length Source Coding 28 1.5.5 Rate and Error Probability 28 1.5.6 A Hamming Ball 30 1.5.7 An Optimal Balance between R and Pf 31 1.5.8 The Fixed-Length Coding Theorem 33 1.5.9 Converse and Conclusion 34
1.6 References 34
Chapter 2: Complexity Measures 35 Stephen R. Tate
2.1 Introduction 35 2.1.1 An Aside on Computability 36
2.2 Concerns with Shannon Information Theory 37 2.2.1 Strings versus Sources 37 2.2.2 Complex Non-random Sequences 37 2.2.3 Structured Random Strings 37
2.3 Kolmogorov Complexity 38 2.3.1 Basic Definitions 39 2.3.2 Incompressibility 40 2.3.3 Prefix-free Encoding 42
2.4 Computational Issues of Kolmogorov Complexity 44 2.4.1 Resource-Bounded Kolmogorov Complexity 45 2.4.2 Lower-Bounding Kolmogorov Complexity 46
2.5 Relation to Shannon Information Theory 47 2.5.1 Approach 1: An Infinite Sequence of Sources 48 2.5.2 Approach 2: Conditional Complexities 49 2.5.3 Discussion 50
2.6 Historical Notes 51 2.7 Further Reading 51 2.8 References 51
Part II: Compression Techniques
Chapter 3: Universal Codes 55 Peter Fenwick
3.1 Compact Integer Representations 55 3.2 Characteristics of Universal Codes 55 3.3 Polynomial Representations 56 3.4 Unary Codes 57 3.5 Levenstein and Elias Gamma Codes 58 3.6 Elias Omega and Even-Rodeh Codes 59 3.7 Rice Codes 60 3.8 Golomb Codes 62 3.9 Start-Step-Stop Codes 64
3.10 Fibonacci Codes 65 3.10.1 Zeckendorf Representation 66 3.10.2 Fraenkel and Klein Codes 66 3.10.3 Higher-Order Fibonacci Representations 66 3.10.4 Apostolico and Fraenkel Codes 67 3.10.5 A New Order-3 Fibonacci Code 69
3.11 Ternary Comma Codes 70 3.12 Summation Codes 71
3.12.1 Goldbach G1 Codes 72 3.12.2 Additive Codes 72
3.13 Wheeler 1/2 Code and Run-Lengths 73 3.13.1 The Wheeler 1/2 Code 74 3.13.2 Using the Wheeler 1/2 Code 74
3.14 Comparison of Representations 75 3.15 Final Remarks 77 3.16 References 78
Chapter 4: Huffman Coding 79 Steven Pigeon
4.1 Introduction 79 4.2 Huffman Codes 80
4.2.1 Shannon-Fano Coding 80 4.2.2 Building Huffman Codes 80 4.2.3 N-ary Huffman Codes 83 4.2.4 Canonical Huffman Coding 84 4.2.5 Performance of Huffman Codes 84
4.3 Variations on a Theme 86 4.3.1 Modified Huffman Codes 87 4.3.2 Huffman Prefixed Codes 87 4.3.3 Extended Huffman Codes 87 4.3.4 Length-Constrained Huffman Codes 89
4.4 Adaptive Huffman Coding 89 4.4.1 Brute Force Adaptive Huffman 89 4.4.2 The Faller, Gallager, and Knuth (FGK) Algorithm 91 4.4.3 Vitter's Algorithm: Algorithm Λ 93 4.4.4 Other Adaptive Huffman Coding Algorithms 93 4.4.5 An Observation on Adaptive Algorithms 94
4.5 Efficient Implementations 94 4.5.1 Memory-Efficient Algorithms 95 4.5.2 Speed-Efficient Algorithms 95
4.6 Conclusion and Further Reading 97 4.7 References 97
Chapter 5: Arithmetic Coding 101 Amir Said
5.1 Introduction 101 5.2 Basic Principles 103
5.2.1 Notation 103
5.2.2 Code Values 104 5.2.3 Arithmetic Coding 106 5.2.4 Optimality of Arithmetic Coding 111 5.2.5 Arithmetic Coding Properties 112
5.3 Implementation 120 5.3.1 Coding with Fixed-Precision Arithmetic 121 5.3.2 Adaptive Coding 132 5.3.3 Complexity Analysis 142 5.3.4 Further Reading 147
5.4 References 150
Chapter 6: Dictionary-Based Data Compression: An Algorithmic Perspective 153 S. Cenk Sahinalp and Nasir M. Rajpoot
6.1 Introduction 153 6.2 Dictionary Construction: Static versus Dynamic 154
6.2.1 Static Dictionary Methods 154 6.2.2 Parsing Issues 155 6.2.3 Semidynamic and Dynamic Dictionary Methods 157
6.3 Extensions of Dictionary Methods for Compressing Biomolecular Sequences 162 6.3.1 The Biocompress Program 162 6.3.2 The GenCompress Program 162
6.4 Data Structures in Dictionary Compression 163 6.4.1 Tries and Compact Tries 163 6.4.2 Suffix Trees 163 6.4.3 Trie-Reverse Trie Pairs 164 6.4.4 Karp-Rabin Fingerprints 164
6.5 Benchmark Programs and Standards 165 6.5.1 The gzip Program 165 6.5.2 The compress Program 165 6.5.3 The GIF Image Compression Standard 166 6.5.4 Modem Compression Standards: v.42bis and v.44 166
6.6 References 166
Chapter 7: Burrows-Wheeler Compression 169 Peter Fenwick
7.1 Introduction 169 7.2 The Burrows-Wheeler Algorithm 170 7.3 The Burrows-Wheeler Transform 170
7.3.1 The Burrows-Wheeler Forward Transformation 170 7.3.2 The Burrows-Wheeler Reverse Transformation 171 7.3.3 Illustration of the Transformations 171 7.3.4 Algorithms for the Reverse Transformation 172
7.4 Basic Implementations 173 7.4.1 The Burrows-Wheeler Transform or Permutation 173 7.4.2 Move-To-Front Recoding 174 7.4.3 Statistical Coding 176
7.5 Relation to Other Compression Algorithms 180 7.6 Improvements to Burrows-Wheeler Compression 180
7.7 Preprocessing 181 7.8 The Permutation 181
7.8.1 Suffix Trees 183 7.9 Move-To-Front 183
7.9.1 Move-To-Front Variants 184 7.10 Statistical Compressor 185 7.11 Eliminating Move-To-Front 187 7.12 Using the Burrows-Wheeler Transform in File Synchronization 189 7.13 Final Comments 190 7.14 Recent Developments 190 7.15 References 191
Chapter 8: Symbol-Ranking and ACB Compression 195 Peter Fenwick
8.1 Introduction 195 8.2 Symbol-Ranking Compression 195
8.2.1 Shannon Coder 196 8.2.2 History of Symbol-Ranking Compressors 197 8.2.3 An Example of a Symbol-Ranking Compressor 197 8.2.4 A Fast Symbol-Ranking Compressor 200
8.3 Buynovsky's ACB Compressor 201 8.4 References 204
Part III: Applications
Chapter 9: Lossless Image Compression 207 K. P. Subbalakshmi
9.1 Introduction 207 9.2 Preliminaries 208
9.2.1 Spatial Prediction 209 9.2.2 Hierarchical Prediction 211 9.2.3 Error Modeling 212 9.2.4 Scanning Techniques 212
9.3 Prediction for Lossless Image Compression 214 9.3.1 Switched Predictors 214 9.3.2 Combined Predictors 217
9.4 Hierarchical Lossless Image Coding 220 9.5 Conclusions 222 9.6 References 223
Chapter 10: Text Compression 227 Amar Mukherjee and Fauzia Awan
10.1 Introduction 227 10.2 Information Theory Background 228 10.3 Classification of Lossless Compression Algorithms 229
10.3.1 Statistical Methods 229
10.3.2 Dictionary Methods 232 10.3.3 Transform-Based Methods: The Burrows-Wheeler Transform (BWT) 233 10.3.4 Comparison of Performance of Compression Algorithms 233
10.4 Transform-Based Methods: Star (*) Transform and Length-Index Preserving Transform 234 10.4.1 Star (*) Transformation 234 10.4.2 Length-Index Preserving Transform (LIPT) 235 10.4.3 Experimental Results 237 10.4.4 Timing Performance Measurements 240
10.5 Three New Transforms—ILPT, NTT, and LIT 241 10.6 Conclusions 243 10.7 References 243
Chapter 11: Compression of Telemetry 247 Sheila Horan
11.1 What is Telemetry? 247 11.2 Issues Involved in Compression of Telemetry 250
11.2.1 Why Use Compression on Telemetry 250 11.2.2 Structure of the Data 251 11.2.3 Size Requirements 251
11.3 Existing Telemetry Compression 252 11.4 Future of Telemetry Compression 253 11.5 References 253
Chapter 12: Lossless Compression of Audio Data 255 Robert C. Maher
12.1 Introduction 255 12.1.1 Background 255 12.1.2 Expectations 256 12.1.3 Terminology 257
12.2 Principles of Lossless Data Compression 257 12.2.1 Basic Redundancy Removal 257 12.2.2 Amplitude Range and Segmentation 259 12.2.3 Multiple-Channel Redundancy 260 12.2.4 Prediction 260 12.2.5 Entropy Coding 262 12.2.6 Practical System Design Issues 263 12.2.7 Numerical Implementation and Portability 263 12.2.8 Segmentation and Resynchronization 263 12.2.9 Variable Bit Rate: Peak versus Average Rate 264 12.2.10 Speed and Complexity 264
12.3 Examples of Lossless Audio Data Compression Software Systems 265 12.3.1 Shorten 265 12.3.2 Meridian Lossless Packing (MLP) 266 12.3.3 Sonic Foundry Perfect Clarity Audio (PCA) 266
12.4 Conclusion 267 12.5 References 267
Chapter 13: Algorithms for Delta Compression and Remote File Synchronization 269 Torsten Suel and Nasir Memon
13.1 Introduction 269 13.1.1 Problem Definition 270 13.1.2 Content of This Chapter 271
13.2 Delta Compression 271 13.2.1 Applications 271 13.2.2 Fundamentals 273 13.2.3 LZ77-Based Delta Compressors 274 13.2.4 Some Experimental Results 275 13.2.5 Space-Constrained Delta Compression 276 13.2.6 Choosing Reference Files 278
13.3 Remote File Synchronization 279 13.3.1 Applications 279 13.3.2 The rsync Algorithm 280 13.3.3 Some Experimental Results for rsync 282 13.3.4 Theoretical Results 283 13.3.5 Results for Particular Distance Measures 285 13.3.6 Estimating File Distances 286 13.3.7 Reconciling Database Records and File Systems 286
13.4 Conclusions and Open Problems 287 13.5 References 287
Chapter 14: Compression of Unicode Files 291 Peter Fenwick
14.1 Introduction 291
14.2 Unicode Character Codings 291 14.2.1 Big-endian versus Little-endian 292 14.2.2 UTF-8 Coding 292
14.3 Compression of Unicode 293 14.3.1 Finite-Context Statistical Compressors 293 14.3.2 Unbounded-Context Statistical Compressors 293 14.3.3 LZ-77 Compressors 294
14.4 Test Compressors 294 14.4.1 The Unicode File Test Suite 294
14.5 Comparisons 295 14.6 UTF-8 Compression 296 14.7 Conclusions 296 14.8 References 297
Part IV: Standards
Chapter 15: JPEG-LS Lossless and Near Lossless Image Compression 301 Michael W. Hoffman
15.1 Lossless Image Compression and JPEG-LS 301 15.2 JPEG-LS 301
15.2.1 Overview of JPEG-LS 301 15.2.2 JPEG-LS Encoding 302 15.2.3 JPEG-LS Decoding 309
15.3 Summary 309 15.4 References 310
Chapter 16: The CCSDS Lossless Data Compression Recommendation for Space Applications 311 Pen-Shu Yeh
16.1 Introduction 311 16.2 The e-Rice Algorithm 312
16.3 The Adaptive Entropy Coder 313 16.3.1 Fundamental Sequence Encoding 313 16.3.2 The Split-Sample Option 314 16.3.3 Low-Entropy Options 316 16.3.4 No Compression 317 16.3.5 Code Selection 317
16.4 Preprocessor 318 16.4.1 Predictor 318 16.4.2 Reference Sample 319 16.4.3 Prediction Error Mapper 320
16.5 Coded Data Format 321 16.6 Decoding 321 16.7 Testing 324 16.8 Implementation Issues and Applications 324 16.9 Additional Information 326 16.10 References 326
Chapter 17: Lossless Bilevel Image Compression 327 Michael W. Hoffman
17.1 Bilevel Image Compression 327
17.2 JBIG 327 17.2.1 Overview of JBIG Encoding/Decoding 327 17.2.2 JBIG Encoding 330 17.2.3 Data Structure and Formatting 336 17.2.4 JBIG Decoding 338
17.3 JBIG2 339 17.3.1 Overview of JBIG2 339 17.3.2 JBIG2 Decoding Procedures 341 17.3.3 Decoding Control and Data Structures 346
17.4 Summary 348 17.5 References 349
Chapter 18: JPEG2000: Highly Scalable Image Compression 351 Ali Bilgin and Michael W. Marcellin
18.1 Introduction 351
18.2 JPEG2000 Features 352 18.2.1 Compressed Domain Image Processing/Editing 353 18.2.2 Progression 353
18.3 The JPEG2000 Algorithm 354 18.3.1 Tiles and Component Transforms 354 18.3.2 The Wavelet Transform 355 18.3.3 Quantization 358 18.3.4 Bit-Plane Coding 360 18.3.5 Packets and Layers 364 18.3.6 JPEG2000 Codestream 365
18.4 Performance 366 18.5 References 369
Chapter 19: PNG Lossless Image Compression 371 Greg Roelofs
19.1 Historical Background 371 19.2 Design Decisions 372 19.3 Compression Engine 374 19.4 zlib Format 376 19.5 zlib Library 376 19.6 Filters 378 19.7 Practical Compression Tips 383 19.8 Compression Tests and Comparisons 385 19.9 MNG 388 19.10 Further Reading 390 19.11 References 390
Chapter 20: Facsimile Compression 391 Khalid Sayood
20.1 A Brief History 391
20.2 The Compression Algorithms 393 20.2.1 Modified Huffman 393 20.2.2 Modified READ 393 20.2.3 Context-Based Arithmetic Coding 397 20.2.4 Run-Length Color Encoding 398
20.3 The Standards 398 20.3.1 ITU-T Group 3 (T.4) 398 20.3.2 Group 4 (T.6) 399 20.3.3 JBIG and JBIG2 (T.82 and T.88) 399 20.3.4 MRC—T.44 399 20.3.5 Other Standards 402
20.4 Further Reading 402 20.5 References 402
Part V: Hardware
Chapter 21: Hardware Implementation of Data Compression 405 Sanjukta Bhanja and N. Ranganathan
21.1 Introduction 405 21.2 Text Compression Hardware 407
21.2.1 Tree-Based Encoder Example 408 21.2.2 Lempel-Ziv Encoder Example 412
21.3 Image Compression Hardware 415 21.3.1 DCT Hardware 416 21.3.2 Wavelet Architectures 416 21.3.3 JPEG Hardware 417
21.4 Video Compression Hardware 417 21.4.1 Some Detailed Examples 420 21.4.2 Commercial Video and Audio Products 426
21.5 References 442
Index 447