lossless - gbv

Lossless Compression Handbook

Editor: Khalid Sayood

Elsevier


Contents

List of Contributors xvii

Preface xix

Part I: Theory

Chapter 1: Information Theory behind Source Coding 3 Frans M. J. Willems and Tjalling J. Tjalkens 1.1 Introduction 3

1.1.1 Definition of Entropy 3 1.1.2 Properties of Entropy 4 1.1.3 Entropy as an Information Measure 5 1.1.4 Joint Entropy and Conditional Entropy 6 1.1.5 Properties of Joint Entropy and Conditional Entropy 7 1.1.6 Interpretation of Conditional Entropy 8

1.2 Sequences and Information Sources 9 1.2.1 Sequences 9 1.2.2 Information Sources 9 1.2.3 Memoryless Sources 10 1.2.4 Binary Sources 10 1.2.5 Discrete Stationary Sources 10 1.2.6 The Entropy Rate 11

1.3 Variable-Length Codes for Memoryless Sources 11 1.3.1 A Source Coding System, Variable-Length Codes for Source Symbols 12 1.3.2 Unique Decodability, Prefix Codes 13 1.3.3 Kraft's Inequality for Prefix Codes and Its Counterpart 14 1.3.4 Redundancy, Entropy, and Bounds 16 1.3.5 Variable-Length Codes for Blocks of Symbols 17

1.4 Variable-Length Codes for Sources with Memory 18 1.4.1 Block Codes Again 18 1.4.2 The Elias Algorithm 19 1.4.3 Representation of Sequences by Intervals 19 1.4.4 Competitive Optimality 25


1.5 Fixed-Length Codes for Memoryless Sources, the AEP 26 1.5.1 The Fixed-Length Source Coding Problem 27 1.5.2 Some Probabilities 27 1.5.3 An Example Demonstrating the Asymptotic Equipartition Property 28 1.5.4 The Idea behind Fixed-Length Source Coding 28 1.5.5 Rate and Error Probability 28 1.5.6 A Hamming Ball 30 1.5.7 An Optimal Balance between R and Pf 31 1.5.8 The Fixed-Length Coding Theorem 33 1.5.9 Converse and Conclusion 34

1.6 References 34

Chapter 2: Complexity Measures 35 Stephen R. Tate

2.1 Introduction 35 2.1.1 An Aside on Computability 36

2.2 Concerns with Shannon Information Theory 37 2.2.1 Strings versus Sources 37 2.2.2 Complex Non-random Sequences 37 2.2.3 Structured Random Strings 37

2.3 Kolmogorov Complexity 38 2.3.1 Basic Definitions 39 2.3.2 Incompressibility 40 2.3.3 Prefix-free Encoding 42

2.4 Computational Issues of Kolmogorov Complexity 44 2.4.1 Resource-Bounded Kolmogorov Complexity 45 2.4.2 Lower-Bounding Kolmogorov Complexity 46

2.5 Relation to Shannon Information Theory 47 2.5.1 Approach 1: An Infinite Sequence of Sources 48 2.5.2 Approach 2: Conditional Complexities 49 2.5.3 Discussion 50

2.6 Historical Notes 51 2.7 Further Reading 51 2.8 References 51

Part II: Compression Techniques

Chapter 3: Universal Codes 55 Peter Fenwick

3.1 Compact Integer Representations 55 3.2 Characteristics of Universal Codes 55 3.3 Polynomial Representations 56 3.4 Unary Codes 57 3.5 Levenstein and Elias Gamma Codes 58 3.6 Elias Omega and Even-Rodeh Codes 59 3.7 Rice Codes 60 3.8 Golomb Codes 62 3.9 Start-Step-Stop Codes 64


3.10 Fibonacci Codes 65 3.10.1 Zeckendorf Representation 66 3.10.2 Fraenkel and Klein Codes 66 3.10.3 Higher-Order Fibonacci Representations 66 3.10.4 Apostolico and Fraenkel Codes 67 3.10.5 A New Order-3 Fibonacci Code 69

3.11 Ternary Comma Codes 70 3.12 Summation Codes 71

3.12.1 Goldbach G1 Codes 72 3.12.2 Additive Codes 72

3.13 Wheeler 1/2 Code and Run-Lengths 73 3.13.1 The Wheeler 1/2 Code 74 3.13.2 Using the Wheeler 1/2 Code 74

3.14 Comparison of Representations 75 3.15 Final Remarks 77 3.16 References 78

Chapter 4: Huffman Coding 79 Steven Pigeon

4.1 Introduction 79 4.2 Huffman Codes 80

4.2.1 Shannon-Fano Coding 80 4.2.2 Building Huffman Codes 80 4.2.3 N-ary Huffman Codes 83 4.2.4 Canonical Huffman Coding 84 4.2.5 Performance of Huffman Codes 84

4.3 Variations on a Theme 86 4.3.1 Modified Huffman Codes 87 4.3.2 Huffman Prefixed Codes 87 4.3.3 Extended Huffman Codes 87 4.3.4 Length-Constrained Huffman Codes 89

4.4 Adaptive Huffman Coding 89 4.4.1 Brute Force Adaptive Huffman 89 4.4.2 The Faller, Gallager, and Knuth (FGK) Algorithm 91 4.4.3 Vitter's Algorithm: Algorithm Λ 93 4.4.4 Other Adaptive Huffman Coding Algorithms 93 4.4.5 An Observation on Adaptive Algorithms 94

4.5 Efficient Implementations 94 4.5.1 Memory-Efficient Algorithms 95 4.5.2 Speed-Efficient Algorithms 95

4.6 Conclusion and Further Reading 97 4.7 References 97

Chapter 5: Arithmetic Coding 101 Amir Said

5.1 Introduction 101 5.2 Basic Principles 103

5.2.1 Notation 103


5.2.2 Code Values 104 5.2.3 Arithmetic Coding 106 5.2.4 Optimality of Arithmetic Coding 111 5.2.5 Arithmetic Coding Properties 112

5.3 Implementation 120 5.3.1 Coding with Fixed-Precision Arithmetic 121 5.3.2 Adaptive Coding 132 5.3.3 Complexity Analysis 142 5.3.4 Further Reading 147

5.4 References 150

Chapter 6: Dictionary-Based Data Compression: An Algorithmic Perspective 153 S. Cenk Sahinalp and Nasir M. Rajpoot

6.1 Introduction 153 6.2 Dictionary Construction: Static versus Dynamic 154

6.2.1 Static Dictionary Methods 154 6.2.2 Parsing Issues 155 6.2.3 Semidynamic and Dynamic Dictionary Methods 157

6.3 Extensions of Dictionary Methods for Compressing Biomolecular Sequences 162 6.3.1 The Biocompress Program 162 6.3.2 The GenCompress Program 162

6.4 Data Structures in Dictionary Compression 163 6.4.1 Tries and Compact Tries 163 6.4.2 Suffix Trees 163 6.4.3 Trie-Reverse Trie Pairs 164 6.4.4 Karp-Rabin Fingerprints 164

6.5 Benchmark Programs and Standards 165 6.5.1 The gzip Program 165 6.5.2 The compress Program 165 6.5.3 The GIF Image Compression Standard 166 6.5.4 Modem Compression Standards: v.42bis and v.44 166

6.6 References 166

Chapter 7: Burrows-Wheeler Compression 169 Peter Fenwick

7.1 Introduction 169 7.2 The Burrows-Wheeler Algorithm 170 7.3 The Burrows-Wheeler Transform 170

7.3.1 The Burrows-Wheeler Forward Transformation 170 7.3.2 The Burrows-Wheeler Reverse Transformation 171 7.3.3 Illustration of the Transformations 171 7.3.4 Algorithms for the Reverse Transformation 172

7.4 Basic Implementations 173 7.4.1 The Burrows-Wheeler Transform or Permutation 173 7.4.2 Move-To-Front Recoding 174 7.4.3 Statistical Coding 176

7.5 Relation to Other Compression Algorithms 180 7.6 Improvements to Burrows-Wheeler Compression 180


7.7 Preprocessing 181 7.8 The Permutation 181

7.8.1 Suffix Trees 183 7.9 Move-To-Front 183

7.9.1 Move-To-Front Variants 184 7.10 Statistical Compressor 185 7.11 Eliminating Move-To-Front 187 7.12 Using the Burrows-Wheeler Transform in File Synchronization 189 7.13 Final Comments 190 7.14 Recent Developments 190 7.15 References 191

Chapter 8: Symbol-Ranking and ACB Compression 195 Peter Fenwick

8.1 Introduction 195 8.2 Symbol-Ranking Compression 195

8.2.1 Shannon Coder 196 8.2.2 History of Symbol-Ranking Compressors 197 8.2.3 An Example of a Symbol-Ranking Compressor 197 8.2.4 A Fast Symbol-Ranking Compressor 200

8.3 Buynovsky's ACB Compressor 201 8.4 References 204

Part III: Applications

Chapter 9: Lossless Image Compression 207 K. P. Subbalakshmi

9.1 Introduction 207 9.2 Preliminaries 208

9.2.1 Spatial Prediction 209 9.2.2 Hierarchical Prediction 211 9.2.3 Error Modeling 212 9.2.4 Scanning Techniques 212

9.3 Prediction for Lossless Image Compression 214 9.3.1 Switched Predictors 214 9.3.2 Combined Predictors 217

9.4 Hierarchical Lossless Image Coding 220 9.5 Conclusions 222 9.6 References 223

Chapter 10: Text Compression 227 Amar Mukherjee and Fauzia Awan

10.1 Introduction 227 10.2 Information Theory Background 228 10.3 Classification of Lossless Compression Algorithms 229

10.3.1 Statistical Methods 229


10.3.2 Dictionary Methods 232 10.3.3 Transform-Based Methods: The Burrows-Wheeler Transform (BWT) 233 10.3.4 Comparison of Performance of Compression Algorithms 233

10.4 Transform-Based Methods: Star (*) Transform and Length-Index Preserving Transform 234 10.4.1 Star (*) Transformation 234 10.4.2 Length-Index Preserving Transform (LIPT) 235 10.4.3 Experimental Results 237 10.4.4 Timing Performance Measurements 240

10.5 Three New Transforms—ILPT, NTT, and LIT 241 10.6 Conclusions 243 10.7 References 243

Chapter 11: Compression of Telemetry 247 Sheila Horan

11.1 What is Telemetry? 247 11.2 Issues Involved in Compression of Telemetry 250

11.2.1 Why Use Compression on Telemetry 250 11.2.2 Structure of the Data 251 11.2.3 Size Requirements 251

11.3 Existing Telemetry Compression 252 11.4 Future of Telemetry Compression 253 11.5 References 253

Chapter 12: Lossless Compression of Audio Data 255 Robert C. Maher

12.1 Introduction 255 12.1.1 Background 255 12.1.2 Expectations 256 12.1.3 Terminology 257

12.2 Principles of Lossless Data Compression 257 12.2.1 Basic Redundancy Removal 257 12.2.2 Amplitude Range and Segmentation 259 12.2.3 Multiple-Channel Redundancy 260 12.2.4 Prediction 260 12.2.5 Entropy Coding 262 12.2.6 Practical System Design Issues 263 12.2.7 Numerical Implementation and Portability 263 12.2.8 Segmentation and Resynchronization 263 12.2.9 Variable Bit Rate: Peak versus Average Rate 264 12.2.10 Speed and Complexity 264

12.3 Examples of Lossless Audio Data Compression Software Systems 265 12.3.1 Shorten 265 12.3.2 Meridian Lossless Packing (MLP) 266 12.3.3 Sonic Foundry Perfect Clarity Audio (PCA) 266

12.4 Conclusion 267 12.5 References 267


Chapter 13: Algorithms for Delta Compression and Remote File Synchronization 269 Torsten Suel and Nasir Memon

13.1 Introduction 269 13.1.1 Problem Definition 270 13.1.2 Content of This Chapter 271

13.2 Delta Compression 271 13.2.1 Applications 271 13.2.2 Fundamentals 273 13.2.3 LZ77-Based Delta Compressors 274 13.2.4 Some Experimental Results 275 13.2.5 Space-Constrained Delta Compression 276 13.2.6 Choosing Reference Files 278

13.3 Remote File Synchronization 279 13.3.1 Applications 279 13.3.2 The rsync Algorithm 280 13.3.3 Some Experimental Results for rsync 282 13.3.4 Theoretical Results 283 13.3.5 Results for Particular Distance Measures 285 13.3.6 Estimating File Distances 286 13.3.7 Reconciling Database Records and File Systems 286

13.4 Conclusions and Open Problems 287 13.5 References 287

Chapter 14: Compression of Unicode Files 291 Peter Fenwick

14.1 Introduction 291 14.2 Unicode Character Codings 291

14.2.1 Big-endian versus Little-endian 292 14.2.2 UTF-8 Coding 292

14.3 Compression of Unicode 293 14.3.1 Finite-Context Statistical Compressors 293 14.3.2 Unbounded-Context Statistical Compressors 293 14.3.3 LZ-77 Compressors 294

14.4 Test Compressors 294 14.4.1 The Unicode File Test Suite 294

14.5 Comparisons 295 14.6 UTF-8 Compression 296 14.7 Conclusions 296 14.8 References 297

Part IV: Standards

Chapter 15: JPEG-LS Lossless and Near Lossless Image Compression 301 Michael W. Hoffman

15.1 Lossless Image Compression and JPEG-LS 301 15.2 JPEG-LS 301


15.2.1 Overview of JPEG-LS 301 15.2.2 JPEG-LS Encoding 302 15.2.3 JPEG-LS Decoding 309

15.3 Summary 309 15.4 References 310

Chapter 16: The CCSDS Lossless Data Compression Recommendation for Space Applications 311 Pen-Shu Yeh

16.1 Introduction 311 16.2 The e-Rice Algorithm 312

16.3 The Adaptive Entropy Coder 313 16.3.1 Fundamental Sequence Encoding 313 16.3.2 The Split-Sample Option 314 16.3.3 Low-Entropy Options 316 16.3.4 No Compression 317 16.3.5 Code Selection 317

16.4 Preprocessor 318 16.4.1 Predictor 318 16.4.2 Reference Sample 319 16.4.3 Prediction Error Mapper 320

16.5 Coded Data Format 321 16.6 Decoding 321 16.7 Testing 324 16.8 Implementation Issues and Applications 324 16.9 Additional Information 326 16.10 References 326

Chapter 17: Lossless Bilevel Image Compression 327 Michael W. Hoffman

17.1 Bilevel Image Compression 327 17.2 JBIG 327

17.2.1 Overview of JBIG Encoding/Decoding 327 17.2.2 JBIG Encoding 330 17.2.3 Data Structure and Formatting 336 17.2.4 JBIG Decoding 338

17.3 JBIG2 339 17.3.1 Overview of JBIG2 339 17.3.2 JBIG2 Decoding Procedures 341 17.3.3 Decoding Control and Data Structures 346

17.4 Summary 348 17.5 References 349

Chapter 18: JPEG2000: Highly Scalable Image Compression 351 Ali Bilgin and Michael W. Marcellin

18.1 Introduction 351 18.2 JPEG2000 Features 352

18.2.1 Compressed Domain Image Processing/Editing 353 18.2.2 Progression 353

18.3 The JPEG2000 Algorithm 354 18.3.1 Tiles and Component Transforms 354 18.3.2 The Wavelet Transform 355 18.3.3 Quantization 358 18.3.4 Bit-Plane Coding 360 18.3.5 Packets and Layers 364 18.3.6 JPEG2000 Codestream 365

18.4 Performance 366 18.5 References 369

Chapter 19: PNG Lossless Image Compression 371 Greg Roelofs

19.1 Historical Background 371 19.2 Design Decisions 372 19.3 Compression Engine 374 19.4 zlib Format 376 19.5 zlib Library 376 19.6 Filters 378 19.7 Practical Compression Tips 383 19.8 Compression Tests and Comparisons 385 19.9 MNG 388 19.10 Further Reading 390 19.11 References 390

Chapter 20: Facsimile Compression 391 Khalid Sayood

20.1 A Brief History 391 20.2 The Compression Algorithms 393

20.2.1 Modified Huffman 393 20.2.2 Modified READ 393 20.2.3 Context-Based Arithmetic Coding 397 20.2.4 Run-Length Color Encoding 398

20.3 The Standards 398 20.3.1 ITU-T Group 3 (T.4) 398 20.3.2 Group 4 (T.6) 399 20.3.3 JBIG and JBIG2 (T.82 and T.88) 399 20.3.4 MRC—T.44 399 20.3.5 Other Standards 402

20.4 Further Reading 402 20.5 References 402

Part V: Hardware

Chapter 21: Hardware Implementation of Data Compression 405 Sanjukta Bhanja and N. Ranganathan

21.1 Introduction 405 21.2 Text Compression Hardware 407


21.2.1 Tree-Based Encoder Example 408 21.2.2 Lempel-Ziv Encoder Example 412

21.3 Image Compression Hardware 415 21.3.1 DCT Hardware 416 21.3.2 Wavelet Architectures 416 21.3.3 JPEG Hardware 417

21.4 Video Compression Hardware 417 21.4.1 Some Detailed Examples 420 21.4.2 Commercial Video and Audio Products 426

21.5 References 442

Index 447