Compression Algorithm

Data and Multimedia Technology, Session 08

Nofriyadi Nurdam

Introduction

Course Outlines

Data compression involves encoding information using fewer bits than the original representation would use.

Compression is useful to reduce the consumption of hard disk space or transmission bandwidth.

On the downside, compressed data must be decompressed before it can be used, and this extra processing may be detrimental to some applications.

For instance, compressed video may require expensive hardware to decompress it fast enough to be viewed as it is being decompressed.

Decompressing the video in full before watching it may be inconvenient, and it requires storage space for the decompressed video.

The design of a data compression scheme involves trade-offs among the degree of compression, the amount of distortion introduced, and the computational resources required to compress and decompress the data.

Compression was one of the main drivers for the growth of information during the past two decades.

There are two compression concepts: lossy and lossless compression.

Lossless data compression is a class of data compression algorithms that allows the exact original data to be reconstructed from the compressed data.

The term lossless is in contrast to lossy data compression, which only allows an approximation of the original data to be reconstructed, in exchange for better compression rates.

Lossless Data Compression

Lossless data compression is used in many applications, such as the ZIP file format and the gzip tool.

It is also used as a component within lossy data compression technologies.


Lossless compression is used when it is important that the original and the decompressed data be identical.

Some image file formats, like PNG or GIF, use only lossless compression, while others, like TIFF and MNG, may use either lossless or lossy methods.

Lossless audio formats are most often used for archiving or production purposes.
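
The "identical after decompression" requirement can be checked directly. The sketch below uses Python's standard `zlib` module (an implementation of DEFLATE, the method used by gzip and by most ZIP files); the sample input is a made-up assumption, not data from the course.

```python
import zlib

# Hypothetical sample input: repetitive text, which compresses well.
original = b"black text on a solid white background " * 50

compressed = zlib.compress(original)     # lossless DEFLATE compression
restored = zlib.decompress(compressed)   # exact reconstruction

print(len(original), "bytes ->", len(compressed), "bytes")
print("identical after round trip:", restored == original)
```

The equality check is the defining property of a lossless method; the size reduction depends entirely on how much redundancy the input contains.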


Most lossless compression programs do two things in sequence: first they generate a statistical model for the input data, and then they use this model to map input data to bit sequences in such a way that frequently encountered ("probable") data produces shorter output than "improbable" data.

The primary encoding algorithms used to produce bit sequences are Huffman coding and arithmetic coding.
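
As a rough illustration of the first step, the sketch below builds the simplest possible statistical model (symbol frequencies, ignoring context) and prints the ideal code length of each symbol, -log2(p). The function name `ideal_code_lengths` and the sample string are assumptions chosen for illustration.

```python
import math
from collections import Counter

def ideal_code_lengths(data):
    """Frequency model: map each symbol to its ideal code length, -log2(p)."""
    counts = Counter(data)
    total = len(data)
    return {sym: -math.log2(n / total) for sym, n in counts.items()}

lengths = ideal_code_lengths("ADA ATE APPLE")  # hypothetical sample input
# Frequent symbols come first because they have the shortest ideal length.
for sym, bits in sorted(lengths.items(), key=lambda kv: kv[1]):
    print(repr(sym), round(bits, 2), "bits")
```

Huffman coding rounds these ideal lengths to whole numbers of bits, while arithmetic coding can get closer to the fractional values.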

Lossless Compression Techniques

Arithmetic coding achieves compression rates close to the best possible for a particular statistical model.

Huffman coding is simpler and faster, but it produces poor results for models that deal with symbol probabilities close to 1.
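
The weakness near probability 1 comes from the fact that a Huffman codeword cannot be shorter than one bit. The sketch below compares that one-bit floor with the entropy of a two-symbol source; the value `p = 0.99` is an arbitrary assumption.

```python
import math

def entropy_bits(p):
    """Entropy in bits per symbol of a two-symbol source with probabilities p and 1 - p."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

p = 0.99             # hypothetical probability of the common symbol
huffman_bits = 1.0   # with two symbols, Huffman assigns one bit to each

print("best possible :", round(entropy_bits(p), 4), "bits/symbol")
print("Huffman code  :", huffman_bits, "bits/symbol")
```

For such a skewed source the Huffman code spends more than ten times the theoretical minimum per symbol, which is the gap arithmetic coding is designed to close.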


There are two primary ways of constructing statistical models: static and adaptive.

In a static model, the data is analyzed and a model is constructed; this model is then stored with the compressed data.

This approach is simple and modular, but the model itself can be expensive to store. It also forces a single model to be used for all the data being compressed, so it performs poorly on files that contain heterogeneous data.


Adaptive models dynamically update the model as the data is compressed. Both the encoder and decoder begin with a trivial model, yielding poor compression of initial data, but as they learn more about the data, performance improves.

Most popular types of compression used in practice now use adaptive coders.
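
A minimal sketch of the adaptive idea, assuming a simple frequency-count model: every symbol starts with a count of 1 (the trivial model), and the count is updated after each symbol is coded. A decoder running the same update rule stays in step with the encoder, so no model has to be stored. The function name `adaptive_cost` and the input are hypothetical.

```python
import math

def adaptive_cost(data, alphabet):
    """Ideal cost in bits of each symbol under an adaptive frequency model."""
    counts = {sym: 1 for sym in alphabet}  # trivial model: all symbols equally likely
    total = len(alphabet)
    costs = []
    for sym in data:
        costs.append(-math.log2(counts[sym] / total))  # cost under the model so far
        counts[sym] += 1                               # then learn from this symbol
        total += 1
    return costs

data = "W" * 40 + "B" + "W" * 40  # hypothetical input dominated by one symbol
costs = adaptive_cost(data, alphabet="WB")
print("first symbol:", round(costs[0], 3), "bits")
print("last symbol :", round(costs[-1], 3), "bits")
```

The first symbol costs a full bit because the model knows nothing yet; by the end, the common symbol costs a small fraction of a bit.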


Lossy compression is a data encoding method that compresses data by discarding (losing) some of it.

The procedure aims to minimize the amount of data that needs to be held, handled, and/or transmitted by a computer.

Typically, a substantial amount of data can be discarded before the result is sufficiently degraded to be noticed by the user.
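
One common way to discard data is quantization: values are rounded to a coarser grid, so fewer distinct values need to be stored. The sketch below is only an illustration of that general idea, not a specific codec; the sample values and the step size are assumptions.

```python
def quantize(samples, step):
    """Lossy step: keep only the index of the nearest multiple of `step`."""
    return [round(s / step) for s in samples]

def dequantize(indices, step):
    """Reconstruct an approximation of the original samples."""
    return [i * step for i in indices]

samples = [12, 15, 101, 98, 203, 199, 57, 61]  # hypothetical 8-bit sample values
step = 16                                      # hypothetical step; larger discards more

indices = quantize(samples, step)
approx = dequantize(indices, step)
max_error = max(abs(a - s) for a, s in zip(approx, samples))
print(indices)
print(approx)
print("largest error:", max_error)
```

The reconstructed values are close to the originals but not identical, and the error is bounded by half the step size. Choosing the step is the trade-off between compression and visible or audible degradation.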

Lossy Compression

Lossy compression is most commonly used to compress multimedia data (audio, video, and still images), especially in applications such as streaming media and internet telephony.

By contrast, lossless compression is required for text and data files, such as bank records and text articles.

Run-length encoding (RLE) is a very simple form of data compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run.

This is most useful on data that contains many such runs, for example simple graphic images such as icons, line drawings, and animations.

It is not useful for files that don't have many runs, as it could greatly increase the file size.

Run Length Encoding

For example, consider a screen with black text on a solid white background. There are black pixels for the text and white pixels for the background.

Using B for a black pixel and W for a white pixel, one scan line might read:

WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW

RLE converts this to:

12W1B12W3B24W1B14W

The run-length code represents the original 67 characters in only 18.
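
The scan-line example can be reproduced with a few lines of Python. This is a minimal sketch: the function names `rle_encode` and `rle_decode` are illustrative, and the decoder assumes the data itself contains no digits.

```python
import re
from itertools import groupby

def rle_encode(text):
    """Replace each run of identical characters with <count><character>."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))

def rle_decode(code):
    """Expand each <count><character> pair back into a run (data must not contain digits)."""
    return "".join(char * int(n) for n, char in re.findall(r"(\d+)(\D)", code))

line = "W" * 12 + "B" + "W" * 12 + "BBB" + "W" * 24 + "B" + "W" * 14
code = rle_encode(line)
print(code)                          # 12W1B12W3B24W1B14W
print(len(line), "->", len(code))    # 67 -> 18
print(rle_encode("ABCD"))            # 1A1B1C1D
```

The last line shows the failure case: input without runs doubles in size, because every character gains a count of 1.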

Run-length encoding is a lossless data compression technique and is well suited to palette-based iconic images.

It does not work well at all on continuous-tone images such as photographs.

Original text: ADA ATE APPLE

There are 7 symbols (A, D, E, L, P, T, and space) with frequencies: 4 As, 2 Ps, 2 Es, 2 spaces, 1 D, 1 T, and 1 L.

Each symbol is represented by 3 bits:

A: 000
D: 001
E: 010
L: 011
P: 100
T: 101
Space: 110

The encoded text needs 13 × 3 = 39 bits (compared to 104 bits for the original text at 8 bits per character).
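
The bit counts above can be checked with a short sketch that applies the 3-bit table. The names `FIXED_CODE`, `encode_fixed`, and `decode_fixed` are illustrative.

```python
# 3-bit code table from the example above.
FIXED_CODE = {"A": "000", "D": "001", "E": "010", "L": "011",
              "P": "100", "T": "101", " ": "110"}

def encode_fixed(text, table):
    """Concatenate the fixed-width codeword of every character."""
    return "".join(table[ch] for ch in text)

def decode_fixed(bits, table, width=3):
    """Cut the bit string into equal-width chunks and look each one up."""
    reverse = {code: sym for sym, code in table.items()}
    return "".join(reverse[bits[i:i + width]] for i in range(0, len(bits), width))

text = "ADA ATE APPLE"
bits = encode_fixed(text, FIXED_CODE)
print(len(bits), "bits, versus", 8 * len(text), "bits at 8 bits per character")
```

Because every codeword has the same width, decoding needs no search: the decoder simply reads three bits at a time.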

Fixed Length Encoding

Original text: ADA ATE APPLE

There are 7 symbols (A, D, E, L, P, T, and space) with frequencies: 4 As, 2 Ps, 2 Es, 2 spaces, 1 D, 1 T, and 1 L.

Each symbol is represented by a codeword whose length depends on its frequency:

A: 0
P: 10
E: 110
Space: 1110
D: 11110
T: 111110
L: 111111

Variable Length Encoding

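The prefix property: no codeword is a prefix of another codeword, so the encoded bits can be decoded without separators.

The encoded text needs 4+4+6+8+5+6+6 = 39 bits.

In general, variable-length encoding is better than fixed-length encoding, although with this particular code table the total matches the 39 bits of the 3-bit code.

Decoding is done with a tree structure.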
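
The sketch below encodes the sample text with the variable-length table, checks the prefix property, and decodes the result. It is a simplification: instead of walking an explicit tree, the decoder accumulates bits until they match a codeword, which gives the same result for a prefix code. All function names are illustrative.

```python
# Variable-length code table from the example above.
PREFIX_CODE = {"A": "0", "P": "10", "E": "110", " ": "1110",
               "D": "11110", "T": "111110", "L": "111111"}

def has_prefix_property(table):
    """True if no codeword is a prefix of a different codeword."""
    codes = list(table.values())
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

def encode(text, table):
    return "".join(table[ch] for ch in text)

def decode(bits, table):
    """Read bits left to right; emit a symbol whenever the buffer matches a codeword."""
    reverse = {code: sym for sym, code in table.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in reverse:
            out.append(reverse[buffer])
            buffer = ""
    return "".join(out)

text = "ADA ATE APPLE"
bits = encode(text, PREFIX_CODE)
print(len(bits), "bits")
print(decode(bits, PREFIX_CODE))
```

The greedy match in `decode` is only safe because of the prefix property: once the buffer equals a codeword, it cannot also be the start of a longer one.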

Variable-length coding: the tree structure is built bottom-up. The lowest level consists of the symbols with the fewest occurrences.
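
A minimal sketch of the bottom-up construction, assuming a non-empty input and using Python's `heapq` as the priority queue. The two least frequent nodes are merged repeatedly, so rare symbols end up at the lowest level of the tree. The function name `huffman_code` and the tie-breaking counter are implementation choices made here, not part of the course material.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code(text):
    """Build a Huffman code table bottom-up from symbol frequencies (non-empty text)."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(freq, next(tiebreak), sym) for sym, freq in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent nodes into one parent node.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    table = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: a single symbol
            table[node] = prefix or "0"      # one-symbol input still needs one bit
    walk(heap[0][2], "")
    return table

text = "ADA ATE APPLE"
table = huffman_code(text)
total_bits = sum(len(table[ch]) for ch in text)
print(table)
print(total_bits, "bits")
```

For this text the Huffman code needs 35 bits, fewer than the 39 bits of the hand-made tables. The exact codewords depend on how ties between equal frequencies are broken, but the total length does not.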

Huffman Coding
