Data Compression Notes 1
TRANSCRIPT
-
8/2/2019 Data Compression Notes1
Eric Dubois
-
[Figure: block diagram of a communication system: Information Source → (signal) → Encoder → (binary stream) → Channel → (binary stream) → Decoder → (signal) → Receiver]
-
[Figure: Information Source → Encoder → Channel → Decoder → Receiver block diagram, with the binary stream annotated "aka data"]
-
[Figure: Information Source → Encoder → Channel → Decoder → Receiver block diagram, with the binary stream annotated "aka data" and an error measure comparing the source and reconstructed signals]
-
Examples of information sources:
- Speech
- Image
- Video
- Text file
- Music
- Radiograph
- Binary executable computer program
- Computer graphics primitives
- Weather radar map
-
Examples of channels:
- Airwaves (EM radiation)
- Cable
- Telephone line
- Hard disk
- CD, DVD
- Flash memory device
- Optical path
- Internet
-
Examples of receivers:
- TV screen and viewer
- Audio system and listener
- Computer file
- Image printer and viewer
- Compute engine
-
Measures of distortion:
- No errors permitted (lossless coding)
- Numerical measures of error, e.g. mean-squared error (MSE), signal-to-noise ratio (SNR)
- Numerical measures of perceptual difference
- Mean opinion scores from human users
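The two numerical error measures named above, MSE and SNR, can be sketched in a few lines. The signal values below are invented for illustration; they are not from the notes.

```python
# Mean-squared error (MSE) and signal-to-noise ratio (SNR) between an
# original sequence x and its lossy reconstruction y.
import math

def mse(x, y):
    # Average of the squared sample-by-sample errors.
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def snr_db(x, y):
    # Ratio of average signal power to MSE, expressed in decibels.
    signal_power = sum(a ** 2 for a in x) / len(x)
    return 10 * math.log10(signal_power / mse(x, y))

x = [1.0, 2.0, 3.0, 4.0]   # made-up source samples
y = [1.1, 1.9, 3.0, 4.2]   # made-up reconstruction
print(mse(x, y), snr_db(x, y))
```

A lossless coder would give MSE = 0 (and infinite SNR); lossy coders trade a nonzero MSE for a lower rate.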
-
Measures of rate:
- Data rate (bits per second)
- Transmission time (seconds)
- File size (bytes)
- Average number of bits per source symbol
-
There is usually a natural representation for the source data at a given level of fidelity and sampling rate. Examples:
- 8 bits per character in ASCII data
- 24 bits per RGB color pixel
- 16 bits per audio signal sample

This natural representation leads to a certain raw channel rate (which is generally too high).
Compression involves reducing the channel rate for a given level of distortion (which may be zero for lossless coding).
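The raw channel rate is just symbols per second times bits per symbol. A minimal sketch, with the caveat that the 44.1 kHz stereo sampling parameters are assumptions for the example (the notes give only the bits-per-sample figures):

```python
# Raw channel rate = (samples or symbols per second) * (bits each).

def raw_rate(samples_per_second, bits_per_sample):
    return samples_per_second * bits_per_sample

# 16-bit audio at 44.1 kHz, two channels (CD-style parameters, assumed):
audio_bps = raw_rate(44_100 * 2, 16)   # 1,411,200 bit/s
# ASCII text produced at 10 characters per second (assumed rate):
text_bps = raw_rate(10, 8)             # 80 bit/s
print(audio_bps, text_bps)
```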
-
compression ratio = raw channel rate / compressed channel rate

Example: HDTV, 1080i
Raw channel rate: 1493 Mbit/s (1920 × 1080 × 30 × 24)
Compressed channel rate: ~20 Mbit/s
Compression ratio: ~75
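The HDTV numbers above can be checked directly from the definition:

```python
# Compression ratio = raw channel rate / compressed channel rate,
# using the HDTV 1080i figures from the slide.

width, height = 1920, 1080
frames_per_second = 30
bits_per_pixel = 24            # RGB

raw_rate_bps = width * height * frames_per_second * bits_per_pixel
compressed_rate_bps = 20e6     # ~20 Mbit/s, as on the slide

print(raw_rate_bps / 1e6)                  # ~1493 Mbit/s
print(raw_rate_bps / compressed_rate_bps)  # ~75
```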
-
Categories of sources:
- continuous time or domain: x(t), x(h,v)
- discrete time or domain: x[n], x[m,n]
- continuous amplitude or value: x ∈ ℝ
- discrete amplitude or value: x ∈ A = {a1, a2, …, aM}

We will only consider discrete-domain sources. We assume that continuous-domain signals can be sampled with negligible loss; this is not considered in this course.
We will mainly concentrate on one-dimensional signals such as text, speech, audio, etc. Extensions to images are covered in ELG5378.
A source signal is a sequence of values drawn from a source alphabet A: x[1], x[2], …, x[n] ∈ A.
-
A source coder transforms a source sequence into a coded sequence whose values are drawn from a code alphabet G: u[1], u[2], …, u[i] ∈ G.
Normally G = {0, 1}, and we will limit ourselves to this case.
Note that the time indexes for the source sequence x[n] and the coded sequence u[i] do not correspond.
The decoder must estimate the source signal on the basis of the received coded sequence û[i]. This may be different from u[i] if there are transmission errors. We will generally assume that there are no transmission errors.
-
Lossless coding: The source sequence has discrete values, and these must be reproduced without error. Examples where this is required are text, data, executables, and some quantized signals such as X-rays.
Lossy coding: The source sequence may be either continuous- or discrete-valued. There exists a distortion criterion. The decoded sequence may be mathematically different from the source sequence, but the distortion should be kept sufficiently small. Examples are speech and images. Often a perceptual distortion criterion is desired.
Lossless coding methods are often a component of a lossy coding system.
-
There are two variants of the compression problem:
1. For a given source and distortion measure, minimize the channel rate for a given level of distortion D0 (which can be zero).
2. For a given source and distortion measure, minimize the distortion (or maximize the quality) for a given channel rate R0.
-
[Figure: rate R versus distortion D tradeoff curve]
In a coding system, there is typically a tradeoff between rate and distortion.
-
[Figure: rate R versus distortion D tradeoff curve, with the rate achievable at distortion level D0 marked]
In a coding system, there is typically a tradeoff between rate and distortion.
-
[Figure: rate R versus distortion D tradeoff curve, with the distortion achievable at rate R0 marked]
In a coding system, there is typically a tradeoff between rate and distortion.
-
1. When there is statistical redundancy.
For example, for a sequence of outcomes of a fair 16-sided die, we need 4 bits to represent each outcome. No compression is possible.
In English text, some letters occur far more often than others. We can assign shorter codes to the common ones and longer codes to the uncommon ones and achieve compression (e.g., Morse code).
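The shorter-codes-for-common-symbols idea can be made concrete with a tiny prefix code. The four-symbol source and its skewed probabilities below are invented for illustration (the slide's example uses English letters and Morse code):

```python
# Average code length of a variable-length prefix code versus the
# fixed-length code for a 4-symbol source. Probabilities are made up.

probs = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}
prefix_code = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

fixed_bits = 2  # 4 symbols -> 2 bits each with a fixed-length code
avg_bits = sum(p * len(prefix_code[s]) for s, p in probs.items())
print(avg_bits)  # 1.75 bits/symbol, beating the 2-bit fixed code
```

For a uniform source, like the fair 16-sided die, no assignment of code lengths can beat the fixed-length code, which is why no compression is possible there.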
-
There are many types of statistical redundancy.
For example, in English text, we are pretty sure that the next letter after a Q will be a U, so we can exploit it.
The key to successful compression will be to formulate models that capture the statistical redundancy in the source.
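The Q-followed-by-U observation can be stated in entropy terms: once we know the context, the next-letter distribution is nearly deterministic, so very few bits are needed. The probabilities below are invented to illustrate the idea, not measured from English text:

```python
# Entropy of the next-letter distribution with and without context.
import math

def entropy(dist):
    # Shannon entropy in bits of a discrete distribution.
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

no_context = {'u': 0.03, 'e': 0.13, 'other': 0.84}  # made-up marginal
after_q    = {'u': 0.99, 'other': 0.01}             # made-up conditional

print(entropy(no_context))  # bits per letter, unconditioned
print(entropy(after_q))     # far fewer bits once we know a Q precedes
```

A model that conditions on context (e.g., a Markov model, covered later in the course) lets the coder spend close to the smaller, conditional number of bits.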
-
2. When there is irrelevancy.
In many cases, the data is specified more precisely than it needs to be for the intended purpose.
The data may be oversampled, or quantized more finely than it needs to be, either everywhere or in some parts of the signal.
This particularly applies to data meant only for consumption and not further processing.
-
- Change of representation
- Quantization (not for lossless coding)
- Binary code assignment

All will depend on good models of the source and the receiver.
-
Eric Dubois
CBY A-512
Tel: 562-5800 x6400
[email protected]
www.eecs.uottawa.ca/~edubois/courses/ELG5126
-
Textbook: K. Sayood, Introduction to Data Compression, third edition, Morgan Kaufmann Publishers, 2006.
http://www.sciencedirect.com/science/book/9780126208627
-
Basic probability and signal processing as typically obtained in an undergraduate Electrical Engineering program
(e.g., at uOttawa, ELG3125 Signal and System Analysis, ELG3126 Random Signals and Systems).
-
The objective of this course is to present the fundamental principles underlying data and waveform compression.
The course begins with the study of lossless compression of discrete sources. These techniques are applicable to compression of text, data, programs and any other type of information where no loss is tolerable. They also form an integral part of schemes for lossy compression of waveforms such as audio and video signals, which is the topic of the second part of the course.
-
The main goal of the course is to provide an understanding of the basic techniques and theories underlying popular compression systems and standards such as ZIP, FAX, MP3, JPEG, MPEG and so on, as well as the principles underlying future systems.
Some of the applications will be addressed in student projects.
-
Lossless coding: Discrete sources, binary codes, entropy, Huffman and related codes, Markov models, adaptive coding.
Arithmetic coding: Principles, coding and decoding techniques, implementation issues.
Dictionary techniques: Principles, static dictionary, adaptive dictionary.
Waveform coding: Distortion measures, rate-distortion theory and bounds, models.
-
Quantization: Formulation, performance, uniform and non-uniform quantizers, quantizer optimization, vector quantization.
Predictive coding: Prediction theory, differential coding (DPCM), adaptive coding.
Transform and subband coding: Change of basis, block transforms and filter banks, bit allocation and quantization.
Applications (student projects)
-
20% Assignments: Several assignments, to be handed in during class on the due date specified. There will be a 5% penalty for each day late, and no assignment will be accepted after one week.
30% Project: An individual project on an application of data compression involving some experimental work. A project report and presentation at the end of the course will be required. More details will follow early in the course.
20% Midterm exam: Closed-book exam, 80 minutes in length.
30% Final exam: Closed-book exam, 3 hours in length, covering the whole course.