Chapter 12 Multimedia Information
Chapter 12 Multimedia Information
Lossless Data Compression
Compression of Analog Signals
Image and Video Coding
Bits, numbers, information
Bit: number with value 0 or 1
n bits: digital representation for 0, 1, ..., 2^n - 1
Byte or octet: n = 8; computer word: n = 16, 32, or 64
n bits allow enumeration of 2^n possibilities
  n-bit field in a header
  n-bit representation of a voice sample
  Message consisting of n bits
The number of bits required to represent a message is a measure of its information content
  More bits -> more content
Block vs. Stream Information
Block: information that occurs in a single block
  Text message, data file, JPEG image, MPEG file
  Size = bits/block or bytes/block
  1 kbyte = 2^10 bytes; 1 Mbyte = 2^20 bytes; 1 Gbyte = 2^30 bytes
Stream: information that is produced & transmitted continuously
  Real-time voice, streaming video
  Bit rate = bits/second
  1 kbps = 10^3 bps; 1 Mbps = 10^6 bps; 1 Gbps = 10^9 bps
Transmission Delay
  L = number of bits in message
  R = speed of digital transmission system (bps)
  L/R = time to transmit the information
  tprop = time for signal to propagate across medium
  d = distance in meters
  c = speed of light (3 x 10^8 m/s in vacuum)

  Delay = tprop + L/R = d/c + L/R seconds

Use data compression to reduce L
Use a higher-speed modem to increase R
Place the server closer to reduce d
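The delay formula can be sketched directly; the function name and the sample numbers below are illustrative assumptions, not from the slides:

```python
def total_delay(L_bits, R_bps, d_m, c=3e8):
    """Delay = tprop + L/R = d/c + L/R seconds."""
    return d_m / c + L_bits / R_bps

# Illustrative numbers: a 1-Mbyte message over a 10-Mbps link, 3000 km away
delay = total_delay(L_bits=8e6, R_bps=10e6, d_m=3e6)  # 0.01 + 0.8 = 0.81 s
```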
Compression
Information is usually not represented efficiently
Data compression algorithms represent the information using fewer bits
  Noiseless (lossless): original information recovered exactly
    e.g. zip, compress, GIF, fax
  Noisy (lossy): original information recovered approximately
    e.g. JPEG; tradeoff: # bits vs. quality
Compression ratio = # bits (original file) / # bits (compressed file)
[Figure: a color image of H x W pixels is the sum of a red component
image, a green component image, and a blue component image, each H x W.]

Total bits = 3 x H x W pixels x B bits/pixel = 3HWB bits
Example: 8 x 10 inch picture at 400 x 400 pixels per in^2
  400 x 400 x 8 x 10 = 12.8 million pixels, 8 bits/pixel/color
  12.8 megapixels x 3 bytes/pixel = 38.4 megabytes
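A quick sketch to verify the arithmetic above (the function name is an assumption for illustration):

```python
def color_image_bits(width_in, height_in, dots_per_in, bits_per_sample, components=3):
    """Total bits = components x (H x W pixels) x B bits/pixel."""
    pixels = (width_in * dots_per_in) * (height_in * dots_per_in)
    return pixels * components * bits_per_sample

bits = color_image_bits(8, 10, 400, 8)   # 12.8 Mpixels x 24 bits/pixel
mbytes = bits / 8 / 1e6                  # 38.4 megabytes, as above
```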
Color Image
Examples of Block Information
Type         Method          Format                              Original        Compressed (Ratio)
Text         Zip, compress   ASCII                               Kbytes-Mbytes   (2-6)
Fax          CCITT Group 3   A4 page, 200x100 pixels/in^2        256 kbytes      5-54 kbytes (5-50)
Color image  JPEG            8x10 in^2 photo, 400^2 pixels/in^2  38.4 Mbytes     1-8 Mbytes (5-30)
Chapter 12 Multimedia Information
Lossless Data Compression
[Figure: a source stream (e.g. "ASDF9H...") passes through lossless data
compression to a binary stream ("11010101...") and through data expansion
back to the exact original ("ASDF9H...").]
Data Compression
Information is produced by a source Usually contains redundancy
Lossless Data Compression system exploits redundancy to produce a more efficient (usually binary) representation of the information
Compressed stream is stored or transmitted depending on application
Data Expansion system recovers exact original information stream
Binary Tree Codes
Suppose an information source generates symbols from A = {a1, a2, ..., aK}
Binary tree code: K leaves, one leaf assigned to each symbol
The binary codeword for symbol aj is the sequence of bits from the root to the corresponding leaf
Encoding: use a table; decoding: trace the path from root to leaf, output the corresponding symbol; repeat

[Figure: binary tree with branches labeled 0/1 and leaves a1-a4.]

Encoding table:
  a1 -> 00
  a2 -> 1
  a3 -> 010
  a4 -> 011
Performance of Tree Code
Average number of encoded bits per source symbol: let l(aj) = length of codeword for aj

  E[l] = sum over j = 1..K of l(aj) P[aj]
To minimize above expression, assign short codeword to frequent symbols and longer codewords to less frequent symbols
Assume a 5-symbol information source {a, b, c, d, e} with symbol probabilities {1/4, 1/4, 1/4, 1/8, 1/8}

[Figure: binary tree code for the five symbols.]

Symbol  Codeword
  a       00
  b       01
  c       10
  d       110
  e       111

aedbbad... is mapped into 00 111 110 01 01 00 110 ... (17 bits)
Note: decoding is done without commas or spaces
Example
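Table-driven encoding and prefix-free decoding for this example code can be sketched as follows (helper names are mine):

```python
CODE = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}

def encode(symbols, code=CODE):
    """Encoding: look each symbol up in the table."""
    return "".join(code[s] for s in symbols)

def decode(bits, code=CODE):
    """Decoding: extend the current word bit by bit until it matches a
    codeword (a leaf is reached); no commas or spaces are needed."""
    inverse = {w: s for s, w in code.items()}
    out, word = [], ""
    for b in bits:
        word += b
        if word in inverse:
            out.append(inverse[word])
            word = ""
    return "".join(out)

enc = encode("aedbbad")   # 17 bits, as in the slide's example
```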
Finding Good Tree Codes
What is the best code if K = 2? Simple! There is only one tree code: assign 0 or 1 to each of the two symbols
What about K = 3? Assign the longest pair of codewords to the two least frequent symbols
  If you don't, then switching the most frequent symbol to the shortest codeword will reduce the average length
Picking the two least probable symbols is always the best thing to do
[Figure: candidate tree codes for K = 2 (leaves a1, a2) and K = 3 (a1, a2
share the longer codewords; a3 hangs directly off the root).]
Huffman Code
Algorithm for finding the optimum binary tree code for a set of symbols A = {1, 2, ..., K} (denote symbols by index)
Symbol probabilities: {p1, p2, p3, ..., pK}
Basic step: identify the two least probable symbols, say i and j
  Combine them into a new symbol (i,j) with probability pi + pj
  Remove i and j from A and replace them with (i,j)
  The new alphabet A has one fewer symbol
  If A has two symbols, stop; else repeat the basic step
Building the tree code: each time two symbols are combined, join them in the binary tree
Building the tree code by the Huffman algorithm

[Figure: probabilities a = .50, b = .20, c = .15, d = .10, e = .05;
d and e combine into .15, then .15 + .15 = .30, then .20 + .30 = .50,
then .50 + .50 = 1.00.]

The final tree code:
  a -> 0
  b -> 10
  c -> 110
  d -> 1110
  e -> 1111

E[l] = 1(.5) + 2(.20) + 3(.15) + 4(.10 + .05) = 1.95
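The basic step above maps onto a minimal heap-based sketch (names are mine; the exact 0/1 labels may differ from the tree shown, but the codeword lengths, and hence E[l], match):

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build an optimum prefix code by repeatedly merging the two
    least probable symbols (the basic step above)."""
    tiebreak = count()          # breaks probability ties; nodes never compared
    heap = [(p, next(tiebreak), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, n1 = heapq.heappop(heap)   # least probable
        p2, _, n2 = heapq.heappop(heap)   # next least probable
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (n1, n2)))
    code = {}
    def walk(node, prefix):               # read codewords off the tree
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            code[node] = prefix or "0"
    walk(heap[0][2], "")
    return code

probs = {"a": .50, "b": .20, "c": .15, "d": .10, "e": .05}
code = huffman_code(probs)
avg = sum(probs[s] * len(code[s]) for s in probs)   # E[l] = 1.95
```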
What is the best performance?
Can we do better? Huffman is optimum, so we cannot do better for A
If we take pairs of symbols, we have a different alphabet
  A' = {aa, ab, ac, ..., ba, bb, ..., ea, eb, ..., ee} with probabilities {(.5)(.5), (.5)(.2), ..., (.05)(.05)}
By taking pairs, triplets, and so on, we can usually improve performance
So what is the best possible performance? The entropy of the source
Entropy of an Information Source
Suppose a source produces symbols from alphabet A = {1, 2, ..., K} with probabilities {p1, p2, p3, ..., pK}, and source outputs are statistically independent of each other. Then the entropy H of the source is the best possible performance:

  H = sum over j = 1..K of pj log2(1/pj)  bits/symbol
Examples
Example 1: source with {.5, .2, .15, .10, .05}

  H = -(.5 ln .5 + .2 ln .2 + .15 ln .15 + .1 ln .1 + .05 ln .05)/ln 2 = 1.923 bits/symbol

Huffman code gave E[l] = 1.95, so it's pretty close to H

Example 2: source with K equiprobable symbols

  H = sum over j = 1..K of (1/K) log2 K = log2 K bits/symbol

Example 3: source with K = 2^m equiprobable symbols

  H = sum over j = 1..2^m of (1/2^m) log2 2^m = m bits/symbol
Fixed-length code with m bits is optimum!
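The entropy formula can be checked against these examples with a short sketch (the function name is mine):

```python
import math

def entropy(probs):
    """H = sum over j of p_j * log2(1/p_j), in bits/symbol."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

H = entropy([.5, .2, .15, .10, .05])   # Example 1: ~1.923 bits/symbol
H8 = entropy([1.0 / 8] * 8)            # K = 2^3 equiprobable -> 3 bits/symbol
```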
Run-Length Codes
When one symbol is much more frequent than the rest, block codes don't work well; run-length codes work better
  "Blank" in strings of alphanumeric information: ------$5----3-------------$2--------$3------
  "0" (white) and "1" (black) in fax documents
Parse the symbol stream into runs of the frequent symbol
Apply a Huffman or similar code to encode the lengths of the runs
Binary Run-Length Code 1
Use an m-bit counter to count complete runs up to length 2^m - 2.
If 2^m - 1 consecutive zeros occur, send m 1s (the all-ones codeword) to indicate run > 2^m - 2 and keep counting.

Input (run of zeros + 1)   Run length      Codeword (m = 4)
1                          0               0000
01                         1               0001
001                        2               0010
0001                       3               0011
...                        ...             ...
000...01 (2^m - 2 zeros)   2^m - 2         1110
000...0 (2^m - 1 zeros)    run > 2^m - 2   1111
Example: Code 1, m = 4

Input (137 bits): runs of 25, 57, 36, and 15 zeros, each terminated by a 1
Runs of white (w) / black (b): 15w 10w,b 15w 15w 15w 12w,b 15w 15w 6w,b 15w b
Run-length symbols: >14 10 >14 >14 >14 12 >14 >14 6 >14 0
Codewords: 1111 1010 1111 1111 1111 1100 1111 1111 0110 1111 0000 (44 bits)

Code 1 performance: m / E[R] encoded bits per source bit
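A minimal Code 1 encoder, reproducing the 137-bit to 44-bit example above (the function name is an assumption):

```python
def rle1_encode(bits, m=4):
    """Binary run-length Code 1: each codeword is m bits. A run of L
    zeros ended by a 1 (L <= 2^m - 2) is sent as L in binary; every
    2^m - 1 consecutive zeros are sent as the all-ones codeword."""
    full = 2**m - 1
    out, run = [], 0
    for b in bits:
        if b == "0":
            run += 1
            if run == full:                    # 2^m - 1 zeros: send m ones
                out.append(format(full, f"0{m}b"))
                run = 0
        else:                                  # terminating 1: send the count
            out.append(format(run, f"0{m}b"))
            run = 0
    return "".join(out)

src = "0" * 25 + "1" + "0" * 57 + "1" + "0" * 36 + "1" + "0" * 15 + "1"
enc = rle1_encode(src)   # 44 encoded bits for the 137-bit input above
```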
Binary Run-Length Code 2
When all-zero runs are frequent, encode that event with a single bit to get higher compression.
Each codeword for a terminated run is m + 1 bits: a 1 followed by the m-bit run length; a full block of 2^m zeros with no terminating 1 is encoded as a single 0.

Input (run of zeros + 1)   Run length      Codeword (m = 4)
1                          0               10000
01                         1               10001
001                        2               10010
...                        ...             ...
000...01 (2^m - 1 zeros)   2^m - 1         11111
000...0 (2^m zeros)        run > 2^m - 1   0

Example: Code 2, m = 4

Input (137 bits): runs of 25, 57, 36, and 15 zeros, each terminated by a 1
Runs of white (w) / black (b): 16w 9w,b 16w 16w 16w 9w,b 16w 16w 4w,b 15w,b
Run-length symbols: >15 9 >15 >15 >15 9 >15 >15 4 15
Codewords: 0 11001 0 0 0 11001 0 0 10100 11111 (26 bits)

Code 2 performance: E[l] / E[R] encoded bits per source bit
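The corresponding Code 2 encoder, reproducing the 26-bit result (the function name is an assumption):

```python
def rle2_encode(bits, m=4):
    """Binary run-length Code 2: a run of L zeros ended by a 1 is sent
    as '1' plus L in m bits; a full block of 2^m zeros as a single '0'."""
    block = 2**m
    out, run = [], 0
    for b in bits:
        if b == "0":
            run += 1
            if run == block:                   # all-zero block: 1 bit
                out.append("0")
                run = 0
        else:
            out.append("1" + format(run, f"0{m}b"))
            run = 0
    return "".join(out)

src = "0" * 25 + "1" + "0" * 57 + "1" + "0" * 36 + "1" + "0" * 15 + "1"
enc = rle2_encode(src)   # 26 encoded bits, vs 44 with Code 1
```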
[Figure: fax coding: (a) Huffman code applied to white runs and black
runs; (b) predictive coding: encode differences between consecutive lines.]
Fax Documents Use Run-Length Encoding
CCITT Group 3 facsimile standard
  Default: 1-D Huffman coding of run lengths
  Option: 2-D (predictive) run-length coding
Document                         CCITT G-II   G-III    G-IV
Business letter                  256K         17K      10K
Circuit diagram                  256K         15K      5.4K
Invoice                          256K         31K      14K
Dense text                       256K         54K      35K
Technical paper                  256K         32K      16K
Graph                            256K         23K      8.3K
Dense Japanese text              256K         53.5K    34.6K
Handwriting and simple graphics  256K         26K      10K
Average                          256K         31.4K    16.6K
Compression ratio                1            8.2      15.4
Maximum compression ratio                     17.1     47.4
Note: documents scanned at 200 x 100 pixels/in^2. G-IV intended for documents scanned at 400 x 100 pixels/in^2 and ISDN transmission at 64 kbps.
Adaptive Coding
Adaptive codes provide compression when symbol and pattern probabilities are unknown; essentially, the encoder learns/discovers frequent patterns
Lempel-Ziv algorithm: powerful & popular, incorporated in many utilities
Whenever a pattern is repeated in the symbol stream, it is replaced by a pointer to where it first occurred & a value to indicate the length of the pattern

Original: All tall We all are tall. All small We all are small
Encoded:  All_ta[2,3]We_[6,4]are[4,5]._[1,4]sm[6,15][31,5].
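A toy parser in the spirit of the example above (greedy matching with (offset, length) pointers; real Lempel-Ziv variants differ in the details, and all names here are mine):

```python
def lz77_encode(s, window=4096):
    """Greedy LZ-style parse: emit literal characters, or (offset, length)
    pointers back to the longest earlier match (minimum length 3)."""
    out, i = [], 0
    while i < len(s):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            while i + k < len(s) and j + k < i and s[j + k] == s[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        if best_len >= 3:                # a pointer only pays off for long matches
            out.append((best_off, best_len))
            i += best_len
        else:
            out.append(s[i])
            i += 1
    return out

def lz77_decode(tokens):
    s = []
    for t in tokens:
        if isinstance(t, tuple):         # copy 'length' chars from 'offset' back
            off, length = t
            for _ in range(length):
                s.append(s[-off])
        else:
            s.append(t)
    return "".join(s)

text = "All tall We all are tall. All small We all are small"
assert lz77_decode(lz77_encode(text)) == text
```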
Chapter 12 Multimedia Information
Compression of Analog Signals
[Figure: the speech signal level varies with time.]
Stream Information
A real-time voice signal must be digitized & transmitted as it is produced
Analog signal level varies continuously in time
[Figure: (a) a sampler converts the analog signal x(t) into samples
x(nT); (b) an interpolation filter reconstructs x(t) from x(nT).]
Nyquist: Perfect reconstruction if sampling rate 1/T > 2Ws
Sampling Theorem
Quantization of Analog Samples
[Figure: a 3 bits/sample uniform quantizer maps input x(nT) to output
y(nT) with representation values -3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5,
3.5; each original sample value is replaced by the closest approximation.]

Quantizer maps input into the closest of 2^m representation values
Quantization error: "noise" = x(nT) - y(nT)
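The quantizer in the figure can be sketched as a uniform mid-rise quantizer (a minimal sketch; the names and the clipping rule are my assumptions):

```python
import math

def midrise_quantize(x, m=3, xmax=4.0):
    """Map x to the closest of 2^m representation values, spaced delta
    apart (for m = 3, xmax = 4: +/-0.5, +/-1.5, +/-2.5, +/-3.5)."""
    delta = 2 * xmax / 2**m
    k = math.floor(x / delta)
    k = max(-(2**(m - 1)), min(2**(m - 1) - 1, k))   # clip out-of-range inputs
    return (k + 0.5) * delta
```

Within range, the quantization error is at most delta/2; more bits per sample shrink delta and hence the noise.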
Bit Rate of Digitized Signal
Bandwidth Ws Hertz: how fast the signal changes Higher bandwidth → more frequent samples Minimum sampling rate = 2 x Ws
Bit Rate = 2 Ws samples/second x m bits/sample
Representation accuracy: range of approximation error Higher accuracy
→ smaller spacing between approximation values
→ more bits per sample
SNR = 6m – 7 dB
Example: Voice & Audio
Telephone voice: Ws = 4 kHz -> 8000 samples/sec, 8 bits/sample
  Rs = 8 x 8000 = 64 kbps
  Cellular phones use more powerful compression algorithms: 8-12 kbps
CD audio: Ws = 22 kHz -> 44000 samples/sec, 16 bits/sample
  Rs = 16 x 44000 = 704 kbps per audio channel
  MP3 uses more powerful compression algorithms: 50 kbps per audio channel
Differential Coding
Successive samples tend to be correlated
Use prediction to get better quality for m bits
[Figure: a smooth signal x(kT) and its successive differences d(kT)
plotted for k = 1..19; the differences are much smaller in magnitude
than the signal itself.]
Differential PCM: quantize the difference between the prediction and the actual signal
The end-to-end error is only the error introduced by the quantizer:

  y(n) - x(n) = x̂(n) + d̃(n) - x(n) = d̃(n) - d(n) = e(n)
[Figure: DPCM encoder and decoder. Encoder: d(n) = x(n) - x̂(n) is
quantized to d̃(n) and sent to the channel; a linear predictor h, driven
by y(n) = x̂(n) + d̃(n), forms the prediction x̂(n). Decoder: the same
linear predictor reconstructs y(n) from the received d̃(n).]
  SNR_DPCM = SNR_PCM + 10 log10 Gp
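A first-order DPCM loop can be sketched to confirm that the end-to-end error equals the quantizer error (all names and the test signal are illustrative assumptions):

```python
def quantize(d, step=0.1):
    """Uniform quantizer for the prediction error (an illustrative choice)."""
    return step * round(d / step)

def dpcm_encode(x, h=1.0, step=0.1):
    """Predict each sample from the previous reconstructed sample and
    quantize only the prediction error d(n) = x(n) - x̂(n)."""
    y_prev, out = 0.0, []
    for xn in x:
        pred = h * y_prev                 # x̂(n)
        dq = quantize(xn - pred, step)    # d̃(n), sent to the channel
        out.append(dq)
        y_prev = pred + dq                # y(n): decoder's reconstruction
    return out

def dpcm_decode(dq, h=1.0):
    y_prev, y = 0.0, []
    for d in dq:
        y_prev = h * y_prev + d
        y.append(y_prev)
    return y

x = [0.0, 0.23, 0.48, 0.61, 0.60, 0.42]
y = dpcm_decode(dpcm_encode(x))
# reconstruction error is bounded by half the quantizer step
```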
Voice Codec Standards
A variety of voice codecs have been standardized for different target bit rates and implementation complexities. These include:
G.711    64 kbps      PCM
G.723.1  5-6 kbps     CELP
G.726    16-40 kbps   ADPCM
G.728    16 kbps      low-delay CELP
G.729    8 kbps       CELP
[Figure: flat quantization noise spectrum Q(f) exceeding the signal
spectrum X(f) at high frequencies, over the band -W to W.]
Transform Coding
Quantization noise in PCM is "white" (flat spectrum); at high frequencies, noise power can be higher than signal power
If coding can produce noise that is "shaped" so that signal power is always higher than noise power, masking effects in the ear result in better subjective quality
Transform coding maps the original signal into a different domain prior to encoding
[Figure: shaped noise spectrum Q(f) kept below the signal spectrum X(f)
across the band -W to W.]
Subband Coding
Subband coding is a form of transform coding
The original signal is decomposed into multiple signals occupying different frequency bands
Each band is PCM or DPCM encoded separately
Each band is allocated bits so that signal power is always higher than noise power in that band
MP3 Audio Coding
MP3 is coding for digital audio in MPEG Uses subband coding
Sampling rate: 16 to 48 kHz @ 16 bits/sample Audio signal decomposed into 32 subbands Fast Fourier transform used for decomposition Bits allocated according to signal power in subbands
Adjustable compression ratio Trade off bitrate vs quality 32 kbps to 384 kbps per audio signal
Chapter 12 Multimedia Information
Image and Video Coding
Image Coding
Two-dimensional signal: variation in intensity in 2 dimensions
RGB color representation
Raw representation requires a very large number of bits
Linear prediction & transform techniques applicable
Joint Photographic Experts Group (JPEG) standard
Transform Coding
Time signal on left side is smooth, that is, it changes slowly with time
If we take its discrete cosine transform (DCT) we find that the non-negligible frequency components are clustered near zero frequency; other components are negligible.
[Figure: (a) 1-D DCT of a smooth time signal x(t): the non-negligible
components of X(f) cluster near zero frequency. (b) 2-D DCT of an 8x8
block of pixel values (n,m): the non-negligible coefficients (u,v)
cluster at low spatial frequencies.]
Take a block of samples from a smooth image If we take two-dimensional DCT, non-negligible
values will cluster near low spatial frequencies (upper left-hand corner)
Image Transform Coding
Sample Image in 8x8 blocks
In image and video coding, the picture array is divided into 8x8 pixel blocks which are coded separately.
Quantized DCT coefficients are scanned in zigzag fashion
Resulting sequence is run-length and variable-length (Huffman) coded
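The zigzag scan order can be generated by walking the anti-diagonals of the block (a sketch; the function name is mine):

```python
def zigzag_order(n=8):
    """Visit an n x n block in zigzag order: coefficients are grouped by
    anti-diagonal (constant row + col), so low spatial frequencies,
    which carry most of the energy, come first."""
    order = []
    for s in range(2 * n - 1):           # s = row + col along each diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

scan = zigzag_order(8)   # starts at the DC coefficient (0, 0)
```

Scanning the quantized coefficients in this order groups the trailing zeros into long runs, which the run-length and Huffman stages then compress.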
8x8 block of 8-bit pixel values:

  180 150 115 100 100 100 100 100
  250 180 128 100 100 100 100 100
  190 170 120 100 100 100 100 100
  160 130 110 100 100 100 100 100
  110 100 100 100 100 100 100 100
  100 100 100 100 100 100 100 100
  100 100 100 100 100 100 100 100
  100 100 100 100 100 100 100 100

      | DCT + quantization
      v

Quantized DCT coefficients:

  111  22  15   5   1   0   0   0
   14  17  10   4   1   0   0   0
    2   2   1   0   0   0   0   0
   -4  -4  -2  -1   0   0   0   0
   -3  -3  -1   0   0   0   0   0
   -1  -1   0   0   0   0   0   0
    0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0
DCT Coding
JPEG Image Coding Standard JPEG defines:
Several coding modes for different applications Quantization matrices for DCT coefficients Huffman VLC coding tables
Baseline DCT/VLC coding gives 5:1 to 30:1 compression
[Figure: JPEG baseline encoder: 8x8 block -> DCT -> quantization (using
quantization matrices) -> VLC coding (Huffman tables; DC coefficients
DPCM/VLI coded, AC coefficients zero-run/VLI coded). The DCT/I-DCT pair
is symmetric.]
[Figure: the same image JPEG-coded at low rate (23.5 kbytes) and high
rate (64.8 kbytes); look for jaggedness along boundaries.]
Video Signal
Sequence of picture frames: each picture digitized & compressed
Frame repetition rate: 10, 30, or 60 frames/second depending on quality
Frame resolution
  Small frames for videoconferencing
  Standard frames for conventional broadcast TV
  HDTV frames
Rate = M bits/pixel x (W x H) pixels/frame x F frames/second
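The rate formula can be checked against the uncompressed figures used later in the chapter (the function name is mine):

```python
def video_bit_rate(bits_per_pixel, width, height, fps):
    """Rate = M bits/pixel x (W x H) pixels/frame x F frames/second."""
    return bits_per_pixel * width * height * fps

# Uncompressed 720x480 broadcast TV at 30 frames/sec, 24 bits/pixel:
rate = video_bit_rate(24, 720, 480, 30)   # ~249 Mbps
```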
A scanned color picture produces 3 color component signals
  [Y]   [0.30  0.59  0.11] [R]
  [I] = [0.60 -0.28 -0.32] [G]
  [Q]   [0.21 -0.52  0.31] [B]

  Y: luminance signal (black & white)
  I, Q: chrominance signals
Color Representation
RGB (Red, Green, Blue)
  Each RGB component has the same bandwidth and dynamic range
YUV: commonly used to mean YCbCr, where Y represents the intensity and Cb and Cr represent chrominance information
  Derived from "color difference" video signals: Y, R-Y, B-Y
  Y = 0.299R + 0.587G + 0.114B
Sampling ratio of Y:Cb:Cr
  Y is typically sampled more finely than Cb & Cr
  4:4:4, 4:2:2, 4:2:0, 4:1:1
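The luma equation above, as a one-liner (the function name is mine):

```python
def luma(r, g, b):
    """Y = 0.299 R + 0.587 G + 0.114 B. The coefficients sum to 1,
    so a gray input (r = g = b) maps to Y = r."""
    return 0.299 * r + 0.587 * g + 0.114 * b
```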
[Figure: typical frame sizes. (a) QCIF videoconferencing, 176x144 pixels
at 30 frames/sec = 760,000 pixels/sec. (b) Broadcast TV, 720x480 pixels
at 30 frames/sec = 10.4 x 10^6 pixels/sec. (c) HDTV, 1920x1080 pixels at
30 frames/sec = 67 x 10^6 pixels/sec.]
Typical Video formats
CIF: Common Interchange Format
  352x288 pixels, 30 frames/second, 4:2:0 sampling
SIF: Source Input Format
  360x242 pixels, 30 frames/second, 4:2:0 sampling
  360x288 pixels, 25 frames/second, 4:2:0 sampling
CCIR-601 (ITU-601)
  720x525 pixels, 30 frames/second, 4:4:4 & 4:2:2 sampling
  720x625 pixels, 25 frames/second, 4:4:4 & 4:2:2 sampling
Video Compression Techniques
Intraframe coding: compression of single image, e.g. JPEG
Interframe coding: compression of the difference between the current image block & a reference block in another frame
  Requires motion compensation
  Prediction: reference frame is in the past
  Interpolation: reference frames are in the past & future
[Figure: hybrid encoder with intra-/inter-frame processor and frame
buffer. For each block at (x,y) in frame Fn, a motion vector points to
the best-matching block in the previous frame Fn-1; the encoder outputs
the motion vector and an error block, or an intra block.]
Find block from previous frame that best matches current block; transmit displacement vector
Encode difference between current & previous block
H.261 Encoder

[Figure: H.261 encoder. 8x8 blocks pass through DCT, quantizer (Q), and
Huffman VLC, then CRC error control and fixed-length framing onto the
p x 64 kbps channel. A feedback loop with inverse quantizer (Q^-1),
I-DCT, frame memory, loop filter, and motion estimation supplies the
intra/inter prediction and the motion vectors.]
Intended for videoconferencing applications; bit rates = p x 64 kbps (p = 2, 6, 24 common)
Video Codecs: H.263
Frame-based coding
Low bit rate coding: < 64 kbps (typical)
H.261 coding with improvements: I/P/B frames; additional image formats: 4CIF, 16CIF
Suitable for desktop videoconferencing over low-speed links
MPEG Coding Standard
Moving Picture Experts Group (MPEG): video and audio compression & multiplexing; video display controls
  Fast forward, reverse, random access
Elements of encoding:
  Intra- and inter-frame coding using DCT
  Bidirectional motion compensation
  Group-of-pictures structure
  Scalability options
MPEG only standardizes the decoder
MPEG Video Block Diagram
[Figure: MPEG video encoder/decoder block diagram.]

DCT: Discrete Cosine Transform   FS: Frame Store   MC: Motion Compensation
VB: Variable Buffer   VLC: Variable-length coding   VLD: Variable-length decoding
MPEG Motion Compensation

[Figure: 1-D examples. Linear prediction uses the previous frame Fn-1 to
predict Fn; interpolation (bidirectional MC) uses both Fn-1 and Fn+1.
Macroblocks may be intra, forward, backward, or bidirectionally
predicted; individual samples are quantized.]
Bidirectional Motion Compensation

[Figure: group-of-pictures structure with 16x16 bidirectional
macroblocks. Frames in order: Intra 0, B1, B2, Pred 3, B4, B5, Pred 6,
B7, B8, Intra 9, B10, B11, Pred 12. Arrows show intra, forward, reverse,
and bidirectional prediction.]
Group of Picture Structure
I-frames: intraframe coded, for random access; lowest compression
P-frames: predictively encoded from the most recent I- or P-frame; medium compression
B-frames: interpolated from the most recent & subsequent I- or P-frame; highest compression
MPEG-2 Scalability Modes
Scalability modes
  Data partitioning: separate headers and payloads
  SNR (signal-to-noise ratio): different levels of quality
  Temporal: different frame rates
  Spatial: different resolutions
Limited scalability capabilities: three layers only
[Figure: spatial scalability: decoding the base stream alone yields a
small image; decoding base + enhancement streams yields a large image.
SNR scalability: the base stream gives low quality; base + enhancement
gives high quality.]
MPEG Scalability
MPEG Versions
MPEG-1
  For video storage on CD-ROM & transmission over T-1 lines (1.5 Mbps)
MPEG-2
  Many options: 352x240, 720x480, 1440x1152, 1920x1080 pixels
  Many profiles (sets of coding tools & parameters)
  Main Profile: I, P & B frames; 720x480 conventional TV; very good quality @ 4-6 Mbps
MPEG-4
  <64 kbps to 4 Mbps
  Designed to enable viewing, access & manipulation of objects, not only pixels
  For digital TV, streaming video, mobile multimedia & games
MPEG Systems and Multiplex
Provides packetization and multiplexing for audio/video elementary streams
Provides timing and error control information
MPEG-1 systems:
  System streams: long variable-size packets, suitable for error-free environments
MPEG-2 systems:
  Transport streams: short fixed-size packets, suitable for error-prone environments
  Program streams: long variable-size packets, suitable for relatively error-free environments
[Figure: MPEG systems layer. Audio and video encoder outputs are
packetized into PES streams, which feed either a program stream
multiplexer or a transport stream multiplexer.]
MPEG-2 Multiplexing
Packetized Elementary Streams (PES)
  Packet length, presentation & decoding timestamps, bit rate
  For lip-synch & clock recovery
Program streams: for error-free environments
Transport streams: for error-prone environments
Digital Video Summary
Type             Method   Format                           Original    Compressed
Videoconference  H.261    176x144 or 352x288 pixels        2-36 Mbps   64-1544 kbps
                          @ 10-30 frames/sec
Full motion      MPEG2    720x480 pixels @ 30 frames/sec   249 Mbps    2-6 Mbps
HDTV             MPEG2    1920x1080 pixels @ 30 frames/sec 1.6 Gbps    19-38 Mbps