Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005 1
Introduction
Much information comes in the form of images
Images are handled by machines as a matrix of digital picture elements, or pixels
The appearance of an image depends on
  image type
  resolution
Types of images & Resolution
bilevel (black & white), e.g. faxes
grayscale
color
dots per inch (dpi)
  600 x 600: current medium-quality laser printer
  1200 x 1200: low-cost phototypesetter
  4800 x 4800: high-resolution phototypesetter
Bilevel images: CCITT fax standard
fax: facsimile
CCITT: Comité Consultatif International Téléphonique et Télégraphique; it is part of the ITU (International Telecommunication Union), one of the specialized agencies of the United Nations
In the late 70s CCITT started working on a standard for fax transmission
1980: CCITT Group 3 standard
  Groups 1 & 2 were earlier attempts, which used simpler encoding and modulation techniques, resulting in very slow transmissions
CCITT Group 3 - I
It is the most common standard for fax transmission
It is accepted worldwide; almost every fax machine supports this standard
It uses compression algorithms for bilevel images
CCITT Group 3 - II
Paper size: international A4 (not US letter)
Standard resolution: 204x98 dpi (nominally 200x100)
High resolution: 204x196 dpi (nominally 200x200)
1728 bits/line
1188 lines/page
Bilevel image: 1 bit/pixel
Image size: 1728x1188 bits at standard resolution, about 2 Mbit
Transmission rate: 4.8 Kbit/s
  today it is usually higher, 14.4 – 33.6 Kbit/s
At 4.8 Kbit/s in standard resolution one uncompressed page would take about 430 s, but only about 1 minute on average with Group 3 algorithms
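The page size and the uncompressed transmission time quoted above can be checked with a few lines of arithmetic. This is only a sketch using the slide's figures; the helper name is hypothetical.

```python
# Figures taken from the slide; the helper name is hypothetical.
BITS_PER_LINE = 1728
LINES_PER_PAGE = 1188      # standard resolution
RATE_BPS = 4800            # 4.8 Kbit/s

def uncompressed_seconds(bits_per_line, lines, rate_bps):
    """Time to send one bilevel page at 1 bit/pixel with no compression."""
    return bits_per_line * lines / rate_bps

page_bits = BITS_PER_LINE * LINES_PER_PAGE
print(page_bits)  # 2052864 bits, i.e. about 2 Mbit
print(round(uncompressed_seconds(BITS_PER_LINE, LINES_PER_PAGE, RATE_BPS)))  # 428
```

The result, about 428 s, agrees with the "about 430 s" quoted for an uncompressed page.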
Run-length coding
Each scan line is composed of runs of pixels of the same color
Count the number of elements in each run
Example: 3w 4b 9w 2b 2w 6b 5w 2b 5w...
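A minimal sketch of run-length coding, assuming a scan line is represented as a string of 'w' and 'b' characters (a representation chosen here only for illustration):

```python
from itertools import groupby

def run_lengths(line):
    """Return (length, color) runs for a scan line given as a string of 'w'/'b'."""
    return [(len(list(group)), color) for color, group in groupby(line)]

line = "w" * 3 + "b" * 4 + "w" * 9 + "b" * 2 + "w" * 2 + "b" * 6
print(" ".join(f"{n}{c}" for n, c in run_lengths(line)))
# 3w 4b 9w 2b 2w 6b
```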
G3 1D
Group 3 One-Dimensional coding (G3 1D) is called Modified Huffman (MH), as it encodes run lengths using a predefined Huffman code
In order to maintain black/white synchronization, each line begins with a white run, possibly of zero length
G3 1D
1000 011 10100 11 0111 0010 ...
The predefined Huffman codewords were derived from the probabilities of the runs in typical handwritten documents
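The bit string above appears to be the coding of the first six runs of the earlier example (3w 4b 9w 2b 2w 6b). Under that assumption, the sketch below encodes the runs with a table holding only those six codewords; the complete white and black tables are defined in the Group 3 standard and are not reproduced here.

```python
# Only the six codewords visible in the slide example (an excerpt, not the
# full tables); the pairing with the run-length example is an assumption.
WHITE = {2: "0111", 3: "1000", 9: "10100"}
BLACK = {2: "11", 4: "011", 6: "0010"}

def encode_runs(runs):
    """Encode (length, color) runs, color being 'w' or 'b'."""
    out = []
    for length, color in runs:
        table = WHITE if color == "w" else BLACK
        out.append(table[length])   # KeyError for lengths outside the excerpt
    return " ".join(out)

runs = [(3, "w"), (4, "b"), (9, "w"), (2, "b"), (2, "w"), (6, "b")]
print(encode_runs(runs))  # 1000 011 10100 11 0111 0010
```

Note that white and black runs use separate tables, so the same length can map to different codewords depending on the color.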
G3 1D
As one line has 1728 bits, we have to define a codeword for every black and white run length up to 1728
As shorter runs occur more frequently than longer runs, we code each run length in an additive form
  there are terminating and makeup codewords
  Lengths from 0 to 63 are coded with a single terminating codeword
  Longer runs are coded with one or more makeup codewords and a terminating codeword
Each line is terminated with an EOL symbol composed of eleven 0s and one 1
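The additive form can be sketched as a split of the run length into a multiple of 64 plus a remainder. The function name is hypothetical, and the sketch assumes that a single makeup codeword exists for every multiple of 64 up to the line width.

```python
def split_run(length):
    """Split a run length into (makeup, terminating) parts.

    makeup is a multiple of 64 (0 means no makeup codeword is needed),
    terminating is in 0..63.
    """
    if not 0 <= length <= 1728:
        raise ValueError("run length must fit in a 1728-pixel line")
    return (length // 64) * 64, length % 64

print(split_run(40))    # (0, 40)
print(split_run(200))   # (192, 8)
print(split_run(1728))  # (1728, 0)
```

With this split, only 64 terminating codewords plus a few dozen makeup codewords per color are needed, instead of 1729 separate codewords.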
G3 2D
Group 3 Two-Dimensional coding (G3 2D) is called Modified READ (MR), as it is a variant of a previously defined code called READ (Relative Element Address Designate)
Many images have a high degree of vertical coherence between consecutive lines
  changing elements are coded w.r.t. a “nearby” change position of the same color in the previous (reference) line
G3 2D
Nearby means within an interval of radius 3 pixels
If there are changing elements in the current line without correspondents in the reference line, switch to horizontal mode (1D)
Conversely, if the reference line has a run with no counterpart in the current line, a special pass code is used
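The choice among the three modes can be sketched as follows. This is a simplified illustration, not a full coder: the names a1, b1 and b2 follow common descriptions of READ coding and do not appear on the slide, and a real coder also tracks the current starting position and the run colors.

```python
def choose_mode(a1, b1, b2):
    """Pick a coding mode for the next changing element.

    a1: position of the next change on the current line
    b1: position of the candidate matching change on the reference line
    b2: position of the change that follows b1 on the reference line
    """
    if b2 < a1:
        return ("pass", None)             # reference run has no counterpart
    if abs(a1 - b1) <= 3:
        return ("vertical", a1 - b1)      # offset in -3..+3
    return ("horizontal", None)           # fall back to 1D run lengths

print(choose_mode(a1=12, b1=10, b2=15))  # ('vertical', 2)
print(choose_mode(a1=30, b1=10, b2=14))  # ('pass', None)
print(choose_mode(a1=20, b1=10, b2=25))  # ('horizontal', None)
```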
G3 2D
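[Figure: a reference line and a current line, with successive changing elements coded in vertical mode, horizontal mode, pass code and vertical mode]
horizontal mode: <mode | length of preceding white run | length of black run>
vertical mode: offsets such as +2, -2, -1, 0, taken from a Huffman table with codewords for -3, -2, -1, 0, +1, +2, +3
generated code: 0001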
G3 2D
Two-dimensional coding is more prone to transmission errors
In G3 1D an error may cause problems in the entire line, but synchronization is forced back by the EOL codeword
Here an error in the reference line is likely to propagate to all the following lines
For this reason there is one reference line for every k lines (i.e. k-1 lines are coded w.r.t. each reference line)
  standard resolution: k=2
  high resolution: k=4
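A small sketch of how the parameter k limits error propagation, assuming (for illustration) that line 0 and every k-th line after it are coded in 1D mode:

```python
def line_modes(n_lines, k):
    """Label each line '1D' (reference) or '2D', with one 1D line every k lines."""
    return ["1D" if i % k == 0 else "2D" for i in range(n_lines)]

print(line_modes(8, 2))  # ['1D', '2D', '1D', '2D', '1D', '2D', '1D', '2D']
print(line_modes(8, 4))  # ['1D', '2D', '2D', '2D', '1D', '2D', '2D', '2D']
```

An error can therefore damage at most k consecutive lines before a 1D-coded line restores a clean reference.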
CCITT fax standard compression performances
Standard resolution (~200x100 dpi)
  G3 1D: 0.13 bits/pixel, 57 s for A4 at 4.8 Kbps
  G3 2D (k=2): 0.11 bits/pixel, 47 s for A4 at 4.8 Kbps
High resolution (~200x200 dpi)
  G3 2D (k=4): 0.09 bits/pixel, 74 s for A4 at 4.8 Kbps
Compression is very good for office images, where run lengths are long
It would be very bad for bilevel natural images
Continuous-tone images: why lossless compression?
Lossy compression is often preferred, as it gives much smaller images with good quality
However, there are some situations in which an approximation may not be adequate
  medical images
  historical documents
  images with legal relevance
Continuous-tone images: lossless compression
GIF standard
PNG standard
JPEG-LS
  It is a fairly new standard. The original JPEG standard included a lossless mode, but its performance was not close to the state of the art
  estimation of each pixel value using a quite simple context: an effective and low-cost solution
  www.hpl.hp.com/loco
GIF image format - I
Adopted by CompuServe to minimize the time required to download images over a modem link
The most widely used lossless image format until 1995
8-bit pixel description
Images have at most 256 colors, but a color map makes it possible to choose which ones
GIF image format - II
The color map can be specified for each image or can be omitted
  if specified, it is included as a header in the image file, in uncompressed form
  the color map is composed of 256 24-bit entries, which specify 256 RGB colors
The compression scheme used is LZW
  Alphabet symbols are the 256 colors of the color map plus a “clear” code and an “end-of-information” code
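A minimal sketch of LZW over this alphabet, assuming the “clear” and “end-of-information” codes take the values 256 and 257 right after the 256 colors. It returns a list of integer codes and leaves out what a real GIF encoder adds on top: variable-width bit packing, the limit on the code size, and dictionary resets.

```python
CLEAR, END = 256, 257          # the two extra codes named on the slide

def lzw_encode(pixels):
    """Encode a sequence of color indices (0..255) as a list of LZW codes."""
    table = {(i,): i for i in range(256)}
    next_code = 258            # first free code after CLEAR and END
    codes = [CLEAR]            # start with a fresh dictionary
    current = ()
    for p in pixels:
        candidate = current + (p,)
        if candidate in table:
            current = candidate            # keep extending the match
        else:
            codes.append(table[current])   # emit the longest known string
            table[candidate] = next_code   # and learn the new one
            next_code += 1
            current = (p,)
    if current:
        codes.append(table[current])
    codes.append(END)
    return codes

print(lzw_encode([7, 7, 7, 7, 7, 7]))  # [256, 7, 258, 259, 257]
```

Six identical pixels are coded with three data codes, because the dictionary learns longer and longer runs of the same color as it goes.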
GIF image format - III
Even if this feature is not widely used, GIF files may contain more than one image, and the images may share the color map
LZW-coded information is grouped into blocks, each preceded by a byte count, so that an image can be skipped without decompressing it
In 1995 Unisys announced that royalties would be due on GIF implementations, because of an old patent it held on LZW
This catalyzed the development of a new lossless image format, designed to be in the public domain and to include the latest improvements
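The byte-count blocks mentioned above can be sketched as follows. The function names are hypothetical; the 255-byte block limit and the zero-length terminating block follow the GIF specification rather than the slide.

```python
def to_sub_blocks(data, max_len=255):
    """Split bytes into sub-blocks, each preceded by a one-byte length.

    A zero-length block marks the end of the sequence.
    """
    out = bytearray()
    for i in range(0, len(data), max_len):
        chunk = data[i:i + max_len]
        out.append(len(chunk))
        out += chunk
    out.append(0)
    return bytes(out)

def skip_sub_blocks(stream, pos):
    """Return the position just past the sub-blocks starting at pos,
    without looking at the payload bytes."""
    while stream[pos] != 0:
        pos += 1 + stream[pos]
    return pos + 1

stream = to_sub_blocks(bytes(600)) + b"next"
print(skip_sub_blocks(stream, 0))  # 604
```

Skipping 600 bytes of coded data takes only three jumps, and none of the LZW codes has to be decoded.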
PNG image format - I
Portable Network Graphics (pronounced “ping”)
  it uses the same compression scheme as gzip
  thanks to some improvements, the compression obtained is about 10-30% better than GIF
By default it encodes the pixels in raster scan order, but some other methods are available
  it is possible to code the horizontal difference, i.e. the difference between the current pixel value and the previous one
  or the vertical difference, i.e. the difference w.r.t. the pixel above
  or the average difference, i.e. the difference w.r.t. the average of the pixel above and the previous pixel
  ...
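The difference methods can be sketched on rows of 8-bit samples. The sketch assumes one byte per pixel (e.g. 8-bit grayscale) and treats missing neighbors as 0; the average method uses the left and above neighbors, as PNG's Average filter does. Function names are hypothetical.

```python
def filter_sub(row):
    """Horizontal difference: each byte minus the previous one (mod 256)."""
    return [(x - (row[i - 1] if i else 0)) % 256 for i, x in enumerate(row)]

def filter_up(row, above):
    """Vertical difference: each byte minus the byte above (mod 256)."""
    return [(x - a) % 256 for x, a in zip(row, above)]

def filter_average(row, above):
    """Difference w.r.t. the floor of the average of left and above bytes."""
    out = []
    for i, x in enumerate(row):
        left = row[i - 1] if i else 0
        out.append((x - (left + above[i]) // 2) % 256)
    return out

above = [10, 12, 14, 16]
row = [11, 13, 15, 17]
print(filter_sub(row))             # [11, 2, 2, 2]
print(filter_up(row, above))       # [1, 1, 1, 1]
print(filter_average(row, above))  # [6, 2, 2, 2]
```

On smooth image data the differences are small and repetitive, which is what makes the subsequent dictionary-based compression more effective.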
PNG image format - II
It is possible to use more than 256 colors, up to 16-bit grayscale and 48-bit color
GIF uses one special pixel value to indicate transparency; PNG allows 256 different transparency levels per pixel, so that a picture can fade progressively into the background
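How a per-pixel transparency level lets a picture fade into the background can be sketched with the usual compositing formula. This is an illustrative assumption about how a viewer would use the value, not part of the PNG format itself.

```python
def blend(fg, bg, alpha):
    """Composite one 8-bit sample over a background; alpha is 0..255."""
    return (fg * alpha + bg * (255 - alpha) + 127) // 255

print(blend(200, 50, 255))  # 200  (opaque)
print(blend(200, 50, 0))    # 50   (fully transparent)
print(blend(200, 50, 128))  # 125  (roughly half-way)
```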
It seems inevitable that PNG format will gradually assume the role of standard lossless image format for the WWW, replacing GIF
Continuous-tone images: why lossy compression?
Digital images are already an approximation of the real analog phenomenon
Lossy techniques achieve very good compression with a modest loss of detail
This is useful for storing and transmitting images
Continuous-tone images: lossy compression
JPEG
JPEG2000
  a new image coding system that uses state-of-the-art compression techniques based on wavelet technology
  file extension .jp2
  At high compression ratios, for the same file size, the perceived quality of JPEG2000 images is better than that of JPEG images
JPEG format - I
JPEG is a standard defined by the Joint Photographic Experts Group in 1992
It was conceived to transmit images at 64 Kbps
It has a lossy mode and a lossless mode (little used, and today replaced by the JPEG-LS standard)
In lossy mode it gives very good quality at about 1 bit/pixel
Implementation complexity is reasonable
JPEG format - II
It can be used with grayscale and color images
Each channel of the color space (RGB, YUV...) is treated separately
It allows progressive transmission (which is much better suited to the WWW than raster transmission)
Raster vs. progressive transmission
JPEG Coder - I
[Block diagram: Discrete Cosine Transform -> Quantization -> Binary Encoder]
JPEG Coder - II
Image is divided into 8x8-pixel squares
Preprocessing
Apply Discrete Cosine Transform on each square
Coefficient quantization
Bit stream encoding
Preprocessing: color space transformation & downsampling
from RGB into YUV
  The Y component represents the brightness of a pixel, and the U and V components together represent the hue and saturation
  The human eye can see more detail in the Y component than in U and V, which can therefore be compressed more aggressively
4:4:4: no downsampling
4:2:2: horizontal downsampling by a factor of 2
4:2:0: both horizontal and vertical downsampling
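The preprocessing step can be sketched as follows. The slide speaks of YUV; the sketch assumes the closely related YCbCr form with the coefficients used by JFIF files, and assumes that 4:2:0 downsampling is done by averaging 2x2 blocks (other filters are possible). Function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """JFIF-style conversion of one 8-bit RGB pixel to (Y, Cb, Cr)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # round and clamp to the 8-bit range
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))

def downsample_420(plane):
    """Average each 2x2 block of a chroma plane (even width and height)."""
    return [
        [
            (plane[r][c] + plane[r][c + 1]
             + plane[r + 1][c] + plane[r + 1][c + 1] + 2) // 4
            for c in range(0, len(plane[0]), 2)
        ]
        for r in range(0, len(plane), 2)
    ]

print(rgb_to_ycbcr(255, 255, 255))               # (255, 128, 128)
print(rgb_to_ycbcr(255, 0, 0))                   # (76, 85, 255)
print(downsample_420([[100, 102], [104, 106]]))  # [[103]]
```

With 4:2:0, each chroma plane keeps one sample for every four pixels, so the three planes together hold half as many samples as with 4:4:4.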
Discrete Cosine Transform - I
The discrete cosine transform (DCT) is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers
It is used in JPEG because it is fast and quite easy to implement efficiently
Discrete Cosine Transform - II
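The formula on this slide was a figure and is not available in the text. A standard two-dimensional DCT (type II), written with the symbols defined on this slide, is given here as a reference; the normalization factors $c_1$ and $c_2$ are an assumption, since they differ between conventions.

```latex
B(k_1,k_2) = c_1(k_1)\,c_2(k_2) \sum_{i=0}^{N_1-1} \sum_{j=0}^{N_2-1} A(i,j)\,
\cos\!\left[\frac{\pi k_1 (2i+1)}{2N_1}\right]
\cos\!\left[\frac{\pi k_2 (2j+1)}{2N_2}\right]
```

In the orthonormal convention, $c_m(0)=\sqrt{1/N_m}$ and $c_m(k)=\sqrt{2/N_m}$ for $k>0$.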
where
  the block is $N_1 \times N_2$ pixels (in JPEG, 8x8)
  $A(i,j)$ is the value of the pixel at position $(i,j)$
  $B(k_1,k_2)$ is the DCT coefficient at position $(k_1,k_2)$
  low values of $k_1$ correspond to low vertical frequencies, low values of $k_2$ to low horizontal frequencies
  Generally, higher frequencies have very low values
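A direct, unoptimized sketch of the two-dimensional DCT and its inverse, using the symbols A, B, k1 and k2 as defined above. The orthonormal scaling is an assumption (conventions differ), and real codecs use fast factorizations instead of these nested sums.

```python
import math

def _c(k, n):
    """Normalization factor that makes the transform orthonormal."""
    return math.sqrt((1 if k == 0 else 2) / n)

def _cos(k, x, n):
    return math.cos(math.pi * k * (2 * x + 1) / (2 * n))

def dct2(a):
    """2D DCT-II of a block a[i][j] with N1 rows and N2 columns."""
    n1, n2 = len(a), len(a[0])
    return [[_c(k1, n1) * _c(k2, n2) * sum(
                a[i][j] * _cos(k1, i, n1) * _cos(k2, j, n2)
                for i in range(n1) for j in range(n2))
             for k2 in range(n2)] for k1 in range(n1)]

def idct2(b):
    """Inverse transform: rebuilds the pixels from the coefficients."""
    n1, n2 = len(b), len(b[0])
    return [[sum(_c(k1, n1) * _c(k2, n2) * b[k1][k2]
                 * _cos(k1, i, n1) * _cos(k2, j, n2)
                 for k1 in range(n1) for k2 in range(n2))
             for j in range(n2)] for i in range(n1)]

flat = [[100] * 8 for _ in range(8)]
B = dct2(flat)
print(round(B[0][0]))  # 800: a constant block has only the (0,0) coefficient
```

For a constant block all the energy ends up in B(0,0), which illustrates why smooth image regions give very small high-frequency coefficients.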
Discrete Cosine Transform - III
DCT basis functions
each 8x8 square is represented by 64 coefficients
Discrete Cosine Transform - IV
Knowing the 64 DCT coefficients with infinite precision, it is possible to reconstruct the pixels of the square exactly
But
  finite precision
  quantization of the coefficients (always)
  Some coefficients related to high frequencies are not transmitted. This allows higher compression without sacrificing too much quality, as the human eye is less sensitive to them
Quantization - I
The DCT matrix obtained is scaled differently in each component, dividing each coefficient by a different factor
  the factor for each component has been chosen based on human sensitivity to changes at each frequency
In practice the matrix of factors is usually the following [matrix shown as a figure on the slide]
Quantization - II
Next, all values are rounded to the nearest integer
This leads to a large number of 0s in the high-frequency zone, as the factors there are bigger
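Quantization and its inverse can be sketched as below. The matrix of factors on the slide was a figure; the table used here is the example luminance table from Annex K of the JPEG standard, which is an assumption about what the slide showed. Function names are hypothetical.

```python
# Example luminance quantization table (JPEG standard, Annex K); assumed here.
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize(coeffs, q=Q_LUMA):
    """Divide each coefficient by its factor and round to an integer."""
    return [[round(coeffs[r][c] / q[r][c]) for c in range(8)] for r in range(8)]

def dequantize(levels, q=Q_LUMA):
    """Multiply back; the rounding error is not recoverable."""
    return [[levels[r][c] * q[r][c] for c in range(8)] for r in range(8)]

coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[7][7] = 800.0, 30.0, 30.0
levels = quantize(coeffs)
print(levels[0][0], levels[0][1], levels[7][7])        # 50 3 0
restored = dequantize(levels)
print(restored[0][0], restored[0][1], restored[7][7])  # 800 33 0
```

The same value 30 survives (approximately) at a low frequency but is rounded to 0 at the highest frequency, where the factor is 99.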
Zig-zag scan
Low-frequency coefficients are transmitted before higher-frequency coefficients
This allows for progressive visualization of the 8x8 block
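The zig-zag order can be generated by walking the anti-diagonals of the block and alternating direction. A minimal sketch (the function name is hypothetical):

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col, one anti-diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # even diagonals run bottom-left to top-right, odd ones the other way
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order

print(zigzag_order()[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

Since row + col grows along the scan, coefficients are visited roughly in order of increasing frequency, and the zeros produced by quantization cluster at the end.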
Raster vs. progressive transmission
Raster transmission
  the DCT coefficients of the upper-left block, then those of all the other blocks in the upper part of the image, and so on
Progressive transmission
  first all (0,0) coefficients, then all (0,1) coefficients and so on, following the zig-zag scan in each block
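The two orders differ only in which loop is the outer one. In the sketch below k is the index of a coefficient along the zig-zag scan; the function names are hypothetical, and the one-coefficient-at-a-time progression follows the simplified description above.

```python
def raster_order(n_blocks, n_coeffs=64):
    """(block, coefficient) pairs sent block by block."""
    return [(b, k) for b in range(n_blocks) for k in range(n_coeffs)]

def progressive_order(n_blocks, n_coeffs=64):
    """(block, coefficient) pairs sent coefficient by coefficient."""
    return [(b, k) for k in range(n_coeffs) for b in range(n_blocks)]

print(raster_order(3, 2))       # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
print(progressive_order(3, 2))  # [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```

With the progressive order, a rough version of the whole image is available as soon as the first coefficient of every block has arrived.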
Binary coding
DCT(0,0) usually varies very slowly from one block to the next, as it is proportional to the mean value of the block
  For this reason it is convenient to encode the difference from the previous value
Typically the bit stream is coded with Huffman codes
  It is possible to use arithmetic coding, gaining some compression at the cost of decoding speed
  Huffman codes are predefined, or it is possible to build optimal tables and insert them in the stream
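The differential coding of the DCT(0,0) values can be sketched as follows, assuming the predictor starts at 0 for the first block (function names are hypothetical):

```python
def dc_differences(dc_values):
    """Replace each DC value by its difference from the previous block's DC."""
    previous = 0                      # predictor starts at 0 for the first block
    diffs = []
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs

def dc_restore(diffs):
    """Invert dc_differences with a running sum."""
    values, total = [], 0
    for d in diffs:
        total += d
        values.append(total)
    return values

dcs = [50, 52, 51, 51, 55]
print(dc_differences(dcs))  # [50, 2, -1, 0, 4]
```

After the first block the differences are small numbers clustered around 0, which a Huffman code can represent with short codewords.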
JPEG Decoder
[Block diagram: Binary Decoder -> Dequantization -> Inverse DCT]
Dequantization: some values are lost!
Good quality, but reconstruction is not exact
JPEG performances - I
JPEG performances - II
[Figure: the same image shown as Original, Quality factor 75, Quality factor 20 and Quality factor 3]