
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

Introduction

Much information is in the form of images

Images are handled by machines as a matrix of digital picture elements, or pixels

The appearance of an image depends on
  image type
  resolution


Types of images & Resolution

Image types:
  bilevel (black & white), e.g. faxes
  grayscale
  color

Resolution, in dots per inch (dpi):
  600 x 600 – current medium-quality laser printer
  1200 x 1200 – low-cost phototypesetter
  4800 x 4800 – high-resolution phototypesetter


Bilevel images: CCITT fax standard

fax: facsimile
CCITT: Comité Consultatif International Téléphonique et Télégraphique, part of the ITU (International Telecommunication Union), one of the specialized agencies of the United Nations
In the late 1970s the CCITT started working on a standard for fax transmission
1980: CCITT Group 3 standard
  Groups 1 and 2 were earlier attempts that used simpler encoding and modulation techniques, resulting in very slow transmission


CCITT Group 3 - I

It is the most common standard for fax transmission
It is accepted worldwide: almost every fax machine supports it
It uses compression algorithms for bilevel images


CCITT Group 3 - II

Paper size: international A4 (not US letter)
  standard resolution: 204x98 dpi (nominally 200x100)
  high resolution: 204x196 dpi (nominally 200x200)
  1728 pixels/line
  1188 lines/page at standard resolution
Bilevel image: 1 bit/pixel
Image size: 1728x1188 bits at standard resolution, about 2 Mbit
Transmission rate: 4.8 Kbit/s
  today it is usually higher, 14.4 – 33.6 Kbit/s
At 4.8 Kbit/s a standard-resolution page would take about 430 s uncompressed, but only about 1 minute on average with the Group 3 algorithms
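The figure of about 430 s follows directly from the page size and the line rate. A minimal sketch of that arithmetic, using only the numbers quoted on the slide (the variable and function names are illustrative):

```python
# Figures quoted on the slide; names are illustrative, not from any standard API.
PIXELS_PER_LINE = 1728
LINES_PER_PAGE = 1188      # A4 at standard resolution
RATE_BPS = 4800            # 4.8 Kbit/s


def uncompressed_seconds(pixels_per_line, lines, rate_bps):
    """Time needed to send a bilevel page at 1 bit/pixel with no compression."""
    bits = pixels_per_line * lines
    return bits / rate_bps


page_bits = PIXELS_PER_LINE * LINES_PER_PAGE
seconds = uncompressed_seconds(PIXELS_PER_LINE, LINES_PER_PAGE, RATE_BPS)
print(page_bits, round(seconds, 1))
```

The page holds a little over 2 million bits, so at 4.8 Kbit/s the raw transfer takes a bit more than 7 minutes.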


Run-length coding

Each scan line is composed of runs of pixels of the same color

Count the number of elements in each run
Example: 3w 4b 9w 2b 2w 6b 5w 2b 5w...
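As an illustration, the run lengths of the example can be computed as follows. This is a minimal sketch that assumes the scan line is given as a string of 'w' and 'b' characters, a representation chosen here only for readability:

```python
from itertools import groupby


def run_lengths(line):
    """Return [(length, color), ...] for a scan line given as a string of 'w'/'b'."""
    return [(len(list(group)), color) for color, group in groupby(line)]


# The scan line of the example, rebuilt from its runs.
line = "w" * 3 + "b" * 4 + "w" * 9 + "b" * 2 + "w" * 2 + "b" * 6 + "w" * 5 + "b" * 2 + "w" * 5
runs = run_lengths(line)
text = " ".join(f"{n}{c}" for n, c in runs)
print(text)
```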


G3 1D

Group 3 One-Dimensional coding (G3 1D) is called Modified Huffman (MH) because it encodes run lengths using a predefined Huffman code
To maintain black/white synchronization, each line begins with a white run, possibly of zero length


G3 1D

Example: the runs 3w 4b 9w 2b 2w 6b ... are coded as
1000 011 10100 11 0111 0010 ...

The predefined Huffman codewords have been derived from the probabilities of the runs in typical handwritten documents


G3 1D

As one line has 1728 pixels, a direct approach would need a codeword for every possible run length, for both black and white runs
As shorter runs occur more frequently than longer runs, each run length is coded in additive form
  there are terminating and makeup codewords
  lengths from 0 to 63 are coded with a single terminating codeword
  longer runs are coded with one or more makeup codewords followed by a terminating codeword

Each line is terminated with an EOL symbol composed of eleven 0s and one 1
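The additive form can be sketched as a split of each run length into makeup values and one terminating value. The sketch below assumes that makeup values are multiples of 64 and that the largest one equals the line width of 1728; both the function name and the `max_makeup` parameter are illustrative, and the actual bit patterns of the codewords are not reproduced here.

```python
def split_run(length, max_makeup=1728):
    """Split a run length into makeup values (multiples of 64) and a terminating value (0..63).

    max_makeup is an assumption: the largest makeup value is taken to be the line width.
    """
    if length < 0:
        raise ValueError("run length must be non-negative")
    makeups = []
    remaining = length
    while remaining >= 64:
        makeup = min(remaining // 64 * 64, max_makeup)
        makeups.append(makeup)
        remaining -= makeup
    # Every run ends with exactly one terminating value, even when it is 0.
    return makeups, remaining


print(split_run(5))      # short run: terminating codeword only
print(split_run(200))    # 200 = 192 (makeup) + 8 (terminating)
print(split_run(1728))   # a completely white or black line
```

Note that a run of exactly 64 pixels still needs a terminating codeword for length 0 after its makeup codeword.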


G3 2D

Group 3 Two-Dimensional coding (G3 2D) is called Modified READ (MR) because it is a variant of a previously defined code called READ (Relative Element Address Designate)
Many images have a high degree of vertical coherence between consecutive lines

changing elements are coded w.r.t. a “nearby” change position of the same color in the previous (reference) line


G3 2D

Nearby means within a distance of 3 pixels
If a changing element in the current line has no counterpart in the reference line, the coder switches to horizontal mode (1D)
Conversely, if the reference line has a run with no counterpart in the current line, a special pass code is used
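The choice among the three modes can be sketched as below. The names a1, b1 and b2 are not used on the slides; they are borrowed from the usual description of the scheme: a1 is the next changing element on the current line, b1 the corresponding changing element of the same color on the reference line, and b2 the changing element that follows b1 on the reference line. This is a simplified illustration of the decision only, not a full coder.

```python
def choose_mode(a1, b1, b2, radius=3):
    """Pick the coding mode for the next changing element (illustrative sketch).

    a1: position of the next changing element on the current line
    b1: position of the matching changing element on the reference line
    b2: position of the changing element after b1 on the reference line
    """
    if b2 < a1:
        # The reference run (b1..b2) has no counterpart on the current line.
        return ("pass", None)
    if abs(a1 - b1) <= radius:
        # Close enough: code only the signed offset, from -3 to +3.
        return ("vertical", a1 - b1)
    # No nearby counterpart: fall back to 1D coding of the next two runs.
    return ("horizontal", None)


print(choose_mode(a1=10, b1=8, b2=12))
print(choose_mode(a1=20, b1=8, b2=12))
print(choose_mode(a1=10, b1=3, b2=15))
```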


G3 2D

[Figure: a current line coded against its reference line. Labels: vertical mode (offsets +2, -2, -1, 0, taken from a Huffman table with codewords for -3, -2, -1, 0, +1, +2, +3); horizontal mode (<mode | length of preceding white run | length of black run>); pass code; generated code: 0001 ...]


G3 2D

Two dimensional coding is more prone to transmission errors

In G3 1D an error may corrupt the entire line, but synchronization is restored by the EOL codeword
Here an error in the reference line is likely to propagate to all the following lines
For this reason there is one reference line for every k lines (i.e. k-1 lines are coded w.r.t. each reference line)
  standard resolution: k=2
  high resolution: k=4


CCITT fax standard compression performance

Standard resolution (~200x100 dpi)
  G3 1D          0.13 bits/pixel    57 s for A4 at 4.8 Kbps
  G3 2D (k=2)    0.11 bits/pixel    47 s for A4 at 4.8 Kbps

High resolution (~200x200 dpi)
  G3 2D (k=4)    0.09 bits/pixel    74 s for A4 at 4.8 Kbps

Compression is very good for office images, where run lengths are long
It would be very bad for bilevel natural images


Continuous-tone images: why lossless compression?

Lossy compression is often preferred because it gives much smaller images with good quality
However, there are situations in which an approximation may not be adequate
  medical images
  historical documents
  images with legal relevance


Continuous-tone images: lossless compression

GIF standard
PNG standard
JPEG-LS
  It is a fairly new standard. The original JPEG standard included a lossless mode, but its performance was not close to the state of the art
  estimation of the pixel value using a fairly simple context: an effective and low-cost solution
  www.hpl.hp.com/loco


GIF image format - I

Adopted by CompuServe to minimize the time required to download images over a modem link
The most widely used lossless image format until 1995
8-bit pixel description
Images have at most 256 colors, but the colors can be chosen through a color map


GIF image format - II

The color map can be specified for each image or can be omitted

  if specified, it is included as a header in the image file, in uncompressed form
  the color map is composed of 256 24-bit entries, which specify 256 RGB colors

The compression scheme used is LZW
The alphabet symbols are the 256 colors of the color map plus a “clear” code and an “end-of-information” code
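A minimal sketch of LZW over this alphabet is given below. It assumes an 8-bit color map, so that codes 0 to 255 are the colors, 256 is the “clear” code and 257 the “end-of-information” code. It returns a list of integer codes only: the packing of codes into variable-width bit fields, the byte-counted blocks and the handling of a full dictionary are deliberately left out, and the function name is illustrative.

```python
CLEAR = 256   # "clear" code: tells the decoder to reset its dictionary
EOI = 257     # "end-of-information" code


def lzw_encode(indices):
    """Encode a sequence of color-map indices (0..255) as a list of LZW codes."""
    table = {(i,): i for i in range(256)}
    next_code = 258               # first free code after the two special ones
    out = [CLEAR]
    current = ()
    for symbol in indices:
        candidate = current + (symbol,)
        if candidate in table:
            current = candidate   # keep extending the current phrase
        else:
            out.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = (symbol,)
    if current:
        out.append(table[current])
    out.append(EOI)
    return out


print(lzw_encode([1, 1, 1, 1]))
```

Four identical pixels are emitted as three codes between the two special codes, because the phrase (1, 1) enters the dictionary after the second pixel and is reused immediately.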


GIF image format - III

Although this feature is not widely used, GIF files may contain more than one image, and the images can share the color map
LZW-coded information is grouped into blocks preceded by a byte count, so that an image can be skipped without decompressing it
In 1995 Unisys announced that royalties would be due on GIF implementations because of an old patent it held on LZW
This catalyzed the development of a new lossless image format, designed to be in the public domain and to incorporate the latest improvements


PNG image format - I

Portable Network Graphics (pronounced “ping”)

It uses the gzip compression scheme
Thanks to some improvements, the compression obtained is about 10-30% better than GIF
By default it encodes the pixels in raster scan order, but other methods are available
  horizontal difference, i.e. the difference between the current pixel value and the previous one
  vertical difference, i.e. the difference w.r.t. the pixel above
  average difference, i.e. the difference w.r.t. the average of the pixel above and the previous pixel
  ...
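The three difference methods can be sketched as follows. The sketch assumes a single 8-bit channel, treats missing neighbors as 0 and takes differences modulo 256, as the PNG filters do; for the average, the PNG filter uses the pixel above and the previous (left) pixel. Function names are illustrative.

```python
def horizontal_diff(row):
    """Difference between each pixel and the previous one on the same row."""
    return [(x - (row[i - 1] if i else 0)) % 256 for i, x in enumerate(row)]


def vertical_diff(row, above):
    """Difference between each pixel and the pixel directly above it."""
    return [(x - a) % 256 for x, a in zip(row, above)]


def average_diff(row, above):
    """Difference between each pixel and the average of its left and above neighbors."""
    out = []
    for i, x in enumerate(row):
        left = row[i - 1] if i else 0
        out.append((x - (left + above[i]) // 2) % 256)
    return out


above = [9, 12, 20]
row = [10, 12, 15]
print(horizontal_diff(row), vertical_diff(row, above), average_diff(row, above))
```

On smooth images the differences cluster around 0 (or around 255, which is -1 modulo 256), which makes the data easier for the subsequent compression step.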


PNG image format - II

It is possible to use more than 256 colors, up to 16-bit grayscale and 48-bit color
GIF uses one special pixel value to indicate transparency; PNG allows 256 different transparency levels per pixel, so that a picture can fade progressively into the background

It seems inevitable that the PNG format will gradually assume the role of standard lossless image format for the WWW, replacing GIF


Continuous-tone images: why lossy compression?

Digital images are already an approximation of the real analog phenomenon
Lossy techniques give very good compression with a modest loss of detail
This is useful for storing and transmitting images


Continuous-tone images: lossy compression

JPEG
JPEG2000
  a new image coding system that uses state-of-the-art compression techniques based on wavelet technology
  file extension .jp2
  At high compression ratios, for the same file size, the perceived quality of JPEG2000 images is better than that of JPEG images


JPEG format - I

JPEG is a standard defined by the Joint Photographic Experts Group in 1992
It was conceived to transmit images at 64 Kbps
It has a lossy mode and a lossless mode (rarely used, and today replaced by the JPEG-LS standard)
In lossy mode it achieves very good quality at about 1 bit/pixel
Implementation complexity is reasonable


JPEG format - II

It can be used with gray-level and color images
Each channel of the color space (RGB, YUV...) is treated separately
It allows progressive transmission (which is much better suited to the WWW than raster transmission)

[Figure: raster vs. progressive transmission]


JPEG Coder - I

[Figure: JPEG coder block diagram: Discrete Cosine Transform, Quantization, Binary Encoder]


JPEG Coder - II

The image is divided into 8x8-pixel squares
Preprocessing
Apply the Discrete Cosine Transform to each square
Coefficient quantization
Bit stream encoding


Preprocessing: color space transformation & downsampling

Color space transformation: from RGB into YUV
  The Y component represents the brightness of a pixel, and the U and V components together represent the hue and saturation
  The human eye can see more detail in the Y component than in U and V, which can therefore be compressed more aggressively

Downsampling of U and V
  4:4:4 no downsampling
  4:2:2 horizontal downsampling by a factor of 2
  4:2:0 both horizontal and vertical downsampling by a factor of 2
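The two preprocessing steps can be sketched as follows. The slides speak of YUV without giving coefficients; the sketch assumes the YCbCr variant commonly used in JPEG files (JFIF), with 8-bit samples and chroma centered on 128. Downsampling by averaging each 2x2 group is also an assumption, since other filters are possible, and the function names are illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to Y, Cb, Cr (JFIF-style coefficients, assumed here)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def downsample_420(plane):
    """4:2:0 downsampling of one chroma plane: average every 2x2 group of samples.

    plane is a list of rows; its width and height are assumed to be even.
    """
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[i][j] + plane[i][j + 1] + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
            for j in range(0, width, 2)
        ]
        for i in range(0, height, 2)
    ]


print(rgb_to_ycbcr(255, 255, 255))
print(downsample_420([[1, 3], [5, 7]]))
```

A white pixel has full brightness and neutral chroma, and after 4:2:0 downsampling each chroma plane holds one quarter of its original samples.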


Discrete Cosine Transform - I

The discrete cosine transform (DCT) is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers
It is used in JPEG because it is fast and fairly easy to implement efficiently


Discrete Cosine Transform - II

[Equation: definition of the DCT coefficient $B(k_1,k_2)$ in terms of the pixel values $A(i,j)$; formula not recovered]

where
  the block is $N_1 \times N_2$ pixels (in JPEG, 8x8)
  $A(i,j)$ is the value of the pixel at position $(i,j)$
  $B(k_1,k_2)$ is the DCT coefficient at position $(k_1,k_2)$
  low values of $k_1$ correspond to low vertical frequencies, low values of $k_2$ to low horizontal frequencies
Generally, higher frequencies have very low values
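For illustration, a direct (slow) implementation of the two-dimensional DCT is sketched below. The exact scaling used on the slide is not known, so the sketch assumes the common orthonormal form

$$B(k_1,k_2) = c_{N_1}(k_1)\, c_{N_2}(k_2) \sum_{i=0}^{N_1-1} \sum_{j=0}^{N_2-1} A(i,j) \cos\frac{\pi (2i+1) k_1}{2 N_1} \cos\frac{\pi (2j+1) k_2}{2 N_2}$$

with $c_N(0) = \sqrt{1/N}$ and $c_N(k) = \sqrt{2/N}$ for $k > 0$. The function name is illustrative.

```python
import math


def dct2(block):
    """Two-dimensional DCT of a block given as a list of rows (orthonormal scaling assumed)."""
    n1, n2 = len(block), len(block[0])

    def scale(k, n):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n2 for _ in range(n1)]
    for k1 in range(n1):
        for k2 in range(n2):
            total = 0.0
            for i in range(n1):
                for j in range(n2):
                    total += (
                        block[i][j]
                        * math.cos(math.pi * (2 * i + 1) * k1 / (2 * n1))
                        * math.cos(math.pi * (2 * j + 1) * k2 / (2 * n2))
                    )
            out[k1][k2] = scale(k1, n1) * scale(k2, n2) * total
    return out


flat = [[10] * 8 for _ in range(8)]   # a uniform 8x8 block
coeffs = dct2(flat)
print(round(coeffs[0][0], 6))
```

For a uniform block all the energy ends up in the (0,0) coefficient and every other coefficient is zero up to rounding error, which is the extreme case of "higher frequencies have very low values".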


Discrete Cosine Transform - III

[Figure: DCT basis functions]

Each 8x8 square is transformed into 64 coefficients


Discrete Cosine Transform - IV

Knowing the 64 DCT coefficients with infinite precision, it is possible to reconstruct the pixels of the square exactly
But
  finite precision
  quantization of the coefficients (always)
  Some coefficients related to high frequencies are not transmitted. This allows higher compression without sacrificing too much quality, as the human eye is less sensitive to them


Quantization - I

Each component of the DCT matrix obtained is scaled differently, by dividing it by its own factor
The factor for each component has been chosen on the basis of human sensitivity to changes at that frequency
In practice the matrix of factors is usually

[Figure: the matrix of factors; not recovered]


Quantization - II

Next, all values are rounded to the nearest integer
This leads to a large number of 0s in the high-frequency zone, where the factors are bigger
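The divide-and-round step, and its inverse in the decoder, can be sketched as follows. The table used is the example luminance table from Annex K of the JPEG standard. That the slide showed this particular table is an assumption, and the function names are illustrative.

```python
import math

# Example luminance quantization table from Annex K of the JPEG standard
# (assumed here to be the "usual" matrix the slide refers to).
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def round_nearest(x):
    """Round to the nearest integer, with halves rounded away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize(coeffs, table=Q_LUMA):
    """Divide each DCT coefficient by its factor and round."""
    return [[round_nearest(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]


def dequantize(levels, table=Q_LUMA):
    """Decoder side: multiply back. The rounding error is not recoverable."""
    return [[level * q for level, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, table)]


coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[7][7] = 80.0, -30.0, 40.0
levels = quantize(coeffs)
restored = dequantize(levels)
print(levels[0][0], levels[0][1], levels[7][7])
print(restored[0][0], restored[0][1], restored[7][7])
```

In this example the high-frequency coefficient 40.0 at position (7,7) is divided by a large factor and becomes 0, while -30.0 at position (0,1) comes back as -33: this is where the loss occurs.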


Zig-zag scan

Low-frequency coefficients are transmitted before higher-frequency coefficients
This allows progressive visualization of each 8x8 block
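The zig-zag order can be generated by walking the anti-diagonals of the block and alternating direction on each one. A minimal sketch, with illustrative function names:

```python
def zigzag_order(n=8):
    """Return the (row, column) positions of an n x n block in zig-zag order."""
    def key(position):
        i, j = position
        diagonal = i + j
        # Odd diagonals are walked top to bottom, even ones bottom to top.
        return (diagonal, i if diagonal % 2 else -i)

    return sorted(((i, j) for i in range(n) for j in range(n)), key=key)


def zigzag_scan(block):
    """Flatten a square block of coefficients into a list in zig-zag order."""
    return [block[i][j] for i, j in zigzag_order(len(block))]


order = zigzag_order()
print(order[:6])
```

Since the quantized high-frequency coefficients are mostly 0, this order tends to group them into one long run of zeros at the end of the list.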


Raster vs. progressive transmission

Raster transmission
  DCT coefficients of the upper-left block, then those of all the other blocks in the upper part of the image, and so on

Progressive transmission
  first all (0,0) coefficients, then all (0,1) coefficients and so on, following the zig-zag scan in each block


Binary coding

DCT(0,0) usually varies very slowly from one block to the next, as it is proportional to the mean value of the block

For this reason it is convenient to encode the difference from the previous value

Typically the bit stream is coded with Huffman codes
It is possible to use arithmetic coding instead, gaining some compression at the cost of decoding speed

Huffman codes are predefined, or it is possible to build optimal tables and insert them in the stream
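The differential coding of DCT(0,0) can be sketched as follows. The sketch assumes that the predictor for the first block is 0; the subsequent Huffman or arithmetic coding of the differences is not shown, and the function names are illustrative.

```python
def dc_differences(dc_values):
    """Replace each DCT(0,0) value with its difference from the previous block's value."""
    previous = 0          # assumed starting predictor
    diffs = []
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs


def dc_from_differences(diffs):
    """Decoder side: rebuild the DCT(0,0) values by accumulating the differences."""
    previous = 0
    values = []
    for diff in diffs:
        previous += diff
        values.append(previous)
    return values


dc = [52, 55, 54, 54, 60]
diffs = dc_differences(dc)
print(diffs)
```

After the first block the differences are small numbers close to 0, which can be given short codewords.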


JPEG Decoder

[Figure: JPEG decoder block diagram: Binary Decoder, Dequantization (some values are lost!), Inverse DCT (good quality, but reconstruction is not exact)]


JPEG performance - I

JPEG performance - II

[Figure: the same image shown as the original and compressed with quality factors 75, 20 and 3]
