Video Technology 2013-14 - Minor II

Upload: milind-mathur

Post on 03-Jun-2018


  • 8/12/2019 Video Technology 2013-14 - Minor II

    1/282

    Image & Video Processing


    B.Tech Sem VIII

  • Slide 2/282

    Course Contents

    1. Introduction to Image, Video
       - Human Visual System (HVS)
    2. Colours: Biology, Physics, Technology, Coding
    3. Image Processing (examples): capture, preprocessing, 1D and 2D Fourier transformation, 1D and 2D convolution, reconstruction, aliasing, filtering
       - Enhancement: noise reduction, filter masks; edge-detection, histograms; image segmentation
       (Topics up to here: Minor I)
    4. Image Compression (example)
    5. History and Basics of Video Technology (examples)
    6. Video Compression
       (Topics 4-6: Minor II)

  • Slide 3/282

    MINOR I

  • Slide 4/282

    Images

    Images, often called pictures, are represented by bitmaps.

    A bitmap is a spatial two-dimensional matrix made up of individual picture elements called pixels.

    Each pixel has a numerical value called amplitude.

    The number of bits available to code a pixel is called amplitude depth or pixel depth.

    A pixel depth may represent
    - a black or white dot in images,
    - a level of gray in continuous-tone, monochromatic images, or
    - the color attributes of the picture element in colored pictures.

  • Slide 5/282

    An Image is...

    (will discuss in later lectures)

    [Figure: three 7x7 matrices of hexadecimal amplitude values, the Red, Green and Blue planes of a small image patch.]

    An image is made up of pixels; each pixel is a combination of Red, Green and Blue color amplitudes.

    It is represented by I(x,y), a function of the two spatial coordinates of the image plane. I(x,y) is the intensity of the image at the point (x,y) on the image plane.

    A color image is represented by R(x,y), G(x,y), B(x,y).

  • Slide 6/282

    Image processing

    An image processing operation typically defines a new image g in terms of an existing image f.

    The simplest operations are those that transform each pixel in isolation. These pixel-to-pixel operations can be written:

        g(x, y) = t(f(x, y))

    Examples: threshold, RGB to grayscale.

    Note: a typical choice for mapping to grayscale is to apply the YIQ television matrix and keep the Y.

        | Y |   |  0.299   0.587   0.114 | | R |
        | I | = |  0.596  -0.275  -0.321 | | G |
        | Q |   |  0.212  -0.523   0.311 | | B |
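The two pixel-to-pixel operations named above can be sketched as plain functions on a single amplitude (illustrative names, not from the lecture; the threshold level of 128 is an assumed default):

```python
def to_grayscale(r, g, b):
    """Luma row of the YIQ matrix: Y = 0.299 R + 0.587 G + 0.114 B."""
    return 0.299 * r + 0.587 * g + 0.114 * b

def threshold(value, level=128):
    """Pixel-to-pixel threshold: map an amplitude to black (0) or white (255)."""
    return 255 if value >= level else 0
```

Applying either function independently at every (x, y) is exactly the g(x, y) = t(f(x, y)) form above.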

  • Slide 7/282

    Pixel movement

    Some operations preserve intensities, but move pixels around in the image:

        g(x, y) = f(x + Δx(x, y), y + Δy(x, y))

    Examples: many amusing warps of images

    [Show image sequence.]
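A minimal sketch of such an intensity-preserving pixel movement, with the image as a list of rows and the displacements given as functions (out-of-range samples default to 0, an assumed boundary choice):

```python
def warp(image, dx, dy):
    """g(x, y) = f(x + dx(x, y), y + dy(x, y)); intensities are moved, not changed."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x + dx(x, y), y + dy(x, y)
            if 0 <= sx < w and 0 <= sy < h:   # samples outside the image stay 0
                out[y][x] = image[sy][sx]
    return out
```

A constant displacement simply shifts the image; spatially varying dx, dy give the "amusing warps" mentioned above.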

  • Slide 8/282

    Recommended Books

    R.C. Gonzalez and R.E. Woods, Digital Image Processing, Addison-Wesley, 3rd Edition.

    A.K. Jain, Fundamentals of Digital Image Processing, Prentice Hall India, 1989.

    K.R. Rao and J.J. Hwang, Techniques and Standards for Image, Video, and Audio Coding, Prentice Hall.

  • Slide 9/282

    Overview: Image Formats

    Uncompressed:
    - pgm (portable gray map) or ppm (portable pixel map): Unix
    - bmp (gray and color): Windows

    Compressed:
    - GIF (Graphics Interchange Format): average compression ratio 4:1. Versions GIF87a and GIF89a.
    - JPEG (Joint Photographic Experts Group): good for photos, not very good for small images or line art less than 100x100 pixels. Compression ratio 10:1 to 100:1.
    - PNG (Portable Network Graphics): more color depth (up to 48-bit) than GIF (8-bit). 10-30% smaller than GIF. Automatic anti-aliasing. Text-based metadata can be added.
    - tif, tiff, ps, pdf, eps etc.

  • Slide 10/282

    Clarifications: Dimension in different contexts

    Dimension of a signal ~ number of index variables
    - Audio and speech is a 1-D signal: over time or a sampled time index
    - Image is 2-D: over two spatial indices (horizontal and vertical)
    - Video is 3-D: over two spatial indices and one time index

    Dimension of an image ~ size of the digital image
    - How many pixels along each row and column: e.g. a 512x512 image
    - Also referred to as the resolution of an image

    Dimension of a vector space ~ number of basis vectors in it
    - [x(1), ..., x(N)]^T ~ number of elements in the vector

  • Slide 11/282

    Colours

    Vipan Kakkar

  • Slide 12/282

    Colours

    Upon completion of this unit, you should be able to explain what colour is and describe how it is perceived.

  • Slide 13/282

    What is colour?

    colour: The appearance of objects or light sources described in terms of the individual's perception of them, involving hue, brightness, and saturation. (Webster's New College Dictionary)

    Colour is a sensation -- a perception of the viewer.

  • Slide 14/282

    The Visible Spectrum

  • Slide 15/282

    The colour of Light


  • Slide 16/282

    Light

    Illuminating sources:
    - emit light (e.g. the sun, a light bulb, TV monitors)
    - perceived colour depends on the emitted frequencies
    - follow the additive rule: R + G + B = White

    Reflecting sources:
    - reflect an incoming light (e.g. colour dye, cloth, etc.)
    - perceived colour depends on the reflected frequencies (= emitted freq. - absorbed freq.)
    - follow the subtractive rule: R + G + B = Black

  • Slide 17/282

    Wavelength of the light

    (RGB)


  • Slide 18/282

    The colour Sensitivity of Light

    The eye responds to the visible region of the spectrum. This visible region is a very narrow segment of the electromagnetic spectrum, extending from ~440 nm in the extreme blue (near ultraviolet) to ~690 nm in the red region, with green in the middle at ~555 nm.

  • Slide 19/282

    Primary colours


    Red Green Blue

  • Slide 20/282

    Food for thought

    Why R, G, B?

    These are primary colours.

    Why are these considered primary colours?

  • Slide 21/282

    Food for thought

    Why R, G, B?

    These are primary colours.

    Why are these considered primary colours?

    Due to laws of physics

    Due to laws of biology

  • Slide 22/282

    Human Visual System (HVS)

    Eyes, optic nerve, parts of the

    brain

    Transforms electromagnetic energy into neural signals

  • Slide 23/282

    Human Visual System

    Image Formation
    - Cornea (focus, curvature, refractive index), sclera (nerves till optic nerve), pupil (entry point for light), iris (controls the quantity of light), lens (focus), retina (image), fovea (central vision, cones)

    Transduction
    - retina, rods, and cones

    Processing
    - optic nerve, brain

  • Slide 24/282

    Human Visual System

    From a HVS perspective, the retina is composed of three kinds of cone cells that have a high sensitivity to particular wavelengths. These are 630 nm (red), 530 nm (green) and 450 nm (blue).

    So, all image/video displays are based on this theory of vision to reproduce color.

  • Slide 25/282

    The Human Vision System (HVS)

    Naturally, an eye can adapt to a huge range of intensities, from the lowest visible light to the highest bearable glare.

    The eye uses two types of discrete light receptors:
    - 6-7 million centrally located cones are highly sensitive to colour and bright light.
    - 75-100 million rods across the surface of the retina are sensitive to light but not colour.

    Above : A cardiologist examining a coronary angiogram.

    Above right: http://www.wiu.edu/users/mfmrb/PSY343/SensPercVision.htm

  • Slide 26/282

    Transduction (Retina)

    Transform light to neural impulses:
    - Rods and cones convert light energy into electrical impulses
    - Bipolar cells signal ganglion cells
    - Axons of the ganglion cells form the optic nerve

    [Figure: retinal pathway: rods/cones, bipolar cells, ganglion cells, optic nerve]

  • Slide 27/282

    The Human Vision System (HVS)

    As in a camera, the image is projected upside down on the back surface.

    However, the human vision system (HVS) is an extremely sophisticated multi-stage process.

    The first steps in the sensory process of vision involve the stimulation of the light receptors.

    Electrical signals containing the vision information from each eye are transmitted to the brain through the optic nerves.

    The image information is processed in several stages, ultimately reaching the visual cortex of the cerebrum.

    We see with our brain, not our eyes!

  • Slide 28/282

    Rods vs Cones

    Rods:
    - Contain photo-pigment
    - Respond to low energy
    - Enhance sensitivity
    - Concentrated in the retina, but outside of the fovea
    - One type, sensitive to grayscale changes

    Cones:
    - Contain photo-pigment
    - Respond to high energy
    - Enhance perception
    - Concentrated in the fovea, exist sparsely in the retina
    - Three types, sensitive to different wavelengths

  • Slide 29/282

    Camera and Eye

  • Slide 30/282

    Food for thought

    Why R, G, B?

    These are primary colours.

    Why are these considered primary colours?

    Due to laws of physics

    Due to laws of biology

    Primary colours therefore are primary only because the cones in our eyes are sensitive to those three colours.

  • Slide 31/282

    Tri-stimulus Theorem (1/3)

    RGB Model: Different intensities of red, green, and blue are added to generate various colors.

    Luma/Chroma Representation (to be described later):
    - The luminance component (Y) contains the gray-scale information.
    - The chrominance components define the color (U) and the intensity (V) of the color.

    Advantage: The human eye is more sensitive to brightness than to color. A compression scheme can use gray-scale information to define detail and allow loss of color information to achieve higher rates of compression (e.g., JPEG).
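The luma/chroma split can be sketched as follows; this uses the common BT.601 luma weights and colour-difference scale factors as one concrete choice (the slides' U/V naming is kept, but the exact scale factors are an assumption, since they vary by standard):

```python
def rgb_to_yuv(r, g, b):
    """Separate brightness (Y) from colour (U, V) for an RGB pixel."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luma: the gray-scale information
    u = 0.492 * (b - y)                     # blue-difference chroma
    v = 0.877 * (r - y)                     # red-difference chroma
    return y, u, v
```

For a gray pixel (r = g = b) the chroma channels vanish, which is what lets a codec subsample or discard them with little visible loss.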

  • Slide 32/282

    Tri-stimulus Theorem (2/3)

    Primary colors cannot be obtained by mixing the other two primary colors.

    This is because there are three types of color receptors in a human eye.

  • Slide 33/282

    Tri-stimulus Theorem (3/3)

    3 types of cones (6 to 7 million of them): Red = L cones, Green = M cones, Blue = S cones

    The ratio differs from person to person:
    - E.g., Red (64%), Green (32%), rest S cones
    - E.g., L (75.8%), M (20%), rest S cones
    - E.g., L (50.6%), M (44.2%), rest S cones

    Source of information: see "cone cell" on Wikipedia; www.colorbasics.com/tristimulus/index.php

    Each type is most responsive to a narrow band: red and green absorb the most energy, blue the least.

    Light stimulates each set of cones differently, and the ratios produce the sensation of color.

  • Slide 34/282

    Color Specification Systems (Color Spaces)

    Spectral Power Distribution (SPD): a plot of the radiant energy of a color vs wavelength.

    The luminance, hue (color), and saturation of a color can be specified most accurately by its SPD. However, the SPD does not describe the relationship between the physical properties of a color and its visual perception.

    The International Commission on Illumination (CIE - Commission Internationale de l'Eclairage) system defines how to map an SPD to a triple of numerical components that are mathematical coordinates in a color space (more details during the video processing lectures).

  • Slide 35/282

    What is a color space?

    A model used to define a specified color. R, G and B represent lighting colors of red, green and blue respectively.

    Combining red, green and blue with different weights can produce any visible color. A numerical value is used to indicate the proportion of each color.

    Drawback: the 3 colors are equally important and must be stored with the same amount of data bits. Therefore another color representation may be required (described later).

  • Slide 36/282

    The CIE System (1/3)

    The CIE (Chromaticity System) was established to define an "average" human observer.

    The average human eye is most sensitive to green/yellow light and least sensitive to reds or blues (slide 11).

    Colour-matching experiments were performed with a standard apparatus in order to define a "standard observer". The results are shown here, and are called the "CIE color space".

  • Slide 37/282

    The CIE System (2/3)

    CIE 1931 XYZ system

    One of the color spaces

    The first mathematically defined color space

    Three parameters: X, Y, Z, or Y (brightness) and x, y (chroma)

  • Slide 38/282

    The CIE System (3/3)

    CIE Chromaticity

    Diagram

    Spectral Locus

    Parameters: x, y

  • Slide 39/282

    Refresher: Color Theory

    Examples

  • Slide 40/282

    colour

    [Figure: the three primary colours, Red, Green and Blue]

  • Slide 41/282

    colour

    [Figure: Red + Blue = Magenta]

  • Slide 42/282

    colour

    [Figure: Green + Red = Yellow]

  • Slide 43/282

    colour

    [Figure: Green + Blue = Cyan]

  • Slide 44/282

    colour

    [Figure: Red + Green + Blue = White]

  • Slide 45/282

    Additive Theory

    Black radiates no light; white (the sun) radiates all light.

    Video is the process of capturing and radiating light, therefore it uses Additive (Light) Theory, not Subtractive (Pigment) Theory.

    The primary colours in Additive Theory are Red (R), Green (G) and Blue (B).

    The primary colours add together to make white.

    Light Theory is also called Additive Theory.

    Light Theory is used in television, theater lighting, computer monitors, and video production.

  • Slide 46/282

    The Colour Wheel

    [Figure: how the colour wheel is formed]

  • Slide 47/282

    colour Perception (colour Theory)

    Hue
    - distinguishes named colours, e.g., R, G, B
    - the dominant wavelength of the light

    Saturation
    - perceived intensity of a specific colour
    - how far a colour is from a gray of equal intensity

    Brightness (lightness)
    - perceived intensity

    [Figure: hue scale, plus saturation and lightness variations of an original image. Source: Wikipedia]

  • Slide 48/282

    The Colour Wheel

    Colours on the wheel can be described using three parameters:
    1. Hue: degrees from 0 to 360
    2. Saturation: brightness or dullness
    3. Value: lightness or darkness

    (As suggested by Albert Henry Munsell in A Colour Notation, 1905)

  • Slide 49/282

    The Colour Wheel: Hue

    Hue or Spectral Colour is represented as an angle.

    Primary Colours: 0° = Red, 120° = Green, 240° = Blue

    Secondary Colours: 60° = Yellow, 180° = Cyan, 300° = Magenta
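These hue angles can be checked with Python's standard-library colorsys module (a real stdlib API; rgb_to_hsv takes components in 0..1 and returns hue in 0..1, here rescaled to degrees):

```python
import colorsys

def hue_degrees(r, g, b):
    """Hue angle (0-360°) of an RGB colour with 8-bit components."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0
```

Pure red maps to 0°, pure green to 120°, pure blue to 240°, and the secondaries land halfway between their two primaries.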

  • Slide 50/282

    The Colour Wheel: Saturation

    Saturation or Chroma is the intensity of a colour.

    A highly saturated colour is bright and appears closer to the edge of the wheel.

    A more unsaturated colour is dull.

    A colour with no saturation is achromatic, or in the grey scale.

  • Slide 51/282

    The Colour Wheel: Value

    "the quality by which we distinguish a light colour from a dark one." - Albert Henry Munsell, A Colour Notation, 1905

    Value represents the luminescent contrast between black and white.

  • Slide 52/282

    The Colour Wheel in 3D

    Three parameters to describe a colour: Hue, Chroma, Value

  • Slide 53/282

    MANY more scientific models exist based on different colour theories. (Example: the Colour Tree by American artist Albert Henry Munsell, from A Colour Notation, 1905.)

  • Slide 54/282

    Colour Schemes

    Systematic ways of selecting colours:
    - Monochromatic
    - Complementary
    - Analogous
    - Warm
    - Cool
    - Achromatic, Chromatic Grays

  • Slide 55/282

    Colour Schemes: Monochromatic

    Monochromatic: one hue, many values of tint and shade.

    Artist: Marc Chagall. Title: Les Amants Sur Le Toit.

  • Slide 56/282

    Colour Schemes: Complementary (note spelling -- NOT complimentary)

    Complementary: colours that are opposite on the wheel. High contrast.

    Artist: Paul Cezanne. Title: La Montagne Sainte-Victoire. Year: 1886-88.

  • Slide 57/282

    Colour Schemes: Analogous

    Analogous: a selection of colours that are adjacent. Minimal contrast.

    Artist: Vincent van Gogh. Title: The Iris. Year: 1889.

  • Slide 58/282

    Colour Schemes: Warm

    Warm: the first half of the wheel gives the warmer colours. The colours of fire.

    Artist: Jan Vermeer. Title: Girl Asleep at a Table. Year: 1657.

  • Slide 59/282

    Colour Schemes: Cool

    Cool: the second half of the wheel gives the cooler colours.

    Artist: Pablo Picasso. Title: Femme Allonge Lisant. Year: 1939.

  • Slide 60/282

    Colour Schemes: Achromatic, Chromatic Grays

    Achromatic: black and white with all the grays in-between.

    Chromatic Grays: also called neutral relief. Dull colours, low contrast.

  • Slide 61/282

    Additive Color Mixing

    The mixing of light.

    Primary: Red, Green, Blue

    The complementary colors: Cyan, Magenta, Yellow

    White means all three primaries are present at full intensity.

  • Slide 62/282

    Subtractive Color Mixing (1/2)

    The mixing of pigment.

    Primary: Cyan, Magenta, Yellow

    The complementary colors: Red, Green, Blue

    Why black?

  • Slide 63/282

    Subtractive Color Mixing (2/2)

    Why? Pigments absorb light.

    Thinking: the color filters.

    Question: Yellow + Cyan = ?
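The filter view of subtractive mixing can be sketched with a simple model (my own illustrative formulation, not from the slides): each pigment transmits a fraction of the light in each additive-primary channel, and mixing multiplies the transmittances channel-wise.

```python
def subtractive_mix(*pigments):
    """Mix pigments modelled as (r, g, b) transmittances of white light."""
    r, g, b = 1.0, 1.0, 1.0          # start from white light
    for pr, pg, pb in pigments:
        r, g, b = r * pr, g * pg, b * pb   # each pigment filters the light again
    return r, g, b

YELLOW = (1.0, 1.0, 0.0)  # transmits red and green, absorbs blue
CYAN = (0.0, 1.0, 1.0)    # transmits green and blue, absorbs red
```

Yellow absorbs blue and cyan absorbs red, so only green survives both filters: Yellow + Cyan = Green.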

  • Slide 64/282

    Image Acquisition

  • Slide 65/282

    Sensor: Charge-coupled device (CCD)

    A special sensor that captures an image: a light-sensitive silicon solid-state device composed of many cells.

    When exposed to light, each cell becomes electrically charged. This charge can then be converted to an 8-bit value, where 0 represents no exposure while 255 represents very intense exposure of that cell to light.

    The electromechanical shutter is activated to expose the cells to light for a brief moment.

    Some of the columns are covered with a black strip of paint. The light intensity of these pixels is used for zero-bias adjustments of all the cells.

    The electronic circuitry, when commanded, discharges the cells, activates the electromechanical shutter, and then reads the 8-bit charge value of each cell. These values can be clocked out of the CCD by external logic through a standard parallel bus interface.

    [Figure: CCD layout showing the lens area, pixel rows and columns, the covered columns, the electronic circuitry, and the electromechanical shutter]
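The zero-bias adjustment described above can be sketched as follows (the data layout and function name are my own assumptions; the idea is simply to subtract the average reading of the painted-over "dark" columns from every cell):

```python
def zero_bias_correct(cells, dark_columns):
    """Subtract the mean dark-column reading (sensor bias) from every cell,
    clamping the result back to the 8-bit range 0..255."""
    dark = [row[c] for row in cells for c in dark_columns]
    bias = sum(dark) / len(dark)
    return [[max(0, min(255, round(v - bias))) for v in row] for row in cells]
```

Since the covered cells receive no light, whatever they report is pure sensor bias, which this removes from the exposed pixels.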


  • Slide 67/282

    Image Sampling And Quantisation

    Remember that a digital image is always only

    an approximation of a real world scene

  • Slide 68/282

    An Image is... (1/2)

    [Figure: three 7x7 matrices of hexadecimal amplitude values, the Red, Green and Blue planes of a small image patch.]

    An image is made up of pixels; each pixel is a combination of Red, Green and Blue color amplitudes.

    It is represented by I(x,y), a function of the two spatial coordinates of the image plane. I(x,y) is the intensity of the image at the point (x,y) on the image plane. f(x,y) can also be used interchangeably as the function name.

    A color image is represented by R(x,y), G(x,y), B(x,y).

  • Slide 69/282

    An Image is... (2/2)

    A color image is just three functions pasted together. We can write this as a vector-valued function:

        f(x, y) = [ r(x, y), g(x, y), b(x, y) ]^T

    We'll focus on grayscale (scalar-valued) images for now.

  • Slide 70/282

    Spatial and Frequency Domains

    Spatial domain: refers to the planar region of intensity values at time t; written f(x,y).

    Frequency domain: treats an image plane as a sum of sinusoidal functions of changing intensity values, organizing pixels according to their changing intensity (frequency); written F(sx, sy).

    (Source: CS 414, Spring 2009)

  • Slide 71/282

  • Slide 72/282

    Image Transformation

    Fourier Transform

  • Slide 73/282

    Filtering in the Frequency Domain

  • Slide 74/282

    Fourier transforms

    We can represent a function as a linear combination (weighted sum) of sines and cosines.

    We can think of a function in two complementary ways: spatially, in the spatial domain, or in terms of its frequency content, in the frequency domain.

    The Fourier transform and its inverse convert between these two domains:

        F(s) = ∫ f(x) e^(-i2πsx) dx      (spatial to frequency)
        f(x) = ∫ F(s) e^(+i2πsx) ds      (frequency to spatial)
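The discrete counterpart of the forward transform can be sketched directly from the formula (a naive O(N²) DFT for illustration, not an efficient FFT):

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform: F[s] = sum_x f[x] e^(-i 2π s x / N)."""
    n = len(signal)
    return [sum(signal[x] * cmath.exp(-2j * cmath.pi * s * x / n)
                for x in range(n))
            for s in range(n)]
```

For a constant signal all the energy lands in F[0], the DC term, matching the interpretation of F(0) as the average value.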

  • Slide 75/282

    2D Fourier transform

        F(sx, sy) = ∫∫ f(x, y) e^(-i2π sx x) e^(-i2π sy y) dx dy      (spatial to frequency)
        f(x, y) = ∫∫ F(sx, sy) e^(+i2π sx x) e^(+i2π sy y) dsx dsy    (frequency to spatial)

  • Slide 76/282

    Fourier transforms (cont'd)

    Where do the sines and cosines come in?

        F(s) = ∫ f(x) e^(-i2πsx) dx
        f(x) = ∫ F(s) e^(+i2πsx) ds

    f(x) is usually a real signal, but F(s) is generally complex:

        F(s) = Re(s) + i Im(s) = |F(s)| e^(iφ(s))

    If f(x) is symmetric, i.e., f(x) = f(-x), then F(s) = Re(s).

  • Slide 77/282

    Fourier transforms (cont'd)

        F(sx, sy) = ∫∫ f(x, y) e^(-i2π(sx x + sy y)) dx dy

    What if f(x, y) were separable? That is,

        f(x, y) = f1(x) f2(y)

    Then

        F(sx, sy) = ∫∫ f1(x) f2(y) e^(-i2π(sx x + sy y)) dx dy

    Breaking up the exponential,

        F(sx, sy) = ∫∫ f1(x) e^(-i2π sx x) f2(y) e^(-i2π sy y) dx dy

  • Slide 78/282

    Fourier transforms (cont'd)

    Separating the integrals,

        F(sx, sy) = [∫ f1(x) e^(-i2π sx x) dx] [∫ f2(y) e^(-i2π sy y) dy] = F1(sx) F2(sy)

    Using this separability, a 2D transform can be computed with 1D transforms:
    - the spatial domain image is first transformed into an intermediate image using N one-dimensional Fourier Transforms.
    - This intermediate image is then transformed into the final image, again using N one-dimensional Fourier Transforms.
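The row-column procedure above can be sketched by reusing a 1D transform twice (a naive 1D DFT is included so the sketch is self-contained; function names are illustrative):

```python
import cmath

def dft(signal):
    """Naive 1D DFT: F[s] = sum_x f[x] e^(-i 2π s x / N)."""
    n = len(signal)
    return [sum(signal[x] * cmath.exp(-2j * cmath.pi * s * x / n)
                for x in range(n)) for s in range(n)]

def dft2(image):
    """Row-column 2D DFT: 1D transforms over every row, then over every column."""
    rows = [dft(row) for row in image]                   # intermediate image
    cols = [dft([rows[y][x] for y in range(len(rows))])  # transform each column
            for x in range(len(rows[0]))]
    # reassemble as result[y][x]
    return [[cols[x][y] for x in range(len(cols))] for y in range(len(cols[0]))]
```

For an N x N image this costs 2N one-dimensional transforms, which is exactly the two-pass scheme described above.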

  • Slide 79/282

  • Slide 80/282

    2D Fourier transform

        F(sx, sy) = ∫∫ f(x, y) e^(-i2π sx x) e^(-i2π sy y) dx dy      (spatial to frequency)
        f(x, y) = ∫∫ F(sx, sy) e^(+i2π sx x) e^(+i2π sy y) dsx dsy    (frequency to spatial)

    - The Fourier Transform is used to decompose an image into its sine and cosine components.
    - The output of the transformation represents the image in the frequency domain, while the input image is the spatial equivalent.
    - In the Fourier domain image, each point represents a frequency contained in the spatial domain image.

  • Slide 81/282

    Example - revisited

    [Figure: spatial domain f(x,y) and frequency domain F(sx, sy)]

    The Fourier image has which two basic components?

  • Slide 82/282

    Example - revisited

    [Figure: spatial domain f(x,y) and frequency domain F(sx, sy)]

    The Fourier image is shown in such a way that the DC value F(0,0) is displayed in the center of the image. The further away from the center an image point is, the higher is its corresponding frequency.

  • Slide 83/282

    Example - revisited

    [Figure: spatial domain f(x,y) and frequency domain F(sx, sy)]

    The Fourier image is shown in such a way that the DC value F(0,0) is displayed in the center of the image. The further away from the center an image point is, the higher is its corresponding frequency.

    We can see that the DC value is by far the largest component of the image. However, the dynamic range of the Fourier coefficients (i.e. the intensity values in the Fourier image) is too large to be displayed directly, as in the image above.
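The dynamic-range problem noted above is usually handled by displaying a logarithm of the magnitudes rather than the magnitudes themselves; a minimal sketch (the scaling constant and function name are my own choices):

```python
import math

def log_scale(magnitudes, max_display=255.0):
    """Compress the spectrum's dynamic range for display: D = c * log(1 + |F|),
    with c chosen so the largest coefficient maps to max_display."""
    peak = max(magnitudes)
    c = max_display / math.log(1.0 + peak)
    return [c * math.log(1.0 + m) for m in magnitudes]
```

The log curve boosts the small high-frequency coefficients relative to the dominant DC term, so structure away from the center becomes visible.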

  • Slide 84/282

    Why FT

        F(sx, sy) = ∫∫ f(x, y) e^(-i2π sx x) e^(-i2π sy y) dx dy
        f(x, y) = ∫∫ F(sx, sy) e^(+i2π sx x) e^(+i2π sy y) dsx dsy

    The Fourier Transform contains a set of samples which is large enough to fully describe the spatial domain image.

    The number of frequencies corresponds to the number of pixels in the spatial domain image, i.e. the image in the spatial and Fourier domains are of the same size.

  • Slide 85/282

    2D - FT

        F(sx, sy) = ∫∫ f(x, y) e^(-i2π sx x) e^(-i2π sy y) dx dy
        f(x, y) = ∫∫ F(sx, sy) e^(+i2π sx x) e^(+i2π sy y) dsx dsy

    Where:
    - f(x, y) is the image in the spatial domain, and
    - the exponential term is the basis function corresponding to each point F(sx, sy) in the Fourier space.

    The equation can be interpreted as: the value of each point F(sx, sy) is obtained by multiplying the spatial image with the corresponding basis function and summing the result.

  • Slide 86/282

    1D Fourier examples

  • Slide 87/282

    2D Fourier examples

    [Figure: spatial domain f(x,y) images and their frequency domain F(sx, sy) spectra]

  • Slide 88/282

    2D Fourier examples

    [Figure: DFTs of two images and of their combination 0.25 * A + 0.75 * B, illustrating the linearity of the transform]

  • Slide 89/282

    Summary

    We have looked at:
    - Human visual system
    - Light and the electromagnetic spectrum
    - Colors in imaging
    - Image capture, sensing and acquisition
    - Image representation: Fourier and spatial domains
    - Sampling, quantisation and resolution

    Next topic: image enhancement techniques

  • Slide 90/282

    Image Pre-processing

    Motivation: filtering and resizing

  • Slide 91/282

    Motivation: filtering and resizing

    Pre-processing

    What if we now want to:

    smooth an image?

    sharpen an image?

    shrink an image?

    Before we try these operations, let's revisit and think about images in a more mathematical way.

  • Slide 92/282

    Image Resolution

    How many pixels

    Spatial resolution

    How many shades of grey/colours

    92

    How many frames per second

    Temporal resolution

Nyquist's theorem

  • 8/12/2019 Video Technology 2013-14 - Minor II

    93/282

Nyquist's Theorem

A periodic signal can be reconstructed if the sampling interval is half the period (or less).

An object can be detected if two samples span it.

    93

  • 8/12/2019 Video Technology 2013-14 - Minor II

    94/282

    Spatial Resolution

    94

    n, n/2, n/4, n/8, n/16 and n/32 pixels on a side.

  • 8/12/2019 Video Technology 2013-14 - Minor II

    95/282

    Amplitude Resolution

    Humans can see:

    About 40 shades of brightness

    About 7.5 million shades of colour

    95

    Depends on signal to noise ratio

    40 dB equates to about 20 shades

    Images captured:

    256 shades

  • 8/12/2019 Video Technology 2013-14 - Minor II

    96/282

    Shades of Grey

    96

    256, 16, 4 and 2 shades.


  • 8/12/2019 Video Technology 2013-14 - Minor II

    97/282

    Temporal Resolution (will be done

    later)

Nyquist's theorem for temporal data

    How much does an object move between frames?

    Can motion be understood unambiguously?

    97


    Filtering in the Frequency Domain


  • 8/12/2019 Video Technology 2013-14 - Minor II

    101/282

    Some Basic Filters and Their Functions

Multiply all values of F(u,v) by the filter function (notch filter):

H(u, v) = 0 if (u, v) = (M/2, N/2), and 1 otherwise.

All this filter would do is set F(0,0) to zero (force the average value of the image to zero) and leave all other frequency components of the Fourier transform untouched.

  • 8/12/2019 Video Technology 2013-14 - Minor II

    102/282

    Basic 2D Filters and Their Functions

    Lowpass filter

    Highpass filter

  • 8/12/2019 Video Technology 2013-14 - Minor II

    103/282

    Convolution

  • 8/12/2019 Video Technology 2013-14 - Minor II

    104/282

    2D Convolution

    Lets Try it with Two-Dimensions!

  • 8/12/2019 Video Technology 2013-14 - Minor II

    105/282

Let's Try it with Two-Dimensions! This image exclusively has 32 cycles

    in the vertical direction.

    This image exclusively has 8 cycles in

    the horizontal direction.

    So what is going on here?

    The u axis runs from left to right and it represents

    the horizontal component of the frequency. The v

axis runs up and down and it corresponds to vertical components of the frequency.

    x-y coordinate system

    Fourier Transform

    You will notice that the second example is a

    little more smeared out. This is because the

    lines are more blurred so more sine waves are

    required to build it. The transform is weighted

    so brighter spots indicate sine waves more

    frequently used.

    The central dot is an average of all the sine waves

    so it is usually the brightest dot and used as a

    point of reference for the rest of the points.

    Since this is inverse space, dots close to the origin

    will be further apart in real space than dots that

    are far apart on the Fourier Transform. (Again

    keeping in mind that these dots refer to the

    frequency of a component wave.)

    u-v coordinate system


  • 8/12/2019 Video Technology 2013-14 - Minor II

    106/282

    Magnitude vs. Phase

The Fourier Transform is defined as an infinite integral. Since computers don't like infinite integrals, a Fast Fourier Transform makes it simpler:

f(u, v) = Σx Σy F(x, y) e^(-i2π(u·x + v·y)/N)

where F(x, y) is real and f(u, v) is complex.

So what do we do with this? Instead of representing the complex numbers as real and imaginary parts, we can represent them as Magnitude and Phase, defined as:

Magnitude(f) = sqrt(Re² + Im²)

Phase(f) = arctan(Im / Re)

Magnitude tells how much of a certain frequency component is in the image. Phase tells where that certain frequency lies in the image.

(The two example images are shifted by π with respect to each other.)
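A small sketch of these two quantities in code: np.abs and np.angle compute exactly sqrt(Re² + Im²) and arctan(Im/Re) of the complex spectrum; the 1D test signal is an assumption:

```python
import numpy as np

# A 1D signal with a single sine component: all its energy sits in one bin.
N = 64
n = np.arange(N)
signal = np.sin(2 * np.pi * 4 * n / N)

spectrum = np.fft.fft(signal)
magnitude = np.abs(spectrum)    # sqrt(Re^2 + Im^2): how much of each frequency
phase = np.angle(spectrum)      # arctan(Im / Re): where that frequency lies

# Shifting the signal leaves the magnitude unchanged but changes the phase.
shifted = np.roll(signal, 8)
mag_shifted = np.abs(np.fft.fft(shifted))
same_magnitude = bool(np.allclose(magnitude, mag_shifted))
```

Shifting moves information into the phase only, which is why the magnitude spectrum alone cannot say where a frequency lies in the image.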

  • 8/12/2019 Video Technology 2013-14 - Minor II

    107/282

    Examples

    Fourier

    Convolution

  • 8/12/2019 Video Technology 2013-14 - Minor II

    108/282

    Fourier Transform

  • 8/12/2019 Video Technology 2013-14 - Minor II

    109/282

    2D-Fourier Transform

  • 8/12/2019 Video Technology 2013-14 - Minor II

    110/282

    Fourier Transform

  • 8/12/2019 Video Technology 2013-14 - Minor II

    111/282

    Fourier Transform

  • 8/12/2019 Video Technology 2013-14 - Minor II

    112/282

    Image Pre-processing (2D Convolution)

  • 8/12/2019 Video Technology 2013-14 - Minor II

    113/282

    Convolution

One of the most common methods for filtering a function is called convolution.

In 1D, convolution is defined as:

g(x) = f(x) * h(x) = ∫ f(x′) h(x - x′) dx′

  • 8/12/2019 Video Technology 2013-14 - Minor II

    114/282

    Convolution properties

Convolution exhibits a number of basic, but important properties.

Commutativity:   a(x) * b(x) = b(x) * a(x)

Associativity:   [a(x) * b(x)] * c(x) = a(x) * [b(x) * c(x)]

Linearity:       a(x) * [k·b(x)] = k·[a(x) * b(x)]
                 a(x) * (b(x) + c(x)) = a(x) * b(x) + a(x) * c(x)

  • 8/12/2019 Video Technology 2013-14 - Minor II

    115/282

    Convolution in 2D

    In two dimensions, convolution becomes:

g(x, y) = f(x, y) * h(x, y) = ∬ f(x′, y′) h(x - x′, y - y′) dx′ dy′

  • 8/12/2019 Video Technology 2013-14 - Minor II

    116/282

    Discrete convolution

    For a digital signal, we define discrete convolution as:

g[i] = f[i] * h[i] = Σ_i′ f[i′] h[i - i′]
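A direct implementation of this sum, checked against NumPy's np.convolve (the signal and mask values are arbitrary test data):

```python
import numpy as np

def conv1d(f, h):
    """Discrete convolution g[i] = sum_k f[k] * h[i - k] (full output, len(f)+len(h)-1)."""
    g = np.zeros(len(f) + len(h) - 1)
    for i in range(len(g)):
        for k in range(len(f)):
            if 0 <= i - k < len(h):
                g[i] += f[k] * h[i - k]
    return g

f = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, -1.0])            # a difference mask

g = conv1d(f, h)
matches_numpy = bool(np.allclose(g, np.convolve(f, h)))
commutes = bool(np.allclose(np.convolve(f, h), np.convolve(h, f)))
```

The same loop also makes the commutativity property above easy to check numerically.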

  • 8/12/2019 Video Technology 2013-14 - Minor II

    117/282

    1D convolution theorem example


  • 8/12/2019 Video Technology 2013-14 - Minor II

    118/282

    Discrete convolution in 2D

    Similarly, discrete convolution in 2D becomes:

g[i, j] = f[i, j] * h[i, j] = Σ_i′ Σ_j′ f[i′, j′] h[i - i′, j - j′]
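The same double sum can be sketched with a 3x3 averaging mask; only fully-overlapping ("valid") output positions are computed, and the toy image is an assumption:

```python
import numpy as np

def conv2d_valid(f, h):
    """g[i,j] = sum_{k,l} f[k,l] * h[i-k, j-l], keeping only fully-overlapping positions."""
    hf = np.flip(h)  # convolution flips the mask (plain correlation would not)
    H, W = f.shape
    m, n = hf.shape
    out = np.zeros((H - m + 1, W - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(f[i:i+m, j:j+n] * hf)
    return out

avg_mask = np.ones((3, 3)) / 9.0                     # 3x3 average (box) filter
img = np.arange(25, dtype=float).reshape(5, 5)
smoothed = conv2d_valid(img, avg_mask)               # each output = mean of a 3x3 window
```

For the symmetric box mask, the flip has no effect; it matters for asymmetric masks such as derivative filters.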


  • 8/12/2019 Video Technology 2013-14 - Minor II

    119/282

    2D convolution theorem example

    *

    f(x,y) |F(sx,sy)|

    h(x,y)

    g(x,y)

    |H(sx,sy)|

    |G(sx,sy)|


  • 8/12/2019 Video Technology 2013-14 - Minor II

    120/282

    Reconstruction filters in 2D

    We can perform reconstruction in 2D

    Example problem

    Find the Fourier transform of

  • 8/12/2019 Video Technology 2013-14 - Minor II

    121/282

    Find the Fourier transform of

    Example problem: Answer.

    Find the Fourier transform of

  • 8/12/2019 Video Technology 2013-14 - Minor II

    122/282

    Find the Fourier transform of

f(x) = rect(x/4) - tri(x/2) + 0.5·tri(x)

Using the Fourier transforms of rect and tri and the linearity and scaling properties,

F(u) = 4sinc(4u) - 2sinc²(2u) + 0.5sinc²(u)

    Example problem: Alternative Answer.

    Find the Fourier transform of

  • 8/12/2019 Video Technology 2013-14 - Minor II

    123/282

    Find the Fourier transform of

f(x) = rect(x/4) - 0.5·(rect(x/3) * rect(x))

Using the Fourier transform of rect and the linearity, scaling, and convolution properties,

F(u) = 4sinc(4u) - 1.5sinc(3u)sinc(u)

    Plane waves

Let's get an intuitive feel for the plane wave e^(i2π(ux + vy)).

The period is the distance between successive maxima of the wave; lines of constant phase define the direction of the undulation (an undulation in the complex plane).

    Plane waves: sine and cosine waves

  • 8/12/2019 Video Technology 2013-14 - Minor II

    125/282

sin(2π·ν·x)

cos(2π·ν·x)

    Plane waves: sine waves in the complex plane.

  • 8/12/2019 Video Technology 2013-14 - Minor II

    126/282

sin(10·π·x)

sin(10·π·x + 4·π·y)

    Two-Dimensional Fourier Transform

  • 8/12/2019 Video Technology 2013-14 - Minor II

    127/282

Two-Dimensional Fourier Transform:

F(u, v) = ∬ f(x, y) e^(-i2π(ux + vy)) dx dy

Two-Dimensional Inverse Fourier Transform:

f(x, y) = ∬ F(u, v) e^(i2π(ux + vy)) du dv

where in f(x, y), x and y are real, not complex, variables. F(u, v) gives the amplitude and phase of the required basis functions.

    Separable Functions

  • 8/12/2019 Video Technology 2013-14 - Minor II

    128/282

What if f(x, y) were separable? That is, f(x, y) = f1(x) · f2(y).

Two-Dimensional Fourier Transform:

F(u, v) = ∬ f1(x) f2(y) e^(-i2π(ux + vy)) dx dy

Breaking up the exponential,

F(u, v) = ∬ f1(x) e^(-i2πux) f2(y) e^(-i2πvy) dx dy

Separating the integrals,

F(u, v) = [∫ f1(x) e^(-i2πux) dx] · [∫ f2(y) e^(-i2πvy) dy]

F(u, v) = F1(u) · F2(v)
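This factorization can be checked numerically: for a separable image, the 2D FFT equals the outer product of the two 1D FFTs (the random test profiles below are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
f1 = rng.standard_normal(16)    # row profile   f1(x)
f2 = rng.standard_normal(16)    # column profile f2(y)

# Separable image: f[y, x] = f2(y) * f1(x)
f = np.outer(f2, f1)

F_2d = np.fft.fft2(f)
F_sep = np.outer(np.fft.fft(f2), np.fft.fft(f1))   # F2(v) * F1(u)

separable_transform = bool(np.allclose(F_2d, F_sep))
```

This is also why separable filters (e.g., the Gaussian) can be applied as two cheap 1D passes instead of one 2D pass.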

  • 8/12/2019 Video Technology 2013-14 - Minor II

    130/282

    Fourier Transform

f(x, y) = cos(10πx) · 1

F(u, v) = 1/2 [δ(u+5, v) + δ(u-5, v)]

(Figure: Real and Imaginary parts of F(u, v) on u-v axes.)

  • 8/12/2019 Video Technology 2013-14 - Minor II

    131/282

  • 8/12/2019 Video Technology 2013-14 - Minor II

    132/282

    Fourier Transform

f(x, y) = sin(40πx)

F(u, v) = i/2 [δ(u+20, v) - δ(u-20, v)]

(Figure: Real and Imaginary parts of F(u, v) on u-v axes.)

  • 8/12/2019 Video Technology 2013-14 - Minor II

    133/282

    Fourier Transform

f(x, y) = sin(20πx + 10πy)

F(u, v) = i/2 [δ(u+10, v+5) - δ(u-10, v-5)]

(Figure: Real and Imaginary parts of F(u, v) on u-v axes.)

    Reconstruction filters in 2D

  • 8/12/2019 Video Technology 2013-14 - Minor II

    134/282

    Reconstruction filters in 2D

    We can perform reconstruction in 2D

  • 8/12/2019 Video Technology 2013-14 - Minor II

    135/282

    Image Pre-processing (Sampling)


  • 8/12/2019 Video Technology 2013-14 - Minor II

    136/282

    Image Sampling And Quantisation (cont)

    Remember that a digital image is always only

    an approximation of a real world scene

    Sampling

  • 8/12/2019 Video Technology 2013-14 - Minor II

    137/282

    Now, we can talk about sampling.

    Sampling

The Fourier spectrum gets replicated by spatial sampling!

    How do we recover the signal?

    Reconstruction filters

  • 8/12/2019 Video Technology 2013-14 - Minor II

    138/282

    Reconstruction filters

    The sinc filter, while ideal, has two drawbacks:

    It has large support (slow to compute)

    It introduces ringing in practice

    We can choose from many other filters


    Reconstruction filters in 2D

  • 8/12/2019 Video Technology 2013-14 - Minor II

    140/282

    Reconstruction filters in 2D

    We can also perform reconstruction in 2D

  • 8/12/2019 Video Technology 2013-14 - Minor II

    141/282

    MINOR I

  • 8/12/2019 Video Technology 2013-14 - Minor II

    142/282

    Image Pre-processing (Aliasing)

    Aliasing

  • 8/12/2019 Video Technology 2013-14 - Minor II

    143/282

    Aliasing

    Sampling rate is too low

    Aliasing

  • 8/12/2019 Video Technology 2013-14 - Minor II

    144/282

    Aliasing

    What if we go below the Nyquist frequency?

    Anti-aliasing

  • 8/12/2019 Video Technology 2013-14 - Minor II

    145/282

Anti-aliasing

Anti-aliasing is the process of removing the frequencies before they alias.

    Anti-aliasing by analytic prefiltering

  • 8/12/2019 Video Technology 2013-14 - Minor II

    146/282

We can fill the magic box with analytic pre-filtering of the signal:

    Why may this not generally be possible?
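A small sketch of what goes wrong without prefiltering: a 7 Hz tone sampled at only 10 Hz (below its Nyquist rate of 14 Hz) shows up at the alias frequency 10 - 7 = 3 Hz:

```python
import numpy as np

fs = 10                      # sampling rate in Hz: too low for a 7 Hz tone
f_true = 7
n = np.arange(fs)            # one second of samples
samples = np.sin(2 * np.pi * f_true * n / fs)

# The strongest bin of the sampled spectrum is the alias, not the true frequency.
spectrum = np.abs(np.fft.rfft(samples))
f_detected = int(np.argmax(spectrum))
```

Prefiltering removes the 7 Hz component before sampling; once the samples are taken, the 3 Hz alias is indistinguishable from a real 3 Hz tone.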

  • 8/12/2019 Video Technology 2013-14 - Minor II

    147/282

    MINOR II

  • 8/12/2019 Video Technology 2013-14 - Minor II

    148/282

    Image processing/Enhancement (Noise reduction)

    What Is Image Enhancement?

  • 8/12/2019 Video Technology 2013-14 - Minor II

    149/282

    What Is Image Enhancement?

    Image enhancement is the process of making

    images more useful

    The reasons for doing this include:

    g g ng n eres ng e a n mages

    Removing noise from images

    Making images more visually appealing

Noise

  • 8/12/2019 Video Technology 2013-14 - Minor II

    150/282

In signal processing, it is often desirable to be able to perform some kind of noise reduction on an image or signal.

Image processing is also useful for noise reduction and edge enhancement. We will focus on these applications for the remainder of the lecture.

Common types of noise:

Salt and pepper noise: contains random occurrences of black and white pixels

Impulse noise: contains random occurrences of white pixels

Gaussian noise: variations in intensity drawn from a Gaussian normal distribution
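These noise models are easy to synthesize for experiments; the amplitudes and densities below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(42)
clean = np.full((64, 64), 128.0)          # flat grey test image

# Gaussian noise: intensity perturbations drawn from a normal distribution.
gaussian = np.clip(clean + rng.normal(0, 10, clean.shape), 0, 255)

# Salt and pepper noise: random pixels forced to pure black (0) or white (255).
salt_pepper = clean.copy()
coords = rng.random(clean.shape)
salt_pepper[coords < 0.05] = 0            # pepper
salt_pepper[coords > 0.95] = 255          # salt

has_outliers = bool(((salt_pepper == 0) | (salt_pepper == 255)).any())
```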

    Ideal noise reduction

  • 8/12/2019 Video Technology 2013-14 - Minor II

    151/282

    Ideal noise reduction

    151

    Ideal noise reduction

  • 8/12/2019 Video Technology 2013-14 - Minor II

    152/282

    Ideal noise reduction

    152

    Practical noise reduction

  • 8/12/2019 Video Technology 2013-14 - Minor II

    153/282

    Practical noise reduction

    How can we smooth away noise in a single image?

    153

    Example revisited: 2D Convolution


  • 8/12/2019 Video Technology 2013-14 - Minor II

    154/282

    mask (average filter)

    Effect of average filters

  • 8/12/2019 Video Technology 2013-14 - Minor II

    155/282


    155

    Median Filtering

  • 8/12/2019 Video Technology 2013-14 - Minor II

    156/282

    Median Filtering

    The median filter is another digital filtering technique, often used to remove

    noise. Median filtering is very widely used in digital image processing because it

    preserves edges while removing noise.

    Median filters

  • 8/12/2019 Video Technology 2013-14 - Minor II

    157/282

It replaces the value of the center pixel with the median of the intensity values in the neighborhood of that pixel.

    Median filtering is an operation often used in image processing to reduce

    "salt and pepper" noise. A median filter is more effective than convolution

    when the goal is to simultaneously reduce noise and preserve edges.

Median filters are particularly effective in the presence of impulse noise,

    also called salt and pepper noise because of its appearance as white and

    black dots superimposed on an image.

    For every pixel, a 3x3 neighborhood with the pixel as center is considered.

    In median filtering, the value of the pixel is replaced by the median of the

    pixel values in the 3x3 neighborhood.
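A minimal sketch of this 3x3 median filter (border pixels are simply copied; the flat test image with one outlier is an assumption):

```python
import numpy as np

def median_filter_3x3(img):
    """Replace each interior pixel by the median of its 3x3 neighborhood."""
    out = img.copy()
    H, W = img.shape
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            out[i, j] = np.median(img[i-1:i+2, j-1:j+2])
    return out

# A flat region of 10s with one "salt" outlier of 150 in the middle.
img = np.full((5, 5), 10)
img[2, 2] = 150
filtered = median_filter_3x3(img)   # the outlier is replaced by the median, 10
```

Because the 150 is a single outlier, it never reaches the middle of any sorted 3x3 neighborhood, so the filter removes it without blurring its surroundings.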


Median Filtering

  • 8/12/2019 Video Technology 2013-14 - Minor II

    159/282

Median filtering is useful for removing noise but usefully preserves edges.

The median is the central value in a range:

Median {4, 2, 0, 1, 3, 0, 5} = 2

Median filtering is a popular low-pass filtering method. Pixel values are sorted and the median (middle value) is output.

Median filtering removes sparse outliers. Sparse outliers appear as salt and pepper noise in images, i.e., dark pixels in light areas and light pixels in dark areas. This type of noise was common in analogue television.

You will use some simple filters in the laboratory. A median filter will be used to remove noise.

Passing a 3x3 median filter over the image pixels shown produces the output on the right. Notice how the outlier (the 150) is removed.

    Effect of median filters

  • 8/12/2019 Video Technology 2013-14 - Minor II

    160/282

    160

    Mean and Median Filtering

  • 8/12/2019 Video Technology 2013-14 - Minor II

    161/282

X1 X2 X3
X4 X0 X5
X6 X7 X8

Replacing X0 by the mean of X0~X8 is called mean filtering.

Replacing X0 by the median of X0~X8 is called median filtering.

    Gaussian filters

  • 8/12/2019 Video Technology 2013-14 - Minor II

    162/282

Gaussian filters weigh pixels based on their distance from the center of the convolution filter. In particular:

h[i, j] = e^(-(i² + j²)/(2σ²)) / C

    This does a decent job of blurring noise while preservingfeatures of the image.

    What parameter controls the width of the Gaussian?

    What happens to the image as the Gaussian filter kernel

    gets wider? What is the constant C? What should we set it to?
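Answering those questions in a sketch: σ controls the width, and C is conventionally the sum of the unnormalized weights, so that the mask sums to 1 and filtering does not change overall brightness (the radius and σ below are arbitrary):

```python
import numpy as np

def gaussian_kernel(radius, sigma):
    """h[i, j] = exp(-(i^2 + j^2) / (2 sigma^2)) / C, normalized to sum to 1."""
    i, j = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    h = np.exp(-(i**2 + j**2) / (2.0 * sigma**2))
    return h / h.sum()          # C = sum of the unnormalized weights

k = gaussian_kernel(radius=2, sigma=1.0)
center_is_max = bool(k[2, 2] == k.max())   # heaviest weight at the center pixel
weights_sum = float(k.sum())               # normalization makes this 1.0
```

A wider σ spreads the weights out, so the filtered image gets blurrier; as σ grows relative to the radius, the kernel approaches a plain box average.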

    Effect of Gaussian filters

  • 8/12/2019 Video Technology 2013-14 - Minor II

    163/282

    163

    Comparison: Gaussian noise

  • 8/12/2019 Video Technology 2013-14 - Minor II

    164/282


    164

    Comparison: salt and pepper noise

  • 8/12/2019 Video Technology 2013-14 - Minor II

    165/282

    165

  • 8/12/2019 Video Technology 2013-14 - Minor II

    166/282

    Image processing/Enhancement (Edge Detection)

    Edge detection

  • 8/12/2019 Video Technology 2013-14 - Minor II

    167/282

One of the most important uses of image processing is edge detection:

    Really easy for humans

    Really difficult for computers

    167

    Fundamental in computer vision

    Important in many graphics applications

    What is an edge?

  • 8/12/2019 Video Technology 2013-14 - Minor II

    168/282

    168

    Q: How might you detect an edge in 1D?

    Gradients

  • 8/12/2019 Video Technology 2013-14 - Minor II

    169/282

The gradient is the 2D equivalent of the derivative:

∇f(x, y) = (∂f/∂x, ∂f/∂y)

Properties of the gradient:

It's a vector

Points in the direction of maximum increase of f

Magnitude is the rate of increase

How can we approximate the gradient in a discrete image?
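One common approximation is finite differences, i.e., convolving with the mask [-1 1] along each axis; the ramp test image below is an assumption:

```python
import numpy as np

# A ramp image: intensity increases by 2 per column, constant along rows.
img = np.tile(np.arange(0, 12, 2, dtype=float), (4, 1))

# Finite-difference approximations of the two partial derivatives.
dfdx = img[:, 1:] - img[:, :-1]   # horizontal difference (mask [-1 1] along x)
dfdy = img[1:, :] - img[:-1, :]   # vertical difference   (mask [-1 1] along y)

grad_x = float(dfdx[0, 0])        # rate of increase along x
grad_y = float(dfdy[0, 0])        # zero: no change along y
```

The gradient vector here is (2, 0): it points along x, the direction of maximum increase, with magnitude equal to the ramp's slope.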

    Less than ideal edges

  • 8/12/2019 Video Technology 2013-14 - Minor II

    170/282

    170

    Edge Properties

  • 8/12/2019 Video Technology 2013-14 - Minor II

    171/282

An edge has two properties:

how steep it is

its direction, i.e., does the intensity A(x) step up to the left or to the right?

Edge Properties - gradient

  • 8/12/2019 Video Technology 2013-14 - Minor II

    172/282

Consider a 1-d continuous image of an edge, denoted by A(x).

Edge properties can be obtained from the gradient = ΔA/Δx, which becomes dA/dx as Δx → 0.

(Figure: the edge profile A(x) and its gradient.)

    Edge Properties

  • 8/12/2019 Video Technology 2013-14 - Minor II

    173/282

The gradient has two properties: magnitude and direction.

gradient1 = dA1/dx,  gradient2 = dA2/dx

Magnitude, or steepness, is given by |dA/dx|.

Direction, left or right, is given by the sign of dA/dx:

|dA1/dx| = |dA2/dx|,  sgn(dA1/dx) = -sgn(dA2/dx)

    Edge PropertiesEdge Properties--gradientgradient

  • 8/12/2019 Video Technology 2013-14 - Minor II

    174/282

The gradient is given by the first derivative dA/dx.

The second derivative, d²A/dx², generates two peaks, at the beginning and end of the edge. This is called ringing.

(Figure: A(x), dA/dx, and d²A/dx².)

    Edge Properties-discrete gradient

  • 8/12/2019 Video Technology 2013-14 - Minor II

    175/282

(Figure: a discrete edge profile A_i with its first and second differences.)

dA/dx ≈ A_(i+1) - A_i    (convolution with mask B = [-1 1])

d²A/dx² ≈ (A_(i+2) - A_(i+1)) - (A_(i+1) - A_i) = A_(i+2) - 2A_(i+1) + A_i    (mask B = [1 -2 1])

    Steps in edge detection

  • 8/12/2019 Video Technology 2013-14 - Minor II

    176/282

Edge detection algorithms typically proceed in three or four steps:

Filtering: cut down on noise

Enhancement: amplify the difference between edges and non-edges

Detection: use a threshold operation

Localization (optional): estimate geometry of edges beyond pixels

2-d Gradient Operator

Magnitude:  |∇A| = sqrt((dA/dx)² + (dA/dy)²)

Orientation = tan⁻¹((dA/dy) / (dA/dx))

    Review

  • 8/12/2019 Video Technology 2013-14 - Minor II

    178/282

    Image Enhancement / (Pre)Processing

    Noise reduction

    Information Detection

    Histograms

    Image Segmentation

    Image Compression (Next)

    Review

  • 8/12/2019 Video Technology 2013-14 - Minor II

    179/282

    Image Enhancement / (Pre)Processing

    Noise reduction

    Using Filters (coefficients of the filter mask (hi,j) to be

    computed ???)

    Information Detection

    Edge Detection

    Histograms

Image Segmentation

Threshold technique

    Image Compression (Next)

    Review

  • 8/12/2019 Video Technology 2013-14 - Minor II

    180/282

Image Enhancement / (Pre)Processing

Noise reduction

Using Filters (coefficients of the filter mask to be computed using rectangular, gaussian, triangular techniques etc.)

Average Filter mask

Median Filter

Gaussian Filter etc.

Information Detection

Edge Detection

Histograms

Image Segmentation

Threshold technique

  • 8/12/2019 Video Technology 2013-14 - Minor II

    181/282

  • 8/12/2019 Video Technology 2013-14 - Minor II

    182/282

Image gradient

The gradient of an image:

  • 8/12/2019 Video Technology 2013-14 - Minor II

    183/282

    The gradient points in the direction of most rapid change in intensity

    The gradient direction is given by:

    how does this relate to the direction of the edge?

    The edge strength/magnitude is given by the gradient magnitude

    Neighbourhood Operators

  • 8/12/2019 Video Technology 2013-14 - Minor II

    184/282

    First derivative can be calculated by

    convolving with mask ______

    Second derivative can be calculated by

    ________


    Neighbourhood Operators

  • 8/12/2019 Video Technology 2013-14 - Minor II

    186/282

    First derivative can be calculated by

    convolving with mask B=[-1 1].

Second derivative can be calculated by convolving with mask B=[1 -2 1].

    Discrete 2-d gradient operator

  • 8/12/2019 Video Technology 2013-14 - Minor II

    187/282

∇A(x, y):

∂A/∂x ≈ A_(i+1,j) - A_(i,j)

∂A/∂y ≈ A_(i,j+1) - A_(i,j)

(Neighbourhood operators: the difference mask B = [-1 1], applied along i and along j.)

    Gradient Operators for Images

  • 8/12/2019 Video Technology 2013-14 - Minor II

    188/282

Second-order gradient in an image, denoted by ∇²A.

∇²A = ∂²A/∂x² + ∂²A/∂y²    (a scalar)

Laplacian Operator

    Neighbourhood Operators

  • 8/12/2019 Video Technology 2013-14 - Minor II

    189/282

∂²A/∂x² ≈ A_(i+1,j) - 2A_(i,j) + A_(i-1,j)    (convolve along i with B_i = [1 -2 1])

∂²A/∂y² ≈ A_(i,j+1) - 2A_(i,j) + A_(i,j-1)    (convolve along j with B_j = [1 -2 1]ᵀ)

    Neighbourhood Operators

  • 8/12/2019 Video Technology 2013-14 - Minor II

    190/282

∇²A = A * B_i + A * B_j = A * B, where

B =
 0  1  0
 1 -4  1
 0  1  0
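Applying this mask directly: the response is zero on flat regions and strongly negative at an isolated bright pixel (the toy image is an assumption; the mask is symmetric, so correlation and convolution coincide):

```python
import numpy as np

B = np.array([[0,  1, 0],
              [1, -4, 1],
              [0,  1, 0]], dtype=float)

def apply_mask(img, mask):
    """Apply a 3x3 mask to the image interior; borders are left at zero."""
    out = np.zeros_like(img)
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            out[i, j] = np.sum(img[i-1:i+2, j-1:j+2] * mask)
    return out

img = np.full((5, 5), 10.0)
img[2, 2] = 50.0                 # one bright pixel on a flat background
L = apply_mask(img, B)           # zero on flat regions, large at the outlier
```

At the bright pixel the response is 4·10 - 4·50 = -160, while over any constant patch the +1 neighbours exactly cancel the -4 centre, giving 0.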

    What is Edge-enhancement?

  • 8/12/2019 Video Technology 2013-14 - Minor II

    191/282

Psychophysical experiments indicate that an

    image with accentuated or crispened edges is

    often more subjectively pleasing than the original

    image.

    Edge enhancement

  • 8/12/2019 Video Technology 2013-14 - Minor II

    192/282

    Laplacian

    Canny edge detector

    Laplacian Image

  • 8/12/2019 Video Technology 2013-14 - Minor II

    193/282

L = ∇²A    (computed with the Laplacian mask B)

    LaplacianLaplacian

  • 8/12/2019 Video Technology 2013-14 - Minor II

    194/282

Add the Laplacian to A(x): the enhanced profile A(x) + L overshoots below and above the edge, which visually crispens it.

(Figure: the edge profile A(x), its second derivative d²A/dx², and the enhanced profile A(x) + L.)

    Neighbourhood Operations

  • 8/12/2019 Video Technology 2013-14 - Minor II

    195/282

A(x, y) + L:

Laplacian mask:        enhanced mask B (A FILTER):
 0  1  0                0 -1  0
 1 -4  1               -1  5 -1
 0  1  0                0 -1  0

(The enhanced mask is the identity mask minus the Laplacian mask.)

    Laplacian

  • 8/12/2019 Video Technology 2013-14 - Minor II

    196/282

    A(x)+L

    Edge

    Enhancement

    x

  • 8/12/2019 Video Technology 2013-14 - Minor II

    197/282

Original image; original image enhanced with the Laplacian.

    Edge detection

  • 8/12/2019 Video Technology 2013-14 - Minor II

    198/282

    original

    Edge detector

  • 8/12/2019 Video Technology 2013-14 - Minor II

    199/282

    thinning

    (non-maximum suppression)

Effect of σ (Gaussian kernel size)

  • 8/12/2019 Video Technology 2013-14 - Minor II

    200/282

original; Canny with large σ; Canny with small σ

The choice of σ depends on desired behavior:

large σ detects large-scale edges

small σ detects fine features

    Gaussian Mask

  • 8/12/2019 Video Technology 2013-14 - Minor II

    201/282

Uses a Gaussian Mask: e^(-x²/(2σ²))

Sigma (σ) is a value chosen by the DESIGNER.

x and y are the distances away from the target pixel; they range from negative to positive across the Mask Radius.

    Example: Gaussian

  • 8/12/2019 Video Technology 2013-14 - Minor II

    202/282

  • 8/12/2019 Video Technology 2013-14 - Minor II

    203/282

    Image Enhancement (Histogram)

    Image Histograms

  • 8/12/2019 Video Technology 2013-14 - Minor II

    204/282

The histogram of an image shows us the distribution of grey levels in the image.

Massively useful, especially in segmentation.

(Figure: frequencies vs. grey levels.)

    Histogram Examples (cont)

  • 8/12/2019 Video Technology 2013-14 - Minor II

    205/282

Images taken from Gonzalez & Woods, Digital Image Processing (2002)

    Histogram Examples (cont)

  • 8/12/2019 Video Technology 2013-14 - Minor II

    206/282


    Histogram Examples (cont)

  • 8/12/2019 Video Technology 2013-14 - Minor II

    207/282


    Histogram Examples (cont)

  • 8/12/2019 Video Technology 2013-14 - Minor II

    208/282


    Histogram Examples (cont)

  • 8/12/2019 Video Technology 2013-14 - Minor II

    209/282

A selection of images and their histograms. Notice the relationships.

Note that the high contrast image has the most evenly spaced histogram.

    VARIANCE and STANDARD DEVIATION

  • 8/12/2019 Video Technology 2013-14 - Minor II

    210/282

of the histogram tell us about the average contrast of the image!

So, from the previous figure: the higher the VARIANCE (and hence the higher the STANDARD DEVIATION), the higher the image's contrast!
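A sketch of both ideas: the histogram via np.histogram, and contrast compared via the standard deviation of the grey levels (the two synthetic images are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
low_contrast = rng.integers(100, 156, size=(64, 64))   # grey levels bunched together
high_contrast = rng.integers(0, 256, size=(64, 64))    # grey levels spread out

# Histogram: frequency of each grey level, 256 bins over [0, 255].
hist, _ = np.histogram(high_contrast, bins=256, range=(0, 256))

# A wider histogram means a larger standard deviation, i.e., more contrast.
more_contrast = float(high_contrast.std()) > float(low_contrast.std())
```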

  • 8/12/2019 Video Technology 2013-14 - Minor II

    211/282

    Image Enhancement (Image Segmentation)

    Image Segmentation

  • 8/12/2019 Video Technology 2013-14 - Minor II

    212/282

Segmentation divides an image into its constituent regions or objects.

Segmentation of images is a difficult task in general. Segmentation allows us to extract objects in images.

    What it is useful for

  • 8/12/2019 Video Technology 2013-14 - Minor II

    213/282

Segmenting the image gives the contours of objects, which can be extracted using edge detection and/or border following techniques.

Shape of objects can be described.

Image segmentation techniques are extensively used in similarity searches, e.g.:

http://elib.cs.berkeley.edu/photos/blobworld/

    Segmentation Algorithms

  • 8/12/2019 Video Technology 2013-14 - Minor II

    214/282

Segmentation algorithms are based on one of two basic properties of color, gray values, or texture: discontinuity and similarity.

The first category partitions an image based on abrupt changes in the image.

The second category partitions an image into regions that are similar according to a predefined criterion. The histogram thresholding approach falls under this category.

    Clustering in Color Space

  • 8/12/2019 Video Technology 2013-14 - Minor II

    215/282

    1. Each image point is mapped to a point in a color space,e.g.:

    Color(i, j) = (R (i, j), G(i, j), B(i, j))

It is a many-to-one mapping.

    2. The points in the color space are grouped to clusters.

    3. The clusters are then mapped back to regions in the image.
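A minimal two-cluster sketch of these three steps, using plain k-means with Euclidean distance in RGB space (the tiny two-tone image and the initial cluster centers are assumptions):

```python
import numpy as np

# Step 1: map each pixel to a point in RGB color space.
img = np.zeros((4, 8, 3), dtype=float)
img[:, 4:] = [200.0, 30.0, 30.0]          # right half reddish, left half black
points = img.reshape(-1, 3)

# Step 2: group the color-space points into k=2 clusters (basic k-means).
centers = np.array([[10.0, 10.0, 10.0], [180.0, 50.0, 50.0]])
for _ in range(10):
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)             # nearest center in Euclidean distance
    centers = np.array([points[labels == k].mean(axis=0) for k in range(2)])

# Step 3: map cluster labels back to regions in the image.
segmented = labels.reshape(4, 8)
```

Real images need more clusters and a robust initialization; this only illustrates the map-cluster-map-back pipeline.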

    Displaying objects in the Segmented Image

  • 8/12/2019 Video Technology 2013-14 - Minor II

    216/282

The objects can be distinguished by assigning an arbitrary pixel value or the average pixel value to the pixels belonging to the same cluster.

    Thus, one needs clustering algorithms

    for image segmentation.


    Homework (still preparing):

    Implement in Matlab and test on some example images the

    clustering in the color space.

    Use Euclidean distance in RGB color space.

    You can use k-means, PAM, or some other clustering algorithm (links: k-means, PAM, data normalization).

    Test images: rose, plane, car, tiger, landscape


    Gray Scale Image Example


    Image of a Finger Print with light background

    Histogram


    Segmented Image


    Image after Segmentation

    Thresholding Bimodal Histograms


Basic Global Thresholding:
1) Select an initial estimate for T.
2) Segment the image using T. This will produce two groups of pixels: G1, consisting of all pixels with gray-level values > T, and G2, consisting of pixels with values <= T.
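The slide's listing is cut off here; in the usual iterative scheme the next steps recompute T as the average of the mean gray levels of G1 and G2 and repeat until T stabilizes. A minimal Python sketch under that assumption, with a made-up bimodal pixel list:

```python
def global_threshold(pixels, t0=None, eps=0.5):
    """Iterative global threshold selection: split at T, recompute T as
    the average of the two group means, repeat until T barely changes."""
    t = t0 if t0 is not None else sum(pixels) / len(pixels)  # initial estimate
    while True:
        g1 = [p for p in pixels if p > t]    # G1: values > T
        g2 = [p for p in pixels if p <= t]   # G2: values <= T
        m1 = sum(g1) / len(g1) if g1 else t
        m2 = sum(g2) / len(g2) if g2 else t
        t_new = (m1 + m2) / 2
        if abs(t_new - t) < eps:
            return t_new
        t = t_new

# Bimodal example: dark background around 20, bright object around 200
pix = [18, 20, 22, 19, 21, 198, 200, 202, 199, 201]
t = global_threshold(pix)   # settles midway between the two modes
```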


    Image of rice with black background


Basic Adaptive Thresholding:

Uneven illumination makes it difficult to segment an image using a single histogram threshold. This approach divides the original image into sub-images and applies the thresholding process to each of the sub-images.
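A minimal sketch of this idea, assuming square tiles and a simple per-tile mean threshold (a real implementation would derive each sub-image's threshold from its own histogram):

```python
import numpy as np

def adaptive_threshold(image, tile=8):
    """Split the image into tile x tile sub-images and threshold each one
    independently at its own mean, so a slow illumination gradient does
    not break the segmentation."""
    h, w = image.shape
    out = np.zeros_like(image, dtype=bool)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            block = image[y:y + tile, x:x + tile]
            out[y:y + tile, x:x + tile] = block > block.mean()
    return out

# Background brightness ramps from 0 (left) to 100 (right); one bright
# square sits in the dark region and one in the bright region.
ramp = np.tile(np.linspace(0, 100, 32), (32, 1))
img = ramp.copy()
img[4:8, 4:8] += 60      # object in the dark region
img[4:8, 24:28] += 60    # object in the bright region
mask = adaptive_threshold(img, tile=8)
```

Each tile picks a threshold suited to its local illumination, so both squares are detected despite the ramp.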

    Multimodal Histogram


If there are three or more dominant modes in the image histogram, the histogram has to be partitioned by multiple thresholds.

Multilevel thresholding classifies a point (x,y) as belonging to one object class if T1 < f(x,y) <= T2, and to the background if f(x,y) <= T1.
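With two assumed thresholds T1 and T2, this classification rule maps onto a two-bin `numpy.digitize`; the threshold and pixel values here are invented for illustration:

```python
import numpy as np

# Hypothetical thresholds picked from a histogram with three modes
T1, T2 = 85, 170

f = np.array([[10, 120, 250],
              [90, 200,  30]])

# right=True matches the rule above: background (0) if f <= T1,
# class 1 if T1 < f <= T2, class 2 if f > T2
classes = np.digitize(f, bins=[T1, T2], right=True)
```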


    Low noise image Thresholded at T=some value

    Greylevel thresholding


[Figure: grey-level histogram p(x), with threshold T separating the Object and Background modes]

    Greylevel thresholding


    Comparison with different thresholding


    High noise circle image Optimum threshold

    Relaxation - 20 i

    Example: Image Enhancement


    Assignment

    Color Image


    Given Image

    Statement


To perform an image segmentation:
1. Skin color

Assignment to be submitted: latest by March 28

Tools to be used: Matlab

Report: Matlab code; output images (results) with captions

    Example result: Segmented Image


    Example result: Segmented image, skin color is shown


    Assignment

    Review


Image Enhancement / (Pre)Processing
Noise reduction
  Using filters (coefficients of the filter mask to be computed using rectangular, Gaussian, triangular techniques, etc.)
  Average filter (mask)
  Median filter

Information Detection
Edge detection (to get salient features)
  High-pass filters
  Laplacian mask
Histograms
  Distribution of intensity levels
Image Segmentation (to extract different objects)
  Threshold technique

Next: Image Compression


    Image Compression

    Again: An Image is...

    52 6B 8C 6B 73 5A 63

    5A 6B 73 84 84 73 73

    5A 84 84 73 5A 84 84

    made up of pixels


    6B 6B 8C 5A 42 4A 42

    42 6B 6B 5A A5 DE D6

    5A 6B 5A 42 F7 F7 F7

    84 5A 6B 31 DE F7 F7

    45 71 82 7D 7D 55 5D

    55 75 7D 75 71 6D 65

    55 75 7D 6D 61 75 75

    each pixel is a

    combination of Red

    Green and Blue color

    amplitudes

    45 71 75 55 41 41 38

    34 51 55 55 9E CF CF

    51 55 51 38 FF FF FF

    71 51 59 34 DB FF FF

    42 7B 7B 9C 7B 63 4A

    63 7B 7B 73 73 63 63

    63 73 7B 63 63 73 73

    42 73 73 63 42 42 39

    31 4A 4A 63 94 DE DE

    4A 4A 42 39 FF FF FF

    73 42 5A 31 BD FF FF

An image is represented by I(x,y), a function of the two spatial coordinates of the image plane.

I(x,y) is the intensity of the image at the point (x,y) on the image plane.

A color image is represented by R(x,y), G(x,y), B(x,y).

    Image Compression: JPEG


    Summary: JPEG Compression

    DCT

    Quantization

    Zig-Zag Scan

    Sources: The JPEG website:

    http://www.jpeg.org


    RLE and DPCM

    Entropy Coding

    Why Compression?

The compression ratio of lossless methods (e.g., Huffman, Arithmetic, LZW) is not high enough for image and video compression.

JPEG uses transform coding; it is largely based on the following observations:

Observation 1: A large majority of useful image content changes relatively slowly across an image, i.e., it is unusual for intensity values to alter up and down several times in a small area, for example, within an 8 x 8 image block. Translated into the spatial frequency domain, this implies that, generally, lower spatial frequency components contain more information than the high frequency components, which often correspond to less useful details and noise.

Observation 2: Experiments suggest that humans are more immune to the loss of higher spatial frequency components than to the loss of lower frequency components.

Entropy Coding: DC Components (Contd..)

DC components are differentially coded as (SIZE, Value).

The code for a SIZE is derived from the following table:

    SIZE   Code Length   Code
    0      2             00
    1      3             010
    2      3             011
    3      3             100
    4      3             101
    5      3             110
    6      4             1110
    7      5             11110
    8      6             111110
    9      7             1111110
    10     8             11111110
    11     9             111111110

Example: If a DC component is 40 and the previous DC component is 48, the difference is -8. Therefore it is coded as: 1010111

101: the SIZE of -8 (from the Size_and_Value table) is 4; the corresponding code from the table above is 101.
0111: the value bits representing -8 (see the Size_and_Value table).

Huffman Table for DC component SIZE field
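The worked example can be checked with a short sketch. The SIZE-code dictionary is the table above (including the standard entry for SIZE 3); negative differences are stored as their SIZE-bit one's complement, which is why -8 becomes 0111:

```python
# Huffman codes for the DC SIZE field, per the table above
DC_SIZE_CODE = {0: '00', 1: '010', 2: '011', 3: '100', 4: '101', 5: '110',
                6: '1110', 7: '11110', 8: '111110', 9: '1111110',
                10: '11111110', 11: '111111110'}

def encode_dc(prev_dc, dc):
    """Differential (SIZE, Value) coding of a DC coefficient."""
    diff = dc - prev_dc
    size = abs(diff).bit_length()
    if size == 0:
        return DC_SIZE_CODE[0]               # a zero difference has no value bits
    # negative values: SIZE-bit one's complement (e.g. -8 -> 0111)
    value = diff if diff > 0 else diff + (1 << size) - 1
    return DC_SIZE_CODE[size] + format(value, '0{}b'.format(size))

# The slide's example: previous DC 48, current DC 40 -> diff -8
code = encode_dc(48, 40)   # '1010111'
```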


    Why Compression?


    A TV still image (frame) in India has 720 x 483 pixels

    Hence the total amount of data for the image

    = 720 x 483 x 3 bytes

    = 1043280 bytes

    Considering there are 25 such frames in a second, the data rate

    = 1043280 x 8 x 25 bits/sec

    = 208656000 bits/sec

    = 208 Mbits/sec !!!

    But, the cable that comes to our house can carry data rates of up to only 40 Mbits/sec :-((

    Compression


    Store more images
    Transmit images in less time

JPEG (Joint Photographic Experts Group)
  Popular standard format for representing digital images in a compressed form
  Provides for a number of different modes of operation
  Mode used in this class provides high compression ratios using DCT (discrete cosine transform)

    Image data divided into blocks of 8 x 8 pixels

    3 steps performed on each block

    DCT

    Quantization

    Huffman encoding

    Compression


    Lossless or lossy (widely used)

    Color Transform (RGB to YCbCr)

    Downsampling (4:2:2 to 4:2:0)

    Noise reduction

    Image captured

    (R,G,B)

    and digitised


    Color Transform RGB to YCbCr

    for Video class

Conversion from RGB:

Y  = 0.299(R - G) + G + 0.114(B - G)
Cb = 0.564(B - Y)
Cr = 0.713(R - Y)

The matrix form (coefficients obtained by substituting Y into the Cb and Cr equations):

[ Y  ]   [  0.299     0.587     0.114   ] [ R ]
[ Cb ] = [ -0.168636 -0.331068  0.499704] [ G ]
[ Cr ]   [  0.499813 -0.418531 -0.081282] [ B ]
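A direct transcription of these formulas into a small function (a sketch; practical codecs operate on whole arrays and may add an offset to keep Cb and Cr unsigned):

```python
def rgb_to_ycbcr(r, g, b):
    """YCbCr from the slide's formulas: luma Y as a weighted sum of
    R, G, B, then scaled color differences Cb and Cr."""
    y = 0.299 * (r - g) + g + 0.114 * (b - g)   # = 0.299R + 0.587G + 0.114B
    cb = 0.564 * (b - y)
    cr = 0.713 * (r - y)
    return y, cb, cr

# A neutral gray has zero chroma: Y = 128, Cb = Cr = 0
y, cb, cr = rgb_to_ycbcr(128, 128, 128)
```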

    Downsampling in Y-Cr-Cb space

    for Video class


    Compression


    Lossless or lossy (widely used)

    Noise reduction

    Image Enhancement

    Image captured

    (grayscale)

    and digitised


Example: JPEG Coding

f(i,j) (8x8, YCbCr) --DCT--> F(u,v) (8x8) --Quantization (quant tables)--> Fq(u,v) --Zig-Zag Scan--> DPCM (DC) / RLC (AC) --> Entropy Coding --> Header, Tables, Data

Steps involved:

1. Discrete Cosine Transform of each 8x8 pixel array: f(x,y) -> F(u,v)
2. Quantization using a table or using a constant
3. Zig-Zag scan to exploit redundancy
4. Differential Pulse Code Modulation (DPCM) on the DC component, and Run-Length Coding of the AC components
5. Entropy coding (Huffman) of the final output


Exercise - DCT : Discrete Cosine Transform

DCT converts the information contained in a block (8x8) of pixels from the spatial domain to the frequency domain.

A simple analogy: Consider an unsorted list of 12 numbers between 0 and 3 -> (2, 3, 1, 2, 2, 0, 1, 1, 0, 1, 0, 0). Consider a transformation of the list involving two steps: (1) sort the list; (2) count the frequency of occurrence of each of the numbers -> (4, 4, 3, 1). Through this transformation we lost the spatial information but captured the frequency information.

There are other transformations which retain the spatial information, e.g., the Fourier transform, DCT, etc., therefore allowing us to move back and forth between the spatial and frequency domains.

1-D DCT (N = 8):
F(u) = a(u) sqrt(2/N) Σ_{n=0..N-1} f(n) cos[(2n+1)uπ / 16]
with a(0) = 1/√2 and a(p) = 1 for p ≠ 0

1-D Inverse DCT:
f'(n) = Σ_{u=0..N-1} a(u) sqrt(2/N) F(u) cos[(2n+1)uπ / 16]

Transform coding for images: transforming the data from the spatial domain to the spatial-frequency domain


Discrete Cosine Transform

1-D DISCRETE COSINE TRANSFORM (DCT):

C(u) = a(u) Σ_{x=0..N-1} f(x) cos[(2x+1)uπ / 2N],   u = 0, 1, ..., N-1

a(u) = sqrt(1/N)   for u = 0
a(u) = sqrt(2/N)   for u = 1, ..., N-1
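A direct, unoptimized evaluation of this formula (real encoders use fast DCT algorithms); a constant signal should put all of its energy into C(0):

```python
import math

def dct_1d(f):
    """Direct evaluation of the 1-D DCT formula above:
    C(u) = a(u) * sum_x f(x) * cos((2x+1) * u * pi / (2N))."""
    n = len(f)
    def a(u):
        return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
    return [a(u) * sum(f[x] * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                       for x in range(n))
            for u in range(n)]

# A constant signal: all the energy lands in the DC coefficient C(0)
c = dct_1d([10, 10, 10, 10, 10, 10, 10, 10])
```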

2-D DISCRETE COSINE TRANSFORM (DCT):

C(u,v) = a(u) a(v) Σ_{x=0..N-1} Σ_{y=0..N-1} f(x,y) cos[(2x+1)uπ / 2N] cos[(2y+1)vπ / 2N]

Inverse:

f(x,y) = Σ_{u=0..N-1} Σ_{v=0..N-1} a(u) a(v) C(u,v) cos[(2x+1)uπ / 2N] cos[(2y+1)vπ / 2N]

u, v = 0, 1, ..., N-1

    2-D DCT

Images are two-dimensional; how do you perform a 2-D DCT? Two series of 1-D transforms result in a 2-D transform, as demonstrated in the figure below.

f(i,j) [8x8] --1-D DCT, row-wise--> [8x8] --1-D DCT, column-wise--> F(u,v) [8x8]

    F(0,0) is called the DC component and the rest of the F(u,v) are called AC components.
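The row-wise/column-wise scheme can be sketched directly, reusing the 1-D DCT formula from above; for a flat 8x8 block, all the energy lands in the DC component F(0,0):

```python
import math

def dct_1d(f):
    """1-D DCT, direct evaluation (see the formula above)."""
    n = len(f)
    a = lambda u: math.sqrt((1 if u == 0 else 2) / n)
    return [a(u) * sum(f[x] * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                       for x in range(n))
            for u in range(n)]

def dct_2d(block):
    """2-D DCT as two passes of the 1-D DCT: row-wise, then column-wise."""
    rows = [dct_1d(row) for row in block]                    # row-wise pass
    cols = [dct_1d([rows[i][j] for i in range(len(rows))])   # column-wise pass
            for j in range(len(rows[0]))]
    # transpose back so F[u][v] indexes row u, column v
    return [[cols[v][u] for v in range(len(cols))] for u in range(len(cols[0]))]

F = dct_2d([[100] * 8 for _ in range(8)])   # flat block -> only F(0,0) nonzero
```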


    Quality


    DCT step

Transforms original 8 x 8 block into a cosine-frequency domain
  Upper-left corner values represent more of the essence of the image
  Lower-right corner values represent finer details
  Can reduce precision of these values and retain reasonable image quality

Forward formula:
  C(h) = if (h == 0) then 1/sqrt(2) else 1.0
    Auxiliary function used in the main function F(u,v)
  F(u,v) = 1/4 x C(u) x C(v) x Σ_{x=0..7} Σ_{y=0..7} f_xy x cos((2x+1)uπ/16) x cos((2y+1)vπ/16)
    Gives encoded value at row u, column v
    f_xy is the original pixel value at row x, column y

IDCT (Inverse DCT)
  Reverses process to obtain original block (not needed for this design)

What happens when DCT is performed?


    Example: 2D signal


Energy is concentrated in the low-frequency region (using DCT)

    Basis of DCT


    2-D Basis Functions N=4

[Figure: the 4 x 4 grid of 2-D DCT basis functions, indexed by u = 0..3 (columns) and v = 0..3 (rows)]

    Quantization step

Achieve a high compression ratio by reducing image quality
Reduce bit precision of encoded data
  Fewer bits needed for encoding
One way is to divide all values by a factor of n
  Simple right shifts can do this
Dequantization would reverse the process for decompression

    1150 39 -43 -10 26 -83 11 41

    -81 -3 115 -73 -6 -2 22 -5

    14 -11 1 -42 26 -3 17 -38

    2 -61 -13 -12 36 -23 -18 5

    44 13 37 -4 10 -21 7 -8

    36 -11 -9 -4 20 -28 -21 14

    -19 -7 21 -6 3 3 12 -21

    -5 -13 -11 -17 -4 -1 7 -4

    144 5 -5 -1 3 -10 1 5

    -10 0 14 -9 -1 0 3 -1

    2 -1 0 -5 3 0 2 -5

    0 -8 -2 -2 5 -3 -2 1

    6 2 5 -1 1 -3 1 -1

    5 -1 -1 -1 3 -4 -3 2

    -2 -1 3 -1 0 0 2 -3

    -1 -2 -1 -2 -1 0 1 -1

Left: the block after the DCT encoding step. Right: after quantization (divide each cell's value by 8 and round).
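The divide-and-round step for a constant factor, applied to the first row of the block above (JPEG proper divides by a per-coefficient table q(u,v), as discussed next):

```python
def quantize(block, q=8):
    """Uniform quantization: divide every coefficient by a constant q
    and round, as in the before/after tables above."""
    return [[round(v / q) for v in row] for row in block]

def dequantize(block, q=8):
    """Approximate reconstruction; the rounding error is not recoverable."""
    return [[v * q for v in row] for row in block]

row = [1150, 39, -43, -10, 26, -83, 11, 41]   # first row of the DCT block above
q_row = quantize([row])[0]                    # matches the table's first row
```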

    Quantization

Why? -- To reduce the number of bits per sample

F^(u,v) = round( F(u,v) / q(u,v) )

Example: 101101 = 45 (6 bits).
Truncate to 4 bits: 1011 = 11. (Compare 11 x 4 = 44 against 45.)
Truncate to 3 bits: 101 = 5. (Compare 8 x 5 = 40 against 45.)
Note that the more bits we truncate, the more precision we lose.

Quantization error is the main source of the lossy compression.

Uniform Quantization: q(u,v) is a constant.

Non-uniform Quantization -- Quantization Tables
  The eye is most sensitive to low frequencies (upper left corner in the frequency matrix), less sensitive to high frequencies (lower right corner).
  Custom quantization tables can be put in the image/scan header.
  The JPEG Standard defines two default quantization tables, one each for luminance and chrominance.

Thresholding & Quantisation

Fact: the Human Visual System (HVS) does not distinguish fine details below a certain luminance level.
Fact: the HVS is less sensitive to high spatial frequency changes.

Therefore,
  replace values below a certain threshold (a function of frequency) by 0
  quantize the resultant values with an accuracy decreasing with increasing spatial frequencies

The result of Thresholding and Quantization


    The zig-zag scan


The resultant sequence:

(129) ; 1 ; 0 ; 0 ; 1 ; 1 ; 1 ; 1 ; 0 ; 0 ; -1 ; -1 ; 0 ; -1 ; 2 ; 1 ; -1 ; 0 ; 0 ; 0 ; 1 ; -1 ; 0 ; 1 ; 1 ; 0 ; 0 ; 0 ; 0 ; 0 ; -1 ; 0 ; 0 ; 0 ; 0 ; 0 ; 0 ; 1 ; 0 ; 0 ; 0 ; 0 ; 0 ; 0 ; 0 ; 0 ; 0 ; 0 ; 0 ; 0 ; 0 ; 0 ; 0 ; 0 ; 0 ; 1 ; 0 ; 0 ; 0 ; 0 ; 0 ; 0 ; 0
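The scan order itself can be generated by walking the anti-diagonals of the block, alternating direction on each one (a sketch; it reproduces the standard start (0,0), (0,1), (1,0), (2,0), ...):

```python
def zigzag_order(n=8):
    """Index sequence for the zig-zag scan of an n x n block: traverse
    anti-diagonals, alternating direction, so low-frequency coefficients
    come first and trailing zeros bunch together for run-length coding."""
    order = []
    for s in range(2 * n - 1):                  # s = u + v selects one anti-diagonal
        diag = [(u, s - u) for u in range(n) if 0 <= s - u < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

scan = zigzag_order(8)   # 64 (row, column) pairs, starting at (0, 0)
```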

Huffman Coding

Huffman coding is the most popular technique for removing coding redundancy.
  Unique prefix property
  Instantaneous decoding property
  Optimality
  JPEG (fixed, not optimal)


Huffman encoding example
  Pixel frequencies on left
    Pixel value -1 occurs 15 times
    Pixel value 14 occurs 1 time
  Build Huffman tree from bottom up
    Create one leaf node for each pixel value and assign its frequency as the node's value
    Create an internal node by joining any two nodes whose sum is a minimal value; this sum is the internal node's value
    Repeat until a complete binary tree is formed
  Traverse tree from root to leaf to obtain the binary code for the leaf's pixel value
    Append 0 for left traversal, 1 for right traversal
  Huffman encoding is reversible
    No code is a prefix of another code
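The bottom-up construction described above can be sketched with a heap. The frequencies are a hypothetical subset of the slide's example, and since tie-breaking affects the exact bit patterns, only the code lengths and the prefix property are meaningful:

```python
import heapq

def huffman_codes(freqs):
    """Bottom-up Huffman tree: repeatedly join the two nodes whose
    frequency sum is minimal; prepend 0 for left, 1 for right."""
    # each heap entry: (frequency, tiebreak, {symbol: code-so-far})
    heap = [(f, i, {s: ''}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # the two minimal-sum nodes
        f2, _, right = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in left.items()}         # left -> 0
        merged.update({s: '1' + c for s, c in right.items()})  # right -> 1
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

codes = huffman_codes({-1: 15, 0: 8, -2: 6, 2: 5, 144: 1})
```

Frequent symbols get short codes (here -1 gets a 1-bit code), and no code is a prefix of another, so decoding is unambiguous.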

Pixel frequencies (from the quantized block): -1: 15x, 0: 8x, -2: 6x, 2: 5x, 3: 5x, 5: 5x, -3: 4x, -5: 3x, -10: 2x, 144: 1x, -9: 1x, -8: 1x, -4: 1x, 6: 1x, 14: 1x

[Figure: the Huffman tree built from these frequencies]

Resulting Huffman codes: -1 -> 00, 0 -> 100, -2 -> 110, 2 -> 1110, 3 -> 1010, 5 -> 0110, -3 -> 11110, -5 -> 10110, -10 -> 01110, -8 -> 10111, ...