Computer Vision and Graphics (ee2031)
Digital Image Processing I
Dr John Collomosse
Centre for Vision, Speech and Signal Processing, University of Surrey
Learning Outcomes
After attending this lecture, and doing the reading and labwork, you should be able to:
• Describe the basic framework for performing linear filtering on a digital image (convolution)
• Implement image blurring and sharpening operations.
• Compare and contrast several low-pass filters and describe their operation in the context of image processing.
• Define the Fourier transform in both continuous and discrete terms for the 1D and 2D cases.
• Describe the convolution theorem, its links to the Fourier transform, and its implications for digital image processing.
Credit: Some images in these slides are from Noah Snavely (Cornell), David Lowe (British Columbia), Steve Seitz (Washington) and various Creative Commons sources.
Further reading:
What is an Image?
An image is a rectangular grid (raster) of picture elements (= pixels)
= 255 255 255 255 255 255 255 255 255 255 255 255
255 255 255 255 255 255 255 255 255 255 255 255
255 255 255 20 0 255 255 255 255 255 255 255
255 255 255 75 75 75 255 255 255 255 255 255
255 255 75 95 95 75 255 255 255 255 255 255
255 255 96 127 145 175 255 255 255 255 255 255
255 255 127 145 175 175 175 255 255 255 255 255
255 255 127 145 200 200 175 175 95 255 255 255
255 255 127 145 200 200 175 175 95 47 255 255
255 255 127 145 145 175 127 127 95 47 255 255
255 255 74 127 127 127 95 95 95 47 255 255
255 255 255 74 74 74 74 74 74 255 255 255
255 255 255 255 255 255 255 255 255 255 255 255
255 255 255 255 255 255 255 255 255 255 255 255
Typically 1 byte (8 bits) per pixel. 0=black, 255=white.
Greyscale (Y)
Colour Images
An image is a rectangular grid (raster) of picture elements (= pixels).
Colour images use 3 rasters (Red, Green, Blue).
Y= 0.30 (R) + 0.59 (G) + 0.11 (B)
For this introduction we process a colour image simply by processing R, G and B rasters independently, as you would a greyscale image.
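As a minimal sketch of the weighting above, the following combines three rasters into one greyscale raster. The function name and the tiny 1x3 test image are hypothetical, chosen only for illustration; rasters are assumed to be lists of rows holding values 0-255.

```python
def to_greyscale(r, g, b):
    """Combine R, G, B rasters (lists of rows, values 0-255) into one raster Y."""
    return [
        [round(0.30 * rv + 0.59 * gv + 0.11 * bv) for rv, gv, bv in zip(rr, gr, br)]
        for rr, gr, br in zip(r, g, b)
    ]

# Hypothetical 1x3 image: a pure red pixel, a pure green pixel, a white pixel.
red = [[255, 0, 255]]
green = [[0, 255, 255]]
blue = [[0, 0, 255]]

grey = to_greyscale(red, green, blue)
```

Green contributes most to Y and blue least, so the pure green pixel comes out brighter than the pure red one, and white stays at 255.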
Image as a function
We can think of a greyscale image as a function ℝ² → ℝ
=255 255 255 255 255 255 255 255 255 255 255 255
255 255 255 255 255 255 255 255 255 255 255 255
255 255 255 20 0 255 255 255 255 255 255 255
255 255 255 75 75 75 255 255 255 255 255 255
255 255 75 95 95 75 255 255 255 255 255 255
255 255 96 127 145 175 255 255 255 255 255 255
255 255 127 145 175 175 175 255 255 255 255 255
255 255 127 145 200 200 175 175 95 255 255 255
255 255 127 145 200 200 175 175 95 47 255 255
255 255 127 145 145 175 127 127 95 47 255 255
255 255 74 127 127 127 95 95 95 47 255 255
255 255 255 74 74 74 74 74 74 255 255 255
255 255 255 255 255 255 255 255 255 255 255 255
255 255 255 255 255 255 255 255 255 255 255 255
f(x,y) = the intensity of pixel (x,y)
A digital image f(x,y) has compact support: it is defined only over a finite rectangular region.
Image as a function
When we “do image processing” we can think of transforming the function f(x,y) to form a new function g(x,y).
These lectures will focus on a class of transforms called linear transforms, because they:
1) are useful (e.g. noise reduction, finding edges, sharpening detail)
2) can be performed efficiently via convolution
Example transforms:
g(x,y) = f(x,y) + 20
g(x,y) = f(-x,y)
Noise reduction
Suppose we take a photo of a stationary scene.
f(x,y) = I(x,y) + N(0,σ)
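If the noise is zero-mean and independent from photo to photo, then we can take many photos and average them to obtain a less noisy result: averaging n photos reduces the noise standard deviation by a factor of √n.
Figure: results of averaging 1, 10, 100 and 1000 photos.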
image = signal + noise
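The effect of averaging can be sketched numerically for a single pixel. This is an illustration under stated assumptions, not the lecture's own demo: the true intensity of 100, the trial count and the fixed random seed are all hypothetical choices; σ=10 follows the noise model used in these slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_INTENSITY = 100.0  # hypothetical noise-free pixel value I(x,y)
SIGMA = 10.0            # noise standard deviation in N(0, sigma)

def averaged_pixel(n_photos):
    """Average n_photos noisy observations f = I + N(0, sigma) of one pixel."""
    samples = [TRUE_INTENSITY + random.gauss(0.0, SIGMA) for _ in range(n_photos)]
    return sum(samples) / n_photos

def spread_of_average(n_photos, trials=500):
    """Empirical standard deviation of the averaged pixel over repeated trials."""
    results = [averaged_pixel(n_photos) for _ in range(trials)]
    return statistics.pstdev(results)

# Spread of the averaged value for 1, 10 and 100 photos.
spreads = {n: spread_of_average(n) for n in (1, 10, 100)}
```

The spread should fall roughly as σ/√n, so going from 1 photo to 100 photos cuts the noise by about a factor of 10.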
Gaussian
In that example we used a Gaussian distribution to model noise
N(µ,σ)
A Gaussian distribution N(µ,σ) is a normal distribution with any mean µ (here µ=0) and standard deviation σ (here σ=10).
N(0,1) is the standard normal distribution.
Noise reduction
What if we only have one photo (i.e. typical image processing)?
A pixel is very similar to its neighbours (spatial coherence)
Average groups of neighbouring pixels together.
The window centres on each pixel and computes the mean value of the pixels beneath it. The result is written to a new image.
Input f(x,y):
0 4 0 0 0 0
0 0 0 0 0 0
0 0 0 3 0 0
0 0 0 0 0 0
0 2 0 0 0 0
0 0 0 0 0 0
Output g(x,y):
0 1 0 0
0 0 0 0
0 1 0 0
0 0 0 0
Another way of saying the same...
Consider a window containing a set of values.
For each pixel in the input:
1. Window values are multiplied with image beneath
2. The sum of these products is written to output image.
e.g. 1/9 x [(0 x 1) + (4 x 1) + (0 x 1) + ...] = 4/9
0 4 0 0 0 0
0 0 0 0 0 0
0 0 0 3 0 0
0 0 0 0 0 0
0 2 0 0 0 0
0 0 0 0 0 0
1 1 0 0
0 0 0 0
0 1 0 0
0 0 0 0
Window: 1/9 x
1 1 1
1 1 1
1 1 1
Convolution
Input f(x,y) Output g(x,y)
Terminology
Image f(x,y) was transformed into g(x,y) via convolution.
Each pixel was “replaced” by a linear combination of its neighbours. This is called linear filtering.
The weightings for each pixel were defined by the window
Input f(x,y) *
1 1 1
1 1 1
1 1 1
= Output g(x,y)
(* is the convolution operator, not multiplication!)
“Window” = “Template” = “Kernel” = “Filter” = “Mask” = ...
i.e. the prescription for a linear filter is the values in the window
Closer look at the box filter
“Box filter” / “Box blur” / “Mean filter”:
1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9
Any filter is itself a signal h(x,y)
Can be padded with zeros to match image size
Example of Box Filter
Blocky / square artifacts
More on this later...
Closer look at convolution process
Convolution is expressed using the * operator.
1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9
Window side length is 2k+1, i.e. k=1 here.
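The windowing process described in these slides can be sketched in a few lines. The function name apply_window is hypothetical. It computes g(x,y) as the sum over i, j in [-k, k] of h(i,j) f(x+i, y+j), i.e. the window is laid over the image without flipping; the strict definition of convolution uses f(x-i, y-j), which gives the same result for symmetric windows such as the box filter. As a further assumption, only positions where the window fits fully inside the image are computed.

```python
def apply_window(image, window):
    """Slide `window` over `image`; each output pixel is the sum of products
    of the window values with the image values beneath them.

    Only positions where the window fits entirely inside the image are
    computed, so the output is smaller than the input by 2k in each direction.
    """
    side = len(window)  # window side length, 2k + 1
    rows, cols = len(image), len(image[0])
    out = []
    for y in range(rows - side + 1):
        out_row = []
        for x in range(cols - side + 1):
            total = 0.0
            for j in range(side):
                for i in range(side):
                    total += window[j][i] * image[y + j][x + i]
            out_row.append(total)
        out.append(out_row)
    return out

# The 6x6 input used earlier in these slides.
image = [
    [0, 4, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 3, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 2, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
box = [[1 / 9] * 3 for _ in range(3)]  # 3x3 box filter, k = 1

blurred = apply_window(image, box)  # 4x4 result
```

The top-left output value is 4/9, matching the worked example earlier in the slides. The four nested loops are also where the O(nm) cost comes from: every output pixel touches every window value.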
Other choices of filter
We can put different values in the window to create different effects (i.e. produce different linear filters):
Original *
0 0 0
0 1 0
0 0 0
= Identical image
Other choices of filter:
Original *
0 0 0
0 0 1
0 0 0
= Shifted left by 1 pixel
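The identity and shift windows can be checked on a tiny image. This sketch assumes the window is laid directly over the image without flipping, as in the description earlier in these slides; under the strict convolution definition the window is flipped first, which reverses the direction of the shift. The helper name apply_window and the ramp image are hypothetical.

```python
def apply_window(image, window):
    """Sum of products of a (2k+1)-sided window with the image beneath it
    (window not flipped), computed wherever the window fits fully."""
    side = len(window)
    return [
        [
            sum(window[j][i] * image[y + j][x + i]
                for j in range(side) for i in range(side))
            for x in range(len(image[0]) - side + 1)
        ]
        for y in range(len(image) - side + 1)
    ]

image = [[1, 2, 3, 4, 5]] * 3  # every row is the ramp 1..5

identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
shift = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]

# Output pixels correspond to input columns 1..3 (where the window fits).
same = apply_window(image, identity)
shifted = apply_window(image, shift)
```

With the identity window the output reproduces columns 1..3 of the input. With the shift window each output pixel takes the value of its right-hand neighbour, so the image content moves one pixel to the left.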
Other choices of filter:
We can put different values in the window to create different effects (i.e. produce different linear filters):
Sharpening filter (accentuates edges):
0 0 0
0 2 0
0 0 0
− 1/9 x
1 1 1
1 1 1
1 1 1
Original * sharpening filter = sharpened image
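A sharpening window of this kind can be built as twice the identity (impulse) window minus a box filter. The sketch below assumes the box filter is normalised by 1/9 so that the combined weights sum to 1; the two 3x3 patches are hypothetical test inputs.

```python
impulse = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
box = [[1 / 9] * 3 for _ in range(3)]  # assumed 1/9 normalisation

# 2 * impulse - box: keep twice the original and subtract the local mean.
sharpen = [[2 * impulse[j][i] - box[j][i] for i in range(3)] for j in range(3)]

def respond(patch, window):
    """Response of a 3x3 window laid over a 3x3 image patch."""
    return sum(window[j][i] * patch[j][i] for j in range(3) for i in range(3))

flat_patch = [[100] * 3 for _ in range(3)]  # uniform region
edge_patch = [[100, 100, 200]] * 3          # centre pixel on the dark side of an edge

flat_response = respond(flat_patch, sharpen)
edge_response = respond(edge_patch, sharpen)
weight_sum = sum(sum(row) for row in sharpen)
```

Because the weights sum to 1, flat regions pass through unchanged. Next to an edge the dark pixel is pushed darker than its original value of 100, which is what accentuates the edge.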
Example of Image Sharpening
More on blurring
We get better results blurring with a “Gaussian” filter vs. the “box” filter.
3x3 box filter:
0.11 0.11 0.11
0.11 0.11 0.11
0.11 0.11 0.11
3x3 Gaussian:
0.06 0.13 0.06
0.13 0.24 0.13
0.06 0.13 0.06
Figure: original image, box-blurred result and Gaussian-blurred result.
2D Gaussian
0.06 0.13 0.06
0.13 0.24 0.13
0.06 0.13 0.06
In this case x and y are measured as offsets from the centre of the template. The standard deviation is fixed. The window truncates the Gaussian function beyond a certain distance.
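A Gaussian window can be generated by sampling the 2D Gaussian, G(x,y) = 1/(2πσ²) · exp(-(x² + y²) / (2σ²)), at integer offsets from the centre and then renormalising so the truncated weights sum to 1 (the constant factor cancels). This is a minimal sketch: the function name and the value σ = 0.85 are assumptions for illustration, and the output is not claimed to reproduce the exact values shown on the slide.

```python
import math

def gaussian_kernel(k, sigma):
    """(2k+1) x (2k+1) window sampled from a 2D Gaussian, normalised to sum to 1."""
    weights = [
        [math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) for x in range(-k, k + 1)]
        for y in range(-k, k + 1)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

kernel = gaussian_kernel(k=1, sigma=0.85)  # sigma chosen for illustration only
```

The centre weight is the largest, the four edge-adjacent weights are equal, and the four corner weights are the smallest, which is the same pattern as the 3x3 Gaussian shown above.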
Convolution - Topics
Convolution is a versatile filtering mechanism, but:
1) As described, it is slow, O(nm) for an image of n pixels and a window of m values, and will take ages to process modern digital images, e.g. multi-megapixel photos.
2) We don’t yet understand why particular sets of values in the filters have the result they do.
To answer both we need to understand Fourier’s theorem.
Fourier’s Theorem
“Any periodic signal can be synthesised by summing (possibly infinitely) many sine and cosine waves of various amplitudes and frequencies”
(or equivalently: many cosine waves with various phases, amplitudes and frequencies)
Fourier Synthesis Example
Adding sine waves of increasing frequency (with decreasing amplitude) to make a square wave:
y = sin(t);
y = sin(t) + sin(3*t)/3;
y = sin(t) + sin(3*t)/3 + sin(5*t)/5 + sin(7*t)/7 + sin(9*t)/9;
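The same synthesis can be sketched in Python to see the partial sums settle towards a flat plateau. The function name and the choice of sample point are hypothetical; the series is the one written out above, extended to more terms.

```python
import math

def square_wave_partial_sum(t, n_terms):
    """Sum of the first n_terms odd harmonics: sin(t) + sin(3t)/3 + sin(5t)/5 + ..."""
    return sum(math.sin((2 * k + 1) * t) / (2 * k + 1) for k in range(n_terms))

t = math.pi / 2  # middle of the positive half-cycle
approximations = [square_wave_partial_sum(t, n) for n in (1, 2, 5, 50, 500)]
plateau = math.pi / 4  # level the series converges to for 0 < t < pi
```

Each extra harmonic pulls the value at this point closer to π/4, the height of the square wave that the infinite sum produces.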
Fourier Transform (Terminology)
The Fourier Transform (FT) is a piece of mathematics that decomposes a real signal into its individual frequency components.
Spatial domain → FT (Analysis) → Frequency domain
Frequency domain → IFT (Synthesis) → Spatial domain
You can convert a signal in the spatial domain to the frequency domain, and back again, with no loss of information.
Fourier Transform (Continuous)
In the 1D Fourier transform and the 1D inverse Fourier transform:
f(x) is the signal.
F(u) is the “response” at frequency ‘u’.
The response is a complex number, comprising a magnitude (r) and a phase (φ).
Normalisation required because u is angular frequency (u=2πv)
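For reference, one common way of writing the continuous 1D transform pair with u as angular frequency is given below. This is an assumed convention, consistent with the normalisation note above; the slides' own equations may place the 1/(2π) factor differently (for example splitting it as 1/√(2π) across both transforms).

```latex
F(u) = \int_{-\infty}^{\infty} f(x)\, e^{-iux}\, dx
\qquad
f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(u)\, e^{iux}\, du
\qquad
F(u) = r\, e^{i\varphi}
```

Here r = |F(u)| is the magnitude and φ is the phase of the response at frequency u.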
Discrete Fourier Transform
1D Discrete Fourier Transform: 1D Inverse DFT:
Because a digital signal has compact support, the DFT is used on digital signals. It is near-identical in form to the continuous FT.
The Fast Fourier Transform (FFT) is a fast way of computing the DFT; the classic radix-2 algorithm requires N to be a power of 2 (Cooley and Tukey, 1965).
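A direct (slow) 1D DFT and inverse DFT can be sketched as follows. The convention is an assumption: the forward transform is F(u) = Σ f(x) e^(-2πiux/N) and the 1/N factor is placed on the inverse, which is one common choice. The sample signal is a hypothetical row of pixel values.

```python
import cmath

def dft(signal):
    """1D DFT: F(u) = sum over x of f(x) * exp(-2*pi*i*u*x / N)."""
    n = len(signal)
    return [
        sum(signal[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
        for u in range(n)
    ]

def idft(spectrum):
    """1D inverse DFT: f(x) = (1/N) * sum over u of F(u) * exp(+2*pi*i*u*x / N)."""
    n = len(spectrum)
    return [
        sum(spectrum[u] * cmath.exp(2j * cmath.pi * u * x / n) for u in range(n)) / n
        for x in range(n)
    ]

signal = [255, 255, 20, 0, 75, 95, 127, 145]  # hypothetical row of pixels
spectrum = dft(signal)
recovered = idft(spectrum)

magnitude = [abs(c) for c in spectrum]       # r at each frequency
phase = [cmath.phase(c) for c in spectrum]   # phi at each frequency
```

Transforming and inverting returns the original samples up to floating-point rounding, and F(0) is simply the sum of the samples (the DC component). This direct form costs O(N²); the FFT computes the same values faster.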
2D DFT
The FT and DFT also work over 2D (or any-D) signals. The 2D case is very important, because images are 2D signals: recall f(x,y).
2D DFT:
2D IDFT:
Recall that converting to/from frequency domain is loss-less.
2D DFT / IDFT allows us to manipulate image in frequency domain.
Image → FT → Frequency domain → Manipulate frequencies → IFT → Result image
2D DFT – Implementation
The 2D DFT is a “separable” transform: it is computable by running a 1D DFT over each image row, and then running each column of the result through its own 1D DFT.
Separability makes the 2D DFT fast to compute.
If the image has side lengths that are powers of 2, the FFT can be used instead of the DFT to speed up even further (equivalently you can pad the image with a border of zeros until its sides are powers of 2).
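Separability can be checked on a small image by comparing the rows-then-columns computation against the 2D double-sum definition. This is a sketch with hypothetical function names and a hypothetical 3x4 image, using the unnormalised forward DFT F(u,v) = Σ Σ f(x,y) e^(-2πi(ux/W + vy/H)) as an assumed convention.

```python
import cmath

def dft(signal):
    """Direct 1D DFT of a sequence."""
    n = len(signal)
    return [
        sum(signal[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
        for u in range(n)
    ]

def dft2_separable(image):
    """2D DFT computed as 1D DFTs over rows, then 1D DFTs over columns."""
    rows_done = [dft(row) for row in image]
    cols_done = [dft(list(col)) for col in zip(*rows_done)]  # one DFT per column
    return [list(row) for row in zip(*cols_done)]            # back to [v][u] layout

def dft2_direct(image):
    """2D DFT from the double-sum definition, for comparison."""
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[y][x] * cmath.exp(-2j * cmath.pi * (u * x / w + v * y / h))
                for y in range(h) for x in range(w))
            for u in range(w)
        ]
        for v in range(h)
    ]

image = [[255, 20, 0, 255], [75, 75, 75, 255], [95, 95, 75, 255]]
fast = dft2_separable(image)
slow = dft2_direct(image)
max_difference = max(
    abs(a - b) for row_a, row_b in zip(fast, slow) for a, b in zip(row_a, row_b)
)
```

Both routes give the same spectrum up to rounding, but the separable route reuses a 1D transform, so any fast 1D method (such as the FFT) speeds up the 2D case directly.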
2D DFT
What does an image “look” like in the frequency domain?
Visualising |F(u,v)| (i.e. amplitude of frequencies)
Figure: f(x,y) and F(u,v), with the origin (i.e. DC component), lower frequencies and higher frequencies labelled.
2D DFT: Simpler examples
What will |F(u,v)| look like?
Figure: f(x,y) and F(u,v).
2D DFT: Simpler examples
Although the result is predominantly what you would expect, additional high frequencies are introduced.
This is because the signal isn’t periodic (most images aren’t).
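A 1D analogue illustrates the effect. A sine wave that completes a whole number of cycles within the signal produces exactly two non-zero frequency bins, while one that stops part-way through a cycle spreads energy over many bins. The signal length, frequencies and threshold below are hypothetical values chosen for the sketch.

```python
import cmath
import math

N = 32  # hypothetical signal length

def dft_magnitudes(signal):
    """Magnitude |F(u)| of the direct 1D DFT at every frequency bin."""
    n = len(signal)
    return [
        abs(sum(signal[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n)))
        for u in range(n)
    ]

periodic = [math.sin(2 * math.pi * 4 * x / N) for x in range(N)]        # exactly 4 cycles
non_periodic = [math.sin(2 * math.pi * 4.5 * x / N) for x in range(N)]  # 4.5 cycles

def significant_bins(signal, threshold=0.5):
    """Count frequency bins whose magnitude exceeds the threshold."""
    return sum(1 for m in dft_magnitudes(signal) if m > threshold)

clean = significant_bins(periodic)      # bins 4 and N-4 only
leaky = significant_bins(non_periodic)  # energy spread across neighbouring bins
```

The DFT implicitly treats the signal as repeating, so when the two ends do not join up smoothly the jump at the boundary shows up as extra frequency content.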
2D DFT
Image processing by manipulating the frequency domain F(u,v):
“Ideal” low-pass filter and “ideal” high-pass filter
Recall: Convolution
Convolution is expressed using the * operator.
1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9
Window side length is 2k+1, i.e. k=1 here.
Slow: O(nm)
Convolution Theorem Convolution can be performed faster by converting both the image and filter into the frequency domain (2D DFT), multiplying them together, and converting the result back (2D IDFT).
f(x,y) image → FT
h(x,y) filter → FT
Multiply the two results, then apply the IFT.
F[.] indicates the FT of a function.
By considering convolution in this way, we can also understand why filters behave the way they do.
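The theorem can be checked numerically in 1D: convolving directly and multiplying in the frequency domain give the same answer. This sketch makes two assumptions worth stating. First, the filter is zero-padded to the signal length and stored so that its centre sits at index 0. Second, the direct convolution uses wrap-around indexing, because multiplying DFTs corresponds to circular convolution. All names and values are hypothetical.

```python
import cmath

def dft(signal):
    """Direct 1D DFT."""
    n = len(signal)
    return [
        sum(signal[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
        for u in range(n)
    ]

def idft(spectrum):
    """Direct 1D inverse DFT (1/N factor on the inverse)."""
    n = len(spectrum)
    return [
        sum(spectrum[u] * cmath.exp(2j * cmath.pi * u * x / n) for u in range(n)) / n
        for x in range(n)
    ]

def circular_convolve(f, h):
    """Direct convolution with wrap-around: g(x) = sum over i of h(i) * f(x - i)."""
    n = len(f)
    return [sum(h[i] * f[(x - i) % n] for i in range(n)) for x in range(n)]

f = [255, 255, 20, 0, 75, 95, 127, 145]   # one row of pixel values
h = [1 / 3, 1 / 3, 0, 0, 0, 0, 0, 1 / 3]  # 3-tap box filter, padded, centred on index 0

direct = circular_convolve(f, h)
product = [a * b for a, b in zip(dft(f), dft(h))]  # pointwise multiply in frequency domain
via_fourier = [c.real for c in idft(product)]
```

With a fast transform the frequency-domain route costs far less than the direct sum for large windows. Note that wrap-around at the borders is a side effect of this route; padding the image with zeros first is a common way to avoid it.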
Recall: Why is Gaussian better?
We get better results blurring with a “Gaussian” filter vs. the “box” filter.
3x3 box filter:
0.11 0.11 0.11
0.11 0.11 0.11
0.11 0.11 0.11
3x3 Gaussian:
0.06 0.13 0.06
0.13 0.24 0.13
0.06 0.13 0.06
Figure: original image, box-blurred result and Gaussian-blurred result.
Fourier Analysis of Common Filters
Visualisations of 1D box and Gaussian filters and their FTs.
Visualisations of 2D box and Gaussian filters and their FTs:
Box → FT → Sinc
Gaussian → FT → Gaussian
Observations
The “ideal” low-pass filter is “sinc”: sinc(x) = sin(x)/x
The FT of a box is a sinc, scaled according to the size of the box.
The FT of a Gaussian of standard deviation σ is a Gaussian of standard deviation 1/σ.
The opposite holds too (i.e. the FT of a sinc is a box, etc.).
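These observations can be seen numerically in 1D by comparing the spectrum of a box filter with that of a Gaussian. The sketch below assumes a signal length of 64, a 9-tap box and σ = 2; all of these are hypothetical choices. Both filters are symmetric about index 0, so their spectra are real and only the real part is kept.

```python
import cmath
import math

N = 64  # hypothetical signal length

def wrapped_filter(weights_by_offset):
    """Zero-pad a filter to length N, storing offset i at index i mod N
    so that the filter stays centred on index 0."""
    h = [0.0] * N
    for offset, weight in weights_by_offset.items():
        h[offset % N] = weight
    return h

def spectrum_real(h):
    """Real part of the DFT for frequencies 0..N/2 (imaginary part is ~0
    for a filter that is symmetric about index 0)."""
    return [
        sum(h[x] * cmath.exp(-2j * cmath.pi * u * x / N) for x in range(N)).real
        for u in range(N // 2 + 1)
    ]

box = wrapped_filter({i: 1 / 9 for i in range(-4, 5)})  # 9-tap box

sigma = 2.0
raw = {i: math.exp(-i * i / (2 * sigma * sigma)) for i in range(-N // 2 + 1, N // 2)}
total = sum(raw.values())
gaussian = wrapped_filter({i: w / total for i, w in raw.items()})

box_spectrum = spectrum_real(box)
gaussian_spectrum = spectrum_real(gaussian)
```

The box spectrum oscillates and dips below zero (the side lobes of a sinc-like curve), so some higher frequencies leak through with the wrong sign. The Gaussian spectrum stays non-negative and falls away smoothly.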
Recall: Box vs. Gaussian blur
Can you explain the artifacts in the box filtered image?
Figure: original, box-filtered and Gaussian-filtered images, with the box and Gaussian filters.
Question
How do we produce this “ideal” low-pass filtering scenario:
f(x,y) image → FT
h(x,y) filter = ? → FT
Multiply the two results, then apply the IFT.
Answer: Use a 2D sinc filter (“ideal low-pass filter”). But sinc has infinite extent and thus cannot be represented in digital images, because they have compact support.
Ideal low-pass filter
Sinc is an oscillating function of infinite extent and is unsuitable for digital images, because they have compact support.
Truncating the sinc in the spatial domain creates artifacts in the frequency domain, and thus ringing artifacts in the image.
Gaussian low-pass filter
A Gaussian does not have this problem. Although it has infinite extent, it does not oscillate and is “well behaved”
(the FT of a Gaussian of standard deviation σ is a Gaussian of standard deviation 1/σ).
So, 1/σ determines the bandwidth of the frequencies passed.
Back to sharpening:
Sharpening filter (accentuates edges):
0 0 0
0 2 0
0 0 0
− 1/9 x
1 1 1
1 1 1
1 1 1
Original * sharpening filter = sharpened image
Figure: unfiltered and filtered images.
Back to sharpening
What does the blurring take out?
original − smoothed (5x5) = detail
Boost detail and add it back:
original + α · detail = sharpened
Source: S. Lazebnik
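The two steps above can be sketched in 1D. This is a simplification under stated assumptions: the slides smooth a 2D image with a 5x5 window, whereas this sketch uses a 3-tap mean filter on a single row and copies the border samples through unchanged. The function names and the step-edge signal are hypothetical.

```python
def blur3(signal):
    """3-tap mean filter; the two border samples are copied through unchanged."""
    out = list(signal)
    for x in range(1, len(signal) - 1):
        out[x] = (signal[x - 1] + signal[x] + signal[x + 1]) / 3
    return out

def unsharp(signal, alpha):
    """sharpened = original + alpha * (original - smoothed)."""
    smoothed = blur3(signal)
    return [f + alpha * (f - s) for f, s in zip(signal, smoothed)]

edge = [100, 100, 100, 200, 200, 200]  # hypothetical step edge

detail = [f - s for f, s in zip(edge, blur3(edge))]  # what the blurring took out
sharpened = unsharp(edge, alpha=1.0)
```

The detail signal is non-zero only next to the edge, so adding it back darkens the dark side and brightens the bright side while leaving flat regions alone. With α = 1 the result is 2 · original − smoothed, which is the same operation as the "2 × impulse minus box" sharpening filter.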
Sharpening
Figure labels: Gaussian, scaled impulse, Laplacian of Gaussian; image, blurred image.
0 0 0
0 2 0
0 0 0
− 1/9 x
1 1 1
1 1 1
1 1 1
Fourier analysis - LoG
Laplacian of Gaussian (LoG)
Similar to the Gaussian, the FT of a LoG is a LoG. What will this do to the high frequencies?
Example of Image Sharpening
Summary
After attending this lecture, and doing the reading and labwork, you should be able to:
• Describe the basic framework for performing linear filtering on a digital image (convolution)
• Implement image blurring and sharpening operations.
• Compare and contrast several low-pass filters and describe their operation in the context of image processing.
• Define the Fourier transform in both continuous and discrete terms for the 1D and 2D cases.
• Describe the convolution theorem, its links to the Fourier transform, and its implications for digital image processing.