CS 4487/9587 Algorithms for Image Analysis
Basic Image Segmentation

Acknowledgements: Alexei Efros, Steven Seitz

Page 1: CS  4487/9587  Algorithms for Image Analysis


CS 4487/9587

Algorithms for Image Analysis

Basic Image Segmentation

Page 2: CS  4487/9587  Algorithms for Image Analysis

CS 4487/9587 Algorithms for Image Analysis: Basic Image Segmentation

Segmentation examples
• unsupervised: background subtraction, recognition
• supervised: Photoshop, medical image analysis

Segmentation features and “naïve” methods [Szeliski, Sec. 5.2]
• intensities, colors ← thresholding, likelihood ratio test
• contrast edges ← region growing, watersheds

Clustering techniques and segmentation [Szeliski, Sec. 5.3]
• parametric methods: K-means, GMM
• non-parametric: mean-shift
• RGB and RGB+XY spaces

Other readings: Sonka et al., Ch. 5; Gonzalez and Woods, Ch. 10

Page 3: CS  4487/9587  Algorithms for Image Analysis

Segmentation

Goal: find coherent “blobs” or specific “objects”
• lower-level tasks (e.g. “superpixels”)
• higher-level tasks (e.g. cars, humans, or organs)
• a large grey area in-between
• accurate boundary delineation is often required

Page 4: CS  4487/9587  Algorithms for Image Analysis

Coherent “blobs”

The simplest way to define blob coherence is as similarity in brightness or color:
the tools become blobs; the house, grass, and sky make different blobs.

Page 5: CS  4487/9587  Algorithms for Image Analysis

Why is this useful?

AIBO RoboSoccer (VelosoLab)

Page 6: CS  4487/9587  Algorithms for Image Analysis

Ideal Segmentation

Can recognize objects with known simple color models.

Page 7: CS  4487/9587  Algorithms for Image Analysis

Result of a segmentation method (first learn how to get this, then how to get better results), even with known simple color models.

Page 8: CS  4487/9587  Algorithms for Image Analysis

Basic ideas

basic features → basic (naïve) methods
• intensities, colors ← thresholding, likelihood ratio test
• contrast edges ← region growing, watersheds

Page 9: CS  4487/9587  Algorithms for Image Analysis

Basic ideas

intensities, colors ← thresholding, likelihood ratio test

Page 10: CS  4487/9587  Algorithms for Image Analysis

Thresholding (segmentation ← intensities/colors)

Basic segmentation operation:
mask(x,y) = 1 if im(x,y) > T
mask(x,y) = 0 otherwise

T is a threshold
• user-defined
• or automatic

Same as histogram partitioning:
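A minimal sketch of this operation, assuming NumPy and a grayscale image array (the names and values are illustrative, not from the slides):

```python
import numpy as np

def threshold(im: np.ndarray, T: float) -> np.ndarray:
    """Binary mask: 1 where im(x,y) > T, 0 otherwise."""
    return (im > T).astype(np.uint8)

# usage on a synthetic 8-bit image
im = np.random.randint(0, 256, size=(64, 64))
mask = threshold(im, T=128)
```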

Page 11: CS  4487/9587  Algorithms for Image Analysis

Sometimes works well…

Virtual colonoscopy, bronchoscopy, etc.: from a real device to a non-invasive virtual test
a) threshold the CT volume → binary mask
b) extract a surface mesh from the binary mask (marching cubes method)

Page 12: CS  4487/9587  Algorithms for Image Analysis

Sometimes works well…

Thresholding can be derived as a statistical decision: the likelihood ratio test

$$r_p := \log \frac{P_1(I_p)}{P_0(I_p)} \qquad \begin{cases} r_p > 0 & \Rightarrow \text{pixel } p \text{ is object} \\ r_p < 0 & \Rightarrow \text{pixel } p \text{ is background} \end{cases}$$

where $P_1$ and $P_0$ are known color models for object and background.

Page 13: CS  4487/9587  Algorithms for Image Analysis

Sometimes works well…

Thresholding can be derived as a statistical decision: the likelihood ratio test

$$r_p := \log \frac{P_1(I_p)}{P_0(I_p)} \qquad \begin{cases} r_p > 0 & \Rightarrow \text{pixel } p \text{ is object} \\ r_p < 0 & \Rightarrow \text{pixel } p \text{ is background} \end{cases}$$

where $P_1$ and $P_0$ are known color models for object and background.

Example: assume known probability distributions $P_1 = N(\mu_1, \sigma)$ and $P_0 = N(\mu_0, \sigma)$. Then

$$r_p > 0 \iff I_p > T \qquad \text{for} \qquad T = \frac{\mu_0 + \mu_1}{2},$$

i.e. the test reduces to thresholding at the midpoint between the two means.
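A sketch of this test for two equal-variance Gaussian color models, assuming NumPy (mu0, mu1, and sigma are illustrative parameters):

```python
import numpy as np

def lrt_mask(im: np.ndarray, mu1: float, mu0: float, sigma: float) -> np.ndarray:
    """Object mask: r_p = log P1(I_p) - log P0(I_p) > 0."""
    r = ((im - mu0) ** 2 - (im - mu1) ** 2) / (2 * sigma ** 2)
    return (r > 0).astype(np.uint8)

# with equal variances this reduces to thresholding at T = (mu0 + mu1) / 2
im = np.random.randint(0, 256, size=(64, 64)).astype(float)
assert np.array_equal(lrt_mask(im, mu1=200, mu0=50, sigma=10),
                      (im > 125).astype(np.uint8))
```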

Page 14: CS  4487/9587  Algorithms for Image Analysis

Sometimes works well… background subtraction

$$\Delta I = I_{obj} - I_{bkg}$$

(the current frame minus the background frame). Assume $P_1 = N(0, \sigma_1)$ for background differences and $P_2 = U[0, 255]$ (uniform) for the object; the likelihood ratio test then reduces to a threshold T on the difference.

Page 15: CS  4487/9587  Algorithms for Image Analysis

Sometimes works well… ? background subtraction

$$\Delta I = I_{obj} - I_{bkg}, \qquad P_1 = N(0, \sigma_1), \qquad P_2 = U[0, 255]$$

Thresholding intensities below T runs into problems when the color models have overlapping support.
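A minimal background-subtraction sketch under the model above, assuming NumPy (frame, background, and T are illustrative inputs):

```python
import numpy as np

def bg_subtract(frame: np.ndarray, background: np.ndarray, T: float) -> np.ndarray:
    """Object mask: 1 where the absolute frame difference exceeds T."""
    diff = np.abs(frame.astype(float) - background.astype(float))
    return (diff > T).astype(np.uint8)
```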

Page 16: CS  4487/9587  Algorithms for Image Analysis

…but more often not

Adaptive thresholding

Page 17: CS  4487/9587  Algorithms for Image Analysis

Basic ideas

contrast edges ← region growing, watersheds

Page 18: CS  4487/9587  Algorithms for Image Analysis

Region growing (segmentation ← contrast edges)

The method stops at contrast edges:
• Start with an initial set of pixels K (initial seed(s))
• Add to K the neighbors q of pixels p in K if |Ip − Iq| < T
• Repeat until nothing changes
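A minimal sketch of this procedure as a breadth-first search, assuming NumPy and 4-connected neighbors (the function name and inputs are illustrative):

```python
from collections import deque
import numpy as np

def region_grow(im: np.ndarray, seeds, T: float) -> np.ndarray:
    """Grow from seed pixels, adding neighbor q of p whenever |I_p - I_q| < T."""
    H, W = im.shape
    mask = np.zeros((H, W), dtype=np.uint8)
    queue = deque(seeds)
    for r, c in seeds:
        mask[r, c] = 1
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # 4-connected neighbors
            nr, nc = r + dr, c + dc
            if 0 <= nr < H and 0 <= nc < W and not mask[nr, nc] \
                    and abs(float(im[r, c]) - float(im[nr, nc])) < T:
                mask[nr, nc] = 1
                queue.append((nr, nc))
    return mask
```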

Page 19: CS  4487/9587  Algorithms for Image Analysis

Region growing

Page 20: CS  4487/9587  Algorithms for Image Analysis

What can go wrong with region growing?

Region growth may “leak” through a single weak spot in the boundary.

Page 21: CS  4487/9587  Algorithms for Image Analysis

Region growing

$$\Delta I = I_{obj} - I_{bkg}$$

Breadth-first search from the seeds: grow while $|\Delta I_p| < T$.

Page 22: CS  4487/9587  Algorithms for Image Analysis

Region growing

Note how the region leaks into the sky due to a weak boundary between them.

Page 23: CS  4487/9587  Algorithms for Image Analysis

Watersheds

1. compute gradient magnitudes of the image
2. find catchment basins
3. copy the result over the original image

Tricks are needed to build “dams” closing gaps.
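A hedged sketch of these steps using scikit-image, assuming that library is available (`im` is an illustrative placeholder image):

```python
import numpy as np
from skimage.filters import sobel
from skimage.segmentation import watershed

im = np.random.rand(64, 64)   # placeholder grayscale image
gradient = sobel(im)          # step 1: gradient magnitudes
labels = watershed(gradient)  # step 2: catchment basins of the gradient map
# step 3: overlay `labels` on the original image for display
```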

Page 24: CS  4487/9587  Algorithms for Image Analysis

Motivating example (due to Pedro Felzenszwalb and Daniel Huttenlocher)

This image has three perceptually distinct regions A, B, and C: the difference along the border between A and B is less than the differences within C.

Q: Where would image thresholding fail?
A: Region A would be divided into two sets, and region C would be split into a large number of arbitrarily small subsets.

Q: Where would region growing fail?
A: Either A and B are merged, or region C is split into many small subsets; also, B and C are merged.

Page 25: CS  4487/9587  Algorithms for Image Analysis

Towards better segmentation

Combine color and boundary edge information. How?

- Formulate a segmentation quality function (objective or energy)

  E(S) = C(S) + B(S) + …

- Optimize it: find the best solution S (soon)

Page 26: CS  4487/9587  Algorithms for Image Analysis

First, more complex decisions in color space

How to move away from manual decision boundaries (i.e. a user-set threshold) in color space? Some standard solutions:

1. use known probability appearance models, if available; this leads to likelihood ratio tests, as shown earlier, with decision boundary

$$\log \frac{P_1(I_p)}{P_0(I_p)} = 0$$

2. automatic clustering methods, which yield complex decision boundaries:
   a. K-means, GMM (parametric)
   b. mean shift, medoid-shift (non-parametric)
   c. kernel K-means, normalized cuts (non-parametric)

Page 27: CS  4487/9587  Algorithms for Image Analysis

Decision boundaries in feature spaces

Color quantization and superpixels should break colors (in RGB or LUV space) into multiple clusters.

Page 28: CS  4487/9587  Algorithms for Image Analysis

General Grouping or Clustering (a.k.a. unsupervised learning)

• Have data points (samples, also called feature vectors, examples, etc.) x1,…,xn
• Cluster similar points into groups
  • points are not pre-labeled
  • think of clustering as ‘discovering’ labels

[Figure: movies clustered into horror movies, documentaries, and sci-fi movies]

(slides from Olga Veksler)

Page 29: CS  4487/9587  Algorithms for Image Analysis

How does this relate to Image Segmentation?

• Represent image pixels as feature vectors x1,…,xn
• For example, each pixel can be represented as
  • intensity, which gives one-dimensional feature vectors
  • color, which gives three-dimensional feature vectors
  • color + coordinates, which gives five-dimensional feature vectors
• Cluster them into k clusters, i.e. k segments

Example: a 3×3 input image gives the feature vectors for clustering based on color

[9 4 2] [7 3 1] [8 6 8]
[8 2 4] [5 8 5] [3 7 2]
[9 4 5] [2 9 3] [1 4 4]

i.e. RGB (or LUV) space clustering.

Page 30: CS  4487/9587  Algorithms for Image Analysis

How does this relate to Image Segmentation?

• Represent image pixels as feature vectors x1,…,xn
• For example, each pixel can be represented as
  • intensity, which gives one-dimensional feature vectors
  • color, which gives three-dimensional feature vectors
  • color + coordinates, which gives five-dimensional feature vectors
• Cluster them into k clusters, i.e. k segments

Example: the same 3×3 input image gives the feature vectors for clustering based on color and image coordinates

[9 4 2 0 0] [7 3 1 0 1] [8 6 8 0 2]
[8 2 4 1 0] [5 8 5 1 1] [3 7 2 1 2]
[9 4 5 2 0] [2 9 3 2 1] [1 4 4 2 2]

i.e. RGBXY (or LUVXY) space clustering.
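A sketch of building these RGB and RGBXY feature vectors, assuming NumPy (the variable names are illustrative):

```python
import numpy as np

im = np.random.randint(0, 256, size=(3, 3, 3))  # placeholder 3x3 RGB image
H, W, _ = im.shape
rgb = im.reshape(-1, 3).astype(float)           # one 3D vector per pixel
ys, xs = np.mgrid[0:H, 0:W]                     # row and column coordinates
rgbxy = np.hstack([rgb,                         # one 5D vector per pixel,
                   ys.reshape(-1, 1),           # coordinates appended as in
                   xs.reshape(-1, 1)])          # the example above
```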

Page 31: CS  4487/9587  Algorithms for Image Analysis

K-means Clustering: Objective Function

• Probably the most popular clustering algorithm
  • assumes the number of clusters k is given
  • optimizes (approximately) the following objective function over the variables Di and µi:

$$E_k = SSE = \sum_{i=1}^{k} \sum_{x \in D_i} \|x - \mu_i\|^2$$

the sum of squared errors from the cluster centers µi.

[Figure: three clusters D1, D2, D3 with centers µ1, µ2, µ3; the SSE adds the squared errors within each of the three clusters.]

Page 32: CS  4487/9587  Algorithms for Image Analysis

K-means Clustering: Objective Function

Good (tight) clustering: smaller value of SSE.
Bad (loose) clustering: larger value of SSE.

[Figure: the same points grouped into clusters D1, D2, D3 in two ways; the tight grouping yields a small SSE, the loose grouping a large SSE.]

Page 33: CS  4487/9587  Algorithms for Image Analysis

K-means Clustering: Algorithm

• Initialization step
  1. pick k cluster centers randomly

Page 34: CS  4487/9587  Algorithms for Image Analysis

K-means Clustering: Algorithm

• Initialization step
  1. pick k cluster centers randomly

Page 35: CS  4487/9587  Algorithms for Image Analysis

K-means Clustering: Algorithm

• Initialization step
  1. pick k cluster centers randomly
  2. assign each sample to the closest center

Page 36: CS  4487/9587  Algorithms for Image Analysis

K-means Clustering: Algorithm

• Initialization step
  1. pick k cluster centers randomly
  2. assign each sample to the closest center

• Iteration steps
  1. compute the mean in each cluster:

$$\mu_i = \frac{1}{|D_i|} \sum_{x \in D_i} x$$

Page 37: CS  4487/9587  Algorithms for Image Analysis

K-means Clustering: Algorithm

• Initialization step
  1. pick k cluster centers randomly
  2. assign each sample to the closest center

• Iteration steps
  1. compute the mean in each cluster: $\mu_i = \frac{1}{|D_i|} \sum_{x \in D_i} x$
  2. re-assign each sample to the closest mean

Page 38: CS  4487/9587  Algorithms for Image Analysis

K-means Clustering: Algorithm

• Initialization step
  1. pick k cluster centers randomly
  2. assign each sample to the closest center

• Iteration steps
  1. compute the mean in each cluster: $\mu_i = \frac{1}{|D_i|} \sum_{x \in D_i} x$
  2. re-assign each sample to the closest mean

• Iterate until clusters stop changing

Page 39: CS  4487/9587  Algorithms for Image Analysis

K-means Clustering: Algorithm

• Initialization step
  1. pick k cluster centers randomly
  2. assign each sample to the closest center

• Iteration steps
  1. compute the mean in each cluster: $\mu_i = \frac{1}{|D_i|} \sum_{x \in D_i} x$
  2. re-assign each sample to the closest mean

• Iterate until clusters stop changing

Page 40: CS  4487/9587  Algorithms for Image Analysis

K-means Clustering: Algorithm

• Initialization step
  1. pick k cluster centers randomly
  2. assign each sample to the closest center

• Iteration steps
  1. compute the mean in each cluster: $\mu_i = \frac{1}{|D_i|} \sum_{x \in D_i} x$
  2. re-assign each sample to the closest mean

• Iterate until clusters stop changing

• This procedure decreases the value of the objective function

$$E_k(D, \mu) = \sum_{i=1}^{k} \sum_{x \in D_i} \|x - \mu_i\|^2$$

over the optimization variables $D = (D_1, \ldots, D_k)$ and $\mu = (\mu_1, \ldots, \mu_k)$.

This is block-coordinate descent: step 1 optimizes µ, step 2 optimizes D.
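A minimal sketch of this block-coordinate descent, assuming NumPy (X is an illustrative n×d array of feature vectors):

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, iters: int = 100, seed: int = 0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # random centers
    labels = np.full(len(X), -1)
    for _ in range(iters):
        # step 2: re-assign each sample to the closest mean
        dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                              # clusters stopped changing
        labels = new_labels
        # step 1: compute the mean in each non-empty cluster
        for i in range(k):
            if np.any(labels == i):
                mu[i] = X[labels == i].mean(axis=0)
    return labels, mu
```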

Page 41: CS  4487/9587  Algorithms for Image Analysis

K-means: Approximate Optimization

• K-means is fast and often works well in practice
• But it can get stuck in a local minimum of the objective Ek
  • not surprising, since the problem is NP-hard

[Figure: an initialization, the global minimum, and a run that converged to a local minimum.]

Page 42: CS  4487/9587  Algorithms for Image Analysis

K-means clustering examples: Segmentation

K-means finds compact clusters. In this case, K-means (k=2) automatically finds a good threshold T between the two clusters (with means µ1 and µ2).

Page 43: CS  4487/9587  Algorithms for Image Analysis

K-means clustering examples: Segmentation?

k = 3, k = 5, k = 10
(random colors are used to better show segments/clusters)

Page 44: CS  4487/9587  Algorithms for Image Analysis

K-means clustering examples: Color Quantization

NOTE: bias to equal-size clusters

Page 45: CS  4487/9587  Algorithms for Image Analysis

K-means clustering examples: Adding XY features

• RGB features → color quantization
• RGBXY features → superpixels
• XY features only → Voronoi cells

(related to HW 1)

Page 46: CS  4487/9587  Algorithms for Image Analysis

K-means clustering examples: Superpixels

Apply K-means to RGBXY features.
[SLIC superpixels, Achanta et al., PAMI 2011]
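A hedged sketch of this idea, K-means on weighted RGBXY features, assuming NumPy and scikit-learn (the function name and the weight w are illustrative; this is a simplification, not the full SLIC algorithm):

```python
import numpy as np
from sklearn.cluster import KMeans

def rgbxy_superpixels(im: np.ndarray, k: int = 100, w: float = 0.5) -> np.ndarray:
    """Cluster the pixels of an H x W x 3 image into k superpixels."""
    H, W, _ = im.shape
    ys, xs = np.mgrid[0:H, 0:W]
    feats = np.hstack([im.reshape(-1, 3).astype(float),   # RGB part
                       w * xs.reshape(-1, 1),             # weighted X
                       w * ys.reshape(-1, 1)])            # weighted Y
    labels = KMeans(n_clusters=k, n_init=5).fit_predict(feats)
    return labels.reshape(H, W)
```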

Page 47: CS  4487/9587  Algorithms for Image Analysis

K-means clustering examples: Texture Analysis

Each pixel p gets a feature vector of responses to N filters (convolutions)

$$f_p = (f^1_p, \ldots, f^N_p)$$

so K-means runs on points in $R^N$ instead of $R^3$. Here F is a bank of N filters for texture, e.g. Gabor filters.

Q: what is $\mu_i = (\mu^1_i, \ldots, \mu^N_i)$?
A: a “texton”: a weighted combination of the filters [Zhu et al. ECCV’02].

If the filters F are orthonormal, $f_p$ gives the coordinates of the patch $P_p$ at p with respect to the basis vectors F:

$$P_p = \sum_i f^i_p F_i$$

(slide from Naotoshi Seo)

Page 48: CS  4487/9587  Algorithms for Image Analysis

K-means Properties

• Works best when clusters are spherical (blob-like)
• Fails for elongated clusters
  • SSE is not an appropriate objective function in this case
• Sensitive to outliers

Page 49: CS  4487/9587  Algorithms for Image Analysis

K-means as parametric clustering

K-means is maximum likelihood (ML) fitting of the parameters μi (means) of Gaussian distributions:

$$E_k = \sum_{i=1}^{k} \sum_{x \in D_i} \|x - \mu_i\|^2$$

is equivalent (easy to check) to

$$\tilde{E}_k = -\sum_{i=1}^{k} \sum_{x \in D_i} \log P(x \mid \mu_i) + \text{const}$$

for the Gaussian distribution

$$P(x \mid \mu_i) \propto \exp\left( -\frac{\|x - \mu_i\|^2}{2\sigma^2} \right).$$

Page 50: CS  4487/9587  Algorithms for Image Analysis

K-means as non-parametric clustering

$$E_k = \sum_{i=1}^{k} \sum_{x \in D_i} \|x - \mu_i\|^2$$

is equivalent (easy to check) to the pairwise form

$$E_k = \sum_{i=1}^{k} \frac{1}{2|D_i|} \sum_{x, y \in D_i} \|x - y\|^2 :$$

just plug the expression $\mu_i = \frac{1}{|D_i|} \sum_{y \in D_i} y$ into the sample variance identity

$$\mathrm{var}(D_i) = \frac{1}{|D_i|} \sum_{x \in D_i} \|x - \mu_i\|^2 = \frac{1}{2|D_i|^2} \sum_{x, y \in D_i} \|x - y\|^2 .$$

Page 51: CS  4487/9587  Algorithms for Image Analysis

K-means as a variance clustering criterion

Using the sample variance

$$\mathrm{var}(D_i) = \frac{1}{|D_i|} \sum_{x \in D_i} \|x - \mu_i\|^2 = \frac{1}{2|D_i|^2} \sum_{x, y \in D_i} \|x - y\|^2 ,$$

both formulas can be written as

$$E_k = \sum_{i=1}^{k} |D_i| \cdot \mathrm{var}(D_i).$$

Page 52: CS  4487/9587  Algorithms for Image Analysis

K-means Summary

• Advantages
  • Principled (objective function) approach to clustering
  • Simple to implement (the approximate iterative optimization)
  • Fast

• Disadvantages
  • Only a local minimum is found (sensitive to initialization)
  • May fail for non-blob-like clusters: K-means fits Gaussian models, as the quadratic errors imply
  • Sensitive to outliers
  • Sensitive to the choice of k

One can add a sparsity term and make k an additional variable,

$$E = \sum_{i=1}^{k} \sum_{x \in D_i} \|x - \mu_i\|^2 + \lambda \cdot k ,$$

as in the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC).

Page 53: CS  4487/9587  Algorithms for Image Analysis

K-means – common extensions

• Parametric methods with arbitrary likelihoods P(·|θ) (probabilistic K-means):

$$E = -\sum_{i=1}^{k} \sum_{x \in D_i} \log P(x \mid \theta_i)$$

Examples of P(·|θ): Gaussian, gamma, exponential, Gibbs, etc.

• Parametric methods with an arbitrary distortion measure $\|\cdot\|_d$ (distortion clustering):

$$E = \sum_{i=1}^{k} \sum_{x \in D_i} \|x - \theta_i\|_d$$

• Non-parametric methods: pairwise clustering with an arbitrary distortion measure (kernel K-means, normalized cuts, average association, average distortion):

$$E = \sum_{i=1}^{k} \frac{1}{|D_i|} \sum_{x, y \in D_i} \|x - y\|_d$$

Examples of $\|\cdot\|_d$: quadratic (K-means), absolute (K-medians), truncated (K-modes)

Page 54: CS  4487/9587  Algorithms for Image Analysis

K-means – common extensions (optional material)

• Soft clustering using a Gaussian Mixture Model (GMM)
  - no “hard” assignments of points to distinct (Gaussian) clusters Di
  - all points are used to estimate the parameters of one complex multi-modal distribution (the GMM)

GMM distribution:

$$P_{GMM}(x \mid \theta) = \sum_{i=1}^{k} \pi_i \, N(x \mid \mu_i, \sigma_i)$$

with mixing coefficients $\pi_i$ and the means and variances of the Gaussian modes: $\theta_{GMM} = (\pi_i, \mu_i, \sigma_i \mid i = 1, \ldots, k)$.

GMMs estimate the “true” data distribution (a continuous-density analog of histograms).

[Figure: a simple 1D example — $P_{GMM}(x \mid \theta)$ with three Gaussian modes (k=3), parameters $(\mu_1, \sigma_1), (\mu_2, \sigma_2), (\mu_3, \sigma_3)$.]

Page 55: CS  4487/9587  Algorithms for Image Analysis

K-means – common extensions (optional material)

• Soft clustering using a Gaussian Mixture Model (GMM)
  - no “hard” assignments of points to distinct (Gaussian) clusters Di
  - all points are used to estimate the parameters of one complex multi-modal distribution (the GMM)

GMM distribution:

$$P_{GMM}(x \mid \theta) = \sum_{i=1}^{k} \pi_i \, N(x \mid \mu_i, \sigma_i)$$

Maximum likelihood (ML) objective:

$$E_{GMM}(\theta) = -\sum_{x} \log P_{GMM}(x \mid \theta)$$

Approximate optimization is done via the EM algorithm; like K-means, it is sensitive to local minima. See Szeliski, Sec. 5.3.1, or Christopher Bishop, “Pattern Recognition and Machine Learning”, Ch. 9.

[Figure: the same 1D example — three Gaussian modes (k=3) of the mixture P_GMM.]
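A hedged sketch of GMM fitting via EM, assuming scikit-learn (X is an illustrative n×d feature array):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.rand(500, 3)                    # placeholder feature vectors
gmm = GaussianMixture(n_components=3).fit(X)  # EM estimates pi_i, mu_i, Sigma_i
resp = gmm.predict_proba(X)                   # soft assignments (responsibilities)
hard = resp.argmax(axis=1)                    # hard labels, for comparison with K-means
```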

Page 56: CS  4487/9587  Algorithms for Image Analysis

K-means – common extensions (optional material)

• Soft clustering using a Gaussian Mixture Model (GMM)
  - no “hard” assignments of points to distinct (Gaussian) clusters Di
  - all points are used to estimate the parameters of one complex multi-modal distribution (the GMM)

GMM distribution: $P_{GMM}(x \mid \theta) = \sum_{i=1}^{k} \pi_i \, N(x \mid \mu_i, \sigma_i)$
ML objective: $E_{GMM}(\theta) = -\sum_{x} \log P_{GMM}(x \mid \theta)$

[Figure: GMM modes in (r, g, b) color space.]

Page 57: CS  4487/9587  Algorithms for Image Analysis

Gaussian clusters/modes in: K-means vs. GMM (optional material)

K-means (k=6); color indicates the assigned cluster:
- hard assignment to clusters: separates the data points into multiple Gaussian blobs
- only estimates the means μi
- Σi can also be added as a cluster parameter (elliptic K-means)

GMM (k=6); color indicates the locally strongest mode:
- soft mode searching: estimates the data distribution with multiple Gaussian modes
- estimates both the mean μi and the (co)variance Σi of each mode

Page 58: CS  4487/9587  Algorithms for Image Analysis

Gaussian clusters/modes in: K-means vs. GMM (optional material)

Hard clustering may not work well when clusters overlap (k=4); this is generally not a problem in segmentation, since objects do not “overlap” in RGBXY.

K-means (k=4):
- hard assignment to clusters: separates the data points into multiple Gaussian blobs
- only estimates the means μi
- Σi can also be added as a cluster parameter (elliptic K-means)

GMM (k=4):
- soft mode searching: estimates the data distribution with multiple Gaussian modes
- estimates both the mean μi and the (co)variance Σi of each mode
- while the result shown is an optimal GMM, it is hard to find with the standard EM algorithm due to local minima

Page 59: CS  4487/9587  Algorithms for Image Analysis

Gaussian clusters/modes in: K-means vs. GMM (optional material)

K-means:
- hard assignment to clusters: separates the data points into multiple Gaussian blobs
- only estimates the means μi (Σi can also be added as a cluster parameter: elliptic K-means)
- computationally cheap (block-coordinate descent)
- sensitive to local minima
- scales to higher dimensions (kernel K-means)

GMM:
- soft mode searching: estimates the data distribution with multiple Gaussian modes
- estimates both the mean μi and the (co)variance Σi of each mode
- more expensive (EM algorithm, Szeliski Sec. 5.3.1)
- sensitive to local minima
- does not scale to higher dimensions

Page 60: CS  4487/9587  Algorithms for Image Analysis

A simple non-parametric alternative: mean-shift clustering

Formulates clustering as histogram partitioning:
• also looks for modes in data histograms
• does not assume the number of clusters is known

[Figure: data points → data histogram and its modes → clustering]

Page 61: CS  4487/9587  Algorithms for Image Analysis

Finding Modes in a Histogram

How many modes are there?
• Easy to see, hard to compute

Page 62: CS  4487/9587  Algorithms for Image Analysis

Mean Shift
[Fukunaga and Hostetler 1975, Cheng 1995, Comaniciu & Meer 2002]

Iterative mode search:
1. Initialize a random seed and a fixed window
2. Calculate the center of gravity ‘x’ of the window (the “mean”)
3. Translate the search window to the mean
4. Repeat from step 2 until convergence (the window settles on a mode)
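A minimal sketch of this iterative mode search for one seed with a flat, fixed-radius window, assuming NumPy (X, seed, and bandwidth are illustrative inputs):

```python
import numpy as np

def mean_shift_mode(X: np.ndarray, seed: np.ndarray, bandwidth: float,
                    tol: float = 1e-5, max_iter: int = 300) -> np.ndarray:
    x = seed.astype(float)
    for _ in range(max_iter):
        in_window = np.linalg.norm(X - x, axis=1) <= bandwidth  # points in the window
        if not in_window.any():
            break
        mean = X[in_window].mean(axis=0)       # center of gravity of the window
        if np.linalg.norm(mean - x) < tol:     # converged: x is (near) a mode
            break
        x = mean                               # translate the window to the mean
    return x
```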

Page 63: CS  4487/9587  Algorithms for Image Analysis


Mean Shift[Fukunaga and Hostetler 1975, Cheng 1995, Comaniciu & Meer 2002]

Page 64: CS  4487/9587  Algorithms for Image Analysis

Mean-shift results for segmentation

RGB+XY clustering [Comaniciu & Meer 2002]: 5D features; adding XY helps “coherence” in the image domain.

Page 65: CS  4487/9587  Algorithms for Image Analysis

Mean-shift results for segmentation

RGB+XY clustering [Comaniciu & Meer 2002]

Page 66: CS  4487/9587  Algorithms for Image Analysis

Mean-shift results for segmentation

RGB+XY clustering [Comaniciu & Meer 2002]: works well for color quantization.

Page 67: CS  4487/9587  Algorithms for Image Analysis

Issues:

Window size (kernel bandwidth) selection is critical
- cannot be too small or too large
- indirectly controls the number of clusters (k)
- different widths may be needed in the RGB and XY parts of the space

Color may not be sufficient (e.g. color overlap between object and background).

Integrating detailed boundary cues (more on this later):
• contrast edges
• explicit shape priors (smoothness, curvature, convexity, atlas)

Page 68: CS  4487/9587  Algorithms for Image Analysis

Some terminology

Clustering, Partitioning, Segmentation

Q: differences between these terms?
One answer: not really; but “segmentation” often implies image features (pixels) and the importance of the geometric component (XY).

Page 69: CS  4487/9587  Algorithms for Image Analysis

HW assignment 1: RGB and RGB+XY clustering

Feature vector at pixel p: $[R_p, G_p, B_p, wX_p, wY_p]$
• relative weight w for the XY part
• from color quantization to superpixels

Parametric methods (K-means)
• fixed k (estimate k for extra credit)
• sensitivity to initialization (use seeds to initialize)
• weighted errors (different weights in RGB and XY)

Non-parametric methods (mean- or medoid-shift)
• extra credit for undergrads, required for grads
• sensitivity to bandwidth