visual’codebook - university of...

Visual Codebook

Tae-‐Kyun Kim

Sidney Sussex College

1

Visual WordsVisual words are base elements to describe an image.

Interest points are detected from an imageCorners, Blob detector, SIFT detector

Image patches are represented by descriptorSIFT (Scale-‐Invariant Feature Transform) or Raw pixel intensity

2

dim

ensi

on Dor raw pixels

Building a Visual Codebook (Dictionary)

Visual words (real-‐valued vectors) can be compared using Euclidean distance:

These vectors are divided into groups which are similar, essentially clustered together, to form a codebook.

3

CodebookVisual Codewords

K-‐means ClusteringThe two steps repeated until no vector changes membership:

1. Compute a cluster center for each cluster as the mean of the cluster members

2. Reassign each data point to the cluster whose center is nearest

The cluster centers (mean values) now form a visual dictionary.

4

K codewords

Histogram of Visual Words

5

Nearest NeighbourMatching

freq

uency

Every visual word is compared with codewords and assigned to the nearest codeword.

Histogram bins are codewords and each bin counts the number of words assigned to the codeword.

K

categorydecision

Learning

feature detection& representation

codewords dictionary

image representation

category models(and/or) classifiers

Recognition

6

Categorisation

CategorisationHistogram matching of a pair of images P and Q is

Based on HMK, we can use various classifiers such as kNN, SVM, Naïve-‐Bayes classifiers or pLSA, LDA.

categorydecision

kNNClassifier

Class assignment by

Class 1 Class C

P Q

Summary of K-‐means Codebook

The histogram is 1-‐D, a highly compact and robust representation of images.

The histogram representation greatly facilitates modelling images and their categories.

Codeword assignment (quantisation process) by K-‐means is time-‐demanding.

K-‐means is an unsupervised learning method.

8

Case Study: Video (action) Categorisation

by RF Codebook

In collaboration with Tsz Ho Yu

9

Demo video: Action categorisation

10

11

categorydecision

Learning

Interest point detection

Bag of semantic textons

Recognition

FASTEST Corner

Spatiotemporal Semantic Texton Forest

kNN Classifier

Descriptor/Codebook

Classification

FASTEST Corner

Powerful Discriminative CodebookExtremely Fast

Relation to the previous

12

Input Images Videos

Interest Point Corners 3D corners

Descriptor Patches Space-‐time volumes

SIFT Raw pixels

CodebookK-‐means

Randomised Decision Forests

Representation Histogram of visual words

Classifier kNN classifierSkip

Relation to the previous-‐Time Volumes

13

Image (2D)

Patch

Video (3D)

Cuboid

or

Data Set and Interest Point

14

Action data set

15

http://www.nada.kth.se/cvap/actions/

25 subjects, 2391 sequences the largest public action DB

walking jogging running boxingHand waving

Hand clapping

s1

s2

s3

s4

http://www.nada.kth.se/cvap/actions/

Space-‐Time Interest Point

16

In Dollar et al. 2005, separable linear filters are applied.

The response function is

is 2D Gaussian smoothing kernel

hev / hod are a pair of 1D Gabor filters temporallyapplied as

Space-‐Time Interest Point

17

For multiple scales, it runs the detector over a set of spatial and temporal scales.

Interest points are detected as local maxima of the response-‐function.

Matlab Demo:Space-‐Time Interest Points

18

Background: Randomised Decision Forest

19

Binary Decision Trees

1

2 3

6 74

9

5

8

category c

split nodesleaf nodesv

10 11 12 13

14 15 16 17

feature vector vsplit functions fn(v)thresholds tnClassifications Pn(c)

<

<

Slide credits to J. Shotton

Toy Learning Example

x

y

feature vectors are x, y coordinates: v = [x, y]Tsplit functions are lines with parameters a, b: fn(v) = ax + bythreshold determines intercepts: tnfour classes: purple, blue, red, green

Try several lines, chosen at random

Keep line that best separates datainformation gain

Recurse


x

y



Keep line that best separates datainformation gain

Recurse



x

y



Keep line that best separates data

information gain

Recurse



Generalisation

Overfit Reasonably Smooth

vv

Generalisation

Reasonably Smooth

x

tree t1 tree tTtree t2

Train data:

Recursive algorithmset In of training examples that reach node n is split:

Features f and thresholds t chosen at randomChoose f and t to maximize gain in information

Randomized Tree Learning

left split

right split

category c

In

Il Ir

category c category c

Forest is an ensemble of decision trees

Bagging (Bootstrap AGGregatING)For each set, it randomly draws examples from the uniform dist. allowing duplication and missing.

A Forest of Trees : Learning

tree t1 tree tT

DS1 DST

DS

Forest is an ensemble of decision trees

classification is

A Forest of Trees : Recognition

tree t1 tree tT

category ccategory c

split nodes

leaf nodes

v v


Fast Evaluation

6

8

11

16

13

2

2

3

2

1

4

7

5 times speed up

Flat structure

Tree structure

v v

Demo video: Fast evaluation

31

Random ForestCodebook

32

Video Cuboid Features

pi

Video pixel i gives cuboid p(13x13x19 pixels in experiments)

tree split function

i

If f(p) threshold>goto Left

Elsegoto Right

Randomized Forest Codebook

543

1

2

8 96 7

42 61 3

98

tree t1 tree tT

75

Example Cuboids

6

Example Cuboids

6

FASTEST Corner

Histogram of Randomized Forest Codewords

543

1

2

8 96 7

42 61 3

98

tree t1 tree tT

75

freq

uency

tree t1 tree tT

node index

Extremely FastPowerful Discriminative

Codebook

36

Matlab Demo:Random Forest Codebook

Action Categorisation Result

37

Action categorization accuracy (%) on KTH dataset6 types of actions, 25 subjects of 4 scenarios by leave-‐one-‐out protocol

38

Matlab Demo:Action Categorisation

Take-‐home Message

Use of visual codebook greatly facilitates image and video categorisation tasks.The RF codebook is extremely fast as it only uses a small number of features in trees.

c.f. K-‐means codebook is in a flat structure. It compares an input with every codewords, which is time-‐demanding.

The RF codebook is also a powerful discriminative codebook. It combines multiple decision trees showing good generalisation.

c.f. K-‐means is an unsupervised learning method.

39

Questions?

40

Action Recognition Result

41

quantisation effect!

Action categorization accuracy (%) on KTH dataset6 types of actions, 25 subjects of 4 scenarios by leave-‐one-‐out protocol

Pyramid Match Kernel

PMK acts on semantic texton histogrammatches P and Q in learned hierarchical histogram spacedeeper node matches are more important

depth weight

increased similarity at depth d

d=1

d=2

d=3

split nodesleaf nodes

CategorisationHistogram matching of a pair of videos P and Q is

Based on HMK, we can use various classifiers such as kNN, SVM, Naïve-‐Bayes classifiers or pLSA, LDA.

categorydecision

kNNClassifier

NN classification by

Class 1 Class M

P Q

Action Recognition ResultAction categorization accuracy (%) on KTH dataset

6 types of actions, 25 subjects of 4 scenarios by leave-‐one-‐out protocol

visual’codebook - university of...

Documents