
Page 1: Visual Codebook - University of Cambridge (mi.eng.cam.ac.uk/~cipolla/lectures/PartIB/old/IB-visualcodebook.pdf)

Visual Codebook

Tae-Kyun Kim

Sidney Sussex College

Page 2

Visual Words

Visual words are the base elements used to describe an image.

Interest points are detected in an image, e.g. by a corner detector, a blob detector or the SIFT detector.

The image patch around each interest point is represented by a descriptor: SIFT (Scale-Invariant Feature Transform) or raw pixel intensities.

[Figure: each patch becomes a descriptor vector of dimension D (SIFT or raw pixels).]
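The raw-pixel option can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the function name `patch_descriptors`, the patch half-width `half`, and the zero-mean/unit-norm normalisation are all assumptions made for the example.

```python
import numpy as np

def patch_descriptors(image, points, half=4):
    """Return one raw-pixel descriptor per interest point.

    Each descriptor is the (2*half+1) x (2*half+1) patch around the point,
    flattened into a vector of dimension D. The normalisation (zero mean,
    unit norm) is an assumption of this sketch. Points whose patch would
    leave the image are skipped.
    """
    h, w = image.shape
    descriptors = []
    for r, c in points:
        if r - half < 0 or c - half < 0 or r + half >= h or c + half >= w:
            continue  # patch would fall outside the image
        patch = image[r - half:r + half + 1, c - half:c + half + 1]
        v = patch.astype(float).ravel()
        v -= v.mean()
        norm = np.linalg.norm(v)
        descriptors.append(v / norm if norm > 0 else v)
    return np.array(descriptors)

rng = np.random.default_rng(0)
image = rng.random((32, 32))
desc = patch_descriptors(image, [(10, 10), (16, 20), (1, 1)])
print(desc.shape)  # (2, 81): the point at (1, 1) is too close to the border
```

Here D = 81 because a 9x9 patch is flattened; a SIFT descriptor would replace the flattening step.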

Page 3

Building a Visual Codebook (Dictionary)

Visual words (real-valued vectors) can be compared using Euclidean distance:

$d(\mathbf{u}, \mathbf{v}) = \lVert \mathbf{u} - \mathbf{v} \rVert_2 = \sqrt{\sum_{i=1}^{D} (u_i - v_i)^2}$

These vectors are clustered into groups of similar vectors, and the groups form a codebook.

[Figure: visual words grouped into a codebook of visual codewords.]

Page 4

K-means Clustering

The following two steps are repeated until no vector changes membership:

1. Compute a cluster center for each cluster as the mean of the cluster members

2. Reassign each data point to the cluster whose center is nearest

The cluster centers (mean values) now form a visual dictionary.

[Figure: the descriptor space partitioned into K codewords.]
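The two alternating steps can be sketched in NumPy as below. This is a minimal sketch, assuming the centres are initialised from K randomly chosen data points (the slide does not say how they are initialised); the function name `kmeans` and its parameters are illustrative.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Cluster the rows of X into k groups; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct data points picked at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(n_iter):
        # Reassign each data point to the cluster whose center is nearest.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # no vector changed membership
        labels = new_labels
        # Recompute each center as the mean of its cluster members.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
codebook, labels = kmeans(X, k=2)
print(codebook)  # the K = 2 codewords
```

The returned `codebook` rows play the role of the K codewords. Each iteration compares every point with every centre, which is the flat-structure cost that makes K-means quantisation slow for large K.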

Page 5

Histogram of Visual Words

Every visual word is compared with the codewords and assigned to the nearest codeword (nearest neighbour matching).

Histogram bins are codewords, and each bin counts the number of words assigned to that codeword.

[Figure: histogram of frequency over the K codewords.]
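Nearest-neighbour assignment and counting can be sketched as follows. The function name `bow_histogram` and the toy 2-D codebook are assumptions made for illustration.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Count how many descriptors fall on each codeword (nearest neighbour)."""
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)            # index of the nearest codeword
    return np.bincount(nearest, minlength=len(codebook))

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]])          # K = 3
descriptors = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [3.7, 0.2]])
hist = bow_histogram(descriptors, codebook)
print(hist)  # [1 2 1]
```

The histogram always has K bins, whatever the number of interest points in the image, which is what makes images of different sizes directly comparable.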

Page 6

Categorisation

[Figure: categorisation pipeline. Learning: feature detection & representation, codewords dictionary, image representation, category models (and/or) classifiers. Recognition: category decision.]

Page 7

Categorisation

Histogram matching of a pair of images P and Q is

[Equation: histogram matching of P and Q]

Based on HMK, we can use various classifiers such as kNN, SVM, Naïve-Bayes classifiers or pLSA, LDA.

kNN classifier: class assignment by

[Equation: class assignment rule]

[Figure: histograms P and Q are matched against Class 1 ... Class C to give the category decision.]
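A kNN classifier over histograms might look like the sketch below. The slide's matching formula did not survive in the transcript, so histogram intersection on normalised histograms is used here purely as an assumed stand-in; the function names and toy data are likewise illustrative.

```python
import numpy as np
from collections import Counter

def histogram_intersection(p, q):
    """Assumed matching score: sum of bin-wise minima of normalised histograms."""
    p = p / p.sum()
    q = q / q.sum()
    return np.minimum(p, q).sum()

def knn_classify(query, train_hists, train_labels, k=3):
    """Vote among the k training histograms that match the query best."""
    scores = [histogram_intersection(query, h) for h in train_hists]
    top = np.argsort(scores)[::-1][:k]        # indices of the k highest scores
    votes = Counter(train_labels[i] for i in top)
    return votes.most_common(1)[0][0]

train_hists = np.array([[8, 1, 1], [7, 2, 1], [1, 1, 8], [2, 1, 7]], dtype=float)
train_labels = ["class 1", "class 1", "class 2", "class 2"]
query = np.array([6.0, 3.0, 1.0])
print(knn_classify(query, train_hists, train_labels, k=3))  # class 1
```

Any other similarity between histograms could be dropped into `histogram_intersection` without changing the voting step.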

Page 8

Summary of K-means Codebook

The histogram is 1-D, a highly compact and robust representation of images.

The histogram representation greatly facilitates modelling images and their categories.

Codeword assignment (the quantisation process) by K-means is time-demanding, because each input is compared with every codeword.

K-means is an unsupervised learning method.

Page 9

Case Study: Video (Action) Categorisation by RF Codebook

In collaboration with Tsz Ho Yu

Page 10

Demo video: Action categorisation

Page 11

[Figure: action categorisation pipeline with learning and recognition paths. Interest point detection: FASTEST corner. Descriptor/codebook: spatiotemporal semantic texton forest, giving a bag of semantic textons. Classification: kNN classifier, giving the category decision. Annotations: powerful discriminative codebook; extremely fast.]

Page 12

Relation to the previous

Input | Images | Videos
Interest point | Corners | 3D corners
Descriptor | Patches | Space-time volumes
 | SIFT | Raw pixels
Codebook | K-means | Randomised Decision Forests
Representation | Histogram of visual words
Classifier | kNN classifier

Page 13

Relation to the previous: Space-Time Volumes

[Figure: an image (2D) yields a patch; a video (3D) yields a cuboid.]

Page 14

Data Set and Interest Point

Page 15

Action data set

http://www.nada.kth.se/cvap/actions/

25 subjects, 2391 sequences: the largest public action DB

[Figure: sample frames of the six actions (walking, jogging, running, boxing, hand waving, hand clapping) in four scenarios (s1, s2, s3, s4).]

Page 16

Space-Time Interest Point

In Dollar et al. 2005, separable linear filters are applied.

The response function is

$R = (I * g * h_{ev})^2 + (I * g * h_{od})^2$

where $I$ is the video, $g(x, y; \sigma)$ is the 2D Gaussian smoothing kernel, applied spatially, and $h_{ev}$ / $h_{od}$ are a pair of 1D Gabor filters, applied temporally, defined as

$h_{ev}(t; \tau, \omega) = -\cos(2\pi t \omega)\, e^{-t^2/\tau^2}$, $h_{od}(t; \tau, \omega) = -\sin(2\pi t \omega)\, e^{-t^2/\tau^2}$

Page 17

Space-Time Interest Point

To handle multiple scales, the detector is run over a set of spatial and temporal scales.

Interest points are detected as local maxima of the response function.
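A single-scale version of this detector can be sketched with NumPy alone: smooth each frame with a separable Gaussian, filter each pixel's time series with an even/odd Gabor pair, sum the squared outputs, and keep local maxima. The parameter values (`sigma`, `tau`, `omega`), the kernel radii, the 3x3x3 neighbourhood and the threshold are illustrative assumptions, not the settings used in the lecture or in Dollar et al.

```python
import numpy as np

def conv_along(volume, kernel, axis):
    """Convolve every 1D line of `volume` along `axis` with `kernel`."""
    return np.apply_along_axis(
        lambda line: np.convolve(line, kernel, mode="same"), axis, volume)

def gaussian_kernel(sigma):
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return g / g.sum()

def gabor_pair(tau, omega):
    radius = int(3 * tau)
    t = np.arange(-radius, radius + 1)
    envelope = np.exp(-t**2 / tau**2)
    return (-np.cos(2 * np.pi * t * omega) * envelope,
            -np.sin(2 * np.pi * t * omega) * envelope)

def response(video, sigma=1.5, tau=2.0, omega=0.25):
    """Sum of squared even and odd filter outputs for a (T, H, W) video."""
    g = gaussian_kernel(sigma)
    smoothed = conv_along(conv_along(video, g, axis=1), g, axis=2)  # spatial
    h_ev, h_od = gabor_pair(tau, omega)                             # temporal
    return (conv_along(smoothed, h_ev, axis=0) ** 2
            + conv_along(smoothed, h_od, axis=0) ** 2)

def local_maxima(R, threshold):
    """Voxels above `threshold` that are >= all 26 neighbours."""
    T, H, W = R.shape
    padded = np.pad(R, 1, mode="constant", constant_values=-np.inf)
    is_max = R > threshold
    for dt in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dt == dy == dx == 0:
                    continue
                is_max &= R >= padded[1 + dt:1 + dt + T,
                                      1 + dy:1 + dy + H,
                                      1 + dx:1 + dx + W]
    return np.argwhere(is_max)  # rows of (t, y, x)

# Toy video: a single pixel flickering periodically on a static background.
video = np.zeros((32, 16, 16))
video[:, 8, 8] = np.sin(2 * np.pi * 0.25 * np.arange(32))
R = response(video)
points = local_maxima(R, threshold=0.5 * R.max())
print(points[:3])
```

Running over several `(sigma, tau)` pairs and pooling the detections would give the multi-scale variant.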

Page 18

Matlab Demo: Space-Time Interest Points

Page 19

Background: Randomised Decision Forest

Page 20

Binary Decision Trees

[Figure: a binary decision tree with nodes numbered 1 to 17. A feature vector v enters at the root; each split node n compares $f_n(\mathbf{v})$ with $t_n$ and passes v to one of its two children; each leaf node holds a distribution over category c.]

feature vector: $\mathbf{v}$

split functions: $f_n(\mathbf{v})$

thresholds: $t_n$

classifications: $P_n(c)$

Slide credits to J. Shotton
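Evaluating such a tree can be sketched as below. The `Node` class and `classify` function are hypothetical names for this illustration, and the convention "go left when $f_n(\mathbf{v}) > t_n$" is an assumption; the opposite convention works equally well as long as training and testing agree.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence

@dataclass
class Node:
    split: Optional[Callable[[Sequence[float]], float]] = None  # f_n(v)
    threshold: float = 0.0                                      # t_n
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    posterior: Optional[Dict[str, float]] = None                # P_n(c), leaves only

def classify(node, v):
    """Walk from the root to a leaf and return that leaf's P_n(c)."""
    while node.posterior is None:             # still at a split node
        node = node.left if node.split(v) > node.threshold else node.right
    return node.posterior

leaf_a = Node(posterior={"red": 0.9, "blue": 0.1})
leaf_b = Node(posterior={"red": 0.2, "blue": 0.8})
leaf_c = Node(posterior={"red": 0.5, "blue": 0.5})
inner = Node(split=lambda v: v[1], threshold=0.0, left=leaf_b, right=leaf_c)
root = Node(split=lambda v: v[0], threshold=0.5, left=leaf_a, right=inner)

print(classify(root, [0.9, 0.3]))  # {'red': 0.9, 'blue': 0.1}
```

Only one comparison is made per level, so the cost of a lookup grows with the depth of the tree rather than with the number of leaves.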

Page 21

Toy Learning Example

[Figure: scatter plot of training points in the x, y plane.]

feature vectors are x, y coordinates: $\mathbf{v} = [x, y]^T$

split functions are lines with parameters a, b: $f_n(\mathbf{v}) = ax + by$

threshold determines intercepts: $t_n$

four classes: purple, blue, red, green

Try several lines, chosen at random

Keep the line that best separates the data (information gain)

Recurse

Slide credits to J. Shotton
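One node of the toy example can be sketched as follows: draw random lines, score each by information gain, and keep the best. The sampling ranges for a and b, the choice of taking the threshold from a random training point, and all names are assumptions of this sketch.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of the class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, left, right):
    n = len(labels)
    return (entropy(labels)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))

def best_random_split(points, labels, n_trials=200, seed=0):
    """Try random lines a*x + b*y = t and keep the one with the highest gain."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        x0, y0 = rng.choice(points)       # threshold taken from a data point
        t = a * x0 + b * y0
        left = [c for (x, y), c in zip(points, labels) if a * x + b * y > t]
        right = [c for (x, y), c in zip(points, labels) if a * x + b * y <= t]
        if not left or not right:
            continue                      # degenerate split, nothing separated
        gain = information_gain(labels, left, right)
        if best is None or gain > best[0]:
            best = (gain, a, b, t)
    return best

# Four classes, one cluster per quadrant.
jitter = random.Random(1)
centres = {"purple": (-1, 1), "blue": (1, 1), "red": (-1, -1), "green": (1, -1)}
points, labels = [], []
for colour, (cx, cy) in centres.items():
    for _ in range(10):
        points.append((cx + jitter.uniform(-0.3, 0.3), cy + jitter.uniform(-0.3, 0.3)))
        labels.append(colour)

gain, a, b, t = best_random_split(points, labels)
print(f"gain={gain:.3f} bits for line {a:.2f}x + {b:.2f}y = {t:.2f}")
```

A single binary split can gain at most 1 bit, so with four classes the procedure has to recurse on the left and right subsets to separate them all.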


Page 25

Generalisation

[Figure: two classifier outputs over the feature vector v, one overfit and one reasonably smooth.]

Page 26

Generalisation

[Figure: combining trees $t_1$, $t_2$, ..., $t_T$ gives a reasonably smooth result.]

Page 27

Randomized Tree Learning

Train data:

Recursive algorithm: the set $I_n$ of training examples that reach node n is split into a left split $I_l$ and a right split $I_r$.

Features f and thresholds t are chosen at random.

Choose f and t to maximize the gain in information.

[Figure: histograms over category c for $I_n$ and for its two children $I_l$ and $I_r$.]
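The split and gain formulas were lost from the transcript. A common way to write them, given here as an assumption rather than as the slide's exact notation, uses the Shannon entropy $E(\cdot)$ of the class labels in a set; which side receives the "<" case is only a convention.

```latex
% Assumed notation: v_i is the feature vector of training example i,
% E(I) is the Shannon entropy of the category labels in the set I.
\[
  I_l = \{\, i \in I_n \mid f(\mathbf{v}_i) < t \,\}, \qquad
  I_r = I_n \setminus I_l
\]
\[
  \Delta E = E(I_n) - \frac{|I_l|}{|I_n|}\, E(I_l) - \frac{|I_r|}{|I_n|}\, E(I_r)
\]
```

Since $E(I_n)$ does not depend on f and t, maximising $\Delta E$ is the same as minimising the weighted entropy of the two children.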

Page 28

A Forest of Trees: Learning

A forest is an ensemble of decision trees.

Bagging (Bootstrap AGGregatING): each training subset is drawn uniformly at random from the data set with replacement, so some examples are duplicated and others are missing.

[Figure: the data set DS is resampled into subsets $DS_1$, ..., $DS_T$, which train trees $t_1$, ..., $t_T$.]
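Bootstrap sampling can be sketched in a few lines. The function name `bootstrap_sets` and the choice of drawing as many examples as the original set contains are assumptions of this sketch.

```python
import random

def bootstrap_sets(dataset, n_trees, seed=0):
    """Draw one bootstrap sample (uniform, with replacement) per tree."""
    rng = random.Random(seed)
    n = len(dataset)
    return [[dataset[rng.randrange(n)] for _ in range(n)] for _ in range(n_trees)]

DS = list(range(10))                      # stand-in for the training examples
subsets = bootstrap_sets(DS, n_trees=3)
for i, subset in enumerate(subsets, start=1):
    missing = sorted(set(DS) - set(subset))
    print(f"DS{i}: {sorted(subset)}  missing: {missing}")
```

Because every tree sees a slightly different sample, the trees make different errors, which is what averaging over the forest exploits.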

Page 29

A Forest of Trees: Recognition

A forest is an ensemble of decision trees.

classification is

[Equation: forest classification rule]

[Figure: the feature vector v is passed down trees $t_1$, ..., $t_T$ through split nodes to leaf nodes, each leaf giving a distribution over category c.]

Slide credits to J. Shotton
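The classification formula is missing from the transcript. A common choice for decision forests, assumed here, is to average the leaf distributions reached in each tree and pick the most probable category; `forest_posterior` and the example distributions are illustrative.

```python
def forest_posterior(tree_posteriors):
    """Average the per-tree leaf distributions over the T trees (assumed rule)."""
    T = len(tree_posteriors)
    classes = set().union(*tree_posteriors)   # every category seen in any leaf
    return {c: sum(p.get(c, 0.0) for p in tree_posteriors) / T for c in classes}

# Leaf distributions reached by one feature vector v in T = 3 trees.
leaves = [
    {"walking": 0.7, "jogging": 0.3},
    {"walking": 0.4, "jogging": 0.6},
    {"walking": 0.8, "running": 0.2},
]
avg = forest_posterior(leaves)
best = max(avg, key=avg.get)
print(best)  # walking
```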

Page 30

Fast Evaluation

[Figure: the same feature vector v evaluated with a flat structure and with a tree structure; annotation: 5 times speed up.]

Page 31

Demo video: Fast evaluation

Page 32

Random Forest Codebook

Page 33

Video Cuboid Features

Video pixel i gives cuboid p (13x13x19 pixels in experiments).

tree split function:

[Equation: split function f(p)]

If f(p) > threshold, go to Left

Else go to Right
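The routing rule can be sketched as below. The split function itself is missing from the transcript, so the difference of two voxel intensities inside the cuboid is used as a hypothetical stand-in; reading 13x13x19 as 13x13 pixels over 19 frames is also an assumption, as are all function names.

```python
import numpy as np

def cuboid_at(video, t, y, x, size=(19, 13, 13)):
    """Cut the cuboid centred on voxel (t, y, x); size is (frames, rows, cols)."""
    st, sy, sx = (s // 2 for s in size)
    return video[t - st:t + st + 1, y - sy:y + sy + 1, x - sx:x + sx + 1]

def split_feature(p, voxel_a, voxel_b):
    """Hypothetical f(p): difference of two voxel intensities in the cuboid."""
    return float(p[voxel_a]) - float(p[voxel_b])

def route(p, voxel_a, voxel_b, threshold):
    """If f(p) > threshold go left, else go right."""
    return "left" if split_feature(p, voxel_a, voxel_b) > threshold else "right"

rng = np.random.default_rng(0)
video = rng.random((40, 30, 30))          # (frames, rows, cols)
p = cuboid_at(video, t=20, y=15, x=15)
print(p.shape)                            # (19, 13, 13)
print(route(p, (0, 0, 0), (18, 12, 12), threshold=0.0))
```

Each split node reads only two voxels of the cuboid, so no full descriptor has to be computed before the tree is evaluated.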

Page 34

Randomized Forest Codebook

[Figure: example cuboids, taken at FASTEST corner interest points, are passed down trees $t_1$, ..., $t_T$, whose nodes are numbered 1 to 9.]

Page 35

Histogram of Randomized Forest Codewords

[Figure: cuboids are passed down trees $t_1$, ..., $t_T$ (nodes numbered 1 to 9); for each tree, a histogram of frequency over node index is built.]

Extremely fast, powerful discriminative codebook
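Using tree nodes as codewords can be sketched as follows. The sketch assumes complete binary trees with heap-style node numbering (children of node n are 2n and 2n+1), counts every node a cuboid passes through, and uses a single vector component as a hypothetical split feature; none of these details is confirmed by the slides.

```python
import numpy as np

def visited_nodes(tree, p):
    """Return the 1-based indices of the nodes p passes through, root to leaf."""
    node, path = 1, [1]
    while node in tree:                   # split nodes are keys; leaves are not
        feature, threshold = tree[node]
        node = 2 * node if p[feature] > threshold else 2 * node + 1
        path.append(node)
    return path

def forest_histogram(forest, cuboids, n_nodes):
    """Concatenate, over trees, the counts of visits to each node index."""
    hist = np.zeros((len(forest), n_nodes + 1), dtype=int)  # column 0 unused
    for p in cuboids:
        for t, tree in enumerate(forest):
            hist[t, visited_nodes(tree, p)] += 1
    return hist[:, 1:].ravel()

# Two depth-2 trees: node -> (feature index, threshold); nodes 4..7 are leaves.
forest = [
    {1: (0, 0.5), 2: (1, 0.5), 3: (2, 0.5)},
    {1: (2, 0.5), 2: (0, 0.5), 3: (1, 0.5)},
]
cuboids = [np.array([0.9, 0.1, 0.2]), np.array([0.2, 0.8, 0.7])]
hist = forest_histogram(forest, cuboids, n_nodes=7)
print(hist)  # [2 1 1 0 1 1 0 2 1 1 0 1 0 1]
```

Each cuboid costs only two comparisons per tree here, whereas a flat codebook with the same number of leaves would need one distance computation per codeword.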

Page 36

Matlab Demo: Random Forest Codebook

Page 37

Action Categorisation Result

Action categorisation accuracy (%) on the KTH dataset: 6 types of actions, 25 subjects in 4 scenarios, evaluated by a leave-one-out protocol.

Page 38

Matlab Demo: Action Categorisation

Page 39

Take-home Message

Use of a visual codebook greatly facilitates image and video categorisation tasks.

The RF codebook is extremely fast, as it uses only a small number of features in its trees.

cf. The K-means codebook has a flat structure: it compares an input with every codeword, which is time-demanding.

The RF codebook is also a powerful discriminative codebook. It combines multiple decision trees, which gives good generalisation.

cf. K-means is an unsupervised learning method.

Page 40

Questions?

Page 41

Action Recognition Result

quantisation effect!

Action categorisation accuracy (%) on the KTH dataset: 6 types of actions, 25 subjects in 4 scenarios, evaluated by a leave-one-out protocol.

Page 42

Pyramid Match Kernel

PMK acts on the semantic texton histogram.

It matches P and Q in the learned hierarchical histogram space.

Deeper node matches are more important.

[Equation: pyramid match kernel, annotated with "depth weight" and "increased similarity at depth d"]

[Figure: a tree with depths d=1, d=2, d=3, showing split nodes and leaf nodes.]
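The kernel itself is missing from the transcript. One form that matches the two annotations (a depth weight multiplying the increase in similarity at depth d) is the pyramid match kernel used with semantic texton forests; it is given here as an assumption, with $P_d[j]$ and $Q_d[j]$ denoting the histogram bins of the nodes at depth d and D the depth of the tree.

```latex
% Assumed form: I_d is the histogram intersection at depth d.
\[
  \mathcal{I}_d = \sum_{j} \min\bigl(P_d[j],\, Q_d[j]\bigr), \qquad
  \mathcal{I}_{D+1} = 0
\]
\[
  \tilde{K}(P, Q) = \sum_{d=1}^{D}
    \underbrace{\frac{1}{2^{\,D-d+1}}}_{\text{depth weight}}
    \underbrace{\bigl(\mathcal{I}_d - \mathcal{I}_{d+1}\bigr)}_{\text{increased similarity at depth } d}
\]
\[
  K(P, Q) = \frac{\tilde{K}(P, Q)}{\sqrt{\tilde{K}(P, P)\,\tilde{K}(Q, Q)}}
\]
```

The weight doubles with each extra level of depth, so a match that only appears at the root counts least and a match that persists down to the leaves counts most.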

Page 43

Categorisation

Histogram matching of a pair of videos P and Q is

[Equation: histogram matching of P and Q]

Based on HMK, we can use various classifiers such as kNN, SVM, Naïve-Bayes classifiers or pLSA, LDA.

kNN classifier: NN classification by

[Equation: NN classification rule]

[Figure: histograms P and Q are matched against Class 1 ... Class M to give the category decision.]

Page 44

Action Recognition Result

Action categorisation accuracy (%) on the KTH dataset: 6 types of actions, 25 subjects in 4 scenarios, evaluated by a leave-one-out protocol.