learning an attribute dictionary for human action classification qiang qiu, zhuolin jiang, and rama...

Learning an Attribute Dictionary for Human Action Classification

Qiang Qiu, Zhuolin Jiang, and Rama Chellappa, "Sparse Dictionary-based Representation and Recognition of Action Attributes", ICCV 2011

Qiang Qiu

Action Feature Representation


Shape

Motion

HOG

Action Sparse Representation

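An action is represented as a sparse linear combination of atoms $d_k$ from an action dictionary; the coefficients form its sparse code.

Sparse code $[0.43,\ 0.63,\ 0,\ 0,\ -0.33,\ 0,\ -0.36,\ 0,\ 0]^T$:

$$y_1 = 0.43\,d_1 + 0.63\,d_2 - 0.33\,d_5 - 0.36\,d_7$$

Sparse code $[0,\ 0,\ 0.64,\ 0.53,\ -0.40,\ 0.35,\ 0,\ 0,\ 0]^T$:

$$y_2 = 0.64\,d_3 + 0.53\,d_4 - 0.40\,d_5 + 0.35\,d_6$$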
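As a small numeric illustration of the sparse combination on this slide, the sketch below rebuilds one signal from its code. The dictionary `D` is a hypothetical random stand-in (the slide's atoms are images), and the atom indices are read off the positions of the nonzero coefficients in the code vector.

```python
import numpy as np

# Hypothetical dictionary: 9 random atoms of dimension 20 (an assumption;
# the atoms on the slide are images and are not available here).
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 9))

# Sparse code from the slide: only 4 of the 9 coefficients are nonzero.
x1 = np.array([0.43, 0.63, 0, 0, -0.33, 0, -0.36, 0, 0])

# Matrix form of the representation.
y1 = D @ x1

# The same signal written atom by atom
# (0-based columns 0, 1, 4, 6 correspond to d1, d2, d5, d7).
y1_by_atoms = 0.43 * D[:, 0] + 0.63 * D[:, 1] - 0.33 * D[:, 4] - 0.36 * D[:, 6]
```

Both forms give the same vector; the matrix form is the one the K-SVD objective works with.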

K-SVD

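$Y = DX$, with input signals $Y = [y_1, y_2, \dots]$, dictionary $D = [d_1, d_2, d_3, \dots]$, and sparse codes $X = [x_1, x_2, \dots]$, where

$x_1 = [0.43,\ 0.63,\ 0,\ 0,\ -0.33,\ 0,\ -0.36,\ 0,\ 0]^T$
$x_2 = [0,\ 0,\ 0.64,\ 0.53,\ -0.40,\ 0.35,\ 0,\ 0,\ 0]^T$

K-SVD [1]

Input: signals Y, dictionary size, sparsity T
Output: dictionary D, sparse codes X

$$\arg\min_{D,X} \|Y - DX\|_2^2 \quad \text{s.t.}\ \forall i,\ \|x_i\|_0 \le T$$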
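To make the objective and its sparsity constraint concrete, here is a minimal sketch of the sparse-coding half of the problem only, using orthogonal matching pursuit (OMP) with the dictionary held fixed. It is not a K-SVD implementation (K-SVD also updates the atoms); the function name `omp`, the random dictionary, and the choice T = 4 are assumptions made for illustration.

```python
import numpy as np

def omp(D, y, T):
    """Greedy sparse coding: approximate y with at most T atoms (columns) of D."""
    residual = y.astype(float).copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(T):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        # Least-squares refit of y on all atoms selected so far.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    if support:
        x[support] = coef
    return x

# Hypothetical unit-norm dictionary and signals built from the slide's two codes.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 9))
D /= np.linalg.norm(D, axis=0)
X_true = np.array([[0.43, 0.63, 0, 0, -0.33, 0, -0.36, 0, 0],
                   [0, 0, 0.64, 0.53, -0.40, 0.35, 0, 0, 0]]).T
Y = D @ X_true

T = 4
X = np.column_stack([omp(D, Y[:, i], T) for i in range(Y.shape[1])])

# Value of the objective (np.linalg.norm on a matrix is the Frobenius norm).
objective = np.linalg.norm(Y - D @ X) ** 2
```

Every column of `X` satisfies the constraint $\|x_i\|_0 \le T$ by construction. Greedy pursuit is not guaranteed to recover the generating codes, so the sketch makes no claim about the objective reaching zero.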

[1] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation", IEEE Trans. on Signal Processing, 2006

Objective

Learn a Compact and Discriminative Dictionary.

Probabilistic Model for Sparse Representation

• A Gaussian Process
• Dictionary Class Distribution

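More Views of Sparse Representation

Sparse codes $X = [x_1, x_2, x_3, x_4]$ of the signals $y_1, \dots, y_4$, which carry the class labels $l_1, l_1, l_2, l_2$. Column $x_j$ is the code of $y_j$; row $x_{d_i}$ holds the coefficients of atom $d_i$ across all signals.

        x1      x2      x3      x4
        (l1)    (l1)    (l2)    (l2)
d1      0.43    0       0       0
d2      0.63    0       0       0
d3      0       0.64    0       0
d4      0       0.53    0       0
d5     -0.33   -0.40    0      -0.42
d6      0       0.35    0       0
d7     -0.36    0       0       0
d8      0       0       0       0
d9      0       0      -0.28    0
d10     0       0       0.698   0.42
d11     0       0       0.37    0.47
d12     0       0       0.25    0
d13     0       0       0       0.32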
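The two views of the code matrix can be sketched as array slicing. The values below are the ones shown on the slide; the variable names are chosen here for illustration.

```python
import numpy as np

# Sparse codes from the slide: one column per signal, one row per atom.
X = np.array([
    [ 0.43,  0.00,  0.000,  0.00],   # d1
    [ 0.63,  0.00,  0.000,  0.00],   # d2
    [ 0.00,  0.64,  0.000,  0.00],   # d3
    [ 0.00,  0.53,  0.000,  0.00],   # d4
    [-0.33, -0.40,  0.000, -0.42],   # d5
    [ 0.00,  0.35,  0.000,  0.00],   # d6
    [-0.36,  0.00,  0.000,  0.00],   # d7
    [ 0.00,  0.00,  0.000,  0.00],   # d8
    [ 0.00,  0.00, -0.280,  0.00],   # d9
    [ 0.00,  0.00,  0.698,  0.42],   # d10
    [ 0.00,  0.00,  0.370,  0.47],   # d11
    [ 0.00,  0.00,  0.250,  0.00],   # d12
    [ 0.00,  0.00,  0.000,  0.32],   # d13
])
labels = ["l1", "l1", "l2", "l2"]    # class label of each signal (column)

x1 = X[:, 0]     # column view: the sparse code of signal y1
x_d5 = X[4, :]   # row view: how atom d5 is used by y1..y4
```

The row view is the one the probabilistic model relies on: each atom is described by how it is used across the training signals.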

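Row view: $x_{d_i}$ is the $i$-th row of $X$, i.e. the coefficients of atom $d_i$ over the signals $y_1, \dots, y_4$ (class labels $l_1, l_1, l_2, l_2$).

         x1      x2      x3      x4
         (l1)    (l1)    (l2)    (l2)
xd1      0.43    0       0       0
xd2      0.63    0       0       0
xd3      0       0.64    0       0
xd4      0       0.53    0       0
xd5     -0.33   -0.40    0      -0.42
…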

A Gaussian Process

• Covariance function entry: $K(i,j) = \mathrm{cov}(x_{d_i}, x_{d_j})$
• $P(X_{d^*} \mid X_{D^*})$ is a Gaussian with a closed-form conditional variance
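A minimal sketch of the two bullets above, assuming the covariance is estimated as the sample covariance of the five rows shown on the slide and that the conditional variance is the standard Gaussian expression $K(d,d) - K(d,S)\,K(S,S)^{-1}K(S,d)$. The helper name `conditional_variance` and the small `jitter` term are assumptions added for illustration and numerical stability.

```python
import numpy as np

# Rows x_d1..x_d5 as shown on the slide (columns are the signals y1..y4).
X_rows = np.array([
    [ 0.43,  0.00, 0.0,  0.00],
    [ 0.63,  0.00, 0.0,  0.00],
    [ 0.00,  0.64, 0.0,  0.00],
    [ 0.00,  0.53, 0.0,  0.00],
    [-0.33, -0.40, 0.0, -0.42],
])

# K[i, j] = cov(x_di, x_dj): np.cov treats each row as one variable.
K = np.cov(X_rows)

def conditional_variance(K, i, S, jitter=1e-6):
    """Variance of atom i given the atoms in S, for jointly Gaussian atoms."""
    if len(S) == 0:
        return K[i, i]
    # The jitter keeps K_SS invertible when rows are collinear (an assumption).
    K_SS = K[np.ix_(S, S)] + jitter * np.eye(len(S))
    k_iS = K[i, S]
    return K[i, i] - k_iS @ np.linalg.solve(K_SS, k_iS)

var_d5 = conditional_variance(K, 4, [])
var_d5_given_d1_d3 = conditional_variance(K, 4, [0, 2])
```

Conditioning never increases the variance, so `var_d5_given_d1_d3` is at most `var_d5`. With only four signals the covariance is rank deficient, which is why the jitter is needed in this toy setting.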

Rows $x_{d_1}, \dots, x_{d_5}$ of $X$, with the class label of each signal:

         x1      x2      x3      x4
         (l1)    (l1)    (l2)    (l2)
xd1      0.43    0       0       0
xd2      0.63    0       0       0
xd3      0       0.64    0       0
xd4      0       0.53    0       0
xd5     -0.33   -0.40    0      -0.42
…

Dictionary Class Distribution

• $P(L \mid d_i)$, $L \in [1, M]$
• Aggregate $|x_{d_i}|$ based on class labels to obtain an $M$-sized vector
• $P(L = l_1 \mid d_5) = (0.33 + 0.40)/(0.33 + 0.40 + 0.42) = 0.6348$
• $P(L = l_2 \mid d_5) = (0 + 0.42)/(0.33 + 0.40 + 0.42) = 0.3652$
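The aggregation described above can be sketched in a few lines. The function name `class_distribution` and the 0-based integer encoding of the labels ($l_1 \to 0$, $l_2 \to 1$) are assumptions made for this example.

```python
import numpy as np

def class_distribution(row, labels, num_classes):
    """P(L | d_i): absolute coefficients of one atom summed per class, then normalized."""
    mass = np.bincount(labels, weights=np.abs(row), minlength=num_classes)
    return mass / mass.sum()

x_d5 = np.array([-0.33, -0.40, 0.0, -0.42])   # row of atom d5 from the slide
labels = np.array([0, 0, 1, 1])                # l1, l1, l2, l2

p = class_distribution(x_d5, labels, num_classes=2)
print(np.round(p, 4))   # [0.6348 0.3652]
```

An atom that is never used has an all-zero row, so a real implementation would need to guard the normalization against division by zero.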

Dictionary Learning Approaches

• Maximization of Joint Entropy (ME)
• Maximization of Mutual Information (MMI)
  - Unsupervised Learning (MMI-1)
  - Supervised Learning (MMI-2)

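Maximization of Joint Entropy (ME)

- Initialize the dictionary $D^o$ using K-SVD.
- Start with $D^* = \emptyset$.
- Until $|D^*| = k$, iteratively choose $d^*$ from $D^o \setminus D^*$:

$$d^* = \arg\max_{d} H(d \mid D^*)$$

- This is a good approximation to the ME criterion, whose maximizer is the ME dictionary:

$$\arg\max_{D} H(D)$$

where the conditional entropy follows from the closed-form conditional variance of the Gaussian process.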
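The greedy ME selection can be sketched as follows, assuming the atoms are jointly Gaussian with covariance $K$, so that $H(d \mid D^*) = \tfrac{1}{2}\log(2\pi e\,\sigma^2_{d \mid D^*})$ (the standard entropy of a Gaussian). The random codes, the helper names, and the jitter are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def conditional_variance(K, i, S, jitter=1e-6):
    """Gaussian conditional variance of atom i given the atoms in S."""
    if len(S) == 0:
        return K[i, i]
    K_SS = K[np.ix_(S, S)] + jitter * np.eye(len(S))
    k_iS = K[i, S]
    return K[i, i] - k_iS @ np.linalg.solve(K_SS, k_iS)

def gaussian_entropy(var):
    """Entropy of a 1-D Gaussian; the floor avoids log(0)."""
    return 0.5 * np.log(2 * np.pi * np.e * max(var, 1e-12))

def select_me(K, k):
    """Greedy ME: repeatedly add the atom with the largest H(d | D*)."""
    selected = []
    while len(selected) < k:
        candidates = [d for d in range(K.shape[0]) if d not in selected]
        best = max(candidates,
                   key=lambda d: gaussian_entropy(conditional_variance(K, d, selected)))
        selected.append(best)
    return selected

# Hypothetical sparse codes: 8 atoms (rows) used by 50 signals (columns).
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 50))
K = np.cov(X)

D_star = select_me(K, 3)
```

Since $D^*$ starts empty, the first atom picked is simply the one with the largest variance; later picks favor atoms that the current selection predicts poorly.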

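Maximization of Mutual Information for Unsupervised Learning (MMI-1)

- Initialize the dictionary $D^o$ using K-SVD.
- Greedy selection; the first term rewards diversity and the second coverage:

$$d^* = \arg\max_{d}\ \underbrace{H(d \mid D^*)}_{\text{diversity}} - \underbrace{H\big(d \mid D^o \setminus (D^* \cup d)\big)}_{\text{coverage}}$$

- The selection rule has a closed form under the Gaussian process model.
- A near-optimal approximation to MMI, within $(1 - 1/e)$ of the optimum; the MMI dictionary is

$$\arg\max_{D} I(D;\ D^o \setminus D)$$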
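A sketch of the MMI-1 greedy rule under the same Gaussian assumption: each candidate is scored by $H(d \mid D^*) - H(d \mid D^o \setminus (D^* \cup d))$, which for Gaussians equals half the log-ratio of the two conditional variances. The random codes, helper names, and jitter are assumptions for illustration only.

```python
import numpy as np

def conditional_variance(K, i, S, jitter=1e-6):
    """Gaussian conditional variance of atom i given the atoms in S."""
    if len(S) == 0:
        return K[i, i]
    K_SS = K[np.ix_(S, S)] + jitter * np.eye(len(S))
    k_iS = K[i, S]
    return K[i, i] - k_iS @ np.linalg.solve(K_SS, k_iS)

def gaussian_entropy(var):
    """Entropy of a 1-D Gaussian; the floor avoids log(0)."""
    return 0.5 * np.log(2 * np.pi * np.e * max(var, 1e-12))

def select_mmi(K, k):
    """Greedy MMI-1: maximize H(d | D*) - H(d | Do minus (D* and d))."""
    n = K.shape[0]
    selected = []
    while len(selected) < k:
        best, best_score = None, -np.inf
        for d in range(n):
            if d in selected:
                continue
            # Atoms that are neither selected nor the candidate itself.
            rest = [j for j in range(n) if j != d and j not in selected]
            h_given_selected = gaussian_entropy(conditional_variance(K, d, selected))
            h_given_rest = gaussian_entropy(conditional_variance(K, d, rest))
            score = h_given_selected - h_given_rest
            if score > best_score:
                best, best_score = d, score
        selected.append(best)
    return selected

# Hypothetical sparse codes: 8 atoms (rows) used by 50 signals (columns).
rng = np.random.default_rng(0)
K = np.cov(rng.standard_normal((8, 50)))

D_star = select_mmi(K, 3)
```

An atom scores high when the current selection explains it poorly (first term large) while the remaining atoms explain it well (second term small), which matches the diversity and coverage reading of the two terms.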

Revisit

Dictionary Class Distribution
• $P(L \mid d_i)$, $L \in [1, M]$
• Aggregate $|x_{d_i}|$ based on class labels to obtain an $M$-sized vector
• $P(l_1 \mid d_5) = (0.33 + 0.40)/(0.33 + 0.40 + 0.42) = 0.6348$
• $P(l_2 \mid d_5) = (0 + 0.42)/(0.33 + 0.40 + 0.42) = 0.3652$
• $P(L_d) = P(L \mid d)$
• $P(L_D) = P(L \mid D)$

         x1      x2      x3      x4
         (l1)    (l1)    (l2)    (l2)
xd1      0.43    0       0       0
xd2      0.63    0       0       0
xd3      0       0.64    0       0
xd4      0       0.53    0       0
xd5     -0.33   -0.40    0      -0.42
…

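Maximization of Mutual Information for Supervised Learning (MMI-2)

- Initialize the dictionary $D^o$ using K-SVD.
- Greedy selection:

$$d^* = \arg\max_{d}\ \big[H(d \mid D^*) - H(d \mid D^o \setminus (D^* \cup d))\big] + \lambda \big[H(L_d \mid L_{D^*}) - H(L_d \mid L_{D^o \setminus (D^* \cup d)})\big]$$

- MMI-1 is a special case of MMI-2 with $\lambda = 0$.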
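The special case stated in the last bullet can be checked directly: with $\lambda = 0$ the label term drops out and only the two appearance terms remain. This only restates the selection rule in the slide's own notation.

```latex
\begin{aligned}
d^* &= \arg\max_{d}\ \big[H(d \mid D^*) - H(d \mid D^o \setminus (D^* \cup d))\big]
      + 0 \cdot \big[H(L_d \mid L_{D^*}) - H(L_d \mid L_{D^o \setminus (D^* \cup d)})\big] \\
    &= \arg\max_{d}\ H(d \mid D^*) - H(d \mid D^o \setminus (D^* \cup d))
\end{aligned}
```

The second line is the MMI-1 selection rule, so $\lambda$ controls how much the class labels influence which atoms are kept.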

Keck gesture dataset


Representation Consistency

[1] J. Liu and M. Shah, "Learning Human Actions via Information Maximization", CVPR 2008

Recognition Accuracy


The recognition accuracy using the initial dictionary $D^o$: (a) 0.23 (b) 0.42 (c) 0.71

Recognizing Realistic Actions


• 150 broadcast sports videos.
• 10 different actions.
• Average recognition rate: 83.6%.
• Best reported result: 86.6%.
