
Unsupervised Learning of Human Actions Using Spatial-Temporal Words

Juan Carlos Niebles¹,², Hongcheng Wang¹, Li Fei-Fei¹
¹University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
²Universidad del Norte, Barranquilla, Colombia

References:
[1] J.C. Niebles, H. Wang, L. Fei-Fei. Unsupervised Learning of Human Actions Using Spatial-Temporal Words. Submitted, 2006.
[2] T. Hofmann. Probabilistic Latent Semantic Indexing. In SIGIR, 1999.

Feature representation
Feature description:
• Histogram of brightness gradients
Obtaining codebook:
• K-means clustering of video word descriptors
Representation:
• Histogram of video words from the codebook
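A minimal sketch of these two steps, assuming each interest point already comes with a flattened gradient-histogram descriptor (one row per point); the function names and codebook size below are illustrative, not taken from the authors' code:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptors, n_words=500):
    """Cluster descriptors with K-means; the centroids act as video words.
    `n_words` is an assumed codebook size, not the paper's setting."""
    return KMeans(n_clusters=n_words, n_init=10).fit(descriptors)

def bag_of_words(codebook, descriptors):
    """Histogram of video-word assignments for one video sequence."""
    words = codebook.predict(descriptors)
    return np.bincount(words, minlength=codebook.n_clusters)
```

Each sequence is then summarized by a single fixed-length histogram, regardless of its duration or the number of detected interest points.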

Summary
Problem statement: identifying and localizing different human actions in video sequences with moving backgrounds and a moving camera.
Contributions:
• Unsupervised learning of actions using a "bag of video words" representation
• Multiple-action localization and categorization in a single video
• Best reported performance on a standard dataset

Training data
• KTH human motion data (6 classes)
• SFU figure skating data (3 classes)

Algorithm
Learning: input video sequences → feature extraction and description → form codebook → represent each sequence as a bag of video words → learn a model for each class (Model 1, …, Model N).
Recognition: new sequence → feature extraction and description → represent it as a bag of video words → decide on the best model among the learned class models (Class 1, …, Class N).

Feature extraction
• Separable linear filters (2D Gaussian + 1D Gabor filters)
• A small video cube is extracted around each interest point
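A hedged sketch of this detector (the separable-filter response of Dollár et al., which the paper adopts): 2D Gaussian smoothing in space combined with a quadrature pair of 1D Gabor filters in time; interest points are local maxima of the response R. The parameter values here are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, convolve1d

def response(video, sigma=2.0, tau=1.5):
    """video: (T, H, W) grayscale array. Returns the response volume R."""
    t = np.arange(-2 * tau, 2 * tau + 1)
    omega = 4.0 / tau  # common choice tying the Gabor frequency to tau
    h_ev = -np.cos(2 * np.pi * t * omega) * np.exp(-t**2 / tau**2)
    h_od = -np.sin(2 * np.pi * t * omega) * np.exp(-t**2 / tau**2)
    # 2D Gaussian smoothing on each frame (axes 1 and 2 are H and W).
    smoothed = gaussian_filter(video.astype(float), sigma=(0, sigma, sigma))
    # Quadrature pair of 1D temporal Gabor filters along axis 0 (time).
    even = convolve1d(smoothed, h_ev, axis=0)
    odd = convolve1d(smoothed, h_od, axis=0)
    return even**2 + odd**2
```

A small spatio-temporal cube is then cut out around each local maximum of R and described by its brightness gradients.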

Long and complex videos

3-class model: Results on a long video sequence

6-class model: Results on a complex natural video

Exp II: 3 action classes of the figure skating dataset
Action classes: stand-spin, camel-spin, sit-spin

Only the words assigned to the most likely action category are shown.

Classification

Given a new video and the learned model, we can classify the video as belonging to one of the action categories.

$$P(w \mid d_{\text{test}}) = \sum_{k=1}^{K} P(z_k \mid d_{\text{test}})\, P(w \mid z_k)$$

$$\text{action category} = \arg\max_k \, P(z_k \mid d_{\text{test}})$$
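A minimal sketch of this classification step via pLSA "fold-in": the learned word-topic matrix $P(w \mid z)$ is kept fixed while $P(z \mid d_{\text{test}})$ is fitted to the test histogram by EM, and the category with the largest weight wins. The matrix `p_w_z` and all names are illustrative:

```python
import numpy as np

def classify(hist, p_w_z, n_iter=50):
    """hist: (W,) word counts of the test video; p_w_z: (W, K) learned P(w|z).
    Returns argmax_k P(z_k | d_test)."""
    n_words, n_topics = p_w_z.shape
    p_z_d = np.full(n_topics, 1.0 / n_topics)  # uniform init of P(z|d_test)
    for _ in range(n_iter):
        # E-step: P(z|d,w) proportional to P(w|z) P(z|d).
        joint = p_w_z * p_z_d                                  # (W, K)
        post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate P(z|d_test) from expected word counts.
        p_z_d = (hist[:, None] * post).sum(axis=0)
        p_z_d /= p_z_d.sum()
    return int(np.argmax(p_z_d))
```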

Localization

Given a new video, we can localize multiple motions:

1. Select topics with high $P(z_k \mid d_{\text{test}})$
2. Assign words to topics according to $P(w \mid z_k)$
3. Cluster words from the selected topics according to their spatial position
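A sketch of these three steps under stated assumptions: each detected word carries an (x, y) position, words are assigned to their most likely selected topic, and plain K-means stands in for the spatial grouping; the threshold and group count are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def localize(word_ids, positions, p_z_d, p_w_z, topic_thresh=0.2, n_groups=2):
    """word_ids: (N,) codebook indices; positions: (N, 2) word locations."""
    selected = np.where(p_z_d > topic_thresh)[0]   # topics with high P(z|d_test)
    if selected.size == 0:
        return {}
    scores = p_w_z[word_ids][:, selected]          # P(w|z) per word, selected topics
    topic_of_word = selected[np.argmax(scores, axis=1)]
    centers = {}
    for z in selected:
        pts = positions[topic_of_word == z]
        if len(pts) >= n_groups:                   # cluster words spatially per topic
            centers[z] = KMeans(n_clusters=n_groups).fit(pts).cluster_centers_
    return centers
```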

Exp I: 6 action classes of the KTH dataset

Testing with the Caltech dataset

Performance:

Action classes: walking, running, jogging, boxing, handclapping, handwaving

Words are colored according to their most likely action category.

Only the words assigned to the most likely action category are shown.

Method          Recognition accuracy (%)
Ke et al.       62.96
Schuldt et al.  71.72
Dollar et al.   81.17
Our method      81.50

• Best performance
• Multiple actions
• Unlabeled training

Model

We deploy a pLSA (probabilistic Latent Semantic Analysis) model [2] for video analysis.

K = number of action categories

$$P(d_i, w_j) = P(d_i)\, P(w_j \mid d_i)$$

$$P(w_j \mid d_i) = \sum_{k=1}^{K} P(z_k \mid d_i)\, P(w_j \mid z_k)$$

where $P(w_j \mid z_k)$ are the action category vectors and $P(z_k \mid d_i)$ are the action category weights.

d: input video, z: action category, w: video word
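A minimal pLSA training sketch for this model: given a word-by-document count matrix built from the training histograms, EM alternates between the posterior $P(z \mid d, w)$ and the parameters $P(w \mid z)$ and $P(z \mid d)$. This is a generic pLSA implementation written for illustration, not the authors' code:

```python
import numpy as np

def plsa(counts, K, n_iter=100, seed=0):
    """counts: (W, D) word counts over training videos.
    Returns P(w|z) of shape (W, K) and P(z|d) of shape (K, D)."""
    rng = np.random.default_rng(seed)
    W, D = counts.shape
    p_w_z = rng.random((W, K)); p_w_z /= p_w_z.sum(axis=0)
    p_z_d = rng.random((K, D)); p_z_d /= p_z_d.sum(axis=0)
    for _ in range(n_iter):
        # E-step: P(z|d,w) proportional to P(w|z) P(z|d) for every (w, d).
        post = p_w_z[:, :, None] * p_z_d[None, :, :]        # (W, K, D)
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # M-step: accumulate expected counts n(w, d) * P(z|d,w).
        nz = counts[:, None, :] * post                      # (W, K, D)
        p_w_z = nz.sum(axis=2); p_w_z /= p_w_z.sum(axis=0)
        p_z_d = nz.sum(axis=0); p_z_d /= p_z_d.sum(axis=0)
    return p_w_z, p_z_d
```

Here K is set to the number of action categories, so each learned topic can be read off directly as an action class.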
