Beyond Actions: Discriminative Models for Contextual Group
Activities
Tian Lan, School of Computing Science
Simon Fraser University, August 12, 2010
M.Sc. Thesis Defense
Outline
• Introduction
• Group Activity Recognition with Context
– Structure-level (latent structures)
– Feature-level (Action Context descriptor)
• Experiments
Activity Recognition
• Goal: enable computers to analyze and understand human behavior.
[Examples: answering a phone; kissing]

Action vs. Activity
• Activity: a group of people forming a queue
• Action: stand in a queue facing left
Activity Recognition
• Activity recognition is important: surveillance, entertainment, sport, HCI.
• Activity recognition is difficult: intra-class variation, background clutter, partial occlusion, etc.
Group Activity Recognition
• Motivation: human actions are rarely performed in isolation; the actions of individuals in a group can serve as context for each other.
• Goal: explore the benefit of contextual information in group activity recognition in challenging real-world applications.
Group Activity Recognition
• Two types of context:
– group-person interaction
– person-person interaction
Latent Structured Model
[Figure: graphical model — per-person image features x1…xn plus a global image feature x0 at the bottom, a hidden layer of action-class labels h1…hn, and the activity-class label y at the top.]
Latent Structured Model
• group-person interaction
• person-person interaction
Context is incorporated at two levels: structure-level and feature-level.
Difference from Previous Work
• Group Activity Recognition

Previous work
• Single-person action recognition (Schuldt et al., ICPR 04)
• Relatively simple activity recognition (Vaswani et al., CVPR 03)
• Datasets recorded in controlled conditions

Our work
• Group activity recognition in realistic videos
• Two new types of contextual information
• A unified framework
Difference from Previous Work
• Latent Structured Models

Previous work
• A pre-defined structure for the hidden layer, e.g. a tree (HCRF) (Quattoni et al., PAMI 07; Felzenszwalb et al., CVPR 08)

Our work
• A latent structure for the hidden layer, inferred automatically during learning and inference
Outline
• Introduction
• Group Activity Recognition with Context
– Structure-level (latent structures)
– Feature-level (Action Context descriptor)
• Experiments
Structure-level Approach
• person-person interaction modeled at the structure level
[Figure: graphical model — features x0, x1…xn, hidden action labels h1…hn, activity label y, with links among the hidden action nodes.]
Model Formulation
• Input: image-label triple (x, h, y) — image features x = (x0, x1, …, xn), per-person action labels h, and activity label y
• Four potential types: image-activity, image-action, action-activity, action-action
[Figure: graphical model annotated with the four potential types.]
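The formulation above can be sketched as a linear scoring function that sums the four potential types. This is an illustrative simplification, not the thesis's exact parameterization: the weight names are hypothetical, and the potentials are reduced to plain linear terms over label-indexed weights.

```python
import numpy as np

def model_score(x0, xs, h, y, w, edges):
    """Score F(x, h, y; w) of a latent structured model -- illustrative sketch.

    x0    : global image feature vector
    xs    : list of per-person feature vectors x_1..x_n
    h     : list of per-person action labels h_1..h_n (ints)
    y     : activity label (int)
    w     : dict of weight arrays (names here are hypothetical)
    edges : pairs (i, j) giving the hidden-layer structure
    """
    score = w["img_activity"][y] @ x0                  # image-activity potential
    score += sum(w["img_action"][h[i]] @ xs[i]         # image-action potentials
                 for i in range(len(xs)))
    score += sum(w["action_activity"][h[i], y]         # action-activity potentials
                 for i in range(len(xs)))
    score += sum(w["action_action"][h[i], h[j]]        # action-action (pairwise)
                 for i, j in edges)
    return float(score)
```

At test time, inference would maximize this score jointly over the activity label y, the hidden action labels h, and (in the structure-level approach) the edge set itself.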
Feature-level Approach
• person-person interaction modeled at the feature level
• Model: each person is represented by an Action Context descriptor
[Figure: graphical model — x0, x1…xn, hidden actions h1…hn, activity y, with Action Context descriptors at the feature layer.]
Action Context Descriptor
Pipeline: each person is described by a feature descriptor (e.g. HOG by Dalal & Triggs) and scored by a multi-class SVM, yielding a score per action class. The descriptor concatenates the focal person's action-class scores with the max over the action-class scores of the people in the focal person's context.
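The concatenate-and-max-pool step above can be sketched in a few lines; this is a simplified version (a single context region, hypothetical function name) of the descriptor the slide describes.

```python
import numpy as np

def action_context(focal_scores, context_scores):
    """Action Context (AC) descriptor -- simplified sketch.

    focal_scores   : (K,) multi-class SVM scores for the focal person,
                     one per action class
    context_scores : (m, K) scores for the m people in the focal
                     person's spatio-temporal context
    Returns the focal person's scores concatenated with the per-class
    max over the context people.
    """
    context_scores = np.atleast_2d(context_scores)
    ctx = context_scores.max(axis=0)  # max-pool each action class over neighbours
    return np.concatenate([focal_scores, ctx])
```

The max-pooling makes the context part invariant to how many people happen to be nearby, so descriptors from crowded and sparse scenes have the same length.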
Outline
• Introduction
• Group Activity Recognition with Context
– Structure-level (latent structures)
– Feature-level (Action Context descriptor)
• Experiments
Dataset
• Collective Activity Dataset (Choi et al., VS 09)
• 5 action categories (per person): crossing, waiting, queuing, walking, talking
• 44 video clips
Dataset
• Nursing Home Dataset
• 2 activity categories (per image): fall, non-fall
• 5 action categories (per person): walking, standing, sitting, bending, falling
• 22 video clips in total (2990 frames); 8 clips for testing, the rest for training. 1/3 are labeled as fall.
Baselines (structure-level approach, hidden layer)
• root (x0) + SVM (no structure)
• no connection between hidden nodes
• minimum spanning tree
• complete graph within radius r
[Figure: the four candidate hidden-layer structures over nodes h1…h4.]
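The last three baselines differ only in the edge set placed over the hidden action nodes. A small sketch of how two of those edge sets could be built from 2D person positions (helper names are hypothetical; distances are plain Euclidean):

```python
import numpy as np
from itertools import combinations

def complete_within_r(pos, r):
    """Edge between every pair of people whose distance is at most r."""
    return [(i, j) for i, j in combinations(range(len(pos)), 2)
            if np.linalg.norm(pos[i] - pos[j]) <= r]

def min_spanning_tree(pos):
    """Minimum spanning tree over Euclidean distances (Prim's algorithm)."""
    n = len(pos)
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        # cheapest edge from the tree to a node not yet in it
        i, j = min(((i, j) for i in in_tree
                    for j in range(n) if j not in in_tree),
                   key=lambda e: np.linalg.norm(pos[e[0]] - pos[e[1]]))
        edges.append((i, j))
        in_tree.add(j)
    return edges
```

The "no connection" baseline is simply the empty edge list, and the latent-structure model of the thesis goes one step further by searching over edge sets during learning and inference rather than fixing one of these in advance.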
System Overview
Pipeline: video → person detector → person descriptor → model.
• Person detection: pedestrian detector (Felzenszwalb et al.), or background subtraction
• Person descriptor: HOG (Dalal & Triggs), or LST (Loy et al., CVPR 09)
Conclusion
• A discriminative model for group activity recognition with context
• Two new types of contextual information:
– group-person interaction
– person-person interaction
• Structure-level: latent structure; Feature-level: Action Context descriptor
• Experimental results demonstrate the effectiveness of the proposed model
Future Work
• Modeling Complex Structures
– Temporal dependencies among actions
• Contextual Feature Descriptors
– How to encode discriminative context?
• Weakly Supervised Learning
– e.g. multiple instance learning for fall detection
Person Detectors
• Collective Activity Dataset: pedestrian detector (Felzenszwalb et al., CVPR 08)
• Nursing Home Dataset: background subtraction on the video to obtain moving regions
Person Descriptors
• Collective Activity Dataset: HOG
• Nursing Home Dataset: Local Spatial-Temporal (LST) descriptor (Loy et al., ICCV 09)
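The HOG descriptor used above bins gradient orientations into weighted per-cell histograms. A minimal numpy sketch of that core idea (omitting the block normalization and other refinements of the full Dalal & Triggs descriptor):

```python
import numpy as np

def hog_like(img, cell=8, bins=9):
    """Simplified HOG-style descriptor: per-cell orientation histograms.

    img : 2D grayscale array. Returns one L2-normalized feature vector.
    """
    gy, gx = np.gradient(img.astype(float))        # image gradients
    mag = np.hypot(gx, gy)                         # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180     # unsigned orientation
    H, W = img.shape
    feats = []
    for y in range(0, H - cell + 1, cell):
        for x in range(0, W - cell + 1, cell):
            a = ang[y:y + cell, x:x + cell].ravel()
            m = mag[y:y + cell, x:x + cell].ravel()
            # magnitude-weighted orientation histogram for this cell
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist)
    v = np.concatenate(feats)
    return v / (np.linalg.norm(v) + 1e-6)
```

For a 32×32 person crop with 8×8 cells and 9 orientation bins this yields a 144-dimensional vector; the real descriptor additionally normalizes overlapping blocks of cells, which is what makes it robust to local contrast changes.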