![Page 1: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/1.jpg)
Recognizing Human Figures and Actions
Greg Mori
Simon Fraser University
![Page 2: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/2.jpg)
Goal
• Action recognition
  – Where are the people?
  – What are they doing?
• Applications
  – Image understanding, image retrieval and search
  – HCI
  – Surveillance
  – Computer graphics
![Page 3: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/3.jpg)
Far field
• 3-pixel man
• Blob tracking
Medium field
• 30-pixel man
• Coarse-level actions
Near field
• 300-pixel man
• Find and track limbs
![Page 4: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/4.jpg)
Outline
• Human figures in motion
  – Action recognition
• Localizing joint positions
  – Exemplar-based approach
  – Parts-based approach
• Motion synthesis
  – Novel graphics application
![Page 5: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/5.jpg)
Appearance vs. Motion
Jackson Pollock, Number 21 (detail)
![Page 6: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/6.jpg)
Action Recognition
• Recognize human actions at a distance
  – Low resolution, noisy data
  – Moving camera, occlusions
  – Wide range of actions (including non-periodic)
![Page 7: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/7.jpg)
Our Approach
• Motion-based approach
  – Classify a novel motion by finding the most similar motion from the training set
  – Use large amounts of data (“non-parametric”)
• Related work
  – Periodicity analysis
    • Polana & Nelson; Seitz & Dyer; Bobick et al.; Cutler & Davis; Collins et al.
  – Model-free
    • Temporal templates [Bobick & Davis]
    • Orientation histograms [Freeman et al.; Zelnik & Irani]
    • Using MoCap data [Zhao & Nevatia; Ramanan & Forsyth]
![Page 8: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/8.jpg)
Gathering action data
• Tracking
  – Simple correlation-based tracker
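A minimal sketch of what a correlation-based tracker could look like, assuming grayscale frames stored as nested lists and a fixed template of the figure. The function names (`patch`, `ncc`, `track`), the search radius, and the choice of normalized cross-correlation as the score are illustrative assumptions; the slides do not specify the tracker's details.

```python
import math


def patch(frame, top, left, h, w):
    """Flatten the h-by-w window of `frame` whose top-left corner is (top, left)."""
    return [v for row in frame[top:top + h] for v in row[left:left + w]]


def ncc(a, b):
    """Normalized cross-correlation of two equal-length vectors (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def track(frame, template, prev_top, prev_left, search=2):
    """Return the (top, left) position near the previous one that best matches the template."""
    h, w = len(template), len(template[0])
    tmpl = [v for row in template for v in row]
    best_score, best_pos = float("-inf"), (prev_top, prev_left)
    # Scan a small window around the previous position, clipped to the frame.
    for top in range(max(0, prev_top - search), min(len(frame) - h, prev_top + search) + 1):
        for left in range(max(0, prev_left - search), min(len(frame[0]) - w, prev_left + search) + 1):
            score = ncc(tmpl, patch(frame, top, left, h, w))
            if score > best_score:
                best_score, best_pos = score, (top, left)
    return best_pos
```

Calling `track` once per frame, starting from the previous frame's position, gives a figure-centred window for each frame of the sequence.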
![Page 9: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/9.jpg)
Figure-centric Representation
• Stabilized spatio-temporal volume
  – No translation information
  – All motion caused by person’s limbs
    • Good news: indifferent to camera motion
    • Bad news: hard!
• Good test to see if actions, not just translation, are being captured
![Page 10: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/10.jpg)
Remembrance of Things Past
• “Explain” novel motion sequence by matching to previously seen video clips
  – For each frame, match based on some temporal extent
Challenge: how to compare motions?
[Figure: an input sequence matched against a database of labeled clips: run, walk left, swing, walk right, jog]
![Page 11: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/11.jpg)
How to describe motion?
• Appearance
  – Not preserved across different clothing
• Gradients (spatial, temporal)
  – Same problem (e.g. contrast reversal)
• Edges
  – Unreliable at this scale
• Optical flow
  – Explicitly encodes motion
  – Least affected by appearance
  – …but too noisy
![Page 12: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/12.jpg)
Spatial Motion Descriptor
Image frame → optical flow $F_{x,y}$ → components $F_x, F_y$ → half-wave rectified channels $F_x^+, F_x^-, F_y^+, F_y^-$ → blurred $F_x^+, F_x^-, F_y^+, F_y^-$
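The pipeline above can be sketched as follows, assuming the optical flow has already been computed and is given as two nested lists (horizontal and vertical components). The function names are hypothetical, and the box filter is only a stand-in for the blur, whose kernel the slide does not state.

```python
def half_wave_rectify(flow_x, flow_y):
    """Split a flow field into four non-negative channels: Fx+, Fx-, Fy+, Fy-."""
    def positive(channel):
        return [[max(v, 0.0) for v in row] for row in channel]

    def negative(channel):
        return [[max(-v, 0.0) for v in row] for row in channel]

    return [positive(flow_x), negative(flow_x), positive(flow_y), negative(flow_y)]


def box_blur(channel, radius=1):
    """Average each pixel with its neighbours within `radius`, clipped at the borders."""
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += channel[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def spatial_motion_descriptor(flow_x, flow_y, radius=1):
    """Return the four blurred, half-wave rectified flow channels for one frame."""
    return [box_blur(c, radius) for c in half_wave_rectify(flow_x, flow_y)]
```

Rectifying before blurring keeps opposing motions in separate channels, so leftward and rightward flow do not cancel each other when they are averaged.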
![Page 13: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/13.jpg)
Spatio-temporal Motion Descriptor
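[Figure: sequences A and B over time t. The frame-to-frame similarity matrix between A and B is combined with an identity-matrix kernel (“I matrix”, or a “blurry I”) of size w × w, where w is the temporal window, to give the motion-to-motion similarity matrix.]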
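A rough sketch of how a frame-to-frame similarity matrix might be turned into a motion-to-motion one, assuming each frame is already summarized by a flat descriptor vector. The names, the dot-product similarity, and the unweighted identity kernel are assumptions made for illustration; a “blurry I” kernel would additionally weight near-diagonal entries.

```python
def frame_similarity(a_frames, b_frames):
    """Frame-to-frame similarity: dot product of every frame of A with every frame of B."""
    return [[sum(p * q for p, q in zip(a, b)) for b in b_frames] for a in a_frames]


def motion_similarity(frame_sim, half_window):
    """Sum frame similarities along the diagonal over a window of 2*half_window + 1 frames."""
    n, m = len(frame_sim), len(frame_sim[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for k in range(-half_window, half_window + 1):
                # Frames outside either sequence contribute nothing.
                if 0 <= i + k < n and 0 <= j + k < m:
                    out[i][j] += frame_sim[i + k][j + k]
    return out
```

Summing along the diagonal rewards pairs of frames whose temporal neighbours also match, which is what makes the score a comparison of motions rather than of single frames.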
![Page 14: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/14.jpg)
Soccer
• Real actions, moving camera, poor video
• 8 classes of actions
• 4500 frames of labeled data
• 1-nearest-neighbor classifier
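For illustration, a 1-nearest-neighbor classifier over labeled motion descriptors can be written in a few lines. The names `classify_1nn` and `dot`, and the use of a dot product as the similarity, are assumptions for this sketch rather than details from the slides.

```python
def dot(a, b):
    """Similarity between two flat descriptor vectors (higher means more similar)."""
    return sum(p * q for p, q in zip(a, b))


def classify_1nn(query, training, similarity=dot):
    """Return the label of the training descriptor most similar to `query`.

    `training` is a list of (descriptor, label) pairs.
    """
    best_label, best_score = None, float("-inf")
    for descriptor, label in training:
        score = similarity(query, descriptor)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [([1.0, 0.0], "run"), ([0.0, 1.0], "walk")]
label = classify_1nn([0.9, 0.2], training)
```

No model is fitted here: all of the work is in the similarity measure and the size of the labeled database.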
![Page 15: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/15.jpg)
Classifying Ballet Actions
16 actions; 24800 total frames; 51-frame motion descriptor. Men used to classify women and vice versa.
![Page 16: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/16.jpg)
Classifying Tennis Actions
6 actions; 4600 frames; 7-frame motion descriptor.
Woman player used for training, man for testing.
![Page 17: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/17.jpg)
Classifying Tennis
• Red bars show classification results
![Page 18: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/18.jpg)
Outline
• Human figures in motion
  – Action recognition
• Localizing joint positions
  – Exemplar-based approach
  – Parts-based approach
• Motion synthesis
  – Novel graphics application
![Page 19: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/19.jpg)
![Page 20: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/20.jpg)
![Page 21: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/21.jpg)
Human Figures in Still Images
• Detection of humans is possible for stereotypical poses
  – Standing
  – Walking
  – (Viola et al., Poggio et al.)
• But we want to do more
  – Wider variety of poses
  – Localize joint positions
![Page 22: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/22.jpg)
Problem
![Page 23: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/23.jpg)
Shape Matching For Finding People
Database of Exemplars
![Page 24: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/24.jpg)
Shape Contexts
• Deformable template approach
  – Shapes represented as a collection of edge points
• Two stages
  – Fast pruning
    • Quick tests to construct a shortlist of candidate objects
    • Database of known objects could be large
  – Detailed matching
    • Perform computationally expensive comparisons on only the few shapes in the shortlist
• Publications
  – Mori et al., CVPR 2001
  – Mori and Malik, CVPR 2003
    • Featured in New York Times Science section
![Page 25: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/25.jpg)
Results: Tracking by Repeated Finding
![Page 26: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/26.jpg)
Multiple Exemplars
• Parts-based approach
  – Use a combination of keypoints or limbs from different exemplars
  – Reduces the number of exemplars needed
• Compute a matching cost for each limb from every exemplar
• Compute pairwise “consistency” costs for neighbouring limbs
• Use dynamic programming to find best K configurations
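The dynamic programming step can be sketched under two simplifying assumptions that are not from the slides: the limbs form a simple chain (each limb is neighbour only to the next one), and only the single best configuration is returned rather than the best K. The names `match_cost`, `pair_cost`, and `best_configuration` are hypothetical.

```python
def best_configuration(match_cost, pair_cost):
    """Choose one exemplar per limb along a chain, minimizing the total cost.

    match_cost[l][e]    : cost of taking limb l from exemplar e
    pair_cost[l][p][e]  : consistency cost between limb l-1 taken from exemplar p
                          and limb l taken from exemplar e (pair_cost[0] is unused)
    Returns (total_cost, list of chosen exemplar indices, one per limb).
    """
    n_limbs, n_ex = len(match_cost), len(match_cost[0])
    cost = [list(match_cost[0])]   # cost[l][e]: best cost of limbs 0..l ending at e
    back = [[None] * n_ex]         # back[l][e]: exemplar chosen for limb l-1
    for l in range(1, n_limbs):
        row, ptr = [], []
        for e in range(n_ex):
            prev = min(range(n_ex), key=lambda p: cost[l - 1][p] + pair_cost[l][p][e])
            row.append(cost[l - 1][prev] + pair_cost[l][prev][e] + match_cost[l][e])
            ptr.append(prev)
        cost.append(row)
        back.append(ptr)
    # Trace back from the cheapest choice for the last limb.
    e = min(range(n_ex), key=lambda k: cost[-1][k])
    total, choice = cost[-1][e], [e]
    for l in range(n_limbs - 1, 0, -1):
        e = back[l][e]
        choice.append(e)
    choice.reverse()
    return total, choice
```

With L limbs and E exemplars this chain version costs O(L E²) instead of the O(E^L) of exhaustive search, which is why mixing limbs from different exemplars stays tractable.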
![Page 27: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/27.jpg)
Combining Exemplars
![Page 28: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/28.jpg)
Finding People (II): Parts-based Approach
• Bottom-up
• Segmentation as preprocessing
• Detect half-limbs and torsos
• Assemble partial configurations
  – Prune using global constraints
• Extend partial configurations to full human figures
![Page 29: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/29.jpg)
Segmentation for Recognition
• Window-scanning (e.g. face detection)
  – O(N M S)
• Segmentation
  – Support masks for computation of features
  – Efficiency
  – Scalability
  – 600K pixels → 300 superpixels, 50 segments
  – O(N) + O(log(M))
[Figure panels: SUPERPIXELS, SEGMENTS]
![Page 30: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/30.jpg)
Limb/Torso Detectors
• Learn limb and torso detectors from hand-labeled data
• Cues:
  – Contour
    • Average edge strength on boundary
  – Shape
    • Similarity to rectangle
  – Shading
    • x, y gradients, blurred
  – Focus
    • Ratio of high to low frequency energies
![Page 31: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/31.jpg)
Assembling Partial Configurations
• Combinatorial search over sets of limbs and torsos
  – 3 half-limbs plus a torso configurations
• Prune using global constraints
  – Proximity
  – Relative widths
  – Maximum lengths
  – Symmetry in colour
• Complete half-limbs
  – 2 or 3-limbed people
• Sort partial configurations
  – Use limb, torso, and segmentation scores
• Extend final limbs of best configurations
![Page 32: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/32.jpg)
Results
![Page 33: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/33.jpg)
Results
Rank 3
Rank 3
![Page 34: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/34.jpg)
Outline
• Human figures in motion
  – Action recognition
• Localizing joint positions
  – Exemplar-based approach
  – Parts-based approach
• Motion synthesis
  – Novel graphics application
![Page 35: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/35.jpg)
“Do as I Do” Motion Synthesis
• Matching two things:
  – Motion similarity across sequences
  – Appearance similarity within sequence
• Dynamic Programming
input sequence
synthetic sequence
![Page 36: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/36.jpg)
Smoothness for Synthesis
• Motion term: similarity between input and target frames
• Appearance term: appearance similarity within target frames
• For input frames {i}, find the best target frames by maximizing a cost function that combines the two terms
• Optimize using dynamic programming:
  – N frames in input sequence
  – M target frames in database
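One way to picture the optimization is a Viterbi-style dynamic program over the N input frames and M target frames. The sketch below assumes a particular objective, the sum of the motion similarity for each chosen frame plus a weighted appearance similarity between consecutive chosen frames; the names `motion_sim`, `appearance_sim`, and `smooth_weight`, and the exact form of the objective, are assumptions for illustration.

```python
def synthesize(motion_sim, appearance_sim, smooth_weight=1.0):
    """Pick one target frame per input frame by dynamic programming.

    motion_sim[i][t]      : similarity between input frame i and target frame t
    appearance_sim[p][t]  : how well target frame t follows target frame p
    Maximizes sum_i motion_sim[i][t_i]
              + smooth_weight * sum_i appearance_sim[t_(i-1)][t_i].
    Returns the list of chosen target-frame indices.
    """
    n, m = len(motion_sim), len(motion_sim[0])
    score = [list(motion_sim[0])]  # score[i][t]: best total for frames 0..i ending at t
    back = [[None] * m]            # back[i][t]: target frame chosen for input frame i-1
    for i in range(1, n):
        row, ptr = [], []
        for t in range(m):
            prev = max(range(m),
                       key=lambda p: score[i - 1][p] + smooth_weight * appearance_sim[p][t])
            row.append(score[i - 1][prev]
                       + smooth_weight * appearance_sim[prev][t]
                       + motion_sim[i][t])
            ptr.append(prev)
        score.append(row)
        back.append(ptr)
    # Trace back from the best final target frame.
    t = max(range(m), key=lambda k: score[-1][k])
    path = [t]
    for i in range(n - 1, 0, -1):
        t = back[i][t]
        path.append(t)
    path.reverse()
    return path
```

As written, this sketch runs in O(N M²) time. Setting `smooth_weight` to 0 reduces it to picking the best-matching target frame for each input frame independently, which tends to produce visible jumps between consecutive output frames.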
![Page 37: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/37.jpg)
“Do as I Do” Synthesis
[Video panels: Target Frames, Input Sequence, Result; 3400 Frames]
![Page 38: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/38.jpg)
“Do as I Say” Synthesis
• Synthesize given action labels
  – e.g. video game control
[Figure: a sequence of action labels (run, walk left, swing, walk right, jog) drives the synthetic sequence]
![Page 39: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/39.jpg)
“Do as I Say”
• Red box shows when constraint is applied
![Page 40: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/40.jpg)
Putting It All Together
• Can we do a better job of splicing clips together?
[Figure: Frame 9, Frame 9½, Frame 10]
YES… if we can find the joints!
![Page 41: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/41.jpg)
Morphed Transitions
![Page 42: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/42.jpg)
8 Transitions
![Page 43: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/43.jpg)
Morphed Transitions
![Page 44: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/44.jpg)
3 Transitions
![Page 45: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/45.jpg)
Actor Replacement
• Rendering new character into existing footage
• Algorithm
  – Track original character
  – Find matches from new character
  – Erase original character
  – Render in new character
• Need to worry about occlusions
![Page 46: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/46.jpg)
Show the impressive video
![Page 47: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/47.jpg)
Future Directions
• Much remains to be done!
• Action recognition
  – Using joint positions, shape: the “morpho-kinetics” of action recognition
  – Better models of activities
• Detecting and localizing figures
  – Combining top-down exemplar methods with bottom-up segmentation methods
  – Exploiting temporal cues
![Page 48: Recognizing Human Figures and Actions Greg Mori Simon Fraser University](https://reader038.vdocument.in/reader038/viewer/2022110213/56649e995503460f94b9c82d/html5/thumbnails/48.jpg)
Acknowledgements
• References
  – Mori, Belongie, and Malik, “Shape Contexts Enable Efficient Retrieval of Similar Shapes”, CVPR 2001
  – Mori and Malik, “Estimating Human Body Configurations using Shape Context Matching”, ECCV 2002
  – Efros, Berg, Mori, and Malik, “Recognizing Action at a Distance”, ICCV 2003
  – Mori and Malik, “Recognizing Objects in Adversarial Clutter: Breaking a Visual CAPTCHA”, CVPR 2003
  – Mori, Ren, Efros, and Malik, “Recovering Human Body Configurations: Combining Segmentation and Recognition”, CVPR 2004
• Thank you!