Learning realistic human actions from movies
by Ivan Laptev, Marcin Marszalek, Cordelia Schmid, and Benjamin Rozenfeld
PRESENTATION BY KERRY SEITZ
The Problem
Recognize natural human actions
Realistic videos
Getting out of a car
Answering a phone
Performing CPR
Kissing
[LAPTEV ET AL. 2008]
Challenges
Lack of datasets
Variations in:
◦ Expression, posture, motion, and clothing
◦ Camera motion and perspective
◦ Illumination
◦ Occlusion and surroundings
[LAPTEV ET AL. 2008]
Automatic Annotation of Human Actions
Use movie scripts
Problems
◦ No time information
◦ Script and movie don’t always match
◦ Variations in phrasing
Script-to-Video Alignment
[LAPTEV ET AL. 2008]
Script-to-Video Alignment
Alignment score (a) for each scene
◦ Script-subtitle misalignment
◦ a = (# matched words) / (# all words)
Types of errors when a = 1
◦ Misaligned in time (10%)
◦ Outside the field of view (10%)
◦ Missing in the video (10%)
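The alignment score above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `tokenize` and `alignment_score` helpers are hypothetical, and it assumes a script word counts as "matched" when the same word is still available in the aligned subtitle text.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def alignment_score(script_words, subtitle_words):
    """Return a = (# matched words) / (# all words) for one scene.

    Assumption: each subtitle word can match at most one script word.
    """
    if not script_words:
        return 0.0
    # Count how many copies of each subtitle word are available to match.
    remaining = {}
    for word in subtitle_words:
        remaining[word] = remaining.get(word, 0) + 1
    matched = 0
    for word in script_words:
        if remaining.get(word, 0) > 0:
            remaining[word] -= 1
            matched += 1
    return matched / len(script_words)


script = tokenize("I'm not going anywhere without you")
subtitle = tokenize("I'm not going without you")
score = alignment_score(script, subtitle)  # 5 of 6 script words matched
```

A score of 1 means every script word was found in the subtitles. As the error breakdown above shows, even a = 1 does not guarantee that the action is visible in the clip.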
[LAPTEV ET AL. 2008]
Text Retrieval of Human Actions
Phrasing variations
◦ “Will gets out of the Chevrolet.”
◦ “A black car pulls up. Two army officers get out.”
◦ “Erin exits her new truck.”
False positives
◦ “About to sit down, he freezes.”
Keyword search is insufficient!
Text Retrieval of Human Actions
Train a classifier for each action (bag-of-features model)
◦ Words
◦ Adjacent pairs of words
◦ Pairs of words within a window of N words (2 ≤ N ≤ 8)
Regularized perceptron
◦ Equivalent to SVM
◦ Trained on manually labeled scene descriptions
◦ Tuned using validation set
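The three text feature types listed above could be extracted roughly as follows. This is a sketch under assumptions: the `text_features` helper is hypothetical, and a "window of N words" is read here as N consecutive words, so two words form a pair for window size N when they are at most N - 1 positions apart.

```python
import re


def text_features(sentence, min_window=2, max_window=8):
    """Collect word, adjacent-pair, and windowed-pair features for a sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    features = set()
    # Single words.
    for word in words:
        features.add(("word", word))
    # Adjacent pairs of words.
    for first, second in zip(words, words[1:]):
        features.add(("adjacent", first, second))
    # Ordered pairs of words that fall inside a window of n consecutive words.
    for n in range(min_window, max_window + 1):
        for i, first in enumerate(words):
            for second in words[i + 1:i + n]:
                features.add(("window", n, first, second))
    return features


features = text_features("Will gets out of the Chevrolet.")
```

Each feature would become one dimension of a sparse binary vector fed to the per-action classifier. Pair features are what allow phrases such as "gets out" to be told apart from unrelated uses of "out".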
Text Retrieval of Human Actions
[LAPTEV ET AL. 2008]
The Datasets
Manual and Test Sets
◦ Manually annotated scripts
◦ Manually selected visually-correct action samples
Automatic Set
◦ Automatically annotated scripts
◦ Automatically selected action samples
◦ Alignment score a > 0.5
◦ Clip length < 1,000 frames
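The two selection criteria for the automatic set amount to a simple filter. The sketch below is illustrative only: the `Clip` record, its field names, and the sample values are made up for the example, and only the two thresholds come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    action: str        # action label retrieved from the script text
    alignment: float   # script-to-subtitle alignment score a
    num_frames: int    # clip length in frames


def select_automatic_samples(clips, min_alignment=0.5, max_frames=1000):
    """Keep clips with a > min_alignment and length < max_frames."""
    return [
        clip for clip in clips
        if clip.alignment > min_alignment and clip.num_frames < max_frames
    ]


# Hypothetical candidates, not data from the paper.
candidates = [
    Clip("GetOutCar", 0.9, 420),
    Clip("AnswerPhone", 0.4, 300),   # rejected: alignment too low
    Clip("Kiss", 1.0, 1500),         # rejected: clip too long
]
selected = select_automatic_samples(candidates)
```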
[LAPTEV ET AL. 2008]
KTH Dataset
[LAPTEV ET AL. 2008]
Action Recognition
Sparse space-time features
◦ Compact representation
◦ Tolerant to background clutter, occlusions, and scale changes
Interest point detection – space-time extension of the Harris operator
◦ Multiple levels of spatio-temporal scales
Interest Point Detection
[LAPTEV ET AL. 2008]
Features at the Interest Points
Histogram descriptors of space-time volumes around each interest point
◦ Volumes divided into (nx, ny, nt) grid of cuboids
◦ Compute histogram of oriented gradients (HoG)
◦ Compute histogram of optic flow (HoF)
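The grid-of-cuboids descriptor above can be sketched as follows. This is a simplified illustration, not the paper's exact descriptor: the `grid_orientation_histogram` helper and its input format are assumptions. Each sample is a tuple (x, y, t, angle, magnitude) with coordinates normalized to [0, 1) inside the volume. The angle would be the image gradient direction for HoG or the optic flow direction for HoF.

```python
import math


def grid_orientation_histogram(samples, nx, ny, nt, n_bins):
    """Concatenate one orientation histogram per cuboid of an (nx, ny, nt) grid."""
    hist = [0.0] * (nx * ny * nt * n_bins)
    for x, y, t, angle, magnitude in samples:
        # Locate the cuboid that contains this sample.
        cx = min(int(x * nx), nx - 1)
        cy = min(int(y * ny), ny - 1)
        ct = min(int(t * nt), nt - 1)
        cell = (ct * ny + cy) * nx + cx
        # Quantize the orientation into one of n_bins equal angular bins.
        fraction = (angle % (2 * math.pi)) / (2 * math.pi)
        orientation_bin = min(int(fraction * n_bins), n_bins - 1)
        hist[cell * n_bins + orientation_bin] += magnitude
    # L1-normalize so that volumes of different sizes are comparable.
    total = sum(hist)
    return [value / total for value in hist] if total > 0 else hist


# Two toy samples in opposite corners of a 2 x 2 x 2 grid, 4 orientation bins.
toy_samples = [
    (0.1, 0.1, 0.1, 0.0, 1.0),
    (0.9, 0.9, 0.9, math.pi, 3.0),
]
descriptor = grid_orientation_histogram(toy_samples, nx=2, ny=2, nt=2, n_bins=4)
```

The descriptor length is nx * ny * nt * n_bins, so finer grids preserve more spatial and temporal layout at the cost of a longer vector.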
[IKIZLER ET AL. 2008]
Spatio-Temporal Bag-of-Features
k-means with 4,000 clusters
Different grid sizes
Classify with non-linear SVM
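The bag-of-features step can be sketched as below, assuming a visual vocabulary has already been learned. The three-word toy vocabulary stands in for the 4,000 k-means cluster centers, and all helper names are hypothetical. The slide only says "non-linear SVM"; the chi-square kernel is used here as one common choice for comparing histograms and should be read as an assumption.

```python
import math


def nearest_word(descriptor, vocabulary):
    """Return the index of the closest cluster center (squared Euclidean)."""
    best_index, best_dist = 0, float("inf")
    for index, center in enumerate(vocabulary):
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, center))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bag_of_features(descriptors, vocabulary):
    """Build an L1-normalized histogram of visual-word occurrences."""
    hist = [0.0] * len(vocabulary)
    for descriptor in descriptors:
        hist[nearest_word(descriptor, vocabulary)] += 1.0
    total = sum(hist)
    return [value / total for value in hist] if total > 0 else hist


def chi2_distance(h1, h2):
    """Chi-square distance between two histograms of equal length."""
    return 0.5 * sum(
        (a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0
    )


def chi2_kernel(h1, h2, scale=1.0):
    """Kernel value exp(-D / scale); scale is a free parameter in this sketch."""
    return math.exp(-chi2_distance(h1, h2) / scale)


# Toy 2-D vocabulary and descriptors, made up for illustration.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
hist_a = bag_of_features([(0.1, 0.0), (0.9, 1.1), (1.0, 1.0), (5.0, 4.8)], vocabulary)
hist_b = bag_of_features([(0.0, 0.2), (4.9, 5.0)], vocabulary)
similarity = chi2_kernel(hist_a, hist_b)
```

With spatio-temporal grids, one such histogram would be computed per grid cell and the results concatenated, giving one channel per grid and descriptor type. The resulting kernel matrix could then be passed to any SVM implementation that accepts precomputed kernels.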
[LAPTEV ET AL. 2008]
Evaluation of Spatio-Temporal Grids
[LAPTEV ET AL. 2008]
Evaluation of Spatio-Temporal Grids
[LAPTEV ET AL. 2008]
Comparison to the State-of-the-Art
KTH dataset divided into:
◦ Training/validation set (8+8 people)
◦ Test set (9 people)
Use best-performing channel combination
[LAPTEV ET AL. 2008]
Confusion Matrix
[LAPTEV ET AL. 2008]
Noise in Training Data
[LAPTEV ET AL. 2008]
Results for Real-World Videos
[LAPTEV ET AL. 2008]
Examples
[LAPTEV ET AL. 2008]
Summary
Automatic annotation using movie scripts
Action recognition outperforms the prior state of the art on the KTH dataset
System tolerant to errors in training data
Future Work
Improve script-to-video alignment
Improve tolerance of classifier
◦ Iterative learning
Experiment with other space-time low-level features
Questions?
[LAPTEV ET AL. 2008]
References
Learning Realistic Human Actions from Movies. I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. CVPR 2008.
Human Action Recognition with Line and Flow Histograms. N. Ikizler, G. Cinbis, and P. Duygulu. ICPR 2008.