A Videography Analysis Framework for
Video Retrieval and Summarization
Kang Li*1,3, Sangmin Oh*2, Amitha Perera2, and Yun Fu1,3
(* Indicates equal contribution)
1SUNY at Buffalo, 2Kitware Inc., 3Northeastern University
Presenter: Yun Fu
Research sponsored by IARPA ALADDIN Program
Videography Analysis - BMVC2012 Kang Li, Sangmin Oh, Amitha Perera, and Yun Fu
Project leading us to this paper
TRECVID Multimedia Event Detection 2011
• Very large collection of web videos
• Complex events – 5 Training + 10 Test events:
• Wedding, changing a tire, woodworking project etc.
– Full clips: Includes stitching, severe camera motion, significant temporal and spatial clutter; 0.5-60 minutes duration
• Ratio of known events to clutter is 1:99 in test data
Dataset Samples
Examples from Event ‘Board Trick’
Dataset Samples
Examples from Event ‘Landing a fish’
Dataset Samples
Examples from Event ‘Flash Mob’
Framework
[Figure: framework overview]
(a) Videography Feature Extraction
– Step 1: Clip Decomposition (Shots Boundary Detection; Camera Operation Classification: Zoom, Pan, Tilt, Static)
– Step 2: Videography Feature Extraction (FG/BG Motion Separation; Camera Motion Type; Foreground Motion Style; Foreground Scale; Motion Intention (Correlation))
(b) Videography Dictionary
– Example words: Zoom-in + Small Face; Zoom-in + Large Face; Pan + Small FG motion; Pan + Large FG motion; Tilt + No Face; Tilt + Large Face
(c) Quantization & Learning
– Training on videos with related contents, e.g., skateboard videos
(d) Retrieval
– Test (query) features matched against the archive collection
(e) Adaptive Summarization
Outline
• Videography features
• Videography dictionary and analysis
• Experiment results
1) Video Retrieval
2) Video Summarization
Videography features
Step 1. Clip Decomposition
– Two-level motion analysis, mainly based on densely computed KLT[17] tracks
Level 1. Clip into shots (shot boundary detection [23])
– Cut
– Fade-Out-In
Level 2. Shots into sub-segments (camera motion estimation [14, 24])
– Static
– Pan/Tilt
– Zoom-in/Zoom-out
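The Level 2 labels can be illustrated with a small rule-based classifier. This is only a sketch, not the method of [14, 24]: it assumes each sub-segment has already been reduced to a mean background translation (`tx`, `ty`, in pixels per frame) and a mean per-frame scale factor, and the function name and both thresholds are hypothetical values chosen for illustration.

```python
def classify_camera_operation(tx, ty, scale, move_thresh=1.0, zoom_thresh=0.01):
    """Label a sub-segment from its mean per-frame global (background) motion.

    tx, ty : mean background translation in pixels per frame.
    scale  : mean per-frame scale factor (1.0 means no zoom).
    Both thresholds are illustrative assumptions, not values from the talk.
    """
    # A scale factor away from 1.0 is treated as a zoom, whatever the translation.
    if abs(scale - 1.0) > zoom_thresh:
        return "zoom-in" if scale > 1.0 else "zoom-out"
    # Negligible translation and no zoom: the camera is static.
    if max(abs(tx), abs(ty)) < move_thresh:
        return "static"
    # Otherwise the dominant axis decides between pan (horizontal) and tilt (vertical).
    return "pan" if abs(tx) >= abs(ty) else "tilt"
```

For example, `classify_camera_operation(5.0, 1.0, 1.0)` yields `"pan"`, while a scale of `1.05` yields `"zoom-in"` regardless of translation.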
Videography features
Camera motion estimation
• FG/BG separation
• Use BG motion to estimate camera parameters
• Red: background
• Green: foreground
• Yellow: Corrected by camera motion compensation
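One common way to realize "separate FG from BG, then estimate the camera from BG motion" is a RANSAC fit of a global motion model to point tracks: tracks that agree with the dominant model are treated as background, the rest as foreground. The sketch below assumes a four-parameter similarity model and NumPy only; the function names, the iteration count, and the one-pixel tolerance are assumptions for illustration, and the talk does not state which model or estimator the authors used.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares fit of x' = a*x - b*y + tx, y' = b*x + a*y + ty."""
    x, y = src[:, 0], src[:, 1]
    ones, zeros = np.ones_like(x), np.zeros_like(x)
    A = np.concatenate([
        np.stack([x, -y, ones, zeros], axis=1),   # equations for x'
        np.stack([y, x, zeros, ones], axis=1),    # equations for y'
    ])
    rhs = np.concatenate([dst[:, 0], dst[:, 1]])
    params, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return params  # (a, b, tx, ty)

def apply_similarity(params, pts):
    """Map points through the similarity model."""
    a, b, tx, ty = params
    x, y = pts[:, 0], pts[:, 1]
    return np.stack([a * x - b * y + tx, b * x + a * y + ty], axis=1)

def separate_fg_bg(src, dst, n_iter=200, tol=1.0, seed=0):
    """RANSAC: the largest consensus set is taken as background.

    src, dst : (N, 2) arrays of track positions in two frames.
    Returns the camera model refit on the background and a boolean
    background mask (False entries are foreground tracks).
    """
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), size=2, replace=False)  # minimal sample
        params = fit_similarity(src[idx], dst[idx])
        err = np.linalg.norm(apply_similarity(params, src) - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best.sum():
            best = inliers
    return fit_similarity(src[best], dst[best]), best
```

The returned parameters give translation directly as `(tx, ty)` and scale as `sqrt(a**2 + b**2)`, which is the kind of input a pan/tilt/zoom/static decision needs. This sketch assumes background tracks outnumber foreground tracks.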
Videography features
Camera motion estimation
• Examples from Event ‘Getting vehicle unstuck’
• Red: background
• Green: foreground
• White arrows: Camera
Videography features
Camera motion estimation
• Examples from Event ‘Parkour’
• Red: background
• Green: foreground
• White arrows: Camera
Videography features
Step 2. Feature extraction for each segment
– BG Motion
– FG Motion
– Face Scale
– FG/BG Correlation
Videography dictionary and analysis
[Figure: samples are described by four features (BG, FG, Corr, Face) and organized into a videography dictionary (VD). Example words: Zoom-in + Small Face; Zoom-in + Large Face; Pan + Small FG motion; Pan + Large FG motion; Tilt + No Face; Tilt + Large Face.]
Videography dictionary and analysis
Pan-left + Large FG motion
• Green: foreground
• Orange: face bounding box
• White arrows: Camera
Videography dictionary and analysis
Zoom-in + Small Face
• Green: foreground
• Orange: face bounding box
• White arrows: Camera
Videography dictionary and analysis
The computed VD will be used to quantize video clips into sequences of videography words (VWs).
[Figure: the videography dictionary (VD), with words such as Zoom-in + Small Face, Zoom-in + Large Face, Pan + Small FG motion, Pan + Large FG motion, Tilt + No Face, and Tilt + Large Face, shown next to video clips quantized over time based on the VD.]
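A minimal sketch of dictionary learning and quantization follows. It assumes each segment is a fixed-length feature vector (for instance, BG, FG, Corr, Face values) and uses plain k-means as the clustering step. The slides only say the VD is learned in a data-driven way, so k-means, the function names, and the parameters here are assumptions for illustration.

```python
import numpy as np

def quantize(features, centers):
    """Map each segment feature vector to the index of its nearest VD entry."""
    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def learn_dictionary(features, k, n_iter=50, seed=0):
    """Plain k-means; the k cluster centers play the role of videography words."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct training segments.
    centers = features[rng.choice(len(features), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = quantize(features, centers)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old center if a cluster goes empty
                centers[j] = members.mean(axis=0)
    return centers
```

Calling `quantize` on the time-ordered segments of one clip yields that clip's sequence of word indices, which is the representation the later retrieval and analysis steps operate on.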
Videography dictionary and analysis
Test our core intuition:
– Correlations between VWs and particular visual content
[Figure: mutual information between events and words, on a scale from low to high.]
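As a sketch of how such an event-by-word score could be computed, the function below measures mutual information between two binary variables defined per clip: whether the clip belongs to a given event, and whether a given word occurs in it. Treating both as binary is an assumption; the slide does not specify the exact formulation.

```python
import numpy as np

def mutual_information(event, word):
    """Mutual information in bits between two binary per-clip indicators.

    event[i] : True if clip i belongs to the event.
    word[i]  : True if the videography word occurs in clip i.
    """
    event = np.asarray(event, dtype=bool)
    word = np.asarray(word, dtype=bool)
    mi = 0.0
    for e in (False, True):
        for w in (False, True):
            p_ew = np.mean((event == e) & (word == w))  # joint probability
            p_e = np.mean(event == e)                   # marginals
            p_w = np.mean(word == w)
            if p_ew > 0:  # zero-probability cells contribute nothing
                mi += p_ew * np.log2(p_ew / (p_e * p_w))
    return float(mi)
```

Evaluating this for every (event, word) pair fills a matrix like the one in the figure: a word that appears in exactly the clips of one event scores high, and a word spread evenly across events scores near zero.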
Videography dictionary and analysis
Examples
– Intuitively understand videography styles of each event
Video Retrieval
Average precision (%) of video retrieval results on the MED corpus, for 15 events.
Mean average precision (%) of video retrieval on the MED corpus.
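For reference, the two metrics named above can be computed as in the following sketch. It uses the standard uninterpolated definition of average precision and assumes each ranked list contains all relevant clips for its event; the function names are illustrative, and the exact evaluation protocol of the talk is not stated in the slides.

```python
def average_precision(relevance):
    """AP for one event; relevance is a ranked list of 0/1 flags, best first."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at each relevant position
    return total / hits if hits else 0.0

def mean_average_precision(rankings):
    """Mean of per-event AP values; rankings holds one ranked list per event."""
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For a ranking with relevant clips at positions 1 and 3, the precisions at those positions are 1/1 and 2/3, so the average precision is 5/6.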
Video Summarization
Detect salient content and summarize each video clip as a collection of key frames.
– Compute a saliency score for each segment.
– High FG/BG correlation indicates a salient segment.
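The scoring step can be sketched as follows, under several assumptions that are not in the slides: saliency is taken to be the Pearson correlation between per-frame foreground and background motion magnitudes within a segment, the middle frame of a segment serves as its key frame, and a fixed number of top-scoring segments is kept. The function names and the segment tuple layout are hypothetical.

```python
import numpy as np

def saliency(fg_motion, bg_motion):
    """Pearson correlation between per-frame FG and BG motion magnitudes."""
    fg = np.asarray(fg_motion, dtype=float)
    bg = np.asarray(bg_motion, dtype=float)
    if fg.std() == 0 or bg.std() == 0:
        return 0.0  # correlation is undefined for a constant series
    return float(np.corrcoef(fg, bg)[0, 1])

def select_key_frames(segments, top_k=3):
    """Pick the middle frame of the top_k most salient segments.

    segments : list of (start_frame, end_frame, fg_motion, bg_motion).
    Returns the chosen frame indices in temporal order.
    """
    scored = [(saliency(fg, bg), (start + end) // 2)
              for start, end, fg, bg in segments]
    scored.sort(key=lambda item: item[0], reverse=True)
    return sorted(frame for _, frame in scored[:top_k])
```

A segment in which the camera motion tracks the subject's motion scores close to 1 and is kept, while a segment with uncorrelated or opposing motion ranks lower.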
Sample summarization result
[Figure: key frames selected by the proposed method and by the baseline for the events ‘Board trick’ and ‘Birthday party’.]
More results
[Figure: key frames selected by the proposed method and by the baseline for the event ‘Parkour’.]
The proposed method captures the key contents of each video:
– For dynamic events, such as “Parkour” and “Board trick”, highlight moments are extracted.
– For static events, such as “Birthday party”, key scenes with key objects are extracted.
Conclusion
• The introduced features and data-driven VD learning help to identify characteristic videography among videos from the same event.
• The proposed method benefits high-level video content analysis
– Video retrieval
– Video summarization
• Videography features capture unique aspects of videos and can be used jointly with other features
Thank you!