visuelle perzeption für mensch- maschine schnittstellen€¦ · comparison for each shape location...
TRANSCRIPT
Interactive Systems Laboratories, Universität Karlsr uhe (TH)Edgar Seemann, 04.12.09 1
Visuelle Perzeption für Mensch-Maschine Schnittstellen
Vorlesung, WS 2009
Prof. Dr. Rainer StiefelhagenDr. Edgar Seemann
Institut für AnthropomatikUniversität Karlsruhe (TH)
http://[email protected]
Interactive Systems Laboratories, Universität Karlsr uhe (TH)Edgar Seemann, 04.12.09 2
Computer Vision:
People Detection II
WS 2009/10
Dr. Edgar Seemann
Interactive Systems Laboratories, Universität Karlsr uhe (TH)Edgar Seemann, 04.12.09 3
Termine (1)
TBA21.12.2009
Stereo and Optical Flow18.12.2009
Termine Thema19.10.2009 Introduction, Applications
23.10.2009 Basics: Cameras, Transformations, Color
26.10.2009 Basics: Image Processing
30.10.2009 Basics: Pattern recognition
02.11.2009 Computer Vision: Tasks, Challenges, Learning, Performance measures
06.11.2009 Face Detection I: Color, Edges (Birchfield)
09.11.2009 Project 1: Intro + Programming tips13.11.2009 Face Detection II: ANNs, SVM, Viola & Jones
16.11.2009 Project 1: Questions
20.11.2009 Face Recognition I: Traditional Approaches, Eigenfaces, Fisherfaces, EBGM
23.12.2009 Face Recognition II
27.11.2009 Head Pose Estimation: Model-based, NN, Texture Mapping, Focus of Attention
30.11.2009 People Detection I
04.12.2009 People Detection II
07.12.2009 Project 1: Student Presentations, Project 2: Intro11.12.2009 People Detection III (Part-Based Models)
14.12.2009 Scene Context and Geometry
Edgar Seemann, 04.12.09 4
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciToday
� Last time: Global approaches� Compute a singe feature vector / representation
for the complete body
� Today:� Silhouette matching (global approach)
� Part-based approaches
Edgar Seemann, 04.12.09 5
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ci
Silhouette Matching
Edgar Seemann, 04.12.09 6
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciChamfer Matching [Gavrila & Philomin ICCV’99]
Goal
• Align known object shapes with image
Object shapesReal-world image of object
Requirements for an alignment algorithm
• High detection rate
• Few false positives
• Robustness
• Computationally inexpensive
Edgar Seemann, 04.12.09 7
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ci
� Used to compare/align two (typically binary) shapes
1. Compute for each pixel the distance to the next edge pixel� Here the eculidean distances are
approximated by the 2-3 distance
Distance Transform
Shape 1 Shape 2
Distance = ?
Distance transform
20
3
36
0
85222025
2024
630
2
520002
42222
4333
555
668
522024
30024
2024
235244
Edgar Seemann, 04.12.09 8
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciDistance Transform
2. Overlay second shape over distance transform
3. Accumulate distances along shape 24. Find best matching position by an exhaustive search
� Distance is not symmetric� Distance has to be normalized w.r.t. the length of the
shapes
Distance transform
20
3
36
0
85222025
2024
630
2
520002
42222
4333
555
668
522024
30024
2024
235244
Distance = 14
Edgar Seemann, 04.12.09 9
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciChamfer Matching
Distance measureDistance transform of a real-world image
Distance transform
20
3
36
0
85222025
2024
630
2
520002
42222
4333
555
668
522024
30024
2024
235244
Binary image
• Compute distance transform (DT)
• For each possible object location
• Position known object shape over DT
• Accumulate distances along the
contour
Chamfer Matching
Edgar Seemann, 04.12.09 10
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciEfficient implementation
� The distance transform can be efficiently computed by two scans over the complete image
� Forward-Scan� Starts in the upper-left corner and moves from left to right, top to
bottom
� Uses the following mask
� Backward-Scan� Starts in the lower-right corner and moves from right to left,
bottom to top
� Uses the following mask
3 2 3
2 0
3 2 3
20
Edgar Seemann, 04.12.09 11
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ci
3
2
? ? ?
?
?
Forward scan
� We can choose different values for the filter mask
� The local distances, d, s and c, in the mask are added to the pixel values of the distance map and the new value of the zero pixel is the minimum of the five sums
� Example:
d s d
s c
2+2 2+3 3+5
2+0 2
3 2 3
2 0 ?
? ? ?
5
?
?
Edgar Seemann, 04.12.09 12
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciAdvantages and Disadvantages
� Fast� Distance transform has to be computed only once
� Comparison for each shape location is cheap
� Good performance on uncluttered images (with few background structures)
� Bad performance for cluttered images
� Needs a huge number of people silhouettes� But computation effort increases with the number of
silhouettes
Edgar Seemann, 04.12.09 13
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciTemplate Hierachy
� To reduce the number of silhouettes to consider, silhouettes can be organized in a template hierarchy
� For this, the shapes are clustered by similarity
Edgar Seemann, 04.12.09 14
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciSearch in the hierarchy
� Matching the shapes, then corresponds to a traversal of the template hierarchy
� How can we prune search branches to speed up matching?� Thresholds depend on:
� Edge detector (likelihood of gaps)
� Silhouette sizes
� Hierarchy level
� Allowed shape variation
� Thresholds are set statisticallyduring training
Edgar Seemann, 04.12.09 15
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciExample Detections
Edgar Seemann, 04.12.09 16
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciVideo
Edgar Seemann, 04.12.09 17
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciCoarse-To-Fine Search
� Goal: Reduce search effort by discarding unlikely regions with minimal computation
� Idea:� Subsample image and search
first at a coarse scale
� Only consider regions with alow distance when searchingfor a match on finer scales
� Again, we have to findreasonable thresholds
Level 1
Level 2
Level 3
Edgar Seemann, 04.12.09 18
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciProtector System (Daimler)
Edgar Seemann, 04.12.09 19
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciAdding edge orientation
� So far edge orientation has been completely ignored
� Idea: Consider edge orientation for each pixel
Distance = small
Edgar Seemann, 04.12.09 20
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciEdge orientation - The math
� Given two shapes S, C, we can express the chamfer distance in the following manner
� The orientation correspondence between two points is then measured by
� The combined distance measure:
Edgar Seemann, 04.12.09 21
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciStatistical Relevance
� Adding statistical relevance of silhouette regions further improves the results[Dimitrijevic06]
Edgar Seemann, 04.12.09 22
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciSpatio-Temporal templates
� Use multiple successive frames to build a spatio-temporal template (T={T1,…,TN})
� Allow spatial variations of dx, dy (due to motion or camera movements)
Edgar Seemann, 04.12.09 23
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciExample: single-frame vs. 3 frames
Edgar Seemann, 04.12.09 24
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciQuantitative Results
� Red: spatio-temporal templates + statistical relevance
Edgar Seemann, 04.12.09 25
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciVideo
� Restrict detection to a single articulation (when legs are in a v-shaped position)
� Spatio-Temporal templates:� Allows more reliable detection of motion direction
� Avoids confusions and some false positive detections
Edgar Seemann, 04.12.09 26
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciAlternatives: Earth Mover’s Distance
� Originally developed to compare histograms
� Idea: Find the minimal ‘flow’ to transform one histogram to another
� Example:
Edgar Seemann, 04.12.09 27
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciEarth mover’s distance (EMD)
Example detection
• Detect edges in image
• For each possible object location
• Optimize correspondences between
known shape and edge image
EMD MatchingChamfer:
EMD:
Distance measure basis
Edgar Seemann, 04.12.09 28
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciEMD – The math
� Variant of the transportation problem (possible solutions: Stepping Stone Algorithm, Transportation-simplex method)
� Constraints
� EMD-Distance
Edgar Seemann, 04.12.09 29
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciAdvantages and Disadvantages
� Optimizes matching between silhouette and edge structure in image
� Enforces one-to-one matchings (unlike chamfer)
� Allows partial matches
� Can deal with arbitrary features
� High computational complexity� Approximation is possible [Graumann, Darrel CVPR’94]
Edgar Seemann, 04.12.09 30
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ci
Part-Based Models
Edgar Seemann, 04.12.09 31
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciPart-Based Models
� Fischler & Elschlager1973
� Model has two components� parts
(2D image fragments)
� structure (configuration of parts)
Edgar Seemann, 04.12.09 32
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciConfiguration of parts
� Fixed Spatial Layout� Local parts have a mostly fixed position and orientation
with respect to the center object center or detection window
� Flexible Spatial Layout� Local parts are allowed to shift in location and scale� Able to compensate for deformations or articulation
changes� Well suited for non-rigid objects � Spatial relations often modeled probabilistically
Edgar Seemann, 04.12.09 33
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ci
K. Grauman, B. Leibe
Different Connectivity Structures
Fergus et al. ’03Fei-Fei et al. ‘03
Leibe et al. ’04, ‘08Crandall et al. ‘05Fergus et al. ’05
Crandall et al. ‘05 Felzenszwalb & Huttenlocher ‘05
Bouchard & Triggs ‘05 Carneiro & Lowe ‘06Csurka ’04Vasconcelos ‘00
from [Carneiro & Lowe, ECCV’06]
O(N6) O(N2) O(N3) O(N2)
Edgar Seemann, 04.12.09 34
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ci
Fixed Spatial Layout
Edgar Seemann, 04.12.09 35
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciMotivation
� Breakdown overall variability into more manageable pieces
� Pieces can be classified by a less complex classifier
� Apply prior information by (manually) grouping the global feature vector into meaningful parts
Edgar Seemann, 04.12.09 36
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciExample: Sashua et al. IVS’04
� Divide person into 9 overlapping parts
� 4 pair combinations
Edgar Seemann, 04.12.09 37
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciFeature Representation
� Edge orientation histograms (similar to HOG)
� Each part is divided into 2x2 regions� i.e. 4 histograms per part
� 8 gradient orientations
� Features are insensitive to small local shifts
Edgar Seemann, 04.12.09 38
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciPart Classifier and Combination
� Parts are classified by a linear regression� i.e. distance to a separating hyperplane
� One score with respect to each of the 9 parts� Separate head vs. leg feature, head vs. upper body etc.
� The values of the individual part detectors are combined by AdaBoost
Edgar Seemann, 04.12.09 39
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciAttention mechanism
� Attention mechanism� Considers scene geometry, camera perspective
� Filters out regions with lack of texture properties
� Only 75 image windows are considered for pedestrian classification per frame
Edgar Seemann, 04.12.09 40
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciResults
� Lower curve:Global SVM
� Middle curve:Mohan et al.
� Upper curve:Sashua et al.
Edgar Seemann, 04.12.09 41
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciResults
Edgar Seemann, 04.12.09 42
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciExample: Mohan et al.
� Body divided into 4 parts� Face/Shoulder
� Legs
� Right/Left arm
� Detection� Sliding window
� 64x128 pixels
Edgar Seemann, 04.12.09 43
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciGeometric constraints
� Body parts are not always at the exact same position
� Allow local shifts� Position
� Scale
� Best location has to befound for each detectionwindow
Edgar Seemann, 04.12.09 44
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciTraining Data
� MIT pedestrian database
Edgar Seemann, 04.12.09 45
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciFeatures
� Based on wavelets� Multi-Resolution representation
� Haar wavelets
� Quadrupledensity
Edgar Seemann, 04.12.09 46
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciDiscrete Wavelet Transform (DWT)
Motivation
� Representation with basis functions
� Given: Signal [9 7 3 5]
� Disadvantages:� Coefficients provide only
local information
� No information aboutglobal signal change
http://nt.eit.uni-kl.de/wavelet/transformation.html
Edgar Seemann, 04.12.09 47
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciHaar Wavelet Basis
� Scaling function provides information about the mean value or signal level
� Representation inwavelet basis:� [12 4 sqrt(2) -sqrt(2)]
Edgar Seemann, 04.12.09 48
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ci1D Wavelet Transformation
� Step wise transformation with orthogonal
spaces
� Basis function for Vj� Scaling function
� Basis function for Wj
� Wavelet function
Edgar Seemann, 04.12.09 49
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciOur Example
� Project signal onto subspaces
V1 W1
V2 W2
Edgar Seemann, 04.12.09 50
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ci2D Haar-Wavelets
� Same princple as in 1D
� Quadruple densityi.e. overcompleterepresentation
Edgar Seemann, 04.12.09 51
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciFeature Discussion
� Haar wavelets represent pixel/region differences
� Strong responses of wavelet coefficients on image boundaries, i.e. coefficients encode shape
� Similar to gradients at different resolutions
� Overcomplete basis allows good spatial resolution
� Fine scales are discarded as they represent noise
� Coarse scales are discarded as they represent global intensity levels and not the shape
Edgar Seemann, 04.12.09 52
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciPerformance: Part Detectors
Edgar Seemann, 04.12.09 53
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciClassifier Combination
� Voting� E.g. majority of part detectors, classify detection
window as person
� SVM combination� Feed SVM scores of part detectors in a second stage
SVM
� Referred to as Adaptive Classifier Combination
Edgar Seemann, 04.12.09 54
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciCombination Performance
Edgar Seemann, 04.12.09 55
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciResults
Edgar Seemann, 04.12.09 56
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(T
H)
cv:h
ciOcclusion