overview - pennsylvania state university
TRANSCRIPT
![Page 1: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/1.jpg)
Overview • Recall last class: Boosting is a way of generating a strong classifier as a weighted ensemble of weak ones
• Today: Support Vector Machine (SVM) training generates a strong classifier directly
• Case Study: Dalal and Triggs pedestrian detector
![Page 2: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/2.jpg)
Support Vector Machines SVM slides from Kristen Grauman, UT-Austin
Other good resources:
• Presentation slides from Christoph Lampert: https://sites.google.com/site/christophlampert/teaching/kernel-methods-for-object-recognition
• Simple tutorial document by Chris Williams: http://www.inf.ed.ac.uk/teaching/courses/iaml/docs/svm.pdf
• Video lecture by Pat Winston: https://www.youtube.com/watch?v=_PwhiWxHK8o
![Page 3: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/3.jpg)
Linear classifiers
Find linear function to separate positive and negative examples
![Page 4: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/4.jpg)
Lines in R2
$ax + cy + b = 0$
Let $\mathbf{w} = \begin{bmatrix} a \\ c \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}$
![Page 5: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/5.jpg)
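Lines in R2
$ax + cy + b = 0$
Let $\mathbf{w} = \begin{bmatrix} a \\ c \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}$
Then $\mathbf{w} \cdot \mathbf{x} + b = 0$
[Figure label: $\mathbf{w}$]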
![Page 6: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/6.jpg)
Linear classifiers • Find linear function to separate positive and negative examples
$\mathbf{x}_i$ positive: $\mathbf{x}_i \cdot \mathbf{w} + b \geq 0$
$\mathbf{x}_i$ negative: $\mathbf{x}_i \cdot \mathbf{w} + b < 0$
Which line is best?
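The decision rule for a linear classifier can be sketched in a few lines of Python. The values of `w`, `b` and the test points are illustrative assumptions, not taken from the slides.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def classify(w, b, x):
    """Return +1 if w.x + b >= 0 (positive side), otherwise -1."""
    return 1 if dot(w, x) + b >= 0 else -1

# Illustrative line x - y + 0.5 = 0 (assumed values).
w, b = (1.0, -1.0), 0.5
points = [(2.0, 0.0), (0.0, 2.0), (-0.5, 0.0)]
labels = [classify(w, b, p) for p in points]
print(labels)
```

Many different `(w, b)` pairs can separate the same training set, which is what motivates the question of which line is best.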
![Page 7: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/7.jpg)
Support Vector Machines (SVMs)
• Discriminative classifier based on optimal separating line (for 2d case)
• Maximize the margin between the positive and negative training examples
![Page 8: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/8.jpg)
Support vector machines • Want line that maximizes the margin.
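$\mathbf{x}_i$ positive ($y_i = 1$): $\mathbf{x}_i \cdot \mathbf{w} + b \geq 1$
$\mathbf{x}_i$ negative ($y_i = -1$): $\mathbf{x}_i \cdot \mathbf{w} + b \leq -1$
For support vectors, $\mathbf{x}_i \cdot \mathbf{w} + b = \pm 1$
[Figure labels: Margin, Support vectors]
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998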
![Page 9: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/9.jpg)
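Lines in R2
$ax + cy + b = 0$
Let $\mathbf{w} = \begin{bmatrix} a \\ c \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}$
Then $\mathbf{w} \cdot \mathbf{x} + b = 0$
[Figure labels: $\mathbf{w}$; point $(x_0, y_0)$ at distance $D$ from the line]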
![Page 10: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/10.jpg)
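Lines in R2
$ax + cy + b = 0$
Let $\mathbf{w} = \begin{bmatrix} a \\ c \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}$
Then $\mathbf{w} \cdot \mathbf{x} + b = 0$
[Figure labels: $\mathbf{w}$; point $(x_0, y_0)$ at distance $D$ from the line]
$D = \dfrac{|a x_0 + c y_0 + b|}{\sqrt{a^2 + c^2}} = \dfrac{|\mathbf{w}^\top \mathbf{x} + b|}{\|\mathbf{w}\|}$ (distance from point to line)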
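As a quick check of the point-to-line distance, the sketch below computes it in both the scalar form and the vector form and confirms they agree. The function names and the example line 3x + 4y - 5 = 0 are assumptions chosen for easy arithmetic; the absolute value keeps the distance non-negative.

```python
import math

def distance_scalar(a, c, b, x0, y0):
    """|a*x0 + c*y0 + b| / sqrt(a^2 + c^2)."""
    return abs(a * x0 + c * y0 + b) / math.sqrt(a * a + c * c)

def distance_vector(w, b, x):
    """|w.x + b| / ||w||, valid for vectors of any dimension."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w

# Assumed example: line 3x + 4y - 5 = 0 and point (2, 1).
d1 = distance_scalar(3.0, 4.0, -5.0, 2.0, 1.0)
d2 = distance_vector((3.0, 4.0), -5.0, (2.0, 1.0))
print(d1, d2)
```

Because `distance_vector` never assumes two components, the same function applies unchanged in higher dimensions.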
![Page 11: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/11.jpg)
![Page 12: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/12.jpg)
Support vector machines • Want line that maximizes the margin.
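$\mathbf{x}_i$ positive ($y_i = 1$): $\mathbf{x}_i \cdot \mathbf{w} + b \geq 1$
$\mathbf{x}_i$ negative ($y_i = -1$): $\mathbf{x}_i \cdot \mathbf{w} + b \leq -1$
[Figure labels: Margin M, Support vectors]
For support vectors, $\mathbf{x}_i \cdot \mathbf{w} + b = \pm 1$
Distance between point and line: $\dfrac{|\mathbf{x}_i \cdot \mathbf{w} + b|}{\|\mathbf{w}\|}$
For support vectors: $\dfrac{\mathbf{w}^\top \mathbf{x} + b}{\|\mathbf{w}\|} = \dfrac{\pm 1}{\|\mathbf{w}\|}$, so $M = \left| \dfrac{1}{\|\mathbf{w}\|} - \dfrac{-1}{\|\mathbf{w}\|} \right| = \dfrac{2}{\|\mathbf{w}\|}$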
![Page 13: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/13.jpg)
Support vector machines • Want line that maximizes the margin.
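$\mathbf{x}_i$ positive ($y_i = 1$): $\mathbf{x}_i \cdot \mathbf{w} + b \geq 1$
$\mathbf{x}_i$ negative ($y_i = -1$): $\mathbf{x}_i \cdot \mathbf{w} + b \leq -1$
[Figure labels: Margin, Support vectors]
For support vectors, $\mathbf{x}_i \cdot \mathbf{w} + b = \pm 1$
Distance between point and line: $\dfrac{|\mathbf{x}_i \cdot \mathbf{w} + b|}{\|\mathbf{w}\|}$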
Therefore, the margin is 2 / ||w||
![Page 14: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/14.jpg)
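Finding the maximum margin line
1. Maximize margin $2/\|\mathbf{w}\|$
2. Correctly classify all training data points:
$\mathbf{x}_i$ positive ($y_i = 1$): $\mathbf{x}_i \cdot \mathbf{w} + b \geq 1$
$\mathbf{x}_i$ negative ($y_i = -1$): $\mathbf{x}_i \cdot \mathbf{w} + b \leq -1$
Quadratic optimization problem:
Minimize $\frac{1}{2}\mathbf{w}^\top \mathbf{w}$
Subject to $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1$
One constraint for each training point. Note sign trick: multiplying by $y_i$ folds the two class inequalities into a single constraint.
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998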
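To make the optimization problem concrete, the sketch below does not solve the quadratic program. It only checks the constraints and compares the objective for a few hand-picked candidate lines on an assumed toy data set. All names and numbers are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def is_feasible(w, b, xs, ys, tol=1e-9):
    """True if y_i (w.x_i + b) >= 1 holds for every training point."""
    return all(y * (dot(w, x) + b) >= 1 - tol for x, y in zip(xs, ys))

def objective(w):
    """The quantity being minimized: (1/2) w^T w."""
    return 0.5 * dot(w, w)

def margin(w):
    """Margin width 2 / ||w||."""
    return 2.0 / math.sqrt(dot(w, w))

# Assumed toy data: two positives, two negatives.
xs = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
ys = [1, 1, -1, -1]

# Hand-picked candidates (w, b); a real solver would search over all of them.
candidates = {
    "tight": ((0.5, 0.5), -1.0),
    "loose": ((1.0, 1.0), -2.0),
    "bad": ((0.1, 0.1), 0.0),
}
feasible = {k: v for k, v in candidates.items() if is_feasible(v[0], v[1], xs, ys)}
best = min(feasible, key=lambda k: objective(feasible[k][0]))
print(best, margin(feasible[best][0]))
```

Among the feasible candidates, the one with the smaller objective has the wider margin, which is why minimizing $\frac{1}{2}\mathbf{w}^\top \mathbf{w}$ is equivalent to maximizing $2/\|\mathbf{w}\|$.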
![Page 15: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/15.jpg)
Finding the maximum margin line • Solution:
$\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$
[Labels: $\alpha_i$ is the learned weight, $\mathbf{x}_i$ is the support vector]
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
![Page 16: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/16.jpg)
Finding the maximum margin line
• Solution:
$\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$
$b = y_i - \mathbf{w} \cdot \mathbf{x}_i$ (for any support vector)
• Classification function:
$\mathbf{w} \cdot \mathbf{x} + b = \sum_i \alpha_i y_i \, \mathbf{x}_i \cdot \mathbf{x} + b$
$f(\mathbf{x}) = \mathrm{sign}(\mathbf{w} \cdot \mathbf{x} + b) = \mathrm{sign}\left(\sum_i \alpha_i y_i \, \mathbf{x}_i \cdot \mathbf{x} + b\right)$
If f(x) < 0, classify as negative; if f(x) > 0, classify as positive
• Notice that it relies on an inner product between the test point x and the support vectors xi
• (Solving the optimization problem also involves computing the inner products xi · xj between all pairs of training points)
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
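The solution formulas can be exercised on a two-point toy problem. The support vectors and the weights `alphas` below are assumptions, worked out by hand for this example only (they satisfy w · x + b = ±1 at both points). The sketch recovers w and b and checks that the primal and dual forms of the decision value agree.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Assumed toy problem: one positive and one negative support vector.
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
labels = [1, -1]
alphas = [0.25, 0.25]  # hand-derived for this example, not produced by a solver

# w = sum_i alpha_i y_i x_i
w = tuple(
    sum(a * y * sv[k] for a, y, sv in zip(alphas, labels, support_vectors))
    for k in range(2)
)
# b = y_i - w . x_i for any support vector
b = labels[0] - dot(w, support_vectors[0])

def decision_primal(x):
    return dot(w, x) + b

def decision_dual(x):
    """Uses only inner products between x and the support vectors."""
    return sum(a * y * dot(sv, x) for a, y, sv in zip(alphas, labels, support_vectors)) + b

def sign(v):
    return 1 if v > 0 else -1 if v < 0 else 0

for x in [(2.0, 0.5), (-3.0, 1.0)]:
    print(x, decision_primal(x), decision_dual(x), sign(decision_dual(x)))
```

The dual form never needs w explicitly, only inner products with the support vectors.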
![Page 17: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/17.jpg)
Questions • What if the features are not 2d? • What if the data is not linearly separable? • What to do for more than two classes?
![Page 18: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/18.jpg)
Questions • What if the features are not 2d?
– Generalizes to d-dimensions – replace line with “hyperplane”
• What if the data is not linearly separable? • What to do for more than two classes?
![Page 19: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/19.jpg)
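Planes in R3
$ax + by + cz + d = 0$
Let $\mathbf{w} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$
Then $\mathbf{w} \cdot \mathbf{x} + d = 0$
[Figure labels: $\mathbf{w}$; point $(x_0, y_0, z_0)$ at distance $D$ from the plane]
$D = \dfrac{|a x_0 + b y_0 + c z_0 + d|}{\sqrt{a^2 + b^2 + c^2}} = \dfrac{|\mathbf{w}^\top \mathbf{x} + d|}{\|\mathbf{w}\|}$ (distance from point to plane)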
![Page 20: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/20.jpg)
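Hyperplanes in Rn
Hyperplane H is set of all vectors $\mathbf{x} \in R^n$ which satisfy:
$w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b = 0$
$\mathbf{w}^\top \mathbf{x} + b = 0$
$D(H, \mathbf{x}) = \dfrac{|\mathbf{w}^\top \mathbf{x} + b|}{\|\mathbf{w}\|}$ (distance from point to hyperplane)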
![Page 21: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/21.jpg)
Questions • What if the features are not 2d? • What if the data is not linearly separable?
![Page 22: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/22.jpg)
Nonlinear SVMs
Slide from Andrew Zisserman
![Page 23: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/23.jpg)
Nonlinear SVMs
Slide from Andrew Zisserman
![Page 24: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/24.jpg)
Nonlinear SVMs
Slide from Andrew Zisserman
![Page 25: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/25.jpg)
The Kernel Trick
• Recall we transformed linear regression into nonlinear regression using a feature vector Φ(x) and, ultimately, the “kernel trick.”
• We also use the kernel trick here to transform a linear classifier into a nonlinear one.
![Page 26: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/26.jpg)
Example Kernel
Slide from Andrew Zisserman
![Page 27: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/27.jpg)
Example Kernels
Slide from Andrew Zisserman
![Page 28: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/28.jpg)
Nonlinear SVMs • The kernel trick: instead of explicitly computing the lifting transformation φ(x), define a kernel function K such that
$K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i) \cdot \varphi(\mathbf{x}_j)$
• This gives a nonlinear decision boundary in the original feature space:
$\sum_i \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b$
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
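A small numerical check of the kernel identity, using the degree-2 polynomial kernel K(x, z) = (x · z)² as an assumed example (the slides' own example kernels are in images and are not reproduced here). Its explicit lifting for 2-d inputs is φ(x) = (x₁², √2·x₁x₂, x₂²).

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit lifting into 3-d for the kernel (x.z)^2 on 2-d inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def kernel(x, z):
    """Same quantity computed without ever forming phi."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(z))
implicit = kernel(x, z)
print(explicit, implicit)
```

The two values agree up to floating-point rounding, but the kernel only ever touches the original 2-d vectors.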
![Page 29: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/29.jpg)
Questions • What if the features are not 2d? • What if the data is not linearly separable? • What to do for more than two classes?
![Page 30: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/30.jpg)
Multi-class SVMs • Achieve multi-class classifier by combining a number of binary classifiers
• One vs. all
– Training: learn an SVM for each class vs. the rest
– Testing: apply each SVM to test example and assign to it the class of the SVM that returns the highest decision value
• One vs. one
– Training: learn an SVM for each pair of classes
– Testing: each learned SVM “votes” for a class to assign to the test example
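The two test-time rules can be sketched as follows. The binary decision functions here are hypothetical hand-written linear scorers standing in for trained SVMs; the class names and weights are assumptions.

```python
from collections import Counter

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear(w, b):
    """Build a decision function x -> w.x + b (stand-in for a trained SVM)."""
    return lambda x: dot(w, x) + b

def one_vs_all(scorers, x):
    """scorers: class -> decision function. Pick the highest decision value."""
    return max(scorers, key=lambda c: scorers[c](x))

def one_vs_one(pairwise, x):
    """pairwise: (class_a, class_b) -> decision function; positive means class_a."""
    votes = Counter()
    for (a, b), f in pairwise.items():
        votes[a if f(x) > 0 else b] += 1
    # Ties are broken by first-counted order; real systems may need a better rule.
    return votes.most_common(1)[0][0]

# Hypothetical scorers for three classes.
ova = {
    "right": linear((1.0, 0.0), 0.0),
    "left": linear((-1.0, 0.0), 0.0),
    "up": linear((0.0, 1.0), 0.0),
}
ovo = {
    ("right", "left"): linear((1.0, 0.0), 0.0),
    ("right", "up"): linear((1.0, -1.0), 0.0),
    ("left", "up"): linear((-1.0, -1.0), 0.0),
}
x = (0.5, 2.0)
print(one_vs_all(ova, x), one_vs_one(ovo, x))
```

One vs. all needs one classifier per class, while one vs. one needs one per pair of classes, so the second grows quadratically with the number of classes.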
![Page 31: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/31.jpg)
Software for SVMs
![Page 32: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/32.jpg)
SVMs for recognition
1. Define a vector representation for each example.
2. Select a kernel function.
3. Compute pairwise kernel values between labeled examples.
4. Give this “kernel matrix” to SVM optimization software to identify support vectors & weights.
5. To classify a new example: compute kernel values between new input and support vectors, apply weights, check sign of output.
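The five steps can be walked through on toy data. One loud assumption: step 4 uses a kernel perceptron as a simple stand-in for real SVM optimization software. It consumes the same precomputed kernel matrix and yields per-example weights, but it does not maximize the margin and has no bias term. The data, the RBF kernel and `gamma` are illustrative choices.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

# Step 1: vector representation of each labeled example (XOR-style toy data).
X = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
y = [-1, -1, 1, 1]

# Steps 2-3: choose a kernel and compute the pairwise kernel matrix.
K = [[rbf_kernel(a, b) for b in X] for a in X]

# Step 4 (stand-in, NOT an SVM solver): kernel perceptron on the kernel matrix.
def train_kernel_perceptron(K, y, max_epochs=100):
    alpha = [0.0] * len(y)
    for _ in range(max_epochs):
        mistakes = 0
        for j in range(len(y)):
            f = sum(alpha[i] * y[i] * K[i][j] for i in range(len(y)))
            if y[j] * f <= 0:          # misclassified: increase this example's weight
                alpha[j] += 1.0
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

alpha = train_kernel_perceptron(K, y)

# Step 5: kernel values against weighted examples, apply weights, check sign.
def classify(x):
    f = sum(a * yi * rbf_kernel(xi, x) for a, yi, xi in zip(alpha, y, X) if a > 0)
    return 1 if f >= 0 else -1

predictions = [classify(x) for x in X]
print(alpha, predictions)
```

With a real SVM package, only step 4 changes: the kernel matrix and labels go to the solver, which returns the support vectors, their weights and a bias.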
![Page 33: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/33.jpg)
Case Study: Pedestrian Detector
Navneet Dalal and Bill Triggs, “Histograms of Oriented Gradients for Human Detection,” CVPR 2005
![Page 34: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/34.jpg)
Dalal and Triggs CVPR’05 • Detect upright pedestrians • Histogram of oriented gradient feature vector • Linear SVM classifier; sliding window detector
64X128 HoG descriptor
![Page 35: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/35.jpg)
HoG Feature Extraction
![Page 36: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/36.jpg)
HoG Feature Extraction: Cells
[Figure label: 64x128]
Compute gradients
Each cell contains a histogram of gradient orientations, weighted by gradient magnitude
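A minimal sketch of one cell's histogram, under stated simplifications: a plain [-1, 0, 1] centered difference, unsigned orientations in 0-180 degrees, 9 bins, and each pixel voting into a single bin (no interpolation between neighbouring bins). The function name and test patch are assumptions.

```python
import math

def cell_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell.

    cell: 2-d list of grayscale values; border pixels are skipped for simplicity.
    """
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]   # [-1, 0, 1] horizontal filter
            gy = cell[r + 1][c] - cell[r - 1][c]   # same filter, vertical
            mag = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += mag
    return hist

# 8x8 patch with a vertical edge: left half dark, right half bright.
patch = [[1 if c >= 4 else 0 for c in range(8)] for r in range(8)]
hist = cell_histogram(patch)
print(hist)
```

A vertical edge has purely horizontal gradients, so all of the weight lands in the first orientation bin.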
![Page 37: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/37.jpg)
HoG Feature Extraction: Blocks
“Each scalar cell response contributes several components to the final descriptor vector, each normalized with respect to a different block. This may seem redundant but good normalization is critical and including overlap significantly improves the performance.” Dalal&Triggs CVPR’05
[Figure: 2x2 block of cells, normalized and concatenated into the descriptor vector]
![Page 38: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/38.jpg)
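HoG Design Choices
Parameters • Gradient scale • Orientation bins • Block overlap area
Other choices • RGB or Lab, color/gray • Block normalization: L2-hys, $\mathbf{v} \leftarrow \mathbf{v} / \sqrt{\|\mathbf{v}\|_2^2 + \epsilon}$, or L1-sqrt, $\mathbf{v} \leftarrow \sqrt{\mathbf{v} / (\|\mathbf{v}\|_1 + \epsilon)}$
[Figure: block layouts R-HOG/SIFT and C-HOG, with labels Cell, Block, Center bin]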
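The two block normalization schemes might be sketched as below. The small constant `eps` is an assumed value, and the clipping threshold 0.2 for L2-hys follows the value reported in the Dalal and Triggs paper rather than anything stated on this slide. Inputs are assumed non-negative, as histogram entries are.

```python
import math

def l2_norm(v, eps=1e-5):
    """v / sqrt(||v||_2^2 + eps)."""
    s = math.sqrt(sum(x * x for x in v) + eps)
    return [x / s for x in v]

def l2_hys(v, clip=0.2, eps=1e-5):
    """L2-normalize, clip large components, then L2-normalize again."""
    clipped = [min(x, clip) for x in l2_norm(v, eps)]
    return l2_norm(clipped, eps)

def l1_sqrt(v, eps=1e-5):
    """sqrt(v / (||v||_1 + eps)), applied component-wise."""
    s = sum(abs(x) for x in v) + eps
    return [math.sqrt(x / s) for x in v]

block = [3.0, 4.0, 0.0, 0.0]   # assumed toy block vector
print(l2_hys(block))
print(l1_sqrt(block))
```

Clipping limits the influence of a few very strong gradients: after L2-hys the two non-zero entries of the toy block end up equal.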
![Page 39: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/39.jpg)
Parameter / design choices were guided by extensive experimentation to determine empirical effects on detector performance (e.g. miss rate)
![Page 40: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/40.jpg)
Dalal&Triggs Detector • Default detector configuration:
– RGB colour space with no gamma correction;
– [−1, 0, 1] gradient filter with no smoothing;
– linear gradient voting into 9 orientation bins in 0°–180°;
– 16×16 pixel blocks of four 8×8 pixel cells;
– Gaussian spatial window with σ = 8 pixel;
– L2-Hys (Lowe-style clipped L2 norm) block normalization;
– block spacing stride of 8 pixels (hence 4-fold coverage of each cell);
– 64×128 detection window;
– linear SVM classifier.
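The default configuration fixes the length of the feature vector handed to the linear SVM. The arithmetic can be sketched as follows; the function and parameter names are assumptions, while the numbers come from the configuration list.

```python
def hog_descriptor_length(win_w=64, win_h=128, cell=8, block=16, stride=8, bins=9):
    """Number of values in the HoG descriptor of one detection window."""
    blocks_x = (win_w - block) // stride + 1      # block positions across
    blocks_y = (win_h - block) // stride + 1      # block positions down
    cells_per_block = (block // cell) ** 2        # 2x2 cells per block
    return blocks_x * blocks_y * cells_per_block * bins

length = hog_descriptor_length()
print(length)
```

With 7 × 15 overlapping blocks, 4 cells per block and 9 bins per cell, the window is described by 3780 values.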
![Page 41: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/41.jpg)
Detector Architecture
Learning Phase: Create normalised training data set → Encode images into feature vectors → Learn binary classifier → Object/Non-object decision
Detection Phase: Scan image at all scales and locations → Run classifier to obtain object/non-object decisions → Fuse multiple detections in 3-D position & scale space → Object detections with bounding boxes
![Page 42: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/42.jpg)
Positive and negative examples
+ thousands more…
+ millions more…
![Page 43: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/43.jpg)
![Page 44: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/44.jpg)
![Page 45: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/45.jpg)
Person detection with HoG & linear SVM
[Dalal and Triggs, CVPR 2005]
Soft (C=0.01) linear SVM trained with SVMLight.
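“Soft” here refers to a soft-margin SVM, which tolerates some training points inside the margin or misclassified. The slides do not write this out; the standard textbook form, given as a reference sketch, adds slack variables $\xi_i$ and a penalty weight $C$ to the hard-margin problem. How a particular package scales $C$ may differ from this form.

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\; \tfrac{1}{2}\,\mathbf{w}^\top \mathbf{w} + C \sum_i \xi_i
\quad \text{subject to} \quad
y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 - \xi_i, \qquad \xi_i \geq 0
```

A small $C$ such as 0.01 penalizes slack lightly, so the optimizer favours a wide margin over fitting every training example.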
![Page 46: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/46.jpg)
To detect people at all locations and scales:
• Sliding window using learnt HOG template
• Post-processing using non-maxima suppression
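A sketch of how the scan over locations and scales could enumerate candidate windows. The stride and `scale_step` are assumed values for illustration, and the scoring step (HoG encoding followed by the linear SVM) is left out.

```python
def sliding_windows(img_w, img_h, win_w=64, win_h=128, stride=8, scale_step=1.2):
    """Yield (x, y, w, h) boxes in original image coordinates.

    The image is conceptually shrunk by `scale` at each level while the
    window stays 64x128, which is equivalent to growing the window.
    """
    scale = 1.0
    while img_w / scale >= win_w and img_h / scale >= win_h:
        scaled_w, scaled_h = int(img_w / scale), int(img_h / scale)
        for y in range(0, scaled_h - win_h + 1, stride):
            for x in range(0, scaled_w - win_w + 1, stride):
                yield (x * scale, y * scale, win_w * scale, win_h * scale)
        scale *= scale_step

windows = list(sliding_windows(80, 144))
print(len(windows), windows[0])
```

Each yielded box would be encoded as a HoG descriptor and scored by the classifier. Nearby windows around the same person tend to fire together, which is why a suppression step follows.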
![Page 47: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/47.jpg)
![Page 48: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/48.jpg)
![Page 49: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/49.jpg)
![Page 50: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/50.jpg)
![Page 51: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/51.jpg)
![Page 52: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/52.jpg)
![Page 53: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/53.jpg)
Non-maximum Suppression across Scales
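One common way to suppress overlapping detections is greedy suppression by box overlap, sketched below. This is a widely used variant offered as an assumption; it is not necessarily the procedure used by Dalal and Triggs. The boxes, scores and threshold are illustrative.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(detections, iou_threshold=0.5):
    """detections: list of (score, box). Keep the best box in each overlapping group."""
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(box, k[1]) <= iou_threshold for k in kept):
            kept.append((score, box))
    return kept

# Assumed detections: three overlapping boxes on one person, one box elsewhere.
detections = [
    (0.9, (10, 10, 74, 138)),
    (0.8, (12, 12, 76, 140)),
    (0.7, (200, 50, 264, 178)),
    (0.6, (14, 8, 78, 136)),
]
kept = non_max_suppression(detections)
print(kept)
```

Because windows of different scales are all expressed as boxes in the original image, the same overlap test handles detections across scales.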
![Page 54: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/54.jpg)
![Page 55: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/55.jpg)
![Page 56: Overview - Pennsylvania State University](https://reader030.vdocument.in/reader030/viewer/2022020620/61e4ab02b91fe92f4d7487de/html5/thumbnails/56.jpg)
Dalal and Triggs Summary
• HoG feature representation
• Linear SVM classifier; sliding window detector
• Non-maximum suppression across scale
• Use of detector performance metrics to guide tuning of system parameters
• Detection rate 90% at $10^{-4}$ FP per window
• Slower than Viola-Jones detector