decision forests and discriminant analysis
Post on 10-May-2015
1.648 Views
Preview:
DESCRIPTION
TRANSCRIPT
Randomised Decision Forests
Tae-Kyun Kim
http://www.iis.ee.ic.ac.uk/icvl/
1
BMVA 2014
Computer Vision Summer School
Randomised Forests in the field
[Moosmann et al., 06]
[Shotton et al., 08]
water
boat chair
tree
road
Visual Words
[Lepetit et al., 06]
Keypoint Recognition
[Gall et al., 09]
Hough Forest
[Shotton et al., 11]
KINECT
3
Randomised Forests @ ICL
ICCV13 (oral),
CVPR14 (oral) CVPR13
CVPR14 ICCV13
http://www.iis.ee.ic.ac.uk/icvl/publication.htm
4
Randomised Forests @ ICL
Ongoing, partially at IJCV12
BMVC12 BMVC10
ECCV 10,
ICCV 11
http://www.iis.ee.ic.ac.uk/icvl/publication.htm
CVPRW14
ECCV14
Resource
• ICCV09 Tutorial on Boosting and Random Forest (by T-
K Kim et al)
http://www.iis.ee.ic.ac.uk/icvl/iccv09_tutorial.html
• MVA13 Tutorial on Randomised Forests and Tree-
structured Algorithms (by T-K Kim)
http://www.iis.ee.ic.ac.uk/icvl/res_interest.htm
• ICCV11 Tutorial on Decision Forests (by Criminisi and
Shotton) http://research.microsoft.com/en-
us/projects/decisionforests/
• ICCV13 Tutorial on Decision Forests and Field (by
Nowozin and Shotton) http://research.microsoft.com/en-
us/um/cambridge/projects/iccv2013tutorial/
5
Resource
• For Random Forest Matlab codes – (i) P. Dollar’s toolbox
http://vision.ucsd.edu/~pdollar/toolbox/doc/
– (ii) http://code.google.com/p/randomforest-matlab/
– (iii) http://www.mathworks.co.uk/matlabcentral/fileexchange/31036-random-forest
– (iv) Karpathy’s toolbox https://github.com/karpathy/Random-Forest-Matlab.
• For open-source visualisation tool for machine learning algorithms, including RF – MLDEMOS http://mldemos.epfl.ch/
6
Boosting
Cascade
7
x
x
x
…… t1 tT t2
Boosting
Decision tree Randomised Decision Forest
[IJCV 2012]
[NIPS 2008]
Part I
8
f(x)
ΔE
In
Il Ir
Split node
x
……
y
Tree 1 Tree 2 Tree N
p(c)
Leaf node
…
More discriminative split functions
BMVC12
Multiple split criteria /adaptive weighting
CVPR13 ICCV13
Capturing structural info. BMVC10
Probabilistic outputs
ICCV11 ECCV10
Part II
Tutorial Structure
• Part I: Tree-structured Algorithms • Computational Demands
• Boosting
– Speed-up, Multiple-classes
• Committee Machine
• Randomised Decision Forest (Basics)
9
Criminisi and Shotton, ICCV11 tutorial
• Part II: Randomised Forest • Classification forest
• Regression forest
• Density/Manifold forest
• Semi-supervised forest
Tutorial Structure
10
f(x)
ΔE
In
Il Ir
Split node
p(c)
Leaf node
…
• Part II: Randomised Forests (Case studies) • Capturing structural information
• Probabilistic interpretation
of outputs
• More discriminative
split functions
• Multiple split criteria / adaptive weighting
• Part I: Tree-structured Algorithms • Computational Demands
• Boosting
– Speed-up, Multiple-classes
• Committee Machine
• Randomised Decision Forest (Basics)
PART I
Tree-structured Algorithms
11
Extremely fast classification by
tree structure algorithms
12
Object Detection
13
Object Detection
14
15
Object Detection
Number of windows
16
Number of Windows: 747,666
x # of scales
# of pixels
Time per window
17
or raw pixels
……
dim
ensi
on D
Num of feature
vectors: 747,666
…
Classification:
Time per window
18
or raw pixels
……
dim
ensi
on D
Num of feature
vectors: 747,666
…
Time per window (or vector):
0.00000134 sec
In order to finish the task in 1 sec
Neural Network?
Nonlinear SVM?
Real-Time Object Tracking
by Fast (Re-) Detection
19
From
Time t to
t+1
Detection again
Search region
• Online discriminative feature selection [Collins et al 03], Ensemble tracking [Avidan 07]
Previous object
location
Semantic Segmentation
• Requiring pixel-wise classification
20
Input Output
• Integrating Visual Cues [Darrell et al IJCV 00]
– Face pattern detection output (left).
– Connected components recovered from stereo range data (mid).
– Flesh hue regions from skin hue classification (right).
21
More traditionally…
Narrow down the search space
Since about 2001…
Boosting Simple Features [Viola &Jones 01]
• Adaboost classification
• Weak classifiers: Haar-basis like functions (45,396
in total)
22
Weak classifier
Strong classifier
Boosting Simple Features [Viola and Jones CVPR 01]
• Integral image
– A value at (x,y) is the sum of the
pixel values above and to the left of
(x,y).
• The sum of original image
values within the rectangle can
be computed: Sum = A-B-C+D
23
Boosting as an optimisation
framework
24
• Input/output:
• Init:
• For t=1 to T
– Learn ht that minimises
– Set the hypothesis weight
– Update the sample weights
• Break if
AdaBoost [Freund and Schapire 04]
25
via min
Boosting Iteratively reweighting training samples.
Higher weights to previously misclassified samples.
26
1 round 2 rounds 3 rounds 4 rounds 5 rounds 50 rounds Slide credit to J. Shotton
Unified Boosting Framework
AnyBoost : unified framework [Mason et al 00]
• Most boosting algorithms have in common that they iteratively update sample weights and select the next hypothesis based on the weighted samples.
• Input: , Loss ftn , Output:
• Init:
• For t=1 to T
– Find that minimises the weighted error
– Set
– Update
28
AnyBoost an example
• Loss ftn:
• Weight samples:
29
AnyBoost related
30
• Multi-component boosting [Dollar et al ECCV08]
• MP boosting [Babenko et al ECCVW08]
• MCBoost [Kim and Cipolla NIPS08]
Noisy-OR boosting for multiple
instance learning [Viola et al
NIPS06]
Boosting as a Tree-structured
Classifier
Boosting (very shallow network)
• The strong classifier H as boosted decision stumps has
a flat structure
– Cf. Decision “ferns” has been shown to outperform “trees” [Zisserman et
al, 07] [Fua et al, 07] 32
c0 c1 c0 c1 c0 c1 c0 c1 c0 c1 c0 c1
x
…… ……
Boosting -continued
• Good generalisation by a flat structure
• Fast evalution
• Sequential optimisation
33
A strong boosting classifier
Boosting Cascade [viola & Jones
04], Boosting chain [Xiao et al]
Very unbalanced tree
Speeds up for unbalanced binary problems
Hard to design
Speeding up
A sequential structure of varying
length • FloatBoost [Li et al PAMI04]
– A backtrack mechanism deleting non-effective weak-learners at each round.
• WaldBoost [Sochman and Matas CVPR05]
– Sequential probability ratio test for the shortest set of weak-learners for the given error rate.
35
Classify x as + and exit
Classify x as - and exit
Making a Shallow Network Deep
36
v
v
T-K Kim, I Budvytis, R. Cipolla, IJCV 2012
Imperial College, Univ. of Cambridge
37
Decision tree Boosting
• Many short paths for speeding up
• Preserving (smooth) decision regions for good
generalisation
Converting a boosting classifier to a
decision tree
• Many short paths for speeding
up
• Preserving (smooth) decision
regions for good generalisation
38
6
8
11
16
13
2
2
3
2
1
4
7
5 times
speed up
Boosting
Super tree
Converting a boosting classifier to a
decision tree
39
Boolean optimisation formulation
• For a learnt boosting classifier
split a data space into 2m primitive regions by m binary
weak-learners.
• Code regions Ri i=1,..., 2m by boolean expressions.
40
R3
R5
R6
R1 R2
R4 R7
W1
0
0
0
1
1
1
W3
W1 W2 W3 C
R1 0 0 0 0
R2 0 0 1 0
R3 0 1 0 0
R4 0 1 1 1
R5 1 0 0 1
R6 1 0 1 1
R7 1 1 0 1
R8 1 1 1 x
W2
Boolean expression minimization
• Optimally joining the regions of the same class label or don’t
care label.
• A short tree built from the minimised boolean expression.
41
W2
R3
R5
R6
R1 R2
R4 R7
W1
0
0
0
1
1
1
W3
W1 W2 W3 C
R1 0 0 0 0
R2 0 0 1 0
R3 0 1 0 0
R4 0 1 1 1
R5 1 0 0 1
R6 1 0 1 1
R7 1 1 0 1
R8 1 1 1 x
0 1
0
0
1
1
W1
W2
W3
1 0
0
1
R4
R5,R6,R7,R8
R1,R2
R3
W1W2W3 v W1W2W3 v W1W2W3 vW1W2W3 W1 v W1W2W3
Boolean optimisation formulation
• Optimally short tree defined with average
expected path length of data points
where p(Ri)=Mi/M and the tree must duplicate the Boosting
decision regions.
42
Solutions Boolean expression minimization
Growing a tree from the decision regions
Extended region coding
Examples
generated from
GMMs
Synthetic data exp1
43
Face detection experiment
Boosting Fast exit [Zhou 05] Super tree
No. of weak learners
False positive rate
False negativerate
Average path length
False positiverate
False negativerate
Average path length
False positive rate
False negative rate
Average path length
20 501 120 20 501 120 11.70 476 122 7.51
40 264 126 40 264 126 23.26 258 128 15.17
60 222 143 60 222 143 37.24 246 126 13.11
100 148 146 100 148 146 69.28 145 152 15.1
200 120 143 200 120 143 146.19 128 146 15.8
44
Total data points = 57499
The proposed solution is about 3 to 5 times faster than boosting and
1.5 to 2.8 times faster than [Zhou 05], at the similar accuracy.
ST vs Random forest
• ST is about 2 to 3 times faster than RF at similar accuracy
45
46
Experiments on tracking and
segmentation by ST
Segmentation by binary-class
RFs
47
global :
74.50%
average:
79.89%
global:
88.33%,
average:
87.26%
global:
77.46%,
average:
80.45%
global:
85.55 %,
average:
85.24 %
Building Non-building Error
Tree Non-tree Error Car Non-car Error
Road Non-road Error
Multi-Class Boosting
Multi-view and multi-category
object detection
• Images exhibit multi-modality.
• A single boosting classifier is not sufficient.
• Manual labeling sub-categories.
49
Images from Torralba et al 07
Multiple Classifier Boosting
(MCBoost)
T-K Kim, R Cipolla, NIPS 08
Problem
51
K-means clustering
Face cluster 1
Face cluster 2 MCBoost
MCBoost: Multiple Classifier Boosting
• Objective ftn:
• The joint prob. as a Noisy-OR:
where
• By AnyBoost Framework [Mason et al. 00]
– For t=1 to T
For k=1 to K
– Update sample weights
52
State diagram
53
Toy XOR classification problem
• Discriminative clustering of positive samples.
• It needs not partition an input space.
54
Toy XOR classification problem
• Matlab demo
classifier1 classifier2 classifier3
We
akle
arn
er
we
igh
t
Boosting round 10 20 30
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
1.2
1.1
55
Experiments
• INRIA pedestrian data set containing 1207 pedestrian images and
11466 random images.
• PIE face data set involving 1800 face images and 14616 random
images.
• A total number of 21780 simple rectangle features.
Face
imag
es
Pe
de
stri
an im
age
s Random images + simple features
Image cluster centers
K=5
K=3
56
Pedestrian detection by MCBoost [Wojek, Walk, Schiele et al CVPR09]
57
• TUD
Brussels
onboard
dataset
58
reca
ll
1-precision
1.0
0.0 0.0 1.0
Pedestrian detection by MCBoost [Wojek, Walk, Schiele et al CVPR09]
59
Multi-class Boosting +
Speeding up
Multiclass object detection [Torralba
et al PAMI 07]
• Learning multiple boosting classifiers by sharing features
• Tree structure speeds up the classification
61
62
Multiclass object detection [Torralba
et al PAMI 07]
63
Multiclass object detection [Torralba et
al PAMI 07]
ClusterBoost [Wu et al ICCV07]
• Algorithm – Start with one sub-category
– Select a feature per branch by Boosting
– Divide the branch if the feature is too weak by clustering on the feature value
– Continue to grow while updating the weak-learners of parent nodes
• Cf. VBT [Huang et al ICCV05],PBT [Tu ICCV05]
64
Feature sharing
←Splitting
←Splitting
ClusterBoost result
65
AdaTree Result [Grossmann 04]
66
1% Noise in data
Erro
r
Boosting round
10% Noise in data
Erro
r
Mean computation cost
0.5
0.4
0.3
0.2
0.1
0 0 90
0.5
0.4
0.3
0.2
0.1
0 0 90
Multiple Classifier System
Committee Machine
Mixture of Experts [Jordan, Jacobs 94]
Gating network encourages specialization (local
experts) instead of cooperation
68
…
output
y1(x)
y2(x)
yL(x)
Input x S
g1(x)
g2(x)
gL(x)
Expert 1
Expert 2
Expert L
Gating Network
Gating ftn
69
…
output
y1(x)
y2(x)
yL(x)
Input x S
Expert 1
Expert 2
Expert L
constant g1
g2
gL
Ensemble learning:
Boosting and Bagging
• Cooperation instead of specialization
Committee Machine
70
The average sum-of-squares error is
The average error by acting individually
is
The expected error of the committee is
Committee Machine
71
Randomised Decision Forests
(Basics)
Binary Decision Trees
1
2 3
6 7 4
9
5
8
category c
split nodes
leaf nodes v
10 11 12 13
14 15 16 17
• data vector v
• split functions fn(v)
• thresholds tn
• Classifications Pn(c)
≥
<
<
≥
Slide credit to J. Shotton
……
↔
73
• Train data:
• Recursive algorithm
– set In of training examples that reach node n is split:
• Features f and thresholds t chosen at random
• Choose f and t to maximize
gain in information
Randomized Tree Learning
left split
right split
category c
In
Il Ir
category c category c 74
Slide credit to J. Shotton
• Forest is an ensemble of decision trees
• Bagging (Bootstrap AGGregatING) – For each set, it randomly draws examples from the
uniform dist. allowing duplication and missing.
Forest of Trees : Learning
…… tree t1 tree tT
DS1 DST
DS
75
• Forest is an ensemble of decision trees
– Classification is done by
Forest of Trees : Recognition
…… tree t1 tree tT
category c
category c split nodes
leaf nodes
v v
76
Overfit
v
vs
Reasonably Smooth
x
…… tree t1 tree tT tree t2
77
DT RF
Decision Tree vs Random Forest
Wrap-up
79
Boosting
Bagging
Tree hierarchy
A decision stump
Random ferns?
Decision tree
Random forest
Boosted decision trees
Bagged boosting classifiers
Boosted decision stumps
Inspired by Yin, Crinimisi CVPR07
Comparisons in literature
In motion segmentation [Yin, Crinimisi CVPR07],
• RF>GB(Boosting stumps)>=BT(Boosting trees) in accuracy
• RF=GB>BT in speed at their best accuracy
In face detection/recognition [Belle et al ICPR08],
• RF>AB (AdaBoost) at low false positive rate,
• AB>RF at high false positive rate, in face detection accuracy
• RF>AB in speed
• SVM>RF in face recognition
80
Summary
81
Pros Cons
Random Forest
• Generalization through random samples/features • Very fast classification • Inherently multi-classes • Simple training
• Inconsistency • Difficulty for adaptation
Boosting Decision Stumps
• Generalisation by a flat structure • Fast classification • Optimisation framework
• Slower than RF • Slow training
PART II
Randomised Forests
(by Case Studies)
82
83
f(x)
ΔE
In
Il Ir
Split node
x
……
y
Tree 1 Tree 2 Tree N
p(c)
Leaf node
…
More discriminative split functions
BMVC12
Multiple split criteria /adaptive weighting
CVPR13 ICCV13
Capturing structural info. BMVC10
Probabilistic outputs
ICCV11 ECCV10
Part II (Recap)
Random Forest Codebook
1. Capturing Structural + Semantic
Info.
84
p(c)
Leaf node
… Capturing structural info. BMVC10
Real-time Action Categorisation
85
T. Yu, T-K Kim, R. Cipolla, BMVC 2010
Univ. of Cambridge, Imperial College
A novel real-time solution for action
recognition
• Our contribution is three-fold:
Efficient codebook learning
High run-time performance
Local appearance + structural information
86
Typical Approaches
Feature Encoding
Feature Matching
K-means Clustering
Slow for Large Codebook
The “Bag of Words” (BOW) Model
Lacks Structural Information
Quantisation Error
Our Method
Semantic Texton Forest
Efficient
PSRM Structural Information
Hierarchical Matching
Robust
Comparison with existing
approaches
87
Overview
Spatiotemporal Semantic Texton Forest
V-FAST Corner
PSRM
BOST Random Forest Classifier
K-means Forest
Results
Spatio-temporal Cuboids
Feature detection
Feature extraction
Feature matching
Classification
88
Building a codebook using STF
• Extract small video cuboids at detected keypoints
• Visual codebook using STF:
• Efficient visual codebook
• One feature → multiple codewords.
• Quantisation and partial matching
Random forest based codebook
• Work on pixels directly
• Hierarchical splits
“Textonises” patches recursively
89
Randomized Forest Codebook
5 4 3
1
2
8 9 6 7
4 2 6 1 3
9 8
……
tree t1 tree tT
7 5
Example Cuboids
6
Example Cuboids
6
… FASTEST Corner
…
Histogram of Forest Codewords
5 4 3
1
2
8 9 6 7
4 2 6 1 3
9 8
……
tree t1 tree tT
7 5
freq
ue
ncy
tree t1 tree tT
node index
“bag of words”
……
Extremely Fast Powerful Discriminative Codebook
TREE N
TREE N
Pyramidal Spatiotemporal Relationship Match
(PSRM)
92
Multiple Structural Relationship Histograms
Pyramid Match Kernel (PMK)
Pyramidal Spatiotemporal
Relationship Match (PSRM)
93
Experiments
• Short video sequences (50 frames ~ 2 seconds) are
extracted from the input video.
• Sampling frequency is 5 frames for experiment and 1
frame for the laptop demo. (so it is a frame-by-frame recognition)
• Two datsets are used for performance evaluation:
• The standard benchmark
• Six classes, with viewpoint changes, illumination changes, zoom , etc.
KTH dataset
• Human interactions, 6 classes of actions, cluttered background
UT dataset (for ICPR contest on Semantic Description of Human Activities 2010)
• Intel Core i7 920 (for accuracy and speed tests)
• Core 2 Duo P9400 (for laptop demo)
Hardware specifications
KTH dataset
UT interaction dataset
94
Experiments: Results (KTH dataset)
90 91 92 93 94 95 96 97 98 99 100
Mined features (ICCV2009)
CCA (CVPR2007)
Neighbourhood (CVPR2010)
Info. Max. (CVPR2008)
Shape-motion tree (ICCV2009)
Vocabulary Forest(CVPR2008)
Point clouds (CVPR2009)
our method (sequence)
our method (snippets)
96.7
95.33
94.53
94.15
93.43
93.17
93.17
95.67
93.55
Comparison with recent state-of-the-art
• Comparable to most state-of-the-art.
• Around ~3% slower than the top performer
• Is it a sensible trade-off?
• Useful for many more practical applications. (surveillance, robotics, etc.)
snippet: subsequence level recognition
sequence: major voting of subsequence labels
leave-of-out-cross-validation
Leave-of-out-cross-validation
95
Experiments: Results
• Results: UT interaction dataset
• Run time performance
PSRM and BOST gave low accuracies when applied separately.
~20% performance improved by simply combining the class labels!
< 25 fps, but enough for most real-time applications
Can be further optimised (e.g. GPU, mult-core processing)
96
2. Probabilistic Interpretation
of RF Output
98
p(c)
Leaf node
… Probabilistic outputs
ICCV11 ECCV10
using 3D Shape Priors
99
Y. Chen, T-K Kim, R. Cipolla, ICCV 2011, ECCV 2010
Univ. of Cambridge, Imperial College
Object Phenotype Recognition
Pose Estimation
• Signal: Poses
• Noise: Shape
variations + camera
view points
Challenges:
Pose and viewpoint variations >> phenotype variations
Phenotype
Recognition vs
• Signal: Shapes
• Noise: Poses +
camera view points
100
• Shape Generator (MS) by GPLVM
– Training Set: shapes in the canonical pose.
– Input: PCA coefficients of vertex positions.
Learning 3D Shape Priors
101
• Pose Generator (MA) by GPLVM
– Training Set: Synthetic 3D poses sequence.
– Input: PCA coefficients of vertex positions and
vertex-wise Jacobian matrices.
Learning 3D Shape Priors
102
The Graphical Model
Shape Synthesis
Pose Generator
Shape Generator
Shape Synthesis: Demo
Shape Generator Pose Generator (Running)
104
Shape Generator Pose Generator (Running)
Shape Synthesis: Demo
105
Shape Generator Pose Generator (Running)
Shape Synthesis: Demo
106
Shape Generator Pose Generator (Running)
Shape Synthesis: Demo
107
Shape Generator Pose Generator (Running)
Shape Synthesis: Demo
108
Shape Generator Pose Generator (Running)
Shape Synthesis: Demo
109
Shape Generator Pose Generator (Running)
Shape Synthesis: Demo
110
Shape Generator Pose Generator (Running)
Shape Synthesis: Demo
111
Shape Generator Pose Generator (Running)
Shape Synthesis: Demo
112
Shape Generator Pose Generator (Running)
Shape Synthesis: Demo
113
Shape Synthesis
Zero Shape
V0
Pose Generator
MA
Shape Generator
MS
VA VA
VS VS
Shape Transfer
V
V
114
Solution: Two-Stage Model • Main Ideas:
Discriminative + Generative
• Two stages:
1. Hypothesising – Discriminative;
– Using random forests;
2. Shape Synthesis and Verification – Generative;
– Synthesising 3D shapes using shape priors;
– Silhouette verification.
• Recognition by a model selection process.
115
• Use 3 RFs to quickly hypothesize phenotype,
pose, and camera parameters.
• Learned on synthetic silhouettes generated by the
shape priors.
Parameter Hypothesizing
FA:
Pose classifier
FC:
Camera pose classifier
FS:
Phenotype classifier
(canonical pose)
116
Examples of RF Classifiers
The phenotype
classifier The pose
classifier
117
Shape Synthesis and Verification
• Generate 3D shapes V
– From candidate parameters
given by RFs.
– Use GPLVM shape priors
[Chen et al.’10].
• Compare the projection of V
with the query silhouette Sq.
– Oriented Chamfer matching
(OCM). [Stenger et al’03] 118
• We infer the label c∗ of the query instance by
maximizing the posteriori probability
119
• We infer the label c∗ of the query instance by
maximizing the posteriori probability
120
Experiments
• Testing data:
– Manually segmented silhouettes;
• Current Datasets
– Human jumping jack
• (13 instances, 170 images);
– Human walking
• (16 instances, 184 images);
– Shark swimming
• (13 instances, 168 images).
• Phenotype Categorisation
121
Comparative Approaches:
• Learn a single RF phenotype classifier;
• Histogram of Shape Context (HoSC) – [Agarwal and Triggs, 2006]
• Inner-Distance Shape Context (IDSC) – [Ling and Jacob, 2007]
• 2D Oriented Chamfer matching (OCM) – [Stenger et al. 2006]
• Mixture of Experts for the shape reconstruction – [Sigal et al. 2007].
– Modified into a recognition algorithm
122
Comparative Approaches:
• Internal comparisons:
– Proposed method with both feature error
modelling and similarity-aware criteria
function (G+D);
– Proposed method w.o feature error
modelling (G+D–E);
– Proposed method w.o similarity-aware
criteria function (G+D–S)
• Using standard diversity index instead.
123
Recognition Performance • Cross-validation by splitting the dataset instances.
• 5 phenotype categories for every test.
• Selecting one instance from each category.
124
Experiments on Single View
Reconstruction Sharks:
125
Experiments on Single View
Reconstruction
Humans:
126
3. More discriminative split
functions
127
f(x)
ΔE
In
Il Ir
Split node
More discriminative split functions
BMVC12
Classification forest: the split function model
Split ftn: axis aligned Split ftn: oriented line Split ftn: conic section
Examples of split functions
See Appendix C for relation with kernel trick.
Slide credit to Criminisi and Shotton, ICCV11 tutorial
Fast Pedestrian Detection by Cascaded
Random Forest with Dominant Orientation
Templates
129
D. Tang, Y. Liu, T-K Kim, BMVC 2012
Imperial College
Fastest Pedestrian Detection by Cascaded
Random Forest with Dominant Orientation
Templates
Beginning with Hough Forest … [Gall et al 2009]
Random Forest + Implicit Shape Model
Multi-cue features (HOG etc)
Hough voting
Patch input
130
... ≤ τ > τ ≤ τ > τ
2 pixel test
(axis aligned
splits) i
j
f(p)=I(i)-I(j)
c
Experiments - Accuracy
1. Dominant Orientations; 2. 2-Pixel Test; 3. Template Matching;
4. Dominant Colours; 5. Cascade; 6. Clustered Votes.
131
Experiments - Efficiency
132
Multi-cue features (HOG etc)
Hough voting
Patch input
133
... ≤ τ > τ ≤ τ > τ
2 pixel test
(axis aligned splits)
i
j
f(p)=I(i)-I(j)
Beginning with Hough Forest … [Gall et al 2009]
To accelerate run-time speed
Extract gradient information
Discretize dominant orientations into 1 byte
Discard magnitude information
134
Binarization of HOG…
Dominant Orientation Templates
[Hinterstoisser, et al, CVPR10]
DOT
Hough voting
Patch input
135
... ≤ τ > τ ≤ τ > τ
2 pixel test
(axis aligned
splits) i
j
f(p)=I(i)-I(j)
Beginning with Hough Forest … [Gall et al 2009]
To accelerate run-time speed
Experiments - Accuracy
1. Dominant Orientations; 2. 2-Pixel Test; 3. Template Matching;
4. Dominant Colours; 5. Cascade; 6. Clustered Votes.
136
Template matching split function
2 pixel test
(axis aligned
splits),
efficient but
incapable… 137
Template matching split function
Template
based
(nonlinear),
capable but
efficient? 138
139
Template Matching Split Function
using Binary Bit Operations
0 0 1 0 0 1 0 0
0 0 1 0 1 0 0 0
1
≤ τ > τ
DOT
Hough voting
Patch input
140
... ≤ τ > τ ≤ τ > τ
Template
Matching Split
Beginning with Hough Forest … [Gall et al 2009]
To accelerate run-time speed
Experiments - Accuracy
1. Dominant Orientations; 2. 2-Pixel Test; 3. Template Matching;
4. Dominant Colours; 5. Cascade; 6. Clustered Votes.
141
Experiments - Efficiency
142
DOT
Hough voting
Patch input
143
... ≤ τ > τ ≤ τ > τ
Template
Matching Split
Beginning with Hough Forest … [Gall et al 2009]
To accelerate run-time speed
Architecture
DOT Feature Extraction from Input
...
Holistic Random Forest
⊗
...
Patch-based Random Forest
⊗
Hough Voting Output
Applied DOT to pedestrian detection, and extended to colour domain
Template matching split function
Cascade of detectors
Clustering hough votes (similar to Girshick et al, ICCV11) 144
Experiments - Accuracy
1. Dominant Orientations; 2. 2-Pixel Test; 3. Template Matching;
4. Dominant Colours; 5. Cascade; 6. Clustered Votes. 145
Experiments - Efficiency
Time per window: 1.07x10-6 sec
5 Frames per sec
Cf. Fastest Pedestrian Detector in the West(FPDW) [Dollar 2010]: 5 fps [Benenson et al 2012]: 50 fps by GPU 146
147
4. Multiple split criteria
148
f(x)
ΔE
In
Il Ir
Split node
Multiple split criteria /adaptive weighting
CVPR13 ICCV13
Unconstrained Monocular 3D Human
Pose Estimation by Action Detection
T. Yu, T-K Kim, R. Cipolla, CVPR 2013
Univ. of Cambridge, Imperial College
Challenges
Typical techniques for 3D human pose estimation:
• Segmentation
• Multi-camera approach
• Depthmaps
• Inverse kinematics
Multiple cameras with invserse kinematics [Bissacco et al. CVPR2007] [Yao et al. IJCV2012] [Sigal IJCV2011]
Learning-based (regression) [Navaratnam et al. BMVC2006] [Andriluka et al. CVPR2010]
Specialized hardware (e.g. structured light sensor, TOF camera) [Baak et al. ICCV2011] [Ye et al. CVPR2011] [Sun et al. CVPR2012]
Require controlled environments /
special hardware!
Our Approach
• Action recognition + detection
1. Actions are temporally and spatially
structured • Action recognition:
select the best manifold for pose regression. [Yao et al. IJCV2012]
• Action detection:
Detect the occurence of an action in space-time [Yang, ICCV2009]
Actions provides strong prior for pose estimation
Our Approach
2. Unconstrained part detection • Low level feature descriptors are replaced by a deformable part models (DPM) to detect high
level 2D body parts in the video (e.g. Pictorial structure [FelzenswalbIJ and Huttenlocher,
IJCV2005])
2D HPE
•Body parts detection (The pose Is known by detecting body parts)
•A problem of 2D detection and tracking
3D HPE
•Full 3D pose estimation
•A problem of optimization and manifold learning
[Eichner et al. IJCV2012] [Yang and Ramanan.ICCV2011]
[Andriluka et al. CVPR2009]
System overview
Method: Feature extraction
Input: • Action snippet, short, overlapping video subsequences
[Schindler CVPR2009]
• A DPM, using HoG, as an off-the-shelf 2D body part detector.
(Any 2D part detector will do)
Feature representation • Pairwise displacements between parts – 325 dimensions
• Dimensionality reduction: PCA – 6 dimensions (separately per
action)
For each frame • A pose is represented by a bag of feature vectors extracted from
the “action snippet”
Action Detection
Action as an atomic sequence of shape
Each detection provides the
spatiotemporal structure of an action.
Hough Forest: Each frame in the video votes
for the current pose. ACTION: CLAP
ACTION: DANCE
Locate space-time position
Time
Classify action
Reference
Starting Time
Method: Action Detection
•Randomized Hough Forest for action recognition and pose estimation [Gall et al. PAMI2011]
• Action and pose are estimated from a feature vector simultaneously.
• At each split, the quality measure is:
• Ha is the information gain used in standard classification forest
• Where Hp is:
• Which describes the spread of pose in a node
= covariance
Method: Action Detection
•Randomized Hough Forest for action
recognition and pose estimation [Gall et al. PAMI2011]
• ω is the purity of the current tree node
• Each terminal node gives
1. The posterior of the action
P(action class | features)
2. A vote of estimated pose
Action
detection
Action recog.
Method: Action Detection • (Testing)
• Pass down feature vectors extracted from a video snippet
• Action classification: same as typical randomized classification forest.
• Pose: Distributions of votes (joint positions) is modeled as 3D Gaussians.
Pose Regression & Combination
Global:
Action detection Local:
Regression Forests
Final 3D pose estimation:
A mixture of 3D Gaussians
Joint positions are
refined locally by
regression forests
Method: Pose Refinement
•Randomized Regression Forest
for local joint position
refinement •Differences among individual action instance (of the same
class) are corrected using regression forests [Criminisi et al.
MACCAI2010]
• Input:
• Training feature vectors
• Corresponding 3D joint locations
• Each forest refines one joint location
• so there are 15 forests (150 trees!).
• Output: Joint locations in 3D Gaussians
Method: Final Pose Estimation
•Compute the final 3D pose estimation
•Assuming the pose distributions from
(1) Action detection, and
(2) Pose regression
are independent
• the final pose estimation is computed by the joint distribution of Gaussians in (1) and (2)
•Traditional method: single output, global distribution.
•Our approach produces distributions of local "confidence levels" of joint locations.
Experiments
• A relatively new problem • No public dataset that provides both 2D and 3D ground truths
• We collected a new challenging dataset
(APE dataset) • 2D and 3D “ground truths” from Kinect
• 7 action classes, 7 different subjects, 5 sequences / class / subject
• Different environment for each subject
• No segmentation can be applied
• Moving backgrounds and occlusions
• Unseen testing environments --- traditional methods do not work in this dataset!
Bend Box Wave 1 Wave 2 Balance Dance Clap
Experiments • Comparison with the state-of-the-art 2D pose estimator
• A better accuracy is observed: Action detection (ad joint refinement) which gives
strong cues to the 3D pose
• Limitation: the action performed needs to be seen during the training phase.
Per-class localization error (pixels)
Experiments • Demo Video
1. APE dataset
2. Extra challenges: KTH and Weizmann dataset
1. Originally used in action recognition
2. Extremely low resolution for typical 3D HPE KTH dataset: 160 x 120 pixels, each subject is ~80-100 pixels high.
Weizmann dataset: 180 x 144 pixels, each subject is about ~60-75 pixels high.
KTH action dataset Weizmann action dataset
165
166
Multiple Split Criteria / Adaptive
Weighting … goes on …
For Hand Pose Estimation
167
168
Summary
Boosting
Cascade
169
x
x
x
…… t1 tT t2
Boosting
Decision tree Randomised Decision Forest
[IJCV 2012]
[NIPS 2008]
Part I
170
f(x)
ΔE
In
Il Ir
Split node
x
……
y
Tree 1 Tree 2 Tree N
p(c)
Leaf node
…
More discriminative split functions
BMVC12
Multiple split criteria /adaptive weighting
CVPR13 ICCV13
Capturing structural info. BMVC10
Probabilistic outputs
ICCV11 ECCV10
Part II
171
What other topics @ ICL
1 2 3
Thanks to
Jamie Shotton
Roberto
Cipolla
Wenhan Luo Akis Tsiotsios Yuru Pei Yang Liu
Tsz-Ho
Yu
Danhang
Tang
Yu
Chen
A. Doumanoglou Chanki Yu Chao Xiong Mang Shao
Kyuhwa Lee Alykhan Tejani
Ignas Budvytis
Tom
Woodley
Bjorn Stenger
173
Any Questions?
http://www.iis.ee.ic.ac.uk/icvl
top related