TRANSCRIPT
Machine learning and imprecise probabilities for
computer vision
Fabio Cuzzolin
IDIAP, Martigny, 19/4/2006
Myself
Master’s thesis on gesture recognition at the University of Padova
Visiting student, ESSRL, Washington University in St. Louis
Ph.D. thesis on the theory of evidence
Young researcher in Milan with the Image and Sound Processing group
Post-doc at UCLA in the Vision Lab
My research
Discrete mathematics
linear independence on lattices
Belief functions and imprecise probabilities
geometric approach
algebraic analysis
combinatorial analysis
Computer vision
object and body tracking
data association
gesture and action recognition
Computer Vision: HMMs and size functions for gesture recognition; compositional behavior of hidden Markov models; volumetric action recognition; data association with shape information; evidential models for pose estimation; bilinear models for view-invariant gaitID; Riemannian metrics for motion classification
Imprecise probabilities: geometric approach; algebraic analysis
Approach
Problem: recognizing an example of a known category of gestures from image sequences
Combination of HMMs (for dynamics) and size functions (for pose representation)
Continuous hidden Markov models
EM algorithm for parameter learning (Moore)
Example
transition matrix A -> gesture dynamics
state-output matrix C -> collection of hand poses
The gesture is represented as a sequence of transitions between a small set of canonical poses
Size functions
Hand poses are represented through their contours
[figure: real image, measuring function, family of lines, size function table]
Gesture classification
[diagram: HMM 1, HMM 2, …, HMM n]
The EM algorithm is used to learn HMM parameters from an input feature sequence
The new sequence is fed to the learnt gesture models; each produces a likelihood, and the most likely model is chosen (if above a threshold)
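The decision rule just described — pick the model with the highest likelihood, reject if even that score falls below a threshold — can be sketched as follows (the model names, scores, and threshold are illustrative, not from the talk):

```python
def classify_gesture(log_likelihoods, threshold):
    """Pick the gesture model with the highest log-likelihood.

    log_likelihoods: dict mapping model name -> log P(sequence | model).
    Returns the best model's name, or None when even the best score
    falls below the rejection threshold (unknown gesture).
    """
    best_model = max(log_likelihoods, key=log_likelihoods.get)
    if log_likelihoods[best_model] < threshold:
        return None  # no model is confident enough
    return best_model

# Hypothetical scores for a new feature sequence
scores = {"fly": -12.3, "cycle": -40.1, "wave": -35.7}
print(classify_gesture(scores, threshold=-20.0))  # fly
print(classify_gesture(scores, threshold=-5.0))   # None
```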
Composition of HMMs
Compositional behavior of HMMs: the model of the action of interest is embedded in the overall model
Clustering: states of the original model are grouped into clusters, and the transition matrix is recomputed accordingly:
P(X_{t+1} \in C_k \mid X_t = e_i) = \sum_{e_j \in C_k} P(X_{t+1} = e_j \mid X_t = e_i)
State clustering Effect of clustering on HMM topology
“Cluttered” model for the two overlapping motions
Reduced model for the “fly” gesture extracted through clustering
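The recomputation step can be sketched numerically. Summing the transition matrix over the states of each destination cluster follows the formula above; how to merge the rows of a source cluster into one is not specified on the slides, so the uniform averaging used here is an assumption of this sketch:

```python
import numpy as np

def cluster_transition_matrix(A, clusters):
    """Recompute an HMM transition matrix after grouping states.

    A: (n, n) row-stochastic matrix, A[i, j] = P(X_{t+1}=e_j | X_t=e_i).
    clusters: list of lists of state indices, one list per cluster.
    Summing A over a destination cluster gives P(X_{t+1} in C_k | X_t = e_i);
    rows belonging to the same source cluster are then merged by uniform
    averaging (an assumption, not from the slides).
    """
    K = len(clusters)
    A_red = np.zeros((K, K))
    for h, src in enumerate(clusters):
        for k, dst in enumerate(clusters):
            A_red[h, k] = A[np.ix_(src, dst)].sum() / len(src)
    return A_red

A = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.4, 0.6]])
print(cluster_transition_matrix(A, [[0, 1], [2]]))
```

The reduced matrix stays row-stochastic, so it is again a valid HMM transition matrix over the clusters.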
Kullback-Leibler comparison
We used the K-L distance to measure the similarity between models extracted from clutter and models learned in its absence
KL distances between “fly” (solid) and “fly from clutter” (dash)
KL distances between “fly” and “cycle”
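For reference, the discrete Kullback-Leibler divergence is easy to compute; note that for HMMs there is no closed form, and in practice the distance is usually approximated by averaging log-likelihood ratios over sequences sampled from one model — the helper below covers only the plain discrete case:

```python
import math

def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence D(p || q).

    p, q: sequences of probabilities over the same finite alphabet.
    Terms with p_i = 0 contribute nothing by convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

print(kl_divergence([0.5, 0.5], [0.9, 0.1]))  # positive: distributions differ
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0
```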
Volumetric action recognition
Problem: recognizing the action performed by a person viewed by a number of cameras
2D approaches: features are extracted from single views -> viewpoint dependence
Volumetric approach: features are extracted from a volumetric reconstruction of the moving body
Locally linear embedding to find a topological representation of the moving body
3D feature extraction
Linear discriminant analysis (LDA) to estimate the direction of motion as the direction of maximal separation between the legs
k-means clustering to separate body parts
Uncertainty descriptions
A number of formalisms have been proposed to extend or replace classical probability:
e.g. possibilities, fuzzy sets, random sets, monotone capacities, gambles, upper and lower previsions
Theory of evidence (A. Dempster, G. Shafer): probabilities are replaced by belief functions, Bayes’ rule is replaced by Dempster’s rule, and families of domains allow multiple representations of evidence
Belief functions
Probability on a finite set Θ: a function p: 2^Θ -> [0,1] with p(A) = \sum_{x \in A} m(x), where m: Θ -> [0,1] is a mass function meeting the normalization constraint
Probabilities are additive: if A \cap B = \emptyset then p(A \cup B) = p(A) + p(B)
Belief function s: 2^Θ -> [0,1], defined as
s(A) = \sum_{B \subseteq A} m(B)
where m is a mass function on 2^Θ such that \sum_{B \subseteq \Theta} m(B) = 1
Belief functions are not additive
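The belief/mass relationship can be sketched in a few lines: a belief value is the total mass of all focal elements contained in the queried set. The mass function below is illustrative:

```python
def belief(mass):
    """Build the belief function bel(A) = sum of m(B) over all B subset of A.

    mass: dict mapping frozenset (focal element) -> mass value
    (non-negative, summing to 1). Returns a function evaluating
    bel on any subset of the frame.
    """
    def bel(A):
        A = frozenset(A)
        return sum(m for B, m in mass.items() if B <= A)
    return bel

m = {frozenset({"a1"}): 0.7, frozenset({"a1", "a2"}): 0.3}
bel = belief(m)
print(bel({"a1"}))        # 0.7
print(bel({"a2"}))        # 0 -- yet bel({a1, a2}) is 1: belief is not additive
print(bel({"a1", "a2"}))
```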
Dempster’s rule
In the theory of evidence, new information encoded as a belief function is combined with old beliefs in a revision process
Belief functions (with Bel(A) = \sum_{B \subseteq A} m(B)) are combined through Dempster’s rule: s = s' \oplus s''
The rule operates on the intersections of the focal elements A_i of s' and B_j of s'':
m(A) \propto \sum_{A_i \cap B_j = A} m_1(A_i)\, m_2(B_j)
Example of combination
Frame Θ = {a1, a2, a3, a4}
s1: m({a1}) = 0.7, m({a1, a2}) = 0.3
s2: m(Θ) = 0.1, m({a2, a3, a4}) = 0.9
s1 \oplus s2:
m({a1}) = 0.7*0.1/0.37 = 0.19
m({a2}) = 0.3*0.9/0.37 = 0.73
m({a1, a2}) = 0.3*0.1/0.37 = 0.08
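The example can be reproduced with a short implementation of Dempster's rule for mass functions stored as dictionaries (the conflicting mass here is 0.7 * 0.9 = 0.63, hence the 0.37 normalization):

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    m1, m2: dicts mapping frozenset (focal element) -> mass.
    Masses of each pair of focal elements are multiplied and assigned
    to their intersection; the mass landing on the empty set (conflict)
    is discarded and the remainder renormalized.
    """
    combined = {}
    conflict = 0.0
    for (A, mA), (B, mB) in product(m1.items(), m2.items()):
        C = A & B
        if C:
            combined[C] = combined.get(C, 0.0) + mA * mB
        else:
            conflict += mA * mB
    norm = 1.0 - conflict
    return {C: v / norm for C, v in combined.items()}

theta = frozenset({"a1", "a2", "a3", "a4"})
s1 = {frozenset({"a1"}): 0.7, frozenset({"a1", "a2"}): 0.3}
s2 = {theta: 0.1, frozenset({"a2", "a3", "a4"}): 0.9}
for C, v in dempster_combine(s1, s2).items():
    print(sorted(C), round(v, 2))  # matches the 0.19 / 0.73 / 0.08 above
```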
JPDA with shape info
Robustness: clutter does not meet the shape constraints
Occlusions: occluded targets can be estimated
JPDA model: independent targets
Shape model: rigid links
Dempster’s fusion
Body tracking
Application: tracking of feature points on a moving human body
Pose estimation
Estimating the “pose” (internal configuration) of a moving body from the available images
Salient image measurements: features
[diagram: image sequence from t=0 to t=T with pose estimates q̂ in the configuration space Q]
Model-based estimation
If you have an a-priori model of the object…
…you can exploit it to help (or drive) the estimation
Example: kinematic model
Model-free estimation
If you do not have any information about the body…
…the only way to do inference is to learn a map between features and poses directly from the data
This can be done in a training stage
Collecting training data
Motion capture system: 3D locations of markers = pose
Training data
When the object performs some “significant” movements in front of the camera…
…a finite collection of configuration values q_1, …, q_T is provided by the motion capture system
…while a sequence of features y_1, …, y_T is computed from the image(s)
Learning feature-pose maps
Hidden Markov models provide a way to build feature-pose maps from the training data
A Gaussian density for each state is set up on the feature space -> approximate feature space
A map between each region and the set of training poses q_k whose feature value y_k falls inside it
Evidential model
Approximate feature spaces…
…and the approximate parameter space…
…form a family of compatible frames: the evidential model
Human body tracking
two experiments, two views
four markers on the right arm
six markers on both legs
Feature extraction
three steps: original image, color segmentation, bounding box
Performance
Comparison of three models: left view only, right view only, both views
[plot legend: pose estimation yielded by the overall model; estimate associated with the “right” model; the “left” model; ground truth]
GaitID
The problem: recognizing the identity of humans from their gait
Typical approaches: PCA on image features, HMMs; people typically use silhouette data
Issue: view-invariance — it can be addressed via 3D representations, but 3D tracking is difficult and sensitive
Bilinear models
From view-invariance to “style” invariance
In a dataset of sequences, each motion possesses several labels: action, identity, viewpoint, emotional state, etc.
Bilinear models (Tenenbaum) can be used to separate the influence of two of those factors, called “style” and “content” (the label to classify):
y^{SC} = A^S b^C
where y^{SC} is a training set of k-dimensional observations with labels S and C, b^C is a parameter vector representing content, and A^S is a style-specific linear map from the content space onto the observation space
Content classification of unknown style
Consider a training set in which persons (content = ID) are seen walking from different viewpoints (style = viewpoint)
An asymmetric bilinear model can be learned from it through the SVD of a stacked observation matrix
When new motions are acquired in which a known person is seen walking from a different viewpoint (unknown style)…
…an iterative EM procedure can be set up to classify the content (identity):
E step -> estimation of p(c|s), the probability of the content given the current estimate s of the style
M step -> estimation of the linear map A^s for the unknown style s
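The SVD training step can be sketched as follows. This is a minimal version of Tenenbaum & Freeman's asymmetric-model solution, assuming one mean observation per (style, content) pair is available; the function name and the synthetic data are illustrative. Note the factorization is only unique up to an invertible linear transform, so only the products A^s b^c are directly comparable:

```python
import numpy as np

def fit_asymmetric_bilinear(Y, S, k, J):
    """Fit y^{sc} ~ A^s b^c from a fully labelled training set.

    Y: (S*k, C) matrix whose s-th block of k rows holds, column by column,
       the mean observation for style s and content class c.
    J: dimension kept for the content space.
    The stacked style maps are read off U * Sigma, the content vectors off Vt.
    """
    U, sigma, Vt = np.linalg.svd(Y, full_matrices=False)
    A = (U[:, :J] * sigma[:J]).reshape(S, k, J)   # A[s] is the style-s map
    B = Vt[:J, :]                                  # B[:, c] is content vector b^c
    return A, B

# Synthetic check: 2 styles, 3 content classes, 4-dimensional observations
rng = np.random.default_rng(0)
A_true = rng.normal(size=(2, 4, 2))
B_true = rng.normal(size=(2, 3))
Y = np.vstack([A_true[s] @ B_true for s in range(2)])   # stacked (8, 3) matrix
A, B = fit_asymmetric_bilinear(Y, S=2, k=4, J=2)
print(np.allclose(A[0] @ B, A_true[0] @ B_true))        # reconstruction matches
```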
Three-layer model
Each sequence is encoded as a Markov model, its C matrix is stacked in an observation vector, and a bilinear model is trained over those vectors
Feature representation: projection of the contour of the silhouette onto a sheaf of lines passing through the center
MOBO database
Mobo database: 25 people performing 4 different walking actions, viewed from 6 cameras
Each sequence has three labels: action, id, view
We set up four experiments in which one label was chosen as content, another as style, and the remaining one was treated as a nuisance factor:
Content = id, style = view -> view-invariant gaitID
Content = id, style = action -> action-invariant gaitID
Content = action, style = view -> view-invariant action recognition
Content = action, style = id -> style-invariant action recognition
Results
Compared performance with the “baseline” algorithm and straight k-NN on sequence HMMs
Distances between dynamical models
Problem: motion classification
Approach: representing each movement as a linear dynamical model; for instance, each image sequence can be mapped to an ARMA or AR linear model
Classification then reduces to finding a suitable distance function in the space of dynamical models
We can use this distance in any of the popular classification schemes: k-NN, SVM, etc.
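The point that the distance is a pluggable component of the classifier can be illustrated with a 1-NN rule; the AR(2) parameter pairs, labels, and the Euclidean placeholder distance below are illustrative — any model distance (Martin's distance, a learned pullback metric, …) slots in the same way:

```python
def nearest_neighbour(query, models, labels, distance):
    """1-NN classification with a custom distance between models.

    models: list of model parameters (here, plain tuples of AR coefficients);
    distance: any function d(m1, m2) -> float between two models.
    """
    d, best = min((distance(query, m), lab) for m, lab in zip(models, labels))
    return best

# Toy illustration with AR(2) parameter pairs and a Euclidean distance
euclid = lambda m1, m2: sum((x - y) ** 2 for x, y in zip(m1, m2)) ** 0.5
models = [(0.9, -0.2), (0.1, 0.5)]
labels = ["walk", "run"]
print(nearest_neighbour((0.8, -0.1), models, labels, euclid))  # walk
```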
Riemannian metrics
Some distances have been proposed: Martin’s distance, subspace angles, gap metric, Fisher metric
However, it makes no sense to choose a single distance for all possible classification problems
When some a-priori information is available (a training set)…
…we can learn in a supervised fashion the “best” metric for the classification problem!
Feasible approach: volume minimization of pullback metrics
Learning pullback metrics
Many unsupervised algorithms take as input a dataset and map it to an embedded space, but they fail to learn a full metric
Consider then a family of diffeomorphisms F between the original space M and a metric space N
The diffeomorphism F induces on M a pullback metric
Space of AR(2) models
Given an input sequence, we can identify the parameters of the linear model which best describes it
We chose the class of autoregressive models of order 2, AR(2)
Fisher metric on AR(2):
g(a_1, a_2) = \frac{1}{(1+a_2)(1-a_1-a_2)(1+a_1-a_2)} \begin{pmatrix} 1-a_2 & a_1 \\ a_1 & 1-a_2 \end{pmatrix}
Compute the geodesics of the pullback metric on M
Results
Scalar feature, AR(2) and ARMA models; NN algorithm to classify new sequences
Identity recognition; action recognition
Results – 2
Recognition performance of the second-best distance and the optimal pullback metric
The whole dataset is considered, regardless of the view
[plots: View 1, View 5]
Geometric approach to the ToE
Belief functions can be seen as points of a Euclidean space of dimension 2^n - 2
Each subset A corresponds to the A-th coordinate s(A)
Belief space: the space of all the belief functions on a given frame
S = Cl(P_A, A \subseteq \Theta); it has the shape of a simplex
Geometry of Dempster’s rule
Dempster’s rule can be studied in the geometric setup too: it is a geometric operator mapping pairs of points of the belief space onto another point
Conditional subspaces
[diagram: conditional subspaces, probability simplices P_x, P_y and images F_x(s), F_y(s)]
Compositional criterion: the approximation behaves like s when combined through Dempster’s rule
\hat{p} = \arg\min_{p \in P} \mathrm{dist}(s, p)
Probabilistic approximation
Problem: given a belief function s, finding the “best” probabilistic approximation of s; this can be solved in the geometric setup
Comparative study of all the proposed probabilistic approximations
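One well-known member of the family of probabilistic approximations compared here is Smets' pignistic transform, which splits each focal element's mass evenly among its singletons; the sketch below is only this one candidate, not the geometric argmin solution of the slides:

```python
def pignistic(mass):
    """Smets' pignistic transform of a mass function.

    mass: dict mapping frozenset (focal element) -> mass value.
    Returns BetP(x) = sum over A containing x of m(A) / |A|,
    a probability distribution over the singletons.
    """
    betp = {}
    for A, m in mass.items():
        for x in A:
            betp[x] = betp.get(x, 0.0) + m / len(A)
    return betp

m = {frozenset({"a1"}): 0.4, frozenset({"a1", "a2"}): 0.6}
print(pignistic(m))  # a1: 0.4 + 0.3 = 0.7, a2: 0.3
```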
Lattice structure
Families of frames have the algebraic structure of a lattice
Order relation: existence of a refining
Minimal refinement and maximal coarsening
F is a locally Birkhoff (semimodular with finite length) lattice, bounded below
Total belief theorem
Generalization of the total probability theorem
A-priori constraint and conditional constraint
Whole graph of candidate solutions; connections with combinatorics and linear systems