Machine learning and imprecise probabilities for computer vision
Fabio Cuzzolin
IDIAP, Martigny, 19/4/2006

TRANSCRIPT

Page 1:

Machine learning and imprecise probabilities for

computer vision

Fabio Cuzzolin

IDIAP, Martigny, 19/4/2006

Page 2:

Myself

Master’s thesis on gesture recognition at the University of Padova
Visiting student, ESSRL, Washington University in St. Louis
Ph.D. thesis on the theory of evidence
Young researcher in Milan with the Image and Sound Processing group
Post-doc at UCLA in the Vision Lab

Page 3:

My research

Discrete mathematics: linear independence on lattices

Belief functions and imprecise probabilities: geometric approach, algebraic analysis, combinatorial analysis

Computer vision: object and body tracking, data association, gesture and action recognition

Page 4:

Computer vision: HMMs and size functions for gesture recognition; compositional behavior of hidden Markov models; volumetric action recognition; data association with shape information; evidential models for pose estimation; bilinear models for view-invariant gaitID; Riemannian metrics for motion classification

Imprecise probabilities: geometric approach; algebraic analysis

Page 5:

Approach

Problem: recognizing an example of a known category of gestures from image sequences

Combination of HMMs (for dynamics) and size functions (for pose representation)

Continuous hidden Markov models; EM algorithm for parameter learning (Moore)

Page 6:

Example

transition matrix A -> gesture dynamics

state-output matrix C -> collection of hand poses

The gesture is represented as a sequence of transitions between a small set of canonical poses

Page 7:

Size functions

Hand poses are represented through their contours

[figure: real image, measuring function, family of lines, size function table]

Page 8:

Gesture classification

[diagram: bank of learnt gesture models HMM 1, HMM 2, ..., HMM n]

The EM algorithm is used to learn HMM parameters from an input feature sequence

The new sequence is fed to the learnt gesture models; they produce a likelihood, and the most likely model is chosen (if above a threshold)
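The classification step above can be sketched in code: each trained model scores the new sequence with the forward algorithm, and the best-scoring model wins if it clears a threshold. A minimal sketch using discrete (rather than the talk's continuous Gaussian-output) HMMs; all names and parameters are illustrative:

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model).

    pi: initial state distribution, A: transition matrix,
    B: state-output matrix (rows = states, cols = symbols)."""
    alpha = pi * B[:, obs[0]]
    ll = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # predict, then weight by emission
        ll += np.log(alpha.sum())
        alpha /= alpha.sum()            # rescale to avoid underflow
    return ll

def classify(obs, models, threshold):
    """Return index of the most likely model, or None if below threshold."""
    scores = [log_likelihood(obs, *m) for m in models]
    best = int(np.argmax(scores))
    return best if scores[best] > threshold else None
```

With two toy models, one favoring symbol 0 and one favoring symbol 1, a mostly-0 sequence is assigned to the first model.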

Page 9:

Computer vision: HMMs and size functions for gesture recognition; compositional behavior of hidden Markov models; volumetric action recognition; data association with shape information; evidential models for pose estimation; bilinear models for view-invariant gaitID; Riemannian metrics for motion classification

Imprecise probabilities: geometric approach; algebraic analysis

Page 10:

Composition of HMMs

Compositional behavior of HMMs: the model of the action of interest is embedded in the overall model

Clustering: states of the original model are grouped in clusters, and the transition matrix recomputed accordingly:

P(X_{t+1} ∈ C_k | X_t = e_i) = Σ_{e_j ∈ C_k} P(X_{t+1} = e_j | X_t = e_i)
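The cluster-level transition recomputation can be sketched as follows (a minimal numpy sketch, not the talk's implementation; the slide only specifies the column sums, so the uniform averaging over each cluster's member rows is an assumption):

```python
import numpy as np

def cluster_transitions(A, clusters):
    """Collapse an n x n row-stochastic transition matrix A onto clusters.

    clusters: list of lists of state indices, partitioning the states.
    P(next state in C_k | current = e_i) = sum over e_j in C_k of A[i, j]."""
    # Sum columns within each cluster: prob. of jumping INTO cluster C_k.
    into = np.stack([A[:, c].sum(axis=1) for c in clusters], axis=1)  # n x n_c
    # Aggregate source rows by cluster (uniform weighting: an assumption).
    return np.stack([into[c].mean(axis=0) for c in clusters], axis=0)  # n_c x n_c
```

Since A is row-stochastic and the clusters partition the states, each row of the reduced matrix still sums to 1.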

Page 11:

State clustering Effect of clustering on HMM topology

“Cluttered” model for the two overlapping motions

Reduced model for the “fly” gesture extracted through clustering

Page 12:

Kullback-Leibler comparison

We used the KL distance to measure the similarity between models extracted from clutter and in its absence

KL distances between “fly” (solid) and “fly from clutter” (dash)

KL distances between “fly” and “cycle”
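The comparison metric itself is just the Kullback-Leibler divergence; for HMMs the talk presumably estimates it over sequences, but the basic (and symmetrized) definition for discrete distributions is:

```python
import numpy as np

def kl(p, q):
    """KL divergence D(p || q) between two discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

def kl_sym(p, q):
    """Symmetrized KL, usable as a (non-metric) distance between models."""
    return 0.5 * (kl(p, q) + kl(q, p))
```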

Page 13:

Computer vision: HMMs and size functions for gesture recognition; compositional behavior of hidden Markov models; volumetric action recognition; data association with shape information; evidential models for pose estimation; bilinear models for view-invariant gaitID; Riemannian metrics for motion classification

Imprecise probabilities: geometric approach; algebraic analysis

Page 14:

Volumetric action recognition

Problem: recognizing the action performed by a person viewed by a number of cameras

2D approaches: features are extracted from single views -> viewpoint dependence

Volumetric approach: features are extracted from a volumetric reconstruction of the moving body

Page 15:

3D feature extraction

Locally linear embedding to find a topological representation of the moving body

Linear discriminant analysis (LDA) to estimate the direction of motion as the direction of maximal separation between the legs

k-means clustering to separate body parts
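The body-part separation step can be illustrated with plain k-means on 3D voxel coordinates (a toy stand-in for the pipeline's clustering stage; the deterministic initialization and all parameters are illustrative assumptions):

```python
import numpy as np

def kmeans(points, k, iters=50):
    """Lloyd's algorithm on an (n, 3) array of voxel coordinates."""
    points = np.asarray(points, dtype=float)
    # Deterministic init: evenly spaced sample points (simple, illustrative).
    centers = points[np.linspace(0, len(points) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Assign each point to its nearest center.
        d = np.linalg.norm(points[:, None, :] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers
```

On two well-separated voxel blobs (e.g. two body parts), the two clusters recover the blobs exactly.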

Page 16:

Computer vision: HMMs and size functions for gesture recognition; compositional behavior of hidden Markov models; volumetric action recognition; data association with shape information; evidential models for pose estimation; bilinear models for view-invariant gaitID; Riemannian metrics for motion classification

Imprecise probabilities: geometric approach; algebraic analysis

Page 17:

Uncertainty descriptions

A number of formalisms have been proposed to extend or replace classical probability: e.g. possibilities, fuzzy sets, random sets, monotone capacities, gambles, upper and lower previsions

Theory of evidence (A. Dempster, G. Shafer): probabilities are replaced by belief functions; Bayes’ rule is replaced by Dempster’s rule; families of frames allow multiple representations of evidence

Page 18:

Belief functions

Probability on a finite frame Θ: a function p: 2^Θ -> [0,1] with p(A) = Σ_{x ∈ A} m(x), where m: Θ -> [0,1] is a mass function meeting the normalization constraint Σ_x m(x) = 1

Probabilities are additive: if A ∩ B = ∅ then p(A ∪ B) = p(A) + p(B)

A belief function s: 2^Θ -> [0,1] is instead induced by a mass function m on 2^Θ, with Σ_{B ⊆ Θ} m(B) = 1 and s(A) = Σ_{B ⊆ A} m(B)

Belief functions are not additive
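The definition above is a one-liner in code: the belief of A sums the masses of all focal elements contained in A. A minimal sketch, with masses stored as a dict from frozensets to floats:

```python
def belief(A, masses):
    """s(A) = sum of m(B) over all B subset of A.

    masses: dict mapping frozenset -> mass (summing to 1); A: frozenset."""
    return sum(m for B, m in masses.items() if B <= A)  # B <= A: subset test
```

With m({a}) = 0.5 and m(Θ) = 0.5 on Θ = {a, b, c}, belief fails additivity on the disjoint sets {a} and {b, c}: s(Θ) = 1 while s({a}) + s({b, c}) = 0.5.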

Page 19:

Dempster’s rule

In the theory of evidence, new information encoded as a belief function is combined with old beliefs in a revision process

Two belief functions s', s'' are combined through Dempster’s rule, based on the intersections of their focal elements A_i and B_j:

m(A) ∝ Σ_{A_i ∩ B_j = A} m'(A_i) m''(B_j)

[diagram: intersection of focal elements, A_i ∩ B_j = A]

Page 20:

Example of combination

Frame: Θ = {a1, a2, a3, a4}

s1: m({a1}) = 0.7, m({a1, a2}) = 0.3

s2: m(Θ) = 0.1, m({a2, a3, a4}) = 0.9

s1 ⊕ s2:

m({a1}) = 0.7*0.1/0.37 = 0.19

m({a2}) = 0.3*0.9/0.37 = 0.73

m({a1, a2}) = 0.3*0.1/0.37 = 0.08
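A short implementation of Dempster's rule reproduces the slide's numbers: products of masses are assigned to intersections of focal elements, empty intersections contribute to the conflict, and the result is renormalized by 1 minus the conflict (here 1 - 0.63 = 0.37):

```python
def dempster(m1, m2):
    """Dempster's rule of combination for mass dicts (frozenset -> mass)."""
    combined, conflict = {}, 0.0
    for A, mA in m1.items():
        for B, mB in m2.items():
            C = A & B
            if C:  # non-empty intersection: mass goes to C
                combined[C] = combined.get(C, 0.0) + mA * mB
            else:  # empty intersection: conflicting mass
                conflict += mA * mB
    # Renormalize over the non-conflicting mass.
    return {C: v / (1.0 - conflict) for C, v in combined.items()}
```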

Page 21:

JPDA with shape info

JPDA model: independent targets

Shape model: rigid links

Dempster’s fusion

Robustness: clutter does not meet shape constraints; occlusions: occluded targets can be estimated

[diagram: targets X, Y, Z connected by rigid links]
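As a toy illustration of the association step (this is a greedy gated nearest-neighbour sketch, not JPDA itself, and the gating threshold is an illustrative stand-in for the rigid-link shape constraint):

```python
import numpy as np

def associate(targets, measurements, gate):
    """Assign each predicted target to its nearest unused measurement.

    Measurements farther than `gate` from every target (clutter) are
    left unassigned. Returns dict: target index -> measurement index."""
    assignment, used = {}, set()
    for i, t in enumerate(targets):
        d = np.linalg.norm(measurements - t, axis=1)
        for j in np.argsort(d):  # try nearest measurements first
            if j not in used and d[j] < gate:
                assignment[i] = int(j)
                used.add(j)
                break
    return assignment
```

With two targets, two nearby measurements and one far-away clutter point, the clutter is gated out and each target gets its own measurement.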

Page 22:

Body tracking

Application: tracking of feature points on a moving human body

Page 23:

Computer vision: HMMs and size functions for gesture recognition; compositional behavior of hidden Markov models; volumetric action recognition; data association with shape information; evidential models for pose estimation; bilinear models for view-invariant gaitID; Riemannian metrics for motion classification

Imprecise probabilities: geometric approach; algebraic analysis

Page 24:

Pose estimation

Estimating the “pose” (internal configuration) of a moving body from the available images

Salient image measurements: features

[figure: configuration q_t ∈ Q evolving from t = 0 to t = T]

Page 25:

Model-based estimation

If you have an a-priori model of the object, you can exploit it to help (or drive) the estimation

Example: kinematic model

Page 26:

Model-free estimation

If you do not have any information about the body, the only way to do inference is to learn a map between features and poses directly from the data

This can be done in a training stage

Page 27:

Collecting training data

Motion capture system: 3D locations of markers = pose

Page 28:

Training data

When the object performs some “significant” movements in front of the camera, a finite collection of configuration values is provided by the motion capture system, while a sequence of features is computed from the image(s)

[figure: training pairs (q_1, y_1), ..., (q_T, y_T) and the approximate configuration space Q~]

Page 29:

Learning feature-pose maps

Hidden Markov models provide a way to build feature-pose maps from the training data

A Gaussian density for each state is set up on the feature space -> approximate feature space

Map between each region and the set of training poses q_k whose feature value y_k falls inside it

Page 30:

Evidential model

Approximate feature spaces and the approximate parameter space form a family of compatible frames: the evidential model

Page 31:

Human body tracking

two experiments, two views

four markers on the right arm

six markers on both legs

Page 32:

Feature extraction

Three steps: original image, color segmentation, bounding box

[figure: example frames with bounding boxes]

Page 33:

Performances

Comparison of three models: left view only, right view only, both views

pose estimation yielded by the overall model

estimate associated with the “right” model

“left” model

ground truth

Page 34:

Computer vision: HMMs and size functions for gesture recognition; compositional behavior of hidden Markov models; volumetric action recognition; data association with shape information; evidential models for pose estimation; bilinear models for view-invariant gaitID; Riemannian metrics for motion classification

Imprecise probabilities: geometric approach; algebraic analysis

Page 35:

GaitID

The problem: recognizing the identity of humans from their gait

Typical approaches: PCA on image features, HMMs; people typically use silhouette data

Issue: view-invariance. It can be addressed via 3D representations, but 3D tracking is difficult and sensitive

Page 36:

Bilinear models

From view-invariance to “style” invariance: in a dataset of sequences, each motion possesses several labels: action, identity, viewpoint, emotional state, etc. Bilinear models (Tenenbaum) can be used to separate the influence of two of those factors, called “style” and “content” (the label to classify)

y^{sc} is a training set of k-dimensional observations with labels s and c; b^c is a parameter vector representing the content, while A^s is a style-specific linear map from the content space onto the observation space:

y^{sc} = A^s b^c
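Fitting the asymmetric model can be sketched with the standard SVD recipe of Tenenbaum and Freeman: stack the observations for each (style, content) pair into an (S·k) x C matrix and truncate its SVD. The matrix layout and names here are illustrative assumptions:

```python
import numpy as np

def fit_asymmetric(Y, dim):
    """Fit y^{sc} = A^s b^c from a stacked observation matrix.

    Y: (S*k) x C matrix, one block-row of k-dim observations per style,
    one column per content class. Returns (A_stacked, B) with
    A_stacked: (S*k) x dim (the style maps A^s stacked) and
    B: dim x C (content vectors b^c as columns)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    A = U[:, :dim] * s[:dim]   # absorb singular values into the style maps
    B = Vt[:dim, :]
    return A, B
```

On synthetic data generated exactly by a rank-`dim` bilinear model, the fit reconstructs the observation matrix.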

Page 37:

Content classification of unknown style

Consider a training set in which persons (content = ID) are seen walking from different viewpoints (style = viewpoint)

An asymmetric bilinear model can be learned from it through the SVD of a stacked observation matrix

When new motions are acquired in which a known person is seen walking from a different viewpoint (unknown style), an iterative EM procedure can be set up to classify the content (identity):

E step -> estimation of p(c|s), the probability of the content given the current estimate s of the style

M step -> estimation of the linear map A^s for the unknown style s

Page 38:

Three-layer model

Each sequence is encoded as a Markov model, its C matrix is stacked in an observation vector, and a bilinear model is trained over those vectors

Feature representation: projection of the contour of the silhouette onto a sheaf of lines passing through the center

Page 39:

Mobo database

Mobo database: 25 people performing 4 different walking actions, viewed from 6 cameras. Each sequence has three labels: action, id, view

We set up four experiments in which one label was chosen as content, another as style, and the remaining one was treated as a nuisance factor:

Content = id, style = view -> view-invariant gaitID
Content = id, style = action -> action-invariant gaitID
Content = action, style = view -> view-invariant action recognition
Content = action, style = id -> style-invariant action recognition

Page 40:

Results

Compared performances with a “baseline” algorithm and straight k-NN on sequence HMMs

Page 41:

Computer vision: HMMs and size functions for gesture recognition; compositional behavior of hidden Markov models; volumetric action recognition; data association with shape information; evidential models for pose estimation; bilinear models for view-invariant gaitID; Riemannian metrics for motion classification

Imprecise probabilities: geometric approach; algebraic analysis

Page 42:

Distances between dynamical models

Problem: motion classification. Approach: representing each movement as a linear dynamical model; for instance, each image sequence can be mapped to an ARMA or AR linear model

Classification then reduces to finding a suitable distance function in the space of dynamical models

We can use this distance in any of the popular classification schemes: k-NN, SVM, etc.

Page 43:

Riemannian metrics

Some distances have been proposed: Martin’s distance, subspace angles, gap metric, Fisher metric

However, it makes no sense to choose a single distance for all possible classification problems. When some a-priori info is available (a training set), we can learn in a supervised fashion the “best” metric for the classification problem

Feasible approach: volume minimization of pullback metrics

Page 44:

Learning pullback metrics

Many unsupervised algorithms take in input a dataset and map it to an embedded space, but they fail to learn a full metric

Consider then a family of diffeomorphisms F between the original space M and a metric space N

The diffeomorphism F induces on M a pullback metric

[diagram: F mapping the dataset D in M to N]
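The pullback construction can be illustrated numerically: for a diffeomorphism F: M -> N and a metric G on N, the pullback metric at a point p is (F*G)_p(u, v) = G(dF_p u, dF_p v), i.e. J^T G J with J the Jacobian of F at p. A small sketch with a finite-difference Jacobian (illustrative only):

```python
import numpy as np

def pullback_metric(F, G, p, eps=1e-6):
    """Pullback of the metric G on N under F, evaluated at p in M."""
    p = np.asarray(p, float)
    n = len(p)
    J = np.empty((len(F(p)), n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        # Central finite difference for column i of the Jacobian.
        J[:, i] = (F(p + e) - F(p - e)) / (2 * eps)
    return J.T @ G @ J
```

Sanity check: for a linear map F(x) = Mx and the Euclidean metric on N, the pullback is M^T M everywhere.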

Page 45:

Space of AR(2) models

Given an input sequence, we can identify the parameters of the linear model which best describes it

We chose the class of autoregressive models of order 2, AR(2)

Fisher metric on AR(2): [closed-form metric tensor g(a1, a2), not recoverable from the transcript]

Compute the geodesics of the pullback metric on M

Page 46:

Results

Scalar feature, AR(2) and ARMA models; NN algorithm to classify new sequences; identity recognition; action recognition

Page 47:

Results - 2

Recognition performance of the second-best distance and the optimal pullback metric. The whole dataset is considered, regardless of the view

[figure: results for View 1 and View 5]

Page 48:

Computer vision: HMMs and size functions for gesture recognition; compositional behavior of hidden Markov models; volumetric action recognition; data association with shape information; evidential models for pose estimation; bilinear models for view-invariant gaitID; Riemannian metrics for motion classification

Imprecise probabilities: geometric approach; algebraic analysis

Page 49:

Geometric approach to the ToE

Belief functions can be seen as points of a Euclidean space of dimension 2^n - 2

Belief space: the space of all the belief functions on a given frame; each subset A corresponds to the A-th coordinate s(A)

It has the shape of a simplex: S = Cl(P_A, A ⊆ Θ), the convex closure of the categorical belief functions P_A

Page 50:

Geometry of Dempster’s rule

Dempster’s rule can be studied in the geometric setup too: it is a geometric operator mapping pairs of points onto another point of the belief space

[diagram: conditional subspaces, with foci F_x(s) and F_y(s) of the combination]

Page 51:

Probabilistic approximation

Problem: given a belief function s, find the “best” probabilistic approximation of s; this can be solved in the geometric setup:

p̂ = arg min_{p ∈ P} dist(s, p)

Compositional criterion: the approximation behaves like s when combined through Dempster’s rule

Comparative study of all the proposed probabilistic approximations
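One of the classical probabilistic approximations such a comparative study covers is Smets' pignistic transform, which spreads each mass m(A) uniformly over the elements of A (shown here as a concrete example, not as the slide's arg-min solution):

```python
def pignistic(masses, frame):
    """BetP(x) = sum over focal elements A containing x of m(A) / |A|.

    masses: dict mapping frozenset -> mass; frame: set of all outcomes."""
    return {x: sum(m / len(A) for A, m in masses.items() if x in A)
            for x in frame}
```

For m({a}) = 0.6 and m({a, b}) = 0.4, the transform yields BetP(a) = 0.8, BetP(b) = 0.2: a proper probability distribution.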

Page 52:

Computer vision: HMMs and size functions for gesture recognition; compositional behavior of hidden Markov models; volumetric action recognition; data association with shape information; evidential models for pose estimation; bilinear models for view-invariant gaitID; Riemannian metrics for motion classification

Imprecise probabilities: geometric approach; algebraic analysis

Page 53:

Lattice structure

Families of frames have the algebraic structure of a lattice, with order relation given by the existence of a refining

F is a locally Birkhoff (semimodular with finite length) lattice bounded below

[diagram: minimal refinement and maximal coarsening of a collection of frames]

Page 54:

Total belief theorem

Generalization of the total probability theorem, with an a-priori constraint and a conditional constraint

Whole graph of candidate solutions; connections with combinatorics and linear systems