![Page 1: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/1.jpg)
Using Multi-Modality to Guide Visual Tracking
Jaco Vermaak
Cambridge University Engineering Department
Patrick Pérez, Michel Gangnet, Andrew Blake
Microsoft Research Cambridge
Paris, December 2002
![Page 2: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/2.jpg)
Introduction
- Visual tracking difficult: changes in pose and illumination, occlusion, clutter, inaccurate models, high-dimensional state spaces, etc.
- Tracking can be aided by combining information in multiple measurement modalities
- Illustrated here on head tracking using:
  - Sound and contour measurements
  - Colour and motion measurements
![Page 3: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/3.jpg)
General Tracking
![Page 4: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/4.jpg)
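Tracking Equations
- Objective: recursive estimation of the filtering distribution:

$$p(\mathbf{x}_t \mid \mathbf{y}_{1:t}), \qquad \mathbf{y}_{1:t} = (\mathbf{y}_1, \ldots, \mathbf{y}_t)$$

- General solution:
  - Prediction step:

$$p(\mathbf{x}_t \mid \mathbf{y}_{1:t-1}) = \int \underbrace{p(\mathbf{x}_t \mid \mathbf{x}_{t-1})}_{\text{dynamical prior}} \, \underbrace{p(\mathbf{x}_{t-1} \mid \mathbf{y}_{1:t-1})}_{\text{previous filtering}} \, d\mathbf{x}_{t-1}$$

  - Filtering/update step:

$$p(\mathbf{x}_t \mid \mathbf{y}_{1:t}) \propto \underbrace{L(\mathbf{y}_t \mid \mathbf{x}_t)}_{\text{likelihood}} \, \underbrace{p(\mathbf{x}_t \mid \mathbf{y}_{1:t-1})}_{\text{prediction}}$$

- Problem: generally no analytic solutions available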
![Page 5: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/5.jpg)
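Particle Filter Tracking
- Monte Carlo implementation of general recursions.
- Filtering distribution represented by samples/particles with associated importance weights:

$$p_N(d\mathbf{x}_t \mid \mathbf{y}_{1:t}) = \sum_{i=1}^{N} \pi_t^i \, \delta_{\mathbf{x}_t^i}(d\mathbf{x}_t)$$

- Proposal step: new particles proposed from a suitable proposal distribution:

$$\mathbf{x}_t^i \ \text{simulated from} \ q(\mathbf{x}_t \mid \mathbf{x}_{t-1}^i, \mathbf{y}_t)$$

- Reweighting step: particles reweighted with importance weights:

$$\pi_t^i \propto \pi_{t-1}^i \, \frac{L(\mathbf{y}_t \mid \mathbf{x}_t^i) \, p(\mathbf{x}_t^i \mid \mathbf{x}_{t-1}^i)}{q(\mathbf{x}_t^i \mid \mathbf{x}_{t-1}^i, \mathbf{y}_t)}$$

- Resampling step: multiply particles with high importance weights and eliminate those with low importance weights.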
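The proposal, reweighting and resampling steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the scalar state, the random-walk dynamics and the Gaussian likelihood in `demo` are all hypothetical, and plain multinomial resampling is used because the slides do not say which resampling scheme was chosen.

```python
import math
import random


def particle_filter_step(particles, weights, propose, incremental_weight, rng=random):
    """One proposal / reweighting / resampling cycle.

    propose(x_prev) draws a new state from the proposal q.
    incremental_weight(x_new, x_prev) returns L(y|x_new) p(x_new|x_prev) / q(x_new|x_prev, y).
    """
    # Proposal step: one new particle per old particle.
    proposed = [propose(x_prev) for x_prev in particles]
    # Reweighting step: previous weight times the incremental importance weight.
    raw = [w * incremental_weight(x_new, x_prev)
           for w, x_new, x_prev in zip(weights, proposed, particles)]
    total = sum(raw)
    if total <= 0.0:
        raise ValueError("all importance weights vanished")
    normalised = [w / total for w in raw]
    # Resampling step (multinomial): heavy particles are duplicated, light ones dropped.
    n = len(proposed)
    resampled = rng.choices(proposed, weights=normalised, k=n)
    return resampled, [1.0 / n] * n


def demo(y=2.0, n=500, seed=0):
    """Toy example: all particles start at 0, the measurement y pulls them towards 2."""
    rng = random.Random(seed)
    particles = [0.0] * n
    weights = [1.0 / n] * n
    # Bootstrap choice q = p (assumed), so the incremental weight is just the likelihood.
    propose = lambda x_prev: x_prev + rng.gauss(0.0, 1.0)
    likelihood = lambda x_new, x_prev: math.exp(-0.5 * ((y - x_new) / 0.5) ** 2)
    new_particles, new_weights = particle_filter_step(
        particles, weights, propose, likelihood, rng)
    # Weighted mean of the particle set as a point estimate.
    return sum(p * w for p, w in zip(new_particles, new_weights))
```

Calling `demo()` returns the posterior-mean estimate after a single step; with a proposal other than the dynamics, `incremental_weight` would also have to include the ratio p/q.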
![Page 6: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/6.jpg)
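Particle Filter Building Blocks
- Sampling from conditional density $q(\mathbf{x}' \mid \mathbf{x})$:

$$\{\mathbf{x}^i, \pi^i\} \sim p \ \longrightarrow \ \{\mathbf{x}'^i, \pi^i\} \sim \int q(\mathbf{x}' \mid \mathbf{x}) \, p(\mathbf{x}) \, d\mathbf{x}$$

- Resampling:

$$\{\mathbf{x}^i, \pi^i\}_{i=1}^{N} \sim p \ \longrightarrow \ \{\mathbf{x}^{j(i)}, 1/M\}_{i=1}^{M} \sim p, \qquad P(j(i) = k) = \pi^k, \ k = 1, \ldots, N$$

- Reweighting with positive function $h(\mathbf{x}) \geq 0$:

$$\{\mathbf{x}^i, \pi^i\} \sim p \ \longrightarrow \ \{\mathbf{x}^i, \pi^i h(\mathbf{x}^i)\} \sim \frac{h(\mathbf{x}) \, p(\mathbf{x})}{\int h(\mathbf{x}) \, p(\mathbf{x}) \, d\mathbf{x}}$$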
![Page 7: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/7.jpg)
Particle Filter Implementation
Requires specification of:
- System configuration and state space
- Likelihood model
- Dynamical model for state evolution
- State proposal distribution
- Particle filter architecture
![Page 8: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/8.jpg)
Head Tracking using Sound and Contour Measurements
![Page 9: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/9.jpg)
Problem Formulation
- Objective: track the head of a person in a video sequence using audio and image cues
- Audio: time delay of arrival (TDOA) measurements at microphone pair orthogonal to optical axis of camera
- Image: edge events along normal lines to a hypothesised contour
- Complementary modalities: audio good for (re)initialisation; image good for fine localisation
![Page 10: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/10.jpg)
System Configuration
[Diagram: camera, image plane and microphone pair]
![Page 11: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/11.jpg)
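Model Ingredients
- Low-dimensional state space: similarity transform applied to a reference template

$$\mathbf{x} = (x, y, \theta, \sigma)$$

- Dynamical prior: integrated Langevin equation, i.e. second-order Markov kernel

$$p(\mathbf{x}_t \mid \mathbf{x}_{0:t-1}) = p(\mathbf{x}_t \mid \mathbf{x}_{t-1}, \mathbf{x}_{t-2})$$

- Multi-modal data likelihoods:
  - Sound-based likelihood: TDOA at microphone pair
  - Contour-based likelihood: edge events

$$L(\mathbf{y} \mid \mathbf{x}) = L_{\mathrm{TDOA}}(x) \, L_{\mathrm{EDGE}}(\mathbf{x})$$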
![Page 12: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/12.jpg)
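Contour Likelihood
- Input: maxima of projected luminance gradient along normals ($N_j$ such events on the $j$th normal)

$$L_{\mathrm{EDGE}}(\mathbf{x}) \propto \prod_j \left[ q_0 + \frac{1 - q_0}{N_j} \sum_{i=1}^{N_j} \mathcal{N}\!\left(d_{i,j}; 0, \sigma_c^2\right) \right]$$

[Figure: edge events at distances $d_{1,j}, d_{2,j}, d_{3,j}$ along the $j$th normal, with the per-normal likelihood $L_{\mathrm{EDGE}}^j$]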
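As a rough illustration of a contour likelihood of this kind, the sketch below scores a hypothesised contour from the distances of the detected edge events along each normal. It assumes the likelihood is, per normal, a mixture of a constant clutter term `q0` and Gaussians centred on the contour; the function names and the default values of `q0` and `sigma_c` are hypothetical and not taken from the slides.

```python
import math


def normal_pdf(d, mean, sigma):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    return math.exp(-0.5 * ((d - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def edge_log_likelihood(edge_distances, q0=0.1, sigma_c=2.0):
    """Log-likelihood of a contour hypothesis, up to an additive constant.

    edge_distances[j] holds the signed distances (in pixels) of the edge events
    detected on normal j, measured from the hypothesised contour position.
    """
    log_l = 0.0
    for dists in edge_distances:
        if dists:
            # Each detected event is equally likely to be the true contour edge.
            mix = sum(normal_pdf(d, 0.0, sigma_c) for d in dists) / len(dists)
        else:
            mix = 0.0  # no events on this normal: only the clutter term remains
        log_l += math.log(q0 + (1.0 - q0) * mix)
    return log_l
```

Working in the log domain avoids underflow when the product runs over many normals; a hypothesis whose normals all have an edge event near distance zero scores higher than one whose nearest events are several `sigma_c` away.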
![Page 13: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/13.jpg)
Contour Likelihood
- Advantages
  - Low computational cost
  - Robust to illumination changes
- Drawbacks
  - Fragile because of narrow support (especially with only similarity transform on a fixed shape space)
  - Sensitive to background clutter
- Extension
  - Multiply gradient by inter-frame difference to reduce influence of background clutter
$$\nabla I \ \leftarrow \ \frac{|\Delta I|}{\max |\Delta I|} \, \nabla I$$
![Page 14: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/14.jpg)
Inter-Frame Difference
Without frame difference With frame difference
![Page 15: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/15.jpg)
Audio Likelihood
- Input: positions of peaks in generalised cross-correlation function (GCCF)
- Reverberation leads to multiple peaks

[Figure: GCCF against TDOA with peaks $d_1, \ldots, d_N$, and the resulting likelihood $L_{\mathrm{TDOA}}$ as a function of $x$]
![Page 16: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/16.jpg)
Audio Likelihood
- Deterministic mapping from Time Delay of Arrival (TDOA) to bearing angle (microphone calibration) to X-coordinate in image plane (camera calibration)

$$G : d \mapsto x$$

- Audio likelihood follows in similar manner to contour likelihood

$$L_{\mathrm{TDOA}}(x) \propto q_0 + \frac{1 - q_0}{N} \sum_{i=1}^{N} \mathcal{N}\!\left(d_i; G^{-1}(x), \sigma_s^2\right)$$

- Likelihood assumes a uniform clutter model
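The chain from TDOA to bearing angle to image X-coordinate, and a likelihood built on it, might look as follows. Everything specific here is an assumption made for illustration: a far-field source, a microphone pair centred on the camera, a pinhole camera with focal length `focal` in pixels, a baseline of 0.2 m, TDOAs expressed in milliseconds, and the clutter-plus-Gaussian form with hypothetical defaults `q0` and `sigma_s`.

```python
import math

SPEED_OF_SOUND = 0.343  # metres per millisecond (assumed room conditions)


def normal_pdf(d, mean, sigma):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    return math.exp(-0.5 * ((d - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def x_from_tdoa(d, baseline=0.2, focal=500.0):
    """Hypothetical mapping G: TDOA (ms) -> bearing -> image x (pixels from centre)."""
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * d / baseline))  # clamp to a valid sine
    bearing = math.asin(s)                                   # microphone calibration
    return focal * math.tan(bearing)                         # camera calibration


def tdoa_from_x(x, baseline=0.2, focal=500.0):
    """Inverse mapping: image x (pixels from centre) -> expected TDOA (ms)."""
    bearing = math.atan2(x, focal)
    return baseline * math.sin(bearing) / SPEED_OF_SOUND


def tdoa_likelihood(x, peaks, q0=0.1, sigma_s=0.02, baseline=0.2, focal=500.0):
    """Clutter term plus Gaussians comparing each GCCF peak with the TDOA expected at x."""
    if not peaks:
        return q0
    expected = tdoa_from_x(x, baseline, focal)
    mix = sum(normal_pdf(d, expected, sigma_s) for d in peaks) / len(peaks)
    return q0 + (1.0 - q0) * mix
```

With several peaks caused by reverberation the likelihood is multi-modal in `x`, which is why it suits (re)initialisation better than fine localisation.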
![Page 17: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/17.jpg)
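Particle Filter Architecture
- Layered sampling: first X-position and sound likelihood; then the rest
- X-position proposal: mixture of diffusion dynamics and sound proposal:

$$q_X(x) = (1 - \alpha) \, p_X^{\mathrm{LANG}}(x) + \alpha \, q_X^{\mathrm{TDOA}}(x)$$

$$q_X^{\mathrm{TDOA}}(x) = \left| \frac{dG^{-1}(x)}{dx} \right| N^{-1} \sum_{i=1}^{N} \mathcal{N}\!\left(G^{-1}(x); d_i, \sigma_s^2\right)$$

- To admit “jumps” from the proposal, the X-dynamics have to be augmented with a uniform component:

$$p_X(x) = (1 - \beta) \, p_X^{\mathrm{LANG}}(x) + \beta \, U(x)$$

- Layers: X is proposed from $q_X$ and reweighted with $L_{\mathrm{TDOA}} \, p_X / q_X$; the remaining state components are then propagated through their own dynamics and reweighted with $L_{\mathrm{EDGE}}$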
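Sampling from a mixture of diffusion dynamics and a sound-driven proposal can be sketched as below. The sketch makes several assumptions: a Gaussian random walk stands in for the Langevin dynamics, the mixture weight `alpha` and both standard deviations are hypothetical, and `tdoa_to_x` is whatever calibrated TDOA-to-X mapping is available. Drawing a TDOA around a randomly chosen peak and pushing it through the mapping produces samples whose density in X automatically carries the Jacobian of the inverse mapping.

```python
import random


def sample_x_proposal(x_prev, peaks, tdoa_to_x, alpha=0.3,
                      sigma_dyn=5.0, sigma_s=0.02, rng=random):
    """Draw a new X-position from a dynamics / sound mixture proposal.

    x_prev     previous X-position of the particle
    peaks      TDOA peak positions found in the current audio frame
    tdoa_to_x  callable mapping a TDOA to an image X-coordinate
    alpha      probability of using the sound-driven component (assumed value)
    """
    if peaks and rng.random() < alpha:
        # Sound component: perturb a randomly chosen peak, then map it to X.
        d = rng.gauss(rng.choice(peaks), sigma_s)
        return tdoa_to_x(d)
    # Dynamics component: simple diffusion around the previous position.
    return rng.gauss(x_prev, sigma_dyn)
```

Particles drawn from the sound component can land far from `x_prev`; unless the dynamics used in the importance weight put some mass there, such particles receive negligible weight, which is the role of the uniform component in the X-dynamics.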
![Page 18: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/18.jpg)
Examples
- Effect of inter-frame difference:
- Conversational ping-pong:
![Page 19: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/19.jpg)
Examples
- Conversational ping-pong and sound-based reinitialisation:
![Page 20: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/20.jpg)
Head Tracking using Colour and Motion Measurements
![Page 21: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/21.jpg)
Problem Formulation
- Objective: detect and track the head of a single person in a video sequence taken from a stationary camera
- Modality fusion:
  - Motion and colour measurements are complementary
  - Motion: when the object is moving colour is unreliable
  - Colour: when the object is stationary motion information disappears
- Automatic object detection and tracker initialisation using motion measurements
- Individualisation of the colour model to the object:
  - Initialised with a generic skin colour model
  - Adapted to object colour during periods of motion: motion model acts as “anchor”
![Page 22: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/22.jpg)
Object Description and Motion
- Head modelled as an ellipse that is free to translate and scale in the image
- Binary indicator variable to signal whether object is present in the image or not, so object state becomes:

$$\mathbf{x} = (x, y, s, r)$$

- State components assumed to have independent motion models
  - Indicator: discrete Markov chain
  - Position and scale: Langevin motion with uniform initialisation (each component written generically as $x$):

$$p(x_t \mid x_{t-1}, r_t, r_{t-1}) = \begin{cases} \text{undefined} & \text{if } r_t = 0 \\ p_L(x_t \mid x_{t-1}) & \text{if } r_t = 1 \text{ and } r_{t-1} = 1 \\ U_R(x_t) & \text{if } r_t = 1 \text{ and } r_{t-1} = 0 \end{cases}$$
![Page 23: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/23.jpg)
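Image Measurements
- Measurements taken on a regular filter grid (isotropic Gaussian filters):

$$\mathbf{y} = (\mathbf{y}_1, \ldots, \mathbf{y}_G)$$

- Measurement vector:

$$\mathbf{y}_i = (H_i, S_i, D_i)$$

  - $H_i$: hue image
  - $S_i$: saturation image
  - $D_i$: frame-difference image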
![Page 24: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/24.jpg)
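Observation Likelihood Model
- Measurements at gridpoints assumed to be independent
- Unique background (object absent) likelihood model for each gridpoint
- All gridpoints covered by the object share the same foreground likelihood model:

$$L(\mathbf{y} \mid \mathbf{x}) = \prod_{i=1}^{G} L^i(\mathbf{y}_i \mid \mathbf{x}) = \prod_{i \in F(\mathbf{x})} L_F(\mathbf{y}_i) \prod_{i \in B(\mathbf{x})} L_B^i(\mathbf{y}_i)$$

- At each gridpoint the measurements are also assumed to be independent:

$$L_F(\mathbf{y}_i) = L_{FH}(H_i) \, L_{FS}(S_i) \, L_{FM}(D_i)$$

$$L_B^i(\mathbf{y}_i) = L_{BH}^i(H_i) \, L_{BS}^i(S_i) \, L_{BM}(D_i)$$

- Note that the background motion model is shared by all the gridpoints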
![Page 25: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/25.jpg)
Colour Likelihood Model
- Normalised histograms for both foreground and background colour likelihood models:

$$L(c) = \theta_{b_c}$$

  - $c$: colour measurement
  - $b_c$: bin index corresponding to measurement
  - $\theta_i$: normalised count for bin $i = 1, \ldots, B$
- Background models trained on a sequence without objects
- Foreground models trained on a set of labelled face images
- Histogram models supplied with a small uniform component to prevent numerical problems associated with empty bins
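A histogram likelihood with a small uniform component can be sketched as follows. The bin range, the number of bins, the mixing weight `uniform_weight` and all function names are hypothetical choices for illustration; the slides do not give these values.

```python
def bin_index(c, n_bins, lo=0.0, hi=1.0):
    """Index of the histogram bin that a colour measurement c falls into."""
    t = (c - lo) / (hi - lo)
    return min(n_bins - 1, max(0, int(t * n_bins)))


def train_histogram(samples, n_bins, lo=0.0, hi=1.0, uniform_weight=0.01):
    """Normalised histogram mixed with a uniform component so that no bin is empty."""
    counts = [0] * n_bins
    for c in samples:
        counts[bin_index(c, n_bins, lo, hi)] += 1
    total = sum(counts)
    if total == 0:
        return [1.0 / n_bins] * n_bins
    return [(1.0 - uniform_weight) * k / total + uniform_weight / n_bins
            for k in counts]


def colour_likelihood(c, theta, lo=0.0, hi=1.0):
    """Likelihood of a colour measurement: the normalised value of its bin."""
    return theta[bin_index(c, len(theta), lo, hi)]
```

Because every bin keeps at least `uniform_weight / n_bins`, the logarithm of the likelihood stays finite even for colours never seen during training.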
![Page 26: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/26.jpg)
Motion Likelihood Model
- Background frame-difference measurements empirically found to be gamma distributed:

$$L_{BM}(D_i) \propto D_i^{a-1} \exp(-D_i / b)$$

- Foreground frame-difference depends on magnitude of motion, number and orientation of foreground edges, etc.
- Modelling these effects accurately is difficult
- In general: if the object is moving, foreground frame-difference measurements are substantially larger than those for background
- Thus a two-component uniform distribution is adopted for the foreground frame-difference measurements (outlier model)
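The two motion models can be sketched as below. The gamma density is written with shape `a` and scale `b`; the foreground model is one possible reading of a "two-component uniform distribution", namely a wide uniform over the whole measurement range plus a narrow uniform over small differences. The parameter values, the range `d_max` and this particular split are assumptions, not values from the slides.

```python
import math


def background_motion_likelihood(d, a=1.5, b=4.0):
    """Gamma density (shape a, scale b) for a background frame-difference value d."""
    d = max(d, 1e-9)  # guard against 0 ** negative when a < 1
    return d ** (a - 1.0) * math.exp(-d / b) / (math.gamma(a) * b ** a)


def foreground_motion_likelihood(d, d_small=10.0, d_max=255.0, w_small=0.2):
    """Mixture of two uniforms: weight w_small on [0, d_small], the rest on [0, d_max]."""
    if d < 0.0 or d > d_max:
        return 0.0
    wide = (1.0 - w_small) / d_max
    narrow = w_small / d_small if d <= d_small else 0.0
    return wide + narrow
```

For large frame differences the gamma density decays exponentially while the uniform does not, so the foreground-to-background likelihood ratio grows quickly once the object moves.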
![Page 27: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/27.jpg)
Particle Proposal
- Three stages of operation:
  - Birth: object first enters scene; proposal should detect object and spawn particles in the object region
  - Alive: object persists in scene; proposal should allow object to be tracked, whether it is stationary or moves around
  - Death: object leaves scene; proposal should kill particles associated with the object
- Form of particle proposal:

$$q(\mathbf{x} \mid \mathbf{x}', P', \mathbf{y}) = q(r \mid r', P') \, q(\mathbf{z} \mid \mathbf{z}', r, r', \mathbf{y})$$

$$\mathbf{z} = (x, y, s), \qquad P = \sum_{i=1}^{N} \pi^{(i)} r^{(i)}$$

  - $P$: empirical probability of object being alive
![Page 28: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/28.jpg)
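Particle Proposal
- Indicator proposal:
  - Birth only allowed if there is no object currently in the scene
  - All particles alive are subjected to a fixed death probability

$$q(r = 1 \mid r' = 0, P') = \begin{cases} P_{\mathrm{birth}} & \text{if } P' = 0 \\ 0 & \text{otherwise} \end{cases}$$

$$q(r = 0 \mid r' = 1) = P_{\mathrm{death}}$$

- State proposal:
  - Langevin dynamics if object is alive
  - Gaussian birth proposal: parameters from detection module

$$q(\mathbf{z} \mid \mathbf{z}', r, r', \mathbf{y}) = \begin{cases} \text{undefined} & \text{if } r = 0 \\ p_L(\mathbf{z} \mid \mathbf{z}') & \text{if } r = 1 \text{ and } r' = 1 \\ \mathcal{N}(\mathbf{z}; \hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\Sigma}}) & \text{if } r = 1 \text{ and } r' = 0 \end{cases}$$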
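The birth, alive and death logic of such a proposal can be sketched as follows. The probabilities `p_birth` and `p_death`, the diagonal birth covariance, the `dynamics` callable and the function names are all hypothetical; in the talk the birth parameters come from the detection module.

```python
import random


def propose_indicator(r_prev, p_alive_prev, p_birth=0.1, p_death=0.05, rng=random):
    """Sample the new indicator r given the previous one.

    p_alive_prev is the empirical probability that the object was alive at the
    previous step (weighted fraction of live particles).
    """
    if r_prev == 1:
        # Every live particle faces the same fixed death probability.
        return 0 if rng.random() < p_death else 1
    # Birth is only allowed when no object is currently in the scene.
    if p_alive_prev == 0.0:
        return 1 if rng.random() < p_birth else 0
    return 0


def propose_state(z_prev, r, r_prev, dynamics, birth_mean, birth_std, rng=random):
    """Sample position and scale z = (x, y, s) given the new and old indicators."""
    if r == 0:
        return None  # state is undefined when the object is absent
    if r_prev == 1:
        return dynamics(z_prev)  # object persists: follow the motion model
    # Newly born particle: Gaussian around the detected region (diagonal covariance assumed).
    return tuple(rng.gauss(m, s) for m, s in zip(birth_mean, birth_std))
```

A particle whose indicator stays at 0 carries no position or scale at all, which is why `propose_state` returns `None` in that case.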
![Page 29: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/29.jpg)
Object Detection
- Object region detected by probabilistic segmentation of the horizontal and vertical projections of the frame-difference measurements:
- Region location and size determine parameters for birth proposal distribution
![Page 30: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/30.jpg)
Colour Model Adaptation
- Why:
  - Generic skin colour model may be too broad for accurate localisation
  - Model sensitive to colour changes due to changes in pose and illumination
- When:
  - Object present and moving: largest variations in colour expected
  - Motion likelihood “anchors” particles around moving object
- How:
  - Gradual: avoid fitting to the background: enforced with prior
  - Stochastic EM: contribution of particles proportional to likelihood
![Page 31: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/31.jpg)
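Colour Model Adaptation
- Unknown parameters: normalised bin values for object hue and saturation histograms
- EM Q-function for MAP estimation:

$$Q(\boldsymbol{\theta}_t, \hat{\boldsymbol{\theta}}_t) = E_{p(\mathbf{x}_{1:t} \mid \mathbf{y}_{1:t}, \hat{\boldsymbol{\theta}}_t)}\!\left[ \log L(\mathbf{y}_t \mid \mathbf{x}_t, \boldsymbol{\theta}_t) \right] + \underbrace{\log p(\boldsymbol{\theta}_t \mid \hat{\boldsymbol{\theta}}_{t-1})}_{\text{dynamical prior}}$$

- No analytic solution but particle approximation yields:

$$Q_N(\boldsymbol{\theta}_t, \hat{\boldsymbol{\theta}}_t) = \sum_{i=1}^{N} \pi_t^i \log L(\mathbf{y}_t \mid \mathbf{x}_t^i, \boldsymbol{\theta}_t) + \log p(\boldsymbol{\theta}_t \mid \hat{\boldsymbol{\theta}}_{t-1})$$

- Monte Carlo approximation only performed over particles that are currently alive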
![Page 32: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/32.jpg)
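Colour Model Adaptation
- Dirichlet prior used for parameter updates:

$$p(\boldsymbol{\theta}_t \mid \boldsymbol{\theta}_{t-1}) = \mathrm{Di}(\boldsymbol{\theta}_t \mid C \boldsymbol{\theta}_{t-1})$$

  - Prior centred on old parameter values
  - Variance controlled by multiplicative constant
- Update rule for normalised bin counts becomes:

$$\theta_i = \frac{n_i + \alpha_i - 1}{\sum_{j=1}^{B} (n_j + \alpha_j) - B}, \qquad n_i = \sum_{j=1}^{N} \pi^j n_i^j$$

  - $\alpha_i$: Dirichlet prior parameter
  - $n_i^j$: $i$-th bin count for $j$-th particle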
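A histogram update of this kind can be sketched with the standard MAP estimate for bin probabilities under a Dirichlet prior, using weighted-average particle histograms as the data counts. That the talk's update rule has exactly this form is an assumption, as are the function name, the clipping of negative numerators and the example values.

```python
def update_histogram(theta_prev, particle_counts, weights, c):
    """MAP update of normalised bin values under a Dirichlet prior centred on theta_prev.

    theta_prev       previous normalised bin values (length B)
    particle_counts  one list of B bin counts per live particle
    weights          normalised importance weights of those particles
    c                multiplicative constant: larger c means slower adaptation
    """
    n_bins = len(theta_prev)
    # Weighted-average histogram over the particles.
    n = [sum(w * counts[i] for w, counts in zip(weights, particle_counts))
         for i in range(n_bins)]
    # Dirichlet parameters centred on the old bin values.
    alpha = [c * t for t in theta_prev]
    # MAP numerator n_i + alpha_i - 1, clipped at zero (assumption) for tiny alpha_i.
    num = [max(n_i + a_i - 1.0, 0.0) for n_i, a_i in zip(n, alpha)]
    total = sum(num)
    if total <= 0.0:
        return list(theta_prev)  # nothing to learn from: keep the old model
    return [v / total for v in num]
```

With `theta_prev = [0.5, 0.5]`, `c = 10`, two equally weighted particles with counts `[4, 0]` and `[2, 2]`, the averaged counts are `[3, 1]` and the update gives `[7/12, 5/12]`: the model moves towards the observed colours without jumping straight to them.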
![Page 33: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/33.jpg)
What Happens?
[Figure: (1) particle histograms; (2) weighted average histogram]
![Page 34: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/34.jpg)
Implementation
- Colour model adaptation iterations occur between particle prediction and particle reweighting in standard particle filter
- Stochastic EM algorithm initialised with parameters from previous time step
- A single stochastic EM iteration is sufficient at each time step
- Number of particles is fixed to 100
- Non-optimised algorithm runs at 15 fps on standard desktop PC
![Page 35: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/35.jpg)
Examples
No adaptation: tracker gets stuck on skin-coloured carpet in the background
Adaptation: tracker successfully adapts to changes in pose and illumination and lock is maintained
No motion likelihood: tracker fails, illustrating need for “anchor” likelihood
![Page 36: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/36.jpg)
Examples
Tracking is successful despite substantial variations in pose and illumination and the subject temporarily leaving the scene
Particles are killed when the subject leaves the scene; upon re-entering the individualised colour model allows lock to be re-established within a few frames
![Page 37: Using Multi-Modality to Guide Visual Tracking Jaco Vermaak Cambridge University Engineering Department Patrick Pérez, Michel Gangnet, Andrew Blake Microsoft](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d005503460f949d2db8/html5/thumbnails/37.jpg)
The End