Tracking Multiple Occluding People by Localizing on Multiple Scene Planes
Tracking Multiple Occluding People by Localizing on Multiple
Scene Planes
Saad M. Khan and Mubarak Shah, PAMI, VOL. 31, NO. 3, MARCH 2009,
Donguk Seo, [email protected]
2012.07.21
Intelligent Systems Lab.
Introduction (1)
- A multi-view approach to detect and track multiple people in crowded and cluttered scenes
- Detection and occlusion resolution are based on geometric constructs
- Foreground is distinguished from background using standard background modeling techniques
- The planar homographic occupancy constraint uses reference-plane homographies estimated from SIFT feature matches with the RANSAC algorithm
- Multiple people are tracked using object scene occupancies
Introduction (2): Examples of cluttered and crowded scenes used to test the paper's approach.
Homographic occupancy constraint (1)
(a) The blue ray shows how the pixels that satisfy the homographic occupancy constraint warp correctly to foreground in each view.
(b) Foreground pixels that belong to the blue person but occlude the feet region of the green person also satisfy the homographic occupancy constraint (the green ray). This seemingly creates a see-through effect in view 1, where the feet of the occluded person can be detected.
Homographic occupancy constraint (2)

Proposition 1. Let X be a scene point lying on plane pi and inside the volume of a foreground object. Its image projections p_i in the n views satisfy both of the following:
- p_i is inside F_i for i = 1, ..., n, where F_i is the foreground region in view i and p_i is the image projection of the scene point X in view i
- p_j = H_{pi,ij} p_i, where H_{pi,ij} is the homography induced by plane pi from view i to view j

Proposition 2 (homographic occupancy constraint). If H_{pi,i}(p) lies in F_i for i = 1, ..., n, where p is a pixel in a reference view and H_{pi,i} is the homography induced by plane pi from the reference view to view i, then the piercing point of p with respect to pi lies inside the volume of a foreground object in the scene.
Localizing people (1)
- I_1, ..., I_n: the images of the scene obtained from uncalibrated cameras; one of them is designated the reference view
- H_i: homography of the reference plane between the reference view and view i
- p: a pixel in the reference image, warped to pixel p_i in image I_i:

  p_i = H_i p

- x_i: the observation in image I_i at location p_i
Localizing people (2)
- X: the event that pixel p has a piercing point inside a foreground object
- Using Bayes' law:

  P(X | x_1, x_2, ..., x_n) ∝ P(x_1, x_2, ..., x_n | X) P(X)   (1)

- Assuming the observations are conditionally independent given X:

  P(x_1, x_2, ..., x_n | X) = P(x_1 | X) P(x_2 | X) ... P(x_n | X)   (2)

- L(x_i): the likelihood of observation x_i belonging to the foreground:

  P(x_i | X) ∝ L(x_i)   (3)

- Plugging (3) into (2) and back into (1):

  P(X | x_1, x_2, ..., x_n) ∝ ∏_{i=1}^{n} L(x_i)   (4)
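The fusion in (4) is a product of per-view likelihoods, which is numerically safer as a sum of logs. A minimal sketch (function name and epsilon floor are my own choices):

```python
import numpy as np

def occupancy_posterior(likelihood_maps):
    """Fuse per-view foreground likelihoods as in Eq. (4).

    likelihood_maps: list of HxW arrays, each already warped into the
    reference view, with values in (0, 1].  Under the conditional
    independence of Eq. (2), the occupancy posterior is proportional
    to the product of the per-view likelihoods, computed here as the
    exp of the sum of their logs.
    """
    eps = 1e-12                      # floor to avoid log(0)
    log_sum = sum(np.log(np.maximum(m, eps)) for m in likelihood_maps)
    return np.exp(log_sum)
```

A pixel that is foreground in every view keeps a high posterior; a single near-zero view suppresses it, which is exactly the behaviour the homographic occupancy constraint requires.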
Modeling clutter and FOV constraints (1)
- Assumes the scene point under examination is inside the FOV of each camera
- Uses an RMS clutter metric of the spatial-intensity properties of the scene:

  C = sqrt( (1/N) Σ_{k=1}^{N} σ_k² )   (5)

- σ_k²: the variance of pixel values within the k-th cell
- N: the number of cells or blocks the picture has been divided into
- The cell size is defined to be twice the length of the largest target dimension
- The clutter metric is computed for each view at each time instant on the foreground likelihood maps obtained from background modeling
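Equation (5) can be sketched directly: tile the map into cells, average the per-cell variances, and take the square root (a hedged illustration; the function name and square-cell tiling are my simplifications):

```python
import numpy as np

def rms_clutter(likelihood_map, cell):
    """RMS clutter metric of Eq. (5).

    The map is tiled into cell x cell blocks; C is the square root of
    the mean per-cell variance of pixel values.  In the actual system
    `cell` would be twice the largest target dimension.
    """
    h, w = likelihood_map.shape
    variances = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            block = likelihood_map[y:y + cell, x:x + cell]
            variances.append(block.var())
    return float(np.sqrt(np.mean(variances)))
```

A uniform map has zero clutter; a map that flips between 0 and 1 inside each cell scores high, so cluttered views are down-weighted in the fusion that follows.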
Modeling clutter and FOV constraints (2)
The first row: images from two of the test sequences
The second row: foreground likelihood maps for the views in the first row
Modeling clutter and FOV constraints (3)
- To assign higher confidence to foreground detected from views with less clutter:

  log P(X | x_1, x_2, ..., x_n) ∝ Σ_{i=1}^{n} (1/C_i) log L(x_i)   (6)

- C_i: the clutter metric of view i, acting as a normalizing factor
- Restricting each view's contribution to the overlapping FOV of the cameras (β_i(x_i) indicating whether the warped pixel falls inside view i's FOV) gives higher confidence in regions with greater view overlap:

  log P(X | x_1, x_2, ..., x_n) ∝ Σ_{i=1}^{n} (1/C_i) β_i(x_i) log L(x_i)   (7)
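The clutter- and FOV-weighted fusion can be sketched as follows. This is my own minimal rendering of the idea, not the paper's code: each warped likelihood map is weighted by 1/C_i, and a view only votes where its FOV mask is nonzero:

```python
import numpy as np

def weighted_log_fusion(likelihood_maps, clutters, fov_masks):
    """Clutter- and FOV-weighted log-likelihood fusion, Eqs. (6)-(7).

    likelihood_maps : warped HxW likelihood maps, one per view
    clutters        : per-view clutter metrics C_i (scalars)
    fov_masks       : per-view binary HxW masks (1 = reference pixel
                      falls inside that view's field of view)
    Views with less clutter contribute more, and a view contributes
    nothing outside its FOV, so regions seen by more cameras
    accumulate higher confidence.
    """
    eps = 1e-12
    total = np.zeros_like(likelihood_maps[0], dtype=float)
    for m, C, mask in zip(likelihood_maps, clutters, fov_masks):
        total += mask * np.log(np.maximum(m, eps)) / C
    return total
```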
Localization at multiple planes
- iH_{j,ref}: the homography induced by a reference scene plane between views i and j
- iH_{j,γ}: the homography induced by a plane parallel to the reference plane:

  iH_{j,γ} = (iH_{j,ref} + [0 | γ jv_ref]) (I_{3×3} + [0 | γ iv_ref])^{-1}   (8)

- iv_ref, jv_ref: the vanishing points of the normal (reference) direction in views i and j
- γ: a scalar multiple controlling the distance between the parallel planes
- The reference plane homographies between views are automatically calculated from SIFT feature matches using the RANSAC algorithm
- Vanishing points for the reference direction are computed by detecting vertical line segments in the scene and finding their intersection in a RANSAC framework
Localization algorithm (1)
1) Obtain the foreground likelihood maps.
   - Model the background using a Mixture of Gaussians.
   - Perform background subtraction to obtain foreground likelihood information.
2) Obtain the reference plane homographies and the vanishing point of the reference direction.
3) For each fusion plane:
   - Update the reference plane homographies using (8).
   - Warp the foreground likelihood maps to a reference view using the homographies of that plane.
   - Fuse the warped foreground likelihood maps at each pixel location of the reference view according to (7) to obtain the synergy map.
4) Arrange the synergy maps as a 3D stack in the reference direction.
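Steps 3-4 can be sketched end-to-end in numpy. This is a simplified illustration under stated assumptions: all views share one resolution, clutter weights C_i are taken as 1 (so the fusion is the unweighted log-sum of Eq. (6)), and nearest-neighbour warping is used:

```python
import numpy as np

def localize(likelihood_maps, homographies_per_plane):
    """Sketch of the localization loop (steps 3-4).

    likelihood_maps        : per-view HxW foreground-likelihood maps;
                             index 0 is the reference view
    homographies_per_plane : for each fusion plane, a list of 3x3
                             homographies warping the reference view
                             into each non-reference view
    Returns the synergy maps stacked along the plane axis.
    """
    h, w = likelihood_maps[0].shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs, ys, np.ones_like(xs)], 0).reshape(3, -1).astype(float)
    stack = []
    for homs in homographies_per_plane:          # one fusion plane at a time
        log_synergy = np.log(np.maximum(likelihood_maps[0], 1e-12))
        for H, lik in zip(homs, likelihood_maps[1:]):
            q = H @ pts
            q /= q[2]
            xi, yi = np.round(q[0]).astype(int), np.round(q[1]).astype(int)
            ok = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
            warped = np.full(h * w, 1e-12)       # out-of-view -> ~zero
            warped[ok] = np.maximum(lik[yi[ok], xi[ok]], 1e-12)
            log_synergy += np.log(warped).reshape(h, w)
        stack.append(log_synergy)
    return np.stack(stack)                       # planes x H x W
```

The returned 3D stack is exactly the occupancy grid that the tracking stage consumes.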
Localization algorithm (2)
Tracking
- The tracking methodology is based on the spatio-temporal coherency of the scene occupancies created by objects
- The tracking problem is solved using a sliding window over multiple frames (a look-ahead technique)
- (l_1, ..., l_n): the trajectory of spatio-temporal occupancies of an individual
- l_t: the spatial localization of the individual in the occupancy likelihood information Θ_t at time t
- Localization for a sliding time window of n frames:

  (l̂_1, ..., l̂_n) = argmax_{l_1, ..., l_n} P(l_1, ..., l_n | Θ_1, Θ_2, ..., Θ_n)   (9)
Graph cuts trajectory segmentation (1)
- For a time window of n frames, the localization algorithm yields the scene occupancy likelihood information Θ_t: a 3D grid of object occupancy likelihoods at time t
- Arranging the Θ_t's along the time dimension gives the spatio-temporal occupancy likelihood grid:

  Θ̂ = [Θ_1 ; Θ_2 ; ... ; Θ_t]

- Θ̂ is segmented into background (non-occupancies) and object occupancy trajectories with the following criteria:
  - Grid locations with high occupancy likelihoods have a higher chance of being included in object trajectories.
  - Object trajectories are spatially and temporally coherent.
- Energy function of a labeling B of Θ̂:

  E(B) = Σ_{p∈P} D_p(b_p) + Σ_{(p,q)∈N} V_{p,q}(b_p, b_q)   (10)
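The two criteria above map onto the two terms of the energy. A minimal sketch (my own simplified instantiation, not the paper's exact potentials): the data term charges -log p for labelling a cell occupied and -log(1-p) for background, and the smoothness term charges a constant for every pair of 4D-adjacent cells with different labels:

```python
import numpy as np

def trajectory_energy(occupancy, labels, lam=1.0):
    """Energy of a candidate occupancy segmentation, Eq. (10) style.

    occupancy : 4D array of occupancy likelihoods (x, y, z, t)
    labels    : 4D binary array, 1 = part of an object trajectory
    The data term prefers labelling high-likelihood cells as occupied;
    the smoothness term penalises label changes between axis-aligned
    4D neighbours, encouraging spatio-temporally coherent tracks.
    """
    eps = 1e-12
    occ = np.clip(occupancy, eps, 1 - eps)
    # Data term: -log p for occupied cells, -log(1-p) for background.
    data = np.where(labels == 1, -np.log(occ), -np.log(1 - occ)).sum()
    # Smoothness term over the axis-aligned 4D neighbours.
    smooth = 0.0
    for axis in range(4):
        smooth += lam * np.abs(np.diff(labels, axis=axis)).sum()
    return float(data + smooth)
```

Graph cuts then finds the binary labeling minimising this energy exactly, since the pairwise term is submodular.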
Graph cuts trajectory segmentation (2)
- P: the set of all grid locations/nodes in Θ̂
- N: the set of grid-location pairs in a neighborhood
- d(p, q): the 4D Euclidean distance between grid locations p and q, with a normalizing factor
- To minimize the energy function, graph cuts techniques are used on an undirected graph:
  - The set of vertices: the spatio-temporal grid locations, augmented by the source and sink vertices
  - The set of edges: all neighboring pairs of nodes, plus the edges between each node and the source and sink
  - Edge weights are assigned depending on whether an edge connects two neighboring nodes or contains the source or sink as one of its vertices
Graph cuts trajectory segmentation (3)
(a) A sequence of synergy maps at the ground reference plane of nine people
(b) The XY cut of the 4D spatio-temporal occupancy tracks
Graph cuts trajectory segmentation (4)
(a) Reference view. (b) The 3D marginal of the 4D spatio-temporal occupancy tracks at a particular time. Notice the gaps in the localizations for each person's color-coded regions: only 10 planes parallel to the ground in the up (Z) direction were used.
Results and discussions
(a) The execution time is linear in both the number of views and the number of fusion planes. (b) Execution time with varying image resolution. As the resolution increases beyond the cache limit of the GPU, the performance drops.
Parking lot data set (1)
Parking lot data set (2)
(a) Total average track error of persons. (b) Top plot: the detection error (number of false positives + number of false negatives). Bottom plot: the variation of the people density.
Parking lot data set (3)
(a) Detection error when using a simple threshold on the occupancy likelihood data, compared with the trajectory segmentation-based approach. (b) Detection results using a threshold-based approach.
Indoor data set (1)
Indoor data set (2)
(a) Total average track error over time, measured from the top center of the track bounding box to the manually marked head locations of people. (b) The accumulated detection error (number of false positives + number of false negatives accumulated over time) for different individual planes.
Basketball data set
Soccer data set
Conclusions
- Tracks multiple people in complex environments by resolving occlusions and localizing people on multiple scene planes using a planar homographic occupancy constraint
- Segments individual trajectories of the people by combining foreground likelihood information from multiple views
- Obtains the global optimum of space-time scene occupancies over a window of frames