ICDSC'08 1
MULTI-TARGET TRACKING THROUGH OPPORTUNISTIC CAMERA CONTROL IN A RESOURCE CONSTRAINED MULTIMODAL
SENSOR NETWORK
Jayanth Nayak1, Luis Gonzalez-Argueta2, Bi Song2,
Amit Roy-Chowdhury2, Ertem Tuncel2
Department of Electrical Engineering,
University of California, Riverside
9/8/2008
Bourns College of Engineering
Information Processing Laboratory
www.ipl.ee.ucr.edu
Overview
Introduction
Problem Formulation
Audio And Video Processing
Camera Control Strategy
Computing Final Tracks Of All Targets
Experimental Results
Conclusion
Acknowledgements
Motivation
Obtaining multi-resolution video from a highly active environment requires a large number of cameras.
Disadvantages:
Cost of buying, installing, and maintaining
Bandwidth limitations
Processing and storage
Privacy
Our goal: minimize the number of cameras by using a control mechanism that directs their attention to the interesting parts of the scene.
Proposed Strategy
Audio sensors direct the pan/tilt/zoom of the camera to the location of the event.
Audio data intelligently turns on the camera and video data turns off the camera.
Audio and video data are fused to obtain tracks of all targets in the scene.
Example Scenario
An example scenario where audio can be used to efficiently control two video cameras. There are four tracks that need to be inferred. Directly indicated on tracks are time instants of interest, i.e., initiation and end of each track, mergings, splittings, and cross-overs. The mergings and crossovers are further emphasized by X. Two innermost tracks coincide in the entire time interval (t2, t3). The cameras C1 and C2 need to be panned, zoomed, and tilted as decided based on their own output and that of the audio sensors a1, . . . , aM.
Relation To Previous Work
Previous work: fusion of simultaneous audio and video data. Ours: audio and video data are captured at disjoint time intervals.
Previous work: dense network of vision sensors. Ours: in order to cover a large field, we focus on controlling a reduced set of vision sensors.
Our video and audio data are analyzed from dynamic scenes.
Problem Formulation
Audio sensors A = {a1, . . . , aM} are distributed across the ground plane R.
R is also observable from a set of controllable cameras C = {c1, . . . , cL}.
However, the entire region R may not be covered by a single set of camera settings.
p-tracks: tracks belonging to targets
a-tracks: tracks obtained by clustering audio
Resolving p-track ambiguity:
Camera Control
Person Matching
Tracking System Overview
Overall camera control system. Audio sensors A = {a1, . . . , aM} are distributed across regions Ri. The set of audio clusters is denoted by Bt, and Kt− represents the set of confirmed a-tracks estimated from observations before time t. P/T/Z cameras are denoted by C = {c1, . . . , cL}. Ground plane positions are denoted by O_t^k.
Processing Audio and Video
a-tracks are clusters of audio data that are above an amplitude threshold
a-tracks are tracked using a Kalman filter
In video, people are detected using histograms of oriented gradients and tracked using an auxiliary particle filter
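As an illustration of the Kalman-filter step for a-tracks, the following is a minimal sketch only. It assumes a one-dimensional position along the ground plane, a constant-velocity motion model, and hand-picked noise values `q` and `r`; none of these choices, nor the class name `KalmanTrack1D`, come from the slides.

```python
from dataclasses import dataclass


@dataclass
class KalmanTrack1D:
    """Constant-velocity Kalman filter for one a-track (assumed model)."""
    pos: float
    vel: float = 0.0
    # Entries of the symmetric 2x2 state covariance P.
    p_pp: float = 1.0
    p_pv: float = 0.0
    p_vv: float = 1.0

    def predict(self, dt, q=0.01):
        """Propagate the state by dt seconds; q is an assumed process noise."""
        # x <- F x with F = [[1, dt], [0, 1]]
        self.pos += dt * self.vel
        # P <- F P F^T + diag(q, q)
        p_pp = self.p_pp + 2.0 * dt * self.p_pv + dt * dt * self.p_vv + q
        p_pv = self.p_pv + dt * self.p_vv
        p_vv = self.p_vv + q
        self.p_pp, self.p_pv, self.p_vv = p_pp, p_pv, p_vv

    def update(self, z, r=0.25):
        """Correct the state with a measured cluster position z (variance r)."""
        # Measurement matrix H = [1, 0]: only position is observed.
        s = self.p_pp + r            # innovation variance
        k_p = self.p_pp / s          # gain for position
        k_v = self.p_pv / s          # gain for velocity
        resid = z - self.pos
        self.pos += k_p * resid
        self.vel += k_v * resid
        # P <- (I - K H) P
        p_pp = (1.0 - k_p) * self.p_pp
        p_pv = (1.0 - k_p) * self.p_pv
        p_vv = self.p_vv - k_v * self.p_pv
        self.p_pp, self.p_pv, self.p_vv = p_pp, p_pv, p_vv
```

A caller would alternate `predict(dt)` and `update(z)` for each new audio cluster position assigned to the track; during intervals with no cluster above the threshold, only `predict` would be called.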
Mapping From Image Plane to Ground Plane
Learned parameters are used to transform tracks from image plane to ground plane
Estimate projective transformation matrix H during a calibration phase
Precompute H for each PTZ setting of each camera
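The mapping itself can be sketched as follows, assuming H is stored as a 3x3 nested list and looked up by camera and PTZ setting. The function names, the table `H_BY_SETTING`, and the matrix values are placeholders for illustration, not calibration results from this work.

```python
def image_to_ground(H, u, v):
    """Map image point (u, v) to ground-plane coordinates using homography H."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        # Image points with w = 0 lie on the vanishing line of the ground plane.
        raise ValueError("point lies on the vanishing line")
    return (x / w, y / w)


# Hypothetical lookup table: one precomputed H per (camera, (pan, tilt, zoom)).
# The entries below are placeholders, not measured values.
H_BY_SETTING = {
    ("c1", (0, 0, 1)): [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
}


def track_to_ground(camera, ptz, image_track):
    """Transform a list of image-plane points for one camera setting."""
    H = H_BY_SETTING[(camera, ptz)]
    return [image_to_ground(H, u, v) for (u, v) in image_track]
```

Precomputing one matrix per setting keeps the run-time cost to a table lookup and a few multiplications per tracked point.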
[Figure: vanishing line]
Camera Control
Goal: avoid ambiguity or disambiguate when tracks
are created or deleted
intersect
merge
Set pan/tilt/zoom parameters
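One way to picture the trigger for such events is a check on predicted a-track positions. The sketch below is an assumption, not the rule used in this work: it extrapolates each a-track linearly and flags pairs whose predicted separation drops below a threshold within a short horizon. The function name and the parameters `min_sep`, `horizon`, and `dt` are hypothetical.

```python
import math
from itertools import combinations


def pairs_needing_video(tracks, min_sep=1.0, horizon=2.0, dt=0.5):
    """Return pairs of a-track ids predicted to come within min_sep meters.

    tracks maps an id to ((x, y), (vx, vy)): the last estimated ground-plane
    position and velocity. Linear extrapolation is an assumed motion model.
    """
    flagged = set()
    steps = int(horizon / dt) + 1
    for (id_a, (pa, va)), (id_b, (pb, vb)) in combinations(sorted(tracks.items()), 2):
        for k in range(steps):
            t = k * dt
            dx = (pa[0] + va[0] * t) - (pb[0] + vb[0] * t)
            dy = (pa[1] + va[1] * t) - (pb[1] + vb[1] * t)
            if math.hypot(dx, dy) < min_sep:
                # An intersection or merger is likely: a camera should be on.
                flagged.add((id_a, id_b))
                break
    return flagged
```

A flagged pair would be the cue to set pan/tilt/zoom toward the predicted meeting point before the audio clusters become indistinguishable.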
Setting Camera Parameters
Heuristic algorithm
Cover ground plane by regions R_i^l
R_i^l in field of view of camera Cl
Camera parameters (P_i^l, T_i^l, Z_i^l)
Tracking algorithm specifies point of interest x from last known a-track
If no camera on, find R_i^l containing x
Reassign a camera and set its parameters if x approaches boundary of current R_i^l
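The heuristic above can be sketched as a small region lookup. This is an illustrative reading under stated assumptions: regions are modeled as axis-aligned rectangles on the ground plane, "approaches the boundary" is modeled as coming within a fixed `margin`, and the names `Region` and `choose_region` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One ground-plane region with the camera setting that covers it."""
    camera: str
    ptz: tuple      # (pan, tilt, zoom) for this region
    xmin: float
    xmax: float
    ymin: float
    ymax: float

    def contains(self, x, y, margin=0.0):
        """True if (x, y) is inside the region, at least margin from its edge."""
        return (self.xmin + margin <= x <= self.xmax - margin
                and self.ymin + margin <= y <= self.ymax - margin)


def choose_region(regions, point, current=None, margin=0.5):
    """Pick the region (and hence camera and PTZ setting) for a point of interest."""
    x, y = point
    if current is not None and current.contains(x, y, margin):
        return current  # point is well inside the current region: keep settings
    # No camera on, or the point is near the boundary: look for a better region.
    safe = [r for r in regions if r.contains(x, y, margin)]
    if safe:
        return safe[0]
    covering = [r for r in regions if r.contains(x, y)]
    return covering[0] if covering else None
```

Overlapping regions give the controller room to hand over before the point of interest leaves the current field of view.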
Camera Control Based on Track Trajectories
[Figure: six plots of location (meters) versus time (seconds), one per track event: Intersection, Separation, Merger, Sudden Appearance, Undetected Disappearance, Sudden Disappearance. One plot is annotated "Switch to video".]
Creating Final Tracks Of All Targets
Bipartite graph matching over a set of color histograms
We collect features as the target enters and exits the scene in video.
For every new a-track, features are collected from a small set of frames.
The weight of an edge is the distance between the observed video features.
Additionally, constraints from the audio data are imposed on the edge weights.
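A minimal sketch of the matching step follows. The choices here are assumptions for illustration: the Hellinger distance between normalized color histograms as the edge weight, audio constraints expressed as a set of allowed pairs (disallowed edges get infinite weight), and a brute-force search that is only practical for a handful of nodes. The slides do not specify the distance or the solver.

```python
import math
from itertools import permutations


def hist_distance(h1, h2):
    """Hellinger distance between two normalized histograms (assumed metric)."""
    bc = sum(math.sqrt(a * b) for a, b in zip(h1, h2))
    return math.sqrt(max(0.0, 1.0 - bc))


def match_tracks(left, right, allowed=None):
    """Minimum-weight perfect matching between two equal-sized node sets.

    left and right map node names to color histograms. allowed, if given, is
    the set of (left, right) pairs consistent with the audio tracks; any other
    pair is treated as an edge of infinite weight.
    """
    left_ids = sorted(left)
    right_ids = sorted(right)
    best_cost, best = math.inf, None
    for perm in permutations(right_ids):
        cost = 0.0
        for a, b in zip(left_ids, perm):
            if allowed is not None and (a, b) not in allowed:
                cost = math.inf
                break
            cost += hist_distance(left[a], right[b])
        if cost < best_cost:
            best_cost, best = cost, dict(zip(left_ids, perm))
    return best, best_cost
```

For larger graphs the brute-force loop would be replaced by a polynomial-time assignment algorithm; the edge weights and the audio constraint would stay the same.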
Creating Final Tracks Using Bipartite Matching
[Figure: two plots of location (meters) versus time (seconds), with track segments marked Audio or Video and nodes labeled [a+, a-], [b+, b-], [c+], [c-], [d+], [d-], [e+, e-], [f+], [g+].]
Tracking in Audio and Video: Three tracks are recovered by matching every node (entry and exit from the scene) where video was captured.
Tracking in Audio Only: Two tracks are recovered. However, red and green show the wrong path. Audio cannot disambiguate the targets once the clusters have merged.
[Figure: two bipartite graphs over nodes a to g, with + and - node sets, titled "Bipartite Graph Matching" and "Bipartite Graph Matching Without Audio Constraint".]
Experimental Results
[Figures: Inter P-Track Distance at a Merge Event; Inter P-Track Distance at a Crossover Event]
Conclusion
Goal: minimize camera usage in a surveillance system
Save power, bandwidth, storage and money
Alleviate privacy concerns
Proposed a probabilistic scheme for opportunistically deploying cameras in a multimodal network.
Showed detailed experimental results on real data collected in multimodal networks.
The final set of tracks is computed by bipartite matching.
Acknowledgements
This work was supported by Aware Building: ONR-N00014-07-C-0311 and NSF CNS 0551719.
Bi Song and Amit Roy-Chowdhury were additionally supported by NSF-ECCS 0622176 and ARO-W911NF-07-1-0485.
Thank You.
Questions?
Jayanth Nayak1
Luis Gonzalez-Argueta2, Bi Song2,
Amit Roy-Chowdhury2, Ertem Tuncel2
{largueta,bsong,amitrc,ertem}@ee.ucr.edu