
Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications*

Edward Chang² and Yuan-Fang Wang¹

²Department of Electrical and Computer Engineering

¹Department of Computer Science

University of California, Santa Barbara, CA 93106

*Supported in part by NSF Career, ITR, IDM, and Infrastructure grants, and a gift from Proximex Corp.


Problem Statement

Video surveillance with:
- Multiple cameras
- Mobile, wireless networks
- Online data processing
- Intelligent, computer-assisted content analysis

Focus of current work:
- Event sensing for detection, representation, and recognition of motion events
- Sensor network management for bandwidth and power resource conservation


Potential Applications and Needs

Applications:
- Emergency search and rescue in natural disasters
- Deterrence of cross-border illegal activities
- Reconnaissance and intelligence gathering in digital battlefields

Needs:
- Rapid deployment, dynamic configuration, and continuous operation
- Robust, real-time data fusion and analysis
- Intelligent event modeling and recognition


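Validation Scenario

[Figure: m cameras with local frames (x_1, y_1, z_1), (x_2, y_2, z_2), ..., (x_m, y_m, z_m) observe a target in the world frame (X, Y, Z); slave stations connect to a master station over the Internet.]

$\mathbf{P}(t) = (X(t), Y(t), Z(t))^T$: target position in world coordinates
$\mathbf{p}_1(t) = (x_1(t), y_1(t))^T$: target position in the image of camera 1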


Research and Development Framework

- Event detection
  - Far-field coordination and update
  - Near-field sensor data fusion
- Event representation
  - Hierarchical: multiple levels of detail
  - Invariant: insensitive to incidental changes
- Event recognition
  - Temporally correlated event signature
  - Imbalanced training set


Event Detection: Near-field Sensor Data Fusion

- Sensing coordination and intelligent data fusion
- Two-level hierarchy of Kalman filters
  - Bottom level (feed-forward): summarize trajectories in local state vectors; merge state vectors from multiple cameras through registration parameters
  - Top level (feed-backward): fill in missing or occluded trajectory pieces; control camera pose and frame rate

[Figure: slave stations 0, ..., m-1 each filter their own measurements $z^{(i)}(t)$ into a local state $x^{(i)} = (p^{(i)}, \dot{p}^{(i)}, \ddot{p}^{(i)})$. Over the Internet, the master fusion station maps each local state into world coordinates, $X_{\text{real world}} = T^{(i)}_{\text{real world}}\, x^{(i)}$, and fuses the results into the global state $X = (P, \dot{P}, \ddot{P})$.]
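The two-level hierarchy can be sketched in Python as follows. This is a minimal sketch under assumptions the slides do not state: a constant-velocity model with state (x, y, vx, vy) rather than the position/velocity/acceleration states in the figure, a hypothetical planar registration p_world = A p_local + b for each camera, and master-side fusion by inverse-covariance weighting that treats the camera estimates as independent. The top-level feedback (filling gaps, camera control) is not shown.

```python
import numpy as np


class ConstantVelocityKF:
    """Bottom-level Kalman filter in one camera's local frame; state (x, y, vx, vy)."""

    def __init__(self, dt=1.0, q=1e-2, r=1.0):
        self.F = np.array([[1.0, 0.0, dt, 0.0],
                           [0.0, 1.0, 0.0, dt],
                           [0.0, 0.0, 1.0, 0.0],
                           [0.0, 0.0, 0.0, 1.0]])
        self.H = np.array([[1.0, 0.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0, 0.0]])  # only position is measured
        self.Q = q * np.eye(4)    # process noise
        self.R = r * np.eye(2)    # measurement noise
        self.x = np.zeros(4)
        self.P = 1e3 * np.eye(4)  # large initial uncertainty

    def step(self, z):
        """One predict/update cycle with a local position measurement z."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x.copy(), self.P.copy()


def to_world(x, P, A, b):
    """Map a local state into world coordinates given p_world = A p + b.
    Positions get A and b; velocities get A only."""
    M = np.zeros((4, 4))
    M[:2, :2] = A
    M[2:, 2:] = A
    offset = np.concatenate([b, np.zeros(2)])
    return M @ x + offset, M @ P @ M.T


def fuse(states):
    """Master-side fusion of (x, P) pairs by inverse-covariance weighting.
    Treats the camera estimates as independent (cross-correlations ignored)."""
    infos = [np.linalg.inv(P) for _, P in states]
    P_f = np.linalg.inv(sum(infos))
    x_f = P_f @ sum(info @ x for info, (x, _) in zip(infos, states))
    return x_f, P_f


# Two cameras with hypothetical registrations watch a target moving at (1.0, 0.5).
A1, b1 = np.eye(2), np.zeros(2)
A2, b2 = np.array([[0.0, -1.0], [1.0, 0.0]]), np.array([5.0, 5.0])
kf1, kf2 = ConstantVelocityKF(), ConstantVelocityKF()
for t in range(1, 31):
    p_world = np.array([1.0 * t, 0.5 * t])
    z1 = np.linalg.solve(A1, p_world - b1)  # camera 1's local view
    z2 = np.linalg.solve(A2, p_world - b2)  # camera 2's local view
    s1 = to_world(*kf1.step(z1), A1, b1)
    s2 = to_world(*kf2.step(z2), A2, b2)
x_fused, P_fused = fuse([s1, s2])
```

Because the fused covariance satisfies $P_f^{-1} = \sum_i P_i^{-1}$, the fused estimate is never less certain than any single camera's; if the cameras' errors are in fact correlated, this independence assumption makes it overconfident.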


Event Detection: Far-field Coordination and Update

- Minimize bandwidth and power consumption under pre-specified accuracy constraints
- Dual Kalman filters
  - Updates are necessary only when predictions diverge
  - Cache dynamic algorithms instead of static data
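A minimal sketch of the dual-filter idea, assuming a scalar reading and a constant-velocity Kalman filter (the slides specify neither). The sensor keeps a mirror of the receiver's filter, both sides predict at every step, and a reading is transmitted only when the mirrored prediction misses it by more than the accuracy bound delta. When a reading is sent, the receiver reports it exactly, so the reported value never deviates from the true reading by more than delta. The class and function names are hypothetical.

```python
import numpy as np


class ScalarCVKalman:
    """Constant-velocity Kalman filter for one scalar reading; state (value, rate)."""

    def __init__(self, q=1e-3, r=1e-2):
        self.F = np.array([[1.0, 1.0], [0.0, 1.0]])
        self.Q = q * np.eye(2)
        self.r = r
        self.x = np.zeros(2)
        self.P = np.eye(2)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return float(self.x[0])

    def update(self, z):
        s = self.P[0, 0] + self.r                    # innovation variance (H = [1, 0])
        K = self.P[:, 0] / s                         # Kalman gain
        self.x = self.x + K * (z - self.x[0])
        self.P = self.P - np.outer(K, self.P[0, :])  # (I - K H) P


def dual_kalman_stream(readings, delta):
    """Return (values the receiver reports, number of transmissions)."""
    mirror, receiver = ScalarCVKalman(), ScalarCVKalman()  # identical initial states
    reported, sent = [], 0
    for z in readings:
        predicted = mirror.predict()    # sensor side: what the receiver will predict
        receiver.predict()              # receiver side: the same computation
        if abs(z - predicted) > delta:  # prediction diverged: transmit the reading
            mirror.update(z)
            receiver.update(z)
            reported.append(z)
            sent += 1
        else:                           # stay silent; receiver uses its own prediction
            reported.append(float(receiver.x[0]))
    return reported, sent


readings = [0.5 * t for t in range(100)]
reported, sent = dual_kalman_stream(readings, delta=0.1)
```

For a smoothly varying signal, the filters quickly learn its rate and most steps need no transmission; the bound delta directly trades bandwidth for accuracy.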


Event Representation

- Hierarchical: multiple levels of description
  - Syntactic level
  - Semantic level
- Invariant: descriptors unaffected by incidental changes in environmental factors and camera pose
- Consequences
  - Able to perform both “intra-class” and “inter-class” recognition
  - Recognize syntactic similarity (the same trajectory) and semantic similarity (the same type of trajectory)


Event Representation: Syntactic Level

- Normalization against
  - Viewpoint (affine or perspective)
  - Speed
- Goal: derive an invariant signature
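The two normalizations can be sketched separately. Assumptions not taken from the slides: trajectories are (N, 2) arrays of image positions; viewpoint normalization covers only the affine case, using whitening (subtract the centroid, multiply by the inverse square root of the covariance), after which trajectories related by an affine map differ only by a rotation or reflection, removed here with orthogonal Procrustes alignment; speed normalization resamples at equal arc-length steps. Each step is exact only on its own. Whitening assumes corresponding samples, and arc length is not affine-invariant, so combining the two exactly would require an affine-invariant parametrization, which this sketch does not attempt.

```python
import numpy as np


def affine_normalize(traj):
    """Whitening: center the (N, 2) trajectory and map its covariance to the identity.
    Trajectories related by an affine map then differ only by an orthogonal transform."""
    centered = traj - traj.mean(axis=0)
    cov = centered.T @ centered / len(traj)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T  # cov^(-1/2), symmetric
    return centered @ inv_sqrt


def procrustes_distance(z1, z2):
    """Residual after the best orthogonal alignment of z1 onto z2 (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(z1.T @ z2)
    return np.linalg.norm(z1 @ (U @ Vt) - z2)


def arclength_resample(traj, n=64):
    """Speed normalization: n points equally spaced in arc length along the path."""
    seg = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    traj = traj[np.concatenate([[True], seg > 0])]  # drop repeated points (stops)
    s = np.concatenate([[0.0], np.cumsum(np.linalg.norm(np.diff(traj, axis=0), axis=1))])
    targets = np.linspace(0.0, s[-1], n)
    return np.column_stack([np.interp(targets, s, traj[:, 0]),
                            np.interp(targets, s, traj[:, 1])])


# Viewpoint: the same samples seen through a hypothetical affine camera change.
t = np.linspace(0.0, 1.0, 50)
traj = np.column_stack([np.cos(3 * t) + t, np.sin(2 * t)])
A, b = np.array([[2.0, 0.5], [0.3, 1.5]]), np.array([4.0, -1.0])
view_gap = procrustes_distance(affine_normalize(traj), affine_normalize(traj @ A.T + b))

# Speed: the same L-shaped path walked slowly, then quickly (with a pause at the corner).
slow = np.vstack([np.column_stack([np.arange(0.0, 10.25, 0.25), np.zeros(41)]),
                  [[10.0, 2.5], [10.0, 5.0]]])
fast = np.array([[0, 0], [5, 0], [10, 0], [10, 0], [10, 1], [10, 2], [10, 3], [10, 4], [10, 5]],
                dtype=float)
```

Here `view_gap` is zero up to rounding, and `arclength_resample(slow, 16)` and `arclength_resample(fast, 16)` produce the same 16 points.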


Event Representation: Semantic Level

- Segmentation based on acceleration
- Segment characterization
- Markov chain representation

[Figure: decision tree for segment characterization, where P denotes a segment's acceleration and V_o its velocity. Tests such as P = 0?, V_o = 0?, P constant?, P ∥ V_o?, the sign of P · V_o, the sign of (P × V_o)_z, and the signs of further time derivatives such as d|r|/dt assign each segment one of: stopped, constant velocity, quick start, quick accelerate, slow down, emergency stop, left/right turn, left/right half turn, left/right spiral, or left/right inward/outward turn, with the turning types each having "w. acc" and "w. decel" variants.]
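A heavily simplified sketch of acceleration-based segmentation and characterization. It assumes finite-difference velocity and acceleration, hypothetical thresholds, and only a few of the figure's classes (no spirals, half turns, or emergency stops). Each sample is labeled from its velocity v and acceleration a: the sign of the z-component of v × a separates left from right turns (with the y axis pointing up; flip it for image coordinates), and the sign of v · a separates speeding up from slowing down. Runs of identical labels become segments.

```python
import numpy as np


def label_sample(v, a, eps_v=1e-3, eps_a=1e-3, ratio=0.1):
    """Characterize motion from velocity v and acceleration a (2-vectors, y axis up).
    eps_v, eps_a, and ratio are hypothetical tuning thresholds."""
    speed, acc = np.linalg.norm(v), np.linalg.norm(a)
    if acc < eps_a:
        return "stopped" if speed < eps_v else "constant velocity"
    if speed < eps_v:
        return "quick start"
    cross_z = v[0] * a[1] - v[1] * a[0]      # > 0: turning left (counter-clockwise)
    along = float(v @ a)                     # > 0: speeding up
    if abs(cross_z) < ratio * speed * acc:   # acceleration nearly parallel to velocity
        return "speed up" if along > 0 else "slow down"
    turn = "left turn" if cross_z > 0 else "right turn"
    if abs(along) < ratio * speed * acc:     # nearly pure turning
        return turn
    return turn + (" w. acc" if along > 0 else " w. decel")


def segment_trajectory(traj, dt=1.0, **thresholds):
    """Label every sample, then merge runs of identical labels into (start, end, label)."""
    v = np.gradient(traj, dt, axis=0)
    a = np.gradient(v, dt, axis=0)
    labels = [label_sample(vi, ai, **thresholds) for vi, ai in zip(v, a)]
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start, i - 1, labels[start]))
            start = i
    return segments


line = np.column_stack([np.arange(10.0), np.zeros(10)])
print(segment_trajectory(line))  # [(0, 9, 'constant velocity')]
```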


Event Representation: Semantic Level (cont.)

[Figure: Markov chain representation. Segment types (left half turn, left outward spiral, and left inward spiral, each plain, with acceleration, and with deceleration; constant velocity; speed up; slow down) form the states, repeated for successive segments, with transitions between consecutive segments.]
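Given segment labels, the Markov chain representation can be sketched as a maximum-likelihood first-order transition matrix with optional additive smoothing. The state list and the likelihood scoring below are illustrative assumptions. A new sequence could, for example, be assigned to the class whose chain gives it the highest log-likelihood.

```python
import numpy as np


def fit_markov_chain(sequences, states, alpha=0.0):
    """Maximum-likelihood first-order transition matrix over segment types;
    alpha > 0 adds additive (Laplace-style) smoothing."""
    index = {s: i for i, s in enumerate(states)}
    counts = np.full((len(states), len(states)), float(alpha))
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[index[prev], index[nxt]] += 1.0
    totals = counts.sum(axis=1, keepdims=True)
    # States that are never left keep an all-zero row instead of dividing by zero.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


def log_likelihood(seq, states, P, floor=1e-12):
    """Log-probability of a label sequence's transitions under chain P."""
    index = {s: i for i, s in enumerate(states)}
    return float(sum(np.log(P[index[a], index[b]] + floor) for a, b in zip(seq, seq[1:])))


states = ["constant velocity", "speed up", "left turn"]
seqs = [["constant velocity", "left turn", "constant velocity"],
        ["constant velocity", "speed up", "constant velocity"]]
P = fit_markov_chain(seqs, states)
```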


Event Recognition: Sequence Data Learning

- Similarity measurement is difficult
  - Sequence data with temporal correlation may not have a vector-space representation
- However, kernel methods (e.g., SVM) are applicable
  - A vector-space representation is not required, as long as a feature-space representation exists
  - Use a DP (dynamic programming) algorithm for the feature-space distance metric
  - Use hierarchical kernel recognition and fusion
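The slides do not name the DP algorithm; dynamic time warping (DTW), a common choice, is assumed here. The sketch computes a DTW alignment cost between sequences of feature vectors and turns it into a Gram matrix via exp(-gamma * distance). Such a kernel is not guaranteed to be positive semidefinite, so the sketch also projects the Gram matrix onto the PSD cone by clipping negative eigenvalues before it would be handed to an SVM.

```python
import numpy as np


def dtw_distance(a, b):
    """Dynamic-programming (DTW) alignment cost between two sequences of feature vectors."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, match
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])


def dtw_gram(sequences, gamma=1.0):
    """Gram matrix exp(-gamma * DTW): symmetric, but not guaranteed PSD."""
    k = len(sequences)
    K = np.eye(k)
    for i in range(k):
        for j in range(i + 1, k):
            K[i, j] = K[j, i] = np.exp(-gamma * dtw_distance(sequences[i], sequences[j]))
    return K


def nearest_psd(K):
    """Project a symmetric matrix onto the PSD cone by clipping negative eigenvalues."""
    w, V = np.linalg.eigh((K + K.T) / 2.0)
    return (V * np.clip(w, 0.0, None)) @ V.T


seqs = [[0, 1, 2], [0, 1, 1, 2], [2, 1, 0], [5, 5]]
K = nearest_psd(dtw_gram(seqs, gamma=0.5))
```

The resulting matrix could, for instance, be passed to scikit-learn's `SVC(kernel="precomputed")`.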


Event Recognition: Imbalanced Data Set

- Negative samples significantly outnumber positive samples
- The Bayesian risk of a false negative significantly outweighs that of a false positive
- Adaptive conformal mapping at the decision boundary
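A sketch of a conformal kernel transformation, a known technique for enlarging spatial resolution near an SVM's decision boundary: $\tilde{K}(x, x') = D(x)\,K(x, x')\,D(x')$ with $D(x) = \sum_k \exp(-\lVert x - s_k\rVert^2 / \tau_k^2)$ summed over the support vectors $s_k$ of a first-pass SVM. This input-space form, the RBF base kernel, and the per-vector widths $\tau_k$ are assumptions. The slides do not say how the "adaptive" method sets the widths (for example, differently for minority-class vectors), so the values below are placeholders.

```python
import numpy as np


def sq_dists(X, Y):
    """Pairwise squared Euclidean distances between rows of X and rows of Y."""
    return ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)


def rbf_kernel(X, Y, gamma=1.0):
    return np.exp(-gamma * sq_dists(X, Y))


def conformal_factor(X, support_vectors, taus):
    """D(x) = sum_k exp(-||x - s_k||^2 / tau_k^2): large near support vectors,
    i.e. near the first-pass decision boundary. Widths tau_k are assumptions."""
    taus = np.asarray(taus, dtype=float)
    return np.exp(-sq_dists(X, support_vectors) / taus[None, :] ** 2).sum(axis=1)


def conformal_kernel(X, Y, support_vectors, taus, gamma=1.0):
    """K~(x, y) = D(x) K(x, y) D(y)."""
    dx = conformal_factor(X, support_vectors, taus)
    dy = conformal_factor(Y, support_vectors, taus)
    return dx[:, None] * rbf_kernel(X, Y, gamma) * dy[None, :]


rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))
sv = X[:3]                                   # stand-in for first-pass support vectors
Kc = conformal_kernel(X, X, sv, taus=[0.5, 0.5, 1.0])
```

In a two-pass scheme one trains with K, takes the support vectors, and retrains with K̃. Because K̃ is K scaled by the same positive factor on both sides, it remains positive semidefinite whenever K is.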


Event Recognition: Statistical Modeling

- HMMs are expensive to build
- Not all behaviors are structured (e.g., loitering)
- It may not be necessary to understand individual activities before recognizing interactions
- Distinguish interaction patterns: following, following-and-gaining, stalking


Experimental Results: Syntactic Matching


Experimental Results: Semantic Indexing


Experimental Results: Biased Learning

[Figure: sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP); panels labeled "threshold" and "penalty".]
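The two rates above can be computed directly from labels and classifier scores. A minimal sketch, assuming a sample is predicted positive when its score is at least the threshold; the scores are made up for illustration.

```python
import numpy as np


def sensitivity_specificity(y_true, scores, threshold):
    """Sensitivity TP/(TP+FN) and specificity TN/(TN+FP) when predicting
    positive for score >= threshold. Assumes both classes are present."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(scores, dtype=float) >= threshold
    tp = np.sum(y_pred & y_true)
    fn = np.sum(~y_pred & y_true)
    tn = np.sum(~y_pred & ~y_true)
    fp = np.sum(y_pred & ~y_true)
    return float(tp / (tp + fn)), float(tn / (tn + fp))


# Hypothetical scores for 2 positive and 4 negative samples.
y = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.1, 0.05]
print(sensitivity_specificity(y, scores, 0.5))  # (0.5, 1.0)
print(sensitivity_specificity(y, scores, 0.3))  # (1.0, 0.75)
```

Lowering the threshold trades specificity for sensitivity, which is the preferred direction when a false negative costs more than a false positive.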


Experimental Results: Statistical Learning


Results


Relevant Publications

Many details are omitted:
- Sensor registration (spatial and temporal)
- Object tracking (Kalman and multi-state)
- Power management and routing

1. L. Jiao, G. Wu, Y. Wu, E. Y. Chang, and Y. F. Wang, “The Anatomy of a Multi-Camera Video Surveillance System,” to appear in the ACM Multimedia System Journal.

2. K. Wu, J. Long, D. Han, and Y. F. Wang, “Human Activity Detection and Recognition for Video Surveillance,” Proceedings of the IEEE International Conference on Multimedia Computing and Systems, 2004.

3. E. Chang and Y. F. Wang, “Toward Building a Robust and Intelligent Video Surveillance System: A Case Study,” (invited paper) Proceedings of the IEEE Multimedia and Expo Conference, Taipei, Taiwan, 2004.

4. R. Rangaswami, Z. Dimitrijevic, K. Kakligian, E. Chang, and Y. F. Wang, “The SfinX Video Surveillance System,” Proceedings of the IEEE Multimedia and Expo Conference, Taipei, Taiwan, 2004.

5. G. Wu, Y. Wu, L. Jiao, Y. F. Wang, and E. Y. Chang, “Multi-camera Spatio-temporal Fusion and Biased Sequence-data Learning for Security Surveillance,” Proceedings of the ACM Multimedia Conference, Berkeley, CA, 2003.

6. K. Wu, J. Long, D. Han, and Y. F. Wang, “Real-Time Multi-person Tracking in Video Surveillance,” Proceedings of the Pacific Rim Multimedia Conference, Singapore, 2003.

7. Y. Wu, L. Jiao, G. Wu, E. Chang, and Y. F. Wang, “Invariant Feature Extraction and Biased Statistical Inference for Video Surveillance,” Proceedings of the IEEE International Conference on Advanced Video and Signal-based Surveillance, Miami, FL, 2003.


Focus of This Seminar

Video-based face tracking, modeling and recognition

Human activity and interaction analysis


Video-Based Face Tracking & Recognition

- Image-based: image normalization, feature selection, face recognition
- Video-based: face region detection, tracking, face modeling and recognition


Difficulties

- Video quality is low: large illumination and pose variation, occlusion
- Face images are small compared to those in still-image-based systems
- Model construction and fitting: generic vs. person-specific, 2D vs. 3D


Proposed Approach: Resolution Enhancement

- Exploit multiple image frames and spatial coherency
  - Single-camera super-resolution (digital zoom)
  - Multi-camera (master-slave) face region detection and zooming (optical zoom)
- Requires both feature appearance (PCA + LDA) and geometrical relations


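General Framework: Visual Servoing

- A feedback control mechanism
- Reference and real signals are computed from images

[Figure: control loop. Feature detection on each new image produces the real signal; the error signal (reference signal minus real signal) passes through the inverse Jacobian $J^{-1}$ to give the control signal for camera control, which is also subject to external disturbance.]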
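A minimal sketch of the loop, assuming a locally linear relation s = s0 + J q + d between the camera's pan/tilt q and the tracked feature position s (J is the image Jacobian, d an unknown constant disturbance) and a hypothetical proportional gain. Each iteration forms the error signal (reference minus real) and applies the control increment gain · J⁻¹ · error.

```python
import numpy as np


def visual_servo(s_ref, measure, J, q0, gain=0.5, max_iter=100, tol=1e-6):
    """Iterate: real signal s = measure(q); error = s_ref - s; q += gain * J^-1 error."""
    J_inv = np.linalg.inv(J)
    q = np.asarray(q0, dtype=float)
    e = s_ref - measure(q)
    for _ in range(max_iter):
        if np.linalg.norm(e) < tol:
            break
        q = q + gain * (J_inv @ e)  # control signal sent to the camera
        e = s_ref - measure(q)      # feature detection on the new image
    return q, e


# Hypothetical pan/tilt camera: 800 px per radian, image center (320, 240),
# and a constant disturbance d that pushes the face off-center.
J = np.array([[800.0, 0.0], [0.0, 800.0]])
d = np.array([15.0, -10.0])


def measure(q):
    return np.array([320.0, 240.0]) + J @ q + d


q, e = visual_servo(np.array([320.0, 240.0]), measure, J, q0=[0.0, 0.0])
```

Under this linear model each iteration scales the error by (1 − gain), so any gain in (0, 2) converges, and the accumulated pan/tilt absorbs the constant disturbance.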


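Master-Slave Combo Setup

[Figure: a master camera and a slave camera related through a common world frame (X, Y, Z).]

$$p_{\text{master}} = T^{\text{world}}_{\text{master}}\, P_{\text{world}}$$
$$p_{\text{slave}} = T^{\text{world}}_{\text{slave}}(\ldots, f, z)\, P_{\text{world}} = T^{\text{world}}_{\text{slave}}(\ldots, f, z)\,\big(T^{\text{world}}_{\text{master}}\big)^{-1} p_{\text{master}}$$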
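A sketch of how the master-to-slave chain could drive the slave camera. All of the following are assumptions: pinhole cameras; master intrinsics K and pose (R, t) known from registration; a known depth for the face (a single master view does not provide one); a slave at a known center whose zero-pan/zero-tilt axes align with the world axes (x right, y down, z forward); and a nominal face width. The function names are hypothetical.

```python
import numpy as np


def backproject(pixel, depth, K, R, t):
    """Master pixel plus an assumed depth -> world point (camera model x_cam = R X_w + t)."""
    ray = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    x_cam = depth * ray  # point on the viewing ray with z_cam = depth
    return R.T @ (x_cam - t)


def slave_command(X_w, slave_center, face_width_m, target_px):
    """Pan/tilt that aim the slave's optical axis at X_w, and the focal length (pixels)
    that makes a face of the given width span target_px pixels (pinhole model)."""
    d = np.asarray(X_w, float) - np.asarray(slave_center, float)
    pan = np.arctan2(d[0], d[2])                      # rotation about the vertical (y) axis
    tilt = np.arctan2(-d[1], np.hypot(d[0], d[2]))    # y points down, so up is positive tilt
    f = target_px * np.linalg.norm(d) / face_width_m  # width_px = f * width_m / distance
    return pan, tilt, f


K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
X_face = backproject((370, 240), 5.0, K, np.eye(3), np.zeros(3))
pan, tilt, f = slave_command([0.0, 0.0, 5.0], [1.0, 0.0, 0.0], 0.15, 100)
```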


Master: Anatomy-Guided Face Modeling

- Face region localization based on anatomy
- Face region detection based on skin-color segmentation
- Face region modeling based on ellipse fitting
- Face region tracking using a mean-shift tracker

[Figure: master camera frame and its transform $T^{\text{world}}_{\text{master}}$ to the world frame (X, Y, Z).]
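The slides name a mean-shift tracker without details. A minimal sketch, assuming a per-pixel weight image (for example, a skin-color likelihood) and a flat circular window, which is what the mean-shift update reduces to for an Epanechnikov kernel profile. The window is repeatedly moved to the weighted centroid of the pixels it covers until the shift falls below a tolerance.

```python
import numpy as np


def mean_shift(weights, start, radius, eps=0.1, max_iter=50):
    """Flat-window mean shift on a 2-D weight image; returns the converged (row, col)."""
    rows, cols = np.mgrid[0:weights.shape[0], 0:weights.shape[1]]
    cy, cx = float(start[0]), float(start[1])
    for _ in range(max_iter):
        inside = (rows - cy) ** 2 + (cols - cx) ** 2 <= radius ** 2
        w = weights[inside]
        total = w.sum()
        if total == 0:  # nothing to track inside the window
            break
        ny = (rows[inside] * w).sum() / total
        nx = (cols[inside] * w).sum() / total
        moved = np.hypot(ny - cy, nx - cx)
        cy, cx = ny, nx
        if moved < eps:
            break
    return cy, cx


# Synthetic weight image: a Gaussian "face" blob centered at row 30, column 40.
rr, cc = np.mgrid[0:64, 0:64]
blob = np.exp(-((rr - 30) ** 2 + (cc - 40) ** 2) / (2 * 4.0 ** 2))
cy, cx = mean_shift(blob, (24, 34), radius=10)
```

In tracking, the converged position for one frame serves as the start for the next.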


Slave: Master-Guided Zooming

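[Figure: master and slave camera frames, the world frame (X, Y, Z), the transforms $T^{\text{world}}_{\text{master}}$ and $T^{\text{world}}_{\text{slave}}(\ldots, f, z)$, and the slave image point $p_{\text{slave}}$ at focal length f.]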


What’s Next?

- View-based recognition
- Frontal-view detection
- Multi-frame evidence aggregation
- 3D model (?)


Single-Camera Super-Resolution

- Multiple spatially coherent frames are treated as down-sampled, low-resolution (LR) images of the original high-resolution (HR) image


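Mathematically

$$\mathbf{I}^{(k)} = D\,B\,T_k\,\mathbf{I} + \mathbf{n}^{(k)}$$

where $\mathbf{I}^{(k)} = (I^{(k)}_{1,1}, I^{(k)}_{1,2}, \ldots, I^{(k)}_{m,n})^T$ stacks the pixels of the k-th m × n LR frame, $\mathbf{I} = (I_{1,1}, I_{1,2}, \ldots, I_{cm,cn})^T$ stacks the pixels of the cm × cn HR image, and $\mathbf{n}^{(k)}$ is noise.

Three components:
- Spatial registration function (T)
- Blurring function (B)
- Down-sampling function (D)

c: down-sampling factor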
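The observation model can be made concrete on a toy problem. This sketch makes several assumptions: circular boundaries, integer HR-pixel translations for T_k (the translation-only case noted under the spatial registration function), a 3 × 3 Gaussian for B, pixel decimation for D, and no noise. It builds each operator as an explicit matrix, which is feasible only for tiny images. With c = 2 and the four shifts below, every HR pixel is observed once after blurring, so least squares recovers the HR image exactly, while a single LR frame leaves the system underdetermined.

```python
import numpy as np


def shift_matrix(n, dy, dx):
    """T_k: circular translation of an n x n image by (dy, dx) HR pixels (permutation matrix)."""
    idx = np.arange(n * n).reshape(n, n)
    src = np.roll(idx, shift=(dy, dx), axis=(0, 1)).ravel()
    T = np.zeros((n * n, n * n))
    T[np.arange(n * n), src] = 1.0
    return T


def blur_matrix(n, sigma=0.5):
    """B: circular convolution with a normalized 3 x 3 Gaussian kernel."""
    g = np.exp(-np.array([-1.0, 0.0, 1.0]) ** 2 / (2.0 * sigma ** 2))
    kernel = np.outer(g, g) / np.outer(g, g).sum()
    B = np.zeros((n * n, n * n))
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            B += kernel[dy + 1, dx + 1] * shift_matrix(n, dy, dx)
    return B


def decimation_matrix(n, c):
    """D: keep every c-th HR pixel in each direction (n x n -> n/c x n/c)."""
    m = n // c
    rows = np.arange(m * m)
    cols = (c * (rows // m)) * n + c * (rows % m)
    D = np.zeros((m * m, n * n))
    D[rows, cols] = 1.0
    return D


def observe(hr, shifts, c=2, sigma=0.5):
    """Noise-free LR frames I^(k) = D B T_k I, each returned as a flat vector."""
    n = hr.shape[0]
    DB = decimation_matrix(n, c) @ blur_matrix(n, sigma)
    return [DB @ shift_matrix(n, dy, dx) @ hr.ravel() for dy, dx in shifts]


def reconstruct(frames, shifts, n, c=2, sigma=0.5):
    """Least-squares HR estimate from the stacked system [D B T_k] I = [I^(k)]."""
    DB = decimation_matrix(n, c) @ blur_matrix(n, sigma)
    A = np.vstack([DB @ shift_matrix(n, dy, dx) for dy, dx in shifts])
    x, *_ = np.linalg.lstsq(A, np.concatenate(frames), rcond=None)
    return x.reshape(n, n)


rng = np.random.default_rng(0)
hr = rng.random((16, 16))                  # toy 16 x 16 HR image
shifts = [(0, 0), (0, 1), (1, 0), (1, 1)]  # one shift per sub-pixel phase for c = 2
frames = observe(hr, shifts)               # four 8 x 8 LR frames
hr_multi = reconstruct(frames, shifts, 16)
hr_single = reconstruct(frames[:1], shifts[:1], 16)
```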


Spatial Registration Function

- Modeled as an affine transform
  - Captures translation, rotation, and zooming
- In practice, only translational motion has been successfully demonstrated

$$T_k = \begin{pmatrix} a_x & b_x & c_x \\ a_y & b_y & c_y \end{pmatrix}$$
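Estimating T_k in this form from point correspondences is a linear least-squares problem. A sketch, assuming matched points between the reference frame and frame k are already available (how they are obtained is not covered here) and reading T_k as mapping (x, y) to (a_x x + b_x y + c_x, a_y x + b_y y + c_y).

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares T_k (2 x 3) with dst ≈ T_k @ [x, y, 1]^T for matched points.
    Needs at least three non-collinear correspondences."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    X = np.hstack([src, np.ones((len(src), 1))])   # rows [x, y, 1]
    sol, *_ = np.linalg.lstsq(X, dst, rcond=None)  # (3, 2): columns give x' and y'
    return sol.T                                   # [[a_x, b_x, c_x], [a_y, b_y, c_y]]


def apply_affine(T, pts):
    """Map (N, 2) points through T_k."""
    pts = np.asarray(pts, dtype=float)
    return pts @ T[:, :2].T + T[:, 2]


T_true = np.array([[1.1, -0.2, 3.0], [0.1, 0.9, -2.0]])  # hypothetical registration
src = np.array([[0, 0], [1, 0], [0, 1], [2, 3], [5, 1]], dtype=float)
T_est = fit_affine(src, apply_affine(T_true, src))
```

Restricting to the translation-only case amounts to fixing the 2 × 2 block to the identity and estimating only (c_x, c_y).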


Blurring Function

- Modeled as a Gaussian kernel
- Caveats:
  - The point spread function (blurring function) may not be known and is wavelength-dependent
  - Diffraction induces ripples and is better modeled with Bessel functions
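To illustrate the caveat, the sketch compares a Gaussian profile with the Airy pattern $(2J_1(x)/x)^2$, the diffraction PSF of an ideal circular aperture, where $x = \pi D \sin\theta / \lambda$ for aperture diameter D and wavelength λ (so the PSF width depends on wavelength). $J_1$ is evaluated from Bessel's integral to avoid extra dependencies. The Airy pattern has zeros and faint rings; the Gaussian decays monotonically.

```python
import numpy as np


def bessel_j1(x, n_steps=2001):
    """J1(x) from Bessel's integral (1/pi) * integral_0^pi cos(t - x sin t) dt,
    evaluated with the trapezoidal rule."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = np.linspace(0.0, np.pi, n_steps)
    f = np.cos(t[None, :] - x[:, None] * np.sin(t)[None, :])
    dt = t[1] - t[0]
    return (f[:, 0] / 2 + f[:, 1:-1].sum(axis=1) + f[:, -1] / 2) * dt / np.pi


def airy_psf(x):
    """Diffraction PSF of a circular aperture, (2 J1(x) / x)^2, normalized to 1 at x = 0."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.ones_like(x)
    nz = x != 0
    out[nz] = (2.0 * bessel_j1(x[nz]) / x[nz]) ** 2
    return out


def gaussian_psf(x, sigma):
    """The Gaussian blur model: smooth and monotonically decaying, with no rings."""
    return np.exp(-np.asarray(x, dtype=float) ** 2 / (2.0 * sigma ** 2))


first_zero = airy_psf(3.8317)[0]                        # first dark ring of the Airy pattern
first_ring = airy_psf(np.linspace(4.5, 6.0, 301)).max()  # first bright ring, about 1.75% of peak
```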


Numerical Solution

- Large system of equations; requires preconditioning
- Not sure it will work in the real world
- Simpler mechanisms (e.g., bilinear interpolation) exist, with inferior performance
- Optical zoom instead of digital zoom
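A sketch of what preconditioning looks like for this kind of system, on a hypothetical 1-D version of the super-resolution model. The regularized normal equations $(A^T A + \lambda I)\,x = A^T b$ are solved with conjugate gradients and a Jacobi (diagonal) preconditioner, applying A and Aᵀ instead of forming $A^T A$. The small dense A, the blur kernel, and λ are illustrative assumptions; realistic image sizes would use sparse or implicit operators.

```python
import numpy as np


def sr_operator_1d(n, c, shifts, kernel=(0.2, 0.6, 0.2)):
    """Stacked D B T_k for a length-n signal with circular boundary (dense, for illustration)."""
    I = np.eye(n)
    half = len(kernel) // 2
    B = sum(w * np.roll(I, s - half, axis=1) for s, w in enumerate(kernel))  # blur
    D = I[::c]                                                               # decimation
    return np.vstack([D @ B @ np.roll(I, k, axis=1) for k in shifts])        # T_k: shift by k


def pcg(apply_M, b, diag, tol=1e-10, max_iter=1000):
    """Conjugate gradients for an SPD operator M with a Jacobi preconditioner diag(M)."""
    x = np.zeros_like(b)
    r = b - apply_M(x)
    z = r / diag
    p = z.copy()
    rz = r @ z
    for k in range(1, max_iter + 1):
        Mp = apply_M(p)
        alpha = rz / (p @ Mp)
        x = x + alpha * p
        r = r - alpha * Mp
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = r / diag
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter


n, c, lam = 64, 2, 1e-10
A = sr_operator_1d(n, c, shifts=[0, 1])
i = np.arange(n)
x_true = np.sin(2 * np.pi * 3 * i / n) + 0.5 * np.cos(2 * np.pi * 7 * i / n)
b = A @ x_true                                   # noise-free LR samples, both phases
diag = (A ** 2).sum(axis=0) + lam                # diagonal of A^T A + lam I
x_hat, iters = pcg(lambda v: A.T @ (A @ v) + lam * v, A.T @ b, diag)
```

In this toy case the diagonal is constant, so Jacobi only rescales the problem; a diagonal preconditioner matters more when per-pixel weights or coverage vary across the image.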


Schedule
- 9/29: overview
- 10/6: Dan: face recognition overview
- 10/13: no meeting (research travel)
- 10/20: Dr. Kang
- 10/27:
- 11/3:
- 11/10:
- 11/17:
- 11/24:

- Video-based face modeling and recognition
  - Super-resolution
    - Multiple images
    - Space-time
- Human activity/interaction analysis
