Anomaly Detection Using Projective Markov Models


DESCRIPTION

Presented at the 2009 CDC, Shanghai. Anomaly Detection Using Projective Markov Models in a Distributed Sensor Network. Sean Meyn, Amit Surana, Yiqing Lin, and Satish Narayanan. https://netfiles.uiuc.edu/meyn/www/spm_files/Mismatch/Mismatch.html

TRANSCRIPT

Anomaly Detection Using Projective Markov Models in a Distributed Sensor Network

Sean Meyn, Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University of Illinois

Joint work with Amit Surana, Yiqing Lin, and Satish Narayanan, United Technologies Research Center

Acknowledgements: Research supported by United Technologies Research Center and the National Science Foundation, CCF 07-29031

Outline

• Detection in a Sensor Network

• Multiple Models for Distributed Detection

• Application to Building Security

I. Detection in a Sensor Network

Problem Statement

Detect anomalous behavior based on:

• A large number of heterogeneous sensors
• A large heterogeneous region
• Partial information regarding anomalous behavior
• Complex behavior with or without anomaly

Interest at UTRC: building monitoring for security and energy efficiency.

Challenges and Resolution

1. Model for anomalous behavior
2. Complexity of normal behavior
3. Lack of coordinated action, or communication constraints
4. High variance associated with detection

Approach: Projective Markov models address 1-3. Parameterized models address 4.

II. Multiple Models for Distributed Detection

[Figure: divergence neighborhoods Qη(π0) and Qβ∗(π1) around the two marginals in the space of distributions]

Binary Hypothesis Testing - Geometric View

For ease of explanation only: the classical i.i.d. setting. For starters: binary hypothesis testing.

Z is an i.i.d. sequence on a finite state space:

• π0: marginal under the model of normal behavior
• π1: marginal under the model of anomalous behavior

Optimal test: under Neyman-Pearson or Bayesian criteria, the log-likelihood ratio test is optimal. With log-likelihood ratio L = log(dπ1/dπ0),

φ(Z_1^T) = I{ (1/T) Σ_{t=1}^T L(Z(t)) ≥ τ }

Geometry: the test is defined by a separating hyperplane in the space of distributions,

{ µ : ∫ L(z) µ(dz) = τ }

together with the divergence neighborhood Qη(π0) = { µ : D(µ‖π0) < η }.

[Figure: neighborhoods Qη(π0) and Qβ∗(π1) separated by the hyperplane]

LLR test: declare an anomaly if the empirical distribution lies outside of the lower half space, where the empirical distribution is

Γ_T(z) := (1/T) Σ_{t=1}^T I{ Z(t) = z },  z ∈ Z
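As a concrete illustration, the LLR test above reduces to comparing a sample average of L against the threshold. A minimal numerical sketch (Python/NumPy); the two marginals, the 4-letter alphabet, and the threshold are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical marginals on a 4-letter alphabet (illustration only)
pi0 = np.array([0.4, 0.3, 0.2, 0.1])   # normal behavior
pi1 = np.array([0.1, 0.2, 0.3, 0.4])   # anomalous behavior

L = np.log(pi1 / pi0)                  # log-likelihood ratio L = log(dpi1/dpi0)

def llr_test(Z, tau):
    """Declare anomaly iff (1/T) sum_t L(Z(t)) >= tau."""
    return L[Z].mean() >= tau

T = 1000
Z_normal = rng.choice(4, size=T, p=pi0)
Z_anom = rng.choice(4, size=T, p=pi1)
print(llr_test(Z_normal, tau=0.0), llr_test(Z_anom, tau=0.0))
```

With τ = 0 the test compares the empirical mean of L against zero: under π0 it concentrates near −D(π0‖π1) < 0, and under π1 near D(π1‖π0) > 0.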

Universal Detection

Anomalous behavior is not modeled. An alarm is sounded if the empirical distribution lies outside a divergence neighborhood of π0 [Hoeffding, 1965]:

φ(Z_1^T) = I{ Γ_T ∉ Qη(π0) } = I{ D(Γ_T‖π0) ≥ η }

Good news: for large T, performance approaches the optimality of the LLR test.

Bad news: for finite T, the variance of the statistic D(Γ_T‖π0) grows linearly with the size of the observation alphabet Z. [Unnikrishnan, Huang, Meyn, Surana, Veeravalli, 2009] [Clarke and Barron, 1990]
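The Hoeffding test can be sketched directly from the formula above; a minimal example (Python/NumPy), with the nominal marginal, alphabet, and threshold η invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

pi0 = np.array([0.4, 0.3, 0.2, 0.1])    # model of normal behavior (invented)

def kl(mu, pi):
    """D(mu || pi) on a finite alphabet, with the 0 log 0 = 0 convention."""
    mask = mu > 0
    return float(np.sum(mu[mask] * np.log(mu[mask] / pi[mask])))

def hoeffding_test(Z, eta):
    """Alarm iff the empirical distribution leaves the ball Q_eta(pi0)."""
    gamma = np.bincount(Z, minlength=len(pi0)) / len(Z)
    return kl(gamma, pi0) >= eta

T = 2000
Z_normal = rng.choice(4, size=T, p=pi0)
Z_anom = rng.choice(4, size=T, p=[0.1, 0.2, 0.3, 0.4])
print(hoeffding_test(Z_normal, eta=0.01), hoeffding_test(Z_anom, eta=0.01))
```

Note that only π0 enters the test: the anomalous distribution is never modeled, which is what makes the test universal.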

Universal Detection - Multiple Models

Suppose we extract features of the observations, Z_i(t) = ϕ_i(Z(t)), 1 ≤ i ≤ n, with Z_i(t) ∼ π0^i, selected based on:

• Constraints, such as sensor locations
• Prior knowledge regarding anomalous behavior
• Variance reduction

Optimal combination of features for optimal detection:

φ(Z_1^T) = I{ Γ_i(T) ∉ Qη(π0^i) for some i } = I{ Γ_T ∉ ∩_i Qη(π0^i) }

Geometry: the intersection of the neighborhoods Qη(π0^1), Qη(π0^2), ... forms the Safe Region; an alarm is sounded when the empirical distribution exits it.

[Figure: Safe Region formed by the intersection of the divergence neighborhoods Qη(π0^i)]
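A sketch of the multiple-model test (Python/NumPy). The two features here are invented for illustration (the low and high bits of a 4-letter observation); each gets its own divergence-ball test, and an alarm fires if any feature exits its ball:

```python
import numpy as np

rng = np.random.default_rng(2)

def kl(mu, pi):
    mask = mu > 0
    return float(np.sum(mu[mask] * np.log(mu[mask] / pi[mask])))

# Two hypothetical features of a 4-letter observation: its low bit and high bit
features = [lambda z: z % 2, lambda z: z // 2]
pi0 = np.array([0.4, 0.3, 0.2, 0.1])                     # invented normal marginal
# Marginal of each feature under pi0
pi0_feats = [np.array([pi0[0] + pi0[2], pi0[1] + pi0[3]]),   # low bit
             np.array([pi0[0] + pi0[1], pi0[2] + pi0[3]])]   # high bit

def multi_model_test(Z, eta):
    """Alarm iff Gamma_i(T) leaves Q_eta(pi0^i) for some feature i."""
    for phi, pi0i in zip(features, pi0_feats):
        Zi = np.array([phi(z) for z in Z])
        gamma_i = np.bincount(Zi, minlength=len(pi0i)) / len(Zi)
        if kl(gamma_i, pi0i) >= eta:
            return True
    return False

T = 2000
Z_normal = rng.choice(4, size=T, p=pi0)
Z_anom = rng.choice(4, size=T, p=[0.1, 0.2, 0.3, 0.4])
print(multi_model_test(Z_normal, eta=0.01), multi_model_test(Z_anom, eta=0.01))
```

Because each feature lives on a small alphabet, each per-feature statistic has far lower variance than the full Hoeffding statistic, at the price of only detecting anomalies visible through some feature.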

Markov Models

KL divergence is replaced by the relative entropy rate:

J(Q‖P) = lim_{n→∞} (1/n) D(γ^(n)‖π^(n)) = D(γ^(2)‖π^(2)) − D(γ‖π)

where γ^(n) and π^(n) are the distributions of (Z(1), ..., Z(n)), assumed Markovian with transition matrices Q and P. The limit and the two-marginal expression are equal under the Markov assumption.
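The identity above can be checked numerically; a small sketch (Python/NumPy) with two invented 2-state transition matrices:

```python
import numpy as np

def stationary(M):
    """Stationary distribution: leading left eigenvector of the transition matrix."""
    vals, vecs = np.linalg.eig(M.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

def kl(a, b):
    """D(a || b) for strictly positive arrays of matching shape."""
    return float(np.sum(a * np.log(a / b)))

def entropy_rate_div(Q, P):
    """Relative entropy rate J(Q||P) = sum_x gamma(x) D(Q(x,.) || P(x,.))."""
    gamma = stationary(Q)
    return float(np.sum(gamma[:, None] * Q * np.log(Q / P)))

# Two invented 2-state transition matrices (Q: observed, P: nominal model)
Q = np.array([[0.9, 0.1], [0.2, 0.8]])
P = np.array([[0.7, 0.3], [0.3, 0.7]])

# The slide's identity: J(Q||P) = D(gamma^(2) || pi^(2)) - D(gamma || pi)
gamma, pi = stationary(Q), stationary(P)
gamma2, pi2 = gamma[:, None] * Q, pi[:, None] * P   # bivariate distributions
assert abs(entropy_rate_div(Q, P) - (kl(gamma2, pi2) - kl(gamma, pi))) < 1e-10
print(round(entropy_rate_div(Q, P), 4))
```

The identity holds because D(γ^(2)‖π^(2)) splits as D(γ‖π) plus the γ-weighted divergence between the transition rows.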


Markov Models

Local models? Two approaches:

Option A: Shannon-Mori-Zwanzig projection,

P(x, y) := π^(2)(x, y) / π(x)

Option B: Parameterized models,

π_θ^(2)(x, y) = e^{θᵀψ(x,y)},  θ ∈ Rᵐ

with θ chosen using maximum-likelihood estimation.

Advantage of Option B: variance grows with the dimension m, not with the cardinality of the observation space.
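Option B can be sketched as an exponential-family fit of the bivariate distribution; a minimal illustration (Python/NumPy) in which the alphabet size, the feature map ψ, and the "true" θ are all invented, and the family is explicitly normalized:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical feature map psi on pairs of a 3-letter alphabet, m = 2 features
N, m = 3, 2
psi = rng.normal(size=(N, N, m))            # psi(x, y) in R^m (invented)

def model(theta):
    """Normalized exponential family: pi_theta^(2)(x,y) ∝ exp(theta^T psi(x,y))."""
    w = np.exp(psi @ theta)
    return w / w.sum()

# Sample pairs from the model at a "true" theta, then refit theta by ML
theta_true = np.array([1.0, -0.5])
counts = rng.multinomial(5000, model(theta_true).ravel()).reshape(N, N)
gamma2 = counts / counts.sum()              # bivariate empirical distribution

theta = np.zeros(m)
for _ in range(10000):
    # Log-likelihood gradient: E_empirical[psi] - E_theta[psi]
    grad = np.tensordot(gamma2 - model(theta), psi, axes=([0, 1], [0, 1]))
    theta += 0.05 * grad
print(np.round(theta, 2))
```

Only the m-dimensional moment vector of ψ is estimated from data, which is the sense in which the variance scales with m rather than with the alphabet size.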

III. Application to Building Security

Building Testbed at UTRC

Eleven Markov models for occupancy, based on eleven zones.

Option A: Empirical Markov model
Option B: Queueing model [Smith and Towsley, 1981]

[Figure: floor plan marking video cameras and zones]


Experiment Architecture

[Figure: floor plan showing the numbered zones and sensor placement]

Scenarios: capture a range of unusual traffic patterns in a building.

1. Convergence: Numerous occupants converge to a single zone
2. Divergence: Numerous occupants leave a single zone
3. Idleness: Numerous occupants converge to a single zone
4. Loitering: Numerous occupants converge to a single zone
5. High occupancy: Higher than normal occupancy in combined zones

Typical ROC Curves

Test statistic: based on a moving window of length δ0.

Empirical: R(t) = J(Γ^(2)_{δ0,t} ‖ π^(2)), where Γ^(2)_{δ0,t} is the bivariate empirical distribution over the window.

[Figure: ROC curves and detection delay for the empirical and semi-empirical models]

Semi-empirical: the likelihood ratio using the ML estimate,

R(t) = (1/δ0) Σ_{k=t−δ0+1}^{t} log ℓ_{t,δ0}(k),  ℓ_{t,δ0}(k) := P_{t,δ0}(Z(k), Z(k+1)) / P(Z(k), Z(k+1))

where P_{t,δ0} is the transition matrix estimated by maximum likelihood over the window.
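The windowed empirical statistic can be sketched as follows (Python/NumPy). The 2-state occupancy chain, the anomalous dynamics, and the window length are all invented for illustration; the statistic is the relative entropy rate of the windowed bivariate empirical distribution against the nominal model:

```python
import numpy as np

rng = np.random.default_rng(4)

# Invented 2-state occupancy chain: P = nominal model, Q = anomalous dynamics
P = np.array([[0.95, 0.05], [0.10, 0.90]])
Q = np.array([[0.70, 0.30], [0.30, 0.70]])

def sample(T, M, x0=0):
    """Sample a path of length T from transition matrix M."""
    Z = [x0]
    for _ in range(T - 1):
        Z.append(rng.choice(2, p=M[Z[-1]]))
    return np.array(Z)

def window_stat(Z, t, delta, P):
    """R(t): divergence rate of the windowed bivariate empirical distribution
    from the model, with the 0 log 0 = 0 convention."""
    pairs = np.zeros((2, 2))
    for k in range(t - delta, t):
        pairs[Z[k], Z[k + 1]] += 1.0
    gamma2 = pairs / pairs.sum()
    Qhat = gamma2 / np.maximum(gamma2.sum(axis=1, keepdims=True), 1e-12)
    mask = gamma2 > 0
    return float(np.sum(gamma2[mask] * np.log(Qhat[mask] / P[mask])))

# Normal operation followed by an anomalous episode
Z = np.concatenate([sample(500, P), sample(300, Q)])
delta = 200
print(window_stat(Z, 400, delta, P), window_stat(Z, 799, delta, P))
```

During normal operation the statistic stays near zero; once the window covers the anomalous segment it rises toward the relative entropy rate J(Q‖P).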

Centralized Detection

Test statistic: maximum of the per-zone statistics. Anomalous episode: convergence to zone 6.

[Figure: empirical and semi-empirical statistics over time]

Delay is similar using either test; the semi-empirical statistic is far more discriminating.
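The centralized fusion rule is simply the maximum of the per-zone statistics against a common threshold; a tiny sketch with invented values:

```python
# Centralized test: fuse the per-zone statistics by taking their maximum
# and comparing to a common threshold (all values below are invented).
zone_stats = {"zone4": 0.8, "zone5": 1.1, "zone6": 7.4}
threshold = 5.0
alarm = max(zone_stats.values()) >= threshold
print(alarm)
```

The zone whose statistic crosses the threshold also localizes the anomaly, which is what the figure's convergence-to-zone-6 episode illustrates.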

Decentralized Detection: Divergence

Anomalous episode: divergence from zone 5.

[Figure: per-zone statistics (zones 4, 5, 6) for the empirical and semi-empirical tests]

Delay is similar using either test; many false alarms from the empirical statistic.

Decentralized Detection: Occupancy

Anomalous episode: 10% higher occupancy in zones 5 and 6.

[Figure: per-zone statistics (zones 5, 6) for the empirical and semi-empirical tests]

Missed detection using the empirical statistic. Empirical statistic clairvoyant?

Conclusions

Contributions:

• Feasibility of an anomaly detection framework using projected Markov models
• Advantages of semi-empirical Markov models

Current research:

• Feature selection for distributed detection
• Active learning - e.g., query for additional data
• Diagnosis
• Response

References

[1,3,4,5,6,8] Geometry
[2,3,4,6,8] Universal detection
[4,7] Variance in detection and parameter estimation

[1] I. Csiszár. I-divergence geometry of probability distributions and minimization problems. Ann. Probab., 3:146–158, 1975.

[2] O. Zeitouni and M. Gutman. On universal hypotheses testing via large deviations. IEEE Trans. Inform. Theory, 37(2):285–290, 1991.

[3] C. Pandit and S. P. Meyn. Worst-case large-deviations with application to queueing and information theory. Stoch. Proc. Applns., 116(5):724–756, May 2006.

[4] J. Unnikrishnan, D. Huang, S. Meyn, A. Surana, and V. Veeravalli. Universal and composite hypothesis testing via mismatched divergence. CoRR, abs/0909.2234, 2009. Submitted for publication, IEEE Trans. Inform. Theory.

[5] S. Borade and L. Zheng. I-projection and the geometry of error exponents. In Proceedings of the Forty-Fourth Annual Allerton Conference on Communication, Control, and Computing, Sept. 27-29, 2006, UIUC, Illinois, USA, 2006.

[6] E. Abbe, M. Medard, S. Meyn, and L. Zheng. Finding the best mismatched detector for channel coding and hypothesis testing. Information Theory and Applications Workshop, pages 284–288, Jan. 29-Feb. 2, 2007.

[7] B. S. Clarke and A. R. Barron. Information-theoretic asymptotics of Bayes methods. IEEE Trans. Inform. Theory, 36(3):453–471, 1990.

[8] D. Huang, J. Unnikrishnan, S. Meyn, V. Veeravalli, and A. Surana. Statistical SVMs for robust detection, supervised learning, and universal classification. In Proceedings of the Information Theory Workshop on Networking and Information Theory, Volos, Greece, 2009.
