A System for Detecting Anomalies in Data Streams for Emergency Response Applications

Alec Pawling

University of Notre Dame

October 2, 2007

Outline

Overview

Proposed Research

WIPER

Detection and Alert System

Online Anomaly Detection via Clustering

Link Sampling and Anomalous Link Detection

Real-Time Data Source

Conclusion

Overview

Proposed Research

Fast, online anomaly detection in streaming sensor data

Non-relational data

Relational data

Real-time data aggregation and distribution to various system components

Motivation

Wireless Phone-based Emergency Response System (WIPER)

Wireless Phone-Based Emergency Response System (WIPER)

Emergency Response System

Provide decision support to emergency response managers

Cell phones as sensors

Outline

Overview

Detection and Alert System

Online Anomaly Detection via Clustering

Problem Definition

Related Work

An Online Hybrid Clustering Algorithm

Datasets

Experimental Setup

Results

Proposed Research

Link Sampling and Anomalous Link Detection

Real-Time Data Source

Conclusion

Problem Definition

Problem:

How can we detect anomalies in streaming cell phone transaction data?

Challenges:

Lots of data

Limited time for detecting anomalies

Related Work

Proximity Based Anomaly Detection

Makes no assumptions about data distribution

Anomalous points are far from other points (specific definitions vary from application to application)

Computationally expensive

Clustering can be used to reduce computational complexity

Related Work

Approaches to Data Clustering [Jain, Murty, and Flynn, 1999]:

Hierarchical Clustering

Iteratively split/merge clusters

Computationally expensive

Partitional Clustering

Divides the data into disjoint subsets

Relatively efficient

Assumes prior knowledge of the number of clusters; prone to finding local optima

Incremental Clustering

Consider examples one at a time; update clusters

Efficient

Related Work

Leader Algorithm [Hartigan, 1975]

For each data example

Locate the closest cluster center.

If the distance between the example and the cluster center is less than a user-defined threshold

Add the example to the cluster.

Otherwise, create a new cluster centered at the example.
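
The leader algorithm steps above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function name `leader_cluster`, the use of Euclidean distance, and the choice to keep each center fixed at its first example are assumptions made for the sketch.

```python
import math

def leader_cluster(examples, threshold):
    """Single-pass leader clustering; returns (centers, assignments)."""
    centers = []       # one fixed center (the "leader") per cluster
    assignments = []   # cluster index for each example, in input order
    for x in examples:
        # Locate the closest existing cluster center, if any.
        best, best_dist = None, math.inf
        for i, c in enumerate(centers):
            dist = math.dist(x, c)
            if dist < best_dist:
                best, best_dist = i, dist
        if best is not None and best_dist < threshold:
            assignments.append(best)              # join the closest cluster
        else:
            centers.append(tuple(x))              # new cluster centered at x
            assignments.append(len(centers) - 1)
    return centers, assignments

centers, labels = leader_cluster(
    [(0, 0), (1, 0), (10, 10), (11, 10), (0, 1)], threshold=3.0)
```

Each example is examined once, which is what makes the approach attractive for streams; the result does depend on the order in which examples arrive.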

Related Work

Hybrid Clustering: combination of two clustering algorithms

Cheu et al. 2004: Use partitional algorithms to reduce data set for hierarchical algorithms

Chipman and Tibshirani 2006: Combine bottom-up algorithms with top-down algorithms

An Online Hybrid Clustering Algorithm

For each example $\vec{x}$:

Find the closest cluster $C_i$

Let $\vec{\mu}_i$ be the centroid of $C_i$

Let $\vec{\sigma}_i$ be the standard deviations of the features of $C_i$

If $d(\vec{x}, \vec{\mu}_i) < l\,|\vec{\sigma}_i|$, add $\vec{x}$ to $C_i$

Otherwise, add $\vec{x}$ to the set of unclustered examples

If there are $km$ examples in the unclustered set:

Cluster the unclustered examples using k-means

For each cluster with $m$ or more examples:

Accept the cluster

For each cluster with fewer than $m$ examples:

Return its examples to the unclustered set
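
The hybrid algorithm can be sketched as follows. This is an illustrative reading of the slide, not the author's implementation: the names `Cluster`, `kmeans`, and `hybrid_cluster` are hypothetical, Euclidean distance is assumed for d, the norm of the per-feature standard deviations is assumed for the threshold term, and the k-means step uses a deterministic farthest-first initialization chosen only to make the sketch reproducible. Clusters keep all their members for simplicity, which a real streaming version would replace with running statistics.

```python
import math

class Cluster:
    """Accepted cluster; members are kept so statistics can be recomputed."""
    def __init__(self, members):
        self.members = list(members)

    def centroid(self):
        n = len(self.members)
        return tuple(sum(col) / n for col in zip(*self.members))

    def sigma_norm(self):
        # Euclidean norm of the per-feature (population) standard deviations.
        mu, n = self.centroid(), len(self.members)
        variances = [sum((v - m) ** 2 for v in col) / n
                     for col, m in zip(zip(*self.members), mu)]
        return math.sqrt(sum(variances))

def kmeans(points, k, iters=20):
    """Lloyd's k-means with farthest-first initialization (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return groups

def hybrid_cluster(stream, k, m, l):
    clusters, unclustered = [], []
    for x in stream:
        x = tuple(x)
        # First level: try to add x to the closest accepted cluster.
        closest = min(clusters, key=lambda c: math.dist(x, c.centroid()),
                      default=None)
        if (closest is not None
                and math.dist(x, closest.centroid()) < l * closest.sigma_norm()):
            closest.members.append(x)
        else:
            unclustered.append(x)
        # Second level: once k*m examples are pending, cluster them with k-means.
        if len(unclustered) >= k * m:
            pending, unclustered = unclustered, []
            for group in kmeans(pending, k):
                if len(group) >= m:
                    clusters.append(Cluster(group))   # accept the cluster
                else:
                    unclustered.extend(group)         # return its examples
    return clusters, unclustered

blob_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
blob_b = [(x + 100.0, y + 100.0) for x, y in blob_a]
clusters, leftover = hybrid_cluster(
    blob_a + blob_b + [(0.6, 0.4), (50.0, 50.0)], k=2, m=5, l=3.0)
```

In this toy run the two blobs become accepted clusters, the point (0.6, 0.4) joins the first one, and (50, 50) stays in the unclustered set, where it is a candidate anomaly.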

Experimental Setup

Dataset:

Real world data:

12 days of cell phone network transaction data

Discretized into 1 minute intervals

18721 examples

Feature vector:

Timestamp: hour and minute

Number of times each service is used in the interval

5 services

Evaluation:

Compare hybrid algorithm to 1-NN anomaly detection
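
For reference, a brute-force 1-NN anomaly score can be sketched as below. The slides do not say how the baseline was configured, so the function name, the Euclidean distance, and the rule "largest nearest-neighbor distance is the most anomalous" are assumptions for illustration.

```python
import math

def nearest_neighbor_distances(examples):
    """Distance from each example to its nearest other example, O(n^2)."""
    scores = []
    for i, x in enumerate(examples):
        scores.append(min(math.dist(x, y)
                          for j, y in enumerate(examples) if j != i))
    return scores

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
scores = nearest_neighbor_distances(points)
# The example farthest from its nearest neighbor is the strongest outlier.
outlier = max(range(len(points)), key=scores.__getitem__)
```

The quadratic cost of this baseline is the reason clustering is attractive as a cheaper approximation.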

Results

Figure: Distribution of distances between outliers and their nearest neighbor. Series: Full, Trial 2, Trial 5, Trial 8. Axis: pairwise distances (0 to 2500).

Proposed Research

New first level clustering algorithm:

Deterministic, hierarchical

Additional analysis of clusters:

Movement of clusters

Rate at which examples are added to clusters

Outline

Overview

Detection and Alert System

Online Anomaly Detection via Clustering

Link Sampling and Anomalous Link Detection

Problem Definition

Related Work

Datasets

Implementation Details

Experimental Setup

Results

Conclusions

Proposed Research

Real-Time Data Source

Conclusion

Problem Definition

Problem:

How does sampling a graph (network) affect our ability to identify anomalous edges (links)?

Challenges:

Large graphs

Limited time

Limited memory

Related Work

Sampling Networks

“Subnets of Scale-Free Networks are not Scale-Free” [Stumpf et al., 2005]

Sampling a network often changes its characteristics in predictable ways. [Lee et al., 2006]

Sampling from Streams

Sliding window: only contains most recent items in the stream

Uniform sample [Vitter, 1985]: all items in the stream have equal probability of being retained by the sample

Biased sample [Aggarwal, 2006]: compromise between sliding window and uniform sample
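
Two of the stream sampling schemes above can be sketched briefly. The uniform sample below uses the simplest reservoir scheme (often called Algorithm R), which is one way to obtain a uniform sample from a stream of unknown length; the function names are hypothetical, and the biased sample is not shown.

```python
import random
from collections import deque

def sliding_window(stream, size):
    """Keep only the `size` most recent items of the stream."""
    window = deque(maxlen=size)
    for item in stream:
        window.append(item)
    return list(window)

def reservoir_sample(stream, size, rng):
    """Uniform sample of `size` items; every item is retained with equal probability."""
    sample = []
    for n, item in enumerate(stream, start=1):
        if n <= size:
            sample.append(item)        # fill the reservoir first
        else:
            j = rng.randrange(n)       # uniform in [0, n)
            if j < size:
                sample[j] = item       # replace a random slot
    return sample

recent = sliding_window(range(10), 3)
sample = reservoir_sample(range(100), 10, random.Random(0))
```

Both keep memory bounded by the sample size, which matches the "limited memory" challenge stated in the problem definition.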

Related Work

Anomalous Link Detection [Rattigan and Jensen, 2005]

Goal: Identify “surprising” edges in a graph

Methods from link prediction literature [Liben-Nowell and Kleinberg, 2003]

For each edge, (u, v), in the graph, compute the proximity of u and v

Anomalous links have a proximity below some threshold

Two general approaches:

Neighborhood based methods

Path based methods

Related Work

Neighborhood based methods. Let Γ(u) be the set of vertices that are connected to u by an edge

Common neighbors: the number of neighbors shared by u and v

$$|\Gamma(u) \cap \Gamma(v)|$$

Jaccard’s coefficient: the probability that a neighbor of u or v is a neighbor of both u and v

$$\frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|}$$

Path based method

Rooted PageRank: the probability that a random walk starting at u will reach v if the walk fails at each step with some probability
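
The two neighborhood based measures translate directly into set operations. The sketch below assumes an undirected graph given as a list of edges; `build_neighbors`, `common_neighbors`, and `jaccard` are illustrative names, not part of the original system.

```python
from collections import defaultdict

def build_neighbors(edges):
    """Map each vertex u to its neighbor set Γ(u), treating edges as undirected."""
    gamma = defaultdict(set)
    for u, v in edges:
        gamma[u].add(v)
        gamma[v].add(u)
    return gamma

def common_neighbors(gamma, u, v):
    return len(gamma[u] & gamma[v])

def jaccard(gamma, u, v):
    union = gamma[u] | gamma[v]
    return len(gamma[u] & gamma[v]) / len(union) if union else 0.0

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
gamma = build_neighbors(edges)
# Score every edge; low proximity marks a candidate anomalous link.
scores = {(u, v): jaccard(gamma, u, v) for u, v in edges}
```

Here the edge (c, d) has no shared neighbors and scores 0, so it would be the first link flagged under a low-proximity threshold.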

Datasets

Cell phone network: transactions initiated by members of a single service provider

SMS: one day of text message transactions

Phone: one day of call transactions

Enron: snapshot of Enron email server. Contains emails to and from @enron.com addresses, May 10, 1999 to January 31, 2002

Dataset        Vertices    Transactions   Edges
SMS (1 day)    2,350,793   3,339,708      1,597,818
Call (1 day)   6,261,633   8,019,290      5,243,128
Enron          25,854      1,033,638      201,243

Implementation Details

Implementation is straightforward for common neighbors and Jaccard’s coefficient

Rooted PageRank is typically determined using the stationary distribution of a Markov Chain

Stationary distribution is computed by repeated matrix multiplications

Matrices for the SMS and call datasets are too large to store in main memory

We use a series of random walks to approximate rooted PageRank

Bound the walk length using a geometric distribution

Total number of random walks is based on the average degree of the graph
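
A random-walk approximation of this kind can be sketched as below. The slides do not give the exact procedure, so this is one plausible reading: each walk starts at the root, stops with a fixed probability at every step (which makes its length geometrically distributed), and the score of a vertex is the fraction of walks that end there. The function name, the `stop_prob` parameter, and leaving the number of walks as a free parameter (rather than deriving it from the average degree) are assumptions.

```python
import random
from collections import Counter

def rooted_pagerank(gamma, root, num_walks, stop_prob, rng):
    """Estimate rooted PageRank scores from `root` using walk endpoints."""
    ends = Counter()
    for _ in range(num_walks):
        node = root
        # Continue with probability 1 - stop_prob; also stop at dead ends.
        while rng.random() >= stop_prob and gamma[node]:
            node = rng.choice(sorted(gamma[node]))
        ends[node] += 1
    return {v: count / num_walks for v, count in ends.items()}

gamma = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
scores = rooted_pagerank(gamma, "a", num_walks=1000, stop_prob=0.15,
                         rng=random.Random(0))
# Proximity of the edge (a, d) as seen from a; 0.0 if no walk ended at d.
proximity_a_d = scores.get("d", 0.0)
```

Only the adjacency sets are needed in memory, which avoids storing the transition matrix that was too large for the SMS and call datasets.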

Experimental Setup

Three sampling methods: sliding window, uniform sampling, and biased sampling

Three anomalous link detection methods: common neighbors, Jaccard’s coefficient, and rooted PageRank

Sample sizes range from 10% to 90% of the transactions

Evaluate using Spearman’s rank correlation
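
Spearman's rank correlation is the Pearson correlation of the rank vectors, with tied values sharing their average rank. The sketch below is a plain-Python version; presumably the two inputs would be the proximity scores of the same edges computed on the full data and on a sample, but that pairing is an assumption, as are the function names.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based positions i+1..j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Pearson correlation of the two rank vectors (undefined if either is constant)."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

full_scores = [0.10, 0.40, 0.35, 0.80]
sample_scores = [0.05, 0.30, 0.45, 0.90]
rho = spearman(full_scores, sample_scores)
```

A value near 1 means the sample orders the edges almost the same way as the full data, so the same links would fall below a proximity threshold.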

Results

Figure: Rank correlations for call dataset. Left: Jaccard’s coefficient. Right: rooted PageRank. Both panels plot rank correlation (0.4 to 1) against fraction of data set (0 to 1) for uniform sample, biased sample, and sliding window.

Observations

Rooted PageRank performs better on smaller samples

Rooted PageRank is computationally expensive

Better to use Jaccard’s coefficient with larger samples.

Proposed Research

Extract and analyze city level subgraphs

Investigate changes in Jaccard’s coefficient distribution over time

Outline

Overview

Detection and Alert System

Online Anomaly Detection via Clustering

Link Sampling and Anomalous Link Detection

Real-Time Data Source

Overview

Prototype Implementation

Experimental Setup

Results

Conclusions

Proposed Research

Conclusion

Overview

Motivation:

Use existing cell phone network as a sensor network

Advantages:

Cheap deployment

Disadvantages:

No control over the network

Goal:

Receive transaction data from the cellular service provider

Summarize and distribute data to clients (DSS, DAS, SPS) in real time

Overview

Incoming data:

Time at which service was initiated

The network service used

Anonymized values indicating people involved in using the service

Towers involved in providing the service

Outgoing data:

Stream of interval summaries

Each item in the stream consists of

A timestamp indicating the end of the interval

A vector containing the number of times each service was used in the interval

Clients specify interval length
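
The interval summaries described above can be sketched as a generator. This is an illustration only (the prototype was written in Ruby, and its code is not shown in the slides): transactions are reduced to hypothetical `(timestamp, service)` pairs with timestamps in seconds, arrival is assumed to be in order, and the names `summarize` and `services` are made up for the sketch.

```python
from collections import Counter

def summarize(transactions, interval, services):
    """Yield (interval_end, counts) for in-order (timestamp, service) pairs."""
    counts, end = Counter(), None
    for timestamp, service in transactions:
        if end is None:
            # The first interval is the one containing the first transaction.
            end = (timestamp // interval + 1) * interval
        while timestamp >= end:
            # Close every finished interval, including empty ones.
            yield end, [counts[s] for s in services]
            counts, end = Counter(), end + interval
        counts[service] += 1
    if end is not None:
        yield end, [counts[s] for s in services]

transactions = [(0, "sms"), (10, "call"), (59, "sms"), (60, "call"), (185, "sms")]
summaries = list(summarize(transactions, interval=60, services=["call", "sms"]))
```

Each yielded item pairs the timestamp at the end of an interval with the per-service usage vector, matching the outgoing stream format listed on the slide.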

Prototype Implementation

Ruby:

Interpreted language

Web-services support

Multi-threading support with large priority space

Assumption:

Data from service provider arrives in order

Periodic Task Model:

Periodic tasks: send data to clients

For each client: a task executes at the end of every interval

Deadline is the end of the next interval

Aperiodic tasks: maintain interval summaries

Experimental Setup

Setup:

2 to 24 clients

Task periods of 0.05, 0.06, 0.07, 0.08, 0.09 seconds

Constant transaction streams: 100 transactions / second

Four evaluation measures:

the rate of missed deadlines

the rate of skipped tasks

the average delay for the periodic tasks

the correctness of the data source output

Results

Observations:

System fails (incorrect output) with a low utilization (≈ 0.26)

In many cases, tasks were released after deadline, skipped

Conclusion:

Periodic task model is too inflexible for this system

Proposed Research

Use rate-based execution model [Jeffay and Goddard, 1999]

Parameterize with:

Maximum expected aperiodic task rate

Desired aperiodic task response time

When aperiodic task rate exceeds maximum expected rate:

Deadlines shift, response time decays

Remove assumption that transaction stream arrives in order

Sporadic tasks with dynamic release times to distribute summaries to clients

Minimize data loss, minimize delay

Outline

Overview

Detection and Alert System

Online Anomaly Detection via Clustering

Link Sampling and Anomalous Link Detection

Real-Time Data Source

Conclusion

Summary of Proposed Research

Proposed Schedule

Summary of Proposed Research

Detection and Alert System

Online Anomaly Detection via Clustering

Extend hybrid clustering algorithm into a streaming algorithm

Link Sampling and Anomalous Link Detection

Identify feasible methods for reducing graph data for online analysis

Identify graph features that can be quickly computed and allow the identification of anomalous behavior in graphs

Real-Time Data Source

Develop a real-time system for distributing summaries of streaming transaction data to clients

Handle out of order data arrival dynamically

Online minimization of dropped data and propagation delay

Published Papers

Online Anomaly Detection via Clustering:

Proceedings of the North American Association for Computational Social and Organization Science, 2006. (Best student paper.)

Computational and Mathematical Organization Theory. To appear.

Proposed Schedule

Detection and Alert System

Online Anomaly Detection via Clustering

New conference paper early in 2008

New journal paper late in 2008

Anomaly Detection in Graphs

Conference submission (SIAM) in October 2007

Additional conference submission early in 2008

Journal submissions in late 2008 or early 2009

Real-Time Data Source:

Conference submission in mid 2008 (describing a completely redesigned and rebuilt system)

Journal submission in early 2009

Dissertation Defense: March 2009

Acknowledgments

The material presented here is based in part upon work supported by the National Science Foundation, the DDDAS Program, under grant No. CNS-050348.

The committee:

Dr. Chaudhary

Dr. Chawla

Dr. Poellabauer

The outside chair:

Dr. Hachen

My advisor:

Dr. Madey

Questions?