spatio-temporal frequent pattern mining for public safety: concepts and techniques

Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques Pradeep Mohan * Department of Computer Science University of Minnesota, Twin-Cities Advisor: Prof. Shashi Shekhar Thesis Committee: Prof. F. Harvey, Prof. G. Karypis, Prof. J. Srivastava *Contact: [email protected]


Page 1: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Pradeep Mohan*

Department of Computer Science, University of Minnesota, Twin-Cities

Advisor: Prof. Shashi Shekhar

Thesis Committee: Prof. F. Harvey, Prof. G. Karypis, Prof. J. Srivastava

*Contact: [email protected]

Page 2: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Biography

Education

Ph.D. Student, Department of Computer Science and Engineering, University of Minnesota, MN, 2007 – Present.

B.E., Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, India, 2003 – 2007.

Major Projects during PhD

US DoJ/NIJ: Mapping and analysis for Public Safety

CrimeStat .NET Libraries 1.0: Modularization of CrimeStat, a tool for the analysis of crime incidents.

Performance tuning of spatial analysis routines in CrimeStat.

CrimeStat 3.2a - 3.3: Addition of new modules for spatial analysis.

US DOD/ERDC/TEC: Cascade models for multi-scale pattern discovery

Designed new interest measures and formulated pattern mining algorithms for identifying patterns from large crime report datasets.

Page 3: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Thesis Related Publications

Cascading spatio-temporal pattern discovery (Chapter 2)

P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers. Cascading spatio-temporal pattern discovery: A summary of results. In Proc. of 10th SIAM International Conference on Data Mining 2010 (SDM 2010, full paper acceptance rate 20%).

P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers. Cascading spatio-temporal pattern discovery. IEEE Transactions on Knowledge and Data Engineering (TKDE). (Accepted regular paper, in press, ~20% acceptance rate.)

Regional co-location pattern discovery (Chapter 3)

P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers, Z. Jiang, N. Wayant. A spatial neighborhood graph based approach to regional co-location pattern discovery: Summary of results. In Proc. of 19th ACM SIGSPATIAL International Conference on Advances in GIS 2011 (ACM SIGSPATIAL 2011, full paper acceptance rate 23%).

Crime Pattern Analysis Application (Chapter 4)

S. Shekhar, P. Mohan, D. Oliver, Z. Jiang, X. Zhou. Crime pattern analysis: A spatial frequent pattern mining approach. M. Leitner (Ed.), Crime Modeling and Mapping Using Geospatial Technologies, Springer (Accepted with revisions).

Page 4: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Outline

Introduction: Motivation

Problem Statement

Our Approach

Future Work

Page 5: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Motivation: Public Safety

Identifying events (e.g., bar closing, football games) that lead to increased crime.

Crime generators and attractors

Identifying frequent crime hotspots

Law enforcement planning

Predicting crime events

Predictive policing (e.g., predict next location of offense, forecast crime levels around conventions, etc.)

Predicting the next location of burglary. Courtesy: www.startribune.com

Question: What / where are the frequent crime generators?

Question: Where are the crime hotspots?

Question: What are the crime levels 1 hour after a football game within a radius of 1 mile?

Other Applications: Epidemiology

Courtesy: https://www.llnl.gov/str/September02/Hall.html

Page 6: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Scientific Domain: Environmental Criminology

Courtesy: http://www.popcenter.org/learning/60steps/index.cfm?stepNum=16

Crime pattern theory

Routine activity theory and Crime Triangle

Courtesy: http://www.popcenter.org/learning/60steps/index.cfm?stepnum=8

Crime Event: Motivated offender, vulnerable victim (available at an appropriate location and time), absence of a capable guardian.

Crime Generators: places where offenders and targets come together in time and place, such as large gatherings (e.g., bars, football games).

Crime Attractors: places offering many criminal opportunities, to which offenders may relocate (e.g., drug areas).

Page 7: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Outline

Introduction

Problem Statement: Spatio-temporal frequent pattern mining problem; Challenges

Our Approach

Future Work

Page 8: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Spatio-temporal frequent pattern mining problem

Given:

Spatial / spatio-temporal framework.

Crime reports with type, location and / or time.

Spatial features of interest (e.g., bars).

Interest measure threshold (Pθ).

Spatial / spatio-temporal neighbor relation.

Find: Frequent patterns with interestingness >= Pθ.

Objective: Minimize computation costs.

Constraints:

Correctness and completeness.

Statistical interpretation (i.e., account for autocorrelation or heterogeneity).

Page 9: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Illustration: Output

Cascading ST Patterns (Inputs: Spatial, Temporal Neighborhood - 0.5 miles, 20 mins, Threshold - 0.5)

Regional Co-location patterns (Inputs: Spatial Neighborhood – 1 mile, Threshold- 0.25)

Figure: Bar Closing (B), Assault (A), and Drunk Driving (C) events at Time T1, Time T2 > T1, Time T3 > T2, and Aggregate(T1, T2, T3), with the resulting CSTP P1 over B, A, and C.

Page 10: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Challenges

Spatio-temporal Semantics: Continuity of space / time; Partial order

Conflicting Requirements: Statistical Interpretation vs. Computational Scalability

Computational Cost: Exponential set of Candidate patterns

Figure: events A.1–A.5, B.1–B.2, and C.1–C.4 shown at Time T1, Time T2 > T1, Time T3 > T2, and as Aggregate(T1, T2, T3).

Time partitioning misses relationships

Space partitioning misses relationships

Figure: candidate pattern lattice rooted at {Null}, with size-2 candidates A→B, A→C, B→A, B→C, C→A, C→B and larger candidates over A, B, and C.

# Patterns = Exponential (# event types)
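To get a feel for this growth, the sketch below counts labeled directed acyclic graphs on n nodes with Robinson's recurrence. Treating every DAG over n distinct event types as a candidate is an assumption made here for illustration; the slides do not spell out the exact candidate space, and the function name is hypothetical.

```python
from math import comb


def count_labeled_dags(n: int) -> int:
    """Count labeled DAGs on n nodes using Robinson's recurrence."""
    counts = [1]  # a(0) = 1: the empty graph
    for m in range(1, n + 1):
        total = 0
        for k in range(1, m + 1):
            # k = number of nodes with no incoming edge (inclusion-exclusion)
            sign = 1 if k % 2 == 1 else -1
            total += sign * comb(m, k) * 2 ** (k * (m - k)) * counts[m - k]
        counts.append(total)
    return counts[n]


# Assumption: one node per event type, so n = number of event types.
for n in range(1, 6):
    print(n, count_labeled_dags(n))
```

Even under this simplified view, five event types already give tens of thousands of candidate structures, which is why pruning by an interest measure matters.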


Page 11: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Our Contributions


New Spatio-temporal frequent pattern families. Ex: Cascading ST Patterns and Regional Co-location patterns.

Novel interest measures that guarantee statistical interpretation and are computable in polynomial time.

Scalable algorithms based on properties of spatio-temporal data and interest measures.

Experimental evaluation using synthetic and real crime datasets.

Page 12: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Outline

Introduction

Problem Statement

Our Approach: Big Picture; Cascading Spatio-temporal pattern discovery; Other Frequent Pattern Families

Future Work

Page 13: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Cascading ST pattern (CSTP)

Output: CSTP

Partially ordered subsets of ST event types.

Located together in space.

Occur in stages over time.

Figure: Bar Closing (B), Assault (A), and Drunk Driving (C) events at Time T1, Time T2 > T1, Time T3 > T2, and Aggregate(T1, T2, T3); CSTP P1 over B, A, and C.

Input: Crime reports with location and time.


Page 14: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques


Related Pattern Semantics: ST Data mining

Spatio-temporal frequent patterns:

Unordered (ST Co-occurrence)

Totally Ordered (ST Sequences)

Partially Ordered: Our Work (Cascading ST patterns)

Others

ST Co-occurrence [Celik et al. 2008, Cao et al. 2006]: Designed for moving object datasets by treating trajectories as location time series. Performs partitioning over space and time.

ST Sequence [Huang et al. 2008, Cao et al. 2005]: Totally ordered patterns modeled as a chain. Does not account for multiply connected (e.g., nonlinear) patterns, so it misses non-linear semantics. No ST statistical interpretation.

Page 15: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Interpretation Model: Directed Neighbor Graph (DNG)

Nodes: Individual Events

Directed Edge (N1 → N2) iff Neighbor(N1, N2) and After(N2, N1)
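The edge rule above can be sketched as a brute-force pass over all event pairs. The `Event` type, the coordinates, and the timestamps are hypothetical; the thresholds of 0.5 miles and 20 minutes echo the inputs quoted on the output illustration slide. Whether edges between events of the same type are allowed is not stated in the slides, and this sketch does not exclude them.

```python
from math import hypot
from typing import NamedTuple


class Event(NamedTuple):
    name: str  # e.g. "B.1"
    x: float   # hypothetical planar coordinates, in miles
    y: float
    t: float   # hypothetical timestamp, in minutes


def build_dng(events, max_dist, max_dt):
    """Return directed edges (n1, n2): n2 is near n1 in space and occurs after it."""
    edges = []
    for n1 in events:
        for n2 in events:
            close_in_space = hypot(n1.x - n2.x, n1.y - n2.y) <= max_dist
            after_in_time = 0 < n2.t - n1.t <= max_dt
            if close_in_space and after_in_time:
                edges.append((n1.name, n2.name))
    return edges


events = [
    Event("B.1", 0.0, 0.0, 0.0),
    Event("A.1", 0.2, 0.1, 10.0),
    Event("C.2", 0.3, 0.2, 25.0),
    Event("A.5", 3.0, 3.0, 12.0),
]
edges = build_dng(events, max_dist=0.5, max_dt=20.0)
print(edges)
```

Here B.1 and C.2 are close in space but 25 minutes apart, so no edge joins them directly; the quadratic scan is only for clarity and is the cost that the join strategies later in the talk aim to reduce.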

Figure: events A.1–A.5 (Assault), B.1–B.2 (Bar Closing), and C.1–C.4 (Drunk Driving) at Time T1, Time T2, and Time T3, the resulting directed neighbor graph, and CSTP P1 over B, A, and C.

Page 16: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Statistical Foundation: Interest Measures

Instances of CSTP P1: (B→A, B→C, A→C) are

(B1→A1, B1→C1, A1→C1) (B1→A3, B1→C2, A3→C2) ? ? (B1→A1, A1→C2, B1→C2)

Cascade Participation Ratio : CPR (CSTP, M) : Conditional Probability of an instance of CSTP in neighborhood, given an instance of event-type M

Examples

Cascade Participation Index: CPI(CSTP) = min(CPR(CSTP, M)) over all M in CSTP

Example: $\mathrm{CPR}(\mathrm{CSTP}, C) = \frac{2}{4} = 0.5$
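The two measures can be checked on the running example with a short sketch. The edge list is read from the instance table on the filtering-strategies slide and the event counts (5 A, 2 B, 4 C) from the figures; both are reconstructions from a flattened transcript, and all function names are hypothetical.

```python
from fractions import Fraction

# Directed neighbor edges, as reconstructed from the slides' instance table.
EDGES = {
    ("B.1", "A.1"), ("B.1", "A.3"), ("B.2", "A.2"), ("B.2", "A.4"),
    ("B.1", "C.2"), ("B.1", "C.3"), ("B.2", "C.1"),
    ("A.1", "C.2"), ("A.3", "C.3"), ("A.1", "C.3"), ("A.3", "C.4"),
    ("C.1", "A.5"),
}
COUNTS = {"A": 5, "B": 2, "C": 4}  # number of events per type


def by_type(event_type):
    return [f"{event_type}.{i}" for i in range(1, COUNTS[event_type] + 1)]


def p1_instances():
    """Enumerate (b, a, c) triples with edges b->a, b->c, and a->c."""
    return [
        (b, a, c)
        for b in by_type("B") for a in by_type("A") for c in by_type("C")
        if (b, a) in EDGES and (b, c) in EDGES and (a, c) in EDGES
    ]


def cpr(instances, position, event_type):
    """Fraction of events of event_type that take part in some instance."""
    participating = {inst[position] for inst in instances}
    return Fraction(len(participating), COUNTS[event_type])


instances = p1_instances()
ratios = {
    "B": cpr(instances, 0, "B"),
    "A": cpr(instances, 1, "A"),
    "C": cpr(instances, 2, "C"),
}
cpi = min(ratios.values())
print(instances)
print(ratios, cpi)
```

Under these assumptions the sketch finds three instances of P1, and the ratio for C agrees with the slide's CPR(CSTP, C) = 2/4.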

Figure: CSTP P1 over B, A, and C, with the directed neighbor graph over events A.1–A.5, B.1–B.2, and C.1–C.4.

Page 17: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Analytical Evaluation: Statistical Interpretation

Cascade Participation Index (CPI) is an upper bound to the ST K-Function per unit volume.

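Example (three event configurations):

Measure | Configuration 1 | Configuration 2 | Configuration 3
ST-K (B → A) | 2/6 = 0.33 | 3/6 = 0.5 | 6/6 = 1
CPI (B → A) | 2/3 = 0.66 | 1 | 1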

Figure: three panels, each showing events A.1, A.2, A.3, B.1, and B.2 in a different configuration.

Spatial Statistics: ST K-Function (Diggle et al. 1995)

$$\hat{K}_{AB}(h, t) = \frac{1}{S \cdot T} \cdot \frac{1}{\lambda_A \cdot \lambda_B} \cdot \sum_i \sum_j I_{ht}\big(d(A_i, B_j),\, t_d(A_i, B_j)\big)$$
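The upper-bound claim can be checked numerically with a small sketch. It assumes the intensities are estimated as λ = n / (S·T), so that the K-function per unit volume reduces to (number of neighboring B→A pairs) / (n_B · n_A). The three pair lists are hypothetical configurations chosen to be consistent with the example values on this slide, since the original figure is not available.

```python
from fractions import Fraction


def st_k_per_unit_volume(pairs, n_b, n_a):
    """Neighboring (B, A) pairs divided by all possible (B, A) pairs."""
    return Fraction(len(pairs), n_b * n_a)


def cpi(pairs, n_b, n_a):
    """Minimum participation ratio over the two event types."""
    b_part = Fraction(len({b for b, _ in pairs}), n_b)
    a_part = Fraction(len({a for _, a in pairs}), n_a)
    return min(b_part, a_part)


N_B, N_A = 2, 3
configurations = [
    [("B.1", "A.1"), ("B.2", "A.2")],
    [("B.1", "A.1"), ("B.1", "A.3"), ("B.2", "A.2")],
    [(b, a) for b in ("B.1", "B.2") for a in ("A.1", "A.2", "A.3")],
]
results = [
    (st_k_per_unit_volume(p, N_B, N_A), cpi(p, N_B, N_A))
    for p in configurations
]
for k_value, cpi_value in results:
    print(k_value, cpi_value, k_value <= cpi_value)
```

The bound holds here because the number of neighboring pairs can never exceed the product of the participating counts of the two types, and a product of two ratios in [0, 1] is at most their minimum.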


Page 18: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Comparison with Related Interest Measures

Measure | Key Property
Frequency | Double counting of pattern instances
Maximum Independent Set (MIS) Size [Kuramochi and Karypis, 2004] | NP Complete
Scoring Criterion for Bayesian Networks [Neapolitan, 2003; Chickering, 1996] | NP Complete; learning requires prior specification
Lower bound on vertex label frequency | Frequency based interpretation

Values for CSTP P1:

Measure | Value
Frequency | 3 / (What is the # of transactions?)
MIS | 2
Lower Bound on Frequency | min{1,2,2} = 1

Figure: CSTP P1 over B, A, and C, with the directed neighbor graph over events A.1–A.5, B.1–B.2, and C.1–C.4.

Page 19: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Computational Structure: CSTP Miner Algorithm

Basic Idea

Initialization

for k in (1, 2, 3, ..., K-1) and prevalent CSTP found do

    Generate size k candidates.

    Compute CSTP instances / Materialize part of DNG.

    Calculate interest measure and select prevalent CSTPs.

end

Not part of a conventional apriori setting:

Item sets in association rule mining.

Chemical compounds / subgraphs in graph mining.

Directed acyclic graphs in CSTP mining.

Page 20: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

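CSTP Miner Algorithm: Illustration

CPI Threshold = 0.33

Figure: candidate lattice rooted at {Null}, evaluated by spatio-temporal join over the directed neighbor graph (events A.1–A.5, B.1–B.2, C.1–C.4). CPI values of the size-2 candidates: A→B = 0, A→C = 0.4, B→A = 0.8, B→C = 0.75, C→A = 0.2, C→B = 0. CPI values shown for the larger candidates over A, B, and C: 0.4, 0.4, 0.8, and 0.4.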
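The first level of this illustration can be reproduced with a short sketch: compute the CPI of each size-2 candidate and keep those at or above the threshold. The edge list is reconstructed from the instance table on the filtering-strategies slide, the event counts (5 A, 2 B, 4 C) come from the figures, and the helper name `cpi_size2` is hypothetical.

```python
from itertools import permutations

# Directed neighbor edges, as reconstructed from the slides' instance table.
EDGES = [
    ("B.1", "A.1"), ("B.1", "A.3"), ("B.2", "A.2"), ("B.2", "A.4"),
    ("B.1", "C.2"), ("B.1", "C.3"), ("B.2", "C.1"),
    ("A.1", "C.2"), ("A.3", "C.3"), ("A.1", "C.3"), ("A.3", "C.4"),
    ("C.1", "A.5"),
]
COUNTS = {"A": 5, "B": 2, "C": 4}  # number of events per type
CPI_THRESHOLD = 0.33


def cpi_size2(src_type, dst_type):
    """CPI of the candidate src_type -> dst_type."""
    pairs = [
        (s, d) for s, d in EDGES
        if s.startswith(src_type) and d.startswith(dst_type)
    ]
    src_ratio = len({s for s, _ in pairs}) / COUNTS[src_type]
    dst_ratio = len({d for _, d in pairs}) / COUNTS[dst_type]
    return min(src_ratio, dst_ratio)


scores = {f"{s}->{d}": cpi_size2(s, d) for s, d in permutations("ABC", 2)}
prevalent = sorted(p for p, v in scores.items() if v >= CPI_THRESHOLD)
print(scores)
print(prevalent)
```

With the threshold of 0.33, only A→C, B→A, and B→C survive, so only candidates built from these edges would need evaluation at the next level.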


Page 21: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Computational Structure: CSTP Miner Algorithm

Key Bottlenecks

Interest measure evaluation

Exponential pattern space

Computational Strategies

Compute interest measure efficiently: Space-Time Partition Join Strategy; Time Ordered Nested Loop Strategy

Reduce irrelevant interest measure evaluation: Filtering strategies

Fixed Parameters: Spatial neighborhood = 0.62 miles, temporal neighborhood = 1 hr, CPI threshold = 0.0055

Page 22: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

CSTP Miner Algorithm: Interest Measure Evaluation

ST Join Strategies: Perform each interest measure computation efficiently

Time Ordered Nested Loop (TONL) Strategy

Space-Time Partitioning (STP) Strategy

Figure: space-time partitioning grid (axes: Time, Space; legend: volume of ST neighborhood), ST join by plane sweep, and the directed neighbor graph over events A.1–A.5, B.1–B.2, and C.1–C.4 (# Edges = 13).

Page 23: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

CSTP Miner Algorithm: Filtering Strategies

Multi-resolution ST Filter: Summarizing on a coarser neighborhood yields compression in most cases.

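CPI Threshold = 0.33

Actual Relation

Pattern | Instances | CPI
B→A | (B.1, A.1), (B.1, A.3), (B.2, A.2), (B.2, A.4) | 0.8
B→C | (B.1, C.2), (B.1, C.3), (B.2, C.1) | 0.75
A→C | (A.1, C.2), (A.3, C.3), (A.1, C.3), (A.3, C.4) | 0.4
C→A | (C.1, A.5) | 0.2

Figure: events on a grid with axes Time and Space, comparing the actual relation with the coarse relation.

Coarse Relation (pairs of grid cells)

BA | BC | AC | CA
(0,0)(1,0) | (0,2)(1,2) | (1,2)(1,2) | (1,1)(2,0)
(0,2)(1,2) | (0,0)(1,1) | (1,0)(1,1) | (2,1)(2,0)
(1,2)(2,1)
(1,0)(2,1)
0.8 | 0.75 | 0.8 | 0.2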
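In the tables above the coarse CPI values are never below the actual ones, which is what makes a coarse pass usable as a filter. The sketch below illustrates one way such a filter could work; it is not the thesis's implementation. It assumes one spatial dimension, grid cells as wide as the neighborhood, and a cell-level test that over-approximates the exact neighbor test. All names and data are hypothetical.

```python
from math import floor

# Hypothetical events: (type, position in miles, time in minutes)
EVENTS = [
    ("B", 0.1, 0.0), ("B", 2.6, 5.0),
    ("A", 0.3, 10.0), ("A", 0.9, 15.0), ("A", 2.7, 12.0), ("A", 4.0, 50.0),
    ("C", 9.0, 100.0),
]
MAX_DIST, MAX_DT, THRESHOLD = 0.5, 20.0, 0.33


def exact_related(e1, e2):
    return abs(e1[1] - e2[1]) <= MAX_DIST and 0 < e2[2] - e1[2] <= MAX_DT


def coarse_related(e1, e2):
    """Cell-level test: true whenever exact_related is true, and sometimes more."""
    space_gap = floor(e2[1] / MAX_DIST) - floor(e1[1] / MAX_DIST)
    time_gap = floor(e2[2] / MAX_DT) - floor(e1[2] / MAX_DT)
    return abs(space_gap) <= 1 and 0 <= time_gap <= 1


def participation_index(src, dst, related):
    sources = [e for e in EVENTS if e[0] == src]
    targets = [e for e in EVENTS if e[0] == dst]
    pairs = [(s, d) for s in sources for d in targets if related(s, d)]
    src_ratio = len({s for s, _ in pairs}) / len(sources)
    dst_ratio = len({d for _, d in pairs}) / len(targets)
    return min(src_ratio, dst_ratio)


def filter_then_refine(src, dst):
    """Return (coarse upper bound, exact CPI or None if pruned by the filter)."""
    upper = participation_index(src, dst, coarse_related)
    if upper < THRESHOLD:
        return upper, None  # pruned without the exact computation
    return upper, participation_index(src, dst, exact_related)


print(filter_then_refine("B", "A"))
print(filter_then_refine("C", "A"))
```

Because every exact neighbor pair falls in the same or an adjacent cell, the coarse index cannot be smaller than the exact one, so pruning on the coarse value never discards a prevalent pattern in this sketch.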

Page 24: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Experimental Evaluation: Experiment Setup

Goals

1. Compare different design decisions of the CSTPM Algorithm - Performance: Run-time

2. Test effect of parameters on performance: - Number of event types, Dataset Size, Clumpiness Degree

Experiment Platform: CPU: 3.2GHz, RAM: 32GB, OS: Linux, Matlab 7.9


Page 25: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Experimental Evaluation: Datasets

Real Data (Lincoln, NE Dataset)

Data size: 5 datasets, drawn by increments of 2 months; 5000–33000 instances.

Event types: drawn by increments of 5 event types; 5–25 event types.

Synthetic Data

Data size: 5 datasets; 5000–26000 instances.

Event types: 5–25 event types.

Clumpiness Degree: 5–25 instances per event type per cell.

Page 26: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Experimental Evaluation: Join strategy performance

Question: What is the effect of dataset size on performance of join strategies?

Trends: ST Partitioning improves performance by a factor of 5-10 on synthetic data and by a factor of 3 on real data.

Fixed Parameters: Real Data (CPI = 0.15, 0.31 Miles, 10 Days); Synthetic data (0.5, 25, 25)

Page 27: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Lincoln, NE crime dataset: Case study

Is bar closing a generator for crime-related CSTPs?

Questions

Is bar closing a crime generator?

Are there other generators (e.g., Saturday nights)?

Observation: Crime peaks around bar closing!

Bar locations in Lincoln, NE

Bar closing → Increase (larceny, vandalism, assaults)

Saturday night → Increase (larceny, vandalism, assaults)

K-S test: Saturday night is significantly different from normal-day bar closing (p-value = 1.249 × 10^-7, K = 0.41).

Page 28: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Lincoln, NE crime dataset: Case study


Page 29: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Outline

Introduction

Problem Statement

Our Approach: Big Picture; Cascading Spatio-temporal pattern discovery; Other Frequent Pattern Families

Future Work

Page 30: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Regional co-location patterns (RCP)

Input: Spatial features, crime reports.

Output: RCP (e.g., < (Bar, Assaults), Downtown >)

Subsets of spatial features.

Frequently located in certain regions of a study area.

Page 31: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Statistical Foundation: Accounting for Heterogeneity

Regional Participation Ratio (RPR): Conditional probability of observing a pattern instance within a locality given an instance of a feature within that locality.

Example:

$$RPR(\langle \{ABC\}, PL2 \rangle, B) = \frac{2}{6}; \quad RPR(\langle \{ABC\}, PL2 \rangle, C) = \frac{1}{4}$$

Regional Participation Index (RPI): Quantifies the local fraction participating in a relationship.

Example:

$$RPI(\langle \{ABC\}, PL2 \rangle) = \min\left\{ \frac{2}{4}, \frac{2}{6}, \frac{1}{4} \right\} = \frac{1}{4}$$
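The arithmetic of the example can be restated as a short sketch. The per-feature counts for B and C come from the RPR values on this slide; the pair for A is an assumption inferred from the first term (2/4) inside the min. Function and variable names are hypothetical.

```python
from fractions import Fraction

# (participating instances, total instances) per feature inside locality PL2.
# The entry for A is assumed from the 2/4 term in the slide's min expression.
locality_counts = {"A": (2, 4), "B": (2, 6), "C": (1, 4)}


def regional_participation_ratio(participating, total):
    """Fraction of a feature's instances in the locality that join the pattern."""
    return Fraction(participating, total)


def regional_participation_index(counts):
    """Minimum regional participation ratio over all features in the pattern."""
    return min(regional_participation_ratio(p, t) for p, t in counts.values())


rpi = regional_participation_index(locality_counts)
print(rpi)
```

Taking the minimum means a pattern is only as prevalent in a locality as its least involved feature, mirroring how the CPI is built from the CPR.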


Page 32: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Conclusions

Proposed SFPM techniques (e.g., Cascading ST Patterns and Regional Co-location patterns) honor ST Semantics (e.g., Partial order, Continuity).

Interest measures achieve a balance between statistical interpretation and computational scalability.

Algorithmic strategies exploiting properties of ST data (e.g., multiresolution filter) and properties of interest measures enhance computational savings.


Page 33: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Future Work – Short and Medium Term

Dimension | Category | Spatial input data | Spatio-temporal (ST) input data
Pattern Semantics | Unordered | ✔ | ✔
Pattern Semantics | Totally Ordered | X | ✔
Pattern Semantics | Partially Ordered | X | CSTP discovery
Statistical Foundation | Autocorrelation | ✔ | CSTP discovery
Statistical Foundation | Heterogeneity | RCP discovery | X
Underlying Framework | Euclidean | RCP discovery | CSTP discovery
Underlying Framework | Non-Euclidean (Networks) | X | X
Neighbor Relation | User specified | RCP discovery | CSTP discovery
Neighbor Relation | Algorithm determined | X | X
Interestingness Criterion | Interest measure threshold | RCP discovery | CSTP discovery
Interestingness Criterion | Threshold free | X | X
Type of data | Boolean / Categorical | RCP discovery | CSTP discovery
Type of data | Quantitative data (e.g., Climate) | X | X

X: Unexplored

Page 34: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Future Work – Long Term


Exploring interpretation of discovered patterns by law enforcement.

ST Predictive analytics, Predictive models based on SFPM and Predictive policing.

Towards Geo-social analytics for policing (e.g. Criminal Flash mobs, gangs, groups of offenders committing crimes)

New ST frequent pattern mining algorithms based on depth first graph enumeration.

ST frequent pattern mining techniques that account for patron demographic levels.

Explore evaluation of choropleth maps via ST frequent pattern mining.

Page 35: Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Acknowledgment

Members of the Spatial Database and Data Mining Research Group University of Minnesota, Twin-Cities.

This work was supported by grants from the U.S. Army, NGA, and U.S. DOJ.

Advisor: Prof. Shashi Shekhar, Computer Science, University of Minnesota.

Thesis committee.

U.S. DOJ – National Institute of Justice: Mr. Ronald E. Wilson (Program Manager, Mapping and Analysis for Public Safety) , Dr. Ned Levine (Ned Levine and Associates, CrimeStat Program)

U.S. Army – Topographic Engineering Center: Dr. J.A. Shine (Mathematician and Statistician, Geospatial Research and Engineering Division) and Dr. J.P. Rogers (Additional Director, Topographic Engineering Center)

Mr. Tom Casady, Public Safety Director (Formerly Lincoln Police Chief), Lincoln, NE, USA

Thank You for your Questions, Comments and Attention!
