Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques
Pradeep Mohan*
Department of Computer Science, University of Minnesota, Twin-Cities
Advisor: Prof. Shashi Shekhar
Thesis Committee: Prof. F. Harvey, Prof. G. Karypis, Prof. J. Srivastava
*Contact: [email protected]
Biography
Education
Ph.D. Student, Department of Computer Science and Engineering, University of Minnesota, MN, 2007 – Present.
B.E., Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, India, 2003–2007.
Major Projects during PhD
US DoJ/NIJ – Mapping and Analysis for Public Safety
CrimeStat .NET Libraries 1.0: Modularization of CrimeStat, a tool for the analysis of crime incidents.
Performance tuning of spatial analysis routines in CrimeStat.
CrimeStat 3.2a–3.3: Addition of new modules for spatial analysis.
US DoD/ERDC/TEC – Cascade models for multi-scale pattern discovery
Designed new interest measures and formulated pattern mining algorithms for identifying patterns from large crime report datasets.
Thesis Related Publications
Cascading spatio-temporal pattern discovery (Chapter 2)
P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers. Cascading spatio-temporal pattern discovery: A summary of results. In Proc. of the 10th SIAM International Conference on Data Mining 2010 (SDM 2010, full paper acceptance rate 20%).
P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers. Cascading spatio-temporal pattern discovery. IEEE Transactions on Knowledge and Data Engineering (TKDE). (Accepted regular paper, in press, ~20% acceptance rate)
Regional co-location pattern discovery (Chapter 3)
P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers, Z. Jiang, N. Wayant. A spatial neighborhood graph based approach to regional co-location pattern discovery: A summary of results. In Proc. of the 19th ACM SIGSPATIAL International Conference on Advances in GIS 2011 (ACM SIGSPATIAL 2011, full paper acceptance rate 23%).
Crime Pattern Analysis Application (Chapter 4)
S. Shekhar, P. Mohan, D. Oliver, Z. Jiang, X. Zhou. Crime pattern analysis: A spatial frequent pattern mining approach. In M. Leitner (Ed.), Crime Modeling and Mapping Using Geospatial Technologies, Springer (Accepted with revisions).
Outline
Introduction: Motivation
Problem Statement
Our Approach
Future Work
Motivation: Public Safety
Identifying events (e.g., bar closing, football games) that lead to increased crime.
Crime generators and attractors
Identifying frequent crime hotspots
Law enforcement planning
Predicting crime events
Predictive policing (e.g., predict the next location of an offense, forecast crime levels around conventions, etc.)
Predicting the next location of burglary. Courtesy: www.startribune.com
Question: What / where are the frequent crime generators?
Question: Where are the crime hotspots?
Question: What are the crime levels 1 hour after a football game within a radius of 1 mile?
Other Applications: Epidemiology
Courtesy: https://www.llnl.gov/str/September02/Hall.html
Scientific Domain: Environmental Criminology
Courtesy: http://www.popcenter.org/learning/60steps/index.cfm?stepNum=16
Crime pattern theory; Routine activity theory and the Crime Triangle
Courtesy: http://www.popcenter.org/learning/60steps/index.cfm?stepnum=8
Crime event: a motivated offender and a vulnerable victim (available at an appropriate location and time), in the absence of a capable guardian.
Crime generators: offenders and targets come together in time and place, e.g., large gatherings (bars, football games).
Crime attractors: places offering many criminal opportunities, to which offenders may relocate (e.g., drug areas).
Outline
Introduction
Problem Statement: Spatio-temporal frequent pattern mining problem; Challenges
Our Approach
Future Work
Spatio-temporal frequent pattern mining problem
Given:
Spatial / spatio-temporal framework.
Crime reports with type, location and/or time.
Spatial features of interest (e.g., bars).
Interest measure threshold (Pθ).
Spatial / spatio-temporal neighbor relation.
Find: Frequent patterns with interestingness ≥ Pθ.
Objective: Minimize computation costs.
Constraints: Correctness and completeness. Statistical interpretation (i.e., account for autocorrelation or heterogeneity).
Illustration: Output
Cascading ST patterns (inputs: spatial neighborhood 0.5 miles, temporal neighborhood 20 mins, threshold 0.5)
Regional co-location patterns (inputs: spatial neighborhood 1 mile, threshold 0.25)
[Figure: Bar Closing (B), Assault (A) and Drunk Driving (C) events at times T1, T2 > T1 and T3 > T2, and their aggregate Aggregate(T1, T2, T3); the discovered cascading ST pattern CSTP P1 links B, A and C.]
Challenges
Spatio-temporal semantics: continuity of space / time; partial order.
Conflicting requirements: statistical interpretation vs. computational scalability.
Computational cost: exponential set of candidate patterns.
[Figure: events A.1–A.5, B.1–B.2 and C.1–C.4 at times T1, T2 > T1 and T3 > T2, and their aggregate Aggregate(T1, T2, T3).]
Time partitioning misses relationships.
Space partitioning misses relationships.
[Figure: lattice of candidate patterns over event types A, B and C, from {Null} through the ordered pairs A→B, A→C, B→A, B→C, C→A, C→B to larger patterns.]
# Patterns = Exponential (# event types)
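A rough sense of how fast the candidate space grows can be computed directly. The Python sketch below rests on an assumption not stated on the slide: that a candidate is any weakly connected labeled DAG over a subset of distinct event types. Under that assumption, 3 event types give 6 size-2 candidates, the same six ordered pairs (A→B, A→C, B→A, B→C, C→A, C→B) drawn in the candidate lattice. The function names are hypothetical, and the numbers illustrate growth rather than the exact CSTP candidate space.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


@lru_cache(maxsize=None)
def count_connected_dags(n: int) -> int:
    """Weakly connected labeled DAGs on n >= 1 nodes.

    A labeled DAG splits into the connected component holding one fixed node
    (k nodes, chosen C(n-1, k-1) ways) and an arbitrary DAG on the rest.
    """
    return count_dags(n) - sum(
        comb(n - 1, k - 1) * count_connected_dags(k) * count_dags(n - k)
        for k in range(1, n)
    )


def candidate_patterns(num_types: int, size: int) -> int:
    """Hypothetical candidate count: pick `size` distinct types, then a connected DAG."""
    return comb(num_types, size) * count_connected_dags(size)


for k in (3, 5, 10, 25):
    total = sum(candidate_patterns(k, m) for m in range(2, k + 1))
    print(f"{k} event types -> {total} candidate patterns of size >= 2")
```

For 3 event types this gives 6 + 18 = 24 candidates; for 5 types it already exceeds 28,000, far faster than the 2^k growth of plain item sets.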
Our Contributions
New spatio-temporal frequent pattern families, e.g., Cascading ST Patterns and Regional Co-location Patterns.
Novel interest measures that guarantee statistical interpretation and are computable in polynomial time.
Scalable algorithms based on properties of spatio-temporal data and interest measures.
Experimental evaluation using synthetic and real crime datasets.
Outline
Introduction
Problem Statement
Our Approach: Big Picture; Cascading Spatio-temporal pattern discovery; Other Frequent Pattern Families
Future Work
Cascading ST pattern (CSTP)
Output: CSTP
Partially ordered subsets of ST event types.
Located together in space.
Occur in stages over time.
[Figure: CSTP P1 over Bar Closing (B), Assault (A) and Drunk Driving (C), shown for times T1, T2 > T1 and T3 > T2 and their aggregate Aggregate(T1, T2, T3).]
Input: Crime reports with location and time.
Related Pattern Semantics: ST Data Mining
Spatio-temporal frequent patterns:
Unordered (ST co-occurrence)
Totally ordered (ST sequences)
Partially ordered: our work (Cascading ST patterns)
Others
ST co-occurrence [Celik et al. 2008, Cao et al. 2006]: designed for moving object datasets by treating trajectories as location time series; performs partitioning over space and time.
ST sequence [Huang et al. 2008, Cao et al. 2005]: totally ordered patterns modeled as a chain; does not account for multiply connected (e.g., non-linear) patterns, so it misses non-linear semantics; no ST statistical interpretation.
Interpretation Model: Directed Neighbor Graph (DNG)
Nodes: Individual Events
Directed edge (N1 → N2) iff Neighbor(N1, N2) and After(N2, N1), i.e., N1 and N2 are ST neighbors and N2 occurs after N1.
[Figure: events A.1–A.5 (Assault), B.1–B.2 (Bar Closing) and C.1–C.4 (Drunk Driving) at times T1, T2 and T3, and the resulting DNG, in which instances of CSTP P1 (B → A, B → C, A → C) appear.]
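The DNG definition can be sketched in a few lines of Python. The Event record, the planar Euclidean distance, and the reading of Neighbor as "within a spatial radius and a temporal window" are assumptions for illustration; the defaults of 0.5 miles and 20 minutes mirror the neighborhood used in the output illustration, and the events are hypothetical. The quadratic double loop is only for clarity; the ST join strategies in the CSTP Miner exist to avoid it.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    eid: str    # event id, e.g. "B.1"
    etype: str  # event type, e.g. "B" (Bar Closing)
    x: float    # planar coordinates (assumed, e.g. miles)
    y: float
    t: float    # time (assumed minutes)


def neighbor(n1: Event, n2: Event, radius: float, window: float) -> bool:
    """Assumed ST neighbor relation: close in space AND close in time."""
    in_space = math.hypot(n1.x - n2.x, n1.y - n2.y) <= radius
    in_time = abs(n1.t - n2.t) <= window
    return in_space and in_time


def after(n2: Event, n1: Event) -> bool:
    """After(N2, N1): N2 occurs strictly later than N1."""
    return n2.t > n1.t


def build_dng(events, radius=0.5, window=20.0):
    """Directed edge N1 -> N2 iff Neighbor(N1, N2) and After(N2, N1)."""
    return {
        (n1.eid, n2.eid)
        for n1 in events
        for n2 in events
        if after(n2, n1) and neighbor(n1, n2, radius, window)
    }


# Hypothetical events: a bar closing followed by a nearby assault and a
# drunk-driving report, plus an unrelated assault far away.
EVENTS = [
    Event("B.1", "B", 0.0, 0.0, 0.0),
    Event("A.1", "A", 0.2, 0.0, 10.0),
    Event("C.1", "C", 0.3, 0.1, 15.0),
    Event("A.2", "A", 5.0, 5.0, 10.0),
]

print(sorted(build_dng(EVENTS)))
# [('A.1', 'C.1'), ('B.1', 'A.1'), ('B.1', 'C.1')]
```

The three edges form exactly the triangle B → A, B → C, A → C, i.e., one instance of a pattern shaped like P1.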
Statistical Foundation: Interest Measures
Instances of CSTP P1 (B → A, B → C, A → C): (B.1 → A.1, B.1 → C.1, A.1 → C.1), (B.1 → A.3, B.1 → C.2, A.3 → C.2) and (B.1 → A.1, A.1 → C.2, B.1 → C.2).
Cascade Participation Ratio, CPR(CSTP, M): conditional probability of an instance of the CSTP in the neighborhood, given an instance of event type M.
Cascade Participation Index, CPI(CSTP): min(CPR(CSTP, M)) over all event types M in the CSTP.
Example: CPR(P1, C) = 2/4 = 0.5
[Figure: DNG over events A.1–A.5, B.1–B.2 and C.1–C.4, with CSTP P1.]
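CPR and CPI can be computed from a list of pattern instances. This sketch assumes a participant-counting reading of CPR (distinct events of type M that appear in some instance, divided by all events of type M), which reproduces the slide's CPR(P1, C) = 2/4; the instances and per-type counts (A.1–A.5, B.1–B.2, C.1–C.4) are read off the example figure, and the names are hypothetical.

```python
# Instances of CSTP P1 (B -> A, B -> C, A -> C): event type -> event playing that role.
P1_INSTANCES = [
    {"B": "B.1", "A": "A.1", "C": "C.1"},
    {"B": "B.1", "A": "A.3", "C": "C.2"},
    {"B": "B.1", "A": "A.1", "C": "C.2"},
]
# Events per type in the example: A.1-A.5, B.1-B.2, C.1-C.4.
EVENT_COUNTS = {"A": 5, "B": 2, "C": 4}


def cpr(instances, event_type, event_counts):
    """Share of events of `event_type` that take part in at least one instance."""
    participating = {inst[event_type] for inst in instances}
    return len(participating) / event_counts[event_type]


def cpi(instances, pattern_types, event_counts):
    """Cascade Participation Index: minimum CPR over the pattern's event types."""
    return min(cpr(instances, m, event_counts) for m in pattern_types)


for m in "ABC":
    print(m, cpr(P1_INSTANCES, m, EVENT_COUNTS))  # A 0.4, B 0.5, C 0.5
print("CPI", cpi(P1_INSTANCES, "ABC", EVENT_COUNTS))  # CPI 0.4
```

Counting distinct participants rather than instances is what keeps B.1, which appears in all three instances, from being counted three times.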
Analytical Evaluation: Statistical Interpretation
Cascade Participation Index (CPI) is an upper bound to the ST K-Function per unit volume.
Example (three configurations of events A.1–A.3 and B.1–B.2):
Configuration | 1 | 2 | 3
ST-K(B → A) | 2/6 = 0.33 | 3/6 = 0.5 | 6/6 = 1
CPI(B → A) | 2/3 = 0.67 | 1 | 1
Spatial Statistics: ST K-Function (Diggle et al. 1995)
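$$\hat{K}_{AB}(h, t) = \frac{1}{S \cdot T} \cdot \frac{1}{\lambda_A \lambda_B} \sum_{i} \sum_{j} I_{h,t}\big(d(A_i, B_j),\, t_d(A_i, B_j)\big)$$
where S · T is the volume of the study area, λ_A and λ_B are the intensities of A and B, d and t_d are the spatial and temporal distances between A_i and B_j, and I_{h,t} is 1 when the pair lies within distance h and time t, else 0.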
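Why CPI bounds the K-function per unit volume can be seen from the estimator: with λ_A = n_A/(S·T) and λ_B = n_B/(S·T), dividing the ST K-function by S·T leaves c/(n_A·n_B), where c is the number of neighboring (B, A) pairs. Each participating B event contributes at most n_A pairs, so c/(n_A·n_B) ≤ CPR(B); symmetrically it is at most CPR(A), and hence at most CPI. The sketch below checks this on configuration 1; the (x, y, t) coordinates and the neighbor rule (within a radius, A following B within a window) are hypothetical choices that reproduce 2/6 and 2/3.

```python
import math


def is_neighbor(b, a, radius, window):
    """Assumed relation for B -> A: within `radius` and A follows B within `window`."""
    (bx, by, bt), (ax, ay, at) = b, a
    return math.hypot(ax - bx, ay - by) <= radius and 0 < at - bt <= window


def k_per_unit_volume(b_events, a_events, radius, window):
    """Neighbor pairs / (n_B * n_A), i.e. the ST K-function divided by S * T."""
    pairs = sum(is_neighbor(b, a, radius, window) for b in b_events for a in a_events)
    return pairs / (len(b_events) * len(a_events))


def cpi_pair(b_events, a_events, radius, window):
    """CPI of B -> A: min of the participating fractions of B and of A."""
    cpr_b = sum(any(is_neighbor(b, a, radius, window) for a in a_events)
                for b in b_events) / len(b_events)
    cpr_a = sum(any(is_neighbor(b, a, radius, window) for b in b_events)
                for a in a_events) / len(a_events)
    return min(cpr_b, cpr_a)


# Hypothetical (x, y, t) points for configuration 1: two of six pairs are neighbors.
B_EVENTS = [(0.0, 0.0, 0.0), (5.0, 5.0, 0.0)]
A_EVENTS = [(0.1, 0.0, 1.0), (5.1, 5.0, 1.0), (20.0, 20.0, 1.0)]

k = k_per_unit_volume(B_EVENTS, A_EVENTS, radius=0.5, window=2.0)
c = cpi_pair(B_EVENTS, A_EVENTS, radius=0.5, window=2.0)
print(round(k, 2), round(c, 2), c >= k)  # 0.33 0.67 True
```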
Comparison with Related Interest Measures
Measure | Key property
Frequency | Double counting of pattern instances
Maximum Independent Set (MIS) size [Kuramochi and Karypis, 2004] | NP-complete
Scoring criterion for Bayesian networks [Neapolitan, 2003; Chickering, 1996] | NP-complete; learning requires prior specification
Lower bound on vertex label frequency | Frequency-based interpretation
Values for CSTP P1:
Measure | Value
Frequency | 3 / (what is the # of transactions?)
MIS | 2
Lower bound on frequency | min{1, 2, 2} = 1
[Figure: DNG over events A.1–A.5, B.1–B.2 and C.1–C.4, with CSTP P1.]
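The three values for P1 can be recomputed from its instances, written here as sets of DNG edges. The overlap rule used for MIS (two instances conflict when they share an edge) is an assumption that reproduces the slide's value of 2; the brute-force search is exponential, which is the point of the NP-completeness entry. Event types are parsed from ids such as "B.1", also an assumption.

```python
from itertools import combinations

# Instances of CSTP P1 as sets of DNG edges (source event, target event).
INSTANCES = [
    {("B.1", "A.1"), ("B.1", "C.1"), ("A.1", "C.1")},
    {("B.1", "A.3"), ("B.1", "C.2"), ("A.3", "C.2")},
    {("B.1", "A.1"), ("A.1", "C.2"), ("B.1", "C.2")},
]


def frequency(instances):
    """Plain instance count: the same events and edges may be counted repeatedly."""
    return len(instances)


def mis_size(instances):
    """Largest set of pairwise edge-disjoint instances (brute force, exponential)."""
    for size in range(len(instances), 0, -1):
        for subset in combinations(instances, size):
            if all(not (x & y) for x, y in combinations(subset, 2)):
                return size
    return 0


def vertex_label_lower_bound(instances):
    """min over event types of the number of distinct events of that type used."""
    used = {}
    for inst in instances:
        for edge in inst:
            for event in edge:
                used.setdefault(event.split(".")[0], set()).add(event)
    return min(len(events) for events in used.values())


print(frequency(INSTANCES), mis_size(INSTANCES), vertex_label_lower_bound(INSTANCES))
# 3 2 1
```

Frequency counts all three instances although they all reuse B.1, which is the double counting noted in the table; the lower bound is driven by the single participating B event.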
Computational Structure: CSTP Miner Algorithm
Basic idea:
Initialization
for k = 1, 2, ..., K-1 while prevalent CSTPs are found do
  Generate size-k candidates.
  Compute CSTP instances / materialize part of the DNG.
  Calculate the interest measure and select prevalent CSTPs.
end
Not part of a conventional Apriori setting:
Item sets in association rule mining; chemical compounds / subgraphs in graph mining; directed acyclic graphs in CSTP mining.
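The loop above can be sketched as a level-wise miner over an already materialized DNG. This is a simplified illustration, not the CSTPM implementation: it assumes distinct event types per pattern, treats a pattern as a set of type-level edges that every instance must contain, reuses a participant-counting CPI, and grows new candidates only from prevalent patterns (taking as an unproven assumption that this pruning is safe for CPI). All names are hypothetical.

```python
from itertools import product


def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the pattern graph has no directed cycle."""
    indegree = {n: 0 for n in nodes}
    for _, v in edges:
        indegree[v] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while ready:
        u = ready.pop()
        visited += 1
        for a, b in edges:
            if a == u:
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
    return visited == len(nodes)


def cpi(nodes, instances, by_type):
    """min over pattern event types of (distinct participating events / all events)."""
    return min(len({inst[n] for inst in instances}) / len(by_type[n]) for n in nodes)


def mine_cstps(event_type, dng_edges, threshold, max_size):
    """Level-wise sketch: event_type maps event id -> type; dng_edges is a set of id pairs.

    Returns {(frozenset of types, frozenset of type edges): CPI} for prevalent patterns.
    """
    by_type = {}
    for eid, etype in event_type.items():
        by_type.setdefault(etype, []).append(eid)
    types = sorted(by_type)

    # Size-2 candidates: one per ordered pair of distinct types; instances are DNG edges.
    frontier = {}
    for u, v in product(types, repeat=2):
        if u != v:
            frontier[(frozenset((u, v)), frozenset({(u, v)}))] = [
                {u: a, v: b} for a, b in dng_edges
                if event_type[a] == u and event_type[b] == v
            ]

    prevalent, size = {}, 2
    while frontier and size <= max_size:
        next_frontier = {}
        for (nodes, edges), instances in frontier.items():
            value = cpi(nodes, instances, by_type)
            if value < threshold:
                continue  # only prevalent patterns are grown further
            prevalent[(nodes, edges)] = value
            if size == max_size:
                continue
            ordered = sorted(nodes)
            for w in types:
                if w in nodes:
                    continue
                # Each existing type is unrelated to w, points to w ("to"), or follows w ("from").
                for choice in product((None, "to", "from"), repeat=len(ordered)):
                    new_edges = {(n, w) if c == "to" else (w, n)
                                 for n, c in zip(ordered, choice) if c}
                    key = (nodes | {w}, frozenset(edges | new_edges))
                    if not new_edges or key in next_frontier or not is_acyclic(*key):
                        continue
                    # ST join: extend each parent instance with type-w events on the new edges.
                    extended = []
                    for inst in instances:
                        for e in by_type[w]:
                            role = {**inst, w: e}
                            if all((role[a], role[b]) in dng_edges for a, b in new_edges):
                                extended.append(role)
                    next_frontier[key] = extended
        frontier, size = next_frontier, size + 1
    return prevalent
```

Every instance of a child pattern restricted to its parent's types is an instance of the parent, so extending the parent's instance list (rather than re-joining the whole DNG) finds all child instances; this is the "materialize part of the DNG" step in miniature.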
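CSTP Miner Algorithm: Illustration
[Figure: candidate lattice over event types A, B and C, from {Null} through size-2 candidates to size-3 candidates (values shown: 0.4, 0.4, 0.8, 0.4), each evaluated by a spatio-temporal join against the DNG of events A.1–A.5, B.1–B.2 and C.1–C.4.]
CPI threshold = 0.33
Size-2 candidates A→B, A→C, B→A, B→C, C→A, C→B: CPI = 0, 0.4, 0.8, 0.75, 0.2, 0.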
Computational Structure: CSTP Miner Algorithm
Key bottlenecks:
Interest measure evaluation
Exponential pattern space
Computational strategies:
Compute the interest measure efficiently: Space-Time Partition Join strategy, Time Ordered Nested Loop strategy.
Reduce irrelevant interest measure evaluation: filtering strategies.
Fixed parameters: spatial neighborhood = 0.62 miles, temporal neighborhood = 1 hr, CPI threshold = 0.0055.
CSTP Miner Algorithm: Interest Measure Evaluation
ST join strategies: perform each interest measure computation efficiently.
Time Ordered Nested Loop (TONL) strategy
Space-Time Partitioning (STP) strategy
[Figure: Time × Space plots of events A.1–A.5, B.1–B.2 and C.1–C.4; partition cells have the volume of the ST neighborhood, and the ST join is done by plane sweep; # Edges = 13.]
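Both join strategies can be sketched as alternative ways to materialize the same DNG edges. In this hypothetical sketch, TONL sorts events by time and scans forward only while the time gap stays within the window; STP hashes events into cells of size radius × radius × window, so any true neighbor lies in an adjacent cell (spatial offset −1, 0 or +1, time offset 0 or +1). The (id, x, y, t) tuple layout and the neighbor rule are assumptions; both functions should return identical edge sets.

```python
import math
from collections import defaultdict


def is_edge(p, q, radius, window):
    """Assumed DNG relation p -> q for events (id, x, y, t)."""
    return math.hypot(q[1] - p[1], q[2] - p[2]) <= radius and 0 < q[3] - p[3] <= window


def tonl_join(events, radius, window):
    """Time Ordered Nested Loop: sort by time, then scan forward only inside the window."""
    ordered = sorted(events, key=lambda e: e[3])
    edges = set()
    for i, p in enumerate(ordered):
        for q in ordered[i + 1:]:
            if q[3] - p[3] > window:
                break  # every later event is outside the temporal neighborhood
            if is_edge(p, q, radius, window):
                edges.add((p[0], q[0]))
    return edges


def stp_join(events, radius, window):
    """Space-Time Partitioning: cells of size radius x radius x window; join adjacent cells."""
    cells = defaultdict(list)
    for e in events:
        key = (math.floor(e[1] / radius), math.floor(e[2] / radius), math.floor(e[3] / window))
        cells[key].append(e)
    edges = set()
    for (cx, cy, ct), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dt in (0, 1):  # edges point forward in time, so earlier cells are skipped
                    for q in cells.get((cx + dx, cy + dy, ct + dt), ()):
                        for p in members:
                            if is_edge(p, q, radius, window):
                                edges.add((p[0], q[0]))
    return edges
```

TONL prunes only along time, so clustered data still yields many spatial comparisons inside each window; STP prunes along space and time together, which is consistent with the speedups reported in the experiments.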
CSTP Miner Algorithm: Filtering Strategies
Multi-resolution ST filter: summarizing on a coarser neighborhood yields compression in most cases.
CPI threshold = 0.33
Actual relation (event pairs):
BA: (B.1, A.1), (B.1, A.3), (B.2, A.2), (B.2, A.4); CPI = 0.8
BC: (B.1, C.2), (B.1, C.3), (B.2, C.1); CPI = 0.75
AC: (A.1, C.2), (A.3, C.3), (A.1, C.3), (A.3, C.4); CPI = 0.4
CA: (C.1, A.5); CPI = 0.2
Coarse relation (cell pairs on a coarse Time × Space grid):
BA: (0,0) → (1,0), (0,2) → (1,2); CPI = 0.8
BC: (0,2) → (1,2), (0,0) → (1,1); CPI = 0.75
AC: (1,2) → (1,2), (1,0) → (1,1), (1,2) → (2,1), (1,0) → (2,1); CPI = 0.8
CA: (1,1) → (2,0), (2,1) → (2,0); CPI = 0.2
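The filter works because a coarse relation can only over-approximate the actual one. In this sketch, cells are sized exactly to the ST neighborhood, so every true neighbor pair lands in adjacent cells; any event that participates in the actual relation therefore also participates in the coarse one, the coarse CPI is an upper bound, and a pattern whose coarse CPI is below the threshold (like CA at 0.2 < 0.33) can be dropped without the exact join. The grid size, the (id, type, x, y, t) layout and the function names are assumptions, not the thesis's implementation.

```python
import math
from collections import defaultdict


def is_edge(p, q, radius, window):
    """Assumed relation p -> q for events (id, type, x, y, t)."""
    return math.hypot(q[2] - p[2], q[3] - p[3]) <= radius and 0 < q[4] - p[4] <= window


def exact_cpi(events, u, v, radius, window):
    us = [e for e in events if e[1] == u]
    vs = [e for e in events if e[1] == v]
    part_u = sum(any(is_edge(p, q, radius, window) for q in vs) for p in us)
    part_v = sum(any(is_edge(p, q, radius, window) for p in us) for q in vs)
    return min(part_u / len(us), part_v / len(vs))


def coarse_cpi(events, u, v, radius, window):
    """Upper bound on CPI(u -> v) using only cell occupancy counts."""
    counts = defaultdict(lambda: defaultdict(int))  # type -> cell -> number of events
    for _, etype, x, y, t in events:
        cell = (math.floor(x / radius), math.floor(y / radius), math.floor(t / window))
        counts[etype][cell] += 1

    def has_neighbor(cell, other, dts):
        cx, cy, ct = cell
        return any((cx + dx, cy + dy, ct + dt) in other
                   for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dt in dts)

    # A u-event's v-neighbors sit in the same or the next time slice; v looks back.
    part_u = sum(n for c, n in counts[u].items() if has_neighbor(c, counts[v], (0, 1)))
    part_v = sum(n for c, n in counts[v].items() if has_neighbor(c, counts[u], (0, -1)))
    return min(part_u / sum(counts[u].values()), part_v / sum(counts[v].values()))


def prune(events, u, v, radius, window, threshold):
    """Skip the exact join when even the coarse upper bound misses the threshold."""
    return coarse_cpi(events, u, v, radius, window) < threshold
```

The bound can be loose (a coarse CPI of 0.8 against an actual 0.4 for AC in the table), so patterns that survive the filter still need the exact join.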
Experimental Evaluation: Experiment Setup
Goals
1. Compare different design decisions of the CSTPM algorithm (performance: run-time).
2. Test the effect of parameters on performance: number of event types, dataset size, clumpiness degree.
Experiment platform: CPU: 3.2 GHz, RAM: 32 GB, OS: Linux, Matlab 7.9
Experimental Evaluation: Datasets
Real data (Lincoln, NE dataset)
Data size: 5 datasets, drawn in increments of 2 months, 5000–33000 instances.
Event types: drawn in increments of 5 event types, 5–25 event types.
Synthetic data
Data size: 5 datasets, 5000–26000 instances.
Event types: 5–25 event types.
Clumpiness degree: 5–25 instances per event type per cell.
Experimental Evaluation: Join Strategy Performance
Question: What is the effect of dataset size on the performance of join strategies?
Trends: ST partitioning improves performance by a factor of 5–10 on synthetic data and by a factor of 3 on real data.
Fixed parameters: real data (CPI = 0.15, 0.31 miles, 10 days); synthetic data (0.5, 25, 25).
Lincoln, NE crime dataset: Case study
Is bar closing a generator for crime-related CSTPs?
Observation: Crime peaks around bar closing!
Bar locations in Lincoln, NE
Questions
Is bar closing a crime generator?
Are there other generators (e.g., Saturday nights)?
Bar closing: increase in larceny, vandalism, assaults
Saturday night: increase in larceny, vandalism, assaults
K-S test: Saturday night is significantly different from normal-day bar closing (p-value = 1.249 × 10^-7, K = 0.41)
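The K-S comparison rests on the two-sample Kolmogorov-Smirnov statistic, D = max over x of |F1(x) − F2(x)| between the two empirical CDFs; the slide's K = 0.41 is presumably this statistic. The sketch below computes D in plain Python; the samples are hypothetical (for instance, crime counts per hour in the two settings) and do not reproduce the slide's values. For the p-value, SciPy's `scipy.stats.ks_2samp` returns both the statistic and the p-value.

```python
from bisect import bisect_right


def ks_statistic(sample1, sample2):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F1(x) - F2(x)|."""
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    d = 0.0
    # The supremum of the gap between two step functions is reached at a data point.
    for x in s1 + s2:
        f1 = bisect_right(s1, x) / n1  # empirical CDF of sample 1 at x
        f2 = bisect_right(s2, x) / n2  # empirical CDF of sample 2 at x
        d = max(d, abs(f1 - f2))
    return d


# Hypothetical hourly counts for two settings.
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```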
Outline
Introduction
Problem Statement
Our Approach: Big Picture; Cascading Spatio-temporal pattern discovery; Other Frequent Pattern Families
Future Work
Regional co-location patterns (RCP)
Input: Spatial features, crime reports.
Output: RCP (e.g., <(Bar, Assaults), Downtown>), i.e., subsets of spatial features that are frequently located in certain regions of a study area.
Statistical Foundation: Accounting for Heterogeneity
Regional Participation Ratio (RPR): conditional probability of observing a pattern instance within a locality, given an instance of a feature within that locality.
Regional Participation Index (RPI): the minimum RPR over the features of the pattern; quantifies the local fraction participating in a relationship.
Example:
$$RPR(\langle \{A,B,C\}, PL_2 \rangle, B) = \frac{2}{6}, \qquad RPR(\langle \{A,B,C\}, PL_2 \rangle, C) = \frac{1}{4}$$
$$RPI(\langle \{A,B,C\}, PL_2 \rangle) = \min\left\{\frac{2}{4}, \frac{2}{6}, \frac{1}{4}\right\} = \frac{1}{4}$$
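RPR and RPI can be sketched like CPR and CPI, restricted to one locality. This hypothetical sketch assumes each event carries a locality label (e.g., PL2) and that an instance counts for a locality only if all of its events lie in it; the data are made up to reproduce the slide's fractions 2/4, 2/6 and 1/4.

```python
def local_instances(instances, locality, events):
    """Keep pattern instances whose events all lie in `locality` (assumed rule)."""
    return [inst for inst in instances
            if all(events[e][1] == locality for e in inst.values())]


def rpr(instances, feature, locality, events):
    """Fraction of `feature` events in `locality` that take part in a local instance."""
    participating = {inst[feature] for inst in local_instances(instances, locality, events)}
    total = sum(1 for f, loc in events.values() if f == feature and loc == locality)
    return len(participating) / total


def rpi(instances, features, locality, events):
    """Regional Participation Index: minimum RPR over the pattern's features."""
    return min(rpr(instances, f, locality, events) for f in features)


# Hypothetical data: event id -> (feature, locality label).
EVENTS = {}
EVENTS.update({f"a{i}": ("A", "PL2") for i in range(1, 5)})
EVENTS.update({f"b{i}": ("B", "PL2") for i in range(1, 7)})
EVENTS.update({f"c{i}": ("C", "PL2") for i in range(1, 5)})
EVENTS.update({"a9": ("A", "PL1"), "b9": ("B", "PL1"), "c9": ("C", "PL1")})

INSTANCES = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a2", "B": "b2", "C": "c1"},
    {"A": "a9", "B": "b9", "C": "c9"},  # lies in PL1, so it is ignored for PL2
]

print(rpi(INSTANCES, "ABC", "PL2", EVENTS))  # 0.25
```

Because the ratios are taken per locality, a pattern can be prevalent in PL2 even when it is rare across the whole study area, which is how the measure accounts for heterogeneity.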
Conclusions
Proposed SFPM techniques (e.g., Cascading ST Patterns and Regional Co-location patterns) honor ST Semantics (e.g., Partial order, Continuity).
Interest measures achieve a balance between statistical interpretation and computational scalability.
Algorithmic strategies exploiting properties of ST data (e.g., multiresolution filter) and properties of interest measures enhance computational savings.
Future Work – Short and Medium Term
Dimension | Spatial input data | Spatio-temporal (ST) input data
Pattern semantics: Unordered | ✔ | ✔
Pattern semantics: Totally ordered | X | ✔
Pattern semantics: Partially ordered | X | CSTP discovery
Statistical foundation: Autocorrelation | ✔ | CSTP discovery
Statistical foundation: Heterogeneity | RCP discovery | X
Underlying framework: Euclidean | RCP discovery | CSTP discovery
Underlying framework: Non-Euclidean (networks) | X | X
Neighbor relation: User specified | RCP discovery | CSTP discovery
Neighbor relation: Algorithm determined | X | X
Interestingness criterion: Interest measure threshold | RCP discovery | CSTP discovery
Interestingness criterion: Threshold free | X | X
Type of data: Boolean / categorical | RCP discovery | CSTP discovery
Type of data: Quantitative (e.g., climate) | X | X
X: Unexplored
Future Work – Long Term
Exploring the interpretation of discovered patterns by law enforcement.
ST predictive analytics: predictive models based on SFPM, and predictive policing.
Towards geo-social analytics for policing (e.g., criminal flash mobs, gangs, groups of offenders committing crimes).
New ST frequent pattern mining algorithms based on depth-first graph enumeration.
ST frequent pattern mining techniques that account for patron demographic levels.
Exploring the evaluation of choropleth maps via ST frequent pattern mining.
Acknowledgment
Members of the Spatial Database and Data Mining Research Group University of Minnesota, Twin-Cities.
This work was supported by grants from the U.S. Army, NGA and U.S. DOJ.
Advisor: Prof. Shashi Shekhar, Computer Science, University of Minnesota.
Thesis committee.
U.S. DOJ – National Institute of Justice: Mr. Ronald E. Wilson (Program Manager, Mapping and Analysis for Public Safety), Dr. Ned Levine (Ned Levine and Associates, CrimeStat Program)
U.S. Army – Topographic Engineering Center: Dr. J.A. Shine (Mathematician and Statistician, Geospatial Research and Engineering Division) and Dr. J.P. Rogers (Additional Director, Topographic Engineering Center)
Mr. Tom Casady, Public Safety Director (Formerly Lincoln Police Chief), Lincoln, NE, USA
Thank You for your Questions, Comments and Attention!