![Page 1: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/1.jpg)
Spatial Data Mining: Accomplishments and Research Needs
Shashi Shekhar
Department of Computer Science and Engineering
University of Minnesota
Why Data Mining?
Holy Grail: informed decision making
Lots of data are being collected
Business: transactions, web logs, GPS tracks, …
Science: remote sensing, micro-array gene expression data, …
Challenges: data volume >> number of human analysts
Some automation is needed
Data mining may help!
Provide better and customized insights for business
Help scientists with hypothesis generation
Spatial Data: Location-based Services
E.g., MapPoint, MapQuest, Yahoo/Google Maps, …
Courtesy: Microsoft Live Search (http://maps.live.com)
Spatial Data: In-car Navigation Device
Emerson In-Car Navigation System (Courtesy: Amazon.com)
Spatial Data Mining (SDM)
The process of discovering interesting, useful, non-trivial patterns from large spatial datasets
(Patterns are of interest to a non-specialist; exceptions to patterns are of interest to a specialist)
Spatial pattern families
Spatial outliers, discontinuities
Location prediction models
Spatial clusters
Co-location patterns
…
Spatial Data Mining and Science
Understanding of a physical phenomenon
The final model may not involve location, e.g., cause-effect: cholera is caused by germs
But discovery of the model may be aided by spatial patterns
Many phenomena are embedded in space and time
Ex. 1854 London: cholera deaths clustered around a water pump
Spatio-temporal process of disease spread => narrow down potential causes
Ex. recent analysis of SARS
Location helps bring rich contexts
Physical: e.g., rainfall, temperature, and wind
Demographic: e.g., age group, gender, and income type
Problem-specific: e.g., distance to highway or water
Example Pattern: Spatial Cluster
The 1854 Asiatic Cholera in London
Example Pattern: Spatial Outliers
Traffic Data in Twin Cities
Abnormal Sensor Detections
Spatial and Temporal Outliers
Example Pattern: Predictive Models
Location prediction:
Bird habitat prediction
Using environmental variables
Nest Locations
Example Patterns: Co-locations
Given: a collection of different types of spatial events
Find: co-located subsets of event types
What's NOT Spatial Data Mining
Simple querying of spatial data
Find neighbors of Canada, given names and boundaries of all countries
Find the shortest path from Boston to Houston in a freeway map
Search space is not large (not exponential)
Testing a hypothesis via primary data analysis
Ex. Female chimpanzee territories are smaller than male territories
Search space is not large!
SDM: secondary data analysis to generate multiple plausible hypotheses
Uninteresting or obvious patterns in spatial data
Heavy rainfall in Minneapolis is correlated with heavy rainfall in St. Paul, given that the two cities are 10 miles apart
Common knowledge: nearby places have similar rainfall
Mining of non-spatial data
Diaper sales and beer sales are correlated in the evening
Application Domains
Spatial data mining is used in:
NASA Earth Observing System (EOS): Earth science data
National Institute of Justice: crime mapping
Census Bureau, Dept. of Commerce: census data
Dept. of Transportation (DOT): traffic data
National Institutes of Health (NIH): cancer clusters
Commerce, e.g., retail analysis
Sample global questions from Earth science:
How is the global Earth system changing?
What are the primary forcings of the Earth system?
How does the Earth system respond to natural and human-induced changes?
What are the consequences of changes in the Earth system for human civilization?
How well can we predict future changes in the Earth system?
Example of Application Domains
Sample local questions from epidemiology [TerraSeer]:
What is the overall pattern of colorectal cancer?
Is there clustering of high colorectal cancer incidence anywhere in the study area?
Where is colorectal cancer risk significantly elevated?
Where are the zones of rapid change in colorectal cancer incidence?
Geographic distribution of male colorectal cancer in Long Island, New York (Courtesy: TerraSeer)
Business Applications
Sample questions:
What happens if a new store is added?
How much business will a new store divert from existing stores?
Other "what if" questions: changes in population, ethnic mix, and transportation network; changes in the retail space of a store; changes in choices and communication with customers
Retail analysis: Huff model [Huff, 1963], a spatial interaction model. Given a person p and a set S of choices:
$$\Pr[\text{person } p \text{ selects choice } c] = \frac{\text{perceived\_utility}(c, p)}{\sum_{c' \in S} \text{perceived\_utility}(c', p)}$$
$$\text{perceived\_utility}(\text{store } c, \text{person } p) = f(\text{square\_footage}(c), \text{distance}(c, p), \text{parameters})$$
Connection to SDM: parameter estimation, e.g., via regression
For example: predicting consumer spatial behaviors, delineating trade areas, locating retail and service facilities, analyzing market performance
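A minimal sketch of the Huff model above. It assumes the classic form in which perceived utility is square_footage^alpha / distance^beta; alpha and beta stand in for the "parameters" that regression would estimate, and the function name, store locations and sizes are all hypothetical:

```python
import math

def huff_probabilities(person, stores, alpha=1.0, beta=2.0):
    """Probability that `person` chooses each store under a Huff-style model.

    person: (x, y); stores: dict name -> ((x, y), square_footage).
    Perceived utility is assumed to be sqft**alpha / distance**beta.
    """
    utility = {}
    for name, (location, sqft) in stores.items():
        d = math.dist(person, location)  # Euclidean distance; assumed > 0
        utility[name] = sqft ** alpha / d ** beta
    total = sum(utility.values())
    return {name: u / total for name, u in utility.items()}

stores = {"A": ((0.0, 1.0), 1000.0), "B": ((0.0, -2.0), 4000.0)}
print(huff_probabilities((0.0, 0.0), stores))  # {'A': 0.5, 'B': 0.5}

# "What if" question: a new store C diverts business from A and B.
stores["C"] = ((3.0, 0.0), 9000.0)
print(huff_probabilities((0.0, 0.0), stores))  # each store now gets 1/3
```

The diversion question on this slide becomes a comparison of the two probability maps before and after adding the store.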
Map Construction
Sample questions:
Which features are anomalous?
Which layers are related?
How can the gaps be filled?
Korea data
Latitude 37°15′ to 37°30′
Longitude 128°23′51″ to 128°23′52″
Layers
Obstacles (Cut, embankment, depression)
Surface drainage (Canal, river/stream, island, common open water, ford, dam)
Slope
Soils (Poorly graded gravel, clayey sand, organic silt, disturbed soil)
Vegetation (Land subject to inundation, cropland, rice field, evergreen trees, mixed trees)
Transport (Roads, cart tracks, railways)
Colocation in Example Data
Road: river/stream
Cropland/rice fields: ends of roads/cart roads
Obstacles, dams and islands: river/streams
Embankment obstacles and river/stream: clayey soils
Rice, cropland, evergreen trees and deciduous trees: river/stream
Rice: clayey soil, wet soil and terraced fields
Crooked roads: steep slope
Colocation Example
Interestingness: patterns for the non-specialist vs. exceptions to patterns for the specialist
Road-River/Stream Colocation
Road-River Colocation Example (Korea database, Courtesy: Architecture Technology Corporation)
SQL Example for Colocation Query
SQL3/OGC (Postgres/PostGIS): detecting the road-river colocation pattern
Spatial query fragment:
-- Hyphenated table names must be double-quoted in PostgreSQL.
-- ST_Distance is the PostGIS name for the OGC Distance function; 0.001 is in map units.
CREATE TABLE "Road-River-Colocation" AS SELECT DISTINCT R.* FROM "River-Area-Table" T, "Road-Line-Table" R WHERE ST_Distance(T.geom, R.geom) < 0.001;
CREATE TABLE "Road-Stream-Colocation" AS SELECT DISTINCT R.* FROM "Stream-Line-Table" T, "Road-Line-Table" R WHERE ST_Distance(T.geom, R.geom) < 0.001;
CREATE TABLE "Cartroad-River-Colocation" AS SELECT DISTINCT R.* FROM "River-Area-Table" T, "Cartroad-Line-Table" R WHERE ST_Distance(T.geom, R.geom) < 0.001;
CREATE TABLE "Cartroad-Stream-Colocation" AS SELECT DISTINCT R.* FROM "Stream-Line-Table" T, "Cartroad-Line-Table" R WHERE ST_Distance(T.geom, R.geom) < 0.001;
Colocation: Road-River
375 road features; center-line to center-line distance threshold = 0.001 units (about 100 meters)
77% of all roads are colocated with a river or stream

| Colocation Pattern | Number of Colocated Features | Interest Measure (%) = (Colocated roads / Total roads) × 100 |
|---|---|---|
| Road with stream | 153 of 239 | 64% |
| Road with river | 96 of 239 | 40% |
| Road with stream or river | 176 of 239 | 74% |
| Cartroad with stream | 97 of 136 | 71% |
| Cartroad with river | 44 of 136 | 32% |
| Cartroad with stream or river | 111 of 136 | 82% |
| All roads with river or stream | 287 of 375 | 77% |

Road-River Colocation Example (Korea dataset)
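The interest measure in the table is a participation ratio: colocated roads divided by total roads. A minimal sketch, assuming each feature is simplified to a list of vertices and that "within the threshold" is approximated by the closest vertex pair rather than true line-to-line distance (the function name and the sample geometry are hypothetical):

```python
import math

def colocation_ratio(features, targets, threshold):
    """Count and percentage of `features` with some target within `threshold`.

    Each feature/target is a list of (x, y) vertices. Feature-to-target distance
    is approximated by the closest pair of vertices, not true line-to-line distance.
    """
    def near(feature, target):
        return any(math.dist(p, q) < threshold for p in feature for q in target)

    colocated = sum(1 for f in features if any(near(f, t) for t in targets))
    return colocated, 100.0 * colocated / len(features)

roads = [[(0, 0), (1, 0)], [(5, 5), (6, 5)], [(0, 2), (1, 2)]]  # hypothetical
rivers = [[(1, 0.0005), (2, 0.0005)]]                          # hypothetical
count, pct = colocation_ratio(roads, rivers, threshold=0.001)
print(count, round(pct))  # 1 33

# The same ratio reproduces the table row "All roads with river or stream":
print(round(100 * 287 / 375))  # 77
```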
A Complex Colocation Example
Cropland colocated with river, stream or road
Complex Colocation Example (Korea dataset, Courtesy: Architecture Technology Corporation)
Outliers in Example Data
Outlier detection
Extra/erroneous features
Positional accuracy of features
Predict mislabeled/misclassified features
Examples: overlapping road and river; road crossing river and disconnected road; stream mislabeled as river; cropland close to river and road; cropland outliers on edges
Outliers in Example
Map production
Identifying errors, e.g., expected colocation: (bridge, ∩(road, river)), i.e., a bridge is expected wherever a road intersects a river; violations of this pattern flag potential map errors
Finding errors in maps having road, river and bridges (Korea dataset)
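A minimal sketch of checking the expected colocation (bridge, ∩(road, river)). It assumes roads and rivers are given as straight segments and bridges as points; the function names, the tolerance and the data are hypothetical:

```python
import math

def segment_intersection(p1, p2, p3, p4):
    """Intersection point of segments p1-p2 and p3-p4, or None."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if den == 0:
        return None  # parallel or collinear segments are ignored in this sketch
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
    u = ((x1 - x3) * (y1 - y2) - (y1 - y3) * (x1 - x2)) / den
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None

def missing_bridges(roads, rivers, bridges, tol=0.001):
    """Road-river crossings with no bridge within `tol`: candidate map errors."""
    errors = []
    for road in roads:
        for river in rivers:
            crossing = segment_intersection(*road, *river)
            if crossing is not None and not any(
                    math.dist(crossing, b) <= tol for b in bridges):
                errors.append(crossing)
    return errors

roads = [((0, 0), (2, 0)), ((0, 3), (2, 3))]
rivers = [((1, -1), (1, 4))]
bridges = [(1.0, 0.0)]
print(missing_bridges(roads, rivers, bridges))  # [(1.0, 3.0)]
```

Only the second crossing lacks a bridge, so it is reported as a violation of the expected colocation.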
Overview
Spatial Data Mining
Find interesting, potentially useful, non-trivial patterns from spatial data
Components of Data Mining
Input: table with many columns, domain(column)
Statistical Foundation
Output: patterns and interest measures, e.g., predictive models, clusters, outliers, associations
Computational process: algorithms
Overview: Input, Statistical Foundation, Output, Computational Process, Trends
Overview of Input
Data
Table with many columns (attributes)
e.g., tid: tuple id; fi: attributes
Spatial attribute: geographically referenced
Non-spatial attribute: traditional
Relationships among data
Non-spatial
Spatial

| tid | f1 | f2 | … | fn |
|---|---|---|---|---|
| 0001 | 3.5 | 120 | … | Yes |
| 0002 | 4.0 | 121 | … | No |

Example of Input Data
Data in Spatial Data Mining
Non-spatial information
Same as data in traditional data mining
Numerical, categorical, ordinal, boolean, etc.
e.g., city name, city population
Spatial information
Spatial attribute: geographically referenced
Neighborhood and extent
Location, e.g., longitude, latitude, elevation
Spatial data representations
Raster: gridded space
Vector: point, line, polygon
Graph: node, edge, path
Raster Data for UMN Campus (Courtesy: UMN)
Vector Data for UMN Campus (Courtesy: MapQuest)
Relationships on Data in Spatial Data Mining
Relationships on non-spatial data
Explicit
Arithmetic, ranking (ordering), etc.
Object is an instance of a class, class is a subclass of another class, object is part of another object, object is a member of a set
Relationships on spatial data
Many are implicit
Relationship categories:
Set-oriented: union, intersection, membership, etc.
Topological: meet, within, overlap, etc.
Directional: North, NE, left, above, behind, etc.
Metric: e.g., Euclidean: distance, area, perimeter
Dynamic: update, create, destroy, etc.
Shape-based and visibility
Granularity

| Granularity | Elevation Example | Road Example |
|---|---|---|
| Local | Elevation | On_road? |
| Focal | Slope | Adjacent_to_road? |
| Zonal | Highest elevation in a zone | Distance to nearest road |
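The three granularities in the elevation column can be sketched on a small raster. This assumes a list-of-lists elevation grid, a simplified focal "slope" (largest absolute difference to an edge-sharing neighbour divided by the cell size) and hypothetical zone labels; the road column follows the same local/focal/zonal pattern:

```python
def local_value(elev, i, j):
    """Local: depends only on the cell itself (e.g., elevation)."""
    return elev[i][j]

def focal_slope(elev, i, j, cell_size=1.0):
    """Focal: depends on the cell's 4-neighbourhood (simplified slope)."""
    rows, cols = len(elev), len(elev[0])
    diffs = [abs(elev[i][j] - elev[a][b])
             for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
             if 0 <= a < rows and 0 <= b < cols]
    return max(diffs) / cell_size

def zonal_max(elev, zones):
    """Zonal: aggregate over all cells sharing a zone label (highest elevation)."""
    best = {}
    for row_e, row_z in zip(elev, zones):
        for e, z in zip(row_e, row_z):
            best[z] = max(e, best.get(z, e))
    return best

elev = [[10, 12, 15],
        [11, 13, 20],
        [9, 14, 18]]
zones = [["A", "A", "B"],
         ["A", "A", "B"],
         ["C", "C", "B"]]
print(local_value(elev, 1, 1))  # 13
print(focal_slope(elev, 1, 1))  # 7.0, from |13 - 20|
print(zonal_max(elev, zones))   # {'A': 13, 'B': 20, 'C': 14}
```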
OGC Model
Open GIS Consortium model
Supports spatial data types: e.g., point, line, polygon
Supports spatial operations as follows:

| Operator Type | Operator Name |
|---|---|
| Basic Function | SpatialReference, Envelope, Boundary, Export, IsEmpty, IsSimple |
| Topological/Set Operations | Equal, Disjoint, Intersect, Touch, Cross, Within, Contains, Overlap |
| Spatial Analysis | Distance, Buffer, ConvexHull, Intersection, Union, Difference, SymmDiff |

Examples of Operations in OGC Model
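OGIS Topology
9-intersection model: each entry records whether the intersection of the interior (°), boundary (∂) or exterior (⁻) of A with the interior, boundary or exterior of B is empty (0) or non-empty (1)
$$\begin{pmatrix} A^\circ \cap B^\circ & A^\circ \cap \partial B & A^\circ \cap B^- \\ \partial A \cap B^\circ & \partial A \cap \partial B & \partial A \cap B^- \\ A^- \cap B^\circ & A^- \cap \partial B & A^- \cap B^- \end{pmatrix}$$
Relations and their 9-intersection matrices:
disjoint: $\begin{pmatrix} 0&0&1 \\ 0&0&1 \\ 1&1&1 \end{pmatrix}$
meet: $\begin{pmatrix} 0&0&1 \\ 0&1&1 \\ 1&1&1 \end{pmatrix}$
overlap: $\begin{pmatrix} 1&1&1 \\ 1&1&1 \\ 1&1&1 \end{pmatrix}$
equal: $\begin{pmatrix} 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{pmatrix}$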
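To make the 9-intersection model concrete, here is a minimal sketch for axis-aligned rectangles only. It assumes integer corners (xmin, ymin, xmax, ymax) and approximates each of the nine intersections by sampling a half-unit lattice, which for such rectangles hits every non-empty intersection; the function names are hypothetical and relations other than the four listed (e.g., contains) come back as "other":

```python
from itertools import product

# Rows: interior, boundary, exterior of A; columns: same for B; 1 = non-empty.
RELATIONS = {
    ((0, 0, 1), (0, 0, 1), (1, 1, 1)): "disjoint",
    ((0, 0, 1), (0, 1, 1), (1, 1, 1)): "meet",
    ((1, 1, 1), (1, 1, 1), (1, 1, 1)): "overlap",
    ((1, 0, 0), (0, 1, 0), (0, 0, 1)): "equal",
}

def locate(p, rect):
    """'I', 'B' or 'E': position of point p relative to the rectangle."""
    x, y = p
    xmin, ymin, xmax, ymax = rect
    if xmin < x < xmax and ymin < y < ymax:
        return "I"
    if xmin <= x <= xmax and ymin <= y <= ymax:
        return "B"
    return "E"

def nine_intersection(a, b):
    """Approximate the 9-intersection matrix by sampling a half-unit lattice."""
    lo, hi = min(a + b) - 1, max(a + b) + 1
    coords = [lo + k / 2 for k in range(2 * (hi - lo) + 1)]
    seen = {(locate(p, a), locate(p, b)) for p in product(coords, coords)}
    return tuple(tuple(int((ra, rb) in seen) for rb in "IBE") for ra in "IBE")

def relation(a, b):
    return RELATIONS.get(nine_intersection(a, b), "other")

print(relation((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint
print(relation((0, 0, 1, 1), (1, 0, 2, 1)))  # meet
print(relation((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap
print(relation((0, 0, 2, 2), (0, 0, 2, 2)))  # equal
```

Production systems compute these matrices exactly from geometry rather than by sampling.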
Mining Implicit Spatial Relationships
Choices
Materialize spatial info + classical data mining
Customized spatial data mining techniques
What spatial info is to be materialized?
Distance measure: point: Euclidean; extended objects: buffer-based; graph: shortest path
Transactions, i.e., space partitions: circles centered at reference features, gridded cells, min-cut partitions, Voronoi diagram

| Relationships | Materialization (classical data mining can then be used) | Customized SDM Tech. |
|---|---|---|
| Topological | Neighbor, Inside, Outside | NEM, co-location |
| Euclidean | Distance, density | K-means |
| Directional | North, Left, Above | DBSCAN |
| Others | Shape, Visibility | Clustering on sphere |

Mining Implicit Spatial Relationships
Research Needs for Data
Limitations of OGC model
Aggregate functions - e.g. Mapcube
Direction predicates - e.g. absolute, ego-centric
3D and visibility
Network analysis
Raster operations
Needs for new research
Modeling semantically rich spatial properties
Moving objects
Spatial time series data
Overview: Input, Statistical Foundation, Output, Computational Process, Trends
Statistics in Spatial Data Mining
Classical data mining
Learning samples are independently distributed
Cross-correlation measures, e.g., chi-square, Pearson
Spatial data mining
Learning samples are not independent
Spatial autocorrelation measures: distance-based (e.g., K-function), neighbor-based (e.g., Moran's I)
Spatial cross-correlation measures: distance-based, e.g., cross K-function
Spatial heterogeneity
Overview of Statistical Foundation
Spatial statistics [Cressie, 1991][Haining, 2003]
Geostatistics
Continuous
Variogram: measures how similarity decreases with distance
Spatial interpolation
Lattice-based statistics
Discrete locations, neighbor relationship graph
Spatial Gaussian models: conditionally specified and simultaneously specified spatial Gaussian models
Markov random fields, spatial autoregressive model
Point process
Discrete
Complete spatial randomness (CSR): Poisson process in space
K-function: test of CSR
Point Process Lattice Geostatistics
Raster √ √
Vector Point √ √ √
Line √
Polygon √ √
Graph
Spatial Autocorrelation (SA)
First Law of Geography
"All things are related, but nearby things are more related than distant things." [Tobler, 1970]
Spatial autocorrelation
Nearby things are more similar than distant things
The traditional i.i.d. assumption is not valid
Measures: K-function, Moran's I, variogram, …
Pixel property with independent identical distribution
Vegetation Durability with SA
Spatial Autocorrelation: Distance-based Measure
K-function definition:
$$K(h) = \lambda^{-1} E[\text{number of events within distance } h \text{ of an arbitrary event}]$$
where λ is the intensity of events
Tests against randomness for a point pattern
Models departure from randomness over a wide range of scales
Inference: for Poisson complete spatial randomness (CSR), $K(h) = \pi h^2$
Plot $\hat{K}(h)$ against h and compare to Poisson CSR: > indicates clustering; < indicates declustering/regularity
K-Function based Spatial Autocorrelation
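A minimal sketch of estimating K(h) and comparing it with the CSR value πh². It assumes the naive estimator with intensity λ estimated as (n − 1)/area and no edge correction (so values near the study-area boundary are biased low); the function name and both point sets are hypothetical:

```python
import math

def k_hat(points, h, area):
    """Naive estimate of K(h) = lambda^-1 * E[# other events within h of an event].

    lambda is estimated as (n - 1) / area; no edge correction is applied.
    """
    n = len(points)
    close_pairs = sum(1 for i, p in enumerate(points) for j, q in enumerate(points)
                      if i != j and math.dist(p, q) <= h)
    mean_neighbors = close_pairs / n
    return mean_neighbors * area / (n - 1)

clustered = [(5, 5), (6, 5), (5, 6), (6, 6)]                  # hypothetical events
print(k_hat(clustered, 1.0, area=100.0) > math.pi * 1.0 ** 2)  # True: clustering
grid = [(x + 0.5, y + 0.5) for x in range(10) for y in range(10)]
print(k_hat(grid, 0.5, area=100.0) < math.pi * 0.5 ** 2)       # True: regularity
```

In practice the comparison is repeated over a range of h, and significance is usually judged against simulation envelopes of CSR patterns.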
Spatial Autocorrelation: Topological Measure
Moran's I measure definition:
$$MI = \frac{z W z^t}{z z^t}, \quad z = \{x_1 - \bar{x}, \ldots, x_n - \bar{x}\}$$
where $x_i$ are the data values, $\bar{x}$ is the mean of x, n is the number of data values, and W is the contiguity matrix
Ranges between −1 and +1
Higher positive value => high SA, cluster, attract
Lower negative value => interspersed, de-clustered, repel
e.g., spatial randomness => MI = 0
e.g., distribution of vegetation durability => MI = 0.7
e.g., checkerboard => MI = −1
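A minimal sketch of the Moran's I formula above. It assumes W is the row-normalized rook contiguity matrix of a grid (W_ij = 1/|N(i)|), a common convention under which the usual n/ΣW_ij factor equals 1; the function names are hypothetical. The checkerboard reproduces MI = −1:

```python
def morans_i(values, neighbors):
    """Moran's I = z W z^t / (z z^t) with row-normalized contiguity W.

    values: list of x_i; neighbors: list of neighbor index lists (W_ij = 1/|N(i)|).
    """
    n = len(values)
    mean = sum(values) / n
    z = [x - mean for x in values]
    wz = [sum(z[j] for j in nbrs) / len(nbrs) for nbrs in neighbors]
    return sum(zi * wzi for zi, wzi in zip(z, wz)) / sum(zi * zi for zi in z)

def rook_neighbors(rows, cols):
    """Neighbor lists for a rows x cols grid with rook (edge-sharing) contiguity."""
    nbrs = []
    for r in range(rows):
        for c in range(cols):
            nbrs.append([rr * cols + cc
                         for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                         if 0 <= rr < rows and 0 <= cc < cols])
    return nbrs

checker = [(r + c) % 2 for r in range(4) for c in range(4)]
print(morans_i(checker, rook_neighbors(4, 4)))  # -1.0
```

A grid whose left half is 0 and right half is 1 gives a positive value instead, matching the "cluster, attract" case.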
Cross-Correlation
Cross K-function definition:
$$K_{ij}(h) = \lambda_j^{-1} E[\text{number of type } j \text{ events within distance } h \text{ of a randomly chosen type } i \text{ event}]$$
Cross K-function of some pair of spatial feature types
Example: which pairs are frequently co-located? Statistical significance
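A minimal sketch of the cross K-function estimate, under the same assumptions as a naive K-function estimate: λ_j is estimated as (number of type j events)/area and no edge correction is applied. The function name and the three point sets are hypothetical; a value well above πh² suggests the pair is co-located:

```python
import math

def cross_k_hat(type_i, type_j, h, area):
    """Naive estimate of K_ij(h) = lambda_j^-1 * E[# type-j events within h of a type-i event].

    lambda_j is estimated as len(type_j) / area; no edge correction is applied.
    """
    count = sum(1 for p in type_i for q in type_j if math.dist(p, q) <= h)
    mean_j_near_i = count / len(type_i)
    return mean_j_near_i * area / len(type_j)

# Hypothetical point sets in a 10 x 10 study area.
a = [(1, 1), (5, 5), (8, 2)]
b = [(1, 1.5), (5, 5.5), (8, 2.5)]  # each A event has a B event close by
c = [(3, 8), (9, 9), (0, 6)]        # no C event is close to an A event
h = 1.0
print(cross_k_hat(a, b, h, 100.0), cross_k_hat(a, c, h, 100.0), math.pi * h ** 2)
```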
Cross-Correlation
Find patterns in the following data:
Answers: (shown as feature-type symbols in the original figure)
Illustration of Cross-Correlation
Illustration of Cross K-function for Example Data
Cross-K Function for Example Data
Spatial Slicing
Spatial heterogeneity
"Second law of geography" [M. Goodchild, UCGIS 2003]
Global model might be inconsistent with regional models
Spatial Simpson's Paradox
Figures: Global Model; Regional Models
Spatial slicing
Slicing inputs can improve the effectiveness of SDM
Slicing output can illustrate support regions of a pattern, e.g., association rule with support map
Edge Effect
Cropland on edges may not be classified as outliers
No concept of spatial edges in classical data mining
Korea Dataset, Courtesy: Architecture Technology Corporation
Research Challenges of Spatial Statistics
State-of-the-art of spatial statistics
Research needs
Correlating extended features, e.g., road, river (line strings); e.g., cropland (polygon), road, river
Edge effect
Relationship to classical statistics
Ex. SVM with spatial basis function vs. SAR
Point Process
Lattice Geostatistics
raster √ √
Vector Point √ √ √
Line √
Polygon √ √
graph
Data Types and Statistical Models
Overview: Input, Statistical Foundation, Output, Computational Process, Trends
General Approaches in SDM
Materializing spatial features, then using classical DM
Ex. Huff's model: distance(customer, store)
Ex. spatial association rule mining [Koperski, Han, 1995]
Ex. wavelet and Fourier transformations
Commercial tools: e.g., SAS-ESRI bridge
Spatial slicing, then using classical DM
Ex. association rule with support map [P. Tan et al.]
Tools: e.g., Matlab, SAS, R, S-PLUS
Customized spatial techniques
Ex. geographically weighted regression: parameter = f(loc)
Ex. MRF-based Bayesian Classifier (MRF-BC)
Commercial tools: e.g., S-PLUS spatial/R spatial/TerraSeer + customized code
Association rule with support map (FPAR-high -> NPP-high)
Overview of Data Mining Output
Supervised learning: prediction
Classification
Trend
Unsupervised learning:
Clustering
Outlier detection
Association
Input Data Types vs. Output Patterns
Patterns Point Process
Lattice Geostatistics
Prediction √ √
Trend √
Clustering √ √
Outliers √ √ √
Associations √ √
Output Patterns vs. Statistical Models
Location Prediction
Figure panels: Nest Locations; Vegetation; Water Depth; Distance to Open Water
Prediction and Trend
Prediction
Continuous: trend, e.g., regression
Location aware: spatial autoregressive model (SAR)
Discrete: classification, e.g., Bayesian classifier
Location aware: Markov random fields (MRF)

| | Classical | Spatial |
|---|---|---|
| Continuous | $y = X\beta + \epsilon$ | $y = \rho W y + X\beta + \epsilon$ |
| Discrete | $\Pr(C_i \mid X) = \frac{\Pr(X \mid C_i)\Pr(C_i)}{\Pr(X)}$ | $\Pr(c_i \mid X, C_N) = \frac{\Pr(c_i)\Pr(X, C_N \mid c_i)}{\Pr(X, C_N)}$ |

Prediction Models
Prediction and Trend
Linear regression: $y = X\beta + \epsilon$
Spatial regression: $y = \rho W y + X\beta + \epsilon$
The spatial model is better
Figures: ROC curve for learning; ROC curve for testing
Spatial Contextual Model: SAR
Spatial Autoregressive Model (SAR)
$$y = \rho W y + X\beta + \epsilon$$
Assume that dependent values $y_i$ are related to each other: $y_i = f(y_j)$, $i \neq j$
Directly model spatial autocorrelation using W
Geographically Weighted Regression (GWR)
A method of analyzing spatially varying relationships: parameter estimates vary locally
Models with Gaussian, logistic or Poisson forms can be fitted
Example: $y = X\beta' + \epsilon'$, where $\beta'$ and $\epsilon'$ are location dependent
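A minimal sketch of what the SAR equation implies: given Xβ + ε at each site, y is the solution of y = ρWy + Xβ + ε, i.e., y = (I − ρW)⁻¹(Xβ + ε). It assumes a row-normalized W, for which the fixed-point iteration below converges when |ρ| < 1; the function name, the 3-site chain and the numbers are hypothetical:

```python
def sar_solve(xb_plus_eps, neighbors, rho, tol=1e-12, max_iter=10_000):
    """Solve y = rho * W y + (X beta + eps) by fixed-point iteration.

    W is row-normalized (W_ij = 1/|N(i)|), so the iteration converges for |rho| < 1.
    """
    y = list(xb_plus_eps)
    for _ in range(max_iter):
        new = [rho * sum(y[j] for j in nbrs) / len(nbrs) + b
               for nbrs, b in zip(neighbors, xb_plus_eps)]
        if max(abs(a - c) for a, c in zip(new, y)) < tol:
            return new
        y = new
    raise RuntimeError("did not converge")

# Hypothetical chain of 3 sites: 0 - 1 - 2.
neighbors = [[1], [0, 2], [1]]
xb = [1.0, 0.0, 0.0]  # X beta + eps at each site (made-up numbers)
print(sar_solve(xb, neighbors, rho=0.5))  # approx. [7/6, 1/3, 1/6]
```

Sites 1 and 2 get non-zero values even though their own Xβ + ε is zero: the ρWy term is how the model expresses spatial autocorrelation.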
Spatial Contextual Model: MRF
Markov Random Fields Gaussian Mixture Model (MRF-GMM)
Undirected graph to represent the interdependency relationship of random variables
A variable depends only on its neighbors
Conditionally independent of all other variables, given its neighbors
$f_C(s_i)$ is independent of $f_C(s_j)$ if $W(s_i, s_j) = 0$
Predict $f_C(s_i)$, given feature value X and neighborhood class labels $C_N$
Assume: $\Pr(c_i)$, $\Pr(X, C_N \mid c_i)$, and $\Pr(X, C_N)$ are mixtures of Gaussian distributions
$$\Pr(c_i \mid X, C_N) = \frac{\Pr(c_i) \cdot \Pr(X, C_N \mid c_i)}{\Pr(X, C_N)}$$
Research Needs for Spatial Classification
Open problems
Estimate W for SAR and MRF-BC
Scaling issue in SAR: scale difference between $\rho W y$ and $X\beta$
Spatial interest measure: e.g., avg dist(actual, predicted)
Figure panels: Actual Sites; Pixels with Actual Sites; Prediction 1; Prediction 2 (spatially more accurate than Prediction 1)
Clustering
Clustering: find groups of tuples
Statistical significance: complete spatial randomness, cluster, and decluster
Figure inputs: complete spatial random (CSR), cluster, decluster
Classical Clustering
Spatial Clustering
Clustering
Similarity measures
Non-spatial: e.g., soundex
Classical clustering: Euclidean, metric, graph-based
Topological: neighborhood EM (NEM)
Seeks a partition that is both well clustered in feature space and spatially regular
Implicitly based on locations
Interest measures: spatial continuity, cartographic generalization, unusual density, keeping nearest neighbors in a common cluster
Challenges: spatial constraints in algorithmic design, e.g., rivers, mountain ranges, etc.
Semi-Supervised Bayesian Classification
Motivation: high cost of collecting labeled samples
Semi-supervised MRF
Idea: use unlabeled samples to improve classification
Ex. reduce salt-and-pepper noise
Effects on land-use data: smoothing
Bayesian Classifiers
Outlier Detection
Spatial outlier detection
Finding anomalous tuples
Global and spatial outliers
Detection approaches
Graph-based outlier detection: variogram, Moran scatter plot
Quantitative outlier detection: scatter plot and z-score
Location-awareness
Outlier in Traffic Data
Outlier Detection
Tests: quantitative, graphical
Quantitative tests:
Scatter plot
Spatial z-test
Quantitative Test Results
Tests: algebraic functions of join
Join predicate: neighbor relations
Our algorithm is I/O-efficient for algebraic tests
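A minimal sketch of the spatial z-test. It assumes the common form S(x) = f(x) − mean of f over the neighbors N(x), standardized over all nodes, with |Z| > θ flagging an outlier; θ = 2, the function name and the sensor readings are hypothetical:

```python
import statistics

def spatial_z_outliers(f, neighbors, theta=2.0):
    """Flag node x when |(S(x) - mu_S) / sigma_S| > theta.

    S(x) = f(x) - mean of f over the neighbors N(x); theta is an assumed threshold.
    """
    s = [f[x] - statistics.mean([f[y] for y in nbrs]) for x, nbrs in enumerate(neighbors)]
    mu, sigma = statistics.mean(s), statistics.pstdev(s)
    return [x for x, sx in enumerate(s) if abs((sx - mu) / sigma) > theta]

# Hypothetical traffic volumes at 8 sensors along a highway (a chain graph).
volume = [10, 11, 10, 12, 40, 11, 10, 12]
chain = [[1]] + [[i - 1, i + 1] for i in range(1, 7)] + [[6]]
print(spatial_z_outliers(volume, chain))  # [4]
```

Each S(x) needs only the neighbor join, which is why such tests fit the "algebraic functions of join" structure on this slide.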
Outlier Detection
Graphical tests
Moran Scatter Plot
Variogram Cloud
An Example of Spatial Outlier Detection (backup)
Consider the scatter plot: fit the line E(x) = m f(x) + b
Model building
Neighborhood aggregate function (k is the number of neighbors of x):
$$f_{aggr}^N: E(x) = \frac{1}{k} \sum_{y \in N(x)} f(y)$$
Distributive aggregate functions:
$$\sum f(x), \quad \sum E(x), \quad \sum f(x)E(x), \quad \sum f^2(x), \quad \sum E^2(x)$$
Algebraic aggregate functions (N is the number of nodes):
$$m = \frac{N \sum f(x)E(x) - \sum f(x) \sum E(x)}{N \sum f^2(x) - \left(\sum f(x)\right)^2}$$
$$b = \frac{\sum E(x) \sum f^2(x) - \sum f(x) \sum f(x)E(x)}{N \sum f^2(x) - \left(\sum f(x)\right)^2}$$
$$\sigma_\epsilon = \sqrt{\frac{S_{yy} - m^2 S_{xx}}{N - 2}}, \quad S_{xx} = \sum f^2(x) - \frac{\left(\sum f(x)\right)^2}{N}, \quad S_{yy} = \sum E^2(x) - \frac{\left(\sum E(x)\right)^2}{N}$$
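A minimal sketch of the model-building phase for the scatter plot: one pass accumulates the five distributive aggregates, and m, b and the standard error are then computed from them as algebraic aggregates. Neighbor lists stand in for the spatial self join; the function name and data are hypothetical:

```python
import math

def scatter_plot_model(f, neighbors):
    """Fit E(x) = m * f(x) + b using only distributive aggregates (one pass).

    f: attribute value per node; neighbors: list of neighbor indices N(x).
    Returns slope m, intercept b and the standard error of the fit.
    """
    n = len(f)
    s_f = s_e = s_fe = s_ff = s_ee = 0.0
    for x, nbrs in enumerate(neighbors):
        e = sum(f[y] for y in nbrs) / len(nbrs)  # neighborhood aggregate E(x)
        s_f += f[x]
        s_e += e
        s_fe += f[x] * e
        s_ff += f[x] ** 2
        s_ee += e ** 2
    # Algebraic aggregates computed from the distributive sums.
    denom = n * s_ff - s_f ** 2
    m = (n * s_fe - s_f * s_e) / denom
    b = (s_e * s_ff - s_f * s_fe) / denom
    s_xx = s_ff - s_f ** 2 / n
    s_yy = s_ee - s_e ** 2 / n
    sigma = math.sqrt(max(0.0, s_yy - m ** 2 * s_xx) / (n - 2))
    return m, b, sigma

f = [1, 2, 3, 4, 5]                        # hypothetical values on a path graph
path = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(scatter_plot_model(f, path))         # m ≈ 0.6, b ≈ 1.2
```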
![Page 60: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/60.jpg)
An Example of Spatial Outlier Detection (backup)
Testing: difference function
Statistic test function ST
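The difference and test functions are not legible in this copy of the slide. Assuming the scatter-plot test described by Shekhar, Lu, and Zhang (2003), the difference function is the regression residual ε(x) = E(x) − (m·f(x) + b) and the statistic is ST(x) = |(ε(x) − μ_ε)/σ_ε|, compared against a threshold θ. The sketch tests one node; μ_ε defaults to 0 (least-squares residuals with an intercept have zero mean), E(x) would come from a Get_All_Neighbors(x)-style lookup, and the parameter values in the example are illustrative.

```python
def scatter_plot_test(f_x, e_x, m, b, sigma_eps, theta=2.0, mu_eps=0.0):
    """Test one node x given its value f(x) and neighborhood average E(x).

    m, b, sigma_eps come from model building; theta is a hypothetical
    threshold. Returns (is_outlier, statistic).
    """
    eps = e_x - (m * f_x + b)  # difference function: regression residual
    st = abs((eps - mu_eps) / sigma_eps)  # statistic test function ST
    return st > theta, st


# Model parameters as a model-building pass might produce them (illustrative).
flag, st = scatter_plot_test(2.0, 5.0, m=0.9, b=-0.1, sigma_eps=0.5)
print(flag, round(st, 2))  # flagged: the statistic is about 6.6
```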
![Page 61: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/61.jpg)
Spatial Outlier Detection
Separate two phases:
Model building
Testing: test a node (or a set of nodes)
Computation structure of model building
Key insights: spatial self-join using the N(x) relationship; algebraic aggregate functions can be computed in one disk scan of the spatial join
Computation structure of testing
Single node: spatial range query, Get_All_Neighbors(x) operation
A given set of nodes: sequence of Get_All_Neighbors(x) operations
![Page 62: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/62.jpg)
Multiple Spatial Outlier Detection
Deficiency of previous algorithms:
An outlier may have a negative impact on its nearby points, e.g., S1 on E1
Outliers may be ignored, e.g., S2
Courtesy: C. T. Lu, Virginia Tech
Expected outliers: S1, S2, S3
Outliers detected by traditional approaches: E1, E2, S1
![Page 63: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/63.jpg)
Multiple Spatial Outlier Detection
Iterative algorithm: detects one outlier in each iteration; in each successive iteration, it substitutes the attribute value of the outlier detected in the previous iteration with the average of its neighbors
Median algorithm: uses the median, instead of the mean, to represent the average attribute value of neighbors
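Both ideas can be sketched on top of a neighborhood-difference z-score. This is an illustrative Python version under stated assumptions (z-score statistic, threshold `theta`, iteration cap `max_iter`), not the original implementation; passing `mean` or `median` as `agg` selects the neighborhood average.

```python
from statistics import mean, median, pstdev


def neighborhood_z(values, neighbors, agg):
    """z-score of f(x) - agg(f over N(x)) for every node."""
    diff = {x: values[x] - agg([values[y] for y in neighbors[x]]) for x in values}
    mu, sd = mean(diff.values()), pstdev(diff.values())
    return {x: (d - mu) / sd if sd else 0.0 for x, d in diff.items()}


def iterative_outliers(values, neighbors, theta=2.0, max_iter=10):
    """Detect one outlier per iteration, then repair its value."""
    vals, found = dict(values), []
    for _ in range(max_iter):
        z = neighborhood_z(vals, neighbors, mean)
        rest = [x for x in z if x not in found]
        if not rest:
            break
        top = max(rest, key=lambda x: abs(z[x]))
        if abs(z[top]) <= theta:
            break
        found.append(top)
        # Replace the outlier's value with its neighbors' average so it no
        # longer distorts neighborhood averages in later iterations.
        vals[top] = mean(vals[y] for y in neighbors[top])
    return found


def median_outliers(values, neighbors, theta=2.0):
    """Single pass using the median of the neighbors instead of the mean."""
    z = neighborhood_z(values, neighbors, median)
    return {x for x, s in z.items() if abs(s) > theta}


readings = {i: 10.0 for i in range(10)}
readings[2] = readings[7] = 50.0
line = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
print(iterative_outliers(readings, line, theta=1.5))  # [2, 7]
print(sorted(median_outliers(readings, line, theta=1.5)))  # [2, 7]
```

On a line graph each node has at most two neighbors, so the median and the mean coincide here; the median variant only behaves differently when nodes have three or more neighbors.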
![Page 64: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/64.jpg)
Research Needs in Spatial Outlier Detection
Multiple spatial outlier detection: eliminating the influence of neighboring outliers
Incremental
Multi-attribute spatial outlier detection: use multiple attributes as features
Design of spatial statistical tests
Scale up for large data
![Page 65: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/65.jpg)
Association Rules – An Analogy
Association rule, e.g., (Diaper in T => Beer in T)
Support: probability (Diaper and Beer in T) = 2/5
Confidence: probability (Beer in T | Diaper in T) = 2/2
Algorithm Apriori [Agrawal, Srikant, VLDB94]: support-based pruning using monotonicity
Note: transaction is a core concept!
Transaction Items Bought
1 {socks, , milk, , beef, egg, …}
2 {pillow, , toothbrush, ice-cream, muffin, …}
3 { , , pacifier, formula, blanket, …}
… …
n {battery, juice, beef, egg, chicken, …}
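Support, confidence, and Apriori's support-based pruning can be sketched in a few lines of Python. The baskets are hypothetical (the slide's table shows some items only as pictures), chosen so that {diaper, beer} has support 2/5 and confidence 2/2 as stated on the slide; the level-wise loop is a compact illustration of the Apriori idea rather than the VLDB '94 implementation.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions containing every item of itemset."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Pr[rhs in T | lhs in T] = support(lhs and rhs) / support(lhs)."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


def apriori(transactions, minsup):
    """Level-wise search for itemsets whose support is at least minsup."""
    items = {i for t in transactions for i in t}
    level = [frozenset([i]) for i in items if support(transactions, [i]) >= minsup]
    frequent, k = list(level), 2
    while level:
        prev = set(level)
        # Join: combine frequent (k-1)-itemsets into k-itemsets.
        cands = {a | b for a in level for b in level if len(a | b) == k}
        # Prune (monotonicity): every (k-1)-subset must itself be frequent.
        cands = {c for c in cands
                 if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = [c for c in cands if support(transactions, c) >= minsup]
        frequent.extend(level)
        k += 1
    return frequent


baskets = [  # hypothetical transactions
    {"socks", "diaper", "milk", "beer", "beef", "egg"},
    {"pillow", "toothbrush", "ice-cream", "muffin"},
    {"diaper", "beer", "pacifier", "formula", "blanket"},
    {"bread", "milk"},
    {"battery", "juice", "beef", "egg", "chicken"},
]
print(support(baskets, {"diaper", "beer"}))  # 0.4
print(confidence(baskets, {"diaper"}, {"beer"}))  # 1.0
```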
![Page 66: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/66.jpg)
Spatial Colocation
Spatial colocation: a set of features frequently co-located
Given:
A set T of K Boolean spatial feature types, T = {f1, f2, …, fK}
A set P of N locations, P = {p1, …, pN}, in a spatial framework S, where each pi ∈ P is of some spatial feature in T
A neighbor relation R over locations in S
Find: Tc = subsets of T frequently co-located
Objective: correctness, completeness, efficiency
Constraints: R is symmetric and reflexive; monotonic prevalence measure
Reference feature centric
Window centric
Event centric
![Page 67: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/67.jpg)
Spatial Colocation
Participation index
Participation ratio pr(fi, c) of feature fi in colocation c = {f1, f2, …, fk}: fraction of instances of fi with features {f1, …, fi-1, fi+1, …, fk} nearby
Participation index = min{pr(fi, c)}
Algorithm: Hybrid Colocation Miner
Comparison with association rules:

| | Association rules | Colocation rules |
|---|---|---|
| Underlying space | Discrete sets | Continuous space |
| Item-types | Item-types | Events / Boolean spatial features |
| Collections | Transactions | Neighborhoods |
| Prevalence measure | Support | Participation index |
| Conditional probability measure | Pr[A in T \| B in T] | Pr[A in N(L) \| B at L] |
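A small sketch of the participation ratio and index, assuming the co-location instances (sets of pairwise-neighboring instances) are already known. The {A, C} instances and the instance counts (4 A's, 5 B's, 3 C's) are taken from the example dataset on the following slides; the function name is illustrative.

```python
from fractions import Fraction


def participation_index(pattern, rows, counts):
    """pattern: feature types, e.g. ("A", "C")
    rows:    co-location instances, one instance id per feature in pattern
    counts:  total number of instances of each feature
    Returns (participation index, {feature: participation ratio}).
    """
    ratios = {
        f: Fraction(len({row[i] for row in rows}), counts[f])
        for i, f in enumerate(pattern)
    }
    return min(ratios.values()), ratios


rows_ac = [("A.1", "C.1"), ("A.2", "C.2"), ("A.3", "C.1"), ("A.4", "C.1")]
pi, pr = participation_index(("A", "C"), rows_ac, {"A": 4, "B": 5, "C": 3})
print(pi)  # 2/3
```

Counting distinct instance ids per feature, rather than rows, is what makes C.1 count once even though it appears in three instances.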
![Page 68: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/68.jpg)
Spatial Colocation: Approaches
Spatial features A, B, C and their instances
Support(A,B) = 2, Support(B,C) = 2
Support(A,B) = 1, Support(B,C) = 2
Support(A,B) = min(2/2, 3/3) = 1, Support(B,C) = min(2/2, 2/2) = 1
Partition approach
Our approach
Dataset
Reference feature approach
C as reference feature; transactions: (B1) (B2); Support(A,B) = ∅
![Page 69: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/69.jpg)
Spatial Colocation: Partial-Join Approach
Related work and limitations: the join-based approach is computationally expensive; the transaction-based association mining method is fast, but there is no explicit transaction concept in a spatial dataset
Partial-join approach: partition spatial objects; keep cut neighbor relationships
Partial-join co-location mining algorithm: a transaction-based Apriori method plus an instance join operation (to keep track of cut co-location instances)
Computation: partial join < join-based

| Pattern | Intra instances | Inter instances | Spatial prevalence measure |
|---|---|---|---|
| {A, B} | A.1 B.1; A.2 B.4 | A.3 B.3 | 3/5 |
| {A, C} | A.2 C.2; A.3 C.1; A.4 C.1 | A.1 C.1 | 2/3 |
| {B, C} | B.4 C.2; B.3 C.3 | B.3 C.1 | 2/5 |
| {A, B, C} | A.2 B.4 C.2 | A.3 B.3 C.1 | 2/5 |

[Figure: example map of point features illustrating co-location patterns]
Co-location patterns: {Auto dealer, Auto repair shop}, {Department store, Gift store}
Transactions:

| No. | Items |
|---|---|
| 1 | B.2, B.5 |
| 2 | A.1, B.1 |
| 3 | A.3, A.4, C.1 |
| 4 | B.3, C.3 |
| 5 | A.2, B.4, C.2 |

Cut neighbor relations: (A.1, C.1), (A.3, B.3), (B.3, C.1)
[Figure: example dataset partitioned into transactions, showing instances A.1-A.4, B.1-B.5, C.1-C.3]
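The partitioning step of the partial-join approach can be illustrated on this slide's example: given the transactions, each neighbor relation either falls inside one transaction (an intra instance that transaction-based counting sees) or crosses partitions (a cut relation that needs the instance join). The neighbor pairs are the ones appearing in this slide's instance lists; the data layout and the function name are assumptions of this sketch.

```python
def split_neighbor_relations(transactions, neighbor_pairs):
    """Separate neighbor pairs into intra-transaction pairs and cut pairs."""
    # Map each instance to the transaction (partition) that contains it.
    where = {inst: tid for tid, items in transactions.items() for inst in items}
    intra, cut = [], []
    for a, b in neighbor_pairs:
        (intra if where[a] == where[b] else cut).append((a, b))
    return intra, cut


transactions = {
    1: ["B.2", "B.5"],
    2: ["A.1", "B.1"],
    3: ["A.3", "A.4", "C.1"],
    4: ["B.3", "C.3"],
    5: ["A.2", "B.4", "C.2"],
}
pairs = [("A.1", "B.1"), ("A.1", "C.1"), ("A.2", "B.4"), ("A.2", "C.2"),
         ("A.3", "B.3"), ("A.3", "C.1"), ("A.4", "C.1"), ("B.3", "C.1"),
         ("B.3", "C.3"), ("B.4", "C.2")]
intra, cut = split_neighbor_relations(transactions, pairs)
print(cut)  # [('A.1', 'C.1'), ('A.3', 'B.3'), ('B.3', 'C.1')]
```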
![Page 70: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/70.jpg)
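Spatial Colocation: Join-less Approach

| Pattern | Co-location instances | Spatial prevalence measure |
|---|---|---|
| {A, B} | A.1 B.1; A.2 B.4; A.3 B.3 | 3/5 |
| {A, C} | A.1 C.1; A.2 C.2; A.3 C.1; A.4 C.1 | 2/3 |
| {B, C} | B.3 C.1; B.3 C.3; B.4 C.2 | 2/5 |
| {A, B, C} | Star instances: A.1 B.1 C.1; A.2 B.4 C.2; A.3 B.3 C.1. Clique instances: A.2 B.4 C.2; A.3 B.3 C.1 | 2/5 |

Star neighborhoods:

| Center | Neighbors |
|---|---|
| A.1 | B.1, C.1 |
| A.2 | B.4, C.2 |
| A.3 | B.3, C.1 |
| A.4 | C.1 |
| B.1 | |
| B.2 | |
| B.3 | C.1, C.3 |
| B.4 | C.2 |
| B.5 | |

[Figure: example dataset showing instances A.1-A.4, B.1-B.5, C.1-C.3 and their neighbor relations]
Star instances → clique check → clique instances → spatial prevalence measure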
Join-less approach key idea: partition spatial neighbor relationships
Instance filtering: no join; instance lookup scheme
Co-location pattern filtering: event-level, coarse-level, and refinement-level filtering
Join-less co-location mining algorithm: partition into disjoint star neighborhoods (edge partition); star instances → clique check → co-location instances; complete and correct
Computation: join-less < partial join
Related work and limitations: join-based is too expensive; partial-join becomes expensive as the number of cut relationships increases
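A compact sketch of the join-less idea on the same example: each neighbor edge is stored once, in the star neighborhood of the instance whose feature sorts first; star instances are enumerated from each center's star; and a clique check keeps only those whose non-center members are also neighbors. As a simplification, the clique check below looks pairs up in a set of neighbor edges instead of the star-neighborhood lookup scheme, and the `feature` naming convention ("A.1" belongs to feature "A") is an assumption.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def feature(inst):
    return inst.split(".")[0]  # assumed naming scheme: "A.1" -> feature "A"


def star_neighborhoods(pairs):
    """Edge partition: store each edge at the instance with the smaller feature."""
    stars = defaultdict(set)
    for a, b in pairs:
        center, other = sorted((a, b), key=feature)
        stars[center].add(other)
    return stars


def colocation_instances(pattern, stars, edges):
    """Star instances of pattern that pass the clique check."""
    center_f, *others = pattern
    rows = []
    for center, nbrs in stars.items():
        if feature(center) != center_f:
            continue
        # Star instance: one neighbor of each remaining feature in the pattern.
        choices = [sorted(n for n in nbrs if feature(n) == f) for f in others]
        for combo in product(*choices):
            # Clique check: the non-center members must be pairwise neighbors.
            if all(frozenset((u, v)) in edges
                   for i, u in enumerate(combo) for v in combo[i + 1:]):
                rows.append((center, *combo))
    return rows


def participation_index(pattern, rows, counts):
    return min(Fraction(len({r[i] for r in rows}), counts[f])
               for i, f in enumerate(pattern))


pairs = [("A.1", "B.1"), ("A.1", "C.1"), ("A.2", "B.4"), ("A.2", "C.2"),
         ("A.3", "B.3"), ("A.3", "C.1"), ("A.4", "C.1"), ("B.3", "C.1"),
         ("B.3", "C.3"), ("B.4", "C.2")]
edges = {frozenset(p) for p in pairs}
stars = star_neighborhoods(pairs)
counts = {"A": 4, "B": 5, "C": 3}
abc = colocation_instances(("A", "B", "C"), stars, edges)
print(abc)  # [('A.2', 'B.4', 'C.2'), ('A.3', 'B.3', 'C.1')]
print(participation_index(("A", "B", "C"), abc, counts))  # 2/5
```

The star instance A.1 B.1 C.1 is dropped by the clique check because B.1 and C.1 are not neighbors.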
![Page 71: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/71.jpg)
Spatial Colocation Approaches
Approaches:
Spatial join-based approaches
Join based on map overlay, e.g., [Estivill-Castro and Lee, 2001]
Join using K-function, e.g., [Shekhar and Huang, 2001]
Transaction-based approaches, e.g., [Koperski and Han, 1995] and [Morimoto, 2001]
Challenges:
Neighborhood definition
“Right” transactionization
Statistical interpretation
Computational complexity
Large number of joins
Join predicate is a conjunction of: neighbor, distinct item types
![Page 72: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/72.jpg)
Overview
Input
Statistical Foundation
Output
Computational Process
Trends
![Page 73: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/73.jpg)
Computational Process
Most algorithmic strategies are applicable
Algorithmic strategies in spatial data mining:

| Classical algorithms | Algorithmic strategies in SDM | Comments |
|---|---|---|
| Divide-and-conquer | Space partitioning | Possible loss of information |
| Filter-and-refine | Minimum bounding rectangle (MBR), predicate approximation | |
| Ordering | Plane sweeping, space-filling curve | |
| Hierarchical structures | Spatial index, tree matching | |
| Parameter estimation | Parameter estimation with spatial autocorrelation | |

Algorithmic Strategies in Spatial Data Mining
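To make the "ordering" and "filter-and-refine" rows concrete, here is a minimal plane-sweep sketch for finding all point pairs within distance d, the kind of neighbor relation that spatial outlier detection and co-location mining join on. Sorting by x lets a sliding window filter candidates cheaply, and the exact Euclidean distance refines them; representing points as (id, x, y) tuples is an assumption of this sketch.

```python
import math


def plane_sweep_pairs(points, d):
    """Return id pairs of points within Euclidean distance d.

    points: list of (id, x, y) tuples; ids must be mutually comparable.
    """
    pts = sorted(points, key=lambda p: p[1])  # ordering: sweep along x
    pairs = []
    start = 0
    for i, (pid, x, y) in enumerate(pts):
        # Filter: advance the window past points more than d to the left.
        while pts[start][1] < x - d:
            start += 1
        for qid, qx, qy in pts[start:i]:
            # Refine: exact distance test on the remaining candidates.
            if math.hypot(x - qx, y - qy) <= d:
                pairs.append(tuple(sorted((pid, qid))))
    return pairs


pts = [("a", 0.0, 0.0), ("b", 1.0, 0.0), ("c", 5.0, 0.0), ("d", 5.5, 0.5)]
print(plane_sweep_pairs(pts, 1.0))  # [('a', 'b'), ('c', 'd')]
```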
![Page 74: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/74.jpg)
Computational Process
Challenges
Does the spatial domain provide computational efficiency?
Low dimensionality: 2-3
Spatial autocorrelation
Spatial indexing methods
Generalize to solve spatial problems
Linear regression vs. SAR
The contiguity matrix W is assumed known for SAR; however, estimation of an anisotropic W is non-trivial
Spatial outlier detection: a spatial join
Co-location: a bunch of joins
![Page 75: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/75.jpg)
Example of Computational Process
Teleconnection
Find (land location, ocean location) pairs with correlated climate changes, e.g., El Nino affects climate at many land locations
Global influence of El Nino during the Northern Hemisphere winter (D: dry, W: warm, R: rainfall)
Average monthly temperature
(Courtesy: NASA, Prof. V. Kumar)
![Page 76: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/76.jpg)
Example: Teleconnection (Cont’d)
Challenge
High-dimensional feature space (e.g., 600 = 50 years × 12 months)
67k land locations and 100k ocean locations (degree-by-degree grid)
50 years of monthly data
Computational efficiency
Spatial autocorrelation: reduce computational complexity
Spatial indexing to organize locations
Top-down tree traversal is a strong filter
Spatial join query: filter-and-refine
Saves 40% to 98% of computational cost at θ = 0.3 to 0.9
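For orientation, the brute-force computation that the spatial-indexing approach is designed to avoid can be written with NumPy: standardize every monthly series, take all land-ocean dot products, and keep pairs with |r| ≥ θ. This is a baseline sketch under assumed array shapes (locations × months), not the filter-and-refine algorithm itself. At 67k × 100k locations the full correlation matrix would hold about 6.7 billion entries (roughly 54 GB in double precision), which is why pruning matters.

```python
import numpy as np


def teleconnections(land, ocean, theta):
    """land: (L, T) array, ocean: (O, T) array of monthly series.

    Returns (land_index, ocean_index) pairs whose Pearson correlation
    satisfies |r| >= theta. Brute force: O(L * O * T) work.
    """
    def standardize(a):
        a = a - a.mean(axis=1, keepdims=True)
        return a / a.std(axis=1, keepdims=True)

    t = land.shape[1]
    # Mean of products of z-scores equals the Pearson correlation.
    r = standardize(land) @ standardize(ocean).T / t
    li, oi = np.nonzero(np.abs(r) >= theta)
    return list(zip(li.tolist(), oi.tolist()))


rng = np.random.default_rng(0)
ocean = rng.standard_normal((3, 600))  # 600 = 50 years x 12 months
land = np.vstack([ocean[1] + 0.1 * rng.standard_normal(600),
                  rng.standard_normal(600)])
print(teleconnections(land, ocean, theta=0.9))  # [(0, 1)]
```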
![Page 77: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/77.jpg)
Parameter Estimation of SAR
Spatial auto-regression model
Estimate ρ and β for $y = \rho W y + X\beta + \epsilon$
The estimation uses maximum-likelihood (ML) theory
Log-likelihood function: LLF = log-det + SSE + const
$\text{log-det} = \ln|I - \rho W|$
$\text{SSE} = -\frac{n}{2}\ln\left\{ y^{T}(I-\rho W)^{T}M^{T}M(I-\rho W)\,y \right\}$, where $M = I - X(X^{T}X)^{-1}X^{T}$
![Page 78: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/78.jpg)
Parameter Estimation of SAR
Computational insight:
LLF is unimodal [Kazar et al., 2005]: a breakthrough result
The optimal ρ can be found by golden section search or binary search
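A minimal NumPy sketch of the exact approach under stated assumptions: W has real eigenvalues (e.g., a row-standardized symmetric adjacency), and these are computed once so that ln|I − ρW| = Σ ln(1 − ρλ_i) is cheap for every trial ρ; β is concentrated out by least squares; and golden section search maximizes the concentrated log-likelihood over an assumed interval (−0.99, 0.99). The search relies on the unimodality result; function names and the synthetic ring data are illustrative.

```python
import numpy as np


def sar_log_likelihood(rho, y, X, W, eigs):
    """Concentrated log-likelihood of y = rho*W*y + X*beta + eps, up to a constant."""
    n = len(y)
    # log-det term from precomputed eigenvalues (positive for |rho| < 1, |lambda| <= 1).
    log_det = np.sum(np.log(1.0 - rho * eigs))
    # SSE term: residual of (I - rho W) y after projecting out X, i.e. M (I - rho W) y.
    u = y - rho * (W @ y)
    beta, *_ = np.linalg.lstsq(X, u, rcond=None)
    e = u - X @ beta
    return log_det - 0.5 * n * np.log(e @ e / n)


def golden_section_max(f, lo, hi, tol=1e-6):
    """Maximize a unimodal function f on [lo, hi]."""
    inv_phi = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return (a + b) / 2.0


# Synthetic check: ring of n nodes, row-standardized W, true rho = 0.5.
n = 400
rng = np.random.default_rng(1)
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = np.linalg.solve(np.eye(n) - 0.5 * W, X @ np.array([1.0, 2.0]) + rng.standard_normal(n))
eigs = np.linalg.eigvals(W).real  # computed once, reused for every rho
rho_hat = golden_section_max(lambda r: sar_log_likelihood(r, y, X, W, eigs), -0.99, 0.99)
print(round(rho_hat, 2))
```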
![Page 79: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/79.jpg)
Reducing Computational Cost
Exact solution
Bottleneck: evaluation of the log-det term
Reduce cost by getting a seed for ρ that minimizes the SSE term [Kazar et al., 2005]
Approximate solution
Reduce cost by approximating the log-determinant term, e.g., Chebyshev polynomials, Taylor series [LeSage and Pace, 2001]
Comparison of accuracy, e.g., Chebyshev polynomials >> Taylor series [Kazar et al., 2004]
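The Taylor-series idea can be illustrated with the identity ln|I − ρW| = −Σ_{k≥1} ρ^k tr(W^k)/k, which holds when the spectral radius of ρW is below 1. The sketch truncates the series at a chosen order and compares it with NumPy's exact `slogdet`; it uses dense matrix powers for clarity only, the ring-shaped W is an assumption for the example, and the Chebyshev variant is not shown.

```python
import numpy as np


def logdet_exact(W, rho):
    """ln|I - rho W| via NumPy's log-determinant."""
    sign, logabs = np.linalg.slogdet(np.eye(W.shape[0]) - rho * W)
    return logabs  # the determinant is positive here since |rho| < 1 and W is row-stochastic


def logdet_taylor(W, rho, order):
    """Truncated series -sum_{k=1..order} rho^k tr(W^k) / k."""
    total, Wk = 0.0, np.eye(W.shape[0])
    for k in range(1, order + 1):
        Wk = Wk @ W
        total -= rho ** k * np.trace(Wk) / k
    return total


n = 50
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
for order in (2, 5, 30):
    print(order, abs(logdet_exact(W, 0.9) - logdet_taylor(W, 0.9, order)))
```

The error shrinks as the order grows but stays noticeable at ρ = 0.9, where the series converges slowly.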
![Page 80: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/80.jpg)
Reducing Computational Cost
Parallel solution
Computational challenges: eigenvalue + least squares + ML
Computing all eigenvalues of a large matrix
Memory requirement
![Page 81: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/81.jpg)
Life Cycle of Data Mining
CRISP-DM (CRoss-Industry Standard Process for DM)
Application/Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
Is CRISP-DM adequate for Spatial Data Mining?
[1] CRISP-DM URL: http://www.crisp-dm.org
Phases of CRISP-DM
![Page 82: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/82.jpg)
Summary
What’s Special About Spatial Data Mining

| | Classical DM | Spatial DM |
|---|---|---|
| Input data | All explicit, simple types | Often implicit relationships, complex types |
| Statistical foundation | Independence of samples | Spatial autocorrelation |
| Output | Interest measures: set-based | Location-awareness |
| Computational process | Combinatorial optimization, numerical algorithms | Computational efficiency opportunity: spatial autocorrelation, plane-sweeping; new complexity: SAR, co-location mining; estimation of anisotropic W is nontrivial |
| Objective function | Max likelihood, min sum of squared errors | Map_Similarity(Actual, Predicted) |
| Constraints | Discrete space, support threshold, confidence threshold | Keep NN together, honor geo-boundaries |
| Other issues | | Edge effect, scale |
![Page 83: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/83.jpg)
Overview
Input
Statistical Foundation
Output
Computational Process
Trends
Spatio-Temporal Data Mining
![Page 84: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/84.jpg)
Trends: Spatio-Temporal Data Mining
Spatio-Temporal Data
Spatio-Temporal Statistics
Spatio-Temporal Patterns
![Page 85: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/85.jpg)
Spatio-Temporal Data
Spatial time series data
Space is fixed
Measurement values change over a series of time steps
E.g., global climate patterns, army vehicle movement
[Figure: average monthly temperature]
[Figure: army vehicle movement; legend: Manpack stinger (1 object), M2_IFV (3 objects), Field_Marker (6 objects), T80_tank (2 objects), BRDM_AT5 (enemy) (1 object)]
![Page 86: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/86.jpg)
Spatio-Temporal Data
Moving objects data
Area of interest changes with the moving object
E.g., GPS track of a vehicle, personal gazetteers
[Figure: GPS tracks of a user; personal gazetteer]
(a personal gazetteer records places meaningful to a specific person)
![Page 87: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/87.jpg)
Spatio-Temporal Data: Modeling
Spatial
Spatio-Temporal: differentiation, aggregation
Topology: 9-Intersection Matrix, OGIS
d/dt(9-Intersection Matrix)
Open
Time series of 9-Intersection Matrix
Vector space
Location
OGIS: direction, distance, area, perimeter
Speed, velocity, d/dt(area)
Time series of points, lines, polygons (tracks)
Visualized as helixes (linear/angular motion)
Spatial properties of objects
Motion: translation, rotation, deformation
d/dt(position, orientation, shape)
Open, e.g., helix
Track = (t_i, x_i, y_i): moving object databases
Aspatial properties of objects
d/dt(mass)
Time series of velocities
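Treating a track as the sequence (t_i, x_i, y_i) from the slide, the d/dt(position) entries (velocity, speed) can be approximated by finite differences between consecutive samples, which also yields the time series of velocities. A minimal sketch, assuming samples sorted by strictly increasing time; attaching each estimate to the interval midpoint is a design choice of this sketch.

```python
import math


def track_kinematics(track):
    """track: list of (t, x, y) samples with strictly increasing t.

    Returns one record per consecutive pair of samples, with the velocity
    vector and speed estimated by finite differences.
    """
    records = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dt = t1 - t0
        vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
        records.append({"t_mid": (t0 + t1) / 2, "velocity": (vx, vy),
                        "speed": math.hypot(vx, vy)})
    return records


track = [(0, 0.0, 0.0), (1, 3.0, 4.0), (3, 3.0, 4.0)]
for rec in track_kinematics(track):
    print(rec)
```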
![Page 88: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/88.jpg)
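Spatio-Temporal Data: Modeling
Topology
Differentiation
Aggregation

| Time | 1 | 2 | 3 |
|---|---|---|---|
| Relation | disjoint | meet | overlap |
| 9-intersection model | $\begin{pmatrix}0&0&1\\0&0&1\\1&1&1\end{pmatrix}$ | $\begin{pmatrix}0&0&1\\0&1&1\\1&1&1\end{pmatrix}$ | $\begin{pmatrix}1&1&1\\1&1&1\\1&1&1\end{pmatrix}$ |

(Rows: interior, boundary, exterior of A; columns: interior, boundary, exterior of B.)
$$\begin{pmatrix}
\frac{d}{dt}(A^{\circ}\cap B^{\circ}) & \frac{d}{dt}(A^{\circ}\cap \partial B) & \frac{d}{dt}(A^{\circ}\cap B^{-}) \\
\frac{d}{dt}(\partial A\cap B^{\circ}) & \frac{d}{dt}(\partial A\cap \partial B) & \frac{d}{dt}(\partial A\cap B^{-}) \\
\frac{d}{dt}(A^{-}\cap B^{\circ}) & \frac{d}{dt}(A^{-}\cap \partial B) & \frac{d}{dt}(A^{-}\cap B^{-})
\end{pmatrix}$$
A, B: objects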
![Page 89: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/89.jpg)
Spatio-Temporal Data: Modeling
Open problems
Aggregation modeling: helix
Helix: representation of trajectory and boundary changes in an object over time
[Figure: helix representation of an object’s trajectory and change in shape over time]
Spine: represents the trajectory of the object
Prongs: represent the deformation of the object
Courtesy: University of Maine
![Page 90: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/90.jpg)
Trends: Spatio-Temporal Data Mining
Spatio-Temporal Data
Spatio-Temporal Statistics
Spatio-Temporal Patterns
![Page 91: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/91.jpg)
Spatio-Temporal Statistics
Emerging topic
“First” statistics book on spatio-temporal models, 1st edition, 2007
Chapter on Bayesian-based spatio-temporal modeling, 2004
32nd Spring Lecture Series, 2007
Principal Lecturer: Noel Cressie
![Page 92: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/92.jpg)
Trends: Spatio-Temporal Data Mining
Spatio-Temporal Data
Spatio-Temporal Statistics
Spatio-Temporal Patterns
![Page 93: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/93.jpg)
Spatio-Temporal Patterns
Association
Colocation
Sustained Emerging
Mixed-Drove
Moving Clusters
Hotspots
Outlier Detection
Prediction
![Page 94: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/94.jpg)
Spatio-Temporal Patterns: Association
Spatio-temporal associations in climate data
FPAR-Hi ==> NPP-Hi (sup=5.9%, conf=55.7%)
Grassland/Shrubland areas
The association rule is interesting because it appears mainly in regions with grassland/shrubland vegetation type
Courtesy: Tan et al., 2001
![Page 95: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/95.jpg)
Spatio-Temporal Patterns: Mixed Drove
Ecology
Animal movements (migration, predator-prey, encounter)
Species relocation and extinction (wolf – deer)
Games
Game tactics of opponent team (soccer, American football, …)
Co-occurring role patterns
![Page 96: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/96.jpg)
Spatio-Temporal Patterns: Sustained Emerging
Sustained emerging
[Figure: time slots t = 0, t = 1, t = 2]
Which pairs are sustained emerging patterns?
![Page 97: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/97.jpg)
Spatio-Temporal Patterns: Sustained Emerging
Sustained emerging
Public health (infectious emerging diseases, e.g., dengue fever)
Homeland defense (looking for growing “events”, bio-defense)
(Singapore)
Courtesy: Wikipedia
Legend: • newly emerging diseases; o re-emerging diseases
Instances of sustained emerging patterns
![Page 98: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/98.jpg)
Spatio-Temporal Patterns: Moving Clusters
Moving clusters: North Atlantic Oscillation
Source: Portis et al., Seasonality of the NAO, AGU Chapman Conference, 2000.
![Page 99: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/99.jpg)
Spatio-Temporal Patterns: Mixed Drove
Flock pattern mining
Flock pattern [Gudmundsson05]: each time step treated separately

| Time | Patterns |
|---|---|
| 1-10 | A B |
| 3-9 | A C |
| 3-9 | B C |
| 3-9 | A B C |

• Significant flock patterns (interest measure threshold 0.5)

| Patterns | Interest measure |
|---|---|
| (A B) | 1 |
| (A C) | 0.7 |
| (B C) | 0.7 |
| (A B C) | 0.7 |
| Others | Below threshold |

| Time | Patterns |
|---|---|
| 7 | A D |
| 7 | B D |
| 7 | C D |
| 7 | A B C D |
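The interest measures in these tables are consistent with counting, for each pattern, the fraction of the 10 time steps in which its objects move together (10/10 for A B, 7/10 for the patterns seen in time steps 3-9, 1/10 for those seen only at time 7). Under that assumption, and taking each time step's flocks as given input (detecting the flocks themselves is not shown), the significance filter can be sketched as follows; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def significant_flock_patterns(groups_by_time, total_steps, threshold=Fraction(1, 2)):
    """groups_by_time: dict time step -> list of object sets moving together.

    Each time step is treated separately; a pattern's interest measure is
    (by assumption) the fraction of time steps in which it occurs.
    """
    counts = {}
    for groups in groups_by_time.values():
        seen = set()
        for g in groups:
            for k in range(2, len(g) + 1):
                seen.update(combinations(sorted(g), k))
        for pattern in seen:
            counts[pattern] = counts.get(pattern, 0) + 1
    interest = {p: Fraction(c, total_steps) for p, c in counts.items()}
    return {p: v for p, v in interest.items() if v >= threshold}


groups = {t: [{"A", "B"}] for t in range(1, 11)}
for t in range(3, 10):
    groups[t] = [{"A", "B", "C"}]
groups[7] = [{"A", "B", "C", "D"}]
for pattern, measure in sorted(significant_flock_patterns(groups, 10).items()):
    print(pattern, measure)
```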
![Page 100: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/100.jpg)
Spatio-Temporal Patterns: Outliers
Spatio-Temporal Outliers
Example Application: Sensor Networks - Traffic Data in Twin Cities
Abnormal Sensor Detections
Example: Sensor 9 (spatial) at time 0-60 (temporal)
![Page 101: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/101.jpg)
Spatio-Temporal Patterns: Prediction
Predestination, John Krumm and Eric Horvitz, Microsoft Research
Predict a driver’s probabilistic destinations from the driver’s destination history and behavior
[Figure: destination cells for a driver; probabilistic destinations, darker outlines are cells with higher probability]
Courtesy: Microsoft Research
![Page 102: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/102.jpg)
Summary
What’s Special About Spatio-Temporal Data Mining?

| | | Spatial DM | Spatio-Temporal DM |
|---|---|---|---|
| Input data | | Often implicit relationships, complex types | Another dimension: time; implicit relationships changing over time |
| Statistical foundation | | Spatial autocorrelation | Spatial autocorrelation and temporal correlation |
| Output | Association | Colocation | Spatio-temporal association, mixed-drove pattern, sustained emerging pattern |
| | Clusters | Hot-spots | Flock pattern, moving clusters |
| | Outlier | Spatial outlier | Spatio-temporal outlier |
| | Prediction | Location prediction | Future location prediction |
![Page 103: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/103.jpg)
Book
http://www.spatial.cs.umn.edu
![Page 104: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/104.jpg)
References
N. Cressie, Statistics for Spatial Data, John Wiley and Sons, 1991
M. DeGroot and M. Schervish, Probability and Statistics (Third Ed.), Addison Wesley, 2002
A. Fotheringham, C. Brunsdon, and M. Charlton, Geographically Weighted Regression: The Analysis of Spatially Varying Relationships, John Wiley, 2002
M. Goodchild, Spatial Analysis and GIS, 2001 ESRI User Conference Pre-Conference Seminar
R. Haining, Spatial Data Analysis: Theory and Practice, Cambridge University Press, 2003
T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer-Verlag, 2001
D. Huff, A Probabilistic Analysis of Shopping Center Trade Areas, Land Economics, 1963
B. M. Kazar, S. Shekhar, D. J. Lilja, R. R. Vatsavai, and R. K. Pace, Comparing Exact and Approximate Spatial Auto-Regression Model Solutions for Spatial Data Analysis, GIScience, 2004
![Page 105: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/105.jpg)
K. Koperski and J. Han, Discovery of Spatial Association Rules in Geographic Information Databases, SSTD, 1995
K. Koperski, J. Adhikary, and J. Han, Spatial Data Mining: Progress and Challenges, DMKD, 1996
J. LeSage and R. K. Pace, Spatial Dependence in Data Mining, in Data Mining for Scientific and Engineering Applications, R. L. Grossman, C. Kamath, P. Kegelmeyer, V. Kumar, and R. R. Namburu (eds.), Kluwer Academic Publishing, pp. 439-460, 2001
H. Miller and J. Han (eds.), Geographic Data Mining and Knowledge Discovery, Taylor and Francis, 2001
J. Roddick, K. Hornsby, and M. Spiliopoulou, Yet Another Bibliography of Temporal, Spatial and Spatio-temporal Data Mining Research, KDD Workshop, 2001
S. Shekhar, C. T. Lu, and P. Zhang, A Unified Approach to Detecting Spatial Outliers, GeoInformatica, 7(2), Kluwer Academic Publishers, 2003
![Page 106: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/106.jpg)
S. Shekhar and S. Chawla, Spatial Databases: A Tour, Prentice Hall, 2003
S. Shekhar, P. Schrater, R. Vatsavai, W. Wu, and S. Chawla, Spatial Contextual Classification and Prediction Models for Mining Geospatial Data, IEEE Transactions on Multimedia (special issue on Multimedia Databases), 2002
S. Shekhar and Y. Huang, Discovering Spatial Co-location Patterns: A Summary of Results, SSTD, 2001
P. Tan, M. Steinbach, V. Kumar, C. Potter, S. Klooster, and A. Torregrosa, Finding Spatio-Temporal Patterns in Earth Science Data, KDD Workshop on Temporal Data Mining, 2001
W. Tobler, A Computer Movie Simulating Urban Growth in the Detroit Region, Economic Geography, 46:234-240, 1970
P. Zhang, Y. Huang, S. Shekhar, and V. Kumar, Exploiting Spatial Autocorrelation to Efficiently Process Correlation-Based Similarity Queries, SSTD, 2003
P. Zhang, M. Steinbach, V. Kumar, S. Shekhar, P. Tan, S. Klooster, and C. Potter, Discovery of Patterns of Earth Science Data Using Data Mining, to appear in Next Generation of Data Mining Applications, M. M. Kantardzic and J. Zurada (eds.), IEEE Press, 2005
![Page 107: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/107.jpg)
K. Eickhorst, A. Croitoru, P. Agouris, and A. Stefanidis, Spatiotemporal Helixes for Environmental Data Modeling, IEEE CompSAC, Hong Kong, Vol. 2, pp. 138-141, 2004
H. Cao, N. Mamoulis, and D. W. Cheung, Discovery of Periodic Patterns in Spatiotemporal Sequences, IEEE Transactions on Knowledge and Data Engineering (TKDE), to appear
M. Hadjieleftheriou, G. Kollios, P. Bakalov, and V. Tsotras, Complex Spatio-Temporal Pattern Queries, Proc. of the 31st International Conference on Very Large Data Bases (VLDB), Trondheim, Norway, August 2005
N. Mamoulis, H. Cao, G. Kollios, M. Hadjieleftheriou, Y. Tao, and D. Cheung, Mining, Indexing, and Querying Historical Spatio-Temporal Data, Proc. of the 10th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Seattle, WA, August 2004
S. Chawla and F. Verhein, Mining Spatio-Temporal Association Rules, Sources, Sinks, Stationary Regions and Thoroughfares in Object Mobility Databases, Proc. of the 11th International Conference on Database Systems for Advanced Applications (DASFAA'06), 2006
B. Arunasalam, S. Chawla, and P. Sun, Striking Two Birds With One Stone: Simultaneous Mining of Positive and Negative Spatial Patterns, Proc. of the Fifth SIAM International Conference on Data Mining, Newport Beach, CA, 2005
![Page 108: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/108.jpg)
Google Earth video… focusing on the Metrodome
![Page 109: Spatial Data Mining: Accomplishments and Research Needs](https://reader036.vdocument.in/reader036/viewer/2022062409/56814588550346895db26d58/html5/thumbnails/109.jpg)