theories and applications of spatial-temporal data mining and knowledge discovery

1. Theories and Applications of Spatial-Temporal Data Mining and Knowledge Discovery

Yee Leung

Email: [email_address]

Department of Geography and Resource Management

The Chinese University of Hong Kong

9. Daily rainfall data of two stations in the Pearl River basin of China

11. The monthly sunspot time series

12. The Portuguese Stock Index PSI-20 evolution from 1993 to 2002 (adapted from J.A.O. Matos et al., Physica A 342 (2004) 665-676)

13. Outbreak of avian flu in different regions

15. What are the structures and processes hidden in spatial data?

What are the concepts hidden in the information system?

Do the concepts form a knowledge structure?

16. Typhoon Tracks (adapted from Wang and Chan)

17. Typhoon/Hurricane Tracking

Objective: Intensity, track (landfall, recurvature)

Object: The space-time track of unusually low sea-surface air pressure in the x-y-z plane

Data: Potential temperature, horizontal velocity, vertical velocity, relative humidity, horizontal wind, etc.

Data: Hundreds to thousands of gigabytes within a specific time interval

21. Data Mining in Hyperspectral Images

1. Objective: Classification, pattern recognition

2. Object: Spectral signatures of objects

3. Data: Spectral and non-spectral data

4. Data volume, e.g.

AVIRIS: 0.4 to 2.45 micrometers, 224 bands

HYDICE: 0.4 to 2.5 micrometers, 210 bands

Hyperion: 0.4 to 2.5 micrometers, 220 bands, 30 meter resolution

22. The Objective of Knowledge Discovery and Data Mining

Fayyad: The discovery of non-trivial, novel, potentially useful and interpretable knowledge/information from data

Data → Information → Knowledge → Decision

23. Characteristics of Spatial Data

1. Voluminous

2. Sparse

3. Diverse

4. Complex

5. Dynamic

6. Redundant

7. Imperfect (random, fuzzy, granular, incomplete, noisy)

8. Multi-scale

24. Main Tasks of Spatial Knowledge Discovery and Data Mining

1. Clustering

2. Classification

3. Association: spatial relations, temporal relations, spatial-temporal relations (in particular the local-global issue)

4. Processes

25. CLUSTERING

The Scale-Space Filtering Approach

The Regression-Classes Decomposition Approach

26. Scale Space Theory

Given a primary image f(x) at a distance σ from the human eyes, the observed blurred image f(x, σ) is determined mathematically by a partial differential equation.

The solution of that equation can be expressed explicitly as the convolution f(x, σ) = f(x) * g(x, σ), where * denotes the convolution operation and g(x, σ) is the Gaussian function.

27. If the training samples are treated as an imaginary image, the corresponding blurred image f(x, σ, D_l) at scale σ is specified by the same convolution.

28. Essentials of Clustering by Scale-Space Filtering

1. Visual system simulation

2. Cluster validity check

3. Clustering validity check

4. Relevant concepts: (a) lifetime of a cluster; (b) lifetime of a clustering; (c) compactness; (d) isolatedness
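A minimal sketch of the idea on slides 26 to 28, under stated assumptions: a 1-D sample set is treated as an "image", blurred with Gaussians of increasing scale sigma, and the local maxima of the blurred image are counted as cluster centres. The sample values, the evaluation grid, and the names `blurred_image` and `count_clusters` are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
import math

def blurred_image(x, samples, sigma):
    """Gaussian-blurred 'image' of a 1-D point set, evaluated at x."""
    norm = 1.0 / (len(samples) * sigma * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-(x - s) ** 2 / (2.0 * sigma ** 2)) for s in samples)

def count_clusters(samples, sigma, grid_size=400):
    """Count strict local maxima of the blurred image on a regular grid."""
    lo = min(samples) - 3.0 * sigma
    hi = max(samples) + 3.0 * sigma
    xs = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    f = [blurred_image(x, samples, sigma) for x in xs]
    return sum(1 for i in range(1, grid_size - 1) if f[i - 1] < f[i] > f[i + 1])

# Hypothetical data: two nearby groups and a third group further away.
samples = [1.0, 1.2, 0.9, 1.1, 3.0, 3.1, 2.9, 3.2, 9.0, 9.1, 8.9]

# Number of clusters seen at a small, a medium and a large scale.
cluster_counts = {sigma: count_clusters(samples, sigma) for sigma in (0.3, 2.0, 8.0)}
```

With these values the count drops from three to two to one as sigma grows. The range of scales over which a count stays unchanged is one simple reading of the "lifetime of a clustering" listed above.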

Ms-time plot of clustering results for earthquakes (Ms ≥ 6): a) 3 clusters in the 59th~95th scale range; b) 17 clusters at the 6th scale step

42. Temporal Segmentation of Strong Earthquakes (Ms ≥ 6.0) of 1290 A.D. - 2000 A.D.

Scale-space for earthquakes (Ms ≥ 6)

Indices of clustering along the time scale for earthquakes (Ms ≥ 6.0): a) number of clusters; b) lifetime, isolation and compactness of the clustering

44. Ms-time plot of clustering results for earthquakes (Ms ≥ 4.7): a) 2 clusters in the 74th~112th scale range; b) 18 clusters at the 10th scale step

45. Temporal Segmentation of Strong Earthquakes (Ms ≥ 4.7) of 1484 A.D. - 2000 A.D.

Indices of clustering along the time scale for earthquakes (Ms ≥ 4.7): a) number of clusters (the vertical axis shows only the part no larger than 150); b) lifetime, isolation and compactness of the clustering

46. Table 1. Seismic active periods and episodes obtained by the clustering algorithm and by the seismologists (the number in parentheses is the number of earthquakes in the cluster)

48. Advantages of Scale-Space Filtering

Free from solving a global optimization problem

Independent of initialization

Robust

Outlier detection

Generalization of scale-related algorithms

Consistent with visual system

49. Scale-Space Clustering: Scale-Space Filtering for Simulated Data

50. Scale-Space Clustering: Scale-Space Filtering for Remote-Sensing Data (figure labels: Clustering Tree, Quasi-Light)

51. Clustering by Regression-Classes Decomposition Method

52. Simple Gaussian Class

53. Linear Structure

54. Identification of line objects in remotely sensed data

55. Ellipsoidal Structure

57. Two ellipsoidal feature extraction

58. General Curvilinear Structure

59. Complex Shape Structure

60. ANALYSIS OF SPATIAL RELATIONSHIP

Global Description

Moran's I

Geary's c

Local Description

Local Moran's I

Local Geary's c

G Statistic

Geographically Weighted Regression

Mixture Distribution
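For the two global statistics listed above, a small sketch may help. It uses the textbook definitions of global Moran's I and Geary's c with a binary adjacency matrix; the four values on a line and the names `morans_i` and `gearys_c` are assumptions made for this example, not data or code from the slides.

```python
def morans_i(values, weights):
    """Global Moran's I for values and a spatial weight matrix (list of rows)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)

def gearys_c(values, weights):
    """Global Geary's c for the same inputs."""
    n = len(values)
    mean = sum(values) / n
    s0 = sum(sum(row) for row in weights)
    sq_diff = sum(weights[i][j] * (values[i] - values[j]) ** 2
                  for i in range(n) for j in range(n))
    return ((n - 1) / (2.0 * s0)) * sq_diff / sum((v - mean) ** 2 for v in values)

# Hypothetical example: four locations on a line, neighbours share an edge.
values = [1.0, 2.0, 3.0, 4.0]
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
global_i = morans_i(values, weights)  # positive: similar values are adjacent
global_c = gearys_c(values, weights)  # below 1: same conclusion
```

For this smoothly increasing series, Moran's I is positive and Geary's c is below 1, both pointing to positive spatial autocorrelation.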

61. Geographically Weighted Regression

Hypothesis testing

1. H0: No difference between OLR and GWR

2. H0: a_1k = a_2k = ... = a_nk

64. (Regression-Classes Decomposition Method)

65. CLASSIFICATION

The Neural Network Approach

The Classification and Regression Tree

The Statistical Classifiers

66. Information Extraction and Classification: Neural Networks for Classification (MLP-BP)

67. Some Typical Feedforward Neural Networks

Perceptron

In the late 1950s: layered feed-forward networks were named perceptrons.

Today: "perceptron" refers to single-layer, feed-forward networks.

See Fig. 8: in a multi-output perceptron, each output unit O is fed independently from the input units.
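A minimal sketch of a single-layer perceptron with several output units, each trained independently with the classical perceptron learning rule. The task (learning AND and OR of two binary inputs), the learning rate, and the function names are assumptions chosen for illustration; they do not come from the slides.

```python
def step(z):
    """Threshold activation of a perceptron unit."""
    return 1 if z >= 0 else 0

def train_perceptron(inputs, targets, n_outputs, epochs=100, lr=1.0):
    n_inputs = len(inputs[0])
    # One weight vector per output unit; the last entry is the bias weight.
    weights = [[0.0] * (n_inputs + 1) for _ in range(n_outputs)]
    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            xb = list(x) + [1.0]
            for k in range(n_outputs):
                y = step(sum(w * v for w, v in zip(weights[k], xb)))
                err = t[k] - y
                # Each output unit updates only its own weights.
                weights[k] = [w + lr * err * v for w, v in zip(weights[k], xb)]
    return weights

def predict(weights, x):
    xb = list(x) + [1.0]
    return [step(sum(w * v for w, v in zip(wk, xb))) for wk in weights]

# Hypothetical task: output unit 0 learns AND, output unit 1 learns OR.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [(0, 0), (0, 1), (0, 1), (1, 1)]
weights = train_perceptron(inputs, targets, n_outputs=2)
predictions = [predict(weights, x) for x in inputs]
```

Both target functions are linearly separable, so the learning rule settles on weights that reproduce the targets. A function such as XOR would not be learnable by this single-layer network.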

Figure 8. Perceptrons

68. Multilayer Feedforward Neural Network

Learning algorithms for multilayer networks are neither efficient nor guaranteed to converge to a global optimum.

Most popular learning method: back-propagation.

Back-propagation learning

The restaurant problem: use a 2-layer network with 10 input units (one per attribute) and 4 hidden units. See Fig. 13.

Some Typical Feedforward Neural Networks (cont'd)

Fig. 13. A 2-layer feedforward network for the restaurant problem.
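To make the back-propagation step concrete, here is a rough sketch of a network with the same shape as the one described above (10 inputs, 4 hidden units, 1 output). Everything else is an assumption: the training data are synthetic binary vectors with a made-up target rule, not the restaurant data set, and the sigmoid units, squared-error loss, learning rate and function names are illustrative choices.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Return hidden activations and the network output for one input."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * v for w, v in zip(W2, h)) + b2)
    return h, y

def train(data, n_in=10, n_hidden=4, epochs=200, lr=0.5, seed=0):
    rng = random.Random(seed)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, t in data:
            h, y = forward(x, W1, b1, W2, b2)
            total += 0.5 * (y - t) ** 2
            # Output delta for squared error with a sigmoid output unit.
            d_out = (y - t) * y * (1.0 - y)
            # Hidden deltas: the output delta propagated back through W2.
            d_hid = [d_out * W2[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            for j in range(n_hidden):
                W2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_hid[j]
                for i in range(n_in):
                    W1[j][i] -= lr * d_hid[j] * x[i]
            b2 -= lr * d_out
        losses.append(total / len(data))  # mean loss over this epoch
    return (W1, b1, W2, b2), losses

# Synthetic stand-in data: 10 binary attributes, hypothetical target rule.
data_rng = random.Random(1)
data = []
for _ in range(40):
    x = [float(data_rng.randint(0, 1)) for _ in range(10)]
    t = 1.0 if x[0] + x[1] + x[2] >= 2 else 0.0
    data.append((x, t))

params, losses = train(data)
```

The list `losses` records the mean squared error per epoch, so its trend shows whether gradient descent is making progress. As the slide notes, nothing here guarantees convergence to a global optimum.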

Competitive Pattern Recognition by Recurrent NN

78. Typhoon Tracks (adapted from Wang and Chan)

79. Trees by Classification and Regression Tree (CART)

MSW 6/12/18: Maximum sustained wind (MSW) of a tropical cyclone (TC) 6/12/18 hours before recurvature. 0: recurve; 1: straight

80. Rules by CART

1. If the MSW of a TC is smaller than or equal to 34 m/s and the MSW of that TC is smaller than 2 m/s 6 hours later, then the TC will recurve in 12 hours with 96% accuracy.

2. If the MSW of a TC is smaller than or equal to 34 m/s and the MSW of that TC is larger than 2 m/s 6 hours later, then the TC will move straight in 12 hours with 86.8% accuracy.

3. If the MSW of a TC is larger than 34 m/s, it will recurve in 18 hours with 94.1% accuracy.

81. DISCOVERY OF TEMPORAL PROCESSES

The Multifractal Approach

Conventional Time Series Analyses

Mining of Scaling Behavior by Multifractal Analysis

83. Multiplicative Cascade

An approach to the study of scaling behavior with multiple scales (granules).

Multiplicative Binomial Cascade
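A short sketch of a deterministic binomial multiplicative cascade may make the construction concrete. At each level every cell splits in two and passes a fraction p of its mass to the left half and 1 - p to the right half. The value p = 0.7, the number of levels, and the function names are assumptions for illustration only.

```python
def binomial_cascade(p, levels):
    """Masses of the 2**levels cells after repeatedly splitting unit mass."""
    masses = [1.0]
    for _ in range(levels):
        # Each cell hands fraction p to its left child, 1 - p to its right child.
        masses = [m * f for m in masses for f in (p, 1.0 - p)]
    return masses

def partition_sum(masses, q):
    """Sum of the q-th powers of the cell masses at one resolution."""
    return sum(m ** q for m in masses)

measure = binomial_cascade(p=0.7, levels=3)
z2 = partition_sum(measure, q=2)  # equals (0.7**2 + 0.3**2) ** 3
```

Mass is conserved at every level, while the partition sum changes by the same factor p**q + (1 - p)**q per level. That constant factor is the multi-scale scaling behavior the cascade is meant to model.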

84. Schematic representation of cascade (adapted from Puente and Lopez, 1995, Physics Letters A)

86. TEMPORAL ANALYSIS

Linear Time Invariant System

Self-Similarity

Multiscaling

Infinitely Divisible Cascade

Stationary Process

Non-stationary Process

Random Walk

Fractional Brownian Motion, fBm

Multifractal Analysis of Stochastic Trends

87. The Multifractal Approach

Establish a data model for stochastic time series

Discovery of relevant models in stochastic time series

88. MF-DFA

Detrended fluctuation analysis (DFA) is a method for detecting long-range correlation and fractal properties in both stationary and non-stationary time series. MF-DFA, which is based on DFA, can give a full description of the more complicated scaling behavior of time series.

89. MF-DFA

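Given a time series $x_k$, $k = 1, 2, \ldots, N$, with length $N$.

Step 1: Determine the profile $Y(i) = \sum_{k=1}^{i} (x_k - \langle x \rangle)$, $i = 1, 2, \ldots, N$, where $\langle x \rangle$ is the mean of the series.

Step 2: Divide $Y(i)$ into $N_s = \mathrm{int}(N/s)$ non-overlapping segments of equal length $s$. Since $N$ is often not a multiple of $s$, a short part at the end of the profile may remain. In order not to disregard this part of the series, the same procedure is repeated starting from the opposite end. Thereby, $2N_s$ segments are obtained altogether.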

90. MF-DFA

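Step 3: Calculate the local trend for each of the $2N_s$ segments by a least-squares fit of the series. Then determine the variance

$$F^2(\nu, s) = \frac{1}{s} \sum_{i=1}^{s} \left\{ Y[(\nu - 1)s + i] - y_\nu(i) \right\}^2$$

for each segment $\nu$, $\nu = 1, \ldots, N_s$, and

$$F^2(\nu, s) = \frac{1}{s} \sum_{i=1}^{s} \left\{ Y[N - (\nu - N_s)s + i] - y_\nu(i) \right\}^2$$

for $\nu = N_s + 1, \ldots, 2N_s$. Here, $y_\nu(i)$ is the fitting polynomial in segment $\nu$, whose order $m$ can be 1, 2, 3.

Step 4: Average over all segments to obtain the $q$th-order fluctuation function, defined as

$$F_q(s) = \left\{ \frac{1}{2N_s} \sum_{\nu=1}^{2N_s} \left[ F^2(\nu, s) \right]^{q/2} \right\}^{1/q},$$

where $s \ge m + 2$.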

91. MF-DFA

Step 5: Determine the scaling behavior of the fluctuation function by analyzing log-log plots of $F_q(s)$ versus $s$ for each value of $q$. If $F_q(s) \sim s^{h(q)}$ for large values of $s$, we obtain the exponent $h(q)$, which in general may depend on $q$.

$H = h(q=2)$ for stationary time series;

$H = h(q=2) - 1$ for non-stationary time series.
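The five steps can be sketched in plain Python as follows. This is a simplified illustration under several assumptions: detrending uses only a straight line (order m = 1), the moment order q must be non-zero, the test series is synthetic Gaussian white noise, and all function names are hypothetical.

```python
import math
import random

def detrended_variance(segment):
    """Mean squared residual after removing a least-squares line (m = 1)."""
    s = len(segment)
    t_mean = (s - 1) / 2.0
    y_mean = sum(segment) / s
    stt = sum((t - t_mean) ** 2 for t in range(s))
    sty = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(segment))
    slope = sty / stt
    return sum((y - y_mean - slope * (t - t_mean)) ** 2
               for t, y in enumerate(segment)) / s

def fluctuation_function(x, s, q):
    """F_q(s) for one segment length s and one non-zero moment order q."""
    n = len(x)
    mean = sum(x) / n
    # Step 1: profile, the cumulative sum of deviations from the mean.
    profile, acc = [], 0.0
    for value in x:
        acc += value - mean
        profile.append(acc)
    # Step 2: N_s segments counted from the start and N_s from the end.
    n_s = n // s
    segments = [profile[v * s:(v + 1) * s] for v in range(n_s)]
    segments += [profile[n - (v + 1) * s:n - v * s] for v in range(n_s)]
    # Step 3: detrended variance of every segment.
    f2 = [detrended_variance(seg) for seg in segments]
    # Step 4: q-th order average over all 2 * N_s segments.
    return (sum(v ** (q / 2.0) for v in f2) / len(f2)) ** (1.0 / q)

def generalized_hurst(x, scales, q):
    """Step 5: slope of log F_q(s) against log s, an estimate of h(q)."""
    log_s = [math.log(s) for s in scales]
    log_f = [math.log(fluctuation_function(x, s, q)) for s in scales]
    ms = sum(log_s) / len(log_s)
    mf = sum(log_f) / len(log_f)
    num = sum((a - ms) * (b - mf) for a, b in zip(log_s, log_f))
    den = sum((a - ms) ** 2 for a in log_s)
    return num / den

# Synthetic test series: uncorrelated Gaussian noise, for which h(2) is near 0.5.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
h2 = generalized_hurst(noise, scales=[16, 32, 64, 128, 256], q=2)
```

Calling `generalized_hurst` for several values of q gives an estimate of the whole h(q) curve. For the white-noise input used here, the estimate at q = 2 should lie close to 0.5.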

92. MF-DFA

For MF-DFA, if h(q) is constant for all q, the corresponding time series is monofractal. If h(q) varies with q, the series is multifractal.

(adapted from Peng et al., 1994)

102. Daily rainfall data of two stations in the Pearl River basin of China

103. Log-log plots of F_q(s) versus s for the daily rainfall time series of station 56691 in the Pearl River basin (left) and station Chuantang in the East River basin (right) with q = 2.

104. The h(q) curves of daily rainfall time series of stations in the Pearl River basin (left) and stations in the East River basin (right).

105. The curves of daily rainfall time series of stations in the Pearl River basin (left) and stations in the East River basin (right).

106. The curves of daily rainfall time series of stations in the Pearl River basin (left) and stations in the East River basin (right).

107. The curves of daily rainfall time series of 5 stations in the Pearl River basin.

108. The curves of daily rainfall time series of stations in the Pearl River basin (left) and stations in the East River basin (right).

109. The curves of daily rainfall time series of stations in the Pearl River basin (left) and stations in the East River basin (right). The solid lines are their cascade model fits.

110. The correlation between the altitude of the rainfall stations in the East River basin and the D(2) value of the rainfall time series.

111. Elevation of rainfall stations in the East River basin with the D(2) values of their rainfall data. Axis: elevation (m above MSL).

112. DISCOVERY OF KNOWLEDGE STRUCTURES

The Concept Lattice Approach

Discovery of Hierarchical Knowledge from Relational Spatial Data

115. Spatial Concept/Class and Data Encapsulation

116. Concept Hierarchy

117. Inheritance

118. Generalization and Specialization

119. Summary

(1) The concept lattice serves as a mathematical foundation for object-oriented spatial information systems.

(2) The concept lattice can be employed as a method to unravel hierarchical structure from a spatial information system.

(3) It provides a bridge between relational spatial information systems (vector-based, raster-based) and object-oriented spatial information systems.
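As a small illustration of point (2), the sketch below derives all formal concepts (pairs of an object set and the attribute set they share) from a toy relational table by brute-force closure, which is only practical for tiny tables. The four land-cover objects, their attributes, and the function names are hypothetical and not taken from the slides.

```python
from itertools import combinations

def common_attributes(objects, context, all_attributes):
    """Attributes shared by every object in the given collection."""
    attrs = set(all_attributes)
    for o in objects:
        attrs &= context[o]
    return frozenset(attrs)

def objects_having(attrs, context):
    """Objects that possess every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)

def formal_concepts(context):
    """All (extent, intent) pairs, found by closing every subset of objects."""
    all_attributes = set().union(*context.values())
    objs = list(context)
    concepts = set()
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            intent = common_attributes(subset, context, all_attributes)
            extent = objects_having(intent, context)
            concepts.add((extent, intent))
    return concepts

def is_subconcept(lower, upper):
    """Hierarchy order: a subconcept covers fewer objects."""
    return lower[0] <= upper[0]

# Hypothetical relational table: object -> set of attributes.
context = {
    "lake": {"water", "natural"},
    "reservoir": {"water", "man-made"},
    "forest": {"vegetation", "natural"},
    "park": {"vegetation", "man-made"},
}
concepts = formal_concepts(context)
water_bodies = (frozenset({"lake", "reservoir"}), frozenset({"water"}))
```

Here the concept "water bodies" sits above the single-object concepts for lake and reservoir, which mirrors the generalization and specialization relation of an object-oriented class hierarchy.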

120. Yee Leung. Knowledge Discovery in Spatial Data. Berlin: Springer-Verlag, 2010.

[email_address]

IGU-Commission on Modeling Geographical Systems

http://www.science.mcmaster.ca/~igu~cmgs/
