Discovering Persistent Change Windows in Big Spatiotemporal Datasets: A Summary of Results
Xun Zhou, Shashi Shekhar, Dev Oliver. 2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data (BigSpatial 2013), Nov. 5, 2013.
![Page 1: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/1.jpg)
Discovering Persistent Change Windows in Big Spatiotemporal Datasets: A summary of results
Xun Zhou, Shashi Shekhar, Dev Oliver
2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data (BigSpatial 2013)
Nov. 5, 2013
![Page 2: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/2.jpg)
Outline
• Motivation
• Problem Formulation
• Challenges
• Our Contribution
• Novelty
• Validation
![Page 3: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/3.jpg)
Motivation (1)
• Understanding climate and environmental changes
  – A global challenge: where, when, how, why?
  – Deforestation: forest is logged at a certain rate
  – Desertification: grassland turns into desert
  – Urban changes: city sprawl, irrigation (vegetation increase)
• Detecting changes: an essential step
  – Where and when
[Figure panels: Desertification, Deforestation, Urban sprawl]
![Page 4: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/4.jpg)
Motivation (2)
• Big Data for climate and earth science
  – Land cover data at various resolutions: MODIS, Landsat, etc.
  – Help domain scientists find potential regions of interest: desertification, deforestation, urban sprawl…
  – Google time lapse: Amazon deforestation [1]
• Our goal:
  – Find a spatial window and a time period where the data value (e.g., vegetation cover) changes at a certain high speed
[Figure: time-lapse frames for 1984, 1998, 2012]
[1] Google Earth Engine, https://earthengine.google.org/#intro/
![Page 5: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/5.jpg)
Problem Formulation: Basic Concepts
• Spatiotemporal windows
  – A spatial field $S$; each location $s_i$ has a time series of length $|T|$
  – Spatial window: a rectangular area in $S$
  – ST window: a pair <spatial window $S_j$, time interval $T_j$>
• Spatially aggregated time series
  – For a spatial window $S_j$: $TS_j = \{\sum_{s_i \in S_j} x(s_i, 1), \sum_{s_i \in S_j} x(s_i, 2), \ldots, \sum_{s_i \in S_j} x(s_i, |T|)\}$
  – $x(s_i, 1), x(s_i, 2), \ldots$ are the values at location $s_i$ at times $1, 2, \ldots, |T|$
  – SUM can be replaced by AVG, etc.
• Average change rate (ACR)
  – For an ST window with $T_j = [t_1, t_n]$: $ACR(S_j, T_j) = \frac{TS_j(t_1) - TS_j(t_n)}{TS_j(t_1)\,(t_n - t_1)}$
• Persistent Change Window (PCW): an ST window with ACR ≥ threshold
  – Example: “Total (average) vegetation cover in an area changes at an average rate of … in a few years”
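The definitions above can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' implementation: the names `aggregated_series` and `acr`, the `data[t][row][col]` layout, and the 0-based indices are hypothetical choices, and the sign convention follows the formula exactly as written on the slide (positive when the aggregated value decreases).

```python
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[float]]  # one snapshot, indexed grid[row][col]


def aggregated_series(data: Sequence[Grid],
                      window: Tuple[int, int, int, int]) -> List[float]:
    """SUM of x(s_i, t) over the spatial window, for every time step t.

    `window` is (row_lo, col_lo, row_hi, col_hi), 0-based and inclusive
    (an assumed encoding, not taken from the slides).
    """
    r0, c0, r1, c1 = window
    return [sum(grid[r][c]
                for r in range(r0, r1 + 1)
                for c in range(c0, c1 + 1))
            for grid in data]


def acr(ts: Sequence[float], t1: int, tn: int) -> float:
    """ACR as written on the slide: [TS(t1) - TS(tn)] / TS(t1) / (tn - t1)."""
    return (ts[t1] - ts[tn]) / ts[t1] / (tn - t1)


# Toy 2 x 2 field over three time steps; every cell loses one unit per step.
data = [[[10, 10], [10, 10]],
        [[9, 9], [9, 9]],
        [[8, 8], [8, 8]]]
ts = aggregated_series(data, (0, 0, 1, 1))  # sums per time step: 40, 36, 32
rate = acr(ts, 0, 2)                        # (40 - 32) / 40 / 2
```

A window is then a PCW candidate when `rate` is at least the chosen threshold; swapping `sum` for a mean gives the AVG variant mentioned on the slide.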
![Page 6: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/6.jpg)
Problem statement
• Given:
  – A spatial time series with $|S| = M \times N$ locations and $|T|$ time steps
  – A threshold $r$ of average change rate (ACR)
  – Minimum window size $S_{min}$ and minimum time length $T_{min}$
• Find:
  – All the ST persistent change windows $\langle S_i, T_i \rangle$ where $ACR(S_i, T_i) \ge r$
• Objective:
  – Reduce computational cost
• Constraints:
  – $|S_i| \ge S_{min}$ and $|T_i| \ge T_{min}$
  – $\langle S_i, T_i \rangle$ is not contained in any other persistent change window $\langle S', T' \rangle$, i.e., none with $S_i \subseteq S'$ and $T_i \subseteq T'$
  – Completeness & Correctness
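The containment constraint can be made concrete with a small filter. The sketch below is only illustrative: the six-integer window encoding and the names `contains` and `maximal_windows` are assumptions introduced here, and the quadratic scan is the simplest possible check rather than anything the slides prescribe.

```python
from typing import Iterable, List, Tuple

# Assumed encoding: (row_lo, col_lo, t_lo, row_hi, col_hi, t_hi), inclusive.
STWindow = Tuple[int, int, int, int, int, int]


def contains(outer: STWindow, inner: STWindow) -> bool:
    """True if `inner` lies inside `outer` in both space and time."""
    lows_ok = all(o <= i for o, i in zip(outer[:3], inner[:3]))
    highs_ok = all(o >= i for o, i in zip(outer[3:], inner[3:]))
    return lows_ok and highs_ok


def maximal_windows(candidates: Iterable[STWindow]) -> List[STWindow]:
    """Keep only candidates that no other candidate contains."""
    unique = list(dict.fromkeys(candidates))  # drop exact duplicates, keep order
    return [w for w in unique
            if not any(v != w and contains(v, w) for v in unique)]


found = [(0, 0, 0, 2, 2, 3),   # 3 x 3 window over time steps 0..3
         (0, 0, 1, 1, 1, 2),   # lies inside the first window
         (1, 1, 0, 3, 3, 3)]   # overlaps the first, but is not inside it
kept = maximal_windows(found)
```

Here `kept` holds the first and third windows; the second is dropped because the first one contains it.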
![Page 7: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/7.jpg)
Examples
• Threshold: 15%, Smin = 6, Tmin = 2
• Red box (3x3) for T = [1,4]: ACR = 16.5%
• Yellow box (2x4) for T = [3,4]: ACR = 14.5%
• Output: <Red-box, [1, 4]>
![Page 8: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/8.jpg)
Challenges
• Large number of candidates (big combinatorics)
  – $M^2 \times N^2 \times T^2$ candidate patterns (M x N locations, T time steps)
• Patterns lack monotonicity
  – A temporal pattern may contain non-interesting parts
  – Sub-regions in a window may be non-interesting
• Large dataset
  – 250m MODIS tile: 4800 by 4800 pixels and 250 snapshots
  – Hundreds of such tiles in the dataset
  – Terabyte data volume
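To get a feel for the candidate count, the snippet below counts axis-aligned boxes exactly. Each axis of length L admits L(L+1)/2 index intervals, so the total grows on the order of the M² x N² x T² quoted above. The function names are hypothetical, and the count includes windows spanning a single time step, which the slides may or may not treat as candidates.

```python
def count_st_windows(m: int, n: int, t: int) -> int:
    """Exact number of axis-aligned boxes in an m x n x t grid."""
    def intervals(length: int) -> int:
        # Number of index intervals [lo, hi] with lo <= hi on one axis.
        return length * (length + 1) // 2
    return intervals(m) * intervals(n) * intervals(t)


def count_by_enumeration(m: int, n: int, t: int) -> int:
    """Same count by brute force; only practical for tiny grids."""
    return sum(1
               for r0 in range(m) for r1 in range(r0, m)
               for c0 in range(n) for c1 in range(c0, n)
               for t0 in range(t) for t1 in range(t0, t))


# One 250m MODIS tile as described on the slide: 4800 x 4800 pixels, 250 snapshots.
tile_candidates = count_st_windows(4800, 4800, 250)
```

For a single tile this comes to roughly 4 x 10^18 boxes, which is why exhaustive testing is impractical.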
![Page 9: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/9.jpg)
Contributions
• Formulate the Persistent Change Window (PCW) discovery problem
• An ST window enumeration and pruning (SWEP) approach
• Theoretical analysis: correctness, completeness, and space/time complexity
• Case study on MODIS NDVI data
• Experiments: scalability w.r.t. data volume and input parameters
![Page 10: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/10.jpg)
Related work
• Spatiotemporal change pattern discovery
  – Persistent (arbitrarily long interval in long time series) zonal change (our work)
  – Other footprints
    • Time point (CUSUM [2]) or interval [7] in single time series
    • Local/zonal change across few snapshots (image differencing, object-based change detection [5,6])
    • Zonal change at time point (ST scan statistics [3, 4])
![Page 11: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/11.jpg)
Baseline solution: Naïve approach
• Two-step framework (N, M = sides of spatial field, T = # time steps)
  – Step 1:
    • Enumerate all pairs of {spatial window, time interval} and generate the aggregated time series for each window
    • Find interesting intervals for each spatial window and add them to the candidate set
    • $O(N^3 \times M^3 \times T^3)$
  – Step 2:
    • For each window-interval pair (S, T) in the candidate set, prune all the pairs that are dominated by it
    • $O(k^2)$, where $k$ is the total number of candidates from Step 1
    • $k = O(N^2 \times M^2 \times T^2)$ in the worst case
  – Time complexity:
    • $O((M \times N \times T)^4)$ in the worst case
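The two-step baseline can be sketched as follows. This is a brute-force illustration only, written under assumptions: the name `naive_pcw`, the `data[t][row][col]` layout, the 0-based six-integer window encoding, and the reading of |T| as the number of time steps in the interval are all choices made for this example. The ACR sign convention follows the formula on the earlier slide.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Assumed encoding: (r0, c0, t0, r1, c1, t1), 0-based and inclusive.
STWindow = Tuple[int, int, int, int, int, int]


def naive_pcw(data: Sequence[Sequence[Sequence[float]]], r: float,
              s_min: int, t_min: int) -> List[STWindow]:
    """Brute-force search for maximal windows with ACR >= r."""
    n_t, m, n = len(data), len(data[0]), len(data[0][0])
    candidates: List[STWindow] = []

    # Step 1: enumerate every spatial window, aggregate it, test every interval.
    for r0, c0 in product(range(m), range(n)):
        for r1, c1 in product(range(r0, m), range(c0, n)):
            if (r1 - r0 + 1) * (c1 - c0 + 1) < s_min:
                continue
            ts = [sum(data[t][i][j]
                      for i in range(r0, r1 + 1)
                      for j in range(c0, c1 + 1))
                  for t in range(n_t)]
            for t0 in range(n_t):
                for t1 in range(t0 + 1, n_t):
                    if t1 - t0 + 1 < t_min or ts[t0] == 0:
                        continue
                    if (ts[t0] - ts[t1]) / ts[t0] / (t1 - t0) >= r:
                        candidates.append((r0, c0, t0, r1, c1, t1))

    # Step 2: drop every candidate contained in another one (quadratic scan).
    def inside(a: STWindow, b: STWindow) -> bool:
        return (all(b[d] <= a[d] for d in range(3))
                and all(a[d] <= b[d] for d in range(3, 6)))

    return [w for w in candidates
            if not any(v != w and inside(w, v) for v in candidates)]


# Toy data: only cell (0, 0) declines (10 -> 5 -> 2.5); the rest stay at 10.
data = [[[10, 10], [10, 10]],
        [[5, 10], [10, 10]],
        [[2.5, 10], [10, 10]]]
result = naive_pcw(data, r=0.2, s_min=1, t_min=2)
```

On this toy input the declining cell qualifies over the full interval, while the 1 x 2 and 2 x 1 windows around it qualify only over the first two time steps, so three maximal windows survive the pruning step.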
![Page 12: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/12.jpg)
An ST Window Enumeration-n-Pruning approach (SWEP)
  – Step 0:
    • Scan all the windows with left-top corner (1,1) and build a lookup table for all spatial windows
    • O(M x N x T) time cost, O(M x N x T) memory cost
  – Step 1:
    • Two-level BFS enumeration of all ST windows
    • Outer loop: enumerate all the LBN (left-bottom-near) locations from (1,1,1)
      – Find the enumeration space for the current LBN using the record
      – Inner loop: enumerate all the “valid RTF” (right-top-far) locations for each LBN
      – Record all the PCWs found in this iteration
  – Step 2:
    • Refine step not needed: no dominated ST window is generated.
![Page 13: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/13.jpg)
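Step 0: Window sum lookup table
• SUM(Target area) = D - B - C + A
[Figure: field with corner (1,1), the target area, and the lookup entries A, B, C, D at its corners]
Lookup table at T = 4:

| x\y | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 | 3 | 9 | 15 | 24 |
| 2 | 7 | 19 | 31 | 50 |
| 3 | 13 | 33 | 53 | 82 |
| 4 | 23 | 53 | 82 | 121 |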
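The lookup table is a summed-area (2-D prefix sum) table, and the D - B - C + A rule can be sketched as below. The raw grid is not shown on the slide: the values used here were back-derived from the table on the assumption that each entry is the sum of all cells from (1,1) to that position. The function names are hypothetical.

```python
from typing import List, Sequence


def build_table(grid: Sequence[Sequence[int]]) -> List[List[int]]:
    """Summed-area table with an extra zero row and column (1-based lookups)."""
    m, n = len(grid), len(grid[0])
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = (grid[i - 1][j - 1] + table[i - 1][j]
                           + table[i][j - 1] - table[i - 1][j - 1])
    return table


def window_sum(table: Sequence[Sequence[int]],
               r0: int, c0: int, r1: int, c1: int) -> int:
    """Sum over rows r0..r1 and columns c0..c1 (1-based, inclusive): D - B - C + A."""
    d = table[r1][c1]           # entry at the window's far corner
    b = table[r0 - 1][c1]       # strip above the window
    c = table[r1][c0 - 1]       # strip left of the window
    a = table[r0 - 1][c0 - 1]   # overlap of the two strips, subtracted twice
    return d - b - c + a


# Raw snapshot back-derived from the slide's table (an assumption, see above).
grid = [[3, 6, 6, 9],
        [4, 6, 6, 10],
        [6, 8, 8, 10],
        [10, 10, 9, 10]]
table = build_table(grid)
inner = window_sum(table, 2, 2, 3, 3)  # 53 - 15 - 13 + 3
```

After the one-off table construction, any window sum costs four lookups regardless of window size, which is what makes the aggregated time series cheap to evaluate during enumeration.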
![Page 14: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/14.jpg)
Step 1: Two-level Enumeration (1)
• Enumerate 3-D ST windows in the dataset using two corner locations
  – BFS on the Left-bottom-near (LBN) and Right-top-far (RTF) locations
  – Avoid visiting dominated ST windows
[Figure panels: LBN and RTF representation of a window; enumeration of LBN locations; enumeration of RTF for each LBN]
• Challenge: record discovered PCWs for later pruning
  – For each LBN, record the discovered PCWs
  – For later LBNs, skip RTF locations inside these PCWs
[Figure: W1 = <LBN1, RTF1> is a PCW. For LBN2, we don’t need to test RTFs inside W1.]
![Page 15: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/15.jpg)
Step 1: Two-level enumeration (2)
• A six-dimensional enumeration space
[Figure: lattice of ST windows <LBN, RTF> for a 3 x 3 x 3 example. The root <(1,1,1), (3,3,3)> has six children: <(1,1,1), (3,3,2)>, <(1,1,1), (3,2,3)>, <(1,1,1), (2,3,3)>, <(1,1,2), (3,3,3)>, <(1,2,1), (3,3,3)>, and <(2,1,1), (3,3,3)>. Deeper levels repeat the same one-step moves.]
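The six-dimensional lattice on this slide can be explored with a plain breadth-first search. The sketch below shows only the lattice structure (each child retracts one RTF coordinate or advances one LBN coordinate by one step); it does not implement SWEP's two-level ordering, covered-space bookkeeping, or pruning. The names `children` and `bfs_windows` are hypothetical.

```python
from collections import deque
from typing import Iterator, Set, Tuple

Corner = Tuple[int, ...]        # (x, y, t), 1-based
Window = Tuple[Corner, Corner]  # (LBN, RTF), both corners inclusive


def children(lbn: Corner, rtf: Corner) -> Iterator[Window]:
    """Windows one step smaller than <lbn, rtf>."""
    for d in range(3):
        if rtf[d] - 1 >= lbn[d]:        # retract the RTF corner along axis d
            shrunk = list(rtf)
            shrunk[d] -= 1
            yield lbn, tuple(shrunk)
    for d in range(3):
        if lbn[d] + 1 <= rtf[d]:        # advance the LBN corner along axis d
            moved = list(lbn)
            moved[d] += 1
            yield tuple(moved), rtf


def bfs_windows(root: Window) -> Set[Window]:
    """All windows reachable from `root`, visiting each exactly once."""
    seen = {root}
    queue = deque([root])
    while queue:
        lbn, rtf = queue.popleft()
        for child in children(lbn, rtf):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


root = ((1, 1, 1), (3, 3, 3))
first_level = set(children(*root))
all_windows = bfs_windows(root)
```

For the 3 x 3 x 3 example, `first_level` matches the six children named on the slide, and the search reaches every sub-window of the root.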
![Page 16: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/16.jpg)
Evaluations
• Theoretical
  – Correct
  – Complete
  – Time & space complexity
• Case study
  – Land cover data: MODIS 250m NDVI data
• Experimental evaluation
  – Change data volume (with fixed time length)
  – Change data volume (with fixed number of locations)
  – Change the location of pattern in the search space
![Page 17: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/17.jpg)
Theoretical analysis
• The SWEP algorithm is correct
• The SWEP algorithm is complete
• Space/time complexity (k = M x N x T)
[Figure: complexity scales from O(k) to O(k^4) marking where SWEP and Naïve fall in the best scenario and in the worst scenario]
![Page 18: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/18.jpg)
Case study
• Initial results
  – MODIS 250m NDVI data (16 days)
  – Time: 2000-2012. Annual: July 27/28 of each year.
  – An annual increase of 11.5%, 2001-2012
[Figure: study area; irrigation in Saudi Arabia, shown by Google Time lapse [1], for 2001, 2006, 2012]
[Figure: results of the proposed algorithm with average change rate >= 10% (outlined window), for 2001, 2006, 2012]
[Figure: average NDVI in outlined window]
![Page 19: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/19.jpg)
Experiments
• Questions:
  – What is the impact of the data volume on run-time?
  – What is the impact of the pattern size on run-time?
• Synthetic data
  – Data volume (area size, time length)
  – Pattern size (pattern volume ratio, PVR)
    • PVR = max pattern volume / ST data volume
• Settings:
  – MATLAB 2013 under Linux
  – HP ProLiant BL280c G6 blade servers, with a quad-core 2.8 GHz Intel Xeon X5560 processor and 24 GB shared memory
![Page 20: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/20.jpg)
Impact of Varying Dataset Size
[Figure: four plots with x-axis Data Volume (# values); ticks run from 2000 to 50000 in two panels and from 25000 to 125000 in the other two]
(1) Fixed PVR = 0.1 (worst case), varying data volume with fixed T = 20
(2) Fixed PVR = 0.95 (best case), varying data volume with fixed T = 20
(3) Fixed PVR = 0.1 (worst case), varying data volume with fixed |S| = 2500
(4) Fixed PVR = 0.95 (best case), varying data volume with fixed |S| = 2500
![Page 21: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/21.jpg)
Impact of Varying Pattern Size
(1) Fixed M = N = T, 125000 total data points, varying PVR from 0.1 (worst case) to 1 (best case)
Summary: SWEP is orders of magnitude faster than the Naïve algorithm with respect to (1) data volume and (2) pattern size.
![Page 22: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/22.jpg)
Conclusion and Future work
• The PCW discovery problem is defined
• A space-time window enumeration and pruning (SWEP) approach is proposed to mine PCW patterns
  – Correct, complete, and faster than the Naïve approach
  – The case study gives preliminary evidence of usefulness
• Future work
  – Accelerate the approach using parallel computing (e.g., CUDA)
  – Improve the SWEP algorithm (e.g., multi-resolution enumeration)
  – More case studies on remote sensing datasets (e.g., Amazon deforestation) to compare with known results (e.g., Google Time lapse)
![Page 23: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/23.jpg)
Acknowledgements & References
• Acknowledgements
  – NSF, USDOD for funding projects
  – Minnesota Supercomputing Institute (MSI)
  – Spatial DB & DM group @ UMN
References
[1] Google Earth Engine (https://earthengine.google.org/#intro/)
[2] Basseville, Michele, and Igor V. Nikiforov. "Detection of abrupt changes: theory and applications." Journal of the Royal Statistical Society-Series A Statistics in Society 158.1 (1995): 185.
[3] Kulldorff, M. (1997). A spatial scan statistic. Communications in Statistics-Theory and methods, 26(6), 1481-1496.
[4] M. Kulldorff. Prospective time periodic geographical disease surveillance using a scan statistic. Journal of the Royal Statistical Society: Series A (Statistics in Society), 164(1):61-72, 2001.
[5] Coppin, Pol, et al. "Review Article Digital change detection methods in ecosystem monitoring: a review." International journal of remote sensing 25.9 (2004): 1565-1596.
[6] A. Singh. Review article digital change detection techniques using remotely-sensed data. International journal of remote sensing, 10(6):989-1003, 1989.
[7] X. Zhou, S. Shekhar, P. Mohan, S. Liess, and P. K. Snyder. Discovering interesting sub-paths in spatiotemporal datasets: A summary of results. In 19th ACM SIGSPATIAL GIS, pages 44-53. ACM, 2011.
![Page 24: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/24.jpg)
Step 1: Two-level enumeration (2)
• Find the space to enumerate in each round
  – Skip any location that falls into the union of existing PCWs
  – “Covered space” of an LBN
    • The minimum set of RTF locations to traverse for each LBN
    • The “covered space” of an LBN is a subset of the “covered space” of its predecessors
  – The space to traverse for each LBN is the intersection of the covered spaces of all its direct parents [proof]
[Figure: enumeration space from (1,1,1) to (3,3,3) with a PCW and the location (2,2,3) marked]
![Page 25: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/25.jpg)
Step 1: Two-level enumeration (3)
• Record the traversal space of each LBN location
  – Intersection of covered space of all the parents
  – Store all the “covered spaces” as a list of “3-D Boolean maps”
  – Use a pointer array to link each LBN with a “map”
  – Merge duplicate “maps”
[Figure: list of covered space (Map1, Map2, Map3, …, Map k) and a pointer array linking LBN1, LBN2, LBN3, LBN4 to maps, with Map2 referenced more than once]
![Page 26: Discovering P ersistent Change Windows in Big Spatiotemporal Datasets A summary of results](https://reader030.vdocument.in/reader030/viewer/2022032414/568132e3550346895d999f35/html5/thumbnails/26.jpg)
Theoretical analysis
• The SWEP algorithm is correct
• The SWEP algorithm is complete
• Space/time complexity

| | Naïve | SWEP |
|---|---|---|
| Time complexity, best case | $O(M^3 N^3 T^2)$ | $O(MNT)$ |
| Time complexity, worst case | $O(M^4 N^4 T^4)$ | $O(M^2 N^2 T^2)$ |
| Space complexity, best case | $O(M^3 N^3 T^2)$ | $O(MNT)$ |
| Space complexity, worst case | $O(M^4 N^4 T^4)$ | $O(M^2 N^2 T^2)$ |