Learning Spatial Decision Trees for Geographical Classification
Student: Zhe Jiang
Advisor: Prof. Shashi Shekhar
Thesis Committee Members: Prof. Shashi Shekhar, Prof. Vipin Kumar, Prof. Arindam Banerjee, Prof. Joseph Knight, Prof. Snigdhansu Chatterjee
Biography
• Education:
– PhD student in Computer Science, University of Minnesota, 2010 – now
– B.E. in Electrical Engineering, University of Science and Technology of China (USTC), 2006 – 2010
• Current Project:
– Understanding Climate Change: A Data Driven Approach (2010 – now)
• Awards:
– Doctoral Dissertation Fellowship, University of Minnesota, 2015 – 2016
– NSF Travel Awards for SSTD 2011, ACM GIS 2012, IEEE ICDM 2014
Thesis related publications:
[1] Jiang, Zhe, Shashi Shekhar, Xun Zhou, Joseph Knight, and Jennifer Corcoran. "Focal-Test-Based Spatial Decision Tree Learning." IEEE Transactions on Knowledge & Data Engineering (TKDE) 27, no. 6 (2015): 1547-1559.
[2] Jiang, Zhe, Shashi Shekhar, Xun Zhou, Joseph Knight, and Jennifer Corcoran. "Focal-test-based spatial decision tree learning: A summary of results." In Data Mining (ICDM), 2013 IEEE 13th International Conference on, pp. 320-329. IEEE, 2013.
[3] Jiang, Zhe, Shashi Shekhar, Azamat Kamzin, and Joseph Knight. "Learning a Spatial Ensemble of Classifiers for Raster Classification: A Summary of Results." In Data Mining Workshop (ICDMW), 2014 IEEE International Conference on, pp. 15-18. IEEE, 2014.
[4] Jiang, Zhe, Shashi Shekhar, Pradeep Mohan, Joseph Knight, and Jennifer Corcoran. "Learning spatial decision tree for geographical classification: a summary of results." In Proceedings of the 20th International Conference on Advances in Geographic Information Systems (GIS), pp. 390-393. ACM, 2012.
Other selected publications:
[5] Shekhar, Shashi, Zhe Jiang, Reem Ali, Emre Eftelioglu, Xun Tang, Viswanath Gunturi, and Xun Zhou. "Spatiotemporal Data Science: A Computational Perspective." Special Issue on Advances in Spatio-Temporal Data Analysis and Mining, ISPRS International Journal of Geo-Information, 2015. (minor revision)
[6] Ramnath, Sarnath, Zhe Jiang, Hsuan-Heng Wu, Venkata MV Gunturi, and Shashi Shekhar. "A Spatio-Temporally Opportunistic Approach to Best-Start-Time Lagrangian Shortest Path." In Advances in Spatial and Temporal Databases (SSTD), pp. 274-291. Springer, 2015.
[7] Mohan, Pradeep, Shashi Shekhar, James A. Shine, James P. Rogers, Zhe Jiang, and Nicole Wayant. "A neighborhood graph based approach to regional co-location pattern discovery: A summary of results." In 19th ACM SIGSPATIAL GIS, pp. 122-132. ACM, 2011.
[8] Jiang, Zhe, Michael Evans, Dev Oliver, and Shashi Shekhar. "Identifying K Primary Corridors from Urban Bicycle GPS Trajectories on a Road Network." Special Issue on Mining Urban Data, Information Systems Journal, Elsevier. (major revision)
Outline
• Motivation
• Problem Statement
• Challenges
• Related Work
• Proposed Approach
• Evaluation
• Conclusion, Future Work
Motivation
• Civil earth observation – a national priority
– geo-referenced digital information about Earth
– societal benefit areas: climate science (methane emission), disaster management (flood control), maintaining biodiversity, agriculture (crop yield prediction), water quality monitoring (algal blooms)
– data sources: satellite imagery, aerial photos, stream flow data at stations, temperature data from models
• Other potential applications
– lesion classification and brain tissue segmentation in MRI images
Motivation Example in Wetland Mapping
(Figure: (a) aerial photo (NIR, G, B) in spring; (b) aerial photo (R, G, B) in summer; (c) ground truth; (d) decision tree prediction; legend: wetland, dry land)
• Salt-and-pepper noise appears in the decision tree prediction.
• Labor-intensive pre-/post-processing is needed.
Problem Formulation: Basic Concepts
• Spatial raster framework (F)
– tessellation of the 2-D plane into a regular grid
– feature layers, class label layer
• Spatial data sample
– a pixel or location on F
– <feature vector, class, location>
• Neighborhood relationship
– range of dependency, W-matrix
• Salt-and-pepper noise
– in a class map, a pixel or pixel group distinct from its neighbors
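These concepts can be made concrete in a short sketch (hypothetical Python, not from the thesis): a raster framework as a grid, a spatial data sample as a <feature vector, class, location> triple, and a binary contiguity W-matrix for the queen neighborhood. The function and variable names are illustrative.

```python
def w_matrix(rows, cols):
    """Binary contiguity (W-)matrix for a rows x cols raster framework
    under the queen neighborhood (the 8 surrounding cells)."""
    n = rows * cols
    W = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= rr < rows and 0 <= cc < cols:
                        W[r * cols + c][rr * cols + cc] = 1
    return W

# A spatial data sample is a <feature vector, class, location> triple, e.g.:
sample = ([0.42, 0.17], "wetland", (2, 3))   # hypothetical feature values

W = w_matrix(3, 6)
# Corner pixels have 3 queen neighbors, edge pixels 5, interior pixels 8:
degrees = [sum(row) for row in W]
```

The W-matrix encodes the range of dependency: entry (i, j) is 1 exactly when pixels i and j are neighbors.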
Problem Statement
• Given:
– training & test samples from a raster spatial framework
– a spatial neighborhood and its maximum size
• Find:
– a (spatial) decision tree
• Objective:
– minimize classification error and salt-and-pepper noise
• Constraints:
– spatial autocorrelation exists in the class map (pixel size << parcel size)
– a large training set consisting of contiguous patches
Problem Example with Decision Tree
Input: a 3 x 6 raster (pixel IDs A B C D E F / G H I J K L / M N O P Q R) with two features f1, f2 and two classes, red and green. Each table row is a spatial data sample; each table column is a feature or the class.

ID  f1  f2  class
A   1   1   green
B   1   1   green
C   1   3   green
D   3   2   red
E   3   2   red
F   3   2   red
G   1   1   green
H   3   1   green
I   1   3   green
J   3   2   red
K   1   2   red
L   3   2   red
M   1   1   green
N   1   1   green
O   1   3   green
P   3   2   red
Q   3   2   red
R   3   2   red

Candidate feature tests and their information gains:

feature test   information gain
f1 ≤ 1         0.50
f2 ≤ 1         0.46
f2 ≤ 2         0.19

Output: a decision tree whose root test is f1 ≤ 1 (yes → green, no → red).

Predicted map (A B C D E F / G H I J K L / M N O P Q R): the tree predicts H (f1 = 3, but true class green) as red and K (f1 = 1, but true class red) as green, producing salt-and-pepper noise at H and K (queen neighborhood).
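The information gains above can be checked directly with standard entropy-based information gain, as in ID3/C4.5 (a sketch; the table is the one above):

```python
from math import log2
from collections import Counter

# (f1, f2, class) for pixels A-R, read from the table above
rows = [
    (1, 1, "green"), (1, 1, "green"), (1, 3, "green"),   # A B C
    (3, 2, "red"),   (3, 2, "red"),   (3, 2, "red"),     # D E F
    (1, 1, "green"), (3, 1, "green"), (1, 3, "green"),   # G H I
    (3, 2, "red"),   (1, 2, "red"),   (3, 2, "red"),     # J K L
    (1, 1, "green"), (1, 1, "green"), (1, 3, "green"),   # M N O
    (3, 2, "red"),   (3, 2, "red"),   (3, 2, "red"),     # P Q R
]

def entropy(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, feat, delta):
    """Information gain of the candidate test: feature <= delta."""
    labels = [cls for *f, cls in rows]
    left = [cls for *f, cls in rows if f[feat] <= delta]
    right = [cls for *f, cls in rows if f[feat] > delta]
    n = len(rows)
    return (entropy(labels)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))

gains = {("f1", 1): info_gain(rows, 0, 1),
         ("f2", 1): info_gain(rows, 1, 1),
         ("f2", 2): info_gain(rows, 1, 2)}
# rounds to 0.50, 0.46, and 0.19, matching the table; f1 <= 1 is selected
```

Since f1 ≤ 1 has the highest gain, it becomes the root test, which is exactly why the non-spatial tree mislabels the two pixels (H, K) whose feature values disagree with their neighbors.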
Challenges
• Spatial autocorrelation effect
– Tobler's first law of geography
– violates the independent and identically distributed (i.i.d.) assumption
– salt-and-pepper noise
• Spatial anisotropy
– spatial dependency varies with direction
– asymmetric spatial neighborhood
• High computational cost
– large amount of computation with spatial neighborhoods of different sizes
(Figure: ground truth classes vs. decision tree prediction)
Related Work & Limitations
Decision tree classifiers fall into two families: traditional non-spatial trees, and trees using spatial entropy and information gain; the latter split into local-test-based and focal-test-based trees.

• Traditional non-spatial trees (ID3 1986, CART 1984, C4.5 1993)
– i.i.d. assumption
– ignore spatial autocorrelation
– salt-and-pepper noise
• Local-test-based decision trees (Jiang 2012, Li 2006, Stojanova 2011 & 2012)
– tree nodes test each pixel independently
– spatial autocorrelation used only in the selection of the node test
– still salt-and-pepper noise when all candidate tests are poor
• Focal-test-based decision trees (Jiang ICDM 2013, Jiang TKDE 2015) – this thesis
Summary of Contributions
• Focal-test-based spatial decision tree (FTSDT)
– focal tree node test
– adaptive spatial neighborhoods
• Computational refinement of the learning algorithm
– theoretical proof of correctness
– computational cost models
• Evaluation on real-world datasets
– classification accuracy, salt-and-pepper noise
– computational scalability
Proposed Approach: Focal Test
A tree node test can be defined over different spatial domains: local (a cell itself), focal (a cell's neighborhood), or zonal.

                 Local test                   Focal test
Spatial domain   a cell itself                spatial neighborhood
Test statistic   local indicator I(f ≤ δ)     local indicator combined with local autocorrelation (e.g., Gamma index, Moran's I, Geary's C)
Test outcome     independent across cells     dependent within neighborhoods

Example: (a) a 3 x 3 feature map f with one noisy cell:
3 3 3
3 1 3
3 3 3
(c) The local indicator I(f ≤ 1) isolates that cell:
-1 -1 -1
-1  1 -1
-1 -1 -1
(b, d) A focal function Γ measures local autocorrelation; the focal test statistic I x Γ is negative at the noisy cell. (e, g) The local tree tests I(f ≤ 1) (+ → green, - → red) and predicts the center cell green, leaving salt-and-pepper noise. (f, h) The focal tree tests I x Γ and predicts the center cell red, consistent with its neighbors.
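A minimal sketch of a focal test on the 3 x 3 example (hypothetical Python). The focal function here is a neighborhood-restricted Gamma index, Γ(c) = mean over queen neighbors n of I(c)·I(n); the deck's exact Γ may be normalized differently, but the effect is the same: where Γ is negative, the focal test flips the local outcome to agree with the neighbors.

```python
def indicator(f, delta):
    """Local test I(f <= delta): +1 if the test passes, -1 otherwise."""
    return [[1 if v <= delta else -1 for v in row] for row in f]

def queen(r, c, R, C):
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) != (0, 0) and 0 <= r + dr < R and 0 <= c + dc < C:
                yield r + dr, c + dc

def focal_gamma(I):
    """Gamma-index-style focal value: mean of I(c)*I(n) over queen neighbors."""
    R, C = len(I), len(I[0])
    return [[sum(I[r][c] * I[rr][cc] for rr, cc in queen(r, c, R, C))
             / len(list(queen(r, c, R, C)))
             for c in range(C)] for r in range(R)]

f = [[3, 3, 3],
     [3, 1, 3],   # isolated pixel: classic salt-and-pepper noise
     [3, 3, 3]]

I = indicator(f, 1)
G = focal_gamma(I)
# local tree: outcome is I itself; focal tree: flip the outcome where Gamma < 0
local_pred = [["green" if I[r][c] > 0 else "red" for c in range(3)]
              for r in range(3)]
focal_pred = [["green" if (-I[r][c] if G[r][c] < 0 else I[r][c]) > 0 else "red"
               for c in range(3)] for r in range(3)]
# the local tree leaves the center pixel green; the focal tree predicts all red
```

At the noisy center cell, Γ = -1 (it disagrees with all eight neighbors), so the focal test reverses the local decision and the noise disappears.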
Proposed Approach: Illustrative Example
Traditional decision tree input: a table of records. Spatial decision tree input: feature maps, a class map, and a rook neighborhood.

Feature f1 (pixels A B C D E F / G H I J K L / M N O P Q R):
1 1 1 3 3 3
1 3 1 3 1 3
1 1 1 3 3 3

Feature f2:
1 1 3 2 2 2
1 1 3 2 2 2
1 1 3 2 2 2

Focal function Γ1 for the test I(f1 ≤ 1):
1   .3  .3  .3  .3  1
.3  -1   0   0  -1  .3
1   .3  .3  .3  .3  1

ID  f1  f2  Γ1    class
A   1   1   1     green
B   1   1   0.3   green
C   1   3   0.3   green
G   1   1   0.3   green
I   1   3   0     green
K   1   2   -1    red
M   1   1   1     green
N   1   1   0.3   green
O   1   3   0.3   green
D   3   2   0.3   red
E   3   2   0.3   red
F   3   2   1     red
H   3   1   -1    green
J   3   2   0     red
L   3   2   0.3   red
P   3   2   0.3   red
Q   3   2   0.3   red
R   3   2   1     red

• The traditional tree tests I(f1 ≤ 1) on each pixel independently (+ → green, - → red); its predicted map mislabels H (f1 = 3 but true class green) and K (f1 = 1 but true class red).
• The spatial tree tests I(f1 ≤ 1) x Γ1; Γ1 = -1 at exactly H and K, so the test outcome flips there and the predicted map matches the true class map at all 18 pixels.
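The Γ1 column can be reproduced with a rook-neighborhood version of the same Gamma-style focal function, Γ1(c) = mean over rook neighbors n of I(c)·I(n) with I = I(f1 ≤ 1). This is a sketch, but it matches the table's values exactly (with 1/3 shown as 0.3), under the reading that the focal tree flips the local outcome wherever Γ1 is negative:

```python
f1 = [[1, 1, 1, 3, 3, 3],    # pixels A B C D E F
      [1, 3, 1, 3, 1, 3],    # pixels G H I J K L
      [1, 1, 1, 3, 3, 3]]    # pixels M N O P Q R
ids = ["ABCDEF", "GHIJKL", "MNOPQR"]

I = [[1 if v <= 1 else -1 for v in row] for row in f1]   # local test I(f1 <= 1)

def rook(r, c):
    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= rr < 3 and 0 <= cc < 6:
            yield rr, cc

gamma1 = {}
for r in range(3):
    for c in range(6):
        nbrs = list(rook(r, c))
        gamma1[ids[r][c]] = sum(I[r][c] * I[rr][cc] for rr, cc in nbrs) / len(nbrs)

# Focal tree: keep the local outcome where Gamma1 >= 0, flip it where Gamma1 < 0
focal_pred = {}
for r in range(3):
    for c in range(6):
        out = -I[r][c] if gamma1[ids[r][c]] < 0 else I[r][c]
        focal_pred[ids[r][c]] = "green" if out > 0 else "red"

truth = {p: "green" for p in "ABCGHIMNO"}
truth.update({p: "red" for p in "DEFJKLPQR"})
# Gamma1 is -1 at exactly H and K, so the focal tree corrects both local errors
```

H's four rook neighbors all pass the test while H fails it, and K's all fail while K passes, which is why both get Γ1 = -1 and get flipped.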
Proposed Approach: Adaptive Spatial Neighborhoods
• Why adaptive spatial neighborhoods?
– rich texture in high-resolution feature images
– fixed neighborhoods cause an over-smoothing effect
• How to form an adaptive spatial neighborhood:
1. create an indicator map of local test results I(f ≤ δ)
2. draw a square window of size Smax by Smax around the current pixel
3. segment the window into connected components of the same indicator value
4. take the outermost connected component within which the current pixel lies

Example 1 (7 x 7 grid; numbers are local test indicators):
1  1  1 -1  1  1  1
1  1  1 -1 -1  1  1
1  1 -1 -1 -1  1  1
1  1 -1  1 -1  1  1
1  1 -1 -1 -1  1  1
1  1 -1 -1 -1  1  1
1 -1 -1 -1 -1 -1  1

Focal predictions with fixed neighborhoods (over-smoothed; the -1 region in the upper rows is lost):
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + - + - + +
+ + - + - + +
+ - - + - - +

Focal predictions with adaptive neighborhoods (region shape preserved; the isolated 1 inside the -1 region is smoothed away):
+ + + - + + +
+ + + - - + +
+ + - - - + +
+ + - - - + +
+ + - - - + +
+ + - - - + +
+ - - - - - +

Example 2:
1  1 -1 -1 -1  1  1
1 -1 -1  1 -1 -1  1
-1 -1  1  1  1 -1 -1
-1  1  1  1  1  1 -1
-1 -1  1  1  1 -1 -1
1 -1 -1  1 -1 -1  1
1  1 -1 -1 -1  1  1

Fixed neighborhoods (over-smoothed):
+ + - - - + +
+ - - - - - +
- - - - - - -
- - - - - - -
- - - - - - -
+ - - - - - +
+ + - - - + +

Adaptive neighborhoods (ring structure preserved):
+ + + + + + +
+ + + + + + +
+ + - - - + +
+ + - + - + +
+ + - - - + +
+ + + + + + +
+ + + + + + +
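Steps 1-4 above can be sketched as a connected-component search (hypothetical Python; a simplified reading in which the adaptive neighborhood is the connected component of cells sharing the current pixel's indicator value inside the Smax x Smax window, with queen connectivity):

```python
from collections import deque

def adaptive_neighborhood(indicator, r0, c0, smax):
    """Connected component (queen connectivity) of cells sharing the
    indicator value of pixel (r0, c0), grown within an smax x smax window."""
    R, C = len(indicator), len(indicator[0])
    h = smax // 2
    rlo, rhi = max(0, r0 - h), min(R - 1, r0 + h)
    clo, chi = max(0, c0 - h), min(C - 1, c0 + h)
    target = indicator[r0][c0]
    component = {(r0, c0)}
    queue = deque([(r0, c0)])
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if ((rr, cc) not in component and rlo <= rr <= rhi
                        and clo <= cc <= chi and indicator[rr][cc] == target):
                    component.add((rr, cc))
                    queue.append((rr, cc))
    return component

grid = [[ 1,  1,  1, -1,  1,  1,  1],
        [ 1,  1,  1, -1, -1,  1,  1],
        [ 1,  1, -1, -1, -1,  1,  1],
        [ 1,  1, -1,  1, -1,  1,  1],
        [ 1,  1, -1, -1, -1,  1,  1],
        [ 1,  1, -1, -1, -1,  1,  1],
        [ 1, -1, -1, -1, -1, -1,  1]]

center = adaptive_neighborhood(grid, 3, 3, 7)
# the isolated 1 at (3, 3) is surrounded by -1s, so its component is just itself
```

Because the neighborhood stops at the component boundary, the focal computation never mixes pixels from the opposite side of an edge, which is how adaptive neighborhoods avoid the over-smoothing shown in the fixed-neighborhood grids above.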
Proposed Approach: Computational Refinement
• Computational bottleneck: the number of focal computations
– quadratic in the number of samples
– linear in the number of distinct feature values
– linear in the number of features
• Why focal computation is the bottleneck:
– focal cost increases superlinearly
– focal cost dominates the total cost

Variable                                            Example 1   Example 2
# of samples N                                      10^4        10^6
Min tree node size N0                               50          50
# of features F                                     10          10
# of distinct feature values Nd                     256         256
# of focal computations (around N^2 Nd F / N0)      5*10^7      5*10^11
Proposed Approach: Computational Refinement (continued)
• Key idea
– sort all candidate test thresholds (feature values) in increasing order
– focal function values are mostly the same across two consecutive thresholds
– cross-threshold reuse and incremental update
• Illustrative example (4 x 4 feature map, queen neighborhood, candidate thresholds δ in {1, 2, 3, 4, 5, 6, 7, 8}):

(a) feature values:
1 9 9 9
2 9 9 9
3 8 7 6
3 4 5 5

(b) indicators and focal values for δ = 1:
1 -1 -1 -1        -1  0.6  1  1
-1 -1 -1 -1       0.6  0.8  1  1
-1 -1 -1 -1        1    1   1  1
-1 -1 -1 -1        1    1   1  1

(c) for δ = 2, only the cell with feature value 2 flips its indicator, so only that cell and its neighbors need new focal values:
1 -1 -1 -1        -.3  0.2  1  1
1 -1 -1 -1        -.6  0.5  1  1
-1 -1 -1 -1       0.6  0.8  1  1
-1 -1 -1 -1        1    1   1  1

(d)-(e) for δ = 3, the two cells with feature value 3 flip, updated one at a time; final indicators and focal values:
1 -1 -1 -1        -.3  0.2  1  1
1 -1 -1 -1        -.2  .25  1  1
1 -1 -1 -1        -.2  .25  1  1
1 -1 -1 -1        -.3  0.2  1  1
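The cross-threshold reuse can be sketched as follows (hypothetical Python, using a Gamma-style focal function): when δ moves to the next candidate value, only cells whose indicator flips, plus their neighbors, need their focal values recomputed; every other value is reused.

```python
import copy

def queen(r, c, R, C):
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) != (0, 0) and 0 <= r + dr < R and 0 <= c + dc < C:
                yield r + dr, c + dc

def focal_at(I, r, c):
    """Mean of I(c)*I(n) over queen neighbors (Gamma-style focal value)."""
    R, C = len(I), len(I[0])
    nbrs = list(queen(r, c, R, C))
    return sum(I[r][c] * I[rr][cc] for rr, cc in nbrs) / len(nbrs)

def baseline(f, thresholds):
    """Recompute every indicator and focal value from scratch per threshold."""
    R, C = len(f), len(f[0])
    out = {}
    for d in thresholds:
        I = [[1 if v <= d else -1 for v in row] for row in f]
        out[d] = [[focal_at(I, r, c) for c in range(C)] for r in range(R)]
    return out

def incremental(f, thresholds):
    """Reuse focal values across thresholds; update flipped cells + neighbors."""
    R, C = len(f), len(f[0])
    d0 = thresholds[0]
    I = [[1 if v <= d0 else -1 for v in row] for row in f]
    G = [[focal_at(I, r, c) for c in range(C)] for r in range(R)]
    out, prev = {d0: copy.deepcopy(G)}, d0
    for d in thresholds[1:]:
        flipped = [(r, c) for r in range(R) for c in range(C)
                   if prev < f[r][c] <= d]
        for r, c in flipped:
            I[r][c] = 1                      # these cells now pass f <= d
        dirty = set(flipped)
        for r, c in flipped:
            dirty.update(queen(r, c, R, C))  # neighbors are also affected
        for r, c in dirty:
            G[r][c] = focal_at(I, r, c)
        out[d] = copy.deepcopy(G)
        prev = d
    return out

f = [[1, 9, 9, 9],
     [2, 9, 9, 9],
     [3, 8, 7, 6],
     [3, 4, 5, 5]]
thresholds = list(range(1, 9))   # candidate deltas {1, ..., 8}
b = baseline(f, thresholds)
inc = incremental(f, thresholds)
# both produce identical focal-value maps at every threshold
```

The baseline touches all R·C cells per threshold, while the incremental version touches only flipped cells and their neighbors, which is the source of the speedups reported in the evaluation.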
Proposed Approach: Theoretical Analysis
• Theorem: the incremental update algorithm is correct (it produces the same focal values as recomputation from scratch).
• Computational cost models: baseline algorithm vs. incremental update algorithm, in terms of the symbols below.

Symbol   Explanation
F        number of features
N        number of samples
Nd       number of distinct feature values
Smax     maximum neighborhood size
N0       minimum tree node size
Evaluation: Experiment Design
• Goals:
– Classification performance comparison
• spatial decision tree (SDT) versus decision tree (DT)
• classification accuracy
• salt-and-pepper noise level
– Computational performance comparison
• SDT baseline vs. refined algorithms
• Dataset (Chanhassen, MN):
– Classes: wetland, dry land
– Features: high-resolution (3 m x 3 m) aerial photos (RGB, NIR, NDVI) in 2003, 2005, 2008
– Training set: systematic cluster sampling; test set: remaining pixels in the scene
– Maximum neighborhood size: 11 pixels by 11 pixels
Classification Performance Evaluation

Model                         Confusion matrix             Prec.   Recall   F-score   Autocorrelation
DT                            99,141  10,688               0.81    0.75     0.78      0.87
                              15,346  45,805
FTSDT, fixed neighborhood     99,755  10,074               0.83    0.80     0.81      0.96
                              12,470  48,681
FTSDT, adaptive neighborhood  99,390  10,439               0.83    0.83     0.83      0.93
                              10,618  50,533

Confusion matrix layout: true negative, false positive (top row); false negative, true positive (bottom row).

Kappa statistics:
Model            Khat   Khat variance
DT               0.66   3.6*10^-6
FTSDT-Fixed      0.71   3.2*10^-6
FTSDT-Adaptive   0.73   3.0*10^-6

Significance tests:
Comparison                        Z-score   Significant
DT vs. FTSDT-Fixed                18.2      yes
FTSDT-Fixed vs. FTSDT-Adaptive    8.6       yes

(Figure: predicted maps for the decision tree, the spatial decision tree with fixed neighborhoods, and the spatial decision tree with adaptive neighborhoods; legend: true wetland, true dry land, false dry land, false wetland)

Trends:
1. DT suffers from salt-and-pepper noise.
2. SDT improves accuracy and reduces salt-and-pepper noise levels.
3. SDT with adaptive neighborhoods achieves the best accuracy.
Computational Performance Evaluation: Effect of Number of Training Samples
Analysis of results:
• The refined algorithm is much faster than the baseline algorithm.
• Time costs of both algorithms increase with training set size, but the rate of increase is much smaller for the refined algorithm.
Fixed variables: 12 features, 256 distinct feature values, max neighborhood 11 x 11, min tree node size 50.
Computational Performance Evaluation: Effect of Minimum Tree Node Size
Analysis of results:
• The refined algorithm is much faster than the baseline algorithm.
• Time costs of both algorithms decrease with minimum tree node size.
Fixed variables: 12 features, 256 distinct feature values, max neighborhood 11 x 11, 7,000 samples.
Computational Performance Evaluation: Effect of Maximum Neighborhood Size
Analysis of results:
• The refined algorithm is much faster than the baseline algorithm.
• Time costs of both algorithms increase with maximum neighborhood size.
Fixed variables: 12 features, 256 distinct feature values, min tree node size 50, 7,000 training samples.
Computational Performance Evaluation: Effect of Number of Distinct Feature Values
Analysis of results:
• The refined algorithm is much faster than the baseline algorithm.
• The time cost of the baseline algorithm increases almost linearly with the number of distinct feature values.
• The time cost of the refined algorithm is nearly unchanged as the number of distinct feature values grows.
Fixed variables: 12 features, max neighborhood size 11 x 11, min tree node size 50, 7,000 samples.
Conclusion
• Proposed a novel spatial decision tree (SDT)
– focal feature test
– adaptive spatial neighborhoods
• Computational optimization of the training algorithm
– theoretical analysis of correctness
– computational cost models
• Evaluation on real-world datasets
– SDT reduces salt-and-pepper noise and improves accuracy
– computational refinement reduces training time costs

Aspect                             Decision tree                          Spatial decision tree
Assumption on sample distribution  independent & identically distributed  spatial autocorrelation, anisotropy
Candidate node test                local feature test                     focal feature test
Candidate test selection           information gain                       spatial information gain
Computational structure            linear scanning                        incremental map update
Future Work
• Address the challenge of spatial heterogeneity
– spatial ensemble of (spatial) decision trees
– geographical partitioning and local models
• Spatiotemporal decision trees
– temporal autocorrelation
– temporal non-stationarity
• Novel applications
– precision agriculture