Learning Spatial Decision Trees for Geographical Classification
Student: Zhe Jiang
Advisor: Prof. Shashi Shekhar
Thesis Committee Members: Prof. Shashi Shekhar, Prof. Vipin Kumar, Prof. Arindam Banerjee, Prof. Joseph Knight, Prof. Snigdhansu Chatterjee
Biography
• Education:
– PhD student in Computer Science, University of Minnesota, 2010 – now
– B.E. in Electrical Engineering, University of Science and Technology of China (USTC), 2006 – 2010
• Current Project:
– Understanding Climate Change: A Data Driven Approach (2010 – now)
• Awards:
– Doctoral Dissertation Fellowship, University of Minnesota, 2015 – 2016
– NSF Travel Awards for SSTD 2011, ACM GIS 2012, IEEE ICDM 2014
Thesis related publications:
[1] Jiang, Zhe, Shashi Shekhar, Xun Zhou, Joseph Knight, and Jennifer Corcoran. "Focal-Test-Based Spatial Decision Tree Learning." IEEE Transactions on Knowledge & Data Engineering (TKDE) 27, no. 6 (2015): 1547-1559.
[2] Jiang, Zhe, Shashi Shekhar, Xun Zhou, Joseph Knight, and Jennifer Corcoran. "Focal-test-based spatial decision tree learning: A summary of results." In Data Mining (ICDM), 2013 IEEE 13th International Conference on, pp. 320-329. IEEE, 2013.
[3] Jiang, Zhe, Shashi Shekhar, Azamat Kamzin, and Joseph Knight. "Learning a Spatial Ensemble of Classifiers for Raster Classification: A Summary of Results." In Data Mining Workshop (ICDMW), 2014 IEEE International Conference on, pp. 15-18. IEEE, 2014.
[4] Jiang, Zhe, Shashi Shekhar, Pradeep Mohan, Joseph Knight, and Jennifer Corcoran. "Learning spatial decision tree for geographical classification: a summary of results." In Proceedings of the 20th International Conference on Advances in Geographic Information Systems (GIS), pp. 390-393. ACM, 2012.
Other selected publications:
[5] Shekhar, Shashi, Zhe Jiang, Reem Ali, Emre Eftelioglu, Xun Tang, Viswanath Gunturi, and Xun Zhou. "Spatiotemporal Data Science: A Computational Perspective." Special Issue on Advances in Spatio-Temporal Data Analysis and Mining, ISPRS International Journal of Geo-Information, 2015. (minor revision)
[6] Ramnath, Sarnath, Zhe Jiang, Hsuan-Heng Wu, Venkata MV Gunturi, and Shashi Shekhar. "A Spatio-Temporally Opportunistic Approach to Best-Start-Time Lagrangian Shortest Path." In Advances in Spatial and Temporal Databases (SSTD), pp. 274-291. Springer, 2015.
[7] Mohan, Pradeep, Shashi Shekhar, James A. Shine, James P. Rogers, Zhe Jiang, and Nicole Wayant. "A neighborhood graph based approach to regional co-location pattern discovery: A summary of results." In 19th ACM SIGSPATIAL GIS, pp. 122-132. ACM, 2011.
[8] Jiang, Zhe, Michael Evans, Dev Oliver, and Shashi Shekhar. "Identifying K Primary Corridors from Urban Bicycle GPS Trajectories on a Road Network." Special Issue on Mining Urban Data, Information Systems Journal, Elsevier. (major revision)
Outline
• Motivation
• Problem Statement
• Challenges
• Related Work
• Proposed Approach
• Evaluation
• Conclusion, Future Work
Motivation
• Civil earth observation – a national priority
– geo-referenced digital information about Earth
– societal benefit areas: climate science (methane emission), disaster management (flood control), maintaining biodiversity, agriculture (crop yield prediction), water quality monitoring (algal blooms)
– data sources: satellite imagery, aerial photos, stream flow data at stations, temperature data from models
• Other potential applications
– lesion classification and brain tissue segmentation in MRI images
Motivation Example in Wetland Mapping
(Figure: (a) aerial photo (NIR, G, B) in spring; (b) aerial photo (R, G, B) in summer; (c) ground truth; (d) decision tree prediction; legend: wetland, dry land)
• Salt-and-pepper noise appears in the decision tree prediction.
• Labor-intensive pre-/post-processing is needed.
Problem Formulation: Basic Concepts
• Spatial raster framework (F)
– tessellation of the 2-D plane into a regular grid
– feature layers, class label layer
• Spatial data sample
– a pixel or location on F
– <feature vector, class, location>
• Neighborhood relationship
– range of dependency, W-matrix
• Salt-and-pepper noise
– in a class map, a pixel or pixel group distinct from its neighbors
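These concepts can be made concrete in a short sketch (hypothetical Python, not from the thesis): a raster framework as a grid, a spatial data sample as a <feature vector, class, location> triple, and a binary contiguity W-matrix for the queen neighborhood. The function and variable names are illustrative.

```python
def w_matrix(rows, cols):
    """Binary contiguity (W-)matrix for a rows x cols raster framework
    under the queen neighborhood (the 8 surrounding cells)."""
    n = rows * cols
    W = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= rr < rows and 0 <= cc < cols:
                        W[r * cols + c][rr * cols + cc] = 1
    return W

# A spatial data sample is a <feature vector, class, location> triple, e.g.:
sample = ([0.42, 0.17], "wetland", (2, 3))   # hypothetical feature values

W = w_matrix(3, 6)
# Corner pixels have 3 queen neighbors, edge pixels 5, interior pixels 8:
degrees = [sum(row) for row in W]
```

The W-matrix encodes the range of dependency: entry (i, j) is 1 exactly when pixels i and j are neighbors.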
Problem Statement
• Given:
– training & test samples from a raster spatial framework
– a spatial neighborhood and its maximum size
• Find:
– a (spatial) decision tree
• Objective:
– minimize classification error and salt-and-pepper noise
• Constraints:
– spatial autocorrelation exists in the class map (pixel size << parcel size)
– a large training set consisting of contiguous patches
Problem Example with Decision Tree
Input: a 3 x 6 raster (pixel IDs A B C D E F / G H I J K L / M N O P Q R) with two features f1, f2 and two classes, red and green. Each table row is a spatial data sample; each table column is a feature or the class.

ID  f1  f2  class
A   1   1   green
B   1   1   green
C   1   3   green
D   3   2   red
E   3   2   red
F   3   2   red
G   1   1   green
H   3   1   green
I   1   3   green
J   3   2   red
K   1   2   red
L   3   2   red
M   1   1   green
N   1   1   green
O   1   3   green
P   3   2   red
Q   3   2   red
R   3   2   red

Candidate feature tests and their information gains:

feature test   information gain
f1 ≤ 1         0.50
f2 ≤ 1         0.46
f2 ≤ 2         0.19

Output: a decision tree whose root test is f1 ≤ 1 (yes → green, no → red).

Predicted map (A B C D E F / G H I J K L / M N O P Q R): the tree predicts H (f1 = 3, but true class green) as red and K (f1 = 1, but true class red) as green, producing salt-and-pepper noise at H and K (queen neighborhood).
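The information gains above can be checked directly with standard entropy-based information gain, as in ID3/C4.5 (a sketch; the table is the one above):

```python
from math import log2
from collections import Counter

# (f1, f2, class) for pixels A-R, read from the table above
rows = [
    (1, 1, "green"), (1, 1, "green"), (1, 3, "green"),   # A B C
    (3, 2, "red"),   (3, 2, "red"),   (3, 2, "red"),     # D E F
    (1, 1, "green"), (3, 1, "green"), (1, 3, "green"),   # G H I
    (3, 2, "red"),   (1, 2, "red"),   (3, 2, "red"),     # J K L
    (1, 1, "green"), (1, 1, "green"), (1, 3, "green"),   # M N O
    (3, 2, "red"),   (3, 2, "red"),   (3, 2, "red"),     # P Q R
]

def entropy(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, feat, delta):
    """Information gain of the candidate test: feature <= delta."""
    labels = [cls for *f, cls in rows]
    left = [cls for *f, cls in rows if f[feat] <= delta]
    right = [cls for *f, cls in rows if f[feat] > delta]
    n = len(rows)
    return (entropy(labels)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))

gains = {("f1", 1): info_gain(rows, 0, 1),
         ("f2", 1): info_gain(rows, 1, 1),
         ("f2", 2): info_gain(rows, 1, 2)}
# rounds to 0.50, 0.46, and 0.19, matching the table; f1 <= 1 is selected
```

Since f1 ≤ 1 has the highest gain, it becomes the root test, which is exactly why the non-spatial tree mislabels the two pixels (H, K) whose feature values disagree with their neighbors.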
Challenges
• Spatial autocorrelation effect
– Tobler's first law of geography
– violates the independent and identically distributed (i.i.d.) assumption
– salt-and-pepper noise
• Spatial anisotropy
– spatial dependency varies with direction
– asymmetric spatial neighborhood
• High computational cost
– large amount of computation with spatial neighborhoods of different sizes
(Figure: ground truth classes vs. decision tree prediction)
Related Work & Limitations
Decision tree classifiers fall into two families: traditional non-spatial trees, and trees using spatial entropy and information gain; the latter split into local-test-based and focal-test-based trees.

• Traditional non-spatial trees (ID3 1986, CART 1984, C4.5 1993)
– i.i.d. assumption
– ignore spatial autocorrelation
– salt-and-pepper noise
• Local-test-based decision trees (Jiang 2012, Li 2006, Stojanova 2011 & 2012)
– tree nodes test each pixel independently
– spatial autocorrelation used only in the selection of the node test
– still salt-and-pepper noise when all candidate tests are poor
• Focal-test-based decision trees (Jiang ICDM 2013, Jiang TKDE 2015) – this thesis
Summary of Contributions
• Focal-test-based spatial decision tree (FTSDT)
– focal tree node test
– adaptive spatial neighborhoods
• Computational refinement of the learning algorithm
– theoretical proof of correctness
– computational cost models
• Evaluation on real-world datasets
– classification accuracy, salt-and-pepper noise
– computational scalability
Proposed Approach: Focal Test
A tree node test can be defined over different spatial domains: local (a cell itself), focal (a cell's neighborhood), or zonal.

                 Local test                   Focal test
Spatial domain   a cell itself                spatial neighborhood
Test statistic   local indicator I(f ≤ δ)     local indicator combined with local autocorrelation (e.g., Gamma index, Moran's I, Geary's C)
Test outcome     independent across cells     dependent within neighborhoods

Example: (a) a 3 x 3 feature map f with one noisy cell:
3 3 3
3 1 3
3 3 3
(c) The local indicator I(f ≤ 1) isolates that cell:
-1 -1 -1
-1  1 -1
-1 -1 -1
(b, d) A focal function Γ measures local autocorrelation; the focal test statistic I x Γ is negative at the noisy cell. (e, g) The local tree tests I(f ≤ 1) (+ → green, - → red) and predicts the center cell green, leaving salt-and-pepper noise. (f, h) The focal tree tests I x Γ and predicts the center cell red, consistent with its neighbors.
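A minimal sketch of a focal test on the 3 x 3 example (hypothetical Python). The focal function here is a neighborhood-restricted Gamma index, Γ(c) = mean over queen neighbors n of I(c)·I(n); the deck's exact Γ may be normalized differently, but the effect is the same: where Γ is negative, the focal test flips the local outcome to agree with the neighbors.

```python
def indicator(f, delta):
    """Local test I(f <= delta): +1 if the test passes, -1 otherwise."""
    return [[1 if v <= delta else -1 for v in row] for row in f]

def queen(r, c, R, C):
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) != (0, 0) and 0 <= r + dr < R and 0 <= c + dc < C:
                yield r + dr, c + dc

def focal_gamma(I):
    """Gamma-index-style focal value: mean of I(c)*I(n) over queen neighbors."""
    R, C = len(I), len(I[0])
    return [[sum(I[r][c] * I[rr][cc] for rr, cc in queen(r, c, R, C))
             / len(list(queen(r, c, R, C)))
             for c in range(C)] for r in range(R)]

f = [[3, 3, 3],
     [3, 1, 3],   # isolated pixel: classic salt-and-pepper noise
     [3, 3, 3]]

I = indicator(f, 1)
G = focal_gamma(I)
# local tree: outcome is I itself; focal tree: flip the outcome where Gamma < 0
local_pred = [["green" if I[r][c] > 0 else "red" for c in range(3)]
              for r in range(3)]
focal_pred = [["green" if (-I[r][c] if G[r][c] < 0 else I[r][c]) > 0 else "red"
               for c in range(3)] for r in range(3)]
# the local tree leaves the center pixel green; the focal tree predicts all red
```

At the noisy center cell, Γ = -1 (it disagrees with all eight neighbors), so the focal test reverses the local decision and the noise disappears.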
Proposed Approach: Illustrative Example
Traditional decision tree input: a table of records. Spatial decision tree input: feature maps, a class map, and a rook neighborhood.

Feature f1 (pixels A B C D E F / G H I J K L / M N O P Q R):
1 1 1 3 3 3
1 3 1 3 1 3
1 1 1 3 3 3

Feature f2:
1 1 3 2 2 2
1 1 3 2 2 2
1 1 3 2 2 2

Focal function Γ1 for the test I(f1 ≤ 1):
1   .3  .3  .3  .3  1
.3  -1   0   0  -1  .3
1   .3  .3  .3  .3  1

ID  f1  f2  Γ1    class
A   1   1   1     green
B   1   1   0.3   green
C   1   3   0.3   green
G   1   1   0.3   green
I   1   3   0     green
K   1   2   -1    red
M   1   1   1     green
N   1   1   0.3   green
O   1   3   0.3   green
D   3   2   0.3   red
E   3   2   0.3   red
F   3   2   1     red
H   3   1   -1    green
J   3   2   0     red
L   3   2   0.3   red
P   3   2   0.3   red
Q   3   2   0.3   red
R   3   2   1     red

• The traditional tree tests I(f1 ≤ 1) on each pixel independently (+ → green, - → red); its predicted map mislabels H (f1 = 3 but true class green) and K (f1 = 1 but true class red).
• The spatial tree tests I(f1 ≤ 1) x Γ1; Γ1 = -1 at exactly H and K, so the test outcome flips there and the predicted map matches the true class map at all 18 pixels.
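The Γ1 column can be reproduced with a rook-neighborhood version of the same Gamma-style focal function, Γ1(c) = mean over rook neighbors n of I(c)·I(n) with I = I(f1 ≤ 1). This is a sketch, but it matches the table's values exactly (with 1/3 shown as 0.3), under the reading that the focal tree flips the local outcome wherever Γ1 is negative:

```python
f1 = [[1, 1, 1, 3, 3, 3],    # pixels A B C D E F
      [1, 3, 1, 3, 1, 3],    # pixels G H I J K L
      [1, 1, 1, 3, 3, 3]]    # pixels M N O P Q R
ids = ["ABCDEF", "GHIJKL", "MNOPQR"]

I = [[1 if v <= 1 else -1 for v in row] for row in f1]   # local test I(f1 <= 1)

def rook(r, c):
    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= rr < 3 and 0 <= cc < 6:
            yield rr, cc

gamma1 = {}
for r in range(3):
    for c in range(6):
        nbrs = list(rook(r, c))
        gamma1[ids[r][c]] = sum(I[r][c] * I[rr][cc] for rr, cc in nbrs) / len(nbrs)

# Focal tree: keep the local outcome where Gamma1 >= 0, flip it where Gamma1 < 0
focal_pred = {}
for r in range(3):
    for c in range(6):
        out = -I[r][c] if gamma1[ids[r][c]] < 0 else I[r][c]
        focal_pred[ids[r][c]] = "green" if out > 0 else "red"

truth = {p: "green" for p in "ABCGHIMNO"}
truth.update({p: "red" for p in "DEFJKLPQR"})
# Gamma1 is -1 at exactly H and K, so the focal tree corrects both local errors
```

H's four rook neighbors all pass the test while H fails it, and K's all fail while K passes, which is why both get Γ1 = -1 and get flipped.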
Proposed Approach: Adaptive Spatial Neighborhoods
• Why adaptive spatial neighborhoods?
– rich texture in high-resolution feature images
– fixed neighborhoods cause an over-smoothing effect
• How to form an adaptive spatial neighborhood:
1. create an indicator map of local test results I(f ≤ δ)
2. draw a square window of size Smax by Smax around the current pixel
3. segment the window into connected components of the same indicator value
4. take the outermost connected component within which the current pixel lies

Example 1 (7 x 7 grid; numbers are local test indicators):
1  1  1 -1  1  1  1
1  1  1 -1 -1  1  1
1  1 -1 -1 -1  1  1
1  1 -1  1 -1  1  1
1  1 -1 -1 -1  1  1
1  1 -1 -1 -1  1  1
1 -1 -1 -1 -1 -1  1

Focal predictions with fixed neighborhoods (over-smoothed; the -1 region in the upper rows is lost):
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + - + - + +
+ + - + - + +
+ - - + - - +

Focal predictions with adaptive neighborhoods (region shape preserved; the isolated 1 inside the -1 region is smoothed away):
+ + + - + + +
+ + + - - + +
+ + - - - + +
+ + - - - + +
+ + - - - + +
+ + - - - + +
+ - - - - - +

Example 2:
1  1 -1 -1 -1  1  1
1 -1 -1  1 -1 -1  1
-1 -1  1  1  1 -1 -1
-1  1  1  1  1  1 -1
-1 -1  1  1  1 -1 -1
1 -1 -1  1 -1 -1  1
1  1 -1 -1 -1  1  1

Fixed neighborhoods (over-smoothed):
+ + - - - + +
+ - - - - - +
- - - - - - -
- - - - - - -
- - - - - - -
+ - - - - - +
+ + - - - + +

Adaptive neighborhoods (ring structure preserved):
+ + + + + + +
+ + + + + + +
+ + - - - + +
+ + - + - + +
+ + - - - + +
+ + + + + + +
+ + + + + + +
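Steps 1-4 above can be sketched as a connected-component search (hypothetical Python; a simplified reading in which the adaptive neighborhood is the connected component of cells sharing the current pixel's indicator value inside the Smax x Smax window, with queen connectivity):

```python
from collections import deque

def adaptive_neighborhood(indicator, r0, c0, smax):
    """Connected component (queen connectivity) of cells sharing the
    indicator value of pixel (r0, c0), grown within an smax x smax window."""
    R, C = len(indicator), len(indicator[0])
    h = smax // 2
    rlo, rhi = max(0, r0 - h), min(R - 1, r0 + h)
    clo, chi = max(0, c0 - h), min(C - 1, c0 + h)
    target = indicator[r0][c0]
    component = {(r0, c0)}
    queue = deque([(r0, c0)])
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if ((rr, cc) not in component and rlo <= rr <= rhi
                        and clo <= cc <= chi and indicator[rr][cc] == target):
                    component.add((rr, cc))
                    queue.append((rr, cc))
    return component

grid = [[ 1,  1,  1, -1,  1,  1,  1],
        [ 1,  1,  1, -1, -1,  1,  1],
        [ 1,  1, -1, -1, -1,  1,  1],
        [ 1,  1, -1,  1, -1,  1,  1],
        [ 1,  1, -1, -1, -1,  1,  1],
        [ 1,  1, -1, -1, -1,  1,  1],
        [ 1, -1, -1, -1, -1, -1,  1]]

center = adaptive_neighborhood(grid, 3, 3, 7)
# the isolated 1 at (3, 3) is surrounded by -1s, so its component is just itself
```

Because the neighborhood stops at the component boundary, the focal computation never mixes pixels from the opposite side of an edge, which is how adaptive neighborhoods avoid the over-smoothing shown in the fixed-neighborhood grids above.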
Proposed Approach: Computational Refinement
• Computational bottleneck: the number of focal computations
– quadratic in the number of samples
– linear in the number of distinct feature values
– linear in the number of features
• Why focal computation is the bottleneck:
– focal cost increases superlinearly
– focal cost dominates the total cost

Variable                                            Example 1   Example 2
# of samples N                                      10^4        10^6
Min tree node size N0                               50          50
# of features F                                     10          10
# of distinct feature values Nd                     256         256
# of focal computations (around N^2 Nd F / N0)      5*10^7      5*10^11
Proposed Approach: Computational Refinement (continued)
• Key idea
– sort all candidate test thresholds (feature values) in increasing order
– focal function values are mostly the same across two consecutive thresholds
– cross-threshold reuse and incremental update
• Illustrative example (4 x 4 feature map, queen neighborhood, candidate thresholds δ in {1, 2, 3, 4, 5, 6, 7, 8}):

(a) feature values:
1 9 9 9
2 9 9 9
3 8 7 6
3 4 5 5

(b) indicators and focal values for δ = 1:
1 -1 -1 -1        -1  0.6  1  1
-1 -1 -1 -1       0.6  0.8  1  1
-1 -1 -1 -1        1    1   1  1
-1 -1 -1 -1        1    1   1  1

(c) for δ = 2, only the cell with feature value 2 flips its indicator, so only that cell and its neighbors need new focal values:
1 -1 -1 -1        -.3  0.2  1  1
1 -1 -1 -1        -.6  0.5  1  1
-1 -1 -1 -1       0.6  0.8  1  1
-1 -1 -1 -1        1    1   1  1

(d)-(e) for δ = 3, the two cells with feature value 3 flip, updated one at a time; final indicators and focal values:
1 -1 -1 -1        -.3  0.2  1  1
1 -1 -1 -1        -.2  .25  1  1
1 -1 -1 -1        -.2  .25  1  1
1 -1 -1 -1        -.3  0.2  1  1
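The cross-threshold reuse can be sketched as follows (hypothetical Python, using a Gamma-style focal function): when δ moves to the next candidate value, only cells whose indicator flips, plus their neighbors, need their focal values recomputed; every other value is reused.

```python
import copy

def queen(r, c, R, C):
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) != (0, 0) and 0 <= r + dr < R and 0 <= c + dc < C:
                yield r + dr, c + dc

def focal_at(I, r, c):
    """Mean of I(c)*I(n) over queen neighbors (Gamma-style focal value)."""
    R, C = len(I), len(I[0])
    nbrs = list(queen(r, c, R, C))
    return sum(I[r][c] * I[rr][cc] for rr, cc in nbrs) / len(nbrs)

def baseline(f, thresholds):
    """Recompute every indicator and focal value from scratch per threshold."""
    R, C = len(f), len(f[0])
    out = {}
    for d in thresholds:
        I = [[1 if v <= d else -1 for v in row] for row in f]
        out[d] = [[focal_at(I, r, c) for c in range(C)] for r in range(R)]
    return out

def incremental(f, thresholds):
    """Reuse focal values across thresholds; update flipped cells + neighbors."""
    R, C = len(f), len(f[0])
    d0 = thresholds[0]
    I = [[1 if v <= d0 else -1 for v in row] for row in f]
    G = [[focal_at(I, r, c) for c in range(C)] for r in range(R)]
    out, prev = {d0: copy.deepcopy(G)}, d0
    for d in thresholds[1:]:
        flipped = [(r, c) for r in range(R) for c in range(C)
                   if prev < f[r][c] <= d]
        for r, c in flipped:
            I[r][c] = 1                      # these cells now pass f <= d
        dirty = set(flipped)
        for r, c in flipped:
            dirty.update(queen(r, c, R, C))  # neighbors are also affected
        for r, c in dirty:
            G[r][c] = focal_at(I, r, c)
        out[d] = copy.deepcopy(G)
        prev = d
    return out

f = [[1, 9, 9, 9],
     [2, 9, 9, 9],
     [3, 8, 7, 6],
     [3, 4, 5, 5]]
thresholds = list(range(1, 9))   # candidate deltas {1, ..., 8}
b = baseline(f, thresholds)
inc = incremental(f, thresholds)
# both produce identical focal-value maps at every threshold
```

The baseline touches all R·C cells per threshold, while the incremental version touches only flipped cells and their neighbors, which is the source of the speedups reported in the evaluation.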
Proposed Approach: Theoretical Analysis
• Theorem: the incremental update algorithm is correct (it produces the same focal values as recomputation from scratch).
• Computational cost models: baseline algorithm vs. incremental update algorithm, in terms of the symbols below.

Symbol   Explanation
F        number of features
N        number of samples
Nd       number of distinct feature values
Smax     maximum neighborhood size
N0       minimum tree node size
Evaluation: Experiment Design
• Goals:
– Classification performance comparison
• spatial decision tree (SDT) versus decision tree (DT)
• classification accuracy
• salt-and-pepper noise level
– Computational performance comparison
• SDT baseline vs. refined algorithms
• Dataset (Chanhassen, MN):
– Classes: wetland, dry land
– Features: high-resolution (3 m x 3 m) aerial photos (RGB, NIR, NDVI) in 2003, 2005, 2008
– Training set: systematic cluster sampling; test set: remaining pixels in the scene
– Maximum neighborhood size: 11 pixels by 11 pixels
Classification Performance Evaluation

Model                         Confusion matrix             Prec.   Recall   F-score   Autocorrelation
DT                            99,141  10,688               0.81    0.75     0.78      0.87
                              15,346  45,805
FTSDT, fixed neighborhood     99,755  10,074               0.83    0.80     0.81      0.96
                              12,470  48,681
FTSDT, adaptive neighborhood  99,390  10,439               0.83    0.83     0.83      0.93
                              10,618  50,533

Confusion matrix layout: true negative, false positive (top row); false negative, true positive (bottom row).

Kappa statistics:
Model            Khat   Khat variance
DT               0.66   3.6*10^-6
FTSDT-Fixed      0.71   3.2*10^-6
FTSDT-Adaptive   0.73   3.0*10^-6

Significance tests:
Comparison                        Z-score   Significant
DT vs. FTSDT-Fixed                18.2      yes
FTSDT-Fixed vs. FTSDT-Adaptive    8.6       yes

(Figure: predicted maps for the decision tree, the spatial decision tree with fixed neighborhoods, and the spatial decision tree with adaptive neighborhoods; legend: true wetland, true dry land, false dry land, false wetland)

Trends:
1. DT suffers from salt-and-pepper noise.
2. SDT improves accuracy and reduces salt-and-pepper noise levels.
3. SDT with adaptive neighborhoods achieves the best accuracy.
Computational Performance Evaluation: Effect of Number of Training Samples
Analysis of results:
• The refined algorithm is much faster than the baseline algorithm.
• Time costs of both algorithms increase with training set size, but the rate of increase is much smaller for the refined algorithm.
Fixed variables: 12 features, 256 distinct feature values, max neighborhood 11 x 11, min tree node size 50.
Computational Performance Evaluation: Effect of Minimum Tree Node Size
Analysis of results:
• The refined algorithm is much faster than the baseline algorithm.
• Time costs of both algorithms decrease with minimum tree node size.
Fixed variables: 12 features, 256 distinct feature values, max neighborhood 11 x 11, 7,000 samples.
Computational Performance Evaluation: Effect of Maximum Neighborhood Size
Analysis of results:
• The refined algorithm is much faster than the baseline algorithm.
• Time costs of both algorithms increase with maximum neighborhood size.
Fixed variables: 12 features, 256 distinct feature values, min tree node size 50, 7,000 training samples.
Computational Performance Evaluation: Effect of Number of Distinct Feature Values
Analysis of results:
• The refined algorithm is much faster than the baseline algorithm.
• The time cost of the baseline algorithm increases almost linearly with the number of distinct feature values.
• The time cost of the refined algorithm is nearly unchanged as the number of distinct feature values grows.
Fixed variables: 12 features, max neighborhood size 11 x 11, min tree node size 50, 7,000 samples.
Conclusion
• Proposed a novel spatial decision tree (SDT)
– focal feature test
– adaptive spatial neighborhoods
• Computational optimization of the training algorithm
– theoretical analysis of correctness
– computational cost models
• Evaluation on real-world datasets
– SDT reduces salt-and-pepper noise and improves accuracy
– computational refinement reduces training time costs

Aspect                             Decision tree                          Spatial decision tree
Assumption on sample distribution  independent & identically distributed  spatial autocorrelation, anisotropy
Candidate node test                local feature test                     focal feature test
Candidate test selection           information gain                       spatial information gain
Computational structure            linear scanning                        incremental map update
Future Work
• Address the challenge of spatial heterogeneity
– spatial ensemble of (spatial) decision trees
– geographical partitioning and local models
• Spatiotemporal decision trees
– temporal autocorrelation
– temporal non-stationarity
• Novel applications
– precision agriculture