
Post on 02-Jan-2016


1

Introduction to Spatial Databases

Donghui Zhang

CCIS

Northeastern University

2

What is a spatial database?

• A database system that is optimized to store and query spatial objects:
– Point: a hotel, a car
– Line: a road segment
– Polygon: landmarks, layout of VLSI

[Figures: VLSI layout; road network; satellite image]

3

Are spatial databases useful?

• Geographical Information Systems
– e.g. data: road network and places of interest.
– e.g. usage: driving directions, emergency calls, standalone applications.
• Environmental Systems
– e.g. data: land cover, climate, rainfall, and forest fire.
– e.g. usage: find total rainfall.
• Corporate Decision-Support Systems
– e.g. data: store locations and customer locations.
– e.g. usage: determine the optimal location for a new store.
• Battlefield Soldier Monitoring Systems
– e.g. data: locations of soldiers (with or without medical equipment).
– e.g. usage: for each soldier with medical equipment, monitor the soldiers that may need help from him.

4

MapQuest.com

Shortest-Path Query Fastest-Path Query

5

• Driving directions as you go.
• Find nearest Wal-Mart or hospital.

NN Query

6

ArcGIS 9.2, ESRI

Range query


8

Aggregation query


13

RNN query

[Figure: five soldiers (Bob, John, George, Bill, Mike) on a map; Bob asks: “Who will seek help from me?”]

RNN(Bob) = {John, Mike}
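
A reverse nearest neighbor (RNN) query returns the objects whose own nearest neighbor is the query object. The brute-force sketch below illustrates the definition only. The coordinates are hypothetical (the slide gives none) and were picked so that the result agrees with RNN(Bob) = {John, Mike}; the function names are also illustrative.

```python
import math

# Hypothetical coordinates, chosen only so the result matches the slide.
soldiers = {
    "Bob": (0.0, 0.0),
    "John": (1.0, 0.0),
    "Mike": (0.0, -1.5),
    "George": (5.0, 5.0),
    "Bill": (5.0, 6.0),
}


def nearest_neighbor(name, points):
    """Return the name of the object closest to `name`, excluding itself."""
    here = points[name]
    return min((other for other in points if other != name),
               key=lambda other: math.dist(here, points[other]))


def reverse_nearest_neighbors(name, points):
    """Return every object whose nearest neighbor is `name`."""
    return {other for other in points
            if other != name and nearest_neighbor(other, points) == name}


print(reverse_nearest_neighbors("Bob", soldiers))
```

This version compares every pair of objects, so it is quadratic in the number of objects; it serves as a definition, not as an efficient method.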

14

And beyond the “space” …

• 2004 NBA dataset*: each player has 17 attributes
• “Spatial Data”: an object is a point in a 17-dimensional space
• Who are the best players?
– i.e. not “dominated” by any other player.

Name               Points   Rebounds   Assists   Steals   ……
Tracy McGrady      2003     484        448       135      ……
Kobe Bryant        1819     392        398       86       ……
Shaquille O'Neal   1669     760        200       36       ……
Yao Ming           1465     669        61        34       ……
Dwyane Wade        1854     397        520       121      ……
Steve Nash         1165     249        861       74       ……
……                 ……       ……         ……        ……       ……

* www.databaseBasketball.com

Skyline query
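
To make “dominated” concrete, here is a small Python sketch that applies the dominance test to the six rows shown in the table. It is an illustration under a stated assumption: only the four visible attributes are used, whereas the real dataset has 17, so the result may differ from the skyline of the full data. The function names are illustrative.

```python
# Rows copied from the table; only the four visible attributes are used
# (points, rebounds, assists, steals). Larger values are better.
players = {
    "Tracy McGrady": (2003, 484, 448, 135),
    "Kobe Bryant": (1819, 392, 398, 86),
    "Shaquille O'Neal": (1669, 760, 200, 36),
    "Yao Ming": (1465, 669, 61, 34),
    "Dwyane Wade": (1854, 397, 520, 121),
    "Steve Nash": (1165, 249, 861, 74),
}


def dominates(p, q):
    """p dominates q if p is at least as good in every attribute
    and strictly better in at least one."""
    return (all(a >= b for a, b in zip(p, q))
            and any(a > b for a, b in zip(p, q)))


def skyline(data):
    """Quadratic skyline: keep the players that nobody dominates."""
    return [name for name, stats in data.items()
            if not any(dominates(other, stats) for other in data.values())]


print(skyline(players))
```

On these six rows, Kobe Bryant is dominated by Tracy McGrady and Yao Ming by Shaquille O'Neal, so the other four players form the skyline.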


20

Research goals in spatial databases

• Support spatial database queries efficiently!
– range query, aggregation query, NN query, RNN query, optimal-location query, fastest-path query, skyline query, …

• Which statement is the best in a large spatial database?
(a) Both an O(n²) algorithm and an O(n) algorithm are efficient.
(b) An O(n²) algorithm is not efficient, but an O(n) algorithm is.
(c) Neither an O(n²) algorithm nor an O(n) algorithm is efficient.

Answer: (c)! Even a linear algorithm is not efficient!

21

Research goals in spatial databases

• Example of a linear algorithm: to find my nearest Wal-Mart, compare my location with all Wal-Marts in the world.

• Example of a quadratic algorithm: to find the skyline of NBA players, compare every player against all other players (to see whether he is dominated).

• Sample scenario:
– Disk page size: 8 KB.
– Database size: 1 GB = 131,072 disk pages.
– Let each disk I/O take 10⁻³ second.

• O(n): 131 seconds ≈ 2 minutes. (Not efficient!)
• O(n²): about 200 days! (Out of the question!)
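
The two estimates follow directly from the sample scenario. The short Python check below reproduces them, assuming (as the slide does) that the cost is one disk I/O per page for the linear case and one per pair of pages for the quadratic case; the variable names are illustrative.

```python
PAGE_SIZE = 8 * 1024        # 8 KB per disk page
DB_SIZE = 1024 ** 3         # 1 GB database
IO_SECONDS = 1e-3           # one disk I/O takes 10^-3 second

pages = DB_SIZE // PAGE_SIZE                        # number of disk pages, n
linear_seconds = pages * IO_SECONDS                 # O(n): one I/O per page
quadratic_days = pages ** 2 * IO_SECONDS / 86_400   # O(n^2), converted to days

print(pages)                      # 131072
print(round(linear_seconds))      # 131
print(round(quadratic_days, 1))   # 198.8
```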

22

How can you do better than O(n)?

• Answer: use (disk-based) index structures!

• However, 1-dimensional index structures, e.g. the B+-tree, are not efficient for spatial queries.

• E.g. to search for hotels in Boston…

23

A 1-dim index is not good enough

Suppose a B+-tree exists on X.


25

Content

• The R-tree
– Range Query
– Aggregation Query

• NN Query

• Skyline Query

• Highlights of Our Research

26

R-Tree Motivation

[Figure: points a to m plotted on a grid; the x axis and the y axis each run from 0 to 10.]

Range query: find the objects in a given range.
E.g. find all hotels in Boston.

No index: scan through all objects. NOT EFFICIENT!

27

R-Tree: Clustering by Proximity

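[Figure: points a to m on the grid (x axis and y axis from 0 to 10), clustered by proximity into leaf nodes, each enclosed by a Minimum Bounding Rectangle (MBR): E3 = {a, b, c}, E4 = {d, e}, E5 = {f, g, h}, E6 = {i, j, k}, E7 = {l, m}. Tree: Root holds E1 and E2; E1 holds E3, E4, E5; E2 holds E6, E7.]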

28

R-Tree

[Figure, two builds: first the leaf-level MBRs E3 to E7 drawn around points a to m, then the upper-level MBRs E1 (enclosing E3, E4, E5) and E2 (enclosing E6, E7). Tree: Root holds E1 and E2; E1 holds E3, E4, E5; E2 holds E6, E7.]

30

Range Query

[Figure, two builds: the R-tree over points a to m (Root holds E1 and E2; E1 holds E3, E4, E5; E2 holds E6, E7) with its MBRs on the grid, used to illustrate a range query.]
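
A range query on an R-tree can be sketched as a recursive descent that skips every subtree whose MBR does not intersect the query rectangle. The Python code below is a minimal illustration, not the slides' implementation: the point coordinates are hypothetical, the tree is built by hand with the same shape as in the figure, and MBRs are recomputed on the fly, whereas a real R-tree stores them in the index entries.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: list = field(default_factory=list)  # child nodes (internal node)
    points: dict = field(default_factory=dict)    # name -> (x, y) (leaf node)

    def mbr(self):
        """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of the subtree."""
        if self.points:
            xs, ys = zip(*self.points.values())
            return (min(xs), min(ys), max(xs), max(ys))
        boxes = [child.mbr() for child in self.children]
        return (min(b[0] for b in boxes), min(b[1] for b in boxes),
                max(b[2] for b in boxes), max(b[3] for b in boxes))


def intersects(a, b):
    """True if rectangles a and b overlap (touching counts as overlap)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_query(node, query, hits):
    """Append to `hits` the names of all points inside `query`."""
    if not intersects(node.mbr(), query):
        return hits  # prune: nothing below this node can qualify
    for name, (x, y) in node.points.items():
        if query[0] <= x <= query[2] and query[1] <= y <= query[3]:
            hits.append(name)
    for child in node.children:
        range_query(child, query, hits)
    return hits


# Hypothetical coordinates; only the grouping follows the figure.
e3 = Node(points={"a": (1, 2), "b": (2, 4), "c": (3, 3)})
e4 = Node(points={"d": (1, 7), "e": (2, 8)})
e5 = Node(points={"f": (4, 7), "g": (3, 9), "h": (5, 9)})
e6 = Node(points={"i": (7, 2), "j": (8, 4), "k": (9, 3)})
e7 = Node(points={"l": (7, 8), "m": (9, 9)})
root = Node(children=[Node(children=[e3, e4, e5]), Node(children=[e6, e7])])

print(range_query(root, (6, 1, 8.5, 5), []))
```

With this query rectangle, the whole E1 subtree and the leaf E7 are skipped because their MBRs do not intersect the range; only E2 and E6 are visited.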

32

Aggregation Query

• Given a range, find some aggregate value of objects in this range.

• COUNT, SUM, AVG, MIN, MAX

• E.g. find the total number of hotels in Massachusetts.

• Straightforward approach: reduce to a range query.

• Better approach: along with each index entry, store aggregate of the sub-tree.
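
The “better approach” can be sketched as follows for COUNT: each node stores the number of objects in its subtree, and when a node's MBR lies entirely inside the query range the stored count is used without descending. This is a minimal Python illustration under assumptions: the coordinates are hypothetical, and the dictionary-based node layout and helper names are invented for the example.

```python
def leaf(points):
    """Leaf node: stores its points, its MBR and its COUNT aggregate."""
    xs, ys = zip(*points)
    return {"mbr": (min(xs), min(ys), max(xs), max(ys)),
            "count": len(points), "points": points}


def internal(children):
    """Internal node: its MBR and COUNT are derived from its children."""
    boxes = [c["mbr"] for c in children]
    return {"mbr": (min(b[0] for b in boxes), min(b[1] for b in boxes),
                    max(b[2] for b in boxes), max(b[3] for b in boxes)),
            "count": sum(c["count"] for c in children), "children": children}


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def count_in_range(node, query):
    """COUNT of the objects inside `query` (xmin, ymin, xmax, ymax)."""
    if not intersects(node["mbr"], query):
        return 0                      # disjoint: contributes nothing
    if contains(query, node["mbr"]):
        return node["count"]          # fully covered: use the stored aggregate
    if "points" in node:              # partially covered leaf: test each point
        return sum(1 for x, y in node["points"]
                   if query[0] <= x <= query[2] and query[1] <= y <= query[3])
    return sum(count_in_range(child, query) for child in node["children"])


# Hypothetical coordinates; node sizes follow the counts in the figure
# (E3:3, E4:2, E5:3, E6:3, E7:2, hence E1:8 and E2:5).
e1 = internal([leaf([(1, 2), (2, 4), (3, 3)]),
               leaf([(1, 7), (2, 8)]),
               leaf([(4, 7), (3, 9), (5, 9)])])
e2 = internal([leaf([(7, 2), (8, 4), (9, 3)]),
               leaf([(7, 8), (9, 9)])])
root = internal([e1, e2])

print(count_in_range(root, (0, 0, 6, 10)))   # E1 is fully covered
print(count_in_range(root, (0, 0, 8, 10)))   # E2 must be partially explored
```

In the first call the answer comes from the count stored for E1 alone, without reading any of its leaves; the same idea extends to SUM, MIN and MAX by storing the corresponding aggregate.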

33

Aggregation Query

[Figure, two builds: the same R-tree with a COUNT stored in each index entry: E1:8, E2:5, E3:3, E4:2, E5:3, E6:3, E7:2. Tree: Root holds E1 and E2; E1 holds E3, E4, E5; E2 holds E6, E7.]

Subtree pruned!


36

Nearest Neighbor (NN) Query

• Given a query location q, find the nearest object.

• E.g.: given a hotel, find its nearest bar.

[Figure: a query point q and an object a.]

37

A Useful Metric: MINDIST

• MINDIST(q, E1): the minimum distance between q and an MBR E1.

• It is a lower bound of d(o, q) for every object o in E1.

[Figure: a query point q, an MBR E1, and the segment MINDIST(q, E1) between them.]
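
For a point and an axis-aligned rectangle, MINDIST can be computed per axis: the gap along an axis is zero when the point's coordinate falls within the rectangle's extent, and otherwise it is the distance to the nearer side. The Python sketch below assumes Euclidean distance and an (xmin, ymin, xmax, ymax) rectangle layout; both are choices made for the example.

```python
import math


def mindist(q, mbr):
    """Minimum Euclidean distance between point q and rectangle mbr.

    q is (x, y); mbr is (xmin, ymin, xmax, ymax).
    """
    qx, qy = q
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - qx, 0.0, qx - xmax)  # horizontal gap, 0 if q is within the x extent
    dy = max(ymin - qy, 0.0, qy - ymax)  # vertical gap, 0 if q is within the y extent
    return math.hypot(dx, dy)


print(mindist((0, 0), (3, 4, 5, 6)))   # nearest corner is (3, 4): 5.0
print(mindist((4, 0), (3, 4, 5, 6)))   # directly below the rectangle: 4.0
print(mindist((4, 5), (3, 4, 5, 6)))   # inside the rectangle: 0.0
```

Because every object o in E1 lies inside the rectangle, d(o, q) can never be smaller than this value, which is what makes it usable for pruning.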

38

NN Basic Algorithm

• Keep a heap H of index entries and objects, ordered by MINDIST.

• Initially, H contains the root.

• While H is not empty:
– Extract the element with minimum MINDIST.
– If it is an index entry, insert its children into H.
– If it is an object, return it as the NN.

• End while
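
The loop above can be sketched in Python with the standard `heapq` module. This is an illustration under assumptions: index entries and objects are plain dictionaries invented for the example, an object is stored as a degenerate rectangle so that one MINDIST function serves both, and the coordinates and query point are hypothetical (chosen so that the answer is i, as in the example that follows).

```python
import heapq
import itertools
import math


def mindist(q, mbr):
    """Minimum Euclidean distance from point q to rectangle (xmin, ymin, xmax, ymax)."""
    dx = max(mbr[0] - q[0], 0.0, q[0] - mbr[2])
    dy = max(mbr[1] - q[1], 0.0, q[1] - mbr[3])
    return math.hypot(dx, dy)


def obj(name, x, y):
    """An object, stored as a degenerate rectangle."""
    return {"name": name, "mbr": (x, y, x, y)}


def entry(*children):
    """An index entry whose MBR encloses all of its children."""
    boxes = [c["mbr"] for c in children]
    return {"mbr": (min(b[0] for b in boxes), min(b[1] for b in boxes),
                    max(b[2] for b in boxes), max(b[3] for b in boxes)),
            "children": list(children)}


def nearest_neighbor(root, q):
    """Best-first NN search: always process the element with minimum MINDIST."""
    tie = itertools.count()  # tie-breaker so the heap never compares dictionaries
    heap = [(mindist(q, root["mbr"]), next(tie), root)]
    while heap:
        dist, _, item = heapq.heappop(heap)
        if "name" in item:
            return item["name"], dist  # everything left in H is at least this far
        for child in item["children"]:
            heapq.heappush(heap, (mindist(q, child["mbr"]), next(tie), child))
    return None


# Hypothetical coordinates; the tree shape follows the R-tree figure.
e1 = entry(entry(obj("a", 1, 2), obj("b", 2, 4), obj("c", 3, 3)),
           entry(obj("d", 1, 7), obj("e", 2, 8)),
           entry(obj("f", 4, 7), obj("g", 3, 9), obj("h", 5, 9)))
e2 = entry(entry(obj("i", 7, 2), obj("j", 8, 4), obj("k", 9, 3)),
           entry(obj("l", 7, 8), obj("m", 9, 9)))
root = entry(e1, e2)

print(nearest_neighbor(root, (6, 3)))
```

Returning the first object extracted is safe because MINDIST is a lower bound: any entry still in the heap has a MINDIST at least as large, so nothing inside it can be closer.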

39

NN Query Example

[Figure: the R-tree over points a to m with a query point on the grid. MINDIST values from the query: E1 = 1, E2 = 2, E3 = 5, E4 = 9, E5 = 5, E6 = 2, E7 = 13; objects in E6: i = 2, j = 10, k = 13.]

Action                   Heap
Visit Root               E1:1, E2:2
Follow E1                E2:2, E3:5, E5:5, E4:9
Follow E2                E6:2, E3:5, E5:5, E4:9, E7:13
Follow E6                i:2, E3:5, E5:5, E4:9, j:10, E7:13, k:13
Report i and terminate


45

Skyline of Manhattan

• Which buildings can we see?
– Those that are not dominated. A building is dominated if it is both further away and shorter than another building.

46

A skyline example: best hotels

• Which one is better?
– i or h? (i, because its price and distance dominate those of h)
– i or k?

[Figure: hotels a to n plotted by price and distance to the beach; both axes run from 0 to 10.]

47

A skyline example: best hotels

• The skyline: a, i, k.

[Figure: the same plot of hotels a to n by price and distance; both axes run from 0 to 10.]

48

Branch-and-Bound Skyline (BBS)

• Assume all points are indexed in an R-tree.

• mindist(MBR) = the L1 distance between its lower-left corner and the origin.

• Each heap entry keeps the mindist of the MBR.

• Process entries in ascending order of their mindists.

[Figure: the hotel points a to n on the grid (both axes from 0 to 10), grouped into leaf nodes N1 to N5 (N5 holds m and n). Tree: the root R holds e6 and e7; e6 points to node N6, which holds e1 and e2 (nodes N1, N2); e7 points to node N7, which holds e3, e4 and e5 (nodes N3, N4, N5).]

Example of BBS

Action        Heap contents                        S
access root   <e7,4> <e6,6>                        ∅
expand e7     <e3,5> <e6,6> <e5,8> <e4,10>         ∅
expand e3     <i,5> <e6,6> <e5,8> <e4,10>          {i}
expand e6     <e5,8> <e1,9> <e4,10>                {i}
remove e5     <e1,9> <e4,10>                       {i}
expand e1     <a,10> <e4,10>                       {i,a}
expand e4     <k,10>                               {i,a,k}
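
The BBS procedure traced in the example can be sketched in Python as follows. This is a simplified illustration, not the authors' code: the dictionary layout and helper names are invented for the example, smaller values are taken to be better on both axes, and the point coordinates and leaf grouping are hypothetical, chosen only so that the skyline comes out as {a, i, k} like on the slides.

```python
import heapq
import itertools


def dominates(p, q):
    """p dominates q if p is no worse on every axis and better on at least one
    (smaller is better: cheaper and closer to the beach)."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def point(name, x, y):
    return {"name": name, "low": (x, y)}


def node(*children):
    """An index entry; `low` is the lower-left corner of its MBR."""
    low = tuple(min(c["low"][d] for c in children) for d in range(2))
    return {"low": low, "children": list(children)}


def bbs_skyline(root):
    skyline = []             # (name, coordinates) of the skyline points found so far
    tie = itertools.count()  # tie-breaker so the heap never compares dictionaries
    heap = [(sum(root["low"]), next(tie), root)]  # key: L1 mindist to the origin
    while heap:
        _, _, item = heapq.heappop(heap)
        if any(dominates(s, item["low"]) for _, s in skyline):
            continue         # dominated entry: discard it and its whole subtree
        if "children" in item:
            for child in item["children"]:
                if not any(dominates(s, child["low"]) for _, s in skyline):
                    heapq.heappush(heap, (sum(child["low"]), next(tie), child))
        else:
            skyline.append((item["name"], item["low"]))
    return [name for name, _ in skyline]


# Hypothetical data: coordinates and grouping are made up for the example.
n6 = node(node(point("a", 1, 9), point("b", 2, 10)),
          node(point("c", 4, 8), point("d", 6, 7), point("e", 9, 10)))
n7 = node(node(point("h", 4, 3), point("i", 3, 2)),
          node(point("k", 9, 1), point("l", 10, 4)),
          node(point("m", 7, 5), point("n", 8, 3)))
root = node(n6, n7)

print(sorted(bbs_skyline(root)))
```

Processing in ascending mindist order is what makes the result correct without a second pass: any point that could dominate the one being extracted has a strictly smaller L1 distance to the origin, so it has already been examined. The order in which points with equal mindist are reported depends on tie-breaking and may differ from the slide.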

57

The Compressed Skycube [SIGMOD’06]

• Goal: support skyline queries for an arbitrary subset of dimensions.

• Pre-computing all skylines:
– too much space
– expensive update

• The Compressed Skycube is a very compact representation of all skylines, with efficient query and update support.

58

The Optimal-Location Query [SSTD’05, VLDB’06]

• The optimal location of a potential new store can be defined as a location which
– maximizes the number of customers who will be “attracted”, or
– maximizes the combined saving for the customers in their traveling distance to the nearest store.

• There seem to be infinitely many candidate locations to check.

• Efficient algorithms to find exact answers.

59

Continuous RNN Monitoring [ICDE’06, ICDE’07]

• In a battlefield, the RNNs of a soldier with medical equipment are the soldiers that may need to receive help from him.

• To continuously monitor the RNNs in real time while all objects are moving is challenging.

• We proposed a solution to the monochromatic case.

• We collaborated with the Univ. of Minnesota to solve the bichromatic case.

60

Fastest-path computation [ICDE’06]

• MapQuest provides driving directions without asking for the departure time.

• During rush hour, the best route should be different.

• Suppose each road segment has a speed pattern.

• We provide solutions for finding the fastest path, given a departure-time INTERVAL.

• “I may leave for work some time between 7 and 9. Suggest all fastest paths, e.g. if leaving during [7:43, 8:06], take route A, otherwise take route B”.

61

Summary

• Spatial databases have many practical applications.

• Spatial database research aims to design efficient algorithms for various queries.

• The talk mentioned a few (range query, aggregation query, NN query, RNN query, optimal-location query, fastest-path query, and skyline query).

• There are many more: this is an ongoing research field.
