the planar point location problem

Michal Balas 1 I/O-efficient Point Location using Persistent B-Trees Lars Arge, Andrew Danner, and Sha-Mayn Teh Department of Computer Science, Duke University (2003)

Page 1: The Planar Point Location Problem

Michal Balas 1

I/O-efficient Point Location using Persistent B-Trees

Lars Arge, Andrew Danner, and Sha-Mayn Teh

Department of Computer Science, Duke University (2003)

Page 2: The Planar Point Location Problem


The Planar Point Location Problem

Storing a planar subdivision defined by N line segments such that the region containing a query point p can be computed efficiently

Page 3: The Planar Point Location Problem

Planar Point Location Applications

Geographic information systems (GIS)
Spatial databases
Graphics

Usually the datasets are larger than physical memory and must reside on disk.

Page 4: The Planar Point Location Problem

Previous Works

A few theoretically I/O-efficient structures have been developed, but all are relatively complicated and none has been implemented.

Vahrenhold and Hinrichs (2001) suggested a heuristic structure that is simple and efficient but not theoretically optimal.

Page 5: The Planar Point Location Problem

Goal

Find a planar point location structure that minimizes the number of I/Os needed to answer a query and that is efficient both in theory and in practice.

Page 6: The Planar Point Location Problem

Lecture’s Road Map

Motivation
The Vertical Ray Shooting problem and the need for persistent data structures
Review:
  B-trees, B+-trees, and the I/O model
  Persistent B-trees
The modified Persistent B-tree
Experimental results
Open problems

Page 7: The Planar Point Location Problem


Vertical Ray Shooting

A generalized version of the Planar Point Location problem

Given a set of N non-intersecting segments in the plane, construct a data structure such that the segment directly above a query point p can be found efficiently

We will consider this problem.

Page 8: The Planar Point Location Problem


Example

Page 9: The Planar Point Location Problem


Vertical Ray Shooting

Based on the persistent search tree idea of Sarnak and Tarjan (1986).

Any vertical line l in the plane introduces an “above-below” order on the segments it intersects.

We will “sweep” the plane from left to right with a vertical line

Our “critical” x-axis points are the endpoints of all segments

Page 10: The Planar Point Location Problem

Vertical Ray Shooting & Persistent Search Trees

Sort the critical points by x-value.

For each critical point p_i = (x_i, y_i) we can build a search tree on the segments intersecting the vertical line at x_i, ordered by their y-values at x_i.

Until the next critical point p_{i+1} the tree is static; it changes only at the next left or right endpoint of a segment.

Page 11: The Planar Point Location Problem

Vertical Ray Shooting & Persistent Search Trees

Worst-case analysis: keeping a separate search tree for each critical point takes $O(n^2)$ space.
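The naive approach can be sketched in Python as follows. This is only an illustration under stated assumptions: segments are `((x1, y1), (x2, y2))` tuples with `x1 < x2`, all endpoint x-values are distinct, and the helper names `y_at`, `build_slabs`, and `segment_above` are hypothetical. Each slab between consecutive critical points gets its own sorted list, which is what makes the space quadratic in the worst case.

```python
from bisect import bisect_right


def y_at(seg, x):
    """y-coordinate of a non-vertical segment at abscissa x."""
    (x1, y1), (x2, y2) = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)


def build_slabs(segments):
    """Store one sorted segment list per slab (naive, O(n^2) space)."""
    xs = sorted({p[0] for seg in segments for p in seg})
    slabs = []
    for left, right in zip(xs, xs[1:]):
        mid = (left + right) / 2
        # Segments spanning the whole slab between two critical points.
        alive = [s for s in segments if s[0][0] <= left and s[1][0] >= right]
        alive.sort(key=lambda s: y_at(s, mid))
        slabs.append(alive)
    return xs, slabs


def segment_above(xs, slabs, p):
    """Segment directly above (or through) p, or None if there is none."""
    px, py = p
    i = bisect_right(xs, px) - 1
    if i < 0 or i >= len(slabs):
        return None
    slab = slabs[i]
    lo, hi = 0, len(slab)
    while lo < hi:  # binary search on the above-below order at px
        mid = (lo + hi) // 2
        if y_at(slab[mid], px) < py:
            lo = mid + 1
        else:
            hi = mid
    return slab[lo] if lo < len(slab) else None
```

The binary search relies on the segments being non-intersecting, so their above-below order is the same everywhere inside a slab.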

Page 12: The Planar Point Location Problem


Vertical Ray Shooting & Persistent Search Trees

We should use the fact that two consecutive trees (versions) differ only by one insertion or deletion (assuming distinct x-values for all endpoints).

Page 13: The Planar Point Location Problem

Vertical Ray Shooting & Persistent Search Trees

Persistent data structure:
Preserves versions. In ordinary (ephemeral) data structures only the latest version exists: every update changes the data structure, so its state before the update can no longer be accessed.
Each update creates a new version.
The current version of the structure can be modified, and all versions of the structure, past and present, can be accessed.

Page 14: The Planar Point Location Problem

Vertical Ray Shooting & Persistent Search Trees

We would like to save a version of the search tree for each critical point. Since we want to be space efficient, we will use a persistent search tree.

A persistent search tree differs from an ordinary search tree in that after an insertion or deletion, the old version of the tree can still be accessed.

Here the persistent search tree should support insertions and deletions in the present and queries in the past (it is partially persistent).

Page 15: The Planar Point Location Problem


Vertical Ray Shooting & Persistent Search Trees

We will insert a segment into the persistent search tree when its left endpoint is encountered

We will delete a segment persistently from the tree when its right endpoint is encountered.

Two consecutive versions of the tree differ only by a certain number of deletions and insertions (in the distinct x-values case by 1 only)
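As a small sketch of the sweep described above, the update sequence can be generated by sorting the segment endpoints by x. The function name `sweep_events` and the tuple layout are assumptions made for this illustration, and endpoint x-values are assumed to be distinct.

```python
def sweep_events(segments):
    """Return (x, kind, segment) updates in left-to-right sweep order.

    Each segment is ((x1, y1), (x2, y2)) with x1 < x2. A segment is
    inserted at its left endpoint and deleted at its right endpoint.
    """
    events = []
    for seg in segments:
        (x1, _), (x2, _) = seg
        events.append((x1, "insert", seg))
        events.append((x2, "delete", seg))
    events.sort(key=lambda e: e[0])
    return events
```

Applying these updates one by one to a partially persistent search tree yields one version per critical point.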

Page 16: The Planar Point Location Problem


Vertical Ray Shooting & Persistent Search Trees

Given a query point p=(x,y) , we will search for the position of y in the version of the search tree when the sweep line was at x.

Page 17: The Planar Point Location Problem

Vertical Ray Shooting & Persistent Search Trees

Path Copying:
Use a balanced search tree.
When x is inserted, the changes are confined to the path from the root to x.
Instead of copying the whole tree, we copy only the updated path.
The roots are ordered by version.

[Figure: a search tree with the path from the root to the inserted element x.]

Page 18: The Planar Point Location Problem

Vertical Ray Shooting & Persistent Search Trees

Path Copying:
Space: $O(n \log n)$, which is better but still not good enough.

[Figure: roots r1 and r2 of two versions of the tree, with the path to x.]
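Path copying can be sketched in a few lines of Python. This is a simplified illustration, not the structure from the lecture: the tree is left unbalanced for brevity, keys are plain integers, and the names `Node`, `insert`, and `contains` are hypothetical.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    key: int
    left: Optional["Node"]
    right: Optional["Node"]


def insert(root, key):
    """Return the root of a new version; the old version stays intact."""
    if root is None:
        return Node(key, None, None)
    if key < root.key:
        # Copy this node; the untouched right subtree is shared.
        return Node(root.key, insert(root.left, key), root.right)
    if key > root.key:
        # Copy this node; the untouched left subtree is shared.
        return Node(root.key, root.left, insert(root.right, key))
    return root  # key already present, nothing to copy


def contains(root, key):
    """Search one version, identified by its root."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False
```

Every update copies one root-to-leaf path, so with a balanced tree each version costs a logarithmic number of new nodes, which matches the $O(n \log n)$ space bound stated above.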

Page 19: The Planar Point Location Problem

Vertical Ray Shooting & Persistent Search Trees

Extra Pointers:
Instead of copying the path, we store several pointers in each node (a list of left children and a list of right children, even though it is a binary tree).

[Figure: a node whose left and right pointers are labeled with the versions t1 and t2.]

Page 20: The Planar Point Location Problem

Vertical Ray Shooting & Persistent Search Trees

Extra Pointers:
Here there is no limit on the # of pointers per node.
In the worst case it takes $O(\log n)$ time per node to find the pointer for the relevant version (the pointers are kept in a binary search tree), which is not optimal.
We need constant time per node.

Page 21: The Planar Point Location Problem

Vertical Ray Shooting & Persistent Search Trees

Sarnak & Tarjan solution:
Limited node copying, with k extra pointers per node.
k should be a small positive number (k = 1 will do).
When a pointer is added to a node that has no empty slot for it, we copy the node, setting the initial left and right pointers of the copy to their latest values.
We then update the parent to point to the new copy; if the parent has no free slot, the process is repeated.

Page 22: The Planar Point Location Problem

Vertical Ray Shooting & Persistent Search Trees

Sarnak & Tarjan solution - Space analysis

Amortized analysis: we will see that every sequence of m operations takes O(m) space.

The potential of the structure is defined to be:

$\Phi = \frac{1}{k}\,(\#\text{ free slots in the live nodes}) - (\#\text{ live nodes})$

amortized space cost of an update = (actual # of nodes it creates) $-\ \Delta\Phi$

Page 23: The Planar Point Location Problem

Vertical Ray Shooting & Persistent Search Trees

Sarnak & Tarjan solution - Space analysis

We will show that the amortized space cost is bounded by O(1) per update.

If an unused slot in node v is filled, so that no node has to be copied, then the actual # of new nodes created is 0 and $\Delta\Phi = -1/k$ (the # of free slots in live nodes decreases by 1), thus the amortized space cost of this step is 1/k.

If node copying occurs, the actual # of new nodes created is 1 and $\Delta\Phi = 1$ (the # of free slots in live nodes increases by k), thus the amortized space cost of this step is 0.

Page 24: The Planar Point Location Problem

Vertical Ray Shooting & Persistent Search Trees

Sarnak & Tarjan solution - Space analysis

During an update, node copying continues along the path from the node to the root until the root is copied or a node with a free slot is reached.

The amortized space cost of copying a node is 0 and of occupying a free slot is 1/k.

Page 25: The Planar Point Location Problem

Vertical Ray Shooting & Persistent Search Trees

Sarnak & Tarjan solution - Space analysis

The total amortized space cost of an update is constant (0 or 1/k).
The space for rebalancing information per node is constant.
In red-black trees, rebalancing after an insertion or deletion can be done with O(1) rotations and O(1) color changes per update, amortized.
Since an insertion or deletion requires O(1) new pointers, not counting node copying, the amortized space cost of an update is O(1).

Page 26: The Planar Point Location Problem

Vertical Ray Shooting & Persistent Search Trees

Sarnak & Tarjan solution - Space analysis

Sum over all n updates, where c is the constant bounding the amortized cost of one update:

amortized space cost over all updates $= cn =$ required space $-\ (\Phi_{end} - \Phi_{start})$

$\Phi_{start} = 0$ (we start with an empty data structure)

$\Phi_{end} = O(n)$ (according to the definition of the potential function, this is an upper bound on the potential at the end)

Required space $= cn + O(n) = O(n)$ (this is a bound on the number of nodes created)

Page 27: The Planar Point Location Problem

Vertical Ray Shooting & Persistent Search Trees

Sarnak & Tarjan solution - Complexity
$O(\log m)$ query time (m is the total # of updates)
$O(\log n)$ update time (n is the current size of the set)
O(1) amortized space per update
$O(n \log n)$ preprocessing time

Page 28: The Planar Point Location Problem

Where are we going?

The use of persistent data structures (a persistent structure always preserves the previous version of itself when it is modified)
The use of B-trees in the I/O model (the B-tree is the I/O-model equivalent of a search tree)
I/O-efficient persistent B-tree (works well with totally ordered elements)
Modified I/O-efficient persistent B-tree (only elements present in the same version of the structure need to be comparable)

Page 29: The Planar Point Location Problem

Vertical Ray Shooting & Persistent Search Trees

Two segments that are not intersected by a common vertical line are not comparable in the “above-below” order.

Corollary: not all segments stored in the persistent structure over its lifespan are comparable.

An I/O-efficient structure cannot be obtained directly from a persistent B-tree, because standard persistent B-trees require a total order on all elements.
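The following Python sketch illustrates why the above-below order is only partial. It assumes segments are `((x1, y1), (x2, y2))` tuples with `x1 < x2` that do not intersect; the names `y_at`, `common_x_range`, and `is_below` are hypothetical helpers for this illustration.

```python
def y_at(seg, x):
    """y-coordinate of a non-vertical segment at abscissa x."""
    (x1, y1), (x2, y2) = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)


def common_x_range(a, b):
    """x-interval in which a vertical line crosses both segments, or None."""
    lo = max(a[0][0], b[0][0])
    hi = min(a[1][0], b[1][0])
    return (lo, hi) if lo <= hi else None


def is_below(a, b):
    """True if a lies below b; only defined when a vertical line hits both."""
    rng = common_x_range(a, b)
    if rng is None:
        raise ValueError("no common vertical line: segments are not comparable")
    x = (rng[0] + rng[1]) / 2
    return y_at(a, x) < y_at(b, x)
```

Two segments that are alive in the same version of the sweep always share a vertical line, so a structure that compares only such pairs never reaches the error case.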

Page 30: The Planar Point Location Problem


Vertical Ray Shooting & Persistent Search Trees

To make the structure I/O-efficient, we need to modify the tree so it will only require elements present in the same version of the structure to be comparable


Page 32: The Planar Point Location Problem

Review: The I/O Model

Infinite disk size
M - main memory size
B - block size
N - number of elements in the structure

[Figure: disk D and main memory M, with data moved between them by block I/O.]

Page 33: The Planar Point Location Problem


Review: The I/O Model - Cont

Computation can only occur on data stored in main memory.

We are interested in the number of I/Os used to answer a query.

The B-tree is the external memory equivalent of the balanced search tree in internal memory.

Page 34: The Planar Point Location Problem

Review: B-tree

A balanced search tree
All leaves are on the same level
All internal nodes (except the root) have between B/2 and B children ($\Theta(B)$)
A node/leaf can be stored in O(1) blocks

Page 35: The Planar Point Location Problem

Review: B-tree - Cont

Space complexity of the tree: O(N/B) blocks (where N is the number of elements), that is, linear

Tree height: $O(\log_B N)$

Insert/Delete can be done with $O(\log_B N)$ I/Os
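To get a feeling for the difference between $\log_2 N$ and $\log_B N$, here is a small arithmetic sketch in Python. The function name `levels` and the sample values of N and B are assumptions chosen for illustration, and constant factors are ignored.

```python
def levels(n, fanout):
    """Smallest h such that fanout**h >= n (integer arithmetic only)."""
    h, capacity = 0, 1
    while capacity < n:
        capacity *= fanout
        h += 1
    return h


N = 10**9
binary_tree_levels = levels(N, 2)   # node visits in a balanced binary tree
btree_levels = levels(N, 1000)      # block reads with fanout B = 1000
```

For a billion elements this gives 30 levels for a binary tree against 3 levels for fanout 1000, which is why the fanout matters when every level costs one I/O.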

Page 36: The Planar Point Location Problem


Review: B+-tree

It is a B-tree in which all elements are stored in the leaves.

The internal nodes contain “routing elements”.

Page 37: The Planar Point Location Problem

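B-tree Example (B+-tree)

[Figure: a B+-tree. The root holds the routing elements 3 and 5; the leaves hold the keys 1 2 3, 4 5, and 6 7, which refer to the data items d1 through d7.]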
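A lookup in such a small B+-tree can be sketched in Python. The layout below is an assumption about the example figure (a root with routing elements 3 and 5 over the leaves 1 2 3, 4 5, and 6 7), and the names `ROOT`, `DATA`, and `lookup` are hypothetical.

```python
from bisect import bisect_left

# Assumed example tree: each routing element is the maximal key of the
# child to its left; the last child has no routing element of its own.
ROOT = {"routing": [3, 5], "children": [[1, 2, 3], [4, 5], [6, 7]]}
DATA = {key: f"d{key}" for key in range(1, 8)}


def lookup(key):
    """Follow the routing elements to one leaf, then look there only."""
    child = bisect_left(ROOT["routing"], key)
    leaf = ROOT["children"][child]
    return DATA[key] if key in leaf else None
```

All elements live in the leaves; the internal node is used only to decide which leaf to read.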


Page 39: The Planar Point Location Problem

Review: Persistent B-tree

Directed acyclic graph
  The elements are in the sinks (leaves)
  “Routing elements” are in the internal nodes
Elements (and nodes) are augmented with an “existence interval”
  In this interval the element is “alive”
  An element is “alive” between its insert version and its delete version

Page 40: The Planar Point Location Problem

Review: Persistent B-tree - Cont

Nodes “alive” at time t form an $(\alpha B, B)$ B-tree; we will work with $\alpha = 1/4$.

Additional invariant: a new node must contain between $(\alpha+\gamma)B$ and $(1-\gamma)B$ alive elements. (For $\alpha = 1/4$ and $\gamma = 1/8$, a new node contains between (3/8)B and (7/8)B alive elements.)

We require that ...

Page 41: The Planar Point Location Problem

Review: Persistent B-tree - Cont

To find the appropriate root at time t, the roots are stored in a standard B-tree; locating the root takes $O(\log_B N)$ I/Os.

A node/leaf contains O(B) elements = O(1) blocks

# blocks needed to hold the structure: O(N/B)
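The two lookups that versions require can be sketched in Python. This is an in-memory stand-in, not the external-memory structure: a sorted Python list replaces the B-tree over the roots, existence intervals are assumed to be half-open `(insert_version, delete_version)` pairs, and the names `root_at` and `is_alive` are hypothetical.

```python
from bisect import bisect_right


def root_at(version_times, roots, t):
    """Root of the latest version created at or before time t, or None."""
    i = bisect_right(version_times, t) - 1
    return roots[i] if i >= 0 else None


def is_alive(interval, t):
    """interval = (insert_version, delete_version).

    delete_version is None while the element has not been deleted yet.
    """
    start, end = interval
    return start <= t and (end is None or t < end)
```

A query at time t first picks the root with `root_at` and then ignores every element and node whose existence interval does not contain t.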

Page 42: The Planar Point Location Problem

Persistent B-tree Insert

x is the element to insert into the current version of the tree.

Search for the leaf l and insert x ($O(\log_B N)$ I/Os).
If l now contains more than B elements, we have a block overflow:
Version-split: copy all k alive elements from l to a new node v and mark l as dead.
(a) If k is in [(3/8)B, (7/8)B]: simple case
(b) If k > (7/8)B: strong overflow
(c) If k < (3/8)B: strong underflow

A strong overflow or underflow violates the additional invariant we defined earlier.
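The three cases after a version-split can be written as a small Python check. The thresholds (3/8)B and (7/8)B come from the text; expressing them through constants named `ALPHA` and `GAMMA`, and the function name `classify_version_split`, are assumptions of this sketch.

```python
from fractions import Fraction

ALPHA = Fraction(1, 4)  # assumed: gives the (3/8)B lower threshold with GAMMA
GAMMA = Fraction(1, 8)  # assumed: gives the (7/8)B upper threshold


def classify_version_split(k, block_size):
    """Classify a new node holding k alive elements copied from a dead leaf."""
    low = (ALPHA + GAMMA) * block_size   # (3/8)B
    high = (1 - GAMMA) * block_size      # (7/8)B
    if k > high:
        return "strong overflow"
    if k < low:
        return "strong underflow"
    return "simple"
```

For example, with B = 64 the thresholds are 24 and 56, so a node with 40 alive elements needs no further rebalancing.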

Page 43: The Planar Point Location Problem


Persistent B-tree Insert

a) If k is in [(3/8)B,(7/8)B] :

recursively update parent(l): persistently delete the reference to l and insert a reference to v

Page 44: The Planar Point Location Problem

Persistent B-tree Insert - Cont

b) If k > (7/8)B: strong overflow, so we split.

Create nodes v1 and v2, each with k/2 elements. k/2 is in ((3/8)B, (7/8)B) (this range is not tight).

Update parent(l) recursively: persistently delete the reference to l and insert two references, to v1 and v2.

Page 45: The Planar Point Location Problem

Persistent B-tree Insert - Cont

c) If k < (3/8)B: strong underflow.

Version-split a sibling l’ of l to obtain k’ other alive elements (k’ is in $[\alpha B, B]$; here $\alpha = 1/4$ and $\gamma = 1/8$).
$k + k' \ge 2\alpha B$, and $\alpha > \gamma$, thus $k + k' > (\alpha+\gamma)B$ (the lower bound of the invariant).
1) If $k + k' \le (1-\gamma)B$: merge, creating a new leaf with k+k’ elements.
2) If $k + k' > (1-\gamma)B$: share, splitting to create two new leaves.

Update parent(l) recursively: persistently delete two references and insert one or two.
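The choice between merge and share can be sketched as follows. This is only an illustration: the upper threshold (7/8)B is taken from the invariant in the text, the even division of elements in the share case is an assumption, and the name `resolve_strong_underflow` is hypothetical.

```python
from fractions import Fraction

GAMMA = Fraction(1, 8)  # assumed: makes the upper threshold (7/8)B


def resolve_strong_underflow(k, k_sibling, block_size):
    """Return the operation and the sizes of the new leaves it creates."""
    total = k + k_sibling
    if total <= (1 - GAMMA) * block_size:
        return "merge", [total]          # one new leaf
    half = total // 2
    return "share", [half, total - half]  # two new leaves
```

With B = 64, combining 10 and 30 alive elements gives one merged leaf of 40, while combining 20 and 60 exceeds 56 and is shared as two leaves of 40.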

Page 46: The Planar Point Location Problem

Persistent B-tree Delete

x is the element to delete from the current version of the tree.

Search for the leaf l that contains x and mark x as dead ($O(\log_B N)$ I/Os).
If l now contains fewer than (1/4)B alive elements, we have a block underflow (this is also a strong underflow, since k < (3/8)B):
Version-split a sibling node to obtain k+k’ elements.
$k + k' \ge 2\alpha B - 1$, and $\alpha > \gamma$, thus $k + k' > (\alpha+\gamma)B$ (the lower bound of the invariant; here $\alpha = 1/4$ and $\gamma = 1/8$).
Mark l dead and create a new node v with k+k’ elements (merge).
If there is a strong overflow in v: share (as in insert).

Update parent(l) recursively: persistently delete two references and insert one or two references.

Page 47: The Planar Point Location Problem

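Persistent B-tree - Rebalance Operations

[Flowchart of the rebalance operations. The pair after each “Done” gives the number of references deleted from and inserted into the parent.]

Insert: block overflow, then version-split
  No strong overflow or underflow: Done -1,+1
  Strong overflow: split, Done -1,+2
  Strong underflow: merge, Done -2,+1
    If the merge causes a strong overflow: split, Done -2,+2
Delete: block underflow, then version-split, which is a strong underflow
  Merge, Done -2,+1
    If the merge causes a strong overflow: split, Done -2,+2
No block overflow or underflow: Done 0,0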

Page 48: The Planar Point Location Problem

Persistent B-tree - Complexity

Updates: $O(\log_B N)$ I/Os (search and rebalance along one path from the root to a leaf)

What about the required space?

What about the required space?

Page 49: The Planar Point Location Problem

Persistent B-tree - Complexity

A few observations:
A rebalance operation on a leaf creates at most 2 new nodes.
Once a leaf is created, at least $\gamma B$ updates have to be performed on it before another rebalance operation occurs.
Two version-splits might create only one new leaf.
Each time a leaf is created or a leaf version-split is performed, a corresponding insertion or deletion is performed recursively one level up the tree.

During N updates:
# leaves created $\le 2N/(\gamma B) = O(N/B)$
# leaf version-splits $\le 2N/(\gamma B)$
# nodes created one level up the tree $\le 2^2 N/(\gamma B)^2$
By induction: # nodes created i levels up the tree $\le 2^{i+1} N/(\gamma B)^{i+1}$

Total # nodes created $\le \sum_{i=0}^{\log_B N} \frac{2N}{\gamma B}\left(\frac{2}{\gamma B}\right)^{i} = O\!\left(\frac{N}{B}\right)$

(this is also the # of blocks used after N updates)

Space: O(N/B) blocks
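The space bound is a geometric series, which can be checked numerically. The Python sketch below assumes that the per-level bound is $2^{i+1} N/(\gamma B)^{i+1}$ with $\gamma = 1/8$; the function names and the sample values of N and B are chosen only for illustration.

```python
from fractions import Fraction


def levels(n, fanout):
    """Smallest h such that fanout**h >= n."""
    h, capacity = 0, 1
    while capacity < n:
        capacity *= fanout
        h += 1
    return h


def nodes_created_bound(n_updates, block_size, gamma=Fraction(1, 8)):
    """Sum the per-level bounds 2^(i+1) * N / (gamma * B)^(i+1) exactly."""
    g = gamma * block_size
    top = levels(n_updates, block_size)
    return sum(
        Fraction(2 ** (i + 1) * n_updates) / g ** (i + 1)
        for i in range(top + 1)
    )
```

Because the ratio $2/(\gamma B)$ is far below 1 for realistic block sizes, the leaf level dominates the sum and the total stays within a constant factor of $N/B$.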


Page 52: The Planar Point Location Problem

The modified Persistent B-tree

Why do we need to modify the standard persistent B-tree?

First, a few facts about the standard persistent B-tree:
The elements are in the leaves.
Internal nodes contain “routing elements”.
When a node v is created, a reference to it is added to parent(v); normally a copy of the maximal element in v is used as the routing element in parent(v).

Page 53: The Planar Point Location Problem


The modified Persistent B-tree

The structure contains multiple live copies of the same element.

There may be copies of an element as routing elements long after the element is deleted

When searching for an element in the structure at version t we might be comparing to a copy of a dead element.

Page 54: The Planar Point Location Problem


The modified Persistent B-tree

In this application (vertical ray shooting) not all elements (segments) stored in the data structure during its entire lifespan are above-below comparable

We cannot use the standard version of a persistent B-tree, since it requires all elements in the structure to be comparable.

Modification is needed!

Page 55: The Planar Point Location Problem

The modified Persistent B-tree

We want the structure to require only that elements present in the same version be comparable.

The modified structure:
The elements alive at time t form a B-tree with elements in all nodes, internal nodes as well as leaves (not just in the leaves).
At any given time t there is at most one live copy of each element.

Page 56: The Planar Point Location Problem

The modified Persistent B-tree

The rebalance operations are modified slightly.

The insert algorithm remains unchanged.
The delete algorithm is slightly modified:

Page 57: The Planar Point Location Problem

The modified Persistent B-tree

The modified delete algorithm:
When deleting an element x that is stored in an internal node u, we need to be careful, since x is associated with a reference to a child uc of u that is still alive.

1. Find y : the predecessor of x in a leaf below u

2. Persistently delete y

3. Persistently delete x from u

4. Insert a live copy of y with a reference to the child uc

5. Perform the needed rebalance operations

Page 58: The Planar Point Location Problem

The modified Persistent B-tree - rebalance operations

Version-split: copy all alive elements of u to a new node v.

[Figure: before the version-split, x references u; afterwards, x references the new node v.]

We can use x as the element associated with the reference to the new node v, since the elements in v are a subset of the elements in u.

Page 59: The Planar Point Location Problem

The modified Persistent B-tree - rebalance operations

Split: when a strong overflow occurs after a version-split of u, two new nodes v and v’ are created.

We promote the maximal element y in v to be associated with the reference to v in parent(u) (instead of storing y in v). x will be associated with the reference to v’ in parent(u).

[Figure: before the split, x references u, which contains y; afterwards, y references v and x references v’.]

v has one less element than it would have had using the regular split, but O(B) updates are still required on v before further structural changes are needed.

Page 60: The Planar Point Location Problem

The modified Persistent B-tree - rebalance operations

Merge: when a strong underflow occurs after a version-split of u, a version-split of u’s sibling u’ is performed, and a new node v is created with the alive elements from u and u’.

The larger of x and y, say y, is used as the element associated with the reference to the new node v. x is demoted and stored in the new node v.

[Figure: before the merge, x and y reference u and u’; afterwards, y references the new node v, which contains x.]

v has one more element than it would have had using the regular merge. But as in split, O(B) updates are still needed on v before further structural changes are needed.

Page 61: The Planar Point Location Problem

The modified Persistent B-tree - rebalance operations

Share: when a merge would result in a new node with a strong overflow, a version-split of the two sibling nodes u and u’ is performed instead, and two new nodes v and v’ are created.

The maximal element y can be reused as the reference to v’, but x cannot be used as the reference to v. x is demoted to v, and the maximal element z in v is promoted to parent(u).

[Figure: before the share, x and y reference u and u’, and z is stored in a child; afterwards, z references v, which contains x, and y references v’.]

The # of elements in the new node v is identical to the # of elements we would have had using the regular share.

Page 62: The Planar Point Location Problem

The modified Persistent B-tree - Complexity

Even though there is a difference in the number of elements, the previous space arguments still apply.

Space: O(N/B) blocks
Update on the newest version: $O(\log_B N)$ I/Os

Page 63: The Planar Point Location Problem

The modified Persistent B-tree - Summary

A set of N non-intersecting segments in the plane can be processed into a data structure of size O(N/B) in $O(N \log_B N)$ I/Os such that a vertical ray-shooting query can be answered in $O(\log_B N)$ I/Os.

Page 64: The Planar Point Location Problem

The modified Persistent B-tree - Summary

N updates on a persistent B-tree (standard or modified) take $O(N \log_B N)$ I/Os.

Goodrich et al. showed how to construct a persistent B-tree structure (different from the basic one described earlier) in $O\!\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right)$ I/Os (the sorting bound).

The structure by Goodrich et al. requires that all elements in the structure over its lifespan be comparable.

In the modified tree we cannot use that construction, since the elements are not totally ordered, so this construction bound has not been reached (so far).

Page 65: The Planar Point Location Problem

Where have we come from?

The use of persistent data structures (a persistent structure always preserves the previous version of itself when it is modified)
The use of B-trees in the I/O model (the B-tree is the I/O-model equivalent of a search tree)
I/O-efficient persistent B-tree (works well with totally ordered elements)
Modified I/O-efficient persistent B-tree (only elements present in the same version of the structure need to be comparable)


Page 67: The Planar Point Location Problem

Experimental Results

Compared the persistent B-tree with the grid structure of Vahrenhold and Hinrichs.
Implemented both using the TPIE library.
Used road data containing all roads in the US; roads are broken at intersections.
The query points were randomly sampled from the datasets.
Also used an artificially generated worst-case dataset.

Page 68: The Planar Point Location Problem

Experimental Results

Query efficiency (# I/Os per query and time per query): both are much lower in the persistent B-tree than in the grid structure. On the synthetically generated worst-case dataset the persistent B-tree uses significantly fewer I/Os.

Size and construction efficiency: the grid construction algorithm outperforms the persistent B-tree on the real-life datasets, though on the worst-case dataset the persistent B-tree was significantly better.


Page 70: The Planar Point Location Problem

Open Problems

One major open problem is to construct the structure in $O\!\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right)$ I/Os (here we saw a trivial algorithm that constructs it in $O(N \log_B N)$ I/Os).

Page 71: The Planar Point Location Problem


Questions? ...