i/o-algorithms

39
I/O-Algorithms Lars Arge Spring 2009 March 3, 2009

Upload: toni

Post on 09-Feb-2016

30 views

Category:

Documents


0 download

DESCRIPTION

I/O-Algorithms. Lars Arge Spring 2009 March 3, 2009. I/O-Model. Parameters N = # elements in problem instance B = # elements that fits in disk block M = # elements that fits in main memory T = # output size in searching problem We often assume that M>B 2 - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: I/O-Algorithms

I/O-Algorithms

Lars Arge

Spring 2009

March 3, 2009

Page 2: I/O-Algorithms

Lars Arge

I/O-algorithms

2

I/O-Model

• ParametersN = # elements in problem instanceB = # elements that fits in disk blockM = # elements that fits in main memory

T = # output size in searching problem

• We often assume that M>B2

• I/O: Movement of block between memory and disk

D

P

M

Block I/O

Page 3: I/O-Algorithms

Lars Arge

I/O-algorithms

3

Fundamental Bounds Internal External

• Scanning: N

• Sorting: N log N• Permuting

• Searching: NBlog

BN

BN

BMlog

BN

log,min BN

BN

BMNN

N2log

Page 4: I/O-Algorithms

Lars Arge

I/O-algorithms

4

Fundamental Data Structures

• B-trees: Node degree (B) queries in– Rebalancing using split/fuse updates in

• Weight-balanced B-tress: Weight rather than degree constraint Ω(w(v)) updates below v between rebalancing operations on v

• Persistent B-trees:– Update in current version in – Search in all previous versions in

• Buffer trees– Batching of operations to obtain bounds construction algorithms

)(log BT

B NO )(log NO B

)(log BT

B NO )(log NO B

)log( 1BN

BMBO)log( B

NBN

BMO

Page 5: I/O-Algorithms

Lars Arge

I/O-algorithms

5

Last time: Interval management• Maintain N intervals with unique endpoints dynamically such that

stabbing query with point x can be answered efficiently

• Static solution: Persistent B-tree– Linear space and query

• Dynamic solution: External interval tree– update

x

)(log BT

B NO

)(log NO B

Page 6: I/O-Algorithms

Lars Arge

I/O-algorithms

6

• Base tree on endpoints – “slab” Xv associated with each node v

• Interval stored in highest node v where it contains midpoint of Xv

• Intervals Iv associated with v stored in– Left slab list sorted by left endpoint (search tree)– Right slab list sorted by right endpoint (search tree) Linear space and O(log N) update

Internal Interval Tree

Page 7: I/O-Algorithms

Lars Arge

I/O-algorithms

7

• Query with x on left side of midpoint of Xroot

– Search left slab list left-right until finding non-stabbed interval– Recurse in left child

O(log N+T) query bound

x

Internal Interval Tree

Page 8: I/O-Algorithms

Lars Arge

I/O-algorithms

8

Externalizing Interval Tree

• Natural idea:– Block tree– Use B-tree for slab lists

• Number of stabbed intervals in large slab list may be small (or zero)– We can be forced to do I/O in each of O(log N) nodes

Page 9: I/O-Algorithms

Lars Arge

I/O-algorithms

9

Externalizing Interval Tree

• Idea:– Decrease fan-out to height remains – slabs define multislabs– Interval stored in two slab lists (as before) and one multislab list– Intervals in small multislab lists collected in underflow structure– Query answered in v by looking at 2 slab lists and not O(log N)

)( B )(log NO B

)( B )(B

)( B

multislab

Page 10: I/O-Algorithms

Lars Arge

I/O-algorithms

10

External Interval Tree• Linear space, query, update

• General solution techniques:– Filtering: Charge part of query cost to output– Bootstrapping:

* Use O(B2) size structure in each internal node* Constructed using persistence* Dynamic using global rebuilding

– Weight-balanced B-tree: Split/fuse in amortized O(1)

)(log BT

B NO )(log NO B

Page 11: I/O-Algorithms

Lars Arge

I/O-algorithms

11

Last time: Three-Sided Range Queries• Interval management: “1.5 dimensional” search

• More general 2d problem: Dynamic 3-sidede range searching– Maintain set of points in plane such

that given query (q1, q2, q3), all points

(x,y) with q1 x q2 and y q3 can

be found efficiently• Linear space, O(logB N+T/B) query static solution using persistence

– Dynamic: External priority search tree

(x,x)

(x1,x2)

x

x1 x2

q3

q2q1

Page 12: I/O-Algorithms

Lars Arge

I/O-algorithms

12

• Base tree on x-coordinates with nodes augmented with points• Heap on y-coordinates

– Decreasing y values on root-leaf path– (x,y) on path from root to leaf holding x– If v holds point then parent(v) holds point

Linear space and O(log N) update

Internal Priority Search Tree9

16.20

1619,9

1313,3

1920,3

45,6

59,4

11,2

201916139544,1

1

Page 13: I/O-Algorithms

Lars Arge

I/O-algorithms

13

Internal Priority Search Tree

• Query with (q1, q2, q3) starting at root v:– Report point in v if satisfying query– Visit both children of v if point reported– Always visit child(s) of v on path(s) to q1 and q2

O(log N+T) query

916.20

1619,9

1313,3

1920,3

45,6

59,4

11,2

201916139544,1

1

4

194

Page 14: I/O-Algorithms

Lars Arge

I/O-algorithms

14

• Natural idea: Block tree• Problem:

– I/Os to follow paths to to q1 and q2

– But O(T) I/Os may be used to visit other nodes (“overshooting”) query

Externalizing Priority Search Tree9

16.20

1619,9

1313,3

1920,3

45,6

59,4

11,2

201916139544,1

1

)(log NO B

)(log TNO B

Page 15: I/O-Algorithms

Lars Arge

I/O-algorithms

15

Externalizing Priority Search Tree

• Solution idea:– Store B points in each node

* O(B2) points stored in each supernode* B output points can pay for “overshooting”

– Bootstrapping:* Store O(B2) points in each supernode in static structure

916.20

1619,9

1313,3

1920,3

45,6

59,4

11,2

201916139544,1

1

Page 16: I/O-Algorithms

Lars Arge

I/O-algorithms

16

Summary/Conclusion: Priority Search Tree• We have now discussed structures for special cases of two-

dimensional range searching– Space: O(N/B)– Query: – Updates:

• Cannot be obtained for general (4-sided) 2d range searching:– query requires space– space requires query

q3

q2q1q

q

q3

q2q1

q4

)( logloglog

NN

BN

BB

BΩ)(log NO cB

)( BNO )( B

N

)(log NO B

)(log BT

B NO

Page 17: I/O-Algorithms

Lars Arge

I/O-algorithms

17

• Base tree: Weight balanced tree with branching parameter and leaf parameter B on x-coordinates height

• Points below each node stored in 4 linear space secondary structures:– “Right” priority search tree– “Left” priority search tree– B-tree on y-coordinates– Interval (priority search) tree

space

External Range Tree

)()(log logloglog

log NN

N BB

BB

ONO

)(log NΘ B

)( logloglog

NN

BN

BB

BO

)(log NB

Page 18: I/O-Algorithms

Lars Arge

I/O-algorithms

18

• Secondary interval tree:– Connect points in each slab in y-order– Project obtained segments in y-axis

– Intervals stored in interval tree* Interval augmented with pointer to corresponding points in y-

coordinate B-tree in corresponding child node

External Range Tree

)(log NB

Page 19: I/O-Algorithms

Lars Arge

I/O-algorithms

19

• Query with (q1, q2, q3 , q4) answered in top node with q1 and q2 in different slabs v1 and v2

• Points in slab v1

– Found with 3-sided query in v1

using right priority search tree• Points in slab v2

– Found with 3-sided query in v2

using left priority search tree• Points in slabs between v1 and v2

– Answer stabbing query with q3 using interval tree

first point above q3 in each of the slabs– Find points using y-coordinate B-tree in slabs

External Range Tree

)(log NO B

)(log NO B

)(log NB

v1 v2

Page 20: I/O-Algorithms

Lars Arge

I/O-algorithms

20

External Range Tree• Query analysis:

– I/Os to find relevant node– I/Os to answer two 3-sided queries– I/Os to query interval tree– I/Os to traverse B-trees

I/Os

)(log NO B)(log B

TB NO

)(log)(log log NONO BBN

BB )(log B

TB NO )(log NO B

)(log BT

B NO )(log NB

v1 v2

Page 21: I/O-Algorithms

Lars Arge

I/O-algorithms

21

External Range Tree• Insert:

– Insert x-coordinate in weight-balanced B-tree* Split of v can be performed in I/Os

I/Os– Update secondary structures in all nodes on one

root-leaf path* Update priority search trees* Update interval tree* Update B-tree

I/Os• Delete:

– Similar and using global rebuilding

)( logloglog

NNBB

BO)( loglog

log2

NNBB

BO))(log)(( vwvwO B

)( logloglog2

NNBB

BO

)(log NB

v1 v2

Page 22: I/O-Algorithms

Lars Arge

I/O-algorithms

22

Summary: External Range Tree• 2d range searching in space

– I/O query– I/O update

• Optimal among query structures

)( logloglog2

NNBB

BO)(log B

TB NO

)( logloglog

NN

BN

BB

BO

)(log BT

B NO

q3

q2q1

q4

Page 23: I/O-Algorithms

Lars Arge

I/O-algorithms

23

kdB-tree

• kd-tree:– Recursive subdivision of point-set into two half using

vertical/horizontal line– Horizontal line on even levels, vertical on uneven levels– One point in each leaf

Linear space and logarithmic height

Page 24: I/O-Algorithms

Lars Arge

I/O-algorithms

24

kd-Tree: Query

• Query– Recursively visit nodes corresponding to regions intersecting query– Report point in trees/nodes completely contained in query

• Query analysis– Horizontal line intersect Q(N) = 2+2Q(N/4) = regions– Query covers T regions I/Os worst-case

)( NO

)( TNO

Page 25: I/O-Algorithms

Lars Arge

I/O-algorithms

25

kdB-tree

• kdB-tree:– Stop subdivision when leaf contains between B/2 and B points– BFS-blocking of internal nodes

• Query as before– Analysis as before but each region now contains Θ(B) points I/O query)( B

TB

NO

Page 26: I/O-Algorithms

Lars Arge

I/O-algorithms

26

Construction of kdB-tree

• Simple algorithm– Find median of y-coordinates (construct root)– Distribute point based on median– Recursively build subtrees– Construct BFS-blocking top-down

• Idea in improved algorithm– Construct levels at a time using O(N/B) I/Os

)log( 2 BN

BNO

)log( BN

BN

BMO

BMlog

Page 27: I/O-Algorithms

Lars Arge

I/O-algorithms

27

Construction of kdB-tree• Sort N points by x- and by y-coordinates using I/Os• Building levels ( nodes) in O(N/B) I/Os:

1. Construct by grid with points in each slab2. Count number of points in each grid cell and store in memory3. Find slab s with median x-coordinate4. Scan slab s to find median x-coordinate and construct node5. Split slab containing median x-coordinate and update counts6. Recurse on each side of median x-coordinate using grid (step 3)

Grid grows to during algorithm Each node constructed in I/Os

BMlog

)log( BN

BN

BMO

BM

BM

BM

N

BM

)( BM

BM

BM

BM

))/(( BNO BM

Page 28: I/O-Algorithms

Lars Arge

I/O-algorithms

28

kdB-tree

• kdB-tree:– Linear space– Query in I/Os – Construction in I/Os– Point search in I/Os

• Dynamic?– Deletions relatively easily in I/Os (partial rebuilding)

)( BT

BNO

)(log NO B

)log( / BN

BMBNO

)(log2 NO B

Page 29: I/O-Algorithms

Lars Arge

I/O-algorithms

29

kdB-tree Insertion using Logarithmic Method• Partition pointset S into subsets S0, S1, … Slog N, |Si| = 2i or |Si| = 0

• Build kdB-tree Di on Si

• Query: Query each Di

• Insert: Find first empty Di and construct Di out of

elements in S0,S1, … Si-1

– I/Os per moved point– Point moved O(log N) times

I/Os amortized

..................................

0 2222 1 2 log N

iij

j 221 10

)()(log

02

BT

BN

BTN

i B OO ii

)log( 2/

2BBMB

iiO )log( /1

BN

BMBO

)(log)loglog( 2/

1 NONO BBN

BMB

Page 30: I/O-Algorithms

Lars Arge

I/O-algorithms

30

kdB-tree Insertion and Deletion• Insert: Use logarithmic method ignoring deletes• Delete: Simply delete point p from relevant Di

– i can be calculated based on # insertions since p was inserted– # insertions calculated by storing insertion number of each point

in separate B-tree extra update cost

• To maintain O(log N) structures Di

– Perform global rebuild after every Θ(N) updates extra update cost

..................................

0 2222 1 2 log N

)(log NO B

)(log)log( /1 NOO BB

NBMB

Page 31: I/O-Algorithms

Lars Arge

I/O-algorithms

31

Summary: kdB-tree

• 2d range searching in O(N/B) space– Query in I/Os – Construction in I/Os– Updates in I/Os

• Optimal query among linear space structures

)( BT

BNO

)log( / BN

BMBNO

)(log2 NO B

q3

q2q1

q4

Page 32: I/O-Algorithms

Lars Arge

I/O-algorithms

32

O-Tree Structure• O-tree:

– B-tree on vertical slabs– B-tree on horizontal slabs in each vertical slab– kdB-tree on points in each leaf

NBBN log

)log( NBBN

)log()( 2)log( 2 NB BN

NBB

N

NBBN 2log

)log( NBBN

NB B2log

Page 33: I/O-Algorithms

Lars Arge

I/O-algorithms

33

O-Tree Query• Perform rangesearch with q1 and q2 in vertical B-tree

– Query all kdB-trees in leaves of two horizontal B-trees with x-interval intersected but not spanned by query

– Perform rangesearch with q3 and q4 horizontal B-trees with x-interval spanned by query* Query all kdB-trees with range intersected by query

NBBN log

NBBN 2log

NB B2log

Page 34: I/O-Algorithms

Lars Arge

I/O-algorithms

34

O-Tree Query Analysis• Vertical B-tree query: • Query of all kdB-trees in leaves of two horizontal B-trees:

• Query horizontal B-trees:

• Query kdB-trees not completely in query

• Query in kdB-trees completelycontained in query:

I/Os

)())log((log BN

BBN

B ONO

)()()log()log( 2BT

BN

BT

BBBN OOBNBONO

)log( NO BBN

)())log((log)log( BN

BBN

BBBN ONONO

)()()log()log(2 2BT

BN

BT

BBBN OOBNBONO

)( BTO

)( BT

BNO

)log(2 NO BBN

Page 35: I/O-Algorithms

Lars Arge

I/O-algorithms

35

O-Tree Update• Insert:

– Search in vertical B-tree: I/Os– Search in horizontal B-tree: I/Os– Insert in kdB-tree: I/Os

• Use global rebuilding when structures grow too big/small– B-trees not contain elements– kdB-trees not contain elements

I/Os

• Deletes can be handledin I/Os similarly

)log( NBBN

)log( 2 NB B

)(log NO B)(log NO B

)(log))log((log 22 NONBO BBB

)(log NO B

)(log NO B

Page 36: I/O-Algorithms

Lars Arge

I/O-algorithms

36

Summary: O-Tree• 2d range searching in linear space

– I/O query– I/O update

• Optimal among structuresusing linear space

• Can be extended to work in d-dimensionswith optimal query bound

q3

q2q1

q4

)(log NO B

)( BT

BNO

))((11

BT

BN dO

Page 37: I/O-Algorithms

Lars Arge

I/O-algorithms

37

Summary/Conclusion: 3 and 4-sided Queries• 3-sided 2d range searching: External priority search tree

– query, space, update

• General (4-sided) 2d range searching:– External range tree: query, space,

update – O-tree: query, space, update

q3

q2q1

q3

q2q1

q4

)( logloglog

NN

BN

BB

BO)(log BT

B NO

)( BNO)( B

TB

NO

)(log NO B)(log BT

B NO

)(log NO B

)( logloglog2

NNBB

BO

)( BNO

Page 38: I/O-Algorithms

Lars Arge

I/O-algorithms

38

Summary/Conclusion: Tools and Techniques• Tools:

– B-trees– Persistent B-trees– Buffer trees– Logarithmic method– Weight-balanced B-trees– Global rebuilding

• Techniques:– Bootstrapping– Filtering

q3

q2q1

q3

q2q1

q4

(x,x)

Page 39: I/O-Algorithms

Lars Arge

I/O-algorithms

39

References

• External Memory Geometric Data StructuresLecture notes by Lars Arge.– Section 8-9