Lars Arge
I/O-algorithms
I/O-Model
• Parameters
  – N = # elements in problem instance
  – B = # elements that fit in disk block
  – M = # elements that fit in main memory
  – T = # output size in searching problem
• We often assume that $M > B^2$
• I/O: Movement of block between memory and disk
[Figure: processor P, main memory M and disk D; blocks move between memory and disk by block I/O]
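Fundamental Bounds (Internal vs. External)
• Scanning: internal $N$, external $\frac{N}{B}$
• Sorting: internal $N \log N$, external $\frac{N}{B}\log_{M/B}\frac{N}{B}$
• Permuting: internal $N$, external $\min\{N, \frac{N}{B}\log_{M/B}\frac{N}{B}\}$
• Searching: internal $\log_2 N$, external $\log_B N$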
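To get a feel for the external bounds, the following minimal sketch evaluates them numerically. The function names are illustrative and not part of the slides, constants hidden by the O-notation are ignored, and the sample parameters are assumptions chosen only to give round numbers.

```python
import math

def scan_ios(n, b):
    """Scanning bound: N/B block transfers (rounded up)."""
    return math.ceil(n / b)

def sort_ios(n, m, b):
    """Sorting bound: (N/B) * log_{M/B}(N/B), with at least one pass."""
    blocks = n / b
    return blocks * max(1.0, math.log(blocks, m / b))

def permute_ios(n, m, b):
    """Permuting bound: min{N, sorting bound}."""
    return min(n, sort_ios(n, m, b))

def search_ios(n, b):
    """Searching bound: log_B N."""
    return math.log(n, b)

if __name__ == "__main__":
    # Assumed sample parameters: N = 2^30 elements, M = 2^20, B = 2^10.
    N, M, B = 2**30, 2**20, 2**10
    print("scan   :", scan_ios(N, B))
    print("sort   :", sort_ios(N, M, B))
    print("permute:", permute_ios(N, M, B))
    print("search :", search_ios(N, B))
```

With these parameters the sorting bound is only about twice the scanning bound, which is why sorting-based algorithms are usually preferred over element-by-element permuting.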
Fundamental Data Structures
• B-trees: Node degree $\Theta(B)$ ⇒ queries in $O(\log_B N + T/B)$
  – Rebalancing using split/fuse ⇒ updates in $O(\log_B N)$
• Weight-balanced B-trees: Weight rather than degree constraint ⇒ $\Omega(w(v))$ updates below v between rebalancing operations on v
• Persistent B-trees:
  – Update in current version in $O(\log_B N)$
  – Search in all previous versions in $O(\log_B N + T/B)$
• Buffer trees
  – Batching of operations to obtain $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ bounds ⇒ $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ construction algorithms
Last time: Interval management
• Maintain N intervals with unique endpoints dynamically such that stabbing query with point x can be answered efficiently
• Static solution: Persistent B-tree
  – Linear space and $O(\log_B N + T/B)$ query
• Dynamic solution: External interval tree
  – $O(\log_B N)$ update
Internal Interval Tree
• Base tree on endpoints – “slab” $X_v$ associated with each node v
• Interval stored in highest node v where it contains midpoint of $X_v$
• Intervals $I_v$ associated with v stored in
  – Left slab list sorted by left endpoint (search tree)
  – Right slab list sorted by right endpoint (search tree)
⇒ Linear space and $O(\log N)$ update
Internal Interval Tree
• Query with x on left side of midpoint of $X_{root}$
  – Search left slab list left-right until finding non-stabbed interval
  – Recurse in left child
⇒ $O(\log N + T)$ query bound
[Figure: stabbing query x in the interval tree]
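The structure and query of the internal interval tree can be sketched as follows. This is a static, in-memory illustration only: the class name `IntervalNode` and the function `stab` are hypothetical, plain sorted lists stand in for the search trees on the slab lists, closed intervals are assumed, and updates are not handled.

```python
class IntervalNode:
    """Static interval tree node; intervals are (left, right) tuples."""

    def __init__(self, intervals):
        endpoints = sorted(p for iv in intervals for p in iv)
        self.mid = endpoints[len(endpoints) // 2]          # midpoint of the slab
        here = [iv for iv in intervals if iv[0] <= self.mid <= iv[1]]
        left = [iv for iv in intervals if iv[1] < self.mid]
        right = [iv for iv in intervals if iv[0] > self.mid]
        # Left slab list: sorted by left endpoint (increasing).
        self.by_left = sorted(here)
        # Right slab list: sorted by right endpoint (decreasing).
        self.by_right = sorted(here, key=lambda iv: iv[1], reverse=True)
        self.left = IntervalNode(left) if left else None
        self.right = IntervalNode(right) if right else None


def stab(node, x):
    """Report all intervals containing x by walking one root-leaf path."""
    out = []
    while node is not None:
        if x < node.mid:
            # Every interval here ends at or after mid > x, so only the
            # left endpoint decides; stop at the first non-stabbed interval.
            for iv in node.by_left:
                if iv[0] > x:
                    break
                out.append(iv)
            node = node.left
        else:
            for iv in node.by_right:
                if iv[1] < x:
                    break
                out.append(iv)
            node = node.right
    return out


if __name__ == "__main__":
    tree = IntervalNode([(1, 5), (2, 3), (4, 9), (6, 8), (7, 10)])
    print(sorted(stab(tree, 7)))
```

In each visited node the scan stops after at most one non-stabbed interval, which is the filtering argument behind the $O(\log N + T)$ bound.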
Externalizing Interval Tree
• Natural idea:
  – Block tree
  – Use B-tree for slab lists
• Number of stabbed intervals in large slab list may be small (or zero)
  – We can be forced to do I/O in each of $O(\log N)$ nodes
Externalizing Interval Tree
• Idea:
  – Decrease fan-out to $\Theta(\sqrt{B})$ ⇒ height remains $O(\log_B N)$
  – $\Theta(\sqrt{B})$ slabs define $\Theta(B)$ multislabs
  – Interval stored in two slab lists (as before) and one multislab list
  – Intervals in small multislab lists collected in underflow structure
  – Query answered in v by looking at 2 slab lists and not $O(\log N)$
[Figure: node with $\Theta(\sqrt{B})$ slabs; a multislab spans consecutive slabs]
External Interval Tree
• Linear space, $O(\log_B N + T/B)$ query, $O(\log_B N)$ update
• General solution techniques:
  – Filtering: Charge part of query cost to output
  – Bootstrapping:
    * Use $O(B^2)$ size structure in each internal node
    * Constructed using persistence
    * Dynamic using global rebuilding
  – Weight-balanced B-tree: Split/fuse in amortized $O(1)$
Last time: Three-Sided Range Queries
• Interval management: “1.5 dimensional” search
• More general 2d problem: Dynamic 3-sided range searching
  – Maintain set of points in plane such that given query $(q_1, q_2, q_3)$, all points $(x,y)$ with $q_1 \le x \le q_2$ and $y \ge q_3$ can be found efficiently
• Static solution using persistence: linear space, $O(\log_B N + T/B)$ query
• Dynamic: External priority search tree
[Figure: interval $(x_1, x_2)$ viewed as a point; stabbing query x as a query with corner $(x, x)$; 3-sided query with sides $q_1$, $q_2$, $q_3$]
Internal Priority Search Tree
• Base tree on x-coordinates with nodes augmented with points
• Heap on y-coordinates
  – Decreasing y values on root-leaf path
  – (x,y) on path from root to leaf holding x
  – If v holds point then parent(v) holds point
⇒ Linear space and $O(\log N)$ update
[Figure: priority search tree on the points (16,20), (19,9), (13,3), (20,3), (5,6), (9,4), (1,2), (4,1), with leaves 1, 4, 5, 9, 13, 16, 19, 20]
Internal Priority Search Tree
• Query with $(q_1, q_2, q_3)$ starting at root v:
  – Report point in v if satisfying query
  – Visit both children of v if point reported
  – Always visit child(ren) of v on path(s) to $q_1$ and $q_2$
⇒ $O(\log N + T)$ query
[Figure: the same priority search tree with a query $(q_1, q_2, q_3)$ marked]
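A 3-sided query on an internal priority search tree can be sketched as below. This is a common textbook variant rather than exactly the slide's layout: each node holds the highest remaining point and splits the rest at the median x, instead of using a separate base tree with x-coordinates in the leaves. The names `PSTNode` and `query3` are illustrative, and distinct x-coordinates are assumed.

```python
class PSTNode:
    """Priority search tree node built from points sorted by x (distinct x)."""

    def __init__(self, points):
        # Heap order: the node keeps the point with the largest y.
        top = max(range(len(points)), key=lambda i: points[i][1])
        self.point = points[top]
        rest = points[:top] + points[top + 1:]
        mid = len(rest) // 2
        # Search order: left subtree has x < split, right has x >= split.
        self.split = rest[mid][0] if rest else None
        self.left = PSTNode(rest[:mid]) if rest[:mid] else None
        self.right = PSTNode(rest[mid:]) if rest[mid:] else None


def query3(node, q1, q2, q3, out):
    """Append all points with q1 <= x <= q2 and y >= q3 to out."""
    if node is None or node.point[1] < q3:
        return  # heap order: nothing below can have y >= q3
    x, _ = node.point
    if q1 <= x <= q2:
        out.append(node.point)
    if node.split is not None:
        if q1 < node.split:
            query3(node.left, q1, q2, q3, out)
        if q2 >= node.split:
            query3(node.right, q1, q2, q3, out)


if __name__ == "__main__":
    pts = sorted([(1, 2), (4, 1), (5, 6), (9, 4), (13, 3), (16, 20), (19, 9), (20, 3)])
    found = []
    query3(PSTNode(pts), 4, 19, 4, found)
    print(sorted(found))
```

The early return on `node.point[1] < q3` is what bounds the work: a node is visited either because it lies on a search path to $q_1$ or $q_2$, or because its parent reported a point.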
Externalizing Priority Search Tree
• Natural idea: Block tree
• Problem:
  – $O(\log_B N)$ I/Os to follow paths to $q_1$ and $q_2$
  – But $O(T)$ I/Os may be used to visit other nodes (“overshooting”)
  ⇒ $O(\log_B N + T)$ query
Externalizing Priority Search Tree
• Solution idea:
  – Store B points in each node
    * $O(B^2)$ points stored in each supernode
    * B output points can pay for “overshooting”
  – Bootstrapping:
    * Store $O(B^2)$ points in each supernode in static structure
Summary/Conclusion: Priority Search Tree
• We have now discussed structures for special cases of two-dimensional range searching
  – Space: $O(N/B)$
  – Query: $O(\log_B N + T/B)$
  – Updates: $O(\log_B N)$
• Cannot be obtained for general (4-sided) 2d range searching:
  – $O(\log_B^c N)$ query requires $\Omega\left(\frac{N}{B}\frac{\log_B N}{\log_B \log_B N}\right)$ space
  – $O(N/B)$ space requires $\Omega(\sqrt{N/B})$ query
[Figure: stabbing query q, 3-sided query $(q_1, q_2, q_3)$ and 4-sided query $(q_1, q_2, q_3, q_4)$]
External Range Tree
• Base tree: Weight-balanced tree with branching parameter $\log_B N$ and leaf parameter B on x-coordinates ⇒ $O(\log_{\log_B N} N) = O\left(\frac{\log_B N}{\log_B \log_B N}\right)$ height
• Points below each node stored in 4 linear space secondary structures:
  – “Right” priority search tree
  – “Left” priority search tree
  – B-tree on y-coordinates
  – Interval (priority search) tree
⇒ $O\left(\frac{N}{B}\frac{\log_B N}{\log_B \log_B N}\right)$ space
[Figure: node with $\Theta(\log_B N)$ slabs]
External Range Tree
• Secondary interval tree:
  – Connect points in each slab in y-order
  – Project obtained segments onto the y-axis
  – Intervals stored in interval tree
    * Interval augmented with pointer to corresponding points in y-coordinate B-tree in corresponding child node
[Figure: node with $\Theta(\log_B N)$ slabs]
External Range Tree
• Query with $(q_1, q_2, q_3, q_4)$ answered in top node with $q_1$ and $q_2$ in different slabs $v_1$ and $v_2$
• Points in slab $v_1$
  – Found with 3-sided query in $v_1$ using right priority search tree
• Points in slab $v_2$
  – Found with 3-sided query in $v_2$ using left priority search tree
• Points in slabs between $v_1$ and $v_2$
  – Answer stabbing query with $q_3$ using interval tree ⇒ first point above $q_3$ in each of the $O(\log_B N)$ slabs
  – Find points using y-coordinate B-tree in $O(\log_B N)$ slabs
[Figure: node with $\Theta(\log_B N)$ slabs; query spanning slabs $v_1$ to $v_2$]
External Range Tree
• Query analysis:
  – $O(\log_B N)$ I/Os to find relevant node
  – $O(\log_B N + T/B)$ I/Os to answer two 3-sided queries
  – $O(\log_B N + \frac{\log_B N}{B}) = O(\log_B N)$ I/Os to query interval tree
  – $O(\log_B N + T/B)$ I/Os to traverse $O(\log_B N)$ B-trees
⇒ $O(\log_B N + T/B)$ I/Os
External Range Tree
• Insert:
  – Insert x-coordinate in weight-balanced B-tree
    * Split of v can be performed in $O(w(v)\log_B w(v))$ I/Os
    ⇒ $O\left(\frac{\log_B^2 N}{\log_B \log_B N}\right)$ I/Os
  – Update secondary structures in all $O\left(\frac{\log_B N}{\log_B \log_B N}\right)$ nodes on one root-leaf path
    * Update priority search trees
    * Update interval tree
    * Update B-tree
    ⇒ $O\left(\frac{\log_B^2 N}{\log_B \log_B N}\right)$ I/Os
• Delete:
  – Similar and using global rebuilding
Summary: External Range Tree
• 2d range searching in $O\left(\frac{N}{B}\frac{\log_B N}{\log_B \log_B N}\right)$ space
  – $O(\log_B N + T/B)$ I/O query
  – $O\left(\frac{\log_B^2 N}{\log_B \log_B N}\right)$ I/O update
• Optimal among $O(\log_B N + T/B)$ query structures
kdB-tree
• kd-tree:
  – Recursive subdivision of point-set into two halves using vertical/horizontal line
  – Horizontal line on even levels, vertical on uneven levels
  – One point in each leaf
⇒ Linear space and logarithmic height
kd-Tree: Query
• Query
  – Recursively visit nodes corresponding to regions intersecting query
  – Report point in trees/nodes completely contained in query
• Query analysis
  – Horizontal line intersects $Q(N) = 2 + 2Q(N/4) = O(\sqrt{N})$ regions
  – Query covers T regions ⇒ $O(\sqrt{N} + T)$ I/Os worst-case
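The kd-tree construction and range query described on the last two slides can be sketched in memory as follows. The names `KdNode` and `range_query` are illustrative. For brevity the sketch checks every reached leaf individually and omits the shortcut of reporting a whole subtree once its region is completely contained in the query.

```python
class KdNode:
    """kd-tree with one point per leaf; splits alternate between y and x."""

    def __init__(self, points, depth=0):
        self.point = None
        self.left = self.right = None
        if len(points) == 1:
            self.point = points[0]          # leaf
            return
        # Even levels: horizontal line (split on y); uneven: vertical (split on x).
        self.axis = 1 if depth % 2 == 0 else 0
        pts = sorted(points, key=lambda p: p[self.axis])
        mid = len(pts) // 2
        self.split = pts[mid][self.axis]    # left <= split <= right
        self.left = KdNode(pts[:mid], depth + 1)
        self.right = KdNode(pts[mid:], depth + 1)


def range_query(node, lo, hi, out):
    """Append all points p with lo[k] <= p[k] <= hi[k] for k in {0, 1}."""
    if node.point is not None:
        x, y = node.point
        if lo[0] <= x <= hi[0] and lo[1] <= y <= hi[1]:
            out.append(node.point)
        return
    # Only descend into regions that intersect the query rectangle.
    if lo[node.axis] <= node.split:
        range_query(node.left, lo, hi, out)
    if hi[node.axis] >= node.split:
        range_query(node.right, lo, hi, out)


if __name__ == "__main__":
    pts = [(x, (7 * x) % 11) for x in range(40)]
    found = []
    range_query(KdNode(pts), (5, 2), (20, 8), found)
    print(sorted(found))
```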
kdB-tree
• kdB-tree:
  – Stop subdivision when leaf contains between B/2 and B points
  – BFS-blocking of internal nodes
• Query as before
  – Analysis as before but each region now contains $\Theta(B)$ points ⇒ $O(\sqrt{N/B} + T/B)$ I/O query
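As a sketch of where the square root in the query bound comes from, the kd-tree recurrence can be unrolled. The derivation assumes the recursion stops at regions holding $B$ points (one block each) and, for simplicity, that $N/B$ is a power of 4:

$$
\begin{aligned}
Q(N) &= 2 + 2\,Q(N/4) \\
     &= \sum_{i=0}^{k-1} 2^{i+1} + 2^{k}\,Q(N/4^{k}) \\
     &= O(2^{k}) \qquad \text{for } k = \log_4 (N/B),\ Q(B) = O(1) \\
     &= O\big(\sqrt{N/B}\big),
\end{aligned}
$$

since $2^{\log_4 (N/B)} = (N/B)^{1/2}$. The four sides of a query rectangle each contribute such a term, and regions lying completely inside the query hold $\Theta(B)$ reported points each, which accounts for the additional $T/B$.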
Construction of kdB-tree
• Simple algorithm
  – Find median of y-coordinates (construct root)
  – Distribute point based on median
  – Recursively build subtrees
  – Construct BFS-blocking top-down
  ⇒ $O(\frac{N}{B}\log_2\frac{N}{B})$ I/Os
• Idea in improved algorithm
  – Construct $\log\sqrt{M/B}$ levels at a time using $O(N/B)$ I/Os
  ⇒ $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
Construction of kdB-tree
• Sort N points by x- and by y-coordinates using $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
• Building $\log\sqrt{M/B}$ levels ($\sqrt{M/B}$ nodes) in $O(N/B)$ I/Os:
  1. Construct $\sqrt{M/B}$ by $\sqrt{M/B}$ grid with $N/\sqrt{M/B}$ points in each slab
  2. Count number of points in each grid cell and store in memory
  3. Find slab s with median x-coordinate
  4. Scan slab s to find median x-coordinate and construct node
  5. Split slab containing median x-coordinate and update counts
  6. Recurse on each side of median x-coordinate using grid (step 3)
⇒ Grid grows to $\Theta(\frac{M}{B} + \sqrt{M/B}\cdot\sqrt{M/B}) = \Theta(\frac{M}{B})$ during algorithm
⇒ Each node constructed in $O((N/\sqrt{M/B})/B)$ I/Os
kdB-tree
• kdB-tree:
  – Linear space
  – Query in $O(\sqrt{N/B} + T/B)$ I/Os
  – Construction in $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
  – Point search in $O(\log_B N)$ I/Os
• Dynamic?
  – Deletions relatively easily in $O(\log_B^2 N)$ I/Os (partial rebuilding)
kdB-tree Insertion using Logarithmic Method
• Partition pointset S into subsets $S_0, S_1, \ldots, S_{\log N}$, $|S_i| = 2^i$ or $|S_i| = 0$
• Build kdB-tree $D_i$ on $S_i$
• Query: Query each $D_i$ ⇒ $\sum_{i=0}^{\log N} O(\sqrt{2^i/B} + T_i/B) = O(\sqrt{N/B} + T/B)$
• Insert: Find first empty $D_i$ and construct $D_i$ out of $1 + \sum_{j=0}^{i-1} 2^j = 2^i$ elements in $S_0, S_1, \ldots, S_{i-1}$
  – $O(\frac{2^i}{B}\log_{M/B}\frac{2^i}{B})$ I/Os ⇒ $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ I/Os per moved point
  – Point moved $O(\log N)$ times
  ⇒ $O(\frac{1}{B}\log_{M/B}\frac{N}{B}\cdot\log N) = O(\log_B^2 N)$ I/Os amortized
[Figure: structures of sizes $2^0, 2^1, 2^2, \ldots, 2^{\log N}$]
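The logarithmic method itself is independent of the kdB-tree, so it can be illustrated with any static structure. In the sketch below a sorted Python list stands in for each $D_i$ (an assumption made purely for brevity; the slides use kdB-trees), and the class name `LogMethod` is hypothetical.

```python
import bisect

class LogMethod:
    """Logarithmic method over static structures of sizes 2^0, 2^1, 2^2, ..."""

    def __init__(self):
        # levels[i] is None (empty) or a sorted list with exactly 2**i keys.
        self.levels = []

    def insert(self, key):
        carry = [key]
        i = 0
        # Collect S_0, ..., S_{i-1} until the first empty level i is found.
        while i < len(self.levels) and self.levels[i] is not None:
            carry.extend(self.levels[i])
            self.levels[i] = None
            i += 1
        if i == len(self.levels):
            self.levels.append(None)
        # Rebuild D_i from the 1 + (2^0 + ... + 2^(i-1)) = 2^i collected keys.
        self.levels[i] = sorted(carry)

    def range_query(self, lo, hi):
        """Query every non-empty structure and concatenate the answers."""
        out = []
        for d in self.levels:
            if d is not None:
                out.extend(d[bisect.bisect_left(d, lo):bisect.bisect_right(d, hi)])
        return out


if __name__ == "__main__":
    s = LogMethod()
    for k in [(i * 7) % 13 for i in range(13)]:
        s.insert(k)
    print([len(d) if d else 0 for d in s.levels])
    print(sorted(s.range_query(3, 7)))
```

The occupied levels mirror the binary representation of the number of inserted keys, which is why each key is moved only $O(\log N)$ times.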
kdB-tree Insertion and Deletion
• Insert: Use logarithmic method ignoring deletes
• Delete: Simply delete point p from relevant $D_i$
  – i can be calculated based on # insertions since p was inserted
  – # insertions calculated by storing insertion number of each point in separate B-tree ⇒ $O(\log_B N)$ extra update cost
• To maintain $O(\log N)$ structures $D_i$
  – Perform global rebuild after every $\Theta(N)$ updates ⇒ $O(\frac{1}{B}\log_{M/B}\frac{N}{B}) = O(\log_B N)$ extra update cost
Summary: kdB-tree
• 2d range searching in $O(N/B)$ space
  – Query in $O(\sqrt{N/B} + T/B)$ I/Os
  – Construction in $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
  – Updates in $O(\log_B^2 N)$ I/Os
• Optimal query among linear space structures
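O-Tree Structure
• O-tree:
  – B-tree on $\Theta(\sqrt{N/B}/\log_B N)$ vertical slabs
  – B-tree on $\Theta(\sqrt{N/B}/\log_B N)$ horizontal slabs in each vertical slab
  – kdB-tree on $\Theta\big(N/(\sqrt{N/B}/\log_B N)^2\big) = \Theta(B\log_B^2 N)$ points in each leaf
[Figure: $\sqrt{N/B}/\log_B N$ vertical slabs, each divided into horizontal slabs; each cell holds $B\log_B^2 N$ points]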
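A quick numeric check of how the O-tree parameters fit together. This is a sketch only: the function name `otree_parameters` is illustrative, the constants hidden by the $\Theta$-notation are taken to be 1, and the sample values of N and B are assumptions.

```python
import math

def otree_parameters(n, b):
    """Return (slabs per B-tree, points per kdB-tree) for an O-tree on n points."""
    log_b_n = math.log(n, b)
    slabs = math.sqrt(n / b) / log_b_n      # sqrt(N/B) / log_B N
    points_per_cell = n / slabs ** 2        # N divided among slabs^2 cells
    return slabs, points_per_cell

if __name__ == "__main__":
    N, B = 2**30, 2**10                     # assumed sample parameters
    slabs, per_cell = otree_parameters(N, B)
    print("slabs per B-tree    :", slabs)
    print("points per kdB-tree :", per_cell)
    print("B * log_B^2 N       :", B * math.log(N, B) ** 2)
```

The last two printed values agree up to floating-point error, matching the identity $N/(\sqrt{N/B}/\log_B N)^2 = B\log_B^2 N$.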
O-Tree Query
• Perform range search with $q_1$ and $q_2$ in vertical B-tree
  – Query all kdB-trees in leaves of two horizontal B-trees with x-interval intersected but not spanned by query
  – Perform range search with $q_3$ and $q_4$ in horizontal B-trees with x-interval spanned by query
    * Query all kdB-trees with range intersected by query
[Figure: query rectangle over the O-tree grid of $\sqrt{N/B}/\log_B N$ slabs with $B\log_B^2 N$ points per cell]
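O-Tree Query Analysis
• Vertical B-tree query: $O(\log_B(\sqrt{N/B}/\log_B N)) = O(\sqrt{N/B})$
• Query of all kdB-trees in leaves of two horizontal B-trees: $O(\sqrt{N/B}/\log_B N)\cdot O(\sqrt{B\log_B^2 N/B}) + O(T/B) = O(\sqrt{N/B} + T/B)$
• Query $O(\sqrt{N/B}/\log_B N)$ horizontal B-trees: $O(\sqrt{N/B}/\log_B N)\cdot O(\log_B(\sqrt{N/B}/\log_B N)) = O(\sqrt{N/B})$
• Query $2\cdot O(\sqrt{N/B}/\log_B N)$ kdB-trees not completely in query: $2\cdot O(\sqrt{N/B}/\log_B N)\cdot O(\sqrt{B\log_B^2 N/B}) + O(T/B) = O(\sqrt{N/B} + T/B)$
• Query in kdB-trees completely contained in query: $O(T/B)$
⇒ $O(\sqrt{N/B} + T/B)$ I/Os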
O-Tree Update
• Insert:
  – Search in vertical B-tree: $O(\log_B N)$ I/Os
  – Search in horizontal B-tree: $O(\log_B N)$ I/Os
  – Insert in kdB-tree: $O(\log_B^2(B\log_B^2 N)) = O(\log_B N)$ I/Os
• Use global rebuilding when structures grow too big/small
  – B-trees not contain $\Theta(\sqrt{N/B}/\log_B N)$ elements
  – kdB-trees not contain $\Theta(B\log_B^2 N)$ elements
⇒ $O(\log_B N)$ I/Os
• Deletes can be handled in $O(\log_B N)$ I/Os similarly
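The step from the kdB-tree update bound to $O(\log_B N)$ can be spelled out as follows (a sketch for sufficiently large $N$, using that each kdB-tree holds $\Theta(B\log_B^2 N)$ points and that kdB-tree updates cost $O(\log_B^2 n)$ I/Os on $n$ points):

$$
\log_B^2\!\big(B\log_B^2 N\big)
  = \big(1 + 2\log_B\log_B N\big)^2
  = O\big((\log_B\log_B N)^2\big)
  = O(\log_B N),
$$

where the last step uses that $(\log_B x)^2 = O(x)$ for $x = \log_B N$.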
Summary: O-Tree
• 2d range searching in linear space
  – $O(\sqrt{N/B} + T/B)$ I/O query
  – $O(\log_B N)$ I/O update
• Optimal among structures using linear space
• Can be extended to work in d dimensions with optimal query bound $O\big((N/B)^{1-1/d} + T/B\big)$
Summary/Conclusion: 3 and 4-sided Queries
• 3-sided 2d range searching: External priority search tree
  – $O(\log_B N + T/B)$ query, $O(N/B)$ space, $O(\log_B N)$ update
• General (4-sided) 2d range searching:
  – External range tree: $O(\log_B N + T/B)$ query, $O\left(\frac{N}{B}\frac{\log_B N}{\log_B \log_B N}\right)$ space, $O\left(\frac{\log_B^2 N}{\log_B \log_B N}\right)$ update
  – O-tree: $O(\sqrt{N/B} + T/B)$ query, $O(N/B)$ space, $O(\log_B N)$ update
Summary/Conclusion: Tools and Techniques
• Tools:
  – B-trees
  – Persistent B-trees
  – Buffer trees
  – Logarithmic method
  – Weight-balanced B-trees
  – Global rebuilding
• Techniques:
  – Bootstrapping
  – Filtering
References
• External Memory Geometric Data Structures. Lecture notes by Lars Arge.
  – Sections 8-9