external memory algorithms for geometric problems piotr indyk (slides partially by lars arge and...
TRANSCRIPT
![Page 1: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/1.jpg)
External Memory Algorithms for Geometric Problems
Piotr Indyk
(slides partially by Lars Arge and Jeff Vitter)
![Page 2: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/2.jpg)
Lars Arge
External memory data structures
2
Today• 1D data structure for searching in external memory
– O(log N) I/O’s using standard data structures
– Will show how to reduce it to O(log BN)
• 2D problem: finding all intersections among a set of horizontal and vertical segments
– O(N log N)-time in main memory
– O(N log B N) I/O’s using B-trees
– O(N/B * log M/B N) I/O’s using distribution sweeping
• Another 2D problem: off-line range queries
– O(N/B * log M/B N) I/O’s, again using distribution sweeping
![Page 3: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/3.jpg)
Lars Arge
External memory data structures
3
Searching in External Memory
• Dictionary (or successor) data structure for 1D data:
– Maintains elements (e.g., numbers) under insertions and deletions
– Given a key K, reports the successor of K; i.e., the smallest element which is greater or equal to K
![Page 4: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/4.jpg)
Lars Arge
External memory data structures
4
Model• Model as previously
– N : Elements in structure
– B : Elements per block
– M : Elements in main memory
D
P
M
Block I/O
![Page 5: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/5.jpg)
Lars Arge
External memory data structures
5
Search in time
• Binary search tree:
– Standard method for search among N elements
– We assume elements in leaves
– Search traces at least one root-leaf path
Search Trees
)(log2 N
)(log2 N
![Page 6: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/6.jpg)
Lars Arge
External memory data structures
6
• (a,b)-tree uses linear space and has height
Choosing a,b = each node/leaf stored in one disk block
space and query
(a,b)-tree (or B-tree)• T is an (a,b)-tree (a≥2 and b≥2a-1)
– All leaves on the same level (contain between a and b elements)
– Except for the root, all nodes have degree between a and b
– Root has degree between 2 and b
)(log NO a
)( BN )(log NB
)(B
tree
![Page 7: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/7.jpg)
Lars Arge
External memory data structures
7
(a,b)-Tree Insert• Insert:
Search and insert element in leaf v
DO v has b+1 elements
Split v:
make nodes v’ and v’’ with
and elements
insert element (ref) in parent(v)
(make new root if necessary)
v=parent(v)
• Insert touches nodes
bb 2
1 ab 2
1
)(log Na
v
v’ v’’
21b 2
1b
1b
![Page 8: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/8.jpg)
Lars Arge
External memory data structures
8
(a,b)-Tree Delete• Delete:
Search and delete element from leaf v
DO v has a-1 children
Fuse v with sibling v’:
move children of v’ to v
delete element (ref) from parent(v)
(delete root if necessary)
If v has >b (and ≤ a+b-1) children split v
v=parent(v)
• Delete touches nodes )(log Na
v
v
1a
12 a
![Page 9: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/9.jpg)
Lars Arge
External memory data structures
9
B-trees• Used everywhere in databases
• Typical depth is 3 or 4
• Top two levels kept in main memory – only 1-2 I/O’s per element
![Page 10: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/10.jpg)
Lars Arge
External memory data structures
10
Horizontal/Vertical Line Intersection• Given: a set of N horizontal and vertical line segments
• Goal: find all H/V intersections
• Assumption: all x and y coordinates of endpoints different
![Page 11: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/11.jpg)
Lars Arge
External memory data structures
11
Main Memory Algorithm• Presort the points in y-order
• Sweep the plane top down with a horizontal line
• When reaching a V-segment, store its x value in a tree. When leaving it, delete the x value from the tree
• Invariant: the balanced tree stores the V-segments hit by the sweep line
• When reaching an H-segment, search (in the tree) for its endpoints, and report all values/segments in between
• Total time is O(N log N + Z)
![Page 12: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/12.jpg)
Lars Arge
External memory data structures
12
External Memory Issues
• Can use B-tree as a search tree: O(N log B N) I/O’s
• Still much worse than the O(N/B * log M/B N) sorting time….
![Page 13: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/13.jpg)
Lars Arge
External memory data structures
13
1D Version of the Intersection Problem• Given: a set of N 1D horizontal and vertical line segments (i.e.,
intervals and points on a line)
• Goal: find all point/segment intersections
• Assumption: all x coordinates of endpoints different
![Page 14: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/14.jpg)
Lars Arge
External memory data structures
14
Interlude: External Stack• Stack:
– Push
– Pop
• Can implement a stack in external memory using O(P/B) I/O’s per P operations
– Always keep about B top elements in main memory
– Perform disk access only when it is “earned”
![Page 15: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/15.jpg)
Lars Arge
External memory data structures
15
Back to 1D Intersection Problem• Will use fast stack and sorting implementations
• Sort all points and intervals in x-order (of the left endpoint)
• Iterate over consecutive (end)points p
– If p is a left endpoint of I, add I to the stack S
– If p is a point, pop all intervals I from stack S and push them on stack S’, while:
* Eliminating all “dead” intervals
* Reporting all “alive” intervals
– Push the intervals back from S’ to S
![Page 16: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/16.jpg)
Lars Arge
External memory data structures
16
Analysis
• Sorting: O(N/B * log M/B N) I/O’s
• Each interval is pushed/popped when:
– An intersection is reported, or
– Is eliminated as “dead”
• Total stack operations: O(N+Z)
• Total stack I/O’s: O( (N+Z)/B )
![Page 17: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/17.jpg)
Lars Arge
External memory data structures
17
Back to the 2D Case
• Ideas ?
![Page 18: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/18.jpg)
Lars Arge
External memory data structures
18
Algorithm• Divide the x-range into M/B slabs, so that each
slab contains the same number of V-segments
• Each slab has a stack storing V-segments
• Sort all segments in the y-order
• For each segment I:
– If I is a V-segment, add I to the stack in the proper slab
– If I is an H-segment, then for all slabs S which intersect I:
* If I spans S, proceed as in the 1D case
* Otherwise, store the intersection of S and I for later
• For each slab, recurse on the segments stored in that slab
![Page 19: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/19.jpg)
Lars Arge
External memory data structures
19
The recursion• For each slab separately we apply the
same algorithm
• On the bottom level we have only one V-segment , which is easy to handle
• Recursion depth: log M/B N
![Page 20: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/20.jpg)
Lars Arge
External memory data structures
20
Analysis
• Initial presorting: O(N/B * log M/B N) I/O’s
• First level of recursion:
– At most O(N+Z) pop/push operations
– At most 2N of H-segments stored
– Total: O(N/B) I/O’s
• Further recursion levels:
– The total number of H-segment pieces (over all slabs) is at most twice the number of the input H-segments; it does not double at each level
– By the above argument we pay O(N/B) I/O’s per level
• Total: O(N/B * log M/B N) I/O’s
![Page 21: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/21.jpg)
Lars Arge
External memory data structures
21
Off-line Range Queries• Given: N points in 2D and N’ rectangles
• Goal: Find all pairs p, R such that p is in R
![Page 22: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/22.jpg)
Lars Arge
External memory data structures
22
Summary
• On-line queries: O(log B N) I/O’s
• Off-line queries: O(1/B*log M/B N) I/O’s amortized
• Powerful techniques:
– Sorting
– Stack
– Distribution sweep
![Page 23: External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)](https://reader037.vdocument.in/reader037/viewer/2022110210/56649eb05503460f94bb5b4b/html5/thumbnails/23.jpg)
Lars Arge
External memory data structures
23
References• See http://www.brics.dk/MassiveData02, especially:
– First lecture by Lars Arge (for B-trees etc)
– Second lecture by Jeff Vitter (for distribution sweep)