Dynamic Indexability and the Optimality of B-trees and Hash Tables
Ke Yi
Hong Kong University of Science and Technology
Dynamic Indexability and Lower Bounds for Dynamic One-Dimensional Range Query Indexes, PODS ’09
Dynamic External Hashing: The Limit of Buffering, with Zhewei Wei and Qin Zhang, SPAA ’09
+ some recent developments
An index is . . .
An index (database) is a (disk-based) data structure that improves the speed of data retrieval operations (queries) on a database table.
An index is a single number calculated from a set of prices
Dow Jones, S & P, Hang Seng
An index is a list of keywords and their page numbers in a book
An index is an exponent
An index is a finger
An index is a list of academic publications and their citations
An index (search engine) is an inverted list from keywords to webpages
Hash Table and B-tree
Hash tables and B-trees are taught to undergrads and actually used in all database systems
B-tree: lookups and range queries; Hash table: lookups
External memory model (I/O model):
Memory of size M
Disk partitioned into blocks of size B
Each I/O reads/writes a block
[Figure: memory and a disk partitioned into blocks]
The B-tree
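A range query in $O(\log_B N + K/B)$ I/Os
K: output size
With the top levels of the tree kept in memory, the search term becomes $\log_B N - \log_B M = \log_B \frac{N}{M}$
The height of a B-tree never goes beyond 5 in practice (e.g., if B = 100, then a B-tree with 5 levels stores n = 10 billion records). We will assume $\log_B \frac{N}{M} = O(1)$.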
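The arithmetic behind this example can be checked with a short sketch. The function names are hypothetical, and the model assumes every node has fanout exactly B, which real B-trees only approximate.

```python
def btree_capacity(B: int, levels: int) -> int:
    """Records reachable through `levels` levels of fanout-B nodes."""
    return B ** levels


def levels_on_disk(N: int, M: int, B: int) -> int:
    """Smallest h with M * B**h >= N, i.e. ceil(log_B(N/M)) in integer arithmetic."""
    h, reach = 0, M
    while reach < N:
        reach *= B
        h += 1
    return h


# B = 100 and 5 levels, as in the slide's example.
print(btree_capacity(100, 5))              # 10000000000 (10 billion)
# With an assumed memory of 10**6 records, only 2 levels need disk accesses.
print(levels_on_disk(10**10, 10**6, 100))  # 2
```

The second call illustrates why treating $\log_B \frac{N}{M}$ as a constant is a mild assumption for realistic parameter values.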
External Hashing
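[Figure: a hash table with h(x) = last digit of x and blocks holding up to 3 keys. Bucket 1 holds 1, 21; bucket 2 holds 32, 82; bucket 4 holds 64, 34, 24 and chains to an overflow block holding 14; bucket 5 holds 55; all other buckets are null.]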
Ideal hash function assumption: h maps each object to a hash value uniformly and independently at random
Expected average cost of a successful (or unsuccessful) lookup is $1 + 1/2^{\Omega(B)}$ disk accesses, provided the load factor is less than a constant smaller than 1 [Knuth, 1973]
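A minimal sketch of how the lookup cost can be counted, assuming chaining with overflow blocks: the i-th key (0-based) to land in a bucket sits in block i // B of that bucket's chain, so finding it costs i // B + 1 block reads. The function name and the parameter values in the second experiment are illustrative assumptions, not taken from the talk.

```python
import random
from collections import defaultdict


def average_lookup_cost(keys, B, h):
    """Average block reads per successful lookup under chaining with block size B."""
    seen = defaultdict(int)  # number of keys already stored in each bucket
    total = 0
    for x in keys:
        bucket = h(x)
        total += seen[bucket] // B + 1  # blocks read to reach this key
        seen[bucket] += 1
    return total / len(keys)


# The keys from the slide's figure, with h(x) = last digit of x and B = 3.
# Only 14 lives in an overflow block, so the average is (8 * 1 + 2) / 9.
slide_keys = [1, 21, 32, 82, 64, 34, 24, 14, 55]
slide_cost = average_lookup_cost(slide_keys, B=3, h=lambda x: x % 10)

# Simulated ideal hashing at load factor 1/2 (assumed: 1000 buckets, B = 32).
rng = random.Random(0)
random_cost = average_lookup_cost(range(16000), B=32,
                                  h=lambda x: rng.randrange(1000))
print(slide_cost, random_cost)
```

With a larger B the simulated cost moves even closer to 1, in line with the $1/2^{\Omega(B)}$ term.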
Exact Numbers Calculated by Knuth
The Art of Computer Programming, volume 3, 1998, page 542
Extremely close to ideal
Now Let’s Go Dynamic
Focus on insertions first: Both the B-tree and hash table do a search first, then insert into the appropriate block
B-tree: Split blocks when necessary
Hashing: Rebuild the hash table when too full; extensible hashing [Fagin, Nievergelt, Pippenger, Strong, 79]; linear hashing [Litwin, 80]
These resizing operations only add O(1/B) I/Os amortized per insertion; the bottleneck is the first search + insert
We cannot hope for fewer than 1 I/O per insertion, but only if the changes must be committed to disk right away (necessary?)
Otherwise we can probably lower the amortized insertion cost by buffering, as in numerous other external memory problems, e.g. stack, priority queue, ... All of them support an insertion in O(1/B) I/Os, the best possible
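The buffering idea can be illustrated with the stack example. The sketch below is an assumption-laden toy, not code from the talk: it keeps up to 2B items in memory, writes the oldest B items out as one block when the buffer fills, and counts one I/O per block transferred.

```python
class BufferedStack:
    """External-memory stack with a 2B-item memory buffer (illustrative sketch)."""

    def __init__(self, B):
        self.B = B
        self.buffer = []  # in-memory top of the stack, at most 2B items
        self.disk = []    # list of full blocks, each holding exactly B items
        self.ios = 0      # number of block transfers so far

    def push(self, x):
        self.buffer.append(x)
        if len(self.buffer) == 2 * self.B:
            # Write the oldest B buffered items to disk as one block: 1 I/O.
            self.disk.append(self.buffer[:self.B])
            self.buffer = self.buffer[self.B:]
            self.ios += 1

    def pop(self):
        if not self.buffer:
            if not self.disk:
                raise IndexError("pop from empty stack")
            # Read the most recently written block back: 1 I/O.
            self.buffer = self.disk.pop()
            self.ios += 1
        return self.buffer.pop()


stack = BufferedStack(B=10)
for i in range(1000):
    stack.push(i)
ios_after_pushes = stack.ios  # 99 block writes for 1000 pushes
popped = [stack.pop() for _ in range(1000)]
```

Each I/O moves B items, and at least B operations must occur between consecutive I/Os, which gives the O(1/B) amortized bound.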
Dynamic B-trees
Dynamic Hash Tables
Dynamic B-trees for Fast Insertions
LSM-tree [O’Neil, Cheng, Gawlick, O’Neil, Acta Informatica’96]: Logarithmic method + B-tree
[Figure: memory, followed by B-trees of sizes $M$, $\ell M$, $\ell^2 M$, ...]
Insertion: $O(\frac{\ell}{B} \log_\ell \frac{N}{M})$
Query: $O(\log_\ell \frac{N}{M})$ (omit the $\frac{K}{B}$ output term)
Stepped merge tree [Jagadish, Narayan, Seshadri, Sudarshan, Kanneganti, VLDB’97]: variant of LSM-tree
Insertion: $O(\frac{1}{B} \log_\ell \frac{N}{M})$
Query: $O(\ell \log_\ell \frac{N}{M})$
Usually $\ell$ is set to be a constant, then they both have $O(\frac{1}{B} \log \frac{N}{M})$ insertion and $O(\log \frac{N}{M})$ query
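The logarithmic method behind these structures can be sketched in memory. This is a simplified illustration, not the LSM-tree or stepped merge tree as published: sorted Python lists stand in for the on-disk B-trees, level i is assumed to hold at most $\ell^i M$ keys, and the class and method names are hypothetical. It assumes $\ell \ge 2$.

```python
import bisect
from heapq import merge


class LogMethodIndex:
    """Logarithmic-method index with sorted runs of geometrically growing capacity."""

    def __init__(self, M=4, ell=2):
        self.M, self.ell = M, ell
        self.memory = []  # unsorted in-memory buffer, fewer than M keys
        self.levels = []  # levels[i]: sorted run with at most ell**i * M keys

    def _capacity(self, i):
        return self.M * self.ell ** i

    def insert(self, key):
        self.memory.append(key)
        if len(self.memory) < self.M:
            return
        # Memory is full: push its contents down, merging level by level.
        carry, self.memory = sorted(self.memory), []
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append([])
            merged = list(merge(self.levels[i], carry))
            if len(merged) <= self._capacity(i):
                self.levels[i] = merged
                return
            # Level i overflows: empty it and carry everything to level i + 1.
            self.levels[i], carry = [], merged
            i += 1

    def range_query(self, lo, hi):
        """Report all keys in [lo, hi] by searching memory and every level."""
        out = [k for k in self.memory if lo <= k <= hi]
        for run in self.levels:
            out.extend(run[bisect.bisect_left(run, lo):bisect.bisect_right(run, hi)])
        return sorted(out)


index = LogMethodIndex(M=4, ell=2)
for i in range(1000):
    index.insert((i * 37) % 1000)  # a permutation of 0..999
print(index.range_query(10, 20), len(index.levels))
```

A query has to look at every level, which is where the $\log_\ell \frac{N}{M}$ factor in the query bound comes from, while each key is rewritten only when it moves down a level.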
More Dynamic B-trees
Y-tree [Jermaine, Datta, Omiecinski, VLDB’99]
Buffer-tree (buffered-repository tree) [Arge, WADS’95; Buchsbaum, Goldwasser, Venkatasubramanian, Westbrook, SODA’00]
Streaming B-tree [Bender, Farach-Colton, Fineman, Fogel, Kuszmaul, Nelson, SPAA’07]
Cache-oblivious model [Demaine, Fineman, Iacono, Langerman, Munro, SODA’10]
Tradeoffs achieved (query cost q, insertion cost u):
q = $\log B$, u = $\frac{1}{B} \log B$
q = $1$, u = $\frac{1}{B} B^{\epsilon}$
q = $B^{\epsilon}$, u = $\frac{1}{B}$
Deletions? Standard trick: inserting “delete signals”
No better solutions known ...
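The "delete signals" trick can be sketched as follows. This is a toy append-only log, assumed here for illustration only: a deletion is recorded as an insertion of a special marker, lookups trust the most recent record for a key, and a later compaction discards deleted keys. All names are hypothetical.

```python
TOMBSTONE = object()  # the "delete signal" marker


class SignalLog:
    """Append-only record log in which deletions are inserted as signals."""

    def __init__(self):
        self.records = []  # (key, value) pairs, oldest first

    def insert(self, key, value):
        self.records.append((key, value))

    def delete(self, key):
        # A deletion costs the same as an insertion: it just appends a signal.
        self.records.append((key, TOMBSTONE))

    def lookup(self, key):
        # The newest record for a key decides whether the key is present.
        for k, v in reversed(self.records):
            if k == key:
                return None if v is TOMBSTONE else v
        return None

    def compact(self):
        # Keep only the newest record per key and drop deleted keys.
        latest = {}
        for k, v in self.records:
            latest[k] = v
        self.records = [(k, v) for k, v in latest.items() if v is not TOMBSTONE]


log = SignalLog()
log.insert("a", 1)
log.insert("b", 2)
log.delete("a")
print(log.lookup("a"), log.lookup("b"))  # None 2
log.compact()
```

In a buffered index the signal travels down the structure like any other insertion and cancels the matching key when the two meet during a merge.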
Compare with the rich results in RAM!
Predecessor
$\Theta(\sqrt{\log N / \log\log N})$ insertion and query [Andersson, Thorup, JACM’07]
Range reporting
$O(\log N / \log\log N)$ insertion and $O(\log\log N)$ query [Mortensen, Pagh, Patrascu, STOC’05]
$O(\sqrt{\log N / \log\log N})$ insertion and query [Andersson, Thorup, JACM’07]
Partial-sum
$\Theta(\log N)$ insertion and query [Patrascu, Demaine, SODA’04]
Other results that depend on the word size w
Are the EM and DB people just dumb?
Our Main Result
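For any dynamic range query index with a query cost of q and an amortized insertion cost of u, the following tradeoff holds:
$q \cdot \log(uB/q) = \Omega(\log B)$, for $q < \alpha \log B$, where $\alpha$ is any constant;
$uB \cdot \log q = \Omega(\log B)$, for all q.
Current upper bounds (query cost q, insertion cost u):
q = $\log B$, u = $\frac{1}{B} \log B$ (annotated on the slide as $\log \frac{N}{M}$ and $\frac{1}{B} \log \frac{N}{M}$)
q = $1$, u = $\frac{1}{B} B^{\epsilon}$
q = $B^{\epsilon}$, u = $\frac{1}{B}$
Assuming $\log_B \frac{N}{M} = O(1)$, all the bounds are tight!
Can’t be true for $B = o(\sqrt{\log n \log\log n})$, since the exponential tree achieves $u = q = O(\sqrt{\log n / \log\log n})$ [Andersson, Thorup, JACM’07]. ($n = N/M$)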
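To see how the three upper bounds line up with the tradeoff, one can substitute each of them into the lower bound. In the sketch below, $c > 0$ stands for the constant hidden by the $\Omega$ and $\alpha' < \alpha$ is an arbitrary constant; both symbols are notation introduced here, and logarithms are taken base 2.

```latex
\begin{align*}
q = 1:\quad
  & \log(uB) \ge c \log B
  \;\Longrightarrow\; uB \ge B^{c}
  \;\Longrightarrow\; u \ge \tfrac{1}{B} B^{c}, \\
u = \tfrac{1}{B}:\quad
  & \log q \ge c \log B
  \;\Longrightarrow\; q \ge B^{c}, \\
q = \alpha' \log B:\quad
  & \log\frac{uB}{q} \ge \frac{c}{\alpha'}
  \;\Longrightarrow\; uB \ge 2^{c/\alpha'}\, q
  \;\Longrightarrow\; u = \Omega\!\left(\tfrac{1}{B} \log B\right).
\end{align*}
```

The first and third lines use the first inequality of the tradeoff, and the second line uses the second inequality with $uB = 1$. Each conclusion has the same form as the corresponding upper bound.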
The real question
How large does B need to be for the buffer-tree to be optimal for range reporting?
Known: somewhere between $\Omega(\sqrt{\log n \log\log n})$ and $O(n^{\epsilon})$
Lower Bound Model: Dynamic Indexability
Indexability: [Hellerstein, Koutsoupias, Papadimitriou, PODS’97, JACM’02]
Objects are stored in disk blocks of size up to B, possibly with redundancy.
[4 7 9] [1 2 4] [3 5 8] [2 6 7] [1 8 9] [4 5]
A query reports 2, 3, 4, 5: cost = 2
The query cost is the minimum number of blocks that can cover all the required results (search time ignored!).
Similar in spirit to popular lower bound models: cell probe model, semigroup model
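The query cost in this model is a minimum set cover over the blocks, which a brute-force sketch makes concrete. The grouping of the slide's numbers into blocks of three is an assumption about the figure, and the function name is hypothetical; the exhaustive search is only meant for tiny examples.

```python
from itertools import combinations


def query_cost(blocks, answer):
    """Minimum number of blocks whose union contains every element of `answer`."""
    answer = set(answer)
    # Only the part of each block that intersects the answer matters.
    useful = [set(b) & answer for b in blocks if set(b) & answer]
    for k in range(len(useful) + 1):
        for combo in combinations(useful, k):
            if set().union(*combo) >= answer:
                return k
    return None  # some required object is not stored in any block


# Assumed block layout for the numbers shown on the slide (B = 3).
blocks = [[4, 7, 9], [1, 2, 4], [3, 5, 8], [2, 6, 7], [1, 8, 9], [4, 5]]
print(query_cost(blocks, [2, 3, 4, 5]))  # 2, e.g. [1 2 4] and [3 5 8]
```

Because the model charges nothing for locating those blocks, any lower bound proved in it also applies to real indexes that must pay for the search.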
Previous Results on Static Indexability
Nearly all external indexing lower bounds are under this model
Tradeoff between space (s) and query time (q)
2D range queries: $s/N \cdot \log q = \Omega(\log(N/B))$ [Hellerstein, Koutsoupias, Papadimitriou, PODS’97], [Koutsoupias, Taylor, PODS’98], [Arge, Samoladas, Vitter, PODS’99]
2D stabbing queries: $q \cdot \log(s/N) = \Omega(\log(N/B))$ [Arge, Samoladas, Yi, ESA’04, Algorithmica’09]
1D range queries: s = N, q = 1 trivially
Adding dynamization makes it much more interesting!
Dynamic Indexability
Still consider only insertions
[Figure: memory of size M and disk blocks of size B = 3 at three consecutive times]
time t: 4 5 | 4 7 9 | 1 2 7 (← snapshot)
time t+1: 4 5 | 4 7 9 | 1 2 6 7 (6 inserted)
time t+2: 1 2 5 | 4 7 9 | 6 8 (8 inserted)
transition cost = 2
Update cost: u = amortized transition cost per insertion
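A possible way to compute these quantities is sketched below. It assumes that the transition cost counts the blocks of the new snapshot that were not present in the old one, that there is one insertion per transition, and that the 6 inserted at time t+1 is still held in memory; these are assumptions about the figure, not statements from the slides. Function names are hypothetical.

```python
def transition_cost(old_blocks, new_blocks):
    """Number of blocks in the new snapshot that did not exist in the old one."""
    old = {frozenset(b) for b in old_blocks}
    return sum(1 for b in new_blocks if frozenset(b) not in old)


def amortized_update_cost(snapshots):
    """Total transition cost divided by the number of insertions (one per step)."""
    total = sum(transition_cost(a, b) for a, b in zip(snapshots, snapshots[1:]))
    return total / (len(snapshots) - 1)


# Assumed disk contents at times t, t+1 and t+2.
snapshots = [
    [[4, 5], [4, 7, 9], [1, 2, 7]],  # time t
    [[4, 5], [4, 7, 9], [1, 2, 7]],  # time t+1: 6 buffered in memory, disk unchanged
    [[1, 2, 5], [4, 7, 9], [6, 8]],  # time t+2: two blocks rewritten
]
print(transition_cost(snapshots[1], snapshots[2]))  # 2
print(amortized_update_cost(snapshots))             # 1.0
```

Buffering in memory is what allows some transitions to cost nothing, so that the amortized cost u can drop below one block per insertion.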
The Ball-Shuffling Problem
B balls → q bins
[Figure: a ball put into an empty bin, cost = 1; a ball put into a bin already holding one ball, cost = 2]
Cost of putting the ball directly into a bin = # balls in the bin + 1
The Ball-Shuffling Problem
B balls, q bins
Shuffle: [Figure: a shuffle involving several bins; cost = 5]
Cost of shuffling = # balls in the involved bins
Putting a ball directly into a bin is a special shuffle
Goal: Accommodate all B balls using q bins with minimum cost
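The cost rules of the ball-shuffling problem can be made concrete with a minimal Python sketch. It assumes that balls arrive one at a time and that the cost of a shuffle counts the balls in the involved bins after the new ball has been added, which is what makes direct placement (cost = # balls in the bin + 1) a special case. All function names are hypothetical and not taken from the talk.

```python
def place_directly(bins, i):
    """Put one new ball into bin i. Cost = balls already in the bin + 1."""
    bins[i] += 1
    return bins[i]


def shuffle(bins, involved, new_loads):
    """Add one new ball and redistribute all balls of the involved bins.

    `involved` lists bin indices and `new_loads` gives their loads after
    the shuffle. Cost = number of balls in the involved bins, counting
    the new ball (an assumption of this sketch).
    """
    total = sum(bins[i] for i in involved) + 1
    if len(new_loads) != len(involved) or sum(new_loads) != total:
        raise ValueError("new loads must redistribute exactly the involved balls")
    for i, load in zip(involved, new_loads):
        bins[i] = load
    return total


def round_robin_cost(B, q):
    """Total cost of placing B balls directly into q bins in round-robin order."""
    bins = [0] * q
    return sum(place_directly(bins, t % q) for t in range(B))
```

With a single bin, `round_robin_cost(12, 1)` is 1 + 2 + ... + 12 = 78, i.e. B(B+1)/2, which grows quadratically in B. Spreading the same 12 balls over 3 bins gives 30.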
![Page 64: B-trees and Hash Tables Dynamic Indexability and the ...yike/talks/external-lb-overview.pdf · Hash Table and B-tree Hash tables and B-trees are taught to undergrads and actually](https://reader036.vdocument.in/reader036/viewer/2022063017/5fd97e948c20543a421ee324/html5/thumbnails/64.jpg)
20-5
The Workload Construction
[Figure: keys vs. time; rounds 1, 2, 3, ..., B, with the considered snapshots of the dynamic index marked]
Queries that we require the index to cover with q blocks; # queries ≥ 2MB
![Page 65: B-trees and Hash Tables Dynamic Indexability and the ...yike/talks/external-lb-overview.pdf · Hash Table and B-tree Hash tables and B-trees are taught to undergrads and actually](https://reader036.vdocument.in/reader036/viewer/2022063017/5fd97e948c20543a421ee324/html5/thumbnails/65.jpg)
21-1
The Workload Construction
[Figure: keys vs. time; rounds 1, 2, 3, ..., B]
There exists a query such that
• The ≤ B objects of the query reside in ≤ q blocks in all snapshots
• All of its objects are on disk in all B snapshots (we have ≥ MB queries)
• The index moves its objects $uB^2$ times in total
![Page 68: B-trees and Hash Tables Dynamic Indexability and the ...yike/talks/external-lb-overview.pdf · Hash Table and B-tree Hash tables and B-trees are taught to undergrads and actually](https://reader036.vdocument.in/reader036/viewer/2022063017/5fd97e948c20543a421ee324/html5/thumbnails/68.jpg)
22-3
The Reduction
An index with update cost u and query cost q gives us a solution to the ball-shuffling game with cost $uB^2$ for B balls and q bins
Lower bound on the ball-shuffling problem:
Theorem: The cost of any solution for the ball-shuffling problem is at least
$\Omega(q \cdot B^{1+\Omega(1/q)})$, for $q < \alpha \log B$ where $\alpha$ is any constant;
$\Omega(B \log_q B)$, for any q.
⇒
$q \cdot \log(uB/q) = \Omega(\log B)$, for $q < \alpha \log B$ where $\alpha$ is any constant;
$uB \cdot \log q = \Omega(\log B)$, for all q.
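The step from the theorem to the tradeoff can be sketched as follows. This is an informal derivation, not taken from the slides: it combines the reduction's cost $uB^2$ with each lower bound, writes $c > 0$ for the constant hidden in the exponent, and uses $\gtrsim$ to suppress constant factors.

```latex
\begin{align*}
uB^2 \gtrsim q \cdot B^{1 + c/q}
  &\;\Longrightarrow\; \frac{uB}{q} \gtrsim B^{c/q}
   \;\Longrightarrow\; \log\frac{uB}{q} \gtrsim \frac{c}{q}\log B
   \;\Longrightarrow\; q \cdot \log\frac{uB}{q} = \Omega(\log B), \\
uB^2 \gtrsim B \log_q B = \frac{B \log B}{\log q}
  &\;\Longrightarrow\; uB \cdot \log q = \Omega(\log B).
\end{align*}
```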
![Page 73: B-trees and Hash Tables Dynamic Indexability and the ...yike/talks/external-lb-overview.pdf · Hash Table and B-tree Hash tables and B-trees are taught to undergrads and actually](https://reader036.vdocument.in/reader036/viewer/2022063017/5fd97e948c20543a421ee324/html5/thumbnails/73.jpg)
23-5
Ball-Shuffling Lower Bounds
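Theorem: The cost of any solution for the ball-shuffling problem is at least
$\Omega(q \cdot B^{1+\Omega(1/q)})$, for $q < \alpha \log B$ where $\alpha$ is any constant;
$\Omega(B \log_q B)$, for any q.
[Figure: cost lower bound as a function of q, with these marked points]
q = 1: $B^2$
q = 2: $B^{4/3}$
q = $\log B$: $B \log B$
q = $B^\epsilon$: $B$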
![Page 74: B-trees and Hash Tables Dynamic Indexability and the ...yike/talks/external-lb-overview.pdf · Hash Table and B-tree Hash tables and B-trees are taught to undergrads and actually](https://reader036.vdocument.in/reader036/viewer/2022063017/5fd97e948c20543a421ee324/html5/thumbnails/74.jpg)
24-1
Dynamic B-trees
Dynamic Hash Tables
![Page 77: B-trees and Hash Tables Dynamic Indexability and the ...yike/talks/external-lb-overview.pdf · Hash Table and B-tree Hash tables and B-trees are taught to undergrads and actually](https://reader036.vdocument.in/reader036/viewer/2022063017/5fd97e948c20543a421ee324/html5/thumbnails/77.jpg)
25-3
Dynamic Hash Tables
B-tree query I/O: $O(\log_B \frac{N}{M})$
Hash table query I/O: $1 + 1/2^{\Omega(B)}$; insertion the same
A long-time conjecture in the external memory community:
The insertion cost must be Ω(1) I/Os if the query cost is required to be O(1) I/Os.
Buffering is useless?
![Page 86: B-trees and Hash Tables Dynamic Indexability and the ...yike/talks/external-lb-overview.pdf · Hash Table and B-tree Hash tables and B-trees are taught to undergrads and actually](https://reader036.vdocument.in/reader036/viewer/2022063017/5fd97e948c20543a421ee324/html5/thumbnails/86.jpg)
26-9
Dynamic Hash Tables (for successful queries)
Logarithmic method (folklore?)
[Figure: a table in memory and tables of sizes m, 2m, 4m, 8m]
Insertion: $O(\frac{1}{B} \log \frac{N}{M})$
Expected average query: O(1)
Improving query time
Idea: Keep one table large enough
For some parameter $\beta = B^c$, $c \le 1$
[Figure: tables of sizes x and x/β, then 2x]
Insertion: $O(B^{c-1})$
Query: $1 + O(1/B^c)$
Still far from the target $1 + 1/2^{\Omega(B)}$
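The logarithmic method named on this slide can be illustrated with a minimal Python sketch. It makes several assumptions that are not in the talk: Python dicts stand in for on-disk hash tables, level i holds up to m · 2^i items, and the counter tracks items written rather than I/Os (dividing by B would approximate block writes). The class and attribute names are hypothetical.

```python
class LogMethodHashTable:
    """Sketch of the logarithmic method: a memory table plus doubling levels."""

    def __init__(self, m):
        self.m = m              # memory capacity in items
        self.memory = {}        # in-memory table
        self.levels = []        # levels[i] is None or a table of up to m * 2**i items
        self.items_written = 0  # items written out to levels (a proxy for I/O)

    def insert(self, key, value):
        self.memory[key] = value
        if len(self.memory) >= self.m:
            self._flush()

    def _flush(self):
        # Push the memory table down, merging with full levels like a
        # binary counter carrying into the next digit.
        carry, self.memory = self.memory, {}
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append(None)
            if self.levels[i] is None:
                self.levels[i] = carry
                self.items_written += len(carry)
                return
            merged = dict(self.levels[i])
            merged.update(carry)  # newer entries override older ones
            self.levels[i] = None
            carry = merged
            i += 1

    def lookup(self, key):
        if key in self.memory:
            return self.memory[key]
        for table in self.levels:  # smallest (most recent) level first
            if table is not None and key in table:
                return table[key]
        return None
```

Each item is rewritten once per level it passes through, which is where the logarithmic factor in the insertion bound comes from. A lookup may probe every non-empty level, but because the level sizes grow geometrically, most items sit in the largest tables.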
![Page 87: B-trees and Hash Tables Dynamic Indexability and the ...yike/talks/external-lb-overview.pdf · Hash Table and B-tree Hash tables and B-trees are taught to undergrads and actually](https://reader036.vdocument.in/reader036/viewer/2022063017/5fd97e948c20543a421ee324/html5/thumbnails/87.jpg)
27-1
Query-Insertion Tradeoff for Successful queries
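[Figure: insertion cost vs. query cost, with upper bounds and lower bounds; Wei, Yi, Zhang, SPAA '09]
Query $1 + 1/2^{\Omega(B)}$: standard hashing
Query $1 + \Theta(1/B^c)$, c > 1: insertion lower bound $1 - O(1/B^{(c-1)/4})$
Query $1 + \Theta(1/B)$: insertion Ω(1), O(1)
Query $1 + \Theta(1/B^c)$, c < 1: insertion $\Omega(B^{c-1})$, $O(B^{c-1})$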
![Page 90: B-trees and Hash Tables Dynamic Indexability and the ...yike/talks/external-lb-overview.pdf · Hash Table and B-tree Hash tables and B-trees are taught to undergrads and actually](https://reader036.vdocument.in/reader036/viewer/2022063017/5fd97e948c20543a421ee324/html5/thumbnails/90.jpg)
28-3
Indexability Too Strong!
Naïve solution: For every B items, write to a block.
Query cost is 1, insertion is 1/B
Too many possible mappings!
Indexability + information-theoretical argument
If with only the information in memory, the hash table cannot locate the item, then querying it takes at least 2 I/Os.
![Page 91: B-trees and Hash Tables Dynamic Indexability and the ...yike/talks/external-lb-overview.pdf · Hash Table and B-tree Hash tables and B-trees are taught to undergrads and actually](https://reader036.vdocument.in/reader036/viewer/2022063017/5fd97e948c20543a421ee324/html5/thumbnails/91.jpg)
29-1
The Abstraction
Consider the layout of a hash table at any snapshot. Denote all the blocks on disk by $B_1, B_2, \ldots, B_d$. Let $f : U \to \{1, \ldots, d\}$ be any function computable within memory.
We divide items inserted into 3 zones with respect to f.
Memory zone M: set of items stored in memory. $t_q = 0$.
Fast zone F: set of items x such that $x \in B_{f(x)}$. $t_q = 1$.
Slow zone S: The rest of items. $t_q = 2$.
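The three zones can be illustrated with a small Python sketch. It assumes the disk layout is given as a dict from block index (1 to d) to the set of items in that block, and it charges the slow zone exactly 2 I/Os, matching the slide's t_q values. The function names and the example layout are hypothetical.

```python
def zone(item, memory, blocks, f):
    """Return (zone name, query cost t_q) for an item under address function f."""
    if item in memory:
        return "memory", 0
    if item in blocks[f(item)]:
        return "fast", 1   # one I/O: the block that f points to holds the item
    return "slow", 2       # f pointed to the wrong block


def average_query_cost(items, memory, blocks, f):
    """Average t_q over the given items."""
    return sum(zone(x, memory, blocks, f)[1] for x in items) / len(items)


# Example layout: d = 2 blocks, one item kept in memory.
memory = {1}
blocks = {1: {2, 4}, 2: {3, 6}}


def f(x):
    # Any function computable within memory; this choice is arbitrary.
    return x % 2 + 1
```

Here items 2, 3 and 4 land in the fast zone, while item 6 is in the slow zone because f(6) = 1 but the item is stored in block 2.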
![Page 92: B-trees and Hash Tables Dynamic Indexability and the ...yike/talks/external-lb-overview.pdf · Hash Table and B-tree Hash tables and B-trees are taught to undergrads and actually](https://reader036.vdocument.in/reader036/viewer/2022063017/5fd97e948c20543a421ee324/html5/thumbnails/92.jpg)
30-1
The Key
The hash table can employ a family $\mathcal{F}$ of at most $2^M$ distinct f's.
Note that the current f adopted by the hash table is dependent upon the already inserted items, but the family $\mathcal{F}$ has to be fixed beforehand.
![Page 93: B-trees and Hash Tables Dynamic Indexability and the ...yike/talks/external-lb-overview.pdf · Hash Table and B-tree Hash tables and B-trees are taught to undergrads and actually](https://reader036.vdocument.in/reader036/viewer/2022063017/5fd97e948c20543a421ee324/html5/thumbnails/93.jpg)
31-1
How about All Queries? (Latest results)
We are essentially talking about the membership problem
Can’t use indexability model
Have to use cell probe model
![Page 96: B-trees and Hash Tables Dynamic Indexability and the ...yike/talks/external-lb-overview.pdf · Hash Table and B-tree Hash tables and B-trees are taught to undergrads and actually](https://reader036.vdocument.in/reader036/viewer/2022063017/5fd97e948c20543a421ee324/html5/thumbnails/96.jpg)
32-3
All queries (the membership problem)
(The cell probe model)
[Figure: query cost vs. insertion cost, with upper bounds and lower bounds]
Upper bounds: hashing (query $1 + 1/2^{\Omega(B)}$), buffer tree, truncated buffer tree ($\frac{\ell}{B} \log_\ell n$, $\log_\ell n$)
Lower bounds: [Yi, Zhang, SODA '10] (marks 1.1 and 0.91), [Verbin, Zhang, STOC '10] ($\log_{B \log n} n$)
![Page 98: B-trees and Hash Tables Dynamic Indexability and the ...yike/talks/external-lb-overview.pdf · Hash Table and B-tree Hash tables and B-trees are taught to undergrads and actually](https://reader036.vdocument.in/reader036/viewer/2022063017/5fd97e948c20543a421ee324/html5/thumbnails/98.jpg)
33-2
THE BIG BOLD CONJECTURE
All these fundamental data structure problems have the same query-update tradeoff in external memory when u = o(1), for sufficiently large B.
Partial-sum: all B; Range reporting: $B > n^\epsilon$; Predecessor: unknown.
Strong implication: The buffer tree (and many of the log method based structures) is simple, practical, versatile, and optimal!
![Page 99: B-trees and Hash Tables Dynamic Indexability and the ...yike/talks/external-lb-overview.pdf · Hash Table and B-tree Hash tables and B-trees are taught to undergrads and actually](https://reader036.vdocument.in/reader036/viewer/2022063017/5fd97e948c20543a421ee324/html5/thumbnails/99.jpg)
34-1
The End
THANK YOU
Q and A