Data-intensive Computing Systems: Operators for Data Access (Shivnath Babu)
TRANSCRIPT
![Page 1: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/1.jpg)
Data-intensive Computing Systems
Operators for Data Access
Shivnath Babu
![Page 2: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/2.jpg)
Problem
Relation: Employee (ID, Name, Dept, …)
10 M tuples
(Filter) Query: SELECT * FROM Employee WHERE Name = “Bob”
![Page 3: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/3.jpg)
Solution #1: Full Table Scan
Storage: Employee relation stored in contiguous blocks
Query plan: Scan the entire relation, output tuples with Name = “Bob”
Cost:
  Size of each record = 100 bytes
  Size of relation = 10 M x 100 bytes = 1 GB
  Time @ 20 MB/s ≈ 1 minute
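The scan-cost arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch; the constant names are illustrative, and 1 MB is taken as 10^6 bytes (an assumption, the slide does not say).

```python
# Back-of-the-envelope cost of a full table scan, using the slide's numbers.
NUM_TUPLES = 10_000_000      # 10 M tuples
RECORD_BYTES = 100           # size of each record
SCAN_RATE = 20_000_000       # 20 MB/s, assuming 1 MB = 10**6 bytes

relation_bytes = NUM_TUPLES * RECORD_BYTES    # 1_000_000_000 bytes = 1 GB
scan_seconds = relation_bytes / SCAN_RATE     # 50.0 seconds, roughly a minute

print(relation_bytes, scan_seconds)
```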
![Page 4: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/4.jpg)
Solution #2
Storage: Employee relation sorted on Name attribute
Query plan: Binary search
![Page 5: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/5.jpg)
Solution #2
Cost:
  Size of a block: 1024 bytes
  Number of records per block: 1024 / 100 ≈ 10
  Total number of blocks: 10 M / 10 = 1 M
  Blocks accessed by binary search: log2(1 M) ≈ 20
  Total time: 20 ms x 20 = 400 ms
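The same estimate can be reproduced in code. The sketch below assumes, as the slide does, that each binary-search probe costs one random block access of 20 ms; the variable names are illustrative.

```python
import math

BLOCK_BYTES = 1024
RECORD_BYTES = 100
NUM_TUPLES = 10_000_000
RANDOM_IO_MS = 20            # cost of one random block access, as on the slide

records_per_block = BLOCK_BYTES // RECORD_BYTES     # 10 (rounded down)
num_blocks = NUM_TUPLES // records_per_block        # 1_000_000
probes = math.ceil(math.log2(num_blocks))           # 20 blocks touched
total_ms = probes * RANDOM_IO_MS                    # 400 ms

print(records_per_block, num_blocks, probes, total_ms)
```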
![Page 6: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/6.jpg)
Solution #2: Issues
Filters on different attributes: SELECT * FROM Employee WHERE Dept = “Sales”
Inserts and Deletes
![Page 7: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/7.jpg)
Indexes
Data structures that efficiently evaluate a class of filter predicates over a relation
Class of filter predicates:
  Single or multi-attributes (index-key attributes)
  Range and/or equality predicates
(Usually) independent of physical storage of relation:
  Multiple indexes per relation
![Page 8: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/8.jpg)
Indexes
Disk resident
  Too large to fit in memory
  Persistent
Updated when indexed relation updated
  Relation updates costlier
  Queries cheaper
![Page 9: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/9.jpg)
Problem
Relation: Employee (ID, Name, Dept, …)
(Filter) Query: SELECT * FROM Employee WHERE Name = “Bob”
Single-attribute index on Name that supports equality predicates
![Page 10: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/10.jpg)
Roadmap
Motivation
Single-Attribute Indexes: Overview
Order-based Indexes
  B-Trees
Hash-based Indexes (may cover in future)
  Extensible Hashing
  Linear Hashing
Multi-Attribute Indexes (Chapter 14 GMUW, may cover in future)
![Page 11: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/11.jpg)
Single Attribute Index: General Construction
[Figure: a relation with attributes A and B holding the tuples (a1, b1), (a2, b2), …, (ai, bi), …, (an, bn).]
![Page 12: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/12.jpg)
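Single Attribute Index: General Construction
[Figure: an index on attribute A holds the values a1, a2, …, ai, …, an, each pointing to its tuple (ai, bi) in the relation with attributes A and B. The index answers the predicates A = val and A > low, A < high.]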
![Page 13: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/13.jpg)
Exceptions
Sparse Indexes
  Require specific physical layout of relation
  Example: Relation sorted on indexed attribute
  More efficient
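A minimal sketch of the sparse-index idea, assuming the relation is sorted on the indexed attribute and split into blocks. The data, the block size, and the name `sparse_lookup` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right

# Hypothetical relation, sorted on the indexed attribute and split into blocks.
blocks = [[15, 36, 57], [63, 76, 87], [92, 100]]

# Sparse index: one entry per block (its first key), not one entry per record.
sparse_index = [block[0] for block in blocks]    # [15, 63, 92]

def sparse_lookup(key):
    """Return (block_number, offset) of key, or None if it is absent."""
    # Pick the last block whose first key is <= key.
    b = bisect_right(sparse_index, key) - 1
    if b < 0:
        return None
    # Only this one block has to be read and scanned.
    for offset, k in enumerate(blocks[b]):
        if k == key:
            return (b, offset)
    return None

print(sparse_lookup(76), sparse_lookup(60))
```

The index here has three entries for eight records, which is where the space saving comes from; it only works because the sort order tells us which block a key must live in.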
![Page 14: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/14.jpg)
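Single Attribute Index: General Construction
[Figure: an index on attribute A holds the values a1, a2, …, ai, …, an, each pointing to its tuple (ai, bi) in the relation with attributes A and B. The index answers the predicates A = val and A > low, A < high.]
Textbook: Dense Index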
![Page 15: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/15.jpg)
Single Attribute Index: General Construction
[Figure: the index entries a1, a2, …, ai, …, an, queried with the predicates A = val and A > low, A < high.]
How do we organize (attribute, pointer) pairs?
Idea: Use dictionary data structures
Issue: Disk resident?
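As a rough in-memory illustration of the (attribute, pointer) idea, the sketch below keeps the pairs in a sorted list and uses binary search for both predicate types. It deliberately ignores the disk-residency issue raised above; the relation contents and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical relation: tuples (A, B) stored in arbitrary order.
relation = [(57, "e"), (15, "a"), (92, "h"), (36, "c"), (76, "f")]

# Dense index: one (attribute value, pointer) pair per tuple, sorted on the value.
# The "pointer" is simply the tuple's position in the list.
index = sorted((a, pos) for pos, (a, _b) in enumerate(relation))
keys = [a for a, _ptr in index]

def lookup_eq(val):
    """Tuples with A = val."""
    lo, hi = bisect_left(keys, val), bisect_right(keys, val)
    return [relation[ptr] for _a, ptr in index[lo:hi]]

def lookup_range(low, high):
    """Tuples with A > low and A < high."""
    lo, hi = bisect_right(keys, low), bisect_left(keys, high)
    return [relation[ptr] for _a, ptr in index[lo:hi]]

print(lookup_eq(36), lookup_range(15, 80))
```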
![Page 16: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/16.jpg)
Roadmap
Motivation
Single-Attribute Indexes: Overview
Order-based Indexes
  B-Trees
Hash-based Indexes
  Extensible Hashing
  Linear Hashing
Multi-Attribute Indexes
![Page 17: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/17.jpg)
B-Trees
Adaptation of search tree data structure
  2-3 trees
Supports range predicates (and equality)
![Page 18: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/18.jpg)
Use Binary Search Tree Directly?
[Figure: a binary search tree built over the sorted keys 16, 32, 54, 71, 74, 83, 92.]
![Page 19: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/19.jpg)
Use Binary Search Tree Directly?
Store records of type <key, left-ptr, right-ptr, data-ptr>
Remember position of root
Question: will this work?
  Yes
  But we can do better!
![Page 20: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/20.jpg)
Use Binary Search Tree Directly?
Number of keys: 1 M
Number of levels: log2(2^20) = 20
Total cost of index lookup: 20 random disk I/Os
  20 x 20 ms = 400 ms
B-Tree: less than 3 random disk I/Os
![Page 21: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/21.jpg)
B-Tree vs. Binary Search Tree
Binary search tree node (one key k): 1 random I/O prunes tree by half
B-Tree node (keys k1, k2, k3, …, k40): 1 random I/O prunes tree by 40
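To see how fan-out drives the number of levels, the sketch below counts how many levels are needed before a tree with a given fan-out can address 1 M keys. The helper `levels_needed` is hypothetical, and the result is only an upper-level estimate: the real number of I/Os also depends on the exact fan-out and on how many upper levels are cached, which the slides do not specify.

```python
def levels_needed(num_keys, fan_out):
    """Smallest number of levels so that fan_out ** levels >= num_keys."""
    levels, reachable = 0, 1
    while reachable < num_keys:
        reachable *= fan_out
        levels += 1
    return levels

RANDOM_IO_MS = 20    # cost of one random I/O, as on the earlier slides

for fan_out in (2, 40):
    levels = levels_needed(1_000_000, fan_out)
    print(f"fan-out {fan_out}: {levels} levels, {levels * RANDOM_IO_MS} ms")
```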
![Page 22: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/22.jpg)
B-Tree Example
Keys: 15 36 57 63 76 87 92 100
![Page 23: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/23.jpg)
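B-Tree Example
[Figure: a B-tree over the keys 15, 36, 57, 63, 76, 87, 92, 100. Root: 63. Internal nodes: 36 and 84 91. Leaves, chained left to right by next-leaf pointers: 15 | 36 57 | 63 76 | 87 | 92 100. The last leaf's next-leaf pointer is null.]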
![Page 24: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/24.jpg)
Meaning of Internal Node
Keys: 84 91
Pointers: key < 84 | 84 ≤ key < 91 | 91 ≤ key
![Page 25: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/25.jpg)
![Page 26: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/26.jpg)
Meaning of Leaf Nodes
Keys: 63 76
Pointers: pointer to record 63 | pointer to record 76 | next leaf
![Page 27: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/27.jpg)
Equality Predicates
[Figure: the example B-tree (root 63; internal nodes 36 and 84 91; leaves 15 | 36 57 | 63 76 | 87 | 92 100), with a lookup for key = 87.]
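The lookup for key = 87 can be traced in code. The sketch below hard-codes the example tree (root 63, internal nodes 36 and 84 91, five leaves) and follows the internal-node rule "key < k1 goes left, ki ≤ key < ki+1 goes to pointer i+1". The classes `Leaf` and `Internal` are illustrative, and the leaves hold only keys, not record pointers.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None            # next-leaf pointer

class Internal:
    def __init__(self, keys, children):
        self.keys = keys            # m keys
        self.children = children    # m + 1 child pointers

# The example tree from the slides.
leaves = [Leaf([15]), Leaf([36, 57]), Leaf([63, 76]), Leaf([87]), Leaf([92, 100])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([63], [Internal([36], leaves[0:2]),
                       Internal([84, 91], leaves[2:5])])

def lookup(key):
    """Return (found, path), where path lists the keys of each visited node."""
    node, path = root, []
    while isinstance(node, Internal):
        path.append(node.keys)
        # bisect_right picks the child whose range contains key.
        node = node.children[bisect_right(node.keys, key)]
    path.append(node.keys)
    return key in node.keys, path

print(lookup(87))
```

Each node visited corresponds to one block read, so this lookup touches three nodes: the root, the internal node 84 91, and the leaf holding 87.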
![Page 28: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/28.jpg)
![Page 29: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/29.jpg)
![Page 30: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/30.jpg)
![Page 31: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/31.jpg)
Range Predicates
[Figure: the example B-tree (root 63; internal nodes 36 and 84 91; leaves 15 | 36 57 | 63 76 | 87 | 92 100), with a range scan for 57 ≤ key < 95.]
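A range scan can be sketched the same way: descend to the leaf that would contain the lower bound, then follow next-leaf pointers until a key reaches the upper bound. The tree is the slides' example; the class and function names are illustrative, and leaves store keys only.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None            # next-leaf pointer

class Internal:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children

leaves = [Leaf([15]), Leaf([36, 57]), Leaf([63, 76]), Leaf([87]), Leaf([92, 100])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([63], [Internal([36], leaves[0:2]),
                       Internal([84, 91], leaves[2:5])])

def range_scan(low, high):
    """Return all keys with low <= key < high, in sorted order."""
    # Step 1: descend to the leaf where `low` would be stored.
    node = root
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, low)]
    # Step 2: walk the leaf chain until a key reaches `high`.
    result = []
    while node is not None:
        for k in node.keys:
            if k >= high:
                return result
            if k >= low:
                result.append(k)
        node = node.next
    return result

print(range_scan(57, 95))
```

Only the initial descent uses the internal nodes; after that the scan reads leaves sequentially along the chain.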
![Page 32: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/32.jpg)
![Page 33: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/33.jpg)
![Page 34: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/34.jpg)
![Page 35: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/35.jpg)
![Page 36: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/36.jpg)
![Page 37: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/37.jpg)
General B-Trees
Fixed parameter: n
Number of keys: n
Number of pointers: n + 1
![Page 38: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/38.jpg)
B-Tree Example
[Figure: the example B-tree (root 63; internal nodes 36 and 84 91; leaves 15 | 36 57 | 63 76 | 87 | 92 100), labeled n = 2.]
![Page 39: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/39.jpg)
General B-Trees
Fixed parameter: n
Number of keys: n
Number of pointers: n + 1
All leaves at same depth
All (key, record pointer) in leaves
![Page 40: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/40.jpg)
![Page 41: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/41.jpg)
General B-Trees: Space related constraints
Use at least
  Root: 2 pointers
  Internal: ⌈(n+1)/2⌉ pointers
  Leaf: ⌊(n+1)/2⌋ pointers to data
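The minimum-occupancy rules can be written as a small helper. This sketch assumes the internal-node bound is a ceiling and the leaf bound a floor, as in the GMUW textbook's B-tree definition (the rounding brackets do not survive in the transcript); the function name `min_pointers` is hypothetical.

```python
def min_pointers(n):
    """Minimum number of pointers in use per node type, for n key slots."""
    return {
        "root": 2,
        "internal": (n + 2) // 2,   # ceil((n + 1) / 2) child pointers
        "leaf": (n + 1) // 2,       # floor((n + 1) / 2) pointers to data
    }

for n in (2, 3):
    print(n, min_pointers(n))
```

For n = 2, a leaf may hold a single key while an internal node still needs two children, which is consistent with the one-key leaves 15 and 87 in the example tree.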
![Page 42: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/42.jpg)
n = 3
Internal node: Max 5 15 21 | Min 15
Leaf node: Max 31 42 56 | Min 31 42
![Page 43: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/43.jpg)
Leaf Nodes
n key slots
(n+1) pointer slots
![Page 44: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/44.jpg)
Leaf Nodes
n key slots: k1 k2 k3 … km, remaining slots unused
(n+1) pointer slots: record of k1 | record of k2 | … | record of km | … | next leaf
![Page 45: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/45.jpg)
Leaf Nodes
n key slots: k1 k2 k3 … km, remaining slots unused
(n+1) pointer slots: record of k1 | record of k2 | … | record of km | … | next leaf
m ≥ ⌊(n+1)/2⌋
![Page 46: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/46.jpg)
Internal Nodes
n key slots
(n+1) pointer slots
![Page 47: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/47.jpg)
Internal Nodes
n key slots: k1 k2 k3 … km, remaining slots unused
(n+1) pointer slots: key < k1 | k1 ≤ key < k2 | … | km ≤ key
![Page 48: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/48.jpg)
Internal Nodes
n key slots: k1 k2 k3 … km, remaining slots unused
(n+1) pointer slots: key < k1 | k1 ≤ key < k2 | … | km ≤ key
(m+1) ≥ ⌈(n+1)/2⌉
![Page 49: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/49.jpg)
Root Node
n key slots: k1 k2 k3 … km, remaining slots unused
(n+1) pointer slots: key < k1 | k1 ≤ key < k2 | … | km ≤ key
(m+1) ≥ 2
![Page 50: 1 Data-intensive Computing Systems Operators for Data Access Shivnath Babu](https://reader035.vdocument.in/reader035/viewer/2022062720/56649f145503460f94c2916b/html5/thumbnails/50.jpg)
Limits
Why the specific limits ⌈(n+1)/2⌉ and ⌊(n+1)/2⌋?
Why different limits for leaf and internal nodes?
Can we reduce each limit?
Can we increase each limit?
What are the implications?