cs 44321 cs4432: database systems ii. cs 44322 index definition in sql create index name on rel...
Post on 22-Dec-2015
217 views
TRANSCRIPT
CS 4432 1
CS4432: Database Systems II
CS 4432 2
Index definition in SQL
• Create index name on rel (attr)
(Check online for index definitions in SQL)
• Drop INDEX name
CS 4432 3
ATTRIBUTE LIST MULTIKEY INDEX
e.g., CREATE INDEX foo ON R(A,B,C)
Note
CS 4432 4
Motivation: Find records where DEPT = “Toy” AND SAL >
50k
Multi-key Index
CS 4432 5
Strategy I:
• Use one index, say Dept.• Get all Dept = “Toy” records
and check their salary
I1
CS 4432 6
• Use 2 Indexes; Manipulate Pointers
Toy Sal>
50k
Strategy II:
CS 4432 7
• Multiple Key Index
One idea:
Strategy III:
I1
I2
I3
CS 4432 8
Example
ExampleRecord
DeptIndex
SalaryIndex
Name=JoeDEPT=SalesSAL=15k
ArtSalesToy
10k15k17k21k
12k15k15k19k
CS 4432 9
For which queries is this index good?
Find RECs Dept = “Sales” SAL=20kFind RECs Dept = “Sales” SAL > 20kFind RECs Dept = “Sales”Find RECs SAL = 20k
CS 4432 10
Many alternate methods for indexing
CS 4432 11
key h(key)
Hashing
<key>
.
.
Buckets(typically 1disk block)
CS 4432 12
One example hash function
• Key = ‘x1 x2 … xn’ n-byte character string
• Have b buckets
• Hash function :– h: add (x1 + x2 + ….. Xn) modulo b
CS 4432 13
This may not be best function … Read Knuth Vol. 3 if you really
need to select a good function.
Good hash Expected number of
function: keys/bucket is thesame for all
buckets
CS 4432 14
Within a bucket:
• Do we keep keys sorted?
• Yes, if CPU time critical & Inserts/Deletes not too frequent
CS 4432 15
Next: example to illustrateinserts, overflows,
deletes
h(K)
CS 4432 16
EXAMPLE 2 records/bucket
INSERT:h(a) = 1h(b) = 2h(c) = 1h(d) = 0
0
1
2
3
d
ac
b
h(e) = 1
e
CS 4432 17
0
1
2
3
a
bc
e
d
EXAMPLE: deletion
Delete:ef
fg
maybe move“g” up
cd
CS 4432 18
Rule of thumb:• Try to keep space utilization
between 50% and 80% Utilization = # keys used
total # keys that fit
• If < 50%, wasting space• If > 80%, overflows significant
depends on how good hash function is & on # keys/bucket
CS 4432 19
How do we cope with growth?
• Overflows and reorganizations• Dynamic hashing
• Extensible hashing• Others …
CS 4432 20
Extensible hashing : idea 1
(a) Use i of b bits output by hash function
b h(K)
use i grows over time….
Note: enables future doubling of space !
00110101
CS 4432 21
(b) Hash to directory of pointers to buckets (instead of buckets directly)
h(K)[i ] to bucket
Note : Double space by doubling the directory !
.
.
.
.
Extensible hashing : idea 2
CS 4432 22
Example: h(k) is 4 bits; 2 keys/bucket
i = 1
1
1
0001
1001
1100
Insert 1010
11100
1010
New directory
200
01
10
11
i =
2
2
01
CS 4432 23
10001
21001
1010
21100
Insert:
0111
0000
00
01
10
11
2i =
Example continued
0111
0000
0111
0001
2
2
CS 4432 24
00
01
10
11
2i =
21001
1010
21100
20111
20000
0001
Insert:
1001
Example continued
1001
1001
1010
000
001
010
011
100
101
110
111
3i =
3
3
CS 4432 25
Extensible hashing: deletion
• Merge blocks and cut directory if possible
(Reverse insert procedure)
CS 4432 26
Extensible hashing
If directory fits into main memory, then access cost is 1 IO, otherwise 2 IOs Can handle growing files
- with less wasted space- with no full reorganizations
Summary
+
Indirection(Not bad if directory in
memory)
Directory doubles in size(Now it fits, now it does not)
-
-
+
CS 4432 27
Use what when :
• Indexing : Tree-Structures vs Hashing
CS 4432 28
• Hashing good for probes given keye.g., SELECT …
FROM RWHERE R.A = 5
Indexing vs Hashing
CS 4432 29
• INDEXING (Including B Trees) good for
Range Searches:e.g., SELECT
FROM RWHERE R.A > 5
Indexing vs Hashing
CS 4432 30
Reading Chapter 14
• Read – 14.3.1 and 14.3.2
CS 4432 31
The BIG picture….
• Chapters 11 & 12: Storage, records, blocks...
• Chapter 13 & 14: Access Mechanisms - Indexes
- B trees - Hashing - Multi key
• Chapter 15 & 16: Query ProcessingNEXT