Hashing
Motivating Applications
• Large collection of datasets
• Datasets are dynamic (insert, delete)
• Goal: efficient searching/insertion/deletion
• Hashing is ONLY applicable for exact-match searching
Direct Address Tables
• If the key domain is U, create an array T of size |U|
• For each key k, store the object in T[k]
• Supports insertion/deletion/searching in O(1)
Direct Address Tables
Alg.: DIRECT-ADDRESS-SEARCH(T, k)
return T[k]
Alg.: DIRECT-ADDRESS-INSERT(T, x)
T[key[x]] ← x
Alg.: DIRECT-ADDRESS-DELETE(T, x)
T[key[x]] ← NIL
• Running time for these operations: O(1)
Drawbacks
>> If U is large, e.g., the domain of integers, then T is large (sometimes infeasible)
>> Limited to integer keys and does not support duplicates
Solution: use hash tables
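The three direct-address operations can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the class name, the `universe_size` parameter, and the use of separate `key`/`value` arguments (instead of an object x carrying key[x]) are assumptions made for the example.

```python
class DirectAddressTable:
    """Direct-address table over the key universe {0, 1, ..., universe_size - 1}."""

    def __init__(self, universe_size):
        # One slot per possible key; None plays the role of NIL.
        self.slots = [None] * universe_size

    def search(self, key):
        # DIRECT-ADDRESS-SEARCH: a single array access.
        return self.slots[key]

    def insert(self, key, value):
        # DIRECT-ADDRESS-INSERT: overwrites whatever was stored under this key.
        self.slots[key] = value

    def delete(self, key):
        # DIRECT-ADDRESS-DELETE: reset the slot to NIL.
        self.slots[key] = None


table = DirectAddressTable(universe_size=10)
table.insert(3, "object with key 3")
found = table.search(3)
table.delete(3)
missing = table.search(3)
```

Each operation touches exactly one slot, which is where the O(1) bound comes from. The array is allocated for the whole universe, which is the space drawback noted above.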
Direct Access Tables: Example
[Figure: Example 1 and Example 2. U is the domain; K is the set of keys actually stored.]
Hashing
• A data structure that maps values from a certain domain or range to another domain or range
[Figure: a hash function maps a domain of string values to a domain of integer values (e.g., 3, 15, 20, 55).]
Hashing
[Figure: a hash function maps student IDs in the domain [950,000 … 960,000] to the range [0 … 10,000].]
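The slide does not say which function performs the student-ID mapping. One simple possibility, shown here purely as an assumption, is to subtract the smallest ID; the function and constant names are hypothetical.

```python
ID_MIN = 950_000   # smallest student ID in the slide's domain
ID_MAX = 960_000   # largest student ID in the slide's domain


def student_id_to_index(student_id):
    """Map an ID in [950,000, 960,000] to an index in [0, 10,000] (assumed offset mapping)."""
    if not ID_MIN <= student_id <= ID_MAX:
        raise ValueError("student ID outside the expected domain")
    return student_id - ID_MIN


first_index = student_id_to_index(950_000)   # lowest ID maps to index 0
last_index = student_id_to_index(960_000)    # highest ID maps to index 10,000
```

This particular mapping is collision-free because the target range is as large as the domain. General hash functions map a large domain to a much smaller range and cannot guarantee that.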
Hash Tables
• When |K| is much smaller than |U|, a hash table requires much less space than a direct-address table
– Can reduce storage requirements to Θ(|K|)
– Can still get O(1) search time, but in the average case, not the worst case
Hash Tables: Main Idea
• Use a hash function h to compute the slot for each key k
• Store the element in slot h(k)
• Maintain a hash table T[0…m-1] of size m
• A hash function h transforms a key into an index in a hash table T[0…m-1]:
h : U → {0, 1, . . . , m - 1}
• We say that k hashes to slot h(k)
Hash Tables: Main Idea
[Figure: keys k1 … k5 from K (actual keys), a subset of U (universe of keys), are mapped by h to slots 0 … m - 1 of a hash table of size m; h(k2) = h(k5).]
>> m is much smaller than |U| (m << |U|)
>> m can be even smaller than |K|
Example
• Back to the example of 100 students, each with a 9-digit SSN
• All we need is a hash table of size 100
What About Collisions
[Figure: keys k1 … k5 from K (actual keys), a subset of U (universe of keys), are mapped by h to slots 0 … m - 1; k2 and k5 land in the same slot, h(k2) = h(k5). Collisions!]
• A collision means that two or more keys hash to the same slot
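A collision is easy to reproduce with a table of size 100 and 9-digit keys. The sketch below assumes the hash function h(k) = k mod m, which the slides have not introduced at this point, and uses two made-up key values.

```python
def h(key, m):
    """Division-method hash (an assumption for illustration): key mod m."""
    return key % m


m = 100                 # table size from the 100-student example
ssn_a = 123_456_789     # made-up 9-digit key
ssn_b = 987_654_389     # made-up 9-digit key with the same last two digits

slot_a = h(ssn_a, m)
slot_b = h(ssn_b, m)
collision = slot_a == slot_b   # True: both keys map to slot 89
```

With this choice of h, any two keys that share their last two digits collide, so the table needs a strategy for storing several elements per slot.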
Handling Collisions
• Many ways to handle it
– Chaining
– Open addressing
  • Linear probing
  • Quadratic probing
  • Double hashing
Chaining: Main Idea
• Put all elements that hash to the same slot into a linked list (chain)
• Slot j contains a pointer to the head of the list of all elements that hash to j
Chaining - Discussion
• Choosing the size of the hash table
– Small enough not to waste space
– Large enough such that lists remain short
– Typically 10%-20% of the total number of elements
• How should we keep the lists: ordered or not?
– Usually each list is an unsorted linked list
Insertion in Hash Tables
Alg.: CHAINED-HASH-INSERT(T, x)
insert x at the head of list T[h(key[x])]
• Worst-case running time is O(1)
• May or may not allow duplicates, depending on the application (checking for duplicates requires searching the list first)
Deletion in Hash Tables
Alg.: CHAINED-HASH-DELETE(T, x)
delete x from the list T[h(key[x])]
• Need to find the element to be deleted
• Worst-case running time:
– Deletion depends on searching the corresponding list, so it is proportional to the length of that list
Searching in Hash Tables
Alg.: CHAINED-HASH-SEARCH(T, k)
search for an element with key k in list T[h(k)]
• Running time is proportional to the length of the list of elements in slot h(k)
What are the worst case and the average case?
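The three chained operations can be put together in a short Python sketch. Several choices here are assumptions rather than anything stated in the slides: integer keys, the hash function h(k) = k mod m, storing (key, value) pairs, and using `collections.deque` in place of a hand-written linked list so that insertion at the head stays O(1).

```python
from collections import deque


class ChainedHashTable:
    """Hash table with chaining; integer keys and h(k) = k mod m are assumptions."""

    def __init__(self, m):
        self.m = m
        # One chain per slot; deque.appendleft gives O(1) insertion at the head.
        self.slots = [deque() for _ in range(m)]

    def _h(self, key):
        return key % self.m

    def insert(self, key, value):
        # CHAINED-HASH-INSERT: no duplicate check, so this is O(1).
        self.slots[self._h(key)].appendleft((key, value))

    def search(self, key):
        # CHAINED-HASH-SEARCH: scan only the chain in slot h(key).
        for stored_key, value in self.slots[self._h(key)]:
            if stored_key == key:
                return value
        return None

    def delete(self, key):
        # CHAINED-HASH-DELETE: find the element first, then remove it.
        chain = self.slots[self._h(key)]
        for item in chain:
            if item[0] == key:
                chain.remove(item)
                return True
        return False


table = ChainedHashTable(m=7)
table.insert(10, "a")
table.insert(17, "b")            # 10 and 17 both hash to slot 3
found = table.search(17)
deleted = table.delete(10)
after_delete = table.search(10)
```

Search and delete both walk a single chain, so their cost depends on how long that chain is.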
Analysis of Hashing with Chaining: Worst Case
• All keys will go to only one chain
• Chain size is O(n)
• Searching is O(n) + time to apply h(k)
[Figure: table T with slots 0 … m - 1; a single slot holds one long chain containing all the keys.]
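The worst case can be provoked deliberately. In this sketch, which assumes h(k) = k mod m and a made-up key set, every key is a multiple of the table size, so all of them land in slot 0.

```python
m = 10
keys = [k * m for k in range(1, 51)]   # 50 made-up keys, all multiples of m

chains = [[] for _ in range(m)]
for key in keys:
    chains[key % m].append(key)        # every key hashes to slot 0

chain_lengths = [len(chain) for chain in chains]
```

Here one chain holds all n = 50 keys while the other nine slots stay empty, so a search degenerates into a linear scan.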
Analysis of Hashing with Chaining: Average Case
• With a good hash function and a uniform distribution of keys
– Any given element is equally likely to hash into any of the m slots
• All chains will have similar sizes
• Assume n is the total number of keys and m is the hash table size
– Average chain size is n/m
[Figure: table T with slots 0 … m - 1; each slot points to a short chain of similar length.]
Average search time O(1 + n/m): the common case
Analysis of Hashing with Chaining: Average Case
• If m (# of slots) is at least proportional to n (# of keys):
– n = O(m)
– n/m = O(1)
Searching takes constant time on average
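A small experiment illustrates the average case. It assumes h(k) = k mod m and randomly drawn made-up keys; the seed, the key range, and the values of n and m are arbitrary choices for the example.

```python
import random

random.seed(0)                          # fixed seed so the run is repeatable
n, m = 2000, 1000                       # n keys, m slots
keys = random.sample(range(10**9), n)   # n distinct made-up keys

chain_lengths = [0] * m
for key in keys:
    chain_lengths[key % m] += 1         # count how many keys land in each slot

load_factor = n / m                         # average chain size n/m
average_length = sum(chain_lengths) / m     # always equals n/m
longest_chain = max(chain_lengths)          # far below n when keys spread evenly
```

The average chain length equals n/m by construction. What the uniformity assumption buys is that no single chain grows anywhere near n.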
Hash Functions
• A hash function transforms a key k into a table address (0 … m-1)
• What makes a good hash function?
(1) Easy to compute
(2) Approximates a random function: for every input, every output is equally likely (simple uniform hashing)
(3) Reduces the number of collisions
Hash Functions
• Goal: Map a key k into one of the m slots in the hash table
• Make the table size (m) a prime number
– Avoids even and power-of-2 numbers
• Common function
h(k) = F(k) mod m
– F(k): some function or operation on k (usually generates an integer)
– The output of "mod" is a number in [0 … m-1]
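The advice to pick a prime table size can be checked with a small experiment. The sketch assumes F(k) = k and a made-up key set whose members all share the factor 16; the helper name `slots_used` is hypothetical.

```python
def slots_used(keys, m):
    """Number of distinct slots hit by h(k) = k mod m."""
    return len({key % m for key in keys})


keys = [16 * i for i in range(1, 101)]     # made-up keys sharing the factor 16

used_power_of_two = slots_used(keys, 16)   # every key lands in slot 0
used_prime = slots_used(keys, 17)          # keys spread over all 17 slots
```

When m is a power of 2, k mod m keeps only the low-order bits of k, so keys that agree in those bits collide. A prime m avoids this specific pattern, although it does not rule out collisions in general.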
Examples of Hash Functions
Collection of images
F(k): sum of the pixel colors
h(k) = F(k) mod m
Collection of strings
F(k): sum of the ASCII values
h(k) = F(k) mod m
Collection of numbers
F(k): just return k
h(k) = F(k) mod m
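The three examples translate directly into Python. The function names, the table size m = 13, and the representation of an image as rows of integer color values are assumptions made for this sketch.

```python
def hash_string(text, m):
    """F(k) = sum of the character codes (ASCII values for ASCII text)."""
    return sum(ord(ch) for ch in text) % m


def hash_number(number, m):
    """F(k) = k itself."""
    return number % m


def hash_image(pixels, m):
    """F(k) = sum of all pixel values; pixels is a list of rows of integers."""
    return sum(sum(row) for row in pixels) % m


m = 13                                              # a prime table size
string_slot = hash_string("abc", m)                 # (97 + 98 + 99) mod 13 = 8
number_slot = hash_number(42, m)                    # 42 mod 13 = 3
image_slot = hash_image([[0, 255], [128, 64]], m)   # 447 mod 13 = 5
```

Summing character codes is easy to compute but ignores order, so anagrams such as "abc" and "cab" always collide. That is acceptable for illustration but is one reason real string hashes mix in the character positions.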