CSC401 – Analysis of Algorithms, Lecture Notes 5: Heaps and Hash Tables

Post on 19-Dec-2015


Page 1: CSC401 – Analysis of Algorithms, Lecture Notes 5

Heaps and Hash Tables

Objectives:
– Introduce heaps, heap-sorting, and heap-construction
– Analyze the performance of operations on heap structures
– Introduce hash tables and discuss hash functions
– Present collision handling strategies of hash tables and analyze the performance of hash table operations

Page 2: What is a heap

A heap is a binary tree storing keys at its internal nodes and satisfying the following properties:
– Heap-Order: for every internal node $v$ other than the root, $key(v) \ge key(parent(v))$
– Complete Binary Tree: let $h$ be the height of the heap; for $i = 0, \dots, h-1$, there are $2^i$ nodes of depth $i$, and at depth $h-1$ the internal nodes are to the left of the external nodes

The last node of a heap is the rightmost internal node of depth $h-1$.

[Figure: heap with keys 2, 5, 6, 9, 7 in level order; the last node is the one storing 7.]

Page 3: Height of a Heap

Theorem: A heap storing $n$ keys has height $O(\log n)$.

Proof (we apply the complete binary tree property):
– Let $h$ be the height of a heap storing $n$ keys
– Since there are $2^i$ keys at depth $i = 0, \dots, h-2$ and at least one key at depth $h-1$, we have $n \ge 1 + 2 + 4 + \dots + 2^{h-2} + 1$
– Thus, $n \ge 2^{h-1}$, i.e., $h \le \log n + 1$

[Figure: keys per depth. Depth 0: 1 key; depth 1: 2 keys; depth $h-2$: $2^{h-2}$ keys; depth $h-1$: at least 1 key.]

Page 4: Heaps and Priority Queues

We can use a heap to implement a priority queue.

We store a (key, element) item at each internal node.

We keep track of the position of the last node.

For simplicity, we show only the keys in the pictures.

[Figure: heap with root (2, Sue), children (5, Pat) and (6, Mark), and (9, Jeff) and (7, Anna) as the children of (5, Pat).]

Page 5: Insertion into a Heap

Method insertItem of the priority queue ADT corresponds to the insertion of a key $k$ to the heap.

The insertion algorithm consists of three steps:
– Find the insertion node $z$ (the new last node)
– Store $k$ at $z$ and expand $z$ into an internal node
– Restore the heap-order property (discussed next)

[Figure: the heap 2, 5, 6, 9, 7 (level order) with the insertion node $z$ immediately after the last node; then the same heap with key 1 stored at $z$.]

Page 6: Upheap

After the insertion of a new key $k$, the heap-order property may be violated.

Algorithm upheap restores the heap-order property by swapping $k$ along an upward path from the insertion node.

Upheap terminates when the key $k$ reaches the root or a node whose parent has a key smaller than or equal to $k$.

Since a heap has height $O(\log n)$, upheap runs in $O(\log n)$ time.

[Figure: after the first swap, key 1 sits directly below the root 2 and key 6 is at $z$ (level order 2, 5, 1, 9, 7, 6); after the second swap, 1 is the root (level order 1, 5, 2, 9, 7, 6).]
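The insertion and upheap steps can be sketched in Python as follows. This is only an illustration under an assumption: the heap is kept in a list with rank 0 unused, so the parent of rank i is rank i // 2. The function name heap_insert is hypothetical.

```python
def heap_insert(heap, key):
    """Insert key into a min-heap stored in a list whose rank 0 is unused."""
    heap.append(key)          # store k at the insertion node z (new last node)
    i = len(heap) - 1
    # upheap: swap k upward while its parent holds a larger key
    while i > 1 and heap[i // 2] > heap[i]:
        heap[i // 2], heap[i] = heap[i], heap[i // 2]
        i //= 2


heap = [None, 2, 5, 6, 9, 7]  # the heap from the pictures, in level order
heap_insert(heap, 1)
print(heap)                   # [None, 1, 5, 2, 9, 7, 6]
```

The loop performs at most one swap per level, which matches the $O(\log n)$ bound stated for upheap.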

Page 7: Removal from a Heap

Method removeMin of the priority queue ADT corresponds to the removal of the root key from the heap.

The removal algorithm consists of three steps:
– Replace the root key with the key of the last node $w$
– Compress $w$ and its children into a leaf
– Restore the heap-order property (discussed next)

[Figure: the heap 2, 5, 6, 9, 7 (level order) with last node $w$ storing 7; after the replacement, the heap 7, 5, 6, 9 with $w$ compressed into a leaf.]

Page 8: Downheap

After replacing the root key with the key $k$ of the last node, the heap-order property may be violated.

Algorithm downheap restores the heap-order property by swapping key $k$ along a downward path from the root.

Downheap terminates when key $k$ reaches a leaf or a node whose children have keys greater than or equal to $k$.

Since a heap has height $O(\log n)$, downheap runs in $O(\log n)$ time.

[Figure: the heap 7, 5, 6, 9 (level order) before downheap; after swapping 7 with its smaller child 5, the heap 5, 7, 6, 9.]
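A minimal Python sketch of removeMin with downheap, again assuming a list with rank 0 unused (children of rank i at ranks 2i and 2i + 1). The function name heap_remove_min is hypothetical, and the swap always goes to the smaller child.

```python
def heap_remove_min(heap):
    """Remove and return the minimum key of a min-heap list (rank 0 unused)."""
    if len(heap) <= 1:
        raise IndexError("heap is empty")
    smallest = heap[1]
    last = heap.pop()                 # key of the last node w
    if len(heap) > 1:
        heap[1] = last                # replace the root key with it
        i, n = 1, len(heap) - 1
        while 2 * i <= n:             # downheap while i has a child
            child = 2 * i
            if child + 1 <= n and heap[child + 1] < heap[child]:
                child += 1            # pick the smaller child
            if heap[i] <= heap[child]:
                break                 # heap-order restored
            heap[i], heap[child] = heap[child], heap[i]
            i = child
    return smallest


heap = [None, 2, 5, 6, 9, 7]
print(heap_remove_min(heap))          # 2
print(heap)                           # [None, 5, 7, 6, 9]
```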

Page 9: Updating the Last Node

The insertion node can be found by traversing a path of $O(\log n)$ nodes:
– Go up until a left child or the root is reached
– If a left child is reached, go to the right child
– Go down left until a leaf is reached

A similar algorithm updates the last node after a removal.

Page 10: Heap-Sort

Consider a priority queue with $n$ items implemented by means of a heap:
– the space used is $O(n)$
– methods insertItem and removeMin take $O(\log n)$ time
– methods size, isEmpty, minKey, and minElement take $O(1)$ time

Using a heap-based priority queue, we can sort a sequence of $n$ elements in $O(n \log n)$ time.

The resulting algorithm is called heap-sort.

For large $n$, heap-sort is much faster than quadratic sorting algorithms, such as insertion-sort and selection-sort.
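As a rough illustration of sorting with a heap-based priority queue, the sketch below uses Python's standard heapq module as a stand-in for the heap: heappush plays the role of insertItem and heappop the role of removeMin. The function name heap_sort is an assumption of this example.

```python
import heapq


def heap_sort(seq):
    """Sort seq with a heap-based priority queue in O(n log n) time."""
    pq = []
    for x in seq:
        heapq.heappush(pq, x)         # n insertions, O(log n) each
    # n removals of the minimum, O(log n) each
    return [heapq.heappop(pq) for _ in range(len(pq))]


print(heap_sort([7, 2, 9, 5, 6]))     # [2, 5, 6, 7, 9]
```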

Page 11: Vector-based Heap Implementation

We can represent a heap with $n$ keys by means of a vector of length $n + 1$.

For the node at rank $i$:
– the left child is at rank $2i$
– the right child is at rank $2i + 1$

Links between nodes are not explicitly stored.

The leaves are not represented.

The cell at rank 0 is not used.

Operation insertItem corresponds to inserting at rank $n + 1$.

Operation removeMin corresponds to removing at rank $n$.

This representation yields in-place heap-sort.

[Figure: the heap 2, 5, 6, 9, 7 stored in a vector with ranks 0 to 5: rank 0 unused, ranks 1 to 5 holding 2, 5, 6, 9, 7.]
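The rank arithmetic can be written down directly. In the sketch below the helper names are hypothetical, and the parent formula (rank i // 2) is not stated on the slide but follows from the two child formulas.

```python
def left(i):
    return 2 * i          # rank of the left child


def right(i):
    return 2 * i + 1      # rank of the right child


def parent(i):
    return i // 2         # inverse of left() and right()


def is_heap(vec):
    """Check heap-order on a vector whose rank 0 is unused."""
    n = len(vec) - 1
    return all(vec[parent(i)] <= vec[i] for i in range(2, n + 1))


vec = [None, 2, 5, 6, 9, 7]            # the vector from the picture
print(vec[left(2)], vec[right(2)])     # 9 7  (children of the node storing 5)
print(is_heap(vec))                    # True
```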

Page 12: Merging Two Heaps

We are given two heaps and a key $k$.

We create a new heap with the root node storing $k$ and with the two heaps as subtrees.

We perform downheap to restore the heap-order property.

[Figure: two heaps, one with root 3 and children 8 and 5, the other with root 2 and children 4 and 6, and the key 7; 7 becomes the root above 3 and 2; after downheap the root is 2, with 3 (children 8 and 5) and 4 (children 7 and 6) below it.]

Page 13: Bottom-up Heap Construction

We can construct a heap storing $n$ given keys using a bottom-up construction with $\log n$ phases.

In phase $i$, pairs of heaps with $2^i - 1$ keys are merged into heaps with $2^{i+1} - 1$ keys.

[Figure: two heaps with $2^i - 1$ keys each, merged under a new root into one heap with $2^{i+1} - 1$ keys.]

Page 14: Example

[Figure: the keys 15, 16, 12, 4, 7, 6, 20, 23 as single-key heaps; then new roots 25, 5, 11, 27 placed over the pairs 15 and 16, 12 and 4, 7 and 6, 20 and 23.]

Page 15: Example (contd.)

[Figure: before downheap, roots 25, 5, 11, 27 over the pairs 15 and 16, 12 and 4, 9 and 6, 20 and 23; after downheap, roots 15, 4, 6, 20 over the pairs 25 and 16, 12 and 5, 9 and 11, 23 and 27.]

Page 16: Example (contd.)

[Figure: new roots 7 and 8 are added; 7 sits over the heaps rooted at 15 and 4, and 8 over the heaps rooted at 6 and 20. After downheap, the two heaps are rooted at 4 (subtrees rooted at 15 and 5, with 7 now a child of 5) and at 6 (subtrees rooted at 8 and 20).]

Page 17: Example (end)

[Figure: the final key 10 becomes the root over the heaps rooted at 4 and 6. After downheap, the root is 4 with children 5 and 6; 5 has children 15 and 7, and 10 ends as a child of 7, next to 12.]

Page 18: Analysis

We visualize the worst-case time of a downheap with a proxy path that goes first right and then repeatedly goes left until the bottom of the heap (this path may differ from the actual downheap path).

Since each node is traversed by at most two proxy paths, the total number of nodes of the proxy paths is $O(n)$.

Thus, bottom-up heap construction runs in $O(n)$ time.

Bottom-up heap construction is faster than $n$ successive insertions, which take $O(n \log n)$ time in the worst case, and speeds up the first phase of heap-sort.
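The sketch below shows the usual array form of bottom-up construction, which is an assumption of this example rather than the recursive merging drawn in the slides: every internal rank, from the last one back to the root, is treated as the new root of a merge and fixed with downheap. The names downheap and build_heap are hypothetical, and rank 0 of the list is unused.

```python
def downheap(heap, i, n):
    """Sink heap[i] within ranks 1..n until heap-order holds below it."""
    while 2 * i <= n:
        child = 2 * i
        if child + 1 <= n and heap[child + 1] < heap[child]:
            child += 1                    # pick the smaller child
        if heap[i] <= heap[child]:
            break
        heap[i], heap[child] = heap[child], heap[i]
        i = child


def build_heap(keys):
    """Build a min-heap (rank 0 unused) from keys in O(n) total time."""
    heap = [None] + list(keys)
    n = len(keys)
    for i in range(n // 2, 0, -1):        # last internal rank down to the root
        downheap(heap, i, n)
    return heap


keys = [10, 7, 8, 25, 5, 11, 27, 16, 15, 4, 12, 6, 9, 23, 20]
heap = build_heap(keys)
print(heap[1])                            # 4, the minimum key
```

Each call to downheap costs at most the height of its subtree, and summing these heights over all nodes gives the $O(n)$ bound argued above.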

Page 19: Hash Functions and Hash Tables

A hash function $h$ maps keys of a given type to integers in a fixed interval $[0, N-1]$.
– Example: $h(x) = x \bmod N$ is a hash function for integer keys
– The integer $h(x)$ is called the hash value of key $x$

A hash table for a given key type consists of:
– A hash function $h$
– An array (called table) of size $N$

Example:
– We design a hash table for a dictionary storing items (SSN, Name), where SSN (social security number) is a nine-digit positive integer
– Our hash table uses an array of size $N = 10{,}000$ and the hash function $h(x) =$ last four digits of $x$

[Figure: table with cells 0 to 9999; cell 1 holds 025-612-0001, cell 2 holds 981-101-0002, cell 4 holds 451-229-0004, and cell 9998 holds 200-751-9998.]
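A small Python sketch of this example, using the four numbers from the picture. Storing the hyphenated string itself and omitting the Name field are simplifications made for this illustration.

```python
N = 10_000


def h(ssn):
    """Hash value: the last four digits of the number, i.e. x mod N."""
    return int(ssn.replace("-", "")) % N


table = [None] * N
for ssn in ["451-229-0004", "981-101-0002", "200-751-9998", "025-612-0001"]:
    table[h(ssn)] = ssn       # the Name field is omitted in this sketch

print(h("451-229-0004"))      # 4
print(table[9998])            # 200-751-9998
```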

Page 20: Hash Functions

A hash function is usually specified as the composition of two functions:
– Hash code map: $h_1$: keys $\to$ integers
– Compression map: $h_2$: integers $\to [0, N-1]$

The hash code map is applied first, and the compression map is applied next on the result, i.e.,

$h(x) = h_2(h_1(x))$

The goal of the hash function is to “disperse” the keys in an apparently random way.

Page 21: Hash Code Maps

Memory address:
– We reinterpret the memory address of the key object as an integer (default hash code of all Java objects)
– Good in general, except for numeric and string keys

Integer cast:
– We reinterpret the bits of the key as an integer
– Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., byte, short, int and float in Java)

Component sum:
– We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows)
– Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double in Java)

Page 22: Hash Code Maps (cont.)

Polynomial accumulation:
– We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits) $a_0 a_1 \dots a_{n-1}$
– We evaluate the polynomial $p(z) = a_0 + a_1 z + a_2 z^2 + \dots + a_{n-1} z^{n-1}$ at a fixed value $z$, ignoring overflows
– Especially suitable for strings (e.g., the choice $z = 33$ gives at most 6 collisions on a set of 50,000 English words)

Polynomial $p(z)$ can be evaluated in $O(n)$ time using Horner’s rule:
– The following polynomials are successively computed, each from the previous one in $O(1)$ time:
  $p_0(z) = a_{n-1}$
  $p_i(z) = a_{n-i-1} + z\,p_{i-1}(z)$ for $i = 1, 2, \dots, n-1$
– We have $p(z) = p_{n-1}(z)$
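Horner's rule for a string key might look like the sketch below. Two assumptions are made for the illustration: each character code (via ord) is taken as one component $a_i$, and "ignoring overflows" is imitated by masking to 32 bits. The function name poly_hash is hypothetical.

```python
def poly_hash(s, z=33, bits=32):
    """Polynomial hash code a_0 + a_1 z + ... + a_{n-1} z^(n-1), by Horner's rule."""
    mask = (1 << bits) - 1
    p = 0
    for ch in reversed(s):            # start from a_{n-1}, as in p_0(z)
        p = (ord(ch) + z * p) & mask  # p_i = a_{n-i-1} + z * p_{i-1}
    return p


print(poly_hash("ab"))                # 97 + 98 * 33 = 3331
```

The loop runs once per component, so the evaluation takes $O(n)$ time.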

Page 23: Compression Maps

Division:
– $h_2(y) = y \bmod N$
– The size $N$ of the hash table is usually chosen to be a prime
– The reason has to do with number theory and is beyond the scope of this course

Multiply, Add and Divide (MAD):
– $h_2(y) = (ay + b) \bmod N$
– $a$ and $b$ are nonnegative integers such that $a \bmod N \ne 0$
– Otherwise, every integer would map to the same value $b \bmod N$
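Both compression maps are one-liners. In the sketch below the function names and the sample values $N = 13$, $a = 3$, $b = 5$ are chosen only for illustration.

```python
def division(y, N):
    """Division compression map: y mod N."""
    return y % N


def mad(y, a, b, N):
    """Multiply, Add and Divide compression map: (a*y + b) mod N."""
    if a % N == 0:
        raise ValueError("a mod N must be nonzero")
    return (a * y + b) % N


print(division(35, 13))               # 9
print(mad(10, 3, 5, 13))              # (30 + 5) mod 13 = 9

# If a mod N were 0, every y would collapse to b mod N:
print({(13 * y + 5) % 13 for y in range(100)})   # {5}
```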

Page 24: Collision Handling

Collisions occur when different elements are mapped to the same cell.

Chaining: let each cell in the table point to a linked list of elements that map there.

Chaining is simple, but requires additional memory outside the table.

[Figure: table cells 0 to 4; cell 1 points to a list holding 025-612-0001, and cell 4 points to a list holding 451-229-0004 and 981-101-0004.]
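A minimal sketch of chaining, assuming integer keys, the division map k mod N, and Python lists standing in for the linked lists. The class name ChainedHashTable and its method names are hypothetical.

```python
class ChainedHashTable:
    """Hash table with chaining; each cell holds a list of (key, element) items."""

    def __init__(self, N=10_000):
        self.N = N
        self.table = [[] for _ in range(N)]

    def _h(self, k):
        return k % self.N                 # assumes integer keys

    def insert(self, k, o):
        # duplicates of k are not checked in this sketch
        self.table[self._h(k)].append((k, o))

    def find(self, k):
        for key, o in self.table[self._h(k)]:
            if key == k:
                return o
        return None                       # stands in for NO_SUCH_KEY


t = ChainedHashTable()
t.insert(4512290004, "first")
t.insert(9811010004, "second")            # collides with the key above in cell 4
print(len(t.table[4]))                    # 2
print(t.find(9811010004))                 # second
```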

Page 25: Linear Probing

Open addressing: the colliding item is placed in a different cell of the table.

Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell.

Each table cell inspected is referred to as a “probe”.

Colliding items lump together, so future collisions cause longer sequences of probes.

Example:
– $h(x) = x \bmod 13$
– Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order

[Figure: resulting table with cells 0 to 12: 41 in cell 2, 18 in cell 5, 44 in cell 6, 59 in cell 7, 32 in cell 8, 22 in cell 9, 31 in cell 10, 73 in cell 11; the other cells are empty.]
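The example can be replayed with a few lines of Python. This is a sketch for insertions only (no deletions); the function name linear_probe_insert is hypothetical, and None marks an empty cell.

```python
def linear_probe_insert(table, k):
    """Insert integer key k with linear probing; return the cell used."""
    N = len(table)
    i = k % N                         # h(x) = x mod N
    for _ in range(N):                # at most N probes
        if table[i] is None:
            table[i] = k
            return i
        i = (i + 1) % N               # next cell, circularly
    raise RuntimeError("table is full")


table = [None] * 13
for k in [18, 41, 22, 44, 59, 32, 31, 73]:
    linear_probe_insert(table, k)
print(table)
# [None, None, 41, None, None, 18, 44, 59, 32, 22, 31, 73, None]
```

Keys 44, 32, 31 and 73 all land in the run of cells starting at 5, which is the lumping effect described above.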

Page 26: Search with Linear Probing

Consider a hash table $A$ that uses linear probing.

findElement($k$):
– We start at cell $h(k)$
– We probe consecutive locations until one of the following occurs:
  – An item with key $k$ is found, or
  – An empty cell is found, or
  – $N$ cells have been unsuccessfully probed

Algorithm findElement(k)
    i ← h(k)
    p ← 0
    repeat
        c ← A[i]
        if c = ∅
            return NO_SUCH_KEY
        else if c.key() = k
            return c.element()
        else
            i ← (i + 1) mod N
            p ← p + 1
    until p = N
    return NO_SUCH_KEY

Page 27: Updates with Linear Probing

To handle insertions and deletions, we introduce a special object, called AVAILABLE, which replaces deleted elements.

removeElement($k$):
– We search for an item with key $k$
– If such an item $(k, o)$ is found, we replace it with the special item AVAILABLE and we return element $o$
– Else, we return NO_SUCH_KEY

insertItem($k, o$):
– We throw an exception if the table is full
– We start at cell $h(k)$
– We probe consecutive cells until one of the following occurs:
  – A cell $i$ is found that is either empty or stores AVAILABLE, or
  – $N$ cells have been unsuccessfully probed
– We store item $(k, o)$ in cell $i$
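Search, insertion and removal with the AVAILABLE marker can be sketched together as below. Assumptions of this illustration: integer keys with h(k) = k mod N, None both for an empty cell and in place of NO_SUCH_KEY, and hypothetical class and method names.

```python
AVAILABLE = object()                      # sentinel that replaces deleted items


class ProbingTable:
    def __init__(self, N=13):
        self.N = N
        self.cells = [None] * N

    def _find_index(self, k):
        i = k % self.N
        for _ in range(self.N):           # at most N probes
            c = self.cells[i]
            if c is None:                 # empty cell: k is not in the table
                return None
            if c is not AVAILABLE and c[0] == k:
                return i
            i = (i + 1) % self.N          # skip AVAILABLE and other keys
        return None

    def find(self, k):
        i = self._find_index(k)
        return None if i is None else self.cells[i][1]

    def insert(self, k, o):
        i = k % self.N
        for _ in range(self.N):
            c = self.cells[i]
            if c is None or c is AVAILABLE:
                self.cells[i] = (k, o)
                return
            i = (i + 1) % self.N
        raise RuntimeError("table is full")

    def remove(self, k):
        i = self._find_index(k)
        if i is None:
            return None
        o = self.cells[i][1]
        self.cells[i] = AVAILABLE         # keep probe sequences intact
        return o


t = ProbingTable()
t.insert(18, "a")                         # cell 5
t.insert(44, "b")                         # collides at 5, stored in cell 6
print(t.remove(18))                       # a
print(t.find(44))                         # b, found by probing past AVAILABLE
```

Marking the cell AVAILABLE instead of emptying it is what keeps the search for 44 from stopping early at cell 5.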

Page 28: Double Hashing

Double hashing uses a secondary hash function $d(k)$ and handles collisions by placing an item in the first available cell of the series $(i + j\,d(k)) \bmod N$ for $j = 0, 1, \dots, N-1$, where $i = h(k)$.

The secondary hash function $d(k)$ cannot have zero values.

The table size $N$ must be a prime to allow probing of all the cells.

Common choice of compression map for the secondary hash function:

$d_2(k) = q - k \bmod q$

where $q < N$ and $q$ is a prime.

The possible values for $d_2(k)$ are $1, 2, \dots, q$.

Example:
– $N = 13$
– $h(k) = k \bmod 13$
– $d(k) = 7 - k \bmod 7$
– Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order

k     h(k)   d(k)   Probes
18    5      3      5
41    2      1      2
22    9      6      9
44    5      5      5, 10
59    7      4      7
32    6      3      6
31    5      4      5, 9, 0
73    8      4      8

[Figure: resulting table with cells 0 to 12: 31 in cell 0, 41 in cell 2, 18 in cell 5, 32 in cell 6, 59 in cell 7, 73 in cell 8, 22 in cell 9, 44 in cell 10; the other cells are empty.]
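The probe sequences in the example can be reproduced with the short sketch below. The function name double_hash_insert is hypothetical; it uses $h(k) = k \bmod N$ and $d(k) = q - k \bmod q$ as in the example, with None marking an empty cell.

```python
def double_hash_insert(table, k, q=7):
    """Insert integer key k with double hashing; return the cells probed."""
    N = len(table)
    i = k % N                         # primary hash h(k)
    d = q - k % q                     # secondary hash d(k), never zero
    probes = []
    for j in range(N):
        cell = (i + j * d) % N
        probes.append(cell)
        if table[cell] is None:
            table[cell] = k
            return probes
    raise RuntimeError("no free cell found")


table = [None] * 13
for k in [18, 41, 22, 44, 59, 32, 31, 73]:
    print(k, double_hash_insert(table, k))
# 44 probes cells [5, 10]; 31 probes cells [5, 9, 0]; all others need one probe
print(table)
# [31, None, 41, None, None, 18, 32, 59, 73, 22, 44, None, None]
```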

Page 29: Performance of Hashing

In the worst case, searches, insertions and removals on a hash table take $O(n)$ time.

The worst case occurs when all the keys inserted into the dictionary collide.

The load factor $\alpha = n/N$ affects the performance of a hash table.

Assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is

$1 / (1 - \alpha)$

The expected running time of all the dictionary ADT operations in a hash table is $O(1)$.

In practice, hashing is very fast provided the load factor is not close to 100%.

Applications of hash tables:
– small databases
– compilers
– browser caches
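To get a feel for the formula $1/(1 - \alpha)$, the sketch below evaluates it for a few load factors. The function name expected_probes and the sample table sizes are chosen for illustration.

```python
def expected_probes(n, N):
    """Expected probes for an insertion with open addressing: 1 / (1 - alpha)."""
    alpha = n / N                     # load factor
    if alpha >= 1:
        raise ValueError("the table is full")
    return 1 / (1 - alpha)


print(expected_probes(1, 2))          # 2.0 at a load factor of 50%
print(expected_probes(3, 4))          # 4.0 at a load factor of 75%
print(round(expected_probes(99, 100)))  # about 100 at a load factor of 99%
```

The count grows without bound as the load factor approaches 100%, which is why the table should not be allowed to fill up.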

Page 30: Universal Hashing

A family of hash functions is universal if, for any $0 \le j, k \le M-1$ with $j \ne k$, and for $h$ chosen at random from the family, $\Pr(h(j) = h(k)) \le 1/N$.

Choose $p$ as a prime between $M$ and $2M$.

Randomly select $0 < a < p$ and $0 \le b < p$, and define $h(k) = ((ak + b) \bmod p) \bmod N$.

Theorem: The set of all functions $h$, as defined here, is universal.
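Drawing one function from this family could look like the sketch below. The names is_prime and make_universal_hash are hypothetical, the prime is found by trial division (adequate only for small $M$), and the search assumes $M \ge 2$ so that a prime exists in the range.

```python
import math
import random


def is_prime(m):
    """Trial-division primality test; fine for small m."""
    if m < 2:
        return False
    return all(m % d for d in range(2, math.isqrt(m) + 1))


def make_universal_hash(M, N, rng=random):
    """Pick h(k) = ((a*k + b) mod p) mod N at random for keys in [0, M-1]."""
    p = next(c for c in range(M, 2 * M) if is_prime(c))   # prime between M and 2M
    a = rng.randrange(1, p)           # 0 < a < p
    b = rng.randrange(0, p)           # 0 <= b < p
    return lambda k: ((a * k + b) % p) % N


h = make_universal_hash(M=100, N=10)
print(all(0 <= h(k) < 10 for k in range(100)))   # True
```

A fresh call picks new values of $a$ and $b$, so an adversary who does not know them cannot choose keys that are guaranteed to collide.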

Page 31: Proof of Universality (Part 1)

Let $f(k) = (ak + b) \bmod p$.

Let $g(k) = k \bmod N$.

So $h(k) = g(f(k))$.

f causes no collisions:
– Let $f(k) = f(j)$.
– Suppose $k < j$. Then

$aj + b - \lfloor (aj + b)/p \rfloor \, p = ak + b - \lfloor (ak + b)/p \rfloor \, p$

$a(j - k) = \left( \lfloor (aj + b)/p \rfloor - \lfloor (ak + b)/p \rfloor \right) p$

So $a(j - k)$ is a multiple of $p$.

But both $a$ and $j - k$ are less than $p$, and $p$ is prime, so $p$ can divide the product only if one of the factors is 0.

So $a(j - k) = 0$, i.e., $j = k$ (contradiction).

Thus, f causes no collisions.

Page 32: Proof of Universality (Part 2)

If f causes no collisions, only g can make h cause collisions.

Fix a number $x$. Of the $p$ integers $y = f(k)$, different from $x$, the number such that $g(y) = g(x)$ is at most

$\lceil p/N \rceil - 1$

Since there are $p$ choices for $x$, the number of h’s that will cause a collision between $j$ and $k$ is at most

$p \left( \lceil p/N \rceil - 1 \right) \le \dfrac{p(p-1)}{N}$

There are $p(p-1)$ functions h. So the probability of collision is at most

$\dfrac{p(p-1)/N}{p(p-1)} = \dfrac{1}{N}$

Therefore, the set of possible h functions is universal.