
Tirgul 7

Heaps & Priority Queues: Reminder, Examples

Hash Tables: Reminder, Examples

The heap property & heapify

A (max-)heap is a complete binary tree in which each node is larger than (or equal to) both of its children.

The largest element is the root of the tree.

Notice that this does not mean, however, that the two children of the root are the second- and third-largest elements in the heap.

The heap property & heapify

Heapify assumes that both subtrees of the root are heaps, but the root itself may be smaller than one of its children:

Heapify(Node x):
    largest = max{ left(x), right(x) }
    if (largest > x)
        exchange(largest, x)
        Heapify(x)

The heap property & heapify

Example: starting from a tree whose root holds 3, Heapify replaces 3 with 16, then with 12, and then with 8.

[Figure: the example heap; the value 3 sifts down past 16, then 12, then 8.]
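As a concrete illustration (an addition to the slides, not part of them), here is a minimal array-based Heapify sketch in Java; the 0-based layout (children of index i at 2i+1 and 2i+2) and the method name are my own choices:

// Minimal sketch of array-based max-Heapify (0-based indexing assumed).
// Intended to sit inside some heap class together with the later sketches.
static void maxHeapify(int[] a, int i, int heapSize) {
    int left = 2 * i + 1;                               // left child index
    int right = 2 * i + 2;                              // right child index
    int largest = i;
    if (left < heapSize && a[left] > a[largest])  largest = left;
    if (right < heapSize && a[right] > a[largest]) largest = right;
    if (largest != i) {                                 // root smaller than a child
        int tmp = a[i]; a[i] = a[largest]; a[largest] = tmp;
        maxHeapify(a, largest, heapSize);               // continue sifting down
    }
}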

Priority Queue

A priority queue serves elements by priority: the highest-priority element is extracted first. It supports the operations insert, maximum, and extract-maximum.

It is simple to implement using a heap:

maximum: just return the root.

extract-maximum: save the root, move the last leaf to the root, perform Heapify, and return the saved root.

insert: add the node as a leaf and then move it up (swapping with its parent) as long as it is larger than its parent.
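A minimal sketch of these two operations on the same array-based max-heap (again an addition to the slides; the heapSize field and the maxHeapify method from the sketch above are assumed to live in the same class):

// Number of elements currently stored in the heap array.
static int heapSize = 0;

// insert: add as the last leaf, then sift up while larger than the parent.
static void insert(int[] a, int key) {
    a[heapSize++] = key;
    int i = heapSize - 1;
    while (i > 0 && a[(i - 1) / 2] < a[i]) {
        int p = (i - 1) / 2;
        int tmp = a[i]; a[i] = a[p]; a[p] = tmp;
        i = p;
    }
}

// extract-max: save the root, move the last leaf to the root, restore the heap.
static int extractMax(int[] a) {
    int max = a[0];
    a[0] = a[--heapSize];
    maxHeapify(a, 0, heapSize);
    return max;
}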

Priority Queue

Example of the insert operation: when inserting 11, first add it as a leaf, then exchange it with 5, and then with 9.

[Figure: the example heap (root 20) after inserting 11 and sifting it up past 5 and 9.]

Performance

Priority queue using a heap: maximum takes O(1), extract-max takes O(log n), insert takes O(log n).

Priority queue using an ordered list: maximum takes O(1), extract-max takes O(1), insert takes O(n).

Insert versus Build Heap

What is the difference between the insert operation of a priority queue and Build-Heap?

Build-Heap(A):
    heap-size[A] = length[A]
    for i = ⌊length[A]/2⌋ downto 1 do
        Heapify(A, i)

Insert-Build-Heap(A):
    while there is some x not yet in the heap
        Heap-Insert(x)

Insert versus Build Heap

Build-Heap and Insert-Build-Heap sometimes create different heaps.

For example, consider the sequence 1,2,3,4: Build-Heap will create 4,2,3,1, while Insert-Build-Heap will create 4,3,2,1.

Running time: Build-Heap = O(n); Insert-Build-Heap = O(n log n) in the worst case.

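As a small sanity check (an addition, not from the slides), both constructions can be run on the sequence 1,2,3,4, reusing the maxHeapify and insert sketches above in the same class; note the sketches use 0-based indexing, unlike the 1-based pseudocode:

// Bottom-up Build-Heap: heapify every non-leaf node, from the last one up to the root.
static void buildHeap(int[] a) {
    for (int i = a.length / 2 - 1; i >= 0; i--)
        maxHeapify(a, i, a.length);
}

public static void main(String[] args) {
    int[] a = {1, 2, 3, 4};
    buildHeap(a);
    System.out.println(java.util.Arrays.toString(a));  // prints [4, 2, 3, 1]

    int[] b = new int[4];
    heapSize = 0;
    for (int x : new int[]{1, 2, 3, 4}) insert(b, x);   // Insert-Build-Heap
    System.out.println(java.util.Arrays.toString(b));  // prints [4, 3, 2, 1]
}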

Questions

How to implement a queue/stack with a priority queue? What are the differences in running times?

How to implement an increase-key operation, which increases the value of some node?

How to delete a given node from the heap in O(log n)?

Dictionary / Map ADT

This ADT stores pairs of the form <key, data> (in Java: "value" instead of "data").

Supports the operations insert(key, data), find(key), and delete(key).

One way to implement it is by search trees. The standard operations take O(log n) this way.

Can we achieve better performance for the standard operations?
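For concreteness (an addition to the slides), Java's built-in java.util.TreeMap is a balanced search tree, so it gives exactly this O(log n) behaviour for the standard operations; the String/Integer key and value types below are arbitrary:

import java.util.TreeMap;

public class MapDemo {
    public static void main(String[] args) {
        // TreeMap is a red-black tree: put/get/remove all take O(log n).
        TreeMap<String, Integer> map = new TreeMap<>();
        map.put("alice", 42);                   // insert(key, data)
        System.out.println(map.get("alice"));   // find(key) -> 42 (null if absent)
        map.remove("alice");                    // delete(key)
    }
}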

Direct addressing

Say the keys come from a (large) set U. One way to have fast operations is to allocate an array of size |U|. This is of course a waste of memory, since most entries in the array will remain empty.

For example, a Hebrew dictionary (e.g. Even-Shushan) holds fewer than 100,000 words, whereas the number of possible combinations of Hebrew letters is much bigger (22^5 for 5-letter words alone). It is impractical to allocate all this space that will never be used.

Hash table

In a hash table, we allocate an array of size m, which is much smaller than |U|.

We use a hash function h() to determine the entry of each key.

When we want to insert/delete/find a key k we look for it in the entry h(k) in the array.

Notice that this way, it is not necessary to have an order among the elements of the table.
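A minimal sketch of such a table (an addition to the slides; chaining as the collision strategy and integer-only keys are my own simplifications):

import java.util.LinkedList;

// Minimal chained hash table for integer keys, using h(k) = k mod m.
class IntHashTable {
    private final LinkedList<Integer>[] table;

    @SuppressWarnings("unchecked")
    IntHashTable(int m) {
        table = new LinkedList[m];
        for (int i = 0; i < m; i++) table[i] = new LinkedList<>();
    }

    private int h(int k) { return Math.floorMod(k, table.length); }   // division method

    void insert(int k)  { table[h(k)].addFirst(k); }                  // O(1)
    boolean find(int k) { return table[h(k)].contains(k); }           // O(length of chain)
    void delete(int k)  { table[h(k)].remove(Integer.valueOf(k)); }   // O(length of chain)
}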

Example of usage

Take for example the login names of the students in the dast course. There are about 300 login names.

If we use a binary search tree, a tree of height about 8 will be created.

We can store the login names in a hash table of 100 entries using the hash function h(k) = k mod 100.

The next slide presents a possible spread of login names over the entries.

Example of usage

[Figure: histogram of the load per entry. The x-axis is the number of items in an entry (1-8); the y-axis is the number of entries with that load (0-25).]

Example of usage

Notice that even if the spread of login names were perfect, there would still be 3 names in each entry (300 names over 100 entries).

(This spread is considered good.)

Searching a login name with a binary search tree: since half of the elements are in the leaves, it takes about 8 operations to find them.

Searching a login name with the hash table: the worst case is still 8, but the search for most of the elements (about 80% of them) takes about half of that.

Example of usage

Two questions arise from the example:

What is the best hash function to use, and how do we find it?

What happens when several keys map to the same entry? (Clearly this might happen, since U is much larger than m.)

How to choose hash functions

The crucial point: the hash function should "spread" the keys of U equally among all the entries of the array.

Unfortunately, since we don’t know in advance the keys that we’ll get from U, this can be done only approximately.

Remark: the hash functions usually assume that the keys are numbers. We’ll discuss next class what to do if the keys are not numbers.

The division method

If we have a table of size m, we can use the hash function h(k) = k mod m.

Some values of m are better than others:

Good m's are prime numbers not too close to a power of 2.

A bad choice is m = 2^p: the function then uses only the p least significant bits of k. Likewise, if the keys are decimal numbers, m = 10^p is a bad choice.

A bad-choice example: if m = 100 and the key is the decimal number 3674, then h(3674) = 3674 mod 10^2 = 74.

The division method

A good-choice example: if we have |U| = 2000, and we want each search to take (on average) 3 operations, we can choose a prime number close to 2000/3, m = 701.

[Table: entry 0 holds keys 701, 1402; entry 1 holds keys 702, 1403; ...; entry 700 holds key 700, ...; each of the 701 entries receives at most 3 of the 2000 keys.]
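A quick check (an addition, not from the slides) that m = 701 indeed spreads 2000 keys with at most 3 keys per entry:

public class DivisionMethodDemo {
    public static void main(String[] args) {
        int m = 701;                          // prime table size
        int[] load = new int[m];
        for (int k = 1; k <= 2000; k++)       // the 2000 possible keys
            load[k % m]++;
        int max = 0;
        for (int c : load) max = Math.max(max, c);
        System.out.println(max);              // prints 3: at most 3 keys per entry
    }
}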

More About Hash Tables

Next class.