
Linked Lists and Hash Tables

Jon Woods

CS 5321

Stacks and Queues

Stack – LIFO

Queue - FIFO


Stack

Push – O(1)

Pop – O(1)

Stack Empty - O(1)

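A minimal Python sketch of the stack operations above, assuming a Python list as the backing array with the top of the stack at the end of the list; the class and method names are illustrative. Note that `list.append` is amortized O(1), not worst-case O(1) as with a fixed-size array.

```python
class Stack:
    """LIFO stack backed by a Python list; the top is the last element."""

    def __init__(self):
        self._items = []

    def push(self, x):
        # Append to the end of the list: amortized O(1).
        self._items.append(x)

    def pop(self):
        # Remove and return the last element: O(1). Raise on underflow.
        if self.empty():
            raise IndexError("stack underflow")
        return self._items.pop()

    def empty(self):
        # Stack Empty: O(1).
        return len(self._items) == 0


s = Stack()
for x in (1, 2, 3):
    s.push(x)
popped = [s.pop(), s.pop(), s.pop()]  # last in, first out: [3, 2, 1]
```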

Queue

Enqueue – O(1)

Dequeue - O(1)

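A minimal sketch of Enqueue and Dequeue on a fixed-size circular array, assuming a caller-chosen capacity; one array slot is deliberately kept empty so that a full queue can be told apart from an empty one. The names are illustrative.

```python
class Queue:
    """FIFO queue in a circular array holding at most `capacity` items."""

    def __init__(self, capacity):
        self._a = [None] * (capacity + 1)  # one spare slot distinguishes full from empty
        self._head = 0  # index of the front element
        self._tail = 0  # index where the next element will be stored

    def empty(self):
        return self._head == self._tail

    def enqueue(self, x):
        # Store at the tail and advance it, wrapping around: O(1).
        nxt = (self._tail + 1) % len(self._a)
        if nxt == self._head:
            raise OverflowError("queue overflow")
        self._a[self._tail] = x
        self._tail = nxt

    def dequeue(self):
        # Take from the head and advance it, wrapping around: O(1).
        if self.empty():
            raise IndexError("queue underflow")
        x = self._a[self._head]
        self._head = (self._head + 1) % len(self._a)
        return x


q = Queue(3)
for x in (1, 2, 3):
    q.enqueue(x)
served = [q.dequeue(), q.dequeue(), q.dequeue()]  # first in, first out: [1, 2, 3]
```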

Linked List

Singly Linked List

Doubly Linked List

Circularly Linked List

[Figure: a singly linked, a doubly linked, and a circularly linked list, each holding the keys 1, 2, 3 and reached through Head]

List Search

List-Search(L, k)
    x = head[L]
    while x != NIL and key[x] != k
        do x = next[x]
    return x

List Search = Θ(n) in the worst case

List Insert

List-Insert(L, x)
    next[x] = head[L]
    if head[L] != NIL
        then prev[head[L]] = x
    head[L] = x
    prev[x] = NIL

List Insert = O(1)

List Delete

List-Delete(L, x)
    if prev[x] != NIL
        then next[prev[x]] = next[x]
        else head[L] = next[x]
    if next[x] != NIL
        then prev[next[x]] = prev[x]

List Delete = O(1) or Θ(n)? Why?
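The three procedures above translate directly into Python. This sketch assumes `None` stands in for NIL and uses illustrative class names. Deleting takes O(1) time once we hold a pointer to the node; deleting a key we only know by value requires a Θ(n) search first.

```python
class Node:
    """List element with a key and prev/next pointers (None plays the role of NIL)."""

    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None


class LinkedList:
    """Unsorted doubly linked list mirroring List-Search, List-Insert, List-Delete."""

    def __init__(self):
        self.head = None

    def search(self, k):
        # Walk from the head until key k is found or the list ends: Θ(n) worst case.
        x = self.head
        while x is not None and x.key != k:
            x = x.next
        return x

    def insert(self, x):
        # Splice node x onto the front of the list: O(1).
        x.next = self.head
        if self.head is not None:
            self.head.prev = x
        self.head = x
        x.prev = None

    def delete(self, x):
        # Unlink node x: O(1) given the node itself.
        if x.prev is not None:
            x.prev.next = x.next
        else:
            self.head = x.next
        if x.next is not None:
            x.next.prev = x.prev

    def keys(self):
        # Collect keys from head to tail (helper for the usage below).
        out, x = [], self.head
        while x is not None:
            out.append(x.key)
            x = x.next
        return out


L = LinkedList()
for k in (1, 2, 3):
    L.insert(Node(k))  # each insert goes to the front, giving 3, 2, 1
L.delete(L.search(2))  # the search is Θ(n); the unlink itself is O(1)
print(L.keys())  # [3, 1]
```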

Single and Multi Arrays

Multi-array implementations represent a linked list with three arrays: key, next, and prev.

Single-array implementations represent a linked list with one array, storing each object's key, next, and prev as consecutive values within it.

Multi Array Implementation

index   1    2    3    4    5    6    7    8
next         3    NIL       2         5
key          4    1         16        9
prev         5    2         7         NIL

L = 7

The variable L represents the index of the head, 7 in this case.
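A sketch of the multi-array representation in Python. The array contents are read off this slide's figure (head at index 7); using `None` for NIL, leaving index 0 unused to keep the figure's 1-based indices, and naming the next array `next_` (to avoid shadowing Python's built-in `next`) are our assumptions.

```python
NIL = None

# Index 0 is unused so that indices match the figure's 1-based positions.
#         0    1    2    3    4    5    6    7    8
next_ = [NIL, NIL, 3, NIL, NIL, 2, NIL, 5, NIL]
key = [NIL, NIL, 4, 1, NIL, 16, NIL, 9, NIL]
prev = [NIL, NIL, 5, 2, NIL, 7, NIL, NIL, NIL]
L = 7  # index of the head object


def traverse(head):
    """Follow next pointers from head, collecting keys in list order."""
    out = []
    x = head
    while x is not NIL:
        out.append(key[x])
        x = next_[x]
    return out


print(traverse(L))  # [9, 16, 4, 1]
```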

Single Array Implementation

start   key   next   prev
4       4     7      13
7       1     NIL    4
13      16    4      19
19      9     13     NIL

L = 19 (an object starting at index i stores key at i, next at i + 1, and prev at i + 2)

What are the advantages of using this implementation? Disadvantages?
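The same list in the single-array layout, sketched in Python under the assumption that each object occupies three consecutive slots in the order key, next, prev, so that a pointer is simply the index of an object's first slot. The contents follow the figure (head at index 19); the array length of 25 and the unused index 0 are arbitrary choices.

```python
NIL = None
KEY, NEXT, PREV = 0, 1, 2  # offsets of the fields within one object

A = [NIL] * 25  # index 0 unused so indices match the figure
A[4:7] = [4, 7, 13]      # object at 4:  key 4,  next 7,   prev 13
A[7:10] = [1, NIL, 4]    # object at 7:  key 1,  next NIL, prev 4
A[13:16] = [16, 4, 19]   # object at 13: key 16, next 4,   prev 19
A[19:22] = [9, 13, NIL]  # object at 19: key 9,  next 13,  prev NIL
L = 19  # index of the head object


def traverse(A, head):
    """Follow the NEXT field of each object, collecting KEY fields."""
    out = []
    x = head
    while x is not NIL:
        out.append(A[x + KEY])
        x = A[x + NEXT]
    return out


print(traverse(A, L))  # [9, 16, 4, 1]
```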

Allocate and Free

Allocate-Object()
    if free == NIL
        then error "out of space"
        else x = free
             free = next[x]
             return x

Free-Object(x)
    next[x] = free
    free = x

These functions both take O(1) time.
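A Python sketch of Allocate-Object and Free-Object, assuming the multi-array representation with the free list threaded through the next array. The `ObjectPool` class, its `capacity` parameter, and the initial free-list order 1, 2, ..., capacity are hypothetical choices for illustration.

```python
NIL = None


class ObjectPool:
    """Objects 1..capacity stored in parallel key/next/prev arrays, with a
    singly linked free list threaded through next (hypothetical helper)."""

    def __init__(self, capacity):
        self.key = [NIL] * (capacity + 1)  # index 0 unused
        self.next = [NIL] * (capacity + 1)
        self.prev = [NIL] * (capacity + 1)
        for i in range(1, capacity):  # free list: 1 -> 2 -> ... -> capacity
            self.next[i] = i + 1
        self.free = 1 if capacity > 0 else NIL

    def allocate_object(self):
        # Pop the head of the free list: O(1).
        if self.free is NIL:
            raise MemoryError("out of space")
        x = self.free
        self.free = self.next[x]
        return x

    def free_object(self, x):
        # Push x onto the front of the free list: O(1).
        self.next[x] = self.free
        self.free = x


pool = ObjectPool(3)
a = pool.allocate_object()  # 1
b = pool.allocate_object()  # 2
pool.free_object(a)
c = pool.allocate_object()  # 1 again: last freed, first reused
```

Because Free-Object pushes onto the front of the free list, the most recently freed object is the next one handed out.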

Allocate

[Figure: arrays next, key, and prev (indices 1 to 8) before and after allocation. Before: L = 7, free = 4. After Allocate-Object() and List-Insert(L, 4) with key 25: L = 4, free = 8.]

Allocate-Object() returns 4, the head of the free list; we then set key[4] = 25 and call List-Insert(L, 4).

The new head of the free list is 8.

Free

[Figure: the same arrays before and after freeing object 5. Before: L = 4, free = 8. After List-Delete(L, 5) and Free-Object(5): free = 5, with object 8 following it on the free list.]

After calling List-Delete(L, 5), we call Free-Object(5).

Object 5 now becomes the new head of the free list.

Direct Address Table

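[Figure: universe of keys U = {0, 1, ..., 9} containing the actual keys K = {2, 3, 5, 8}. Slots 2, 3, 5, and 8 of table T point to elements holding a key and satellite data; every other slot is NIL.]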

Direct Address Table

DIRECT_ADDRESS_SEARCH(T, k)
    return T[k]

DIRECT_ADDRESS_INSERT(T, x)
    T[key[x]] = x

DIRECT_ADDRESS_DELETE(T, x)
    T[key[x]] = NIL

All three operations take O(1) time.
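A minimal direct-address table in Python, assuming integer keys drawn from {0, 1, ..., size − 1} and `None` in place of NIL; the `Element` class is a hypothetical container for a key plus its satellite data. The usage loads the keys {2, 3, 5, 8} from the figure.

```python
from dataclasses import dataclass


@dataclass
class Element:
    key: int
    data: str  # satellite data


class DirectAddressTable:
    """One slot per key in the universe U = {0, 1, ..., size - 1}; None plays NIL."""

    def __init__(self, size):
        self.T = [None] * size

    def search(self, k):
        return self.T[k]  # O(1)

    def insert(self, x):
        self.T[x.key] = x  # O(1)

    def delete(self, x):
        self.T[x.key] = None  # O(1)


T = DirectAddressTable(10)
for k in (2, 3, 5, 8):
    T.insert(Element(k, f"satellite-{k}"))
T.delete(T.search(3))
```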

Collisions and Chaining

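[Figure: keys k1 through k8 from the actual keys K ⊆ U hash into table T; keys that collide share a chain: (k1, k4), (k2, k5, k7), (k3), (k6, k8).]

$h(k_1) = h(k_4)$, $h(k_2) = h(k_5) = h(k_7)$, $h(k_6) = h(k_8)$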
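A sketch of chaining in Python. Each slot holds a `collections.deque` so that inserting at the front of a chain is O(1); the division method h(k) = k mod m and the class name are assumptions made for illustration. In the usage, the keys 10, 19, and 28 all hash to slot 1 when m = 9, so they share one chain.

```python
from collections import deque


class ChainedHashTable:
    """Hash table with chaining: slot j holds a deque of (key, value) pairs."""

    def __init__(self, m):
        self.m = m
        self.slots = [deque() for _ in range(m)]

    def _h(self, k):
        return k % self.m  # division method, assumed for illustration

    def insert(self, k, value):
        # O(1): put the new pair at the front of its chain (assumes k is not present).
        self.slots[self._h(k)].appendleft((k, value))

    def search(self, k):
        # Time proportional to the length of the chain for slot h(k).
        for key, value in self.slots[self._h(k)]:
            if key == k:
                return value
        return None

    def delete(self, k):
        chain = self.slots[self._h(k)]
        for i, (key, _) in enumerate(chain):
            if key == k:
                del chain[i]  # return immediately, so iteration never resumes
                return True
        return False


table = ChainedHashTable(9)
for k, v in ((10, "a"), (19, "b"), (28, "c")):  # all three keys hash to slot 1
    table.insert(k, v)
table.delete(19)
```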

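Analysis of Chaining

$$
\begin{aligned}
E\left[\frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}X_{ij}\right)\right]
&= \frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}\frac{1}{m}\right) \\
&= 1+\frac{1}{nm}\sum_{i=1}^{n}(n-i) \\
&= 1+\frac{1}{nm}\left(\sum_{i=1}^{n}n-\sum_{i=1}^{n}i\right) \\
&= 1+\frac{1}{nm}\left(n^{2}-\frac{n(n+1)}{2}\right) \\
&= 1+\frac{n-1}{2m} \\
&= 1+\frac{\alpha}{2}-\frac{\alpha}{2n}
\end{aligned}
$$

where $k_i$ is the $i$-th key inserted and $X_{ij} = I\{h(k_i) = h(k_j)\}$. A successful search therefore takes $\Theta(2 + \alpha/2 - \alpha/(2n)) = \Theta(1 + \alpha)$ time, including the time to compute the hash function.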

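During a search for x, we examine 1 more than the number of elements preceding x in its list. Because new elements are inserted at the head, these are the elements inserted after x that hash to the same slot.

Assuming simple uniform hashing, $\Pr\{h(k_i) = h(k_j)\} = 1/m$ for $i \ne j$.

Thus each of the $n - i$ elements inserted after the $i$-th element adds $1/m$ to the expected number of elements examined when searching for it; averaging over all $n$ elements gives the expression above.

If the number of slots is proportional to the number of elements in the table, then $n = O(m)$.

Since $\alpha = n/m = O(m)/m = O(1)$, searching takes constant time on average.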
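The algebra in the derivation can be checked with exact rational arithmetic. This sketch evaluates the averaged sum directly and compares it with the closed form 1 + α/2 − α/(2n) for small n and m; the function names are illustrative.

```python
from fractions import Fraction


def expected_examined(n, m):
    """(1/n) * sum_{i=1}^{n} (1 + (n - i)/m): for each key, 1 plus the expected
    number of later-inserted keys in its chain (each present with probability 1/m)."""
    total = sum(1 + Fraction(n - i, m) for i in range(1, n + 1))
    return total / n


def closed_form(n, m):
    alpha = Fraction(n, m)
    return 1 + alpha / 2 - alpha / (2 * n)


print(expected_examined(10, 5))  # 19/10
```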

Hash Functions

Division: h(k) = k mod m

Multiplication: h(k) = ⌊m (kA mod 1)⌋, where 0 < A < 1

We should choose a power of 2 for m in the multiplication scheme, but NOT for the division scheme. Why?
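Both hash functions in Python, as a sketch. The constant A = (√5 − 1)/2 is Knuth's suggested choice, not something the slide specifies. The fixed-point variant assumes m = 2^p and a word size of w = 32 bits, and reduces the multiplication method to one multiply, a mask, and a shift; with these parameters, k = 123456 and p = 14 give 67.

```python
import math

A = (math.sqrt(5) - 1) / 2  # about 0.618; any constant 0 < A < 1 is allowed


def division_hash(k, m):
    return k % m


def multiplication_hash(k, m, a=A):
    # floor(m * (k*A mod 1)), computed in floating point
    return math.floor(m * ((k * a) % 1.0))


def multiplication_hash_fixed(k, p, w=32, s=2654435769):
    """Integer version for m = 2**p: s = floor(A * 2**w); keep the low w bits
    of k*s and return the top p bits of that w-bit word."""
    r0 = (k * s) & ((1 << w) - 1)
    return r0 >> (w - p)


print(division_hash(100, 13), multiplication_hash_fixed(123456, 14))  # 9 67
```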

Universal Hashing

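Universal hashing chooses the hash function at random, independently of the keys, so its efficiency is probabilistic: no single set of keys reliably triggers the worst case.

This ensures good average-case performance.

With universal hashing, we can achieve Θ(1 + α) expected search time without making assumptions about the distribution of the keys.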

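Universal Hashing

$$E[Y_k] \le \sum_{l \in T,\; l \ne k} \frac{1}{m}$$

If $k \notin T$:

$$n_{h(k)} = Y_k, \qquad |\{l : l \in T \wedge l \ne k\}| = n, \qquad E[n_{h(k)}] = E[Y_k] \le \frac{n}{m} = \alpha$$

If $k \in T$:

$$n_{h(k)} = Y_k + 1, \qquad |\{l : l \in T \wedge l \ne k\}| = n - 1, \qquad E[n_{h(k)}] = E[Y_k] + 1 \le \frac{n-1}{m} + 1 = 1 + \alpha - \frac{1}{m} < 1 + \alpha$$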

Let $Y_k$ be the number of keys other than k that hash to the same slot as k, and let $n_{h(k)}$ be the number of keys in slot h(k).

By universality, any two distinct keys collide with probability at most 1/m.

If the key k is not in the table, then every key in k's slot is a key other than k, and there are n such keys in T. So if k is not in T, we expect to examine at most α keys before concluding that k is absent.

If the key k is in the table, then the keys in k's slot include k itself, and there are n − 1 keys in T other than k. So if k is in T, we expect to examine at most 1 + α keys before finding k.

Designing a Universal Hash Function

We choose a prime number p such that every possible key is in the range 0 to p-1.

We choose a at random from {1, ..., p − 1} and b at random from {0, ..., p − 1}.

h(k) = ((ak + b) mod p) mod m
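A sketch of drawing a function from this family in Python. It assumes p is a prime larger than every key; the helper names are hypothetical. The fixed member with a = 3, b = 42, p = 101, m = 9 is the one used in the perfect-hashing example later in these slides.

```python
import random


def hash_ab(a, b, p, m):
    """Return the family member h_{a,b}(k) = ((a*k + b) mod p) mod m."""
    return lambda k: ((a * k + b) % p) % m


def make_universal_hash(p, m, rng=random):
    """Pick h_{a,b} uniformly: a from {1, ..., p-1}, b from {0, ..., p-1}."""
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return hash_ab(a, b, p, m)


h = make_universal_hash(101, 9, random.Random(0))  # seeded only for reproducibility
print(hash_ab(3, 42, 101, 9)(75))  # 2
```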

Open Addressing

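Open addressing stores every element in the table itself. Instead of following pointers, we compute a probe sequence: the slots to examine, in order, for a given key.

Because no pointers are stored, the memory saved can hold more slots, which may yield fewer collisions and faster retrieval.

Truly uniform hashing requires all m! permutations of the slots to be possible probe sequences, each equally likely.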
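A minimal open-addressing table in Python, sketched with linear probing and h'(k) = k mod m purely as an assumed probe function; any probe sequence can be swapped into `probe()`. Search stops at the first empty slot, so this sketch does not support deletion (that would need a special DELETED marker).

```python
class OpenAddressTable:
    """All keys live in the m-slot array T; None marks an empty slot."""

    def __init__(self, m):
        self.m = m
        self.T = [None] * m

    def probe(self, k, i):
        # i-th slot of k's probe sequence (linear probing, assumed for illustration)
        return (k % self.m + i) % self.m

    def insert(self, k):
        for i in range(self.m):
            j = self.probe(k, i)
            if self.T[j] is None:
                self.T[j] = k
                return j
        raise OverflowError("hash table overflow")

    def search(self, k):
        for i in range(self.m):
            j = self.probe(k, i)
            if self.T[j] is None:
                return None  # an empty slot ends the probe sequence
            if self.T[j] == k:
                return j
        return None


t = OpenAddressTable(7)
s1 = t.insert(3)   # slot 3
s2 = t.insert(10)  # 10 mod 7 = 3 is taken, so slot 4
```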

Linear and Quadratic Probing

Linear probing: h(k, i) = (h'(k) + i) mod m. Suffers from primary clustering.

Only offers m distinct probing sequences.

Quadratic probing: h(k, i) = (h'(k) + c1·i + c2·i²) mod m. Suffers from secondary clustering.

Also offers only m distinct probing sequences.
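The claim that both schemes offer only m probe sequences can be checked directly: each sequence is determined by its starting slot h'(k). This sketch assumes m = 8 and, for quadratic probing, c1 = c2 = 1/2 (computed with integers as i(i + 1)/2), a choice that visits every slot when m is a power of 2.

```python
def linear_probe_sequence(h0, m):
    # h(k, i) = (h'(k) + i) mod m, where h0 = h'(k)
    return tuple((h0 + i) % m for i in range(m))


def quadratic_probe_sequence(h0, m):
    # h(k, i) = (h'(k) + i/2 + i^2/2) mod m, i.e. c1 = c2 = 1/2, in integer form
    return tuple((h0 + i * (i + 1) // 2) % m for i in range(m))


m = 8
# Keys with the same h'(k) share the entire sequence, so there are only m sequences.
linear = {linear_probe_sequence(h0, m) for h0 in range(m)}
quadratic = {quadratic_probe_sequence(h0, m) for h0 in range(m)}
print(len(linear), len(quadratic))  # 8 8
```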

Double Hashing

$h(k, i) = (h_1(k) + i\,h_2(k)) \bmod m$

$h_1(k) = k \bmod 13$

$h_2(k) = 1 + (k \bmod 11)$

$h_1(14) = 1,\quad h_2(14) = 4$

[Figure: table with slots 0 to 12 holding 79 in slot 1, 69 in slot 4, 98 in slot 5, 72 in slot 7, and 50 in slot 11; 14 is inserted into slot 9.]

In double hashing, we calculate two hashes, one for the initial position and one for the offset should that position be full.

In this example, m = 13 and we use the hash functions given above. After inserting 5 values into the table, we try to insert 14.

Slot 1 is full, so we advance by the offset h2(14) = 4 to slot 5. Slot 5 is also full, so we advance by 4 again and store 14 in slot 9.

Double hashing offers Θ(m²) distinct probing sequences, one for each pair (h1(k), h2(k)), provided h2(k) is relatively prime to m.
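The double-hashing example can be replayed in Python. The insertion order 79, 69, 72, 98, 50 is an assumption; it reproduces the table shown (98 collides with 72 at slot 7 and lands in slot 5). Inserting 14 then probes slots 1, 5, and 9.

```python
M = 13


def h1(k):
    return k % 13


def h2(k):
    return 1 + (k % 11)


def double_hash_insert(T, k):
    """Insert k into T (None = empty) using h(k, i) = (h1(k) + i*h2(k)) mod m.
    Returns the slot used and the list of slots probed."""
    m = len(T)
    probes = []
    for i in range(m):
        j = (h1(k) + i * h2(k)) % m
        probes.append(j)
        if T[j] is None:
            T[j] = k
            return j, probes
    raise OverflowError("hash table overflow")


T = [None] * M
for k in (79, 69, 72, 98, 50):  # assumed insertion order
    double_hash_insert(T, k)
slot, probes = double_hash_insert(T, 14)
print(slot, probes)  # 9 [1, 5, 9]
```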

Analysis of Open Addressing

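$$
\begin{aligned}
E[X] &= \sum_{i=1}^{\infty} \Pr\{X \ge i\}
      = \sum_{i=1}^{\infty} \frac{n}{m}\cdot\frac{n-1}{m-1}\cdots\frac{n-i+2}{m-i+2} \\
     &\le \sum_{i=1}^{\infty} \left(\frac{n}{m}\right)^{i-1}
      = \sum_{i=1}^{\infty} \alpha^{i-1}
      = \sum_{i=0}^{\infty} \alpha^{i}
      = \frac{1}{1-\alpha} \text{ probes}
\end{aligned}
$$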

Let X be the number of probes needed to find an empty slot. Then E[X] is the sum over i ≥ 1 of Pr{X ≥ i}, the probability that the first i − 1 probes all hit occupied slots.

Since (n − j)/(m − j) ≤ n/m = α for 0 ≤ j < n, the i-th term is at most α^(i−1), which bounds E[X] by a geometric series.

Thus, when α < 1 and hashing is uniform, we expect at most 1/(1 − α) probes on average.
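Before the bound is applied, the sum can be evaluated exactly for small tables. This sketch computes the sum of Pr{X ≥ i} with Python's `fractions` module, assuming n < m, and checks it against 1/(1 − α) = m/(m − n); the function name is illustrative.

```python
from fractions import Fraction


def expected_probes_unsuccessful(n, m):
    """Sum over i >= 1 of Pr{X >= i} = (n/m)((n-1)/(m-1))...((n-i+2)/(m-i+2)),
    computed exactly; the terms become 0 once i exceeds n + 1."""
    assert 0 <= n < m
    total = Fraction(0)
    term = Fraction(1)  # Pr{X >= 1} = 1: the first probe always happens
    i = 1
    while term != 0:
        total += term
        # Pr{X >= i+1} = Pr{X >= i} * (n - i + 1)/(m - i + 1)
        term *= Fraction(n - i + 1, m - i + 1)
        i += 1
    return total


print(expected_probes_unsuccessful(2, 4))  # 5/3, below the bound 1/(1 - 1/2) = 2
```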

Perfect Hashing

For a static set of keys, two levels of universal hashing let us construct a structure with no collisions and O(1) worst-case search time.

Why is this better than other hash schemes?
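A compact sketch of two-level perfect hashing in Python, using the universal family h_{a,b} at both levels. It assumes distinct nonnegative integer keys below a prime p, uses m = n primary slots, and gives a slot holding n_j keys a secondary table of n_j² slots, redrawing that slot's (a_j, b_j) until its keys do not collide. Unlike the full scheme, it does not redraw the primary function when the total secondary size grows large; the function names are hypothetical.

```python
import random


def h_ab(a, b, p, m, k):
    return ((a * k + b) % p) % m


def build_perfect_table(keys, p, rng):
    """Return ((a, b, m), secondary), where secondary[j] = (m_j, a_j, b_j, S_j)."""
    m = len(keys)
    a, b = rng.randrange(1, p), rng.randrange(0, p)
    buckets = [[] for _ in range(m)]
    for k in keys:
        buckets[h_ab(a, b, p, m, k)].append(k)

    secondary = []
    for bucket in buckets:
        mj = len(bucket) ** 2
        if mj == 0:
            secondary.append((0, 0, 0, []))
            continue
        while True:  # with n_j**2 slots, each attempt succeeds with probability > 1/2
            aj, bj = rng.randrange(1, p), rng.randrange(0, p)
            S = [None] * mj
            for k in bucket:
                s = h_ab(aj, bj, p, mj, k)
                if S[s] is not None:
                    break  # collision: redraw (aj, bj)
                S[s] = k
            else:
                break  # every key placed without a collision
        secondary.append((mj, aj, bj, S))
    return (a, b, m), secondary


def lookup(table, k, p):
    (a, b, m), secondary = table
    mj, aj, bj, S = secondary[h_ab(a, b, p, m, k)]
    return mj > 0 and S[h_ab(aj, bj, p, mj, k)] == k


keys = [10, 22, 37, 40, 52, 60, 70, 72, 75]
table = build_perfect_table(keys, 101, random.Random(1))
```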

Perfect Hashing

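$h(k) = ((ak + b) \bmod p) \bmod m$

$a = 3,\; b = 42,\; p = 101,\; m = 9$

[Figure: two-level table T for the keys {10, 22, 37, 40, 52, 60, 70, 72, 75}. Slot 0: m0 = 1, a0 = 0, b0 = 0, and S0 holds 10. Slot 2: m2 = 9, a2 = 10, b2 = 18, and S2 holds 60, 72, 75. Slot 5: m5 = 1, a5 = 0, b5 = 0, and S5 holds 70. Slot 7: m7 = 16, a7 = 23, b7 = 88, and S7 holds 40, 52, 22, 37. All other slots of T are empty.]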

h(75) = 2, so 75 hashes to slot 2 of table T.

h'(75) = 7, so 75 hashes to slot 7 of secondary hash table S2.
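The figure's parameters can be checked by rebuilding the table in Python. The secondary triples (m_j, a_j, b_j) below are read off the figure; the dictionary layout is our own. Every key lands in a distinct secondary slot, and 75 goes to slot 2 of T and slot 7 of S2.

```python
p = 101
a, b, m = 3, 42, 9  # primary hash parameters from the slide

# Secondary parameters (m_j, a_j, b_j) for the non-empty slots of T, from the figure.
secondary = {0: (1, 0, 0), 2: (9, 10, 18), 5: (1, 0, 0), 7: (16, 23, 88)}
keys = [10, 22, 37, 40, 52, 60, 70, 72, 75]


def h(k):
    return ((a * k + b) % p) % m


def h_secondary(j, k):
    mj, aj, bj = secondary[j]
    return ((aj * k + bj) % p) % mj


# Build both levels, failing loudly if two keys share a secondary slot.
S = {j: [None] * mj for j, (mj, _, _) in secondary.items()}
for k in keys:
    j = h(k)
    s = h_secondary(j, k)
    if S[j][s] is not None:
        raise ValueError(f"collision in S{j} at slot {s}")
    S[j][s] = k

print(h(75), h_secondary(h(75), 75))  # 2 7
```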

This man owns the patent on linked lists

Linked List - Patent No. 10260471

Patent Issued April 11, 2006 to LSI Logic Corporation

“A computerized list is provided with auxiliary pointers for traversing the list in different sequences. One or more auxiliary pointers enable a fast, sequential traversal of the list with a minimum of computational time. Such lists may be used in any application where lists may be reordered for various purposes.”

Abhi Talwalkar, CEO LSI Logic
