ITCS 6114: Skip Lists and Hashing
David Luebke

TRANSCRIPT

Page 1:

David Luebke 1 04/10/23

ITCS 6114

Skip Lists

Hashing

Page 2:

Red-Black Trees

● Red-black trees do what they do very well
● What do you think is the worst thing about red-black trees?
● A: coding them up

Page 3:

Skip Lists

● A relatively recent data structure
■ “A probabilistic alternative to balanced trees”
■ A randomized structure with the benefits of red-black trees:

○ O(lg n) expected time for Search, Insert

○ O(1) time for Min, Max, Succ, Pred

■ Much easier to code than r-b trees

■ Fast!

Page 4:

Linked Lists

● Think about a linked list as a structure for dynamic sets. What is the running time of:
■ Min() and Max()?
■ Successor()?
■ Delete()?
○ How can we make this O(1)?
■ Predecessor()?
■ Search()?
■ Insert()?

[Slide callout: So these all take O(1) time in a linked list. Can you think of a way to do these in O(1) time in a red-black tree?]

[Slide callout: Goal: make these O(lg n) time in a linked-list setting]
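To illustrate the O(1) answer the callout hints at: in a doubly linked list, Delete given a pointer to the node is O(1), with no search. A minimal sketch (class and method names are mine, not from the slides):

```python
class DNode:
    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None

class DList:
    """Doubly linked list with a sentinel; Delete is O(1) given the node."""
    def __init__(self):
        self.nil = DNode(None)               # sentinel node
        self.nil.prev = self.nil.next = self.nil

    def insert(self, key):                   # O(1): splice in at the front
        x = DNode(key)
        x.next = self.nil.next
        x.prev = self.nil
        self.nil.next.prev = x
        self.nil.next = x
        return x

    def delete(self, x):                     # O(1): no search needed
        x.prev.next = x.next
        x.next.prev = x.prev
```

Search, Min, and Max still require a scan, which is what the skip list will fix.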

Page 5:

Skip Lists

● The basic idea:

● Keep a doubly-linked list of elements
■ Min, max, successor, predecessor: O(1) time
■ Delete is O(1) time, Insert is O(1)+Search time

● During insert, add each level-i element to level i+1 with probability p (e.g., p = 1/2 or p = 1/4)

[Figure: a three-level skip list over the keys 3, 9, 12, 18, 29, 35, 37; every key appears on level 1, with progressively fewer keys on levels 2 and 3]

Page 6:

Skip List Search

● To search for an element with a given key:
■ Find its location in the top list
○ The top list has O(1) elements with high probability
○ Its location in this list defines a range of items in the next list
■ Drop down a level and recurse
● O(1) time per level on average
● O(lg n) levels with high probability
● Total time: O(lg n)

Page 7:

Skip List Insert

● Skip list insert: analysis
■ Do a search for that key
■ Insert the element in the bottom-level list
■ With probability p, recurse to insert in the next level up
■ Expected number of lists = 1 + p + p² + … = 1/(1−p) = O(1) if p is constant
■ Total time = Search + O(1) = O(lg n) expected

● Skip list delete: O(1)
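The search and insert just analyzed can be sketched as follows. This is a minimal illustration, not the slides' prescribed code: the class layout and the MAX_HEIGHT cap of 16 are my own choices. Each inserted node is promoted to the next level with probability p, exactly as described above:

```python
import random

class SkipNode:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height   # next[i] = successor at level i

class SkipList:
    MAX_HEIGHT = 16                   # arbitrary cap for this sketch

    def __init__(self, p=0.5):
        self.p = p
        self.head = SkipNode(None, self.MAX_HEIGHT)  # sentinel header
        self.height = 1

    def _random_height(self):
        # Promote to the next level with probability p, as on the slide
        h = 1
        while random.random() < self.p and h < self.MAX_HEIGHT:
            h += 1
        return h

    def search(self, key):
        x = self.head
        for i in range(self.height - 1, -1, -1):  # top level down
            while x.next[i] is not None and x.next[i].key < key:
                x = x.next[i]
        x = x.next[0]
        return x is not None and x.key == key

    def insert(self, key):
        # Record, at each level, the last node before the insert point
        update = [self.head] * self.MAX_HEIGHT
        x = self.head
        for i in range(self.height - 1, -1, -1):
            while x.next[i] is not None and x.next[i].key < key:
                x = x.next[i]
            update[i] = x
        h = self._random_height()
        self.height = max(self.height, h)
        node = SkipNode(key, h)
        for i in range(h):            # splice into every list up to level h
            node.next[i] = update[i].next[i]
            update[i].next[i] = node
```

The insert cost is one search plus O(1) expected splicing, matching the analysis above.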

Page 8:

Skip Lists

● O(1) expected time for most operations
● O(lg n) expected time for insert
● O(n²) time worst case (Why?)
■ But randomized, so no particular order of insertion evokes worst-case behavior
● O(n) expected storage requirement (Why?)
● Easy to code

Page 9:

Review: Hash Tables

● Motivation: symbol tables
■ A compiler uses a symbol table to relate symbols to associated data
○ Symbols: variable names, procedure names, etc.
○ Associated data: memory location, call graph, etc.

■ For a symbol table (also called a dictionary), we care about search, insertion, and deletion

■ We typically don’t care about sorted order

Page 10:

Review: Hash Tables

● More formally:
■ Given a table T and a record x with a key (= symbol) and satellite data, we need to support:
○ Insert(T, x)
○ Delete(T, x)
○ Search(T, x)

■ We want these to be fast, but don’t care about sorting the records

● The structure we will use is a hash table
■ Supports all of the above in O(1) expected time!

Page 11:

Hashing: Keys

● In the following discussions we will consider all keys to be (possibly large) natural numbers

● How can we convert floats to natural numbers for hashing purposes?

● How can we convert ASCII strings to natural numbers for hashing purposes?
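One common answer to both questions, sketched below: reinterpret a float's IEEE-754 bit pattern as an integer, and read an ASCII string as a number in radix 128 (the approach CLRS illustrates with "pt" = 112·128 + 116 = 14452). These are illustrative choices, not the slides' prescription:

```python
import struct

def float_key(x: float) -> int:
    # Reinterpret the 8-byte IEEE-754 encoding of x as a natural number
    return int.from_bytes(struct.pack('>d', x), 'big')

def string_key(s: str) -> int:
    # Read an ASCII string as a radix-128 natural number
    k = 0
    for ch in s:
        k = k * 128 + ord(ch)
    return k
```

Either way the result is a (possibly large) natural number, which is all the following slides assume.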

Page 12:

Review: Direct Addressing

● Suppose:
■ The range of keys is 0..m-1
■ Keys are distinct
● The idea:
■ Set up an array T[0..m-1] in which
○ T[i] = x if x ∈ T and key[x] = i
○ T[i] = NULL otherwise
■ This is called a direct-address table
○ Operations take O(1) time!
○ So what’s the problem?
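The idea above can be sketched directly. A minimal illustration (the class and record layout are mine, not from the slides):

```python
class DirectAddressTable:
    """Direct-address table for distinct keys in the range 0..m-1."""
    def __init__(self, m):
        self.T = [None] * m       # T[i] holds the record with key i, or None

    def insert(self, x):          # x is any record with a .key attribute
        self.T[x.key] = x

    def delete(self, x):
        self.T[x.key] = None

    def search(self, k):
        return self.T[k]          # O(1): direct index, no probing
```

All three operations are single array accesses, which is why they are O(1).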

Page 13:

The Problem With Direct Addressing

● Direct addressing works well when the range m of keys is relatively small

● But what if the keys are 32-bit integers?
■ Problem 1: the direct-address table will have 2³² entries, more than 4 billion
■ Problem 2: even if memory is not an issue, the time to initialize the elements to NULL may be prohibitive
● Solution: map keys to a smaller range 0..m-1
● This mapping is called a hash function

Page 14:

Hash Functions

● Next problem: collisions

[Figure: a hash function h maps the actual keys K ⊆ U (the universe of keys) into slots 0..m−1 of table T; k1, k3, k4 land in distinct slots, but k2 and k5 collide: h(k2) = h(k5)]

Page 15:

Resolving Collisions

● How can we solve the problem of collisions?
● Solution 1: chaining
● Solution 2: open addressing

Page 16:

Open Addressing

● Basic idea (details in Section 12.4):
■ To insert: if the slot is full, try another slot, …, until an open slot is found (probing)
■ To search: follow the same sequence of probes as would be used when inserting the element
○ If we reach an element with the correct key, return it
○ If we reach a NULL pointer, the element is not in the table
● Good for fixed sets (adding but no deletion)
■ Example: spell checking

● Table needn’t be much bigger than n

Page 17:

Chaining

● Chaining puts elements that hash to the same slot in a linked list:

[Figure: keys from K (actual keys) ⊆ U (universe of keys) hash into table T; colliding keys are chained in linked lists, e.g. slot chains k1 → k4, k5 → k2 → k3, k8 → k6, and k7 alone]

Page 18:

Chaining

● How do we insert an element?


Page 19:

Chaining


● How do we delete an element?
■ Do we need a doubly-linked list for efficient delete?

Page 20:

Chaining

● How do we search for an element with a given key?

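Pulling the three slide questions together, a minimal chaining sketch (the layout is mine: Python lists stand in for the chains). Note that deleting by key here scans the chain, which is exactly why the slide asks about doubly-linked lists: with a pointer to the element itself, a doubly-linked chain makes Delete O(1):

```python
class ChainedHashTable:
    """Chaining sketch: each slot holds a list acting as its chain."""
    def __init__(self, m):
        self.m = m
        self.T = [[] for _ in range(m)]

    def _h(self, k):
        return k % self.m                # any hash function works here

    def insert(self, k, data=None):      # O(1): prepend to the chain
        self.T[self._h(k)].insert(0, (k, data))

    def search(self, k):                 # O(1 + chain length)
        for key, data in self.T[self._h(k)]:
            if key == k:
                return data
        return None

    def delete(self, k):                 # scans; O(1) needs a node pointer
        chain = self.T[self._h(k)]
        for i, (key, _) in enumerate(chain):
            if key == k:
                del chain[i]
                return
```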

Page 21:

Analysis of Chaining

● Assume simple uniform hashing: each key in table is equally likely to be hashed to any slot

● Given n keys and m slots in the table: the load factor α = n/m = average # keys per slot

● What will be the average cost of an unsuccessful search for a key?

Page 22:

Analysis of Chaining

● (Setup repeated from the previous slide)
● What will be the average cost of an unsuccessful search for a key? A: O(1 + α)

Page 23:

Analysis of Chaining

● (Setup repeated from the previous slide)
● Average cost of an unsuccessful search: O(1 + α)
● What will be the average cost of a successful search?

Page 24:

Analysis of Chaining

● (Setup repeated from the previous slide)
● What will be the average cost of a successful search? A: O(1 + α/2) = O(1 + α)

Page 25:

Analysis of Chaining Continued

● So the cost of searching = O(1 + α)
● If the number of keys n is proportional to the number of slots in the table, what is α?
● A: α = O(1)
■ In other words, we can make the expected cost of searching constant if we make α constant

Page 26:

Choosing A Hash Function

● Clearly, choosing the hash function well is crucial
■ What will a worst-case hash function do?
■ What will be the time to search in this case?
● What are desirable features of the hash function?
■ Should distribute keys uniformly into slots
■ Should not depend on patterns in the data

Page 27:

Hash Functions: The Division Method

● h(k) = k mod m
■ In words: hash k into a table with m slots using the slot given by the remainder of k divided by m
● What happens to elements with adjacent values of k?
● What happens if m is a power of 2 (say 2^p)?
● What if m is a power of 10?
● Upshot: pick table size m = a prime number not too close to a power of 2 (or 10)
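A quick illustration of the power-of-2 pitfall: with m = 2^p, h(k) = k mod m keeps only the low-order p bits of the key, so keys that differ only in higher bits all collide:

```python
def h_division(k: int, m: int) -> int:
    # Slot = remainder of k divided by m
    return k % m

# With m = 2**4 = 16, only the low 4 bits of the key matter:
assert h_division(0b1101_0110, 16) == 0b0110
assert h_division(0b0001_0110, 16) == 0b0110  # different key, same slot
```

A prime m mixes all the bits of the key into the slot number, which is the slide's upshot.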

Page 28:

Hash Functions: The Multiplication Method

● For a constant A, 0 < A < 1:
● h(k) = ⌊m (kA − ⌊kA⌋)⌋

What does the term kA − ⌊kA⌋ represent?

Page 29:

Hash Functions: The Multiplication Method

● For a constant A, 0 < A < 1:
● h(k) = ⌊m (kA − ⌊kA⌋)⌋
■ kA − ⌊kA⌋ is the fractional part of kA
● Choose m = 2^p
● Choose A not too close to 0 or 1
● Knuth: a good choice is A = (√5 − 1)/2
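A floating-point sketch of the formula above (the helper name is mine; real implementations typically use a fixed-point variant for speed, so exact slot values may differ slightly from this version):

```python
import math

def h_mult(k: int, m: int, A: float = (math.sqrt(5) - 1) / 2) -> int:
    # floor(m * fractional part of k*A); default A is Knuth's suggestion
    frac = (k * A) % 1.0
    return math.floor(m * frac)
```

Because the fractional part lies in [0, 1), the result always lands in 0..m−1, for any m, which is why the method works even when m = 2^p.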

Page 30:

Hash Functions: Worst Case Scenario

● Scenario:
■ You are given an assignment to implement hashing
■ You will self-grade in pairs, testing and grading your partner’s implementation
■ In a blatant violation of the honor code, your partner:
○ Analyzes your hash function
○ Picks a sequence of “worst-case” keys, causing your implementation to take O(n) time to search

● What’s an honest CS student to do?

Page 31:

Hash Functions: Universal Hashing

● As before, when attempting to foil a malicious adversary: randomize the algorithm
● Universal hashing: pick a hash function randomly, in a way that is independent of the keys that are actually going to be stored
■ Guarantees good performance on average, no matter what keys the adversary chooses
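One standard universal family is h(k) = ((ak + b) mod p) mod m, with p prime and a, b chosen at random once, before any keys are seen. The sketch below fixes p = 2^31 − 1 (a Mersenne prime), which is my own choice and assumes all keys are smaller than p:

```python
import random

P = 2_147_483_647  # the Mersenne prime 2**31 - 1; keys must be < P

def make_universal_hash(m: int):
    # a and b are drawn at random, independently of the keys to be stored,
    # so no fixed key sequence is worst-case for every choice
    a = random.randrange(1, P)
    b = random.randrange(0, P)
    return lambda k: ((a * k + b) % P) % m
```

The adversary can still pick bad keys for any single (a, b), but cannot know which (a, b) will be drawn, which is what makes the expected performance good.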