
CS 261 – Data Structures

Hash Tables

Part 1. Open Address Hashing

Can we do better than O(log n)?

•We have seen how skip lists and AVL trees can reduce the time to perform operations from O(n) to O(log n)

•Can we do better? Can we find a structure that will provide O(1) operations?

•Yes. No. Well, Maybe….

Hash Tables

•Hash tables are similar to arrays, except…
– Elements can be indexed by values other than integers

– A single position may hold more than one element

•Arbitrary values (hash keys) map to integers by means of a hash function

•Computing a hash function is usually a two-step process:

1. Transform the value (or key) to an integer
2. Map that integer to a valid hash table index

•Example: storing names

– Compute an integer from a name
– Map the integer to an index in a table (i.e., a vector, array, etc.)
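
As a rough sketch of this two-step process (not code from the course; hashString and tableIndex are made-up names), in C++:

    #include <string>

    // Step 1: transform the key (a name) into an integer, here by summing
    // the numeric (ASCII) values of its characters.
    int hashString(const std::string& name) {
        int sum = 0;
        for (char c : name)
            sum += c;
        return sum;
    }

    // Step 2: map that integer to a valid index for a table of a given size.
    int tableIndex(const std::string& name, int tableSize) {
        return hashString(name) % tableSize;
    }

The slides do not specify which hash function produces the table on the next slide; summing characters is just one possible choice.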

Hash Tables

Say we're storing names: Angie, Joe, Abigail, Linda, Mark, Max, Robert, John

The hash function maps each name to a table index:

0  Angie, Robert
1  Linda
2  Joe, Max, John
3  (empty)
4  Abigail, Mark

Hash Function: Transforming to an Integer

•Mapping: map (a part of) the key into an integer
– Example: a letter to its position in the alphabet

•Folding: key partitioned into parts which are then combined using efficient operations (such as add, multiply, shift, XOR, etc.)

– Example: summing the values of each character in a string

•Shifting: get rid of high- or low-order bits that are not random

– Example: if keys are always even, shift off the low order bit

•Casts: converting a numeric type into an integer
– Example: casting a character to an int to get its ASCII value
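
Hedged one-line examples of these techniques (illustrative only; folding appears as the character sum in the earlier sketch):

    #include <cstdint>

    // Mapping: a lowercase letter to its position in the alphabet (a = 0).
    int letterPosition(char c) { return c - 'a'; }

    // Shifting: if keys are always even, the low-order bit carries no
    // information, so shift it off before using the key.
    uint32_t dropLowBit(uint32_t evenKey) { return evenKey >> 1; }

    // Cast: converting a character to an int yields its character code.
    int charCode(char c) { return static_cast<int>(c); }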

Hash Function: Combinations

•Another use for shifting: in combination with folding, when the fold operator is commutative:

Key   Mapped chars   Folded   Shifted and folded
eat   5 + 1 + 20     26       20 + 2 + 20 = 42
ate   1 + 20 + 5     26       4 + 40 + 5 = 49
tea   20 + 5 + 1     26       80 + 10 + 1 = 91
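
A sketch of the difference, using letter values a = 1 through z = 26 as in the table; shifting by one bit before each addition is an assumption, but it happens to reproduce the numbers above:

    #include <string>

    // Plain fold: the order of the letters does not matter.
    int foldOnly(const std::string& word) {
        int sum = 0;
        for (char c : word)
            sum += c - 'a' + 1;                 // a = 1, ..., z = 26
        return sum;
    }

    // Shift and fold: shifting before each addition makes order matter.
    int shiftAndFold(const std::string& word) {
        int hash = 0;
        for (char c : word)
            hash = (hash << 1) + (c - 'a' + 1);
        return hash;
    }

    // foldOnly("eat"), foldOnly("ate"), and foldOnly("tea") are all 26,
    // while shiftAndFold gives 42, 49, and 91 respectively, as in the table.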

Hash Function: Mapping to a Valid Index

•Almost always use the modulus operator (%) with the table size:
– Example: idx = hash(val) % data.size()

•Must be sure that the final result is positive
– Use only positive arithmetic or take the absolute value
– Watch out for the smallest negative number, whose absolute value overflows an int; possibly use longs

•To get a good distribution of indices, prime numbers make the best table sizes:
– Example: if you have 1000 elements, a table size of 997 or 1009 is preferable
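
A small sketch of keeping the index valid (indexFor is a made-up name); in C++ the % operator keeps the sign of the dividend, so a negative hash value needs to be shifted back into range:

    // Map a possibly negative hash value to a valid index in [0, tableSize).
    int indexFor(int hashValue, int tableSize) {
        int idx = hashValue % tableSize;
        if (idx < 0)
            idx += tableSize;   // % preserved the sign of hashValue
        return idx;
    }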

Hash Functions: some ideas

•Here are some typical hash functions:
– Character: the char value cast to an int (its ASCII value)

– Date: a value associated with the current time

– Double: a value generated by its bitwise representation

– Integer: the int value itself

– String: a folded sum of the character values

– URL: the hash code of the host name

Hash Tables: Collisions

•Ideally, we want a perfect hash function where each data element hashes to a unique hash index

•However, unless the data is known in advance, this is usually not possible

•A collision is when two or more different keys result in the same hash table index

Example, perfect hashing

•Alfred, Alessia, Amina, Amy, Andy, and Anne have a club. Amy needs to store information in a six-element array. Amy discovers she can convert the 3rd letter of each name (a = 0) to an index:

Alfred    F = 5 % 6 = 5
Alessia   E = 4 % 6 = 4
Amina     I = 8 % 6 = 2
Amy       Y = 24 % 6 = 0
Andy      D = 3 % 6 = 3
Anne      N = 13 % 6 = 1

Indexing is faster than searching

•Can convert a name (e.g., Alessia) into a number (e.g., 4) in constant time

•Even faster than searching.

•Allows for O(1) time operations.

•Of course, things get more complicated when the input values change (Alan wants to join the club, and his third letter 'a' = 0 collides with Amy; or worse yet Al, who doesn't have a third letter!)
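
A sketch of Amy's third-letter hash (lowercase names assumed, a = 0); note that it assumes every name has at least three letters, which is exactly Al's problem:

    #include <string>

    // Index a club member by the third letter of their (lowercase) name.
    int clubIndex(const std::string& name, int tableSize) {
        return (name[2] - 'a') % tableSize;     // assumes name.size() >= 3
    }

    // clubIndex("alfred", 6) == 5, clubIndex("amy", 6) == 0, and
    // clubIndex("anne", 6) == 1, matching the table above.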

Hash Tables: Resolving Collisions

There are several general approaches to resolving collisions:

1. Open address hashing: if a spot is full, probe for the next empty spot
2. Chaining (or buckets): keep a collection at each table entry
3. Caching: save the most recently accessed value, fall back to a slower search otherwise

Today we will examine Open Address Hashing

Open Address Hashing

•All values are stored in an array

•The hash value is used to find the initial index to try

•If that position is filled, the next position is examined, then the next, and so on until an empty position is found

•The process of looking for an empty position is termed probing, specifically linear probing.

•There are other probing algorithms, but we won’t consider them.
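
A minimal sketch of add with linear probing; the table is modeled here as a vector of optional strings, and hashfun stands for whatever hash function is in use (these names and types are assumptions, not the course's API):

    #include <optional>
    #include <string>
    #include <vector>

    using Table = std::vector<std::optional<std::string>>;

    void add(Table& table, const std::string& value,
             int (*hashfun)(const std::string&)) {
        int size = static_cast<int>(table.size());
        int idx = ((hashfun(value) % size) + size) % size;   // valid start index
        while (table[idx].has_value())      // probe forward, wrapping around,
            idx = (idx + 1) % size;         // until an empty position is found
        table[idx] = value;                 // assumes the table is not full
    }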

Example

•Eight-element table using Amy's hash function (3rd letter % 8):

Index  Letters (3rd letter % 8)  Entry
0      a, i, q, y                Amina
1      b, j, r, z
2      c, k, s
3      d, l, t                   Andy
4      e, m, u                   Alessia
5      f, n, v                   Alfred
6      g, o, w
7      h, p, x                   Aspen

Now Suppose Anne wants to Join

•Anne's index position (5) is filled by Alfred, so we probe to find the next free location (6)

0: Amina   1: -   2: -   3: Andy   4: Alessia   5: Alfred   6: Anne   7: Aspen

Next comes Agnes

•Agnes's third letter is also 'n', so her position (5) is filled by Alfred. We once more probe: 6 (Anne), 7 (Aspen); when we get to the end of the array, start again at the beginning. Eventually we find position 1.

0: Amina   1: Agnes   2: -   3: Andy   4: Alessia   5: Alfred   6: Anne   7: Aspen

Finally comes Alan

•Lastly, Alan wants to join. His location (0) is filled by Amina. Probing finds the last free location (2). The collection is now completely filled. (More on this later.)

0: Amina   1: Agnes   2: Alan   3: Andy   4: Alessia   5: Alfred   6: Anne   7: Aspen

Next operation, contains test

•Hash to find the initial index, then move forward examining each location until the value is found or an empty location is reached (a sketch follows the table below)

•Search for Amina, search for Anne, search for Albert

•Notice that search time is not uniform

0: Amina   1: -   2: -   3: Andy   4: Alessia   5: Alfred   6: -   7: Aspen
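
Continuing the add sketch from earlier (same Table type and headers), the contains test might look like this; the probe counter only guards against looping forever on a completely full table:

    bool contains(const Table& table, const std::string& value,
                  int (*hashfun)(const std::string&)) {
        int size = static_cast<int>(table.size());
        int idx = ((hashfun(value) % size) + size) % size;
        for (int probes = 0; probes < size; ++probes) {
            if (!table[idx].has_value())    // empty slot: the value can't be here
                return false;
            if (*table[idx] == value)       // found it
                return true;
            idx = (idx + 1) % size;         // keep probing
        }
        return false;                       // scanned a full table without a hit
    }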

Final Operation: Remove

•Remove is tricky. We can't just replace the entry with null. What happens if we delete Agnes, then search for Alan? (The search starts at index 0, hits the now-empty slot at index 1, and wrongly concludes Alan is not in the table.)

0: Amina   1: -   2: Alan   3: Andy   4: Alessia   5: Alfred   6: Anne   7: Aspen

How to handle remove

•Simple solution: just don't do it (we will do this one)

•Better: create a tombstone (sketched after the table below):
– A value that marks a deleted entry
– Can be replaced with a new entry
– But doesn't halt a search

0: Amina   1: TOMBSTONE   2: Alan   3: Andy   4: Alessia   5: Alfred   6: Anne   7: Aspen
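
A sketch of the tombstone idea, continuing the same Table type; the sentinel value is an assumption, and any value that cannot appear as a real entry would do:

    const std::string TOMBSTONE = "\x01__tombstone__";

    void removeValue(Table& table, const std::string& value,
                     int (*hashfun)(const std::string&)) {
        int size = static_cast<int>(table.size());
        int idx = ((hashfun(value) % size) + size) % size;
        for (int probes = 0; probes < size; ++probes) {
            if (!table[idx].has_value())    // empty slot: the value isn't here
                return;
            if (*table[idx] == value) {     // found it: mark the slot deleted
                table[idx] = TOMBSTONE;     // rather than emptying it
                return;
            }
            idx = (idx + 1) % size;
        }
    }

With this scheme the contains sketch above keeps probing past a tombstone (the slot still holds a value that simply doesn't match), while add would need one extra check to reuse tombstone slots.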

Hash Table Size - Load Factor

•Load factor: λ = n / m, where n is the number of elements and m is the size of the table

–So, load factor represents the average number of elements at each table entry

–For open address hashing, load factor is between 0 and 1 (often somewhere between 0.5 and 0.75)

–For chaining, load factor can be greater than 1

•Want the load factor to remain small


What to do with a large load factor

•Common solution: when the load factor becomes too large (say, bigger than 0.75), reorganize:

•Create a new table with twice the number of positions

•Copy each element, rehashing using the new table size, placing elements in new table

•Then delete the old table

•Exactly like you did with the dynamic array, only this time using hashing.
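
A sketch of that reorganize step, reusing the add sketch from earlier; doubling the size rather than choosing a nearby prime is a simplification:

    void reorganize(Table& table, int (*hashfun)(const std::string&)) {
        Table old = std::move(table);                  // keep the old entries
        table.assign(old.size() * 2, std::nullopt);    // new table, twice as big
        for (const auto& slot : old)
            if (slot.has_value())
                add(table, *slot, hashfun);            // rehash with the new size
    }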

Hash Tables: Algorithmic Complexity

•Assumptions:
– Time to compute the hash function is constant
– Worst case analysis: all values hash to the same position
– Best case analysis: the hash function uniformly distributes the values (all buckets have the same number of objects in them)

•Find element operation:
– Worst case for open addressing: O(n)
– Best case for open addressing: O(1)

Hash Tables: Average Case

•What about the average case?

•It turns out to be 1/(1 - λ), so keeping the load factor small is very important (a short derivation sketch follows the table)

λ      1/(1 - λ)
0.25   1.3
0.5    2.0
0.6    2.5
0.75   4.0
0.85   6.6
0.95   19.0
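
A rough sketch of where 1/(1 - λ) comes from, under the simplifying assumption that each probe independently lands on an occupied slot with probability λ (the standard uniform-hashing argument, not specific to linear probing):

    E[\text{probes}] = \sum_{i \ge 0} \Pr[\text{more than } i \text{ probes}]
                     \approx \sum_{i \ge 0} \lambda^{i} = \frac{1}{1-\lambda},
    \qquad \text{e.g. } \lambda = 0.75 \Rightarrow \frac{1}{1 - 0.75} = 4.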

Your turn

•Complete the implementation of the hash table

•Use hashfun(value) to get hash value

•Don’t do remove.

•Do add and contains test first, then do the internal reorganize method
