dictionaries and hash tables cmput 115 - lecture 24 department of computing science university of...

39
Dictionaries and Dictionaries and Hash Tables Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture is based on code from the book: Java Structures by Duane A. Bailey or the companion structure package Revised 3/28/00

Post on 21-Dec-2015

218 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

Dictionaries and Dictionaries and Hash TablesHash Tables

Cmput 115 - Lecture 24

Department of Computing Science

University of Alberta©Duane Szafron 2000

Some code in this lecture is based on code from the book:Java Structures by Duane A. Bailey or the companion structure package

Revised 3/28/00

Page 2: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

2

About This LectureAbout This Lecture

In this lecture we will study a container interface called Dictionary and an implementation class called HashTable.

Page 3: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

3

OutlineOutline Dictionary Interface

HashTable Class

Iterators

External Chaining

Page 4: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

4

DictionaryDictionary

A Dictionary is an unordered container that contains key-value pairs.

The keys are unique, but the values are not.

45

"Barney"

"Wilma"

"Betty""Fred"

39

37 39

keys

values

Page 5: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

5

Dictionary HierarchyDictionary Hierarchy In Java.util, Dictionary is a class. In the structure package the Dictionary Interface as an extension of

the Store Interface. The class HashTable will implement the Dictionary interface.

Store

Dictionary

HashTable

Page 6: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

6

Structure Interface - StoreStructure Interface - Storepublic interface Store {

public int size();//post: returns the number of elements contained in // the store.

public boolean isEmpty();// post: returns the true iff store is empty.

public void clear();// post: clears the store so that it contains no // elements.

}

code based on Bailey pg. 18

Page 7: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

7

Structure Interface - Dictionary 1Structure Interface - Dictionary 1

public interface Dictionary extends Store {

public Object put(Object key, Object value);// pre: key is non-null// post: puts the key-value pair in this Dictionary. If a // matching key was in this Dictionary, returns the old value.// Otherwise, returns null

public Object get(Object key);// pre: key is non-null// post: returns the value with the given key or null if// no matching key is found

public boolean contains(Object value);// pre: value is non-null// post: returns true iff the Dictionary contains the value

code based on Bailey pg. 268

Page 8: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

8

Structure Interface - Dictionary 2Structure Interface - Dictionary 2

public boolean containsKey(Object key);// pre: key is non-null

// post: returns true iff the Dictionary contains the key

public Object remove(Object key);// pre: key is non-null// post: removes a key-value pair whose key is “equal” to // the given key and returns the value. If no matching key // was found, then returns null

public Iterator keys();// post: returns an Iterator for traversing all keys

public Iterator elements();// post: returns an Iterator for traversing all values

}code based on Bailey pg. 267

Page 9: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

9

Dictionary - Dictionary - Obvious ImplementationsObvious Implementations

We could implement a Dictionary using two parallel containers (Arrays, Vectors, Lists etc.,) one for the keys and one for the values.

We could also implement a Dictionary using a single container that holds Associations.

In either case, the methods get(Object), put(Object, Object), contains(Object), containsKey(Object) and remove(Object) would each require O(n) calls to the equals(Object) method for Lists.

If the keys are Comparable we can reduce the comparisons to log (n) for Arrays and Vectors.

Can we do better?

Page 10: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

10

Dictionary - Parcel AnalogyDictionary - Parcel Analogy Assume that you are about to leave a busy mall or

amusement park and you are one of about a thousand people picking up a parcel at any time during the day.

This is a Dictionary problem with names as keys and parcels as values.

Assume the mall has 100 bins that each hold about 10 parcels.

How should the mall organize these parcels to minimize waiting time?

Page 11: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

11

Parcels - Using BinsParcels - Using Bins When you buy your item, you are asked for the last two

digits of your phone number and your parcel is sent to that bin.

When you pick up your parcel the attendant asks for the last two digits of your phone number, goes to the correct bin (1 - 100) and searches through the parcels (1-10) to get the one with your name.

This is an example of hashing.

Each item is assigned a hash number (or index) that is used to select a bin which contains a small number of items that can be searched for your item.

Page 12: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

12

Selecting Bin NumbersSelecting Bin Numbers Would the first two digits of a phone number be as good as

the last two digits?

There are only a few combinations of first two digits that most local residents share (for example, 42, 43, 44, 45, 46, 47, 48, 92, 96, 98 in Edmonton), so a few bins would overflow and others would be empty.

What about using the first two or last two letters of the name of the person?

This would take 26*26 = 676 bins but even so, some bins would fuller than others.

For maximum efficiency, we want the keys to be uniformly distributed over the bin numbers.

Page 13: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

13

Hash FunctionsHash Functions A hash function maps keys to bin values.

– It should map keys uniformly across all bins.– It should be fast to compute.– It should be applicable to all objects.

h(“Paul”) = 28

When two keys map to the same bin, we have a hash collision.

When a collision occurs, a collision resolution algorithm is used to establish the locations of the colliding keys in the bin.

In some cases when we know all of the key values in advance we can construct a perfect hash function that maps each key to a different bin (no collisions).

Page 14: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

14

Hash TablesHash Tables A hash table is a container (usually an Array or Vector)

whose elements are used as bins.

In the basic implementation, each entry in the hash table is a bin that holds a single element.

“longest”

“to”

“kiwi”“fifth”

0123456

hash function= length % 7

“kiwi”

4

Page 15: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

15

Hash Tables CollisionsHash Tables Collisions If there is a hash collision, the collision resolution

algorithm selects a different bin for the new element to be inserted.

This is called open addressing.

“longest”

“to”

“kiwi”“fifth”

0123456

hash function= length % 7

“poem”

4?

Page 16: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

16

Linear ProbingLinear Probing One open addressing algorithm is called linear

probing:– Locations are checked from the hash location to the end of the

table and the element is placed in the first empty slot.

– If the bottom of the table is reached, checking “wraps around” to the start of the table.

Collision resolution modifies how a search is done since the match for a search might not be at the hash location.

For example, if linear probing is used, the search must continue down the table until a match or empty location is found.

Page 17: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

17

Linear Probing ExampleLinear Probing Example

“longest”

“to”

“kiwi”

“fifth”

0123456

“poem”

hash function= length % 7 4

“poem”

Page 18: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

18

Other Open Addressing SchemesOther Open Addressing Schemes Linear probing has an offset value of 1.

Instead, we can use a second hash function to generate a different offset from the first hash location using double hashing.

“longest”

“to”

“kiwi”

“poem”“fifth”

0123456

“fred”

hash function= length % 7 4

hash function=value (firstChar) 6

(4 + 6) % 7 -> 3

“fred”

Page 19: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

19

Element Deletion ProblemElement Deletion Problem Open addressing affects element removal. When an element is removed, the “hole” may prevent

us from finding another element that hashed to the same location.

hash function=length % 7

“poem”

4“longest”

“to”

“kiwi”

“poem”“fifth”

0123456

stop before finding “poem”

Page 20: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

20

Element DeletionElement Deletion Deletions can be handled in two ways:

– Mark the deleted location as Reserved• During insertion, a reserved location can be re-used.

– Move all of the elements that hashed to the same location as the removed element “up” in the hash table after a deletion.

“longest”

“to”

“kiwi”

“poem”Reserved

0123456

“longest”

“to”

“kiwi”“poem”

0123456

Page 21: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

21

Efficiency of HashTablesEfficiency of HashTables If the number of collisions is small, searching, adding and

removing elements in a hash table requires O(C) time.

To reduce the number of collisions, in addition to using a good hash function, we must make sure the table does not get too full.

The load factor of a hash table is the ratio of full elements to empty elements.

For best results, the load factor should not be above 0.6.

If it gets higher, we should extend the hash table and re-hash all of its elements.

Page 22: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

22

Implementation of HashTableImplementation of HashTable We will use an array of Associations.

We will use the Reserved strategy for deletions.

We will grow the HashTable when the load factor gets too high.

We will cache the logical size to make it easier to determine when the load factor is too high.

The size of the HashTable should be a prime, but we will allow the user to specify the initial size and double this size and add one, when it must be grown. (e.g., run some experiments using size 97 vs 100)

Page 23: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

23

ExampleExample

Aho

Hopcroft

Backus

Von Neuman

Scott

Jacobsen

0123456789

10

put “Aho”, prog-lang ->1*3%11=3put “Scott”, automata -> 19*5%11=7put “Hopcroft”, automata -> 9put “Backus”, prog-lang -> 1put “von Neuman”, archit -> 10put “Turing”, coding -> 10put “Jacobsen”, softeng -> 3

Turing

hash = value(first char of key)*length(key)

Page 24: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

24

ExampleExample

111213141516171819202122

put “Aho”, prog-lang ->1*3%23->3put “Scott”, automata -> 19*5%23->3put “Hopcroft”, automata -> 9 ->18put “Backus”, prog-lang -> 1 -> 12put “von Neuman”, archit -> 10 ->13put “Turing”, coding -> 10 -> 5put “Jacobsen”, softeng -> 3 ->

Aho

Hopcroft

Backus

Von Neuman

Scott

Jacobsen

0123456789

10

Turing

hash = value(first char of key)*length(key)

ScottAho

Hopcroft

Backusvon Neuman

Turing

Jacobsen

McCarthy

11

put “McCarthy”,AI -> 13*8%23 ->12rehash

Page 25: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

25

HashTableHashTable - - State and ConstructorsState and Constructors

class HashTable implements Dictionary {protected static Association reserved = new

Association(“reserved”, null);protected Association data[ ];protected int count; protected int capacity;protected final double loadFactor = 0.6;

public HashTable(int initialCapacity) {// pre: initialCapacity > 0// post: constructs a HashTable with given initial size.

this.data = new Association[initialCapacity];this.capacity = initialCapacity;this.count = 0;

}public HashTable() {// post: constructs a HashTable with a default size.

this(997);} code based on Bailey pg. 270

Page 26: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

26

HashTable - HashTable - Store InterfaceStore Interface/* Interface Store Methods */

public int size() {//post: returns the number of elements in the store.

return this.count; }

public boolean isEmpty() {// post: returns the true iff store is empty.

return this.size() == 0; }

public void clear();// post: clears the store so that it contains no elements.

for (index = 0; index < this.capacity; index++) this.data[index] = null;

this.count = 0;}

code based on Bailey SPackage

Page 27: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

27

HashTable - HashTable - getget

public Object get(Object key) {// pre: key is non-null// post: returns the value with the given key or null if// no matching key is found

int index;Association found;

index = this.locate(key); // locate does the workfound = this.data[index];if (found == null || found == reserved) return null;return found.value();

}

code based on Bailey pg. 275

Page 28: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

28

HashTable - HashTable - put 1put 1

public Object put(Object key, Object value);// pre: key is non-null// post: puts the key-value pair in this Dictionary. If a // matching key was in this Dictionary, returns the old value.// Otherwise, returns null

int index;Association found;Object oldValue;if (count + 1 > this.loadFactor * capacity) this.rehash();index = this.locate(key); // locate does the workfound = this.data[index];if (found == null || found == reserved) { // not found this.data[index] = new Association(key, value); this.count++; return null;}

code based on Bailey pg. 274

Page 29: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

29

HashTable - HashTable - put 2 and containsKeyput 2 and containsKey

else // found oldValue = found.value(); found.setValue(value); return oldValue;}

}

public boolean containsKey(Object key) {// pre: key is non-null// post: returns true iff the Dictionary contains the key

int index;

index = this.locate(key); // locate does the workreturn this.data[index] != null && this.data[index] != reserved;

}

code based on Bailey pg. 275

Page 30: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

30

HashTable - HashTable - removeremovepublic Object remove(Object key);// pre: key is non-null// post: removes a key-value pair whose key is “equal” to // the given key and returns the value. If no matching key // was found, then returns null

int index;Association found;Object oldValue;index = this.locate(key); // locate does the workfound = this.data[index];if (found == null || found == reserved) { // not found return null;this.count--;oldValue = found.value();this.data[index] = reserved;return oldValue;

}

code based on Bailey pg. 276

Page 31: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

31

HashTable - HashTable - locate 1locate 1

protected int locate(Object key);// pre: key is non-null// post: returns ideal index of key in table

int index;int reservedIndex;Association found;Object oldValue;

index = Math.abs(key.hashCode() % this.capacity);

reservedIndex = -1;

code based on Bailey pg. 274

Page 32: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

32

HashTable - HashTable - locate 2locate 2

while (this.data[index] != null) { if (this.data[index] = reserved {

if (reservedIndex == -1)reservedIndex = index;

} else

if (key.equals(this.data[index].key())) return index; // we have located the key

index = (index + 1) % this.capacity; //probe linearly

}if (reservedIndex == -1) return index; //haven’t hit reserved key so return index

else //return first available (reserved) index

return reservedIndex;} code based on Bailey pg. 274

Page 33: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

33

HashTable - HashTable - rehashrehashprotected void rehash() {// post: resizes table and re-hashes all elements

Association association;

Iterator iterator;

iterator = new HashtableIterator(this.data);

this.capacity = this.capacity * 2 + 1;

this.data = new Association[this.capacity];

this.count = 0;

while (iterator.hasMoreElements()) {

association = (Association) iterator.nextElement();

put(association.key(), association.value());

}

}code based on Bailey SPackage

Page 34: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

34

IteratorsIterators

Create a HashTableIterator class whose elements are Associations.

A HashTableIterator is used in rehash().

Also, let each KeyIterator or ValueIterator be a filter on a HashTableIterator (see textbook).

Page 35: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

35

HashtableIterator - HashtableIterator - public 1public 1

class HashtableIterator implements Iterator {protected int current;protected Association data[ ];

public HashtableIterator(Association[ ] table) {// post: constructs a new hash table iterator

this.data = table;this.reset();

}public void reset() {// post: resets iterator to beginning of hash table

this.current = 0;this.findNextElement();

}public boolean hasMoreElements() {// post: returns true if there are unvisited elements

return this.current < this.data.length;}

code based on Bailey SPackage

Page 36: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

36

HashtableIterator - HashtableIterator - public 2public 2

public Object nextElement() {// pre: hasMoreElements()// post: returns current element, increments iterator

Object result;

result = this.data[this.current];this.findNextElement();return result;

}

public Object value()// pre: hasMoreElements()// post: returns current element (key and value)

return this.data[this.current];}

code based on Bailey SPackage

Page 37: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

37

HashtableIteratorHashtableIterator - findNextElement- findNextElement

protected void findNextElement() {// post: moves current index to the next real element

while (this.current < this.data.length && (this.data[this.current] == null || this.data[this.current] == Hashtable.reserved)) this.current++;

}

code based on Bailey SPackage

Page 38: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

38

External ChainingExternal Chaining Instead of implementing a hash table whose entries are

associations, we can have a hash table whose entries are containers for associations.

Then when there is a hashing collision, we put all elements that collided into a common container.

0123456

“longest”

“to”

“kiwi”“fifth”

“largest”

“there”“fred” “association”

Page 39: Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture

©Duane Szafron 2000

39

Some Principles from the TextbookSome Principles from the Textbook

25. Provide a method for hashing the objects you implement.

26. Equivalent Objects should return equal hash codes.

principles from Bailey ch. 13