18. Dictionaries, Hash Tables and Sets
Post on 20-May-2015
Hash Tables and Sets
Dictionaries, Hash Tables, Collision Resolution, Sets
Svetlin Nakov
Telerik Corporation
www.telerik.com
Table of Contents
1. Dictionaries
2. Hash Tables
3. Dictionary<TKey,TValue> Class
4. Sets
Dictionaries
Data Structures that Map Keys to Values

The Dictionary (Map) ADT
The abstract data type (ADT) "dictionary" maps keys to values
  Also known as "map" or "associative array"
  Contains a set of (key, value) pairs
Dictionary ADT operations:
  Add(key, value)
  FindByKey(key) → value
  Delete(key)
Can be implemented in several ways
  List, array, hash table, balanced tree, ...
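The simplest of the implementations listed above is an unsorted list of pairs. The following is only a sketch: the class name PairListDictionary and the choice to let Add replace an existing key are assumptions made for illustration, not part of the slides.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class: the ADT "dictionary" backed by an unsorted list of pairs.
public class PairListDictionary<TKey, TValue>
{
    private readonly List<KeyValuePair<TKey, TValue>> pairs =
        new List<KeyValuePair<TKey, TValue>>();

    // Add(key, value): replaces the value if the key is already present.
    public void Add(TKey key, TValue value)
    {
        Delete(key);
        pairs.Add(new KeyValuePair<TKey, TValue>(key, value));
    }

    // FindByKey(key) -> value: linear scan over all pairs, O(n).
    public bool FindByKey(TKey key, out TValue value)
    {
        foreach (var pair in pairs)
        {
            if (EqualityComparer<TKey>.Default.Equals(pair.Key, key))
            {
                value = pair.Value;
                return true;
            }
        }
        value = default(TValue);
        return false;
    }

    // Delete(key): removes the pair if present, returns whether anything was removed.
    public bool Delete(TKey key)
    {
        return pairs.RemoveAll(
            p => EqualityComparer<TKey>.Default.Equals(p.Key, key)) > 0;
    }
}

public static class AdtDemo
{
    public static void Main()
    {
        var dict = new PairListDictionary<string, string>();
        dict.Add("CLR", "Common Language Runtime");

        string value;
        Console.WriteLine(dict.FindByKey("CLR", out value) ? value : "(not found)");

        dict.Delete("CLR");
        Console.WriteLine(dict.FindByKey("CLR", out value)); // False
    }
}
```

Every operation here scans the list, which is the cost that hash tables and balanced trees are meant to avoid.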
ADT Dictionary – Example
Example dictionary:

Key       Value
C#        Modern object-oriented programming language for the Microsoft .NET platform
CLR       Common Language Runtime – execution engine for .NET assemblies, integral part of .NET Framework
compiler  Software that transforms a computer program to executable machine code
…         …
Hash Tables
What Is a Hash Table? How Does It Work?

Hash Table
A hash table is an array that holds a set of (key, value) pairs
The process of mapping a key to a position in the table is called hashing

[Figure: hash table T of size m with slots 0, 1, 2, 3, 4, 5, …, m-1; the hash function h: k → 0 … m-1 maps a key k to the slot h(k)]
Hash Functions and Hashing
A hash table has m slots, indexed from 0 to m-1
A hash function h(k) maps keys to positions:
  h: k → 0 … m-1
For any value k in the key range and some hash function h we have h(k) = p and 0 ≤ p < m

[Figure: a key k mapped by h(k) to one of the slots 0 … m-1 of table T]
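One common way to obtain such a function in C# is to reduce the key's GetHashCode() result modulo m. The sketch below assumes that approach; the names HashDemo and SlotIndex are hypothetical, and the printed slot numbers can differ between runs because string hash codes are not guaranteed to be stable.

```csharp
using System;

public static class HashDemo
{
    // Maps a key to a slot p with 0 <= p < m.
    public static int SlotIndex(string key, int m)
    {
        // Clear the sign bit first, because GetHashCode() may return a negative number.
        return (key.GetHashCode() & 0x7FFFFFFF) % m;
    }

    public static void Main()
    {
        const int m = 8;
        foreach (string key in new[] { "Pesho", "Kiro", "Mimi" })
        {
            Console.WriteLine("h({0}) = {1}", key, SlotIndex(key, m));
        }
    }
}
```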
Hashing Functions
Perfect hashing function (PHF)
  h(k): one-to-one mapping of each key k to an integer in the range [0, m-1]
  The PHF maps each key to a distinct integer within some manageable range
  Finding a perfect hashing function is in most cases impossible
More realistically
  A hash function h(k) maps most of the keys onto unique integers, but not all
Collisions in a Hash Table
A collision is the situation when different keys have the same hash value
  h(k1) = h(k2) for k1 ≠ k2
When the number of collisions is sufficiently small, hash tables work quite well (fast)
Several collision resolution strategies exist
  Chaining in a list
  Using the neighboring slots (linear probing)
  Re-hashing
  ...
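A collision is easy to reproduce with integer keys, since in .NET the hash code of an int is the value itself. The sketch below assumes the hash function is "hash code modulo m"; CollisionDemo, H and Collide are hypothetical names used only for illustration.

```csharp
using System;

public static class CollisionDemo
{
    // Assumed hash function: the key's hash code reduced modulo the table size m.
    public static int H(int key, int m)
    {
        return (key.GetHashCode() & 0x7FFFFFFF) % m;
    }

    // True when two different keys land in the same slot.
    public static bool Collide(int k1, int k2, int m)
    {
        return k1 != k2 && H(k1, m) == H(k2, m);
    }

    public static void Main()
    {
        const int m = 8;
        Console.WriteLine(H(2, m));           // 2
        Console.WriteLine(H(10, m));          // 2
        Console.WriteLine(Collide(2, 10, m)); // True
        Console.WriteLine(Collide(2, 3, m));  // False
    }
}
```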
Collision Resolution: Chaining
  h("Pesho") = 4
  h("Kiro") = 2
  h("Mimi") = 1
  h("Ivan") = 2
  h("Lili") = m-1

[Figure: table T with slots 0 … m-1. Slot 1 holds the chain Mimi → null; slot 2 holds Kiro → Ivan → null (a collision, resolved by chaining the elements); slot 4 holds Pesho → null; slot m-1 holds Lili → null; the empty slots hold null]
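Chaining can be sketched with an array of linked lists, one list per slot. This is a deliberately minimal illustration under stated assumptions: ChainedTable and TryFind are hypothetical names, the table never resizes, Add does not check for duplicate keys, and null keys are not supported. It is not meant to describe how Dictionary<TKey,TValue> is built internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal hash table with chaining and a fixed number of slots.
public class ChainedTable<TKey, TValue>
{
    private readonly LinkedList<KeyValuePair<TKey, TValue>>[] slots;

    public ChainedTable(int m)
    {
        slots = new LinkedList<KeyValuePair<TKey, TValue>>[m];
    }

    private int SlotIndex(TKey key)
    {
        return (key.GetHashCode() & 0x7FFFFFFF) % slots.Length;
    }

    public void Add(TKey key, TValue value)
    {
        int index = SlotIndex(key);
        if (slots[index] == null)
        {
            slots[index] = new LinkedList<KeyValuePair<TKey, TValue>>();
        }
        // Colliding keys simply share the chain of this slot.
        slots[index].AddLast(new KeyValuePair<TKey, TValue>(key, value));
    }

    public bool TryFind(TKey key, out TValue value)
    {
        var chain = slots[SlotIndex(key)];
        if (chain != null)
        {
            // Walk the chain and compare keys with Equals().
            foreach (var pair in chain)
            {
                if (pair.Key.Equals(key))
                {
                    value = pair.Value;
                    return true;
                }
            }
        }
        value = default(TValue);
        return false;
    }
}

public static class ChainingDemo
{
    public static void Main()
    {
        // With a single slot every key collides, so both pairs share one chain.
        var table = new ChainedTable<string, int>(1);
        table.Add("Kiro", 2);
        table.Add("Ivan", 5);

        int value;
        Console.WriteLine(table.TryFind("Ivan", out value) ? value : -1); // 5
        Console.WriteLine(table.TryFind("Lili", out value));              // False
    }
}
```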
Hash Tables and Efficiency
Hash tables are usually the most efficient implementation of the ADT "dictionary"
  Add / Find / Delete take just a few primitive operations
  Speed does not depend on the size of the hash table (constant time on average, as long as collisions are rare)
Example: finding an element in a hash table with 1 000 000 elements takes just a few steps
  Finding an element in an unsorted array of 1 000 000 elements takes 500 000 steps on average
Dictionaries – Interfaces and Implementations
Hash Tables in C#
The Dictionary<TKey,TValue> Class

Dictionary<TKey,TValue>
Implements the ADT dictionary as a hash table
  Size is dynamically increased as needed
  Contains a collection of key-value pairs
  Collisions are resolved by chaining
  Elements have almost random order
    The order depends on the hash codes of the keys and is not guaranteed
Dictionary<TKey,TValue> relies on
  Object.Equals() – for comparing the keys
  Object.GetHashCode() – for calculating the hash codes of the keys
Dictionary<TKey,TValue> (2)
Major operations:
  Add(TKey,TValue) – adds an element with the specified key and value
  Remove(TKey) – removes the element by key
  this[] – get/add/replace of element by key
  Clear() – removes all elements
  Count – returns the number of elements
  Keys – returns a collection of the keys
  Values – returns a collection of the values
Dictionary<TKey,TValue> (3)
Major operations:
  ContainsKey(TKey) – checks whether the dictionary contains a given key
  ContainsValue(TValue) – checks whether the dictionary contains a given value
    Warning: slow operation! (it has to scan all elements)
  TryGetValue(TKey, out TValue)
    If the key is found, returns true and stores its value in the out TValue parameter
    Otherwise returns false
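TryGetValue is useful when a missing key is an expected case, because the indexer throws KeyNotFoundException for a key that is not present. A small sketch follows; the helper name MarkOrDefault and the fallback value -1 are assumptions chosen for the example.

```csharp
using System;
using System.Collections.Generic;

public static class TryGetValueDemo
{
    // Returns the mark of the student, or -1 if the student is not in the dictionary.
    public static int MarkOrDefault(Dictionary<string, int> marks, string name)
    {
        int mark;
        return marks.TryGetValue(name, out mark) ? mark : -1;
    }

    public static void Main()
    {
        var studentsMarks = new Dictionary<string, int>();
        studentsMarks.Add("Ivan", 4);
        studentsMarks.Add("Peter", 6);

        Console.WriteLine(MarkOrDefault(studentsMarks, "Peter"));  // 6
        Console.WriteLine(MarkOrDefault(studentsMarks, "Nobody")); // -1
    }
}
```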
Dictionary<TKey,TValue> – Example

    Dictionary<string, int> studentsMarks =
        new Dictionary<string, int>();
    studentsMarks.Add("Ivan", 4);
    studentsMarks.Add("Peter", 6);
    studentsMarks.Add("Maria", 6);
    studentsMarks.Add("George", 5);

    int peterMark = studentsMarks["Peter"];
    Console.WriteLine("Peter's mark: {0}", peterMark);
    Console.WriteLine("Is Peter in the hash table: {0}",
        studentsMarks.ContainsKey("Peter"));

    Console.WriteLine("Students and grades:");
    foreach (var pair in studentsMarks)
    {
        Console.WriteLine("{0} --> {1}", pair.Key, pair.Value);
    }
Dictionary<TKey,TValue>
Live Demo
Counting the Words in a Text

    string text = "a text, some text, just some text";
    IDictionary<string, int> wordsCount =
        new Dictionary<string, int>();

    // RemoveEmptyEntries skips the empty strings produced by ", "
    string[] words = text.Split(new char[] { ' ', ',', '.' },
        StringSplitOptions.RemoveEmptyEntries);
    foreach (string word in words)
    {
        int count = 1;
        if (wordsCount.ContainsKey(word))
            count = wordsCount[word] + 1;
        wordsCount[word] = count;
    }

    foreach (var pair in wordsCount)
    {
        Console.WriteLine("{0} -> {1}", pair.Key, pair.Value);
    }
Balanced Tree Dictionaries
The SortedDictionary<TKey,TValue> Class

SortedDictionary<TKey,TValue>
SortedDictionary<TKey,TValue> implements the ADT "dictionary" as a self-balancing search tree
  Elements are arranged in the tree ordered by key
  Traversing the tree returns the elements in increasing order
  Add / Find / Delete take about log2(n) steps
Use SortedDictionary<TKey,TValue> when you need the elements sorted
  Otherwise use Dictionary<TKey,TValue> – it has better performance
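The ordering guarantee can be seen by inserting keys out of order and then traversing the dictionary. The sketch below uses int keys to keep the expected order unambiguous; SortedOrderDemo and KeysInOrder are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class SortedOrderDemo
{
    // Joins the keys in the order in which the dictionary enumerates them.
    public static string KeysInOrder(IDictionary<int, string> dict)
    {
        return string.Join(" ", dict.Keys);
    }

    public static void Main()
    {
        var sorted = new SortedDictionary<int, string>();
        sorted[3] = "three";
        sorted[1] = "one";
        sorted[2] = "two";

        // The tree is traversed in increasing key order, regardless of insertion order.
        Console.WriteLine(KeysInOrder(sorted)); // 1 2 3
    }
}
```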
Counting Words (Again)

    string text = "a text, some text, just some text";
    IDictionary<string, int> wordsCount =
        new SortedDictionary<string, int>();

    // RemoveEmptyEntries skips the empty strings produced by ", "
    string[] words = text.Split(new char[] { ' ', ',', '.' },
        StringSplitOptions.RemoveEmptyEntries);
    foreach (string word in words)
    {
        int count = 1;
        if (wordsCount.ContainsKey(word))
            count = wordsCount[word] + 1;
        wordsCount[word] = count;
    }

    foreach (var pair in wordsCount)
    {
        Console.WriteLine("{0} -> {1}", pair.Key, pair.Value);
    }
Comparing Dictionary Keys
Using Custom Key Classes in Dictionary<TKey,TValue> and SortedDictionary<TKey,TValue>

IComparable<T>
Dictionary<TKey,TValue> relies on
  Object.Equals() – for comparing the keys
  Object.GetHashCode() – for calculating the hash codes of the keys
SortedDictionary<TKey,TValue> relies on IComparable<T> for ordering the keys
Built-in types like int, long, float, string and DateTime already implement Equals(), GetHashCode() and IComparable<T>
  Other types, when used as dictionary keys, should provide custom implementations
Implementing Equals() and GetHashCode()

    public struct Point
    {
        public int X { get; set; }
        public int Y { get; set; }

        public override bool Equals(Object obj)
        {
            // "is" already yields false for null, so no separate null check is needed
            if (!(obj is Point)) return false;
            Point p = (Point)obj;
            return (X == p.X) && (Y == p.Y);
        }

        public override int GetHashCode()
        {
            return (X << 16 | X >> 16) ^ Y;
        }
    }
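With Equals() and GetHashCode() in place, two separately created points with the same coordinates behave as the same dictionary key. The sketch below repeats the Point struct from the slide so that it compiles on its own; PointKeyDemo and the stored names are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;

public struct Point
{
    public int X { get; set; }
    public int Y { get; set; }

    public override bool Equals(object obj)
    {
        if (!(obj is Point)) return false;
        Point p = (Point)obj;
        return X == p.X && Y == p.Y;
    }

    public override int GetHashCode()
    {
        return (X << 16 | X >> 16) ^ Y;
    }
}

public static class PointKeyDemo
{
    public static void Main()
    {
        var names = new Dictionary<Point, string>();
        names[new Point { X = 1, Y = 2 }] = "A";

        // A different instance with equal coordinates finds the same entry.
        Console.WriteLine(names.ContainsKey(new Point { X = 1, Y = 2 })); // True
        Console.WriteLine(names.ContainsKey(new Point { X = 2, Y = 1 })); // False
    }
}
```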
Implementing IComparable<T>

    public struct Point : IComparable<Point>
    {
        public int X { get; set; }
        public int Y { get; set; }

        public int CompareTo(Point otherPoint)
        {
            if (X != otherPoint.X)
            {
                return this.X.CompareTo(otherPoint.X);
            }
            else
            {
                return this.Y.CompareTo(otherPoint.Y);
            }
        }
    }
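Once a key type implements IComparable<T>, the tree-based collections can order it. The sketch below repeats the Point struct from the slide and uses it in a SortedSet<Point>; the same comparison would be used for SortedDictionary<Point,TValue> keys. The names SortedPointsDemo and Describe are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

public struct Point : IComparable<Point>
{
    public int X { get; set; }
    public int Y { get; set; }

    // Orders points by X first, then by Y.
    public int CompareTo(Point otherPoint)
    {
        if (X != otherPoint.X)
        {
            return X.CompareTo(otherPoint.X);
        }
        return Y.CompareTo(otherPoint.Y);
    }
}

public static class SortedPointsDemo
{
    // Lists the points in the order in which the collection enumerates them.
    public static string Describe(IEnumerable<Point> points)
    {
        var parts = new List<string>();
        foreach (Point p in points)
        {
            parts.Add("(" + p.X + ", " + p.Y + ")");
        }
        return string.Join(" ", parts);
    }

    public static void Main()
    {
        var points = new SortedSet<Point>();
        points.Add(new Point { X = 2, Y = 1 });
        points.Add(new Point { X = 1, Y = 5 });
        points.Add(new Point { X = 1, Y = 2 });

        Console.WriteLine(Describe(points)); // (1, 2) (1, 5) (2, 1)
    }
}
```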
Sets
Sets of Elements

Set and Bag ADTs
The abstract data type (ADT) "set" keeps a set of elements with no duplicates
Sets with duplicates are also known as ADT "bag"
Set operations:
  Add(element)
  Contains(element) → true / false
  Delete(element)
  Union(set) / Intersect(set)
Sets can be implemented in several ways
  List, array, hash table, balanced tree, ...
Sets – Interfaces and Implementations
HashSet<T>
HashSet<T> implements ADT set by hash table
  Elements are in no particular order
All major operations are fast:
  Add(element) – adds an element to the set
    Does nothing if the element already exists
  Remove(element) – removes given element
  Count – returns the number of elements
  UnionWith(set) / IntersectWith(set) – performs union / intersection with another set
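The behavior of Add on a duplicate and of IntersectWith can be checked with a few lines. This is a small sketch; HashSetOpsDemo and CommonCount are hypothetical names, and the data reuses the language names from the slides.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetOpsDemo
{
    // Returns how many elements the two sequences have in common.
    public static int CommonCount(IEnumerable<string> first, IEnumerable<string> second)
    {
        var common = new HashSet<string>(first);
        common.IntersectWith(second); // keeps only elements that are also in "second"
        return common.Count;
    }

    public static void Main()
    {
        var languages = new HashSet<string> { "SQL", "Java", "C#", "PHP" };

        // Add returns false and leaves the set unchanged when the element already exists.
        Console.WriteLine(languages.Add("SQL")); // False
        Console.WriteLine(languages.Count);      // 4

        Console.WriteLine(CommonCount(languages,
            new[] { "Oracle", "SQL", "MySQL" })); // 1
    }
}
```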
HashSet<T> – Example

    ISet<string> firstSet = new HashSet<string>(
        new string[] { "SQL", "Java", "C#", "PHP" });
    ISet<string> secondSet = new HashSet<string>(
        new string[] { "Oracle", "SQL", "MySQL" });

    ISet<string> union = new HashSet<string>(firstSet);
    union.UnionWith(secondSet);
    PrintSet(union); // e.g. SQL Java C# PHP Oracle MySQL

    private static void PrintSet<T>(ISet<T> set)
    {
        foreach (var element in set)
        {
            Console.Write("{0} ", element);
        }
        Console.WriteLine();
    }
SortedSet<T>
SortedSet<T> implements ADT set by balanced search tree
  Elements are sorted in increasing order
Example:

    ISet<string> firstSet = new SortedSet<string>(
        new string[] { "SQL", "Java", "C#", "PHP" });
    ISet<string> secondSet = new SortedSet<string>(
        new string[] { "Oracle", "SQL", "MySQL" });
    ISet<string> union = new SortedSet<string>(firstSet);
    union.UnionWith(secondSet);
    PrintSet(union); // C# Java MySQL Oracle PHP SQL
HashSet<T> and SortedSet<T>
Live Demo
Summary
Dictionaries map keys to values
  Can be implemented as a hash table or a balanced search tree
Hash tables map keys to values
  Rely on hash functions to distribute the keys in the table
  Collisions need a resolution algorithm (e.g. chaining)
  Very fast add / find / delete
Sets hold a group of elements
  Hash-table or balanced-tree implementations
Hash Tables and Sets
Questions?
http://academy.telerik.com
Exercises
1. Write a program that counts in a given array of integers the number of occurrences of each integer. Use Dictionary<TKey,TValue>.
   Example: array = {3, 4, 4, 2, 3, 3, 4, 3, 2}
     2 → 2 times
     3 → 4 times
     4 → 3 times
2. Write a program that extracts from a given sequence of strings all elements that are present in it an odd number of times. Example:
     {C#, SQL, PHP, PHP, SQL, SQL} → {C#, SQL}
Exercises (2)
3. Write a program that counts how many times each word from a given text file words.txt appears in it. Character casing differences should be ignored. The resulting words should be ordered by their number of occurrences in the text. Example:
     This is the TEXT. Text, text, text – THIS TEXT! Is this the text?
     is → 2
     the → 2
     this → 3
     text → 6
Exercises (3)
4. Implement the data structure "hash table" in a class HashTable<K,T>. Keep the data in an array of lists of key-value pairs (LinkedList<KeyValuePair<K,T>>[]) with an initial capacity of 16. When the hash table load runs over 75%, resize to 2 times larger capacity. Implement the following methods and properties: Add(key, value), Find(key) → value, Remove(key), Count, Clear(), this[], Keys. Try to make the hash table support iterating over its elements with foreach.
5. Implement the data structure "set" in a class HashedSet<T> using your class HashTable<T,T> to hold the elements. Implement all standard set operations like Add(T), Find(T), Remove(T), Count, Clear(), union and intersect.