A Knowledge Sharing Session on
Unit IV: Tables (DSPS)
1
Syllabus: Symbol Tables: Static and dynamic tree tables,
AVL trees, AVL Tree Implementation, Algorithms
and analysis of AVL Tree
Hash Tables: Basic Concepts, Hash Function,
Hashing methods, Collision resolution, Bucket hashing,
Dynamic Hashing.
Tables |Unit IV of DSPS (SE-Comp)
2
Part I : Symbol Tables
3
Symbol Table Examples
AVL Tree
AVL Implementation
AVL Algorithm Analysis
Symbol Table | Why Symbol Table
What Does a Compiler Do?
• Lexical analysis: detects inputs with illegal tokens, e.g. main$ ();
• Parsing: detects inputs with ill-formed parse trees, e.g. missing semicolons
• Semantic analysis: the last “front end” phase; catches all remaining errors
Symbol Table
4
Symbol Table | Why Symbol Table
Typical Semantic Errors
• multiple declarations: a variable should be declared (in the same region) at most once.
• undeclared variable: a variable should not be used before being declared.
• type mismatch: type of the left-hand side of an assignment should match the type of the right-hand side.
• wrong arguments: methods should be called with the right number and types of arguments.
5
Symbol Table | Aim of Symbol Table
Purpose of Symbol Table
– keep track of names declared in the program
– names of variables, classes, fields, methods, etc.
6
Symbol Table | Symbol Table Stores
What it Contains
associates a name with a set of attributes, e.g.:
• kind of name (variable, class, field, method, etc)
• type (int, float, etc)
• nesting level
• memory location (i.e., where will it be found at runtime).
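As a rough illustration of the attributes listed above, one entry could be laid out as a C struct. This is only a sketch: the type names, field names and the helper redeclares are hypothetical and do not come from the slides.

```c
#include <assert.h>
#include <string.h>

/* Kinds of names an entry can describe (from the list above). */
enum sym_kind { SYM_VARIABLE, SYM_CLASS, SYM_FIELD, SYM_METHOD };

/* Basic types; a real compiler would need a richer representation. */
enum sym_type { TYPE_INT, TYPE_FLOAT };

/* One symbol table entry: a name plus its attributes.
   All field names are hypothetical. */
struct symbol {
    const char   *name;          /* the identifier's characters */
    enum sym_kind kind;          /* variable, class, field, method, ... */
    enum sym_type type;          /* int, float, ... */
    int           nesting_level; /* 0 = outermost region */
    int           offset;        /* memory location, e.g. offset in a frame */
};

/* Returns 1 if a and b declare the same name in the same region,
   i.e. the "multiple declarations" semantic error. */
int redeclares(const struct symbol *a, const struct symbol *b)
{
    assert(a != NULL && b != NULL);
    return a->nesting_level == b->nesting_level
        && strcmp(a->name, b->name) == 0;
}
```

With this layout, a global a at level 0 and a local a at level 1 do not clash, while two declarations of a at level 1 do.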
7
Symbol Table | Symbol Table Revisit
In Short,
• During lexical analysis: finds symbols and adds them to the symbol table
• During syntactic analysis: information about each symbol is filled in
• During semantic analysis: used for type checking
8
Symbol Table | Symbol Table Important?
Info Provided by Symbol Table,
• Given an identifier, which name is it?
• What information is to be associated with a name? (Actual characters of the name, type, storage allocation info (number of bytes), line number where declared, lines where referenced, scope.)
• How do we access this information?
• How do we associate this information with a name?
9
Symbol Table | Reminder on Symbol Table
Note,
• A name can represent: variable, type, constant, parameter, record, record field, procedure, array, label, file
10
Symbol Table
Operations on Symbol Table
determining whether a string has already been stored
inserting an entry for a string
deleting a string when it goes out of scope
This requires three functions:
1. lookup(s): returns the index of the entry for string s, or 0 if there is no entry
2. insert(s): adds a new entry for string s and returns its index
3. delete(s): deletes s from the table (or, typically, hides it)
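The three functions can be sketched over a plain unordered array. This is a minimal sketch, not the implementation used in class: the names sym_lookup, sym_insert and sym_delete, the capacity MAX_SYMS and the 31-character name limit are all assumptions. Slot 0 is left unused so that 0 can mean "no entry", and deletion hides the entry instead of removing it.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMS 100             /* capacity, chosen arbitrarily */

struct entry {
    char name[32];               /* the stored string */
    int  hidden;                 /* 1 once the name has gone out of scope */
};

/* Slot 0 is unused so that index 0 can mean "no entry". */
static struct entry table[MAX_SYMS + 1];
static int count = 0;            /* number of slots in use */

/* Return the index of the most recent visible entry for s, or 0. */
int sym_lookup(const char *s)
{
    assert(s != NULL);
    for (int i = count; i >= 1; i--)
        if (!table[i].hidden && strcmp(table[i].name, s) == 0)
            return i;
    return 0;
}

/* Add a new entry for s and return its index (0 if the table is full). */
int sym_insert(const char *s)
{
    assert(s != NULL);
    if (count == MAX_SYMS)
        return 0;
    count++;
    strncpy(table[count].name, s, sizeof table[count].name - 1);
    table[count].name[sizeof table[count].name - 1] = '\0';
    table[count].hidden = 0;
    return count;
}

/* "Delete" s by hiding its most recent visible entry. */
void sym_delete(const char *s)
{
    int i = sym_lookup(s);
    if (i != 0)
        table[i].hidden = 1;
}
```

Searching from the newest entry backwards means an inner declaration shadows an outer one with the same name until it is hidden again.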
11
Symbol Table |Operations on Symbol Table
Symbol Table
Example
01 PROGRAM Main
02 GLOBAL a,b
03 PROCEDURE P (PARAMETER x)
04 LOCAL a
05 BEGIN {P}
06 …a…
07 …b…
08 …x…
09 END {P}
10 BEGIN{Main}
11 Call P(a)
12 END {Main}
12
Symbol Table | Symbol Table Examples
Symbol Table Unsorted List
Name   Characteristic Class   Scope   Declared   Referenced   Other
Main   Program                0       Line 1
a      Variable               0       Line 2     Line 11
b      Variable               0       Line 2     Line 7
P      Procedure              0       Line 3     Line 11      1, parameter, x
x      Parameter              1       Line 3     Line 8
a      Variable               1       Line 4     Line 6
(Declared, Referenced and Other are the "Other Attributes" columns.)
Look up Complexity: O(n)
13
Symbol Table Sorted List
Look up Complexity: O(log n)
Name   Characteristic Class   Scope   Declared   Referenced   Other
a      Variable               0       Line 2     Line 11
a      Variable               1       Line 4     Line 6
b      Variable               0       Line 2     Line 7
Main   Program                0       Line 1
P      Procedure              0       Line 3     Line 11      1, parameter, x
x      Parameter              1       Line 3     Line 8
(Declared, Referenced and Other are the "Other Attributes" columns.)
Worst Case: O(n)
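The O(log n) look up on a sorted list is a binary search over the names. A minimal sketch follows; the function name find_name is hypothetical, and the array is assumed to be sorted with strcmp, which is case-sensitive, so "Main" and "P" come before "a", unlike the case-insensitive order shown in the table.

```c
#include <assert.h>
#include <string.h>

/* Binary search over an array of names sorted with strcmp.
   Returns the index of key, or -1 if it is absent. */
int find_name(const char *const names[], int n, const char *key)
{
    assert(names != NULL && key != NULL);
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;     /* avoids overflow of lo + hi */
        int c = strcmp(key, names[mid]);
        if (c == 0)
            return mid;
        if (c < 0)
            hi = mid - 1;                 /* key sorts before names[mid] */
        else
            lo = mid + 1;                 /* key sorts after names[mid] */
    }
    return -1;
}
```

Each step halves the remaining range, which is where the logarithmic look up cost comes from; inserting a new name still needs the later entries to be shifted.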
14
Two issues:
1. Interface: how to use symbol tables
2. Implementation: how to implement it.
15
Basic Implementation Techniques
Considerations:
Number of names
Storage space
Retrieval time
16
<1> unordered list (linked list/array)
<2> ordered list
    » binary search on arrays
    » expensive insertion
    (+) good for a fixed set of names (e.g. reserved words, assembly opcodes)
<3> binary search tree
    » On average, searching takes O(log(n)) time.
    » However, names in programs are not chosen randomly.
<4> AVL
<5> Hash table: most common
    (+) constant time
17
Static Tree Table
If symbols are known in advance:
• No insertion and deletion allowed
• Cost of searching symbols of higher frequency should be small.
• Huffman tree and OBST
[Fig: Optimal Search Tree when frequencies of symbols are specified (nodes: if, do, Read, while)]
[Fig: Huffman Tree (leaves a, b, c, d, e; edges labelled 0 and 1)]
18
Dynamic Tree Tables
• Symbols are inserted as and when they come
• Deletion is also possible
• AVL
[Fig: example BST with the keys 50, 32, 60, 20, 45, 55, 68]
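AVL insertion can be sketched as an ordinary BST insert followed by rebalancing on the way back up. This is a minimal sketch of the standard technique, not code from the slides: the names struct avl and avl_insert are assumptions, duplicates are simply ignored, and deletion is not shown.

```c
#include <assert.h>
#include <stdlib.h>

struct avl {
    int key;
    int height;                  /* height of this subtree; a leaf has 1 */
    struct avl *left, *right;
};

static int height(const struct avl *n) { return n ? n->height : 0; }

static void update(struct avl *n)
{
    int hl = height(n->left), hr = height(n->right);
    n->height = 1 + (hl > hr ? hl : hr);
}

/* Right rotation: the left child becomes the new subtree root. */
static struct avl *rotate_right(struct avl *y)
{
    struct avl *x = y->left;
    y->left = x->right;
    x->right = y;
    update(y);
    update(x);
    return x;
}

/* Left rotation: the right child becomes the new subtree root. */
static struct avl *rotate_left(struct avl *x)
{
    struct avl *y = x->right;
    x->right = y->left;
    y->left = x;
    update(x);
    update(y);
    return y;
}

/* Insert key and rebalance on the way back up; duplicates are ignored. */
struct avl *avl_insert(struct avl *n, int key)
{
    if (n == NULL) {
        n = malloc(sizeof *n);
        assert(n != NULL);
        n->key = key;
        n->height = 1;
        n->left = n->right = NULL;
        return n;
    }
    if (key < n->key)
        n->left = avl_insert(n->left, key);
    else if (key > n->key)
        n->right = avl_insert(n->right, key);
    else
        return n;

    update(n);
    int balance = height(n->left) - height(n->right);
    if (balance > 1) {                       /* left-heavy */
        if (key > n->left->key)              /* left-right case */
            n->left = rotate_left(n->left);
        return rotate_right(n);
    }
    if (balance < -1) {                      /* right-heavy */
        if (key < n->right->key)             /* right-left case */
            n->right = rotate_right(n->right);
        return rotate_left(n);
    }
    return n;
}
```

Inserting the keys 20, 32, 45, 50, 55, 60, 68 in ascending order would degenerate a plain BST into a list; with the rotations above the tree ends with 50 at the root, 32 and 60 as its children, and height 3.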
19
Part II: Hash Tables
20
Where Will Hashing Be Used?
1. docDict
2. Database
3. Compilers
4. Network Routers and Servers
5. Substring Search
6. Cryptography
Hash Table| Motivation
21
Motivation
Hashing Methods
Collision Resolution
Symbol Table | Why Hash Table
A Problem?
• We have to store some records and perform the following:
  – add new record
  – delete record
  – search a record by key
Find a way to do these efficiently!
Hashing
22
Use an array to store the records, in unsorted order
1. add - add the record as the last entry: fast, O(1)
2. delete a target - slow at finding the target, fast at filling the hole (just take the last entry): O(n)
3. search - sequential search: slow, O(n)
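The three operations on an unsorted array can be sketched as follows. The record layout, the capacity and the function names rec_add, rec_search and rec_delete are assumptions made for illustration only.

```c
#include <assert.h>

#define CAPACITY 1000            /* assumed capacity */

struct record { int key; double score; };

static struct record recs[CAPACITY];
static int nrecs = 0;            /* number of records stored */

/* add: append as the last entry, O(1). Returns 0 if the array is full. */
int rec_add(int key, double score)
{
    if (nrecs == CAPACITY)
        return 0;
    recs[nrecs].key = key;
    recs[nrecs].score = score;
    nrecs++;
    return 1;
}

/* search: sequential scan, O(n). Returns the index or -1. */
int rec_search(int key)
{
    for (int i = 0; i < nrecs; i++)
        if (recs[i].key == key)
            return i;
    return -1;
}

/* delete: O(n) to find the target, O(1) to fill the hole
   by moving the last entry into it. Returns 0 if key is absent. */
int rec_delete(int key)
{
    int i = rec_search(key);
    if (i < 0)
        return 0;
    assert(i < nrecs);
    recs[i] = recs[nrecs - 1];
    nrecs--;
    return 1;
}
```

Moving the last entry into the hole is only valid because the array is unsorted; the same trick would break the ordering of a sorted array.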
Hash Table| Unsorted Array
23
Use an array to store the records, keeping them in sorted order
1. add - insert the record in its proper position; much record movement: slow, O(n)
2. delete a target - how to handle the hole after deletion? Much record movement: slow, O(n)
3. search - binary search: fast, O(log n)
Hash Table| Sorted Array
24
Store the records in a linked list (unsorted)
1. add - fast if one can insert the node anywhere: O(1)
2. delete a target - fast at disposing of the node, but slow at finding the target: O(n)
3. search - sequential search: slow, O(n) (if we only use a linked list, we cannot use binary search even if the list is sorted.)
Hash Table| Linked List
25
What is the Solution then?
The following approaches have better performance but are more complex:
1. Hash table
2. Tree (BST, Heap, …)
Hash Table| More Approaches
26
Array as table?
Hash Table| More Approaches
27
studid    name     score
9903030   tushar   73
9802020   manali   100
9801010   peter    20
0056789   david    56.8
0012345   sandy    81.5
0033333   bubli    90
...       ...      ...
9908080   Namrata  49
Hash Table| Array as table?
28
index     name    score
0         :       :
:         :       :
12345     andy    81.5
:         :       :
33333     betty   90
:         :       :
56789     david   56.8
:         :       :
9908080   bill    49
:         :       :
9999999   :       :
One ‘stupid’ way is to store the records in a huge array (index 0..9999999). The student id is used as the index, i.e. the record of the student with studid 0012345 is stored at A[12345].
Hash Table| What's Wrong Then?
29
Consider this problem. We want to store 1,000 student records and search them by student id. Storing them in the huge array (index 0..9999999) described above has two problems:
1. Keys may not be nonnegative integers.
2. Gigantic Memory hog
Hash Table| What's Wrong Then?
30
1. Keys may not be nonnegative integers.
   Solution: Prehash
2. Gigantic memory hog
   Solution: Direct Hash Table (reduce universe of all keys to reasonable size)
Hash Table| What's Wrong Then?
31
• Each slot, or position, corresponds to a key in U.
• If there’s an element x with key k, then T [k] contains a pointer to x.
• Otherwise, T [k] is empty, represented by NIL.
Hash Table| Direct Hashing Table
32
Store the records in a huge array where the index corresponds to the key
• add - very fast, O(1)
• delete - very fast, O(1)
• search - very fast, O(1)
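A direct table of this kind can be sketched in a few lines. The sketch assumes keys in the range 0..UNIVERSE-1, with UNIVERSE kept deliberately small here; the struct layout and the function names da_insert, da_delete and da_search are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define UNIVERSE 100000          /* assumed: keys are 0..UNIVERSE-1 */

struct student { int studid; const char *name; double score; };

/* T[k] points to the element with key k, or is NULL (NIL) if none. */
static struct student *T[UNIVERSE];

/* add: store the pointer at the slot named by the key, O(1). */
void da_insert(struct student *x)
{
    assert(x != NULL && x->studid >= 0 && x->studid < UNIVERSE);
    T[x->studid] = x;
}

/* delete: reset the slot to NIL, O(1). */
void da_delete(int k)
{
    assert(k >= 0 && k < UNIVERSE);
    T[k] = NULL;
}

/* search: read the slot, O(1). Returns NULL if no element has key k. */
struct student *da_search(int k)
{
    assert(k >= 0 && k < UNIVERSE);
    return T[k];
}
```

Every operation is a single array access, but the array needs one slot per possible key, however few records are actually stored.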
Hash Table| Direct Hashing Table
33
Hash Table| Hash function
34
function Hash(key: KeyType): integer;
Imagine that we have such a magic function Hash. It maps the keys (studid) of the 1000 records into the integers 0..999, one to one. No two different keys map to the same number.
H(‘0012345’) = 134
H(‘0033333’) = 67
H(‘0056789’) = 764
…
H(‘9908080’) = 3
Hash Table| Hash Table
35
index   studid    name    score
0       :         :       :
3       9908080   bill    49
:       :         :       :
67      0033333   betty   90
:       :         :       :
134     0012345   andy    81.5
:       :         :       :
764     0056789   david   56.8
:       :         :       :
999     :         :       :
To store a record, we compute Hash(stud_id) for the record and store it at the location Hash(stud_id) of the array. To search for a student, we only need to peek at the location Hash(target stud_id).
Ex: key mod size: 2201 mod 1000 = 201
Hash Table| Division Method
36
h(k) = k mod m
Collision: different keys map to the same index, i.e. h(k1) = h(k2) = i (k1 != k2)
Ex: 5 mod 11 and 27 mod 11 both give index 5.
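The division method and the collision condition translate directly into C. A small sketch, assuming non-negative integer keys; the names hash_div and collides are hypothetical.

```c
#include <assert.h>

/* Division method: h(k) = k mod m, for non-negative k and m > 0. */
unsigned hash_div(unsigned k, unsigned m)
{
    assert(m > 0);
    return k % m;
}

/* Two distinct keys collide when they map to the same index. */
int collides(unsigned k1, unsigned k2, unsigned m)
{
    return k1 != k2 && hash_div(k1, m) == hash_div(k2, m);
}
```

For example, hash_div(5, 11) and hash_div(27, 11) are both 5, so collides(5, 27, 11) is true.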
Hash Table| Collision
37
Hashing
• Widely useful technique for implementing dictionaries
• Constant time per operation (on the average)
• Best Case O(1)
• Worst Case O(n)
[Fig: Key → f() => address → Record, with table slots 0 to 5]
38
Characteristics of a Hash Function
• Quick computation
• It should spread keys evenly: uniform distribution
• Avoid collisions
• Collision-free cases are very rare, e.g. the birthday paradox
39
Hash Functions
• Direct hashing
• Digit extraction
• Modulo-division method
• Mid-square method
• Folding method
40
1. Hashing with Separate Chaining (Open hashing)-unlimited space
2. Hashing with Open Addressing(closed hashing)
Hash Table|-Collision Resolution DS
41
Hash Table|-Collision Resolution Strategies
42
• Separate chaining
• Open addressing
  – Linear probing (LP)
    – LP with chaining (LPWC)
      – LPWC without replacement
      – LPWC with replacement
    – LP without chaining
  – Quadratic probing
  – Double hashing
Hash Table| Chained Hash Table
43
[Fig: array of pointers indexed up to HASHMAX; occupied entries point to linked lists of records, e.g. (Key: 9903030, name: tom, score: 73); the other entries are nil]
One way to handle collision is to store the collided records in a linked list. The array now stores pointers to such lists. If no key maps to a certain hash value, that array entry points to nil.
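A chained table of this kind can be sketched as below. It is a minimal sketch under assumptions: HASHMAX is taken to be the number of chains and set to 11, the hash is key mod HASHMAX, keys are non-negative, and the names chain_insert and chain_search are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define HASHMAX 11               /* number of chains; value assumed */

struct node {
    int key;                     /* e.g. student id */
    double score;
    struct node *next;           /* next record with the same hash value */
};

static struct node *chains[HASHMAX];   /* all NULL (nil) initially */

static unsigned chain_hash(int key) { return (unsigned)key % HASHMAX; }

/* Insert at the head of the chain for the key's hash value: O(1). */
void chain_insert(int key, double score)
{
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->key = key;
    n->score = score;
    n->next = chains[chain_hash(key)];
    chains[chain_hash(key)] = n;
}

/* Walk the chain for the key's hash value; return the node or NULL. */
struct node *chain_search(int key)
{
    for (struct node *n = chains[chain_hash(key)]; n != NULL; n = n->next)
        if (n->key == key)
            return n;
    return NULL;
}
```

Keys 5 and 27 both hash to 5 with this table size, so they end up in the same list and a search for either one walks that list.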
Rehashing is required:
• When the table is completely full
• With quadratic probing, when the table is half full
• When an insertion fails due to overflow
After rehashing:
• The table size is doubled
• The mod value is changed to the new size
* Very costly: a new table must be created and every entry of the old table reinserted using the new hash function.
Hash Table| Rehashing
44
It’s more efficient when the load factor is >= 70%, where the load factor l = h/t, h is the total number of mapped (occupied) locations and t is the total number of locations.
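The load factor and the rehashing step can be sketched on a linear-probing table. This is only a sketch under assumptions: keys are non-negative, the table doubles when l >= 0.7 before an insertion, and all names (struct table, table_insert, rehash and so on) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct table {
    int  *slot;      /* stored keys */
    char *used;      /* 1 if the slot is occupied */
    int   size;      /* t: total locations */
    int   count;     /* h: occupied locations */
};

void table_init(struct table *t, int size)
{
    t->slot = calloc((size_t)size, sizeof *t->slot);
    t->used = calloc((size_t)size, sizeof *t->used);
    assert(t->slot != NULL && t->used != NULL);
    t->size = size;
    t->count = 0;
}

/* l = h / t */
double load_factor(const struct table *t)
{
    return (double)t->count / t->size;
}

/* Linear-probing insert with no resize check; the caller ensures room. */
static void raw_insert(struct table *t, int key)
{
    int j = key % t->size;
    while (t->used[j])
        j = (j + 1) % t->size;
    t->slot[j] = key;
    t->used[j] = 1;
    t->count++;
}

/* Double the table and reinsert every key with the new mod value. */
static void rehash(struct table *t)
{
    struct table old = *t;
    table_init(t, old.size * 2);
    for (int i = 0; i < old.size; i++)
        if (old.used[i])
            raw_insert(t, old.slot[i]);
    free(old.slot);
    free(old.used);
}

void table_insert(struct table *t, int key)
{
    if (load_factor(t) >= 0.7)
        rehash(t);
    raw_insert(t, key);
}

/* Returns the slot holding key, or -1 if it is not in the table. */
int table_find(const struct table *t, int key)
{
    int j = key % t->size;
    for (int i = 0; i < t->size && t->used[j]; i++) {
        if (t->slot[j] == key)
            return j;
        j = (j + 1) % t->size;
    }
    return -1;
}
```

Every key has to be reinserted because key mod size changes when the size changes, which is what makes rehashing costly.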
Hash Table| Rehashing
45
Types of Linear Probing (with chaining, with and without replacement)
Note: Try to solve all the examples taken in class on transparencies and on the board; you can also take them from the book.
46
Extendible Hashing
• All techniques so far are used for small data
• When data becomes bulky there will be too many disk accesses
• So in that case use extendible hashing
• This uses binary (disk) coding to map the locations to binary values.
  – a hash table of size 4 with 4 slots:
  – 00
  – 01
  – 10
  – 11
47
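The mapping from a hash value to one of the binary-coded slots can be sketched as taking a fixed number of bits of the hash. This shows only the directory-index step, not bucket splitting or directory doubling. Using the low-order bits is an assumption (some presentations use the leading bits), and the name dir_index is hypothetical.

```c
#include <assert.h>

/* Directory index = the low `depth` bits of the hash value.
   With depth 2 the possible results are 00, 01, 10 and 11. */
unsigned dir_index(unsigned hash, unsigned depth)
{
    assert(depth < 32);
    return hash & ((1u << depth) - 1u);
}
```

With depth 2, the hash value 13 (binary 1101) maps to slot 01 and the hash value 6 (binary 110) maps to slot 10.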
Implementation:
• Following are some examples of how to create the structure and apply a hash function to it:
1. Linear probing with store and search
2. Double hashing
3. Quadratic probing
48
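Linear Probe
#define MAX 10   /* table size (assumed; not given on the slide) */

/* Despite its name, this routine stores key: it probes linearly from
   key % MAX and returns the slot used, or -1 if the table is full.
   T[j] is 1 when slot j is occupied and 0 when it is empty. */
int search_LP(int hashtable[], int key, int T[])
{
    int i, j;
    j = key % MAX;                /* mapped location */
    for (i = 0; i < MAX; i++) {
        if (T[j] == 0) {          /* found an empty slot */
            hashtable[j] = key;
            T[j] = 1;
            return j;
        }
        j = (j + 1) % MAX;        /* next location, in a circular way */
    }
    return -1;                    /* table is full */
}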
49
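Search in LP
Only the condition checked inside the loop changes:
for (i = 0; i < MAX; i++) {
    if (T[j] == 1 && hashtable[j] == key)
        return j;                 /* key found at slot j */
    j = (j + 1) % MAX;            /* next location, in a circular way */
}
return -1;                        /* key is not in the table */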
50
Double hashing
51
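/* Despite its name, this routine stores key using double hashing.
   f1 and f2 are the two hash functions (not shown on the slide);
   f2(key) must be non-zero. Returns the slot used, or -1. */
int search_DH(int hashtable[], int key, int T[])
{
    int i, j, start, u;
    start = f1(key) % MAX;        /* first mapped location */
    u = f2(key);                  /* u is used as the increment */
    for (i = 0; i < MAX; i++) {
        j = (start + i * u) % MAX;
        if (T[j] == 0) {          /* found an empty slot */
            hashtable[j] = key;
            T[j] = 1;
            return j;
        }
    }
    return -1;                    /* no empty slot found */
}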
Quadratic probing
52
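/* Despite its name, this routine stores key using quadratic probing:
   the i-th probe is at (start + i*i) % MAX. Returns the slot used, or -1. */
int search_QP(int hashtable[], int key, int T[])
{
    int i, j, start;
    start = key % MAX;            /* first mapped location */
    for (i = 0; i < MAX; i++) {
        j = (start + i * i) % MAX;
        if (T[j] == 0) {          /* found an empty slot */
            hashtable[j] = key;
            T[j] = 1;
            return j;
        }
    }
    return -1;                    /* no empty slot found on this probe sequence */
}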