1 symbol tables symbol tables are used by compilers to keep track of information about –variables...

22
1 Symbol Tables Symbol tables are used by compilers to keep track of information about – variables – functions class names type names temporary variables – etc. Typical symbol table operations are Insert, Delete and Search It's a dictionary structure!

Upload: dominick-matthews

Post on 28-Dec-2015

221 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 Symbol Tables Symbol tables are used by compilers to keep track of information about –variables –functions –class names –type names –temporary variables

1

Symbol Tables

Symbol tables are used by compilers to keep track of information about– variables– functions– class names– type names– temporary variables– etc.

Typical symbol table operations are Insert, Delete and Search– It's a dictionary structure!

Page 2: 1 Symbol Tables Symbol tables are used by compilers to keep track of information about –variables –functions –class names –type names –temporary variables

2

Symbol Tables

What kind of information is usually stored in a symbol table?– type– storage class– size– scope– stack frame offset– register

We also need a way to keep track of reserved words.

Page 3: 1 Symbol Tables Symbol tables are used by compilers to keep track of information about –variables –functions –class names –type names –temporary variables

3

Symbol Tables

Where is a symbol table stored?– array/linked list

• simple, but linear lookup time• However, we may use a sorted array for reserved

words, since they are generally few and known in advance.

– balanced tree• O(lgn) lookup time

– hash table• most common implementation• O(1) amortized time for dictionary operations

Page 4: 1 Symbol Tables Symbol tables are used by compilers to keep track of information about –variables –functions –class names –type names –temporary variables

4

Hashing

Hash tables– use array of size m to store elements– given key k (the identifier name), use a function h to

compute index h(k) for that key– collisions are possible

• two keys hash into the same slot.

Hash functions– A good hash function

• is easy to compute• avoids collisions (by breaking up patterns in the keys and

uniformly distributing the hash values)

Page 5: 1 Symbol Tables Symbol tables are used by compilers to keep track of information about –variables –functions –class names –type names –temporary variables

5

Hashing

In the following slides:– k is a key– h(k) is the hash function– m is the size of the hash table– n is the number of keys in the hash table

Page 6: 1 Symbol Tables Symbol tables are used by compilers to keep track of information about –variables –functions –class names –type names –temporary variables

6

Hashing

What makes a good hash function?

– It is easy to compute

– It minimizes collisions.• hash = to chop into small pieces (Merriam-

Webster) = to chop any patterns in the

keys so that the results are uniformly distributed (cs311)

Page 7: 1 Symbol Tables Symbol tables are used by compilers to keep track of information about –variables –functions –class names –type names –temporary variables

7

Hashing

When the key is a string, we generally use the ASCII values of its characters in some way:

Examples for k = c1c2c3...cx

– h(k) = (c1128x-1+c2128x-2+...+cx1280) mod m

– h(k) = (c1+c2+...+cx) mod m

– h(k) = (h1(c1)+h2(c2)+...hx(cx)) mod m, where each hi is an independent hash function.

Page 8: 1 Symbol Tables Symbol tables are used by compilers to keep track of information about –variables –functions –class names –type names –temporary variables

8

Hash functions

Truncation Ignore part of the key and use the

remaining part directly as the index. Example: if the keys are 8-digit numbers

and the hash table has 1000 entries, then the first, fourth and eighth digit could make the hash function.

Not a very good method : does not distribute keys uniformly

Page 9: 1 Symbol Tables Symbol tables are used by compilers to keep track of information about –variables –functions –class names –type names –temporary variables

9

Hash functions

Folding Break up the key in parts and combine

them in some way. Example : if the keys are 9 digit numbers,

break up a key into three 3-digit numbers and add them up.

Page 10: 1 Symbol Tables Symbol tables are used by compilers to keep track of information about –variables –functions –class names –type names –temporary variables

10

Hash functions

Middle square Compute k*k and pick some digits from the

resulting number. Example : given a 9-digit key k, and a hash

table of size 1000 pick three digits from the middle of the number k*k.

Works fairly well in practice if the keys do not have many leading or trailing zeroes.

Page 11: 1 Symbol Tables Symbol tables are used by compilers to keep track of information about –variables –functions –class names –type names –temporary variables

11

Hash functions

Division h(k)=k mod m Fast Not all values of m are suitable for this. For

example powers of 2 should be avoided because then k mod m is just the least significant digits of k

Good values for m are prime numbers .

Page 12: 1 Symbol Tables Symbol tables are used by compilers to keep track of information about –variables –functions –class names –type names –temporary variables

12

Hash functions

Multiplication h(k)=m (k c- k c) , 0<c<1 In English :

– Multiply the key k by a constant c, 0<c<1– Take the fractional part of k c– Multiply that by m– Take the floor of the result

The value of m does not make a difference Some values of c work better than others A good value is

2

)15(

Page 13: 1 Symbol Tables Symbol tables are used by compilers to keep track of information about –variables –functions –class names –type names –temporary variables

13

Hash functions

Multiplication Example:

Suppose the size of the table, m, is 1301.

For k=1234, h(k)=850

For k=1235, h(k)=353

For k=1236, h(k)=115

For k=1237, h(k)=660

For k=1238, h(k)=164

For k=1239, h(k)=968

For k=1240, h(k)=471

nice distribution!

Page 14: 1 Symbol Tables Symbol tables are used by compilers to keep track of information about –variables –functions –class names –type names –temporary variables

14

Hash functions

Universal Hashing Worst-case scenario: The chosen keys all

hash to the same slot. This can be avoided if the hash function is not fixed:– Start with a collection of hash functions– Select one at random and use that.– Good performance on average: the probability

that the randomly chosen hash function exhibits the worst-case behavior is very low.

Page 15: 1 Symbol Tables Symbol tables are used by compilers to keep track of information about –variables –functions –class names –type names –temporary variables

15

Load factor

Given a hash table of size m, and n elements stored in it, we define the load factor of the table as =n/m

The load factor gives us an indication of how full the table is.

The possible values of the load factor depend on the method we use for resolving collisions.

Page 16: 1 Symbol Tables Symbol tables are used by compilers to keep track of information about –variables –functions –class names –type names –temporary variables

16

Resolving collisions: Chaining

Chaining * – Put all the elements that collide in a chain

(list) attached to the slot. The hash table is an array of linked lists The load factor indicates the average

number of elements stored in a chain. It could be less than, equal to, or larger than 1.

* a.k.a. closed addressing

Page 17: 1 Symbol Tables Symbol tables are used by compilers to keep track of information about –variables –functions –class names –type names –temporary variables

17

Resolving collisions: Chaining

Insert/Delete/Lookup in expected O(1) time– Keep the list doubly-linked to facilitate

deletions– Worst case of lookup time is linear.

However, this assumes that the chains are kept small. – If the chains start becoming too long, the

table must be enlarged and all the keys rehashed.

Page 18: 1 Symbol Tables Symbol tables are used by compilers to keep track of information about –variables –functions –class names –type names –temporary variables

18

Assumption: simple uniform hashing – any given key is equally likely to hash into

any of the m slots Analysis of unsuccessful search:

– average time to search unsuccessfully for key k = the average time to search to the end of a chain.

– The average length of a chain is .– Total (average) time required : (1+ )

Resolving collisions: Chaining

Page 19: 1 Symbol Tables Symbol tables are used by compilers to keep track of information about –variables –functions –class names –type names –temporary variables

19

Analysis of successful search:– Expected number e of elements examined

during a successful search for key k = one more than the expected number of elements examined when k was inserted.

• it makes no difference whether we insert at the beginning or the end of the list.

– Take the average, over the n items in the table, of 1 plus the expected length of the chain to which the i th element was added:

Resolving collisions: Chaining

Page 20: 1 Symbol Tables Symbol tables are used by compilers to keep track of information about –variables –functions –class names –type names –temporary variables

20

mm

i

ne

n

i 2

1

21...

11

1

1

Total time : (1+ )

Resolving collisions: Chaining

Page 21: 1 Symbol Tables Symbol tables are used by compilers to keep track of information about –variables –functions –class names –type names –temporary variables

21

Both types of search take (1+ ) time on average.

If n=O(m), then =O(1) and the total time for Search is O(1) on average

Insert : O(1) in the worst case Delete : O(1) in the worst case

Resolving collisions: Chaining

Page 22: 1 Symbol Tables Symbol tables are used by compilers to keep track of information about –variables –functions –class names –type names –temporary variables

22

Storage for the elements may be allocated and deallocated within the hash table itself by linking all unused slots into a free list.

Insert:– if key k hashes into empty slot h(k), put it there and

set a flag to indicate that this is the actual position where the element hashed.

– if h(k) is not empty, and the element k1 it contains has its flag set, then use a slot off the free list to store k1. Its flag should be unset.

– if h(k) is not empty, and the element k1 it contains has its flag unset, then move k1 to another slot and store k in h(k).

Resolving collisions: Chaining