1 types type = a set of values operations that are allowed on these values. why? to generate better...

26
1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors To improve expressiveness (see overloading)

Post on 22-Dec-2015

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

1

Types

Type = a set of values operations that are allowed on these values.

Why? To generate better code, with less runtime overhead To avoid runtime errors To improve expressiveness (see overloading)

Page 2: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

2

Type

Type system for a programming language = set of types AND rules that specify how a typed program is allowed to behave

Actions Type checking

Given an operation and an operand of some type, determine whether the operation is allowed on that operand

Type inference Given the type of operands, determine

the meaning of the operation the type of the operation

OR, without variable declarations, infer type from the way the variable is used

Page 3: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

3

Issues in typing

Does the language have a type system? Untyped languages (e.g. assembly) have no type

system at all When is typing performed?

Static typing: At compile time Dynamic typing: At runtime

How strictly are the rules enforced? Strongly typed: No exceptions Weakly typed: With well-defined exceptions

Type equivalence & subtyping When are two types equivalent?

What does "equivalent" mean anyway? When can one type replace another?

Page 4: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

4

Components of a type system

Built-in types Rules for constructing new types

Where do we store type information?

Rules for determining if two types are equivalent

Rules for inferring the types of expressions

Page 5: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

5

Component: Built-in types

These are the basic types. Usually,

integer usual operations: standard arithmetic

floating point usual operations: standard arithmetic

character character set generally ordered lexicographically usual operations: (lexicographic) comparisons

boolean usual operations: not, and, or, xor

Page 6: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

6

Component: type constructors

Arrays array(I,T) denotes the type of an array with elements of type T,

and index set I multidimensional arrays are just arrays where T is also an array operations: element access, array assignment, products

Strings bitstrings, character strings operations: concatenation, lexicographic comparison

Products Groups of multiple objects of different types

essentially, Cartesian product of types (useful in functions later)

Records (structs) Groups of multiple objects of different types where the

elements are given specific names. How is a recursive type definition handled?

Page 7: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

7

Component: type constructors

Pointers addresses operations: arithmetic, dereferencing, referencing issue: equivalency

Function types A function such as "int add(real, int)" has type

realintint

Page 8: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

8

Component: type equivalence Name equivalence

Two types are equivalent only when they have the same name.

Why? Loose vs. strict

Structural equivalence Two types are equivalent when they have the same structure

Should records require member name identity to be equivalent?

Does the order of record fields matter? How about Array(int, 1..10) and Array(int, 1..20). Should we

consider the bounds or just the element type? How about recursively defined types?

Example C uses structural equivalence for structs and name

equivalence for arrays/pointers.

Page 9: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

9

Component: type equivalence

Type coercion If x is float, is x=3 acceptable?

Disallow Allow and implicitly convert 3 to float. "Allow" but require programmer to explicitly convert 3

to float How do we convert?

Reinterpret bit sequence Build new object

What should be allowed? float to int ? int to float ? What if both types are equally general? What if multiple coercions are possible?

Consider 3 + "4" in a language that can convert strings to integers.

Page 10: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

10

Overloading

Same operation name but different effect on different types

For an overloaded function f resolve arguments look up f in symbol table to get list of all visible

"versions" ignore those with wrong parameter types.

Page 11: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

11

A simple type checker

We can use an attribute grammar to implement a simple type checker.

Enum {E.type = integer}EE1+E2 {E.type = (E1.type == integer && E2.type == integer)?

integer : error}EE1[E2] {E.type = (E1.type == array(int, T) && E2.type == integer)?

T : error}E*E1 {E.type = (E1.type == pointer(T))? T : error}EE1(E2) {E.type = (E1.type == (ST) && E2.type == S)? T : errorS if E then S1 {S.type = (E.type == boolean)? void : error}

Page 12: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

12

Inference rules The formal notation for type checking/inference is rules of

inference They have the form:

If we can prove the hypotheses are all true in the existing environment, then the conclusion is also true.

For example:

In English: If expressions E1 and E2 both have type int, then E1+E2 also has type int.

Environment Hypotheses

Environment Conclusions

A E1: int A E2: int

A E1 + E2 : int

Page 13: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

13

Inference rules

The inference rules give us templates that describe how to type various expressions.

We can use the templates to prove that an expression has a valid type (and find what that type is)

The construct parallels the parse tree. Short example:

Type the expression x+a[i] under the assumption that x is an char, a[i] is an array of chars and i is an int.

The environment is A={x:char, a:array(int, char), i:char}

A E1: array(T,S) , A E2: T

A E1[E2] : S[ARRAY]

A E1: char , A E2: charA E1 + E2 : char [ADD]

Page 14: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

14

Inference rules

[ADD]A |– x:char

A |– i:int A |– a: array(int,char)

A |– a[i]: char

A |– x+a[i]: char

Page 15: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

15

Types

Where do we store type information? The symbol table

In general, the symbol table contains information about

variables functions class names type names temporary variables etc.

Do we need a separate symbol table for types?

Page 16: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

16

Symbol Tables

What kind of information is usually stored in a symbol table? type storage class size scope stack frame offset register

Page 17: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

17

Symbol Tables

How is a symbol table implemented? array

simple, but linear LookUp time However, we may use a sorted array for reserved

words, since they are generally few and known in advance.

tree O(lgn) lookup time if kept balanced

hash table most common implementation O(1) LookUp time

Page 18: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

18

Revisiting hash tables (cs311)

Hash tables use array of size m to store elements given key k (the identifier name), use a function h to

compute index h(k) for that key collisions are possible

two keys hash into the same slot.

Hash functions A good hash function

is easy to compute avoids collisions (by breaking up patterns in the keys

and uniformly distributing the hash values)

Page 19: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

19

Revisiting hash tables (cs311)

Hash functions A common hash function is

h(k) = m*(k*c- k*c), for some constant 0<c<1 In English

multiply the key k by the constant c Take the fractional part of k*c Multiply that by size m Take the floor of the result

A good value for c:

2

15

Page 20: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

20

Revisiting hash tables (cs311)

Different elements may still hash into the same slot. How do we resolve collisions? Chaining

Put all the elements that collide in a chain (list) attached to the slot.

Insert/Delete/Lookup in expected O(1) time However, this assumes that the chains are kept

small. If the chains start becoming too long, the table

must be enlarged and all the keys rehashed.

Page 21: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

21

Revisiting hash tables (cs311)

Different elements may still hash into the same slot. How do we resolve collisions? Open addressing

Store all elements within the table The space we save from the chain pointers is used

up to make the array larger. If there is a collision, probe the table in a

systematic way to find an empty slot. If the table fills up, we need to enlarge it and

rehash all the keys. Open addressing with linear probing

Probe the slots in a linear manner Simple but Bad: results in clustering (long

sequences of used slots build up very fast)

Page 22: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

22

Revisiting hash tables (cs311)

Open addressing with double hashing Use a second hash function. The probe sequence is:

(h(k) + i*h2(k) ) mod m, with i=0, 1, 2, ... Good performance

Since we use a second function, keys that originally collide will subsequently have different probe sequences.

No clustering A good choice for h2(k) is p-(k mod p) where p is a

prime less than m

Page 23: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

23

Scope issues

Declaration before use promotes one-pass compiling

Block-structured languages allow nested name scopes.

Usual visibility rules Only names created in the current or enclosing

scopes are visible When there is a conflict, the innermost declaration

takes precedence. What about "same-level" declarations?

Page 24: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

24

Scope issues

One idea is to have a global symbol table and save the scope information for each entry. When an identifier goes out of scope, scan the table

and remove the corresponding entries We may even link all same-scope entries together

for easier removal. Careful: deleting from a hash table that uses open

addressing is tricky We must mark a slot as Deleted, rather than

Empty, otherwise later LookUp operations may fail.

Alternatively, we can maintain a separate, local table for each scope.

Page 25: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

25

Structure tables

Where should we store struct field names? Separate mini symbol table for each struct

Conceptually easy Separate table for all struct field names

We need to somehow uniquely map each name to its structure (e.g. by concatenating the field name with the struct name)

No special storage struct field names are stored in the regular

symbol table. Again we need to be able to map each name to its

structure.

Page 26: 1 Types Type = a set of values operations that are allowed on these values. Why? To generate better code, with less runtime overhead To avoid runtime errors

26

Static vs. Dynamic scope

What we've seen so far is static scope: Scoping follows the structure of the program

An alternative is dynamic scope, where scoping follows the execution path. Example:

int i = 1;void func() { cout << i << endl;}int main () { int i = 2; func(); return 0;}

If C++ used dynamicscoping, this would print out 2, not 1.