Semantic Analysis
Lexically and syntactically correct programs may still contain other errors
Lexical and syntax analyses are not powerful enough to ensure the correct usage of variables, objects, functions, ...
Semantic analysis: Ensure that the program satisfies a set of rules regarding the usage of programming constructs (variables, objects, expressions, statements)
Class Problem
Classify each program as a lexical error, a syntax error, a semantic error, or correct.
» int a; a = 1.0;
» int a; bb = a;
» { int a; a = 1; } { a = 2; }
» in a; a = 1;
» int foo(int a) { foo = 3; }
» 1int x; x = 2;
» int foo(int a) { a = 3; }
Categories of Semantic Analysis
Examples of semantic rules
» Variables must be defined before being used
» A variable should not be defined multiple times
» In an assignment statement, the variable and the expression must have the same type
» The test expression of an if statement must have boolean type
Two major categories
» Semantic rules regarding scopes
» Semantic rules regarding types
Semantic Analysis I: Scope Analysis
Scope Information
Scope information characterizes the declarations of identifiers and the portions of the program where each identifier may be used
» Example identifiers: variables, functions, objects, labels
Lexical scope: a textual region in the program
» Examples: statement block, formal argument list, object body, function or method body, source file, whole program
Scope of an identifier: the lexical scope its declaration refers to
Variable Scope
Scope of variables in statement blocks:
{ int a;
  ...
  { int b;
    ...
  }
  ...
}
» Scope of variable a: the outer block (including the nested block)
» Scope of variable b: the inner block
Scope of global variables: current file
Scope of external variables: whole program
Function Parameter and Label Scope
Scope of formal arguments of functions:
int foo(int n) { ... }
» Scope of argument n: the body of foo
Scope of labels:
void foo() { ... goto lab; ... lab: i++; ... goto lab; ... }
» Scope of label lab: the entire body of foo. Note: in ANSI C, all labels have function scope regardless of where they are.
Scope in Class Declaration
Scope of object fields and methods:
class A {
public:
  void f() { x = 1; }
  ...
private:
  int x;
  ...
}
» Scope of variable x and method f: the whole class body
Semantic Rules for Scopes
Main rules regarding scopes:
» Rule 1: Use each identifier only within its scope
» Rule 2: Do not declare identifiers of the same kind with identical names more than once in the same lexical scope
Are these legal? If not, identify the illegal portion.
class X {
  int X;
  void X(int X) {
    X: ...
    goto X;
  }
}
int X(int X) {
  int X;
  goto X;
  {
    int X;
    X: X = 1;
  }
}
Symbol Tables
Semantic checks refer to properties of identifiers in the program: their scope or type
Need an environment to store the information about identifiers: the symbol table
Each entry in the symbol table contains:
» Name of an identifier
» Additional info about the identifier: kind, type, whether it is constant
Example symbol table:
NAME  KIND  TYPE             ATTRIBUTES
foo   func  int,int → int    extern
m     arg   int
n     arg   int              const
tmp   var   char             const
Scope Information
How to capture the scope information in the symbol table?
Idea:
» There is a hierarchy of scopes in the program
» Use a similar hierarchy of symbol tables
» One symbol table for each scope
» Each symbol table contains the symbols declared in that lexical scope
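Example
int x;
void f(int m) {
  float x, y;
  ...
  { int i, j; ...; }
  { int x; l: ...; }
}
int g(int n) {
  char t;
  ...;
}
Global symtab
  x  var   int
  f  func  int → void
  g  func  int → int
func f symtab (child of the global symtab)
  m  arg  int
  x  var  float
  y  var  float
First block in f (child of the func f symtab)
  i  var  int
  j  var  int
Second block in f (child of the func f symtab)
  x  var  int
  l  label
func g symtab (child of the global symtab)
  n  arg  int
  t  var  char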
Identifiers with Same Name
The hierarchical structure of symbol tables automatically resolves name collisions
» E.g., identifiers with the same name and overlapping scopes
To find which declaration of an identifier is active at a program point:
» Start from the current scope
» Go up the hierarchy until you find an identifier with the same name
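The lookup procedure above can be sketched in a few lines of Python. This is only an illustrative sketch, not the course's implementation: the class name `SymbolTable`, the scope names such as `f.block1`, and the string encoding of types are all assumptions made for this example. The tables mirror the Example program (global scope, functions f and g, and the two nested blocks of f).

```python
class SymbolTable:
    """One table per lexical scope, linked to the table of the enclosing scope."""

    def __init__(self, name, parent=None):
        self.name = name          # hypothetical scope label, used only for reporting
        self.parent = parent      # enclosing scope's table, or None for the global scope
        self.entries = {}

    def insert(self, ident, kind, type_=None):
        self.entries[ident] = (kind, type_)

    def lookup(self, ident):
        """Return (scope name, kind, type) of the innermost visible declaration, or None."""
        table = self
        while table is not None:          # start from the current scope ...
            if ident in table.entries:
                kind, type_ = table.entries[ident]
                return (table.name, kind, type_)
            table = table.parent          # ... and go up the hierarchy
        return None


global_tab = SymbolTable("global")
global_tab.insert("x", "var", "int")
global_tab.insert("f", "func", "int -> void")
global_tab.insert("g", "func", "int -> int")

f_tab = SymbolTable("f", global_tab)
f_tab.insert("m", "arg", "int")
f_tab.insert("x", "var", "float")
f_tab.insert("y", "var", "float")

block1 = SymbolTable("f.block1", f_tab)
block1.insert("i", "var", "int")
block1.insert("j", "var", "int")

block2 = SymbolTable("f.block2", f_tab)
block2.insert("x", "var", "int")
block2.insert("l", "label")

g_tab = SymbolTable("g", global_tab)
g_tab.insert("n", "arg", "int")
g_tab.insert("t", "var", "char")

print(block1.lookup("x"))   # ('f', 'var', 'float'): the float x declared in f
print(block2.lookup("x"))   # ('f.block2', 'var', 'int'): the innermost x wins
print(g_tab.lookup("x"))    # ('global', 'var', 'int')
print(block2.lookup("i"))   # None: i is not visible from the second block
```

A lookup that returns None corresponds to an undefined-identifier error. For simplicity the sketch keeps labels in the same table as variables, whereas the scope rules only forbid duplicates among identifiers of the same kind.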
Class Problem
Associate each definition of x (x=1, x=2, x=3) with its appropriate symbol table entry
int x;
void f(int m) {
  float x, y;
  ...
  { int i, j; x = 1; }
  { int x; l: x = 2; }
}
int g(int n) {
  char t;
  x = 3;
}
The symbol tables are the same as in the Example: the global symtab, the func f symtab with its two nested block tables, and the func g symtab.
Catching Semantic Errors
int x;
void f(int m) {
  float x, y;
  ...
  { int i, j; x = 1; }
  { int x; l: i = 2; }
}
int g(int n) {
  char t;
  x = 3;
}
The symbol tables are the same as in the Example.
i = 2: Error! Undefined variable. The lookup of i starts in the table of the second block of f, continues in the func f symtab and the global symtab, and finds no entry.
Symbol Table Operations
Two operations:
» To build symbol tables, we need to insert new identifiers in the table
» In the subsequent stages of the compiler we need to access the information from the table: use a lookup function
Cannot build symbol tables during lexical analysis
» Hierarchy of scopes is encoded in the syntax
Build the symbol tables:
» While parsing, using the semantic actions
» After the AST is constructed
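As a rough sketch of the second option (building the tables after the AST exists), the following Python walks a toy AST and uses the two operations, insert and lookup. The AST encoding as nested tuples (`"block"`, `"decl"`, `"use"`) and the names `Scope`, `build`, and `check_scopes` are assumptions made for this illustration. Because uses are checked in the same pass that inserts declarations, the sketch only suits declare-before-use scoping.

```python
class Scope:
    """Symbol table for one lexical scope, with a link to the enclosing scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.symbols = {}

    def insert(self, name, type_):
        """Add a declaration; return False if name is already declared in this scope."""
        if name in self.symbols:
            return False
        self.symbols[name] = type_
        return True

    def lookup(self, name):
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        return None


def build(node, scope, errors):
    """Walk the toy AST, filling symbol tables and recording scope errors."""
    kind = node[0]
    if kind == "block":                 # ("block", [children]) opens a new scope
        inner = Scope(scope)
        for child in node[1]:
            build(child, inner, errors)
    elif kind == "decl":                # ("decl", name, type)
        _, name, type_ = node
        if not scope.insert(name, type_):
            errors.append(f"duplicate declaration of {name}")
    elif kind == "use":                 # ("use", name)
        if scope.lookup(node[1]) is None:
            errors.append(f"undefined variable {node[1]}")


def check_scopes(program):
    """program must be a ("block", ...) node standing for the global scope."""
    errors = []
    build(program, None, errors)
    return errors


# Scope structure of the "Catching Semantic Errors" program (function f only).
program = ("block", [
    ("decl", "x", "int"),
    ("block", [                          # body of f
        ("decl", "x", "float"), ("decl", "y", "float"),
        ("block", [("decl", "i", "int"), ("decl", "j", "int"), ("use", "x")]),
        ("block", [("decl", "x", "int"), ("use", "i")]),
    ]),
])
print(check_scopes(program))             # ['undefined variable i']
```

The duplicate check in `insert` looks only at the current scope, so redeclaring a name in a nested block is accepted while redeclaring it in the same block is reported.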
Forward References
Forward reference: use of an identifier within the scope of its declaration, but before it is declared
Any compiler phase that uses the information from the symbol table must be performed after the table is constructed
Cannot type-check and build the symbol table at the same time
Example:
class A {
  int m() { return n(); }
  int n() { return 1; }
}
Semantic Analysis II: Type Checking
Type Information
What are types?
» They describe the values computed during the execution of the program
» Essentially they are a predicate on values
» E.g., “int x” in C means -2^31 <= x < 2^31 (for a 32-bit int)
Type information: describes what kind of values correspond to different constructs: variables, statements, expressions, functions, etc.
» variables: int a; has type integer
» expressions: (a+1) == 2 has type boolean
» statements: a = 1.0; has type floating-point
» functions: int pow(int n, int m) has type int x int → int
Type Checking
Type errors: improper or inconsistent operations during program execution
Type-safety: absence of type errors
Type checking: set of rules which ensures the type consistency of different constructs in the program
How to Ensure Type-Safety
Bind (assign) types, then check types
Type binding: defines the type of constructs in the program (e.g., variables, functions)
» Can be either explicit (int x) or implicit (x=1)
Type checking: determine if the program correctly uses the type bindings
» Consists of a set of type-checking rules
Type Checking
Semantic checks to enforce the type safety of the program
Examples
» Unary and binary operators (e.g. +, ==, [ ]) must receive operands of the proper type
» Functions must be invoked with the right number and type of arguments
» Return statements must agree with the return type
» In assignments, the assigned value must be compatible with the type of the variable on the LHS
» Class members must be accessed appropriately
4 Concepts Related to Types/Languages
1. Static vs dynamic checking
» When to check types
2. Static vs dynamic typing
» When to define types
3. Strong vs weak typing
» How many type errors
4. Sound type systems
» Statically catch all type errors
Static vs Dynamic Checking
Static type checking
» Performed at compile time
Dynamic type checking
» Performed at run time (as the program executes)
Examples of dynamic checking
» Array bounds checking
» Null pointer dereferences
Static vs Dynamic Typing
Static and dynamic typing refer to type definitions (i.e., bindings of types to variables, expressions, etc.)
Statically typed language
» Types are defined at compile time and do not change during the execution of the program
» C, C++, Java, Pascal
Dynamically typed language
» Types are defined at run time, as the program executes
» Lisp, Smalltalk
Strong vs Weak Typing
Refer to how much type consistency is enforced
Strongly typed languages
» Guarantee accepted programs are type-safe
Weakly typed languages
» Allow programs which contain type errors
These concepts refer to run time
» Can achieve strong typing using either static or dynamic typing
Soundness
Sound type systems: can statically ensure that the program is type-safe
Soundness implies strong typing
Static type safety requires a conservative approximation of the values that may occur during all possible executions
» May reject type-safe programs
» Need to be expressive: reject as few type-safe programs as possible
Class Problem
Classify the following languages: C, C++, Pascal, Java, Standard ML, Modula-3, Smalltalk
Grid: rows are Static Typing and Dynamic Typing; columns are Strong Typing and Weak Typing
Language groups listed on the slide:
» C, C++, Pascal, Java, C#, Modula-3
» Standard ML, Smalltalk, Lisp
» Javascript, PHP, Perl 5, Objective-C
Why Static Checking?
Efficient code
» Dynamic checks slow down the program
Guarantees that all executions will be safe
» Dynamic checking gives safety guarantees only for some executions of the program
But static checking is conservative for sound systems
» Needs to be expressive: reject few type-safe programs
Type Systems
What are types?
» They describe the values computed during the execution of the program
» Essentially they are a predicate on values
» E.g., “int x” in C means -2^31 <= x < 2^31 (for a 32-bit int)
Type expressions: describe the possible types in the program
» E.g., int, char*, array[], object, etc.
Type system: defines types for language constructs
» E.g., expressions, statements
Type Expressions
Language type systems have basic types (aka primitive types or ground types)
» E.g., int, char*, double
Build type expressions using basic types:
» Type constructors: array types, structure/object types, pointer types
» Type aliases
» Function types
Semantic Analysis III: Static Semantics
Static Semantics
We can describe the types used in a program
How do we describe type checking?
Static semantics: formal description for the programming language
Static semantics is to type checking:
» As a grammar is to syntax analysis
» As a regular expression is to lexical analysis
Static semantics defines types for legal ASTs in the language
Type Judgments or Relations
Static semantics = formal notation which describes type judgments:
» E : T
» means “E is a well-typed expression of type T”
» E is typable if there is some type T such that E : T
Type judgment examples:
» 2 : int
» true : bool
» 2 * (3 + 4) : int
» “Hello” : string
Type Judgments for Statements
Statements may be expressions (i.e., represent values)
Use type judgments for statements:» if (b) 2 else 3 : int
» x == 10 : bool
» b = true, y = 2 : int (result of comma operator is the value of the rightmost expression)
For statements which are not expressions: use a special unit type (void or empty type)» S : unit
» means “S is a well-typed statement with no result type”
Class Problem
What's the type of the following statements?
Assume i* are int variables, f* are float variables
» f1 [ 3 ]
» i = i1 [ i2]
» while (i < 10) do S1
» (i ? 0) 4.0 : 1.0
Deriving a Judgment
Consider the judgment
» if (b) 2 else 3 : int
What do we need to decide that this is a well-typed expression of type int?
» b must be a bool (b : bool)
» 2 must be an int (2 : int)
» 3 must be an int (3 : int)
Type Judgments
Type judgment notation: A ⊢ E : T
» Means “In the context A, the expression E is a well-typed expression with type T”
Type context is a set of type bindings: id : T
» (i.e. type context = symbol table)
Examples:
» b: bool, x: int ⊢ b: bool
» b: bool, x: int ⊢ if (b) 2 else x : int
» ⊢ 2 + 2 : int
Deriving a Judgment
To show
» b: bool, x: int ⊢ if (b) 2 else x : int
Need to show
» b: bool, x: int ⊢ b : bool
» b: bool, x: int ⊢ 2 : int
» b: bool, x: int ⊢ x : int
General Rule
For any environment A, expression E, statements S1 and S2, the judgment:
» A ⊢ if (E) S1 else S2 : T
Is true if:
» A ⊢ E : bool
» A ⊢ S1 : T
» A ⊢ S2 : T
Inference Rules
A ⊢ E : bool    A ⊢ S1 : T    A ⊢ S2 : T      (premises)
---------------------------------------- (if-rule)
A ⊢ if (E) S1 else S2 : T                     (conclusion)
• Holds for any choice of E, S1, S2, T
• Read as, “if we have established the statements in the premises listed above the line, then we may derive the conclusion below the line”
Why Inference Rules?
Inference rules: compact, precise language for specifying static semantics
Inference rules correspond directly to the recursive AST traversal that implements them
Type checking is the attempt to prove type judgments A ⊢ E : T true by walking backward through the rules
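To make the correspondence between inference rules and a recursive AST traversal concrete, here is a minimal Python sketch. It is an assumption-laden illustration: expressions are encoded as nested tuples, the environment A is a plain dict from identifiers to type names, and the function names `type_of` and `is_typable` are invented for this example. Each branch of `type_of` plays the role of one rule: the recursive calls establish the premises, and the returned type is the conclusion.

```python
def type_of(expr, env):
    """Return T such that env |- expr : T, or raise TypeError if no rule applies."""
    kind = expr[0]
    if kind == "int":                     # axiom: A |- n : int
        return "int"
    if kind == "bool":                    # axiom: A |- true : bool
        return "bool"
    if kind == "var":                     # id : T must be a binding in A
        name = expr[1]
        if name not in env:
            raise TypeError(f"undefined variable {name}")
        return env[name]
    if kind == "not":                     # A |- E : bool  =>  A |- !E : bool
        if type_of(expr[1], env) != "bool":
            raise TypeError("operand of ! must be bool")
        return "bool"
    if kind == "+":                       # both operands int  =>  sum is int
        if type_of(expr[1], env) != "int" or type_of(expr[2], env) != "int":
            raise TypeError("operands of + must be int")
        return "int"
    if kind == "if":                      # the if-rule: bool test, branches of equal type
        _, cond, then_branch, else_branch = expr
        if type_of(cond, env) != "bool":
            raise TypeError("test of if must be bool")
        t_then = type_of(then_branch, env)
        t_else = type_of(else_branch, env)
        if t_then != t_else:
            raise TypeError("branches of if must have the same type")
        return t_then
    raise TypeError(f"unknown expression kind {kind!r}")


def is_typable(expr, env):
    """E is typable if there is some T such that env |- E : T."""
    try:
        type_of(expr, env)
        return True
    except TypeError:
        return False


A1 = {"b": "bool", "x": "int"}
# if (!b) 2 + 3 else x
expr = ("if", ("not", ("var", "b")), ("+", ("int", 2), ("int", 3)), ("var", "x"))
print(type_of(expr, A1))                                            # int
print(is_typable(("if", ("int", 1), ("int", 2), ("int", 3)), A1))   # False
```

No search for an applicable rule is needed: the node kind at the root of each subtree selects the rule.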
Meaning of Inference Rule
Inference rule says:
» Given the premises are true (with some substitutions for A, E1, E2)
» Then, the conclusion is true (with consistent substitution)
A ⊢ E1 : int    A ⊢ E2 : int
---------------------------- (+)
A ⊢ E1 + E2 : int
In the AST: a + node whose children E1 and E2 both have type int itself has type int
Proof Tree
Expression is well-typed if there exists a type derivation for a type judgment
Type derivation is a proof tree
Example: if A1 = b : bool, x : int, then:
A1 ⊢ b : bool          A1 ⊢ 2 : int    A1 ⊢ 3 : int
--------------         ----------------------------
A1 ⊢ !b : bool         A1 ⊢ 2 + 3 : int                 A1 ⊢ x : int
--------------------------------------------------------------------
b : bool, x : int ⊢ if (!b) 2 + 3 else x : int
More About Inference Rules
No premises = axiom
---------------
A ⊢ true : bool
A goal judgment may be proved in more than one way
A ⊢ E1 : float    A ⊢ E2 : float
--------------------------------
A ⊢ E1 + E2 : float
A ⊢ E1 : float    A ⊢ E2 : int
------------------------------
A ⊢ E1 + E2 : float
No need to search for rules to apply; they correspond to nodes in the AST
Class Problem
Given the following syntax for arithmetic expressions:
t → true | false | if t then t else t | 0 | succ t | pred t | iszero t
And the following typing rules for the language:
true : bool
false : bool
t1 : bool    t2 : T    t3 : T
-----------------------------
if t1 then t2 else t3 : T
t1 : int
-------------
succ t1 : int
t1 : int
-------------
pred t1 : int
t1 : int
----------------
iszero t1 : bool
Construct type derivations to show
(1) if iszero 0 then 0 else pred 0 : int
(2) pred(succ(succ(pred(0)))) : int
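A derivation for this small language can be checked mechanically. The Python sketch below is one possible encoding, not part of the original problem: terms are strings (`"true"`, `"false"`, `"0"`) or tuples such as `("succ", t)`, and the function name `typeof` is made up here. The listed rules give no axiom for 0, so the sketch assumes 0 : int, which both requested derivations need.

```python
def typeof(t):
    """Type of term t under the typing rules above; raises TypeError if no rule applies."""
    if t == "true" or t == "false":       # axioms for the boolean constants
        return "bool"
    if t == "0":                          # assumed axiom: 0 : int
        return "int"
    if isinstance(t, tuple):
        op = t[0]
        if op == "if" and len(t) == 4:
            if typeof(t[1]) != "bool":
                raise TypeError("guard of if must be bool")
            t2, t3 = typeof(t[2]), typeof(t[3])
            if t2 != t3:
                raise TypeError("branches of if must have the same type")
            return t2
        if op in ("succ", "pred") and len(t) == 2:
            if typeof(t[1]) != "int":
                raise TypeError(f"argument of {op} must be int")
            return "int"
        if op == "iszero" and len(t) == 2:
            if typeof(t[1]) != "int":
                raise TypeError("argument of iszero must be int")
            return "bool"
    raise TypeError(f"no typing rule applies to {t!r}")


# (1) if iszero 0 then 0 else pred 0
term1 = ("if", ("iszero", "0"), "0", ("pred", "0"))
# (2) pred(succ(succ(pred(0))))
term2 = ("pred", ("succ", ("succ", ("pred", "0"))))
print(typeof(term1), typeof(term2))       # int int
```

The sequence of recursive calls made for each term is exactly the shape of its proof tree, read from the conclusion up to the axioms.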
Assignment Statements
id : T ∈ A    A ⊢ E : T
----------------------- (variable-assign)
A ⊢ id = E : T
A ⊢ E3 : T    A ⊢ E2 : int    A ⊢ E1 : array[T]
----------------------------------------------- (array-assign)
A ⊢ E1[E2] = E3 : T
If Statements
A ⊢ E : bool    A ⊢ S1 : T    A ⊢ S2 : T
---------------------------------------- (if-then-else)
A ⊢ if (E) S1 else S2 : T
• If statement as an expression: its value is the value of the clause that is executed
A ⊢ E : bool    A ⊢ S : T
------------------------- (if-then)
A ⊢ if (E) S : unit
• If with no else clause, no value. Why?
Class Problem
1. Show the inference rule for a while statement, while (E) S
2. Show the inference rule for a variable declaration with initializer, Type id = E
3. Show the inference rule for a question mark/colon operator, E1 ? S1 : S2
Sequence Statements
Rule: A sequence of statements is well-typed if the first statement is well-typed, and the remaining statements are well-typed as well:
A ⊢ S1 : T1    A ⊢ (S2; .... ; Sn) : Tn
--------------------------------------- (sequence)
A ⊢ (S1; S2; .... ; Sn) : Tn
Declarations
A ⊢ id : T [ = E ] : T1    A, id : T ⊢ (S2; .... ; Sn) : Tn
----------------------------------------------------------- (declaration)
A ⊢ (id : T [ = E ]; S2; .... ; Sn) : Tn
» T1 = unit if no E
Declarations add entries to the environment (e.g., the symbol table)
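The way a declaration extends the environment for the statements that follow it can be sketched in Python as below. The statement encoding (`("decl", id, T, E or None)`, `("assign", id, E)`), the restriction of expressions to literals and variable names, and all function names are assumptions for this illustration. The sketch also assumes that a declaration with an initializer has type T, which the rule above leaves open.

```python
def expr_type(e, env):
    """Type of a literal or a variable reference under environment env."""
    if isinstance(e, bool):               # test bool first: bool is a subclass of int in Python
        return "bool"
    if isinstance(e, int):
        return "int"
    if isinstance(e, float):
        return "float"
    if isinstance(e, str):                # a string stands for a variable name
        if e not in env:
            raise TypeError(f"undefined variable {e}")
        return env[e]
    raise TypeError(f"unsupported expression {e!r}")


def stmt_type(stmt, env):
    """Return (type of stmt, environment for the statements that follow it)."""
    kind = stmt[0]
    if kind == "decl":
        _, name, declared, init = stmt
        extended = {**env, name: declared}        # A, id : T (a new dict; env is untouched)
        if init is None:
            return "unit", extended
        if expr_type(init, env) != declared:
            raise TypeError(f"initializer of {name} is not of type {declared}")
        return declared, extended
    if kind == "assign":
        _, name, value = stmt
        if name not in env:
            raise TypeError(f"undefined variable {name}")
        if expr_type(value, env) != env[name]:
            raise TypeError(f"type mismatch in assignment to {name}")
        return env[name], env
    raise TypeError(f"unsupported statement {stmt!r}")


def sequence_type(stmts, env):
    """Type of (S1; ...; Sn): the type Tn of the last statement."""
    if not stmts:
        raise ValueError("empty sequence")
    t_first, env_after = stmt_type(stmts[0], env)
    if len(stmts) == 1:
        return t_first
    return sequence_type(stmts[1:], env_after)


def well_typed(stmts):
    try:
        sequence_type(stmts, {})
        return True
    except TypeError:
        return False


# int x = 1; int y; y = x;
prog = [("decl", "x", "int", 1), ("decl", "y", "int", None), ("assign", "y", "x")]
print(sequence_type(prog, {}))                                              # int
print(well_typed([("decl", "a", "int", None), ("assign", "a", 1.0)]))       # False
```

Building a fresh dictionary for the extended environment keeps the caller's environment unchanged, which matches the rule: only the statements after the declaration see the new binding.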
Function Calls
If expression E is a function value, it has a type T1 x T2 x ... x Tn → Tr
Ti are argument types; Tr is the return type
How to type-check a function call?
» E(E1, ..., En)
A ⊢ E : T1 x T2 x ... x Tn → Tr    A ⊢ Ei : Ti (i ∈ 1 ... n)
------------------------------------------------------------ (function-call)
A ⊢ E(E1, ..., En) : Tr
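The function-call rule amounts to comparing the argument types against the parameter types position by position. A small Python sketch follows; representing a function type as a pair `(parameter types, return type)` and the name `call_type` are assumptions of this example.

```python
def call_type(func_type, arg_types):
    """Apply the function-call rule; func_type is (param_types, return_type)."""
    param_types, return_type = func_type
    if len(arg_types) != len(param_types):
        raise TypeError(f"expected {len(param_types)} arguments, got {len(arg_types)}")
    for i, (expected, actual) in enumerate(zip(param_types, arg_types), start=1):
        if expected != actual:            # premise A |- Ei : Ti fails for this i
            raise TypeError(f"argument {i}: expected {expected}, got {actual}")
    return return_type                    # conclusion: the call has type Tr


pow_type = (("int", "int"), "int")        # int pow(int n, int m)
print(call_type(pow_type, ["int", "int"]))    # int
```

Calling `call_type(pow_type, ["int"])` or `call_type(pow_type, ["int", "float"])` raises TypeError, covering the "right number and type of arguments" check.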
Function Declarations
Consider a function declaration of the form:
» Tr fun (T1 a1, ... , Tn an) = E
» Equivalent to: Tr fun (T1 a1, ..., Tn an) { return E; }
Type of the function body must match the declared return type of the function, i.e., E : Tr
But, in what type context?
Add Arguments to Environment
Let A be the context surrounding the function declaration.
» The function declaration: Tr fun (T1 a1, ... , Tn an) = E
» Is well-formed if A, a1 : T1 , ... , an : Tn ⊢ E : Tr
What about recursion?
» Need: fun : T1 x T2 x ... x Tn → Tr ∈ A
Class Problem
Recursive function – factorial
int fact(int x) = if (x == 0) 1 else x * fact(x-1);
Is this well-formed?
Mutual Recursion
Example
» int f(int x) = g(x) + 1;
» int g(int x) = f(x) - 1;
Need an environment containing at least f: int → int, g: int → int when checking both f and g
Two-pass approach:
» Scan the top level of the AST, picking up all function signatures and creating an environment binding all global identifiers
» Type-check each function individually using this global environment
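A minimal Python sketch of the two-pass approach, applied to the f/g example, is shown below. The encoding is an assumption made for illustration: a function is a tuple `(name, params, return_type, body)`, bodies are built from int literals, parameter names, `("call", name, args)`, and `("+" or "-", e1, e2)`, and the names `expr_type` and `check_program` are invented here.

```python
def expr_type(e, env):
    """Type of expression e under env, which maps names to types or signatures."""
    if isinstance(e, int):
        return "int"
    if isinstance(e, str):                        # identifier reference
        if e not in env:
            raise TypeError(f"undefined identifier {e}")
        return env[e]
    op = e[0]
    if op in ("+", "-"):
        if expr_type(e[1], env) != "int" or expr_type(e[2], env) != "int":
            raise TypeError(f"operands of {op} must be int")
        return "int"
    if op == "call":
        _, name, args = e
        if name not in env:
            raise TypeError(f"undefined function {name}")
        signature = env[name]
        if not isinstance(signature, tuple):
            raise TypeError(f"{name} is not a function")
        param_types, return_type = signature
        if list(param_types) != [expr_type(a, env) for a in args]:
            raise TypeError(f"bad arguments in call to {name}")
        return return_type
    raise TypeError(f"unsupported expression {e!r}")


def check_program(funcs):
    # Pass 1: bind every function name to its signature before looking at any body.
    global_env = {name: (tuple(t for _, t in params), ret)
                  for name, params, ret, _ in funcs}
    # Pass 2: check each body in the global environment extended with its parameters.
    for name, params, ret, body in funcs:
        local_env = {**global_env, **dict(params)}
        if expr_type(body, local_env) != ret:
            raise TypeError(f"body of {name} does not have type {ret}")
    return global_env


funcs = [
    ("f", [("x", "int")], "int", ("+", ("call", "g", ["x"]), 1)),   # int f(int x) = g(x) + 1;
    ("g", [("x", "int")], "int", ("-", ("call", "f", ["x"]), 1)),   # int g(int x) = f(x) - 1;
]
print(check_program(funcs))
```

If the bodies were checked in the same pass that collects signatures, the call to g inside f would be rejected as undefined, which is why the signatures are gathered first.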
Static Semantics Summary
Static semantics = formal specification of type-checking rules
Concise form of static semantics: typing rules expressed as inference rules
Expressions and statements are well-formed (or well-typed) if a typing derivation (proof tree) can be constructed using the inference rules
Review of Semantic Analysis
Check errors not detected by lexical or syntax analysis
Scope errors
» Variables not defined
» Multiple declarations
Type errors
» Assignment of values of different types
» Invocation of functions with a different number of parameters or parameters of incorrect type
» Incorrect use of return statements
Other Forms of Semantic Analysis
One more category that we have not discussed
Control flow errors
» Must verify that break and continue statements are always enclosed by a while (or for) statement
» Java: must verify that a break X statement is enclosed by a for loop with label X
» Goto labels exist in the proper function
» Can easily check control-flow errors by recursively traversing the AST
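The break/continue check can be sketched as a recursive traversal that carries one flag saying whether the current statement is inside a loop. The statement encoding below (`("while", cond, body)`, `("if", cond, then_body, else_body)`, `("break",)`, `("continue",)`) and the function name are assumptions for this illustration; labeled breaks and goto labels are not covered.

```python
def check_control_flow(stmts, in_loop=False):
    """Return error messages for break/continue statements not enclosed by a loop."""
    errors = []
    for s in stmts:
        kind = s[0]
        if kind in ("break", "continue"):
            if not in_loop:
                errors.append(f"{kind} outside of a loop")
        elif kind == "while":                      # the body is inside a loop
            errors += check_control_flow(s[2], in_loop=True)
        elif kind == "if":                         # branches inherit the current context
            errors += check_control_flow(s[2], in_loop)
            errors += check_control_flow(s[3], in_loop)
        # other statement kinds contain no nested statements in this sketch
    return errors


ok = [("while", "i < 10", [("if", "i == 5", [("break",)], [("continue",)])])]
bad = [("if", "done", [("break",)], []), ("while", "true", [("continue",)])]
print(check_control_flow(ok))     # []
print(check_control_flow(bad))    # ['break outside of a loop']
```

Checking a labeled `break X` would follow the same pattern, with the flag replaced by the set of labels of the enclosing loops.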
Where We Are...
Source code (character stream)
» Lexical Analysis (specified by regular expressions) produces a token stream
» Syntax Analysis (specified by grammars) produces an abstract syntax tree
» Semantic Analysis (specified by static semantics) produces an abstract syntax tree + symbol tables, types
» Intermediate Code Gen produces intermediate code