cos 301 chapter 3 topics programming...

16
Sebesta Chapter 3.1 – 3.4 Syntax and Semantics COS 301 Programming Languages Chapter 3 Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax Attribute Grammars Dynamic Semantics Introduction Syntax: the form or structure of the expressions, statements, and program units Semantics: the meaning of the expressions, statements, and program units Syntax and semantics provide a language’s definition A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth Describing Syntax Descriptions of syntax are intended to communicate facts about a language to an audience. Who? Programmers want to find out what legal programs look like Implementers want an exact, detailed definition Tools such parser and scanner generators need an exact, detailed definition in a particular, machine- readable form Tools often need ambiguity eliminated, while people often prefer a more readable grammar Some Terminology Any language (human or computer or otherwise) consists of a set of strings called sentences The syntactic rules of a language specify what the legal strings are members of the language This does not of course preclude a language from having an infinite number of such strings Human languages are quite complex compared to computer languages Some Terminology •A sentence is a string of characters over some alphabet Can be as small as two symbols e.g. {0,1} •A language is a set of sentences •A lexeme is the lowest level syntactic unit of a language (e.g., 1.0, *, sum, begin) Formal syntactic descriptions of a language are usually separated into lexical and syntactic rules. Lexical rules specify how numeric literals are formed, language operators, keywords, etc. •A token type is a category of lexemes (e.g., identifier)

Upload: others

Post on 25-Apr-2020

15 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: COS 301 Chapter 3 Topics Programming Languagesaturing.umcs.maine.edu/~Meadow/Courses/Cos301/Cos301-3.pdfProgramming Languages Chapter 3 Topics • Introduction ... The Chomsky Hierarchy

Sebesta Chapter 3.1 – 3.4Syntax and Semantics

COS 301

Programming Languages

Chapter 3 Topics

• Introduction• The General Problem of Describing Syntax• Formal Methods of Describing Syntax• Attribute Grammars• Dynamic Semantics

Introduction

• Syntax: the form or structure of the expressions, statements, and program units

• Semantics: the meaning of the expressions, statements, and program units

• Syntax and semantics provide a language’s definition

A language that is simple to parse for the compiler is also simple to parse for the human programmer.

N. Wirth

Describing Syntax

• Descriptions of syntax are intended to communicate facts about a language to an audience. Who?– Programmers want to find out what legal programs

look like – Implementers want an exact, detailed definition– Tools such parser and scanner generators need an

exact, detailed definition in a particular, machine-readable form

– Tools often need ambiguity eliminated, while people often prefer a more readable grammar

Some Terminology

• Any language (human or computer or otherwise) consists of a set of strings called sentences

• The syntactic rules of a language specify what the legal strings are members of the language – This does not of course preclude a language from

having an infinite number of such strings

• Human languages are quite complex compared to computer languages

Some Terminology

• A sentence is a string of characters over some alphabet– Can be as small as two symbols e.g. {0,1}

• A language is a set of sentences• A lexeme is the lowest level syntactic unit of a

language (e.g., 1.0, *, sum, begin)– Formal syntactic descriptions of a language are

usually separated into lexical and syntactic rules. – Lexical rules specify how numeric literals are formed,

language operators, keywords, etc.• A token type is a category of lexemes (e.g.,

identifier)

Page 2: COS 301 Chapter 3 Topics Programming Languagesaturing.umcs.maine.edu/~Meadow/Courses/Cos301/Cos301-3.pdfProgramming Languages Chapter 3 Topics • Introduction ... The Chomsky Hierarchy

Tokens and Lexemes

• Lexemes are partitioned into groups or types such as identifiers, operators, integer literals etc.

• Often the term “token” is used in place of lexemeIndex = 2 * count + x;

Lexeme Token ValueIndex identifier "index"= assignment2 int literal 2Count identifier "count"+ addition17 int literal 17

Lexical and Syntactic Rules

• Lexical and syntactic rules are specified separately because they are specified by different types of grammars and are recognized by different types of automata

• In particular, lexical rules are equivalent to regular expressions and specified by very restricted grammars called regular grammars

Formal Definition of Languages

• Languages can be formally defined in two different ways: 1. Recognizers2. Generators

Formal Definition of Languages

• Recognizers– A recognition device reads input strings over the

alphabet of the language and decides whether the input strings belong to the language

– Example: syntax analysis part of a compiler

• Generators– A device that generates sentences of a language– One can determine if the syntax of a particular

sentence is syntactically correct by comparing it to the structure of the generator

– Example: a grammar

Recognizers and Generators

• There is a close relationship between recognizers and generators of a language

• Given a context-free grammar (a generator) we can algorithmically construct a recognizer (a parser)

• Many such systems have been constructed• The oldest (and still in wide use) is yacc

– Yet Another Compiler Compiler

The Chomsky Hierarchy

• Noam Chomsky developed the idea of formal grammars in the late 1950’s

• Four levels of grammar:1. Regular2. Context-free3. Context-sensitive4. Unrestricted (recursively enumerable)• We will use only regular and context free

grammars• Chomsky’s work has been extended into a set

of 9 levels distinguished by recognition automata

Page 3: COS 301 Chapter 3 Topics Programming Languagesaturing.umcs.maine.edu/~Meadow/Courses/Cos301/Cos301-3.pdfProgramming Languages Chapter 3 Topics • Introduction ... The Chomsky Hierarchy

BNF and Context-Free Grammars

• Context-Free Grammars– Grammars are language generators, meant to describe the syntax

of natural languages– A context-free grammar defines a class of languages called

context-free languages– CFGs are the most powerful grammars that are amenable to

computation

• Backus-Naur Form (1959)– Invented by John Backus and Peter Naur to describe Algol 60– BNF grammars are equivalent to context-free grammars– A similar notation was actually used over 2,000 years ago to

describe the structure of Sanskrit (one of the most regular of human languages

BNF Grammar: Formalism

• The grammar of a programming language is a set of {P,T,N,S} with four members

1. A set of productions: P2. A set of terminal symbols: T3. A set of nonterminal symbols: N4. start symbol: S ∈ N• A production has the form A →ω where A ∈

N and ω ∈ (N ∪ T)

Note that N and T are disjoint sets

BNF Fundamentals

• BNF is a metalanguage (a language used to describe another language)

• BNF uses abstractions to represent classes of syntactic structures

• A simple assignment statement might be represented by the symbol <assign><assign> -> <var> = <expression>

• A production or rule shows how a nonterminal can be expanded

• A rule has a left-hand side (LHS), which is a nonterminal, and a right-hand side (RHS), which is a string of terminals and/or nonterminals

• Terminals cannot be expanded further

BNF Notation

• Nonterminals are often enclosed in angle brackets– Examples of BNF rules:

<ident_list> → identifier | identifier, <ident_list><if_stmt> → if <logic_exp> then <stmt>

BNF Rules or Productions

• A production is a rule for rewriting that can be applied to a string of symbols called a sentential form– The nonterminal symbols N identify grammatical categories

such as identifier, integer, expression, program– The start symbol S identifies the principal grammatical

category (usually Program).– The terminal symbolsT are the lexemes or tokens from

which programs are constructed

• An abstraction (or nonterminal symbol) can have more than one RHS

<stmt> <single_stmt> | begin <stmt_list> end

Describing Lists

• Syntactic lists are described using recursion<ident_list> ident

| ident, <ident_list>

• A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)

Page 4: COS 301 Chapter 3 Topics Programming Languagesaturing.umcs.maine.edu/~Meadow/Courses/Cos301/Cos301-3.pdfProgramming Languages Chapter 3 Topics • Introduction ... The Chomsky Hierarchy

Metasymbols

• The symbol is used after the left nonterminalof a rule. Alternate (original) notation uses ::=

• Nonterminals may be written in angle brackets or with a distinctive font<statement>

• Selection (OR) is designated by the | character.• Parentheses may be used for grouping ( ... ).• Note that there are several different written

styles for BNF but all are fundamentally equivalent

Definition: A Language

• The language L defined by a BNF grammar G = {P,T,N,S} is the set of all terminal strings that can be derived from the start symbol in zero or more steps.

An Example Grammar

<program> <stmts><stmts> <stmt> | <stmt> ; <stmts><stmt> <var> = <expr><var> a | b | c | d<expr> <term> + <term> | <term> - <term><term> <var> | const

An Example Derivation

<program> => <stmts> => <stmt> => <var> = <expr> => a = <expr> => a = <term> + <term>=> a = <var> + <term> => a = b + <term>=> a = b + const

Derivations

• Every string of symbols in a derivation is a sentential form

• A sentence is a sentential form that has only terminal symbols

• A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded

• A derivation may be neither leftmost nor rightmost

Example

• Given G below, does the string cbab belong to L(G)? In other words, is there a way to derive cbab from the start symbol?

• G = { T, V, P, S }T = { a, b, c }V = { A, B, C, W }S = { W }

• P consist of the rules:1. W AB or <W> ::= <A><B>2. A Ca <A> ::= <C>a3. B Ba <B> ::= <B>a4. B Cb <B> ::= <C>b5. B b <B> ::= b6. C cb <C> ::= cb7. C b <C> ::= b

Page 5: COS 301 Chapter 3 Topics Programming Languagesaturing.umcs.maine.edu/~Meadow/Courses/Cos301/Cos301-3.pdfProgramming Languages Chapter 3 Topics • Introduction ... The Chomsky Hierarchy

Leftmost derivation

• Begin with the start symbol W and apply production rules expanding the leftmost non-terminal.W AB Rule 1AB CaB Rule 2CaB cbaB Rule 6cbaB cbab Rule 5

Rightmost derivation

• Begin with the start symbol W and apply production rules expanding the rightmost non-terminal.W AB Rule 1AB Ab Rule 5Ab Cab Rule 2Cab cbab Rule 6

A shorter version of G

• Using selection in the RHSG = { T, V, P, S }T = { a, b, c }V = { A, B, C, W }S = { W }

1. W AB or <W> ::= <A><B>2. A Ca <A> ::= <C>a3. B Ba | Cb | b <B> ::= <B>a | <C>b | b4. C cb | b <C> ::= cb | b

Parse Tree

• A tree representation of a derivation

<program>

<stmts>

<stmt>

const

a

<var> = <expr>

<var>

b

<term> + <term>

Parse trees

• In a parse tree:– Each internal node of the tree corresponds to a step

in the derivation.– Each child of a node represents a right-hand side of

a production.– Each leaf node represents a symbol of the derived

string, reading from left to right.

A Grammar for Assigment Statements

<assign> ::= <id> = <expr><id> ::= A | B | C<expr> ::= <id> + <expr>

| <id> * <expr>| ( <expr> )| <id>

Page 6: COS 301 Chapter 3 Topics Programming Languagesaturing.umcs.maine.edu/~Meadow/Courses/Cos301/Cos301-3.pdfProgramming Languages Chapter 3 Topics • Introduction ... The Chomsky Hierarchy

Example derivation

• A = B * ( A + C )<assign> => <id> = <expr>

=> A = <expr>=> A = <id> * <expr>=> A = B * <expr>=> A = B * ( <expr> )=> A = B * ( <id> + <expr> )=> A = B * ( A + <expr> )=> A = B * ( A + <id> )=> A = B * ( A + C )

Ambiguity in Grammars

• A grammar is ambiguous when it generates a sentential form that has two or more distinct parse trees

An ambiguous grammar

• Simple assignment statements<assign> ::= <id> = <expr><id> ::= A | B | C<expr> ::= <expr> + <expr>

| <expr> * <expr>| ( <expr> )| <id>

Ambiguity

A small difference

• Ambiguous<assign> ::= <id> = <expr><id> ::= A | B | C<expr> ::= <expr> + <expr>

| <expr> * <expr>| ( <expr> )| <id>

• Not ambiguous<assign> ::= <id> = <expr><id> ::= A | B | C<expr> ::= <id> + <expr>

| <id> * <expr>| ( <expr> )| <id>

What causes ambiguity?

• In the example above the unambiguous grammar allows the expression to grow only on the right

• Ambiguity is actually undecidable but there are some useful indicators such as the presence of more than one leftmost or rightmost derivation

• Parsers can use extra-grammatical information to correct ambiguity

Page 7: COS 301 Chapter 3 Topics Programming Languagesaturing.umcs.maine.edu/~Meadow/Courses/Cos301/Cos301-3.pdfProgramming Languages Chapter 3 Topics • Introduction ... The Chomsky Hierarchy

An Unambiguous Expression Grammar

• If we use the parse tree to indicate precedence levels of the operators, we cannot have ambiguity

<expr> <expr> - <term> | <term><term> <term> / const| const

<expr>

<expr> <term>

<term> <term>

const const

const/

-

Precedence of Operators

• Operator a has higher precedence than operator b if operator a should be evaluated before operator b in all parenthesis-free expressions involving only the two operators– Ex: 5 * 4 + 3 = 23 5 + 4 * 3 = 17

• “Evaluated before” means lower in the parse tree

No Precedence (right to left evaluation)

• In this grammar any parse tree with multiple operators has the rightmost operator lowest in the tree<assign> ::= <id> = <expr><id> ::= A | B | C<expr> ::= <id> + <expr>

| <id> * <expr>| ( <expr> )| <id>

• In A + B * C multiplication will be first• In A * B + C addition will be first

Precedence of C++ Operators -1Precedence Operator Description Example Associativity 1 :: Scoping operator Class::age = 2; none

2

() [] -> . ++ --

Grouping operator Array access Member access from a pointer Member access from an object Post-increment Post-decrement

(a + b) / 4; array[4] = 2; ptr->age = 34; obj.age = 34; for( i = 0; i < 10; i++ ) ... for( i = 10; i > 0; i-- ) ...

left to right

3

! ~ ++ -- - + * & (type) sizeof

Logical negation Bitwise complement Pre-increment Pre-decrement Unary minus Unary plus Dereference Address of Cast to a given type Return size in bytes

if( !done ) ... flags = ~flags; for( i = 0; i < 10; ++i ) ... for( i = 10; i > 0; --i ) ... int i = -1; int i = +1; data = *ptr; address = &obj; int i = (int) floatNum; int size = sizeof(floatNum);

right to left

4 ->* .*

Member pointer selector Member object selector

ptr->*var = 24; obj.*var = 24; left to right

5 * / %

Multiplication Division Modulus

int i = 2 * 4; float f = 10 / 3; int rem = 4 % 3;

left to right

Precedence of C++ Operators -2

6 + -

Addition Subtraction

int i = 2 + 3; int i = 5 - 1; left to right

7 << >>

Bitwise shift left Bitwise shift right

int flags = 33 << 1; int flags = 33 >> 1; left to right

8

< <= > >=

Comparison less-than Comparison less-than-or-equal-to Comparison greater-than Comparison geater-than-or-equal-to

if( i < 42 ) ... if( i <= 42 ) ... if( i > 42 ) ... if( i >= 42 ) ...

left to right

9 == !=

Comparison equal-to Comparison not-equal-to

if( i == 42 ) ... if( i != 42 ) ... left to right

10 & Bitwise AND flags = flags & 42; left to right 11 ^ Bitwise exclusive OR flags = flags ̂ 42; left to right 12 | Bitwise inclusive (normal) OR flags = flags | 42; left to right

Precedence of C++ Operators -3

13 && Logical AND if( conditionA && conditionB ) ...

left to right

14 || Logical OR if( conditionA || conditionB ) ... left to right

15 ? : Ternary conditional (if-then-else) int i = (a > b) ? a : b; right to left

16

= += -= *= /= %= &= ^= |= <<= >>=

Assignment operator Increment and assign Decrement and assign Multiply and assign Divide and assign Modulo and assign Bitwise AND and assign Bitwise exclusive OR and assign Bitwise inclusive (normal) OR and assign Bitwise shift left and assign Bitwise shift right and assign

int a = b; a += 3; b -= 4; a *= 5; a /= 2; a %= 3; flags &= new_flags; flags ^= new_flags; flags |= new_flags; flags <<= 2; flags >>= 2;

right to left

17 , Sequential evaluation operator for( i = 0, j = 0; i < 10; i++, j++ ) ...

left to right

Page 8: COS 301 Chapter 3 Topics Programming Languagesaturing.umcs.maine.edu/~Meadow/Courses/Cos301/Cos301-3.pdfProgramming Languages Chapter 3 Topics • Introduction ... The Chomsky Hierarchy

Associativity

• Associativity specifies whether operators of equal precedence should be evaluated left-to-right or right-to-left– Ex: (left) 5 - 4 - 3 = 1 - 3 = -2– (right) 2 ** 3 ** 3 = 2 ** 9 = 512

Associativity

• A grammar can be used to define both associativity and precedence among the operators in an expression. Consider the conventonal rules:+ and - are left-associative operators*, %, and / are left associative but have higher precedence than +

and –Exponentiation (^) is right associative and has the highest precedence

• Consider this grammar G:Expr -> Expr + Term | Expr – Term | TermTerm -> Term * Factor | Term / Factor

| Term % Factor | FactorFactor -> Primary ** Factor | PrimaryPrimary -> 0 | ... | 9 | ( Expr )

Parse tree for 4**2**3+5*6+7 Determining precedence and associativity

• Precedence is determined by the length of the shortest derivation from start symbol to operator– Shorter derivations have lower precedence

• Associativity is determined by use of left or right recursion– Left

Expr Expr + Term | Expr - Term | Term

– RightFactor Primary ^ Factor | Primary

Some design choices

• C++ has 17 distinct levels of precedence, Java has 16, C has 15 – In all three languages some operators associate to

the left and others to the righta = b < c ? * p + b * c : 1 << d ()

– Perhaps too many?

• Pascal has only 5a <= 0 or 100 <= a Error!– Perhaps too few?

Some design choices - 2

• Smalltalk has no precedence and all operators are left-associative– (minor aside: Smalltalk actually does not have operators at

all, but a number of binary message characters are defined in the grammar. The meaning of + is determined by the classes that implement it)

• APL has no precedence and all operators are right-associative

Page 9: COS 301 Chapter 3 Topics Programming Languagesaturing.umcs.maine.edu/~Meadow/Courses/Cos301/Cos301-3.pdfProgramming Languages Chapter 3 Topics • Introduction ... The Chomsky Hierarchy

Where syntax meets semantics

• Parse trees are where syntax meets semantics• We want the structure of the parse tree to

correspond to the semantics of the string it generates

The Dangling Else ProblemIfStatement -> if ( Expression ) Statement

| if ( Expression ) Statement else Statement

Statement -> Assignment | IfStatement| Block

Block -> { Statements }

Statements -> Statements Statement | Statement

Example

• With which ‘if’ does the following ‘else’associate

if (x < 0)if (y < 0) y = y - 1;else y = 0;

• Answer: either one!

Parse Trees

Solving the dangling else problem

1. Algol 60, C, C++, Pascal: associate each elsewith closest if; use { } or begin…end to override.

2. Algol 68, Modula, Ada, Visual Basic: use explicit delimiter to end every conditional (e.g., if…fi)

3. Java: rewrite the grammar to limit what can appear in a conditional:

IfThenStatement -> if ( Expression ) StatementIfThenElseStatement -> if ( Expression ) StatementNoShortIf

else Statement

The category StatementNoShortIf includes all except IfThenStatement.

Audiences

• Grammars are means of communicating information to an audience– Programmers want to find out what legal programs

look like – Implementers want an exact, detailed definition– Tools such parser and scanner generators need an

exact, detailed definition in a particular, machine-readable form

– Tools often need ambiguity eliminated, while people often prefer a more readable grammar

• Grammars therefore can vary with the audience

Page 10: COS 301 Chapter 3 Topics Programming Languagesaturing.umcs.maine.edu/~Meadow/Courses/Cos301/Cos301-3.pdfProgramming Languages Chapter 3 Topics • Introduction ... The Chomsky Hierarchy

Levels of Precedence and Complexity

• C, C++, and Java have a large number of operators and precedence levels

• For each precedence level we need to introduce a new non-terminal

• Grammar can get large and difficult to read• Instead of using a large grammar, we can:

– Write a smaller ambiguous grammar, and– Specify precedence and associativity separately

Extended BNF

• BNF was developed in the late 1950’s; still very widely used

• However the original BNF has a few minor inconveniences such as recursion instead of iteration and verbose selection syntax– Note that for some applications such as recursive

descent parsing, left recursion is forbidden

• Extended BNF (EBNF) increases readability and writability– Expressive power is unchanged: still CFGs

• Several variations exist

Extended BNF: Optional parts

• Optional parts are placed in brackets [ ]<proc_call> → ident ([<expr_list>])

• Replaces<proc_call> → ident()

| → ident (<expr_list>)

Extended BNF: Alternative RHS

• Alternative parts of RHSs are placed inside parentheses and separated via vertical bars <term> → <term> (+|-) const

• Replaces <term> → <term> + const

| <term> - const

Extended BNF: Recursion

• Repetitions (0 or more) are placed inside braces { }<ident> → letter {letter|digit}

• Replaces<ident> → letter

| <ident> letter| <ident> digit

BNF and EBNF

• BNF<expr> <expr> + <term>

| <expr> - <term>| <term>

<term> <term> * <factor>| <term> / <factor>| <factor>

• EBNF<expr> <term> {(+ | -) <term>}<term> <factor> {(* | /) <factor>}

Page 11: COS 301 Chapter 3 Topics Programming Languagesaturing.umcs.maine.edu/~Meadow/Courses/Cos301/Cos301-3.pdfProgramming Languages Chapter 3 Topics • Introduction ... The Chomsky Hierarchy

EBNF and Associativity

• Note that the production:Expr -> Term { ( + | - ) Term }

does not seem to specify the left associativitythat we have in Expr -> Expr + Term | Expr + Term | Term

• In EBNF left recursion is usually assumed. – Explicit recursion is used for right associative

operators– Some EBNF grammars may specify associativity

outside of the grammar

Recent Variations in EBNF

• Alternative RHSs are put on separate lines• Use of a colon instead of =>• Use of opt for optional parts• Use of oneof for choices

EBNF to BNF• We can always rewrite an EBNF grammar as a BNF

grammar. E.g.,A -> x { y } z

• can be rewritten:A -> x A' zA' -> | y A'

• Note is the standard symbol used in grammars for the “empty string”

• Rewriting EBNF rules with ( ), [ ] can be done in a similar fashion

• While EBNF is no more powerful than BNF, its rules are often simpler and clearer for human readers

Syntax Diagrams

• Similar to EBNF• Introduced by Jensen and Wirth with Pascal in

1975

Ex: Expressions with addition A More Complex Example

Page 12: COS 301 Chapter 3 Topics Programming Languagesaturing.umcs.maine.edu/~Meadow/Courses/Cos301/Cos301-3.pdfProgramming Languages Chapter 3 Topics • Introduction ... The Chomsky Hierarchy

An Expression Grammar

From http://en.wikipedia.org/wiki/Syntax_diagram

Static Semantics

• Context-free grammars (CFGs) cannot describe all of the syntax of programming languages

• “Static semantics” has only an indirect relationship with meaning– Static semantic rules deal with the legal form of

programs (syntax)– Most rules deal with typing systems– Called “static” because analysis is done at compile

time

• Dynamic semantics describe the meaning (runtime behavior) of a program

Static Semantics

• Typical items for static semantic analysis:– Type of RHS of an expression must match the type

of the LHS (lvalue) – All variables must be declared before being

referenced

• First restriction can be expressed in BNF but only in a very cumbersome way

• The second restriction cannot be expressed in BNF– Consider the common-sense meaning of “context

free” and “context sensitive”

Attribute Grammars

• Attribute Grammars (AGs) were developed by Donald Knuth 1968

• AGs are additions to CFGs to carry some semantic information on parse tree nodes

• CFG plus – Attributes

• Associated with terminals and non-terminals, similar to variables in that values can be assigned

– attribute computation functions• Aka semantic functions, associated with grammatical

rules, specify how attribute values are computed– predicate functions

• State semantic rules; associated with grammatical rules

Attribute Grammars : Definition

• Def: An attribute grammar is a context-free grammar G = {P,T,N,S} with the following additions:– For each grammar symbol x in N,T there is a set

A(x) of attribute values• A(x) consists of two disjoint sets S(x) and I(x) called

synthesized attrbutes and inherited attributes

– Each rule has a set of functions that define certain attributes of the nonterminals in the rule

– Each rule has a (possibly empty) set of predicates to check for attribute consistency

Synthesized Attributes

• Synthesized attributes are used to pass semantic information UP in parse tree– Synthesized = computed– For a grammar rule of the form X0-> X1 . . . Xn the

synthesized attributes of X0 are computed as a function f(A(X1) . . . A(X1n) )

– The value of a synthesized attribute therefore depends only on the value of the attributes of that nodes children

Page 13: COS 301 Chapter 3 Topics Programming Languagesaturing.umcs.maine.edu/~Meadow/Courses/Cos301/Cos301-3.pdfProgramming Languages Chapter 3 Topics • Introduction ... The Chomsky Hierarchy

Inherited Attributes

• Inherited attributes are used to pass semantic information DOWN the parse tree– Child nodes inherit from the parent– Synthesized = computed– For a grammar rule of the form X0-> X1 . . . Xn the

inherited attributes of Xj are computed as a function f(A(X0) . . . A(Xj-1) )

– The value of an inherited attribute therefore depends only on the value of the attributes of the parent and (usually) the left siblings

Predicate Functions

• A predicate is a Boolean expression on the union of the attribute set {A(X0) . . . A(Xn) } and a set of literal values

• The only derivations allowed in an attribute grammar are those in which every predicate associated with a nonterminal is true.

• A false predicate indicates a rule violation

Attributed (decorated) parse trees

• The parse tree has a possibly empty set of attributes attached to each node.

• When all attributes have been computed the tree is fully attributed or decorated

• Conceptually you think of the parse as producing a parse tree, then attribute values are computed in a second pass

Intrinsic Attributes

• Are synthesized attributes whose values are determined outside the parse tree– Example: type of a variable instance is taken from

the symbol table. – Contents of the symbol table are determined by

declaration statements– Initially given an unattributed parse tree, the only

values with attributes are the intrinsic attributes of the leaf nodes

– Given the intrinsic values, the semantic functions can compute the remaining attribute values

Attribute Grammars: Definition

• Let X0 X1 ... Xn be a rule• Functions of the form S(X0) = f(A(X1), ... ,

A(Xn)) define synthesized attributes• Functions of the form I(Xj) = f(A(X0), ... ,

A(Xn)), for i <= j <= n, define inherited attributes

• Initially, there are intrinsic attributes on the leaves

Attribute Grammars: An Example

• Consider the Ada rule that name on endof a procedure statement must match the procedure name

• Syntax rule:<proc_def> -> procedure <proc_name>[1]<proc_body> end <proc_name>[2]

• Predicate:<proc_name>[1].string == <proc_name>[2].string

Page 14: COS 301 Chapter 3 Topics Programming Languagesaturing.umcs.maine.edu/~Meadow/Courses/Cos301/Cos301-3.pdfProgramming Languages Chapter 3 Topics • Introduction ... The Chomsky Hierarchy

Attribute Grammar: Example 2 Example 2

• Actual type– A synthesized attribute associated with <expr> and

<var>.– Intrinsic for <var>– Synthesized from children of <expr>

• Expected type– Inherited, associated with non-terminal <expr>– Stores type expected for expression as determined

by the type of the variable on the left hand side of the assignment

Parse tree of A = A + B Computing Attribute values

• How are attribute values computed?– If all attributes were inherited, the tree could be

decorated in top-down order.– If all attributes were synthesized, the tree could be

decorated in bottom-up order.– In many cases, both kinds of attributes are used,

and it is some combination of top-down and bottom-up that must be used.

• The general case is a complex problem that requires the construction of a dependency graph to determine evaluation order

Decorating the tree

1. <var>.actual_type <- lookup(A) [Rule 4]

2. <expr>.exp_type <- <var>.actual_type [Rule 1]

3. <var>[2].actual_type <- lookup(A) [Rule 4]4. <var>[3].actual_type <- lookup(B) [Rule 4]

5. <expr>.actual_type <- (int | real)[Rule 2]

6. <expr>.exp_type == <expr>.actual_type is either TRUE or FALSE (Rule 2)

Decorating

Page 15: COS 301 Chapter 3 Topics Programming Languagesaturing.umcs.maine.edu/~Meadow/Courses/Cos301/Cos301-3.pdfProgramming Languages Chapter 3 Topics • Introduction ... The Chomsky Hierarchy

The decorated tree Semantics

• There is no single widely acceptable notation or formalism for describing semantics

• Several needs for a methodology and notation for semantics:– Programmers need to know what statements mean– Compiler writers must know exactly what language constructs

do– Correctness proofs would be possible– Compiler generators would be possible– Designers could detect ambiguities and inconsistencies

Operational Semantics

• Operational Semantics– Describe the meaning of a program by executing its

statements on a machine, either simulated or actual. The change in the state of the machine (memory, registers, etc.) defines the meaning of the statement

• To use operational semantics for a high-level language, a virtual machine is needed

Operational Semantics

• Whatever happens when the program is compiled by compiler C and run on machine M

• A few problems with this– Architectural dependency reduces usefulness for

people working with a different architecture– Requires a precise semantic definition of machine M– Not useful as a basis for standardization

Operational Semantics (continued)

• Uses of operational semantics:- Language manuals and textbooks- Teaching programming languages

• Two different levels of uses of operational semantics:- Natural operational semantics- Structural operational semantics

• Evaluation- Good if used informally (language manuals, etc.)- Extremely complex if used formally (e.g.,VDL)

Denotational Semantics

• Based on recursive function theory• The most abstract semantics description

method• Originally developed by Scott and Strachey

(1970)

• Each type of statement in the abstract syntax is defined as a state-transforming function

• A program is a collection of functions operating on the program state

Page 16: COS 301 Chapter 3 Topics Programming Languagesaturing.umcs.maine.edu/~Meadow/Courses/Cos301/Cos301-3.pdfProgramming Languages Chapter 3 Topics • Introduction ... The Chomsky Hierarchy

Denotational Semantics - continued

• The process of building a denotationalspecification for a language:- Define a mathematical object for each language

entity– Define a function that maps instances of the

language entities onto instances of the corresponding mathematical objects

• The meaning of language constructs are defined by only the values of the program's variables

Evaluation of Denotational Semantics

• Can be used to prove the correctness of programs

• Provides a rigorous way to think about programs

• Can be an aid to language design• Has been used in compiler generation systems • Because of its complexity, it are of little use to

language users

Axiomatic Semantics

• Based on formal logic (predicate calculus)• Original purpose: formal program verification• Axioms or inference rules are defined for each

statement type in the language (to allow transformations of logic expressions into more formal logic expressions)

• The logic expressions are called assertions

Axiomatic Semantics

• Given a formal specification of a program P it should be possible to mechanically prove that P is correct by deriving its semantics from the axioms

• Nice in theory, but very difficult and tedious in practice

• Some recent tools support axiomatic semantics:– Java Modeling Language (JML)– Haskell– Spark/Ada

Evaluation of Axiomatic Semantics

• Developing axioms or inference rules for all of the statements in a language is difficult

• It is a good tool for correctness proofs, and an excellent framework for reasoning about programs, but it is not as useful for language users and compiler writers

• Its usefulness in describing the meaning of a programming language is limited for language users or compiler writers