

Semantic Analysis and Attribute Evaluation
16 and 21 Sept. 2020

========================================
Static Analysis

Recall that static semantics are enforced at compile time, and dynamic
semantics are enforced at run time. Some things have to be dynamic
semantics because of late binding (discussed in Chap. 3): we lack the
necessary info (e.g. input values) at compile time, or inferring what we
want is uncomputable.

A smart compiler may avoid run-time checks when it is able to verify
compliance at compile time. This makes programs run faster.

    array bounds
    variant record tags
    dangling references

Similarly, a conservative code improver will apply optimizations only
when it knows they are safe

    alias analysis
        caching in registers
        computation out of order or in parallel
    escape analysis
        limited extent
        non-synchronized
    subtype analysis
        static dispatch of virtual methods

An optimistic compiler may
- generate multiple versions with a dynamic check to dispatch
- always use the "optimized" version if it's speculative --
  always safe and usually fast
      prefetching
      trace scheduling
- always start with the "optimized" version but check along the way to
  make sure it's safe, and be prepared to roll back
      transactional memory

Alternatively, language designer may tighten rules
    type checking in ML v. Lisp (cons: 'a * 'a list -> 'a list)
    definite assignment in Java/C# v. C

----------------------------------------

As noted in Chap. 1, job of semantic analyzer is to
(1) enforce rules
(2) connect the syntax of the program (as discovered by the parser) to
    something else that has semantics (meaning) – e.g.,
        value for constant expressions
        code for subroutines

This work can be interleaved with parsing in a variety of ways.
- At one extreme: build an explicit parse tree, then call the semantic
  analyzer as a separate pass.
- At the other extreme, perform all static and dynamic checks and generate
  intermediate form while parsing, using action routines called
  from the parser.
- The most common approach today is intermediate: use action routines
  to build an AST, then perform semantic analysis on each top-level
  AST fragment (class, function) as it is completed.

We'll focus on this intermediate approach. But first, it's instructive
to see how we could build an explicit parse tree if we wanted.
This will help motivate the code to build an AST.

recursive descent
    each routine returns its subtree
table-driven top-down
    push markers at end of production
    each, when popped, pulls k subtrees off separate attribute stack
    and pushes new subtree, where k is length of RHS

1: E → T TT
2: TT → ao T TT
3: T → F FT
4: FT → mo F FT
5: F → ( E )
6: F → id
7: F → lit

(A + 1) * B

So how do we build a syntax tree instead?
Start with RD
    requires passing some stuff into RD routines

AST_node expr():
    case input_token of
    id, literal, ( :
        T := term()
        return term_tail(T)
    else error

AST_node term_tail(T1):
    case input_token of
    +, - :
        O := add_op()
        T2 := term()
        N := new node(O, T1, T2)
        return term_tail(N)
    ), id, read, write, $$ :
        return T1    // epsilon
    else error
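For concreteness, here is a hedged sketch of those routines in runnable Python, extended with term/factor so it runs end to end. The Node class, the (kind, text) token encoding, and the routine bodies beyond expr/term_tail are my additions, not from the notes.

```python
# Hypothetical Python rendering of the RD AST-building routines above.
# Tokens are (kind, text) pairs; Node is a generic AST node.

class Node:
    def __init__(self, op, *children):
        self.op, self.children = op, children
    def __repr__(self):
        if not self.children:
            return self.op
        return f"({self.op} {' '.join(map(repr, self.children))})"

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + [("$$", "$$")]
        self.pos = 0
    def peek(self):
        return self.tokens[self.pos][0]
    def match(self, kind):
        assert self.peek() == kind, f"expected {kind}, saw {self.peek()}"
        tok = self.tokens[self.pos]; self.pos += 1
        return tok
    def expr(self):                      # E -> T TT
        if self.peek() in ("id", "lit", "("):
            return self.term_tail(self.term())
        raise SyntaxError(self.peek())
    def term_tail(self, t1):             # TT -> ao T TT | epsilon
        if self.peek() in ("+", "-"):
            op = self.match(self.peek())[0]
            return self.term_tail(Node(op, t1, self.term()))
        return t1                        # epsilon: ), $$, etc. follow
    def term(self):                      # T -> F FT
        return self.factor_tail(self.factor())
    def factor_tail(self, f1):           # FT -> mo F FT | epsilon
        if self.peek() in ("*", "/"):
            op = self.match(self.peek())[0]
            return self.factor_tail(Node(op, f1, self.factor()))
        return f1
    def factor(self):                    # F -> ( E ) | id | lit
        if self.peek() == "(":
            self.match("("); e = self.expr(); self.match(")")
            return e
        return Node(self.match(self.peek())[1])

toks = [("(", "("), ("id", "A"), ("+", "+"), ("lit", "1"),
        (")", ")"), ("*", "*"), ("id", "B")]
print(Parser(toks).expr())               # (* (+ A 1) B)
```

Note that passing T1 into term_tail is exactly the "passing some stuff into RD routines" mentioned above: the left operand built so far threads through the tail routine.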

It's standard practice to express the extra code as action routines in
the CFG:

E → T { TT.st := T.n } TT { E.n := TT.n }

TT1 → ao T { TT2.st := make_bin_op(ao.op, TT1.st, T.n) } TT2 { TT1.n := TT2.n }

TT → ε { TT.n := TT.st }

T → F { FT.st := F.n } FT { T.n := FT.n }

FT1 → mo F { FT2.st := make_bin_op(mo.op, FT1.st, F.n) } FT2 { FT1.n := FT2.n }

FT → ε { FT.n := FT.st }

F → ( E ) { F.n := E.n }

F → id { F.n := id.n } // id.n comes from scanner

F → lit { F.n := lit.n } // as does lit.n

Here the subscripts distinguish among instances of the same symbol in a
given production.
The .n and .st suffixes are attributes (fields) of symbols.
I've elided the ao and mo productions.

See how this handles, for example, (A + 1) * B :

A parser generator like ANTLR can turn the grammar w/ action routines
into an RD parser that builds a syntax tree.

It's also straightforward to turn that grammar into a table-driven TD parser.
    Give each action routine a number
    Push these into the stack along with other RHS symbols
    Execute them as they are encountered. That is:
    - match terminals
    - expand nonterminals by predicting productions
    - execute action routines
        e.g., by calling a do_action(#) routine with a big switch
        statement inside

requires space management for attributes; companion site (Sec. 4.5.2) explains
how to maintain that space automatically
    extension of the attribute stack we used to build a parse tree above
    space for all symbols of all productions on path from root
    to current top-of-parse-stack symbol
    - when predict, push space for all symbols of RHS
    - maintain lhs and rhs indices into the stack
    - at end of production, pop space used by RHS; update lhs and rhs indices

========================================
Decorating a Syntax Tree

The calculator language we've been using for examples doesn't have
sufficiently interesting semantics.
Consider an extended version with types and declarations:

program → stmt_list $$
stmt_list → decl stmt_list | stmt stmt_list | ε
decl → int id | real id
stmt → id := expr | read id | write expr
expr → term term_tail
term_tail → add_op term term_tail | ε
term → factor factor_tail
factor_tail → mul_op factor factor_tail | ε
factor → ( expr ) | id | int_const | real_const
         | float ( expr ) | trunc ( expr )
add_op → + | –
mul_op → * | /

Now we can
- require declaration before use
- require type match on arithmetic ops

We could do some of this checking while building the AST.
We could even do it while building an explicit parse tree.

The more common strategy is to implement checks once the AST is built
    easier -- tree has nicer structure
    more flexible -- can accommodate non-depth-first left-to-right traversals
        - mutually recursive definitions
          e.g., methods of a class in most languages
        - type inference based on use
        - switch statement label checking
        etc.

Assume the parser builds the AST and tags every node with a source location

Tagging of tree nodes is annotation
    inside the compiler, tree nodes are structs
    annotations and pointers to children are fields
(annotation can also be done to an explicit parse tree; we'll stick to ASTs)

But first: what do we want the AST to look like?
One appealing way to specify it is a tree grammar.
Each "production" of tree grammar has parent on LHS and children on RHS.
This is not for parsing; it's to describe the trees that
- we want the parser to build
- we need to annotate

Example for the extended calculator language:

program → item
int_decl : item → id item    // item is next decl or stmt
real_decl : item → id item
assign : item → id expr item
read : item → id item
write : item → expr item
null : item → ε
‘+’ : expr → expr expr
‘-’ : expr → expr expr
‘*’ : expr → expr expr
‘/’ : expr → expr expr
float : expr → expr
trunc : expr → expr
id : expr → ε    // no children
int_const : expr → ε
real_const : expr → ε

The A:B syntax on the left means that A is one kind of a B, and may appear
wherever a B is expected on a RHS.

Note that "program → item" does not mean that a program "is" an item
(the way it does in a CFG), but merely that a program node in a syntax tree
has one child, which is an item.
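Inside a compiler these node kinds become structs or classes, and the A:B relationships become subtyping. Here is a hedged Python sketch covering part of the tree grammar; the class and field names are my own, not from the notes.

```python
# Hypothetical encoding of part of the tree grammar as Python dataclasses.
# "int_decl : item -> id item" becomes IntDecl(Item) with the declared name
# and a pointer to the next item; leaf productions get no child fields.
from dataclasses import dataclass

class Item: pass                # any node in the decl-or-stmt chain
class Expr: pass

@dataclass
class Program:
    child: Item                 # program -> item: exactly one child

@dataclass
class IntDecl(Item):
    name: str                   # the declared id
    next: Item                  # rest of the decls/stmts

@dataclass
class Assign(Item):
    name: str
    value: Expr
    next: Item

@dataclass
class Null(Item):               # null : item -> epsilon (end of chain)
    pass

@dataclass
class BinOp(Expr):              # '+','-','*','/' : expr -> expr expr
    op: str
    left: Expr
    right: Expr

@dataclass
class Id(Expr):                 # id : expr -> epsilon (leaf)
    name: str

@dataclass
class IntConst(Expr):           # int_const : expr -> epsilon (leaf)
    value: int

# int a ; a := a + 1
tree = Program(IntDecl("a",
               Assign("a", BinOp("+", Id("a"), IntConst(1)), Null())))
```

The subclassing enforces the A:B rule statically: an IntDecl may appear wherever an Item is expected, but not where an Expr is.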

Here's a syntax tree for a tiny program.
Structure is given by the tree grammar. Construction would be via execution of
appropriate action routines embedded in a CFG.

Remember: tree grammars are not CFGs.
Language for a CFG is the set of possible fringes of parse trees.
Language for a tree grammar is the set of possible whole trees.
No comparable notion of parsing: structure of tree is self-evident.

Our tree grammar helps guide us as we write (by hand) the action
routines to build the AST.

It can also help guide us in writing recursive tree-walking routines to
perform semantic checks and (later) generate mid-level intermediate code
(next lecture).

- Helpful to augment the tree grammar with semantic rules that
  describe relationships among annotations of parent and children.
- Semantic rules are like action routines, but without explicit
  specification of what is executed when.

A CFG or tree grammar with semantic rules is an attribute grammar (AG).
Not used much in production compilers, but useful for prototyping (e.g., the
first validated Ada implementation [Dewar et al., 1980]) and in some cool
language-based tools
- syntax-directed editing [Reps, 1984]
- parallel CSS [Meyerovich et al., 2013]

The book goes into a bit of AG theory, talking about
    synthesized attributes (depend only on information below the current
        node in the tree)
    inherited attributes (depend at least in part on info from above or
        to the side)

Remember that an AG doesn't actually specify the order in which rules
should be evaluated. There exist tools to figure that out, and a rich theory
of classes of grammars with varying attribute flow (non-circular, circular
but converging, ...)

When basing an AG on a CFG, it's desirable to have attribute flow that's
consistent with the order in which the parser builds the tree
    bottom-up parsers need S-attributed grammars -- all attributes are
    synthesized
    top-down parsers can use L-attributed grammars, which are a superset --
    attributes are synthesized or depend on stuff to the left

See the text for more info.

Our CFG w/ action routines to build the AST could be written as an AG by
making each action routine a semantic rule and then listing the rules for
each production w/out actually embedding them in the RHS.

For something as simple as AST construction, not having to specify what
is done when isn't much of a savings – a tool to find an evaluation
order consistent w/ attribute flow wouldn't be useful (it was useful in
the tools mentioned above).

In practice, people do hand-written tree walk on ASTs.
Book gives extended example for declaration and type checking in
extended calculator grammar.
Written as a pure AG, with following attributes:

program
    errors - list of all static semantic errors
             (type clash, undefined/redefined names)
item, expr
    symtab - list with types of all names declared to left
item
    errors_in - list of all static semantic errors to left
    errors_out - list of all static semantic errors through here
expr
    type
    errors - list of all static semantic errors inside
everything
    location

More common to make symbol table and error lists global variables
    insert errors, as found, into a list or tree, sorted by source location
    for symtab, label each construct with list of active scopes
        look up <name, scope> pairs, starting with closest scope
    for calculator language, which has no scopes, can enforce
    declare-before-use in a simple left-to-right traversal of the tree
        - complain at any re-definition
        - or any use w/out prior definition
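For a language that does have scopes, the <name, scope> lookup might look like this sketch; the class and method names are illustrative assumptions, not from the notes.

```python
# Hypothetical scoped symbol table: declarations are entered under the
# scope in which they appear; lookup walks the currently active scopes
# from innermost (closest) to outermost.

class SymTab:
    def __init__(self):
        self.entries = {}            # (name, scope_id) -> type
        self.scopes = [0]            # stack of active scope ids
        self.next_id = 1
    def open_scope(self):
        self.scopes.append(self.next_id); self.next_id += 1
    def close_scope(self):
        self.scopes.pop()
    def insert(self, name, ty):
        self.entries[(name, self.scopes[-1])] = ty
    def lookup(self, name):
        for scope in reversed(self.scopes):   # closest scope first
            if (name, scope) in self.entries:
                return self.entries[(name, scope)]
        return None                           # undefined

st = SymTab()
st.insert("a", "int")        # outer a
st.open_scope()
st.insert("a", "real")       # inner a shadows it
print(st.lookup("a"))        # real
st.close_scope()
print(st.lookup("a"))        # int
```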

To avoid cascading errors, it's common to have an "error" value for an
attribute that means "I already complained about this."
So, for example, in

    int a
    real b
    int c
    a := b + c

we label the '+' tree node with type "error" so we don't generate a
second message for the ":=" node.

A few example rules (with error list and symtab as globals):

int_decl : item1 → id item2    // item2 is rest of program
    ▷ if <id.name, ?> ∈ symtab
          errors.insert("redefinition of" id.name, item1.location)
      else
          symtab.insert(<id.name, int>)

id : expr → ε
    ▷ if <id.name, A> ∈ symtab
          expr.type := A
      else
          errors.insert(id.name "undefined", id.location)
          expr.type := error

‘+’ : expr1 → expr2 expr3
    ▷ if expr2.type = error or expr3.type = error
          expr1.type := error
      else if expr2.type <> expr3.type
          expr1.type := error
          errors.insert("type clash", expr1.location)
      else
          expr1.type := expr2.type
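These rules translate directly into a hand-written tree walk. A hedged Python sketch, with symtab and errors as globals and tuple-encoded nodes; only the node kinds needed for the int a / real b / int c example above are handled, and the encoding is my own.

```python
# Hypothetical left-to-right tree walk enforcing the rules above; the
# "error" type suppresses cascading complaints.

symtab = {}          # name -> "int" | "real"; calculator has no scopes
errors = []

def check_item(node):
    kind = node[0]
    if kind in ("int_decl", "real_decl"):          # (kind, name, next)
        _, name, nxt = node
        if name in symtab:
            errors.append(f"redefinition of {name}")
        else:
            symtab[name] = kind.split("_")[0]      # "int" or "real"
        check_item(nxt)
    elif kind == "assign":                         # ("assign", name, expr, next)
        _, name, e, nxt = node
        t, vt = check_expr(e), symtab.get(name, "error")
        if "error" not in (t, vt) and t != vt:
            errors.append("type clash in :=")      # suppressed if e is poisoned
        check_item(nxt)
    elif kind == "null":
        pass

def check_expr(node):
    kind = node[0]
    if kind == "id":                               # ("id", name)
        if node[1] in symtab:
            return symtab[node[1]]
        errors.append(f"{node[1]} undefined")
        return "error"
    if kind in ("+", "-", "*", "/"):               # (op, left, right)
        lt, rt = check_expr(node[1]), check_expr(node[2])
        if "error" in (lt, rt):
            return "error"                         # already complained below
        if lt != rt:
            errors.append("type clash")
            return "error"                         # poison to stop cascades
        return lt

# int a ; real b ; int c ; a := b + c
prog = ("int_decl", "a", ("real_decl", "b", ("int_decl", "c",
        ("assign", "a", ("+", ("id", "b"), ("id", "c")), ("null",)))))
check_item(prog)
print(errors)          # ['type clash'] -- the '+' clash; ':=' stays quiet
```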

The right-pointing triangle here is meant to introduce a semantic rule.
(This is not standard notation, but it matches what's in the text.)

In these particular cases there is only one rule per "production," but in a
more complicated grammar there could be many.

Formal AG notation would require no side effects (no globals) and would
specify each semantic rule as Si.ax := f(Sj.ay, ..., Sk.az) – e.g.,

▷ expr.type := if <id.name, A> ∈ symtab then A else error
▷ expr.errors := if <id.name, A> ∈ symtab then null
                 else [id.name "undefined at" id.location]

We can see how these rules would be enforced while walking the syntax
tree:

In a more complicated language, we might make multiple passes over the
tree – perhaps
• one to fill in the symbol table;
• a second to check types, check for undeclared names, match parameter
  lists to declarations, etc.; and
• a third to generate mid-level IF.

[Figure: syntax tree, structured per the tree grammar, for the program
below. The program node's item chain is int_decl(a), read(a),
real_decl(b), read(b), write, null; the write child is ÷, whose left
child is + (of float(a) and b) and whose right child is 2.0.]

    int a
    read a
    real b
    read b
    write (float (a) + b) / 2.0
