TRANSCRIPT
Semantic Analysis and Attribute Evaluation
16 and 21 Sept. 2020
========================================
Static Analysis
Recall that static semantics are enforced at compile time, and dynamic
semantics are enforced at run time. Some things have to be dynamic
semantics because of late binding (discussed in Chap. 3): we lack the
necessary info (e.g. input values) at compile time, or inferring what we
want is uncomputable.
A smart compiler may avoid run-time checks when it is able to verify
compliance at compile time. This makes programs run faster.
    array bounds
    variant record tags
    dangling references
Similarly, a conservative code improver will apply optimizations only
when it knows they are safe.
    alias analysis
        caching in registers
        computation out of order or in parallel
    escape analysis
        limited extent
        non-synchronized
    subtype analysis
        static dispatch of virtual methods
An optimistic compiler may
- generate multiple versions with a dynamic check to dispatch
- always use the "optimized" version if it's speculative --
  always safe and usually fast
      prefetching
      trace scheduling
- always start with the "optimized" version but check along the way to
  make sure it's safe, and be prepared to roll back
      transactional memory
Alternatively, the language designer may tighten the rules
    type checking in ML v. Lisp (cons : 'a * 'a list -> 'a list)
    definite assignment in Java/C# v. C
----------------------------------------
As noted in Chap. 1, the job of the semantic analyzer is to
(1) enforce rules
(2) connect the syntax of the program (as discovered by the parser) to
    something else that has semantics (meaning) – e.g.,
        value for constant expressions
        code for subroutines
This work can be interleaved with parsing in a variety of ways.
- At one extreme: build an explicit parse tree, then call the semantic
  analyzer as a separate pass.
- At the other extreme, perform all static semantic checks and generate
  intermediate form while parsing, using action routines called
  from the parser.
- The most common approach today is intermediate: use action routines
  to build an AST, then perform semantic analysis on each top-level
  AST fragment (class, function) as it is completed.
We'll focus on this intermediate approach. But first, it's instructive
to see how we could build an explicit parse tree if we wanted.
This will help motivate the code to build an AST.
recursive descent
    each routine returns its subtree
table-driven top-down
    push markers at end of production
    each, when popped, pulls k subtrees off a separate attribute stack
    and pushes a new subtree, where k is the length of the RHS
1: E  → T TT
2: TT → ao T TT | ε
3: T  → F FT
4: FT → mo F FT | ε
5: F  → ( E )
6: F  → id
7: F  → lit
(A + 1) * B
So how do we build a syntax tree instead?
Start with RD
    requires passing some stuff into RD routines
AST_node expr():
    case input_token of
        id, literal, ( :
            T := term()
            return term_tail(T)
        else error

AST_node term_tail(T1):
    case input_token of
        +, - :
            O := add_op()
            T2 := term()
            N := new node(O, T1, T2)
            return term_tail(N)
        ), id, read, write, $$ :
            return T1    // epsilon
        else error
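The routines above can be made runnable. The Python sketch below follows the same control structure; the `kind` token classifier, the tuple-shaped tree nodes, and the `peek`/`advance` helpers are assumptions for illustration, not code from the text.

```python
def kind(tok):
    # hypothetical scanner classification for this toy token set
    if tok.isalpha() and tok not in ("read", "write"):
        return "id"
    if tok.replace(".", "", 1).isdigit():
        return "lit"
    return tok                              # operator, paren, or $$

class Parser:
    def __init__(self, tokens):
        self.toks = tokens + ["$$"]
        self.i = 0

    def peek(self):
        return kind(self.toks[self.i])

    def advance(self):
        tok = self.toks[self.i]
        self.i += 1
        return tok

    def expr(self):                         # E -> T TT
        if self.peek() in ("id", "lit", "("):
            t = self.term()
            return self.term_tail(t)
        raise SyntaxError(self.peek())

    def term_tail(self, t1):                # TT -> ao T TT | epsilon
        if self.peek() in ("+", "-"):
            op = self.advance()             # add_op
            t2 = self.term()
            return self.term_tail((op, t1, t2))   # new node(op, t1, t2)
        if self.peek() in (")", "id", "read", "write", "$$"):
            return t1                       # epsilon
        raise SyntaxError(self.peek())

    def term(self):                         # T -> F FT
        return self.factor_tail(self.factor())

    def factor_tail(self, f1):              # FT -> mo F FT | epsilon
        if self.peek() in ("*", "/"):
            op = self.advance()
            f2 = self.factor()
            return self.factor_tail((op, f1, f2))
        return f1                           # epsilon (follow-set check elided)

    def factor(self):                       # F -> ( E ) | id | lit
        if self.peek() == "(":
            self.advance()
            e = self.expr()
            assert self.advance() == ")"
            return e
        return self.advance()               # id or lit leaf

tree = Parser(["(", "A", "+", "1", ")", "*", "B"]).expr()
# tree == ('*', ('+', 'A', '1'), 'B')
```

Note how threading the left operand through term_tail's parameter (the role the .st attribute plays in the action-routine version) is what makes chains like A - B - C come out left-associative.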
It's standard practice to express the extra code as action routines in
the CFG:
E → T { TT.st := T.n } TT { E.n := TT.n }
TT1 → ao T { TT2.st := make_bin_op(ao.op, TT1.st, T.n) } TT2 { TT1.n := TT2.n }
TT → ε { TT.n := TT.st }
T → F { FT.st := F.n } FT { T.n := FT.n }
FT1 → mo F { FT2.st := make_bin_op(mo.op, FT1.st, F.n) } FT2 { FT1.n := FT2.n }
FT → ε { FT.n := FT.st }
F → ( E ) { F.n := E.n }
F → id { F.n := id.n } // id.n comes from scanner
F → lit { F.n := lit.n } // as does lit.n
Here the subscripts distinguish among instances of the same symbol in a
given production.
The .n and .st suffixes are attributes (fields) of symbols.
I’ve elided the ao and mo productions.
See how this handles, for example, (A + 1) * B :
A parser generator like ANTLR can turn the grammar w/ action routines
into an RD parser that builds a syntax tree.
It's also straightforward to turn that grammar into a table-driven TD parser.
    Give each action routine a number
    Push these into the stack along with other RHS symbols
    Execute them as they are encountered. That is:
    - match terminals
    - expand nonterminals by predicting productions
    - execute action routines
        e.g., by calling a do_action(#) routine with a big switch
        statement inside
    requires space management for attributes; companion site (Sec. 4.5.2) explains
    how to maintain that space automatically
        extension of the attribute stack we used to build a parse tree above
        space for all symbols of all productions on path from root
        to current top-of-parse-stack symbol
        - when we predict, push space for all symbols of the RHS
        - maintain lhs and rhs indices into the stack
        - at end of production, pop space used by RHS; update lhs and rhs indices
========================================
Decorating a Syntax Tree
The calculator language we’ve been using for examples doesn’t have
sufficiently interesting semantics.
Consider an extended version with types and declarations:
program → stmt_list $$
stmt_list → decl stmt_list | stmt stmt_list | ε
decl → int id | real id
stmt → id := expr | read id | write expr
expr → term term_tail
term_tail → add_op term term_tail | ε
term → factor factor_tail
factor_tail → mul_op factor factor_tail | ε
factor → ( expr ) | id | int_const | real_const
       | float ( expr ) | trunc ( expr )
add_op → + | -
mul_op → * | /
Now we can
- require declaration before use
- require type match on arithmetic ops
We could do some of this checking while building the AST.
We could even do it while building an explicit parse tree.
The more common strategy is to implement checks once the AST is built
    easier -- tree has nicer structure
    more flexible -- can accommodate non-depth-first-left-to-right traversals
        - mutually recursive definitions
          e.g., methods of a class in most languages
        - type inference based on use
        - switch statement label checking
        etc.
Assume the parser builds the AST and tags every node with a source location.
Tagging of tree nodes is annotation
    inside the compiler, tree nodes are structs
    annotations and pointers to children are fields
(annotation can also be done to an explicit parse tree; we’ll stick to ASTs)
But first: what do we want the AST to look like?
One appealing way to specify it is a tree grammar.
Each "production" of a tree grammar has the parent on the LHS and children on the RHS.
This is not for parsing; it's to describe the trees that
- we want the parser to build- we need to annotate
Example for the extended calculator language:
program → item
int_decl : item → id item        // item is next decl or stmt
real_decl : item → id item
assign : item → id expr item
read : item → id item
write : item → expr item
null : item → ε
‘+’ : expr → expr expr
‘-’ : expr → expr expr
‘*’ : expr → expr expr
‘/’ : expr → expr expr
float : expr → expr
trunc : expr → expr
id : expr → ε                    // no children
int_const : expr → ε
real_const : expr → ε
The A : B syntax on the left means that A is one kind of B, and may appear
wherever a B is expected on an RHS.
Note that "program → item" does not mean that a program "is" an item
(the way it does in a CFG), but merely that a program node in a syntax tree
has one child, which is an item.
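One way (an assumption of this sketch, not code from the book) to render the tree grammar as compiler structs: each node kind becomes a Python dataclass whose fields are exactly the children its tree-grammar production lists, plus a location annotation on every node. Only a few of the node kinds are shown.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    location: tuple = (0, 0)          # every node is tagged (line, col)

@dataclass
class Item(Node):                     # a decl or stmt; .next is the rest
    next: Optional["Item"] = None

@dataclass
class Expr(Node):
    pass

@dataclass
class IntDecl(Item):                  # int_decl : item -> id item
    name: str = ""

@dataclass
class Read(Item):                     # read : item -> id item
    name: str = ""

@dataclass
class BinOp(Expr):                    # '+','-','*','/' : expr -> expr expr
    op: str = ""
    left: Optional[Expr] = None
    right: Optional[Expr] = None

@dataclass
class Id(Expr):                       # id : expr -> epsilon (a leaf)
    name: str = ""

@dataclass
class Program(Node):                  # program -> item
    item: Optional[Item] = None

# int a ; read a
prog = Program(item=IntDecl(name="a", next=Read(name="a")))
```

The A : B relationships of the tree grammar map directly onto subclassing: IntDecl *is an* Item, so it may appear wherever an item child is expected.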
Here's a syntax tree for a tiny program.
Structure is given by the tree grammar. Construction would be via execution of
appropriate action routines embedded in a CFG.
Remember: tree grammars are not CFGs.
The language of a CFG is the set of possible fringes of parse trees.
The language of a tree grammar is the set of possible whole trees.
There is no comparable notion of parsing: the structure of the tree is self-evident.
Our tree grammar helps guide us as we write (by hand) the action
routines to build the AST.
It can also help guide us in writing recursive tree-walking routines to perform
semantic checks and (later) generate mid-level intermediate code (next
lecture).
- Helpful to augment the tree grammar with semantic rules that
  describe relationships among annotations of parent and children.
- Semantic rules are like action routines, but without explicit
  specification of what is executed when.
A CFG or tree grammar with semantic rules is an attribute grammar (AG).
Not used much in production compilers, but useful for prototyping (e.g., the
first validated Ada implementation [Dewar et al., 1980]) and in some cool
language-based tools
- syntax-directed editing [Reps, 1984]
- parallel CSS [Meyerovich et al., 2013]
The book goes into a bit of AG theory, talking about
    synthesized attributes (depend only on information below the current
        node in the tree)
    inherited attributes (depend at least in part on info from above or
        to the side)
Remember that an AG doesn't actually specify the order in which rules
should be evaluated. There exist tools to figure that out, and a rich theory
of classes of grammars with varying attribute flow (non-circular, circular
but converging, ...)
When basing an AG on a CFG, it's desirable to have attribute flow that’s
consistent with the order in which the parser builds the tree
    bottom-up parsers need S-attributed grammars -- all attributes are
    synthesized
    top-down parsers can use L-attributed grammars, which are a superset --
    attributes are synthesized or depend on stuff to the left
See the text for more info.
Our CFG w/ action routines to build the AST could be written as an AG by
making each action routine a semantic rule and then listing the rules for
each production w/out actually embedding them in the RHS.

For something as simple as AST construction, not having to specify what
is done when isn’t much of a savings – a tool to find an evaluation
order consistent w/ attribute flow wouldn’t be useful here (it was useful
in the tools mentioned above).
In practice, people do a hand-written tree walk on ASTs.
The book gives an extended example for declaration and type checking in the
extended calculator grammar.
Written as a pure AG, with the following attributes:

program
    errors - list of all static semantic errors
             (type clash, undefined/redefined names)
item, expr
    symtab - list with types of all names declared to left
item
    errors_in - list of all static semantic errors to left
    errors_out - list of all static semantic errors through here
expr
    type
    errors - list of all static semantic errors inside
everything
    location
More common to make the symbol table and error lists global variables
    insert errors, as found, into a list or tree,
    sorted by source location
    for symtab, label each construct with a list of active scopes
        look up <name, scope> pairs, starting with closest scope
    for the calculator language, which has no scopes, can enforce
    declare-before-use in a simple left-to-right traversal of the tree
        - complain at any re-definition
        - or any use w/out prior definition
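That left-to-right declare-before-use pass, with the symbol table and error list as globals, can be sketched as follows. The flat (kind, name) item encoding and the message wording are assumptions for illustration.

```python
def check_declarations(items):
    """Single left-to-right traversal of the item list (no scopes)."""
    symtab = {}                         # name -> type
    errors = []
    for loc, (item_kind, name) in enumerate(items):
        if item_kind in ("int_decl", "real_decl"):
            if name in symtab:          # complain at any re-definition
                errors.append(f"redefinition of {name} at item {loc}")
            else:
                symtab[name] = "int" if item_kind == "int_decl" else "real"
        else:                           # a use: read, write, assign, ...
            if name not in symtab:      # ... or any use w/out prior definition
                errors.append(f"{name} undefined at item {loc}")
    return errors

# int a; read a; read b  -->  complains about b only
errs = check_declarations([("int_decl", "a"), ("read", "a"), ("read", "b")])
# errs == ["b undefined at item 2"]
```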
To avoid cascading errors, it's common to have an "error" value for an
attribute that means "I already complained about this."
So, for example, in
int a
real b
int c
a := b + c
we label the '+' tree node with type "error" so we don't generate a
second message for the ":=" node.
A few example rules (with error list and symtab as globals):
int_decl : item1 → id item2        // item2 is rest of program
    ▷ if <id.name, ?> ∈ symtab
          errors.insert("redefinition of " id.name, item1.location)
      else
          symtab.insert(<id.name, int>)
id : expr → ε
    ▷ if <id.name, A> ∈ symtab
          expr.type := A
      else
          errors.insert(id.name " undefined", id.location)
          expr.type := error
‘+’ : expr1 → expr2 expr3
    ▷ if expr2.type = error or expr3.type = error
          expr1.type := error
      else if expr2.type <> expr3.type
          expr1.type := error
          errors.insert("type clash", expr1.location)
      else
          expr1.type := expr2.type
The right-pointing triangle here is meant to introduce a semantic rule.
(This is not standard notation, but it matches what’s in the text.)
In these particular cases there is only one rule per “production,” but in a
more complicated grammar there could be many.
Formal AG notation would require no side effects (no globals) and would
specify each semantic rule as Si.ax := f(Sj.ay, ..., Sk.az) – e.g.,
▷ expr.type := if <id.name, A> ∈ symtab then A else error
▷ expr.errors := if <id.name, A> ∈ symtab then null
                 else [id.name “undefined at” id.location]
We can see how these rules would be enforced while walking the syntax
tree:
In a more complicated language, we might make multiple passes over the
tree – perhaps
• one to fill in the symbol table;
• a second to check types, check for undeclared names, match parameter
  lists to declarations, etc.; and
• a third to generate mid-level IF.
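The example rules above (with the error list and symtab as globals) can be sketched as a hand-written tree walk. The encoding of expression nodes as tuples like ('+', left, right), ('id', name), ('int_const',) is an assumption; the point to notice is the "error" poisoning: once a subtree has complained, enclosing operators stay silent.

```python
symtab, errors = {}, []

def decl(name, typ):
    # int_decl / real_decl : insert, complaining at any redefinition
    if name in symtab:
        errors.append(f"redefinition of {name}")
    else:
        symtab[name] = typ

def typeof(expr):
    tag = expr[0]
    if tag == "id":                     # id : expr -> epsilon
        if expr[1] in symtab:
            return symtab[expr[1]]
        errors.append(f"{expr[1]} undefined")
        return "error"
    if tag == "int_const":
        return "int"
    if tag == "real_const":
        return "real"
    # binary operator: '+', '-', '*', '/'
    lt, rt = typeof(expr[1]), typeof(expr[2])
    if lt == "error" or rt == "error":
        return "error"                  # already complained below; stay quiet
    if lt != rt:
        errors.append("type clash")
        return "error"
    return lt

# int a; real b; int c; a := b + c   (the cascading-error example above)
decl("a", "int"); decl("b", "real"); decl("c", "int")
rhs_type = typeof(("+", ("id", "b"), ("id", "c")))
# rhs_type == "error" and errors == ["type clash"]; the enclosing ':='
# check would see "error" and not complain a second time.
```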
[Figure: syntax tree for the program
    int a
    read a
    real b
    read b
    write (float (a) + b) / 2.0
 -- a chain of program / int_decl / read / real_decl / read / write / null
 items, with the write expression ÷ ( + (float a) b ) 2.0]