BRANCH: CSE Y3/S5 SUBJECT: LANGUAGE TRANSLATORS
1
RAJIV GANDHI COLLEGE OF ENGINEERING & TECHNOLOGY/ DEPT. OF CSE Page 1
UNIT - 4
1. What is the use of context free grammar? (Nov 2011)
It is useful for describing arithmetic expressions with arbitrary nesting of balanced
parentheses. It is also useful for describing the block structure of programming languages.
2. Define ambiguous. (May 2012)
A grammar G is said to be ambiguous if it generates more than one parse tree for some
sentence of the language L(G), i.e., if some sentence has more than one leftmost (or more
than one rightmost) derivation.
3. Draw the DAG for the assignment statement a := b * -c + b * -c. (Nov 2011)
In the DAG for a := b * -c + b * -c, the common subexpression b * -c is represented by a
single shared node, so it appears only once in the graph.
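The sharing of the common subexpression b * -c can be sketched with value numbering, where identical (operator, operands) pairs map to a single node. The following Python fragment is an illustrative sketch only (the function and node names are assumptions, not part of the notes):

```python
# A minimal sketch of DAG construction for a := b*-c + b*-c using
# value numbering: identical (op, operands) tuples map to one shared node.
def make_dag():
    nodes = {}                          # (op, operands) -> node id
    def node(op, *kids):
        key = (op, kids)
        if key not in nodes:            # reuse an existing node if present
            nodes[key] = len(nodes)
        return nodes[key]
    b = node('id', 'b')
    c = node('id', 'c')
    t1 = node('uminus', c)
    t2 = node('*', b, t1)               # first b * -c
    t3 = node('*', b, t1)               # second b * -c: same node is reused
    node('assign', node('id', 'a'), node('+', t2, t3))
    return t2, t3, len(nodes)

left, right, count = make_dag()
print(left == right)   # True: both occurrences of b*-c name the same node
print(count)           # 7 nodes instead of the 11 of the syntax tree
```

Because both occurrences of b * -c produce the same key, the DAG has one multiplication node with two parents, which is exactly the compaction the question asks about.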
4. What is parsing Tree? (May 2012)
A concrete syntax tree, parse tree, or parsing tree is an ordered, rooted tree that
represents the syntactic structure of a string according to some context-free grammar.
Parse trees are usually constructed according to one of two competing relations: either in
terms of the constituency relation of constituency grammars (= phrase structure
grammars) or in terms of the dependency relation of dependency grammars. Parse trees
are distinct from abstract syntax trees (also known simply as syntax trees), in that their
structure and elements more concretely reflect the syntax of the input language.

Parsing: Role of Parser – Context free Grammars – Writing a Grammar – Predictive Parser – LR
Parser.
Intermediate Code Generation: Intermediate Languages – Declarations – Assignment
Statements – Boolean Expressions – Case Statements – Back Patching – Procedure Calls.
5. Define Three-Address Code. (Nov 2012)
Three-address code is a sequence of statements of the general form
x := y op z
where x, y and z are names, constants, or compiler-generated temporaries; op stands
for any operator, such as a fixed- or floating-point arithmetic operator, or a logical
operator on boolean-valued data.
Three-address code is a linearized representation of a syntax tree or a DAG in which
explicit names correspond to the interior nodes of the graph.
Three-address code is generated using semantic rules that are similar to those
for constructing syntax trees or for generating postfix notation.
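For the statement a := b * -c + b * -c, the shape of the generated statements can be sketched with a hypothetical emitter (the class, the temporary-name scheme t1, t2, ..., and the textual form of unary minus are illustrative assumptions):

```python
# Sketch of emitting three-address statements "x := y op z" for
# a := b * -c + b * -c, with fresh compiler-generated temporaries.
class Emitter:
    def __init__(self):
        self.code, self.n = [], 0
    def temp(self):
        self.n += 1
        return f"t{self.n}"             # fresh temporary name
    def emit(self, dst, src):
        self.code.append(f"{dst} := {src}")

e = Emitter()
t1 = e.temp(); e.emit(t1, "- c")        # unary minus
t2 = e.temp(); e.emit(t2, f"b * {t1}")
t3 = e.temp(); e.emit(t3, "- c")
t4 = e.temp(); e.emit(t4, f"b * {t3}")
t5 = e.temp(); e.emit(t5, f"{t2} + {t4}")
e.emit("a", t5)
for line in e.code:
    print(line)
```

Each statement has at most one operator on the right side, which is the defining property of the form x := y op z above.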
6. Differentiate phase and pass. (Nov 2012)
In an implementation of a compiler, portions of one or more phases are combined into a
module called a pass.
A pass reads the source program or the output of the previous pass,
makes the transformations specified by its phases, and writes output into an intermediate
file, which may then be read by a subsequent pass.
If several phases are grouped into one pass, then the operation of the phases may be
interleaved, with control alternating among several phases.
7. State the functions of an intermediate code generator. (April 2013)
1. A compiler for a different machine can be created by attaching a different back end
to the existing front end.
2. A compiler for a different source language can be created by providing a different
front end for that language to the existing back end.
3. A machine-independent code optimizer can be applied to the intermediate code in
order to optimize code generation.
8. What is basic block? (April 2013)
A basic block is a sequence of consecutive statements in which flow of control enters at
the beginning and leaves at the end, without halting or the possibility of branching except at the end.
9. List the types of parsers for grammars. (May 2014)
LR Parser (L: left-to-right scanning of the input; R: constructing a rightmost derivation
in reverse)
SLR: Simple LR
Canonical LR
LALR: Lookahead LR
10. Define machine word. (May 2014)
In computing, a word is the natural unit of data used by a particular processor design.
A word is basically a fixed-sized group of digits (binary or decimal) that are handled
as a unit by the instruction set or the hardware of the processor.
The number of digits in a word (the word size, word width, or word length) is an
important characteristic of any specific processor design or computer architecture.
EXPLAIN THE ROLE OF A PARSER? (5 MARKS)
• Accepts string of tokens from lexical analyzer (usually one token at a time)
• Verifies whether or not string can be generated by grammar
• Reports syntax errors (recovers if possible)
The parser obtains a string of tokens from the lexical analyzer and verifies that it can be
generated by the grammar for the source language. The parser should report any syntax
errors in an intelligible fashion.
The two types of parsers employed are:
1. Top-down parser: builds parse trees from the top (root) to the bottom (leaves).
2. Bottom-up parser: builds parse trees from the leaves and works up to the root.
Therefore there are two types of parsing methods: top-down parsing and bottom-up parsing.
WHAT IS MEANT BY CONTEXT FREE GRAMMAR? EXPLAIN IT (11 MARKS)
Many programming language constructs have an inherently recursive structure that can be
defined by context-free grammars.
A context-free grammar (grammar for short) consists of terminals, nonterminals, a start symbol,
and productions.
1. Terminals are the basic symbols from which strings are formed. The word "token" is a synonym
for "terminal"; each of the keywords if, then, and else is a terminal.
2. Nonterminals are syntactic variables that denote sets of strings. E.g., stmt and expr are
nonterminals. The nonterminals define sets of strings that help define the language generated by the
grammar. They also impose a hierarchical structure on the language that is useful for both syntax
analysis and translation.
3. In a grammar, one nonterminal is distinguished as the start symbol, and the set of strings it
denotes is the language defined by the grammar.
4. The productions of a grammar specify the manner in which the terminals and nonterminals can
be combined to form strings. Each production consists of a nonterminal, followed by an arrow
(sometimes the symbol ::= is used in place of the arrow), followed by a string of nonterminals and
terminals.
Eg:
expr → expr op expr | ( expr ) | - expr | id
op → + | - | * | /
In this grammar, the terminal symbols are id, +, -, *, /, (, and ).
The nonterminal symbols are expr and op, and expr is the start symbol.
NOTATIONAL CONVENTIONS
1. These symbols are terminals:
i) Lower-case letters early in the alphabet, such as a, b, c.
ii) Operator symbols such as +, -, etc.
iii) Punctuation symbols such as parentheses, comma, etc.
iv) The digits 0, 1, ..., 9.
v) Boldface strings such as id or if.
2. These symbols are nonterminals:
i) Upper-case letters early in the alphabet, such as A, B, C.
ii) The letter S, which, when it appears, is usually the start symbol.
iii) Lower-case italic names such as expr or stmt.
3. Upper-case letters late in the alphabet, such as X, Y, Z, represent grammar symbols, that is, either
nonterminals or terminals.
4. Lower-case letters late in the alphabet, chiefly u, v, ..., z, represent strings of terminals.
5. Lower-case Greek letters represent strings of grammar symbols.
6. If A → α1, A → α2, ..., A → αk are all productions with A on the left (A-productions),
we may write A → α1 | α2 | ... | αk, the αi being the alternatives for A.
7. Unless otherwise stated, the left side of the first production is the start symbol.
DERIVATIONS
The derivational view gives a precise description of the top-down construction of a parse tree. The
central idea is that a production is treated as a rewriting rule in which the nonterminal on the left is
replaced by the string on the right side of the production.
For example, consider the following grammar for arithmetic expressions, with the nonterminal E
representing an expression:
E → E + E | E * E | ( E ) | - E | id
The production E → - E signifies that an expression preceded by a minus sign is also an expression.
This production can be used to generate more complex expressions from simpler expressions by
allowing us to replace any instance of an E by - E.
Given a grammar G with start symbol S, we can use the ⇒+ relation (derives in one or more steps)
to define L(G), the language generated by G. Strings in L(G) may contain only terminal symbols of
G. A string of terminals w is in L(G) if and only if S ⇒+ w. The string w is called a sentence of G.
A language that can be generated by a grammar is said to be a context-free language. If two
grammars generate the same language, the grammars are said to be equivalent.
PARSE TREES AND DERIVATIONS
A parse tree may be viewed as a graphical representation for a derivation that filters out the choice
regarding replacement order. Each interior node of a parse tree is labeled by some nonterminal A,
and the children of the node are labeled, from left to right, by the symbols in the right side of
the production by which this A was replaced in the derivation.
The leaves of the parse tree are labeled by nonterminals or terminals and, read from left to right,
they constitute a sentential form, called the yield or frontier of the tree.
Parse Tree for –(id+id):
Building the parse tree for derivation:
The sentence id + id * id has two distinct leftmost derivations:
E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
corresponding to the two parse trees for id + id * id.
Ambiguity:
A grammar that produces more than one parse tree for some sentence is said to be ambiguous. Put
another way, an ambiguous grammar is one that produces more than one leftmost or more than one
rightmost derivation for the same sentence.
EXPLAIN BRIEFLY ABOUT WRITING A GRAMMAR? (6 MARKS)
An efficient non-backtracking form of top-down parser is called a predictive parser.
Recursive-Descent Parsing
Top-down parsing can be viewed as an attempt to find a leftmost derivation for an input string.
Equivalently, it can be viewed as an attempt to construct a parse tree for the input starting from the
root and creating the nodes of the parse tree in preorder.
Recursive descent may involve backtracking, that is, making repeated scans of the input.
Backtracking is rarely needed to parse programming language constructs; even in situations like
natural language parsing, where backtracking is used, it is not very efficient.
Consider the grammar
S → cAd
A → ab | a
For an input string w = cad, the steps in a top-down parse are as follows:
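A minimal sketch, assuming the grammar above, of how backtracking explores the alternatives of A. Python generators model the choice points: each alternative that matches yields a position in the input, and a failed alternative simply yields nothing, which is the "backtrack" step (function names are illustrative assumptions):

```python
# Backtracking top-down parse of w = "cad" with S -> c A d, A -> a b | a.
# Each procedure yields every input position reachable after matching
# its nonterminal; exhausting one alternative falls through to the next.
def parse_S(s, i):
    if i < len(s) and s[i] == 'c':
        for j in parse_A(s, i + 1):       # try each alternative for A
            if j < len(s) and s[j] == 'd':
                yield j + 1
def parse_A(s, i):
    if s[i:i+2] == 'ab':                  # first try A -> a b ...
        yield i + 2
    if s[i:i+1] == 'a':                   # ... then backtrack to A -> a
        yield i + 1

w = "cad"
print(any(j == len(w) for j in parse_S(w, 0)))   # True: cad is in L(G)
```

On cad, the alternative A → ab fails (the input has ad, not ab), so the parser retreats and succeeds with A → a, exactly the sequence of steps the figure above depicts.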
A left-recursive grammar can cause a recursive-descent parser, even one with backtracking, to go
into an infinite loop.
EXPLAIN BRIEFLY ABOUT PREDICTIVE PARSERS? (6 MARKS)
In many cases, by carefully writing a grammar, eliminating left recursion from it, and left factoring
the resulting grammar, we can obtain a grammar that can be parsed by a recursive-descent parser
that needs no backtracking, i.e., a predictive parser. To construct a predictive parser, we must
know, given the current input symbol a and the nonterminal A to be expanded, which one of the
alternatives for A is the unique alternative that derives a string beginning with a. Flow-of-control
constructs in most programming languages, with their distinguishing keywords, are usually
detectable in this way.
For example, if we have the productions
stmt → if expr then stmt else stmt
     | while expr do stmt
     | begin stmt_list end
then the keywords if, while, and begin tell us which alternative is the only one that could possibly
succeed if we are to find a statement.
Transition Diagrams for Predictive Parsers
In the case of the parser,
There is one diagram for each nonterminal. The labels of edges are tokens and nonterminals. A
transition on a token (terminal) means we should take that transition if that token is the next input
symbol. A transition on a nonterminal A is a call of the procedure for A.
To construct the transition diagram of a predictive parser from a grammar, first eliminate left
recursion from the grammar, and then left factor it. Then for each nonterminal A do the
following:
1. Create an initial and final (return) state.
2. For each production A → X1 X2 ... Xn, create a path from the initial to
the final state, with edges labeled X1, X2, ..., Xn.
Working of the predictive parser:
It begins in the start state for the start symbol.
If after some actions it is in state s with an edge labeled by terminal a to state t, and if the next input
symbol is a, then the parser moves the input cursor one position right and goes to state t.
If, on the other hand, the edge is labeled by a nonterminal A, the parser instead goes to the start
state for A, without moving the input cursor.
If it ever reaches the final state for A, it immediately goes to state t, in effect having read A from the
input during the time it moved from state s to t.
Finally, if there is an edge from s to t labeled ε, then from state s the parser immediately goes to
state t, without advancing the input.
Transition diagrams can be simplified by substituting diagrams in one another; these substitutions
are similar to the transformations on grammars.
A left-recursive production A → Aα | β can be replaced by the right-recursive productions:
A → βA'
A' → αA' | ε
Consider the following grammar for arithmetic expressions:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
Simplified transition diagram
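The transition diagrams for this grammar translate directly into one procedure per nonterminal. The following Python sketch follows the diagrams; the token list, class name, and error handling are illustrative assumptions:

```python
# Predictive recursive-descent parser following the transition diagrams
# for E -> T E', E' -> + T E' | e, T -> F T', T' -> * F T' | e,
# F -> ( E ) | id.  Tokens are pre-split into a list of strings.
class Parser:
    def __init__(self, tokens):
        self.toks = tokens + ['$']        # $ marks the end of input
        self.i = 0
    def look(self): return self.toks[self.i]
    def match(self, t):
        if self.look() != t: raise SyntaxError(f"expected {t}")
        self.i += 1
    def E(self):  self.T(); self.Ep()
    def Ep(self):
        if self.look() == '+':            # take the + edge, else epsilon
            self.match('+'); self.T(); self.Ep()
    def T(self):  self.F(); self.Tp()
    def Tp(self):
        if self.look() == '*':            # take the * edge, else epsilon
            self.match('*'); self.F(); self.Tp()
    def F(self):
        if self.look() == '(':
            self.match('('); self.E(); self.match(')')
        else:
            self.match('id')

def accepts(tokens):
    p = Parser(tokens)
    try:
        p.E(); return p.look() == '$'
    except SyntaxError:
        return False

print(accepts(['id', '+', 'id', '*', 'id']))   # True
print(accepts(['id', '+', '*']))               # False
```

Note that no procedure ever backtracks: each epsilon edge is taken only when the lookahead matches none of the labeled edges, which is exactly what makes the parser predictive.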
EXPLAIN BRIEFLY ABOUT THE NONRECURSIVE PREDICTIVE PARSER? EXPLAIN ITS ALGORITHM
(6 MARKS)
It is possible to build a nonrecursive predictive parser by maintaining a stack explicitly, rather than
implicitly through recursive calls. The key problem during predictive parsing is that of determining
the production to be applied for a nonterminal.
A table-driven predictive parser has an input buffer, a stack, a parsing table, and an output stream.
The input buffer contains the string to be parsed, followed by $, a symbol used as a right
endmarker to indicate the end of the input string. The stack contains a sequence of grammar symbols
with $ on the bottom, indicating the bottom of the stack. Initially, the stack contains the start
symbol of the grammar on top of $. The parsing table is a two-dimensional array M[A, a], where A
is a nonterminal, and a is a terminal or the symbol $.
The program considers X, the symbol on top of the stack, and a the current input symbol. These two
symbols determine the action of the parser. There are three possibilities.
1. If X = a = $, the parser halts and announces successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input
symbol.
3. If X is a nonterminal, the program consults entry M[X, a] of the parsing table M. This entry
will be either an X-production of the grammar or an error entry. If, for example, M[X, a] =
{X → UVW}, the parser replaces X on top of the stack by WVU (with U on top).
If M[X, a] = error, the parser calls an error recovery routine.
The behavior of the parser can be described in terms of its configurations, which give the stack
contents and the remaining input.
ALGORITHM
Non recursive predictive parsing.
Input: A string w and a parsing table M for grammar G.
Output: If w is in L(G), a leftmost derivation of w; otherwise an error indication.
Method: Initially, the parser is in a configuration in which it has $S on the stack with S, the start
symbol of G on top; and w$ in the input buffer.
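The parsing loop of the method above can be sketched as follows. The table M shown is the standard predictive table for the expression grammar used in this unit, supplied here as data for illustration (epsilon bodies are written as empty lists, an assumed encoding):

```python
# Table-driven predictive parsing loop: stack with $ at the bottom,
# input w$, and a table M[(A, a)] giving the production body for A.
def predictive_parse(tokens, M, start):
    stack = ['$', start]
    toks = tokens + ['$']
    i = 0
    while stack[-1] != '$':
        X, a = stack[-1], toks[i]
        if X == a:                      # terminal on top matches input
            stack.pop(); i += 1
        elif (X, a) in M:               # expand nonterminal X
            stack.pop()
            stack.extend(reversed(M[(X, a)]))   # push body, leftmost on top
        else:
            return False                # undefined entry: error
    return toks[i] == '$'

# Predictive table for E -> T E', E' -> + T E' | e, T -> F T',
# T' -> * F T' | e, F -> ( E ) | id.
M = {('E','id'):['T',"E'"], ('E','('):['T',"E'"],
     ("E'",'+'):['+','T',"E'"], ("E'",')'):[], ("E'",'$'):[],
     ('T','id'):['F',"T'"], ('T','('):['F',"T'"],
     ("T'",'*'):['*','F',"T'"], ("T'",'+'):[], ("T'",')'):[], ("T'",'$'):[],
     ('F','id'):['id'], ('F','('):['(','E',')']}

print(predictive_parse(['id','+','id','*','id'], M, 'E'))   # True
```

Pushing the production body in reverse keeps the leftmost symbol on top of the stack, so the sequence of expansions traces out a leftmost derivation, as the algorithm's output specification requires.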
FIRST and FOLLOW
The construction of a predictive parser is aided by two functions associated with a grammar G.
These functions, FIRST and FOLLOW, allow us to fill in the entries of a predictive parsing table
for G, whenever possible. Sets of tokens yielded by the FOLLOW function can also be used as
synchronizing tokens during panic-mode error recovery.
If α is any string of grammar symbols, let FIRST(α) be the set of terminals that begin the strings
derived from α. If α ⇒* ε, then ε is also in FIRST(α).
Define FOLLOW(A), for nonterminal A, to be the set of terminals a that can appear immediately
to the right of A in some sentential form, that is, the set of terminals a such that there exists a
derivation of the form S ⇒* αAaβ for some α and β. If A can be the rightmost symbol in some
sentential form, then $ is in FOLLOW(A).
To compute FIRST(X) for all grammar symbols X, apply the following rules until no more
terminals or ε can be added to any FIRST set.
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a nonterminal and X → Y1 Y2 … Yk is a production, then place a in FIRST(X) if, for
some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), …, FIRST(Yi-1); that is, Y1 … Yi-1 ⇒* ε.
If ε is in FIRST(Yj) for all j = 1, 2, ..., k, then add ε to FIRST(X).
Now, compute FIRST for any string X1 X2 … Xn as follows.
Add to FIRST(X1 X2 … Xn) all the non-ε symbols of FIRST(X1).
Also add the non-ε symbols of FIRST(X2) if ε is in FIRST(X1), the non-ε symbols of FIRST(X3)
if ε is in both FIRST(X1) and FIRST(X2), and so on.
Finally, add ε to FIRST(X1 X2 … Xn) if, for all i, FIRST(Xi) contains ε.
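The fixed-point rules above can be sketched in Python for the expression grammar used in this unit. The empty string '' stands for ε, and the grammar encoding is an illustrative assumption:

```python
# FIRST computation by iterating the rules above to a fixed point.
# Grammar: dict nonterminal -> list of bodies (lists of symbols).
G = {'E':  [['T', "E'"]],
     "E'": [['+', 'T', "E'"], []],
     'T':  [['F', "T'"]],
     "T'": [['*', 'F', "T'"], []],
     'F':  [['(', 'E', ')'], ['id']]}

def first_sets(G):
    FIRST = {A: set() for A in G}
    def first_of(symbols):              # FIRST of a string X1 X2 ... Xn
        out = set()
        for X in symbols:
            f = FIRST[X] if X in G else {X}   # terminal: FIRST(X) = {X}
            out |= f - {''}
            if '' not in f:
                return out
        out.add('')                     # every symbol can derive epsilon
        return out
    changed = True
    while changed:                      # iterate until nothing is added
        changed = False
        for A, bodies in G.items():
            for body in bodies:
                new = first_of(body) - FIRST[A]
                if new:
                    FIRST[A] |= new
                    changed = True
    return FIRST

F = first_sets(G)
print(sorted(F['E']))     # ['(', 'id']
print(sorted(F["E'"]))    # ['', '+']
```

The helper first_of is exactly the "FIRST of a string" rule at the end of the passage: it stops at the first symbol whose FIRST set lacks ε.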
FOLLOW
To compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing can be
added to any FOLLOW set.
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
2. If there is a production A → αBβ, then everything in FIRST(β) except for ε is placed in
FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε (i.e.,
β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B).
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
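The FOLLOW rules can be sketched the same way. Here the FIRST sets for the grammar above are supplied as given data ('' stands for ε; the encoding is an illustrative assumption):

```python
# FOLLOW computation by the three rules above, for the expression grammar.
G = {'E':  [['T', "E'"]],
     "E'": [['+', 'T', "E'"], []],
     'T':  [['F', "T'"]],
     "T'": [['*', 'F', "T'"], []],
     'F':  [['(', 'E', ')'], ['id']]}
FIRST = {'E': {'(', 'id'}, "E'": {'+', ''}, 'T': {'(', 'id'},
         "T'": {'*', ''}, 'F': {'(', 'id'}}

def first_of(symbols):                  # FIRST of the string after B
    out = set()
    for X in symbols:
        f = FIRST[X] if X in FIRST else {X}
        out |= f - {''}
        if '' not in f:
            return out
    return out | {''}

def follow_sets(G, start):
    FOLLOW = {A: set() for A in G}
    FOLLOW[start].add('$')              # rule 1
    changed = True
    while changed:
        changed = False
        for A, bodies in G.items():
            for body in bodies:
                for i, B in enumerate(body):
                    if B not in G:      # only nonterminals get FOLLOW sets
                        continue
                    beta = first_of(body[i+1:])
                    # rule 2, plus rule 3 when beta can vanish
                    new = (beta - {''}) | (FOLLOW[A] if '' in beta else set())
                    if not new <= FOLLOW[B]:
                        FOLLOW[B] |= new
                        changed = True
    return FOLLOW

FOL = follow_sets(G, 'E')
print(sorted(FOL['E']))    # ['$', ')']
```

Rule 3 is the reason the loop must iterate: FOLLOW(A) may grow after FOLLOW(B) was last updated, so one pass is not enough.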
Construction of a predictive parsing table.
Input. Grammar G.
Output. Parsing table M.
Method.
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α)
and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M be error.
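Steps 1 to 4 can be sketched as follows, with FIRST and FOLLOW for the expression grammar supplied as data (an illustrative sketch; the encoding is an assumption):

```python
# Predictive-table construction per the method above.
G = [('E',  ['T', "E'"]), ("E'", ['+', 'T', "E'"]), ("E'", []),
     ('T',  ['F', "T'"]), ("T'", ['*', 'F', "T'"]), ("T'", []),
     ('F',  ['(', 'E', ')']), ('F', ['id'])]
FIRST = {'E': {'(', 'id'}, "E'": {'+', ''}, 'T': {'(', 'id'},
         "T'": {'*', ''}, 'F': {'(', 'id'}}
FOLLOW = {'E': {')', '$'}, "E'": {')', '$'}, 'T': {'+', ')', '$'},
          "T'": {'+', ')', '$'}, 'F': {'*', '+', ')', '$'}}
NT = set(FIRST)

def first_of(body):
    out = set()
    for X in body:
        f = FIRST[X] if X in NT else {X}
        out |= f - {''}
        if '' not in f:
            return out
    return out | {''}

def build_table(G):
    M = {}
    for A, alpha in G:                  # step 1: consider each production
        fa = first_of(alpha)
        for a in fa - {''}:             # step 2: terminals in FIRST(alpha)
            M[(A, a)] = alpha
        if '' in fa:                    # step 3: epsilon case uses FOLLOW(A)
            for b in FOLLOW[A]:         # FOLLOW includes $ where applicable
                M[(A, b)] = alpha
    return M                            # step 4: absent entries mean error

M = build_table(G)
print(M[('E', 'id')])     # ['T', "E'"]
print(('E', '+') in M)    # False: an error entry
```

The resulting M is exactly the table that the nonrecursive parsing loop consults, and the absence of multiply-defined entries confirms the grammar is LL(1).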
WRITE AN ALGORITHM FOR CONSTRUCTING LR PARSER TABLE.***
9. Explain briefly about CLR parser algorithm? (11 marks)
Constructing Canonical LR Parsing Tables:
In the SLR method, state i calls for reduction by A → α if the set of items Ii contains item [A → α·]
and a is in FOLLOW(A). In some situations, however, when state i appears on top of the stack, the
viable prefix βα on the stack is such that βA cannot be followed by a in any right-sentential form.
Thus, the reduction by A → α would be invalid on input a.
Construction of the sets of LR(1) items.
Input. An augmented grammar G'.
Output. The sets of LR(1) items that are the sets of items valid for one or more viable prefixes of G'.
Method. The procedures closure and goto and the main routine items for constructing the sets of
items are computed.
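As a sketch of the closure and goto procedures for LR(1) items, the following Python fragment uses the toy grammar S' → S, S → CC, C → cC | d discussed later in this unit. The item representation (head, body, dot position, lookahead) and the simplification that no nonterminal here derives ε are assumptions for illustration:

```python
# closure and goto for LR(1) items on S' -> S, S -> C C, C -> c C | d.
# An item [A -> alpha . beta, a] is the tuple (A, body, dot, a).
G = {"S'": [['S']], 'S': [['C', 'C']], 'C': [['c', 'C'], ['d']]}
FIRST = {'S': {'c', 'd'}, 'C': {'c', 'd'}, 'c': {'c'}, 'd': {'d'}, '$': {'$'}}

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (A, body, dot, a) in list(items):
            if dot < len(body) and body[dot] in G:   # dot before nonterminal B
                B = body[dot]
                beta = body[dot+1:] + (a,)
                look = FIRST[beta[0]]                # FIRST(beta a); no eps here
                for gamma in G[B]:
                    for b in look:                   # add [B -> .gamma, b]
                        it = (B, tuple(gamma), 0, b)
                        if it not in items:
                            items.add(it); changed = True
    return frozenset(items)

def goto(items, X):
    moved = {(A, body, dot + 1, a)                   # advance the dot past X
             for (A, body, dot, a) in items
             if dot < len(body) and body[dot] == X}
    return closure(moved)

I0 = closure({("S'", ('S',), 0, '$')})
print(len(I0))    # 6: items for S', S, and the four C-items with c/d lookaheads
```

The main routine items would repeatedly apply goto to every set and every grammar symbol until no new sets appear, giving the canonical collection C = {I0, I1, …, In}.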
Construction of the canonical LR parsing table
Input. An augmented grammar G'.
Output. The canonical LR parsing table functions action and goto for G'.
Method.
1. Construct C = {I0, I1, …, In}, the collection of sets of LR(1) items for G'.
2. State i of the parser is constructed from Ii. The parsing actions for state i are determined as
follows:
a) If [A → α·aβ, b] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to "shift j". Here, a is required to
be a terminal.
b) If [A → α·, a] is in Ii, A ≠ S', then set action[i, a] to "reduce A → α".
c) If [S' → S·, $] is in Ii, then set action[i, $] to "accept". If a conflict results from the above rules, the
grammar is said not to be LR(1), and the algorithm is said to fail.
3. The goto transitions for state i are determined as follows: if goto(Ii, A) = Ij, then goto[i, A] = j.
4. All entries not defined by rules (2) and (3) are made "error".
5. The initial state of the parser is the one constructed from the set containing item [S' → ·S, $].
The table formed from the parsing action and goto functions produced by the algorithm is called the
canonical LR(1) parsing table. An LR parser using this table is called a canonical LR(1) parser. If
the parsing action function has no multiply-defined entries, then the given grammar is called an
LR(1) grammar.
Constructing LALR Parsing Tables
This method is often used in practice because the tables obtained by it are considerably smaller than
the canonical LR tables, yet most common syntactic constructs of programming languages can be
expressed conveniently by an LALR grammar. The same is almost true for SLR grammars. But
there are a few constructs that cannot be conveniently handled by SLR techniques.
For a comparison of parser size, the SLR and LALR tables for a grammar always have the same
number of states, and this number is typically several hundred states for a language like Pascal. The
canonical LR table would typically have several thousand states for the same size language. Thus it
is much easier and more economical to construct SLR and LALR tables than the canonical LR
tables.
Consider the grammar
S' → S
S → CC
C → cC | d
whose sets of LR(1) items were shown in the goto graph.
Take a pair of similar-looking states, such as I4 and I7. Each of these states has only items with first
component C → d·. In I4, the lookaheads are c or d; in I7, $ is the only lookahead.
To see the difference between the roles of I4 and I7 in the parser, note that the grammar generates the
regular set c*dc*d. When reading an input cc…cdcc…cd, the parser shifts the first group of c's
and their following d onto the stack, entering state 4 after reading the d. The parser then calls for a
reduction by C → d, provided the next input symbol is c or d.
The requirement that c or d follow makes sense, since these are the symbols that could begin strings
in c*d. If $ follows the first d, we have an input like ccd, which is not in the language, and state 4
correctly declares an error if $ is the next input.
The parser enters state 7 after reading the second d. Then the parser must see $ on the input, or it
started with a string not of the form c*dc*d. It thus makes sense that state 7 should reduce by C → d
on input $ and declare error on inputs c or d.
Let us now replace I4 and I7 by I47, the union of I4 and I7, consisting of the set of three items
represented by [C → d·, c/d/$]. The goto's on d to I4 or I7 from I0, I2, I3 and I6 now enter I47. The
action of state 47 is to reduce on any input. The revised parser behaves essentially like the original,
although it might reduce d to C in circumstances where the original would declare error, for
example, on inputs like ccd or cdcdc. The error will eventually be caught; in fact, it will be caught
before any more input symbols are shifted.
More generally, we can look for sets of LR(1) items having the same core, i.e., the same set of first
components, and merge these sets with common cores into one set of items.
For example, I4 and I7 form such a pair, with core {C → d·}.
In general, a core is a set of LR(0) items for the grammar at hand, and an LR(1) grammar may
produce more than two sets of items with the same core.
Since the core of goto(I, X) depends only on the core of I, the goto's of merged sets can
themselves be merged. Thus, there is no problem revising the goto function as we merge sets of
items. The action functions are modified to reflect the non-error actions of all sets of items in the
merger.
Suppose we have an LR(1) grammar, i.e., one whose sets of LR(1) items produce no parsing action
conflicts. If we replace all states having the same core with their union, it is possible that the
resulting union will have a conflict, but it is unlikely, for the following reason. Suppose in the union
there is a conflict on lookahead a because there is an item [A → α·, a] calling for a reduction by
A → α, and there is another item [B → β·aγ, b] calling for a shift.
Then some set of items from which the union was formed has item [A → α·, a], and since the cores
of all these states are the same, it must have an item [B → β·aγ, c] for some c.
But this state then has the same shift/reduce conflict on a, and the grammar was not LR(1). Thus, the
merging of states with common cores can never produce a shift/reduce conflict that was not present
in one of the original states, because shift actions depend only on the core, not the lookahead.
10. Explain briefly about LALR Parser? (11 marks)
An easy, but space-consuming, LALR table construction.
Input. An augmented grammar G'.
Output. The LALR parsing table functions action and goto for G'.
Method.
1. Construct C = {I0, I1, …, In}, the collection of sets of LR(1) items.
2. For each core present among the sets of LR(1) items, find all sets having that core, and replace
these sets by their union.
3. Let C' = {J0, J1, …, Jm} be the resulting sets of LR(1) items. The parsing actions for state i are
constructed from Ji in the same manner as in the canonical LR algorithm. If there is a parsing action
conflict, the algorithm fails to produce a parser, and the grammar is said not to be LALR(1).
4. The goto table is constructed as follows. If J is the union of one or more sets of LR(1) items, i.e.,
J = I1 ∪ I2 ∪ … ∪ Ik, then the cores of goto(I1, X), goto(I2, X), …, goto(Ik, X) are the same, since
I1, I2, …, Ik all have the same core. Let K be the union of all sets of items having the same core as
goto(I1, X). Then goto(J, X) = K.
The table produced by the algorithm is called the LALR parsing table for G. If there are no parsing
action conflicts, then the given grammar is said to be an LALR(1) grammar.
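The merging in step (2) can be sketched by grouping item sets on their cores. I4 and I7 here are the two states discussed earlier for the grammar S → CC, C → cC | d; the item representation (head, body, dot, lookahead) is an assumption for illustration:

```python
# LALR merging step: group LR(1) item sets by core (lookaheads dropped)
# and union the lookaheads of sets sharing a core.
def merge_by_core(sets_of_items):
    merged = {}
    for I in sets_of_items:
        # the core of a set of items drops the lookahead component
        core = frozenset((A, body, dot) for (A, body, dot, _) in I)
        merged.setdefault(core, set()).update(I)
    return list(merged.values())

# I4 and I7 from the discussion above: same core {C -> d.}, with
# lookaheads c/d in I4 and $ in I7.
I4 = {('C', ('d',), 1, 'c'), ('C', ('d',), 1, 'd')}
I7 = {('C', ('d',), 1, '$')}
union = merge_by_core([I4, I7])
print(len(union))          # 1: the two states collapse into I47
print(len(union[0]))       # 3 items, written [C -> d., c/d/$]
```

Since the core ignores lookaheads, any two sets that differ only in lookaheads land in the same bucket, which is exactly why the LALR table has the same number of states as the SLR table.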
The collection of sets of items constructed in step (3) is called the LALR(1) collection.
Consider the grammar whose goto graph was shown previously, i.e., the augmented grammar above.
The kernels of the sets of LR(0) items for this grammar:
Determining lookaheads.
Input. The kernel K of a set of LR(0) items I and a grammar symbol X.
Output. The lookaheads spontaneously generated by items in I for kernel items in goto(I, X), and the
items in I from which lookaheads are propagated to kernel items in goto(I, X).
Method. It uses a dummy lookahead symbol # to detect situations in which lookaheads propagate.
Efficient computation of the kernels of the LALR(1) collection.
Input. An augmented grammar G'.
Output. The kernels of the LALR(1) collection of sets of items for G'.
Method.
1. Construct the kernels of the sets of LR(0) items for G'.
2. Apply the above algorithm to the kernel of each set of LR(0) items and each grammar symbol X to
determine which lookaheads are spontaneously generated for kernel items in goto(I, X), and from
which items in I lookaheads are propagated to kernel items in goto(I, X).
3. Initialize a table that gives, for each kernel item in each set of items, the associated lookaheads.
Initially, each item has associated with it only those lookaheads that we determined in (2) were
generated spontaneously.
4. Make repeated passes over the kernel items in all sets. When we visit an item i, we look up the
kernel items to which i propagates its lookaheads, using information tabulated in (2). The current
set of lookaheads for i is added to those already associated with each of the items to which i
propagates its lookaheads. We continue making passes over the kernel items until no more new
lookaheads are propagated.
INTERMEDIATE CODE GENERATION
11. What are the types of intermediate representations? (5 marks)
The compiler generates an easily translated representation of the source program, called an
intermediate language, which leads to efficient code generation.
There are three types of intermediate representation:-
1. Syntax Trees
2. Postfix notation
3. Three Address Code
Semantic rules for generating three-address code from common programming language
constructs are similar to those for constructing syntax trees or for generating postfix
notation.
Graphical Representations
A syntax tree depicts the natural hierarchical structure of a source program. A DAG
(Directed Acyclic Graph) gives the same information in a more compact way because
common sub-expressions are identified. A syntax tree for the assignment statement
a := b*-c + b*-c appears in the figure.

[Figure: the intermediate code generator in the compiler pipeline:
parser -> static checker -> intermediate code generator -> code generator]
Postfix notation is a linearized representation of a syntax tree; it is a list of the nodes of the
tree in which a node appears immediately after its children. The postfix notation for the syntax
tree in the figure is

a b c uminus * b c uminus * + assign

The edges in a syntax tree do not appear explicitly in postfix notation. They can be
recovered from the order in which the nodes appear and the number of operands that the
operator at each node expects. The recovery of edges is similar to the evaluation, using a
stack, of an expression in postfix notation.
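The stack discipline mentioned above can be made concrete with a small evaluator. This sketch evaluates just the expression part of the postfix string (the trailing a and assign are omitted for simplicity); the sample values b = 2, c = 3 are illustrative assumptions.

```python
# Stack-based evaluation of the postfix form of b * -c + b * -c.
# Recovering edges works the same way: each operator pops as many
# entries as it has operands, exactly as values are popped here.

def eval_postfix(tokens, env):
    stack = []
    for tok in tokens:
        if tok == 'uminus':                 # unary operator: pops one operand
            stack.append(-stack.pop())
        elif tok in ('+', '*'):             # binary operator: pops two operands
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if tok == '+' else left * right)
        else:                               # operand: push its value
            stack.append(env[tok])
    return stack.pop()

# With b = 2, c = 3: b * -c + b * -c = -6 + -6 = -12
print(eval_postfix(['b', 'c', 'uminus', '*', 'b', 'c', 'uminus', '*', '+'],
                   {'b': 2, 'c': 3}))
```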
Representation of syntax trees
12. What are the types of three address statements? (11 marks)
Types of Three-Address Statements
Three-address statements are akin to assembly code. Statements can have symbolic labels,
and there are statements for flow of control. A symbolic label represents the index of a
three-address statement in the array holding intermediate code. Actual indices can be
substituted for the labels either by making a separate pass, or by using "backpatching,"
discussed in Section 8.6. Here are the common three-address statements used in the
remainder of this book:
1. Assignment statements of the form x := y op z, where op is a binary arithmetic or
logical operation.
2. Assignment instructions of the form x := op y, where op is a unary operation. Essential
unary operations include unary minus, logical negation, shift operators, and conversion
operators that, for example, convert a fixed-point number to a floating-point number.
3. Copy statements of the form x := y, where the value of y is assigned to x.
4. The unconditional jump goto L. The three-address statement with label L is the next to
be executed.
5. Conditional jumps such as if x relop y goto L. This instruction applies a relational
operator (<, =, >=, etc.) to x and y, and executes the statement with label L next if x stands
in relation relop to y. If not, the three-address statement following if x relop y goto L is
executed next, as in the usual sequence.
6. param x and call p, n for procedure calls and return y, where y representing a returned
value is optional. Their typical use is as the sequence of three-address statements
param x1
param x2
param xn
call p, n
generated as part of a call of the procedure p(x1, x2, ..., xn). The integer n indicating the
number of actual parameters in "call p, n" is not redundant because calls can be nested. The
implementation of procedure calls is outlined in Section 8.7.
7. Indexed assignments of the form x := y[i] and x[i] := y. The first of these sets x to the
value in the location i memory units beyond location y. The statement x[i] := y sets the
contents of the location i units beyond x to the value of y. In both these instructions, x, y,
and i refer to data objects.
8. Address and pointer assignments of the form x := &y, x := *y, and *x := y. The first of
these sets the value of x to be the location of y. Presumably y is a name, perhaps a
temporary, that denotes an expression with an l-value such as A[i, j], and x is a pointer name
or temporary; that is, the r-value of x is the l-value (location) of some object. In the
statement x := *y, presumably y is a pointer or a temporary whose r-value is a location; the
r-value of x is made equal to the contents of that location. Finally, *x := y sets the r-value of
the object pointed to by x to the r-value of y.
The choice of allowable operators is an important issue in the design of an
intermediate form. The operator set must clearly be rich enough to implement the operations
in the source language. A small operator set is easier to implement on a new target machine.
However, a restricted instruction set may force the front end to generate long sequences of
statements for some source-language operations. The optimizer and code generator may
then have to work harder if good code is to be generated.
Explain the process of syntax-directed translation of three-address code. (6 marks)
Syntax-Directed Translation into Three-Address Code
When three-address code is generated, temporary names are made up for the interior nodes
of a syntax tree. The value of the non-terminal E on the left side of E → E1 + E2 will be
computed into a new temporary t. In general, the three-address code for id := E consists of
code to evaluate E into some temporary t, followed by the assignment id.place := t. If an
expression is a single identifier, say y, then y itself holds the value of the expression. For the
moment, we create a new name every time a temporary is needed; techniques for reusing
temporaries are given later.
The S-attributed definition in Fig. 8.6 generates three-address code for assignment statements.
Given input a := b*-c + b*-c, it produces the code in Fig. 8.5(a). The synthesized
attribute S.code represents the three-address code for the assignment S. The non-terminal E
has two attributes:
1. E.place, the name that will hold the value of E, and
2. E.code, the sequence of three-address statements evaluating E.
The function newtemp returns a sequence of distinct names t1, t2, ... in response to
successive calls. For convenience, we use the notation gen(x ':=' y '+' z) in Fig. 8.6 to
represent the three-address statement x := y + z. Expressions appearing in place of variables
like x, y, and z are evaluated when passed to gen, while quoted operators or operands, like '+',
are taken literally. In practice, three-address statements might be sent to an output file
rather than built up into the code attributes. Flow-of-control statements can be added to the
language of assignments in Fig. 8.6 by productions and semantic rules like the ones for
while statements in Fig. 8.7. In the figure, the code for S → while E do S1 is generated using
new attributes S.begin and S.after to mark the first statement in the code for E and the
statement following the code for S, respectively.
These attributes represent labels created by a function newlabel that returns a new label
every time it is called. Note that S.after becomes the label of the statement that comes after
the code for the while statement. We assume that a non-zero expression represents true; that
is, when the value of E becomes zero, control leaves the while statement. Expressions that
govern the flow of control may in general be Boolean expressions containing relational and
logical operators. The semantic rules for while statements in Section 8.6 differ from those in
Fig. 8.7 to allow for flow of control within Boolean expressions. Postfix notation can be
obtained by adapting the semantic rules in Fig. 8.6 (or see Fig. 2.5). The postfix notation for
an identifier is the identifier itself. The rules for the other productions concatenate the
operator after the code for the operands. For example, associated with the production E → −E1
is the semantic rule

E.code := E1.code || 'uminus'

In general, the intermediate form produced by the syntax-directed translations in this
chapter can be changed by making similar modifications to the semantic rules.
6. List out and discuss the different types of intermediate code.
There are three types of intermediate representation:-
1. Syntax Trees
2. Postfix notation
3. Three Address Code
Semantic rules for generating three-address code from common programming language
constructs are similar to those for constructing syntax trees or for generating postfix
notation.
Graphical Representations
A syntax tree depicts the natural hierarchical structure of a source program. A DAG
(Directed Acyclic Graph) gives the same information in a more compact way because
common sub-expressions are identified. A syntax tree for the assignment statement
a := b*-c + b*-c appears in Fig. 8.2.

Postfix notation is a linearized representation of a syntax tree; it is a list of the nodes of the
tree in which a node appears immediately after its children. The postfix notation for the
syntax tree in the figure is

a b c uminus * b c uminus * + assign

The edges in a syntax tree do not appear explicitly in postfix notation. They can be
recovered from the order in which the nodes appear and the number of operands that the
operator at each node expects. The recovery of edges is similar to the evaluation, using a
stack, of an expression in postfix notation.
Three-address code is a sequence of statements of the general form

x := y op z

where x, y, and z are names, constants, or compiler-generated temporaries; op stands for any
operator, such as a fixed- or floating-point arithmetic operator, or a logical operator on
Boolean-valued data. Note that no built-up arithmetic expressions are permitted, as there is
only one operator on the right side of a statement. Thus a source-language expression like
x + y*z might be translated into the sequence
t1 := y * z
t2 := x + t1
where t1 and t2 are compiler-generated temporary names. This unraveling of complicated
arithmetic expressions and of nested flow-of-control statements makes three-address code
desirable for target code generation and optimization. The use of names for the intermediate
values computed by a program allows three-address code to be easily rearranged, unlike
postfix notation. Three-address code is a linearized representation of a syntax tree or a DAG in
which explicit names correspond to the interior nodes of the graph.

For example, a := b + c + d translates to:
t1 := b + c
t2 := t1 + d
a := t2
Implementations of three-Address Statements
A three-address statement is an abstract form of intermediate code. In a compiler, these
statements can be implemented as records with fields for the operator and the operands.
Three such representations are quadruples, triples, and indirect triples.
Quadruples
A quadruple is a record structure with four fields, which we call op, arg1, arg2, and result.
The op field contains an internal code for the operator. The three-address statement x := y op z
is represented by placing y in arg1, z in arg2, and x in result.
For example, consider the input statement x := -a * b + -a * b.
The three-address code is

t1 := uminus a
t2 := t1 * b
t3 := uminus a
t4 := t3 * b
t5 := t2 + t4
x := t5
      op      arg1   arg2   result
(0)   uminus  a             t1
(1)   *       t1     b      t2
(2)   uminus  a             t3
(3)   *       t3     b      t4
(4)   +       t2     t4     t5
(5)   :=      t5            x

Fig. 8.8(a) Quadruples

Triples
To avoid entering temporary names into the symbol table, we might instead refer to a temporary
value by the position of the statement that computes it. If we do so, three-address statements can
be represented by records with only three fields: op, arg1, and arg2, as in Fig. 8.8(b). The fields
arg1 and arg2, for the arguments of op, are either pointers to the symbol table (for programmer-
defined names or constants) or pointers into the triple structure (for temporary values). Since
three fields are used, this intermediate code format is known as triples. Except for the treatment
of programmer-defined names, triples correspond to the representation of a syntax tree or DAG
by an array of nodes, as in Fig. 8.4.

number  op      arg1   arg2
(0)     uminus  a
(1)     *       (0)    b
(2)     uminus  a
(3)     *       (2)    b
(4)     +       (1)    (3)
(5)     :=      x      (4)

Fig. 8.8(b) Triples
Parenthesized numbers represent pointers into the triple structure, while symbol-table
pointers are represented by the names themselves. In practice, the information needed to
interpret the different kinds of entries in the arg1 and arg2 fields can be encoded into the op
field or some additional fields. The triples in Fig. 8.8(b) correspond to the quadruples in Fig.
8.8(a). Note that the copy statement x := t5 is encoded in the triple representation by placing
x in the arg1 field and using the operator assign. A ternary operation like x[i] := y requires
two entries in the triple structure, as shown in Fig. 8.9(a), while x := y[i] is naturally
represented as two operations in Fig. 8.9(b).
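The record layouts described above can be sketched directly. This is a minimal illustration, not a real compiler's data structure; the field names simply follow the text, and the three-statement example x := -a * b is an assumption chosen for brevity.

```python
# Minimal record layouts for quadruples and triples.
from dataclasses import dataclass

@dataclass
class Quadruple:
    op: str
    arg1: object      # name, constant, or temporary
    arg2: object      # None for unary ops and copies
    result: object

@dataclass
class Triple:
    op: str
    arg1: object      # name/constant, or an int index into the triple array
    arg2: object

# x := -a * b as quadruples: temporaries t1, t2 are explicit names ...
quads = [Quadruple('uminus', 'a', None, 't1'),
         Quadruple('*', 't1', 'b', 't2'),
         Quadruple(':=', 't2', None, 'x')]

# ... and as triples: temporaries are just positions (0) and (1)
triples = [Triple('uminus', 'a', None),
           Triple('*', 0, 'b'),
           Triple(':=', 'x', 1)]
```

Note how the triple form never names t1 or t2: statement (1) refers to statement (0) by index, which is why reordering triples is hard without renumbering, while quadruples can be moved freely.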
Indirect Triples
Another implementation of three-address code is to list pointers to triples, rather than
listing the triples themselves. This implementation is naturally called indirect triples. For
example, let us use an array called statement to list pointers to triples in the desired order.
How are declarations translated into intermediate code? (11 marks)
As the sequence of declarations in a procedure or block is examined, we can lay out
storage for names local to the procedure. For each local name, we create a symbol-table
entry with information such as the type and the relative address of the storage for the name. The
relative address consists of an offset from the base of the static data area or from the field for
local data in an activation record. When the front end generates addresses, it may have a
target machine in mind. Suppose that addresses of consecutive integers differ by 4 on a
byte-addressable machine. The address calculations generated by the front end may
therefore include multiplications by 4. The instruction set of the target machine may also
favor certain layouts of data objects, and hence their addresses. We ignore alignment of data
objects here.
DECLARATIONS IN A PROCEDURE
The syntax of languages such as C, Pascal, and FORTRAN allows all the declarations in a
single procedure to be processed as a group. In this case, a global variable, say offset, can
keep track of the next available relative address. Non-terminal P generates a sequence of
declarations of the form id : T. Before the first declaration is considered, offset is set to 0.
As each new name is seen, that name is entered in the symbol table with offset equal to the
current value of offset, and offset is incremented by the width of the data object denoted by
that name. The procedure enter(name, type, offset) creates a symbol-table entry for name and
gives it the type type and relative address offset in its data area. We use synthesized attributes
type and width for non-terminal T to indicate the type and width, or number of memory units
[Indirect triples for x := -a * b + -a * b, continued from the previous section:

statement
(0)   (11)
(1)   (12)
(2)   (13)
(3)   (14)
(4)   (15)
(5)   (16)

number  op      arg1   arg2
(11)    uminus  a
(12)    *       (11)   b
(13)    uminus  a
(14)    *       (13)   b
(15)    +       (12)   (14)
(16)    :=      x      (15)]
taken by objects of that type. Attribute type represents a type expression constructed from
the basic types integer and real by applying the type constructors pointer and array. If type
expressions are represented by graphs, then attribute type might be a pointer to the node
representing a type expression. Integers have width 4 and reals have width 8. The width of an
array is obtained by multiplying the width of each element by the number of elements in the
array. The width of each pointer is assumed to be 4.
P → D
D → D ; D
D → id : T           {enter(id.name, T.type, offset);
                      offset := offset + T.width}
T → integer          {T.type := integer;
                      T.width := 4}
T → real             {T.type := real;
                      T.width := 8}
T → array [num] of T1  {T.type := array(num.val, T1.type);
                      T.width := num.val × T1.width}
T → ↑T1              {T.type := pointer(T1.type);
                      T.width := 4}
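The offset computation in the scheme above can be sketched as a short Python function. The widths follow the text (integer 4, real 8, pointer 4); the flat list-of-pairs input is a simplifying assumption standing in for the parsed declarations.

```python
# Sketch of the enter/offset actions for a group of declarations.
WIDTH = {'integer': 4, 'real': 8, 'pointer': 4}

def layout(declarations):
    """declarations: list of (name, type) pairs; returns name -> (type, offset)."""
    symtab, offset = {}, 0            # offset starts at 0 before the first declaration
    for name, ty in declarations:
        symtab[name] = (ty, offset)   # enter(name, type, offset)
        offset += WIDTH[ty]           # offset := offset + T.width
    return symtab

print(layout([('i', 'integer'), ('x', 'real'), ('p', 'pointer')]))
# i at offset 0, x at offset 4, p at offset 12
```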
In Pascal and C, a pointer may be seen before we learn the type of the object
pointed to. Storage allocation for such types is simpler if all pointers have the same width.
The initialization of offset in the translation scheme above is more evident if the first
production appears on one line as:

P → {offset := 0} D

Non-terminals generating ε, called marker non-terminals in Section 5.6, can be used to
rewrite productions so that all actions appear at the ends of right sides. Using a marker
non-terminal M, the production above can be restated as:

P → M D
M → ε    {offset := 0}
Keeping Track of Scope Information
In a language with nested procedures, names local to each procedure can be
assigned relative addresses using the approach of Fig. 8.11. When a nested procedure is
seen, processing of declarations in the enclosing procedure is temporarily suspended. This
approach will be illustrated by adding semantic rules to the following language:

P → D
D → D ; D | id : T | proc id ; D ; S

The productions for non-terminal S for statements and T for types are not shown because
we focus on declarations. The non-terminal T has synthesized attributes type and width, as
in the earlier translation scheme. For simplicity, suppose that there is a separate symbol table
for each procedure in the language.
The semantic rules are defined in terms of the following operations:
1. mktable (previous) creates a new symbol table and returns a pointer to the new table. The
argument previous points to a previously created symbol table, presumably that for the
enclosing procedure. The pointer previous is placed in a header for the new symbol table,
along with additional information such as the nesting depth of a procedure. We can also
number the procedures in the order they are declared and keep this number in the header.
2. enter (table, name, type, offset) creates a new entry for name name in the symbol table
pointed to by table. Again, enter places type and relative address offset in fields within the
entry.
3. addwidth(table, width) records the cumulative width width of all the entries in table, in the
header associated with this symbol table.
4. enterproc (table, name, newtable) creates a new entry for procedure name in the symbol
table pointed to by table. The argument newtable points to the symbol table for this
procedure name.
The translation scheme below shows how data can be laid out in one pass, using a
stack tblptr to hold pointers to symbol tables of the enclosing procedures. For example,
tblptr will contain pointers to the tables for sort, quicksort, and partition when the
declarations in partition are considered. The pointer to the current symbol table is on top.
The other stack, offset, is the natural generalization to nested procedures of attribute offset.
The top element of offset is the next available relative address for a local of the current
procedure. All semantic actions in the sub-trees for B and C in

A → B C {actionA}
are done before actionA at the end of the production occurs. Hence, the action associated with
the marker M in the scheme below is the first to be done. The action for non-terminal M initializes
stack tblptr with a symbol table for the outermost scope, created by operation mktable(nil).
The action also pushes relative address 0 onto stack offset.

The non-terminal N plays a similar role when a procedure declaration appears. Its action
uses the operation mktable(top(tblptr)) to create a new symbol table. Here the argument
top(tblptr) gives the enclosing scope of the new table. A pointer to the new table is pushed
above that for the enclosing scope. Again, 0 is pushed onto offset.

For each variable declaration id : T, an entry is created for id in the current symbol
table. This declaration leaves the stack tblptr unchanged; the top of stack offset is
incremented by T.width. When the action on the right side of D → proc id ; N D1 ; S occurs,
the width of all declarations generated by D1 is on top of stack offset; it is recorded using
addwidth. tblptr and offset are then popped, and we revert to examining the declarations in
the enclosing procedure. At this point, the name of the enclosed procedure is entered into
the symbol table of its enclosing procedure.
P → M D              {addwidth(top(tblptr), top(offset));
                      pop(tblptr); pop(offset)}
M → ε                {t := mktable(nil);
                      push(t, tblptr); push(0, offset)}
D → D1 ; D2
D → proc id ; N D1 ; S  {t := top(tblptr);
                      addwidth(t, top(offset));
                      pop(tblptr); pop(offset);
                      enterproc(top(tblptr), id.name, t)}
D → id : T           {enter(top(tblptr), id.name, T.type, top(offset));
                      top(offset) := top(offset) + T.width}
N → ε                {t := mktable(top(tblptr));
                      push(t, tblptr); push(0, offset)}
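The tblptr and offset stacks can be sketched as follows. This is an illustrative Python model, not the book's exact implementation; the helper names begin_scope, declare, and end_scope are hypothetical and bundle the actions of M/N, D → id : T, and the end of a procedure declaration, respectively.

```python
# Sketch of the tblptr/offset stacks for nested procedure declarations.
def mktable(previous):
    return {'previous': previous, 'entries': {}, 'width': 0}

tblptr, offset = [], []

def begin_scope():                     # actions of markers M and N
    tblptr.append(mktable(tblptr[-1] if tblptr else None))
    offset.append(0)

def declare(name, ty, width):          # action of D -> id : T
    tblptr[-1]['entries'][name] = (ty, offset[-1])
    offset[-1] += width

def end_scope(proc_name=None):         # actions at the end of P / of a proc
    table = tblptr.pop()
    table['width'] = offset.pop()      # addwidth
    if proc_name is not None:          # enterproc in the enclosing table
        tblptr[-1]['entries'][proc_name] = ('proc', table)
    return table

begin_scope()                          # outermost scope
declare('i', 'integer', 4)
begin_scope()                          # a nested procedure p
declare('j', 'integer', 4)
inner = end_scope('p')                 # pop p's table, enter p in the outer table
outer = end_scope()
```

Note that declaring the nested procedure p does not change the outer offset, matching the scheme: only variable declarations advance top(offset).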
Field Names in Records
The following production allows non-terminal T to generate records in addition to basic
types, pointers, and arrays:
T → record D end
The actions in the translation scheme below emphasize the similarity between the
layout of records as a language construct and activation records. Since procedure definitions
do not affect the width computations, we overlook the fact that the above
production also allows procedure definitions to appear within records.
T → record L D end   {T.type := record(top(tblptr));
                      T.width := top(offset);
                      pop(tblptr); pop(offset)}
L → ε                {t := mktable(nil);
                      push(t, tblptr); push(0, offset)}
EXPLAIN BRIEFLY ABOUT ASSIGNMENT STATEMENTS. (11 MARKS)
Assignment statements mainly deal with expressions. The expressions can be of type
integer, real, array, or record. In the translation of assignments into three-address code,
names can be looked up in the symbol table as follows.
CFG                  SEMANTIC ACTION
S → id := E          {p := lookup(id.name);
                      if p != nil then
                        emit(p ':=' E.place)
                      else error}
E → E1 + E2          {E.place := newtemp;
                      emit(E.place ':=' E1.place '+' E2.place)}
E → E1 * E2          {E.place := newtemp;
                      emit(E.place ':=' E1.place '*' E2.place)}
E → −E1              {E.place := newtemp;
                      emit(E.place ':=' 'uminus' E1.place)}
E → (E1)             {E.place := E1.place}
E → id               {p := lookup(id.name);
                      if p != nil then
                        E.place := p
                      else error}
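The actions above can be sketched as a small recursive translator over an abstract syntax tree. The tuple-based AST encoding and the preloaded symbol table {a, b, c} are illustrative assumptions; newtemp and emit behave as described in the text.

```python
# Recursive sketch of the assignment translation scheme above.
# AST nodes: ('id', name), ('+', l, r), ('*', l, r), ('uminus', e),
#            ('assign', name, e).
code, symtab = [], {'a', 'b', 'c'}
temp_count = 0

def newtemp():
    global temp_count
    temp_count += 1
    return f't{temp_count}'            # distinct names t1, t2, ...

def emit(stmt):
    code.append(stmt)

def gen(node):                         # returns E.place
    kind = node[0]
    if kind == 'id':
        assert node[1] in symtab, 'error: undeclared name'
        return node[1]                 # lookup(id.name)
    if kind == 'uminus':
        t = newtemp()
        emit(f'{t} := uminus {gen(node[1])}')
        return t
    if kind in ('+', '*'):
        left, right = gen(node[1]), gen(node[2])
        t = newtemp()
        emit(f'{t} := {left} {kind} {right}')
        return t
    if kind == 'assign':
        emit(f'{node[1]} := {gen(node[2])}')

# a := b * -c
gen(('assign', 'a', ('*', ('id', 'b'), ('uminus', ('id', 'c')))))
print(code)   # ['t1 := uminus c', 't2 := b * t1', 'a := t2']
```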
The three-address statements use names as pointers to their symbol-table entries. The
lexeme for a name is represented by id, with attribute id.name.

The operation lookup(id.name) checks whether there is an entry for this occurrence of the
name in the symbol table. If so, a pointer to the entry is returned; otherwise lookup returns
nil to indicate that no entry was found.

The semantic actions use the procedure emit to append three-address statements to an
output file, rather than building up code attributes for the non-terminals.

For S → id := E in a language with nested scopes, a modified lookup operation first checks
whether the name appears in the current symbol table, accessible through top(tblptr). If not,
lookup uses the pointer in the header of that table to find the symbol table of the enclosing
procedure and searches there. If the name cannot be found in any enclosing table, then
lookup returns nil.
CONTROL-FLOW REPRESENTATION OF BOOLEAN EXPRESSIONS
We start by presenting the translation for flow-of-control statements generated by the
following grammar:

S → if E then S1
  | if E then S1 else S2
  | while E do S1

In the translation, we assume that a three-address code statement can have a symbolic
label, and that the function newlabel generates such labels. We associate with E two labels,
using inherited attributes:

E.true, the label to which control flows if E is true.
E.false, the label to which control flows if E is false.

We also associate with S the inherited attribute S.next, the label attached to the first
statement after the code for S.
PRODUCTION              SEMANTIC RULES

S → if E then S1
    E.true := newlabel;
    E.false := S.next;
    S1.next := S.next;
    S.code := E.code || gen(E.true ':') || S1.code

S → if E then S1 else S2
    E.true := newlabel;
    E.false := newlabel;
    S1.next := S.next;
    S2.next := S.next;
    S.code := E.code || gen(E.true ':') || S1.code ||
              gen('goto' S.next) ||
              gen(E.false ':') || S2.code

S → while E do S1
    S.begin := newlabel;
    E.true := newlabel;
    E.false := S.next;
    S1.next := S.begin;
    S.code := gen(S.begin ':') || E.code || gen(E.true ':') ||
              S1.code || gen('goto' S.begin)
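The if-then-else rule above can be sketched concretely. In this minimal Python model, E.code is assumed to be short-circuit code whose jump targets are left as {true}/{false} placeholders until the labels are allocated; the helper names and the sample condition a < b are illustrative assumptions.

```python
# Sketch of S -> if E then S1 else S2, building S.code as a list of strings.
label_count = 0

def newlabel():
    global label_count
    label_count += 1
    return f'L{label_count}'

def gen_if_else(E_code, S1_code, S2_code, S_next):
    E_true, E_false = newlabel(), newlabel()
    # fill E.true / E.false into E's jumping code
    code = [line.format(true=E_true, false=E_false) for line in E_code]
    return (code
            + [f'{E_true}:'] + S1_code         # gen(E.true ':') || S1.code
            + [f'goto {S_next}']               # skip the else part
            + [f'{E_false}:'] + S2_code)       # gen(E.false ':') || S2.code

out = gen_if_else(['if a < b goto {true}', 'goto {false}'],
                  ['x := 1'], ['x := 0'], 'Lnext')
print('\n'.join(out))
```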
Explain Boolean expressions in intermediate code generation. (11 marks)
Boolean expressions are composed of the Boolean operators (and, or, and not)
applied to elements that are Boolean variables or relational expressions. In turn, relational
expressions are of the form E1 relop E2, where E1 and E2 are arithmetic expressions and
relop is any of <, <=, =, !=, >, or >=.
Assume that or and and are left-associative, and that or has lowest precedence, then and, then
not.
METHODS OF TRANSLATING BOOLEAN EXPRESSIONS
The first method is to encode true and false numerically and to evaluate a Boolean expression
analogously to an arithmetic expression. Often 1 is used to denote true and 0 to denote false.
The second method of implementing Boolean expressions is by flow of control, representing the
value of a Boolean expression by a position reached in a program.
Numerical Representation
The procedure emit places three-address statements into an output file. The global variable
nextstat gives the index of the next three-address statement and is incremented by emit.
We use the attribute op to determine which of the comparison operators is represented
by relop.
SHORT-CIRCUIT CODE
In this form of three-address code, gotos jump directly to specific statements. This style of
evaluation is called short-circuit or jumping code.
MIXED-MODE BOOLEAN EXPRESSIONS
Boolean expressions often contain arithmetic sub-expressions, e.g. (a+b) < c.
Conversely, if true = 1 and false = 0, then (a<b) + (b<a) can be viewed as an arithmetic
expression with value 0 if a = b and 1 otherwise.
Consider the following grammar:

E → E + E | E and E | E relop E | id

E + E produces an arithmetic result, and the arguments can be mixed;
E and E produces a Boolean result, and both arguments must be Boolean;
E relop E produces a Boolean result, and the arguments can be mixed;
id is assumed to be of arithmetic type.

To generate code we use a synthesized attribute E.type, which will be either arith or bool.
Boolean expressions have inherited attributes E.true and E.false used for the jumping code.
Arithmetic expressions have the synthesized attribute E.place standing for the (temporary)
variable holding the value of E. The global variable nextstat gives the index of the next
three-address statement and is incremented by gen.
The semantic rule for E → E1 + E2 is:

E.type := arith;
if E1.type = arith and E2.type = arith then begin
    E.place := newtemp;
    E.code := E1.code || E2.code ||
              gen(E.place ':=' E1.place '+' E2.place)
end
else if E1.type = arith and E2.type = bool then begin
    E.place := newtemp;
    E2.true := newlabel;
    E2.false := newlabel;
    E.code := E1.code || E2.code ||
              gen(E2.true ':' E.place ':=' E1.place + 1) ||
              gen('goto' nextstat + 1) ||
              gen(E2.false ':' E.place ':=' E1.place)
end
Write short notes on case statements. (6 marks)
Consider the following switch statement:

switch E
begin
    case V1: S1
    case V2: S2
    ...
    case Vn-1: Sn-1
    default: Sn
end

The translation of the switch statement is:
Evaluate the expression.
Find which value in the list of cases is the same as the value of the expression.
Execute the statement associated with the value found.
If no value matches, execute the default statement.
COMPILER PROCESS
One way to implement the conditional gotos is to create a table of pairs, each consisting of a
value and a label for the code of the corresponding statement. The compiler compares the
value of the expression with each value in the table; if no other match is found, the last
(default) entry is sure to match.
SYNTAX-DIRECTED TRANSLATION OF CASE STATEMENTS
On seeing the keyword switch, generate two new labels, test and next, and a new temporary
variable t. For the expression E, generate code to evaluate E into t. After processing E,
generate the jump goto test.
On seeing each keyword case, create a new label Li and enter it into the symbol table, along
with a pointer to the symbol-table entry and the value Vi of the case constant. Each statement
case Vi : Si creates the label Li, followed by the code for Si, followed by the jump goto next.
The keyword end terminates the body of the switch statement.
CASE STATEMENT THREE-ADDRESS CODE

switch E
begin
    case V1: S1
    case V2: S2
    ...
    case Vn-1: Sn-1
    default: Sn
end

        code to evaluate E into t
        goto test
L1:     code for S1
        goto next
L2:     code for S2
        goto next
        ...
Ln-1:   code for Sn-1
        goto next
Ln:     code for Sn
        goto next
test:   if t = V1 goto L1
        if t = V2 goto L2
        ...
        if t = Vn-1 goto Ln-1
        goto Ln
next:
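The layout above, with the arms first and the test table at the end, can be sketched as a small generator. This is an illustrative Python model; the label names L1, L2, ..., Ldef and the list-of-strings output format are assumptions made for the sketch.

```python
# Sketch of the case-statement layout: emit each arm, then the test table.
def translate_switch(expr_temp, cases, default_code):
    """cases: list of (value, arm_code) pairs; returns the TAC as strings."""
    code = ['goto test']                     # jump over the arms to the table
    labels = []
    for i, (value, arm) in enumerate(cases, start=1):
        labels.append((value, f'L{i}'))
        code += [f'L{i}:'] + arm + ['goto next']
    code += ['Ldef:'] + default_code + ['goto next', 'test:']
    for value, label in labels:              # the table of conditional gotos
        code.append(f'if {expr_temp} = {value} goto {label}')
    code += ['goto Ldef', 'next:']           # default is sure to match
    return code

out = translate_switch('t', [(1, ['x := 1'])], ['x := 0'])
print('\n'.join(out))
```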
Explain briefly about backpatching. (11 marks)
The easiest way to implement syntax-directed definitions is to make two passes: construct a
syntax tree for the input, then walk the tree, computing the translations given in the
definition. The main problem with generating three-address code in a single pass for Boolean
expressions and flow-of-control statements is that we may not know the labels that control
must go to at the time the jump statements are generated.
Each such statement will be put on a list of goto statements whose labels will be filled in
when the proper label can be determined.
This subsequent filling of addresses for the determined labels is called Backpatching.
Backpatching can be used to generate code for boolean expression and flow of control
statements in one pass.
In backpatching, we generate quadruples into a quadruple array, and labels are indices into
this array.
To manipulate lists of labels, we use three functions:
makelist(i) – creates a new list containing only i, an index into the array of quadruples,
and returns a pointer to the list it has made.
merge(i,j) – concatenates the lists pointed to by i and j, and returns a pointer to the
concatenated list.
backpatch(p,i) – inserts i as the target label for each of the statements on the list pointed
to by p.
Translation scheme for Boolean expressions. The grammar is:
E → E1 or M E2
E → E1 and M E2
E → not E1
E → (E1)
E → id1 relop id2
E → false
E → true
M → ε
Two synthesized attributes truelist and falselist of non-terminal E are used to generate
jumping code for Boolean expressions.
E.truelist : Contains the list of all the jump statements left incomplete to be filled by the
label for the start of the code for E=true.
E.falselist : Contains the list of all the jump statements left incomplete to be filled by the
label for the start of the code for E=false.
The variable nextquad holds the index of the next quadruple to follow.
M.quad records the index (quadruple number) of the next statement to be generated at the
point where M appears.
The semantic actions are:
E → E1 or M E2
{ backpatch(E1.falselist, M.quad);
E.truelist = merge(E1.truelist, E2.truelist);
E.falselist = E2.falselist; }
E → E1 and M E2
{ backpatch(E1.truelist, M.quad);
E.truelist = E2.truelist;
E.falselist = merge(E1.falselist, E2.falselist); }
E → not E1
{ E.truelist = E1.falselist;
E.falselist = E1.truelist; }
E → (E1)
{ E.truelist = E1.truelist;
E.falselist = E1.falselist;}
E → id1 relop id2
{ E.truelist = makelist(nextquad);
E.falselist = makelist(nextquad +1 );
emit(if id1.place relop id2.place goto __ );
emit(goto ___);}
E → true
{ E.truelist = makelist(nextquad);
emit(goto ___);}
E → false
{ E.falselist = makelist(nextquad);
emit(goto ___);}
M → ε
{ M.quad = nextquad; }
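The three list-handling functions and the action for E → E1 or M E2 can be traced in a small Python sketch for the expression a < b or c < d. This is a single illustrative trace, not production compiler code; representing quadruples as strings with "_" as the unfilled target is an assumption:

```python
quads = []                       # the quadruple array; labels are indices

def nextquad():
    return len(quads)

def emit(q):
    quads.append(q)

def makelist(i):
    """Create a new list containing only quad index i."""
    return [i]

def merge(p1, p2):
    """Concatenate two lists of quad indices."""
    return p1 + p2

def backpatch(p, i):
    """Fill in i as the target of every jump on list p."""
    for q in p:
        quads[q] = quads[q].replace("_", str(i))

# Translating  a < b or c < d  with the semantic actions above:
# E1 -> id relop id
e1_true = makelist(nextquad()); emit("if a < b goto _")
e1_false = makelist(nextquad()); emit("goto _")
# M -> epsilon : record the index of the first quad of E2
m_quad = nextquad()
# E2 -> id relop id
e2_true = makelist(nextquad()); emit("if c < d goto _")
e2_false = makelist(nextquad()); emit("goto _")
# E -> E1 or M E2 : E1's false exits fall through into E2
backpatch(e1_false, m_quad)
e_true = merge(e1_true, e2_true)
e_false = e2_false
```

After the trace, the jump at index 1 has been backpatched to quad 2 (the start of c < d), while E.truelist and E.falselist remain open lists to be filled in by the enclosing statement.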
When a compiler encounters a statement like goto L, it looks for the label L in the symbol table.
If the jump is backward, that is, L has already been encountered, then the symbol table will
have an entry giving the number of the first quadruple generated for the statement labeled L.
Generate a goto three-address statement with that quadruple number as target.
If the jump is forward, this may be the first occurrence of the label L, and if
so enter L into the symbol table. In any case, if the statement labeled L has not yet been
encountered, then generate a goto quadruple with unspecified target and add the quadruple
generated to a list of quadruples whose target is L. A pointer to this list appears in the symbol table
entry for L.
The syntax of labeled statements is given by productions such as
S → LABEL : S
LABEL → id
The semantic action associated with LABEL → id is to
1. Install that identifier in the symbol table if it is not already there,
2. record that the quadruple referred to by this label is the current value of NEXTQUAD, and
finally
3. Back patch the list of goto's whose targets are the label just discovered.
Structured Flow-of-Control Constructs
A more complex example of flow of control concerns nested, or structured, control statements.
Not only do Boolean expressions need two lists of jumps that occur when the expression is true and
when it is false, but statements also need lists of jumps (NEXT lists) to the code that follows them
in the execution sequence.
Scheme to Implement the Translation
The non terminal E has the two translation fields E. TRUE and E. FALSE. L and S each
need a list of unfilled quadruples which must eventually be completed by back patching.
These lists are pointed to by the translation fields L.NEXT and S. NEXT. S.NEXT is a pointer to a
list of all conditional and unconditional jumps to the quadruple following the statement S in
execution order, and L.NEXT is defined similarly
Each M has a translation M.QUAD, which is the number of the first quadruple following it.
E.TRUE is backpatched to go to the beginning of S(1) by making the jumps on the E.TRUE list go to
M(1).QUAD.
A more compelling argument for using S.NEXT and L.NEXT comes from generating code for the
conditional statement if E then S(1) else S(2). When we finish executing S(1), we have no idea
where to go next, since it may well be to the quadruple following S(2), whose index is not known
until we finish generating code for S(2). The use of the marker non-terminal M solves this problem
as well.
WRITE SHORT NOTES ON PROCEDURE CALLS? (5 MARKS)
Simple procedure call statement:
S → call id (elist)
elist → elist , E
elist → E
Translation includes
Calling sequence: the actions taken on entry to and exit from each procedure.
Arguments are evaluated and put in known places, and the return address (the location to which
the called routine must transfer after it is finished) is recorded.
With static allocation, the return address is placed after the code sequence itself.
Parameters may be passed by reference.
The three-address code contains the statements needed to evaluate those arguments that are
expressions other than simple names, followed by a list of param statements, one per argument.
For separate evaluation:
Save E.place for each expression E in id(E, E, ..., E).
The data structure used is a queue.
Semantics:
1. S → call id (elist)
{ for each item p on queue do
gen(param p);
gen(call id.place); }
2. elist → elist , E
{ append E.place to the end of queue }
3. elist → E
{ initialize queue to contain only E.place }
The queue is then emptied; each entry is a pointer into the symbol table denoting the value of an argument E.
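The queue-then-emit scheme above can be sketched in a few lines of Python (gen_call and arg_places are illustrative names; arg_places stands for the saved E.place of each argument):

```python
def gen_call(proc_name, arg_places):
    """Sketch of the translation of  S -> call id(elist).  The elist
    productions queue E.place for each argument left to right; the
    call action then emits one 'param' per queued place, followed by
    the call itself."""
    queue = list(arg_places)                 # filled by the elist actions
    code = [f"param {p}" for p in queue]     # one param per argument
    code.append(f"call {proc_name}, {len(queue)}")
    return code
```

For example, gen_call("p", ["t1", "t2", "x"]) yields the param statements in argument order followed by the call with its argument count.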
TWO MARKS
1. What are parsers?
Parser
• Accepts string of tokens from lexical analyzer (usually one token at a time)
• Verifies whether or not string can be generated by grammar
• Reports syntax errors (recovers if possible)
THE ROLE OF A PARSER
Parser obtains a string of tokens from the lexical analyzer and verifies that it can be generated
by the language for the source program. The parser should report any syntax errors in an
intelligible fashion.
2. What are the two types of Parser?
The two types of parsers employed are:
1. Top-down parsers, which build parse trees from the top (root) to the bottom (leaves).
2. Bottom-up parsers, which build parse trees from the leaves and work up to the root.
Therefore there are two types of parsing methods– top-down parsing and bottom-up parsing.
3. Mention the basic issues in parsing.
There are two important issues in parsing.
Specification of syntax
Representation of input after parsing.
4. Why lexical and syntax analyzers are separated out?
Reasons for separating the analysis phase into lexical and syntax analyzers:
Simpler design.
Compiler efficiency is improved.
Compiler portability is enhanced.
5. Define a context free grammar.
A context free grammar G consists of the following:
V is a set of non terminals
T is a set of terminals
S is a start symbol
P is a set of production rules
G can be represented as G = (V, T, S, P)
Production rules are given in the following form
Non terminal → (V U T)*
6. Briefly explain the concept of derivation.
Derivation from S means generation of string w from S. For constructing derivation two things
are important.
i) Choice of non terminal from several others.
ii) Choice of rule from production rules for corresponding non terminal.
Instead of choosing the arbitrary non terminal one can choose
i) Either leftmost derivation – replace the leftmost non-terminal in a sentential form
ii) Or rightmost derivation – replace the rightmost non-terminal in a sentential form
7. Define ambiguous grammar.
A grammar G is said to be ambiguous if it generates more than one parse tree for some sentence
of language L (G).
i.e., there exist two or more distinct leftmost (or rightmost) derivations for the given sentence.
8. List the properties of LR parser.
1. LR parsers can be constructed to recognize most of the programming languages for which the
context free grammar can be written.
2. The class of grammar that can be parsed by LR parser is a superset of class of grammars that
can be parsed using predictive parsers.
3. LR parsers use a non-backtracking shift-reduce technique, yet it is an efficient one.
9. Mention the types of LR parser.
SLR parser- simple LR parser
LALR parser- look ahead LR parser
Canonical LR parser
10. What are the problems with top down parsing?
The following are the problems associated with top down parsing:
Backtracking
Left recursion
Left factoring
Ambiguity
11. Write the algorithm for FIRST and FOLLOW.
FIRST
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a non-terminal and X → Y1 Y2 … Yk is a production, then place a in FIRST(X) if for
some i, a is in FIRST(Yi), and ε is in all of FIRST(Y1), …, FIRST(Yi-1); if ε is in FIRST(Yj)
for all j = 1, …, k, then add ε to FIRST(X).
FOLLOW
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right end marker.
2. If there is a production A → αBβ, then everything in FIRST (β) except for ε is placed in
FOLLOW (B).
3. If there is a production A → αB, or a production A→ αBβ where FIRST (β) contains ε, then
everything in FOLLOW (A) is in FOLLOW (B).
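The FIRST rules above amount to a fixed-point computation: keep applying the rules until no set grows. A small Python sketch, with the grammar encoding and the 'eps' marker as illustrative assumptions:

```python
def first_sets(grammar, terminals):
    """Fixed-point computation of FIRST for a grammar given as
    {nonterminal: [list of right-hand sides]}; 'eps' marks epsilon."""
    first = {n: set() for n in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                if body == ['eps']:
                    add = {'eps'}
                else:
                    add = set()
                    for sym in body:
                        f = {sym} if sym in terminals else first[sym]
                        add |= f - {'eps'}          # rule 3: FIRST(Yi) minus eps
                        if 'eps' not in f:
                            break                   # Yi cannot vanish; stop
                    else:
                        add.add('eps')              # every symbol can vanish
                if not add <= first[head]:
                    first[head] |= add
                    changed = True
    return first

# Example grammar: S -> A B,  A -> a A | eps,  B -> b
g = {'S': [['A', 'B']], 'A': [['a', 'A'], ['eps']], 'B': [['b']]}
f = first_sets(g, {'a', 'b'})
```

Here FIRST(A) = {a, ε}, so FIRST(S) picks up FIRST(B) = {b} as well, giving FIRST(S) = {a, b}. FOLLOW can be computed by an analogous fixed-point loop over its three rules.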
12. List the advantages and disadvantages of operator precedence parsing.
Advantages
This type of parsing is simple to implement.
Disadvantages
1. An operator like minus has two different precedences (unary and binary). Hence it is hard to
handle tokens like the minus sign.
2. This kind of parsing is applicable to only small class of grammars.
13. What is the dangling else problem?
The dangling-else problem is the ambiguity in the following grammar, where an else may be
matched with either of two preceding thens:
stmt → if expr then stmt
| if expr then stmt else stmt
| other
14. Write short notes on YACC.
YACC is an automatic tool for generating the parser program.
YACC stands for Yet Another Compiler-Compiler, and is a utility available on
UNIX.
Basically YACC is LALR parser generator.
It can report conflict or ambiguities in the form of error messages.
15. What is meant by handle pruning?
A rightmost derivation in reverse can be obtained by handle pruning.
If w is a sentence of the grammar at hand, then w = γn, where γn is the nth right-sentential form
of some as yet unknown rightmost derivation
S = γ0 => γ1…=> γn-1 => γn = w
16. Define LR (0) items.
An LR (0) item of a grammar G is a production of G with a dot at some position of the right
side. Thus, production A → XYZ yields the four items
A→.XYZ
A→X.YZ
A→XY.Z
A→XYZ.
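The item set of a single production is easy to enumerate mechanically; a small Python sketch (items_of is an illustrative name):

```python
def items_of(head, body):
    """All LR(0) items of a production: one item for each position of
    the dot, from before the first symbol to after the last."""
    items = []
    for i in range(len(body) + 1):
        rhs = body[:i] + ['.'] + body[i:]
        items.append(head + ' -> ' + ' '.join(rhs))
    return items
```

items_of('A', ['X', 'Y', 'Z']) yields the four items listed above, from A -> . X Y Z to A -> X Y Z .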
17. What is meant by viable prefixes?
The set of prefixes of right sentential forms that can appear on the stack of a shift-reduce parser
are called viable prefixes. An equivalent definition of a viable prefix is that it is a prefix of a
right sentential form that does not continue past the right end of the rightmost handle of that
sentential form.
18. Define handle.
A handle of a string is a substring that matches the right side of a production, and whose
reduction to the nonterminal on the left side of the production represents one step along the
reverse of a rightmost derivation.
A handle of a right – sentential form γ is a production A→β and a position of γ where the string
β may be found and replaced by A to produce the previous right-sentential form in a rightmost
derivation of γ. That is , if S =>αAw =>αβw,then A→β in the position following α is a handle
of αβw.
19. What are kernel & non-kernel items?
Kernel items, which include the initial item, S'→ .S, and all items whose dots are not at the left
end.
Non-kernel items, which have their dots at the left end.
20. What is phrase level error recovery?
Phrase level error recovery is implemented by filling in the blank entries in the predictive
parsing table with pointers to error routines. These routines may change, insert, or delete
symbols on the input and issue appropriate error messages. They may also pop from the stack.
21. What are different kinds of errors encountered during compilation?
Compiler Errors
• Lexical errors (e.g. misspelled word)
• Syntax errors (e.g. unbalanced parentheses, missing semicolon)
• Semantic errors (e.g. type errors)
• Logical errors (e.g. infinite recursion)
Error Handling
• Report errors clearly and accurately
• Recover quickly if possible
• Poor error recovery may lead to an avalanche of errors
22. What are different error recovery strategies?
Error Recovery strategies
• Panic mode: discard tokens one at a time until a synchronizing token is found
• Phrase-level recovery: Perform local correction that allows parsing to continue
• Error Productions: Augment grammar to handle predicted, common errors
• Global correction: Use a complex algorithm to compute a least-cost sequence of changes
leading to parseable code
23. Explain Recursive descent parsing.
Recursive descent parsing: corresponds to finding a leftmost derivation for an input string
Equivalent to constructing parse tree in pre-order
Example:
Grammar: S → cAd    A → ab | a
Input: cad
Problems:
1. Backtracking involved (buffering of tokens required)
2. left recursion will lead to infinite looping
3. Left factors may cause several backtracking steps
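Problem 1 above (backtracking) can be seen concretely in a small Python sketch for the example grammar S → cAd, A → ab | a; the function names and input handling are illustrative:

```python
def parse(inp):
    """Backtracking recursive descent for  S -> cAd,  A -> ab | a.
    A returns every position it can reach; S tries them in order,
    backtracking to the next alternative when 'd' does not follow."""
    def A(pos):
        alts = []
        if inp[pos:pos + 2] == 'ab':    # alternative A -> ab
            alts.append(pos + 2)
        if inp[pos:pos + 1] == 'a':     # alternative A -> a
            alts.append(pos + 1)
        return alts

    def S(pos):
        if inp[pos:pos + 1] != 'c':
            return False
        for p in A(pos + 1):            # backtracking point
            if inp[p:p + 1] == 'd' and p + 1 == len(inp):
                return True
        return False

    return S(0)
```

On input cad, the alternative A → ab fails and the parser backs up and retries A → a, which is exactly the buffering/backtracking behaviour the note describes.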
24. Give an example of ambiguous grammar.
Ambiguous grammar:
E ::= E '*' E | E '+' E | '1' | '(' E ')'
Unambiguous grammar:
E ::= E '+' T | T
T ::= T '*' F | F
F ::= '1' | '(' E ')'
25. What is left recursion? How is it eliminated?
A grammar is left-recursive if it has a non-terminal A such that there is a derivation
A ⇒ Aα for some string α.
Immediate left recursion A → Aα | β is eliminated by rewriting the productions as
A → βA'
A' → αA' | ε
26. What is left factoring?
Left Factoring
• Rewriting productions to delay decisions
• Helpful for predictive parsing
• Not guaranteed to remove ambiguity
A → αβ1 | αβ2
A → αA'
A' → β1 | β2
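The transformation above can be sketched as a small Python function; the grammar encoding (bodies as lists of symbols, 'eps' for an empty tail, head + "'" for the new non-terminal) is an illustrative assumption:

```python
def common_prefix(bodies):
    """Longest common prefix of a set of alternatives (symbol lists)."""
    prefix = []
    for column in zip(*bodies):
        if all(sym == column[0] for sym in column):
            prefix.append(column[0])
        else:
            break
    return prefix

def left_factor(head, bodies):
    """A -> alpha b1 | alpha b2  ==>  A -> alpha A',  A' -> b1 | b2.
    Returns the resulting productions as a dict."""
    alpha = common_prefix(bodies)
    if not alpha:
        return {head: bodies}           # nothing to factor
    new = head + "'"
    tails = [b[len(alpha):] or ['eps'] for b in bodies]
    return {head: [alpha + [new]], new: tails}
```

For instance, left_factor('A', [['a', 'x'], ['a', 'y']]) factors out the common 'a' and delays the choice between 'x' and 'y' to the new non-terminal A'.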
27. What is top down parsing?
Top down Parsing
• Can be viewed two ways:
– Attempt to find leftmost derivation for input string
– Attempt to create parse tree, starting from at root, creating nodes in preorder
• General form is recursive descent parsing
– May require backtracking
– Backtracking parsers are not used frequently because they are rarely needed
28. What is predictive parsing?
• A special case of recursive-descent parsing that does not require backtracking
• Must always know which production to use based on current input symbol
• Can often create appropriate grammar:
– removing left-recursion
– left factoring the resulting grammar
29. Define LL (1) grammar.
LL (1) Grammars
• Algorithm covered in class can be applied to any grammar to produce a parsing table
• If the parsing table has no multiply-defined entries, the grammar is said to be "LL(1)"
– First "L": left-to-right scanning of input
– Second "L": produces leftmost derivation
– "1" refers to the number of lookahead symbols needed to make decisions
30. List the three kinds of intermediate representation.
The three kinds of intermediate representations are
i. Syntax trees
ii. Postfix notation
iii. Three address code
31. How can you generate three-address code?
The three-address code is generated using semantic rules that are similar to those
for constructing syntax trees for generating postfix notation.
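Those semantic rules can be sketched as a postorder walk that emits one three-address statement per interior node, using the running example a := b * -c + b * -c. The tuple encoding of the tree and the helper newtemp are illustrative assumptions:

```python
temp_count = 0

def newtemp():
    """Return a fresh compiler-generated temporary name."""
    global temp_count
    temp_count += 1
    return f"t{temp_count}"

def gen(node, code):
    """Postorder walk emitting three-address code.  A node is either a
    leaf name (a string) or a tuple (op, child, ...)."""
    if isinstance(node, str):
        return node                       # names translate to themselves
    if node[0] == 'uminus':
        arg = gen(node[1], code)
        t = newtemp()
        code.append(f"{t} := uminus {arg}")
        return t
    op, left, right = node
    l, r = gen(left, code), gen(right, code)
    t = newtemp()
    code.append(f"{t} := {l} {op} {r}")
    return t

# a := b * -c + b * -c  as a syntax tree
tree = ('+', ('*', 'b', ('uminus', 'c')), ('*', 'b', ('uminus', 'c')))
code = []
code.append(f"a := {gen(tree, code)}")
```

Because a syntax tree (unlike a DAG) keeps both copies of b * -c, the walk emits the subexpression twice, with temporaries t1 through t5.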
32. What is a syntax tree? Draw the syntax tree for the assignment statement
a := b * -c + b * -c.
A syntax tree depicts the natural hierarchical structure of a source program.
Syntax tree:
assign
├── a
└── +
    ├── *
    │   ├── b
    │   └── uminus
    │       └── c
    └── *
        ├── b
        └── uminus
            └── c
33. Define three-address code.
Three-address code is a sequence of statements of the general form
x := y op z
where x, y and z are names, constants, or compiler-generated temporaries; op stands
for any operator, such as fixed or floating-point arithmetic operator, or a logical
operator on boolean-valued data.
Three-address code is a linearized representation of a syntax tree or a dag in which
explicit names correspond to the interior nodes of the graph.
34. Construct three-address code for the following:
position := initial + rate * 60
temp1:= inttoreal (60)
temp2:= id3 * temp1
temp3:= id2 + temp2
id1:= temp3
35. What are triples?
When the fields arg1 and arg2 (the arguments of op) are either pointers into the
symbol table or pointers into the triple structure itself, the three-field
intermediate code format is called triples.
36. Draw the DAG for a := b * -c + b * -c
assign
├── a
└── +
    ├── *   (shared node: both operands of + point here)
    │   ├── b
    │   └── uminus
    │       └── c
    └── (same * node as above)
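The sharing in the DAG can be demonstrated with a small value-numbering sketch: each (op, left, right) triple is looked up before a new node is created, so the second b * -c reuses the nodes of the first. The encoding below is an illustrative assumption:

```python
def build_dag():
    """Construct the DAG for  a := b * -c + b * -c  by value numbering:
    identical (op, left, right) triples map to the same node id."""
    nodes = {}                       # (op, left, right) -> node id
    order = []                       # keys in creation order

    def node(op, l=None, r=None):
        key = (op, l, r)
        if key not in nodes:         # only create a node once
            nodes[key] = len(order)
            order.append(key)
        return nodes[key]

    b, c = node('b'), node('c')
    m1 = node('uminus', c)
    t1 = node('*', b, m1)            # first  b * -c
    m2 = node('uminus', c)           # reuses m1
    t2 = node('*', b, m2)            # reuses t1
    plus = node('+', t1, t2)         # both operands are the same node
    node(':=', node('a'), plus)
    return len(order), t1 == t2
```

The DAG ends up with 7 nodes (b, c, uminus, *, +, a, :=) instead of the 11 a syntax tree would need, because the repeated subexpression is shared.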
37. List the types of three address statements.
The types of three address statements are
a. Assignment statements
b. Assignment Instructions
c. Copy statements
d. Unconditional Jumps
e. Conditional jumps
f. Indexed assignments
g. Address and pointer assignments
h. Procedure calls and return
38. What are the various methods of implementing three-address statements?
i. Quadruples
ii. Triples
iii. Indirect triples
39. What is meant by declaration?
The process of introducing procedures, functions, and variables, together with their types,
with proper syntax is called declaration.
40. How semantic rules are defined?
The semantic rules are defined by the following ways
a. mktable(previous)
b. enter(table,name,type,offset)
c. addwidth(table, width)
d. enterproc(table,name,newtable)
41. What are the two primary purposes of Boolean Expressions?
o They are used to compute logical values.
o They are used as conditional expressions in statements that alter the flow of
control, such as if-then, if-then-else, or while-do statements.
42. Define Boolean Expression.
Expressions which are composed of the Boolean operators (and, or, and not) applied to
elements that are Boolean variables or relational expressions are known as Boolean
expressions
43. What are the two methods to represent the value of a Boolean expression?
i. The first method is to encode true and false numerically and to evaluate a
Boolean expression analogously to an arithmetic expression.
ii. The second principal method of implementing Boolean expression is by flow
of control that is representing the value of a Boolean expression by a position
reached in a program.
44. What do you mean by viable prefixes?
Viable prefixes are the set of prefixes of right-sentential forms that can appear on the stack
of a shift-reduce parser. It is always possible to add terminal
symbols to the end of a viable prefix to obtain a right-sentential form.
45. What is meant by short-circuit or jumping code?
We can also translate a Boolean expression into three-address code without generating
code for any of the Boolean operators and without having the code necessarily evaluate the
entire expression. This style of evaluation is sometimes called "short-circuit" or "jumping"
code.
46. What is the intermediate code representation for the expression a or b and not c?
(Or) Translate a or b and not c into three address code.
Three-address sequence is
t1 := not c
t2 := b and t1
t3 := a or t2
47. Explain the following functions:
i) makelist(i) ii) merge(p1,p2) iii) backpatch(p,i)
i. makelist(i) creates a new list containing only i, an index into the array
of quadruples; makelist returns a pointer to the list it has made.
ii. merge(p1,p2) concatenates the lists pointed to by p1 and p2 , and returns a
pointer to the concatenated list.
iii. backpatch(p,i) inserts i as the target label for each of the statements on the list
pointed to by p.
48. Define back patching.
Back patching is the activity of filling in unspecified label information using
appropriate semantic actions during the code generation process.
49. What is handle pruning?
• Repeat the following process, starting from the string of tokens, until the start symbol is obtained:
– Locate handle in current right-sentential form
– Replace handle with left side of appropriate production
• Two problems that need to be solved:
– How to locate handle
– How to choose appropriate production
50. What are LR parsers?
LR Parsers
• LR parsers use an efficient, bottom-up parsing technique useful for a large class of
CFGs
• Too difficult to construct by hand, but automatic generators to create them exist
(e.g. Yacc)
• LR(k) grammars
– "L" refers to left-to-right scanning of input
– "R" refers to rightmost derivation (produced in reverse order)
– "k" refers to the number of lookahead symbols needed for decisions (if
omitted, assumed to be 1)
51. What are the benefits of LR parsers?
Benefits of LR Parsing
• Can be constructed to recognize virtually all programming language constructs for
which a CFG can be written
• Most general non-backtracking shift-reduce parsing method known
• Can be implemented efficiently
• Handles a class of grammars that is a superset of those handled by predictive
parsing
• Can detect syntactic errors as soon as possible with a left-to-right scan of input
52. What are three types of LR parsers?
Three methods:
a. SLR (simple LR)
i. Not all that simple (but simpler than other two)!
ii. Weakest of three methods, easiest to implement
b. Constructing canonical LR parsing tables
i. Most general of methods
ii. Constructed tables can be quite large
c. LALR parsing table (lookahead LR)
i. Tables smaller than canonical LR
ii. Most programming language constructs can be handled
53. What are the benefits of intermediate code generation?
A compiler for a different machine can be created by attaching a back end for the new
machine to the existing front end.
A compiler for a different source language can be created by providing a front end for
that source language to the existing back end.
A machine-independent code optimizer can be applied to the intermediate code in
order to improve the generated code.
54. Mention the functions that are used in back patching.
makelist(i) creates a new list. The index i is passed as an argument to this
function, where i is an index into the array of quadruples.
merge_list(p1,p2) this function concatenates two lists pointed by p1 and p2. It
returns the pointer to the concatenated list.
backpatch(p,i) inserts i as target label for the statement pointed by pointer p.
55. What is the intermediate code representation for the expression a or b and not c?
The intermediate code representation for the expression a or b and not c is the three
address sequence
t1 := not c
t2 := b and t1
t3:= a or t2
56. What are the various methods of implementing three address statements?
The three address statements can be implemented using the following methods.
Quadruples: a structure with at most four fields, such as
operator (OP), arg1, arg2, and result.
Triples: the use of temporary variables is avoided by referring instead to
positions within the triple structure itself.
Indirect triples: a listing of pointers to triples is used, so that
statements can be reordered without renumbering the triples.
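The three representations can be contrasted on the pair of statements t1 := -c; t2 := b * t1. The tuple encodings below are illustrative assumptions:

```python
# Quadruples: four explicit fields; temporaries name the results.
quadruples = [
    ('uminus', 'c', None, 't1'),     # (op, arg1, arg2, result)
    ('*',      'b', 't1', 't2'),
]

# Triples: no result field; a statement's value is referred to by its
# position, written here as a one-element tuple (0,) meaning "triple 0".
triples = [
    ('uminus', 'c', None),           # position 0
    ('*',      'b', (0,)),           # position 1, arg2 = value of triple 0
]

# Indirect triples: a separate listing of pointers into `triples` gives
# the execution order, so an optimizer can reorder statements without
# renumbering the triples themselves.
listing = [0, 1]
```

Note how the quadruple form needs the temporary names t1 and t2, while the triple form replaces them with positions, and the indirect form adds one level of indirection on top of the triples.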