
BRANCH: CSE Y3/S5 SUBJECT: LANGUAGE TRANSLATORS


RAJIV GANDHI COLLEGE OF ENGINEERING & TECHNOLOGY/ DEPT. OF CSE

UNIT - 4

1. What is the use of context free grammar? (Nov 2011)

It is useful for describing arithmetic expressions with arbitrary nesting of balanced parentheses.

It is also useful for describing block structure in programming languages.

2. Define ambiguous. (May 2012)

A grammar G is said to be ambiguous if it generates more than one parse tree for some sentence of the language L(G), i.e., some sentence has more than one leftmost derivation or more than one rightmost derivation.

3. Draw the Dag for assignment statement: a:=b*-c + b*-c. (Nov 2011)

[Figure: DAG for a := b * -c + b * -c]

4. What is parsing Tree? (May 2012)

A concrete syntax tree, parse tree, or parsing tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. Parse trees are usually constructed according to one of two competing relations: the constituency relation of constituency grammars (= phrase structure grammars) or the dependency relation of dependency grammars. Parse trees are distinct from abstract syntax trees (also known simply as syntax trees), in that their structure and elements more concretely reflect the syntax of the input language.

Parsing: Role of Parser – Context free Grammars – Writing a Grammar – Predictive Parser – LR Parser.

Intermediate Code Generation: Intermediate Languages – Declarations – Assignment Statements – Boolean Expressions – Case Statements – Back Patching – Procedure Calls.

5. Define Three-Address Code. (Nov 2012)

Three-address code is a sequence of statements of the general form

x := y op z

where x, y and z are names, constants, or compiler-generated temporaries; op stands

for any operator, such as fixed or floating-point arithmetic operator, or a logical

operator on boolean-valued data.

Three-address code is a linearized representation of a syntax tree or a dag in which

explicit names correspond to the interior nodes of the graph.

The three-address code is generated using semantic rules that are similar to those for constructing syntax trees or for generating postfix notation.
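As a rough illustration of how a syntax tree is linearized, the following Python sketch walks an expression tree for a := b * -c + b * -c and emits three-address statements. The tuple encoding of the tree and the helper names new_temp and gen are assumptions made for this example only; they are not taken from the notes.

```python
import itertools

def translate(tree):
    """Return the list of three-address statements for an assignment tree."""
    code = []
    counter = itertools.count(1)

    def new_temp():
        # Hypothetical helper: hands out t1, t2, ... on successive calls.
        return f"t{next(counter)}"

    def gen(node):
        # A leaf is a plain name; an interior node is a tuple (op, operands...).
        if isinstance(node, str):
            return node
        if node[0] == "uminus":
            operand = gen(node[1])
            temp = new_temp()
            code.append(f"{temp} := - {operand}")
            return temp
        op, left, right = node
        left_place = gen(left)
        right_place = gen(right)
        temp = new_temp()
        code.append(f"{temp} := {left_place} {op} {right_place}")
        return temp

    _, target, expr = tree            # ("assign", name, expression)
    place = gen(expr)
    code.append(f"{target} := {place}")
    return code

tree = ("assign", "a",
        ("+", ("*", "b", ("uminus", "c")),
              ("*", "b", ("uminus", "c"))))
for statement in translate(tree):
    print(statement)
```

Each interior node of the tree receives its own temporary name, which is why the repeated subexpression b * -c is computed twice here; starting from a DAG instead would share that work.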

6. Differentiate phase and pass. (Nov2012)

In an implementation of a compiler, portions of one or more phases are combined into a

module called a pass.

A pass reads the source program or the output of the previous pass,

makes the transformations specified by its phases, and writes output into an intermediate

file, which may then be read by a subsequent pass.

If several phases are grouped into one pass, then the operation of the phases may be

interleaved, with control alternating among several phases.

7. State the function of an intermediate code generator. (April 2013)

1. A compiler for a different machine can be created by attaching a back end for the new machine to an existing front end.

2. A compiler for a different source language can be created by providing a front end for that language and attaching it to an existing back end.

3. A machine-independent code optimizer can be applied to the intermediate code in order to improve the generated code.

8. What is basic block? (April 2013)

A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without halting or the possibility of branching except at the end.

9. List the types of parser for grammars? (May 2014)

LR Parser (L: left to right scanning of the input and R: constructing right most derivation

in reverse)

SLR: Simple LR

Canonical LR



LALR: Lookahead LR

10. Define machine word. (May 2014)

In computing, word is a term for the natural unit of data used by a

particular processor design. A word is basically a fixed-sized group of digits (binary or

decimal) that are handled as a unit by the instruction set or the hardware of the processor.

The number of digits in a word (the word size, word width, or word length) is an

important characteristic of any specific processor design or computer architecture.

EXPLAIN THE ROLE OF A PARSER? (5 MARKS)

• Accepts string of tokens from lexical analyzer (usually one token at a time)

• Verifies whether or not string can be generated by grammar

• Reports syntax errors (recovers if possible)

Parser obtains a string of tokens from the lexical analyzer and verifies that it can be generated

by the language for the source program. The parser should report any syntax errors in an

intelligible fashion.

The two types of parsers employed are:

1. Top-down parser: builds parse trees from the top (root) to the bottom (leaves).

2. Bottom-up parser: builds parse trees from the leaves and works up to the root.

Accordingly, there are two types of parsing methods: top-down parsing and bottom-up parsing.



WHAT IS MEANT BY CONTEXT FREE GRAMMAR? EXPLAIN IT (11 MARKS)

Many programming language constructs have an inherently recursive structure that can be

defined by context-free grammars.

A context-free grammar (grammar for short) consists of terminals, nonterminals, a start symbol, and productions.

1. Terminals are the basic symbols from which strings are formed. The word "token" is a synonym for "terminal". For example, each of the keywords if, then, and else is a terminal.

2. Nonterminals are syntactic variables that denote sets of strings, e.g., stmt and expr are nonterminals. The nonterminals define sets of strings that help define the language generated by the grammar. They also impose a hierarchical structure on the language that is useful for both syntax analysis and translation.

3. In a grammar, one nonterminal is distinguished as the start symbol, and the set of strings it denotes is the language defined by the grammar.

4. The productions of a grammar specify the manner in which the terminals and nonterminals can be combined to form strings. Each production consists of a nonterminal, followed by an arrow (sometimes the symbol ::= is used in place of the arrow), followed by a string of nonterminals and terminals.

Eg:

In this grammar, the terminal symbols are

The nonterminal symbols are expr and op, and expr is the start symbol.




NOTATIONAL CONVENTIONS

1. These symbols are terminals:

i) Lower-case letters early in the alphabet such as a, b, c.

ii) Operator symbols such as +, -, etc.

iii) Punctuation symbols such as parentheses, comma, etc.

iv) The digits 0, 1, ..., 9.

v) Boldface strings such as id or if.

2. These symbols are non terminals:

i) Upper-case letters early in the alphabet such as A, B, C.

ii) The letter S, which, when it appears, is usually the start symbol.

iii) Lower-case italic names such as expr or stmt.

3. Upper-case letters late in the alphabet, such as X,Y, Z, represent grammar symbols, that is, either

nonterminals or terminals.

4. Lower-case letters late in the alphabet, chiefly u, v, ... , z, represent strings of terminals.

5. Lower-case Greek letters represent strings of grammar symbols.

6. If A → α1, A → α2, A → α3, ..., A → αk are all productions with A on the left (A-productions), we may write A → α1 | α2 | α3 | ... | αk. We call α1, α2, ..., αk the alternatives for A.

7. Unless otherwise stated, the left side of the first production is the start symbol.

DERIVATIONS

The derivational view gives a precise description of the top-down construction of a parse tree. The central idea is that a production is treated as a rewriting rule in which the nonterminal on the left is replaced by the string on the right side of the production.

For example, consider the following grammar for arithmetic expressions, with the nonterminal E representing an expression.

E → E + E | E * E | ( E ) | - E | id

The production E → - E signifies that an expression preceded by a minus sign is also an expression. This production can be used to generate more complex expressions from simpler expressions by allowing us to replace any instance of an E by - E.

Given a grammar G with start symbol S, we can use the ⇒+ relation to define L(G), the language generated by G. Strings in L(G) may contain only terminal symbols of G. A string of terminals w is in L(G) if and only if S ⇒+ w. The string w is called a sentence of G. A language that can be generated by a grammar is said to be a context-free language. If two grammars generate the same language, the grammars are said to be equivalent.

PARSE TREES AND DERIVATIONS

A parse tree may be viewed as a graphical representation for a derivation that filters out the choice regarding replacement order. Each interior node of a parse tree is labeled by some nonterminal A, and the children of the node are labeled, from left to right, by the symbols in the right side of the production by which this A was replaced in the derivation.

The leaves of the parse tree are labeled by nonterminals or terminals and, read from left to right, they constitute a sentential form, called the yield or frontier of the tree.

Parse Tree for –(id+id):




Building the parse tree for derivation:

The sentence id + id * id has the two distinct leftmost derivations:

Two parse tree for id + id * id

Ambiguity:

A grammar that produces more than one parse tree for some sentence is said to be ambiguous. Put another way, an ambiguous grammar is one that produces more than one leftmost or more than one rightmost derivation for the same sentence.
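Ambiguity can be checked for a particular sentence by counting its parse trees. The sketch below does this for a fragment of the expression grammar, assumed here to be E → E + E | E * E | id (parentheses and unary minus are left out to keep the example short). The function name count_parse_trees is hypothetical.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count the parse trees of tokens under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct ways E can derive tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees("id + id * id".split()))
```

A count greater than one for any sentence shows that the grammar is ambiguous; a count of one for a single sentence proves nothing about the grammar as a whole.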




EXPLAIN BRIEFLY ABOUT WRITING A GRAMMAR? (6 MARKS)

An efficient non-backtracking form of top-down parser is called a predictive parser.

Recursive-Descent Parsing

Top-down parsing can be viewed as an attempt to find a leftmost derivation for an input string. Equivalently, it can be viewed as an attempt to construct a parse tree for the input starting from the root and creating the nodes of the parse tree in preorder.




Recursive descent may involve backtracking, that is, making repeated scans of the input. Backtracking is rarely needed to parse programming language constructs. Even in situations like natural language parsing, backtracking is still not very efficient.

Consider the grammar

S → cAd

A → ab | a

For an input string w = cad, the steps in the top-down parse are as follows:
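The backtracking behaviour on this grammar can be sketched in Python. The function names parse_A and parse_S are made up for the illustration; parse_A yields each input position at which an alternative of A could end, so the caller can fall back to the next alternative when the first one leads to a dead end.

```python
def parse_A(s, i):
    """Yield every end position for A -> ab | a when A starts at index i."""
    if s.startswith("ab", i):      # first alternative: A -> ab
        yield i + 2
    if s.startswith("a", i):       # second alternative: A -> a
        yield i + 1

def parse_S(s):
    """Return True if s can be derived from S -> cAd."""
    if not s.startswith("c"):
        return False
    for j in parse_A(s, 1):
        # After A, exactly one 'd' must remain; otherwise backtrack and
        # try the next alternative for A.
        if s[j:] == "d":
            return True
    return False

print(parse_S("cad"))
```

On the input cad, the alternative A → ab is tried first and fails because the second input symbol is followed by d rather than b; the parser then resets the input position and succeeds with A → a.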




A left-recursive grammar can cause a recursive-descent parser, even one with backtracking, to go

into an infinite loop.




EXPLAIN BRIEFLY ABOUT PREDICTIVE PARSERS? (6 MARKS)

In many cases, by carefully writing a grammar, eliminating left recursion from it, and left factoring the resulting grammar, we can obtain a grammar that can be parsed by a recursive-descent parser that needs no backtracking, i.e., a predictive parser. To construct a predictive parser, we must know, given the current input symbol a and the nonterminal A to be expanded, which one of the alternatives of A is the unique alternative that derives a string beginning with a. That is, the proper alternative must be detectable by looking at only the first terminal it derives. Flow-of-control constructs in most programming languages, with their distinguishing keywords, are usually detectable in this way.

For example, if we have the productions

stmt → if expr then stmt else stmt

     | while expr do stmt

     | begin stmt_list end

then the keywords if, while, and begin tell us which alternative is the only one that could possibly succeed if we are to find a statement.

Transition Diagrams for Predictive Parsers

In the case of the parser, there is one diagram for each nonterminal. The labels of edges are tokens and nonterminals. A transition on a token (terminal) means we should take that transition if that token is the next input symbol. A transition on a nonterminal A is a call of the procedure for A.

To construct the transition diagram of a predictive parser from a grammar, first eliminate left recursion from the grammar, and then left factor the grammar. Then for each nonterminal A do the following:

1. Create an initial and final (return) state.

2. For each production A → X1 X2 ... Xn, create a path from the initial to the final state, with edges labeled X1, X2, ..., Xn.

The predictive parser working

It begins in the start state for the start symbol.

If after some actions it is in state s with an edge labeled by terminal a to state t, and if the next input

symbol is a, then the parser moves the input cursor one position right and goes to state t.

If, on the other hand, the edge is labeled by a nonterminal A, the parser instead goes to the start

state for A, without moving the input cursor.

If it ever reaches the final state for A, it immediately goes to state t, in effect having read A from the

input during the time it moved from state s to t.

Finally, if there is an edge from s to t labeled ε, then from state s the parser immediately goes to

state t, without advancing the input.

Transition diagrams can be simplified by substituting diagrams in one another; these substitutions

are similar to the transformations on grammars.




A → βA'

A' → αA' | ε

Consider the following grammar for arithmetic expressions,

E → TE'

E' → +TE' | ε

T → FT'

T' → *FT' | ε

F → (E) | id
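A predictive parser for the grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id can be written with one procedure per nonterminal, mirroring one transition diagram per nonterminal. The following Python sketch is a minimal recognizer under the assumption that the input has already been split into tokens; the class and function names are invented for this example.

```python
class PredictiveParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # $ marks the end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        # Transition on a terminal: advance only if it is the next symbol.
        if self.lookahead != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {self.lookahead!r}")
        self.pos += 1

    def E(self):                 # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):           # E' -> + T E' | ε
        if self.lookahead == "+":
            self.match("+")
            self.T()
            self.E_prime()

    def T(self):                 # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):           # T' -> * F T' | ε
        if self.lookahead == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):                 # F -> ( E ) | id
        if self.lookahead == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")

def accepts(tokens):
    """Return True if the token sequence is a sentence of the grammar."""
    parser = PredictiveParser(tokens)
    try:
        parser.E()
        parser.match("$")
        return True
    except SyntaxError:
        return False

print(accepts("id + id * id".split()))
```

No procedure ever re-reads input: the single lookahead symbol decides which alternative to take, which is what makes the parser predictive.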

Simplified transition diagram

EXPLAIN BRIEFLY ABOUT NON RECURSIVE PREDICTIVE PARSER? EXPLAIN ITS ALGORITHM (6 MARKS)

It is possible to build a nonrecursive predictive parser by maintaining a stack explicitly, rather than implicitly through recursive calls. The key problem during predictive parsing is that of determining the production to be applied for a nonterminal.




A table-driven predictive parser has an input buffer, a stack, a parsing table, and an output stream. The input buffer contains the string to be parsed, followed by $, a symbol used as a right end marker to indicate the end of the input string. The stack contains a sequence of grammar symbols with $ on the bottom, indicating the bottom of the stack. Initially, the stack contains the start symbol of the grammar on top of $. The parsing table is a two-dimensional array M[A, a], where A is a nonterminal, and a is a terminal or the symbol $.

The program considers X, the symbol on top of the stack, and a, the current input symbol. These two symbols determine the action of the parser. There are three possibilities.

1. If X = a = $, the parser halts and announces successful completion of parsing.

2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.

3. If X is a nonterminal, the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry. If, for example, M[X, a] = {X → UVW}, the parser replaces X on top of the stack by WVU (with U on top). If M[X, a] = error, the parser calls an error recovery routine.

The behavior of the parser can be described in terms of its configurations, which give the stack

contents and the remaining input.

ALGORITHM

Non recursive predictive parsing.




Input: A string w and a parsing table M for grammar G.

Output: If w is in L(G), a leftmost derivation of w; otherwise an error indication.

Method: Initially, the parser is in a configuration in which it has $S on the stack with S, the start

symbol of G on top; and w$ in the input buffer.
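The method can be sketched in Python as a loop over the three cases described above. The parsing table below is written out by hand for the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id; the names TABLE, NONTERMINALS and predictive_parse are assumptions of this sketch, and error recovery is reduced to raising an exception.

```python
EPSILON = ()   # an empty production body

TABLE = {
    ("E", "id"): ("T", "E'"),       ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"),  ("E'", ")"): EPSILON,
    ("E'", "$"): EPSILON,
    ("T", "id"): ("F", "T'"),       ("T", "("): ("F", "T'"),
    ("T'", "+"): EPSILON,           ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): EPSILON,           ("T'", "$"): EPSILON,
    ("F", "id"): ("id",),           ("F", "("): ("(", "E", ")"),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def predictive_parse(tokens, start="E"):
    """Return the productions of a leftmost derivation, or raise SyntaxError."""
    buffer = list(tokens) + ["$"]
    stack = ["$", start]            # start symbol on top of $
    ip = 0                          # index of the current input symbol
    output = []
    while True:
        X, a = stack[-1], buffer[ip]
        if X == "$" and a == "$":   # case 1: successful completion
            return output
        if X not in NONTERMINALS:   # case 2: X is a terminal (or $)
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()
            ip += 1
        else:                       # case 3: consult M[X, a]
            body = TABLE.get((X, a))
            if body is None:
                raise SyntaxError(f"no entry M[{X}, {a}]")
            stack.pop()
            stack.extend(reversed(body))   # leftmost body symbol ends on top
            output.append((X, body))

for head, body in predictive_parse("id + id * id".split()):
    print(head, "->", " ".join(body) or "ε")
```

The printed productions, read in order, form a leftmost derivation of the input.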

FIRST and FOLLOW

The construction of a predictive parser is aided by two functions associated with a grammar G.

These functions, FIRST and FOLLOW, allow us to fill in the entries of a predictive parsing table

for G, whenever possible. Sets of tokens yielded by the FOLLOW function can also be used as

synchronizing tokens during panic-mode error recovery.

If α is any string of grammar symbols, let FIRST(α) be the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).

Define FOLLOW(A), for nonterminal A, to be the set of terminals a that can appear immediately to the right of A in some sentential form, that is, the set of terminals a such that there exists a derivation of the form S ⇒* αAaβ for some α and β. If A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A).




To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be added to any FIRST set.

1. If X is a terminal, then FIRST(X) is {X}.

2. If X → ε is a production, then add ε to FIRST(X).

3. If X is a nonterminal and X → Y1 Y2 ... Yk is a production, then place a in FIRST(X) if for some i, a is in FIRST(Yi), and ε is in all of FIRST(Y1), ..., FIRST(Yi-1); that is, Y1 ... Yi-1 ⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, ..., k, then add ε to FIRST(X).

Now, compute FIRST for any string X1 X2 ... Xn as follows.

Add to FIRST(X1 X2 ... Xn) all the non-ε symbols of FIRST(X1).

Also add the non-ε symbols of FIRST(X2) if ε is in FIRST(X1), the non-ε symbols of FIRST(X3) if ε is in both FIRST(X1) and FIRST(X2), and so on.

Finally, add ε to FIRST(X1 X2 ... Xn) if, for all i, FIRST(Xi) contains ε.

FOLLOW

To compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing can be added to any FOLLOW set.

1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right end marker.

2. If there is a production A → αBβ, then everything in FIRST(β) except for ε is placed in FOLLOW(B).

3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε (i.e., β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B).

E → TE'

E' → +TE' | ε

T → FT'

T' → *FT' | ε

F → (E) | id
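The FIRST and FOLLOW rules are fixed-point computations: the rules are applied repeatedly until no set grows. A minimal Python sketch for the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id follows; the dictionary encoding of the grammar and all function names are assumptions of this example.

```python
EPS = "ε"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of_string(symbols, first):
    """FIRST of a string X1 X2 ... Xn, given FIRST for the nonterminals."""
    result = set()
    for X in symbols:
        fx = first[X] if X in first else {X}   # a terminal (or ε) is its own FIRST
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)                            # every Xi can derive ε
    return result

def compute_first(grammar):
    first = {A: set() for A in grammar}
    changed = True
    while changed:                             # repeat until nothing is added
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {A: set() for A in grammar}
    follow[start].add("$")                     # rule 1
    changed = True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                for i, B in enumerate(body):
                    if B not in grammar:
                        continue               # only nonterminals have FOLLOW
                    beta_first = first_of_string(body[i + 1:], first)
                    new = beta_first - {EPS}   # rule 2
                    if EPS in beta_first:
                        new |= follow[A]       # rule 3
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow

FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
for A in GRAMMAR:
    print(A, sorted(FIRST[A]), sorted(FOLLOW[A]))
```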




Construction of a predictive parsing table.

Input. Grammar G.

Output. Parsing table M.

Method.

1. For each production A → α of the grammar, do steps 2 and 3.

2. For each terminal a in FIRST(α), add A → α to M[A, a].

3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].

4. Make each undefined entry of M be error.
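The four steps can be sketched in Python as follows. To keep the example short, FIRST(α) for each production body and the FOLLOW sets of the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id are written in by hand rather than computed; the names PRODUCTIONS, FOLLOW and build_table are assumptions of this sketch. Undefined entries are simply absent from the dictionary, which stands for "error".

```python
EPS = "ε"

# (head, body, FIRST(body)) for every production of the expression grammar.
PRODUCTIONS = [
    ("E",  ("T", "E'"),      {"(", "id"}),
    ("E'", ("+", "T", "E'"), {"+"}),
    ("E'", (EPS,),           {EPS}),
    ("T",  ("F", "T'"),      {"(", "id"}),
    ("T'", ("*", "F", "T'"), {"*"}),
    ("T'", (EPS,),           {EPS}),
    ("F",  ("(", "E", ")"),  {"("}),
    ("F",  ("id",),          {"id"}),
]
FOLLOW = {
    "E":  {")", "$"},
    "E'": {")", "$"},
    "T":  {"+", ")", "$"},
    "T'": {"+", ")", "$"},
    "F":  {"+", "*", ")", "$"},
}

def build_table(productions, follow):
    """Return M as a dict mapping (nonterminal, terminal) to a production body."""
    table = {}

    def add(A, a, body):
        if (A, a) in table and table[(A, a)] != body:
            raise ValueError(f"multiply defined entry M[{A}, {a}]")
        table[(A, a)] = body

    for A, body, first_alpha in productions:
        for a in first_alpha - {EPS}:      # step 2
            add(A, a, body)
        if EPS in first_alpha:             # step 3; FOLLOW(A) may contain $
            for b in follow[A]:
                add(A, b, body)
    return table

M = build_table(PRODUCTIONS, FOLLOW)
for (A, a), body in sorted(M.items()):
    print(f"M[{A}, {a}] = {A} -> {' '.join(body)}")
```

If two productions compete for the same entry, the sketch raises an error, which corresponds to a multiply defined entry in the table.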




WRITE AN ALGORITHM FOR CONSTRUCTING LR PARSER TABLE.***



Explain briefly about CLR parser algorithm? (11 marks)

Constructing Canonical LR Parsing Tables:

In the SLR method, state i calls for reduction by A → α if the set of items Ii contains item [A → α·] and a is in FOLLOW(A). In some situations, however, when state i appears on top of the stack, the viable prefix βα on the stack is such that βA cannot be followed by a in a right-sentential form. Thus, the reduction by A → α would be invalid on input a.




Construction of the sets of LR(1) items.

Input. An augmented grammar G'.

Output. The sets of LR(1) items that are the sets of items valid for one or more viable prefixes of G'.

Method. The procedures closure and goto and the main routine items for constructing the sets of

items are computed.




Construction of the canonical LR parsing table

Input. An augmented grammar G'.

Output. The canonical LR parsing table functions action and goto for G'.

Method.

1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(1) items for G'.

2. State i of the parser is constructed from Ii. The parsing actions for state i are determined as follows:

a) If [A → α·aβ, b] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to "shift j." Here, a is required to be a terminal.

b) If [A → α·, a] is in Ii, A ≠ S', then set action[i, a] to "reduce A → α."

c) If [S' → S·, $] is in Ii, then set action[i, $] to "accept."

If a conflict results from the above rules, the grammar is said not to be LR(1), and the algorithm is said to fail.

3. The goto transitions for state i are determined as follows: If goto(Ii, A) = Ij, then goto[i, A] = j.

4. All entries not defined by rules (2) and (3) are made "error."

5. The initial state of the parser is the one constructed from the set containing item [S' → ·S, $].

The table formed from the parsing action and goto functions produced by this algorithm is called the canonical LR(1) parsing table. An LR parser using this table is called a canonical LR(1) parser. If the parsing action function has no multiply-defined entries, then the given grammar is called an LR(1) grammar.




Constructing LALR Parsing Tables

This method is often used in practice because the tables obtained by it are considerably smaller than

the canonical LR tables, yet most common syntactic constructs of programming languages can be

expressed conveniently by an LALR grammar. The same is almost true for SLR grammars. But

there are a few constructs that cannot be conveniently handled by SLR techniques.

For a comparison of parser size, the SLR and LALR tables for a grammar always have the same

number of states, and this number is typically several hundred states for a language like Pascal. The

canonical LR table would typically have several thousand states for the same size language. Thus it

is much easier and more economical to construct SLR and LALR tables than the canonical LR

tables.

Consider the grammar

S' → S

S → CC

C → cC | d

whose sets of LR(1) items were shown in the goto graph.

Take a pair of similar looking states such as I4 and I7. Each of these states has only items with first component C → d·. In I4, the lookaheads are c or d; in I7, $ is the only lookahead.




To see the difference between the roles of I4 and I7 in the parser, note that the grammar generates the regular set c*dc*d. When reading an input cc...cdcc...cd, the parser shifts the first group of c's and their following d onto the stack, entering state 4 after reading the d. The parser then calls for a reduction by C → d provided the next input symbol is c or d.

The requirement that c or d follow makes sense since these are the symbols that could begin strings in c*d. If $ follows the first d, we have an input like ccd, which is not in the language, and state 4 correctly declares an error if $ is the next input.

The parser enters state 7 after reading the second d. Then, the parser must see $ on the input, or it started with a string not of the form c*dc*d. It thus makes sense that state 7 should reduce by C → d on input $ and declare error on inputs c or d.

Let us now replace I4 and I7 by I47, the union of I4 and I7, consisting of the set of three items represented by [C → d·, c/d/$]. The gotos on d to I4 or I7 from I0, I2, I3 and I6 now enter I47. The action of state 47 is to reduce on any input. The revised parser behaves essentially like the original, although it might reduce d to C in circumstances where the original would declare error, for example, on input like ccd or cdcdc. The error will eventually be caught; in fact, it will be caught before any more input symbols are shifted.

More generally, we can look for sets of LR(1) items having the same core, that is, set of first components, and merge these sets with common cores into one set of items.

For example, I4 and I7 form such a pair, with core {C → d·}.

In general, a core is a set of LR(0) items for the grammar at hand, and an LR(1) grammar may produce more than two sets of items with the same core.

Since the core of goto (I , X) depends only on the core of I, the goto's of merged sets can

themselves be merged. Thus, there is no problem revising the goto function as we merge sets of

items. The action functions are modified to reflect the non-error actions of all sets of items in the

merger.




Suppose we have an LR(1) grammar, i.e., one whose sets of LR(1) items produce no parsing action conflicts.

If we replace all states having the same core with their union, it is possible that the resulting union will have a conflict, but it is unlikely for the following reason: suppose in the union there is a conflict on lookahead a because there is an item [A → α·, a] calling for a reduction by A → α, and there is another item [B → β·aγ, b] calling for a shift.

Then some set of items from which the union was formed has item [A → α·, a], and since the cores of all these states are the same, it must have an item [B → β·aγ, c] for some c.

But then this state has the same shift/reduce conflict on a, and the grammar was not LR(1) as we assumed. Thus, the merging of states with common cores can never produce a shift/reduce conflict that was not present in one of the original states, because shift actions depend only on the core, not the lookahead.

10. Explain briefly about LALR Parser? (11 marks)

An easy, but space-consuming LALR table construction.

Input. An augmented grammar G';

Output. The LALR parsing table functions action and goto for G'.

Method.

1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(1) items.

2. For each core present among the sets of LR(1) items, find all sets having that core, and replace these sets by their union.

3. Let C' = {J0, J1, ..., Jm} be the resulting sets of LR(1) items. The parsing actions for state i are constructed from Ji in the same manner as in the canonical LR table construction. If there is a parsing action conflict, the algorithm fails to produce a parser, and the grammar is said not to be LALR(1).

4. The goto table is constructed as follows. If J is the union of one or more sets of LR(1) items, i.e., J = I1 ∪ I2 ∪ ... ∪ Ik, then the cores of goto(I1, X), goto(I2, X), ..., goto(Ik, X) are the same, since I1, I2, ..., Ik all have the same core. Let K be the union of all sets of items having the same core as goto(I1, X). Then goto(J, X) = K.

The table produced by the algorithm is called the LALR parsing table for G. If there are no parsing

action conflicts, then the given grammar is said to be an LALR(1) grammar.
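Steps 1 and 2 of this construction can be sketched in Python. The sketch assumes the grammar that generates c*dc*d is S' → S, S → CC, C → cC | d, represents an LR(1) item as a tuple (head, body, dot position, lookahead), and hard-codes FIRST because the grammar has no ε-productions. All names are invented for the illustration, and the action and goto tables themselves are not built.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S":  [("C", "C")],
    "C":  [("c", "C"), ("d",)],
}
# FIRST of single symbols; sufficient here because nothing derives ε.
FIRST = {"S": {"c", "d"}, "C": {"c", "d"},
         "c": {"c"}, "d": {"d"}, "$": {"$"}}

def closure(items):
    items = set(items)
    work = list(items)
    while work:
        head, body, dot, la = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            B = body[dot]
            rest = body[dot + 1:] + (la,)      # the string βa
            for b in FIRST[rest[0]]:
                for prod in GRAMMAR[B]:
                    item = (B, prod, 0, b)
                    if item not in items:
                        items.add(item)
                        work.append(item)
    return frozenset(items)

def goto(items, X):
    moved = {(h, b, d + 1, la) for h, b, d, la in items
             if d < len(b) and b[d] == X}
    return closure(moved)

def lr1_collection():
    start = closure({("S'", ("S",), 0, "$")})
    states, work = {start}, [start]
    while work:
        I = work.pop()
        for X in ("S", "C", "c", "d"):
            J = goto(I, X)
            if J and J not in states:
                states.add(J)
                work.append(J)
    return states

def core(items):
    # The core drops the lookahead component of every item.
    return frozenset((h, b, d) for h, b, d, _ in items)

def lalr_collection(states):
    merged = {}
    for I in states:
        merged.setdefault(core(I), set()).update(I)
    return [frozenset(J) for J in merged.values()]

states = lr1_collection()
lalr = lalr_collection(states)
print(len(states), "LR(1) sets,", len(lalr), "LALR(1) sets")
```

For this grammar the merge combines three pairs of LR(1) sets, including the pair whose only item has core C → d·, which ends up with the lookaheads c, d and $.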




The collection of sets of items constructed in step (3) is called the LALR(1) collection.

Consider the grammar whose goto graph is taken previous:

Consider the augmented grammar

The kernels of the sets, of LR (0) items for this grammar




Determining lookaheads.

Input. The kernel K of a set of LR(0) items I and a grammar symbol X.

Output. The lookaheads spontaneously generated by items in I for kernel items in goto(I, X) and the items in I from which lookaheads are propagated to kernel items in goto(I, X).

Method. It uses a dummy lookahead symbol # to detect situations in which lookaheads propagate.

Efficient computation of the kernels of the LALR(1) collection.

Input. An augmented grammar G'.

Output. The kernels of the LALR(1) collection of sets of items for G'.

Method.

1. Construct the kernels of the sets of LR(0) items for G'.

2. Apply the lookahead-determination algorithm to the kernel of each set of LR(0) items and grammar symbol X to determine which lookaheads are spontaneously generated for kernel items in goto(I, X), and from which items in I lookaheads are propagated to kernel items in goto(I, X).




3. Initialize a table that gives, for each kernel item in each set of items, the associated lookaheads.

Initially, each item has associated with it only those lookaheads that we determined in (2) were

generated spontaneously.

4. Make repeated passes over the kernel items in all sets. When we visit an item i, we look up the

kernel items to which i propagates its lookaheads, using information tabulated in (2). The current

set of lookaheads for i is added to those already associated with each of the items to which i

propagates its lookaheads. We continue making passes over the kernel items until no more new lookaheads are propagated.




INTERMEDIATE CODE GENERATION

What are the types of intermediate representations? (5 marks)

A compiler generates an easy-to-represent form of the source program, called the intermediate language, which leads to efficient code generation.


There are three types of intermediate representation:-

1. Syntax Trees

2. Postfix notation

3. Three Address Code

Semantic rules for generating three-address code from common programming language constructs are similar to those for constructing syntax trees or for generating postfix notation.

Graphical Representations

A syntax tree depicts the natural hierarchical structure of a source program. A DAG

(Directed Acyclic Graph) gives the same information but in a more compact way because

common sub-expressions are identified. A syntax tree for the assignment statement a := b * -c + b * -c appears in the figure.

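[Figure: syntax tree and DAG for the expression a := b * -c + b * -c]

[Figure: position of the intermediate code generator: parser → static checker → intermediate code generator → intermediate code → code generator]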
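One common way to obtain a DAG instead of a tree is to look up each node before creating it, so that an identical subexpression is returned rather than rebuilt. The Python sketch below assumes a tuple encoding of the syntax tree and a hypothetical DagBuilder class; it is an illustration of the idea, not the construction used in the notes.

```python
class DagBuilder:
    def __init__(self):
        self.nodes = []    # node i is stored as (label, child index, ...)
        self.index = {}    # maps a node's signature to its position

    def node(self, label, *children):
        """Return the index of the node, creating it only if it is new."""
        key = (label, *children)
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def build(self, tree):
        # A leaf is a plain name; an interior node is (op, operand, ...).
        if isinstance(tree, str):
            return self.node(tree)
        op, *operands = tree
        return self.node(op, *(self.build(t) for t in operands))

tree = ("assign", "a",
        ("+", ("*", "b", ("uminus", "c")),
              ("*", "b", ("uminus", "c"))))
builder = DagBuilder()
root = builder.build(tree)
for i, node in enumerate(builder.nodes):
    print(i, node)
```

The syntax tree for this statement has eleven nodes, while the DAG has seven, because the second occurrence of b * -c maps onto nodes that already exist.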




Postfix notation is a linearized representation of a syntax tree; it is a list of the nodes of the tree in which a node appears immediately after its children. The postfix notation for the syntax tree in the figure is

a b c uminus * b c uminus * + assign

The edges in a syntax tree do not appear explicitly in postfix notation. They can be recovered from the order in which the nodes appear and the number of operands that the operator at a node expects. The recovery of edges is similar to the evaluation, using a stack, of an expression in postfix notation.
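The recovery of edges with a stack can be sketched in Python. The ARITY table, which records how many operands each operator expects, and the function name tree_from_postfix are assumptions of this example; any symbol missing from the table is treated as an operand.

```python
ARITY = {"uminus": 1, "*": 2, "+": 2, "assign": 2}

def tree_from_postfix(postfix):
    """Rebuild the syntax tree (as nested tuples) from a postfix string."""
    stack = []
    for symbol in postfix.split():
        n = ARITY.get(symbol, 0)
        if n == 0:
            stack.append(symbol)            # operand: push a leaf
        else:
            if len(stack) < n:
                raise ValueError(f"too few operands for {symbol!r}")
            children = stack[-n:]           # the n most recent subtrees
            del stack[-n:]
            stack.append((symbol, *children))
    if len(stack) != 1:
        raise ValueError("postfix string does not describe a single tree")
    return stack[0]

print(tree_from_postfix("a b c uminus * b c uminus * + assign"))
```

Replacing the tuple construction with the application of the operator turns the same loop into a postfix evaluator.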




Representation of syntax trees

12. What are the types of three address statements? (11 marks)

Types of Three-Address Statements

Three-address statements are akin to assembly code. Statements can have symbolic labels and there are statements for flow of control. A symbolic label represents the index of a three-address statement in the array holding intermediate code. Actual indices can be substituted for the labels either by making a separate pass, or by using "back patching," discussed in Section 8.6. Here are the common three-address statements used in the remainder of this book:

1. Assignment statements of the form x: = y op z, where op is a binary arithmetic or

logical operation.

2. Assignment instructions of the form x:= op y, where op is a unary operation. Essential

unary operations include unary minus, logical negation, shift operators, and conversion

operators that, for example, convert a fixed-point number to a floating-point number.

3. Copy statements of the form x: = y where the value of y is assigned to x.

4. The unconditional jump goto L. The three-address statement with label L is the next to

be executed.

5. Conditional jumps such as if x relop y goto L. This instruction applies a relational

operator (<, =, >=, etc.) to x and y, and executes the statement with label L next if x stands

in relation relop to y. If not, the three-address statement following if x relop y goto L is

executed next, as in the usual sequence.

6. param x and call p, n for procedure calls and return y, where y, representing a returned value, is optional. Their typical use is as the sequence of three-address statements

param x1

param x2

...

param xn

call p, n

generated as part of a call of the procedure p(x1, x2, ..., xn). The integer n indicating the number of actual parameters in "call p, n" is not redundant because calls can be nested. The implementation of procedure calls is outlined in Section 8.7.

7. Indexed assignments of the form x: = y[ i ] and x [ i ]: = y. The first of these sets x to the

value in the location i memory units beyond location y. The statement x[i]:=y sets the

contents of the location i units beyond x to the value of y. In both these instructions, x, y,

and i refer to data objects.

8. Address and pointer assignments of the form x := &y, x := *y and *x := y. The first of these sets the value of x to be the location of y. Presumably y is a name, perhaps a temporary, that denotes an expression with an l-value such as A[i, j], and x is a pointer name or temporary. That is, the r-value of x is the l-value (location) of some object. In the statement x := *y, presumably y is a pointer or a temporary whose r-value is a location. The r-value of x is made equal to the contents of that location. Finally, *x := y sets the r-value of the object pointed to by x to the r-value of y.

The choice of allowable operators is an important issue in the design of an

intermediate form. The operator set must clearly be rich enough to implement the operations

in the source language. A small operator set is easier to implement on a new target machine.

However, a restricted instruction set may force the front end to generate long sequences of

statements for some source-language operations. The optimizer and code generator may

then have to work harder if good code is to be generated.

Explain the process of syntax directed translation of three address code? (6 marks)

Syntax-Directed Translation into Three-Address Code

When three-address code is generated, temporary names are made up for the interior nodes of a syntax tree. The value of non-terminal E on the left side of E → E1 + E2 will be computed into a new temporary t. In general, the three-address code for id := E consists of code to evaluate E into some temporary t, followed by the assignment id.place := t. If an expression is a single identifier, say y, then y itself holds the value of the expression. For the moment, we create a new name every time a temporary is needed; techniques for reusing temporaries are given elsewhere.

The S-attributed definition in the figure generates three-address code for assignment statements. Given input a := b * -c + b * -c, it produces the code in Fig. 8.5(a). The synthesized attribute S.code represents the three-address code for the assignment S. The non-terminal E has two attributes:

1. E.place, the name that will hold the value of E, and
2. E.code, the sequence of three-address statements evaluating E.

The function newtemp returns a sequence of distinct names t1, t2, ... in response to successive calls. For convenience, we use the notation gen(x ':=' y '+' z) in Fig. 8.6 to represent the three-address statement x := y + z. Expressions appearing instead of variables like x, y, and z are evaluated when passed to gen, and quoted operators or operands, like '+', are taken literally. In practice, three-address statements might be sent to an output file, rather than built up into the code attributes.

Flow-of-control statements can be added to the language of assignments in Fig. 8.6 by productions and semantic rules like the ones for while statements in Fig. 8.7. In the figure, the code for S → while E do S1 is generated using new attributes S.begin and S.after to mark the first statement in the code for E and the statement following the code for S, respectively. These attributes represent labels created by a function newlabel that returns a new label every time it is called. Note that S.after becomes the label of the statement that comes after the code for the while statement. We assume that a non-zero expression represents true; that is, when the value of E becomes zero, control leaves the while statement.

Expressions that govern the flow of control may in general be Boolean expressions containing relational and logical operators. The semantic rules for while statements in Section 8.6 differ from those in Fig. 8.7 to allow for flow of control within Boolean expressions.

Postfix notation can be obtained by adapting the semantic rules in Fig. 8.6 (or see Fig. 2.5). The postfix notation for an identifier is the identifier itself. The rules for the other productions concatenate only the operator after the code for the operands. For example, associated with the production E → -E1 is the semantic rule

E.code := E1.code || 'uminus'

In general, the intermediate form produced by the syntax-directed translations in this chapter can be changed by making similar modifications to the semantic rules.
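The attributes E.place and E.code can be illustrated with a short Python sketch. It is only an illustration under assumptions: the tuple encoding of the syntax tree and the function names translate and assign are invented for the example, while newtemp and the two attributes follow the text.

```python
from itertools import count

_temps = count(1)

def newtemp():
    """Return a fresh temporary name: t1, t2, ..."""
    return f"t{next(_temps)}"

def translate(node):
    """Return (E.place, E.code) for an expression tree.

    Leaves are identifier strings; interior nodes are tuples
    ("uminus", e) or (op, e1, e2). This encoding is an assumption."""
    if isinstance(node, str):                 # E -> id
        return node, []
    if node[0] == "uminus":                   # E -> - E1
        p1, c1 = translate(node[1])
        t = newtemp()
        return t, c1 + [f"{t} := uminus {p1}"]
    op, left, right = node                    # E -> E1 op E2
    p1, c1 = translate(left)
    p2, c2 = translate(right)
    t = newtemp()
    return t, c1 + c2 + [f"{t} := {p1} {op} {p2}"]

def assign(name, expr):
    """S -> id := E: code for E followed by the copy into id."""
    place, code = translate(expr)
    return code + [f"{name} := {place}"]

term = ("*", "b", ("uminus", "c"))
tac = assign("a", ("+", term, term))          # a := b * -c + b * -c
print("\n".join(tac))
```

Because a new temporary is created for every interior node, the repeated sub-expression b * -c is computed twice, which is what a syntax tree (as opposed to a DAG) implies.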

6. List out and discuss the different types of intermediate code.

There are three types of intermediate representation:-

1. Syntax Trees

2. Postfix notation

3. Three Address Code

Semantic rules for generating three-address code from common programming language constructs are similar to those for constructing syntax trees or for generating postfix notation.

Graphical Representations

A syntax tree depicts the natural hierarchical structure of a source program. A DAG (Directed Acyclic Graph) gives the same information but in a more compact way because common sub-expressions are identified. A syntax tree for the assignment statement a := b * -c + b * -c appears in Fig. 8.2.

Postfix notation is a linearized representation of a syntax tree; it is a list of the nodes of the tree in which a node appears immediately after its children. The postfix notation for the syntax tree in the figure is

a b c uminus * b c uminus * + assign

The edges in a syntax tree do not appear explicitly in postfix notation. They can be recovered from the order in which the nodes appear and the number of operands that the operator at a node expects. The recovery of edges is similar to the evaluation, using a stack, of an expression in postfix notation.
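The recovery of edges with a stack can be sketched in Python. This is a minimal sketch under assumptions: the ARITY table and the tuple form of tree nodes are choices made for the example, and the input is the postfix form of a := b * -c + b * -c.

```python
# Number of operands each operator expects; any other token is an operand.
ARITY = {"uminus": 1, "*": 2, "+": 2, "assign": 2}

def postfix_to_tree(tokens):
    """Rebuild the syntax tree from postfix order using a stack."""
    stack = []
    for tok in tokens:
        n = ARITY.get(tok, 0)
        if n == 0:
            stack.append(tok)              # operand: push as a leaf
        else:
            children = stack[-n:]          # the n most recent subtrees
            del stack[-n:]
            stack.append((tok, *children)) # edges from tok to its children
    if len(stack) != 1:
        raise ValueError("malformed postfix string")
    return stack[0]

tree = postfix_to_tree("a b c uminus * b c uminus * + assign".split())
print(tree)
```

Replacing the tuple construction by an arithmetic operation on the popped values turns the same loop into a postfix evaluator, which is the similarity the text points out.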


Three-Address Code

Three-address code is a sequence of statements of the general form

x := y op z

where x, y, and z are names, constants, or compiler-generated temporaries; op stands for any operator, such as a fixed- or floating-point arithmetic operator, or a logical operator on Boolean-valued data. Note that no built-up arithmetic expressions are permitted, as there is only one operator on the right side of a statement. Thus a source language expression like x + y * z might be translated into a sequence

t1 := y * z
t2 := x + t1

where t1 and t2 are compiler-generated temporary names. This unraveling of complicated arithmetic expressions and of nested flow-of-control statements makes three-address code desirable for target code generation and optimization. The use of names for the intermediate values computed by a program allows three-address code to be easily rearranged, unlike postfix notation. Three-address code is a linearized representation of a syntax tree or a dag in which explicit names correspond to the interior nodes of the graph.

Eg: a := b + c + d

t1 := b + c
t2 := t1 + d
a := t2

Implementations of three-Address Statements

A three-address statement is an abstract form of intermediate code. In a compiler, these

statements can be implemented as records with fields for the operator and the operands.

Three such representations are quadruples, triples, and indirect triples.

Quadruples

A quadruple is a record structure with four fields, which we call op, arg1, arg2, and result. The op field contains an internal code for the operator. The three-address statement x := y op z is represented by placing y in arg1, z in arg2, and x in result.

For example, consider the input statement x := -a * b + -a * b

The TAC is

t1 := uminus a
t2 := t1 * b
t3 := uminus a
t4 := t3 * b
t5 := t2 + t4
x := t5

     op      arg1  arg2  result
(0)  uminus  a           t1
(1)  *       t1    b     t2
(2)  uminus  a           t3
(3)  *       t3    b     t4
(4)  +       t2    t4    t5
(5)  :=      t5          x

Fig. 8.8(a) Quadruples

Triples

To avoid entering temporary names into the symbol table, we might refer to a temporary value by the position of the statement that computes it. If we do so, three-address statements can be represented by records with only three fields: op, arg1 and arg2, as in Fig. 8.8(b). The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table (for programmer-defined names or constants) or pointers into the triple structure (for temporary values). Since three fields are used, this intermediate code format is known as triples. Except for the treatment of programmer-defined names, triples correspond to the representation of a syntax tree or dag by an array of nodes, as in Fig. 8.4.

     op      arg1  arg2
(0)  uminus  a
(1)  *       (0)   b
(2)  uminus  a
(3)  *       (2)   b
(4)  +       (1)   (3)
(5)  :=      x     (4)

Fig. 8.8(b) Triples


Parenthesized numbers represent pointers into the triple structure, while symbol-table pointers are represented by the names themselves. In practice, the information needed to interpret the different kinds of entries in the arg1 and arg2 fields can be encoded into the op field or some additional fields. The triples in Fig. 8.8(b) correspond to the quadruples in Fig. 8.8(a). Note that the copy statement x := t5 is encoded in the triple representation by placing x in the arg1 field and using the operator assign. A ternary operation like x[i] := y requires two entries in the triple structure, as shown in Fig. 8.9(a), while x := y[i] is naturally represented as two operations in Fig. 8.9(b).
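The correspondence between quadruples and triples can be made concrete with a small Python sketch. The encoding is an assumption made for illustration: quadruples are 4-tuples, a triple argument that is an int stands for a pointer into the triple structure, and a string stands for a symbol-table name.

```python
# Quadruples for x := -a * b + -a * b as (op, arg1, arg2, result).
quads = [
    ("uminus", "a", None, "t1"),
    ("*", "t1", "b", "t2"),
    ("uminus", "a", None, "t3"),
    ("*", "t3", "b", "t4"),
    ("+", "t2", "t4", "t5"),
    (":=", "t5", None, "x"),
]

def to_triples(quads):
    """Replace each temporary by the position of the triple computing it."""
    defined_at = {}                    # temporary name -> triple index
    triples = []

    def ref(arg):
        return defined_at.get(arg, arg)

    for op, arg1, arg2, result in quads:
        if op == ":=":
            # copy statement: target name in arg1, value in arg2
            triples.append(("assign", result, ref(arg1)))
        else:
            defined_at[result] = len(triples)
            triples.append((op, ref(arg1), ref(arg2)))
    return triples

triples = to_triples(quads)
for i, t in enumerate(triples):
    print(f"({i})", t)
```

No temporary names survive in the result, which is the point of the triple representation: only programmer-defined names need symbol-table entries.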

Indirect Triples

Another implementation of three-address code that has been considered is that of listing pointers to triples, rather than listing the triples themselves. This implementation is naturally called indirect triples. For example, let us use an array statement to list pointers to triples in the desired order.

     statement
(0)  (11)
(1)  (12)
(2)  (13)
(3)  (14)
(4)  (15)
(5)  (16)

      op      arg1  arg2
(11)  uminus  a
(12)  *       (11)  b
(13)  uminus  a
(14)  *       (13)  b
(15)  +       (12)  (14)
(16)  :=      x     (15)

How are declarations translated into intermediate code? (11 marks)

As the sequence of declarations in a procedure or block is examined, we can lay out storage for names local to the procedure. For each local name, we create a symbol-table entry with information like the type and the relative address of the storage for the name. The relative address consists of an offset from the base of the static data area or the field for local data in an activation record. When the front end generates addresses, it may have a target machine in mind. Suppose that addresses of consecutive integers differ by 4 on a byte-addressable machine. The address calculations generated by the front end may therefore include multiplications by 4. The instruction set of the target machine may also favor certain layouts of data objects, and hence their addresses. We ignore alignment of data objects here.

DECLARATIONS IN A PROCEDURE

The syntax of languages such as C, Pascal, and FORTRAN allows all the declarations in a single procedure to be processed as a group. In this case, a global variable, say offset, can keep track of the next available relative address. Non-terminal P generates a sequence of declarations of the form id : T. Before the first declaration is considered, offset is set to 0. As each new name is seen, that name is entered in the symbol table with offset equal to the current value of offset, and offset is incremented by the width of the data object denoted by that name. The procedure enter(name, type, offset) creates a symbol-table entry for name, and gives it type type and relative address offset in its data area. We use synthesized attributes type and width for non-terminal T to indicate the type and width, or number of memory units taken by objects of that type. Attribute type represents a type expression constructed from the basic types integer and real by applying the type constructors pointer and array. If type expressions are represented by graphs, then attribute type might be a pointer to the node representing a type expression. Integers have width 4 and reals have width 8. The width of an array is obtained by multiplying the width of each element by the number of elements in the array. The width of each pointer is assumed to be 4.

P → D
D → D ; D
D → id : T               { enter(id.name, T.type, offset);
                           offset := offset + T.width }
T → integer              { T.type := integer;
                           T.width := 4 }
T → real                 { T.type := real;
                           T.width := 8 }
T → array [ num ] of T1  { T.type := array(num.val, T1.type);
                           T.width := num.val × T1.width }
T → ^T1                  { T.type := pointer(T1.type);
                           T.width := 4 }

In Pascal and C, a pointer may be seen before we learn the type of the object pointed to. Storage allocation for such types is simpler if all pointers have the same width. The initialization of offset in the translation scheme of Fig. 8.11 is more evident if the first production appears on one line as:

P → { offset := 0 } D        (8.2)

Non-terminals generating ε, called marker non-terminals in Section 5.6, can be used to rewrite productions so that all actions appear at the ends of right sides. Using a marker non-terminal M, (8.2) can be restated as:

P → M D
M → ε { offset := 0 }
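The offset bookkeeping for a sequence of declarations can be sketched in Python. This is an illustrative sketch, not the scheme itself: type expressions are encoded as strings or tuples, and the names width_of and layout are assumptions; the widths (integer 4, real 8, pointer 4) are the ones assumed in the notes.

```python
def width_of(t):
    """T.width for a type expression."""
    if t == "integer":
        return 4
    if t == "real":
        return 8
    if t[0] == "pointer":             # ("pointer", pointed_to_type)
        return 4
    if t[0] == "array":               # ("array", count, element_type)
        return t[1] * width_of(t[2])
    raise ValueError(f"unknown type: {t!r}")

def layout(decls):
    """Assign relative addresses to declarations id : T, in order."""
    offset = 0                        # P -> { offset := 0 } D
    table = {}
    for name, t in decls:             # D -> id : T
        table[name] = (t, offset)     # enter(id.name, T.type, offset)
        offset += width_of(t)         # offset := offset + T.width
    return table, offset

table, total = layout([
    ("i", "integer"),
    ("x", "real"),
    ("a", ("array", 10, "integer")),
    ("p", ("pointer", "real")),
])
for name, (t, off) in table.items():
    print(name, off)
```

Here a is placed at offset 12 and occupies 40 units, so p starts at 52 and the data area is 56 units wide in total.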

Keeping Track of Scope Information

In a language with nested procedures, names local to each procedure can be assigned relative addresses using the approach of Fig. 8.11. When a nested procedure is seen, processing of declarations in the enclosing procedure is temporarily suspended. This approach will be illustrated by adding semantic rules to the following language.

P → D
D → D ; D | id : T | proc id ; D ; S

The productions for non-terminals S for statements and T for types are not shown because we focus on declarations. The non-terminal T has synthesized attributes type and width, as in the translation scheme above. For simplicity, suppose that there is a separate symbol table for each procedure in the language.


The semantic rules are defined in terms of the following operations:

1. mktable (previous) creates a new symbol table and returns a pointer to the new table. The

argument previous points to a previously created symbol table, presumably that for the

enclosing procedure. The pointer previous is placed in a header for the new symbol table,

along with additional information such as the nesting depth of a procedure. We can also

number the procedures in the order they are declared and keep this number in the header.

2. enter (table, name, type, offset) creates a new entry for name name in the symbol table

pointed to by table. Again, enter places type and relative address offset in fields within the

entry.

3. addwidth (table, width) records the cumulative width of all the entries in table in the header associated with this symbol table.

4. enterproc (table, name, newtable) creates a new entry for procedure name in the symbol

table pointed to by table. The argument newtable points to the symbol table for this

procedure name.

The translation scheme in Fig. 8.13 shows how data can be laid out in one pass, using a stack tblptr to hold pointers to symbol tables of the enclosing procedures. For example, tblptr will contain pointers to the tables for sort, quicksort, and partition when the declarations in partition are considered. The pointer to the current symbol table is on top. The other stack, offset, is the natural generalization to nested procedures of attribute offset. The top element of offset is the next available relative address for a local of the current procedure. All semantic actions in the sub-trees for B and C in

A → B C { actionA }

are done before actionA at the end of the production occurs. Hence, the action associated with the marker M in Fig. 8.13 is the first to be done. The action for non-terminal M initializes stack tblptr with a symbol table for the outermost scope, created by operation mktable(nil). The action also pushes relative address 0 onto stack offset.

The non-terminal N plays a similar role when a procedure declaration appears. Its action uses the operation mktable(top(tblptr)) to create a new symbol table. Here the argument top(tblptr) gives the enclosing scope of the new table. A pointer to the new table is pushed above that for the enclosing scope. Again, 0 is pushed onto offset.

For each variable declaration id : T, an entry is created for id in the current symbol table. This declaration leaves the stack tblptr unchanged; the top of stack offset is incremented by T.width. When the action on the right side of D → proc id ; N D1 ; S occurs, the width of all declarations generated by D1 is on top of stack offset; it is recorded using addwidth. Stacks tblptr and offset are then popped, and we revert to examining the declarations in the enclosing procedure. At this point, the name of the enclosed procedure is entered into the symbol table of its enclosing procedure.

P → M D                 { addwidth(top(tblptr), top(offset));
                          pop(tblptr); pop(offset) }
M → ε                   { t := mktable(nil);
                          push(t, tblptr); push(0, offset) }
D → D1 ; D2
D → proc id ; N D1 ; S  { t := top(tblptr);
                          addwidth(t, top(offset));
                          pop(tblptr); pop(offset);
                          enterproc(top(tblptr), id.name, t) }
D → id : T              { enter(top(tblptr), id.name, T.type, top(offset));
                          top(offset) := top(offset) + T.width }
N → ε                   { t := mktable(top(tblptr));
                          push(t, tblptr); push(0, offset) }
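The interplay of the two stacks can be sketched in Python. The sketch makes several assumptions for brevity: declarations arrive as an already parsed nested list, each variable carries its width directly instead of a type, and the class name SymTable and the function names are invented; the comments name the operation or marker action from the scheme that each step stands for.

```python
class SymTable:
    def __init__(self, previous=None):        # mktable(previous)
        self.previous = previous              # enclosing procedure's table
        self.width = 0                        # filled in by addwidth
        self.entries = {}

def process(decls, tblptr, offset):
    for d in decls:
        if d[0] == "var":                     # D -> id : T
            _, name, width = d
            tblptr[-1].entries[name] = ("var", offset[-1])    # enter
            offset[-1] += width
        else:                                 # D -> proc id ; N D1 ; S
            _, name, body = d
            tblptr.append(SymTable(tblptr[-1]))               # action for N
            offset.append(0)
            process(body, tblptr, offset)
            t = tblptr.pop()
            t.width = offset.pop()                            # addwidth
            tblptr[-1].entries[name] = ("proc", t)            # enterproc

def translate_program(decls):
    tblptr, offset = [SymTable(None)], [0]    # action for M
    process(decls, tblptr, offset)
    root = tblptr.pop()
    root.width = offset.pop()                 # addwidth for P -> M D
    return root

program = [
    ("var", "a", 4),
    ("proc", "quicksort", [
        ("var", "k", 4),
        ("var", "v", 8),
        ("proc", "partition", [("var", "i", 4)]),
    ]),
    ("var", "x", 8),
]
root = translate_program(program)
print(root.width, sorted(root.entries))
```

Note that x receives offset 4 in the outer table: the locals of quicksort are laid out in its own table and do not consume space in the enclosing one.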

Field Names in Records

The following production allows non-terminal T to generate records in addition to basic

types, pointers, and arrays:

T → record D end

The actions in the translation scheme of Fig. 8.14 emphasize the similarity between the layout of records as a language construct and activation records. Since procedure definitions do not affect the width computations in Fig. 8.13, we overlook the fact that the above production also allows procedure definitions to appear within records.

T → record L D end  { T.type := record(top(tblptr));
                      T.width := top(offset);
                      pop(tblptr); pop(offset) }
L → ε               { t := mktable(nil);
                      push(t, tblptr); push(0, offset) }

EXPLAIN BRIEFLY ABOUT ASSIGNMENT STATEMENTS? (11 MARKS)

The Assignment statement mainly deals with the expression. The expression can be

of type integer, real, array and record.

In the translation of assignments into TAC, names can be looked up in the symbol

table as follows.

CFG            SEMANTIC ACTION
S → id := E    { p := lookup(id.name);
                 if p != nil then
                     emit(p ':=' E.place)
                 else error }
E → E1 + E2    { E.place := newtemp;
                 emit(E.place ':=' E1.place '+' E2.place) }
E → E1 * E2    { E.place := newtemp;
                 emit(E.place ':=' E1.place '*' E2.place) }
E → -E1        { E.place := newtemp;
                 emit(E.place ':=' 'uminus' E1.place) }
E → (E1)       { E.place := E1.place }
E → id         { p := lookup(id.name);
                 if p != nil then
                     E.place := p
                 else error }

The three-address statements use names as pointers to their symbol-table entries.
The lexeme for the name represented by id is given by the attribute id.name.
The operation lookup(id.name) checks whether there is an entry for this occurrence of the name in the symbol table.
If so, a pointer to the entry is returned.
Otherwise, lookup returns nil to indicate that no entry was found.
The semantic actions use the procedure emit to append three-address statements to an output file, rather than building up code attributes for non-terminals.

With nested procedures, the production S → id := E uses a modified lookup operation, which first checks whether name appears in the current symbol table, accessible through the table pointer tblptr.
If not, lookup uses the pointer in the header of a table to find the symbol table of the enclosing procedure and searches there.
If the name cannot be found in any of these scopes, then lookup returns nil.
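The modified lookup that follows the header pointers outward can be sketched in a few lines of Python. The class Scope and its fields are assumptions made for the example; Python's None plays the role of nil.

```python
class Scope:
    """A symbol table whose header points to the enclosing table."""
    def __init__(self, previous=None):
        self.previous = previous
        self.names = {}

def lookup(name, table):
    """Search the current table, then each enclosing table in turn."""
    while table is not None:
        if name in table.names:
            return table.names[name]   # pointer to the entry
        table = table.previous         # follow the header pointer outward
    return None                        # nil: no entry in any scope

outer = Scope()
outer.names["a"] = "entry for a"
inner = Scope(previous=outer)
inner.names["i"] = "entry for i"

print(lookup("i", inner), "|", lookup("a", inner), "|", lookup("z", inner))
```

A name declared in both scopes would resolve to the inner entry, because the search stops at the first table that contains it.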




CONTROL-FLOW REPRESENTATION OF BOOLEAN EXPRESSIONS

We start by presenting the translation for flow-of-control statements generated

by the following grammar

S → if E then S

| if E then S1 else S2

| while E do S

In the translation, we assume that a three-address code statement can have a

symbolic label, and that the function newlabel generates such labels.

We associate with E two labels using inherited attributes

E.true, the label to which control flows if E is true.

E.false, the label to which control flows if E is false.

We associate to S the inherited attribute S.next that represents the label

attached to the first statement after the code for S.




PRODUCTION                 SEMANTIC RULES
S → if E then S1           E.true := newlabel;
                           E.false := S.next;
                           S1.next := S.next;
                           S.code := E.code || gen(E.true ':') || S1.code
S → if E then S1 else S2   E.true := newlabel;
                           E.false := newlabel;
                           S1.next := S.next;
                           S2.next := S.next;
                           S.code := E.code || gen(E.true ':') || S1.code ||
                                     gen('goto' S.next) ||
                                     gen(E.false ':') || S2.code
S → while E do S1          S.begin := newlabel;
                           E.true := newlabel;
                           E.false := S.next;
                           S1.next := S.begin;
                           S.code := gen(S.begin ':') || E.code ||
                                     gen(E.true ':') || S1.code ||
                                     gen('goto' S.begin)
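The rules for if-then and if-then-else can be tried out with a small Python sketch. It rests on assumptions: the condition is a single relational test given as a string, statement bodies are already lists of three-address statements, and the helper names cond_code, if_then and if_then_else are invented; newlabel and the roles of E.true, E.false and S.next follow the rules.

```python
from itertools import count

_labels = count(1)

def newlabel():
    return f"L{next(_labels)}"

def cond_code(test, true, false):
    """E.code for a relational test: jump to E.true or E.false."""
    return [f"if {test} goto {true}", f"goto {false}"]

def if_then(test, s1_code, s_next):
    e_true = newlabel()                       # E.false is S.next
    return cond_code(test, e_true, s_next) + [f"{e_true}:"] + s1_code

def if_then_else(test, s1_code, s2_code, s_next):
    e_true, e_false = newlabel(), newlabel()
    return (cond_code(test, e_true, e_false)
            + [f"{e_true}:"] + s1_code
            + [f"goto {s_next}"]              # skip over the else part
            + [f"{e_false}:"] + s2_code)

code = if_then_else("a < b", ["x := 1"], ["x := 2"], "Lnext")
print("\n".join(code))
```

The label Lnext stands for the inherited S.next; whoever generates the enclosing statement is responsible for placing it after this code.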

Explain the Boolean expressions in intermediate code generation? (11 marks)

Boolean expressions are composed of the Boolean operators (and, or, and not) applied to elements that are Boolean variables or relational expressions. In turn, relational expressions are of the form E1 relop E2, where E1 and E2 are arithmetic expressions and relop is any of <, <=, =, !=, >, or >=.

Assume that or and and are left-associative, and that or has lowest precedence, then and, then not.


METHODS OF TRANSLATING BOOLEAN EXPRESSION

The first method is to encode true and false numerically and to evaluate a Boolean expression analogously to an arithmetic expression. Often 1 is used to denote true and 0 to denote false.

The second method of implementing Boolean expressions is by flow of control, that is, representing the value of a Boolean expression by a position reached in a program.

Numerical Representation

The procedure emit places three-address statements into the output file.
The variable nextstat gives the index of the next three-address statement and is incremented by emit.
We use the attribute op to determine which of the comparison operators is represented by relop.

SHORT CIRCUIT CODE

A Boolean expression can also be translated into three-address code that uses goto to jump to a specific statement, without generating code for the Boolean operators themselves and without necessarily evaluating the entire expression.
This style of evaluation is called short-circuit or jumping code.
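Jumping code can be illustrated with a Python sketch that passes the labels E.true and E.false down as inherited attributes. The tuple encoding of expressions and the function name jumping_code are assumptions made for the example.

```python
from itertools import count

_labels = count(1)

def newlabel():
    return f"L{next(_labels)}"

def jumping_code(e, true, false):
    """Short-circuit code for e; control reaches `true` or `false`."""
    op = e[0]
    if op == "relop":                          # ("relop", id1, rel, id2)
        _, left, rel, right = e
        return [f"if {left} {rel} {right} goto {true}", f"goto {false}"]
    if op == "not":                            # swap the two exits
        return jumping_code(e[1], false, true)
    mid = newlabel()                           # start of the second operand
    if op == "or":                             # E1 true decides the result
        first = jumping_code(e[1], true, mid)
    elif op == "and":                          # E1 false decides the result
        first = jumping_code(e[1], mid, false)
    else:
        raise ValueError(f"unknown operator: {op!r}")
    return first + [f"{mid}:"] + jumping_code(e[2], true, false)

# a < b or c < d and e < f
expr = ("or", ("relop", "a", "<", "b"),
              ("and", ("relop", "c", "<", "d"), ("relop", "e", "<", "f")))
code = jumping_code(expr, "Ltrue", "Lfalse")
print("\n".join(code))
```

If a < b holds, control goes straight to Ltrue and neither of the other two comparisons is executed, which is the short-circuit behaviour.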




MIXED MODE BOOLEAN EXPRESSION

Boolean Expressions often contain Arithmetic sub-expressions—e.g.

(a+b)<c.

On the other hand, if true = 1 and false = 0, then (a<b)+(b<a) can be an

arithmetic expression with value 0 if a=b and 1 otherwise.

Consider the following Grammar:

E → E+E | E and E | E relop E | id

E+E, produces an arithmetic result, and the arguments can be mixed;

E and E, produces a Boolean result, and both arguments must be Boolean;

E relop E, produces a Boolean result, and the arguments can be mixed;

id is assumed of type arithmetic

To generate code we use a synthesized attribute E.type, that will be either

arith or bool.

Boolean Expressions will have inherited attributes E.true and E.false useful for the

jumping code.

Arithmetic Expressions will have the synthesized attribute E.place standing for the

(temporary) variable holding the value of E.

The global variable nextstat gives the index of the next three-address code

statement and is incremented by gen.

The semantic rule for E → E1 + E2:

E.type := arith;
if E1.type = arith and E2.type = arith then begin
    E.place := newtemp;
    E.code := E1.code || E2.code ||
              gen(E.place ':=' E1.place '+' E2.place)
end
else if E1.type = arith and E2.type = bool then begin
    E.place := newtemp;
    E2.true := newlabel;
    E2.false := newlabel;
    E.code := E1.code || E2.code ||
              gen(E2.true ':' E.place ':=' E1.place + 1) ||
              gen('goto' nextstat + 1) ||
              gen(E2.false ':' E.place ':=' E1.place)
end

Write short notes on Case statements? (6 marks)

Consider the following switch statement.

switch E
begin
    case V1: S1
    case V2: S2
    ...
    case Vn-1: Sn-1
    default: Sn
end

The translation of a switch statement is code to:
1. Evaluate the expression.
2. Find which value in the list of cases is the same as the value of the expression.
3. Execute the statement associated with the value found.
If no value matches, the statement associated with default is executed.

COMPILER PROCESS

One way to implement the selection is to create a table of pairs, each consisting of a value and a label for the code of the corresponding statement.
The compiler generates code to compare the value of the expression with each value in the table.
If no other match is found, the last (default) entry is sure to match.

SYNTAX DIRECTED TRANSLATION OF CASE STATEMENTS

On seeing the keyword switch, generate two new labels, test and next, and a new temporary variable t.
While parsing the expression E, generate code to evaluate E into t.
After processing E, generate the jump goto test.
On seeing each keyword case, create a new label Li and enter it into the symbol table. Record a pointer to this symbol-table entry together with the value Vi of the case constant.
Process each statement case Vi : Si by emitting the label Li, followed by the code for Si, followed by the jump goto next.
The keyword end terminates the body of the switch statement; the tests at label test are then generated from the recorded pairs.

CASE STATEMENT TAC

The switch statement above is translated into:

        code to evaluate E into t
        goto test
L1:     code for S1
        goto next
L2:     code for S2
        goto next
        ...
Ln-1:   code for Sn-1
        goto next
Ln:     code for Sn
        goto next
test:   if t = V1 goto L1
        if t = V2 goto L2
        ...
        if t = Vn-1 goto Ln-1
        goto Ln
next:
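The layout of the translated switch can be generated mechanically, as the following Python sketch shows. It assumes the code for E and for each Si is already available as a list of three-address statements; the function name translate_switch and its parameters are invented for the example.

```python
def translate_switch(eval_code, cases, default_code):
    """cases is a list of (value, code) pairs; eval_code leaves E in t."""
    code = list(eval_code) + ["goto test"]
    tests = []                                  # recorded (value, label) tests
    for i, (value, body) in enumerate(cases, start=1):
        code += [f"L{i}:"] + body + ["goto next"]
        tests.append(f"if t = {value} goto L{i}")
    n = len(cases) + 1                          # label of the default case
    code += [f"L{n}:"] + default_code + ["goto next"]
    code += ["test:"] + tests + [f"goto L{n}", "next:"]
    return code

code = translate_switch(
    ["t := x"],
    [(1, ["y := 10"]), (2, ["y := 20"])],
    ["y := 0"],
)
print("\n".join(code))
```

Placing the tests after the statement bodies lets the compiler emit each body as soon as it is parsed, while the comparisons are emitted once all case constants are known.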

Explain briefly about backpatching (11 marks)

The easiest way to implement a syntax-directed definition is to use two passes: first construct a syntax tree for the input, then walk the tree, computing the translations given in the definition.

The main problem with generating three-address code in a single pass for Boolean expressions and flow-of-control statements is that we may not know the labels that control must go to at the time the jump statements are generated.

The jumps are therefore generated with their targets temporarily left unspecified. Each such statement is put on a list of goto statements whose labels will be filled in when the proper label can be determined.

This subsequent filling in of addresses for the determined labels is called backpatching.

Backpatching can be used to generate code for Boolean expressions and flow-of-control statements in one pass.

With backpatching, we generate quadruples into a quadruple array, and labels are indices into this array.

To manipulate lists of labels, we use three functions:

makelist(i) creates a new list containing only i, an index into the array of quadruples, and returns a pointer to the list it has made.
merge(i, j) concatenates the lists pointed to by i and j, and returns a pointer to the concatenated list.
backpatch(p, i) inserts i as the target label for each of the statements on the list pointed to by p.

Translation scheme for Boolean expressions. The grammar is:

E → E1 or M E2
E → E1 and M E2
E → not E1
E → (E1)
E → id1 relop id2
E → false
E → true
M → ε

Two synthesized attributes, truelist and falselist, of non-terminal E are used to generate jumping code for Boolean expressions.

E.truelist: the list of incomplete jump statements to be filled in with the label to which control goes when E is true.
E.falselist: the list of incomplete jump statements to be filled in with the label to which control goes when E is false.

The variable nextquad holds the index of the next quadruple to follow.
M.quad records the index of the first quadruple of the code that follows M.

The semantic actions are:

E → E1 or M E2      { backpatch(E1.falselist, M.quad);
                      E.truelist = merge(E1.truelist, E2.truelist);
                      E.falselist = E2.falselist; }
E → E1 and M E2     { backpatch(E1.truelist, M.quad);
                      E.truelist = E2.truelist;
                      E.falselist = merge(E1.falselist, E2.falselist); }
E → not E1          { E.truelist = E1.falselist;
                      E.falselist = E1.truelist; }
E → (E1)            { E.truelist = E1.truelist;
                      E.falselist = E1.falselist; }
E → id1 relop id2   { E.truelist = makelist(nextquad);
                      E.falselist = makelist(nextquad + 1);
                      emit(if id1.place relop id2.place goto ___);
                      emit(goto ___); }
E → true            { E.truelist = makelist(nextquad);
                      emit(goto ___); }
E → false           { E.falselist = makelist(nextquad);
                      emit(goto ___); }
M → ε               { M.quad = nextquad; }
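The three list functions and the actions for or, and and relop can be exercised in a short Python sketch. Assumptions made here: each quadruple is stored as a [text, target] pair, the array is indexed from 0, Python lists stand in for the pointer-linked lists, and the helper names relop, or_ and and_ are invented; the calls are made by hand in the order a bottom-up parser would perform the reductions.

```python
quads = []                          # quadruple array of [text, target] pairs

def nextquad():
    return len(quads)

def emit(text):
    quads.append([text, None])      # jump target left unspecified for now

def makelist(i):
    return [i]

def merge(p1, p2):
    return p1 + p2

def backpatch(p, target):
    for i in p:
        quads[i][1] = target

def relop(a, op, b):                # E -> id1 relop id2
    truelist, falselist = makelist(nextquad()), makelist(nextquad() + 1)
    emit(f"if {a} {op} {b} goto")
    emit("goto")
    return truelist, falselist

def or_(e1, m_quad, e2):            # E -> E1 or M E2
    backpatch(e1[1], m_quad)
    return merge(e1[0], e2[0]), e2[1]

def and_(e1, m_quad, e2):           # E -> E1 and M E2
    backpatch(e1[0], m_quad)
    return e2[0], merge(e1[1], e2[1])

# a < b or c < d and e < f
e1 = relop("a", "<", "b")
m1 = nextquad()                     # M -> ε before the right operand of or
e2 = relop("c", "<", "d")
m2 = nextquad()                     # M -> ε before the right operand of and
e3 = relop("e", "<", "f")
truelist, falselist = or_(e1, m1, and_(e2, m2, e3))

for i, (text, target) in enumerate(quads):
    print(i, text, "_" if target is None else target)
```

After the run, quadruples 1 and 2 have their targets filled in, while the jumps on truelist and falselist stay open until the enclosing statement knows where true and false exits should lead.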

When a compiler encounters a statement like goto L, it looks for the label L in the symbol table. If the jump is backward, that is, L has already been encountered, then the symbol table will have an entry giving the number of the first quadruple generated for the statement labeled L. The compiler generates a goto three-address statement with that quadruple number as target.

If the jump is forward, this may be the first occurrence of the label L, and if so L is entered into the symbol table. In any case, if the statement labeled L has not yet been encountered, the compiler generates a goto quadruple with unspecified target and adds that quadruple to a list of quadruples whose target is L. A pointer to this list appears in the symbol-table entry for L.

Labeled statements have a syntax given by productions such as

S → LABEL : S
LABEL → id

The semantic action associated with LABEL → id is to
1. install that identifier in the symbol table if it is not already there,
2. record that the quadruple referred to by this label is the current value of NEXTQUAD, and finally
3. backpatch the list of gotos whose targets are the label just discovered.

Structured Flow-of-Control Constructs

A more complex example of flow of control concerns nested, or structured, control statements. Not only do Boolean expressions need two lists of jumps that occur when the expression is true and when it is false, but statements also need lists of jumps (NEXT lists) to the code that follows them in the execution sequence.

Scheme to Implement the Translation

The non-terminal E has the two translation fields E.TRUE and E.FALSE. L and S each need a list of unfilled quadruples which must eventually be completed by backpatching. These lists are pointed to by the translation fields L.NEXT and S.NEXT. S.NEXT is a pointer to a list of all conditional and unconditional jumps to the quadruple following the statement S in execution order, and L.NEXT is defined similarly.

Each M has a translation M.QUAD, which is the number of the first quadruple following it. E.TRUE is backpatched to go to the beginning of S(1) by making jumps on the E.TRUE list go to M(2).QUAD.

A more compelling argument for using S.NEXT and L.NEXT comes from generating code for the conditional statement if E then S(1) else S(2). When we finish executing S(1), we have no idea where to go next, since it may well be to the quadruple following S(2), whose index is not known until we finish generating code for S(2). The use of the marker non-terminal M solves this problem as well.




WRITE SHORT NOTES ON PROCEDURE CALLS? (5 MARKS)

A simple procedure call statement has the grammar

S → call id ( elist )
elist → elist , E
elist → E

Translation includes:
Calling sequence: the actions taken on entry to and exit from each procedure.
Arguments are evaluated and put in a known place.
The return address is the location to which the called routine must transfer after it is finished.
With static allocation, the return address is placed after the code sequence itself.
Parameters are passed by reference.

The three-address code for a call consists of the statements needed to evaluate those arguments that are not simple names, followed by the list of param statements.

For separate evaluation:
Save E.place for each expression E in id(E, E, ..., E).
The data structure used is a queue.

Semantics:

1. S → call id ( elist )
   { for each item p on queue do
         gen(param p);
     gen(call id.place) }

2. elist → elist , E
   { append E.place to end of queue }

3. elist → E
   { initialize queue to contain only E.place }

In rule 3 the queue is emptied and then given a single pointer to the symbol-table entry denoting the value of E.
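The role of the queue can be shown with a small Python sketch. It assumes the code that evaluates each argument has already been emitted, so only the list of E.place values is passed in; the function name translate_call is invented for the example.

```python
from collections import deque

def translate_call(proc, arg_places):
    """Emit param statements in argument order, then the call."""
    queue = deque()
    for i, place in enumerate(arg_places):
        if i == 0:
            queue = deque([place])     # elist -> E: queue holds only E.place
        else:
            queue.append(place)        # elist -> elist , E
    code = [f"param {p}" for p in queue]   # S -> call id ( elist )
    code.append(f"call {proc}")
    return code

call_code = translate_call("p", ["a", "t1", "t2"])
print("\n".join(call_code))
```

Holding the places in a queue keeps all param statements together just before the call, after the code for every argument expression, and preserves the left-to-right order of the arguments.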

TWO MARKS

1. What are parsers?

Parser

• Accepts string of tokens from lexical analyzer (usually one token at a time)

• Verifies whether or not string can be generated by grammar

• Reports syntax errors (recovers if possible)

THE ROLE OF A PARSER

The parser obtains a string of tokens from the lexical analyzer and verifies that it can be generated by the grammar for the source language. The parser should report any syntax errors in an intelligible fashion.

2. What are the two types of Parser?




The two types of parsers employed are:

1. Top down parser: builds parse trees from the top (root) to the bottom (leaves).
2. Bottom up parser: builds parse trees from the leaves and works up to the root.
Therefore there are two types of parsing methods: top-down parsing and bottom-up parsing.

3. Mention the basic issues in parsing.

There are two important issues in parsing.

Specification of syntax

Representation of input after parsing.

4. Why are lexical and syntax analyzers separated?

Reasons for separating the analysis phase into lexical and syntax analyzers:

Simpler design.

Compiler efficiency is improved.

Compiler portability is enhanced.

5. Define a context free grammar.

A context free grammar G is a collection of the following

V is a set of non terminals

T is a set of terminals

S is a start symbol

P is a set of production rules

G can be represented as G = (V, T, S, P)

http://www.mec.ac.in/resources/notes/notes/compiler/module2/module2%5Ctdp.htm

http://www.mec.ac.in/resources/notes/notes/compiler/module2/module2%5Cbup.htm




Production rules are given in the following form

Non terminal → (V ∪ T)*
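As a small sketch of the 4-tuple G = (V, T, S, P), the grammar can be held in a Python record. The class name Grammar, its field names, and the sample grammar E → E + E | id are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset  # V
    terminals: frozenset     # T
    start: str               # S
    productions: tuple       # P: pairs (head, body), body is a tuple of symbols

    def is_well_formed(self):
        """Check that every production has the form nonterminal -> (V U T)*."""
        symbols = self.nonterminals | self.terminals
        return (
            self.start in self.nonterminals
            and not (self.nonterminals & self.terminals)
            and all(
                head in self.nonterminals and all(s in symbols for s in body)
                for head, body in self.productions
            )
        )

expr_grammar = Grammar(
    nonterminals=frozenset({"E"}),
    terminals=frozenset({"+", "id"}),
    start="E",
    productions=(("E", ("E", "+", "E")), ("E", ("id",))),
)
print(expr_grammar.is_well_formed())
```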

6. Briefly explain the concept of derivation.

Derivation from S means generation of string w from S. For constructing derivation two things

are important.

i) Choice of non terminal from several others.

ii) Choice of rule from production rules for corresponding non terminal.

Instead of choosing an arbitrary non terminal, one can choose

i) Either leftmost derivation – the leftmost non terminal in a sentential form is always replaced

ii) Or rightmost derivation – the rightmost non terminal in a sentential form is always replaced
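The two choices can be illustrated with a short sketch. The grammar E → E + E | id, the function name derive, and the list-of-lists encoding are assumptions made for this example; the same sequence of rule choices is replayed once leftmost and once rightmost.

```python
def derive(start, productions, choices, leftmost=True):
    """Apply the chosen production bodies, always expanding the
    leftmost (or rightmost) nonterminal of the current sentential form."""
    form = [start]
    steps = [" ".join(form)]
    for body in choices:
        positions = [i for i, sym in enumerate(form) if sym in productions]
        i = positions[0] if leftmost else positions[-1]
        if body not in productions[form[i]]:
            raise ValueError(f"{form[i]} has no production with body {body}")
        form[i:i + 1] = body          # replace the nonterminal by the body
        steps.append(" ".join(form))
    return steps

productions = {"E": [["E", "+", "E"], ["id"]]}
choices = [["E", "+", "E"], ["id"], ["id"]]
leftmost_steps = derive("E", productions, choices, leftmost=True)
rightmost_steps = derive("E", productions, choices, leftmost=False)
print(" => ".join(leftmost_steps))   # E => E + E => id + E => id + id
print(" => ".join(rightmost_steps))  # E => E + E => E + id => id + id
```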

7. Define ambiguous grammar.

A grammar G is said to be ambiguous if it generates more than one parse tree for some sentence

of language L (G).

i.e. there is more than one leftmost derivation, or more than one rightmost derivation, for the given sentence.

8. List the properties of LR parser.

1. LR parsers can be constructed to recognize most of the programming languages for which the

context free grammar can be written.

2. The class of grammar that can be parsed by LR parser is a superset of class of grammars that

can be parsed using predictive parsers.

3. LR parsers use a non-backtracking shift-reduce technique, and it can be implemented efficiently.

9. Mention the types of LR parser.




SLR parser- simple LR parser

LALR parser- look ahead LR parser

Canonical LR parser

10. What are the problems with top down parsing?

The following are the problems associated with top down parsing:

Backtracking

Left recursion

Left factoring

Ambiguity

11. Write the algorithm for FIRST and FOLLOW.

FIRST

1. If X is a terminal, then FIRST(X) is {X}.

2. If X → ε is a production, then add ε to FIRST(X).

3. If X is a non terminal and X → Y1 Y2 … Yk is a production, then place a in FIRST(X) if for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), …, FIRST(Yi-1). If ε is in FIRST(Yj) for all j = 1, …, k, then add ε to FIRST(X).

FOLLOW

1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right end marker.

2. If there is a production A → αBβ, then everything in FIRST (β) except for ε is placed in

FOLLOW (B).

3. If there is a production A → αB, or a production A→ αBβ where FIRST (β) contains ε, then

everything in FOLLOW (A) is in FOLLOW (B).
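The rules above are applied repeatedly until no set changes. The following is a minimal sketch of that fixed-point computation, assuming a grammar encoded as a dict from nonterminal to a list of bodies, with the empty list standing for ε. The function names and the sample expression grammar are assumptions for illustration.

```python
EPS = "ε"

def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)  # every symbol can vanish (or the sequence is empty)
    return result

def compute_first_follow(grammar, start):
    nonterminals = set(grammar)
    terminals = {s for bodies in grammar.values()
                 for body in bodies for s in body} - nonterminals
    first = {t: {t} for t in terminals}           # FIRST rule 1
    first.update({a: set() for a in nonterminals})
    follow = {a: set() for a in nonterminals}
    follow[start].add("$")                        # FOLLOW rule 1
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first)  # FIRST rules 2 and 3
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
                for i, sym in enumerate(body):
                    if sym not in nonterminals:
                        continue
                    rest = first_of_string(body[i + 1:], first)
                    new = rest - {EPS}              # FOLLOW rule 2
                    if EPS in rest:
                        new |= follow[head]         # FOLLOW rule 3
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return first, follow

grammar = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
first, follow = compute_first_follow(grammar, "E")
print(sorted(first["E"]), sorted(follow["F"]))
```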

12. List the advantages and disadvantages of operator precedence parsing.

Advantages




This type of parsing is simple to implement.

Disadvantages

1. An operator like minus has two different precedences (unary and binary). Hence it is hard to handle tokens like the minus sign.

2. This kind of parsing is applicable to only small class of grammars.

13. What is dangling else problem?

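The dangling else problem is the ambiguity of the grammar shown below: in a statement of the form if E1 then if E2 then S1 else S2, the else can be attached to either if. It is usually resolved by matching each else with the closest unmatched then.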

stmt → if expr then stmt

| if expr then stmt else stmt

| other

14. Write short notes on YACC.

YACC is an automatic tool for generating the parser program.

YACC stands for Yet Another Compiler-Compiler; it is a utility available on UNIX.

YACC is an LALR parser generator.

It can report conflicts or ambiguities in the form of error messages.

15. What is meant by handle pruning?

A rightmost derivation in reverse can be obtained by handle pruning.

If w is a sentence of the grammar at hand, then w = γn, where γn is the nth right-sentential form

of some as yet unknown rightmost derivation

S = γ0 => γ1…=> γn-1 => γn = w

16. Define LR (0) items.

An LR (0) item of a grammar G is a production of G with a dot at some position of the right

side. Thus, production A → XYZ yields the four items




A→.XYZ

A→X.YZ

A→XY.Z

A→XYZ.
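Enumerating the items of a production amounts to placing the dot at every position of the right side. A small sketch, with the function name lr0_items assumed for illustration:

```python
def lr0_items(head, body):
    """Return all LR(0) items of the production head -> body."""
    return [
        f"{head}→{''.join(body[:dot])}.{''.join(body[dot:])}"
        for dot in range(len(body) + 1)
    ]

items = lr0_items("A", ["X", "Y", "Z"])
print(items)
```

A production with k symbols on its right side yields k + 1 items; a production A → ε yields only the item A→.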

17. What is meant by viable prefixes?

The set of prefixes of right sentential forms that can appear on the stack of a shift-reduce parser

are called viable prefixes. An equivalent definition of a viable prefix is that it is a prefix of a

right sentential form that does not continue past the right end of the rightmost handle of that

sentential form.

18. Define handle.

A handle of a string is a substring that matches the right side of a production, and whose

reduction to the nonterminal on the left side of the production represents one step along the

reverse of a rightmost derivation.

A handle of a right-sentential form γ is a production A → β and a position of γ where the string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ. That is, if S ⇒* αAw ⇒ αβw by a rightmost derivation, then A → β in the position following α is a handle of αβw.

19. What are kernel & non-kernel items?

Kernel items, which include the initial item, S'→ .S, and all items whose dots are not at the left

end.

Non-kernel items, which have their dots at the left end.

20. What is phrase level error recovery?

Phrase level error recovery is implemented by filling in the blank entries in the predictive

parsing table with pointers to error routines. These routines may change, insert, or delete

symbols on the input and issue appropriate error messages. They may also pop from the stack.

21. What are different kinds of errors encountered during compilation?

Compiler Errors




• Lexical errors (e.g. misspelled word)

• Syntax errors (e.g. unbalanced parentheses, missing semicolon)

• Semantic errors (e.g. type errors)

• Logical errors (e.g. infinite recursion)

Error Handling

• Report errors clearly and accurately

• Recover quickly if possible

• Poor error recovery may lead to an avalanche of errors

22. What are different error recovery strategies?

Error Recovery strategies

• Panic mode: discard tokens one at a time until a synchronizing token is found

• Phrase-level recovery: Perform local correction that allows parsing to continue

• Error Productions: Augment grammar to handle predicted, common errors

• Global correction: Use a complex algorithm to compute the least-cost sequence of changes leading to parseable code

23. Explain Recursive descent parsing.

Recursive descent parsing: corresponds to finding a leftmost derivation for an input string

Equivalent to constructing parse tree in pre-order

Example:

Grammar: S → cAd, A → ab | a

Input: cad

Problems:

1. Backtracking is involved (so buffering of tokens is required)

2. left recursion will lead to infinite looping

3. Left factors may cause several backtracking steps
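A backtracking recursive descent recognizer for the example grammar can be sketched as follows. This is an illustration under assumptions: the names GRAMMAR, parse and accepts are invented here, and backtracking is modelled with Python generators that yield every input position a nonterminal can reach.

```python
GRAMMAR = {"S": [["c", "A", "d"]], "A": [["a", "b"], ["a"]]}

def parse(symbol, text, pos):
    """Yield each input position reachable after matching symbol from pos."""
    if symbol not in GRAMMAR:                 # terminal: must match the input
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for body in GRAMMAR[symbol]:              # try the alternatives in order
        positions = [pos]
        for sym in body:
            positions = [q for p in positions for q in parse(sym, text, p)]
        yield from positions                  # empty if this alternative failed

def accepts(text):
    return any(end == len(text) for end in parse("S", text, 0))

print(accepts("cad"))   # True
print(accepts("cabd"))  # True
print(accepts("cd"))    # False
```

On the input cad, the alternative A → ab matches a but fails on b, so the parser falls back to A → a. A left-recursive production would make parse call itself at the same position forever, which is problem 2 above.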

24. Give an example of ambiguous grammar.

Ambiguous grammar:

E ::= E "*" E | E "+" E | "1" | "(" E ")"

Unambiguous grammar:

E ::= E "+" T | T




T ::= T "*" F | F

F ::= "1" | "(" E ")"

25. What is left recursion? How it is eliminated?
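In the standard textbook treatment, a grammar is left recursive if some nonterminal A has a derivation A ⇒+ Aα. Immediate left recursion A → Aα | β is eliminated by rewriting it as A → βA', A' → αA' | ε. A hedged sketch of that rewriting for a single nonterminal follows; the function name and the list encoding (empty list for ε) are assumptions, and indirect left recursion is not handled.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b A', A' -> a A' | ε."""
    new_head = head + "'"
    recursive = [b[1:] for b in bodies if b and b[0] == head]  # the α parts
    others = [b for b in bodies if not b or b[0] != head]      # the β parts
    if not recursive:
        return {head: bodies}                  # nothing to eliminate
    return {
        head: [beta + [new_head] for beta in others],
        new_head: [alpha + [new_head] for alpha in recursive] + [[]],
    }

rewritten = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
print(rewritten)
```

For E → E + T | T this gives E → T E' and E' → + T E' | ε.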

26. What is left factoring?

Left Factoring

• Rewriting productions to delay decisions

• Helpful for predictive parsing

• Not guaranteed to remove ambiguity

A → αβ1 | αβ2

becomes

A → αA'

A' → β1 | β2
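The rewriting can be sketched for the simple case in which all alternatives of one nonterminal share a common prefix α. The function name left_factor and the list encoding (empty list for ε) are assumptions; a full algorithm would group alternatives by prefix and repeat.

```python
def left_factor(head, bodies):
    """Factor out the longest prefix common to all bodies of head."""
    prefix = []
    for column in zip(*bodies):          # compare bodies symbol by symbol
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    if not prefix or len(bodies) < 2:
        return {head: bodies}            # nothing to factor
    new_head = head + "'"
    return {
        head: [prefix + [new_head]],
        new_head: [body[len(prefix):] for body in bodies],
    }

factored = left_factor(
    "stmt",
    [["if", "E", "then", "S", "else", "S"], ["if", "E", "then", "S"]],
)
print(factored)
```

Here stmt → if E then S stmt' and stmt' → else S | ε, so the choice between the two forms is delayed until the parser has seen whether an else follows.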




27. What is top down parsing?

Top down Parsing

• Can be viewed two ways:

– Attempt to find leftmost derivation for input string

– Attempt to create parse tree, starting from the root, creating nodes in preorder

• General form is recursive descent parsing

– May require backtracking

– Backtracking parsers are not used frequently because backtracking is rarely needed for programming language constructs

28. What is predictive parsing?

• A special case of recursive-descent parsing that does not require backtracking

• Must always know which production to use based on current input symbol

• Can often create appropriate grammar:

– removing left-recursion

– left factoring the resulting grammar

29. Define LL (1) grammar.

LL (1) Grammars

• Algorithm covered in class can be applied to any grammar to produce a parsing table

• If parsing table has no multiply-defined entries, grammar is said to be "LL(1)"

– First "L", left-to-right scanning of input

– Second "L", produces leftmost derivation

– "1" refers to the number of lookahead symbols needed to make decisions
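A table-driven predictive parser can be sketched as below. The parsing table is written out by hand for the usual expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id; the grammar, the names TABLE and ll1_parse, and the token spelling are assumptions for illustration.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# TABLE[(nonterminal, lookahead)] = body to push; missing key = error entry.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}

def ll1_parse(tokens):
    """Return True if tokens form a sentence of the grammar."""
    tokens = tokens + ["$"]
    stack = ["$", "E"]
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:
                return False              # blank table entry: syntax error
            stack.extend(reversed(body))  # leftmost symbol ends up on top
        elif top == look:
            pos += 1                      # match a terminal (or $)
        else:
            return False
    return pos == len(tokens)

print(ll1_parse(["id", "+", "id", "*", "id"]))  # True
print(ll1_parse(["id", "+"]))                   # False
```

Each table cell holds at most one production, which is what makes one lookahead symbol enough to decide.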

30. List the three kinds of intermediate representation.




The three kinds of intermediate representations are

i. Syntax trees

ii. Postfix notation

iii. Three address code

31. How can you generate three-address code?

The three-address code is generated using semantic rules that are similar to those for constructing syntax trees or for generating postfix notation.

32. What is a syntax tree? Draw the syntax tree for the assignment statement

a := b * -c + b * -c.

A syntax tree depicts the natural hierarchical structure of a source program.

Syntax tree:

        assign
       /      \
      a        +
             /   \
           *       *
          / \     / \
         b uminus b uminus
             |        |
             c        c

33. Define three-address code.


Three-address code is a sequence of statements of the general form

x := y op z

where x, y and z are names, constants, or compiler-generated temporaries; op stands

for any operator, such as fixed or floating-point arithmetic operator, or a logical

operator on boolean-valued data.

Three-address code is a linearized representation of a syntax tree or a dag in which

explicit names correspond to the interior nodes of the graph.
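The linearization can be sketched as a post-order walk that gives each interior node a fresh temporary. The tuple encoding of the tree, the function name gen_three_address and the temporary names t1, t2, … are assumptions for illustration.

```python
import itertools

def gen_three_address(tree):
    """Return (place, code) for a nested-tuple expression tree."""
    counter = itertools.count(1)
    code = []

    def walk(node):
        if isinstance(node, str):        # a name or constant is its own place
            return node
        op, *args = node
        places = [walk(a) for a in args]  # children first (post-order)
        temp = f"t{next(counter)}"        # one temporary per interior node
        if len(places) == 1:
            code.append(f"{temp} := {op} {places[0]}")
        else:
            code.append(f"{temp} := {places[0]} {op} {places[1]}")
        return temp

    return walk(tree), code

product = ("*", "b", ("uminus", "c"))
place, tac = gen_three_address(("+", product, product))
print("\n".join(tac + [f"a := {place}"]))
```

For a := b * -c + b * -c this walk over the syntax tree emits five temporaries; walking a dag instead would compute b * -c only once.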

34. Construct three address codes for the following

position := initial + rate * 60 (with id1, id2 and id3 denoting position, initial and rate)

temp1:= inttoreal (60)

temp2:= id3 * temp1

temp3:= id2 + temp2




id1:= temp3

35. What are triples?

Triples are a record structure with three fields: op, arg1 and arg2. The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table or pointers into the triple structure itself, so no temporary names need to be entered into the symbol table.

Because only three fields are used, this intermediate code format is known as triples.

36. Draw the DAG for a: = b * -c + b * -c

        assign
       /      \
      a        +
              / \
              \ /      (both operands of + are the same * node)
               *
              / \
             b   uminus
                   |
                   c
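One way to build such a dag is to look up each (operator, children) combination before creating a node, so that a repeated subexpression reuses the existing node. A minimal sketch, assuming a nested-tuple tree encoding and the invented name build_dag:

```python
def build_dag(tree, nodes):
    """Return the node id for tree, creating nodes only when needed.

    nodes maps a key (operator, child ids...) to its node id.
    """
    if isinstance(tree, str):
        key = ("leaf", tree)
    else:
        op, *children = tree
        key = (op, *[build_dag(child, nodes) for child in children])
    # setdefault returns the existing id if the same key was seen before.
    return nodes.setdefault(key, len(nodes))

product = ("*", "b", ("uminus", "c"))
nodes = {}
root = build_dag(("assign", "a", ("+", product, product)), nodes)
print(len(nodes))  # 7
```

The syntax tree for the same statement has 11 nodes; the dag has 7 because b, c, uminus and * are each shared.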

37. List the types of three address statements.

The types of three address statements are

a. Assignment statements

b. Assignment Instructions

c. Copy statements

d. Unconditional Jumps

e. Conditional jumps

f. Indexed assignments

g. Address and pointer assignments

h. Procedure calls and return

38. What are the various methods of implementing three-address statements?

i. Quadruples

ii. Triples

iii. Indirect triples




39. What is meant by declaration?

The process of declaring keywords, procedures, functions, variables, and statements with

proper syntax is called declaration.

40. How semantic rules are defined?

The semantic rules are defined using the following operations

a. mktable(previous)

b. enter(table,name,type,offset)

c. addwidth(table, width)

d. enterproc(table,name,newtable)

41. What are the two primary purposes of Boolean Expressions?

They are used to compute logical values.

They are used as conditional expressions in statements that alter the flow of control, such as if-then, if-then-else, or while-do statements.

42. Define Boolean Expression.

Expressions which are composed of the Boolean operators (and, or, and not) applied to

elements that are Boolean variables or relational expressions are known as Boolean

expressions

43. What are the two methods to represent the value of a Boolean expression?

i. The first method is to encode true and false numerically and to evaluate a

Boolean expression analogously to an arithmetic expression.

ii. The second principal method of implementing Boolean expression is by flow

of control that is representing the value of a Boolean expression by a position

reached in a program.

44. What do you mean by viable prefixes?

Viable prefixes are the prefixes of right sentential forms that can appear on the stack of a shift-reduce parser. It is always possible to add terminal symbols to the end of a viable prefix to obtain a right sentential form.

45. What is meant by Short-Circuit or jumping code?

We can also translate a Boolean expression into three-address code without generating code for any of the Boolean operators and without having the code necessarily evaluate the entire expression. This style of evaluation is sometimes called "short-circuit" or "jumping" code.
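A sketch of jumping code generation follows, assuming a nested-tuple encoding of the expression and caller-supplied true and false labels. The function name, the label names L1, L2, Ltrue, Lfalse and the "if x goto L" statement form are assumptions for illustration.

```python
import itertools

def gen_jumping_code(expr, true_label, false_label):
    """Translate a Boolean expression into jumps, with no and/or/not code."""
    labels = itertools.count(1)
    code = []

    def walk(e, t, f):
        if isinstance(e, str):               # a Boolean variable
            code.append(f"if {e} goto {t}")
            code.append(f"goto {f}")
        elif e[0] == "not":                  # swap the two exits
            walk(e[1], f, t)
        elif e[0] == "or":                   # left true: whole expression true
            mid = f"L{next(labels)}"
            walk(e[1], t, mid)
            code.append(f"{mid}:")
            walk(e[2], t, f)
        elif e[0] == "and":                  # left false: whole expression false
            mid = f"L{next(labels)}"
            walk(e[1], mid, f)
            code.append(f"{mid}:")
            walk(e[2], t, f)
        else:
            raise ValueError(f"unknown operator {e[0]}")

    walk(expr, true_label, false_label)
    return code

jumps = gen_jumping_code(("or", "a", ("and", "b", ("not", "c"))),
                         "Ltrue", "Lfalse")
print("\n".join(jumps))
```

If a is true the code jumps to Ltrue at once, so b and c are never tested.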




46. What is the intermediate code representation for the expression a or b and not c?

(Or) Translate a or b and not c into three address code.

Three-address sequence is

t1 := not c

t2 := b and t1

t3 := a or t2

47. Explain the following functions:

i) makelist(i) ii) merge(p1,p2) iii) backpatch(p,i)

i. makelist(i) creates a new list containing only i, an index into the array of quadruples; makelist returns a pointer to the list it has made.

ii. merge(p1,p2) concatenates the lists pointed to by p1 and p2 , and returns a

pointer to the concatenated list.

iii. backpatch(p,i) inserts i as the target label for each of the statements on the list

pointed to by p.
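The three functions can be sketched with Python lists standing in for the pointer-linked lists. The quadruple layout ([text, target]), the helper emit, and the target indexes 4 and 5 are assumptions made for this example, which translates a < b or c < d.

```python
quads = []  # each entry is [statement text, jump target or None]

def emit(text):
    """Append a jump with an unfilled target and return its index."""
    quads.append([text, None])
    return len(quads) - 1

def makelist(i):
    return [i]

def merge(p1, p2):
    return p1 + p2

def backpatch(p, i):
    for index in p:
        quads[index][1] = i   # fill in the missing target label

# a < b or c < d
e1_true = makelist(emit("if a < b goto"))
e1_false = makelist(emit("goto"))
second_test = len(quads)                  # index where c < d starts
e2_true = makelist(emit("if c < d goto"))
e2_false = makelist(emit("goto"))

backpatch(e1_false, second_test)          # a < b false: go test c < d
truelist = merge(e1_true, e2_true)
falselist = e2_false
backpatch(truelist, 4)                    # hypothetical index of the true branch
backpatch(falselist, 5)                   # hypothetical index of the false branch
print(quads)
```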

48. Define back patching.

Back patching is the activity of filling in the unspecified label information of jump statements, using appropriate semantic actions, during the code generation process.

49. What is handle pruning?

• Repeat the following process, starting from string of tokens until obtain start symbol:

– Locate handle in current right-sentential form

– Replace handle with left side of appropriate production

• Two problems that need to be solved:

– How to locate handle

– How to choose appropriate production
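The repeat loop itself is simple once the handles are known. The sketch below replays a given list of handles and does not solve the two problems just listed; the grammar E → E + T | T, T → id, the function name prune_handles and the (head, body, position) encoding are assumptions for illustration.

```python
def prune_handles(sentence, handles):
    """Reduce sentence to the start symbol by replacing each handle in turn.

    handles is a list of (head, body, position) triples.
    """
    form = list(sentence)
    forms = [" ".join(form)]
    for head, body, pos in handles:
        if form[pos:pos + len(body)] != list(body):
            raise ValueError(f"{body} is not at position {pos}")
        form[pos:pos + len(body)] = [head]   # replace handle by left side
        forms.append(" ".join(form))
    return forms

reductions = prune_handles(
    ["id", "+", "id"],
    [("T", ["id"], 0), ("E", ["T"], 0), ("T", ["id"], 2),
     ("E", ["E", "+", "T"], 0)],
)
print(" <= ".join(reductions))
```

Read right to left, the output is the rightmost derivation E ⇒ E + T ⇒ E + id ⇒ T + id ⇒ id + id.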

50. What are LR parsers?

LR Parsers

• LR parsers use an efficient, bottom-up parsing technique useful for a large class of CFGs

• Too difficult to construct by hand, but automatic generators to create them exist

(e.g. Yacc)

• LR(k) grammars




– "L" refers to left-to-right scanning of input

– "R" refers to rightmost derivation (produced in reverse order)

– "k" refers to the number of lookahead symbols needed for decisions (if omitted, assumed to be 1)

51. What are the benefits of LR parsers?

Benefits of LR Parsing

• Can be constructed to recognize virtually all programming language constructs for which a CFG can be written

• Most general non-backtracking shift-reduce parsing method known

• Can be implemented efficiently

• Handles a class of grammars that is a superset of those handled by predictive

parsing

• Can detect syntactic errors as soon as possible with a left-to-right scan of input

52. What are three types of LR parsers?

Three methods:

a. SLR (simple LR)

i. Not all that simple (but simpler than other two)!

ii. Weakest of three methods, easiest to implement

b. Constructing canonical LR parsing tables

i. Most general of methods

ii. Constructed tables can be quite large

c. LALR parsing table (lookahead LR)

i. Tables smaller than canonical LR

ii. Most programming language constructs can be handled

53. What are the benefits of intermediate code generation?

A compiler for a different machine can be created by attaching a different back end to the existing front end.

A compiler for a different source language can be created by providing a different front end for that source language to the existing back end.

A machine-independent code optimizer can be applied to the intermediate code in order to optimize the generated code.

54. Mention the functions that are used in back patching.




makelist(i) creates a new list containing only i, where i is an index into the array of quadruples.

merge_list(p1,p2) concatenates the two lists pointed to by p1 and p2 and returns a pointer to the concatenated list.

backpatch(p,i) inserts i as the target label for each of the statements on the list pointed to by p.

55. What is the intermediate code representation for the expression a or b and not c?

The intermediate code representation for the expression a or b and not c is the three

address sequence

t1 := not c

t2 := b and t1

t3:= a or t2

56. What are the various methods of implementing three address statements?

The three address statements can be implemented using the following methods.

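Quadruples: a record structure with four fields: operator (op), arg1, arg2 and result.

Triples: the use of temporary names is avoided by referring to a computed value by the position of the statement that computes it; arg1 and arg2 are pointers into the symbol table or into the triple structure.

Indirect triples: a separate list of pointers to triples is kept, and the statements are listed through these pointers rather than by listing the triples themselves.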
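The three representations can be compared on a := b * -c + b * -c. In this sketch, strings stand for symbol-table names and integers for positions in the triple list; the function name quads_to_triples and the tuple layouts are assumptions for illustration.

```python
quadruples = [            # (op, arg1, arg2, result)
    ("uminus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("uminus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    (":=", "t5", None, "a"),
]

def quads_to_triples(quads):
    """Drop the result field; refer to temporaries by statement position."""
    position = {}         # temporary name -> index of the triple computing it
    triples = []

    def ref(arg):
        return position.get(arg, arg)

    for op, arg1, arg2, result in quads:
        if op == ":=":
            triples.append((op, result, ref(arg1)))   # (assign, a, (4))
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            position[result] = len(triples) - 1
    return triples

triples = quads_to_triples(quadruples)
# Indirect triples: the statement order is a separate list of pointers,
# so statements can be reordered without renumbering the triples.
statements = list(range(len(triples)))
print(triples)
print(statements)
```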

