cs1352_may09

7/28/2019 CS1352_MAY09


MAY/JUNE-'09/CS1352-Answer Key

CS1352 Principles of Compiler Design

University Question Key

May/June 2009

PART-A

1. What are the issues to be considered in the design of lexical analyzer?

- Simpler design
- Compiler efficiency is improved
- Compiler portability is enhanced

2. Define concrete and abstract syntax with example.

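An abstract syntax tree is a tree in which each interior node represents an operator and its children represent the operands. A parse tree is called a concrete syntax tree; it shows how the start symbol of a grammar derives a string in the language.

Abstract syntax trees, or simply syntax trees, differ from parse trees because superficial distinctions of form, unimportant for translation, do not appear in syntax trees. For example, for 9-5+2 the syntax tree has + at the root with children - (over 9 and 5) and 2, while the parse tree also contains the nonterminal nodes used in the derivation.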

3. Derive the string and construct a syntax tree for the input string ceaedbe using the grammar S->SaA|A, A->AbB|B, B->cSd|e.

Derivation:
S => A          (S->A)
  => AbB        (A->AbB)
  => BbB        (A->B)
  => cSdbB      (B->cSd)
  => cSaAdbB    (S->SaA)
  => cAaAdbB    (S->A)
  => cBaAdbB    (A->B)
  => ceaAdbB    (B->e)
  => ceaBdbB    (A->B)
  => ceaedbB    (B->e)
  => ceaedbe    (B->e)

4. List the factors to be considered for top-down parsing.

Top-down parsing is an attempt to find a leftmost derivation for an input string.

- A left-recursive grammar can cause a top-down parser (for example, one written as recursive procedures) to go into an infinite loop.
- Backtracking overhead may occur.
- Due to backtracking, it may reject some valid sentences.

- Left factoring
- Ambiguity
- The order in which alternates are tried can affect the language accepted.
- When failure is reported, we have very little idea where the error actually occurred.

5. Why is it necessary to generate intermediate code instead of generating the target program itself?

a. Retargeting is facilitated: a compiler for a different machine can be created by attaching a back end for the new machine to an existing front end.

b. A machine-independent code optimizer can be applied to the intermediate code to improve the generated code.

6. Define back patching

Back patching is the activity of filling in unspecified label information using appropriate semantic actions during the code generation process. The functions used in the semantic actions are mklist(i), merge_list(p1,p2) and backpatch(p,i).

Source:
if a or b then
    if c then
        x = y+1

Translation:
    if a go to L1
    if b go to L1
    go to L3
L1: if c goto L2
    goto L3
L2: x = y+1
L3:

After backpatching:
100: if a goto 103
101: if b goto 103
102: goto 106
103: if c goto 105
104: goto 106
105: x = y+1
106:
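The backpatched listing above can be reproduced with a small sketch. The function names follow mklist, merge_list and backpatch from the answer; the `code` dictionary, the `emit` helper and the use of `_` as the unfilled target are assumptions made only for this illustration.

```python
# Hypothetical sketch: jumps are emitted with "_" as an unfilled target,
# and backpatch later fills the target in.
code = {}  # instruction index -> instruction text

def emit(index, text):
    code[index] = text

def mklist(i):
    """Create a new list holding only the instruction index i."""
    return [i]

def merge_list(p1, p2):
    """Concatenate two lists of instruction indices."""
    return p1 + p2

def backpatch(p, i):
    """Fill i in as the jump target of every instruction on list p."""
    for index in p:
        code[index] = code[index].replace("_", str(i))

# if a or b then if c then x = y+1
emit(100, "if a goto _")
emit(101, "if b goto _")
emit(102, "goto _")
emit(103, "if c goto _")
emit(104, "goto _")
emit(105, "x = y+1")

outer_true = merge_list(mklist(100), mklist(101))   # a or b is true
false_exits = merge_list(mklist(102), mklist(104))  # either test fails
backpatch(outer_true, 103)    # true exits of "a or b" reach the test of c
backpatch(mklist(103), 105)   # true exit of c reaches the assignment
backpatch(false_exits, 106)   # false exits leave the statement

for index in sorted(code):
    print(f"{index}: {code[index]}")
```

The targets are unknown when the jumps are first emitted, which is why the lists of incomplete jumps are kept until the target index is available.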

7. List the issues in code generation.

- Input to the code generator
- Target programs
- Memory management
- Instruction selection
- Register allocation
- Choice of evaluation order
- Approaches to code generation

8. Write the steps for constructing leaders in basic blocks.

Leaders are the first statements of basic blocks.
- The first statement is a leader.
- Any statement that is the target of a conditional or unconditional goto is a leader.
- Any statement that immediately follows a goto or conditional goto statement is a leader.
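The three leader rules can be sketched as follows. The statement format (jump targets written as 0-based statement indices after `goto`) and the sample program are assumptions for this illustration only.

```python
import re

def find_leaders(stmts):
    """Return the sorted indices of leaders in a list of three-address statements."""
    leaders = set()
    if stmts:
        leaders.add(0)                        # rule 1: the first statement
    for i, stmt in enumerate(stmts):
        match = re.search(r"\bgoto\s+(\d+)", stmt)
        if match:
            leaders.add(int(match.group(1)))  # rule 2: target of a goto
            if i + 1 < len(stmts):
                leaders.add(i + 1)            # rule 3: statement after a goto
    return sorted(leaders)

def basic_blocks(stmts):
    """Split the statements into blocks that run from one leader up to the next."""
    bounds = find_leaders(stmts) + [len(stmts)]
    return [stmts[a:b] for a, b in zip(bounds, bounds[1:])]

program = [
    "i = 1",              # 0
    "t1 = i * 4",         # 1
    "s = s + t1",         # 2
    "i = i + 1",          # 3
    "if i <= 10 goto 1",  # 4
    "x = s",              # 5
]
print(find_leaders(program))
print(basic_blocks(program))
```

Each basic block consists of a leader and all statements up to, but not including, the next leader.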

9. What are the issues in static allocation?

- Names are bound to storage as the program is compiled, so there is no need for a run-time support package.
- The size of a data object and constraints on its position in memory must be known at compile time.
- Recursive procedures are restricted.
- Data structures cannot be created dynamically.

10. What is meant by copy-restore?

A hybrid between call-by-value and call-by-reference is copy-restore (also known as copy-in copy-out, or value-result).

a. Before control flows to the called procedure, the actual parameters are evaluated. The r-values of the actuals are passed to the called procedure as in call-by-value. In addition, the l-values of those actual parameters having l-values are determined before the call.

b. When control returns, the current r-values of the formal parameters are copied back into the l-values of the actuals, using the l-values computed before the call. Only actuals having l-values are copied.
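Steps a and b can be simulated in a few lines. This is only a sketch under stated assumptions: the dictionary `env` stands in for the caller's storage, its keys play the role of l-values, and the procedure `unsafe` is a made-up example.

```python
def call_copy_restore(proc, env, actual_names):
    """Call proc with copy-in copy-out semantics.

    env models the caller's storage; actual_names are the l-values
    (here: dictionary keys) of the actual parameters."""
    # Copy in: pass r-values; the l-values were fixed before the call.
    formals = [env[name] for name in actual_names]
    proc(formals, env)
    # Copy out: store the final r-values of the formals into the actuals.
    for name, value in zip(actual_names, formals):
        env[name] = value

def unsafe(formals, env):
    formals[0] = 2   # x := 2, where x is the formal parameter
    env["a"] = 0     # a := 0, assigning to the non-local a directly

env = {"a": 1}
call_copy_restore(unsafe, env, ["a"])
print(env["a"])
```

Under call-by-reference the assignment `a := 0` would be the last write to `a`; with copy-restore the copy-out step overwrites it with the final value of the formal.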

PART B

11.a. i. Explain the need for dividing the compilation process into various phases and explain its functions. (8)

The process of compilation is very complex, so it is customary, from both the logical and the implementation point of view, to partition it into several phases. A phase is a logically cohesive operation that takes as input one representation of the source program and produces as output another representation. (2)

The source program is a stream of characters, e.g. pos = init + rate * 60 (4)

Lexical analysis: groups characters into non-separable units, called tokens, and generates a token stream: id1 = id2 + id3 * const. The information about the identifiers must be stored somewhere (symbol table).

Syntax analysis: checks whether the token stream meets the grammatical specification of the language and generates the syntax tree.

Semantic analysis: checks whether the program has a meaning (e.g. if pos is a record and init and rate are integers, then the assignment does not make sense).

Syntax tree after syntax analysis:
:= (id1, + (id2, * (id3, 60)))

Syntax tree after semantic analysis:
:= (id1, + (id2, * (id3, inttoreal(60))))

Intermediate code generation: intermediate code is something that is both close to the final machine code and easy to manipulate (for optimization). One example is the three-address code:

dst = op1 op op2

The three-address code for the assignment statement:

temp1 = inttoreal(60)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3

Code optimization: produces better, semantically equivalent code.

temp1 = id3 * 60.0
id1 = id2 + temp1

Code generation: generates assembly.

MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1

Symbol table creation / maintenance:
- Contains information (storage, type, scope, args) on each meaningful token, typically identifiers.
- The data structure is created / initialized during lexical analysis.
- It is utilized / updated during later analysis and synthesis.

Error handling:
- Detection of different errors, which correspond to all phases.
- Each phase should know how to deal with errors, so that compilation can proceed and further errors can be detected.

Figure: phases of a compiler. Source Program -> (1) Lexical Analyzer -> (2) Syntax Analyzer -> (3) Semantic Analyzer -> (4) Intermediate Code Generator -> (5) Code Optimizer -> (6) Code Generator -> Target Program. The Symbol-table Manager and the Error Handler interact with all six phases. (2)

ii. Explain how abstract stack machines can be used as translators. (8)

The front end of a compiler constructs an intermediate representation of the source program, from which the back end generates the target program. One popular form of intermediate representation is code for an abstract stack machine.
- Arithmetic instructions
- L-values and r-values
- Stack manipulation
- Translation of expressions
- Control flow
- Translation of statements
- Emitting a translation
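The translation of an assignment into abstract stack machine code, and its execution, can be sketched as below. The instruction names `push`, `rvalue`, `lvalue` and `:=` follow the usual textbook presentation; the tuple encoding of expressions and the sample values for init and rate are assumptions for this illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def translate(node):
    """Translate an expression (number, name, or (op, left, right)) to stack code."""
    if isinstance(node, (int, float)):
        return [("push", node)]
    if isinstance(node, str):
        return [("rvalue", node)]
    op, left, right = node
    # Postfix order: code for both operands, then the operator.
    return translate(left) + translate(right) + [(op, None)]

def translate_assign(name, expr):
    """id := expr becomes: lvalue id, code for expr, :="""
    return [("lvalue", name)] + translate(expr) + [(":=", None)]

def run(code, memory):
    """Execute stack code; variable names stand in for addresses."""
    stack = []
    for instr, arg in code:
        if instr == "push":
            stack.append(arg)
        elif instr == "rvalue":
            stack.append(memory[arg])
        elif instr == "lvalue":
            stack.append(arg)
        elif instr == ":=":
            value = stack.pop()
            address = stack.pop()
            memory[address] = value
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[instr](left, right))
    return memory

code = translate_assign("pos", ("+", "init", ("*", "rate", 60)))
memory = run(code, {"init": 10, "rate": 2})
print(code)
print(memory["pos"])
```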

(OR)

b. What is syntax directed translation? How is it used for translation of expressions?

Syntax directed translation

A syntax directed translation scheme is a syntax directed definition in which the net effect of the semantic actions is to print out a translation of the input to a desired output form. This is accomplished by including emit statements in semantic actions that write out text fragments of the output, as well as string-valued attributes that compute text fragments to be fed into emit statements.

Syntax directed definition:

It specifies the translation of a construct in terms of attributes associated with its syntactic components. It uses a CFG to specify the syntactic structure of the input. With each grammar symbol it associates a set of attributes, and with each production a set of semantic rules for computing the values of the attributes associated with the symbols appearing in that production. Translation is an input-output mapping.
- Annotated parse tree
- Synthesized attributes
- Depth-first traversals
- Translation schemes
- Emitting a translation
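A translation scheme with emit actions can be sketched for the translation of infix expressions into postfix. The grammar used here (expr -> term rest, rest -> + term rest | - term rest | ε, term -> digit) and the function names are assumptions chosen for the illustration; they are not taken from the answer above.

```python
def infix_to_postfix(text):
    """Translate a string of single digits joined by + and - into postfix."""
    out = []   # text fragments written by emit
    pos = 0    # index of the lookahead character

    def emit(fragment):
        out.append(fragment)

    def term():
        nonlocal pos
        if pos < len(text) and text[pos].isdigit():
            emit(text[pos])   # term -> digit { emit(digit) }
            pos += 1
        else:
            raise SyntaxError(f"digit expected at position {pos}")

    def expr():
        nonlocal pos
        term()
        while pos < len(text) and text[pos] in "+-":
            op = text[pos]
            pos += 1
            term()
            emit(op)          # rest -> op term { emit(op) } rest
        if pos != len(text):
            raise SyntaxError(f"unexpected character at position {pos}")

    expr()
    return "".join(out)

print(infix_to_postfix("9-5+2"))
```

The operator is emitted only after the code for its right operand, which is what places it after both operands in the output.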

12.a. Given the following grammar S->AS|b, A->SA|a. Construct an SLR parsing table for the string baab.

Given grammar:
1. S->AS
2. S->b
3. A->SA
4. A->a

Augmented grammar:
S'->S
S->AS
S->b
A->SA
A->a

I0: S'->.S, S->.AS, S->.b, A->.SA, A->.a
I1: goto(I0, S): S'->S., A->S.A, A->.SA, A->.a, S->.AS, S->.b
I2: goto(I0, A): S->A.S, S->.AS, S->.b, A->.SA, A->.a
I3: goto(I0, b): S->b.
I4: goto(I0, a): A->a.
I5: goto(I1, A): A->SA., S->A.S, S->.AS, S->.b, A->.SA, A->.a
I6: goto(I1, S): A->S.A, A->.SA, A->.a, S->.AS, S->.b
I7: goto(I2, S): S->AS., A->S.A, A->.SA, A->.a, S->.AS, S->.b

goto(I1, a)=I4, goto(I1, b)=I3
goto(I2, A)=I2, goto(I2, a)=I4, goto(I2, b)=I3
goto(I5, A)=I2, goto(I5, S)=I7, goto(I5, a)=I4, goto(I5, b)=I3
goto(I6, A)=I5, goto(I6, S)=I6, goto(I6, a)=I4, goto(I6, b)=I3
goto(I7, A)=I5, goto(I7, S)=I6, goto(I7, a)=I4, goto(I7, b)=I3

First(S)={b, a}    First(A)={a, b}
Follow(S)={$, a, b}    Follow(A)={a, b}

SLR parsing table:

State   Action                      Goto
        a        b        $         S    A
0       s4       s3                 1    2
1       s4       s3       acc       6    5
2       s4       s3                 7    2
3       r2       r2       r2
4       r4       r4
5       r3/s4    r3/s3              7    2
6       s4       s3                 6    5
7       s4/r1    s3/r1    r1        6    5

States 5 and 7 contain both a shift and a reduce entry on a and b (shift-reduce conflicts).

Parsing the string baab:

Stack        Input    Action
0            baab$    shift 3
0b3          aab$     reduce by S->b
0S1          aab$     shift 4
0S1a4        ab$      reduce by A->a
0S1A5        ab$      reduce by A->SA
0A2          ab$      shift 4
0A2a4        b$       reduce by A->a
0A2A2        b$       shift 3
0A2A2b3      $        reduce by S->b
0A2A2S7      $        reduce by S->AS
0A2S7        $        reduce by S->AS
0S1          $        accept

(OR)

b. Consider the grammar E->E+T | T, T->T*F | F, F->(E)|id. Using predictive parsing, parse the string id+id*id.

Eliminating left recursion: (2)
E -> TE'
E' -> +TE' | ε
T -> FT'
T' -> *FT' | ε
F -> (E) | id

Calculation of First: (2)
First(E) = First(T) = First(F) = {(, id}
First(E') = {+, ε}
First(T') = {*, ε}

Calculation of Follow: (2)
Follow(E) = Follow(E') = {), $}
Follow(T) = Follow(T') = {+, ), $}
Follow(F) = {+, *, ), $}

Predictive parsing table: (5)

Nonterminal   id        +           *           (         )        $
E             E->TE'                            E->TE'
E'                      E'->+TE'                          E'->ε    E'->ε
T             T->FT'                            T->FT'
T'                      T'->ε       T'->*FT'              T'->ε    T'->ε
F             F->id                             F->(E)

Moves made by predictive parser on id+id*id: (5)

Stack        Input        Output
$E           id+id*id$
$E'T         id+id*id$    E->TE'
$E'T'F       id+id*id$    T->FT'
$E'T'id      id+id*id$    F->id
$E'T'        +id*id$
$E'          +id*id$      T'->ε
$E'T+        +id*id$      E'->+TE'
$E'T         id*id$
$E'T'F       id*id$       T->FT'
$E'T'id      id*id$       F->id
$E'T'        *id$
$E'T'F*      *id$         T'->*FT'
$E'T'F       id$
$E'T'id      id$          F->id
$E'T'        $
$E'          $            T'->ε
$            $            E'->ε
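The moves above can be reproduced by a table-driven sketch. The table entries are those of the predictive parsing table for this grammar; the Python encoding (tuples for production bodies, an empty tuple for ε, a pre-tokenized input list) is an assumption made for this illustration.

```python
EPS = ()  # body of an epsilon production

TABLE = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): EPS, ("E'", "$"): EPS,
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T'", "+"): EPS, ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): EPS, ("T'", "$"): EPS,
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def predictive_parse(tokens):
    """Return the productions used, in order; raise SyntaxError on bad input."""
    tokens = list(tokens) + ["$"]
    stack = ["$", "E"]   # the top of the stack is the end of the list
    output = []
    i = 0
    while stack:
        top = stack.pop()
        look = tokens[i]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:
                raise SyntaxError(f"no table entry for ({top}, {look})")
            output.append(f"{top} -> {' '.join(body) or 'epsilon'}")
            stack.extend(reversed(body))   # push the body, leftmost symbol on top
        elif top == look:
            i += 1                         # match a terminal (or the end marker)
        else:
            raise SyntaxError(f"expected {top}, found {look}")
    return output

for production in predictive_parse(["id", "+", "id", "*", "id"]):
    print(production)
```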

13.a. Explain in detail how three address codes are generated and implemented.

Three-address code is one of the intermediate representations. It is a sequence of statements of the form x := y op z, where x, y, and z are names, constants or compiler-generated temporaries and op is an operator, which can be an arithmetic or a logical operator. E.g. x+y*z is translated as t1=y*z and t2=x+t1. (4)
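The translation of x+y*z into t1=y*z and t2=x+t1 can be sketched with a post-order walk over an expression tree. The tuple encoding of the tree and the function name are assumptions for this illustration; the output is stored as (op, arg1, arg2, result) records.

```python
from itertools import count

def gen_three_address(expr):
    """Return (quads, result_name) for an expression tree.

    expr is a name, (op, left, right) or ('uminus', operand)."""
    quads = []
    temps = count(1)   # source of fresh temporary numbers

    def walk(node):
        if isinstance(node, str):
            return node                    # a name is its own address
        op, *operands = node
        args = [walk(operand) for operand in operands]
        result = f"t{next(temps)}"         # compiler-generated temporary
        arg2 = args[1] if len(args) > 1 else None
        quads.append((op, args[0], arg2, result))
        return result

    result = walk(expr)
    return quads, result

quads, result = gen_three_address(("+", "x", ("*", "y", "z")))
for op, arg1, arg2, res in quads:
    print(f"{res} = {arg1} {op} {arg2}")
```

Operands are translated before the operator that uses them, so every temporary is defined before it is used.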

The reason for the term three-address code is that each statement usually contains three addresses, two for the operands and one for the result. (2)

Common three-address statements: (4)
- x := y op z (assignment statements)
- x := op y (assignment statements)
- x := y (copy statements)
- goto L (unconditional jump)
- Conditional jumps like if x relop y goto L
- param x, call p,n and return y for procedure calls
- Indexed assignments x := y[i] and x[i] := y
- Address and pointer assignments x := &y, x := *y and *x := y

Implementation: (6)

Quadruples: record with four fields, op, arg1, arg2 and result.

Triples: record with three fields, op, arg1 and arg2, used to avoid entering temporary names into the symbol table. A temporary value is referred to by the position of the statement that computes it.

Indirect triples: list pointers to triples rather than listing the triples themselves.

For a := b * -c + b * -c

Quadruples

      Op       arg1   arg2   result
(0)   uminus   c             t1
(1)   *        b      t1     t2
(2)   uminus   c             t3
(3)   *        b      t3     t4
(4)   +        t2     t4     t5
(5)   :=       t5            a

Triples

      Op       arg1   arg2
(0)   uminus   c
(1)   *        b      (0)
(2)   uminus   c
(3)   *        b      (2)
(4)   +        (1)    (3)
(5)   assign   a      (4)

Indirect Triples

      Statement          Op       arg1   arg2
(0)   (14)        (14)   uminus   c
(1)   (15)        (15)   *        b      (14)
(2)   (16)        (16)   uminus   c
(3)   (17)        (17)   *        b      (16)
(4)   (18)        (18)   +        (15)   (17)
(5)   (19)        (19)   assign   a      (18)

(OR)

b. Explain the role of declaration statements in intermediate code generation.

When a sequence of declarations in a procedure or block is examined, storage is laid out for names local to the procedure.

Dealing with declarations in procedures: (2)

P -> procedure id ; block ;
Semantic rule:
begin = newlabel;
Enter the begin label into the symbol-table entry of the procedure name.
P.code = gen(begin ':') || block.code || gen(pop return_address) || gen(goto return_address)

S -> call id
Semantic rule:
Look up the symbol table to find the procedure name. Find its begin label, called proc_begin.
return = newlabel;
S.code = gen(push return) || gen(goto proc_begin) || gen(return ':')

Using a global variable offset

Computing the types and relative addresses of declared names: (4)

P -> M D { }
M -> ε { offset := 0 }
D -> id : T { enter(id.name, T.type, offset); offset := offset + T.width }
T -> real { T.type := real; T.width := 8 }
T -> integer { T.type := integer; T.width := 4 }
T -> array [ num ] of T1 { T.type := array(1..num.val, T1.type); T.width := num.val * T1.width }
T -> ^T1 { T.type := pointer(T1.type); T.width := 4 }
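The effect of these rules on a list of declarations can be sketched as follows. The widths (8 for real, 4 for integer and pointers) are the ones in the rules above; the tuple encoding of types and the function names are assumptions for this illustration.

```python
WIDTHS = {"real": 8, "integer": 4}

def type_width(t):
    """Width of 'real', 'integer', ('array', n, elem) or ('pointer', elem)."""
    if isinstance(t, str):
        return WIDTHS[t]
    if t[0] == "array":
        _, n, elem = t
        return n * type_width(elem)   # T.width := num.val * T1.width
    if t[0] == "pointer":
        return 4                      # T.width := 4
    raise ValueError(f"unknown type {t!r}")

def layout(declarations):
    """Assign relative addresses to (name, type) pairs in declaration order."""
    symtab = {}
    offset = 0                         # M -> ε { offset := 0 }
    for name, t in declarations:
        symtab[name] = (t, offset)     # enter(id.name, T.type, offset)
        offset += type_width(t)        # offset := offset + T.width
    return symtab, offset

symtab, total = layout([
    ("x", "real"),
    ("i", "integer"),
    ("a", ("array", 10, "integer")),
    ("p", ("pointer", "real")),
])
for name, (t, offset) in symtab.items():
    print(name, offset)
print("total width:", total)
```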

Keeping Track of Scope Information (4)

Nested Procedure Declarations

For each procedure we should create a symbol table.
- mktable(previous): creates a new symbol table, where previous is the parent symbol table of this new table, and returns a pointer to the new table.
- enter(symtable, name, type, offset): creates a new entry for a variable in the given symbol table.
- enterproc(symtable, name, newsymbtable): creates a new entry for the procedure in the symbol table of its parent.
- addwidth(symtable, width): puts the total width of all entries in the symbol table into the header of that table.

We will have two stacks:
- tblptr, to hold the pointers to the symbol tables of enclosing procedures.
- offset, to hold the current offsets in the symbol tables in the tblptr stack. The top element is the next available relative address for a local of the current procedure.

Processing declarations in nested procedures (4)

P -> M D { addwidth(top(tblptr), top(offset)); pop(tblptr); pop(offset) }
M -> ε { t := mktable(null); push(t, tblptr); push(0, offset) }
D -> D1 ; D2 ...
D -> proc id ; N D ; S { t := top(tblptr); addwidth(t, top(offset)); pop(tblptr); pop(offset); enterproc(top(tblptr), id.name, t) }
N -> ε { t := mktable(top(tblptr)); push(t, tblptr); push(0, offset) }
D -> id : T { enter(top(tblptr), id.name, T.type, top(offset)); top(offset) := top(offset) + T.width }

Field names in records (2)

T -> record L D end { T.type := record(top(tblptr)); T.width := top(offset); pop(tblptr); pop(offset) }
L -> ε { t := mktable(nil); push(t, tblptr); push(0, offset) }

14.a. Design a simple code generator and explain with example.

It generates target code for a sequence of three-address statements. (2)

Assumptions:
- For each operator in a three-address statement, there is a corresponding target language operator.
- Computed results can be left in registers as long as possible.

E.g. a = b + c: (4)
- Add Rj, Ri where Ri has b and Rj has c, with the result in Ri. Cost = 1
- Add c, Ri where Ri has b, with the result in Ri. Cost = 2
- Mov c, Rj; Add Rj, Ri. Cost = 3

Register descriptor: keeps track of what is currently in each register.
Address descriptor: keeps track of the location where the current value of a name can be found at run time. (2)

Code generation algorithm, for x = y op z: (6)
1. Invoke the function getreg to determine the location L where the result of y op z should be stored (a register or a memory location).
2. Check the address descriptor for y to determine y', the current location of y. If the value of y is not already in L, generate Mov y', L.
3. Generate the instruction op z', L where z' is the current location of z.
4. If the current values of y and/or z have no next uses, alter the register descriptor.

Getreg: (2)
1. If y is in a register that holds the value of no other names and y is not live, return the register of y for L.
2. Failing that, return an empty register.
3. Failing that, if x has a next use, find an occupied register and empty it.
4. If x is not used in the block, or no suitable register is found, select the memory location of x as L.

(OR)

b. Write short notes on:

i. Peephole optimization

Peephole optimization is a simple and effective technique for locally improving target code. It improves the performance of the target program by examining a short sequence of target instructions and replacing these instructions by a shorter or faster sequence whenever possible. The peephole is a small, moving window on the target program.
- Local in nature
- Pattern driven
- Limited by the size of the window

Characteristics of peephole optimization:
- Redundant instruction elimination
- Flow of control optimization
- Algebraic simplification
- Use of machine idioms

Constant folding:
x := 32
x := x + 32 becomes x := 64

Unreachable code: an unlabeled instruction immediately following an unconditional jump is removed.
goto L2
x := x + 1    <- unneeded

Flow of control optimizations: unnecessary jumps are eliminated.
goto L1
L1: goto L2
Here the first jump becomes goto L2.

Algebraic simplification:
x := x + 0    <- unneeded

Dead code elimination:
x := 32       <- where x is not used after the statement
y := x + y    becomes y := y + 32

Reduction in strength: replace expensive operations by equivalent cheaper ones.
x := x * 2    becomes x := x + x
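A few of these patterns can be sketched as a single pass over a list of instructions. The instruction format (`x := y op z` with single spaces, labels written as `L1: instr`) and the function name are assumptions for this illustration; only unreachable-code removal, the `x := x + 0` simplification and the `* 2` strength reduction are covered.

```python
import re

def peephole(code):
    """Apply three peephole patterns in one pass over instruction strings."""
    out = []
    for instr in code:
        labeled = re.match(r"\w+:", instr) is not None
        # Unreachable code: an unlabeled instruction right after an unconditional jump.
        if out and out[-1].startswith("goto ") and not labeled:
            continue
        # Algebraic simplification: x := x + 0 has no effect.
        match = re.fullmatch(r"(\w+) := (\w+) \+ 0", instr)
        if match and match.group(1) == match.group(2):
            continue
        # Reduction in strength: x := y * 2 becomes x := y + y.
        match = re.fullmatch(r"(\w+) := (\w+) \* 2", instr)
        if match:
            instr = f"{match.group(1)} := {match.group(2)} + {match.group(2)}"
        out.append(instr)
    return out

before = ["x := x + 0", "y := y * 2", "goto L2", "x := x + 1", "L2: z := y"]
after = peephole(before)
print(after)
```

In practice the pass is repeated, since one replacement can expose another opportunity inside the window.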

ii. Issues in code generation

- Input to the code generator: an intermediate representation of the source program, such as linear representations (postfix notation), three-address representations (quadruples), virtual machine representations (stack machine code) and graphical representations (syntax trees and dags).
- Target programs: the output, such as absolute machine language, relocatable machine language or assembly language.
- Memory management: mapping of names in the source program to addresses of data objects in run-time memory is done by the front end and the code generator.
- Instruction selection: the nature of the instruction set of the target machine determines the difficulty of instruction selection.
- Register allocation: instructions involving registers are shorter and faster. The use of registers is divided into two subproblems:
  o During register allocation, we select the set of variables that will reside in registers at a point in the program.
  o During a subsequent register assignment phase, we pick the specific register that a variable will reside in.
- Choice of evaluation order: the order in which computations are performed affects the efficiency of the target code.
- Approaches to code generation

15.a. Explain with an example how basic blocks are optimized.

Code improving transformations:
- Structure-preserving transformations
  o Common subexpression elimination
  o Dead-code elimination
- Algebraic transformations like reduction in strength

Structure-preserving transformations: (8)

They are implemented by constructing a dag for a basic block. A common subexpression can be detected by noticing, as a new node m is about to be added, whether there is an existing node n with the same children, in the same order, and with the same operator. If so, n computes the same value as m and may be used in its place.

E.g. the DAG for the basic block

d := b*c
e := a+b
b := b*c
a := e-d

has leaves a0, b0 and c0, a node * (b0, c0) labelled d and b, a node + (a0, b0) labelled e, and a root - (over the + and * nodes) labelled a.
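The DAG construction for this block can be sketched as below. A node is reused whenever one with the same operator and the same children, in the same order, already exists. The data layout (`nodes` keyed by operator and child ids, `current` mapping each name to the node holding its latest value) is an assumption for this illustration.

```python
def build_dag(block):
    """Build a DAG for a list of (target, op, left, right) statements.

    Returns (nodes, current): nodes maps a node key to its id, and
    current maps each name to the id of the node holding its value."""
    nodes = {}
    current = {}

    def operand(name):
        # The first use of a name creates a leaf for its initial value.
        if name not in current:
            key = ("leaf", name)
            nodes[key] = len(nodes)
            current[name] = nodes[key]
        return current[name]

    for target, op, left, right in block:
        key = (op, operand(left), operand(right))
        if key not in nodes:           # no existing node n with the same children
            nodes[key] = len(nodes)
        current[target] = nodes[key]   # attach the target name to the node
    return nodes, current

block = [
    ("d", "*", "b", "c"),
    ("e", "+", "a", "b"),
    ("b", "*", "b", "c"),
    ("a", "-", "e", "d"),
]
nodes, current = build_dag(block)
print(len(nodes), current)
```

Here the third statement creates no new node: b*c still refers to the initial values of b and c, so b is attached to the node already computed for d.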

For dead-code elimination, delete from a dag any root (node with no ancestors) that has no live variables. Repeated application of this will remove all nodes from the dag that correspond to dead code.

Use of algebraic identities: (8)

E.g.
x + 0 = 0 + x = x
x - 0 = x
x * 1 = 1 * x = x
x / 1 = x

Reduction in strength: replace an expensive operator by a cheaper one.
x ** 2 = x * x

Constant folding: evaluate constant expressions at compile time and replace them by their values.

Commutative and associative laws can also be used. E.g.
a = b+c
e = c+d+b

Intermediate code:
a = b+c
t = c+d
e = t+b

If t is not needed outside the block, change this to

a = b+c
e = a+d

using both the associativity and commutativity of +.

(OR)

b. Explain the storage allocation strategies used in run time environments.

- Static allocation lays out storage for all data objects at compile time.
- Stack allocation manages the run-time storage as a stack.
- Heap allocation allocates and deallocates storage as needed at run time from the heap area.

Static allocation: (4)
- Names are bound to storage at compile time.
- No need for a run-time support package.
- When a procedure is activated, its names are bound to the same storage locations.
- The compiler must decide where activation records should go.
- Limitations: sizes must be known at compile time, recursive procedures are restricted, and data structures cannot be created dynamically.

Stack allocation: (6)
- Activation records are pushed and popped as activations begin and end.
- Locals are bound to fresh storage in each activation and deleted when the activation ends.
- Call sequence and return sequence
- Caller and callee
- Dangling references

Heap allocation: (6)

Stack allocation cannot be used if either of the following is possible:
1. The values of local names must be retained when an activation ends.
2. A called activation outlives the caller.

- Allocate pieces of memory for activation records, which can be deallocated in any order.
- Maintain a linked list of free blocks.
- Fill a request for size s with a block of size s', where s' is the smallest size greater than or equal to s.
- Use a heap manager, which takes care of defragmentation and garbage collection.
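The rule of filling a request for size s from the smallest free block that is large enough can be sketched as below. A Python list of (start, length) pairs stands in for the linked list of free blocks; the function name and the splitting of the leftover space are assumptions for this illustration.

```python
def allocate(free_blocks, size):
    """Best-fit allocation from a list of (start, length) free blocks.

    Returns (start, new_free_list), or (None, free_blocks) when no block
    is large enough."""
    candidates = [block for block in free_blocks if block[1] >= size]
    if not candidates:
        return None, free_blocks
    # Choose the block whose size s' is the smallest with s' >= size.
    start, length = min(candidates, key=lambda block: block[1])
    remaining = [block for block in free_blocks if block != (start, length)]
    if length > size:
        # Return the unused tail of the chosen block to the free list.
        remaining.append((start + size, length - size))
    return start, sorted(remaining)

free = [(0, 50), (100, 20), (200, 30)]
start, free_after = allocate(free, 18)
print(start, free_after)
```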
