cs1352_may09
7/28/2019 CS1352_MAY09
MAY/JUNE-'09/CS1352-Answer Key
CS1352 Principles of Compiler Design
University Question Key
May/June 2009
PART-A
1. What are the issues to be considered in the design of lexical analyzer?
- Simpler design
- Compiler efficiency is improved
- Compiler portability is enhanced
2. Define concrete and abstract syntax with example.
An abstract syntax tree is a tree in which each interior node represents an operator and its children represent the operands. A parse tree is called a concrete syntax tree; it shows how the start symbol of a grammar derives a string in the language.
An abstract syntax tree, or simply syntax tree, differs from a parse tree because superficial distinctions of form, unimportant for translation, do not appear in the syntax tree.
3. Derive the string and construct a syntax tree for the input string ceaedbe using the grammar S->SaA|A, A->AbB|B, B->cSd|e.
Derivation:
S => A        (S->A)
  => AbB      (A->AbB)
  => BbB      (A->B)
  => cSdbB    (B->cSd)
  => cSaAdbB  (S->SaA)
  => cAaAdbB  (S->A)
  => cBaAdbB  (A->B)
  => ceaAdbB  (B->e)
  => ceaBdbB  (A->B)
  => ceaedbB  (B->e)
  => ceaedbe  (B->e)
4. List the factors to be considered for top-down parsing.
Top-down parsing is an attempt to find a leftmost derivation for an input string.
- A left-recursive grammar can cause a top-down parser, written as a set of procedures, to go into an infinite loop.
- Backtracking overhead may occur.
- Due to backtracking, it may reject some valid sentences.
http://engineerportal.blogspot.in/
- Left factoring
- Ambiguity
- The order in which alternates are tried can affect the language accepted.
- When failure is reported, we have very little idea where the error actually occurred.
5. Why is it necessary to generate intermediate code instead of generating the target program itself?
a. Retargeting is facilitated: a compiler for a different machine can be created by attaching a back end for the new machine to an existing front end.
b. A machine-independent code optimizer can be applied to the intermediate code in order to improve the generated code.
6. Define back patching.
Back patching is the activity of filling in unspecified label information using appropriate semantic actions during the code generation process. The functions used in the semantic actions are mklist(i), merge_list(p1,p2) and backpatch(p,i).
Source:
    if a or b then
        if c then
            x = y+1
Translation:
        if a goto L1
        if b goto L1
        goto L3
    L1: if c goto L2
        goto L3
    L2: x = y+1
    L3:
After backpatching:
    100: if a goto 103
    101: if b goto 103
    102: goto 106
    103: if c goto 105
    104: goto 106
    105: x = y+1
    106:
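The numbered listing above can be reproduced with a small sketch of the three helper functions. This is only an illustration: the `emit` and `listing` helpers, the `code` list and the base address 100 are assumptions made for the example, and the call sequence is written by hand rather than driven by a parser.

```python
# Illustrative sketch of mklist / merge_list / backpatch (helper names
# emit, listing and the code list are hypothetical).
code = []          # each entry is [text, target]; target None = not yet known
BASE = 100         # address of the first instruction, as in the listing above

def emit(text, target=None):
    code.append([text, target])
    return len(code) - 1            # index of the instruction just emitted

def mklist(i):
    return [i]                      # new list holding one unfilled jump

def merge_list(p1, p2):
    return p1 + p2                  # concatenate two lists of unfilled jumps

def backpatch(p, target):
    for i in p:                     # fill in the target of every jump in p
        code[i][1] = target

# if a or b then if c then x = y+1
a_true = mklist(emit("if a goto"))
b_true = mklist(emit("if b goto"))
ab_false = mklist(emit("goto"))
backpatch(merge_list(a_true, b_true), BASE + len(code))     # start of "if c"
c_true = mklist(emit("if c goto"))
c_false = mklist(emit("goto"))
backpatch(c_true, BASE + len(code))                         # the assignment
emit("x = y+1")
backpatch(merge_list(ab_false, c_false), BASE + len(code))  # statement after the if

def listing():
    return [f"{BASE + i}: {text}" + (f" {tgt}" if tgt is not None else "")
            for i, (text, tgt) in enumerate(code)]

for line in listing():
    print(line)
```

The jumps are emitted with empty targets first and filled in only once the address of the destination is known, which is the point of backpatching.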
7. List the issues in code generation.
- Input to the code generator
- Target programs
- Memory management
- Instruction selection
- Register allocation
- Choice of evaluation order
- Approaches to code generation
8. Write the steps for constructing leaders in basic blocks.
Leaders: the first statements of basic blocks.
- The first statement is a leader.
- Any statement that is the target of a conditional or unconditional goto is a leader.
- Any statement that immediately follows a goto or conditional goto statement is a leader.
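The three leader rules can be sketched as follows. The instruction format (a list of `(label, text)` pairs with jumps written as `goto Lx`) and the sample program are assumptions made for this illustration.

```python
def find_leaders(instrs):
    """instrs: list of (label_or_None, text). Returns sorted leader indices."""
    labels = {lab: i for i, (lab, _) in enumerate(instrs) if lab}
    leaders = {0} if instrs else set()          # rule 1: first statement
    for i, (_, text) in enumerate(instrs):
        if "goto" in text:
            target = text.split("goto")[1].strip()
            if target in labels:
                leaders.add(labels[target])     # rule 2: target of a goto
            if i + 1 < len(instrs):
                leaders.add(i + 1)              # rule 3: follows a goto
    return sorted(leaders)

def basic_blocks(instrs):
    """Each block runs from a leader up to (not including) the next leader."""
    leaders = find_leaders(instrs)
    bounds = leaders + [len(instrs)]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(len(leaders))]

# Hypothetical three-address program containing one loop.
PROGRAM = [
    (None, "i = 1"),
    ("L1", "t1 = i * 4"),
    (None, "i = i + 1"),
    (None, "if i <= 10 goto L1"),
    (None, "x = t1"),
]

print(find_leaders(PROGRAM))
print(basic_blocks(PROGRAM))
```

For this program the leaders are statements 0, 1 and 4, giving three basic blocks.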
9. What are the issues in static allocation?
- Names are bound to storage as the program is compiled, so there is no need for a run-time support package.
- The size of a data object and constraints on its position in memory must be known at compile time.
- Recursive procedures are restricted.
- Data structures cannot be created dynamically.
10. What is meant by copy-restore?
A hybrid between call-by-value and call-by-reference is copy-restore (also known as copy-in copy-out, or value-result).
a. Before control flows to the called procedure, the actual parameters are evaluated. The r-values of the actuals are passed to the called procedure as in call-by-value. In addition, the l-values of those actual parameters having l-values are determined before the call.
b. When control returns, the current r-values of the formal parameters are copied back into the l-values of the actuals, using the l-values computed before the call. Only actuals having l-values are copied.
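A small simulation may make the difference from call-by-reference concrete. Python itself does not pass parameters this way, so the sketch models l-values as names in a dictionary `env`; the helper `call_copy_restore` and the procedures `incr_both` and `swap` are hypothetical names introduced for the example.

```python
def call_copy_restore(proc, env, actual_names):
    """Simulate copy-restore: env maps variable names (l-values) to values."""
    # Copy in: pass the r-values of the actuals, as in call-by-value.
    formals = [env[name] for name in actual_names]
    # The procedure works on its own copies and returns their final values.
    results = proc(*formals)
    # Copy out: store the formals back through the l-values fixed before the call.
    for name, value in zip(actual_names, results):
        env[name] = value

def incr_both(x, y):
    x += 1
    y += 1
    return x, y

def swap(x, y):
    return y, x

env = {"a": 1, "p": 10, "q": 20}
call_copy_restore(incr_both, env, ["a", "a"])   # both formals alias a
call_copy_restore(swap, env, ["p", "q"])
print(env)
```

With both formals bound to `a`, each copy is incremented once and `a` ends up as 2; under call-by-reference the same call would increment `a` twice.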
PART B
11.a. i. Explain the need for dividing the compilation process into various phases and explain its functions. (8)
The process of compilation is very complex, so it is customary, from both the logical and the implementation point of view, to partition the compilation process into several phases. A phase is a logically cohesive operation that takes as input one representation of the source program and produces as output another representation. (2)
The source program is a stream of characters, e.g. pos = init + rate * 60 (4)
- Lexical analysis: groups characters into non-separable units, called tokens, and generates the token stream: id1 = id2 + id3 * const. The information about the identifiers must be stored somewhere (symbol table).
- Syntax analysis: checks whether the token stream meets the grammatical specification of the language and generates the syntax tree.
- Semantic analysis: checks whether the program has a meaning (e.g. if pos is a record and init and rate are integers, then the assignment does not make sense).
Tree produced by syntax analysis:
    :=
   /  \
 id1   +
      / \
   id2   *
        / \
     id3   60
Tree after semantic analysis:
    :=
   /  \
 id1   +
      / \
   id2   *
        / \
     id3   inttoreal
              |
              60
- Intermediate code generation: intermediate code is something that is both close to the final machine code and easy to manipulate (for optimization). One example is the three-address code:
    dst = op1 op op2
  The three-address code for the assignment statement:
    temp1 = inttoreal(60)
    temp2 = id3 * temp1
    temp3 = id2 + temp2
    id1 = temp3
- Code optimization: produces better, semantically equivalent code.
    temp1 = id3 * 60.0
    id1 = id2 + temp1
- Code generation: generates assembly.
    MOVF id3, R2
    MULF #60.0, R2
    MOVF id2, R1
    ADDF R2, R1
    MOVF R1, id1
Symbol table creation / maintenance
- Contains information (storage, type, scope, args) on each meaningful token, typically identifiers.
- The data structure is created and initialized during lexical analysis, and utilized and updated during later analysis and synthesis.
Error handling
- Detection of different errors, which correspond to all phases.
- Each phase should know how to deal with an error so that compilation can proceed, allowing further errors to be detected.
Figure (phases of a compiler): Source Program -> (1) Lexical Analyzer -> (2) Syntax Analyzer -> (3) Semantic Analyzer -> (4) Intermediate Code Generator -> (5) Code Optimizer -> (6) Code Generator -> Target Program. The Symbol-table Manager and the Error Handler interact with all six phases. (2)
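As a rough illustration of the lexical-analysis phase described in this answer, the sketch below turns pos = init + rate * 60 into the token stream id1 = id2 + id3 * const and records the identifiers in a symbol table. The regular expression, the function name `tokenize` and the dictionary-based symbol table are assumptions made for the example; a real lexer handles many more token classes.

```python
import re

# One token per match: an identifier, an integer constant, or an operator.
TOKEN = re.compile(r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<op>[=+*]))")

def tokenize(src):
    """Return (token stream, symbol table) for a tiny assignment language."""
    symtab = {}                 # identifier name -> entry number
    tokens = []
    src = src.rstrip()
    pos = 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        pos = m.end()
        if m.group("id"):
            # First occurrence creates a new symbol-table entry.
            entry = symtab.setdefault(m.group("id"), len(symtab) + 1)
            tokens.append(f"id{entry}")
        elif m.group("num"):
            tokens.append("const")
        else:
            tokens.append(m.group("op"))
    return tokens, symtab

tokens, symtab = tokenize("pos = init + rate * 60")
print(" ".join(tokens))
print(symtab)
```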
ii. Explain how abstract stack machines can be used as translators. (8)
The front end of a compiler constructs an intermediate representation of the source program, from which the back end generates the target program. One popular form of intermediate representation is code for an abstract stack machine. Points to cover:
- Arithmetic instructions
- L-values and r-values
- Stack manipulation
- Translation of expressions
- Control flow
- Translation of statements
- Emitting a translation
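A minimal interpreter for such a machine might look like the sketch below. The instruction names (`push`, `rvalue`, `lvalue`, `+`, `*`, `:=`) follow the usual textbook presentation, but the exact instruction set and the dictionary used as data memory are assumptions made for this illustration.

```python
def run(program, memory):
    """Execute abstract stack machine code; memory maps names to values."""
    stack = []
    for instr in program:
        op, *arg = instr.split()
        if op == "push":            # push a constant
            stack.append(int(arg[0]))
        elif op == "rvalue":        # push the contents of a data location
            stack.append(memory[arg[0]])
        elif op == "lvalue":        # push the address (here: the name) of a location
            stack.append(arg[0])
        elif op == "+":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "*":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        elif op == ":=":            # store the r-value on top into the l-value below it
            value = stack.pop()
            name = stack.pop()
            memory[name] = value
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return memory

# Stack machine code for: pos := init + rate * 60
PROGRAM = [
    "lvalue pos",
    "rvalue init",
    "rvalue rate",
    "push 60",
    "*",
    "+",
    ":=",
]

print(run(PROGRAM, {"init": 10, "rate": 2}))
```

The code for an expression is just its postfix form, which is why translating expressions for a stack machine is straightforward.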
(OR)
b. What is syntax directed translation? How is it used for translation of expressions?
Syntax directed translation
A syntax directed translation scheme is a syntax directed definition in which the net effect of the semantic actions is to print out a translation of the input in a desired output form. This is accomplished by including emit statements in semantic actions that write out text fragments of the output, as well as string-valued attributes that compute text fragments to be fed into emit statements.
Syntax directed definition:
It specifies the translation of a construct in terms of attributes associated with its syntactic components. It uses a CFG to specify the syntactic structure of the input. With each grammar symbol it associates a set of attributes, and with each production a set of semantic rules for computing the values of the attributes associated with the symbols appearing in that production. Translation is an input-output mapping. Points to cover:
- Annotated parse tree
- Synthesized attributes
- Depth-first traversals
- Translation schemes
- Emitting a translation
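To show how emit actions embedded in a grammar translate expressions, here is a small recursive-descent sketch that turns infix into postfix. The grammar (expr, term, factor with +, -, *, / and parentheses), the token format and the function name `to_postfix` are assumptions chosen for the illustration; the emit actions are the `out.append(...)` calls.

```python
def to_postfix(tokens):
    """Translate a list of infix tokens to a postfix string."""
    out = []        # text fragments written by the emit actions
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():               # factor -> ( expr ) | operand { emit(operand) }
        tok = peek()
        if tok == "(":
            advance()
            expr()
            if peek() != ")":
                raise SyntaxError("expected )")
            advance()
        elif tok is not None and tok.isalnum():
            advance()
            out.append(tok)
        else:
            raise SyntaxError(f"unexpected token: {tok}")

    def term():                 # term -> factor { * factor { emit(*) } }
        factor()
        while peek() in ("*", "/"):
            op = peek()
            advance()
            factor()
            out.append(op)

    def expr():                 # expr -> term { + term { emit(+) } }
        term()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            term()
            out.append(op)

    expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token: {peek()}")
    return " ".join(out)

print(to_postfix("id + id * id".split()))
print(to_postfix(["(", "a", "+", "b", ")", "*", "c"]))
```

Each operator is emitted after both of its operands have been translated, which is exactly the order of a depth-first traversal of the parse tree.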
12.a. Given the following grammar S->AS|b, A->SA|a. Construct a SLR parsing table for the string baab.
Given grammar:
1. S->AS
2. S->b
3. A->SA
4. A->a
Augmented grammar:
S'->S
S->AS
S->b
A->SA
A->a
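Canonical collection of LR(0) items:
I0:               S'->.S, S->.AS, S->.b, A->.SA, A->.a
I1: goto(I0, S):  S'->S., A->S.A, A->.SA, A->.a, S->.AS, S->.b
I2: goto(I0, A):  S->A.S, S->.AS, S->.b, A->.SA, A->.a
I3: goto(I0, b):  S->b.
I4: goto(I0, a):  A->a.
I5: goto(I1, A):  A->SA., S->A.S, S->.AS, S->.b, A->.SA, A->.a
I6: goto(I1, S):  A->S.A, A->.SA, A->.a, S->.AS, S->.b
I7: goto(I2, S):  S->AS., A->S.A, A->.SA, A->.a, S->.AS, S->.b
Remaining transitions:
goto(I1, a)=I4  goto(I1, b)=I3
goto(I2, A)=I2  goto(I2, a)=I4  goto(I2, b)=I3
goto(I5, A)=I2  goto(I5, S)=I7  goto(I5, a)=I4  goto(I5, b)=I3
goto(I6, A)=I5  goto(I6, S)=I6  goto(I6, a)=I4  goto(I6, b)=I3
goto(I7, A)=I5  goto(I7, S)=I6  goto(I7, a)=I4  goto(I7, b)=I3
First(S)={b, a}      First(A)={a, b}
Follow(S)={$, a, b}  Follow(A)={a, b}
SLR parsing table:
States |        Action        |  Goto
       | a      b      $      |  S   A
0      | s4     s3            |  1   2
1      | s4     s3     acc    |  6   5
2      | s4     s3            |  7   2
3      | r2     r2     r2     |
4      | r4     r4            |
5      | r3/s4  r3/s3         |  7   2
6      | s4     s3            |  6   5
7      | r1/s4  r1/s3  r1     |  6   5
States 5 and 7 have two entries (shift and reduce) on a and b, so the table contains shift/reduce conflicts and the grammar is not SLR(1). The parse of baab takes the reduce action in state 5.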
Parsing the string baab:
Stack      Input   Action
0          baab$   shift 3
0b3        aab$    reduce by S->b
0S1        aab$    shift 4
0S1a4      ab$     reduce by A->a
0S1A5      ab$     reduce by A->SA
0A2        ab$     shift 4
0A2a4      b$      reduce by A->a
0A2A2      b$      shift 3
0A2A2b3    $       reduce by S->b
0A2A2S7    $       reduce by S->AS
0A2S7      $       reduce by S->AS
0S1        $       accept
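The item sets used for the SLR table can be computed mechanically with the closure and goto operations. The sketch below is one way to do that for this grammar; the representation of an item as a `(production index, dot position)` pair and the function names are assumptions made for the example. States are numbered in order of discovery, so numbers beyond I4 need not match the numbering used in the answer.

```python
# Productions of the augmented grammar; index 0 is S' -> S.
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("A", "S")),
    ("S", ("b",)),
    ("A", ("S", "A")),
    ("A", ("a",)),
]
NONTERMS = {"S'", "S", "A"}

def closure(items):
    """Add B -> .gamma for every nonterminal B that appears right after a dot."""
    items = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in items:
                    items.add((i, 0))
                    work.append((i, 0))
    return frozenset(items)

def goto(items, symbol):
    """Move the dot over symbol in every item where that is possible."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else None

def canonical_collection():
    states = [closure({(0, 0)})]
    transitions = {}
    for state in states:                    # the list grows while we iterate
        for symbol in ("S", "A", "b", "a"):
            target = goto(state, symbol)
            if target is None:
                continue
            if target not in states:
                states.append(target)
            transitions[(states.index(state), symbol)] = states.index(target)
    return states, transitions

states, transitions = canonical_collection()
print(len(states), "item sets")
for (src, sym), dst in sorted(transitions.items()):
    print(f"goto(I{src}, {sym}) = I{dst}")
```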
(OR)
b. Consider the grammar E->E+T | T, T->T*F | F, F->(E)|id. Using predictive parsing, parse the string id+id*id.
Eliminating left recursion: (2)
E  -> TE'
E' -> +TE' | ε
T  -> FT'
T' -> *FT' | ε
F  -> (E) | id
Calculation of First: (2)
First(E) = First(T) = First(F) = {(, id}
First(E') = {+, ε}
First(T') = {*, ε}
Calculation of Follow: (2)
Follow(E) = Follow(E') = {), $}
Follow(T) = Follow(T') = {+, ), $}
Follow(F) = {+, *, ), $}
Predictive parsing table: (5)
Non-      |                      Input symbol
terminal  | id       +          *          (        )       $
E         | E->TE'                         E->TE'
E'        |          E'->+TE'                       E'->ε   E'->ε
T         | T->FT'                         T->FT'
T'        |          T'->ε      T'->*FT'            T'->ε   T'->ε
F         | F->id                          F->(E)
Moves made by predictive parser on id+id*id: (5)
Stack       Input        Output
$E          id+id*id$
$E'T        id+id*id$    E->TE'
$E'T'F      id+id*id$    T->FT'
$E'T'id     id+id*id$    F->id
$E'T'       +id*id$
$E'         +id*id$      T'->ε
$E'T+       +id*id$      E'->+TE'
$E'T        id*id$
$E'T'F      id*id$       T->FT'
$E'T'id     id*id$       F->id
$E'T'       *id$
$E'T'F*     *id$         T'->*FT'
$E'T'F      id$
$E'T'id     id$          F->id
$E'T'       $
$E'         $            T'->ε
$           $            E'->ε
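The moves above can be reproduced with a short table-driven parser. This is a sketch: the dictionary encoding of the parsing table, the empty tuple standing for ε, and the function name `parse` are choices made for the example.

```python
EPS = ()    # empty right-hand side (epsilon)

# Predictive parsing table: (nonterminal, lookahead) -> right-hand side.
TABLE = {
    ("E", "id"): ("T", "E'"),       ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"),  ("E'", ")"): EPS,  ("E'", "$"): EPS,
    ("T", "id"): ("F", "T'"),       ("T", "("): ("F", "T'"),
    ("T'", "+"): EPS,               ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): EPS,               ("T'", "$"): EPS,
    ("F", "id"): ("id",),           ("F", "("): ("(", "E", ")"),
}
NONTERMS = {"E", "E'", "T", "T'", "F"}

def parse(tokens):
    """Return the list of productions applied; raise SyntaxError on bad input."""
    tokens = list(tokens) + ["$"]
    stack = ["$", "E"]              # start symbol on top of the end marker
    pos = 0
    output = []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                raise SyntaxError(f"no table entry for ({top}, {look})")
            output.append(f"{top} -> {' '.join(rhs) or 'ε'}")
            stack.extend(reversed(rhs))     # leftmost symbol ends up on top
        elif top == look:
            pos += 1                        # match a terminal (or the final $)
        else:
            raise SyntaxError(f"expected {top}, found {look}")
    return output

for production in parse(["id", "+", "id", "*", "id"]):
    print(production)
```

The productions are printed in the same order as the Output column of the trace.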
13.a. Explain in detail how three address codes are generated and implemented.
Three-address code is one of the intermediate representations. It is a sequence of statements of the form x := y op z, where x, y, and z are names, constants or compiler-generated temporaries, and op is an operator, which can be an arithmetic or a logical operator. E.g. x+y*z is translated as t1=y*z and t2=x+t1. (4)
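The translation of x+y*z into t1 and t2 can be sketched as a bottom-up walk over the expression tree. The tuple-based tree format and the names `gen_tac` and `translate` are assumptions made for this illustration.

```python
import itertools

def gen_tac(node, code, temps):
    """node is a name (str) or a tuple (op, left, right).
    Appends quadruples (op, arg1, arg2, result) to code and returns
    the name that holds the value of node."""
    if isinstance(node, str):
        return node
    op, left, right = node
    arg1 = gen_tac(left, code, temps)
    arg2 = gen_tac(right, code, temps)
    result = f"t{next(temps)}"          # fresh compiler-generated temporary
    code.append((op, arg1, arg2, result))
    return result

def translate(node):
    code = []
    gen_tac(node, code, itertools.count(1))
    return [f"{res} = {a1} {op} {a2}" for op, a1, a2, res in code]

# x + y * z
for stmt in translate(("+", "x", ("*", "y", "z"))):
    print(stmt)
```

Each interior node of the tree yields exactly one statement with at most three addresses: two operands and one result.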
The reason for the term three-address code is that each statement usually contains three addresses, two for the operands and one for the result. (2)
Common three address statements: (4)
- x:=y op z (assignment statements)
- x:= op y (assignment statements)
- x:=y (copy statements)
- goto L (unconditional jump)
- Conditional jumps like if x relop y goto L
- param x, call p,n and return y for procedure calls
- Indexed assignments x:=y[i] and x[i]:=y
- Address and pointer assignments x:=&y, x:=*y and *x:=y
Implementation: (6)
- Quadruples: record with four fields, op, arg1, arg2 and result.
- Triples: record with three fields, op, arg1 and arg2, to avoid entering temporary names into the symbol table. Here, a temporary value is referred to by the position of the statement that computes it.
- Indirect triples: list pointers to triples rather than listing the triples themselves.
For a := b * -c + b * -c
Quadruples:
      Op      arg1  arg2  result
(0)   uminus  c           t1
(1)   *       b     t1    t2
(2)   uminus  c           t3
(3)   *       b     t3    t4
(4)   +       t2    t4    t5
(5)   :=      t5          a
Triples:
      Op      arg1  arg2
(0)   uminus  c
(1)   *       b     (0)
(2)   uminus  c
(3)   *       b     (2)
(4)   +       (1)   (3)
(5)   assign  a     (4)
Indirect triples:
      Statement          Op      arg1  arg2
(0)   (14)         (14)  uminus  c
(1)   (15)         (15)  *       b     (14)
(2)   (16)         (16)  uminus  c
(3)   (17)         (17)  *       b     (16)
(4)   (18)         (18)  +       (15)  (17)
(5)   (19)         (19)  assign  a     (18)
(OR)
b. Explain the role of declaration statements in intermediate code generation.
When a sequence of declarations in a procedure or block is examined, lay out the storage for names local to the procedure.
Dealing with declarations in procedures:
P -> procedure id ; block ;
Semantic rule: (2)
begin = newlabel;
Enter the begin label into the symbol-table entry of the procedure name.
P.code = gen(begin ':') || block.code || gen(pop return_address) || gen(goto return_address)
S -> call id
Semantic rule:
Look up the symbol table to find the procedure name. Find its begin label, called proc_begin.
return = newlabel;
S.code = gen(push return) || gen(goto proc_begin) || gen(return ':')
Using a global variable offset:
Computing the types and relative addresses of declared names: (4)
P -> M D { }
M -> ε { offset := 0 }
D -> id : T { enter(id.name, T.type, offset); offset := offset + T.width }
T -> real { T.type := real; T.width := 8 }
T -> integer { T.type := integer; T.width := 4 }
T -> array [ num ] of T1 { T.type := array(1..num.val, T1.type); T.width := num.val * T1.width }
T -> ^T1 { T.type := pointer(T1.type); T.width := 4 }
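The effect of these semantic actions is to give each declared name the next free relative address. A sketch, using the widths from the scheme above (integer 4, real 8, pointer 4); the tuple encoding of types and the function names are assumptions made for the example.

```python
WIDTH = {"integer": 4, "real": 8}

def type_width(t):
    """t is 'integer', 'real', ('array', n, elem_type) or ('pointer', elem_type)."""
    if isinstance(t, str):
        return WIDTH[t]
    if t[0] == "array":
        return t[1] * type_width(t[2])      # num.val * T1.width
    if t[0] == "pointer":
        return 4
    raise ValueError(f"unknown type: {t}")

def layout(decls):
    """decls: list of (name, type). Returns (symbol table, total width)."""
    symtab = {}
    offset = 0                              # M -> ε { offset := 0 }
    for name, t in decls:
        symtab[name] = (t, offset)          # enter(id.name, T.type, offset)
        offset += type_width(t)             # offset := offset + T.width
    return symtab, offset

DECLS = [
    ("i", "integer"),
    ("x", "real"),
    ("a", ("array", 10, "integer")),
    ("p", ("pointer", "real")),
]
symtab, total = layout(DECLS)
for name, (t, off) in symtab.items():
    print(name, off)
print("total width:", total)
```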
Keeping track of scope information (4)
Nested procedure declarations
For each procedure we should create a symbol table.
- mktable(previous): creates a new symbol table, where previous is the parent symbol table of this new table, and returns a pointer to the new table.
- enter(symtable, name, type, offset): creates a new entry for a variable in the given symbol table.
- enterproc(symtable, name, newsymbtable): creates a new entry for the procedure in the symbol table of its parent.
- addwidth(symtable, width): puts the total width of all entries in the symbol table into the header of that table.
We will have two stacks:
- tblptr: holds the pointers to the symbol tables of the enclosing procedures.
- offset: holds the current offsets in the symbol tables on the tblptr stack. Its top element is the next available relative address for a local of the current procedure.
Processing declarations in nested procedures (4)
P -> M D { addwidth(top(tblptr), top(offset)); pop(tblptr); pop(offset) }
M -> ε { t := mktable(null); push(t, tblptr); push(0, offset) }
D -> D1 ; D2 ...
D -> proc id ; N D ; S { t := top(tblptr); addwidth(t, top(offset)); pop(tblptr); pop(offset); enterproc(top(tblptr), id.name, t) }
N -> ε { t := mktable(top(tblptr)); push(t, tblptr); push(0, offset) }
D -> id : T { enter(top(tblptr), id.name, T.type, top(offset)); top(offset) := top(offset) + T.width }
Field names in records (2)
T -> record L D end { T.type := record(top(tblptr)); T.width := top(offset); pop(tblptr); pop(offset) }
L -> ε { t := mktable(nil); push(t, tblptr); push(0, offset) }
14.a. Design a simple code generator and explain with example.
It generates target code for a sequence of three address statements. (2)
Assumptions:
- For each operator in a three address statement, there is a corresponding target language operator.
- Computed results can be left in registers as long as possible.
E.g. a=b+c: (4)
- Add Rj, Ri where Ri has b and Rj has c, and the result is in Ri. Cost=1
- Add c, Ri where Ri has b, and the result is in Ri. Cost=2
- Mov c, Rj; Add Rj, Ri. Cost=3
Register descriptor: keeps track of what is currently in each register.
Address descriptor: keeps track of the location where the current value of the name can be found at run time. (2)
Code generation algorithm: for x = y op z (6)
1. Invoke the function getreg to determine the location L where the result of y op z should be stored (register or memory location).
2. Check the address descriptor for y to determine y', the current location of y. If the value of y is not already in L, generate Mov y', L.
3. Generate the instruction op z', L where z' is the current location of z.
4. If the current values of y and/or z have no next uses, alter the register descriptor.
Getreg: (2)
1. If y is in a register that holds the values of no other names and y is not live, return the register of y for L.
2. If that fails, return an empty register.
3. If that fails and x has a next use, find an occupied register and empty it.
4. If x is not used in the block, or no suitable register is found, select the memory location of x as L.
(OR)
b. Write short notes on:
i. Peephole optimization
Peephole optimization is a simple and effective technique for locally improving target code. It improves the performance of the target program by examining a short sequence of target instructions and replacing these instructions by a shorter or faster sequence whenever possible.
The peephole is a small, moving window on the target program. The technique is:
- Local in nature
- Pattern driven
- Limited by the size of the window
Characteristics of peephole optimization:
- Redundant instruction elimination
- Flow of control optimization
- Algebraic simplification
- Use of machine idioms
Constant folding:
    x := 32
    x := x + 32    becomes    x := 64
Unreachable code: an unlabeled instruction immediately following an unconditional jump is removed.
    goto L2
    x := x + 1     (unneeded)
Flow of control optimizations: unnecessary jumps are eliminated.
    goto L1
    L1: goto L2    becomes    goto L2
Algebraic simplification:
    x := x + 0     (unneeded)
Dead code elimination:
    x := 32        (where x is not used after the statement)
    y := x + y     becomes    y := y + 32
Reduction in strength: replace expensive operations by equivalent cheaper ones.
    x := x * 2     becomes    x := x + x
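Three of these patterns (jump to a jump, x := x + 0, and unreachable code after an unconditional jump) can be sketched as a single pass over the instruction list. The textual instruction format and the function name `peephole` are assumptions made for the example, and the pass is deliberately conservative: it follows only one level of jump-to-jump and only treats an unlabeled `goto` as the trigger for removing the following instruction.

```python
import re

def peephole(code):
    """code: list of strings such as 'goto L1', 'L1: goto L2', 'x := x + 0'."""
    # Labels whose instruction is itself an unconditional jump.
    jump_to = {}
    for line in code:
        m = re.fullmatch(r"(\w+): goto (\w+)", line)
        if m:
            jump_to[m.group(1)] = m.group(2)

    out = []
    for line in code:
        # Flow of control: goto L1 where L1: goto L2 becomes goto L2.
        m = re.fullmatch(r"goto (\w+)", line)
        if m and m.group(1) in jump_to:
            line = f"goto {jump_to[m.group(1)]}"
        # Algebraic simplification: x := x + 0 is dropped.
        if re.fullmatch(r"(\w+) := \1 \+ 0", line):
            continue
        # Unreachable code: unlabeled instruction right after an unconditional jump.
        labelled = re.match(r"\w+:\s", line) is not None
        if out and re.fullmatch(r"goto \w+", out[-1]) and not labelled:
            continue
        out.append(line)
    return out

BEFORE = [
    "goto L1",
    "x := x + 1",
    "L1: goto L2",
    "y := y + 0",
    "L2: z := 1",
]
for line in peephole(BEFORE):
    print(line)
```

Repeating the pass until nothing changes would pick up further opportunities exposed by earlier rewrites.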
ii. Issues in code generation
- Input to the code generator: an intermediate representation of the source program, such as linear representations like postfix notation, three address representations like quadruples, virtual machine representations like stack machine code, and graphical representations like syntax trees and dags.
- Target programs: the output, such as absolute machine language, relocatable machine language or assembly language.
- Memory management: mapping of names in the source program to addresses of data objects in run-time memory is done by the front end and the code generator.
- Instruction selection: the nature of the instruction set of the target machine determines the difficulty of instruction selection.
- Register allocation: instructions involving registers are shorter and faster. The use of registers is divided into two subproblems:
  o During register allocation, we select the set of variables that will reside in registers at a point in the program.
  o During a subsequent register assignment phase, we pick the specific register that a variable will reside in.
- Choice of evaluation order: the order in which computations are performed affects the efficiency of the target code.
- Approaches to code generation
15.a. Explain with an example how basic blocks are optimized.
Code improving transformations:
- Structure-preserving transformations
  o Common subexpression elimination
  o Dead-code elimination
- Algebraic transformations like reduction in strength.
Structure preserving transformations: (8)
These are implemented by constructing a dag for a basic block. A common subexpression can be detected by noticing, as a new node m is about to be added, whether there is an existing node n with the same children, in the same order, and with the same operator. If so, n computes the same value as m and may be used in its place.
E.g. the DAG for the basic block
    d:=b*c
    e:=a+b
    b:=b*c
    a:=e-d
is given by: a * node over the leaves b0 and c0 (labelled d and b), a + node over the leaves a0 and b0 (labelled e), and a - node over the + node and the * node (labelled a).
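The node-sharing rule described above can be sketched in a few lines. The quadruple-like input format `(target, left, op, right)` and the function name `build_dag` are assumptions made for this illustration.

```python
def build_dag(block):
    """block: list of (target, left, op, right).
    Returns (nodes, labels): nodes[i] is ('leaf', name) or (op, left_id, right_id);
    labels maps each variable to the node currently holding its value."""
    nodes = []
    index = {}      # node description -> node id, used to detect existing nodes
    labels = {}

    def node_for(name):
        # First use of a name inside the block creates a leaf for its initial value.
        if name not in labels:
            key = ("leaf", name)
            index[key] = len(nodes)
            nodes.append(key)
            labels[name] = index[key]
        return labels[name]

    for target, left, op, right in block:
        key = (op, node_for(left), node_for(right))
        if key not in index:            # no node with same operator and children
            index[key] = len(nodes)
            nodes.append(key)
        labels[target] = index[key]     # attach the target to the (possibly shared) node
    return nodes, labels

BLOCK = [
    ("d", "b", "*", "c"),
    ("e", "a", "+", "b"),
    ("b", "b", "*", "c"),
    ("a", "e", "-", "d"),
]
nodes, labels = build_dag(BLOCK)
for i, node in enumerate(nodes):
    print(i, node)
print(labels)
```

For the example block, the third statement finds the existing b0*c0 node, so d and b end up labelling the same node and no new node is created for the repeated computation.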
For dead-code elimination, delete from the dag any root (node with no ancestors) that has no live variables. Repeated application of this will remove all nodes from the dag that correspond to dead code.
Use of algebraic identities: (8)
e.g. x+0 = 0+x = x
     x-0 = x
     x*1 = 1*x = x
     x/1 = x
Reduction in strength: replace an expensive operator by a cheaper one.
     x ** 2 = x * x
Constant folding: evaluate constant expressions at compile time and replace them by their values.
Can use commutative and associative laws. E.g.
     a=b+c
     e=c+d+b
IC:  a=b+c
     t=c+d
     e=t+b
If t is not needed outside the block, change this to
     a=b+c
     e=a+d
using both the associativity and commutativity of +.
(OR)
b. Explain the storage allocation strategies used in run time environments.
- Static allocation lays out storage for all data objects at compile time.
- Stack allocation manages the run-time storage as a stack.
- Heap allocation allocates and deallocates storage as needed at run time from a heap area.
Static allocation: (4)
- Names are bound to storage at compile time.
- No need for a run-time support package.
- Each time a procedure is activated, its names are bound to the same storage locations.
- The compiler must decide where activation records should go.
Limitations:
- Size must be known at compile time.
- Recursive procedures are restricted.
- Data structures cannot be created dynamically.
Stack allocation: (6)
- Activation records are pushed and popped as activations begin and end.
- Locals are bound to fresh storage in each activation and deleted when the activation ends.
- Call sequence and return sequence
- Caller and callee
- Dangling references
Heap allocation: (6)
Stack allocation cannot be used if either of the following is possible:
1. The values of local names must be retained when an activation ends.
2. A called activation outlives the caller.
- Allocate pieces of memory for activation records, which can be deallocated in any order.
- Maintain a linked list of free blocks.
- Fill a request for size s with a block of size s', where s' is the smallest size greater than or equal to s.
- Use a heap manager, which takes care of defragmentation and garbage collection.
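The rule of filling a request for size s with the smallest free block of size at least s (best fit) can be sketched as below. For brevity the free list is a Python list of `(start, length)` pairs rather than a linked list, and the function name `allocate` and the sample addresses are assumptions made for the example; deallocation and coalescing are left out.

```python
def allocate(free_blocks, size):
    """Best-fit allocation. free_blocks is a list of (start, length) pairs and is
    updated in place. Returns the start address, or None if nothing fits."""
    fits = [block for block in free_blocks if block[1] >= size]
    if not fits:
        return None
    start, length = min(fits, key=lambda block: block[1])   # smallest s' >= s
    free_blocks.remove((start, length))
    if length > size:
        # Return the unused tail of the block to the free list.
        free_blocks.append((start + size, length - size))
    return start

free = [(0, 16), (32, 8), (64, 12)]
first = allocate(free, 8)       # exact fit in the 8-byte block
second = allocate(free, 10)     # 12-byte block is the smallest that fits
third = allocate(free, 20)      # nothing large enough
print(first, second, third, free)
```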