cs1352_may09

Upload: sridharanc23

Post on 03-Apr-2018


  • 7/28/2019 CS1352_MAY09

    1/14

    MAY/JUNE-'09/CS1352-Answer Key

    CS1352 Principles of Compiler Design

    University Question Key

    May/June 2009

    PART-A

    1. What are the issues to be considered in the design of lexical analyzer?
    • Simpler design
    • Compiler efficiency is improved
    • Compiler portability is enhanced

    2. Define concrete and abstract syntax with example.

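    An abstract syntax tree is a tree in which each interior node represents an operator and its children represent the operands. A parse tree is called a concrete syntax tree; it shows how the start symbol of a grammar derives a string in the language.

    Abstract syntax trees, or simply syntax trees, differ from parse trees because superficial distinctions of form, unimportant for translation, do not appear in syntax trees. For example, for a + b the abstract syntax tree is a + node with children a and b, whereas the concrete syntax tree also contains the nonterminal nodes used in the derivation.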

    3. Derive the string and construct a syntax tree for the input string ceaedbe using the grammar S->SaA|A, A->AbB|B, B->cSd|e.

    Derivation:
    S => A          (S->A)
      => AbB        (A->AbB)
      => BbB        (A->B)
      => cSdbB      (B->cSd)
      => cSaAdbB    (S->SaA)
      => cAaAdbB    (S->A)
      => cBaAdbB    (A->B)
      => ceaAdbB    (B->e)
      => ceaBdbB    (A->B)
      => ceaedbB    (B->e)
      => ceaedbe    (B->e)

    4. List the factors to be considered for top-down parsing.

    Top-down parsing is an attempt to find a leftmost derivation for an input string.
    • A left-recursive grammar can cause a top-down parser to go into an infinite loop.
    • Backtracking overhead may occur.
    • Due to backtracking, it may reject some valid sentences.


    http://engineerportal.blogspot.in/


    • Left factoring
    • Ambiguity
    • The order in which alternates are tried can affect the language accepted.
    • When failure is reported, we have very little idea where the error actually occurred.

    5. Why is it necessary to generate intermediate code instead of generating target program itself?

    a. Retargeting is facilitated: a compiler for a different machine can be created by attaching a back end for the new machine to an existing front end.

    b. A machine-independent code optimizer can be applied to the intermediate code in order to optimize the code generation.

    6. Define back patching

    Back patching is the activity of filling in unspecified label information using appropriate semantic actions during the code generation process. The functions used in the semantic actions are mklist(i), merge_list(p1,p2) and backpatch(p,i).

    Source:
        if a or b then
            if c then
                x = y+1

    Translation:
            if a goto L1
            if b goto L1
            goto L3
        L1: if c goto L2
            goto L3
        L2: x = y+1
        L3:

    After backpatching:
        100: if a goto 103
        101: if b goto 103
        102: goto 106
        103: if c goto 105
        104: goto 106
        105: x = y+1
        106:
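    The example above can be replayed with a small sketch. The function names mklist, merge_list and backpatch come from the answer; the helpers emit and nextquad, the START address and the '_' placeholder are assumptions made only for this illustration.

```python
START = 100          # address of the first instruction, as in the example above
code = []            # code[i] holds the instruction at address START + i


def emit(text):
    """Append an instruction and return its address; '_' marks an unfilled target."""
    code.append(text)
    return START + len(code) - 1


def nextquad():
    """Address that the next emitted instruction will receive."""
    return START + len(code)


def mklist(i):
    return [i]


def merge_list(p1, p2):
    return p1 + p2


def backpatch(p, i):
    """Fill address i into every instruction whose address is on list p."""
    for addr in p:
        code[addr - START] = code[addr - START].replace("_", str(i))


# if a or b then if c then x = y + 1
true_ab = merge_list(mklist(emit("if a goto _")), mklist(emit("if b goto _")))
false_ab = mklist(emit("goto _"))
backpatch(true_ab, nextquad())           # a-or-b true: continue at the test of c
true_c = mklist(emit("if c goto _"))
false_c = mklist(emit("goto _"))
backpatch(true_c, nextquad())            # c true: continue at the assignment
emit("x = y + 1")
backpatch(merge_list(false_ab, false_c), nextquad())   # both false exits leave the statement

for offset, instr in enumerate(code):
    print(f"{START + offset}: {instr}")
```

    Each jump is emitted before its target is known, and the lists remember which instructions still need an address.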

    7. List the issues in code generation.
    • Input to the code generator
    • Target programs
    • Memory management
    • Instruction selection
    • Register allocation
    • Choice of evaluation order
    • Approaches to code generation

    8. Write the steps for constructing leaders in basic blocks.

    Leaders: the first statements of basic blocks.
    • The first statement is a leader.
    • Any statement that is the target of a conditional or unconditional goto is a leader.
    • Any statement that immediately follows a goto or conditional goto statement is a leader.
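    A minimal sketch of the three leader rules, assuming (for illustration only) that the code is a list of strings and that every jump is written as "goto N" with N the index of the target statement:

```python
import re


def find_leaders(code):
    """Return the sorted indices of the leaders of a list of three-address statements."""
    leaders = set()
    if code:
        leaders.add(0)                       # rule 1: the first statement
    for i, instr in enumerate(code):
        m = re.search(r"goto (\d+)", instr)
        if m:
            leaders.add(int(m.group(1)))     # rule 2: the target of a goto
            if i + 1 < len(code):
                leaders.add(i + 1)           # rule 3: the statement after a goto
    return sorted(leaders)


def basic_blocks(code):
    """Split the code into basic blocks, each running from one leader up to the next."""
    bounds = find_leaders(code) + [len(code)]
    return [code[s:e] for s, e in zip(bounds, bounds[1:])]


program = [
    "i = 1",                  # 0
    "t1 = i * 4",             # 1
    "a[t1] = 0",              # 2
    "i = i + 1",              # 3
    "if i <= 10 goto 1",      # 4
    "x = 0",                  # 5
]
print(find_leaders(program))  # [0, 1, 5]
```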


    9. What are the issues in static allocation?
    • Names are bound to storage as the program is compiled, so there is no need for a run-time support package.
    • The size of the data object and constraints on its position in memory must be known at compile time.
    • Recursive procedures are restricted.
    • Data structures cannot be created dynamically.

    10. What is meant by copy-restore?

    A hybrid between call-by-value and call-by-reference is copy-restore (also known as copy-in copy-out, or value-result).

    a. Before control flows to the called procedure, the actual parameters are evaluated. The r-values of the actuals are passed to the called procedure as in call-by-value. In addition, the l-values of those actual parameters having l-values are determined before the call.

    b. When control returns, the current r-values of the formal parameters are copied back into the l-values of the actuals, using the l-values computed before the call. Only actuals having l-values are copied.
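    The two steps can be simulated with a small sketch. It assumes the caller's storage is a dictionary and that a procedure returns the final values of its formals; call_copy_restore and incr_both are hypothetical names used only here.

```python
def call_copy_restore(env, proc, actuals):
    """Call proc with copy-restore semantics.

    env     -- dict mapping variable names to values (the caller's storage)
    proc    -- function taking the formals' r-values and returning their final values
    actuals -- names of the actual parameters (their l-values, fixed before the call)
    """
    # Copy in: pass the r-values of the actuals, as in call-by-value.
    formals = [env[name] for name in actuals]
    results = proc(*formals)
    # Copy out: store the final values of the formals through the saved l-values.
    for name, value in zip(actuals, results):
        env[name] = value


def incr_both(x, y):
    x += 1
    y += 1
    return x, y


env = {"a": 1}
call_copy_restore(env, incr_both, ["a", "a"])
print(env["a"])   # 2; call-by-reference would give 3 because x and y would alias a
```

    Passing the same variable twice shows where copy-restore and call-by-reference can differ.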

    PART B

    11.a. i. Explain the need for dividing the compilation process into various phases and explain its functions. (8)

    The process of compilation is very complex, so it is customary, from the logical as well as the implementation point of view, to partition it into several phases. A phase is a logically cohesive operation that takes as input one representation of the source program and produces as output another representation. (2)

    The source program is a stream of characters, e.g. pos = init + rate * 60 (4)
    • Lexical analysis: groups characters into non-separable units, called tokens, and generates the token stream: id1 = id2 + id3 * const. The information about the identifiers must be stored somewhere (symbol table).
    • Syntax analysis: checks whether the token stream meets the grammatical specification of the language and generates the syntax tree.
    • Semantic analysis: checks whether the program has a meaning (e.g. if pos is a record and init and rate are integers, then the assignment does not make sense).

    Syntax tree after syntax analysis (figure): := (id1, + (id2, * (id3, 60)))

    Syntax tree after semantic analysis (figure): := (id1, + (id2, * (id3, inttoreal(60))))


    • Intermediate code generation: intermediate code is both close to the final machine code and easy to manipulate (for optimization). One example is three-address code:
          dst = op1 op op2
      The three-address code for the assignment statement:
          temp1 = inttoreal(60)
          temp2 = id3 * temp1
          temp3 = id2 + temp2
          id1 = temp3
    • Code optimization: produces better but semantically equivalent code.
          temp1 = id3 * 60.0
          id1 = id2 + temp1
    • Code generation: generates assembly.
          MOVF id3, R2
          MULF #60.0, R2
          MOVF id2, R1
          ADDF R2, R1
          MOVF R1, id1

    Symbol Table Creation / Maintenance
    • Contains information (storage, type, scope, args) on each meaningful token, typically identifiers.
    • Data structure created / initialized during lexical analysis.
    • Utilized / updated during later analysis and synthesis.

    Error Handling
    • Detection of different errors, which correspond to all phases.
    • Each phase should know how to deal with errors, so that compilation can proceed and further errors can be detected.

    Phases of a compiler (figure): Source Program -> 1. Lexical Analyzer -> 2. Syntax Analyzer -> 3. Semantic Analyzer -> 4. Intermediate Code Generator -> 5. Code Optimizer -> 6. Code Generator -> Target Program. The Symbol-table Manager and the Error Handler interact with all six phases. (2)


    ii. Explain how abstract stack machines can be used as translators. (8)

    The front end of a compiler constructs an intermediate representation of the source program, from which the back end generates the target program. One popular form of intermediate representation is code for an abstract stack machine.
    • Arithmetic instructions
    • L-values and r-values
    • Stack manipulation
    • Translation of expressions
    • Control flow
    • Translation of statements
    • Emitting a translation
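    As a rough illustration of translating expressions for a stack machine, the sketch below emits postfix code and then interprets it. The instruction names push and rvalue follow a common textbook convention and are assumptions here, as are the tuple-based expression trees and the function names.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def compile_expr(expr):
    """Translate an expression tree into stack-machine code (postfix order)."""
    if isinstance(expr, (int, float)):
        return [("push", expr)]             # push a constant
    if isinstance(expr, str):
        return [("rvalue", expr)]           # push the contents of a variable
    op, left, right = expr
    return compile_expr(left) + compile_expr(right) + [(op, None)]


def run(code, memory):
    """Execute stack code; arithmetic pops two operands and pushes the result."""
    stack = []
    for op, arg in code:
        if op == "push":
            stack.append(arg)
        elif op == "rvalue":
            stack.append(memory[arg])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[op](left, right))
    return stack.pop()


# init + rate * 60
program = compile_expr(("+", "init", ("*", "rate", 60)))
print(program)
print(run(program, {"init": 10, "rate": 2}))   # 130
```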

    (OR)

    b. What is syntax directed translation? How is it used for translation of expressions?

    Syntax directed translation

    A syntax directed translation scheme is a syntax directed definition in which the net effect of the semantic actions is to print out a translation of the input to a desired output form. This is accomplished by including emit statements in semantic actions that write out text fragments of the output, as well as string-valued attributes that compute text fragments to be fed into emit statements.

    Syntax directed definition:
    It specifies the translation of a construct in terms of attributes associated with its syntactic components. It uses a CFG to specify the syntactic structure of the input. With each grammar symbol it associates a set of attributes, and with each production a set of semantic rules for computing the values of the attributes associated with the symbols appearing in that production. Translation is an input-output mapping.
    • Annotated parse tree
    • Synthesized attributes
    • Depth-first traversals
    • Translation schemes
    • Emitting a translation

    12.a. Given the following grammar S->AS|b, A->SA|a. Construct a SLR parsing table for the string baab.

    Given grammar:
    1. S->AS
    2. S->b
    3. A->SA
    4. A->a

    Augmented grammar:
    S'->S
    S->AS
    S->b
    A->SA
    A->a


    Canonical collection of LR(0) items:

    I0:                S'->.S   S->.AS   S->.b   A->.SA   A->.a
    I1: goto(I0, S)    S'->S.   A->S.A   A->.SA  A->.a    S->.AS   S->.b
    I2: goto(I0, A)    S->A.S   S->.AS   S->.b   A->.SA   A->.a
    I3: goto(I0, b)    S->b.
    I4: goto(I0, a)    A->a.
    I5: goto(I1, A)    A->SA.   S->A.S   S->.AS  S->.b    A->.SA   A->.a
    I6: goto(I1, S)    A->S.A   A->.SA   A->.a   S->.AS   S->.b
    I7: goto(I2, S)    S->AS.   A->S.A   A->.SA  A->.a    S->.AS   S->.b

    goto(I1, a)=I4   goto(I1, b)=I3
    goto(I2, A)=I2   goto(I2, a)=I4   goto(I2, b)=I3
    goto(I5, A)=I2   goto(I5, S)=I7   goto(I5, a)=I4   goto(I5, b)=I3
    goto(I6, A)=I5   goto(I6, S)=I6   goto(I6, a)=I4   goto(I6, b)=I3
    goto(I7, A)=I5   goto(I7, S)=I6   goto(I7, a)=I4   goto(I7, b)=I3

    First(S)={b, a}      First(A)={a, b}
    Follow(S)={$, a, b}  Follow(A)={a, b}

    States    Action                     Goto
              a        b        $        S    A
    0         s4       s3                1    2
    1         s4       s3       acc      6    5
    2         s4       s3                7    2
    3         r2       r2       r2
    4         r4       r4
    5         s4/r3    s3/r3             7    2
    6         s4       s3                6    5
    7         s4/r1    s3/r1    r1       6    5

    States 5 and 7 have both a shift and a reduce entry on a and b (shift-reduce conflicts), so the grammar is not strictly SLR(1).


    Parsing the string baab:

    Stack        Input    Action
    0            baab$    shift 3
    0b3          aab$     reduce by S->b
    0S1          aab$     shift 4
    0S1a4        ab$      reduce by A->a
    0S1A5        ab$      reduce by A->SA
    0A2          ab$      shift 4
    0A2a4        b$       reduce by A->a
    0A2A2        b$       shift 3
    0A2A2b3      $        reduce by S->b
    0A2A2S7      $        reduce by S->AS
    0A2S7        $        reduce by S->AS
    0S1          $        accept

    (OR)

    b. Consider the grammar E->E+T | T, T->T*F | F, F->(E)|id. Using predictive parsing, parse the string id+id*id.

    Eliminating left recursion: (2)
    E->TE'
    E'->+TE' | ε
    T->FT'
    T'->*FT' | ε
    F->(E) | id

    Calculation of First: (2)
    First(E) = First(T) = First(F) = {(, id}
    First(E') = {+, ε}
    First(T') = {*, ε}

    Calculation of Follow: (2)
    Follow(E) = Follow(E') = {), $}
    Follow(T) = Follow(T') = {+, ), $}
    Follow(F) = {+, *, ), $}

    Predictive parsing table: (5)

    Non-        Input symbol
    terminal    id         +            *            (          )        $
    E           E->TE'                               E->TE'
    E'                     E'->+TE'                             E'->ε    E'->ε
    T           T->FT'                               T->FT'
    T'                     T'->ε        T'->*FT'                T'->ε    T'->ε
    F           F->id                                F->(E)


    Moves made by predictive parser on id + id*id: (5)

    Stack        Input        Output
    $E           id+id*id$
    $E'T         id+id*id$    E->TE'
    $E'T'F       id+id*id$    T->FT'
    $E'T'id      id+id*id$    F->id
    $E'T'        +id*id$
    $E'          +id*id$      T'->ε
    $E'T+        +id*id$      E'->+TE'
    $E'T         id*id$
    $E'T'F       id*id$       T->FT'
    $E'T'id      id*id$       F->id
    $E'T'        *id$
    $E'T'F*      *id$         T'->*FT'
    $E'T'F       id$
    $E'T'id      id$          F->id
    $E'T'        $
    $E'          $            T'->ε
    $            $            E'->ε
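    The moves above can be reproduced with a small table-driven sketch. The dictionary encoding of the parsing table and the function name parse are assumptions for illustration; an empty list stands for an ε production.

```python
EPS = "ε"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# Predictive parsing table: (nonterminal, lookahead) -> production body.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}


def parse(tokens):
    """Return the productions used, in order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", "E"]                      # top of the stack is the end of the list
    pos = 0
    output = []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:
                raise SyntaxError(f"no table entry for {top} on {look}")
            output.append(f"{top}->{' '.join(body) if body else EPS}")
            stack.extend(reversed(body))    # leftmost body symbol ends up on top
        elif top == look:
            pos += 1                        # match a terminal (or the end marker $)
        else:
            raise SyntaxError(f"expected {top}, found {look}")
    return output


for production in parse(["id", "+", "id", "*", "id"]):
    print(production)
```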

    13.a. Explain in detail how three address codes are generated and implemented.

    Three address code is one of the intermediate representations. It is a sequence of statements of the form x := y op z, where x, y and z are names, constants or compiler-generated temporaries, and op is an operator, which can be an arithmetic or a logical operator. E.g. x+y*z is translated as t1=y*z and t2=x+t1. (4)
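    A minimal sketch of how such statements might be generated from an expression tree, assuming tuple-based trees and a counter for temporaries (gen_tac is a hypothetical name). Each statement is recorded as a quadruple (op, arg1, arg2, result).

```python
from itertools import count


def gen_tac(expr, quads, temps):
    """Append quadruples for expr to quads and return the name holding its value."""
    if isinstance(expr, str):
        return expr                          # a name or constant needs no code
    op, *args = expr
    names = [gen_tac(arg, quads, temps) for arg in args]
    result = f"t{next(temps)}"               # fresh compiler-generated temporary
    arg2 = names[1] if len(names) > 1 else None
    quads.append((op, names[0], arg2, result))
    return result


quads = []
gen_tac(("+", "x", ("*", "y", "z")), quads, count(1))
for op, arg1, arg2, result in quads:
    print(f"{result} = {arg1} {op} {arg2}")
# t1 = y * z
# t2 = x + t1
```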

    The reason for the term three-address code is that each statement usually contains three addresses, two for the operands and one for the result. (2)

    Common three address statements: (4)
    • x := y op z (assignment statements)
    • x := op y (assignment statements)
    • x := y (copy statements)
    • goto L (unconditional jump)
    • Conditional jumps like if x relop y goto L
    • param x, call p,n and return y for procedure calls
    • Indexed assignments x := y[i] and x[i] := y
    • Address and pointer assignments x := &y, x := *y and *x := y

    Implementation: (6)
    • Quadruples: record with four fields, op, arg1, arg2 and result.
    • Triples: record with three fields, op, arg1 and arg2, to avoid entering temporary names into the symbol table. A temporary value is referred to by the position of the statement that computes it.


    • Indirect triples: list pointers to triples rather than listing the triples themselves.

    For a := b * -c + b * -c

    Quadruples
           Op        arg1    arg2    result
    (0)    uminus    c               t1
    (1)    *         b       t1      t2
    (2)    uminus    c               t3
    (3)    *         b       t3      t4
    (4)    +         t2      t4      t5
    (5)    :=        t5              a

    Triples
           Op        arg1    arg2
    (0)    uminus    c
    (1)    *         b       (0)
    (2)    uminus    c
    (3)    *         b       (2)
    (4)    +         (1)     (3)
    (5)    assign    a       (4)

    Indirect Triples
           Statement            Op        arg1    arg2
    (0)    (14)         (14)    uminus    c
    (1)    (15)         (15)    *         b       (14)
    (2)    (16)         (16)    uminus    c
    (3)    (17)         (17)    *         b       (16)
    (4)    (18)         (18)    +         (15)    (17)
    (5)    (19)         (19)    assign    a       (18)

    (OR)

    b. Explain the role of declaration statements in intermediate code generation.

    When a sequence of declarations in a procedure or block is examined, storage is laid out for the names local to the procedure.

    Dealing with declarations in procedures:

    P -> procedure id ; block ;
    Semantic rule: (2)
        begin = newlabel;
        Enter the begin label into the symbol-table entry of the procedure name.
        P.code = gen(begin ':') || block.code || gen(pop return_address) || gen(goto return_address)

    S -> call id
    Semantic rule:
        Look up the symbol table to find the procedure name and its begin label, called proc_begin.
        return = newlabel;
        S.code = gen(push return) || gen(goto proc_begin) || gen(return ':')

    Using a global variable offset


    Computing the types and relative addresses of declared names: (4)

    P -> M D                    { }
    M -> ε                      { offset := 0 }
    D -> id : T                 { enter(id.name, T.type, offset); offset := offset + T.width }
    T -> real                   { T.type := real; T.width := 8 }
    T -> integer                { T.type := integer; T.width := 4 }
    T -> array [ num ] of T1    { T.type := array(1..num.val, T1.type); T.width := num.val * T1.width }
    T -> ^T1                    { T.type := pointer(T1.type); T.width := 4 }
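    The offset computation in these rules can be sketched as follows. The widths (real 8, integer 4, pointer 4) come from the scheme above; the tuple encoding of types and the names width and layout are assumptions made for this illustration.

```python
def width(t):
    """Width in bytes of a type, following the T.width rules above."""
    if t == "real":
        return 8
    if t == "integer":
        return 4
    if t[0] == "array":                  # ("array", num, element_type)
        return t[1] * width(t[2])
    if t[0] == "pointer":                # ("pointer", target_type)
        return 4
    raise ValueError(f"unknown type {t!r}")


def layout(decls):
    """Enter each declared name with its type and relative address, like enter()."""
    offset = 0                           # M -> ε { offset := 0 }
    table = {}
    for name, t in decls:
        table[name] = (t, offset)        # enter(id.name, T.type, offset)
        offset += width(t)               # offset := offset + T.width
    return table, offset


table, total = layout([
    ("x", "real"),
    ("i", "integer"),
    ("a", ("array", 10, "integer")),
    ("p", ("pointer", "real")),
])
for name, (t, off) in table.items():
    print(name, off)
print("total width:", total)             # total width: 56
```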

    Keeping Track of Scope Information (4)

    Nested Procedure Declarations

    For each procedure we should create a symbol table.
    • mktable(previous): creates a new symbol table, where previous is the parent symbol table of this new symbol table, and returns a pointer to the new table.
    • enter(symtable, name, type, offset): creates a new entry for a variable in the given symbol table.
    • enterproc(symtable, name, newsymbtable): creates a new entry for the procedure in the symbol table of its parent.
    • addwidth(symtable, width): puts the total width of all entries in the symbol table into the header of that table.

    We will have two stacks:
    • tblptr, to hold the pointers to the symbol tables of enclosing procedures.
    • offset, to hold the current offsets in the symbol tables in the tblptr stack. The top element is the next available relative address for a local of the current procedure.

    Processing declarations in nested procedures (4)

    P -> M D                  { addwidth(top(tblptr), top(offset)); pop(tblptr); pop(offset) }
    M -> ε                    { t := mktable(null); push(t, tblptr); push(0, offset) }
    D -> D1 ; D2              ...
    D -> proc id ; N D ; S    { t := top(tblptr); addwidth(t, top(offset)); pop(tblptr); pop(offset); enterproc(top(tblptr), id.name, t) }
    N -> ε                    { t := mktable(top(tblptr)); push(t, tblptr); push(0, offset) }
    D -> id : T               { enter(top(tblptr), id.name, T.type, top(offset)); top(offset) := top(offset) + T.width }

    Field names in records (2)

    T -> record L D end    { T.type := record(top(tblptr)); T.width := top(offset); pop(tblptr); pop(offset) }
    L -> ε                 { t := mktable(nil); push(t, tblptr); push(0, offset) }


    14.a. Design a simple code generator and explain with example.

    It generates target code for a sequence of three address statements. (2)

    Assumptions:
    • For each operator in a three address statement, there is a corresponding target language operator.
    • Computed results can be left in registers as long as possible.

    E.g. a = b + c: (4)
    • Add Rj, Ri where Ri has b and Rj has c, and the result is in Ri. Cost = 1
    • Add c, Ri where Ri has b, and the result is in Ri. Cost = 2
    • Mov c, Rj; Add Rj, Ri. Cost = 3

    Register descriptor: keeps track of what is currently in each register.
    Address descriptor: keeps track of the location where the current value of a name can be found at run time. (2)

    Code generation algorithm for x = y op z: (6)
    • Invoke the function getreg to determine the location L where the result of y op z should be stored (register or memory location).
    • Check the address descriptor for y to determine y', the current location of y. If the value of y is not already in L, generate Mov y', L.
    • Generate the instruction op z', L where z' is the current location of z.
    • If the current values of y and/or z have no next uses, alter the register descriptor.

    Getreg: (2)
    • If y is in a register that holds the values of no other names and y is not live, return the register of y for L.
    • If that fails, return an empty register.
    • If that fails and x has a next use, find an occupied register and empty it.
    • If x is not used in the block, or no suitable register is found, select the memory location of x as L.

    (OR)

    b. Write short notes on:
    i. Peephole optimization

    Peephole optimization is a simple and effective technique for locally improving target code. It improves the performance of the target program by examining a short sequence of target instructions and replacing it with a shorter or faster sequence whenever possible. The peephole is a small, moving window on the target program.
    • Local in nature
    • Pattern driven
    • Limited by the size of the window

    Characteristics of peephole optimization:
    • Redundant instruction elimination
    • Flow of control optimization
    • Algebraic simplification
    • Use of machine idioms

    Constant folding
        x := 32
        x := x + 32    becomes    x := 64


    Unreachable code
    An unlabeled instruction immediately following an unconditional jump is removed.
        goto L2
        x := x + 1     (unneeded)

    Flow of control optimizations
    Unnecessary jumps are eliminated.
        goto L1
        L1: goto L2    becomes    goto L2

    Algebraic simplification
        x := x + 0     (unneeded)

    Dead code elimination
        x := 32        (where x is not used after the statement)
        y := x + y     becomes    y := y + 32

    Reduction in strength
    Replace expensive operations by equivalent cheaper ones.
        x := x * 2     becomes    x := x + x
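    Three of the patterns above (unreachable code, algebraic simplification and reduction in strength) can be sketched as one pass over the instructions. The string format "label: body" and the function names are assumptions for illustration; jump-to-jump elimination and constant folding are left out to keep the sketch short.

```python
import re

# A label is a word followed by ':' that is not part of the ':=' operator.
LABEL = re.compile(r"^(\w+):(?!=)\s*(.*)$")


def split_label(instr):
    m = LABEL.match(instr)
    return (m.group(1), m.group(2)) if m else (None, instr)


def peephole(code):
    out = []
    after_jump = False
    for instr in code:
        label, body = split_label(instr)
        if after_jump and label is None:
            continue                         # unreachable: unlabeled code after a goto
        m = re.fullmatch(r"(\w+) := (\w+) \+ 0", body)
        if m and m.group(1) == m.group(2):
            body = ""                        # x := x + 0 has no effect
        m = re.fullmatch(r"(\w+) := (\w+) \* 2", body)
        if m:
            body = f"{m.group(1)} := {m.group(2)} + {m.group(2)}"   # strength reduction
        after_jump = body.startswith("goto ")
        if label is not None:
            out.append(f"{label}: {body}".rstrip())   # keep the label even if the body went away
        elif body:
            out.append(body)
    return out


before = ["x := x + 0", "x := x * 2", "goto L2", "x := x + 1", "L2: y := y * 2"]
print(peephole(before))
# ['x := x + x', 'goto L2', 'L2: y := y + y']
```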

    ii. Issues in code generation

    • Input to the code generator
      Intermediate representation of the source program: linear representations such as postfix notation, three address representations such as quadruples, virtual machine representations such as stack machine code, and graphical representations such as syntax trees and dags.
    • Target programs
      The output, such as absolute machine language, relocatable machine language or assembly language.
    • Memory management
      Mapping of names in the source program to addresses of data objects in run time memory is done by the front end and the code generator.
    • Instruction selection
      The nature of the instruction set of the target machine determines the difficulty of instruction selection.
    • Register allocation
      Instructions involving registers are shorter and faster. The use of registers is divided into two sub problems:
        o During register allocation, we select the set of variables that will reside in registers at a point in the program.
        o During a subsequent register assignment phase, we pick the specific register that a variable will reside in.
    • Choice of evaluation order
      The order in which computations are performed affects the efficiency of the target code.
    • Approaches to code generation


    15.a. Explain with an example how basic blocks are optimized.

    Code improving transformations:
    • Structure-preserving transformations
        o Common sub expression elimination
        o Dead-code elimination
    • Algebraic transformations like reduction in strength.

    Structure preserving transformations: (8)
    These are implemented by constructing a dag for a basic block. A common sub expression can be detected by noticing, as a new node m is about to be added, whether there is an existing node n with the same children, in the same order, and with the same operator. If so, n computes the same value as m and may be used in its place.

    E.g. the DAG for the basic block
        d := b * c
        e := a + b
        b := b * c
        a := e - d
    has leaves a0, b0 and c0; a * node over b0 and c0 labelled d and b (the common sub expression b * c is computed once); a + node over a0 and b0 labelled e; and a - node over the + node and the * node labelled a.
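    The node-sharing test described above can be sketched as follows, assuming each statement of the block is given as a (target, left, op, right) tuple; build_dag and its internal tables are hypothetical names for this illustration.

```python
def build_dag(block):
    nodes = []      # nodes[i] is ("leaf", name, None) or (op, left_id, right_id)
    index = {}      # node tuple -> node id, used to find an existing identical node
    current = {}    # variable name -> id of the node holding its current value

    def node_for(name):
        # The first use of a name inside the block creates a leaf for its initial value.
        if name not in current:
            key = ("leaf", name, None)
            index[key] = len(nodes)
            nodes.append(key)
            current[name] = index[key]
        return current[name]

    for target, left, op, right in block:
        key = (op, node_for(left), node_for(right))
        if key not in index:             # no node with this operator and these children yet
            index[key] = len(nodes)
            nodes.append(key)
        current[target] = index[key]     # attach the target to the (possibly shared) node
    return nodes, current


block = [("d", "b", "*", "c"), ("e", "a", "+", "b"),
         ("b", "b", "*", "c"), ("a", "e", "-", "d")]
nodes, current = build_dag(block)
print(len(nodes), current["d"] == current["b"])   # 6 True
```

    Six nodes are built for four statements with three leaves, because d := b * c and b := b * c share one node.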

    For dead-code elimination, delete from a dag any root (node with no ancestors) that has no live variables. Repeated application of this will remove all nodes from the dag that correspond to dead code.

    Use of algebraic identities: (8)
    e.g.
        x + 0 = 0 + x = x
        x - 0 = x
        x * 1 = 1 * x = x
        x / 1 = x

    Reduction in strength: replace an expensive operator by a cheaper one.
        x ** 2 = x * x

    Constant folding: evaluate constant expressions at compile time and replace them by their values.

    Commutative and associative laws can also be used. E.g.
        a = b + c
        e = c + d + b
    IC:
        a = b + c
        t = c + d
        e = t + b


    If t is not needed outside the block, change this to
        a = b + c
        e = a + d
    using both the associativity and commutativity of +.

    (OR)

    b. Explain the storage allocation strategies used in run time environments.
    • Static allocation lays out storage for all data objects at compile time.
    • Stack allocation manages the run-time storage as a stack.
    • Heap allocation allocates and deallocates storage as needed at run time from the heap area.

    Static allocation: (4)
    • Names are bound to storage at compile time.
    • No need for a run-time support package.
    • When a procedure is activated, its names are bound to the same storage locations.
    • The compiler must decide where activation records should go.
    Limitations:
    • Size must be known at compile time.
    • Recursive procedures are restricted.
    • Data structures can't be created dynamically.

    Stack allocation: (6)
    • Activation records are pushed and popped as activations begin and end.
    • Locals are bound to fresh storage in each activation and deleted when the activation ends.
    • Call sequence and return sequence
    • Caller and callee
    • Dangling references

    Heap allocation: (6)
    Stack allocation cannot be used if either of the following is possible:
    1. The values of local names must be retained when an activation ends.
    2. A called activation outlives the caller.
    • Allocate pieces of memory for activation records, which can be deallocated in any order.
    • Maintain a linked list of free blocks.
    • Fill a request for size s with a block of size s', where s' is the smallest size greater than or equal to s.
    • Use a heap manager, which takes care of defragmentation and garbage collection.
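    The rule for filling a request of size s can be sketched as a best-fit search. The sketch assumes, for illustration only, that the free list is a plain Python list of block sizes; a real heap manager would also track addresses and split or coalesce blocks.

```python
def best_fit(free_blocks, s):
    """Remove and return the smallest free block of size >= s, or None if none fits."""
    candidates = [size for size in free_blocks if size >= s]
    if not candidates:
        return None
    chosen = min(candidates)             # s': the smallest size greater than or equal to s
    free_blocks.remove(chosen)
    return chosen


free = [64, 16, 32, 128]
print(best_fit(free, 20))    # 32
print(free)                  # [64, 16, 128]
print(best_fit(free, 200))   # None
```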
