system programming handout#03


Upload: sunita-aher

Post on 07-Jan-2017


System Programming

Sunita M. Dol, CSE Dept, Walchand Institute of Technology, Solapur

Aim:

Parser generator using YACC (Yet Another Compiler Compiler)

Theory:

Language processor development tools (LPDTs) focus on generating the analysis phase of language processors.

Figure 1 shows a schematic of an LPDT which generates the analysis phase of a language processor whose source language is L. The LPDT requires the following two inputs:

1. Specification of a grammar of language L

2. Specification of semantic actions to be performed in the analysis phase.

Figure 1: A Language Processor Development Tool (LPDT)

It generates programs that perform lexical, syntax and semantic analysis of the source program and construct the intermediate representation (IR). These programs collectively form the analysis phase of the language processor.

Two LPDTs are widely used in practice: the lexical analyzer generator LEX and the parser generator YACC. The input to these tools is a specification of the lexical and syntactic constructs of L, and the semantic actions to be performed on recognizing the constructs. The specification consists of a set of translation rules of the form

<string specification> {<semantic action>}

where <semantic action> consists of C code. This code is executed when a string matching <string specification> is encountered in the input. LEX and YACC generate C programs which contain the code for scanning and parsing, respectively, and the semantic actions contained in the specification.

A YACC generated parser can use a LEX generated scanner as a routine if the scanner and parser use the same conventions concerning the representation of tokens. Figure 2 shows a schematic for developing the analysis phase of a compiler for language L using LEX and YACC. The analysis phase processes the source program to build an intermediate representation.

Figure 2: Using LEX and YACC

YACC

Each string specification in the input to YACC resembles a grammar production. The parser generated by YACC performs reductions according to this grammar. The actions associated with a string specification are executed when a reduction is made according to the specification. An attribute is associated with every nonterminal symbol. The value of this attribute can be manipulated during parsing. The attribute can be given any user-designed structure. A symbol ‘$n’ in the action part of a translation rule refers to the attribute of the nth symbol in the RHS of the string specification. ‘$$’ represents the attribute of the LHS symbol of the string specification.

Example

Figure 3 shows sample input to YACC. The input consists of four components, of which only two are shown. The routine gendesc builds a descriptor containing the name and type of an id or constant. The routine gencode takes an operator and the attributes of two operands, generates code and returns with the attribute for the result.

Figure 3: A sample YACC specification

Parsing of the string b+c*d, where b, c and d are of type real, using the parser generated by YACC from the input of Figure 3 leads to the following calls on the C routines:

gendesc (id#1);

gendesc (id#2);

gendesc (id#3);

gencode (*, [c, real], [d, real]);

gencode (+, [b, real], [t, real]);

where an attribute has the form <name>, <type> and t is the name of a location used to store the result of c*d in the code generated by the first call on gencode.
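The sequence of calls for b+c*d can be replayed by hand. The following is only a sketch under stated assumptions: Figure 3 is not reproduced here, so the signatures of gendesc and gencode, the Attr structure, the temporary names t1 and t2, and the helpers shift_id and reduce_binary are all hypothetical. Operator tokens are not stored, so the stack holds operand attributes only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Attribute of a grammar symbol: the <name>, <type> pair from the text. */
typedef struct {
    char name[16];
    char type[16];
} Attr;

static Attr stack[16];   /* value stack kept alongside the parse */
static int  top = 0;     /* number of attributes currently on the stack */
static int  ntemps = 0;  /* temporaries handed out so far */
char code[256];          /* generated code, one instruction per line */

/* Hypothetical gendesc: build a descriptor for an identifier. */
static Attr gendesc(const char *name, const char *type)
{
    Attr a;
    snprintf(a.name, sizeof a.name, "%s", name);
    snprintf(a.type, sizeof a.type, "%s", type);
    return a;
}

/* Hypothetical gencode: emit "t := x op y" and return the attribute of t. */
static Attr gencode(char op, Attr x, Attr y)
{
    Attr t;
    char line[64];
    snprintf(t.name, sizeof t.name, "t%d", ++ntemps);
    snprintf(t.type, sizeof t.type, "%s", x.type);
    snprintf(line, sizeof line, "%s := %s %c %s\n", t.name, x.name, op, y.name);
    strncat(code, line, sizeof code - strlen(code) - 1);
    return t;
}

/* Reduction of an id: the action would be "$$ = gendesc($1);". */
static void shift_id(const char *name)
{
    stack[top++] = gendesc(name, "real");
}

/* Reduction by E : E op E. The two operand attributes play the roles
 * of $1 and $3; the attribute returned by gencode becomes $$. */
static void reduce_binary(char op)
{
    Attr rhs3, rhs1;
    assert(top >= 2);
    rhs3 = stack[--top];
    rhs1 = stack[--top];
    stack[top++] = gencode(op, rhs1, rhs3);
}

/* Replay the parser's moves for b+c*d: '*' is reduced before '+'. */
Attr parse_b_plus_c_times_d(void)
{
    top = 0;
    ntemps = 0;
    code[0] = '\0';
    shift_id("b");
    shift_id("c");
    shift_id("d");
    reduce_binary('*');   /* gencode(*, [c, real], [d, real]) */
    reduce_binary('+');   /* gencode(+, [b, real], [t1, real]) */
    return stack[top - 1];
}
```

Calling parse_b_plus_c_times_d() leaves two instructions in code, and the returned attribute names the temporary that holds the value of the whole expression.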

Compiling the Example Program

To create the desk calculator example program, do the following:

1. Process the yacc grammar file using the -d optional flag (which informs the yacc command to create a file that defines the tokens used in addition to the C language source code):

yacc -d calc.yacc

2. Use the ls command to verify that the following files were created:

y.tab.c - The C language source file that the yacc command created for the parser

y.tab.h - A header file containing define statements for the tokens used by the parser

3. Process the lex specification file:

lex calc.lex

4. Use the ls command to verify that the following file was created:

lex.yy.c - The C language source file that the lex command created for the lexical analyzer

5. Compile and link the two C language source files:

cc y.tab.c lex.yy.c

6. Use the ls command to verify that the following files were created:

y.tab.o - The object file for the y.tab.c source file

lex.yy.o - The object file for the lex.yy.c source file

a.out - The executable program file

Some compilers (gcc, for example) remove the intermediate object files when compiling and linking in one step, so only a.out may be present.

To run the program directly from the a.out file, type the following (use ./a.out if the current directory is not on your PATH):

$ a.out

Program:

YACC program

The following example shows the contents of the calc.yacc file. This file has entries in all three sections of a yacc grammar file: declarations, rules, and programs.

%{
#include <stdio.h>

int regs[26];   /* values of the variables a through z */
int base;       /* radix of the number being read: 8 or 10 */

int yylex(void);
void yyerror(const char *s);
%}

%start list

%union { int a; }

%type <a> expr number
%token <a> DIGIT LETTER   /* $1 of a token needs a declared type */

%left '|'
%left '&'
%left '+' '-'
%left '*' '/' '%'
%left UMINUS    /* supplies precedence for unary minus */

%%              /* beginning of rules section */

list:   /* empty */
    |   list stat '\n'
    |   list error '\n'
            { yyerrok; }
    ;

stat:   expr
            { printf("%d\n", $1); }
    |   LETTER '=' expr
            { regs[$1] = $3; }
    ;

expr:   '(' expr ')'
            { $$ = $2; }
    |   expr '*' expr
            { $$ = $1 * $3; }
    |   expr '/' expr
            { $$ = $1 / $3; }
    |   expr '%' expr
            { $$ = $1 % $3; }
    |   expr '+' expr
            { $$ = $1 + $3; }
    |   expr '-' expr
            { $$ = $1 - $3; }
    |   expr '&' expr
            { $$ = $1 & $3; }
    |   expr '|' expr
            { $$ = $1 | $3; }
    |   '-' expr %prec UMINUS
            { $$ = -$2; }
    |   LETTER
            { $$ = regs[$1]; }
    |   number
    ;

number: DIGIT
            { $$ = $1; base = ($1 == 0) ? 8 : 10; }
    |   number DIGIT
            { $$ = base * $1 + $2; }
    ;

%%

int main(void)
{
    return yyparse();
}

void yyerror(const char *s)
{
    fprintf(stderr, "%s\n", s);
}

int yywrap(void)
{
    return 1;
}

The file contains the following sections:

• Declarations Section. This section contains entries that:

o Include standard I/O header file
o Define global variables
o Define the list rule as the place to start processing
o Define the tokens used by the parser
o Define the operators and their precedence

The declarations use the following yacc directives:

o %start - Specifies that the whole input should match list.
o %union - By default, the values returned by actions and the lexical analyzer are integers. yacc can also support values of other types, including structures. In addition, yacc keeps track of the types, and inserts appropriate union member names so that the resulting parser will be strictly type checked. The yacc value stack is declared to be a union of the various types of values desired. The user declares the union, and associates union member names with each token and nonterminal symbol having a value. When the value is referenced through a $$ or $n construction, yacc will automatically insert the appropriate union member name, so that no unwanted conversions will take place.
o %type - Makes use of the members of the %union declaration and gives an individual type for the values associated with each nonterminal of the grammar.
o %token - Lists the tokens that come from the lex-generated scanner, with their type.
o %left - Declares left-associative operators. Operators on the same line have equal precedence, and each later line has higher precedence than the lines before it.

• Rules Section. The rules section defines the rules that parse the input stream.

• Programs Section. The programs section contains the following subroutines. Because these subroutines are included in this file, you do not need to use the yacc library when processing this file.

o main - The required main program that calls the yyparse subroutine to start the program.
o yyerror(s) - This error-handling subroutine only prints a syntax error message.
o yywrap - The wrap-up subroutine that returns a value of 1 when the end of input occurs.
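To make the %union and %type description concrete, the fragment below sketches what an action looks like once the union member name has been inserted. It is an illustration, not the text of a generated y.tab.c: the function reduce_add is hypothetical, and the names yyval and yyvsp are assumed here only because they are the conventional ones in yacc-generated parsers.

```c
#include <assert.h>

/* What "%union { int a; }" amounts to: the type of every value-stack slot. */
typedef union {
    int a;
} YYSTYPE;

/*
 * Sketch of the action "$$ = $1 + $3;" for the rule  expr : expr '+' expr
 * after the member name from "%type <a> expr" has been inserted.
 * yyvsp points at the slot of the last RHS symbol, so $3 is yyvsp[0],
 * $2 is yyvsp[-1] and $1 is yyvsp[-2].
 */
YYSTYPE reduce_add(const YYSTYPE *yyvsp)
{
    YYSTYPE yyval;                       /* plays the role of $$ */
    assert(yyvsp);                       /* caller must pass a live stack slot */
    yyval.a = yyvsp[-2].a + yyvsp[0].a;  /* $$ = $1 + $3 */
    return yyval;
}
```

Because every reference goes through the member a, a second union member of a different type could be added later without any action silently reading the wrong field.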

LEX program

The calc.lex file contains include statements for standard input and output, as well as for the y.tab.h file. If you use the -d flag with the yacc command, the yacc program generates that file from the yacc grammar file information. The y.tab.h file contains definitions for the tokens that the parser program uses. In addition, the calc.lex file contains the rules to generate these tokens from the input stream. The following are the contents of the calc.lex file.

%{
#include <stdio.h>
#include "y.tab.h"   /* token codes and the declaration of yylval */

int c;
%}

%%

" "         ;   /* skip blanks */
[a-z]       {
                c = yytext[0];
                yylval.a = c - 'a';   /* index of the variable, 0 to 25 */
                return LETTER;
            }
[0-9]       {
                c = yytext[0];
                yylval.a = c - '0';   /* numeric value of the digit */
                return DIGIT;
            }
[^a-z0-9\b] {
                c = yytext[0];
                return c;             /* operators and newline stand for themselves */
            }
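For comparison, the scanner that lex generates from these rules behaves roughly like the hand-written routine below. This is a sketch only: it reads from a string instead of standard input, the token codes 257 and 258 are placeholders for whatever y.tab.h actually defines, and scan_from is a hypothetical helper. It also does not special-case the backspace character that the last lex pattern excludes.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder token codes; the real values come from y.tab.h. */
enum { DIGIT = 257, LETTER = 258 };

typedef union { int a; } YYSTYPE;   /* mirrors %union { int a; } */
YYSTYPE yylval;                     /* value of the token just returned */

static const char *input = NULL;    /* text that remains to be scanned */

/* Hypothetical helper: choose the string to scan. */
void scan_from(const char *text)
{
    input = text;
}

int yylex(void)
{
    int c;

    assert(input != NULL);
    while (*input == ' ')           /* rule: " " ;  skips blanks */
        input++;
    if (*input == '\0')
        return 0;                   /* token 0 tells the parser the input has ended */

    c = (unsigned char)*input++;
    if (c >= 'a' && c <= 'z') {     /* rule: [a-z] */
        yylval.a = c - 'a';
        return LETTER;
    }
    if (c >= '0' && c <= '9') {     /* rule: [0-9] */
        yylval.a = c - '0';
        return DIGIT;
    }
    return c;                       /* rule: [^a-z0-9\b] returns the character itself */
}
```

After scan_from("a = 7"), successive calls to yylex() return LETTER, '=' and DIGIT, then 0 at the end of the string, with yylval.a set for the letter and the digit.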

Input:

2+3

Output:

5
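The result above can be cross-checked with a small hand-written evaluator. Note that this sketch swaps in a different technique: it uses precedence climbing (a recursive-descent method), not the table-driven parser that yacc generates. It follows the operator order of the %left lines and the base rule of number, but it omits assignment statements and error recovery, and it treats division by zero as yielding 0, which is an assumption of this sketch only. The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *p;    /* current position in the input line */
static int regs[26];     /* variables a to z, all zero in this sketch */

static int parse_expr(int min_prec);

static void skip_blanks(void)
{
    while (*p == ' ')
        p++;
}

/* Binding strength of a binary operator, in the order of the %left
 * lines (a later line binds tighter). Returns 0 for anything else. */
static int prec(char op)
{
    switch (op) {
    case '|': return 1;
    case '&': return 2;
    case '+': case '-': return 3;
    case '*': case '/': case '%': return 4;
    default:  return 0;
    }
}

static int apply(char op, int l, int r)
{
    switch (op) {
    case '|': return l | r;
    case '&': return l & r;
    case '+': return l + r;
    case '-': return l - r;
    case '*': return l * r;
    case '/': return r ? l / r : 0;   /* assumption: x/0 yields 0 here */
    default:  return r ? l % r : 0;   /* '%', same assumption */
    }
}

/* number: a leading 0 selects base 8, any other digit base 10. */
static int parse_number(void)
{
    int base = (*p == '0') ? 8 : 10;
    int value = 0;
    while (isdigit((unsigned char)*p))
        value = base * value + (*p++ - '0');
    return value;
}

/* operand: '(' expr ')' | '-' operand | LETTER | number */
static int parse_primary(void)
{
    int v;
    skip_blanks();
    if (*p == '(') {
        p++;
        v = parse_expr(1);
        skip_blanks();
        if (*p == ')')
            p++;
        return v;
    }
    if (*p == '-') {                  /* unary minus binds tightest (UMINUS) */
        p++;
        return -parse_primary();
    }
    if (*p >= 'a' && *p <= 'z')
        return regs[*p++ - 'a'];
    return parse_number();
}

/* Precedence climbing: fold in operators of strength >= min_prec.
 * Parsing the right operand at strength + 1 makes them left-associative. */
static int parse_expr(int min_prec)
{
    int left = parse_primary();
    for (;;) {
        char op;
        int q;
        skip_blanks();
        op = *p;
        q = prec(op);
        if (q == 0 || q < min_prec)
            return left;
        p++;
        left = apply(op, left, parse_expr(q + 1));
    }
}

/* Evaluate one input line, e.g. calc("2+3"). */
int calc(const char *line)
{
    assert(line != NULL);
    p = line;
    return parse_expr(1);
}
```

With this sketch, calc("2+3") yields 5 as in the run above, calc("2+3*4") yields 14 because '*' is declared after '+', and calc("010") yields 8 because the leading zero selects base 8.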

Conclusion:

• YACC file has entries in three sections of a yacc grammar file: declarations, rules, and programs.

• The lexical analyzer file contains include statements for standard input and output, as well as for the y.tab.h file. If you use the -d flag with the yacc command, the yacc program generates that file from the yacc grammar file information.

• The y.tab.h file contains definitions for the tokens that the parser program uses. In addition, the lexical analyzer file contains the rules to generate these tokens from the input stream.

• Thus the parser is implemented using YACC.