Upload: zoe-cox

Post on 24-Dec-2015

Parser construction tools: YACC

• Yacc is available on Unix systems; it generates LALR parsers written in C

http://csiweb.ucd.ie/staff/acater/comp30330.html Compiler Construction 1

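[Diagram: yacc specification → yacc → y.tab.c → C compiler → compiled parser; the C compiler also takes the ly library and more of your C as inputs; the compiled parser reads source and produces output]

The yacc specification may ‘#include’ a lexical analyzer produced by Lex, or by other means

The generated y.tab.c contains the LALR parser ‘yyparse’, which uses the parsing table built by yacc and calls the lexer ‘yylex’; the ly library supplies default ‘main’ and ‘yyerror’ routines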

The three parts of a yacc specification

1. declarations

– ordinary C, enclosed in %{ … %}, copied verbatim into y.tab.c

– declarations for use by yacc, such as %token, %left, %right, %nonassoc

2. separator – %%

3. grammar rules. Each one has

– a nonterminal name followed by a colon

– productions separated by vertical bar, possibly each with additional semantic actions and precedence information

– a final semicolon

4. separator – %%

5. supporting C routines

– there must at least be a lexical analyser named yylex

– commonly accomplished by writing #include "lex.yy.c", where the lex program has been used to build the lexer; yylex can also be hand-written

Simple Desk-Calculator example

%{
#include <ctype.h>   /* declares isdigit among others */
#include <stdio.h>   /* declares printf and getchar */
int yylex(void);
%}
%token DIGIT         /* declares the token DIGIT for use in grammar rules and also in lexer code */
%%
line   : expr '\n'         { printf("%d\n", $1); }
       ;
expr   : expr '+' term     { $$ = $1 + $3; }   /* a semantic rule */
       | term              /* default semantic rule $$ = $1 is useful for single productions */
       ;
term   : term '*' factor   { $$ = $1 * $3; }
       | factor
       ;
factor : '(' expr ')'      { $$ = $2; }
       | DIGIT
       ;
%%
/* alternatively, #include "lex.yy.c" here to use the yylex routine built by Lex */
int yylex(void) {
    int c;
    c = getchar();
    /* lexer uses C variable yylval to communicate attribute value */
    if (isdigit(c)) { yylval = c - '0'; return DIGIT; }
    return c;
}

Ambiguous grammars in Yacc

• Yacc declarations allow for shift/reduce and reduce/reduce conflicts to be resolved using operator precedence and operator associativity information

Yacc does have default methods for resolving conflicts but it is considered wise to find out (using the -v option) what conflicts arose and how they were resolved.

The declarations provide a way to override Yacc’s defaults.

Productions have the precedence of their rightmost terminal, unless otherwise specified by a %prec element.

• the declaration keywords %left, %right and %nonassoc inform Yacc that the tokens following are to be treated as left-associative (as binary +, -, * and / commonly are), right-associative (as exponentiation and assignment operators often are), or non-associative (as binary < and > often are)

%left '+' '-'
%left '*' '/'

effect is that * has higher precedence than +, so x+y*z is grouped like x+(y*z)

• the order of declarations informs yacc that the tokens should be accorded increasing precedence
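The grouping that these declarations produce can be imitated in plain C. The sketch below is not Yacc's mechanism (Yacc applies precedence while resolving shift/reduce conflicts in its LALR table); it is a hand-written precedence-climbing evaluator, shown only to make the effect of precedence levels and associativity concrete. The names op_info, parse_expr and eval are hypothetical, and the '^' row stands for an assumed %right '^' declaration that the slides do not contain. Input is assumed to hold non-negative integers with no spaces.

```c
#include <ctype.h>
#include <stddef.h>

enum assoc { LEFT, RIGHT };

struct op_info { char op; int prec; enum assoc assoc; };

/* Mirrors  %left '+' '-'  followed by  %left '*' '/' : later rows bind tighter.
   The '^' row is a hypothetical  %right '^'  added for illustration. */
static const struct op_info ops[] = {
    { '+', 1, LEFT }, { '-', 1, LEFT },
    { '*', 2, LEFT }, { '/', 2, LEFT },
    { '^', 3, RIGHT },
};

static const struct op_info *find_op(char c)
{
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++)
        if (ops[i].op == c)
            return &ops[i];
    return NULL;
}

static long ipow(long base, long exp)
{
    long r = 1;
    while (exp-- > 0)
        r *= base;
    return r;
}

static long apply(char op, long a, long b)
{
    switch (op) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    case '/': return b != 0 ? a / b : 0;   /* sketch: division by zero yields 0 */
    default:  return ipow(a, b);           /* '^' */
    }
}

static long parse_expr(const char **p, int min_prec);

/* primary: an unsigned integer or a parenthesised expression */
static long parse_primary(const char **p)
{
    long v = 0;
    if (**p == '(') {
        ++*p;
        v = parse_expr(p, 1);
        if (**p == ')')
            ++*p;
        return v;
    }
    while (isdigit((unsigned char)**p))
        v = v * 10 + (*(*p)++ - '0');
    return v;
}

static long parse_expr(const char **p, int min_prec)
{
    long lhs = parse_primary(p);
    const struct op_info *o;

    while ((o = find_op(**p)) != NULL && o->prec >= min_prec) {
        ++*p;
        /* Left-associative: the right operand may only contain tighter operators.
           Right-associative: operators of the same level may nest to the right. */
        long rhs = parse_expr(p, o->assoc == LEFT ? o->prec + 1 : o->prec);
        lhs = apply(o->op, lhs, rhs);
    }
    return lhs;
}

long eval(const char *s)
{
    return parse_expr(&s, 1);
}
```

With this table, eval("2+3*4") returns 14 because * sits on a later (higher) level than +, eval("8-3-2") returns 3 because - is left-associative, and eval("2^3^2") returns 512 because the assumed ^ is right-associative.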

Semantic actions in Yacc

• Each time the lexer returns a token, it can also produce an attribute value in the variable named yylval

• Attribute values for nonterminals can also be produced by semantic actions

– several C statements enclosed in { … }

– $$ refers to attribute value for lhs nonterminal

– $1, $2 etc refer to attribute values for successive rhs grammar symbols

• Desk Calculator example uses only simple arithmetic operations. True compilers can have much more complex code in their productions’ semantic actions
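One way to picture $$, $1, $2 and so on is as slots on a value stack that the parser keeps alongside its state stack: a shift pushes the token's attribute value, and a reduction replaces the values of the right-hand side with the single value computed by the action. The sketch below illustrates that idea only; the names vstack, shift_value, reduce and act_add are hypothetical, and the code Yacc actually generates in y.tab.c is organised differently.

```c
#define STACK_MAX 64

int vstack[STACK_MAX];   /* attribute values, one slot per grammar symbol on the stack */
int vtop = 0;            /* number of values currently on the stack */

/* A semantic action receives the right-hand-side values:
   rhs[0] plays the role of $1, rhs[1] of $2, and so on.
   Its return value plays the role of $$. */
typedef int (*action_fn)(const int *rhs);

int act_add(const int *rhs)     { return rhs[0] + rhs[2]; }  /* $$ = $1 + $3 */
int act_mul(const int *rhs)     { return rhs[0] * rhs[2]; }  /* $$ = $1 * $3 */
int act_paren(const int *rhs)   { return rhs[1]; }           /* $$ = $2      */
int act_default(const int *rhs) { return rhs[0]; }           /* $$ = $1      */

/* Shift: push the attribute value supplied by the lexer.
   Tokens without a meaningful attribute still occupy a slot. */
int shift_value(int v)
{
    if (vtop >= STACK_MAX)
        return 0;
    vstack[vtop++] = v;
    return 1;
}

/* Reduce by a production with rhs_len symbols on its right-hand side
   (empty productions are left out of this sketch). */
int reduce(int rhs_len, action_fn action)
{
    if (rhs_len < 1 || rhs_len > vtop)
        return 0;
    int result = action(&vstack[vtop - rhs_len]);
    vtop -= rhs_len;          /* pop the values of the right-hand side */
    vstack[vtop++] = result;  /* push the value of the left-hand side  */
    return 1;
}
```

For the input 2+3*4, shifting the values 2, 0, 3, 0, 4 (0 standing in for the operators' unused attributes) and then calling reduce(3, act_mul) followed by reduce(3, act_add) leaves the single value 14 on the stack. The unit reductions such as factor : DIGIT are omitted here because the default action leaves the value unchanged.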

Bigger Desk-Calculator example

%{
#include <ctype.h>
#include <stdio.h>
#define YYSTYPE double   /* double type for Yacc stack */
%}
%token NUMBER
%left '+' '-'
%left '*' '/'
%right UMINUS
%%
lines : lines expr '\n'        { printf("%g\n", $2); }
      | lines '\n'
      | /* empty */
      | error '\n'             { yyerror("reenter previous line"); yyerrok; }
      ;
expr  : expr '+' expr          { $$ = $1 + $3; }
      | expr '-' expr          { $$ = $1 - $3; }
      | expr '*' expr          { $$ = $1 * $3; }
      | expr '/' expr          { $$ = $1 / $3; }
      | '(' expr ')'           { $$ = $2; }
      | '-' expr %prec UMINUS  { $$ = -$2; }
      | NUMBER
      ;
%%
#include "lex.yy.c"
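If Lex is not used, the #include "lex.yy.c" line could be replaced by a hand-written yylex that recognises NUMBER. The following is a minimal sketch under stated assumptions: the value 257 for NUMBER, the stand-alone definition of yylval, and the lex_src string pointer are all hypothetical stand-ins so that the fragment compiles on its own. In a real build the token code and yylval come from y.tab.c, and the lexer would read standard input with getchar() as the simple example does.

```c
#include <ctype.h>

#define NUMBER 257           /* assumption: stand-in for the code yacc assigns to %token NUMBER */

double yylval;               /* stand-in for the YYSTYPE (double) variable defined in y.tab.c */
const char *lex_src = "";    /* hypothetical input buffer used by this sketch instead of stdin */

int yylex(void)
{
    /* Skip blanks and tabs, but not newline: the grammar uses '\n' as a token. */
    while (*lex_src == ' ' || *lex_src == '\t')
        lex_src++;

    if (*lex_src == '\0')
        return 0;            /* a zero token tells the parser that input has ended */

    if (isdigit((unsigned char)*lex_src)) {
        double value = 0.0, divisor = 1.0;

        /* integer part */
        while (isdigit((unsigned char)*lex_src))
            value = value * 10.0 + (*lex_src++ - '0');

        /* optional fractional part */
        if (*lex_src == '.') {
            lex_src++;
            while (isdigit((unsigned char)*lex_src)) {
                value = value * 10.0 + (*lex_src++ - '0');
                divisor *= 10.0;
            }
        }
        yylval = value / divisor;   /* attribute value handed to the parser */
        return NUMBER;
    }

    /* Any other character, such as '+', '(' or '\n', is its own token. */
    return (unsigned char)*lex_src++;
}
```

Returning single characters as their own token codes matches the way the grammar writes '+', '(' and '\n' directly, which is why only NUMBER needs a %token declaration.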

Error handling in Yacc-generated parsers

• Rules may include error productions for selected nonterminals

– stmt : {…} | {…} | {…} | error …

– error is a Yacc reserved word

• If the parser has no action for a combination of {state, input token}, then

1. it scans its stack for a state with an error production among its items

2. it pushes “error” onto its symbol stack

3. it scans input stream for a sequence reducible to the symbols that follow error in that production – which may be empty

§ it pushes all of that sequence onto its symbol stack

§ it reduces according to the error production

– which may cause semantic actions to be carried out

– often involving routines yyerror(msg) and yyerrok
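The observable effect of a production such as error '\n' is that the rest of a bad line is discarded and parsing resumes with the next line. The sketch below imitates that behaviour by hand for a toy input format (lines of single digits joined by +). It does not reproduce Yacc's state-stack mechanism described in the steps above, and the names parse_sum and eval_lines are hypothetical.

```c
/* Parses  digit ('+' digit)*  followed by a newline, starting at *p.
   Returns 1 and stores the sum on success; returns 0 at the first
   unexpected character, leaving *p pointing at it. */
static int parse_sum(const char **p, int *sum)
{
    if (**p < '0' || **p > '9')
        return 0;
    *sum = *(*p)++ - '0';
    while (**p == '+') {
        ++*p;
        if (**p < '0' || **p > '9')
            return 0;
        *sum += *(*p)++ - '0';
    }
    return **p == '\n';
}

/* Evaluates every line of input. On a syntax error it counts the error
   (where a Yacc action would call yyerror), discards input up to and
   including the next newline (the synchronising token in  error '\n'),
   and then carries on (the effect of yyerrok).
   Returns the number of results stored. */
int eval_lines(const char *input, int *results, int max_results, int *errors)
{
    const char *p = input;
    int n = 0;

    *errors = 0;
    while (*p != '\0') {
        int sum;

        if (*p == '\n') {            /* blank line, as in  lines : lines '\n' */
            ++p;
        } else if (parse_sum(&p, &sum)) {
            ++p;                     /* consume the newline that ended the line */
            if (n < max_results)
                results[n++] = sum;
        } else {
            ++*errors;
            while (*p != '\0' && *p != '\n')
                ++p;                 /* skip the rest of the bad line */
            if (*p == '\n')
                ++p;
        }
    }
    return n;
}
```

For the input "1+2\n3+\n4+5\n" this stores the results 3 and 9 and reports one error: the malformed second line is skipped without affecting the third.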

Some other free parser generators

see e.g. www.thefreecountry.com/programming/compilerconstruction.html

Name     Languages                                Parser Type         Lexers   Impl. Lang   Yacc Compatible   Some Features
Antlr    C, C++, Java, C#, Objective C, Python    Recursive Descent   Y        Java?        N                 Also ASTs (Abstract Syntax Trees)
JavaCC   Java                                     Recursive Descent   Y        Java         N
Bison    C                                        LALR                N        C            Y                 Facilitates multiple parsers in one program
Yacc     C                                        LALR                N        C            Y!
YaYacc   C++                                      LALR                N                     Y                 FreeBSD