Parser construction tools: YACC
• Yacc is available on Unix systems; it generates LALR parsers in C
http://csiweb.ucd.ie/staff/acater/comp30330.html Compiler Construction 1
[Diagram: yacc specification → yacc → y.tab.c → C compiler (together with the ly library and more of your C) → compiled parser; source → compiled parser → output]
The yacc specification may ‘#include’ a lexical analyzer produced by Lex, or by other means.
The ly library (linked with -ly) contains the LALR parser, which uses the parsing table built by yacc and calls the lexer ‘yylex’.
The three parts of a yacc specification
1. declarations
– ordinary C, enclosed in %{ … %}, copied verbatim into y.tab.c
– declarations for use by yacc, such as %token, %left, %right, %nonassoc
2. separator – %%
3. grammar rules. Each one has
– a nonterminal name followed by a colon
– productions separated by vertical bar, possibly each with additional semantic actions and precedence information
– a final semicolon
4. separator – %%
5. supporting C routines
– there must at least be a lexical analyser named yylex
– commonly accomplished by writing #include "lex.yy.c", where the lex program has been used to build the lexer; the lexer can also be hand-written
Simple Desk-Calculator example
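%{
#include <ctype.h>
#include <stdio.h>   /* for printf and getchar */
%}
%token DIGIT
%%
line   : expr '\n'         { printf("%d\n", $1); }
       ;
expr   : expr '+' term     { $$ = $1 + $3; }
       | term
       ;
term   : term '*' factor   { $$ = $1 * $3; }
       | factor
       ;
factor : '(' expr ')'      { $$ = $2; }
       | DIGIT
       ;
%%
/* Hand-written lexer: one digit is one DIGIT token, anything else is returned as itself */
int yylex(void) {
    int c;
    c = getchar();
    if (isdigit(c)) { yylval = c - '0'; return DIGIT; }
    return c;
}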
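Notes on the example:
– #include <ctype.h> declares isdigit among others
– %token DIGIT declares the token DIGIT for use in grammar rules and also in lexer code
– the code in { … } after a production is a semantic rule
– the default semantic rule $$ = $1 is useful for single productions
– the lexer uses the C variable ‘yylval’ to communicate the attribute value
– #include "lex.yy.c" in place of the hand-written yylex to use the yylex routine built by Lex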
Ambiguous grammars in Yacc
• Yacc declarations allow shift/reduce conflicts to be resolved using operator precedence and operator associativity information
• Yacc does have default methods for resolving conflicts (shift in a shift/reduce conflict, the earlier rule in a reduce/reduce conflict), but it is considered wise to find out (using the -v option) what conflicts arose and how they were resolved
• The declarations provide a way to override Yacc’s defaults. A production has the precedence of its rightmost terminal, unless otherwise specified by a %prec element
• The declaration keywords %left, %right and %nonassoc inform Yacc that the tokens following are to be treated as left-associative (as binary +, -, * and / commonly are), right-associative (as exponentiation and assignment often are), or non-associative (as binary < and > often are)
%left '+' '-'
%left '*' '/'
effect is that * has higher precedence than +, so x+y*z is grouped like x+(y*z)
• the order of the declarations tells yacc to give the tokens increasing precedence: tokens declared on a later line bind more tightly
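The effect of precedence levels and left associativity on grouping can be tried out in plain C. The following is a minimal sketch, not Yacc's own mechanism: Yacc applies the declarations while building its LALR table, whereas this sketch uses precedence climbing, a different technique that yields the same grouping. The names prec_of, parse_expr, parse_primary, eval and cur are hypothetical, and operands are assumed to be single digits with no whitespace.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Precedence level of a binary operator, mirroring
       %left '+' '-'
       %left '*' '/'
   A later declaration gets a higher level; 0 means "not an operator". */
static int prec_of(char op)
{
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;
    }
}

static const char *cur;   /* hypothetical cursor into the expression text */

static int parse_expr(int min_prec);

/* primary: a single digit or a parenthesised expression */
static int parse_primary(void)
{
    if (*cur == '(') {
        cur++;
        int v = parse_expr(1);
        if (*cur == ')')
            cur++;
        return v;
    }
    assert(isdigit((unsigned char)*cur));
    return *cur++ - '0';
}

/* Parses operators whose level is at least min_prec (always >= 1). */
static int parse_expr(int min_prec)
{
    int lhs = parse_primary();
    int p;
    while ((p = prec_of(*cur)) >= min_prec) {
        char op = *cur++;
        /* Left-associative: the right operand may only contain
           operators of strictly higher precedence. */
        int rhs = parse_expr(p + 1);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;   /* no check for division by zero */
        }
    }
    return lhs;
}

int eval(const char *text)
{
    assert(text != NULL);
    cur = text;
    return parse_expr(1);
}
```

Calling eval("2+3*4") groups the input as 2+(3*4), and eval("8-3-2") groups it as (8-3)-2. Passing p rather than p + 1 in the recursive call would make the operators right-associative instead.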
Semantic actions in Yacc
• Each time the lexer returns a token, it can also produce an attribute value in the variable named yylval
• Attribute values for nonterminals can also be produced by semantic actions
– several C statements enclosed in { … }
– $$ refers to attribute value for lhs nonterminal
– $1, $2 etc refer to attribute values for successive rhs grammar symbols
• The Desk Calculator example uses only simple arithmetic operations. True compilers can have much more complex code in their productions’ semantic actions
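What $$, $1 and $3 stand for can be pictured with a value stack: a Yacc parser keeps attribute values on a stack alongside its states, and a reduction replaces the values of the right-hand-side symbols with the single value of the left-hand side. The following is a minimal sketch of that idea only; ValueStack, vs_push, reduce_add, reduce_default and STACK_MAX are hypothetical names, not part of Yacc.

```c
#include <assert.h>

#define STACK_MAX 64          /* hypothetical capacity */

/* Hypothetical value stack holding one attribute value per grammar symbol. */
typedef struct {
    int vals[STACK_MAX];
    int top;                  /* number of values currently held */
} ValueStack;

void vs_push(ValueStack *s, int v)
{
    assert(s->top < STACK_MAX);
    s->vals[s->top++] = v;
}

/* Reduce by  expr : expr '+' term  { $$ = $1 + $3; }
   The three right-hand-side values are the top three entries;
   they are replaced by the single value of the left-hand side. */
void reduce_add(ValueStack *s)
{
    assert(s->top >= 3);
    int *rhs = &s->vals[s->top - 3];   /* rhs[0] is $1, rhs[2] is $3 */
    int result = rhs[0] + rhs[2];      /* $$ = $1 + $3 */
    s->top -= 3;
    vs_push(s, result);
}

/* Reduce by a single production such as  expr : term  with no action:
   the default $$ = $1 leaves the top value where it is. */
void reduce_default(ValueStack *s)
{
    assert(s->top >= 1);
}
```

For the input 4+5, pushing 4, a placeholder for '+' (its slot is never read by the action) and 5, then calling reduce_add, leaves the single value 9 on the stack.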
Bigger Desk-Calculator example
%{
#include <ctype.h>
#include <stdio.h>
#define YYSTYPE double   /* double type for Yacc stack */
%}
%token NUMBER
%left '+' '-'
%left '*' '/'
%right UMINUS
%%
lines : lines expr '\n'        { printf("%g\n", $2); }
      | lines '\n'
      | /* empty */
      | error '\n'             { yyerror("reenter previous line"); yyerrok; }
      ;
expr  : expr '+' expr          { $$ = $1 + $3; }
      | expr '-' expr          { $$ = $1 - $3; }
      | expr '*' expr          { $$ = $1 * $3; }
      | expr '/' expr          { $$ = $1 / $3; }
      | '(' expr ')'           { $$ = $2; }
      | '-' expr %prec UMINUS  { $$ = -$2; }
      | NUMBER
      ;
%%
#include "lex.yy.c"
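The example takes its lexer from lex.yy.c, but a lexer for it can also be written by hand. The following is a minimal sketch under stated assumptions: the token code NUMBER = 257 and the declaration of yylval are written out by hand (Yacc would normally generate them), and the string buffer lex_input with its setter lex_set_input is a hypothetical stand-in for standard input so that the routine can be exercised on its own.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Assumption: yacc normally generates these; they are declared by hand
   here so the sketch compiles on its own. */
#define NUMBER 257            /* hypothetical token code */
#define YYSTYPE double        /* same attribute type as in the specification */
YYSTYPE yylval;               /* attribute value of the most recent token */

/* Hypothetical input buffer standing in for stdin. */
static const char *lex_input = "";

void lex_set_input(const char *text)
{
    assert(text != NULL);
    lex_input = text;
}

/* Returns NUMBER (with the value in yylval), the character itself for
   any other input, or 0 at end of input. */
int yylex(void)
{
    /* Skip blanks and tabs, but not newline: the grammar uses '\n'. */
    while (*lex_input == ' ' || *lex_input == '\t')
        lex_input++;
    if (*lex_input == '\0')
        return 0;
    if (isdigit((unsigned char)*lex_input) || *lex_input == '.') {
        char *end;
        yylval = strtod(lex_input, &end);
        if (end != lex_input) {        /* strtod consumed a number */
            lex_input = end;
            return NUMBER;
        }
    }
    return (unsigned char)*lex_input++;
}
```

The minus sign is deliberately returned as its own token rather than folded into the number, so that the grammar's '-' expr %prec UMINUS production decides how it binds.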
Error handling in Yacc-generated parsers
• Rules may include error productions for selected nonterminals
– stmt : {…} | {…} | {…} | error …
– error is a Yacc reserved word
• If the parser has no action for a combination of {state, input token}, then
1. it scans down its stack, popping states, until it finds a state with an error production among its items
2. it pushes “error” onto its symbol stack
3. it scans the input stream for a sequence reducible to the symbols that follow error in that production (which may be none)
§ it pushes all of them onto its symbol stack
§ it reduces according to the error production
– which may cause semantic actions to be carried out
– often involving routines yyerror(msg) and yyerrok
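The input-skipping part of this recovery can be illustrated in isolation. The following is a minimal sketch, assuming an error production of the form lines : error '\n' and a token stream held in an array; it models only the discarding of input up to the synchronising token, not the popping of parser states, and the function name resync_after and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Input skipping for a production of the form  lines : error '\n' .
   Given the token stream and the index at which the parser found no
   legal action, return the index of the first token after the next
   sync token, i.e. where parsing resumes.  If the sync token never
   appears, the whole remaining input is discarded. */
size_t resync_after(const int *tokens, size_t count, size_t error_pos, int sync)
{
    assert(tokens != NULL && error_pos <= count);
    size_t i = error_pos;
    while (i < count && tokens[i] != sync)
        i++;                 /* discard tokens up to the sync token */
    if (i < count)
        i++;                 /* consume the sync token itself */
    return i;
}
```

With the tokens of the two lines 1++2 and 3, an error detected at the second '+' resumes at the start of the second line, which is the behaviour the "reenter previous line" message relies on.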
Some other free parser generators
see e.g. www.thefreecountry.com/programming/compilerconstruction.html
Name | Languages | Parser Type | Lexers | Impl. Lang | Yacc Compatible | Some Features
Antlr | C, C++, Java, C#, Objective C, Python | Recursive Descent | Y | Java? | N | Also ASTs (Abstract Syntax Trees)
JavaCC | Java | Recursive Descent | Y | Java | N |
Bison | C | LALR | N | C | Y | Facilitates multiple parsers in one program
Yacc | C | LALR | N | C | Y! |
YaYacc | C++ | LALR | N | | Y | FreeBSD