Syntax error handling
• Syntax error handling
– Errors can occur at many levels
• lexical: unknown operator
• syntactic: unbalanced parentheses
• semantic: variable never declared
• runtime: dereferencing a NULL pointer
– Goals of error handling in a parser
• To detect and report the presence of errors
• To recover from an error and detect subsequent errors
• To not slow down the processing of correct programs
Error recovery strategies
• Panic mode recovery
– On discovering an error, discard input symbols one at a time until one of a designated set of synchronizing tokens is found.
• Phrase-level recovery
– On discovering an error, perform a local fix that allows the parser to continue.
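The panic mode strategy above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the token codes, the array-based token stream ending in TOK_EOF, and the function names panic_skip and is_sync are all made up for this example and do not come from any parser generator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes, for illustration only. */
enum { TOK_EOF = 0, TOK_ID, TOK_PLUS, TOK_SEMI, TOK_RBRACE };

/* Return 1 if tok is one of the nsync synchronizing tokens. */
static int is_sync(int tok, const int *sync, size_t nsync)
{
    for (size_t i = 0; i < nsync; i++)
        if (sync[i] == tok)
            return 1;
    return 0;
}

/*
 * Panic-mode skip: starting at position pos, discard tokens one at a
 * time until a synchronizing token or the end of input is reached.
 * The token array is assumed to end with TOK_EOF.
 * Returns the position at which parsing should resume.
 */
size_t panic_skip(const int *toks, size_t pos,
                  const int *sync, size_t nsync)
{
    assert(toks != NULL && sync != NULL);
    while (toks[pos] != TOK_EOF && !is_sync(toks[pos], sync, nsync))
        pos++;
    return pos;
}
```

With the synchronizing set { TOK_SEMI, TOK_RBRACE }, an error in the middle of a statement makes the parser resume at the next semicolon or closing brace, so later statements can still be checked.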
• Error recovery in predictive parsing
– Recovery in a non-recursive predictive parser is easier than in a recursive descent parser.
– Panic mode recovery
• If a terminal is on top of the stack and does not match the input, pop the terminal.
• If a non-terminal is on top of the stack, skip input symbols until one is found for which the non-terminal can be expanded.
– Phrase-level recovery
• Carefully fill in the blank entries of the parsing table with what to do in each case.
– Error recovery in LR parsing
• Canonical LR parsers never make extra reductions before detecting an error.
• SLR and LALR parsers may make extra reductions, but will never shift an erroneous input symbol onto the stack.
• Panic mode recovery
– Scan down the stack until a state representing a major program construct is found. Discard input symbols until one is found that is in the FOLLOW set of that nonterminal. The aim is to isolate the phrase containing the error.
• Phrase-level recovery
– Implement an error recovery routine for each error entry in the table.
– Writing a parser with YACC (Yet Another Compiler Compiler).
• Generates LALR parsers
• Works with lex. YACC calls yylex() to get the next token.
– YACC and lex must agree on the values for each token.
• Running “yacc yaccfile” produces the file y.tab.c, which contains a routine yyparse().
• yyparse() returns 0 if the program is ok, non-zero otherwise
• YACC file format:
declarations
%%
translation rules
%%
supporting C-routines
• The declarations part specifies tokens, non-terminal symbols, and other C constructs.
– To declare the tokens AAA and BBB:
• %token AAA BBB
– To assign a token number to a token (needed when using lex), put a nonnegative integer immediately after the first appearance of the token:
• %token EOFnumber 0
• %token SEMInumber 101
– Non-terminals do not need to be declared unless you want to associate them with a type (discussed later).
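To make the agreement on token numbers concrete, here is a minimal hand-written yylex() in C standing in for a lex-generated scanner. It is a sketch under assumptions: EOFnumber, SEMInumber and ICONSTnumber use the numbers shown in these slides, while the values of PLUSnumber and BADnumber, the variable yylval_int, and the helper set_input are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <ctype.h>

/* Token numbers; these must match the %token declarations in the
 * yacc file. PLUSnumber and BADnumber are hypothetical values. */
#define EOFnumber    0
#define SEMInumber   101
#define PLUSnumber   102   /* hypothetical value */
#define ICONSTnumber 119
#define BADnumber    999   /* hypothetical code for an unknown character */

static const char *input = "";  /* text still to be scanned */
int yylval_int;                  /* value of the last ICONSTnumber */

/* Hypothetical helper: choose the string to scan. */
void set_input(const char *s)
{
    assert(s != 0);
    input = s;
}

/* Return the next token number, or EOFnumber at end of input. */
int yylex(void)
{
    while (isspace((unsigned char)*input))
        input++;
    if (*input == '\0')
        return EOFnumber;
    if (isdigit((unsigned char)*input)) {
        yylval_int = 0;
        while (isdigit((unsigned char)*input))
            yylval_int = yylval_int * 10 + (*input++ - '0');
        return ICONSTnumber;
    }
    switch (*input++) {
    case ';': return SEMInumber;
    case '+': return PLUSnumber;
    default:  return BADnumber;
    }
}
```

If the scanner returned 101 for a semicolon while the parser expected a different number for SEMInumber, the parser would misread every statement terminator, which is why both sides must share one set of definitions.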
• Translation rules specify the grammar productions
exp : exp PLUSnumber exp
| exp MINUSnumber exp
| exp TIMESnumber exp
| exp DIVIDEnumber exp
| LPARENnumber exp RPARENnumber
| ICONSTnumber
;
exp : exp PLUSnumber exp
;
exp : exp MINUSnumber exp
;
• Yacc environment
– Yacc processes the specification file and produces a y.tab.c file.
– An integer function yyparse() is produced by Yacc.
• Calls yylex() to get tokens.
• Returns non-zero when an error is found.
• Returns 0 if the program is accepted.
– The user must supply the main() and yyerror() functions.
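– Example:
#include <stdio.h>
extern int yyline;      /* current line number; assumed to be maintained by the lexer */
int yyparse(void);      /* generated by Yacc in y.tab.c */
/* Called by yyparse() when a syntax error is found. */
void yyerror(const char *str)
{
    printf("yyerror: %s at line %d\n", str, yyline);
}
int main(void)
{
    if (!yyparse()) printf("accept\n");
    else printf("reject\n");
    return 0;
}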
– YACC builds an LALR parser for the grammar.
• May have shift/reduce and reduce/reduce conflicts if there are problems with the grammar.
• Default conflict resolution:
– shift/reduce --> shift
– reduce/reduce --> reduce by the production listed first in the grammar
– Reduce/reduce conflicts should always be avoided.
• ‘yacc -v *.y’ will generate a report in file ‘y.output’.
• See example1.y
• The programmer MUST resolve all conflicts (unless you really know what you are doing).
– Modify the grammar. See example2.y
– Use precedence and associativity of operators.
• Use precedence and associativity of operators.
– Use the keywords %left, %right, %nonassoc in the declarations section.
• All tokens on the same line have the same precedence level and associativity.
• The lines are listed in order of increasing precedence.
%left PLUSnumber, MINUSnumber
%left TIMESnumber, DIVIDEnumber
– See example3.y
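The way precedence and associativity declarations settle a shift/reduce conflict can be sketched as a small C function. This is an illustration of the usual rule (compare the precedence of the rule with that of the lookahead token, and fall back on associativity when they are equal), not code taken from Yacc itself. The names resolve, enum assoc and enum action are hypothetical, and precedence levels are assumed to be positive integers that grow with each later declaration line.

```c
#include <assert.h>

enum assoc  { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONE };
enum action { ACT_SHIFT, ACT_REDUCE, ACT_ERROR };

/*
 * Decide a shift/reduce conflict between a rule and the lookahead token.
 * rule_prec: precedence level of the rule to be reduced.
 * tok_prec, tok_assoc: level and associativity of the lookahead token.
 * A higher number means a later %left/%right/%nonassoc line.
 */
enum action resolve(int rule_prec, int tok_prec, enum assoc tok_assoc)
{
    assert(rule_prec > 0 && tok_prec > 0);
    if (tok_prec > rule_prec)
        return ACT_SHIFT;      /* lookahead binds tighter */
    if (tok_prec < rule_prec)
        return ACT_REDUCE;     /* rule binds tighter */
    switch (tok_assoc) {       /* same level: associativity decides */
    case ASSOC_LEFT:  return ACT_REDUCE;
    case ASSOC_RIGHT: return ACT_SHIFT;
    default:          return ACT_ERROR;
    }
}
```

With the two %left lines above, PLUSnumber and MINUSnumber are at level 1 and TIMESnumber and DIVIDEnumber at level 2. After seeing exp PLUSnumber exp with TIMESnumber as lookahead, resolve(1, 2, ASSOC_LEFT) chooses to shift, so multiplication groups first; with PLUSnumber as lookahead, resolve(1, 1, ASSOC_LEFT) chooses to reduce, which gives left associativity.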
• Symbol attributes
– Each symbol can be associated with some attributes.
• The data structure of the attributes can be specified in the union in the declarations (see example4.y).
%union {
int semantic_value;
}
%token <semantic_value> ICONSTnumber 119
%type <semantic_value> exp
%type <semantic_value> term
%type <semantic_value> item
• Semantic actions
– Semantic actions associated with productions can be specified.
item : LPARENnumber exp RPARENnumber {$$ = $2;}
| ICONSTnumber {$$ = $1;}
;
• $$ is the attribute associated with the left-hand side of the production.
• $1 is the attribute associated with the first symbol on the right-hand side, $2 with the second symbol, …
– An action can appear anywhere in the production; it is also counted as a symbol.
– Check out example5.y for examples with multiple types associated with different symbols.
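One way to picture $$ and $1, $2, … is as entries of a value stack kept alongside the parser stack. The C sketch below models a reduction whose action is of the form $$ = $k, as in the item rules above. It is a simplified illustration, not the code Yacc generates: the names vstack, vtop, push_value and reduce_copy are hypothetical, and all attributes are assumed to be plain ints.

```c
#include <assert.h>

#define STACK_MAX 64

static int vstack[STACK_MAX];  /* semantic value stack */
static int vtop = 0;           /* number of values on the stack */

/* Push the attribute of a shifted symbol. */
void push_value(int v)
{
    assert(vtop < STACK_MAX);
    vstack[vtop++] = v;
}

/*
 * Reduce by a production with n right-hand-side symbols whose action
 * is "$$ = $k". $1..$n are the top n stack entries; they are popped
 * and replaced by the single value $$.
 */
int reduce_copy(int n, int k)
{
    assert(n >= 1 && n <= vtop && k >= 1 && k <= n);
    int *rhs = &vstack[vtop - n];   /* rhs[0] is $1, rhs[n-1] is $n */
    int result = rhs[k - 1];        /* $$ = $k */
    vtop -= n;                      /* pop the right-hand side */
    vstack[vtop++] = result;        /* push the left-hand side */
    return result;
}
```

For item : LPARENnumber exp RPARENnumber {$$ = $2;}, three values are on the stack when the reduction happens, and reduce_copy(3, 2) leaves only the value of exp behind as the value of item.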