
• Syntax error handling
  – Errors can occur at many levels
    • lexical: unknown operator
    • syntactic: unbalanced parentheses
    • semantic: variable never declared
    • runtime: reference a NULL pointer
  – Goals of error handling in a parser
    • To detect and report the presence of errors
    • To recover from an error and detect subsequent errors
    • To not slow down the processing of correct programs

Error recovery strategies

• Panic mode recovery
  – On discovering an error, discard input symbols one at a time until one of a designated set of synchronizing tokens is found.
• Phrase-level recovery
  – On discovering an error, perform a local fix to allow the parser to continue.
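
The panic-mode strategy above can be sketched in C as follows. This is only an illustration under simplifying assumptions: each token is a single character of a string, and the function name skip_to_sync and its parameters are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Panic-mode skip (illustrative sketch).
 * Assumption: each token is one character of a NUL-terminated string.
 * Starting at pos, discard tokens one at a time until a token from
 * sync_set is found or the input ends; return the new position. */
size_t skip_to_sync(const char *tokens, size_t pos, const char *sync_set)
{
    while (tokens[pos] != '\0' && strchr(sync_set, tokens[pos]) == NULL) {
        pos++;  /* discard one input symbol */
    }
    return pos;
}
```

With ';' and '}' chosen as synchronizing tokens, a call such as skip_to_sync("a+*b;c", 1, ";}") stops at the ';', so parsing could resume at a statement boundary.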

• Error recovery in predictive parsing
  – Recovery in a non-recursive predictive parser is easier than in a recursive descent parser.
  – Panic mode recovery
    • If a terminal is on top of the stack, pop the terminal.
    • If a non-terminal is on top of the stack, discard input symbols until one is found on which the non-terminal can be expanded.
  – Phrase-level recovery
    • Carefully fill in the blank entries of the parsing table with what to do.
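
The two panic-mode rules for a non-recursive predictive parser might look like the sketch below. The toy grammar, the single-character token encoding, and the names table and parse are assumptions made for illustration. Popping a non-terminal when the input is exhausted is an extra rule added here so that the loop always terminates.

```c
#include <stddef.h>
#include <string.h>

/* Toy LL(1) grammar (an assumption, not from the slides):
 *     E -> i P
 *     P -> + i P | epsilon
 * Terminals: 'i', '+', and '$' for end of input.
 * Nonterminals on the stack: 'E' and 'P'. */

/* Parsing table: right-hand side to push, "" for epsilon,
 * NULL for a blank (error) entry. */
static const char *table(char nonterminal, char lookahead)
{
    if (nonterminal == 'E' && lookahead == 'i') return "iP";
    if (nonterminal == 'P' && lookahead == '+') return "+iP";
    if (nonterminal == 'P' && lookahead == '$') return "";
    return NULL;
}

/* Parse a NUL-terminated token string; return the number of
 * recovery actions taken (0 means the input was accepted as is). */
int parse(const char *input)
{
    char stack[16];   /* depth stays small for this grammar */
    int top = 0, errors = 0;
    size_t pos = 0;

    stack[top++] = '$';
    stack[top++] = 'E';
    while (top > 0) {
        char x = stack[top - 1];
        char a = input[pos] != '\0' ? input[pos] : '$';

        if (x == 'E' || x == 'P') {
            const char *rhs = table(x, a);
            if (rhs != NULL) {
                top--;  /* replace the nonterminal by its right-hand side */
                for (int k = (int)strlen(rhs) - 1; k >= 0; k--)
                    stack[top++] = rhs[k];
            } else if (a != '$') {
                errors++;
                pos++;  /* panic mode: discard one input symbol */
            } else {
                errors++;
                top--;  /* assumption: at end of input, give up on x */
            }
        } else if (x == a) {
            top--;      /* terminal matches the lookahead */
            if (a != '$') pos++;
        } else {
            errors++;
            top--;      /* panic mode: pop the unmatched terminal */
        }
    }
    return errors;
}
```

For instance, parse("i+i") needs no recovery, while parse("i++i") pops the expected 'i' once and then continues with the rest of the input.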

• Error recovery in LR parsing
  – Canonical LR parsers never make extra reductions when recognizing an error.
  – SLR and LALR parsers may make extra reductions, but will never shift an erroneous input symbol onto the stack.
  – Panic mode recovery
    • Scan down the stack until a state representing a major program construct is found. Input symbols are then discarded until one is found that is in the FOLLOW set of the nonterminal. The aim is to isolate the phrase containing the error.
  – Phrase-level recovery
    • Implement an error recovery routine for each error entry in the table.
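
One way to picture the LR panic-mode step is the C sketch below. It is not taken from any parser generator: the function lr_panic_recover, the goto_a array, and the single-character tokens are all hypothetical, and a real LR parser would take these values from its generated tables.

```c
#include <stddef.h>
#include <string.h>

/* Panic-mode recovery step for an LR parser (illustrative sketch).
 * Assumptions: tokens are single characters; goto_a[s] is the state
 * reached from state s on a chosen nonterminal A (for example a
 * statement), or -1 if state s has no goto on A; follow_a lists the
 * tokens in FOLLOW(A).
 *
 * stack[0..*top] holds states, with stack[*top] on top. The caller
 * must leave room for one more state above the current top.
 * Returns 1 if the parser can resume, 0 if recovery failed. */
int lr_panic_recover(int *stack, int *top, const int *goto_a,
                     const char *input, size_t *pos, const char *follow_a)
{
    /* 1. Scan down the stack for a state with a goto on A. */
    while (*top >= 0 && goto_a[stack[*top]] < 0)
        (*top)--;
    if (*top < 0)
        return 0;

    /* 2. Discard input until a token in FOLLOW(A) or end of input. */
    while (input[*pos] != '\0' && strchr(follow_a, input[*pos]) == NULL)
        (*pos)++;

    /* 3. Push goto(s, A), as if an A had just been reduced. */
    stack[*top + 1] = goto_a[stack[*top]];
    (*top)++;
    return 1;
}
```

After a successful call the parser behaves as if the erroneous phrase had been recognized as an A, and normal parsing resumes with the current input symbol.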

Writing a parser with YACC (Yet Another Compiler Compiler)

• Generates LALR parsers.
• Works with lex: YACC calls yylex() to get the next token.
  – YACC and lex must agree on the values for each token.
• Running “yacc yaccfile” produces the file y.tab.c, which contains a routine yyparse().
• yyparse() returns 0 if the program is OK, non-zero otherwise.

• YACC file format:

    declarations
    %%
    translation rules
    %%
    supporting C-routines

• The declarations part specifies tokens, non-terminal symbols, and other C constructs.
  – To specify tokens AAA and BBB:
    • %token AAA BBB
  – To assign a token number to a token (needed when using lex), write a nonnegative integer immediately after the first appearance of the token:
    • %token EOFnumber 0
    • %token SEMInumber 101
  – Non-terminals do not need to be declared unless you want to associate them with a type (discussed later).

• Yacc environment
  – Yacc processes the specification file and produces a y.tab.c file.
  – An integer function yyparse() is produced by Yacc.
    • Calls yylex() to get tokens.
    • Returns non-zero when an error is found.
    • Returns 0 if the program is accepted.
  – You need to supply main() and yyerror() functions.

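  – Example:

    #include <stdio.h>

    extern int yyline;   /* current line number; assumed to be maintained by the scanner */
    int yyparse(void);   /* generated by Yacc in y.tab.c */

    /* Called by yyparse() when a syntax error is detected. */
    void yyerror(const char *str)
    {
        printf("yyerror: %s at line %d\n", str, yyline);
    }

    int main(void)
    {
        if (!yyparse())
            printf("accept\n");
        else
            printf("reject\n");
        return 0;
    }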

  – YACC builds an LALR parser for the grammar.
    • There may be shift/reduce and reduce/reduce conflicts if there are problems with the grammar.
    • Default conflict resolution:
      – shift/reduce --> shift
      – reduce/reduce --> the production that appears first in the grammar file
      – reduce/reduce conflicts should always be avoided
    • ‘yacc -v *.y’ will generate a report in file ‘y.output’.
    • See example1.y
    • The programmer MUST resolve all conflicts (unless you really know what you are doing).
      – Modify the grammar. See example2.y
      – Use precedence and associativity of operators.

• Use precedence and associativity of operators.
  – Use the keywords %left, %right, %nonassoc in the declarations section.
    • All tokens on the same line have the same precedence level and associativity.
    • The lines are listed in order of increasing precedence.

    %left PLUSnumber, MINUSnumber
    %left TIMESnumber, DIVIDEnumber

  – See example3.y
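
The way precedence and associativity declarations settle a shift/reduce conflict can be sketched as a small C function. This is a simplified model, not Yacc's actual code: the names Assoc, Action, and resolve_shift_reduce are hypothetical, precedence levels are modeled as integers that grow with each declaration line, and 0 stands for "no precedence declared".

```c
typedef enum { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC } Assoc;
typedef enum { ACTION_SHIFT, ACTION_REDUCE, ACTION_ERROR } Action;

/* Decide a shift/reduce conflict between a production and the
 * lookahead token (illustrative sketch).
 *   rule_prec:  precedence level of the production, normally that of
 *               its last terminal; 0 if none was declared
 *   token_prec: precedence level of the lookahead token; 0 if none
 *   assoc:      associativity declared for that precedence level */
Action resolve_shift_reduce(int rule_prec, int token_prec, Assoc assoc)
{
    if (rule_prec == 0 || token_prec == 0)
        return ACTION_SHIFT;            /* default resolution: shift */
    if (rule_prec > token_prec)
        return ACTION_REDUCE;           /* production binds tighter */
    if (rule_prec < token_prec)
        return ACTION_SHIFT;            /* lookahead binds tighter */
    switch (assoc) {                    /* same level: use associativity */
    case ASSOC_LEFT:  return ACTION_REDUCE;
    case ASSOC_RIGHT: return ACTION_SHIFT;
    default:          return ACTION_ERROR;  /* %nonassoc */
    }
}
```

With the two %left lines above modeled as levels 1 and 2, an input like a + b * c reaches the conflict with a level-1 production and a level-2 lookahead, so the sketch shifts. For a + b + c the levels are equal and left associativity selects the reduction.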

• Symbol attributes
  – Each symbol can be associated with some attributes.
    • The data structure of the attributes can be specified in the union in the declarations (see example4.y).

    %union {
      int semantic_value;
    }
    %token <semantic_value> ICONSTnumber 119
    %type <semantic_value> exp
    %type <semantic_value> term
    %type <semantic_value> item

• Semantic actions
  – Semantic actions associated with productions can be specified.

    item : LPARENnumber exp RPARENnumber {$$ = $2;}
         | ICONSTnumber {$$ = $1;}
         ;

    • $$ is the attribute associated with the left-hand side of the production.
    • $1 is the attribute associated with the first symbol on the right-hand side, $2 with the second symbol, …
  – An action can appear anywhere in the production; it is also counted as a symbol.
  – Check out example5.y for examples with multiple types associated with different symbols.
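
To make the meaning of $$ and $k more concrete, the sketch below models a reduction on a value stack in plain C. It is a simplified picture rather than the code Yacc generates: the names vstack, push_value, and reduce_copy are hypothetical, attribute values are plain ints (as with semantic_value above), and bounds checks are omitted for brevity.

```c
#define VSTACK_MAX 64

static int vstack[VSTACK_MAX];  /* attribute values of the symbols on the parse stack */
static int vtop = 0;            /* number of values currently on the stack */

/* Push the attribute of a symbol that has just been shifted. */
void push_value(int v)
{
    vstack[vtop++] = v;
}

/* Reduce by a production with rhs_len right-hand-side symbols whose
 * action is {$$ = $k;}. Returns the value assigned to $$. */
int reduce_copy(int rhs_len, int k)
{
    int *rhs = &vstack[vtop - rhs_len];  /* rhs[0] is $1, rhs[1] is $2, ... */
    int result = rhs[k - 1];             /* $$ = $k */
    vtop -= rhs_len;                     /* pop the right-hand-side values */
    vstack[vtop++] = result;             /* push the left-hand-side value */
    return result;
}
```

Reducing item : LPARENnumber exp RPARENnumber then corresponds to reduce_copy(3, 2): the three right-hand-side values are popped and the value of exp becomes the value of item.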
