CS 410 Mastery in Programming

Chapter 5: LL(1) Parsing

Herbert G. Mayer, PSU CS
Status 7/17/2011

Syllabus

Goal

Grammars Formally, Intuitively

BNF, EBNF

Grammar G1

Suitable Grammar

Uses of Grammar to Parse

Recursive Descent

Recursive Descent Parser For s

Sample e

Sample s

Goal

The rules of a programming language L specify how to generate strings of text that are in L; all other strings are not part of L

The number of strings in L (i.e. the size of set { L }) is generally unbounded for typical programming languages

One way of expressing language rules is through a grammar G

Our goal is to become familiar with suitable grammars. Suitable means that certain rules are not allowed, such as left-recursion, circular rules, and lambda-producing rules (with exceptions!)

The class of grammars we use is context-free; thus, the more powerful class of grammars with context-sensitive rules is excluded

A side goal is to learn a particular notation for writing grammars, but that notation is simply a convenience, just a handy way of writing

We’ll focus on Backus Naur Form (BNF), AKA Backus Normal Form, from the early days of Algol-60

Grammars Formally

A grammar G for language L, named G(L), is a quintuple of { terminals, nonterminals, metasymbols, start symbol, productions } defining all strings in L; each string in L is named a program

Terminal: A final token in the language L; e.g. “hello”

Nonterminal Symbol: A grammar symbol, used as shorthand for a string of other symbols; must be defined at least once on the left-hand side of a production; it is convenient to group multiple alternatives via the metasymbol |

Metasymbol: Symbol of the grammar itself defining action or meaning; is not part of the language L defined by G; is a grammar shorthand

Start Symbol: The nonterminal whose productions start the process of generating (defining) strings in L; doesn’t have to be the first nonterminal defined in G, but is conveniently listed first

Production: Rule that defines a nonterminal; consists of the nonterminal being defined on the left-hand side, followed by the “produces” metasymbol, plus some string of symbols on the right-hand side that is not circular

Grammars, Some Terminology

The empty string is referred to as lambda. We’ll use lambda as a convenience in grammar writing; otherwise it is superfluous. It is also referred to in the literature as epsilon

Lambda is superfluous as a grammar tool, except if the language allows the empty program. In all other cases, rules that produce lambda can be replaced by other rules that do not use lambda, at the expense of a more complex grammar

The right-hand side of a suitable production (AKA alternative) eventually starts with a terminal; there could be several such terminals, if several alternatives exist. The set of all distinct terminals that can start a right-hand side is called the first set

Grammars Intuitively

A grammar G is a set of rules to produce programs; programs are strings of characters in a programming language L

Each rule has a name on the left-hand side, the nonterminal, which generates at least one sequence of other symbols; those can be terminals or nonterminals, listed on the right-hand side

Terminal is a symbol expressing a value directly, like 500. It can also be some fixed symbol, like + or ( or END. A terminal symbol cannot produce other strings

Nonterminal is a name that can be used on the right-hand side of a production. It occurs at least once on the right-hand side of a production, and is defined by nonterminals or terminals

When there are multiple rules (AKA productions) for a nonterminal, we call these alternatives

One of the nonterminals is the start symbol. That is where the generating process starts; it is often written as the first rule, but must be clearly identified somehow

Grammars

Example for grammar G0:

s : s ( s )
  |

Discussion of G0:

The only nonterminal symbol used in grammar G0 is s. Hence s must also be the start symbol

There are 2 metasymbols, or 3 if we are picky:
Metasymbol : means “left side produces the string on the right”
Metasymbol | means “another alternative for s”
End of all rules means it is the end of G0

Nothing else to the right of | means: “this alternative generates the empty string”, i.e. nothing, or lambda

The first of the two alternatives in G0 is left-recursive

There are 2 terminal symbols, ( and )

We can debate whether the empty string lambda is also a terminal symbol. I do not count the empty string, since this would be a case where an infinite sequence of the same terminal symbol (of nothings) is the same as a single occurrence; not suitable for language grammars

BNF, EBNF

While authoring the report on the language Algol60 in the late 1950s, John Backus developed a convenient shorthand, ably supported by ideas from Peter Naur:
Backus Normal Form, AKA Backus Naur Form

Typical metasymbols in the Algol60 report: ::= | <> []
[ .. ] encloses an optional phrase; allowed once or not at all
< .. > marks the enclosed name as a nonterminal; allows disambiguation between, say, nonterminal <start> and terminal symbol start
::= is the “produces” symbol; we’ll use a simpler one
| starts another alternative for a production

The notation found wide acceptance; it was extended to allow multiple options, by using the { .. } metasymbols:
{ .. } states that the .. part is included 0 or more times
{ .. }+ states that the .. part is included 1 or more times
[ .. ] states that the .. part is optional, i.e. included once or not at all
Hence called EBNF for Extended BNF
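As a hedged illustration of how the EBNF metasymbols map onto code, the sketch below recognizes two rules made up for this purpose: digits : digit { digit } and signed : [ - ] digits. The rule names and function names are assumptions, not part of the course material; the point is only that { .. } typically turns into a while loop and [ .. ] into an if statement.

```c
#include <ctype.h>

/* digits : digit { digit }      -- { .. } becomes a while loop
   signed : [ '-' ] digits       -- [ .. ] becomes an if statement
   Both rules are hypothetical, chosen only for this illustration. */

/* Consumes one digit followed by 0 or more digits at *p.
   Returns 1 on success, 0 if no digit was found. */
static int digits( const char **p )
{
    if ( !isdigit( (unsigned char) **p ) ) {
        return 0;
    }
    (*p)++;
    while ( isdigit( (unsigned char) **p ) ) {   /* 0 or more repetitions */
        (*p)++;
    }
    return 1;
}

/* Returns 1 if the whole string s matches the rule "signed", else 0. */
int is_signed_number( const char *s )
{
    if ( *s == '-' ) {                           /* optional: once or not at all */
        s++;
    }
    return digits( &s ) && *s == '\0';
}
```

Under these assumptions, is_signed_number( "-42" ) returns 1, while is_signed_number( "--7" ) returns 0, because the optional minus sign may occur at most once.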

Grammar G1

Metasymbol : means “produces”

Metasymbol | means “r.h.s. also produces …”, i.e. offers another alternative

Nonterminals e and n

Terminals + - * / ^ ( ) 0 1 2 3 4 5 6 7 8 9

Start Symbol e

Grammar G1:

e : e + n    -- addition
  | e - n    -- subtraction
  | e * n    -- multiplication
  | e / n    -- division
  | e ^ n    -- exponentiation, lots of left-recursion
  | ( e )    -- grouping
  | n        -- non-terminal for 10 terminals

n : 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Grammar G2

Rewrite G1 to be suitable for RD parsing; introduce metasymbols { } for repetition 0 or more times; see G2:

expression : term { plus_op term }
plus_op    : + | -
term       : factor { mult_op factor }
mult_op    : * | /
factor     : primary { ^ primary }
primary    : ( expression ) | number
number     : 0 | 1 | 2 | 3 | 4
           | 5 | 6 | 7 | 8 | 9

Note that the position of the semantic action effectively defines the order in which operators are applied; important for ^, which is right-associative! Others are usually left-associative; except in APL! We won’t cover semantics in CS 410

Strings in G1

8

0+7

6*6-4+2

2*(3+2)

(((7)))

((9)+8)*(((5-4)/2)/0)

Discussion of operator precedence:

In regular arithmetic, * and / bind more strongly than + and -, AKA precedence; yet G1 alone cannot express that!!

i.e. the expression 2+3*4 means 14 in arithmetic, NOT 20

However: the parser discussed does not account for precedences! This can be encoded in the grammar too, but is not covered here, since we do not include a discussion of semantics, i.e. code generation
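For readers curious how a grammar can carry precedence, here is a minimal sketch, not part of the course parser: an evaluator whose functions are layered like expression, term and primary, so that * and / are handled one level deeper than + and -. It reads from a string instead of standard input, handles single digits only, omits ^ and error reporting, and all function names are assumptions made for this example.

```c
/* Evaluator sketch with one function per precedence level.
   Names (src, eval_*) are hypothetical; single-digit numbers only. */
static const char *src;              /* cursor into the input string */

static int eval_expression( void );

/* primary : ( expression ) | number */
static int eval_primary( void )
{
    if ( *src == '(' ) {
        src++;
        int v = eval_expression();
        if ( *src == ')' ) {
            src++;
        }
        return v;
    }
    if ( *src >= '0' && *src <= '9' ) {
        return *src++ - '0';
    }
    return 0;                        /* no error handling in this sketch */
}

/* term : primary { mult_op primary }   -- binds tighter than + and - */
static int eval_term( void )
{
    int v = eval_primary();
    while ( *src == '*' || *src == '/' ) {
        char op  = *src++;
        int  rhs = eval_primary();
        if ( op == '*' ) {
            v *= rhs;
        } else if ( rhs != 0 ) {     /* division by zero is skipped */
            v /= rhs;
        }
    }
    return v;
}

/* expression : term { plus_op term } */
static int eval_expression( void )
{
    int v = eval_term();
    while ( *src == '+' || *src == '-' ) {
        char op  = *src++;
        int  rhs = eval_term();
        v = ( op == '+' ) ? v + rhs : v - rhs;
    }
    return v;
}

/* Evaluates a blank-free expression string. */
int eval( const char *text )
{
    src = text;
    return eval_expression();
}
```

With this layering, eval( "2+3*4" ) yields 14 and eval( "(2+3)*4" ) yields 20: the multiplication is completed inside eval_term() before eval_expression() ever adds.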

Suitable Grammar

Definition: Parsing means “analyzing a string for grammatical correctness, according to the rules of language L”

Definition: A program written in language L is a string of terminal symbols; these symbols are strung together according to the grammar rules of L

Such a program can be empty only if there is a way for the start symbol to generate lambda

We parse program strings in a top-down fashion. Top-down means: we start with the topmost nonterminal, AKA start symbol, regenerating the terminals from the input stream one symbol (i.e. terminal) at a time. Other methods exist, such as bottom-up parsing, but are not covered here

When we see several alternatives during the parse that may have created the program so far, we look ahead one source symbol to determine the correct next alternative

Thus was coined the shorthand LL(1): Left-to-right reading of symbols, Left-to-right grammar use (i.e. leftmost derivation), 1 symbol look-ahead

Suitable Grammar

A grammar G is suitable for LL(1) parsing if it adheres to certain restrictions, aside from being meaningful:

No lambda productions:
Except for the start symbol, no other nonterminal is allowed to generate the empty string; the reason is that a parser can always succeed in finding an empty string, so there is no real information in finding lambda
You learn the details in the compiler course CS 321/322

No left-recursive rules:
In the presence of left-recursive rules, the resulting parser we write would cause infinite regress; i.e. self-recursive calls until stack overflow
Details in the compiler course

No circular productions:
There cannot be productions of the type
a : a …            -- without intermediate productions!
a : b …   b : a …  -- with some intermediate productions!

No context-sensitive rules:
Two or more nonterminals do not occur on the left side of a production:
a b : some sequence   -- is not permitted
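To make the left-recursion restriction concrete, the sketch below codes a rule of the shape e : e + n | n literally, so that e() calls itself before consuming any input, and contrasts it with the rewritten rule e : n { + n }. The depth guard MAX_DEPTH and both function names are assumptions introduced only so that the demonstration terminates; a real parser without the guard would recurse until stack overflow.

```c
#define MAX_DEPTH 1000   /* hypothetical guard, only so the demo terminates */

/* e : e + n | n   coded literally: the first action of e() is to call e(),
   before any input is consumed. Returns the depth at which it gave up. */
int e_left_recursive( const char *input, int depth )
{
    if ( depth >= MAX_DEPTH ) {
        return depth;                 /* without the guard: stack overflow */
    }
    return e_left_recursive( input, depth + 1 );   /* input not advanced */
}

/* e : n { + n }   same strings, no left-recursion: every loop iteration
   consumes input, so parsing terminates. Returns 1 if text is accepted. */
int e_iterative( const char *text )
{
    if ( *text < '0' || *text > '9' ) {
        return 0;
    }
    text++;
    while ( *text == '+' ) {
        text++;
        if ( *text < '0' || *text > '9' ) {
            return 0;
        }
        text++;
    }
    return *text == '\0';
}
```

The left-recursive version always runs into the guard, whatever the input, whereas e_iterative( "1+2+3" ) accepts after reading five characters.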

Uses of Grammar to Parse

1.) Once we have a suitable grammar G, use G to mechanically (automatically) design a parser for language L(G). The method is named “Recursive Descent Parsing”; a common, old method, outlined below

2.) Once we have a suitable grammar G, encode G directly as a data structure. Then write a simple loop that reads the source and traverses the data structure, driven by the incoming token stream, deciding at each point which production of G to use that would allow the current source symbol

3.) If indeed a person can “mechanically implement a parser for all strings in L” given G, then a program can do so as well; Church Thesis. These programs exist and are called parser generators. Their inventors sometimes call them “Compiler Compilers”; sounds fancier. A widely used industrial-quality parser generator is YACC, so named after the tongue-in-cheek phrase: Yet Another Compiler Compiler. Available on Unix systems
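Use 2.) can be sketched for the parentheses grammar s : ( s ) s | lambda. In the sketch below the two alternatives are stored as plain strings (the data structure), and a single loop with an explicit stack decides, from one symbol of look-ahead, which alternative to expand. The encoding (the letter s for the nonterminal), the names and the stack limit are all assumptions made for this example; real table-driven parsers use richer tables.

```c
#include <string.h>

#define STACK_MAX 256   /* hypothetical limit, chosen for this sketch */

/* The grammar as data: each alternative of s is a string of symbols,
   where 's' stands for the nonterminal, '(' and ')' for terminals. */
static const char *const ALT_PARENS = "(s)s";   /* s : ( s ) s */
static const char *const ALT_LAMBDA = "";       /* s : lambda  */

/* One symbol of look-ahead selects the alternative. */
static const char *select_alternative( char lookahead )
{
    return ( lookahead == '(' ) ? ALT_PARENS : ALT_LAMBDA;
}

/* Returns 1 if text is derivable from s, else 0. */
int parse_g0( const char *text )
{
    char stack[ STACK_MAX ];
    int  top = 0;

    stack[ top++ ] = 's';                        /* push the start symbol */
    while ( top > 0 ) {
        char sym = stack[ --top ];               /* pop next grammar symbol */
        if ( sym == 's' ) {
            const char *alt = select_alternative( *text );
            size_t len = strlen( alt );
            if ( (size_t) top + len > STACK_MAX ) {
                return 0;                        /* too deep for this sketch */
            }
            while ( len > 0 ) {                  /* push right-hand side reversed */
                stack[ top++ ] = alt[ --len ];
            }
        } else if ( *text == sym ) {
            text++;                              /* terminal matches the input */
        } else {
            return 0;                            /* expected terminal not found */
        }
    }
    return *text == '\0';                        /* all input consumed? */
}
```

Under these assumptions parse_g0( "()()()" ) returns 1 and parse_g0( ")()(" ) returns 0; changing the language means changing the strings, not the loop.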

Now for the MAIN idea:

Recursive Descent

Goal: Describe an algorithm for mechanically producing a parser for language L(G) using grammar G

Preparation: Write a scanner, AKA lexical analyzer, scan() that reads the source program one character at a time, and returns a token t for each string of characters constituting a whole token, AKA lexeme. Lambda is not one of the possible tokens; and then:

For each nonterminal n defined in G, define a recursive function/procedure by that name, n(); we’ll skip some nonterminals

For each nonterminal n used on the right-hand side in G, issue a call to n()

For each terminal t that is required by any alternative in G, call must_be( t ) to verify that t was found, and to scan() the next token after t

When a production has multiple alternatives, use the mutually exclusive first-sets of the alternatives and the next input token t (i.e. look-ahead 1) to determine which nonterminal n to call; if the first-sets do not resolve this: not a suitable grammar!
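The last rule above can be checked mechanically. As a minimal sketch, assume each alternative’s first set is encoded as a bit mask with one bit per terminal; the alternatives are then distinguishable with one symbol of look-ahead exactly when no bit occurs in two masks. The bit assignments and the function name are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Hypothetical encoding: one bit per terminal of interest. */
enum {
    T_OPEN  = 1u << 0,   /* '('          */
    T_DIGIT = 1u << 1,   /* '0' .. '9'   */
    T_IF    = 1u << 2,   /* IF_SYM       */
    T_IDENT = 1u << 3    /* identifier   */
};

/* first[ i ] is the first set of alternative i of one production.
   Returns 1 if no terminal starts two different alternatives, else 0. */
int first_sets_disjoint( const unsigned first[], size_t count )
{
    unsigned seen = 0;
    for ( size_t i = 0; i < count; i++ ) {
        if ( seen & first[ i ] ) {
            return 0;            /* some terminal starts two alternatives */
        }
        seen |= first[ i ];
    }
    return 1;
}
```

For a production with the alternatives ( expression ) and number, the masks T_OPEN and T_DIGIT are disjoint, so one token of look-ahead suffices to pick the alternative.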

Recursive Descent Parser For s

Grammar G0, rewritten without left-recursion:

s : ( s ) s
  |

Sample strings in L(G0):

() or ((())) or ()()() but not )()(

scan(): For such simple tokens (AKA lexemes) consisting of single characters '(' and ')', the scanner can be as simple as the C/C++ function getchar(). Generally, tokens are multi-character symbols

Function must_be( t ) simply checks for the expected symbol t:

// assume global: char next_char, void function scan()
void must_be( char expected )
{ // must_be
    if ( next_char != expected ) {
        printf( " Expect '%c', is '%c'.\n", expected, next_char );
    } //end if
    scan();
} //end must_be

Recursive Descent Parser For parens()

void scan( )
{ // scan
    next_char = getchar();          // read next input character
    if ( BLANK == next_char ) {     // skip ' '
        scan();
    }else{
        printf( "%c", next_char );  // echo the non-blank found
    } //end if
} // end scan

void parens()
{ // parens
    if ( next_char == OPEN ) {      // that is open parenthesis '('
        scan();
        parens();                   // recurse for nested ( (
        must_be( CLOSED );          // i.e. closed parenthesis ')'
        parens();                   // recurse for sequence ( ) ( )
    } //end if                      // no more OPEN found; return
} //end parens

int main()
{ // main
    scan();                         // get first ever token
    parens();                       // language
    if ( next_char != EOF ) {       // reliable only if next_char is an int
        printf( "Garbage found\n" );
    } //end if
    return 0;
} //end main

Repeat of Grammar G2

expression : term { plus_op term }
plus_op    : + | -
term       : factor { mult_op factor }
mult_op    : * | /
factor     : primary { ^ primary }
primary    : ( expression ) | number
number     : 0 | 1 | 2 | 3 | 4
           | 5 | 6 | 7 | 8 | 9

Parser For G2 expression, 1

// parser for grammar G2:
//
// expression : term { plus_op term }
// plus_op    : '+' | '-'
// term       : factor { mult_op factor }
// mult_op    : '*' | '/'
// factor     : primary { ^ primary }
// primary    : '(' expression ')'
//            | number
// number     : '0' | '1' | '2' ... '9'
//

#include <stdio.h>

#define BLANK  ' '
#define EOL    '\n'
#define OPEN   '('
#define CLOSED ')'

char next_char = BLANK;   // globally used for "token"

#define ASSERT( c )                                                   \
    if ( next_char != c ) {                                           \
        printf( "Error, expected '%c', found '%c'\n", c, next_char ); \
    } else{                                                           \
        scan();                                                       \
    } //end if

void scan( )
{ // scan
    next_char = getchar();
    if ( BLANK == next_char ) {
        scan();
    }else{
        printf( "%c", next_char );   // echo non-blank found
    } //end if
} // end scan

void expression();   // forward announcement!!

Parser For G2 expression, 2

// really just scans a digit
// but one is expected; if not found: error
void number()
{ // number
    if ( ( next_char >= '0' ) && ( next_char <= '9' ) ) {
        scan();
    }else{
        printf( "primary expression 0,1,2 .. or '(' expected.\n" );
    } //end if
} //end number

// parse primary expression, either:
// ( ... ) or a number
void primary()
{ // primary
    if ( next_char == OPEN ) {
        scan();
        expression();
        ASSERT( CLOSED );
    }else{
        number();
    } //end if
} //end primary

Parser For G2 expression, 3

// parse highest priority operator ^
void factor()
{ // factor
    primary();
    while ( next_char == '^' ) {
        scan();
        primary();
    } //end while
} //end factor

// parse multiply operators; skip mult_op nonterminal
void term()
{ // term
    factor();
    while ( ( next_char == '*' ) || ( next_char == '/' ) ) {
        // note: abbreviation from "mult_op()"
        scan();
        factor();
    } //end while
} //end term

// parse adding operators + and -, skip plus_op nonterminal
void expression()
{ // expression
    term();
    while ( ( next_char == '+' ) || ( next_char == '-' ) ) {
        // note: abbreviation from "plus_op()"
        scan();
        term();
    } //end while
} //end expression

Parser For G2 expression, 4

// get first token
// then parse complete expression
// assert no more source after expression
//
int main()
{ // main
    scan();
    expression();
    ASSERT( EOL );
    return 0;
} //end main

Sample Input for expression e()

( ( ( 5 + 3* 3 ) / ( 5^6 ) - 2 ) ^ ( 2 ^ 6 ^ 7 ) )

A Parsing Variation

We broke the general rule for Recursive Descent Parsing, namely defining a recursive function for each nonterminal symbol in G

For example, we coded the scanning of operators (such as + and -, or * and / ) directly in-line

Instead, we used a while loop to parse zero or more of the [repeated] operators

In such cases, the semantic actions can be associated with the operator just scanned in a left-to-right fashion
i.e. the semantic actions are done left-associatively

An equally elegant way is to use an If-Statement and call the parsing function directly recursively
Easily allowing right-associative semantic actions
Recursion parses multiple operators of the same precedence
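The difference between the two styles shows up as soon as a semantic action is attached. The following sketch, which is an illustration and not part of the course parser, evaluates a chain of ^ operators both ways: once with a while loop (action applied left to right) and once with an if statement plus recursion (action applied on the way back). It assumes well-formed input made of single digits separated by ^; the helper ipow() and all function names are hypothetical.

```c
/* Integer power helper; hypothetical, no overflow checks. */
static long ipow( long base, long exp )
{
    long result = 1;
    while ( exp-- > 0 ) {
        result *= base;
    }
    return result;
}

/* factor : primary { ^ primary }   -- while loop, left-associative action */
long power_left( const char *s )
{
    long v = *s++ - '0';
    while ( *s == '^' ) {
        s++;
        v = ipow( v, *s++ - '0' );   /* apply operator as soon as scanned */
    }
    return v;
}

/* factor : primary [ ^ factor ]    -- recursion, right-associative action */
static long power_right_at( const char **p )
{
    long v = *(*p)++ - '0';
    if ( **p == '^' ) {
        (*p)++;
        v = ipow( v, power_right_at( p ) );   /* right operand evaluated first */
    }
    return v;
}

long power_right( const char *s )
{
    return power_right_at( &s );
}
```

For the input 2^3^2 the loop version computes (2^3)^2 = 64, while the recursive version computes 2^(3^2) = 512, which is the usual mathematical reading of ^.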

Change Grammar G2 to G3

expression : term [ plus_op expression ]
plus_op    : + | -
term       : factor [ mult_op term ]
mult_op    : * | /
factor     : primary [ ^ factor ]
primary    : ( expression ) | number
number     : 0 | 1 | 2 | 3 | 4
           | 5 | 6 | 7 | 8 | 9

Modified Parse For G3

// parse highest priority operator ^
void factor()
{ // factor
    primary();
    if ( '^' == next_char ) {
        scan();
        factor();       // <- parse repeated ^ operators
    } //end if
} //end factor

// parse multiply operators; skip mult_op nonterminal
void term()
{ // term
    factor();
    if ( ( next_char == '*' ) || ( next_char == '/' ) ) {
        // note: abbreviation from "mult_op()"
        scan();
        term();         // <- parse repeated * and / operators
    } //end if
} //end term

// parse adding operators + and -, skip plus_op nonterminal
void expression()
{ // expression
    term();
    if ( ( next_char == '+' ) || ( next_char == '-' ) ) {
        // note: abbreviation from "plus_op()"
        scan();
        expression();   // <- parse repeated + and - operators
    } //end if
} //end expression

Data Structure and Grammar

To be handled in compiler course

Possibly a future extension at CS 410/510

Grammar G4 For Statement s()

s                : statement [ s ]
statement        : if_statement
                 | assign_statement
if_statement     : IF_SYM expression THEN_SYM statement
                   [ ELSE_SYM statement ] FI_SYM ';'
assign_statement : ident '=' expression ';'

-- separate ideas:

expression : as discussed earlier
*_SYM      : these are tokens returned by scan()

Parser For G4 Statements s(), Part 1

void s();   // forward announcement

void assign_statement()
{ // assign_statement
    must_be( ident );
    must_be( assign_sym );
    expression();
    must_be( semi_sym );
} //end assign_statement

Parser For G4 Statements s(), Part 2

void if_statement()
{ // if_statement
    must_be( if_sym );
    expression();
    must_be( then_sym );
    s();                        // note: s() accepts a statement list,
                                // slightly more than G4's single statement
    if ( else_sym == token ) {
        scan();
        s();
    } //end if
    must_be( fi_sym );
    must_be( semi_sym );
} //end if_statement

Parser For G4 Statements s(), Part 3

void statement()
{ // statement
    if ( if_sym == token ) {
        if_statement();
    }else{
        assign_statement();
    } //end if
} //end statement

void s()
{ // s
    statement();
    // use first-set: more statements?
    if ( ( if_sym == token ) || ( ident == token ) ) {
        s();
    } //end if
} //end s
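The statement parser relies on a scanner that delivers whole tokens such as if_sym or ident rather than single characters. The slides do not show that scanner, so the following is only a sketch under stated assumptions: keywords are spelled in lower case (if, then, else, fi), numbers are single digits, and the token names number_sym, eof_sym and error_sym as well as the function next_token() are invented for this example. It reads from a string instead of standard input.

```c
#include <ctype.h>
#include <string.h>

/* Token kinds; the first seven names follow the parser above,
   the last three are hypothetical additions for this sketch. */
typedef enum {
    if_sym, then_sym, else_sym, fi_sym, ident, assign_sym, semi_sym,
    number_sym, eof_sym, error_sym
} token_t;

/* Returns 1 if the lexeme start[0..len-1] equals the keyword word. */
static int is_word( const char *start, size_t len, const char *word )
{
    return strlen( word ) == len && strncmp( start, word, len ) == 0;
}

/* Scans the next lexeme at *p, advances *p past it, returns its token. */
token_t next_token( const char **p )
{
    while ( **p == ' ' || **p == '\t' || **p == '\n' ) {
        (*p)++;                                   /* skip white space */
    }
    if ( **p == '\0' ) { return eof_sym; }
    if ( **p == '=' )  { (*p)++; return assign_sym; }
    if ( **p == ';' )  { (*p)++; return semi_sym; }
    if ( isdigit( (unsigned char) **p ) ) { (*p)++; return number_sym; }
    if ( isalpha( (unsigned char) **p ) ) {
        const char *start = *p;
        while ( isalnum( (unsigned char) **p ) ) {
            (*p)++;                               /* multi-character lexeme */
        }
        size_t len = (size_t) ( *p - start );
        if ( is_word( start, len, "if" ) )   { return if_sym; }
        if ( is_word( start, len, "then" ) ) { return then_sym; }
        if ( is_word( start, len, "else" ) ) { return else_sym; }
        if ( is_word( start, len, "fi" ) )   { return fi_sym; }
        return ident;                             /* any other word */
    }
    (*p)++;
    return error_sym;                             /* unknown character */
}
```

A scan() in the style of the slides could then simply store the result of next_token() in the global token that the parsing functions compare against.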

References

1. Algol-60 Report: http://www.masswerk.at/algol60/report.htm

2. John Backus: http://www-03.ibm.com/ibm/history/exhibits/builders/builders_backus.html

3. BNF: http://cui.unige.ch/db-research/Enseignement/analyseinfo/AboutBNF.html

4. ISO EBNF: http://www.cl.cam.ac.uk/~mgk25/iso-ebnf.html

5. Left-Recursion elimination, see: Herbert G Mayer, “Programming Languages”, © 1988 MacMillan Publishing Co., ISBN: 0-02-378295-1

6. Church Thesis: http://plato.stanford.edu/entries/church-turing/

7. YACC: http://dinosaur.compilertools.net/yacc/