thoughts on assigment3 recursive-descent parsing continued...feb 13, 2017  · thoughts on...

35
Thoughts on Assigment 3 Recursive-Descent Parsing CS F331 Programming Languages CSCE A331 Programming Language Concepts Lecture Slides Monday, February 13, 2017 Glenn G. Chappell Department of Computer Science University of Alaska Fairbanks [email protected] © 2017 Glenn G. Chappell continued

Upload: others

Post on 21-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Thoughts on Assigment 3Recursive-Descent Parsing

CS F331 Programming LanguagesCSCE A331 Programming Language ConceptsLecture SlidesMonday, February 13, 2017

Glenn G. ChappellDepartment of Computer ScienceUniversity of Alaska [email protected]

© 2017 Glenn G. Chappell

continued

Page 2: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Thoughts on Assignment 3 [1/8]

In Assignment 3 you will be writing a state-machine lexer, in the form of Lua module lexit. It will be similar to module lexer(written in class), but it will involve a different lexeme specification.

In Assignment 4, you will write a Recursive-Descent parser that uses your lexer. And in Assignment 6 you will write an interpreter that uses your parser. The result will be an implementation of a programming language called Kanchil.§ You’ve never heard of Kanchil? That’s because I invented it last

week.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 2

Page 3: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Thoughts on Assignment 3 [2/8]

The basic lexical structure of Kanchil is similar to that behind lexer.lua: whitespace is all generally the same, comments are skipped, legal characters are whitespace & printable ASCII, and comments may contain any character at all.

Two notable differences from lexer.lua:§ Comments begin with “#” and go to the end of the line.§ The preferOp strategy is used to deal with the k-4 / k – 4 issue.

In particular, the longest-lexeme rule generally applies, as in lexer.lua, with one exception. If function preferOp is called, and the next lexeme begins with “+”, “-”, or “%”, then a single-character operator is returned, regardless of the following character.

See the lecture slides for February 8, slide #23, for ideas about implementing function preferOp.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 3

Page 4: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Thoughts on Assignment 3 [3/8]

Kanchil has 8 lexeme categories:§ Keyword§ VariableIdentifier§ SubroutineIdentifier§ NumericLiteral§ StringLiteral§ Operator§ Punctuation§ Malformed

We look briefly at each of these.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 4

Page 5: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Thoughts on Assignment 3 [4/8]

A Keyword is one of the following 13:call cr else elseif end false if input print set sub true while

A VariableIdentifier is like a C identifier with “%” prepended. There are no reserved words. Examples:%myvar %else %HelloThere %abc_39

A SubroutineIdentifier is like a C identifier with “&” prepended. Again, there are no reserved words. Examples:&mysub &true &HelloThere &abc_39

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 5

Page 6: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Thoughts on Assignment 3 [5/8]

A NumericLiteral is either a sequence of one or more digits with an optional “+” or “-” prepended:7 34567 +9876 0001000 -000

… or the above followed by an exponent, which is a letter “e” or “E”, an optional “+”, and then one or more digits:7e34 34567E+1 +9876e0 0001000E1234 -000e+0

A NumericLiteral cannot include a dot (“.”). An exponent cannot include “-”.

A StringLiteral is a quote (single or double) followed by zero or more characters, followed by a quote that is the same as the one at the beginning. Any character may appear in a StringLiteral. There are no backslash escapes. Examples:"" 'abc x' "'" '_#%3'

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 6

Page 7: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Thoughts on Assignment 3 [6/8]

An Operator is one of the following 17:&& || ! == != < <= > >= + - * / % [ ] :

Malformed lexemes come in 3 kinds: bad string, bad character, bad keyword.§ A bad string is like a partial StringLiteral where the input ends

before the ending quote appears. Examples (each of the following must end at the end of the input):

"abc 'xyz "x'y§ A bad character is a single illegal character, as in lexer.lua.§ A bad keyword is a sequence of one or more letters (only letters)

that does not form a Keyword. Examples:calling abcd END wHiLe

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 7

Page 8: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Thoughts on Assignment 3 [7/8]

Lastly, a Punctuation is a single character that is not whitespace, not part of a comment, and not part of a lexeme in any of the other categories. Examples:& ( } , . @ =

Note that the definition of the Punctuation category means that all possible input texts can be lexed. Your lexer will never run across a character that it does not know how to deal with.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 8

Page 9: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Thoughts on Assignment 3 [8/8]

Where are we going with this? Here is a sample Kanchil program.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017

# Subroutine &fibo

# Given %k, set %fibk to F(%k),

# where F(n) = nth Fibonacci no.

sub &fibo

set %a: 0 # Consecutive Fibos

set %b: 1

set %i: 0 # Loop counter

while %i < %k

set %c: %a+%b # Advance

set %a: %b

set %b: %c

set %i: %i+1 # ++counter

end

set %fibk: %a # Result

end

# Get number of Fibos to output

print "How many Fibos to print: "

input %n

cr

# Print requested number of Fibos

set %j: 0 # Loop counter

while %j < %n

set %k: %j

call &fibo

print "F("

print %j

print ") = "

print %fibk cr

set %j: %j + 1

end

9

Page 10: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

ReviewOverview of Lexing & Parsing

Two phases:§ Lexical analysis (lexing)§ Syntax analysis (parsing)

The output of a parser is often an abstractsyntax tree (AST). Specifications of these can vary.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017

ParserLexemeStream

ASTor Error

cout << ff(12.6);

id op id litop

punctop

expr

binOp: <<

expr

id: cout funcCall

expr

id: ff

numLit: 12.6

expr

LexerCharacter

Streamcout << ff(12.6);

Parsing

10

Page 11: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

ReviewIntroduction to Syntax Analysis — Basics

Syntax analysis is usually based on a context-free grammar (CFG).Recall the notion of a derivation: begin with the start symbol,

apply productions one by one, ending with a string of terminals.

CFG (start symbol: item)item → “(” item “)”item → thingthing → IDthing → “%”

A general method for doing parsing is aparsing algorithm. A typical parsing algorithm is usable with any number of CFGs. When we implement a parsing algorithm to handle a particular CFG, we have a parser: a code module that does parsing.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017

Derivationitem(item)((item))((thing))((ID))

Here, ID is a lexeme category. The actual string might be something like “((abc_39))”.

11

Page 12: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

ReviewIntroduction to Syntax Analysis — Categories of Parsers

Parsing algorithms can be divided into two broad categories.Top-Down Parsing Algorithms

§ Go through derivation from top to bottom, expanding nonterminals.§ Usually produce a leftmost derivation.§ Important subclass: LL (read Left-to-right, Leftmost derivation).

§ Common reason a top-down parser may not be LL: doing lookahead.§ Often hand-coded.§ Example algorithm we will look at: Recursive Descent.

Bottom-Up Parsing Algorithms§ Go through the derivation from bottom to top, collapsing substrings

to nonterminals.§ Usually produce a rightmost derivation.§ Important subclass: LR (read Left-to-right, Rightmost derivation).

§ Common reason a bottom-up parser may not be LR: doing lookahead.§ Almost always automatically generated.§ Example algorithm we will look at: Shift-Reduce.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 12

Page 13: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

ReviewIntroduction to Syntax Analysis — Categories of Grammars [1/2]

As a rule, a fast parsing algorithm is not capable of handling all CFGs. For each category of parsing algorithms (LL, LR, etc.), there is an associated category of grammars that such algorithms can handle.

The grammars that LL parsers can handle are called LL grammars. Similarly, LR parsers can handle LR grammars.

Curiously, every LL grammar is an LR grammar, while there are LR grammars that are not LL grammars. So LR parsing algorithms are more general. Note, however, that when we write a compiler for a programming language, we only need one grammar, and if it is an LL grammar, then an LL parser works fine.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 13

Page 14: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

ReviewIntroduction to Syntax Analysis — Categories of Grammars [2/2]

The following diagram shows the relationship between various categories of grammars.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017

All Grammars

LR Grammars

CFGs

LL Grammars

14

Page 15: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

ReviewRecursive-Descent Parsing — Introduction, How It Works

Our first parsing algorithm: Recursive Descent.§ A top-down parsing algorithm.§ In the LL category—if multiple-lexeme lookahead is not used.§ Almost always hand-coded.§ Has been known for decades. Still in common use.§ Is implemented differently for each grammar.

Writing a Recursive-Descent parser:§ There is one parsing function for each nonterminal.§ A parsing function is responsible for parsing all strings that its

nonterminal can be expanded into.§ Code for a parsing function is a translation of the right-hand side of

the production for this nonterminal.Parsing functions are mutually recursive.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 15

Page 16: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

ReviewRecursive-Descent Parsing — Example #1: Simple

We wrote a Recursive-Descent parser based on Grammar 1, below, whose start symbol is item. We began by combining productions with a common left-hand side.

Grammar 1item → “(” item “)”item → thingthing → IDthing → “%”

Our parser does not generate an AST. It simply returns true on a correct parse, or false on error.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017

See rdparser1.lua.

Grammar 1aitem → “(” item “)”

| thingthing → ID

| “%”

16

Page 17: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

ReviewRecursive-Descent Parsing — Handling Incorrect Input [1/3]

Our parser says the following are both syntactically correct:§ ((x)))§ a,b,c

Why?

From a parsing function’s point of view, these are correct strings with extra characters at the end.

The parsing functions are doing their jobs correctly, but the parser is not giving us the information we need. What can we do about this?

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 17

Page 18: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

ReviewRecursive-Descent Parsing — Handling Incorrect Input [2/3]

One common solution is to add another lexeme category: end of input. Standard notation for this: $. Then add a new start symbol, and augment the grammar with one more production, of the form newStartSymbol → oldStartSymbol $.

The following would be our new grammar, with start symbol all.

Grammar 1ball → item $item → “(” item “)”

| thingthing → ID

| “%”

13 Feb 2017 CS F331 / CSCE A331 Spring 2017

This is only an example.I did not use this idea

in our parser.

18

Page 19: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

ReviewRecursive-Descent Parsing — Handling Incorrect Input [3/3]

Another solution is to add an extra check at the end of parsing, to see whether all lexemes have been read. If we do this, then the grammar is unchanged, and the parsing functions are the same.

A correct parse of the entire input then requires two conditions:§ The parsing function for the start symbol indicates a correct parse.§ All lexemes have been read.

The above solution works better with the interactive environment that we will eventually write. So I will be using it in all of our Recursive-Descent parsers.

Our parser now implements the above idea.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017

See rdparser1.lua.

19

Page 20: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Recursive-Descent ParsingExample #2: More Complex [1/2]

Now let’s write a Recursive-Descent parser for the following more complex grammar, whose start symbol is still item.

Grammar 2item → “(” item “)”

| thingthing → ID { ( “,” | “:” ) ID }

| “%”| [ “*” “-” ] “[” item “]”

All strings in the old language are also in the new language. But now we can get strings like these:§ ((a,b,c:d))§ ((*-[([%])]))

13 Feb 2017 CS F331 / CSCE A331 Spring 2017

continued

Recall:

Braces mean optional, repeatable (0 or more).

Brackets mean optional (0 or 1).

Note the difference between the following:

[ “[”

20

Page 21: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Recursive-Descent ParsingExample #2: More Complex [2/2]

Grammar 2item → “(” item “)” | thingthing → ID { ( “,” | “:” ) ID } | “%” | [ “*” “-” ] “[” item “]”

In a parsing function:[ … ] Brackets (optional: 0 or 1) become a conditional.

§ Check for the possible initial lexemes inside the brackets. If found, parse everything inside the brackets. Otherwise skip the brackets.

{ … } Braces (optional, repeatable: 0 or more) become a loop.§ Loop body: Check for the possible initial lexemes inside the braces.

If not found, then exit loop, moving to just after the braces. If found, parse everything inside the braces, and then REPEAT.

TO DO§ Write a Recursive-Descent parser based on Grammar 2.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017

Done. See rdparser2.cpp.

21

Page 22: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Recursive-Descent ParsingExample #3: Expressions [1/5]

Now we bump up our standards. We wish to parse arithmetic expressions in their usual form, with variables, numeric literals, binary +, -, *, and / operators, and parentheses. When given a syntactically correct expression, our parser should return an abstract syntax tree (AST).§ All operators will be binary and left-associative, so that, for

example, “a + b + c” means “(a + b) + c”.§ Precedence will be as usual, so that “a + b * c” means

“a + (b * c)”.§ Precedence and associativity may be overridden using parentheses:

“(a + b) * c”.

Due to the limitations of our lexer, the expression “k-4” will need to be rewritten as “k - 4”.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 22

Page 23: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Recursive-Descent ParsingExample #3: Expressions [2/5]

We begin with the following grammar, with start symbol expr.

Grammar 3expr → term

| expr ( “+” | “-” ) termterm → factor

| term ( “*” | “/” ) factorfactor → ID

| NUMLIT| “(” expr “)”

Grammar 3 encodes our associativity and precedence rules.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 23

Page 24: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Recursive-Descent ParsingExample #3: Expressions [3/5]

To the right is part of a parsing function for nonterminal expr.

Grammar 3expr → term

| expr ( “+” | “-” ) termterm → factor

| term ( “*” | “/” ) factorfactor → ID

| NUMLIT| “(” expr “)”

What is wrong with this code?

13 Feb 2017 CS F331 / CSCE A331 Spring 2017

function parse_expr()if parse_term() then

return trueelseif parse_expr() then

24

Page 25: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Recursive-Descent ParsingExample #3: Expressions [4/5]

function parse_expr()if parse_term() then

return trueelseif parse_expr() then

What is wrong with this code?§ First, if the call to parse_term returns false, then the position in the

input may have changed. Fixing this requires backtracking, which can lead to extreme inefficiency.

§ But even if we can solve that, there is a more serious problem. Suppose parse_expr is called with input that does not begin with a valid term. What happens? Answer: infinite recursion!

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 25

Page 26: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Recursive-Descent ParsingExample #3: Expressions [5/5]

In fact, without lookahead, it is impossible to write a Recursive-Descent parser for Grammar 3.

Grammar 3expr → term

| expr ( “+” | “-” ) termterm → factor

| term ( “*” | “/” ) factorfactor → ID

| NUMLIT| “(” expr “)”

Recall that a Recursive-Descent parser requires an LL grammar. But Grammar 3 is not an LL grammar. Next we look at LL grammars. We return to the expression-parsing problem later.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 26

Page 27: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Recursive-Descent ParsingLL Grammars — Properties [1/8]

An LL grammar is a CFG that can be handled by an LL parsing algorithm, such as Recursive Descent, if multiple-lexeme lookahead is not done.

Recall the origin of the name: these parsers handle their input in a strictly Left-to-right order, and they go through the steps required to generate a Leftmost derivation.

Now we look at some of the properties that an LL grammar must have.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 27

Page 28: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Recursive-Descent ParsingLL Grammars — Properties [2/8]

Consider the following grammar.

Grammar Axx → xx “+” “b” | “a”

A parsing function would begin:

function parse_xx()if parse_xx() then

We have recursion without a base-case check.The trouble lies in the grammar. The right-hand side of the

production for xx begins with xx. This is left recursion. It is not allowed in an LL grammar.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 28

Page 29: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Recursive-Descent ParsingLL Grammars — Properties [3/8]

Left recursion can be more subtle. Below is a variation on Grammar A.

Grammar Axxx → yy “b” | “a”yy → xx “+”

Grammar Ax also contains left recursion. It is not LL.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 29

Page 30: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Recursive-Descent ParsingLL Grammars — Properties [4/8]

The grammar below illustrates a more general problem.

Grammar Bxx → “a” yy | “a” zzyy → “*”zz → “/”

We cannot even being to write a Recursive-Descent parser for Grammar B. How would the code for function parse_xx begin? Should it take the first or second option? There is no way to tell, without lookahead.

We say the first production in Grammar B is not left-factored. An LL grammar can only contain left-factored productions.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 30

Page 31: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Recursive-Descent ParsingLL Grammars — Properties [5/8]

Here is another problematic grammar.

Grammar Cxx → yy | zzyy → “” | “a”zz → “” | “b”

In Grammar C, the empty string can be derived from either yy or zz. So if there is no more input, then there is no basis for making the xx-or-yy decision in the first production.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 31

Page 32: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Recursive-Descent ParsingLL Grammars — Properties [6/8]

One last non-LL grammar.

Grammar Dxx → yy “a”yy → “a” | “”

The strings “a” and “aa” lie in the language generated by Grammar D. But imagine a Recursive-Descent parser based on Grammar D, attempting to parse these strings. What would happen?

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 32

Page 33: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Recursive-Descent ParsingLL Grammars — Properties [7/8]

It turns out that the problems presented by Grammars A–D illustrate all the reasons a CFG might not be LL.

Fact.* Suppose that a context-free grammar G has the following three properties.1. If A → α and A → β are productions in G, then there do not exist

two strings, one derived from α, the other derived from β, that begin with the same (terminal) symbol.

2. If A → α and A → β are productions in G, then it is not the case that the empty string can be derived from both α and β.

3. If A → α and A → β are productions in G, and the empty string can be derived from β, then there is no (terminal) symbol x that begins a string that can be derived from α, such that x can follow a string derived from A.

Then Grammar G is an LL grammar.

*Adapted from A.V. Aho, R. Sethi, and J.D. Ullman,Compilers: Principles, Techniques, and Tools, 1986, p. 192.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017

(1) does not hold for Grammars A, AA, and B; (2) does not hold

for Grammar C; and (3) does not hold for Grammar D.

33

Page 34: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Recursive-Descent ParsingLL Grammars — Properties [8/8]

In addition:

Fact. Suppose that G is an LL grammar. Then,§ G is not ambiguous, and§ G does not contain left recursion.

In general, when there is a choice to be made, an LL parser must be able to make that choice based on the current lexeme. If this cannot be done, then the grammar is not LL.

Now suppose—as in our expression-parsing example—that we wish to write a Recursive-Descent parser, but our grammar is not LL. What can we do about this?

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 34

Page 35: Thoughts on Assigment3 Recursive-Descent Parsing continued...Feb 13, 2017  · Thoughts on Assignment 3 [5/8] A NumericLiteralis eithera sequence of one or more digits with an optional

Recursive-Descent ParsingTO BE CONTINUED …

Recursive-Descent Parsing will be continued next time.

13 Feb 2017 CS F331 / CSCE A331 Spring 2017 35