Lexical and Syntax Analysis Top-Down Parsing

Post on 17-Aug-2020

Page 1: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand

Lexical and Syntax Analysis

Top-Down Parsing

String of characters (easy for humans to write and understand) → lexemes identified → string of tokens → data structure (easy for programs to transform)

Syntax

A syntax is a set of rules defining the valid strings of a language, often specified by a context-free grammar.

For example, a grammar E for arithmetic expressions:

e → x | y | e + e | e – e | e * e | ( e )

Derivations

A derivation is a proof that some string conforms to a grammar.

A leftmost derivation:

e ⇒ e + e ⇒ x + e ⇒ x + ( e ) ⇒ x + ( e * e ) ⇒ x + ( y * e ) ⇒ x + ( y * x )

Derivations

A rightmost derivation:

e ⇒ e + e ⇒ e + ( e ) ⇒ e + ( e * e ) ⇒ e + ( e * x ) ⇒ e + ( y * x ) ⇒ x + ( y * x )

Many ways to derive the same string: many ways to write the same proof.

Parse tree: motivation

A parse tree is also a proof that a given input is valid according to the grammar. But a parse tree:

• is more concise: we don’t write out the sentence every time a non-terminal is expanded;

• abstracts over the order in which rules are applied.

Parse tree: intuition

If non-terminal n has a production

n → X Y Z

where X, Y, and Z are terminals or non-terminals, then a parse tree may have an interior node labelled n with three children labelled X, Y, and Z.

    n
  / | \
 X  Y  Z

Parse tree: definition

A parse tree is a tree in which:

• the root is labelled by the start symbol;

• each leaf is labelled by a terminal symbol, or 𝜀;

• each interior node is labelled by a non-terminal;

• if n is a non-terminal labelling an interior node whose children are X1, X2, ⋯, Xk then there must exist a production n → X1 X2 ⋯ Xk.

Example 1

Example input string:

x + y * x

A resulting parse tree according to grammar E:

        e
      / | \
     e  +  e
     |   / | \
     x  e  *  e
        |     |
        y     x

Example 2

The following is not a parse tree according to grammar E.

      e
    / | \
   x  +  e
       / | \
      e  *  e
      |     |
      y     x

Why? Because e → x + e is not a production in grammar E.
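The definition of a parse tree can be checked mechanically. The following is a minimal sketch, not part of the original slides: the names Node, valid and is_parse_tree are hypothetical, and the two trees are hand-built versions of Examples 1 and 2 for grammar E.

```c
#include <string.h>

/* A parse-tree node: a label plus up to three children, which is enough
   for grammar E. The label 'e' is the non-terminal; any other label is
   a terminal. (Node, valid and is_parse_tree are names made up here.) */
typedef struct Node {
    char label;
    int nkids;
    const struct Node *kids[3];
} Node;

/* Right-hand sides of grammar E: e -> x | y | e + e | e - e | e * e | ( e ) */
static const char *rhs_of_e[] = { "x", "y", "e+e", "e-e", "e*e", "(e)" };

/* Returns 1 if leaves are terminals, interior nodes are labelled e, and
   the children of every interior node spell out a right-hand side of E. */
int valid(const Node *n) {
    if (n->nkids == 0) return n->label != 'e';
    if (n->label != 'e') return 0;

    char rhs[4] = {0};
    for (int i = 0; i < n->nkids; i++) rhs[i] = n->kids[i]->label;

    int found = 0;
    for (size_t p = 0; p < sizeof rhs_of_e / sizeof *rhs_of_e; p++)
        if (strcmp(rhs, rhs_of_e[p]) == 0) found = 1;
    if (!found) return 0;

    for (int i = 0; i < n->nkids; i++)
        if (!valid(n->kids[i])) return 0;
    return 1;
}

/* The root must additionally be labelled by the start symbol. */
int is_parse_tree(const Node *root) {
    return root->label == 'e' && valid(root);
}

/* Shared pieces of the two example trees for x + y * x. */
static const Node X = {'x', 0, {0}}, Y = {'y', 0, {0}};
static const Node PLUS = {'+', 0, {0}}, STAR = {'*', 0, {0}};
static const Node EX = {'e', 1, {&X}}, EY = {'e', 1, {&Y}};
static const Node MUL = {'e', 3, {&EY, &STAR, &EX}};

/* Example 1: the root's children are e, +, e. */
static const Node example1 = {'e', 3, {&EX, &PLUS, &MUL}};
/* Example 2: the root's children are x, +, e, but e -> x + e is not in E. */
static const Node example2 = {'e', 3, {&X, &PLUS, &MUL}};
```

With these definitions, is_parse_tree(&example1) returns 1 and is_parse_tree(&example2) returns 0, because the root of the second tree matches no production.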

Grammar notation

Non-terminals are underlined.

Rather than writing

e → x
e → e + e

we may write:

e → x | e + e

(Also, symbols → and ::= will be used interchangeably.)

Syntax Analysis

String of symbols → Parse tree

A parse tree is:

1. A proof that a given input is valid according to the grammar;

2. A data structure that is convenient for compilers to process.

(Syntax analysis may also report that the input string is invalid.)

Ambiguity

If there exists more than one parse tree for some string then the grammar is ambiguous. For example, the string x+y*x has two parse trees:

        e
      / | \
     e  +  e
     |   / | \
     x  e  *  e
        |     |
        y     x

           e
         / | \
        e  *  e
      / | \   |
     e  +  e  x
     |     |
     x     y

Operator precedence

Different parse trees often have different meanings, so we usually want unambiguous grammars.

Conventionally, * has a higher precedence (binds tighter) than +, so there is only one interpretation of x+y*x, namely x+(y*x).

Operator associativity

Even with precedence rules, ambiguity remains, e.g. x-x-x-x.

Binary operators are either:

• left-associative;

• right-associative;

• non-associative.

Conventionally, - is left-associative, so there is only one interpretation of x-x-x-x, namely ((x-x)-x)-x.
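To see why the choice of parse tree matters, the following small sketch (an illustration added here, with made-up function names and operand values) evaluates both groupings of two expressions in C, whose + , - and * follow the conventional precedence and associativity described above.

```c
/* Two ways to group 2 + 3 * 4: the two parse trees give different values. */
int plus_first(void)   { return (2 + 3) * 4; }
int times_first(void)  { return 2 + (3 * 4); }
/* Without brackets, C lets * bind tighter than +. */
int mixed_unbracketed(void) { return 2 + 3 * 4; }

/* Two ways to group 8 - 4 - 2. */
int left_grouping(void)  { return (8 - 4) - 2; }
int right_grouping(void) { return 8 - (4 - 2); }
/* Without brackets, C treats - as left-associative. */
int minus_unbracketed(void) { return 8 - 4 - 2; }
```

Here mixed_unbracketed() agrees with times_first() and minus_unbracketed() agrees with left_grouping(), while the other groupings produce different results.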

Ambiguity removal

Example input:

e → x | y | e + e | e – e | e * e | ( e )

All operators are left-associative, and * binds tighter than + and –.

Ambiguity removal

Example output:

e → e + e1
  | e – e1
  | e1

e1 → e1 * e2
   | e2

e2 → ( e ) | x | y

Note: ignoring bracketed expressions, e1 disallows + and –, and e2 disallows +, – and *.

Disallowed parse trees

After disambiguation, there are no parse trees corresponding to the following originals:

           e
         / | \
        e  *  e
      / | \   |
     e  +  e  x
     |     |
     x     y

LHS of * cannot contain a +.

        e
      / | \
     e  +  e
     |   / | \
     x  e  -  e
        |     |
        y     x

RHS of + cannot contain a -.

Ambiguity removal: step-by-step

Given a non-terminal e which involves operators at n levels of precedence:

Step 1: introduce n+1 new non-terminals, e0 ⋯ en.

Let op denote an operator with precedence i.

Step 2a: replace each production

e → e op e

with

e_i → e_i op e_{i+1}
    | e_{i+1}

if op is left-associative, or

e_i → e_{i+1} op e_i
    | e_{i+1}

if op is right-associative.

Step 2b: replace each production

e → op e

with

e_i → op e_i
    | e_{i+1}

Step 2c: replace each production

e → e op

with

e_i → e_i op
    | e_{i+1}

Construct the precedence table:

Operator    Precedence
+, -        0
*           1

Grammar E after step 2 becomes:

e0 → e0 + e1
   | e0 – e1
   | e1

e1 → e1 * e2
   | e2

e → ( e ) | x | y

Step 3: replace each production

e → ⋯

with

en → ⋯

After step 3:

e0 → e0 + e1
   | e0 – e1
   | e1

e1 → e1 * e2
   | e2

e2 → ( e ) | x | y

Step 4: replace all occurrences of e0 with e.

After step 4:

e → e + e1
  | e – e1
  | e1

e1 → e1 * e2
   | e2

e2 → ( e ) | x | y

Exercise 1

Consider the following ambiguous grammar for logical propositions.

p → 0       (Zero)
  | 1       (One)
  | ~ p     (Negation)
  | p + p   (Disjunction)
  | p * p   (Conjunction)

Now let + and * be right-associative and the operators in increasing order of binding strength be: +, *, ~.

Give an unambiguous grammar for logical propositions.

Exercise 2

Which of the following grammars are ambiguous?

s → if b then s | if b then s else s | skip

e → + e e | – e e | x

b → 0 b 1 | 0 1

Homework exercise

Consider the following ambiguous grammar G.

s → if b then s | if b then s else s | skip

Give an unambiguous grammar that accepts the same language as G.

Summary so far

The syntax of a language is often specified by a context-free grammar.

Derivations and parse trees are proofs.

Parse trees lead to a concise definition of ambiguity.

Construction of unambiguous grammars using rules of precedence and associativity.

PART 2: TOP-DOWN PARSING

• Recursive-Descent

• Backtracking

• Left-Factoring

• Predictive Parsing

• Left-Recursion Removal

• First and Follow Sets

• Parsing tables and LL(1)

Top-down parsing

Top-down: begin with the start symbol and expand non-terminals, succeeding when the input string is matched.

A good strategy for writing parsers:

1. Implement a syntax checker to accept or refute input strings.

2. Modify the checker to construct a parse tree – straightforward.

RECURSIVE DESCENT

A popular top-down parsing technique.

Recursive descent

A recursive descent parser consists of a set of functions, one for each non-terminal.

The function for non-terminal n returns true if some prefix of the input string can be derived from n, and false otherwise.

Consuming the input

We assume a global variable next points to the input string.

char* next;

Consume c from input if possible.

int eat(char c) {
  if (*next == c) { next++; return 1; }
  return 0;
}

Recursive descent

For each non-terminal N, introduce:

int N() {
  char* save = next;

  for each N → X1 X2 ⋯ Xn
    if (parse(X1) && parse(X2) && ⋯ && parse(Xn)) return 1;
    else next = save;   /* backtrack */

  return 0;
}

Let parse(X) denote

X() if X is a non-terminal

eat(X) if X is a terminal

Exercise 4

Consider the following grammar G with start symbol e.

e → ( e + e ) | ( e * e ) | v
v → x | y

Using recursive descent, write a syntax checker for grammar G.

Answer (part 1)

int e() { char* save = next;

if (eat('(') && e() && eat('+') && e() && eat(')')) return 1; else next = save;

if (eat('(') && e() && eat('*') && e() && eat(')')) return 1; else next = save;

if (v()) return 1; else next = save;

return 0; }

Answer (part 2)

int v() {
  char* save = next;

  if (eat('x')) return 1; else next = save;

  if (eat('y')) return 1; else next = save;

  return 0;
}

Exercise 5

How many function calls are made by the recursive descent parser to parse the following strings?

(x*x)

((x*x)*x)

(((x*x)*x)*x)

(See animation of backtracking.)

Answer

Input string      Length   Calls
(x*x)             5        21
((x*x)*x)         9        53
(((x*x)*x)*x)     13       117

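For these inputs each extra level of nesting roughly doubles the number of calls (each count is twice the previous one plus 11), so the number of calls grows exponentially in the length of the input string.

Lesson: backtracking is expensive!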

[Plot: function calls against string length]
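The call counts in the table can be reproduced by instrumenting the checker from Exercise 4. The sketch below assumes that every call to e, v and eat counts as one function call; the counter calls and the helper count_calls are names introduced here for illustration.

```c
static const char *next;   /* current position in the input */
static int calls;          /* calls to e, v and eat made so far */

static int v(void);

static int eat(char c) {
    calls++;
    if (*next == c) { next++; return 1; }
    return 0;
}

/* e -> ( e + e ) | ( e * e ) | v, with backtracking between alternatives */
static int e(void) {
    calls++;
    const char *save = next;
    if (eat('(') && e() && eat('+') && e() && eat(')')) return 1;
    next = save;
    if (eat('(') && e() && eat('*') && e() && eat(')')) return 1;
    next = save;
    if (v()) return 1;
    next = save;
    return 0;
}

/* v -> x | y */
static int v(void) {
    calls++;
    const char *save = next;
    if (eat('x')) return 1;
    next = save;
    if (eat('y')) return 1;
    next = save;
    return 0;
}

/* Parses input from the start symbol e and returns the number of calls made. */
int count_calls(const char *input) {
    next = input;
    calls = 0;
    e();
    return calls;
}
```

The doubling comes from the first alternative: for an input such as ((x*x)*x), the whole inner expression (x*x) is parsed once while trying ( e + e ), thrown away when '+' fails to match, and then parsed again for ( e * e ).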

LEFT FACTORING

Reducing backtracking!

Left factoring

When two productions for a non-terminal share a common prefix, expensive backtracking can be avoided by left-factoring the grammar.

Idea: Introduce a new non-terminal that accepts each of the different suffixes.

Example 3

Left-factoring grammar G by introducing non-terminal r:

e → ( e r | v
r → + e ) | * e )
v → x | y

Here ( e is the common prefix of the first two productions of e, and + e ) and * e ) are the different suffixes.
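A recursive descent checker for the left-factored grammar might look like the following sketch. It follows the same scheme as the checker for G; the helper accepts, which also requires the whole input to be consumed, is an addition made here for illustration.

```c
static const char *next;   /* current position in the input */

static int e(void);
static int r(void);
static int v(void);

static int eat(char c) {
    if (*next == c) { next++; return 1; }
    return 0;
}

/* e -> ( e r | v : the common prefix "( e" is now parsed only once */
static int e(void) {
    const char *save = next;
    if (eat('(') && e() && r()) return 1;
    next = save;
    if (v()) return 1;
    next = save;
    return 0;
}

/* r -> + e ) | * e ) : chooses between the different suffixes */
static int r(void) {
    const char *save = next;
    if (eat('+') && e() && eat(')')) return 1;
    next = save;
    if (eat('*') && e() && eat(')')) return 1;
    next = save;
    return 0;
}

/* v -> x | y */
static int v(void) {
    return eat('x') || eat('y');
}

/* Returns 1 if the whole of input can be derived from e. */
int accepts(const char *input) {
    next = input;
    return e() && *next == '\0';
}
```

When the operator turns out to be '*', only the failed eat('+') inside r is wasted; the sub-expression before the operator is not parsed a second time.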

Effect of left-factoring

Input string      Length   Calls
(x*x)             5        13
((x*x)*x)         9        22
(((x*x)*x)*x)     13       31

Number of calls is now linear in the length of input string.

Lesson: left-factoring a grammar reduces backtracking.

[Plot: function calls against string length]

PREDICTIVE PARSING

Eliminating backtracking!

Predictive parsing

Idea: know which production of a non-terminal to choose based solely on the next input symbol.

Advantage: very efficient since it eliminates all backtracking.

Disadvantage: not all grammars can be parsed in this way. (But many useful ones can.)

Running example

The following grammar H will be used as a running example to demonstrate predictive parsing.

e → e + e | e * e | ( e ) | x | y

Example:

x+y*(y+x)

Removing ambiguity

Since + and * are left-associative and * binds tighter than +, we can derive an unambiguous variant of H.

e → e + t | t
t → t * f | f
f → ( e ) | x | y

Left recursion

Problem: left-recursive grammars cause recursive descent parsers to loop forever.

int e() {
  char* save = next;

  /* Call to self without consuming any input */
  if (e() && eat('+') && t()) return 1;
  next = save;

  if (t()) return 1;
  next = save;

  return 0;
}

Eliminating left recursion

Let 𝛼 denote any sequence of grammar symbols.

Rule 1: n → n 𝛼 ⟹ n' → 𝛼 n'

Rule 2: n → 𝛼 ⟹ n → 𝛼 n', where 𝛼 does not begin with n

Rule 3: introduce the new production n' → 𝜀

Eliminating left recursion

Example before:

e → e + v | v
v → x | y

and after:

e → v e'
e' → 𝜀 | + v e'
v → x | y
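With left recursion removed, the grammar can be handled by recursive descent without looping. The sketch below is one possible checker for the "after" grammar; e1 stands for e', and the helper accepts is a name introduced here for illustration.

```c
static const char *next;   /* current position in the input */

static int eat(char c) {
    if (*next == c) { next++; return 1; }
    return 0;
}

/* v -> x | y */
static int v(void) {
    return eat('x') || eat('y');
}

/* e' -> + v e' | epsilon
   The non-empty alternative is tried first: the epsilon alternative
   always succeeds, so trying it first would never consume a '+'. */
static int e1(void) {
    const char *save = next;
    if (eat('+') && v() && e1()) return 1;
    next = save;
    return 1;   /* epsilon: succeed without consuming input */
}

/* e -> v e' : every call now consumes input before recursing */
static int e(void) {
    return v() && e1();
}

/* Returns 1 if the whole of input can be derived from e. */
int accepts(const char *input) {
    next = input;
    return e() && *next == '\0';
}
```

Note that the order of alternatives matters in a backtracking checker of this kind, even though the grammar itself lists 𝜀 first.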

Example 4

Running example, after eliminating left-recursion.

e → t e'
e' → + t e' | 𝜀

t → f t'
t' → * f t' | 𝜀

f → ( e ) | x | y

first and follow sets

Predictive parsers are built using the first and follow sets of each non-terminal in a grammar.

Definition of first sets

Let 𝛼 denote any sequence of grammar symbols.

If 𝛼 can derive a string beginning with terminal a then a ∊ first(𝛼).

If 𝛼 can derive 𝜀 then 𝜀 ∊ first(𝛼).

Computing first sets

If a is a terminal then a ∊ first(a 𝛼).

The empty string 𝜀 ∊ first(𝜀).

If X1X2⋯Xn is a sequence of grammar symbols, and there is an i such that terminal a ∊ first(Xi) and 𝜀 ∊ first(Xj) for all j < i, then a ∊ first(X1X2⋯Xn).

If 𝜀 ∊ first(Xi) for every i then 𝜀 ∊ first(X1X2⋯Xn).

If n → 𝛼 is a production then everything in first(𝛼) is in first(n).
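These rules can be applied repeatedly until no set changes. The following sketch does this for the running example after left-recursion removal. The encoding is an assumption made for illustration: non-terminals are upper-case letters (A for e', B for t'), an empty right-hand side stands for 𝜀, and '#' marks 𝜀 inside a set.

```c
#include <string.h>

/* Productions written as "N=rhs"; an empty rhs is epsilon. */
static const char *prods[] = {
    "E=TA", "A=+TA", "A=", "T=FB", "B=*FB", "B=", "F=(E)", "F=x", "F=y"
};
enum { NPROD = sizeof prods / sizeof *prods };

/* first_set[N - 'A'] holds the members of first(N) as a string. */
static char first_set[26][16];

/* Adds c to set unless already present; returns 1 if the set changed. */
static int add(char *set, char c) {
    if (strchr(set, c)) return 0;
    size_t n = strlen(set);
    set[n] = c;
    set[n + 1] = '\0';
    return 1;
}

/* Iterates the rules for first sets until a fixed point is reached. */
void compute_first(void) {
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int p = 0; p < NPROD; p++) {
            char *set = first_set[prods[p][0] - 'A'];
            const char *rhs = prods[p] + 2;
            int nullable = 1;   /* every symbol seen so far can derive epsilon */
            for (; *rhs && nullable; rhs++) {
                if (*rhs >= 'A' && *rhs <= 'Z') {
                    const char *f = first_set[*rhs - 'A'];
                    nullable = strchr(f, '#') != NULL;
                    for (; *f; f++)
                        if (*f != '#') changed |= add(set, *f);
                } else {
                    changed |= add(set, *rhs);   /* a terminal starts the rest */
                    nullable = 0;
                }
            }
            if (nullable) changed |= add(set, '#');   /* whole rhs derives epsilon */
        }
    }
}
```

After compute_first() has run, first_set['E' - 'A'] contains '(', 'x' and 'y', and first_set['A' - 'A'] contains '+' and the 𝜀 marker.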

Exercise 6

Consider the grammar:

e → ( e + e ) | ( e * e ) | v
v → x | 𝜀

Give all members of the sets:

first( v )

first( e )

first( v e )

Exercise 7

What are the first sets for each non-terminal in the following grammar.

e → t e'
e' → + t e' | 𝜀

t → f t'
t' → * f t' | 𝜀

f → ( e ) | x | y

Answer

first( f ) = { '(', 'x', 'y' }
first( t' ) = { '*', 𝜀 }
first( t ) = { '(', 'x', 'y' }
first( e' ) = { '+', 𝜀 }
first( e ) = { '(', 'x', 'y' }

Definition of follow sets

Let 𝛼 and 𝛽 denote any sequence of grammar symbols.

Terminal a ∊ follow(n) if the start symbol of the grammar can derive a string of grammar symbols in which a immediately follows n.

The set follow(n) never contains 𝜀.

End markers

In predictive parsing, it is useful to mark the end of the input string with a $ symbol.

((x*x)*x)$

$ is equivalent to '\0' in C.

Computing follow sets

If s is the start symbol of the grammar then $ ∊ follow(s).

If n → 𝛼 x 𝛽, where x is a non-terminal, then everything in first(𝛽) except 𝜀 is in follow(x).

If n → 𝛼 x, or n → 𝛼 x 𝛽 and 𝜀 ∊ first(𝛽), where x is a non-terminal, then everything in follow(n) is in follow(x).
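As with first sets, these rules can be iterated until nothing changes. The sketch below computes follow sets for the running example. It makes the same encoding assumptions as before (upper-case non-terminals, A for e', B for t', '#' for 𝜀), and it hard-codes the first sets of the non-terminals rather than computing them.

```c
#include <string.h>

/* Productions written as "N=rhs"; an empty rhs is epsilon. */
static const char *prods[] = {
    "E=TA", "A=+TA", "A=", "T=FB", "B=*FB", "B=", "F=(E)", "F=x", "F=y"
};
enum { NPROD = sizeof prods / sizeof *prods };

static char follow_set[26][16];   /* follow_set[N - 'A'] as a string */

static int is_nt(char c) { return c >= 'A' && c <= 'Z'; }

/* First sets of the non-terminals, hard-coded for this grammar. */
static const char *first_of(char n) {
    switch (n) {
    case 'A': return "+#";
    case 'B': return "*#";
    default:  return "(xy";   /* E, T and F */
    }
}

/* Adds c to set unless already present; returns 1 if the set changed. */
static int add(char *set, char c) {
    if (strchr(set, c)) return 0;
    size_t n = strlen(set);
    set[n] = c;
    set[n + 1] = '\0';
    return 1;
}

void compute_follow(void) {
    int changed = add(follow_set['E' - 'A'], '$');   /* E is the start symbol */
    changed = 1;
    while (changed) {
        changed = 0;
        for (int p = 0; p < NPROD; p++) {
            char lhs = prods[p][0];
            const char *rhs = prods[p] + 2;
            for (size_t i = 0; rhs[i]; i++) {
                if (!is_nt(rhs[i])) continue;
                char *set = follow_set[rhs[i] - 'A'];
                int nullable = 1;   /* can everything after position i derive epsilon? */
                for (size_t j = i + 1; rhs[j] && nullable; j++) {
                    if (is_nt(rhs[j])) {
                        const char *f = first_of(rhs[j]);
                        nullable = strchr(f, '#') != NULL;
                        for (; *f; f++)
                            if (*f != '#') changed |= add(set, *f);
                    } else {
                        changed |= add(set, rhs[j]);
                        nullable = 0;
                    }
                }
                if (nullable)   /* follow(lhs) flows into follow(rhs[i]) */
                    for (const char *g = follow_set[lhs - 'A']; *g; g++)
                        changed |= add(set, *g);
            }
        }
    }
}
```

After compute_follow() has run, follow_set['F' - 'A'] contains '*', '+', ')' and '$'.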

Exercise

Consider the grammar:

e → ( e + e ) | ( e * e ) | v
v → x | 𝜀

Give all members of the sets:

follow( e )

follow( v )

Exercise 8

What are the follow sets for each non-terminal in the following grammar.

e → t e'
e' → + t e' | 𝜀

t → f t'
t' → * f t' | 𝜀

f → ( e ) | x | y

Answer

follow( e' ) = { $, ')' }
follow( e ) = { $, ')' }
follow( t' ) = { '+', $, ')' }
follow( t ) = { '+', $, ')' }
follow( f ) = { '*', '+', ')', $ }

Predictive parsing table

For each non-terminal n, a parse table T defines which production of n should be chosen, based on the next input symbol a.

        (              +            ...
e       e → ( e r
r                      r → + e
v

Rows are non-terminals, columns are terminals, and each cell holds a production.

Predictive parsing table

for each production n → 𝛼
  for each a ∊ first(𝛼)
    add n → 𝛼 to T[n , a]
  if 𝜀 ∊ first(𝛼) then
    for each b ∊ follow(n)
      add n → 𝛼 to T[n , b]
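The construction can be sketched in C for the small grammar e → v e', e' → + v e' | 𝜀, v → x | y from the left-recursion example. Everything about the encoding is an assumption made for illustration: non-terminals are written E, R (for e') and V, '#' marks 𝜀, and the first and follow sets are worked out by hand and hard-coded.

```c
#include <string.h>

typedef struct {
    char lhs;            /* non-terminal on the left */
    const char *rhs;     /* right-hand side; "" is epsilon */
    const char *first;   /* first(rhs), worked out by hand */
} Prod;

static const Prod prods[] = {
    {'E', "VR",  "xy"},
    {'R', "+VR", "+"},
    {'R', "",    "#"},
    {'V', "x",   "x"},
    {'V', "y",   "y"},
};
enum { NPROD = sizeof prods / sizeof *prods };

/* Follow sets, worked out by hand: follow(E) = follow(R) = {$}, follow(V) = {+, $}. */
static const char *follow(char n) {
    return n == 'V' ? "+$" : "$";
}

/* table[n][a] is the production index plus one, or 0 for an empty cell. */
static int table[128][128];

/* Enters production p under every terminal in set; returns 0 on a clash. */
static int enter(int p, const char *set) {
    int ok = 1;
    for (; *set; set++) {
        if (*set == '#') continue;   /* epsilon is not a column */
        int *cell = &table[(unsigned char)prods[p].lhs][(unsigned char)*set];
        if (*cell != 0 && *cell != p + 1) ok = 0;
        *cell = p + 1;
    }
    return ok;
}

/* Fills the table; returns 1 if no cell received two different productions. */
int build_table(void) {
    int single = 1;
    memset(table, 0, sizeof table);
    for (int p = 0; p < NPROD; p++) {
        single &= enter(p, prods[p].first);
        if (strchr(prods[p].first, '#'))
            single &= enter(p, follow(prods[p].lhs));
    }
    return single;
}
```

For this grammar build_table() returns 1, and, for example, table['R']['+'] holds the production e' → + v e' while table['R']['$'] holds e' → 𝜀.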

Exercise 9

Construct a predictive parsing table for the following grammar.

e → t e'
e' → + t e' | 𝜀

t → f t'
t' → * f t' | 𝜀

f → ( e ) | x | y

LL(1) grammars

If each cell in the parse table contains at most one entry then a non-backtracking parser can be constructed and the grammar is said to be LL(1).

First L: left-to-right scanning of the input.

Second L: a leftmost derivation is constructed.

The (1): using one input symbol of look-ahead to decide which grammar production to choose.

Exercise 10

Write a syntax checker for the grammar of Exercise 9, utilising the predictive parsing table.

int e() { ... }

It should return a non-zero value if some prefix of the string pointed to by next conforms to the grammar, otherwise it should return zero.

Answer (part 1)

int e() {
  if (*next == 'x') return t() && e1();
  if (*next == 'y') return t() && e1();
  if (*next == '(') return t() && e1();
  return 0;
}

int e1() {
  if (*next == '+') return eat('+') && t() && e1();
  if (*next == ')') return 1;    /* e' → 𝜀 */
  if (*next == '\0') return 1;   /* e' → 𝜀 at end of input */
  return 0;
}

Answer (part 2)

int t() {
  if (*next == 'x') return f() && t1();
  if (*next == 'y') return f() && t1();
  if (*next == '(') return f() && t1();
  return 0;
}

int t1() {
  if (*next == '+') return 1;    /* t' → 𝜀 */
  if (*next == '*') return eat('*') && f() && t1();
  if (*next == ')') return 1;    /* t' → 𝜀 */
  if (*next == '\0') return 1;   /* t' → 𝜀 at end of input */
  return 0;
}

Answer (part 3)

int f() {
  if (*next == 'x') return eat('x');
  if (*next == 'y') return eat('y');
  if (*next == '(') return eat('(') && e() && eat(')');
  return 0;
}

(Notice how backtracking is not required.)

Predictive parsing algorithm

Let s be a stack, initially containing the start symbol of the grammar on top of the end marker $, and let next point to the input string.

while (top(s) != $)
  if (top(s) is a terminal) {
    if (top(s) == *next) { pop(s); next++; }
    else error();
  }
  else if (T[top(s), *next] == X → Y1⋯Yn) {
    pop(s);
    push(s, Yn⋯Y1);   /* Y1 on top */
  }
  else error();       /* empty table cell */
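The loop above translates fairly directly into C. The sketch below is an illustration for the small grammar e → v e', e' → + v e' | 𝜀, v → x | y, with the parsing table hard-coded as a function. The names table and parse, the letters E, R (for e') and V, and the fixed stack size are all assumptions of this sketch. As in the earlier slides, the end of the input is the '\0' terminator, while '$' marks the bottom of the stack.

```c
#include <string.h>

/* T[n, a]: returns the right-hand side to push, or NULL for an empty cell. */
static const char *table(char n, char a) {
    switch (n) {
    case 'E': return (a == 'x' || a == 'y') ? "VR" : NULL;
    case 'R': return a == '+' ? "+VR" : a == '\0' ? "" : NULL;
    case 'V': return a == 'x' ? "x" : a == 'y' ? "y" : NULL;
    }
    return NULL;
}

/* Returns 1 if the whole of input is derivable from E, 0 otherwise. */
int parse(const char *next) {
    char stack[256];
    int top = 0;
    stack[top++] = '$';   /* end marker at the bottom */
    stack[top++] = 'E';   /* start symbol on top */

    while (stack[top - 1] != '$') {
        char x = stack[top - 1];
        if (x >= 'A' && x <= 'Z') {
            /* Non-terminal: look up the production and push it reversed. */
            const char *rhs = table(x, *next);
            if (rhs == NULL) return 0;            /* empty cell: error */
            top--;
            size_t n = strlen(rhs);
            if ((size_t)top + n > sizeof stack) return 0;
            for (size_t i = n; i > 0; i--)
                stack[top++] = rhs[i - 1];        /* first symbol ends on top */
        } else {
            /* Terminal: it must match the next input symbol. */
            if (x != *next) return 0;
            top--;
            next++;
        }
    }
    return *next == '\0';   /* stack exhausted: input must be exhausted too */
}
```

For example, parse("x+y+x") returns 1, while parse("x+") returns 0 because the cell for v at end of input is empty.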

Exercise 11

Give the steps that a predictive parser takes to parse the following input.

x + x * y

For each step (loop iteration), show the input stream, the stack, and the parser action.

Acknowledgements

Plus Stanford University lecture notes by Maggie Johnson and Julie Zelenski.

APPENDIX

Context-free grammars

Have four components:

1. A set of terminal symbols.

2. A set of non-terminal symbols.

3. A set of productions (or rules) of the form:

n → X1⋯Xn

where n is a non-terminal and X1⋯Xn is any sequence of terminals, non-terminals, and 𝜀.

4. The start symbol (one of the non-terminals).

Why context-free?

Regular ⊂ Context-Free ⊂ Context-Sensitive ⊂ Unrestricted

Nice balance between expressive power and efficiency of parsing.

Chomsky hierarchy

Grammar              Valid productions
Unrestricted         𝛼 → 𝛽
Context-Sensitive    𝛼 x γ → 𝛼 𝛽 γ
Context-Free         x → 𝛽
Regular              x → t,  x → t z,  x → 𝜀

Let t range over terminals, x and z over non-terminals, and 𝛼, 𝛽 and γ over sequences of terminals, non-terminals, and 𝜀.

Backus-Naur Form

BNF is a standard ASCII notation for specification of context-free grammars whose terminals are ASCII characters. For example:

<exp> ::= <exp> "+" <exp> | <exp> "-" <exp> | <var>
<var> ::= "x" | "y"

The BNF notation can itself be specified in BNF.